<div style="text-align: center;">
    <h1><strong>Training a ResNet Model</strong></h1>
    <h1><strong>SQLite &amp; Local Folder</strong></h1>
</div> 

# Goal of the notebook:
#### The goal of this notebook is to implement a pipeline for the custom training of <strong>ResNet models from the Keras library</strong>
#### <strong>MLflow</strong> is used to track training experiments and store their results (i.e. run parameters and artifacts)
#### In this notebook, <strong>SQLite</strong> and a <strong>Local Folder</strong> are used as storage solutions

# Summary:
### 1- Import of Packages and Dependencies
### 2- Import Environment Variables
### 3- Set the parameters to get the resnet model and build the dataset
### 4- Build the datasets
### 5- Generate a trainable model
### 6- We configure MLflow
### 7- We run the training of the Model using an MLflow experiment 

# 1- Import of Packages and Dependencies

In [16]:
import os
from dotenv import load_dotenv
from datetime import datetime
from utils.build_dataset import *
from utils.build_model import *
import mlflow
import pandas as pd  # used below to reload the saved sample map
import tempfile
from tensorflow.keras.callbacks import ModelCheckpoint

# 2- Import Environment Variables  

In [17]:
# Load environment variables from the .env file
load_dotenv()

# Access environment variables using os.getenv() method
# We need api_key and api_url to connect to the API and get the data
api_key = os.getenv("API_KEY")
api_url = os.getenv("API_URL")

# 3- Set the parameters to get the resnet model and build the dataset

### Parameters settings for trainable model compilation
- Set the name of the model to use
- Can be 'ResNet101', 'ResNet101V2', 'ResNet152', 'ResNet152V2', 'ResNet50', 'ResNet50V2'
- Run the cell to select the ResNet model; the trainable model itself is generated in section 5

In [18]:
# Uncomment the model_name you want to use

model_name = "ResNet50" 
#model_name = "ResNet50V2" 
#model_name = "ResNet101" 
#model_name = "ResNet101V2" 
#model_name = "ResNet152" 
#model_name = "ResNet152V2" 

### Parameters settings for dataset collection
- Set the start_date using the "YYYY-MM-DD" format (ex: "2020-08-01")
- Set the end_date using the "YYYY-MM-DD" format (ex: "2020-08-01")
- Set the labels as a string or a list (ex: labels = ['vine', 'grass', 'ground'] or labels = 'ground')
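
The parameter formats listed above can be checked before any request is sent. The sketch below is only an illustration: `parse_date_range` and `normalize_labels` are hypothetical helpers, and the functions in `utils.build_dataset` may already perform equivalent checks.

```python
from datetime import datetime


def parse_date_range(start_date, end_date):
    """Parse two "YYYY-MM-DD" strings and check that they are in order."""
    # strptime raises ValueError if a string does not match the format
    start = datetime.strptime(start_date, "%Y-%m-%d").date()
    end = datetime.strptime(end_date, "%Y-%m-%d").date()
    if end < start:
        raise ValueError(f"end_date {end_date} is before start_date {start_date}")
    return start, end


def normalize_labels(labels):
    """Accept a single label or a list of labels and always return a list."""
    return [labels] if isinstance(labels, str) else list(labels)
```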

In [19]:
# We set the start date and end date for the training data
start_date = "2021-05-27"
end_date = "2021-06-01"

# We set the labels to train on: a single label (e.g. 'vine') or a list of labels (e.g. ['vine', 'grass', 'ground'])
labels = ['vine', 'grass', 'ground']

# 4- Build the datasets 

#### The train and validation datasets are created in 5 steps:
##### 1- The image URLs are collected according to the parameters set above (i.e. labels, start_date, end_date)
##### 2- A dataframe (df_sample_map) is created to map each sample to its data
##### 3- From df_sample_map, HTTP requests are performed to collect the images and store them locally in the 'media' folder
##### 4- The dataframe is saved locally so that it can be exported later as an artifact
##### 5- Train and validation datasets (usable as model input) are generated with the preprocessing that matches the selected ResNet model
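
The helpers in `utils.build_dataset` are not shown in this notebook, so the following is only a sketch of what steps 2 and 5 might look like, written with the standard library instead of pandas and TensorFlow. The function names, the record fields, and the 80/20 per-label split are all assumptions.

```python
import os
import random
from urllib.parse import urlparse


def build_sample_map(urls_by_label, image_dir="media"):
    """Return one record per image: its URL, its label, and its local path."""
    records = []
    for label, urls in urls_by_label.items():
        for url in urls:
            # Use the last component of the URL path as the local file name
            file_name = os.path.basename(urlparse(url).path)
            records.append(
                {"url": url, "label": label, "path": os.path.join(image_dir, file_name)}
            )
    return records


def split_train_val(records, val_fraction=0.2, seed=42):
    """Shuffle records within each label and hold out a fraction for validation."""
    rng = random.Random(seed)
    by_label = {}
    for record in records:
        by_label.setdefault(record["label"], []).append(record)

    train, val = [], []
    for label_records in by_label.values():
        shuffled = label_records[:]
        rng.shuffle(shuffled)
        # Keep at least one validation sample per label when there is more than one
        n_val = max(1, round(len(shuffled) * val_fraction)) if len(shuffled) > 1 else 0
        val.extend(shuffled[:n_val])
        train.extend(shuffled[n_val:])
    return train, val
```

Splitting per label keeps every class represented in both subsets, which matters when one label has far fewer images than the others.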

In [None]:
# We collect the image urls for the labels and the dates
image_urls = get_image_urls_with_multiple_labels(labels, start_date, end_date, api_key, api_url)

# We create a dataframe with the image urls and the labels
df_sample_map = create_sample_map(image_urls)

# We download the images and save them in the media folder
image_dir = 'media'
df_sample_map = download_images(df_sample_map, image_dir)

# We save the dataset as a .csv file (index=False avoids an extra "Unnamed: 0" column when the file is read back)
df_sample_map.to_csv("dataset_csv.csv", index=False)

# We create the train and validation datasets for the given model
train_dataset, val_dataset = create_train_val_datasets(
    df_sample_map,
    image_dir=image_dir,
    model_name=model_name,
)

In [None]:
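# Alternative entry point: reload a previously saved sample map instead of
# collecting the URLs and downloading the images again
df_sample_map = pd.read_csv("dataset_csv.csv")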

# We create the train and validation datasets for the given model
train_dataset, val_dataset = create_train_val_datasets(df_sample_map,
                              image_dir = 'media',
                              model_name = model_name,
                              )

# 5- Generate a trainable model

#### The strategy behind the compile_new_model function (from the build_model module) can be broken down into these steps:
##### 1- The model is imported dynamically from tf.keras.applications
##### 2- The model weights are loaded without the top layers
##### 3- Custom top layers are added in line with the original architecture and the use case (3 classes)
##### 4- The new model is compiled and returned
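
The four steps above can be sketched as follows. This is not the code of `compile_new_model`, which is not shown here: the function name, the pooling-plus-softmax head, the input shape, the optimizer, and the loss (which assumes integer-encoded labels) are all assumptions.

```python
SUPPORTED_MODELS = {
    "ResNet50", "ResNet50V2",
    "ResNet101", "ResNet101V2",
    "ResNet152", "ResNet152V2",
}


def build_resnet_classifier(model_name, num_classes=3, weights="imagenet",
                            input_shape=(224, 224, 3)):
    """Build and compile a ResNet with a new classification head (illustrative)."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unknown model {model_name!r}; expected one of {sorted(SUPPORTED_MODELS)}"
        )
    # Imported here so that an invalid name fails fast, before TensorFlow loads
    import tensorflow as tf

    # 1- Look up the model constructor by name in tf.keras.applications
    constructor = getattr(tf.keras.applications, model_name)
    # 2- Load the convolutional base without its original top layers
    base = constructor(include_top=False, weights=weights, input_shape=input_shape)
    # 3- Add a new head: global average pooling followed by a softmax layer
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs=base.input, outputs=outputs)
    # 4- Compile and return (assumed loss: labels encoded as integers 0..num_classes-1)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Passing `weights=None` skips the download of the ImageNet weights, which is convenient for a quick structural check of the model.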

In [None]:
# We generate the trainable model
model = compile_new_model(model_name)

# 6- We configure MLflow

In [None]:
# Set the MLflow tracking URI to a local SQLite database (mlflow.db in the working directory)
tracking_uri = "sqlite:///mlflow.db"
mlflow.set_tracking_uri(tracking_uri)

# We set the experiment name
experiment_name = "my_experiment"

# Attempt to get the experiment by name
existing_experiment = mlflow.get_experiment_by_name(experiment_name)

# We check if the experiment exists and create it if it doesn't
if existing_experiment is None:
    # If the experiment doesn't exist, create it
    experiment_id = mlflow.create_experiment(
        experiment_name,
        tags={"version": "v1", "priority": "P1"},
    )
    print(f"Experiment '{experiment_name}' created.")
else:
    # If the experiment exists, use the existing experiment
    experiment_id = existing_experiment.experiment_id
    print(f"Experiment '{experiment_name}' already exists. Using the existing experiment.")
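
Two details of the configuration above are worth illustrating. First, `sqlite:///mlflow.db` is a relative path, so the database that MLflow opens depends on the directory the notebook is started from. Second, `mlflow.set_experiment()` creates the experiment if it does not exist and makes it the active one, which can replace the explicit get-or-create logic when no creation tags are needed. The sketch below is only an illustration, and both function names are hypothetical.

```python
from pathlib import Path


def sqlite_tracking_uri(db_path="mlflow.db"):
    """Build a SQLite tracking URI from an absolute path to the database file."""
    return f"sqlite:///{Path(db_path).resolve().as_posix()}"


def configure_mlflow(experiment_name, db_path="mlflow.db"):
    """Point MLflow at the SQLite database and activate the experiment."""
    # Imported here so that the URI helper can be used without MLflow installed
    import mlflow

    mlflow.set_tracking_uri(sqlite_tracking_uri(db_path))
    # set_experiment creates the experiment when it is missing, then activates it
    experiment = mlflow.set_experiment(experiment_name)
    return experiment.experiment_id
```

Unlike the `create_experiment` call above, this shortcut does not attach the `version` and `priority` tags to a newly created experiment.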

# 7- We run the training of the Model using an MLflow experiment 

In [9]:
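# We create a temporary directory for ModelCheckpoint.
# mkdtemp() is used instead of a TemporaryDirectory context manager because the
# directory must still exist when the training runs in a later cell.
temp_dir = tempfile.mkdtemp(prefix="model_checkpoint_")
checkpoint_filepath = os.path.join(temp_dir, "best_model.keras")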

In [10]:
# We define the ModelCheckpoint callback
model_checkpoint = ModelCheckpoint(
    filepath=checkpoint_filepath,  # Temporary location
    monitor='val_loss',             # Metric to monitor
    save_best_only=True,            # Save only the best model
    save_weights_only=False,        # Save the entire model (architecture + weights)
    mode='min',                     # 'min' for loss
    verbose=1                       # Print saving information
)

In [None]:
# We set the number of epochs
number_of_epochs = 5

# Start a new MLflow run
with mlflow.start_run(experiment_id=experiment_id) as run:
    
    # Enable autologging for the model using the keras autolog to save the model using the .keras file format
    mlflow.keras.autolog()
    
    # Log other parameters    
    mlflow.log_param("model_name", model_name)
    mlflow.log_param("labels", labels)
    mlflow.log_param("start_date", start_date)
    mlflow.log_param("end_date", end_date)
    # Log the dataset as artifact
    mlflow.log_artifact("dataset_csv.csv")
    
    # We train the model
    history = model.fit(
        train_dataset,
        validation_data=val_dataset,
        epochs=number_of_epochs,
        callbacks=[model_checkpoint])
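
With `save_best_only=True`, the checkpoint file holds the model from the epoch with the lowest `val_loss`. Which epoch that was can be read from the `history` object returned by `model.fit`. The helper below is a hypothetical sketch that works on the plain dictionary stored in `history.history`.

```python
def best_epoch(history_dict, monitor="val_loss", mode="min"):
    """Return the 1-based epoch and value at which the monitored metric was best."""
    values = history_dict[monitor]
    pick = min if mode == "min" else max
    # On ties the earliest epoch wins, as the checkpoint is only rewritten on improvement
    index = pick(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]
```

After training, `best_epoch(history.history)` returns the epoch number and the validation loss of the saved model.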

In [None]:
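# The `with` block above already ends the run when it exits;
# this call is a safeguard in case a run was left active (e.g. after an interrupted cell)
mlflow.end_run()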