# Model Experiments

In this notebook, we log and track the hyperparameters and scores of multiple model runs with MLflow.

<font color=red>Disclaimer:</font> this notebook showcases an MLflow workflow. It is not the actual deployment of the models. You can run it locally if you want, but you'll first have to set up a local MLflow tracking server by following the instructions at https://mlflow.org/docs/latest/getting-started/logging-first-model/step1-tracking-server.html.

# 0 Setup 

## 0.1 Imports

In [1]:
import numpy as np
import pandas as pd

# scikit-learn
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn import tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier

# MLflow
import mlflow
from mlflow.models import infer_signature

## 0.2 Path Definition

In [2]:
HOME_PATH = '~/Documents/Data Science Projects/stars'
INTERIM_DATA_PATH = '/data/interim/'
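The two constants above are joined by plain string concatenation when the data is loaded. As a minimal sketch, assuming the same directory layout, the path could also be built with `pathlib`, which expands `~` explicitly. The names `HOME_DIR`, `INTERIM_DATA_DIR` and `DATA_FILE` are introduced here only for illustration.

```python
from pathlib import Path

# Same locations as HOME_PATH and INTERIM_DATA_PATH, expressed as Path objects.
# HOME_DIR, INTERIM_DATA_DIR and DATA_FILE are hypothetical names for this sketch.
HOME_DIR = Path('~/Documents/Data Science Projects/stars').expanduser()
INTERIM_DATA_DIR = HOME_DIR / 'data' / 'interim'

# Full path to the CSV that is read in the next section.
DATA_FILE = INTERIM_DATA_DIR / 'selected_features_df_final.csv'
print(DATA_FILE)
```

`pd.read_csv` accepts a `Path` object directly, so `DATA_FILE` could be passed in place of the concatenated string.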

## 0.3 Loading Data

The 'stars' notebook must be run before loading the data here.

In [3]:
stars_final_df = pd.read_csv(HOME_PATH + INTERIM_DATA_PATH + 'selected_features_df_final.csv', index_col = 0)
stars_final_df[['Star type', 'Temperature (K)']] = stars_final_df[['Star type', 'Temperature (K)']].astype(float)

# 1 Models

In this section, we set up the models that we'll track later with MLflow.

In [4]:
X = stars_final_df.drop('Star type', axis = 1)
y = stars_final_df['Star type']

## 1.1 CART

In [5]:
def cart_run():
    # note: relies on X_train, X_test, y_train and y_test being defined globally (see section 2.1)

    # set up the model with a maximum depth of 4; the minimum leaf size keeps its default
    cart_params = {'max_depth': 4}
    cart_model = tree.DecisionTreeClassifier(**cart_params)

    # fit the model to the training data (fit() trains cart_model in place)
    cart_model.fit(X_train, y_train)

    # get model predictions on the test set
    cart_pred_labels_te = cart_model.predict(X_test)

    # model performance report
    cart_accuracy = accuracy_score(y_test, cart_pred_labels_te)
    cart_precision = precision_score(y_test, cart_pred_labels_te, average = 'weighted', zero_division = 0)
    cart_recall = recall_score(y_test, cart_pred_labels_te, average = 'weighted', zero_division = 0)
    cart_f1 = f1_score(y_test, cart_pred_labels_te, average = 'weighted', zero_division = 0)
    
    cart_metrics = {"accuracy": cart_accuracy,
                   "precision": cart_precision,
                   "recall": cart_recall,
                   "f1": cart_f1}

    return cart_model, cart_params, cart_metrics
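`cart_run` reads `X_train`, `X_test`, `y_train` and `y_test` from the notebook's global scope, so it only works after the split cell in section 2.1 has run. A minimal sketch of an alternative that takes the model and the split as arguments is shown below. The function name `run_classifier` and the synthetic data from `make_classification` are assumptions made for illustration and are not part of the original notebook.

```python
from sklearn import tree
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split


def run_classifier(model, X_train, X_test, y_train, y_test):
    """Fit `model` and return it together with its weighted test-set scores."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)

    # same averaging options as in cart_run
    weighted = {'average': 'weighted', 'zero_division': 0}
    metrics = {
        'accuracy': accuracy_score(y_test, pred),
        'precision': precision_score(y_test, pred, **weighted),
        'recall': recall_score(y_test, pred, **weighted),
        'f1': f1_score(y_test, pred, **weighted),
    }
    return model, metrics


# Synthetic stand-in for the stars data (hypothetical, for illustration only).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_classes=3, random_state=0
)

# train_test_split returns X_train, X_test, y_train, y_test in this order,
# which matches the argument order of run_classifier.
split = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

cart_params = {'max_depth': 4}
model, metrics = run_classifier(tree.DecisionTreeClassifier(**cart_params), *split)
print(metrics)
```

Because the estimator is passed in, the same function could in principle be reused for the random forest and gradient boosting models by swapping the first argument.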

# 2 Model Logging and Tracking

In [6]:
# setting client and global server reference

client = mlflow.MlflowClient(tracking_uri="http://127.0.0.1:8080")
mlflow.set_tracking_uri(uri="http://127.0.0.1:8080")

First, we create and set the experiment for our dataset. The cell that creates it is currently marked as raw, because it only needs to run once. If you are running this notebook for the first time, change it to a code cell.
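As a hedged sketch of what such a one-time setup step could look like, the helper below looks an experiment up by name and creates it only if it is missing. It relies on the `get_experiment_by_name` and `create_experiment` methods of `MlflowClient`. The helper name `get_or_create_experiment` and the tag in the example call are assumptions made for illustration.

```python
def get_or_create_experiment(client, name, tags=None):
    """Return the ID of the experiment called `name`, creating it if needed.

    `client` is expected to be an mlflow.MlflowClient, such as the one
    defined in the cell above.
    """
    existing = client.get_experiment_by_name(name)
    if existing is not None:
        return existing.experiment_id
    return client.create_experiment(name=name, tags=tags)


# Example call (needs a running tracking server; the tag is hypothetical):
# experiment_id = get_or_create_experiment(
#     client, "Star_Type_Models", tags={"project": "stars"}
# )
```

Note that `mlflow.set_experiment` also creates the experiment when it does not exist yet, so an explicit step like this is mainly useful for attaching tags at creation time.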

## 2.1 CART

In [7]:
# set the current experiment and return metadata
star_type_experiment = mlflow.set_experiment("Star_Type_Models")

# set the name of the current run
run_name = "cart_stars_test"

# set the artefacts path
artifact_path = "cart_stars"

In [8]:
# setting the training split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# run the experiment
cart_model, cart_params, cart_metrics = cart_run()

# Initiate the MLflow run context
with mlflow.start_run(run_name = run_name) as run:
    # log the parameters used for the model fit
    mlflow.log_params(cart_params)

    # log the evaluation metrics that were computed on the test set
    mlflow.log_metrics(cart_metrics)

    # log an instance of the trained model for later use
    cart_model_info = mlflow.sklearn.log_model(
        sk_model=cart_model, input_example=X_test, artifact_path=artifact_path
    )

### 2.1.1 Loading Model and Making Predictions

Here we load the model we just trained and use it to make predictions on the test set. The actual and predicted classes are shown in their respective columns.

In [14]:
# load the model we just logged
cart_loaded_model = mlflow.pyfunc.load_model(cart_model_info.model_uri)

# using it to make predictions on the test set
cart_predictions = cart_loaded_model.predict(X_test)

# build the results DataFrame from a copy, so that X_test itself is not modified
cart_result = X_test.copy()
cart_result["actual_class"] = y_test
cart_result["predicted_class"] = cart_predictions

# show the first n=10 rows of the results DataFrame. change the number of rows at will.
cart_result.head(10)



| | Temperature (K) | Luminosity(L/Lo) | Radius(R/Ro) | Absolute magnitude(Mv) | Star color encoded | actual_class | predicted_class |
|---|---|---|---|---|---|---|---|
| 109 | 33421.0 | 352000.0 | 67.0 | -5.79 | 0.0 | 4.0 | 3.0 |
| 71 | 3607.0 | 0.022 | 0.38 | 10.12 | 5.0 | 1.0 | 1.0 |
| 37 | 6380.0 | 1.35 | 0.98 | 2.93 | 9.0 | 3.0 | 3.0 |
| 74 | 3550.0 | 0.004 | 0.291 | 10.89 | 5.0 | 1.0 | 1.0 |
| 108 | 24345.0 | 142000.0 | 57.0 | -6.24 | 0.0 | 4.0 | 3.0 |
| 227 | 10930.0 | 783930.0 | 25.0 | -6.224 | 0.0 | 4.0 | 3.0 |
| 156 | 26140.0 | 14520.0 | 5.49 | -3.8 | 1.0 | 3.0 | 3.0 |
| 220 | 23678.0 | 244290.0 | 35.0 | -6.27 | 0.0 | 4.0 | 3.0 |
| 152 | 14060.0 | 1092.0 | 5.745 | -2.04 | 1.0 | 3.0 | 3.0 |
| 194 | 3523.0 | 0.0054 | 0.319 | 12.43 | 5.0 | 1.0 | 1.0 |
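
In the ten rows shown above, every class 4 star is predicted as class 3. As a small worked example of how such errors enter the `average = 'weighted'` scores used in `cart_run`, the sketch below recomputes weighted precision and recall for these ten rows in plain Python, treating an undefined per-class score as 0 (the `zero_division = 0` setting). The function name `weighted_scores` is illustrative, and the numbers describe only these ten rows, not the full test set.

```python
from collections import Counter

# actual_class and predicted_class from the ten rows displayed above
actual = [4, 1, 3, 1, 4, 4, 3, 4, 3, 1]
predicted = [3, 1, 3, 1, 3, 3, 3, 3, 3, 1]


def weighted_scores(y_true, y_pred):
    """Support-weighted precision and recall, with undefined scores set to 0."""
    support = Counter(y_true)          # number of true instances per class
    predicted_count = Counter(y_pred)  # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)

    precision = recall = 0.0
    for label, count in support.items():
        # per-class precision is set to 0 when the class is never predicted
        p = true_pos[label] / predicted_count[label] if predicted_count[label] else 0.0
        r = true_pos[label] / count
        # each class contributes in proportion to its share of the true labels
        precision += count / n * p
        recall += count / n * r
    return precision, recall


precision, recall = weighted_scores(actual, predicted)
accuracy = sum(t == p for t, p in zip(actual, predicted)) / len(actual)
print(round(precision, 4), round(recall, 4), accuracy)  # prints 0.4286 0.6 0.6
```

Class 4 is never predicted, so it contributes 0 to both scores despite having the largest support among these rows. Weighted recall coincides with accuracy, which holds in general for support-weighted recall.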


## 2.2 Random Forest