# Fundamentals of Machine Learning - Exercise 12
The goal of this exercise is to learn how to save trained models and how to use selected advanced libraries such as Plotly and Optuna.


![meme01](https://github.com/rasvob/VSB-FEI-Fundamentals-of-Machine-Learning-Exercises/blob/master/images/fml_12_meme_01.png?raw=true)

In [1]:
# For Google Colab
!pip install optuna

Collecting optuna
  Downloading optuna-4.1.0-py3-none-any.whl.metadata (16 kB)
Collecting alembic>=1.5.0 (from optuna)
  Downloading alembic-1.14.0-py3-none-any.whl.metadata (7.4 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Collecting Mako (from alembic>=1.5.0->optuna)
  Downloading Mako-1.3.7-py3-none-any.whl.metadata (2.9 kB)
Downloading optuna-4.1.0-py3-none-any.whl (364 kB)
Downloading alembic-1.14.0-py3-none-any.whl (233 kB)
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Downloading Mako-1.3.7-py3-none-any.whl (78 kB)
Installing collected packages: Ma

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import optuna
import joblib

import sklearn.datasets as skd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer, accuracy_score

# 📊 Plotly
https://plotly.com/python/getting-started/

* The plotly Python library is an interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, financial, geographic, and scientific use cases
* Built on top of the Plotly JavaScript library (plotly.js)
* Plotly enables Python users to create **interactive web-based visualizations** that can be displayed in Jupyter notebooks

## 📒 Here we have some examples of commonly used plots
* 💡 The Express API is easy to grasp and is very similar to Seaborn

## Scatter plot

In [3]:
df = px.data.iris()
df.head()

,sepal_length,sepal_width,petal_length,petal_width,species,species_id
0,5.1,3.5,1.4,0.2,setosa,1
1,4.9,3.0,1.4,0.2,setosa,1
2,4.7,3.2,1.3,0.2,setosa,1
3,4.6,3.1,1.5,0.2,setosa,1
4,5.0,3.6,1.4,0.2,setosa,1


In [4]:
px.scatter(df, x="sepal_width", y="sepal_length", color="species", symbol="species")

## Line plot

In [5]:
df = px.data.gapminder().query("continent == 'Oceania'")
df.head()

,country,continent,year,lifeExp,pop,gdpPercap,iso_alpha,iso_num
60,Australia,Oceania,1952,69.12,8691212,10039.59564,AUS,36
61,Australia,Oceania,1957,70.33,9712569,10949.64959,AUS,36
62,Australia,Oceania,1962,70.93,10794968,12217.22686,AUS,36
63,Australia,Oceania,1967,71.1,11872264,14526.12465,AUS,36
64,Australia,Oceania,1972,71.93,13177000,16788.62948,AUS,36


In [6]:
px.line(df, x='year', y='lifeExp', color='country', markers=True)

## Bar plot

In [7]:
df = px.data.medals_long()
df.head()

,nation,medal,count
0,South Korea,gold,24
1,China,gold,10
2,Canada,gold,9
3,South Korea,silver,13
4,China,silver,15


In [8]:
px.bar(df, x="medal", y="count", color="nation", text="nation", barmode='group')
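
The bar chart above relies on the data being in long (tidy) form: one row per nation and medal, with the value in a single `count` column. Data often arrives in wide form instead, with one column per medal. The sketch below shows one possible conversion with `DataFrame.melt`, using a toy wide table assembled from the gold and silver counts visible in the output above (the table itself is an assumption made for illustration).

```python
import pandas as pd

# Toy wide table built from the gold and silver counts shown in the output above.
wide_df = pd.DataFrame(
    {
        "nation": ["South Korea", "China"],
        "gold": [24, 10],
        "silver": [13, 15],
    }
)

# melt() turns each medal column into rows of (nation, medal, count).
long_df = wide_df.melt(id_vars="nation", var_name="medal", value_name="count")
print(long_df)
```

The resulting frame has the same three columns as `px.data.medals_long()`, so it could be passed to the `px.bar` call above unchanged.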

## Box plot

In [9]:
df = px.data.gapminder().query("continent == 'Oceania'")
df.head()

,country,continent,year,lifeExp,pop,gdpPercap,iso_alpha,iso_num
60,Australia,Oceania,1952,69.12,8691212,10039.59564,AUS,36
61,Australia,Oceania,1957,70.33,9712569,10949.64959,AUS,36
62,Australia,Oceania,1962,70.93,10794968,12217.22686,AUS,36
63,Australia,Oceania,1967,71.1,11872264,14526.12465,AUS,36
64,Australia,Oceania,1972,71.93,13177000,16788.62948,AUS,36


In [10]:
px.box(df, x='country', color="country", y="lifeExp")

## Heatmap

In [11]:
df = px.data.iris()
df.head()

,sepal_length,sepal_width,petal_length,petal_width,species,species_id
0,5.1,3.5,1.4,0.2,setosa,1
1,4.9,3.0,1.4,0.2,setosa,1
2,4.7,3.2,1.3,0.2,setosa,1
3,4.6,3.1,1.5,0.2,setosa,1
4,5.0,3.6,1.4,0.2,setosa,1


In [12]:
df_corr = df.iloc[:, :-2].corr()
df_corr

,sepal_length,sepal_width,petal_length,petal_width
sepal_length,1.0,-0.109369,0.871754,0.817954
sepal_width,-0.109369,1.0,-0.420516,-0.356544
petal_length,0.871754,-0.420516,1.0,0.962757
petal_width,0.817954,-0.356544,0.962757,1.0


In [13]:
fig = px.imshow(df_corr, text_auto=True, color_continuous_scale="blues", aspect="auto")
fig.update_xaxes(side="bottom")
fig.show()

## 📌 Parallel categories diagram
* How to read it?

In [14]:
df = pd.read_csv('https://raw.githubusercontent.com/rasvob/VSB-FEI-Fundamentals-of-Machine-Learning-Exercises/master/datasets/titanic.csv', index_col=0)
df.head()

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [15]:
px.parallel_categories(df, dimensions=['Embarked', 'Sex', 'Survived'], color="Survived", color_continuous_scale=px.colors.diverging.Spectral)
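
One way to read the diagram: every distinct combination of values across the chosen dimensions forms one ribbon, and the ribbon's width is proportional to the number of rows with that combination. The sketch below reproduces these counts with a plain `groupby`. It uses a small hypothetical table, not the Titanic data; only the column names are borrowed.

```python
import pandas as pd

# Hypothetical toy data, NOT the Titanic dataset; only the column names match.
toy = pd.DataFrame(
    {
        "Embarked": ["S", "S", "S", "C", "C", "Q"],
        "Sex": ["male", "male", "female", "female", "male", "female"],
        "Survived": [0, 0, 1, 1, 0, 1],
    }
)

# Each distinct path through the three axes corresponds to one ribbon;
# the row count of the path determines how wide the ribbon is drawn.
ribbons = (
    toy.groupby(["Embarked", "Sex", "Survived"])
    .size()
    .reset_index(name="count")
)
print(ribbons)
```

Running the same `groupby` on the real `df` gives the numbers behind each ribbon in the plot above, which can be easier to compare than ribbon widths.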

![meme02](https://github.com/rasvob/VSB-FEI-Fundamentals-of-Machine-Learning-Exercises/blob/master/images/fml_12_meme_02.jpg?raw=true)

# 🚀 Optuna
https://optuna.org/

* An open source hyperparameter optimization framework to automate hyperparameter search
* You can use it with any machine learning or deep learning framework
    * Scikit-learn, TF2, PyTorch, Keras, ...



## ⚡ Using Optuna is very simple
* You just need to define an `objective` function, which is evaluated once per trial
* Inside it, you define the parameter ranges through the `trial.suggest_XYZ` functions and use the returned values as regular parameters
* After that, you can start tuning the parameters

In [16]:
X, y = skd.load_iris(return_X_y=True, as_frame=True)

In [17]:
X.head()

,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,0.2
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2


In [18]:
y.head()

,target
0,0
1,0
2,0
3,0
4,0


In [19]:
def objective(trial, X, y):
    # Search space: each suggest_* call defines a range and returns a value for this trial
    n_estimators = trial.suggest_int('n_estimators', 2, 20)
    max_depth = trial.suggest_int('max_depth', 1, 32)
    criterion = trial.suggest_categorical('criterion', ["gini", "entropy"])
    random_state = 13  # fixed seed so that trials differ only in their hyperparameters

    clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, criterion=criterion, random_state=random_state)

    # Mean accuracy over 5-fold cross-validation is the value Optuna maximizes
    acc_scorer = make_scorer(accuracy_score)
    cv_res = cross_val_score(clf, X, y, n_jobs=-1, cv=5, scoring=acc_scorer)

    return np.mean(cv_res)

In [20]:
study = optuna.create_study(direction='maximize', storage="sqlite:///db.sqlite3", study_name="Iris-RF-Tuning")
study.optimize(lambda trial: objective(trial, X, y), n_trials=100)

trial = study.best_trial

print('Accuracy: {}'.format(trial.value))
print("Best hyperparameters: {}".format(trial.params))

[I 2024-12-05 10:30:18,078] A new study created in RDB with name: Iris-RF-Tuning
[I 2024-12-05 10:30:20,518] Trial 0 finished with value: 0.9533333333333334 and parameters: {'n_estimators': 7, 'max_depth': 24, 'criterion': 'gini'}. Best is trial 0 with value: 0.9533333333333334.
[I 2024-12-05 10:30:20,728] Trial 1 finished with value: 0.96 and parameters: {'n_estimators': 11, 'max_depth': 23, 'criterion': 'entropy'}. Best is trial 1 with value: 0.96.
[I 2024-12-05 10:30:20,899] Trial 2 finished with value: 0.8733333333333333 and parameters: {'n_estimators': 5, 'max_depth': 1, 'criterion': 'gini'}. Best is trial 1 with value: 0.96.
[I 2024-12-05 10:30:21,158] Trial 3 finished with value: 0.9533333333333334 and parameters: {'n_estimators': 20, 'max_depth': 4, 'criterion': 'gini'}. Best is trial 1 with value: 0.96.
[I 2024-12-05 10:30:21,317] Trial 4 finished with value: 0.9466666666666667 and parameters: {'n_estimators': 3, 'max_depth': 16, 'criterion': 'gini'}. Best is trial 1 with valu

Accuracy: 0.96
Best hyperparameters: {'n_estimators': 11, 'max_depth': 23, 'criterion': 'entropy'}


## 💡 Dashboard
* Logs are hard to read, so it is usually better to visualize the tuning process
* You have two options with `Optuna`
    * You can use the basic online tool https://optuna.github.io/optuna-dashboard/
    * You can run a local instance of https://github.com/optuna/optuna-dashboard for more advanced usage

![meme03](https://github.com/rasvob/VSB-FEI-Fundamentals-of-Machine-Learning-Exercises/blob/master/images/fml_12_meme_03.jpg?raw=true)

# ⚡ Model deployment
* How are ML/DL models used in production?
    * Do we train them from scratch every time?
* How would you deploy the model?

## Train the model on full data with the best parameter setup

In [21]:
params = study.best_trial.params
params

{'n_estimators': 11, 'max_depth': 23, 'criterion': 'entropy'}

In [22]:
clf = RandomForestClassifier(**params, random_state=13)

In [23]:
clf.fit(X, y)

In [24]:
df_feat_imp = pd.DataFrame({'Feature': X.columns, 'Importance': clf.feature_importances_}).sort_values(by='Importance')
df_feat_imp

,Feature,Importance
1,sepal width (cm),0.017104
0,sepal length (cm),0.020913
2,petal length (cm),0.431744
3,petal width (cm),0.530239


In [25]:
px.bar(df_feat_imp, y='Feature', x='Importance', orientation='h')

In [26]:
y_pred = clf.predict(X)
accuracy_score(y_true=y, y_pred=y_pred)

1.0
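
The 1.0 above is measured on the same rows the model was fitted on, so it mostly confirms that the pipeline runs end to end rather than estimating how the model generalizes. A minimal sketch, assuming the best parameters reported in this run, puts the training accuracy next to the cross-validated one. The variable names are chosen so that the notebook's `X`, `y`, and `clf` are not overwritten.

```python
import sklearn.datasets as skd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score

X_iris, y_iris = skd.load_iris(return_X_y=True, as_frame=True)

# Same hyperparameters as the best trial reported above.
best_params = {"n_estimators": 11, "max_depth": 23, "criterion": "entropy"}

# Accuracy on the data the model was fitted on (optimistic by construction).
model = RandomForestClassifier(**best_params, random_state=13).fit(X_iris, y_iris)
train_acc = accuracy_score(y_iris, model.predict(X_iris))

# Accuracy on held-out folds: each fold is predicted by a model that never saw it.
cv_acc = cross_val_score(
    RandomForestClassifier(**best_params, random_state=13),
    X_iris, y_iris, cv=5, scoring="accuracy",
).mean()

print(f"train accuracy: {train_acc:.3f}, 5-fold CV accuracy: {cv_acc:.3f}")
```

When reporting how good the deployed model is expected to be, the cross-validated value is the more honest number of the two.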

# Save the model using `joblib`
* There are alternatives as well
    * https://skops.readthedocs.io/en/stable/
    * https://onnx.ai/sklearn-onnx/

In [27]:
filename = 'rf_best.bin'
joblib.dump(clf, filename)

['rf_best.bin']
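
Files written by `joblib` are pickle-based, so they should only be loaded from trusted sources, and loading them under a different scikit-learn version than the one used for saving is not guaranteed to work. One possible precaution, sketched below, is to store the model together with a little metadata. The dictionary layout and the file name are our own convention for this example, not a joblib or scikit-learn format.

```python
import os
import tempfile

import joblib
import sklearn
import sklearn.datasets as skd
from sklearn.ensemble import RandomForestClassifier

X_demo, y_demo = skd.load_iris(return_X_y=True, as_frame=True)
demo_clf = RandomForestClassifier(n_estimators=11, random_state=13).fit(X_demo, y_demo)

# Bundle the estimator with the information needed to load it safely later.
# The keys are an assumption made for this sketch.
bundle = {
    "model": demo_clf,
    "sklearn_version": sklearn.__version__,
    "feature_names": list(X_demo.columns),
}

bundle_path = os.path.join(tempfile.mkdtemp(), "rf_bundle.joblib")
joblib.dump(bundle, bundle_path)

# On load, compare the stored version with the running one before predicting.
restored = joblib.load(bundle_path)
if restored["sklearn_version"] != sklearn.__version__:
    print("Warning: model was saved with a different scikit-learn version")

same_predictions = (restored["model"].predict(X_demo) == demo_clf.predict(X_demo)).all()
print(same_predictions)
```

Storing the feature names alongside the model also documents which columns, and in which order, the model expects at prediction time.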

# 📈 Load the model from disk

In [28]:
loaded_model = joblib.load(filename)

## Check if everything works fine 🙂

In [29]:
y_pred = loaded_model.predict(X)
accuracy_score(y_true=y, y_pred=y_pred)

1.0
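
This also answers the deployment question from earlier: in production the model is typically trained once, saved, and then loaded at start-up and reused for every prediction. The sketch below illustrates that split under simplifying assumptions. The helper `predict_species`, the file name, and the dictionary-based input are hypothetical, and a real service would wrap this in a web framework or batch job.

```python
import os
import tempfile

import joblib
import pandas as pd
import sklearn.datasets as skd
from sklearn.ensemble import RandomForestClassifier

# Training side: fit and save (stands in for the cells above).
iris = skd.load_iris(as_frame=True)
model_path = os.path.join(tempfile.mkdtemp(), "rf_demo.bin")
joblib.dump(
    RandomForestClassifier(n_estimators=11, random_state=13).fit(iris.data, iris.target),
    model_path,
)

# Serving side: load once at start-up, then reuse for every request.
served_model = joblib.load(model_path)
feature_names = list(iris.data.columns)
class_names = list(iris.target_names)


def predict_species(measurements):
    """Hypothetical helper: map one dict of measurements to a species name."""
    missing = [name for name in feature_names if name not in measurements]
    if missing:
        raise ValueError(f"missing features: {missing}")
    # Build a one-row frame and enforce the column order used in training.
    row = pd.DataFrame([measurements])[feature_names]
    return class_names[int(served_model.predict(row)[0])]


# First row of the iris data, as shown in X.head() earlier.
sample = dict(zip(feature_names, [5.1, 3.5, 1.4, 0.2]))
print(predict_species(sample))
```

Validating the input before calling `predict` keeps malformed requests from producing confusing errors deep inside the model.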

In [30]:
df_feat_imp = pd.DataFrame({'Feature': X.columns, 'Importance': loaded_model.feature_importances_}).sort_values(by='Importance')
df_feat_imp

,Feature,Importance
1,sepal width (cm),0.017104
0,sepal length (cm),0.020913
2,petal length (cm),0.431744
3,petal width (cm),0.530239


In [31]:
px.bar(df_feat_imp, y='Feature', x='Importance', orientation='h')

![meme04](https://github.com/rasvob/VSB-FEI-Fundamentals-of-Machine-Learning-Exercises/blob/master/images/thats_all.jpg?raw=true)