# 1. Running and tracking machine learning experiments

## 1.0. The data we use: Palmer Penguins

The data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network. The dataset is well suited to data exploration and visualization and serves as an alternative to the iris dataset.

We will use this dataset in a classification setting to predict the penguins’ species from anatomical information.

Each penguin is from one of the three following species: Adelie, Gentoo, and Chinstrap.

![palmer penguins](../resources/palmer_penguins.png "Palmer Penguins")


This is a classification problem because the target is categorical. We will use features based on the penguins’ culmen measurements.

![Culmen features](../resources/culmen_depth.png)

## 1.1. Load and prepare the data

In [1]:
import pandas as pd

culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"]
target_column = "Species"

data_path = "../data/penguins_classification.csv"
data = pd.read_csv(data_path)

data.sample(5)

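|     | Culmen Length (mm) | Culmen Depth (mm) | Species   |
|-----|--------------------|-------------------|-----------|
| 280 | 46.1               | 18.2              | Chinstrap |
| 45  | 41.1               | 19.0              | Adelie    |
| 309 | 47.5               | 16.8              | Chinstrap |
| 15  | 38.7               | 19.0              | Adelie    |
| 91  | 34.0               | 17.1              | Adelie    |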


In [2]:
from sklearn.model_selection import train_test_split

data, target = data[culmen_columns], data[target_column]
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0)

## 1.2. The modelling: Decision Tree

In [3]:
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(max_depth=3, max_leaf_nodes=4)
tree.fit(data_train, target_train)

In [4]:
test_score = tree.score(data_test, target_test)
print(f"Accuracy of the DecisionTreeClassifier: {test_score:.1%}")

Accuracy of the DecisionTreeClassifier: 96.5%


## 1.3. MLflow Stores

### Tracking stores

MLflow supports two types of backend stores: a file store and a database-backed store. Runs can also be sent to a remote tracking server. The tracking URI selects where they go:

- Local file path (specified as `file:/my/local/dir`), where data is stored directly on disk. Defaults to `mlruns/`.
- Database, encoded as `<dialect>+<driver>://<username>:<password>@<host>:<port>/<database>`. MLflow supports the dialects `mysql`, `mssql`, `sqlite`, and `postgresql`. For more details, see the SQLAlchemy documentation on database URIs.
- HTTP server (specified as `https://my-server:5000`), which is a server hosting an MLflow tracking server.
- Databricks workspace (specified as `databricks`, or as `databricks://<profileName>` for a Databricks CLI profile).
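
To illustrate the database URI form, the following sketch assembles such a URI from its parts using only the standard library. The helper name `build_db_uri` and every connection value in it are hypothetical placeholders and not part of MLflow.

```python
from typing import Optional
from urllib.parse import quote_plus


def build_db_uri(dialect: str, database: str, driver: Optional[str] = None,
                 username: Optional[str] = None, password: Optional[str] = None,
                 host: Optional[str] = None, port: Optional[int] = None) -> str:
    """Assemble <dialect>+<driver>://<username>:<password>@<host>:<port>/<database>."""
    scheme = f"{dialect}+{driver}" if driver else dialect
    credentials = ""
    if username:
        credentials = quote_plus(username)
        if password:
            # Escape characters such as "@" or "/" that would otherwise break the URI.
            credentials += ":" + quote_plus(password)
        credentials += "@"
    location = host or ""
    if port:
        location += f":{port}"
    return f"{scheme}://{credentials}{location}/{database}"


# SQLite has no user, host, or port, which yields the three-slash form.
print(build_db_uri("sqlite", "mflow.db"))
# Hypothetical PostgreSQL server; every value here is a placeholder.
print(build_db_uri("postgresql", "mlflow", driver="psycopg2", username="mlflow_user",
                   password="p@ss/word", host="db.example.org", port=5432))
```

The resulting string can be passed to `--backend-store-uri` when starting the server.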

### Artifact stores
The artifact store is where models, plots, and other output files are kept. Supported backends include:

- Amazon S3
- Azure Blob Storage
- Google Cloud Storage
- FTP server
- SFTP Server
- NFS
- HDFS

## 1.4. Setup and configure the tracking server

We want to use a database as the tracking store and a local directory as the artifact store.
A database-backed store is required for later steps in the tutorial, such as managing deployments. In this setup, artifacts are stored under the local `./mlruns` directory, and MLflow entities are written to the SQLite database file named in the server command (`mflow.db`).


![tracking setup](../resources/tracking_setup.png)


To start the tracking server, run the following command in a terminal:

```mlflow server --backend-store-uri sqlite:///mflow.db --default-artifact-root mlruns/ --host 0.0.0.0 --port 5001```

Make sure you run it inside the virtual environment, for example by putting `pipenv run` in front of it.

Now you should be able to see the Tracking UI in a browser at `http://0.0.0.0:5001`.
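
If the page does not load, a quick reachability check from Python can help narrow down the problem. The sketch below assumes the server exposes a `/health` endpoint and that a server bound to `0.0.0.0` can be reached through the loopback address `127.0.0.1`. The helper name `tracking_server_is_up` is made up for this example.

```python
import urllib.error
import urllib.request


def tracking_server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


print(tracking_server_is_up("http://127.0.0.1:5001"))
```

A result of `False` usually means the server is not running, is listening on another port, or was started outside the virtual environment.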

In [5]:
import mlflow
import mlflow.sklearn

In [6]:
remote_server_uri = "http://0.0.0.0:5001"   # set to your server URI
mlflow.set_tracking_uri(remote_server_uri)  # or set the MLFLOW_TRACKING_URI in the env

In [7]:
mlflow.tracking.get_tracking_uri()

'http://0.0.0.0:5001'

### Create a new experiment

In [8]:
exp_name = "penguin_classification"
mlflow.create_experiment(exp_name)

'249912329099373127'

## 1.5. Add tracking code to the machine learning experiment above

Basic things to track:
- Parameters: Key-value input parameters: `mlflow.log_param`, `mlflow.log_params`
- Metrics: Key-value metrics, where the value is numeric (can be updated over the run): `mlflow.log_metric`, `mlflow.log_metrics`

Recall the following code from above:

```python
# Load dataset
culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"]
target_column = "Species"

data_path = "../data/penguins_classification.csv"
data = pd.read_csv(data_path)

# Prepare a train-test-split
data, target = data[culmen_columns], data[target_column]
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0)

# Initialize and fit a classifier
tree = DecisionTreeClassifier(max_depth=3, max_leaf_nodes=4)
tree.fit(data_train, target_train)

# Calculate test scores
test_score = tree.score(data_test, target_test)
print(f"Accuracy of the DecisionTreeClassifier: {test_score:.1%}")
```


Now we add the required code to track the experiment with MLflow.

In [11]:
mlflow.set_experiment(exp_name)  # <-- set the experiment we want to track to
with mlflow.start_run() as run:         # <-- start a run of the experiment
    print(f"Started run {run.info.run_id}")
    # Load dataset
    print("Load dataset...")
    culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"]
    target_column = "Species"

    data_path = "../data/penguins_classification.csv"
    data = pd.read_csv(data_path)

    # Prepare a train-test-split
    print("Prepare a train-test-split...")
    data, target = data[culmen_columns], data[target_column]
    data_train, data_test, target_train, target_test = train_test_split(
        data, target, random_state=0)

    # Initialize and fit a classifier
    max_depth = 6
    max_leaf_nodes = 9
    print(f"Initialize and fit a DecisionTreeClassifier with max_depth={max_depth}, max_leaf_nodes={max_leaf_nodes}")
    
    mlflow.log_params(            # <-- Track parameters
        {"max_depth": max_depth, 
         "max_leaf_nodes": max_leaf_nodes}
    )
    tree = DecisionTreeClassifier(
        max_depth=max_depth,
        max_leaf_nodes=max_leaf_nodes
    )
    tree.fit(data_train, target_train)

    # Calculate test scores
    test_score = tree.score(data_test, target_test)
    mlflow.log_metric("test_accuracy", test_score)   # <-- Track metrics
    print(f"Result: Accuracy of the DecisionTreeClassifier: {test_score:.1%}")

Started run 61b70ff88bc346ac98c6028fc3c3d1b3
Load dataset...
Prepare a train-test-split...
Initialize and fit a DecisionTreeClassifier with max_depth=6, max_leaf_nodes=9
Result: Accuracy of the DecisionTreeClassifier: 96.5%
🏃 View run fun-flea-283 at: http://0.0.0.0:5001/#/experiments/249912329099373127/runs/61b70ff88bc346ac98c6028fc3c3d1b3
🧪 View experiment at: http://0.0.0.0:5001/#/experiments/249912329099373127


Have a look at the tracking UI to see how it played out!

## 1.6. Exercise: track some more stuff

What else could we want to track?

Examples:  
- Code version: Git commit hash used for the run (if it was run from an MLflow Project)
- Start and end time of the run
- Source: which code was run
- Plots
- Properties of the input data
- Model artifacts
- ...

### Exercises: 
- Track the properties of the input data as parameters. (Hint: `data.shape[0]` gives the number of samples in a dataframe)
- Track the notebook 'source code'. (Hint: `mlflow.log_artifact()` takes a path and copies the file to the artifact store.)

In [13]:
# EXERCISE:

mlflow.set_experiment(exp_name)  # <-- set the experiment we want to track to
with mlflow.start_run() as run:         # <-- start a run of the experiment
    print(f"Started run {run.info.run_id}")
    # Load dataset
    print("Load dataset...")
    culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"]
    target_column = "Species"

    data_path = "../data/penguins_classification.csv"
    data = pd.read_csv(data_path)

    # Prepare a train-test-split
    print("Prepare a train-test-split...")
    data, target = data[culmen_columns], data[target_column]
    data_train, data_test, target_train, target_test = train_test_split(
        data, target, random_state=0)

    # Initialize and fit a classifier
    max_depth = 3
    max_leaf_nodes = 4
    print(f"Initialize and fit a DecisionTreeClassifier with max_depth={max_depth}, max_leaf_nodes={max_leaf_nodes}")
    
    mlflow.log_params(            # <-- Track parameters
        {"max_depth": max_depth, 
         "max_leaf_nodes": max_leaf_nodes}
    )
    tree = DecisionTreeClassifier(
        max_depth=max_depth,
        max_leaf_nodes=max_leaf_nodes
    )
    tree.fit(data_train, target_train)

    # Calculate test scores
    test_score = tree.score(data_test, target_test)
    mlflow.log_metric("test_accuracy", test_score)   # <-- Track metrics
    print(f"Result: Accuracy of the DecisionTreeClassifier: {test_score:.1%}")

Started run ba8e883119a64333ad9d956572df03c1
Load dataset...
Prepare a train-test-split...
Initialize and fit a DecisionTreeClassifier with max_depth=3, max_leaf_nodes=4
Result: Accuracy of the DecisionTreeClassifier: 96.5%
🏃 View run clean-snipe-282 at: http://0.0.0.0:5001/#/experiments/249912329099373127/runs/ba8e883119a64333ad9d956572df03c1
🧪 View experiment at: http://0.0.0.0:5001/#/experiments/249912329099373127


In [None]:
# POSSIBLE SOLUTION:

mlflow.set_experiment(exp_name)
with mlflow.start_run() as run:
    print(f"Started run {run.info.run_id}")
    # Load dataset
    print("Load dataset...")
    culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"]
    target_column = "Species"

    data_path = "../data/penguins_classification.csv"
    data = pd.read_csv(data_path)
    mlflow.log_param("num_samples", data.shape[0])  # <-- ADDED: track the number of samples in the dataset

    # Prepare a train-test-split
    print("Prepare a train-test-split...")
    data, target = data[culmen_columns], data[target_column]
    data_train, data_test, target_train, target_test = train_test_split(
        data, target, random_state=0)

    # Initialize and fit a classifier
    max_depth = 3
    max_leaf_nodes = 4
    print(f"Initialize and fit a DecisionTreeClassifier with max_depth={max_depth}, max_leaf_nodes={max_leaf_nodes}")
    
    mlflow.log_params(
        {"max_depth": max_depth, 
         "max_leaf_nodes": max_leaf_nodes}
    )
    tree = DecisionTreeClassifier(
        max_depth=max_depth,
        max_leaf_nodes=max_leaf_nodes
    )
    tree.fit(data_train, target_train)

    # Calculate test scores
    test_score = tree.score(data_test, target_test)
    mlflow.log_metric("test_accuracy", test_score)
    
    mlflow.log_text("Here you can add general information about the run", "run_info.txt")
    mlflow.log_artifact("1_Run and track experiments.ipynb")  # <-- ADDED: track the source code of the notebook
    print(f"Result: Accuracy of the DecisionTreeClassifier: {test_score:.1%}")

## 1.7. Log the model

We want to store the model artifacts so that we can reuse them for deployment or later experimentation.
Since we used a scikit-learn model here, we can use the built-in module to store the model in sklearn format.

```mlflow.sklearn.log_model(tree, "model")```

There are built-in modules for many kinds of models, as well as the possibility to specify a custom format. Even autologging is available!

In [14]:
mlflow.set_experiment(exp_name)  # <-- set the experiment we want to track to
with mlflow.start_run() as run:         # <-- start a run of the experiment
    print(f"Started run {run.info.run_id}")
    # Load dataset
    print("Load dataset...")
    culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"]
    target_column = "Species"

    data_path = "../data/penguins_classification.csv"
    data = pd.read_csv(data_path)
    mlflow.log_param("num_samples", data.shape[0])  # <-- track the number of samples in the dataset

    # Prepare a train-test-split
    print("Prepare a train-test-split...")
    data, target = data[culmen_columns], data[target_column]
    data_train, data_test, target_train, target_test = train_test_split(
        data, target, random_state=0)

    # Initialize and fit a classifier
    max_depth = 10
    max_leaf_nodes = 15
    print(f"Initialize and fit a DecisionTreeClassifier with max_depth={max_depth}, max_leaf_nodes={max_leaf_nodes}")
    
    mlflow.log_params(            # <-- Track parameters
        {"max_depth": max_depth, 
         "max_leaf_nodes": max_leaf_nodes}
    )
    tree = DecisionTreeClassifier(
        max_depth=max_depth,
        max_leaf_nodes=max_leaf_nodes
    )
    tree.fit(data_train, target_train)

    # Calculate test scores
    test_score = tree.score(data_test, target_test)
    mlflow.log_metric("test_accuracy", test_score)   # <-- Track metrics
    print(f"Result: Accuracy of the DecisionTreeClassifier: {test_score:.1%}")
    
    # Log the model
    mlflow.sklearn.log_model(tree, "model", extra_pip_requirements=["mlflow==1.*"])  # <-- Log the model
    mlflow.log_artifact("1_Run and track experiments.ipynb")  # <-- track the source code of the notebook

Started run 1b96063c96c0499cbfa57cd4749bf278
Load dataset...
Prepare a train-test-split...
Initialize and fit a DecisionTreeClassifier with max_depth=10, max_leaf_nodes=15
Result: Accuracy of the DecisionTreeClassifier: 100.0%


 - mlflow (current: 2.22.0, required: mlflow==1.*)
To fix the mismatches, call `mlflow.pyfunc.get_model_dependencies(model_uri)` to fetch the model's environment and install dependencies using the resulting environment file.


🏃 View run unruly-tern-438 at: http://0.0.0.0:5001/#/experiments/249912329099373127/runs/1b96063c96c0499cbfa57cd4749bf278
🧪 View experiment at: http://0.0.0.0:5001/#/experiments/249912329099373127


In [None]:
# Exercise: Run the experiment (in the above cell) with different model parameters.

## 1.8. Compare models in the UI

## 1.9. Optional: Add a signature to the model

A signature defines the inputs the model expects and the outputs it returns, so that they can be enforced later in deployment.

In [15]:
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, ColSpec

input_schema = Schema([
  ColSpec("double", "Culmen Length (mm)"),
  ColSpec("double", "Culmen Depth (mm)"),
])
output_schema = Schema([ColSpec("string", "Species")])

signature = ModelSignature(inputs=input_schema, outputs=output_schema)

In [16]:
mlflow.set_experiment(exp_name)
with mlflow.start_run() as run:
    print(f"Started run {run.info.run_id}")
    # Load dataset
    print("Load dataset...")
    culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"]
    target_column = "Species"

    data_path = "../data/penguins_classification.csv"
    data = pd.read_csv(data_path)
    mlflow.log_param("num_samples", data.shape[0])

    # Prepare a train-test-split
    print("Prepare a train-test-split...")
    data, target = data[culmen_columns], data[target_column]
    data_train, data_test, target_train, target_test = train_test_split(
        data, target, random_state=0)

    # Initialize and fit a classifier
    max_depth = 10
    max_leaf_nodes = 15
    print(f"Initialize and fit a DecisionTreeClassifier with max_depth={max_depth}, max_leaf_nodes={max_leaf_nodes}")
    
    mlflow.log_params(
        {"max_depth": max_depth, 
         "max_leaf_nodes": max_leaf_nodes}
    )
    tree = DecisionTreeClassifier(
        max_depth=max_depth,
        max_leaf_nodes=max_leaf_nodes
    )
    tree.fit(data_train, target_train)

    # Calculate test scores
    test_score = tree.score(data_test, target_test)
    mlflow.log_metric("test_accuracy", test_score)
    print(f"Result: Accuracy of the DecisionTreeClassifier: {test_score:.1%}")
    # Log the model
    mlflow.sklearn.log_model(tree, "model", signature=signature, extra_pip_requirements=["mlflow==1.*"])  # <-- Now log the model with a signature
    mlflow.log_artifact("1_Run and track experiments.ipynb")

Started run 43fb4afe1a9e4167aa86b5c51aa6f72e
Load dataset...
Prepare a train-test-split...
Initialize and fit a DecisionTreeClassifier with max_depth=10, max_leaf_nodes=15
Result: Accuracy of the DecisionTreeClassifier: 100.0%


 - mlflow (current: 2.22.0, required: mlflow==1.*)
To fix the mismatches, call `mlflow.pyfunc.get_model_dependencies(model_uri)` to fetch the model's environment and install dependencies using the resulting environment file.


🏃 View run kindly-horse-408 at: http://0.0.0.0:5001/#/experiments/249912329099373127/runs/43fb4afe1a9e4167aa86b5c51aa6f72e
🧪 View experiment at: http://0.0.0.0:5001/#/experiments/249912329099373127


## 1.10. Use the model to predict locally

Now we pick a model, retrieve it and make a prediction.

In [19]:
logged_model = 'runs:/43fb4afe1a9e4167aa86b5c51aa6f72e/model'

# Load model as a Sklearn Model.
loaded_model = mlflow.sklearn.load_model(logged_model)

# Predict on a Pandas DataFrame.
loaded_model.predict(pd.DataFrame(data_test))

array(['Adelie', 'Chinstrap', 'Adelie', 'Chinstrap', 'Adelie',
       'Chinstrap', 'Gentoo', 'Adelie', 'Gentoo', 'Adelie', 'Adelie',
       'Gentoo', 'Gentoo', 'Adelie', 'Adelie', 'Gentoo', 'Gentoo',
       'Gentoo', 'Chinstrap', 'Adelie', 'Chinstrap', 'Adelie',
       'Chinstrap', 'Gentoo', 'Adelie', 'Adelie', 'Gentoo', 'Chinstrap',
       'Gentoo', 'Gentoo', 'Adelie', 'Adelie', 'Adelie', 'Gentoo',
       'Gentoo', 'Adelie', 'Chinstrap', 'Gentoo', 'Chinstrap', 'Adelie',
       'Adelie', 'Adelie', 'Adelie', 'Gentoo', 'Adelie', 'Adelie',
       'Chinstrap', 'Gentoo', 'Gentoo', 'Chinstrap', 'Adelie', 'Adelie',
       'Adelie', 'Adelie', 'Chinstrap', 'Adelie', 'Gentoo', 'Adelie',
       'Adelie', 'Chinstrap', 'Adelie', 'Adelie', 'Adelie', 'Gentoo',
       'Adelie', 'Adelie', 'Adelie', 'Gentoo', 'Adelie', 'Gentoo',
       'Adelie', 'Chinstrap', 'Adelie', 'Adelie', 'Gentoo', 'Adelie',
       'Chinstrap', 'Adelie', 'Adelie', 'Gentoo', 'Adelie', 'Gentoo',
       'Gentoo', 'Adelie', 'Gentoo', 

In [21]:
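import mlflow.pyfunc

# The model must already be registered under this name and version
# (for example through the tracking UI).
model_name = "penguine_classification"
model_version = 1
model = mlflow.pyfunc.load_model(model_uri=f"models:/{model_name}/{model_version}")

# Hand-written rows can be scored as well, e.g. model.predict(pd.DataFrame(new_samples)).
# Floats are used so that the values match the "double" columns of the signature.
new_samples = [
    {"Culmen Length (mm)": 1.0, "Culmen Depth (mm)": 3.0},
    {"Culmen Length (mm)": 14.0, "Culmen Depth (mm)": 120.0},
]

# Predict on the held-out test set.
model.predict(pd.DataFrame(data_test))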

 - mlflow (current: 2.22.0, required: mlflow==1.*)
To fix the mismatches, call `mlflow.pyfunc.get_model_dependencies(model_uri)` to fetch the model's environment and install dependencies using the resulting environment file.


array(['Adelie', 'Chinstrap', 'Adelie', 'Chinstrap', 'Adelie',
       'Chinstrap', 'Gentoo', 'Adelie', 'Gentoo', 'Adelie', 'Adelie',
       'Gentoo', 'Gentoo', 'Adelie', 'Adelie', 'Gentoo', 'Gentoo',
       'Gentoo', 'Chinstrap', 'Adelie', 'Chinstrap', 'Adelie',
       'Chinstrap', 'Gentoo', 'Adelie', 'Adelie', 'Gentoo', 'Chinstrap',
       'Gentoo', 'Gentoo', 'Adelie', 'Adelie', 'Adelie', 'Gentoo',
       'Gentoo', 'Adelie', 'Chinstrap', 'Gentoo', 'Chinstrap', 'Adelie',
       'Adelie', 'Adelie', 'Adelie', 'Gentoo', 'Adelie', 'Adelie',
       'Chinstrap', 'Gentoo', 'Gentoo', 'Chinstrap', 'Adelie', 'Adelie',
       'Adelie', 'Adelie', 'Chinstrap', 'Adelie', 'Gentoo', 'Adelie',
       'Adelie', 'Chinstrap', 'Adelie', 'Adelie', 'Adelie', 'Gentoo',
       'Adelie', 'Adelie', 'Adelie', 'Gentoo', 'Adelie', 'Gentoo',
       'Adelie', 'Chinstrap', 'Adelie', 'Adelie', 'Gentoo', 'Adelie',
       'Chinstrap', 'Adelie', 'Adelie', 'Gentoo', 'Adelie', 'Gentoo',
       'Gentoo', 'Adelie', 'Gentoo', 