# MLflow Diabetes regression: tracking example for Rahti

This notebook briefly demonstrates how to use an MLflow application running in the CSC Rahti container cloud to track machine learning training runs.

After you have deployed your MLflow application to Rahti, you can use this notebook to test it. First, set the environment variables needed to connect to the Tracking server. Then run all the Python cells and check the results in the web UI of your Tracking server.

The model uses scikit-learn's Random Forest Regressor. More information here: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

For more information, see the original tutorial and other documentation here: https://mlflow.org/docs/latest/quickstart.html


---

## Step 1: set up variables
For a Conda-based environment, fill in the variables and run the cell below once:

In [None]:
!conda env config vars set MLFLOW_TRACKING_URI=https://<YOUR_APP_NAME>.rahtiapp.fi
!conda env config vars set MLFLOW_TRACKING_USERNAME=your_username
!conda env config vars set MLFLOW_TRACKING_PASSWORD=your_password

!conda env config vars set MLFLOW_S3_ENDPOINT_URL=https://<YOUR_APP_NAME>-minio.rahtiapp.fi
!conda env config vars set AWS_ACCESS_KEY_ID=your_generated_access_key
!conda env config vars set AWS_SECRET_ACCESS_KEY=your_generated_secret_key

For the changes to take effect, reactivate your Conda environment. After that, you can verify that the variables are set with the following command:

In [None]:
!conda env config vars list
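
A notebook kernel that was started before the environment was reactivated will typically not see the new variables, so it can help to check them from Python as well. The following is a minimal sketch; the helper name `missing_tracking_vars` is illustrative and not part of MLflow or Conda.

```python
import os

# The six variables set in Step 1.
REQUIRED_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
    "MLFLOW_S3_ENDPOINT_URL",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_tracking_vars(env=None):
    """Return the names from REQUIRED_VARS that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_tracking_vars()
if missing:
    print("Missing variables:", ", ".join(missing))
else:
    print("All tracking variables are set.")
```

The check prints only variable names, never their values, so passwords and keys do not end up in the notebook output.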

## Step 2: import libraries
This step requires that you have the packages installed in your environment before importing.
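
If you are unsure whether the packages are present, a quick check like the sketch below can be run first. It relies only on the standard library; the helper name `missing_packages` and the tuple `NOTEBOOK_PACKAGES` are illustrative names introduced here.

```python
from importlib.util import find_spec

# Import names of the packages used in this notebook
# ("sklearn" is the import name of the scikit-learn distribution).
NOTEBOOK_PACKAGES = ("mlflow", "boto3", "sklearn")


def missing_packages(names=NOTEBOOK_PACKAGES):
    """Return the top-level import names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


print("Missing packages:", missing_packages() or "none")
```

Anything reported as missing can then be installed into the environment, for example with `pip install mlflow boto3 scikit-learn`.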

In [None]:
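import mlflow
# boto3 is not called directly in this notebook; MLflow needs it to upload
# artifacts to the S3-compatible store, and importing it here confirms it is installed
import boto3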

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

## Step 3: run the model and store results to MLflow

In [None]:
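# Set the active experiment (created if it does not exist yet), under which the runs are saved.
# In recent MLflow versions set_experiment returns an Experiment object, not a bare ID.
experiment = mlflow.set_experiment('diabetes_dataset')
print(experiment.experiment_id)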

In [None]:
with mlflow.start_run():

    # Load and split data 
    db = load_diabetes()
    X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)
    
    # Set the parameters, change to see different results
    n_estimators = 100
    max_depth = 6
    max_features = 3
    
    # Create and train model
    rf = RandomForestRegressor(n_estimators=n_estimators, max_depth=max_depth, max_features=max_features)
    rf.fit(X_train, y_train)
    
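    # Use the model to make predictions on the test dataset
    predictions = rf.predict(X_test)
    print(predictions)
    
    # Log the parameters and the test-set R^2 score so that they show up in the run
    mlflow.log_params({"n_estimators": n_estimators, "max_depth": max_depth, "max_features": max_features})
    mlflow.log_metric("r2_test", rf.score(X_test, y_test))
    
    # Store the trained model as an artifact of the run
    mlflow.sklearn.log_model(rf, "diabetes-model")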

The run and its stored model should now appear in the web UI of your MLflow Tracking server at `https://<YOUR_APP_NAME>.rahtiapp.fi`
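
Beyond the stored model, further numeric results can be recorded with `mlflow.log_metrics`, which accepts a dictionary of metric names and float values. As a hedged sketch, the hypothetical helper `regression_metrics` below computes three common error measures in plain Python; it is not part of the original tutorial, and `sklearn.metrics` offers equivalent functions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R^2 as plain floats in a dictionary."""
    y_true = [float(v) for v in y_true]
    y_pred = [float(v) for v in y_pred]
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # R^2 is undefined when all targets are identical
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }
```

Called inside the `with mlflow.start_run():` block, `mlflow.log_metrics(regression_metrics(y_test, predictions))` would attach these values to the run, where they can be compared across runs in the web UI.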