# MLflow Diabetes regression tracking example for Rahti

This notebook briefly demonstrates how to use an MLflow application running in the CSC Rahti container cloud to track machine learning training metrics.

After you have deployed your MLflow application to Rahti, you can use this notebook to test it. First, set the environment variables needed to connect to the Tracking server. Then run all the Python cells and check the results in your Tracking server web UI.

The model is scikit-learn's Random Forest Regressor. More information here: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

For more information, see the original tutorial and other documents here: https://mlflow.org/docs/latest/quickstart.html


---

## Step 1: set up variables 
For a Conda-based environment, fill in the variables and run the cell below once:

In [None]:
!conda env config vars set MLFLOW_TRACKING_URI=https://<YOUR_APP_NAME>.rahtiapp.fi
!conda env config vars set MLFLOW_TRACKING_USERNAME=your_username
!conda env config vars set MLFLOW_TRACKING_PASSWORD=your_password

!conda env config vars set MLFLOW_S3_ENDPOINT_URL=https://<YOUR_APP_NAME>-minio.rahtiapp.fi
!conda env config vars set AWS_ACCESS_KEY_ID=your_generated_access_key
!conda env config vars set AWS_SECRET_ACCESS_KEY=your_generated_secret_key

For the changes to take effect, reactivate your Conda environment. You can then check that everything is set correctly with the command:

In [27]:
!conda env config vars list

MLFLOW_TRACKING_URI = https://mlflow-example.rahtiapp.fi/
MLFLOW_TRACKING_USERNAME = test
MLFLOW_TRACKING_PASSWORD = test
MLFLOW_S3_ENDPOINT_URL = https://mlflow-example-minio.rahtiapp.fi/
AWS_ACCESS_KEY_ID = 6QSN0RF3HDF4EWE
AWS_SECRET_ACCESS_KEY = 30A7NQNKDVTAHJV
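
The `conda env config vars list` output shows what is stored in the environment, which is not necessarily what the running notebook kernel sees. As a rough sketch, the cell below checks the kernel's own process environment for the six variable names from Step 1. The helper `missing_variables` is a hypothetical name introduced here for illustration and is not part of MLflow or Conda.

```python
import os

# Variable names taken from Step 1 of this notebook.
REQUIRED_VARS = [
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
    "MLFLOW_S3_ENDPOINT_URL",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
]


def missing_variables(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `environ` defaults to os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_variables()
if missing:
    print("Not visible to this kernel:", ", ".join(missing))
else:
    print("All tracking variables are visible to this kernel.")
```

If any names are reported, the kernel was probably started before the Conda environment was reactivated, so restarting it from the reactivated environment is a reasonable first thing to try.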


## Step 2: import libraries
This step requires that the packages imported below (`mlflow`, `boto3` and `scikit-learn`) are already installed in your environment.

In [28]:
import mlflow
import boto3  # not called directly; MLflow needs it to upload artifacts to the S3-compatible store

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
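
If the imports above fail with `ModuleNotFoundError`, a quick check like the following can show which packages are missing without importing them. This is only a sketch: `missing_modules` is a hypothetical helper, and the module list assumes the three top-level import names used in this step.

```python
import importlib.util

# Import names used in Step 2; "sklearn" is the import name of scikit-learn.
REQUIRED_MODULES = ["mlflow", "boto3", "sklearn"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


not_installed = missing_modules(REQUIRED_MODULES)
print("Missing packages:", not_installed if not_installed else "none")
```

Anything listed as missing needs to be installed into the same Conda environment that the notebook kernel runs in.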

## Step 3: run the model and store results to MLflow

In [29]:
# Set the experiment under which the runs are saved (it is created if it does not exist yet)
# Note: set_experiment returns an Experiment object, not just its id
experiment = mlflow.set_experiment('diabetes_dataset')
print(experiment)

<Experiment: artifact_location='s3://default/4', creation_time=1690284838068, experiment_id='4', last_update_time=1690284838068, lifecycle_stage='active', name='diabetes_dataset', tags={}>


In [30]:
with mlflow.start_run():

    # Load and split data 
    db = load_diabetes()
    X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)
    
    # Set the parameters, change to see different results
    n_estimators = 100
    max_depth = 6
    max_features = 3
    
    # Create and train model
    rf = RandomForestRegressor(n_estimators=n_estimators, max_depth=max_depth, max_features=max_features)
    rf.fit(X_train, y_train)
    
    # Use the model to make predictions on the test dataset
    predictions = rf.predict(X_test)
    print(predictions)
    
    # Store the trained model as an artifact of this run
    mlflow.sklearn.log_model(rf, "diabetes-model")



[108.16931411  99.49467906 189.53190614 146.59192162 159.92766597
 105.20423274 151.64472005 219.61144337 157.87936655 241.42520979
 145.25654901 157.75019007 159.96008273 192.82223993 107.33998598
 202.83046635 188.26101874 127.91090203 153.60140425 232.19902093
 227.5053228  162.86470146 189.21076445 181.83154091 264.34250689
 238.30675831 184.01923866 169.78031593 103.30753202 156.44570199
 174.61939788 105.32922029 186.48059657 113.057587    95.25539723
  89.35620926  95.96793403  98.97175417 102.12731024 161.28100174
  89.38576064 106.78702307 149.97180987  95.16899347 181.09151932
 114.59090179 175.85658494 216.34926917 246.05716653 143.1099221
 230.78466145 159.46448426 100.21456059 150.57236045 154.90232866
 142.21261974 164.24994669  93.31367334 199.17452982 169.0146804
 258.20572902 191.71389627 279.90122484 188.90214455 131.07634091
  95.40935648 232.94406379  87.19946333 215.59647699 182.45286516
 135.21900938 242.76574163  91.64654221 256.06197565 205.84137287
 195.2958274

The run, with the logged model as its artifact, should now appear in your MLflow Tracking server: `https://<YOUR_APP_NAME>.rahtiapp.fi`
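
As written, the Step 3 cell logs only the model. To also have parameters and metrics recorded for the run, something along the following lines could be added. This is a minimal sketch, not part of the original notebook: `regression_metrics` is a hypothetical helper written in plain Python so that it runs without extra dependencies, and the choice of MAE, RMSE and R² is an assumption about which metrics are of interest. The commented lines at the end show where the existing MLflow functions `mlflow.log_params` and `mlflow.log_metrics` would be called.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 as a plain dict of floats.

    Hypothetical helper for illustration; accepts lists or NumPy arrays.
    """
    y_true = [float(v) for v in y_true]
    y_pred = [float(v) for v in y_pred]
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R^2 is undefined when all targets are identical
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": math.sqrt(ss_res / n), "r2": r2}


# Assumed usage inside the `with mlflow.start_run():` block of Step 3,
# after `predictions = rf.predict(X_test)`:
#
#     mlflow.log_params({"n_estimators": n_estimators,
#                        "max_depth": max_depth,
#                        "max_features": max_features})
#     mlflow.log_metrics(regression_metrics(y_test, predictions))
```

With these two calls in place, each run's page in the web UI should list the three parameters and the three metrics, which makes it easier to compare runs after changing the parameter values.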