### Track machine learning training runs
### Log runs to an experiment

### [Banafsheh Hassani](https://www.linkedin.com/in/banafsheh-hassani-7b063a129/)

### [More Projects](https://github.com/BanafshehHassani)

[source](https://docs.databricks.com/_static/notebooks/mlflow/mlflow-log-runs.htmlc)

# This notebook includes
- Creating a Random Forest model on a simple dataset  
- Using the MLflow Tracking API to log the model, selected model parameters, and metrics

# Install MLflow

In [0]:
%pip install mlflow

# Import the dataset from scikit-learn and split it into train and test sets

In [0]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
db = load_diabetes()
X = db.data
y = db.target
X_train, X_test, y_train, y_test = train_test_split(X, y)
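
As a rough illustration of what the split above produces, the sketch below checks the resulting shapes. It relies on scikit-learn's default `test_size` of 0.25 and adds `random_state=0`, which is not in the original cell, purely so the split is reproducible. The `_demo` variable names are hypothetical and chosen so they do not overwrite the notebook's own variables.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

db = load_diabetes()
X_demo, y_demo = db.data, db.target

# random_state=0 is an assumption added for reproducibility; the original cell omits it.
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, random_state=0
)

# With no test_size given, scikit-learn holds out 25% of the rows for testing.
print(X_train_demo.shape, X_test_demo.shape)
```

Fixing the seed this way can make the logged metrics comparable between runs, since every run then evaluates on the same held-out rows.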

# In this run, neither the experiment_id nor the experiment_name parameter is provided
* MLflow creates a notebook experiment and logs runs to it by default.

* Create and train model
* Make predictions
* Log parameters
* Log model
* Create metrics
* Log metrics

In [0]:
import mlflow
import mlflow.sklearn
import numpy as np
import pandas as pd
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection
import sklearn.ensemble
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

* Note: A notebook experiment is associated with a specific notebook. Databricks creates a notebook experiment by default when a run is started using `mlflow.start_run()` and there is no active experiment.

In [0]:
with mlflow.start_run():
  n_estimators = 100
  max_depth = 6
  max_features = 3
  
  # Create and train model
  rf = RandomForestRegressor(n_estimators = n_estimators, max_depth = max_depth, max_features = max_features)
  rf.fit(X_train, y_train)
  
  # Make predictions
  predictions = rf.predict(X_test)
  
  # Log parameters
  mlflow.log_param("num_trees", n_estimators)
  mlflow.log_param("maxdepth", max_depth)
  mlflow.log_param("max_feat", max_features)
  
  # Log model
  mlflow.sklearn.log_model(rf, "random-forest-model")
  
  # Create metrics
  mse = mean_squared_error(y_test, predictions)
    
  # Log metrics
  mlflow.log_metric("mse", mse)
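
For readers unfamiliar with the metric logged above, here is a minimal sketch of what mean squared error computes, written in plain Python. The function name `mean_squared_error_manual` and the sample values are hypothetical and exist only for illustration; the notebook itself uses `sklearn.metrics.mean_squared_error`.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Return the average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Made-up values: the squared errors are 1, 0, and 4, so the result is 5 / 3.
example_targets = [3.0, 5.0, 2.0]
example_predictions = [2.0, 5.0, 4.0]
example_mse = mean_squared_error_manual(example_targets, example_predictions)
print(example_mse)
```

A lower value means the predictions sit closer to the targets, which is why this single number is convenient to log and compare across runs.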

# Log MLflow runs to a workspace experiment

* Access these runs by using the experiment name in the workspace file tree.
* Create and train model
* Make predictions
* Log parameters
* Log model
* Create metrics
* Log metrics

* Note: Workspace experiments are not associated with any notebook, and any notebook can log a run to these experiments by using the experiment name or the experiment ID when initiating a run.

In [0]:
# Access these runs by using the experiment name in the workspace file tree.
experiment_name = "/Shared/diabetes_experiment/"
mlflow.set_experiment(experiment_name)

with mlflow.start_run():
  n_estimators = 100
  max_depth = 6
  max_features = 3
  # Create and train model
  rf = RandomForestRegressor(n_estimators = n_estimators, max_depth = max_depth, max_features = max_features)
  rf.fit(X_train, y_train)
  # Make predictions
  predictions = rf.predict(X_test)
  
  # Log parameters
  mlflow.log_param("num_trees", n_estimators)
  mlflow.log_param("maxdepth", max_depth)
  mlflow.log_param("max_feat", max_features)
  
  # Log model
  mlflow.sklearn.log_model(rf, "random-forest-model")
  
  # Create metrics
  mse = mean_squared_error(y_test, predictions)
    
  # Log metrics
  mlflow.log_metric("mse", mse)