# Step-by-Step Experiment Tracking with MLflow

This notebook demonstrates how to use MLflow to track machine learning experiments, following the guide "Step-by-Step Guide to Experiment Tracking with MLflow for Beginners."


## Import Required Libraries
First, we import the libraries we need: MLflow, plus the scikit-learn components for loading data, training a model, and evaluating it.

In [None]:
import mlflow
import mlflow.sklearn
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

## Setting the Tracking URI

To keep our machine learning experiments organized, we point MLflow at a dedicated folder. The tracking URI `file:mlflow` refers to a folder named `mlflow` in the current working directory, which MLflow creates if it does not already exist. This folder will hold the data and results from our experiments.

In [8]:
mlflow.set_tracking_uri("file:mlflow")

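## Setting an Experiment

An experiment groups related runs under one name. `mlflow.set_experiment` activates the named experiment and creates it if it does not exist yet.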

In [9]:
mlflow.set_experiment("my_first_experiment")

2024/03/19 22:44:53 INFO mlflow.tracking.fluent: Experiment with name 'my_first_experiment' does not exist. Creating a new experiment.


<Experiment: artifact_location='file:///g:/Nishi_2023/nishikant-code-repos/MLflow-Blog-series/MLFlow-Quickstart-Step-by-Step-InDevelopment/mlflow/974063106414886512', creation_time=1710868493327, experiment_id='974063106414886512', last_update_time=1710868493327, lifecycle_stage='active', name='my_first_experiment', tags={}>

## Load and Prepare the Dataset
We use the California Housing dataset for this demonstration.

In [10]:
# Load California Housing dataset
data = fetch_california_housing()
X = data.data
y = data.target

## Splitting the Dataset
The dataset is split into training (70%) and testing (30%) sets. Fixing `random_state` makes the split reproducible.

In [11]:
# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
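
To see what `test_size=0.3` and `random_state=42` do without downloading the real dataset, here is a small sketch on a made-up 10-row array (`X_demo` and `y_demo` are illustrative names, not part of the notebook):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each (illustrative only).
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)
print(X_tr.shape, X_te.shape)  # (7, 2) (3, 2)

# Repeating the call with the same random_state yields the same split.
_, X_te_again, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)
print(np.array_equal(X_te, X_te_again))  # True
```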

## Define Parameters for the Model
We define the Random Forest hyperparameters in a dictionary so the same values can be passed to the model and logged to MLflow.

In [12]:
# Parameters for Random Forest
params = {
    "n_estimators": 100,
    "max_depth": 6,
    "random_state": 42
}
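
The dictionary is later unpacked into the model constructor with `**params`. A short sketch of what that unpacking does, using scikit-learn's `get_params()` to read the values back:

```python
from sklearn.ensemble import RandomForestRegressor

params = {
    "n_estimators": 100,
    "max_depth": 6,
    "random_state": 42
}

# **params expands to n_estimators=100, max_depth=6, random_state=42.
demo_model = RandomForestRegressor(**params)

# Read the configured values back from the estimator.
configured = {key: demo_model.get_params()[key] for key in params}
print(configured == params)  # True
```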

## MLflow Tracking
This section demonstrates the experiment tracking capabilities of MLflow.

### Start an MLflow Run
Each training attempt is recorded as an MLflow run, which groups the parameters, metrics, and artifacts logged during it.

In [14]:
# Start an MLflow run
with mlflow.start_run():
    # Log parameters
    # Logging the parameters of the model helps in tracking and reproducing experiments.
    mlflow.log_params(params)
    
    # Train the Model
    # Here, we train a Random Forest Regressor with the parameters defined above.
    model = RandomForestRegressor(**params)
    model.fit(X_train, y_train)

    # Predict and Calculate MSE
    # We make predictions and calculate the Mean Squared Error (MSE) as a metric.
    predictions = model.predict(X_test)
    mse = mean_squared_error(y_test, predictions)
    
    # Log Metrics
    # Logging metrics such as MSE to MLflow.
    mlflow.log_metric("mse", mse)
    
    # Log the Model
    # Saving the trained model in MLflow.
    mlflow.sklearn.log_model(model, "model")
    
    # Finish the Run
    # The run concludes automatically at the end of the `with` block.
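
For reference, the `mse` metric logged above is the average of the squared differences between true and predicted values. A dependency-free sketch with made-up numbers (not results from this notebook) shows the calculation, along with its square root (RMSE), which is in the same units as the target:

```python
import math

# Made-up values for illustration only.
y_true = [3.0, 2.5, 4.0]
y_pred = [2.5, 3.0, 4.5]

# Squared error per sample, then the mean.
squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
mse_demo = sum(squared_errors) / len(squared_errors)
rmse_demo = math.sqrt(mse_demo)

print(mse_demo)   # 0.25
print(rmse_demo)  # 0.5
```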

### MLflow UI
To view the experiment results, start the MLflow UI by running `mlflow server --backend-store-uri file:mlflow` on the command line from the directory that contains the `mlflow` folder, then open http://127.0.0.1:5000 in a web browser.
