## What is MLOps?

MLOps is a set of practices and tools for developing, deploying, and maintaining machine learning models. It covers tracking experiments so that models can be compared and improved, and registering the resulting model versions.

## Experiments and runs in MLflow

We are going to use the diabetes dataset to log our first run in MLflow.

In [4]:
import mlflow
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

### MLflow Experiment Setup and Auto-Logging

This section sets up an MLflow experiment by specifying the tracking URI and experiment ID. It also enables **auto-logging**, which automatically logs parameters, metrics, and models during training, making it easier to track and visualize the experiment's progress.

- **MLFLOW_TRACKING_URI**: Specifies the location of the MLflow server (in this case, a local server at `http://localhost:5000`).
- **mlflow.set_tracking_uri**: Configures the tracking URI for the MLflow client.
- **mlflow.set_experiment**: Selects the active experiment, here by its ID, so that all logged data is associated with the correct experiment. When an ID is passed, the experiment must already exist on the tracking server.
- **mlflow.autolog()**: Activates auto-logging to capture key model training details automatically, such as metrics, hyperparameters, and the final model.

This setup simplifies the process of tracking and comparing different runs of the machine learning model.


In [6]:
# point the MLflow client at the local tracking server and select the experiment by ID
MLFLOW_TRACKING_URI = "http://localhost:5000"
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
mlflow.set_experiment(experiment_id="932019549495243714")

# enable autologging so parameters, metrics, and models are recorded automatically
mlflow.autolog()

2025/01/15 20:50:58 INFO mlflow.tracking.fluent: Autologging successfully enabled for sklearn.


### Model Training and Prediction

We load the **diabetes dataset** and split it into training and test sets. A **RandomForestRegressor** model is then trained on the training data, and predictions are made on the test dataset.

- **load_diabetes()**: Loads the diabetes dataset, which includes features related to diabetes progression and the target variable (disease progression).
- **train_test_split**: Splits the dataset into training and test sets to evaluate model performance.
- **RandomForestRegressor**: A random forest model is created with the following hyperparameters:
  - `n_estimators=100`: The model uses 100 trees.
  - `max_depth=6`: The maximum depth of each tree is limited to 6.
  - `max_features=3`: Each split considers a random subset of 3 features when searching for the best split.
- **rf.fit()**: Trains the random forest model on the training data (`X_train`, `y_train`).
- **rf.predict()**: Uses the trained model to make predictions on the test data (`X_test`).

The output is a set of predictions on the test dataset.
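One detail worth keeping in mind when comparing runs: `train_test_split` shuffles the data randomly and the random forest is itself randomized, so two executions will generally produce different splits and predictions. The sketch below is one possible way to make a run repeatable by fixing `random_state`; the helper `fit_and_predict` and the seed value `42` are assumptions introduced for illustration and are not part of the notebook.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

db = load_diabetes()


def fit_and_predict(seed):
    """Hypothetical helper: split, train, and predict with a fixed seed."""
    X_train, X_test, y_train, y_test = train_test_split(
        db.data, db.target, random_state=seed
    )
    rf = RandomForestRegressor(
        n_estimators=100, max_depth=6, max_features=3, random_state=seed
    )
    rf.fit(X_train, y_train)
    return X_train.shape, X_test.shape, rf.predict(X_test)


train_shape, test_shape, first = fit_and_predict(seed=42)
_, _, second = fit_and_predict(seed=42)

print(train_shape, test_shape)   # by default 25% of the rows are held out for testing
print((first == second).all())   # same seed, same split, same predictions
```

Fixing the seed is a design choice: it makes differences between logged runs attributable to hyperparameter changes rather than to a different random split.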


In [7]:
db = load_diabetes()

X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

# create and train the model.
rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
rf.fit(X_train, y_train)

# use the model to make predictions on the test dataset.
predictions = rf.predict(X_test)

2025/01/15 20:51:00 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '5fa6c9e30ed640f7bd202c909fb94b39', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow


2025/01/15 20:51:03 INFO mlflow.tracking._tracking_service.client: 🏃 View run rumbling-flea-894 at: http://localhost:5000/#/experiments/932019549495243714/runs/5fa6c9e30ed640f7bd202c909fb94b39.
2025/01/15 20:51:03 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://localhost:5000/#/experiments/932019549495243714.


### Selecting the Experiment by Name

The next cell selects the active MLflow experiment by name instead of by ID. An experiment is a logical container in MLflow that groups related runs (model training sessions). After the experiment **"My First Experiment"** is selected, all subsequent runs and their results are logged under it. The `Experiment` object printed below the cell shows that this name belongs to the same experiment ID (`932019549495243714`) that was used earlier.

- **mlflow.set_experiment("My First Experiment")**: Activates the experiment with the given name, creating it first if it does not exist, so that the parameters, metrics, and models of its runs are grouped together.

Grouping runs this way makes it easier to organize and compare different machine learning models in MLflow.


In [8]:
mlflow.set_experiment("My First Experiment")

<Experiment: artifact_location='mlflow-artifacts:/932019549495243714', creation_time=1736873321345, experiment_id='932019549495243714', last_update_time=1736873321345, lifecycle_stage='active', name='My First Experiment', tags={}>

### Logging with MLflow: Tags, Parameters, and Metrics

In this section, we start an MLflow run and log various details about the model training session, including custom tags, hyperparameters, and performance metrics.

- **mlflow.start_run()**: Starts a new run within the current experiment. This run is used to track all the logged information.
- **mlflow.set_tags()**: Sets custom tags for the current run. In this case, a tag `my_tag` is assigned the value `"my_value"`. Tags can be useful for categorizing and filtering runs.
- **mlflow.log_params()**: Logs hyperparameters or configuration details. Here a single illustrative value, `n_estimators: 101`, is logged; it is hard-coded and not read from the model trained earlier.
- **mlflow.log_metrics()**: Logs performance metrics. In this example a placeholder value, `mse: 5`, is logged under the metric name `mse`; it is not computed from the model's predictions.

This setup ensures that important information about the model training process is captured and can be visualized or compared later in the MLflow interface.


In [10]:
with mlflow.start_run():

    mlflow.set_tags({"my_tag": "my_value"})

    mlflow.log_params({"n_estimators": 101})

    mlflow.log_metrics({"mse": 5})

2025/01/15 20:52:05 INFO mlflow.tracking._tracking_service.client: 🏃 View run puzzled-cow-742 at: http://localhost:5000/#/experiments/932019549495243714/runs/05e6d4fb4d4c438bb843ec4f8ca04d05.
2025/01/15 20:52:05 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://localhost:5000/#/experiments/932019549495243714.
