# MLflow Notebook: Manual Logging, Autologging, and Model Version Control

## Introduction to MLflow

MLflow is an open-source platform designed to streamline the machine learning lifecycle, including experimentation, reproducibility, and deployment. It provides tools for tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow is highly flexible, supporting multiple machine learning frameworks like Scikit-learn, TensorFlow, and PyTorch, and integrates seamlessly with various data science workflows.

## Why Use MLflow?

MLflow simplifies the machine learning process by offering a centralized platform to:
- Track Experiments: Log parameters, metrics, and artifacts to compare different model runs.
- Reproduce Results: Ensure consistency by capturing the environment and code versions.
- Deploy Models: Facilitate deployment to production environments with standardized formats.
- Collaborate: Share experiments and models across teams for better collaboration.

This notebook demonstrates MLflow's capabilities using a telecom dataset to predict allocated bandwidth with a Linear Regression model, covering manual logging, autologging, and model version control.

### Setup

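First, install the required packages: MLflow, pandas, scikit-learn, and pyngrok (used to expose the MLflow UI through a public URL).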

In [1]:
!pip install mlflow pandas scikit-learn pyngrok -q

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m24.7/24.7 MB[0m [31m22.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.9/1.9 MB[0m [31m30.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m242.7/242.7 kB[0m [31m12.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m147.8/147.8 kB[0m [31m2.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m114.9/114.9 kB[0m [31m4.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m85.0/85.0 kB[0m [31m3.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m733.8/733.8 kB[0m [31m12.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m203.4/203.4 kB[0m [31m9.0 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

### Using ngrok to Access the MLflow UI

Add your ngrok auth token and run the following cells. They start an MLflow tracking server on port 5000 and use ngrok to create a public URL for the MLflow UI.

In [2]:
# Get ngrok token (Optional - for sharing your app)
# Go to https://ngrok.com and sign up for free
# Copy your token and paste it below
ngrok_token = "YOUR_NGROK_AUTH_TOKEN"  # Replace with your own token; never share or commit a real one
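Hard-coding the token in a notebook makes it easy to leak when the notebook is shared. A minimal alternative, assuming you export the token as an environment variable before starting Jupyter, is to read it at runtime. The helper name `get_ngrok_token` and the variable name `NGROK_AUTHTOKEN` are assumptions chosen for illustration.

```python
import os
from typing import Optional


def get_ngrok_token(env_var: str = "NGROK_AUTHTOKEN") -> Optional[str]:
    """Return the ngrok auth token from the environment, or None if unset or blank.

    Assumption: the token was exported beforehand, e.g. `export NGROK_AUTHTOKEN=...`.
    """
    token = os.environ.get(env_var, "").strip()
    return token or None


# Usage in the notebook (pyngrok's `ngrok` is imported in the next cell):
# token = get_ngrok_token()
# if token:
#     ngrok.set_auth_token(token)
```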

In [3]:
# Run Your App (With sharing - requires ngrok token)
from pyngrok import ngrok
import time
import threading

# Set your ngrok authentication token (replace ngrok_token with your actual token)
ngrok.set_auth_token(ngrok_token)

# Function to launch the MLflow tracking server using a shell command
# (IPython's `!` syntax, so this cell must run inside Jupyter/Colab)
def run_app():
    !mlflow server --host 127.0.0.1 --port 5000

# Terminate any active ngrok tunnels before starting a new one
ngrok.kill()

# Start the MLflow server in a separate thread, because the command blocks until the server stops
app_thread = threading.Thread(target=run_app)
app_thread.start()

# Allow time for the MLflow server to start before creating the tunnel
time.sleep(10)

# Create a public URL using ngrok and display it
try:
    public_url = ngrok.connect(5000)
    print("🚀 Your app is live!")
    print(f"🌐 Share this link: {public_url}")
    print("📱 Anyone can access your app with this link!")
except Exception as e:
    print(f"⚠️ Could not create the ngrok tunnel ({e}). MLflow is still running locally at http://127.0.0.1:5000.")

[2025-07-07 04:56:41 +0000] [549] [INFO] Starting gunicorn 23.0.0
[2025-07-07 04:56:41 +0000] [549] [INFO] Listening at: http://127.0.0.1:5000 (549)
[2025-07-07 04:56:41 +0000] [549] [INFO] Using worker: sync
[2025-07-07 04:56:41 +0000] [554] [INFO] Booting worker with pid: 554
[2025-07-07 04:56:41 +0000] [555] [INFO] Booting worker with pid: 555
[2025-07-07 04:56:41 +0000] [556] [INFO] Booting worker with pid: 556
[2025-07-07 04:56:41 +0000] [557] [INFO] Booting worker with pid: 557
🚀 Your app is live!
🌐 Share this link: NgrokTunnel: "https://3b87-34-45-82-150.ngrok-free.app" -> "http://localhost:5000"
📱 Anyone can access your app with this link!
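The cell above runs the server through a shell escape inside a thread and then waits a fixed 10 seconds. A sketch of an alternative, assuming the `mlflow` executable is on the PATH, starts the server as a background process with `subprocess.Popen` and polls the port until it accepts connections, so the tunnel is opened only once the server is actually listening. The helper names `start_mlflow_server` and `wait_for_port` are hypothetical.

```python
import socket
import subprocess
import time


def start_mlflow_server(host: str = "127.0.0.1", port: int = 5000) -> subprocess.Popen:
    """Launch `mlflow server` in the background with the same flags as the cell above."""
    return subprocess.Popen(["mlflow", "server", "--host", host, "--port", str(port)])


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # server not listening yet; retry shortly
    return False


# Usage (hypothetical replacement for the thread + sleep above):
# server = start_mlflow_server()
# if wait_for_port("127.0.0.1", 5000):
#     public_url = ngrok.connect(5000)
# server.terminate()  # stop the server when you are done
```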


Import the necessary libraries.

In [4]:
import mlflow
import mlflow.sklearn  # scikit-learn flavor: log_model() and autolog()
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error

## Data Preparation

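The dataset contains telecom Quality of Service (QoS) metrics, including Application_Type, Signal_Strength, Latency, Required_Bandwidth, Allocated_Bandwidth, and Resource_Allocation. Preprocessing strips the unit suffixes (dBm, ms, Mbps/Kbps, %) so these columns become numeric, converts all bandwidth values to Mbps, and label-encodes Application_Type.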

In [5]:
# Load Sample Dataset
data = pd.read_csv('Quality of Service 5G.csv')

# Strip unit suffixes and convert to float; bandwidth is normalized to Mbps (Kbps values are divided by 1000)
data['Signal_Strength'] = data['Signal_Strength'].str.replace(' dBm', '').astype(float)
data['Required_Bandwidth'] = data['Required_Bandwidth'].apply(lambda x: float(x.replace(' Mbps', '')) if 'Mbps' in x else float(x.replace(' Kbps', '')) / 1000)
data['Allocated_Bandwidth'] = data['Allocated_Bandwidth'].apply(lambda x: float(x.replace(' Mbps', '')) if 'Mbps' in x else float(x.replace(' Kbps', '')) / 1000)
data['Latency'] = data['Latency'].str.replace(' ms', '').astype(float)
data['Resource_Allocation'] = data['Resource_Allocation'].str.replace('%', '').astype(float)

# Convert categorical columns to numerical values
label_encoder = LabelEncoder()
data['Application_Type'] = label_encoder.fit_transform(data['Application_Type'])

# Define Features and Target
X = data[['Application_Type', 'Signal_Strength', 'Latency', 'Required_Bandwidth']]
y = data['Allocated_Bandwidth']

# Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
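The bandwidth lambdas above assume that any value not containing "Mbps" is in Kbps, so a bare number with no unit would be silently divided by 1000. A stricter helper, assuming the columns only ever hold "Mbps" or "Kbps" strings as the cell implies, fails loudly on anything else. `to_mbps` is a hypothetical name.

```python
def to_mbps(value: str) -> float:
    """Convert a bandwidth string such as '10 Mbps' or '512 Kbps' to Mbps.

    Assumption: the column only contains 'Mbps' or 'Kbps' values; anything
    else raises ValueError instead of being guessed.
    """
    text = value.strip()
    if text.endswith("Mbps"):
        return float(text[: -len("Mbps")])
    if text.endswith("Kbps"):
        return float(text[: -len("Kbps")]) / 1000
    raise ValueError(f"Unrecognised bandwidth unit: {value!r}")


# Usage on the raw (not yet converted) columns:
# data["Required_Bandwidth"] = data["Required_Bandwidth"].apply(to_mbps)
# data["Allocated_Bandwidth"] = data["Allocated_Bandwidth"].apply(to_mbps)
```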

## Manual Logging with MLflow

Manual Logging: This involves explicitly logging parameters, metrics, and models using MLflow's API. It provides fine-grained control over what is logged, allowing customization for specific use cases.

Here, we train a Linear Regression model and manually log parameters and metrics.

In [6]:
mlflow.set_tracking_uri('http://localhost:5000')

# Start an MLflow run
with mlflow.start_run(run_name="Linear_Regression_Manual") as run:
    # Log Parameters
    fit_intercept = True
    mlflow.log_param("fit_intercept", fit_intercept)
    mlflow.log_param("random_state", 42)  # seed used by train_test_split, logged for reproducibility

    # Train Model
    model = LinearRegression(fit_intercept=fit_intercept)
    model.fit(X_train, y_train)

    # Make Predictions
    predictions = model.predict(X_test)
    mse = mean_squared_error(y_test, predictions)

    # Log Metrics
    mlflow.log_metric("mean_squared_error", mse)

    # Log Model
    mlflow.sklearn.log_model(model, "linear_regression_model")

    manual_run_id = run.info.run_id

    # Print Run ID
    print(f"Run ID: {run.info.run_id}")



Run ID: a12a4b3ca218433ca9db37433198eab0
🏃 View run Linear_Regression_Manual at: http://localhost:5000/#/experiments/0/runs/a12a4b3ca218433ca9db37433198eab0
🧪 View experiment at: http://localhost:5000/#/experiments/0


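Explanation:
- mlflow.set_tracking_uri(): Points the MLflow client at the tracking server started earlier, so runs appear in the UI.
- mlflow.start_run(): Starts a new MLflow run; everything logged inside the with block is attached to it.
- mlflow.log_param(): Logs parameters such as fit_intercept and the data-split random_state.
- mlflow.log_metric(): Logs evaluation metrics such as the test-set Mean Squared Error (MSE).
- mlflow.sklearn.log_model(): Saves the trained model as a run artifact under the path linear_regression_model, which the Model Registry section uses later.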
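Because manual logging lets you choose exactly what is recorded, it is easy to log several evaluation metrics instead of only MSE. A minimal plain-Python sketch computes MSE, RMSE, MAE, and R² so they can be passed to `mlflow.log_metrics`, which takes a dict of metric names to values, in one call. The helper name `regression_metrics` and the metric keys are our choices; in practice `sklearn.metrics` provides the same quantities.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return MSE, RMSE, MAE and R^2 for paired true/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 = 1 - SS_res / SS_tot, where SS_res = n * MSE; undefined (NaN) if y_true is constant
    r2 = 1 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")
    return {
        "mean_squared_error": mse,
        "root_mean_squared_error": math.sqrt(mse),
        "mean_absolute_error": mae,
        "r2_score": r2,
    }


# Usage inside the manual `with mlflow.start_run(...)` block:
# mlflow.log_metrics(regression_metrics(list(y_test), list(predictions)))
```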

## Autologging with MLflow

Autologging: MLflow's autologging feature automatically logs parameters, metrics, and models for supported libraries like Scikit-learn, reducing manual effort and ensuring consistency.

Let's enable autologging and train the same Linear Regression model.

In [7]:
# Enable autologging
mlflow.sklearn.autolog()

# Start an MLflow run with autologging
with mlflow.start_run(run_name="Linear_Regression_Autolog") as run:
    # Train Model
    model = LinearRegression(fit_intercept=True)
    model.fit(X_train, y_train)

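    # Make Predictions and evaluate; with autologging enabled, this
    # sklearn.metrics call is also logged as a post-training metric
    predictions = model.predict(X_test)
    mse = mean_squared_error(y_test, predictions)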

    # Print Run ID
    print(f"Run ID: {run.info.run_id}")



Run ID: 823013b306824507b61b5e12f9b678ad
🏃 View run Linear_Regression_Autolog at: http://localhost:5000/#/experiments/0/runs/823013b306824507b61b5e12f9b678ad
🧪 View experiment at: http://localhost:5000/#/experiments/0


Explanation:
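- mlflow.sklearn.autolog(): Automatically logs the estimator's parameters (e.g., fit_intercept), training-set metrics (e.g., MSE, MAE, and R² on the training data), and the fitted model.
- Autologging also tags the run with the estimator's name and class and, by default, logs sklearn.metrics results computed on the model's predictions inside the run (post-training metrics).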

## Model Version Control

Model Version Control: MLflow's Model Registry lets you manage and version models, which supports collaboration and deployment. You can register models, track their versions, and label versions for testing or production. Recent MLflow releases deprecate the fixed stages ("Staging", "Production") in favor of model version aliases and tags.

Here, we register the model from the manual logging run and tag the new version with stage=staging.

In [8]:
from mlflow.tracking import MlflowClient
import mlflow

# Assuming manual_run_id is from your previous run
model_name = "Linear_Regression_Model"

try:
    # Register the model from the run
    model_uri = f"runs:/{manual_run_id}/linear_regression_model"
    result = mlflow.register_model(model_uri=model_uri, name=model_name)
    print(f"Model registered: {model_name}, Version: {result.version}")

    # Optionally, tag the model version (modern alternative to staging)
    client = MlflowClient()
    client.set_model_version_tag(
        name=model_name,
        version=result.version,
        key="stage",
        value="staging"
    )
    print(f"Model version {result.version} tagged with stage: staging")

except Exception as e:
    print(f"Error registering model: {str(e)}")

Successfully registered model 'Linear_Regression_Model'.
2025/07/07 04:57:18 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: Linear_Regression_Model, version 1


Model registered: Linear_Regression_Model, Version: 1
Model version 1 tagged with stage: staging


Created version '1' of model 'Linear_Regression_Model'.


Explanation:

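- mlflow.register_model(): Registers the model artifact from the manual run in the MLflow Model Registry as a new version of Linear_Regression_Model (version 1 on the first registration).
- client.set_model_version_tag(): Attaches a stage=staging tag to that version, indicating it's ready for testing, without using the deprecated stage-transition API.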