# MLflow and DSPy Tutorial for Beginners

*Learn to manage ML experiments with MLflow and build modular AI solutions with DSPy in Google Colab.*

## Introduction

Machine learning projects often involve many experiments, parameters, and iterations. **MLflow** is an open-source platform for managing the end-to-end machine learning lifecycle. It tracks experiments, models, and metrics, which makes results easier to organize and reproduce. **DSPy** (Declarative Self-improving Python) is an open-source Python framework for building language model applications from modular, declarative code instead of one-off prompts. In simpler terms, DSPy lets you *program* your AI rather than just prompt it, enabling more structured and optimizable AI behaviors.

In this tutorial, we will introduce beginners to an AI/ML workflow that combines MLflow and DSPy. You'll learn how to set up your environment, manage sensitive API keys (using Google Colab's Secrets and OpenRouter), organize your code and data, integrate a dataset, train and optimize a model, evaluate its performance, and finally deploy the model. Throughout, we'll highlight best practices for reproducibility, experiment tracking, and modular coding. By the end, you should understand how MLflow can track your machine learning experiments and how DSPy can help build and refine AI model logic as code.

## User Guide

Let's walk through the key steps of our AI/ML workflow using MLflow and DSPy:

1. **Installation and Setup** – Install MLflow, DSPy, and other required libraries, and configure our environment.
2. **Managing Secrets (API Keys)** – Securely handle API keys (like OpenAI keys) using Google Colab Secrets and optionally OpenRouter for model access.
3. **Modular Code Structure** – Write reusable functions and classes to keep code organized and maintainable.
4. **File/Folder Organization** – Set up a clear project directory structure to manage datasets, models, and code.
5. **Data Integration** – Load the dataset and preprocess it for training.
6. **Model Training & Optimization** – Train a model while MLflow tracks the experiment, then use DSPy to build a language-model module.
7. **Evaluation Metrics** – Assess model performance using key metrics and record these results.
8. **Deployment** – Save the trained model and demonstrate how it can be loaded or served for use in a production environment.
9. **Best Practices** – Ensure reproducibility, logging, and tracking experiments.

Run each code cell in order (this notebook is Colab-ready) and modify the examples to experiment on your own.

## Installation

First, we need to install and import the necessary libraries. We will use **MLflow** for experiment tracking and **DSPy** for building and optimizing our AI modules. We'll also use common libraries like **pandas** (for data handling) and **scikit-learn** (for a simple machine learning model in this example).

In [None]:
# Install MLflow, DSPy, and other required libraries
!pip install -q mlflow dspy pandas scikit-learn

## Google Colab Secrets and OpenRouter

When working with AI models, you often need API keys (for example, for accessing AI services). It's important **not to hardcode keys** in your notebook. Google Colab provides a **Secrets** feature that allows you to securely store API keys and retrieve them in your code. You can add a secret by clicking the key icon ("Secrets") in Colab's left sidebar, entering a name (e.g., `OPENAI_API_KEY`) and its value, and switching on notebook access for it. Once stored and granted access, a secret can be read from your notebook.

To use a stored secret, Colab offers the `google.colab.userdata` module. For example, you can fetch your API key and set it as an environment variable:
```python
from google.colab import userdata
import os
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
```

Alternatively, you can use Python's `getpass` to input the key at runtime (so it won't be visible in the notebook).

In [None]:
from google.colab import userdata
import os, getpass

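# Retrieve the API key from Colab Secrets, or prompt for it if unavailable.
# userdata.get raises an exception (it does not return None) when the secret
# is missing or notebook access has not been granted.
try:
    api_key = userdata.get('OPENAI_API_KEY')
except (userdata.SecretNotFoundError, userdata.NotebookAccessError):
    api_key = ''
if not api_key:
    api_key = getpass.getpass('Enter your API key: ')
os.environ['OPENAI_API_KEY'] = api_key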

print('API key set successfully!')
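
The section title mentions OpenRouter, which the cell above does not cover. As a minimal sketch, assuming you might store either an `OPENAI_API_KEY` or an `OPENROUTER_API_KEY` secret (the second name is an assumption, not something this notebook defines), the hypothetical helpers below pick the first non-empty key from a mapping such as `os.environ` and mask it before printing.

```python
import os
from typing import Mapping, Optional, Sequence, Tuple

# Candidate variable names, in priority order. OPENROUTER_API_KEY is an
# assumed name for illustration; use whatever name you gave your secret.
DEFAULT_KEY_NAMES = ("OPENAI_API_KEY", "OPENROUTER_API_KEY")


def resolve_api_key(
    env: Mapping[str, str],
    names: Sequence[str] = DEFAULT_KEY_NAMES,
) -> Optional[Tuple[str, str]]:
    """Return (name, value) for the first non-empty key in env, else None."""
    for name in names:
        value = env.get(name, "").strip()
        if value:
            return name, value
    return None


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so a key is safe to print."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


found = resolve_api_key(os.environ)
if found is None:
    print("No API key found; add one in Colab Secrets first.")
else:
    name, value = found
    print(f"Using {name}: {mask(value)}")
```

Passing the environment in as an argument, rather than reading `os.environ` inside the function, keeps the helper easy to test with a plain dictionary.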

## Modular Code Structures

Writing modular code means breaking down your project into reusable components (functions, classes, modules) instead of writing one long script. This approach makes it easier to read, debug, and reuse code. For instance, you might have one function to preprocess data, another to train a model, and another to evaluate results. This modular structure is especially useful in machine learning projects where you may try multiple approaches. DSPy itself encourages a modular approach by letting you define *modules* for each part of an AI task.
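
As a small illustration of that split, the sketch below separates preprocessing, training, and evaluation into three functions. The function names and the toy majority-class "model" are hypothetical stand-ins chosen to keep the example dependency-free; in a real project each function would wrap your actual preprocessing and training code.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, int]  # (feature, label); a deliberately tiny schema


def preprocess(rows: Sequence[Tuple[object, object]]) -> List[Row]:
    """Drop rows with missing values and coerce the rest to (float, int)."""
    return [(float(x), int(y)) for x, y in rows if x is not None and y is not None]


def train(rows: Sequence[Row]) -> Callable[[float], int]:
    """Fit a toy majority-class model and return it as a predict function."""
    majority_label = Counter(label for _, label in rows).most_common(1)[0][0]
    return lambda feature: majority_label


def evaluate(predict: Callable[[float], int], rows: Sequence[Row]) -> float:
    """Return the accuracy of predict on rows."""
    correct = sum(predict(x) == y for x, y in rows)
    return correct / len(rows)


raw = [(1.0, 1), (2.0, 1), (None, 0), (3.5, 0), ("4", 1), (5.0, 1), (6.0, 0)]
clean = preprocess(raw)                      # the row with None is dropped
toy_model = train(clean[:4])                 # first four rows for training
accuracy = evaluate(toy_model, clean[4:])    # remaining rows for evaluation
print(f"{len(clean)} clean rows, accuracy={accuracy:.2f}")
```

Because each step is its own function, you can swap the toy `train` for a different approach without touching the preprocessing or evaluation code.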

## File/Folder Structure

Organizing files and directories is important for managing larger projects. Rather than keeping everything in one notebook, you should structure your project repository so that code, data, and results are separated logically. For example, a typical project structure might look like this:

```text
project_name/
├── data/
│   ├── train.csv
│   └── test.csv
├── models/
│   └── model.pkl
├── notebooks/
│   └── exploration.ipynb
├── src/
│   ├── preprocess.py
│   ├── train.py
│   └── inference.py
└── README.md
```

In Colab, you can also mount Google Drive or use cloud storage to organize your data and outputs in a similar way.
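
If you want to create that layout programmatically, here is a minimal sketch using `pathlib`. The helper name `create_project_skeleton` is hypothetical, and the folder names simply mirror the example tree above.

```python
import tempfile
from pathlib import Path
from typing import List

# Folder names mirror the example tree; adjust them to your project.
PROJECT_DIRS = ("data", "models", "notebooks", "src")


def create_project_skeleton(root: Path) -> List[Path]:
    """Create the sub-folders and an empty README under root; return the folders."""
    created = []
    for name in PROJECT_DIRS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(folder)
    (root / "README.md").touch(exist_ok=True)
    return created


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    project_root = Path(tmp) / "project_name"
    folders = create_project_skeleton(project_root)
    layout = sorted(p.name for p in project_root.iterdir())
    print(layout)
```

In Colab you would pass a persistent location instead of a temporary one, for example a folder on a mounted Google Drive.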

## Data Integration

Now that our environment is ready, let's integrate a dataset for our model. Data integration involves loading the data, exploring it, and preprocessing it so it's suitable for training. Common steps include:
- **Loading data** from a file or source (CSV, database, etc.),
- **Cleaning** the data (handling missing values, removing duplicates),
- **Feature engineering or encoding** (e.g., converting categorical values to numeric),
- **Splitting** the data into training and testing sets.
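
The four steps above can be sketched on a tiny in-memory table. The column names and values here are invented for illustration, and the in-memory CSV stands in for a real file.

```python
import io

import pandas as pd

# A small CSV held in memory; in practice you would pass a file path.
raw_csv = io.StringIO(
    "length,color,label\n"
    "1.0,red,0\n"
    "2.0,blue,1\n"
    "2.0,blue,1\n"   # exact duplicate of the previous row
    ",red,0\n"       # missing length
    "4.0,green,1\n"
    "5.0,red,0\n"
)

# 1. Load
df = pd.read_csv(raw_csv)

# 2. Clean: remove duplicate rows, then fill missing lengths with the median
df = df.drop_duplicates().reset_index(drop=True)
df["length"] = df["length"].fillna(df["length"].median())

# 3. Encode: map each color to an integer category code
df["color_code"] = df["color"].astype("category").cat.codes

# 4. Split: shuffle with a fixed seed, then hold out 20% for testing
shuffled = df.sample(frac=1.0, random_state=42)
n_test = max(1, int(len(shuffled) * 0.2))
test_df, train_df = shuffled.iloc[:n_test], shuffled.iloc[n_test:]

print(df)
print("train rows:", len(train_df), "test rows:", len(test_df))
```

The manual shuffle-and-slice split is only there to keep the sketch to one library; scikit-learn's `train_test_split` does the same job.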

For simplicity, we'll use the classic Iris dataset from scikit-learn. This dataset contains flower measurements and species labels for classification. We'll load it, create a pandas DataFrame, and then split it into training and test sets.

In [None]:
# Load the Iris dataset and prepare the DataFrame
from sklearn.datasets import load_iris
import pandas as pd

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
print('Dataset sample:')
print(df.head())

# Split the data into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    df[data.feature_names], df['target'], test_size=0.2, random_state=42
)
print('Train set size:', X_train.shape)
print('Test set size:', X_test.shape)

## Model Training and Optimization

With our data ready, the next step is to train a model and track the process. We'll start by training a simple logistic regression model on the Iris dataset using MLflow to log parameters, metrics, and the model artifact. After that, we'll demonstrate how **DSPy** can be used to program and optimize a language-model-based component, showcasing the potential of building and refining custom AI modules.

### Training a Model with MLflow

We'll train a logistic regression model on the Iris training set. MLflow will log important details such as model parameters, evaluation metrics, and the model artifact. This logging makes it easy to reproduce and compare experiments.

In [None]:
import mlflow
import mlflow.sklearn  # scikit-learn flavor used by log_model below
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Start an MLflow experiment run
mlflow.set_experiment("Iris_Classification")
with mlflow.start_run(run_name="Logistic Regression") as run:
    # Initialize and train the model
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    
    # Predict on the test set and calculate accuracy
    predictions = model.predict(X_test)
    acc = accuracy_score(y_test, predictions)
    print('Test Accuracy:', acc)
    
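    # Log parameters and metrics to MLflow; values are read from the model
    # so the logged parameters always match what was actually trained
    mlflow.log_param('model_type', type(model).__name__)
    mlflow.log_param('max_iter', model.max_iter)
    mlflow.log_param('C', model.C)
    mlflow.log_metric('accuracy', acc)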
    
    # Log the trained model as an artifact
    mlflow.sklearn.log_model(model, artifact_path="model")
    
# Get the run ID for later use (for loading the model)
run_id = run.info.run_id
print('Run ID:', run_id)

### Optimizing Model Behavior with DSPy

Now we'll showcase DSPy. Where the previous section trained a traditional machine learning model, DSPy lets us build and optimize AI modules: small, composable components that each handle one specific task with a language model. Instead of hand-tuning prompts, you declare what a module takes in and what it returns, and DSPy generates the prompt and parses the output. In this example, we configure DSPy to use OpenAI's **o3-mini** model and create a simple sentiment analysis module that takes a sentence and returns a boolean sentiment indicator. (The "o3-mini-high" label refers to o3-mini run with high reasoning effort; DSPy expects model names in `provider/model` form, so we pass `openai/o3-mini`.)

In [None]:
import os
import dspy

# Configure DSPy with o3-mini (make sure your API key is set).
# DSPy expects a 'provider/model' name. For OpenAI reasoning models, recent
# DSPy versions also require temperature=1.0 and a large max_tokens budget.
lm = dspy.LM('openai/o3-mini', api_key=os.getenv('OPENAI_API_KEY'),
             temperature=1.0, max_tokens=20000)
dspy.configure(lm=lm)

# Define a DSPy prediction module for sentiment analysis
classifier = dspy.Predict('sentence -> sentiment: bool')

# Test the module with an example sentence
sentence = "I love the new design, it's brilliant!"
result = classifier(sentence=sentence)
print('Input:', sentence)
print('Predicted sentiment (positive=True):', result.sentiment)
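
The cell above only runs the module. To optimize it, DSPy optimizers need a metric that scores a prediction against a labeled example, conventionally a function of the form `metric(example, pred, trace=None)`. The sketch below shows such a metric using plain `SimpleNamespace` objects as stand-ins for DSPy examples and predictions, so it runs without an API key. The labeled sentences and `fake_classifier` are invented for illustration.

```python
from types import SimpleNamespace
from typing import Callable, Sequence


def sentiment_match(example, pred, trace=None) -> bool:
    """Metric: True when the predicted sentiment equals the labeled one."""
    return bool(pred.sentiment) == bool(example.sentiment)


def average_score(program: Callable, examples: Sequence, metric: Callable) -> float:
    """Run program on each example and return the mean metric value."""
    scores = [metric(ex, program(sentence=ex.sentence)) for ex in examples]
    return sum(scores) / len(scores)


# Hand-labeled examples (made up for this sketch).
devset = [
    SimpleNamespace(sentence="I love the new design!", sentiment=True),
    SimpleNamespace(sentence="This update is terrible.", sentiment=False),
    SimpleNamespace(sentence="Great work, I love it.", sentiment=True),
    SimpleNamespace(sentence="Not bad at all.", sentiment=True),
]


def fake_classifier(sentence: str) -> SimpleNamespace:
    """Keyword stand-in for the DSPy module so the sketch runs offline."""
    return SimpleNamespace(sentiment="love" in sentence.lower())


score = average_score(fake_classifier, devset, sentiment_match)
print(f"Agreement with labels: {score:.2f}")
```

With real data you would wrap each labeled pair in `dspy.Example(...).with_inputs("sentence")`, replace `fake_classifier` with the `classifier` module, and hand the same metric to an optimizer such as `dspy.BootstrapFewShot(metric=sentiment_match)`.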

## Evaluation Metrics

Evaluating model performance is a critical part of any ML workflow. For our classifier, we used **accuracy** on the test set as the primary metric (the proportion of correctly predicted samples). Depending on the problem, you might also consider other metrics such as precision, recall, and F1-score for classification tasks, or MSE/MAE for regression tasks. MLflow allows you to log any number of metrics during training and evaluation, making it easier to compare different runs and choose the best model.
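
To make those classification metrics concrete, here is a dependency-free sketch that computes accuracy, precision, recall, and F1 for one class from lists of true and predicted labels. The function name and sample labels are made up for illustration; in practice `sklearn.metrics` provides `precision_score`, `recall_score`, and `f1_score`.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for the `positive` class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A dictionary like this can be logged in one call with `mlflow.log_metrics(metrics)` inside an active run.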

## Deployment

After training and evaluating a model, the final step is deployment. Deployment can take many forms:
- **Batch predictions**: Using the model to generate predictions on new data in batches.
- **Real-time serving**: Hosting the model behind an API for on-demand predictions.

MLflow Models make deployment easier by packaging the model together with a description of its dependencies. In this notebook, we simulate deployment by loading the logged model and running predictions on new data. In a production setting, you might serve the model over REST (for example with the `mlflow models serve` command) or deploy it to a cloud platform.

In [None]:
# Load the model from the MLflow run and use it for inference
import mlflow

# Construct the model URI using the run ID and artifact path
model_uri = f"runs:/{run_id}/model"
loaded_model = mlflow.sklearn.load_model(model_uri)

# Perform a sample prediction using the loaded model
# (.iloc makes the positional "first five rows" selection explicit)
sample_inputs = X_test.iloc[:5]
sample_true = y_test.iloc[:5]
predictions = loaded_model.predict(sample_inputs)
print('Sample input features:\n', sample_inputs)
print('Model predictions:', predictions.tolist())
print('Actual labels:    ', sample_true.tolist())
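
For the real-time serving route, MLflow can host a logged model locally with `mlflow models serve -m runs:/<run_id>/model -p 5000`, which exposes an `/invocations` endpoint. The sketch below only builds the JSON request body in the `dataframe_split` format. The helper name, port, and feature values are illustrative assumptions, and nothing is sent over the network.

```python
import json
from typing import List, Sequence

# Iris feature names as returned by scikit-learn's load_iris().
FEATURE_NAMES = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]


def build_invocation_payload(
    rows: Sequence[Sequence[float]], columns: List[str] = FEATURE_NAMES
) -> str:
    """Serialize rows into a `dataframe_split` JSON body for /invocations."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values per row, got {len(row)}")
    body = {
        "dataframe_split": {
            "columns": list(columns),
            "data": [list(row) for row in rows],
        }
    }
    return json.dumps(body)


payload = build_invocation_payload([[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]])
print(payload)
```

With the server running, you would POST this string to `http://127.0.0.1:5000/invocations` (assuming the port above) with the header `Content-Type: application/json`.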

## Best Practices

To wrap up, here are some best practices for AI/ML workflows:

- **Track Everything**: Use MLflow to log parameters, code versions, data versions, metrics, and artifacts. This makes your work reproducible and easier to debug.
- **Reproducibility**: Ensure that your experiments are reproducible by fixing random seeds, pinning package versions, and capturing environment details.
- **Version Control**: Keep your code in version control (e.g., Git). Log commit hashes in MLflow for traceability.
- **Modular Design**: Write modular code to isolate and reuse components across projects.
- **Testing and Validation**: Validate your model and code with appropriate testing methods.
- **Model Registry & Deployment**: Manage model versions using a registry and automate deployment when possible.
- **Continuous Monitoring**: Once deployed, continuously monitor your model's performance to address data drift or model decay.
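
As one way to act on the reproducibility advice, the sketch below fixes Python's random seed and collects basic environment details. The helper names are hypothetical and the package list is just an example; libraries such as NumPy have their own random state that you would seed separately.

```python
import platform
import random
import sys
from importlib import metadata
from typing import Dict, Iterable


def set_seed(seed: int = 42) -> None:
    """Seed Python's built-in RNG; seed other libraries (e.g. NumPy) separately."""
    random.seed(seed)


def capture_environment(
    packages: Iterable[str] = ("mlflow", "dspy", "scikit-learn"),
) -> Dict[str, str]:
    """Return Python/OS details plus installed versions of the given packages."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
print("Same seed, same numbers:", first == second)
print(capture_environment())
```

Inside an MLflow run, the returned dictionary could be stored alongside the model, for example with `mlflow.log_dict(capture_environment(), "environment.json")`.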

By following these practices, you create a robust ML workflow. MLflow handles tracking and reproducibility, while DSPy lets you build and optimize AI modules in a modular, programmable way. Used together, they let you iterate quickly without losing track of what you tried.