# Get Started with MLflow + Torch Lightning

**Author:** Chen Qian<br>
**Date created:** 2023/09/05<br>
**Last modified:** 2023/09/05<br>

In this guide, we will show how to train your model with PyTorch + Lightning and track your training using MLflow.

For metrics visualization, we will use [Databricks Community Edition](https://community.cloud.databricks.com/), which is completely free. If you don't have an account yet, please [register one](https://www.databricks.com/try-databricks); we will use it later in this guide.

We recommend turning on the free-tier GPU in Colab via **Edit -> Notebook settings -> Hardware accelerator**; it will significantly shorten the training time.

## Install dependencies

Let's install the `lightning`, `datasets`, and `mlflow` packages.

In [None]:
!pip install -q lightning datasets mlflow


In [None]:
!pip install -q pydantic==1.10.11


In [None]:
import lightning.pytorch as pl
import torch
from torch import nn
from torch.nn import functional as F

BATCH_SIZE = 256 if torch.cuda.is_available() else 64

## Load the dataset

We will do simple image classification of handwritten digits with the [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database).

Let's load the dataset from `torchvision` and wrap it in a [`DataLoader`](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html), which provides convenient data-loading features such as batching.

In [None]:
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
import os

dataset = MNIST(os.getcwd(), download=True, transform=ToTensor())
train_set, val_set = torch.utils.data.random_split(dataset, [50000, 10000])
# Use the BATCH_SIZE defined above (256 on GPU, 64 on CPU) for both loaders.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=BATCH_SIZE)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=BATCH_SIZE)
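
To make the batching concrete, here is a minimal sketch of the same `random_split` + `DataLoader` pattern on a small synthetic stand-in for MNIST, so it runs without a download. The tensor sizes, the 80/20 split, and the batch size of 32 are illustrative assumptions, not values used elsewhere in this guide.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for MNIST (assumption): 100 grayscale 28x28 images, labels 0-9.
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
toy_dataset = TensorDataset(images, labels)

# Split into training and validation subsets; the lengths must sum to the dataset size.
toy_train, toy_val = random_split(toy_dataset, [80, 20])

# The DataLoader groups individual samples into batches.
toy_loader = DataLoader(toy_train, batch_size=32)

x, y = next(iter(toy_loader))
print(x.shape)          # torch.Size([32, 1, 28, 28])
print(y.shape)          # torch.Size([32])
print(len(toy_loader))  # 3 batches: 32 + 32 + 16 samples
```

Each batch is an `(images, labels)` pair, which is the form that `training_step` unpacks with `x, y = batch`.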

## Define the Model

Let's define a convolutional neural network as our classifier. To train the model with Lightning, the model class must subclass `LightningModule`. In short, a `LightningModule` is a `torch.nn.Module` plus training support, including loss computation and optimizer configuration.

In [None]:
class MnistClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, (3, 3))
        self.conv2 = nn.Conv2d(32, 64, (3, 3))
        self.pool = nn.MaxPool2d(2, 2)
        self.linear = nn.LazyLinear(10)
        self.dropout = nn.Dropout()

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        # Apply dropout to the flattened features rather than to the output logits.
        x = self.dropout(x)
        return self.linear(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.001)
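
To see what `nn.LazyLinear` ends up inferring, the sketch below traces tensor shapes through the same convolution and pooling layers, using plain PyTorch and a dummy batch. The batch of 4 random images is an assumption made for illustration; the layer definitions mirror the ones in `MnistClassifier`.

```python
import torch
from torch import nn
from torch.nn import functional as F

conv1 = nn.Conv2d(1, 32, (3, 3))
conv2 = nn.Conv2d(32, 64, (3, 3))
pool = nn.MaxPool2d(2, 2)
linear = nn.LazyLinear(10)

x = torch.rand(4, 1, 28, 28)  # dummy batch of 4 MNIST-sized images (assumption)

x = pool(F.relu(conv1(x)))
shape_after_block1 = tuple(x.shape)  # (4, 32, 13, 13): 28 -> 26 (conv) -> 13 (pool)

x = pool(F.relu(conv2(x)))
shape_after_block2 = tuple(x.shape)  # (4, 64, 5, 5): 13 -> 11 (conv) -> 5 (pool)

x = torch.flatten(x, 1)  # (4, 1600), since 64 * 5 * 5 = 1600
logits = linear(x)       # LazyLinear infers in_features on its first call

print(linear.in_features)  # 1600
print(logits.shape)        # torch.Size([4, 10])
```

Because the weights of a lazy layer are only created on the first forward pass, a model summary printed before any data has passed through the network will likely report zero parameters for it.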

## Set up tracking/visualization tool

If you have not done so yet, please register a [Databricks Community Edition](https://www.databricks.com/try-databricks#account) account. Registration should take no longer than a minute.

Databricks CE (Community Edition) is a free platform for trying out Databricks features. For this guide, we need its ML experiment dashboard to track our training progress.

After you have successfully registered an account, all you need to do is run the command below to connect Google Colab to your Databricks account. You will need to enter the following information at the prompt:
- **Databricks Host**: https://community.cloud.databricks.com/
- **Username**: the email address you signed up with
- **Password**: your password

In [None]:
!databricks configure

Databricks Host (should begin with https://): https://community.cloud.databricks.com/
Username: qianchen94era@gmail.com
Password: 
Repeat for confirmation: 


Now this Colab notebook is connected to the hosted tracking server. To log to MLflow from inside a Lightning pipeline, we need to instantiate a `lightning.pytorch.loggers.MLFlowLogger`. We need to specify the tracking URI and the experiment name:
- `tracking_uri`: **always use "databricks"**.
- `experiment_name`: pick any name you like, starting with `/`.

In [None]:
import mlflow
from lightning.pytorch.loggers import MLFlowLogger

mlflow_logger = MLFlowLogger(
    experiment_name="/mlflow-torch-lightning-mnist",
    tracking_uri="databricks",
)

Create our model and a `lightning.pytorch.Trainer` instance, and pass the `mlflow_logger` defined above as the logger. With these wired together, we can start training, and Lightning will automatically log the run to MLflow. The visualization dashboard can be found on Databricks CE.

In [None]:
classifier = MnistClassifier()

trainer = pl.Trainer(max_epochs=3, logger=mlflow_logger)
trainer.fit(
    model=classifier,
    train_dataloaders=train_loader,
    val_dataloaders=val_loader,
)

INFO: GPU available: True (cuda), used: True
INFO:lightning.pytorch.utilities.rank_zero:GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO:lightning.pytorch.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
INFO:lightning.pytorch.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
INFO:lightning.pytorch.utilities.rank_zero:HPU available: False, using: 0 HPUs
  rank_zero_warn("You passed in a `val_dataloader` but have no `validation_step`. Skipping val loop.")
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:lightning.pytorch.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO: 
  | Name    | Type       | Params
---------------------------------------
0 | conv1   | Conv2d     | 320   
1 | conv2   | Conv2d     | 18.5 K
2 | pool    | MaxPool2d  | 0     
3 | linear  | LazyLinear | 0     
4 | dropout | Dropout    | 0     
---------------

Training: 0it [00:00, ?it/s]

INFO: `Trainer.fit` stopped: `max_epochs=3` reached.
INFO:lightning.pytorch.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=3` reached.
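
The log above includes a warning that the validation loop was skipped, because `MnistClassifier` defines no `validation_step`. One possible way to add one is sketched below: a small helper computes the loss and accuracy for a batch, and the commented snippet shows how it could be called from the model. The helper name and the metric names `val_loss` and `val_acc` are assumptions, not part of the original notebook.

```python
import torch
from torch.nn import functional as F


def validation_metrics(logits, targets):
    """Return (loss, accuracy) for a batch of logits and integer class labels."""
    loss = F.cross_entropy(logits, targets)
    accuracy = (logits.argmax(dim=1) == targets).float().mean()
    return loss, accuracy


# Inside MnistClassifier, a validation_step could call the helper like this:
#
#     def validation_step(self, batch, batch_idx):
#         x, y = batch
#         loss, acc = validation_metrics(self(x), y)
#         self.log("val_loss", loss)
#         self.log("val_acc", acc)

# Quick check on hand-made logits: 2 of the 3 predictions match the targets.
logits = torch.tensor([[2.0, 0.0], [0.0, 3.0], [1.0, 0.0]])
targets = torch.tensor([0, 1, 1])
loss, acc = validation_metrics(logits, targets)
print(round(acc.item(), 4))  # 0.6667
```

Metrics logged with `self.log` in a validation step go through the same logger as `train_loss`, so they should appear alongside it in the MLflow run.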


While training is running, you can follow it in your dashboard. Log in to your [Databricks CE](https://community.cloud.databricks.com/) account, open the drop-down list at the top left, and select Machine Learning. Then click the experiment icon, as shown in the screenshot below:
![landing page](https://drive.google.com/uc?export=view&id=1QxVaolr-L-w96pKUOiYQut3aSRE-04tC)

After you click the `Experiment` button, you are taken to the experiment page, where you can find your runs. Click the most recent experiment and run to see your metrics, which should look similar to:
![experiment page](https://drive.google.com/uc?export=view&id=1FyJUD6JDHADGn_gN62Syo6lUSkTA0stp)

You can click on metrics to see the chart.