# Tutorial - MLflow experiment monitoring

In this notebook, we will see how to monitor your experiments using the integrated **mlflow** callback.

In [1]:
# Install the library
%pip install pythae

Note: you may need to restart the kernel to use updated packages.


## Train your Pythae model

In [2]:
import torchvision.datasets as datasets

%load_ext autoreload
%autoreload 2

In [3]:
mnist_trainset = datasets.MNIST(root='../data', train=True, download=True, transform=None)

train_dataset = mnist_trainset.data[:-10000].reshape(-1, 1, 28, 28) / 255.
eval_dataset = mnist_trainset.data[-10000:].reshape(-1, 1, 28, 28) / 255.



In [4]:
from pythae.models import BetaVAE, BetaVAEConfig
from pythae.trainers import BaseTrainerConfig
from pythae.pipelines.training import TrainingPipeline
from pythae.models.nn.benchmarks.mnist import Encoder_ResNet_VAE_MNIST, Decoder_ResNet_AE_MNIST

In [20]:
training_config = BaseTrainerConfig(
    output_dir='my_model',
    learning_rate=1e-3,
    batch_size=100,
    num_epochs=5, # Change this to train the model a bit more
)


model_config = BetaVAEConfig(
    input_dim=(1, 28, 28),
    latent_dim=16,
    beta=2.

)

model = BetaVAE(
    model_config=model_config,
    encoder=Encoder_ResNet_VAE_MNIST(model_config), 
    decoder=Decoder_ResNet_AE_MNIST(model_config) 
)
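
As a quick sanity check on the data split and `batch_size` used above, the number of batches per epoch can be worked out by hand. The sketch below assumes the standard MNIST training split of 60,000 images and that a final partial batch, if any, would still be processed (it makes no difference here because the sizes divide evenly). The variable names are illustrative and not part of Pythae.

```python
import math

n_total = 60_000   # size of the standard MNIST training split
n_eval = 10_000    # the last 10,000 images are held out for evaluation
n_train = n_total - n_eval
batch_size = 100   # same value as in `training_config`

# Number of batches needed to cover each dataset once
train_batches = math.ceil(n_train / batch_size)
eval_batches = math.ceil(n_eval / batch_size)

print(f"train: {n_train} samples -> {train_batches} batches per epoch")
print(f"eval:  {n_eval} samples -> {eval_batches} batches per epoch")
```

These counts (500 and 100) are the totals you should expect to see in the training and evaluation progress bars.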

## Before launching the pipeline, you will need to build your `MLFlowCallback`

To use this feature, you will need:
- the `mlflow` package installed in your virtual environment. You can install it by running `pip install mlflow`.
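
If you are unsure whether the requirement above is met, one possible check is to ask Python whether the package can be found, without actually importing it. This is a minimal sketch using only the standard library; `is_package_available` is a hypothetical helper name, not something provided by Pythae or mlflow.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if a top-level package called `name` can be imported."""
    # Hypothetical helper: find_spec returns None when the package is not found
    return importlib.util.find_spec(name) is not None


if is_package_available("mlflow"):
    print("mlflow is installed")
else:
    print("mlflow is missing; install it with `pip install mlflow`")
```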

In [21]:
# Before you can monitor your experiments, you may need to run the following
#!pip install mlflow



In [33]:
# Create your callback
from pythae.trainers.training_callbacks import MLFlowCallback

callbacks = [] # the TrainingPipeline expects a list of callbacks

mlflow_cb = MLFlowCallback() # Build the callback 

# Set up the callback
mlflow_cb.setup(
    training_config=training_config, # training config
    model_config=model_config, # model config
    run_name="mlflow_cb_example", # specify the name of your mlflow run
)

callbacks.append(mlflow_cb) # Add it to the callbacks list

In [34]:
pipeline = TrainingPipeline(
    training_config=training_config,
    model=model
)

In [35]:
pipeline(
    train_data=train_dataset,
    eval_data=eval_dataset,
    callbacks=callbacks # pass the callbacks to the TrainingPipeline and you are done!
)

Preprocessing train data...
Preprocessing eval data...

Using Base Trainer

Model passed sanity check !

Created my_model/BetaVAE_training_2022-08-29_10-43-48. 
Training config, checkpoints and final model will be saved here.

Successfully launched training !



Training of epoch 1/5:   0%|          | 0/500 [00:00<?, ?batch/s]

Eval of epoch 1/5:   0%|          | 0/100 [00:00<?, ?batch/s]

--------------------------------------------------------------------------
Train loss: 44.2825
Eval loss: 40.3788
--------------------------------------------------------------------------


Training of epoch 2/5:   0%|          | 0/500 [00:00<?, ?batch/s]

Eval of epoch 2/5:   0%|          | 0/100 [00:00<?, ?batch/s]

--------------------------------------------------------------------------
Train loss: 40.1
Eval loss: 39.1482
--------------------------------------------------------------------------


Training of epoch 3/5:   0%|          | 0/500 [00:00<?, ?batch/s]

Eval of epoch 3/5:   0%|          | 0/100 [00:00<?, ?batch/s]

--------------------------------------------------------------------------
Train loss: 39.2773
Eval loss: 38.6415
--------------------------------------------------------------------------


Training of epoch 4/5:   0%|          | 0/500 [00:00<?, ?batch/s]

Eval of epoch 4/5:   0%|          | 0/100 [00:00<?, ?batch/s]

--------------------------------------------------------------------------
Train loss: 38.8512
Eval loss: 38.3104
--------------------------------------------------------------------------


Training of epoch 5/5:   0%|          | 0/500 [00:00<?, ?batch/s]

Eval of epoch 5/5:   0%|          | 0/100 [00:00<?, ?batch/s]

--------------------------------------------------------------------------
Train loss: 38.5628
Eval loss: 38.082
--------------------------------------------------------------------------
Training ended!
Saved final model in my_model/BetaVAE_training_2022-08-29_10-43-48/final_model
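
Once training has ended, the logged runs can also be inspected from Python. The sketch below is one possible way to do it, assuming that the callback logged to mlflow's default tracking location, that the full `mlflow` package (with pandas) is installed, and that your mlflow version supports the `search_all_experiments` argument of `mlflow.search_runs`. The helpers `metric_columns` and `summarize_runs` are hypothetical names introduced for illustration. The metric names are not hard-coded because they depend on what the callback logs.

```python
import importlib.util


def metric_columns(columns):
    """Keep only the columns that hold logged metrics (prefixed with 'metrics.')."""
    return [c for c in columns if c.startswith("metrics.")]


def summarize_runs():
    """Return a DataFrame of logged runs, or None when mlflow is not installed."""
    # Hypothetical helper, not part of Pythae or mlflow
    if importlib.util.find_spec("mlflow") is None:
        return None
    import mlflow

    # search_runs returns one row per run as a pandas DataFrame by default
    runs = mlflow.search_runs(search_all_experiments=True)
    base = [c for c in ("run_id", "status", "start_time") if c in runs.columns]
    return runs[base + metric_columns(runs.columns)]


if __name__ == "__main__":
    print(summarize_runs())
```

Alternatively, running `mlflow ui` in a terminal from the directory where the notebook was executed should open the web interface, where runs can be browsed side by side.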


In [None]:
# You can compare