# Introduction

[Catalyst](https://catalyst-team.github.io/catalyst/index.html) is a PyTorch framework for Deep Learning R&D. It focuses on reproducibility, rapid experimentation, and codebase reuse so you can create something new rather than write yet another train loop. The Catalyst library incorporates research best practices so users can focus on building models and not worry about writing boilerplate code.   

[Comet](https://www.comet.com/site/data-scientists/?utm_campaign=gradio-integration&utm_medium=colab) is an MLOps platform designed to help data scientists and teams build better models faster! Comet provides tooling to track, explain, manage, and monitor your models in a single place! It works with Jupyter notebooks and scripts and, most importantly, it's 100% free!

# Setup

In [None]:
%pip install catalyst "comet_ml>=3.44.0"

# Login to Comet

In [None]:
import comet_ml

comet_ml.login(project_name="comet-example-catalyst-notebook")

# Import Dependencies

In [None]:
import os

import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

from catalyst import dl, utils
from catalyst.callbacks.checkpoint import CheckpointCallback
from catalyst.contrib.datasets import MNIST
from catalyst.data import ToTensor

# Logging to Comet

In [None]:
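def train(logger, hparams=None):
    # use a fresh dict per call instead of a shared mutable default argument
    if hparams is None:
        hparams = {"lr": 0.02, "batch_size": 32}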
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=hparams["lr"])

    loaders = {
        "train": DataLoader(
            MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()),
            batch_size=hparams["batch_size"],
        ),
        "valid": DataLoader(
            MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()),
            batch_size=hparams["batch_size"],
        ),
    }

    runner = dl.SupervisedRunner(
        input_key="features", output_key="logits", target_key="targets", loss_key="loss"
    )

    # model training
    runner.train(
        model=model,
        criterion=criterion,
        optimizer=optimizer,
        loaders=loaders,
        hparams=hparams,
        num_epochs=1,
        callbacks=[
            dl.AccuracyCallback(
                input_key="logits", target_key="targets", topk_args=(1, 3, 5)
            ),
            dl.PrecisionRecallF1SupportCallback(
                input_key="logits", target_key="targets", num_classes=10
            ),
        ],
        logdir="./logs",
        valid_loader="valid",
        valid_metric="loss",
        minimize_valid_metric=True,
        verbose=True,
        load_best_on_end=True,
        loggers={"comet": logger},
    )

## Logging your training run to a Comet Experiment

In [None]:
logger = dl.CometLogger()
train(logger)

### Visualize your training Metrics

Visualize your metrics without leaving the notebook!

In [None]:
logger.experiment.display(tab="charts")

## Logging a training run while Running Offline

There may be situations where you want to log your metrics, parameters, source code, and so on, but cannot access the public internet (for example, when running on a cluster inside a private network).

In those situations, you can run Comet's logging in offline mode. This saves your run as a zip file that you can upload to the Comet UI later.

In [None]:
logger = dl.CometLogger(comet_mode="offline", offline_directory="/tmp")
train(logger)

In [None]:
! comet upload /tmp/<your-experiment-id>.zip
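
The `<your-experiment-id>` placeholder above has to be filled in by hand. As a minimal sketch, assuming the offline archives are written as `.zip` files directly inside the offline directory (`/tmp` in this notebook), the standard-library helper below lists them newest first. The name `find_offline_archives` is hypothetical and not part of Comet or Catalyst, and the directory may also contain unrelated zip files, so check the result before uploading.

```python
from pathlib import Path


def find_offline_archives(offline_directory):
    """Return the .zip files found in offline_directory, newest first.

    Assumption: offline runs are stored as .zip files directly in this directory.
    """
    archives = Path(offline_directory).glob("*.zip")
    # Sort by modification time so the most recent archive comes first.
    return sorted(archives, key=lambda path: path.stat().st_mtime, reverse=True)
```

If the returned list is not empty, its first element is the most recent archive, and that path can be passed to `comet upload`.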

## Continuing an existing Run

To resume a training run from where you left off, provide the experiment ID of the existing run to the Comet logger.

Additionally, you can pass a list of strings via `tags` to tag the experiment.

In [None]:
previous_experiment_id = "previous-experiment-id"
logger = dl.CometLogger(experiment_id=previous_experiment_id, tags=["resumed"])
train(logger)

## Logging Multi Stage Runs

One of the key benefits of Catalyst is the ability to create multi-stage runs. The Comet Logger supports this right out of the box. Each metric and parameter in a multi-stage run will be logged to Comet with a prefix denoting the stage.

The metric name format is `{stage_key}/{loader_key}_{metric_name}`
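
To make the naming scheme concrete, here is a small illustrative helper that builds a name from the format above. `comet_metric_name` is a hypothetical function written only for this example; it is not part of Catalyst or Comet. The stage and loader keys are the ones used in the runner in this section.

```python
def comet_metric_name(stage_key, loader_key, metric_name):
    """Build a metric name following {stage_key}/{loader_key}_{metric_name}."""
    return f"{stage_key}/{loader_key}_{metric_name}"


print(comet_metric_name("train_freezed", "valid", "loss"))
# prints: train_freezed/valid_loss
```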

In [None]:
class CustomRunner(dl.IRunner):
    def __init__(self, logdir, device):
        # you could add all required extra params during Runner initialization
        # for our case, let's customize ``logdir`` and ``engine`` for the runs
        super().__init__()
        self._logdir = logdir
        self._device = device

    def get_engine(self):
        return dl.DeviceEngine(self._device)

    def get_loggers(self):
        return {
            "comet": dl.CometLogger(),
        }

    @property
    def stages(self):
        # suppose we have 2 stages:
        # 1st - with a frozen encoder
        # 2nd - with the whole network unfrozen
        return ["train_freezed", "train_unfreezed"]

    def get_stage_len(self, stage: str) -> int:
        return 3

    def get_loaders(self, stage: str):
        loaders = {
            "train": DataLoader(
                MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()),
                batch_size=32,
            ),
            "valid": DataLoader(
                MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()),
                batch_size=32,
            ),
        }
        return loaders

    def get_model(self, stage: str):
        # the logic here is quite straightforward:
        # we create the model on the first stage
        # and reuse it during the next stages
        model = (
            self.model
            if self.model is not None
            else nn.Sequential(
                nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10)
            )
        )
        if stage == "train_freezed":
            # 1st stage
            # freeze layer
            utils.set_requires_grad(model[1], False)
        else:
            # 2nd stage
            utils.set_requires_grad(model, True)
        return model

    def get_criterion(self, stage: str):
        return nn.CrossEntropyLoss()

    def get_optimizer(self, stage: str, model):
        # we could also define different components for the different stages
        if stage == "train_freezed":
            return optim.Adam(model.parameters(), lr=1e-3)
        else:
            return optim.SGD(model.parameters(), lr=1e-1)

    def get_scheduler(self, stage: str, optimizer):
        return None

    def get_callbacks(self, stage: str):
        return {
            "criterion": dl.CriterionCallback(
                metric_key="loss", input_key="logits", target_key="targets"
            ),
            "optimizer": dl.OptimizerCallback(metric_key="loss"),
            # "scheduler": dl.SchedulerCallback(loader_key="valid", metric_key="loss"),
            "accuracy": dl.AccuracyCallback(
                input_key="logits", target_key="targets", topk_args=(1, 3, 5)
            ),
            "classification": dl.PrecisionRecallF1SupportCallback(
                input_key="logits", target_key="targets", num_classes=10
            ),
            # catalyst[ml] required
            # "confusion_matrix": dl.ConfusionMatrixCallback(
            #     input_key="logits", target_key="targets", num_classes=10
            # ),
            "checkpoint": dl.CheckpointCallback(
                self._logdir,
                loader_key="valid",
                metric_key="loss",
                minimize=True,
                save_n_best=3,
            ),
        }

    def handle_batch(self, batch):
        x, y = batch
        logits = self.model(x)

        self.batch = {
            "features": x,
            "targets": y,
            "logits": logits,
        }

In [None]:
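# fall back to the CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
runner = CustomRunner("/tmp", device)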
runner.run()

## Logging Model Checkpoints and Arbitrary Data

The CometLogger also supports logging arbitrary data files. In this example, we subclass the `CheckpointCallback` and use the CometLogger to log our model weights.

We can use a similar callback structure to log any arbitrary data to Comet, such as files containing model predictions, audio, or text.

In [None]:
logger = dl.CometLogger()

# sample data
num_samples, num_features, num_classes = int(1e4), int(1e1), 4
X = torch.rand(num_samples, num_features)
y = (
    torch.rand(
        num_samples,
    )
    * num_classes
).to(torch.int64)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_classes)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])


class CometCheckpointCallback(CheckpointCallback):
    def __init__(self, logdir, logger, best_score=None, save_n_best=1):
        super().__init__(logdir)
        self.logdir = logdir
        self.logger = logger
        self.save_n_best = save_n_best

    def on_epoch_end(self, runner: "IRunner") -> None:
        """
        Collects and saves checkpoint after epoch.

        Args:
            runner: current runner
        """
        if runner.is_infer_stage:
            return
        if runner.engine.is_ddp and not runner.engine.is_master_process:
            return

        if self._use_model_selection:
            # score model based on the specified metric
            score = runner.epoch_metrics[self.loader_key][self.metric_key]
        else:
            # score model based on epoch number
            score = runner.global_epoch_step

        is_best = False
        if self.best_score is None or self.is_better(score, self.best_score):
            self.best_score = score
            is_best = True

        if self.save_n_best > 0:
            # pack checkpoint
            checkpoint = self._pack_checkpoint(runner)
            # save checkpoint
            checkpoint_path = self._save_checkpoint(
                runner=runner,
                checkpoint=checkpoint,
                is_best=is_best,
                is_last=True,
            )
            self.logger.log_artifact(
                path_to_artifact=checkpoint_path,
                global_batch_step=runner.global_batch_step,
            )


# model training
runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=3,
    valid_loader="valid",
    valid_metric="accuracy03",
    minimize_valid_metric=False,
    verbose=True,
    callbacks=[
        dl.AccuracyCallback(
            input_key="logits", target_key="targets", num_classes=num_classes
        ),
        CometCheckpointCallback(logdir="/tmp", logger=logger),
    ],
)
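
As a sketch of the kind of file a similar callback could log, the helper below writes model predictions to a CSV file using only the standard library. `save_predictions` and the column names are hypothetical and chosen for illustration. The resulting path could then be handed to `logger.log_artifact` in the same way the checkpoint path is in the callback above; the exact arguments that call accepts may depend on the installed Catalyst version.

```python
import csv
import tempfile
from pathlib import Path


def save_predictions(path, predictions):
    """Write (sample_id, predicted_class) pairs to a CSV file and return its path."""
    path = Path(path)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample_id", "predicted_class"])
        writer.writerows(predictions)
    return path


# Write two made-up predictions to a temporary directory.
output_dir = tempfile.mkdtemp()
predictions_path = save_predictions(
    Path(output_dir) / "predictions.csv", [(0, 3), (1, 1)]
)
```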

### Let's take a look at the Logged Checkpoint

In [None]:
logger.experiment.display(tab="assets")

## Logging Evaluations

In [None]:
logger = dl.CometLogger()

# sample data
num_samples, num_features, num_classes = int(1e4), int(1e1), 4
X = torch.rand(num_samples, num_features)
y = (
    torch.rand(
        num_samples,
    )
    * num_classes
).to(torch.int64)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model
model = torch.nn.Linear(num_features, num_classes)

# model evaluation
runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)
metrics = runner.evaluate_loader(
    loader,
    callbacks=[
        dl.AccuracyCallback(
            input_key="logits", target_key="targets", num_classes=num_classes
        ),
    ],
    model=model,
)
logger.log_metrics(metrics, stage_key="evaluation", loader_key="valid", scope="batch")
logger.experiment.end()