# Getting Started: ATOMMIC Fundamentals

The Advanced Toolbox for Multitask Medical Imaging Consistency (ATOMMIC) is a toolbox for applying AI methods to accelerated MRI reconstruction (REC), MRI segmentation (SEG), quantitative MR imaging (qMRI), and multitask learning (MTL), i.e., performing multiple tasks simultaneously, such as reconstruction and segmentation. 

Each task is implemented in a separate collection, which consists of data loaders, transformations, models, metrics, and losses.

ATOMMIC is designed to be modular and extensible, making it easy to add new tasks, models, and datasets. 

ATOMMIC uses PyTorch Lightning to enable high-performance multi-GPU/multi-node mixed-precision training.

In [None]:
"""
You can run this notebook either locally (if you have all the dependencies and a GPU) or on Google Colab.

Instructions for setting up Colab are as follows:
1. Open a new Python 3 notebook.
2. Import this notebook from GitHub (File -> Upload Notebook -> "GITHUB" tab -> copy/paste GitHub URL)
3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select "GPU" for hardware accelerator)
4. Run this cell to set up dependencies.
"""
# If you're using Google Colab and not running locally, run this cell.

## Install dependencies
!pip install wget
!apt-get install sox libsndfile1 ffmpeg
!pip install text-unidecode

## Install ATOMMIC
BRANCH = 'main'
!python -m pip install git+https://github.com/wdika/atommic.git@$BRANCH

## Create a directory for the configs used in this example
!mkdir -p configs

## Foundations of ATOMMIC
---------

ATOMMIC models are built on the [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning) `LightningModule` and are compatible with the entire PyTorch ecosystem. This means that users have the full flexibility of using the higher-level APIs provided by PyTorch Lightning (via the `Trainer`), or of writing their own training and evaluation loops directly in PyTorch (by simply calling the model and its individual components).

For ATOMMIC developers, a "Model" is the neural network(s) together with all the infrastructure supporting them, wrapped into a single, cohesive unit. As such, every ATOMMIC model is constructed to contain at least the following out of the box (some models support additional functionality too!) -

 -  Neural Network architecture - all of the modules that are required for the model.

 -  Dataset + Data Loaders - all of the components that prepare the data for consumption during training or evaluation.

 -  Preprocessing + Postprocessing - all of the components that process the datasets so they can easily be consumed by the modules.

 -  Optimizer + Schedulers - basic defaults that work out of the box, and allow further experimentation with ease.

 - Any other supporting infrastructure - transforms, etc.

In [None]:
import atommic
atommic.__version__

## ATOMMIC Collections

ATOMMIC is subdivided into four fundamental collections based on their domains: `mtl` (multitask learning), `qmri` (quantitative MRI), `rec` (reconstruction), and `seg` (segmentation). The `import atommic` statement above did not import any of these collections. Since you might not need all of them at once, ATOMMIC lets you import just the collections you need, as and when you require them.
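To check this yourself, a minimal sketch like the following inspects `sys.modules`, Python's registry of already-imported modules, without triggering any imports. The helper name `loaded_submodules` is our own; right after `import atommic`, you would expect the domain collections (e.g. `atommic.collections.reconstruction`) to be absent from its output. The runnable demonstration uses the standard-library `json` package instead of ATOMMIC.

```python
import sys

import json  # imported only for the demo below; it also loads json.decoder etc.


def loaded_submodules(package: str) -> list[str]:
    """Return the names of already-imported submodules of `package`.

    Only `sys.modules` is inspected, so calling this never imports anything.
    """
    prefix = package + "."
    return sorted(name for name in sys.modules if name.startswith(prefix))


# In this notebook, right after `import atommic`:
# print(loaded_submodules("atommic.collections"))

# Self-contained demo with a standard-library package:
print(loaded_submodules("json"))
```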

-------
Let's import the above four collections - 

In [None]:
import atommic.collections.multitask.rs as atommic_mtlrs
import atommic.collections.quantitative as atommic_qmri
import atommic.collections.reconstruction as atommic_rec
import atommic.collections.segmentation as atommic_seg

## ATOMMIC Models in Collections

ATOMMIC contains several models in each of its collections. Let's take a quick look at all the models ATOMMIC offers for the four collections above.

In [None]:
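# Keep public names that are not entirely lowercase (typically model classes) and skip `*Block` building blocks.
mtlrs_models = [model for model in dir(atommic_mtlrs.nn) if not model.startswith("__") and not model.islower() and "Block" not in model]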
mtlrs_models

In [None]:
qmri_models = [model for model in dir(atommic_qmri.nn) if not model.startswith("__") and not model.islower()]
qmri_models

In [None]:
rec_models = [model for model in dir(atommic_rec.nn) if not model.startswith("__") and not model.islower()]
rec_models

In [None]:
seg_models = [model for model in dir(atommic_seg.nn) if not model.startswith("__") and not model.islower()]
seg_models
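The four cells above repeat the same name-based heuristic. As a sketch, it can be factored into a small helper; `list_model_names` is a hypothetical name, and the filter only approximates "model classes" (it keeps any public name that is not entirely lowercase), so the results are worth eyeballing. The demonstration builds a fake module so it runs without ATOMMIC.

```python
import types


def list_model_names(module, exclude_substrings=()):
    """Return public names in `module` that look like model classes.

    Same heuristic as the cells above: skip dunder names, skip names that are
    entirely lowercase (typically submodules and functions), and skip names
    containing any of `exclude_substrings` (e.g. "Block").
    """
    return [
        name
        for name in dir(module)
        if not name.startswith("__")
        and not name.islower()
        and not any(part in name for part in exclude_substrings)
    ]


# In the notebook: list_model_names(atommic_mtlrs.nn, exclude_substrings=("Block",))

# Self-contained demo with a fake module standing in for `atommic_*.nn`.
fake_nn = types.ModuleType("fake_nn")
fake_nn.UNet = type("UNet", (), {})
fake_nn.ConvBlock = type("ConvBlock", (), {})
fake_nn.base = types.ModuleType("base")

print(list_model_names(fake_nn, exclude_substrings=("Block",)))  # ['UNet']
```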

## The ATOMMIC Model

Let's dive deeper into what an ATOMMIC model really is. There are many ways to create these models: we can use the constructor and pass in a config, instantiate the model from a pre-trained checkpoint, or simply pass a pre-trained model name and instantiate the model directly from the cloud!

---------
For now, let's work with a reconstruction UNet model.

In [None]:
rec_unet = atommic_rec.nn.UNet.from_pretrained('wdika/rec_unet_small_cc359_poisson2d_5x_10x_sense_autoestimationcsm')

In [None]:
rec_unet.summarize()

## Model Configuration using OmegaConf
--------

In just a few lines, we downloaded and instantiated the `UNet` model and analysed its high-level structure! Now let's delve deeper into the configuration file that makes the model work.

First, we import [OmegaConf](https://omegaconf.readthedocs.io/en/latest/). OmegaConf is an excellent library used throughout ATOMMIC to make YAML configuration management easier. It also works well with another library, [Hydra](https://hydra.cc/docs/intro/), which ATOMMIC uses to perform on-the-fly config edits from the command line, greatly improving the usability of our config files!

In [None]:
from omegaconf import OmegaConf

All ATOMMIC models come packaged with their model configuration in the `cfg` attribute. Although it technically declares the configuration the model was constructed with, `cfg` is also an essential tool for modifying the behaviour of the model after construction, and it can be safely used to perform many essential tasks inside models. 

To be doubly sure, we generally work on a copy of the config until we are ready to update it inside the model.

In [None]:
import copy

In [None]:
cfg = copy.deepcopy(rec_unet.cfg)
print(OmegaConf.to_yaml(cfg))
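Why a deep copy? A shallow copy shares nested sections with the original, so editing `optim` on a shallow copy would also change the model's own config. The snippet below illustrates the pitfall with plain dicts standing in for the model's `DictConfig` (an assumption made to keep it self-contained).

```python
import copy

# Plain dicts stand in for the model's DictConfig (an assumption for this demo).
model_cfg = {"optim": {"name": "adam", "lr": 1e-3}}

shallow = copy.copy(model_cfg)    # new outer dict, but the same nested "optim" dict
deep = copy.deepcopy(model_cfg)   # fully independent copy

shallow["optim"]["lr"] = 5e-4     # also changes model_cfg["optim"]["lr"]
deep["optim"]["lr"] = 1e-2        # leaves model_cfg untouched

print(model_cfg["optim"]["lr"])   # 0.0005
```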

## Modifying the contents of the Model config
----------

Say we want to experiment with a different scheduler for this model during training. 

OmegaConf makes this a very simple task for us!

In [None]:
# OmegaConf won't allow you to add new config items, so we temporarily disable this safeguard.
OmegaConf.set_struct(cfg, False)

# Let's see the old optim config
print("Old Config: ")
print(OmegaConf.to_yaml(cfg.optim))

sched = {'name': 'InverseSquareRootAnnealing', 'warmup_steps': 1000, 'min_lr': 1e-6}
sched = OmegaConf.create(sched)  # Convert it into a DictConfig

# Assign it to cfg.optim.sched namespace
cfg.optim.sched = sched

# Let's see the new optim config
print("New Config: ")
print(OmegaConf.to_yaml(cfg.optim))

# Here, we restore the safeguards so no more additions can be made to the config
OmegaConf.set_struct(cfg, True)

## Updating the model from config
----------

ATOMMIC models can be updated in a few ways, and each collection follows similar patterns to maintain consistency.

Here, we show the two most common ways to modify core components of the model: updating the model config (individual components can be rebuilt from their config with the `from_config_dict` method), and updating a few special parts of the model.

Remember, all ATOMMIC models are PyTorch Lightning modules, which themselves are PyTorch modules, so we have a lot of flexibility here!

In [None]:
# Update the model config
rec_unet.cfg = cfg

## Update a few special components of the Model
---------

While the above approach is good for most major components of the model, ATOMMIC has special utilities for a few components.

They are - 

 - `setup_training_data`
 - `setup_validation_data` and `setup_multi_validation_data`
 - `setup_test_data` and `setup_multi_test_data`
 - `setup_optimization`

These special utilities are meant to help you easily set up training, validation, and testing once you restore a model from a checkpoint.

Let's add the scheduler to the model (whose config initially contained just an optimizer).

In [None]:
# Let's print out the current optimizer
print(OmegaConf.to_yaml(rec_unet.cfg.optim))

In [None]:
# Now let's update the config
rec_unet.setup_optimization(cfg.optim)

-------
We see a warning - 

```
Neither `max_steps` nor `iters_per_batch` were provided to `optim.sched`, cannot compute effective `max_steps` !
    Scheduler will not be instantiated !
```

We have neither a training dataset set up nor `max_steps` in the config. Most ATOMMIC schedulers cannot be instantiated without computing how many training steps there actually are!

Here, we can temporarily allow the scheduler to be constructed by explicitly setting `max_steps` to 100.

In [None]:
OmegaConf.set_struct(cfg.optim.sched, False)

cfg.optim.sched.max_steps = 100

OmegaConf.set_struct(cfg.optim.sched, True)

In [None]:
# Now let's update the config and try again
rec_unet.setup_optimization(cfg.optim)

You might wonder why we didn't explicitly set `rec_unet.cfg.optim = cfg.optim`. 

This is because the `setup_optimization()` method does it for you! You can still update the config manually.

### Optimizer & Scheduler Config

Optimizers and schedulers are common components of models, and are essential to train the model from scratch.

They are grouped together under a unified `optim` namespace, as schedulers often operate on a given optimizer.



### Let's break down the general `optim` structure
```yaml
optim:
    name: novograd
    lr: 0.01

    # optimizer arguments
    betas: [0.8, 0.25]
    weight_decay: 0.001

    # scheduler setup
    sched:
      name: CosineAnnealing

      # Optional arguments
      max_steps: -1 # computed at runtime or explicitly set here
      monitor: val_loss
      reduce_on_plateau: false

      # scheduler config override
      warmup_steps: 1000
      warmup_ratio: null
      min_lr: 1e-9
```

Essential Optimizer components - 

 - `name`: String name of the optimizer. Generally the lowercase class name.
 - `lr`: Learning rate is a required argument to all optimizers.

Optional Optimizer components - after the above two arguments are provided, any additional arguments added under `optim` will be passed to the constructor of that optimizer as keyword arguments.

 - `betas`: List of beta values to pass to the optimizer.
 - `weight_decay`: Optional weight decay passed to the optimizer.
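The convention above can be sketched in plain Python. `split_optim_config` is a hypothetical helper, not ATOMMIC's internal code: it pulls out `name`, sets `sched` aside, and forwards every remaining key (including `lr`) as keyword arguments.

```python
def split_optim_config(optim_cfg):
    """Split an `optim` mapping into (name, optimizer kwargs, sched config).

    Hypothetical helper: `name` selects the optimizer, `sched` is set aside for
    the scheduler, and every other key (including the required `lr`) becomes a
    keyword argument for the optimizer constructor.
    """
    if "name" not in optim_cfg or "lr" not in optim_cfg:
        raise ValueError("`optim` must define both `name` and `lr`")
    opt_kwargs = {k: v for k, v in optim_cfg.items() if k not in ("name", "sched")}
    return optim_cfg["name"], opt_kwargs, optim_cfg.get("sched")


optim_cfg = {
    "name": "novograd",
    "lr": 0.01,
    "betas": [0.8, 0.25],
    "weight_decay": 0.001,
    "sched": {"name": "CosineAnnealing", "warmup_steps": 1000},
}
opt_name, opt_kwargs, sched_cfg = split_optim_config(optim_cfg)
print(opt_name, opt_kwargs)
# novograd {'lr': 0.01, 'betas': [0.8, 0.25], 'weight_decay': 0.001}
```

The resulting kwargs would then be unpacked into the optimizer constructor, e.g. `SomeOptimizer(model.parameters(), **opt_kwargs)`, where `SomeOptimizer` is a placeholder for whatever class `name` resolves to.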

Optional Scheduler components - `sched` is an optional setup of the scheduler for the given optimizer.

If `sched` is provided, only one essential argument needs to be provided:

 - `name`: The name of the scheduler. Generally, it is the full class name.

Optional Scheduler components - 

 - `max_steps`: Max steps as an override from the user. If one provides `trainer.max_steps` inside the trainer configuration, that value is used instead. If neither value is set, the scheduler will attempt to compute the `effective max_steps` using the size of the train data loader. If that too fails, then the scheduler will not be created at all.

 - `monitor`: Used if you are using an adaptive scheduler such as ReduceLROnPlateau. Otherwise ignored. Defaults to `loss` - indicating train loss as monitor.

 - `reduce_on_plateau`: Required to be set to true if using an adaptive scheduler.

Any additional arguments under `sched` will be supplied as keyword arguments to the constructor of the scheduler.
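The `max_steps` precedence described above can be summarised in a short sketch. `resolve_max_steps` and its arguments are hypothetical names, `-1` and `None` are treated as "not set" (matching the `max_steps: -1` placeholder in the YAML), and the `accumulate_grad_batches` term is our assumption about how gradient accumulation reduces the number of optimizer steps; ATOMMIC's actual computation may also account for other factors, such as the number of devices.

```python
import math
from typing import Optional


def _is_set(value: Optional[int]) -> bool:
    """Treat None and -1 (the YAML placeholder) as "not set"."""
    return value is not None and value != -1


def resolve_max_steps(
    trainer_max_steps: Optional[int],
    sched_max_steps: Optional[int],
    num_train_batches: Optional[int],
    max_epochs: Optional[int],
    accumulate_grad_batches: int = 1,
) -> Optional[int]:
    """Hypothetical sketch of the max_steps precedence described above."""
    # 1. A value in the trainer config takes priority.
    if _is_set(trainer_max_steps):
        return trainer_max_steps
    # 2. Otherwise use the user override from `optim.sched.max_steps`.
    if _is_set(sched_max_steps):
        return sched_max_steps
    # 3. Otherwise estimate it from the train loader size and epoch count,
    #    counting one optimizer step per `accumulate_grad_batches` batches.
    if (num_train_batches or 0) > 0 and (max_epochs or 0) > 0:
        steps_per_epoch = math.ceil(num_train_batches / accumulate_grad_batches)
        return steps_per_epoch * max_epochs
    # 4. Nothing to go on: the scheduler would not be created.
    return None


print(resolve_max_steps(None, 100, None, None))  # 100, the override used in this notebook
print(resolve_max_steps(-1, -1, 250, 10, accumulate_grad_batches=2))  # 1250
print(resolve_max_steps(None, None, None, None))  # None
```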




## Creating Model from constructor vs restoring a model
---------

You might notice that we discussed all of the above setup methods in the context of a model after it has been restored. However, the ATOMMIC example training scripts do not call them explicitly.

This is because these methods are automatically called by the constructor when the Model is created for the first time, but these methods are skipped during restoration (either from a PyTorch Lightning checkpoint using `load_from_checkpoint`, or via `restore_from` method inside ATOMMIC Models).

This is done as most datasets are stored on a user's local directory, and the path to these datasets is set in the config (either set by default, or set by Hydra overrides). On the other hand, the models are meant to be portable. On another user's system, the data might not be placed at exactly the same location, or even on the same drive as specified in the model's config!

Therefore, the constructor automates such dataset setup, whereas restoration warns that the data loaders were not set up and provides the user with ways to set up their own datasets.

------

Why are optimizers not restored automatically? Well, optimizers themselves don't face an issue, but as we saw before, schedulers depend on the number of train steps in order to calculate their schedule.

However, if you don't wish to modify the optimizer and scheduler, and prefer to leave them to their default values, that's perfectly alright. The `setup_optimization()` method is automatically called by PyTorch Lightning for you when you begin training your model!

## Saving and restoring models
----------

ATOMMIC provides a few ways to save and restore models. If you utilize the Experiment Manager that is part of all ATOMMIC train scripts, PyTorch Lightning will automatically save checkpoints for you in the experiment directory.

We can also save and restore models as packaged files using the specialized `save_to` and `restore_from` methods.

### Saving and Restoring from PTL Checkpoints
----------

The PyTorch Lightning Trainer object will periodically save checkpoints when the experiment manager is being used during training.

PyTorch Lightning checkpoints can then be loaded and evaluated / fine-tuned just as always using the class method `load_from_checkpoint`.

For example, restore a UNet model from a checkpoint - 

```python
# Replace the placeholder string with the path to your checkpoint file
rec_unet = atommic_rec.nn.UNet.load_from_checkpoint("<path to checkpoint>")
```

### Saving and Restoring from .atommic files
----------

There are a few models which might require external dependencies to be packaged with them in order to restore them properly.

We can use the `save_to` and `restore_from` methods to package the entire model and its components into a tarfile, which users can then easily import to restore the model.

In [None]:
# Save the model
rec_unet.save_to('rec_unet.atommic')

In [None]:
!ls -d -- *.atommic 

In [None]:
# Restore the model
temp_unet = atommic_rec.nn.UNet.restore_from('rec_unet.atommic')

In [None]:
temp_unet.summarize()

In [None]:
# Note that the optimizer + scheduler config changes we made have been preserved!
print(OmegaConf.to_yaml(temp_unet.cfg))

Note that a .atommic file is simply a .tar.gz archive containing the checkpoint, the configuration and, potentially, other artifacts used by the model.

In [None]:
!cp rec_unet.atommic rec_unet.tar.gz
!tar -xvf rec_unet.tar.gz
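The same inspection can be done from Python with the standard-library `tarfile` module, which avoids copying the archive. Opening with mode `"r:*"` lets `tarfile` detect whether the archive is gzip-compressed. `list_atommic_archive` is a hypothetical helper, and the demonstration builds a dummy archive whose member names are assumptions; real `.atommic` files may, for example, prefix names with `./`.

```python
import io
import os
import tarfile
import tempfile


def list_atommic_archive(path):
    """Return the member names stored in a .atommic (tar) archive."""
    # "r:*" lets tarfile auto-detect gzip (or no) compression.
    with tarfile.open(path, mode="r:*") as archive:
        return archive.getnames()


# In this notebook you could call: list_atommic_archive("rec_unet.atommic")

# Self-contained demo: build a dummy archive with made-up contents.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.atommic")
    members = {"model_config.yaml": b"name: demo\n", "model_weights.ckpt": b"\x00"}
    with tarfile.open(demo_path, mode="w:gz") as archive:
        for member_name, payload in members.items():
            info = tarfile.TarInfo(name=member_name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    names = list_atommic_archive(demo_path)

print(names)  # ['model_config.yaml', 'model_weights.ckpt']
```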

### Extracting PyTorch checkpoints from ATOMMIC tarfiles (Model level)
-----------

While the .atommic tarfile is an excellent way to have a portable model, sometimes it is necessary for researchers to have access to the basic PyTorch save format. ATOMMIC aims to be entirely compatible with PyTorch, and therefore offers a simple method to extract just the PyTorch checkpoint from the .atommic tarfile.

In [None]:
import torch

In [None]:
state_dict = temp_unet.extract_state_dict_from('rec_unet.atommic', save_dir='./pt_ckpt/')
!ls ./pt_ckpt/

As the listing shows, there is now a single basic PyTorch checkpoint inside the `pt_ckpt` directory, which we can use to load the weights of the entire model:

In [None]:
temp_unet.load_state_dict(torch.load('./pt_ckpt/model_weights.ckpt'))

### Extracting PyTorch checkpoints from ATOMMIC tarfiles (Module level)
----------

While the above method works well for extracting the checkpoint of the entire model, sometimes you may need to load and save the individual modules that make up the model.

The same extraction method offers a flag (`split_by_module=True`) to extract each module's checkpoint into its own file, giving users access to per-module checkpoints.

In [None]:
state_dict = temp_unet.extract_state_dict_from('rec_unet.atommic', save_dir='./pt_module_ckpt/', split_by_module=True)
!ls ./pt_module_ckpt/
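Conceptually, a per-module checkpoint is the full state dict grouped by the first component of each dotted parameter name, with that prefix stripped. The sketch below illustrates the idea with plain lists in place of tensors; `split_state_dict_by_module` is hypothetical and not ATOMMIC's implementation, and we have not verified how ATOMMIC names the files it writes.

```python
from collections import defaultdict


def split_state_dict_by_module(state_dict):
    """Group a flat state dict by top-level submodule (hypothetical helper).

    "encoder.conv.weight" goes to group "encoder" under key "conv.weight", so
    each group matches what that submodule's own state_dict() would contain.
    Parameters registered directly on the model (no dot) go to group "".
    """
    groups = defaultdict(dict)
    for key, value in state_dict.items():
        module_name, sep, param_name = key.partition(".")
        if not sep:
            module_name, param_name = "", key
        groups[module_name][param_name] = value
    return dict(groups)


# Plain lists stand in for tensors so the example runs without PyTorch.
flat = {
    "encoder.conv.weight": [0.1, 0.2],
    "encoder.conv.bias": [0.0],
    "decoder.weight": [0.3],
}
groups = split_state_dict_by_module(flat)
print(groups)
# {'encoder': {'conv.weight': [0.1, 0.2], 'conv.bias': [0.0]}, 'decoder': {'weight': [0.3]}}
```

With real tensors, each group could then be loaded into the matching submodule, e.g. `model.encoder.load_state_dict(groups["encoder"])` for a model with a (hypothetical) `encoder` attribute.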

# ATOMMIC with Hydra

[Hydra](https://hydra.cc/docs/intro/) is used throughout ATOMMIC as a way to enable rapid prototyping using predefined config files. Hydra and OmegaConf offer great compatibility with each other when using ATOMMIC.

# Optionally, remove any generated files

In [None]:
import os
import shutil

In [None]:
current_directory = os.getcwd()

In [None]:
# Walk the tree bottom-up (topdown=False) so subdirectories are handled before their parents.
for root, dirs, files in os.walk(current_directory, topdown=False):
    # Delete every file that is not a Jupyter notebook.
    for filename in files:
        if not filename.endswith(".ipynb"):
            os.remove(os.path.join(root, filename))
    # Delete subdirectories that contain no notebook at any depth.
    for dir_name in dirs:
        dir_path = os.path.join(root, dir_name)
        has_notebook = any(
            f.endswith(".ipynb") for _, _, fs in os.walk(dir_path) for f in fs
        )
        if not has_notebook:
            shutil.rmtree(dir_path)

In [None]:
# remove .ipynb checkpoints
for root, dirs, files in os.walk(current_directory, topdown=False):
    for dir_name in dirs:
        if dir_name == ".ipynb_checkpoints":
            checkpoint_dir = os.path.join(root, dir_name)
            shutil.rmtree(checkpoint_dir)