Copyright (c) MONAI Consortium  
Licensed under the Apache License, Version 2.0 (the "License");  
you may not use this file except in compliance with the License.  
You may obtain a copy of the License at  
&nbsp;&nbsp;&nbsp;&nbsp;http://www.apache.org/licenses/LICENSE-2.0  
Unless required by applicable law or agreed to in writing, software  
distributed under the License is distributed on an "AS IS" BASIS,  
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  
See the License for the specific language governing permissions and  
limitations under the License.

## Setup environment

In [None]:
!python -c "import monai" || pip install -q "monai-weekly[ignite,pyyaml]"

## Setup imports

In [None]:
import torchvision
from monai.config import print_config

print_config()

# Integrating Non-MONAI Code Into a Bundle

This notebook discusses strategies for integrating non-MONAI deep learning code into a bundle. Doing so lets existing PyTorch workflows join the bundle ecosystem, for example as a distributable bundle for the model zoo or another repository such as Hugging Face, or as a model used with MONAI Label. The assumption here is that you already have the components for preprocessing, inference, validation, and other parts of a workflow, so the task is to integrate these parts into types which can be referenced in config files.

In the following cells we'll construct a bundle which follows the [CIFAR10 tutorial](https://github.com/pytorch/tutorials/blob/32d834139b8627eeacb5fb2862be9f095fcb0b52/beginner_source/blitz/cifar10_tutorial.py) in PyTorch's tutorials repo. A number of code components will be copied into the `scripts` directory of the bundle and linked into config files suitable for use on the command line.

We'll start with an initialised bundle with a "scripts" directory and provide an appropriate metadata file describing the CIFAR10 classification network we'll provide:

In [1]:
%%bash

python -m monai.bundle init_bundle IntegrationBundle
rm IntegrationBundle/configs/inference.json
mkdir IntegrationBundle/scripts
echo "" > IntegrationBundle/scripts/__init__.py

which tree && tree IntegrationBundle || true

/usr/bin/tree
IntegrationBundle
├── configs
│   └── metadata.json
├── docs
│   └── README.md
├── LICENSE
├── models
└── scripts
    └── __init__.py

4 directories, 4 files


In [2]:
%%writefile IntegrationBundle/configs/metadata.json

{
    "version": "0.0.1",
    "changelog": {
        "0.0.1": "Initial version"
    },
    "monai_version": "1.2.0",
    "pytorch_version": "2.0.0",
    "numpy_version": "1.23.5",
    "optional_packages_version": {
        "torchvision": "0.15.0"
    },
    "name": "IntegrationBundle",
    "task": "Example Bundle",
    "description": "This illustrates integrating non-MONAI code (CIFAR10 classification) into a bundle",
    "authors": "Your Name Here",
    "copyright": "Copyright (c) Your Name Here",
    "data_source": "CIFAR10",
    "data_type": "float32",
    "intended_use": "This is suitable for demonstration only",
    "network_data_format": {
        "inputs": {
            "image": {
                "type": "image",
                "format": "magnitude",
                "modality": "natural",
                "num_channels": 3,
                "spatial_shape": [32, 32],
                "dtype": "float32",
                "value_range": [-1, 1],
                "is_patch_data": false,
                "channel_def": {
                    "0": "red",
                    "1": "green",
                    "2": "blue"
                }
            }
        },
        "outputs": {
            "pred": {
                "type": "probabilities",
                "format": "classes",
                "num_channels": 10,
                "spatial_shape": [10],
                "dtype": "float32",
                "value_range": [0, 1],
                "is_patch_data": false,
                "channel_def": {
                    "0": "plane",
                    "1": "car",
                    "2": "bird",
                    "3": "cat",
                    "4": "deer",
                    "5": "dog",
                    "6": "frog",
                    "7": "horse",
                    "8": "ship",
                    "9": "truck"
                }
            }
        }
    }
}

Overwriting IntegrationBundle/configs/metadata.json


Note that `torchvision` was added as an optional package but will be required to run the bundle. 
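
Since the bundle cannot run without `torchvision`, it can help to check up front that everything listed under `optional_packages_version` is importable. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the sketch assumes each key is also the package's import name, which holds for `torchvision` but not for every package.

```python
import importlib.util
import json

# Trimmed copy of the relevant part of metadata.json from above.
METADATA = json.loads("""
{
    "name": "IntegrationBundle",
    "optional_packages_version": {"torchvision": "0.15.0"}
}
""")


def missing_packages(metadata):
    """Return the names under optional_packages_version that cannot be imported."""
    packages = metadata.get("optional_packages_version", {})
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = missing_packages(METADATA)
if missing:
    print("Install before running the bundle:", ", ".join(missing))
else:
    print("All optional packages are importable.")
```

In a real bundle the dictionary would be read from `configs/metadata.json` with `json.load` rather than embedded as a string.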

## Scripts

Treating the CIFAR10 tutorial as the existing codebase we want to convert into a bundle, we copy components from it into `scripts`. We'll start with the network given in the tutorial:

In [3]:
%%writefile IntegrationBundle/scripts/net.py

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

Writing IntegrationBundle/scripts/net.py
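
The `16 * 5 * 5` input size of `fc1` follows from the 32x32 input declared in the metadata file. As a worked check, the sketch below applies the usual output-size formula for unpadded, undilated convolution and pooling layers to each stage of `Net`. The helper `conv_out` is a hypothetical name introduced only for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of one spatial dimension after a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                   # CIFAR10 images are 32x32
size = conv_out(size, kernel=5)             # conv1: 32 -> 28
size = conv_out(size, kernel=2, stride=2)   # pool:  28 -> 14
size = conv_out(size, kernel=5)             # conv2: 14 -> 10
size = conv_out(size, kernel=2, stride=2)   # pool:  10 -> 5

flattened = 16 * size * size                # conv2 produces 16 channels
print(size, flattened)                      # 5 400
```

If the bundle were adapted to a different input size, the first argument of `fc1` would need to change accordingly.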


Data transforms and data loaders are provided using definitions from `torchvision`. If we assume that these aren't easily converted into MONAI types, we instead need definitions that config files can refer to. The existing code can be adapted by copying the transform into a module and wrapping the data loader construction in a function:

In [4]:
%%writefile IntegrationBundle/scripts/transforms.py

import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])


Writing IntegrationBundle/scripts/transforms.py
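
`ToTensor` scales the 8-bit pixel values of a PIL image to the range [0, 1], and `Normalize` with a mean and standard deviation of 0.5 then maps that range to [-1, 1], which is the `value_range` stated for the input in `metadata.json`. The arithmetic can be checked in plain Python; the function `normalize` here is only an illustration of the per-channel formula, not the torchvision implementation.

```python
def normalize(value, mean=0.5, std=0.5):
    """Per-channel formula applied by Normalize: (value - mean) / std."""
    return (value - mean) / std


for v in (0.0, 0.5, 1.0):
    print(v, "->", normalize(v))
```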


In [5]:
%%writefile IntegrationBundle/scripts/dataloaders.py

import torch
import torchvision

batch_size = 4

def get_dataloader(is_training, transform):
    
    if is_training:
        trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
        trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                                  shuffle=True, num_workers=2)
        return trainloader
    else:
        testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
        testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                                 shuffle=False, num_workers=2)
        return testloader   

Writing IntegrationBundle/scripts/dataloaders.py


The training process in the tutorial is just a loop going through the dataset twice. The simplest adaptation for this is to wrap it in a function taking only the network and dataloader as arguments:

In [6]:
%%writefile IntegrationBundle/scripts/train.py

import torch.nn as nn
import torch.optim as optim


def train(net, trainloader):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    for epoch in range(2): 

        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data

            optimizer.zero_grad()

            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if i % 2000 == 1999:  
                print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
                running_loss = 0.0

    print('Finished Training')


Writing IntegrationBundle/scripts/train.py


This function hard-codes parameters such as the loss function, learning rate, and epoch count. That is fine for this example, but when adapting other code it would make sense to expose more of these as parameters of your wrapper components.
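
One possible way to add that parameterisation is to pass the loss, optimizer, and schedule in as arguments, so that they can be chosen in the config file instead of in the script. The following is a sketch under that assumption rather than the tutorial's code: the extra arguments and the returned list of logged losses are additions made for illustration.

```python
def train(net, trainloader, criterion, optimizer, num_epochs=2, log_every=2000):
    """Training loop with the loss, optimizer, and schedule supplied by the caller.

    Returns the list of averaged losses that were printed.
    """
    logged = []
    for epoch in range(num_epochs):
        running_loss = 0.0
        for i, (inputs, labels) in enumerate(trainloader):
            optimizer.zero_grad()
            loss = criterion(net(inputs), labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            if (i + 1) % log_every == 0:
                average = running_loss / log_every
                print(f"[{epoch + 1}, {i + 1:5d}] loss: {average:.3f}")
                logged.append(average)
                running_loss = 0.0

    print("Finished Training")
    return logged
```

The function itself no longer imports `torch`; it only relies on the objects it is given behaving like a PyTorch loss and optimizer. In a config file, the loss and optimizer could then be declared as their own items (for example with `_target_: torch.nn.CrossEntropyLoss`) and passed to `train` as `@` references.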

## Training

We can now define a training config file:

In [7]:
%%writefile IntegrationBundle/configs/train.yaml

imports:
- $import torch
- $import scripts
- $import scripts.net
- $import scripts.train
- $import scripts.transforms
- $import scripts.dataloaders

net:
  _target_: scripts.net.Net

transforms: '$scripts.transforms.transform'

dataloader: '$scripts.dataloaders.get_dataloader(True, @transforms)'

train:
- $scripts.train.train(@net, @dataloader)
- $torch.save(@net.state_dict(), './cifar_net.pth')


Writing IntegrationBundle/configs/train.yaml


The key concept demonstrated here is how to refer to definitions in the `scripts` directory within a config file and tie them together into a program. These definitions can be existing types or wrapper functions around existing code to make them easier to refer to here. A lot of good practice is ignored here but it shows how to adapt code into a bundle with minimal changes.
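
As a rough mental model of what a `_target_` entry such as `net` does, the dotted path is resolved to a class or function and the remaining keys are passed as keyword arguments. The toy function below imitates that idea with the standard library only. It is not MONAI's implementation, and `instantiate` is a hypothetical name.

```python
import importlib


def instantiate(config):
    """Toy stand-in for resolving a `_target_` entry; NOT MONAI's implementation."""
    module_name, _, attr_name = config["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_name), attr_name)
    kwargs = {k: v for k, v in config.items() if k != "_target_"}
    return target(**kwargs)


# Analogous to `net: {_target_: scripts.net.Net}`, but using a standard library
# class so that the example runs without the bundle being present.
half = instantiate({"_target_": "fractions.Fraction", "numerator": 1, "denominator": 2})
print(half)  # 1/2
```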

Let's train something simple with this setup:

In [1]:
%%bash

BUNDLE="./IntegrationBundle"

export PYTHONPATH=$BUNDLE

python -m monai.bundle run train \
    --bundle_root "$BUNDLE" \
    --meta_file "$BUNDLE/configs/metadata.json" \
    --config_file "$BUNDLE/configs/train.yaml" 

2023-09-11 17:28:16,125 - INFO - --- input summary of monai.bundle.scripts.run ---
2023-09-11 17:28:16,125 - INFO - > run_id: 'train'
2023-09-11 17:28:16,125 - INFO - > meta_file: './IntegrationBundle/configs/metadata.json'
2023-09-11 17:28:16,125 - INFO - > config_file: './IntegrationBundle/configs/train.yaml'
2023-09-11 17:28:16,125 - INFO - > bundle_root: './IntegrationBundle'
2023-09-11 17:28:16,125 - INFO - ---




Default logging file in 'configs/logging.conf' does not exist, skipping logging.


Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:56<00:00, 3010200.83it/s]


Extracting ./data/cifar-10-python.tar.gz to ./data
[1,  2000] loss: 2.162
[1,  4000] loss: 1.888
[1,  6000] loss: 1.688
[1,  8000] loss: 1.580
[1, 10000] loss: 1.487
[1, 12000] loss: 1.446
[2,  2000] loss: 1.402
[2,  4000] loss: 1.392
[2,  6000] loss: 1.339
[2,  8000] loss: 1.317
[2, 10000] loss: 1.276
[2, 12000] loss: 1.275
Finished Training
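
The `export PYTHONPATH=$BUNDLE` line in the command above puts the bundle root on Python's import path, which is what allows `$import scripts` in the config file to find the `scripts` package. The sketch below reproduces the same mechanism inside a single interpreter with a throwaway directory. The package name `demo_scripts` is hypothetical and chosen to avoid clashing with a real `scripts` package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway "bundle" containing a package with one module.
bundle_root = Path(tempfile.mkdtemp())
package_dir = bundle_root / "demo_scripts"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
(package_dir / "net.py").write_text("NAME = 'Net placeholder'\n")

# Equivalent of `export PYTHONPATH=$BUNDLE` for the current interpreter.
sys.path.insert(0, str(bundle_root))
importlib.invalidate_caches()

module = importlib.import_module("demo_scripts.net")
print(module.NAME)
```

Without the path entry, the import would fail with `ModuleNotFoundError`, and the same would happen to the `imports` section of the config.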


## Testing 

The second part of the tutorial script tests the network with the test data, which can again be put into a simple routine called from a config file:

In [2]:
%%writefile IntegrationBundle/scripts/test.py

import torch


def test(net, testloader):
    correct = 0
    total = 0
    
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            outputs = net(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')
    

Writing IntegrationBundle/scripts/test.py


In [3]:
%%writefile IntegrationBundle/configs/test.yaml

imports:
- $import torch
- $import scripts
- $import scripts.test
- $import scripts.transforms
- $import scripts.dataloaders

net:
  _target_: scripts.net.Net

transforms: '$scripts.transforms.transform'

dataloader: '$scripts.dataloaders.get_dataloader(False, @transforms)'

test:
- $@net.load_state_dict(torch.load('./cifar_net.pth'))
- $scripts.test.test(@net, @dataloader)


Writing IntegrationBundle/configs/test.yaml


In [4]:
%%bash

BUNDLE="./IntegrationBundle"

export PYTHONPATH=$BUNDLE

python -m monai.bundle run test \
    --bundle_root "$BUNDLE" \
    --meta_file "$BUNDLE/configs/metadata.json" \
    --config_file "$BUNDLE/configs/test.yaml" 

2023-09-11 17:31:17,644 - INFO - --- input summary of monai.bundle.scripts.run ---
2023-09-11 17:31:17,644 - INFO - > run_id: 'test'
2023-09-11 17:31:17,644 - INFO - > meta_file: './IntegrationBundle/configs/metadata.json'
2023-09-11 17:31:17,644 - INFO - > config_file: './IntegrationBundle/configs/test.yaml'
2023-09-11 17:31:17,644 - INFO - > bundle_root: './IntegrationBundle'
2023-09-11 17:31:17,644 - INFO - ---




Default logging file in 'configs/logging.conf' does not exist, skipping logging.


Files already downloaded and verified
Accuracy of the network on the 10000 test images: 54 %
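
Note that the percentage printed by `test` uses integer floor division, so the fractional part is discarded rather than rounded. A small illustration with made-up predictions, using a hypothetical helper `accuracy_percent` that repeats the same arithmetic on plain lists:

```python
def accuracy_percent(predicted, labels):
    """Whole-number accuracy, computed the same way as the print statement in test.py."""
    correct = sum(int(p == t) for p, t in zip(predicted, labels))
    total = len(labels)
    return 100 * correct // total


# Made-up data: 5 of 9 predictions are correct, i.e. 55.55...%
predicted = [0, 1, 2, 3, 4, 0, 0, 0, 0]
labels = [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(accuracy_percent(predicted, labels))  # 55
```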


## Inference

The original script lacked a section on inference with the network; however, this is rather straightforward, so a script and config file can easily implement it:

In [8]:
%%writefile IntegrationBundle/scripts/inference.py

import torch
from PIL import Image


def inference(net, transforms, filenames):
    for fn in filenames:
        with Image.open(fn) as im:
            tim = transforms(im)
            outputs = net(tim[None])  # add a batch dimension
            _, predictions = torch.max(outputs, 1)  # index of the largest output
            print(fn, predictions[0].item())

Overwriting IntegrationBundle/scripts/inference.py


In [9]:
%%writefile IntegrationBundle/configs/inference.yaml

imports:
- $import glob
- $import torch
- $import scripts
- $import scripts.inference
- $import scripts.transforms

ckpt_path: './cifar_net.pth'

input_dir: 'test_cifar10'
input_files: '$sorted(glob.glob(@input_dir+''/*.*''))'

net:
  _target_: scripts.net.Net

transforms: '$scripts.transforms.transform'

inference:
- $@net.load_state_dict(torch.load(@ckpt_path))
- $scripts.inference.inference(@net, @transforms, @input_files)

Overwriting IntegrationBundle/configs/inference.yaml


Here we'll create a test set of image files to load and predict on:

In [23]:
root_dir = "."  # assuming CIFAR10 was downloaded to the current directory
num_images = 20
dataset = torchvision.datasets.CIFAR10(root=f"{root_dir}/data", train=False)

!mkdir -p test_cifar10

for i in range(num_images):
    pil, label = dataset[i]
    filename = f"test_cifar10/img{i:02}.png"
    print(filename, "Label:", label)
    pil.save(filename)

test_cifar10/img00.png Label: 3
test_cifar10/img01.png Label: 8
test_cifar10/img02.png Label: 8
test_cifar10/img03.png Label: 0
test_cifar10/img04.png Label: 6
test_cifar10/img05.png Label: 6
test_cifar10/img06.png Label: 1
test_cifar10/img07.png Label: 6
test_cifar10/img08.png Label: 3
test_cifar10/img09.png Label: 1
test_cifar10/img10.png Label: 0
test_cifar10/img11.png Label: 9
test_cifar10/img12.png Label: 5
test_cifar10/img13.png Label: 7
test_cifar10/img14.png Label: 9
test_cifar10/img15.png Label: 8
test_cifar10/img16.png Label: 5
test_cifar10/img17.png Label: 7
test_cifar10/img18.png Label: 8
test_cifar10/img19.png Label: 6


In [24]:
%%bash

BUNDLE="./IntegrationBundle"

export PYTHONPATH=$BUNDLE

python -m monai.bundle run inference \
    --bundle_root "$BUNDLE" \
    --meta_file "$BUNDLE/configs/metadata.json" \
    --config_file "$BUNDLE/configs/inference.yaml" 

2023-09-11 17:54:11,793 - INFO - --- input summary of monai.bundle.scripts.run ---
2023-09-11 17:54:11,793 - INFO - > run_id: 'inference'
2023-09-11 17:54:11,793 - INFO - > meta_file: './IntegrationBundle/configs/metadata.json'
2023-09-11 17:54:11,793 - INFO - > config_file: './IntegrationBundle/configs/inference.yaml'
2023-09-11 17:54:11,793 - INFO - > bundle_root: './IntegrationBundle'
2023-09-11 17:54:11,793 - INFO - ---




Default logging file in 'configs/logging.conf' does not exist, skipping logging.


test_cifar10/img00.png 3
test_cifar10/img01.png 8
test_cifar10/img02.png 8
test_cifar10/img03.png 0
test_cifar10/img04.png 6
test_cifar10/img05.png 6
test_cifar10/img06.png 1
test_cifar10/img07.png 4
test_cifar10/img08.png 3
test_cifar10/img09.png 1
test_cifar10/img10.png 0
test_cifar10/img11.png 9
test_cifar10/img12.png 6
test_cifar10/img13.png 7
test_cifar10/img14.png 9
test_cifar10/img15.png 1
test_cifar10/img16.png 5
test_cifar10/img17.png 3
test_cifar10/img18.png 8
test_cifar10/img19.png 4
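
The inference routine prints class indices. These can be translated into names with the `channel_def` mapping that was written into `metadata.json` for the `pred` output. A minimal sketch, with the mapping copied inline and a hypothetical helper `class_name`:

```python
# Class names copied from `channel_def` of the "pred" output in metadata.json.
CHANNEL_DEF = {
    "0": "plane", "1": "car", "2": "bird", "3": "cat", "4": "deer",
    "5": "dog", "6": "frog", "7": "horse", "8": "ship", "9": "truck",
}


def class_name(index):
    """Translate a predicted class index into its name."""
    return CHANNEL_DEF[str(index)]


# A few (filename, prediction) pairs taken from the output above.
predictions = [
    ("test_cifar10/img00.png", 3),
    ("test_cifar10/img01.png", 8),
    ("test_cifar10/img07.png", 4),
]
for filename, index in predictions:
    print(filename, class_name(index))
```

In the bundle itself the mapping would more naturally be read from the metadata file than duplicated in code.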


## Adaptation Strategies

This notebook has demonstrated one strategy of integrating existing code into a bundle. Code from an existing project, in this case an example script, was copied into the `scripts` directory of a bundle with added functions to make definitions easily referenced in config files. What shows up in the config files is a thin adapter layer to interface between what is expected in bundles and the codebase. 

A mixed approach, where old components are replaced with MONAI types, would also work well given the simplicity of the code here. Substituting the Torchvision transforms with those from MONAI, using a `Trainer` class instead of the `train` function, and similarly handling testing and inference with an `Evaluator` class, would produce essentially the same results. It is up to you to determine how much rewriting with MONAI components in config files is justified for your codebase, as opposed to adapting the existing code.

The third approach involves a codebase which is installed as a package. If an external network with its training components is installed with `pip`, for example, perhaps no code needs to be copied into the bundle at all, and you can just write config files which import this package and reference its definitions. Some adapter code may still be needed in `scripts`, but this could resemble what is demonstrated here: simple wrapper functions returning objects which are assigned to keys in config files through evaluated Python expressions.

Creating a bundle compatible with other tools requires you to define specific items in the config files. For example, MONAI Label states requirements [here](https://github.com/Project-MONAI/MONAILabel/blob/c90f42c0730554e3a05af93645ae84ccdcb5e14b/monailabel/tasks/infer/bundle.py#L33) as names that must be present in `inference.json/yaml` to work with the label server. You would have to provide `network_def`, `preprocessing`, `postprocessing`, and others. This means that the code from your existing codebase would have to be divided up into these components if it isn't already, and its inputs and outputs would have to match what would be expected of the MONAI types typically used for these definitions.
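
A quick way to see how far a config is from such requirements is to compare its top-level names with the expected ones. The sketch below checks only the three names mentioned above; the linked MONAI Label source is the authoritative list, and `missing_keys` is a hypothetical helper.

```python
# Only the names mentioned in the text; the linked MONAI Label source lists the rest.
REQUIRED_KEYS = ("network_def", "preprocessing", "postprocessing")


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required top-level names that a parsed config dictionary lacks."""
    return [key for key in required if key not in config]


# Top-level names used by the inference config in this notebook (values elided).
notebook_config = {"net": {}, "transforms": "placeholder", "inference": []}
print(missing_keys(notebook_config))
```

The inference config written earlier uses its own names (`net`, `transforms`), so all three would be reported as missing, which illustrates the kind of renaming and restructuring that compatibility would involve.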

If you need to adapt your code to a bundle, how the integration works will be very specific to your situation. Using config files as adapter layers is shown here to work, but by understanding how bundles are structured and what the moving pieces of a bundle "program" are, you can work out your own strategy.

### Adapting Data Processing

One common module is data processing, either pre- or post-processing at various stages. MONAI transforms assume that NumPy arrays or PyTorch tensors, or dictionaries thereof, are the inputs and outputs of transforms. You can integrate existing transforms using `Lambda/Lambdad` to wrap a callable object within a MONAI transform rather than define your own `Transform` subclass. This does require that data have the correct type and shape. For example, if you have a function in `scripts` simply called `preprocess` which accepts a single image input as a NumPy array, this can be adapted into a transform sequence as follows:

```yaml
train_transforms:
- _target_: LoadImage
  image_only: true
- _target_: EnsureChannelFirst
- _target_: ToNumpy
- _target_: Lambda
  func: '$scripts.preprocess'
- _target_: ToTensor
```

Minimising conversions to and from different formats would improve performance, but otherwise this avoids complex rewriting of code to fit MONAI transforms. A preprocess function which takes multiple inputs and produces multiple outputs would be more suited to a dictionary-based transform sequence but would also require adapter code or a `MapTransform` subclass.
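
The adapter code for the multiple-input case can be quite small. The following sketch wraps a function of several positional arguments into a callable which takes and returns a dictionary. Both `dict_adapter` and the example `preprocess` function are hypothetical, and plain lists stand in for image arrays to keep the example self-contained.

```python
def dict_adapter(func, in_keys, out_keys):
    """Wrap a multi-input, multi-output function as a dictionary-in, dictionary-out callable."""

    def wrapped(data):
        results = func(*(data[key] for key in in_keys))
        updated = dict(data)  # shallow copy so the caller's dictionary is left untouched
        updated.update(zip(out_keys, results))
        return updated

    return wrapped


def preprocess(image, mask):
    """Hypothetical existing function: scales the image and inverts a binary mask."""
    return [v / 255 for v in image], [1 - v for v in mask]


adapted = dict_adapter(preprocess, in_keys=("image", "mask"), out_keys=("image", "mask"))
print(adapted({"image": [0, 255], "mask": [0, 1], "id": "case0"}))
```

In principle a callable like `adapted` could be given to `Lambda` as its `func` within a dictionary-based sequence, since it accepts the whole data dictionary and passes through keys it does not use.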


## Summary and Next

In this tutorial we have looked at how to adapt code to a MONAI bundle:
* Wrapping code in thin adaptation layers
* Using these components in config files
* Discussion of the architectural concepts around the process of adaptation

In future tutorials we shall delve into other details and strategies with MONAI bundles.