# Image Classification 2D

>BloodMNIST Dataset Demo: This tutorial provides a comprehensive, step-by-step guide to using the bioMONAI platform for 2D microscopy image classification tasks. 

In [None]:
#| default_exp tutorial_classification

### Setup imports

In [None]:
from bioMONAI.data import *
from bioMONAI.transforms import *
from bioMONAI.core import *
from bioMONAI.core import Path
from bioMONAI.data import get_image_files
from bioMONAI.losses import *
from bioMONAI.metrics import *
from bioMONAI.datasets import download_medmnist

from fastai.vision.all import CategoryBlock, GrandparentSplitter, parent_label, resnet34, CrossEntropyLossFlat, accuracy

In [None]:
device = get_device()
print(device)

### Dataset Information and Download

We use the publicly available BloodMNIST dataset. It is based on images of individual normal blood cells, captured from individuals without infection, hematologic or oncologic disease, and free of any pharmacologic treatment at the time of blood collection. It contains 17,092 images organized into 8 classes.

> In this step, we download the BloodMNIST dataset with the `download_medmnist` function from bioMONAI. The function stores the dataset in the specified path and returns information about it, which is kept in `info` and used later to define the class labels. You can change the path as needed, or pass a different dataset name to explore other datasets in the MedMNIST collection.

In [None]:
image_path = Path('../_data/medmnist_data/')
info = download_medmnist('bloodmnist', image_path, download_only=True)

### Create DataLoader

In this step, we will customize the DataLoader for the BloodMNIST dataset. The DataLoader is responsible for loading the data during training and validation. We will define the data loading strategy using the `BioDataLoaders.from_source()` method, configured with the arguments specified in `data_ops`. You can customize the following parameters to suit your needs:

- `batch_size`: The number of samples per batch. Adjust this based on your GPU memory capacity.
- `item_tfms`: List of item-level transformations to apply to the images. You can add or modify transformations to augment your dataset.
- `splitter`: The method to split the dataset into training and validation sets. You can customize the split strategy if needed.

>Feel free to experiment with different configurations to improve model performance or adapt to different datasets.
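
To make the `item_tfms` entry more concrete, the sketch below mimics in plain Python what the three transforms used in this tutorial are expected to do to a single-channel image: min-max rescaling to a target range, a random 90-degree rotation, and a random flip. The helper names (`scale_intensity`, `rot90`, `flip_horizontal`, `augment`) are hypothetical and written for illustration only. They are not bioMONAI's implementations, which operate on tensors and may differ in details such as the choice of axes.

```python
import random

def scale_intensity(img, new_min=0.0, new_max=1.0):
    """Linearly rescale pixel values so they span [new_min, new_max]."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid division by zero
        return [[new_min for _ in row] for row in img]
    return [[(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in row]
            for row in img]

def rot90(img):
    """Rotate a 2D image by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]

def flip_horizontal(img):
    """Mirror a 2D image left to right."""
    return [row[::-1] for row in img]

def augment(img, prob=0.5, rng=random):
    """Rescale, then apply each random transform with probability `prob`."""
    out = scale_intensity(img)
    if rng.random() < prob:
        out = rot90(out)
    if rng.random() < prob:
        out = flip_horizontal(out)
    return out

image = [[0, 50], [100, 200]]    # invented 2x2 "image"
print(scale_intensity(image))    # [[0.0, 0.25], [0.5, 1.0]]
print(rot90(image))              # [[50, 200], [0, 100]]
```

Because the rotation and flip are applied at random, the same image can look different each time it is loaded, which is what makes these transforms useful as augmentation.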

In [None]:
batch_size = 32

path = image_path/'bloodmnist'
train_path = path/'train'
val_path = path/'val'

data_ops = {
    'blocks':       (BioImageBlock(cls=BioImageMulti), CategoryBlock(info['label'])),            # define a `TransformBlock` tailored for bioimaging data
    'get_items':    get_image_files,                                                             # get image files in path
    'get_y':        parent_label,                                                                # Label item with the parent folder name
    'splitter':     GrandparentSplitter(train_name='train', valid_name='val'),                   # split data with the grandparent folder name
    'item_tfms':    [ScaleIntensity(min=0.0, max=1.0), RandRot90(prob=0.5), RandFlip(prob=0.5)], # list of item transforms
    'bs':           batch_size,                                                                  # batch size
}

data = BioDataLoaders.from_source(
    path,                           # root directory for data
    show_summary=False,             # set to True to print a summary of the data
    **data_ops,                     # rest of method arguments
    )

# print length of training and validation datasets
print('train images:', len(data.train_ds.items), '\nvalidation images:', len(data.valid_ds.items))
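
The `get_y` and `splitter` entries above rely purely on the folder layout: with files stored as `<split>/<class>/<image>`, the parent folder gives the label and the grandparent folder gives the split. The following sketch reproduces that logic with `pathlib` on made-up paths. `label_of` and `split_indices` are hypothetical stand-ins written for this illustration, not fastai's `parent_label` or `GrandparentSplitter`, and the class folder and file names are invented.

```python
from pathlib import PurePosixPath

# Invented example paths following the <split>/<class>/<image> layout.
files = [
    PurePosixPath("bloodmnist/train/class_a/img_001.png"),
    PurePosixPath("bloodmnist/train/class_b/img_002.png"),
    PurePosixPath("bloodmnist/val/class_a/img_003.png"),
    PurePosixPath("bloodmnist/test/class_c/img_004.png"),
]

def label_of(path):
    """Label an item with the name of its parent folder."""
    return path.parent.name

def split_indices(paths, train_name="train", valid_name="val"):
    """Split items by the name of their grandparent folder."""
    train = [i for i, p in enumerate(paths) if p.parent.parent.name == train_name]
    valid = [i for i, p in enumerate(paths) if p.parent.parent.name == valid_name]
    return train, valid

train_idx, valid_idx = split_indices(files)
print([label_of(f) for f in files])  # ['class_a', 'class_b', 'class_a', 'class_c']
print(train_idx, valid_idx)          # [0, 1] [2]
```

In this sketch, files under `test` end up in neither split, which is consistent with the test images being loaded separately in a later step.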

### Visualize a Batch of Images

In this step, we will visualize a batch of images from the BloodMNIST dataset using the `show_batch` method. This will help us understand the data distribution and verify the transformations applied to the images. The `max_n` parameter specifies the number of images to display.

> - You can adjust the `max_n` parameter to display more or fewer images.
> - Experiment with different transformations in the `item_tfms` list to see their effects on the images.
> - Use the `show_batch` method at different stages of your data pipeline to ensure the data is being processed correctly.

In [None]:
data.show_batch(max_n=4)

### Train the Model

In this step, we will train the model using the `visionTrainer` class. The `fine_tune` method will be used to fine-tune the model for a specified number of epochs. The `freeze_epochs` parameter allows you to freeze the initial layers of the model for a certain number of epochs before unfreezing and training the entire model.

> - You can adjust the `epochs` parameter to train the model for more or fewer epochs based on your dataset and computational resources.
> - Experiment with different values for `freeze_epochs` to see how it affects model performance.
> - Monitor the training process and adjust the learning rate or other hyperparameters if needed.
> - Consider using techniques like early stopping or learning rate scheduling to improve training efficiency and performance.
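
Early stopping, mentioned in the last note, can be summarised as: stop when the monitored validation metric has not improved for a given number of consecutive epochs. The sketch below is a generic, framework-independent illustration in plain Python. The function name `early_stopping_epoch`, its `patience` and `min_delta` parameters, and the loss values are made up for the example and do not describe how `visionTrainer` handles callbacks.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the epoch at which training would stop,
    or None if the patience is never exhausted."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss                      # new best: reset the counter
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Invented validation losses: improvement stalls after epoch 3.
losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.40]
print(early_stopping_epoch(losses, patience=3))  # 6
print(early_stopping_epoch(losses, patience=5))  # None
```

fastai provides an `EarlyStoppingCallback` with similar `patience` and `min_delta` options. Whether and how it can be attached to `visionTrainer` is not covered in this tutorial.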

#### `visionTrainer` Class

The `visionTrainer` class is a high-level API designed to simplify the training process for vision models. It provides a convenient interface for training, fine-tuning, and evaluating deep learning models. Here are some key features and functionalities of the `visionTrainer` class:

- **Initialization**: The class is initialized with the data, model architecture, loss function, and metrics. It also provides options to display a summary of the model and data.
- **Fine-tuning**: The `fine_tune` method allows you to fine-tune the model for a specified number of epochs. You can freeze the initial layers of the model for a certain number of epochs before unfreezing and training the entire model.
- **Training**: The class handles the training loop, including forward and backward passes, loss computation, and optimization.
- **Evaluation**: The class provides methods to evaluate the model on validation and test datasets, compute metrics, and visualize results.
- **Customization**: You can customize various aspects of the training process, such as learning rate, batch size, and data augmentations, to suit your specific needs.

> The `visionTrainer` class is designed to streamline the training process, making it easier to experiment with different models and hyperparameters. It is particularly useful for tasks like image classification, where you can leverage pre-trained models and fine-tune them on your dataset.

In [None]:
model = resnet34

loss = CrossEntropyLossFlat()
metrics = accuracy

trainer = visionTrainer(data, model, loss_fn=loss, metrics=metrics, show_summary=False)

In [None]:
trainer.fine_tune(20, freeze_epochs=2)
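
Assuming `visionTrainer.fine_tune` follows the semantics of fastai's `Learner.fine_tune` (an assumption, not confirmed by this tutorial), the call above first trains for `freeze_epochs` epochs with the pre-trained body frozen and then trains the whole network for `epochs` more. The plain-Python sketch below only enumerates that schedule. `fine_tune_schedule` is a hypothetical helper and performs no training.

```python
def fine_tune_schedule(epochs, freeze_epochs=1):
    """List (phase, epoch_in_phase) pairs for a two-phase fine-tuning run."""
    frozen = [("frozen", i) for i in range(freeze_epochs)]
    unfrozen = [("unfrozen", i) for i in range(epochs)]
    return frozen + unfrozen

schedule = fine_tune_schedule(20, freeze_epochs=2)
print(len(schedule))   # 22
print(schedule[:3])    # [('frozen', 0), ('frozen', 1), ('unfrozen', 0)]
```

Under this assumption, `fine_tune(20, freeze_epochs=2)` runs 22 epochs in total, which is worth keeping in mind when budgeting training time.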

### Evaluate the Model on Validation Data

In this step, we will evaluate the trained model on the validation dataset using the `evaluate_classification_model` function. This function computes the specified metrics and provides insights into the model's performance. Additionally, it can display the most confused classes to help identify areas for improvement.

> - You can customize the `metrics` parameter to include other evaluation metrics relevant to your task.
> - The `most_confused_n` parameter specifies the number of most confused classes to display. Adjust this value to see more or fewer confused classes.
> - Set the `show_graph` parameter to `True` to visualize the confusion matrix and other evaluation graphs.
> - Use this evaluation step to monitor the model's performance and make necessary adjustments to the training process or data pipeline.

In [None]:
evaluate_classification_model(trainer, metrics=metrics, most_confused_n=5, show_graph=False)
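
To clarify what the reported numbers mean, the sketch below computes accuracy and the most frequently confused class pairs from a list of true and predicted labels. It is a simplified illustration in plain Python with invented labels. `accuracy_score` and `most_confused` are hypothetical helpers and are not the code behind `evaluate_classification_model`.

```python
from collections import Counter

def accuracy_score(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def most_confused(y_true, y_pred, n=5):
    """Return the n most frequent (true, predicted, count) misclassifications."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return [(t, p, c) for (t, p), c in errors.most_common(n)]

# Invented labels for illustration.
y_true = ["a", "a", "a", "b", "b", "c", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "c", "c", "c"]

print(accuracy_score(y_true, y_pred))       # 0.625
print(most_confused(y_true, y_pred, n=2))   # [('a', 'b', 2), ('b', 'c', 1)]
```

Each entry reads as "true class, predicted class, number of occurrences", so the first pair is the error the model makes most often.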

### Save the Model

In this step, we will save the trained model using the `save` method of the `visionTrainer` class. Saving the model allows us to reuse it later without retraining. This is particularly useful when you want to deploy the model or continue training at a later time.

> - You can specify the file path and name for the saved model. Ensure the directory exists or create it if necessary.
> - Consider saving the model at different checkpoints during training to have backups and the ability to revert to a previous state if needed.
> - You can also save additional information such as the training history, optimizer state, and hyperparameters to facilitate future use or further training.

In [None]:
trainer.save('tmp-model')
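
If `visionTrainer.save` behaves like fastai's `Learner.save` (an assumption, not confirmed here), the given name is resolved inside a `models` folder and receives a `.pth` extension. The sketch below shows, with `pathlib` only, how such a target path can be built and its folder created beforehand. `checkpoint_file` is a hypothetical helper, and the `models` folder name and `.pth` extension are assumed defaults.

```python
from pathlib import Path
import tempfile

def checkpoint_file(root, name, model_dir="models", ext=".pth"):
    """Build <root>/<model_dir>/<name><ext>, creating the folder if needed."""
    folder = Path(root) / model_dir
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{name}{ext}"

# Use a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    target = checkpoint_file(tmp, "tmp-model")
    folder_created = target.parent.is_dir()

print(target.name)       # tmp-model.pth
print(folder_created)    # True
```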

### Evaluate the Model on Test Data

In this step, we will evaluate the trained model on the test dataset to assess its performance on unseen data. This is a crucial step to ensure that the model generalizes well and performs accurately on new, unseen samples. We will use the `evaluate_classification_model` function to compute the specified metrics and gain insights into the model's performance.

> - Ensure that the test dataset is completely separate from the training and validation datasets to get an unbiased evaluation.
> - You can customize the `metrics` parameter to include other evaluation metrics relevant to your task.
> - The `show_graph` parameter can be set to `True` to visualize the confusion matrix and other evaluation graphs.
> - Use this evaluation step to identify any potential issues with the model and make necessary adjustments to the training process or data pipeline.
> - Consider experimenting with different model architectures, hyperparameters, and data augmentations to further improve performance.

In [None]:
test_path = path/'test'

test_data = data.test_dl(get_image_files(test_path).shuffle(), with_labels=True)
# print length of test dataset
print('test images:', len(test_data.items))

In [None]:
evaluate_classification_model(trainer, test_data, metrics=metrics, show_graph=False);
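
The confusion matrix mentioned above tabulates, for each true class, how often each class was predicted. Per-class recall is the diagonal entry divided by the row total. A minimal plain-Python sketch with invented labels follows. `confusion_matrix` and `per_class_recall` are hypothetical helpers, not bioMONAI functions.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_recall(matrix):
    """Diagonal count divided by row total (None for classes with no samples)."""
    return [row[i] / sum(row) if sum(row) else None
            for i, row in enumerate(matrix)]

# Invented labels for illustration.
classes = ["a", "b", "c"]
y_true = ["a", "a", "b", "b", "c", "c", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "c", "a", "c"]

cm = confusion_matrix(y_true, y_pred, classes)
for row in cm:
    print(row)               # [1, 1, 0] / [0, 2, 0] / [1, 0, 3]
print(per_class_recall(cm))  # [0.5, 1.0, 0.75]
```

A low recall for one class alongside a high overall accuracy usually indicates that the class is under-represented or systematically mistaken for another one.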

### Load the Model

In this step, we will load the previously trained model using the `load` method of the `visionTrainer` class. In this example, we will:

> - Create a trainer instance and load the previously saved model.
> - Fine-tune the model for several more epochs.
> - Evaluate the model with test data again.

In [None]:
model = resnet34

loss = CrossEntropyLossFlat()
metrics = accuracy

trainer2 = visionTrainer(data, model, loss_fn=loss, metrics=metrics, show_summary=False)

# Load saved model
trainer2.load('tmp-model')

# Train several additional epochs
trainer2.fine_tune(5, freeze_epochs=1)

# Evaluate the model on the test dataset
evaluate_classification_model(trainer2, test_data, metrics=metrics, show_graph=False)