<a href="https://colab.research.google.com/github/adines/SemiCompact/blob/main/Notebooks/ModelDistillation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Bioimage Model Zoo Model Distillation**

# **Creator of the notebook**
The creator of this notebook is Adrián Inés, from the University of La Rioja. For any questions or queries, you can contact the author by email at adrian.ines@unirioja.es.


# **How to use this notebook?**

---

<font size = 4>Videos describing how to use our notebooks are available on YouTube:
  - [**Video 1**](https://www.youtube.com/watch?v=GzD2gamVNHI&feature=youtu.be): Full run through of the workflow to obtain the notebooks and the provided test datasets as well as a common use of the notebook
  - [**Video 2**](https://www.youtube.com/watch?v=PUuQfP5SsqM&feature=youtu.be): Detailed description of the different sections of the notebook


---
### **Structure of a notebook**

<font size = 4>The notebook contains two types of cell:  

<font size = 4>**Text cells** provide information and can be modified by double-clicking the cell. You are currently reading a text cell. You can create a new text cell by clicking `+ Text`.

<font size = 4>**Code cells** contain code, which can be modified by selecting the cell. To execute the cell, move your cursor over the `[ ]`-mark on the left side of the cell (a play button appears) and click it. When execution is done, the play button animation stops. You can create a new code cell by clicking `+ Code`.

---
### **Table of contents, Code snippets** and **Files**

<font size = 4>On the top left side of the notebook you will find three tabs, which contain, from top to bottom:

<font size = 4>*Table of contents* = contains the structure of the notebook. Click an entry to move quickly between sections.

<font size = 4>*Code snippets* = contains examples of how to code certain tasks. You can ignore this when using this notebook.

<font size = 4>*Files* = contains all available files. After mounting your Google Drive (see section 1.2) you will find your files and folders here. 

<font size = 4>**Remember that all uploaded files are purged after changing the runtime.** All files saved in Google Drive will remain. You do not need to use the Mount Drive-button; your Google Drive is connected in section 1.2.

<font size = 4>**Note:** The "sample data" in "Files" contains default files. Do not upload anything in here!

---
### **Making changes to the notebook**

<font size = 4>**You can make a copy** of the notebook and save it to your Google Drive. To do this, click File -> Save a copy in Drive.

<font size = 4>To **edit a cell**, double click on the text. This will show you either the source code (in code cells) or the source text (in text cells).
You can use the `#`-mark in code cells to comment out parts of the code. This allows you to keep the original code piece in the cell as a comment.

# **0. Before getting started**
---
<font size = 4> To use this notebook, pay attention to the data structure. The labelled images you want to use need to be organised in separate folders for each class, while the unlabelled images go together in a single folder.

<font size = 4>Here's a common data structure that can work:
*   PathDataset
    - **train**
        - class 1
            - img_1.tif, img_2.tif, ...
        - class 2
            - img_1.tif, img_2.tif, ...
        - class 3
            - img_1.tif, img_2.tif, ... 
    - **valid**
        - class 1
            - img_1.tif, img_2.tif, ...
        - class 2
            - img_1.tif, img_2.tif, ...
        - class 3
            - img_1.tif, img_2.tif, ...        
    - **unlabeled_images**
        - img_1.tif, img_2.tif, ...
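
Before training, it can help to check that your folder follows this layout. The following is a minimal sketch using only the Python standard library; the function names `count_images` and `summarise_dataset` and the set of accepted image extensions are assumptions for illustration, not part of the `compact-distillation` API.

```python
from pathlib import Path

# Hypothetical set of extensions treated as images; adjust it to your data.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def count_images(folder: Path) -> int:
    """Count the image files stored directly inside `folder`."""
    return sum(1 for f in folder.iterdir()
               if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS)


def summarise_dataset(path_dataset: str) -> dict:
    """Return {split: {class_name: n_images}} for train/valid plus the number
    of unlabelled images; raise if the expected layout is not found."""
    root = Path(path_dataset)
    summary = {}
    for split in ("train", "valid"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"Missing folder: {split_dir}")
        # one sub-folder per class
        summary[split] = {c.name: count_images(c)
                          for c in sorted(split_dir.iterdir()) if c.is_dir()}
    if set(summary["train"]) != set(summary["valid"]):
        raise ValueError("train and valid must contain the same class folders")
    unlabeled_dir = root / "unlabeled_images"
    if not unlabeled_dir.is_dir():
        raise FileNotFoundError(f"Missing folder: {unlabeled_dir}")
    summary["unlabeled_images"] = count_images(unlabeled_dir)
    return summary
```

For example, `print(summarise_dataset(PathDataset))` once `PathDataset` points to your dataset folder on Google Drive. An empty class folder or a class present in only one split is usually worth fixing before launching a long training run.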
        
---

# **1. Initialise the Colab session**
---

## **1.1. Install the dependencies**
---


In [None]:
#@markdown ##Install the dependencies

!pip install bioimageio.core -Uq
!pip install compact-distillation -Uq

print("Libraries installed")

In [None]:
#@markdown ##Load libraries

# ------- the imports for Distillation API -------
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import os, random
import shutil 
import distillation

import hashlib

# ------- the imports for bioimage.io model export -------
# build_spec and resource_tests belong to the bioimageio.core 0.5.x API
import bioimageio.core
import torch
import torch.nn as nn
from bioimageio.core.build_spec import build_model, add_weights
from bioimageio.core.resource_tests import test_model

# ------- the imports for fastai -------
from fastai.vision.all import *
import fastai


# Colors for the warning messages
class bcolors:
  WARNING = '\033[31m'

W  = '\033[0m'  # white (normal)
R  = '\033[31m' # red

print("[Libraries have been loaded]")

## **1.2. Mount your Google Drive**
---
<font size = 4> To use this notebook on the data present in your Google Drive, you need to mount your Google Drive to this notebook.

<font size = 4> Run the cell below to mount your Google Drive and follow the link. In the new browser window, select your drive and select 'Allow', copy the code, paste it into the cell and press Enter. This will give Colab access to the data on the drive. 

<font size = 4> Once this is done, your data are available in the **Files** tab on the top left of the notebook.

In [None]:

#@markdown ##Run this cell to connect your Google Drive to Colab

#@markdown * Click on the URL. 

#@markdown * Sign in to your Google Account. 

#@markdown * Copy the authorization code. 

#@markdown * Enter the authorization code. 

#@markdown * Click on the "Files" tab on the left and refresh it. Your Google Drive folder should now be available here as "gdrive". 

#mounts user's Google Drive to Google Colab.

from google.colab import drive
drive.mount('/content/gdrive')




# **2. Train the model with Model Distillation**

---


In this section we are going to train a deep learning model on a dataset that contains both labelled and unlabelled images. For this we are going to use a semi-supervised learning method; specifically, a model distillation approach.

<font size = 4>**Semi-supervised methods** 

Semi-supervised learning methods use both labelled and unlabelled data, whereas self-supervised methods use only unlabelled data. In general, a semi-supervised method (1) defines a base model that is trained on labelled data, (2) uses the model to predict labels for the unlabelled data, (3) initialises a model with the weights learned in (1), and, finally, (4) retrains that model with both the initial data and the most confident predictions produced in (2), thus enlarging the labelled training set. Semi-supervised learning methods can be grouped into three main types: self-training, consistency regularisation and hybrid methods.

<font size = 4>**Self-training methods** 

Self-training is a basic approach that (1) defines a base model that is trained on labelled data, (2) uses the model to predict labels for the unlabelled data, and, finally, (3) retrains the model with the most confident predictions produced in (2), thus enlarging the labelled training set. In a variant of self-training called distillation, a large model is used for (1) and (2), whereas a smaller and faster model is trained in (3).
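To make the three self-training steps concrete, here is a toy sketch in plain Python. It replaces the deep networks with a hypothetical one-dimensional nearest-centroid classifier and treats the softmax over negative distances as the prediction confidence; `train_centroids`, `predict_proba` and `self_train` are illustrative names, not functions of `compact-distillation`.

```python
import math


def train_centroids(samples):
    """Toy 'training': the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in samples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_proba(centroids, x):
    """Class probabilities: softmax over negative distances to the centroids."""
    scores = {y: -abs(x - c) for y, c in centroids.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def self_train(labelled, unlabelled, confidence=0.8):
    """One round of self-training; returns the retrained model and the pseudo-labels."""
    base = train_centroids(labelled)                   # (1) train on labelled data
    pseudo = []
    for x in unlabelled:                               # (2) predict unlabelled data
        probs = predict_proba(base, x)
        label = max(probs, key=probs.get)
        if probs[label] >= confidence:                 # keep confident predictions only
            pseudo.append((x, label))
    return train_centroids(labelled + pseudo), pseudo  # (3) retrain on enlarged set
```

With `labelled = [(0.0, "a"), (1.0, "a"), (9.0, "b"), (10.0, "b")]` and `unlabelled = [0.2, 5.0, 9.8]`, the point `5.0` lies exactly between both centroids and is discarded, while `0.2` and `9.8` are added as pseudo-labelled examples. In the distillation variant, step (3) would train a different, smaller model instead of reusing `train_centroids`.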

## **Model Distillation**
---

Model distillation is a form of self-training, which is itself a kind of semi-supervised learning. Specifically, in model distillation, several models are employed to obtain predictions for the unlabelled data; those predictions are then ensembled and used to train a new model.
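The ensembling step can be sketched as follows. This notebook does not state how `compact-distillation` combines the predictions of the base models, so averaging their class probabilities and keeping only images whose averaged top probability reaches a threshold (similar in spirit to the `Confidence` slider of the training cell) is an assumption; `ensemble_pseudo_labels` is a hypothetical helper.

```python
def ensemble_pseudo_labels(per_model_probs, confidence=0.8):
    """Average class probabilities across models and keep confident images.

    per_model_probs: one list per base model; each list holds, for every
    unlabelled image, its class probabilities (same class order in all models).
    Returns (image_index, class_index) pairs whose averaged top probability
    is at least `confidence`.
    """
    n_models = len(per_model_probs)
    if n_models == 0:
        raise ValueError("at least one base model is required")
    selected = []
    for i in range(len(per_model_probs[0])):
        n_classes = len(per_model_probs[0][i])
        # mean probability of each class over all base models
        avg = [sum(model[i][k] for model in per_model_probs) / n_models
               for k in range(n_classes)]
        best = max(range(n_classes), key=avg.__getitem__)
        if avg[best] >= confidence:
            selected.append((i, best))
    return selected
```

Averaging lets the base models compensate for each other's mistakes: an image on which they disagree gets a flatter averaged distribution and is less likely to pass the threshold.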

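In this notebook, any combination of networks is allowed between the first step (the base models that annotate the unlabelled images) and the second step (the target model trained on the enlarged dataset).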


In [None]:
#@markdown ##Where is the data?
PathDataset = "" #@param{type:"string"}


#@markdown ##Choose the base models to use
ResNet18 = False #@param {type:"boolean"}
ResNet50 = False #@param {type:"boolean"}
ResNet101 = False #@param {type:"boolean"}
EfficientNet = False #@param {type:"boolean"}
FBNet = False #@param {type:"boolean"}
MixNet = False #@param {type:"boolean"}
MNasNet = False #@param {type:"boolean"}
MobileNet = False #@param {type:"boolean"}
SqueezeNet = False #@param {type:"boolean"}
ShuffleNet = False #@param {type:"boolean"}


#@markdown ##Choose the target model to use
TargetModel = "MixNet" #@param ['ResNet18','ResNet50','ResNet101','EfficientNet','FBNet','MixNet','MNasNet','MobileNet','SqueezeNet','ShuffleNet']


#@markdown ##Choose training parameters

BatschSize = 32 #@param {type:"integer"}

ImageSize = 224 #@param{type:"integer"}

Confidence = 0.8 #@param {type:"slider", min:0, max:1, step:0.1}

BaseModels=[]

if ResNet18: BaseModels.append('ResNet18')
if ResNet50: BaseModels.append('ResNet50')
if ResNet101: BaseModels.append('ResNet101')
if EfficientNet: BaseModels.append('EfficientNet')
if FBNet: BaseModels.append('FBNet')
if MixNet: BaseModels.append('MixNet')
if MNasNet: BaseModels.append('MNasNet')
if MobileNet: BaseModels.append('MobileNet')
if SqueezeNet: BaseModels.append('SqueezeNet')
if ShuffleNet: BaseModels.append('ShuffleNet')


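if PathDataset=="" or not os.path.isdir(PathDataset):
  raise ValueError("PathDataset must point to an existing folder.")
if not BaseModels:
  raise ValueError("Select at least one base model.")

OutputPath=os.path.join(PathDataset,'models','outputModel')
os.makedirs(OutputPath, exist_ok=True)

print("Start of training")
# Train the base models, annotate the unlabelled images with them and train the target model
distillation.modelDistillation(BaseModels, TargetModel, PathDataset, os.path.join(PathDataset,'unlabeled_images'), OutputPath, BatschSize, ImageSize, Confidence);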
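Before moving on, you can check that training wrote the target weights that the export cell in section 3 loads. This is a small sketch; the hypothetical helper `find_target_weights` assumes, based on the `learner.load` call in section 3, that the weight file name starts with `target_` followed by the target model name.

```python
import os


def find_target_weights(output_path, target_model):
    """Return the .pth files in `output_path` whose name starts with
    'target_<target_model>'; an empty list means nothing was found."""
    prefix = "target_" + target_model
    if not os.path.isdir(output_path):
        return []
    return sorted(f for f in os.listdir(output_path)
                  if f.startswith(prefix) and ".pth" in f)
```

For example, `print(find_target_weights(OutputPath, TargetModel))`. An empty list suggests that training did not finish or saved its weights elsewhere.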


# **3. Model Information**
---

In this section we are going to fill in the information of our model so that it can be included in the [Bioimage Model Zoo](https://bioimage.io/#/).
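The `Orcid` field below expects an ORCID iD: four hyphen-separated groups of four characters, where only the last character may be `X`. A typo there is easy to miss, so here is a small sketch that checks both the format and the ISO 7064 MOD 11-2 check digit used by ORCID; `is_valid_orcid` is a hypothetical helper, not part of bioimageio.core.

```python
import re

# Four groups of four characters; only the final check character may be 'X'.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and the ISO 7064 MOD 11-2 check digit of an ORCID iD."""
    orcid = orcid.strip()
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:  # the first 15 digits
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected
```

For instance, `if Orcid and not is_valid_orcid(Orcid): print("Please check the ORCID iD")`.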


In [None]:
#@markdown ##Enter the model name
ModelName = "" #@param{type:"string"}

#@markdown ##Enter the model description
ModelDescription = "" #@param {type:"string"}

ModelDocumentation= "Enter a path to a .md file" #@param {type:"string"}

#@markdown ##Enter the author's information
Name = "" #@param {type:"string"}
Affiliation = "" #@param {type:"string"}
GithubUser = "" #@param {type:"string"}
Orcid = "" #@param {type:"string"}

#@markdown ##Enter the license (A  [SPDX license identifier](https://spdx.org/licenses/))
License = "CC-BY-4.0" #@param {type:"string"}

#@markdown ##Enter the tags of the model separated by semicolons
Tags= "Classification; mixnet; blindness" #@param {type:"string"}

#@markdown ##Do you want to add a citation?
CitationText= "" #@param {type:"string"}
DOI = "" #@param {type:"string"}
URL = "" #@param {type:"string"}




# Labelled data (train/valid folders) used to rebuild the fastai learner
path=PathDataset
bs=BatschSize
size=ImageSize

model=distillation.utils.getModel(TargetModel)

data_fast=ImageDataLoaders.from_folder(path,batch_tfms=aug_transforms(),item_tfms=Resize(size),bs=bs,device='cuda')

learner=cnn_learner(data_fast,model,metrics=[accuracy,Precision(average='macro'),Recall(average='macro'),F1Score(average='macro')])
learner.load(OutputPath+os.sep+'target_'+TargetModel+'.pth',device='cpu')

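# Take one batch to create the test input/output tensors required by the model zoo
x,_ = data_fast.one_batch()
learner.model.cuda()
learner.model.eval()

# Keep only the first image, with its batch axis (shape: 1 x C x H x W)
x_mod2=x.cpu().detach().numpy()[:1]

# Prediction of the trained model for that image (shape: 1 x n_classes)
with torch.no_grad():
  out_mod=learner.model(x).cpu().detach().numpy()[:1]

np.save("test-input.npy", x_mod2)
np.save("test-output.npy",out_mod)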

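# Workaround so that torch.jit.trace can call requires_grad_ on fastai tensor subclasses
@patch
def requires_grad_(self:TensorBase, requires_grad=True):
    self.requires_grad = requires_grad
    return self

# Export the model as TorchScript, the weight format declared in build_model below
torch.jit.save(torch.jit.trace(learner.model, x), 'model.pt')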

if not (os.path.isfile(ModelDocumentation) and ModelDocumentation.endswith('.md')):
  print(ModelDocumentation+' is not an existing .md file')



author={}
if Name!="":
  author['name']=Name
if Affiliation!="":
  author['affiliation']=Affiliation
if GithubUser!="":
  author['github_user']=GithubUser
if Orcid!="":
  author['orcid']=Orcid

# Split the semicolon-separated tags, strip spaces and drop empty entries
tags=[tag.strip() for tag in Tags.split(';') if tag.strip()]

citation={}
if CitationText!="":
  citation['text']=CitationText
if DOI!="":
  citation['doi']=DOI
if URL!="":
  citation['url']=URL

citations=[citation]

build_model(
    # the weight file and the type of the weights
    weight_uri="model.pt",
    weight_type="torchscript",
    # the test input and output data as well as the description of the tensors
    # these are passed as list because we support multiple inputs / outputs per model
    test_inputs=["test-input.npy"],
    test_outputs=["test-output.npy"],
    input_axes=["bcyx"],
    output_axes=["by"],
    # where to save the model zip, how to call the model and a short description of it
    output_path="model.zip",
    name=ModelName,
    description=ModelDescription,
    # additional metadata about authors, licenses, citation etc.
    authors=[author],
    license=License,
    documentation=ModelDocumentation,
    tags=tags,  # the tags are used to make models more findable on the website
    cite=citations,
    # sample_inputs=['02da652c74b8.tiff'],
    # test_inputs='blindness/images/0/005b95c28852.png',
    # n_inputs=1,
    # n_outputs=1,
    # sample_outputs=['02da652c74b8.tiff'],
    # test_outputs='0',
    # add_deepimagej_config=True
)

# **4. Test the model**

Now we can check whether the built model is valid and ready to be uploaded to the Bioimage Model Zoo.
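Before running the full test, a quick look inside `model.zip` can reveal a missing file. This sketch uses only the standard `zipfile` module; the helpers `list_package` and `missing_files` are hypothetical, and the expected names assume that the package stores its description as `rdf.yaml` and keeps the weight and test files under the names used in section 3. If your version of `build_model` renames them, adjust `expected`.

```python
import zipfile


def list_package(zip_path):
    """Return the sorted file names stored in a model package zip."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(zf.namelist())


def missing_files(zip_path, expected=("rdf.yaml", "model.pt",
                                      "test-input.npy", "test-output.npy")):
    """Return the names from `expected` that are not in the package."""
    names = set(list_package(zip_path))
    return [name for name in expected if name not in names]
```

For example, `print(missing_files("model.zip"))` should print an empty list for a complete package.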

In [None]:
#@markdown ##<font color=orange>Test the model


my_model = bioimageio.core.load_resource_description("model.zip") 
test_model(my_model)