# Inspecting the FM

This notebook is a tutorial on loading the Foundation Model (FM), inspecting its architecture and individual layers, and running a forward pass over some sample data.

In [3]:
!pip install foundation-cancer-image-biomarker -qq

## Loading the model 

Our Python package provides a simple API that downloads the model weights and loads them into a ResNet50 architecture, so you can run inference on your own data.

Note: The download takes several minutes (~8 mins). A progress bar shows how far along it is.

In [5]:
from fmcib.models import fmcib_model

model = fmcib_model()

Downloading: 100% [738451713 / 738451713] bytes

2024-02-14 12:39:54.885 | INFO     | fmcib.models.load_model:load:66 - Loaded pretrained model weights 



Printing the model shows its composition. `LoadModel` is a container that consists of `trunk` and `heads` components.

More specifically, the `trunk` is a `ResNet` that starts with the following layers:

1. A 3D convolution layer (`Conv3d`) with 128 output channels and a 7x7x7 kernel size.
2. A 3D batch normalization layer (`BatchNorm3d`).
3. A ReLU activation function.
4. A 3D max pooling layer (`MaxPool3d`).

The main body of the model consists of four sequential layers (`layer1`, `layer2`, `layer3`, `layer4`), each containing multiple `ResNetBottleneck` blocks. Each `ResNetBottleneck` block consists of three convolution layers, three batch normalization layers, and a ReLU activation function. Some blocks also include a downsample layer, which is a sequential layer consisting of a convolution layer and a batch normalization layer.

The output of the model is processed by an adaptive average pooling layer (`AdaptiveAvgPool3d`) and an identity layer (`Identity`).


In [9]:
print(model)

LoadModel(
  (trunk): ResNet(
    (conv1): Conv3d(1, 128, kernel_size=(7, 7, 7), stride=(2, 2, 2), padding=(3, 3, 3), bias=False)
    (bn1): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (maxpool): MaxPool3d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): Sequential(
      (0): ResNetBottleneck(
        (conv1): Conv3d(128, 128, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
        (bn1): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
        (bn2): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv3d(128, 512, kernel_size=(1, 1, 1), stride=(1, 1, 1), bias=False)
        (bn3): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
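
The layer shapes in the printout are enough to estimate how many learnable parameters each layer holds. The sketch below is plain arithmetic, not part of the `fmcib` API: the helper names are hypothetical, the shapes are copied from the printed layers, and it assumes ungrouped convolutions (the printout shows no `groups` argument).

```python
from math import prod


def conv3d_weights(in_ch, out_ch, kernel, bias=False):
    """Learnable parameters of a Conv3d layer: one kernel per (in, out) channel pair."""
    n = out_ch * in_ch * prod(kernel)
    return n + (out_ch if bias else 0)


def batchnorm_params(channels):
    """An affine batch norm layer learns one scale and one shift per channel."""
    return 2 * channels


# Shapes copied from the printout (every convolution shown has bias=False).
layers = {
    "trunk.conv1": conv3d_weights(1, 128, (7, 7, 7)),
    "trunk.bn1": batchnorm_params(128),
    "layer1.0.conv1": conv3d_weights(128, 128, (1, 1, 1)),
    "layer1.0.conv2": conv3d_weights(128, 128, (3, 3, 3)),
    "layer1.0.conv3": conv3d_weights(128, 512, (1, 1, 1)),
}

for name, count in layers.items():
    print(f"{name:>16}: {count:,}")
```

The 3x3x3 convolution dominates the block: it holds 27 times as many weights as the 1x1x1 convolution with the same channel counts.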

## Using the model for inference
We can pass a dummy torch tensor through the model to see the input and output shapes more clearly. Because the last layer is an average pooling layer, the output has a constant size of 4096 dimensions (the number of channels of the last conv layer) across a range of input sizes. Very small inputs are the exception, as the last cell in this section shows.


In [11]:
import torch

# The input tensor should have shape (B, C, H, W, D): batch size, number of
# channels, height, width and depth.
ip = torch.rand(1, 1, 50, 50, 50)
ip_2 = torch.rand(1, 1, 60, 60, 60)
ip_3 = torch.rand(1, 1, 30, 30, 30)

In [12]:
# The recommended size for the input tensor is 50x50x50
out = model(ip)
out.shape

torch.Size([1, 4096])

In [13]:
# The model can also take inputs of other sizes if needed (not tested)
out = model(ip_2)
out.shape

torch.Size([1, 4096])
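
The two calls above return the same shape because global average pooling collapses whatever spatial grid reaches it into a single value per channel. As a minimal sketch of that mechanism, the toy network below is a hypothetical stand-in, not the real trunk: it has one strided convolution, and its channel count of 8 is arbitrary.

```python
import torch
from torch import nn

# Hypothetical stand-in for the real trunk: one strided convolution followed
# by global average pooling. The channel count (8) is arbitrary.
toy = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=3, stride=2, padding=1, bias=False),
    nn.AdaptiveAvgPool3d(1),  # collapses any spatial grid to 1 x 1 x 1
    nn.Flatten(),             # (B, 8, 1, 1, 1) -> (B, 8)
)

with torch.no_grad():
    shapes = [tuple(toy(torch.rand(1, 1, s, s, s)).shape) for s in (50, 60)]

print(shapes)
```

Both inputs give a `(1, 8)` output: the feature dimension depends only on the channel count of the layer before the pooling, not on the input size.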

In [15]:
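# A very small input is not supported: the model is intended for inputs of at least 50x50x50.
# With 30x30x30 the feature map shrinks to 1x1x1 inside the network, and batch normalization
# in training mode cannot normalize a single value per channel, so this raises an error.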
out = model(ip_3)
out.shape

ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 1024, 1, 1, 1])
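
The `1, 1, 1` spatial size in the error message can be reproduced with the standard output-size formula for convolution and pooling layers. The sketch below traces one spatial axis through the network. The `conv1` and `maxpool` settings are taken from the printed model; the stride-2 stages in `layer2` to `layer4` are an assumption based on the usual ResNet layout, since the printout above does not show those layers. The helper names are hypothetical.

```python
def out_size(n, kernel, stride, padding):
    """Output length along one axis for a conv or pooling layer (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding). conv1 and maxpool come from the printout;
# the stride-2 stages in layer2..layer4 are an ASSUMPTION (standard ResNet layout).
# layer1 is omitted because it is assumed to keep the spatial size unchanged.
STAGES = [
    ("conv1", 7, 2, 3),
    ("maxpool", 3, 2, 1),
    ("layer2", 3, 2, 1),
    ("layer3", 3, 2, 1),
    ("layer4", 3, 2, 1),
]


def trace(n):
    """Return the size of one spatial axis after each downsampling stage."""
    sizes = [n]
    for _, kernel, stride, padding in STAGES:
        sizes.append(out_size(sizes[-1], kernel, stride, padding))
    return sizes


for n in (50, 60, 30):
    print(n, "->", trace(n))
```

Under these assumptions, the 50 and 60 inputs still have a 2x2x2 grid at the end, while the 30 input collapses to 1x1x1. With a batch size of 1 that leaves a single value per channel, which batch normalization rejects in training mode. Calling `model.eval()` on a PyTorch module makes batch normalization use its running statistics instead and skips this check, though whether features from such small inputs are meaningful is a separate question.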