## Try out our model here.

We test our multimodal Faster R-CNN on the MIMIC dataset here.

In [1]:
import torch, gc
import pandas as pd

from data.dataset import ReflacxDataset, collate_fn
from utils.transforms import get_transform
from models.load import ModelSetup, create_model_from_setup

# Suppress the chained-assignment warning from pandas.
pd.options.mode.chained_assignment = None  # default='warn'

In [2]:
gc.collect()
# torch.cuda.memory_summary(device=None, abbreviated=False)

use_gpu = torch.cuda.is_available()
device = 'cuda' if use_gpu else 'cpu'
print(f"This notebook will run on device: [{device.upper()}]")

if use_gpu:
    torch.cuda.empty_cache()

This notebook will run on device: [CUDA]


### Define your MIMIC folder path here.

In [3]:
XAMI_MIMIC_PATH = r"D:\XAMI-MIMIC"  # raw string, so the backslash is not read as an escape

use_iobb = True
io_type_str = "IoBB" if use_iobb else "IoU"

all_model_setups = [
    ModelSetup(
        name="original",
        use_clinical=False,
        use_custom_model=False,
        use_early_stop_model=True,
    ),
    ModelSetup(
        name="custom_without_clinical",
        use_clinical=False,
        use_custom_model=True,
        use_early_stop_model=True,
    ),
    ModelSetup(
        name="custom_with_clinical",
        use_clinical=True,
        use_custom_model=True,
        use_early_stop_model=True,
    ),
]
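
The `use_iobb` flag above switches the overlap metric between IoU and IoBB. How the evaluation code computes either one is not shown in this notebook. The sketch below assumes the usual definitions: IoU divides the intersection by the union of the two boxes, while IoBB divides it by the area of the predicted box alone. The helper names are hypothetical.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; zero if the box is empty or inverted."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection_area(box_a, box_b):
    """Area of the overlap between two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    return box_area((x1, y1, x2, y2))


def iou(pred_box, gt_box):
    """Intersection over union of the predicted and ground-truth boxes."""
    inter = intersection_area(pred_box, gt_box)
    union = box_area(pred_box) + box_area(gt_box) - inter
    return inter / union if union > 0 else 0.0


def iobb(pred_box, gt_box):
    """Intersection over the predicted bounding box area (assumed definition)."""
    pred_area = box_area(pred_box)
    return intersection_area(pred_box, gt_box) / pred_area if pred_area > 0 else 0.0


pred_box = (0.0, 0.0, 10.0, 10.0)
gt_box = (5.0, 5.0, 15.0, 15.0)
print(f"IoU={iou(pred_box, gt_box):.3f}, IoBB={iobb(pred_box, gt_box):.3f}")
```

Under these definitions IoBB is never lower than IoU for the same pair of boxes, so it is the more lenient of the two.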


In [4]:
model_setup = all_model_setups[2]

# Initialise datasets and dataloaders
The batch size is also defined in this section. For testing purposes, we set it to only 2.

In [5]:
labels_cols = [
    "Enlarged cardiac silhouette",
    "Atelectasis",
    "Pleural abnormality",
    "Consolidation",
    "Pulmonary edema",
    #  'Groundglass opacity', # 6th disease.
]

dataset_params_dict = {
    "XAMI_MIMIC_PATH": XAMI_MIMIC_PATH,
    "with_clinical": model_setup.use_clinical,
    "dataset_mode": "normal",
    "bbox_to_mask": True,
    "labels_cols": labels_cols,
}

detect_eval_dataset = ReflacxDataset(
    **{**dataset_params_dict, "dataset_mode": "normal",},
    transforms=get_transform(train=False),
)

train_dataset = ReflacxDataset(
    **dataset_params_dict, split_str="train", transforms=get_transform(train=True),
)

val_dataset = ReflacxDataset(
    **dataset_params_dict, split_str="val", transforms=get_transform(train=False),
)

test_dataset = ReflacxDataset(
    **dataset_params_dict, split_str="test", transforms=get_transform(train=False),
)

batch_size = 2

train_dataloader = torch.utils.data.DataLoader(
    train_dataset, batch_size=batch_size, shuffle=True, collate_fn=collate_fn,
)

val_dataloader = torch.utils.data.DataLoader(
    val_dataset, batch_size=batch_size, shuffle=True, collate_fn=collate_fn,
)

test_dataloader = torch.utils.data.DataLoader(
    test_dataset, batch_size=batch_size, shuffle=True, collate_fn=collate_fn,
)
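
The `collate_fn` passed to each `DataLoader` is imported from `data.dataset`, and its body is not shown here. Detection batches usually need a custom one because each image carries a different number of boxes, so the default stacking would fail. A minimal sketch of the common pattern follows, assuming each dataset item is a tuple such as `(image, clinical_data, target)`. Both the item layout and the name `collate_fn_sketch` are assumptions.

```python
def collate_fn_sketch(batch):
    """Transpose a list of per-sample tuples into a tuple of per-field tuples."""
    return tuple(zip(*batch))


# Plain Python stand-ins for two dataset items with different box counts.
batch = [
    ("image_0", "clinical_0", {"boxes": [[0, 0, 10, 10]]}),
    ("image_1", "clinical_1", {"boxes": [[1, 1, 5, 5], [2, 2, 8, 8]]}),
]

images, clinical, targets = collate_fn_sketch(batch)
print(len(images), len(targets[1]["boxes"]))
```

Each field stays a tuple of per-sample objects, so nothing is forced into one fixed-size tensor.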


In [6]:
print(f"We used to have {len(detect_eval_dataset.df.dicom_id)}, after unifying, we will have {len(detect_eval_dataset.df.dicom_id.unique())}.")

We used to have 670, after unifying, we will have 590.


## Example instance from dataset
We show what's inside a single instance. It will provide:

- Images
- Clinical data
- Targets (Dictionary)

Inside the target, there are:

- boxes (bounding boxes of the abnormalities)
- labels (disease index; note that class **0** is the background)
- image_id (index used to retrieve the image)
- area (the area enclosed by each bounding box)
- iscrowd (whether a region holds a crowd of overlapping boxes; we assume none of the bounding boxes are crowds)

In [7]:
train_dataset[0]

torch.Size([1, 4])
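
As a quick check of how these fields relate, `area` can be recomputed from `boxes`. The sketch below uses plain Python lists in place of tensors and assumes boxes are stored as `[x1, y1, x2, y2]` in pixels. The box, label and image index are copied from the first entry of the target printed later in this notebook; the helper name `box_areas` is hypothetical.

```python
def box_areas(boxes):
    """Width times height for each [x1, y1, x2, y2] box."""
    return [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes]


target = {
    "boxes": [[1719.0, 1623.0, 2414.0, 2060.0]],
    "labels": [2],      # disease index; 0 is reserved for the background
    "image_id": [108],
    "iscrowd": [0],     # no box is treated as a crowd
}
target["area"] = box_areas(target["boxes"])
print(target["area"])
```

The result, 303715, matches the first `area` entry in that printed target.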

## Define Model.

We define the model here. Two backbone examples are in the code section below. MobileNet is a lightweight network, while ResNet is heavier but usually performs better. In our case, computational cost is not the most important factor; therefore, we chose a ResNet backbone with a feature pyramid network (FPN).

In [8]:
model = create_model_from_setup(detect_eval_dataset, model_setup, device)

{'rpn_nms_thresh': 0.3, 'box_detections_per_img': 6, 'box_nms_thresh': 0.2, 'rpn_score_thresh': 0.0, 'box_score_thresh': 0.05}
c1
None


## Prepare data to feed

We prepare three main inputs to test the model:

- CXR image
- Clinical data
- Target

For each input, we adjust the format to what the model expects.

In [9]:
data = next(iter(train_dataloader))
data = train_dataset.prepare_input_from_data(data, device)

Using unified.
Using unified.


In [10]:
data[-1]

[{'masks': tensor([[[0, 0, 0,  ..., 0, 0, 0],
           [0, 0, 0,  ..., 0, 0, 0],
           [0, 0, 0,  ..., 0, 0, 0],
           ...,
           [0, 0, 0,  ..., 0, 0, 0],
           [0, 0, 0,  ..., 0, 0, 0],
           [0, 0, 0,  ..., 0, 0, 0]],
  
          [[0, 0, 0,  ..., 0, 0, 0],
           [0, 0, 0,  ..., 0, 0, 0],
           [0, 0, 0,  ..., 0, 0, 0],
           ...,
           [0, 0, 0,  ..., 0, 0, 0],
           [0, 0, 0,  ..., 0, 0, 0],
           [0, 0, 0,  ..., 0, 0, 0]]], device='cuda:0', dtype=torch.uint8),
  'image_path': 'D:\\XAMI-MIMIC\\patient_11495809\\CXR-JPG\\s52705409\\0295a5c7-982330bd-2203b511-4d052b0f-a43a5e17.jpg',
  'dicom_id': '0295a5c7-982330bd-2203b511-4d052b0f-a43a5e17',
  'iscrowd': tensor([0, 0], device='cuda:0'),
  'area': tensor([303715., 707805.], device='cuda:0', dtype=torch.float64),
  'image_id': tensor([108], device='cuda:0'),
  'labels': tensor([2, 2], device='cuda:0'),
  'boxes': tensor([[1719., 1623., 2414., 2060.],
          [ 391., 1392., 1

# Test Feedforward (Training)

In [11]:
output = model(*data[:-1], targets=data[-1])



## Results we get.
Five different losses are returned in the result. We will use these losses to optimise the network during training.

In [12]:
output

{'loss_classifier': tensor(1.8260, device='cuda:0', grad_fn=<NllLossBackward0>),
 'loss_box_reg': tensor(0.0002, device='cuda:0', grad_fn=<DivBackward0>),
 'loss_mask': tensor(0.8756, device='cuda:0',
        grad_fn=<BinaryCrossEntropyWithLogitsBackward0>),
 'loss_objectness': tensor(0.7195, device='cuda:0',
        grad_fn=<BinaryCrossEntropyWithLogitsBackward0>),
 'loss_rpn_box_reg': tensor(0.0047, device='cuda:0', dtype=torch.float64, grad_fn=<DivBackward0>)}
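
A common way to use these losses is to add them into one scalar and backpropagate through it. The sketch below shows only the summation, with the tensor values above copied in as plain floats. In a real training step the dictionary values would stay tensors, and the sum would be followed by `backward()` and an optimiser step. Equal weighting of the terms is an assumption, not something this notebook specifies.

```python
# Loss values copied from the output above, as plain floats for illustration.
loss_dict = {
    "loss_classifier": 1.8260,
    "loss_box_reg": 0.0002,
    "loss_mask": 0.8756,
    "loss_objectness": 0.7195,
    "loss_rpn_box_reg": 0.0047,
}

# Unweighted sum of all loss terms (assumed weighting).
total_loss = sum(loss_dict.values())
print(f"{total_loss:.4f}")
```

With an untrained model the classifier and mask terms dominate the total, as they do here.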

# Test Feedforward (Evaluation)

In [13]:
model.eval()
pred = model(*data[:-1])

## Results we get.
If we set the model to evaluation mode and don't pass the targets to the forward function, the model outputs predictions (detections). In the sections below, we show what's inside the detection for the first instance (idx=0).

### Detection.

A detection contains *boxes*, *labels*, *scores*, and *masks*.

- *boxes*: All the bounding boxes for this image.
- *labels*: Labels corresponding to the bounding boxes.
- *scores*: Score (confidence) for each bounding box.
- *masks*: Mask predicted for each bounding box.

In [14]:
pred[0].keys()

dict_keys(['boxes', 'labels', 'scores', 'masks'])

In [15]:
pred[0]['boxes']

tensor([[ 468.9502, 1203.1854,  567.1108, 1426.9713],
        [2325.5161, 1316.1874, 2524.6357, 1799.9176],
        [2417.6809,  602.9283, 2519.1047,  829.5873],
        [ 532.8803, 1454.3362,  625.7414, 1654.4663],
        [   0.0000, 1011.0303, 1005.5333, 2017.4100],
        [   0.0000,   35.5040,  140.0621,  504.7518]], device='cuda:0',
       grad_fn=<StackBackward0>)

In [16]:
pred[0]['labels']

tensor([4, 3, 3, 4, 1, 1], device='cuda:0')

In [17]:
pred[0]['scores']

tensor([0.1922, 0.1879, 0.1859, 0.1859, 0.1854, 0.1841], device='cuda:0',
       grad_fn=<IndexBackward0>)
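
To read these detections more easily, each label index can be mapped back to a disease name and low-confidence boxes dropped. The sketch below copies the labels and scores above into plain lists. It assumes index `i` corresponds to `labels_cols[i - 1]`, with 0 reserved for the background, and the threshold of 0.19 is purely illustrative. Neither is confirmed by the code shown here, and `filter_detections` is a hypothetical helper.

```python
labels_cols = [
    "Enlarged cardiac silhouette",
    "Atelectasis",
    "Pleural abnormality",
    "Consolidation",
    "Pulmonary edema",
]

# Assumed mapping: 0 is the background, diseases start at index 1.
index_to_name = {0: "background"}
index_to_name.update({i + 1: name for i, name in enumerate(labels_cols)})


def filter_detections(labels, scores, score_thresh):
    """Keep (disease name, score) pairs whose score reaches the threshold."""
    return [
        (index_to_name[label], score)
        for label, score in zip(labels, scores)
        if score >= score_thresh
    ]


labels = [4, 3, 3, 4, 1, 1]
scores = [0.1922, 0.1879, 0.1859, 0.1859, 0.1854, 0.1841]

kept = filter_detections(labels, scores, score_thresh=0.19)
print(kept)
```

All six scores sit close together just below 0.2, so any threshold in that range changes the number of kept boxes sharply.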