## UNET Model for the CAMUS dataset using the MONAI platform

### Purpose

This notebook explores models that segment the left ventricle. I use the CAMUS dataset: the end-diastolic (ED) and end-systolic (ES) frames are the model input, and their corresponding ground-truth masks are the target output.

In [1]:
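!python -c "import monai" || pip install -qU "monai[ignite,nibabel,torchvision,tqdm]==1.1.0"
!python -c "import matplotlib" || pip install -q matplotlib
!python -c "import cv2" || pip install -q opencv-python
!python -c "import SimpleITK" || pip install -q SimpleITK
!python -c "import medpy" || pip install -q medpy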

In [2]:
!nvidia-smi

'nvidia-smi' is not recognized as an internal or external command,
operable program or batch file.


Import all the necessary packages.

In [2]:
import monai
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import SimpleITK as sitk
import medpy
import PIL.Image  # importing the submodule makes PIL.Image.open available
import os
import shutil
import tempfile
from pathlib import Path
import pprint



In [3]:
from monai.config import print_config
print_config()

MONAI version: 1.1.0
Numpy version: 1.24.2
Pytorch version: 1.13.1+cu117
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: a2ec3752f54bfc3b40e7952234fbeb5452ed63e3
MONAI __file__: /home/jyoti/Projects/Python/topeka-chapter-ejection-fraction/venv/lib/python3.8/site-packages/monai/__init__.py

Optional dependencies:
Pytorch Ignite version: 0.4.11
Nibabel version: 5.0.1
scikit-image version: 0.20.0
Pillow version: 9.4.0
Tensorboard version: 2.12.0
gdown version: 4.6.4
TorchVision version: 0.14.1+cu117
tqdm version: 4.65.0
lmdb version: 1.4.0
psutil version: 5.9.4
pandas version: 1.5.3
einops version: NOT INSTALLED or UNKNOWN VERSION.
transformers version: NOT INSTALLED or UNKNOWN VERSION.
mlflow version: NOT INSTALLED or UNKNOWN VERSION.
pynrrd version: 1.0.0

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies



Set the base directory where the data resides. If you are using Google Drive, set it here:

In [None]:
%env DATA_DIRECTORY=/content/drive/MyDrive/LVEF

If you are using a local drive, set the directory below:

In [5]:
%env DATA_DIRECTORY=/opt/Data

env: DATA_DIRECTORY=/opt/Data
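`%env` is an IPython magic, so it only works inside a notebook. A minimal plain-Python equivalent, assuming the same `DATA_DIRECTORY` variable and `/opt/Data` path used above, sets the variable through `os.environ` before it is read:

```python
import os
from pathlib import Path

# Plain-Python equivalent of `%env DATA_DIRECTORY=/opt/Data`.
# Unlike %env, setdefault keeps any value already exported in the shell.
os.environ.setdefault("DATA_DIRECTORY", "/opt/Data")

data_root = Path(os.environ["DATA_DIRECTORY"])
print(data_root)
```

Using `setdefault` rather than plain assignment is a design choice: it lets a shell-level export override the notebook default, which is convenient when the same code runs on different machines.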


In [6]:
# If the DATA_DIRECTORY environment variable is not set, fall back to a temporary directory.
directory = os.environ.get("DATA_DIRECTORY")
ROOT_DIR = Path(tempfile.mkdtemp()) if directory is None else Path(directory)
print(ROOT_DIR)

/opt/Data


In [7]:
def checkPathExists(path):
    if not os.path.exists(path):
        print(f"Cannot access path: {path}")
    else:
        print(f"Path {path} accessible")

Extract the CAMUS dataset. The code expects CAMUS.zip to have already been downloaded into the data directory; extraction is skipped if the unzipped data is already present.

In [8]:
pp = pprint.PrettyPrinter()
from monai.utils import set_determinism
set_determinism(seed=0)
from monai.apps import download_and_extract, extractall

In [9]:
CAMUS_ORIGINAL_DATA_DIR = 'CAMUS/original_data/data'
CAMUS_DATA_DIR = 'New_CAMUS_png/CAMUS'

In [10]:
# May not work on Windows
#resource = "https://scholar.cu.edu.eg/Dataset_BUSI.zip"
compressed_file = ROOT_DIR.joinpath("CAMUS.zip")
DATA_DIR = ROOT_DIR.joinpath(CAMUS_DATA_DIR)
if not os.path.exists(DATA_DIR):
    extractall(compressed_file, ROOT_DIR)
checkPathExists(DATA_DIR)

Path /opt/Data/New_CAMUS_png/CAMUS accessible
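The cell above relies on `monai.apps.extractall`. If MONAI is not available, a rough stdlib equivalent can be sketched with `zipfile`. It assumes CAMUS.zip is a plain ZIP archive that produces `New_CAMUS_png/CAMUS` when extracted into the data root; the helper name `extract_if_missing` is hypothetical, and unlike MONAI's function it does no hash check.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_if_missing(archive: Path, output_dir: Path, expected_dir: Path) -> bool:
    """Hypothetical helper: unzip `archive` into `output_dir` unless
    `expected_dir` already exists. Returns True if extraction ran."""
    if expected_dir.exists():
        return False
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(output_dir)
    return True


# Usage on a throwaway archive that mimics the expected layout.
root = Path(tempfile.mkdtemp())
archive = root / "CAMUS.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("New_CAMUS_png/CAMUS/Training/placeholder.txt", "demo")

data_dir = root / "New_CAMUS_png" / "CAMUS"
first = extract_if_missing(archive, root, data_dir)   # extracts the archive
second = extract_if_missing(archive, root, data_dir)  # skipped, directory exists
print(first, second)  # True False
```

Checking for the extracted directory rather than the archive mirrors the notebook's logic, so re-running the cell is cheap once the data is unpacked.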


Organize the file names used for training and testing into Python lists for easy access when needed.

In [11]:
TRAINING_DATA_DIR = DATA_DIR.joinpath('Training')
TESTING_DATA_DIR = DATA_DIR.joinpath('Testing')
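# Apical view identifiers used in the file names: 2CH = two-chamber, 4CH = four-chamber
# (despite the constant names, these are chamber views, not image channels).
TWO_CHANNEL = '2CH'
FOUR_CHANNEL = '4CH'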
PHASE_NAMES = ['ED', 'ES']

In [12]:
### Build the file list, one inner list of (image, mask) pairs per phase:
#[
#   ED [(input_file, mask_file), (input_file, mask_file), ....]
#   ES [(input_file, mask_file), (input_file, mask_file), ....]
#]
def data_directories(data_path, class_names, chamber_view):
    # Sort patient folders so the pair order is reproducible across runs.
    patient_list = sorted(x for x in data_path.iterdir() if x.is_dir())

    image_files_list = [
        [
            # The mask has the image's file name with "_gt" before the extension,
            # e.g. ..._ED.png -> ..._ED_gt.png. Only the file name is changed, so a
            # phase string appearing elsewhere in the path cannot corrupt it.
            (p, p.with_name(f"{p.stem}_gt{p.suffix}"))
            for x in patient_list
            for p in sorted(x.glob(f"**/{chamber_view}*{phase}.png"))
        ]
        for phase in class_names
    ]
    return image_files_list
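To see the pairing rule on something small, the sketch below builds a throwaway directory tree and pairs each image with its mask. It assumes the naming convention implied by the code above (the mask is the image file name with `_gt` appended before `.png`); the patient folders and file names in the toy tree are made up, and `pair_images_with_masks` is a hypothetical stand-in for `data_directories`. It also checks that every mask actually exists on disk, which the list comprehension above does not do.

```python
import tempfile
from pathlib import Path


def pair_images_with_masks(data_path, phases, view):
    """Hypothetical helper mirroring data_directories: one list of
    (image, mask) pairs per phase, with patients visited in sorted order."""
    patients = sorted(p for p in data_path.iterdir() if p.is_dir())
    return [
        [
            (img, img.with_name(f"{img.stem}_gt{img.suffix}"))
            for patient in patients
            for img in sorted(patient.glob(f"**/{view}*{phase}.png"))
        ]
        for phase in phases
    ]


# Build a tiny fake dataset: two patients, each with ED/ES images plus masks.
root = Path(tempfile.mkdtemp())
for patient in ("patient0001", "patient0002"):
    folder = root / patient
    folder.mkdir()
    for phase in ("ED", "ES"):
        (folder / f"2CH_{phase}.png").touch()
        (folder / f"2CH_{phase}_gt.png").touch()

pairs = pair_images_with_masks(root, ["ED", "ES"], "2CH")
# Collect any mask paths that were derived but do not exist.
missing = [mask for phase_list in pairs for _, mask in phase_list if not mask.exists()]
print([len(p) for p in pairs], missing)  # [2, 2] []
```

The glob `2CH*ED.png` does not pick up the `_gt` masks, because their names end in `_gt.png`; that is what keeps images and masks in separate slots of each pair.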

In [13]:
training_2chamber_image_files = data_directories(TRAINING_DATA_DIR, PHASE_NAMES, TWO_CHANNEL)
training_4chamber_image_files = data_directories(TRAINING_DATA_DIR, PHASE_NAMES, FOUR_CHANNEL)
testing_2chamber_image_files = data_directories(TESTING_DATA_DIR, PHASE_NAMES, TWO_CHANNEL)
testing_4chamber_image_files = data_directories(TESTING_DATA_DIR, PHASE_NAMES, FOUR_CHANNEL)
#pp.pprint(training_2chamber_image_files[1])

In [14]:
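def data_description(image_files_list):
    # Count the (image, mask) pairs in the first phase list (ED); ES has its own list.
    num_total = len(image_files_list[0])
    # Report the size of the first image only; all images are assumed to share it.
    with PIL.Image.open(image_files_list[0][0][0]) as img:
        image_width, image_height = img.size
    print(f"Total Image Count: {num_total}")
    print(f"Image Dimensions: {image_width} x {image_height}")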
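`data_description` reads the size of the first image only. One way to confirm that every file in a list has the same size, without decoding pixels, is to read the width and height straight from the PNG header: per the PNG specification, the 8-byte signature is followed by the IHDR chunk (4-byte length, the bytes `IHDR`, then a 4-byte big-endian width and a 4-byte big-endian height), so the dimensions sit at bytes 16 to 24. The helper names `png_size` and `size_histogram` are hypothetical.

```python
import struct
import tempfile
from collections import Counter
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    return struct.unpack(">II", header[16:24])


def size_histogram(paths):
    """Count how many files have each (width, height)."""
    return Counter(png_size(p) for p in paths)


# Usage with two fake PNG headers (only the first 24 bytes are read).
tmp = Path(tempfile.mkdtemp())
files = []
for name, (w, h) in [("a.png", (256, 256)), ("b.png", (256, 128))]:
    ihdr_start = struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", w, h)
    (tmp / name).write_bytes(PNG_SIGNATURE + ihdr_start)
    files.append(tmp / name)

# Two different sizes, one file each, so these images are not uniform.
print(size_histogram(files))
```

With the notebook's lists, something like `size_histogram(p for p, _ in training_2chamber_image_files[0])` would be expected to return a single entry if every ED image matches the size of the first one.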

In [15]:
for label, files in [
    ("Two Chamber Training", training_2chamber_image_files),
    ("Four Chamber Training", training_4chamber_image_files),
    ("Two Chamber Testing", testing_2chamber_image_files),
    ("Four Chamber Testing", testing_4chamber_image_files),
]:
    print(f"{label} Data Count")
    data_description(files)
    print("-------------")

Two Chamber Training Data Count
Total Image Count: 400
Image Dimensions: 256 x 256
-------------
Four Chamber Training Data Count
Total Image Count: 400
Image Dimensions: 256 x 256
-------------
Two Chamber Testing Data Count
Total Image Count: 50
Image Dimensions: 256 x 256
-------------
Four Chamber Testing Data Count
Total Image Count: 50
Image Dimensions: 256 x 256
-------------


Let's also load the patient information files.

In [17]:
TRAINING_2CH_INFO_DIR = TRAINING_DATA_DIR.joinpath('training_2ch_info')
TRAINING_4CH_INFO_DIR = TRAINING_DATA_DIR.joinpath('training_4ch_info')
TESTING_2CH_INFO_DIR = TESTING_DATA_DIR.joinpath('testing_2ch_info')
TESTING_4CH_INFO_DIR = TESTING_DATA_DIR.joinpath('testing_4ch_info')
checkPathExists(TRAINING_2CH_INFO_DIR)
checkPathExists(TRAINING_4CH_INFO_DIR)
checkPathExists(TESTING_2CH_INFO_DIR)
checkPathExists(TESTING_4CH_INFO_DIR)

Path /opt/Data/New_CAMUS_png/CAMUS/Training/training_2ch_info accessible
Path /opt/Data/New_CAMUS_png/CAMUS/Training/training_4ch_info accessible
Path /opt/Data/New_CAMUS_png/CAMUS/Testing/testing_2ch_info accessible
Path /opt/Data/New_CAMUS_png/CAMUS/Testing/testing_4ch_info accessible
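The .cfg files are plain `key: value` text, so every value is read back as a string. A minimal sketch that converts the fields to numbers and cross-checks the stored ejection fraction against the standard definition EF = 100 * (EDV - ESV) / EDV is shown below. The field names come from the records printed by the next cell; the helper names `parse_info` and `ejection_fraction` and the 0.5-point tolerance are assumptions, the tolerance chosen because `LVedv` and `LVesv` are stored rounded to one decimal place.

```python
INT_FIELDS = {"ED", "ES", "NbFrame", "Age"}
FLOAT_FIELDS = {"LVedv", "LVesv", "LVef"}


def parse_info(text):
    """Parse 'key: value' lines into a dict with numeric fields converted."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split(": ", 1)
        if key in INT_FIELDS:
            info[key] = int(value)
        elif key in FLOAT_FIELDS:
            info[key] = float(value)
        else:
            info[key] = value
    return info


def ejection_fraction(edv, esv):
    """Ejection fraction in percent: 100 * (EDV - ESV) / EDV."""
    return 100.0 * (edv - esv) / edv


# Sample text reconstructed from the first record printed below.
sample = """ED: 1
ES: 16
NbFrame: 16
Sex: M
Age: 79
ImageQuality: Good
LVedv: 45.7
LVesv: 17.9
LVef: 60.9"""

info = parse_info(sample)
ef = ejection_fraction(info["LVedv"], info["LVesv"])
print(round(ef, 1), info["LVef"])  # 60.8 60.9
# Loose comparison: the stored volumes are rounded, so small gaps are expected.
consistent = abs(ef - info["LVef"]) <= 0.5
```

The small mismatch (60.8 recomputed versus 60.9 stored) is consistent with the stored EF having been computed from unrounded volumes.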


In [27]:
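# Each .cfg file holds "key: value" lines (frame indices, patient data, volumes, EF).
for file in TRAINING_2CH_INFO_DIR.glob("**/*.cfg"):
    with open(file) as f:
        # Split on the first ": " only and skip blank lines.
        data = dict(line.split(": ", 1) for line in f.read().splitlines() if line.strip())
    print(data)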

{'ED': '1', 'ES': '16', 'NbFrame': '16', 'Sex': 'M', 'Age': '79', 'ImageQuality': 'Good', 'LVedv': '45.7', 'LVesv': '17.9', 'LVef': '60.9'}
{'ED': '1', 'ES': '19', 'NbFrame': '19', 'Sex': 'F', 'Age': '87', 'ImageQuality': 'Good', 'LVedv': '97.7', 'LVesv': '45.0', 'LVef': '54.0'}
{'ED': '1', 'ES': '22', 'NbFrame': '22', 'Sex': 'F', 'Age': '63', 'ImageQuality': 'Good', 'LVedv': '62.1', 'LVesv': '24.5', 'LVef': '60.6'}
{'ED': '1', 'ES': '17', 'NbFrame': '17', 'Sex': 'M', 'Age': '77', 'ImageQuality': 'Medium', 'LVedv': '64.1', 'LVesv': '20.1', 'LVef': '68.6'}
{'ED': '1', 'ES': '23', 'NbFrame': '23', 'Sex': 'F', 'Age': '82', 'ImageQuality': 'Good', 'LVedv': '40.3', 'LVesv': '15.7', 'LVef': '61.1'}
{'ED': '1', 'ES': '14', 'NbFrame': '14', 'Sex': 'F', 'Age': '28', 'ImageQuality': 'Good', 'LVedv': '70.5', 'LVesv': '30.6', 'LVef': '56.6'}
{'ED': '1', 'ES': '15', 'NbFrame': '15', 'Sex': 'M', 'Age': '80', 'ImageQuality': 'Medium', 'LVedv': '69.4', 'LVesv': '30.9', 'LVef': '55.5'}
{'ED': '1', 'ES'