# Contrastive Language Image Pre-Training (CLIP) on Radiology Objects in COntext (ROCO)

## Import libraries and data
Before running the notebook:
- Go to "Runtime" > "Change runtime type" and select a GPU-based runtime;
- Upload the `resized_train.zip`, `caption_prediction_train.csv`, and `concept_detection_train.csv` files to the Files pane.
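
A quick check before running the remaining cells can catch a missing upload early. The sketch below is not part of the original notebook: the helper name `missing_files` is hypothetical, and it assumes the three files sit in the current working directory.

```python
from pathlib import Path

# File names taken from the setup steps above.
REQUIRED_FILES = [
    "resized_train.zip",
    "caption_prediction_train.csv",
    "concept_detection_train.csv",
]

def missing_files(names, folder="."):
    """Return the names that are not present in `folder` (hypothetical helper)."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]

missing = missing_files(REQUIRED_FILES)
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files found.")
```

Once TensorFlow is imported, `tf.config.list_physical_devices("GPU")` returns an empty list when no GPU runtime is attached, which gives a similar check for the first step.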

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
from PIL import Image
import matplotlib.pyplot as plt

tfk = tf.keras
tfkl = tfk.layers

Extract the images from `resized_train.zip`:

In [None]:
!unzip resized_train.zip

Load textual data:

In [33]:
# Load the captions
captions = pd.read_csv("caption_prediction_train.csv", sep="\t", index_col="ID")

# Load the labels
labels = pd.read_csv("concept_detection_train.csv", sep="\t", index_col="ID")
# Each label string contains multiple labels separated by semicolons.
# Convert each string into a list of labels.
labels["cuis"] = labels["cuis"].str.split(pat=";")
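
To make the effect of the split concrete, here is a minimal sketch on a toy table. The IDs and concept codes are made up for illustration and do not come from the dataset; only the column name `cuis` and the index name `ID` follow the cell above.

```python
import pandas as pd

# Toy stand-in for concept_detection_train.csv; IDs and codes are made up.
toy = pd.DataFrame(
    {"cuis": ["C0000001;C0000002", "C0000003"]},
    index=pd.Index(["img_a", "img_b"], name="ID"),
)

# Same transformation as above: one string per row becomes a list of labels.
toy["cuis"] = toy["cuis"].str.split(pat=";")

cuis_img_a = toy.loc["img_a", "cuis"]
print(cuis_img_a)                      # list of labels for one image
print(toy["cuis"].str.len().tolist())  # number of labels per image
```

After the split, each cell holds a Python list, so `.str.len()` counts labels rather than characters.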

We explore the images to check whether they are all 128x128 pixels and how many channels they have:

In [3]:
max_channels = 1
# "ID" is the index of `labels` (set via index_col), so iterate over the index.
for image_id in labels.index:
  image_name = "resized_train/" + image_id + ".jpg"
  img = np.array(Image.open(image_name))
  # Grayscale images load as 2-D arrays; colour images have a third axis.
  if img.ndim == 3 and img.shape[2] > max_channels:
    max_channels = img.shape[2]
  # Check if the image has the expected resolution of 128x128
  if img.shape[0] != 128 or img.shape[1] != 128:
    print(f"Error in {image_name}: its resolution is " +
      f"{img.shape[0]}x{img.shape[1]}, while it should be 128x128")

print(f"The maximum number of channels is {max_channels}.")

The maximum number of channels is 3.
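
The same check can be done without converting every image to a NumPy array, because Pillow exposes the size and the bands of an image from its header. The following is a sketch of that alternative, not the notebook's method; `inspect_image` is a hypothetical helper, and the demo uses a synthetic in-memory image rather than a file from the dataset.

```python
import io

from PIL import Image

def inspect_image(source, expected_size=(128, 128)):
    """Return (number of bands, whether the size matches) for one image.

    Hypothetical helper: `source` is a path or a file-like object.
    Pillow opens images lazily, so the pixel data is not decoded here.
    """
    with Image.open(source) as img:
        return len(img.getbands()), img.size == expected_size

# Demo on a synthetic grayscale image kept in memory.
buffer = io.BytesIO()
Image.new("L", (128, 128)).save(buffer, format="PNG")
buffer.seek(0)
print(inspect_image(buffer))  # (1, True)
```

Note that `img.size` is `(width, height)`, whereas the array shape used above is `(height, width, channels)`; for square images the distinction does not matter.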


We load all the images as a batched `tf.data.Dataset`, keeping the file order fixed (`shuffle=False`):

In [13]:
images = tfk.utils.image_dataset_from_directory("./resized_train/",
                                                batch_size=32,
                                                labels=None,
                                                label_mode=None,
                                                image_size=(128,128),
                                                shuffle=False,
                                                color_mode="rgb")

Found 83275 files belonging to 1 classes.
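
With `shuffle=False`, the files are loaded in alphanumeric order, and the returned dataset exposes that order through its `file_paths` attribute. Because the captions and labels are indexed by image ID, it may be useful to recover the IDs in dataset order. The sketch below assumes that each ID equals the file name without its extension, as the path pattern in the exploration cell suggests; `ids_from_paths` is a hypothetical helper and the example paths are made up.

```python
from pathlib import Path

def ids_from_paths(file_paths):
    """Map image paths to IDs by dropping the folder and the extension
    (hypothetical helper)."""
    return [Path(p).stem for p in file_paths]

# Made-up paths standing in for `images.file_paths`.
example_paths = ["./resized_train/img_b.jpg", "./resized_train/img_a.jpg"]
example_ids = ids_from_paths(sorted(example_paths))
print(example_ids)
```

In the notebook this could be used as `captions.loc[ids_from_paths(images.file_paths)]` to put the captions in the same order as the images, assuming every file stem appears in the `ID` index.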


Set the random seed for reproducibility:

In [4]:
seed = 24948989491

rng = np.random.default_rng(seed)
tf.random.set_seed(seed)
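
As a small illustration of what the seed buys, two NumPy generators created from the same seed produce the same stream of numbers. This is a sketch for the NumPy side only; the drawn range and sample size are arbitrary choices for the demo.

```python
import numpy as np

seed = 24948989491  # same value as in the cell above

# Two generators built from the same seed yield the same stream.
first = np.random.default_rng(seed).integers(0, 100, size=5)
second = np.random.default_rng(seed).integers(0, 100, size=5)

same_stream = bool(np.array_equal(first, second))
print(same_stream)  # True
```

Results stay reproducible only if `rng` is used for every NumPy draw in the notebook and the cells are run in the same order.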

## Build the CLIP model