## AIMI High School Internship 2024
### Notebook 2: Training a Computer Vision Model to Classify Pneumonia

**The Problem**: Given a chest X-ray, our goal in this project is to classify the image into one of four classes: **pneumonia, pneumothorax, pleural effusion**, and **normal**.  

**Your Second Task**: You should now have a training dataset consisting of (a) chest X-rays and (b) labels extracted from radiologist reports processed using NLP or a similar technique. Now, your goal is to train a computer vision model to classify the images. You have **two options** for this task, and you may attempt one or both of these:
- *Standard Classification*: Train a model to predict which class a chest X-ray belongs to using image-only derived features.
- *Classification w/ Metadata (stretch)*: Train a model that predicts which class a chest X-ray belongs to using image and additional patient metadata-derived features.

In this notebook, we provide some simple starter code to get you started on training a computer vision model. You are not required to use this template - feel free to modify as you see fit.

**Submitting Your Model**: We have created a leaderboard where you can submit your model and view results on the held-out test set. We provide instructions below for submitting your model to the leaderboard. **Please follow these directions carefully**.

We will evaluate your results on the held-out test set with the following evaluation metrics:
- **Accuracy**: the ratio of correctly predicted observations to the total observations. It tells us the proportion of true results (both true positives and true negatives) among the total number of cases examined. While straightforward, accuracy can be misleading in the context of imbalanced datasets where the number of observations in different classes varies significantly.
- **AUROC (Area Under the Receiver Operating Characteristic curve)**: a performance measurement for classification problems at various threshold settings. It tells us how well a model is capable of distinguishing between classes. The higher the AUROC, the better the model is at predicting 0s as 0s and 1s as 1s. An AUROC of 0.5 suggests no discriminative ability (equivalent to random guessing), while an AUROC of 1.0 indicates perfect discrimination.
- **Precision**: the ratio of correctly predicted positive observations to the total predicted positives. It is a measure of a classifier's exactness. High precision indicates a low false positive rate. It's particularly useful when the costs of False Positives are high.
- **Recall**: (also known as sensitivity) the ratio of correctly predicted positive observations to all observations that are actually positive. It is a measure of a classifier's completeness. High recall indicates that most positive cases are correctly recognized (a low number of False Negatives).
- **F1**: the harmonic mean of precision and recall. It's a way to combine both precision and recall into a single measure that captures both properties. This score can be particularly useful if you need to balance precision and recall, which is often the case in uneven class distribution scenarios. The F1 score reaches its best value at 1 (perfect precision and recall) and worst at 0.
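
As a rough illustration of the definitions above, the sketch below computes accuracy, precision, recall, and F1 from confusion-matrix counts, and AUROC as the fraction of (positive, negative) pairs that are ranked correctly. The function names and toy numbers are made up for illustration; in practice `sklearn.metrics` provides equivalent functions.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auroc(scores, labels):
    """Share of (positive, negative) pairs where the positive scores higher.

    Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced example: 10 positives among 100 cases.
m = binary_metrics(tp=6, fp=2, fn=4, tn=88)
print(m)

# Toy scores: one positive (0.35) is ranked below one negative (0.6).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(auroc(toy_scores, toy_labels))
```

In the toy example accuracy is 0.94 even though recall is only 0.6, which is the imbalance caveat noted for accuracy.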

## Load Data
Before you begin, make sure to go to `Runtime` > `Change Runtime Type` and select a T4 GPU.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import os
os.chdir(r'/content/drive/MyDrive/Cody - AIMI 2024/2024 AIMI Summer Internship - Intern Materials/Datasets')

In [None]:
!unzip -qq student_data_split.zip -d /content/

In [None]:
# Switch back to /content/student_data_split folder to work with downloaded datasets
os.chdir(r'/content/student_data_split')

In [None]:
# Confirm we can now see the student_test and student_train folders + Reports.json
!ls

Reports.json  student_test  student_train


## Import Libraries
We are leveraging the PyTorch framework to train our models. For more information and tutorials on PyTorch, see this link: https://pytorch.org/tutorials/beginner/basics/intro.html

In [None]:
%%capture
%pip install "comet_ml>=3.38.0" torch torchvision tqdm
from comet_ml import Experiment
from comet_ml.integration.pytorch import watch

In [None]:
# Some libraries that you may find useful are included here.
# To import a library that isn't provided with Colab, use the following command: !pip install <package_name>
import torch
import pandas as pd
from PIL import Image
import numpy as np
from tqdm import tqdm
from torchvision.transforms import v2
from torch import nn
from torchvision import models


In [None]:
# Load your image paths and extracted labels from your saved file
#dataset = pd.read_pickle("/content/drive/MyDrive/Cody - AIMI 2024/conditionsDf.pkl")
dataframe = pd.read_pickle("/content/drive/MyDrive/Cody - AIMI 2024/train_data.pkl")


# Display the first few rows of the DataFrame to confirm it's loaded correctly
dataframe.head()

Unnamed: 0,Patient ID,Study ID,Image Path,Label,Encoded Labels
0,patient39668,student_train/patient39668/study2,student_train/patient39668/study2/view1_fronta...,normal,"[0.0, 0.0, 0.0, 1.0]"
1,patient17014,student_train/patient17014/study2,student_train/patient17014/study2/view1_fronta...,pneumothorax,"[0.0, 1.0, 0.0, 0.0]"
2,patient11443,student_train/patient11443/study1,student_train/patient11443/study1/view1_fronta...,pneumothorax,"[0.0, 1.0, 0.0, 0.0]"
3,patient29294,student_train/patient29294/study1,student_train/patient29294/study1/view1_fronta...,"pneumothorax, pleural effusion","[0.0, 1.0, 1.0, 0.0]"
4,patient34615,student_train/patient34615/study71,student_train/patient34615/study71/view1_front...,pleural effusion,"[0.0, 0.0, 1.0, 0.0]"


## Create Dataloaders
We will implement a custom Dataset class to load in data. A custom Dataset class must have three methods: `__init__`, which sets up any class variables, `__len__`, which defines the total number of images, and `__getitem__`, which returns a single image and its paired label.

In [None]:
from torch.utils.data import Dataset

class ChestXRayDataset(Dataset):
    def __init__(self, dataframe, transforms):
        #super(ChestXRayDataset, self).__init__(**kwargs)

        self.dataframe = dataframe
        self.transforms = transforms

    def __len__(self):
        return len(self.dataframe)


    def __getitem__(self, idx):
        out_dict = {"idx": torch.tensor(idx),}

        image_path = self.dataframe.loc[idx,'Image Path']
        labels = self.dataframe.loc[idx,'Encoded Labels']

        image = Image.open(image_path).convert("RGB")
        #image = torch.tensor(image, dtype=torch.float32)
        if(self.transforms is not None):
            image = self.transforms(image)

        out_dict["img"] = image
        out_dict["label"] = torch.tensor(labels, dtype=torch.float32)

        #return out_dict
        return out_dict["img"], out_dict["label"]
        #image, target

## Define Training Components
Here, define any necessary components that you need to train your model, such as the model architecture, the loss function, and the optimizer.
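
One pairing to check when choosing these components: `torch.nn.BCEWithLogitsLoss` applies a sigmoid internally and so expects raw logits, whereas `torch.nn.BCELoss` expects probabilities. The plain-Python sketch below (the helper names are hypothetical, not PyTorch functions) shows what happens when already-sigmoided outputs are fed to a logits-based loss: the effective probability is squashed into roughly 0.5 to 0.73, so the loss cannot approach zero even for a confident, correct prediction.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce(prob, target):
    """Binary cross-entropy on a probability (per-element BCELoss)."""
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


def bce_with_logits(logit, target):
    """Binary cross-entropy on a raw logit (sigmoid applied inside the loss)."""
    return bce(sigmoid(logit), target)


logit = 6.0            # a confident positive prediction
prob = sigmoid(logit)  # about 0.9975

# Logits passed to a logits-based loss: the intended pairing.
correct_loss = bce_with_logits(logit, 1.0)
# Probabilities passed to a logits-based loss: the sigmoid is applied twice.
double_sigmoid_loss = bce_with_logits(prob, 1.0)

print(f"correct pairing: {correct_loss:.4f}")
print(f"double sigmoid:  {double_sigmoid_loss:.4f}")
```

With the double sigmoid, the loss for a positive target can never fall below `-log(sigmoid(1))`, about 0.313, which weakens the training signal.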

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [None]:
class Resnext50(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
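        # `pretrained=True` is deprecated in recent torchvision; this requests the same ImageNet weights.
        resnet = models.resnext50_32x4d(weights=models.ResNeXt50_32X4D_Weights.IMAGENET1K_V1)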
        resnet.fc = nn.Sequential(
            nn.Dropout(p=0.2),
            nn.Linear(in_features=resnet.fc.in_features, out_features=n_classes)
        )
        self.base_model = resnet
        self.sigm = nn.Sigmoid()

    def forward(self, x):
        return self.sigm(self.base_model(x))

Weighted classes for loss function to deal with imbalanced classes (from Harjyot)
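
A minimal sketch of the inverse-frequency weighting used in the cell below: each class weight is the reciprocal of its frequency, normalized so that the weights sum to 1. The counts are the ones listed in that (commented-out) cell and are assumed to follow the notebook's label order; the helper name is hypothetical.

```python
def inverse_frequency_weights(class_counts):
    """Weight each class by 1/count, then normalize so the weights sum to 1."""
    inverse = [1.0 / count for count in class_counts]
    total = sum(inverse)
    return [value / total for value in inverse]


# Assumed order: pneumonia, pneumothorax, pleural effusion, normal
class_counts = [449, 1155, 1641, 13525]
class_weights = inverse_frequency_weights(class_counts)
print([round(w, 4) for w in class_weights])
```

The rarest class receives the largest weight. A tensor built from these values could then be passed as the `weight` argument of the loss, as the commented-out code does.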


In [None]:
#Class frequencies
#class_frequencies = np.array([449, 1155, 1641, 13525])

#Inverse of class frequencies
#class_weights = 1.0 / class_frequencies
#Normalize class weights
#class_weights /= class_weights.sum()

#print("Class weights:", class_weights)

#class_weights_tensor = torch.tensor(class_weights, dtype=torch.float32).to(device)

Class weights: [0.58977703 0.22927263 0.16137105 0.01957929]


In [None]:
#hyperparameters

batch_size = 64
k_folds = 5
num_epochs_per_k = 5
learning_rate = 1e-4

current_epoch = 0


In [None]:
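#loss_fn = torch.nn.BCELoss(weight = class_weights_tensor)
# Resnext50.forward() already applies a sigmoid, so the loss must accept probabilities.
# BCEWithLogitsLoss would apply a second sigmoid on top of the model's output.
loss_fn = torch.nn.BCELoss()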

transforms = v2.Compose([
    v2.ToImage(),
    v2.Resize(size=(224, 224), antialias=True),
    #v2.RandomHorizontalFlip(p=0.5),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

train_dataset = ChestXRayDataset(dataframe, transforms)
#dataloader = torch.utils.data.DataLoader(dataset=dataset, batch_size="""Customize batch size""", shuffle=True, drop_last=True)

num_classes = 4
label_space = ['pneumonia', 'pneumothorax', 'pleural effusion', 'normal']

model = Resnext50(num_classes)
model.to(device)
opt = torch.optim.AdamW(model.parameters(), lr=learning_rate) # AdamW is a commonly-used optimizer. Feel free to modify.



Downloading: "https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth" to /root/.cache/torch/hub/checkpoints/resnext50_32x4d-7cdf4587.pth
100%|██████████| 95.8M/95.8M [00:01<00:00, 86.8MB/s]


In [None]:
experiment = Experiment(
  api_key="REDACTED",
  project_name="aimi2024-resnext50",
  workspace="summit"
)
watch(model)


## Visualizations
- Create some visualizations to highlight model performance e.g. `multilabel_confusion_matrix`, plot of train vs val loss history, plot of train vs val accuracy history.

In [None]:
current_epoch

0

## Training Code
We provide starter code below that implements a simple training loop in PyTorch. Feel free to modify as you see fit.

In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score, multilabel_confusion_matrix, roc_auc_score

'''
def calculate_metrics_original(pred, target, threshold=0.5):
    pred = np.array(pred > threshold, dtype=float)
    return {'micro/precision': precision_score(y_true=target, y_pred=pred, average='micro'),
            'micro/recall': recall_score(y_true=target, y_pred=pred, average='micro'),
            'micro/f1': f1_score(y_true=target, y_pred=pred, average='micro'),
            'macro/precision': precision_score(y_true=target, y_pred=pred, average='macro'),
            'macro/recall': recall_score(y_true=target, y_pred=pred, average='macro'),
            'macro/f1': f1_score(y_true=target, y_pred=pred, average='macro'), #stick with macro
            'samples/precision': precision_score(y_true=target, y_pred=pred, average='samples'),
            'samples/recall': recall_score(y_true=target, y_pred=pred, average='samples'),
            'samples/f1': f1_score(y_true=target, y_pred=pred, average='samples'),
            }
'''
def calculate_metrics(pred, target, threshold=0.5):
    thresholded_preds = np.empty_like(pred)
    thresholded_preds[:] = pred
    thresholded_preds = np.array(thresholded_preds > threshold, dtype=float)

    f1 = f1_score(y_true=target, y_pred=thresholded_preds, average=None)
    f1_macro = f1_score(y_true=target, y_pred=thresholded_preds, average='macro')

    auc = roc_auc_score(y_true=target, y_score=pred, average=None)
    auc_macro = roc_auc_score(y_true=target, y_score=pred, average='macro')

    return {'f1': f1, 'f1_macro': f1_macro, 'auc': auc, 'auc_macro': auc_macro}

In [None]:
def train(model, loss_fn, train_loader, opt, max_epoch, current_epoch):
    """Train for max_epoch epochs and return the updated running epoch counter."""
    for epoch in range(max_epoch):
        current_epoch += 1
        print(f"Training epoch {current_epoch}")

        model.train()

        for inputs, targets in tqdm(train_loader):
            inputs, targets = inputs.to(device), targets.to(device)
            opt.zero_grad()

            output = model(inputs)
            loss = loss_fn(output, targets)

            loss.backward()
            opt.step()

    return current_epoch


In [None]:
#https://stackoverflow.com/questions/42703500/how-do-i-save-a-trained-model-in-pytorch
#https://pytorch.org/tutorials/beginner/saving_loading_models.html

In [None]:
from sklearn.model_selection import KFold

kf = KFold(n_splits=k_folds, shuffle=True)
fold_results = {}
threshold = 0.5

best_macro_f1 = 0.0 #higher f1 score is better, 1 is best

for fold, (train_idx, val_idx) in enumerate(kf.split(train_dataset)):
    print(f"Fold {fold + 1}")
    print("-------")

    train_subsampler = torch.utils.data.SubsetRandomSampler(train_idx)
    val_subsampler = torch.utils.data.SubsetRandomSampler(val_idx)

    train_loader = torch.utils.data.DataLoader(
        dataset=train_dataset,
        batch_size=batch_size,
        sampler=train_subsampler,
    )
    val_loader = torch.utils.data.DataLoader(
        dataset=train_dataset,
        batch_size=batch_size,
        sampler=val_subsampler,
    )

    current_epoch = train(model, loss_fn, train_loader, opt, max_epoch=num_epochs_per_k, current_epoch=current_epoch)


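    # Evaluate on this fold's validation split.
    # Note: the same model keeps training across folds, so from fold 2 onward the validation
    # images were already used for training in earlier folds and the metrics are optimistic.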
    print("Evaluating fold...")

    save_path = f'/content/drive/MyDrive/Cody - AIMI 2024/Trains/model-fold-{fold+1}.pth'
    torch.save(model.state_dict(), save_path)  # Save model weights for inference
    model.eval()

    correct, total = {"total":0, 0:0, 1:0, 2:0, 3:0},{"total":0, 0:0, 1:0, 2:0, 3:0}
    with torch.no_grad():
        total_results = []
        total_targets = []

        for index, (data, target) in enumerate(tqdm(val_loader)):
            data, target = data.to(device), target.to(device)
            output = model(data)

            total_results.extend(output.cpu().numpy())
            total_targets.extend(target.cpu().numpy())


            predicted = (output > threshold).int()  # Get a binary mask for predicted classes
            # sklearn expects (y_true, y_pred); passing them in the other order swaps fp and fn
            confusion_matrices = multilabel_confusion_matrix(y_true=target.cpu().numpy(), y_pred=predicted.cpu().numpy())
            for i, matrix in enumerate(confusion_matrices):
                tn, fp, fn, tp = matrix.ravel()
                accuracy = (tp + tn) / (tp + tn + fp + fn)
                correct[i] += tp + tn
                total[i] += tp + tn + fp + fn
                correct["total"] += tp + tn
                total["total"] += tp + tn + fp + fn
                #print(f"Accuracy for Class {i}: {accuracy * 100:.2f}%")

        #Fold metrics
        print(f'----------FOLD {fold+1} SUMMARY---------')
        metrics = calculate_metrics(np.array(total_results), np.array(total_targets))
        print(f'Macro F1 Score: {metrics["f1_macro"]}   Class Breakdown: {metrics["f1"]}')
        print(f'Macro AUROC: {metrics["auc_macro"]}   Class Breakdown: {metrics["auc"]}')

        print(f'Overall accuracy for fold {fold+1}: {(correct["total"] / total["total"]) * 100:.2f}%')
        fold_results[fold+1] = {}
        fold_results[fold+1]["total"] = 100.0 * (correct["total"] / total["total"])
        for key in correct.keys():
            if key != "total":
                print(f'Accuracy for class {key}: {(correct[key] / total[key]) * 100:.2f}%')
                fold_results[fold+1][key] = 100.0 * (correct[key] / total[key])
        print('--------------------------------')

        #Save best checkpoint based on F1 Score
        #F1 Score preferred over AUROC b/c of data imbalance - https://stackoverflow.com/questions/44172162/f1-score-vs-roc-auc
        if(metrics["f1_macro"] > best_macro_f1):
            best_macro_f1 = metrics["f1_macro"]
            state = {
                'epoch': current_epoch,
                'state_dict': model.state_dict(),
                'optimizer': opt.state_dict(),
            }
            save_path = f'/content/drive/MyDrive/Cody - AIMI 2024/Trains/best.ckpt'
            torch.save(state, save_path)


'''
print(f'K-Fold cross validation results for  {k_folds} folds')
print('----------------------------------')
sums = {"total":0.0, 0:0.0, 1:0.0, 2:0.0, 3:0.0}
for fold_idx, value in fold_results.items():
  for key, saved_accuracy in value.items():
    sums[key] += saved_accuracy

print(f'Average overall accuracy: {sums["total"]/len(fold_results.items())} %')
for key in sums.keys():
  if key != "total":
    print(f'Average accuracy for class {key} ({label_space[key]}): {sums[key]/len(fold_results.items())} %')
'''

Fold 1
-------
Training epoch 6


100%|██████████| 210/210 [03:35<00:00,  1.02s/it]


Training epoch 7


100%|██████████| 210/210 [03:26<00:00,  1.02it/s]


Training epoch 8


100%|██████████| 210/210 [03:28<00:00,  1.01it/s]


Training epoch 9


100%|██████████| 210/210 [03:27<00:00,  1.01it/s]


Training epoch 10


100%|██████████| 210/210 [03:32<00:00,  1.01s/it]


Evaluating fold...


100%|██████████| 53/53 [00:48<00:00,  1.09it/s]


----------FOLD 1 SUMMARY---------
Macro F1 Score: 0.3260035636263955   Class Breakdown: [0.         0.80198265 0.5020316  0.        ]
Macro AUROC: 0.6802616357161302   Class Breakdown: [0.60438886 0.82289145 0.67251121 0.62125502]
Overall accuracy for fold 1: 79.40%
Accuracy for class 0: 88.76%
Accuracy for class 1: 76.18%
Accuracy for class 2: 67.12%
Accuracy for class 3: 85.51%
--------------------------------
Fold 2
-------
Training epoch 11


100%|██████████| 210/210 [03:28<00:00,  1.01it/s]


Training epoch 12


100%|██████████| 210/210 [03:28<00:00,  1.01it/s]


Training epoch 13


100%|██████████| 210/210 [03:27<00:00,  1.01it/s]


Training epoch 14


100%|██████████| 210/210 [03:26<00:00,  1.02it/s]


Training epoch 15


100%|██████████| 210/210 [03:32<00:00,  1.01s/it]


Evaluating fold...


100%|██████████| 53/53 [00:47<00:00,  1.12it/s]


----------FOLD 2 SUMMARY---------
Macro F1 Score: 0.3895938277623249   Class Breakdown: [0.         0.89664083 0.66173448 0.        ]
Macro AUROC: 0.6792755950301451   Class Breakdown: [0.47958109 0.93619114 0.78265796 0.51867218]
Overall accuracy for fold 2: 84.71%
Accuracy for class 0: 88.52%
Accuracy for class 1: 88.08%
Accuracy for class 2: 75.47%
Accuracy for class 3: 86.77%
--------------------------------
Fold 3
-------
Training epoch 16


100%|██████████| 210/210 [03:27<00:00,  1.01it/s]


Training epoch 17


100%|██████████| 210/210 [03:27<00:00,  1.01it/s]


Training epoch 18


100%|██████████| 210/210 [03:28<00:00,  1.01it/s]


Training epoch 19


100%|██████████| 210/210 [03:28<00:00,  1.01it/s]


Training epoch 20


100%|██████████| 210/210 [03:32<00:00,  1.01s/it]


Evaluating fold...


100%|██████████| 53/53 [00:50<00:00,  1.06it/s]


----------FOLD 3 SUMMARY---------
Macro F1 Score: 0.4201180932786357   Class Breakdown: [0.         0.9395441  0.74092827 0.        ]
Macro AUROC: 0.7469879003829779   Class Breakdown: [0.56227027 0.96638799 0.83935929 0.61993405]
Overall accuracy for fold 3: 87.31%
Accuracy for class 0: 88.55%
Accuracy for class 1: 92.73%
Accuracy for class 2: 81.69%
Accuracy for class 3: 86.29%
--------------------------------
Fold 4
-------
Training epoch 21


100%|██████████| 210/210 [03:30<00:00,  1.00s/it]


Training epoch 22


100%|██████████| 210/210 [03:26<00:00,  1.02it/s]


Training epoch 23


100%|██████████| 210/210 [03:25<00:00,  1.02it/s]


Training epoch 24


100%|██████████| 210/210 [03:31<00:00,  1.01s/it]


Training epoch 25


100%|██████████| 210/210 [03:25<00:00,  1.02it/s]


Evaluating fold...


100%|██████████| 53/53 [00:46<00:00,  1.14it/s]


----------FOLD 4 SUMMARY---------
Macro F1 Score: 0.43077266777775675   Class Breakdown: [0.         0.95346485 0.76962583 0.        ]
Macro AUROC: 0.7675316139631279   Class Breakdown: [0.5769287  0.97236671 0.86226553 0.65856552]
Overall accuracy for fold 4: 87.35%
Accuracy for class 0: 88.52%
Accuracy for class 1: 94.51%
Accuracy for class 2: 81.28%
Accuracy for class 3: 85.09%
--------------------------------
Fold 5
-------
Training epoch 26


100%|██████████| 210/210 [03:26<00:00,  1.02it/s]


Training epoch 27


100%|██████████| 210/210 [03:26<00:00,  1.02it/s]


Training epoch 28


100%|██████████| 210/210 [03:25<00:00,  1.02it/s]


Training epoch 29


100%|██████████| 210/210 [03:28<00:00,  1.00it/s]


Training epoch 30


100%|██████████| 210/210 [03:26<00:00,  1.02it/s]


Evaluating fold...


100%|██████████| 53/53 [00:47<00:00,  1.13it/s]


----------FOLD 5 SUMMARY---------
Macro F1 Score: 0.44865725948451385   Class Breakdown: [0.         0.96333501 0.83129403 0.        ]
Macro AUROC: 0.7847462154058129   Class Breakdown: [0.57937485 0.98213802 0.90244996 0.67502203]
Overall accuracy for fold 5: 89.10%
Accuracy for class 0: 87.87%
Accuracy for class 1: 95.65%
Accuracy for class 2: 86.43%
Accuracy for class 3: 86.46%
--------------------------------



In [None]:
#Save last ckpt
state = {
    'epoch': current_epoch,
    'state_dict': model.state_dict(),
    'optimizer': opt.state_dict(),
}
save_path = f'/content/drive/MyDrive/Cody - AIMI 2024/Trains/last.ckpt'
torch.save(state, save_path)

In [None]:
'''
# 'output' contains probabilities (the model applies a sigmoid in forward())
threshold = 0.5  # Adjust this threshold as needed
predicted = (output > threshold).int()  # Get a binary mask for predicted classes

confusion_matrices = multilabel_confusion_matrix(y_true=target.cpu().numpy(), y_pred=predicted.cpu().numpy())

# Accessing individual confusion matrices
confusion_matrix_class0 = confusion_matrices[0]
confusion_matrix_class1 = confusion_matrices[1]
confusion_matrix_class2 = confusion_matrices[2]
confusion_matrix_class3 = confusion_matrices[3]

print("Confusion Matrix for Class 0:\n", confusion_matrix_class0)
print("Confusion Matrix for Class 1:\n", confusion_matrix_class1)
print("Confusion Matrix for Class 2:\n", confusion_matrix_class2)
print("Confusion Matrix for Class 3:\n", confusion_matrix_class3)

for i, matrix in enumerate(confusion_matrices):
    tn, fp, fn, tp = matrix.ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    print(f"Accuracy for Class {i}: {accuracy * 100:.2f}%")
    print("-----------------------------------")
'''


In [None]:
experiment.end()

# Some misc test functions (ignore)

In [None]:
def alternative_evaluate(model): #change train to evaluate using this one to get best?? or of individual class idk
    model.eval()

    with torch.no_grad():
        correct, total = 0, 0

        for index, (data, target) in enumerate(tqdm(val_loader)): #iterate through val_loader in batches of inputs&targets in sizes of batch_size
            data, target = data.to(device), target.to(device)

            outputs = model(data).cpu()

            preds = np.array(outputs)
            target_labels = target.cpu().numpy()
            #preds and target_labels is an array of len batch_size with each row containing an array of len 4 of class pred/labels

            #go through all the predictions, if greater than threshold set to 1, else 0
            #bit of a hack- [0,0,0,0] should never exist, convert to [0,0,0,1]
            # ['pneumonia', 'pneumothorax', 'pleural effusion', 'normal']
            for(i, pred) in enumerate(preds):
              preds[i] = (pred > threshold).astype(int)
              if(np.all(preds[i] == 0)):
                preds[i] = [0, 0, 0, 1]

            # compare every sample in the batch (the last batch may be smaller than batch_size)
            for pred, label in zip(preds, target_labels):
              if(np.array_equal(pred, label)):
                correct += 1
              total += 1
        print(f"Accuracy: {correct / total * 100:.2f}%")

def alternative_evaluate_2(model):
    model.eval()

    with torch.no_grad():
        total_results = []
        total_targets = []

        for index, (data, target) in enumerate(tqdm(val_loader)):
            data, target = data.to(device), target.to(device)
            outputs = model(data)
            total_results.extend(outputs.cpu().numpy())
            total_targets.extend(target.cpu().numpy())

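        #calculate_metrics_original(np.array(total_results), np.array(total_targets))
        metrics = calculate_metrics(np.array(total_results), np.array(total_targets))
        print(f"F1: {metrics['f1']}")
        print(f"F1 Macro: {metrics['f1_macro']}")
        print(f"AUC: {metrics['auc']}")
        print(f"AUC Macro: {metrics['auc_macro']}")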




In [None]:
alternative_evaluate_2(model)

100%|██████████| 53/53 [00:36<00:00,  1.44it/s]

F1: [0.         0.52909418 0.         0.        ]
F1 Macro: 0.1322735452909418
AUC: [0.56815749 0.57825563 0.61725626 0.50773807]
AUC Macro: 0.5678518635904377





In [None]:
def unencode_multi_hot(encoded_labels, label_space):
    original_labels = []
    for label_vector in encoded_labels:
        labels = [label_space[idx] for idx, value in enumerate(label_vector) if value == 1]
        original_labels.append(labels)
    return original_labels



# Evaluating on Test Dataset


# Load Model

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Resnext50(num_classes)
model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

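# map_location lets a checkpoint saved on a GPU load on a CPU-only runtime as well
checkpoint = torch.load(r'/content/drive/MyDrive/Cody - AIMI 2024/Trains/Run_2/best.ckpt', map_location=device)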
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])




In [None]:
test_transforms = v2.Compose([
    v2.ToImage(),
    v2.Resize(size=(224, 224), antialias=True),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def getImage(image_path):
  image = Image.open(image_path).convert("RGB")
  image = test_transforms(image)
  return image

In [None]:
threshold = 0.5

def predict(model, image_tensor):
  model.eval()
  with torch.no_grad():
      input = image_tensor.unsqueeze(0)  #image lacks batch layer, so insert a batch dimension of size 1
      input = input.to(device)

      outputs = model(input).cpu()
      preds = np.array(outputs)

      rounded_preds, thresholded_preds = np.empty_like(preds), np.empty_like(preds)
      rounded_preds[:] = preds
      thresholded_preds[:] = preds

      for(i, pred) in enumerate(preds):
              thresholded_preds[i] = (pred > threshold).astype(int)
              if(np.all(thresholded_preds[i] == 0)):
                thresholded_preds[i] = [0, 0, 0, 1]
              rounded_preds[i] = [round(num, 7) for num in pred]

      return thresholded_preds[0], rounded_preds[0]


In [None]:
test_dataframe = pd.read_csv("/content/drive/MyDrive/Cody - AIMI 2024/2024 AIMI Summer Internship - Intern Materials/Datasets/test_annotations.csv")
os.chdir(r'/content/student_data_split')

In [None]:
#np.set_printoptions(formatter={'float': lambda x: "{0:0.2f}".format(x)})

processed_patients = []
#number_pneumonia = 0 #temporary just to double check

for index, row in tqdm(test_dataframe.iterrows(), total=test_dataframe.shape[0]):
  images = os.listdir(row['study_id'])
  for image in images:
    thresholded_preds, rounded_preds = predict(model, getImage(row['study_id'] + '/' + image))

    patient = {
        'study_id' : row['study_id'],
        'Pneumothorax' : thresholded_preds[1],
        'Pneumonia' : thresholded_preds[0],
        'Pleural Effusion' : thresholded_preds[2],
        'No Finding' : thresholded_preds[3],
        'Pneumothorax Probs' : rounded_preds[1],
        'Pneumonia Probs' : rounded_preds[0],
        'Pleural Effusion Probs' : rounded_preds[2],
        'No Finding Probs' : rounded_preds[3],
    }

    #temporary, check # of pneumonia to make sure not exporting wrong
    #if(thresholded_preds[0] == 1):
    #  number_pneumonia += 1

    processed_patients.append(patient)

    break # only the first image listed for each study is used; combining predictions from multiple images is left for later

#print(f"\n {number_pneumonia} pneumonia detected")

100%|██████████| 2983/2983 [01:11<00:00, 41.45it/s]


In [None]:
test_processed_dataframe = pd.DataFrame(processed_patients)
test_processed_dataframe.to_csv(r'/content/drive/MyDrive/Cody - AIMI 2024/test_results.csv', index=False, float_format='%.10f')

## Submitting Your Results
Once you have successfully trained your model, generate predictions on the test set and save your results as a `.csv` file. This file can then be uploaded to the leaderboard: https://vilmedic.app/misc/aimi24/leaderboard.

An example `test_results.csv` has been provided for reference only in the `2024 AIMI Summer Internship - Intern Materials/Datasets/Labels` folder. *Do not submit this file; its results will be very poor.*

Your final `.csv` file **must** have the following format:
- There must be a column titled `study_id` with the paths to the study_id for the test set image, e.g. `student_test/patient35172/study3`.
- The provided columns from `test_annotations.csv` must be present: "Pneumothorax", "Pneumonia", "Pleural Effusion", "No Finding":
  - Each of these columns must contain a binary value `0` or `1` representing the **observed/ground-truth** absence or presence of the disease status.
- Added columns "Pneumothorax Probs", "Pneumonia Probs", "Pleural Effusion Probs", "No Finding Probs" containing the probability value for each class.
  - Each of these columns must contain a continuous value representing the **predicted** probability for that class.
  - *Hint:* Depending on which loss function you used, you might already be outputting probabilities. You can then derive predictions by thresholding your probabilities to a binarized output. If your model outputs logits directly, then apply the sigmoid activation function `torch.sigmoid(logits)` to get probabilities and then threshold to get binary predictions.
- Double check that the length of the dataset passed into your dataloader matches the length of your final dataframe.
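
The hint above can be sketched as follows, assuming a model that returns raw logits for the four classes. Plain Python is used so the arithmetic is visible: the list comprehension stands in for `torch.sigmoid`, and the function name and example logits are made up for illustration.

```python
import math


def logits_to_predictions(logits, threshold=0.5):
    """Turn per-class logits into probabilities, then binarize at `threshold`."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    preds = [1 if p > threshold else 0 for p in probs]
    return probs, preds


# Hypothetical logits for one image, one value per class.
example_logits = [-2.0, 0.3, 1.5, -0.7]
probs, preds = logits_to_predictions(example_logits)
print([round(p, 3) for p in probs])
print(preds)
```

With a threshold of 0.5, a class is predicted present exactly when its logit is positive. If the model already ends in a sigmoid, skip the first step and threshold its outputs directly.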

In [None]:
model = Resnext50(num_classes)  # Model architecture (e.g. the Resnext50 class defined above)
ckpt = torch.load("/content/best.pkl")
model.load_state_dict(ckpt["state_dict"])

test_dataset = ChestXRayDataset("""Fill in args here""")
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=4, shuffle=False, drop_last=False)

In [None]:
# Write method to load in data from test_loader, compute model predictions, and append results to test_results dict
test_results = {"image_path": [], "pred": []}

In [None]:
test_results = pd.DataFrame(test_results)
test_results.to_csv(f"/content/test_results.csv")