
<div align="center">

# National Tsing Hua University

### Fall 2023

#### 11210IPT 553000

#### Deep Learning in Biomedical Optical Imaging

## Homework 2

</div>


### ✏️ Task A: Transitioning to Cross-Entropy Loss (20 pts)

In the lab, we used the **Binary Cross-Entropy (BCE) Loss** for a binary classification task. The BCE loss is defined as:

$$ \text{BCE}(y, \hat{y}) = - \left( y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right) $$

Here, $y$ is the true label (0 or 1), and $\hat{y}$ denotes the predicted probability of $y=1$.
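
As a worked example of the BCE formula above, the following minimal sketch evaluates it for a single sample in plain Python. The helper name `bce` and the clipping constant `eps` are illustrative assumptions, not part of the lab code.

```python
import math

def bce(y, y_hat, eps=1e-12):
    """Binary cross-entropy for one sample.

    y     : true label, 0 or 1
    y_hat : predicted probability of y = 1
    eps   : (assumed) clipping constant that guards against log(0)
    """
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# A confident correct prediction is penalized lightly ...
confident_right = bce(1, 0.9)   # -log(0.9), about 0.105
# ... while a confident wrong prediction is penalized heavily.
confident_wrong = bce(1, 0.1)   # -log(0.1), about 2.303

print(confident_right, confident_wrong)
```

Only one of the two terms is active for any sample: the first when $y=1$, the second when $y=0$.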

In this task, we implement a model with **Cross-Entropy (CE) Loss**, which is the more common choice for classification tasks, especially those with more than two classes. CE loss is defined as:

$$ \text{CE}(y, \hat{y}) = -\sum_{i} y^{(i)} \log(\hat{y}^{(i)}) $$

In this expression, $y^{(i)}$ is the ground-truth label for class $i$ (1 for the true class, 0 otherwise), $\hat{y}^{(i)}$ is the model's predicted probability for class $i$, and $i$ indexes the classes.
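
To connect the two formulas, the sketch below evaluates CE on a one-hot label and checks that, with two classes, it gives the same value as BCE. The function names and the example probability `p` are assumptions chosen for illustration.

```python
import math

def cross_entropy(y_onehot, y_hat):
    """CE for one sample: -sum_i y_i * log(y_hat_i).

    Terms with y_i = 0 contribute nothing, so they are skipped.
    """
    return -sum(t * math.log(p) for t, p in zip(y_onehot, y_hat) if t > 0)

def bce(y, y_hat):
    """BCE for one sample, with y_hat = predicted probability of class 1."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

p = 0.8  # assumed predicted probability of class 1

# Two-class CE uses the distribution [P(class 0), P(class 1)] = [1 - p, p].
ce_pos = cross_entropy([0, 1], [1 - p, p])  # true class is 1
ce_neg = cross_entropy([1, 0], [1 - p, p])  # true class is 0

print(ce_pos, bce(1, p))
print(ce_neg, bce(0, p))
```

The one-hot label keeps only the log-probability of the true class, which is exactly what the two BCE terms do for the binary case.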


#### 1. Modify the Loss (3 pts)
Transition to using Cross-Entropy (CE) Loss for the classification task by utilizing PyTorch's built-in functionalities. You can refer to the [official PyTorch documentation](https://pytorch.org/docs/stable/nn.html) for detailed information and guidance to ensure the correct implementation of the CE loss.

In [None]:
import torch.nn as nn

# Cross-entropy loss: takes raw logits and integer class labels
loss = nn.CrossEntropyLoss()

#### 2. Modify the Model Architecture (2 pts)
To adapt the original code for use with Cross-Entropy (CE) loss, make necessary modifications to the model architecture. Ensure it is compatible and optimized for the application of CE loss. Consider the number of output nodes and the activation function used in the output layer for effective multi-class classification.
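
The choice of output nodes and activation can be illustrated with a small plain-Python sketch. `nn.CrossEntropyLoss` applies log-softmax to the raw outputs internally, so the last layer is left without an activation; for two classes, the softmax probability of class 1 equals a sigmoid of the difference between the two logits. The logit values below are made up for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

logits = [0.3, 1.7]          # assumed raw outputs of a two-node output layer
probs = softmax(logits)      # probabilities over (class 0, class 1)

# The single-output (BCE) view: sigmoid of the logit difference.
p_class1 = sigmoid(logits[1] - logits[0])

print(probs, p_class1)
```

This is why a two-node output trained with CE and a one-node output trained with BCE can represent the same binary classifier.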

In [None]:
# Modifying the architecture to be compatible with CE loss
ce_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(256*256*1, 256),
    nn.ReLU(),
    nn.Linear(256, 2)  # two output logits, one per class (normal, pneumonia)
).cuda()

print(ce_model)

Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=65536, out_features=256, bias=True)
  (2): ReLU()
  (3): Linear(in_features=256, out_features=2, bias=True)
)


#### 3. Reflection Questions (15 pts, 5 pts for each)
Provide detailed answers to the questions below:

**Q1. Loss Function Comparison:**  
   What are the differences between Binary Cross-Entropy (BCE) loss and Cross-Entropy (CE) loss?

**Q2. Model Architecture Modification:**  
   What motivated the specific changes you made to the model architecture?

**Q3. Adapting to CE Loss:**  
   In the original code configured for BCE loss, two major adjustments are needed for adaptation to CE loss. Analyze and explain the necessity for these changes, referring to the code below.

```python
for images, labels in train_loader:
    images = images.cuda()
    images = images / 255.0
    labels = labels.cuda()
    optimizer.zero_grad()
    outputs = model(images)

    # Change #1: Adaptation to the labels for CE loss
    labels = labels.long()  # Changed from labels.float().unsqueeze(1) for BCE loss

    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    total_loss += loss.item()

    # Change #2: Predictions for CE loss
    train_predicted = outputs.argmax(-1)  # Changed from torch.sigmoid(outputs) > 0.5 for BCE loss
    train_correct += (train_predicted == labels).sum().item()
```

#### Put Your Response Here:

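##### 1. BCE loss is designed for binary classification: the model produces a single output, which a sigmoid turns into the probability of the positive class, and the target is a float label (0 or 1). CE loss generalizes this to any number of classes: the model produces one output (logit) per class, a softmax turns the logits into a probability distribution over the classes, and the target is an integer class index. The output layer therefore differs: one node for BCE versus one node per class for CE.

##### 2. I set `out_features` of the last layer to 2, one node per class (normal and pneumonia), because CE loss expects one output per class rather than the single output used with BCE. The output layer has no activation function because `nn.CrossEntropyLoss` applies log-softmax to the raw logits internally.

##### 3. For Change #1, CE loss expects each label to be an integer class index (0, 1, 2, ...), stored as a `long` tensor with one entry per sample. BCE loss instead compares a float label with the single model output, which is why the original code used `labels.float().unsqueeze(1)` to match the output shape.
For Change #2, the model trained with CE loss outputs one logit per class, so the predicted class is the index of the largest logit, `outputs.argmax(-1)`. Softmax turns the logits into probabilities that sum to 1 but preserves their order, so the argmax of the logits is also the most probable class. With BCE there is only one output, so the prediction is obtained by thresholding `torch.sigmoid(outputs)` at 0.5.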
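
The two changes in the training loop can be sketched in plain Python, without PyTorch: the loss is computed from raw logits and an integer class index, and the prediction is the argmax over the logits. The function names and the toy batch are assumptions for illustration.

```python
import math

def ce_from_logits(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target]).

    target is an integer class index, mirroring the `long` labels in Change #1.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def predict(logits):
    """Index of the largest logit, mirroring outputs.argmax(-1) in Change #2."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Toy batch: three samples, two classes (values are made up).
batch_logits = [[2.0, -1.0], [0.2, 0.9], [-0.5, 0.1]]
batch_labels = [0, 1, 0]

losses = [ce_from_logits(z, t) for z, t in zip(batch_logits, batch_labels)]
preds = [predict(z) for z in batch_logits]
correct = sum(int(p == t) for p, t in zip(preds, batch_labels))

print(losses, preds, correct)
```

No softmax is needed before the argmax, because softmax does not change which logit is largest.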

### ✏️ Task B: Creating an Evaluation Code (20 pts)

Evaluate the performance of a pretrained deep learning model on a test dataset of chest X-ray images stored in `test_normal.npy` and `test_pneumonia.npy`. These files contain 200 grayscale normal and 200 grayscale pneumonia chest X-ray images, respectively, each of size 256×256. The objective is to calculate the model’s accuracy, defined as the percentage of images correctly classified. To accomplish this, write code that loads and processes this dataset and evaluates the model on it. Ensure each segment of code replacing the `...` placeholders is functional and follows the steps in the instructions.
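
The accuracy metric defined above can be written as a short helper. This is a minimal sketch on plain Python lists; the function name `accuracy_percent` and the toy predictions are illustrative assumptions.

```python
def accuracy_percent(predictions, labels):
    """Percentage of predictions that equal the corresponding label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(int(p == t) for p, t in zip(predictions, labels))
    return 100.0 * correct / len(labels)

# Toy example: 4 of 5 predictions match the labels.
toy_preds = [1, 1, 0, 0, 1]
toy_labels = [1, 0, 0, 0, 1]
toy_accuracy = accuracy_percent(toy_preds, toy_labels)

print(f"{toy_accuracy:.2f}%")
```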

**Note: ⚠️ Be sure to upload your trained model's weights to your working environment if needed.**

### Step 0: Download test dataset

In [2]:
!wget https://raw.githubusercontent.com/TacoXDD/homeworks/master/dataset/test/test_normal.npy
!wget https://raw.githubusercontent.com/TacoXDD/homeworks/master/dataset/test/test_pneumonia.npy

--2023-10-14 03:56:57--  https://raw.githubusercontent.com/TacoXDD/homeworks/master/dataset/test/test_normal.npy
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13107328 (12M) [application/octet-stream]
Saving to: ‘test_normal.npy’


2023-10-14 03:56:58 (246 MB/s) - ‘test_normal.npy’ saved [13107328/13107328]

--2023-10-14 03:56:58--  https://raw.githubusercontent.com/TacoXDD/homeworks/master/dataset/test/test_pneumonia.npy
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13107328 (12M) [application/octet-stream]
Saving to: ‘test_p

### Step 1: Prepare your test dataset

In [3]:
import numpy as np

test_abnormal = np.load('test_pneumonia.npy')
test_normal = np.load('test_normal.npy')

print(f'Shape of test_abnormal: {test_abnormal.shape}')
print(f'Shape of test_normal: {test_normal.shape}')

# Label pneumonia images as 1 and normal images as 0.
test_abnormal_labels = np.ones((test_abnormal.shape[0],))
test_normal_labels = np.zeros((test_normal.shape[0],))

x_test = np.concatenate((test_abnormal, test_normal), axis=0)
y_test = np.concatenate((test_abnormal_labels, test_normal_labels), axis=0)

print(f'Shape of x_test: {x_test.shape}')
print(f'Shape of y_test: {y_test.shape}')

Shape of test_abnormal: (200, 256, 256)
Shape of test_normal: (200, 256, 256)
Shape of x_test: (400, 256, 256)
Shape of y_test: (400,)


### Step 2: Load Test Images into PyTorch DataLoader (5 pts)

In [4]:
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Convert to PyTorch tensors
x_test = torch.from_numpy(x_test).long()
y_test = torch.from_numpy(y_test).long()

# Combine the images and labels into a dataset
test_dataset = TensorDataset(x_test, y_test)

# Create a dataloader to load data in batches. Set batch size to 32.
# Shuffling is unnecessary for evaluation; accuracy does not depend on sample order.
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

print(f'Number of samples in test is {len(test_loader.dataset)}.')
print(f'x_test: max value is {x_test.max().item()}, min value is {x_test.min().item()}, data type is {x_test.dtype}.')

Number of samples in test is 400.
x_test: max value is 255, min value is 0, data type is torch.int64.
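
With 400 samples and a batch size of 32, the last batch is smaller than the others, since `DataLoader` keeps incomplete batches by default. The sketch below works out the batch sizes in plain Python; `batch_sizes` is a hypothetical helper, not a PyTorch function.

```python
def batch_sizes(num_samples, batch_size):
    """Sizes of the batches produced when incomplete batches are kept."""
    full, rest = divmod(num_samples, batch_size)
    return [batch_size] * full + ([rest] if rest else [])

sizes = batch_sizes(400, 32)

print(len(sizes), sizes[-1])  # number of batches, size of the last batch
```

This is one reason to count samples per batch (for example with `labels.size(0)`) instead of assuming every batch holds 32 images.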


### Step 3: Prepare Your Trained Model  (5 pts)
- Define the architecture to match exactly with the trained model intended for inference. Ensure strict alignment to avoid errors during evaluation.
- Load the weights from the trained model and set the model to evaluation mode
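
Evaluation mode matters for this architecture because it contains `Dropout` and `BatchNorm1d` layers: in evaluation mode dropout is disabled and batch normalization uses its stored running statistics instead of per-batch statistics. The plain-Python sketch below illustrates only the dropout part, using the inverted-dropout scaling; the function and its arguments are assumptions for illustration, not the PyTorch implementation.

```python
import random

def dropout(values, p, training, rng):
    """Inverted dropout on a list of floats.

    training=True : zero each value with probability p, scale the rest by 1/(1-p)
    training=False: return the values unchanged (evaluation mode)
    """
    if not training:
        return list(values)
    scale = 1.0 / (1.0 - p)  # keeps the expected activation unchanged
    return [0.0 if rng.random() < p else v * scale for v in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0, 2.0, 3.0, 4.0]

train_out = dropout(activations, p=0.5, training=True, rng=rng)
eval_out = dropout(activations, p=0.5, training=False, rng=rng)

print(train_out, eval_out)
```

Skipping `model.eval()` would leave dropout active during inference, making predictions random and typically less accurate.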

In [6]:
from torch import nn
# Declare the model architecture
model = nn.Sequential(
    nn.Flatten(),

    nn.Linear(256*256*1, 64),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Dropout(0.5),

    nn.Linear(64, 64),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Dropout(0.5),

    nn.Linear(64, 64),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Dropout(0.5),

    nn.Linear(64, 1)
).cuda()

# Load the trained weights
!wget https://github.com/HD666g/NTHU_2023_DLBOI_HW/raw/main/HW2/trained_weights.pth
model.load_state_dict(torch.load('trained_weights.pth'))

# Set the model to evaluation mode
model.eval()

--2023-10-14 03:57:36--  https://github.com/HD666g/NTHU_2023_DLBOI_HW/raw/main/HW2/trained_weights.pth
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/HD666g/NTHU_2023_DLBOI_HW/main/HW2/trained_weights.pth [following]
--2023-10-14 03:57:36--  https://raw.githubusercontent.com/HD666g/NTHU_2023_DLBOI_HW/main/HW2/trained_weights.pth
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16821224 (16M) [application/octet-stream]
Saving to: ‘trained_weights.pth.1’


2023-10-14 03:57:36 (279 MB/s) - ‘trained_weights.pth.1’ saved [16821224/16821224]



Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): Linear(in_features=65536, out_features=64, bias=True)
  (2): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): ReLU()
  (4): Dropout(p=0.5, inplace=False)
  (5): Linear(in_features=64, out_features=64, bias=True)
  (6): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (7): ReLU()
  (8): Dropout(p=0.5, inplace=False)
  (9): Linear(in_features=64, out_features=64, bias=True)
  (10): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (11): ReLU()
  (12): Dropout(p=0.5, inplace=False)
  (13): Linear(in_features=64, out_features=1, bias=True)
)

### Step 4: Perform Inference and Calculate the Accuracy (10 pts)
- Ensure the image values are processed in a manner consistent with the training phase.
- Use the model that was trained with BCE loss to execute inference on the test dataset.
- Note that inference should be performed on the GPU.
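
The preprocessing and decision rule for a BCE-trained model can be sketched without PyTorch: pixel values are scaled from [0, 255] to [0, 1] as in training, and the single output logit is passed through a sigmoid and thresholded at 0.5, which is the same as checking whether the logit is positive. The example values are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_from_logit(logit, threshold=0.5):
    """Class 1 (pneumonia) if the predicted probability exceeds the threshold."""
    return int(sigmoid(logit) > threshold)

# Scale 8-bit pixel values to [0, 1], matching the training-time preprocessing.
pixels = [0, 128, 255]
scaled = [v / 255 for v in pixels]

# Thresholding the probability at 0.5 agrees with checking the sign of the logit.
example_logits = [-2.0, -0.1, 0.0, 0.1, 3.0]
by_probability = [predict_from_logit(z) for z in example_logits]
by_sign = [int(z > 0) for z in example_logits]

print(scaled, by_probability, by_sign)
```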

In [12]:
test_correct = 0
test_total = 0

with torch.no_grad():
    for images, labels in test_loader:

        images = images.cuda()
        images = images / 255

        labels = labels.cuda()

        outputs = model(images)

        labels_float = labels.float().unsqueeze(1)  # Convert labels to float and match shape with outputs
        predicted = torch.sigmoid(outputs) > 0.5

        test_correct += (predicted.float() == labels_float).sum().item()
        test_total += labels.size(0)
test_accuracy = 100. * test_correct / test_total
print(f'Test accuracy is {test_accuracy:.2f}%.')

Test accuracy is 77.75%.
