<div align="center">

# National Tsing Hua University

### Fall 2023

#### 11210IPT 553000

#### Deep Learning in Biomedical Optical Imaging

## Homework 2

</div>


### ✏️ Task A: Transitioning to Cross-Entropy Loss (20 pts)

In the lab, we used the **Binary Cross-Entropy (BCE) Loss** for a binary classification task. The BCE loss is defined as:

$$ \text{BCE}(y, \hat{y}) = - \left( y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right) $$

Here, $y$ is the true label (0 or 1), and $\hat{y}$ denotes the predicted probability of $y=1$.

In this task, we aim to explore the implementation of a model using **Cross-Entropy (CE) Loss**, which is a more common approach for classification tasks, especially when dealing with multiple classes. CE loss is expressed as:

$$ \text{CE}(y, \hat{y}) = -\sum_{i} y^{(i)} \log(\hat{y}^{(i)}) $$

In this expression, $y^{(i)}$ is the ground-truth label for class $i$ (typically 1 for the true class and 0 otherwise), $\hat{y}^{(i)}$ is the probability your model predicts for class $i$, and $i$ is the index of the class.
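As a quick numerical check of the two formulas above, the sketch below evaluates both by hand for a single sample. The values `y = 1` and `y_hat = 0.8` are made up for illustration, and the helper names `bce` and `ce` are hypothetical. With a one-hot target over two classes, the CE sum reduces to the BCE expression.

```python
import math

def bce(y, y_hat):
    """Binary cross-entropy for one sample; y in {0, 1}, y_hat = P(y = 1)."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def ce(y_onehot, y_hat_probs):
    """Cross-entropy for one sample; both arguments are per-class lists."""
    return -sum(t * math.log(p) for t, p in zip(y_onehot, y_hat_probs) if t > 0)

# Illustrative values (not taken from the lab data)
y, y_hat = 1, 0.8
bce_value = bce(y, y_hat)

# Two-class view of the same sample: class 0 has probability 1 - y_hat
ce_value = ce([1 - y, y], [1 - y_hat, y_hat])

print(bce_value, ce_value)
```

Both calls evaluate $-\log(0.8)$ here, since only the term for the true class survives in either formula.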


#### 1. Modify the Loss (3 pts)
Transition to using Cross-Entropy (CE) Loss for the classification task by utilizing PyTorch's built-in functionalities. You can refer to the [official PyTorch documentation](https://pytorch.org/docs/stable/nn.html) for detailed information and guidance to ensure the correct implementation of the CE loss.

```python
import torch.nn as nn

# Replace '...' with the appropriate loss function in PyTorch
loss = nn.CrossEntropyLoss()
```

#### 2. Modify the Model Architecture (2 pts)
To adapt the original code for use with Cross-Entropy (CE) loss, make necessary modifications to the model architecture. Ensure it is compatible and optimized for the application of CE loss. Consider the number of output nodes and the activation function used in the output layer for effective multi-class classification.

```python
# Modifying the architecture to be compatible with CE loss
ce_model = nn.Sequential(
    nn.Flatten(),                 # Flatten each 256x256 image into a 1D vector
    nn.Linear(256*256*1, 256),
    nn.ReLU(),
    nn.Linear(256, 2)             # Two output logits, one per class
).cuda()
```

#### 3. Reflection Questions (15 pts, 5 pts for each)
Provide detailed answers to the questions below:

**Q1. Loss Function Comparison:**  
   What are the differences between Binary Cross-Entropy (BCE) loss and Cross-Entropy (CE) loss?

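<u>***Answer:***</u>  Binary Cross-Entropy (BCE) loss is designed for binary classification, whereas Cross-Entropy (CE) loss generalizes to any number of classes. In terms of implementation, BCE loss expects a single output feature per sample, which a sigmoid maps to the probability of the positive class, together with float targets of the same shape as the output. CE loss, when used for binary classification, expects two output features (one logit per class) and integer class indices as targets; PyTorch's `nn.CrossEntropyLoss` applies log-softmax to the logits internally. For two classes the two formulations are mathematically equivalent, so the key practical distinction is the output and label format: BCE yields the positive-class probability directly from one sigmoid output, while CE yields a probability for every class through softmax.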
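The relationship between the two losses can be checked numerically. The sketch below is a minimal illustration, assuming the BCE variant is `nn.BCEWithLogitsLoss` (which takes raw logits); the random logits and the labels are hypothetical. Fixing the class-0 logit at zero makes the softmax probability of class 1 equal to the sigmoid of the single logit, so both losses should agree.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical batch: 4 single-logit outputs and binary labels
z = torch.randn(4, 1)
labels = torch.tensor([0, 1, 1, 0])

# BCE route: one logit per sample, float targets with the same shape as z
bce_loss = nn.BCEWithLogitsLoss()(z, labels.float().unsqueeze(1))

# CE route: two logits per sample; with the class-0 logit fixed at 0,
# softmax([0, z])[1] equals sigmoid(z)
two_logits = torch.cat([torch.zeros_like(z), z], dim=1)
ce_loss = nn.CrossEntropyLoss()(two_logits, labels)

print(bce_loss.item(), ce_loss.item())
```

A trained two-output model is not constrained to keep one logit at zero, but only the difference between the two logits affects the softmax, so the same equivalence holds in general.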

**Q2. Model Architecture Modification:**  
   What motivated the specific changes you made to the model architecture?

<u>***Answer:***</u> For CE loss, the last layer must have as many output nodes as there are classes. As mentioned in Q1, when CE loss is used for binary classification, it expects 2 output features. That is why I changed the last layer from **nn.Linear(256, 1)** to **nn.Linear(256, 2)**. No activation is added after the last layer because `nn.CrossEntropyLoss` applies log-softmax to the raw logits internally.
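To make the effect of the two-node output layer concrete, here is a small shape check. It is a sketch only: `toy_model` repeats the layer layout of `ce_model` but stays on the CPU, and `dummy_images` is a random stand-in for a batch of four 256×256 grayscale images.

```python
import torch
import torch.nn as nn

# Same layer layout as ce_model above, kept on the CPU for this check
toy_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(256 * 256 * 1, 256),
    nn.ReLU(),
    nn.Linear(256, 2),  # one output node per class, no activation
)

# Random stand-in for a batch of 4 grayscale 256x256 images
dummy_images = torch.rand(4, 256, 256)

with torch.no_grad():
    logits = toy_model(dummy_images)      # shape (4, 2), raw scores
    probs = torch.softmax(logits, dim=1)  # only needed to read probabilities

print(logits.shape)
print(probs.sum(dim=1))
```

The softmax call is only there to inspect probabilities; during training the raw `logits` go straight into the loss.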

**Q3. Adapting to CE Loss:**  
   In the original code configured for BCE loss, two major adjustments are needed for adaptation to CE loss. Analyze and explain the necessity for these changes, referring to the code below.

```python
for images, labels in train_loader:
    images = images.cuda()
    images = images / 255.0
    labels = labels.cuda()
    optimizer.zero_grad()
    outputs = model(images)

    # Change #1: Adaptation to the labels for CE loss
    labels = labels.long()  # Changed from labels.float().unsqueeze(1) for BCE loss

    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    total_loss += loss.item()

    # Change #2: Predictions for CE loss
    train_predicted = outputs.argmax(-1)  # Changed from torch.sigmoid(outputs) > 0.5 for BCE loss
    train_correct += (train_predicted == labels).sum().item()
```

<u>***Answer:***</u>
<u><br>Change #1:</u> For BCE loss, the labels are in float format, representing the probability of belonging to the positive class: typically 0.0 for the negative class and 1.0 for the positive class. Moreover, BCE loss expects the targets and predictions to have the same shape, so an extra dimension is added with `unsqueeze(1)` to match the output shape of the model, `(N, 1)`. In contrast, CE loss, as used here with one hard label per sample, requires the labels to be class indices (integers) rather than float values. For binary classification, the labels are 0 for one class and 1 for the other. That is why the code converts the labels to the `long` datatype. The extra dimension is no longer needed, since CE loss expects these labels as a 1D tensor of shape `(N,)`.

<u><br>Change #2:</u> For BCE loss, the model's output after sigmoid activation is a value between 0 and 1, representing the probability of belonging to the positive class. A threshold, commonly 0.5, converts this probability into a binary label. With CE loss, however, the model's output consists of raw scores (logits), one per class. The predicted class is simply the one with the highest score, obtained with `argmax(-1)`, which returns the index of the maximum value along the last dimension.

<br>Both of these changes are critical to ensuring that the model is trained correctly when switching from BCE to CE loss and that its predictions are interpreted correctly.
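The two changes can be seen side by side on a tiny hand-made batch. This is a sketch under assumptions: the logits are invented, and `nn.BCEWithLogitsLoss` stands in for the BCE criterion used in the lab.

```python
import torch
import torch.nn as nn

labels = torch.tensor([0, 1, 1])  # hypothetical class labels

# BCE setup: one logit per sample, float targets of shape (N, 1)
bce_outputs = torch.tensor([[-1.2], [0.7], [2.0]])
bce_targets = labels.float().unsqueeze(1)
bce_loss = nn.BCEWithLogitsLoss()(bce_outputs, bce_targets)
bce_pred = (torch.sigmoid(bce_outputs) > 0.5).squeeze(1).long()

# CE setup: two logits per sample, integer targets of shape (N,)
ce_outputs = torch.tensor([[1.2, 0.0], [0.0, 0.7], [0.0, 2.0]])
ce_targets = labels.long()
ce_loss = nn.CrossEntropyLoss()(ce_outputs, ce_targets)
ce_pred = ce_outputs.argmax(-1)

print(bce_targets.shape, ce_targets.shape)
print(bce_pred.tolist(), ce_pred.tolist())
```

Both routes end in a 1D tensor of predicted class indices, which is what the accuracy count compares against the labels.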


#### Put Your Response Here:

##### 1. De Jun Huang. (Jun 12, 2021). *Learning Day 57/Practical 5: Loss function - CrossEntropyLoss vs BCELoss in Pytorch; Softmax vs sigmoid; Loss calculation.* https://medium.com/dejunhuang/learning-day-57-practical-5-loss-function-crossentropyloss-vs-bceloss-in-pytorch-softmax-vs-bd866c8a0d23

##### 2. PyTorch Contributors. (2023). *CrossEntropyLoss*. https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html

### ✏️ Task B: Creating an Evaluation Code (20 pts)

Evaluate the performance of a pretrained deep learning model with a test dataset of chest X-ray images available in `test_normal.npy` and `test_pneumonia.npy` files. These files respectively contain 200 grayscale normal and pneumonia chest X-ray images, each of size 256×256. The objective is to calculate the model’s accuracy, defined as the percentage of images correctly classified. To accomplish this, you are tasked to write code that loads, processes, and evaluates the model on this specific dataset. Ensure each segment of code replacing the `...` placeholders is functional and aligns with the steps provided in the instructions.

**Note: ⚠️ Be sure to upload your trained model's weights to your working environment if needed.**

### Step 0: Download test dataset

```python
!wget https://raw.githubusercontent.com/TacoXDD/homeworks/master/dataset/test/test_normal.npy
!wget https://raw.githubusercontent.com/TacoXDD/homeworks/master/dataset/test/test_pneumonia.npy
```

### Step 1: Prepare your test dataset

```python
import numpy as np

test_abnormal = np.load('test_pneumonia.npy')
test_normal = np.load('test_normal.npy')

print(f'Shape of test_abnormal: {test_abnormal.shape}')
print(f'Shape of test_normal: {test_normal.shape}')

# For the data having presence of pneumonia assign 1, for the normal ones assign 0.
test_abnormal_labels = np.ones((test_abnormal.shape[0],))
test_normal_labels = np.zeros((test_normal.shape[0],))

x_test = np.concatenate((test_abnormal, test_normal), axis=0)
y_test = np.concatenate((test_abnormal_labels, test_normal_labels), axis=0)

print(f'Shape of x_test: {x_test.shape}')
print(f'Shape of y_test: {y_test.shape}')
```

### Step 2: Load Test Images into PyTorch DataLoader (5 pts)

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Convert to PyTorch tensors
x_test = torch.from_numpy(x_test).float()
y_test = torch.from_numpy(y_test).long()

# Combine the images and labels into a dataset
test_dataset = TensorDataset(x_test, y_test)

# Create a dataloader to load data in batches. Set batch size to 32.
# shuffle=False keeps the evaluation order deterministic.
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
```
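When the dataset size is not a multiple of the batch size, the last batch is smaller than the rest. The sketch below illustrates this with synthetic data: 400 tiny zero-filled "images" (the 8×8 size and the names `fake_x`, `fake_y`, and `fake_loader` are illustrative assumptions, not the real test set).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins: 400 tiny "images" and their labels (shapes are illustrative)
fake_x = torch.zeros(400, 8, 8)
fake_y = torch.cat([torch.ones(200), torch.zeros(200)]).long()

fake_loader = DataLoader(TensorDataset(fake_x, fake_y), batch_size=32, shuffle=False)

# Record how many samples each batch actually contains
batch_sizes = [labels.size(0) for _, labels in fake_loader]
print(len(batch_sizes), batch_sizes[-1])
```

This is why an evaluation loop is safer when it accumulates the sample count from `labels.size(0)` rather than assuming every batch holds 32 samples.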

### Step 3: Prepare Your Trained Model  (5 pts)
- Define the architecture to match exactly with the trained model intended for inference. Ensure strict alignment to avoid errors during evaluation.
- Load the weights from the trained model and set the model to evaluation mode.

```python
import torch.nn as nn

# Declare the model architecture (must match the model trained with BCE loss)
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(256*256*1, 256),
    nn.ReLU(),
    nn.Linear(256, 1)
).cuda()

# Load the trained weights
checkpoint_path = "model_classification.pth"
model.load_state_dict(torch.load(checkpoint_path))

# Set the model to evaluation mode
model.eval()
```
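The save-and-restore step can be tried end to end without the real checkpoint. The sketch below is a minimal illustration under assumptions: `build_model` is a hypothetical, much smaller stand-in for the architecture above, and `demo_weights.pth` is a made-up file name. The `map_location` argument of `torch.load` remaps saved tensors onto whichever device is available, which helps when weights trained on a GPU are loaded elsewhere.

```python
import os
import tempfile

import torch
import torch.nn as nn

def build_model():
    # Hypothetical small stand-in; the real model takes 256*256 inputs
    return nn.Sequential(nn.Flatten(), nn.Linear(16, 4), nn.ReLU(), nn.Linear(4, 1))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

trained = build_model()
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_weights.pth")  # hypothetical file name
    torch.save(trained.state_dict(), path)

    # The architecture must be rebuilt identically before loading the weights
    restored = build_model().to(device)
    # map_location remaps the saved tensors onto the device available here
    restored.load_state_dict(torch.load(path, map_location=device))

# Evaluation mode disables training-only behavior such as dropout
restored.eval()
```

If the rebuilt architecture differs from the saved one, `load_state_dict` raises an error about missing or unexpected keys, which is why the layer layout has to match exactly.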

### Step 4: Perform Inference and Calculate the Accuracy (10 pts)
- Ensure the image values are processed in a manner consistent with the training phase.
- Use the model that was trained with BCE loss to execute inference on the test dataset.
- Note that inference should be performed on the GPU.

```python
test_correct = 0
test_total = 0

# Gradients are not needed for inference
with torch.no_grad():
    for images, labels in test_loader:

        # Scale pixel values to [0, 1], as in the training phase
        images = images / 255
        images = images.cuda()

        labels = labels.cuda()

        outputs = model(images)

        labels_float = labels.float().unsqueeze(1)  # Convert labels to float and match shape with outputs
        predicted = torch.sigmoid(outputs) > 0.5    # Threshold the positive-class probability

        test_correct += (predicted.float() == labels_float).sum().item()
        test_total += labels.size(0)

print(f'Test accuracy is {100 * test_correct / test_total:.2f}%.')
```
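The `unsqueeze(1)` in the loop above matters for the accuracy count as well as for the loss. The sketch below uses invented logits and labels to show what happens without it: comparing a `(4, 1)` tensor with a `(4,)` tensor broadcasts to a `(4, 4)` comparison, so the number of "correct" entries can exceed the number of samples.

```python
import torch

outputs = torch.tensor([[2.0], [-1.0], [0.5], [-3.0]])  # hypothetical logits, shape (4, 1)
labels = torch.tensor([1, 0, 0, 0])                      # shape (4,)

predicted = (torch.sigmoid(outputs) > 0.5).float()       # shape (4, 1)

# Shapes aligned, as in the loop above: element-wise comparison
correct_aligned = (predicted == labels.float().unsqueeze(1)).sum().item()

# Shapes misaligned: (4, 1) vs (4,) broadcasts to a (4, 4) comparison
correct_broadcast = (predicted == labels.float()).sum().item()

print(correct_aligned, correct_broadcast)
```

With these values the aligned comparison counts 3 correct predictions out of 4, while the broadcast comparison counts 8 matches, which would report an accuracy above 100%.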