<div align="center">

# National Tsing Hua University

### Fall 2023

#### 11210IPT 553000

#### Deep Learning in Biomedical Optical Imaging

## Homework 2

</div>


### ✏️ Task A: Transitioning to Cross-Entropy Loss (20 pts)

In the lab, we used the **Binary Cross-Entropy (BCE) Loss** for a binary classification task. The BCE loss is defined as:

$$ \text{BCE}(y, \hat{y}) = - \left( y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right) $$

Here, $y$ is the true label (0 or 1), and $\hat{y}$ denotes the predicted probability of $y=1$.

In this task, we aim to explore the implementation of a model using **Cross-Entropy (CE) Loss**, which is a more common approach for classification tasks, especially when dealing with multiple classes. CE loss is expressed as:

$$ \text{CE}(y, \hat{y}) = -\sum_{i} y^{(i)} \log(\hat{y}^{(i)}) $$

In this expression, $y$ is the ground-truth label (one-hot encoded over the classes), $\hat{y}$ is the vector of class probabilities predicted by your model, and $i$ is the index of the class.
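
To make the link between the two formulas concrete, the sketch below checks numerically that, for exactly two classes, CE computed on a pair of logits matches BCE computed on their difference. The batch values and variable names are illustrative assumptions and are not taken from the lab code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 2 class logits each
logits = torch.randn(4, 2)
labels = torch.tensor([0, 1, 1, 0])  # integer class indices

# CE over two classes (softmax is applied internally)
ce = F.cross_entropy(logits, labels)

# The same problem as a single-logit binary task:
# softmax([z0, z1])[1] == sigmoid(z1 - z0)
binary_logit = logits[:, 1] - logits[:, 0]
bce = F.binary_cross_entropy_with_logits(binary_logit, labels.float())

print(f"CE = {ce.item():.6f}, BCE = {bce.item():.6f}")
```

Both losses use mean reduction by default, so the two printed values should agree up to floating-point error.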


#### 1. Modify the Loss (3 pts)
Transition to using Cross-Entropy (CE) Loss for the classification task by utilizing PyTorch's built-in functionalities. You can refer to the [official PyTorch documentation](https://pytorch.org/docs/stable/nn.html) for detailed information and guidance to ensure the correct implementation of the CE loss.

In [None]:
import torch.nn as nn

# Replace '...' with the appropriate loss function in PyTorch
loss = nn.CrossEntropyLoss()
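
As a quick check of what `nn.CrossEntropyLoss` expects, the following sketch compares it with a manual log-softmax computation. The logits and targets are made-up values; the point is that the criterion takes raw, unnormalized scores and integer class indices.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Made-up raw scores (logits) for 2 samples and 2 classes
logits = torch.tensor([[2.0, 0.5],
                       [0.1, 1.2]])
targets = torch.tensor([0, 1])  # class indices, dtype int64

loss = criterion(logits, targets)

# Manual version: log-softmax, pick the log-probability of the true class, average
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(loss.item(), manual.item())
```

Because the softmax is part of the loss, the model's last layer should not apply its own softmax before this criterion.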

#### 2. Modify the Model Architecture (2 pts)
To adapt the original code for use with Cross-Entropy (CE) loss, make necessary modifications to the model architecture. Ensure it is compatible and optimized for the application of CE loss. Consider the number of output nodes and the activation function used in the output layer for effective multi-class classification.

In [None]:
# Modifying the architecture to be compatible with CE loss
ce_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(256*256*1, 256),
    nn.ReLU(),
    nn.Linear(256, 2)
).cuda()
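
A small shape check can confirm that the two-output architecture produces one logit per class. This sketch builds a CPU copy of the same layers (so it runs without a GPU) and feeds it a random dummy batch; the input shape `(4, 256, 256)` is an assumption about how the images are stored.

```python
import torch
import torch.nn as nn

# CPU copy of the architecture above, for a quick sanity check only
cpu_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(256 * 256 * 1, 256),
    nn.ReLU(),
    nn.Linear(256, 2),
)

dummy_images = torch.rand(4, 256, 256)  # assumed layout: (batch, height, width)

with torch.no_grad():
    logits = cpu_model(dummy_images)
    probs = torch.softmax(logits, dim=1)  # only for inspection; the loss does this itself

print(logits.shape)      # one row per image, one column per class
print(probs.sum(dim=1))  # each row of probabilities sums to 1
```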

#### 3. Reflection Questions (15 pts, 5 pts for each)
Provide detailed answers to the questions below:

**Q1. Loss Function Comparison:**  
   What are the differences between Binary Cross-Entropy (BCE) loss and Cross-Entropy (CE) loss?

**Q2. Model Architecture Modification:**  
   What motivated the specific changes you made to the model architecture?

**Q3. Adapting to CE Loss:**  
   In the original code configured for BCE loss, two major adjustments are needed for adaptation to CE loss. Analyze and explain the necessity for these changes, referring to the code below.

```python
for images, labels in train_loader:
    images = images.cuda()
    images = images / 255.0
    labels = labels.cuda()
    optimizer.zero_grad()
    outputs = model(images)

    # Change #1: Adaptation to the labels for CE loss
    labels = labels.long()  # Changed from labels.float().unsqueeze(1) for BCE loss

    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    total_loss += loss.item()

    # Change #2: Predictions for CE loss
    train_predicted = outputs.argmax(-1)  # Changed from torch.sigmoid(outputs) > 0.5 for BCE loss
    train_correct += (train_predicted == labels).sum().item()
```

#### Put Your Response Here:

##### 1. The main difference is that BCE is used for binary classification tasks, where there are two possible classes, whereas CE can also be used for multi-class classification tasks with more than two classes. As a result, the labels and outputs have different meanings. In the BCE case, the model has a single output and we normally apply a sigmoid activation, because a float value between 0 and 1 can represent the probability that the sample belongs to the positive class; the label is a float (0 or 1). In the CE case, the model has one output neuron per class and the target label for each sample is an integer giving the index of the true class. The softmax function is applied to these outputs to transform the real-valued scores into class probabilities; in PyTorch, `nn.CrossEntropyLoss` applies this step internally, so the model itself outputs raw logits.

##### 2. In the model architecture, we changed the last layer to `nn.Linear(256, 2)`. With CE loss, the number of output neurons equals the number of classes, and the layer computes a weighted sum of its inputs for each class. We chose 2 because we only have two classes: 0 for normal and 1 for abnormal.

##### 3. For change #1: CE loss compares the probability distribution produced by the network with the true class labels, which must be integer indices. Therefore we use `.long()` to convert the labels to integers representing the true class of each sample, instead of the float labels of shape `(N, 1)` used with BCE loss. For change #2: unlike the BCE case, where the sigmoid probability is thresholded at 0.5, with CE loss we use `outputs.argmax(-1)` to find the index of the class with the highest score (and therefore the highest probability) for each sample. The index `-1` refers to the last axis of the tensor, so `argmax(-1)` returns the index of the maximum value along the class axis.
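
The two adjustments discussed in Q3 can be seen side by side in the sketch below. It uses random logits and a tiny hand-written label tensor, and it assumes the BCE variant of the lab used `nn.BCEWithLogitsLoss` (the original criterion is not shown here, so treat that as an assumption).

```python
import torch
import torch.nn as nn

labels = torch.tensor([0.0, 1.0, 1.0, 0.0])  # labels as floats, shape (4,)

ce_logits = torch.randn(4, 2)   # CE model: one logit per class
bce_logits = torch.randn(4, 1)  # BCE model: a single logit per sample

# Change #1: label format
ce_loss = nn.CrossEntropyLoss()(ce_logits, labels.long())                    # int64, shape (4,)
bce_loss = nn.BCEWithLogitsLoss()(bce_logits, labels.float().unsqueeze(1))   # float, shape (4, 1)

# Change #2: turning outputs into predicted classes
ce_pred = ce_logits.argmax(-1)                                   # index of the largest logit
bce_pred = (torch.sigmoid(bce_logits) > 0.5).squeeze(1).long()   # threshold the probability

print(ce_pred.shape, bce_pred.shape)
```

Both prediction tensors end up with shape `(4,)`, so either can be compared directly with `labels.long()` when counting correct predictions.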

### ✏️ Task B: Creating an Evaluation Code (20 pts)

Evaluate the performance of a pretrained deep learning model on a test dataset of chest X-ray images stored in `test_normal.npy` and `test_pneumonia.npy`. Each file contains 200 grayscale chest X-ray images (normal and pneumonia, respectively), each of size 256×256. The objective is to calculate the model’s accuracy, defined as the percentage of images correctly classified. To do this, write code that loads and processes this dataset and evaluates the model on it. Make sure each segment of code that replaces a `...` placeholder is functional and follows the steps in the instructions.

**Note: ⚠️ Be sure to upload your trained model's weights to your working environment if needed.**

### Step 0: Download test dataset

In [None]:
!wget https://raw.githubusercontent.com/TacoXDD/homeworks/master/dataset/test/test_normal.npy
!wget https://raw.githubusercontent.com/TacoXDD/homeworks/master/dataset/test/test_pneumonia.npy


### Step 1: Prepare your test dataset

In [None]:
import numpy as np

test_abnormal = np.load('test_pneumonia.npy')
test_normal = np.load('test_normal.npy')

print(f'Shape of test_abnormal: {test_abnormal.shape}')
print(f'Shape of test_normal: {test_normal.shape}')

# For the data having presence of pneumonia assign 1, for the normal ones assign 0.
test_abnormal_labels = np.ones((test_abnormal.shape[0],))
test_normal_labels = np.zeros((test_normal.shape[0],))

x_test = np.concatenate((test_abnormal, test_normal), axis=0)
y_test = np.concatenate((test_abnormal_labels, test_normal_labels), axis=0)

print(f'Shape of x_test: {x_test.shape}')
print(f'Shape of y_test: {y_test.shape}')
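
The labeling and concatenation logic can be checked without the real files. This sketch uses small hypothetical stand-in arrays (3 "pneumonia" and 2 "normal" images, with an assumed `(N, 256, 256)` layout) and confirms that images and labels stay aligned.

```python
import numpy as np

# Hypothetical stand-ins for the real arrays
fake_abnormal = np.full((3, 256, 256), 200, dtype=np.uint8)
fake_normal = np.full((2, 256, 256), 50, dtype=np.uint8)

fake_x = np.concatenate((fake_abnormal, fake_normal), axis=0)
fake_y = np.concatenate((np.ones(3), np.zeros(2)), axis=0)

# The first 3 labels are class 1, the last 2 are class 0, in the same order as the images
class_counts = np.bincount(fake_y.astype(int))
print(fake_x.shape, fake_y.dtype, class_counts)
```

Note that `np.ones` and `np.zeros` produce `float64` labels, which is why the labels are converted explicitly when they are turned into tensors.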

### Step 2: Load Test Images into PyTorch DataLoader (5 pts)

In [None]:
import torch
from torch.utils.data import DataLoader, TensorDataset

# Convert to PyTorch tensors
x_test = torch.from_numpy(x_test).float()
y_test = torch.from_numpy(y_test).long()

# Combine the images and labels into a dataset
test_dataset = TensorDataset(x_test, y_test)

# Create a dataloader to load data in batches. Set batch size to 32.
# Shuffling is not needed for evaluation, so the original order is kept.
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
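
To see how a `DataLoader` splits a dataset whose size is not a multiple of the batch size, the sketch below uses a hypothetical dataset of 70 blank images. The sizes are chosen only for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset: 70 blank images with balanced labels
images = torch.zeros(70, 256, 256)
labels = torch.cat([torch.ones(35), torch.zeros(35)]).long()

loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=False)

batch_sizes = [batch_images.shape[0] for batch_images, _ in loader]
print(batch_sizes)  # [32, 32, 6]
```

The last batch is smaller than the rest, so accuracy should be computed by dividing by the total number of samples seen rather than by the number of batches times the batch size.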

### Step 3: Prepare Your Trained Model  (5 pts)
- Define the architecture to match exactly with the trained model intended for inference. Ensure strict alignment to avoid errors during evaluation.
- Load the weights from the trained model and set the model to evaluation mode.

In [None]:
# Declare the model architecture
import torch.nn as nn
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(256*256*1, 256),
    nn.ReLU(),
    nn.Linear(256, 1)
).cuda()

# Load the trained weights
model.load_state_dict(torch.load('model_classification.pth'))

# Set the model to evaluation mode
model.eval()
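
The save-and-restore round trip behind `load_state_dict` can be tried on a tiny model. This is a minimal sketch under stated assumptions: `make_model` is a hypothetical helper, the layer sizes are arbitrary, and an in-memory buffer stands in for the `.pth` file.

```python
import io
import torch
import torch.nn as nn

def make_model():
    # Hypothetical small architecture; both copies must be defined identically
    return nn.Sequential(nn.Flatten(), nn.Linear(16, 4), nn.ReLU(), nn.Linear(4, 1))

trained = make_model()

# Save only the weights (state_dict) to an in-memory buffer instead of a file
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# Rebuild the same architecture, then load the weights into it
restored = make_model()
state = torch.load(buffer, map_location="cpu")  # map_location lets GPU-saved weights load on CPU
restored.load_state_dict(state)
restored.eval()
```

If the restored architecture differs from the saved one (for example, a different number of output nodes), `load_state_dict` raises an error, which is why the definition must match the trained model exactly.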

### Step 4: Perform Inference and Calculate the Accuracy (10 pts)
- Ensure the image values are processed in a manner consistent with the training phase.
- Use the model that was trained with BCE loss to execute inference on the test dataset.
- Note that inference should be performed on the GPU.

In [None]:
test_correct = 0
test_total = 0

with torch.no_grad():
    for images, labels in test_loader:

        images = images.cuda()
        images = images / 255.

        labels = labels.cuda()

        outputs = model(images)

        # BCE setup: one logit per image, so give the labels a matching (N, 1) float shape
        labels_float = labels.float().unsqueeze(1)
        predicted = torch.sigmoid(outputs) > 0.5

        test_correct += (predicted.float() == labels_float).sum().item()
        test_total += labels.size(0)

# Compute the accuracy once, after all batches have been processed
test_accuracy = test_correct / test_total

print(f'Test accuracy is {100*test_accuracy:.2f}%.')

NameError: ignored
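
The evaluation loop can also be wrapped in a reusable function and tested on data with a known answer. The sketch below is one possible arrangement, not the required solution: `evaluate` and the toy model are hypothetical, and the device falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, device):
    """Return the fraction of correctly classified samples for a single-logit (BCE) model."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images = images.to(device) / 255.0  # same scaling as in training
            labels = labels.to(device)
            outputs = model(images)             # shape (N, 1)
            predicted = (torch.sigmoid(outputs) > 0.5).squeeze(1).long()
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return correct / total

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy model with hand-set weights: logit = sum(pixels) - 2
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(4, 1)).to(device)
with torch.no_grad():
    toy_model[1].weight.fill_(1.0)
    toy_model[1].bias.fill_(-2.0)

bright = torch.full((5, 2, 2), 255.0)  # scaled sum is 4, logit is +2, class 1
dark = torch.zeros(5, 2, 2)            # scaled sum is 0, logit is -2, class 0
x = torch.cat([bright, dark])
y = torch.cat([torch.ones(5), torch.zeros(5)]).long()

loader = DataLoader(TensorDataset(x, y), batch_size=4)
accuracy = evaluate(toy_model, loader, device)
print(f"Toy accuracy: {100 * accuracy:.2f}%")  # Toy accuracy: 100.00%
```

Returning the accuracy from a function avoids relying on variables defined in earlier notebook cells, which is a common source of `NameError` when cells are run out of order.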