#  Deep Learning Assignment 1
Author: 林彥成, P88101029, NCKU
Date: 21 Mar, 2023

## General description


Please download the dataset from Moodle. For details about the dataset, see the README included with it.

Please build an "Image Classification Pipeline" based on the following:

* Reading images to an array
* Feature extraction (transform the image into a fixed-length feature vector)
* Apply any classifier to verify the performance.

We suggest using the OpenCV library to extract features and a linear classifier (perceptron) to predict the labels. Perceptron implementations are easy to find on GitHub, for example:

* Perceptron: https://github.com/Vercaca/Perceptron/blob/master/perceptron.py 
* Reference: https://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/readings/L03%20Linear%20Classifiers.pdf

Using any open-source perceptron implementation is acceptable, but you must indicate which GitHub repository you adopted.
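
To make the suggested linear classifier concrete, here is a minimal NumPy sketch of a multiclass perceptron. It is not taken from the linked repository; the function names `train_perceptron` and `predict_scores` and the toy data are assumptions made for illustration only.

```python
import numpy as np

def train_perceptron(X, y, num_classes, epochs=10):
    """Multiclass perceptron: one weight vector per class, updated on mistakes only."""
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # append a constant bias feature
    W = np.zeros((num_classes, X.shape[1]))
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = int(np.argmax(W @ xi))
            if pred != yi:
                W[yi] += xi    # pull the true class towards the sample
                W[pred] -= xi  # push the wrongly predicted class away
    return W

def predict_scores(W, X):
    """Per-class scores; the argmax of each row is the predicted label."""
    X = np.hstack([X, np.ones((X.shape[0], 1))])
    return X @ W.T

# Toy, linearly separable data standing in for extracted image features.
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]])
y_demo = np.array([0, 0, 1, 1])
W_demo = train_perceptron(X_demo, y_demo, num_classes=2)
pred_demo = np.argmax(predict_scores(W_demo, X_demo), axis=1)
print(pred_demo)
```

The per-class scores returned by `predict_scores` can also be ranked to obtain top-k predictions, which is why the sketch returns scores rather than labels.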

You should adopt at least one image feature extractor to transform the pixel domain into another feature domain.

Recommended image feature extraction methods include, but are not limited to:

* Global color histogram [ref: https://en.wikipedia.org/wiki/Color_histogram]
* HoG [ref: https://en.wikipedia.org/wiki/Histogram_of_oriented_gradients]
* BoVW [ref: https://medium.com/analytics-vidhya/bag-of-visual-words-bag-of-features-9a2f7aec7866]
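
As an illustration of the first method in the list, the following is a minimal sketch of a global color histogram written with NumPy only. The function name `color_histogram`, the choice of 8 bins per channel, and the synthetic image are assumptions, not part of the assignment.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Global color histogram of an H x W x 3 uint8 image as one fixed-length vector."""
    image = np.asarray(image)
    channels = []
    for c in range(3):
        # Count the intensities of one channel into `bins` equal-width bins.
        hist, _ = np.histogram(image[..., c], bins=bins, range=(0, 256))
        channels.append(hist)
    feature = np.concatenate(channels).astype(np.float64)
    # Normalize so that images of different sizes stay comparable.
    return feature / feature.sum()

# Synthetic image standing in for a real sample from the dataset.
rng = np.random.default_rng(0)
demo_image = rng.integers(0, 256, size=(64, 64, 3)).astype(np.uint8)
demo_feature = color_histogram(demo_image)
print(demo_feature.shape)  # (24,)
```

Whatever the input resolution, the output always has `3 * bins` entries, which is the fixed-length property the pipeline requires.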

Note that you should train the model on the training set and evaluate its performance on the validation and test sets.

## Assignment Requirements:

Your code needs to provide the following functionality:
* Any classifier is acceptable, for example an SVM, a perceptron, a linear classifier, or even a neural network classifier. In this assignment, you should choose at least three different classifiers and evaluate their performance on our dataset.
* Image feature extraction (any existing package may be used).
* For the training phase, your algorithm should train a model on the features extracted from the training samples.
* For the testing phase, your code should classify the extracted features of the test samples.
* At least one ensemble learning-based classifier should be used, such as AdaBoost, XGBoost, LightGBM, or CatBoost.
* A README (Markdown format) explaining how to train a model and evaluate its performance.
* Complete source code (Colab or GitHub link).
* A Word/PDF report that shows:
    * The GitHub/Colab link
    * Which models you used
    * The curves of the training accuracy and validation accuracy
    * The predicted results on the validation/test sets (evaluated by top-1 and top-5 accuracy; DO NOT use any existing package/toolbox)
* A performance comparison among the three classifiers.

---

# Environment information

In [2]:
!nvidia-smi

Tue Mar 21 17:22:45 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57       Driver Version: 515.57       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:06:00.0  On |                  N/A |
| 25%   43C    P5    24W / 125W |    918MiB /  6144MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [3]:
!pip3 freeze > requirements.txt

# Reading the image data and preparing the dataset

In [44]:
import os
import zipfile
dataset_path = './dataset'
if not os.path.exists(dataset_path):
    with zipfile.ZipFile('./images.zip', 'r') as zf:
        zf.extractall(path=dataset_path)
else:
    print('The images.zip has been extracted.')

The images.zip has been extracted.


# Loading the required packages

In [42]:
%matplotlib inline
import os
import random
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from PIL import Image
import torchvision.models as models
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import learning_curve
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
import numpy as np
import matplotlib.pyplot as plt

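# Image feature extraction

## Reading the data and building the feature extractor

* First, build a custom dataset with PyTorch's `Dataset` class.
* Next, wrap the three split files (train, val, test) as datasets and load them with PyTorch's `DataLoader` for later use.
* Finally, create a feature extractor (in this assignment I use the pretrained ResNet50 provided by PyTorch).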

In [12]:
# Define a custom dataset
class dataset_C0(torch.utils.data.Dataset):
    def __init__(self, file_path, image_size=256):
        with open(file_path, 'r') as f:
            lines = f.readlines()
        self.data = [line.rstrip().split() for line in lines]
        self.transform = transforms.Compose([
            transforms.Resize(image_size),
            transforms.CenterCrop(image_size-1),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],std=[0.229, 0.224, 0.225])
        ])
        self.root_path = './dataset/'

    def __getitem__(self, idx):
        image_path, image_label = self.data[idx]
        img = Image.open(self.root_path+image_path).convert('RGB')
        img = self.transform(img)
        return img, int(image_label)

    def __len__(self):
        return len(self.data)

# train set, val set, test set
IMG_SIZE = 64
BATCH_SIZE = 128
train_data = dataset_C0(os.path.join(dataset_path,'train.txt'), IMG_SIZE)
val_data = dataset_C0(os.path.join(dataset_path,'val.txt'), IMG_SIZE)
test_data = dataset_C0(os.path.join(dataset_path,'test.txt'), IMG_SIZE)

train_loader = torch.utils.data.DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=False)
val_loader = torch.utils.data.DataLoader(val_data, batch_size=BATCH_SIZE, shuffle=False)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=False)

# Build the feature extractor
model = models.resnet50(pretrained=True)

In [13]:
print('train set size = ',len(train_data))
print('val set size = ',len(val_data))
print('test set size = ',len(test_data))

train set size =  63325
val set size =  450
test set size =  450


## Extracting features with the feature extractor

In [39]:
feature_name_list = ['Resnet50','HoG']

### Resnet50

* Because I use a pretrained ResNet50, we only need the features it produces just before the FC layer (the FC layer itself acts as the classifier).

In [14]:
# Feature extraction
def extract_features(data_loader, model):
    features_extractor = nn.Sequential(*list(model.children())[:-1])  # keep every layer except the final FC layer
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model.eval()
    model.to(device)

    # Loop over all images and collect their features (the vector that enters the FC layer)
    features, labels = [], []
    for i, (inputs, targets) in enumerate(data_loader):
        inputs = inputs.to(device)
        # print(inputs.shape)
        with torch.no_grad():
            temp = features_extractor(inputs).flatten(start_dim=1)
        features.append(temp.cpu().numpy())
        labels.append(targets.numpy())
    return np.concatenate(features), np.concatenate(labels)  # NumPy arrays in the (n_samples, n_features) layout that sklearn classifiers expect


# Extract the features of the train, val, and test sets with the function above
train_features, train_labels = extract_features(train_loader, model)
val_features, val_labels = extract_features(val_loader, model)
test_features, test_labels = extract_features(test_loader, model)

In [18]:
feature_name = feature_name_list[0]
np.save(os.path.join(dataset_path,'train_features_'+feature_name),train_features)
np.save(os.path.join(dataset_path,'train_labels_'+feature_name),train_labels)
np.save(os.path.join(dataset_path,'val_features_'+feature_name),val_features)
np.save(os.path.join(dataset_path,'val_labels_'+feature_name),val_labels)
np.save(os.path.join(dataset_path,'test_features_'+feature_name),test_features)
np.save(os.path.join(dataset_path,'test_labels_'+feature_name),test_labels)

### HoG
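
This section is empty in the notebook. As a rough idea of what a HoG extractor computes, here is a simplified NumPy sketch: per-cell histograms of unsigned gradient orientation, weighted by gradient magnitude. It leaves out the block normalization of the full HoG descriptor, and the function name `hog_features`, the 8-pixel cells, and the 9 bins are assumptions rather than the author's settings.

```python
import numpy as np

def hog_features(gray, cell_size=8, num_bins=9):
    """Simplified HoG: per-cell histograms of unsigned gradient orientation."""
    gray = np.asarray(gray, dtype=np.float64)
    gy, gx = np.gradient(gray)  # gradients along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned angle
    bin_width = 180.0 / num_bins
    bin_index = np.minimum((orientation / bin_width).astype(int), num_bins - 1)

    cells_y = gray.shape[0] // cell_size
    cells_x = gray.shape[1] // cell_size
    feature = np.zeros((cells_y, cells_x, num_bins))
    for cy in range(cells_y):
        for cx in range(cells_x):
            rows = slice(cy * cell_size, (cy + 1) * cell_size)
            cols = slice(cx * cell_size, (cx + 1) * cell_size)
            # Each pixel votes for its orientation bin with its gradient magnitude.
            feature[cy, cx] = np.bincount(
                bin_index[rows, cols].ravel(),
                weights=magnitude[rows, cols].ravel(),
                minlength=num_bins,
            )
    feature = feature.ravel()
    return feature / (np.linalg.norm(feature) + 1e-12)  # global L2 normalization

# A horizontal intensity ramp: every gradient points along x, so only bin 0 is used.
demo_ramp = np.tile(np.arange(64.0), (64, 1))
demo_hog = hog_features(demo_ramp)
print(demo_hog.shape)  # (576,)
```

For a 64 x 64 input the vector has 8 x 8 cells times 9 bins, so it could be saved and loaded with the same `np.save` / `np.load` pattern used for the ResNet50 features.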

# Evaluation metrics

In [88]:

# Column index of the highest predicted probability for each validation sample
sorted_indices = np.argsort(val_preds_5, axis=1)[:, ::-1][:, :1]
top_k_accuracies = sorted_indices == val_labels.reshape(-1, 1)


In [37]:
def top_1_acc(y_true, y_pred):
    """
    Compute top-1 accuracy.

    Parameters:
    y_true -- true labels (numpy array, m or m x 1)
    y_pred -- predicted class probabilities (numpy array, m x n)
    m -- number of samples
    n -- number of classes

    Returns:
    accuracy -- top-1 accuracy (float)
    """
    y_true = np.asarray(y_true).ravel()  # avoid (m, 1) vs (m,) broadcasting to (m, m)
    sorted_indices = np.flip(np.argsort(y_pred, axis=1), axis=1)  # sort in descending order
    pred_top1 = sorted_indices[:, 0]
    accuracy = np.mean(y_true == pred_top1)

    return accuracy

def top_5_acc(y_true, y_pred):
    """
    Compute top-5 accuracy.

    Parameters:
    y_true -- true labels (numpy array, m or m x 1)
    y_pred -- predicted class probabilities (numpy array, m x n)
    m -- number of samples
    n -- number of classes

    Returns:
    accuracy -- top-5 accuracy (float)
    """
    y_true = np.asarray(y_true).ravel()
    sorted_indices = np.flip(np.argsort(y_pred, axis=1), axis=1)  # sort in descending order
    pred_top5 = sorted_indices[:, :5]
    num_correct = 0
    for i in range(y_true.shape[0]):
        # A sample counts as correct if its label is among the 5 highest-scoring classes
        if y_true[i] in pred_top5[i, :]:
            num_correct += 1
    accuracy = num_correct / len(y_true)

    return accuracy
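
The two functions above differ only in how many columns they keep, so they can be folded into a single function of k. The sketch below is one way to write it; the name `top_k_accuracy_sketch` is hypothetical (it is not the `calculate_top_k_accuracy` helper called elsewhere in this notebook, whose definition is not shown), and it assumes that column j of the score matrix corresponds to label j.

```python
import numpy as np

def top_k_accuracy_sketch(y_true, y_score, k):
    """Fraction of samples whose true label is among the k highest-scoring columns."""
    y_true = np.asarray(y_true).reshape(-1, 1)
    # Column indices sorted by descending score, truncated to the first k.
    top_k = np.argsort(y_score, axis=1)[:, ::-1][:, :k]
    hits = (top_k == y_true).any(axis=1)
    return float(hits.mean())

demo_scores = np.array([[0.1, 0.6, 0.3],
                        [0.5, 0.2, 0.3],
                        [0.2, 0.3, 0.5],
                        [0.6, 0.3, 0.1]])
demo_labels = np.array([1, 2, 0, 0])
print(top_k_accuracy_sketch(demo_labels, demo_scores, 1))  # 0.5
print(top_k_accuracy_sketch(demo_labels, demo_scores, 2))  # 0.75
```

Reshaping `y_true` to a column makes the comparison broadcast row by row, which avoids the accidental (m, m) comparison that occurs when an (m, 1) label array is compared with an (m,) prediction array.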



In [36]:
import numpy as np

# Example data
y_true = np.array([2,1,0,0,2,1]).reshape(6,1)
y_pred = np.array([[0.2, 0.7, 0.1], 
                   [0.8, 0.3, 0.7], 
                   [0.8, 0.3, 0.7],
                   [0.5, 0.3, 0.7],
                   [0.5, 0.3, 0.7],
                   [0.1, 0.2, 0.2]])

# Compute top-1 accuracy
accuracy = calculate_top_k_accuracy(y_true, y_pred, 1)
print("Top 1 accuracy:", accuracy)

# Compute top-2 accuracy
accuracy = calculate_top_k_accuracy(y_true, y_pred, 2)
print("Top 2 accuracy:", accuracy)


top_1 = top_1_acc(y_true, y_pred)
print("Top 1 accuracy:", top_1)

top_5 = top_5_acc(y_true, y_pred)
print("Top 5 accuracy:", top_5)


Top 1 accuracy: 0.0
Top 2 accuracy: 0.0
Top 1 accuracy: 0.3333333333333333
Top 5 accuracy: 0.6666666666666666


# Training and evaluating the models

## Feature selection

Available features: 'Resnet50', 'HoG'

In [40]:
feature_name = feature_name_list[0]
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(feature_name)
print(device)

Resnet50
cuda:0


In [45]:
train_features = np.load(os.path.join(dataset_path,'train_features_'+feature_name+'.npy'))
train_labels = np.load(os.path.join(dataset_path,'train_labels_'+feature_name+'.npy'))
val_features = np.load(os.path.join(dataset_path,'val_features_'+feature_name+'.npy'))
val_labels = np.load(os.path.join(dataset_path,'val_labels_'+feature_name+'.npy'))
test_features = np.load(os.path.join(dataset_path,'test_features_'+feature_name+'.npy'))
test_labels = np.load(os.path.join(dataset_path,'test_labels_'+feature_name+'.npy'))

## Machine Learning

### Adaboost

In [48]:
train_index = np.arange(len(train_features))
random.shuffle(train_index)
tf = train_features[train_index[:2000],:]
tl = train_labels[train_index[:2000]]


In [67]:
le = LabelEncoder()
y = val_labels.copy()
le.fit(y)

In [84]:
top_1_acc(val_labels,val_preds_5)

  top_k_accuracies = sorted_indices==y_true


0.0

In [49]:
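best_depth = 3

# Build the AdaBoost classifier.
# The base learner is passed positionally because the keyword was renamed from
# `base_estimator` to `estimator` in newer scikit-learn releases.
adaboost = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=best_depth)
)

# Train the AdaBoost classifier on the 2000-sample subset (tf, tl)
# adaboost.fit(train_features, train_labels)
adaboost.fit(tf, tl)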



In [52]:
calculate_top_k_accuracy(val_preds_5, val_labels,1)

0.0

In [50]:
# val
# predict_proba returns one column per entry of adaboost.classes_. Assuming the labels
# are the integers 0..49 and all of them occur in the training subset, the column index
# equals the class label, so the top-k functions defined earlier can be applied directly.
val_preds = adaboost.predict(val_features)
val_preds_5 = adaboost.predict_proba(val_features)
print(val_preds_5.shape)
print(f'Top1 Accuracy on validation set: {top_1_acc(val_labels, val_preds_5):.4f}')
print(f'Top5 Accuracy on validation set: {top_5_acc(val_labels, val_preds_5):.4f}')

(450, 50)
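
One detail matters when ranking the output of `predict_proba`: in scikit-learn its columns follow the order of the fitted classifier's `classes_` attribute, so a column index is only equal to a label when the labels are 0..n-1 and all of them were seen during training. The sketch below shows the general mapping with NumPy; `top_k_labels` and the demo arrays are hypothetical, and in the notebook `adaboost.classes_` would be passed as `classes`.

```python
import numpy as np

def top_k_labels(proba, classes, k):
    """Map the k highest-probability columns of each row back to class labels."""
    top_columns = np.argsort(proba, axis=1)[:, ::-1][:, :k]
    return np.asarray(classes)[top_columns]

# Hypothetical case: the classifier only saw labels 3, 7 and 9 during training.
demo_classes = np.array([3, 7, 9])
demo_proba = np.array([[0.2, 0.5, 0.3],
                       [0.6, 0.1, 0.3]])
demo_top2 = top_k_labels(demo_proba, demo_classes, k=2)
print(demo_top2)  # rows: [7 9] and [3 9]
```

This is relevant when a classifier is trained on a random subset, because a class missing from the subset would shift every later column by one position.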

### XGBoost

### LightGBM

### Linear SVM

### KNN
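
This section is empty in the notebook. As a sketch of how a k-nearest-neighbors classifier could produce the per-class scores needed for top-1 and top-5 evaluation, here is a NumPy-only version. The function name `knn_predict_proba`, the Euclidean distance, and the toy data are assumptions; `KNeighborsClassifier` from scikit-learn, which the notebook imports, offers the same idea with more options.

```python
import numpy as np

def knn_predict_proba(train_x, train_y, query_x, num_classes, k=5):
    """Class-vote fractions among the k nearest training samples (Euclidean distance)."""
    # Squared distances via ||q - t||^2 = ||q||^2 - 2 q.t + ||t||^2
    d2 = (
        (query_x ** 2).sum(axis=1, keepdims=True)
        - 2.0 * query_x @ train_x.T
        + (train_x ** 2).sum(axis=1)
    )
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = np.zeros((query_x.shape[0], num_classes))
    for row, neighbors in enumerate(nearest):
        # Count how many of the k neighbors belong to each class.
        votes[row] = np.bincount(train_y[neighbors], minlength=num_classes)
    return votes / k

# Toy data: two well-separated clusters standing in for extracted features.
demo_train_x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                         [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
demo_train_y = np.array([0, 0, 0, 1, 1, 1])
demo_query = np.array([[0.2, 0.2], [5.5, 5.5]])
demo_knn = knn_predict_proba(demo_train_x, demo_train_y, demo_query, num_classes=2, k=3)
print(demo_knn)
```

Each row of the result sums to 1, so it can be passed to the top-k accuracy functions in the same way as the output of `predict_proba`.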

## Neural Network

In [44]:
# Wrap the extracted features in a DataLoader
def feature_loader(features, labels, BATCH_SIZE, shuffle=False):
    X = torch.tensor(features, dtype=torch.float32)
    # CrossEntropyLoss expects integer class indices of shape (N,)
    y = torch.tensor(labels, dtype=torch.long)
    dataset = torch.utils.data.TensorDataset(X, y)
    return torch.utils.data.DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=shuffle)

# Only the training data needs to be shuffled
train_feature_loader = feature_loader(train_features, train_labels, BATCH_SIZE, shuffle=True)
val_feature_loader = feature_loader(val_features, val_labels, BATCH_SIZE)
test_feature_loader = feature_loader(test_features, test_labels, BATCH_SIZE)

### CNN

In [41]:


# The pretrained ResNet50 in `model` expects image tensors and raises a conv2d shape
# error when given the 2048-d feature vectors. As an assumed stand-in for the undefined
# CNN class, a small fully connected head is trained on the extracted features instead.
NUM_CLASSES = 50
torch.manual_seed(42)
clf = nn.Sequential(
    nn.Linear(train_features.shape[1], 512),
    nn.ReLU(),
    nn.Linear(512, NUM_CLASSES),
).to(device)
loss_fcn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(clf.parameters(), lr=1e-4, momentum=0.9)
# optimizer = torch.optim.Adam(params=clf.parameters(), lr=1e-3)

def count_topk(logits, targets, k):
    """Number of samples whose true label is among the k highest logits."""
    topk = torch.argsort(logits, dim=1, descending=True)[:, :k]
    return (topk == targets.view(-1, 1)).any(dim=1).sum().item()

def run_epoch(loader, train):
    """One pass over `loader`; returns mean loss, top-1 accuracy, and top-5 accuracy."""
    clf.train(train)
    total_loss, top1, top5, n = 0.0, 0, 0, 0
    with torch.set_grad_enabled(train):
        for X, y in loader:
            # Labels must be a 1-D int64 tensor for CrossEntropyLoss
            X, y = X.to(device), y.to(device).view(-1).long()
            logits = clf(X)
            loss = loss_fcn(logits, y)
            if train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * len(y)
            top1 += count_topk(logits, y, 1)
            top5 += count_topk(logits, y, 5)
            n += len(y)
    return total_loss / n, top1 / n, top5 / n

EPOCH = 30
train_loss_list, train_top1_list, train_top5_list = [], [], []
val_loss_list, val_top1_list, val_top5_list = [], [], []
for epoch in range(EPOCH):
    ### training
    train_loss, train_top1, train_top5 = run_epoch(train_feature_loader, train=True)
    train_loss_list.append(train_loss)
    train_top1_list.append(train_top1)
    train_top5_list.append(train_top5)

    ### validation
    val_loss, val_top1, val_top5 = run_epoch(val_feature_loader, train=False)
    val_loss_list.append(val_loss)
    val_top1_list.append(val_top1)
    val_top5_list.append(val_top5)

    print(f"Epoch {epoch + 1}: Train loss: {train_loss:.5f}, Train top-1: {train_top1:.4f}, Train top-5: {train_top5:.4f} | "
          f"Validation loss: {val_loss:.5f}, Validation top-1: {val_top1:.4f}, Validation top-5: {val_top5:.4f}")