# ARIN7102 Applied Datamining and Text Analytics: Assignment 1

## Question 5: Support Vector Machine

Complete and hand in this completed worksheet with your assignment submission. 

In this exercise you will:
    
- Understand the logic of the following code to use SVM for image classification.
- Implement a linear SVM by calling functions from scikit-learn module. 
- Tune parameters of SVM. Analyze the tuned results.

TEST ACC BASELINE for default SVM + PCA (128 Components) model --> 23%

#### 1. Prepare CIFAR10 dataset
Prepare CIFAR-10 images here for image classification.

In [1]:
import torch
import torchvision.datasets as datasets


cifar_trainset = datasets.CIFAR10(root='./dataset', train=True, download=True, transform=None)
cifar_testset = datasets.CIFAR10(root='./dataset', train=False, download=True, transform=None)

X_train = torch.tensor(cifar_trainset.data) / 255
X_test = torch.tensor(cifar_testset.data) / 255
y_train = torch.tensor(cifar_trainset.targets)
y_test = torch.tensor(cifar_testset.targets)

print('Training set:', )
print('  data shape:', X_train.shape)
print('  labels shape: ', y_train.shape)
print('Test set:')
print('  data shape: ', X_test.shape)
print('  labels shape', y_test.shape)

Files already downloaded and verified
Files already downloaded and verified
Training set:
  data shape: torch.Size([50000, 32, 32, 3])
  labels shape:  torch.Size([50000])
Test set:
  data shape:  torch.Size([10000, 32, 32, 3])
  labels shape torch.Size([10000])


#### 2. Extract feature vectors
We use PCA (128 components) to extract the feature vectors. Make sure you understand the logic of the following code. Feel free to propose another method to improve your model, including any feature selection method.

In [2]:
#################################################################################################################
# TODO:                                                                                                         #
# Feature extraction method.                                                                                    #
# Currently we are using PCA method to extract the features, but you can also change it to improve your model   #
#################################################################################################################
'''
Reference on other feature extractors: 
1. https://medium.com/the-owl/extracting-features-from-an-intermediate-layer-of-a-pretrained-model-in-pytorch-c00589bda32b 
2. https://learnopencv.com/pytorch-for-beginners-image-classification-using-pre-trained-models/ 
3. https://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html 
'''
# ***** START OF YOUR CODE *****
from sklearn.decomposition import PCA

pca = PCA(n_components=128)
num_train = X_train.shape[0]
num_test = X_test.shape[0]

X_train_feat = pca.fit_transform(X_train.reshape(num_train, -1))
X_test_feat = pca.transform(X_test.reshape(num_test, -1))
# ***** END OF YOUR CODE *****

print(X_train_feat.shape)

(50000, 128)
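One way to judge whether 128 components is a reasonable choice is to look at the cumulative explained variance of the fitted PCA. The sketch below is only an illustration on synthetic data (a random low-rank matrix standing in for the flattened 3072-dimensional images); the variable names and the 95% threshold are assumptions, not part of the assignment.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for flattened images: 500 samples, 3072 features (32*32*3).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 20))            # 20 hidden factors (assumption)
mixing = rng.normal(size=(20, 3072))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 3072))

pca = PCA(n_components=128, random_state=0)
X_feat = pca.fit_transform(X)

# Cumulative share of the total variance kept by the first k components.
cumulative = np.cumsum(pca.explained_variance_ratio_)
# Smallest number of components whose cumulative share reaches 95%.
k95 = int(np.searchsorted(cumulative, 0.95) + 1)
print(X_feat.shape, round(float(cumulative[-1]), 4), k95)
```

On the real data, the same `explained_variance_ratio_` attribute of the fitted `pca` object can be inspected directly after `fit_transform`.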


#### 3. Implement a Linear Support Vector Machine (SVM)
Check the documentation at https://scikit-learn.org/stable/modules/classes.html#module-sklearn.svm to see how to use these functions to train and test a Support Vector Machine (SVM). Then implement your SVM in the following *TODO* block. Before implementing the function, read the function description and the NOTE inside it. Write your code only in the predefined places.

- You may try any kind of data preprocessing to improve your testing results.
- You may tune any parameters, such as kernels and penalty functions, to improve your testing results.
- You can also use a feature extractor other than PCA, such as LDA, ResNet18, ResNet50, or a transformer.
- Do not change the dataset.
- Do not use any GPU acceleration in your implementation. We will evaluate your model using CPU only. If training your model takes more than 1 hour, we will disqualify it from the competition, but you will still get some points.
- Then write down the methods and parameters you tried, together with a detailed analysis of the results. If you use extra libraries in your implementation, mention them in your analysis so that we can reproduce your model.

In [3]:
from sklearn import svm

#################################################################################################################
# Step 1 TODO:                                                                                                  #
# Training a linear SVM using training set.                                                                     #
# You need to write code that trains a SVM. Please refer to the document of sklearn.svm.                        #
#                                                                                                               #
# Currently the test accuracy for default SVM + PCA (as feature extractor) is ~23%. Try to improve it           #
# with your own implementation!                                                                                 #
#                                                                                                               #
# You may tune SVM parameters such as kernels and penalty functions to improve your testing results.            #
# You can also use a feature extractor other than PCA, such as LDA, HOG, or pretrained deep learning models     #
# such as ResNets or Transformers.                                                                              #
# Then, you need to write down what methods / parameters you have tried and the detailed analysis of results.   #
# Note that you can either write your detailed analysis in the next block or in your answer sheet               #
#################################################################################################################
# *****START OF YOUR CODE *****

# Create a svm instance and Train the svm instance svc with the training set samples 'X_train_feat' and labels 'y_train' as input
svc = svm.SVC(kernel='linear')
svc.fit(X_train_feat, y_train)
# *****END OF YOUR CODE *****


########################################################################################################
# Step 2 TODO:                                                                                         #
# Using the trained SVM to predict the classification results of validation set.                       #
# You need to write code that tests the SVM you implemented above.                                     #
# Again you may refer to the document to see if any functions are already defined for prediction.      #
########################################################################################################
# ***** START OF YOUR CODE *****

# Test our trained svm instance svc with the test set samples 'X_test_feat'
y2_pred = svc.predict(X_test_feat)
# ***** END OF YOUR CODE *****

# Print the predicted outputs and accuracies.
# We use precision, recall and f-measure to measure the accuracy of image classification. 
# To understand the meaning of these metrics, you can refer to 
# https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html.
from sklearn.metrics import classification_report
print(classification_report(y_test, y2_pred))

              precision    recall  f1-score   support

           0       0.46      0.50      0.48      1000
           1       0.46      0.49      0.48      1000
           2       0.31      0.30      0.30      1000
           3       0.32      0.32      0.32      1000
           4       0.36      0.28      0.31      1000
           5       0.34      0.32      0.33      1000
           6       0.42      0.50      0.46      1000
           7       0.47      0.43      0.45      1000
           8       0.51      0.53      0.52      1000
           9       0.46      0.48      0.47      1000

    accuracy                           0.41     10000
   macro avg       0.41      0.41      0.41     10000
weighted avg       0.41      0.41      0.41     10000
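The worksheet asks for parameter tuning, so here is a minimal sketch of how the penalty parameter `C` could be searched with cross-validation. It runs on a small synthetic dataset from `make_classification`, not on CIFAR-10, and the grid values are arbitrary assumptions. It also swaps in `LinearSVC` (liblinear) for `SVC(kernel='linear')` (libsvm) because it usually trains faster on large sample counts; the two are not identical, since `LinearSVC` defaults to squared hinge loss and a one-vs-rest scheme.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Small synthetic 3-class problem standing in for the extracted features.
X, y = make_classification(n_samples=600, n_features=40, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Standardize the features, then fit a linear SVM; the scaler is refit in every CV fold.
pipe = make_pipeline(StandardScaler(), LinearSVC(max_iter=5000))
grid = GridSearchCV(pipe, {"linearsvc__C": [0.01, 0.1, 1.0, 10.0]}, cv=3)
grid.fit(X_tr, y_tr)

best_C = grid.best_params_["linearsvc__C"]
test_acc = grid.score(X_te, y_te)   # accuracy of the refit best model
print(best_C, round(test_acc, 3))
```

Keeping the scaler inside the pipeline means the held-out fold never influences the scaling statistics, which keeps the cross-validation scores honest.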



#### Better Feature Extraction
To get better test results, I use a pretrained ResNet34 as the feature extractor and then train an SVM on the extracted features to perform the classification.

In [1]:
import torch
import torch.nn as nn
import time
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader
from sklearn.svm import SVC
from tqdm import tqdm

In [2]:
batch_size = 64
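# Use a pretrained ResNet34 to extract image features.
# Alternative backbones and their weights:
# resnet 18: models.ResNet18_Weights.DEFAULT
# resnet 34: models.ResNet34_Weights.DEFAULT
# resnet 50: models.ResNet50_Weights.DEFAULT
model = models.resnet34(weights=models.ResNet34_Weights.DEFAULT)
# Replace the classification head with an identity so the model outputs the 512-d pooled features.
model.fc = nn.Identity()
# Freeze all weights; the network is used for inference only.
for param in model.parameters():
    param.requires_grad = False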

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
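To make the preprocessing concrete, the following NumPy sketch mimics what `ToTensor` followed by `Normalize` does to a single image: scale to [0, 1], move channels first, then subtract the per-channel mean and divide by the per-channel standard deviation. The helper name `to_tensor_and_normalize` is hypothetical, and the `Resize` step is deliberately left out.

```python
import numpy as np

# Per-channel statistics used in the Normalize call above.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def to_tensor_and_normalize(img_hwc_uint8):
    """Mimic ToTensor + Normalize on an H x W x 3 uint8 image (no resizing)."""
    x = img_hwc_uint8.astype(np.float32) / 255.0   # scale to [0, 1]
    x = np.transpose(x, (2, 0, 1))                 # HWC -> CHW
    return (x - MEAN[:, None, None]) / STD[:, None, None]

white = np.full((2, 2, 3), 255, dtype=np.uint8)    # tiny all-white test image
out = to_tensor_and_normalize(white)
print(out.shape, out[:, 0, 0])
```

For a white pixel each channel becomes `(1 - mean) / std`, which is why normalized inputs are no longer confined to [0, 1].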

**To match the 224x224 input size expected by ResNet34, the dataset is reloaded with the transform applied; nothing else is changed.**

In [3]:
start_time = time.time()

cifar_trainset = datasets.CIFAR10(root='./dataset', train=True, download=False, transform=transform)
cifar_testset = datasets.CIFAR10(root='./dataset', train=False, download=False, transform=transform)

train_loader = DataLoader(cifar_trainset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(cifar_testset, batch_size=batch_size, shuffle=False)

In [4]:
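def extract_features(model, loader, tip):
    """Run `model` over `loader` and return (features, labels) as NumPy arrays."""
    model.eval()  # inference mode: batch-norm layers use their stored statistics
    features = []
    labels = []
    with torch.no_grad():  # gradients are not needed for feature extraction
        for inputs, targets in tqdm(loader, desc=f"{tip} dataset feature extract: "):
            outputs = model(inputs)
            features.append(outputs)
            labels.append(targets)
    features = torch.cat(features, dim=0)
    labels = torch.cat(labels, dim=0)
    return features.numpy(), labels.numpy()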

X_train_feat, y_train = extract_features(model, train_loader,"train")
X_test_feat, y_test = extract_features(model, test_loader,"test")
featrue_extraction_time = time.time()

print(X_train_feat.shape)
print(y_train.shape)

train dataset feature extract: 100%|██████████| 782/782 [17:35<00:00,  1.35s/it]
test dataset feature extract: 100%|██████████| 157/157 [03:23<00:00,  1.29s/it]

(50000, 512)
(50000,)
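Feature extraction dominates the runtime, so while experimenting with SVM settings it can help to cache the extracted arrays on disk and skip the forward passes on later runs. This is a sketch under assumptions: `load_or_compute` and the file name are hypothetical, and `fake_extract` stands in for a call such as `extract_features(model, train_loader, "train")`.

```python
import os
import tempfile
import numpy as np

def load_or_compute(path, compute_fn):
    """Return (features, labels) cached at `path`; call compute_fn() only on a cache miss."""
    if os.path.exists(path):
        with np.load(path) as cached:
            return cached["features"], cached["labels"]
    features, labels = compute_fn()
    np.savez(path, features=features, labels=labels)
    return features, labels

calls = {"n": 0}

def fake_extract():
    """Stand-in for the expensive feature extraction; counts how often it runs."""
    calls["n"] += 1
    rng = np.random.default_rng(0)
    return rng.normal(size=(8, 512)).astype(np.float32), np.arange(8) % 10

cache_path = os.path.join(tempfile.mkdtemp(), "train_feat.npz")
f1, l1 = load_or_compute(cache_path, fake_extract)   # computes and saves
f2, l2 = load_or_compute(cache_path, fake_extract)   # loads from disk
print(calls["n"], f2.shape)
```

The cache only speeds up repeated experiments; a fresh run, such as the graded one, still pays the full extraction cost.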





In [5]:
#################################################################################################################
# Step 1 TODO:                                                                                                  #
# Training a linear SVM using training set.                                                                     #
#################################################################################################################

svm = SVC(kernel='linear')
svm.fit(X_train_feat, y_train)

########################################################################################################
# Step 2 TODO:                                                                                         #
# Using the trained SVM to predict the classification results of validation set.                       #
# Test the SVM you implemented above.                                                                  #
########################################################################################################
from sklearn.metrics import classification_report
y2_pred = svm.predict(X_test_feat)
print(classification_report(y_test, y2_pred))

print("Feature extract time consumption",featrue_extraction_time - start_time)
print("SVM time consumption:",time.time() - featrue_extraction_time)
print("Total time consumption:",time.time() - start_time)

              precision    recall  f1-score   support

           0       0.88      0.91      0.89      1000
           1       0.93      0.94      0.94      1000
           2       0.82      0.84      0.83      1000
           3       0.77      0.81      0.79      1000
           4       0.84      0.85      0.84      1000
           5       0.86      0.85      0.86      1000
           6       0.92      0.88      0.90      1000
           7       0.92      0.89      0.90      1000
           8       0.94      0.93      0.94      1000
           9       0.94      0.91      0.93      1000

    accuracy                           0.88     10000
   macro avg       0.88      0.88      0.88     10000
weighted avg       0.88      0.88      0.88     10000

Feature extract time consumption 1259.9525394439697
SVM time consumption: 169.76516246795654
Total time consumption: 1429.7177019119263
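Note that the "SVM time consumption" figure above covers fitting, prediction, and the report together. If a finer breakdown is wanted, a small timing helper is one option. The sketch below uses only the standard library; the `timed` helper and the dummy workloads are assumptions standing in for `svm.fit(...)` and `svm.predict(...)`.

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(name):
    """Record the wall-clock duration of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start

with timed("fit"):          # stand-in for svm.fit(X_train_feat, y_train)
    sum(i * i for i in range(100_000))
with timed("predict"):      # stand-in for svm.predict(X_test_feat)
    sum(i for i in range(10_000))

timings["total"] = timings["fit"] + timings["predict"]
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```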


# Analysis:
###############################################################################

TODO: what methods/ parameters you have tried and the detailed analysis of results.
You can give your analysis either in the notebook or in a pdf

###############################################################################

- PCA+SVM: PCA projects the high-dimensional pixel data into a low-dimensional space, which already gives some degree of class separation. However, PCA is unsupervised: it keeps the directions of largest variance and pays no attention to intra-class compactness or inter-class separability, so the differences between categories become less distinct after projection. This makes it hard for the SVM to find a decision boundary that classifies most samples correctly, which both lengthens SVM training and lowers the final accuracy (41% with a linear kernel on 128 components). To address this, I looked for features that capture information from different channels and positions of the image and that separate the categories better. For this purpose, I chose the ResNet family for feature extraction.

- ResNet combines convolutional layers with skip connections and turns the input image into a flat feature vector just before the final fully connected (fc) layer. Because the network was pretrained for image classification on ImageNet, these vectors tend to place different classes far apart in feature space, although this is not guaranteed for a new dataset. Since the final test has to run on CPU within a one-hour time limit, using a small pretrained model is essential.

- During the experiment, I tested three different ResNet models: ResNet18, ResNet34 and ResNet50.
    - ResNet18: With its smaller capacity, ResNet18 extracted weaker features from the input images, leading to slightly lower classification performance than ResNet34 when its features were fed into the SVM.
    - ResNet34: With its stronger feature extraction, ResNet34 performed better. I removed the model's fc layer and fed the extracted 512-dimensional feature vectors directly into the SVM for classification. To match the input size the pretrained model expects, I resized the input images to 224×224.
    - ResNet50: Compared to ResNet34, ResNet50 has a more complex structure and more parameters, which in theory gives it more powerful feature extraction. However, the image content in CIFAR-10 is relatively simple, so it does not require an overly complex network, and ResNet50 inference on a CPU takes too much time. After testing, I therefore decided to abandon ResNet50.

- With ResNet34 features, the whole pipeline ran in about 24 minutes on CPU (1430 seconds), well within the one-hour limit, and reached 88% classification accuracy on the CIFAR-10 test set, which meets the expected results.
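Whether a backbone fits the one-hour CPU budget can be estimated from a few timed batches before committing to a full run. The sketch below plugs in the per-batch rates reported by the progress bars in the extraction log (1.35 s and 1.29 s per batch of 64); the helper `estimate_seconds` is hypothetical, and the estimate ignores data loading and SVM training.

```python
import math

def estimate_seconds(num_images, batch_size, seconds_per_batch):
    """Estimated extraction time: number of batches times the measured time per batch."""
    return math.ceil(num_images / batch_size) * seconds_per_batch

train_s = estimate_seconds(50_000, 64, 1.35)   # 782 batches
test_s = estimate_seconds(10_000, 64, 1.29)    # 157 batches
total_min = (train_s + test_s) / 60

print(f"train: {train_s:.0f} s, test: {test_s:.0f} s, total: {total_min:.1f} min")
```

The estimate comes to roughly 1258 seconds (about 21 minutes), close to the measured feature extraction time of about 1260 seconds. Timing a handful of batches for a larger backbone would give a comparable estimate without running the whole extraction.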

**Data Analysis**  
- The experimental results show that the ResNet34 + SVM model achieves an overall accuracy of 88% on the CIFAR-10 test set, and the performance is fairly consistent across categories (f1-scores range from 0.79 to 0.94). For example, the f1-scores of categories 1 (automobile), 8 (ship) and 9 (truck) reach 0.94, 0.94 and 0.93 respectively, indicating that the extracted features for these categories are distinctive and the classification is accurate.

- However, category 3 (cat) has a precision of only 0.77, the lowest of all categories. This means that images from other categories are relatively often predicted as category 3, probably because they look similar to it. Larger models such as ResNet50 or ResNet101 might extract richer features and reduce this confusion, but they did not fit within the hardware and time limits, so the final accuracy is somewhat lower than it could be.

- In addition, the running times in the CPU environment were:
    - Feature extraction: 1260 seconds (21 min)
    - SVM process: 170 seconds (2.8 min)
    - Total process: 1430 seconds (24 min)

- Overall, by combining deep learning with a traditional machine learning method, the model balances feature quality against computational cost and achieves good classification performance, but there is still room for improvement in distinguishing individual categories.
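To see which categories are actually confused with category 3, a confusion matrix would be more informative than precision alone. The sketch below builds one with NumPy on a tiny made-up label set (not the real predictions) and lists the true classes most often predicted as a given class; the helper names are hypothetical. `sklearn.metrics.confusion_matrix` follows the same convention (rows are true labels, columns are predictions) and could be applied to `y_test` and `y2_pred` directly.

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def top_confusions(cm, predicted_class, k=2):
    """True classes most often mistaken for `predicted_class` (these lower its precision)."""
    column = cm[:, predicted_class].copy()
    column[predicted_class] = 0                      # ignore correct predictions
    order = np.argsort(column)[::-1][:k]
    return [(int(c), int(column[c])) for c in order if column[c] > 0]

# Toy labels for illustration only.
y_true = np.array([3, 3, 3, 5, 5, 5, 5, 2, 2, 0])
y_pred = np.array([3, 3, 5, 3, 3, 5, 5, 3, 2, 0])

cm = confusion_counts(y_true, y_pred, num_classes=10)
precision_3 = cm[3, 3] / cm[:, 3].sum()
print(top_confusions(cm, predicted_class=3), precision_3)
```

In this toy example, class 5 accounts for two of the three wrong "class 3" predictions, which is the kind of pattern the real matrix would reveal or rule out.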