# Feature Extraction
Convolutional neural networks can also be used purely as feature extractors. Take the Dogs vs. Cats image dataset as an example: a CNN extracts features from each image, and any classification algorithm can then classify the images from those features. We will use a Support Vector Machine (SVM), with the extracted features acting as the columns of its input. The flow is: download the dataset, extract features with a pretrained CNN, then train the SVM on those features.

First, download the dataset from Kaggle: obtain your own `kaggle.json` API token, place it in the working directory, and run the next cell.

In [1]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle competitions download -c dogs-vs-cats

Downloading train.zip to /content
 99% 540M/543M [00:04<00:00, 134MB/s]
100% 543M/543M [00:04<00:00, 129MB/s]
Downloading test1.zip to /content
 98% 265M/271M [00:02<00:00, 108MB/s] 
100% 271M/271M [00:02<00:00, 99.1MB/s]
Downloading sampleSubmission.csv to /content
  0% 0.00/86.8k [00:00<?, ?B/s]
100% 86.8k/86.8k [00:00<00:00, 91.2MB/s]


#### Extract the images.

In [None]:
!unzip -q test1.zip
!unzip -q train.zip

Arrange the images so they can be passed to `datasets.ImageFolder`, which expects one subfolder per label. We therefore need `dog` and `cat` subfolders inside the main `train` directory.

In [None]:
!cd train && mkdir dog cat
!cd test1 && mkdir dog cat
!mv train/dog.* train/dog
!mv train/cat.* train/cat
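
The shell commands above can also be sketched in Python, which makes the one-folder-per-label layout explicit. The helper name `sort_into_class_folders` and the demo file names are assumptions for illustration; the sketch relies only on the `<label>.*` naming pattern seen in the `mv` commands.

```python
import tempfile
from pathlib import Path


def sort_into_class_folders(root, labels=("dog", "cat")):
    """Move files named '<label>.<anything>' into root/<label>/ (hypothetical helper)."""
    root = Path(root)
    moved = {}
    for label in labels:
        target = root / label
        target.mkdir(exist_ok=True)
        count = 0
        # sorted() materializes the matches before any file is moved.
        for path in sorted(root.glob(f"{label}.*")):
            if path.is_file():
                path.rename(target / path.name)
                count += 1
        moved[label] = count
    return moved


# Demo on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["dog.0.jpg", "dog.1.jpg", "cat.0.jpg"]:
        (Path(tmp) / name).touch()
    moved = sort_into_class_folders(tmp)
    layout = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*.jpg"))
    print(moved, layout)
```

After the call, every image sits under a folder named after its label, which is the structure `datasets.ImageFolder` reads labels from.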

#### Import Required libraries

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
%matplotlib inline
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import BaggingClassifier
from sklearn import svm
import torch
import torchvision
from torchvision import transforms, datasets, models
from torch.utils.data import Dataset, DataLoader, random_split
from torch import nn
from torch.nn import functional as F
import os

In [None]:
# select cuda device
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [None]:
# Apply the following transformations to the dataset.
# ResNet is conventionally fed 224x224 inputs; the 244x244 crop used here
# also works because torchvision's ResNet ends in adaptive average pooling.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(244),
    transforms.ToTensor(),
    transforms.Normalize((0,), (0.50,)),
])

dataset = datasets.ImageFolder(root='train', transform=transform)
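
To see what the last two transforms do numerically, the following NumPy sketch mimics them on a few pixel values. It assumes the usual behaviour of `ToTensor` (scaling 8-bit pixels to [0, 1]) and `Normalize` (subtracting the mean and dividing by the standard deviation); the pixel values are made up.

```python
import numpy as np

# Made-up 8-bit pixel intensities.
pixels = np.array([0, 51, 255], dtype=np.uint8)

# ToTensor: scale 8-bit values into [0, 1].
scaled = pixels.astype(np.float32) / 255.0

# Normalize((0,), (0.50,)): (x - mean) / std with mean 0 and std 0.5.
mean, std = 0.0, 0.5
normalized = (scaled - mean) / std

print(normalized)  # values now lie in [0, 2]
```

With a mean of 0 and a standard deviation of 0.5, the transform simply doubles the scaled pixel values, which is why the printed tensors further down contain entries above 1.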

In [None]:
split_ratio = 0.8
data_len = len(dataset)
train_size = int(split_ratio * data_len)
test_size = data_len - train_size

# Randomly splits data into given sizes
train_dataset, test_dataset = random_split(dataset, lengths=(train_size, test_size))
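
Conceptually, a random split shuffles the sample indices and cuts them into two disjoint groups. The NumPy sketch below illustrates that idea on index arrays only; it is not how `random_split` is implemented internally, and the total of 25000 images is inferred from the 20000 / 5000 sizes reported later in this section.

```python
import numpy as np

split_ratio = 0.8
data_len = 25000  # assumed total: 20000 train + 5000 test images
train_size = int(split_ratio * data_len)
test_size = data_len - train_size

# Shuffle all indices once, then slice into two disjoint groups.
rng = np.random.default_rng(seed=0)  # fixed seed for a reproducible split
indices = rng.permutation(data_len)
train_idx, test_idx = indices[:train_size], indices[train_size:]

print(train_size, test_size)
```

In recent PyTorch versions, a reproducible split can likewise be obtained by passing a seeded `torch.Generator` to `random_split` through its `generator` argument.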

`datasets.ImageFolder` yields each sample as a tuple of two elements: the first is the preprocessed image and the second is its label.

In [6]:
print(train_dataset[0])
print(test_dataset[0])

(tensor([[[0.2980, 0.2745, 0.2745,  ..., 0.4627, 0.4784, 0.4706],
         [0.2824, 0.2588, 0.2667,  ..., 0.4706, 0.4863, 0.4784],
         [0.2667, 0.2588, 0.2588,  ..., 0.4706, 0.4784, 0.4706],
         ...,
         [1.0431, 1.0431, 1.0431,  ..., 0.6667, 0.5412, 0.5490],
         [1.0667, 1.1059, 1.1608,  ..., 0.7529, 0.6745, 0.5647],
         [1.1608, 1.1922, 1.1686,  ..., 0.6275, 0.7294, 0.7294]],

        [[0.2431, 0.2588, 0.2745,  ..., 0.4706, 0.4863, 0.4784],
         [0.2353, 0.2431, 0.2667,  ..., 0.4784, 0.4941, 0.4863],
         [0.2431, 0.2431, 0.2510,  ..., 0.4784, 0.4863, 0.4784],
         ...,
         [0.6039, 0.5647, 0.5569,  ..., 0.6667, 0.5412, 0.5490],
         [0.6667, 0.6039, 0.6353,  ..., 0.7529, 0.6745, 0.5647],
         [0.7608, 0.7137, 0.6824,  ..., 0.6196, 0.7294, 0.7294]],

        [[0.2980, 0.2824, 0.2902,  ..., 0.5098, 0.5255, 0.5176],
         [0.2745, 0.2745, 0.2824,  ..., 0.5176, 0.5333, 0.5255],
         [0.2431, 0.2588, 0.2745,  ..., 0.5176, 0.5255, 0

In [None]:
trainloader = DataLoader(train_dataset, batch_size=1, shuffle=True)
testloader = DataLoader(test_dataset, batch_size=1, shuffle=True)

In [None]:
criterion = nn.NLLLoss()

Initialize the pretrained ResNet model.

In [9]:
model_rsn_pre = models.resnet50(pretrained=True)

Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /root/.cache/torch/checkpoints/resnet50-19c8e357.pth
100%|██████████| 97.8M/97.8M [00:02<00:00, 45.5MB/s]


In [10]:
model_rsn_pre.to(device)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

Now we only need the features extracted from each image. The pretrained ResNet was trained on the ImageNet dataset, so its last layer has 1000 nodes, one per class. We therefore take the output of the second-to-last layer as the extracted features. That layer is a pooling layer, so its three-dimensional per-image output must be flattened before it is passed to the SVM.

In [None]:
# Function to get the output of the second-to-last layer
def pooling_output(x):
    global model_rsn_pre
    for layer_name, layer in model_rsn_pre._modules.items():
        x = layer(x)
        if layer_name == 'avgpool':
            break
    return x
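
The shape handling described above can be checked without the network. The NumPy sketch below applies global average pooling to a random stand-in for the last convolutional feature map and flattens the result; the batch size and the 8 x 8 spatial size are illustrative assumptions, while 2048 is the channel count used later in this section.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the final convolutional feature map: (batch, channels, height, width).
feature_map = rng.random((1, 2048, 8, 8))

# Global average pooling collapses each channel's spatial grid to one value.
pooled = feature_map.mean(axis=(2, 3), keepdims=True)  # (1, 2048, 1, 1)

# Flatten everything except the batch dimension before handing it to the SVM.
flat = pooled.reshape(pooled.shape[0], -1)  # (1, 2048)

print(pooled.shape, flat.shape)
```

Each image therefore ends up as a single row of 2048 numbers, one per channel.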

In [None]:
# Returns the features and their corresponding labels 
def feature_extractor(dataloader):
    labels = []
    features = []
    with torch.no_grad():
        model_rsn_pre.eval()
        for inputs, label in dataloader:
            result = pooling_output(inputs.to(device))
            features.append(result.cpu().view(1, -1))
            labels.append(label)
            torch.cuda.empty_cache()
    return features, labels

In [None]:
# extract features from train data
train_features, train_labels = feature_extractor(trainloader)

In [None]:
# extract features from test data
test_features, test_labels = feature_extractor(testloader)

Each extracted feature tensor has one extra dimension, so use `torch.stack` and then `tensor.squeeze` to fix the dimensions.

In [None]:
train_features = torch.stack(train_features)
train_labels = torch.stack(train_labels)
test_features = torch.stack(test_features)
test_labels = torch.stack(test_labels)

In [None]:
train_features = train_features.squeeze()
train_labels = train_labels.squeeze()
test_features = test_features.squeeze()
test_labels = test_labels.squeeze()
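
The effect of stacking and squeezing is easiest to see on shapes. This NumPy sketch uses `np.stack` and `squeeze`, which behave like their `torch` counterparts for this purpose; the five random feature rows are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
# Five placeholder feature rows, each shaped (1, 2048) like one loop iteration's output.
features = [rng.random((1, 2048)) for _ in range(5)]

stacked = np.stack(features)   # adds a new leading axis: (5, 1, 2048)
squeezed = stacked.squeeze()   # drops the size-1 axis:   (5, 2048)

# Concatenating along axis 0 reaches the same shape in one step.
concatenated = np.concatenate(features, axis=0)  # (5, 2048)

print(stacked.shape, squeezed.shape, concatenated.shape)
```

Note that `squeeze()` without an argument removes every size-1 axis, so a dataset holding a single sample would also lose its batch axis; naming the axis, as in `squeeze(1)`, avoids that.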

`sklearn` expects NumPy arrays as input, so convert everything to NumPy.

In [None]:
train_features = train_features.numpy()
train_labels = train_labels.numpy()
test_features = test_features.numpy()
test_labels = test_labels.numpy()

In [None]:
# reshape numpy array in the format that is required by SVM 
train_features = train_features.reshape(len(train_features), 2048,)
train_labels = train_labels.reshape(len(train_labels),)
test_features = test_features.reshape(len(test_features), 2048,)
test_labels = test_labels.reshape(len(test_labels),)

In [19]:
print(train_features.shape)
print(train_labels.shape)

(20000, 2048)
(20000,)


#### Defining the SVM model
Here `BaggingClassifier` wraps the SVM to speed up training: each of the `n_estimators` SVMs is fit on only `1 / n_estimators` of the training samples.

In [20]:
n_estimators = 10 
clf = BaggingClassifier(svm.LinearSVC(random_state=42, verbose=True, max_iter=1000000), max_samples=1.0 / n_estimators, n_estimators=n_estimators)
clf.fit(train_features,train_labels)

[LibLinear][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear][LibLinear]

BaggingClassifier(base_estimator=LinearSVC(C=1.0, class_weight=None, dual=True,
                                           fit_intercept=True,
                                           intercept_scaling=1,
                                           loss='squared_hinge',
                                           max_iter=1000000, multi_class='ovr',
                                           penalty='l2', random_state=42,
                                           tol=0.0001, verbose=True),
                  bootstrap=True, bootstrap_features=False, max_features=1.0,
                  max_samples=0.1, n_estimators=10, n_jobs=None,
                  oob_score=False, random_state=None, verbose=0,
                  warm_start=False)
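
To make the subsampling concrete, the sketch below draws the index sets that ten bagged estimators would train on under the settings shown above (`max_samples=0.1`, `bootstrap=True`). It only imitates the sampling idea with NumPy and is not scikit-learn's internal code.

```python
import numpy as np

n_samples = 20000      # training set size from this section
n_estimators = 10

# max_samples = 1 / n_estimators, so each estimator gets a tenth of the data.
subset_size = n_samples // n_estimators

rng = np.random.default_rng(42)
# Each estimator sees its own bootstrap sample (indices drawn with replacement).
subsets = [rng.integers(0, n_samples, size=subset_size) for _ in range(n_estimators)]

print(subset_size, len(subsets))
```

Every SVM is trained on 2000 samples instead of 20000, and the ensemble combines their predictions at inference time.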

In [None]:
test_pred = clf.predict(test_features)

In [22]:
train_labels

array([0, 1, 1, ..., 1, 1, 1])

The confusion matrix and accuracy on the test set of 5000 images:

In [23]:
print(metrics.confusion_matrix(test_labels, test_pred))
print(metrics.accuracy_score(test_labels, test_pred))

[[2475   26]
 [  29 2470]]
0.989
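
The accuracy can be recomputed directly from the confusion matrix as a sanity check. The sketch below copies the four counts printed above; rows are true labels and columns are predicted labels, following the `metrics.confusion_matrix` convention.

```python
import numpy as np

# Confusion matrix reported above: rows are true labels, columns are predictions.
cm = np.array([[2475, 26],
               [29, 2470]])

# Correct predictions sit on the diagonal.
accuracy = np.trace(cm) / cm.sum()

# Fraction of each true class that was predicted correctly.
per_class_recall = np.diag(cm) / cm.sum(axis=1)

print(accuracy, per_class_recall)
```

The diagonal holds 4945 of the 5000 test images, which gives the reported accuracy of 0.989.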


### Classification without feature extraction
In this case we pass the images directly to the SVM model without using a CNN.

In [None]:
# Defining transformations
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(244),
    transforms.Grayscale(),
    transforms.ToTensor(),
    transforms.Normalize((0,), (0.50,)),
])

dataset = datasets.ImageFolder(root='train', transform=transform)

In [None]:
split_ratio = 0.8
data_len = len(dataset)
train_size = int(split_ratio * data_len)
test_size = data_len - train_size

# Randomly splits data into given sizes
train_dataset, test_dataset = random_split(dataset, lengths=(train_size, test_size))

Previously each image was represented by 2048 extracted features. Now the original image is passed directly to the SVM, so each flattened input has 244 x 244 = 59536 values. The system would run out of RAM if we tried to hold all 20000 training images of that size in one variable, so we divide the dataset into batches of 1000 and fit the model on each batch.
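
A quick back-of-the-envelope calculation shows the difference in memory. The sketch assumes 4 bytes per value (float32, the dtype `ToTensor` produces) and ignores any extra copies made during training.

```python
# Rough memory estimate for holding every training image in one float32 array.
n_images = 20000
bytes_per_value = 4  # float32

raw_values = 244 * 244   # flattened grayscale image
feature_values = 2048    # ResNet feature vector

raw_gb = n_images * raw_values * bytes_per_value / 1e9
feature_gb = n_images * feature_values * bytes_per_value / 1e9

print(raw_values, round(raw_gb, 2), round(feature_gb, 2))
```

Under these assumptions the raw images need roughly 4.76 GB, compared with about 0.16 GB for the extracted features.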

In [None]:
trainloader = DataLoader(train_dataset, batch_size=1000)
testloader = DataLoader(test_dataset, batch_size=1000)

In [27]:
n_estimators = 10
clf = BaggingClassifier(svm.LinearSVC(random_state=42, max_iter=1000000), max_samples=1.0 / n_estimators, n_estimators=n_estimators)

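# Training over 1000 images per batch.
# Note: fit() retrains from scratch on every call, so the final model reflects only the last batch.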
i = 0
for x, y, in trainloader:
    x = x.view(x.shape[0], -1).numpy()
    y = y.numpy()
    clf.fit(x, y)
    print("Batch:", i)
    i += 1


Batch: 0
Batch: 1
Batch: 2
Batch: 3
Batch: 4
Batch: 5
Batch: 6
Batch: 7
Batch: 8
Batch: 9
Batch: 10
Batch: 11
Batch: 12
Batch: 13
Batch: 14
Batch: 15
Batch: 16
Batch: 17
Batch: 18
Batch: 19
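
Calling `fit` repeatedly does not accumulate knowledge across batches in scikit-learn; each call starts over. One common way to learn from batches is an estimator that supports `partial_fit`. The sketch below swaps in `SGDClassifier` with hinge loss (a linear SVM trained by stochastic gradient descent), which is a different technique from the bagged `LinearSVC` used in this section; the synthetic data and all variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
n_features = 20


def make_batch(n):
    """Synthetic two-class batch: class 1 is shifted away from class 0."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, n_features)) + 3.0 * y[:, None]
    return X, y


# Hinge loss makes this a linear SVM that is fitted incrementally.
clf_inc = SGDClassifier(loss="hinge", random_state=42)

for batch in range(5):
    X, y = make_batch(200)
    # classes is supplied so every batch agrees on the full label set.
    clf_inc.partial_fit(X, y, classes=np.array([0, 1]))

X_test, y_test = make_batch(200)
predictions_inc = clf_inc.predict(X_test)
accuracy_inc = (predictions_inc == y_test).mean()
print(accuracy_inc)
```

Unlike repeated `fit` calls, each `partial_fit` call updates the existing weights, so the model after the loop has seen every batch. Whether this would change the result on the raw pixel data is not something this sketch can show.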


In [None]:
labels = np.array([])
predictions = np.array([])
# Testing in batches of 1000
for x, y, in testloader:
    x = x.view(x.shape[0], -1).numpy()
    # print(x.shape)
    y = y.numpy()
    # print(y.shape)
    a = clf.predict(x)
    predictions = np.concatenate((predictions, a), axis=0)
    labels = np.concatenate((labels, y), axis=0)

In [29]:
print(predictions.shape)

(5000,)


In [30]:
print(metrics.confusion_matrix(labels, predictions))
print(metrics.accuracy_score(labels, predictions))

[[1683  782]
 [1602  933]]
0.5232
