# Microsoft Vision classification example
This example shows a simple way to classify images from any dataset using a pretrained Microsoft Vision model.

Using 1 GPU and under 10 minutes of training, we reach 92.92% accuracy on the CIFAR-10 dataset with a kNN classifier.
We also show how to plug in a linear classifier (scikit-learn's LogisticRegression) on top of the frozen features.

In [None]:
pip install microsoftvision

conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

In [None]:
# This mounts your Google Drive to the Colab VM.
from google.colab import drive
drive.mount('/content/drive')

# TODO: Enter the foldername in your Drive where you have saved the unzipped
# assignment folder, e.g. 'cs231n/assignments/assignment1/'
FOLDERNAME = 'cs231n/assignments/assignment2/'
assert FOLDERNAME is not None, "[!] Enter the foldername."

# Now that we've mounted your Drive, this ensures that
# the Python interpreter of the Colab VM can load
# python files from within it.
import sys
sys.path.append('/content/drive/My Drive/{}'.format(FOLDERNAME))

# This downloads the CIFAR-10 dataset to your Drive
# if it doesn't already exist.
%cd /content/drive/My\ Drive/$FOLDERNAME/cs231n/datasets/
!bash get_datasets.sh
%cd /content/drive/My\ Drive/$FOLDERNAME

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/My Drive/cs231n/assignments/assignment2/cs231n/datasets
/content/drive/My Drive/cs231n/assignments/assignment2


In [None]:
# Setup cell.
import numpy as np
import matplotlib.pyplot as plt
from cs231n.classifiers.cnn import *
from cs231n.data_utils import get_CIFAR10_data
from cs231n.gradient_check import eval_numerical_gradient_array, eval_numerical_gradient
from cs231n.layers import *
from cs231n.fast_layers import *
from cs231n.solver import Solver

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
  """ returns relative error """
  return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [None]:
import torch
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10
import torchvision.transforms as transforms
import numpy as np
from sklearn.linear_model import LogisticRegression
from progressbar import progressbar
from sklearn.neighbors import KNeighborsClassifier
import microsoftvision

The cell above contains the necessary imports. We use PyTorch as the model backend and scikit-learn classifiers (LogisticRegression and KNeighborsClassifier) for simplicity and reproducibility of the results.

The new element is the Microsoft Vision package. It gives access to a pretrained ResNet50 model in a single line, with an interface very similar to torchvision.

Next we define how images are preprocessed. The Microsoft Vision model expects images in BGR format, hence the channel swap at the end of preprocessing.

In [None]:
class Preprocess:
    def __init__(self):
        self.preprocess = transforms.Compose([
                                           transforms.Resize(224),
                                           transforms.CenterCrop(224),
                                           transforms.ToTensor(),
                                           transforms.Normalize(mean=[0.406, 0.456, 0.485], std=[0.225, 0.224, 0.229])])

    def __call__(self, x):
        return self.preprocess(x)[[2,1,0],:,:]
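
To see what the final indexing step does in isolation, here is a minimal NumPy sketch on a tiny channels-first array. The helper name `swap_rgb_to_bgr` and the pixel values are illustrative assumptions and are not part of the Microsoft Vision package.

```python
import numpy as np

def swap_rgb_to_bgr(image_chw):
    """Reverse the channel axis of a (C, H, W) array: RGB -> BGR."""
    # Same indexing pattern as in Preprocess.__call__ above.
    return image_chw[[2, 1, 0], :, :]

# Hypothetical 3-channel, 1x2 image: R is all 1s, G all 2s, B all 3s.
rgb = np.stack([
    np.full((1, 2), 1.0),
    np.full((1, 2), 2.0),
    np.full((1, 2), 3.0),
])
bgr = swap_rgb_to_bgr(rgb)

# Channel values at pixel (0, 0) after the swap: B first, R last.
print(bgr[:, 0, 0])
```

Applying the swap twice restores the original order, which makes it easy to check that a pipeline swaps channels exactly once.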

Load the CIFAR-10 dataset, split into training and test sets. Any other dataset can be substituted without changing the rest of the code.

In [None]:
# Load the (preprocessed) CIFAR-10 data.
data = get_CIFAR10_data()
for k, v in list(data.items()):
    print(f"{k}: {v.shape}")

X_train: (49000, 3, 32, 32)
y_train: (49000,)
X_val: (1000, 3, 32, 32)
y_val: (1000,)
X_test: (1000, 3, 32, 32)
y_test: (1000,)


In [None]:


train_dataset = CIFAR10('path', download=True, train=True, transform=Preprocess())
test_dataset = CIFAR10('path', download=True, train=False, transform=Preprocess())

Files already downloaded and verified
Files already downloaded and verified


Now we load the Microsoft Vision model with the ResNet50 architecture, requesting the pretrained weights (same interface as torchvision).

In [None]:
model = microsoftvision.models.resnet50(pretrained=True)

Loading Microsoft Vision pretrained model
Model already downloaded.


The Microsoft Vision model is used only to extract image features, without fine-tuning, so we set it to evaluation mode. We move it to the GPU to speed up computation.

In [None]:
model.eval()
model.cuda()

In [None]:
def get_features(dataset, model):
    all_features = []
    all_labels = []

    with torch.no_grad():
        for images, labels in progressbar(DataLoader(dataset, batch_size=128, num_workers=8)):
            images = images.cuda()
            labels = labels.cuda()
            features = model(images)

            all_features.append(features)
            all_labels.append(labels)

    return torch.cat(all_features).cpu().numpy(), torch.cat(all_labels).cpu().numpy()
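
The batching pattern used by `get_features` can be tried without a GPU or a pretrained model. The sketch below is framework-free and only mirrors the loop structure: `extract_in_batches` and `toy_feature_fn` are hypothetical names, and the toy feature function (a spatial mean) merely stands in for the frozen network.

```python
import numpy as np

def extract_in_batches(inputs, labels, feature_fn, batch_size=128):
    """Apply feature_fn batch by batch and concatenate the results."""
    all_features, all_labels = [], []
    for start in range(0, len(inputs), batch_size):
        batch = inputs[start:start + batch_size]
        all_features.append(feature_fn(batch))
        all_labels.append(labels[start:start + batch_size])
    return np.concatenate(all_features), np.concatenate(all_labels)

def toy_feature_fn(batch):
    """Stand-in for the frozen model: (N, C, H, W) -> (N, C) features."""
    return batch.mean(axis=(2, 3))

rng = np.random.default_rng(0)
toy_images = rng.normal(size=(10, 3, 4, 4))  # 10 fake images
toy_targets = np.arange(10)                  # 10 fake labels

toy_features, toy_out_labels = extract_in_batches(
    toy_images, toy_targets, toy_feature_fn, batch_size=4
)
print(toy_features.shape, toy_out_labels.shape)
```

The last batch is smaller than `batch_size` here (10 is not a multiple of 4), which is the case worth checking when adapting the loop to another dataset.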

We extract image features for the whole training and test sets. These features are used to train the classifier and to compute the accuracy score.

In [None]:
train_features, train_labels = get_features(train_dataset, model)
test_features, test_labels = get_features(test_dataset, model)

Fit the classifier on the training features and then measure its performance on the test set. The whole procedure takes less than 10 minutes on 1 GPU.

In [None]:
# You can plug in any classifier here.

#classifier = LogisticRegression(random_state=0, max_iter=1000, verbose=1, n_jobs=16)
classifier = KNeighborsClassifier(n_neighbors=5, n_jobs=16)

In [None]:
classifier.fit(train_features, train_labels)
predictions = classifier.predict(test_features)
# np.float was removed in NumPy 1.24; the mean of a boolean array is already a float.
accuracy = np.mean(test_labels == predictions) * 100.
print(f"Accuracy: {accuracy}")
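
For intuition about what the kNN classifier does with the frozen features, here is a brute-force sketch in NumPy. It is an illustration under simplifying assumptions (Euclidean distance, uniform majority vote, tiny made-up 2-D "features"), not the algorithm scikit-learn actually runs internally; `knn_predict` and all `toy_` names are hypothetical.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=5):
    """Predict labels by majority vote among the k nearest training points."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    diffs = test_x[:, None, :] - train_x[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    # Indices of the k closest training points for every test point.
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Majority vote over the neighbours' labels (ties go to the smallest label).
    return np.array([np.bincount(train_y[row]).argmax() for row in nearest])

# Made-up features: two well-separated clusters with labels 0 and 1.
toy_train_x = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                        [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
toy_train_y = np.array([0, 0, 0, 1, 1, 1])
toy_test_x = np.array([[0.05, 0.05], [4.9, 5.2]])
toy_test_y = np.array([0, 1])

toy_predictions = knn_predict(toy_train_x, toy_train_y, toy_test_x, k=3)
toy_accuracy = np.mean(toy_predictions == toy_test_y) * 100.0
print(f"Toy accuracy: {toy_accuracy}")
```

The full distance matrix makes this sketch impractical for 50,000 training features of high dimension, which is one reason to rely on `KNeighborsClassifier` for the real run.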