# Microsoft Vision classification example
This example shows a simple way to classify images from any dataset using the pretrained Microsoft Vision model.

Using 1 GPU and under 10 minutes of training, we can achieve 92.92% accuracy on the CIFAR-10 dataset with the kNN algorithm.
We also show how to plug in a linear classifier (scikit-learn's LogisticRegression) on top of the frozen features.

In [None]:
#conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

#pip install progressbar2


In [1]:

# Path to the project folder, written without the leading '/'
# (the %cd commands below add it).
FOLDERNAME = 'home/ubuntu/Vision-Classifiers/Microsoft-Vision-Classifier/'
assert FOLDERNAME is not None, "[!] Enter the foldername."

# Make the Python files in the project folder (e.g. data_utils.py)
# importable from this notebook.
import sys
sys.path.append('/' + FOLDERNAME)

# Download the CIFAR-10 dataset into the datasets/ subfolder
# if it doesn't already exist.
%cd /$FOLDERNAME/datasets/
!bash get_datasets.sh
%cd /$FOLDERNAME

/home/ubuntu/Vision-Classifiers/Microsoft-Vision-Classifier/datasets
/home/ubuntu/Vision-Classifiers/Microsoft-Vision-Classifier


In [2]:
# Setup cell.
import numpy as np
import matplotlib.pyplot as plt
from data_utils import get_CIFAR10_data

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
  """ returns relative error """
  return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

In [3]:
import numpy as np
import torch
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10
# Import the module only; `from progressbar import progressbar` would shadow it.
import progressbar  # progressbar2 package
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier


In [4]:
import microsoftvision

Let's define how we'll preprocess images. The Microsoft Vision model expects images in BGR channel order, hence the channel swap at the end of preprocessing.

In [5]:
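class Preprocess:
    """Resize and center-crop to 224x224, normalize, then reorder channels RGB -> BGR."""
    def __init__(self):
        self.preprocess = transforms.Compose([
                                           transforms.Resize(224),
                                           transforms.CenterCrop(224),
                                           transforms.ToTensor(),
                                           transforms.Normalize(mean=[0.406, 0.456, 0.485], std=[0.225, 0.224, 0.229])])

    def __call__(self, x):
        # Swap channels from RGB to BGR, the order the Microsoft Vision model expects.
        return self.preprocess(x)[[2, 1, 0], :, :]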
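To make the preprocessing concrete, here is a NumPy sketch of what the last three steps of Preprocess (ToTensor, Normalize, and the channel swap) do to an image that has already been resized and center-cropped. The function name preprocess_cropped is hypothetical, and it assumes the input is a uint8 RGB array in height × width × channel layout, which is one of the input formats ToTensor converts from.

```python
import numpy as np

# Per-channel statistics copied from the Preprocess class above.
MEAN = np.array([0.406, 0.456, 0.485], dtype=np.float32)
STD = np.array([0.225, 0.224, 0.229], dtype=np.float32)


def preprocess_cropped(image_hwc_uint8):
    """Hypothetical NumPy equivalent of ToTensor -> Normalize -> RGB-to-BGR swap.

    image_hwc_uint8: array of shape (H, W, 3), dtype uint8, channels in RGB order.
    Returns a float32 array of shape (3, H, W) with channels in BGR order.
    """
    # ToTensor: HWC uint8 in [0, 255] -> CHW float32 in [0, 1].
    x = image_hwc_uint8.astype(np.float32).transpose(2, 0, 1) / 255.0
    # Normalize: subtract the mean and divide by the std, channel by channel.
    x = (x - MEAN[:, None, None]) / STD[:, None, None]
    # Reverse the channel axis: R (index 0) and B (index 2) swap, G stays in the middle.
    return x[[2, 1, 0], :, :]


img = np.zeros((224, 224, 3), dtype=np.uint8)
img[..., 0] = 255  # a pure red image
out = preprocess_cropped(img)
print(out.shape)  # (3, 224, 224)
```

After the swap, the red channel ends up at index 2, so out[2] holds the normalized red values and out[0] the normalized (zero) blue values.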

Load the CIFAR-10 dataset, split into training and test sets. It can be replaced with any other image dataset that yields (image, label) pairs, without changes to the rest of the code.

In [6]:
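# Load the CIFAR-10 arrays with the local data_utils helper (49,000/1,000/1,000 train/val/test split).
# These arrays are only inspected here; the features below are extracted from the torchvision CIFAR10 datasets.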
data = get_CIFAR10_data()
for k, v in list(data.items()):
    print(f"{k}: {v.shape}")

X_train: (49000, 3, 32, 32)
y_train: (49000,)
X_val: (1000, 3, 32, 32)
y_val: (1000,)
X_test: (1000, 3, 32, 32)
y_test: (1000,)


In [7]:
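# torchvision's CIFAR-10 (50,000 training / 10,000 test images), downloaded
# to ./path and preprocessed on the fly with Preprocess().
train_dataset = CIFAR10('./path', download=True, train=True, transform=Preprocess())
test_dataset = CIFAR10('./path', download=True, train=False, transform=Preprocess())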

Files already downloaded and verified
Files already downloaded and verified
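Because get_features (defined further down) only iterates over (image, label) pairs through a DataLoader, swapping CIFAR-10 for another dataset mainly requires an object that implements __len__ and __getitem__ and returns a preprocessed image together with its integer label. The class below is a hypothetical sketch of such a map-style dataset for images already held in memory. With Preprocess as the transform, the images should be PIL images, since Resize runs before ToTensor. For images stored on disk in one folder per class, torchvision.datasets.ImageFolder is a ready-made alternative.

```python
class ImageListDataset:
    """Hypothetical map-style dataset over in-memory (image, label) pairs.

    Like torchvision's CIFAR10, it implements __len__ and __getitem__ and
    applies an optional transform (e.g. Preprocess()) to each image.
    """

    def __init__(self, images, labels, transform=None):
        if len(images) != len(labels):
            raise ValueError("images and labels must have the same length")
        self.images = list(images)
        self.labels = list(labels)
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        if self.transform is not None:
            image = self.transform(image)
        return image, self.labels[index]
```

For example, ImageListDataset(my_pil_images, my_labels, transform=Preprocess()) could be passed to get_features in place of train_dataset, where my_pil_images and my_labels are placeholders for your own data.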


Next, we load the Microsoft Vision model with the ResNet-50 architecture. Passing pretrained=True requests the pretrained weights (the same interface as torchvision).

In [8]:
model = microsoftvision.models.resnet50(pretrained=True)

Loading Microsoft Vision pretrained model
Model already downloaded.


The Microsoft Vision model is only used to extract image features, without fine-tuning, so we put it in evaluation mode. We also move it to the GPU to speed up computation.

In [10]:
model.eval()
model.cuda()

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 
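Calling model.eval() matters here mainly because of the BatchNorm2d layers visible in the printout. In training mode they normalize with statistics of the current batch, while in evaluation mode they use the running statistics stored with the pretrained weights, so an image's features do not depend on which other images share its batch. The NumPy sketch below is a simplified, hypothetical illustration of that difference (a single BatchNorm over (N, C) inputs, without the learned scale and shift), not PyTorch's implementation.

```python
import numpy as np

EPS = 1e-5  # matches eps=1e-05 in the BatchNorm2d layers printed above


def batchnorm(x, running_mean, running_var, training):
    """Simplified BatchNorm over a batch of shape (N, C), without the affine scale/shift."""
    if training:
        # Training mode: statistics come from the current batch (biased variance).
        mean = x.mean(axis=0)
        var = x.var(axis=0)
    else:
        # Evaluation mode: fixed running statistics learned during training.
        mean, var = running_mean, running_var
    return (x - mean) / np.sqrt(var + EPS)


rng = np.random.default_rng(0)
sample = rng.normal(size=(1, 4))
# The same sample placed in two batches with very different companions.
batch_a = np.vstack([sample, rng.normal(size=(7, 4))])
batch_b = np.vstack([sample, rng.normal(loc=3.0, size=(7, 4))])
running_mean, running_var = np.zeros(4), np.ones(4)

same_in_eval = np.allclose(batchnorm(batch_a, running_mean, running_var, False)[0],
                           batchnorm(batch_b, running_mean, running_var, False)[0])
same_in_train = np.allclose(batchnorm(batch_a, running_mean, running_var, True)[0],
                            batchnorm(batch_b, running_mean, running_var, True)[0])
print(same_in_eval, same_in_train)  # True False
```

Note that model.eval() and torch.no_grad() are independent: the first changes layer behavior, the second only stops PyTorch from recording operations for backpropagation.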

In [11]:
def get_features(dataset, model):
    """Run the frozen model over `dataset`; return (features, labels) as NumPy arrays."""
    import progressbar  # progressbar2 package
    all_features = []
    all_labels = []
    torch.cuda.empty_cache()

    with torch.no_grad():  # no gradients are needed for feature extraction
        for images, labels in progressbar.progressbar(DataLoader(dataset, batch_size=128, num_workers=8)):
            features = model(images.cuda())

            # Move each batch back to the CPU so GPU memory does not grow with the dataset.
            all_features.append(features.cpu())
            all_labels.append(labels)

    return torch.cat(all_features).numpy(), torch.cat(all_labels).numpy()

We extract image features for the whole training and test sets. These features are used to train the logistic regression classifier and to compute the accuracy score.

In [12]:
train_features, train_labels = get_features(train_dataset, model)
test_features, test_labels = get_features(test_dataset, model)

100% (391 of 391) |######################| Elapsed Time: 0:04:52 Time:  0:04:52
100% (79 of 79) |########################| Elapsed Time: 0:00:58 Time:  0:00:58
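The progress-bar counts follow from the DataLoader settings: with batch_size=128 and the default drop_last=False, the last partial batch is kept, so the number of batches is the dataset size divided by 128, rounded up. The snippet below reproduces that arithmetic and gives a rough size estimate for the extracted feature matrices. The feature dimension of 2048 (the pooled ResNet-50 output) and float32 storage are assumptions not stated in this notebook.

```python
import math

BATCH_SIZE = 128                  # batch_size passed to DataLoader in get_features
N_TRAIN, N_TEST = 50_000, 10_000  # torchvision CIFAR-10 train/test split sizes
FEATURE_DIM = 2048                # assumption: pooled ResNet-50 feature size
BYTES_PER_FLOAT32 = 4


def num_batches(n_items, batch_size):
    # DataLoader keeps the final partial batch by default (drop_last=False).
    return math.ceil(n_items / batch_size)


def feature_matrix_mb(n_items, dim=FEATURE_DIM):
    # Size of an (n_items, dim) float32 matrix in megabytes (10**6 bytes).
    return n_items * dim * BYTES_PER_FLOAT32 / 1e6


print(num_batches(N_TRAIN, BATCH_SIZE), num_batches(N_TEST, BATCH_SIZE))  # 391 79
print(feature_matrix_mb(N_TRAIN), feature_matrix_mb(N_TEST))  # 409.6 81.92
```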


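Fit the classifier on the training features, then measure its accuracy on the test set. In the run below, feature extraction took about 6 minutes on 1 GPU (cell In [12]), and fitting the logistic regression on the CPU took about 24.5 minutes.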

In [13]:
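# You can plug in any classifier with fit/predict methods here,
# e.g. KNeighborsClassifier (imported above) instead of LogisticRegression.
classifier = LogisticRegression(random_state=0, max_iter=1000, verbose=1, n_jobs=16)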
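Since the classifier only needs fit and predict, a kNN classifier like the one mentioned in the introduction can be plugged in instead. The class below is a hypothetical NumPy sketch, not the configuration behind the 92.92% figure, which this notebook does not specify: it assumes cosine similarity on L2-normalized features and a majority vote over the k nearest neighbours. scikit-learn's KNeighborsClassifier, already imported above, can be swapped in the same way.

```python
import numpy as np


class CosineKNN:
    """Hypothetical k-nearest-neighbours classifier with a scikit-learn-style fit/predict API.

    Features are L2-normalized so that the dot product equals cosine similarity;
    each test point takes the majority label among its k most similar training points.
    """

    def __init__(self, k=5, batch_size=1024):
        self.k = k
        self.batch_size = batch_size

    @staticmethod
    def _normalize(x):
        x = np.asarray(x, dtype=np.float32)
        return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)

    def fit(self, features, labels):
        self.train_features = self._normalize(features)
        self.train_labels = np.asarray(labels)
        return self

    def predict(self, features):
        features = self._normalize(features)
        k = min(self.k, len(self.train_features))  # cannot use more neighbours than samples
        predictions = []
        # Process test points in chunks to bound the size of the similarity matrix.
        for start in range(0, len(features), self.batch_size):
            sims = features[start:start + self.batch_size] @ self.train_features.T
            # Indices of the k most similar training points (in no particular order).
            top_k = np.argpartition(-sims, k - 1, axis=1)[:, :k]
            for neighbour_labels in self.train_labels[top_k]:
                values, counts = np.unique(neighbour_labels, return_counts=True)
                predictions.append(values[np.argmax(counts)])  # ties -> smallest label
        return np.array(predictions)


train = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
test = np.array([[1.0, 0.05], [0.05, 1.0]])
print(CosineKNN(k=3).fit(train, labels).predict(test))  # [0 1]
```

In this notebook it would be used as classifier = CosineKNN(k=10), with k chosen as an illustrative value; the predict step processes test points in chunks so the similarity matrix against the 50,000 training features stays small.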

In [14]:
classifier.fit(train_features, train_labels)
predictions = classifier.predict(test_features)
accuracy = np.mean(test_labels == predictions) * 100.
print(f"Accuracy: {accuracy}")

[Parallel(n_jobs=16)]: Using backend LokyBackend with 16 concurrent workers.


Accuracy: 92.86


[Parallel(n_jobs=16)]: Done   1 out of   1 | elapsed: 24.5min finished
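A note on computing the accuracy: np.float, an alias for Python's float, was deprecated in NumPy 1.20 and removed in NumPy 1.24, so an explicit .astype(np.float) cast fails on current NumPy. It is also unnecessary, because np.mean of a boolean array already returns the fraction of True values. A minimal helper (the name accuracy_percent is hypothetical) looks like this; scikit-learn's sklearn.metrics.accuracy_score computes the same fraction without the factor of 100.

```python
import numpy as np


def accuracy_percent(labels, predictions):
    """Percentage of predictions equal to the true labels."""
    labels = np.asarray(labels)
    predictions = np.asarray(predictions)
    if labels.shape != predictions.shape:
        raise ValueError("labels and predictions must have the same shape")
    # Comparing arrays gives booleans; np.mean treats True as 1 and False as 0.
    return float(np.mean(labels == predictions)) * 100.0


print(accuracy_percent([0, 1, 2, 3], [0, 1, 0, 3]))  # 75.0
```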
