# Active Learning: Using a trained model to identify unlabelled images that are hard to classify and need human labelling

In this notebook, we use the trained model to identify images for which the most probable class has a low probability score, and log them to the [Weights and Biases dashboard](https://wandb.ai/bhattbhuwan13/active-learning/runs/34g8uy09?workspace=user-bhattbhuwan13) so that humans can label them later.

In [68]:
import torch
import torchvision
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder

import torch.nn as nn
import torch.nn.functional as F

import torch.optim as optim

import matplotlib.pyplot as plt
%matplotlib inline

import numpy as np
import PIL

In [2]:
test_transform = transforms.Compose(
    [
     transforms.Resize(32),  # Images loaded from disk are not 32 x 32, so resize the
                             # shorter side to 32; CenterCrop below makes them 32 x 32
     transforms.CenterCrop(32),
     transforms.ToTensor(),
#      transforms.Normalize((0.0, 0.0, 0.0), (255.0, 255.0, 255.0)),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

batch_size = 128

testset = ImageFolder(root='./test/', transform=test_transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=4)

## The model

We will use the same model from the previous notebook and then load the saved weights from the local drive.

In [37]:
NUM_CLASSES = 10
EPOCHS = 10
DEVICE = 'cpu' # We will make predictions on CPU

In [4]:
class CNN(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(in_channels=3, 
                              out_channels=20, 
                              kernel_size=3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(20, 10, 3)
#         self.fc1 = nn.Linear(10 * 3 * 3, 80)
        self.fc1 = nn.Linear(360, 80)
        self.fc2 = nn.Linear(80, 40)
        self.fc3 = nn.Linear(40, n_classes)
    
    def forward(self, x):
        x = self.pool(F.relu(self.conv(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # Raw logits; apply softmax to these to get class probabilities
        return x
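
The input size of `fc1` (360) follows from the layer shapes. As a quick check, the sketch below works through the arithmetic for a 32 x 32 input and then confirms it with layers configured like those in the class above. The helper `conv_out` is a hypothetical name introduced only for this illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32
size = conv_out(size, kernel=3)  # first 3x3 convolution: 32 -> 30
size = size // 2                 # 2x2 max pooling:       30 -> 15
size = conv_out(size, kernel=3)  # second 3x3 convolution: 15 -> 13
size = size // 2                 # 2x2 max pooling:       13 -> 6
flat_features = 10 * size * size  # 10 output channels of conv2

# Confirm with real layers on a dummy batch of one image
conv = nn.Conv2d(3, 20, 3)
conv2 = nn.Conv2d(20, 10, 3)
pool = nn.MaxPool2d(2, 2)

x = torch.zeros(1, 3, 32, 32)
x = pool(F.relu(conv(x)))
x = pool(F.relu(conv2(x)))
feature_map_shape = tuple(x.shape)
flattened = torch.flatten(x, 1).shape[1]

print(flat_features, feature_map_shape, flattened)
```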

## Setting up weights and biases for logging images

In [50]:

import wandb

# 1. Start a new run
wandb.init(project='active-learning', entity='bhattbhuwan13')  

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: bhattbhuwan13 (use `wandb login --relogin` to force relogin)
wandb: wandb version 0.12.4 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade


## Function for converting a normalized tensor back to an image array

In [79]:
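def array2img(arr, 
              mean=[0.5, 0.5, 0.5], 
              std=[0.5, 0.5, 0.5]):
    """Undo Normalize and return an (H, W, C) uint8 array with values in 0-255."""
    # Tensor layout is (C, H, W); image libraries expect (H, W, C)
    arr = arr.numpy().transpose((1, 2, 0))
    mean = np.array(mean)
    std = np.array(std)
    arr = std * arr + mean  # inverse of (x - mean) / std
    arr = np.clip(arr, 0, 1)
    arr = arr * 255
    arr = arr.astype(np.uint8)  # 8-bit pixel values
    return arr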
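
To see why this inversion recovers the original pixel values, the sketch below normalizes a tiny synthetic image with the same mean and standard deviation (0.5 each) and converts it back. The function `denormalize` is a standalone stand-in written for this example under the assumption that pixel values are truncated when cast to 8-bit integers; the synthetic pixel values are likewise made up.

```python
import numpy as np
import torch


def denormalize(tensor, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
    """Illustrative inverse of Normalize: (C, H, W) tensor -> (H, W, C) uint8."""
    arr = tensor.numpy().transpose((1, 2, 0))
    arr = np.array(std) * arr + np.array(mean)
    arr = np.clip(arr, 0, 1) * 255
    return arr.astype(np.uint8)  # truncates fractional values


# A 2x2 image whose three channels hold 0.0, 0.5 and 1.0 (the ToTensor range)
pixels = torch.tensor([0.0, 0.5, 1.0]).view(3, 1, 1).expand(3, 2, 2)

# What Normalize((0.5,)*3, (0.5,)*3) does: maps [0, 1] to [-1, 1]
normalized = (pixels - 0.5) / 0.5

restored = denormalize(normalized)
print(normalized[:, 0, 0].tolist())  # per-channel values after normalization
print(restored[0, 0].tolist())       # per-channel values after conversion back
```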

## Load the saved weights

In [38]:
PATH = './saved_model/active.pth'

model = CNN(NUM_CLASSES)
model.load_state_dict(torch.load(PATH, map_location=DEVICE))

<All keys matched successfully>
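
The `<All keys matched successfully>` message is the return value of `load_state_dict`, which reports any missing or unexpected parameter names. The sketch below reproduces the save-and-load round trip on a small stand-in model. The `nn.Linear` layer and the in-memory buffer are assumptions chosen to keep the example self-contained; the notebook itself loads `./saved_model/active.pth` from disk.

```python
import io

import torch
import torch.nn as nn

# Stand-in for the trained model (assumption: any nn.Module behaves the same way)
source = nn.Linear(4, 2)

# Save only the parameters, as is conventional for PyTorch checkpoints
buffer = io.BytesIO()
torch.save(source.state_dict(), buffer)
buffer.seek(0)

# Rebuild the same architecture, then load the saved parameters onto the CPU
target = nn.Linear(4, 2)
result = target.load_state_dict(torch.load(buffer, map_location="cpu"))

weights_match = torch.equal(source.weight, target.weight)
print(result, weights_match)
```

Because only the parameters are stored, the class definition must match the one used during training, which is why the `CNN` class is redefined in this notebook.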

## Predict and save difficult images to weights and biases

The code below uses the model to make predictions on unlabelled data and logs difficult images (images for which the probability of the most probable class is less than 0.5) to the Weights and Biases server. For each batch of predictions:

- The model outputs a score for each of the 10 classes, which is read as the probability of the sample belonging to that class
- If the highest probability is less than 0.5, the sample is considered a difficult example and is saved to the Weights and Biases server

**Note: For convenience, the code below runs for only one iteration (batch). Removing the `break` statement makes it run over the entire set of unlabelled images.**

In [92]:
model = model.to(DEVICE)
model.eval()  # inference mode

with torch.no_grad():
    for data in testloader:
        images, labels = data  # labels only reflect folder names and are not used
        images = images.to(DEVICE)

        logits = model(images)
        # The model returns raw logits, so convert them to probabilities
        # before comparing against the 0.5 threshold
        probabilities = F.softmax(logits, dim=1)
        max_prob = torch.max(probabilities, 1).values
        difficult_idx = torch.where(max_prob < 0.5)
        difficult_images = images[difficult_idx]  # keeps only those images
        # for which the model has low confidence

        difficult_images_converted = [array2img(array.cpu()) for array in difficult_images]
        wandb.log({'images': [wandb.Image(image) for image in difficult_images_converted]})
#         wandb.log({"difficult examples": difficult_images})
#         to_log = [wandb.Image(image) for image in difficult_images_converted]
#         table= wandb.Table(data=to_log, columns=['image'])
#         wandb.log({"cifar10_images": table})
        break  # remove to process every batch
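
The selection rule used in this section can also be tried in isolation. The sketch below is a minimal illustration on made-up logits, assuming the model outputs raw logits that are turned into probabilities with softmax. The function name `select_low_confidence` and the example values are hypothetical and not part of the notebook.

```python
import torch
import torch.nn.functional as F


def select_low_confidence(logits, threshold=0.5):
    """Flag samples whose most probable class has probability below threshold."""
    probs = F.softmax(logits, dim=1)          # each row sums to 1
    max_prob, predicted = probs.max(dim=1)    # top probability and its class index
    mask = max_prob < threshold               # True marks a difficult sample
    return mask, max_prob, predicted


# Three synthetic samples over three classes (values invented for illustration)
logits = torch.tensor([
    [4.0, 0.0, 0.0],   # one class clearly dominates
    [0.1, 0.0, 0.2],   # nearly uniform, so the model is unsure
    [0.0, 3.0, 0.5],   # one class clearly dominates
])

mask, max_prob, predicted = select_low_confidence(logits)
difficult_indices = torch.where(mask)[0].tolist()
print(difficult_indices, predicted.tolist())
```

Only the second sample falls below the threshold here, since its probability mass is spread almost evenly across the classes.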