# Image Classification Tutorial 


In [13]:
import sys
!{sys.executable} -m pip install "deepchecks[vision]" --quiet --upgrade




Load Data
=========

We will use the torchvision and torch.utils.data packages to load the
data from `../data/SUR`. The model we are building will learn to classify
images into the six classes listed in the label map later in this
tutorial (`Ia` through `Va`). As the cell output further down shows, there
are 4800 training images and 1200 test images in total.


In [14]:
import albumentations as A
import numpy as np
import os
import PIL.Image
import torch
import torchvision
from albumentations.pytorch import ToTensorV2
from torch import nn
from torch.utils.data import DataLoader

class CustomDataset(torchvision.datasets.ImageFolder):

    def __getitem__(self, index: int):
        """overrides __getitem__ to be compatible to albumentations"""
        path, target = self.samples[index]
        sample = self.loader(path)
        sample = self.get_cv2_image(sample)
        if self.transforms is not None:
            transformed = self.transforms(image=sample, target=target)
            sample, target = transformed["image"], transformed["target"]
        else:
            if self.transform is not None:
                sample = self.transform(image=sample)['image']
            if self.target_transform is not None:
                target = self.target_transform(target)

        return sample, target

    def get_cv2_image(self, image):
        if isinstance(image, PIL.Image.Image):
            return np.array(image).astype('uint8')
        elif isinstance(image, np.ndarray):
            return image
        else:
            raise RuntimeError("Only PIL.Image and CV2 loaders currently supported!")

data_dir = '../data/SUR'
# Resize, center-crop and normalize; the same transforms are used for train and test
data_transforms = A.Compose([
    A.Resize(height=256, width=256),
    A.CenterCrop(height=224, width=224),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
])
train_dataset = CustomDataset(root=os.path.join(data_dir,'train'))
train_dataset.transforms = data_transforms

test_dataset = CustomDataset(root=os.path.join(data_dir, 'test'))
test_dataset.transforms = data_transforms
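`ImageFolder`, which `CustomDataset` subclasses, expects one subdirectory per class inside each split directory. As a quick sanity check before building the datasets, the sketch below counts image files per class folder using only the standard library. The helper name `count_images_per_class` and the extension set are illustrative assumptions, not part of torchvision or deepchecks.

```python
from pathlib import Path

# Assumed set of image extensions, for illustration only.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}

def count_images_per_class(split_dir):
    """Return {class_folder_name: number_of_image_files} for one split directory."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files next to the class folders
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

With the paths used above, this could be called as `count_images_per_class(os.path.join(data_dir, 'train'))` to spot empty or unexpectedly small class folders.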

Visualize the dataset
=====================

Let's see what our data looks like.


In [15]:
print(f'Number of training images: {len(train_dataset)}')
print(f'Number of validation images: {len(test_dataset)}')
print(f'Example output of an image shape: {train_dataset[0][0].shape}')
print(f'Example output of a label: {train_dataset[0][1]}')

Number of training images: 4800
Number of validation images: 1200
Example output of an image shape: torch.Size([3, 224, 224])
Example output of a label: 0


Downloading a pre-trained model
===============================

Now we will download a model from torchvision that was pre-trained on
the ImageNet dataset, and replace its final layer to match our classes.


In [16]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = torchvision.models.resnet18(pretrained=True)
num_ftrs = model.fc.in_features
# Replace the final layer so it outputs our 6 classes (see LABEL_MAP)
model.fc = nn.Linear(num_ftrs, 6)
model = model.to(device)
_ = model.eval()

# Validating the Model with Deepchecks

Now that we have the training data, the test data and the model, we
can validate them with the deepchecks test suites.



In [17]:
from deepchecks.vision.vision_data import BatchOutputFormat

def deepchecks_collate_fn(batch) -> BatchOutputFormat:
    """Return a batch of images, labels and predictions for a batch of data. The expected format is a dictionary with
    the following keys: 'images', 'labels' and 'predictions', each value is in the deepchecks format for the task.
    You can also use the BatchOutputFormat class to create the output.
    """
    # batch received as iterable of tuples of (image, label) and transformed to tuple of iterables of images and labels:
    batch = tuple(zip(*batch))

    # images:
    inp = torch.stack(batch[0]).detach().numpy().transpose((0, 2, 3, 1))
    mean = [0.485, 0.456, 0.406]
    std = [0.229, 0.224, 0.225]
    inp = std * inp + mean
    images = np.clip(inp, 0, 1) * 255

    #labels:
    labels = batch[1]

    #predictions:
    # inference only, so skip building the autograd graph
    with torch.no_grad():
        logits = model.to(device)(torch.stack(batch[0]).to(device))
    predictions = nn.Softmax(dim=1)(logits)
    return BatchOutputFormat(images=images, labels=labels, predictions=predictions)
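The image branch of the collate function undoes the `A.Normalize` step so that the images are passed on as channels-last pixel values in the 0-255 range. A minimal NumPy-only sketch of that inverse, checked with a round trip on synthetic data, might look like this. The helper name `denormalize` and the random batch are illustrative assumptions.

```python
import numpy as np

# ImageNet statistics, matching the A.Normalize call in this tutorial.
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def denormalize(batch_chw):
    """Map a normalized float batch (N, C, H, W) back to (N, H, W, C) values in [0, 255]."""
    batch_hwc = np.transpose(batch_chw, (0, 2, 3, 1))  # channels last
    restored = STD * batch_hwc + MEAN                  # undo (x - mean) / std
    return np.clip(restored, 0, 1) * 255

# Round trip on a synthetic batch: normalize, then denormalize.
pixels = np.random.default_rng(0).integers(0, 256, size=(2, 8, 8, 3)).astype(np.float64)
normalized = ((pixels / 255 - MEAN) / STD).transpose((0, 3, 1, 2))
recovered = denormalize(normalized)
```

If `recovered` does not match `pixels` up to floating-point error, the mean and std used here are out of sync with the ones in the transform pipeline.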

We have six classes here. The label_map is a dictionary that maps each
class id to its class name, for display purposes.


In [18]:
LABEL_MAP = {
    0: 'Ia',
    1: 'IIa',
    2: 'IIIa',
    3: 'IVc',
    4: 'IVd',
    5: 'Va'
  }
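As an illustration of how such a mapping can be used outside deepchecks, the sketch below turns a list of integer class ids into per-class counts keyed by name. The helper `class_distribution` and the example ids are hypothetical and only serve the illustration.

```python
from collections import Counter

LABEL_MAP = {0: 'Ia', 1: 'IIa', 2: 'IIIa', 3: 'IVc', 4: 'IVd', 5: 'Va'}

def class_distribution(targets, label_map):
    """Count samples per class name; classes with no samples are reported as 0."""
    counts = Counter(targets)
    return {name: counts.get(class_id, 0) for class_id, name in label_map.items()}

# Hypothetical integer labels, for illustration only.
example_targets = [0, 0, 1, 5, 5, 5]
distribution = class_distribution(example_targets, LABEL_MAP)
```

With the datasets from this tutorial, `train_dataset.targets` could be passed as `targets`. Note that `ImageFolder` assigns class ids by sorted folder name, so it is worth comparing `train_dataset.class_to_idx` against `LABEL_MAP` to confirm that the ids and names line up.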

Now that we have our updated collate function, we can recreate the
dataloader in the deepchecks format, and use it to create a VisionData
object:


In [19]:
from deepchecks.vision import VisionData

train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True, collate_fn=deepchecks_collate_fn)
test_loader = DataLoader(test_dataset, batch_size=4, shuffle=True, collate_fn=deepchecks_collate_fn)

training_data = VisionData(batch_loader=train_loader, task_type='classification', label_map=LABEL_MAP)
test_data = VisionData(batch_loader=test_loader, task_type='classification', label_map=LABEL_MAP)

Making sure our data is in the correct format:
==============================================

The VisionData object automatically validates your data format and will
alert you if there is a problem. However, you can also manually view
your images and labels to make sure they are in the correct format by
using the `head` function to conveniently visualize your data:


In [20]:
training_data.head()


In a notebook, this call renders a sample of images together with their labels.
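For a manual check that does not rely on the notebook widget, the layout produced by `deepchecks_collate_fn` can also be asserted directly. The sketch below encodes what the collate function in this tutorial returns (channels-last images in the 0-255 range, one integer label per image, one softmax row per image). The helper `check_classification_batch` is hypothetical and is not part of the deepchecks API.

```python
import numpy as np

def check_classification_batch(images, labels, predictions, n_classes):
    """Raise ValueError if a batch deviates from the layout built by the collate function."""
    images = np.asarray(images)
    predictions = np.asarray(predictions, dtype=float)
    if images.ndim != 4 or images.shape[-1] not in (1, 3):
        raise ValueError(f"expected images shaped (N, H, W, C), got {images.shape}")
    if images.min() < 0 or images.max() > 255:
        raise ValueError("image values should lie in [0, 255]")
    if len(labels) != len(images):
        raise ValueError("need exactly one label per image")
    if any(not 0 <= int(label) < n_classes for label in labels):
        raise ValueError("labels should be class ids in [0, n_classes)")
    if predictions.shape != (len(images), n_classes):
        raise ValueError(f"expected predictions shaped (N, {n_classes}), got {predictions.shape}")
    if not np.allclose(predictions.sum(axis=1), 1.0, atol=1e-4):
        raise ValueError("each prediction row should sum to 1 (softmax output)")
    return True
```

Because the predictions in this tutorial are torch tensors, they would need to be converted first, for example with `predictions.detach().cpu().numpy()`, before being passed to a NumPy-based check like this one.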

Running Deepchecks' suite on our data and model!
================================================

Now that we have created the VisionData objects, we can validate the
train and test data with deepchecks' train test validation suite. This
can be done with these few lines of code:


In [21]:
from deepchecks.vision.suites import train_test_validation

suite = train_test_validation()
result = suite.run(training_data, test_data, max_samples=5000)

We also have suites for:
data integrity (`deepchecks.vision.suites.data_integrity`), which
validates a single dataset, and
model evaluation (`deepchecks.vision.suites.model_evaluation`), which
evaluates the model's performance.


Observing the results:
======================

The results can be saved as an HTML file with the following code:


In [24]:
result.save_as_html('output_img_classification.html')

# Or displayed in a new window in an IDE
# result.show_in_window()

'output_img_classification.html'

Or, if working inside a notebook, the output can be displayed directly
by simply printing the result object:


In [23]:
result
