# **Report Deep Learning Project** 
##### 
Marcus Vukojevik 238817 marcus.vukojevik@studenti.unitn.it  
Mattias Trettel 247187 mattias.trettel@studenti.unitn.it

## **Project Description**

#### Our approach combines MEMO with BLIP, used as a complementary model that adjusts the class distribution before the entropy minimization step. BLIP (Bootstrapping Language-Image Pre-training) is a vision-language model that generates textual descriptions or answers from visual input, which makes it useful for adding semantic context to image classification. We used ImageNet-A as the dataset. The basic idea is simple: if a model has difficulty classifying certain images, why not get help from a second model? The second model acts as a second opinion on the predictions of the first. In our case, we use BLIP to adjust the logits (the unnormalized class scores) before applying entropy minimization. We ask BLIP whether the labels predicted by ResNet are present in the image augmentations, and if BLIP confirms a label, we increase the score of that class and therefore its probability.

## **Index**

#### **Section 1**: 
- a) import of the ImageNet-A dataset and related class labels
- b) defining AugMix functions
- c) defining augmentation functions
- d) defining utility functions
#### **Section 2**: **MEMO** baseline  
#### **Section 3**: **MEMO** plus **BLIP**  

###### 
Note: **Section 2** and **Section 3** both require the dataset, labels and functions imported and defined in **Section 1**. However, they can be executed independently of each other.



## **Section 1-a** 

#### This cell is responsible for retrieving the ImageNet-A dataset and its class labels

In [None]:
# Import necessary libraries
import os
import urllib.request
import json

# Dataset URL
dataset_url = 'https://people.eecs.berkeley.edu/~hendrycks/imagenet-a.tar'

# Destination path to download the file
destination = '/content/imagenet-a.tar'

# Download the file
!wget -O {destination} {dataset_url}

# Extract the .tar file (print suppressed)
!tar -xvf {destination} -C /content/ > /dev/null 2>&1

dataset_path = '/content/imagenet-a'

# Verify extracted files
num_folders = 0
num_files = 0

# Iterate over all items in the directory
for root, dirs, files in os.walk(dataset_path):
    num_folders += len(dirs)
    num_files += len(files)

print("=====================================")
print(f"Number of folders: {num_folders}/200")
print(f"Number of files: {num_files}/7501")


# Download the ImageNet class labels file
LABELS_URL = "https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json"
try:
    response = urllib.request.urlopen(LABELS_URL)
    data = response.read().decode()
    class_idx = json.loads(data)

    # Check if `class_idx` is a list and contains elements
    if isinstance(class_idx, list) and len(class_idx) > 0:
        print("Download and loading of the labels completed")
    else:
        print("Error: Wrong data format")

except Exception as e:
    print(f"Error: {e}")

## **Section 1-b Augmix** 

#### Here we define the **augmix** functions  
AugMix is a data augmentation technique designed to improve the robustness and generalization of machine learning models. The provided code implements the AugMix augmentation pipeline, which mixes multiple image transformations together to broaden the diversity of the training data while maintaining the semantic content of the images.
#### Key steps:
- **Preaugmentation**:  
The original image first goes through a random resized crop to 224×224 pixels and a random horizontal flip (`preaugment`), so that every augmentation chain starts from an image of the same size.

- **Augmentation Mixing**:  
Three augmentation chains are built, each applying one to three operations drawn at random from a predefined list. The resulting images are combined in a weighted average whose weights are sampled from a Dirichlet distribution, which determines how much influence each chain has on the mix.

- **Mixing of Augmented Images**:  
The final image is created by mixing the preaugmented image with the mixed augmented images. A Beta distribution is used to decide the blending ratio, ensuring that the augmented images are combined in a diverse yet controlled manner.
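
The mixing arithmetic can be illustrated without any image library. The following is a minimal sketch, not the notebook's implementation: `sample_dirichlet`, `mix_chains` and the toy pixel lists are hypothetical, and the Dirichlet(1, 1, 1) weights are drawn by normalizing three Gamma(1, 1) samples, whereas the notebook code uses `np.random.dirichlet`.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw Dirichlet weights by normalizing independent Gamma(alpha, 1) samples."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def mix_chains(original, chains, rng):
    """Blend an image with its augmented versions, AugMix style.

    `original` and every entry of `chains` are flat lists of pixel values
    (a stand-in for image tensors; hypothetical toy data).
    """
    w = sample_dirichlet([1.0] * len(chains), rng)  # one weight per chain, sums to 1
    m = rng.betavariate(1.0, 1.0)                   # share kept from the original
    mixed = []
    for idx, pixel in enumerate(original):
        augmented = sum(w_k * chain[idx] for w_k, chain in zip(w, chains))
        mixed.append(m * pixel + (1.0 - m) * augmented)
    return mixed, w, m


rng = random.Random(0)
original = [0.2, 0.5, 0.9]
chains = [[0.1, 0.6, 0.8], [0.3, 0.4, 1.0], [0.0, 0.5, 0.7]]
mixed, w, m = mix_chains(original, chains, rng)
print(f"weights={w}, m={m:.3f}, mixed={mixed}")
```

Because the chain weights sum to 1 and `m` lies in [0, 1], every output pixel is a convex combination of the corresponding input pixels, so the mix never leaves the range spanned by the original and its augmentations.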



In [None]:
import numpy as np
import torch
from PIL import ImageOps, Image
from torchvision import transforms

## AugMix Data Augmentation
## Reference: https://github.com/google-research/augmix

def _augmix_aug(x_orig):
    """
    Apply AugMix augmentation to the original image.
    
    Args:
    - x_orig (PIL.Image): The original image to be augmented.
    
    Returns:
    - mix (torch.Tensor): The augmented image tensor after mixing.
    """
    # Apply pre-augmentation transformations
    x_orig = preaugment(x_orig)
    # Preprocess the original image
    x_processed = preprocess(x_orig)
    
    # Sample weights for each augmentation
    w = np.float32(np.random.dirichlet([1.0, 1.0, 1.0]))
    # Sample mixing coefficient
    m = np.float32(np.random.beta(1.0, 1.0))
    
    # Initialize an empty tensor to accumulate the mixed augmented images
    mix = torch.zeros_like(x_processed)
    
    # Apply augmentations and mix images
    for i in range(3):
        x_aug = x_orig.copy()
        # Apply a random number of augmentations
        for _ in range(np.random.randint(1, 4)):
            x_aug = np.random.choice(augmentations)(x_aug)
        # Add weighted augmented image to mix
        mix += w[i] * preprocess(x_aug)
    
    # Mix original and augmented images
    mix = m * x_processed + (1 - m) * mix
    return mix

# Short alias for the augmentation function
aug = _augmix_aug

# Define various augmentation functions
def autocontrast(pil_img, level=None):
    """Apply autocontrast to the image."""
    return ImageOps.autocontrast(pil_img)

def equalize(pil_img, level=None):
    """Apply histogram equalization to the image."""
    return ImageOps.equalize(pil_img)

def rotate(pil_img, level):
    """Rotate the image by a random angle."""
    degrees = int_parameter(rand_lvl(level), 30)
    if np.random.uniform() > 0.5:
        degrees = -degrees
    return pil_img.rotate(degrees, resample=Image.BILINEAR, fillcolor=128)

def solarize(pil_img, level):
    """Solarize the image by inverting colors above a certain threshold."""
    level = int_parameter(rand_lvl(level), 256)
    return ImageOps.solarize(pil_img, 256 - level)

def shear_x(pil_img, level):
    """Apply horizontal shear to the image."""
    level = float_parameter(rand_lvl(level), 0.3)
    if np.random.uniform() > 0.5:
        level = -level
    return pil_img.transform((224, 224), Image.AFFINE, (1, level, 0, 0, 1, 0), resample=Image.BILINEAR, fillcolor=128)

def shear_y(pil_img, level):
    """Apply vertical shear to the image."""
    level = float_parameter(rand_lvl(level), 0.3)
    if np.random.uniform() > 0.5:
        level = -level
    return pil_img.transform((224, 224), Image.AFFINE, (1, 0, 0, level, 1, 0), resample=Image.BILINEAR, fillcolor=128)

def translate_x(pil_img, level):
    """Translate the image horizontally."""
    level = int_parameter(rand_lvl(level), 224 / 3)
    if np.random.random() > 0.5:
        level = -level
    return pil_img.transform((224, 224), Image.AFFINE, (1, 0, level, 0, 1, 0), resample=Image.BILINEAR, fillcolor=128)

def translate_y(pil_img, level):
    """Translate the image vertically."""
    level = int_parameter(rand_lvl(level), 224 / 3)
    if np.random.random() > 0.5:
        level = -level
    return pil_img.transform((224, 224), Image.AFFINE, (1, 0, 0, 0, 1, level), resample=Image.BILINEAR, fillcolor=128)

def posterize(pil_img, level):
    """Reduce the number of bits for each color channel."""
    level = int_parameter(rand_lvl(level), 4)
    return ImageOps.posterize(pil_img, 4 - level)

# Helper functions to scale parameter values
def int_parameter(level, maxval):
    """
    Scale an integer parameter according to the level.
    
    Args:
    - level (float): Level of the operation between [0, PARAMETER_MAX].
    - maxval (int): Maximum value for the operation.
    
    Returns:
    - int: Scaled integer parameter.
    """
    return int(level * maxval / 10)

def float_parameter(level, maxval):
    """
    Scale a float parameter according to the level.
    
    Args:
    - level (float): Level of the operation between [0, PARAMETER_MAX].
    - maxval (float): Maximum value for the operation.
    
    Returns:
    - float: Scaled float parameter.
    """
    return float(level) * maxval / 10.

def rand_lvl(n):
    """
    Generate a random level for augmentation.
    
    Args:
    - n (float): Maximum value for the random level.
    
    Returns:
    - float: Random level between 0.1 and n.
    """
    return np.random.uniform(low=0.1, high=n)

# List of augmentations
augmentations = [
    autocontrast,
    equalize,
    lambda x: rotate(x, 1),
    lambda x: solarize(x, 1),
    lambda x: shear_x(x, 1),
    lambda x: shear_y(x, 1),
    lambda x: translate_x(x, 1),
    lambda x: translate_y(x, 1),
    lambda x: posterize(x, 1),
]

# Define preprocessing and pre-augmentation pipelines
mean = [0.485, 0.456, 0.406]  # Mean for normalization
std = [0.229, 0.224, 0.225]   # Standard deviation for normalization

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])

preaugment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
])
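
To make the effect of the severity level concrete, the sketch below re-declares the two scaling helpers from the cell above and evaluates them at the largest level used in the `augmentations` list (`level = 1`). It is an illustration only; the variable names are hypothetical and the values follow directly from the formulas.

```python
def int_parameter(level, maxval):
    """Same formula as in the cell above: scale, then truncate to an integer."""
    return int(level * maxval / 10)


def float_parameter(level, maxval):
    """Same formula as in the cell above: scale without truncation."""
    return float(level) * maxval / 10.0


level = 1.0  # upper bound of rand_lvl(1), which samples between 0.1 and 1.0

max_rotation = int_parameter(level, 30)               # degrees
max_shear = float_parameter(level, 0.3)               # affine shear coefficient
max_translation = int_parameter(level, 224 / 3)       # pixels
solarize_threshold = 256 - int_parameter(level, 256)  # lowest threshold reached
posterize_bits = 4 - int_parameter(level, 4)          # bits kept per channel

print(max_rotation, max_shear, max_translation, solarize_threshold, posterize_bits)
```

With this setting the rotation is at most 3 degrees, the translation at most 7 pixels, and `posterize` always keeps 4 bits, so each individual operation is mild.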


## **Section 1-c ImageAugmentor** 

This code defines a class called `ImageAugmentor`, which applies various data augmentation techniques to an image and displays the augmented images. It relies on the AugMix functions defined in Section 1-b.


In [None]:
import matplotlib.pyplot as plt
from torchvision.transforms import v2
from PIL import ImageEnhance, ImageFilter
import concurrent.futures
import random

#libraries already called:
#from PIL import Image
#from torchvision import transforms

# Define a class for image augmentation
class ImageAugmentor:
    # Initialization method
    def __init__(self, augmentations=None):
        # If no augmentations are provided, use default set of augmentations
        if augmentations is None:
            self.augmentations = v2.Compose([
                v2.RandomHorizontalFlip(),
                v2.RandomVerticalFlip(),
                v2.RandomRotation(degrees=45),
                v2.RandomResizedCrop(size=(224, 224), scale=(0.8, 1.0)),
            ])
        else:
            # If augmentations are provided, use those
            self.augmentations = augmentations

    # Method to apply augmentations to an image
    def apply_augmentations(self, image_path, aug, n_augmentations):
        # Load and convert the image to RGB
        image = Image.open(image_path).convert("RGB")
        
        # Define a helper function to perform augmentation
        def augment_image(_):
            if aug:
                return transforms.ToPILImage()(_augmix_aug(image))
            else:
                return self.augmentations(image)
        
        # Use ThreadPoolExecutor to perform augmentations in parallel
        with concurrent.futures.ThreadPoolExecutor() as executor:
            augmented_images = list(executor.map(augment_image, range(n_augmentations)))
    
        return augmented_images

    # Method to apply a set of predefined augmentations to an image
    def apply_all_augmentations(self, image_path):
        # Load and convert the image to RGB
        image = Image.open(image_path).convert('RGB')
        augmentations = []

        # Helper function to apply an augmentation function with its arguments
        def apply_augmentation(func, *args):
            return func(*args)

        # List of augmentation tasks with corresponding functions and arguments
        tasks = [
            (random_rotation, image, 360),
            (random_crop, image, (int(image.width * 0.8), int(image.height * 0.8))),
            (random_zoom, image, 0.8, 1.2),
            (random_shift, image, 10, 10),
            (shear_image, image, 0.2),
            (adjust_brightness, image, 1.5),
            (adjust_contrast, image, 1.5),
            (adjust_saturation, image, 1.5),
            (adjust_hue, image, 50),
            (add_noise, image, 25),
            (blur_image, image, 2),
            (sharpen_image, image, 2),
            (grayscale_image, image),
            (cutout_image, image, 50),
            (flip_image_horizontal, image),
            (flip_image_vertical, image),
            (rotate_image, image, 45),
            (crop_image, image, (10, 10, image.width-10, image.height-10)),
            (zoom_image, image, 1.1),
            (shift_image, image, 5, 5),
        ]

        # Use ThreadPoolExecutor to perform all augmentations in parallel
        with concurrent.futures.ThreadPoolExecutor() as executor:
            results = [executor.submit(apply_augmentation, task[0], *task[1:]) for task in tasks]
            for future in concurrent.futures.as_completed(results):
                augmentations.append(future.result())

        return augmentations

    # Method to display the original and augmented images
    def show_images(self, original_image, augmented_images):
        # Create a subplot with the original and augmented images
        fig, axes = plt.subplots(1, len(augmented_images) + 1, figsize=(15, 5))
        axes[0].imshow(original_image)
        axes[0].set_title("Original Image")
        axes[0].axis("off")
        
        try:
            # Display augmented images, converting to PIL if necessary
            for i, aug_image in enumerate(augmented_images):
                axes[i + 1].imshow(transforms.ToPILImage()(aug_image))
                axes[i + 1].set_title(f"Augmented Image {i+1}")
                axes[i + 1].axis("off")
        except Exception:
            # The images are already PIL images, so no conversion is needed
            for i, aug_image in enumerate(augmented_images):
                axes[i + 1].imshow(aug_image)
                axes[i + 1].set_title(f"Augmented Image {i+1}")
                axes[i + 1].axis("off")
        
        plt.show()

    
# Augmentation functions
def rotate_image(image, angle):
    return image.rotate(angle)

def crop_image(image, crop_area):
    return image.crop(crop_area)

def zoom_image(image, zoom_factor):
    width, height = image.size
    x_center, y_center = width / 2, height / 2
    new_width, new_height = width / zoom_factor, height / zoom_factor
    left = x_center - new_width / 2
    top = y_center - new_height / 2
    right = x_center + new_width / 2
    bottom = y_center + new_height / 2
    return image.crop((left, top, right, bottom)).resize((width, height), Image.LANCZOS)

def shift_image(image, dx, dy):
    width, height = image.size
    return Image.fromarray(np.roll(np.roll(np.array(image), dx, axis=1), dy, axis=0), 'RGB')

def flip_image_horizontal(image):
    return image.transpose(Image.FLIP_LEFT_RIGHT)

def flip_image_vertical(image):
    return image.transpose(Image.FLIP_TOP_BOTTOM)

def random_rotation(image, max_angle):
    angle = random.uniform(-max_angle, max_angle)
    return rotate_image(image, angle)

def random_crop(image, crop_size):
    width, height = image.size
    left = random.randint(0, width - crop_size[0])
    top = random.randint(0, height - crop_size[1])
    right = left + crop_size[0]
    bottom = top + crop_size[1]
    return crop_image(image, (left, top, right, bottom))

def random_zoom(image, min_zoom, max_zoom):
    zoom_factor = random.uniform(min_zoom, max_zoom)
    return zoom_image(image, zoom_factor)

def random_shift(image, max_dx, max_dy):
    dx = random.randint(-max_dx, max_dx)
    dy = random.randint(-max_dy, max_dy)
    return shift_image(image, dx, dy)

def shear_image(image, shear_factor):
    width, height = image.size
    xshift = abs(shear_factor) * width
    new_width = width + int(round(xshift))
    return image.transform((new_width, height), Image.AFFINE,
                           (1, shear_factor, -xshift if shear_factor > 0 else 0, 0, 1, 0), Image.BICUBIC)

def adjust_brightness(image, factor):
    enhancer = ImageEnhance.Brightness(image)
    return enhancer.enhance(factor)

def adjust_contrast(image, factor):
    enhancer = ImageEnhance.Contrast(image)
    return enhancer.enhance(factor)

def adjust_saturation(image, factor):
    enhancer = ImageEnhance.Color(image)
    return enhancer.enhance(factor)

def adjust_hue(image, factor):
    image = np.array(image.convert('HSV'))
    image[..., 0] = (image[..., 0].astype(int) + factor) % 256
    return Image.fromarray(image, 'HSV').convert('RGB')

def add_noise(image, noise_level):
    image = np.array(image)
    noise = np.random.normal(0, noise_level, image.shape)
    noisy_image = np.clip(image + noise, 0, 255).astype(np.uint8)
    return Image.fromarray(noisy_image, 'RGB')

def blur_image(image, radius):
    return image.filter(ImageFilter.GaussianBlur(radius))

def sharpen_image(image, factor):
    enhancer = ImageEnhance.Sharpness(image)
    return enhancer.enhance(factor)

def grayscale_image(image):
    return ImageOps.grayscale(image).convert('RGB')

def cutout_image(image, mask_size, mask_value=0):
    image = np.array(image)
    height, width = image.shape[:2]
    y = np.random.randint(height)
    x = np.random.randint(width)
    y1 = np.clip(y - mask_size // 2, 0, height)
    y2 = np.clip(y + mask_size // 2, 0, height)
    x1 = np.clip(x - mask_size // 2, 0, width)
    x2 = np.clip(x + mask_size // 2, 0, width)
    image[y1:y2, x1:x2] = mask_value
    return Image.fromarray(image, 'RGB')


## **Section 1-d Utilities functions**  

This cell contains utilities that preprocess images, compute metrics such as marginal entropy and KL divergence to evaluate the uncertainty of the model's predictions, and map the WordNet class IDs used as folder names to readable labels.
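
For reference, the quantity returned by `marginal_entropy` in the cell below can be written as follows, for $B$ augmented copies $\tilde{x}_1, \dots, \tilde{x}_B$ of a test image $x$ and model parameters $\theta$ (the symbols are notation introduced here, not names from the code):

```latex
\bar{p}_\theta(y \mid x) = \frac{1}{B} \sum_{i=1}^{B} p_\theta(y \mid \tilde{x}_i),
\qquad
\ell(\theta; x) = - \sum_{y} \bar{p}_\theta(y \mid x) \, \log \bar{p}_\theta(y \mid x)
```

The code evaluates $\log \bar{p}_\theta$ in log space with `logsumexp` for numerical stability and returns it as its second output.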

In [None]:
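# Define transformations for pre-processing images
# Note: this redefines the `preprocess` pipeline from Section 1-b, so `_augmix_aug` also uses this version once the cell has run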
preprocess = transforms.Compose([
    transforms.Resize(256),                                                         #Resizes the image to 256 pixels on the shortest side
    transforms.CenterCrop(224),                                                     #Performs a centered crop of 224x224 pixels
    transforms.ToTensor(),                                                          #Converts the image to a PyTorch tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),    #Normalizes the image using the specified mean and standard deviation
])

# Function to load and pre-process an image
def load_and_preprocess_image(img_path):
    img = Image.open(img_path).convert("RGB")
    img_tensor = preprocess(img)
    img_tensor = img_tensor.unsqueeze(0)  # Add a batch dimension
    return img_tensor

# Function to pre-process an already loaded image
def process_image(img):
    img_tensor = preprocess(img)
    img_tensor = img_tensor.unsqueeze(0)  # Add a batch dimension
    return img_tensor


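# Function to calculate marginal entropy (an indication of the uncertainty in the model's predictions)
def marginal_entropy(outputs):
    # Log-softmax: convert the logits of each augmentation into log-probabilities
    logits = outputs - outputs.logsumexp(dim=-1, keepdim=True)
    # Log of the mean probability over the augmentations (dim 0), computed in log space
    avg_logits = logits.logsumexp(dim=0) - np.log(logits.shape[0])
    # Clamp -inf values so that the product below does not produce NaN
    min_real = torch.finfo(avg_logits.dtype).min
    avg_logits = torch.clamp(avg_logits, min=min_real)
    # Entropy of the averaged distribution, plus the averaged log-probabilities
    return -(avg_logits * torch.exp(avg_logits)).sum(dim=-1), avg_logits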

# Function to calculate Kullback-Leibler (KL) divergence (how one probability distribution diverges from a reference distribution)
# Note: `outputs` must contain strictly positive probabilities (not logits), since the logarithm is applied directly
def kl_div(outputs):
    kl_divs = []
    for i in range(outputs.size(1)):
        class_probs = outputs[:, i]
        class_avg_prob = class_probs.mean(dim=0)
        # Local name differs from the function name to avoid shadowing it
        class_kl = (class_probs * (torch.log(class_probs) - torch.log(class_avg_prob))).sum(dim=-1)
        kl_divs.append(class_kl)
    
    # Sum of the KL Divergences
    total_kl_div = torch.stack(kl_divs).sum()
    return total_kl_div



# Function to get the readable label from the class ID
def get_class_label(class_id):
    return class_mapping.get(class_id, "Unknown class")


# ImageNet class mapping
class_mapping_data = [
    "n01498041 stingray",
    "n01531178 goldfinch",
    "n01534433 junco",
    "n01558993 American robin",
    "n01580077 jay",
    "n01614925 bald eagle",
    "n01616318 vulture",
    "n01631663 newt",
    "n01641577 American bullfrog",
    "n01669191 box turtle",
    "n01677366 green iguana",
    "n01687978 agama",
    "n01694178 chameleon",
    "n01698640 American alligator",
    "n01735189 garter snake",
    "n01770081 harvestman",
    "n01770393 scorpion",
    "n01774750 tarantula",
    "n01784675 centipede",
    "n01819313 sulphur-crested cockatoo",
    "n01820546 lorikeet",
    "n01833805 hummingbird",
    "n01843383 toucan",
    "n01847000 duck",
    "n01855672 goose",
    "n01882714 koala",
    "n01910747 jellyfish",
    "n01914609 sea anemone",
    "n01924916 flatworm",
    "n01944390 snail",
    "n01985128 crayfish",
    "n01986214 hermit crab",
    "n02007558 flamingo",
    "n02009912 great egret",
    "n02037110 oystercatcher",
    "n02051845 pelican",
    "n02077923 sea lion",
    "n02085620 Chihuahua",
    "n02099601 Golden Retriever",
    "n02106550 Rottweiler",
    "n02106662 German Shepherd Dog",
    "n02110958 pug",
    "n02119022 red fox",
    "n02123394 Persian cat",
    "n02127052 lynx",
    "n02129165 lion",
    "n02133161 American black bear",
    "n02137549 mongoose",
    "n02165456 ladybug",
    "n02174001 rhinoceros beetle",
    "n02177972 weevil",
    "n02190166 fly",
    "n02206856 bee",
    "n02219486 ant",
    "n02226429 grasshopper",
    "n02231487 stick insect",
    "n02233338 cockroach",
    "n02236044 mantis",
    "n02259212 leafhopper",
    "n02268443 dragonfly",
    "n02279972 monarch butterfly",
    "n02280649 small white",
    "n02281787 gossamer-winged butterfly",
    "n02317335 starfish",
    "n02325366 cottontail rabbit",
    "n02346627 porcupine",
    "n02356798 fox squirrel",
    "n02361337 marmot",
    "n02410509 bison",
    "n02445715 skunk",
    "n02454379 armadillo",
    "n02486410 baboon",
    "n02492035 white-headed capuchin",
    "n02504458 African bush elephant",
    "n02655020 pufferfish",
    "n02669723 academic gown",
    "n02672831 accordion",
    "n02676566 acoustic guitar",
    "n02690373 airliner",
    "n02701002 ambulance",
    "n02730930 apron",
    "n02777292 balance beam",
    "n02782093 balloon",
    "n02787622 banjo",
    "n02793495 barn",
    "n02797295 wheelbarrow",
    "n02802426 basketball",
    "n02814860 lighthouse",
    "n02815834 beaker",
    "n02837789 bikini",
    "n02879718 bow",
    "n02883205 bow tie",
    "n02895154 breastplate",
    "n02906734 broom",
    "n02948072 candle",
    "n02951358 canoe",
    "n02980441 castle",
    "n02992211 cello",
    "n02999410 chain",
    "n03014705 chest",
    "n03026506 Christmas stocking",
    "n03124043 cowboy boot",
    "n03125729 cradle",
    "n03187595 rotary dial telephone",
    "n03196217 digital clock",
    "n03223299 doormat",
    "n03250847 drumstick",
    "n03255030 dumbbell",
    "n03291819 envelope",
    "n03325584 feather boa",
    "n03355925 flagpole",
    "n03384352 forklift",
    "n03388043 fountain",
    "n03417042 garbage truck",
    "n03443371 goblet",
    "n03444034 go-kart",
    "n03445924 golf cart",
    "n03452741 grand piano",
    "n03483316 hair dryer",
    "n03584829 clothes iron",
    "n03590841 jack-o'-lantern",
    "n03594945 jeep",
    "n03617480 kimono",
    "n03666591 lighter",
    "n03670208 limousine",
    "n03717622 manhole cover",
    "n03720891 maraca",
    "n03721384 marimba",
    "n03724870 mask",
    "n03775071 mitten",
    "n03788195 mosque",
    "n03804744 nail",
    "n03837869 obelisk",
    "n03840681 ocarina",
    "n03854065 organ",
    "n03888257 parachute",
    "n03891332 parking meter",
    "n03935335 piggy bank",
    "n03982430 billiard table",
    "n04019541 hockey puck",
    "n04033901 quill",
    "n04039381 racket",
    "n04067472 reel",
    "n04086273 revolver",
    "n04099969 rocking chair",
    "n04118538 rugby ball",
    "n04131690 salt shaker",
    "n04133789 sandal",
    "n04141076 saxophone",
    "n04146614 school bus",
    "n04147183 schooner",
    "n04179913 sewing machine",
    "n04208210 shovel",
    "n04235860 sleeping bag",
    "n04252077 snowmobile",
    "n04252225 snowplow",
    "n04254120 soap dispenser",
    "n04270147 spatula",
    "n04275548 spider web",
    "n04310018 steam locomotive",
    "n04317175 stethoscope",
    "n04344873 couch",
    "n04347754 submarine",
    "n04355338 sundial",
    "n04366367 suspension bridge",
    "n04376876 syringe",
    "n04389033 tank",
    "n04399382 teddy bear",
    "n04442312 toaster",
    "n04456115 torch",
    "n04482393 tricycle",
    "n04507155 umbrella",
    "n04509417 unicycle",
    "n04532670 viaduct",
    "n04540053 volleyball",
    "n04554684 washing machine",
    "n04562935 water tower",
    "n04591713 wine bottle",
    "n04606251 shipwreck",
    "n07583066 guacamole",
    "n07695742 pretzel",
    "n07697313 cheeseburger",
    "n07697537 hot dog",
    "n07714990 broccoli",
    "n07718472 cucumber",
    "n07720875 bell pepper",
    "n07734744 mushroom",
    "n07749582 lemon",
    "n07753592 banana",
    "n07760859 custard apple",
    "n07768694 pomegranate",
    "n07831146 carbonara",
    "n09229709 bubble",
    "n09246464 cliff",
    "n09472597 volcano",
    "n09835506 baseball player",
    "n11879895 rapeseed",
    "n12057211 yellow lady's slipper",
    "n12144580 corn",
    "n12267677 acorn"
]

class_mapping = {}
for line in class_mapping_data:
    parts = line.strip().split(' ')
    class_mapping[parts[0]] = ' '.join(parts[1:])

## **Section 2 MEMO baseline**  

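### MEMO overview: Test-Time Adaptation for Robustness

MEMO (Marginal Entropy Minimization with One test point) is a test-time adaptation (TTA) technique that improves a model's robustness on test data by adapting it to each test image using augmented copies of that image. The MEMO process can be summarized in the following steps: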

1. **Loading the Pre-trained Model**: We use a pre-trained model, in this case a ResNet50.  

2. **Generating Augmentations**: For each test image, we generate various augmentations (variations of the image). The augmentations can include transformations such as rotations, crops, color changes, etc.  

3. **Computing Predictions**: We pass each augmented image through the model to obtain predictions, producing a set of predictions for each version of the image.  

4. **Minimizing Entropy**: We average the predicted class distributions over all augmentations and use the entropy of this average (the marginal entropy) as the loss. Minimizing it makes the model more confident and more consistent across augmentations.  

5. **Updating Model Weights**: We update the model weights based on the calculated entropy loss.  

6. **Final Prediction**: After updating the weights, we pass the original (non-augmented) image through the model to obtain the final prediction.
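
To see why minimizing the marginal entropy pushes the model towards confident and consistent predictions, consider the toy sketch below. It is written in plain Python on hand-picked probability vectors (hypothetical values, three classes, two augmentations) rather than on real model outputs, and `marginal_entropy_from_probs` is an illustrative helper, not the function used in the notebook.

```python
import math


def marginal_entropy_from_probs(prob_rows):
    """Entropy of the class distribution averaged over all augmentations."""
    n_rows = len(prob_rows)
    n_classes = len(prob_rows[0])
    avg = [sum(row[c] for row in prob_rows) / n_rows for c in range(n_classes)]
    # Terms with zero probability contribute nothing (0 * log 0 is taken as 0).
    return -sum(p * math.log(p) for p in avg if p > 0.0)


# Both augmentations agree on class 0: low marginal entropy.
agree = [[0.98, 0.01, 0.01], [0.98, 0.01, 0.01]]
# Each augmentation is confident, but about a different class: higher marginal entropy.
disagree = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01]]
# Maximum uncertainty: the entropy equals log(number of classes).
uniform = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]

for name, rows in [("agree", agree), ("disagree", disagree), ("uniform", uniform)]:
    print(f"{name:9s} {marginal_entropy_from_probs(rows):.4f}")
```

The loss is small only when the augmented views are both confident and in agreement, which is the behaviour the weight update in step 5 encourages.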


Below, we present the implementation of the MEMO baseline with AugMix augmentations. After each image, the model is restored to its initial weights.

In [None]:
from torchvision.models import resnet50, ResNet50_Weights
from copy import deepcopy
from tqdm import tqdm

# Initialize the augmentator and the number of augmentations
augmentator = ImageAugmentor()
n_augmentations = 64

# Determine the device to use (GPU if available, otherwise CPU)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the pre-trained ResNet50 model with default weights
model = resnet50(weights=ResNet50_Weights.DEFAULT).to(device)
model.train()  # Set the model to training mode

# Initialize the optimizer (AdamW in this case)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)

# Save the initial state of the model
model_weights = deepcopy(model.state_dict())

# Save the initial BatchNorm running mean and variance
train_mean = []
train_var = []
for module in model.modules():
    if isinstance(module, torch.nn.BatchNorm2d):
        train_mean.append(deepcopy(module.running_mean))
        train_var.append(deepcopy(module.running_var))

# Variables to track the total number of images and correctly classified images
total_images = 0
correcly_classified_images = 0

# Iterate over all folders in the 'imagenet-a' directory
for folder in tqdm(os.listdir(dataset_path), desc="Processing classes"):
    image_folder = os.path.join(dataset_path, folder)
    
    # Skip if not a directory
    if not os.path.isdir(image_folder):
        continue
    
    # Get the true label of the current class from the folder name
    true_label = get_class_label(image_folder[-9:])
    
    # List all image files in the folder
    image_files = [os.path.join(image_folder, img) for img in os.listdir(image_folder) if img.endswith('.jpg') or img.endswith('.png')]

    total_images += len(image_files)

    equal_images_current_folder = 0

    N = 16  # Used for BatchNorm calculations

    # Iterate over all images in the folder
    for i in image_files:
        model.train()  # Set the model to training mode

        # Apply augmentations to the image
        images_aug = augmentator.apply_augmentations(i, True, n_augmentations)
        likelyhood_list = []

        optimizer.zero_grad()  # Zero the gradients
        
        # Predict for each augmented image
        for img in images_aug:
            proc = process_image(img).to(device)  # Process the image
            output = model(proc)
            _, preds = torch.max(output, 1)  # Get the predicted class
            likelyhood_list.append(output)
       
        # Adjust BatchNorm statistics
        index = 0
        for module in model.modules():
            if isinstance(module, torch.nn.BatchNorm2d):
                # Retrieve the statistics saved before adaptation
                mu_train = train_mean[index]
                var_train = train_var[index]
                
                # Read the running statistics updated by the forward passes above
                mu_test = module.running_mean
                var_test = module.running_var
                
                # Mix the statistics
                module.running_mean = (N / (N + 1)) * mu_train + (1 / (N + 1)) * mu_test
                module.running_var = (N / (N + 1)) * var_train + (1 / (N + 1)) * var_test

                index += 1

        # Stack the logits of all augmentations into one tensor and compute the marginal entropy
        probabilities_tensor = torch.stack(likelyhood_list)
        mean_entropy, avg_logits = marginal_entropy(probabilities_tensor)

        # Backpropagation and update the model parameters
        mean_entropy.backward()
        optimizer.step()

        model.eval()  # Set the model to evaluation mode

        # Predict the original image
        with torch.no_grad():
            output = model(load_and_preprocess_image(i).to(device))
            _, preds = torch.max(output, 1)

            # Check if the prediction is correct
            if (class_idx[preds.item()] == true_label):
                equal_images_current_folder += 1
                correcly_classified_images += 1

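        # Restore the initial model state (weights and BatchNorm statistics)
        # Note: the optimizer state (AdamW moment estimates) is not reset and carries over to the next image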
        model.load_state_dict(model_weights)

    # Print the number of correctly classified images for the current class
    print(f"Number of images classified correctly for the class ----> {true_label}: {equal_images_current_folder}")

# Print the overall classification accuracy
print("__________________________________________________________")
print(f"Total number of images: {total_images}")
print(f"Number of correctly classified images: {correcly_classified_images}")
print(f"Accuracy: {correcly_classified_images / total_images}")
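
The BatchNorm update in the loop above is a weighted average of the statistics saved before adaptation and the running statistics after the forward passes on the augmented images. A minimal sketch of that arithmetic on scalars follows; `mix_statistic` and the example numbers are hypothetical, and only the formula is taken from the cell above.

```python
def mix_statistic(train_value, test_value, n=16):
    """Blend a saved training statistic with a test-time statistic.

    The training statistic gets weight n / (n + 1), the test statistic 1 / (n + 1).
    """
    return (n / (n + 1)) * train_value + (1 / (n + 1)) * test_value


saved_mean, current_mean = 0.0, 1.7  # hypothetical per-channel means
mixed_mean = mix_statistic(saved_mean, current_mean, n=16)
print(f"weight on training statistic: {16 / 17:.3f}")
print(f"mixed mean: {mixed_mean:.3f}")
```

With `N = 16`, about 94% of the weight stays on the training statistics, so a single test image shifts the normalization only slightly.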


## **Section 3: MEMO plus BLIP**  

### MEMO Plus BLIP Overview: Integrating Test-Time Adaptation with Visual Question Answering
MEMO plus BLIP extends the test-time adaptation (TTA) procedure of Section 2 with Visual Question Answering (VQA) to improve model performance on test data. The process involves the following steps:

1. **Loading the Pre-trained Models**: We utilize a pre-trained ResNet50 model for image classification and a pre-trained BLIP model for VQA.

2. **Generating Augmentations**: For each test image, we create multiple augmented versions by applying various transformations such as rotations, crops, and color adjustments.

3. **Computing Predictions**: Each augmented image is processed through the ResNet50 model to obtain predictions, resulting in a set of predictions for each augmented version of the image.

4. **Incorporating VQA with BLIP**: For each prediction, we use the BLIP model to ask whether the predicted class is present in the original (non-augmented) image. The detailed steps are:
    - **Generate Questions**: For each distinct class predicted on the augmented images, formulate the question "Is there a [predicted class] in the picture?"
    - **Use BLIP for Answers**: The BLIP model processes both the question and the original image to generate a short answer. The answer is cached per class, so BLIP is queried only once for each predicted label of an image.
    - **Adjust Prediction Confidence**:
        - **Positive Response**: If BLIP confirms the presence of the class (its answer is "yes", "there is", "correct", or "true"), the logit of the predicted class is tripled, i.e. twice its own value is added to it.
        - **Negative Response**: If BLIP gives any other answer (e.g., "no", "not", "false"), the logits of that augmentation remain unchanged.

5. **Minimizing Entropy**: An entropy-based loss function is used to adapt the model weights during the testing phase. The loss is the marginal entropy of the (BLIP-adjusted) predictions over all augmentations of the image; minimizing it pushes the model towards confident predictions that are consistent across augmentations.

6. **Updating Model Weights**: The entropy loss is backpropagated and the model weights are updated with a single AdamW step (learning rate 0.001).

7. **Final Prediction**: After updating the model weights, the original (non-augmented) image is passed through the ResNet50 model again to obtain the final, adjusted prediction. The pre-trained weights are then restored before the next image is processed.
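
To make the entropy objective of step 5 concrete, the snippet below is a minimal, framework-free sketch of a marginal entropy: the softmax distributions of all augmentations are averaged, and the entropy of that average is returned. The function name `marginal_entropy_sketch` and the toy logits are assumptions made for illustration; the notebook's own `marginal_entropy` utility from Section 1 operates on tensors and may differ in its details.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_entropy_sketch(logits_per_aug):
    """Entropy of the class distribution averaged over all augmentations."""
    probs = [softmax(logits) for logits in logits_per_aug]
    n_aug = len(probs)
    n_cls = len(probs[0])
    avg = [sum(p[c] for p in probs) / n_aug for c in range(n_cls)]
    return -sum(p * math.log(p) for p in avg if p > 0.0)


# Two augmentations that agree on class 0 ...
agree = [[5.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
# ... versus two augmentations that disagree (class 0 vs. class 1)
disagree = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]

h_agree = marginal_entropy_sketch(agree)
h_disagree = marginal_entropy_sketch(disagree)
print(h_agree < h_disagree)  # True: agreement yields lower marginal entropy
```

The loss is low only when the augmentations are both confident and in agreement, which is why boosting a BLIP-confirmed class before averaging can steer the update towards that class.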

In [None]:
# Import necessary libraries
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights
from copy import deepcopy
from tqdm import tqdm
from transformers import BlipProcessor, BlipForQuestionAnswering

# Load the processor and model for Visual Question Answering (VQA)
processor_blip = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model_blip = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

# Initialize the image augmentation class (defined in Section 1)
augmentator = ImageAugmentor()
n_augmentations = 20  # Number of augmentations to apply

# Set the device to GPU if available, otherwise use CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the ResNet50 model with pre-trained weights and move it to the selected device
model = resnet50(weights=ResNet50_Weights.DEFAULT).to(device)
model.train()  # Set the model to training mode

# Initialize the AdamW optimizer for model training with a learning rate of 0.001
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)

# Save the initial state of the model to restore later
model_weights = deepcopy(model.state_dict())

# Initialize lists to store batch normalization statistics
train_mean = []
train_var = []

# Loop through the model's modules to extract statistics from BatchNorm2d layers
for module in model.modules():
    if isinstance(module, torch.nn.BatchNorm2d):
        train_mean.append(deepcopy(module.running_mean))
        train_var.append(deepcopy(module.running_var))

# Initialize counters for total and correctly classified images
total_images = 0
correcly_classified_images = 0

# Set experiment type
memo = True
dataseth = "imagenet-a"  # Can also be "imagenetv2" (not imported in this file)

# Iterate through each folder in the dataset directory
for folder in tqdm(os.listdir(dataset_path), desc="Processing classes"):

    # Construct the full path to the current image folder
    image_folder = os.path.join(dataset_path, folder)
    
    # Skip non-directory files
    if not os.path.isdir(image_folder):
        continue

    # Determine the true class label based on the experiment type
    true_label = get_class_label(folder) if dataseth == "imagenet-a" else class_idx[int(folder)]
    
    # List of image files in the current folder
    image_files = [os.path.join(image_folder, img) for img in os.listdir(image_folder) if img.endswith(('.jpg', '.png', '.jpeg'))]
    total_images += len(image_files)  # Update total image count

    equal_images_current_folder = 0  # Counter for correctly classified images in the current folder
    N = 16  # Prior strength: weight of the training statistics when blending batch normalization statistics

    for i in image_files:
        
        model.train()  # Ensure the model is in training mode
        if memo:
            # Apply augmentations to the image
            images_aug = augmentator.apply_all_augmentations(i)
            likelihood_list = []  # List to store the model outputs (logits) of each augmentation

            optimizer.zero_grad()  # Clear the previous gradients

            prediction_tmp = []  # Temporary list to store predictions for this image

            image_test = Image.open(i)  # Open the image file

            dictionary_blip = {}  # Dictionary to store answers from the BLIP model

            for img in images_aug:
                proc = process_image(img).to(device)  # Process the augmented image
                output = model(proc)  # Get the model's output
                
                _, preds = torch.max(output, 1)  # Get the predicted class
                pred_idx = preds.item()
                pred_label = class_idx[pred_idx]
                prediction_tmp.append(pred_label)  # Store the predicted class

                # Query BLIP only once per predicted label and cache its answer
                if pred_label not in dictionary_blip:
                    # Formulate a question for the BLIP model
                    question = f"Is there a {pred_label} in the picture?"

                    # Prepare the input for the BLIP model
                    inputs = processor_blip(image_test, question, return_tensors="pt")
                    # Generate an answer from the BLIP model
                    out = model_blip.generate(**inputs, max_length=2, num_beams=1)
                    # Decode the answer
                    risposta_blip = processor_blip.decode(out[0], skip_special_tokens=True)

                    dictionary_blip[pred_label] = risposta_blip.lower()  # Store the answer

                # Triple the logit of the predicted class if BLIP's answer is affirmative
                if dictionary_blip[pred_label] in ["yes", "there is", "correct", "true"]:
                    output[0][pred_idx] += output[0][pred_idx] * 2

                likelihood_list.append(output)  # Add the output to the list

            # Update batch normalization statistics
            index = 0
            for module in model.modules():
                if isinstance(module, torch.nn.BatchNorm2d):
                    mu_train = train_mean[index]  # Training mean
                    var_train = train_var[index]  # Training variance
                    mu_test = module.running_mean  # Test mean
                    var_test = module.running_var  # Test variance
                    
                    # Update running statistics
                    module.running_mean = (N / (N + 1)) * mu_train + (1 / (N + 1)) * mu_test
                    module.running_var = (N / (N + 1)) * var_train + (1 / (N + 1)) * var_test

                    index += 1

            # Stack the per-augmentation model outputs into one tensor and compute the marginal entropy
            probabilities_tensor = torch.stack(likelihood_list)
            mean_entropy, avg_logits = marginal_entropy(probabilities_tensor)

            # Perform backpropagation and update model parameters
            mean_entropy.backward()
            optimizer.step()

        model.eval()  # Set the model to evaluation mode

        with torch.no_grad():
            # Load and preprocess the image for evaluation
            output = model(load_and_preprocess_image(i).to(device))
            _, preds = torch.max(output, 1)  # Get the predicted class
            if class_idx[preds.item()] == true_label:
                equal_images_current_folder += 1  # Increment counter for correctly classified images
                correcly_classified_images += 1

        model.load_state_dict(model_weights)  # Restore the model to its original state
    
    # Print the number of correctly classified images for the current class
    print(f"The number of images classified correctly for class ----> {true_label}: {equal_images_current_folder}")

# Print the overall classification accuracy and statistics
print("__________________________________________________________")
print(f"Total number of images: {total_images}")
print(f"Number of correctly classified images: {correcly_classified_images}")
print(f"Accuracy: {correcly_classified_images / total_images}")
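
The update `output[0][idx] += output[0][idx] * 2` used in the loop above triples the logit of the BLIP-confirmed class. The following framework-free sketch shows what that does to the softmax probability of the top class; the helper names and the toy logits are illustrative assumptions, not values taken from the experiments.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def boost_top_logit(logits):
    """Apply the same update as in the loop: x += 2 * x on the arg-max logit."""
    boosted = list(logits)
    top = max(range(len(boosted)), key=boosted.__getitem__)
    boosted[top] += boosted[top] * 2
    return boosted


# Typical case: the top logit is positive, so tripling it raises its probability
logits = [2.0, 1.0, 0.5]
p_before = softmax(logits)[0]
p_after = softmax(boost_top_logit(logits))[0]
print(p_before < p_after)  # True

# Edge case: if the top logit is negative, tripling it lowers its probability
negative = [-1.0, -2.0, -3.0]
p_neg_before = softmax(negative)[0]
p_neg_after = softmax(boost_top_logit(negative))[0]
print(p_neg_after < p_neg_before)  # True
```

Because the boost is multiplicative, its strength depends on the magnitude and sign of the logit. An additive bonus would behave the same way for every logit value; this is a possible design alternative rather than something evaluated in this report.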


### **Best result**

Processing classes:   0% 0/202 [00:00<?, ?it/s] The number of images classified correctly for class ----> mask: 0  
Processing classes:   0% 1/202 [04:24<14:45:30, 264.33s/it] The number of images classified correctly for class ----> jack-o'-lantern: 4  
Processing classes:   1% 2/202 [07:01<11:10:36, 201.18s/it] The number of images classified correctly for class ----> rapeseed: 3  
Processing classes:   1% 3/202 [08:58<9:00:45, 163.04s/it] The number of images classified correctly for class ----> garbage truck: 1  
Processing classes:   2% 4/202 [09:56<6:40:57, 121.50s/it] The number of images classified correctly for class ----> barn: 0  
Processing classes:   2% 5/202 [12:39<7:27:28, 136.29s/it] The number of images classified correctly for class ----> bikini: 1  
Processing classes:   3% 6/202 [14:57<7:27:06, 136.87s/it] The number of images classified correctly for class ----> oystercatcher: 5  
Processing classes:   3% 7/202 [16:04<6:10:26, 113.98s/it] The number of images classified correctly for class ----> scorpion: 49  
Processing classes:   4% 8/202 [26:54<15:21:00, 284.85s/it] The number of images classified correctly for class ----> jellyfish: 11  
Processing classes:   4% 9/202 [36:45<20:23:32, 380.38s/it] The number of images classified correctly for class ----> lynx: 3  
Processing classes:   5% 10/202 [47:58<25:06:57, 470.92s/it] The number of images classified correctly for class ----> starfish: 7  
Processing classes:   5% 11/202 [51:26<20:42:56, 390.45s/it] The number of images classified correctly for class ----> cello: 1  
Processing classes:   6% 12/202 [52:10<15:02:02, 284.86s/it] The number of images classified correctly for class ----> tarantula: 4  
Processing classes:   6% 13/202 [56:54<14:56:54, 284.73s/it] The number of images classified correctly for class ----> unicycle: 2  
Processing classes:   7% 14/202 [58:54<12:16:04, 234.92s/it] The number of images classified correctly for class ----> syringe: 0  
Processing classes:   7% 15/202 [1:00:04<9:37:05, 185.16s/it] The number of images classified correctly for class ----> ambulance: 1  
Processing classes:   8% 16/202 [1:00:51<7:25:09, 143.60s/it] The number of images classified correctly for class ----> hot dog: 0  
Processing classes:   8% 17/202 [1:02:06<6:18:55, 122.89s/it] The number of images classified correctly for class ----> sleeping bag: 0  
Processing classes:   9% 18/202 [1:03:00<5:14:00, 102.39s/it] The number of images classified correctly for class ----> sea lion: 9  
Processing classes:   9% 19/202 [1:07:08<7:25:08, 145.95s/it] The number of images classified correctly for class ----> snowplow: 0  
Processing classes:  10% 20/202 [1:08:20<6:15:09, 123.68s/it] The number of images classified correctly for class ----> baseball player: 0  
Processing classes:  11% 22/202 [1:09:36<4:12:41, 84.23s/it] The number of images classified correctly for class ----> monarch butterfly: 4  
Processing classes:  11% 23/202 [1:19:25<10:24:39, 209.38s/it] The number of images classified correctly for class ----> harvestman: 13  
Processing classes:  12% 24/202 [1:26:41<13:17:03, 268.67s/it] The number of images classified correctly for class ----> organ: 0  
Processing classes:  12% 25/202 [1:27:56<10:37:01, 215.94s/it] The number of images classified correctly for class ----> doormat: 0  
Processing classes:  13% 26/202 [1:29:38<8:59:57, 184.07s/it] The number of images classified correctly for class ----> golf cart: 6  
Processing classes:  13% 27/202 [1:31:35<8:00:53, 164.88s/it] The number of images classified correctly for class ----> koala: 2  
Processing classes:  14% 28/202 [1:34:32<8:08:28, 168.44s/it] The number of images classified correctly for class ----> billiard table: 0  
Processing classes:  14% 29/202 [1:37:02<7:50:14, 163.09s/it] The number of images classified correctly for class ----> academic gown: 1  
Processing classes:  15% 30/202 [1:38:14<6:30:36, 136.26s/it] The number of images classified correctly for class ----> grasshopper: 11  
Processing classes:  15% 31/202 [1:49:35<14:08:23, 297.68s/it] The number of images classified correctly for class ----> pug: 5  
Processing classes:  16% 32/202 [1:52:17<12:08:47, 257.22s/it] The number of images classified correctly for class ----> suspension bridge: 1  
Processing classes:  16% 33/202 [1:55:09<10:53:35, 232.05s/it] The number of images classified correctly for class ----> red fox: 7  
Processing classes:  17% 34/202 [2:03:01<14:10:11, 303.64s/it] The number of images classified correctly for class ----> soap dispenser: 0  
Processing classes:  17% 35/202 [2:03:42<10:25:58, 224.90s/it] The number of images classified correctly for class ----> parking meter: 1  
Processing classes:  18% 36/202 [2:05:07<8:26:31, 183.08s/it] The number of images classified correctly for class ----> stingray: 7  
Processing classes:  18% 37/202 [2:14:07<13:17:24, 289.97s/it] The number of images classified correctly for class ----> Golden Retriever: 1  
Processing classes:  19% 38/202 [2:15:57<10:45:16, 236.08s/it] The number of images classified correctly for class ----> candle: 9  
Processing classes:  19% 39/202 [2:22:13<12:35:41, 278.17s/it] The number of images classified correctly for class ----> torch: 6  
Processing classes:  20% 40/202 [2:24:55<10:56:29, 243.15s/it] The number of images classified correctly for class ----> mantis: 30  
Processing classes:  20% 41/202 [2:36:03<16:34:22, 370.57s/it] The number of images classified correctly for class ----> small white: 8  
Processing classes:  21% 42/202 [2:43:00<17:05:48, 384.68s/it] The number of images classified correctly for class ----> cradle: 0  
Processing classes:  21% 43/202 [2:43:49<12:32:14, 283.87s/it] The number of images classified correctly for class ----> balloon: 0  
Processing classes:  22% 44/202 [2:47:13<11:24:26, 259.91s/it] The number of images classified correctly for class ----> acoustic guitar: 3  
Processing classes:  22% 45/202 [2:49:30<9:44:02, 223.20s/it] The number of images classified correctly for class ----> cowboy boot: 1  
Processing classes:  23% 46/202 [2:52:23<9:00:29, 207.88s/it] The number of images classified correctly for class ----> dumbbell: 4  
Processing classes:  23% 47/202 [2:54:26<7:51:27, 182.50s/it] The number of images classified correctly for class ----> dragonfly: 22  
Processing classes:  24% 48/202 [3:03:35<12:31:04, 292.62s/it] The number of images classified correctly for class ----> rocking chair: 2  
Processing classes:  24% 49/202 [3:05:43<10:19:39, 243.00s/it] The number of images classified correctly for class ----> Chihuahua: 4  
Processing classes:  25% 50/202 [3:07:22<8:26:27, 199.92s/it] The number of images classified correctly for class ----> couch: 1  
Processing classes:  25% 51/202 [3:11:46<9:11:44, 219.23s/it] The number of images classified correctly for class ----> centipede: 16  
Processing classes:  26% 52/202 [3:18:40<11:33:44, 277.50s/it] The number of images classified correctly for class ----> American robin: 24  
Processing classes:  26% 53/202 [3:29:58<16:27:18, 397.57s/it] The number of images classified correctly for class ----> armadillo: 4  
Processing classes:  27% 54/202 [3:32:47<13:31:33, 329.01s/it] The number of images classified correctly for class ----> parachute: 0  
Processing classes:  27% 55/202 [3:34:15<10:29:28, 256.93s/it] The number of images classified correctly for class ----> banana: 4  
Processing classes:  28% 56/202 [3:36:31<8:56:25, 220.45s/it] The number of images classified correctly for class ----> American black bear: 2  
Processing classes:  28% 57/202 [3:39:53<8:39:22, 214.91s/it] The number of images classified correctly for class ----> gossamer-winged butterfly: 11  
Processing classes:  29% 58/202 [3:49:13<12:44:34, 318.58s/it] The number of images classified correctly for class ----> teddy bear: 18  
Processing classes:  29% 59/202 [3:55:42<13:29:44, 339.75s/it] The number of images classified correctly for class ----> quill: 4  
Processing classes:  30% 60/202 [3:56:48<10:09:21, 257.48s/it] The number of images classified correctly for class ----> lighthouse: 2  
Processing classes:  30% 61/202 [3:59:46<9:09:14, 233.72s/it] The number of images classified correctly for class ----> ocarina: 2  
Processing classes:  31% 62/202 [4:00:35<6:56:05, 178.33s/it] The number of images classified correctly for class ----> sundial: 2  
Processing classes:  31% 63/202 [4:02:39<6:15:13, 161.96s/it] The number of images classified correctly for class ----> jeep: 5  
Processing classes:  32% 64/202 [4:06:04<6:42:07, 174.84s/it] The number of images classified correctly for class ----> chest: 1  
Processing classes:  32% 65/202 [4:07:35<5:41:42, 149.65s/it] The number of images classified correctly for class ----> ant: 17  
Processing classes:  33% 66/202 [4:15:07<9:05:12, 240.54s/it] The number of images classified correctly for class ----> snail: 17  
Processing classes:  33% 67/202 [4:23:49<12:11:17, 325.02s/it] The number of images classified correctly for class ----> great egret: 16  
Processing classes:  34% 68/202 [4:34:14<15:26:34, 414.89s/it] The number of images classified correctly for class ----> flagpole: 0  
Processing classes:  34% 69/202 [4:37:02<12:35:48, 340.97s/it] The number of images classified correctly for class ----> leafhopper: 14  
Processing classes:  35% 70/202 [4:41:32<11:43:06, 319.60s/it] The number of images classified correctly for class ----> skunk: 17  
Processing classes:  35% 71/202 [4:48:59<13:00:57, 357.69s/it] The number of images classified correctly for class ----> marmot: 0  
Processing classes:  36% 72/202 [4:51:40<10:47:21, 298.78s/it] The number of images classified correctly for class ----> fly: 18  
Processing classes:  36% 73/202 [5:02:39<14:34:38, 406.81s/it] The number of images classified correctly for class ----> balance beam: 6  
Processing classes:  37% 74/202 [5:06:08<12:21:04, 347.38s/it] The number of images classified correctly for class ----> broom: 0  
Processing classes:  37% 75/202 [5:09:48<10:54:39, 309.29s/it] The number of images classified correctly for class ----> goose: 12  
Processing classes:  38% 76/202 [5:21:04<14:40:30, 419.29s/it] The number of images classified correctly for class ----> banjo: 0  
Processing classes:  38% 77/202 [5:22:09<10:52:10, 313.04s/it] The number of images classified correctly for class ----> cliff: 0  
Processing classes:  39% 78/202 [5:23:30<8:22:43, 243.26s/it] The number of images classified correctly for class ----> corn: 2  
Processing classes:  39% 79/202 [5:25:37<7:07:28, 208.53s/it] The number of images classified correctly for class ----> flamingo: 8  
Processing classes:  40% 80/202 [5:33:25<9:42:09, 286.31s/it] The number of images classified correctly for class ----> pomegranate: 1  
Processing classes:  40% 81/202 [5:38:38<9:53:51, 294.47s/it] The number of images classified correctly for class ----> bow tie: 2  
Processing classes:  41% 82/202 [5:41:16<8:26:37, 253.31s/it] The number of images classified correctly for class ----> viaduct: 4  
Processing classes:  41% 83/202 [5:43:24<7:07:53, 215.74s/it] The number of images classified correctly for class ----> bow: 6  
Processing classes:  42% 84/202 [5:46:01<6:29:54, 198.26s/it] The number of images classified correctly for class ----> beaker: 0  
Processing classes:  42% 85/202 [5:46:43<4:55:20, 151.46s/it] The number of images classified correctly for class ----> ladybug: 38  
Processing classes:  43% 86/202 [5:57:35<9:42:44, 301.42s/it] The number of images classified correctly for class ----> volleyball: 3  
Processing classes:  43% 87/202 [6:00:31<8:25:56, 263.97s/it] The number of images classified correctly for class ----> basketball: 10  
Processing classes:  44% 88/202 [6:05:24<8:37:41, 272.47s/it] The number of images classified correctly for class ----> limousine: 0  
Processing classes:  44% 89/202 [6:06:15<6:28:02, 206.04s/it] The number of images classified correctly for class ----> salt shaker: 1  
Processing classes:  45% 90/202 [6:08:54<5:58:37, 192.12s/it] The number of images classified correctly for class ----> newt: 2  
Processing classes:  45% 91/202 [6:15:35<7:51:15, 254.74s/it] The number of images classified correctly for class ----> water tower: 1  
Processing classes:  46% 92/202 [6:20:00<7:52:16, 257.60s/it] The number of images classified correctly for class ----> American bullfrog: 7  
Processing classes:  46% 93/202 [6:31:26<11:41:28, 386.13s/it] The number of images classified correctly for class ----> mushroom: 5  
Processing classes:  47% 94/202 [6:42:12<13:55:48, 464.34s/it] The number of images classified correctly for class ----> American alligator: 8  
Processing classes:  47% 95/202 [6:50:29<14:05:18, 474.00s/it] The number of images classified correctly for class ----> shipwreck: 1  
Processing classes:  48% 96/202 [6:51:51<10:29:32, 356.35s/it] The number of images classified correctly for class ----> umbrella: 2  
Processing classes:  48% 97/202 [6:55:14<9:03:00, 310.29s/it] The number of images classified correctly for class ----> bubble: 0  
Processing classes:  49% 98/202 [6:59:42<8:36:19, 297.88s/it] The number of images classified correctly for class ----> Rottweiler: 3  
Processing classes:  49% 99/202 [7:03:33<7:56:26, 277.54s/it] The number of images classified correctly for class ----> pufferfish: 3  
Processing classes:  50% 100/202 [7:08:35<8:04:28, 284.98s/it] The number of images classified correctly for class ----> breastplate: 0  
Processing classes:  50% 101/202 [7:09:25<6:01:00, 214.46s/it] The number of images classified correctly for class ----> sewing machine: 4  
Processing classes:  50% 102/202 [7:11:36<5:15:47, 189.48s/it] The number of images classified correctly for class ----> maraca: 1  
Processing classes:  51% 103/202 [7:13:09<4:24:48, 160.49s/it] The number of images classified correctly for class ----> wine bottle: 3  
Processing classes:  51% 104/202 [7:17:24<5:08:29, 188.87s/it] The number of images classified correctly for class ----> spider web: 2  
Processing classes:  52% 105/202 [7:21:51<5:43:05, 212.22s/it] The number of images classified correctly for class ----> fox squirrel: 27  
Processing classes:  52% 106/202 [7:33:38<9:37:04, 360.67s/it] The number of images classified correctly for class ----> rugby ball: 2  
Processing classes:  53% 107/202 [7:35:14<7:25:21, 281.28s/it] The number of images classified correctly for class ----> sulphur-crested cockatoo: 10  
Processing classes:  53% 108/202 [7:43:42<9:07:29, 349.46s/it] The number of images classified correctly for class ----> flatworm: 0  
Processing classes:  54% 109/202 [7:55:28<11:47:17, 456.31s/it] The number of images classified correctly for class ----> wheelbarrow: 4  
Processing classes:  54% 110/202 [7:59:58<10:14:12, 400.57s/it] The number of images classified correctly for class ----> box turtle: 5  
Processing classes:  55% 111/202 [8:07:10<10:21:26, 409.75s/it] The number of images classified correctly for class ----> washing machine: 0  
Processing classes:  55% 112/202 [8:08:50<7:55:30, 317.01s/it] The number of images classified correctly for class ----> drumstick: 1  
Processing classes:  56% 113/202 [8:10:15<6:06:40, 247.19s/it] The number of images classified correctly for class ----> sandal: 1  
Processing classes:  56% 114/202 [8:14:43<6:11:49, 253.51s/it] The number of images classified correctly for class ----> weevil: 0  
Processing classes:  57% 115/202 [8:17:23<4:23:54, 223.45s/it] The number of images classified correctly for class ----> lion: 1  
Processing classes:  57% 116/202 [8:19:50<4:57:21, 207.46s/it] The number of images classified correctly for class ----> revolver: 0  
Processing classes:  58% 117/202 [8:21:03<4:06:29, 174.00s/it] The number of images classified correctly for class ----> hair dryer: 1  
Processing classes:  58% 118/202 [8:22:08<3:23:55, 145.66s/it] The number of images classified correctly for class ----> bison: 4  
Processing classes:  59% 119/202 [8:24:11<3:12:57, 139.49s/it] The number of images classified correctly for class ----> broccoli: 3  
Processing classes:  59% 120/202 [8:25:25<2:45:33, 121.14s/it] The number of images classified correctly for class ----> submarine: 1  
Processing classes:  60% 121/202 [8:27:28<2:44:16, 121.69s/it] The number of images classified correctly for class ----> jay: 26  
Processing classes:  60% 122/202 [8:34:19<4:33:58, 205.48s/it] The number of images classified correctly for class ----> Christmas stocking: 0  
Processing classes:  61% 123/202 [8:36:54<4:10:57, 190.61s/it] The number of images classified correctly for class ----> clothes iron: 0  
Processing classes:  61% 124/202 [8:37:17<3:03:40, 141.29s/it] The number of images classified correctly for class ----> mitten: 0  
Processing classes:  62% 125/202 [8:40:16<3:15:41, 152.49s/it] The number of images classified correctly for class ----> tricycle: 1  
Processing classes:  62% 126/202 [8:40:41<2:24:58, 114.45s/it] The number of images classified correctly for class ----> goldfinch: 4  
Processing classes:  63% 127/202 [8:51:51<5:50:18, 280.24s/it] The number of images classified correctly for class ----> chameleon: 18  
Processing classes:  63% 128/202 [9:02:01<7:46:57, 378.61s/it] The number of images classified correctly for class ----> cucumber: 1  
Processing classes:  64% 129/202 [9:05:17<6:34:18, 324.09s/it] The number of images classified correctly for class ----> vulture: 17  
Processing classes:  64% 130/202 [9:13:33<7:30:27, 375.38s/it] The number of images classified correctly for class ----> go-kart: 2  
Processing classes:  65% 131/202 [9:14:36<5:33:47, 282.08s/it] The number of images classified correctly for class ----> porcupine: 8  
Processing classes:  65% 132/202 [9:21:12<6:08:46, 316.09s/it] The number of images classified correctly for class ----> baboon: 1  
Processing classes:  66% 133/202 [9:21:58<4:30:16, 235.02s/it] The number of images classified correctly for class ----> acorn: 0  
Processing classes:  66% 134/202 [9:24:25<3:56:37, 208.78s/it] The number of images classified correctly for class ----> sea anemone: 2  
Processing classes:  67% 135/202 [9:35:45<6:30:44, 349.92s/it] The number of images classified correctly for class ----> toaster: 3  
Processing classes:  67% 136/202 [9:37:29<5:03:53, 276.26s/it] The number of images classified correctly for class ----> schooner: 0  
Processing classes:  68% 137/202 [9:39:12<4:02:54, 224.22s/it] The number of images classified correctly for class ----> cockroach: 24  
Processing classes:  68% 138/202 [9:46:26<5:06:25, 287.27s/it] The number of images classified correctly for class ----> spatula: 1  
Processing classes:  69% 139/202 [9:48:18<4:06:24, 234.67s/it] The number of images classified correctly for class ----> mongoose: 9  
Processing classes:  69% 140/202 [9:54:34<4:46:22, 277.14s/it] The number of images classified correctly for class ----> bell pepper: 0  
Processing classes:  70% 141/202 [9:59:01<4:38:40, 274.10s/it] The number of images classified correctly for class ----> junco: 27  
Processing classes:  70% 142/202 [10:10:21<6:35:37, 395.63s/it] The number of images classified correctly for class ----> canoe: 1  
Processing classes:  71% 143/202 [10:11:28<4:52:22, 297.33s/it] The number of images classified correctly for class ----> snowmobile: 0  
Processing classes:  71% 144/202 [10:12:55<3:46:17, 234.09s/it] The number of images classified correctly for class ----> nail: 0  
Processing classes:  72% 145/202 [10:13:44<2:49:29, 178.42s/it] The number of images classified correctly for class ----> pretzel: 0  
Processing classes:  72% 146/202 [10:14:49<2:14:55, 144.57s/it] The number of images classified correctly for class ----> guacamole: 0  
Processing classes:  73% 147/202 [10:16:54<2:07:11, 138.76s/it] The number of images classified correctly for class ----> lorikeet: 17  
Processing classes:  73% 148/202 [10:28:28<4:34:39, 305.19s/it] The number of images classified correctly for class ----> pelican: 9  
Processing classes:  74% 149/202 [10:34:46<4:48:58, 327.14s/it] The number of images classified correctly for class ----> saxophone: 0  
Processing classes:  74% 150/202 [10:36:23<3:43:37, 258.04s/it] The number of images classified correctly for class ----> carbonara: 0  
Processing classes:  75% 151/202 [10:37:51<2:56:02, 207.10s/it] The number of images classified correctly for class ----> German Shepherd Dog: 0  
Processing classes:  75% 152/202 [10:39:09<2:20:15, 168.31s/it] The number of images classified correctly for class ----> grand piano: 1  
Processing classes:  76% 154/202 [10:40:49<1:30:52, 113.59s/it] The number of images classified correctly for class ----> stick insect: 51  
Processing classes:  77% 155/202 [10:47:27<2:24:18, 184.22s/it] The number of images classified correctly for class ----> Persian cat: 17  
Processing classes:  77% 156/202 [10:54:57<3:14:32, 253.76s/it] The number of images classified correctly for class ----> agama: 13  
Processing classes:  78% 157/202 [11:03:00<3:57:05, 316.12s/it] The number of images classified correctly for class ----> reel: 0  
Processing classes:  78% 158/202 [11:04:26<3:04:29, 251.58s/it] The number of images classified correctly for class ----> piggy bank: 2  
Processing classes:  79% 159/202 [11:05:44<2:24:49, 202.08s/it] The number of images classified correctly for class ----> envelope: 1  
Processing classes:  79% 160/202 [11:09:18<2:23:56, 205.64s/it] The number of images classified correctly for class ----> bee: 3  
Processing classes:  80% 161/202 [11:20:05<3:48:41, 334.67s/it] The number of images classified correctly for class ----> rhinoceros beetle: 1  
Processing classes:  80% 162/202 [11:22:32<3:06:15, 279.39s/it] The number of images classified correctly for class ----> stethoscope: 1  
Processing classes:  81% 163/202 [11:24:11<2:26:55, 226.05s/it] The number of images classified correctly for class ----> digital clock: 2  
Processing classes:  81% 164/202 [11:24:59<1:49:34, 173.02s/it] The number of images classified correctly for class ----> duck: 15  
Processing classes:  82% 165/202 [11:32:41<2:39:51, 259.22s/it] The number of images classified correctly for class ----> kimono: 2  
Processing classes:  82% 166/202 [11:35:16<2:16:50, 228.06s/it] The number of images classified correctly for class ----> rotary dial telephone: 0  
Processing classes:  83% 167/202 [11:36:35<1:46:56, 183.33s/it] The number of images classified correctly for class ----> green iguana: 20  
Processing classes:  83% 168/202 [11:47:41<3:05:50, 327.94s/it] The number of images classified correctly for class ----> school bus: 4  
Processing classes:  84% 169/202 [11:50:52<2:37:50, 286.99s/it] The number of images classified correctly for class ----> hermit crab: 4  
Processing classes:  84% 170/202 [11:53:32<2:12:43, 248.85s/it] The number of images classified correctly for class ----> fountain: 3  
Processing classes:  85% 171/202 [11:56:08<1:54:16, 221.18s/it] The number of images classified correctly for class ----> white-headed capuchin: 0  
Processing classes:  85% 172/202 [11:57:12<1:26:59, 173.98s/it] The number of images classified correctly for class ----> shovel: 0  
Processing classes:  86% 173/202 [11:58:58<1:14:12, 153.54s/it] The number of images classified correctly for class ----> castle: 0  
Processing classes:  86% 174/202 [12:00:30<1:03:04, 135.16s/it] The number of images classified correctly for class ----> cottontail rabbit: 9  
Processing classes:  87% 175/202 [12:07:21<1:38:05, 217.96s/it] The number of images classified correctly for class ----> steam locomotive: 0  
Processing classes:  87% 176/202 [12:11:12<1:36:03, 221.69s/it] The number of images classified correctly for class ----> chain: 1  
Processing classes:  88% 177/202 [12:17:35<1:52:31, 270.05s/it] The number of images classified correctly for class ----> African bush elephant: 0  
Processing classes:  88% 178/202 [12:20:46<1:38:31, 246.31s/it] The number of images classified correctly for class ----> garter snake: 4  
Processing classes:  89% 179/202 [12:29:47<2:08:17, 334.68s/it] The number of images classified correctly for class ----> marimba: 4  
Processing classes:  89% 180/202 [12:31:27<1:37:00, 264.55s/it] The number of images classified correctly for class ----> cheeseburger: 1  
Processing classes:  90% 181/202 [12:32:54<1:13:53, 211.10s/it] The number of images classified correctly for class ----> yellow lady's slipper: 0  
Processing classes:  90% 182/202 [12:33:51<55:01, 165.06s/it] The number of images classified correctly for class ----> lemon: 0  
Processing classes:  91% 183/202 [12:36:21<50:47, 160.40s/it] The number of images classified correctly for class ----> lighter: 7  
Processing classes:  91% 184/202 [12:42:17<1:05:43, 219.07s/it] The number of images classified correctly for class ----> toucan: 5  
Processing classes:  92% 185/202 [12:47:57<1:12:21, 255.38s/it] The number of images classified correctly for class ----> airliner: 1  
Processing classes:  92% 186/202 [12:50:47<1:01:14, 229.68s/it] The number of images classified correctly for class ----> hockey puck: 0  
Processing classes:  93% 187/202 [12:52:10<46:24, 185.64s/it] The number of images classified correctly for class ----> bald eagle: 11  
Macinando classi:  93% 188/202 [13:01:35<1:09:53, 299.52s/it] The number of images classified correctly for class ----> volcano: 2  
Macinando classi:  94% 189/202 [13:03:37<53:20, 246.21s/it] The number of images classified correctly for class ----> forklift: 0  
Macinando classi:  94% 190/202 [13:04:57<39:16, 196.35s/it] The number of images classified correctly for class ----> custard apple: 1  
Macinando classi:  95% 191/202 [13:05:36<27:22, 149.35s/it] The number of images classified correctly for class ----> manhole cover: 1  
Macinando classi:  95% 192/202 [13:09:14<28:18, 169.80s/it] The number of images classified correctly for class ----> goblet: 6  
Macinando classi:  96% 193/202 [13:11:58<25:13, 168.19s/it] The number of images classified correctly for class ----> hummingbird: 38  
Macinando classi:  96% 194/202 [13:22:38<41:17, 309.67s/it] The number of images classified correctly for class ----> apron: 0  
Macinando classi:  97% 195/202 [13:24:48<29:50, 255.73s/it] The number of images classified correctly for class ----> racket: 1  
Macinando classi:  97% 196/202 [13:27:08<22:05, 220.96s/it] The number of images classified correctly for class ----> accordion: 1  
Macinando classi:  98% 197/202 [13:27:41<13:43, 164.66s/it] The number of images classified correctly for class ----> crayfish: 3  
Macinando classi:  98% 198/202 [13:38:11<20:16, 304.22s/it] The number of images classified correctly for class ----> obelisk: 4  
Macinando classi:  99% 199/202 [13:40:27<12:41, 253.72s/it] The number of images classified correctly for class ----> tank: 3  
Macinando classi:  99% 200/202 [13:43:12<07:34, 227.02s/it] The number of images classified correctly for class ----> feather boa: 2  
Macinando classi: 100% 201/202 [13:44:02<02:53, 173.99s/it] The number of images classified correctly for class ----> mosque: 1  
Macinando classi: 100% 202/202 [13:45:31<00:00, 245.21s/it]    
__________________________________________________________
Total number of images: 7450  
Number of correctly classified images: 1056  
Accuracy: 0.141744966442953  
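
The accuracy reported above is the number of correctly classified images divided by the total number of images. The following is a minimal sketch that reproduces that arithmetic from the two printed counts; the variable names are illustrative assumptions and are not taken from the notebook's own code.

```python
# Counts copied from the summary printed above.
total_images = 7450
correct_images = 1056

# Top-1 accuracy: fraction of images whose predicted class matches the label.
accuracy = correct_images / total_images

# Print both the raw fraction and a rounded percentage for readability.
print(f"Accuracy: {accuracy:.4f} ({accuracy:.2%})")
```

Rounded, this corresponds to roughly 14.2% of the 7450 ImageNet-A images being classified correctly in this run.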