# Classifying Textual Data with pretrained Vision Models through Transfer Learning and Data Transformations

To implement the paper, I used the code from the authors' GitHub repository:

https://github.com/EddCBen/Classifying-Textual-Data-with-pretrained-Vision-Models-through-Transfer-Learning-and-Data-Transforms

# BERT Embeddings for Reviews

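First, we generate representations for the IMDB dataset from the upper layers of the pre-trained BERT-base model. Six layers are used in the end: `hidden_states[6:]` in the extraction loop returns seven, and the last one is dropped when the embeddings are loaded for image generation: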

In [None]:
# import libraries
import torch
import numpy as np
import pandas as pd
import torch.nn as nn
from torch.nn import functional
from transformers import BertTokenizer, BertModel
from pathlib import Path

In [None]:
# Loading Data
imdb_path = Path("/content/drive/MyDrive/IMDB")
df = pd.read_csv(imdb_path / 'IMDB Dataset.csv')
sentences = df['review']

In [None]:
# BERT tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bertModel = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True)
bertModel.eval()

BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0-11): 12 x BertLayer(
        (attention): BertAttention(
          (self): BertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
  

In [None]:
# Encode each review and collect the [CLS] vector from the upper BERT layers
device = torch.device("cuda")
bertModel = bertModel.to(device).eval()  # move the model to the GPU once, not once per review
data_list = []
for counter, sent in enumerate(sentences):
    print("Embedding number : {}".format(counter))
    encoded_sent = tokenizer.encode_plus(
        sent,
        add_special_tokens=True,
        max_length=512,
        padding='longest',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt',
        return_length=True,
    )
    with torch.no_grad():
        output = bertModel(encoded_sent['input_ids'].to(device))
    # hidden_states holds 13 tensors (embedding output + 12 encoder layers),
    # so [6:] keeps 7 of them: the outputs of encoder layers 6 through 12
    hidden_states = output.hidden_states[6:]

    # [CLS] vector (token 0) of each selected layer -> tensor of shape 7 x 768
    cls_layers = torch.stack([h.squeeze(0)[0].cpu() for h in hidden_states])
    data_list.append(cls_layers)

In [None]:
# Save Embeddings
torch.save(torch.stack(data_list),"/content/drive/MyDrive/IMDB/bert-embed/IMDB_cls_last6layers.pt")

# Convert Embeddings to Image

Now our embeddings are ready. In the next step, we generate images from the BERT representations of the IMDB dataset using pyDeepInsight, an implementation of the method from the following paper:

**DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture**

In [None]:
!python3 -m pip -q install git+https://github.com/alok-ai-lab/pyDeepInsight.git
!pip install umap-learn

In [None]:
# import libraries
from pyDeepInsight import ImageTransformer
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
from sklearn.manifold import TSNE
from matplotlib import pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns
import torch
import os
from sklearn.preprocessing import MinMaxScaler
from pathlib import Path

In [None]:
# Loading Data
data_path = Path('/content/drive/MyDrive/IMDB/bert-embed')
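data = torch.load(data_path / "IMDB_cls_last6layers.pt")
# The saved tensor holds 7 layers per review (hidden_states[6:]); drop the final one to keep 6
data = data[:, :-1, :]
# Flatten to one row per review: 6 layers x 768 dimensions = 4608 features
data = np.array(data).reshape(len(data), -1)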
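
To make the layer bookkeeping explicit, here is a minimal sketch in plain Python. It assumes the Hugging Face convention that `hidden_states` for a 12-layer BERT holds 13 entries (the embedding output followed by one entry per encoder layer). The list of names is purely illustrative and stands in for the real tensors.

```python
# Stand-in labels for the 13 hidden states of a 12-layer BERT
# (illustrative names, not real tensors).
num_encoder_layers = 12
hidden_state_names = ["embeddings"] + [
    f"layer_{i}" for i in range(1, num_encoder_layers + 1)
]

# Slice used when extracting the embeddings: hidden_states[6:]
saved_layers = hidden_state_names[6:]

# Slice used when loading them back: data[:, :-1, :]
kept_layers = saved_layers[:-1]

hidden_size = 768  # BERT-base hidden dimension
features_per_review = len(kept_layers) * hidden_size

print(saved_layers)         # 7 entries: layer_6 ... layer_12
print(kept_layers)          # 6 entries: layer_6 ... layer_11
print(features_per_review)  # 4608
```

Under this assumption, each review ends up with 4608 features taken from encoder layers 6 to 11, which matches the `six2elev` suffix of the image file saved later.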

In [None]:
# Use t-SNE and DeepInsight to generate 50 * 50 pixel images from BERT Embeddings
tsne = TSNE(
    n_components=2,
    random_state=1701,
    n_jobs=-1)

it = ImageTransformer(
    feature_extractor=tsne,
    pixels=50)

X_train_img = it.fit_transform(data)
dInsightImages = torch.from_numpy(X_train_img)
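
The core idea behind DeepInsight can be illustrated with a simplified sketch: every feature receives a 2-D coordinate (from t-SNE in this notebook), the coordinates are scaled onto the pixel grid, and each sample's feature values are written to those pixels, averaging features that collide. The function `features_to_image` and its arguments are hypothetical names chosen for illustration. This is not the pyDeepInsight implementation, which performs additional steps such as rotating the embedding before discretizing it.

```python
import numpy as np

def features_to_image(sample, coords, pixels):
    """Write one sample's feature values onto a pixels x pixels grid.

    sample: (n_features,) feature values of a single sample
    coords: (n_features, 2) 2-D location assigned to each feature
    """
    coords = np.asarray(coords, dtype=float)
    sample = np.asarray(sample, dtype=float)

    # Scale coordinates to integer pixel indices in [0, pixels - 1]
    mins = coords.min(axis=0)
    spans = coords.max(axis=0) - mins
    spans[spans == 0] = 1.0  # avoid division by zero on a degenerate axis
    grid = np.rint((coords - mins) / spans * (pixels - 1)).astype(int)

    # Accumulate values and counts per pixel, then average collisions
    total = np.zeros((pixels, pixels))
    count = np.zeros((pixels, pixels))
    np.add.at(total, (grid[:, 1], grid[:, 0]), sample)
    np.add.at(count, (grid[:, 1], grid[:, 0]), 1)
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)

# Three features; the last two share a location, so their values are averaged
coords = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
image = features_to_image([1.0, 2.0, 4.0], coords, pixels=2)
print(image)
```

In this toy case the pixel shared by the second and third feature receives their mean, and pixels without any feature stay at zero.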

In [None]:
del data

In [None]:
del X_train_img

In [None]:
dInsightImages = dInsightImages.cuda()

In [None]:
# Save obtained images
torch.save(dInsightImages, "/content/drive/MyDrive/IMDB/imdb-image/Ready_images-six2elev.pt")

# Training

Our images are ready for training. First, we modify parts of the pretrained models as described in the paper. Second, we fine-tune them on the IMDB-Image dataset:

In [2]:
# Import libraries
import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.init as init
import numpy as np
from torch.utils.data import TensorDataset, random_split, DataLoader
from torchvision import models
from torch import LongTensor
from torch.autograd import Variable
from torch.nn import functional
from torch.nn.functional import interpolate
import json

dtype = torch.cuda.FloatTensor
batch_size = 32

## AlexNet

For AlexNet, we use the first two pretrained convolutional layers, which output 192 feature maps for each input image:

In [3]:
# Determine feature extractor
def set_parameter_requires_grad(model, train_early=False):
    # features[0:5] = Conv2d, ReLU, MaxPool2d, Conv2d, ReLU
    feature_extractor_early = model.features[0:5]
    # Freeze the early layers unless train_early is set
    for param in feature_extractor_early.parameters():
        param.requires_grad = train_early

    return feature_extractor_early

def create_feature_extractor(CNNmodel):
    model = CNNmodel
    model_features = set_parameter_requires_grad(model, train_early=False)
    return model_features

pretrained_early = create_feature_extractor(models.alexnet(pretrained=True))

In [4]:
# AlexNet
class alexnet(nn.Module):
    def __init__(self):
        super().__init__()
        self.name = "alexnet"
        self.feature_extractor = pretrained_early  # pretrained early layers

        self.conv_auto_encoder = nn.Sequential(
            nn.Conv2d(in_channels=192, out_channels=192, kernel_size=2),
            nn.ReLU(),
            nn.BatchNorm2d(192),
            nn.Conv2d(in_channels=192, out_channels=192, kernel_size=2),
            nn.ReLU(),
            nn.BatchNorm2d(192),
            nn.Conv2d(in_channels=192, out_channels=192, kernel_size=2),
            nn.ReLU(),
            nn.BatchNorm2d(192),
            nn.Conv2d(in_channels=192, out_channels=64, kernel_size=2),
            nn.ReLU(),
            nn.BatchNorm2d(64)
        )
        self.Adaptiveavgpool = nn.AdaptiveAvgPool2d(5)
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.3, inplace=False),
            nn.Linear(1600, 700),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.3, inplace=False),
            nn.Linear(700, 50),
            nn.ReLU(inplace=True),
            nn.Linear(50, 2)
        )

        self.softmax = nn.Softmax(dim=1)

    def forward(self, input_embedding):
        preTrained_features = self.feature_extractor(input_embedding)
        found_features = self.conv_auto_encoder(preTrained_features)
        found_features = self.Adaptiveavgpool(found_features)

        # Flatten per sample; reading the batch dimension from the tensor
        # also handles a smaller final batch without a special case
        found_features = found_features.contiguous().view(found_features.shape[0], -1)

        logits = self.classifier(found_features)

        return logits
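
The spatial sizes in this model can be traced with the standard convolution output-size formula. The sketch below is plain arithmetic, not PyTorch code. It assumes torchvision's AlexNet hyperparameters for the pretrained layers (11x11 convolution with stride 4 and padding 2, 3x3 max-pooling with stride 2, 5x5 convolution with padding 2) and a 150-pixel input, i.e. the 50-pixel images after the 3x upsampling applied before training. `conv_out` is a hypothetical helper name.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

size = 50 * 3                                          # 50-pixel image upsampled by 3
size = conv_out(size, kernel=11, stride=4, padding=2)  # features[0]: Conv2d(3, 64)
size = conv_out(size, kernel=3, stride=2)              # features[2]: MaxPool2d
size = conv_out(size, kernel=5, padding=2)             # features[3]: Conv2d(64, 192)
pretrained_size = size

for _ in range(4):                                     # four new Conv2d layers, kernel_size=2
    size = conv_out(size, kernel=2)
head_size = size

pooled = 5                                             # nn.AdaptiveAvgPool2d(5)
flat_features = 64 * pooled * pooled                   # 64 output channels

print(pretrained_size, head_size, flat_features)       # 17 13 1600
```

The flattened size of 1600 is what the first `nn.Linear(1600, 700)` layer of the classifier expects.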

## ResNext

For ResNext, we will utilize the first convolutional layer as well as the first residual layer:

In [5]:
# Determine feature extractor layers
def resnext_frozen():
    resnext = models.resnext50_32x4d(pretrained=True)
    feature_extractor = nn.Sequential(resnext.conv1,
                                      resnext.bn1,
                                      resnext.relu,
                                      resnext.maxpool,
                                      resnext.layer1)

    for param in feature_extractor.parameters():
        param.requires_grad = False

    del resnext
    return feature_extractor

In [6]:
# ResNext
class resnext(nn.Module):
    def __init__(self):
        global batch_size
        super().__init__()
        self.name = "resnext"
        self.feature_extractor = resnext_frozen()
        self.conv_auto_encoder = nn.Sequential(
                        nn.Conv2d(in_channels=256, out_channels=192, kernel_size=4),
                        nn.ReLU(),
                        nn.BatchNorm2d(192),
                        nn.Dropout(p=0.05),
                        nn.Conv2d(in_channels=192, out_channels=128, kernel_size=4),
                        nn.ReLU(),
                        nn.BatchNorm2d(128),
                        nn.Dropout(p=0.05),
                        nn.Conv2d(in_channels=128, out_channels=64, kernel_size=4),
                        nn.ReLU(),
                        nn.BatchNorm2d(64),
                        nn.Dropout(p=0.05),
                        nn.Conv2d(in_channels=64, out_channels=32, kernel_size=4),
                        nn.ReLU(),
                        nn.BatchNorm2d(32),
                                        )
        self.adaptavgpool = nn.AdaptiveAvgPool2d(10)

        self.classifier = nn.Sequential(
                        nn.Linear(3200,1600),
                        nn.ReLU(inplace=True),
                        nn.Dropout(p=0.3, inplace = False),
                        nn.Linear(1600,700),
                        nn.ReLU(inplace=True),
                        nn.Dropout(p=0.3, inplace = False),
                        nn.Linear(700,50),
                        nn.ReLU(inplace=True),
                        nn.Linear(50,2)
                        )

    def forward(self, input_embedding):
        from_pretrained = self.feature_extractor(input_embedding)
        from_init = self.conv_auto_encoder(from_pretrained)
        pooled_features = self.adaptavgpool(from_init)

        pooled_features = pooled_features.contiguous().view(pooled_features.shape[0],-1)

        logits = self.classifier(pooled_features)

        return logits


## ResNet

For ResNet, we use the first downsampling convolutional layer and the first residual layer of torchvision's `wide_resnet50_2`:

In [7]:
# Define feature extractor layers
def resnet_layer1_frozen():
    resnet = models.wide_resnet50_2(pretrained=True)
    feature_extractor = nn.Sequential(resnet.conv1,
                                      resnet.bn1,
                                      resnet.relu,
                                      resnet.maxpool,
                                      resnet.layer1)

    for param in feature_extractor.parameters():
        param.requires_grad = False

    del resnet
    return feature_extractor

In [8]:
# ResNet
class resnet(nn.Module):
    def __init__(self):
        global batch_size
        super().__init__()
        self.name = "resnet"
        self.feature_extractor = resnet_layer1_frozen()
        self.conv_auto_encoder = nn.Sequential(
                        nn.Conv2d(in_channels=256, out_channels=192, kernel_size=4),
                        nn.ReLU(),
                        nn.BatchNorm2d(192),
                        nn.Dropout(p=0.2),
                        nn.Conv2d(in_channels=192, out_channels=128, kernel_size=4),
                        nn.ReLU(),
                        nn.BatchNorm2d(128),
                        nn.Dropout(p=0.2),
                        nn.Conv2d(in_channels=128, out_channels=64, kernel_size=4),
                        nn.ReLU(),
                        nn.BatchNorm2d(64),
                        nn.Dropout(p=0.2),
                        nn.Conv2d(in_channels=64, out_channels=32, kernel_size=4),
                        nn.ReLU(),
                        nn.BatchNorm2d(32),
                                        )
        self.adaptavgpool = nn.AdaptiveAvgPool2d(10)
        self.classifier = nn.Sequential(
                        nn.Linear(3200,1600),
                        nn.ReLU(inplace=True),
                        nn.Dropout(p=0.3, inplace = False),
                        nn.Linear(1600,700),
                        nn.ReLU(inplace=True),
                        nn.Dropout(p=0.3, inplace = False),
                        nn.Linear(700,50),
                        nn.ReLU(inplace=True),
                        nn.Linear(50,2)
                        )

    def forward(self, input_embedding):
        from_pretrained = self.feature_extractor(input_embedding)
        from_init = self.conv_auto_encoder(from_pretrained)
        pooled_features = self.adaptavgpool(from_init)
        pooled_features = pooled_features.contiguous().view(pooled_features.shape[0],-1)
        logits = self.classifier(pooled_features)

        return logits

## ShuffleNet

For ShuffleNet, we use the first convolutional layer (followed by batch normalization), the max-pooling layer, and stage 2 as described in the ShuffleNet paper:

In [9]:
# Determine feature extractor layers
def shuffleNet_frozen():
    shufflenet = models.shufflenet_v2_x1_0(pretrained=True)
    feature_extractor = nn.Sequential(shufflenet.conv1,
                                      shufflenet.maxpool,
                                      shufflenet.stage2)

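    # Note: despite the function name, gradients stay enabled here. The optimizer
    # created for training assigns this parameter group lr=0.0, so Adam does not
    # change these weights.
    for param in feature_extractor.parameters():
        param.requires_grad = True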

    del shufflenet
    return feature_extractor

In [10]:
# shuffleNet
class shufflenet(nn.Module):
    def __init__(self):
        global batch_size
        super().__init__()
        self.name = "shufflenet"
        self.feature_extractor = shuffleNet_frozen()
        self.conv_auto_encoder = nn.Sequential(
                        nn.Conv2d(in_channels=116, out_channels=192, kernel_size=4),
                        nn.ReLU(),
                        nn.BatchNorm2d(192),
                        nn.Dropout(p=0.05),
                        nn.Conv2d(in_channels=192, out_channels=128, kernel_size=4),
                        nn.ReLU(),
                        nn.BatchNorm2d(128),
                        nn.Dropout(p=0.05),
                        nn.Conv2d(in_channels=128, out_channels=64, kernel_size=4),
                        nn.ReLU(),
                        nn.BatchNorm2d(64),
                        nn.Dropout(p=0.05),
                        nn.Conv2d(in_channels=64, out_channels=32, kernel_size=4),
                        nn.ReLU(),
                        nn.BatchNorm2d(32),
                                        )
        self.adaptavgpool = nn.AdaptiveAvgPool2d(10)

        self.classifier = nn.Sequential(
                        nn.Linear(3200,1600),
                        nn.ReLU(inplace=True),
                        nn.Dropout(p=0.3, inplace = False),
                        nn.Linear(1600,700),
                        nn.ReLU(inplace=True),
                        nn.Dropout(p=0.3, inplace = False),
                        nn.Linear(700,50),
                        nn.ReLU(inplace=True),
                        nn.Linear(50,2)
                        )

    def forward(self, input_embedding):
        from_pretrained = self.feature_extractor(input_embedding)
        from_init = self.conv_auto_encoder(from_pretrained)
        pooled_features = self.adaptavgpool(from_init)

        pooled_features = pooled_features.contiguous().view(pooled_features.shape[0],-1)

        logits = self.classifier(pooled_features)

        return logits

## VGG16

In the case of VGG16, we use the first 12 modules of `features` as the feature extractor. In torchvision's VGG16 this slice contains five convolutional layers and outputs 256 feature maps:

In [11]:
# Determine feature extractor layers
def vgg16_frozen():
    vgg16 = models.vgg16(pretrained=True)
    feature_extractor = vgg16.features[:12]

    for param in feature_extractor.parameters():
        param.requires_grad = False

    del vgg16
    return feature_extractor

In [12]:
# VGG16
class vgg16(nn.Module):
    def __init__(self):
        global batch_size
        super().__init__()
        self.name = "vgg16"
        self.feature_extractor = vgg16_frozen()
        self.conv_auto_encoder = nn.Sequential(
                        nn.Conv2d(in_channels=256, out_channels=192, kernel_size=4),
                        nn.ReLU(),
                        nn.BatchNorm2d(192),
                        nn.Dropout(p=0.05),
                        nn.Conv2d(in_channels=192, out_channels=128, kernel_size=4),
                        nn.ReLU(),
                        nn.BatchNorm2d(128),
                        nn.Dropout(p=0.05),
                        nn.Conv2d(in_channels=128, out_channels=64, kernel_size=4),
                        nn.ReLU(),
                        nn.BatchNorm2d(64),
                        nn.Dropout(p=0.05),
                        nn.Conv2d(in_channels=64, out_channels=32, kernel_size=4),
                        nn.ReLU(),
                        nn.BatchNorm2d(32),
                                        )
        self.adaptavgpool = nn.AdaptiveAvgPool2d(10)

        self.classifier = nn.Sequential(
                        nn.Linear(3200,1600),
                        nn.ReLU(inplace=True),
                        nn.Dropout(p=0.3, inplace = False),
                        nn.Linear(1600,700),
                        nn.ReLU(inplace=True),
                        nn.Dropout(p=0.3, inplace = False),
                        nn.Linear(700,50),
                        nn.ReLU(inplace=True),
                        nn.Linear(50,2)
                        )

    def forward(self, input_embedding):
        from_pretrained = self.feature_extractor(input_embedding)
        from_init = self.conv_auto_encoder(from_pretrained)
        pooled_features = self.adaptavgpool(from_init)

        pooled_features = pooled_features.contiguous().view(pooled_features.shape[0],-1)

        logits = self.classifier(pooled_features)

        return logits
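
The four models with kernel-size-4 heads (ResNext, ResNet, ShuffleNet and VGG16) all flatten to 3200 features, but they reach the adaptive pooling layer at different spatial sizes. The arithmetic sketch below traces them, assuming a 150-pixel input and torchvision's standard layer hyperparameters for each stem. `conv_out` and `head_out` are hypothetical helper names.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def head_out(size, kernel=4, layers=4):
    """Size after the four unpadded kernel_size=4 convolutions of the new head."""
    for _ in range(layers):
        size = conv_out(size, kernel)
    return size

image = 150

# ResNet / ResNext: conv1 (7x7, stride 2, pad 3) -> maxpool (3x3, stride 2, pad 1);
# layer1 keeps the spatial size
resnet_stem = conv_out(conv_out(image, 7, 2, 3), 3, 2, 1)

# ShuffleNetV2: conv1 (3x3, s2, p1) -> maxpool (3x3, s2, p1) -> stage2 (first block has stride 2)
shufflenet_stem = conv_out(conv_out(conv_out(image, 3, 2, 1), 3, 2, 1), 3, 2, 1)

# VGG16 features[:12]: padded 3x3 convolutions keep the size, two 2x2 max-pools halve it
vgg_stem = conv_out(conv_out(image, 2, 2), 2, 2)

stems = {"resnet/resnext": resnet_stem, "shufflenet": shufflenet_stem, "vgg16": vgg_stem}
heads = {name: head_out(size) for name, size in stems.items()}

flat_features = 32 * 10 * 10   # 32 channels after nn.AdaptiveAvgPool2d(10)

print(stems)   # {'resnet/resnext': 38, 'shufflenet': 19, 'vgg16': 37}
print(heads)   # {'resnet/resnext': 26, 'shufflenet': 7, 'vgg16': 25}
```

Under these assumptions the ShuffleNet head produces 7x7 maps, smaller than the pooling target of 10x10, so the adaptive pooling layer enlarges rather than reduces them in that model.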

## Time to train

With the CNN models ready, it's time to train them on the IMDB-Image dataset:

In [13]:
# Import libraries
import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.init as init
import numpy as np
from torch.nn import functional
from torch.nn.functional import interpolate
from pathlib import Path
from torch.utils.data import TensorDataset, DataLoader, Dataset, random_split
import sys

np.random.seed(42)
torch.manual_seed(42)
device = torch.device("cuda:0")

In [14]:
# Load IMDB dataset to get labels
import pandas as pd
df = pd.read_csv("/content/drive/MyDrive/IMDB/IMDB Dataset.csv")
df.head()

|   | review | sentiment |
|---|--------|-----------|
| 0 | One of the other reviewers has mentioned that ... | positive |
| 1 | A wonderful little production. <br /><br />The... | positive |
| 2 | I thought this was a wonderful way to spend ti... | positive |
| 3 | Basically there's a family where a little boy ... | negative |
| 4 | Petter Mattei's "Love in the Time of Money" is... | positive |


In [15]:
# Encoding labels
from sklearn.preprocessing import LabelBinarizer
enc = LabelBinarizer()
y = enc.fit_transform(df['sentiment'])

In [16]:
# Make labels ready for training
from keras.utils import to_categorical

categorical_labels = to_categorical(y, num_classes=2)
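
The two encoding cells can be mirrored with a small NumPy stand-in, which shows what the labels look like at each step. It assumes that `LabelBinarizer` orders the classes alphabetically (so `negative` maps to 0 and `positive` to 1) and that `to_categorical` produces one-hot rows. The sample values are made up.

```python
import numpy as np

sentiments = np.array(["positive", "negative", "positive"])

# LabelBinarizer-style step: classes sorted alphabetically -> negative = 0, positive = 1
classes = np.unique(sentiments)
y = np.searchsorted(classes, sentiments)

# to_categorical-style step: one row per label, a single 1 in the class column
one_hot = np.eye(len(classes))[y]

# The training loop later converts back to class indices with torch.max(label, 1)[1],
# which corresponds to an argmax over the columns
recovered = one_hot.argmax(axis=1)

print(y)         # [1 0 1]
print(one_hot)
print(recovered)
```

Because the loss used later is `nn.CrossEntropyLoss`, which takes class indices, the one-hot step is undone before the loss is computed.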

In [17]:
# Loading Images
data = torch.load("/content/drive/MyDrive/IMDB/imdb-image/Ready_images-six2elev.pt",map_location='cuda:0')
labels = torch.tensor(categorical_labels)

In [18]:
# Create dataset for training
def create_dataset(input_embedding, input_labels):
    dataset = TensorDataset(input_embedding.type(dtype).cuda(),
                            input_labels.type(dtype).cuda())
    # 80/20 split; taking val_size as the remainder guarantees the sizes sum to len(dataset)
    train_size = int(0.8 * len(dataset))
    val_size = len(dataset) - train_size

    train_dataset, val_dataset = random_split(dataset, [train_size, val_size])
    train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(dataset=val_dataset, batch_size=batch_size, shuffle=True)
    return train_loader, val_loader

train_loader, val_loader = create_dataset(data, labels)

del data, labels
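
The split and batch sizes determine how many full batches each loader yields and how large the last batch is. A quick arithmetic check, assuming the 50,000 reviews of the IMDB dataset and the batch size of 32 set earlier:

```python
dataset_size = 50_000
batch_size = 32

train_size = int(0.8 * dataset_size)
val_size = dataset_size - train_size

# Number of full batches and the size of the leftover batch, if any
train_batches, train_remainder = divmod(train_size, batch_size)
val_batches, val_remainder = divmod(val_size, batch_size)

print(train_size, train_batches, train_remainder)   # 40000 1250 0
print(val_size, val_batches, val_remainder)         # 10000 312 16
```

The training set divides evenly into batches, while the validation loader ends with a batch of 16 samples, so any reshape that depends on the batch size has to account for that shorter batch.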

In [19]:
# Helper function to init layer weights
def initialize_parameters(m) -> None:
    if isinstance(m, nn.Linear):
        m.weight.data = init.xavier_uniform_(m.weight.data,
                                             gain=nn.init.calculate_gain('relu'))
    if isinstance(m, nn.Conv2d):
        m.weight.data = init.xavier_normal_(m.weight.data)
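
For reference, the scale of the weights produced by these two initializers follows from the Xavier (Glorot) formulas. The sketch below evaluates them in plain Python for one linear and one convolutional layer of the AlexNet variant. The helper names are hypothetical, and the ReLU gain of `sqrt(2)` is the value that `nn.init.calculate_gain('relu')` is documented to return.

```python
import math

def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    """Weights are drawn from U(-bound, bound)."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))

def xavier_normal_std(fan_in, fan_out, gain=1.0):
    """Weights are drawn from N(0, std^2)."""
    return gain * math.sqrt(2.0 / (fan_in + fan_out))

relu_gain = math.sqrt(2.0)

# nn.Linear(1600, 700): fan_in = 1600, fan_out = 700
linear_bound = xavier_uniform_bound(1600, 700, gain=relu_gain)

# nn.Conv2d(192, 192, kernel_size=2): each fan is channels * kernel area
conv_fan = 192 * 2 * 2
conv_std = xavier_normal_std(conv_fan, conv_fan)

print(round(linear_bound, 3))   # 0.072
print(round(conv_std, 3))       # 0.036
```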

In [20]:
# Learning rates for different models
learning_rates = {'alexnet': {"CAE LR": 0.00001, "LC LR":0.0005},
                    'resnet': {"CAE LR": 0.00005, "LC LR":0.0001},
                    'resnext': {"CAE LR": 0.00005, "LC LR":0.001 },
                    'shufflenet': {"CAE LR": 0.0005, "LC LR":0.001},
                    'vgg16': {"CAE LR": 0.00005, "LC LR":0.001}
                }
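
These rates are passed to Adam as per-parameter-group options in the training cells, with a default of `lr=0.0` for any group that does not set its own rate. The sketch below illustrates that mechanism on two tiny stand-in layers (`frozen_part` and `new_part` are hypothetical names, not the notebook's models) and runs on the CPU.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
frozen_part = nn.Linear(4, 4)   # stands in for the pretrained feature extractor
new_part = nn.Linear(4, 2)      # stands in for the newly added layers

optimizer = optim.Adam(
    [
        {"params": frozen_part.parameters()},            # no 'lr' key: uses the default below
        {"params": new_part.parameters(), "lr": 1e-3},   # group-specific learning rate
    ],
    lr=0.0,
)

before_frozen = frozen_part.weight.detach().clone()
before_new = new_part.weight.detach().clone()

# One optimization step on a dummy loss
loss = new_part(frozen_part(torch.randn(8, 4))).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

group_lrs = [group["lr"] for group in optimizer.param_groups]
frozen_unchanged = torch.equal(frozen_part.weight, before_frozen)
new_changed = not torch.equal(new_part.weight, before_new)

print(group_lrs, frozen_unchanged, new_changed)
```

With a learning rate of zero the first group receives gradients but its weights do not move, so the feature extractor behaves as frozen even when `requires_grad` is left enabled.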

In [21]:
# Model dictionary
model_dict = {'alexnet': alexnet(),
              'resnet': resnet(),
              'resnext': resnext(),
              'shufflenet': shufflenet(),
              'vgg16':vgg16()}

In [22]:
# Method to create the desired model
def create_model(model_name):
    model = model_dict[model_name]
    model.conv_auto_encoder.apply(initialize_parameters)
    model.classifier.apply(initialize_parameters)
    model = model.cuda()
    return model

In [23]:
# the whole process of training and evaluation of a model
def train_and_evaluate():
    # Training settings and results
    train_total = 0
    train_correct = 0
    val_total = 0
    val_correct = 0
    train_accuracies = []
    val_accuracies = []
    train_losses = []
    val_losses = []
    model_status_dict = []
    Epochs = 15
    iteration  = 0
    step = 0
    val_step = 0

    for epoch in range(Epochs):
        for i, (input_batch, label) in enumerate(train_loader):
            model.train()
            input_batch = scale_image_batch(input_batch)
            input_batch = z_normalize(input_batch)
            label = label.contiguous().view(-1, 2)  # -1 infers the batch size from the tensor
            label = torch.max(label.long().to(device),1)[1]
            output = model(input_batch)
            _, predicted = torch.max(output.data, 1)
            train_total += label.size(0)
            train_correct += (predicted == label).sum().item()
            train_accuracy = train_correct/train_total
            train_accuracies.append(train_accuracy)
            loss = criterion(output, label)
            train_losses.append(loss.item())
            step += 1
            opt_model(loss)
            iteration += 1

            if iteration %50 == 0:
                model.eval()
                # val_total and val_correct are never reset, so the accuracy below
                # is cumulative over every evaluation pass so far
                with torch.no_grad():  # gradients are not needed for evaluation
                    for j,(val_input_batch, val_label) in enumerate(val_loader):
                        val_input_batch = scale_image_batch(val_input_batch)
                        val_input_batch = z_normalize(val_input_batch)
                        # -1 infers the batch size, so the smaller final batch needs no special case
                        val_label = val_label.contiguous().view(-1, 2)
                        val_label = torch.max(val_label.long().to(device),1)[1]
                        val_output = model(val_input_batch)
                        _, val_predicted = torch.max(val_output.data, 1)
                        val_total += val_label.size(0)
                        val_correct += (val_predicted == val_label).sum().item()
                        val_accuracy = val_correct/val_total
                        val_accuracies.append(val_accuracy)
                        val_loss = criterion(val_output, val_label)
                        val_step += 1
                        val_losses.append(val_loss.item())

                try:
                    print(f"""    epoch: {epoch + 1}
                    \t     Train Loss : {np.mean(train_losses)}
                    \t     Validation Loss : {np.mean(val_losses)}
                    \t     Training Accuracy : {train_accuracy}
                    \t     Validation Accuracy : {val_accuracy}
                    """)

                    m_dict = {
                        'epoch': epoch + 1,
                        'Train Loss' : np.mean(train_losses),
                        'Validation Loss' : np.mean(val_losses),
                        'Training Accuracy' : train_accuracy,
                        'Validation Accuracy' : val_accuracy
                            }

                    # Save model status
                    model_status_dict.append(m_dict)

                except Exception as e:
                    print(e)
                    continue

    #Saving Validation Losses, Accuracies and states
    save_path = "/content/drive/MyDrive/IMDB/results"
    torch.save(val_losses, save_path + "/" + str(model.name)+"_val_losses.pt")
    torch.save(val_accuracies , save_path + "/" + str(model.name)+"_val_accs.pt")
    with open(save_path + "/" + str(model.name)+"_status.txt" , 'w') as file:
      json.dump(model_status_dict, file)
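
One property of this loop is worth spelling out: `val_correct` and `val_total` are initialised once, before the epoch loop, and are never reset, so the logged validation accuracy is a running average over all evaluation passes so far. The plain-Python sketch below contrasts that with a per-pass accuracy. The numbers are made up for illustration.

```python
# (correct, total) for three successive validation passes (made-up numbers)
passes = [(600, 1000), (700, 1000), (800, 1000)]

val_correct = 0
val_total = 0
cumulative = []   # what never-reset counters report
per_pass = []     # accuracy of each pass on its own
for correct, total in passes:
    val_correct += correct
    val_total += total
    cumulative.append(val_correct / val_total)
    per_pass.append(correct / total)

print(cumulative)   # [0.6, 0.65, 0.7]
print(per_pass)     # [0.6, 0.7, 0.8]
```

While a model is still improving, the cumulative figure lags behind the accuracy of the most recent pass. The same applies to the logged losses, which are means over all values collected so far.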


In [24]:
# optimizing model
def opt_model(loss)-> None:
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In [25]:
#Scaling and Z-normalizing images
upsample = nn.Upsample(scale_factor = 3, mode = "nearest")

def scale_image_batch(image_batch) -> torch.Tensor:
    a = torch.movedim(image_batch, -1,1)
    scaled_batch = upsample(a)
    return scaled_batch.cuda()



def z_normalize(input_tensor) -> torch.Tensor:
    mean = input_tensor.mean()
    std = input_tensor.std()
    up = torch.sub(input_tensor, mean)
    down = torch.add(std**2, 0.0001**2)
    return torch.div(up,torch.sqrt(down))
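
The effect of these two helpers on tensor shapes and statistics can be checked on a small random batch. The sketch below runs on the CPU and assumes the saved images are stored channels-last as `(N, 50, 50, 3)`, which is what the `movedim(-1, 1)` call suggests. The batch itself is random stand-in data.

```python
import torch
import torch.nn as nn

upsample = nn.Upsample(scale_factor=3, mode="nearest")

# Random stand-in batch in the assumed channels-last layout: (N, H, W, C)
batch = torch.rand(4, 50, 50, 3)

channels_first = torch.movedim(batch, -1, 1)   # move channels to dimension 1
scaled = upsample(channels_first)              # each pixel becomes a 3x3 block

# Same arithmetic as z_normalize: a single mean and std over the whole batch
eps = 0.0001
normalized = (scaled - scaled.mean()) / torch.sqrt(scaled.std() ** 2 + eps ** 2)

print(channels_first.shape)   # torch.Size([4, 3, 50, 50])
print(scaled.shape)           # torch.Size([4, 3, 150, 150])
print(normalized.mean().item(), normalized.std().item())
```

Note that the statistics are computed per batch rather than per image or per channel, so the normalization of one image depends on the other images in its batch.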

### AlexNet

The paper reports a validation accuracy of 0.87 for AlexNet; here we reached 0.79:

In [None]:
model = create_model('alexnet')

optimizer = optim.Adam([{'params': model.feature_extractor.parameters()},
    {'params': model.conv_auto_encoder.parameters(), 'lr' : learning_rates[str(model.name)]['CAE LR']},
    {'params': model.classifier.parameters(), 'lr' : learning_rates[str(model.name)]['LC LR']}],
    lr=0.0)

criterion = nn.CrossEntropyLoss()

train_and_evaluate()

    epoch: 1
                    	     Train Loss : 0.8706789493560791
                    	     Validation Loss : 0.656785101745837
                    	     Training Accuracy : 0.5325
                    	     Validation Accuracy : 0.626
                    
    epoch: 1
                    	     Train Loss : 0.7716198515892029
                    	     Validation Loss : 0.6501881605900895
                    	     Training Accuracy : 0.5640625
                    	     Validation Accuracy : 0.6285
                    
    epoch: 1
                    	     Train Loss : 0.7309416969617207
                    	     Validation Loss : 0.6379981338025663
                    	     Training Accuracy : 0.5802083333333333
                    	     Validation Accuracy : 0.6381
                    
    epoch: 1
                    	     Train Loss : 0.7049411226809025
                    	     Validation Loss : 0.6228290775332588
                    	     Training Accuracy : 0.5990625
        

### ResNet

The paper reports a best validation accuracy of 0.85 for ResNet; here we reached 0.80:

In [None]:
model = create_model('resnet')

optimizer = optim.Adam([{'params': model.feature_extractor.parameters()},
    {'params': model.conv_auto_encoder.parameters(), 'lr' : learning_rates[str(model.name)]['CAE LR']},
    {'params': model.classifier.parameters(), 'lr' : learning_rates[str(model.name)]['LC LR']}],
    lr=0.0)

criterion = nn.CrossEntropyLoss()

train_and_evaluate()

    epoch: 1
                    	     Train Loss : 0.8575129628181457
                    	     Validation Loss : 0.8629978181074222
                    	     Training Accuracy : 0.536875
                    	     Validation Accuracy : 0.4967
                    
    epoch: 1
                    	     Train Loss : 0.7885432541370392
                    	     Validation Loss : 0.7803108251323334
                    	     Training Accuracy : 0.546875
                    	     Validation Accuracy : 0.51185
                    
    epoch: 1
                    	     Train Loss : 0.757962236404419
                    	     Validation Loss : 0.7430868788014332
                    	     Training Accuracy : 0.5660416666666667
                    	     Validation Accuracy : 0.5273333333333333
                    
    epoch: 1
                    	     Train Loss : 0.7340668308734893
                    	     Validation Loss : 0.7886733762658061
                    	     Training Accuracy : 0.5

### ResNext

The paper reports a validation accuracy of 0.85 for ResNext; here we reached 0.81:

In [26]:
model = create_model('resnext')

optimizer = optim.Adam([{'params': model.feature_extractor.parameters()},
    {'params': model.conv_auto_encoder.parameters(), 'lr' : learning_rates[str(model.name)]['CAE LR']},
    {'params': model.classifier.parameters(), 'lr' : learning_rates[str(model.name)]['LC LR']}],
    lr=0.0)

criterion = nn.CrossEntropyLoss()

train_and_evaluate()

    epoch: 1
                    	     Train Loss : 0.9217997193336487
                    	     Validation Loss : 0.6492835372781601
                    	     Training Accuracy : 0.5475
                    	     Validation Accuracy : 0.6291
                    
    epoch: 1
                    	     Train Loss : 0.7712770956754684
                    	     Validation Loss : 0.5970987487620059
                    	     Training Accuracy : 0.60125
                    	     Validation Accuracy : 0.67655
                    
    epoch: 1
                    	     Train Loss : 0.6982699722051621
                    	     Validation Loss : 0.5603028067702048
                    	     Training Accuracy : 0.6414583333333334
                    	     Validation Accuracy : 0.7057666666666667
                    
    epoch: 1
                    	     Train Loss : 0.6501203415542841
                    	     Validation Loss : 0.5459037795900917
                    	     Training Accuracy : 0.670

### ShuffleNet

The paper reports an accuracy of 0.86 for ShuffleNet; here we reached 0.81:

In [None]:
model = create_model('shufflenet')

optimizer = optim.Adam([{'params': model.feature_extractor.parameters()},
    {'params': model.conv_auto_encoder.parameters(), 'lr' : learning_rates[str(model.name)]['CAE LR']},
    {'params': model.classifier.parameters(), 'lr' : learning_rates[str(model.name)]['LC LR']}],
    lr=0.0)

criterion = nn.CrossEntropyLoss()

train_and_evaluate()

    epoch: 1
                    	     Train Loss : 0.8882088911533356
                    	     Validation Loss : 0.6747247121585443
                    	     Training Accuracy : 0.5425
                    	     Validation Accuracy : 0.5384
                    
    epoch: 1
                    	     Train Loss : 0.7437540575861931
                    	     Validation Loss : 0.6080252870012777
                    	     Training Accuracy : 0.6115625
                    	     Validation Accuracy : 0.631
                    
    epoch: 1
                    	     Train Loss : 0.6849273810784022
                    	     Validation Loss : 0.57348513301696
                    	     Training Accuracy : 0.6435416666666667
                    	     Validation Accuracy : 0.6717
                    
    epoch: 1
                    	     Train Loss : 0.6462773324549198
                    	     Validation Loss : 0.5476663155963246
                    	     Training Accuracy : 0.66875
           

### VGG16

The paper reports a validation accuracy of 0.86 for VGG16; here we reached 0.81:

In [None]:
model = create_model('vgg16')

optimizer = optim.Adam([{'params': model.feature_extractor.parameters()},
    {'params': model.conv_auto_encoder.parameters(), 'lr' : learning_rates[str(model.name)]['CAE LR']},
    {'params': model.classifier.parameters(), 'lr' : learning_rates[str(model.name)]['LC LR']}],
    lr=0.0)

criterion = nn.CrossEntropyLoss()

train_and_evaluate()

    epoch: 1
                    	     Train Loss : 0.8304985892772675
                    	     Validation Loss : 0.6213187615330608
                    	     Training Accuracy : 0.556875
                    	     Validation Accuracy : 0.6699
                    
    epoch: 1
                    	     Train Loss : 0.7149149578809738
                    	     Validation Loss : 0.5874904777866583
                    	     Training Accuracy : 0.625
                    	     Validation Accuracy : 0.6916
                    
    epoch: 1
                    	     Train Loss : 0.6679607278108597
                    	     Validation Loss : 0.5716650282866912
                    	     Training Accuracy : 0.649375
                    	     Validation Accuracy : 0.7015
                    
    epoch: 1
                    	     Train Loss : 0.633828731328249
                    	     Validation Loss : 0.5781076107971584
                    	     Training Accuracy : 0.67140625
                  