# Corpus Nummorum (CN) Data Challenge Top 5 Accuracies

---

**Goethe University Frankfurt am Main**

Summer Semester 2023

<br>

## *Multimodal Fusion Model for historical coin classification*

---

**Authors:** Bastian Rothenburger, Garegin Ktoian 
<br>

**Contact:** Bastian Rothenburger ([s7072002@rz.uni-frankfurt.de](mailto:s7072002@rz.uni-frankfurt.de))<br>

---

<br>

## Table of Contents
- [Setup](#setup)
- [1 Types](#1-types)
  - [1.1 ResNet18 Base](#11-resnet18-base)
    - [1.1.1 ImageEncoder Resnet18](#111-imageencoder-resnet18)
    - [1.1.2 TextEncoder hidden size 10000](#112-textencoder-hidden-size-10000)
    - [1.1.3 Fusion Model ResNet18 hidden size 10000](#113-fusion-model-resnet18-hidden-size-10000)
  - [1.2 ResNet101 Base](#12-resnet101-base)
    - [1.2.1 ImageEncoder ResNet101](#121-imageencoder-resnet101)
    - [1.2.2 TextEncoder hidden size 12000](#122-textencoder-hidden-size-12000)
    - [1.2.3 Fusion Model ResNet101 hidden size 12000](#123-fusion-model-resnet101-hidden-size-12000)
- [2 Mints Multimodal](#2-mints-multimodal)
  - [2.1 ResNet18 Base](#21-resnet18-base)
    - [2.1.1 ImageEncoder ResNet18](#211-imageencoder-resnet18)
    - [2.1.2 TextEncoder hidden size](#212-textencoder-hidden-size)
    - [2.1.3 Fusion Model Resnet18 10000 hidden size](#213-fusion-model-resnet18-10000-hidden-size)
  - [2.2 ResNet101 Base](#22-resnet101-base)
    - [2.2.1 ImageEncoder ResNet 101](#221-imageencoder-resnet-101)
    - [2.2.2 TextEncoder hidden size 12000](#222-textencoder-hidden-size-12000)
    - [2.2.3 Fusion Model ResNet101 hidden size 12000](#223-fusion-model-resnet101-hidden-size-12000)
- [3 Mints Image Only](#3-mints-image-only)
  - [3.1 Resnet 18](#31-resnet-18)
  - [3.2 ResNet50](#32-resnet50)
  - [3.3 ResNet101](#33-resnet101)
  - [3.4 ResNet 50 + Augments](#34-resnet-50-+-augments)
  - [3.5 ResNet101 + Augments](#35-resnet101-+-augemnts)

<br>

## Setup

---
General remarks:

Some modules have to be adjusted before the notebook runs correctly:

- For all Types models, change the definition of TextEmbeddings in CoinDataset.
- For all Fusion models with a base other than ResNet18, change the base in the constructor of the Fusion model class in the Architectures module.
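
One reason the base cannot simply be swapped is that the torchvision ResNet variants expose different feature widths in front of the final `fc` layer: 512 for ResNet18, and 2048 for ResNet50 and ResNet101. The sketch below is a hypothetical helper, not part of the Architectures module. It assumes that a fusion head concatenates image features with text features, which this notebook does not confirm.

```python
# Width of the feature vector fed into `fc` in torchvision's ResNet models.
RESNET_FEATURE_DIMS = {
    "resnet18": 512,
    "resnet50": 2048,
    "resnet101": 2048,
}


def fused_input_size(backbone: str, text_feature_dim: int) -> int:
    """Input width of a hypothetical fusion head that concatenates features."""
    if backbone not in RESNET_FEATURE_DIMS:
        raise ValueError(f"unknown backbone: {backbone!r}")
    return RESNET_FEATURE_DIMS[backbone] + text_feature_dim


# Assumption: the text branch contributes its hidden-layer activations.
print(fused_input_size("resnet18", 10000))
print(fused_input_size("resnet101", 12000))
```

Under that assumption, any layer sized for a ResNet18 base would receive an input of a different width once the base is changed, which is why the constructor has to be edited by hand.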

In [1]:

from ModelTrainer import ModelTrainer
import torchvision
from torchvision.models.resnet import resnet50, ResNet50_Weights, resnet18, ResNet18_Weights, resnet101, ResNet101_Weights 
from torchvision.models import swin_v2_b ,Swin_V2_B_Weights
import torchvision.models as models
import torch
import torchvision.transforms as transforms
import torch.nn as nn
import pandas as pd
import os
import matplotlib.pyplot as plt
import numpy as np
from Architectures import TextClassefier, Transformer_TextClassifier, Fusion, Fusion_From_Scratch
# Check GPU support on your machine.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

print(device)
%load_ext autoreload
%autoreload 2

cuda:0


# 1 Types

In [7]:

dir = r"C:\Users\gar43\OneDrive\Documents\DataChallenge\multimodal - Types"
model_save_path = dir +"\\models"

# specifies image data 
train, val = dir+"\\train.csv", dir+"\\val.csv"

# specifies language data
train_emb, val_emb   = dir+'\\embeddings_train_alternative.npy', dir+'\\embeddings_val_alternative.npy'


# specifies output feature size for imageencoder, textencoder and fusionmodel 
output_features = len(pd.read_csv(train, delimiter=',', skiprows=0, low_memory=False, encoding='iso-8859-1')["class"].unique())


"""
set to true if you want to apply augments
should increase performance slightly (roughly 2-3%)
but also increases training time
"""
use_augments = False

if use_augments:
    train_augmentations = transforms.Compose([
    transforms.ToPILImage(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.RandomRotation(15),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),  # convert images to tensors
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # normalize images
    transforms.Resize((299, 299))
    ])
else:
    train_augmentations = transforms.Compose([
    transforms.ToTensor(),  # convert images to tensors
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # normalize images
    transforms.Resize((299, 299))
    ])

(95, 10, 1536)
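
`transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])` computes `(x - mean) / std` per channel. For 8-bit input images, `ToTensor` produces values in `[0, 1]`, so the normalized values end up in `[-1, 1]`. A minimal plain-Python illustration of that arithmetic (the helper name is only for this example):

```python
def normalize(value: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Per-channel normalization as computed by transforms.Normalize."""
    return (value - mean) / std


# Darkest, intermediate, and brightest pixel values after ToTensor.
for pixel in (0.0, 0.25, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```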


## 1.1 ResNet18 Base

---

## 1.1.1 ImageEncoder Resnet18

---

In [3]:
model = models.resnet18(weights=ResNet18_Weights.DEFAULT)

model.fc = nn.Linear(model.fc.in_features, output_features)

# hyperparameters can be set here. Check ModelTrainer.py for details. If not specified differently default values of the Modeltrainer have been used 
Solver = ModelTrainer(model=model,  
            train_path=train,
            train_embedding_path=train_emb, 
            val_path=val,
            val_embedding_path=val_emb, 
            train_augmentations=train_augmentations,
            save_path=model_save_path,
            postfix='ImageEncoder_ResNet18',
            batch_size=16
            )

cuda
2445
torch.Size([16, 3, 299, 299]) torch.Size([16, 95]) torch.Size([16, 1536])
data loaded


In [4]:
Solver.load(model_save_path+"\\modelImageEncoder_ResNet18.tar")
Solver.evaluate()

Validation Top-1 Accuracy: 0.8103
Validation Top-5 Accuracy: 0.9768


(tensor(0.8103, device='cuda:0', dtype=torch.float64), 0.9767741935483871)
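
Top-1 accuracy counts a sample as correct only if the highest-scoring class matches the label, while Top-5 accuracy counts it as correct if the label is among the five highest-scoring classes. The following is a minimal plain-Python sketch of that metric. It is not the code behind `Solver.evaluate()`, whose implementation is not shown in this notebook, and the function name and toy scores are illustrative.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy example with three samples and three classes.
scores = [
    [0.1, 0.7, 0.2],  # highest score: class 1
    [0.5, 0.3, 0.2],  # highest score: class 0
    [0.2, 0.3, 0.5],  # highest score: class 2
]
labels = [1, 1, 0]

print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```

Top-k accuracy can only stay equal or grow as `k` increases, which is why the Top-5 values in this notebook are always at least as high as the Top-1 values.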

## 1.1.2 TextEncoder hidden size 10000

---

In [3]:

model = TextClassefier(input_size=1536, hidden_size=10000, output_size=output_features)

# hyperparameters can be set here. Check ModelTrainer.py for details. If not specified differently default values of the Modeltrainer have been used 
Solver = ModelTrainer(model=model,  
            train_path=train, 
           train_embedding_path=train_emb, 
            val_path=val,
            val_embedding_path=val_emb, 
            train_augmentations=train_augmentations,
            save_path=model_save_path,
            postfix='TextEncoder',
            batch_size=4
            )

cuda
2445
torch.Size([4, 3, 299, 299]) torch.Size([4, 95]) torch.Size([4, 1536])
data loaded


In [4]:
Solver.load(model_save_path+"\\modelTextEncoder.tar")
accs = []
top_5_accs = []

for i in range(10):
    acc, top_5_acc = Solver.evaluate(multimodal=True, show=False)
    accs.append(acc.cpu().numpy())
    top_5_accs.append(top_5_acc)
print('\n This is the average:')    
print(f'Validation Top-1 Accuracy: {np.array(accs).sum()/10:.4f}')
print(f'Validation Top-5 Accuracy: {np.array(top_5_accs).sum()/10:.4f}')


 This is the average:
Validation Top-1 Accuracy: 0.8190
Validation Top-5 Accuracy: 0.9947


## 1.1.3 Fusion Model ResNet18 hidden size 10000

---

In [4]:
TextModel = TextClassefier(input_size=1536, hidden_size=10000, output_size=output_features)
image_encoder = dir + "\\models\\modelImageEncoder_ResNet18.tar"
text_encoder = dir + "\\models\\modelTextEncoder.tar"
model = Fusion(image_encoder, text_encoder, TextModel, output_size=output_features)

# hyperparameters can be set here. Check ModelTrainer.py for details. If not specified differently default values of the Modeltrainer have been used 
Solver = ModelTrainer(model=model,  
            train_path=train, 
           train_embedding_path=train_emb, 
            val_path=val,
            val_embedding_path=val_emb, 
            train_augmentations=train_augmentations,
            save_path=model_save_path,
            postfix='TR0_Res18_types_with_des_78_multimodal',
            batch_size=4,
            )

cuda
2445
torch.Size([4, 3, 299, 299]) torch.Size([4, 95]) torch.Size([4, 1536])
data loaded
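
The checkpoint paths in this notebook are built by string concatenation with escaped backslashes, which is easy to get wrong. A hedged alternative is `pathlib`. The sketch uses `PureWindowsPath` so that it behaves the same on any operating system, and the base directory is a placeholder rather than the authors' path.

```python
from pathlib import PureWindowsPath

# Placeholder base directory; substitute your own data folder.
base_dir = PureWindowsPath(r"C:\data\multimodal - Types")
model_dir = base_dir / "models"

# The "/" operator inserts the separator, so no manual escaping is needed.
image_encoder = model_dir / "modelImageEncoder_ResNet18.tar"
text_encoder = model_dir / "modelTextEncoder.tar"

print(image_encoder)
print(text_encoder)
```

On the machine that actually holds the files, `pathlib.Path` would take the place of `PureWindowsPath`.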


In [5]:
Solver.load(model_save_path+"\\modelTR0_Res18_types_with_des_78_multimodal.tar")
accs = []
top_5_accs = []

for i in range(10):
    acc, top_5_acc = Solver.evaluate(multimodal=True, show=False)
    accs.append(acc.cpu().numpy())
    top_5_accs.append(top_5_acc)
print('\n This is the average:')    
print(f'Validation Top-1 Accuracy: {np.array(accs).sum()/10:.4f}')
print(f'Validation Top-5 Accuracy: {np.array(top_5_accs).sum()/10:.4f}')


 This is the average:
Validation Top-1 Accuracy: 0.8441
Validation Top-5 Accuracy: 0.9955


# 1.2 ResNet101 Base

---

In [3]:
model_save_path = r"F:\Users\basti\Documents\Goethe Uni\Data Challange\top5_accs\typesmodel"

## 1.2.1 ImageEncoder ResNet101

In [9]:
model = models.resnet101(weights=ResNet101_Weights.DEFAULT)

model.fc = nn.Linear(model.fc.in_features, output_features)

# hyperparameters can be set here. Check ModelTrainer.py for details. If not specified differently default values of the Modeltrainer have been used 
Solver = ModelTrainer(model=model,  
            train_path=train,
            train_embedding_path=train_emb, 
            val_path=val,
            val_embedding_path=val_emb, 
            train_augmentations=train_augmentations,
            save_path=model_save_path,
            postfix='ImageEncoder_ResNet101',
            batch_size=16
            )

cuda
2445
torch.Size([16, 3, 299, 299]) torch.Size([16, 95]) torch.Size([16, 1536])
data loaded


In [10]:
Solver.load(model_save_path+"\\modelImageEncoder_ResNet101.tar")
Solver.evaluate()

Validation Top-1 Accuracy: 0.8245
Validation Top-5 Accuracy: 0.9819


(tensor(0.8245, device='cuda:0', dtype=torch.float64), 0.9819354838709677)

## 1.2.2 TextEncoder hidden size 12000

---

In [4]:

model = TextClassefier(input_size=1536, hidden_size=12000, output_size=output_features)

# hyperparameters can be set here. Check ModelTrainer.py for details. If not specified differently default values of the Modeltrainer have been used 
Solver = ModelTrainer(model=model,  
            train_path=train, 
           train_embedding_path=train_emb, 
            val_path=val,
            val_embedding_path=val_emb, 
            train_augmentations=train_augmentations,
            save_path=model_save_path,
            postfix='TextEncoder',
            batch_size=4
            )

cuda
2445
torch.Size([4, 3, 299, 299]) torch.Size([4, 95]) torch.Size([4, 1536])
data loaded


In [5]:
# Because of how our dataset is constructed, evaluation is non-deterministic, so we average over ten evaluation runs.
Solver.load(model_save_path+"\\modelTextEncoder.tar")
accs = []
top_5_accs = []

for i in range(10):
    acc, top_5_acc = Solver.evaluate(multimodal=True, show=False)
    accs.append(acc.cpu().numpy())
    top_5_accs.append(top_5_acc)
print('\n This is the average:')    
print(f'Validation Top-1 Accuracy: {np.array(accs).sum()/10:.4f}')
print(f'Validation Top-5 Accuracy: {np.array(top_5_accs).sum()/10:.4f}')
    


 This is the average:
Validation Top-1 Accuracy: 0.8173
Validation Top-5 Accuracy: 0.9952
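
Because each evaluation pass can give a slightly different result, reporting the spread alongside the mean makes the averaged numbers easier to interpret. A minimal sketch using only the standard library follows. The per-run accuracies are made-up placeholders, not results from this notebook, and the helper name is illustrative.

```python
import statistics


def summarize_runs(values):
    """Return (mean, sample standard deviation) of per-run accuracies."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, std


# Placeholder per-run Top-1 accuracies (illustrative only).
runs = [0.81, 0.82, 0.83, 0.82, 0.82]
mean, std = summarize_runs(runs)
print(f"Top-1: {mean:.4f} +/- {std:.4f}")
```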


## 1.2.3 Fusion Model ResNet101 hidden size 12000

---

In [3]:
TextModel = TextClassefier(input_size=1536, hidden_size=12000, output_size=output_features)
image_encoder = model_save_path+"\\modelImageEncoder_ResNet101.tar"
text_encoder = model_save_path+"\\modelTextEncoder.tar"
model = Fusion(image_encoder, text_encoder, TextModel, output_size=output_features)

# hyperparameters can be set here. Check ModelTrainer.py for details. If not specified differently default values of the Modeltrainer have been used 
Solver = ModelTrainer(model=model,  
            train_path=train, 
           train_embedding_path=train_emb, 
            val_path=val,
            val_embedding_path=val_emb, 
            train_augmentations=train_augmentations,
            save_path=model_save_path,
            postfix='TR0_Res101_types_with_des_78_multimodal',
            batch_size=4,
            )

cuda
2446
torch.Size([4, 3, 299, 299]) torch.Size([4, 95]) torch.Size([4, 1536])
data loaded


In [4]:
Solver.load(model_save_path+"\\modelTR0_Res101_types_with_des_78_multimodal.tar")
#Solver.evaluate(multimodal=True)


accs = []
top_5_accs = []

for i in range(10):
    acc, top_5_acc = Solver.evaluate(multimodal=True, show=False)
    accs.append(acc.cpu().numpy())
    top_5_accs.append(top_5_acc)
print('\n This is the average:')    
print(f'Validation Top-1 Accuracy: {np.array(accs).sum()/10:.4f}')
print(f'Validation Top-5 Accuracy: {np.array(top_5_accs).sum()/10:.4f}')



 This is the average:
Validation Top-1 Accuracy: 0.8181
Validation Top-5 Accuracy: 0.9926


# 2 Mints Multimodal

---

In [2]:
dir = r"C:\Users\gar43\OneDrive\Documents\DataChallenge\Multimodal - Mints"
model_save_path = dir +"\\models"

# specifies image data 
train, val = dir+"\\train.csv", dir+"\\val.csv"

# specifies language data
train_emb, val_emb   = dir+'\\embeddings_train.npy', dir+'\\embeddings_val.npy'

# specifies output feature size for imageencoder, textencoder and fusionmodel 
output_features = len(pd.read_csv(train, delimiter=',', skiprows=0, low_memory=False, encoding='iso-8859-1')["class"].unique())


"""
set to true if you want to apply augments
should increase performance slightly (roughly 2-3%)
but also increases training time
"""
use_augments = False

if use_augments:
    train_augmentations = transforms.Compose([
    transforms.ToPILImage(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.RandomRotation(15),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),  # convert images to tensors
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # normalize images
    transforms.Resize((299, 299))
    ])
else:
    train_augmentations = transforms.Compose([
    transforms.ToTensor(),  # convert images to tensors
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # normalize images
    transforms.Resize((299, 299))
    ])
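
`output_features` is the number of distinct labels in the `class` column of the training CSV, and it is passed as the output size of every model. The same count can be sketched without pandas. The in-memory CSV below is a stand-in with invented file and class names; only the `class` column name is taken from the notebook.

```python
import csv
import io


def count_classes(csv_file, column="class"):
    """Number of distinct values in `column` of an open CSV file."""
    reader = csv.DictReader(csv_file)
    return len({row[column] for row in reader})


# Stand-in for train.csv (hypothetical rows).
sample = io.StringIO(
    "image,class\n"
    "a.jpg,mint_a\n"
    "b.jpg,mint_b\n"
    "c.jpg,mint_a\n"
)
print(count_classes(sample))
```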

## 2.1 ResNet18 Base

---

## 2.1.1 ImageEncoder ResNet18

---

In [3]:
model = models.resnet18(weights=ResNet18_Weights.DEFAULT)

model.fc = nn.Linear(model.fc.in_features, output_features)

# hyperparameters can be set here. Check ModelTrainer.py for details. If not specified differently default values of the Modeltrainer have been used 
Solver = ModelTrainer(model=model,  
            train_path=train,
            train_embedding_path=train_emb, 
            val_path=val,
            val_embedding_path=val_emb, 
            train_augmentations=train_augmentations,
            save_path=model_save_path,
            postfix='ImageEncoder_ResNet18',
            batch_size=16
            )

cuda
21008
torch.Size([16, 3, 299, 299]) torch.Size([16, 83]) torch.Size([16, 1536])
data loaded


In [4]:
Solver.load(model_save_path+"\\modelImageEncoder_ResNet18.tar")
Solver.evaluate()

Validation Top-1 Accuracy: 0.8126
Validation Top-5 Accuracy: 0.9528


(tensor(0.8126, device='cuda:0', dtype=torch.float64), 0.9528383027522935)

## 2.1.2 TextEncoder hidden size

---

In [5]:
model = TextClassefier(input_size=1536, hidden_size=10000, output_size=output_features)

# hyperparameters can be set here. Check ModelTrainer.py for details. If not specified differently default values of the Modeltrainer have been used 
Solver = ModelTrainer(model=model,  
            train_path=train, 
           train_embedding_path=train_emb, 
            val_path=val,
            val_embedding_path=val_emb, 
            train_augmentations=train_augmentations,
            save_path=model_save_path,
            postfix='TextEncoder',
            batch_size=4
            )

cuda
21008
torch.Size([4, 3, 299, 299]) torch.Size([4, 83]) torch.Size([4, 1536])
data loaded


In [6]:
Solver.load(model_save_path+"\\modelTextEncoder.tar")
accs = []
top_5_accs = []

for i in range(10):
    acc, top_5_acc = Solver.evaluate(multimodal=True, show=False)
    accs.append(acc.cpu().numpy())
    top_5_accs.append(top_5_acc)
print('\n This is the average:')    
print(f'Validation Top-1 Accuracy: {np.array(accs).sum()/10:.4f}')
print(f'Validation Top-5 Accuracy: {np.array(top_5_accs).sum()/10:.4f}')


 This is the average:
Validation Top-1 Accuracy: 0.8817
Validation Top-5 Accuracy: 0.9712


## 2.1.3 Fusion Model Resnet18 10000 hidden size

---

In [8]:
TextModel = TextClassefier(input_size=1536, hidden_size=10000, output_size=output_features)
image_encoder = model_save_path+"\\modelImageEncoder_ResNet18.tar"
text_encoder = model_save_path+"\\modelTextEncoder.tar"
model = Fusion(image_encoder, text_encoder, TextModel, output_size=output_features)

# hyperparameters can be set here. Check ModelTrainer.py for details. If not specified differently default values of the Modeltrainer have been used 
Solver = ModelTrainer(model=model,  
            train_path=train, 
           train_embedding_path=train_emb, 
            val_path=val,
            val_embedding_path=val_emb, 
            train_augmentations=train_augmentations,
            save_path=model_save_path,
            postfix='Fusion',
            batch_size=4,
            )

cuda
21008
torch.Size([4, 3, 299, 299]) torch.Size([4, 83]) torch.Size([4, 1536])
data loaded


In [9]:
Solver.load(model_save_path+"\\modelFusion.tar")
accs = []
top_5_accs = []

for i in range(10):
    acc, top_5_acc = Solver.evaluate(multimodal=True, show=False)
    accs.append(acc.cpu().numpy())
    top_5_accs.append(top_5_acc)
print('\n This is the average:')    
print(f'Validation Top-1 Accuracy: {np.array(accs).sum()/10:.4f}')
print(f'Validation Top-5 Accuracy: {np.array(top_5_accs).sum()/10:.4f}')


 This is the average:
Validation Top-1 Accuracy: 0.9213
Validation Top-5 Accuracy: 0.9844


## 2.2 ResNet101 Base

---

## 2.2.1 ImageEncoder ResNet 101

---

In [4]:
model_save_path = dir +"\\models2"

In [11]:
model = models.resnet101(weights=ResNet101_Weights.DEFAULT)

model.fc = nn.Linear(model.fc.in_features, output_features)

# hyperparameters can be set here. Check ModelTrainer.py for details. If not specified differently default values of the Modeltrainer have been used 
Solver = ModelTrainer(model=model,  
            train_path=train,
            train_embedding_path=train_emb, 
            val_path=val,
            val_embedding_path=val_emb, 
            train_augmentations=train_augmentations,
            save_path=model_save_path,
            postfix='ImageEncoder_ResNet101',
            batch_size=16
            )

cuda
21008
torch.Size([16, 3, 299, 299]) torch.Size([16, 83]) torch.Size([16, 1536])
data loaded


In [12]:
Solver.load(model_save_path+"\\modelImageEncoder_ResNet101.tar")
Solver.evaluate()

Validation Top-1 Accuracy: 0.8492
Validation Top-5 Accuracy: 0.9683


(tensor(0.8492, device='cuda:0', dtype=torch.float64), 0.9683199541284404)

## 2.2.2 TextEncoder hidden size 12000

---

In [5]:
model = TextClassefier(input_size=1536, hidden_size=12000, output_size=output_features)

# hyperparameters can be set here. Check ModelTrainer.py for details. If not specified differently default values of the Modeltrainer have been used 
Solver = ModelTrainer(model=model,  
            train_path=train, 
           train_embedding_path=train_emb, 
            val_path=val,
            val_embedding_path=val_emb, 
            train_augmentations=train_augmentations,
            save_path=model_save_path,
            postfix='TextEncoder',
            batch_size=4
            )

cuda
21008
torch.Size([4, 3, 299, 299]) torch.Size([4, 83]) torch.Size([4, 1536])
data loaded


In [6]:
Solver.load(model_save_path+"\\modelTextEncoder.tar")
accs = []
top_5_accs = []

for i in range(10):
    acc, top_5_acc = Solver.evaluate(multimodal=True, show=False)
    accs.append(acc.cpu().numpy())
    top_5_accs.append(top_5_acc)
print('\n This is the average:')    
print(f'Validation Top-1 Accuracy: {np.array(accs).sum()/10:.4f}')
print(f'Validation Top-5 Accuracy: {np.array(top_5_accs).sum()/10:.4f}')


 This is the average:
Validation Top-1 Accuracy: 0.8849
Validation Top-5 Accuracy: 0.9713


## 2.2.3 Fusion Model ResNet101 hidden size 12000

---

In [3]:
TextModel = TextClassefier(input_size=1536, hidden_size=12000, output_size=output_features)
image_encoder = model_save_path+"\\modelImageEncoder_ResNet101.tar"
text_encoder = model_save_path+"\\modelTextEncoder.tar"
model = Fusion(image_encoder, text_encoder, TextModel, output_size=output_features)

# hyperparameters can be set here. Check ModelTrainer.py for details. If not specified differently default values of the Modeltrainer have been used 
Solver = ModelTrainer(model=model,  
            train_path=train, 
           train_embedding_path=train_emb, 
            val_path=val,
            val_embedding_path=val_emb, 
            train_augmentations=train_augmentations,
            save_path=model_save_path,
            postfix='Fusion',
            batch_size=4,
            )

cuda
21008
torch.Size([4, 3, 299, 299]) torch.Size([4, 83]) torch.Size([4, 1536])
data loaded


In [4]:
Solver.load(model_save_path+"\\modelFusion.tar")
accs = []
top_5_accs = []

for i in range(10):
    acc, top_5_acc = Solver.evaluate(multimodal=True, show=False)
    accs.append(acc.cpu().numpy())
    top_5_accs.append(top_5_acc)
print('\n This is the average:')    
print(f'Validation Top-1 Accuracy: {np.array(accs).sum()/10:.4f}')
print(f'Validation Top-5 Accuracy: {np.array(top_5_accs).sum()/10:.4f}')


 This is the average:
Validation Top-1 Accuracy: 0.9378
Validation Top-5 Accuracy: 0.9910
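
To compare the fusion models with their single-modality encoders, the Top-1 validation accuracies reported in sections 1 and 2 can be collected in one place. The sketch below only re-uses numbers printed in this notebook. The helper name is illustrative, and note that the image encoders were evaluated once while the text and fusion results are averages over ten runs.

```python
# Top-1 validation accuracies reported in sections 1 and 2 of this notebook.
results = {
    "types/ResNet18": {"image": 0.8103, "text": 0.8190, "fusion": 0.8441},
    "types/ResNet101": {"image": 0.8245, "text": 0.8173, "fusion": 0.8181},
    "mints/ResNet18": {"image": 0.8126, "text": 0.8817, "fusion": 0.9213},
    "mints/ResNet101": {"image": 0.8492, "text": 0.8849, "fusion": 0.9378},
}


def fusion_gain(entry):
    """Fusion accuracy minus the better single-modality accuracy, in points."""
    best_single = max(entry["image"], entry["text"])
    return round((entry["fusion"] - best_single) * 100, 2)


for name, entry in results.items():
    print(f"{name}: {fusion_gain(entry):+.2f} points")
```

On these numbers, fusion improves on the best single modality in three of the four settings. For the Types task with a ResNet101 base, it is slightly below the image encoder.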


# 3 Mints Image Only

---

In [2]:
dir = r"F:\Users\basti\Documents\Goethe Uni\Data Challange"
model_save_path = dir +"\\models"

# specifies image data 
train, val = dir+"\\train.csv", dir+"\\val.csv"

# specifies language data
train_emb, val_emb   = dir+'\\embeddings_train.npy', dir+'\\embeddings_val.npy'

# specifies output feature size for imageencoder, textencoder and fusionmodel 
output_features = len(pd.read_csv(train, delimiter=',', skiprows=0, low_memory=False, encoding='iso-8859-1')["class"].unique())


"""
set to true if you want to apply augments
should increase performance slightly (roughly 2-3%)
but also increases training time
"""
use_augments = False

if use_augments:
    train_augmentations = transforms.Compose([
    transforms.ToPILImage(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.RandomRotation(15),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),  # convert images to tensors
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # normalize images
    transforms.Resize((299, 299))
    ])
else:
    train_augmentations = transforms.Compose([
    transforms.ToTensor(),  # convert images to tensors
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # normalize images
    transforms.Resize((299, 299))
    ])

## 3.1 Resnet 18

---

In [3]:
model = models.resnet18(weights=ResNet18_Weights.DEFAULT)

model.fc = nn.Linear(model.fc.in_features, output_features)

# hyperparameters can be set here. Check ModelTrainer.py for details. If not specified differently default values of the Modeltrainer have been used 
Solver = ModelTrainer(model=model,  
            train_path=train,
            train_embedding_path=train_emb, 
            val_path=val,
            val_embedding_path=val_emb, 
            train_augmentations=train_augmentations,
            save_path=model_save_path,
            postfix='ImageEncoder_ResNet18',
            batch_size=16
            )

cuda
21593
torch.Size([16, 3, 299, 299]) torch.Size([16, 83]) torch.Size([16, 1536])
data loaded


In [5]:
Solver.load(model_save_path+"\\modelTR0_Res18.tar")
Solver.evaluate()

Validation Top-1 Accuracy: 0.8058
Validation Top-5 Accuracy: 0.9482


(tensor(0.8058, device='cuda:0', dtype=torch.float64), 0.9481553940749021)

## 3.2 ResNet50

---

In [None]:
model = models.resnet50(weights=ResNet50_Weights.DEFAULT)

model.fc = nn.Linear(model.fc.in_features, output_features)

# hyperparameters can be set here. Check ModelTrainer.py for details. If not specified differently default values of the Modeltrainer have been used 
Solver = ModelTrainer(model=model,  
            train_path=train,
            train_embedding_path=train_emb, 
            val_path=val,
            val_embedding_path=val_emb, 
            train_augmentations=train_augmentations,
            save_path=model_save_path,
            postfix='TR0_Res50',
            batch_size=16
            )

In [None]:
Solver.load(model_save_path+"\\modelTR0_Res50.tar")
Solver.evaluate()

## 3.3 ResNet101

---

In [6]:
model = models.resnet101(weights=ResNet101_Weights.DEFAULT)

model.fc = nn.Linear(model.fc.in_features, output_features)

# hyperparameters can be set here. Check ModelTrainer.py for details. If not specified differently default values of the Modeltrainer have been used 
Solver = ModelTrainer(model=model,  
            train_path=train,
            train_embedding_path=train_emb, 
            val_path=val,
            val_embedding_path=val_emb, 
            train_augmentations=train_augmentations,
            save_path=model_save_path,
            postfix='TR0_Res101',
            batch_size=16
            )

cuda
21593
torch.Size([16, 3, 299, 299]) torch.Size([16, 83]) torch.Size([16, 1536])
data loaded


In [7]:
Solver.load(model_save_path+"\\modelTR0_Res101.tar")
Solver.evaluate()

Validation Top-1 Accuracy: 0.8291
Validation Top-5 Accuracy: 0.9584


(tensor(0.8291, device='cuda:0', dtype=torch.float64), 0.9583566238121856)

## 3.4 ResNet 50 + Augments

---

In [8]:
model = models.resnet50(weights=ResNet50_Weights.DEFAULT)

model.fc = nn.Linear(model.fc.in_features, output_features)

# hyperparameters can be set here. Check ModelTrainer.py for details. If not specified differently default values of the Modeltrainer have been used 
Solver = ModelTrainer(model=model,  
            train_path=train,
            train_embedding_path=train_emb, 
            val_path=val,
            val_embedding_path=val_emb, 
            train_augmentations=train_augmentations,
            save_path=model_save_path,
            postfix='TR5_Res50',
            batch_size=16
            )

cuda
21593
torch.Size([16, 3, 299, 299]) torch.Size([16, 83]) torch.Size([16, 1536])
data loaded


In [9]:
Solver.load(model_save_path+"\\modelTR5_Res50.tar")
Solver.evaluate()

Validation Top-1 Accuracy: 0.8432
Validation Top-5 Accuracy: 0.9652


(tensor(0.8432, device='cuda:0', dtype=torch.float64), 0.9652040245947456)

## 3.5 ResNet101 + Augments

---

In [10]:
model = models.resnet101(weights=ResNet101_Weights.DEFAULT)

model.fc = nn.Linear(model.fc.in_features, output_features)

# hyperparameters can be set here. Check ModelTrainer.py for details. If not specified differently default values of the Modeltrainer have been used 
Solver = ModelTrainer(model=model,  
            train_path=train,
            train_embedding_path=train_emb, 
            val_path=val,
            val_embedding_path=val_emb, 
            train_augmentations=train_augmentations,
            save_path=model_save_path,
            postfix='TR5_Res101',
            batch_size=16
            )

cuda
21593
torch.Size([16, 3, 299, 299]) torch.Size([16, 83]) torch.Size([16, 1536])
data loaded


In [12]:
Solver.load(model_save_path+"\\modelTR5_Res101.tar")
Solver.evaluate()

Validation Top-1 Accuracy: 0.8569
Validation Top-5 Accuracy: 0.9701


(tensor(0.8569, device='cuda:0', dtype=torch.float64), 0.9700950251537171)
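
The setup cells state that augmentations should improve performance by roughly 2-3%. ResNet101 is the only base with both a plain and an augmented result in this section, so the claim can be checked directly from the reported numbers. The sketch assumes that the two checkpoints differ mainly in the use of augmentations, as the section titles suggest; the helper name is illustrative.

```python
# Validation accuracies reported in sections 3.3 and 3.5 (ResNet101, image only).
baseline = {"top1": 0.8291, "top5": 0.9584}
augmented = {"top1": 0.8569, "top5": 0.9701}


def gain_in_points(before, after):
    """Absolute accuracy difference per metric, in percentage points."""
    return {key: round((after[key] - before[key]) * 100, 2) for key in before}


print(gain_in_points(baseline, augmented))
```

The Top-1 gain of about 2.8 points falls inside the stated range, while the Top-5 gain is smaller at about 1.2 points.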