<a href="https://colab.research.google.com/github/Luka-Surmanidze/MLHW4/blob/main/model3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

https://youtu.be/yEXkEUqK52Q

**Downloading Kaggle data sets directly into Colab**

Install the kaggle python library

In [2]:
! pip install kaggle



Mount the Google drive so you can store your kaggle API credentials for future use

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Make a directory for kaggle at the temporary instance location on Colab drive.

Download your kaggle API key (.json file). You can do this by going to your kaggle account page and clicking 'Create new API token' under the API section.

In [4]:
! mkdir -p ~/.kaggle

If kaggle.json is already in the working directory, you can copy it to the temporary location by uncommenting the line below. (I recommend keeping the file on your Google Drive instead.)

In [5]:
#! cp kaggle.json ~/.kaggle/

Upload the json file to Google Drive and then copy it to the temporary location.

In [6]:
!cp /content/drive/MyDrive/ColabNotebooks/kaggle_API_credentials/kaggle.json ~/.kaggle/kaggle.json

Restrict the file permissions to read/write for the owner only

In [7]:
! chmod 600 ~/.kaggle/kaggle.json
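
The copy and permission steps above can also be done from Python. The following is a minimal sketch, assuming the key file lives at a path you pass in; `install_kaggle_credentials` is a hypothetical helper written for illustration and is not part of the kaggle library.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_credentials(src, dest_dir=None):
    """Copy kaggle.json into dest_dir and restrict it to owner read/write.

    Hypothetical helper for illustration; dest_dir defaults to ~/.kaggle.
    """
    src = Path(src)
    if not src.is_file():
        raise FileNotFoundError(f"API key not found: {src}")

    dest_dir = Path(dest_dir) if dest_dir is not None else Path.home() / ".kaggle"
    dest_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`

    dest = dest_dir / "kaggle.json"
    shutil.copyfile(src, dest)
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)  # same effect as `chmod 600`
    return dest
```

Raising `FileNotFoundError` early makes a missing key visible before any download is attempted.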

**Competitions and Datasets are the two types of Kaggle data**

**1. Download competition data**

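If you get a 403 Forbidden error, you need to click 'Late Submission' on the Kaggle page for that competition. A 401 Unauthorized error, as in the output below, instead means the API credentials in kaggle.json were missing or not accepted.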

In [8]:
!kaggle competitions download -c challenges-in-representation-learning-facial-expression-recognition-challenge

401 Client Error: Unauthorized for url: https://www.kaggle.com/api/v1/competitions/data/download-all/challenges-in-representation-learning-facial-expression-recognition-challenge


Unzip the downloaded file if it is zipped. Refresh the file browser on the left-hand side to update the view.

In [9]:
!unzip challenges-in-representation-learning-facial-expression-recognition-challenge.zip

unzip:  cannot find or open challenges-in-representation-learning-facial-expression-recognition-challenge.zip, challenges-in-representation-learning-facial-expression-recognition-challenge.zip.zip or challenges-in-representation-learning-facial-expression-recognition-challenge.zip.ZIP.
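
The unzip error above follows from the failed download: there was no archive to extract. A minimal sketch of a guard using the standard-library `zipfile` module is shown here; `extract_if_present` is a hypothetical helper name, not something the notebook defines.

```python
import zipfile
from pathlib import Path


def extract_if_present(zip_path, out_dir="."):
    """Extract zip_path into out_dir and return the member names.

    Returns an empty list when the archive is missing or is not a valid zip
    file, so a failed download is reported instead of raising later.
    Hypothetical helper for illustration.
    """
    zip_path = Path(zip_path)
    if not zip_path.is_file() or not zipfile.is_zipfile(zip_path):
        print(f"Nothing to extract: {zip_path} is missing or not a zip archive")
        return []
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
        return archive.namelist()
```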


To download specific files instead of the entire data set



In [10]:
# ! kaggle competitions download house-prices-advanced-regression-techniques -f train.csv

**2. Download datasets (that are not part of competition)**

In [11]:
# ! kaggle datasets download andrewmvd/animal-faces

In [12]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import seaborn as sns
import wandb
from tqdm import tqdm
import os
from PIL import Image
import math

wandb.init(
    project="facial-expression-recognition",
    name="resnet-attention-advanced",
    config={
        "model_type": "ResNet + Attention",
        "residual_blocks": 8,
        "epochs": 80,
        "batch_size": 16,
        "learning_rate": 0.0003,
        "optimizer": "AdamW",
        "scheduler": "CosineAnnealingWarmRestarts",
        "data_augmentation": "Advanced",
        "attention_mechanism": "Channel + Spatial",
        "label_smoothing": 0.1,
        "mixup": True,
        "architecture": "ResNet18-inspired + CBAM + Label Smoothing + MixUp"
    }
)

wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
wandb: Paste an API key from your profile and hit enter:

 ··········

wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: surmanidzeluka (surmanidzeluka-free-university-of-tbilisi-) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


In [13]:
class AdvancedFERDataset(Dataset):
    """Advanced Dataset with sophisticated augmentations"""

    def __init__(self, csv_file, transform=None, is_training=True):
        self.data = pd.read_csv(csv_file)
        self.is_training = is_training

        if transform is None:
            if is_training:
                # Advanced training augmentations - simplified for stability
                self.transform = transforms.Compose([
                    transforms.ToPILImage(),
                    transforms.RandomHorizontalFlip(p=0.5),
                    transforms.RandomRotation(degrees=10),
                    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.9, 1.1)),
                    transforms.ToTensor(),
                    transforms.Normalize(mean=[0.485], std=[0.229])
                ])
            else:
                # Test/validation - no augmentation
                self.transform = transforms.Compose([
                    transforms.ToPILImage(),
                    transforms.ToTensor(),
                    transforms.Normalize(mean=[0.485], std=[0.229])
                ])
        else:
            self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        pixels = self.data.iloc[idx]['pixels']
        emotion = self.data.iloc[idx]['emotion']

        pixels = np.array([int(pixel) for pixel in pixels.split()])
        image = pixels.reshape(48, 48).astype(np.uint8)

        if self.transform:
            image = self.transform(image)
        else:
            image = torch.tensor(image).float().unsqueeze(0) / 255.0

        return image, torch.tensor(emotion, dtype=torch.long)
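
`__getitem__` turns each row's space-separated `pixels` string into a 48×48 image, and the transform then scales it to [0, 1] and normalizes with mean 0.485 and std 0.229. The sketch below reproduces that arithmetic in pure Python to show the resulting value range; the dataset class itself uses NumPy and torchvision, and `parse_pixels` and `normalize` are illustrative names only.

```python
def parse_pixels(pixel_string, size=48):
    """Split a space-separated pixel string into `size` rows of `size` ints."""
    values = [int(p) for p in pixel_string.split()]
    if len(values) != size * size:
        raise ValueError(f"expected {size * size} pixels, got {len(values)}")
    return [values[r * size:(r + 1) * size] for r in range(size)]


def normalize(pixel, mean=0.485, std=0.229):
    """Mirror ToTensor (divide by 255) followed by Normalize(mean, std)."""
    return (pixel / 255.0 - mean) / std


rows = parse_pixels(" ".join(["0", "255"] * (48 * 24)))
print(len(rows), len(rows[0]))   # 48 48
print(round(normalize(0), 3))    # -2.118
print(round(normalize(255), 3))  # 2.249
```

With these constants the network sees inputs in roughly [-2.12, 2.25] rather than [0, 1].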


In [14]:
class ChannelAttention(nn.Module):
    """Channel Attention Module from CBAM paper"""

    def __init__(self, in_channels, reduction=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

        self.fc = nn.Sequential(
            nn.Conv2d(in_channels, in_channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // reduction, in_channels, 1, bias=False)
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.fc(self.avg_pool(x))
        max_out = self.fc(self.max_pool(x))
        out = avg_out + max_out
        return self.sigmoid(out)

class SpatialAttention(nn.Module):
    """Spatial Attention Module from CBAM paper"""

    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=kernel_size//2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv1(x)
        return self.sigmoid(x)

In [15]:
class CBAM(nn.Module):
    """Convolutional Block Attention Module"""

    def __init__(self, in_channels, reduction=16, kernel_size=7):
        super(CBAM, self).__init__()
        self.channel_attention = ChannelAttention(in_channels, reduction)
        self.spatial_attention = SpatialAttention(kernel_size)

    def forward(self, x):
        out = x * self.channel_attention(x)
        out = out * self.spatial_attention(out)
        return out
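
The channel attention bottleneck has `in_channels // reduction` hidden channels, so with `reduction=16` the widths used in this notebook (64 to 512) give bottlenecks of 4 to 32 channels. The sketch below counts the weights one CBAM module adds, derived from the layer definitions above (all convolutions are bias-free); `cbam_param_count` is an illustrative helper, not part of the model code.

```python
def cbam_param_count(channels, reduction=16, kernel_size=7):
    """Count the weights in one CBAM module as defined above."""
    hidden = channels // reduction
    if hidden == 0:
        # Integer division would leave the bottleneck with zero channels
        raise ValueError("channels must be at least `reduction`")
    channel_attention = channels * hidden + hidden * channels  # two 1x1 convs
    spatial_attention = 2 * 1 * kernel_size * kernel_size      # one k x k conv, 2 -> 1 channels
    return channel_attention + spatial_attention


for c in (64, 128, 256, 512):
    print(c, c // 16, cbam_param_count(c))
```

The attention modules are small next to the 3×3 convolutions they follow, so they add little to the total parameter count.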

In [16]:
class ResidualBlock(nn.Module):
    """Residual Block with CBAM attention"""

    def __init__(self, in_channels, out_channels, stride=1, downsample=None, use_attention=True):
        super(ResidualBlock, self).__init__()

        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        self.downsample = downsample
        self.stride = stride

        # Add attention mechanism
        self.use_attention = use_attention
        if use_attention:
            self.cbam = CBAM(out_channels)

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        # Apply attention
        if self.use_attention:
            out = self.cbam(out)

        return out

In [17]:
class AdvancedResNetFER(nn.Module):
    """
    Advanced ResNet with Attention for Facial Expression Recognition

    Design choices:
    - Residual connections for deeper training
    - CBAM attention mechanism (Channel + Spatial)
    - Adaptive pooling for variable input sizes
    - Dropout before the classification head
    - ~11M parameters (ResNet-18-sized backbone plus CBAM)

    Architecture:
    - Initial Conv + BN + ReLU
    - 4 Residual stages with increasing channels
    - CBAM attention in each residual block
    - Global Average Pooling + Dropout
    - Fully connected classification head
    """

    def __init__(self, num_classes=7, dropout_rate=0.3):
        super(AdvancedResNetFER, self).__init__()

        # Initial convolution
        self.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Residual stages
        self.stage1 = self._make_stage(64, 64, 2, stride=1)
        self.stage2 = self._make_stage(64, 128, 2, stride=2)
        self.stage3 = self._make_stage(128, 256, 2, stride=2)
        self.stage4 = self._make_stage(256, 512, 2, stride=2)

        # Global pooling and classification
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(dropout_rate)
        self.fc = nn.Linear(512, num_classes)

        # Initialize weights
        self._initialize_weights()

    def _make_stage(self, in_channels, out_channels, blocks, stride=1):
        downsample = None
        if stride != 1 or in_channels != out_channels:
            downsample = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

        layers = []
        layers.append(ResidualBlock(in_channels, out_channels, stride, downsample))

        for _ in range(1, blocks):
            layers.append(ResidualBlock(out_channels, out_channels))

        return nn.Sequential(*layers)

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        # Initial convolution
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        # Residual stages
        x = self.stage1(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.stage4(x)

        # Global pooling and classification
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.dropout(x)
        x = self.fc(x)

        return x
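
To see how much spatial resolution is left for a 48×48 input, the standard convolution output-size formula can be applied to the kernel, stride and padding values in the class above. This is a sketch of that arithmetic only; `conv_out` and `trace_spatial_sizes` are illustrative names.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_spatial_sizes(size=48):
    """Follow one side of the input through the layers defined in AdvancedResNetFER."""
    sizes = {}
    size = sizes["conv1"] = conv_out(size, kernel=7, stride=2, padding=3)
    size = sizes["maxpool"] = conv_out(size, kernel=3, stride=2, padding=1)
    for name, stride in [("stage1", 1), ("stage2", 2), ("stage3", 2), ("stage4", 2)]:
        # Only the first 3x3 conv of each stage is strided; the rest keep the size
        size = sizes[name] = conv_out(size, kernel=3, stride=stride, padding=1)
    return sizes


print(trace_spatial_sizes(48))
```

For a 48×48 image the stem already reduces the map to 12×12, and stage 4 operates on a 2×2 map before global average pooling.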

In [18]:
class LabelSmoothingCrossEntropy(nn.Module):
    """Label Smoothing Cross Entropy Loss"""

    def __init__(self, eps=0.1, reduction='mean'):
        super(LabelSmoothingCrossEntropy, self).__init__()
        self.eps = eps
        self.reduction = reduction

    def forward(self, output, target):
        c = output.size()[-1]
        log_preds = F.log_softmax(output, dim=-1)
        loss = -log_preds.sum(dim=-1)
        if self.reduction == 'sum':
            loss = loss.sum()
        elif self.reduction == 'mean':
            loss = loss.mean()
        return loss * self.eps / c + (1 - self.eps) * F.nll_loss(log_preds, target, reduction=self.reduction)
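
The expression returned above is the cross-entropy against a smoothed target that puts `eps / c` on every class and an extra `1 - eps` on the true class. The pure-Python sketch below checks that equivalence for a single sample; the function names are illustrative and the logits are arbitrary example values.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def label_smoothing_loss(logits, target, eps=0.1):
    """Same arithmetic as LabelSmoothingCrossEntropy.forward for one sample."""
    log_preds = log_softmax(logits)
    c = len(logits)
    uniform_term = -sum(log_preds)   # summed negative log-probabilities
    nll_term = -log_preds[target]    # ordinary cross-entropy
    return eps * uniform_term / c + (1 - eps) * nll_term


def smoothed_target_loss(logits, target, eps=0.1):
    """Cross-entropy against the explicit smoothed distribution."""
    log_preds = log_softmax(logits)
    c = len(logits)
    q = [eps / c + (1 - eps) * (k == target) for k in range(c)]
    return -sum(qk * lp for qk, lp in zip(q, log_preds))


logits = [2.0, 0.5, -1.0, 0.0, 0.3, -0.7, 1.2]
print(label_smoothing_loss(logits, target=0))
print(smoothed_target_loss(logits, target=0))
```

Both calls print the same value up to floating-point rounding, and setting `eps=0` recovers plain cross-entropy.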


In [19]:
def mixup_data(x, y, alpha=1.0):
    """Apply MixUp augmentation"""
    if alpha > 0:
        lam = np.random.beta(alpha, alpha)
    else:
        lam = 1

    batch_size = x.size(0)
    index = torch.randperm(batch_size).to(x.device)

    mixed_x = lam * x + (1 - lam) * x[index, :]
    y_a, y_b = y, y[index]
    return mixed_x, y_a, y_b, lam

def mixup_criterion(criterion, pred, y_a, y_b, lam):
    """MixUp loss calculation"""
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
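
MixUp blends each sample with a randomly chosen partner using a weight drawn from Beta(alpha, alpha), then weights the two losses the same way. The sketch below shows this for one pair of flattened samples in pure Python, with the standard library's `random.betavariate` standing in for `np.random.beta`; `mixup_pair` and `mixup_loss` are illustrative names.

```python
import random


def mixup_pair(x_a, x_b, alpha=0.2, rng=random):
    """Blend two flattened samples with a Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha) if alpha > 0 else 1.0
    mixed = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    return mixed, lam


def mixup_loss(loss_a, loss_b, lam):
    """Weight the two per-label losses the same way mixup_criterion does."""
    return lam * loss_a + (1 - lam) * loss_b


rng = random.Random(0)  # fixed seed so the example is repeatable
mixed, lam = mixup_pair([0.0, 1.0, 2.0], [10.0, 11.0, 12.0], alpha=0.2, rng=rng)
print(lam, mixed)
print(mixup_loss(loss_a=0.4, loss_b=2.0, lam=lam))
```

For alpha below 1 the Beta distribution is U-shaped, so with `alpha=0.2` most draws of `lam` fall near 0 or 1 and the mixed sample stays close to one of the two originals.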


In [20]:
def train_model_advanced(model, train_loader, val_loader, num_epochs=80):
    """Advanced training with latest techniques"""

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)

    # Advanced loss and optimizer
    criterion = LabelSmoothingCrossEntropy(eps=0.1)
    optimizer = optim.AdamW(model.parameters(), lr=0.0003, weight_decay=1e-4)

    # Cosine annealing with warm restarts
    scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)

    train_losses = []
    val_losses = []
    train_accuracies = []
    val_accuracies = []

    best_val_acc = 0.0
    patience = 20
    patience_counter = 0

    for epoch in range(num_epochs):
        # Training phase with MixUp
        model.train()
        running_loss = 0.0
        correct_train = 0
        total_train = 0

        train_pbar = tqdm(train_loader, desc=f'Epoch {epoch+1}/{num_epochs} [Train]')
        for images, labels in train_pbar:
            images, labels = images.to(device), labels.to(device)

            # Apply MixUp with 50% probability
            if np.random.random() > 0.5:
                mixed_images, labels_a, labels_b, lam = mixup_data(images, labels, alpha=0.2)
                optimizer.zero_grad()
                outputs = model(mixed_images)
                loss = mixup_criterion(criterion, outputs, labels_a, labels_b, lam)
            else:
                optimizer.zero_grad()
                outputs = model(images)
                loss = criterion(outputs, labels)

            loss.backward()

            # Gradient clipping
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

            optimizer.step()

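            running_loss += loss.item()
            # Accuracy is measured against the original labels, so it is only
            # approximate for batches where MixUp was applied
            _, predicted = torch.max(outputs.detach(), 1)
            total_train += labels.size(0)
            correct_train += (predicted == labels).sum().item()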

            train_pbar.set_postfix({'Loss': f'{loss.item():.4f}'})

        train_loss = running_loss / len(train_loader)
        train_acc = 100 * correct_train / total_train

        # Validation phase
        model.eval()
        val_loss = 0.0
        correct_val = 0
        total_val = 0

        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                outputs = model(images)
                loss = F.cross_entropy(outputs, labels)  # Standard CE for validation

                val_loss += loss.item()
                _, predicted = torch.max(outputs.data, 1)
                total_val += labels.size(0)
                correct_val += (predicted == labels).sum().item()

        val_loss = val_loss / len(val_loader)
        val_acc = 100 * correct_val / total_val

        # Learning rate step
        scheduler.step()
        current_lr = optimizer.param_groups[0]['lr']

        # Store metrics
        train_losses.append(train_loss)
        val_losses.append(val_loss)
        train_accuracies.append(train_acc)
        val_accuracies.append(val_acc)

        # Early stopping check
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            patience_counter = 0
            torch.save(model.state_dict(), 'advanced_resnet_best_model.pth')
        else:
            patience_counter += 1

        # Log to Wandb
        wandb.log({
            'epoch': epoch + 1,
            'train_loss': train_loss,
            'val_loss': val_loss,
            'train_accuracy': train_acc,
            'val_accuracy': val_acc,
            'learning_rate': current_lr,
            'best_val_accuracy': best_val_acc
        })

        print(f'Epoch [{epoch+1}/{num_epochs}]')
        print(f'Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%')
        print(f'Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}%')
        print(f'Learning Rate: {current_lr:.6f}')
        print(f'Best Val Acc: {best_val_acc:.2f}%')
        print('-' * 60)

        # Early stopping
        if patience_counter >= patience:
            print(f'Early stopping triggered after {epoch+1} epochs')
            break

    return model, train_losses, val_losses, train_accuracies, val_accuracies
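
The learning rates printed in the run log can be reproduced from the cosine annealing formula with warm restarts. The sketch below assumes the closed form lr = eta_min + (base_lr - eta_min) * (1 + cos(pi * T_cur / T_i)) / 2 with `eta_min=0`, and uses the `T_0=10`, `T_mult=2` and `lr=0.0003` settings from the training function; `warm_restart_lr` is an illustrative helper, not a PyTorch API.

```python
import math


def warm_restart_lr(epoch, base_lr=0.0003, t_0=10, t_mult=2, eta_min=0.0):
    """Learning rate after `epoch` scheduler steps under cosine annealing with warm restarts."""
    t_i, t_cur = t_0, epoch
    while t_cur >= t_i:  # skip over completed cycles
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i)) / 2


for epoch in range(1, 8):
    print(epoch, f"{warm_restart_lr(epoch):.6f}")
```

Under these assumptions the cycles last 10, 20 and 40 epochs, so the rate jumps back to 0.0003 after epochs 10, 30 and 70.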


In [21]:
def evaluate_with_tta(model, test_loader, num_tta=3):
    """Evaluate on the original images only; num_tta is accepted but unused, so no TTA is applied"""
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.eval()

    all_predictions = []
    all_labels = []

    with torch.no_grad():
        for images, labels in tqdm(test_loader, desc="Evaluation"):
            images = images.to(device)

            # Just use the original images without complex TTA
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)

            all_predictions.extend(predicted.cpu().numpy())
            all_labels.extend(labels.numpy())

    return all_predictions, all_labels
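
For reference, test-time augmentation usually means averaging the predicted class probabilities over several views of the same image. The sketch below shows the idea with a horizontal flip on a plain list-of-rows image. `predict_proba` is a hypothetical callable that maps an image to a list of class probabilities; in this notebook it would wrap a forward pass and a softmax. None of this is part of the notebook's evaluation function.

```python
def hflip(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in image]


def predict_with_tta(predict_proba, image, augmentations=(hflip,)):
    """Average class probabilities over the original image and each augmented view."""
    views = [image] + [aug(image) for aug in augmentations]
    probs = [predict_proba(view) for view in views]
    num_classes = len(probs[0])
    mean = [sum(p[k] for p in probs) / len(probs) for k in range(num_classes)]
    predicted = max(range(num_classes), key=mean.__getitem__)
    return predicted, mean


# Toy predictor (assumption for the demo): its output depends on the top-left pixel
def toy_predict(image):
    return [0.6, 0.4] if image[0][0] > 0 else [0.1, 0.9]


print(predict_with_tta(toy_predict, [[1, 0]]))
```

Averaging probabilities rather than hard labels lets a confident view outweigh an uncertain one.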

In [22]:
def create_model_comparison_chart(model1_acc, model2_acc, model3_acc):
    """Create comparison chart of all three models"""
    models = ['Simple CNN\n(Model 1)', 'Deeper CNN\n(Model 2)', 'ResNet + Attention\n(Model 3)']
    accuracies = [model1_acc, model2_acc, model3_acc]
    colors = ['lightcoral', 'lightblue', 'lightgreen']

    plt.figure(figsize=(12, 8))
    bars = plt.bar(models, accuracies, color=colors, edgecolor='black', linewidth=2)

    # Add value labels on bars
    for bar, acc in zip(bars, accuracies):
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2., height + 0.5,
                f'{acc:.1f}%', ha='center', va='bottom', fontsize=14, fontweight='bold')

    plt.title('Model Performance Comparison', fontsize=16, fontweight='bold')
    plt.ylabel('Validation Accuracy (%)', fontsize=14)
    plt.ylim(0, 100)
    plt.grid(axis='y', alpha=0.3)

    # Add improvement annotations
    if model2_acc > model1_acc:
        improvement_1_to_2 = model2_acc - model1_acc
        plt.annotate(f'+{improvement_1_to_2:.1f}%',
                    xy=(1, model2_acc), xytext=(1, model2_acc + 5),
                    ha='center', fontsize=12, color='green', fontweight='bold')

    if model3_acc > model2_acc:
        improvement_2_to_3 = model3_acc - model2_acc
        plt.annotate(f'+{improvement_2_to_3:.1f}%',
                    xy=(2, model3_acc), xytext=(2, model3_acc + 5),
                    ha='center', fontsize=12, color='green', fontweight='bold')

    plt.tight_layout()
    wandb.log({"model_comparison": wandb.Image(plt)})
    plt.show()

In [23]:
def create_sample_data(num_samples=1000):
    """Create sample FER data for testing purposes"""
    np.random.seed(42)
    data = []

    for i in range(num_samples):
        pixels = np.random.randint(0, 256, 48*48)
        pixel_string = ' '.join(map(str, pixels))
        emotion = np.random.randint(0, 7)
        data.append({'emotion': emotion, 'pixels': pixel_string})

    df = pd.DataFrame(data)
    df.to_csv('sample_train.csv', index=False)
    print(f"Created sample dataset with {num_samples} samples")
    return df

In [None]:
if __name__ == "__main__":
    print("Loading dataset for Model 3 (Advanced ResNet)...")

    possible_files = ['train.csv', 'fer2013.csv', 'sample_train.csv']
    dataset_file = None

    for file in possible_files:
        if os.path.exists(file):
            dataset_file = file
            print(f"Found dataset: {file}")
            break

    if dataset_file is None:
        print("No dataset found. Creating sample data...")
        create_sample_data(10000)
        dataset_file = 'sample_train.csv'

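    # Create datasets: two views of the same CSV so that validation samples
    # are not passed through the training augmentations
    train_view = AdvancedFERDataset(dataset_file, is_training=True)
    val_view = AdvancedFERDataset(dataset_file, is_training=False)

    # Split data (80/20) using one shared random permutation of indices
    train_size = int(0.8 * len(train_view))
    indices = torch.randperm(len(train_view)).tolist()
    train_dataset = torch.utils.data.Subset(train_view, indices[:train_size])
    val_dataset = torch.utils.data.Subset(val_view, indices[train_size:])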

    # Create data loaders (smaller batch for complex model)
    train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False)

    print(f"Training samples: {len(train_dataset)}")
    print(f"Validation samples: {len(val_dataset)}")

    # Initialize Model 3
    model = AdvancedResNetFER(num_classes=7, dropout_rate=0.3)
    print(f"Model 3 architecture:\n{model}")

    # Count parameters
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Total parameters: {total_params:,}")
    print(f"Trainable parameters: {trainable_params:,}")

    wandb.config.update({
        "total_parameters": total_params,
        "trainable_parameters": trainable_params
    })

    # Train model
    print("\nStarting training of Model 3 (Advanced ResNet + Attention)...")
    trained_model, train_losses, val_losses, train_accs, val_accs = train_model_advanced(
        model, train_loader, val_loader, num_epochs=80
    )

    # Load best model
    trained_model.load_state_dict(torch.load('advanced_resnet_best_model.pth'))

    # Evaluate on the validation set (evaluate_with_tta currently runs a plain forward pass)
    print("\nEvaluating Model 3 on the validation set...")
    predictions, true_labels = evaluate_with_tta(trained_model, val_loader, num_tta=3)

    val_accuracy = accuracy_score(true_labels, predictions)
    print(f"Final Validation Accuracy: {val_accuracy:.4f}")

    # Create final comparison (you'll need to update with your actual results)
    # create_model_comparison_chart(60.5, 72.3, val_accuracy * 100)  # Update with your results

    wandb.log({
        "final_val_accuracy_model3": val_accuracy,
        "final_val_accuracy_with_tta": val_accuracy,
        "parameter_count_model3": total_params
    })

    wandb.finish()
    print("Model 3 training completed! This is your most advanced architecture.")


Loading dataset for Model 3 (Advanced ResNet)...
No dataset found. Creating sample data...
Created sample dataset with 10000 samples
Training samples: 8000
Validation samples: 2000
Model 3 architecture:
AdvancedResNetFER(
  (conv1): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (stage1): Sequential(
    (0): ResidualBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (cbam): CBAM(
        (ch

Epoch 1/80 [Train]: 100%|██████████| 500/500 [04:49<00:00,  1.73it/s, Loss=1.9613]


Epoch [1/80]
Train Loss: 1.9501, Train Acc: 14.61%
Val Loss: 1.9457, Val Acc: 15.30%
Learning Rate: 0.000293
Best Val Acc: 15.30%
------------------------------------------------------------


Epoch 2/80 [Train]: 100%|██████████| 500/500 [04:54<00:00,  1.70it/s, Loss=1.9477]


Epoch [2/80]
Train Loss: 1.9474, Train Acc: 14.44%
Val Loss: 1.9465, Val Acc: 14.35%
Learning Rate: 0.000271
Best Val Acc: 15.30%
------------------------------------------------------------


Epoch 3/80 [Train]: 100%|██████████| 500/500 [04:47<00:00,  1.74it/s, Loss=1.9605]


Epoch [3/80]
Train Loss: 1.9466, Train Acc: 14.54%
Val Loss: 1.9462, Val Acc: 15.15%
Learning Rate: 0.000238
Best Val Acc: 15.30%
------------------------------------------------------------


Epoch 4/80 [Train]: 100%|██████████| 500/500 [04:49<00:00,  1.72it/s, Loss=1.9454]


Epoch [4/80]
Train Loss: 1.9461, Train Acc: 14.55%
Val Loss: 1.9458, Val Acc: 16.50%
Learning Rate: 0.000196
Best Val Acc: 16.50%
------------------------------------------------------------


Epoch 5/80 [Train]: 100%|██████████| 500/500 [04:43<00:00,  1.76it/s, Loss=1.9546]


Epoch [5/80]
Train Loss: 1.9459, Train Acc: 14.46%
Val Loss: 1.9465, Val Acc: 14.70%
Learning Rate: 0.000150
Best Val Acc: 16.50%
------------------------------------------------------------


Epoch 6/80 [Train]: 100%|██████████| 500/500 [04:47<00:00,  1.74it/s, Loss=1.9644]


Epoch [6/80]
Train Loss: 1.9462, Train Acc: 14.40%
Val Loss: 1.9460, Val Acc: 13.80%
Learning Rate: 0.000104
Best Val Acc: 16.50%
------------------------------------------------------------


Epoch 7/80 [Train]: 100%|██████████| 500/500 [05:00<00:00,  1.67it/s, Loss=1.9505]


Epoch [7/80]
Train Loss: 1.9459, Train Acc: 14.45%
Val Loss: 1.9452, Val Acc: 15.20%
Learning Rate: 0.000062
Best Val Acc: 16.50%
------------------------------------------------------------


Epoch 8/80 [Train]:  60%|██████    | 301/500 [02:59<02:22,  1.40it/s, Loss=1.9408]
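
The run above fell back to `create_sample_data`, whose labels are drawn uniformly at random from 7 classes, so there is nothing for the model to learn. The logged losses and accuracies sit at the values a uniform guess would give, which the following short check computes:

```python
import math

num_classes = 7
chance_loss = math.log(num_classes)  # cross-entropy of a uniform prediction
chance_accuracy = 100 / num_classes  # percent

print(f"{chance_loss:.4f}")      # 1.9459
print(f"{chance_accuracy:.2f}")  # 14.29
```

A validation loss near 1.946 and accuracy near 14% therefore reflect the random labels rather than the architecture.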