<br>
<h2 style = "font-size:60px; font-family:Garamond ; font-weight : normal; background-color: #f6f5f5 ; color : #fe346e; text-align: center; border-radius: 100px 100px;">Wikipedia Image/Caption Starter</h2>
<br>

![](https://media.premiumtimesng.com/wp-content/files/2019/03/87de3a515e8f3848f1b62b55456b6d95.jpeg)

<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;">This notebook provides a trainable baseline that uses <code>EfficientNet-B0</code> to generate image embeddings and <code>XLM-RoBERTa</code> to generate text embeddings, training a projection head on top of each so that matching image/caption pairs end up close together</span>

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Install Required Libraries</h1></span>

In [1]:
!pip install git+https://github.com/rwightman/pytorch-image-models
!pip install --upgrade wandb

Collecting git+https://github.com/rwightman/pytorch-image-models
  Cloning https://github.com/rwightman/pytorch-image-models to /tmp/pip-req-build-qeu8jvub
  Running command git clone -q https://github.com/rwightman/pytorch-image-models /tmp/pip-req-build-qeu8jvub
  Resolved https://github.com/rwightman/pytorch-image-models to commit ddc29da974023416ac2bf2468a80a18438c0090d
Building wheels for collected packages: timm
  Building wheel for timm (setup.py) ... done
  Created wheel for timm: filename=timm-0.5.0-py3-none-any.whl size=418633 sha256=a4f41577763eec84267a6674b8fa96d7e5bbb48d8f744d6e1ac5453dff66436c
  Stored in directory: /tmp/pip-ephem-wheel-cache-_lf9j81v/wheels/69/3d/b0/be55cbadabd87a0e1875d63c7492d199097a39cc2433637650
Successfully built timm
Installing collected packages: timm
Successfully installed timm-0.5.0
Collecting wandb
  Downloading wandb-0.12.6-py2.py3-none-any.whl (1.7 MB)
     |████████████████████████████████| 1.7 MB 638 kB/s eta 0:00:01
Installi

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Import Required Libraries 📚</h1></span>

In [2]:
import os
import gc
import cv2
import copy
import time
import random
from PIL import Image
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

import base64
import pickle

# For downloading images
from io import BytesIO

# For data manipulation
import numpy as np
import pandas as pd

# Pytorch Imports
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.utils.data import Dataset, DataLoader
from torch.cuda import amp

# Utils
import joblib
from tqdm import tqdm
from collections import defaultdict

# Sklearn Imports
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import StratifiedKFold, KFold

# For Image Models
import timm

# For Transformer Models
from transformers import AutoTokenizer, AutoModel

# Albumentations for augmentations
import albumentations as A
from albumentations.pytorch import ToTensorV2

# For colored terminal text
from colorama import Fore, Back, Style
b_ = Fore.BLUE
sr_ = Style.RESET_ALL

import warnings
warnings.filterwarnings("ignore")

# For descriptive error messages
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

<img src="https://i.imgur.com/gb6B4ig.png" width="400" alt="Weights & Biases" />

<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;"> Weights & Biases (W&B) is a set of machine learning tools that helps you build better models faster. <strong>Kaggle competitions require fast-paced model development and evaluation</strong>. There are a lot of components: exploring the training data, training different models, combining trained models in different combinations (ensembling), and so on.</span>

> <span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">⏳ Lots of components = Lots of places to go wrong = Lots of time spent debugging</span>

<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">W&B can be useful for Kaggle competitions thanks to its lightweight and interoperable tools, which let you:</span>

* <span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">Quickly track experiments,<br></span>
* <span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">Version and iterate on datasets, <br></span>
* <span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">Evaluate model performance,<br></span>
* <span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">Reproduce models,<br></span>
* <span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">Visualize results and spot regressions,<br></span>
* <span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">Share findings with colleagues.</span>

<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">To learn more about Weights & Biases, check out this <strong><a href="https://www.kaggle.com/ayuraj/experiment-tracking-with-weights-and-biases">kernel</a></strong>.</span>

In [3]:
import wandb

try:
    from kaggle_secrets import UserSecretsClient
    user_secrets = UserSecretsClient()
    api_key = user_secrets.get_secret("wandb_api")
    wandb.login(key=api_key)
    anony = None
except Exception:
    anony = "must"
    print('If you want to use your W&B account, go to Add-ons -> Secrets and provide your W&B access token, using wandb_api as the Label name. \nGet your W&B access token from here: https://wandb.ai/authorize')

[34m[1mwandb[0m: W&B API key is configured (use `wandb login --relogin` to force relogin)
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Training Configuration ⚙️</h1></span>

In [4]:
CONFIG = {"seed": 2021,
          "epochs": 5,
          "img_size": 256,
          "image_model_name": "tf_efficientnet_b0",
          "text_model_name": "xlm-roberta-base",
          "embedding_size": 256,
          "train_batch_size": 32,
          "valid_batch_size": 64,
          "learning_rate": 1e-4,
          "scheduler": 'CosineAnnealingLR',
          "min_lr": 1e-6,
          "T_max": 500,
          "weight_decay": 1e-6,
          "max_length": 32,
          "n_accumulate": 1,
          "device": torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
          }

CONFIG["tokenizer"] = AutoTokenizer.from_pretrained(CONFIG['text_model_name'])


In [5]:
CONFIG

{'seed': 2021,
 'epochs': 5,
 'img_size': 256,
 'image_model_name': 'tf_efficientnet_b0',
 'text_model_name': 'xlm-roberta-base',
 'embedding_size': 256,
 'train_batch_size': 32,
 'valid_batch_size': 64,
 'learning_rate': 0.0001,
 'scheduler': 'CosineAnnealingLR',
 'min_lr': 1e-06,
 'T_max': 500,
 'weight_decay': 1e-06,
 'max_length': 32,
 'n_accumulate': 1,
 'device': device(type='cuda', index=0),
 'tokenizer': PreTrainedTokenizerFast(name_or_path='xlm-roberta-base', vocab_size=250002, model_max_len=512, is_fast=True, padding_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'sep_token': '</s>', 'pad_token': '<pad>', 'cls_token': '<s>', 'mask_token': AddedToken("<mask>", rstrip=False, lstrip=True, single_word=False, normalized=False)})}

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Set Seed for Reproducibility</h1></span>

In [6]:
def set_seed(seed=42):
    '''Sets the seed of the entire notebook so results are the same every time we run.
    This is for REPRODUCIBILITY.'''
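    # Python's own RNG drives random.shuffle / random.choice used later in this notebook
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    # When running on the CuDNN backend, two further options must be set
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Set a fixed value for the hash seed (only affects Python processes
    # started after this point, not the already-running interpreter)
    os.environ['PYTHONHASHSEED'] = str(seed)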
    
set_seed(CONFIG['seed'])

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Download Dataset</h1></span>

<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;">See <a href="https://www.kaggle.com/c/wikipedia-image-caption/discussion/284720">this discussion</a> for details on how the dataset was created.</span>

In [7]:
run = wandb.init(project="Wikipedia", 
                 anonymous=anony)
artifact = run.use_artifact('dchanda/Wikipedia/Wiki-data:latest', type='dataset')
artifact_dir = artifact.download()
run.finish()

for file in os.listdir(artifact_dir):
    filepath = os.path.join(artifact_dir, file)
    with open(filepath, "rb") as fp:
        data = pickle.load(fp)

[34m[1mwandb[0m: Currently logged in as: [33mdchanda[0m (use `wandb login --relogin` to force relogin)


[34m[1mwandb[0m: Downloading large artifact Wiki-data:latest, 1095.86MB. 1 files... Done. 0:0:0


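The loop above reassigns `data` for every file in `artifact_dir`, so only the last file survives. The download log reports a single file, so nothing is lost here. If the artifact ever contained several pickles, a sketch like the following would keep all of them. It assumes, as in this notebook, that each file stores a list of record dicts; the helper name `load_all_pickles` is hypothetical.

```python
import os
import pickle


def load_all_pickles(directory):
    """Load every pickle file in `directory` and concatenate the lists they hold.

    Assumes each file stores a list of record dicts. Files are read in sorted
    order so the result does not depend on os.listdir ordering.
    """
    records = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        with open(path, "rb") as fp:
            records.extend(pickle.load(fp))
    return records
```

In the notebook this would be used as `data = load_all_pickles(artifact_dir)`.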

<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;">Dataset Structure</span>
<blockquote>
<ul>
    <li><span style="color: #000508; font-family: Segoe UI; font-size: 1.3em; font-weight: 300;"><code>data</code>: list of dictionaries</span>
    <ul>
        <li><code>b64_bytes</code>: base64-encoded bytes of the image file at 300px resolution</li>
        <li><code>caption_title_and_reference_description</code>: list of captions</li>
        <li><code>target</code>: 1 for positive samples and -1 for negative samples</li>
    </ul>
    </li>
</ul>
</blockquote>
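A minimal sketch of reading one record in this format, using a hypothetical record whose "image" is a placeholder byte string rather than a real JPEG. It mirrors what `WikipediaDataset.__getitem__` does further down: base64-decode the image bytes, sample one caption, and swap `[SEP]` for XLM-RoBERTa's `</s>` separator. Only the standard library is used, so the PIL decoding step is left out; `unpack_record` is an illustrative helper, not part of the notebook.

```python
import base64
import random

# Hypothetical record following the schema above. The "image" is a placeholder
# byte string, not a real JPEG; real records hold an actual 300px image.
placeholder_image = b"\xff\xd8\xff placeholder bytes"
record = {
    "b64_bytes": base64.b64encode(placeholder_image).decode("ascii"),
    "caption_title_and_reference_description": [
        "Example title [SEP] Example reference description",
        "Another example caption",
    ],
    "target": 1,  # 1 = matching pair, -1 = non-matching pair
}


def unpack_record(rec, rng=random):
    """Return (image_bytes, caption, target) the same way WikipediaDataset does."""
    image_bytes = base64.b64decode(rec["b64_bytes"])
    # One caption is sampled per access, so different epochs can see different captions
    caption = rng.choice(rec["caption_title_and_reference_description"])
    # XLM-RoBERTa's separator token is </s>, not BERT's [SEP]
    caption = caption.replace("[SEP]", "</s>")
    return image_bytes, caption, rec["target"]


image_bytes, caption, target = unpack_record(record, random.Random(0))
```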

In [8]:
random.shuffle(data)

train_data = data[:45000]
valid_data = data[45000:]
print(f"Number of training samples: {len(train_data)}")
print(f"Number of validation samples: {len(valid_data)}")

Number of training samples: 45000
Number of validation samples: 10038
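`random.shuffle(data)` draws from Python's global `random` state, so this split is reproducible only if that module is seeded before the cell runs and nothing else consumes random numbers in between. A sketch that avoids depending on global state uses a private `random.Random(seed)` instance; `split_records` is a hypothetical helper, and a list of integers stands in for the real records.

```python
import random


def split_records(records, n_train, seed):
    """Shuffle a copy of `records` with a private RNG and split it into train/valid.

    Using random.Random(seed) makes the split reproducible regardless of any
    other code that consumes the global `random` state.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in for the 55,038 records (45,000 + 10,038) reported above
records = list(range(55038))
train_a, valid_a = split_records(records, 45000, seed=2021)
train_b, valid_b = split_records(records, 45000, seed=2021)
```

Calling it twice with the same seed yields identical splits, and the two halves never overlap.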


# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Visualize Images</h1></span>

In [9]:
# run = wandb.init(project='Wikipedia',
#                  job_type='Visualization',
#                  anonymous='must')

# preview_table = wandb.Table(columns=['Image', 'Captions'])
# for content in data[:1000]:
#     out = base64.b64decode(content['b64_bytes'])
#     img = Image.open(BytesIO(out)).convert("RGB")
#     preview_table.add_data(wandb.Image(img), 
#                            content['caption_title_and_reference_description'])

# wandb.log({'Visualization': preview_table})
# run.finish()

<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;"><a href="https://wandb.ai/dchanda/Wikipedia/runs/2kzujq78">View the Complete Table Here ⮕</a></span>

![](https://i.imgur.com/Uebq4gp.gif)

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Dataset Class</h1></span>

In [10]:
class WikipediaDataset(Dataset):
    def __init__(self, data, tokenizer, max_length, transforms=None):
        self.data = data
        self.max_len = max_length
        self.tokenizer = tokenizer
        self.transforms = transforms
        
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, index):
        image_bytes = base64.b64decode(self.data[index]["b64_bytes"])
        img = np.asarray(Image.open(BytesIO(image_bytes)).convert("RGB"))
        caption = random.choice(self.data[index]["caption_title_and_reference_description"])
        caption = caption.replace("[SEP]", "</s>") # sep token for xlm-roberta
        inputs = self.tokenizer.encode_plus(
                caption,
                truncation=True,
                add_special_tokens=True,
                max_length=self.max_len,
                padding='max_length'
            )
        target = self.data[index]['target']
        
        ids = inputs['input_ids']
        mask = inputs['attention_mask']
        
        if self.transforms:
            img = self.transforms(image=img)["image"]
        
        return {
            'ids': torch.tensor(ids, dtype=torch.long),
            'mask': torch.tensor(mask, dtype=torch.long),
            'image': img,
            'target': torch.tensor(target, dtype=torch.long)
        }
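`padding='max_length'` together with `truncation=True` makes every caption exactly `max_length` (32) tokens long, and `attention_mask` marks which positions hold real tokens. The function below is a simplified, hypothetical re-implementation for a single sequence, assuming the tokenizer wraps the text as `<s> ... </s>`; it only illustrates the shape of `ids` and `mask` and is not a replacement for the real tokenizer. The special-token IDs in the example (`<s>`=0, `<pad>`=1, `</s>`=2) are those of `xlm-roberta-base`, but in practice they should be read from `CONFIG['tokenizer']` (e.g. `pad_token_id`) rather than hard-coded.

```python
def pad_and_truncate(token_ids, max_length, bos_id, eos_id, pad_id):
    """Simplified sketch of encode_plus(truncation=True, padding='max_length')
    for one sequence: truncate the content, wrap it in <s> ... </s>, then pad."""
    # Reserve two slots for the special tokens before truncating the content
    body = token_ids[: max_length - 2]
    ids = [bos_id] + body + [eos_id]
    # 1 marks real tokens, 0 marks padding that attention should ignore
    mask = [1] * len(ids)
    n_pad = max_length - len(ids)
    ids += [pad_id] * n_pad
    mask += [0] * n_pad
    return ids, mask


short_ids, short_mask = pad_and_truncate([10, 11, 12], 8, bos_id=0, eos_id=2, pad_id=1)
long_ids, long_mask = pad_and_truncate(list(range(100, 140)), 32, bos_id=0, eos_id=2, pad_id=1)
```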

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Augmentations</h1></span>

In [11]:
data_transforms = {
    "train": A.Compose([
        A.Resize(CONFIG['img_size'], CONFIG['img_size']),
        A.HorizontalFlip(p=0.5),
        A.Normalize(
                mean=[0.485, 0.456, 0.406], 
                std=[0.229, 0.224, 0.225], 
                max_pixel_value=255.0, 
                p=1.0
            ),
        ToTensorV2()], p=1.),
    
    "valid": A.Compose([
        A.Resize(CONFIG['img_size'], CONFIG['img_size']),
        A.Normalize(
                mean=[0.485, 0.456, 0.406], 
                std=[0.229, 0.224, 0.225], 
                max_pixel_value=255.0, 
                p=1.0
            ),
        ToTensorV2()], p=1.)
}
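Both pipelines normalize with the ImageNet mean and standard deviation. With `max_pixel_value=255.0`, `A.Normalize` computes `(pixel - mean * 255) / (std * 255)` per channel, which is the same as scaling to [0, 1] first and then standardizing. A small pure-Python sketch of that arithmetic for a single pixel (`normalize_pixel` is an illustrative helper):

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD, max_pixel_value=255.0):
    """Normalize one RGB pixel like A.Normalize: (x - mean * max) / (std * max)."""
    return tuple(
        (value - m * max_pixel_value) / (s * max_pixel_value)
        for value, m, s in zip(rgb, mean, std)
    )


white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
```

A pure white pixel maps to roughly 2.25 in the red channel and a pure black one to roughly -2.12, so the network sees roughly zero-centred inputs.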

In [12]:
train_dataset = WikipediaDataset(train_data, CONFIG["tokenizer"], CONFIG["max_length"], 
                                 transforms=data_transforms["train"])
train_loader = DataLoader(train_dataset, batch_size=CONFIG['train_batch_size'], 
                          num_workers=4, shuffle=True, pin_memory=True, drop_last=True)

valid_dataset = WikipediaDataset(valid_data, CONFIG["tokenizer"], CONFIG["max_length"], 
                                 transforms=data_transforms["valid"])
valid_loader = DataLoader(valid_dataset, batch_size=CONFIG['valid_batch_size'], 
                          num_workers=4, shuffle=False, pin_memory=True)
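The training logs show 1,406 training and 157 validation batches per epoch. That follows from the batch sizes and `drop_last`: the training loader discards the final partial batch, while the validation loader keeps it. A quick check of the arithmetic (`num_batches` is an illustrative helper):

```python
import math


def num_batches(n_samples, batch_size, drop_last):
    """Number of batches a DataLoader yields for a map-style dataset."""
    if drop_last:
        return n_samples // batch_size  # the last partial batch is discarded
    return math.ceil(n_samples / batch_size)  # the last batch may be smaller


train_batches = num_batches(45000, 32, drop_last=True)   # 1406 (45000 / 32 = 1406.25)
valid_batches = num_batches(10038, 64, drop_last=False)  # 157 (10038 / 64 is about 156.8)
```

With `drop_last=True`, 8 training samples are skipped in each epoch; since `shuffle=True`, a different 8 are skipped every time.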

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Create Model</h1></span>

![](https://i.imgur.com/PzooBWU.png)

In [13]:
class WikipediaModel(nn.Module):
    def __init__(self, image_model, text_model, embedding_size):
        super(WikipediaModel, self).__init__()
        self.image_model = timm.create_model(image_model, pretrained=True)
        self.n_features = self.image_model.classifier.in_features
        self.image_model.reset_classifier(0)
        self.image_drop = nn.Dropout(p=0.2)
        self.image_fc = nn.Linear(self.n_features, embedding_size)
        
        self.text_model = AutoModel.from_pretrained(text_model)
        self.text_drop = nn.Dropout(p=0.2)
        # Read the hidden size from the config instead of hard-coding 768
        self.text_fc = nn.Linear(self.text_model.config.hidden_size, embedding_size)
        
        self.freeze_backbone()
        
    def forward(self, images, ids, mask):
        image_features = self.image_model(images)
        image_embeddings = self.image_fc(self.image_drop(image_features))
        
        out = self.text_model(input_ids=ids,attention_mask=mask,
                              output_hidden_states=False)
        out = self.text_drop(out[1])
        text_embeddings = self.text_fc(out)

        return image_embeddings, text_embeddings
    
    def freeze_backbone(self):
        for params in self.image_model.parameters():
            params.requires_grad = False
        # Only finetune final layer
        self.image_fc.weight.requires_grad = True
        self.image_fc.bias.requires_grad = True
        
        for params in self.text_model.parameters():
            params.requires_grad = False
        # Only finetune final layer
        self.text_fc.weight.requires_grad = True
        self.text_fc.bias.requires_grad = True
    

model = WikipediaModel(CONFIG['image_model_name'], CONFIG['text_model_name'], CONFIG['embedding_size'])
model.to(CONFIG['device']);

Downloading: "https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/tf_efficientnet_b0_aa-827b6e33.pth" to /root/.cache/torch/hub/checkpoints/tf_efficientnet_b0_aa-827b6e33.pth


Downloading:   0%|          | 0.00/1.12G [00:00<?, ?B/s]
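Because `freeze_backbone` turns off gradients for both backbones, only the two linear projection heads are trained. Their size can be worked out by hand. The sketch assumes 1280 pooled features for EfficientNet-B0 and a hidden size of 768 for `xlm-roberta-base`; the model itself reads the former from `image_model.classifier.in_features` at runtime.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of an nn.Linear layer."""
    return in_features * out_features + out_features


# Assumed sizes: 1280 pooled image features (EfficientNet-B0), 768 text hidden size
EMBEDDING_SIZE = 256
image_head = linear_params(1280, EMBEDDING_SIZE)  # 327,936
text_head = linear_params(768, EMBEDDING_SIZE)    # 196,864
trainable = image_head + text_head                # 524,800
```

In the notebook, `sum(p.numel() for p in model.parameters() if p.requires_grad)` should give the same total if the assumed sizes hold. Note that `requires_grad = False` does not stop the BatchNorm layers in the frozen image backbone from updating their running statistics while `model.train()` is active.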

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Loss Function</h1></span>

In [14]:
def criterion(outputs1, outputs2, targets):
    return nn.CosineEmbeddingLoss()(outputs1, outputs2, targets)
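With its defaults (`margin=0`, `reduction='mean'`), `nn.CosineEmbeddingLoss` computes, for each pair, `1 - cos(x1, x2)` when the target is 1 and `max(0, cos(x1, x2) - margin)` when the target is -1, then averages over the batch. A pure-Python sketch of that definition, ignoring the small epsilon PyTorch adds to the norms for numerical stability:

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_embedding_loss(xs1, xs2, targets, margin=0.0):
    """Mean cosine embedding loss over a batch of vector pairs.

    Target 1 (matching pair): 1 - cos.  Target -1 (non-matching): max(0, cos - margin).
    """
    losses = []
    for x1, x2, y in zip(xs1, xs2, targets):
        cos = cosine_similarity(x1, x2)
        losses.append(1.0 - cos if y == 1 else max(0.0, cos - margin))
    return sum(losses) / len(losses)
```

For example, orthogonal embeddings cost 1 for a positive pair and 0 for a negative pair, so a batch with one of each averages 0.5, a useful reference point when reading the losses logged below.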

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Training Function</h1></span>

In [15]:
def train_one_epoch(model, optimizer, scheduler, dataloader, device, epoch):
    model.train()
    
    dataset_size = 0
    running_loss = 0.0
    
    bar = tqdm(enumerate(dataloader), total=len(dataloader))
    for step, data in bar:
        ids = data['ids'].to(device, dtype = torch.long)
        mask = data['mask'].to(device, dtype = torch.long)
        images = data['image'].to(device, dtype=torch.float)
        targets = data['target'].to(device, dtype=torch.long)
        
        batch_size = ids.size(0)

        image_outputs, text_outputs = model(images, ids, mask)
        loss = criterion(image_outputs, text_outputs, targets)
        loss = loss / CONFIG['n_accumulate']
        loss.backward()
    
        if (step + 1) % CONFIG['n_accumulate'] == 0:
            optimizer.step()

            # zero the parameter gradients
            optimizer.zero_grad()

            if scheduler is not None:
                scheduler.step()
                
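        # Undo the 1/n_accumulate scaling so the logged loss is the per-batch loss
        running_loss += (loss.item() * CONFIG['n_accumulate'] * batch_size)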
        dataset_size += batch_size
        
        epoch_loss = running_loss / dataset_size
        
        bar.set_postfix(Epoch=epoch, Train_Loss=epoch_loss,
                        LR=optimizer.param_groups[0]['lr'])
    gc.collect()
    
    return epoch_loss
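`optimizer.step()` and `scheduler.step()` run only when `(step + 1) % n_accumulate == 0`, so the scheduler counts optimizer updates, not batches. The hypothetical helper below lists the 0-based batch indices at which an update happens. With `n_accumulate=1` every one of the 1,406 batches triggers an update. With larger values, gradients from trailing batches that do not complete a group are never applied in that epoch; because `zero_grad()` is only called after a step, they carry over into the next epoch's first update.

```python
def update_steps(num_batches, n_accumulate):
    """0-based batch indices at which train_one_epoch calls optimizer.step()
    (and scheduler.step()), using the same (step + 1) % n_accumulate test."""
    return [step for step in range(num_batches) if (step + 1) % n_accumulate == 0]


every_batch = update_steps(1406, 1)  # one update per batch
grouped = update_steps(10, 4)        # [3, 7]; batches 8 and 9 are left over
```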

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Validation Function</h1></span>

In [16]:
@torch.no_grad()
def valid_one_epoch(model, dataloader, device, epoch):
    model.eval()
    
    dataset_size = 0
    running_loss = 0.0
    
    bar = tqdm(enumerate(dataloader), total=len(dataloader))
    for step, data in bar:        
        ids = data['ids'].to(device, dtype = torch.long)
        mask = data['mask'].to(device, dtype = torch.long)
        images = data['image'].to(device, dtype=torch.float)
        targets = data['target'].to(device, dtype=torch.long)
        
        batch_size = ids.size(0)
        
        image_outputs, text_outputs = model(images, ids, mask)
        loss = criterion(image_outputs, text_outputs, targets)
        
        running_loss += (loss.item() * batch_size)
        dataset_size += batch_size
        
        epoch_loss = running_loss / dataset_size
        
        # Note: `optimizer` here is the global defined later in the notebook
        bar.set_postfix(Epoch=epoch, Valid_Loss=epoch_loss,
                        LR=optimizer.param_groups[0]['lr'])   
    
    gc.collect()
    
    return epoch_loss

# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Run Training</h1></span>

In [17]:
def run_training(model, optimizer, scheduler, device, num_epochs):
    # To automatically log gradients
    wandb.watch(model, log_freq=100)
    
    if torch.cuda.is_available():
        print("[INFO] Using GPU: {}\n".format(torch.cuda.get_device_name()))
    
    start = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_epoch_loss = np.inf
    history = defaultdict(list)
    
    for epoch in range(1, num_epochs + 1): 
        gc.collect()
        train_epoch_loss = train_one_epoch(model, optimizer, scheduler, 
                                           dataloader=train_loader, 
                                           device=device, epoch=epoch)
        
        val_epoch_loss = valid_one_epoch(model, valid_loader, device=device, 
                                         epoch=epoch)
    
        history['Train Loss'].append(train_epoch_loss)
        history['Valid Loss'].append(val_epoch_loss)
        
        # Log the metrics
        # Log both metrics in one call so they share a single W&B step
        wandb.log({"Train Loss": train_epoch_loss, "Valid Loss": val_epoch_loss})
        
        # deep copy the model
        if val_epoch_loss <= best_epoch_loss:
            print(f"{b_}Validation Loss Improved ({best_epoch_loss} ---> {val_epoch_loss})")
            best_epoch_loss = val_epoch_loss
            run.summary["Best Loss"] = best_epoch_loss
            best_model_wts = copy.deepcopy(model.state_dict())
            PATH = "Loss{:.4f}_epoch{:.0f}.bin".format(best_epoch_loss, epoch)
            # Save the checkpoint to the current working directory
            torch.save(model.state_dict(), PATH)
            print(f"Model Saved{sr_}")
            
        print()
    
    end = time.time()
    time_elapsed = end - start
    print('Training complete in {:.0f}h {:.0f}m {:.0f}s'.format(
        time_elapsed // 3600, (time_elapsed % 3600) // 60, (time_elapsed % 3600) % 60))
    print("Best Loss: {:.4f}".format(best_epoch_loss))
    
    # load best model weights
    model.load_state_dict(best_model_wts)
    
    return model, history
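The checkpointing rule in `run_training` keeps the weights of the epoch whose validation loss is lower than or equal to the best seen so far and writes a file named after that loss and epoch. Replaying it on the validation losses from the training log below (epochs 3 to 5 only as rounded by the progress bar) shows which files get written; `track_best` is an illustrative helper that mirrors the loop, not part of the notebook.

```python
import math

# Validation losses from the training log (epochs 3-5 as rounded by the progress bar)
valid_losses = [0.5119646857354384, 0.5108492633134489, 0.514, 0.517, 0.52]


def track_best(losses):
    """Mirror run_training: save whenever the loss is <= the best so far."""
    best_loss, best_epoch, saved = math.inf, None, []
    for epoch, loss in enumerate(losses, start=1):
        if loss <= best_loss:
            best_loss, best_epoch = loss, epoch
            saved.append("Loss{:.4f}_epoch{:.0f}.bin".format(loss, epoch))
    return best_loss, best_epoch, saved


best_loss, best_epoch, saved = track_best(valid_losses)
```

Only epochs 1 and 2 produce checkpoints (`Loss0.5120_epoch1.bin`, `Loss0.5108_epoch2.bin`), and epoch 2's weights are the ones reloaded at the end.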

In [18]:
def fetch_scheduler(optimizer):
    if CONFIG['scheduler'] == 'CosineAnnealingLR':
        scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=CONFIG['T_max'], 
                                                   eta_min=CONFIG['min_lr'])
    elif CONFIG['scheduler'] == 'CosineAnnealingWarmRestarts':
        # Requires a 'T_0' entry in CONFIG, which is not defined above
        scheduler = lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=CONFIG['T_0'], 
                                                             eta_min=CONFIG['min_lr'])
    elif CONFIG['scheduler'] is None:
        return None
    else:
        raise ValueError(f"Unknown scheduler: {CONFIG['scheduler']}")
        
    return scheduler

<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;">Define Optimizer and Scheduler</span>

In [19]:
optimizer = optim.Adam(model.parameters(), lr=CONFIG['learning_rate'], 
                       weight_decay=CONFIG['weight_decay'])
scheduler = fetch_scheduler(optimizer)

<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;">Start Training</span>

In [20]:
run = wandb.init(project='Wikipedia', 
                 config=CONFIG,
                 job_type='Train',
                 anonymous=anony)

In [21]:
model, history = run_training(model, optimizer, scheduler, 
                              device=CONFIG['device'],
                              num_epochs=CONFIG['epochs'])

[INFO] Using GPU: Tesla P100-PCIE-16GB



100%|██████████| 1406/1406 [05:44<00:00,  4.08it/s, Epoch=1, LR=9.39e-6, Train_Loss=0.504]
100%|██████████| 157/157 [00:59<00:00,  2.66it/s, Epoch=1, LR=9.39e-6, Valid_Loss=0.512]


Validation Loss Improved (inf ---> 0.5119646857354384)
Model Saved



100%|██████████| 1406/1406 [04:59<00:00,  4.69it/s, Epoch=2, LR=6.93e-5, Train_Loss=0.502]
100%|██████████| 157/157 [00:58<00:00,  2.69it/s, Epoch=2, LR=6.93e-5, Valid_Loss=0.511]


Validation Loss Improved (0.5119646857354384 ---> 0.5108492633134489)
Model Saved



100%|██████████| 1406/1406 [04:58<00:00,  4.70it/s, Epoch=3, LR=6.04e-5, Train_Loss=0.502]
100%|██████████| 157/157 [00:57<00:00,  2.73it/s, Epoch=3, LR=6.04e-5, Valid_Loss=0.514]





100%|██████████| 1406/1406 [04:57<00:00,  4.72it/s, Epoch=4, LR=1.53e-5, Train_Loss=0.499]
100%|██████████| 157/157 [00:58<00:00,  2.67it/s, Epoch=4, LR=1.53e-5, Valid_Loss=0.517]





100%|██████████| 1406/1406 [04:58<00:00,  4.71it/s, Epoch=5, LR=9.91e-5, Train_Loss=0.499]
100%|██████████| 157/157 [00:59<00:00,  2.63it/s, Epoch=5, LR=9.91e-5, Valid_Loss=0.52] 



Training complete in 0h 30m 47s
Best Loss: 0.5108
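The LR shown in the progress bars does not decay monotonically: across the five epochs it reads 9.39e-6, 6.93e-5, 6.04e-5, 1.53e-5 and 9.91e-5. That is expected from `CosineAnnealingLR` when it is stepped once per batch with `T_max=500`, far fewer than the 1,406 batches in an epoch: past `T_max` the cosine keeps going and the LR climbs back toward `learning_rate`, with a period of `2 * T_max` steps. The closed form given in the PyTorch documentation reproduces the logged end-of-epoch values. This is a sketch; the defaults mirror `CONFIG`.

```python
import math


def cosine_annealing_lr(step, base_lr=1e-4, eta_min=1e-6, t_max=500):
    """Closed form of CosineAnnealingLR after `step` calls to scheduler.step().

    The schedule is periodic with period 2 * t_max, so past t_max the LR rises again.
    """
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2


STEPS_PER_EPOCH = 1406  # 45000 // 32 with drop_last=True
end_of_epoch_lrs = [cosine_annealing_lr(STEPS_PER_EPOCH * e) for e in range(1, 6)]
```

If a single decay over the whole run were intended, `T_max` would instead be set to the total number of optimizer steps, here `1406 * CONFIG['epochs']` = 7030.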


In [22]:
run.finish()


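Run history:

| Metric | History |
|---|---|
| Train Loss | █▆▅▁▁ |
| Valid Loss | ▂▁▃▆█ |

Run summary:

| Metric | Value |
|---|---|
| Best Loss | 0.51085 |
| Train Loss | 0.49902 |
| Valid Loss | 0.52031 |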


# <span><h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Visualizations</h1></span>

<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;"><a href="https://wandb.ai/dchanda/Wikipedia/runs/whr9irvs">View the Complete Dashboard Here ⮕</a></span>

In [23]:
# Code taken from https://www.kaggle.com/ayuraj/interactive-eda-using-w-b-tables

# This is just to display the W&B run page in this interactive session.
from IPython import display

# we create an IFrame and set the width and height
iF = display.IFrame(run.url, width=1080, height=720)
iF
