<br>
<h1 style = "font-size:60px; font-family:Garamond ; font-weight : normal; background-color: #f6f5f5 ; color : #fe346e; text-align: center; border-radius: 100px 100px;">Let's Try T5</h1>
<br>

![](https://1.bp.blogspot.com/-o4oiOExxq1s/Xk26XPC3haI/AAAAAAAAFU8/NBlvOWB84L0PTYy9TzZBaLf6fwPGJTR0QCLcBGAsYHQ/s640/image3.gif)

<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;">Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by <b>introducing a unified framework that converts all text-based language problems into a text-to-text format</b>. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.</span> <br>
<br>
<br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.2em; font-weight: 300;"><i>Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer: <a href='https://arxiv.org/abs/1910.10683'>https://arxiv.org/abs/1910.10683</a></i></span>

<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;">In this kernel we use the <code>T5EncoderModel</code> from the Hugging Face Transformers library to obtain an encoded representation of the text, then apply a <code>Linear</code> layer on top of it to produce the output.</span>

<h3>📌 BERT Baseline Notebook:</h3> <h4><a href='https://www.kaggle.com/debarshichanda/pytorch-commonlit-readability-bert-baseline/'>https://www.kaggle.com/debarshichanda/pytorch-commonlit-readability-bert-baseline</a></h4>

<h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Install Required Libraries</h1>

In [None]:
!pip install -q nlpretext loguru

<h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Import Required Libraries 📚</h1>

In [None]:
import os
import gc
import copy
import time
import numpy as np
import pandas as pd
import plotly.graph_objects as go

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.utils.data import Dataset, DataLoader
from torch.cuda import amp

import transformers
from transformers import T5Tokenizer, T5EncoderModel
from torch.optim import AdamW  # transformers.AdamW is deprecated in favour of torch.optim.AdamW
from transformers import get_linear_schedule_with_warmup

from tqdm import tqdm
from collections import defaultdict

from loguru import logger

from sklearn.metrics import mean_squared_error
from sklearn.model_selection import StratifiedKFold, KFold

from nlpretext import Preprocessor
from nlpretext.basic.preprocess import (normalize_whitespace, remove_punct, 
                                        remove_eol_characters, remove_stopwords, 
                                        lower_text, unpack_english_contractions)

from colorama import Fore
b_ = Fore.BLUE

import warnings
warnings.filterwarnings("ignore")

<h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Read the Data 📖</h1>

In [None]:
train_df = pd.read_csv("../input/commonlitreadabilityprize/train.csv")
train_df.head()

In [None]:
test_df = pd.read_csv("../input/commonlitreadabilityprize/test.csv")
test_df.head()

<h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Preprocessing</h1>

![](https://github.com/artefactory/NLPretext/raw/master/references/logo_nlpretext.png)

<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;">We will use the <i>NLPretext</i> library to preprocess our text</span>

In [None]:
preprocessor = Preprocessor()
preprocessor.pipe(unpack_english_contractions)
preprocessor.pipe(remove_eol_characters)
preprocessor.pipe(lower_text)
preprocessor.pipe(normalize_whitespace)

In [None]:
train_df['excerpt'] = train_df['excerpt'].apply(preprocessor.run)

<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;">Maximum length of an excerpt in the dataset, in whitespace-separated words</span>

In [None]:
excerpt_lengths = train_df['excerpt'].apply(lambda x: len(x.split()))
max(excerpt_lengths)
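
The count above is in whitespace-separated words, whereas `max_len` in the dataset class is applied to tokenizer output, which can be longer because subword tokenizers split rare words into several pieces. A minimal sketch for comparing the two, assuming `tokenize` is any callable that maps a string to a list of tokens (for example `T5Tokenizer.tokenize`); the helper name `length_stats` is hypothetical:

```python
def length_stats(texts, tokenize=str.split, quantile=0.99):
    """Return (max_length, quantile_length) of the tokenized texts."""
    lengths = sorted(len(tokenize(t)) for t in texts)
    # Position of the requested quantile in the sorted lengths (nearest-rank style).
    idx = min(len(lengths) - 1, int(quantile * len(lengths)))
    return lengths[-1], lengths[idx]


sample = ["a short excerpt", "a somewhat longer excerpt with more words", "tiny"]
max_words, q_words = length_stats(sample)
print(max_words, q_words)
```

Once a tokenizer has been loaded, passing its `tokenize` method instead of `str.split` gives the lengths that `max_len` actually has to cover.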

<h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Training Configuration ⚙️</h1>

In [None]:
class CONFIG:
    seed = 42
    max_len = 205
    model_name = 't5-base'
    hidden_state = 768
    hidden_state_fixed = 768 # ONLY CHANGE WHEN CHANGING THE MODEL
                             # 512 for t5-small, 768 for t5-base, 1024 for t5-large   
    train_batch_size = 32
    valid_batch_size = 32
    epochs = 20
    learning_rate = 1e-5
    n_accumulate = 1
    folds = 10
    tokenizer = T5Tokenizer.from_pretrained(model_name)
    tokenizer.save_pretrained('./tokenizer')
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

<h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Set Seed for Reproducibility</h1>

In [None]:
def set_seed(seed = 42):
    '''Sets the seed of the entire notebook so results are the same every time we run.
    This is for REPRODUCIBILITY.'''
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    # When running on the CuDNN backend, two further options must be set
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Set a fixed value for the hash seed
    os.environ['PYTHONHASHSEED'] = str(seed)
    
set_seed(CONFIG.seed)

<h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Create Folds</h1>

<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;">Code taken from <a href="https://www.kaggle.com/tolgadincer/continuous-target-stratification?rvi=1&scriptVersionId=52551118&cellId=6">https://www.kaggle.com/tolgadincer/continuous-target-stratification?rvi=1&scriptVersionId=52551118&cellId=6</a></span>

In [None]:
def create_folds(df, n_s=5, n_grp=None):
    df['kfold'] = -1
    
    if n_grp is None:
        skf = KFold(n_splits=n_s, shuffle=True, random_state=CONFIG.seed)  # random_state requires shuffle=True
        target = df.target
    else:
        skf = StratifiedKFold(n_splits=n_s, shuffle=True, random_state=CONFIG.seed)
        df['grp'] = pd.cut(df.target, n_grp, labels=False)
        target = df.grp
    
    for fold_no, (t, v) in enumerate(skf.split(target, target)):
        df.loc[v, 'kfold'] = fold_no
    return df

In [None]:
df = create_folds(train_df, n_s=CONFIG.folds, n_grp=12)
df.head()
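
One way to see what the binning buys is to check that each fold ends up with a similar size and a similar mean target. A small sketch on synthetic data, assuming a normally distributed target; the `toy` frame, the bin count of 4, and the seed are made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Synthetic continuous target; the notebook itself uses train_df['target'].
rng = np.random.default_rng(0)
toy = pd.DataFrame({"target": rng.normal(loc=-1.0, scale=1.0, size=600)})

# Bin the continuous target into equal-width groups, then stratify on the bin label.
toy["grp"] = pd.cut(toy["target"], bins=4, labels=False)
toy["kfold"] = -1
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold_no, (_, valid_idx) in enumerate(skf.split(toy, toy["grp"])):
    toy.loc[valid_idx, "kfold"] = fold_no

# Per-fold size and mean target; stratification should keep the means close.
summary = toy.groupby("kfold")["target"].agg(["size", "mean"])
print(summary)
```

If one fold's mean drifted far from the others, validation scores on that fold would be harder to compare with the rest.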

<h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Dataset Class</h1>

In [None]:
class T5Dataset(Dataset):
    def __init__(self, df, tokenizer, max_len):
        self.text = df['excerpt'].values
        self.target = df['target'].values
        self.max_len = max_len
        self.tokenizer = tokenizer
        
    def __len__(self):
        return len(self.text)
    
    def __getitem__(self, index):
        text = self.text[index]
        inputs = self.tokenizer.encode_plus(
            text,
            truncation=True,
            add_special_tokens=True,
            max_length=self.max_len,
            padding='max_length'
        )
        ids = inputs['input_ids']
        mask = inputs['attention_mask']
        
        return {
            'ids': torch.tensor(ids, dtype=torch.long),
            'mask': torch.tensor(mask, dtype=torch.long),
            'target': torch.tensor(self.target[index], dtype=torch.float)
        }
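
With `truncation=True` and `padding='max_length'`, every example comes back with exactly `max_len` ids, and the attention mask marks which positions hold real tokens. A toy sketch of that contract in plain Python, not the tokenizer's actual implementation (it ignores special tokens such as the end-of-sequence marker); `pad_or_truncate` is a hypothetical helper, and the pad id of 0 matches T5's pad token id:

```python
def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Return (ids, mask), both of length max_len."""
    ids = list(token_ids[:max_len])   # truncate long inputs
    mask = [1] * len(ids)             # 1 marks a real token
    padding = max_len - len(ids)
    ids += [pad_id] * padding         # right-pad short inputs
    mask += [0] * padding             # 0 marks padding
    return ids, mask


print(pad_or_truncate([11, 12, 13], max_len=5))
print(pad_or_truncate([11, 12, 13, 14, 15, 16], max_len=5))
```

Fixed-length outputs are what let the default `DataLoader` collation stack examples into a batch without a custom `collate_fn`.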

<h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Loss Function</h1>

In [None]:
def criterion(outputs, targets):
    return torch.sqrt(nn.MSELoss()(outputs.view(-1), targets.view(-1)))

<h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Create Model</h1>

<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;"><code>T5EncoderModel</code> returns one hidden state per token rather than a single pooled vector, so we define a <code>T5Pooler</code> to pool the outputs of the model</span><br>
<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;">It takes the mean of the hidden states over the sequence dimension, then applies a dense layer and an activation</span>

In [None]:
class T5Pooler(nn.Module):
    def __init__(self, hidden_size, activation=nn.Tanh()):
        super().__init__()
        self.dense = nn.Linear(CONFIG.hidden_state_fixed, hidden_size)
        self.activation = activation
        
    def forward(self, hidden_states):
        # We simply take the mean of the hidden states
        mean_tensor = torch.mean(hidden_states, dim=1)
        pooled_output = self.dense(mean_tensor)
        pooled_output = self.activation(pooled_output)
        return pooled_output
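
`torch.mean(hidden_states, dim=1)` averages over every position, including padded ones. A possible variant, not part of the original notebook, weights the average by the attention mask so that padding does not dilute the representation. A minimal sketch, where `masked_mean` is a hypothetical helper and the tensors are made-up values:

```python
import torch


def masked_mean(hidden_states, attention_mask):
    """Mean over the sequence dimension, counting only positions where mask == 1.

    hidden_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len).
    """
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)
    summed = (hidden_states * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1.0)  # avoid division by zero
    return summed / counts


# One sequence of three positions; the last position is padding.
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1, 0]])

pooled = masked_mean(hidden, mask)  # uses only the first two positions
plain = hidden.mean(dim=1)          # also averages in the padded position
print(pooled, plain)
```

To use this in the model, `T5Pooler.forward` would need to accept the attention mask as a second argument.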

In [None]:
class T5Model(nn.Module):
    def __init__(self):
        super(T5Model, self).__init__()
        self.t5 = T5EncoderModel.from_pretrained(CONFIG.model_name)
        self.pooler = T5Pooler(CONFIG.hidden_state, nn.LeakyReLU())
        self.fc = nn.Linear(CONFIG.hidden_state, 1)
    
    def forward(self, ids, mask):
        outputs = self.t5(ids, attention_mask=mask)
        pooled_outputs = self.pooler(outputs.last_hidden_state)
        outputs = self.fc(pooled_outputs)
        return outputs

model = T5Model()
model.to(CONFIG.device);

<h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Training Function</h1>

In [None]:
def train_one_epoch(model, optimizer, dataloader, device, epoch):
    model.train()
    scaler = amp.GradScaler()
    
    dataset_size = 0
    running_loss = 0.0
    
    bar = tqdm(enumerate(dataloader), total=len(dataloader))
    for step, data in bar:        
        ids = data['ids'].to(device, dtype = torch.long)
        mask = data['mask'].to(device, dtype = torch.long)
        targets = data['target'].to(device, dtype = torch.float)
        
        batch_size = ids.size(0)
        
        with amp.autocast(enabled=True):
            outputs = model(ids, mask)
            loss = criterion(outputs, targets)
            loss = loss / CONFIG.n_accumulate
            
        scaler.scale(loss).backward()
        
        if (step + 1) % CONFIG.n_accumulate == 0:
            scaler.step(optimizer)
            scaler.update()
            
            # zero the parameter gradients
            optimizer.zero_grad()
                
        running_loss += (loss.item() * batch_size)
        dataset_size += batch_size
        
        epoch_loss = running_loss/dataset_size
        
        bar.set_postfix(Epoch=epoch, Train_Loss=epoch_loss,
                        LR=optimizer.param_groups[0]['lr'])
    gc.collect()
    
    return epoch_loss

<h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Validation Function</h1>

In [None]:
@torch.no_grad()
def valid_one_epoch(model, optimizer, dataloader, device, epoch):
    model.eval()
    
    dataset_size = 0
    running_loss = 0.0
    
    TARGETS = []
    PREDS = []
    
    bar = tqdm(enumerate(dataloader), total=len(dataloader))
    for step, data in bar:        
        ids = data['ids'].to(device, dtype = torch.long)
        mask = data['mask'].to(device, dtype = torch.long)
        targets = data['target'].to(device, dtype = torch.float)
        
        batch_size = ids.size(0)
        
        outputs = model(ids, mask)
        loss = criterion(outputs, targets)
        
        running_loss += (loss.item() * batch_size)
        dataset_size += batch_size
        
        epoch_loss = running_loss/dataset_size
        
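        PREDS.extend(outputs.view(-1).cpu().detach().numpy().tolist())
        TARGETS.extend(targets.view(-1).cpu().detach().numpy().tolist())
        
        bar.set_postfix(Epoch=epoch, Valid_Loss=epoch_loss,
                        LR=optimizer.param_groups[0]['lr'])   
        
    # RMSE over all validation predictions; np.sqrt avoids the `squared` argument,
    # which newer scikit-learn releases deprecate
    val_rmse = np.sqrt(mean_squared_error(TARGETS, PREDS))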
    gc.collect()
    
    return epoch_loss, val_rmse
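
The function returns two numbers that look alike but are computed differently: `epoch_loss` is the size-weighted mean of per-batch RMSE values, while `val_rmse` is the RMSE over all validation predictions at once. Because the square root is concave, the first is never larger than the second. A small worked example in plain Python (the numbers are made up):

```python
import math


def rmse(preds, targets):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds))


# Two validation batches of (predictions, targets).
batches = [([0.0, 0.0], [1.0, 1.0]), ([0.0, 0.0], [3.0, 3.0])]

# Size-weighted mean of per-batch RMSE, as accumulated in `epoch_loss`.
total = sum(len(p) for p, _ in batches)
batch_mean_rmse = sum(rmse(p, t) * len(p) for p, t in batches) / total

# RMSE over all predictions at once, as computed for `val_rmse`.
all_preds = [x for p, _ in batches for x in p]
all_targets = [x for _, t in batches for x in t]
global_rmse = rmse(all_preds, all_targets)

print(batch_mean_rmse, global_rmse)
```

Here the per-batch values are 1 and 3, so their mean is 2, while the global value is the square root of 5. The gap grows when batches differ a lot in difficulty, which is why model selection in `run` relies on `val_rmse`.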

<h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Run</h1>

In [None]:
@logger.catch
def run(model, optimizer, scheduler, device, num_epochs):    
    start = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_rmse = np.inf
    history = defaultdict(list)
    
    for epoch in range(1, num_epochs + 1): 
        gc.collect()
        # train_loader and valid_loader are the module-level loaders returned by prepare_data
        train_epoch_loss = train_one_epoch(model, optimizer, dataloader=train_loader, 
                                           device=device, epoch=epoch)
        
        valid_epoch_loss, valid_epoch_rmse = valid_one_epoch(model, optimizer,
                                                       dataloader=valid_loader, 
                                                       device=device, epoch=epoch)
    
        history['Train Loss'].append(train_epoch_loss)
        history['Valid Loss'].append(valid_epoch_loss)
        history['Valid RMSE'].append(valid_epoch_rmse)
        
        print(f'Valid RMSE: {valid_epoch_rmse}')
        
        if scheduler is not None:
            scheduler.step()
        
        # deep copy the model
        if valid_epoch_rmse <= best_rmse:
            print(f"{b_}Validation RMSE Improved ({best_rmse} ---> {valid_epoch_rmse})")
            best_rmse = valid_epoch_rmse
            best_model_wts = copy.deepcopy(model.state_dict())
            PATH = "RMSE{:.4f}_epoch{:.0f}.bin".format(best_rmse, epoch)
            torch.save(model.state_dict(), PATH)
            print("Model Saved")
            
        print()
    
    end = time.time()
    time_elapsed = end - start
    print('Training complete in {:.0f}h {:.0f}m {:.0f}s'.format(
        time_elapsed // 3600, (time_elapsed % 3600) // 60, (time_elapsed % 3600) % 60))
    print("Best RMSE: {:.4f}".format(best_rmse))
    
    # load best model weights
    model.load_state_dict(best_model_wts)
    
    return model, history

In [None]:
def prepare_data(fold):
    df_train = df[df.kfold != fold].reset_index(drop=True)
    df_valid = df[df.kfold == fold].reset_index(drop=True)
    
    train_dataset = T5Dataset(df_train, CONFIG.tokenizer, CONFIG.max_len)
    valid_dataset = T5Dataset(df_valid, CONFIG.tokenizer, CONFIG.max_len)

    train_loader = DataLoader(train_dataset, batch_size=CONFIG.train_batch_size, 
                              num_workers=4, shuffle=True, pin_memory=True)
    valid_loader = DataLoader(valid_dataset, batch_size=CONFIG.valid_batch_size, 
                              num_workers=4, shuffle=False, pin_memory=True)
    
    return train_loader, valid_loader

<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;">Create Dataloaders</span>

In [None]:
train_loader, valid_loader = prepare_data(fold=0)

<span style="color: #000508; font-family: Segoe UI; font-size: 1.5em; font-weight: 300;">Define Optimizer and Scheduler</span>

In [None]:
# Defining Optimizer with weight decay to params other than bias and layer norms
param_optimizer = list(model.named_parameters())
# T5 names its layer norms `layer_norm` / `final_layer_norm` (weight only), not BERT's `LayerNorm`
no_decay = ["bias", "layer_norm.weight"]
optimizer_parameters = [
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 
     'weight_decay': 0.0001},
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 
     'weight_decay': 0.0}
    ]

optimizer = AdamW(optimizer_parameters, lr=CONFIG.learning_rate)

# Defining LR Scheduler
scheduler = get_linear_schedule_with_warmup(
    optimizer, 
    num_warmup_steps=0, 
    num_training_steps=CONFIG.epochs
)
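
Since `run` calls `scheduler.step()` once per epoch, `num_training_steps=CONFIG.epochs` makes the learning rate fall linearly from its initial value toward zero over the epochs. The multiplier can be sketched in plain Python as follows; `linear_multiplier` is a hypothetical helper that mirrors the documented behaviour (linear warmup, then linear decay to 0), not the library's source:

```python
def linear_multiplier(step, num_warmup_steps, num_training_steps):
    """Factor applied to the base learning rate at a given scheduler step."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


base_lr, epochs = 1e-5, 20
# Learning rate used during each epoch (scheduler step 0 is the first epoch).
lrs = [base_lr * linear_multiplier(e, 0, epochs) for e in range(epochs)]
print(lrs[0], lrs[10], lrs[-1])
```

With these settings the first epoch trains at the full rate, the eleventh at half of it, and the last at one twentieth. If the scheduler were stepped once per batch instead, `num_training_steps` would have to be the total number of optimizer steps.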

<h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Train Fold: 0</h1>

In [None]:
model, history = run(model, optimizer, scheduler, device=CONFIG.device, num_epochs=CONFIG.epochs)

<h1 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #f6f5f5; color :#fe346e; border-radius: 100px 100px; text-align:center">Visualizations 📉</h1>

In [None]:
epochs = list(range(1, CONFIG.epochs + 1))
trace1 = go.Scatter(x=epochs, y=history['Train Loss'],
                    mode='lines+markers',
                    name='Train Loss')
trace2 = go.Scatter(x=epochs, y=history['Valid Loss'],
                    mode='lines+markers',
                    name='Valid Loss')
layout = go.Layout(template="plotly_dark", title='Loss Curve', 
                   xaxis=dict(title='Epochs'), yaxis=dict(title='Loss'))
fig = go.Figure(data = [trace1, trace2], layout = layout)
fig.show()
