<a href="https://www.kaggle.com/code/ayushs9020/training-multiple-models-on-commonlit?scriptVersionId=137524646" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#FFC0CB; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #FFC0CB">1 | Goal ⚽️</p>

<div style="border-radius:10px; border:#FFC0CB solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
<img src = "https://media.tenor.com/h9V9BRchFpIAAAAC/goal-soccer.gif">
    
This time we will try to train multiple models and also try to understand them in depth
    
* $RoBERTa$ $Base$
* $RoBERTa$ $Large$

# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#00FFFF; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #00FFFF">2 | Data 📊 </p>

<div style="border-radius:10px; border:#00FFFF solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

<img src = "https://media3.giphy.com/media/v1.Y2lkPTc5MGI3NjExNWxsanEzZ3VsbjdjaDV6Ymt6ajhsd2t2ZnoxNGV0ZXM0ZnJqcno4OSZlcD12MV9naWZzX3NlYXJjaCZjdD1n/xT9C25UNTwfZuk85WP/200w.gif">

In [1]:
import pandas as pd 

<div style="border-radius:10px; border:#00FFFF solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
This time we will focus on the `training` data only. Our training data is divided into $2$ files
* **[Train Prompts](https://www.kaggle.com/competitions/commonlit-evaluate-student-summaries/data?select=prompts_train.csv)** - This file contains the prompts that were given. These are important, as summaries irrelevant to the prompts will be awarded lower scores.
* **[Train Summaries](https://www.kaggle.com/competitions/commonlit-evaluate-student-summaries/data?select=summaries_train.csv)** - This file contains the summaries that were written in response to the prompts. 

Both files are in `CSV (Comma Separated Values)` format and share a common column `prompt_id`, which acts as the connecting point between the $2$ tables. They can be joined with 
```
prompts_train.merge(summaries_train , on = "prompt_id")
```
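As a minimal sketch of what this merge does, the toy frames below (made-up rows, not the competition data) share a `prompt_id` column. Since one prompt matches many summaries, every summary row simply picks up the columns of its prompt.

```python
import pandas as pd

# Toy stand-ins for prompts_train.csv and summaries_train.csv (values are made up)
prompts = pd.DataFrame({
    "prompt_id": ["p1", "p2"],
    "prompt_title": ["On Tragedy", "Another Prompt"],
})
summaries = pd.DataFrame({
    "student_id": ["s1", "s2", "s3"],
    "prompt_id": ["p1", "p1", "p2"],
    "text": ["first summary", "second summary", "third summary"],
})

# One prompt matches many summaries, so the result has one row per summary
merged = prompts.merge(summaries, on="prompt_id")

print(merged.shape)
print(merged[["prompt_id", "prompt_title", "student_id"]])
```

The default is an inner join, so a summary whose `prompt_id` has no matching prompt would be dropped.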

In [2]:
train = pd.read_csv("/kaggle/input/commonlit-evaluate-student-summaries/prompts_train.csv").merge(
    pd.read_csv("/kaggle/input/commonlit-evaluate-student-summaries/summaries_train.csv") , on = "prompt_id")

train.head()

Unnamed: 0,prompt_id,prompt_question,prompt_title,prompt_text,student_id,text,content,wording
0,39c16e,Summarize at least 3 elements of an ideal trag...,On Tragedy,Chapter 13 \r\nAs the sequel to what has alrea...,00791789cc1f,1 element of an ideal tragedy is that it shoul...,-0.210614,-0.471415
1,39c16e,Summarize at least 3 elements of an ideal trag...,On Tragedy,Chapter 13 \r\nAs the sequel to what has alrea...,0086ef22de8f,The three elements of an ideal tragedy are: H...,-0.970237,-0.417058
2,39c16e,Summarize at least 3 elements of an ideal trag...,On Tragedy,Chapter 13 \r\nAs the sequel to what has alrea...,0094589c7a22,Aristotle states that an ideal tragedy should ...,-0.387791,-0.584181
3,39c16e,Summarize at least 3 elements of an ideal trag...,On Tragedy,Chapter 13 \r\nAs the sequel to what has alrea...,00cd5736026a,One element of an Ideal tragedy is having a co...,0.088882,-0.59471
4,39c16e,Summarize at least 3 elements of an ideal trag...,On Tragedy,Chapter 13 \r\nAs the sequel to what has alrea...,00d98b8ff756,The 3 ideal of tragedy is how complex you need...,-0.687288,-0.460886


# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#FF0000; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #FF0000">3 | Tokenization🎟</p>

<div style="border-radius:10px; border:#FF0000 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">


<img src = "https://www.qualicen.de/wp-content/uploads/2021/03/TokenizerMeme.png" width = 400>

In [3]:
from transformers import RobertaTokenizer
import numpy as np
import tqdm

<div style="border-radius:10px; border:#FF0000 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
Tokenization is the process of splitting text into small units called tokens and mapping each distinct token to an integer id. Older ways of turning text into numbers for a model include 
* $Bag$ $of$ $Words$
* $Word$ $2$ $Vec$
* $Term$ $Frequency$ $-$ $Inverse$ $Document$ $Frequency$ $(TF-IDF)$
* $Global$ $Vectors$ $for$ $Word$ $Representation$ $(GLOVE)$

Here we will be using **[RoBERTa Tokenizer](https://huggingface.co/docs/transformers/model_doc/roberta#transformers.RobertaTokenizer)**, which splits text into subword pieces using byte-level BPE
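To make the idea of "text in, integer ids out" concrete, here is a toy sketch. It is not how the RoBERTa tokenizer works internally: it splits on whitespace instead of using subword pieces, and the vocabulary and the function name `toy_tokenize` are made up for illustration. Only the special-token ids ($0$ for `<s>`, $1$ for `<pad>`, $2$ for `</s>`, $3$ for `<unk>`) mirror the RoBERTa vocabulary.

```python
# Toy vocabulary: word ids are made up, special-token ids mirror RoBERTa
vocab = {"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3, "i": 4, "dream": 5, "more": 6}

def toy_tokenize(text):
    """Lowercase, split on whitespace and map each word to an id."""
    words = text.lower().split()
    ids = [vocab.get(word, vocab["<unk>"]) for word in words]
    # Wrap the sequence in start / end markers, as RoBERTa does
    return [vocab["<s>"]] + ids + [vocab["</s>"]]

print(toy_tokenize("I dream more"))  # [0, 4, 5, 6, 2]
print(toy_tokenize("I sleep more"))  # [0, 4, 3, 6, 2]
```

A real subword tokenizer would rarely need `<unk>`, because an unseen word is broken into smaller known pieces instead.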

In [4]:
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
tokenizer

RobertaTokenizer(name_or_path='roberta-base', vocab_size=50265, model_max_length=512, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'eos_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'unk_token': AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'sep_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'pad_token': AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'cls_token': AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'mask_token': AddedToken("<mask>", rstrip=False, lstrip=True, single_word=False, normalized=True)}, clean_up_tokenization_spaces=True)

<div style="border-radius:10px; border:#FF0000 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

If we send a sample text like 
```
Why do I sleep less and dream more these days? , It seems God has some good intentions for me. , Yesterday I was a beggar, today I am a prince of the heart. , It seems God has some good intentions for me.
```

In [5]:
tokenizer("Why do I sleep less and dream more these days? , It seems God has some good intentions for me. , Yesterday I was a beggar, today I am a prince of the heart. , It seems God has some good intentions for me.")

{'input_ids': [0, 7608, 109, 38, 3581, 540, 8, 3366, 55, 209, 360, 116, 2156, 85, 1302, 1840, 34, 103, 205, 11304, 13, 162, 4, 2156, 15267, 38, 21, 10, 39882, 271, 6, 452, 38, 524, 10, 13705, 9, 5, 1144, 4, 2156, 85, 1302, 1840, 34, 103, 205, 11304, 13, 162, 4, 2], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}

<div style="border-radius:10px; border:#FF0000 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

We will focus on the `input_ids`. 

In [6]:
tokens = tokenizer("Why do I sleep less and dream more these days? , It seems God has some good intentions for me. , Yesterday I was a beggar, today I am a prince of the heart. , It seems God has some good intentions for me." , 
                  return_tensors = "pt")["input_ids"]
tokens

tensor([[    0,  7608,   109,    38,  3581,   540,     8,  3366,    55,   209,
           360,   116,  2156,    85,  1302,  1840,    34,   103,   205, 11304,
            13,   162,     4,  2156, 15267,    38,    21,    10, 39882,   271,
             6,   452,    38,   524,    10, 13705,     9,     5,  1144,     4,
          2156,    85,  1302,  1840,    34,   103,   205, 11304,    13,   162,
             4,     2]])

In [7]:
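tokens = []
# Iterate over every row of the merged training frame
for index in tqdm.tqdm(range(len(train)) , total = len(train)):
    stri = ""
    # Join the prompt fields and the student summary into one string
    for columns in ["prompt_question" , "prompt_title" , "prompt_text" , "text"]:
        stri += "\n\n" + str(train[columns][index])
        
    # No truncation is applied here , so some sequences are longer than 512 tokens
    token = tokenizer(stri , return_tensors = "pt")["input_ids"]
    token = [x.detach().numpy().tolist() for x in token]
    tokens.append(token)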

  0%|          | 0/7165 [00:00<?, ?it/s]Token indices sequence length is longer than the specified maximum sequence length for this model (849 > 512). Running this sequence through the model will result in indexing errors
100%|██████████| 7165/7165 [01:21<00:00, 87.95it/s]


In [8]:
# The token lists have different lengths , so store them as an object array
np.save("/kaggle/working/Sample Tokens" , np.array(tokens , dtype = object))


# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#808080; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #808080">4 | DataLoader 📁</p>

<div style="border-radius:10px; border:#808080 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

<img src = "https://jacobwgillespie.com/from-rest-to-graphql-3.jpg" width = 400>

In [9]:
from torch.utils.data import Dataset , DataLoader
import torch

<div style="border-radius:10px; border:#808080 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

Let's first make a simple class 

In [10]:
class DataSet(Dataset):pass

<div style="border-radius:10px; border:#808080 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

Let's initialize the constructor

In [11]:
class DataSet(Dataset):
    
    def __init__(self , target = "content"):pass

<div style="border-radius:10px; border:#808080 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

Now let's load our datasets

In [12]:
class DataSet(Dataset):
    
    def __init__(self , target = "content"):
        
        train = pd.read_csv("/kaggle/input/commonlit-evaluate-student-summaries/prompts_train.csv").merge(
            pd.read_csv("/kaggle/input/commonlit-evaluate-student-summaries/summaries_train.csv") , on = "prompt_id")
        
        self.embeds = np.load("/kaggle/working/Sample Tokens.npy" , allow_pickle = True)
        
        self.content = train["content"]
        self.wording = train["wording"]
        
        self.target = target

<div style="border-radius:10px; border:#808080 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

Now we will just add some getters

In [13]:
class DataSet(Dataset):
    
    def __init__(self , target = "content"):
        
        train = pd.read_csv("/kaggle/input/commonlit-evaluate-student-summaries/prompts_train.csv").merge(
            pd.read_csv("/kaggle/input/commonlit-evaluate-student-summaries/summaries_train.csv") , on = "prompt_id")
        
        self.embeds = np.load("/kaggle/working/Sample Tokens.npy" , allow_pickle = True).tolist()
        
        self.content = train["content"]
        self.wording = train["wording"]
        
        self.target = target
        
    def __len__(self): return self.content.shape[0]
    
    def __getitem__(self , index):
        
        r_embeds = torch.tensor(self.embeds[index] , dtype = torch.long)
        
        if self.target == "content": r_targets = torch.tensor(self.content[index] , dtype = torch.float32)
        if self.target == "wording": r_targets = torch.tensor(self.wording[index] , dtype = torch.float32)
            
        return r_embeds , r_targets

In [14]:
train = DataSet(target = "content")

train_d = DataLoader(train , shuffle = True , batch_size = 1)
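The loader above uses a batch size of $1$ because the token sequences have different lengths and cannot be stacked directly. A minimal sketch of the usual workaround, padding, is shown below in plain Python. The function name `pad_batch` is hypothetical; the pad id $1$ matches the `<pad>` token of the RoBERTa vocabulary. To use it for training, the result would have to be converted to tensors and the attention mask passed on to the model, which this notebook does not do.

```python
def pad_batch(sequences, pad_id=1, max_len=512):
    """Truncate to max_len, then right-pad every sequence to the longest one."""
    clipped = [seq[:max_len] for seq in sequences]
    longest = max(len(seq) for seq in clipped)

    input_ids, attention_mask = [], []
    for seq in clipped:
        n_pad = longest - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask

ids, mask = pad_batch([[0, 7608, 109, 2], [0, 38, 2]])
print(ids)   # [[0, 7608, 109, 2], [0, 38, 2, 1]]
print(mask)  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```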

# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#00FF00; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #00FF00">5 | Model Setup 💻</p>

<div style="border-radius:10px; border:#00FF00 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">


<img src = "https://1.bp.blogspot.com/-SUiFNF4VT1Q/YL1Otnqu-RI/AAAAAAAAicQ/WhJbPcGwJRUb9UuJmxBcUCWBFGcIh57UgCNcBGAsYHQ/s675/E3OEwuMWUAwfU1I.jpg" width = 400>

In [15]:
from transformers import RobertaModel
import torch.nn as nn


<div style="border-radius:10px; border:#00FF00 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

Now let's set up the model that will predict for us 
        
## $5.1 , 5.2$ $|$ $RoBERTa$

Before `training a RoBERTa model`, it is important to first `understand the model in depth`, as this gives insights into its life.
* $How$ $did$ $it$ $originate..?$ 
* $What$ $hurdles$ $did$ $it$ $face...?$
* $How$ $did$ $it$ $overcome$ $them...?$
* $What$ $upgrades$ $did$ $it$ $get$ $in$ $its$ $life...?$
* $How$ $did$ $its$ $life$ $go...?$ 
* $Did$ $it$ $have$ $any$ $heartbreaks...?$
* $Was$ $it$ $a$ $job$ $person$, $or$ $an$ $entrepreneur...?$

This information is `important`, as it makes a `good topic` for `gossip` among `Data Scientists` and other related people

So it was just a normal summer of $July$ $2019$, when some scientists from $FACEBOOK$ $AI$ found that $BERT$, which was released in $2018$, was a person who had a `great amount of knowledge`/`abilities`/`strength`/`LEGO Pieces (very very valuable)`, but was `not living up to its potential`: it was significantly undertrained. The researchers did some `extensive research` and found that, with better `pretraining`/`consulting-sessions`, $BERT$ could `match or beat many of the models` that were `born after it`, which was `extraordinary` in itself. This was kind of a `heartbreak` for $BERT$, but it also `motivated` it to become a `better version` of itself. 

<img src = "https://i.imgflip.com/2uzxes.jpg?a469272" width = 400>

Scientists helped it to `restore its heart` and got it back from depression (yes, it was in depression). It got upgraded in many ways. First was the name: $BERT$ wanted to change its name, so it was given a new one, $RoBERTa$, short for $Robustly$ $Optimized$ $BERT$ $Pretraining$ $Approach$ 

* $Larger$ $Dataset$ - Previously $BERT$ was trained on **[Toronto BookCorpus](https://en.wikipedia.org/wiki/BookCorpus)/[English Wikipedia](https://huggingface.co/datasets/wikipedia)**, about $3.3$ $Billion$ $Words$ ($16GB$ of text). $RoBERTa$ was trained on a much larger collection that adds **[CC-NEWS](https://paperswithcode.com/dataset/cc-news)/[Open Web Text](https://huggingface.co/datasets/Skylion007/openwebtext)/[STORIES](https://paperswithcode.com/dataset/cc-stories)**, about $160GB$ of text in total. This helped it to `gather more knowledge` and be more `resilient` to the cruelty of the world. People name this ability $Robust$
* $Larger$ $Batch$ $Size$ - The `heartbroken version BERT` had a batch size of $256$ sequences, whereas $RoBERTa$ was upgraded to a batch size of $8,000$ sequences

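$RoBERTa$ also `unlocked a new skill` as it progressed. $BERT$, its previous version, chose its `MASKED tokens` once during data preprocessing, so every sequence kept its mask in a `fixed place` (`static`), whereas $RoBERTa$ uses a `Dynamically Changing Masking Pattern` that is generated anew every time a sequence is fed to the model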

Along with new abilities, $RoBERTa$ was also forced to `leave some abilities` behind, one of which was `being able to see ahead in time`, which people also call **Next Sentence Prediction**. Scientists found that $NSP$ was `not that useful in training`, and thus removed it 

With these upgrades, $RoBERTa$ was now able to `match or beat the models released after BERT`. It also `surpassed its own previous version` on the **[GLUE  General Language Understanding Evaluation benchmark](https://huggingface.co/datasets/glue)**, reaching a leaderboard score of $88.5$, a lead of $0.1$ over the previous best of $88.4$. $RoBERTa$ also set new state-of-the-art results on $4/9$ tasks of GLUE, namely **[MNLI](https://cims.nyu.edu/~sbowman/multinli/)/[QNLI](https://paperswithcode.com/dataset/qnli)/[RTE](https://paperswithcode.com/dataset/rte)/[STS-B](https://paperswithcode.com/task/sts-b)**, and aced the exams of **[SQuAD](https://huggingface.co/datasets/squad)/[RACE](https://www.cs.cmu.edu/~glai1/data/race/)**

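The input format `remained the same`: $2$ `concatenated segments` of `tokens` are provided, named $x_1 , x_2 , x_3 , ... , x_N$/$y_1 , y_2 , y_3 , ... , y_M$, under the constraint that their combined length stays below the maximum sequence length, $M+N<T$ with $T=512$. The tokenizer itself did change: $BERT$ uses a WordPiece vocabulary, while $RoBERTa$ uses byte-level BPE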
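As a rough sketch of this packing, the snippet below joins two already-tokenized segments in the `<s> x </s></s> y </s>` layout that the RoBERTa tokenizer uses for sentence pairs. The helper name `pack_pair` is hypothetical. Note that once the $4$ special tokens are counted, the budget becomes $N + M + 4 \le 512$.

```python
BOS, EOS = 0, 2   # ids of <s> and </s> in the RoBERTa vocabulary
MAX_LEN = 512     # T, the maximum sequence length

def pack_pair(x_ids, y_ids, max_len=MAX_LEN):
    """Build <s> x </s></s> y </s> and check that it fits in max_len."""
    packed = [BOS] + x_ids + [EOS, EOS] + y_ids + [EOS]
    if len(packed) > max_len:
        raise ValueError(f"{len(packed)} tokens exceed the limit of {max_len}")
    return packed

print(pack_pair([7608, 109], [38, 3581]))  # [0, 7608, 109, 2, 2, 38, 3581, 2]
```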

Two pretraining objectives were examined for $RoBERTa$
* $Dynamic$ $Masked$ $Language$ $Modeling$ $(MLM)$ - $RoBERTa$ uniformly selects $15$% of the tokens. Out of these $15$%, $80$% are replaced with the `<mask>` token, $10$% are left unchanged, and the remaining $10$% are replaced with a random token. In the static setup this pattern was fixed during preprocessing (the data was duplicated $10$ times so that each sentence was seen with $10$ different masks), whereas dynamic masking draws a new pattern every time a sentence is fed to the model

* $Next$ $Sentence$ $Prediction$ $(NSP)$ - NSP has now become a `questionable training procedure`. The founders of $BERT$ said that it `adds great improvement` to the `models`/`architecture`, but more recent studies show that it is `not required that much` for `training`/`better performance`. Thus `different models were trained`, some with NSP and some without. The ones trained without NSP performed as well or slightly better, so $RoBERTa$ dropped it
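Dynamic masking can be sketched in a few lines of plain Python. This is only an illustration, assuming the standard BERT recipe (select $15$% of the tokens; of those, $80$% become `<mask>`, $10$% become a random token, $10$% stay unchanged). The function name `dynamic_mask` is hypothetical, and the label value $-100$ is borrowed from the PyTorch convention for positions the loss should ignore.

```python
import random

MASK_ID = 50264      # id of <mask> in the roberta-base vocabulary
VOCAB_SIZE = 50265   # vocab_size reported by the tokenizer

def dynamic_mask(input_ids, rng, select_prob=0.15, special_ids=(0, 1, 2)):
    """Return (masked_ids, labels); labels are -100 where nothing is predicted."""
    masked, labels = list(input_ids), [-100] * len(input_ids)
    for i, tok in enumerate(input_ids):
        # Never mask special tokens, and skip tokens that are not selected
        if tok in special_ids or rng.random() >= select_prob:
            continue
        labels[i] = tok                            # the model must recover this token
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK_ID                    # 80%: replace with <mask>
        elif roll < 0.9:
            masked[i] = rng.randrange(VOCAB_SIZE)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return masked, labels

ids = [0, 7608, 109, 38, 3581, 540, 8, 3366, 55, 209, 360, 116, 2]
rng = random.Random(0)
# Each call draws a fresh pattern for the same sentence
for epoch in range(3):
    print(dynamic_mask(ids, rng)[0])
```

Static masking would instead call the function once per sentence during preprocessing and reuse the result in every epoch.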

The Adam optimizer was used for pretraining with the following settings
* $\beta_1$ $=$ $0.9$
* $\beta_2$ $=$ $0.98$
* $\epsilon$ $=$ $1e-6$/$10^{-6}$
* $L_2$ $weight$ $decay$ $=$ $0.01$
* $warmup$ $steps$ $=$ $10,000$
* $GELU$ $activation$
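To show where each of these numbers enters the update, here is a sketch of a single Adam step for one scalar parameter, written in plain Python. The function name `adam_step` and the learning rate `lr = 1e-4` are illustrative assumptions, not values taken from this notebook; the $L_2$ term is added to the gradient, which is how `torch.optim.Adam` applies `weight_decay`.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.98,
              eps=1e-6, weight_decay=0.01):
    """One Adam update for a single scalar parameter (t starts at 1)."""
    grad = grad + weight_decay * param        # L2 penalty added to the gradient
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

param, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    grad = 2 * param                          # gradient of param ** 2
    param, m, v = adam_step(param, grad, m, v, t)
    print(t, param)
```

Here $\epsilon$ only guards against division by zero, while the step size is controlled by the learning rate.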
    
We will be training $2$ different models of the same base of $RoBERTa$
* $RoBERTa$ $Base$
* $RoBERTa$ $Large$

In [16]:
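class model(nn.Module):
    
    def __init__(self):
        super(model, self).__init__()
    
        # Pretrained encoder with a hidden size of 768
        self.r_model = RobertaModel.from_pretrained("roberta-base")

        # Normalization and regression head applied to the pooled embedding
        self.ln = nn.LayerNorm(768)
        self.out = nn.Linear(768, 1)
    
    def forward(self, inputs):
    
        # Last hidden state : [batch , sequence length , 768]
        emb = self.r_model(inputs)[0]
        # Mean pooling over the token dimension
        emb = torch.mean(emb, axis=1)
        
        output = self.ln(emb)
        output = self.out(output)
        
        return output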

In [17]:
class model(nn.Module):
    
    def __init__(self):
        super(model, self).__init__()
    
        self.r_model = RobertaModel.from_pretrained("roberta-large")

        self.ln = nn.LayerNorm(1024)
        self.out = nn.Linear(1024, 1)
    
    def forward(self, inputs):
    
        emb = self.r_model(inputs)[0]
        emb = torch.mean(emb, axis = 1)
        
        output = self.ln(emb)
        output = self.out(output)
        
        return output
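The head shared by both models (mean pooling, `LayerNorm`, then a linear layer) can be checked for shapes without downloading any weights. In the sketch below a random tensor stands in for the encoder output `self.r_model(inputs)[0]`; the batch size of $2$ and the sequence length of $16$ are arbitrary.

```python
import torch
import torch.nn as nn

hidden_size = 768   # 768 for roberta-base, 1024 for roberta-large

# Stand-in for the encoder output: [batch, sequence length, hidden size]
emb = torch.randn(2, 16, hidden_size)

ln = nn.LayerNorm(hidden_size)
out = nn.Linear(hidden_size, 1)

pooled = torch.mean(emb, dim=1)   # average over tokens -> [2, 768]
scores = out(ln(pooled))          # one regression score per input -> [2, 1]

print(pooled.shape, scores.shape)
```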

In [18]:
sample_model = model()

Some weights of the model checkpoint at roberta-large were not used when initializing RobertaModel: ['lm_head.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.layer_norm.weight', 'lm_head.bias', 'lm_head.dense.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#FFA500; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #FFA500">6 | Training Arguments 💾</p>

<div style="border-radius:10px; border:#FFA500 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
<img src = "https://images7.memedroid.com/images/UPLOADED668/5f917ecab395b.jpeg" width = 400>

## $6.1$ $|$ $Loss$ $Function$
    
As we have a Regression Task, a natural choice is the $Mean$ $Squared$ $Error$ $Loss$ $(MSE Loss)$

In [19]:
loss_func = nn.MSELoss()
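For intuition, the same quantity can be computed by hand. The sketch below uses the first three `content` scores from `train.head()` as targets; the predictions are hypothetical values chosen only for illustration.

```python
def mse(predictions, targets):
    """Mean of the squared differences between predictions and targets."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)

# First three `content` scores from train.head(); predictions are hypothetical
targets = [-0.210614, -0.970237, -0.387791]
predictions = [0.0, -1.0, -0.5]

loss = mse(predictions, targets)
print(loss)         # mean squared error
print(loss ** 0.5)  # root mean squared error, in the same units as the scores
```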

<div style="border-radius:10px; border:#FFA500 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

## $6.2$ $|$ $Optimizer$ 
    
* $RoBERTa$ $Base$
* $RoBERTa$ $Large$

In [20]:
optim = torch.optim.Adam(sample_model.parameters() , 
                        lr = 1e-6 , weight_decay = 0.01)

# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#00FFFF; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #00FFFF">7 | Training Loop 💽</p>

<div style="border-radius:10px; border:#00FFFF solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
<img src = "https://programmerhumor.io/wp-content/uploads/2022/10/programmerhumor-io-programming-memes-8dfe737e6fb797b-758x757.jpg" width = 400>

Now let's initialize the training loop
    
I cannot use the `Kaggle GPU` for some reason, which is why I am importing the results from `Wandb`. Below is the code I used 
   
## $7.1$ $|$ $RoBERTa$ $Base$
    
The total time was around $5$ $Minutes$ for $1$ epoch, with a batch size of $1$
    
## $7.2$ $|$ $RoBERTa$ $Large$
    
The total time was around $14$ $Minutes$ for $1$ epoch, with a batch size of $1$

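```
# `ro` is the model instance , already moved to the GPU
wandb.watch(ro , loss_func)

for x , y in tqdm.tqdm(train_d , total = len(train_d)):
    
    torch.cuda.empty_cache()
    
    # Each item is stored as [1 , seq_len] , so drop the extra batch dimension
    x = x[0]
    # RoBERTa accepts at most 512 tokens , so truncate longer inputs
    if x.shape[1] > 512: x = x[: , :512]
    
    x = x.to("cuda")
    y = y.to("cuda")

    # Clear the gradients left over from the previous step
    optim.zero_grad()

    # The model returns shape [1 , 1] , flatten it to match the target shape [1]
    pred = ro(x).view(-1)

    loss = loss_func(pred , y)
    wandb.log({"loss": loss.item()})

    # Backpropagate and update the weights
    loss.backward()

    optim.step()
```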
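The order of calls inside one optimization step is easy to get wrong, so here is a small self-contained sketch on a toy problem. A tiny linear layer stands in for the RoBERTa regressor, and the data, the learning rate and the number of steps are all made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

toy_model = nn.Linear(4, 1)   # stand-in for the RoBERTa regressor
loss_func = nn.MSELoss()
optim = torch.optim.Adam(toy_model.parameters(), lr=1e-2)

x = torch.randn(8, 4)
y = torch.randn(8)

losses = []
for step in range(50):
    optim.zero_grad()              # 1. clear gradients from the previous step
    pred = toy_model(x).view(-1)   # 2. forward pass, flattened to match y
    loss = loss_func(pred, y)      # 3. compute the loss
    loss.backward()                # 4. backpropagate (a call, with parentheses)
    optim.step()                   # 5. update the weights
    losses.append(loss.item())

print(losses[0], losses[-1])
```

If step 4 is written without parentheses, no gradients are computed and `optim.step()` leaves the weights unchanged, so the loss only fluctuates with the data.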

# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#00FFFF; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #00FFFF">8 | Results 🏆</p>

<div style="border-radius:10px; border:#00FFFF solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

<img src = "https://media.tenor.com/xoVXud0uxOgAAAAC/miracle-miracle-miracle.gif">

In [21]:
from IPython.display import IFrame

<div style="border-radius:10px; border:#00FFFF solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">

Let's see how our model performed over the data 
    
## $8.1$ $|$ $RoBERTa$ $Base$

In [22]:
IFrame("https://wandb.ai//ayushsinghal659/RoBERTa%20xx%20Small%7CCommonLit%20Evaluate%20Student/reports/RoBERTa-Small-CommonLit-Evaluate-Student--Vmlldzo0OTE5MDcz" , 1000 , 400)

<div style="border-radius:10px; border:#00FFFF solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
## $8.2$ $|$ $RoBERTa$ $Large$

Run summary (from `Wandb`): `loss` $=$ $1.80729$

In [23]:
IFrame("https://wandb.ai/ayushsinghal659/RoBERTa%20Large%20%7C%20CommonLit/reports/RoBERTa-Large-CommonLit--Vmlldzo0OTQwMjU3" , 1000 , 400)

# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#FFC0CB; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #FFC0CB">9 | TO DO LIST ✔️</p>
<div style="border-radius:10px; border:#FFC0CB solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
<img src = "https://images2.memedroid.com/images/UPLOADED10/513bbf63f07c1.jpeg" width = 400>

* $TO$ $DO$ $LIST$ $1$ $:$ $IMPROVE$ $SCORES$
* $TO$ $DO$ $LIST$ $2$ $:$ $REDUCE$ $TRAINING$ $TIME$
* $TO$ $DO$ $LIST$ $3$ $:$ $UPGRADE$ $MODEL$
* $TO$ $DO$ $LIST$ $4$ $:$ $DANCE$

# <p style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#800080; font-size:140%; text-align:left;padding: 0px; border-bottom: 3px solid #800080">10 | Ending 🏁</p>
<div style="border-radius:10px; border:#800080 solid; padding: 15px; background-color: #F3f9ed; font-size:100%; text-align:left">
    
<img src = "https://media.tenor.com/QGk8WCu_5r8AAAAd/the-rock-stop.gif" width = 400>
    
**THAT'S IT FOR TODAY GUYS**

**WE WILL GO DEEPER INTO THE DATA IN THE UPCOMING VERSIONS**

**PLEASE COMMENT YOUR THOUGHTS, HIGHLY APPRECIATED**

**DON'T FORGET TO UPVOTE, IF YOU LIKED MY WORK  $:)$**
    
<img src = "https://i.imgflip.com/19aadg.jpg">
    
**PEACE OUT $!!!$**