# Fine Tuning T5 for Paraphrasing


### Introduction

In this notebook we fine-tune T5 to paraphrase biased input texts.

#### Flow of the notebook

The notebook is divided into separate sections to provide an organized walkthrough of the process used. This process can be adapted to individual use cases. The sections are:

1. [Preparing Environment and Importing Libraries](#section01)
2. [Preparing the Dataset for data processing: Class](#section02)
3. [Fine Tuning the Model: Function](#section03)
4. [Validating the Model Performance: Function](#section04)
5. [Main Function](#section05)
    * [Importing and Pre-Processing the domain data](#section501)
    * [Creation of Dataset and Dataloader](#section502)
    * [Neural Network and Optimizer](#section503)
    * [Training Model](#section504)


#### Technical Details

This script relies on multiple tools designed by other teams; their details are listed below. Please make sure these elements are present in your setup to run this script successfully.

- **Data**:
	- We are using the News Summary dataset available at [Kaggle](https://www.kaggle.com/sunnysai12345/news-summary)
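	- This dataset is our extended version of MBIC, with the following columns:
		- **text**: a line from a news article
		- **ctext**: a copy of the same line; in `CustomDataset` it is encoded as the model input (source), while `text` is encoded as the target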


- **Language Model Used**: 
    - This notebook uses the transformer model ***T5*** ([Research Paper](https://arxiv.org/abs/1910.10683)), specifically the FLAN-T5 base checkpoint `google/flan-t5-base`.
    - ***T5*** is in many ways a one-of-a-kind transformer architecture: it not only achieves state-of-the-art results on many NLP tasks but also takes a radically different approach to them.
    - **Text-to-Text**: as illustrated in the T5 paper, every NLP task is converted into a **text-to-text** problem. Translation, classification, summarization and question answering are all treated as text-to-text conversion rather than as separate, unique problem statements.


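- **Software and Hardware Requirements**:
	- Python 3.9 and above (the code uses built-in generic type hints such as `list[tuple[float, str]]`)
	- PyTorch and Transformers
	- NLTK and sentence-transformers
	- The standard Python ML libraries (NumPy, pandas)
	- A GPU-enabled setup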
   

- **Script Objective**:
	- The objective of this script is to fine tune **T5** to be able to paraphrase text with reduced bias, while ensuring that the important information from the article is not lost.


<a id='section01'></a>
### Preparing Environment and Importing Libraries

In this step we install the necessary libraries and then import the libraries and modules needed to run the script.
We install:
* transformers
* scipy
* nltk
* sentence-transformers
* accelerate

Libraries imported are:
* Pandas and NumPy
* PyTorch
* PyTorch utilities for Dataset and DataLoader
* Transformers
* T5 model and tokenizer
* NLTK (tokenizer and stop word list)
* sentence-transformers

After that we prepare the device for CUDA execution; this configuration is needed to use an onboard GPU. We first check which GPU is available with the `nvidia-smi` command and then define our device.

In [None]:
# !pip install transformers -q
# !pip install scipy
# !pip install nltk
# !pip install -U sentence-transformers
# !pip install accelerate

In [None]:
# Importing stock libraries
import numpy as np
import pandas as pd
import torch
from torch import nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, RandomSampler, SequentialSampler

import transformers
from transformers import T5Tokenizer, T5ForConditionalGeneration
from torch.optim import AdamW  # transformers.AdamW is deprecated; PyTorch's AdamW is used instead
from torch import softmax
import nltk
import string

from nltk.corpus import stopwords
from torch.optim.lr_scheduler import StepLR, LambdaLR

import csv
import os

from sentence_transformers import SentenceTransformer

nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\uujain2\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\uujain2\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [None]:
# Checking which GPU we have access to.
!nvidia-smi

Mon Apr 10 23:57:57 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 516.59       Driver Version: 516.59       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0 Off |                  N/A |
|  0%   38C    P8    20W / 320W |    267MiB / 10240MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               

In [None]:
# Setting up the device for GPU usage
from torch import cuda
device = 'cuda' if cuda.is_available() else 'cpu'

In [None]:
def clean_text(text):
    # Remove punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))

    # Tokenize text
    words = nltk.word_tokenize(text)

    # Remove stop words
    stop_words = set(stopwords.words("english"))
    words = [word for word in words if word.lower() not in stop_words]

    # Rejoin preprocessed text
    return ' '.join(words)

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CLASSIFIER_MODEL_DIRECTORY = 'C:\\Users\\uujain2\\Desktop\\Utkarsh\\FYP\\Models\\classifier_for_paraphraser'
classifier_model = AutoModelForSequenceClassification.from_pretrained(CLASSIFIER_MODEL_DIRECTORY)
classifier_tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/twitter-roberta-base-sentiment-latest")
classifier_model.to(device)
classifier_model.eval()

classifier = classifier_model

# Running some test code to ensure everything is in order
test_text = "Trump is an amazing person."
test_text = clean_text(test_text)
inputs = classifier_tokenizer(test_text, padding="max_length", max_length=512, truncation=True, return_tensors="pt").to(device)
outputs = classifier(**inputs)
scores = outputs[0][0]
print(scores)
scores = softmax(scores[:2], dim=0)
print(scores)

similarity_checker = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
similarity_checker.to(device)

sentences = ['This is an example sentence', 'Each sentence is converted']

sentence_embeddings = similarity_checker.encode(sentences, convert_to_tensor=True)
similarity = F.cosine_similarity(sentence_embeddings[0].unsqueeze(0), sentence_embeddings[1].unsqueeze(0), dim=1)
# print(similarity.item())


# print(softmax([-1, 1]))

tensor([ 2.4256,  0.9749, -4.0285], device='cuda:0', grad_fn=<SelectBackward0>)
tensor([0.8101, 0.1899], device='cuda:0', grad_fn=<SoftmaxBackward0>)


In [None]:
# Performing a version check (only the last expression in a cell is echoed, so only the torch version is shown)
transformers.__version__
torch.__version__

'1.13.1+cu117'

<a id='section02'></a>
### Preparing the Dataset for data processing: Class

We start by creating the Dataset class, which defines how the text is pre-processed before it is sent to the neural network. This dataset is used by the DataLoader, which feeds the data to the neural network in batches for training and processing.
The Dataloader and Dataset will be used inside the `main()`.
Dataset and Dataloader are constructs of the PyTorch library for defining and controlling the data pre-processing and its passage to neural network. For further reading into Dataset and Dataloader read the [docs at PyTorch](https://pytorch.org/docs/stable/data.html)

#### *CustomDataset* Dataset Class
- This class accepts a dataframe as input and generates the tokenized output used by the **T5** model for training.
- We use the **T5** tokenizer to tokenize the data in the `text` and `ctext` columns of the dataframe.
- The tokenizer's `batch_encode_plus` method performs tokenization and generates the necessary outputs: `source_ids` and `source_mask` from the input text (`ctext`), and `target_ids` from the target text (`text`).
- To read further into the tokenizer, [refer to this document](https://huggingface.co/transformers/model_doc/t5.html#t5tokenizer)
- The *CustomDataset* class is used to create 2 datasets, one for training and one for validation.
- *Training Dataset* is used to fine-tune the model: **80% of the original data**
- *Validation Dataset* is used to evaluate the performance of the model: the remaining 20%, which the model does not see during training.

#### Dataloader: Called inside the `main()`
- The DataLoader is used to create the training and validation loaders, which feed data to the neural network in a controlled manner. This is needed because the whole dataset cannot be passed through the network at once, so the amount of data loaded into memory and processed at each step has to be controlled.
- The number of examples per step is set by `batch_size`, and the sequence lengths by the `MAX_LEN` and `SUMMARY_LEN` values passed to the dataset.
- The training and validation dataloaders are used in the training and validation parts of the flow, respectively.
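As a minimal sketch of how a `Dataset` and a `DataLoader` fit together, the hypothetical `ToyPairDataset` below takes already-tokenized (source, target) ID lists instead of raw text, pads or truncates them to fixed lengths, and returns integer (`torch.long`) tensors. The pad ID 0 matches T5's padding token; the token IDs themselves are made up.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyPairDataset(Dataset):
    """Hypothetical stand-in for CustomDataset working on pre-tokenized ID lists."""

    def __init__(self, pairs, source_len, target_len, pad_id=0):
        self.pairs = pairs
        self.source_len = source_len
        self.target_len = target_len
        self.pad_id = pad_id

    def _pad(self, ids, length):
        ids = ids[:length]                                   # truncate long sequences
        mask = [1] * len(ids) + [0] * (length - len(ids))    # 1 = real token, 0 = padding
        ids = ids + [self.pad_id] * (length - len(ids))      # pad short sequences
        return torch.tensor(ids, dtype=torch.long), torch.tensor(mask, dtype=torch.long)

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, index):
        source, target = self.pairs[index]
        source_ids, source_mask = self._pad(source, self.source_len)
        target_ids, target_mask = self._pad(target, self.target_len)
        return {"source_ids": source_ids, "source_mask": source_mask,
                "target_ids": target_ids, "target_mask": target_mask}


pairs = [([5, 6, 7, 1], [8, 9, 1]), ([10, 1], [11, 12, 13, 14, 1]), ([2, 3, 1], [4, 1])]
loader = DataLoader(ToyPairDataset(pairs, source_len=6, target_len=4), batch_size=2, shuffle=False)

for batch in loader:
    print({key: tuple(value.shape) for key, value in batch.items()})
```

The DataLoader stacks the per-example dictionaries into batch tensors, so with three examples and `batch_size=2` the last batch simply has one row.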

In [None]:
def measure_func_2_torch(P, Q): # final version
    """
    Bias score comparing classifier outputs for a reference text and a generated text.

    P: 1-D tensor, classifier probability of class 0 for each reference text
       (in train(), the decoded target)
    Q: 1-D tensor, classifier probability of class 0 for each produced output text

    returns: 1-D tensor of (P - Q) / (P * log(P / Q)) - 1, element-wise.
        Range (-1, inf): approximately 0 when Q == P, negative when Q < P,
        and growing without bound as Q / P grows.
    """
    P = P + 0.000001
    mask = torch.eq(P, Q) # finds all elements in the vectors that are identical

    indexes = torch.where(mask)[0] # indices where P == Q
    P[indexes] = P[indexes] + 0.000001 # nudges P slightly where P == Q to avoid 0/0 in the formula

    PQ_log = torch.log(P/Q)
    delta_PQ = (P-Q)

    return 1/((P/delta_PQ)*PQ_log) - 1

#Performing a Test Run
# measure_func_2_torch(torch.Tensor(np.array([0.9])), torch.Tensor(np.array([0.91]))) - similarity.to("cpu")
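To make the formula easier to reason about, here is a scalar, pure-Python re-implementation: a sketch in which `bias_measure` and `log_mean` are hypothetical helper names and `math` replaces `torch`. It relies on the identity that (P - Q) / (ln P - ln Q) is the logarithmic mean of P and Q, so the score equals that mean divided by P, minus 1: about 0 when Q is close to P, between -1 and 0 when Q < P, and unbounded above as Q grows relative to P.

```python
import math


def bias_measure(p: float, q: float, eps: float = 1e-6) -> float:
    """Scalar version of measure_func_2_torch: (p - q) / (p * ln(p / q)) - 1."""
    p = p + eps      # the same shift the notebook applies to every P
    if p == q:       # the same extra nudge where P == Q, avoiding 0 / 0
        p = p + eps
    return (p - q) / (p * math.log(p / q)) - 1


def log_mean(a: float, b: float) -> float:
    """Logarithmic mean of two distinct positive numbers."""
    return (a - b) / (math.log(a) - math.log(b))


for p, q in [(0.8, 0.8), (0.8, 0.2), (0.2, 0.8)]:
    print(f"P={p}, Q={q}: {bias_measure(p, q):+.4f}")
```

Because the score is a ratio of probabilities, it reacts much more strongly when Q rises above P than when it falls below it: the lower side is bounded at -1.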

In [None]:
def clean_input(text: str) -> str:
    # Same preprocessing as clean_text(): remove punctuation, tokenize, drop English stop words
    return clean_text(text)

In [None]:
# Creating a custom dataset for reading the dataframe and loading it into the dataloader to pass it to the neural network at a later stage for finetuning the model and to prepare it for predictions

class CustomDataset(Dataset):

    def __init__(self, dataframe, tokenizer, source_len, summ_len):
        self.tokenizer = tokenizer
        self.data = dataframe
        self.source_len = source_len
        self.summ_len = summ_len
        self.text = self.data.text
        self.ctext = self.data.ctext

    def __len__(self):
        return len(self.text)

    def __getitem__(self, index):
        ctext = str(self.ctext[index])
        ctext = ' '.join(ctext.split())

        text = str(self.text[index])
        text = ' '.join(text.split())

        source = self.tokenizer.batch_encode_plus([ctext], max_length=self.source_len, padding='max_length', truncation=True, return_tensors='pt')
        target = self.tokenizer.batch_encode_plus([text], max_length=self.summ_len, padding='max_length', truncation=True, return_tensors='pt')

        source_ids = source['input_ids'].squeeze()
        source_mask = source['attention_mask'].squeeze()
        target_ids = target['input_ids'].squeeze()
        target_mask = target['attention_mask'].squeeze()

        # Token IDs and masks stay integer (torch.long): float16 cannot represent
        # token IDs above 2048 exactly, which would silently corrupt the tokens.
        return {
            'source_ids': source_ids.to(dtype=torch.long),
            'source_mask': source_mask.to(dtype=torch.long),
            'target_ids': target_ids.to(dtype=torch.long),
            'target_mask': target_mask.to(dtype=torch.long)
        }
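Token IDs should not be stored as `torch.float16`. That type is IEEE 754 half precision, which represents every integer exactly only up to 2048, while T5's vocabulary has roughly 32,000 token IDs. A quick standard-library check (the `struct` format `'e'` is the same half-precision format; `through_float16` is a hypothetical helper) shows the rounding:

```python
import struct


def through_float16(token_id: int) -> int:
    """Round-trip an integer through IEEE 754 half precision (struct format 'e')."""
    return int(struct.unpack("<e", struct.pack("<e", float(token_id)))[0])


for token_id in [5, 2048, 2049, 32099]:
    print(token_id, "->", through_float16(token_id))
```

Above 16384 the spacing between representable half-precision values is 16, so a high token ID can turn into a completely different token when it is cast back to `torch.long`.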

<a id='section03'></a>
### Fine Tuning the Model: Function

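Here we define a training function that trains the model on the training dataset created above for a specified number of epochs (`TRAIN_EPOCHS`); one epoch is one complete pass of the data through the network.

This function is called in the `main()`

The following events happen in this function to fine-tune the neural network:
- The epoch, tokenizer, model, device, training dataloader, optimizer, bias classifier, scheduler and initial learning rate are passed to `train()` when it is called from `main()`.
- The dataloader passes data to the model based on the batch size.
- `lm_labels` are computed from the `target_ids` (padding positions are set to `-100`), and `source_ids` and `source_mask` are extracted.
- The first element of the model output is the language-modelling loss for the forward pass.
- The model also generates a paraphrase for each input; the bias classifier scores the generated and target texts, and `measure_func_2_torch` turns the two scores into a bias term `Reg`. The training loss is `0.2 * LM loss + Reg` for each example, summed over the batch. Because `Reg` is computed from decoded text, it carries no gradient back into T5: it changes the logged loss value, but only the language-modelling term updates the weights.
- Every 128 steps the loss is printed and logged and a checkpoint is saved; every 512 steps the learning rate is decayed.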
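The label preparation in `train()` can be illustrated on a single sequence. This is a sketch with made-up token IDs; `shift_for_teacher_forcing` is a hypothetical helper that mirrors the `y_ids` / `lm_labels` lines. For comparison, when only `labels` is passed, `T5ForConditionalGeneration` builds the decoder inputs itself by shifting the labels one position to the right behind the decoder start token (the pad token for T5). With the manual shift used here, the first target token is fed to the decoder but never predicted.

```python
from typing import List, Tuple

PAD_ID = 0           # T5's padding token ID
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss


def shift_for_teacher_forcing(target_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Mirror of the y_ids / lm_labels lines in train(), for one sequence."""
    decoder_input_ids = target_ids[:-1]                                    # y[:, :-1]
    labels = [IGNORE_INDEX if t == PAD_ID else t for t in target_ids[1:]]  # y[:, 1:], pads -> -100
    return decoder_input_ids, labels


# Made-up IDs: three content tokens, the EOS token (ID 1 in T5) and two pads
decoder_input_ids, labels = shift_for_teacher_forcing([363, 19, 8, 1, 0, 0])
print(decoder_input_ids)
print(labels)
```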

In [None]:
# The learning rate decay function
def lr_decay(curr_lr):
    return curr_lr*0.995
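`LambdaLR` multiplies the optimizer's base learning rate by `lr_lambda(step)`, where `step` counts scheduler steps, and it already applies `lr_lambda(0)` when it is constructed. A small check (a sketch using a throwaway `torch.nn.Linear` model; `initial_lr` and `lr_after` are hypothetical helpers) shows that passing `lr_decay` as the lambda starts training at a learning rate of 0, while `lambda step: 0.995 ** step` starts at the base rate and decays by 0.995 per scheduler step:

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

BASE_LR = 2e-3  # LEARNING_RATE in main()


def lr_decay(curr_lr):
    # Same function as in the notebook cell above
    return curr_lr * 0.995


def initial_lr(lr_lambda) -> float:
    """Learning rate the optimizer holds right after LambdaLR is constructed."""
    model = torch.nn.Linear(2, 1)  # throwaway model, only needed for its parameters
    optimizer = AdamW(model.parameters(), lr=BASE_LR)
    LambdaLR(optimizer, lr_lambda=lr_lambda)  # construction already applies lr_lambda(0)
    return optimizer.param_groups[0]["lr"]


def lr_after(n_steps: int) -> float:
    """Learning rate after n scheduler steps with a 0.995-per-step decay."""
    model = torch.nn.Linear(2, 1)
    optimizer = AdamW(model.parameters(), lr=BASE_LR)
    scheduler = LambdaLR(optimizer, lr_lambda=lambda step: 0.995 ** step)
    for _ in range(n_steps):
        optimizer.step()   # no gradients here, so the parameters are left unchanged
        scheduler.step()
    return scheduler.get_last_lr()[0]


print(initial_lr(lr_decay))                    # lr_decay(0) == 0, so the LR starts at zero
print(initial_lr(lambda step: 0.995 ** step))  # starts at BASE_LR
print(lr_after(3))                             # BASE_LR * 0.995 ** 3
```

With `TRAIN_BATCH_SIZE = 4` and the 15,999 training rows reported in the training log, one epoch has 4,000 batches, so a decay on every 512th step (counters 0, 512, ..., 3584) happens 8 times per epoch.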

In [None]:
# Creating the training function. This will be called in the main function once per epoch.
# The model is put into train mode, then we enumerate over the training loader and pass each batch to the network.
bias_vals = []
saved: int = 0
saved_models: list[tuple[float, str]] = []  # (loss, checkpoint name) for each checkpoint kept on disk
MODEL_CHECKPOINT_ST0RE_BUFFER = 64  # once more than this many checkpoints are saved, only improvements replace the worst one

def train(epoch, tokenizer, model, device, loader, optimizer, classifier, scheduler, LR):
    curr_lr = LR
    def my_broadcaster(to_be_broadcasted, fin_arr):
        n = len(to_be_broadcasted)
        k = len(fin_arr[0])
        fin = []
        for x in range(n):
            fin.append([to_be_broadcasted[x] for _ in range(k)])

        return torch.tensor(fin)
    
    model.train()
    total_runs = len(loader)
    for counter, data in enumerate(loader, 0):
        # print(x)
        y = data['target_ids'].to(device, dtype = torch.long)
        y_ids = y[:, :-1].contiguous()
        lm_labels = y[:, 1:].clone().detach()
        lm_labels[y[:, 1:] == tokenizer.pad_token_id] = -100
        ids = data['source_ids'].to(device, dtype = torch.long)
        mask = data['source_mask'].to(device, dtype = torch.long)

        outputs = model(input_ids = ids, attention_mask = mask, decoder_input_ids=y_ids, labels=lm_labels)
        
        with torch.no_grad():
            generated_ids = model.generate(
                input_ids = ids,
                attention_mask = mask, 
                max_length=180, 
                num_beams=2,
                repetition_penalty=2.5, 
                length_penalty=1.0, 
                early_stopping=True
            )
            
            preds = [clean_input(tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True)) for g in generated_ids]
            targets = [clean_input(tokenizer.decode(t, skip_special_tokens=True, clean_up_tokenization_spaces=True)) for t in y]
           
            tokenized_preds = [classifier_tokenizer(pred, padding="max_length", max_length=512, truncation=True, return_tensors="pt").to(device) for pred in preds]
            tokenized_targets = [classifier_tokenizer(target, padding="max_length", max_length=512, truncation=True, return_tensors="pt").to(device) for target in targets]

            # Probabilities over the first two classifier classes for each prediction / target
            classified_preds = [softmax(classifier(**tokenized_pred)[0][0][:2], dim=0) for tokenized_pred in tokenized_preds]
            classified_targets = [softmax(classifier(**tokenized_target)[0][0][:2], dim=0) for tokenized_target in tokenized_targets]

        # Bias term per example: P = class-0 probability of the target, Q = class-0 probability of the prediction.
        # These values come from decoded text, so Reg carries no gradient back into T5.
        Reg = measure_func_2_torch(torch.stack([classified_target[0] for classified_target in classified_targets]),
                                   torch.stack([classified_pred[0] for classified_pred in classified_preds]))

        # Cosine similarity between each prediction and its own target (computed for reference; not part of the loss)
        n_preds = len(preds)
        embeddings = similarity_checker.encode(preds + targets, convert_to_tensor=True).to(device)
        simi = F.cosine_similarity(embeddings[:n_preds], embeddings[n_preds:], dim=1) * 1.4

        new_mid_L = (2*Reg) - simi  # not used in the loss below

        # 0.2 * LM loss + bias term for each example, summed over the batch
        loss = (0.2*outputs[0] + Reg).sum()

        iter_loss = loss.item()
        
        if counter % 128 == 0:
            print(f"Loss: {iter_loss}")
            print(f"Completed: {(counter/total_runs)*100:.3f}%")

            # Append the logged loss to a CSV file
            with open(r"C:\Users\uujain2\Desktop\vals.csv", 'a', encoding="utf-8") as results_file:
                file_writer = csv.writer(results_file, delimiter=',', lineterminator='\n')
                file_writer.writerow([iter_loss])

            # saved_models is kept sorted by loss, highest first, so saved_models[0] is the worst
            # checkpoint on disk. Once the buffer is full, a new checkpoint replaces it only if its loss is lower.
            save_allow = False

            global saved
            if saved > MODEL_CHECKPOINT_ST0RE_BUFFER and iter_loss < saved_models[0][0]:
                save_allow = True
                print(f"Removing {saved_models[0][1]} in favour of {epoch}_{counter}")
                os.remove(f"C:\\Users\\uujain2\\Desktop\\Utkarsh\\FYP\\Models\\T5\\{saved_models[0][1]}.pt")
                del saved_models[0]
            elif saved > MODEL_CHECKPOINT_ST0RE_BUFFER:
                print(f"Not saving {epoch}_{counter}")

            if saved <= MODEL_CHECKPOINT_ST0RE_BUFFER or save_allow:
                torch.save(model, f"C:\\Users\\uujain2\\Desktop\\Utkarsh\\FYP\\Models\\T5\\{epoch}_{counter}.pt")
                saved_models.append((iter_loss, f"{epoch}_{counter}"))
                saved += 1
                print(f"saved {epoch}_{counter}")
                # Re-sort after every save (not only after a replacement) so the first eviction also targets the worst checkpoint
                saved_models.sort(key=lambda x: x[0], reverse=True)
        
        optimizer.zero_grad()
        loss.sum().backward()
        optimizer.step()
        
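        # Every 512 steps, decay the learning rate by 0.995 (lr_decay) and write it into the optimizer.
        # LambdaLR.step(x) would treat x as an epoch index rather than a learning rate, so the new
        # value is set on the optimizer's parameter groups directly instead of via the scheduler.
        if counter % 512 == 0:
            curr_lr = lr_decay(curr_lr)
            for param_group in optimizer.param_groups:
                param_group['lr'] = curr_lr
            print(f"New lr:{curr_lr}")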
        

<a id='section04'></a>
### Validating the Model Performance: Function

During the validation stage we pass the unseen data (the validation dataset), the trained model, the tokenizer and the device details to the function to perform the validation run. This step generates new paraphrases for data the model did not see during training.

This function is called in the `main()`

This unseen data is the 20% of the dataset that was separated during the Dataset creation stage.
During the validation stage the weights of the model are not updated. We use the `generate` method to produce the new text.

It relies on beam search decoding (`num_beams=2` here), a sequence-generation method for models with a language-modelling (LM) head.

The generated text and the original target text are decoded from tokens to text and returned to the `main()`
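A toy illustration of beam search shows why keeping two beams can find a better sequence than greedy decoding. This is not the Transformers implementation: there are no length or repetition penalties, and a hypothetical hard-coded next-token table (`TOY_TABLE`) stands in for the model.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"

# Hypothetical next-token probabilities, keyed by the tokens generated so far
TOY_TABLE: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"the": 0.5, "a": 0.4, EOS: 0.1},
    ("the",): {"cat": 0.4, "dog": 0.3, EOS: 0.3},
    ("a",): {"cat": 0.9, EOS: 0.1},
}


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Any prefix not in the table ends the sequence
    return TOY_TABLE.get(prefix, {EOS: 1.0})


def beam_search(next_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
                num_beams: int = 2, max_length: int = 5) -> List[str]:
    beams: List[Tuple[float, Tuple[str, ...]]] = [(0.0, ())]  # (log-probability, tokens)
    finished: List[Tuple[float, Tuple[str, ...]]] = []
    for _ in range(max_length):
        # Expand every live beam by every possible next token
        candidates = [
            (score + math.log(p), seq + (tok,))
            for score, seq in beams
            for tok, p in next_probs(seq).items()
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates:
            if seq[-1] == EOS:
                finished.append((score, seq))  # completed hypothesis
            else:
                beams.append((score, seq))     # keep as a live beam
            if len(beams) == num_beams:
                break
        if not beams:
            break
    _, best_seq = max(finished or beams, key=lambda c: c[0])
    return list(best_seq)


print(beam_search(toy_model, num_beams=1))  # greedy decoding
print(beam_search(toy_model, num_beams=2))
```

Greedy decoding commits to "the" (0.5) and ends with probability 0.5 × 0.4 = 0.2, while two beams keep "a" alive and find "a cat" with probability 0.4 × 0.9 = 0.36.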

In [None]:
def validate(epoch, tokenizer, model, device, loader):
    model.eval()
    predictions = []
    actuals = []
    with torch.no_grad():
        for step, data in enumerate(loader, 0):
            # Token IDs and masks must be integer tensors; the embedding lookup rejects float16 inputs
            y = data['target_ids'].to(device, dtype = torch.long)
            ids = data['source_ids'].to(device, dtype = torch.long)
            mask = data['source_mask'].to(device, dtype = torch.long)
            generated_ids = model.generate(
                input_ids = ids,
                attention_mask = mask,
                max_length=150,
                num_beams=2,
                repetition_penalty=2.5,
                length_penalty=1.0,
                early_stopping=True
                )
            preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in generated_ids]
            target = [tokenizer.decode(t, skip_special_tokens=True, clean_up_tokenization_spaces=True) for t in y]
            if step % 100 == 0:
                print(f'Completed {step}')

            predictions.extend(preds)
            actuals.extend(target)
    return predictions, actuals

<a id='section05'></a>
### Main Function

The `main()` as the name suggests is the central location to execute all the functions/flows created above in the notebook. The following steps are executed in the `main()`:


<a id='section501'></a>
#### Importing and Pre-Processing the domain data

We will be working with the data and preparing it for fine-tuning.
*This assumes the dataset CSV is already available at the path set in `T5_DATA_PATH`.*

* The CSV file is read into a dataframe.
* Only the `text` and `ctext` columns are kept.
* The prefix `summarize: ` is prepended to the `ctext` column (the model input). This is done because **T5** used the same prefix format for its summarization task.
* The final dataframe will look something like this:

|text|ctext|
|--|--|
|biased input 1|summarize: biased input 1 copy|
|biased input 2|summarize: biased input 2 copy|
|biased input 3|summarize: biased input 3 copy|

* Top 5 rows of the dataframe are printed on the console.
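The preprocessing and the 80/20 split used in `main()` can be reproduced on a tiny frame. This is a sketch: the rows and the extra `source` column are hypothetical stand-ins for the CSV at `T5_DATA_PATH`. The `reset_index(drop=True)` calls matter because `CustomDataset.__getitem__` looks rows up by label (`self.ctext[index]`), so both splits need a fresh 0..n-1 index.

```python
import pandas as pd

SEED = 42

# Hypothetical toy rows standing in for the real CSV
df = pd.DataFrame({
    "text": [f"biased input {i}" for i in range(1, 6)],
    "ctext": [f"biased input {i} copy" for i in range(1, 6)],
    "source": ["site"] * 5,  # an extra column that gets dropped
})

df = df[["text", "ctext"]].copy()          # keep only the needed columns
df["ctext"] = "summarize: " + df["ctext"]  # T5-style task prefix on the input column

# 80/20 split: sample 80% of the rows; the remaining rows (by index) form the validation set
train_df = df.sample(frac=0.8, random_state=SEED)
val_df = df.drop(train_df.index).reset_index(drop=True)
train_df = train_df.reset_index(drop=True)

print(train_df.shape, val_df.shape)
```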

<a id='section502'></a>
#### Creation of Dataset and Dataloader

* The updated dataframe is divided in an 80/20 ratio into training and validation sets.
* Both dataframes are passed to the `CustomDataset` class for tokenization of the input and target texts.
* The tokenization uses the length parameters passed to the class.
* Training and validation parameters are defined and passed to the PyTorch `DataLoader` to create the training and validation data loaders.
* These dataloaders are passed to `train()` and `validate()` respectively.
* The shape of each dataset is printed to the console.


<a id='section503'></a>
#### Neural Network and Optimizer

* In this stage we define the model and optimizer that will be used for training and to update the weights of the network. 
* We use the FLAN-T5 base checkpoint (`google/flan-t5-base`) for this project. You can read about the `T5 model` and its features above.
* We use the `T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")` command to define our model. `T5ForConditionalGeneration` adds a language-modelling head to the `T5 model`, which lets us generate text with the trained model.
* We use the `AdamW` optimizer. It can be swapped for other optimizers to compare how they perform with different learning rates.


<a id='section504'></a>
#### Training Model

* Now we call `train()` with all the necessary parameters.
* Learning Rate at every 512th step is re-calculated and printed on the console.
* Loss at every 128th step is logged and a checkpoint is saved.
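The checkpoint rule in `train()` keeps a bounded set of the lowest-loss checkpoints. A minimal alternative sketch, assuming only the loss and a checkpoint name are tracked, uses a heap so the worst kept checkpoint is always at index 0 without re-sorting. `BestCheckpointPool` is a hypothetical helper, not part of the notebook; the example losses are the first values from the `saved_models` output below.

```python
import heapq
from typing import List, Optional, Tuple


class BestCheckpointPool:
    """Tracks the names of the `capacity` lowest-loss checkpoints seen so far."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        # Max-heap on loss via negation: self._heap[0] is always the worst kept checkpoint
        self._heap: List[Tuple[float, str]] = []

    def offer(self, loss: float, name: str) -> Tuple[bool, Optional[str]]:
        """Returns (should_save, name_of_checkpoint_to_delete_or_None)."""
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (-loss, name))
            return True, None
        worst_neg_loss, worst_name = self._heap[0]
        if loss < -worst_neg_loss:
            heapq.heapreplace(self._heap, (-loss, name))  # pop the worst, push the new one
            return True, worst_name
        return False, None

    def kept(self) -> List[Tuple[float, str]]:
        """Kept checkpoints as (loss, name), best first."""
        return sorted((-neg_loss, name) for neg_loss, name in self._heap)


demo_pool = BestCheckpointPool(capacity=2)
for loss, name in [(7.765748500823975, "0_0"), (9.563030242919922, "0_128"),
                   (0.2390291690826416, "0_256"), (1.8649336099624634, "0_384")]:
    print(name, demo_pool.offer(loss, name))
print(demo_pool.kept())
```

In `train()` the returned name would be passed to `os.remove` and `should_save` would gate `torch.save`.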

In [None]:
def main():
    # WandB – Initialize a new run
    #wandb.init(project="transformers_tutorials_summarization")

    # WandB – Config is a variable that holds and saves hyperparameters and inputs
    # Defining some key variables that will be used later on in the training  
    #config = wandb.config          # Initialize config
    TRAIN_BATCH_SIZE = 4    # input batch size for training (default: 64)
    VALID_BATCH_SIZE = 1    # input batch size for testing (default: 1000)
    TRAIN_EPOCHS = 2        # number of epochs to train (default: 10)
    VAL_EPOCHS = 1 
    LEARNING_RATE = 2e-3    # learning rate (default: 0.01)
    SEED = 42               # random seed (default: 42)
    MAX_LEN = 512
    SUMMARY_LEN = 150 

    # Set random seeds and deterministic pytorch for reproducibility
    # torch.manual_seed(SEED) # pytorch random seed
    # np.random.seed(SEED) # numpy random seed
    # torch.backends.cudnn.deterministic = True

    # tokenizer for encoding the text
    print("Loading Tokenizer")
    tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base", max_length=512, truncation=True)
    print("tokenizer loaded")
    T5_DATA_PATH = "C:\\Users\\uujain2\\Desktop\\Utkarsh\\FYP\\Dataset\\data\\BIG_DS\\big_ds_filtered_cleaned_t5_reduced3.csv"

    # Importing and Pre-Processing the domain data
    # Selecting the needed columns only.
    # Adding the summarize prefix in front of the input text (ctext). This formats the dataset similarly to how T5 was trained for summarization.
    df = pd.read_csv(T5_DATA_PATH, encoding='utf-8')
    df = df[['text','ctext']]
    df.ctext = 'summarize: ' + df.ctext
    print(df.head())

    
    # Creation of Dataset and Dataloader
    # Defining the train size. So 80% of the data will be used for training and the rest will be used for validation. 
    train_size = 0.8
    train_dataset=df.sample(frac=train_size,random_state = SEED)
    val_dataset=df.drop(train_dataset.index).reset_index(drop=True)
    train_dataset = train_dataset.reset_index(drop=True)

    print("FULL Dataset: {}".format(df.shape))
    print("TRAIN Dataset: {}".format(train_dataset.shape))
    print("TEST Dataset: {}".format(val_dataset.shape))


    # Creating the Training and Validation dataset for further creation of Dataloader
    training_set = CustomDataset(train_dataset, tokenizer, MAX_LEN, SUMMARY_LEN)
    val_set = CustomDataset(val_dataset, tokenizer, MAX_LEN, SUMMARY_LEN)

    # Defining the parameters for creation of dataloaders
    train_params = {
        'batch_size': TRAIN_BATCH_SIZE,
        'shuffle': True,
        'num_workers': 0
        }

    val_params = {
        'batch_size': VALID_BATCH_SIZE,
        'shuffle': False,
        'num_workers': 0
        }

    # Creation of Dataloaders for training and validation. These are used in the training and validation stages below.
    training_loader = DataLoader(training_set, **train_params)
    val_loader = DataLoader(val_set, **val_params)


    
    # Defining the model. We use flan-t5-base, whose language-modelling head generates the paraphrased text.
    # The model is then sent to the device (GPU/CPU).
    model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base", torch_dtype=torch.float16)
    model = model.to(device)

    # Defining the optimizer that will be used to tune the weights of the network in the training session.
    # optimizer = torch.optim.Adam(params =  model.parameters(), lr=LEARNING_RATE)

    optimizer = AdamW(params=model.parameters(), lr=LEARNING_RATE)
    # LambdaLR multiplies the base LR by lr_lambda(step) and applies lr_lambda(0) on construction.
    # 0.995 ** 0 == 1 keeps the initial LR at LEARNING_RATE, whereas lr_decay(0) would set it to 0.
    scheduler = LambdaLR(optimizer, lr_lambda=lambda step: 0.995 ** step)

    # Training loop
    print('Initiating Fine-Tuning for the model on our dataset')

    for epoch in range(TRAIN_EPOCHS):
        train(epoch, tokenizer, model, device, training_loader, optimizer, classifier, scheduler, LEARNING_RATE)
        print(f"Done with epoch:{epoch+1}")


    # Validation loop: save the predictions and actuals in a dataframe
    # and write it to ./models/predictions.csv
    print('Now generating paraphrases on our fine-tuned model for the validation dataset and saving them in a dataframe')
    os.makedirs('./models', exist_ok=True)  # to_csv fails if the directory does not exist
    for epoch in range(VAL_EPOCHS):
        predictions, actuals = validate(epoch, tokenizer, model, device, val_loader)
        final_df = pd.DataFrame({'Generated Text':predictions,'Actual Text':actuals})
        final_df.to_csv('./models/predictions.csv')
        print('Output Files generated for review')

if __name__ == '__main__':
    main()
    print(bias_vals)

    # import json
    # with open("C:\\Users\\uujain2\\Desktop/vals.json", 'w', encoding="utf-8") as json_file:
    #     json.dump(bias_vals, json_file)

Loading Tokenizer
tokenizer loaded
                                                text  \
0  Rewrite this in a neutral tone: So while there...   
1  Rewrite this in a neutral tone: The Republican...   
2  Rewrite this in a neutral tone: But one glarin...   
3  Rewrite this in a neutral tone: Track and fiel...   
4  Rewrite this in a neutral tone: In other words...   

                                               ctext  
0  summarize: So while there may be a humanitaria...  
1  summarize: The Republican president assumed he...  
2  summarize: But one glaring absentee was Trump,...  
3  summarize: Track and field athletes dont typic...  
4  summarize: In other words, the agency responsi...  
FULL Dataset: (19999, 2)
TRAIN Dataset: (15999, 2)
TEST Dataset: (4000, 2)


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Initiating Fine-Tuning for the model on our dataset
Loss: 15.114421844482422
Completed: 0.000%
saved 0_0
New lr:0.00199


KeyboardInterrupt: 

In [None]:
# saved_models is a module-level list, so it can be inspected directly after training
print(saved_models)

[(7.765748500823975, '0_0'), (9.563030242919922, '0_128'), (0.2390291690826416, '0_256'), (1.8649336099624634, '0_384'), (1.406278371810913, '0_512'), (2.6536340713500977, '0_640'), (2.3441452980041504, '0_768'), (-0.21348899602890015, '0_896'), (3.0647122859954834, '0_1024'), (1.796486258506775, '0_1152'), (0.06873077154159546, '0_1280'), (5.453207492828369, '0_1408'), (0.8677466511726379, '0_1536'), (1.6267653703689575, '0_1664'), (1.7288503646850586, '0_1792'), (1.6235100030899048, '0_1920'), (-0.1930813193321228, '0_2048'), (0.09040826559066772, '0_2176'), (1.5846877098083496, '0_2304'), (1.8786935806274414, '0_2432'), (1.6498414278030396, '0_2560'), (7.320224761962891, '0_2688'), (3.649092435836792, '0_2816'), (0.39499974250793457, '0_2944'), (2.6114909648895264, '0_3072'), (1.270829439163208, '0_3200'), (-0.6853293776512146, '0_3328'), (2.223156452178955, '0_3456'), (0.23672521114349365, '0_3584'), (0.7064460515975952, '0_3712'), (0.16357320547103882, '0_3840'), (0.93410474061965