### Importing libraries

In [7]:
!!pip install -q -U transformers datasets accelerate peft trl bitsandbytes wandb

[]

> - **Transformers**: Hugging Face's core library of pre-trained models and tools for natural language processing (NLP) tasks such as text generation, translation, and summarization.
>
> - **Datasets**: Another Hugging Face library. It gives easy access to common datasets for training and evaluating machine learning models, in NLP and other domains.
>
> - **Accelerate**: A Hugging Face library that lets the same PyTorch training code run on different hardware setups (CPU, a single GPU, multiple GPUs, or TPUs) and handles device placement.
>
> - **Parameter-Efficient Fine-Tuning (PEFT)**: PEFT methods fine-tune only a small subset of the model parameters, which reduces computational and storage costs.
>
> - **Transformer Reinforcement Learning (TRL)**: Provides the `SFTTrainer` class used here for supervised fine-tuning.
>
> - **bitsandbytes**: Provides the low-bit quantisation used to load the model in 4-bit.
>
> - **Weights & Biases (W&B)**: A machine learning experiment tracking and visualization tool. The `wandb` library lets you log and visualize training metrics, hyperparameters, and other experiment-related information.

### Huggingface🤗 token

In [8]:
import os
hf_token = os.environ["HF_TOKEN"]  # read the token from the environment; never hard-code secrets in a notebook

> - #### **LoRA**
> Low-Rank Adaptation of Large Language Models is a training technique that reduces the number of trainable parameters, making training faster and more memory-efficient.
>
> - #### **AutoModelForCausalLM**
> A factory class that instantiates the causal language model architecture matching the given checkpoint.
>
> - #### **AutoTokenizer**
> A factory class that loads the tokenizer matching the given checkpoint.
>
> - #### **BitsAndBytesConfig**
> Wrapper class for all possible attributes and features that can be used on a model that has been loaded using bitsandbytes.
>
> - #### **Pipeline**
> Takes raw text as input, tokenizes it, and then feeds it to the model to perform a specific task like text summarization.

In [9]:
import os
import torch
from datasets import load_dataset  # to load a dataset from the 🤗 Hub
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
)
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training
from trl import SFTTrainer  # for supervised fine-tuning of the model

### Model

In [10]:
# Loading base Model from the Huggingface library
base_model = "mistralai/Mistral-7B-v0.1"
new_model = "Fine-tuned-Mistral7B"

> - The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters.
>
> - "Fine-tuned-Mistral7B" is the name under which our fine-tuned model will be uploaded to the Hugging Face Hub.

### Loading Dataset

In [None]:
dataset = load_dataset("YashaP/ACMprojec", split="train")

We merged two datasets:
- SciTLDR:
    - Hub ID: `allenai/scitldr`
    - [The SciTLDR dataset on the Hugging Face Hub 🔗](https://huggingface.co/datasets/allenai/scitldr)
    - SciTLDR is a curated dataset of 5.4K TLDRs covering over 3.2K papers.
    - Supported task: summarisation
- VIT_Notes:
    - A custom dataset we created from VIT notes and their summaries, using ChatGPT.
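
How the two sources were combined is not shown in this notebook. The sketch below is one plausible way to do it, using plain Python dictionaries in place of `datasets` objects. The record fields (`text`, `summary`), the `to_training_text` helper and the sample rows are all hypothetical; only the `input` column name (passed later as `dataset_text_field`) and the prompt layout (used in the test cell) come from the notebook.

```python
# Hypothetical records standing in for rows from the two sources.
scitldr_rows = [{"text": "We propose a new optimizer for deep networks.", "summary": "A new optimizer."}]
vit_rows = [{"text": "Operating systems manage hardware resources.", "summary": "OS basics."}]

PROMPT = (
    "###Instruction->PROVIDE ME WITH A SUMMARY FOR THE GIVEN INPUT "
    "WHILE KEEPING THE MOST IMPORTANT DETAILS INTACT:\n{text}\n\n### SUMMARY:\n"
)

def to_training_text(row):
    """Turn one (text, summary) pair into a single training string."""
    return {"input": PROMPT.format(text=row["text"]) + row["summary"]}

# Concatenate both sources and convert every row to the shared format.
merged = [to_training_text(row) for row in scitldr_rows + vit_rows]
print(len(merged), "examples")
```

Putting both sources into one text column means the trainer does not need to know where each example came from.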
    
    

### Tokenization

Tokenization splits a sequence of text into smaller units called tokens and maps each one to an integer ID.
The model operates on these IDs rather than on raw text.

In [None]:
tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)
tokenizer.pad_token = tokenizer.unk_token
tokenizer.padding_side = "right"
# Reuse the unknown token as the padding token and pad on the right, so that all sequences in a batch have the same length
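
To make the effect of right-side padding concrete, here is a toy sketch in plain Python. The token IDs, the `pad_id` value and the `pad_right` helper are made up for illustration; the real tokenizer does this internally when asked to pad a batch.

```python
def pad_right(batch, pad_id):
    """Pad every sequence on the right to the length of the longest one."""
    longest = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (longest - len(seq)) for seq in batch]
    # 1 marks a real token, 0 marks padding that the model should ignore.
    attention_mask = [[1] * len(seq) + [0] * (longest - len(seq)) for seq in batch]
    return input_ids, attention_mask

batch = [[5, 17, 42], [8, 3], [99]]  # hypothetical token IDs
input_ids, attention_mask = pad_right(batch, pad_id=0)
print(input_ids)       # [[5, 17, 42], [8, 3, 0], [99, 0, 0]]
print(attention_mask)  # [[1, 1, 1], [1, 1, 0], [1, 0, 0]]
```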

### Quantisation Configuration

We quantised the model to reduce memory and compute costs by storing its weights in a lower-precision data type. The configuration below loads the weights in 4-bit NormalFloat (NF4) with double quantisation and runs computations in float16.
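
The idea of storing weights as a few bits plus a scale factor can be sketched in plain Python. This uses simple uniform absmax rounding to 4-bit codes, which is a simplification and not the scheme that bitsandbytes implements; the function names and sample weights are illustrative.

```python
def quantize_absmax_4bit(block):
    """Map a block of floats to signed integer codes in [-7, 7] plus one scale."""
    scale = max(abs(x) for x in block) / 7 or 1.0  # fall back to 1.0 for an all-zero block
    codes = [round(x / scale) for x in block]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from the codes and the block's scale."""
    return [c * scale for c in codes]

weights = [0.12, -0.5, 0.33, 0.07, -0.21, 0.49, -0.02, 0.0]  # made-up weights
codes, scale = quantize_absmax_4bit(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(codes)
print(f"largest rounding error: {max_error:.4f}")
```

Each weight now needs only 4 bits, at the cost of a rounding error of at most half a quantisation step.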

In [11]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
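
A rough back-of-the-envelope estimate shows why 4-bit loading matters for a model of this size. The parameter count is rounded to 7 billion, and the overhead figures (one 32-bit constant per 64-weight block, with those constants themselves stored in 8 bits when double quantisation is on) are assumptions taken from the QLoRA paper rather than measured here. The estimate ignores layers kept in higher precision and all activation memory.

```python
N_PARAMS = 7_000_000_000  # rounded parameter count (assumption)
BLOCK = 64                # weights per quantisation block (assumption)

def gigabytes(bits_per_weight):
    """Storage needed for all weights at the given average bit width."""
    return N_PARAMS * bits_per_weight / 8 / 1e9

fp16_gb = gigabytes(16)
# 4-bit codes plus one 32-bit scale per block.
nf4_gb = gigabytes(4 + 32 / BLOCK)
# Double quantisation: 8-bit scales, plus one 32-bit constant per 256 scales.
nf4_dq_gb = gigabytes(4 + 8 / BLOCK + 32 / (BLOCK * 256))

print(f"fp16: {fp16_gb:.1f} GB | 4-bit: {nf4_gb:.2f} GB | 4-bit + double quant: {nf4_dq_gb:.2f} GB")
```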

### Configuring LoRA 

> LoRA (Low-Rank Adaptation of Large Language Models) is a lightweight training technique that reduces the number of trainable parameters. It inserts a small number of new weights into the model and trains only those. This makes training faster and more memory-efficient, and produces small adapter weights (a few hundred MB) that are easy to store and share.
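
To give a sense of scale, the number of weights LoRA adds can be worked out by hand: adapting a linear layer of shape `d_in x d_out` with rank `r` adds `r * (d_in + d_out)` parameters. The layer shapes below are assumptions based on the published Mistral-7B-v0.1 configuration (hidden size 4096, MLP size 14336, 1024-wide key/value projections, 32 layers), and `R = 16` matches the rank used in this notebook.

```python
R = 16
N_LAYERS = 32
# (d_in, d_out) for each adapted projection in one decoder layer (assumed shapes).
SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}

def lora_params(d_in, d_out, r):
    """A is (r x d_in) and B is (d_out x r), so LoRA adds r * (d_in + d_out) weights."""
    return r * (d_in + d_out)

per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in SHAPES.values())
total = per_layer * N_LAYERS
print(f"{total:,} trainable LoRA parameters")  # 41,943,040
```

Under these assumptions the adapter holds about 42 million weights, well under 1% of the 7 billion in the base model.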

In [None]:
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['up_proj', 'down_proj', 'gate_proj', 'k_proj', 'q_proj', 'v_proj', 'o_proj']
)

### Loading the base model

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map={"": 0}
)

model = prepare_model_for_kbit_training(model)

In [1]:
# Setting training arguments
training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    evaluation_strategy="steps",
    eval_steps=1000,
    logging_steps=1,
    optim="paged_adamw_8bit",
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    warmup_steps=10,
    report_to="wandb",
)

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    eval_dataset=dataset,  # note: evaluates on the training data, so eval loss will not reveal overfitting
    peft_config=peft_config,
    dataset_text_field="input",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_arguments,
)

# Training the model
trainer.train()

# Save the trained LoRA adapter
trainer.model.save_pretrained(new_model)
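
The `learning_rate`, `lr_scheduler_type="linear"` and `warmup_steps` arguments together describe a schedule that ramps up linearly and then decays linearly to zero. A minimal sketch of that shape in plain Python follows; `TOTAL_STEPS` is a made-up value, since the real number depends on the dataset size, batch size and epoch count.

```python
PEAK_LR = 2e-4
WARMUP_STEPS = 10
TOTAL_STEPS = 1000  # hypothetical; depends on dataset size and number of epochs

def linear_schedule(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)

for step in (0, 5, 10, 505, 1000):
    print(step, f"{linear_schedule(step):.2e}")
```

With only 10 warmup steps, the model reaches the peak rate almost immediately and spends nearly all of training on the decay phase.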

### Merging the base model with the trained adapter.

In [None]:
# Reload the base model in FP16 and merge it with the LoRA weights
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map={"": 0},
)
model = PeftModel.from_pretrained(model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
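
Conceptually, merging folds the low-rank update into the base weights, `W_merged = W + (lora_alpha / r) * B @ A`, so the adapter is no longer needed at inference time. The toy sketch below checks this identity on tiny hand-written matrices in plain Python; the matrices and helper names are illustrative and are not PEFT internals.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b, scale=1.0):
    """Element-wise a + scale * b."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

r, lora_alpha = 1, 2
W = [[1.0, 0.0], [0.0, 1.0]]  # base weight (2 x 2)
A = [[0.5, -0.5]]             # LoRA down-projection (r x 2)
B = [[1.0], [2.0]]            # LoRA up-projection (2 x r)
x = [[3.0], [1.0]]            # input column vector

scaling = lora_alpha / r
# Adapter kept separate: base output plus the scaled low-rank correction.
separate = add(matmul(W, x), matmul(B, matmul(A, x)), scaling)
# Adapter merged into the weight once, followed by a single matmul.
W_merged = add(W, matmul(B, A), scaling)
merged = matmul(W_merged, x)

print(separate, merged)  # [[5.0], [5.0]] [[5.0], [5.0]]
```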

### Saving the Model to the Huggingface🤗 Hub

In [2]:
model.push_to_hub(new_model, use_temp_dir=False, token=hf_token)
tokenizer.push_to_hub(new_model, use_temp_dir=False, token=hf_token)

### Using pickle to save the model

Pickle serializes the object into a byte stream, which is then saved in the file model.pkl.

In [None]:
import pickle
with open('model.pkl', 'wb') as f:  # the context manager closes the file after writing
    pickle.dump(model, f)
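
As a quick illustration of the round trip, the sketch below pickles a small stand-in object to a temporary file and loads it back. The dictionary is a placeholder for the real model. For Hugging Face models, `save_pretrained` (used earlier) is generally the more portable option, and pickle files should only be loaded from trusted sources.

```python
import os
import pickle
import tempfile

toy_model = {"name": "Fine-tuned-Mistral7B", "weights": [0.1, 0.2, 0.3]}  # placeholder object

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:  # serialise to a byte stream on disk
    pickle.dump(toy_model, f)
with open(path, "rb") as f:  # read the bytes back into an equal object
    restored = pickle.load(f)

print(restored == toy_model)  # True
```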

### Testing the model using a pipeline

> - Pipelines are objects that abstract away most of the library's complex code, offering a simple API dedicated to several tasks.
>
> - Pipelines are made of:
>     - A tokenizer instance that maps raw text input to tokens
>     - A model instance
>     - Some (optional) post-processing to enhance the model's output
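
The three stages can be pictured as a simple function composition. The toy below is not the `transformers` implementation; the `ToyPipeline` class, its whitespace tokenizer and its "model" (which just keeps the first few tokens) are hypothetical stand-ins that only show how the pieces fit together.

```python
class ToyPipeline:
    """Chains tokenizer -> model -> post-processing, as a pipeline does."""

    def __init__(self, tokenize, model, postprocess):
        self.tokenize = tokenize
        self.model = model
        self.postprocess = postprocess

    def __call__(self, text):
        tokens = self.tokenize(text)     # raw text -> tokens
        output = self.model(tokens)      # tokens -> model output
        return self.postprocess(output)  # model output -> readable result

toy_pipe = ToyPipeline(
    tokenize=str.split,
    model=lambda tokens: tokens[:3],     # stand-in "model": keep the first 3 tokens
    postprocess=" ".join,
)
print(toy_pipe("pipelines hide the tokenizer and model details"))  # pipelines hide the
```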

In [None]:
# Running summarisation pipeline
prompt = "Machine learning is a subfield of artificial intelligence that focuses on the development of algorithms and statistical models, allowing computer systems to improve their performance on a specific task through experience and data. Instead of being explicitly programmed to perform a task, machines equipped with machine learning algorithms can analyze and learn from data, identifying patterns and making predictions or decisions without explicit human intervention. This dynamic capability to adapt and improve over time has found applications in various domains, such as image and speech recognition, natural language processing, recommendation systems, and predictive analytics. Machine learning continues to play a pivotal role in advancing technology and reshaping industries by enabling computers to autonomously acquire and apply knowledge from data."
instruction = f"###Instruction->PROVIDE ME WITH A SUMMARY FOR THE GIVEN INPUT WHILE KEEPING THE MOST IMPORTANT DETAILS INTACT:\n{prompt}\n\n### SUMMARY:\n"
# Mistral is a causal LM, so the summary is generated as a continuation of the prompt
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_new_tokens=128)
result = pipe(instruction)
print(result[0]['generated_text'][len(instruction):])
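
The slicing on the last line assumes that the generated text begins with the prompt itself. A small sketch of that prompt handling, separated from the model call, is shown below; `build_prompt`, `extract_summary` and the fake model output are hypothetical helpers for illustration.

```python
TEMPLATE = (
    "###Instruction->PROVIDE ME WITH A SUMMARY FOR THE GIVEN INPUT "
    "WHILE KEEPING THE MOST IMPORTANT DETAILS INTACT:\n{text}\n\n### SUMMARY:\n"
)

def build_prompt(text):
    """Wrap the text to summarise in the instruction template."""
    return TEMPLATE.format(text=text)

def extract_summary(generated_text, prompt):
    """Drop the echoed prompt, if present, and return only the continuation."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    return generated_text.strip()

demo_prompt = build_prompt("Machine learning lets computers learn from data.")
fake_output = demo_prompt + " ML learns patterns from data.\n"  # stands in for the model's output
print(extract_summary(fake_output, demo_prompt))  # ML learns patterns from data.
```

Checking for the prefix before slicing avoids cutting into the summary if the prompt is not echoed back.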

In [None]:
# Empty VRAM
import gc
del model
del pipe
del trainer
gc.collect()
torch.cuda.empty_cache()  # release cached GPU memory held by PyTorch

> This cell releases the memory held by the model, pipeline, and trainer objects and explicitly triggers the garbage collector to reclaim anything no longer in use. Freeing these resources helps avoid out-of-memory errors when working with large models or datasets.
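
Why deleting names and then calling the garbage collector helps can be seen on a small scale with a reference cycle. The `Node` class below is a hypothetical stand-in for a large object; the weak reference is only there to observe whether the object still exists.

```python
import gc
import weakref

class Node:
    """Stand-in for a large object that participates in a reference cycle."""
    def __init__(self):
        self.partner = None

a, b = Node(), Node()
a.partner, b.partner = b, a  # a and b keep each other alive
probe = weakref.ref(a)       # lets us check whether the object still exists

del a, b                     # the names are gone, but the cycle remains
gc.collect()                 # the collector finds and frees the cycle
print(probe() is None)       # True
```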