# | NLP | LLM | Fine-tuning | Own Chatbot LoRA T5 |

## Natural Language Processing (NLP) and Large Language Models (LLM) with Fine-Tuning LLM and mae Chatbot with LoRA and Flan-T5 


![Learning](https://t3.ftcdn.net/jpg/06/14/01/52/360_F_614015247_EWZHvC6AAOsaIOepakhyJvMqUu5tpLfY.jpg)


# <b>1 <span style='color:#78D118'>|</span> Overview</b>

In this notebook we're going to Fine-Tuning LLM:

<img src="https://github.com/YanSte/NLP-LLM-Fine-tuning-Trainer/blob/main/img_2.png?raw=true" alt="Learning" width="50%">

Many LLMs are general purpose models trained on a broad range of data and use cases. This enables them to perform well in a variety of applications, as shown in previous modules. It is not uncommon though to find situations where applying a general purpose model performs unacceptably for specific dataset or use case. This often does not mean that the general purpose model is unusable. Perhaps, with some new data and additional training the model could be improved, or fine-tuned, such that it produces acceptable results for the specific use case.

<img src="https://github.com/YanSte/NLP-LLM-Fine-tuning-Trainer/blob/main/img_1.png?raw=true" alt="Learning" width="50%">

Fine-tuning uses a pre-trained model as a base and continues to train it with a new, task targeted dataset. Conceptually, fine-tuning leverages that which has already been learned by a model and aims to focus its learnings further for a specific task.

It is important to recognize that fine-tuning is model training. The training process remains a resource intensive, and time consuming effort. Albeit fine-tuning training time is greatly shortened as a result of having started from a pre-trained model. 

<img src="https://github.com/YanSte/NLP-LLM-Fine-tuning-Trainer/blob/main/img_3.png?raw=true" alt="Learning" width="50%">



### The Power of Fine-Tuning: An Overview
Fine-tuning, a crucial aspect of adapting pre-trained models to specific tasks, has witnessed a revolutionary approach known as Low Rank Adaptation (LoRA). Unlike conventional fine-tuning methods, LoRA strategically freezes pre-trained model weights and introduces trainable rank decomposition matrices into the Transformer architecture's layers. This innovative technique significantly reduces the number of trainable parameters, leading to expedited fine-tuning processes and mitigated overfitting.

### What is LoRA?
LoRA represents a paradigm shift in fine-tuning strategies, offering efficiency and effectiveness. By reducing the number of trainable parameters and GPU memory requirements, LoRA proves to be a powerful tool for tailoring pre-trained large models to specific tasks. This article explores how LoRA can be employed to create a personalized chatbot.

<img src="https://github.com/YanSte/NLP-LLM-Fine-tuning-T5-Small-Reviews/blob/main/img_1.png?raw=true" alt="Learning" width="50%">


## Learning Objectives
Prepare a novel dataset

 By the end of this notebook, you will be able to:
1. Prepare a novel dataset
2. Fine-tune the t5 model manually (Without Trainer) as a chatbot
3. Using Accelerator

# Setup

In [1]:
%%capture
!pip install -U torch torchvision
!pip install -U evaluate
!pip install -U transformers
!pip install -U datasets
!pip install -U accelerate
!pip install -U peft
!pip install -U deepspeed

In [2]:
import torch
assert torch.cuda.is_available()

In [3]:
###### General ######
import os
import gc
import pandas as pd
import numpy as np
from tqdm import tqdm

###### Torch ######

import torch
from torch.utils.data import DataLoader

###### Hugging face ######

# Dataset
# --
from datasets import load_dataset
from datasets import Dataset, DatasetDict
import datasets

# Transformers
# --
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, AutoConfig, default_data_collator, get_linear_schedule_with_warmup, DataCollatorForSeq2Seq, set_seed
import transformers

# Perf
# --
from peft import LoraConfig, TaskType, get_peft_model, get_peft_model_state_dict, PeftModel
from peft.utils.other import fsdp_auto_wrap_policy

# Accelerator
# --
from accelerate import Accelerator



#### Config

In [4]:
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "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"

In [5]:
# General
# ---
cache_dir = "./cache"
seed = 42

# Model
# ---
model_name_or_path = "google/flan-t5-small"#"google/flan-t5-xxl"


# Tokenizer 
# ---
max_length = 256

# Preprocess Tokenize
# ---
preprocess_tokenize_num_proc=16

# DataLoader
# ---
batch_size = 2

# Lora
# ---
task_type=TaskType.SEQ_2_SEQ_LM
inference_mode=False
r=8
lora_alpha=32
lora_dropout=0.1

# Model
# ---
def get_peft_model_name_or_path(model_name_or_path, peft_config): 
    return f"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_")

def get_optimizer(model, lr):
    return torch.optim.AdamW(model.parameters(), lr=lr)

lr = 1e-4
num_epochs = 1

#### Setup

In [6]:
pd.set_option('display.max_column', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_seq_items', None)
pd.set_option('display.max_colwidth', 500)
pd.set_option('expand_frame_repr', True)

In [7]:
set_seed(seed)

In [8]:
accelerator = Accelerator()

# Or
# Initialize accelerator with config from configs/accelerate_ds_z3.yaml
# accelerator = (
#     Accelerator(log_with=args.report_to, logging_dir=args.output_dir) if args.with_tracking else Accelerator()
# )

In [9]:
# To have only one message (and not 8) per logs of Transformers or Datasets, we set the logging verbosity
if accelerator.is_main_process:
    datasets.utils.logging.set_verbosity_warning()
    transformers.utils.logging.set_verbosity_info()
else:
    datasets.utils.logging.set_verbosity_error()
    transformers.utils.logging.set_verbosity_error()   

#### Methods

In [10]:
def accelerator_print_sep(accelerator=accelerator):
    sep = "#" * 12
    accelerator.print(sep)

# <b>2 <span style='color:#78D118'>|</span> Fine-Tuning</b>

### Step 1 - Data Preparation

The first step of the fine-tuning process is to identify a specific task and supporting dataset.

We will use two datasets:

[Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3?source=post_page-----d7817b77fac0--------------------------------)

and

[MohamedRashad/ChatGPT-prompts](https://huggingface.co/datasets/MohamedRashad/ChatGPT-prompts?source=post_page-----d7817b77fac0--------------------------------)

In [11]:
hello_dataset = load_dataset("Hello-SimpleAI/HC3", name="all")
hc3_dataset = load_dataset("MohamedRashad/ChatGPT-prompts")

Downloading data:   0%|          | 0.00/39.3M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/24322 [00:00<?, ? examples/s]

Downloading readme:   0%|          | 0.00/404 [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/422k [00:00<?, ?B/s]

Generating train split: 0 examples [00:00, ? examples/s]

In [12]:
hc3_dataset

DatasetDict({
    train: Dataset({
        features: ['human_prompt', 'chatgpt_response'],
        num_rows: 360
    })
})

In [13]:
hello_dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'question', 'human_answers', 'chatgpt_answers', 'source'],
        num_rows: 24322
    })
})

In [14]:
hello_df = pd.DataFrame(hello_dataset['train'])
hc3_df = pd.DataFrame(hc3_dataset['train'])

In [15]:
questions, answers = [], []

# Process Hello DataFrame
for _, row in hello_df.iterrows():
    
    for answer in row["human_answers"]:
        questions.append("Human: " + row["question"])
        answers.append("Assistant: " + answer)
        
    for answer in row["chatgpt_answers"]:
        questions.append("Human: " + row["question"])
        answers.append("Assistant: " + answer)

# Process Hc3 DataFrame
for _, row in hc3_df.iterrows():
    human_prompt = row["human_prompt"]
    chatgpt_response = row["chatgpt_response"]
    questions.append("Human: " + human_prompt)
    answers.append("Assistant: " + chatgpt_response)

# Create a new DataFrame
df = pd.DataFrame()
df["question"] = questions
df["answer"] = answers

# Save to CSV file
df.to_csv("./train.csv", index=False)

In [16]:
df.head()

Unnamed: 0,question,answer
0,"Human: Why is every book I hear about a "" NY Times # 1 Best Seller "" ? ELI5 : Why is every book I hear about a "" NY Times # 1 Best Seller "" ? Should n't there only be one "" # 1 "" best seller ? Please explain like I'm five.","Assistant: Basically there are many categories of "" Best Seller "" . Replace "" Best Seller "" by something like "" Oscars "" and every "" best seller "" book is basically an "" oscar - winning "" book . May not have won the "" Best film "" , but even if you won the best director or best script , you 're still an "" oscar - winning "" film . Same thing for best sellers . Also , IIRC the rankings change every week or something like that . Some you might not be best seller one week , but you may be the nex..."
1,"Human: Why is every book I hear about a "" NY Times # 1 Best Seller "" ? ELI5 : Why is every book I hear about a "" NY Times # 1 Best Seller "" ? Should n't there only be one "" # 1 "" best seller ? Please explain like I'm five.","Assistant: If you 're hearing about it , it 's because it was a very good or very well - publicized book ( or both ) , and almost every good or well - publicized book will be # 1 on the NY Times bestseller list for at least a little bit . Kindof like how almost every big or good movies are # 1 at the box office on their opening weekend ."
2,"Human: Why is every book I hear about a "" NY Times # 1 Best Seller "" ? ELI5 : Why is every book I hear about a "" NY Times # 1 Best Seller "" ? Should n't there only be one "" # 1 "" best seller ? Please explain like I'm five.","Assistant: One reason is lots of catagories . However , how the NY Times calculates its best seller list is n't comprehensive , and is pretty well understood by publishers . So publishers can [ buy a few books ] ( URL_0 ) in the right bookstores and send a book to the top of the list for at least a week ."
3,"Human: Why is every book I hear about a "" NY Times # 1 Best Seller "" ? ELI5 : Why is every book I hear about a "" NY Times # 1 Best Seller "" ? Should n't there only be one "" # 1 "" best seller ? Please explain like I'm five.","Assistant: There are many different best seller lists that are published by various organizations, and the New York Times is just one of them. The New York Times best seller list is a weekly list that ranks the best-selling books in the United States based on sales data from a number of different retailers. The list is published in the New York Times newspaper and is widely considered to be one of the most influential best seller lists in the book industry. \nIt's important to note that the ..."
4,"Human: If salt is so bad for cars , why do we use it on the roads ? As the title states , why do we use it ? is there no other option or what ? Please explain like I'm five.","Assistant: salt is good for not dying in car crashes and car crashes are worse for cars then salt . Some places use other things , but salt is really cheap compared to most alternatives , although sand is pretty good ."


In [17]:
dataset = Dataset.from_pandas(df, split='train')

Creer Train test dataset

In [18]:
dataset = dataset.train_test_split(shuffle=True, seed=42)
test_ds = dataset.pop("test")
dataset["validation"] = test_ds

In [19]:
dataset

DatasetDict({
    train: Dataset({
        features: ['question', 'answer'],
        num_rows: 64356
    })
    validation: Dataset({
        features: ['question', 'answer'],
        num_rows: 21453
    })
})

In [20]:
pd.DataFrame(dataset["train"][:15]).head(15)

Unnamed: 0,question,answer
0,"Human: How does war make money ? You here a lot about war profiteering , but how does it work ? How does it generate revenue , and for whom ? Edit : I am also curious how U.S. compainies make money rebuilding town we wage wars in Please explain like I'm five.","Assistant: [ It 's called the Military - Industrial Complex ] ( URL_0 ) . When we go to war , or even just by maintaining a superior military force , the companies supplying troops , handling logistics , manufacturing weapons , and developing technology are making hundreds of billions of dollars per year . All paid for by taxes and through taking on additional national debt . Lobbyists from all these companies work very closely with legislators to ensure that projects and contracts keep comi..."
1,Human: when is world war hulk movie come,"Assistant: I'm sorry, but there is no World War Hulk movie currently in production or announced. World War Hulk is a comic book event published by Marvel Comics in 2007, in which the character Hulk becomes enraged and declares war on the Marvel Universe after being exiled from Earth by a group of superheroes. If you are interested in watching a movie or TV show featuring the character Hulk, you might enjoy the Marvel Cinematic Universe films and TV shows that are available to stream or rent...."
2,Human: Paying Off Principal of Home vs. Investing In Mutual Fund,"Assistant: Deciding whether to pay off the principal of your home or invest in a mutual fund is a personal decision that depends on your financial situation and goals. Here are a few things to consider when making this decision:Interest rates: If you have a mortgage with a high interest rate, paying off the principal can help you save money on interest over the long term. On the other hand, if you have a mortgage with a low interest rate, you may be able to earn a higher return by investing ..."
3,"Human: For the first time in my life, I'm going to be making real money…what should I do with it?","Assistant: On the one hand, it's a great idea to open a Roth IRA now, once you've got the cash to contribute. It's a tax designation sounds like it would fit your meager earnings this year. The main reason to open one now rather than later is that some types of withdrawls require the account be aged 5 years. But you can also withdraw the amount you've contributed tax free any time. Student loans right now are pricey, so if you're carrying a balance at say 6.8 percent fixed you should pay tha..."
4,Human: How did we choose the first day of a season ? Why is the first day of : - Autumn / Fall : Sept. 21 - Winter : Dec. 21 - Spring : March 21 - Summer June 21 Please explain like I'm five.,"Assistant: The first days of spring and fall are both equinoxes , days where the sun is out equal to the moon . Summer and winter both have their solstice on the longest and shortest days of the year respectively ."
5,"Human: Why are laws which require identification for voting , such as in Texas , being scrutinized by the media ? Would n't it make sense that only citizens can vote ? And it seems that everyone having IDs would be safer . But ELI5 . Explain like I'm five.","Assistant: Their is no evidence of voter fraud that could be prevented by identification for voting . The laws are under suspension because the majority of people they affect ( who lack the proper ID ) are primarily minorities . Since these populations are primarily democratic , and the states creating the laws are primarily republican , it looks a great deal like the republicans are trying to rewrite the law to take away the voting rights of groups that are opposed to their views . In simpl..."
6,"Human: What makes creating your own operating system so complicated ? Seems like there are thousands upon thousands of complex software programs out there , but very very few operating systems . Please explain like I'm five.","Assistant: There is so little need for new operating systems . For almost all practical purposes , it 's far easier to modify existing operating system to suit your needs , than it is to build a new one . And do n't think it 's just Windows either , even outside the Linux and OS X , there are tons of special purpose operating systems which are intended for very specific tasks . Like , there are multiple operating systems intended to run on machines which only have couple of megabytes of stor..."
7,Human: How can schools have the right to keep a student after the normal school day ? What gives them this right ? Why can schools force children and students to stay in their building for punishment ? Please explain like I'm five.,"Assistant: "" I do n't care how long it takes . I will keep you here all night . "" "" We ca n't keep them past four . "" "" I will keep you here till four . """
8,Human: What are the duties of military personnel stationed in peaceful countries . For example American military personnel stationed in South Korea or Japan . What are their responsibilities and duties . Please explain like I'm five.,"Assistant: You are either training for why you are there ( via exercises , coordinating with allied forces , etc . ) or supporting those doing said training ( such as logistics , maintenance , medical , etc . )"
9,"Human: Why did the world 's magnetic field suddenly reverse 780,000 years ago ? Can anyone explain Matuyama - Brunhes event and how we were able to identify it from sea cores and the like ? This fascinates and is beyond me . Explain like I'm five.","Assistant: The giant trench in the atlantic ocean is constantly spewing new hot molten crust up from the surface . very slowly of course . but over time thousands and millions of years the magma comes up cools and hardens into new sea floor . the earth is a giant magnet , that s why compasses work and birds know where to fly for winter etc . There are bits of iron ( which is magnetic ) that is in the magma . it is lined up with the earths magnetic field while its hot , the little domains are..."


### Step 2 - Select pre-trained model with tokenizer and DataCollator


In [21]:
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path, cache_dir=cache_dir)

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, cache_dir=cache_dir)

data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

config.json:   0%|          | 0.00/1.40k [00:00<?, ?B/s]

loading configuration file config.json from cache at ./cache/models--google--flan-t5-small/snapshots/0fc9ddf78a1e988dac52e2dac162b0ede4fd74ab/config.json
Model config T5Config {
  "_name_or_path": "google/flan-t5-small",
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "classifier_dropout": 0.0,
  "d_ff": 1024,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dense_act_fn": "gelu_new",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "gated-gelu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_decoder_layers": 8,
  "num_heads": 6,
  "num_layers": 8,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summarization": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length"

model.safetensors:   0%|          | 0.00/308M [00:00<?, ?B/s]

loading weights file model.safetensors from cache at ./cache/models--google--flan-t5-small/snapshots/0fc9ddf78a1e988dac52e2dac162b0ede4fd74ab/model.safetensors
Generate config GenerationConfig {
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0
}

All model checkpoint weights were used when initializing T5ForConditionalGeneration.

All the weights of T5ForConditionalGeneration were initialized from the model checkpoint at google/flan-t5-small.
If your task is similar to the task the model of the checkpoint was trained on, you can already use T5ForConditionalGeneration for predictions without further training.


generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

loading configuration file generation_config.json from cache at ./cache/models--google--flan-t5-small/snapshots/0fc9ddf78a1e988dac52e2dac162b0ede4fd74ab/generation_config.json
Generate config GenerationConfig {
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0
}



tokenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

loading file spiece.model from cache at ./cache/models--google--flan-t5-small/snapshots/0fc9ddf78a1e988dac52e2dac162b0ede4fd74ab/spiece.model
loading file tokenizer.json from cache at ./cache/models--google--flan-t5-small/snapshots/0fc9ddf78a1e988dac52e2dac162b0ede4fd74ab/tokenizer.json
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at ./cache/models--google--flan-t5-small/snapshots/0fc9ddf78a1e988dac52e2dac162b0ede4fd74ab/special_tokens_map.json
loading file tokenizer_config.json from cache at ./cache/models--google--flan-t5-small/snapshots/0fc9ddf78a1e988dac52e2dac162b0ede4fd74ab/tokenizer_config.json


### Step 3 - Tokenizer



In [22]:
def preprocess_tokenize(examples):
    inputs = [doc for doc in examples["question"]]
    
    model_inputs = tokenizer(
        inputs, 
        max_length=max_length, 
        padding=True, 
        truncation=True
    )

    # Setup the tokenizer for targets
    labels = tokenizer(
        examples["answer"], 
        max_length=max_length, 
        padding=True, 
        truncation=True
    )


    model_inputs["labels"] = labels["input_ids"]
    
    return model_inputs

In [23]:
with accelerator.main_process_first():
    processed_datasets = dataset.map(
        preprocess_tokenize,
        batched=True,
        num_proc=preprocess_tokenize_num_proc,
        remove_columns=dataset["train"].column_names,
        load_from_cache_file=False,
        desc="Running tokenizer on dataset",
    )

    
train_dataset = processed_datasets["train"]
eval_dataset = processed_datasets["validation"]    

Running tokenizer on dataset (num_proc=16):   0%|          | 0/64356 [00:00<?, ? examples/s]

Running tokenizer on dataset (num_proc=16):   0%|          | 0/21453 [00:00<?, ? examples/s]

In [24]:
train_dataset

Dataset({
    features: ['input_ids', 'attention_mask', 'labels'],
    num_rows: 64356
})

In [25]:
eval_dataset

Dataset({
    features: ['input_ids', 'attention_mask', 'labels'],
    num_rows: 21453
})

### Step 4 - DataLoader

In [26]:
train_dataloader = DataLoader(
    train_dataset, 
    shuffle=True, 
    collate_fn=data_collator, 
    batch_size=batch_size, 
    pin_memory=True
)

eval_dataloader = DataLoader(
    eval_dataset, 
    collate_fn=data_collator, 
    batch_size=batch_size, 
    pin_memory=True
)


### Step 5 - Lora


In [27]:
peft_config = LoraConfig(
    task_type=task_type, 
    inference_mode=False, 
    r=r, 
    lora_alpha=lora_alpha, 
    lora_dropout=lora_dropout
)

Note:
1. **`PeftModel.from_pretrained`:**
    - By default, the adapter of the PEFT model is frozen (non-trainable).
    - You can change this by adjusting the `is_trainable` configuration.

2. **`get_peft_model` function:**
    - Parameters are not frozen by default.
    - Result: you obtain a trainable PEFT model for the SFT task.

3. **Fine-tuning an already fine-tuned PEFT model:**
    - Utilize `from_pretrained`.
    - Set `is_trainable = True` to enable training of the previously fine-tuned model.

In [28]:
model = get_peft_model(model, peft_config)
accelerator_print_sep()
accelerator.print(model.print_trainable_parameters())
accelerator_print_sep()

############
trainable params: 344,064 || all params: 77,305,216 || trainable%: 0.445072166928555
None
############


In [29]:
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:1024"

### Step 6 -  Learning Ratescheduler

In [30]:
# Setup optimizer
optimizer = get_optimizer(model, lr)    

In [31]:
lr_scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=(len(train_dataloader) * num_epochs),
)

In [32]:
if getattr(accelerator.state, "fsdp_plugin", None) is not None:
    accelerator.state.fsdp_plugin.auto_wrap_policy = fsdp_auto_wrap_policy(model)

### Step 7 -  Accelerator setup

### Cleaning

In [33]:
torch.cuda.empty_cache()
gc.collect()

66

In [34]:
model, train_dataloader, eval_dataloader, optimizer, lr_scheduler = accelerator.prepare(
    model, 
    train_dataloader, 
    eval_dataloader, 
    optimizer, 
    lr_scheduler
)
accelerator_print_sep()
accelerator.print(model)
accelerator_print_sep()

############
PeftModelForSeq2SeqLM(
  (base_model): LoraModel(
    (model): T5ForConditionalGeneration(
      (shared): Embedding(32128, 512)
      (encoder): T5Stack(
        (embed_tokens): Embedding(32128, 512)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): lora.Linear(
                    (base_layer): Linear(in_features=512, out_features=384, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.1, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=512, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=384, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
  

### DeepSpeed

This code is designed to determine whether the model is currently using the third stage of the DeepSpeed decentralization process (zero_stage). DeepSpeed is a library that optimizes training models on distributed architectures, particularly on GPUs.

Detailed explanations:
- `accelerator`: It seems to refer to an object that manages hardware acceleration, possibly provided by the Hugging Face Accelerated Inference API.
- `accelerator.state`: Accesses the internal state of the accelerator object.
- `accelerator.state.deepspeed_plugin`: If DeepSpeed is in use, this accesses the object representing the DeepSpeed plugin within the accelerator state.
- `accelerator.state.deepspeed_plugin.zero_stage`: Accesses the current stage of the DeepSpeed decentralization process.

In [35]:
is_ds_zero_3 = False
if getattr(accelerator.state, "deepspeed_plugin", None):
    is_ds_zero_3 = accelerator.state.deepspeed_plugin.zero_stage == 3

### Step 7 -  Train

In [36]:
def train_epoch(model, train_dataloader, optimizer, lr_scheduler, accelerator):
    """
    Train the model for one epoch.

    Args:
        model (torch.nn.Module): The PyTorch model to be trained.
        train_dataloader (torch.utils.data.DataLoader): DataLoader for the training dataset.
        optimizer (torch.optim.Optimizer): The optimizer used for training.
        lr_scheduler (torch.optim.lr_scheduler._LRScheduler): Learning rate scheduler.
        accelerator (accelerator): The accelerator to handle distributed training.

    Returns:
        torch.Tensor: Average training loss for the epoch.
    """
    model.train()
    total_loss = 0

    for step, batch in enumerate(tqdm(train_dataloader, disable=not accelerator.is_main_process)):
        outputs = model(**batch)
        loss = outputs.loss
        total_loss += loss.detach().float()
        accelerator.backward(loss)
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

    return total_loss / len(train_dataloader)

def evaluate_epoch(model, eval_dataloader, tokenizer, accelerator):
    """
    Evaluate the model on the validation dataset for one epoch.

    Args:
        model (torch.nn.Module): The PyTorch model to be evaluated.
        eval_dataloader (torch.utils.data.DataLoader): DataLoader for the validation dataset.
        tokenizer: Tokenizer for decoding model predictions.
        accelerator (accelerator): The accelerator to handle distributed training.

    Returns:
        Tuple[torch.Tensor, List[str]]: Average evaluation loss for the epoch and list of predicted tokens.
    """
    model.eval()
    eval_loss = 0
    eval_preds = []

    for step, batch in enumerate(tqdm(eval_dataloader, disable=not accelerator.is_main_process)):
        with torch.no_grad():
            outputs = model(**batch)

        loss = outputs.loss
        eval_loss += loss.detach().float()
        preds = accelerator.gather_for_metrics(torch.argmax(outputs.logits, -1)).detach().cpu().numpy()
        eval_preds.extend(tokenizer.batch_decode(preds, skip_special_tokens=True))

    return eval_loss / len(eval_dataloader), eval_preds

def save_model(model, peft_model_name_or_path, accelerator):
    """
    Save the model to the specified path.

    Args:
        model (torch.nn.Module): The PyTorch model to be saved.
        peft_model_name_or_path (str): The path or name to save the model.
        accelerator (accelerator): The accelerator to handle distributed training.
    """
    accelerator.wait_for_everyone()
    model.save_pretrained(peft_model_name_or_path)
    tokenizer.save_pretrained(peft_model_name_or_path)
    accelerator.wait_for_everyone()

In [37]:
peft_model_name_or_path = get_peft_model_name_or_path(model_name_or_path, peft_config)

# Main training loop
for epoch in range(num_epochs):
    # Train
    # ---
    accelerator_print_sep()
    accelerator.print(f"Train epoch: {epoch=}")
    
    train_loss = train_epoch(model, train_dataloader, optimizer, lr_scheduler, accelerator)
    train_ppl = torch.exp(train_loss)
    
    accelerator_print_sep()
    accelerator.print(f"{epoch=}: {train_ppl=} {train_loss=}")

    # Eval
    # ---
    accelerator_print_sep()
    accelerator.print(f"Eval epoch: {epoch=}")
    
    eval_loss, eval_preds = evaluate_epoch(model, eval_dataloader, tokenizer, accelerator)
    eval_ppl = torch.exp(eval_loss)
    
    accelerator_print_sep()
    accelerator.print(f"{epoch=}: {eval_ppl=} {eval_loss=}")
    
    # Save
    # ---
    accelerator_print_sep()
    accelerator.print(f"Save epoch: {epoch=}")
    save_model(model, peft_model_name_or_path, accelerator)

############
Train epoch: epoch=0


  0%|          | 0/32178 [00:00<?, ?it/s]You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
100%|██████████| 32178/32178 [43:27<00:00, 12.34it/s]


############
epoch=0: train_ppl=tensor(17.3434, device='cuda:0') train_loss=tensor(2.8532, device='cuda:0')
############
Eval epoch: epoch=0


100%|██████████| 10727/10727 [05:54<00:00, 30.28it/s]
tokenizer config file saved in google_flan-t5-small_LORA_SEQ_2_SEQ_LM/tokenizer_config.json
Special tokens file saved in google_flan-t5-small_LORA_SEQ_2_SEQ_LM/special_tokens_map.json
Copy vocab file to google_flan-t5-small_LORA_SEQ_2_SEQ_LM/spiece.model


############
epoch=0: eval_ppl=tensor(10.8168, device='cuda:0') eval_loss=tensor(2.3811, device='cuda:0')
############
Save epoch: epoch=0


# Eval

In [38]:
from peft import PeftConfig

#config = PeftConfig.from_pretrained(peft_config)
# load base LLM model and tokenizer

#  peft_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)
    
 #     model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
#    + model = get_peft_model(model, peft_config)
 #   + model.print_trainable_parameters()
    
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)

#model = get_peft_model(model, config)
# Load the Lora model
# model.load_adapter(peft_model_id)
model = PeftModel.from_pretrained(model, peft_model_name_or_path, device_map={"":0})
model.eval()

print("Peft model loaded")


tokenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]

loading file spiece.model from cache at /root/.cache/huggingface/hub/models--google--flan-t5-small/snapshots/0fc9ddf78a1e988dac52e2dac162b0ede4fd74ab/spiece.model
loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--google--flan-t5-small/snapshots/0fc9ddf78a1e988dac52e2dac162b0ede4fd74ab/tokenizer.json
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--google--flan-t5-small/snapshots/0fc9ddf78a1e988dac52e2dac162b0ede4fd74ab/special_tokens_map.json
loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--google--flan-t5-small/snapshots/0fc9ddf78a1e988dac52e2dac162b0ede4fd74ab/tokenizer_config.json


config.json:   0%|          | 0.00/1.40k [00:00<?, ?B/s]

loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--google--flan-t5-small/snapshots/0fc9ddf78a1e988dac52e2dac162b0ede4fd74ab/config.json
Model config T5Config {
  "_name_or_path": "google/flan-t5-small",
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "classifier_dropout": 0.0,
  "d_ff": 1024,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dense_act_fn": "gelu_new",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "gated-gelu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_decoder_layers": 8,
  "num_heads": 6,
  "num_layers": 8,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summarization": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 20

model.safetensors:   0%|          | 0.00/308M [00:00<?, ?B/s]

loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--google--flan-t5-small/snapshots/0fc9ddf78a1e988dac52e2dac162b0ede4fd74ab/model.safetensors
Generate config GenerationConfig {
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0
}

All model checkpoint weights were used when initializing T5ForConditionalGeneration.

All the weights of T5ForConditionalGeneration were initialized from the model checkpoint at google/flan-t5-small.
If your task is similar to the task the model of the checkpoint was trained on, you can already use T5ForConditionalGeneration for predictions without further training.


generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--google--flan-t5-small/snapshots/0fc9ddf78a1e988dac52e2dac162b0ede4fd74ab/generation_config.json
Generate config GenerationConfig {
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0
}



Peft model loaded


In [39]:
text = [
    "Human: when is world war hulk movie come",
    "Human: What makes creating your own operating system so complicated ? Seems like there are thousands upon thousands of complex software programs out there , but very very few operating systems . Please explain like I'm five."
] 

inputs = tokenizer(
    text, 
    return_tensors="pt", 
    truncation=True, 
    padding=True
)

pred = model.generate(
    input_ids=inputs["input_ids"], 
    attention_mask=inputs["attention_mask"]
)

pdf = pd.DataFrame(
    zip(text, tokenizer.batch_decode(pred, skip_special_tokens=True))
)
display(pdf)


Generate config GenerationConfig {
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0
}



Unnamed: 0,0,1
0,Human: when is world war hulk movie come,Assistant: The movie is a fictionalized account of the war in Iraq. The movie is
1,"Human: What makes creating your own operating system so complicated ? Seems like there are thousands upon thousands of complex software programs out there , but very very few operating systems . Please explain like I'm five.",Assistant: It is a very complicated process. It's a lot of
