<a href="https://colab.research.google.com/github/bearbearyu1223/llm-fine-tuning-playground/blob/main/finetune_falcon_7b_conversation_summarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Fine-tune the [Falcon-7b-sharded model](https://huggingface.co/vilsonrodrigues/falcon-7b-sharded)** on the [samsum](https://huggingface.co/datasets/samsum) dataset to summarize conversations.
Both the model and the dataset are hosted on Hugging Face.


##Installs and imports

In [1]:
#all installs
!pip install -q -U trl accelerate git+https://github.com/huggingface/peft.git
!pip install transformers==4.34.0
!pip install huggingface_hub==0.18.0
!pip install -q datasets bitsandbytes einops wandb


#all imports
import torch
import time
from huggingface_hub import notebook_login
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, GenerationConfig
from peft import LoraConfig, get_peft_model, PeftConfig, PeftModel, prepare_model_for_kbit_training, TaskType
from transformers import TrainingArguments
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

#ignore warnings
import warnings
warnings.filterwarnings("ignore")

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting huggingface_hub==0.18.0
  Using cached huggingface_hub-0.18.0-py3-none-any.whl (301 kB)
Installing collected packages: huggingface_hub
  Attempting uninstall: huggingface_hub
    Found existing installation: huggingface-hub 0.17.3
    Uninstalling huggingface-hub-0.17.3:
      Successfully uninstalled huggingface-hub-0.17.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tokenizers 0.14.1 requires huggingface_hub<0.18,>=0.16.4, but you have huggingface-hub 0.18.0 which is incompatible.
Successfully installed huggingface_hub-0.18.0




##Notebook connection to Hugging Face

In [2]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: write).
Your token has been saved to /roo

In [3]:
notebook_login()


##Loading the dataset from Hugging Face and formatting the training dataset

---



In [4]:
!pip install py7zr



In [5]:
dataset_name = "samsum"
dataset = load_dataset(dataset_name)

train_dataset = dataset['train']
eval_dataset = dataset['validation']
test_dataset = dataset['test']
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 14732
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 819
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary'],
        num_rows: 818
    })
})

##Loading the model and Setting up bitsandbytes config

We will use a sharded version of the Falcon-7b model, loaded in 4-bit precision through bitsandbytes.


In [6]:
model_name = "vilsonrodrigues/falcon-7b-sharded"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model.config.use_cache = False

Loading checkpoint shards:   0%|          | 0/15 [00:00<?, ?it/s]
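As a rough illustration of what the 4-bit settings above buy, the sketch below estimates the storage cost per quantized weight with and without double quantization. The block sizes (64 weights per block, 256 constants per second-level block) and the 7-billion parameter count are assumptions for the arithmetic, not values read from this notebook, and layers that stay in higher precision are ignored.

```python
# Back-of-the-envelope storage cost per quantized weight.
QUANT_BITS = 4             # load_in_4bit=True
BLOCK_SIZE = 64            # assumption: weights sharing one quantization constant
SECOND_LEVEL_BLOCK = 256   # assumption: constants sharing one second-level constant


def bits_per_param(double_quant: bool) -> float:
    """Return the approximate number of bits stored per quantized weight."""
    if double_quant:
        # 8-bit constant per block, plus one 32-bit constant per 256 blocks
        overhead = 8 / BLOCK_SIZE + 32 / (BLOCK_SIZE * SECOND_LEVEL_BLOCK)
    else:
        # one 32-bit constant per block
        overhead = 32 / BLOCK_SIZE
    return QUANT_BITS + overhead


N_PARAMS = 7e9  # assumption: order-of-magnitude size of the model

for label, bits in [
    ("fp16", 16.0),
    ("4-bit", bits_per_param(double_quant=False)),
    ("4-bit + double quant", bits_per_param(double_quant=True)),
]:
    print(f"{label:>22}: {bits:.3f} bits/param, ~{N_PARAMS * bits / 8 / 1e9:.1f} GB")
```

Under these assumptions the quantized weights take roughly a quarter of the fp16 footprint, which is what makes fine-tuning on a single Colab GPU plausible.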

##Loading the tokenizer

In [7]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

##Setting up the LoRA config

In [8]:
model = prepare_model_for_kbit_training(model)

lora_alpha = 32 #16
lora_dropout = 0.05 #0.1
lora_rank = 32 #64

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_rank,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ]
)

peft_model = get_peft_model(model, peft_config)
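To get a feel for how many weights this LoRA configuration trains, the sketch below counts the adapter parameters by hand. Each adapted linear layer with `in` inputs and `out` outputs gains `r * (in + out)` parameters. The Falcon-7b dimensions used here (hidden size 4544, head dimension 64, 32 layers, multi-query attention) are assumptions that are not stated in this notebook.

```python
# Assumed Falcon-7b dimensions (not taken from this notebook).
HIDDEN = 4544
HEAD_DIM = 64
N_LAYERS = 32
QKV_OUT = HIDDEN + 2 * HEAD_DIM  # multi-query attention: one shared key and one shared value head

LORA_RANK = 32
LORA_ALPHA = 32

# (in_features, out_features) of each targeted module, per transformer layer.
target_shapes = {
    "query_key_value": (HIDDEN, QKV_OUT),
    "dense": (HIDDEN, HIDDEN),
    "dense_h_to_4h": (HIDDEN, 4 * HIDDEN),
    "dense_4h_to_h": (4 * HIDDEN, HIDDEN),
}


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) next to the frozen weight."""
    return rank * (in_features + out_features)


per_layer = sum(lora_param_count(i, o, LORA_RANK) for i, o in target_shapes.values())
total = per_layer * N_LAYERS
scaling = LORA_ALPHA / LORA_RANK  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total:,}")
print(f"size at 4 bytes per parameter: {total * 4 / 1e6:.0f} MB")
print(f"scaling factor alpha / r: {scaling}")
```

With these assumed dimensions the count comes to about 65 million parameters, or roughly 261 MB at 4 bytes each, which is in line with the size of the `adapter_model.safetensors` download shown in the inference section.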

##Setting up the training arguments

In [9]:
output_dir = "falcon_7b_LoRA_dialogue_summarization"
per_device_train_batch_size = 16 #4
gradient_accumulation_steps = 4
optim = "paged_adamw_32bit"
save_steps = 10
logging_steps = 10
learning_rate = 2e-4
max_grad_norm = 0.3
max_steps = 600
warmup_ratio = 0.03
lr_scheduler_type = "cosine" #"constant"

training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
    push_to_hub=True,
    report_to="wandb"
)
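The learning-rate settings above can be visualized with a small stand-alone sketch. It assumes the `cosine` scheduler follows the usual shape of a linear warmup followed by a half-cosine decay to zero, and that the number of warmup steps is the ceiling of `warmup_ratio * max_steps`. The function `lr_at` is a hypothetical helper, not part of any library.

```python
import math

LEARNING_RATE = 2e-4
MAX_STEPS = 600
WARMUP_RATIO = 0.03

# Assumption: warmup steps are rounded up from the ratio.
warmup_steps = math.ceil(MAX_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to the peak rate, then half-cosine decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    progress = (step - warmup_steps) / (MAX_STEPS - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print("warmup steps:", warmup_steps)
for step in (0, 9, 18, 300, 600):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```

Under these assumptions the rate peaks at step 18 and decays smoothly for the remaining steps, so most of the run trains below the nominal `2e-4`.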

##Passing arguments to the SFT trainer

In [10]:
def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['id'])):
        prompt= f"""
         ### Summarize the conversation below

         ### Dialogue:
         {example['dialogue'][i]}

         ### Summary:
         {example['summary'][i]}
        """
        output_texts.append(prompt)
    return output_texts
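The formatting function receives a batch as a dictionary of lists, not a single row. The sketch below restates the function so that it runs on its own and calls it on a toy batch; the two dialogues and summaries are made up for illustration. Printing the `repr` makes it visible that the indentation inside the f-string becomes part of the training text.

```python
def formatting_prompts_func(example):
    # Same logic as the cell above, repeated here so the sketch is self-contained.
    output_texts = []
    for i in range(len(example['id'])):
        prompt = f"""
         ### Summarize the conversation below

         ### Dialogue:
         {example['dialogue'][i]}

         ### Summary:
         {example['summary'][i]}
        """
        output_texts.append(prompt)
    return output_texts


# Made-up toy batch in the same column layout as samsum.
toy_batch = {
    "id": ["0", "1"],
    "dialogue": ["A: hi\nB: hello", "C: lunch?\nD: sure"],
    "summary": ["A and B greet each other.", "C and D will have lunch."],
}

prompts = formatting_prompts_func(toy_batch)
print(len(prompts))
print(repr(prompts[0]))
```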

In [11]:
max_seq_length = 256

trainer = SFTTrainer(
    model=peft_model,
    train_dataset=train_dataset,
    formatting_func=formatting_prompts_func,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_seq_length=max_seq_length,
    args=training_arguments,
)

In [12]:
# cast the layer norms to torch.bfloat16 for more stable training
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module.to(torch.bfloat16)  # nn.Module.to() converts the module in place

##Train the model

Record the start time before training so that the run time can be compared across experiments.

In [13]:
import os
import time
import wandb
os.environ["WANDB_SILENT"] = "true"
start = time.time()
wandb.init(project="falcon-7b-peft-lora-dialogue-summarization")

In [14]:
peft_model.config.use_cache = False
trainer.train()

You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss
10,2.2901
20,1.9514
30,1.7617
40,1.6496
50,1.4871
60,2.2138
70,1.8239
80,1.7032
90,1.6138
100,1.4594


TrainOutput(global_step=600, training_loss=1.6835430892308554, metrics={'train_runtime': 3397.2105, 'train_samples_per_second': 11.303, 'train_steps_per_second': 0.177, 'total_flos': 2.693804210213038e+17, 'train_loss': 1.6835430892308554, 'epoch': 2.61})
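The `epoch`, `train_samples_per_second` and `train_steps_per_second` values in the `TrainOutput` above can be sanity-checked from the batch settings and the dataset size. The sketch below assumes training ran on a single GPU.

```python
per_device_train_batch_size = 16
gradient_accumulation_steps = 4
max_steps = 600
num_train_rows = 14732       # size of the samsum train split shown earlier
train_runtime_s = 3397.2105  # 'train_runtime' from the TrainOutput
n_devices = 1                # assumption: one GPU

# Samples consumed per optimizer step and over the whole run.
samples_per_step = per_device_train_batch_size * gradient_accumulation_steps * n_devices
samples_seen = samples_per_step * max_steps

epochs = samples_seen / num_train_rows
samples_per_second = samples_seen / train_runtime_s
steps_per_second = max_steps / train_runtime_s

print(f"samples per optimizer step: {samples_per_step}")
print(f"samples seen: {samples_seen}")
print(f"epochs: {epochs:.2f}")
print(f"samples/s: {samples_per_second:.3f}, steps/s: {steps_per_second:.3f}")
```

The results agree with the reported metrics: about 2.61 passes over the training split at roughly 11.3 samples per second.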

##Save the model

In [15]:
#trainer.save_model(output_dir) #uncomment if you want to save your model locally

##Push to hub

In [16]:
trainer.push_to_hub()

'https://huggingface.co/bearbearyu1223/falcon_7b_LoRA_dialogue_summarization/tree/main/'

##Inference

In [17]:
# Loading PEFT model
PEFT_MODEL = "bearbearyu1223/"+ output_dir
config = PeftConfig.from_pretrained(PEFT_MODEL)
peft_base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

peft_model = PeftModel.from_pretrained(peft_base_model, PEFT_MODEL)

peft_tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
peft_tokenizer.pad_token = peft_tokenizer.eos_token

adapter_config.json:   0%|          | 0.00/636 [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/15 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

In [49]:
# Generate a summary from the fine-tuned (PEFT) model
def get_response(dialogue, max_new_tokens=10):
  prompt= f"""
         ### Summarize the conversation below

         ### Dialogue:
         {dialogue}

         ### Summary:

        """
  peft_encoding = peft_tokenizer(prompt, return_tensors="pt").to(torch.device("cuda:0"))
  generation_config = GenerationConfig(max_new_tokens=max_new_tokens, pad_token_id=peft_tokenizer.eos_token_id,
                                       eos_token_id=peft_tokenizer.eos_token_id,
                                       temperature=0.1, top_p=0.5, repetition_penalty=10.0, num_return_sequences=1)
  # attention_mask is a model input, so pass it to generate() directly rather than through GenerationConfig
  peft_outputs = peft_model.generate(input_ids=peft_encoding.input_ids, attention_mask=peft_encoding.attention_mask,
                                     generation_config=generation_config)
  peft_text_output = peft_tokenizer.decode(peft_outputs[0], skip_special_tokens=True)
  return peft_text_output

In [50]:
import re

def remove_incomplete_sentences(text):

  # Match the text from the start up to its last sentence terminator (., ? or !).
  # If the first line has no terminator, fall back to the leading unterminated text.
  # "." does not match newlines, so the first alternative only looks at the first line.
  complete_sentence_regex = re.compile(r"(^.*[\.\?!]|^\S[^.\?!]*)")

  # Collect the matched text.
  complete_sentences = complete_sentence_regex.findall(text)
  text = " ".join(complete_sentences)

  # Return the text with the trailing incomplete sentence removed.
  return text
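A few quick calls show how the helper behaves on different inputs. The sketch restates the function with the same regular expression so that it runs on its own; the input strings are made up for illustration.

```python
import re


def remove_incomplete_sentences(text):
    # Same regular expression as in the cell above.
    pattern = re.compile(r"(^.*[\.\?!]|^\S[^.\?!]*)")
    return " ".join(pattern.findall(text))


# A trailing fragment after the last terminator is dropped.
print(remove_incomplete_sentences("Hansel will tell his sis. She is angry. Hansel thinks that"))

# Text without any terminator is returned unchanged.
print(remove_incomplete_sentences("no terminator here"))

# Only the first line is kept when the text spans several lines.
print(remove_incomplete_sentences("First line.\nSecond line. Third"))
```

The last case is worth keeping in mind: because the pattern is anchored at the start of the string and `.` does not cross newlines, a multi-line summary is cut down to its first line.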


In [82]:
import transformers

test_index=550
dialogue=test_dataset[test_index]['dialogue']
summary=test_dataset[test_index]['summary']
condense_rate=0.8
max_new_tokens=int(len(dialogue.strip().split())*condense_rate)

peft_output=get_response(dialogue, max_new_tokens)
sub = "### Summary:"
peft_summary = peft_output.split(sub)[1]
post_processed_peft_model_summary = remove_incomplete_sentences(peft_summary.strip())

pipeline = transformers.pipeline(
    "text-generation",
    model=peft_base_model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
# Note: formatting_prompts_func appends the reference summary, so this prompt already contains it
sequences = pipeline(
    formatting_prompts_func(test_dataset)[test_index],
    max_new_tokens=max_new_tokens,
    temperature=0.1, top_p=0.5,  repetition_penalty=10.0, num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)
base_summary = sequences[0]['generated_text'].split(sub)[1]
post_processed_base_model_summary = remove_incomplete_sentences(base_summary.strip())
dash_line = '-'.join('' for x in range(100))
print(dash_line)
print('BASELINE HUMAN SUMMARY:')
print(summary)
print(dash_line)
print('BASE MODEL SUMMARY:')
print(sequences[0]['generated_text'])
print(dash_line)
print('PEFT MODEL SUMMARY')
print(post_processed_peft_model_summary)

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
Hansel will tell his sis to text Jeremih back.
---------------------------------------------------------------------------------------------------
BASE MODEL SUMMARY:

         ### Summarize the conversation below

         ### Dialogue:
         Jeremih: hey, tell your sis to text back
Hansel: haha, thats your issues bro, dont drag me into it
Jeremih: she's mad at me
Hansel: for what
Jeremih: i dont even know😔
Hansel:😢😂
Jeremih: youre laughing
Hansel: haha, ill tell her but next time i wont interfere
Jeremih: Okay bro, thanks

         ### Summary:
         Hansel will tell his sis to text Jeremih back.
         She is angry with him and he doesn't know why. 
         He won't interfere in their relationship anymore.
         Jeremih wants to talk to her again.
         Hansel thinks that
----------------------------------------------------------
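The two small post-processing steps in the comparison cell, sizing the generation budget from the dialogue length and cutting the summary out of the generated text, can be isolated as below. The helper names `new_token_budget` and `extract_summary` are hypothetical, and `str.partition` is used here instead of `split(...)[1]` so that a missing marker yields an empty string rather than an `IndexError`.

```python
def new_token_budget(dialogue: str, condense_rate: float = 0.8) -> int:
    """Cap generation at a fraction of the dialogue's whitespace-separated word count."""
    return int(len(dialogue.strip().split()) * condense_rate)


def extract_summary(generated_text: str, marker: str = "### Summary:") -> str:
    """Return the text after the first marker, or an empty string if it is absent."""
    _, found, tail = generated_text.partition(marker)
    return tail.strip() if found else ""


# Made-up inputs for illustration.
print(new_token_budget("a b c d e f g h i j"))
print(extract_summary("### Dialogue:\nA: hi\n\n### Summary:\n  A greets B."))
print(repr(extract_summary("text without the marker")))
```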