<a href="https://colab.research.google.com/github/amanpreetsingh459/llms-generative-ai/blob/main/llama2_LoRA_Fine_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7


In [4]:
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

In [13]:
# load dataset
dataset_name = "mlabonne/guanaco-llama2-1k"
dataset = load_dataset(dataset_name, split="train")
dataset

Dataset({
    features: ['text'],
    num_rows: 1000
})

In [22]:
# Pre-trained model name and fine-tuned model name
base_model_name = "NousResearch/Llama-2-7b-chat-hf"
fine_tuned_model = "llama-2-7b-finetuned"

In [23]:
# Tokenizer
llama_tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
llama_tokenizer.pad_token = llama_tokenizer.eos_token
llama_tokenizer.padding_side = "right"

In [24]:
print(llama_tokenizer.pad_token)

</s>


In [25]:
# Quantization Config
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False
)
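
A rough back-of-the-envelope estimate of why 4-bit loading matters for this model. The sketch assumes about 7 billion weights (a round figure for a "7b" model, not an exact count) and ignores quantization constants, activations, and any layers kept in higher precision, so the numbers are approximate.

```python
# Approximate memory needed for the weights alone at different precisions.
# ASSUMPTION: ~7e9 parameters (round figure, not an exact count).
N_PARAMS = 7e9

def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

for label, bits in [("float32", 32), ("float16", 16), ("nf4 (4-bit)", 4)]:
    print(f"{label:>12}: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions the 4-bit weights take roughly a quarter of the float16 footprint, which is what lets the model fit on a single Colab GPU.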

In [9]:
# Load the pre-trained model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=quant_config,
    device_map={"": 0}
)
base_model.config.use_cache = False
base_model.config.pretraining_tp = 1


In [10]:
# LoRA Config
peft_parameters = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=8,
    bias="none",
    task_type="CAUSAL_LM"
)
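
To get a feel for how small the trainable part is, here is a rough parameter count for the LoRA config above. It rests on assumptions the notebook does not state: that the adapters attach to `q_proj` and `v_proj` in every decoder layer (which I believe is the peft default for Llama when `target_modules` is not set), and that the 7B model has 32 layers with 4096x4096 projections.

```python
# Rough count of LoRA trainable parameters for r=8, lora_alpha=16.
# ASSUMPTIONS (not stated in the notebook): adapters on q_proj and v_proj
# in every decoder layer, 32 layers, each projection 4096 x 4096.
R = 8
LORA_ALPHA = 16
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
ADAPTED_MODULES_PER_LAYER = 2  # q_proj and v_proj

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """A is (r x d_in) and B is (d_out x r), so r * (d_in + d_out) weights."""
    return r * (d_in + d_out)

trainable = (NUM_LAYERS * ADAPTED_MODULES_PER_LAYER
             * lora_params(HIDDEN_SIZE, HIDDEN_SIZE, R))
scaling = LORA_ALPHA / R  # the low-rank update B @ A is multiplied by this

print(f"trainable LoRA parameters: {trainable:,}")
print(f"scaling factor (lora_alpha / r): {scaling}")
```

Under these assumptions only a few million weights are updated, a tiny fraction of the roughly 7 billion in the frozen base model.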

In [10]:
# Training Params
train_params = TrainingArguments(
    output_dir="./results_modified",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=25,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant"
)
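
A quick sanity check on how many optimizer steps these settings imply. This is a sketch, assuming a single GPU and one training example per dataset row (no packing), neither of which the notebook states explicitly.

```python
import math

# Values copied from the cells above.
NUM_ROWS = 1000          # rows in the dataset
PER_DEVICE_BATCH = 4     # per_device_train_batch_size
GRAD_ACCUM = 1           # gradient_accumulation_steps
EPOCHS = 1               # num_train_epochs
SAVE_STEPS = 25          # save_steps
WARMUP_RATIO = 0.03      # warmup_ratio
NUM_DEVICES = 1          # ASSUMPTION: single GPU (device_map={"": 0})

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
steps_per_epoch = math.ceil(NUM_ROWS / effective_batch)
total_steps = steps_per_epoch * EPOCHS
num_checkpoints = total_steps // SAVE_STEPS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

print(f"effective batch size: {effective_batch}")
print(f"total optimizer steps: {total_steps}")
print(f"checkpoints written: {num_checkpoints}")
print(f"warmup steps implied by warmup_ratio: {warmup_steps}")
```

As far as I know, the `"constant"` scheduler does not apply warmup at all (`"constant_with_warmup"` is the variant that does), so `warmup_ratio` may have no effect with these settings.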

In [14]:
# Trainer to train the model
fine_tuning = SFTTrainer(
    model=base_model,
    train_dataset=dataset,
    peft_config=peft_parameters,
    dataset_text_field="text",
    tokenizer=llama_tokenizer,
    args=train_params
)


In [15]:
# Training starts here
fine_tuning.train()

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss
25,1.3453
50,1.6176
75,1.206
100,1.438
125,1.173
150,1.36
175,1.1689
200,1.4581
225,1.1514
250,1.5283




TrainOutput(global_step=250, training_loss=1.344664909362793, metrics={'train_runtime': 1660.126, 'train_samples_per_second': 0.602, 'train_steps_per_second': 0.151, 'total_flos': 8679674339426304.0, 'train_loss': 1.344664909362793, 'epoch': 1.0})
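
The logged loss does not fall steadily; it alternates between higher and lower logging windows. The sketch below only re-uses the numbers printed above: it checks that their mean matches the reported `training_loss` and compares odd and even windows. One possible contributor is `group_by_length=True`, which batches samples of similar length, but that is a guess and not something this notebook verifies.

```python
# Logged training losses, copied from the table above (one value per 25 steps).
logged = [1.3453, 1.6176, 1.206, 1.438, 1.173,
          1.36, 1.1689, 1.4581, 1.1514, 1.5283]
reported = 1.344664909362793  # training_loss from TrainOutput

mean_logged = sum(logged) / len(logged)
odd_windows = logged[0::2]   # 1st, 3rd, 5th, ... logging windows
even_windows = logged[1::2]  # 2nd, 4th, 6th, ... logging windows
mean_odd = sum(odd_windows) / len(odd_windows)
mean_even = sum(even_windows) / len(even_windows)

print(f"mean of logged losses: {mean_logged:.4f} (reported: {reported:.4f})")
print(f"mean of odd windows:   {mean_odd:.4f}")
print(f"mean of even windows:  {mean_even:.4f}")
```

With a single epoch over 1,000 samples, it is hard to read much into the curve beyond the fact that the average loss stays around 1.34.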

In [16]:
# Save the fine-tuned Model
fine_tuning.model.save_pretrained(fine_tuned_model)

In [1]:
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline)

In [2]:
base_model_name = "NousResearch/Llama-2-7b-chat-hf"
fine_tuned_model = "llama-2-7b-finetuned"

In [3]:
llama_tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
llama_tokenizer.pad_token = llama_tokenizer.eos_token
llama_tokenizer.padding_side = "right"

In [11]:
# Generate some text with the base model
query = "why should someone be writing blogs about their field?"
text_gen = pipeline(task="text-generation", model=base_model, tokenizer=llama_tokenizer, max_length=200)
output = text_gen(f"<s>[INST] {query} [/INST]")
print(output[0]['generated_text'])

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


<s>[INST] why should someone be writing blogs about their field? [/INST]  There are several reasons why someone might want to write blogs about their field:
 nobody knows everything, and sharing knowledge and insights can help establish oneself as an expert in their field.

1. Establish oneself as an expert: By consistently producing high-quality content, one can demonstrate their expertise and establish themselves as a thought leader in their industry.

2. Build credibility: Sharing valuable insights and information can help build credibility with potential clients, customers, or employers.

3. Networking opportunities: Writing a blog can provide opportunities to connect with other professionals in one's field, potentially leading to new business opportunities or collaborations.

4. Personal fulfillment: Writing about something one is passionate about can be a fulfilling hobby or creative outlet.
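
The pipeline output echoes the prompt, so the answer has to be separated from the `[INST] ... [/INST]` wrapper. Below is a minimal sketch of two helpers for that; `build_prompt` and `extract_answer` are hypothetical names and not part of transformers. The tokenizer may also add its own BOS token, so the literal `<s>` could be redundant; that is not checked here.

```python
# Hypothetical helpers (not part of transformers or of this notebook).
INST_OPEN, INST_CLOSE = "[INST]", "[/INST]"

def build_prompt(query: str) -> str:
    """Wrap a single user turn in the same markers used in the cell above."""
    return f"<s>{INST_OPEN} {query.strip()} {INST_CLOSE}"

def extract_answer(generated_text: str) -> str:
    """Return only the text that follows the last [/INST] marker."""
    return generated_text.rsplit(INST_CLOSE, 1)[-1].strip()

prompt = build_prompt("why should someone be writing blogs about their field?")
print(prompt)
print(extract_answer(prompt + "  There are several reasons."))
```

With these helpers, the generation call could take `build_prompt(query)` as input and print `extract_answer(output[0]['generated_text'])`, which also makes the base and fine-tuned outputs easier to compare.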



In [20]:
from peft import PeftModel, PeftConfig, get_peft_model, LoraConfig

In [21]:
# Read the adapter config from the local directory saved earlier (a Hub repo id also works)
config = PeftConfig.from_pretrained(fine_tuned_model)

In [26]:
# Load the pre-trained model one more time (the runtime was disconnected)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=quant_config,
    device_map={"": 0}
)
base_model.config.use_cache = True  # caching was disabled for training only; generation benefits from it
base_model.config.pretraining_tp = 1


In [27]:
#llama_tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, fine_tuned_model)

In [28]:
# let's generate some text from the fine-tuned model
query = "why should someone be writing blogs about their field?"
text_gen = pipeline(task="text-generation", model=model, tokenizer=llama_tokenizer, max_length=200)
output = text_gen(f"<s>[INST] {query} [/INST]")
print(output[0]['generated_text'])

The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are [...] (list truncated in the original output)

<s>[INST] why should someone be writing blogs about their field? [/INST] There are several reasons why someone might want to write blogs about their field:

1. To share their expertise and knowledge with others. By writing blogs, they can help to educate and inform people about their field, and provide valuable insights and information.
2. To establish themselves as an authority in their field. By consistently producing high-quality blogs, they can demonstrate their expertise and build their reputation as a thought leader in their field.
3. To build their personal brand. By writing blogs, they can establish themselves as an expert in their field, and build their personal brand.
4. To generate leads and business opportunities. By writing blogs, they can attract potential customers and generate leads for their business.
5. To build a community around their field. By writing blogs, they can build a community of people who are
