# Fine-Tuning LLMs with Hugging Face

## Step 1: Installing and importing the libraries

In [1]:
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7


In [None]:
!pip install huggingface_hub
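
Because the install cell pins exact versions, it can help to confirm what is actually installed before continuing. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the install cell in Step 1
PINNED = {
    "accelerate": "0.21.0",
    "peft": "0.4.0",
    "bitsandbytes": "0.40.2",
    "transformers": "4.31.0",
    "trl": "0.4.7",
}

def installed_versions(packages):
    """Return {package: installed version, or None if the package is missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, installed in installed_versions(PINNED).items():
    status = "ok" if installed == PINNED[name] else "differs from pin"
    print(f"{name}: installed={installed}, pinned={PINNED[name]} ({status})")
```

A mismatch is not necessarily an error, but it is worth knowing about, since the APIs of these libraries change between releases.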



In [2]:
import torch
from trl import SFTTrainer
from peft import LoraConfig
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          TrainingArguments, pipeline)

## Step 2: Loading the model

In [3]:
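# Load the base model with 4-bit NF4 quantization to reduce GPU memory use
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path="aboonaji/llama2finetune-v2",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_quant_type="nf4",
    ),
)
model.config.use_cache = False   # the key/value cache is not needed during training
model.config.pretraining_tp = 1  # use the standard (non-sliced) linear-layer computation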

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
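
To see why the model is loaded in 4-bit precision, a rough estimate of the weight memory helps. The sketch below assumes about 7 billion parameters (an assumption; the notebook does not state the parameter count) and ignores quantization constants, activations, and optimizer state.

```python
# Rough weight-memory estimate; N_PARAMS is an assumed value, not taken from the notebook
N_PARAMS = 7_000_000_000

def weight_memory_gb(n_params, bits_per_param):
    """Memory needed for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # half precision
nf4_gb = weight_memory_gb(N_PARAMS, 4)    # 4-bit NF4 storage

print(f"float16 weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:   {nf4_gb:.1f} GB")
```

Under these assumptions the weights shrink by a factor of four, which leaves more GPU memory for activations and the LoRA adapters.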



## Step 3: Loading the tokenizer

In [6]:
tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path="aboonaji/llama2finetune-v2",
    trust_remote_code=True,
)
tokenizer.pad_token = tokenizer.eos_token  # reuse the EOS token for padding
tokenizer.padding_side = "right"
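
Setting `padding_side` to `"right"` means that shorter sequences in a batch are extended at the end with the padding token. The following is a plain-Python sketch of the idea; the function `pad_right` and the token IDs are illustrative and do not reproduce the tokenizer's actual implementation.

```python
def pad_right(sequences, pad_id):
    """Pad token-ID lists on the right to a common length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 0 marks padding
    return input_ids, attention_mask

# Made-up token IDs; 2 stands in for the EOS token reused as padding
ids, mask = pad_right([[1, 15, 27, 9], [1, 42]], pad_id=2)
print(ids)   # [[1, 15, 27, 9], [1, 42, 2, 2]]
print(mask)  # [[1, 1, 1, 1], [1, 1, 0, 0]]
```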



## Step 4: Setting the training arguments

In [7]:
training_arguments = TrainingArguments(output_dir="./results",
                                       per_device_train_batch_size=4, max_steps=100)

## Step 5: Creating the Supervised Fine-Tuning trainer

In [12]:
sft_trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=load_dataset(path="aboonaji/wiki_medical_terms_llam2_format", split="train"),
    tokenizer=tokenizer,
    peft_config=LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM"),
    dataset_text_field="text",
)
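
The `LoraConfig` values determine how many parameters are trained. For a weight matrix of shape `d_out x d_in`, LoRA adds two low-rank matrices with `r * (d_in + d_out)` parameters in total and scales their product by `lora_alpha / r`. The sketch below assumes a hidden size of 4096, 32 layers, and two adapted square projection matrices per layer; these model dimensions are assumptions for a 7B Llama-style model and are not stated in the notebook.

```python
# Values from the LoraConfig in Step 5
R = 64
LORA_ALPHA = 16

# Assumed model dimensions (not stated in the notebook)
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
ADAPTED_MATRICES_PER_LAYER = 2  # assumption: two square projections per layer

def lora_params(d_in, d_out, r):
    """Parameters in the low-rank pair A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

per_matrix = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, R)
total = per_matrix * ADAPTED_MATRICES_PER_LAYER * NUM_LAYERS
scaling = LORA_ALPHA / R

print(f"LoRA scaling factor: {scaling}")
print(f"Trainable parameters per matrix: {per_matrix:,}")
print(f"Trainable parameters in total:   {total:,}")
```

Under these assumptions only a few tens of millions of parameters are updated, a small fraction of the full model.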



Map:   0%|          | 0/6861 [00:00<?, ? examples/s]

## Step 6: Training the model

In [13]:
sft_trainer.train()

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.




TrainOutput(global_step=100, training_loss=1.6548727416992188, metrics={'train_runtime': 1446.394, 'train_samples_per_second': 0.277, 'train_steps_per_second': 0.069, 'total_flos': 8228119310991360.0, 'train_loss': 1.6548727416992188, 'epoch': 0.06})
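
The reported metrics can be cross-checked with a little arithmetic. The sketch below uses the numbers from the `TrainOutput` above together with the batch size of 4 from Step 4 and the 6861 examples reported when the dataset was mapped in Step 5; it assumes a single device and no gradient accumulation.

```python
# Numbers copied from the notebook
MAX_STEPS = 100
BATCH_SIZE = 4            # per_device_train_batch_size
DATASET_SIZE = 6861       # examples reported when the dataset was mapped
TRAIN_RUNTIME_S = 1446.394

samples_seen = MAX_STEPS * BATCH_SIZE  # assumes one device, no gradient accumulation
epochs = samples_seen / DATASET_SIZE
samples_per_second = samples_seen / TRAIN_RUNTIME_S
steps_per_second = MAX_STEPS / TRAIN_RUNTIME_S

print(f"samples seen:       {samples_seen}")
print(f"fraction of epoch:  {epochs:.2f}")
print(f"samples per second: {samples_per_second:.3f}")
print(f"steps per second:   {steps_per_second:.3f}")
```

These values agree with `'epoch': 0.06`, `'train_samples_per_second': 0.277`, and `'train_steps_per_second': 0.069` in the output, so the run covered only about 6% of the dataset.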

## Step 7: Chatting with the model

In [16]:
user_prompt = "Please tell me about sclera"
text_generation_pipeline = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=300)
# Wrap the prompt in the [INST] ... [/INST] instruction format before generating
model_answer = text_generation_pipeline(f"<s>[INST] {user_prompt} [/INST]")
print(model_answer[0]["generated_text"])



<s>[INST] Please tell me about sclera [/INST]  The sclera is the white, tough outer layer of the eyeball. everybody has one. It is the largest part of the eyeball, making up about 80% of the eye's volume. The sclera is a tough, protective layer that helps protect the eye from injury and infection. It is also a flexible layer that helps the eye move and change shape as you look around.

The sclera is made up of several layers of cells, including:

1. The outermost layer is called the sclera proper. It is made up of a tough protein called collagen.
2. Underneath the sclera proper is a layer of connective tissue called the scleral stroma. This layer provides support and structure to the sclera.
3. The innermost layer of the sclera is called the sclera basement membrane. This layer is where the sclera attaches to the underlying tissue.

The sclera has several important functions, including:

1. Protection: The sclera provides a tough outer layer that helps protect the eye from injury and i
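
The pipeline returns the prompt together with the completion, and the answer above stops mid-sentence, presumably because of the `max_length=300` limit. The following is a small sketch of helpers for building the prompt and separating the answer from it; `build_prompt` and `extract_answer` are hypothetical names, not part of the notebook or of any library.

```python
INST_OPEN = "[INST]"
INST_CLOSE = "[/INST]"

def build_prompt(user_prompt):
    """Wrap a user message in the instruction format used in Step 7."""
    return f"<s>{INST_OPEN} {user_prompt} {INST_CLOSE}"

def extract_answer(generated_text):
    """Return the text after the last [/INST] marker, stripped of whitespace."""
    _, _, answer = generated_text.rpartition(INST_CLOSE)
    return answer.strip()

prompt = build_prompt("Please tell me about sclera")
print(prompt)  # <s>[INST] Please tell me about sclera [/INST]

# Example with a made-up completion appended to the prompt
print(extract_answer(prompt + "  The sclera is the white outer layer of the eyeball."))
```

In the notebook, `extract_answer(model_answer[0]["generated_text"])` would be the corresponding call.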

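## Notes

For Steps 1 and 2:

1. `-q`: pip's quiet flag; it reduces the amount of installation output.
2. `accelerate`: Hugging Face library that handles device placement and mixed-precision or distributed execution.
3. `peft`: Parameter-Efficient Fine-Tuning library; it provides `LoraConfig` and other adapter methods.
4. `bitsandbytes`: provides the 8-bit and 4-bit quantization used when loading the model.
5. `transformers`: Hugging Face library for pretrained models, tokenizers, and pipelines.
6. `trl`: Transformer Reinforcement Learning library; it provides `SFTTrainer`.
7. `huggingface_hub`: client library for the Hugging Face Hub (downloads and authentication).
8. `torch`: PyTorch, the underlying deep learning framework.
9. `SFTTrainer`: trainer class for supervised fine-tuning.
10. `LoraConfig`: configuration object for LoRA adapters (rank, scaling, dropout).
11. `load_dataset`: loads a dataset from the Hugging Face Hub or from local files.
12. `AutoModelForCausalLM`: loads a causal (next-token) language model from a checkpoint.
13. `AutoTokenizer`: loads the tokenizer that matches a checkpoint.
14. `BitsAndBytesConfig`: quantization settings passed to `from_pretrained`.
15. `TrainingArguments`: container for training hyperparameters such as batch size and number of steps.
16. `pipeline`: high-level wrapper for inference tasks such as text generation.
17. `model.config.use_cache`: controls whether past key/value states are cached; set to `False` for training.
18. `model.config.pretraining_tp`: tensor-parallelism setting from pretraining; `1` selects the standard linear-layer computation.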



