# Fine-Tune LLMs with Hugging Face

## Step 1: Installing and importing the libraries

In [1]:
!pip uninstall accelerate peft bitsandbytes transformers trl -y
!pip install accelerate peft==0.13.2 bitsandbytes transformers trl==0.12.0

Found existing installation: accelerate 1.11.0
Uninstalling accelerate-1.11.0:
  Successfully uninstalled accelerate-1.11.0
Found existing installation: peft 0.17.1
Uninstalling peft-0.17.1:
  Successfully uninstalled peft-0.17.1
Found existing installation: transformers 4.57.1
Uninstalling transformers-4.57.1:
  Successfully uninstalled transformers-4.57.1
Collecting accelerate
  Downloading accelerate-1.11.0-py3-none-any.whl.metadata (19 kB)
Collecting peft==0.13.2
  Downloading peft-0.13.2-py3-none-any.whl.metadata (13 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.48.2-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting transformers
  Downloading transformers-4.57.1-py3-none-any.whl.metadata (43 kB)
Collecting trl==0.12.0
  Downloading trl-0.12.0-py3-none-any.whl.metadata (10 kB)
Downloading peft-0.13.2-py3-none-any.whl (320 kB)


In [2]:
!pip install huggingface_hub



In [3]:
import torch
from trl import SFTTrainer
from peft import LoraConfig
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, pipeline)

## Step 2: Loading the model

In [4]:
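# Load the model with 4-bit NF4 quantized weights; computation runs in float16.
llama_model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path = "aboonaji/llama2finetune-v2",
                                                   quantization_config = BitsAndBytesConfig(load_in_4bit = True,
                                                                                            bnb_4bit_compute_dtype = torch.float16,
                                                                                            bnb_4bit_quant_type = "nf4"))
llama_model.config.use_cache = False   # the key/value cache only helps at inference time
llama_model.config.pretraining_tp = 1  # use the standard, non-sliced linear layer computation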

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
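
For a rough sense of what `load_in_4bit = True` saves, the sketch below estimates the weight footprint. It assumes the checkpoint holds about 13.48 GB of 16-bit weights (the combined size of the two safetensors shards, 9.98 GB and 3.50 GB, fetched in this run). The helper `estimate_weights` is hypothetical, and the figure ignores quantization constants, layers that stay unquantized, activations, and optimizer state, so treat it as a lower bound rather than a measurement.

```python
def estimate_weights(checkpoint_bytes, stored_bits=16, target_bits=4):
    """Return (parameter count, weight size in GB at target_bits).

    Assumes every parameter is stored with `stored_bits` bits in the
    checkpoint and would be held with `target_bits` bits after quantization.
    """
    params = checkpoint_bytes * 8 / stored_bits
    target_gb = params * target_bits / 8 / 1e9
    return params, target_gb


# Combined shard size from this run (assumed to be 16-bit weights).
checkpoint_bytes = (9.98 + 3.50) * 1e9

params, four_bit_gb = estimate_weights(checkpoint_bytes)
print(f"~{params / 1e9:.2f}B parameters")
print(f"~{four_bit_gb:.2f} GB of weights at 4 bits")
```

Under these assumptions the weights shrink to roughly a quarter of their 16-bit size, which is what makes single-GPU fine-tuning in Colab plausible.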



## Step 3: Loading the tokenizer

In [6]:
llama_tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path = "aboonaji/llama2finetune-v2",
                                                trust_remote_code = True)
llama_tokenizer.pad_token = llama_tokenizer.eos_token
llama_tokenizer.padding_side = "right"
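
To make the two tokenizer settings above concrete, here is a minimal pure-Python sketch of what right-padding with the end-of-sequence token does to a batch. The function `pad_batch` and the token ids are illustrative assumptions, not the tokenizer's internals; the real tokenizer does this work when called with `padding = True`.

```python
def pad_batch(sequences, pad_id, padding_side="right"):
    """Pad token-id lists to a common length and build attention masks.

    Illustrative only: mimics the effect of tokenizer padding, with
    1 marking real tokens and 0 marking padding in the mask.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        pad = [pad_id] * (max_len - len(seq))
        real_mask, pad_mask = [1] * len(seq), [0] * len(pad)
        if padding_side == "right":
            input_ids.append(list(seq) + pad)
            attention_mask.append(real_mask + pad_mask)
        else:
            input_ids.append(pad + list(seq))
            attention_mask.append(pad_mask + real_mask)
    return input_ids, attention_mask


eos_id = 2  # assumed id of the end-of-sequence token, reused as the pad token
batch = [[1, 15043, 3186], [1, 15043]]  # made-up token ids for two sequences

ids, mask = pad_batch(batch, pad_id=eos_id, padding_side="right")
print(ids)
print(mask)
```

With right padding, the real tokens stay at the start of each row and the padded positions are masked out.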


## Step 4: Setting the training arguments

In [14]:
training_arguments = TrainingArguments(output_dir = "./results", per_device_train_batch_size = 1,
                                       gradient_accumulation_steps = 4, fp16 = True, max_steps = 100)

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


## Step 5: Creating the Supervised Fine-Tuning trainer

In [18]:
llama_sft_trainer = SFTTrainer(model = llama_model,
                               args = training_arguments,
                               train_dataset = load_dataset(path = "aboonaji/wiki_medical_terms_llam2_format", split = "train"),
                               tokenizer = llama_tokenizer,
                               peft_config = LoraConfig(task_type = "CAUSAL_LM", r = 64, lora_alpha = 16, lora_dropout = 0.1),
                               dataset_text_field = "text")


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
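
The `LoraConfig` values above (`r = 64`, `lora_alpha = 16`) determine both how strongly the adapter update is scaled and how many parameters are trained. The sketch below works through that arithmetic. The hidden size, layer count, and number of adapted projections per layer are assumptions chosen to resemble a 7B Llama-style model; they are not read from this checkpoint, and `lora_added_params` is a hypothetical helper.

```python
def lora_added_params(r, d_in, d_out):
    """Parameters added by one LoRA adapter on a d_in -> d_out linear layer.

    LoRA learns two low-rank matrices: A with shape (r, d_in) and
    B with shape (d_out, r).
    """
    return r * (d_in + d_out)


r, lora_alpha = 64, 16

# Standard LoRA scales the low-rank update by lora_alpha / r.
scaling = lora_alpha / r

# Assumed model shape (not taken from the checkpoint).
hidden_size = 4096
num_layers = 32
adapted_projections_per_layer = 2  # e.g. query and value projections

per_adapter = lora_added_params(r, hidden_size, hidden_size)
total_trainable = per_adapter * adapted_projections_per_layer * num_layers

print(f"scaling factor: {scaling}")
print(f"parameters per adapter: {per_adapter:,}")
print(f"total trainable LoRA parameters: {total_trainable:,}")
```

Because `lora_alpha` is smaller than `r` here, the update is scaled down by a factor of four; raising `lora_alpha` or lowering `r` would strengthen it.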



## Step 6: Training the model

In [19]:
import os
os.environ["WANDB_DISABLED"] = "true"
llama_sft_trainer.train()


TrainOutput(global_step=100, training_loss=1.6623654174804687, metrics={'train_runtime': 620.6002, 'train_samples_per_second': 0.645, 'train_steps_per_second': 0.161, 'total_flos': 1.094811811295232e+16, 'train_loss': 1.6623654174804687, 'epoch': 0.058300539279988337})
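
The metrics in `TrainOutput` can be cross-checked against the training arguments. The sketch below recomputes them, assuming a single GPU and the 6,861 examples in the training split that were reported when the dataset was mapped; the variable names are illustrative.

```python
# Values taken from the TrainingArguments and TrainOutput in this notebook.
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_steps = 100
train_runtime = 620.6002  # seconds

# Assumptions: one device, 6,861 training examples.
num_devices = 1
num_examples = 6861

# Each optimizer step consumes this many examples.
effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_devices)

samples_seen = effective_batch_size * max_steps
epoch_fraction = samples_seen / num_examples
samples_per_second = samples_seen / train_runtime
steps_per_second = max_steps / train_runtime

print(f"effective batch size: {effective_batch_size}")
print(f"samples seen: {samples_seen}")
print(f"fraction of one epoch: {epoch_fraction:.4f}")
print(f"samples/s: {samples_per_second:.3f}, steps/s: {steps_per_second:.3f}")
```

In other words, 100 steps cover only about 6% of the dataset, so this run is a short demonstration rather than a full fine-tune.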

In [None]:
# Note: if Weights & Biases logging is not disabled (the training cell above sets WANDB_DISABLED to "true"),
# training prompts for a W&B API key. In that case, sign up at https://wandb.ai/authorize?ref=models,
# copy the API key shown on the main page, and paste it into the prompt in the cell output.

## Step 7: Chatting with the model

In [21]:
user_prompt = "Tell me about botulism(give me just 5 important points that is few sentence)"
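# max_length counts the prompt tokens plus the generated tokens.
text_generation_pipeline = pipeline(task = "text-generation", model = llama_model, tokenizer = llama_tokenizer, max_length = 300)
# Wrap the prompt in the Llama 2 instruction format used by the training data.
model_answer = text_generation_pipeline(f"<s>[INST] {user_prompt} [/INST]")
print(model_answer[0]['generated_text'])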


Device set to use cuda:0


<s>[INST] Tell me about botulism(give me just 5 important points that is few sentence) [/INST]  Sure, here are 5 important points about botulism:
 everybody gets botulism from eating food that has botulinum toxin in it.
botulism is a bacterial infection that can cause muscle weakness, paralysis, and breathing problems.
botulism can be treated with an antitoxin that blocks the toxin from spreading.
botulism can be prevented by properly canning or freezing food to kill the bacteria.
botulism can be fatal if left untreated, but the mortality rate is low with proper medical care.
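
As the output shows, the pipeline returns the prompt followed by the answer. A small sketch of two helpers, one that builds the `[INST]` prompt used above and one that keeps only the text after `[/INST]`, is given below. Both function names are hypothetical, and the sample string stands in for a real `generated_text` value.

```python
def build_llama2_prompt(user_prompt):
    """Wrap a user message in the instruction format used in this notebook."""
    return f"<s>[INST] {user_prompt} [/INST]"


def extract_answer(generated_text, marker="[/INST]"):
    """Return the text after the first instruction marker.

    Falls back to the whole (stripped) text if the marker is absent.
    """
    _, found, answer = generated_text.partition(marker)
    return answer.strip() if found else generated_text.strip()


prompt = build_llama2_prompt("Tell me about botulism")
# Stand-in for model_answer[0]['generated_text'].
sample_output = prompt + "  Sure, here are 5 important points about botulism:"

print(prompt)
print(extract_answer(sample_output))
```

Alternatively, the text-generation pipeline accepts `return_full_text = False` to omit the prompt from its output.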
