# Fine-Tuning LLMs with Hugging Face

## Step 1: Installing and importing the libraries

In [1]:
!pip uninstall accelerate peft bitsandbytes transformers trl -y
!pip install accelerate peft==0.13.2 bitsandbytes transformers trl==0.12.0

Output (abridged): pip removed the existing packages and reinstalled accelerate 1.9.0, peft 0.13.2, bitsandbytes 0.46.1, transformers 4.55.0, and the pinned trl 0.12.0.

In [2]:
!pip install huggingface_hub



In [3]:
import torch
from trl import SFTTrainer
from peft import LoraConfig
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, pipeline)

### Configuring the CUDA allocator to reduce memory fragmentation

In [None]:
torch.cuda.empty_cache()
%env PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

## Step 2: Loading the model

In [4]:
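# Quantize the weights to 4-bit NF4 on load; matrix multiplications run in float16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4"
)

llama_model = AutoModelForCausalLM.from_pretrained(
    "aboonaji/llama2finetune-v2",
    quantization_config=bnb_config,
    torch_dtype=torch.float16
)

# The key/value cache only helps during generation, so turn it off for training.
llama_model.config.use_cache = False
# A value of 1 disables the tensor-parallel code path used during pretraining.
llama_model.config.pretraining_tp = 1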

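To see why 4-bit loading matters on a single GPU, here is a rough back-of-the-envelope sketch of the memory taken by the weights alone. The 7-billion parameter count is an assumption (the notebook does not state the model size), and the helper name `weight_memory_gib` is made up for illustration. The estimate ignores quantization metadata, activations, and optimizer state.

```python
def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Assumption: a 7-billion-parameter model; the notebook does not state the size.
NUM_PARAMS = 7_000_000_000

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)  # 2 bytes per weight
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)    # half a byte per weight

print(f"float16 weights: {fp16_gib:.1f} GiB")
print(f"4-bit weights:   {nf4_gib:.1f} GiB")
```

Under these assumptions the quantized weights take a quarter of the float16 footprint, which is what leaves room for LoRA adapters and activations on a small GPU.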

## Step 3: Loading the tokenizer

In [5]:
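llama_tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path="aboonaji/llama2finetune-v2",
    trust_remote_code=True,
)
# Llama 2 tokenizers ship without a padding token, so reuse the end-of-sequence token.
llama_tokenizer.pad_token = llama_tokenizer.eos_token
# Pad at the end of each sequence so the real tokens stay left-aligned during training.
llama_tokenizer.padding_side = "right"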
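The effect of the two padding settings can be sketched without a tokenizer. The snippet below is a minimal illustration, not the tokenizer's actual implementation: `pad_right` is a hypothetical helper, and the token IDs (including `EOS_ID = 2`) are made-up values.

```python
def pad_right(sequences, pad_id):
    """Pad every sequence on the right to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return input_ids, attention_mask


EOS_ID = 2  # hypothetical ID; the padding token is set to EOS above
batch = [[1, 15, 27, 2], [1, 42, 2]]

input_ids, attention_mask = pad_right(batch, pad_id=EOS_ID)
print(input_ids)
print(attention_mask)
```

Because the padding token is the same as EOS, the attention mask is what distinguishes a genuine end-of-sequence token from filler.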

## Step 4: Setting the training arguments

In [6]:
training_arguments = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    max_steps=100,
)
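With gradient accumulation, the optimizer only steps once every few micro-batches, so the batch size that matters for optimization is larger than the per-device value. The arithmetic below is a small sketch using the values from this step; `num_devices = 1` is an assumption about the runtime.

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
max_steps = 100
num_devices = 1  # assumption: a single GPU

# Number of samples contributing to each optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
# Total samples processed over the whole run.
samples_seen = effective_batch_size * max_steps

print(effective_batch_size, samples_seen)
```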

## Step 5: Creating the Supervised Fine-Tuning trainer

In [7]:
llama_sft_trainer = SFTTrainer(model = llama_model,
                               args = training_arguments,
                               train_dataset = load_dataset(path = "aboonaji/wiki_medical_terms_llam2_format", split = "train"),
                               tokenizer = llama_tokenizer,
                               peft_config = LoraConfig(task_type = "CAUSAL_LM", r = 16, lora_alpha = 16, lora_dropout = 0.1),
                               dataset_text_field = "text")


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
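To get a feel for what `r = 16` and `lora_alpha = 16` imply, the sketch below counts the parameters LoRA adds to a single weight matrix and computes the standard `lora_alpha / r` scaling factor. The 4096 x 4096 layer shape is an assumption chosen for illustration, and `lora_added_params` is a hypothetical helper, not a PEFT function.

```python
def lora_added_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds two low-rank factors: A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


r, lora_alpha = 16, 16
scaling = lora_alpha / r  # factor applied to the low-rank update

hidden = 4096  # assumption: size of one square projection matrix
full_params = hidden * hidden
added_params = lora_added_params(hidden, hidden, r)

print(f"scaling factor: {scaling}")
print(f"trainable share of this layer: {added_params / full_params:.2%}")
```

With these values the adapter trains well under 1% of the parameters in the adapted layer, and the update is applied with a scaling factor of 1.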


## Step 6: Training the model

In [8]:
llama_sft_trainer.train()

[34m[1mwandb[0m: Currently logged in as: [33mmalizdmr011[0m ([33mmalizdmr011-bogazici[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin




TrainOutput(global_step=100, training_loss=1.6638002014160156, metrics={'train_runtime': 2336.3223, 'train_samples_per_second': 0.171, 'train_steps_per_second': 0.043, 'total_flos': 1.090662998716416e+16, 'train_loss': 1.6638002014160156, 'epoch': 0.058300539279988337})
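The reported metrics can be cross-checked with a little arithmetic. The sketch below assumes 4 samples per optimizer step (batch size 1 with 4 accumulation steps on one GPU) and that `epoch` is the fraction of the training set seen so far; the dataset size it derives is therefore an estimate, not a value stated in the notebook.

```python
metrics = {
    "train_runtime": 2336.3223,
    "train_samples_per_second": 0.171,
    "train_steps_per_second": 0.043,
    "epoch": 0.058300539279988337,
}
global_step = 100
samples_per_step = 4  # assumption: 1 sample x 4 accumulation steps, single GPU

samples_seen = global_step * samples_per_step
throughput = samples_seen / metrics["train_runtime"]          # samples per second
seconds_per_step = metrics["train_runtime"] / global_step     # wall-clock per update
approx_dataset_size = samples_seen / metrics["epoch"]         # estimated training rows

print(f"{throughput:.3f} samples/s, {seconds_per_step:.1f} s/step")
print(f"estimated dataset size: {approx_dataset_size:.0f} rows")
```

The recomputed throughput agrees with the reported `train_samples_per_second`, and the small `epoch` value shows that 100 steps cover only about 6% of the data.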

## Step 7: Chatting with the model
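This step wraps the user prompt in `[INST] ... [/INST]` tags, and the generated text echoes the prompt before the answer. A minimal sketch of two helpers that build the prompt and strip the echoed part is shown here; `build_prompt` and `extract_answer` are hypothetical names, not part of any library.

```python
def build_prompt(user_prompt: str) -> str:
    """Wrap a user message in the instruction tags used in this notebook."""
    return f"<s>[INST] {user_prompt} [/INST]"


def extract_answer(generated_text: str) -> str:
    """Return only the text that follows the last closing instruction tag."""
    return generated_text.split("[/INST]")[-1].strip()


prompt = build_prompt("Please tell me about Tinea Pedis")
# Stand-in for a model response: the prompt followed by a short completion.
fake_output = prompt + "  Tinea pedis is a fungal infection of the feet."

print(prompt)
print(extract_answer(fake_output))
```

Passing `model_answer[0]['generated_text']` to `extract_answer` would give the model's reply without the repeated prompt.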

In [10]:
user_prompt = "Please tell me about Tinea Pedis"
text_generation_pipeline = pipeline(task = "text-generation", model = llama_model, tokenizer = llama_tokenizer, max_length = 500)
model_answer = text_generation_pipeline(f"<s>[INST] {user_prompt} [/INST]")
print(model_answer[0]['generated_text'])

Device set to use cuda:0


<s>[INST] Please tell me about Tinea Pedis [/INST]  Tinea pedis, also known as athlete's foot, is a common fungal infection of the skin that affects the feet. Begriffe: Tinea pedis, athlete's foot, fungal infection, skin, feet. Tinea pedis is a common fungal infection of the skin that affects the feet. It is caused by a group of fungi called dermatophytes, which thrive in warm, moist environments. Tinea pedis is contagious and can affect people of all ages, but it is most common in adolescents and young adults.

Causes and risk factors:

Tinea pedis is caused by a group of fungi called dermatophytes, which thrive in warm, moist environments. These fungi feed on keratin, a protein found in skin, hair, and nails. Tinea pedis can be caused by several types of dermatophytes, including:

* Trichophyton rubrum
* Trichophyton mentagrophytes
* Epidermophyton floccosum
* Microsporum gypseum

Risk factors for developing Tinea pedis include:

* Wearing tight, synthetic, or shared shoes
* Wearing 