**Fine-Tuning:**

Fine-tuning a Large Language Model (LLM) involves training it on a specific
dataset to specialize it for a task or domain.

**Why fine-tune?**

*   Enhanced performance (improved accuracy on the target task or domain)
*   Less data and compute than training a model from scratch (training starts from pretrained weights, so it needs a smaller dataset and fewer GPU resources)

**Steps to fine-tune an LLM:**
1.   Choose an appropriate LLM for your application
2.   Prepare your dataset
3.   Choose a fine-tuning method (full fine-tuning or PEFT)
4.   Train your LLM on the fine-tuning data
5.   Evaluate your model's performance and analyse its accuracy
6.   Finally, deploy your model


---

**LoRA fine-tuning vs. full fine-tuning:**



*   LoRA (Low-Rank Adaptation) fine-tuning is a parameter-efficient method that freezes the original model's weights and trains only a small number of additional low-rank weight matrices.

*   Full fine-tuning, on the other hand, updates all the model's weights during the adaptation process.
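
The difference can be sketched in plain Python. In LoRA, a frozen weight matrix `W` is left untouched and a trainable low-rank product `B @ A`, scaled by `alpha / r`, is added to its output. The tiny matrices, the helper names `matvec` and `lora_forward`, and the values of `alpha` and `r` below are illustrative assumptions, not taken from any library.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x); only A and B would be trained."""
    base = matvec(W, x)               # frozen pretrained path
    update = matvec(B, matvec(A, x))  # low-rank trainable path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical 2x3 frozen weight and a rank-1 adapter (A is 1x3, B is 2x1)
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.5]]
x = [1.0, 2.0, 3.0]

# B starts at zero, so the adapted model initially matches the base model
B_init = [[0.0], [0.0]]
y_before = lora_forward(W, A, B_init, x, alpha=1, r=1)

# After training, B is non-zero and shifts the output
B_trained = [[1.0], [-1.0]]
y_after = lora_forward(W, A, B_trained, x, alpha=1, r=1)

print(y_before, y_after)
```

Full fine-tuning would instead modify every entry of `W` directly, which is why it needs far more memory for gradients and optimizer state.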

In [1]:
!pip install transformers peft accelerate trl bitsandbytes

Collecting trl
  Downloading trl-0.18.1-py3-none-any.whl.metadata (11 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.46.0-py3-none-manylinux_2_24_x86_64.whl.metadata (10 kB)
Collecting datasets>=3.0.0 (from trl)
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Collecting transformers
  Downloading transformers-4.52.4-py3-none-any.whl.metadata (38 kB)
Collecting huggingface-hub>=0.25.0 (from peft)
  Downloading huggingface_hub-0.32.4-py3-none-any.whl.metadata (14 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets>=3.0.0->trl)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets>=3.0.0->trl)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets>=3.0.0->trl)
  Downloading multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Collecting hf-xet<2.0.0,>=1.1.2 (from huggingface-hub>=0.25.0->peft)
  Downloading hf_xet-1.1.3-cp37-abi3-m

**Transformers:**

  A Hugging Face library for loading pretrained models (such as BERT and GPT) and their tokenizers.
  It provides tools to load models, tokenize text, and generate predictions.

**Datasets:**
  
  A Hugging Face library that provides thousands of pre-built datasets and tools to load, split, and preprocess your own data easily.

**PEFT (Parameter-Efficient Fine-Tuning):**

   A Hugging Face library for fine-tuning large models efficiently by training only a small number of parameters, using techniques such as LoRA and prefix tuning.

**Accelerate:**

   A Hugging Face library to run training across multiple GPUs or on TPU/CPU easily with minimal code changes.

**Bitsandbytes:**

   A library to quantize large models (e.g., 8-bit, 4-bit) to save GPU memory and enable fine-tuning on smaller GPUs.

**TRL (Transformer Reinforcement Learning):**

  A Hugging Face library for fine-tuning language models with supervised fine-tuning and reinforcement learning techniques.




In [2]:
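from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "distilgpt2"
# Load the pretrained causal language model and its matching tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# GPT-2 tokenizers define no padding token, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token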


`AutoModelForCausalLM` loads a pretrained causal language model from the Hugging Face Hub, and `AutoTokenizer` automatically loads the matching tokenizer for that model.
GPT-2 style models do not define a padding token, but batching requires every sequence in a batch to have the same length, so we reuse the end-of-sequence token as `pad_token`.
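
To make the role of the padding token concrete, here is a minimal sketch of what padding a batch involves: shorter sequences are filled up with the pad id, and an attention mask marks which positions hold real tokens. The helper `pad_batch` and the token ids are hypothetical; only `50256`, GPT-2's end-of-sequence id, is a real value.

```python
def pad_batch(sequences, pad_id, max_length=None):
    """Right-pad token-id lists to a common length and build attention masks."""
    target = max_length if max_length is not None else max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        seq = seq[:target]  # truncate anything longer than the target length
        n_pad = target - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 1 = real token, 0 = padding
    return input_ids, attention_mask


# Hypothetical token ids; 50256 is GPT-2's end-of-sequence id, reused here as padding
ids, mask = pad_batch([[10, 11, 12], [20]], pad_id=50256)
print(ids)
print(mask)
```

The tokenizer performs this step itself when called with a `padding` argument; the sketch only illustrates why a pad token must exist.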

In [None]:
from datasets import Dataset

data = {
    "prompt": [
        "What is AI?",
        "Explain machine learning.",
        "What is Python used for?",
    ],
    "completion": [
        "AI stands for Artificial Intelligence.",
        "Machine learning is a way for computers to learn from data.",
        "Python is used for web development, AI, and more.",
    ],
}

# Join each prompt and completion with a space so the two do not run together
train_data = [{"text": f"{prompt} {comp}"} for prompt, comp in zip(data["prompt"], data["completion"])]
train_dataset = Dataset.from_list(train_data)
train_dataset

Dataset({
    features: ['text'],
    num_rows: 3
})

To train the model, create your own dataset or load an existing one. Here, each prompt/completion pair is joined into a single `text` field.

In [None]:
def tokenize(sample_data):
    # Truncate or pad every example to exactly 128 tokens
    return tokenizer(sample_data["text"], truncation=True, padding="max_length", max_length=128)

train_dataset = train_dataset.map(tokenize, batched=True)
train_dataset

Map:   0%|          | 0/3 [00:00<?, ? examples/s]

Dataset({
    features: ['text', 'input_ids', 'attention_mask'],
    num_rows: 3
})

Tokenize the dataset: each example is truncated or padded to 128 tokens, which adds the `input_ids` and `attention_mask` columns.
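
Note that the tokenized dataset has no `labels` column. For causal language modelling, the labels are a copy of `input_ids` in which padding positions are replaced by `-100`, a value the loss function ignores. To my understanding, this is what `DataCollatorForLanguageModeling` with `mlm=False` does at batch time. The sketch below mimics that behaviour in plain Python; the helper name and token ids are hypothetical.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def make_causal_lm_labels(input_ids, pad_id):
    """Copy input ids to labels, masking out padding positions."""
    return [tok if tok != pad_id else IGNORE_INDEX for tok in input_ids]


# Hypothetical padded example; 50256 is GPT-2's end-of-sequence id used as padding
labels = make_causal_lm_labels([10, 11, 12, 50256, 50256], pad_id=50256)
print(labels)
```

One consequence of reusing the end-of-sequence token as the pad token is that genuine end-of-sequence positions are masked out as well.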

In [None]:
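from peft import get_peft_model, LoraConfig, TaskType
from transformers import BitsAndBytesConfig

# Reload the base model with 8-bit weights to reduce GPU memory use
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config, device_map="auto")

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    target_modules=["c_attn"],  # GPT-2's fused query/key/value projection
    lora_dropout=0.1,           # dropout applied on the LoRA branch
    bias="none",                # leave bias terms frozen
    task_type=TaskType.CAUSAL_LM,
)

# Wrap the model so that only the LoRA matrices are trainable
model = get_peft_model(model, lora_config)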
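
A rough count shows how few parameters this configuration trains. The sketch assumes distilgpt2's published dimensions (6 transformer blocks, hidden size 768, and a `c_attn` projection from 768 to 3 × 768 = 2304 features). The function name is hypothetical.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


n_layers, hidden = 6, 768  # assumed distilgpt2 dimensions
d_out = 3 * hidden         # c_attn produces queries, keys, and values together

per_layer = lora_param_count(hidden, d_out, r=8)
total_trainable = n_layers * per_layer
frozen_per_layer = hidden * d_out  # weights of one frozen c_attn matrix

print(per_layer, total_trainable, frozen_per_layer)
```

Under these assumptions each adapter adds 24,576 parameters next to a frozen matrix of 1,769,472 weights, for 147,456 trainable parameters in total.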


In [None]:
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

training_args = TrainingArguments(
    output_dir="./finetuned-model",
    per_device_train_batch_size=2,
    num_train_epochs=5,
    logging_steps=10,
    save_steps=10,
    logging_dir='./logs',
    report_to="none" # Disable Weights & Biases logging
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

trainer.train()

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.


Step,Training Loss
10,3.8153


TrainOutput(global_step=10, training_loss=3.815334701538086, metrics={'train_runtime': 2.3455, 'train_samples_per_second': 6.395, 'train_steps_per_second': 4.263, 'total_flos': 491630100480.0, 'train_loss': 3.815334701538086, 'epoch': 5.0})
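
The `global_step=10` in the output follows from the dataset size and the training arguments. A small worked calculation, assuming a single device and no gradient accumulation (the function name is hypothetical):

```python
import math


def total_training_steps(num_examples, batch_size, epochs):
    """Optimizer steps for a run on one device without gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)  # the last batch may be smaller
    return steps_per_epoch * epochs


steps = total_training_steps(num_examples=3, batch_size=2, epochs=5)
print(steps)
```

Three examples with a batch size of 2 give 2 steps per epoch, so 5 epochs take 10 steps. With `logging_steps=10` and `save_steps=10`, this run therefore logs a single loss value and writes a single checkpoint, both at step 10.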

In [None]:
# Save the LoRA adapter weights and the tokenizer
model.save_pretrained("finetuned-model")
tokenizer.save_pretrained("finetuned-model")

from transformers import pipeline
# Load the saved model into a text-generation pipeline
pipe = pipeline("text-generation", model="finetuned-model", tokenizer=tokenizer)

pipe("what is ai")


Device set to use cuda:0
