# Fine-tuning an LLM with Hugging Face

The model we are using is from `aboonaji`: https://huggingface.co/aboonaji/llama2finetune-v2

The dataset we are using is also from `aboonaji`: https://huggingface.co/datasets/aboonaji/wiki_medical_terms_llam2_format

### Part 1. Installing and Importing libraries

In [1]:
# installing libraries 
%pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.7.10
%pip install huggingface_hub 

Note: you may need to restart the kernel to use updated packages.


In [2]:
# importing libraries
import torch
import scipy
from trl import SFTTrainer
from peft import LoraConfig
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, pipeline

  from .autonotebook import tqdm as notebook_tqdm
  warn("The installed version of bitsandbytes was compiled without GPU support. "


'NoneType' object has no attribute 'cadam32bit_grad_fp32'


W1231 11:30:57.323000 20476 site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.


### Part 2. Building the AI

#### loading the model

In [3]:
# Load the model with 4-bit NF4 quantization (bitsandbytes), computing in float16.
llama_model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path="aboonaji/llama2finetune-v2",
    torch_dtype=torch.float32,
    device_map={"": "cpu"},
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16, bnb_4bit_quant_type="nf4"
    ),
)
llama_model.config.use_cache = False  # the generation cache is not needed during training
llama_model.config.pretraining_tp = 1

Loading checkpoint shards: 100%|██████████| 2/2 [05:45<00:00, 172.94s/it]
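
The `load_in_4bit=True` setting trades precision for memory. As a rough back-of-the-envelope sketch, the snippet below compares weight storage at different precisions. The 7-billion parameter count is an assumption (the notebook does not state the model size), and the estimate ignores quantization constants, activations, and any layers kept in higher precision.

```python
# Rough weight-memory estimate at different precisions.
# PARAM_COUNT is an assumption (a 7B-parameter model), not a value from the notebook.
PARAM_COUNT = 7_000_000_000

BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "nf4 (4-bit)": 0.5,
}


def weight_memory_gib(param_count: int, bytes_per_param: float) -> float:
    """Return the storage needed for the weights alone, in GiB."""
    return param_count * bytes_per_param / 1024**3


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name:>12}: {weight_memory_gib(PARAM_COUNT, nbytes):.1f} GiB")
```

Under these assumptions, 4-bit storage needs about one eighth of the float32 footprint.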


#### loading the tokenizer

In [4]:
llama_tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path="aboonaji/llama2finetune-v2",
                                                trust_remote_code=True)
llama_tokenizer.pad_token = llama_tokenizer.eos_token
llama_tokenizer.padding_side = "right" 
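
To illustrate what right-padding with the end-of-sequence token means, here is a minimal plain-Python sketch. It does not call the tokenizer; the token ids and the `EOS_ID` value are made up for illustration, and `pad_right` is a hypothetical helper.

```python
from typing import List, Tuple

EOS_ID = 2  # assumption: illustrative end-of-sequence id, reused here as the pad id


def pad_right(
    batch: List[List[int]], pad_id: int = EOS_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad every sequence on the right to the longest length in the batch."""
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding to be ignored
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


ids, mask = pad_right([[1, 15, 27, 2], [1, 42, 2]])
print(ids)
print(mask)
```

Because the pad id equals the end-of-sequence id, the attention mask is what distinguishes a real end-of-sequence token from padding.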

#### setting the training arguments

In [5]:
training_args = TrainingArguments(
    output_dir="./llama-sft",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    logging_steps=10,

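    fp16=False,            # no fp16 mixed precision
    bf16=False,            # no bf16 mixed precision
    no_cuda=True,          # force CPU training
    report_to="none"       # disable external experiment loggers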
)
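
As a rough sketch of what these arguments imply, the arithmetic below estimates the effective batch size and the number of optimizer updates for one epoch. The example count of 6861 is taken from the `Map` progress bar shown later in this notebook; a single training process is an assumption, and the trainer's own step accounting may differ slightly.

```python
import math

NUM_EXAMPLES = 6861      # dataset size reported by the Map progress bar
PER_DEVICE_BATCH = 1     # per_device_train_batch_size
GRAD_ACCUM = 4           # gradient_accumulation_steps
EPOCHS = 1               # num_train_epochs
LOGGING_STEPS = 10       # logging_steps
NUM_DEVICES = 1          # assumption: a single CPU process

# Examples that contribute to each optimizer update
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES

# Forward/backward passes per epoch, then optimizer updates per epoch
batches_per_epoch = math.ceil(NUM_EXAMPLES / (PER_DEVICE_BATCH * NUM_DEVICES))
updates_per_epoch = max(batches_per_epoch // GRAD_ACCUM, 1)
total_updates = updates_per_epoch * EPOCHS

print("effective batch size:", effective_batch)
print("optimizer updates:", total_updates)
print("log entries:", total_updates // LOGGING_STEPS)
```

Raising `gradient_accumulation_steps` increases the effective batch size without increasing the memory needed for a single forward pass.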

### Part 3. Training the AI

#### creating the supervised fine-tuning trainer (SFTTrainer)

In [9]:
from datasets import load_dataset

dataset = load_dataset(
    "aboonaji/wiki_medical_terms_llam2_format",
    split="train"
)

print(dataset.column_names)

['text']


In [10]:
# Count empty / bad rows
bad_rows = dataset.filter(
    lambda x: x["text"] is None or len(x["text"].strip()) == 0
)

print("Bad rows:", len(bad_rows))

Bad rows: 0
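
The predicate used in the filter above can be checked in isolation. This is a minimal sketch on hand-written rows (the sample data and the `is_bad_row` helper are hypothetical, not taken from the dataset). Note that the `is None` test must come first, so that `.strip()` is never called on `None`.

```python
from typing import Dict, List, Optional


def is_bad_row(row: Dict[str, Optional[str]]) -> bool:
    """Return True when the 'text' field is missing, None, or only whitespace."""
    text = row.get("text")
    # Short-circuit: the right-hand side runs only when text is a string
    return text is None or len(text.strip()) == 0


# Hypothetical rows for illustration only
sample_rows: List[Dict[str, Optional[str]]] = [
    {"text": "<s>[INST] What is bursitis? [/INST] ..."},
    {"text": ""},
    {"text": "   \n"},
    {"text": None},
]

bad = [row for row in sample_rows if is_bad_row(row)]
print("Bad rows:", len(bad))
```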


In [11]:
llama_tokenizer.pad_token = llama_tokenizer.eos_token
llama_tokenizer.padding_side = "right"
llama_tokenizer.model_max_length = 512

In [12]:
import trl, transformers, datasets
print("trl:", trl.__version__)
print("transformers:", transformers.__version__)
print("datasets:", datasets.__version__)

trl: 0.7.10
transformers: 4.31.0
datasets: 4.4.2


In [8]:
trainer = SFTTrainer(
    model=llama_model,
    tokenizer=llama_tokenizer,
    args=training_args,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    packing=False,
    peft_config=LoraConfig(
        task_type="CAUSAL_LM",
        r=64,
        lora_alpha=16,
        lora_dropout=0.1
    ),
)


Map: 100%|██████████| 6861/6861 [00:21<00:00, 326.14 examples/s]


IndexError: list index out of range
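
To give a sense of scale for the `LoraConfig` values used above (`r=64`, `lora_alpha=16`), the sketch below counts the trainable parameters LoRA would add. The model shape is an assumption: a hidden size of 4096, 32 layers, and adapters on two square projection matrices per layer (assumed here to be `q_proj` and `v_proj`). None of these figures are stated in the notebook.

```python
RANK = 64                # r in LoraConfig
LORA_ALPHA = 16          # lora_alpha in LoraConfig
HIDDEN_SIZE = 4096       # assumption: hidden size of the base model
NUM_LAYERS = 32          # assumption: number of transformer layers
TARGETS_PER_LAYER = 2    # assumption: adapters on two projections per layer


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) next to a d_out x d_in weight."""
    return r * (d_in + d_out)


per_module = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
total_trainable = per_module * NUM_LAYERS * TARGETS_PER_LAYER
scaling = LORA_ALPHA / RANK  # factor applied to the low-rank update

print("parameters per adapted matrix:", per_module)
print("total trainable parameters:", total_trainable)
print("scaling factor:", scaling)
```

Under these assumptions, the adapters add tens of millions of trainable parameters, a small fraction of a multi-billion-parameter base model.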

#### training the model

In [None]:
trainer.train()

API key: sign in and copy your API key; once it is provided, the cell above runs to completion and training starts.

### Part 4. Chatting with AI

In [None]:
user_prompt = "Tell me about Paracetamol Poisoning"
text_gen_pipeline = pipeline(task="text-generation", model=llama_model, tokenizer=llama_tokenizer, max_length=300)
# Llama 2 instruction format: the prompt is wrapped in [INST] ... [/INST]
model_answer = text_gen_pipeline(f"<s>[INST] {user_prompt} [/INST]")
print(model_answer[0]['generated_text']) 

Next, search for Bursitis in place of Paracetamol Poisoning.
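
Because the closing tag of the instruction block must be spelled exactly `[/INST]`, it can help to build the prompt in one place. The sketch below uses two hypothetical helpers, `build_prompt` and `extract_answer`; it assumes the generated text contains the prompt followed by the model's continuation, which is the text-generation pipeline's default behavior. The sample output string is made up.

```python
def build_prompt(user_prompt: str) -> str:
    """Wrap a user message in the Llama 2 instruction format."""
    return f"<s>[INST] {user_prompt.strip()} [/INST]"


def extract_answer(generated_text: str) -> str:
    """Return only the text after the closing [/INST] tag, if present."""
    _, found, answer = generated_text.partition("[/INST]")
    return answer.strip() if found else generated_text.strip()


prompt = build_prompt("Tell me about Bursitis")
print(prompt)

# Made-up pipeline output, for illustration only
fake_output = prompt + " Bursitis is inflammation of a bursa."
print(extract_answer(fake_output))
```

With these helpers, switching topics only requires changing the string passed to `build_prompt`.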