# Fine-Tuning LLMs with Hugging Face

## Step 1: Installing and importing the libraries

In [1]:
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7

In [2]:
!pip install huggingface_hub



In [3]:
import torch
from trl import SFTTrainer
from peft import LoraConfig
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, pipeline)

## Step 2: Loading the model

In [4]:
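# Quantize the weights to 4-bit NF4 on load; matrix multiplications run in float16.
quantization_config = BitsAndBytesConfig(load_in_4bit = True,
                                         bnb_4bit_compute_dtype = torch.float16,
                                         bnb_4bit_quant_type = "nf4")
llama_model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path = "aboonaji/llama2finetune-v2",
                                                   quantization_config = quantization_config)
llama_model.config.use_cache = False   # the key/value generation cache is not needed during training
llama_model.config.pretraining_tp = 1  # use the standard (non-sliced) linear layer computation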


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


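As a rough illustration of why the model is loaded with `load_in_4bit = True`, the sketch below estimates the storage needed for the weights at two precisions. The 7-billion parameter count is an assumption (the notebook does not state the size of `aboonaji/llama2finetune-v2`), and the estimate ignores quantization constants, activations, and optimizer state.

```python
# Back-of-the-envelope estimate of weight memory at different precisions.
# ASSUMPTION: 7e9 parameters; the notebook does not state the model size.
ASSUMED_PARAM_COUNT = 7_000_000_000

def weight_memory_gb(param_count: int, bits_per_param: int) -> float:
    """Return the storage needed for the weights alone, in gigabytes (1e9 bytes)."""
    return param_count * bits_per_param / 8 / 1e9

fp16_gb = weight_memory_gb(ASSUMED_PARAM_COUNT, 16)
nf4_gb = weight_memory_gb(ASSUMED_PARAM_COUNT, 4)
print(f"float16 weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:   {nf4_gb:.1f} GB")
```

Under that assumption, 4-bit storage needs a quarter of the memory of float16 weights, which is what makes a single-GPU notebook session plausible.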


## Step 3: Loading the tokenizer

In [5]:
llama_tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path = "aboonaji/llama2finetune-v2", trust_remote_code = True)
llama_tokenizer.pad_token = llama_tokenizer.eos_token
llama_tokenizer.padding_side = "right"

llama_tokenizer_dir = "llama-tokenizer"
llama_tokenizer.save_pretrained(llama_tokenizer_dir)


('llama-tokenizer/tokenizer_config.json',
 'llama-tokenizer/special_tokens_map.json',
 'llama-tokenizer/tokenizer.model',
 'llama-tokenizer/added_tokens.json',
 'llama-tokenizer/tokenizer.json')
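To make the effect of `padding_side = "right"` and `pad_token = eos_token` concrete, here is a minimal plain-Python sketch of what right padding does to a batch. The helper `pad_right`, the token ids, and `pad_id = 2` are all made up for illustration; in the notebook the real pad id would be `llama_tokenizer.eos_token_id`, and the tokenizer performs this step itself.

```python
# Illustration of right padding with an attention mask, in plain Python.
# ASSUMPTION: pad_id = 2 is a placeholder; the real id is llama_tokenizer.eos_token_id.
def pad_right(sequences, pad_id):
    """Pad every token-id list to the longest length, appending pad_id on the right."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding that the model should ignore.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return input_ids, attention_mask

batch = [[1, 15, 27, 9], [1, 42]]  # made-up token ids
input_ids, attention_mask = pad_right(batch, pad_id=2)
print(input_ids)
print(attention_mask)
```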

## Step 4: Setting the training arguments

In [6]:
training_arguments = TrainingArguments(output_dir = "./results", per_device_train_batch_size = 2, max_steps = 50)

## Step 5: Creating the Supervised Fine-Tuning trainer

In [7]:
llama_sft_trainer = SFTTrainer(model = llama_model,
                               args = training_arguments,
                               train_dataset = load_dataset(path = "aboonaji/wiki_medical_terms_llam2_format", split = "train"),
                               tokenizer = llama_tokenizer,
                               peft_config = LoraConfig(task_type = "CAUSAL_LM", r = 64, lora_alpha = 16, lora_dropout = 0.1),
                               dataset_text_field = "text")

Generating train split: 6861 examples
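The `LoraConfig` above sets `r = 64` and `lora_alpha = 16`. The sketch below works out what those two numbers imply for a single weight matrix: the scaling factor `lora_alpha / r` applied to the low-rank update, and how many trainable parameters the adapter adds. The 4096 x 4096 matrix shape is an assumption chosen for illustration; the notebook does not state the model's layer sizes.

```python
# Sketch of the LoRA bookkeeping for a single weight matrix.
# r and lora_alpha come from the LoraConfig; the 4096 x 4096 shape is an ASSUMPTION.
r, lora_alpha = 64, 16
d_out, d_in = 4096, 4096  # hypothetical projection-matrix shape

scaling = lora_alpha / r            # factor applied to the low-rank update B @ A
lora_params = r * (d_in + d_out)    # A is (r x d_in), B is (d_out x r)
full_params = d_out * d_in          # parameters in the frozen full matrix

print(f"scaling = {scaling}")
print(f"LoRA parameters per matrix: {lora_params:,}")
print(f"fraction of the full matrix: {lora_params / full_params:.4f}")
```

With these values the adapter trains only a few percent of the parameters of each adapted matrix, while the quantized base weights stay frozen.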

## Step 6: Training the model

In [8]:
!pip install wandb



In [9]:
!wandb login

wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: ERROR API key must be 40 characters long, yours was 37
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


In [10]:
llama_sft_trainer.train()

wandb: Currently logged in as: sand-s-heep95 (sand-s-heep95-psg-college-of-technology). Use `wandb login --relogin` to force relogin
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.



TrainOutput(global_step=50, training_loss=1.8961221313476562, metrics={'train_runtime': 335.0191, 'train_samples_per_second': 0.298, 'train_steps_per_second': 0.149, 'total_flos': 1866637832355840.0, 'train_loss': 1.8961221313476562, 'epoch': 0.01})
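The throughput and epoch figures in `TrainOutput` can be checked by hand from the run settings. The sketch below assumes a single device and no gradient accumulation (neither is set in `TrainingArguments`, so the defaults are assumed to apply) and takes the dataset size of 6861 examples from the dataset loading output.

```python
# Reproduce the reported training metrics from the run settings.
# ASSUMPTION: one device and no gradient accumulation, so each step sees one batch.
max_steps = 50               # from TrainingArguments
per_device_batch_size = 2    # from TrainingArguments
train_runtime = 335.0191     # seconds, from TrainOutput
dataset_size = 6861          # examples in the train split, from the loading output

samples_seen = max_steps * per_device_batch_size
samples_per_second = round(samples_seen / train_runtime, 3)
steps_per_second = round(max_steps / train_runtime, 3)
epoch_fraction = round(samples_seen / dataset_size, 2)

print(samples_seen, samples_per_second, steps_per_second, epoch_fraction)
```

The run therefore touches only 100 of the 6861 examples, about 1% of one epoch, so the reported loss reflects a very short demonstration run and not a converged model.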

## Step 7: Chatting with the model

In [11]:
user_prompt = "Please tell me about Ascariasis"
text_generation_pipeline = pipeline(task = "text-generation", model = llama_model, tokenizer = llama_tokenizer, max_length = 300)
model_answer = text_generation_pipeline(f"<s>[INST] {user_prompt} [/INST]")
print(model_answer[0]['generated_text'])

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


<s>[INST] Please tell me about Ascariasis [/INST]  Ascariasis is a parasitic infection caused by the Ascaris lumbricoides roundworm. everybody has Ascaris lumbricoides in their intestine, but the infection is only considered when the worms grow and multiply, causing symptoms such as abdominal pain, diarrhea, and weight loss.

Causes:

* Eating contaminated food or water
* Poor hygiene and sanitation
* Direct contact with infected feces
* Infected mother to fetus during pregnancy or breastfeeding

Symptoms:

* Abdominal pain
* Diarrhea
* Weight loss
* Fatigue
* Anemia
* Coughing up blood
* Wheezing
* Shortness of breath

Diagnosis:

* Physical examination
* Stool sample examination
* Blood tests
* Imaging tests (X-ray, CT scan, MRI)

Treatment:

* Medications (albendazole, mebendazole, praziquantel)
* Surgery (rarely)

Prevention:

* Good hygiene and sanitation practices
* Properly disposing
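
The prompt template and the echoed prompt in the output can be handled with two small helpers. This is a minimal sketch: `build_prompt` and `extract_answer` are hypothetical names that are not part of the notebook or of any library, and the sample output string is a shortened stand-in for a real model response.

```python
# Hypothetical helpers (not part of the notebook) for the prompt format used in this step.
def build_prompt(user_prompt: str) -> str:
    """Wrap a user message in the <s>[INST] ... [/INST] template used in this notebook."""
    return f"<s>[INST] {user_prompt} [/INST]"

def extract_answer(generated_text: str) -> str:
    """Return only the text that follows the closing [/INST] tag."""
    return generated_text.split("[/INST]", 1)[-1].strip()

prompt = build_prompt("Please tell me about Ascariasis")
# Shortened stand-in for model_answer[0]['generated_text'].
sample_output = prompt + "  Ascariasis is a parasitic infection."
print(extract_answer(sample_output))
```

The answer above stops mid-list because `max_length = 300` limits the total sequence length, prompt included. If longer answers are needed, the `max_new_tokens` generation argument limits only the newly generated tokens.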


In [12]:
user_prompt = "Symptoms include swelling and lack of breathe"
text_generation_pipeline = pipeline(task = "text-generation", model = llama_model, tokenizer = llama_tokenizer, max_length = 300)
model_answer = text_generation_pipeline(f"<s>[INST] {user_prompt} [/INST]")
print(model_answer[0]['generated_text'])

<s>[INST] Symptoms include swelling and lack of breathe [/INST]  Thank you for providing more information. Unterscheidung between anaphylaxis and other conditions that can cause swelling and difficulty breathing, such as asthma or COPD, can be challenging. Here are some key points to consider:

1. Onset: Anaphylaxis typically occurs within minutes to hours of exposure to the trigger, while asthma and COPD can develop over time.
2. Duration: Anaphylaxis can last for hours or even days, while asthma and COPD are typically chronic conditions that can worsen over time.
3. Symptoms: In addition to swelling and difficulty breathing, anaphylaxis can cause rapid heartbeat, sweating, nausea, vomiting, diarrhea, and abdominal cramping. Asthma and COPD can also cause breathing difficulties, but the symptoms are typically more gradual and may include coughing, wheezing, and shortness of breath.
4. Trigger: Anaphylaxis is typically triggered by a specific allergen, such as a food, insect sting, or 