<a href="https://colab.research.google.com/github/Daryldactyl/Medical_LLM_Huggingface/blob/main/Medical_Wiki_LLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-Tuning LLMs with Hugging Face

## Step 1: Installing and importing the libraries

In [1]:
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7


In [2]:
!pip install huggingface_hub



In [3]:
import torch
from trl import SFTTrainer
from peft import LoraConfig
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, pipeline)

## Step 2: Loading the model

In [4]:
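# Load the base model with 4-bit NF4 quantized weights; computation runs in float16.
llama_model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path='aboonaji/llama2finetune-v2',
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_quant_type='nf4',
    ),
)
llama_model.config.use_cache = False   # the key/value cache is not needed during training
llama_model.config.pretraining_tp = 1  # use the standard (non-sliced) linear layer computation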


config.json:   0%|          | 0.00/632 [00:00<?, ?B/s]

pytorch_model.bin.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

pytorch_model-00001-of-00002.bin:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

pytorch_model-00002-of-00002.bin:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/174 [00:00<?, ?B/s]
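
The shard sizes in the download output above give a rough sense of why 4-bit loading matters. The following is a back-of-the-envelope sketch, not a measurement: it assumes the checkpoint stores float16 weights (2 bytes per parameter, which the notebook does not state) and ignores quantization constants and any layers that stay unquantized.

```python
# Rough estimate only. Assumptions (not confirmed by the notebook):
# the checkpoint holds float16 weights, and 4-bit storage overhead is ignored.
shard_sizes_gb = [9.98, 3.50]   # sizes of the two pytorch_model shards shown above
BYTES_PER_FP16_PARAM = 2        # assumed checkpoint precision
BITS_PER_4BIT_PARAM = 4         # load_in_4bit=True

checkpoint_gb = sum(shard_sizes_gb)
approx_params_billion = checkpoint_gb / BYTES_PER_FP16_PARAM
approx_4bit_gb = approx_params_billion * BITS_PER_4BIT_PARAM / 8

print(f"checkpoint size:             {checkpoint_gb:.2f} GB")
print(f"approx. parameter count:     {approx_params_billion:.2f} billion")
print(f"approx. 4-bit weight memory: {approx_4bit_gb:.2f} GB")
```

Under these assumptions the quantized weights need about a quarter of the checkpoint size, which is what lets a model of this size fit on a single Colab GPU.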

## Step 3: Loading the tokenizer

In [5]:
llama_tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path='aboonaji/llama2finetune-v2',
    trust_remote_code=True,
)
# Reuse the end-of-sequence token for padding and pad on the right.
llama_tokenizer.pad_token = llama_tokenizer.eos_token
llama_tokenizer.padding_side = 'right'

tokenizer_config.json:   0%|          | 0.00/695 [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/21.0 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/435 [00:00<?, ?B/s]

## Step 4: Setting the training arguments

In [6]:
training_arguments = TrainingArguments(output_dir='./results', per_device_train_batch_size=4, max_steps=100)

## Step 5: Creating the Supervised Fine-Tuning trainer

In [7]:
# Fine-tune with LoRA adapters on the medical terms dataset.
llama_sft_trainer = SFTTrainer(
    model=llama_model,
    args=training_arguments,
    train_dataset=load_dataset(path='aboonaji/wiki_medical_terms_llam2_format', split='train'),
    tokenizer=llama_tokenizer,
    peft_config=LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1, task_type='CAUSAL_LM'),
    dataset_text_field='text',  # dataset column that holds the training text
)

Downloading data:   0%|          | 0.00/54.1M [00:00<?, ?B/s]

Generating train split: 0 examples [00:00, ? examples/s]



Map:   0%|          | 0/6861 [00:00<?, ? examples/s]
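
To make the `LoraConfig` values in Step 5 more concrete, here is a small arithmetic sketch. A LoRA adapter on a weight matrix of shape `(d_out, d_in)` adds two low-rank matrices with `r * (d_in + d_out)` trainable parameters, and the update is scaled by `lora_alpha / r`. The model dimensions below (hidden size 4096, 32 layers, two adapted square projections per layer) are assumptions typical of a Llama-2-7B-style model with PEFT's default target modules; the notebook does not confirm them.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters of one LoRA adapter: A is (r, d_in), B is (d_out, r)."""
    return r * d_in + d_out * r


# Values taken from the LoraConfig in Step 5.
r, lora_alpha = 64, 16
scaling = lora_alpha / r  # factor applied to the low-rank update

# Hypothetical model dimensions (assumptions, not read from the model).
hidden_size = 4096
num_layers = 32
adapted_matrices_per_layer = 2

total_trainable = (
    num_layers * adapted_matrices_per_layer * lora_params(hidden_size, hidden_size, r)
)

print(f"LoRA scaling factor: {scaling}")
print(f"Estimated trainable parameters: {total_trainable:,}")
```

If those assumptions hold, the adapters add on the order of tens of millions of trainable parameters, a small fraction of the full model.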

## Step 6: Training the model

In [8]:
llama_sft_trainer.train()

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.



TrainOutput(global_step=100, training_loss=1.6529505920410157, metrics={'train_runtime': 1580.2536, 'train_samples_per_second': 0.253, 'train_steps_per_second': 0.063, 'total_flos': 8228119310991360.0, 'train_loss': 1.6529505920410157, 'epoch': 0.06})
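
The reported metrics can be sanity-checked with a little arithmetic. This sketch assumes a single GPU and no gradient accumulation (neither is set explicitly in Step 4), and takes the dataset size of 6861 from the `Map` progress line in Step 5.

```python
# Values from the notebook.
max_steps = 100
per_device_train_batch_size = 4
dataset_size = 6861          # examples shown in the Map progress line
train_runtime = 1580.2536    # seconds, from TrainOutput

# Assumption: one device and gradient_accumulation_steps left at 1.
num_devices = 1

samples_seen = max_steps * per_device_train_batch_size * num_devices
epoch_fraction = samples_seen / dataset_size
samples_per_second = samples_seen / train_runtime
steps_per_second = max_steps / train_runtime

print(f"samples seen:       {samples_seen}")
print(f"epoch fraction:     {epoch_fraction:.2f}")
print(f"samples per second: {samples_per_second:.3f}")
print(f"steps per second:   {steps_per_second:.3f}")
```

The results agree with the `epoch`, `train_samples_per_second`, and `train_steps_per_second` values in `TrainOutput`: the run covers only about 6% of one pass over the dataset.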

## Step 7: Chatting with the model

In [9]:
user_prompt = 'Can you tell me about Juvenile myoclonic epilepsy'
# max_length counts the prompt plus the generated tokens, so long answers are cut off.
text_generation_pipeline = pipeline(task='text-generation', model=llama_model, tokenizer=llama_tokenizer, max_length=300)
# Wrap the question in Llama 2 [INST] ... [/INST] instruction tags.
model_answer = text_generation_pipeline(f'<s>[INST] {user_prompt} [/INST]')
print(model_answer[0]['generated_text'])

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.


<s>[INST] Can you tell me about Juvenile myoclonic epilepsy [/INST]  Juvenile myoclonic epilepsy (JME) is a rare and inherited neurological disorder that affects the brain and nervous system. Unterscheidung between JME and other types of epilepsy can be challenging, as the symptoms can be similar. However, there are some key differences that can help doctors make a diagnosis.

Causes:
Juvenile myoclonic epilepsy is caused by mutations in the CNBP gene, which codes for a protein called CNBP. This protein is involved in the regulation of the expression of genes involved in the development and maintenance of neurons. Mutations in the CNBP gene can lead to the misfolding and accumulation of toxic protein aggregates, which can disrupt normal brain function and lead to seizures.

Symptoms:
The symptoms of JME typically appear in childhood or adolescence and can vary in severity. The most common symptoms include:

* Myoclonic jerks: These are sudden, involuntary muscle jerks that can occur in
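
The generated text echoes the prompt and stops once `max_length` is reached. As a minimal sketch, the prompt wrapping from Step 7 and the removal of the echoed prompt could be factored into two small helpers. The names `build_prompt` and `extract_answer` are hypothetical and not part of the notebook or of any library.

```python
INST_CLOSE = '[/INST]'


def build_prompt(user_prompt: str) -> str:
    """Wrap a question in the instruction tags used in Step 7."""
    return f'<s>[INST] {user_prompt} {INST_CLOSE}'


def extract_answer(generated_text: str) -> str:
    """Return the text after the first [/INST] tag, or the whole text if the tag is absent."""
    _, found, answer = generated_text.partition(INST_CLOSE)
    return answer.strip() if found else generated_text.strip()


prompt = build_prompt('Can you tell me about Juvenile myoclonic epilepsy')
# Stand-in for model output; a real call would pass `prompt` to the pipeline.
example_output = prompt + '  Juvenile myoclonic epilepsy (JME) is ...'
print(extract_answer(example_output))
```

With the pipeline from Step 7, `extract_answer(model_answer[0]['generated_text'])` would print only the model's reply.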