<a href="https://colab.research.google.com/github/Hydenx2004/AI-doctor-chatbot-/blob/main/AI_DOCTOR_(1).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-Tuning LLMs with Hugging Face

## Step 1: Installing and importing the libraries

In [None]:
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7


In [None]:
!pip install huggingface_hub



In [None]:
import torch
from trl import SFTTrainer
from peft import LoraConfig
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, pipeline)

## Step 2: Loading the model

In [None]:
# Load the (already fine-tuned) Llama 2 checkpoint with a 4-bit quantization configuration
llama_model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path="aboonaji/llama2finetune-v2",  # Hugging Face Hub ID of the model to load.
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,  # Store the weights in 4 bits to cut GPU memory use.
        bnb_4bit_compute_dtype=torch.float16,  # Dequantize to float16 for the actual matrix multiplications.
        bnb_4bit_quant_type="nf4"  # NF4 ("4-bit NormalFloat"), a 4-bit data type designed for normally distributed weights.
    )
)

# Disable the key/value cache
llama_model.config.use_cache = False  # The KV cache only speeds up generation; it is not needed during training and just uses extra memory.

# Use the standard (non-sliced) linear layers
llama_model.config.pretraining_tp = 1  # Values > 1 emulate the tensor-parallel computation used during Llama 2 pretraining; 1 is faster and the usual choice for fine-tuning.


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


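As a rough back-of-the-envelope check on why the 4-bit `BitsAndBytesConfig` matters, the sketch below estimates weight memory at different precisions. It assumes a Llama 2 7B-sized model (about 6.74 billion parameters), which is an assumption about this checkpoint, and counts only the weights: activations, LoRA adapters, optimizer state, and any layers bitsandbytes leaves unquantized are ignored. The `block_overhead_bits` parameter is a hypothetical knob for the per-block quantization constants (one fp32 absmax per 64-weight block adds 32 / 64 = 0.5 bits per weight).

```python
def weight_memory_gb(n_params: float, bits_per_param: float, block_overhead_bits: float = 0.0) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    total_bits = n_params * (bits_per_param + block_overhead_bits)
    return total_bits / 8 / 1e9


N_PARAMS = 6.74e9  # assumption: a Llama 2 7B-sized model

fp16 = weight_memory_gb(N_PARAMS, 16)
nf4 = weight_memory_gb(N_PARAMS, 4, block_overhead_bits=0.5)  # assumption: fp32 absmax per 64 weights

print(f"fp16 weights: {fp16:.2f} GB")                            # fp16 weights: 13.48 GB
print(f"4-bit weights (+ block constants): {nf4:.2f} GB")        # 4-bit weights (+ block constants): 3.79 GB
```

The roughly 3.5x reduction is what lets a 7B-class model fit on a single free-tier Colab GPU with room left for training.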

## Step 3: Loading the tokenizer

In [None]:
# Load a pre-trained tokenizer for the LLaMA model
llama_tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path="aboonaji/llama2finetune-v2",  # The path or name of the pre-trained model to load the tokenizer from.
    trust_remote_code=True  # Allows loading custom code from the model repository if needed.
)

# Set the padding token to be the same as the end-of-sequence (EOS) token
llama_tokenizer.pad_token = llama_tokenizer.eos_token  # This sets the padding token to be the end-of-sequence token. Useful for models that do not have a dedicated pad token.

# Set the padding side to "right"
llama_tokenizer.padding_side = "right"  # Specifies that padding should be added to the right side of sequences. Ensures that all sequences are the same length when fed into the model.

# The tokenizer is now configured for use with the LLaMA model with appropriate padding settings.

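To make the padding settings concrete, here is a pure-Python sketch of what right padding with `pad_token = eos_token` produces for a batch. The token IDs are made up for illustration, and `EOS_ID = 2` is an assumption matching the usual Llama 2 tokenizer; in practice the tokenizer does this itself when called with `padding=True`. Because the pad ID equals the EOS ID, the attention mask is what tells the model which positions are real tokens.

```python
from typing import List, Tuple

EOS_ID = 2       # assumption: Llama 2 tokenizers usually use id 2 for </s>
PAD_ID = EOS_ID  # mirrors llama_tokenizer.pad_token = llama_tokenizer.eos_token


def pad_right(batch: List[List[int]], pad_id: int = PAD_ID) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad every sequence to the longest length and build the attention mask."""
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)              # real tokens first, padding after
        attention_mask.append([1] * len(seq) + [0] * n_pad)   # 0 marks padding positions
    return input_ids, attention_mask


ids, mask = pad_right([[1, 450, 1065], [1, 3439]])  # hypothetical token IDs
print(ids)   # [[1, 450, 1065], [1, 3439, 2]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```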

## Step 4: Setting the training arguments

In [None]:
# Define the training arguments for the model training process
training_arguments = TrainingArguments(
    output_dir="./results",  # The directory where the trained model and other outputs will be saved.
    per_device_train_batch_size=4,  # The batch size per GPU/TPU core/CPU during training. This defines how many samples will be processed at once per device.
    max_steps=100  # The total number of training steps to perform. Overrides the number of epochs if specified. Used for quick testing or debugging.
)
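Because `max_steps` overrides the epoch count, it helps to work out how much of the dataset 100 steps actually covers. The sketch assumes a single GPU, the default `gradient_accumulation_steps=1`, no sequence packing, and a train split of 6,861 examples for `aboonaji/wiki_medical_terms_llam2_format`; adjust these assumptions if your setup differs.

```python
import math

per_device_train_batch_size = 4   # from TrainingArguments above
gradient_accumulation_steps = 1   # assumption: the TrainingArguments default
num_devices = 1                   # assumption: a single Colab GPU
max_steps = 100                   # from TrainingArguments above
dataset_size = 6861               # assumption: number of examples in the train split

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
samples_seen = effective_batch * max_steps
epoch_fraction = samples_seen / dataset_size
steps_per_epoch = math.ceil(dataset_size / effective_batch)

print(effective_batch)           # 4
print(samples_seen)              # 400
print(round(epoch_fraction, 2))  # 0.06
print(steps_per_epoch)           # 1716
```

Under these assumptions the run sees well under a tenth of the data; raising `max_steps` (or using `num_train_epochs` instead) is the lever for a fuller fine-tune.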

## Step 5: Creating the Supervised Fine-Tuning trainer

In [None]:
# Load the training dataset
train_dataset = load_dataset(path="aboonaji/wiki_medical_terms_llam2_format", split="train")

# Define the SFT (Supervised Fine-Tuning) trainer for the LLaMA model
llama_sft_trainer = SFTTrainer(
    model=llama_model,  # The pre-trained model to be fine-tuned.
    args=training_arguments,  # The training arguments specifying how the training should be conducted.
    train_dataset=train_dataset,  # The dataset to be used for training.
    tokenizer=llama_tokenizer,  # The tokenizer used to tokenize the text field.
    peft_config=LoraConfig(
        task_type="CAUSAL_LM",  # The type of task. "CAUSAL_LM" indicates a causal language modeling task.
        r=64,  # Rank of the low-rank update matrices. Higher ranks add capacity but also more trainable parameters and compute.
        lora_alpha=16,  # Scaling factor: the LoRA update is multiplied by lora_alpha / r (here 16 / 64 = 0.25).
        lora_dropout=0.1  # Dropout applied to the inputs of the LoRA branch to reduce overfitting.
    ),
    dataset_text_field="text"  # The field in the dataset that contains the text data to be used for training.
)


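The LoRA settings are easier to reason about with the update rule written out. LoRA keeps a frozen weight W and learns two small matrices, A (r x d_in) and B (d_out x r), adding (lora_alpha / r) * B A x to the layer output; with `r=64` and `lora_alpha=16` the scale is 0.25. The pure-Python sketch below shows that forward pass on toy matrices (using r = 1 and lora_alpha = 0.25 to keep the same 0.25 scale) plus the trainable-parameter count for one projection. The 4096 hidden size is an assumption (the Llama 2 7B value), and PEFT's real implementation also applies `lora_dropout` to the LoRA branch input, which is omitted here.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, r: int, lora_alpha: float) -> Vector:
    """y = W x + (lora_alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    scaling = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer (A plus B)."""
    return r * d_in + d_out * r


# Toy example: W is 2x2, A is 1x2, B is 2x1 (r = 1, scale 0.25 like 16 / 64)
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
x = [2.0, 3.0]
print(lora_forward(W, A, [[0.0], [0.0]], x, r=1, lora_alpha=0.25))  # [2.0, 3.0]  (B starts at zero)
print(lora_forward(W, A, [[0.5], [0.0]], x, r=1, lora_alpha=0.25))  # [2.625, 3.0] (after B is trained)

# Assumption: a 4096 x 4096 attention projection, as in Llama 2 7B
added = lora_param_count(4096, 4096, 64)
print(added)                                # 524288
print(f"{added / (4096 * 4096):.3%}")       # 3.125%
```

Zero-initializing B means the adapted model starts out identical to the base model, and only a few percent of each targeted layer's parameters are trained.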

## Step 6: Training the model

In [None]:
llama_sft_trainer.train()

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.




TrainOutput(global_step=100, training_loss=1.6550128173828125, metrics={'train_runtime': 1586.0497, 'train_samples_per_second': 0.252, 'train_steps_per_second': 0.063, 'total_flos': 8228119310991360.0, 'train_loss': 1.6550128173828125, 'epoch': 0.06})
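The throughput metrics in `TrainOutput` follow directly from the configuration: 100 steps at 4 samples per step over about 1,586 seconds. The sketch below recomputes them from the numbers in the output above, assuming one device and no gradient accumulation; it is a sanity check, not part of the Trainer API.

```python
train_runtime = 1586.0497   # seconds, from TrainOutput
global_step = 100           # from TrainOutput
samples_per_step = 4        # assumption: per_device_train_batch_size on a single device

steps_per_second = global_step / train_runtime
samples_per_second = global_step * samples_per_step / train_runtime
seconds_per_step = train_runtime / global_step

print(round(steps_per_second, 3))    # 0.063
print(round(samples_per_second, 3))  # 0.252
print(round(seconds_per_step, 1))    # 15.9
```

At roughly 16 seconds per step, a full pass over the dataset on the same hardware would take several hours, assuming throughput stays constant.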

## Step 7: Chatting with the model

In [None]:
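# Ask for a prompt, then generate a reply using the Llama 2 [INST] ... [/INST] template
user_prompt = input("Enter your prompt: ")
text_generation_pipeline = pipeline(task="text-generation", model=llama_model, tokenizer=llama_tokenizer, max_length=300)  # max_length caps prompt + reply at 300 tokens
model_answer = text_generation_pipeline(f"<s>[INST] {user_prompt} [/INST]")
print(model_answer[0]["generated_text"])  # The output contains the prompt followed by the model's reply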

Enter your prompt: Im feeling a mild cough what should i do?




<s>[INST] Im feeling a mild cough what should i do? [/INST]  If you're feeling a mild cough, there are several things you can do to help manage your symptoms and potentially shorten the duration of your cough:
 nobody likes a cough, especially when it's mild. here are some things you can do to help manage your symptoms and potentially shorten the duration of your cough:

1. Stay hydrated: Drinking plenty of fluids, such as water, tea, or soup, can help to thin out mucus in your throat and make it easier to cough up.
2. Use a humidifier: Dry air can irritate your throat and make your cough worse. Using a humidifier can help to add moisture to the air and soothe your throat.
3. Take over-the-counter medications: Cough suppressants, such as dextromethorphan, can help to reduce the urge to cough. Expectorants, such as guaifenesin, can help to loosen mucus in your chest and make it easier to cough up.
4. Rest: Getting enough rest can help your body to fight off the underlying cause of your 
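The chat cell wraps the user's text in the Llama 2 instruction template, and the pipeline echoes the prompt back before the reply (the text-generation pipeline's `return_full_text=False` option is an alternative to stripping it manually). A small helper can keep that formatting in one place. The optional `<<SYS>>` block follows the published Llama 2 chat convention; whether this particular fine-tune was trained with system prompts is an assumption, so leave `system` unset to match the notebook's call. `build_llama2_prompt` and `extract_answer` are hypothetical helper names.

```python
from typing import Optional


def build_llama2_prompt(user_prompt: str, system: Optional[str] = None) -> str:
    """Wrap a user message in the Llama 2 [INST] template used in the chat cell."""
    if system:
        # Llama 2 chat convention: the system prompt sits inside the first [INST] block
        user_prompt = f"<<SYS>>\n{system}\n<</SYS>>\n\n{user_prompt}"
    return f"<s>[INST] {user_prompt.strip()} [/INST]"


def extract_answer(generated_text: str) -> str:
    """Return only the model's continuation after the final [/INST] marker."""
    return generated_text.split("[/INST]")[-1].strip()


prompt = build_llama2_prompt("Im feeling a mild cough what should i do?")
print(prompt)  # <s>[INST] Im feeling a mild cough what should i do? [/INST]
print(extract_answer(prompt + "  Stay hydrated and rest."))  # Stay hydrated and rest.
```

In the notebook, `extract_answer(model_answer[0]["generated_text"])` would print just the reply without the echoed prompt.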