## Fine-tune Falcon-7B on Google Colab

Welcome to this Google Colab notebook, which shows how to fine-tune the Falcon-7B model in a single Google Colab session and turn it into a chatbot.

We will use the PEFT library from the Hugging Face ecosystem, together with QLoRA, for more memory-efficient fine-tuning.
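
To see why 4-bit quantization matters here, the following back-of-the-envelope sketch estimates the memory taken by the model weights alone at different precisions. It rests on stated assumptions: the parameter count is rounded to 7 billion, the helper name `weight_memory_gib` is made up for illustration, and activations, optimizer state, and quantization constants are ignored.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for the model weights, in GiB."""
    total_bytes = n_params * bits_per_param / 8
    return total_bytes / 2**30

N_PARAMS = 7e9  # assumption: Falcon-7B rounded to 7 billion parameters

for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("4-bit", 4)]:
    print(f"{label:>9}: {weight_memory_gib(N_PARAMS, bits):5.1f} GiB")
```

Under these assumptions the weights need about 13 GiB in 16-bit precision but only a little over 3 GiB in 4-bit, which leaves room for activations and the LoRA adapter on a small GPU.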

## Setup

Run the cells below to set up the environment and install the required libraries.

In [None]:
!pip install -qU bitsandbytes transformers datasets accelerate loralib einops xformers
!pip install -q -U git+https://github.com/huggingface/peft.git

import os
import bitsandbytes as bnb
import pandas as pd
import torch
import torch.nn as nn
import transformers
from datasets import load_dataset
from peft import (
    LoraConfig,
    PeftConfig,
    get_peft_model,
    prepare_model_for_kbit_training,
)
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)



## Loading the Pre-Trained Model

In [None]:
model_id = "tiiuae/falcon-7b"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,  # Also quantize the quantization constants
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
    quantization_config=bnb_config,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token





## Preparing the Model for QLoRA

In [None]:
model = prepare_model_for_kbit_training(model)

You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.


## Configuring LoRA

In [None]:
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
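
To make the effect of `r=16` on `query_key_value` concrete, here is a small sketch that counts the parameters LoRA would add. The layer dimensions are assumptions based on the published Falcon-7B architecture (hidden size 4544, 32 decoder layers, 71 query heads of size 64 with one shared key head and one shared value head); check `model.config` before relying on them. The helper `lora_param_count` is hypothetical.

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds two matrices per adapted layer: A (r x in) and B (out x r)."""
    return r * in_features + out_features * r

# Assumed Falcon-7B dimensions (verify against model.config)
HIDDEN_SIZE = 4544
QKV_OUT = 4544 + 2 * 64  # 71 query heads of size 64, plus one key and one value head
NUM_LAYERS = 32

R, LORA_ALPHA = 16, 32   # values from the LoraConfig in this notebook

per_layer = lora_param_count(HIDDEN_SIZE, QKV_OUT, R)
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
print(f"LoRA scaling (lora_alpha / r): {scaling}")
```

In the notebook itself, `model.print_trainable_parameters()` reports the actual count, which is the number to trust if it differs from this estimate.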

## Loading and Preparing the Dataset


In [None]:
def generate_prompt(data_point):
  # Define prompt template for Falcon-7B
  PROMPT_TEMPLATE = """<|system|>You are a helpful medical assistant.<|endoftext|>
<|user|>Question: {question}<|endoftext|>
<|assistant|>Answer: {answer}<|endoftext|>"""
  return PROMPT_TEMPLATE.format(
      question=data_point["question"],
      answer=data_point["answer"])

def generate_and_tokenize_prompt(data_point):
  full_prompt = generate_prompt(data_point)
  tokenized_full_prompt = tokenizer(full_prompt, padding=True, truncation=True)
  return tokenized_full_prompt

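# Load the MedQuAD question-answer pairs (load_dataset is imported in the setup cell)
dataset = load_dataset("lavita/MedQuAD", split="train")

# Shuffle, then turn every record into a tokenized training prompt
dataset = dataset.shuffle().map(generate_and_tokenize_prompt)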

Map:   0%|          | 0/47441 [00:00<?, ? examples/s]
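
To check what the model actually sees during training, the prompt template can be exercised on its own, without the tokenizer or the dataset. The sketch below repeats the template from this notebook and formats one record; the sample record is invented for illustration and is not taken from MedQuAD.

```python
PROMPT_TEMPLATE = """<|system|>You are a helpful medical assistant.<|endoftext|>
<|user|>Question: {question}<|endoftext|>
<|assistant|>Answer: {answer}<|endoftext|>"""

def generate_prompt(data_point: dict) -> str:
    """Fill the template with one question-answer record."""
    return PROMPT_TEMPLATE.format(
        question=data_point["question"],
        answer=data_point["answer"],
    )

# Hypothetical record, invented for illustration
sample = {
    "question": "What is hypertension?",
    "answer": "Hypertension is another name for high blood pressure.",
}

prompt = generate_prompt(sample)
print(prompt)

# A record whose answer is None is formatted as the literal text "None"
empty = generate_prompt({"question": "What is hypertension?", "answer": None})
print(empty.splitlines()[-1])
```

The last two lines illustrate a property of `str.format`: if any record has a missing answer, the model is trained to produce the word "None", so it may be worth filtering such rows before mapping.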

## Setting Up the Training Arguments

In [None]:
# Define and create the output directory
OUTPUT_DIR = "/content/falcon-7b-medquad"
if not os.path.exists(OUTPUT_DIR):
    os.makedirs(OUTPUT_DIR)
    print(f"Created output directory: {OUTPUT_DIR}")
else:
    print(f"Output directory already exists: {OUTPUT_DIR}")

Created output directory: /content/falcon-7b-medquad


In [None]:
training_args = transformers.TrainingArguments(
    output_dir=OUTPUT_DIR,  # Write checkpoints to the directory created above
    auto_find_batch_size=True,  # On OOM, retry with a smaller batch size
    per_device_train_batch_size=4,  # Starting batch size for a T4
    num_train_epochs=1,  # Ignored here because max_steps is set
    learning_rate=2e-4,
    fp16=True,  # T4-compatible mixed precision
    save_total_limit=2,  # Keep only the two most recent checkpoints
    logging_steps=10,
    save_strategy="steps",
    save_steps=250,
    max_steps=500,  # Stop after 500 optimizer steps
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    report_to="none",
)
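
The combination of `warmup_ratio=0.03`, `lr_scheduler_type="cosine"`, and `max_steps=500` determines the learning rate at every step. The sketch below reproduces the usual formula for linear warmup followed by cosine decay to zero; it is an approximation for intuition, and the library's implementation may differ in small details. The function name `lr_at_step` is hypothetical.

```python
import math

def lr_at_step(step: int, base_lr: float, total_steps: int, warmup_ratio: float) -> float:
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Values taken from the training arguments in this notebook
BASE_LR, MAX_STEPS, WARMUP_RATIO = 2e-4, 500, 0.03

for step in (0, 5, 15, 250, 500):
    lr = lr_at_step(step, BASE_LR, MAX_STEPS, WARMUP_RATIO)
    print(f"step {step:3d}: lr = {lr:.2e}")
```

With these settings the warmup covers the first 15 steps, the peak of 2e-4 is reached at step 15, and the rate decays smoothly to zero by step 500.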

## Training the Model

In [20]:
trainer = transformers.Trainer(
    model=model,
    train_dataset=dataset,
    args=training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False
trainer.train()

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Step,Training Loss
10,1.0393
20,0.7042
30,0.7675
40,0.8952




Step,Training Loss
10,0.8443
20,0.8217
30,0.5619
40,0.5138
50,0.6473
60,0.5985
70,0.6123
80,0.7368
90,0.6813
100,0.9346




Step,Training Loss
10,0.7462
20,0.5983
30,0.8296
40,0.4679
50,0.4569
60,0.5139
70,0.2837
80,0.5832
90,0.5561
100,0.5129



TrainOutput(global_step=500, training_loss=0.6217149052619934, metrics={'train_runtime': 2242.2986, 'train_samples_per_second': 0.223, 'train_steps_per_second': 0.223, 'total_flos': 3025831077976320.0, 'train_loss': 0.6217149052619934, 'epoch': 0.010539406842182922})
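
The `epoch` value in the training output can be sanity-checked with simple arithmetic. The sketch below assumes a single device and no gradient accumulation, and uses the 47,441 examples reported when the dataset was mapped; the helper `epochs_covered` is hypothetical.

```python
DATASET_SIZE = 47441  # number of training examples reported during mapping
MAX_STEPS = 500       # max_steps from the training arguments

def epochs_covered(steps: int, batch_size: int, dataset_size: int) -> float:
    """Fraction of the dataset seen, assuming one device and no gradient accumulation."""
    return steps * batch_size / dataset_size

for batch_size in (4, 2, 1):
    fraction = epochs_covered(MAX_STEPS, batch_size, DATASET_SIZE)
    print(f"batch size {batch_size}: {fraction:.4f} epochs")
```

Only a batch size of 1 reproduces the reported value of about 0.0105 epochs, and the output also shows identical samples-per-second and steps-per-second figures. Both suggest that `auto_find_batch_size` reduced the batch size from 4 to 1 during this run, so the model saw roughly 1% of the dataset.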

In [21]:
# Save the final LoRA adapter and the tokenizer
final_output_dir = f"{OUTPUT_DIR}-final"
try:
    trainer.save_model(final_output_dir)
    tokenizer.save_pretrained(final_output_dir)
    print(f"Model saved to {final_output_dir}")
except Exception as e:
    print(f"Failed to save model: {e}")

Model saved to /content/falcon-7b-medquad-final


In [26]:
!zip -r falcon-7b-medquad-final.zip /content/falcon-7b-medquad-final
from google.colab import files
files.download("falcon-7b-medquad-final.zip")

  adding: content/falcon-7b-medquad-final/ (stored 0%)
  adding: content/falcon-7b-medquad-final/special_tokens_map.json (deflated 49%)
  adding: content/falcon-7b-medquad-final/tokenizer.json (deflated 81%)
  adding: content/falcon-7b-medquad-final/training_args.bin (deflated 51%)
  adding: content/falcon-7b-medquad-final/tokenizer_config.json (deflated 84%)
  adding: content/falcon-7b-medquad-final/adapter_config.json (deflated 54%)
  adding: content/falcon-7b-medquad-final/README.md (deflated 66%)
  adding: content/falcon-7b-medquad-final/adapter_model.safetensors (deflated 7%)
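
Outside Colab, where the `zip` shell command may not be available, the same archive can be produced with Python's standard library. This is a minimal sketch: the helper `zip_directory` is hypothetical, and the demo runs on a throwaway directory rather than the real model folder.

```python
import shutil
import tempfile
from pathlib import Path

def zip_directory(source_dir: str, archive_stem: str) -> str:
    """Zip source_dir into '<archive_stem>.zip' and return the archive path."""
    source = Path(source_dir)
    return shutil.make_archive(
        archive_stem, "zip", root_dir=str(source.parent), base_dir=source.name
    )

# Demo on a temporary directory so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "falcon-7b-medquad-final"
    demo_dir.mkdir()
    (demo_dir / "adapter_config.json").write_text("{}")
    archive_path = zip_directory(str(demo_dir), str(Path(tmp) / "demo-archive"))
    print(Path(archive_path).name)
```

In the notebook, the call would be `zip_directory(final_output_dir, "falcon-7b-medquad-final")`, with the download step unchanged.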



## Testing the Model

In [37]:
import time

# Build the inference prompt: the training template with the answer left empty
def generate_test_prompt(question):
    PROMPT_TEMPLATE = """<|system|>You are a helpful medical assistant.<|endoftext|>
<|user|>Question: {question}<|endoftext|>
<|assistant|>Answer: """
    return PROMPT_TEMPLATE.format(question=question)

# Generate an answer for one question and report how long generation took
def generate_response(question):
    prompt = generate_test_prompt(question)
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

    start_time = time.time()

    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
        use_cache=False  # Disables the KV cache; generation is much slower without it
    )
    end_time = time.time()

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    answer_start = response.find("Answer: ") + len("Answer: ")
    answer = response[answer_start:].strip()
    print(f"Time taken: {end_time - start_time:.2f} seconds")
    return answer
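
One detail of the answer extraction is worth knowing: `str.find` returns -1 when the marker is missing, so adding the marker length silently cuts off the first seven characters of the response. The sketch below shows a more defensive variant using `str.partition`; the function name `extract_answer` is hypothetical and not part of the notebook.

```python
ANSWER_MARKER = "Answer:"

def extract_answer(response: str, marker: str = ANSWER_MARKER) -> str:
    """Return the text after the first marker, or the whole response if it is absent."""
    _, found, tail = response.partition(marker)
    return tail.strip() if found else response.strip()

with_marker = "Question: What helps?\nAnswer: Drink plenty of water."
without_marker = "Drink plenty of water."

print(extract_answer(with_marker))
print(extract_answer(without_marker))
```

Both calls print the same sentence, whereas the `find`-based slice would truncate the second input.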

In [41]:
question = "What causes kidney stones?"
print(f"Question: {question}")
answer = generate_response(question)
print(f"Answer: {answer}")

Question: What causes kidney stones?
Time taken: 225.96 seconds
Answer: None. The cause of kidney stones is unknown. Some people have a family history of kidney stones. In other cases, kidney stones form after a person has had a medical condition such as diabetes or gout. Some people have a genetic condition that causes the urine to contain too much calcium. High levels of calcium can form stones. Other causes of kidney stones include the use of certain medicines and conditions such as cystic fibrosis, which can lead to a blockage in the intestines.  Kidney stones can also occur if a person has too much acid in the urine. The acid can come from the kidneys, which remove acid from the blood.  The main treatment for kidney stones is to pass the stones through the urine. There are several ways to do this. The most common way is to drink lots of water. Other methods include drinking special fluids or taking medicine. These methods are less effective than drinking water.  Most kidney stones

In [42]:
question = "What are the symptoms of asthma?"
print(f"Question: {question}")
answer = generate_response(question)
print(f"Answer: {answer}")

Question: What are the symptoms of asthma?
Time taken: 227.10 seconds
Answer: None of the above symptoms occur, the asthma attack lasts longer than 30 minutes, or the attack is severe.  Symptoms of an asthma attack include wheezing, chest tightness, shortness of breath, and coughing.  If you are experiencing any of these symptoms, call your doctor immediately.  Call 911 or emergency medical services if you or your child is having a severe asthma attack.  If you think you may have an asthma attack, use your asthma action plan.  Call your doctor if you think you or your child needs to use your rescue medicine more than 3 times a week.  Call your doctor if you think your asthma medicines are not working as well as they should.  Call your doctor if you think you or your child is having an asthma attack that does not respond to your usual medicine.  If you have been prescribed a long-term control medicine, call your doctor if you think you or your child needs to use your rescue medicine mor
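
The reported timings translate into a rough generation speed. The sketch below assumes both runs used the full budget of 200 new tokens, which is plausible because both answers stop mid-sentence, but this is not confirmed by the output.

```python
def tokens_per_second(n_tokens: int, seconds: float) -> float:
    """Average generation throughput."""
    return n_tokens / seconds

MAX_NEW_TOKENS = 200  # assumption: both runs generated the full token budget

for label, seconds in [("kidney stones", 225.96), ("asthma", 227.10)]:
    rate = tokens_per_second(MAX_NEW_TOKENS, seconds)
    print(f"{label}: {rate:.2f} tokens/s")
```

Under that assumption the model produces a little under one token per second. Generating with `use_cache=False` is a likely contributor, since every new token then requires reprocessing the entire sequence.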