<div align="center">
<h2>Fine-Tuning Gemma for Efficient Function Calling</h2>
</div>

### This notebook fine-tunes the Gemma model (2B parameters) for efficient function calling, using the "glaiveai/glaive-function-calling-v2" dataset.
### The final fine-tuned model will be available on my Hugging Face profile: [Final Model](https://huggingface.co/Dharinesh/finetuned-gemma-function-calling)

## Installation of Dependencies:

In [None]:
!pip install "unsloth[colab]@git+https://github.com/unslothai/unsloth.git"
!pip install -q transformers datasets peft trl bitsandbytes accelerate
!pip install -q git+https://github.com/huggingface/transformers.git@main

Collecting unsloth[colab]@ git+https://github.com/unslothai/unsloth.git
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-unkphdrr/unsloth_1c42bcd8ee284a29946d85bdb4cba7df
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-unkphdrr/unsloth_1c42bcd8ee284a29946d85bdb4cba7df
  Resolved https://github.com/unslothai/unsloth.git to commit a2ee56813ed67b7f5336793cbca84442a94140fd
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting xformers@ https://download.pytorch.org/whl/cu121/xformers-0.0.22.post7-cp310-cp310-manylinux2014_x86_64.whl (from unsloth[colab]@ git+https://github.com/unslothai/unsloth.git)
  Downloading https://download.pytorch.org/whl/cu121/xformers-0.0.22.post7-cp310-cp310-manylinux2014_x86_64.whl (211.8 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... canceled
ERROR: Operation cancelled by user

## Importing Libraries

In [None]:
import os
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, PeftModel, PeftConfig
from trl import SFTTrainer
from transformers import TrainingArguments

## Checking GPU

In [None]:
print(f"GPU is available: {torch.cuda.is_available()}")
if torch.cuda.is_available():  # get_device_name raises if no CUDA device is present
    print(f"GPU device name: {torch.cuda.get_device_name(0)}")

GPU is available: True
GPU device name: Tesla T4


## Hugging Face Authentication:

In [None]:
import os
from huggingface_hub import login
# Read the token from the HF_TOKEN environment variable; never hard-code a token in a notebook.
login(token=os.environ["HF_TOKEN"])

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


## Loading the Model and Dataset

In [None]:
# Model and dataset names
model_name = "unsloth/gemma-1.1-2b-it-bnb-4bit"
dataset_name = "glaiveai/glaive-function-calling-v2"

In [None]:
# Load the model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    token=True,  # `use_auth_token` is deprecated in recent transformers releases
    use_cache=False
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/1.13k [00:00<?, ?B/s]

Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.


model.safetensors:   0%|          | 0.00/2.07G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/132 [00:00<?, ?B/s]

In [None]:
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, token=True)
tokenizer.pad_token = tokenizer.eos_token



tokenizer_config.json:   0%|          | 0.00/40.6k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]

## Preparing and Filtering the Dataset:

In [None]:
# Prepare the dataset
dataset = load_dataset(dataset_name)

def prepare_prompt(example):
    return f"SYSTEM: {example['system']}\nUSER: {example['chat']}"

def filter_long_examples(example):
    prompt = prepare_prompt(example)
    return len(tokenizer.encode(prompt)) <= 512

filtered_dataset = dataset["train"].filter(filter_long_examples)

def tokenize_prompt(example):
    # Called once per row: prepare_prompt expects a single example, not a batch of column lists.
    prompt = prepare_prompt(example)
    return tokenizer(prompt, truncation=True, max_length=512)

tokenized_dataset = filtered_dataset.map(tokenize_prompt, remove_columns=filtered_dataset.column_names)

train_dataset = tokenized_dataset

Filter:   0%|          | 0/112960 [00:00<?, ? examples/s]

Map:   0%|          | 0/69727 [00:00<?, ? examples/s]
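
`Dataset.map` passes the mapped function one row (a dict of scalars) by default, and a dict of column lists when `batched=True`. The following library-free sketch illustrates the difference; plain dicts stand in for dataset rows, the rows are made up, and `format_row` / `format_batch` are hypothetical helpers rather than part of `datasets`.

```python
# Illustrative sketch only: plain dicts stand in for dataset rows and batches.
# `format_row` and `format_batch` are hypothetical helpers, not `datasets` APIs.

def format_row(row):
    """Build one prompt from a single row (a dict of scalars)."""
    return f"SYSTEM: {row['system']}\nUSER: {row['chat']}"


def format_batch(batch):
    """Build one prompt per row from a batch (a dict of equal-length lists)."""
    return [
        f"SYSTEM: {system}\nUSER: {chat}"
        for system, chat in zip(batch["system"], batch["chat"])
    ]


# Made-up rows for illustration.
rows = [
    {"system": "You can call get_time.", "chat": "What time is it?"},
    {"system": "You can call get_weather.", "chat": "Is it raining?"},
]
# The batched layout: one list per column.
batch = {key: [row[key] for row in rows] for key in rows[0]}

per_row_prompts = [format_row(row) for row in rows]
batched_prompts = format_batch(batch)
print(per_row_prompts == batched_prompts)
```

Passing a batch to a per-row formatter would instead interpolate whole lists into a single string, giving one prompt per batch rather than one per example.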

## LoRA Configuration for Efficient Fine-Tuning:

In [None]:
# LoRA Configuration
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
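
To get a feel for how small this adapter is, the sketch below estimates the number of trainable LoRA parameters for `r=16` on `q_proj` and `v_proj`. The layer shapes are assumptions about Gemma 2B (hidden size 2048, 8 query heads and 1 key/value head of dimension 256, 18 decoder layers) and should be checked against the model's `config.json`; `lora_params` is a hypothetical helper.

```python
# Back-of-the-envelope LoRA sizing. All layer shapes below are ASSUMED values
# for Gemma 2B; verify them against the model's config.json before relying on them.

def lora_params(d_in, d_out, r):
    """A rank-r adapter adds A (r x d_in) and B (d_out x r) beside a d_out x d_in weight."""
    return r * (d_in + d_out)


R = 16
LORA_ALPHA = 16
HIDDEN = 2048      # assumed hidden size
Q_OUT = 8 * 256    # assumed q_proj output width (8 heads of dimension 256)
V_OUT = 1 * 256    # assumed v_proj output width (single key/value head)
NUM_LAYERS = 18    # assumed number of decoder layers

per_layer = lora_params(HIDDEN, Q_OUT, R) + lora_params(HIDDEN, V_OUT, R)
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # the low-rank update is multiplied by lora_alpha / r

print(f"trainable LoRA parameters: {total:,}")
print(f"scaling factor: {scaling}")
```

Under these assumptions the adapter holds roughly 1.8 million parameters, a small fraction of the 2B-parameter base model, and `lora_alpha=16` with `r=16` gives a scaling factor of 1.0.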

## Training Arguments and SFT Configuration:

In [None]:
# Training Arguments
output_dir = "./lora_gemma_function_calling"
training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    num_train_epochs=5,  # Reduced for demonstration, increase for better results
    logging_steps=10,
    save_steps=100,
    save_total_limit=3,
    gradient_checkpointing=True,  # Enable gradient checkpointing
    # ... (other SFT config parameters)
)
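
The batch-size settings above combine into an effective batch of `per_device_train_batch_size * gradient_accumulation_steps` examples per optimizer step. A rough sketch of the resulting step count follows; `estimate_optimizer_steps` is a hypothetical helper, the 1,000-example dataset size is made up, and the rounding rule is an assumption that may differ between `Trainer` versions.

```python
import math


def estimate_optimizer_steps(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Rough estimate: micro-batches per epoch, grouped into accumulation windows."""
    micro_batches = math.ceil(num_examples / (per_device_batch * num_devices))
    # Assumption: a trailing partial accumulation window is dropped.
    # Some Trainer versions count it as a full step instead.
    updates_per_epoch = max(micro_batches // grad_accum, 1)
    return updates_per_epoch * epochs


# per_device_train_batch_size * gradient_accumulation_steps on a single GPU
effective_batch = 2 * 2
# Hypothetical dataset size, chosen only for illustration.
steps = estimate_optimizer_steps(num_examples=1000, per_device_batch=2, grad_accum=2, epochs=5)
print(effective_batch, steps)
```

Comparing such an estimate with the `global_step` reported by `trainer.train()` is a quick sanity check that the trainer is seeing as many examples as expected.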

## Initializing and Training with SFTTrainer:

In [None]:
# Initialize the Trainer
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    peft_config=lora_config,
    dataset_text_field="chat",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_args,
)



In [None]:
# Train the model
trainer.train()



| Step | Training Loss |
|------|---------------|
| 10 | 0.8175 |
| 20 | 0.5537 |
| 30 | 0.3799 |
| 40 | 0.2527 |
| 50 | 0.2101 |
| 60 | 0.1754 |
| 70 | 0.1773 |
| 80 | 0.1585 |


TrainOutput(global_step=85, training_loss=0.33073512946858125, metrics={'train_runtime': 719.5335, 'train_samples_per_second': 0.486, 'train_steps_per_second': 0.118, 'total_flos': 2071963820359680.0, 'train_loss': 0.33073512946858125, 'epoch': 4.857142857142857})

In [None]:
# Save the trained model
trainer.model.save_pretrained(output_dir)

## Generating Outputs with the Fine-Tuned Model:

In [None]:
# Function for generating outputs
def get_model_output(model, tokenizer, prompt, max_new_tokens=100):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        temperature=0.7,
        num_return_sequences=1,
        do_sample=True,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
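
With `do_sample=True`, `temperature` rescales the next-token logits before sampling: values below 1 sharpen the distribution, values above 1 flatten it. The sketch below shows the idea on made-up logits using only the standard library; it is an illustration of temperature sampling in general, not the code path `generate` actually runs, and both helper functions are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.1]  # made-up scores for a three-token vocabulary
probs_default = softmax_with_temperature(toy_logits, 1.0)
probs_sharper = softmax_with_temperature(toy_logits, 0.7)
print([round(p, 3) for p in probs_default])
print([round(p, 3) for p in probs_sharper])
print(sample_token(toy_logits, 0.7, random.Random(0)))
```

At temperature 0.7 the most likely token gets a larger share of the probability mass than at 1.0, which makes the output less varied but still not deterministic.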


In [None]:
# Load the model and tokenizer
config = PeftConfig.from_pretrained(output_dir)
fine_tuned_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    device_map='auto',
    token=True  # `use_auth_token` is deprecated in recent transformers releases
)
fine_tuned_model = PeftModel.from_pretrained(fine_tuned_model, output_dir)
tokenizer = AutoTokenizer.from_pretrained(model_name, token=True)
tokenizer.pad_token = tokenizer.eos_token

Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.


## Custom input

In [None]:
input_str = """
<bos><|im_start|>system
You are a helpful assistant with access to the following functions. Use them if required -
{
    "name": "calculate_median",
    "description": "Calculate the median of a list of numbers",
    "parameters": {
        "type": "object",
        "properties": {
            "numbers": {
                "type": "array",
                "items": {
                    "type": "number"
                },
                "description": "The list of numbers"
            }
        },
        "required": [
            "numbers"
        ]
    }
}
<|im_end|>
<|im_start|>user
Hi, I have a list of numbers and I need to find the median. The numbers are 5, 2, 9, 1, 7, 4, 6, 3, 8.<|im_end|>
<|im_start|>assistant
<functioncall>
"""

output = get_model_output(fine_tuned_model, tokenizer, input_str, max_new_tokens=100)
print(output)


<|im_start|>system
You are a helpful assistant with access to the following functions. Use them if required -
{
    "name": "calculate_median",
    "description": "Calculate the median of a list of numbers",
    "parameters": {
        "type": "object",
        "properties": {
            "numbers": {
                "type": "array",
                "items": {
                    "type": "number"
                },
                "description": "The list of numbers"
            }
        },
        "required": [
            "numbers"
        ]
    }
}
<|im_end|>
<|im_start|>user
Hi, I have a list of numbers and I need to find the median. The numbers are 5, 2, 9, 1, 7, 4, 6, 3, 8.<|im_end|>
<|im_start|>assistant
<functioncall>
```
The median of the list is 6.

The median of a list of numbers is the middle value when the list is sorted from smallest to largest.

```
```
```
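
Once a model does emit a structured call after the `<functioncall>` tag, the caller still has to parse it and run the matching function. A minimal sketch of that step follows. It assumes the payload is valid JSON of the form `{"name": ..., "arguments": {...}}`; the `TOOLS` registry, `run_function_call`, and the sample completion are all hypothetical and are not output from the model above.

```python
import json
import statistics

FUNCTION_CALL_TAG = "<functioncall>"


def calculate_median(numbers):
    """Local implementation of the tool described in the prompt's schema."""
    return statistics.median(numbers)


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"calculate_median": calculate_median}


def run_function_call(generated_text):
    """Parse the JSON after the last <functioncall> tag and run the named tool.

    Assumes a payload shaped like {"name": ..., "arguments": {...}};
    returns None when no parseable call to a known tool is found.
    """
    _, tag, payload = generated_text.rpartition(FUNCTION_CALL_TAG)
    if not tag:
        return None
    try:
        call, _ = json.JSONDecoder().raw_decode(payload.strip())
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict):
        return None
    tool = TOOLS.get(call.get("name"))
    if tool is None:
        return None
    return tool(**call.get("arguments", {}))


# A made-up completion in the assumed format, not actual model output.
sample = (
    '<functioncall> {"name": "calculate_median", '
    '"arguments": {"numbers": [5, 2, 9, 1, 7, 4, 6, 3, 8]}}'
)
print(run_function_call(sample))
```

Returning `None` for free-text completions lets the caller fall back to showing the raw answer when no call can be parsed.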


## Pushing the Model to Hugging Face Hub:

In [None]:
model.push_to_hub("Dharinesh/finetuned-gemma-function-calling")
tokenizer.push_to_hub("Dharinesh/finetuned-gemma-function-calling")

model.safetensors:   0%|          | 0.00/3.13G [00:00<?, ?B/s]

README.md:   0%|          | 0.00/5.18k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/Dharinesh/finetuned-gemma-function-calling/commit/d2225856350742bc6a84bf41b63372c0b0d8fac5', commit_message='Upload tokenizer', commit_description='', oid='d2225856350742bc6a84bf41b63372c0b0d8fac5', pr_url=None, pr_revision=None, pr_num=None)