This notebook targets a competition hosted on [Kaggle](https://www.kaggle.com/competitions/data-assistants-with-gemma), published by Google.

Supervised Fine-Tuning (SFT) adapts a pre-trained model to a specific task using labeled data. Full fine-tuning updates every weight in the model. [LoRA](https://arxiv.org/abs/2106.09685) is a parameter-efficient approach within SFT that freezes the pre-trained weights and trains small low-rank matrices instead, which reduces computational and storage costs.

In [None]:
!pip install -U transformers bitsandbytes accelerate
!pip install datasets --no-deps
!pip install trl
!pip install peft


In [None]:
from huggingface_hub import notebook_login
notebook_login()

Token has not been saved to git credential helper.


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, AutoProcessor
import torch
from IPython.display import Markdown
import pandas as pd
from datasets import Dataset
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig
import ast

## Load Model
We use the 2B instruction-tuned version of Google's [Gemma 1.0](https://huggingface.co/google/gemma-2b-it) model.

In [None]:
# model_id = 'google/gemma-2b'
model_id = 'google/gemma-2b-it'

# 4 Bit Config
bnb_config_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config_4bit, low_cpu_mem_usage=True, trust_remote_code=True)

# Loading Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "right"
print(f"Gemma 1.0 4Bit Model size: {model.get_memory_footprint()/1024./1024./1024.:,} GB")

Gemma 1.0 4Bit Model size: 1.8995556831359863 GB
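
As a rough cross-check of the reported footprint, the arithmetic can be redone by hand. The sketch below takes the layer shapes from the `print(model)` output shown later in this notebook. It assumes (not confirmed by the source) that each 4-bit linear weight costs half a byte, that the embedding table stays in a 16-bit dtype, and that norm weights and quantization constants are small enough to ignore.

```python
# Back-of-the-envelope estimate of the 4-bit Gemma 2B memory footprint.
# Shapes come from the print(model) output in this notebook; the byte sizes
# per weight are assumptions, not values read from the library.
HIDDEN, KV_DIM, MLP_DIM, VOCAB, N_LAYERS = 2048, 256, 16384, 256000, 18

linear_weights_per_layer = (
    2 * HIDDEN * HIDDEN      # q_proj and o_proj
    + 2 * HIDDEN * KV_DIM    # k_proj and v_proj
    + 3 * HIDDEN * MLP_DIM   # gate_proj, up_proj, down_proj
)
linear_weights = N_LAYERS * linear_weights_per_layer
embedding_weights = VOCAB * HIDDEN

BYTES_4BIT = 0.5   # assumed: two 4-bit weights packed per byte
BYTES_16BIT = 2.0  # assumed: embedding kept in a 16-bit dtype

estimated_bytes = linear_weights * BYTES_4BIT + embedding_weights * BYTES_16BIT
estimated_gib = estimated_bytes / 1024**3
print(f"Quantized linear weights: {linear_weights:,}")
print(f"Estimated footprint: {estimated_gib:.2f} GB")
```

Under these assumptions the estimate lands close to the value printed above, which suggests that roughly half of the remaining memory sits in the unquantized embedding table.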


In [None]:
system = "You are a skilled software engineer. "
question = system + "What is the difference between a variable and an object"

prompt = f"Question: {question} \n Answer: "

inputs = tokenizer(prompt, return_tensors='pt', padding=True, truncation=True, max_length=512).to("cuda")

outputs = model.generate(**inputs, num_return_sequences=1, max_new_tokens=512)

text = tokenizer.decode(outputs[0], skip_special_tokens=True)

# print(text)
Markdown(text.split("Answer:")[1])

 

Sure, here's the difference between a variable and an object in software engineering:

**Variable:**

* A variable is a named memory location that stores a single piece of data.
* It is declared using a keyword (e.g., int, float, string).
* The variable name should follow the same rules as other identifiers (e.g., cannot start with a number).
* Variables are used to store and retrieve data in a program.
* A variable can be assigned a value once or multiple times.

**Object:**

* An object is a more complex data structure that contains multiple variables and methods.
* It is created using a constructor function.
* Objects are stored in memory and have their own memory addresses.
* Objects can contain multiple variables of different data types.
* Objects can have methods that perform operations on their data.
* Objects are used to represent real-world entities or data structures.

**Here's an example to illustrate the difference:**

```python
# Variable
name = "John"
age = 30

# Object
person = {"name": "John", "age": 30}
```

In this example, the `name` variable is a variable that stores a string, while the `person` object is an object that contains multiple variables and methods.

**Key differences:**

| Feature | Variable | Object |
|---|---|---|
| Data type | Any | Objects |
| Memory address | No | Yes |
| Creation | Using `var` keyword | Using constructor function |
| Scope | Local to function | Global |
| Use case | Storing and retrieving data | Representing real-world entities |
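
The prompt construction and answer extraction used in the generation cell above can be wrapped in two small helpers. This is a minimal sketch: `build_prompt` and `extract_answer` are hypothetical names that do not appear in the notebook, and the stand-in string replaces real model output.

```python
def build_prompt(question, system="You are a skilled software engineer. "):
    """Reproduce the notebook's plain 'Question / Answer' prompt layout."""
    return f"Question: {system}{question} \n Answer: "


def extract_answer(decoded_text):
    """Return the text after the first 'Answer:' marker.

    maxsplit=1 keeps the rest intact even if the model itself
    writes 'Answer:' somewhere in its reply.
    """
    parts = decoded_text.split("Answer:", 1)
    return parts[1].strip() if len(parts) == 2 else decoded_text.strip()


prompt = build_prompt("What is the difference between a variable and an object")
# Stand-in for tokenizer.decode(...): the decoded text echoes the prompt
# followed by the generated continuation.
decoded = prompt + "A variable is a name. Answer: an object is a value."
print(extract_answer(decoded))
```

Compared with `text.split("Answer:")[1]`, limiting the split to the first marker avoids cutting the reply short when the generated text repeats the marker.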

## Load the Supplementary Dataset

In [None]:
!gdown '1HKzCy_vxb8hUzVEWhhgL_pNTIXIvEdw7' --output Dataset_Python_Question_Answer.csv

Downloading...
From: https://drive.google.com/uc?id=1HKzCy_vxb8hUzVEWhhgL_pNTIXIvEdw7
To: /content/Dataset_Python_Question_Answer.csv
  0% 0.00/721k [00:00<?, ?B/s]100% 721k/721k [00:00<00:00, 73.1MB/s]


In [None]:
data = pd.read_csv('Dataset_Python_Question_Answer.csv')
dataset = Dataset.from_pandas(data)
data.head()

| | Question | Answer |
|---|---|---|
| 0 | What is the difference between a variable and... | ["Sure, here's the difference between a variab... |
| 1 | What is the difference between a built-in fun... | ["Sure. Here's the difference between built-in... |
| 2 | What is the difference between the `print` fu... | ["Sure. Here's the difference between the two ... |
| 3 | What is the difference between an expression ... | ["Sure! Here's the difference between an expre... |
| 4 | What is the difference between `True` and `Fa... | ["Sure. Here's the difference between `True` a... |


In [None]:
print(f"The size of the dataset: {data.shape}.")

The size of the dataset: (419, 2).


## Supervised Fine-Tuning of the Model with LoRA (Low-Rank Adaptation)
**Supervised Fine-Tuning (SFT)** adapts a pre-trained model to a **specific task** using labeled data. In full fine-tuning, every weight of the model is updated to improve performance on that task.

- **LoRA (Low-Rank Adaptation)**: A parameter-efficient approach within SFT. The pre-trained weights stay frozen, and each targeted weight matrix receives a trainable update expressed as the product of two low-rank matrices. Far fewer parameters are updated, which reduces computational and storage costs. See the [LoRA paper](https://arxiv.org/abs/2106.09685) for details and this [overview article](https://towardsdatascience.com/understanding-lora-low-rank-adaptation-for-finetuning-large-models-936bce1a07c6/) for a brief survey.
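
To make the low-rank idea concrete, here is a minimal pure-Python sketch of a LoRA-adapted linear map, `h = W x + (alpha / r) * B (A x)`, following the formulation in the LoRA paper. The tiny dimensions, the `alpha` value, and the helper names are illustrative assumptions. PEFT's actual implementation works on tensors and is more involved.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Frozen base output plus the scaled low-rank update B(Ax)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]

random.seed(0)
d_in, d_out, r, alpha = 16, 16, 2, 8  # illustrative sizes, not Gemma's

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable, r x d_in
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, d_out x r, zero init
x = [random.gauss(0, 1) for _ in range(d_in)]

# With B initialised to zero, the adapted layer starts out identical to the base layer.
assert lora_forward(W, A, B, x, alpha, r) == matvec(W, x)

full_params = d_in * d_out
lora_params = r * (d_in + d_out)
print(f"full update: {full_params} params, LoRA update: {lora_params} params")
```

Only `A` and `B` would receive gradients during training. The gap between the two parameter counts widens quickly as the layer dimensions grow while `r` stays small.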

In [None]:
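def formatting_func(example):
    """Render one dataset row as an instruction/response training string."""
    template = "Instruction:\n{instruction}\n\nResponse:\n{response}"
    # 'Answer' holds the string form of a Python list; parse it safely.
    list_from_str = ast.literal_eval(example['Answer'])
    line = template.format(instruction=example['Question'], response='\n '.join(list_from_str))
    return line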
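
To see what the trainer receives, the formatting function can be applied to a single hand-written row. The row below is a made-up example in the same shape as the dataset (an `Answer` column holding the string form of a Python list). It is not taken from the CSV.

```python
import ast

def formatting_func(example):
    template = "Instruction:\n{instruction}\n\nResponse:\n{response}"
    list_from_str = ast.literal_eval(example['Answer'])
    return template.format(instruction=example['Question'],
                           response='\n '.join(list_from_str))

# Hypothetical row, for illustration only.
sample_row = {
    "Question": "What is a list?",
    "Answer": "['A list is an ordered collection.', 'It is mutable.']",
}

formatted = formatting_func(sample_row)
print(formatted)
```

Joining with `'\n '` leaves a leading space on every response line after the first, which may explain the leading spaces in the fine-tuned model's output later in the notebook.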


LoRA Arguments
- **r (int)**: The rank of the low-rank decomposition in LoRA. A higher rank adds trainable parameters and capacity at the cost of memory. The default value is 8.
- **target_modules (List[str])**: The names of the modules that LoRA is applied to. Here these are the attention projections `q_proj`, `k_proj`, `v_proj`, `o_proj` and the MLP projections `gate_proj`, `up_proj`, `down_proj`.
- **task_type (str, optional)**: The type of fine-tuning task. It is set to `"CAUSAL_LM"` in this configuration, so PEFT wraps the model for causal language modeling.

Please see the [LoRA documentation](https://huggingface.co/docs/peft/package_reference/lora) for more details.

In [None]:
print(model)

GemmaForCausalLM(
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 2048, padding_idx=0)
    (layers): ModuleList(
      (0-17): 18 x GemmaDecoderLayer(
        (self_attn): GemmaAttention(
          (q_proj): Linear4bit(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear4bit(in_features=2048, out_features=256, bias=False)
          (v_proj): Linear4bit(in_features=2048, out_features=256, bias=False)
          (o_proj): Linear4bit(in_features=2048, out_features=2048, bias=False)
        )
        (mlp): GemmaMLP(
          (gate_proj): Linear4bit(in_features=2048, out_features=16384, bias=False)
          (up_proj): Linear4bit(in_features=2048, out_features=16384, bias=False)
          (down_proj): Linear4bit(in_features=16384, out_features=2048, bias=False)
          (act_fn): GELUActivation()
        )
        (input_layernorm): GemmaRMSNorm((2048,), eps=1e-06)
        (post_attention_layernorm): GemmaRMSNorm((2048,), eps=1e-06)
      )
    )
    (n

In [None]:
# https://stackoverflow.com/questions/76768226/target-modules-for-applying-peft-lora-on-different-models
def get_specific_layer_names(model):
    # Create a list to store the layer names
    layer_names = []

    # Recursively visit all modules and submodules
    for name, module in model.named_modules():
        # Check if the module is an instance of the specified layers
        if isinstance(module, (torch.nn.Linear, torch.nn.Embedding, torch.nn.Conv2d)):
            # model name parsing

            layer_names.append('.'.join(name.split('.')[4:]).split('.')[0])

    return layer_names

list(set(get_specific_layer_names(model)))


['',
 'up_proj',
 'down_proj',
 'v_proj',
 'k_proj',
 'o_proj',
 'gate_proj',
 'q_proj']
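
The name parsing in `get_specific_layer_names` is easier to follow on plain strings. The sketch below applies the same expression to a few module names in the style returned by `named_modules()`. The exact names are assumptions inferred from the `print(model)` output. It also shows why an empty string appears in the result: names with fewer than five dot-separated parts have nothing left after the first four are dropped.

```python
def short_name(qualified_name):
    """Drop the first four dot-separated parts and keep the next one."""
    return '.'.join(qualified_name.split('.')[4:]).split('.')[0]

# Assumed module names, modelled on the printed architecture.
examples = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.17.mlp.down_proj",
    "model.embed_tokens",
    "lm_head",
]

for name in examples:
    print(f"{name!r} -> {short_name(name)!r}")
```

The empty entry therefore has to be filtered out before the list is passed as `target_modules`, which the hand-written list in the next cell does implicitly.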

In [None]:
lora_config = LoraConfig(
    r = 8,
    target_modules = ["q_proj", "o_proj", "k_proj", "v_proj",
                      "gate_proj", "up_proj", "down_proj"],
    task_type = "CAUSAL_LM",
)
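
The cost of this configuration can be estimated by hand. For a linear layer with `in_features` inputs and `out_features` outputs, a rank-`r` adapter adds `r * (in_features + out_features)` weights. The sketch below applies that formula to the shapes printed by `print(model)`. It is an estimate under the assumption that LoRA is attached to every listed projection in all 18 decoder layers and nowhere else.

```python
# (in_features, out_features) per targeted module, from print(model).
MODULE_SHAPES = {
    "q_proj": (2048, 2048),
    "k_proj": (2048, 256),
    "v_proj": (2048, 256),
    "o_proj": (2048, 2048),
    "gate_proj": (2048, 16384),
    "up_proj": (2048, 16384),
    "down_proj": (16384, 2048),
}
N_LAYERS = 18
RANK = 8

def lora_params(in_features, out_features, rank):
    """A is rank x in_features, B is out_features x rank."""
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in MODULE_SHAPES.values())
total = per_layer * N_LAYERS
print(f"LoRA weights per decoder layer: {per_layer:,}")
print(f"LoRA weights in total: {total:,}")
```

Calling `print_trainable_parameters()` on the PEFT-wrapped model is the usual way to confirm the real number.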

SFTConfig Arguments
- **per_device_train_batch_size (int)**: The batch size per device (GPU). It is set to 1 here to keep memory usage low.
- **gradient_accumulation_steps (int)**: The number of forward/backward passes whose gradients are accumulated before each optimizer update. With 4 accumulation steps and a per-device batch size of 1, the effective batch size is 4.
- **warmup_steps (int)**: The number of steps over which the learning rate ramps up to the specified `learning_rate`. This helps stabilize the start of training. It is set to 2 here.
- **max_steps (int)**: The total number of training (optimizer) steps. Training stops after 50 steps here.
- **learning_rate (float)**: The peak learning rate, set to `2e-4` here, a value commonly used for LoRA fine-tuning.
- **fp16 (bool)**: Whether to use 16-bit (fp16) mixed-precision training, which speeds up computation and lowers memory usage. It is set to `True` here.
- **optim (str)**: The optimizer. `paged_adamw_8bit` is a paged AdamW variant that keeps its optimizer states in 8-bit precision to reduce memory usage.
- **max_seq_length (int)**: The maximum sequence length in tokens. Longer examples are truncated to 512 tokens here, which bounds memory usage per example.

Please see the [SFTConfig documentation](https://huggingface.co/docs/trl/v0.16.0/en/sft_trainer#trl.SFTConfig) for more details.

In [None]:
training_args = SFTConfig(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=50,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit",
        max_seq_length=512,
        report_to="tensorboard",
    )
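
A few quantities implied by these settings can be worked out directly. The sketch assumes a single GPU and the `Trainer` default of a linear learning-rate schedule with warm-up (neither is stated in the notebook). `lr_at` is a hypothetical helper, not a library function.

```python
PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 4
MAX_STEPS = 50
WARMUP_STEPS = 2
PEAK_LR = 2e-4
DATASET_SIZE = 419
NUM_DEVICES = 1  # assumption: one GPU

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
examples_seen = effective_batch * MAX_STEPS
epoch_fraction = examples_seen / DATASET_SIZE

def lr_at(step):
    """Linear warm-up to PEAK_LR, then linear decay to zero (assumed schedule)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS)

print(f"Effective batch size: {effective_batch}")
print(f"Examples seen: {examples_seen} ({epoch_fraction:.0%} of one epoch)")
print(f"LR at steps 0, 2, 26, 50: {[lr_at(s) for s in (0, 2, 26, 50)]}")
```

Under these assumptions the run covers a little under half of the 419 examples once, which is consistent with the throughput that `trainer.train()` reports later (about 1.75 samples per second over roughly 114 seconds).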

In [None]:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    processing_class=tokenizer,
    args=training_args,
    peft_config=lora_config,
    formatting_func=formatting_func,
    # data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False, return_tensors="pt"),
)

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


In [None]:
import os
os.environ["WANDB_DISABLED"] = "true"

In [None]:
trainer.train()

| Step | Training Loss |
|---|---|
| 1 | 0.943 |
| 2 | 0.9336 |
| 3 | 0.8467 |
| 4 | 0.7783 |
| 5 | 0.7901 |
| 6 | 0.8412 |
| 7 | 0.7272 |
| 8 | 0.7193 |
| 9 | 0.8088 |
| 10 | 0.6157 |


TrainOutput(global_step=50, training_loss=0.5645383900403976, metrics={'train_runtime': 114.3031, 'train_samples_per_second': 1.75, 'train_steps_per_second': 0.437, 'total_flos': 935863376732160.0, 'train_loss': 0.5645383900403976})

In [None]:
system = "You are a skilled software engineer. "
question = system + "What is the difference between a variable and an object"

prompt = f"Question: {question} \n Answer: "

inputs = tokenizer(prompt, return_tensors='pt', padding=True, truncation=True, max_length=512).to("cuda")

outputs = model.generate(**inputs, num_return_sequences=1, max_new_tokens=512)

text = tokenizer.decode(outputs[0], skip_special_tokens=True)

# print(text)
Markdown(text.split("Answer:")[1])

 
Sure. Here's the difference between a variable and an object:
 **Variable:**
 * A variable is a named memory location that stores a single value.
 * It is a placeholder for a specific value, which can be changed later.
 * Variables are used to store data that is used in multiple parts of a program.
 * They are declared using the `=` operator, followed by the variable name and an assignment operator.
 * Example: `name = "John"`
 **Object:**
 * An object is a more complex data structure that contains multiple variables and methods.
 * It is an instance of a class, which defines the structure and behavior of the object.
 * Objects are created using the `new` keyword, followed by the class name and the constructor function.
 * Objects can have their own data and methods, which are not defined in the class.
 * They are used to represent real-world entities, such as people, animals, or data.
 * Example:
 ```python
 class Person:
     name = ""
     age = 0
     def __init__(self, name, age):
         self.name = name
         self.age = age
 ```
 In summary, variables are used to store single values, while objects are used to represent complex data structures with multiple variables and methods.

## References
- Paul Mooney and Ashley Chow. Google – AI Assistants for Data Tasks with Gemma. https://kaggle.com/competitions/data-assistants-with-gemma, 2024. Kaggle.
- https://www.kaggle.com/code/shiivvvaam/pygemma-finetuned-rag/notebook
- https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/fine-tune-llms-in-2024-with-trl.ipynb
- https://clay-atlas.com/blog/2024/01/03/supervised-fine-tuning-sft-trainer/
- https://ai.google.dev/gemma/docs/core/huggingface_vision_finetune_qlora?hl=zh-TW