# Google's Gemma Model: Supervised Finetuning on Telugu Dataset
Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.

## Finetuning:
Finetuning refines a pre-trained model so that it performs better on a specific task or dataset. The model's weights are adjusted through additional training on the target data, which lets it learn the nuances of that data. The goal is to improve the model's accuracy and efficiency on the tasks it is finetuned for.

## Unsupervised Finetuning:
Unsupervised Finetuning relies on unstructured textual blocks without explicit question-answer pairs. This approach allows the model to learn from a broader context, understanding the language's syntax, semantics, and usage patterns in a more generalized manner.


## Supervised Finetuning:
Supervised Finetuning uses question-answer pairs to guide the model's learning process. Because each example pairs a prompt with the desired output, the model gets a clearer signal about the context and about what a good response looks like, and it can make more accurate predictions.


## Parameter Efficient Finetuning:
Parameter Efficient Finetuning is a more resource-conscious way to adapt a model. Techniques such as LoRA (Low-Rank Adaptation) freeze the original weights and train only a small set of added low-rank matrices, instead of updating every parameter in the model.
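
To make the savings concrete, here is a minimal sketch that counts trainable parameters for a single weight matrix with and without LoRA. The layer sizes are illustrative assumptions, not Gemma's actual dimensions; the only fact relied on is that LoRA adds two factors of shape `d_out x r` and `r x d_in`.

```python
def full_params(d_out, d_in):
    """Parameters updated when the whole weight matrix is trained."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, r):
    """Parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)


# Illustrative sizes only (assumed, not read from the model config).
d_out, d_in, r = 2048, 2048, 64

full = full_params(d_out, d_in)
lora = lora_trainable_params(d_out, d_in, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

For these assumed sizes the adapter holds a small fraction of the parameters of the full matrix, and the fraction shrinks further as `r` is lowered.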

## Big Shoutout to Unsloth
Unsloth is a library that accelerates finetuning, roughly doubling its speed on several models including Mistral, Llama, and others.

Visit Unsloth:
https://unsloth.ai/



# Installing Libraries

In [None]:
%%capture
import torch
major_version, minor_version = torch.cuda.get_device_capability()
if major_version >= 8:
    # Use this for new GPUs like Ampere, Hopper GPUs (RTX 30xx, RTX 40xx, A100, H100, L40)
    !pip install "unsloth[colab-ampere] @ git+https://github.com/unslothai/unsloth.git"
else:
    # Use this for older GPUs (V100, Tesla T4, RTX 20xx)
    !pip install "unsloth[colab] @ git+https://github.com/unslothai/unsloth.git"
pass

# Loading Model

In [None]:
from unsloth import FastLanguageModel
import torch
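max_seq_length = 2048 # Maximum number of tokens per training sequence.
dtype = None # None for auto detection. Float16 for Tesla T4, V100; Bfloat16 for Ampere+.
load_in_4bit = True # Use 4bit quantization to reduce memory usage.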

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2b-it-bnb-4bit",
    # model_name = "Telugu-LLM-Labs/Telugu-gemma-2b-finetuned-sft",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...",
)




==((====))==  Unsloth: Fast Gemma patching release 2024.3
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.1.0+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. Xformers = 0.0.22.post7. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth





# Loading Dataset

In [None]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    # With batched = True, each column arrives as a list of values.
    instructions = examples["telugu_transliterated_instruction"]
    inputs       = examples["telugu_transliterated_input"]
    outputs      = examples["telugu_transliterated_output"]
    texts = []
    # Use input_text rather than input to avoid shadowing the Python builtin.
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
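
The formatting step can be tried without loading a model. The sketch below is a standalone stand-in, not the notebook's own function: `PROMPT_TEMPLATE` is a shortened template, `EOS` is a hard-coded placeholder for `tokenizer.eos_token`, and the column names and the sample row are made up for illustration.

```python
# Shortened stand-in for the Alpaca-style template used in the notebook.
PROMPT_TEMPLATE = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS = "<eos>"  # Assumption: placeholder for tokenizer.eos_token.


def format_batch(batch):
    """Turn a batch (dict of column lists) into a dict with one 'text' column."""
    texts = [
        PROMPT_TEMPLATE.format(instruction, input_text, output) + EOS
        for instruction, input_text, output in zip(
            batch["instruction"], batch["input"], batch["output"]
        )
    ]
    return {"text": texts}


# Hypothetical one-row batch, shaped the way a batched map call passes data.
demo = {
    "instruction": ["Add the two numbers."],
    "input": ["2 and 3"],
    "output": ["5"],
}
formatted = format_batch(demo)
print(formatted["text"][0])
```

Each formatted string ends with the end-of-sequence marker, which is what teaches the model where a response should stop.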


In [None]:
from datasets import load_dataset
instruct_tune_dataset = load_dataset("Telugu-LLM-Labs/yahma_alpaca_cleaned_telugu_filtered_and_romanized",split = "train")


In [None]:
instruct_tune_dataset = instruct_tune_dataset.shuffle(seed=539)

In [None]:
instruct_tune_dataset[:5]

In [None]:
dataset1 = instruct_tune_dataset.map(formatting_prompts_func, batched = True,)

Map:   0%|          | 0/28910 [00:00<?, ? examples/s]

In [None]:
dataset1['text'][5]

'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nmeeku santoshanni kaliginche 5 vishayala jabitan srishtinchandi.\n\n### Input:\nNone\n\n### Response:\noka AIga, naku bhavalu levu, kabatti nenu sampradaya arthamlo anandanni anubhavinchalenu. aithe, manavulaku santoshanni kaliginche aidhu vishayalu ikkada unnaayi: \n\n1. snehitulu mariyu kutumbanto nanyamaina samayanni gadapadam\n2. vaaru anandinche abhiruchulu mariyu karyakalaapalalo palgonadam\n3. itharulaku sahayam cheyadam mariyu sanukula prabhavanni chupadam\n4. vaaru panichestunna lakshyanni sadhinchadam\n5. prakrithi leda kala yokka andanni anubhavimchadam.<eos>'

# Hyperparameter Selection

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 64, #Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

Unsloth 2024.3 patched 18 layers with 18 QKV layers, 18 O layers and 18 MLP layers.
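
One consequence of the settings above is the factor by which the LoRA update is scaled before it is added to the frozen weights. The sketch below assumes the standard conventions: `lora_alpha / r` for plain LoRA and `lora_alpha / sqrt(r)` for rank-stabilized LoRA (`use_rslora = True`). The helper name `lora_scale` is hypothetical.

```python
import math


def lora_scale(alpha, r, use_rslora=False):
    """Scaling applied to the low-rank update under the standard conventions."""
    if use_rslora:
        return alpha / math.sqrt(r)
    return alpha / r


# Values taken from the get_peft_model call in this notebook.
print(lora_scale(32, 64))                   # plain LoRA
print(lora_scale(32, 64, use_rslora=True))  # rank-stabilized LoRA
```

With `r = 64` and `lora_alpha = 32` the plain LoRA scale is below 1, so raising `r` without also raising `lora_alpha` makes each adapter update weaker.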


In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset1,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Setting packing = True can make training 5x faster for short sequences.
    args = TrainingArguments(
        # per_device_train_batch_size = 10,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 100,
        # save_steps = 100,
        # num_train_epochs = 1, #Number of Epochs
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

Map (num_proc=2):   0%|          | 0/28910 [00:00<?, ? examples/s]
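
It helps to check how much of the dataset these settings actually cover. The arithmetic below is a rough sketch that assumes a single GPU and no packing, so one optimizer step consumes `per_device_train_batch_size * gradient_accumulation_steps` examples. The dataset size is the 28910 rows reported by the map step.

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 100
num_gpus = 1            # Assumption: single-GPU Colab runtime.
dataset_size = 28910    # Rows reported by the dataset map step.

# Examples contributing to each optimizer update.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Total examples seen over the whole run, and the share of one epoch.
examples_seen = effective_batch * max_steps
fraction_of_epoch = examples_seen / dataset_size

print(f"effective batch size: {effective_batch}")
print(f"examples seen: {examples_seen}")
print(f"fraction of one epoch: {fraction_of_epoch:.2%}")
```

Under these assumptions the run is a short demonstration that sees only a small slice of the data; training for full epochs would use `num_train_epochs` instead of `max_steps`.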

# Training

In [None]:
trainer_stats = trainer.train()
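
During this run the learning rate follows the `linear` schedule with 5 warmup steps. The function below is a sketch of the usual linear warmup and decay rule, using the values from the training arguments. The helper name `linear_lr` is hypothetical, and the trainer's exact step indexing may differ by one from this simplified version.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=100):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 1, 5, 50, 100):
    print(step, linear_lr(step))
```

The rate peaks at the end of warmup and reaches zero at the final step, so the last few updates change the adapters very little.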

# Inference

In [None]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "fibonacci series rayadaniki python program ivvu", # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 256, use_cache = True)
response = tokenizer.batch_decode(outputs)

In [None]:
response[0][response[0].find("### Response:"):]

'### Response:\n```python\ndef fibonacci(n):\n    """\n    Fibonacci series rayadaniki python program ivvu\n\n    Args:\n        n: int\n\n    Returns:\n        None\n    """\n\n    # base case\n    if n == 0:\n        return 0\n    elif n == 1:\n        return 1\n\n    # fibonacci series rayadaniki python program ivvu\n    return fibonacci(n-1) + fibonacci(n-2)\n\n\nif __name__ == "__main__":\n    print(fibonacci(10))\n```<eos>'
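
The slice used above keeps the `### Response:` marker and the trailing `<eos>` token, and if the marker is missing, `find` returns -1 and the slice keeps only the last character. A small helper can handle those cases. This is a sketch: `extract_response` is a hypothetical name, and the `<eos>` string is assumed from the output shown above.

```python
def extract_response(decoded, marker="### Response:", eos="<eos>"):
    """Return only the generated answer from a decoded prompt-plus-completion."""
    idx = decoded.find(marker)
    if idx == -1:
        # Marker missing: fall back to the whole text rather than a bad slice.
        return decoded.strip()
    text = decoded[idx + len(marker):]
    end = text.find(eos)
    if end != -1:
        text = text[:end]  # Drop the end-of-sequence token and anything after it.
    return text.strip()


sample = "### Instruction:\nhi\n\n### Response:\nhello<eos>"
print(extract_response(sample))
```

In the notebook this would be called as `extract_response(response[0])`.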

# Inference Using Model Trained for 2 Epochs

In [None]:
model2, tokenizer2 = FastLanguageModel.from_pretrained(
    # model_name = "TeluguHouseCollective/Gemma-2B-Telugu_Instruct_Finetuned",
    model_name = "Telugu-LLM-Labs/Telugu-gemma-2b-finetuned-sft",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...",
)

In [None]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model2) # Enable native 2x faster inference
inputs = tokenizer2(
[
    alpaca_prompt.format(
        "fibonacci series rayadaniki python program ivvu", # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model2.generate(**inputs, max_new_tokens = 256, use_cache = True)
response = tokenizer2.batch_decode(outputs)

In [None]:
response[0][response[0].find("### Response:"):]

# Saving and Pushing to Hugging Face

In [None]:
new_model_name = "Gemma-2B-Telugu_Instruct_Finetuned_March"

In [None]:
# Save the LoRA adapters locally and push them to the Hugging Face Hub
if True: model.save_pretrained(new_model_name) # Local saving
if True: model.push_to_hub_merged("TeluguHouseCollective/"+new_model_name, tokenizer, save_method = "lora", token = " ") # Replace " " with your Hugging Face token

In [None]:
# Merge the adapters into the base model, save the full model in 16bit
# locally, and push it to the Hugging Face Hub
if True: model.save_pretrained_merged(new_model_name, tokenizer, save_method = "merged_16bit",)
if True: model.push_to_hub_merged("TeluguHouseCollective/"+new_model_name, tokenizer, save_method = "merged_16bit", token = " ") # Replace " " with your Hugging Face token