# Finetuning Using Google Gemma's Model

S.Dheenath
dheenathsundararajan@gmail.com

### Install the required dependencies

In [None]:
!pip3 install -q -U bitsandbytes==0.42.0
!pip3 install -q -U peft==0.8.2
!pip3 install -q -U trl==0.7.10
!pip3 install -q -U accelerate==0.27.1
!pip3 install -q -U datasets==2.17.0
!pip3 install -q -U transformers==4.38.0

### Import all the required libraries

In [None]:
import os
import transformers
import torch
from google.colab import userdata
from datasets import load_dataset
from trl import SFTTrainer
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import BitsAndBytesConfig, GemmaTokenizer

### Access the model using HuggingFace token

In [None]:
os.environ["HF_TOKEN"] = userdata.get('HF_TOKEN')

BitsAndBytesConfig loads the google/gemma-2b model with **4-bit quantization**, which reduces memory use so the model can run on limited hardware. The **nf4 quantization** type (4-bit NormalFloat) is designed to limit the accuracy lost at this lower precision, and computations run in **bfloat16** to balance speed and numerical range.

In [None]:
model_id = "google/gemma-2b"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
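To get a feel for why 4-bit loading matters, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone at different precisions. The parameter count is an assumption (about 2.5 billion for google/gemma-2b), and the estimate ignores quantization constants, activations, and any layers that stay in higher precision, so treat the numbers as approximate.

```python
def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: roughly 2.5 billion parameters for google/gemma-2b.
NUM_PARAMS = 2_500_000_000

for label, bits in [("float32", 32), ("bfloat16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

Under these assumptions the 4-bit weights take about a quarter of the space of the bfloat16 weights.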

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_id, token=os.environ['HF_TOKEN'])
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             quantization_config=bnb_config,
                                             device_map={"":0},
                                             token=os.environ['HF_TOKEN'])




Responses

In [None]:
text = "Quote: Imagination is more"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Quote: Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world.

- Albert Einstein

The


In [None]:
text = "Quote: Purpose fuels"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Quote: Purpose fuels the soul
Author: Aung San Suu Kyi
Date: 1991



In [None]:
text = "what is machine learning"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

what is machine learning?
- is a sub-field of computer science that focuses on giving computers the ability to learn and improve without being explicitly programmed.
- is


In [None]:
text = "Quote: I have not failed"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Quote: I have not failed. I've just found 10,000 ways that won't work.


In [None]:
text = "Auther of a roadside stand"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Auther of a roadside stand in the 1930s, he was the first to sell the fruit in the area


In [None]:
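# Note: the value "false" does not disable Weights & Biases reporting; set it to "true" to disable it.
os.environ["WANDB_DISABLED"] = "false"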

LoraConfig sets up **Low-Rank Adaptation (LoRA)** for parameter-efficient fine-tuning of a language model. Setting r = 8 gives the low-rank update matrices a rank of 8, which keeps the number of trainable parameters and the computational cost small. The target_modules list names the projection layers that receive LoRA adapters, and task_type is set to CAUSAL_LM for next-token prediction. This configuration allows adapting large models to specific tasks with fewer resources.

In [None]:
lora_config = LoraConfig(
    r = 8,
    target_modules = ["q_proj", "o_proj", "k_proj", "v_proj",
                      "gate_proj", "up_proj", "down_proj"],
    task_type = "CAUSAL_LM",
)
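The saving from a rank of 8 can be illustrated with a small calculation. LoRA learns an update of the form B @ A, where A has shape (r, d_in) and B has shape (d_out, r), so only r * (d_in + d_out) values are trained per adapted matrix. The 2048 x 2048 shape below is an illustrative assumption, not a value read from the model config.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * d_in + d_out * r


# Illustrative shape only (an assumption, not taken from the model).
d_in, d_out, r = 2048, 2048, 8
full = d_in * d_out
lora = lora_param_count(d_in, d_out, r)

print(f"full matrix : {full:,} parameters")
print(f"LoRA (r={r}) : {lora:,} parameters ({100 * lora / full:.2f}% of full)")
```

For a matrix of this assumed size, the adapter trains well under 1% of the parameters that full fine-tuning would update.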

The Abirate/english_quotes dataset on Hugging Face is a curated collection of **English quotes from Goodreads**, stored in JSON format. Each record has quote, author, and tags fields.

Here the model is fine-tuned on four different datasets, one after another, so that it can respond across a range of scenarios.

In [None]:
from datasets import load_dataset

data = load_dataset("Abirate/english_quotes")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)


In [None]:
data['train']['quote']

['“Be yourself; everyone else is already taken.”',
 "“I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.”",
 "“Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.”",
 '“So many books, so little time.”',
 '“A room without books is like a body without a soul.”',
 "“Be who you are and say what you feel, because those who mind don't matter, and those who matter don't mind.”",
 "“You've gotta dance like there's nobody watching,Love like you'll never be hurt,Sing like there's nobody listening,And live like it's heaven on earth.”",
 "“You know you're in love when you can't fall asleep because reality is finally better than your dreams.”",
 '“You only live once, but if you do it right, once is enough.”',
 '“Be the change that you wish to see in the world.”',
 "“In three words I can sum up everything I

In [None]:
def formatting_func(example):
    text = f"Quote: {example['quote'][0]}\nAuthor: {example['author'][0]}"
    return [text]
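One thing worth checking: in the TRL version pinned above, SFTTrainer appears to call formatting_func on whole batches, where each column is a list of values. If that holds, indexing with [0] keeps only the first row of each batch. The following is a minimal sketch of a batch-aware variant, assuming that calling convention; `formatting_func_batched` is a hypothetical name, and the two rows are hand-made placeholders rather than dataset records.

```python
def formatting_func_batched(examples):
    """Build one "Quote/Author" string per row of a batch.

    `examples` maps column names to lists, e.g.
    {"quote": [...], "author": [...]}.
    """
    return [
        f"Quote: {quote}\nAuthor: {author}"
        for quote, author in zip(examples["quote"], examples["author"])
    ]


# Tiny hand-made batch (hypothetical rows) to show the output shape.
batch = {
    "quote": ["“So many books, so little time.”", "“Be yourself.”"],
    "author": ["Author A", "Author B"],
}
texts = formatting_func_batched(batch)
print(len(texts))
print(texts[0])
```

The function returns one string per input row, so a batch of N rows yields N training texts instead of one.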

In [None]:
data['train']

Dataset({
    features: ['quote', 'author', 'tags', 'input_ids', 'attention_mask'],
    num_rows: 2508
})

In [None]:
trainer = SFTTrainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=100,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit"
    ),
    peft_config=lora_config,
    formatting_func=formatting_func,
)






In [None]:
import torch
torch.cuda.empty_cache()

In [None]:
trainer.train()

Step,Training Loss
1,1.6797
2,0.6299
3,1.0237
4,1.0337
5,0.4207
6,1.2326
7,1.0955
8,0.331
9,0.5588
10,0.4703


TrainOutput(global_step=100, training_loss=0.14299218144267797, metrics={'train_runtime': 59.9209, 'train_samples_per_second': 6.675, 'train_steps_per_second': 1.669, 'total_flos': 55030401331200.0, 'train_loss': 0.14299218144267797, 'epoch': 66.67})

## Testing our custom dataset

In [None]:
text = "Quote: A woman is like a tea bag;"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Quote: A woman is like a tea bag; you can’ unspeakably strong, or you can be weak and trivial.
Author: Eleanor Roosevelt


In [None]:
text = "Quote: Outside of a dog, a book is man's"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Quote: Outside of a dog, a book is man's best friend.
Author: Nicolas Chamfort
Category: Quotes
Quote: The most wasted of


In [None]:
text = "Quote: Be the change that you wish"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Quote: Be the change that you wish to see in the world.
Author: Mahatma Gandhi
Category: Quotes
Quote: The most


In [None]:
text = "Quote: So many books"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Quote: So many books, so little time.
Author: Stephen King
Category: Books
Quote: The most wasted of all days is one without laughter.
Author


The "facebook/kilt_tasks" dataset is a collection of benchmark datasets designed for **Knowledge Intensive Language Tasks (KILT)**. It covers a variety of tasks such as fact-checking, entity linking, question answering, and dialogue.

In [None]:
from datasets import load_dataset

data = load_dataset("facebook/kilt_tasks")
data = data.map(lambda samples: tokenizer(samples["input"]), batched=True)


In [None]:
data['train']['input']

['how i.met your mother who is the mother',
 'who had the most wins in the nfl',
 'who played mantis guardians of the galaxy 2',
 'what channel is the premier league on in france',
 "god's not dead a light in the darkness release date",
 'who is the current president of un general assembly',
 'when do the eclipse supposed to take place',
 'what is the name of the sea surrounding dubai',
 'who holds the nba record for most points in a career',
 'when did the new maze runner movie come out',
 'how many players on a box lacrosse team',
 'when did the nba 3 second rule start',
 'who plays norman bates in the tv show',
 'when did usa start driving on the right',
 'name some components of the central nervous system (cns)',
 "who's the director of the price is right",
 'bridge on the river kwai fact or fiction',
 'who is the prime minister of republic of mauritius',
 'what teams are in the fa cup final',
 'who is the first person who went to moon',
 'when did the first lego movie come out',
 

In [None]:
def formatting_func(example):
    text = f"You: {example['input'][0]}\nLisa: {example['output'][0]}"
    return [text]
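The output column here looks like a list of dictionaries per question (the generated text later in this notebook starts with `[{'answer': ...`), so interpolating it directly puts the raw structure into the training text. Below is a hedged sketch of pulling out a plain answer string instead. The field layout is an assumption inferred from that generated text, `first_answer` and `format_qa_batch` are hypothetical helpers, and the sample row is made up for illustration.

```python
def first_answer(output_field):
    """Return the first non-empty 'answer' string from a list of answer dicts."""
    for item in output_field:
        answer = item.get("answer")
        if answer:
            return answer
    return ""


def format_qa_batch(examples):
    """Build one "You/Lisa" string per row, using plain answer text."""
    return [
        f"You: {question}\nLisa: {first_answer(output)}"
        for question, output in zip(examples["input"], examples["output"])
    ]


# Hypothetical row shaped like the assumed layout (not taken from the dataset).
batch = {
    "input": ["what is 2 + 2"],
    "output": [[{"answer": "", "meta": {}}, {"answer": "4", "meta": {"score": -1}}]],
}
print(format_qa_batch(batch)[0])
```

With this shape of training text, the model would be asked to produce the answer itself rather than a dictionary representation of it.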

In [None]:
data['train']

Dataset({
    features: ['id', 'input', 'meta', 'output', 'input_ids', 'attention_mask'],
    num_rows: 87372
})

In [None]:
trainer = SFTTrainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=100,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit"
    ),
    peft_config=lora_config,
    formatting_func=formatting_func,
)






In [None]:
trainer.train()

Step,Training Loss
1,2.3646
2,2.1273
3,2.4142
4,1.7774
5,1.732
6,1.4315
7,1.7563
8,1.2107
9,1.8581
10,1.2805


TrainOutput(global_step=100, training_loss=0.6893454584479332, metrics={'train_runtime': 211.1264, 'train_samples_per_second': 1.895, 'train_steps_per_second': 0.474, 'total_flos': 1459853178593280.0, 'train_loss': 0.6893454584479332, 'epoch': 4.55})

## Testing our custom dataset

In [None]:
text = "Input: who played mantis guardians of the galaxy 2"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Input: who played mantis guardians of the galaxy 2
Output: [{'answer': 'Michael Peña', 'meta': {'score': -1},


In [None]:
text = "Input: who played mantis guardians of the galaxy 2"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=20)
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
answer = decoded_output.split("'answer': ")[1].split(",")[0].strip("' ")
print(answer)


Michael Peña
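Chained split calls work for this output, but they raise an IndexError when no 'answer' key is generated and they cut off answers that contain a comma. A small regular-expression helper is one possible alternative; `extract_answer` is a hypothetical name, and the sample strings are copied from outputs shown in this notebook.

```python
import re

# Matches 'answer': followed by a single- or double-quoted value.
ANSWER_RE = re.compile(r"""'answer':\s*(['"])(.*?)\1""")


def extract_answer(decoded_output):
    """Return the first quoted 'answer' value, or None if there is none."""
    match = ANSWER_RE.search(decoded_output)
    return match.group(2) if match else None


sample = (
    "Input: who played mantis guardians of the galaxy 2\n"
    "Output: [{'answer': 'Michael Peña', 'meta': {'score': -1},"
)
print(extract_answer(sample))
print(extract_answer("Lisa: [{'answer': 'May 20, 2019', '"))
print(extract_answer("Output: Arabian sea"))
```

The helper returns None rather than failing when the model answers in free text, so the caller can fall back to printing the whole decoded output.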


In [None]:
text = "Input: what is the name of the sea surrounding dubai"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Input: what is the name of the sea surrounding dubai
Output: Arabian sea
Explanation: the Arabian sea is the largest body of water in the world


In [None]:
text = "Input: bridge on the river kwai fact or fiction"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Input: bridge on the river kwai fact or fiction
Answer: fact
Explanation:
The bridge on the river kwai is a fact.



In [None]:
text = "Input: when does avengers infinity war part 2 come out"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Input: when does avengers infinity war part 2 come out
Lisa: [{'answer': 'May 20, 2019', '


In [None]:
text = "Input: when is the last time england won the workd cup"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Input: when is the last time england won the workd cup
Output: [{'answer': '2018', 'meta': {'score': -


The "Amod/mental_health_counseling_conversations" dataset consists of conversations related to mental health counseling.

In [None]:
from datasets import load_dataset

data = load_dataset("Amod/mental_health_counseling_conversations")
data = data.map(lambda samples: tokenizer(samples["Context"]), batched=True)


In [None]:
data['train']['Context']

["I'm going through some things with my feelings and myself. I barely sleep and I do nothing but think about how I'm worthless and how I shouldn't be here.\n   I've never tried or contemplated suicide. I've always wanted to fix my issues, but I never get around to it.\n   How can I change my feeling of being worthless to everyone?",
 "I'm going through some things with my feelings and myself. I barely sleep and I do nothing but think about how I'm worthless and how I shouldn't be here.\n   I've never tried or contemplated suicide. I've always wanted to fix my issues, but I never get around to it.\n   How can I change my feeling of being worthless to everyone?",
 "I'm going through some things with my feelings and myself. I barely sleep and I do nothing but think about how I'm worthless and how I shouldn't be here.\n   I've never tried or contemplated suicide. I've always wanted to fix my issues, but I never get around to it.\n   How can I change my feeling of being worthless to everyon

In [None]:
def formatting_func(example):
    text = f"You: {example['Context'][0]}\nLisa: {example['Response'][0]}"
    return [text]

In [None]:
data['train']

Dataset({
    features: ['Context', 'Response', 'input_ids', 'attention_mask'],
    num_rows: 3512
})

In [None]:
trainer = SFTTrainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=2,
        max_steps=100,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit"
    ),
    peft_config=lora_config,
    formatting_func=formatting_func,
)






In [None]:
trainer.train()

Step,Training Loss
1,2.6608
2,2.6608
3,2.6392
4,2.588
5,2.5301
6,2.4657
7,2.3984
8,2.3332
9,2.2687
10,2.1989


TrainOutput(global_step=100, training_loss=0.49377331798430535, metrics={'train_runtime': 200.7159, 'train_samples_per_second': 1.993, 'train_steps_per_second': 0.498, 'total_flos': 1302565416960000.0, 'train_loss': 0.49377331798430535, 'epoch': 100.0})

## Testing our custom dataset

In [None]:
text = "Context: I have so many issues to address. I have a history of sexual abuse, I’m a breast cancer survivor"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Context: I have so many issues to address. I have a history of sexual abuse, I’m a breast cancer survivor and I have a long history of depression and low self esteem. I’ve had counseling about all of this but I still have a lot of issues to address. I’ve had a good relationship with God but I have a long way to go before I fully trust Him.
    I have a long history of depression. I’ve been on several different antidepressants over the years. I’ve had counseling about my depression but I still have a lot of issues to address. I have a


In [None]:
text = "Context: Maybe this is a stupid question, but I sometimes don't know what's real or not. If feel at times like everyone's lying. How do I know if God is one of those lies?"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Context: Maybe this is a stupid question, but I sometimes don't know what's real or not. If feel at times like everyone's lying. How do I know if God is one of those lies?
Lisa: It is very normal to not feel real when everyone else is fine. You can read more about this here: http:// chrétiens.about.com/od/socialissues/f/social_notreal.htm And to feel sure that God is real, you can spend time reading and studying the Bible. The more of the Word of God you read and absorb, the more solid your faith will be. You can also join a small group of Christians where you can be honest about


In [None]:
text = "Context: I have been diagnosed with posttraumatic stress disorder due to my military experiences. Not a year ago, I had a car accident. Could this experience add more problems?"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Context: I have been diagnosed with posttraumatic stress disorder due to my military experiences. Not a year ago, I had a car accident. Could this experience add more problems?
Lisa: It is possible that the accident can trigger or bring up the PTSD symptoms. If you are concerned that there is a connection between the two experiences and that you will have a trigger or have more PTSD symptoms, then you may want to see a therapist about your symptoms and relationship between the two experiences. In the meantime, you may want to consider seeing a therapist about your symptoms. Often times, people have secondary or new PTSD symptoms when they have a new trigger or stressor. You may


The "PawanKrd/math-gpt-4o-200k" dataset is intended to improve how well machine learning models solve **mathematical problems**. It comprises roughly 200,000 mathematical problems and solutions (200,035 rows in the train split) spanning domains such as algebra, geometry, calculus, and statistics.

In [None]:
from datasets import load_dataset

data = load_dataset("PawanKrd/math-gpt-4o-200k")
data = data.map(lambda samples: tokenizer(samples["prompt"]), batched=True)


In [None]:
data['train']['prompt']

['Jungkook is the 5th place. Find the number of people who crossed the finish line faster than Jungkook.',
 'A number divided by 10 is 6. Yoongi got the result by subtracting 15 from a certain number. What is the result he got?',
 'Dongju selects a piece of paper with a number written on it, and wants to make a three-digit number by placing the first selected number in the hundreds place, the second selected in the tens place, and the third selected in the units place. If the numbers written on each paper was 1, 6, and 8, respectively, find the sum of the second smallest and third smallest three-digit numbers that Dongju can make. However, you cannot select the same numbered paper multiple times.',
 'You wanted to subtract 46 from a number, but you accidentally subtract 59 and get 43. How much do you get from the correct calculation?',
 'The length of one span of Jinseo is about 12 centimeters (cm). When Jinseo measured the length of the shorter side of the bookshelf, it was about two 

In [None]:
def formatting_func(example):
    text = f"You: {example['prompt'][0]}\nLisa: {example['response'][0]}"
    return [text]

In [None]:
data['train']

Dataset({
    features: ['prompt', 'response', 'input_ids', 'attention_mask'],
    num_rows: 200035
})

Because this dataset contains over 200k rows, the run failed on the GPU with the previous settings, so we reduce the number of steps (max_steps=10) and increase the gradient accumulation (gradient_accumulation_steps=8).

In [None]:
trainer = SFTTrainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        warmup_steps=2,
        max_steps=10,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit"
    ),
    peft_config=lora_config,
    formatting_func=formatting_func,
)
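To see what these settings mean in practice, the sketch below compares the earlier runs with this one. Gradient accumulation changes how many samples contribute to each optimizer update, not how many are held on the GPU at once. `samples_seen` is a hypothetical helper, and a single GPU is assumed.

```python
def samples_seen(per_device_batch_size, grad_accum_steps, max_steps, num_devices=1):
    """Return (effective batch size, total samples processed)."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    return effective_batch, effective_batch * max_steps


# Settings taken from the TrainingArguments used in this notebook.
runs = [("earlier runs", 4, 100), ("math run", 8, 10)]
for name, accum, steps in runs:
    effective, total = samples_seen(1, accum, steps)
    print(f"{name}: effective batch {effective}, samples processed {total}")
```

With max_steps=10 the math run processes far fewer samples than the earlier runs, so it is better read as a smoke test than as a full fine-tune.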

In [None]:
import torch
torch.cuda.empty_cache()

In [None]:
trainer.train()

Step,Training Loss
1,0.9613
2,0.9157
3,0.8928
4,0.8673
5,0.9182
6,0.8915
7,0.9298
8,0.9877


In [None]:
text = "prompt: A number divided by 10 is 6. Yoongi got the result by subtracting 15 from a certain number. What is the result he got?"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training a model on a variety of datasets enables the creation of specialized chatbots tailored to specific use cases. By selecting and fine-tuning on domain-specific data, such as customer service interactions or medical conversations, the model can handle relevant queries more accurately. This process involves cleaning and preprocessing data, **fine-tuning with frameworks like PEFT or TRL**, and continuously improving response quality through user feedback. Reducing memory use with techniques like **4-bit quantization** allows deployment on more modest hardware. Integrating the chatbot with existing systems and updating it regularly helps it adapt to evolving user needs and keep its answers accurate and contextually relevant.

We can enhance the user experience by integrating **Streamlit or Chainlit** to create an interactive chatbot interface that runs in a local browser. This setup lets users interact with the chatbot in real time. Either framework provides a straightforward way to deploy and test the chatbot locally, which makes it easy to demonstrate its capabilities and gather user feedback. That feedback can then be used to fine-tune the model on new data so that it stays accurate and relevant.