# Finetuning Large Language Models for Emoji Generation

In [23]:
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7

In [2]:
!huggingface-cli login


    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: read).
The token `Research` has been saved to /root/.cache/huggingface/stored_tokens
Your token has been saved to /root/.cache/huggingface/token
Login successful.
The current active token is: `Research`


In [3]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

## Task 1 - Custom instruction dataset

In [4]:
data = [('It’s a beautiful sunny day!', '☀️🌻😎'),
('Have a great workout!', '💪🏋️\u200d♂️🏃\u200d♀️'),
('Let’s grab a coffee.', '☕👥🤝'),
('I’m listening to music.', '🎧🎶🎵'),
('I’m listening to music.', '🎧🎶🎵'),
('Happy New Year!', '🎆🥂🎊'),
('Happy New Year!', '🎆🥂🎊'),
('Happy Birthday!', '🎉🎂🎈'),
('Congratulations on your graduation!', '👨\u200d🎓🎓🥳'),
('Feeling under the weather.', '🤒🛌🌧️'),
('Happy Birthday!', '🎉🎂🎈'),
('Good luck on your interview!', '🍀👔💼'),
('Let’s grab a coffee.', '☕👥🤝'),
('Time for a movie night.', '🍿🎬📺'),
('I love reading books.', '📖❤️👓'),
('Time for a movie night.', '🍿🎬📺'),
('It’s a beautiful sunny day!', '☀️🌻😎'),
('Merry Christmas!', '🎄🎁❄️'),
('It’s a beautiful sunny day!', '☀️🌻😎'),
('Studying for exams.', '📚✍️🤓'),
('Let’s go dancing tonight.', '💃🕺🎶'),
('I need a day to relax.', '🛀🧖\u200d♀️😌'),
('Have a great workout!', '💪🏋️\u200d♂️🏃\u200d♀️'),
('Ready for the football game.', '🏈📣🏟️'),
('Feeling under the weather.', '🤒🛌🌧️'),
('Let’s go dancing tonight.', '💃🕺🎶'),
('Have a great workout!', '💪🏋️\u200d♂️🏃\u200d♀️'),
('Enjoy your vacation!', '✈️🏖️🍹'),
('Happy New Year!', '🎆🥂🎊'),
('Congratulations on your graduation!', '👨\u200d🎓🎓🥳'),
('Let’s go dancing tonight.', '💃🕺🎶'),
('Let’s go dancing tonight.', '💃🕺🎶'),
('Enjoy your vacation!', '✈️🏖️🍹'),
('Let’s go dancing tonight.', '💃🕺🎶'),
('Merry Christmas!', '🎄🎁❄️'),
('Going on a road trip.', '🚗🗺️🛣️'),
('I’m listening to music.', '🎧🎶🎵'),
('Enjoy your vacation!', '✈️🏖️🍹'),
('Cooking a delicious meal.', '👩\u200d🍳🥘🍴'),
('I need a day to relax.', '🛀🧖\u200d♀️😌'),
('Feeling under the weather.', '🤒🛌🌧️'),
('Let’s grab a coffee.', '☕👥🤝'),
('Working on my garden.', '👩\u200d🌾🌿🌷'),
('Feeling under the weather.', '🤒🛌🌧️'),
('Studying for exams.', '📚✍️🤓'),
('Ready for the football game.', '🏈📣🏟️'),
('Let’s go dancing tonight.', '💃🕺🎶'),
('Feeling under the weather.', '🤒🛌🌧️'),
('Let’s grab a coffee.', '☕👥🤝'),
('Let’s grab a coffee.', '☕👥🤝'),
('Cooking a delicious meal.', '👩\u200d🍳🥘🍴'),
('Going on a road trip.', '🚗🗺️🛣️'),
('I love reading books.', '📖❤️👓'),
('Merry Christmas!', '🎄🎁❄️'),
('Studying for exams.', '📚✍️🤓'),
('Let’s grab a coffee.', '☕👥🤝'),
('It’s a beautiful sunny day!', '☀️🌻😎'),
('Cooking a delicious meal.', '👩\u200d🍳🥘🍴'),
('I’m listening to music.', '🎧🎶🎵'),
('Have a great workout!', '💪🏋️\u200d♂️🏃\u200d♀️'),
('Feeling under the weather.', '🤒🛌🌧️'),
('It’s a beautiful sunny day!', '☀️🌻😎'),
('Happy Birthday!', '🎉🎂🎈'),
('Happy New Year!', '🎆🥂🎊'),
('Cooking a delicious meal.', '👩\u200d🍳🥘🍴'),
('I’m listening to music.', '🎧🎶🎵'),
('Have a great workout!', '💪🏋️\u200d♂️🏃\u200d♀️'),
('Enjoy your vacation!', '✈️🏖️🍹'),
('Cooking a delicious meal.', '👩\u200d🍳🥘🍴'),
('Ready for the football game.', '🏈📣🏟️'),
('Time for a movie night.', '🍿🎬📺'),
('I need a day to relax.', '🛀🧖\u200d♀️😌'),
('Congratulations on your graduation!', '👨\u200d🎓🎓🥳'),
('Happy New Year!', '🎆🥂🎊'),
('Let’s grab a coffee.', '☕👥🤝'),
('Happy New Year!', '🎆🥂🎊'),
('I love reading books.', '📖❤️👓'),
('Time for a movie night.', '🍿🎬📺'),
('Good luck on your interview!', '🍀👔💼'),
('Congratulations on your graduation!', '👨\u200d🎓🎓🥳'),
('I need a day to relax.', '🛀🧖\u200d♀️😌'),
('I need a day to relax.', '🛀🧖\u200d♀️😌'),
('Congratulations on your graduation!', '👨\u200d🎓🎓🥳'),
('I need a day to relax.', '🛀🧖\u200d♀️😌'),
('Going on a road trip.', '🚗🗺️🛣️'),
('Happy Birthday!', '🎉🎂🎈'),
('Enjoy your vacation!', '✈️🏖️🍹'),
('Happy Birthday!', '🎉🎂🎈'),
('Cooking a delicious meal.', '👩\u200d🍳🥘🍴'),
('Happy Birthday!', '🎉🎂🎈'),
('Good luck on your interview!', '🍀👔💼'),
('Ready for the football game.', '🏈📣🏟️'),
('I’m listening to music.', '🎧🎶🎵'),
('Cooking a delicious meal.', '👩\u200d🍳🥘🍴'),
('Let’s go dancing tonight.', '💃🕺🎶'),
('Let’s grab a coffee.', '☕👥🤝'),
('I love reading books.', '📖❤️👓'),
('I love reading books.', '📖❤️👓'),
('It’s a beautiful sunny day!', '☀️🌻😎'),
('Studying for exams.', '📚✍️🤓'),
('Enjoy your vacation!', '✈️🏖️🍹'),
('It’s a beautiful sunny day!', '☀️🌻😎'),
('I’m listening to music.', '🎧🎶🎵'),
('I love reading books.', '📖❤️👓'),
('Merry Christmas!', '🎄🎁❄️'),
('Going on a road trip.', '🚗🗺️🛣️'),
('Have a great workout!', '💪🏋️\u200d♂️🏃\u200d♀️'),
('Time for a movie night.', '🍿🎬📺'),
('Merry Christmas!', '🎄🎁❄️'),
('I’m listening to music.', '🎧🎶🎵'),
('Feeling under the weather.', '🤒🛌🌧️'),
('Ready for the football game.', '🏈📣🏟️'),
('I love reading books.', '📖❤️👓'),
('Let’s grab a coffee.', '☕👥🤝'),
('Ready for the football game.', '🏈📣🏟️'),
('I need a day to relax.', '🛀🧖\u200d♀️😌'),
('It’s a beautiful sunny day!', '☀️🌻😎'),
('Going on a road trip.', '🚗🗺️🛣️'),
('It’s a beautiful sunny day!', '☀️🌻😎'),
('Cooking a delicious meal.', '👩\u200d🍳🥘🍴'),
('Time for a movie night.', '🍿🎬📺'),
('I need a day to relax.', '🛀🧖\u200d♀️😌'),
('Good luck on your interview!', '🍀👔💼'),
('I need a day to relax.', '🛀🧖\u200d♀️😌'),
('Cooking a delicious meal.', '👩\u200d🍳🥘🍴'),
('Studying for exams.', '📚✍️🤓'),
('Studying for exams.', '📚✍️🤓'),
('Going on a road trip.', '🚗🗺️🛣️'),
('Happy Birthday!', '🎉🎂🎈'),
('Have a great workout!', '💪🏋️\u200d♂️🏃\u200d♀️'),
('I love reading books.', '📖❤️👓'),
('Happy Birthday!', '🎉🎂🎈'),
('Studying for exams.', '📚✍️🤓'),
('Working on my garden.', '👩\u200d🌾🌿🌷'),
('Happy Birthday!', '🎉🎂🎈'),
('Congratulations on your graduation!', '👨\u200d🎓🎓🥳'),
('Going on a road trip.', '🚗🗺️🛣️'),
('Working on my garden.', '👩\u200d🌾🌿🌷'),
('Good luck on your interview!', '🍀👔💼'),
('Good luck on your interview!', '🍀👔💼'),
('Let’s go dancing tonight.', '💃🕺🎶'),
('Ready for the football game.', '🏈📣🏟️'),
('Have a great workout!', '💪🏋️\u200d♂️🏃\u200d♀️'),
('Working on my garden.', '👩\u200d🌾🌿🌷'),
('Working on my garden.', '👩\u200d🌾🌿🌷'),
('Enjoy your vacation!', '✈️🏖️🍹'),
('Happy Birthday!', '🎉🎂🎈'),
('Congratulations on your graduation!', '👨\u200d🎓🎓🥳'),
('Have a great workout!', '💪🏋️\u200d♂️🏃\u200d♀️'),
('Studying for exams.', '📚✍️🤓'),
('Enjoy your vacation!', '✈️🏖️🍹'),
('It’s a beautiful sunny day!', '☀️🌻😎'),
('Going on a road trip.', '🚗🗺️🛣️'),
('Merry Christmas!', '🎄🎁❄️'),
('I love reading books.', '📖❤️👓'),
('Congratulations on your graduation!', '👨\u200d🎓🎓🥳'),
('I need a day to relax.', '🛀🧖\u200d♀️😌'),
('Enjoy your vacation!', '✈️🏖️🍹'),
('Let’s go dancing tonight.', '💃🕺🎶'),
('Good luck on your interview!', '🍀👔💼'),
('Working on my garden.', '👩\u200d🌾🌿🌷'),
('Merry Christmas!', '🎄🎁❄️'),
('Ready for the football game.', '🏈📣🏟️'),
('Happy New Year!', '🎆🥂🎊'),
('Cooking a delicious meal.', '👩\u200d🍳🥘🍴'),
('Going on a road trip.', '🚗🗺️🛣️'),
('Good luck on your interview!', '🍀👔💼'),
('Congratulations on your graduation!', '👨\u200d🎓🎓🥳'),
('Merry Christmas!', '🎄🎁❄️'),
('Let’s go dancing tonight.', '💃🕺🎶'),
('Ready for the football game.', '🏈📣🏟️'),
('Good luck on your interview!', '🍀👔💼'),
('Feeling under the weather.', '🤒🛌🌧️'),
('Working on my garden.', '👩\u200d🌾🌿🌷'),
('I’m listening to music.', '🎧🎶🎵'),
('Working on my garden.', '👩\u200d🌾🌿🌷'),
('Working on my garden.', '👩\u200d🌾🌿🌷'),
('Going on a road trip.', '🚗🗺️🛣️'),
('Studying for exams.', '📚✍️🤓'),
('Time for a movie night.', '🍿🎬📺'),
('Let’s grab a coffee.', '☕👥🤝'),
('Happy New Year!', '🎆🥂🎊'),
('Happy New Year!', '🎆🥂🎊'),
('Good luck on your interview!', '🍀👔💼'),
('Feeling under the weather.', '🤒🛌🌧️'),
('Happy New Year!', '🎆🥂🎊'),
('Time for a movie night.', '🍿🎬📺'),
('Merry Christmas!', '🎄🎁❄️'),
('I’m listening to music.', '🎧🎶🎵'),
('Time for a movie night.', '🍿🎬📺'),
('Working on my garden.', '👩\u200d🌾🌿🌷'),
('Ready for the football game.', '🏈📣🏟️'),
('Feeling under the weather.', '🤒🛌🌧️'),
('Studying for exams.', '📚✍️🤓'),
('Time for a movie night.', '🍿🎬📺'),
('Enjoy your vacation!', '✈️🏖️🍹'),
('Have a great workout!', '💪🏋️\u200d♂️🏃\u200d♀️'),
('I love reading books.', '📖❤️👓'),
('Merry Christmas!', '🎄🎁❄️'),
('Congratulations on your graduation!', '👨\u200d🎓🎓🥳')]


In [29]:
from datasets import Dataset

dataset = Dataset.from_dict({"text": [f"<s>[INST] {prompt} [/INST] {response}" for prompt, response in data]})
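The `data` list repeats the same prompt-response pairs many times, so a random row-level split would put identical examples on both sides. The sketch below shows one way to hold out whole prompts instead. The `split_by_prompt` helper, its parameters, and the shortened `sample` list are illustrative assumptions; the notebook itself trains on all rows and does not hold anything out.

```python
import random
from collections import Counter

def split_by_prompt(pairs, held_out_fraction=0.2, seed=0):
    """Split (prompt, response) pairs so that no prompt lands in both sets."""
    prompts = sorted({prompt for prompt, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(prompts)
    n_held_out = max(1, round(len(prompts) * held_out_fraction))
    held_out_prompts = set(prompts[:n_held_out])
    train = [pair for pair in pairs if pair[0] not in held_out_prompts]
    held_out = [pair for pair in pairs if pair[0] in held_out_prompts]
    return train, held_out

# Small stand-in for the notebook's `data` list (same shape, fewer rows).
sample = [
    ("Happy Birthday!", "🎉🎂🎈"),
    ("Happy Birthday!", "🎉🎂🎈"),
    ("Merry Christmas!", "🎄🎁❄️"),
    ("Going on a road trip.", "🚗🗺️🛣️"),
    ("Going on a road trip.", "🚗🗺️🛣️"),
    ("Studying for exams.", "📚✍️🤓"),
    ("Time for a movie night.", "🍿🎬📺"),
]

counts = Counter(prompt for prompt, _ in sample)
train_pairs, held_out_pairs = split_by_prompt(sample)
print(f"{len(sample)} rows, {len(counts)} unique prompts")
print(f"train rows: {len(train_pairs)}, held-out rows: {len(held_out_pairs)}")
```

Passing the full `data` list instead of `sample` would give a held-out set of prompts the model never saw during training, which is what an evaluation of generalization needs.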

## Task 2 - Finetune Llama2 and Show the results

In [6]:
# The model that you want to train from the Hugging Face hub
model_name = "NousResearch/Llama-2-7b-chat-hf"

# Instruction dataset name from the original template
# (unused below: training uses the custom `dataset` built in Task 1)
dataset_name = "mlabonne/guanaco-llama2-1k"

# Fine-tuned model name
new_model = "llama-2-7b-miniguanaco"

################################################################################
# QLoRA parameters
################################################################################

# LoRA attention dimension
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

################################################################################
# bitsandbytes parameters
################################################################################

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

################################################################################
# TrainingArguments parameters
################################################################################

# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"

# Number of training epochs
num_train_epochs = 1

# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False

# Batch size per GPU for training
per_device_train_batch_size = 4

# Batch size per GPU for evaluation
per_device_eval_batch_size = 4

# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1

# Enable gradient checkpointing
gradient_checkpointing = True

# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3

# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4

# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001

# Optimizer to use
optim = "paged_adamw_32bit"

# Learning rate schedule
lr_scheduler_type = "cosine"

# Number of training steps (overrides num_train_epochs)
max_steps = -1

# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03

# Group sequences into batches with same length
# Saves memory and speeds up training considerably
group_by_length = True

# Save checkpoint every X update steps
save_steps = 0

# Log every X update steps
logging_steps = 25

################################################################################
# SFT parameters
################################################################################

# Maximum sequence length to use
max_seq_length = None

# Pack multiple short examples in the same input sequence to increase efficiency
packing = False

# Load the entire model on GPU 0
device_map = {"": 0}
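A few quantities follow directly from this configuration and are worth working out before training. The sketch below is a back-of-the-envelope calculation, not something the notebook runs. It assumes a single GPU, 200 training rows (the length of `data`), warmup steps rounded up from `warmup_ratio`, and LoRA adapters on the `q_proj` and `v_proj` matrices only (the target modules are not set explicitly in the notebook, so this is an assumption about the library default).

```python
import math

# Values copied from the configuration cell; 200 is the number of rows in `data`.
num_examples = 200
per_device_train_batch_size = 4
gradient_accumulation_steps = 1
warmup_ratio = 0.03
lora_r, lora_alpha = 64, 16

def schedule_summary(num_train_epochs):
    """Return (steps per epoch, total optimizer steps, warmup steps)."""
    effective_batch = per_device_train_batch_size * gradient_accumulation_steps
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * num_train_epochs
    # Assumption: warmup steps are the ratio times total steps, rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps

# LoRA scales its low-rank update by alpha / r.
lora_scale = lora_alpha / lora_r

# Assumption: adapters on q_proj and v_proj (4096 x 4096) in each of 32 layers.
hidden_size, num_layers, adapted_per_layer = 4096, 32, 2
lora_params = num_layers * adapted_per_layer * lora_r * (hidden_size + hidden_size)

print("1 epoch :", schedule_summary(1))
print("5 epochs:", schedule_summary(5))
print("LoRA scale:", lora_scale, "trainable LoRA parameters:", lora_params)
```

With `logging_steps = 25`, a 50-step run logs twice and a 250-step run logs ten times, which is consistent with the training output shown in this notebook.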

In [27]:
# Load tokenizer and model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False
model.config.pretraining_tp = 1

# Load LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right" # Fix weird overflow issue with fp16 training




In [30]:
# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard"
)

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

# Train model
trainer.train()

# Save trained model and tokenizer
trainer.model.save_pretrained(new_model)



| Step | Training Loss |
|------|---------------|
| 25 | 2.7061 |
| 50 | 1.111 |


In [41]:
import re

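def extract_emojis(text):
    # Keeps only characters from the four blocks below. Emoji outside them
    # (e.g. ☕ U+2615, 🤝 U+1F91D), zero-width joiners, and variation selectors
    # are dropped, so printed sequences can be shorter than what the model generated.
    emoji_pattern = re.compile(
        "["
        "\U0001F600-\U0001F64F"  # emoticons
        "\U0001F300-\U0001F5FF"  # symbols & pictographs
        "\U0001F680-\U0001F6FF"  # transport & map symbols
        "\U0001F1E0-\U0001F1FF"  # regional indicators (flags)
        "]+"
    )
    return "".join(emoji_pattern.findall(text))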
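The pattern in `extract_emojis` covers four Unicode blocks, which leaves out several characters that appear in the training targets, such as ☕ (U+2615), 🤝 (U+1F91D), and the zero-width joiner used in sequences like 👩‍🍳. A wider pattern is sketched below. The names `extract_emojis_broad` and `generated_part` are hypothetical, the extra ranges are an assumption about what is worth keeping, and the result is still not a complete definition of "emoji". The outputs reported in this notebook were produced with the original, narrower function.

```python
import re

# Assumed extension of the notebook's pattern: the original four blocks plus
# supplemental symbols, miscellaneous symbols/dingbats, the zero-width joiner,
# and variation selector-16.
BROAD_EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # regional indicators (flags)
    "\U0001F900-\U0001F9FF"  # supplemental symbols & pictographs
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\u200d"                 # zero-width joiner
    "\ufe0f"                 # variation selector-16
    "]+"
)

def extract_emojis_broad(text):
    """Concatenate every run of characters matched by the broader pattern."""
    return "".join(BROAD_EMOJI_PATTERN.findall(text))

def generated_part(full_text, marker="[/INST]"):
    """Return only what follows the first instruction-closing marker."""
    return full_text.split(marker, 1)[-1]

example = "<s>[INST] Let’s grab a coffee. [/INST] ☕👥🤝"
print(extract_emojis_broad(generated_part(example)))
```

Splitting on `[/INST]` first keeps any emoji typed in the prompt itself out of the extracted result.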

In [64]:
logging.set_verbosity(logging.CRITICAL)
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=28)

# Known prompts
prompts = [
    'Going on a road trip.',
    "Have a great workout!",
    "Let’s grab a coffee.",
    "I’m listening to music.",
    "I need a day to relax."
]

# Generate emoji sequences for the prompts
for prompt in prompts:
    result = pipe(f"<s>[INST] {prompt} [/INST]")
    emojis = extract_emojis(result[0]['generated_text'])
    print(f"Prompt: {prompt}")
    print(f"Generated Emojis: {emojis}\n")

Prompt: Going on a road trip.
Generated Emojis: 🚗🗺

Prompt: Have a great workout!
Generated Emojis: 🏋

Prompt: Let’s grab a coffee.
Generated Emojis: 👉🏼🍵

Prompt: I’m listening to music.
Generated Emojis: 🎧🎶🎵

Prompt: I need a day to relax.
Generated Emojis: 🛀



The model's output on the known prompts is a mix of accurate and incomplete results. For "Going on a road trip." (🚗🗺), the emojis correctly convey traveling and navigating. For "Have a great workout!" (🏋), the model generated a single emoji that is relevant but would have been stronger alongside "💪" or "🏃‍♀️." For "Let’s grab a coffee." (👉🏼🍵), the "👉🏼" emoji is a poor fit, and "☕" or "👥" would better convey meeting for coffee. For "I’m listening to music." (🎧🎶🎵), the output matches the training target exactly. Lastly, for "I need a day to relax." (🛀), the output is reasonable, but other relaxation-related emojis such as 🧘‍♂️ or 🛋 would have added context. Overall, the model was accurate on some prompts but inconsistent on others, often producing fewer emojis than needed to capture the full idea.
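One possible contributor to the short outputs is the generation budget. The pipeline is created with `max_length=28`, and `max_length` counts the prompt tokens as well as the reply, so longer prompts leave fewer tokens for emojis. A sketch of an alternative call is shown below, using the `max_new_tokens` and `return_full_text` arguments of the text-generation pipeline. The `generate_reply` wrapper and the `fake_pipe` stand-in are hypothetical and exist only so the example runs without a model; the value 16 is an arbitrary assumption.

```python
def generate_reply(pipe, prompt, max_new_tokens=16):
    """Generate only the reply to one prompt.

    `pipe` is expected to behave like a transformers text-generation pipeline.
    """
    outputs = pipe(
        f"<s>[INST] {prompt} [/INST]",
        max_new_tokens=max_new_tokens,  # budget for the reply alone
        return_full_text=False,         # do not echo the prompt back
    )
    return outputs[0]["generated_text"].strip()

# Stand-in pipeline so the sketch runs without loading a model (hypothetical).
def fake_pipe(text, max_new_tokens, return_full_text):
    return [{"generated_text": " 🚗🗺️"}]

print(generate_reply(fake_pipe, "Going on a road trip."))
```

With a real pipeline, this would be called as `generate_reply(pipe, prompt)` on a pipeline built without the `max_length` argument, so that the two length limits do not conflict.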

In [67]:
novel_prompts = [
    "Running at the beach",
    "I just signed a new job offer!",
    "Watching football with my friends.",
    "Go Irish!",
    "Going to win this race."
]

for prompt in novel_prompts:
    result = pipe(f"<s>[INST] {prompt} [/INST]")
    emojis = extract_emojis(result[0]['generated_text'])
    print(f"Prompt: {prompt}")
    print(f"Generated Emojis: {emojis}\n")


Prompt: Running at the beach
Generated Emojis: 🏃🌊🌴

Prompt: I just signed a new job offer!
Generated Emojis: 💼👥💰

Prompt: Watching football with my friends.
Generated Emojis: 🏈👥🍔

Prompt: Go Irish!
Generated Emojis: 🍀🍀🍀🍀

Prompt: Going to win this race.
Generated Emojis: 🏎💨🏁



The model generalizes beyond the training data reasonably well, with some room for improvement. For "Running at the beach" (🏃🌊🌴), it generated relevant emojis for running and the beach. For "I just signed a new job offer!" (💼👥💰), it produced work- and money-related emojis, though a "✍️" emoji for signing would have added context. For "Watching football with my friends." (🏈👥🍔), the output is reasonable, though "🎉" or "🍻" would better reflect a fun social atmosphere. "Go Irish!" (🍀🍀🍀🍀) correctly maps to the shamrock emoji, but repeating it four times is excessive. Lastly, "Going to win this race." (🏎💨🏁) yields emojis that clearly represent racing. Overall, the model generalized well to novel prompts, though some outputs would have benefited from more variety or context.

In [77]:
edge_case_prompts = [
    "Its night in the daytime.",
    "I am crying of joy.",
    "Excited and scared for this new job.",
    "It's warm and snowing.",
    "I hate her and I love her."
]

for prompt in edge_case_prompts:
    result = pipe(f"<s>[INST] {prompt} [/INST]")
    emojis = extract_emojis(result[0]['generated_text'])
    print(f"Prompt: {prompt}")
    print(f"Generated Emojis: {emojis}\n")

Prompt: Its night in the daytime.
Generated Emojis: 🌃🕰

Prompt: I am crying of joy.
Generated Emojis: 😭🎉💕

Prompt: Excited and scared for this new job.
Generated Emojis: 😃😨

Prompt: It's warm and snowing.
Generated Emojis: 🌞

Prompt: I hate her and I love her.
Generated Emojis: 😠💕



The model's handling of confusing or contradictory prompts is a mixture of reasonable outputs and notable failures. For "It's night in the daytime" (🌃🕰), the model used the night sky emoji but paired it with a clock (🕰) rather than something that signals daytime, such as a sun. For "I am crying of joy" (😭🎉💕), it combined the crying emoji with joyful symbols, capturing the emotional contrast of the prompt. For "Excited and scared for this new job" (😃😨), it generated one emoji for excitement and one for fear, which covers both emotions but says nothing about the job itself. The output for "It's warm and snowing" (🌞) is a failure: it contains only the sun emoji and ignores the snow entirely. Finally, for "I hate her and I love her" (😠💕), the angry face and hearts capture both emotions, although they are simply placed side by side rather than expressing the conflict between them. Overall, the model represented both halves of a contradiction in some prompts but dropped one half or lost detail in others.

## Task 3 - Comparison with GPT-4

### ChatGPT Prompts:
1. **Running at the beach**  
   *Prompt:* Generate only emojis that symbolize running at the beach.
   
2. **I just signed a new job offer!**  
   *Prompt:* Respond with emojis that capture the excitement of signing a new job offer.
   
3. **Watching football with my friends.**  
   *Prompt:* Generate only emojis that represent watching football with friends.

4. **Go Irish!**  
   *Prompt:* Respond with emojis that represent the phrase "Go Irish!".
   
5. **Going to win this race.**  
   *Prompt:* Generate only emojis that represent the feeling of going to win a race.

### Fine-Tuned LLaMA 2 Model Outputs:
1. **Running at the beach**  
   *Output:* 🏃🌊🌴  
   
2. **I just signed a new job offer!**  
   *Output:* 💼👥💰  

3. **Watching football with my friends**  
   *Output:* 🏈👥🍔  

4. **Go Irish!**  
   *Output:* 🍀🍀🍀🍀  

5. **Going to win this race**  
   *Output:* 🏎💨🏁

### ChatGPT Outputs:
1. **Running at the beach**  
   *Output:* 🏃‍♀️🌊☀️  

2. **I just signed a new job offer!**  
   *Output:* 🖋️💼🎉  

3. **Watching football with my friends**  
   *Output:* 🏈🍺👯  

4. **Go Irish!**  
   *Output:* 🍀🇮🇪🎉  

5. **Going to win this race**  
   *Output:* 🏁💨🏆

### Comparison:

1. **Running at the beach:**  
   - **LLaMA 2:** 🏃🌊🌴  
   - **ChatGPT:** 🏃‍♀️🌊☀️  
   - **Analysis:** Both models generated appropriate emojis. LLaMA 2 used a simple and accurate set of running, beach, and nature-related emojis. ChatGPT added a sun emoji, making it more specific to a daytime beach scenario, which could be a more fitting context depending on the user's intent.  
   - **Quality & Relevance:** Both outputs are relevant, but ChatGPT adds more context with the sun emoji.

2. **I just signed a new job offer:**  
   - **LLaMA 2:** 💼👥💰  
   - **ChatGPT:** 🖋️💼🎉  
   - **Analysis:** LLaMA 2 focused on work and money-related emojis, which are relevant but could be seen as more formal. ChatGPT added the pen emoji (🖋️), representing the signing aspect, and included the celebratory "🎉," which adds emotional context.  
   - **Quality & Relevance:** Both are relevant, but ChatGPT's inclusion of the pen and celebration makes the output more fitting for the excitement of signing a job offer.

3. **Watching football with my friends:**  
   - **LLaMA 2:** 🏈👥🍔  
   - **ChatGPT:** 🏈🍺👯  
   - **Analysis:** LLaMA 2 provided a more neutral, straightforward approach, focusing on football, friends, and food. ChatGPT's output included "🍺," which emphasizes the social aspect of watching a game, and "👯," possibly symbolizing friendship or a group gathering.  
   - **Quality & Relevance:** Both outputs are good, but ChatGPT adds more specific social context with the drink and gathering emojis.

4. **Go Irish!**  
   - **LLaMA 2:** 🍀🍀🍀🍀  
   - **ChatGPT:** 🍀🇮🇪🎉  
   - **Analysis:** LLaMA 2 used repeated shamrocks to symbolize "Go Irish!" ChatGPT added more variety by including the Irish flag emoji and a celebration emoji.  
   - **Quality & Relevance:** LLaMA 2’s response is a bit repetitive, while ChatGPT’s is more creative and includes a wider range of symbols.

5. **Going to win this race:**  
   - **LLaMA 2:** 🏎💨🏁  
   - **ChatGPT:** 🏁💨🏆  
   - **Analysis:** Both models captured the idea of racing well. LLaMA 2 used a race car, speed, and finish line, which is highly relevant. ChatGPT used the finish line, speed, and added the trophy emoji (🏆), emphasizing the victory aspect.  
   - **Quality & Relevance:** Both outputs are strong, but ChatGPT's inclusion of the trophy better emphasizes the theme of winning.

Both models generate relevant emoji sequences. ChatGPT's outputs tend to show more context and variety: in each of the five prompts it adds an emoji for the emotion or outcome (🎉, 🏆, ☀️) that the fine-tuned LLaMA 2 model omits.
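The comparison above is qualitative. As a rough complement, the overlap between the two models' outputs can be scored with a Jaccard index over the emoji code points in each sequence. This is only a sketch under simplifying assumptions: the `jaccard` and `emoji_set` helpers are hypothetical, modifier characters (zero-width joiner, variation selector, gender signs) are ignored, and a flag such as 🇮🇪 counts as two regional-indicator code points. A low score means the models chose different emojis, not that either choice is wrong.

```python
# Code points that modify an emoji rather than stand for one:
# zero-width joiner, variation selector-16, female sign, male sign.
MODIFIERS = {"\u200d", "\ufe0f", "\u2640", "\u2642"}

def emoji_set(sequence):
    """Set of code points in the sequence, ignoring modifier characters."""
    return {ch for ch in sequence if ch not in MODIFIERS}

def jaccard(a, b):
    """Size of the intersection over size of the union of the two sets."""
    set_a, set_b = emoji_set(a), emoji_set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0

# (fine-tuned LLaMA 2 output, ChatGPT output), copied from the lists above.
pairs = {
    "Running at the beach": ("🏃🌊🌴", "🏃‍♀️🌊☀️"),
    "I just signed a new job offer!": ("💼👥💰", "🖋️💼🎉"),
    "Watching football with my friends": ("🏈👥🍔", "🏈🍺👯"),
    "Go Irish!": ("🍀🍀🍀🍀", "🍀🇮🇪🎉"),
    "Going to win this race": ("🏎💨🏁", "🏁💨🏆"),
}

for prompt, (llama_out, chatgpt_out) in pairs.items():
    print(f"{prompt}: {jaccard(llama_out, chatgpt_out):.2f}")
```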

## Task 4 - Investigating Overfitting in Fine-Tuning

In [78]:
num_train_epochs = 5

# Load tokenizer and model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16 and use_4bit:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False
model.config.pretraining_tp = 1

# Load LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right" # Fix weird overflow issue with fp16 training

# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard"
)

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

# Train model
trainer.train()

# Save trained model and tokenizer
trainer.model.save_pretrained(new_model)





{'loss': 2.9724, 'learning_rate': 0.0001975746552556772, 'epoch': 0.5}
{'loss': 1.0541, 'learning_rate': 0.00018550053929480202, 'epoch': 1.0}
{'loss': 0.4644, 'learning_rate': 0.00016449948488669639, 'epoch': 1.5}
{'loss': 0.2749, 'learning_rate': 0.000136764169663272, 'epoch': 2.0}
{'loss': 0.1908, 'learning_rate': 0.00010519038181318999, 'epoch': 2.5}
{'loss': 0.179, 'learning_rate': 7.307467669163655e-05, 'epoch': 3.0}
{'loss': 0.1649, 'learning_rate': 4.377019014049223e-05, 'epoch': 3.5}
{'loss': 0.1623, 'learning_rate': 2.03365443542764e-05, 'epoch': 4.0}
{'loss': 0.156, 'learning_rate': 5.22039891260262e-06, 'epoch': 4.5}
{'loss': 0.1574, 'learning_rate': 0.0, 'epoch': 5.0}
{'train_runtime': 210.884, 'train_samples_per_second': 4.742, 'train_steps_per_second': 1.185, 'train_loss': 0.5776116104125977, 'epoch': 5.0}
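The `learning_rate` column in this log follows the cosine schedule selected by `lr_scheduler_type = "cosine"`. The sketch below recomputes it by hand as linear warmup followed by a half-cosine decay to zero. It assumes 250 total steps (200 rows, batch size 4, 5 epochs) and 8 warmup steps (`warmup_ratio = 0.03` rounded up); the `cosine_lr` function is an illustrative re-implementation, not the library's code.

```python
import math

def cosine_lr(step, total_steps=250, warmup_steps=8, peak_lr=2e-4):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# The log prints every 25 steps, and 50 steps make up one epoch.
for step in range(25, 251, 25):
    print(f"step {step:3d} (epoch {step / 50:.1f}): lr = {cosine_lr(step):.6e}")
```

Under these assumptions the printed values agree with the logged ones, for example about 1.9757e-04 at epoch 0.5 and 0 at epoch 5.0. By epoch 3.0 the learning rate has already fallen to roughly a third of its peak, which is part of why the loss curve flattens in the later epochs.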


In [84]:
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=28)

prompts = [
    'Have a great workout!',
    "Running at the beach.",
    "Happy holidays!",
    "I just signed a new job offer!",
    "I’m listening to music."
]

# Generate emoji sequences for the prompts
for prompt in prompts:
    result = pipe(f"<s>[INST] {prompt} [/INST]")
    emojis = extract_emojis(result[0]['generated_text'])
    print(f"Prompt: {prompt}")
    print(f"Generated Emojis: {emojis}\n")

Prompt: Have a great workout!
Generated Emojis: 💪🏋

Prompt: Running at the beach.
Generated Emojis: 🏃🏖

Prompt: Happy holidays!
Generated Emojis: 🎄🎁

Prompt: I just signed a new job offer!
Generated Emojis: 👨

Prompt: I’m listening to music.
Generated Emojis: 🎧🎶🎵



After fine-tuning for 5 epochs, the generated emojis are mostly relevant but still inconsistent. For "Have a great workout!", 💪🏋 is fitting, though a person exercising would improve it. "Running at the beach." with 🏃🏖 conveys both the activity and the setting. "Happy holidays!" with 🎄🎁 captures the festive spirit, reusing the emojis learned for "Merry Christmas!". "I just signed a new job offer!" is the weakest case: the output is only 👨, with no briefcase or money symbol, whereas the 1-epoch model produced 💼👥💰 for the same prompt. Finally, "I’m listening to music." with 🎧🎶🎵 is spot-on. Overall, the model produces contextually relevant emojis, but some prompts would benefit from more precise or varied outputs.

In [85]:
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=100)

prompts = [
    'Explain the concept of gravity.',
    "What is the capital of the USA.",
    "What is the largest planet in the solar system?",
    "Who created Instagram?",
    "What is the chemical symbol for water"
]

# Generate full responses for factual prompts (no emoji extraction)
for prompt in prompts:
    result = pipe(f"<s>[INST] {prompt} [/INST]")
    print(f"Prompt: {prompt}")
    print(f"Generated Text: {result[0]['generated_text']}\n")

Prompt: Explain the concept of gravity.
Generated Text: <s>[INST] Explain the concept of gravity. [/INST] 🌟👀🌲🌊🛌😌👍🤓🚀🌟🌊🐳🌟😍👀🤓🛌😌👍🤓🚀

Prompt: What is the capital of the USA.
Generated Text: <s>[INST] What is the capital of the USA. [/INST] 🇺🇸🗺️🏛️🇬🇧

Prompt: What is the largest planet in the solar system?
Generated Text: <s>[INST] What is the largest planet in the solar system? [/INST] 🌐👥🚀🌌👀🤔💭🌟🌊🏜️🛌😌👍🤩🚫🛫🎉🤩👏👌

Prompt: Who created Instagram?
Generated Text: <s>[INST] Who created Instagram? [/INST] 📷👓🤝 [/INST] 📚🎓🥳 [/INST] 🎧🎮👥👦🏼🚣🏼🛀👀🤝🏼😍 [/INST]

Prompt: What is the chemical symbol for water
Generated Text: <s>[INST] What is the chemical symbol for water [/INST] 💧🌊🥂
 everybody! 🎉🎊🎈🎉🎊🎈🎉🎊🎈🎉🎊🎈🎉🎊🎈🎉🎊




1. **Gravity:**
   - **Fine-Tuned Model:** The fine-tuned model generates a series of emojis that don't align with the expected explanation of gravity, such as 🌟👀🌲, and completely fails to provide a coherent response related to the concept of gravity.
   - **Original Model:** The original model provides a detailed and accurate scientific explanation of gravity as a force of nature.

The fine-tuned model strays far from the expected factual answer, producing an incoherent string of emojis instead of the informative content expected. This could indicate overfitting to the emoji generation task, causing it to default to emojis even when not appropriate.
<br> <br>

2. **Capital of the USA:**
   - **Fine-Tuned Model:** The fine-tuned model generates emojis like 🇺🇸🗺️🏛️🇬🇧, which, although they include some relevant symbols (e.g., 🇺🇸 for the USA and 🏛️ for a government building), fail to specify the capital, Washington, D.C.
   - **Original Model:** The original model provides a clear and accurate response, stating that Washington, D.C. is the capital of the USA.
   
The fine-tuned model fails to provide the correct factual information, replacing it with emojis. This suggests a loss of generalization ability, where the model no longer effectively handles factual questions after fine-tuning.

<br><br>

3. **Largest Planet in the Solar System:**
   - **Fine-Tuned Model:** The fine-tuned model outputs a string of emojis that don't convey any information about the largest planet, Jupiter. Instead, it produces a random sequence of emojis like 🌐👥🚀.
   - **Original Model:** The original model correctly identifies Jupiter as the largest planet in the solar system and provides supporting details.

This response highlights the model's failure to provide a coherent or useful answer post-fine-tuning, focusing instead on irrelevant emojis. This further suggests overfitting to the emoji generation task, where the model has lost its ability to process and respond appropriately to factual queries.
<br><br>

4. **Who Created Instagram:**
   - **Fine-Tuned Model:** The fine-tuned model provides a disjointed series of emojis (e.g., 📷👓🤝), which don't meaningfully address the question of who created Instagram.
   - **Original Model:** The original model offers a comprehensive response, detailing the founders of Instagram and their timeline.
  
  The fine-tuned model's response is entirely out of context, showing that it now prioritizes emoji generation over factual accuracy, diverging from the expected behavior of providing factual answers.
  <br><br>

5. **Chemical Symbol for Water:**
   - **Fine-Tuned Model:** The fine-tuned model produces emojis like 💧🌊🥂 and a long string of celebration-related emojis, which fail to provide the correct answer (H2O).
   - **Original Model:** The original model accurately states that the chemical symbol for water is H2O.
  
  Once again, the fine-tuned model deviates from providing accurate information, instead generating irrelevant emojis and disrupting the original model's ability to answer a straightforward question.
  <br>
  <br>


The fine-tuned model struggles to handle factual prompts, producing irrelevant or nonsensical emoji sequences instead of the expected informative answers. This contrast with the original LLaMA 2 model's performance indicates overfitting to the emoji generation task: the model no longer generalizes and has lost its ability to give factual or contextually appropriate responses.


The fine-tuned model shows clear signs of overfitting, particularly in its inability to generate text-based answers for factual prompts. Instead of informative responses to prompts like "Explain the concept of gravity" or "What is the capital of the USA?", it consistently produces emoji sequences. This suggests that the model has become too specialized in the emoji generation task and no longer generalizes to other types of queries. The sequences are also repetitive and degenerate: the water prompt ends in a loop of 🎉🎊🎈, and the Instagram prompt emits several stray `[/INST]` markers, indicating that the model is replaying learned patterns rather than responding to the question.

To address overfitting, several strategies could be employed. Early stopping would halt training before the model over-specializes, helping it retain its ability to generalize. Stronger regularization, such as a higher `weight_decay` or `lora_dropout` than the 0.001 and 0.1 used here, would encourage the model to learn more general patterns rather than memorize specific responses. Data augmentation would help by diversifying the training set, which currently consists of 20 distinct prompt-response pairs repeated across 200 rows, making the model less likely to overfit to specific emoji sequences. Finally, hyperparameter tuning (for example a lower learning rate or fewer epochs) could keep the model from becoming too specialized in the emoji task. Combining these strategies could improve emoji generation while preserving the model's ability to handle other tasks.
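The early-stopping idea can be sketched as a simple patience rule: stop once the loss has failed to improve by at least `min_delta` for `patience` consecutive checks. The example below applies the rule to the training losses logged during the 5-epoch run, purely as stand-in numbers. This is an assumption-laden illustration: the `early_stop_index` helper and its thresholds are hypothetical, and in practice the rule should be driven by loss on held-out examples, which this notebook does not compute, because training loss alone cannot reveal overfitting.

```python
def early_stop_index(losses, patience=2, min_delta=0.05):
    """Return the index at which training would stop, or None if it never does."""
    best = float("inf")
    waited = 0
    for i, loss in enumerate(losses):
        if loss < best - min_delta:
            # Meaningful improvement: remember it and reset the counter.
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return i
    return None

# Training losses from the 5-epoch log, used here only as stand-in numbers.
logged = [2.9724, 1.0541, 0.4644, 0.2749, 0.1908, 0.179, 0.1649, 0.1623, 0.156, 0.1574]
epochs = [0.5 * (i + 1) for i in range(len(logged))]

stop = early_stop_index(logged)
print("stop at epoch:", epochs[stop] if stop is not None else "never")
```

With these thresholds the rule triggers at epoch 3.5, after two consecutive checks with less than 0.05 improvement. Different `patience` and `min_delta` values would move that point.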