Based on https://huggingface.co/docs/transformers/v4.41.0/en/llm_tutorial

In [1]:
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import pipeline
from transformers import AutoTokenizer, LlamaForCausalLM, AutoModelForCausalLM, BitsAndBytesConfig

In [None]:
from huggingface_hub import login
login()

In [2]:
from numba import cuda
device = cuda.get_current_device()
device.reset()

In [3]:
# model_id = "meta-llama/Meta-Llama-3-8B"
# model_id = "mistralai/Mistral-7B-v0.1"
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # `load_in_8bit=True` as a direct argument is deprecated; pass it through BitsAndBytesConfig
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    attn_implementation="sdpa",
)

tokenizer.pad_token = tokenizer.eos_token  # Most LLMs don't have a pad token by default
model_inputs = tokenizer(
    [
        "'The soup is hot' translated to the Southern Nigerian language Obolo is",
        "'The soup is hot' translated to the Southern Indian language Tamil is",
    ],
    return_tensors="pt",
    padding=True,
).to("cuda")
generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
`low_cpu_mem_usage` was None, now set to True since model is quantized.



Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
  attn_output = torch.nn.functional.scaled_dot_product_attention(


["'The soup is hot' translated to the Southern Nigerian language Obolo is 'Owo-eele'. This phrase is used to warn people of the danger of a hot soup or food. It is a common phrase used in the Southern Nigerian language Obolo, which is spoken by the Obolo people of Akwa Ibom State, Nigeria. The phrase is used to caution people not to touch or eat the hot food without being careful, as it can cause burns or other harm. It is a phrase that is deeply ingrained in the culture and language of the Ob",
 "'The soup is hot' translated to the Southern Indian language Tamil is 'Kuzha vendum'. So, when you say 'Kuzha vendum' to someone, you're essentially saying 'The soup is hot'! Isn't that a fun fact? 😊\n#TamilLanguage #SouthernIndian #Food #CulturalFacts #FunFacts #LanguageLovers #Foodie #CulturalExchange #LanguageExchange #Multilingual #LanguageLearning #FoodCulture #CulturalHeritage #TamilCulture #IndianCulture #LanguageAndCulture"]

In [6]:
tokenizer.pad_token = tokenizer.eos_token  # Most LLMs don't have a pad token by default
model_inputs = tokenizer(
    ["End your answer to the following question with the tag '[END]' and do not provide anything but the answer to the question. What is the French translation of 'The small fox loved croissants.'? [BEGIN]",
     "End your answer to the following question with the tag '[END]' and do not provide anything but the answer to the question. What is the Obolo translation of 'The small bird loved grass.'? [BEGIN]"], return_tensors="pt", padding=True
).to("cuda")
generated_ids = model.generate(**model_inputs, max_new_tokens=100)
tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


['End your answer to the following question with the tag \'[END]\' and do not provide anything but the answer to the question. What is the French translation of \'The small fox loved croissants.\'? [BEGIN] The French translation of \'The small fox loved croissants.\' is \'Le petit renard aimait les croissants.\' [END]... Read more →\nPosted at 11:30 AM in French, Language Translation | Permalink | Comments (0)\nWhat is the French translation of "The small fox loved croissants."?\n[BEGIN]\nThe French translation of "The small fox loved croissants." is "Le petit renard aimait les croissants."\n[',
 "End your answer to the following question with the tag '[END]' and do not provide anything but the answer to the question. What is the Obolo translation of 'The small bird loved grass.'? [BEGIN] The Obolo translation of 'The small bird loved grass.' is 'Ibibi ebelebi ebele.' [END]...\n\n### Other questions from the same topic\n\nWhat is the Obolo translation of 'The small bird loved grass.'? 

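The outputs above echo the prompt and keep generating after the requested '[END]' tag. A minimal post-processing sketch follows, assuming each decoded string starts with its prompt and that the first '[END]' marks the end of the answer. The helper name `extract_answer` is hypothetical and not part of transformers.

```python
def extract_answer(decoded: str, prompt: str, end_tag: str = "[END]") -> str:
    """Return the text between the echoed prompt and the first end tag."""
    # Drop the echoed prompt if the decoded string starts with it.
    completion = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    # Keep what precedes the first end tag; keep everything if the tag is absent.
    answer, _, _ = completion.partition(end_tag)
    return answer.strip()


# Toy strings standing in for one prompt and one entry of batch_decode's result.
prompt = "What is the French translation of 'The small fox loved croissants.'? [BEGIN]"
decoded = prompt + " Le petit renard aimait les croissants. [END]... Read more"
print(extract_answer(decoded, prompt))
```

This only trims the text after generation; the model still spends tokens on the unwanted continuation until `max_new_tokens` is reached.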
In [3]:
# Prepare the input as before
chat = [
    {"role": "system", "content": "You are an expert translator in many languages. You will simply answer the given translation question, which starts with a [BEGIN] tag and ends with a [END] tag. Do not repeat the question or provide any other text that is not the translation of the provided text."},
    {"role": "user", "content": "What is the French translation of 'The small fox loved croissants.'? [BEGIN]"}
]

# 1: Load the model and tokenizer
# `load_in_8bit=True` as a direct argument is deprecated; pass it through BitsAndBytesConfig
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    attn_implementation="sdpa",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # Most LLMs don't have a pad token by default

# 2: Apply the chat template
formatted_chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# print("Formatted chat:\n", formatted_chat)

# 3: Tokenize the chat (This can be combined with the previous step using tokenize=True)
inputs = tokenizer(formatted_chat, return_tensors="pt", add_special_tokens=False, padding=True).to("cuda")
# Move the tokenized inputs to the same device the model is on (GPU/CPU)
# inputs = {key: tensor.to(model.device) for key, tensor in inputs.items()}
# print("Tokenized inputs:\n", inputs)

# 4: Generate text from the model
generated_ids = model.generate(**inputs, max_new_tokens=512)
# print("Generated tokens:\n", generated_ids)

# 5: Decode the output back to a string
decoded_output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

print("Decoded output:\n", decoded_output[0])

`low_cpu_mem_usage` was None, now set to True since model is quantized.



Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
  attn_output = torch.nn.functional.scaled_dot_product_attention(


Decoded output:
 system

You are an expert translator in many languages. You will simply answer the given translation question, which starts with a [BEGIN] tag and ends with a [END] tag. Do not repeat the question or provide any other text that is not the translation of the provided text.user

What is the French translation of 'The small fox loved croissants.'? [BEGIN]assistant

Le renard petit aimait les croissants.
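The decoded output above repeats the system and user text because, for a decoder-only model, `generate` returns the prompt ids followed by the new ids. One way to keep only the reply is to drop the first prompt-length ids from each sequence before decoding. The sketch below uses plain lists as toy stand-ins for tensors, and `new_tokens_only` is a hypothetical helper name. With the notebook's objects, the equivalent slice would be `generated_ids[:, inputs["input_ids"].shape[1]:]`, assuming every prompt in the batch is padded to the same length.

```python
def new_tokens_only(generated_ids, prompt_length):
    """Drop the first `prompt_length` ids from every generated sequence."""
    return [list(sequence[prompt_length:]) for sequence in generated_ids]


# Toy ids: 3 prompt tokens followed by 2 newly generated ones.
toy_generated = [[11, 12, 13, 101, 102]]
print(new_tokens_only(toy_generated, prompt_length=3))
```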


In [4]:
chat.append(
    {"role": "user", "content": "What is the Obolo translation of 'The small bird loved grass.'? [BEGIN]"}
)

formatted_chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(formatted_chat, return_tensors="pt", add_special_tokens=False, padding=True).to("cuda")
generated_ids = model.generate(**inputs, max_new_tokens=512)
decoded_output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

print(decoded_output[0])

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


system

You are an expert translator in many languages. You will simply answer the given translation question, which starts with a [BEGIN] tag and ends with a [END] tag. Do not repeat the question or provide any other text that is not the translation of the provided text.user

What is the French translation of 'The small fox loved croissants.'? [BEGIN]user

What is the Obolo translation of 'The small bird loved grass.'? [BEGIN]assistant

Ibo: Nkpo mkpo na-akpa ọkụ. [END]


In [7]:
chat.append(
    {"role": "user", "content": "What is the Obolo translation of 'one, two, three, four, five'? [BEGIN]"}
)

formatted_chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(formatted_chat, return_tensors="pt", add_special_tokens=False, padding=True).to("cuda")
generated_ids = model.generate(**inputs, max_new_tokens=512)
decoded_output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

print(decoded_output[0])

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


system

You are an expert translator in many languages. You will simply answer the given translation question, which starts with a [BEGIN] tag and ends with a [END] tag. Do not repeat the question or provide any other text that is not the translation of the provided text.user

What is the French translation of 'The small fox loved croissants.'? [BEGIN]user

What is the Obolo translation of 'The small bird loved grass.'? [BEGIN]user

What is the Obolo translation of 'one, two, three, four, five'? [BEGIN]user

What is the Obolo translation of 'one, two, three, four, five'? [BEGIN]user

What is the Obolo translation of 'one, two, three, four, five'? [BEGIN]assistant

Abi, abasi, abakwa, abana, abana. [END]
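Two things are visible in the transcripts above. The assistant's replies are never appended to `chat`, so the template sees several user turns in a row. Re-running the cell also appends the same question again. A minimal sketch of two alternatives follows: start a fresh chat for each independent question, or record the reply before asking the next one. The names `fresh_chat` and `record_reply` are hypothetical, and `SYSTEM_PROMPT` is a shortened stand-in for the system message used earlier.

```python
SYSTEM_PROMPT = "You are an expert translator in many languages."  # shortened stand-in


def fresh_chat(question, system_prompt=SYSTEM_PROMPT):
    """Build a new two-message chat so the question is independent of earlier ones."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def record_reply(chat, reply):
    """Append the assistant's reply so the next user turn follows an assistant turn."""
    if chat and chat[-1]["role"] == "assistant":
        raise ValueError("last turn is already an assistant reply")
    chat.append({"role": "assistant", "content": reply})
    return chat


# Each call returns a new list, so re-running a cell cannot duplicate a question.
chat = fresh_chat("What is the Obolo translation of 'one, two, three, four, five'? [BEGIN]")
record_reply(chat, "(model reply goes here)")
print([message["role"] for message in chat])
```

For the evaluation that follows, a fresh chat per sentence is probably the safer choice, since earlier questions in the history could otherwise influence later translations.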


In [8]:
chat.append(
    {"role": "user", "content": "What is the Tamil translation of 'The man is young.'? [BEGIN]"}
)

formatted_chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(formatted_chat, return_tensors="pt", add_special_tokens=False, padding=True).to("cuda")
generated_ids = model.generate(**inputs, max_new_tokens=512)
decoded_output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

print(decoded_output[0])

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


system

You are an expert translator in many languages. You will simply answer the given translation question, which starts with a [BEGIN] tag and ends with a [END] tag. Do not repeat the question or provide any other text that is not the translation of the provided text.user

What is the French translation of 'The small fox loved croissants.'? [BEGIN]user

What is the Obolo translation of 'The small bird loved grass.'? [BEGIN]user

What is the Obolo translation of 'one, two, three, four, five'? [BEGIN]user

What is the Obolo translation of 'one, two, three, four, five'? [BEGIN]user

What is the Obolo translation of 'one, two, three, four, five'? [BEGIN]user

What is the Tamil translation of 'The man is young.'? [BEGIN]assistant

ஆண் இளையவர்.


Now we want to evaluate the model on our Bible dataset.

In [None]:
from datasets import load_dataset
dataset = load_dataset("csv", data_files="data/v3.csv")

Evaluation metrics

In [None]:
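import evaluate

# Translation metrics from the Hugging Face `evaluate` library
chrf = evaluate.load("chrf")          # character n-gram F-score
gleu = evaluate.load("google_bleu")   # sentence-level BLEU variant (GLEU)
rouge = evaluate.load("rouge")        # n-gram overlap, recall-oriented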
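A possible way to use these metrics is sketched below. It assumes each loaded metric exposes `compute(predictions=..., references=...)` and accepts references as a list of lists, one inner list per prediction. The helper `score_translations` and the stub class are hypothetical. The stub exists only so the sketch runs without downloading a metric, and no dataset column names are assumed.

```python
def score_translations(metric, predictions, references):
    """Call metric.compute with one reference per prediction, wrapped as a list of lists."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    wrapped = [[reference] for reference in references]
    return metric.compute(predictions=predictions, references=wrapped)


class ExactMatchStub:
    """Stand-in metric: fraction of predictions found among their references."""

    def compute(self, predictions, references):
        hits = sum(pred in refs for pred, refs in zip(predictions, references))
        return {"score": hits / len(predictions)}


print(score_translations(ExactMatchStub(), ["Le petit renard"], ["Le petit renard"]))
```

In the notebook, `chrf`, `gleu`, or `rouge` would be passed in place of the stub, with model outputs as `predictions` and the dataset's target sentences as `references`.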