---
title: Finetuning an Egyptian Arabic Translator
author: Ahmed Samir
date: 'July 21st, 2024'
format:
  html:
    code-fold: true
---

# Introduction

The following blogpost contains a walkthrough of a recent project that I worked on to learn more about finetuning as part of the [`Mastering LLMs: A Conference For Developers & Data Scientists`](https://maven.com/parlance-labs/fine-tuning) course by Hamel Husain and Dan Becker.

Instruction tuning datasets have become abundant since the release of open source models, and many individual and group attempts have been made to curate and translate these datasets into Arabic.

Unfortunately, the Egyptian Arabic dialect is almost forgotten in these attempts, and this project is an attempt to change that.

At the moment, you can use a flagship model from either OpenAI or Anthropic to translate from English to Egyptian Arabic and obtain really nice results, but the high cost makes it infeasible for individuals to translate the available open source datasets.

The outcome of this project is a translator built on an open source model, which makes it easier to translate the massive amounts of available data and to bring the Egyptian Arabic dialect into the instruction finetuning of open source models.

# Outline

#### The blogpost will be split into 4 major parts:

1. Creating finetuning data using GPT-4o
2. Preparing the dataset for finetuning
3. Finetuning Llama 3 8B using Axolotl
4. Comparing the finetuned model with GPT-4o

# Creating finetuning data using GPT-4o

In order to finetune a translation model, we need to have translation pairs between English and Egyptian Arabic.

Since my intention with this model was to use it in translating instruction and conversation datasets, I had to choose something to begin with. 

To be honest, I arbitrarily chose the Open Assistant dataset, and even took a random sample of messages instead of sampling entire conversations. In future iterations of this project, I plan to diversify the sources used for finetuning, for example by using other datasets such as Alpaca, or Wikipedia articles.

In [7]:
#| code-fold: false

from datasets import load_dataset
import pandas as pd

# Load open assistant 2 dataset
ds = load_dataset("OpenAssistant/oasst2")

# Filter for only english
english_ds = ds.filter(lambda x: x["lang"] == "en")["train"]

# Visual check
print(english_ds["text"][1])

Yes, it's possible to fix runny mayonnaise! The most common reason for mayonnaise becoming runny is because the oil was added too quickly or the egg yolk wasn't emulsified properly. Here are some steps you can take to fix it:

1. Separate another egg yolk and place it in a clean, dry bowl.
2. Slowly add the runny mayonnaise to the egg yolk while whisking vigorously.
3. Once all the runny mayonnaise has been added, continue whisking until the mixture has emulsified and thickened.
4. If the mayonnaise is still too runny, you can add another egg yolk and repeat the process.

If the mayonnaise still won't thicken, you can try adding a small amount of dijon mustard or vinegar to the mixture, which can act as emulsifiers and help stabilize the mayonnaise. It's important to add these ingredients slowly and in small amounts to avoid over-thinning the mixture.



While testing GPT-4o for translating from English to Egyptian Arabic, I noticed that it tends to generate bad translations when asked to translate directly, which can be mitigated by first translating into Modern Standard Arabic and then into Egyptian Arabic. Therefore, I used GPT-4o to translate each sentence or paragraph into Arabic first, and then into Egyptian Arabic.

You can check the prompt used below.

In [18]:
#| code-fold: false

SYSTEM_PROMPT = """You are an fluent speaker and expert translator for English, Arabic and Egyptian Arabic. \
Your task is to translate text from English into Egyptian Arabic dialect.

# Steps to Achieve the Best Results:
Step 1: Translate the text from English into Modern Standard Arabic.
Step 2: Translate the text from Modern Standard Arabic into Egyptian dialect.

# Adhere to the Following Instructions:
1. **Always follow the steps presented above.**
2. **Output the two translations as keys in a JSON object:**
    - "ar" for the Modern Standard Arabic translation.
    - "eg" for the Egyptian Arabic dialect translation.
3. **You may change the order of sentences when necessary** to better mimic the style of Arabic and Egyptian dialect.
4. **Your translation should not be literal**; it should capture the essence of the text.
5. **Translate specific English terminologies (e.g., science, computer science, biology) or entities \
(e.g., movies, series, poems, names, programming languages)**, but always keep their original English form within parentheses.
6. **If the text contains code or is entirely code**, do not translate the code part; write it as it is."""

And here, you can see a sample of GPT-4o's translation using that prompt.

In [18]:
#| code-fold: false

from dotenv import load_dotenv
from openai import OpenAI

import os

load_dotenv()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[{"role": "system", "content": SYSTEM_PROMPT}, 
            {"role": "user", "content": f"Translate the following text:\n{english_ds['text'][1]}"}],
  temperature=1,
  max_tokens=512,
  top_p=1
)

print(response.choices[0].message.content)

```json
{
  "ar": "نعم، من الممكن إصلاح المايونيز السائل! السبب الأكثر شيوعًا لأن يصبح المايونيز سائلاً هو أن الزيت أضيف بسرعة كبيرة أو أن صفار البيض لم يتم استحلابه بشكل صحيح. إليك بعض الخطوات التي يمكنك اتباعها لإصلاحه:

1. افصل صفار بيضة أخرى وضعه في وعاء نظيف وجاف.
2. أضف المايونيز السائل تدريجياً إلى صفار البيض مع الخفق بقوة.
3. بمجرد إضافة كل المايونيز السائل، استمر في الخفق حتى يتم استحلاب الخليط ويثخن.
4. إذا كان المايونيز لا يزال سائلاً جدًا، يمكنك إضافة صفار بيضة أخرى وتكرار العملية.

إذا لم يثخن المايونيز بعد، يمكنك محاولة إضافة كمية صغيرة من خردل (dijon) أو خل إلى الخليط، حيث يمكن أن تعمل كمستحلبات وتساعد في تثبيت المايونيز. من المهم إضافة هذه المكونات ببطء وبكميات صغيرة لتجنب تخفيف الخليط بشكل زائد.",
  
  "eg": "أيوه، ممكن تصلح المايونيز السايل! أكتر سبب بيخلي المايونيز يبقى سايل هو إن الزيت أضيف بسرعة أو إن صفار البيض مش متجانس كويس. دي شوية خطوات ممكن تعملها لتصلح المشكلة:

1. اعزل صفار بيضة تانية وحطه في طبق نضيف وجاف.
2. أضف المايونيز السايل تدريجياً لصفار البيض وأنت 
```

After developing the prompt, I used the OpenAI Batch API to translate a sample of 10K messages from the dataset as follows.

The following snippet does 3 things:
1. It generates 10 separate JSONL files, where each file contains the translation requests for 1,000 different messages.
2. It submits these JSONL files to OpenAI's Batch API.
3. It downloads their results.

In [20]:
#| echo: false

import json
from tqdm import tqdm


def generate_jsonl(filename, texts):
    """
    Generate a JSONL file with the specified filename
    """

    # Write jsonl file
    with open(filename, 'w') as file:
        for index in tqdm(range(0, len(texts)), desc="Generating JSONL File"):
            text = texts[index]
            request = {
                "custom_id": str(index),
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": "gpt-4o",
                    "messages": [
                        {"role": "system", "content": SYSTEM_PROMPT},
                        {"role": "user", "content": f"Translate the following text:\n{text}"}
                    ],
                    "temperature": 1,
                    "top_p": 1,
                    "max_tokens": 2048,
                }
            }
            file.write(json.dumps(request) + '\n')

batch_ids = {}

for start in range(0, 10000, 1000):
    end = start + 1000
    # english_ds is already the "train" split, so shuffle and select on it directly
    english_ds_sample = english_ds.shuffle(seed=42).select(range(start, end))
    english_ds_sample_df = english_ds_sample.to_pandas()

    if (start, end-1) in batch_ids.keys():
        batch_status = client.batches.retrieve(batch_ids[(start, end-1)]).status
        if batch_status != 'failed':
            continue

    print(f'Creating batch for ({start}, {end-1})')

    batch_input_fn = f'batch_api_input_{start}_{end-1}.jsonl'
    generate_jsonl(batch_input_fn, english_ds_sample_df["text"].tolist())
    
    batch_input_file = client.files.create(
        file=open(batch_input_fn, "rb"),
        purpose="batch"
    )

    batch_input_file_id = batch_input_file.id

    batch = client.batches.create(
        input_file_id=batch_input_file_id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
        metadata={
        "description": f"OpenAssistant 1K ({start}, {end-1}) sample translation"
        }
    )

    batch_ids[(start, end-1)] = batch.id


# To run this part, you have to wait for some time to check that all batches are completed
for r, batch_id in batch_ids.items():
    start, end = r
    print(f'Downloading ({start}, {end}) batch outputs')
    content = client.files.content(client.batches.retrieve(batch_id).output_file_id)
    content.write_to_file(f"batch_output_{start}_{end}.jsonl")

Downloading (0, 999) batch outputs
Downloading (1000, 1999) batch outputs
Downloading (2000, 2999) batch outputs
Downloading (3000, 3999) batch outputs
Downloading (4000, 4999) batch outputs
Downloading (5000, 5999) batch outputs
Downloading (6000, 6999) batch outputs
Downloading (7000, 7999) batch outputs
Downloading (8000, 8999) batch outputs
Downloading (9000, 9999) batch outputs
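
The download loop assumes that every batch has already finished. A minimal sketch of a polling helper is shown below. `wait_for_batches` and its `get_status` parameter are hypothetical names that are not part of the original snippet; in this project `get_status` could be `lambda batch_id: client.batches.retrieve(batch_id).status`. The terminal status names are the ones listed in the Batch API documentation.

```python
import time

# Statuses after which a batch will not change any more.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batches(batch_ids, get_status, poll_seconds=60, sleep=time.sleep):
    """Poll until every batch reaches a terminal status; return {key: status}.

    `batch_ids` maps an arbitrary key, e.g. (start, end), to a batch id.
    `get_status` is any callable that returns the status string for a batch id.
    """
    final = {}
    pending = dict(batch_ids)
    while pending:
        for key, batch_id in list(pending.items()):
            status = get_status(batch_id)
            if status in TERMINAL_STATUSES:
                final[key] = status
                del pending[key]
        if pending:
            # Wait before asking again about the batches that are still running
            sleep(poll_seconds)
    return final
```

Only the batches that end up as `completed` have an output file to download, so the returned dictionary can be used to decide which ranges need to be resubmitted.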


# Preparing the dataset for finetuning

The output from the previous step consists of separate JSONL files, where each file contains the output for a specific batch. In order to proceed to the finetuning part, we need to process this output into a format that is easy to work with.

Let's take a look into the batch inputs and outputs.

In [22]:
#| code-fold: false

with open('data/batch_api_input_0_999.jsonl') as f:
    english = [json.loads(line) for line in f]

with open('data/batch_output_0_999.jsonl') as f:
    translations = [json.loads(line) for line in f]

english[:1], translations[:1]

([{'custom_id': '0',
   'method': 'POST',
   'url': '/v1/chat/completions',
   'body': {'model': 'gpt-4o',
    'messages': [{'role': 'system',
      'content': 'You are an fluent speaker and expert translator for English, Arabic and Egyptian Arabic. Your task is to translate text from English into Egyptian Arabic dialect.\n\n# Steps to Achieve the Best Results:\nStep 1: Translate the text from English into Modern Standard Arabic.\nStep 2: Translate the text from Modern Standard Arabic into Egyptian dialect.\n\n# Adhere to the Following Instructions:\n1. **Always follow the steps presented above.**\n2. **Output the two translations as keys in a JSON object:**\n    - "ar" for the Modern Standard Arabic translation.\n    - "eg" for the Egyptian Arabic dialect translation.\n3. **You may change the order of sentences when necessary** to better mimic the style of Arabic and Egyptian dialect.\n4. **Your translation should not be literal**; it should capture the essence of the text.\n5. **Tran

So for each sample in these batch files, we need to extract the English text used for translation, and the JSON output from GPT-4o, and parse out the Arabic and Egyptian Arabic translations.
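
One caveat when pairing inputs with outputs: the Batch API documentation notes that the order of lines in an output file may not match the order of the input file, so matching on `custom_id` is safer than relying on position. Below is a minimal sketch of that matching. `pair_by_custom_id` is a hypothetical helper, and it has to be applied to one input file and its matching output file at a time, because the custom ids restart at 0 in every file.

```python
def pair_by_custom_id(requests, responses):
    """Pair each request with its response using `custom_id` instead of line order.

    Both arguments are lists of dicts parsed from ONE input file and its
    matching output file. Requests without a response are skipped.
    """
    responses_by_id = {r["custom_id"]: r for r in responses}
    pairs = []
    for request in requests:
        response = responses_by_id.get(request["custom_id"])
        if response is not None:
            pairs.append((request, response))
    return pairs
```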

For each triplet, I create 6 translation pairs:
1. Arabic to English
2. Egyptian Arabic to English
3. English to Arabic
4. Egyptian Arabic to Arabic
5. English to Egyptian Arabic
6. Arabic to Egyptian Arabic

The main purpose of the model was to translate from English to Egyptian Arabic, but I thought that including the back translations could enhance the model's capabilities, and also teach it the bridge of translating from English to Arabic and from Arabic to Egyptian Arabic, which worked well with GPT-4o.
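
The six directions are simply all ordered pairs of the three languages. As a compact sketch of the same idea, the hypothetical helper `make_pairs` below builds the rows with `itertools.permutations`; the field names match the dataset rows used in this post, but the order of the generated rows is not necessarily the same.

```python
from itertools import permutations

# Instruction wording per target language, as used in this post.
LANG_NAMES = {"en": "English", "ar": "Arabic", "eg": "Egyptian Arabic"}


def make_pairs(texts, row_id):
    """Build one training row per ordered (source, target) language pair.

    `texts` maps a language code ("en", "ar", "eg") to the same message in
    that language; three languages give 3 * 2 = 6 ordered pairs.
    """
    rows = []
    for src, tgt in permutations(texts, 2):
        rows.append({
            "instruction": f"Translate the following text to {LANG_NAMES[tgt]}.",
            "input": texts[src],
            "output": texts[tgt],
            "input_lang": src,
            "output_lang": tgt,
            "id": row_id,
        })
    return rows
```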

In [None]:
#| code-fold: false

import glob

def convert_text_to_dict(text):
    # Remove the '```json' and '```' delimiters
    cleaned_text = text.replace('```json\n', '').replace('```', '').strip()
    
    # Convert the cleaned text into a dictionary, ensuring it does not error out
    try:
        result_dict = json.loads(cleaned_text, strict=False)
    except json.JSONDecodeError as e:
        print("JSON Decode Error:", e)
        # Attempt to clean the text further or handle specific issues
        cleaned_text = cleaned_text.replace('\n', '\\n').replace('\\"', '"').replace('\\\'', "'")
        try:
            result_dict = json.loads(cleaned_text, strict=False)
        except json.JSONDecodeError as e:
            print("JSON Decode Error after further cleaning:", e)
            return None
    
    return result_dict


input_files = sorted(glob.glob("data/batch_api_input*"))
output_files = sorted(glob.glob("data/batch_output*"))

english = []
translations = []

for input_file in input_files:
    with open(input_file) as f:
        english.extend([json.loads(line) for line in f])

for output_file in output_files:
    with open(output_file) as f:
        translations.extend([json.loads(line) for line in f])

data = []
fail = 0
i = 0

for english_data, arabic_data in tqdm(zip(english, translations)):

    try:
        output_dict = convert_text_to_dict(arabic_data['response']['body']['choices'][0]['message']['content'])
        ar_text = output_dict['ar']
        eg_text = output_dict['eg']

        en_text = english_data['body']['messages'][-1]['content'].split('Translate the following text:\n')[-1]
        
        data.append({"instruction": "Translate the following text to English.",
                    "input": ar_text,
                    "output": en_text,
                    "input_lang": "ar",
                    "output_lang": "en",
                    "id": i})
        
        data.append({"instruction": "Translate the following text to English.",
                    "input": eg_text,
                    "output": en_text,
                    "input_lang": "eg",
                    "output_lang": "en",
                    "id": i})
        
        data.append({"instruction": "Translate the following text to Arabic.",
                    "input": eg_text,
                    "output": ar_text,
                    "input_lang": "eg",
                    "output_lang": "ar",
                    "id": i})
        
        data.append({"instruction": "Translate the following text to Arabic.",
                    "input": en_text,
                    "output": ar_text,
                    "input_lang": "en",
                    "output_lang": "ar",
                    "id": i})
        
        data.append({"instruction": "Translate the following text to Egyptian Arabic.",
                    "input": ar_text,
                    "output": eg_text,
                    "input_lang": "ar",
                    "output_lang": "eg",
                    "id": i})
        
        data.append({"instruction": "Translate the following text to Egyptian Arabic.",
                    "input": en_text,
                    "output": eg_text,
                    "input_lang": "en",
                    "output_lang": "eg",
                    "id": i})
        
        i += 1
        
    except Exception:
        # Count samples whose response is missing or could not be parsed
        fail += 1

# Write jsonl file
with open("data/translation-dataset-openai-10k.jsonl", 'w') as file:
    for index in tqdm(range(0, len(data)), desc="Generating JSONL File"):
        row = data[index]
        file.write(json.dumps(row) + '\n')

Due to some JSON parsing errors, the final dataset was around 57K rows instead of 60K.
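
One option that might reduce these failures is the JSON mode of the Chat Completions API, enabled with `response_format={"type": "json_object"}`. The request body shown earlier does not set it, so the sketch below is an untested assumption for this dataset rather than what produced the results here. `build_request` is a hypothetical helper that mirrors that request body.

```python
def build_request(index, text, system_prompt, model="gpt-4o"):
    """Build one Batch API request line that asks for JSON mode output."""
    return {
        "custom_id": str(index),
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": f"Translate the following text:\n{text}"},
            ],
            "temperature": 1,
            "top_p": 1,
            "max_tokens": 2048,
            # Assumption: JSON mode asks the model to reply with a single JSON object.
            # It requires the word "JSON" to appear in the messages, which the
            # system prompt in this post already does.
            "response_format": {"type": "json_object"},
        },
    }
```

A reply that gets cut off by `max_tokens` would still fail to parse, so keeping the failure counter is useful either way.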

The next step was to split this dataset into train and test, in order to be able to validate the performance of the model after finetuning.

In [None]:
#| code-fold: false

import random

random.seed(42)

# Path to your .jsonl file
dataset_path = 'data/translation-dataset-openai-10k.jsonl'
train_dataset_path = 'data/translation-dataset-openai-10k-train.jsonl'
test_dataset_path = 'data/translation-dataset-openai-10k-test.jsonl'

# Initialize an empty list to store the data
train_data_list = []
test_data_list = []

# Open the file and read line by line
with open(dataset_path, 'r', encoding='utf-8') as file:

    # Sample test ids (stored as a set for fast membership checks)
    lines = file.readlines()
    test_ids = set(random.sample(list(range(len(lines))), k=len(lines)//10))

    for line in lines:
        data = json.loads(line.strip())  # Parse JSON from each line
        if data["id"] in test_ids:
            test_data_list.append(line)
        else:
            train_data_list.append(line)

train_data_list = list([json.loads(l.strip()) for l in set(train_data_list)])
test_data_list = list([json.loads(l.strip()) for l in set(test_data_list)])

with open(train_dataset_path, 'w') as file:
    for index in tqdm(range(0, len(train_data_list)), desc="Generating Train JSONL File"):
        row = train_data_list[index]
        file.write(json.dumps(row) + '\n')

with open(test_dataset_path, 'w') as file:
    for index in tqdm(range(0, len(test_data_list)), desc="Generating Test JSONL File"):
        row = test_data_list[index]
        file.write(json.dumps(row) + '\n')
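
The split above keys on `id`, so that all six pairs derived from the same message land on the same side and no translation of a test message leaks into training. A stripped-down sketch of that idea, sampling directly from the ids that are present, is shown below. `split_by_id` is a hypothetical helper and not the exact code that produced the split used in this post.

```python
import random


def split_by_id(rows, test_fraction=0.1, seed=42):
    """Split rows into (train, test) so rows sharing an `id` never straddle the split."""
    ids = sorted({row["id"] for row in rows})
    rng = random.Random(seed)
    test_ids = set(rng.sample(ids, k=int(len(ids) * test_fraction)))
    train = [row for row in rows if row["id"] not in test_ids]
    test = [row for row in rows if row["id"] in test_ids]
    return train, test
```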

And the final part was to convert these train and test sets into HuggingFace datasets, so that we can easily use them in Axolotl.

In [None]:
#| code-fold: false

from datasets import Dataset
import pandas as pd

train_dataset_df = pd.read_json(train_dataset_path, lines=True).astype(str)
test_dataset_df = pd.read_json(test_dataset_path, lines=True).astype(str)

train_dataset = Dataset.from_pandas(train_dataset_df)
test_dataset = Dataset.from_pandas(test_dataset_df)

train_dataset.save_to_disk('translation-dataset-v3-train.hf')
test_dataset.save_to_disk('translation-dataset-v3-test.hf')

# Finetuning Llama 3 8B using Axolotl

To give you a brief intro about Axolotl, it is a tool designed to streamline LLM finetuning. I like Axolotl because it lets you focus on the data instead of the finetuning code, while having the best finetuning practices built in.

In this project, I used a pretty simple finetuning configuration that I'll provide below. To summarize what the configuration entails:
1. It loads Llama 3 8B in 8-bit
2. It uses the [alpaca format](https://github.com/tatsu-lab/stanford_alpaca) for finetuning
3. It finetunes a [LoRA](https://sebastianraschka.com/blog/2023/llm-finetuning-lora.html) adapter instead of doing a full finetune
4. It uses [sample packing](https://axolotl-ai-cloud.github.io/axolotl/docs/multipack.html) to improve finetuning efficiency
5. It trains the model for 2 epochs, while running 10 evals per epoch
6. It logs the train and eval loss to Weights & Biases
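
To make the alpaca format listed above more concrete, the sketch below renders one dataset row with the alpaca prompt template, which is the same text that appears in the inference templates later in this post. `format_alpaca` is a hypothetical helper for illustration only: Axolotl does this formatting and the tokenization internally, and with `train_on_inputs: false` the loss is computed on the response part only.

```python
ALPACA_PROMPT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def format_alpaca(row):
    """Return (prompt, completion) for one row with instruction/input/output keys."""
    prompt = ALPACA_PROMPT.format(instruction=row["instruction"], input=row["input"])
    return prompt, row["output"]


prompt, completion = format_alpaca({
    "instruction": "Translate the following text to Egyptian Arabic.",
    "input": "Good morning",
    "output": "صباح الخير",
})
print(prompt + completion)
```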

The finetuning was carried out on a single A5000 GPU (24 GB VRAM) on [Jarvis Labs](https://jarvislabs.ai/) at $0.49/hr and took around 10 hours to complete, so roughly $5 in total. You can check the Weights & Biases log [here](https://wandb.ai/ahmedsamirio/en_eg_translator/runs/hwzxxt0r).

For more info about Axolotl, I highly recommend the [documentation](https://axolotl-ai-cloud.github.io/axolotl/) and this [short video guide](https://www.youtube.com/watch?v=HAYPoeC41fw) by [Jarvis Labs](https://jarvislabs.ai/), which shows how to spin up an instance that runs Axolotl over there.

```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
  - path: translation-dataset-v3-train.hf
    type: alpaca
    train_on_split: train

test_datasets:
  - path: translation-dataset-v3-test.hf
    type: alpaca
    split: train

dataset_prepared_path: ./last_run_prepared
output_dir: ./llama_3_translator
hub_model_id: ahmedsamirio/llama_3_translator_v3


sequence_len: 2048
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false

adapter: lora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

wandb_project: en_eg_translator
wandb_entity: ahmedsamirio
wandb_name: llama_3_en_eg_translator_v3

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 2
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 10
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```
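
Two quantities implied by this configuration are worth spelling out. The sketch below computes the effective batch size per optimizer step and the LoRA scaling factor, assuming a single GPU and the standard LoRA scaling of `lora_alpha / lora_r`. The variable names simply mirror the config keys.

```python
# Values copied from the config above.
micro_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1  # the run used a single A5000
sequence_len = 2048
lora_r = 32
lora_alpha = 16

# Packed sequences consumed per optimizer step.
effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus

# Upper bound on tokens per step, since sample packing fills each sequence up to sequence_len.
max_tokens_per_step = effective_batch_size * sequence_len

# Standard LoRA multiplies the adapter update by alpha / r.
lora_scaling = lora_alpha / lora_r

print(effective_batch_size, max_tokens_per_step, lora_scaling)  # → 8 16384 0.5
```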


# Comparing the finetuned model with GPT-4o

Of course, the comparison here will go in favor of GPT-4o, but since we were aiming to emulate its performance, let's make a comparison to see how far off our finetuned model is.

I'll use three random sample responses from the [alpaca dataset](https://huggingface.co/datasets/tatsu-lab/alpaca).

In [79]:
alpaca_sample = [
    """I had to make a difficult decision when I was working as a project manager at a construction company. I was in charge of a project that needed to be completed by a certain date in order to meet the client’s expectations. However, due to unexpected delays, we were not able to meet the deadline and so I had to make a difficult decision. I decided to extend the deadline, but I had to stretch the team’s resources even further and increase the budget. Although it was a risky decision, I ultimately decided to go ahead with it to ensure that the project was completed on time and that the client’s expectations were met. The project was eventually successfully completed and this was seen as a testament to my leadership and decision-making abilities.""",
    """There are several factors that contribute to an individual's success, such as hard work and dedication, effective communication skills, positive attitude, good time management, a clear vision and specific goals, problem-solving and decision-making skills, willingness to take risks, resilience and adaptability, prioritization and organization, proactivity, self-motivation, personal growth, and the ability to collaborate with others.""",
    """Cats and dogs are both beloved pets, but they have important differences. Dogs are typically more outgoing and energetic, while cats are considered more independent. Dogs tend to be more social and active, enjoying walks and playing with other animals. Cats, on the other hand, tend to be more solitary, preferring to relax and snuggle up in a warm spot. Dogs typically require more care and attention, while cats are more self-sufficient. Despite these differences, cats and dogs remain popular and loving pets."""
]

In [7]:
#| code-fold: false

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("ahmedsamirio/Egyptian-Arabic-Translator-Llama-3-8B")
model = AutoModelForCausalLM.from_pretrained("ahmedsamirio/Egyptian-Arabic-Translator-Llama-3-8B", load_in_8bit=True, device_map="cuda")
pipe = pipeline(task='text-generation', model=model, tokenizer=tokenizer)

In [83]:
#| code-fold: false

ar_template = """<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Translate the following text to Arabic.

### Input:
{text}

### Response:
"""

eg_template = """<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Translate the following text to Egyptian Arabic.

### Input:
{text}

### Response:
"""

def get_output(prompt, max_new_tokens):
    out = pipe(prompt, max_new_tokens=max_new_tokens, do_sample=False, temperature=None)
    return out[0]['generated_text'].split("### Response:\n")[-1]

def get_ft_model_translations(text):
    eg_text = get_output(eg_template.format(text=text), 512)
    translations = {"eg": eg_text}
    return translations

def get_gpt4o_translations(text):
    response = client.chat.completions.create(
      model="gpt-4o",
      messages=[{"role": "system", "content": SYSTEM_PROMPT}, 
                {"role": "user", "content": f"Translate the following text:\n{text}"}],
      temperature=1,
      max_tokens=512,
      top_p=1
    )
    translations = convert_text_to_dict(response.choices[0].message.content)
    return translations

def compare_translations(text):
    ft_translations = get_ft_model_translations(text)
    gpt4o_translations = get_gpt4o_translations(text)
    
    print("Original Text:\n")
    print(text)
    print()
    
    print('GPT-4o Translation:\n')
    print(gpt4o_translations['eg'])
    print()
    
    print('Fintuned Model Translation:\n')
    print(ft_translations['eg'])
    print()

### Sample 1

In [74]:
compare_translations(alpaca_sample[0])

Original Text:

I had to make a difficult decision when I was working as a project manager at a construction company. I was in charge of a project that needed to be completed by a certain date in order to meet the client’s expectations. However, due to unexpected delays, we were not able to meet the deadline and so I had to make a difficult decision. I decided to extend the deadline, but I had to stretch the team’s resources even further and increase the budget. Although it was a risky decision, I ultimately decided to go ahead with it to ensure that the project was completed on time and that the client’s expectations were met. The project was eventually successfully completed and this was seen as a testament to my leadership and decision-making abilities.

GPT-4o Translation:

اضطريت آخد قرار صعب وأنا كنت شغال كمدير مشروع في شركة مقاولات. كنت مسؤول عن مشروع لازم يخلص في معاد معين عشان نرضي العميل. بس بسبب تأخيرات غير متوقعة، ماقدرناش نلتزم بالميعاد، فكان لازم آخد قرار صعب. قررت أمد ال

### Sample 2

In [84]:
compare_translations(alpaca_sample[1])

Original Text:

There are several factors that contribute to an individual's success, such as hard work and dedication, effective communication skills, positive attitude, good time management, a clear vision and specific goals, problem-solving and decision-making skills, willingness to take risks, resilience and adaptability, prioritization and organization, proactivity, self-motivation, personal growth, and the ability to collaborate with others.

GPT-4o Translation:

في عوامل كتير بتساهم في نجاح الشخص، زي الشغل الجامد والاجتهاد، مهارات التواصل الفعّالة، النظرة الإيجابية، إدارة الوقت بشكل كويس، رؤية واضحة وأهداف محددة، مهارات حل المشاكل واتخاذ القرار، الرغبة في المخاطرة، المرونة والتكيف، الأولويات والتنظيم، المبادرة، التحفيز الذاتي، النمو الشخصي، والقدرة على التعاون مع الناس التانية.

Fintuned Model Translation:

فيه عوامل كتير بتساهم في نجاح الشخص، زي الشغل الجاد والتفاني، مهارات التواصل الفعّالة، المزاج الإيجابي، إدارة الوقت بشكل كويس، رؤية واضحة وأهداف محددة، مهارات حل المشاكل واتخ

### Sample 3

In [85]:
compare_translations(alpaca_sample[2])

Original Text:

Cats and dogs are both beloved pets, but they have important differences. Dogs are typically more outgoing and energetic, while cats are considered more independent. Dogs tend to be more social and active, enjoying walks and playing with other animals. Cats, on the other hand, tend to be more solitary, preferring to relax and snuggle up in a warm spot. Dogs typically require more care and attention, while cats are more self-sufficient. Despite these differences, cats and dogs remain popular and loving pets.

GPT-4o Translation:

القطط والكلاب الحيوانات دي الاتنين محبوبين، بس في اختلافات مهمة بينهم. الكلاب عادةً بتكون أكثر انفتاح ونشاط، في حين إن القطط بتحب تستقل. الكلاب بتحب الاختلاط وبتكون نشيطة، بتمبسط من المشي واللعب مع الحيوانات التانية. لكن القطط بتميل للعزلة، وبتحب تسترخى وتتمدد في مكان دافئ. الكلاب بتطلب رعاية واهتمام أكتر، بس القطط بتعتمد على نفسها أكتر. رغم الاختلافات دي، القطط والكلاب لسه حيوانات أليفة محبوبة وشعبية.

Fintuned Model Translation:

القطط والكلاب

# Conclusion

As you can see, the results are still not that great. However, the finetuned model is almost catching up with GPT-4o, and I believe that with some minor tweaks to the finetuning methodology, we can achieve performance close to that of GPT-4o.

# TL;DR

- Used GPT-4o to create translation pairs from English to Modern Standard Arabic and then to Egyptian Arabic.
- Generated a dataset from the OpenAssistant/oasst2 messages using GPT-4o and OpenAI's batch API.
- Prepared and processed the dataset for fine-tuning.
- Fine-tuned Llama-3-8B using Axolotl, focusing on LoRA adapters and sample packing.
- Evaluated the fine-tuned model against GPT-4o using sample texts.
- Found that while the fine-tuned model isn't perfect, it's a significant step towards accessible Egyptian Arabic translations.