# Roberta-aegyptustranslit-classifier

## Overview
A fine-tuned RoBERTa-base model for classifying Ancient Egyptian transliterations into their respective historical time periods ('Predynastic & Early Dynastic', 'Old Kingdom & First Intermediate', 'Middle Kingdom & Second Intermediate', 'New Kingdom & Third Intermediate', 'Late Period & Greco-Roman Egypt').
## Training Results

| Metric | Value |
|--------|-------|
| **Macro F1** | ~0.562 |
| **Weighted F1** | ~0.567 |
| **Validation Loss** | ~1.5295 |
| **Epochs** | 20 |
| **Learning Rate** | 2e-5 |
| **Batch Size** | 64 |

## Per-class F1 Scores (balanced validation set, final epoch)
- **Predynastic & Early Dynastic:** F1 = 0.576
- **Old Kingdom & First Intermediate:** F1 = 0.432
- **Middle Kingdom & Second Intermediate:** F1 = 0.468
- **New Kingdom & Third Intermediate:** F1 = 0.713
- **Late Period & Greco-Roman Egypt:** F1 = 0.608

## Intended Use & Limitations

This model is designed for **historical text classification** and is intended for exploratory research and as a performance baseline.

Current constraints include:

- **Data limitations**: the balanced training set (9,500 samples; 2,375 per period before the 80/10/10 train/validation/test split) may not represent all orthographic variations.
- **Period bias**: Middle Kingdom classification (F1 = 0.47) underperforms due to orthographic overlap with neighboring periods; Old Kingdom classification (F1 = 0.43) is weaker still.
- **Best practices**: Always verify critical classifications against primary sources.

**Roadmap**:  
- Expand to corpus samples (balanced & unbalanced)

## Data Used For Training

Thesaurus Linguae Aegyptiae, Late Egyptian sentences, corpus v19, premium, https://huggingface.co/datasets/thesaurus-linguae-aegyptiae/tla-late_egyptian-v19-premium, v1.0, 1/19/2025 ed. by Tonio Sebastian Richter & Daniel A. Werning on behalf of the Berlin-Brandenburgische Akademie der Wissenschaften and Hans-Werner Fischer-Elfert & Peter Dils on behalf of the Sächsische Akademie der Wissenschaften zu Leipzig.
Thesaurus Linguae Aegyptiae, Original Earlier Egyptian sentences, corpus v18, premium, https://huggingface.co/datasets/thesaurus-linguae-aegyptiae/tla-Earlier_Egyptian_original-v18-premium, v1.1, 2/16/2024 ed. by Tonio Sebastian Richter & Daniel A. Werning on behalf of the Berlin-Brandenburgische Akademie der Wissenschaften and Hans-Werner Fischer-Elfert & Peter Dils on behalf of the Sächsische Akademie der Wissenschaften zu Leipzig.
Thesaurus Linguae Aegyptiae, Demotic sentences, corpus v18, premium, https://huggingface.co/datasets/thesaurus-linguae-aegyptiae/tla-demotic-v18-premium, v1.1, 2/16/2024 ed. by Tonio Sebastian Richter & Daniel A. Werning on behalf of the Berlin-Brandenburgische Akademie der Wissenschaften and Hans-Werner Fischer-Elfert & Peter Dils on behalf of the Sächsische Akademie der Wissenschaften zu Leipzig.

## Usage
```python
# Using a pipeline
from transformers import pipeline

classifier = pipeline("text-classification", model="RamzyBakir/roberta-aegyptustranslit-classifier")
classifier("bn-ꞽw n rmṯ-ḫm ꞽn ꜥn pꜣ nty ḫꜣꜥ pꜣ myṱ r-ḏbꜣ swg")

# Loading the model directly
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("RamzyBakir/roberta-aegyptustranslit-classifier")
tokenizer = AutoTokenizer.from_pretrained("RamzyBakir/roberta-aegyptustranslit-classifier")

inputs = tokenizer("bn-ꞽw n rmṯ-ḫm ꞽn ꜥn pꜣ nty ḫꜣꜥ pꜣ myṱ r-ḏbꜣ swg", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(dim=-1).item()])
```

```python
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
from transformers import DataCollatorWithPadding

import numpy as np
```



## Data Loading and Preparation

```python
def label_dates(df):
    # Period boundaries in years (negative values are BCE)
    periods = [
        {"name": "Predynastic & Early Dynastic", "start": -4300, "end": -2675},
        {"name": "Old Kingdom & First Intermediate", "start": -2675, "end": -1980},
        {"name": "Middle Kingdom & Second Intermediate", "start": -1980, "end": -1539},
        {"name": "New Kingdom & Third Intermediate", "start": -1539, "end": -656},
        {"name": "Late Period & Greco-Roman Egypt", "start": -664, "end": 642}
    ]

    # Rows whose date range overlaps no period keep the label "Unknown"
    df['datelabel'] = "Unknown"

    # Assign each row the first period whose [start, end] overlaps the row's
    # [dateNotBefore, dateNotAfter] range. Because the first match wins, a range
    # spanning several periods (including the -664 to -656 overlap between the
    # New Kingdom and Late Period boundaries) gets the earliest of them.
    for i in df.index:
        for period in periods:
            if df.at[i, 'dateNotBefore'] <= period["end"] and df.at[i, 'dateNotAfter'] >= period["start"]:
                df.at[i, 'datelabel'] = period["name"]
                break  # Take the first matching period and stop checking

    return df
```
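The row-by-row loop is slow on tens of thousands of rows. Below is a vectorized sketch of the same rule (first period whose range overlaps `[dateNotBefore, dateNotAfter]`, else "Unknown"), built on `numpy.select`, which returns the choice of the first true condition and so mirrors the loop's `break`. `PERIODS` and `label_dates_vectorized` are hypothetical names, and unlike `label_dates` the sketch returns a labeled copy instead of modifying `df` in place:

```python
import numpy as np
import pandas as pd

# (name, start, end) in years; same boundaries as label_dates above
PERIODS = [
    ("Predynastic & Early Dynastic", -4300, -2675),
    ("Old Kingdom & First Intermediate", -2675, -1980),
    ("Middle Kingdom & Second Intermediate", -1980, -1539),
    ("New Kingdom & Third Intermediate", -1539, -656),
    ("Late Period & Greco-Roman Egypt", -664, 642),
]


def label_dates_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'datelabel' column (first overlapping period, else 'Unknown')."""
    out = df.copy()
    # One boolean mask per period: does [dateNotBefore, dateNotAfter] overlap [start, end]?
    conditions = [
        (out["dateNotBefore"] <= end) & (out["dateNotAfter"] >= start)
        for _, start, end in PERIODS
    ]
    names = [name for name, _, _ in PERIODS]
    # np.select takes the choice of the first True condition, like the loop's `break`
    out["datelabel"] = np.select(conditions, names, default="Unknown")
    return out


sample = pd.DataFrame({
    "dateNotBefore": [-3000, -2000, -660, 100, 1000],
    "dateNotAfter": [-2900, -1900, -658, 200, 1100],
})
print(label_dates_vectorized(sample)["datelabel"].tolist())
```

In the sample, the -2000 to -1900 range overlaps both the Old and the Middle Kingdom and resolves to the earlier period, exactly as in the loop; the 1000 to 1100 range overlaps no period and stays "Unknown".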

```python
import json

import pandas as pd
import requests

early = pd.read_json("hf://datasets/thesaurus-linguae-aegyptiae/tla-Earlier_Egyptian_original-v18-premium/train.jsonl", lines=True)
early = label_dates(early)
late = pd.read_json("hf://datasets/thesaurus-linguae-aegyptiae/tla-late_egyptian-v19-premium/train.jsonl", lines=True)
late = label_dates(late)

# Raw-file URL of the demotic corpus on the Hugging Face Hub
url = "https://huggingface.co/datasets/thesaurus-linguae-aegyptiae/tla-demotic-v18-premium/raw/main/train.jsonl"

response = requests.get(url)
response.raise_for_status()  # stop here instead of failing later with `demotic` undefined

# JSONL: one JSON object per line; flatten all objects into a single DataFrame
records = [json.loads(line) for line in response.text.strip().split("\n")]
demotic = pd.json_normalize(records)

# Missing dates are empty strings. They are replaced with year 0, which
# label_dates() then assigns to "Late Period & Greco-Roman Egypt".
for col in ["dateNotBefore", "dateNotAfter"]:
    demotic[col] = demotic[col].replace("", 0).astype(int)
demotic = label_dates(demotic)
```
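Mapping empty dates to `0` means undated demotic texts are silently labeled Late Period & Greco-Roman Egypt. If that is not intended, one alternative (a sketch, not what this notebook does) is to coerce blanks to `NaN` with `pd.to_numeric(..., errors="coerce")`: every comparison in `label_dates` is then False, the row keeps the label "Unknown", and it can be filtered out before `period_mapping` is applied. `clean_date_columns` is a hypothetical helper:

```python
import pandas as pd


def clean_date_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose date columns are numeric, with blanks coerced to NaN."""
    df = df.copy()
    for col in ("dateNotBefore", "dateNotAfter"):
        # errors="coerce" turns "" (and any other non-numeric value) into NaN;
        # the columns become float, so skip the .astype(int) step afterwards
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


raw = pd.DataFrame({"dateNotBefore": [-332, ""], "dateNotAfter": [-30, ""]})
cleaned = clean_date_columns(raw)
print(cleaned)

# NaN compares False against every period boundary, so label_dates leaves such rows
# as "Unknown"; they can then be dropped, e.g. df[df["datelabel"] != "Unknown"]
```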

```python
merged_corpus_df = pd.concat([
    early[["transliteration", "datelabel"]],
    late[["transliteration", "datelabel"]],
    demotic[["transliteration", "datelabel"]]],
    ignore_index=True)
merged_corpus_df.shape
```

```text
(29762, 2)
```

```python
period_mapping = {
    "Predynastic & Early Dynastic": 0,
    "Old Kingdom & First Intermediate": 1,
    "Middle Kingdom & Second Intermediate": 2,
    "New Kingdom & Third Intermediate": 3,
    "Late Period & Greco-Roman Egypt": 4
}
merged_corpus_df["period_label"] = merged_corpus_df["datelabel"].map(period_mapping)
```

```python
merged_corpus_df = merged_corpus_df.drop("datelabel", axis=1)
merged_corpus_df.head()
```

|   | transliteration | period_label |
|---|-----------------|--------------|
| 0 | nḏ (w)di̯ r =s | 2 |
| 1 | n ṯw ꞽm =sn | 1 |
| 2 | ḫꜣ m tʾ ḥnq.t kꜣ(.PL) ꜣpd(.PL) n ꞽmꜣḫ ꞽm.ꞽ-rʾ-... | 2 |
| 3 | ꜥḥꜥ | 1 |
| 4 | (w)sꞽr wnꞽs m n =k ꞽr.t-ḥr.w ꞽꜥb n =k s(ꞽ) ꞽr ... | 1 |


```python
id2label = {
    0: "Predynastic & Early Dynastic",
    1: "Old Kingdom & First Intermediate",
    2: "Middle Kingdom & Second Intermediate",
    3: "New Kingdom & Third Intermediate",
    4: "Late Period & Greco-Roman Egypt"
}

label2id = {
    "Predynastic & Early Dynastic": 0,
    "Old Kingdom & First Intermediate": 1,
    "Middle Kingdom & Second Intermediate": 2,
    "New Kingdom & Third Intermediate": 3,
    "Late Period & Greco-Roman Egypt": 4
}
```
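`period_mapping`, `id2label`, `label2id` and, later, `label_names` all spell out the same five period names. A small sketch that derives them from one tuple so they cannot drift apart; `PERIOD_NAMES` is a hypothetical name, and the ids follow the chronological order already used above:

```python
PERIOD_NAMES = (
    "Predynastic & Early Dynastic",
    "Old Kingdom & First Intermediate",
    "Middle Kingdom & Second Intermediate",
    "New Kingdom & Third Intermediate",
    "Late Period & Greco-Roman Egypt",
)

# Class id = position in PERIOD_NAMES (chronological order)
id2label = dict(enumerate(PERIOD_NAMES))
label2id = {name: idx for idx, name in id2label.items()}
period_mapping = label2id         # same name-to-id mapping used for the DataFrame column
label_names = list(PERIOD_NAMES)  # target_names order for classification_report

print(id2label)
```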

## Model and Tokenizer Loading

```python
# Load the pretrained encoder with a new 5-way classification head
from transformers import RobertaTokenizerFast, RobertaForSequenceClassification

model_path = "FacebookAI/roberta-base"

model = RobertaForSequenceClassification.from_pretrained(
    model_path,
    num_labels=5,
    id2label=id2label,
    label2id=label2id,
)
tokenizer = RobertaTokenizerFast.from_pretrained(model_path)
```

```text
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at FacebookAI/roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```



```python
with open("text.txt", "r") as f:
    text = f.read()
    print(len(text))  # Number of characters
```

```text
984771
```


## Data Transformation, Labeling and Balancing

```python
# Count the samples per period
for i in merged_corpus_df["period_label"].unique():
    print(id2label[i], "=", (merged_corpus_df["period_label"] == i).sum())

def create_balanced_dataset(df):
    # Randomly undersample every period to the size of the smallest one
    min_count = df["period_label"].value_counts().min()
    subsets = [
        df[df["period_label"] == label].sample(min_count, random_state=123)[["transliteration", "period_label"]]
        for label in sorted(id2label)
    ]
    # Combine the subsets
    return pd.concat(subsets, ignore_index=True)

balanced_df = create_balanced_dataset(merged_corpus_df)
print(balanced_df["period_label"].value_counts())
```

```text
Middle Kingdom & Second Intermediate = 3476
Old Kingdom & First Intermediate = 6945
Predynastic & Early Dynastic = 2375
New Kingdom & Third Intermediate = 3740
Late Period & Greco-Roman Egypt = 13226
period_label
0    2375
1    2375
2    2375
3    2375
4    2375
Name: count, dtype: int64
```
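The same undersampling can also be written with pandas' `DataFrameGroupBy.sample` (pandas 1.1 or later). This is a sketch of an alternative, not the notebook's code: it shares one random generator across all groups, so even with `random_state=123` it will generally pick different rows than the per-class sampling above. `balance_by_undersampling` is a hypothetical name:

```python
import pandas as pd


def balance_by_undersampling(df: pd.DataFrame, label_col: str = "period_label", seed: int = 123) -> pd.DataFrame:
    """Downsample every class to the size of the smallest class."""
    min_count = int(df[label_col].value_counts().min())
    return (
        df.groupby(label_col)
        .sample(n=min_count, random_state=seed)
        .reset_index(drop=True)
    )


toy = pd.DataFrame({
    "transliteration": [f"t{i}" for i in range(10)],
    "period_label": [0] * 5 + [1] * 3 + [2] * 2,
})
print(balance_by_undersampling(toy)["period_label"].value_counts())
```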


```python
print(merged_corpus_df["period_label"].value_counts())
```

```text
period_label
4    13226
1     6945
3     3740
2     3476
0     2375
Name: count, dtype: int64
```


```python
def random_split(df, train_frac, validation_frac):
    # Shuffle the entire DataFrame
    df = df.sample(frac=1, random_state=123).reset_index(drop=True)

    # Calculate split indices
    train_end = int(len(df) * train_frac)
    validation_end = train_end + int(len(df) * validation_frac)

    # Split the DataFrame
    train_df = df[:train_end]
    validation_df = df[train_end:validation_end]
    test_df = df[validation_end:]

    return train_df, validation_df, test_df

utrain_df, uvalidation_df, utest_df = random_split(merged_corpus_df, 0.8, 0.1)
utrain_df.head()
```

|   | transliteration | period_label |
|---|-----------------|--------------|
| 0 | pꜣ nkt nty-ꞽw =k ꞽw dy(.t) s n =s nkt ꜥn wḫꜣ =... | 4 |
| 1 | r nꜣ rd.w n pꜣ sẖ-ḥr sẖ ẖr-rd.wy pꜣ sẖ n-rn =f | 4 |
| 2 | pꜣ wr-ꞽꜣbṱ pꜣ-qll | 4 |
| 3 | tw =f ꞽn =w wꜥ šrṱ n šs-n-nsw mtw =f ꞽ.ꞽr-ḥr =f | 4 |
| 4 | zꜣ =k pw n(.ꞽ) ḏ.t =k n ḏ.t | 1 |


```python
btrain_df, bvalidation_df, btest_df = random_split(balanced_df, 0.8, 0.1)
btrain_df.head()
```

|   | transliteration | period_label |
|---|-----------------|--------------|
| 0 | ta-mn ta pꜣ-ꞽgš sꜣ pꜣ-ꜥlꜥl pꜣ h̭ꜥ(m) mw.t =s t... | 4 |
| 1 | ꞽr =f wꜥb n nꜣ rpy.w n pꜣ tš-n-nw.t | 4 |
| 2 | ḥtp-ḏi̯ nswt ḥtp-ḏi̯ ꞽnp.w ḫnt.ꞽ-zḥ-nṯr | 1 |
| 3 | ḏd.ꞽn nm.tꞽ-nḫt pn | 2 |
| 4 | ḥmw.tꞽ wḥꜥ | 0 |


```python
def preprocess_function(examples):
    # Tokenize a batch of transliterations, truncating to the model's maximum length
    return tokenizer(examples["transliteration"], truncation=True)
```

```python
from datasets import Dataset

# unbalanced
utrain_dataset = Dataset.from_pandas(utrain_df)
uval_dataset = Dataset.from_pandas(uvalidation_df)
utest_dataset = Dataset.from_pandas(utest_df)
# balanced
btrain_dataset = Dataset.from_pandas(btrain_df)
bval_dataset = Dataset.from_pandas(bvalidation_df)
btest_dataset = Dataset.from_pandas(btest_df)
```

```python
utrain_dataset = utrain_dataset.map(preprocess_function, batched=True)
uval_dataset = uval_dataset.map(preprocess_function, batched=True)
utest_dataset = utest_dataset.map(preprocess_function, batched=True)

btrain_dataset = btrain_dataset.map(preprocess_function, batched=True)
bval_dataset = bval_dataset.map(preprocess_function, batched=True)
btest_dataset = btest_dataset.map(preprocess_function, batched=True)
```


```python
# The model's forward() expects the targets as `labels`; the Trainer drops
# columns that forward() does not accept (such as "transliteration")
utrain_dataset = utrain_dataset.rename_column("period_label", "labels")
uval_dataset = uval_dataset.rename_column("period_label", "labels")
utest_dataset = utest_dataset.rename_column("period_label", "labels")
btrain_dataset = btrain_dataset.rename_column("period_label", "labels")
bval_dataset = bval_dataset.rename_column("period_label", "labels")
btest_dataset = btest_dataset.rename_column("period_label", "labels")
```

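```python
# Freeze all encoder (base model) parameters
for name, param in model.base_model.named_parameters():
    param.requires_grad = False

# Intended to unfreeze the pooler. Note: RobertaForSequenceClassification builds its
# encoder with add_pooling_layer=False, so no parameter name contains "pooler" and this
# loop changes nothing; only the classification head (model.classifier) stays trainable.
for name, param in model.base_model.named_parameters():
    if "pooler" in name:
        param.requires_grad = True
```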

```python
from transformers import Trainer, TrainingArguments
from sklearn.metrics import classification_report, accuracy_score, f1_score

label_names = ['Predynastic & Early Dynastic', 'Old Kingdom & First Intermediate', 'Middle Kingdom & Second Intermediate', 'New Kingdom & Third Intermediate', 'Late Period & Greco-Roman Egypt']

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = logits.argmax(axis=-1)

    accuracy = accuracy_score(labels, predictions)
    # Macro F1: unweighted mean of the five per-class F1 scores
    f1 = f1_score(labels, predictions, average="macro")
    report = classification_report(
        labels, predictions, target_names=label_names, output_dict=True, zero_division=0
    )

    # Printed at every evaluation (once per epoch with eval_strategy="epoch")
    print("\nPer-class F1 scores:")
    for label in label_names:
        print(f"{label}: F1 = {report[label]['f1-score']:.3f}")

    return {
        "accuracy": accuracy,
        "f1": f1,
        # Weighted F1: per-class F1 weighted by each class's support
        "weighted_f1": f1_score(labels, predictions, average="weighted")
    }
```
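`compute_metrics` reports two averages of the per-class F1 scores: `"macro"` is their plain mean, and `"weighted"` weights each class by its support (number of true samples). A stdlib sketch of both on a toy example; `f1_averages` is a hypothetical helper that treats an undefined F1 as 0, in the spirit of `zero_division=0`:

```python
def f1_averages(labels, preds, num_classes):
    """Return (per_class, macro, weighted) F1 for integer class labels."""
    per_class, support = [], []
    for c in range(num_classes):
        tp = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        fp = sum(1 for y, p in zip(labels, preds) if y != c and p == c)
        fn = sum(1 for y, p in zip(labels, preds) if y == c and p != c)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN) = 2PR / (P + R)
        per_class.append(2 * tp / denom if denom else 0.0)
        support.append(tp + fn)  # number of true samples of class c
    macro = sum(per_class) / num_classes
    weighted = sum(f * s for f, s in zip(per_class, support)) / sum(support)
    return per_class, macro, weighted


labels = [0, 0, 0, 0, 1, 1, 2]
preds = [0, 0, 0, 1, 1, 2, 2]
per_class, macro, weighted = f1_averages(labels, preds, num_classes=3)
print([round(f, 3) for f in per_class], round(macro, 3), round(weighted, 3))
# [0.857, 0.5, 0.667] 0.675 0.728
```

Weighted F1 exceeds macro F1 here because the best-predicted class also has the largest support. On the balanced validation set the supports are nearly equal, which is why the F1 and Weighted F1 columns of the training logs stay close.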

```python
# Pad each batch dynamically to the length of its longest sequence
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
```

```python
model
```

```text
RobertaForSequenceClassification(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(50265, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-11): 12 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
```

## Model Training

### 1. Model Fine-tuning on Balanced Data

```python
# hyperparameters
lr = 2e-5
batch_size = 64
num_epochs = 20

training_args = TrainingArguments(
    report_to="none",
    output_dir="roberta-aeTranslit-classifier_optimized-new-tokenizer-balanced",
    learning_rate=lr,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=num_epochs,
    logging_strategy="epoch",
    eval_strategy="epoch",
    save_strategy="epoch",
    weight_decay=0.01,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    metric_for_best_model="weighted_f1",
    save_total_limit=10,
    load_best_model_at_end=True,
)
```

```python
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=btrain_dataset,
    eval_dataset=bval_dataset,
    processing_class=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()
```



| Epoch | Training Loss | Validation Loss | Accuracy | F1 (macro) | Weighted F1 |
|------:|--------------:|----------------:|---------:|-----------:|------------:|
| 1 | 1.6127 | 1.611171 | 0.213142 | 0.119119 | 0.111414 |
| 2 | 1.606 | 1.602151 | 0.374895 | 0.272582 | 0.268411 |
| 3 | 1.5984 | 1.594909 | 0.306655 | 0.248548 | 0.244632 |
| 4 | 1.5918 | 1.587975 | 0.419545 | 0.325245 | 0.321192 |
| 5 | 1.5833 | 1.581635 | 0.526537 | 0.513853 | 0.51604 |
| 6 | 1.5771 | 1.574976 | 0.525695 | 0.496804 | 0.502083 |
| 7 | 1.569 | 1.569198 | 0.502949 | 0.464219 | 0.462967 |
| 8 | 1.5635 | 1.564449 | 0.501264 | 0.460828 | 0.460325 |
| 9 | 1.5577 | 1.558656 | 0.513058 | 0.477067 | 0.478311 |
| 10 | 1.5523 | 1.554013 | 0.518955 | 0.495399 | 0.497647 |



Per-class F1 scores printed by `compute_metrics`, in the order they appear in the notebook output:

| Predynastic & Early Dynastic | Old Kingdom & First Intermediate | Middle Kingdom & Second Intermediate | New Kingdom & Third Intermediate | Late Period & Greco-Roman Egypt |
|------:|------:|------:|------:|------:|
| 0.273 | 0.000 | 0.322 | 0.000 | 0.000 |
| 0.606 | 0.000 | 0.281 | 0.476 | 0.000 |
| 0.409 | 0.008 | 0.353 | 0.472 | 0.000 |
| 0.593 | 0.088 | 0.354 | 0.560 | 0.031 |
| 0.590 | 0.287 | 0.495 | 0.667 | 0.531 |
| 0.574 | 0.386 | 0.322 | 0.639 | 0.563 |
| 0.591 | 0.283 | 0.438 | 0.667 | 0.407 |
| 0.571 | 0.415 | 0.394 | 0.668 | 0.429 |
| 0.570 | 0.361 | 0.454 | 0.684 | 0.493 |
| 0.576 | 0.385 | 0.444 | 0.691 | 0.501 |
| 0.576 | 0.322 | 0.460 | 0.686 | 0.525 |
| 0.574 | 0.332 | 0.473 | 0.730 | 0.561 |
| 0.578 | 0.392 | 0.463 | 0.728 | 0.617 |
| 0.584 | 0.362 | 0.462 | 0.726 | 0.591 |
| 0.587 | 0.408 | 0.467 | 0.726 | 0.625 |
| 0.580 | 0.404 | 0.454 | 0.714 | 0.597 |
| 0.578 | 0.428 | 0.444 | 0.708 | 0.616 |
| 0.576 | 0.432 | 0.468 | 0.713 | 0.608 |


```text
TrainOutput(global_step=1500, training_loss=1.5567582244873046, metrics={'train_runtime': 1342.6631, 'train_samples_per_second': 141.51, 'train_steps_per_second': 1.117, 'total_flos': 1.6795908173372832e+16, 'train_loss': 1.5567582244873046, 'epoch': 20.0})
```

```python
print(f"Best model checkpoint: {trainer.state.best_model_checkpoint}")
```

```text
Best model checkpoint: roberta-aeTranslit-classifier_optimized-new-tokenizer-balanced/checkpoint-1275
```


```python
import os

save_dir = "/kaggle/working/roberta-aegyptustranslit-classifier-balanced"
os.makedirs(save_dir, exist_ok=True)

trainer.save_model(save_dir)
tokenizer.save_pretrained(save_dir)
```

```text
('/kaggle/working/roberta-aegyptustranslit-classifier-balanced/tokenizer_config.json',
 '/kaggle/working/roberta-aegyptustranslit-classifier-balanced/special_tokens_map.json',
 '/kaggle/working/roberta-aegyptustranslit-classifier-balanced/vocab.json',
 '/kaggle/working/roberta-aegyptustranslit-classifier-balanced/merges.txt',
 '/kaggle/working/roberta-aegyptustranslit-classifier-balanced/added_tokens.json',
 '/kaggle/working/roberta-aegyptustranslit-classifier-balanced/tokenizer.json')
```

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="/kaggle/working/roberta-aegyptustranslit-classifier-balanced")
```


```python
# Note: plain assignment makes an alias, not a copy, so the columns added
# below also appear on btest_df itself
btest_df_copy = btest_df
```

```python
# Predict the period of each balanced test sample (one pipeline call per row)
bpreds = []
for cell in btest_df_copy["transliteration"]:
    bpreds.append(classifier(cell)[0]['label'])

btest_df_copy["preds"] = bpreds
```


```python
# Map the true class ids to label names so they can be compared with the pipeline's predictions
btest_df_copy['period_label_text'] = btest_df_copy['period_label'].map(id2label)

# Calculate accuracy
correct_predictions = (btest_df_copy['period_label_text'] == btest_df_copy['preds']).sum()
total_predictions = len(btest_df_copy)
accuracy = (correct_predictions / total_predictions) * 100

# Display results
print(f"Total samples: {total_predictions}")
print(f"Correct predictions: {correct_predictions}")
print(f"Accuracy: {accuracy:.2f}%")
```

```text
Total samples: 1188
Correct predictions: 679
Accuracy: 57.15%
```
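A single accuracy figure hides which periods get confused with which. A pandas sketch that breaks the comparison down by true period and builds a confusion table with `pd.crosstab`; `per_period_accuracy` and `confusion_table` are hypothetical helpers, and the commented usage lines assume the `period_label_text` and `preds` columns created above:

```python
import pandas as pd


def per_period_accuracy(df: pd.DataFrame, true_col: str, pred_col: str) -> pd.Series:
    """Share of rows of each true period that were predicted correctly (per-class recall)."""
    correct = df[true_col] == df[pred_col]
    return correct.groupby(df[true_col]).mean()


def confusion_table(df: pd.DataFrame, true_col: str, pred_col: str) -> pd.DataFrame:
    """Rows: true period, columns: predicted period, cells: counts."""
    return pd.crosstab(df[true_col], df[pred_col])


toy = pd.DataFrame({
    "gold": ["Old Kingdom", "Old Kingdom", "New Kingdom", "New Kingdom", "New Kingdom"],
    "pred": ["Old Kingdom", "New Kingdom", "New Kingdom", "New Kingdom", "Old Kingdom"],
})
print(per_period_accuracy(toy, "gold", "pred"))
print(confusion_table(toy, "gold", "pred"))

# Usage with the test predictions above (not executed here):
# per_period_accuracy(btest_df_copy, "period_label_text", "preds")
# confusion_table(btest_df_copy, "period_label_text", "preds")
```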


### 2. Model Fine-tuning on Unbalanced, Class-Weighted Data

```python
import torch
from torch.nn import CrossEntropyLoss
from sklearn.utils.class_weight import compute_class_weight

# "balanced" class weights: n_samples / (n_classes * samples in that class)
class_weights = compute_class_weight(
    "balanced",
    classes=np.unique(utrain_df["period_label"]),
    y=utrain_df["period_label"]
)
class_weights = torch.tensor(class_weights, dtype=torch.float32).to('cuda')
class_weights
```

```text
tensor([2.5115, 0.8517, 1.6910, 1.5985, 0.4523], device='cuda:0')
```
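The `"balanced"` heuristic gives each class the weight `n_samples / (n_classes * n_c)`, so every class contributes the same total weight to the loss. A stdlib sketch of that formula (`balanced_class_weights` is a hypothetical helper); it uses the full-corpus counts from the `value_counts()` output earlier rather than the 80% training split, so its values come out close to, but not identical with, the tensor above:

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight per class: n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}


# Full-corpus counts per period_label (not the training split used above)
full_counts = {0: 2375, 1: 6945, 2: 3476, 3: 3740, 4: 13226}
labels = [c for c, n in full_counts.items() for _ in range(n)]

weights = balanced_class_weights(labels)
for c, w in weights.items():
    print(c, round(w, 4))
```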

```python
# Trainer subclass that applies the class weights in the cross-entropy loss
class WeightedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        labels = inputs.get("labels")
        # Forward pass
        outputs = model(**inputs)
        logits = outputs.get('logits')
        # Weighted cross-entropy; keep the weights on the same device as the logits
        loss_fct = CrossEntropyLoss(weight=class_weights.to(logits.device))
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
```
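For reference, `torch.nn.CrossEntropyLoss(weight=w)` with its default `reduction="mean"` scales each sample's negative log-likelihood by the weight of its true class and divides by the sum of those weights rather than by the batch size. A pure-Python sketch of that computation (`weighted_cross_entropy` is a hypothetical helper with no PyTorch dependency) makes the effect of `class_weights` visible:

```python
import math


def weighted_cross_entropy(logits, labels, weights):
    """Class-weighted mean cross-entropy over a batch.

    Each sample's negative log-likelihood is scaled by the weight of its true class,
    and the sum is divided by the total weight of the targets (not by the batch size).
    """
    weighted_sum, weight_total = 0.0, 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the max logit for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        nll = log_sum_exp - row[y]  # -log softmax(row)[y]
        weighted_sum += weights[y] * nll
        weight_total += weights[y]
    return weighted_sum / weight_total


logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
labels = [0, 2]
print(weighted_cross_entropy(logits, labels, [1.0, 1.0, 1.0]))  # plain mean
print(weighted_cross_entropy(logits, labels, [1.0, 1.0, 3.0]))  # class 2 counts 3x
```

With a weight of 3 on class 2, the second sample counts three times as much in the mean; this is how the larger weights give the rarer periods more influence during the weighted run.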

```python
# hyperparameters
lr = 2e-5
batch_size = 64
num_epochs = 10

training_args2 = TrainingArguments(
    report_to="none",
    output_dir="roberta-aeTranslit-classifier_optimized-unbalanced",
    learning_rate=lr,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=num_epochs,
    logging_strategy="epoch",
    eval_strategy="epoch",
    save_strategy="epoch",
    weight_decay=0.01,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    metric_for_best_model="weighted_f1",
    save_total_limit=10,
    load_best_model_at_end=True,
)
```

```python
training_args2
```

```text
TrainingArguments(
_n_gpu=2,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
average_tokens_across_devices=False,
batch_eval_metrics=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=None,
eval_strategy=IntervalStrategy.EPOCH,
eval_use_gather_object=False
```

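```python
# `model` is the same object trained in run 1 (with its best checkpoint reloaded by
# load_best_model_at_end), so this run continues from those weights.
# bval_dataset was split from balanced_df, itself a subsample of merged_corpus_df,
# so many of its rows are likely also present in utrain_dataset.
trainer2 = WeightedTrainer(
    model=model,
    args=training_args2,
    train_dataset=utrain_dataset,
    eval_dataset=bval_dataset,
    processing_class=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer2.train()
```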



| Epoch | Training Loss | Validation Loss | Accuracy | F1 (macro) | Weighted F1 |
|------:|--------------:|----------------:|---------:|-----------:|------------:|
| 1 | 1.523 | 1.493049 | 0.573715 | 0.559713 | 0.565764 |
| 2 | 1.5074 | 1.477557 | 0.570345 | 0.559061 | 0.5666 |
| 3 | 1.4888 | 1.445254 | 0.576243 | 0.555274 | 0.562858 |
| 4 | 1.4735 | 1.430102 | 0.583825 | 0.570361 | 0.577322 |
| 5 | 1.4587 | 1.405639 | 0.566133 | 0.538545 | 0.545861 |
| 6 | 1.449 | 1.401579 | 0.580455 | 0.555526 | 0.563785 |
| 7 | 1.4358 | 1.386338 | 0.583825 | 0.568242 | 0.575027 |
| 8 | 1.4324 | 1.381745 | 0.58214 | 0.559159 | 0.567132 |
| 9 | 1.4289 | 1.377783 | 0.583825 | 0.56107 | 0.568925 |
| 10 | 1.4241 | 1.374511 | 0.581297 | 0.557174 | 0.565235 |



Per-class F1 scores printed by `compute_metrics` at the end of each epoch:

| Epoch | Predynastic & Early Dynastic | Old Kingdom & First Intermediate | Middle Kingdom & Second Intermediate | New Kingdom & Third Intermediate | Late Period & Greco-Roman Egypt |
|------:|------:|------:|------:|------:|------:|
| 1 | 0.575 | 0.359 | 0.450 | 0.735 | 0.679 |
| 2 | 0.579 | 0.482 | 0.332 | 0.752 | 0.650 |
| 3 | 0.573 | 0.479 | 0.330 | 0.732 | 0.662 |
| 4 | 0.578 | 0.501 | 0.368 | 0.740 | 0.664 |
| 5 | 0.582 | 0.493 | 0.289 | 0.699 | 0.630 |
| 6 | 0.588 | 0.469 | 0.302 | 0.752 | 0.667 |
| 7 | 0.587 | 0.473 | 0.377 | 0.745 | 0.660 |
| 8 | 0.585 | 0.479 | 0.316 | 0.751 | 0.664 |
| 9 | 0.590 | 0.481 | 0.319 | 0.753 | 0.661 |
| 10 | 0.588 | 0.479 | 0.308 | 0.748 | 0.664 |

```text
TrainOutput(global_step=1870, training_loss=1.4621599003592916, metrics={'train_runtime': 1657.6925, 'train_samples_per_second': 143.627, 'train_steps_per_second': 1.128, 'total_flos': 2.2187205079291656e+16, 'train_loss': 1.4621599003592916, 'epoch': 10.0})
```

```python
print(f"Best model checkpoint: {trainer2.state.best_model_checkpoint}")
```

```text
Best model checkpoint: roberta-aeTranslit-classifier_optimized-unbalanced/checkpoint-748
```
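The checkpoint numbers can be translated into epochs with a little arithmetic. This sketch assumes what the notebook output suggests: two GPUs (`_n_gpu=2` in the `TrainingArguments` printout) with a per-device batch of 64, no gradient accumulation, and training splits of `int(0.8 * 11875) = 9500` balanced rows and `int(0.8 * 29762) = 23809` unbalanced rows from `random_split`. Under those assumptions it reproduces the logged `global_step` values (1500 and 1870) and places `checkpoint-1275` after epoch 17 and `checkpoint-748` after epoch 4; `steps_per_epoch` is a hypothetical helper:

```python
import math


def steps_per_epoch(n_train: int, per_device_batch: int, n_gpus: int) -> int:
    # Each optimizer step consumes per_device_batch * n_gpus samples; the final,
    # smaller batch still counts because dataloader_drop_last=False
    return math.ceil(n_train / (per_device_batch * n_gpus))


balanced = steps_per_epoch(9500, per_device_batch=64, n_gpus=2)
weighted = steps_per_epoch(23809, per_device_batch=64, n_gpus=2)

print(balanced, balanced * 20, 1275 / balanced)  # 75 1500 17.0
print(weighted, weighted * 10, 748 / weighted)   # 187 1870 4.0
```

So the weighted run's best checkpoint comes from epoch 4, which matches the highest Weighted F1 (0.577322) in its training log.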


```python
import os

# Define save directory
save_dir = "/kaggle/working/roberta-aegyptustranslit-classifier-unbalanced-weighted"
os.makedirs(save_dir, exist_ok=True)

# Save model weights and config (recent transformers versions write model.safetensors by default)
trainer2.save_model(save_dir)
tokenizer.save_pretrained(save_dir)
```

```text
('/kaggle/working/roberta-aegyptustranslit-classifier-unbalanced-weighted/tokenizer_config.json',
 '/kaggle/working/roberta-aegyptustranslit-classifier-unbalanced-weighted/special_tokens_map.json',
 '/kaggle/working/roberta-aegyptustranslit-classifier-unbalanced-weighted/vocab.json',
 '/kaggle/working/roberta-aegyptustranslit-classifier-unbalanced-weighted/merges.txt',
 '/kaggle/working/roberta-aegyptustranslit-classifier-unbalanced-weighted/added_tokens.json',
 '/kaggle/working/roberta-aegyptustranslit-classifier-unbalanced-weighted/tokenizer.json')
```

```python
uwclassifier = pipeline("text-classification", model="/kaggle/working/roberta-aegyptustranslit-classifier-unbalanced-weighted")
```


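```python
# Fresh copy of the balanced test set with only the input text and the true label
copy_btest_df2 = btest_df[["transliteration", "period_label"]].copy()
copy_btest_df2.head()
```

|       | transliteration | period_label |
|-------|-----------------|--------------|
| 10687 | ꜥnḫ by =s m-bꜣḥ wsꞽr ḫnṱ-ꞽmnṱ nṯr-ꜥꜣ nb-ꞽbt | 4 |
| 10688 | ngꜣy | 1 |
| 10689 | n rḫ =ꞽ st | 2 |
| 10690 | nfr-ẖn(.tꞽ)-n-nswt | 1 |
| 10691 | ḥtp ꞽb ⸗f ḥr mꜣꜥ.t | 3 |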


```python
uwpreds = []
for cell in copy_btest_df2["transliteration"]:
    uwpreds.append(uwclassifier(cell)[0]['label'])

copy_btest_df2["preds"] = uwpreds
```

```python
# Map the true class ids to label names
copy_btest_df2['period_label_text'] = copy_btest_df2['period_label'].map(id2label)

# Calculate accuracy
correct_predictions = (copy_btest_df2['period_label_text'] == copy_btest_df2['preds']).sum()
total_predictions = len(copy_btest_df2)
accuracy = (correct_predictions / total_predictions) * 100

# Display results
print(f"Total samples: {total_predictions}")
print(f"Correct predictions: {correct_predictions}")
print(f"Accuracy: {accuracy:.2f}%")
```

```text
Total samples: 1188
Correct predictions: 680
Accuracy: 57.24%
```


```python
# Unbalanced test data (an explicit copy, so utest_df itself is left unchanged)
copy_utest_df2 = utest_df.copy()

uwpreds2 = []
for cell in copy_utest_df2["transliteration"]:
    uwpreds2.append(uwclassifier(cell)[0]['label'])

copy_utest_df2["preds"] = uwpreds2

copy_utest_df2['period_label_text'] = copy_utest_df2['period_label'].map(id2label)

# Calculate accuracy
correct_predictions = (copy_utest_df2['period_label_text'] == copy_utest_df2['preds']).sum()
total_predictions = len(copy_utest_df2)
accuracy = (correct_predictions / total_predictions) * 100

# Display results
print(f"Total samples: {total_predictions}")
print(f"Correct predictions: {correct_predictions}")
print(f"Accuracy: {accuracy:.2f}%")
```

```text
Total samples: 2977
Correct predictions: 1614
Accuracy: 54.22%
```
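Because `balanced_df` is drawn from `merged_corpus_df` and the two are split independently, rows of the balanced validation and test sets can also occur in the unbalanced training split used for the weighted run. A quick sketch for measuring that overlap by exact transliteration match; `overlap_fraction` is a hypothetical helper, and exact string matching also counts sentences that are genuinely duplicated in the corpus:

```python
import pandas as pd


def overlap_fraction(eval_df: pd.DataFrame, train_df: pd.DataFrame, col: str = "transliteration") -> float:
    """Fraction of evaluation rows whose exact text also occurs in the training split."""
    return float(eval_df[col].isin(train_df[col]).mean())


train = pd.DataFrame({"transliteration": ["ngꜣy", "nfr", "ꜥnḫ"]})
held_out = pd.DataFrame({"transliteration": ["ngꜣy", "ḥtp", "ꜥnḫ", "n rḫ =ꞽ st"]})
print(overlap_fraction(held_out, train))  # 0.5

# Usage with the splits above (not executed here):
# overlap_fraction(btest_df, utrain_df)
# overlap_fraction(bvalidation_df, utrain_df)
```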
