## Multilingual NER
- Purpose: Demonstrates your ability to work with multilingual NLP tasks.
- Objective: Build a model that can recognize named entities in different languages.
- Dataset: WikiANN (Pan-X)
- Description:
  - Use a pre-trained multilingual model like XLM-RoBERTa.
  - Train it to recognize people, organizations, and locations in different languages.
  - Compare fine-tuning with LoRA vs. full fine-tuning.

## Dependencies and Dataset

In [None]:
!pip install -U transformers
!pip install -U accelerate
!pip install -U datasets

In [None]:
import json
import requests
import pandas as pd
from datasets import load_dataset

# Load WikiANN for English
wikiann_en = load_dataset("wikiann", "en")
print("English:")
print(wikiann_en)

# Load WikiANN for Chinese Simplified
wikiann_zh = load_dataset("wikiann", "zh")
print("\nChinese Simplified:")
print(wikiann_zh)

# Load WikiANN for Japanese
wikiann_ja = load_dataset("wikiann", "ja")
print("\nJapanese:")
print(wikiann_ja)

English:
DatasetDict({
    validation: Dataset({
        features: ['tokens', 'ner_tags', 'langs', 'spans'],
        num_rows: 10000
    })
    test: Dataset({
        features: ['tokens', 'ner_tags', 'langs', 'spans'],
        num_rows: 10000
    })
    train: Dataset({
        features: ['tokens', 'ner_tags', 'langs', 'spans'],
        num_rows: 20000
    })
})


Chinese Simplified:
DatasetDict({
    validation: Dataset({
        features: ['tokens', 'ner_tags', 'langs', 'spans'],
        num_rows: 10000
    })
    test: Dataset({
        features: ['tokens', 'ner_tags', 'langs', 'spans'],
        num_rows: 10000
    })
    train: Dataset({
        features: ['tokens', 'ner_tags', 'langs', 'spans'],
        num_rows: 20000
    })
})


Japanese:
DatasetDict({
    validation: Dataset({
        features: ['tokens', 'ner_tags', 'langs', 'spans'],
        num_rows: 10000
    })
    test: Dataset({
        features: ['tokens', 'ner_tags', 'langs', 'spans'],
        num_rows: 10000
    })
    train: Dataset({
        features: ['tokens', 'ner_tags', 'langs', 'spans'],
        num_rows: 20000
    })
})


In [None]:
print(wikiann_en["train"][0])
print(wikiann_zh["train"][0])
print(wikiann_ja["train"][0])

{'tokens': ['R.H.', 'Saunders', '(', 'St.', 'Lawrence', 'River', ')', '(', '968', 'MW', ')'], 'ner_tags': [3, 4, 0, 3, 4, 4, 0, 0, 0, 0, 0], 'langs': ['en', 'en', 'en', 'en', 'en', 'en', 'en', 'en', 'en', 'en', 'en'], 'spans': ['ORG: R.H. Saunders', 'ORG: St. Lawrence River']}
{'tokens': ['2', '0', '0', '9', '年', '：', '李', '民', '基', '《', 'E', 't', 'e', 'r', 'n', 'a', 'l', '#', 'S', 'u', 'm', 'm', 'e', 'r', '》'], 'ner_tags': [0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'langs': ['zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh', 'zh'], 'spans': ['PER: 李 民 基']}
{'tokens': ['#', '#', 'ユ', 'リ', 'ウ', 'ス', '・', 'ベ', 'ー', 'リ', 'ッ', 'ク', '#', '1', '9', '9', '9'], 'ner_tags': [0, 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0], 'langs': ['ja', 'ja', 'ja', 'ja', 'ja', 'ja', 'ja', 'ja', 'ja', 'ja', 'ja', 'ja', 'ja', 'ja', 'ja', 'ja', 'ja'], 'spans': ['PER: ユ リ ウ ス ・ ベ ー リ ッ
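The integer `ner_tags` are easier to read once mapped to tag names. The following is a minimal sketch, assuming the label order that the dataset's `ner_tags` feature reports later in this notebook; `LABEL_NAMES` and `decode_tags` are hypothetical helper names, not part of `datasets`.

```python
# Label order as reported by the dataset's `ner_tags` feature (IOB2 scheme).
LABEL_NAMES = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]

def decode_tags(tokens, tag_ids):
    """Pair every token with the name of its integer tag."""
    return [(token, LABEL_NAMES[tag_id]) for token, tag_id in zip(tokens, tag_ids)]

# First English training example, copied from the output above.
tokens = ["R.H.", "Saunders", "(", "St.", "Lawrence", "River", ")", "(", "968", "MW", ")"]
tag_ids = [3, 4, 0, 3, 4, 4, 0, 0, 0, 0, 0]

for token, tag in decode_tags(tokens, tag_ids):
    print(f"{token:10s} {tag}")
```

`B-` marks the first token of an entity and `I-` marks its continuation, so `R.H. Saunders` decodes to `B-ORG I-ORG`, which agrees with the `spans` field of that example.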

In [None]:
from datasets import DatasetDict, concatenate_datasets

merged_dataset = DatasetDict({
    "train": concatenate_datasets([
        wikiann_en["train"].shuffle(seed=42),
        wikiann_zh["train"].shuffle(seed=42),
        wikiann_ja["train"].shuffle(seed=42)
    ]),
    "validation": concatenate_datasets([
        wikiann_en["validation"].shuffle(seed=42),
        wikiann_zh["validation"].shuffle(seed=42),
        wikiann_ja["validation"].shuffle(seed=42)
    ]),
    "test": concatenate_datasets([
        wikiann_en["test"].shuffle(seed=42),
        wikiann_zh["test"].shuffle(seed=42),
        wikiann_ja["test"].shuffle(seed=42)
    ])
})

print(merged_dataset)

DatasetDict({
    train: Dataset({
        features: ['tokens', 'ner_tags', 'langs', 'spans'],
        num_rows: 60000
    })
    validation: Dataset({
        features: ['tokens', 'ner_tags', 'langs', 'spans'],
        num_rows: 30000
    })
    test: Dataset({
        features: ['tokens', 'ner_tags', 'langs', 'spans'],
        num_rows: 30000
    })
})


## Dataset Tokenization & Preparation
- Because the data was loaded with `datasets`, it is already in the format we need (a `DatasetDict` with the required columns), so we can move straight on to tokenization and model building.

In [None]:
from transformers import AutoTokenizer
model_ckpt = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)

merged_dataset['train'][3]

{'tokens': ['Dennis', 'Carroll', '(', ',', '1981-1991', ')'],
 'ner_tags': [1, 2, 0, 0, 0, 0],
 'langs': ['en', 'en', 'en', 'en', 'en', 'en'],
 'spans': ['PER: Dennis Carroll']}

In [None]:
# Avoid shadowing the built-in `input`; the tokens are already split into words
words = merged_dataset['train'][3]['tokens']
encoding = tokenizer(words, is_split_into_words=True)
tokenizer.convert_ids_to_tokens(encoding.input_ids)

['<s>',
 '▁Dennis',
 '▁Car',
 'roll',
 '▁(',
 '▁',
 ',',
 '▁1981',
 '-',
 '1991',
 '▁)',
 '</s>']

In [None]:
def tokenize_and_align_labels(examples):
    tokenized_inputs = tokenizer(examples['tokens'], truncation=True, is_split_into_words=True)

    labels = []
    for i, label in enumerate(examples['ner_tags']):
        word_ids = tokenized_inputs.word_ids(batch_index=i)

        previous_word_idx = None
        label_ids = []

        for word_idx in word_ids:
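            # Special tokens (<s>, </s>) have no word id; label -100 is ignored by the loss
            if word_idx is None:
                label_ids.append(-100)

            # The first sub-token of each word carries the word's label
            elif word_idx != previous_word_idx:
                label_ids.append(label[word_idx])

            # Remaining sub-tokens of the same word are masked out
            else:
                label_ids.append(-100)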

            previous_word_idx = word_idx

        labels.append(label_ids)

    tokenized_inputs['labels'] = labels

    return tokenized_inputs

tokenized_dataset = merged_dataset.map(tokenize_and_align_labels, batched=True)

In [None]:
tokenized_dataset['train'][3], merged_dataset['train'][3]

({'tokens': ['Dennis', 'Carroll', '(', ',', '1981-1991', ')'],
  'ner_tags': [1, 2, 0, 0, 0, 0],
  'langs': ['en', 'en', 'en', 'en', 'en', 'en'],
  'spans': ['PER: Dennis Carroll'],
  'input_ids': [0, 124748, 3980, 27722, 15, 6, 4, 26771, 9, 76550, 1388, 2],
  'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
  'labels': [-100, 1, 2, -100, 0, 0, -100, 0, -100, -100, 0, -100]},
 {'tokens': ['Dennis', 'Carroll', '(', ',', '1981-1991', ')'],
  'ner_tags': [1, 2, 0, 0, 0, 0],
  'langs': ['en', 'en', 'en', 'en', 'en', 'en'],
  'spans': ['PER: Dennis Carroll']})
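The alignment rule can be checked without downloading a tokenizer. This is a small sketch that reuses the same logic on a hand-written `word_ids` list; that list is an assumption inferred from the sub-token output shown earlier (`▁Car` and `roll` both map to word 1, and so on), and `align_labels` is a hypothetical standalone helper.

```python
def align_labels(word_ids, word_labels, ignore_index=-100):
    """Label the first sub-token of each word; mask special tokens and later sub-tokens."""
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:            # special token such as <s> or </s>
            aligned.append(ignore_index)
        elif word_id != previous:      # first sub-token of a new word
            aligned.append(word_labels[word_id])
        else:                          # continuation of the same word
            aligned.append(ignore_index)
        previous = word_id
    return aligned

# Assumed word ids for: <s> ▁Dennis ▁Car roll ▁( ▁ , ▁1981 - 1991 ▁) </s>
word_ids = [None, 0, 1, 1, 2, 3, 3, 4, 4, 4, 5, None]
word_labels = [1, 2, 0, 0, 0, 0]  # ner_tags of the example above

print(align_labels(word_ids, word_labels))
# [-100, 1, 2, -100, 0, 0, -100, 0, -100, -100, 0, -100]
```

The printed list matches the `labels` column of the tokenized example, so only one prediction per original word contributes to the loss and to the metrics.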

## Model Building

### Full Fine-Tuning

In [None]:
from transformers import DataCollatorForTokenClassification

data_collator = DataCollatorForTokenClassification(tokenizer)
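Conceptually, the collator pads every example in a batch to the same length, and pads `labels` with `-100` so that padded positions are ignored by the loss. The sketch below is a plain-Python approximation of that behavior, not the library's implementation: `pad_batch` is a hypothetical helper, the pad id `1` is assumed to be XLM-R's `<pad>` token, and the real collator returns tensors rather than lists.

```python
PAD_TOKEN_ID = 1     # assumed id of XLM-R's <pad> token
LABEL_PAD_ID = -100  # label value ignored by the loss

def pad_batch(features):
    """Right-pad input_ids, attention_mask and labels to the longest example."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_real = len(f["input_ids"])
        n_pad = max_len - n_real
        batch["input_ids"].append(f["input_ids"] + [PAD_TOKEN_ID] * n_pad)
        batch["attention_mask"].append([1] * n_real + [0] * n_pad)
        batch["labels"].append(f["labels"] + [LABEL_PAD_ID] * n_pad)
    return batch

features = [
    # Tokenized example from this notebook
    {"input_ids": [0, 124748, 3980, 27722, 15, 6, 4, 26771, 9, 76550, 1388, 2],
     "labels": [-100, 1, 2, -100, 0, 0, -100, 0, -100, -100, 0, -100]},
    # Hypothetical shorter example: <s> ▁Dennis </s>
    {"input_ids": [0, 124748, 2], "labels": [-100, 1, -100]},
]

batch = pad_batch(features)
print(batch["attention_mask"][1])
print(batch["labels"][1])
```

Padding per batch instead of padding the whole dataset to one global length keeps batches of short sentences small.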

In [None]:
merged_dataset["train"].features["ner_tags"].feature.names

['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC']

In [None]:
from transformers import AutoModelForTokenClassification

num_labels = len(set(merged_dataset["train"].features["ner_tags"].feature.names))

model = AutoModelForTokenClassification.from_pretrained(
    model_ckpt, num_labels=num_labels
)

Some weights of XLMRobertaForTokenClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
!pip install evaluate
!pip install seqeval

In [None]:
# Extract label names from the dataset
label_list = tokenized_dataset["train"].features["ner_tags"].feature.names

# Create an index-to-label mapping
index2tag = {i: label for i, label in enumerate(label_list)}

print(index2tag)

{0: 'O', 1: 'B-PER', 2: 'I-PER', 3: 'B-ORG', 4: 'I-ORG', 5: 'B-LOC', 6: 'I-LOC'}


In [None]:
import numpy as np
import evaluate

metric = evaluate.load("seqeval")  # seqeval scores NER at the entity level, not per token

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)

    # Convert IDs back to actual labels
    true_labels = [[index2tag[label] for label in label_seq if label != -100] for label_seq in labels]
    pred_labels = [[index2tag[pred] for pred, label in zip(pred_seq, label_seq) if label != -100]
                   for pred_seq, label_seq in zip(predictions, labels)]

    results = metric.compute(predictions=pred_labels, references=true_labels)

    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }
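To see why entity-level F1 can sit well below token accuracy, here is a simplified, hand-rolled sketch of span-based scoring. It is not seqeval's implementation; `extract_entities` and `entity_scores` are hypothetical helpers, and the tag sequences are made up for illustration.

```python
def extract_entities(tags):
    """Return the set of (type, start, end) spans in a BIO tag sequence."""
    entities = set()
    start, ent_type = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes an open span
        continues = tag.startswith("I-") and ent_type == tag[2:]
        if start is not None and not continues:
            entities.add((ent_type, start, i))
            start, ent_type = None, None
        # A "B-" tag, or a stray "I-" tag with no open span, starts a new entity
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, ent_type = i, tag[2:]
    return entities

def entity_scores(true_tags, pred_tags):
    """Precision, recall and F1 over exactly matching entity spans."""
    true_ents = extract_entities(true_tags)
    pred_ents = extract_entities(pred_tags)
    tp = len(true_ents & pred_ents)
    precision = tp / len(pred_ents) if pred_ents else 0.0
    recall = tp / len(true_ents) if true_ents else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

true_tags = ["B-PER", "I-PER", "O", "B-LOC"]
pred_tags = ["B-PER", "O", "O", "B-LOC"]  # the person span is cut short

token_accuracy = sum(t == p for t, p in zip(true_tags, pred_tags)) / len(true_tags)
print(token_accuracy)                       # 0.75
print(entity_scores(true_tags, pred_tags))  # (0.5, 0.5, 0.5)
```

One wrong token out of four costs 25% accuracy but invalidates a whole entity, which is why the F1 reported by `compute_metrics` is the stricter number.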


In [None]:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./xlm-roberta-ner",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=1,
    weight_decay=0.01,
    # logging_dir="./logs",
    # logging_steps=10,
    # save_total_limit=2,
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()


Epoch,Training Loss,Validation Loss


NameError: name 'index2tag' is not defined

In [None]:
trainer.evaluate()  # training has already finished, so evaluating the trained model is enough

| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|
| 1 | 0.2918 | 0.279631 | 0.711456 | 0.741296 | 0.726069 | 0.90986 |


{'eval_loss': 0.2796313762664795,
 'eval_precision': 0.7114559260482007,
 'eval_recall': 0.7412958549350107,
 'eval_f1': 0.7260694302388544,
 'eval_accuracy': 0.9098603347180197}

#### Saving the Full Fine-Tuned Model

In [None]:
# model.save_pretrained("./xlm-roberta-ner-full-finetuning")
# tokenizer.save_pretrained("./xlm-roberta-ner-full-finetuning")

('./xlm-roberta-ner/tokenizer_config.json',
 './xlm-roberta-ner/special_tokens_map.json',
 './xlm-roberta-ner/sentencepiece.bpe.model',
 './xlm-roberta-ner/added_tokens.json',
 './xlm-roberta-ner/tokenizer.json')

### LoRA Training

In [None]:
!pip install -U peft



In [None]:
from transformers import AutoModelForTokenClassification, Trainer
from peft import LoraConfig, get_peft_model

model_ckpt = "xlm-roberta-base"
num_labels = len(set(merged_dataset["train"].features["ner_tags"].feature.names))

model = AutoModelForTokenClassification.from_pretrained(
    model_ckpt, num_labels=num_labels
)

# Define LoRA configuration
lora_config = LoraConfig(
    r=8,  # Rank of the low-rank update matrices
    lora_alpha=16,  # Scaling factor (update is scaled by lora_alpha / r)
    lora_dropout=0.1,
    target_modules=["query", "value"],  # Apply LoRA to the attention query and value projections
    bias="none",
    task_type="TOKEN_CLS",  # Token classification task
)

# Convert to a LoRA-enabled model
lora_model = get_peft_model(model, lora_config)
lora_model.print_trainable_parameters()  # Check trainable params

Some weights of XLMRobertaForTokenClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 300,295 || all params: 277,758,734 || trainable%: 0.1081
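The trainable-parameter count can be reproduced by hand. This is a rough sketch under stated assumptions: `xlm-roberta-base` has 12 encoder layers with hidden size 768, each targeted projection receives two LoRA matrices of shapes `r x 768` and `768 x r`, and the 7-way classification head stays trainable under the token-classification task type. `lora_trainable_params` is a hypothetical helper.

```python
def lora_trainable_params(r, hidden=768, layers=12, targets_per_layer=2, num_labels=7):
    """Estimate trainable parameters: LoRA matrices plus the classifier head."""
    per_module = r * hidden + hidden * r           # matrices A (r x d) and B (d x r)
    lora = layers * targets_per_layer * per_module
    classifier = hidden * num_labels + num_labels  # weight matrix plus bias
    return lora + classifier

print(lora_trainable_params(r=8))   # 300295
print(lora_trainable_params(r=12))  # 447751
```

The `r=8` figure matches the count printed above, and `r=12` matches the count reported for the higher-rank run later in this notebook, which suggests the assumptions hold for this setup.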


In [None]:
from transformers import TrainingArguments, Trainer
import torch

training_args_lora = TrainingArguments(
    output_dir="./xlm-roberta-ner-lora",
    num_train_epochs=1, # one pass over the dataset
    per_device_train_batch_size=2, # micro-batch size per device
    gradient_accumulation_steps=16, # effective batch size 2 * 16 = 32
    optim="adamw_torch",
    save_steps=200, # checkpoint every 200 steps
    logging_steps=1,
    learning_rate=2e-4, # step size in the optimizer update
    weight_decay=0.001,
    fp16=True, # mixed-precision training with 16-bit floats
    bf16=False, # not supported on V100
    max_grad_norm=0.3, # clip gradients to a maximum norm of 0.3
    max_steps=-1,
    warmup_ratio=0.03, # learning rate warmup
    group_by_length=True,
    lr_scheduler_type="cosine" # cosine lr scheduler
)

trainer = Trainer(
    model=lora_model,
    args=training_args_lora,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

import gc # garbage collection
gc.collect()
torch.cuda.empty_cache() # clean cache

trainer.train()

| Step | Training Loss |
|---|---|
| 1 | 2.1547 |
| 2 | 2.1722 |
| 3 | 2.1724 |
| 4 | 2.1706 |
| 5 | 2.1697 |
| 6 | 2.1937 |
| 7 | 2.1243 |
| 8 | 2.1409 |
| 9 | 2.1774 |
| 10 | 2.1371 |


TrainOutput(global_step=1875, training_loss=0.6506406379858652, metrics={'train_runtime': 1475.2258, 'train_samples_per_second': 40.672, 'train_steps_per_second': 1.271, 'total_flos': 1070099458555392.0, 'train_loss': 0.6506406379858652, 'epoch': 1.0})
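The reported `global_step` follows from the batch settings. Below is a minimal sketch of the arithmetic, assuming a single GPU; `optimizer_steps` is a hypothetical helper, and the `Trainer` may round differently when the dataset size is not a multiple of the effective batch size.

```python
import math

def optimizer_steps(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Return (effective batch size, total optimizer steps)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs

# 60,000 merged training examples, micro-batch 2, 16 accumulation steps
print(optimizer_steps(60000, 2, 16, epochs=1))  # (32, 1875)
print(optimizer_steps(60000, 2, 16, epochs=3))  # (32, 5625)
```

With `logging_steps=1`, every one of these optimizer steps produces one row in the training-loss log.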

In [None]:
trainer.evaluate()

Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.


{'eval_loss': 0.43412038683891296,
 'eval_precision': 0.5352747252747253,
 'eval_recall': 0.5984176515393499,
 'eval_f1': 0.5650877620390028,
 'eval_accuracy': 0.8704241243140673,
 'eval_runtime': 122.0575,
 'eval_samples_per_second': 245.786,
 'eval_steps_per_second': 30.723,
 'epoch': 1.0}

### LoRA Training (r=12, 3 Epochs)

In [None]:
from transformers import AutoModelForTokenClassification, Trainer
from peft import LoraConfig, get_peft_model

model_ckpt = "xlm-roberta-base"
num_labels = len(set(merged_dataset["train"].features["ner_tags"].feature.names))

model = AutoModelForTokenClassification.from_pretrained(
    model_ckpt, num_labels=num_labels
)

# Define LoRA configuration
lora_config = LoraConfig(
    r=12,  # Rank of the low-rank update matrices
    lora_alpha=24,  # Scaling factor (update is scaled by lora_alpha / r)
    lora_dropout=0.05,
    target_modules=["query", "value"],  # Apply LoRA to the attention query and value projections
    bias="none",
    task_type="TOKEN_CLS",  # Token classification task
)

# Convert to a LoRA-enabled model
lora_model = get_peft_model(model, lora_config)
lora_model.print_trainable_parameters()  # Check trainable params

Some weights of XLMRobertaForTokenClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 447,751 || all params: 277,906,190 || trainable%: 0.1611


In [None]:
from transformers import TrainingArguments, Trainer
import torch

training_args_lora = TrainingArguments(
    output_dir="./xlm-roberta-ner-lora",
    num_train_epochs=3, # three passes over the dataset
    per_device_train_batch_size=2, # micro-batch size per device
    gradient_accumulation_steps=16, # effective batch size 2 * 16 = 32
    optim="adamw_torch",
    save_steps=200, # checkpoint every 200 steps
    logging_steps=1,
    learning_rate=2e-4, # step size in the optimizer update
    weight_decay=0.001,
    fp16=True, # mixed-precision training with 16-bit floats
    bf16=False, # not supported on V100
    max_grad_norm=0.3, # clip gradients to a maximum norm of 0.3
    max_steps=-1,
    warmup_ratio=0.03, # learning rate warmup
    group_by_length=True,
    lr_scheduler_type="cosine" # cosine lr scheduler
)

trainer = Trainer(
    model=lora_model,
    args=training_args_lora,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

import gc # garbage collection
gc.collect()
torch.cuda.empty_cache() # clean cache

trainer.train()

| Step | Training Loss |
|---|---|
| 1 | 2.0555 |
| 2 | 2.0387 |
| 3 | 2.0339 |
| 4 | 2.0078 |
| 5 | 2.0138 |
| 6 | 2.0207 |
| 7 | 2.0046 |
| 8 | 1.982 |
| 9 | 2.0245 |
| 10 | 1.9877 |


TrainOutput(global_step=5625, training_loss=0.47345280842251247, metrics={'train_runtime': 4757.9149, 'train_samples_per_second': 37.832, 'train_steps_per_second': 1.182, 'total_flos': 3215659229496288.0, 'train_loss': 0.47345280842251247, 'epoch': 3.0})

In [None]:
trainer.evaluate()

Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.


{'eval_loss': 0.3514210283756256,
 'eval_precision': 0.6339044390489715,
 'eval_recall': 0.6799921373989533,
 'eval_f1': 0.6561399765283261,
 'eval_accuracy': 0.8913841735473614,
 'eval_runtime': 108.7483,
 'eval_samples_per_second': 275.866,
 'eval_steps_per_second': 34.483,
 'epoch': 3.0}
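To put the three runs side by side, the sketch below recomputes F1 as the harmonic mean of the reported validation precision and recall. The numbers are copied from the evaluation outputs in this notebook; the run names and the `f1_score` helper are hypothetical labels added for readability.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) on the merged validation set, as reported above
runs = {
    "full fine-tuning, 1 epoch": (0.7114559260482007, 0.7412958549350107),
    "LoRA r=8, 1 epoch": (0.5352747252747253, 0.5984176515393499),
    "LoRA r=12, 3 epochs": (0.6339044390489715, 0.6799921373989533),
}

for name, (precision, recall) in runs.items():
    print(f"{name:28s} F1 = {f1_score(precision, recall):.4f}")
```

In these runs full fine-tuning scores highest after a single epoch, and the higher-rank, longer LoRA run narrows the gap. The runs also differ in learning rate, batch size, and epoch count, so the comparison is indicative rather than controlled.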