# IndoBenchmark: NERP

Named-entity recognition (NER) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc. [[Wikipedia: Named Entity Recognition]](https://en.wikipedia.org/wiki/Named-entity_recognition)

We will fine-tune bert-base-indonesian-522M for the Named Entity Recognition (NER) task. For this purpose we use the [NERGRIT Corpus](https://github.com/grit-id/nergrit-corpus), which contains 321,757 lines in the train set, 66,974 lines in the test set and 64,208 lines in the validation set. It uses the Inside-Outside-Beginning (IOB) format, where each line consists of a word and its label/category.
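To make the IOB scheme concrete, the following minimal sketch groups a tagged token sequence into entity spans. The helper name `iob_to_spans` and the sample sentence are illustrative assumptions, not part of the corpus tooling.

```python
def iob_to_spans(words, tags):
    """Group IOB-tagged words into (entity_type, text) spans."""
    spans = []
    current_type, current_words = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity
            if current_words:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = tag[2:], [word]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag continues the entity that is currently open
            current_words.append(word)
        else:
            # "O" (or an inconsistent I- tag) closes any open entity
            if current_words:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
    if current_words:
        spans.append((current_type, " ".join(current_words)))
    return spans


# Hypothetical example sentence, one (word, tag) pair per corpus line
words = ["Indonesia", "mengekspor", "produk", "ke", "Amerika", "Serikat", "."]
tags = ["B-GPE", "O", "O", "O", "B-GPE", "I-GPE", "O"]
spans = iob_to_spans(words, tags)
print(spans)
```

The `B-` tag is what allows two adjacent entities of the same type to be told apart; an `I-` tag only ever extends the entity opened before it.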

The [NERGRIT Corpus](https://github.com/grit-id/nergrit-corpus) is a very valuable dataset for Indonesian NLP researchers. Unfortunately, the labels contain many typos and errors, so I spent some time analysing the errors, making corrections and reporting the [issue to their GitHub repository](https://github.com/grit-id/nergrit-corpus/issues/1). Since the license allows us to redistribute the dataset, I will also publish the original dataset together with its corrections. Currently the dataset is only available on [request](https://ner.grit.id/index.php/front/about) (click "Get NERGRIT Corpus").


## Transformers or Simpletransformers?

We will use simpletransformers in this case to simplify training and inference.

In [1]:
from simpletransformers.ner import NERModel, NERArgs
import pandas as pd
import logging
import sys

In [2]:
logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

We use the corrected dataset, which has fewer lines than the original (train: 309203, valid: 61680, test: 64568).

In [3]:
data_dir_1 = "/dataset/nergrit-corpus/ner/data"
file_train_1 = f'{data_dir_1}/train_corrected.txt'
file_valid_1 = f'{data_dir_1}/valid_corrected.txt'
file_test_1 = f'{data_dir_1}/test_corrected.txt'
#file_labels_map = f'{data_dir}/labels-map.csv'

In [4]:
data_dir_2 = "/dataset/indonlu/nerp_ner-prosa"
file_train_2 = f'{data_dir_2}/train_valid_preprocess.txt'
file_valid_2 = f'{data_dir_2}/valid_preprocess.txt'
file_test_2 = f'{data_dir_2}/test_preprocess_masked_label.txt'
#file_labels_map = f'{data_dir}/labels-map.csv'

In [5]:
labels = ["B-EVT", "B-FNB", "B-IND", "B-PLC", "B-PPL", 
          "I-EVT", "I-FNB", "I-IND", "I-PLC", "I-PPL", "O"]

In [40]:
labels_map = {'B-CRD' : 'O', 
              'B-DAT' : 'O', 
              'B-EVT' : 'O',
              'B-FAC' : 'O',
              'B-GPE' : 'B-PLACE',
              'B-LAN' : 'O',
              'B-LAW' : 'O',
              'B-LOC' : 'B-PLACE',
              'B-MON' : 'O',
              'B-NOR' : 'B-ORGANISATION',
              'B-ORD' : 'O',
              'B-ORG' : 'B-ORGANISATION',
              'B-PER' : 'B-PERSON',
              'B-PRC' : 'O',
              'B-PRD' : 'O',
              'B-QTY' : 'O',
              'B-REG' : 'O',
              'B-TIM' : 'O',
              'B-WOA' : 'O',
              'I-CRD' : 'O',
              'I-DAT' : 'O',
              'I-EVT' : 'O',
              'I-FAC' : 'O',
              'I-GPE' : 'I-PLACE',
              'I-LAN' : 'O',
              'I-LAW' : 'O',
              'I-LOC' : 'I-PLACE',
              'I-MON' : 'O',
              'I-NOR' : 'I-ORGANISATION',
              'I-ORD' : 'O',
              'I-ORG' : 'I-ORGANISATION',
              'I-PER' : 'I-PERSON',
              'I-PRC' : 'O',
              'I-PRD' : 'O',
              'I-QTY' : 'O',
              'I-REG' : 'O',
              'I-TIM' : 'O',
              'I-WOA' : 'O',
              'O' : 'O'}
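As a minimal sketch of what this mapping does, the snippet below applies a small excerpt of `labels_map` to a tag sequence. The sample tags are made up for illustration. Any entity type that is mapped to `O` no longer contributes to the training signal.

```python
# Excerpt of the mapping defined above (the full dictionary has 39 entries)
labels_map_excerpt = {
    "B-GPE": "B-PLACE", "I-GPE": "I-PLACE",
    "B-LOC": "B-PLACE", "I-LOC": "I-PLACE",
    "B-ORG": "B-ORGANISATION", "I-ORG": "I-ORGANISATION",
    "B-PER": "B-PERSON", "I-PER": "I-PERSON",
    "B-DAT": "O", "I-DAT": "O",
    "O": "O",
}

# Hypothetical tag sequence in the original NERGRIT label set
original_tags = ["B-PER", "I-PER", "O", "B-GPE", "O", "B-DAT", "I-DAT"]
mapped_tags = [labels_map_excerpt[tag] for tag in original_tags]
print(mapped_tags)
```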

Simpletransformers requires the dataset either as a pandas DataFrame with the columns **sentence_id**, **words** and **labels**, or as a text file in CoNLL format. The **sentence_id** is a consecutive number that determines which words belong to a given sentence, i.e. all words from the same sentence must be assigned the same unique sentence_id.
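The expected layout can be sketched without any file I/O. The example below builds the three columns from two made-up sentences; the resulting rows could then be passed to `pd.DataFrame(rows, columns=["sentence_id", "words", "labels"])`.

```python
# Two hypothetical sentences, each a list of (word, label) pairs
sentences = [
    [("Budi", "B-PERSON"), ("tinggal", "O"), ("di", "O"), ("Bandung", "B-PLACE")],
    [("Dia", "O"), ("bekerja", "O"), ("di", "O"), ("Jakarta", "B-PLACE")],
]

rows = []
for sentence_id, sentence in enumerate(sentences):
    for word, label in sentence:
        # Every word of the same sentence shares the same sentence_id
        rows.append([sentence_id, word, label])

for row in rows:
    print(row)
```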

In [6]:
# Read a NER file in CoNLL format and return a DataFrame with columns: sentence_id, words, labels
def get_ner_data(filename, labels_map=None):
    word_list = []
    sentence_counter = 0
    cnt = 0  # stays defined even if the file is empty
    with open(filename) as fp:
        for cnt, line in enumerate(fp):
            try:
                texts = line.split()
                if len(texts) != 0:
                    # The last token is the label; everything before it is the word
                    word, label = ' '.join(texts[0:-1]), texts[-1]
                    if labels_map:
                        label = labels_map[label]
                    word_list.append([sentence_counter, word, label])
                else:
                    # An empty line marks the end of a sentence
                    sentence_counter += 1
            except Exception:
                # E.g. a label missing from labels_map: report it and start a new sentence
                print("Unexpected error:", sys.exc_info()[0], cnt, line)
                word_list.append([sentence_counter, "", ""])
                sentence_counter += 1
    print(f'read {cnt} lines')
    ner_data = pd.DataFrame(word_list, columns=["sentence_id", "words", "labels"])
    return ner_data

In [7]:
file_train_1

'/dataset/nergrit-corpus/ner/data/train_corrected.txt'

In [8]:
train_data_1 = get_ner_data(file_train_1, labels_map)

NameError: name 'labels_map' is not defined

In [9]:
valid_data_1 = get_ner_data(file_valid_1, labels_map)

NameError: name 'labels_map' is not defined

In [11]:
test_data_1 = get_ner_data(file_test_1, labels_map)

read 66972 lines


In [12]:
train_data_1.iloc[:20]

Unnamed: 0,sentence_id,words,labels
0,0,Indonesia,B-PLACE
1,0,mengekspor,O
2,0,produk,O
3,0,industri,O
4,0,skala,O
5,0,besar,O
6,0,ke,O
7,0,Amerika,B-PLACE
8,0,Serikat,I-PLACE
9,0,.,O


In [10]:
train_data_2 = get_ner_data(file_train_2)

read 183597 lines


In [11]:
valid_data_2 = get_ner_data(file_valid_2)

read 20468 lines


In [12]:
test_data_2 = get_ner_data(file_test_2)

read 20791 lines


In [13]:
len(train_data_1),len(valid_data_1),len(test_data_1)

NameError: name 'train_data_1' is not defined

In [14]:
len(train_data_2),len(valid_data_2),len(test_data_2)

(176038, 19629, 19952)

In [15]:
train_data_2.iloc[300:350]

Unnamed: 0,sentence_id,words,labels
300,12,yang,O
301,12,dilansir,O
302,12,nme,B-IND
303,12,.,O
304,13,pada,O
305,13,tahap,O
306,13,verifikasi,O
307,13,awal,O
308,13,sebelumnya,O
309,13,",",O


In [16]:
valid_data_2.head(10)

Unnamed: 0,sentence_id,words,labels
0,0,"""",O
1,0,demi,O
2,0,kebersamaan,O
3,0,",",O
4,0,karena,O
5,0,menyangkut,O
6,0,kepentingan,O
7,0,parpol,O
8,0,",",O
9,0,maka,O


In [17]:
test_data_2.head(10)

Unnamed: 0,sentence_id,words,labels
0,0,kuasa,O
1,0,hukum,O
2,0,teamster,O
3,0,berasal,O
4,0,dari,O
5,0,edmonton,O
6,0,",",O
7,0,namun,O
8,0,tinggal,O
9,0,di,O


## The Labels

The NERGRIT corpus contains 19 entity types, each with an Inside and a Beginning tag, plus one Outside tag, which gives 39 categories altogether. The entity types have the following meanings:
1. 'CRD' --> Cardinal
1. 'DAT' --> Date
1. 'EVT' --> Event
1. 'FAC' --> Facility
1. 'GPE' --> Geopolitical Entity
1. 'LAW' --> Law Entity (such as Undang-Undang)
1. 'LOC' --> Location
1. 'MON' --> Money
1. 'NOR' --> Political Organization
1. 'ORD' --> Ordinal
1. 'ORG' --> Organization
1. 'PER' --> Person
1. 'PRC' --> Percent
1. 'PRD' --> Product
1. 'QTY' --> Quantity
1. 'REG' --> Religion
1. 'TIM' --> Time
1. 'WOA' --> Work of Art
1. 'LAN' --> Language
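A short sketch that builds the full label list from the entity codes listed above and confirms the count (the variable names are illustrative):

```python
entity_types = ["CRD", "DAT", "EVT", "FAC", "GPE", "LAW", "LOC", "MON", "NOR", "ORD",
                "ORG", "PER", "PRC", "PRD", "QTY", "REG", "TIM", "WOA", "LAN"]

# One Beginning and one Inside tag per entity type, plus the Outside tag
all_labels = sorted(f"{prefix}-{entity}" for entity in entity_types for prefix in ("B", "I"))
all_labels.append("O")

print(len(entity_types), len(all_labels))
```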

In [13]:
"""
df = pd.read_csv(file_labels_map, sep=' ', names=['X','Y'])
labels = list(set(df['Y']))
labels.sort()
len(labels)
"""

"\ndf = pd.read_csv(file_labels_map, sep=' ', names=['X','Y'])\nlabels = list(set(df['Y']))\nlabels.sort()\nlen(labels)\n"

In [14]:
labels = ["B-ORGANISATION", "B-PERSON", "B-PLACE", "I-ORGANISATION", "I-PERSON", "I-PLACE", "O"]

In [18]:
labels

['B-EVT',
 'B-FNB',
 'B-IND',
 'B-PLC',
 'B-PPL',
 'I-EVT',
 'I-FNB',
 'I-IND',
 'I-PLC',
 'I-PPL',
 'O']

## The Training with bert-base-indonesian-522M

Since I have already pre-trained a BERT base model on Indonesian Wikipedia, I want to evaluate its performance on this task.

In [19]:
# Configure the model
model_args = NERArgs()
model_args.num_train_epochs = 5
model_args.train_batch_size = 32
model_args.evaluate_during_training = True
model_args.output_dir = '/output/indonlu/nerp/bert-base-indonesian-1.5G'
model_args.best_model_dir = f'{model_args.output_dir}/best_model'
model_args.overwrite_output_dir = True
model_args.fp16 = False
model_args.labels_list=labels
model_args.do_lower_case = True

In [20]:
model_args

NERArgs(adam_epsilon=1e-08, best_model_dir='/output/indonlu/nerp/bert-base-indonesian-1.5G/best_model', cache_dir='cache_dir/', config={}, custom_layer_parameters=[], custom_parameter_groups=[], dataloader_num_workers=78, do_lower_case=True, dynamic_quantize=False, early_stopping_consider_epochs=False, early_stopping_delta=0, early_stopping_metric='eval_loss', early_stopping_metric_minimize=True, early_stopping_patience=3, encoding=None, eval_batch_size=8, evaluate_during_training=True, evaluate_during_training_silent=True, evaluate_during_training_steps=2000, evaluate_during_training_verbose=False, fp16=False, gradient_accumulation_steps=1, learning_rate=4e-05, local_rank=-1, logging_steps=50, manual_seed=None, max_grad_norm=1.0, max_seq_length=128, model_name=None, model_type=None, multiprocessing_chunksize=500, n_gpu=1, no_cache=False, no_save=False, num_train_epochs=5, output_dir='/output/indonlu/nerp/bert-base-indonesian-1.5G', overwrite_output_dir=True, process_count=78, quantize

In [21]:
model_bert_base = NERModel(
    #"bert", "cahya/bert-base-indonesian-522M", labels=labels, args=model_args
    "bert", "/output/bert-id-100/bert-base-indonesian-1.5G", labels=labels, args=model_args
)

Some weights of the model checkpoint at /output/bert-id-100/bert-base-indonesian-1.5G were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at /output/ber

In [37]:
# Train the model
model_bert_base.train_model(train_data_2, eval_data=valid_data_2)

INFO:simpletransformers.ner.ner_model: Converting to features started.
INFO:simpletransformers.ner.ner_model:   Starting fine-tuning.
INFO:simpletransformers.ner.ner_model: Training of bert model complete. Saved to /output/indonlu/nerp/bert-base-indonesian-1.5G.


In [38]:
# Evaluate the model with valid dataset
result, model_outputs, preds_list = model_bert_base.eval_model(valid_data_2)

INFO:simpletransformers.ner.ner_model: Converting to features started.
INFO:simpletransformers.ner.ner_model:{'eval_loss': 0.010946499614320124, 'precision': 0.9576365663322185, 'recall': 0.973371104815864, 'f1_score': 0.9654397302613092}
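The reported `f1_score` is the harmonic mean of precision and recall. As a quick sanity check, the values logged above can be plugged into that formula; the helper `f1_from` is a hypothetical name used only for this sketch.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the evaluation log above
precision = 0.9576365663322185
recall = 0.973371104815864
f1 = f1_from(precision, recall)
print(round(f1, 6))
```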


In [37]:
# Evaluate the model with valid dataset
result, model_outputs, preds_list = model_bert_base.eval_model(valid_data_2)

INFO:simpletransformers.ner.ner_model: Converting to features started.
INFO:simpletransformers.ner.ner_model:{'eval_loss': 0.15242013360284978, 'precision': 0.8164731896075179, 'recall': 0.8368271954674221, 'f1_score': 0.8265249020705091}


In [42]:
# Evaluate the model with test dataset
result, model_outputs, preds_list = model_bert_base.eval_model(test_data_2)

INFO:simpletransformers.ner.ner_model: Converting to features started.
INFO:simpletransformers.ner.ner_model:{'eval_loss': 1.5195813320931935, 'precision': 0.0, 'recall': 0, 'f1_score': 0}
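The zero precision, recall and F1 on the test set are most likely not a model failure: the test file is named `test_preprocess_masked_label.txt`, and every gold label shown for `test_data_2` is `O`. With no gold entities, no prediction can count as a true positive. The following sketch uses a simplified entity-level scorer (an assumption for illustration, not the scorer used by simpletransformers) to show the effect, with predictions taken from the first sentence of the `preds_list` output in this notebook.

```python
def extract_entities(tags):
    """Return the set of (entity_type, start, end) spans in an IOB tag sequence."""
    entities, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def score(gold_tags, pred_tags):
    """Entity-level precision, recall and F1 for one tag sequence."""
    gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Predictions for the first test sentence, scored against fully masked gold labels
pred_tags = ["O", "O", "B-PPL", "O", "O", "B-PLC", "O", "O",
             "O", "O", "O", "B-PLC", "O", "O", "O", "O"]
masked_gold = ["O"] * len(pred_tags)
print(score(masked_gold, pred_tags))
```

The test predictions therefore have to be scored externally, against the unmasked labels, to obtain meaningful numbers.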


In [25]:
test_data_2.iloc[0:30]

Unnamed: 0,sentence_id,words,labels
0,0,kuasa,O
1,0,hukum,O
2,0,teamster,O
3,0,berasal,O
4,0,dari,O
5,0,edmonton,O
6,0,",",O
7,0,namun,O
8,0,tinggal,O
9,0,di,O


In [26]:
preds_list

[['O',
  'O',
  'B-PPL',
  'O',
  'O',
  'B-PLC',
  'O',
  'O',
  'O',
  'O',
  'O',
  'B-PLC',
  'O',
  'O',
  'O',
  'O'],
 ['O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'B-PLC',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O'],
 ['O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'B-PPL',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O'],
 ['O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'B-PPL',
  'O'],
 ['O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O']

In [44]:
len(preds_list)

840

In [45]:
def print_result(preds_list, test_data, max_len=10):
    # Print word/prediction pairs for sentences 0..max_len (inclusive)
    for i in range(len(preds_list)):
        if i>max_len:
            break
        sentence = list(test_data[test_data['sentence_id']==i]['words'])
        for j, word in enumerate(sentence):
            print(f'{i}:{word}\t{preds_list[i][j]}')

def save_result(preds_list, test_data, filename):
    # Write one CSV row per sentence: its index and the stringified list of predicted labels
    # (test_data is unused and only kept for a uniform call signature)
    with open(filename, 'w') as out_file:
        out_file.write('index,label\n')
        for index, preds in enumerate(preds_list):
            out_file.write(f'{index},"{preds}"\n')


In [46]:
#output_dir = "/output/indonlu/nerp"
output_fn = f'{model_args.output_dir}/result.txt'

In [47]:
print_result(preds_list, test_data_2, 6)

0:kuasa	O
0:hukum	O
0:teamster	B-PPL
0:berasal	O
0:dari	O
0:edmonton	B-PLC
0:,	O
0:namun	O
0:tinggal	O
0:di	O
0:sekitar	O
0:vancouver	B-PLC
0:sejak	O
0:tahun	O
0:1991	O
0:.	O
1:data	O
1:diurutkan	O
1:berdasarkan	O
1:umur	O
1:,	O
1:jenis	O
1:kelamin	O
1:dan	O
1:metode	O
1:;	O
1:pertambahan	O
1:yang	O
1:ditandai	O
1:terlihat	O
1:di	O
1:antara	O
1:orang	O
1:skotlandia	B-PLC
1:berumur	O
1:dari	O
1:25	O
1:sampai	O
1:54	O
1:tahun	O
1:dengan	O
1:gantung	O
1:diri	O
1:meningkatkan	O
1:popularitas	O
1:.	O
2:"	O
2:urusan	O
2:dengan	O
2:atasannya	O
2:masuk	O
2:dalam	O
2:aturan	O
2:pelanggaran	O
2:uu	O
2:kepegawaian	O
2:dan	O
2:etika	O
2:.	O
2:presiden	O
2:sebagai	O
2:atasan	O
2:punya	O
2:hak	O
2:untuk	O
2:beri	O
2:sanksi,	O
2:"	O
2:kata	O
2:yesmil	B-PPL
2:,	O
2:saat	O
2:dihubungi	O
2:wartawan	O
2:,	O
2:jumat	O
2:(	O
2:6/4)	O
2:.	O
3:"	O
3:saya	O
3:pikir	O
3:itu	O
3:pembicaraan	O
3:publik	O
3:,	O
3:di	O
3:era	O
3:demokrasi	O
3:ini	O
3:kan	O
3:orang	O
3:bebas	O
3:berbicara	O
3:apa	O
3:saja	O
3:.	O
3

In [48]:
save_result(preds_list, test_data_2, output_fn)
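`save_result` writes each sentence's predictions as the string form of a Python list. A minimal sketch of reading such a file back into real lists, using only the standard library; the in-memory sample stands in for the saved file and is an assumption for illustration.

```python
import ast
import csv
import io

# Stand-in for the file written by save_result
sample = io.StringIO("""index,label
0,"['O', 'B-PPL', 'O']"
1,"['B-PLC', 'O']"
""")

reader = csv.DictReader(sample)  # the first row supplies the column names
predictions = {int(row["index"]): ast.literal_eval(row["label"]) for row in reader}
print(predictions)
```

`ast.literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for this kind of round trip.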

In [49]:
!head $output_fn

index,label
0,"['O', 'O', 'B-PPL', 'O', 'O', 'B-PLC', 'O', 'O', 'O', 'O', 'O', 'B-PLC', 'O', 'O', 'O', 'O']"
1,"['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-PLC', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']"
2,"['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-PPL', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']"
3,"['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-PPL', 'O']"
4,"['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']"
5,"['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-PPL', 'O', 'O', 'O', 'O', 'B-IND', 'O', 'O', 'O', 'O']"
6,"['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-IND', 'O', 'O', '

In [50]:
# header=0 skips the header row written by save_result, so it is not read as data
result = pd.read_csv(output_fn, header=0, names=['index','label']).set_index('index')

In [51]:
result.iloc[0:20]

Unnamed: 0_level_0,label
index,Unnamed: 1_level_1
0,"['O', 'O', 'B-PPL', 'O', 'O', 'B-PLC', 'O', 'O..."
1,"['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', ..."
2,"['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', ..."
3,"['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', ..."
4,"['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', ..."
5,"['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', ..."
6,"['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', ..."
7,"['B-PLC', 'O', 'B-PLC', 'O', 'O', 'O', 'O', 'O..."
8,"['B-FNB', 'O', 'O', 'O', 'O', 'O', 'O', 'O', '..."


### Nergrit 2 (train+valid)

In [185]:
train_data_all = pd.concat([train_data_2, valid_data_2], ignore_index=True)

In [186]:
len(train_data_all), len(train_data_2), len(valid_data_2), 

(63193, 56210, 6983)

In [187]:
# Configure the model
model_args = NERArgs()
model_args.num_train_epochs = 5
model_args.train_batch_size = 32
model_args.evaluate_during_training = True
model_args.output_dir = '/output/indonlu/bert-base-indonesian'
model_args.best_model_dir = '/output/indonlu/bert-base-indonesian/best_model'
model_args.overwrite_output_dir = True
model_args.fp16 = False
model_args.labels_list=labels
model_args.do_lower_case = True

In [188]:
model_bert_base = NERModel(
    "bert", "cahya/bert-base-indonesian-522M", labels=labels, args=model_args
)

Some weights of the model checkpoint at cahya/bert-base-indonesian-522M were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at cahya/bert-base-indonesia

In [189]:
# Train the model
model_bert_base.train_model(train_data_2, eval_data=valid_data_2)

INFO:simpletransformers.ner.ner_model: Converting to features started.
INFO:simpletransformers.ner.ner_model: Training of bert model complete. Saved to /output/indonlu/bert-base-indonesian.


In [190]:
# Evaluate the model with valid dataset
result, model_outputs, preds_list = model_bert_base.eval_model(valid_data_2)

INFO:simpletransformers.ner.ner_model: Converting to features started.
INFO:simpletransformers.ner.ner_model:{'eval_loss': 0.15711867857586453, 'precision': 0.7812041116005873, 'recall': 0.8036253776435045, 'f1_score': 0.7922561429635144}





In [191]:
# Evaluate the model with test dataset
result, model_outputs, preds_list = model_bert_base.eval_model(test_data_2)

INFO:simpletransformers.ner.ner_model: Converting to features started.
INFO:simpletransformers.ner.ner_model:{'eval_loss': 1.0206156019811277, 'precision': 0.0, 'recall': 0, 'f1_score': 0}





In [192]:
save_result(preds_list, test_data_2, output_fn)

In [28]:
# Configure the model
model_args = NERArgs()
model_args.num_train_epochs = 5
model_args.train_batch_size = 32
model_args.evaluate_during_training = True
model_args.output_dir = '/output/indonlu/bert-base-indonesian'
model_args.best_model_dir = '/output/indonlu/bert-base-indonesian/best_model'
model_args.overwrite_output_dir = True
model_args.fp16 = False
model_args.labels_list=labels
model_args.do_lower_case = True

In [31]:
model_bert_base = NERModel(
    "bert", "/output/indonlu/bert-base-indonesian/best_model",  args=model_args
)

In [32]:
# Evaluate the model with valid dataset
result, model_outputs, preds_list = model_bert_base.eval_model(valid_data_2)

INFO:simpletransformers.ner.ner_model: Converting to features started.
INFO:simpletransformers.ner.ner_model:{'eval_loss': 0.12883393300904167, 'precision': 0.7910662824207493, 'recall': 0.8293051359516617, 'f1_score': 0.8097345132743363}


In [33]:
# Evaluate the model with test dataset
result, model_outputs, preds_list = model_bert_base.eval_model(test_data_2)

INFO:simpletransformers.ner.ner_model: Converting to features started.
INFO:simpletransformers.ner.ner_model:{'eval_loss': 0.7154410238619204, 'precision': 0.0, 'recall': 0, 'f1_score': 0}





In [34]:
save_result(preds_list, test_data_2, output_fn)

### Nergrit 1 vs Nergrit 2

In [41]:
# Configure the model
model_args = NERArgs()
model_args.num_train_epochs = 5
model_args.train_batch_size = 32
model_args.evaluate_during_training = True
model_args.output_dir = '/output/indonlu/bert-base-indonesian'
model_args.best_model_dir = '/output/indonlu/bert-base-indonesian/best_model'
model_args.overwrite_output_dir = True
model_args.fp16 = False
model_args.labels_list=labels
model_args.do_lower_case = True

In [42]:
model_bert_base = NERModel(
    "bert", "cahya/bert-base-indonesian-522M", labels=labels, args=model_args
)

Some weights of the model checkpoint at cahya/bert-base-indonesian-522M were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at cahya/bert-base-indonesia

In [43]:
# Train the model
model_bert_base.train_model(train_data_1, eval_data=valid_data_2)

INFO:simpletransformers.ner.ner_model: Converting to features started.
INFO:simpletransformers.ner.ner_model: Training of bert model complete. Saved to /output/indonlu/bert-base-indonesian.


In [44]:
# Evaluate the model with valid dataset
result, model_outputs, preds_list = model_bert_base.eval_model(valid_data_2)

INFO:simpletransformers.ner.ner_model: Converting to features started.
INFO:simpletransformers.ner.ner_model:{'eval_loss': 0.3372342694136832, 'precision': 0.6912378303198887, 'recall': 0.7507552870090635, 'f1_score': 0.7197682838522811}





In [39]:
# Evaluate the model with valid dataset
result, model_outputs, preds_list = model_bert_base.eval_model(valid_data_2)

INFO:simpletransformers.ner.ner_model: Converting to features started.
INFO:simpletransformers.ner.ner_model:{'eval_loss': 0.3447435959069817, 'precision': 0.6963276836158192, 'recall': 0.7447129909365559, 'f1_score': 0.7197080291970803}





In [None]:
# Evaluate the model with test dataset
result, model_outputs, preds_list = model_bert_base.eval_model(test_data_2)

### (Nergrit 1 +  Nergrit 2) vs Nergrit 2

In [45]:
train_data_3 = pd.concat([train_data_1, train_data_2], ignore_index=True)

In [46]:
len(train_data_3),len(train_data_1),len(train_data_2)

(365416, 309206, 56210)

In [47]:
train_data_1.head()

Unnamed: 0,sentence_id,words,labels
0,0,Indonesia,B-PLACE
1,0,mengekspor,O
2,0,produk,O
3,0,industri,O
4,0,skala,O


In [48]:
# Remember the last sentence id of the first corpus so the second corpus can continue from it
last_si = train_data_1.iloc[-1]['sentence_id']

In [55]:
# Work on a copy; a plain assignment would only alias train_data_2 and modify it as well
train_data_tmp = train_data_2.copy()

In [63]:
train_data_tmp['sentence_id'] = 100

In [80]:
train_data_tmp.head()

Unnamed: 0,sentence_id,words,labels
0,12552,Kontribusinya,O
1,12552,terhadap,O
2,12552,industri,O
3,12552,musik,O
4,12552,telah,O


In [67]:
train_data_2.head()

Unnamed: 0,sentence_id,words,labels
0,0,Kontribusinya,O
1,0,terhadap,O
2,0,industri,O
3,0,musik,O
4,0,telah,O


In [79]:
# Shift the sentence ids of the second corpus so that they continue after the first corpus
# (vectorized; much faster than looping over the rows one by one)
train_data_tmp['sentence_id'] = train_data_2['sentence_id'] + last_si + 1

In [81]:
train_data_3 = pd.concat([train_data_1, train_data_tmp], ignore_index=True)

In [82]:
train_data_3.iloc[309200: 309220]

Unnamed: 0,sentence_id,words,labels
309200,12551,di,O
309201,12551,38,O
309202,12551,negara,O
309203,12551,bagian,O
309204,12551,lainnya,O
309205,12551,.,O
309206,12552,Kontribusinya,O
309207,12552,terhadap,O
309208,12552,industri,O
309209,12552,musik,O
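
The renumbering matters because `sentence_id` is what groups tokens into sentences: if both corpora started at 0, their first sentences would be merged into one. A minimal sketch of the same idea in plain Python, assuming rows are `(sentence_id, word, label)` tuples. The helper `shift_sentence_ids` and the toy rows are made up for illustration and are not the real corpus.

```python
def shift_sentence_ids(rows, offset):
    """Return a copy of rows with every sentence_id increased by offset."""
    return [(sentence_id + offset, word, label) for sentence_id, word, label in rows]


# Toy stand-ins for the two corpora (hypothetical rows, not the real data)
corpus_1 = [(0, "Indonesia", "B-PLACE"), (0, "mengekspor", "O"), (1, "negara", "O")]
corpus_2 = [(0, "Kontribusinya", "O"), (0, "terhadap", "O"), (1, "musik", "O")]

# Continue numbering right after the last sentence of the first corpus
last_si = corpus_1[-1][0]
combined = corpus_1 + shift_sentence_ids(corpus_2, last_si + 1)

sentence_ids = [sentence_id for sentence_id, _, _ in combined]
print(sentence_ids)
```

The second corpus keeps its own numbering; only the shifted copy ends up in the combined list.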


In [83]:
model_bert_base = NERModel(
    "bert", "cahya/bert-base-indonesian-522M", labels=labels, args=model_args
)

Some weights of the model checkpoint at cahya/bert-base-indonesian-522M were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at cahya/bert-base-indonesia

In [84]:
# Train the model
model_bert_base.train_model(train_data_3, eval_data=valid_data_2)

INFO:simpletransformers.ner.ner_model: Converting to features started.



INFO:simpletransformers.ner.ner_model: Training of bert model complete. Saved to /output/indonlu/bert-base-indonesian.


In [85]:
# Evaluate the model with valid dataset
result, model_outputs, preds_list = model_bert_base.eval_model(valid_data_2)

INFO:simpletransformers.ner.ner_model: Converting to features started.



INFO:simpletransformers.ner.ner_model:{'eval_loss': 0.1967056412939672, 'precision': 0.7793904208998549, 'recall': 0.8111782477341389, 'f1_score': 0.7949666913397484}





In [24]:
preds_list

[['B-PERSON', 'I-PERSON', 'O', 'O', 'O', 'O', 'B-ORGANISATION', 'O', 'O'],
 ['O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'B-PERSON',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'B-PERSON',
  'I-PERSON',
  'I-PERSON',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O'],
 ['O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'B-ORGANISATION',
  'I-ORGANISATION',
  'O',
  'O',
  'O',
  'B-PLACE',
  'O',
  'B-PERSON',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O'],
 ['O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  

In [33]:
list(test_data[test_data['sentence_id']==0]['words'])

['Joetata',
 'Hadihardaja',
 'dan',
 'dihadiri',
 'oleh',
 'Rektor',
 'Undip',
 'Prof',
 '.']

In [32]:
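# Print each word of the first sentences next to its predicted label
for i in range(len(preds_list)):
    sentence = list(test_data[test_data['sentence_id'] == i]['words'])
    # zip stops at the shorter sequence, so truncated predictions do not raise an IndexError
    for word, label in zip(sentence, preds_list[i]):
        print(word, label)
    if i > 10:
        break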

NameError: name 'test_data' is not defined

In [28]:
# Print each row index next to its sentence_id to see where sentences begin and end
for i, row in test_data.iterrows():
    print(i, row['sentence_id'])
    if i > 10:
        break

0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 1
11 1


The result (F1-score: 80.17%) is close to the F1-score reported by the NERGRIT team (about 80%).
Last week I got a very low F1-score (about 60%), which was disappointing because it was much lower than the score achieved by the NERGRIT team. It turned out that the model had been trained incorrectly: I trained bert-base-indonesian-522M as if it were a cased model (this is the default configuration). After I enabled lowercasing in the configuration (model_args.do_lower_case = True), the F1-score improved considerably.
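
To illustrate why the casing setting matters, here is a toy sketch, assuming the vocabulary of an uncased model contains only lowercase entries. It uses a simplified whole-word lookup (the hypothetical helper `to_vocab_tokens` and the vocabulary entries are made up); a real WordPiece tokenizer splits unknown words into subword pieces rather than mapping them straight to `[UNK]`, but the mismatch has a similar effect.

```python
# Toy vocabulary standing in for an uncased model's word list (hypothetical entries)
UNCASED_VOCAB = {"indonesia", "mengekspor", "produk", "industri"}


def to_vocab_tokens(words, do_lower_case):
    """Map each word to itself if it is in the vocabulary, otherwise to [UNK]."""
    tokens = []
    for word in words:
        candidate = word.lower() if do_lower_case else word
        tokens.append(candidate if candidate in UNCASED_VOCAB else "[UNK]")
    return tokens


words = ["Indonesia", "mengekspor", "produk", "industri"]
cased_tokens = to_vocab_tokens(words, do_lower_case=False)
uncased_tokens = to_vocab_tokens(words, do_lower_case=True)
print(cased_tokens)    # the capitalised word is not found in the vocabulary
print(uncased_tokens)  # every word is found after lowercasing
```

Named entities are usually capitalised, so in this toy setting they are exactly the words that get lost when lowercasing is switched off.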


## Training with xlm-roberta-base

I tried a multilingual model from Facebook, XLM-RoBERTa-base, which was pre-trained on 2.5 TB of data.

In [19]:
# Configure the model
model_args = NERArgs()
model_args.num_train_epochs = 5
model_args.train_batch_size = 128
model_args.evaluate_during_training = True
model_args.output_dir = '/output/ner/xlm-roberta-base'
model_args.best_model_dir = '/output/ner/xlm-roberta-base/best_model'
model_args.overwrite_output_dir = True
model_args.fp16 = False
model_args.labels_list = labels

In [20]:
model_xlmroberta_base = NERModel(
    "xlmroberta", "xlm-roberta-base", labels=labels, args=model_args
)

Some weights of the model checkpoint at xlm-roberta-base were not used when initializing XLMRobertaForTokenClassification: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing XLMRobertaForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing XLMRobertaForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of XLMRobertaForTokenClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-st

In [21]:
# Train the model
model_xlmroberta_base.train_model(train_data, eval_data=valid_data)

INFO:simpletransformers.ner.ner_model: Converting to features started.



INFO:simpletransformers.ner.ner_model: Training of xlmroberta model complete. Saved to /output/ner/xlm-roberta-base.


In [22]:
# Evaluate the model with valid dataset
result, model_outputs, preds_list = model_xlmroberta_base.eval_model(valid_data)

INFO:simpletransformers.ner.ner_model: Converting to features started.






INFO:simpletransformers.ner.ner_model:{'eval_loss': 0.1866366565582298, 'precision': 0.8336475023562677, 'recall': 0.8462090408993064, 'f1_score': 0.8398813056379822}


In [23]:
# Evaluate the model with test dataset
result, model_outputs, preds_list = model_xlmroberta_base.eval_model(test_data)

INFO:simpletransformers.ner.ner_model: Converting to features started.






INFO:simpletransformers.ner.ner_model:{'eval_loss': 0.18713697187486106, 'precision': 0.8234437975817286, 'recall': 0.8321077044914583, 'f1_score': 0.8277530808620787}


### Result

The result is good: an F1-score of 82.8% on the test set.
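
As a quick check, the F1-score can be recomputed from the precision and recall reported in the test-set evaluation log, since F1 is the harmonic mean of the two. A minimal sketch (the helper name `f1_score` is chosen for illustration only):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the test-set evaluation log
precision = 0.8234437975817286
recall = 0.8321077044914583

f1 = f1_score(precision, recall)
print(f"F1-score: {f1:.1%}")
```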

## Training with xlm-roberta-large

Then I tried a second multilingual model from Facebook: XLM-RoBERTa-large.

In [24]:
# Configure the model
model_args = NERArgs()
model_args.num_train_epochs = 5
model_args.train_batch_size = 32
model_args.evaluate_during_training = True
model_args.output_dir = '/output/ner/xlm-roberta-large'
model_args.best_model_dir = '/output/ner/xlm-roberta-large/best_model'
model_args.overwrite_output_dir = True
model_args.fp16 = False
model_args.labels_list = labels
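
One visible effect of changing `train_batch_size` from 128 to 32 is the number of optimizer steps per epoch. A rough sketch, assuming a training set of 12514 sentences, a single device, and no gradient accumulation (all three are assumptions made for illustration; `steps_per_epoch` is a hypothetical helper):

```python
import math


def steps_per_epoch(num_sentences, batch_size):
    """Number of batches needed to see every sentence once."""
    return math.ceil(num_sentences / batch_size)


num_sentences = 12514  # assumed training-set size

steps_by_batch_size = {
    batch_size: steps_per_epoch(num_sentences, batch_size)
    for batch_size in (128, 32)
}
print(steps_by_batch_size)
```

With the smaller batch size the model takes about four times as many update steps per epoch.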

In [25]:
model_xlmroberta_large = NERModel(
    "xlmroberta", "xlm-roberta-large", labels=labels, args=model_args
)

Some weights of the model checkpoint at xlm-roberta-large were not used when initializing XLMRobertaForTokenClassification: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing XLMRobertaForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing XLMRobertaForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of XLMRobertaForTokenClassification were not initialized from the model checkpoint at xlm-roberta-large and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-

In [26]:
# Train the model
model_xlmroberta_large.train_model(train_data, eval_data=valid_data)

INFO:simpletransformers.ner.ner_model: Converting to features started.



INFO:simpletransformers.ner.ner_model: Training of xlmroberta model complete. Saved to /output/ner/xlm-roberta-large.


In [27]:
# Evaluate the model with valid dataset
result, model_outputs, preds_list = model_xlmroberta_large.eval_model(valid_data)

INFO:simpletransformers.ner.ner_model: Converting to features started.






INFO:simpletransformers.ner.ner_model:{'eval_loss': 0.21224965298888349, 'precision': 0.8447300165055411, 'recall': 0.8568524276488878, 'f1_score': 0.8507480408454049}


In [28]:
# Evaluate the model with test dataset
result, model_outputs, preds_list = model_xlmroberta_large.eval_model(test_data)

INFO:simpletransformers.ner.ner_model: Converting to features started.






INFO:simpletransformers.ner.ner_model:{'eval_loss': 0.21842414552079087, 'precision': 0.836591086786552, 'recall': 0.8473809254440547, 'f1_score': 0.8419514388489209}


### Result

Again, the result is good: the model achieved an F1-score of 84.19% on the test set, about 4 percentage points higher than bert-base-indonesian-522M. Perhaps my language model needs more data for pre-training.
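
The size of the gap depends on whether it is read as percentage points or as a relative improvement. A small sketch of both readings, taking the two F1-scores quoted in the text as given:

```python
bert_base_f1 = 80.17   # bert-base-indonesian-522M, in percent
xlmr_large_f1 = 84.19  # xlm-roberta-large, in percent

absolute_gain = xlmr_large_f1 - bert_base_f1        # difference in percentage points
relative_gain = absolute_gain / bert_base_f1 * 100  # improvement relative to the baseline

print(f"{absolute_gain:.2f} points, {relative_gain:.1f}% relative")
```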

## Predict Some Samples

In [29]:
# Make predictions with the model
texts = [
    "Gubernur Bank Indonesia Agus Martowardojo bersama jajaran deputi Gubernur Bank Indonesia menggelar konferensi pers usai Rapat Dewan Gubernur di Bank Indonesia, Jakarta, Kamis (17/5/2015)",
    "Selama 24 jam puncak Mahameru di Malang kebanjiran pendaki dari Wina",
]

In [30]:
predictions, raw_outputs = model_bert_base.predict(texts)
predictions

INFO:simpletransformers.ner.ner_model: Converting to features started.






[[{'Gubernur': 'B-NOR'},
  {'Bank': 'I-NOR'},
  {'Indonesia': 'I-NOR'},
  {'Agus': 'B-PER'},
  {'Martowardojo': 'I-PER'},
  {'bersama': 'O'},
  {'jajaran': 'O'},
  {'deputi': 'B-ORG'},
  {'Gubernur': 'I-ORG'},
  {'Bank': 'I-ORG'},
  {'Indonesia': 'I-ORG'},
  {'menggelar': 'O'},
  {'konferensi': 'B-EVT'},
  {'pers': 'I-EVT'},
  {'usai': 'O'},
  {'Rapat': 'B-EVT'},
  {'Dewan': 'I-EVT'},
  {'Gubernur': 'I-EVT'},
  {'di': 'O'},
  {'Bank': 'B-ORG'},
  {'Indonesia,': 'I-ORG'},
  {'Jakarta,': 'B-GPE'},
  {'Kamis': 'B-DAT'},
  {'(17/5/2015)': 'I-DAT'}],
 [{'Selama': 'O'},
  {'24': 'B-QTY'},
  {'jam': 'I-QTY'},
  {'puncak': 'B-LOC'},
  {'Mahameru': 'I-LOC'},
  {'di': 'O'},
  {'Malang': 'B-GPE'},
  {'kebanjiran': 'O'},
  {'pendaki': 'O'},
  {'dari': 'O'},
  {'Wina': 'B-GPE'}]]

In [31]:
predictions, raw_outputs = model_xlmroberta_base.predict(texts)
predictions

INFO:simpletransformers.ner.ner_model: Converting to features started.






[[{'Gubernur': 'B-NOR'},
  {'Bank': 'I-ORG'},
  {'Indonesia': 'I-ORG'},
  {'Agus': 'B-PER'},
  {'Martowardojo': 'I-PER'},
  {'bersama': 'O'},
  {'jajaran': 'O'},
  {'deputi': 'O'},
  {'Gubernur': 'I-NOR'},
  {'Bank': 'I-NOR'},
  {'Indonesia': 'I-NOR'},
  {'menggelar': 'O'},
  {'konferensi': 'B-EVT'},
  {'pers': 'I-EVT'},
  {'usai': 'O'},
  {'Rapat': 'B-EVT'},
  {'Dewan': 'I-EVT'},
  {'Gubernur': 'I-EVT'},
  {'di': 'O'},
  {'Bank': 'B-ORG'},
  {'Indonesia,': 'I-ORG'},
  {'Jakarta,': 'B-GPE'},
  {'Kamis': 'B-DAT'},
  {'(17/5/2015)': 'I-DAT'}],
 [{'Selama': 'O'},
  {'24': 'B-QTY'},
  {'jam': 'I-QTY'},
  {'puncak': 'B-LOC'},
  {'Mahameru': 'I-LOC'},
  {'di': 'O'},
  {'Malang': 'B-GPE'},
  {'kebanjiran': 'O'},
  {'pendaki': 'O'},
  {'dari': 'O'},
  {'Wina': 'B-GPE'}]]

In [32]:
predictions, raw_outputs = model_xlmroberta_large.predict(texts)
predictions

INFO:simpletransformers.ner.ner_model: Converting to features started.






[[{'Gubernur': 'B-ORG'},
  {'Bank': 'I-ORG'},
  {'Indonesia': 'I-ORG'},
  {'Agus': 'B-PER'},
  {'Martowardojo': 'I-PER'},
  {'bersama': 'O'},
  {'jajaran': 'O'},
  {'deputi': 'O'},
  {'Gubernur': 'B-ORG'},
  {'Bank': 'I-ORG'},
  {'Indonesia': 'I-ORG'},
  {'menggelar': 'O'},
  {'konferensi': 'B-EVT'},
  {'pers': 'I-EVT'},
  {'usai': 'O'},
  {'Rapat': 'B-EVT'},
  {'Dewan': 'I-EVT'},
  {'Gubernur': 'I-EVT'},
  {'di': 'O'},
  {'Bank': 'B-ORG'},
  {'Indonesia,': 'I-ORG'},
  {'Jakarta,': 'B-GPE'},
  {'Kamis': 'B-DAT'},
  {'(17/5/2015)': 'I-DAT'}],
 [{'Selama': 'O'},
  {'24': 'B-QTY'},
  {'jam': 'I-QTY'},
  {'puncak': 'B-LOC'},
  {'Mahameru': 'I-LOC'},
  {'di': 'O'},
  {'Malang': 'B-GPE'},
  {'kebanjiran': 'O'},
  {'pendaki': 'O'},
  {'dari': 'O'},
  {'Wina': 'B-GPE'}]]
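
The per-token output is easier to read once consecutive B-/I- tags are merged into entity spans. A minimal sketch, assuming each sentence is a list of single-item dicts as in the predictions printed in this section; the helper `extract_entities` is hypothetical and not part of simpletransformers, and a stray I- tag without a matching B- tag is simply treated as the start of a new entity.

```python
def extract_entities(sentence_prediction):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities = []
    current_words, current_type = [], None
    for token in sentence_prediction:
        # Each token is a single-item dict such as {'Agus': 'B-PER'}
        (word, label), = token.items()
        prefix, _, entity_type = label.partition("-")
        if prefix == "B" or (prefix == "I" and entity_type != current_type):
            # A new entity starts; close the previous one if there is any
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [word], entity_type
        elif prefix == "I":
            # Continuation of the entity that is currently open
            current_words.append(word)
        else:
            # An 'O' tag closes any open entity
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities


# Second sample sentence with the labels predicted in this section
sample = [
    {'Selama': 'O'}, {'24': 'B-QTY'}, {'jam': 'I-QTY'}, {'puncak': 'B-LOC'},
    {'Mahameru': 'I-LOC'}, {'di': 'O'}, {'Malang': 'B-GPE'}, {'kebanjiran': 'O'},
    {'pendaki': 'O'}, {'dari': 'O'}, {'Wina': 'B-GPE'},
]
entities = extract_entities(sample)
print(entities)
```

The same function can be applied to every item of `predictions`, for example with `[extract_entities(p) for p in predictions]`.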