# IndoBenchmark: KEPS

Named-entity recognition (NER) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc. [[Wikipedia: Named Entity Recognition]](https://en.wikipedia.org/wiki/Named-entity_recognition)

We will try to fine-tune bert-base-indonesian-522M for the named-entity recognition (NER) task. For this purpose we use the [NERGRIT Corpus](https://github.com/grit-id/nergrit-corpus), which contains 321,757 training lines, 66,974 test lines and 64,208 validation lines. It uses the Inside-Outside-Beginning (IOB) format, where each line consists of a word and its label (category).
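To make the IOB format concrete, here is a minimal sketch that groups tagged words into labelled spans. The tag strings (`B-ORG`, `I-PER`, ...) and the tags assigned to the example words are my own illustrative assumptions, not lines copied from the corpus; the helper `iob_to_spans` is hypothetical as well.

```python
def iob_to_spans(words, tags):
    """Group (word, tag) pairs into (entity_type, text) spans using IOB tags."""
    spans, current, current_type = [], [], None
    for word, tag in zip(words, tags):
        prefix, _, entity = tag.partition("-")
        starts_new = prefix == "B" or (
            prefix == "I" and (not current or entity != current_type)
        )
        if starts_new:
            # Close the open span (if any) and start a new one
            if current:
                spans.append((current_type, " ".join(current)))
            current, current_type = [word], entity
        elif prefix == "I":
            # Continue the open span
            current.append(word)
        else:
            # "O": outside any entity, so close the open span
            if current:
                spans.append((current_type, " ".join(current)))
            current, current_type = [], None
    if current:
        spans.append((current_type, " ".join(current)))
    return spans


# Illustrative tags (assumed, not taken from the corpus)
words = ["Gubernur", "Bank", "Indonesia", "Agus", "Martowardojo"]
tags = ["O", "B-ORG", "I-ORG", "B-PER", "I-PER"]
print(iob_to_spans(words, tags))
# [('ORG', 'Bank Indonesia'), ('PER', 'Agus Martowardojo')]
```

The same function also accepts bare `B`/`I`/`O` tags; in that case the entity type of every span is simply the empty string.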

The [NERGRIT Corpus](https://github.com/grit-id/nergrit-corpus) is a very valuable dataset for Indonesian NLP researchers. Unfortunately the labels contain many typos and errors, so I spent some time analysing the errors, correcting them and reporting the [issue to their GitHub repository](https://github.com/grit-id/nergrit-corpus/issues/1). Since the license allows us to redistribute the dataset, I will also publish the original dataset together with its corrections. Currently the dataset is only available on [request](https://ner.grit.id/index.php/front/about) (click "Get NERGRIT Corpus").


## Transformers or Simpletransformers?

We use Simple Transformers here to simplify training and inference.

In [1]:
from simpletransformers.ner import NERModel, NERArgs
import pandas as pd
import logging
import sys

In [2]:
logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

We use the corrected dataset, which has fewer lines than the original (train: 309203, valid: 61680, test: 64568).

In [3]:
data_dir = "/dataset/indonlu/keps_keyword-extraction-prosa"
file_train = f'{data_dir}/train_preprocess.txt'
file_valid = f'{data_dir}/valid_preprocess.txt'
file_test = f'{data_dir}/test_preprocess_masked_label.txt'
#file_labels_map = f'{data_dir}/labels-map.csv'

Simple Transformers requires the dataset either as a pandas DataFrame with the columns **sentence_id**, **words** and **labels**, or as a text file in CoNLL format. The **sentence_id** is a consecutive number that determines which words belong to a given sentence, i.e. all words of the same sentence must share the same unique sentence_id.
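As a small illustration of how **sentence_id** ties words to sentences, the sketch below regroups flat rows into sentences. The rows are the first rows of the validation sample shown further down in this notebook; using plain tuples instead of a DataFrame is my simplification.

```python
from itertools import groupby

# (sentence_id, word, label) rows, as they would appear in the DataFrame
rows = [
    (0, "Teller", "B"),
    (0, "BCA", "I"),
    (0, "konter", "B"),
    (0, "1", "I"),
    (1, "admin", "O"),
    (1, "@halobca", "B"),
    (1, "kok", "O"),
]

# Consecutive rows with the same sentence_id form one sentence
sentences = {
    sentence_id: [(word, label) for _, word, label in group]
    for sentence_id, group in groupby(rows, key=lambda row: row[0])
}

for sentence_id, tokens in sentences.items():
    print(sentence_id, tokens)
```

If two different sentences accidentally share an id, they are merged into one sequence, so ids must stay unique when datasets are combined.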

In [4]:
# Read a NER file in CoNLL format and return a DataFrame with the columns sentence_id, words, labels
def get_pos_data(filename, labels_map=None):
    word_list = []
    sentence_counter = 0
    with open(filename) as fp:
        for cnt, line in enumerate(fp):
            try:
                texts = line.split()
                if len(texts) != 0:
                    word, label = ' '.join(texts[0:-1]), texts[-1]
                    if labels_map:
                        label = labels_map[label]
                    word_list.append([sentence_counter, word, label])
                else:
                    sentence_counter += 1
            except Exception:
                # Report the offending line, keep an empty placeholder row and start a new sentence
                print("Unexpected error:", sys.exc_info()[0], cnt, line)
                word_list.append([sentence_counter, "", ""])
                sentence_counter += 1
    print(f'read {cnt} lines')
    ner_data = pd.DataFrame(word_list, columns=["sentence_id", "words", "labels"])
    return ner_data

In [5]:
train_data = get_pos_data(file_train)

read 10384 lines


In [6]:
valid_data = get_pos_data(file_valid)

read 2561 lines


In [7]:
test_data = get_pos_data(file_test)

read 3952 lines


In [8]:
len(train_data),len(valid_data),len(test_data)

(9585, 2362, 3706)

In [9]:
labels = list(set(train_data['labels']))

In [10]:
labels.sort()

In [11]:
labels

['B', 'I', 'O']

In [12]:
train_data.iloc[:50]

Unnamed: 0,sentence_id,words,labels
0,0,Setelah,O
1,0,melalui,B
2,0,proses,B
3,0,telepon,I
4,0,yang,O
5,0,panjang,O
6,0,tutup,B
7,0,sudah,O
8,0,kartu,B
9,0,kredit,I


In [13]:
valid_data.head(10)

Unnamed: 0,sentence_id,words,labels
0,0,Teller,B
1,0,BCA,I
2,0,konter,B
3,0,1,I
4,1,admin,O
5,1,@halobca,B
6,1,kok,O
7,1,susah,B
8,1,dihubungi,B
9,1,ya,O


In [14]:
test_data.head(10)

Unnamed: 0,sentence_id,words,labels
0,0,ini,O
1,0,atm,O
2,0,bca,O
3,0,ui,O
4,0,kenapa,O
5,0,enggak,O
6,0,bisa,O
7,0,menarik,O
8,0,duit,O
9,0,saya,O


## The Labels

The NERGRIT corpus contains 19 entity types, each with an Inside and a Beginning tag, plus one Outside tag, for 39 categories altogether. The entity types have the following meanings:
1. 'CRD' --> Cardinal
1. 'DAT' --> Date
1. 'EVT' --> Event
1. 'FAC' --> Facility
1. 'GPE' --> Geopolitical Entity
1. 'LAW' --> Law Entity (such as Undang-Undang)
1. 'LOC' --> Location
1. 'MON' --> Money
1. 'NOR' --> Political Organization
1. 'ORD' --> Ordinal
1. 'ORG' --> Organization
1. 'PER' --> Person
1. 'PRC' --> Percent
1. 'PRD' --> Product
1. 'QTY' --> Quantity
1. 'REG' --> Religion
1. 'TIM' --> Time
1. 'WOA' --> Work of Art
1. 'LAN' --> Language
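The count of 39 categories follows directly from the list above (19 × 2 + 1). A minimal sketch, assuming the tags are written as `B-<entity>` and `I-<entity>` (the exact tag spelling in the corpus files is an assumption here):

```python
entities = [
    "CRD", "DAT", "EVT", "FAC", "GPE", "LAW", "LOC", "MON", "NOR", "ORD",
    "ORG", "PER", "PRC", "PRD", "QTY", "REG", "TIM", "WOA", "LAN",
]

# One Outside tag plus a Beginning and an Inside tag per entity type
tags = ["O"] + [f"{prefix}-{entity}" for entity in entities for prefix in ("B", "I")]

print(len(entities), len(tags))  # 19 39
```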

## The Training with bert-base-indonesian-522M

Since I have already pre-trained a BERT base model on Indonesian Wikipedia, I want to test its performance on this task.

In [31]:
# Configure the model
model_args = NERArgs()
model_args.num_train_epochs = 5
model_args.train_batch_size = 64
model_args.evaluate_during_training = True
model_args.output_dir = '/output/indonlu/keps/bert-base-indonesian-1.5G'
model_args.best_model_dir = f'{model_args.output_dir}/best_model'
model_args.overwrite_output_dir = True
model_args.fp16 = False
model_args.labels_list=labels
model_args.do_lower_case = True

In [32]:
model_args

NERArgs(adam_epsilon=1e-08, best_model_dir='/output/indonlu/keps/bert-base-indonesian-1.5G/best_model', cache_dir='cache_dir/', config={}, custom_layer_parameters=[], custom_parameter_groups=[], dataloader_num_workers=78, do_lower_case=True, dynamic_quantize=False, early_stopping_consider_epochs=False, early_stopping_delta=0, early_stopping_metric='eval_loss', early_stopping_metric_minimize=True, early_stopping_patience=3, encoding=None, eval_batch_size=8, evaluate_during_training=True, evaluate_during_training_silent=True, evaluate_during_training_steps=2000, evaluate_during_training_verbose=False, evaluate_each_epoch=True, fp16=False, gradient_accumulation_steps=1, learning_rate=4e-05, local_rank=-1, logging_steps=50, manual_seed=None, max_grad_norm=1.0, max_seq_length=128, model_name=None, model_type=None, multiprocessing_chunksize=500, n_gpu=1, no_cache=False, no_save=False, num_train_epochs=5, output_dir='/output/indonlu/keps/bert-base-indonesian-1.5G', overwrite_output_dir=True, 

In [33]:
model_bert_base = NERModel(
    #"bert", "cahya/bert-base-indonesian-522M", labels=labels, args=model_args
    "bert", "cahya/bert-base-indonesian-1.5G", labels=labels, args=model_args
)

Some weights of the model checkpoint at cahya/bert-base-indonesian-1.5G were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at cahya/bert-base-indonesia

In [34]:
# Train the model
model_bert_base.train_model(train_data, eval_data=valid_data)

INFO:simpletransformers.ner.ner_model: Converting to features started.



INFO:simpletransformers.ner.ner_model: Training of bert model complete. Saved to /output/indonlu/keps/bert-base-indonesian-1.5G.


In [35]:
# Evaluate the model with valid dataset
result, model_outputs, preds_list = model_bert_base.eval_model(valid_data)

INFO:simpletransformers.ner.ner_model: Converting to features started.



INFO:simpletransformers.ner.ner_model:{'eval_loss': 0.40929289281368253, 'precision': 0.8018779342723005, 'recall': 0.8323586744639376, 'f1_score': 0.816834050693448}
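As a quick sanity check, the logged `f1_score` is the harmonic mean of the logged precision and recall. The numbers below are copied from the log line above; nothing else is assumed.

```python
precision = 0.8018779342723005
recall = 0.8323586744639376

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 6))  # 0.816834
```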





In [36]:
# Evaluate the model on the test dataset (its labels are masked, so the metrics below are not meaningful; only preds_list is used)
result, model_outputs, preds_list = model_bert_base.eval_model(test_data)

INFO:simpletransformers.ner.ner_model: Converting to features started.



INFO:simpletransformers.ner.ner_model:{'eval_loss': 1.7482469870198158, 'precision': 0.0, 'recall': 0, 'f1_score': 0}
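The zeros above are expected rather than alarming: the test file has masked labels (every gold tag is `O`), so no predicted span can match a gold span. The sketch below is a simplified span-level scorer of my own, written in the spirit of entity-level NER metrics; it is not the code the library runs, and the helper names are hypothetical.

```python
def spans(tags):
    """Return the set of (start, end) spans in a B/I/O tag sequence."""
    found, start = set(), None
    # The extra "O" is a sentinel that closes a span ending at the last token
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag != "I" and start is not None:
            found.add((start, i))
            start = None
        if tag == "B":
            start = i
        elif tag == "I" and start is None:
            start = i  # treat a stray I as the start of a span
    return found


def precision_recall_f1(gold_seqs, pred_seqs):
    """Span-level precision, recall and F1; undefined ratios count as 0."""
    tp = n_pred = n_gold = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        gold_spans, pred_spans = spans(gold), spans(pred)
        tp += len(gold_spans & pred_spans)
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# First ten predicted tags of test sentence 0 versus its masked gold labels
pred = [["O", "B", "I", "I", "O", "B", "I", "I", "O", "O"]]
masked_gold = [["O"] * 10]
print(precision_recall_f1(masked_gold, pred))  # (0.0, 0.0, 0.0)
```

So the test-set evaluation is only useful for obtaining `preds_list`; the real score has to come from wherever the unmasked labels live.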





In [37]:
test_data.iloc[0:30]

Unnamed: 0,sentence_id,words,labels
0,0,ini,O
1,0,atm,O
2,0,bca,O
3,0,ui,O
4,0,kenapa,O
5,0,enggak,O
6,0,bisa,O
7,0,menarik,O
8,0,duit,O
9,0,saya,O


In [22]:
preds_list[:5]

[['O',
  'B',
  'I',
  'I',
  'O',
  'B',
  'I',
  'I',
  'O',
  'O',
  'O',
  'O',
  'O',
  'B',
  'I',
  'O'],
 ['O',
  'O',
  'O',
  'B',
  'I',
  'B',
  'I',
  'O',
  'B',
  'I',
  'O',
  'B',
  'O',
  'B',
  'O',
  'B',
  'I',
  'B',
  'I',
  'O',
  'O',
  'B',
  'I'],
 ['O', 'O', 'B', 'I', 'B', 'O', 'O', 'O', 'B', 'B', 'O', 'O', 'B', 'I'],
 ['O',
  'O',
  'O',
  'B',
  'O',
  'B',
  'I',
  'O',
  'O',
  'O',
  'O',
  'O',
  'O',
  'B',
  'I',
  'I',
  'B',
  'I'],
 ['O',
  'O',
  'B',
  'B',
  'B',
  'O',
  'B',
  'I',
  'I',
  'O',
  'I',
  'B',
  'I',
  'O',
  'O',
  'B',
  'O',
  'B',
  'I']]

In [41]:
len(preds_list)

247

In [42]:
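def print_result(preds_list, test_data, max_len=10):
    # Print the words of sentences 0..max_len next to their predicted labels
    for i in range(len(preds_list)):
        if i>max_len:
            break
        sentence = list(test_data[test_data['sentence_id']==i]['words'])
        # zip() stops at the shorter list, in case a prediction was truncated
        for word, pred in zip(sentence, preds_list[i]):
            print(f'{i}:{word}\t{pred}')

def save_result(preds_list, test_data, filename):
    # Write one CSV row per sentence: its index and the stringified label list
    # (test_data is unused; it is kept so that existing calls still work)
    with open(filename, 'w') as out_file:
        out_file.write('index,label\n')
        for index, preds in enumerate(preds_list):
            out_file.write(f'{index},"{preds}"\n')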
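Because `save_result` stores each label list as a quoted string, reading the file back needs one extra parsing step. A minimal sketch using only the standard library; the `sample` text imitates the file layout written above and is made up for illustration.

```python
import ast
import csv
import io

# Made-up sample in the layout written by save_result
sample = """index,label
0,"['O', 'B', 'I']"
1,"['O', 'O']"
"""

reader = csv.DictReader(io.StringIO(sample))
# literal_eval turns the stringified list back into a real Python list
preds = {int(row["index"]): ast.literal_eval(row["label"]) for row in reader}

print(preds[0])  # ['O', 'B', 'I']
```

For a real file, `open(output_fn, newline='')` would take the place of `io.StringIO(sample)`.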


In [43]:
#output_dir = "/output/indonlu/nerp"
output_fn = f'{model_args.output_dir}/pred.txt'

In [44]:
output_fn

'/output/indonlu/keps/bert-base-indonesian-1.5G/pred.txt'

In [45]:
print_result(preds_list, test_data, 6)

0:ini	O
0:atm	B
0:bca	I
0:ui	I
0:kenapa	O
0:enggak	B
0:bisa	I
0:menarik	I
0:duit	O
0:saya	O
0:lagi	O
0:buru-buru	O
0:mau	O
0:bayar	B
0:kosan	I
0:padahal	O
1:2	O
1:minggu	O
1:terakhir	O
1:bolak	O
1:balik	O
1:kcp	B
1:bca	B
1:gegara	I
1:isi	B
1:flazz	B
1:pakai	O
1:atm	B
1:sudah	O
1:terdebit	B
1:tetapi	O
1:saldo	B
1:flazz	O
1:enggak	B
1:menambah	I
1:sampai	O
1:sekarang	O
1:belum	O
1:kelar	I
2:kok	O
2:bisa-bisanya	O
2:atm	B
2:bca	I
2:error	B
2:sih	O
2:saya	O
2:sudah	O
2:masukkan	B
2:kartu	B
2:terus	O
2:atm-nya	B
2:enggak	B
2:jalan-jalan	I
3:lebih	O
3:baik	O
3:jangan	O
3:menukarkan	B
3:ke	O
3:atm	B
3:bca	I
3:yang	O
3:baru	O
3:kadang	O
3:di	O
3:tempat	O
3:yang	O
3:enggak	B
3:bisa	I
3:menerima	I
3:kartu	B
3:kredit	I
4:ketika	O
4:saya	O
4:transfer	B
4:menggunakan	O
4:atm	B
4:di	O
4:cabang	B
4:bca	I
4:siliwangi	I
4:setruk	B
4:kertasnya	I
4:tidak	B
4:keluar	I
4:sehingga	O
4:saya	O
4:tidak	B
4:mempunyai	I
4:bukti	I
4:transfer	I
5:mesin	O
5:atm	I
5:bca	I
5:di	O
5:alfamart	B
5:palopo	I
5:tidak	B
5:b

In [46]:
save_result(preds_list, test_data, output_fn)

In [None]:
!head $output_fn

In [None]:
result = pd.read_csv(output_fn, index_col='index')  # the file already starts with an 'index,label' header row

In [None]:
result.iloc[0:20]

### Nergrit 2 (train+valid)

In [None]:
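# Shift the validation sentence ids past the training ids so sentences from the two sets do not merge
train_data_all = pd.concat([train_data, valid_data.assign(sentence_id=valid_data['sentence_id'] + train_data['sentence_id'].max() + 1)], ignore_index=True)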

In [None]:
len(train_data_all), len(train_data), len(valid_data)

In [None]:
# Configure the model
model_args = NERArgs()
model_args.num_train_epochs = 5
model_args.train_batch_size = 32
model_args.evaluate_during_training = True
model_args.output_dir = '/output/indonlu/posp/bert-base-indonesian-1.5G-all'
model_args.best_model_dir = f'{model_args.output_dir}/best_model'
model_args.overwrite_output_dir = True
model_args.fp16 = False
model_args.labels_list=labels
model_args.do_lower_case = True

In [None]:
model_bert_base = NERModel(
    "bert", "cahya/bert-base-indonesian-1.5G", labels=labels, args=model_args
)

In [None]:
# Train the model
model_bert_base.train_model(train_data_all, eval_data=valid_data)

In [None]:
# Evaluate the model with valid dataset
result, model_outputs, preds_list = model_bert_base.eval_model(valid_data)

In [None]:
# Evaluate the model with test dataset
result, model_outputs, preds_list = model_bert_base.eval_model(test_data)

In [None]:
output_fn = f'{model_args.output_dir}/result.txt'

In [None]:
save_result(preds_list, test_data, output_fn)

In [None]:
output_fn

In [None]:
# Configure the model
model_args = NERArgs()
model_args.num_train_epochs = 5
model_args.train_batch_size = 32
model_args.evaluate_during_training = True
model_args.output_dir = '/output/indonlu/bert-base-indonesian'
model_args.best_model_dir = '/output/indonlu/bert-base-indonesian/best_model'
model_args.overwrite_output_dir = True
model_args.fp16 = False
model_args.labels_list=labels
model_args.do_lower_case = True

In [None]:
model_bert_base = NERModel(
    "bert", "/output/indonlu/bert-base-indonesian/best_model",  args=model_args
)

In [None]:
# Evaluate the model with valid dataset
result, model_outputs, preds_list = model_bert_base.eval_model(valid_data_2)

In [None]:
# Evaluate the model with test dataset
result, model_outputs, preds_list = model_bert_base.eval_model(test_data_2)

In [None]:
save_result(preds_list, test_data_2, output_fn)

### Nergrit 1 vs Nergrit 2

In [None]:
# Configure the model
model_args = NERArgs()
model_args.num_train_epochs = 5
model_args.train_batch_size = 32
model_args.evaluate_during_training = True
model_args.output_dir = '/output/indonlu/bert-base-indonesian'
model_args.best_model_dir = '/output/indonlu/bert-base-indonesian/best_model'
model_args.overwrite_output_dir = True
model_args.fp16 = False
model_args.labels_list=labels
model_args.do_lower_case = True

In [None]:
model_bert_base = NERModel(
    "bert", "cahya/bert-base-indonesian-522M", labels=labels, args=model_args
)

In [None]:
# Train the model
model_bert_base.train_model(train_data_1, eval_data=valid_data_2)

In [None]:
# Evaluate the model with valid dataset
result, model_outputs, preds_list = model_bert_base.eval_model(valid_data_2)

In [None]:
# Evaluate the model with test dataset
result, model_outputs, preds_list = model_bert_base.eval_model(test_data)

### (Nergrit 1 +  Nergrit 2) vs Nergrit 2

In [None]:
train_data_3 = pd.concat([train_data_1, train_data_2], ignore_index=True)

In [None]:
len(train_data_3),len(train_data_1),len(train_data_2)

In [None]:
train_data_1.head()

In [None]:
last_si = train_data_1.iloc[-1]['sentence_id']

In [None]:
train_data_tmp = train_data_2.copy()  # copy, so that the edits below do not modify train_data_2

In [None]:
train_data_tmp['sentence_id'] = 100

In [None]:
train_data_tmp.head()

In [None]:
train_data_2.head()

In [None]:
# Shift the sentence ids of the second set past the last id of the first set
train_data_tmp['sentence_id'] = train_data_2['sentence_id'] + last_si + 1

In [None]:
train_data_3 = pd.concat([train_data_1, train_data_tmp], ignore_index=True)
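The re-numbering done in the cells above can be sketched on plain tuples. This is my own simplified illustration (the helper `concat_with_offset` and the sample rows are hypothetical); it only shows why the ids of the second set must be shifted before concatenating.

```python
def concat_with_offset(first, second):
    """Concatenate two lists of (sentence_id, word, label) rows,
    shifting the sentence ids of the second list past those of the first."""
    offset = max(sid for sid, _, _ in first) + 1 if first else 0
    shifted = [(sid + offset, word, label) for sid, word, label in second]
    return list(first) + shifted


first = [(0, "kartu", "B"), (0, "kredit", "I"), (1, "atm", "B")]
second = [(0, "Teller", "B"), (1, "admin", "O")]

merged = concat_with_offset(first, second)
print([sid for sid, _, _ in merged])  # [0, 0, 1, 2, 3]
```

Without the offset, sentence 0 of the second set would be glued onto sentence 0 of the first set.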

In [None]:
train_data_3.iloc[309200: 309220]

In [None]:
model_bert_base = NERModel(
    "bert", "cahya/bert-base-indonesian-522M", labels=labels, args=model_args
)

In [None]:
# Train the model
model_bert_base.train_model(train_data_3, eval_data=valid_data_2)

In [None]:
# Evaluate the model with valid dataset
result, model_outputs, preds_list = model_bert_base.eval_model(valid_data_2)

In [None]:
preds_list

In [None]:
list(test_data[test_data['sentence_id']==0]['words'])

In [None]:
for i in range(len(preds_list)):
    sentence = list(test_data[test_data['sentence_id']==i]['words'])
    for j, word in enumerate(sentence):
        print(word, preds_list[i][j])
    if i>10:
        break

In [None]:
for i, row in test_data.iterrows():
    if i>10:
        break
    print(i, row['words'], preds_list[row['sentence_id']])

The result (F1-score: 80.17%) is quite similar to the F1-score the NERGRIT team achieved (about 80%). 
Last week I got a very low F1-score (about 60%) and was disappointed, because it was much lower than the F1-score achieved by the NERGRIT team. It turned out that the model was trained incorrectly: I trained bert-base-indonesian-522M as if it were a cased model (this is the default configuration). After I enabled lowercasing in the configuration (model_args.do_lower_case = True), the F1-score was much better.
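A toy illustration of why the casing flag matters for an uncased model. This is a deliberately simplified whole-word lookup with a hypothetical five-word vocabulary; a real WordPiece tokenizer would split unknown words into sub-word pieces rather than map them straight to `[UNK]`, but the mismatch between cased input and a lowercase vocabulary is the same.

```python
# Hypothetical lowercase-only vocabulary (an uncased model never saw capitals)
vocab = {"gubernur", "bank", "indonesia", "di", "jakarta"}


def lookup(tokens, do_lower_case):
    """Map each token to itself if it is in the vocabulary, else to [UNK]."""
    result = []
    for token in tokens:
        key = token.lower() if do_lower_case else token
        result.append(key if key in vocab else "[UNK]")
    return result


tokens = "Gubernur Bank Indonesia di Jakarta".split()
print(lookup(tokens, do_lower_case=False))
# ['[UNK]', '[UNK]', '[UNK]', 'di', '[UNK]']
print(lookup(tokens, do_lower_case=True))
# ['gubernur', 'bank', 'indonesia', 'di', 'jakarta']
```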


## The Training with xlm-roberta-base

I tried a multilingual model from Facebook: XLM-RoBERTa-base, which was pre-trained on 2.5 TB of data.

In [None]:
# Configure the model
model_args = NERArgs()
model_args.num_train_epochs = 5
model_args.train_batch_size = 128
model_args.evaluate_during_training = True
model_args.output_dir = '/output/ner/xlm-roberta-base'
model_args.best_model_dir = '/output/ner/xlm-roberta-base/best_model'
model_args.overwrite_output_dir = True
model_args.fp16 = False
model_args.labels_list=labels

In [None]:
model_xlmroberta_base = NERModel(
    "xlmroberta", "xlm-roberta-base", labels=labels, args=model_args
)

In [None]:
# Train the model
model_xlmroberta_base.train_model(train_data, eval_data=valid_data)

In [None]:
# Evaluate the model with valid dataset
result, model_outputs, preds_list = model_xlmroberta_base.eval_model(valid_data)

In [None]:
# Evaluate the model with test dataset
result, model_outputs, preds_list = model_xlmroberta_base.eval_model(test_data)

### Result

The result is great: an F1-score of 82.8%.

## The Training with xlm-roberta-large

Then I tried a second multilingual model from Facebook: XLM-RoBERTa-large.

In [None]:
# Configure the model
model_args = NERArgs()
model_args.num_train_epochs = 5
model_args.train_batch_size = 32
model_args.evaluate_during_training = True
model_args.output_dir = '/output/ner/xlm-roberta-large'
model_args.best_model_dir = '/output/ner/xlm-roberta-large/best_model'
model_args.overwrite_output_dir = True
model_args.fp16 = False
model_args.labels_list=labels

In [None]:
model_xlmroberta_large = NERModel(
    "xlmroberta", "xlm-roberta-large", labels=labels, args=model_args
)

In [None]:
# Train the model
model_xlmroberta_large.train_model(train_data, eval_data=valid_data)

In [None]:
# Evaluate the model with valid dataset
result, model_outputs, preds_list = model_xlmroberta_large.eval_model(valid_data)

In [None]:
# Evaluate the model with test dataset
result, model_outputs, preds_list = model_xlmroberta_large.eval_model(test_data)

### Result

Again, the result is great: it achieved an F1-score of 84.19%, about 4 percentage points better than bert-base-indonesian-522M. Maybe my LM needs more data for pre-training.

## Predict some Samples

In [None]:
# Make predictions with the model
texts = [
    "Gubernur Bank Indonesia Agus Martowardojo bersama jajaran deputi Gubernur Bank Indonesia menggelar konferensi pers usai Rapat Dewan Gubernur di Bank Indonesia, Jakarta, Kamis (17/5/2015)",
    "Selama 24 jam puncak Mahameru di Malang kebanjiran pendaki dari Wina",
]

In [None]:
predictions, raw_outputs = model_bert_base.predict(texts)
predictions

In [None]:
predictions, raw_outputs = model_xlmroberta_base.predict(texts)
predictions

In [None]:
predictions, raw_outputs = model_xlmroberta_large.predict(texts)
predictions
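To turn such predictions into readable keyphrases, something like the sketch below could be used. It assumes that `predict` returns one list per input text, each containing single-key `{word: tag}` dicts; please check this against the installed Simple Transformers version. The sample `predictions` value is hand-written (its tags mirror the printed test predictions for sentence 2 above), and `keyphrases` is a hypothetical helper.

```python
# Hand-written sample in the assumed output shape (not real model output)
predictions = [
    [{"atm": "B"}, {"bca": "I"}, {"error": "B"}, {"sih": "O"}],
]


def keyphrases(sentence_pred):
    """Join B/I-tagged words of one predicted sentence into phrases."""
    phrases, current = [], []
    for item in sentence_pred:
        (word, tag), = item.items()  # each dict holds exactly one word/tag pair
        if tag == "B":
            if current:
                phrases.append(" ".join(current))
            current = [word]
        elif tag == "I" and current:
            current.append(word)
        else:
            # "O" (or an "I" with no open phrase) closes the current phrase
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


for sentence_pred in predictions:
    print(keyphrases(sentence_pred))  # ['atm bca', 'error']
```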