<a href="https://colab.research.google.com/github/cateto/python4NLP/blob/main/ner/TNER_demo_(multi_lingual_NER).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T-NER: Multilingual NER Model
This notebook shows how to build an NER model for a non-English language.
All we need to do is (i) choose an appropriate dataset and (ii) finetune a multilingual language model.

### Multilingual NER datasets
First, decide which dataset to finetune the language model on, depending on the target language.

* [WikiAnn](https://huggingface.co/datasets/tner/wikiann): NER dataset in 282 languages where the source corpus comes from Wikipedia.
* [WikiNeural](https://huggingface.co/datasets/tner/wikineural): NER dataset in 9 languages where the source corpus comes from Wikipedia and WikiNews.

In this notebook, we finetune XLM-R on the Korean subset of WikiAnn to obtain a Korean NER model.

### Setup

In [1]:
# main package
%pip install tner -U
%pip list | grep tner

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tner
  Downloading tner-0.2.1.tar.gz (2.2 MB)
Collecting allennlp>=2.0.0
  Downloading allennlp-2.10.1-py3-none-any.whl (730 kB)
Collecting transformers
  Downloading transformers-4.24.0-py3-none-any.whl (5.5 MB)
Collecting sentencepiece
  Downloading sentencepiece-0.1.97-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting seqeval
  Downloading seqeval-1.2.2.tar.gz (43 kB)
Collecting datasets
  Downloading datasets-2.7.1-py3-none-any.whl (451 kB)
Collecting lmdb>=1.2.1
  

In [2]:
import logging
from tner import GridSearcher, TransformersNER

logging.basicConfig(format='%(asctime)s %(levelname)-8s %(message)s', level=logging.INFO, datefmt='%Y-%m-%d %H:%M:%S')
logger = logging.getLogger()
logger.setLevel(logging.INFO)

## WikiAnn Dataset

In [4]:
from datasets import load_dataset
data = load_dataset("tner/wikiann", "ko")

Downloading and preparing dataset wikiann/ko to /root/.cache/huggingface/datasets/tner___wikiann/ko/1.1.0/39367cc2fcf8467e7d7d81fdd2e3b5277c3b0c003bbe7f3e5e4895a41a141507...


INFO:datasets_modules.datasets.tner--wikiann.39367cc2fcf8467e7d7d81fdd2e3b5277c3b0c003bbe7f3e5e4895a41a141507.wikiann:generating examples from = /root/.cache/huggingface/datasets/downloads/9307960252b5660b36f727501b5a586ac3abebcbd9c21d4846f4115fcb5c1314


INFO:datasets_modules.datasets.tner--wikiann.39367cc2fcf8467e7d7d81fdd2e3b5277c3b0c003bbe7f3e5e4895a41a141507.wikiann:generating examples from = /root/.cache/huggingface/datasets/downloads/d401e38603f8d9bb6a8acc1a09b2e41fe7bdc00b48b283e67380a6fa83839791


INFO:datasets_modules.datasets.tner--wikiann.39367cc2fcf8467e7d7d81fdd2e3b5277c3b0c003bbe7f3e5e4895a41a141507.wikiann:generating examples from = /root/.cache/huggingface/datasets/downloads/7982a5cb450b60f4bb790fe72ccee0e644894d423d1c670165d5d010794cdf84


Dataset wikiann downloaded and prepared to /root/.cache/huggingface/datasets/tner___wikiann/ko/1.1.0/39367cc2fcf8467e7d7d81fdd2e3b5277c3b0c003bbe7f3e5e4895a41a141507. Subsequent calls will reuse this data.


In [5]:
data

DatasetDict({
    train: Dataset({
        features: ['tokens', 'tags'],
        num_rows: 20000
    })
    validation: Dataset({
        features: ['tokens', 'tags'],
        num_rows: 10000
    })
    test: Dataset({
        features: ['tokens', 'tags'],
        num_rows: 10000
    })
})
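
Each row pairs a list of `tokens` with a list of integer `tags`. The following is a minimal sketch of turning those ids back into label strings. It assumes the ids follow the `label2id` mapping that this notebook logs when the finetuned model is loaded; the example row is made up for illustration and is not taken from the dataset.

```python
# Mapping copied from the `label2id` log line printed when the model is loaded.
LABEL2ID = {'B-LOC': 0, 'B-ORG': 1, 'B-PER': 2, 'I-LOC': 3, 'I-ORG': 4, 'I-PER': 5, 'O': 6}
ID2LABEL = {idx: label for label, idx in LABEL2ID.items()}


def decode_tags(tag_ids):
    """Convert a list of integer tag ids into BIO label strings."""
    return [ID2LABEL[i] for i in tag_ids]


# Hypothetical row with the same fields as the dataset ('tokens', 'tags').
toy_row = {"tokens": ["Seoul", "is", "large"], "tags": [0, 6, 6]}
decoded = decode_tags(toy_row["tags"])
for token, label in zip(toy_row["tokens"], decoded):
    print(f"{token}\t{label}")
```

With the real dataset, the same function could be applied to `data["train"][0]["tags"]`, provided the assumption about the id mapping holds.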

## Finetuning
Finetuning follows the same procedure as the [model finetuning example](https://colab.research.google.com/drive/1AlcTbEsp8W11yflT7SyT0L4C4HG6MXYr?usp=sharing).
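
The grid search in this section is controlled by `epoch`, `epoch_partial`, and `n_max_config`: every configuration is trained for `epoch_partial` epochs, and only the best `n_max_config` of them continue to the full `epoch` budget. The sketch below illustrates that two-stage idea in plain Python. It is not the `GridSearcher` implementation; `expand_grid`, `two_stage_search`, and the toy scoring function are hypothetical names introduced only for this example.

```python
import math
from itertools import product


def expand_grid(grid):
    """Expand {'lr': [1e-4, 1e-5], 'crf': [True, False]} into a list of config dicts."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def two_stage_search(grid, score_fn, epoch, epoch_partial, n_max_config):
    """Return the best config after a short first stage and a full second stage."""
    configs = expand_grid(grid)
    # Stage 1: rank every configuration by its score after `epoch_partial` epochs.
    ranked = sorted(configs, key=lambda c: score_fn(c, epoch_partial), reverse=True)
    # Stage 2: only the top `n_max_config` configurations get the full `epoch` budget.
    finalists = ranked[:n_max_config]
    return max(finalists, key=lambda c: score_fn(c, epoch))


def toy_score(config, n_epochs):
    """Stand-in for 'train for n_epochs, then return validation F1' (made up)."""
    lr_penalty = 0.1 * abs(math.log10(config["lr"]) + 5)  # pretend 1e-5 is ideal
    crf_bonus = 0.02 if config["crf"] else 0.0
    return 0.5 + 0.01 * n_epochs + crf_bonus - lr_penalty


grid = {"lr": [1e-4, 1e-5], "crf": [True, False]}
best = two_stage_search(grid, toy_score, epoch=10, epoch_partial=5, n_max_config=2)
print(best)
```

In this notebook every hyperparameter list has a single value, so there is only one configuration and the first stage has nothing to prune; the two stages matter once several values are listed per hyperparameter.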


In [6]:
rm -rf ckpt_xlmr_wikiann_ko

In [7]:
searcher = GridSearcher(
   checkpoint_dir='./ckpt_xlmr_wikiann_ko',
   dataset="tner/wikiann",  # either of `dataset` (huggingface dataset) or `local_dataset` (custom dataset) should be given
   dataset_name='ko',
   model="xlm-roberta-base",  # language model to fine-tune
   epoch=10,  # total number of epochs (`L` in the figure)
   epoch_partial=5,  # number of epochs in the 1st stage (`M` in the figure)
   n_max_config=1,  # number of models passed to the 2nd stage (`K` in the figure)
   batch_size=32,
   gradient_accumulation_steps=[2],
   crf=[True],
   lr=[1e-5],
   weight_decay=[None],
   random_seed=[42],
   lr_warmup_step_ratio=[0.1],
   max_grad_norm=[10]
)
searcher.train()

INFO:root:INITIALIZE GRID SEARCHER: 1 configs to try
INFO:root:## 1st RUN: Configuration 0/1 ##
INFO:root:hyperparameters
INFO:root:	 * dataset: tner/wikiann
INFO:root:	 * dataset_split: train
INFO:root:	 * dataset_name: ko
INFO:root:	 * local_dataset: None
INFO:root:	 * model: xlm-roberta-base
INFO:root:	 * crf: True
INFO:root:	 * max_length: 128
INFO:root:	 * epoch: 10
INFO:root:	 * batch_size: 32
INFO:root:	 * lr: 1e-05
INFO:root:	 * random_seed: 42
INFO:root:	 * gradient_accumulation_steps: 2
INFO:root:	 * weight_decay: None
INFO:root:	 * lr_warmup_step_ratio: 0.1
INFO:root:	 * max_grad_norm: 10


INFO:root:initialize language model with `xlm-roberta-base`


Some weights of the model checkpoint at xlm-roberta-base were not used when initializing XLMRobertaForTokenClassification: ['lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing XLMRobertaForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of XLMRobertaForTokenClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-st

INFO:root:dataset preprocessing
INFO:root:encode all the data: 20000
INFO:root:preprocessed feature is saved at ./ckpt_xlmr_wikiann_ko/model_atiguy/cache/encoded_feature.pkl
INFO:root:start model training
INFO:root:	 * global step 50: loss: 1472.98, lr: 8.012820512820515e-07
INFO:root:	 * global step 100: loss: 1488.8, lr: 1.602564102564103e-06
INFO:root:	 * global step 150: loss: 1441.74, lr: 2.403846153846154e-06
INFO:root:	 * global step 200: loss: 1353.4, lr: 3.205128205128206e-06
INFO:root:	 * global step 250: loss: 1246.42, lr: 4.006410256410257e-06
INFO:root:	 * global step 300: loss: 1155.91, lr: 4.807692307692308e-06
INFO:root:	 * global step 350: loss: 1079.23, lr: 5.608974358974359e-06
INFO:root:	 * global step 400: loss: 1006.71, lr: 6.410256410256412e-06
INFO:root:	 * global step 450: loss: 943.57, lr: 7.211538461538462e-06
INFO:root:	 * global step 500: loss: 887.19, lr: 8.012820512820515e-06
INFO:root:	 * global step 550: loss: 838.33, lr: 8.814102564102565e-06
INFO:root

INFO:root:encode all the data: 10000
INFO:root:preprocessed feature is saved at ./ckpt_xlmr_wikiann_ko/encoded/xlm-roberta-base.128.dev.True.validation.pkl

INFO:root:dataset preprocessing
INFO:root:load optimizer from ./ckpt_xlmr_wikiann_ko/model_atiguy/optimizers/optimizer.5.pt
INFO:root:optimizer is loading on cuda
INFO:root:load scheduler from ./ckpt_xlmr_wikiann_ko/model_atiguy/optimizers/optimizer.5.pt
INFO:root:loading preprocessed feature from ./ckpt_xlmr_wikiann_ko/model_atiguy/cache/encoded_feature.pkl
INFO:root:start model training
INFO:root:	 * global step 50: loss: 115.73, lr: 5.4665242165242175e-06
INFO:root:	 * global step 100: loss: 118.82, lr: 5.377492877492878e-06
INFO:root:	 * global step 150: loss: 120.64, lr: 5.288461538461539e-06
INFO:root:	 * global step 200: loss: 125.28, lr: 5.1994301994302e-06
INFO:root:	 * global step 250: loss: 124.74, lr: 5.110398860398861e-06
INFO:root:	 * global step 300: loss: 124.4, lr: 5.021367521367522e-06
INFO:root:	 * global step 350: loss: 122.87, lr: 4.932336182336183e-06
INFO:root:	 * global step 400: loss: 121.74, lr: 4.8433048433048435e-06
INFO:root:	 * global step 450: loss: 119.9

INFO:root:loading preprocessed feature from ./ckpt_xlmr_wikiann_ko/encoded/xlm-roberta-base.128.dev.True.validation.pkl

INFO:root:dataset preprocessing
INFO:root:load optimizer from ./ckpt_xlmr_wikiann_ko/model_atiguy/optimizers/optimizer.10.pt
INFO:root:optimizer is loading on cuda
INFO:root:load scheduler from ./ckpt_xlmr_wikiann_ko/model_atiguy/optimizers/optimizer.10.pt
INFO:root:loading preprocessed feature from ./ckpt_xlmr_wikiann_ko/model_atiguy/cache/encoded_feature.pkl
INFO:root:start model training
INFO:root:	 * global step 50: loss: 81.99, lr: 9.291032696665588e-07
INFO:root:	 * global step 100: loss: 83.45, lr: 8.481709291032697e-07
INFO:root:	 * global step 150: loss: 85.97, lr: 7.672385885399807e-07
INFO:root:	 * global step 200: loss: 90.53, lr: 6.863062479766916e-07
INFO:root:	 * global step 250: loss: 89.9, lr: 6.053739074134024e-07
INFO:root:	 * global step 300: loss: 90.06, lr: 5.244415668501133e-07
INFO:root:	 * global step 350: loss: 88.67, lr: 4.4350922628682423e-07
INFO:root:	 * global step 400: loss: 87.63, lr: 3.6257688572353516e-07
INFO:root:	 * global step 450: loss: 86.57, lr

INFO:root:loading preprocessed feature from ./ckpt_xlmr_wikiann_ko/encoded/xlm-roberta-base.128.dev.True.validation.pkl
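
The `lr` values in the training log rise during the first steps and then fall, which is what a linear warmup followed by a linear decay would produce with `lr=1e-5` and `lr_warmup_step_ratio=0.1`. The sketch below is a generic version of such a schedule, not code taken from `tner`. The value `total_steps=6240` (624 optimizer steps per epoch over 10 epochs) is an assumption inferred by fitting the logged values, and the step counter in the log appears to restart when training resumes, so step 50 after loading `optimizer.5.pt` is treated here as step 5 * 624 + 50.

```python
import math


def linear_schedule_lr(step, base_lr, total_steps, warmup_ratio):
    """Learning rate with linear warmup followed by linear decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


TOTAL_STEPS = 6240  # assumption inferred from the log, see the text above
STEPS_PER_EPOCH = 624  # assumption, TOTAL_STEPS / 10 epochs

lr_step_50 = linear_schedule_lr(50, 1e-5, TOTAL_STEPS, 0.1)
lr_after_resume = linear_schedule_lr(5 * STEPS_PER_EPOCH + 50, 1e-5, TOTAL_STEPS, 0.1)

# Values logged by the notebook at the corresponding steps.
print(lr_step_50, "vs logged", 8.012820512820515e-07)
print(lr_after_resume, "vs logged", 5.4665242165242175e-06)
print(math.isclose(lr_step_50, 8.012820512820515e-07, rel_tol=1e-9))
```

The schedule logged after `optimizer.10.pt` is loaded does not follow these constants, so the sketch should be read as an explanation of the first ten epochs only.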


### Evaluation
The best model is now stored at `ckpt_xlmr_wikiann_ko/best_model`, so let's load it and run evaluation on the test split.

In [8]:
model = TransformersNER("ckpt_xlmr_wikiann_ko/best_model")

INFO:root:initialize language model with `ckpt_xlmr_wikiann_ko/best_model`
INFO:root:use CRF
INFO:root:loading pre-trained CRF layer
INFO:root:label2id: {'B-LOC': 0, 'B-ORG': 1, 'B-PER': 2, 'I-LOC': 3, 'I-ORG': 4, 'I-PER': 5, 'O': 6}
INFO:root:device   : cuda
INFO:root:gpus     : 1


***Check Prediction on English and Korean***

In [9]:
model.predict(["Jacob Collier is a Grammy awarded English artist from London"]) 

INFO:root:encode all the data: 1


{'prediction': [['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-LOC']],
 'probability': [[0.8120394349098206,
   0.82203608751297,
   0.9939196109771729,
   0.9850554466247559,
   0.8627651929855347,
   0.9818136692047119,
   0.9024280309677124,
   0.9622467160224915,
   0.9283892512321472,
   0.8458589315414429]],
 'input': [['Jacob',
   'Collier',
   'is',
   'a',
   'Grammy',
   'awarded',
   'English',
   'artist',
   'from',
   'London']],
 'entity_prediction': [[{'type': 'LOC',
    'entity': ['London'],
    'position': [9],
    'probability': [0.8458589315414429]}]]}

In [17]:
model.predict(["마시호, 방예담이 YG엔터테인먼트와 전속계약을 종료하며 그룹 트레저를 탈퇴한다."]) 

INFO:root:encode all the data: 1


{'prediction': [['B-PER', 'B-PER', 'B-ORG', 'O', 'O', 'B-ORG', 'I-ORG', 'O']],
 'probability': [[0.9780898094177246,
   0.9826644062995911,
   0.9855415225028992,
   0.9987155199050903,
   0.999717652797699,
   0.7492053508758545,
   0.5149374008178711,
   0.9996532201766968]],
 'input': [['마시호,',
   '방예담이',
   'YG엔터테인먼트와',
   '전속계약을',
   '종료하며',
   '그룹',
   '트레저를',
   '탈퇴한다.']],
 'entity_prediction': [[{'type': 'PER',
    'entity': ['마시호,'],
    'position': [0],
    'probability': [0.9780898094177246]},
   {'type': 'PER',
    'entity': ['방예담이'],
    'position': [1],
    'probability': [0.9826644062995911]},
   {'type': 'ORG',
    'entity': ['YG엔터테인먼트와'],
    'position': [2],
    'probability': [0.9855415225028992]},
   {'type': 'ORG',
    'entity': ['그룹', '트레저를'],
    'position': [5, 6],
    'probability': [0.7492053508758545, 0.5149374008178711]}]]}
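
The `entity_prediction` field groups the token-level BIO labels in `prediction` into entity spans: a `B-` label opens a span and following `I-` labels of the same type extend it. A minimal sketch of that grouping is shown below, applied to the Korean output above. `group_entities` is a hypothetical helper written for this example, not the function `tner` uses, and it simply drops an `I-` label that has no matching open span.

```python
def group_entities(tokens, labels):
    """Group BIO labels into entity spans with type, tokens, and positions."""
    entities = []
    current = None  # the span currently being extended, if any
    for pos, (token, label) in enumerate(zip(tokens, labels)):
        if label.startswith("B-"):
            current = {"type": label[2:], "entity": [token], "position": [pos]}
            entities.append(current)
        elif label.startswith("I-") and current is not None and current["type"] == label[2:]:
            current["entity"].append(token)
            current["position"].append(pos)
        else:
            # 'O' or an inconsistent 'I-' label closes the open span.
            current = None
    return entities


tokens = ['마시호,', '방예담이', 'YG엔터테인먼트와', '전속계약을', '종료하며', '그룹', '트레저를', '탈퇴한다.']
labels = ['B-PER', 'B-PER', 'B-ORG', 'O', 'O', 'B-ORG', 'I-ORG', 'O']

spans = group_entities(tokens, labels)
for span in spans:
    print(span["type"], span["entity"], span["position"])
```

Note that the tokens come from whitespace splitting, so Korean particles and punctuation stay attached to the entity tokens (for example `방예담이` and `마시호,`).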

***Run Evaluation on Korean and English***

In [18]:
metric = model.evaluate('tner/wikiann', dataset_name='ko', dataset_split='test', batch_size=16)



INFO:root:encode all the data: 10000


In [19]:
metric

{'micro/f1': 0.8426850964828247,
 'micro/f1_ci': {},
 'micro/recall': 0.8538445538376205,
 'micro/precision': 0.8318135764944276,
 'macro/f1': 0.8367769538923128,
 'macro/f1_ci': {},
 'macro/recall': 0.8475773798155887,
 'macro/precision': 0.8263112516228963,
 'per_entity_metric': {'location': {'f1': 0.8899313117775173,
   'f1_ci': {},
   'precision': 0.8732533289495314,
   'recall': 0.9072587532023911},
  'organization': {'f1': 0.7666051660516605,
   'f1_ci': {},
   'precision': 0.7636113025499656,
   'recall': 0.7696225978235702},
  'person': {'f1': 0.8537943838477605,
   'f1_ci': {},
   'precision': 0.842069123369192,
   'recall': 0.8658507884208049}}}
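
The numbers in this dictionary are related in a simple way: the micro F1 is the harmonic mean of the micro precision and recall, and the macro F1 is the unweighted mean of the per-entity F1 scores. The short check below recomputes both from the values printed above, as a sanity check rather than a replacement for `model.evaluate`.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the Korean test-split metrics above.
micro_precision = 0.8318135764944276
micro_recall = 0.8538445538376205
per_entity_f1 = {
    "location": 0.8899313117775173,
    "organization": 0.7666051660516605,
    "person": 0.8537943838477605,
}

micro_f1 = f1_score(micro_precision, micro_recall)
macro_f1 = sum(per_entity_f1.values()) / len(per_entity_f1)

print("micro/f1 recomputed:", micro_f1, "reported:", 0.8426850964828247)
print("macro/f1 recomputed:", macro_f1, "reported:", 0.8367769538923128)
```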

In [20]:
metric = model.evaluate('tner/wikiann', dataset_name='en', dataset_split='test', batch_size=16)

Downloading and preparing dataset wikiann/en to /root/.cache/huggingface/datasets/tner___wikiann/en/1.1.0/39367cc2fcf8467e7d7d81fdd2e3b5277c3b0c003bbe7f3e5e4895a41a141507...


INFO:datasets_modules.datasets.tner--wikiann.39367cc2fcf8467e7d7d81fdd2e3b5277c3b0c003bbe7f3e5e4895a41a141507.wikiann:generating examples from = /root/.cache/huggingface/datasets/downloads/06684d838cd2153caf9b3359675851a981cf63f49cfd4f2cceccbb3a273944f9


INFO:datasets_modules.datasets.tner--wikiann.39367cc2fcf8467e7d7d81fdd2e3b5277c3b0c003bbe7f3e5e4895a41a141507.wikiann:generating examples from = /root/.cache/huggingface/datasets/downloads/c98d89737aa07dd67176256bc180161e0434b488175da1e0d846bc6fcb9045bd


INFO:datasets_modules.datasets.tner--wikiann.39367cc2fcf8467e7d7d81fdd2e3b5277c3b0c003bbe7f3e5e4895a41a141507.wikiann:generating examples from = /root/.cache/huggingface/datasets/downloads/c6577d8c7e841d7c5e2d8c6ec63c560f6f7eea64a458006137470b92287bcd99


Dataset wikiann downloaded and prepared to /root/.cache/huggingface/datasets/tner___wikiann/en/1.1.0/39367cc2fcf8467e7d7d81fdd2e3b5277c3b0c003bbe7f3e5e4895a41a141507. Subsequent calls will reuse this data.


INFO:root:encode all the data: 10000


In [21]:
metric

{'micro/f1': 0.5561998055245435,
 'micro/f1_ci': {},
 'micro/recall': 0.5532311219372403,
 'micro/precision': 0.5592005213990876,
 'macro/f1': 0.5569730722665464,
 'macro/f1_ci': {},
 'macro/recall': 0.5544470506964195,
 'macro/precision': 0.5619927476287483,
 'per_entity_metric': {'location': {'f1': 0.42449166095499197,
   'f1_ci': {},
   'precision': 0.4535025628508665,
   'recall': 0.39896929353661154},
  'organization': {'f1': 0.5160841938046068,
   'f1_ci': {},
   'precision': 0.48789187159752206,
   'recall': 0.5477344573234985},
  'person': {'f1': 0.7303433620400404,
   'f1_ci': {},
   'precision': 0.7445838084378563,
   'recall': 0.7166374012291484}}}
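
To see where the Korean-finetuned model loses the most when it is applied to English, the per-entity F1 scores of the two evaluations can be compared directly. The snippet below does that with the values reported in this notebook; it is only a small bookkeeping sketch and the variable names are chosen for illustration.

```python
# Per-entity F1 copied from the two `metric` outputs in this notebook.
f1_ko = {
    "location": 0.8899313117775173,
    "organization": 0.7666051660516605,
    "person": 0.8537943838477605,
}
f1_en = {
    "location": 0.42449166095499197,
    "organization": 0.5160841938046068,
    "person": 0.7303433620400404,
}

# Absolute drop in F1 when moving from the Korean to the English test split.
drops = {entity: f1_ko[entity] - f1_en[entity] for entity in f1_ko}
largest_drop = max(drops, key=drops.get)

for entity, drop in sorted(drops.items(), key=lambda item: item[1], reverse=True):
    print(f"{entity:<13} ko={f1_ko[entity]:.3f} en={f1_en[entity]:.3f} drop={drop:.3f}")
```

On these numbers the drop is largest for `location` and smallest for `person`.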

In [24]:
!tar cvf ckpt_xlmr_wikiann_ko_best_model.tar ./ckpt_xlmr_wikiann_ko/best_model

./ckpt_xlmr_wikiann_ko/best_model/
./ckpt_xlmr_wikiann_ko/best_model/tokenizer.json
./ckpt_xlmr_wikiann_ko/best_model/pytorch_model.bin
./ckpt_xlmr_wikiann_ko/best_model/trainer_config.json
./ckpt_xlmr_wikiann_ko/best_model/tokenizer_config.json
./ckpt_xlmr_wikiann_ko/best_model/eval/
./ckpt_xlmr_wikiann_ko/best_model/eval/metric.json
./ckpt_xlmr_wikiann_ko/best_model/eval/prediction.validation.json
./ckpt_xlmr_wikiann_ko/best_model/config.json
./ckpt_xlmr_wikiann_ko/best_model/sentencepiece.bpe.model
./ckpt_xlmr_wikiann_ko/best_model/special_tokens_map.json
