# T-NER: Model Training Example
An example of using [T-NER](https://github.com/asahi417/tner) to finetune and evaluate a language model on named entity recognition (NER).

***Table of Contents***  
- [Finetuning & Evaluation on single dataset](https://colab.research.google.com/drive/1AlcTbEsp8W11yflT7SyT0L4C4HG6MXYr#scrollTo=23QyG8ypSILQ&line=2&uniqifier=1)
- [Finetuning & Evaluation on multiple datasets](https://colab.research.google.com/drive/1AlcTbEsp8W11yflT7SyT0L4C4HG6MXYr#scrollTo=L7R5qjXRdPWb&line=2&uniqifier=1)
- [Finetuning & Evaluation on a custom dataset](https://colab.research.google.com/drive/1AlcTbEsp8W11yflT7SyT0L4C4HG6MXYr#scrollTo=nB6i22foeCjV&line=1&uniqifier=1)

### Setup

In [3]:
# main package
%pip install tner -U
%pip list | grep tner

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tner
  Using cached tner-0.2.0.tar.gz (2.2 MB)
Collecting allennlp>=2.0.0
  Using cached allennlp-2.10.1-py3-none-any.whl (730 kB)
Collecting transformers
  Using cached transformers-4.24.0-py3-none-any.whl (5.5 MB)
Collecting sentencepiece
  Using cached sentencepiece-0.1.97-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting seqeval
  Using cached seqeval-1.2.2.tar.gz (43 kB)
Collecting datasets
  Using cached datasets-2.7.0-py3-none-any.whl (451 kB)
Collecting lmdb>=1.2.1
  Using cached lmdb-1.3.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (298 kB)
Collecting tensorboardX>=1.2
  Downloading tensorboardX-2.5.1-py2.py3-none-any.whl (125 kB)
Collecting huggingface-hub>=0.0.16
  Downloading huggingface_hub-0.11.0-py3-none-any.whl (182 kB)

In [4]:
import logging
from tner import GridSearcher, TransformersNER

logging.basicConfig(format='%(asctime)s %(levelname)-8s %(message)s', level=logging.INFO, datefmt='%Y-%m-%d %H:%M:%S')
logger = logging.getLogger()
logger.setLevel(logging.INFO)

## Finetuning
Let's finetune `roberta-large` on the `tner/tweebank_ner` dataset!


In [5]:
searcher = GridSearcher(
   checkpoint_dir='./ckpt_bert_tweebank',
   dataset="tner/tweebank_ner",  # either of `dataset` (huggingface dataset) or `local_dataset` (custom dataset) should be given
   model="roberta-large",  # language model to fine-tune
   epoch=10,  # the total epoch (`L` in the figure)
   epoch_partial=5,  # the number of epoch at 1st stage (`M` in the figure)
   n_max_config=1,  # the number of models to pass to 2nd stage (`K` in the figure)
   batch_size=32,
   gradient_accumulation_steps=[2],
   crf=[True],
   lr=[1e-3, 1e-4],
   weight_decay=[None], 
   random_seed=[42],
   lr_warmup_step_ratio=[0.1],
   max_grad_norm=[None, 10])
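
Each list-valued argument above is one axis of the search, and the grid is the Cartesian product of those axes: two values of `lr` times two values of `max_grad_norm` give four configurations, which agrees with the `4 configs to try` line in the training log. The sketch below counts them with `itertools.product`. The dictionary `search_space` is a hypothetical stand-in for the arguments and is not T-NER's internal representation, and the enumeration order is not claimed to match T-NER's.

```python
from itertools import product

# Hypothetical stand-in for the list-valued GridSearcher arguments above.
search_space = {
    "gradient_accumulation_steps": [2],
    "crf": [True],
    "lr": [1e-3, 1e-4],
    "weight_decay": [None],
    "random_seed": [42],
    "lr_warmup_step_ratio": [0.1],
    "max_grad_norm": [None, 10],
}

# One configuration per element of the Cartesian product of all axes.
configs = [
    dict(zip(search_space, values))
    for values in product(*search_space.values())
]

print(len(configs))
for config in configs:
    print(config["lr"], config["max_grad_norm"])
```

Adding a second value to any other axis, for example `crf=[True, False]`, would double the number of first-stage runs.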

In [None]:
searcher.train()

INFO:root:INITIALIZE GRID SEARCHER: 4 configs to try
INFO:root:## 1st RUN: Configuration 0/4 ##
INFO:root:hyperparameters
INFO:root:	 * dataset: tner/tweebank_ner
INFO:root:	 * dataset_split: train
INFO:root:	 * dataset_name: None
INFO:root:	 * local_dataset: None
INFO:root:	 * model: roberta-large
INFO:root:	 * crf: True
INFO:root:	 * max_length: 128
INFO:root:	 * epoch: 10
INFO:root:	 * batch_size: 32
INFO:root:	 * lr: 0.001
INFO:root:	 * random_seed: 42
INFO:root:	 * gradient_accumulation_steps: 2
INFO:root:	 * weight_decay: None
INFO:root:	 * lr_warmup_step_ratio: 0.1
INFO:root:	 * max_grad_norm: None


Downloading and preparing dataset tweebank_ner/tweebank_ner to /root/.cache/huggingface/datasets/tner___tweebank_ner/tweebank_ner/1.0.0/1a4677dc1bfe316a09cdbca887f9a7f1b1b01bc7dcd8c2cc234962a15af9f812...
INFO:datasets_modules.datasets.tner--tweebank_ner.1a4677dc1bfe316a09cdbca887f9a7f1b1b01bc7dcd8c2cc234962a15af9f812.tweebank_ner:generating examples from = /root/.cache/huggingface/datasets/downloads/0285d49e607834ff24ef0941f45ded26f90d8565d04e97e6786e102e3edc30f9
INFO:datasets_modules.datasets.tner--tweebank_ner.1a4677dc1bfe316a09cdbca887f9a7f1b1b01bc7dcd8c2cc234962a15af9f812.tweebank_ner:generating examples from = /root/.cache/huggingface/datasets/downloads/67688699ff05bc940757d7c4dee5bc6e9701385931f42f6817cf813fcaf778e2
INFO:datasets_modules.datasets.tner--tweebank_ner.1a4677dc1bfe316a09cdbca887f9a7f1b1b01bc7dcd8c2cc234962a15af9f812.tweebank_ner:generating examples from = /root/.cache/huggingface/datasets/downloads/e2ea8b93def80d293ad509073b49ca65f984a5b572a99c9389406c9c351f8f5c
Dataset tweebank_ner downloaded and prepared to /root/.cache/huggingface/datasets/tner___tweebank_ner/tweebank_ner/1.0.0/1a4677dc1bfe316a09cdbca887f9a7f1b1b01bc7dcd8c2cc234962a15af9f812. Subsequent calls will reuse this data.

INFO:root:initialize language model with `roberta-large`



Some weights of the model checkpoint at roberta-large were not used when initializing RobertaForTokenClassification: ['lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing RobertaForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForTokenClassification were not initialized from the model checkpoint at roberta-large and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be ab


INFO:root:dataset preprocessing
INFO:root:encode all the data: 1639
INFO:root:preprocessed feature is saved at ./ckpt_bert_tweebank/model_ubskso/cache/encoded_feature.pkl
INFO:root:start model training
INFO:root:	 * global step 50: loss: 975.18, lr: 0.001
INFO:root:[epoch 0/10] average loss: 959.42, lr: 0.001
INFO:root:model saving at ./ckpt_bert_tweebank/model_ubskso/epoch_1
INFO:root:saving model weight at ./ckpt_bert_tweebank/model_ubskso/epoch_1
INFO:root:saving tokenizer at ./ckpt_bert_tweebank/model_ubskso/epoch_1
INFO:root:optimizer saving at ./ckpt_bert_tweebank/model_ubskso/optimizers/optimizer.1.pt
INFO:root:remove old optimizer files
INFO:root:	 * global step 50: loss: 282.26, lr: 0.0008888888888888888
INFO:root:[epoch 1/10] average loss: 281.41, lr: 0.0008888888888888888
INFO:root:model saving at ./ckpt_bert_tweebank/model_ubskso/epoch_2
INFO:root:saving model weight at ./ckpt_bert_tweebank/model_ubskso/epoch_2
INFO:root:saving tokenizer at ./ckpt_bert_tweebank/model_ubskso


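The two learning rates in the log above (`0.001` for epoch 0, then `0.000888...` for epoch 1) are consistent with a linear warmup followed by a linear decay to zero, given `epoch=10` and `lr_warmup_step_ratio=0.1`. The sketch below is a plain-Python illustration of that schedule shape. The function name `linear_warmup_decay` and the step counting (one scheduler step per epoch) are assumptions made for illustration and are not taken from T-NER's source.

```python
def linear_warmup_decay(step, total_steps, warmup_ratio, base_lr):
    """Learning rate at `step` for a linear warmup followed by a linear decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 towards base_lr during warmup.
        return base_lr * (step / max(1, warmup_steps))
    # Decay linearly from base_lr (end of warmup) to 0 (at total_steps).
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


# Assumed setting: 10 scheduler steps (one per epoch), 10% warmup, peak lr 1e-3.
schedule = [
    linear_warmup_decay(step, total_steps=10, warmup_ratio=0.1, base_lr=1e-3)
    for step in range(1, 11)
]
for step, lr in enumerate(schedule, start=1):
    print(step, lr)
```

Under these assumptions the first value is the peak rate and the second is 8/9 of it, the same ratio as between the two logged rates.
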
### Evaluation
The best model is now stored at `ckpt_bert_tweebank/best_model`, so let's load it and run evaluation on the test split.

First, we load the model and check the prediction.

In [None]:
model = TransformersNER("ckpt_bert_tweebank/best_model")
model.predict(["Jacob Collier is a Grammy awarded English artist from London"]) 

The model instance also has an `evaluate` function that runs evaluation on a dataset.

In [None]:
metric = model.evaluate('tner/tweebank_ner', dataset_split='test', batch_size=16)

In [None]:
metric
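
NER is usually scored at the entity-span level rather than per token, and the setup log shows `seqeval`, a span-level scorer, being installed alongside T-NER. The contents of `metric` are not shown here, so the sketch below only illustrates what a span-level micro F1 means on BIO tags. It is a pure-Python illustration, not T-NER's or seqeval's implementation; the helper names `extract_spans` and `micro_f1` and the `PER`/`LOC` tags are hypothetical.

```python
def extract_spans(tags):
    """Return the set of (entity_type, start, end) spans in a BIO tag sequence."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues and label is not None:
            spans.add((label, start, i))  # end index is exclusive
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]  # a stray I- tag is treated as a span start
    return spans


def micro_f1(gold_sequences, predicted_sequences):
    """Micro-averaged F1 over exact span matches (same type and boundaries)."""
    true_positive = n_gold = n_predicted = 0
    for gold, predicted in zip(gold_sequences, predicted_sequences):
        gold_spans, predicted_spans = extract_spans(gold), extract_spans(predicted)
        true_positive += len(gold_spans & predicted_spans)
        n_gold += len(gold_spans)
        n_predicted += len(predicted_spans)
    precision = true_positive / n_predicted if n_predicted else 0.0
    recall = true_positive / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
predicted = [["B-PER", "I-PER", "O", "O"]]
f1 = micro_f1(gold, predicted)
print(round(f1, 4))
```

In this toy case the prediction finds one of two gold entities with no false positives, so precision is 1.0, recall is 0.5, and F1 is 2/3.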

## Finetuning on multiple datasets
To finetune on multiple datasets, pass a list of dataset names to the `dataset` argument. The resulting model can then be evaluated on each dataset separately.

In [None]:
searcher = GridSearcher(
   checkpoint_dir='./ckpt_bert_multiple_dataset',
   dataset=["tner/wnut2017", "tner/fin"],  # either of `dataset` (huggingface dataset) or `local_dataset` (custom dataset) should be given
   model="distilbert-base-cased",  # language model to fine-tune
   epoch=10,  # the total epoch (`L` in the figure)
   epoch_partial=5,  # the number of epoch at 1st stage (`M` in the figure)
   n_max_config=1,  # the number of models to pass to 2nd stage (`K` in the figure)
   batch_size=32,
   gradient_accumulation_steps=[2],
   crf=[True],
   lr=[1e-3, 1e-4],
   weight_decay=[None],
   random_seed=[42],
   lr_warmup_step_ratio=[0.1],
   max_grad_norm=[None, 10]
)
searcher.train()

In [None]:
model = TransformersNER("ckpt_bert_multiple_dataset/best_model")
metric_wnut = model.evaluate('tner/wnut2017', dataset_split='test', batch_size=16)
metric_fin = model.evaluate('tner/fin', dataset_split='test', batch_size=16)

In [None]:
metric_wnut

In [None]:
metric_fin

## Finetuning on a custom dataset
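T-NER can also be finetuned on a [custom dataset](https://github.com/asahi417/tner/tree/master/examples/custom_dataset_sample) stored in local files. Here we download a small sample with train, validation, and test splits.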

In [None]:
!mkdir -p ./custom_data
!wget https://raw.githubusercontent.com/asahi417/tner/master/examples/local_dataset_sample/train.txt -O custom_data/train.txt
!wget https://raw.githubusercontent.com/asahi417/tner/master/examples/local_dataset_sample/valid.txt -O custom_data/valid.txt
!wget https://raw.githubusercontent.com/asahi417/tner/master/examples/local_dataset_sample/test.txt -O custom_data/test.txt

In [None]:
!head -n 5 custom_data/train.txt

In [None]:
local_dataset = {"train": "custom_data/train.txt", "validation": "custom_data/valid.txt", "test": "custom_data/test.txt"}
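
Before training on local files, it can help to check that each split parses as expected. The sketch below assumes a CoNLL-style layout: one token and its tag per line, separated by whitespace, with blank lines between sentences. The output of `head` is not shown above, so this layout is an assumption, and the helper `read_conll` and the demo sentences are hypothetical rather than part of T-NER.

```python
import os
import tempfile


def read_conll(path):
    """Parse a token/tag-per-line file into a list of (tokens, tags) pairs."""
    sentences, tokens, tags = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if not parts:  # a blank line ends the current sentence
                if tokens:
                    sentences.append((tokens, tags))
                    tokens, tags = [], []
                continue
            tokens.append(parts[0])  # first column: token
            tags.append(parts[-1])   # last column: tag
    if tokens:  # file may not end with a blank line
        sentences.append((tokens, tags))
    return sentences


# Demo on a tiny hypothetical file written to a temporary directory.
sample = "Jacob B-PER\nCollier I-PER\nsings O\n\nLondon B-LOC\n"
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "train.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write(sample)
    parsed = read_conll(demo_path)

print(len(parsed))
print(parsed[0])
```

Calling `read_conll(local_dataset["train"])` on the downloaded file would give a quick view of sentence counts and of the tag set, provided the file follows the assumed layout.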

In [None]:
searcher = GridSearcher(
   checkpoint_dir='./ckpt_bert_custom_dataset',
   local_dataset=local_dataset,
   model="distilbert-base-cased",  # language model to fine-tune
   epoch=2,  # the total epoch (`L` in the figure)
   epoch_partial=1,  # the number of epoch at 1st stage (`M` in the figure)
   n_max_config=1,  # the number of models to pass to 2nd stage (`K` in the figure)
   batch_size=4,
   gradient_accumulation_steps=[1],
   crf=[True],
   lr=[1e-4],
   weight_decay=[None],
   random_seed=[42],
   lr_warmup_step_ratio=[0.1],
   max_grad_norm=[None, 10]
)
searcher.train()

In [None]:
model = TransformersNER("ckpt_bert_custom_dataset/best_model")
metric = model.evaluate(local_dataset=local_dataset, dataset_split='test', batch_size=16)

In [None]:
metric