# What Is This
This notebook shows how to use the utilities in this repo to quickly start training a sequence labeling model.
The utilities take care of alignment, padding, batching and windowing.
For a walkthrough of the utilities, see our [tutorial on sequence labeling with transformers](https://lighttag.io/blog/sequence-labeling-with-transformers/example). For the reasoning behind them, see our semi-essay on the considerations of [aligning span annotations to Huggingface tokenizer outputs](https://www.lighttag.io/blog/sequence-labeling-with-transformers/).
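
To make "alignment" concrete, here is a minimal sketch of mapping character-level span annotations onto tokens. It is not the repo's implementation: the function name `align_spans_to_tokens`, the BILU tagging scheme, and the hand-written token offsets are assumptions made for illustration. In practice the offsets would come from a fast tokenizer's offset mapping.

```python
from typing import Dict, List, Tuple


def align_spans_to_tokens(
    offsets: List[Tuple[int, int]],
    annotations: List[Dict],
) -> List[str]:
    """Assign a BILU tag to each token from character-level span annotations.

    Hypothetical helper for illustration; the tagging scheme is an assumption.
    """
    tags = ["O"] * len(offsets)
    for ann in annotations:
        # Indices of the tokens whose character range overlaps the annotation.
        covered = [
            i
            for i, (start, end) in enumerate(offsets)
            if start < ann["end"] and end > ann["start"]
        ]
        if not covered:
            continue
        if len(covered) == 1:
            tags[covered[0]] = f"U-{ann['label']}"  # single-token span
        else:
            tags[covered[0]] = f"B-{ann['label']}"  # first token of the span
            for i in covered[1:-1]:
                tags[i] = f"I-{ann['label']}"  # tokens inside the span
            tags[covered[-1]] = f"L-{ann['label']}"  # last token of the span
    return tags


text = "Aspirin may interact with warfarin sodium."
# Hand-written (start, end) character offsets, one pair per token (assumed).
offsets = [(0, 7), (8, 11), (12, 20), (21, 25), (26, 34), (35, 41), (41, 42)]
annotations = [
    {"start": 0, "end": 7, "label": "drug"},
    {"start": 26, "end": 41, "label": "drug"},
]
tags = align_spans_to_tokens(offsets, annotations)
print(list(zip([text[s:e] for s, e in offsets], tags)))
```

The overlap test (`start < ann["end"] and end > ann["start"]`) tags any token that shares at least one character with the annotation, which is one reasonable choice when annotation boundaries do not fall exactly on token boundaries.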

In [1]:
cd ..

In [8]:
from sequence_aligner.labelset import LabelSet
from sequence_aligner.dataset import TrainingDataset
from sequence_aligner.containers import TrainingBatch
import json


In [5]:
## Load The Raw Data
with open('./data/ddi_train.json') as f:
    raw = json.load(f)
for example in raw:
    for annotation in example['annotations']:
        # The utilities expect the key "label", but this dataset stores it under "tag"
        annotation['label'] = annotation['tag']
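
The same renaming can be written as a small helper that leaves the loaded data untouched, which may be handy if the raw records are reused elsewhere. This is only a sketch: `normalize_label_key` is a hypothetical name, and the toy record's `content`, `start` and `end` fields are assumed for illustration. Only the `annotations` and `tag` keys are taken from the cell above.

```python
import copy


def normalize_label_key(examples, source_key="tag", target_key="label"):
    """Return a deep copy of `examples` where every annotation has `target_key`.

    Hypothetical helper; annotations that already have `target_key` are kept as-is.
    """
    normalized = copy.deepcopy(examples)
    for example in normalized:
        for annotation in example["annotations"]:
            if target_key not in annotation:
                annotation[target_key] = annotation[source_key]
    return normalized


# Toy record; the "content", "start" and "end" fields are assumptions.
toy = [
    {
        "content": "Aspirin may interact with warfarin.",
        "annotations": [{"start": 0, "end": 7, "tag": "drug"}],
    }
]
normalized = normalize_label_key(toy)
print(normalized[0]["annotations"][0])
```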

In [7]:
from transformers import BertTokenizerFast
tokenizer = BertTokenizerFast.from_pretrained('bert-base-cased')
label_set = LabelSet(labels=["drug"])  # Only one label in this dataset
dataset = TrainingDataset(data=raw, tokenizer=tokenizer, label_set=label_set)
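
For intuition about "windowing" and "padding", the sketch below splits one long token sequence into fixed-size windows and pads the last one. It is an assumption-laden illustration rather than what `TrainingDataset` actually does: the function name `window_and_pad`, the non-overlapping windows, the pad token id of 0, and the use of -100 for padded labels (the default `ignore_index` of PyTorch's cross-entropy loss) are all choices made here.

```python
from typing import Dict, List


def window_and_pad(
    token_ids: List[int],
    label_ids: List[int],
    window_size: int,
    pad_token_id: int = 0,        # assumed pad id
    ignore_label_id: int = -100,  # assumed label for padded positions
) -> List[Dict[str, List[int]]]:
    """Split a sequence into non-overlapping windows and pad the final window."""
    windows = []
    for start in range(0, len(token_ids), window_size):
        ids = token_ids[start:start + window_size]
        labels = label_ids[start:start + window_size]
        pad = window_size - len(ids)
        windows.append(
            {
                "input_ids": ids + [pad_token_id] * pad,
                # 1 marks real tokens, 0 marks padding
                "attention_mask": [1] * len(ids) + [0] * pad,
                "labels": labels + [ignore_label_id] * pad,
            }
        )
    return windows


token_ids = [11, 12, 13, 14, 15, 16, 17]  # made-up token ids
label_ids = [0, 0, 1, 2, 0, 0, 0]         # made-up label ids
windows = window_and_pad(token_ids, label_ids, window_size=3)
for window in windows:
    print(window)
```

Padded positions get an attention mask of 0 so the model does not attend to them, and a label that the loss skips so they do not contribute to training.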

In [12]:
from torch.utils.data import DataLoader
from torch.optim import AdamW  # the AdamW shipped with transformers is deprecated
from transformers import BertForTokenClassification
model = BertForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(dataset.label_set.ids_to_label)
)
optimizer = AdamW(model.parameters(), lr=5e-6)

dataloader = DataLoader(
    dataset,
    collate_fn=TrainingBatch,
    batch_size=4,
    shuffle=True,
)
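model.train()  # from_pretrained() returns the model in evaluation mode
for num, batch in enumerate(dataloader):
    optimizer.zero_grad()  # clear gradients left over from the previous step
    outputs = model(
        input_ids=batch.input_ids,
        attention_mask=batch.attention_masks,
        labels=batch.labels,
    )
    # Index the output instead of tuple-unpacking it, so this works whether
    # the model returns a plain tuple or a ModelOutput object
    loss = outputs[0]
    loss.backward()
    optimizer.step()
    print(loss)
    if num > 3:  # stop after five batches; this is only a smoke test
        break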

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-cas

tensor(1.6381, grad_fn=<NllLossBackward>)
tensor(1.4514, grad_fn=<NllLossBackward>)
tensor(1.5203, grad_fn=<NllLossBackward>)
tensor(1.3982, grad_fn=<NllLossBackward>)
tensor(1.2953, grad_fn=<NllLossBackward>)
