## What is NER?
NER stands for Named Entity Recognition, a natural language processing technique that identifies and classifies named entities (such as people, places, organizations, and dates) in text. The process can be broken down into the five steps described in this section.

<img src= "https://frenzy86.s3.eu-west-2.amazonaws.com/python/nlp/nlp2.1.png" width=1000>

<img src= "https://frenzy86.s3.eu-west-2.amazonaws.com/python/nlp/nlp2.2.png" width=1000>

**DEMO:**
https://demos.explosion.ai/displacy-ent

**Text Input:**
NER begins with a piece of text, which can be a sentence, a paragraph, a document, or even a larger corpus of text.

**Tokenization:**
The input text is split into individual words or tokens. This process is called tokenization, and it's a crucial step because NER works on a token-by-token basis.

**Entity Recognition:**
The tokenized text is analyzed to identify spans of tokens that correspond to named entities. NER systems use various techniques, such as rule-based approaches, machine learning models (like conditional random fields or deep learning models), or a combination of these methods to recognize entities.

**Entity/Token Classification:**
Once the entities are recognized, they are classified into predefined categories like "person," "organization," "location," "date," "number," etc. These categories help organize and provide context to the recognized entities.

**Output:**
The final output of the NER process is a structured representation of the original text with identified named entities and their corresponding categories. This output can be used for various purposes like information extraction, content summarization, sentiment analysis, and more.
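
The five steps above can be sketched with a toy, dictionary-based recognizer. This is only an illustration under stated assumptions: the `GAZETTEER` lookup table, the regex tokenizer, and the function names are hypothetical and stand in for a trained model.

```python
import re

# Hypothetical lookup table standing in for a trained model.
GAZETTEER = {
    ("European", "Commission"): "ORG",
    ("Bologna",): "LOC",
}

def tokenize(text):
    """Step 2: split text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def recognize_entities(tokens):
    """Steps 3-4: find token spans and assign each a category (longest match first)."""
    spans = []
    max_len = max(len(key) for key in GAZETTEER)
    i = 0
    while i < len(tokens):
        for length in range(max_len, 0, -1):
            key = tuple(tokens[i:i + length])
            if len(key) == length and key in GAZETTEER:
                spans.append((i, i + length, GAZETTEER[key]))
                i += length
                break
        else:
            i += 1  # no entity starts at this token
    return spans

# Step 1: text input
text = "The European Commission met in Bologna."
tokens = tokenize(text)

# Step 5: structured output as (entity text, category) pairs
entities = [(" ".join(tokens[s:e]), label) for s, e, label in recognize_entities(tokens)]
print(entities)
```

A real NER system replaces the dictionary lookup with a statistical or neural model, but the input and output shapes stay the same.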

## IOB NER Tagging Format
- IOB tagging, which stands for Inside-Outside-Beginning tagging, is a common technique used in named entity recognition (NER) to label tokens in a text sequence to indicate their positions within named entities. It helps identify the starting, inside, and outside parts of entities within the text.

<img src= "https://frenzy86.s3.eu-west-2.amazonaws.com/python/nlp/nlp2.3.png" width=1000>

- IOB Tagging:
The tokens within a named entity are tagged as follows:

    - "B" (Beginning): The first token of an entity is tagged with "B" to indicate the beginning of the entity.
    - "I" (Inside): Tokens subsequent to the first token of an entity are tagged with "I" to indicate they are inside the entity.
    - "O" (Outside): Tokens that are not part of any named entity are tagged with "O" to indicate they are outside any entity.

- Tagging Format Article:
    - https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)
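
As a minimal sketch of how IOB tags map back to entities, the helper below groups tagged tokens into spans. The function name `iob_to_spans` is hypothetical, and the sketch makes one simplifying assumption: an `I-` tag that does not continue an open entity of the same type is ignored.

```python
def iob_to_spans(tokens, tags):
    """Group IOB-tagged tokens into (entity_text, entity_type) pairs."""
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity, closing any open one
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type extends the open entity
            current_tokens.append(token)
        else:
            # "O" tag, or an I- tag that does not continue the open entity
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans

tokens = ["The", "European", "Commission", "said", "on", "Thursday"]
tags = ["O", "B-ORG", "I-ORG", "O", "O", "O"]
spans = iob_to_spans(tokens, tags)
print(spans)  # [('European Commission', 'ORG')]
```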

## What is a Transformer?

- The Transformer architecture's key innovation is the mechanism of self-attention, which allows the model to weigh the importance of different words in a sentence relative to each other.
- This self-attention mechanism enables the model to capture contextual relationships between words regardless of their position in the input sequence.
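
A minimal NumPy sketch of single-head scaled dot-product self-attention may make the idea concrete. The dimensions and the random projection matrices are assumptions chosen for illustration; a real Transformer learns these matrices and uses several heads per layer.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention over a sequence of token vectors X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # Each row of `scores` compares one token against every token in the sequence
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)
    # Each output vector is a weighted mix of all value vectors
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8  # illustrative sizes
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, weights = self_attention(X, W_q, W_k, W_v)
print(output.shape)          # (4, 8)
print(weights.sum(axis=-1))  # every row of attention weights sums to 1
```

Because every token attends to every other token directly, the distance between two words in the sequence does not limit how strongly they can influence each other.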

<img src= "https://frenzy86.s3.eu-west-2.amazonaws.com/python/nlp/nlp2.4.jpg" width=1000>

## The Dataset
### CoNLLpp Dataset

https://huggingface.co/datasets/conllpp

- CoNLLpp is a corrected version of the CoNLL2003 NER dataset where labels of 5.38% of the sentences in the test set have been manually corrected. The original CoNLL2003 dataset is available at https://www.clips.uantwerpen.be/conll2003/ner/.

```
{
    "tokens": ["SOCCER", "-", "JAPAN", "GET", "LUCKY", "WIN", ",", "CHINA", "IN", "SURPRISE", "DEFEAT", "."],
    "original_ner_tags_in_conll2003": ["O", "O", "B-LOC", "O", "O", "O", "O", "B-PER", "O", "O", "O", "O"],
    "corrected_ner_tags_in_conllpp": ["O", "O", "B-LOC", "O", "O", "O", "O", "B-LOC", "O", "O", "O", "O"],
}

```

```
{
    "chunk_tags": [11, 12, 12, 21, 13, 11, 11, 21, 13, 11, 12, 13, 11, 21, 22, 11, 12, 17, 11, 21, 17, 11, 12, 12, 21, 22, 22, 13, 11, 0],
    "id": "0",
    "ner_tags": [0, 3, 4, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "pos_tags": [12, 22, 22, 38, 15, 22, 28, 38, 15, 16, 21, 35, 24, 35, 37, 16, 21, 15, 24, 41, 15, 16, 21, 21, 20, 37, 40, 35, 21, 7],
    "tokens": ["The", "European", "Commission", "said", "on", "Thursday", "it", "disagreed", "with", "German", "advice", "to", "consumers", "to", "shun", "British", "lamb", "until", "scientists", "determine", "whether", "mad", "cow", "disease", "can", "be", "transmitted", "to", "sheep", "."]
}
```

In [1]:
!pip install -U transformers -q
!pip install -U accelerate -q
!pip install -U datasets -q
!pip install -U seqeval -q
!pip install -U evaluate -q

In [2]:
import pandas as pd
import numpy as np
from datasets import load_dataset

import warnings
warnings.filterwarnings('ignore')

data = load_dataset('conllpp')
data

The repository for conllpp contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/conllpp.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y


DatasetDict({
    train: Dataset({
        features: ['id', 'tokens', 'pos_tags', 'chunk_tags', 'ner_tags'],
        num_rows: 14041
    })
    validation: Dataset({
        features: ['id', 'tokens', 'pos_tags', 'chunk_tags', 'ner_tags'],
        num_rows: 3250
    })
    test: Dataset({
        features: ['id', 'tokens', 'pos_tags', 'chunk_tags', 'ner_tags'],
        num_rows: 3453
    })
})

In [3]:
data['train'].features

{'id': Value(dtype='string', id=None),
 'tokens': Sequence(feature=Value(dtype='string', id=None), length=-1, id=None),
 'pos_tags': Sequence(feature=ClassLabel(names=['"', "''", '#', '$', '(', ')', ',', '.', ':', '``', 'CC', 'CD', 'DT', 'EX', 'FW', 'IN', 'JJ', 'JJR', 'JJS', 'LS', 'MD', 'NN', 'NNP', 'NNPS', 'NNS', 'NN|SYM', 'PDT', 'POS', 'PRP', 'PRP$', 'RB', 'RBR', 'RBS', 'RP', 'SYM', 'TO', 'UH', 'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'WDT', 'WP', 'WP$', 'WRB'], id=None), length=-1, id=None),
 'chunk_tags': Sequence(feature=ClassLabel(names=['O', 'B-ADJP', 'I-ADJP', 'B-ADVP', 'I-ADVP', 'B-CONJP', 'I-CONJP', 'B-INTJ', 'I-INTJ', 'B-LST', 'I-LST', 'B-NP', 'I-NP', 'B-PP', 'I-PP', 'B-PRT', 'I-PRT', 'B-SBAR', 'I-SBAR', 'B-UCP', 'I-UCP', 'B-VP', 'I-VP'], id=None), length=-1, id=None),
 'ner_tags': Sequence(feature=ClassLabel(names=['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC'], id=None), length=-1, id=None)}

In [6]:
pd.DataFrame(data['train'][:])[['tokens', 'ner_tags']].iloc[0]

tokens      [EU, rejects, German, call, to, boycott, Briti...
ner_tags                          [3, 0, 7, 0, 0, 0, 7, 0, 0]


In [7]:
tags = data['train'].features['ner_tags'].feature

index2tag = {idx:tag for idx, tag in enumerate(tags.names)}
tag2index = {tag:idx for idx, tag in enumerate(tags.names)}

In [8]:
index2tag

{0: 'O',
 1: 'B-PER',
 2: 'I-PER',
 3: 'B-ORG',
 4: 'I-ORG',
 5: 'B-LOC',
 6: 'I-LOC',
 7: 'B-MISC',
 8: 'I-MISC'}

In [9]:
tags.int2str(3)

'B-ORG'

In [10]:
def create_tag_names(batch):
    tag_name = {'ner_tags_str': [tags.int2str(idx) for idx in batch['ner_tags']]}
    return tag_name

In [11]:
data = data.map(create_tag_names)


In [12]:
pd.DataFrame(data['train'][:])[['tokens', 'ner_tags', 'ner_tags_str']].iloc[0]

tokens          [EU, rejects, German, call, to, boycott, Briti...
ner_tags                              [3, 0, 7, 0, 0, 0, 7, 0, 0]
ner_tags_str        [B-ORG, O, B-MISC, O, O, O, B-MISC, O, O]


## Model Building

### Tokenization for DistilBERT
- Transformer models like DistilBERT cannot receive raw strings as input; instead, they expect the text to have been tokenized and encoded as integer token IDs.
- Tokenization is the step of breaking down a string into the atomic units used in the model.
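
As a simplified sketch of sub-word tokenization in the style of WordPiece (the scheme used by BERT-family tokenizers), the function below greedily takes the longest vocabulary entry at each position. The function name, the toy vocabulary, and the resulting IDs are made up for illustration; the real tokenizer has a vocabulary of many thousands of entries and extra normalization steps.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no vocabulary entry matches at this position
        pieces.append(piece)
        start = end
    return pieces

# Hypothetical toy vocabulary; IDs are simply positions in sorted order
toy_vocab = {"EU", "rejects", "British", "la", "##mb"}
vocab_to_id = {token: i for i, token in enumerate(sorted(toy_vocab))}

pieces = wordpiece_tokenize("lamb", toy_vocab)
ids = [vocab_to_id[p] for p in pieces]
print(pieces)  # ['la', '##mb']
print(ids)
```

The model only ever sees the integer IDs; the `##` prefix is what lets the pieces be glued back into the original word afterwards.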

#### DistilBERT

DistilBERT is a smaller, faster and cheaper version of BERT. It is 40% smaller than BERT and runs 60% faster while preserving over 95% of BERT’s performance.

**Introduction to DistilBERT:** DistilBERT, short for "distilled BERT," is a compact version of the renowned BERT (Bidirectional Encoder Representations from Transformers) model.

**Model Architecture:** It keeps BERT's general architecture but halves the number of Transformer layers (6 instead of the 12 in BERT-base), resulting in a smaller and faster model.

**Parameter Reduction:** DistilBERT's smaller size is obtained through knowledge distillation: the smaller student model (DistilBERT) is trained to reproduce the behavior of the larger teacher model (BERT).

**Efficiency and Speed:** By reducing the model's size and complexity, DistilBERT achieves a significant speedup during both training and inference.
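
To illustrate the distillation idea mentioned above, here is a hedged NumPy sketch of a soft-target loss: the student is penalized by the KL divergence between the teacher's and its own temperature-softened output distributions. The logits and the temperature value are made up, and this is only one ingredient; the actual DistilBERT training objective combines it with other loss terms.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperature gives a softer distribution."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) over temperature-softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(kl.mean())

# Hypothetical logits for two examples and three classes
teacher_logits = np.array([[4.0, 1.0, 0.5], [0.2, 3.0, 0.1]])
good_student = np.array([[3.8, 1.1, 0.4], [0.3, 2.9, 0.0]])
bad_student = np.array([[0.5, 4.0, 1.0], [3.0, 0.1, 0.2]])

loss_good = distillation_loss(good_student, teacher_logits)
loss_bad = distillation_loss(bad_student, teacher_logits)
print(loss_good, loss_bad)  # the student that mimics the teacher gets the lower loss
```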

<img src= "https://frenzy86.s3.eu-west-2.amazonaws.com/python/nlp/nlp6.3.jpg" width=1000>

In [13]:
from transformers import AutoTokenizer

model_checkpoint = "distilbert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)


In [14]:
tokenizer.is_fast

True

In [15]:
inputs = data['train'][0]['tokens']
inputs = tokenizer(inputs, is_split_into_words=True)
print(inputs.tokens())

['[CLS]', 'EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'la', '##mb', '.', '[SEP]']


In [16]:
print(data['train'][0]['tokens'])
print(data['train'][0]['ner_tags_str'])

['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.']
['B-ORG', 'O', 'B-MISC', 'O', 'O', 'O', 'B-MISC', 'O', 'O']


In [17]:
inputs.word_ids()

[None, 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, None]

In [18]:
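def align_labels_with_tokens(labels, word_ids):
    """Expand word-level labels to sub-word tokens; -100 marks positions the loss ignores."""
    new_labels = []
    current_word = None
    for word_id in word_ids:
        if word_id != current_word:
            # First token of a new word (or a special token following a word)
            current_word = word_id
            label = -100 if word_id is None else labels[word_id]
            new_labels.append(label)
        elif word_id is None:
            # Special token such as [CLS] or [SEP]
            new_labels.append(-100)
        else:
            # Continuation sub-token of the current word
            label = labels[word_id]
            # Odd IDs are B-XXX tags; the matching I-XXX tag is the next ID
            if label % 2 == 1:
                label = label + 1
            new_labels.append(label)

    return new_labels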

In [19]:
labels = data['train'][0]['ner_tags']
word_ids = inputs.word_ids()

print(labels, word_ids)

[3, 0, 7, 0, 0, 0, 7, 0, 0] [None, 0, 1, 2, 3, 4, 5, 6, 7, 7, 8, None]


In [20]:
align_labels_with_tokens(labels, word_ids)

[-100, 3, 0, 7, 0, 0, 0, 7, 0, 0, 0, -100]
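
In the example above no entity word was split, so the B-to-I conversion never fired. The sketch below restates the alignment function so that it runs on its own and feeds it hypothetical `word_ids`, as if the tokenizer had split a single `B-PER` word into three pieces.

```python
def align_labels_with_tokens(labels, word_ids):
    new_labels = []
    current_word = None
    for word_id in word_ids:
        if word_id != current_word:
            # First token of a new word, or a special token after a word
            current_word = word_id
            new_labels.append(-100 if word_id is None else labels[word_id])
        elif word_id is None:
            # Special token
            new_labels.append(-100)
        else:
            # Continuation piece: turn B-XXX (odd ID) into I-XXX (next ID)
            label = labels[word_id]
            if label % 2 == 1:
                label += 1
            new_labels.append(label)
    return new_labels

labels = [1, 0]                          # word-level labels: B-PER, O
word_ids = [None, 0, 0, 0, 1, None]      # hypothetical: word 0 split into 3 pieces
aligned = align_labels_with_tokens(labels, word_ids)
print(aligned)  # [-100, 1, 2, 2, 0, -100]
```

Only the first piece keeps `B-PER` (1); the remaining pieces become `I-PER` (2), so the entity still has exactly one beginning.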

In [21]:
def tokenize_and_align_labels(examples):
    tokenized_inputs = tokenizer(examples['tokens'], truncation=True, is_split_into_words=True)

    all_labels = examples['ner_tags']

    new_labels = []
    for i, labels in enumerate(all_labels):
        word_ids = tokenized_inputs.word_ids(i)
        new_labels.append(align_labels_with_tokens(labels, word_ids))

    tokenized_inputs['labels'] = new_labels
    return tokenized_inputs

In [22]:
tokenized_datasets = data.map(tokenize_and_align_labels, batched=True, remove_columns=data['train'].column_names)
tokenized_datasets


DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 14041
    })
    validation: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 3250
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 3453
    })
})

### Data Collation

In [23]:
from transformers import DataCollatorForTokenClassification

data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)
batch = data_collator([tokenized_datasets['train'][i] for i in range(2)])
batch

{'input_ids': tensor([[  101,  7270, 22961,  1528,  1840,  1106, 21423,  1418,  2495, 12913,
           119,   102],
        [  101,  1943, 14428,   102,     0,     0,     0,     0,     0,     0,
             0,     0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]]), 'labels': tensor([[-100,    3,    0,    7,    0,    0,    0,    7,    0,    0,    0, -100],
        [-100,    1,    2, -100, -100, -100, -100, -100, -100, -100, -100, -100]])}
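
The padding visible in the batch above can be reproduced with a plain-Python sketch. The helper `pad_batch` is hypothetical and is not the library's implementation; it assumes right-padding, a padding token ID of 0, and -100 as the label padding value, which is what the output above shows.

```python
def pad_batch(features, pad_token_id=0, label_pad_id=-100):
    """Right-pad every feature to the length of the longest example in the batch."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        batch["attention_mask"].append(f["attention_mask"] + [0] * n_pad)
        # Padded positions get -100 so they do not contribute to the loss
        batch["labels"].append(f["labels"] + [label_pad_id] * n_pad)
    return batch

features = [
    {"input_ids": [101, 7270, 22961, 1528, 1840, 1106, 21423, 1418, 2495, 12913, 119, 102],
     "attention_mask": [1] * 12,
     "labels": [-100, 3, 0, 7, 0, 0, 0, 7, 0, 0, 0, -100]},
    {"input_ids": [101, 1943, 14428, 102],
     "attention_mask": [1] * 4,
     "labels": [-100, 1, 2, -100]},
]
batch = pad_batch(features)
print(batch["labels"][1])
```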

### Metrics

Seqeval is a Python framework for sequence labeling evaluation. seqeval can evaluate the performance of chunking tasks such as named-entity recognition, part-of-speech tagging, semantic role labeling and so on.

https://huggingface.co/spaces/evaluate-metric/seqeval

In [24]:
import evaluate

metric = evaluate.load('seqeval')
ner_feature = data['train'].features['ner_tags']
ner_feature

Sequence(feature=ClassLabel(names=['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC'], id=None), length=-1, id=None)

In [25]:
label_names = ner_feature.feature.names
label_names

['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']

In [26]:
labels = data['train'][0]['ner_tags']
labels = [label_names[i] for i in labels]
labels

['B-ORG', 'O', 'B-MISC', 'O', 'O', 'O', 'B-MISC', 'O', 'O']

In [27]:
predictions = labels.copy()
predictions[2] = "O"

metric.compute(predictions=[predictions], references=[labels])

{'MISC': {'precision': 1.0,
  'recall': 0.5,
  'f1': 0.6666666666666666,
  'number': 2},
 'ORG': {'precision': 1.0, 'recall': 1.0, 'f1': 1.0, 'number': 1},
 'overall_precision': 1.0,
 'overall_recall': 0.6666666666666666,
 'overall_f1': 0.8,
 'overall_accuracy': 0.8888888888888888}
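
The overall numbers above can be checked by hand. The sketch below counts whole entities rather than individual tokens; `extract_entities` is a hypothetical helper that mirrors the entity-level counting idea and is not seqeval's actual code.

```python
def extract_entities(tags):
    """Return the set of (type, start, end) spans in a list of IOB tags."""
    entities, start, current = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing sentinel closes the last span
        if current is not None and (not tag.startswith("I-") or tag[2:] != current):
            entities.add((current, start, i))
            current = None
        if tag.startswith("B-"):
            start, current = i, tag[2:]
    return entities

references = ["B-ORG", "O", "B-MISC", "O", "O", "O", "B-MISC", "O", "O"]
predictions = references.copy()
predictions[2] = "O"  # the model misses the first MISC entity

true_entities = extract_entities(references)
pred_entities = extract_entities(predictions)
correct = len(true_entities & pred_entities)

precision = correct / len(pred_entities)   # 2 of 2 predicted entities are right
recall = correct / len(true_entities)      # 2 of 3 true entities were found
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)
```

Every predicted entity is correct, so precision is 1.0, but one of the three true entities is missed, which gives a recall of 2/3 and an F1 of 0.8.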

In [28]:
def compute_metrics(eval_preds):
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)
    true_labels = [[label_names[l] for l in label if l!=-100] for label in labels]
    true_predictions = [[label_names[p] for p,l in zip(prediction, label) if l!=-100]
                        for prediction, label in zip(predictions, labels)]
    all_metrics = metric.compute(predictions=true_predictions, references=true_labels)

    return {
            "precision": all_metrics['overall_precision'],
            "recall": all_metrics['overall_recall'],
            "f1": all_metrics['overall_f1'],
            "accuracy": all_metrics['overall_accuracy']
            }
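
To see what the filtering inside `compute_metrics` does, the sketch below runs the same argmax and `-100` masking on a tiny, made-up batch of logits. The logits are hypothetical; only the label list is taken from the dataset.

```python
import numpy as np

label_names = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']

# One sentence, four token positions, nine classes (hypothetical logits)
logits = np.zeros((1, 4, len(label_names)))
logits[0, 1, 1] = 5.0   # position 1 scores highest for B-PER
logits[0, 2, 2] = 5.0   # position 2 scores highest for I-PER
labels = np.array([[-100, 1, 2, -100]])  # special tokens are marked with -100

predictions = np.argmax(logits, axis=-1)

# Drop every position whose label is -100, then map IDs to tag strings
true_labels = [[label_names[l] for l in row if l != -100] for row in labels]
true_predictions = [
    [label_names[p] for p, l in zip(pred_row, label_row) if l != -100]
    for pred_row, label_row in zip(predictions, labels)
]
print(true_labels)       # [['B-PER', 'I-PER']]
print(true_predictions)  # [['B-PER', 'I-PER']]
```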

### Model Training

In [29]:
id2label = {i:label for i, label in enumerate(label_names)}
label2id = {label:i for i, label in enumerate(label_names)}

print(id2label)

{0: 'O', 1: 'B-PER', 2: 'I-PER', 3: 'B-ORG', 4: 'I-ORG', 5: 'B-LOC', 6: 'I-LOC', 7: 'B-MISC', 8: 'I-MISC'}


In [30]:
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
                                                        model_checkpoint,
                                                        id2label=id2label,
                                                        label2id=label2id,
                                                        )

Some weights of DistilBertForTokenClassification were not initialized from the model checkpoint at distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [31]:
model.config.num_labels

9

For comparison, a similar publicly available model is distilbert-base-uncased fine-tuned on the conll2003 dataset. Note that this notebook instead fine-tunes distilbert-base-cased on conllpp.

https://huggingface.co/malduwais/distilbert-base-uncased-finetuned-ner

- Model Selection:
The code is working with a DistilBERT model, which is a lighter, faster version of BERT (Bidirectional Encoder Representations from Transformers). BERT models are powerful for understanding context in language.
- Task:
The model is being fine-tuned for Named Entity Recognition (NER), a task where the goal is to identify and classify named entities (like persons, organizations, locations) in text.
- Training Configuration:
The TrainingArguments set up the hyperparameters and conditions for the training process. This includes things like learning rate, number of epochs, and how often to evaluate and save the model.
- Dataset Preparation:
The code assumes that datasets have been prepared and tokenized earlier (tokenized_datasets). These datasets are split into training and validation sets.
- Training Pipeline:
The Trainer class from Hugging Face's transformers library is used to set up the entire training pipeline. This abstracts away much of the complexity of training deep learning models, handling things like:

    - Batching the data
    - Moving data to the correct device (CPU/GPU)
    - Applying the forward pass
    - Computing the loss
    - Applying backpropagation
    - Updating the model's weights
    - Evaluating the model periodically
    - Saving checkpoints
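
The steps listed above can be sketched as a bare training loop. This is a toy NumPy illustration on synthetic data with a logistic-regression model, not what `Trainer` does internally; the data, learning rate, and batch size are assumptions, and device placement and checkpoint saving are left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data (hypothetical)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
X_train, y_train, X_val, y_val = X[:160], y[:160], X[160:], y[160:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    """Mean binary cross-entropy loss."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

w, b = np.zeros(2), 0.0
learning_rate, batch_size, num_epochs = 0.5, 16, 3
val_losses = []

for epoch in range(num_epochs):
    order = rng.permutation(len(X_train))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]       # batching the data
        xb, yb = X_train[idx], y_train[idx]
        p = sigmoid(xb @ w + b)                     # forward pass
        grad_logits = (p - yb) / len(idx)           # gradient of the mean loss (backpropagation)
        w -= learning_rate * (xb.T @ grad_logits)   # updating the weights
        b -= learning_rate * grad_logits.sum()
    # evaluating once per epoch
    val_losses.append(bce(sigmoid(X_val @ w + b), y_val))

print(val_losses)
```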

In [32]:
from transformers import TrainingArguments, Trainer

NUM_EPOCHS = 3

args = TrainingArguments("distilbert-finetuned-ner",
                         evaluation_strategy = "epoch",
                         save_strategy="epoch",
                         learning_rate = 2e-5,
                         num_train_epochs=NUM_EPOCHS,
                         weight_decay=0.01
                         )


trainer = Trainer(model=model,
                  args=args,
                  train_dataset = tokenized_datasets['train'],
                  eval_dataset = tokenized_datasets['validation'],
                  data_collator=data_collator,
                  compute_metrics=compute_metrics,
                  tokenizer=tokenizer,
                  )

In [33]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|
| 1 | 0.0917 | 0.087287 | 0.879028 | 0.913497 | 0.895931 | 0.976217 |
| 2 | 0.0449 | 0.079944 | 0.908866 | 0.928139 | 0.918401 | 0.980588 |
| 3 | 0.029 | 0.073198 | 0.912146 | 0.936553 | 0.924188 | 0.983222 |


TrainOutput(global_step=5268, training_loss=0.08033001893564165, metrics={'train_runtime': 323.7798, 'train_samples_per_second': 130.098, 'train_steps_per_second': 16.27, 'total_flos': 460431563935266.0, 'train_loss': 0.08033001893564165, 'epoch': 3.0})
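
The `global_step` value can be reproduced with a little arithmetic. The sketch assumes a batch size of 8 (the `Trainer` default per-device batch size, which the arguments above do not override) and a single device; both are assumptions about the run rather than values shown in the output.

```python
import math

num_train_examples = 14041
num_epochs = 3
batch_size = 8  # assumption: default per-device batch size, single device

steps_per_epoch = math.ceil(num_train_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 1756 5268

# Cross-check against the reported throughput
train_runtime = 323.7798
steps_per_second = round(total_steps / train_runtime, 2)
print(steps_per_second)  # 16.27
```

Under these assumptions the result matches both `global_step=5268` and the reported `train_steps_per_second`, and it explains why the final checkpoint directory is named `checkpoint-5268`.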

In [34]:
from transformers import pipeline

checkpoint = "/content/distilbert-finetuned-ner/checkpoint-5268"

token_classifier = pipeline("token-classification",
                            model=checkpoint,
                            aggregation_strategy="simple",
                            )

token_classifier("My name is Daniele Grotti. I work at IFOA and I live in Bologna")

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


[{'entity_group': 'PER',
  'score': 0.9991215,
  'word': 'Daniele Grotti',
  'start': 11,
  'end': 25},
 {'entity_group': 'ORG',
  'score': 0.9986301,
  'word': 'IFOA',
  'start': 37,
  'end': 41},
 {'entity_group': 'LOC',
  'score': 0.99277836,
  'word': 'Bologna',
  'start': 56,
  'end': 63}]
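
The `start` and `end` fields are character offsets into the input string, so each entity can be recovered by slicing. The sketch below copies the offsets from the output above (scores omitted) and checks them against the original text.

```python
text = "My name is Daniele Grotti. I work at IFOA and I live in Bologna"

# Offsets copied from the pipeline output above
results = [
    {"entity_group": "PER", "word": "Daniele Grotti", "start": 11, "end": 25},
    {"entity_group": "ORG", "word": "IFOA", "start": 37, "end": 41},
    {"entity_group": "LOC", "word": "Bologna", "start": 56, "end": 63},
]

# Slice the original text with each (start, end) pair
spans = [(r["entity_group"], text[r["start"]:r["end"]]) for r in results]
print(spans)
# [('PER', 'Daniele Grotti'), ('ORG', 'IFOA'), ('LOC', 'Bologna')]
```

Slicing the original text is usually safer than relying on the `word` field, since `word` is rebuilt from sub-word tokens and may not preserve the original spacing or casing exactly.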

In [35]:
!zip -r distilbert_ner.zip "/content/distilbert-finetuned-ner/checkpoint-5268"

  adding: content/distilbert-finetuned-ner/checkpoint-5268/ (stored 0%)
  adding: content/distilbert-finetuned-ner/checkpoint-5268/optimizer.pt (deflated 25%)
  adding: content/distilbert-finetuned-ner/checkpoint-5268/trainer_state.json (deflated 68%)
  adding: content/distilbert-finetuned-ner/checkpoint-5268/special_tokens_map.json (deflated 42%)
  adding: content/distilbert-finetuned-ner/checkpoint-5268/training_args.bin (deflated 51%)
  adding: content/distilbert-finetuned-ner/checkpoint-5268/config.json (deflated 51%)
  adding: content/distilbert-finetuned-ner/checkpoint-5268/model.safetensors (deflated 8%)
  adding: content/distilbert-finetuned-ner/checkpoint-5268/tokenizer.json (deflated 70%)
  adding: content/distilbert-finetuned-ner/checkpoint-5268/rng_state.pth (deflated 25%)
  adding: content/distilbert-finetuned-ner/checkpoint-5268/tokenizer_config.json (deflated 75%)
  adding: content/distilbert-finetuned-ner/checkpoint-5268/scheduler.pt (deflated 56%)
  adding: content/dis