## High-level overview

We can all agree that researching products, platforms, or services is painful and time-consuming.

I often find myself digging through multiple sources (R.I.P. Chrome tabs) to see whether something is reliable, worthwhile, and reasonably priced, and I still end up feeling lost. Information is scattered, hard to find, and hard to sort through (multiple mentions of different products, mixed reviews), which makes it difficult to get a clear summary.

Moreover, misleading ads, SEO spam ("best 5 products" articles with minimal research), and fake reviews (affiliate links, sponsored content) can make the whole experience frustrating.

Reddit is one place where people tend to leave honest reviews and call out fake ones. With over 52 million daily active users, Reddit provides a platform for people from all walks of life to share their experiences and opinions.

### The goal of this project is to help users find:
1) whether a product or service is worth it, via sentiment analysis. This helps with queries such as "Is regal unlimited subscription worth it?"
2) product mentions among posts and comments, via named entity recognition (NER). This helps with queries such as "Best 4K TV to buy".

Please see set_up_notebook.ipynb, which addresses the first problem (sentiment analysis).

### This notebook focuses on solving the second problem - Named Entity Recognition

Hugging Face's Transformers library provides state-of-the-art machine learning for PyTorch, TensorFlow, and JAX.
Training an NLP model from scratch can take hundreds of hours.
Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce compute costs and carbon footprint, and save the time and resources required to train a model from scratch.


https://huggingface.co/models

Although we can get a pre-trained model for our task (NER), these pre-trained models are not trained on product entities. We need a dataset that has product labels so we can fine-tune the pre-trained model on product entities and predict product mentions from Reddit.

Let's load libraries!

In [23]:
# Loading libraries

import os
from datasets import load_dataset, concatenate_datasets, load_metric
from transformers import AutoTokenizer, DataCollatorForTokenClassification, AutoModelForTokenClassification, TrainingArguments, Trainer, create_optimizer, AutoModel
from transformers import EarlyStoppingCallback
import evaluate
import numpy as np
import pandas as pd
#import tensorflow as tf
#from transformers.keras_callbacks import KerasMetricCallback
#from transformers.keras_callbacks import PushToHubCallback
from transformers import pipeline
import wandb


Let's get a dataset that has product labels from the Hugging Face Datasets library.

In [2]:
wnut = load_dataset("wnut_17")
wnut

DatasetDict({
    train: Dataset({
        features: ['id', 'tokens', 'ner_tags'],
        num_rows: 3394
    })
    validation: Dataset({
        features: ['id', 'tokens', 'ner_tags'],
        num_rows: 1009
    })
    test: Dataset({
        features: ['id', 'tokens', 'ner_tags'],
        num_rows: 1287
    })
})

## Description of Input Data

WNUT 17: Emerging and Rare entity recognition (https://huggingface.co/datasets/wnut_17)
This dataset was chosen among the many available for the following reasons:
1) Reddit has many previously unseen entities that arise in emerging discussions, which is exactly what this dataset focuses on.
2) The dataset is small but sufficient to meet the project goals.
3) Training is faster than with larger datasets, and a GPU is not required.
4) It consists of noisy user-generated text, similar to what this project needs to handle.

The train split has 3394 rows, the validation split has 1009 rows, and the test split has 1287 rows.

Let us look at the dataset's predefined entity labels. Product entities are of the most interest for this project.

In [47]:
label_list = wnut["train"].features["ner_tags"].feature.names
# map each integer label id to its tag name
id2tag = dict(enumerate(label_list))
id2tag

{0: 'O',
 1: 'B-corporation',
 2: 'I-corporation',
 3: 'B-creative-work',
 4: 'I-creative-work',
 5: 'B-group',
 6: 'I-group',
 7: 'B-location',
 8: 'I-location',
 9: 'B-person',
 10: 'I-person',
 11: 'B-product',
 12: 'I-product'}

In [46]:
# the test split uses the same label set as the train split
label_list = wnut["test"].features["ner_tags"].feature.names
id2tag = dict(enumerate(label_list))
id2tag

{0: 'O',
 1: 'B-corporation',
 2: 'I-corporation',
 3: 'B-creative-work',
 4: 'I-creative-work',
 5: 'B-group',
 6: 'I-group',
 7: 'B-location',
 8: 'I-location',
 9: 'B-person',
 10: 'I-person',
 11: 'B-product',
 12: 'I-product'}

The training set is small, and fine-tuning transformers generally benefits from more data.
To address this, the train and validation sets are combined. The test set is left untouched for evaluation.

In [4]:
# merge train & validation sets
train_dataset = concatenate_datasets([wnut["train"],wnut["validation"]])
train_dataset

Dataset({
    features: ['id', 'tokens', 'ner_tags'],
    num_rows: 4403
})

In [5]:
train_dataset[2]

{'id': '2',
 'tokens': ['Pxleyes',
  'Top',
  '50',
  'Photography',
  'Contest',
  'Pictures',
  'of',
  'August',
  '2010',
  '...',
  'http://bit.ly/bgCyZ0',
  '#photography'],
 'ner_tags': [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}

In [6]:
# looking at an example
ith_example=2

print(train_dataset[ith_example]['tokens'])
print([id2tag[label] for label in train_dataset[ith_example]['ner_tags']])

['Pxleyes', 'Top', '50', 'Photography', 'Contest', 'Pictures', 'of', 'August', '2010', '...', 'http://bit.ly/bgCyZ0', '#photography']
['B-corporation', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']


Pxleyes is tagged as a corporation and the rest are 'O', which means they do not belong to any entity.

## Strategy for solving the problem

1) First, a pre-trained model is needed from the Hugging Face Hub.
    Encoder models such as ALBERT, BERT, DistilBERT, ELECTRA, and RoBERTa can be used for NER (also called token classification).

    These models are really large. I chose DistilBERT for its faster training while still meeting the project's needs.

    DistilBERT aims to reduce the size of BERT and increase its speed while retaining as much performance as possible. Specifically, DistilBERT is 40% smaller than the original BERT-base model, is 60% faster, and retains 97% of its language understanding performance.
2) This model needs to be fine-tuned on the WNUT 17 dataset in order to predict product entities.
3) Evaluate how the model performs on the test dataset based on accuracy (why this metric was chosen is explained below).
4) Test with a few example comments from Reddit.
5) Push the model to the Hugging Face Hub so it can be used by the Streamlit app for inference.

## Discussion of the expected solution

For example, when a user searches "Best 4K TV to buy" on the app, the model needs to identify the products mentioned across the relevant Reddit posts and comments.

For example,
text="""Sony X90K is one of the best TVs ever if you are in a room that has a lot of sunlight. But if you game a lot, nothing can beat LG C2 OLED. If you are tight on budget, go with TCL 6 series"""

The model should tag Sony X90K, LG C2 OLED and TCL 6 Series as products.
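
As a rough sketch of what "tagging as products" means downstream: once each word has a BIO tag, product mentions can be collected by joining every `B-product` word with the `I-product` words that follow it. The helper name `extract_spans` and the hand-written tags are my own illustration (they show the target output, not actual model predictions).

```python
from typing import List


def extract_spans(tokens: List[str], tags: List[str], entity: str = "product") -> List[str]:
    """Collect contiguous B-/I- runs of one entity type as strings."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == f"B-{entity}":
            # A new entity starts; close any span that was still open.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == f"I-{entity}" and current:
            # Continuation of the span that is currently open.
            current.append(token)
        else:
            # Any other tag ends the open span (if there is one).
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Hand-written target tags for a shortened version of the example (not model output).
tokens = ["Sony", "X90K", "is", "great", "but", "nothing", "beats", "LG", "C2", "OLED"]
tags = ["B-product", "I-product", "O", "O", "O", "O", "O",
        "B-product", "I-product", "I-product"]

print(extract_spans(tokens, tags))  # ['Sony X90K', 'LG C2 OLED']
```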

## Metrics with justification

Accuracy is prioritized over F1 score for this classification task, considering the context and consequences.

1) There are no significant consequences for false positives or false negatives.

2) Improving the F1 score requires a reasonably balanced dataset (the number of entity and non-entity tokens being roughly equal), which is not the case for this project: the text contains a very large number of non-entities and only a handful of entities. F1 score is influenced by recall, which can be significantly affected by imbalanced datasets.

3) Accuracy is a label-level metric that measures the overall correctness of the predicted labels compared to the true labels. It provides a straightforward and intuitive measure of how well the model predicts the entities as a whole. As the primary focus of this NER task is on overall correct labeling, accuracy is a suitable metric.

4) It is hard to find a good decision threshold (usually 0.5) that provides a better trade-off between precision and recall for this specific problem, as entities and non-entities should both be predicted correctly.

5) Optimizing for F1 score requires an annotated Reddit dataset (for fine-tuning specifically on products), which I couldn't find. Creating one would involve manual labelling, which is extremely expensive in terms of time and cost.

6) Accurately recognizing non-entities (tokens that do not correspond to any entity) is as important as identifying the correct entity spans. Accuracy considers both entity and non-entity predictions, providing an overall measure of the model's performance in recognizing both types.
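
To make the two metrics concrete, here is a small sketch that computes token-level accuracy and an entity-level F1 for the same predictions. The functions `token_accuracy`, `entity_spans`, and `entity_f1` are simplified stand-ins I wrote for illustration (not seqeval's implementation; for instance, a stray `I-` tag with no preceding `B-` is simply ignored), and the 20-token sequence is made up.

```python
def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def entity_spans(tags):
    """Return a set of (start, end, type) spans found in a BIO tag sequence."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == kind
        if start is not None and not continues:
            spans.add((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


def entity_f1(gold, pred):
    """F1 over exact span matches (same boundaries and same type)."""
    g, p = entity_spans(gold), entity_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# 20 tokens, one two-token product mention, everything else is 'O'.
gold = ["O"] * 8 + ["B-product", "I-product"] + ["O"] * 10
all_o = ["O"] * 20  # a model that never predicts an entity

print(token_accuracy(gold, all_o))  # 0.9
print(entity_f1(gold, all_o))       # 0.0
print(entity_f1(gold, gold))        # 1.0
```

Because accuracy also counts correctly predicted non-entities, the all-`O` prediction scores 0.9 here, while the entity-level F1 only moves once entity spans are matched.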



## EDA

Get the pre-trained DistilBERT model and its tokenizer.

In [7]:
model_name = "distilbert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(model_name)

Exploring how the tokenizer behaves with an example:

In [8]:
tokenized_input = tokenizer(wnut["train"][2]["tokens"], is_split_into_words=True)
tokens = tokenizer.convert_ids_to_tokens(tokenized_input["input_ids"])

This is how the original input looks

In [9]:
#input
print(wnut["train"][2]["tokens"])

['Pxleyes', 'Top', '50', 'Photography', 'Contest', 'Pictures', 'of', 'August', '2010', '...', 'http://bit.ly/bgCyZ0', '#photography']


DistilBERT's tokenizer lowercases the words and splits them into subtokens. The special tokens [CLS] and [SEP] are added too. This results in a mismatch between the inputs (32 subtokens) and the labels (12 words).

In [10]:
#tokenized
print(tokens)

['[CLS]', 'p', '##xley', '##es', 'top', '50', 'photography', 'contest', 'pictures', 'of', 'august', '2010', '.', '.', '.', 'http', ':', '/', '/', 'bit', '.', 'l', '##y', '/', 'b', '##gc', '##y', '##z', '##0', '#', 'photography', '[SEP]']


## Data Preprocessing

Re-aligning tokens and labels involves:
1. Map each subtoken to the word it came from, and therefore to that word's tag.
2. Assign the label -100 to special tokens and to all but the first subtoken of each word. PyTorch's cross-entropy loss ignores the value -100 by default.
3. For example, the subtokens ['p', '##xley', '##es'] of a word tagged 1 get the labels [1, -100, -100].
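
The three steps can be traced by hand without a tokenizer. In this sketch the `word_ids` list is written out manually for the first four words of the example (`Pxleyes Top 50 Photography`), with `None` standing for [CLS] and [SEP]; `align_labels` is a hypothetical helper that mirrors the rule described above.

```python
IGNORE = -100  # label value skipped by the loss


def align_labels(word_ids, word_labels):
    """Give each word's label to its first subtoken and IGNORE to the rest."""
    aligned, previous = [], None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE)                 # special token
        elif word_id != previous:
            aligned.append(word_labels[word_id])   # first subtoken of a word
        else:
            aligned.append(IGNORE)                 # later subtoken of the same word
        previous = word_id
    return aligned


# [CLS] p ##xley ##es top 50 photography [SEP]
word_ids = [None, 0, 0, 0, 1, 2, 3, None]
word_labels = [1, 0, 0, 0]  # B-corporation, O, O, O

print(align_labels(word_ids, word_labels))
# [-100, 1, -100, -100, 0, 0, 0, -100]
```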



In [11]:
def tokenize_and_align_labels(examples):
    tokenized_inputs = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)

    labels = []
    for i, label in enumerate(examples[f"ner_tags"]):
        word_ids = tokenized_inputs.word_ids(batch_index=i)  # Map tokens to their respective word.
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:  # Set the special tokens to -100.
            if word_idx is None:
                label_ids.append(-100)
            elif word_idx != previous_word_idx:  # Only label the first token of a given word.
                label_ids.append(label[word_idx])
            else:
                label_ids.append(-100)
            previous_word_idx = word_idx
        labels.append(label_ids)

    tokenized_inputs["labels"] = labels
    return tokenized_inputs

In [12]:
# applying the custom token function
tokenized_wnut = wnut.map(tokenize_and_align_labels, batched=True)
tokenized_train_dataset = train_dataset.map(tokenize_and_align_labels, batched=True)



In [13]:
id2tag[-100]='ignore'
exml=tokenized_train_dataset[2]

pd.DataFrame({'tokens':tokenizer.convert_ids_to_tokens(exml["input_ids"]), 'ner_labels':exml['labels'], 'ner_tags': [id2tag[label] for label in exml['labels']] })

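| | tokens | ner_labels | ner_tags |
|---|---|---|---|
| 0 | [CLS] | -100 | ignore |
| 1 | p | 1 | B-corporation |
| 2 | ##xley | -100 | ignore |
| 3 | ##es | -100 | ignore |
| 4 | top | 0 | O |
| 5 | 50 | 0 | O |
| 6 | photography | 0 | O |
| 7 | contest | 0 | O |
| 8 | pictures | 0 | O |
| 9 | of | 0 | O |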


## Modeling

Initial check:
Check how a simple baseline model performs. This model just tags every token with the most frequent label in the data, which is O.

This results in ~59% accuracy.

In [22]:
from sklearn.dummy import DummyClassifier

dummy_clf = DummyClassifier(strategy="most_frequent")
dummy_clf.fit(pd.Series(tokenized_train_dataset['input_ids']).explode(), pd.Series(tokenized_train_dataset['labels']).explode().astype(str))
dummy_clf.score(pd.Series(tokenized_train_dataset['input_ids']).explode(), pd.Series(tokenized_train_dataset['labels']).explode().astype(str))

0.5888494815191806
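
The most-frequent strategy boils down to counting labels. Here is a minimal sketch with the standard library on made-up label sequences; the function name `majority_baseline_accuracy` is mine, and the toy numbers are unrelated to the 59% above.

```python
from collections import Counter


def majority_baseline_accuracy(label_sequences):
    """Return the most frequent label and the accuracy of always predicting it."""
    flat = [label for seq in label_sequences for label in seq]
    label, count = Counter(flat).most_common(1)[0]
    return label, count / len(flat)


# Three toy sentences; 0 is 'O', 1 is 'B-corporation', 11/12 are B-/I-product.
toy_labels = [[1, 0, 0, 0], [0, 0, 11, 12], [0, 0]]

label, accuracy = majority_baseline_accuracy(toy_labels)
print(label, accuracy)  # 0 0.7
```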

The baseline classifier becomes less naive if we tag each token with the most frequent label of the sentence it belongs to.
This results in 72% accuracy.

In [23]:
exploded_values=pd.Series(tokenized_train_dataset['labels']).explode()
exploded_values=pd.DataFrame(exploded_values,columns=['B'])

most_frequent_elem_by_doc=pd.Series(tokenized_train_dataset['labels']).apply(lambda x:  max(set(x), key=x.count))
most_frequent_elem_by_doc=pd.DataFrame(most_frequent_elem_by_doc,columns=list('A'))

df_most_freq_token=exploded_values.merge(most_frequent_elem_by_doc, how='right', left_index=True, right_index=True)

dummy_clf = DummyClassifier(strategy="most_frequent")
dummy_clf.fit(pd.Series(tokenized_train_dataset['input_ids']).explode(), df_most_freq_token['A'])
dummy_clf.score(pd.Series(tokenized_train_dataset['input_ids']).explode(), df_most_freq_token['A'])

0.7197897448947134

### Using DistilBERT for Named Entity Recognition

Labels should be padded in exactly the same way as the inputs so that they stay the same size, using -100 as the padding value so that the corresponding predictions are ignored in the loss computation.

DataCollatorForTokenClassification handles this padding.
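
A rough picture of what that padding amounts to, in plain Python. This is only a sketch of the idea, not the collator's code: `pad_labels` and `masked_mean` are hypothetical helpers, right-side padding is assumed, and the per-token loss values are made up.

```python
def pad_labels(batch_labels, pad_value=-100):
    """Right-pad every label list to the length of the longest one."""
    max_len = max(len(labels) for labels in batch_labels)
    return [labels + [pad_value] * (max_len - len(labels)) for labels in batch_labels]


def masked_mean(per_token_loss, labels, ignore_index=-100):
    """Average the loss over positions whose label is not ignore_index."""
    kept = [loss for loss, lab in zip(per_token_loss, labels) if lab != ignore_index]
    return sum(kept) / len(kept)


batch = [[1, 0], [0, 0, 11, 12]]
padded = pad_labels(batch)
print(padded)  # [[1, 0, -100, -100], [0, 0, 11, 12]]

# The two padded positions carry large (made-up) losses but do not affect the mean.
print(masked_mean([0.5, 1.5, 9.0, 9.0], padded[0]))  # 1.0
```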

In [14]:
# Data Collator
data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)

# Get model
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=len(label_list))

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForTokenClassification: ['vocab_layer_norm.weight', 'vocab_transform.bias', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight']
- This IS expected if you are initializing DistilBertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForTokenClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN t

A quick check of the seqeval metric before we fine-tune the model (comparing an example's labels with themselves gives perfect scores):

In [27]:
metric_seqeval = evaluate.load("seqeval")  # datasets.load_metric is deprecated in favor of evaluate.load
example = wnut["train"][2]

labels = [label_list[i] for i in example["ner_tags"]]
metric_seqeval.compute(predictions=[labels], references=[labels])

{'corporation': {'precision': 1.0, 'recall': 1.0, 'f1': 1.0, 'number': 1},
 'overall_precision': 1.0,
 'overall_recall': 1.0,
 'overall_f1': 1.0,
 'overall_accuracy': 1.0}

We get the precision, recall, and F1 score for each entity type, as well as overall.
To get metrics on the evaluation set during training, we need to define a function that calculates them for us. This is well documented in the official Hugging Face docs.

In [39]:
def compute_metrics(p):
    predictions, labels = p
    predictions = np.argmax(predictions, axis=2)

    # Remove ignored index (special tokens)
    true_predictions = [
        [label_list[p] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]
    true_labels = [
        [label_list[l] for (p, l) in zip(prediction, label) if l != -100]
        for prediction, label in zip(predictions, labels)
    ]

    results = metric_seqeval.compute(predictions=true_predictions, references=true_labels)
    flattened_results = {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }
    # adding entity level metric as well
    #for k in results.keys():
    #    if k not in flattened_results.keys():
    #        flattened_results[k+"_f1"]=results[k]["f1"]
    return flattened_results

In [59]:


training_args = TrainingArguments(
    output_dir='C:/Users/avanjavakam/OneDrive - Moulton Niguel Water/Documents/R_Home/distilbert_ner',
    #report_to="wandb",
    #run_name = "initial_run"
    num_train_epochs=5,
    learning_rate=2e-5,
    per_device_train_batch_size=16,   
    per_device_eval_batch_size=64,
    weight_decay=0.01,
    warmup_steps=500, 
    eval_steps=60,
    save_steps=60,
    evaluation_strategy="steps",
    load_best_model_at_end=True
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_wnut["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 6)]
)

trainer.train()
#wandb.finish()



{'eval_loss': 1.8291836977005005, 'eval_precision': 0.014814814814814815, 'eval_recall': 0.005560704355885079, 'eval_f1': 0.008086253369272238, 'eval_accuracy': 0.910820401008935, 'eval_runtime': 101.8258, 'eval_samples_per_second': 12.639, 'eval_steps_per_second': 0.206, 'epoch': 0.22}


{'eval_loss': 0.458038866519928, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_f1': 0.0, 'eval_accuracy': 0.9256124150314223, 'eval_runtime': 111.7945, 'eval_samples_per_second': 11.512, 'eval_steps_per_second': 0.188, 'epoch': 0.43}


{'eval_loss': 0.3688420057296753, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_f1': 0.0, 'eval_accuracy': 0.9256124150314223, 'eval_runtime': 108.5973, 'eval_samples_per_second': 11.851, 'eval_steps_per_second': 0.193, 'epoch': 0.65}


{'eval_loss': 0.31115037202835083, 'eval_precision': 0.39375, 'eval_recall': 0.11677479147358666, 'eval_f1': 0.18012866333095068, 'eval_accuracy': 0.9325809071865248, 'eval_runtime': 105.4344, 'eval_samples_per_second': 12.207, 'eval_steps_per_second': 0.199, 'epoch': 0.87}


{'eval_loss': 0.2867625951766968, 'eval_precision': 0.4480769230769231, 'eval_recall': 0.2159406858202039, 'eval_f1': 0.29143214509068166, 'eval_accuracy': 0.9368133042623231, 'eval_runtime': 107.3207, 'eval_samples_per_second': 11.992, 'eval_steps_per_second': 0.196, 'epoch': 1.09}


{'eval_loss': 0.2620603144168854, 'eval_precision': 0.41881443298969073, 'eval_recall': 0.30120481927710846, 'eval_f1': 0.35040431266846367, 'eval_accuracy': 0.9385661151725022, 'eval_runtime': 108.3989, 'eval_samples_per_second': 11.873, 'eval_steps_per_second': 0.194, 'epoch': 1.3}


{'eval_loss': 0.2517760097980499, 'eval_precision': 0.5279672578444747, 'eval_recall': 0.3586654309545876, 'eval_f1': 0.4271523178807947, 'eval_accuracy': 0.9431832756188278, 'eval_runtime': 122.2843, 'eval_samples_per_second': 10.525, 'eval_steps_per_second': 0.172, 'epoch': 1.52}


{'eval_loss': 0.24010969698429108, 'eval_precision': 0.4760845383759733, 'eval_recall': 0.39666357738646896, 'eval_f1': 0.43276036400404455, 'eval_accuracy': 0.9421144884784747, 'eval_runtime': 94.176, 'eval_samples_per_second': 13.666, 'eval_steps_per_second': 0.223, 'epoch': 1.74}
{'loss': 0.5257, 'learning_rate': 2e-05, 'epoch': 1.81}


{'eval_loss': 0.24359002709388733, 'eval_precision': 0.5162846803377563, 'eval_recall': 0.39666357738646896, 'eval_f1': 0.44863731656184486, 'eval_accuracy': 0.9433542815612842, 'eval_runtime': 82.8193, 'eval_samples_per_second': 15.54, 'eval_steps_per_second': 0.254, 'epoch': 1.96}


{'eval_loss': 0.2503388226032257, 'eval_precision': 0.5418275418275418, 'eval_recall': 0.3901760889712697, 'eval_f1': 0.45366379310344823, 'eval_accuracy': 0.9458766192125176, 'eval_runtime': 83.163, 'eval_samples_per_second': 15.476, 'eval_steps_per_second': 0.253, 'epoch': 2.17}


{'eval_loss': 0.2353149950504303, 'eval_precision': 0.46853823814133594, 'eval_recall': 0.448563484708063, 'eval_f1': 0.45833333333333337, 'eval_accuracy': 0.9451498439570775, 'eval_runtime': 82.081, 'eval_samples_per_second': 15.68, 'eval_steps_per_second': 0.256, 'epoch': 2.39}


{'eval_loss': 0.24050550162792206, 'eval_precision': 0.5618279569892473, 'eval_recall': 0.3873957367933272, 'eval_f1': 0.4585847504114098, 'eval_accuracy': 0.9475866786370827, 'eval_runtime': 86.6375, 'eval_samples_per_second': 14.855, 'eval_steps_per_second': 0.242, 'epoch': 2.61}


{'eval_loss': 0.23974569141864777, 'eval_precision': 0.5839112343966713, 'eval_recall': 0.3901760889712697, 'eval_f1': 0.46777777777777774, 'eval_accuracy': 0.9481424479500663, 'eval_runtime': 86.3845, 'eval_samples_per_second': 14.899, 'eval_steps_per_second': 0.243, 'epoch': 2.83}


{'eval_loss': 0.245252326130867, 'eval_precision': 0.5699873896595208, 'eval_recall': 0.41890639481000924, 'eval_f1': 0.48290598290598286, 'eval_accuracy': 0.948398956863751, 'eval_runtime': 83.9294, 'eval_samples_per_second': 15.334, 'eval_steps_per_second': 0.25, 'epoch': 3.04}


{'eval_loss': 0.2514230012893677, 'eval_precision': 0.5508474576271186, 'eval_recall': 0.42168674698795183, 'eval_f1': 0.4776902887139107, 'eval_accuracy': 0.9486982172630499, 'eval_runtime': 81.402, 'eval_samples_per_second': 15.81, 'eval_steps_per_second': 0.258, 'epoch': 3.26}


{'eval_loss': 0.26181262731552124, 'eval_precision': 0.5732814526588845, 'eval_recall': 0.40963855421686746, 'eval_f1': 0.47783783783783784, 'eval_accuracy': 0.9491257321191912, 'eval_runtime': 85.1954, 'eval_samples_per_second': 15.106, 'eval_steps_per_second': 0.246, 'epoch': 3.48}
{'loss': 0.0843, 'learning_rate': 8.636363636363637e-06, 'epoch': 3.62}


{'eval_loss': 0.2565685510635376, 'eval_precision': 0.5918097754293263, 'eval_recall': 0.4151992585727525, 'eval_f1': 0.48801742919389973, 'eval_accuracy': 0.9493394895472618, 'eval_runtime': 92.7674, 'eval_samples_per_second': 13.873, 'eval_steps_per_second': 0.226, 'epoch': 3.7}
{'train_runtime': 4502.8795, 'train_samples_per_second': 4.889, 'train_steps_per_second': 0.306, 'train_loss': 0.3002868703767365, 'epoch': 3.7}


TrainOutput(global_step=1020, training_loss=0.3002868703767365, metrics={'train_runtime': 4502.8795, 'train_samples_per_second': 4.889, 'train_steps_per_second': 0.306, 'train_loss': 0.3002868703767365, 'epoch': 3.7})
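
Training stopped at step 1020 instead of running all 1380 steps because of the `EarlyStoppingCallback`. The sketch below replays a patience rule on the `eval_loss` values from the log above (rounded to four decimals). It assumes the callback monitors `eval_loss`, which as far as I know is the Trainer default when `metric_for_best_model` is not set; `early_stop_index` is a hypothetical helper, not the library's code.

```python
def early_stop_index(losses, patience):
    """Return (1-based evaluation index at which training stops, best loss so far)."""
    best, bad_evals = None, 0
    for i, loss in enumerate(losses, start=1):
        if best is None or loss < best:
            best, bad_evals = loss, 0   # improvement resets the counter
        else:
            bad_evals += 1              # no improvement over the best so far
        if bad_evals >= patience:
            return i, best
    return None, best


# eval_loss at each evaluation (every 60 steps), copied from the training log.
eval_losses = [1.8292, 0.4580, 0.3688, 0.3112, 0.2868, 0.2621, 0.2518, 0.2401,
               0.2436, 0.2503, 0.2353, 0.2405, 0.2397, 0.2453, 0.2514, 0.2618,
               0.2566]

stopped_at, best_loss = early_stop_index(eval_losses, patience=6)
print(stopped_at * 60, best_loss)  # 1020 0.2353
```

The best loss (0.2353, at epoch 2.39) is followed by six evaluations without improvement, which matches `early_stopping_patience = 6` and `global_step=1020`.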

### Test Set Evaluation

In [60]:
predictions, labels, _ = trainer.predict(tokenized_wnut["test"])
predictions = np.argmax(predictions, axis=2)

# Remove ignored index (special tokens)
true_predictions = [
    [label_list[p] for (p, l) in zip(prediction, label) if l != -100]
    for prediction, label in zip(predictions, labels)
]
true_labels = [
    [label_list[l] for (p, l) in zip(prediction, label) if l != -100]
    for prediction, label in zip(predictions, labels)
]

results = metric_seqeval.compute(predictions=true_predictions, references=true_labels)
results

{'corporation': {'precision': 0.21008403361344538,
  'recall': 0.3787878787878788,
  'f1': 0.27027027027027023,
  'number': 66},
 'creative-work': {'precision': 0.3392857142857143,
  'recall': 0.13380281690140844,
  'f1': 0.1919191919191919,
  'number': 142},
 'group': {'precision': 0.3194444444444444,
  'recall': 0.1393939393939394,
  'f1': 0.19409282700421943,
  'number': 165},
 'location': {'precision': 0.49382716049382713,
  'recall': 0.5333333333333333,
  'f1': 0.5128205128205128,
  'number': 150},
 'person': {'precision': 0.5914972273567468,
  'recall': 0.745920745920746,
  'f1': 0.6597938144329898,
  'number': 429},
 'product': {'precision': 0.20481927710843373,
  'recall': 0.13385826771653545,
  'f1': 0.16190476190476194,
  'number': 127},
 'overall_precision': 0.46853823814133594,
 'overall_recall': 0.448563484708063,
 'overall_f1': 0.45833333333333337,
 'overall_accuracy': 0.9451498439570775}

### Get Predictions

In [79]:
def tag_sentence(text: str):
    import torch  # local import; torch is not imported at the top of this notebook

    # convert the text to a tokenized sequence
    inputs = tokenizer(text, truncation=True, return_tensors="pt")  # .to("cuda")
    # run inference without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
    # convert the logits to probabilities with softmax
    probs = outputs[0][0].softmax(1)
    # pair each subtoken with the tag that has the highest probability
    word_tags = [(tokenizer.decode(inputs['input_ids'][0][i].item()), id2tag[tagid.item()])
                 for i, tagid in enumerate(probs.argmax(axis=1))]

    return pd.DataFrame(word_tags, columns=['word', 'tag'])

A few examples:

In [62]:
text="""Apple unveils all-new MacBook Air, supercharged by the new M2 chip"""

print(tag_sentence(text))

         word            tag
0       [CLS]              O
1       apple  B-corporation
2          un              O
3        ##ve              O
4       ##ils              O
5         all              O
6           -              O
7         new              O
8         mac      B-product
9      ##book      I-product
10        air      I-product
11          ,              O
12      super              O
13  ##charged              O
14         by              O
15        the              O
16        new              O
17         m2      B-product
18       chip      I-product
19      [SEP]              O


In [1]:
from huggingface_hub import notebook_login

In [None]:
!git config --global user.email "anudeepvanjavakam@gmail.com"
!git config --global user.name "anudeepvanjavakam1"


In [3]:
notebook_login()

In [None]:
trainer.push_to_hub(commit_message="Training complete")

In [71]:
model = AutoModelForTokenClassification.from_pretrained("anudeepvanjavakam/distilbert_uncased_finetuned_wnut17")


In [72]:
trainer = Trainer(model=model)
tokenizer = AutoTokenizer.from_pretrained("anudeepvanjavakam/distilbert_uncased_finetuned_wnut17")

## Hyperparameter Tuning

In [16]:
#Set up Weights and Biases for tracking and monitoring model runs.
#Later we can use it for hyperparameter tuning.
# set the wandb project where this run will be logged
os.environ["WANDB_PROJECT"]="reddit_product_tagging"

# save your trained model checkpoint to wandb
os.environ["WANDB_LOG_MODEL"]="true"

# turn off watch to log faster
os.environ["WANDB_WATCH"]="false"


During hyperparameter search, the Trainer runs several trainings, so the model needs to be defined via a function (so it can be reinitialized at each new run) instead of being passed in directly. The function loads the same pre-trained checkpoint as before:

In [68]:
def model_init():
    return AutoModelForTokenClassification.from_pretrained(model_name, num_labels=len(label_list))

In [72]:
training_args = TrainingArguments(
    #output_dir='C:/Users/avanjavakam/OneDrive - Moulton Niguel Water/Documents/R_Home/distilbert_ner_wandb',
    report_to="wandb",
    run_name = "run_2_06_04_2023_11_44_pm",
    disable_tqdm=True,
    #num_train_epochs=5,
    #learning_rate=2e-5,
    #per_device_train_batch_size=16,   
    #per_device_eval_batch_size=64,
    #weight_decay=0.01,
    #warmup_steps=500, 
    eval_steps=500,
    #save_steps=60,
    evaluation_strategy="steps",
    load_best_model_at_end=True,
    
)

In [73]:
trainer = Trainer(
    model_init=model_init,
    args=training_args,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_wnut["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

### Population-Based Training
Population-based training (PBT) uses guided hyperparameter search but does not need to restart training for new hyperparameter configurations. Instead of discarding poorly performing trials, it exploits well-performing runs by copying their network weights and hyperparameters, and then explores new hyperparameter configurations while continuing to train.

The basic idea behind the algorithm, in layman's terms:
1. Run the hyperparameter optimization process for a set of samples for a given number of time steps (or iterations) T.
2. After every T iterations, compare the runs, copy the weights of the well-performing runs to the poorly performing ones, and change the latter's hyperparameter values to be close to those of the runs that performed well.
3. Terminate the worst-performing runs.

Although the algorithm's idea seems simple, a lot of complex optimization math goes into building it from scratch. Ray Tune provides a scalable and easy-to-use implementation of the state-of-the-art PBT algorithm.
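
To illustrate the exploit/explore idea only, here is a toy sketch in plain Python. It is not Ray Tune's implementation: `pbt_step`, the trial dictionaries, the 25% quantile, and the 0.8/1.2 perturbation factors are all choices I made for the example.

```python
import random


def pbt_step(population, rng, quantile=0.25, factors=(0.8, 1.2)):
    """One exploit/explore step over a list of trial dicts (modified in place)."""
    ranked = sorted(population, key=lambda trial: trial["score"], reverse=True)
    n = max(1, int(len(ranked) * quantile))
    if len(ranked) < 2 * n:
        return population  # too few trials to split into top and bottom groups
    top, bottom = ranked[:n], ranked[-n:]
    for trial in bottom:
        donor = rng.choice(top)
        # Exploit: copy the weights of a well-performing trial.
        trial["weights"] = list(donor["weights"])
        # Explore: take the donor's learning rate and nudge it up or down.
        trial["learning_rate"] = donor["learning_rate"] * rng.choice(factors)
    return population


population = [
    {"name": "a", "score": 0.93, "learning_rate": 5e-5, "weights": [0.1, 0.2]},
    {"name": "b", "score": 0.95, "learning_rate": 2e-5, "weights": [0.3, 0.4]},
    {"name": "c", "score": 0.91, "learning_rate": 1e-5, "weights": [0.5, 0.6]},
    {"name": "d", "score": 0.94, "learning_rate": 3e-5, "weights": [0.7, 0.8]},
]

pbt_step(population, random.Random(0))
worst = population[2]  # trial "c" had the lowest score
print(worst["weights"], worst["learning_rate"])
```

After the step, trial `c` carries the weights of trial `b` (the best one) and a learning rate equal to 0.8 or 1.2 times `b`'s, while the other trials are left unchanged.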




In [71]:
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

def get_scheduler():
    #Creating the PBT scheduler
    scheduler = PopulationBasedTraining(
        mode = "max",
        metric='eval_accuracy',  # compute_metrics returns "accuracy", which the Trainer logs as "eval_accuracy"
        #perturbation_interval=2,
        hyperparam_mutations={
            "weight_decay": tune.choice([0.0, 0.3]),
            "learning_rate": tune.choice([1e-5, 5e-5]),
            "per_device_train_batch_size": tune.choice([16, 64]),
            "num_train_epochs": tune.choice([2,5]),
            "warmup_steps": tune.choice(range(0, 500))
        }
    )
    return scheduler

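We run only 8 trials, far fewer than Bayesian optimization would need, because PBT does not stop bad trials; instead, they copy weights and hyperparameters from the good ones. Note that the search below is launched with `backend="wandb"`, and the configurations sampled in its log (for example, batch sizes of 4, 8, and 32) fall outside the mutation space defined above, so the scheduler may not have been applied by that backend.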

In [60]:
best_trial = trainer.hyperparameter_search(
    direction="maximize",
    backend="wandb",
    #hp_space=wandb_hp_space,
    n_trials=8,
    #keep_checkpoints_num=1,
    scheduler=get_scheduler()
)

Create sweep with ID: ile8sxcl
Sweep URL: https://wandb.ai/anudeepvanjavakam/reddit_product_tagging/sweeps/ile8sxcl


[34m[1mwandb[0m: Agent Starting Run: zggjdxl0 with config:
[34m[1mwandb[0m: 	learning_rate: 6.187091203956564e-05
[34m[1mwandb[0m: 	num_train_epochs: 4
[34m[1mwandb[0m: 	per_device_train_batch_size: 64
[34m[1mwandb[0m: 	seed: 8
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


Trying to set _wandb in the hyperparameter search but there is no corresponding field in `TrainingArguments`.
Trying to set assignments in the hyperparameter search but there is no corresponding field in `TrainingArguments`.
Trying to set metric in the hyperparameter search but there is no corresponding field in `TrainingArguments`.
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForTokenClassification: ['vocab_layer_norm.weight', 'vocab_transform.bias', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight']
- This IS expected if you are initializing DistilBertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForTokenClassification from the checkpoint of a model that you expect to b


{'train_runtime': 2044.8051, 'train_samples_per_second': 8.613, 'train_steps_per_second': 0.135, 'train_loss': 0.14894058393395465, 'epoch': 4.0}
{'eval_loss': 0.2613465487957001, 'eval_precision': 0.5406976744186046, 'eval_recall': 0.4309545875810936, 'eval_f1': 0.47962867457452296, 'eval_accuracy': 0.9479286905219957, 'eval_runtime': 62.4535, 'eval_samples_per_second': 20.607, 'eval_steps_per_second': 2.578, 'epoch': 4.0}


| Metric | Run history | Run summary |
|---|---|---|
| eval/accuracy | ▁ | 0.94793 |
| eval/f1 | ▁ | 0.47963 |
| eval/loss | ▁ | 0.26135 |
| eval/precision | ▁ | 0.5407 |
| eval/recall | ▁ | 0.43095 |
| eval/runtime | ▁ | 62.4535 |
| eval/samples_per_second | ▁ | 20.607 |
| eval/steps_per_second | ▁ | 2.578 |
| train/epoch | ▁▁ | 4.0 |
| train/global_step | ▁▁ | 276.0 |


[34m[1mwandb[0m: Agent Starting Run: 3crqic57 with config:
[34m[1mwandb[0m: 	learning_rate: 6.749872973956807e-05
[34m[1mwandb[0m: 	num_train_epochs: 5
[34m[1mwandb[0m: 	per_device_train_batch_size: 16
[34m[1mwandb[0m: 	seed: 40

{'loss': 0.15, 'learning_rate': 4.304266823972457e-05, 'epoch': 1.81}
{'eval_loss': 0.245904341340065, 'eval_precision': 0.5497448979591837, 'eval_recall': 0.3994439295644115, 'eval_f1': 0.4626945786366076, 'eval_accuracy': 0.9480141934932239, 'eval_runtime': 64.192, 'eval_samples_per_second': 20.049, 'eval_steps_per_second': 2.508, 'epoch': 1.81}
{'loss': 0.0341, 'learning_rate': 1.8586606739881063e-05, 'epoch': 3.62}
{'eval_loss': 0.32771819829940796, 'eval_precision': 0.5997150997150997, 'eval_recall': 0.3901760889712697, 'eval_f1': 0.47276810780460415, 'eval_accuracy': 0.9482707024069087, 'eval_runtime': 61.776, 'eval_samples_per_second': 20.833, 'eval_steps_per_second': 2.606, 'epoch': 3.62}
{'train_runtime': 2982.7351, 'train_samples_per_second': 7.381, 'train_steps_per_second': 0.463, 'train_loss': 0.07024156148882879, 'epoch': 5.0}


| Metric | Run history | Run summary |
|---|---|---|
| eval/accuracy | ▁█ | 0.94827 |
| eval/f1 | ▁█ | 0.47277 |
| eval/loss | ▁█ | 0.32772 |
| eval/precision | ▁█ | 0.59972 |
| eval/recall | █▁ | 0.39018 |
| eval/runtime | █▁ | 61.776 |
| eval/samples_per_second | ▁█ | 20.833 |
| eval/steps_per_second | ▁█ | 2.606 |
| train/epoch | ▁▁▅▅█ | 5.0 |
| train/global_step | ▁▁▅▅█ | 1380.0 |


[34m[1mwandb[0m: Agent Starting Run: kgy00di9 with config:
[34m[1mwandb[0m: 	learning_rate: 5.6309788887288427e-05
[34m[1mwandb[0m: 	num_train_epochs: 4
[34m[1mwandb[0m: 	per_device_train_batch_size: 8
[34m[1mwandb[0m: 	seed: 25

{'loss': 0.185, 'learning_rate': 4.35353358729308e-05, 'epoch': 0.91}
{'eval_loss': 0.24270012974739075, 'eval_precision': 0.5070603337612324, 'eval_recall': 0.366079703429101, 'eval_f1': 0.4251883745963402, 'eval_accuracy': 0.9448933350433928, 'eval_runtime': 66.955, 'eval_samples_per_second': 19.222, 'eval_steps_per_second': 2.405, 'epoch': 0.91}
{'loss': 0.0794, 'learning_rate': 3.0760882858573164e-05, 'epoch': 1.81}
{'eval_loss': 0.2586621344089508, 'eval_precision': 0.6069868995633187, 'eval_recall': 0.386468952734013, 'eval_f1': 0.47225368063420153, 'eval_accuracy': 0.9482707024069087, 'eval_runtime': 64.5231, 'eval_samples_per_second': 19.946, 'eval_steps_per_second': 2.495, 'epoch': 1.81}
{'loss': 0.0404, 'learning_rate': 1.798642984421554e-05, 'epoch': 2.72}
{'eval_loss': 0.28830501437187195, 'eval_precision': 0.577023498694517, 'eval_recall': 0.40963855421686746, 'eval_f1': 0.47913279132791325, 'eval_accuracy': 0.9482279509212945, 'eval_runtime': 64.632, 'eval_samples_per_sec

| Metric | Run history | Run summary |
|---|---|---|
| eval/accuracy | ▁▇▇█ | 0.94861 |
| eval/f1 | ▁▇██ | 0.48178 |
| eval/loss | ▁▂▅█ | 0.3215 |
| eval/precision | ▁█▆▆ | 0.58289 |
| eval/recall | ▁▄██ | 0.41057 |
| eval/runtime | █▄▄▁ | 62.421 |
| eval/samples_per_second | ▁▅▄█ | 20.618 |
| eval/steps_per_second | ▁▅▄█ | 2.579 |
| train/epoch | ▁▁▃▃▅▅▇▇█ | 4.0 |
| train/global_step | ▁▁▃▃▅▅▇▇█ | 2204.0 |


[34m[1mwandb[0m: Agent Starting Run: p7qe59dp with config:
[34m[1mwandb[0m: 	learning_rate: 8.262120435766388e-06
[34m[1mwandb[0m: 	num_train_epochs: 5
[34m[1mwandb[0m: 	per_device_train_batch_size: 64
[34m[1mwandb[0m: 	seed: 14

{'train_runtime': 2571.1666, 'train_samples_per_second': 8.562, 'train_steps_per_second': 0.134, 'train_loss': 0.3223885356516078, 'epoch': 5.0}
{'eval_loss': 0.2883630394935608, 'eval_precision': 0.40644171779141103, 'eval_recall': 0.24559777571825764, 'eval_f1': 0.3061813980358174, 'eval_accuracy': 0.9371980676328503, 'eval_runtime': 63.0104, 'eval_samples_per_second': 20.425, 'eval_steps_per_second': 2.555, 'epoch': 5.0}



| Metric | Run history | Run summary |
|---|---|---|
| eval/accuracy | ▁ | 0.9372 |
| eval/f1 | ▁ | 0.30618 |
| eval/loss | ▁ | 0.28836 |
| eval/precision | ▁ | 0.40644 |
| eval/recall | ▁ | 0.2456 |
| eval/runtime | ▁ | 63.0104 |
| eval/samples_per_second | ▁ | 20.425 |
| eval/steps_per_second | ▁ | 2.555 |
| train/epoch | ▁▁ | 5.0 |
| train/global_step | ▁▁ | 345.0 |


[34m[1mwandb[0m: Agent Starting Run: runqcxqh with config:
[34m[1mwandb[0m: 	learning_rate: 4.695388712117811e-05
[34m[1mwandb[0m: 	num_train_epochs: 3
[34m[1mwandb[0m: 	per_device_train_batch_size: 32
[34m[1mwandb[0m: 	seed: 18

{'train_runtime': 1564.0155, 'train_samples_per_second': 8.446, 'train_steps_per_second': 0.265, 'train_loss': 0.15146775176559668, 'epoch': 3.0}
{'eval_loss': 0.26040154695510864, 'eval_precision': 0.5753968253968254, 'eval_recall': 0.4031510658016682, 'eval_f1': 0.47411444141689374, 'eval_accuracy': 0.9476294301226967, 'eval_runtime': 64.73, 'eval_samples_per_second': 19.883, 'eval_steps_per_second': 2.487, 'epoch': 3.0}


| Metric | Run history | Run summary |
|---|---|---|
| eval/accuracy | ▁ | 0.94763 |
| eval/f1 | ▁ | 0.47411 |
| eval/loss | ▁ | 0.2604 |
| eval/precision | ▁ | 0.5754 |
| eval/recall | ▁ | 0.40315 |
| eval/runtime | ▁ | 64.73 |
| eval/samples_per_second | ▁ | 19.883 |
| eval/steps_per_second | ▁ | 2.487 |
| train/epoch | ▁▁ | 3.0 |
| train/global_step | ▁▁ | 414.0 |


[34m[1mwandb[0m: Agent Starting Run: d6qthlc6 with config:
[34m[1mwandb[0m: 	learning_rate: 3.730120361942406e-05
[34m[1mwandb[0m: 	num_train_epochs: 6
[34m[1mwandb[0m: 	per_device_train_batch_size: 4
[34m[1mwandb[0m: 	seed: 19

{'loss': 0.2391, 'learning_rate': 3.447792148050307e-05, 'epoch': 0.45}
{'eval_loss': 0.2832615375518799, 'eval_precision': 0.6374045801526718, 'eval_recall': 0.30954587581093607, 'eval_f1': 0.41671865252651286, 'eval_accuracy': 0.9428412637339147, 'eval_runtime': 68.637, 'eval_samples_per_second': 18.751, 'eval_steps_per_second': 2.346, 'epoch': 0.45}
{'loss': 0.1374, 'learning_rate': 3.165463934158208e-05, 'epoch': 0.91}
{'eval_loss': 0.23936323821544647, 'eval_precision': 0.5104740904079382, 'eval_recall': 0.42910101946246526, 'eval_f1': 0.4662638469284995, 'eval_accuracy': 0.9454491043563764, 'eval_runtime': 60.063, 'eval_samples_per_second': 21.427, 'eval_steps_per_second': 2.681, 'epoch': 0.91}
{'loss': 0.0881, 'learning_rate': 2.8831357202661102e-05, 'epoch': 1.36}
{'eval_loss': 0.2819172739982605, 'eval_precision': 0.5716234652114598, 'eval_recall': 0.38832252085264135, 'eval_f1': 0.4624724061810155, 'eval_accuracy': 0.9472874182377837, 'eval_runtime': 70.51, 'eval_samples_per_

| Metric | Run history | Run summary |
|---|---|---|
| eval/accuracy | ▁▄▆▅▆▇█▇▇▇▇▇▇ | 0.9484 |
| eval/f1 | ▁▆▅▄▅▆█▆▇▇▇▆▇ | 0.48064 |
| eval/loss | ▃▁▃▃▅▅▅▆▇▇▇██ | 0.38866 |
| eval/precision | █▁▄▅▅▆▄▄▆▄▄▄▄ | 0.56203 |
| eval/recall | ▁█▆▄▅▆█▆▇▇█▆▇ | 0.41983 |
| eval/runtime | ▇▁█▁█▃▂▂▁▅▃▄▂ | 61.062 |
| eval/samples_per_second | ▂█▁█▁▆▇▆█▃▆▅▇ | 21.077 |
| eval/steps_per_second | ▂█▁█▁▆▇▆█▃▆▅▇ | 2.637 |
| train/epoch | ▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▆▆▆▆▇▇▇▇███ | 6.0 |
| train/global_step | ▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▆▆▆▆▇▇▇▇███ | 6606.0 |


[34m[1mwandb[0m: Agent Starting Run: krqmpvqg with config:
[34m[1mwandb[0m: 	learning_rate: 6.708243353123604e-05
[34m[1mwandb[0m: 	num_train_epochs: 2
[34m[1mwandb[0m: 	per_device_train_batch_size: 64
[34m[1mwandb[0m: 	seed: 21

{'train_runtime': 1115.7473, 'train_samples_per_second': 7.892, 'train_steps_per_second': 0.124, 'train_loss': 0.2310990181522093, 'epoch': 2.0}
{'eval_loss': 0.2490411102771759, 'eval_precision': 0.5442536327608983, 'eval_recall': 0.3818350324374421, 'eval_f1': 0.44880174291939, 'eval_accuracy': 0.945320849899534, 'eval_runtime': 65.359, 'eval_samples_per_second': 19.691, 'eval_steps_per_second': 2.463, 'epoch': 2.0}


| Metric | Run history | Run summary |
|---|---|---|
| eval/accuracy | ▁ | 0.94532 |
| eval/f1 | ▁ | 0.4488 |
| eval/loss | ▁ | 0.24904 |
| eval/precision | ▁ | 0.54425 |
| eval/recall | ▁ | 0.38184 |
| eval/runtime | ▁ | 65.359 |
| eval/samples_per_second | ▁ | 19.691 |
| eval/steps_per_second | ▁ | 2.463 |
| train/epoch | ▁▁ | 2.0 |
| train/global_step | ▁▁ | 138.0 |


[34m[1mwandb[0m: Agent Starting Run: 9k7iuegg with config:
[34m[1mwandb[0m: 	learning_rate: 7.389668947799434e-05
[34m[1mwandb[0m: 	num_train_epochs: 3
[34m[1mwandb[0m: 	per_device_train_batch_size: 4
[34m[1mwandb[0m: 	seed: 4

{'loss': 0.2209, 'learning_rate': 6.271039073775904e-05, 'epoch': 0.45}
{'eval_loss': 0.2422882318496704, 'eval_precision': 0.5013333333333333, 'eval_recall': 0.3484708063021316, 'eval_f1': 0.41115363586659376, 'eval_accuracy': 0.9451498439570775, 'eval_runtime': 61.197, 'eval_samples_per_second': 21.03, 'eval_steps_per_second': 2.631, 'epoch': 0.45}
{'loss': 0.1492, 'learning_rate': 5.152409199752375e-05, 'epoch': 0.91}
{'eval_loss': 0.3045324981212616, 'eval_precision': 0.6195121951219512, 'eval_recall': 0.3531047265987025, 'eval_f1': 0.4498229043683589, 'eval_accuracy': 0.9448505835577786, 'eval_runtime': 71.46, 'eval_samples_per_second': 18.01, 'eval_steps_per_second': 2.253, 'epoch': 0.91}
{'loss': 0.0875, 'learning_rate': 4.033779325728846e-05, 'epoch': 1.36}
{'eval_loss': 0.32817161083221436, 'eval_precision': 0.6302816901408451, 'eval_recall': 0.33178869323447635, 'eval_f1': 0.4347298117789921, 'eval_accuracy': 0.9458766192125176, 'eval_runtime': 61.267, 'eval_samples_per_secon

| Metric | Run history | Run summary |
|---|---|---|
| eval/accuracy | ▂▁▅▄▅█ | 0.9469 |
| eval/f1 | ▁█▅▅▇█ | 0.4509 |
| eval/loss | ▁▅▆▄▇█ | 0.35199 |
| eval/precision | ▁▇█▃▄▇ | 0.60436 |
| eval/recall | ▄▅▁▆█▆ | 0.35959 |
| eval/runtime | ▁█▁▃▃▃ | 63.4169 |
| eval/samples_per_second | █▁█▆▆▆ | 20.294 |
| eval/steps_per_second | █▁█▆▆▆ | 2.539 |
| train/epoch | ▁▁▂▂▃▃▅▅▆▆▇▇█ | 3.0 |
| train/global_step | ▁▁▂▂▃▃▅▅▆▆▇▇█ | 3303.0 |


In [67]:
best_trial

BestRun(run_id='kgy00di9', objective=2.423856367441647, hyperparameters={'learning_rate': 5.6309788887288427e-05, 'num_train_epochs': 4, 'per_device_train_batch_size': 8, 'seed': 25, 'assignments': {}, 'metric': 'eval/loss'}, run_summary=None)
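
The `objective` reported above is not a single metric. It matches, up to rounding, the sum of the final precision, recall, F1, and accuracy logged for run kgy00di9, which is consistent with the default objective that `Trainer.hyperparameter_search` uses when no `compute_objective` is passed. The sketch below reproduces that arithmetic under this assumption; the helper name `summed_objective` is hypothetical, and the metric values are the rounded ones from that run's summary.

```python
def summed_objective(metrics):
    """Sum every eval metric except the loss, the epoch, and speed counters."""
    speed_suffixes = ("_runtime", "_per_second")
    kept = {
        name: value
        for name, value in metrics.items()
        if name not in ("eval_loss", "epoch") and not name.endswith(speed_suffixes)
    }
    return sum(kept.values())

# Final (rounded) metrics logged for run kgy00di9
final_metrics = {
    "eval_loss": 0.3215,
    "eval_precision": 0.58289,
    "eval_recall": 0.41057,
    "eval_f1": 0.48178,
    "eval_accuracy": 0.94861,
    "eval_runtime": 62.421,
    "eval_samples_per_second": 20.618,
    "eval_steps_per_second": 2.579,
    "epoch": 4.0,
}

objective = summed_objective(final_metrics)
print(objective)
```

Because accuracy is close to 0.95 for every run, it contributes almost the same amount to each trial's objective, so the ranking is driven mostly by precision, recall, and F1. To optimize a single metric such as F1, a custom `compute_objective` could be passed instead.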

In [74]:
for n, v in best_trial.hyperparameters.items():
    setattr(trainer.args, n, v)

trainer.train()


{'loss': 0.185, 'learning_rate': 4.35353358729308e-05, 'epoch': 0.91}
{'eval_loss': 0.24270012974739075, 'eval_precision': 0.5070603337612324, 'eval_recall': 0.366079703429101, 'eval_f1': 0.4251883745963402, 'eval_accuracy': 0.9448933350433928, 'eval_runtime': 80.7282, 'eval_samples_per_second': 15.942, 'eval_steps_per_second': 1.994, 'epoch': 0.91}
{'loss': 0.0794, 'learning_rate': 3.0760882858573164e-05, 'epoch': 1.81}
{'eval_loss': 0.2586621344089508, 'eval_precision': 0.6069868995633187, 'eval_recall': 0.386468952734013, 'eval_f1': 0.47225368063420153, 'eval_accuracy': 0.9482707024069087, 'eval_runtime': 61.545, 'eval_samples_per_second': 20.912, 'eval_steps_per_second': 2.616, 'epoch': 1.81}
{'loss': 0.0404, 'learning_rate': 1.798642984421554e-05, 'epoch': 2.72}
{'eval_loss': 0.28830501437187195, 'eval_precision': 0.577023498694517, 'eval_recall': 0.40963855421686746, 'eval_f1': 0.47913279132791325, 'eval_accuracy': 0.9482279509212945, 'eval_runtime': 60.501, 'eval_samples_per_sec

TrainOutput(global_step=2204, training_loss=0.07536158587668638, metrics={'train_runtime': 3274.7055, 'train_samples_per_second': 5.378, 'train_steps_per_second': 0.673, 'train_loss': 0.07536158587668638, 'epoch': 4.0})

In [77]:
# Load the best model (run kgy00di9) after pushing it to the Hugging Face Hub
model = AutoModelForTokenClassification.from_pretrained("anudeepvanjavakam/distilbert_finetuned_wnut17_wandb_ner")
trainer = Trainer(model=model)
tokenizer = AutoTokenizer.from_pretrained("anudeepvanjavakam/distilbert_finetuned_wnut17_wandb_ner")

In [80]:
text="""Apple unveils all-new MacBook Air, supercharged by the new M2 chip"""

print(tag_sentence(text))

         word            tag
0       [CLS]              O
1       apple  B-corporation
2          un              O
3        ##ve              O
4       ##ils              O
5         all              O
6           -              O
7         new              O
8         mac      B-product
9      ##book      I-product
10        air      I-product
11          ,              O
12      super              O
13  ##charged              O
14         by              O
15        the              O
16        new              O
17         m2      B-product
18       chip      I-product
19      [SEP]              O
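
The tagger labels individual WordPiece tokens, so a product such as "MacBook Air" is spread over several rows. A minimal sketch of how those rows could be merged back into entity mentions is shown below. The function name `merge_wordpieces` is hypothetical, and the `words` and `tags` lists are copied from the output above rather than read from `tag_sentence`, whose return type is assumed to be a table with `word` and `tag` columns.

```python
def merge_wordpieces(words, tags):
    """Group WordPiece tokens with BIO tags into (text, entity_type) pairs."""
    entities = []
    text, entity_type = None, None
    for word, tag in zip(words, tags):
        if word in ("[CLS]", "[SEP]", "[PAD]"):
            continue  # special tokens never belong to an entity
        if tag.startswith("B-"):
            if text is not None:
                entities.append((text, entity_type))
            text, entity_type = word, tag[2:]
        elif tag.startswith("I-") and text is not None and tag[2:] == entity_type:
            # '##' marks a continuation of the previous word: join without a space
            text += word[2:] if word.startswith("##") else " " + word
        else:
            if text is not None:
                entities.append((text, entity_type))
            text, entity_type = None, None
    if text is not None:
        entities.append((text, entity_type))
    return entities

words = ["[CLS]", "apple", "un", "##ve", "##ils", "all", "-", "new", "mac",
         "##book", "air", ",", "super", "##charged", "by", "the", "new",
         "m2", "chip", "[SEP]"]
tags = ["O", "B-corporation", "O", "O", "O", "O", "O", "O", "B-product",
        "I-product", "I-product", "O", "O", "O", "O", "O", "O",
        "B-product", "I-product", "O"]

print(merge_wordpieces(words, tags))
```

Filtering the result for the `product` type would give the product mentions this project is after.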


## Testing other options for hyperparameter tuning (without population-based training)

In [63]:
# method
sweep_config = {
    'method': 'random'
}


# hyperparameters
parameters_dict = {
    'epochs': {
        'value': 1
        },
    'batch_size': {
        'values': [8, 64]
        },
    'learning_rate': {
        'distribution': 'log_uniform_values',
        'min': 1e-5,
        'max': 5e-5
    },
    'weight_decay': {
        'values': [0.0, 0.3]
    },
}


sweep_config['parameters'] = parameters_dict
sweep_id = wandb.sweep(sweep_config, project='reddit_product_tagging')

Create sweep with ID: wx2o1mj0
Sweep URL: https://wandb.ai/anudeepvanjavakam/reddit_product_tagging/sweeps/wx2o1mj0
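
In the sweep configuration above, `log_uniform_values` draws the learning rate so that its logarithm is uniformly distributed between log(min) and log(max). The sketch below imitates that sampling in plain Python to show the effect; it is an illustration of the distribution, not W&B's own sampler, and the function name `sample_log_uniform` is hypothetical.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw x in [low, high] such that log(x) is uniformly distributed."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)
samples = sorted(sample_log_uniform(1e-5, 5e-5, rng) for _ in range(1000))

# The median sits near the geometric mean of the bounds (about 2.2e-5),
# below the arithmetic midpoint of 3e-5.
median = samples[len(samples) // 2]
print(f"min={samples[0]:.2e} median={median:.2e} max={samples[-1]:.2e}")
```

Sampling on a log scale spends as many trials between 1e-5 and 2.2e-5 as between 2.2e-5 and 5e-5, which suits learning rates, where relative changes matter more than absolute ones.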


In [64]:
def train(config=None):
    with wandb.init(config=config):
        # set sweep configuration
        config = wandb.config

        # set training arguments
        training_args = TrainingArguments(
            output_dir='reddit_product_tagging-sweeps',
            report_to='wandb',  # Turn on Weights & Biases logging
            num_train_epochs=config.epochs,
            learning_rate=config.learning_rate,
            weight_decay=config.weight_decay,
            per_device_train_batch_size=config.batch_size,
            per_device_eval_batch_size=16,
            save_strategy='epoch',
            evaluation_strategy='epoch',
            logging_strategy='epoch',
            load_best_model_at_end=True,
            disable_tqdm=False
        )

        # define the trainer
        trainer = Trainer(
            # model,
            model_init=model_init,
            args=training_args,
            train_dataset=tokenized_train_dataset,
            eval_dataset=tokenized_wnut['test'],
            tokenizer=tokenizer,
            data_collator=data_collator,
            compute_metrics=compute_metrics
        )

        # start training loop
        trainer.train()

In [65]:
wandb.agent(sweep_id, train, count=10)

[34m[1mwandb[0m: Agent Starting Run: 70nqsf42 with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 3.0218375436134713e-05
[34m[1mwandb[0m: 	weight_decay: 0.3
{'loss': 0.2136, 'learning_rate': 0.0, 'epoch': 1.0}


{'eval_loss': 0.24516281485557556, 'eval_precision': 0.5699404761904762, 'eval_recall': 0.35495829471733087, 'eval_f1': 0.4374643061107938, 'eval_accuracy': 0.9446795776153222, 'eval_runtime': 70.0451, 'eval_samples_per_second': 18.374, 'eval_steps_per_second': 1.156, 'epoch': 1.0}
{'train_runtime': 772.4679, 'train_samples_per_second': 5.7, 'train_steps_per_second': 0.713, 'train_loss': 0.21362103913959704, 'epoch': 1.0}


| Metric | Run history | Run summary |
|---|---|---|
| eval/accuracy | ▁ | 0.94468 |
| eval/f1 | ▁ | 0.43746 |
| eval/loss | ▁ | 0.24516 |
| eval/precision | ▁ | 0.56994 |
| eval/recall | ▁ | 0.35496 |
| eval/runtime | ▁ | 70.0451 |
| eval/samples_per_second | ▁ | 18.374 |
| eval/steps_per_second | ▁ | 1.156 |
| train/epoch | ▁▁▁ | 1.0 |
| train/global_step | ▁▁▁ | 551.0 |


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 7oazxgdk with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 1.0863756495194372e-05
[34m[1mwandb[0m: 	weight_decay: 0.3
{'loss': 0.2898, 'learning_rate': 0.0, 'epoch': 1.0}


  0%|          | 0/81 [00:00<?, ?it/s]

  _warn_prf(average, modifier, msg_start, len(result))


{'eval_loss': 0.28670361638069153, 'eval_precision': 0.4385026737967914, 'eval_recall': 0.22798887859128822, 'eval_f1': 0.3, 'eval_accuracy': 0.9373263220896926, 'eval_runtime': 73.652, 'eval_samples_per_second': 17.474, 'eval_steps_per_second': 1.1, 'epoch': 1.0}
{'train_runtime': 765.3589, 'train_samples_per_second': 5.753, 'train_steps_per_second': 0.72, 'train_loss': 0.28975097323935173, 'epoch': 1.0}


0,1
eval/accuracy,▁
eval/f1,▁
eval/loss,▁
eval/precision,▁
eval/recall,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁

0,1
eval/accuracy,0.93733
eval/f1,0.3
eval/loss,0.2867
eval/precision,0.4385
eval/recall,0.22799
eval/runtime,73.652
eval/samples_per_second,17.474
eval/steps_per_second,1.1
train/epoch,1.0
train/global_step,551.0


[34m[1mwandb[0m: Agent Starting Run: psbmlki8 with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 2.23088531403636e-05
[34m[1mwandb[0m: 	weight_decay: 0

  0%|          | 0/69 [00:00<?, ?it/s]

{'loss': 0.5624, 'learning_rate': 0.0, 'epoch': 1.0}


  0%|          | 0/81 [00:00<?, ?it/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


{'eval_loss': 0.41149789094924927, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_f1': 0.0, 'eval_accuracy': 0.9256124150314223, 'eval_runtime': 69.0685, 'eval_samples_per_second': 18.634, 'eval_steps_per_second': 1.173, 'epoch': 1.0}
{'train_runtime': 584.0377, 'train_samples_per_second': 7.539, 'train_steps_per_second': 0.118, 'train_loss': 0.5623834582342617, 'epoch': 1.0}


0,1
eval/accuracy,▁
eval/f1,▁
eval/loss,▁
eval/precision,▁
eval/recall,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁

0,1
eval/accuracy,0.92561
eval/f1,0.0
eval/loss,0.4115
eval/precision,0.0
eval/recall,0.0
eval/runtime,69.0685
eval/samples_per_second,18.634
eval/steps_per_second,1.173
train/epoch,1.0
train/global_step,69.0


[34m[1mwandb[0m: Agent Starting Run: dtg9du4u with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 1.3689961336757494e-05
[34m[1mwandb[0m: 	weight_decay: 0.3

  0%|          | 0/69 [00:00<?, ?it/s]

{'loss': 0.7141, 'learning_rate': 0.0, 'epoch': 1.0}


  0%|          | 0/81 [00:00<?, ?it/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


{'eval_loss': 0.433150053024292, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_f1': 0.0, 'eval_accuracy': 0.9256124150314223, 'eval_runtime': 72.073, 'eval_samples_per_second': 17.857, 'eval_steps_per_second': 1.124, 'epoch': 1.0}
{'train_runtime': 592.1385, 'train_samples_per_second': 7.436, 'train_steps_per_second': 0.117, 'train_loss': 0.7141017637391022, 'epoch': 1.0}


0,1
eval/accuracy,▁
eval/f1,▁
eval/loss,▁
eval/precision,▁
eval/recall,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁

0,1
eval/accuracy,0.92561
eval/f1,0.0
eval/loss,0.43315
eval/precision,0.0
eval/recall,0.0
eval/runtime,72.073
eval/samples_per_second,17.857
eval/steps_per_second,1.124
train/epoch,1.0
train/global_step,69.0


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: qe51vtj6 with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 2.4870948756271965e-05
[34m[1mwandb[0m: 	weight_decay: 0.3

  0%|          | 0/69 [00:00<?, ?it/s]

{'loss': 0.5349, 'learning_rate': 0.0, 'epoch': 1.0}


  0%|          | 0/81 [00:00<?, ?it/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


{'eval_loss': 0.39835095405578613, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_f1': 0.0, 'eval_accuracy': 0.9256124150314223, 'eval_runtime': 72.0787, 'eval_samples_per_second': 17.855, 'eval_steps_per_second': 1.124, 'epoch': 1.0}
{'train_runtime': 593.4053, 'train_samples_per_second': 7.42, 'train_steps_per_second': 0.116, 'train_loss': 0.5348971270132756, 'epoch': 1.0}


0,1
eval/accuracy,▁
eval/f1,▁
eval/loss,▁
eval/precision,▁
eval/recall,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁

0,1
eval/accuracy,0.92561
eval/f1,0.0
eval/loss,0.39835
eval/precision,0.0
eval/recall,0.0
eval/runtime,72.0787
eval/samples_per_second,17.855
eval/steps_per_second,1.124
train/epoch,1.0
train/global_step,69.0


[34m[1mwandb[0m: Agent Starting Run: gpo7cdfs with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 2.124585114733642e-05
[34m[1mwandb[0m: 	weight_decay: 0

  0%|          | 0/69 [00:00<?, ?it/s]

{'loss': 0.5747, 'learning_rate': 0.0, 'epoch': 1.0}


  0%|          | 0/81 [00:00<?, ?it/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


{'eval_loss': 0.41408681869506836, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_f1': 0.0, 'eval_accuracy': 0.9256124150314223, 'eval_runtime': 73.625, 'eval_samples_per_second': 17.48, 'eval_steps_per_second': 1.1, 'epoch': 1.0}
{'train_runtime': 591.5063, 'train_samples_per_second': 7.444, 'train_steps_per_second': 0.117, 'train_loss': 0.5747222900390625, 'epoch': 1.0}


0,1
eval/accuracy,▁
eval/f1,▁
eval/loss,▁
eval/precision,▁
eval/recall,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁

0,1
eval/accuracy,0.92561
eval/f1,0.0
eval/loss,0.41409
eval/precision,0.0
eval/recall,0.0
eval/runtime,73.625
eval/samples_per_second,17.48
eval/steps_per_second,1.1
train/epoch,1.0
train/global_step,69.0


[34m[1mwandb[0m: Agent Starting Run: s472n8t5 with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 2.6184993091948376e-05
[34m[1mwandb[0m: 	weight_decay: 0

  0%|          | 0/69 [00:00<?, ?it/s]

{'loss': 0.5208, 'learning_rate': 0.0, 'epoch': 1.0}


  0%|          | 0/81 [00:00<?, ?it/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


{'eval_loss': 0.3851180374622345, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_f1': 0.0, 'eval_accuracy': 0.9256124150314223, 'eval_runtime': 69.27, 'eval_samples_per_second': 18.579, 'eval_steps_per_second': 1.169, 'epoch': 1.0}
{'train_runtime': 594.8042, 'train_samples_per_second': 7.402, 'train_steps_per_second': 0.116, 'train_loss': 0.5208257592242697, 'epoch': 1.0}


0,1
eval/accuracy,▁
eval/f1,▁
eval/loss,▁
eval/precision,▁
eval/recall,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁

0,1
eval/accuracy,0.92561
eval/f1,0.0
eval/loss,0.38512
eval/precision,0.0
eval/recall,0.0
eval/runtime,69.27
eval/samples_per_second,18.579
eval/steps_per_second,1.169
train/epoch,1.0
train/global_step,69.0


[34m[1mwandb[0m: Agent Starting Run: pdem1p2k with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 1.2702375305629806e-05
[34m[1mwandb[0m: 	weight_decay: 0.3

  0%|          | 0/69 [00:00<?, ?it/s]

{'loss': 0.7453, 'learning_rate': 0.0, 'epoch': 1.0}


  0%|          | 0/81 [00:00<?, ?it/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


{'eval_loss': 0.4373634159564972, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_f1': 0.0, 'eval_accuracy': 0.9256124150314223, 'eval_runtime': 70.502, 'eval_samples_per_second': 18.255, 'eval_steps_per_second': 1.149, 'epoch': 1.0}
{'train_runtime': 582.5221, 'train_samples_per_second': 7.559, 'train_steps_per_second': 0.118, 'train_loss': 0.7452821040499038, 'epoch': 1.0}


0,1
eval/accuracy,▁
eval/f1,▁
eval/loss,▁
eval/precision,▁
eval/recall,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁

0,1
eval/accuracy,0.92561
eval/f1,0.0
eval/loss,0.43736
eval/precision,0.0
eval/recall,0.0
eval/runtime,70.502
eval/samples_per_second,18.255
eval/steps_per_second,1.149
train/epoch,1.0
train/global_step,69.0


[34m[1mwandb[0m: Agent Starting Run: ir6gwzwz with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 4.73865707184315e-05
[34m[1mwandb[0m: 	weight_decay: 0

  0%|          | 0/551 [00:00<?, ?it/s]

{'loss': 0.2014, 'learning_rate': 0.0, 'epoch': 1.0}


  0%|          | 0/81 [00:00<?, ?it/s]

{'eval_loss': 0.23708440363407135, 'eval_precision': 0.5646551724137931, 'eval_recall': 0.36422613531047265, 'eval_f1': 0.4428169014084507, 'eval_accuracy': 0.9450215895002352, 'eval_runtime': 71.383, 'eval_samples_per_second': 18.03, 'eval_steps_per_second': 1.135, 'epoch': 1.0}
{'train_runtime': 771.6396, 'train_samples_per_second': 5.706, 'train_steps_per_second': 0.714, 'train_loss': 0.20136521299607957, 'epoch': 1.0}


0,1
eval/accuracy,▁
eval/f1,▁
eval/loss,▁
eval/precision,▁
eval/recall,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁

0,1
eval/accuracy,0.94502
eval/f1,0.44282
eval/loss,0.23708
eval/precision,0.56466
eval/recall,0.36423
eval/runtime,71.383
eval/samples_per_second,18.03
eval/steps_per_second,1.135
train/epoch,1.0
train/global_step,551.0


[34m[1mwandb[0m: Agent Starting Run: s5xh67q4 with config:
[34m[1mwandb[0m: 	batch_size: 64
[34m[1mwandb[0m: 	epochs: 1
[34m[1mwandb[0m: 	learning_rate: 1.1409156823373444e-05
[34m[1mwandb[0m: 	weight_decay: 0.3

  0%|          | 0/69 [00:00<?, ?it/s]

{'loss': 0.7945, 'learning_rate': 0.0, 'epoch': 1.0}


  0%|          | 0/81 [00:00<?, ?it/s]

  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


{'eval_loss': 0.44381245970726013, 'eval_precision': 0.0, 'eval_recall': 0.0, 'eval_f1': 0.0, 'eval_accuracy': 0.9256124150314223, 'eval_runtime': 80.581, 'eval_samples_per_second': 15.972, 'eval_steps_per_second': 1.005, 'epoch': 1.0}
{'train_runtime': 617.4248, 'train_samples_per_second': 7.131, 'train_steps_per_second': 0.112, 'train_loss': 0.7944799229718637, 'epoch': 1.0}


0,1
eval/accuracy,▁
eval/f1,▁
eval/loss,▁
eval/precision,▁
eval/recall,▁
eval/runtime,▁
eval/samples_per_second,▁
eval/steps_per_second,▁
train/epoch,▁▁▁
train/global_step,▁▁▁

0,1
eval/accuracy,0.92561
eval/f1,0.0
eval/loss,0.44381
eval/precision,0.0
eval/recall,0.0
eval/runtime,80.581
eval/samples_per_second,15.972
eval/steps_per_second,1.005
train/epoch,1.0
train/global_step,69.0


In [66]:
wandb.finish()

In [81]:
## Results and Comparison Table

comparison_table = pd.read_csv("comparison_table_hyperparameter_tuning/wandb_export_2023-06-05T21_34_26.822-07_00.csv")
comparison_table

Unnamed: 0,Name,eval/accuracy,eval/f1,eval/loss,eval/precision,eval/recall,batch_size,epochs,eval_batch_size,eval_steps,evaluation_strategy,learning_rate,per_device_eval_batch_size,per_device_train_batch_size,Runtime,Sweep,train_batch_size
0,floral-sweep-3 (chosen model),95%,48%,32%,58%,41%,,,8,500.0,steps,5.6e-05,8,8,3036,ile8sxcl,8
1,zesty-sweep-6,95%,48%,39%,56%,42%,,,8,500.0,steps,3.7e-05,8,4,6402,ile8sxcl,4
2,colorful-sweep-2,95%,47%,33%,60%,39%,,,8,500.0,steps,6.7e-05,8,16,3041,ile8sxcl,16
3,gentle-sweep-1,95%,48%,26%,54%,43%,,,8,500.0,steps,6.2e-05,8,64,2155,ile8sxcl,64
4,kind-sweep-5,95%,47%,26%,58%,40%,,,8,500.0,steps,4.7e-05,8,32,1667,ile8sxcl,32
5,pleasant-sweep-8,95%,45%,35%,60%,36%,,,8,500.0,steps,7.4e-05,8,4,3166,ile8sxcl,4
6,celestial-sweep-7,95%,45%,25%,54%,38%,,,8,500.0,steps,6.7e-05,8,64,1214,ile8sxcl,64
7,drawn-sweep-9,95%,44%,24%,56%,36%,8.0,1.0,16,,epoch,4.7e-05,16,8,835,wx2o1mj0,8
8,pleasant-sweep-1,94%,44%,25%,57%,35%,8.0,1.0,16,,epoch,3e-05,16,8,855,wx2o1mj0,8
9,crimson-sweep-2,94%,30%,29%,44%,23%,8.0,1.0,16,,epoch,1.1e-05,16,8,834,wx2o1mj0,8


## Results

The runs show high accuracy (about 95%) but a low F1 score (48% at best). F1 is the harmonic mean of precision and recall, so it drops whenever either of the two is low. DistilBERT may score slightly lower on F1 than the larger BERT model or other more complex models, because it gives up some model capacity in exchange for faster inference and a smaller model size. That reduced capacity can lower precision or recall, and therefore F1, in some cases.

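Class imbalance: The dataset is imbalanced, meaning one class has significantly more samples than the others. The model can reach high accuracy by predicting the majority class correctly most of the time while still struggling with the minority classes, which lowers recall and F1 for those classes. The sweep runs with a batch size of 64 illustrate this: they report 92.56% accuracy with precision, recall, and F1 all at 0.0, which is consistent with a model that predicts almost no entities.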
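The gap between accuracy and F1 can be reproduced on toy data. The sketch below is an illustration under stated assumptions, not the notebook's metric code: the helper names (`f1_score`, `token_accuracy`, `entity_token_prf`) and the toy sentence are hypothetical, and scoring is done per token to stay dependency-free, which may differ from how the logged metrics were computed.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    pairs = [(g, p) for gs, ps in zip(gold, pred) for g, p in zip(gs, ps)]
    return sum(g == p for g, p in pairs) / len(pairs)


def entity_token_prf(gold, pred, outside="O"):
    """Token-level precision, recall and F1 over non-"O" tags (a simplification)."""
    pairs = [(g, p) for gs, ps in zip(gold, pred) for g, p in zip(gs, ps)]
    true_pos = sum(g == p and g != outside for g, p in pairs)
    predicted = sum(p != outside for _, p in pairs)
    actual = sum(g != outside for g, _ in pairs)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical toy sentence: 27 tokens, only 2 of them belong to a product.
gold = [["O"] * 12 + ["B-product", "I-product"] + ["O"] * 13]
# A degenerate model that always predicts the majority tag.
pred = [["O"] * 27]

accuracy = token_accuracy(gold, pred)
prf = entity_token_prf(gold, pred)
print(f"accuracy = {accuracy:.3f}")  # accuracy = 0.926
print(f"precision, recall, f1 = {prf}")  # (0.0, 0.0, 0.0)

# Sanity check against one logged run: precision 0.56994, recall 0.35496.
logged_f1 = f1_score(0.56994, 0.35496)
print(round(logged_f1, 4))  # 0.4375
```

Always predicting "O" already scores about 93% accuracy on this toy sentence while finding no products at all, which is why F1 is the more informative number here.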



## Conclusion

- Recap of this model's objective: Find product mentions amongst posts and comments via named entity recognition (NER). This model can be used by the app to help users with queries such as "Best 4K TV to buy".

- Methodology: A baseline model was developed for initial evaluation. A pre-trained DistilBERT token-classification model from Hugging Face Transformers, which was not trained on product entities, was fine-tuned on the readily available wnut_17 dataset to predict products. Hyperparameter tuning was performed with Weights & Biases (which provides tools to track experiments, version and iterate on datasets, evaluate model performance, reproduce models, and visualize results) to find the best model (floral-sweep-3), which reached 95% accuracy and 48% F1 in evaluation and was also tested on examples. This best model is pushed to the Hugging Face Hub and can be called for inference through the app.




## Improvement

Examine the data and check how many examples carry each tag. The model needs more product data to train on.
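A tag count like the one suggested above could be sketched as follows. The `count_tags` helper and the toy sentences are hypothetical; for wnut_17 loaded with the Hugging Face `datasets` library, the integer `ner_tags` would first need to be mapped to their string names, which is assumed here rather than shown.

```python
from collections import Counter


def count_tags(tag_sequences):
    """Count how often each tag appears across all sentences."""
    return Counter(tag for sentence in tag_sequences for tag in sentence)


# Hypothetical toy sentences, not taken from wnut_17.
toy_tags = [
    ["O", "O", "B-product", "I-product", "O"],
    ["O", "O", "O"],
    ["B-person", "O", "O", "B-product"],
]

counts = count_tags(toy_tags)
total = sum(counts.values())

# Print each tag with its absolute count and its share of all tokens.
for tag, n in counts.most_common():
    print(f"{tag:<10} {n:>3}  {n / total:.1%}")
```

A table like this makes it easy to see how few product tokens the model actually gets to learn from.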

Cross-validation (with a stratified split since classes aren’t balanced) may help mitigate some of the problems that come from doing a train/test split with small datasets.
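One way to stratify a token-classification dataset is at the sentence level, by whether a sentence contains any entity. The sketch below assumes that criterion, which is a simplification and not something the notebook prescribes. The `stratified_folds` helper is hypothetical and does no shuffling, so real data should be shuffled first.

```python
def stratified_folds(tag_sequences, k, outside="O"):
    """Split sentence indices into k folds, balancing sentences with and without entities."""
    with_entity = [
        i for i, tags in enumerate(tag_sequences) if any(t != outside for t in tags)
    ]
    without_entity = [
        i for i, tags in enumerate(tag_sequences) if all(t == outside for t in tags)
    ]
    folds = [[] for _ in range(k)]
    # Deal each group out round-robin so every fold gets a similar share of both.
    for group in (with_entity, without_entity):
        for position, index in enumerate(group):
            folds[position % k].append(index)
    return folds


# Hypothetical toy data: sentences 0, 2 and 4 mention a product.
toy_sentences = [
    ["O", "B-product"],
    ["O", "O"],
    ["B-product", "I-product", "O"],
    ["O"],
    ["O", "O", "B-product"],
    ["O", "O", "O"],
]

folds = stratified_folds(toy_sentences, k=3)
for fold_id, held_out in enumerate(folds):
    train = [i for i in range(len(toy_sentences)) if i not in held_out]
    print(f"fold {fold_id}: validate on {held_out}, train on {train}")
```

Each fold then serves once as the validation set while the remaining folds are used for training, and the metrics are averaged over the folds.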

Class imbalance: The classes are imbalanced, so the model may predict the majority class “O” most of the time. This needs to be addressed with techniques such as oversampling the minority classes, undersampling the majority class, or class weighting.
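Class weighting could start from inverse-frequency weights, sketched below. The tag counts are made up for illustration and the `inverse_frequency_weights` helper is hypothetical. In practice such weights would be passed to a weighted loss (for example the `weight` argument of PyTorch's `CrossEntropyLoss`), which is an assumption about how the training loop would be adapted, not something the notebook does.

```python
def inverse_frequency_weights(counts):
    """Weight each class by total / (num_classes * class_count)."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {tag: total / (num_classes * n) for tag, n in counts.items()}


# Made-up tag counts with a dominant "O" class.
example_counts = {"O": 925, "B-product": 50, "I-product": 25}

weights = inverse_frequency_weights(example_counts)
for tag, weight in weights.items():
    print(f"{tag:<10} {weight:.3f}")
```

With these weights a mistake on a rare product token costs far more than a mistake on an "O" token, which pushes the model away from always predicting the majority class.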

Consider other evaluation metrics, such as per-entity precision, recall, and F1, since accuracy is dominated by the “O” class.

Fine-tuning with more Reddit comments and product entities can help the model learn better representations and improve its performance on this task.

## Acknowledgement:
- https://huggingface.co/datasets/wnut_17
- https://huggingface.co/learn/nlp-course/chapter7/2
- https://towardsdatascience.com/named-entity-recognition-with-deep-learning-bert-the-essential-guide-274c6965e2d
- https://towardsdatascience.com/build-a-named-entity-recognition-app-with-streamlit-f157672f867f
- https://github.com/JINHXu/create-annotated-NER-dataset
- https://wandb.ai/matt24/vit-snacks-sweeps/reports/Hyperparameter-Search-with-W-B-Sweeps-for-Hugging-Face-Transformer-Models--VmlldzoyMTUxNTg0
- https://docs.wandb.ai/guides/integrations/huggingface#2-name-the-project
- https://docs.wandb.ai/guides/sweeps/initialize-sweeps
- https://wandb.ai/amogkam/transformers/reports/Hyperparameter-Optimization-for-Huggingface-Transformers--VmlldzoyMTc2ODI
- https://huggingface.co/docs/transformers/hpo_train
- https://docs.streamlit.io/library/advanced-features/caching
