Clarification about entity recognition helper files (#85)
jwmueller committed Dec 22, 2023
1 parent 4722dce commit 2553464
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions entity_recognition/entity_recognition_training.ipynb
@@ -23,6 +23,8 @@
"source": [
"This notebook demonstrates how to train an NLP model for entity recognition and use it to produce out-of-sample predicted probabilities for each token. These are required inputs to find label issues in token classification datasets with cleanlab. The specific token classification task we consider here is Named Entity Recognition with the [CoNLL-2003 dataset](https://deepai.org/dataset/conll-2003-english), and we train a Transformer network from [HuggingFace's transformers library](https://github.com/huggingface/transformers). This notebook demonstrates how to produce the `pred_probs`; using them to find label issues is demonstrated in cleanlab's [Token Classification Tutorial](https://docs.cleanlab.ai/stable/tutorials/token_classification.html). \n",
"\n",
"Note: running this notebook requires the **.py** files from the **entity_recognition/** parent folder. If running in Colab or locally, make sure you've copied these helper **.py** files to your environment as well. \n",
"\n",
"**Overview of what we'll do in this notebook:** \n",
"- Read and process text datasets with per-token labels in the CoNLL format. \n",
"- Compute out-of-sample predicted probabilities by training a BERT Transformer network via cross-validation. \n",
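For readers unfamiliar with the "out-of-sample predicted probabilities via cross-validation" step named in the notebook overview, here is a minimal sketch of the idea in plain Python. It is an illustrative toy, not the notebook's code: the notebook trains a BERT Transformer, whereas a simple word-frequency counter stands in for the model here. The names `LABELS`, `train`, `predict_proba`, and `out_of_sample_pred_probs` and the three example sentences are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical label set, used only for this sketch.
LABELS = ["O", "PER", "LOC"]


def train(sentences):
    """Toy 'model': count how often each lowercased word carries each label."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for token, label in sentence:
            counts[token.lower()][label] += 1
    return counts


def predict_proba(model, token, smoothing=1.0):
    """Return smoothed label probabilities for one token, ordered as LABELS."""
    c = model.get(token.lower(), Counter())
    total = sum(c.values()) + smoothing * len(LABELS)
    return [(c[label] + smoothing) / total for label in LABELS]


def out_of_sample_pred_probs(sentences, n_folds=3):
    """Predict each sentence with a model that never saw that sentence."""
    pred_probs = [None] * len(sentences)
    for fold in range(n_folds):
        # Hold out every n_folds-th sentence, train on the rest.
        held_out = set(range(fold, len(sentences), n_folds))
        train_sents = [s for i, s in enumerate(sentences) if i not in held_out]
        model = train(train_sents)
        for i in held_out:
            pred_probs[i] = [predict_proba(model, tok) for tok, _ in sentences[i]]
    return pred_probs


# Made-up example data: each sentence is a list of (token, label) pairs.
sentences = [
    [("Paris", "LOC"), ("is", "O"), ("nice", "O")],
    [("Alice", "PER"), ("is", "O"), ("here", "O")],
    [("Bob", "PER"), ("is", "O"), ("late", "O")],
]
pred_probs = out_of_sample_pred_probs(sentences, n_folds=3)
```

Because "Paris" occurs only in the first sentence, the model that scores that sentence has never seen the word, so it receives a uniform distribution. That is what makes the probabilities out-of-sample. The sketch splits folds by sentence so that tokens from one sentence never appear in both the training and held-out sets.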
