docs: implement christopher comments (#1609)
Closes #1601
David Fidalgo committed Jul 8, 2022
1 parent 39c60cc commit 090f8a8
Showing 1 changed file with 16 additions and 10 deletions.
26 changes: 16 additions & 10 deletions docs/tutorials/active_learning_with_small_text.ipynb
@@ -37,10 +37,15 @@
"source": [
"> Active learning is a special case of machine learning in which a learning algorithm can interactively query a user (or some other information source) to label new data points with the desired outputs. [Wikipedia](https://en.wikipedia.org/wiki/Active_learning_(machine_learning))\n",
"\n",
"This tutorial will show you how to incorporate Rubrix into an active learning workflow involving a human in the loop. \n",
"We will build a simple text classifier by combining the active learning framework small-text and Rubrix. \n",
"Hugging Face's transformers will provide the classifier we will embed in an active learner from small-text. \n",
"You and Rubrix will be the information source that teaches the model to become a sample-efficient classifier."
"Supervised machine learning often requires large amounts of labeled data that are expensive to generate. \n",
"*Active Learning* (AL) systems attempt to overcome this labeling bottleneck. \n",
"The underlying idea is that not all data points are equally important for training the model. \n",
"The AL system tries to query only the most relevant data from a pool of unlabeled data to be labeled by a so-called *oracle*, which is often a human annotator.\n",
"Therefore, AL systems are usually much more sample efficient and need far less training data than traditional supervised systems.\n",
"\n",
"This tutorial will show you how to incorporate [Rubrix](https://github.com/recognai/rubrix) into an active learning workflow involving a human in the loop.\n",
"We will build a simple text classifier by combining the active learning framework [small-text](https://github.com/webis-de/small-text) and Rubrix. \n",
"Hugging Face's [transformers](https://github.com/huggingface/transformers) will provide the classifier we will embed in an active learner from small-text. Rubrix excels at making **you** the oracle that conveniently teaches the model via an intuitive UI."
]
},
{
@@ -223,8 +228,8 @@
"Now that we have our datasets ready let's set up the active learner. \n",
"For this, we need two components:\n",
"\n",
" - the classifier;\n",
" - the query strategy;\n",
" - the **classifier** to be trained;\n",
" - the **query strategy** to obtain the most relevant data;\n",
" \n",
"In our case, we choose a [Hugging Face transformer](https://huggingface.co/docs/transformers/index) as the classifier and a [tie-breaker](https://small-text.readthedocs.io/en/v1.0.0/components/query_strategies.html#small_text.query_strategies.strategies.BreakingTies) as the query strategy. \n",
"The latter selects instances of the data pool with a small margin between the two most likely predicted labels. "
@@ -264,8 +269,8 @@
"id": "02f8a4b2-b3ec-4e3e-a299-372b2091ec52",
"metadata": {},
"source": [
"We randomly draw a subset from the data pool as the initialization strategy. \n",
"After obtaining the labels for this batch of instances, the active learner will use them to create the first classifier."
"Since most query strategies, including ours, require a trained model, we randomly draw a subset from the data pool to initialize our AL system. \n",
"After obtaining the labels for this batch of instances, the active learner will use them to create the first classifier.\n"
]
},
{
@@ -506,7 +511,7 @@
"metadata": {},
"source": [
"We should achieve an accuracy of at least **0.8 after around 12 iterations**, corresponding to roughly 260 annotated records. \n",
"The stopping criterium is ultimately up to you, and you can choose more sophisticated criteria like the [KappaAverage](https://small-text.readthedocs.io/en/v1.0.0/components/stopping_criteria.html) implemented in small-text."
"The stopping criterion is ultimately up to you, and you can choose more sophisticated criteria like the [KappaAverage](https://small-text.readthedocs.io/en/v1.0.0/components/stopping_criteria.html) implemented in small-text."
]
},
{
@@ -533,7 +538,8 @@
"metadata": {},
"source": [
"In this tutorial, we saw how you could **embed Rubrix in an active learning loop involving a human in the loop**. \n",
"We relied on **small-text to use a Hugging Face transformer as an active learner** and gathered a **sample-efficient data set by annotating only the most decisive records**.\n",
"We relied on **small-text to use a Hugging Face transformer within an active learning setup**. \n",
"In the end, we gathered **a sample-efficient data set by annotating only the most informative records** for the model.\n",
"\n",
"Rubrix makes it very easy to use a dedicated annotation team or subject matter experts as an oracle for your active learning system. They will only interact with the Rubrix UI and do not have to worry about training or querying the system. We encourage you to try out active learning in your next project and make life a little easier for you and your annotators."
]
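
Note on the query strategy described in the second hunk: the tutorial text says the tie-breaker "selects instances of the data pool with a small margin between the two most likely predicted labels." The idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not small-text's actual implementation: the function name `breaking_ties_query` and the example probabilities are hypothetical and do not come from the notebook.

```python
import numpy as np


def breaking_ties_query(probas: np.ndarray, n: int) -> np.ndarray:
    """Return the indices of the n pool instances with the smallest margin
    between their two most likely predicted labels.

    Hypothetical helper for illustration; `probas` is assumed to have shape
    (num_instances, num_classes) with one probability row per instance.
    """
    # Sort each row in ascending order; the last two columns then hold
    # the second-largest and the largest class probability.
    top2 = np.sort(probas, axis=1)[:, -2:]
    margins = top2[:, 1] - top2[:, 0]
    # A small margin means the classifier is torn between two labels,
    # so those instances are queried first.
    return np.argsort(margins, kind="stable")[:n]


# Made-up predicted probabilities for a pool of four unlabeled instances.
probas = np.array([
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # small margin
    [0.50, 0.30, 0.20],  # medium margin
    [0.34, 0.33, 0.33],  # almost a tie, smallest margin
])

queried = breaking_ties_query(probas, n=2)
print(queried)
```

With these made-up numbers, the two least certain instances (indices 3 and 1) would be sent to the oracle first, while the confidently classified instance at index 0 would be queried last.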