7 changes: 7 additions & 0 deletions README.md
@@ -293,3 +293,10 @@ Rubrix main components are:

## Community
As a new open-source project, we are eager to hear your thoughts, fix bugs, and help you get started. Feel free to use the Discussion forum or the Issues, and we'll be pleased to help out.

## Contributors


<a href="https://github.com/recognai/rubrix/graphs/contributors">
<img src="https://contrib.rocks/image?repo=recognai/rubrix" />
</a>
248 changes: 204 additions & 44 deletions docs/tutorials/02-spacy.ipynb
@@ -20,7 +20,7 @@
"Let's get started!\n",
"\n",
"\n",
"![spacy_ner explore](https://github.com/recognai/rubrix-materials/raw/main/tutorials/1/spacy1.gif)\n",
"<video width=\"100%\" controls><source src=\"02-spacy/spacyner.mp4\" type=\"video/mp4\"></video>\n",
"\n",
"## Introduction\n",
"\n",
@@ -41,9 +41,9 @@
"\n",
"If you are new to Rubrix, visit and ⭐ star the [Github repo](https://github.com/recognai/rubrix) for more materials like this and detailed docs.\n",
"\n",
"If you have not installed and launched Rubrix, check the [Setup and Installation guide](https://docs.rubrix.ml/en/latest/getting_started/setup%26installation.html).\n",
"If you have not installed and launched Rubrix yet, check the [Setup and Installation guide](https://docs.rubrix.ml/en/latest/getting_started/setup%26installation.html).\n",
"\n",
"In this tutorial, we will import Rubrix, use the `datasets` and `spaCy` libraries and the `en_core_web_trf` pretrained English model. This one is a **Roberta-based spaCy model**:"
"For this tutorial, we also need the third-party libraries `datasets` and, of course, `spaCy`, together with `pytorch`. All of them can be installed via pip:"
]
},
{
@@ -52,37 +52,15 @@
"metadata": {},
"outputs": [],
"source": [
"import rubrix as rb\n",
"%pip install datasets spacy~=3.0 protobuf -qqq\n",
"%pip install spacy-transformers -f https://download.pytorch.org/whl/torch_stable.html\n",
"# If the spacy-transformers installation fails, try '%pip install spacy-transformers'"
]
},
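The `spacy~=3.0` pin in the install cell uses pip's "compatible release" operator from PEP 440, which for a two-part version means `>=3.0, ==3.*`: any spaCy 3.x release is accepted, spaCy 4 is not. The sketch below re-implements that check for plain numeric versions only, as an illustration; it ignores pre-releases, epochs and local versions, and real tools such as pip rely on the `packaging` library instead. `parse_release` and `satisfies_compatible` are hypothetical helper names.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn a plain numeric version such as '3.4.1' into (3, 4, 1)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible(version: str, spec: str) -> bool:
    """Check `version` against a compatible-release spec like '~=3.0'.

    '~=X.Y' means '>=X.Y' plus '==X.*': every component of the spec except
    the last must match exactly, and the version must not be older.
    Simplification: no pre-releases, epochs or local versions.
    """
    raw = spec[2:] if spec.startswith("~=") else spec
    base = parse_release(raw)
    candidate = parse_release(version)
    # Pad short versions so that '3' compares like '3.0'
    padded = candidate + (0,) * max(0, len(base) - len(candidate))
    fixed_prefix = base[:-1]
    return padded[: len(fixed_prefix)] == fixed_prefix and padded >= base


for v in ["3.0.0", "3.2.1", "4.0.0", "2.3.5"]:
    print(v, satisfies_compatible(v, "~=3.0"))
```

Under this sketch, 3.0.0 and 3.2.1 satisfy `~=3.0`, while 4.0.0 and 2.3.5 do not.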
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Install tutorial dependencies\n",
"\n",
"In this tutorial, we'll use the `datasets` and `spaCy` libraries and the `en_core_web_trf` pretrained English model, a Roberta-based spaCy model. If you do not have them installed, run:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install torch datasets \"spacy[transformers]~=3.0\" protobuf -qqq "
"%pip install torch -qqq\n",
"%pip install datasets \"spacy[transformers]~=3.0\" protobuf -qqq "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Our dataset\n",
"## Our dataset\n",
"For this tutorial, we're going to use the [*Gutenberg Time*](https://huggingface.co/datasets/gutenberg_time) dataset from the Hugging Face Hub. It contains all explicit time references in a dataset of 52,183 novels whose full text is available via Project Gutenberg. Extracts from novels should give us plenty of named entities to work with."
]
},
@@ -101,7 +79,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's take a look at our dataset! "
"Let's create a small test set and have a look at the data! "
]
},
{
@@ -110,7 +88,197 @@
"metadata": {},
"outputs": [],
"source": [
"train, test = dataset.train_test_split(test_size=0.002, seed=42).values() ; test"
"train, test = dataset.train_test_split(test_size=0.002, seed=42).values()"
]
},
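`train_test_split(test_size=0.002, seed=42)` shuffles the rows reproducibly and keeps a tiny fraction of them as the test set. The stdlib sketch below illustrates that idea; it is not the `datasets` implementation, which uses its own NumPy-based shuffling, so the selected rows will differ. Rounding the test size up with `ceil(test_size * n)` is an assumption made here so that even a very small fraction keeps at least one test row; `seeded_split` is a hypothetical helper name.

```python
import math
import random
from typing import List, Sequence, Tuple


def seeded_split(items: Sequence, test_size: float, seed: int) -> Tuple[List, List]:
    """Return (train, test) after a reproducible shuffle.

    Assumption: the number of test rows is ceil(test_size * n), so even a
    tiny fraction such as 0.002 keeps at least one row for testing.
    """
    n = len(items)
    n_test = math.ceil(test_size * n)
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # same seed -> same permutation
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [items[i] for i in train_idx], [items[i] for i in test_idx]


rows = list(range(10_000))
train, test = seeded_split(rows, test_size=0.002, seed=42)
print(len(train), len(test))  # 9980 20
```

Fixing the seed is what makes the small test set stable across notebook runs, so the records logged to Rubrix stay the same each time.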
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>guten_id</th>\n",
" <th>hour_reference</th>\n",
" <th>time_phrase</th>\n",
" <th>is_ambiguous</th>\n",
" <th>time_pos_start</th>\n",
" <th>time_pos_end</th>\n",
" <th>tok_context</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>6953</td>\n",
" <td>11</td>\n",
" <td>half past eleven</td>\n",
" <td>True</td>\n",
" <td>66</td>\n",
" <td>69</td>\n",
" <td>`` I was just going up to him to speak about m...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>13123</td>\n",
" <td>5</td>\n",
" <td>ten minutes to six</td>\n",
" <td>True</td>\n",
" <td>65</td>\n",
" <td>69</td>\n",
" <td>Presently the great machinery which assisted h...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>9826</td>\n",
" <td>0</td>\n",
" <td>midnight</td>\n",
" <td>True</td>\n",
" <td>93</td>\n",
" <td>94</td>\n",
" <td>The mate of course obeyed , and the evening sh...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>12256</td>\n",
" <td>0</td>\n",
" <td>Midnight</td>\n",
" <td>True</td>\n",
" <td>107</td>\n",
" <td>108</td>\n",
" <td>`` She is , I presume , by now , the Countess ...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>28357</td>\n",
" <td>11</td>\n",
" <td>eleven o’clock</td>\n",
" <td>True</td>\n",
" <td>89</td>\n",
" <td>91</td>\n",
" <td>Three days passed . Will still remained at the...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>237</th>\n",
" <td>10066</td>\n",
" <td>10</td>\n",
" <td>ten\\no'clock</td>\n",
" <td>True</td>\n",
" <td>52</td>\n",
" <td>54</td>\n",
" <td>He had drawn his chair closer : he had taken h...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>238</th>\n",
" <td>10446</td>\n",
" <td>2</td>\n",
" <td>Two o'clock</td>\n",
" <td>True</td>\n",
" <td>50</td>\n",
" <td>52</td>\n",
" <td>He contented himself , therefore , with the ba...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>239</th>\n",
" <td>2488</td>\n",
" <td>12</td>\n",
" <td>noon</td>\n",
" <td>True</td>\n",
" <td>87</td>\n",
" <td>88</td>\n",
" <td>It was on this oceanic river that the Nautilus...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>240</th>\n",
" <td>9155</td>\n",
" <td>10</td>\n",
" <td>ten o'clock</td>\n",
" <td>True</td>\n",
" <td>58</td>\n",
" <td>60</td>\n",
" <td>It was well the men had gone home , she though...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>241</th>\n",
" <td>6487</td>\n",
" <td>4</td>\n",
" <td>4:34</td>\n",
" <td>True</td>\n",
" <td>41</td>\n",
" <td>42</td>\n",
" <td>Only four minutes left ! Four minutes ! But he...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>242 rows × 7 columns</p>\n",
"</div>"
],
"text/plain": [
" guten_id hour_reference time_phrase is_ambiguous time_pos_start \\\n",
"0 6953 11 half past eleven True 66 \n",
"1 13123 5 ten minutes to six True 65 \n",
"2 9826 0 midnight True 93 \n",
"3 12256 0 Midnight True 107 \n",
"4 28357 11 eleven o’clock True 89 \n",
".. ... ... ... ... ... \n",
"237 10066 10 ten\\no'clock True 52 \n",
"238 10446 2 Two o'clock True 50 \n",
"239 2488 12 noon True 87 \n",
"240 9155 10 ten o'clock True 58 \n",
"241 6487 4 4:34 True 41 \n",
"\n",
" time_pos_end tok_context \n",
"0 69 `` I was just going up to him to speak about m... \n",
"1 69 Presently the great machinery which assisted h... \n",
"2 94 The mate of course obeyed , and the evening sh... \n",
"3 108 `` She is , I presume , by now , the Countess ... \n",
"4 91 Three days passed . Will still remained at the... \n",
".. ... ... \n",
"237 54 He had drawn his chair closer : he had taken h... \n",
"238 52 He contented himself , therefore , with the ba... \n",
"239 88 It was on this oceanic river that the Nautilus... \n",
"240 60 It was well the men had gone home , she though... \n",
"241 42 Only four minutes left ! Four minutes ! But he... \n",
"\n",
"[242 rows x 7 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test.to_pandas()"
]
},
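In the table above, `time_pos_start` and `time_pos_end` look like token indices into `tok_context`, which appears to be whitespace-tokenized (punctuation is space-separated, and "half past eleven" spans positions 66 to 69, i.e. three tokens). Assuming the end index is exclusive, the sketch below rebuilds a phrase from those positions and converts them to character offsets, the kind of span an NER record works with. The example sentence and both helper functions are hypothetical.

```python
from typing import List, Tuple


def phrase_from_positions(tok_context: str, start: int, end: int) -> str:
    """Rebuild a phrase from whitespace-token positions (end is exclusive)."""
    tokens: List[str] = tok_context.split(" ")
    return " ".join(tokens[start:end])


def token_span_to_char_span(tok_context: str, start: int, end: int) -> Tuple[int, int]:
    """Convert token positions into character offsets within tok_context."""
    tokens = tok_context.split(" ")
    # Each preceding token contributes its length plus one separating space
    char_start = sum(len(tok) + 1 for tok in tokens[:start])
    char_end = char_start + len(" ".join(tokens[start:end]))
    return char_start, char_end


# Hypothetical sentence written in the same pre-tokenized style as tok_context
context = "The clock struck half past eleven , and he rose ."
print(phrase_from_positions(context, 3, 6))  # half past eleven
start, end = token_span_to_char_span(context, 3, 6)
print(start, end, context[start:end])  # 17 33 half past eleven
```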
{
@@ -121,7 +289,7 @@
"\n",
"### Using a Transformer-based pipeline\n",
"\n",
"Let's install and load our roberta-based pretrained pipeline and apply it to one of our dataset records:"
"Let's download our Roberta-based pretrained pipeline and apply it to one of our dataset records:"
]
},
{
@@ -163,6 +331,7 @@
},
"outputs": [],
"source": [
"import rubrix as rb\n",
"from tqdm.auto import tqdm\n",
"\n",
"records = []\n",
@@ -194,15 +363,6 @@
" )"
]
},
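The record-building cell above is only partly visible in this diff. Assuming it follows the usual pattern of passing `rb.TokenClassificationRecord` the text, its tokens, a `prediction` list of `(label, start_char, end_char)` tuples taken from the spaCy entities, and a `prediction_agent` string, the plain-Python sketch below shows the shape of that prediction data without importing spaCy or Rubrix. `Entity` (a stand-in for a spaCy span) and `to_record_fields` are hypothetical; the example sentence and its offsets are made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Entity:
    """Hypothetical stand-in for a spaCy entity span (label + char offsets)."""
    label: str
    start_char: int
    end_char: int


def to_record_fields(text: str, tokens: List[str], entities: List[Entity], agent: str) -> Dict[str, Any]:
    """Collect the fields a token classification record would receive.

    Predictions are (label, start_char, end_char) tuples; each span is checked
    so that it is non-empty and lies inside the text.
    """
    prediction = [(e.label, e.start_char, e.end_char) for e in entities]
    for label, start, end in prediction:
        if not 0 <= start < end <= len(text):
            raise ValueError(f"invalid span for {label!r}: {start}-{end}")
    return {"text": text, "tokens": tokens, "prediction": prediction, "prediction_agent": agent}


text = "He left at midnight for London ."
fields = to_record_fields(
    text,
    tokens=text.split(" "),
    entities=[Entity("TIME", 11, 19), Entity("GPE", 24, 30)],
    agent="en_core_web_trf",
)
for label, start, end in fields["prediction"]:
    print(label, text[start:end])  # TIME midnight, then GPE London
```

Validating offsets before logging is a cheap way to catch off-by-one errors between token and character positions.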
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"records[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -317,7 +477,7 @@
"To only see predictions of a specific model, you can use the `predicted by` filter, which comes from the `prediction_agent` parameter of your `TokenClassificationRecord`.\n",
"\n",
"\n",
"![spacy_models_meta](img/spacy_ner2.png \"spaCy models predicted_by filter\")\n"
"![spacy_models_meta](02-spacy/spacy_ner2.png \"spaCy models predicted_by filter\")\n"
]
},
{
@@ -397,7 +557,7 @@
"You can easily check every example by using the filters and the search box.\n",
"\n",
"\n",
"![spacy imdb](https://github.com/recognai/rubrix-materials/raw/main/tutorials/1/spacy1.gif)"
"<video width=\"100%\" controls><source src=\"02-spacy/spacy2.mp4\" type=\"video/mp4\"></video>"
]
},
{
@@ -430,7 +590,7 @@
"hash": "b709380ea7d1cb2eb4650c0f11ac7e002ec6a534602815725771481b4784238c"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
@@ -444,7 +604,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
"version": "3.8.10"
},
"metadata": {
"interpreter": {
Binary file added docs/tutorials/02-spacy/spacy2.mp4
Binary file not shown.
Binary file added docs/tutorials/02-spacy/spacy_ner2.png
Binary file added docs/tutorials/02-spacy/spacyner.mp4
Binary file not shown.
Binary file removed docs/tutorials/img/spacy_ner2.png
Binary file not shown.