
Commit

fix: minor issues in tutorials (#435)
fix: minor issues in tutorials
seliverstov committed Sep 7, 2018
2 parents 45139ed + d547b8b commit e91ff6c
Showing 2 changed files with 152 additions and 24 deletions.
2 changes: 1 addition & 1 deletion examples/tutorials/04_deeppavlov_chitchat.ipynb
@@ -269,7 +269,7 @@
"metadata": {},
"outputs": [],
"source": [
"from deeppavlov.models.preprocessors.lazy_tokenizer import LazyTokenizer\n",
"from deeppavlov.models.tokenizers.lazy_tokenizer import LazyTokenizer\n",
"tokenizer = LazyTokenizer()\n",
"tokenizer(['Hello my friend'])"
]
174 changes: 151 additions & 23 deletions examples/tutorials/faq_tutorial_tfidf_logreg.ipynb
@@ -8,7 +8,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -26,10 +25,147 @@
"> It allows the student to find well paid work and to start climbing up on a career ladder right after completing university course. Students of the Russian universities are obliged to attend all lectures as only the knowledge gained during classroom occupations allows students to become the effective and knowing professionals. \n",
"\n",
"\n",
"First of all we need train dataset of FAQ.\n",
"<br>\n",
"As example, let's consider MIPT FAQ for entrants - https://mipt.ru/english/edu/faqs/\n",
"\n"
"In this tutorial we'll describe how to build an FAQ model based on the config deeppavlov/configs/faq/tfidf_logreg_en_faq.json\n",
"<br>First of all, we need a training dataset of FAQs. As an example, let's consider the MIPT FAQ for entrants - https://mipt.ru/english/edu/faqs/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:** Please install all necessary requirements using the command:\n",
"\n",
">\\>\\> python -m deeppavlov install deeppavlov/configs/faq/tfidf_logreg_en_faq.json"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at the FAQ dataset:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Question</th>\n",
" <th>Answer</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>What is preparatory course?</td>\n",
" <td>Preparatory course is a special educational pr...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>What is invitation letter?</td>\n",
" <td>The invitation is official document which is p...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>What is registration?</td>\n",
" <td>Registration grants to the foreign citizen the...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Is it possible to study and work at the same t...</td>\n",
" <td>Russian education is one of the most qualitati...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>How long does the academic year last?</td>\n",
" <td>Academic year proceeds 10 months (from Septemb...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>What documents are demanded for admission?</td>\n",
" <td>Passport, documents of your previous education...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>What is the price for one year of study?</td>\n",
" <td>Russian taught programs cost 250'000 rubles pe...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Should I insure my life?</td>\n",
" <td>Life insurance and health is obligatory for an...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>In what cases student can be deducted from Uni...</td>\n",
" <td>At own will, for health reasons, for the acade...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>I have problems. Who can help me?</td>\n",
" <td>If you have any problems you can address to De...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Question \\\n",
"0 What is preparatory course? \n",
"1 What is invitation letter? \n",
"2 What is registration? \n",
"3 Is it possible to study and work at the same t... \n",
"4 How long does the academic year last? \n",
"5 What documents are demanded for admission? \n",
"6 What is the price for one year of study? \n",
"7 Should I insure my life? \n",
"8 In what cases student can be deducted from Uni... \n",
"9 I have problems. Who can help me? \n",
"\n",
" Answer \n",
"0 Preparatory course is a special educational pr... \n",
"1 The invitation is official document which is p... \n",
"2 Registration grants to the foreign citizen the... \n",
"3 Russian education is one of the most qualitati... \n",
"4 Academic year proceeds 10 months (from Septemb... \n",
"5 Passport, documents of your previous education... \n",
"6 Russian taught programs cost 250'000 rubles pe... \n",
"7 Life insurance and health is obligatory for an... \n",
"8 At own will, for health reasons, for the acade... \n",
"9 If you have any problems you can address to De... "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"FAQ_DATASET_URL = 'http://files.deeppavlov.ai/faq/mipt/faq.csv'\n",
"faq_dataset = pd.read_csv(FAQ_DATASET_URL)\n",
"faq_dataset"
]
},
{
@@ -41,7 +177,7 @@
"import deeppavlov\n",
"from deeppavlov.models.classifiers.logreg_classifier import LogregClassifier\n",
"from deeppavlov.models.vectorizers.tfidf_vectorizer import TfIdfVectorizer\n",
"from deeppavlov.models.tokenizers.ru_tokenizer import RussianTokenizer\n",
"from deeppavlov.models.tokenizers.spacy_tokenizer import StreamSpacyTokenizer\n",
"from deeppavlov.dataset_readers.faq_reader import FaqDatasetReader\n",
"from deeppavlov.core.data.data_learning_iterator import DataLearningIterator\n",
"from deeppavlov.core.data.utils import download_decompress"
@@ -55,7 +191,7 @@
"source": [
"# Read FAQ data\n",
"reader = FaqDatasetReader()\n",
"faq_data = reader.read(data_url='http://files.deeppavlov.ai/faq/mipt/faq.csv', x_col_name='Question', y_col_name='Answer')\n",
"faq_data = reader.read(data_url=FAQ_DATASET_URL, x_col_name='Question', y_col_name='Answer')\n",
"iterator = DataLearningIterator(data=faq_data)\n",
"\n",
"x,y = iterator.get_instances()"
@@ -83,25 +219,17 @@
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2018-08-31 13:48:55.488 INFO in 'pymorphy2.opencorpora_dict.wrapper'['wrapper'] at line 16: Loading dictionaries from /home/andrey/v_envs/deep_pavlov_env/lib/python3.6/site-packages/pymorphy2_dicts/data\n",
"2018-08-31 13:48:55.526 INFO in 'pymorphy2.opencorpora_dict.wrapper'['wrapper'] at line 20: format: 2.4, revision: 393442, updated: 2015-01-17T16:03:56.586168\n"
]
}
],
"outputs": [],
"source": [
"# create tokenizer\n",
"tokenizer = RussianTokenizer(lemmas=True)\n",
"tokenizer = StreamSpacyTokenizer(lemmas=True)\n",
"x_tokenized = tokenizer(x)\n",
"# fit TF-IDF vectorizer on train FAQ dataset \n",
"vectorizer = TfIdfVectorizer(mode='train')\n",
"vectorizer.fit(x)\n",
"vectorizer.fit(x_tokenized)\n",
"\n",
"# Now collect (x,y) pairs: x_train - vectorized question, y_train - answer from FAQ\n",
"x_train = vectorizer(tokenizer(x))\n",
"x_train = vectorizer(x_tokenized)\n",
"y_train = y \n",
"\n",
"# Let's use the top 2 answers for each incoming question (top_n param)\n",
@@ -150,7 +278,7 @@
"['If you have any problems you can address to Department of Foreign Students: +7 (495) 408-70-43 (Auditorium building, room 315).', 'Life insurance and health is obligatory for any foreign citizen who arrived to Russian Federation for study.']\n",
"\n",
"Answers 1:\n",
"['Russian education is one of the most qualitative and fundamental in the world. It allows the student to find well paid work and to start climbing up on a career ladder right after completing university course. Students of the Russian universities are obliged to attend all lectures as only the knowledge gained during classroom occupations allows students to become the effective and knowing professionals. Thus, there is an opportunity to work only after classes or during vacation on the weekend.', 'Life insurance and health is obligatory for any foreign citizen who arrived to Russian Federation for study.']\n",
"['Russian education is one of the most qualitative and fundamental in the world. It allows the student to find well paid work and to start climbing up on a career ladder right after completing university course. Students of the Russian universities are obliged to attend all lectures as only the knowledge gained during classroom occupations allows students to become the effective and knowing professionals. Thus, there is an opportunity to work only after classes or during vacation on the weekend.', \"Russian taught programs cost 250'000 rubles per year, English taught — 400'000 rubles per year.\"]\n",
"\n"
]
}
@@ -176,8 +304,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Scores 0:[0.92, 0.01]\n",
"Scores 1:[0.8, 0.03]\n"
"Scores 0:[0.93, 0.01]\n",
"Scores 1:[0.87, 0.06]\n"
]
}
],
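
The central fix in the FAQ notebook is that the vectorizer is now fitted on the same tokenized questions it later transforms (`x_tokenized`), rather than on the raw strings. The idea can be sketched with the standard library alone. This is not the DeepPavlov API: `tokenize`, `fit_idf`, `vectorize` and `top_n_answers` are hypothetical helpers, the answers are placeholder labels rather than the dataset's text, and cosine similarity stands in for the notebook's logistic-regression classifier.

```python
import math
from collections import Counter


def tokenize(texts):
    """Hypothetical stand-in tokenizer: lowercase, split on whitespace, trim punctuation."""
    tokenized = []
    for text in texts:
        words = (w.strip("?.,!").lower() for w in text.split())
        tokenized.append([w for w in words if w])
    return tokenized


def fit_idf(tokenized_docs):
    """Learn smoothed inverse document frequencies from lists of tokens."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(tok for doc in tokenized_docs for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}


def vectorize(tokens, idf):
    """Build an L2-normalised sparse TF-IDF vector, ignoring tokens unseen during fitting."""
    counts = Counter(tok for tok in tokens if tok in idf)
    vec = {tok: cnt * idf[tok] for tok, cnt in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}


def top_n_answers(question, idf, doc_vecs, answers, top_n=2):
    """Rank stored answers by cosine similarity (an assumption, not logistic regression)."""
    q_vec = vectorize(tokenize([question])[0], idf)
    scored = []
    for idx, d_vec in enumerate(doc_vecs):
        score = sum(w * d_vec.get(tok, 0.0) for tok, w in q_vec.items())
        scored.append((score, idx))
    scored.sort(key=lambda pair: -pair[0])  # stable sort: ties keep dataset order
    return [(answers[idx], round(score, 2)) for score, idx in scored[:top_n]]


# Three questions taken from the FAQ table; answers are placeholder labels.
questions = [
    "Should I insure my life?",
    "I have problems. Who can help me?",
    "What is the price for one year of study?",
]
answers = ["insurance answer", "help answer", "price answer"]

# Tokenize once, then fit and transform on the SAME tokenized input.
x_tokenized = tokenize(questions)
idf = fit_idf(x_tokenized)
x_train = [vectorize(tokens, idf) for tokens in x_tokenized]

best = top_n_answers("Do I need to insure my life?", idf, x_train, answers)
print(best)
```

Reusing `x_tokenized` for both steps keeps the learned vocabulary and the transformed vectors in the same token space, which is the property the commit restores.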