<a href="https://colab.research.google.com/github/AI-Front/ChatBots/blob/master/1-Introduction/Intro_chatbot_tutorial_simple.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Simple chitchat bot in DeepPavlov

This tutorial describes how to build a simple trainable dialogue system with the DeepPavlov framework. It shows one of the easiest ways to create a chatbot: all you need is about a dozen dialogues from your domain, with the bot responses annotated with dialogue acts. The tutorial covers the following steps:

0. [Data preparation](#0.-Data-Preparation)
1. [Train bot](#1.-Train-bot)
2. [Interact with bot](#2.-Interact-with-bot)


An example of the final model served as a Telegram bot:

![gobot_simple_example.png](https://github.com/deepmipt/DeepPavlov/blob/master/examples/img/gobot_simple_example.png?raw=1)

In [2]:
!pip install deeppavlov
!python -m deeppavlov install gobot_dstc2_minimal

Collecting deeppavlov
  Downloading https://files.pythonhosted.org/packages/63/25/7c97c184d13c579ddc4550f58c0c4fc480a6122393008c21c4e4fad6d64d/deeppavlov-0.11.0-py3-none-any.whl (847kB)
Collecting Cython==0.29.14
  Downloading https://files.pythonhosted.org/packages/df/d1/4d3f8a7a920e805488a966cc6ab55c978a712240f584445d703c08b9f405/Cython-0.29.14-cp36-cp36m-manylinux1_x86_64.whl (2.1MB)
Collecting sacremoses==0.0.35
  Downloading https://files.pythonhosted.org/packages/1f/8e/ed5364a06a9ba720fddd9820155cc57300d28f5f43a6fd7b7e817177e642/sacremoses-0.0.35.tar.gz (859kB)
Collecting numpy==1.18.0
  Downloading https://files.pythonhosted.org/packages/92/e6/45f71bd24f4e37629e9db5fb75caab919507deae6a5a257f9e4685a5f931/numpy-1.18.0-cp36-cp36m-manylinux1_x86_64.whl (20.1MB)

2020-07-08 14:39:32.155 INFO in 'deeppavlov.core.common.file'['file'] at line 32: Interpreting 'gobot_dstc2_minimal' as '/usr/local/lib/python3.6/dist-packages/deeppavlov/configs/go_bot/gobot_dstc2_minimal.json'
Collecting tensorflow==1.15.2
  Downloading https://files.pythonhosted.org/packages/9a/d9/fd234c7bf68638423fb8e7f44af7fcfce3bcaf416b51e6d902391e47ec43/tensorflow-1.15.2-cp36-cp36m-manylinux2010_x86_64.whl (110.5MB)
Collecting tensorflow-estimator==1.15.1
  Downloading https://files.pythonhosted.org/packages/de/62/2ee9cd74c9fa2fa450877847ba560b260f5d0fb70ee0595203082dafcc9d/tensorflow_estimator-1.15.1-py2.py3-none-any.whl (503kB)
Collecting gast==0.2.2
  Downloading https://files.pythonhosted.org/packages/4e/35/11749bf99b2d4e3cceb4d55ca22590b0d7c2c62b9de38ac4a4a7f4687421/gast-0.2.2.tar.gz
Collecting tensorboard<1.16.0,>=1.15.0

## 0. Data Preparation

In this tutorial we will build and train a simple chatbot from just 10 dialogues.

Reading data:

In [3]:
from deeppavlov.dataset_readers.dstc2_reader import SimpleDSTC2DatasetReader


class AssistantDatasetReader(SimpleDSTC2DatasetReader):
    
    url = "http://files.deeppavlov.ai/datasets/tutor_assistant_data.tar.gz"
    
    @staticmethod
    def _data_fname(datatype):
        assert datatype in ('val', 'trn', 'tst'), "wrong datatype name"
        return f"assistant-{datatype}.json"

In [4]:
data = AssistantDatasetReader().read('assistant_data')

2020-07-08 14:40:51.729 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 269: [PosixPath('assistant_data/assistant-val.json'), PosixPath('assistant_data/assistant-tst.json')]]
2020-07-08 14:40:51.730 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 270: [downloading data from http://files.deeppavlov.ai/datasets/tutor_assistant_data.tar.gz to assistant_data]
2020-07-08 14:40:51.732 INFO in 'deeppavlov.core.data.utils'['utils'] at line 94: Downloading from http://files.deeppavlov.ai/datasets/tutor_assistant_data.tar.gz to assistant_data/tutor_assistant_data.tar.gz
100%|██████████| 838/838 [00:00<00:00, 274kB/s]
2020-07-08 14:40:52.383 INFO in 'deeppavlov.core.data.utils'['utils'] at line 269: Extracting assistant_data/tutor_assistant_data.tar.gz archive into assistant_data
2020-07-08 14:40:52.389 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 290: [loading dialogs from assistant_data/assistant-trn.json]
2020-07-

The training, validation and test data are stored in JSON files (`assistant-trn.json`, `assistant-val.json` and `assistant-tst.json`):

In [5]:
!ls assistant_data

assistant-templates.txt  assistant-tst.json
assistant-trn.json	 assistant-val.json


Let's take a look at the training data.

In [6]:
!head -n 10 assistant_data/assistant-trn.json

[
  [
    {
      "speaker": 1,
      "text": "hi"
    },
    {
      "speaker": 2,
      "text": "Hello, what is the weather today?",
      "act": "welcome_msg"

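Judging by the excerpt above, the file is a JSON list of dialogues, each dialogue is a list of turns with a `speaker` id and a `text`, and the bot turn (speaker 2) also carries an `act` label. The following is a minimal sketch for inspecting data in this shape. The sample is inlined (it mirrors only the two turns shown above) so the snippet runs on its own, and `summarize_dialogues` is a hypothetical helper, not part of DeepPavlov.

```python
import json
from collections import Counter

# Inline sample mirroring the first two turns printed by `head` above.
SAMPLE = """
[
  [
    {"speaker": 1, "text": "hi"},
    {"speaker": 2, "text": "Hello, what is the weather today?", "act": "welcome_msg"}
  ]
]
"""


def summarize_dialogues(dialogues):
    """Return (number of dialogues, number of turns, Counter of act labels)."""
    n_turns = 0
    acts = Counter()
    for dialogue in dialogues:
        for turn in dialogue:
            n_turns += 1
            # Only annotated turns have an "act" key.
            if "act" in turn:
                acts[turn["act"]] += 1
    return len(dialogues), n_turns, acts


dialogues = json.loads(SAMPLE)
n_dialogues, n_turns, acts = summarize_dialogues(dialogues)
print(n_dialogues, n_turns, dict(acts))  # 1 2 {'welcome_msg': 1}
```

To run it on the real data, load `assistant_data/assistant-trn.json` with `json.load` instead of parsing `SAMPLE`; this assumes the rest of the file follows the same layout as the first ten lines.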

## 1. Train bot

In [8]:
from deeppavlov import configs
from deeppavlov.core.common.file import read_json

gobot_config = read_json(configs.go_bot.gobot_dstc2_minimal)

Download pre-trained GloVe embeddings:

In [9]:
from deeppavlov.download import download_resource

download_resource(url="http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt",
                  dest_paths=['assistant_bot/'])

2020-07-08 14:41:49.590 INFO in 'deeppavlov.core.data.utils'['utils'] at line 94: Downloading from http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt to assistant_bot/glove.6B.100d.txt
347MB [00:23, 14.7MB/s]
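
The downloaded file should be in the plain-text GloVe layout: one token per line followed by its vector components separated by spaces (100 values per line for `glove.6B.100d`). As a rough sketch of what the embedder has to read, the parser below runs on a tiny inline sample so it works without the 347MB download. The 3-dimensional vectors are made up for illustration, and `load_glove_lines` is a hypothetical helper, not a DeepPavlov function.

```python
import io


def load_glove_lines(lines):
    """Parse GloVe-style text lines into a {token: [float, ...]} dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Made-up 3-d vectors for illustration only; the real file has 100 values per line.
sample = io.StringIO("weather 0.1 -0.2 0.3\ncycle 0.0 0.5 -1.0\n")
vectors = load_glove_lines(sample)
print(len(vectors), len(vectors["weather"]))  # 2 3
```

The same function accepts an open file handle, for example `open("assistant_bot/glove.6B.100d.txt", encoding="utf-8")`, assuming the file follows the standard layout.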


Configure the bot to use the downloaded embeddings and the assistant data, then train it:

In [10]:
from deeppavlov import train_model

gobot_config['chainer']['pipe'][-1]['embedder'] = {
    "class_name": "glove",
    "load_path": "assistant_bot/glove.6B.100d.txt"
}
gobot_config['chainer']['pipe'][-1]['nlg_manager']['template_path'] = 'assistant_data/assistant-templates.txt'
gobot_config['chainer']['pipe'][-1]['nlg_manager']['api_call_action'] = None
gobot_config['dataset_reader']['class_name'] = '__main__:AssistantDatasetReader'
gobot_config['metadata']['variables']['DATA_PATH'] = 'assistant_data'
gobot_config['metadata']['variables']['MODEL_PATH'] = 'assistant_bot'

gobot_config['train']['batch_size'] = 4 # set batch size
gobot_config['train']['max_batches'] = 30 # maximum number of training batches
gobot_config['train']['val_every_n_batches'] = 30 # evaluate on full 'valid' split every 30 batches
gobot_config['train']['log_every_n_batches'] = 5 # compute and log training metrics every 5 batches

train_model(gobot_config);

2020-07-08 14:42:14.311 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 290: [loading dialogs from /content/assistant_data/assistant-trn.json]
2020-07-08 14:42:14.313 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 290: [loading dialogs from /content/assistant_data/assistant-val.json]
2020-07-08 14:42:14.315 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 290: [loading dialogs from /content/assistant_data/assistant-tst.json]
2020-07-08 14:42:14.317 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 282: There are 24 samples in train split.
2020-07-08 14:42:14.318 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 283: There are 3 samples in valid split.
2020-07-08 14:42:14.320 INFO in 'deeppavlov.dataset_readers.dstc2_reader'['dstc2_reader'] at line 284: There are 3 samples in test split.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Un






2020-07-08 14:42:18.792 INFO in 'deeppavlov.models.embedders.glove_embedder'['glove_embedder'] at line 52: [loading GloVe embeddings from `/content/assistant_bot/glove.6B.100d.txt`]







2020-07-08 14:43:07.110 INFO in 'deeppavlov.models.go_bot.policy.policy_network'['policy_network'] at line 89: INSIDE PolicyNetwork init(). Initializing PolicyNetwork from scratch.
2020-07-08 14:43:07.181 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 198: Initial best per_item_dialog_accuracy of 0.0


{"valid": {"eval_examples_count": 1, "metrics": {"per_item_dialog_accuracy": 0.0}, "time_spent": "0:00:01", "epochs_done": 0, "batches_seen": 0, "train_examples_seen": 0, "impatience": 0, "patience_limit": 10}}
{"train": {"eval_examples_count": 8, "metrics": {"per_item_dialog_accuracy": 0.5}, "time_spent": "0:00:01", "epochs_done": 2, "batches_seen": 5, "train_examples_seen": 20, "learning_rate": 0.003, "momentum": 0.95, "loss": 1.5552779197692872}}
{"train": {"eval_examples_count": 8, "metrics": {"per_item_dialog_accuracy": 0.9583}, "time_spent": "0:00:01", "epochs_done": 4, "batches_seen": 10, "train_examples_seen": 40, "learning_rate": 0.003, "momentum": 0.95, "loss": 1.0991986751556397}}
{"train": {"eval_examples_count": 8, "metrics": {"per_item_dialog_accuracy": 0.9583}, "time_spent": "0:00:01", "epochs_done": 7, "batches_seen": 15, "train_examples_seen": 60, "learning_rate": 0.003, "momentum": 0.95, "loss": 0.6032580375671387}}
{"train": {"eval_examples_count": 8, "metrics": {"pe

2020-07-08 14:43:08.146 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 206: Improved best per_item_dialog_accuracy of 1.0
2020-07-08 14:43:08.146 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 208: Saving model
2020-07-08 14:43:08.147 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 75: [saving model to /content/assistant_bot/model/policy]


{"train": {"eval_examples_count": 8, "metrics": {"per_item_dialog_accuracy": 1.0}, "time_spent": "0:00:01", "epochs_done": 12, "batches_seen": 25, "train_examples_seen": 100, "learning_rate": 0.003, "momentum": 0.95, "loss": 0.104348523914814}}
{"train": {"eval_examples_count": 8, "metrics": {"per_item_dialog_accuracy": 1.0}, "time_spent": "0:00:02", "epochs_done": 14, "batches_seen": 30, "train_examples_seen": 120, "learning_rate": 0.003, "momentum": 0.95, "loss": 0.05546935647726059}}



2020-07-08 14:43:08.365 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /content/assistant_bot/word.dict]
2020-07-08 14:43:08.367 INFO in 'deeppavlov.models.embedders.glove_embedder'['glove_embedder'] at line 52: [loading GloVe embeddings from `/content/assistant_bot/glove.6B.100d.txt`]


{"valid": {"eval_examples_count": 1, "metrics": {"per_item_dialog_accuracy": 1.0}, "time_spent": "0:00:02", "epochs_done": 14, "batches_seen": 30, "train_examples_seen": 120, "impatience": 0, "patience_limit": 10}}


2020-07-08 14:43:53.927 INFO in 'deeppavlov.models.go_bot.policy.policy_network'['policy_network'] at line 86: INSIDE PolicyNetwork init(). Initializing PolicyNetwork from checkpoint.
2020-07-08 14:43:53.933 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 51: [loading model from /content/assistant_bot/model/policy]


INFO:tensorflow:Restoring parameters from /content/assistant_bot/model/policy


2020-07-08 14:43:54.134 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /content/assistant_bot/word.dict]
2020-07-08 14:43:54.139 INFO in 'deeppavlov.models.embedders.glove_embedder'['glove_embedder'] at line 52: [loading GloVe embeddings from `/content/assistant_bot/glove.6B.100d.txt`]


{"valid": {"eval_examples_count": 1, "metrics": {"per_item_dialog_accuracy": 1.0}, "time_spent": "0:00:01"}}
{"test": {"eval_examples_count": 1, "metrics": {"per_item_dialog_accuracy": 1.0}, "time_spent": "0:00:01"}}


2020-07-08 14:44:39.703 INFO in 'deeppavlov.models.go_bot.policy.policy_network'['policy_network'] at line 86: INSIDE PolicyNetwork init(). Initializing PolicyNetwork from checkpoint.
2020-07-08 14:44:39.709 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 51: [loading model from /content/assistant_bot/model/policy]


INFO:tensorflow:Restoring parameters from /content/assistant_bot/model/policy
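
In the output above, the trainer reports metrics as one JSON object per line, keyed by split name (`train`, `valid`, `test`). If the output is captured as text, those lines can be parsed to follow `per_item_dialog_accuracy` over time. The sketch below assumes report lines look like the ones printed above; `parse_reports` is a hypothetical helper, and the sample lines are copied from the log.

```python
import json

# Two report lines and one ordinary log line, copied from the training output.
LOG = """
2020-07-08 14:43:08.146 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 208: Saving model
{"train": {"eval_examples_count": 8, "metrics": {"per_item_dialog_accuracy": 0.9583}, "time_spent": "0:00:01", "epochs_done": 4, "batches_seen": 10, "train_examples_seen": 40, "learning_rate": 0.003, "momentum": 0.95, "loss": 1.0991986751556397}}
{"valid": {"eval_examples_count": 1, "metrics": {"per_item_dialog_accuracy": 1.0}, "time_spent": "0:00:02", "epochs_done": 14, "batches_seen": 30, "train_examples_seen": 120, "impatience": 0, "patience_limit": 10}}
"""


def parse_reports(text):
    """Return a list of (split, batches_seen, metrics) tuples from JSON report lines."""
    reports = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # ordinary log line, not a metrics report
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or otherwise invalid JSON
        for split, payload in record.items():
            reports.append((split, payload.get("batches_seen"), payload.get("metrics", {})))
    return reports


reports = parse_reports(LOG)
for split, batches_seen, metrics in reports:
    print(split, batches_seen, metrics)
```

Final evaluation reports (such as the last `valid` and `test` lines) do not include `batches_seen`, so the helper returns `None` in that position for them.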


## 2. Interact with bot

In [11]:
from deeppavlov import build_model

bot = build_model(gobot_config)

2020-07-08 14:44:51.525 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 115: [loading vocabulary from /content/assistant_bot/word.dict]
2020-07-08 14:44:51.530 INFO in 'deeppavlov.models.embedders.glove_embedder'['glove_embedder'] at line 52: [loading GloVe embeddings from `/content/assistant_bot/glove.6B.100d.txt`]
2020-07-08 14:45:38.331 INFO in 'deeppavlov.models.go_bot.policy.policy_network'['policy_network'] at line 86: INSIDE PolicyNetwork init(). Initializing PolicyNetwork from checkpoint.
2020-07-08 14:45:38.337 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 51: [loading model from /content/assistant_bot/model/policy]


INFO:tensorflow:Restoring parameters from /content/assistant_bot/model/policy


In [12]:
bot([[{"text": "good evening, bot"}]])

[['Hello, what is the weather today?']]

In [13]:
bot([[{"text": "the weather is clooudy and gloooomy"}]])

[['Then you should cycle!']]

In [14]:
bot([[{"text": "nice idea, thanks!"}]])

[['Hello, what is the weather today?']]

In [15]:
bot.reset()

In [16]:
bot([[{"text": "hi bot"}]])

[['Hello, what is the weather today?']]

In [17]:
bot([[{"text": "looks ok, the sun is bright and yesterday's rain stopped already"}]])

[['Then you should cycle!']]

In [18]:
bot([[{"text": "i dont wanna"}]])

[['Hello, what is the weather today?']]
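
In the calls above the bot takes a batch of dialogues, each a list of turn dicts with a `text` key, and returns responses in the same nesting. It also appears to keep dialogue state between calls until `bot.reset()` is invoked, which is why the session is reset before the second conversation. A small wrapper can hide the nesting. This is only a sketch: `chat` is a hypothetical helper, and `EchoBot` is a stand-in that mimics the call convention so the snippet runs without a trained model.

```python
class EchoBot:
    """Stand-in mimicking the call convention seen above: [[{"text": ...}]] -> [[reply]]."""

    def __call__(self, batch):
        return [[f"you said: {turn['text']}" for turn in dialogue] for dialogue in batch]

    def reset(self):
        pass  # the stand-in keeps no state


def chat(bot, utterances, reset_first=True):
    """Send utterances one at a time and return the list of replies."""
    if reset_first:
        bot.reset()  # start from a fresh dialogue state
    replies = []
    for text in utterances:
        response = bot([[{"text": text}]])
        replies.append(response[0][0])  # unwrap the single reply from the batch
    return replies


replies = chat(EchoBot(), ["hi bot", "looks ok"])
print(replies)  # ['you said: hi bot', 'you said: looks ok']
```

With the trained model, the call would be `chat(bot, ["hi bot", "looks ok"])`.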

You can also train a more advanced goal-oriented bot by following [gobot_extended_tutorial.ipynb](https://github.com/deepmipt/DeepPavlov/blob/master/examples/gobot_extended_tutorial.ipynb).