<a href="https://colab.research.google.com/github/AI-Front/ChatBots/blob/master/4-Question_Answering/ODQA_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ODQA System for English

**O**pen **D**omain **Q**uestion **A**nswering (ODQA) is the task of finding an exact answer to an arbitrary question in Wikipedia articles. Given only a question, the system outputs the best answer it can find. The default ODQA implementation takes a batch of questions as input and returns the best answer for each of them.

An [overview](https://medium.com/deeppavlov/open-domain-question-answering-with-deeppavlov-c665d2ee4d65) of open-domain question answering with DeepPavlov is available on Medium.

Example:
![odqa_example.png](https://github.com/AI-Front/ChatBots/blob/master/4-Question_Answering/odqa.png?raw=1)



There are several approaches to the architecture of an ODQA system. A modular ODQA system consists of two components: the first (the ranker) finds the relevant articles in a database (e.g., Wikipedia), and the second (the reader) extracts an answer from a single article or a small collection of articles retrieved by the ranker. In addition to strictly two-component ODQA systems, there are hybrid systems based on several rankers, where the last ranker in the pipeline is combined with an answer extraction module, usually via reinforcement learning.
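
The ranker/reader split can be illustrated with a deliberately tiny, self-contained sketch. It is not DeepPavlov's implementation: the ranker here scores documents with a hand-rolled TF-IDF cosine similarity (DeepPavlov's `en_ranker_tfidf_wiki` config is also TF-IDF based, but far more elaborate), and the reader is a naive stand-in that returns the sentence of the top document sharing the most words with the question, where a real system would use a neural span extractor. The functions `rank`, `read` and `answer` and the three-document corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def rank(question, docs, top_k=2):
    """Toy ranker: indices of the top_k docs by TF-IDF cosine similarity."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # document frequency of every term, then a smoothed IDF
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

    def vec(tokens):
        # terms unseen in the corpus get weight 0
        return {t: f * idf.get(t, 0.0) for t, f in Counter(tokens).items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(tokenize(question))
    scores = [cosine(q, vec(toks)) for toks in tokenized]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]


def read(question, doc):
    """Toy reader: the sentence of `doc` sharing the most words with the question."""
    q = set(tokenize(question))
    sentences = re.split(r"(?<=[.!?])\s+", doc)
    return max(sentences, key=lambda s: len(q & set(tokenize(s))))


def answer(question, docs):
    """Ranker first, then the reader on the single best document."""
    best = rank(question, docs, top_k=1)[0]
    return read(question, docs[best])


docs = [
    "Luke Skywalker is the son of Darth Vader. He was trained by Obi-Wan Kenobi.",
    "A chatbot is a program that talks to people in natural language.",
    "Paris is the capital of France.",
]
print(answer("Who is the son of Darth Vader?", docs))
# prints: Luke Skywalker is the son of Darth Vader.
```

Keeping the two stages separate is what lets each be swapped independently: the cheap ranker narrows millions of articles down to a few, so the expensive reader only has to look at a handful.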

We are going to use the [DeepPavlov ODQA model](http://docs.deeppavlov.ai/en/master/features/models/squad.html#pretrained-models) based on BERT. With this model you can also create a new QA model for the domain you need (information about your company, your projects, the Star Wars fandom, etc.).

Working with the ODQA model includes:

1. Model installation and interaction with the pretrained model
2. Training the model on your own data
3. Skill example

# Model installation and interaction

In [1]:
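# install DeepPavlov (the outputs in this notebook were produced with version 0.11.0)
!pip install deeppavlov
# install the extra requirements of the en_odqa_infer_wiki config (e.g. TensorFlow)
!python -m deeppavlov install en_odqa_infer_wiki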

In [None]:
# download the pretrained reader (~265 MB) and the Wikipedia data (~4.8 GB), then build the ODQA pipeline
from deeppavlov import configs
from deeppavlov.core.commands.infer import build_model

odqa = build_model(configs.odqa.en_odqa_infer_wiki, download=True)


2020-07-09 17:21:35.569 INFO in 'deeppavlov.core.data.utils'['utils'] at line 94: Downloading from http://files.deeppavlov.ai/deeppavlov_data/multi_squad_model_noans_1.1.tar.gz to /root/.deeppavlov/multi_squad_model_noans_1.1.tar.gz
100%|██████████| 265M/265M [00:51<00:00, 5.15MB/s]
2020-07-09 17:22:27.198 INFO in 'deeppavlov.core.data.utils'['utils'] at line 269: Extracting /root/.deeppavlov/multi_squad_model_noans_1.1.tar.gz archive into /root/.deeppavlov/models
2020-07-09 17:22:32.915 INFO in 'deeppavlov.core.data.utils'['utils'] at line 94: Downloading from http://files.deeppavlov.ai/datasets/wikipedia/enwiki.tar.gz to /root/.deeppavlov/enwiki.tar.gz
100%|██████████| 4.81G/4.81G [12:33<00:00, 6.39MB/s]
2020-07-09 17:35:06.440 INFO in 'deeppavlov.core.data.utils'['utils'] at line 269: Extracting /root/.deeppavlov/enwiki.tar.gz archive into /root/.deeppavlov/downloads
2020-07-09 17:38:44.92 INFO in 'deeppavlov.core.data.utils'['utils'] at line 94: Downloading from http://files.deeppa

In [None]:
# ask the pretrained model a question; the input is a batch (list) of questions
odqa(["What is the name of Darth Vader's son?"])

To use the model, pass only the question: no context document is needed, because the ranker retrieves the relevant Wikipedia articles itself.


In [None]:
question = 'What is a chatbot?'
odqa([question])

[['computing systems vaguely inspired by the biological neural networks that constitute animal brains'],
 [56],
 [695200.625]]
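
Judging from the printed result above, the call returns three parallel lists with one entry per question in the batch: the answer text, an integer (apparently the answer's position in the source article) and a raw confidence score. This reading of the second and third fields is an assumption based on the output, not on the DeepPavlov documentation. A small helper, hypothetically named `unpack_odqa`, pairs each question with its fields; here it is applied to the literal output shown above, so it runs without downloading the model.

```python
from typing import Dict, List, Sequence


def unpack_odqa(questions: Sequence[str], result: Sequence[Sequence]) -> List[Dict]:
    """Pair each question with the answer, position and score returned for it.

    Assumes `result` is [answers, positions, scores], each a list with one
    entry per question (the layout seen in the notebook output).
    """
    answers, positions, scores = result
    if not (len(questions) == len(answers) == len(positions) == len(scores)):
        raise ValueError("result lists must have one entry per question")
    return [
        {"question": q, "answer": a, "position": p, "score": s}
        for q, a, p, s in zip(questions, answers, positions, scores)
    ]


# the output printed above for odqa(['What is a chatbot?'])
result = [['computing systems vaguely inspired by the biological neural networks that constitute animal brains'],
          [56],
          [695200.625]]

for item in unpack_odqa(['What is a chatbot?'], result):
    print(f"{item['question']} -> {item['answer']} (score {item['score']})")
```

With the real model, the same helper would be called as `unpack_odqa(questions, odqa(questions))`.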

# Training the model 

In [None]:
from deeppavlov import configs
from deeppavlov.core.commands.train import train_evaluate_model_from_config

# build the TF-IDF ranker (document retrieval) over the English Wikipedia dump
train_evaluate_model_from_config(configs.doc_retrieval.en_ranker_tfidf_wiki, download=True)
# train the reader, i.e. the answer-extraction model
train_evaluate_model_from_config(configs.squad.multi_squad_noans, download=True)
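
The reader config lives in the `squad` family, so adapting it to your own domain presumably starts with writing question/answer pairs in the SQuAD v1.1 JSON layout: articles contain paragraphs with a `context`, and each paragraph lists `qas` with a `question`, an `id` and `answers` given as `text` plus its character offset `answer_start`. The sketch below, built around the hypothetical helper `make_squad_example` and a made-up FAQ paragraph, produces such a file and computes `answer_start` automatically. How the file is then plugged into a DeepPavlov config (for example, which dataset reader field points at it) is an open assumption and should be checked against the DeepPavlov documentation.

```python
import json
from typing import Dict, List, Tuple


def make_squad_example(title: str, context: str,
                       qa_pairs: List[Tuple[str, str]], id_prefix: str = "q") -> Dict:
    """Build one SQuAD v1.1 article with a single paragraph.

    Each answer must occur verbatim in `context`; its character offset
    is stored as `answer_start`.
    """
    qas = []
    for i, (question, answer) in enumerate(qa_pairs):
        start = context.find(answer)
        if start == -1:
            raise ValueError(f"answer {answer!r} not found in context")
        qas.append({
            "id": f"{id_prefix}{i}",
            "question": question,
            "answers": [{"text": answer, "answer_start": start}],
        })
    return {"title": title, "paragraphs": [{"context": context, "qas": qas}]}


# hypothetical domain data: a company FAQ paragraph
context = ("Our support team works from 9 am to 6 pm on weekdays. "
           "Refunds are processed within 14 days.")
article = make_squad_example(
    "Company FAQ", context,
    [("When does the support team work?", "from 9 am to 6 pm on weekdays"),
     ("How long do refunds take?", "within 14 days")],
)
dataset = {"version": "1.1", "data": [article]}

with open("my_squad_train.json", "w", encoding="utf-8") as f:
    json.dump(dataset, f, ensure_ascii=False, indent=2)
```

Computing `answer_start` with `str.find` rather than by hand avoids off-by-one offsets, which silently corrupt span-extraction training data; note that it always picks the first occurrence of the answer in the context.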


# Skill example

To decide whether the model should be activated, we can:
1. try to guess whether the message contains a question;
2. use a classifier to decide whether the question is informative;
3. pass an informative question straight to the model.
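
The three steps can be sketched as a small routing function. Everything here is an assumption for illustration: step 1 is a simple heuristic (a trailing question mark or a leading wh-word/auxiliary verb), step 2 is a placeholder rule standing in for a trained classifier (it only rejects a few small-talk questions), and step 3 calls whatever QA callable is passed in, such as the `odqa` model built above, which takes a batch of questions and returns the answers as its first output. The names `looks_like_question`, `is_informative`, `route` and `fake_odqa` are hypothetical.

```python
import re
from typing import Callable, List, Optional

QUESTION_WORDS = ("what", "who", "whom", "whose", "which", "when", "where",
                  "why", "how", "is", "are", "was", "were", "do", "does",
                  "did", "can", "could")

# hypothetical small-talk patterns; a real skill would use a trained classifier
SMALL_TALK = re.compile(r"^(how are you|what's up|who are you|what is your name)\b", re.I)


def looks_like_question(message: str) -> bool:
    """Step 1: cheap heuristic check for a question."""
    text = message.strip().lower()
    if text.endswith("?"):
        return True
    words = text.split()
    return bool(words) and words[0] in QUESTION_WORDS


def is_informative(message: str) -> bool:
    """Step 2: placeholder for an informative-vs-chit-chat classifier."""
    return not SMALL_TALK.match(message.strip())


def route(message: str, qa_model: Callable[[List[str]], list]) -> Optional[str]:
    """Step 3: send informative questions to the QA model, otherwise decline."""
    if looks_like_question(message) and is_informative(message):
        answers = qa_model([message])[0]  # first output: one answer per question
        return answers[0]
    return None  # let another skill handle the message


def fake_odqa(questions):
    """Stand-in with the same output layout as the notebook's odqa model."""
    return [["Luke Skywalker" for _ in questions], [0 for _ in questions], [1.0 for _ in questions]]


print(route("What is the name of Darth Vader's son?", fake_odqa))  # Luke Skywalker
print(route("How are you?", fake_odqa))                            # None
print(route("I like pizza", fake_odqa))                            # None
```

In a deployed skill, `fake_odqa` would simply be replaced by the real `odqa` object, and returning `None` gives the dialogue manager a clear signal to try another skill.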