<a href="https://colab.research.google.com/github/AI-Front/NTI/blob/main/ODQA_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ODQA System for English

**O**pen **D**omain **Q**uestion **A**nswering (ODQA) is the task of finding an exact answer to an arbitrary question in a large collection of documents, here Wikipedia articles. Given only a question, the system outputs the best answer it can find. The default ODQA implementation takes a batch of queries as input and returns the best answer for each query.

For background, see the [overview](https://medium.com/deeppavlov/open-domain-question-answering-with-deeppavlov-c665d2ee4d65).

Example:
![odqa_example.png](https://github.com/AI-Front/ChatBots/blob/master/4-Question_Answering/odqa.png?raw=1)



There are several approaches to the architecture of an ODQA system. A modular ODQA system consists of two components. The first one (the ranker) finds the relevant articles in a database (e.g., Wikipedia). The second one (the reader) extracts an answer from a single article or a small collection of articles retrieved by the ranker. In addition to strictly two-component ODQA systems, there are hybrid systems built from several rankers, where the last ranker in the pipeline is combined with an answer extraction module, usually via reinforcement learning.
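
The ranker/reader split can be illustrated with a toy sketch. This is not the DeepPavlov implementation: the function names (`rank`, `read`, `answer`), the word-overlap scoring and the sentence-level "reader" are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(question, documents, top_k=1):
    """Toy ranker: score documents by IDF-weighted word overlap with the question."""
    doc_counts = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents each token occurs
    df = Counter()
    for counts in doc_counts:
        df.update(counts.keys())
    q_tokens = set(tokenize(question))
    scored = []
    for idx, counts in enumerate(doc_counts):
        score = sum(
            counts[t] * math.log((1 + n_docs) / (1 + df[t]))
            for t in q_tokens
            if t in counts
        )
        scored.append((score, idx))
    # Highest score first; ties keep the original document order
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [documents[idx] for _, idx in scored[:top_k]]


def read(question, passages):
    """Toy reader: return the sentence that shares the most words with the question."""
    q_tokens = set(tokenize(question))
    best, best_overlap = "", -1
    for passage in passages:
        for sentence in re.split(r"(?<=[.!?])\s+", passage):
            overlap = len(q_tokens & set(tokenize(sentence)))
            if overlap > best_overlap:
                best, best_overlap = sentence, overlap
    return best


def answer(question, documents):
    """Two-component pipeline: the ranker narrows the search, the reader extracts."""
    return read(question, rank(question, documents))


docs = [
    "Paris is the capital of France. It lies on the Seine.",
    "Berlin is the capital of Germany. It lies on the Spree.",
]
print(answer("What is the capital of France?", docs))
```

A real reader returns an answer span rather than a whole sentence; the sketch only shows how the two stages hand data to each other.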

We are going to use the [DeepPavlov ODQA model](http://docs.deeppavlov.ai/en/master/features/models/squad.html#pretrained-models) based on BERT. With this model you can also build a new QA model for the domain you need (information about your company, your projects, the Star Wars fandom, etc.).

Working with the ODQA model involves:

0. Model installation and interaction with the pretrained model
0. Training the model on your own data
0. Skill example 

# Model installation and interaction

In [1]:
# Colab magic: only the major version (1.x or 2.x) can be selected
%tensorflow_version 1.x

TensorFlow 1.x selected.


In [2]:
!pip install uvloop==0.14 
!pip install deeppavlov
!python -m deeppavlov install en_odqa_infer_wiki

Collecting uvloop==0.14
  Downloading https://files.pythonhosted.org/packages/9b/7a/54a80c03b555af21680a2f3692947b43a0d576d90c4c18cace0fee1ccc0e/uvloop-0.14.0-cp37-cp37m-manylinux2010_x86_64.whl (3.8MB)
Installing collected packages: uvloop
Successfully installed uvloop-0.14.0
Collecting deeppavlov
  Downloading https://files.pythonhosted.org/packages/22/87/e77ccc7de09f8c5c4a3d981ff6b1d3811d9978976a30bec9bdf50d667ebb/deeppavlov-0.15.0-py3-none-any.whl (907kB)
Collecting fastapi==0.47.1
  Downloading https://files.pythonhosted.org/packages/a3/a7/4804d7abf8a1544d079d50650af872387154ebdac5bd07d54b2e60e2b334/fastapi-0.47.1-py3-none-any.whl (43kB)
Collecting h5py==2.10.0
  Downloading https://files.pythonhosted.org/packages/3f/c0/abde58b837e066bca19a3f7332d9d0493521d7dd6b482484

2021-06-08 12:50:42.221 INFO in 'deeppavlov.core.common.file'['file'] at line 32: Interpreting 'en_odqa_infer_wiki' as '/usr/local/lib/python3.7/dist-packages/deeppavlov/configs/odqa/en_odqa_infer_wiki.json'
Collecting gast==0.2.2
  Downloading https://files.pythonhosted.org/packages/4e/35/11749bf99b2d4e3cceb4d55ca22590b0d7c2c62b9de38ac4a4a7f4687421/gast-0.2.2.tar.gz
Building wheels for collected packages: gast
  Building wheel for gast (setup.py) ... done
  Created wheel for gast: filename=gast-0.2.2-cp37-none-any.whl size=7557 sha256=999dc46df49ca04a9fd4b5c8b6fa8ff239cec7b01223c6099c4b96ecb16d81aa
  Stored in directory: /root/.cache/pip/wheels/5c/2e/7e/a1d4d4fcebe6c381f378ce7743a3ced3699feb89bcfbdadadd
Successfully built gast
ERROR: kapre 0.3.5 has requirement numpy>=1.18.5, but you'll have numpy 1.18.0 which is incompatible.
ERROR: kapre 0.3.5 has requirement tensorflow>=2.0.0, but you'll have tensorflow 1.15.2 which is incompatible.
Installing collecte

In [None]:
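# Build the pretrained ODQA pipeline; download=True fetches the model files
# and the Wikipedia data (the enwiki archive alone is about 4.9 GB)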
from deeppavlov import configs
from deeppavlov.core.commands.infer import build_model

odqa = build_model(configs.odqa.en_odqa_infer_wiki, download=True)


2021-06-08 12:51:00.383 INFO in 'deeppavlov.core.data.utils'['utils'] at line 95: Downloading from http://files.deeppavlov.ai/datasets/wikipedia/enwiki.tar.gz to /root/.deeppavlov/enwiki.tar.gz
 33%|███▎      | 1.63G/4.88G [13:02<29:56, 1.81MB/s]

In [None]:
# using the pretrained model
odqa(['What is the name of Darth Vader\'s son?'])

To use the model, pass only the question, wrapped in a list because the model takes a batch of queries.


In [None]:
question = 'What is a chatbot?'
odqa([question])
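
Because the input is a batch, several questions can be sent in one call. The helper below is a minimal sketch of that contract; `answer_questions` and `fake_model` are hypothetical names, and the sketch assumes the model returns exactly one answer per question, in order.

```python
def answer_questions(model, questions):
    """Send all questions as one batch and pair each question with its answer."""
    questions = list(questions)
    answers = model(questions)
    if len(answers) != len(questions):
        raise ValueError("expected one answer per question")
    return dict(zip(questions, answers))


# Stand-in for the real model so the sketch runs without DeepPavlov
def fake_model(batch):
    return ["answer to: " + q for q in batch]


print(answer_questions(fake_model, ["What is a chatbot?"]))
```

With the model built above, the call would be `answer_questions(odqa, [...])`, under the same one-answer-per-question assumption.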

# Download the Data

The training and test files come from the AIIJC task page: https://aiijc.com/ru/task/1067/

In [None]:
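# Training set; -O gives the download an explicit name (train.csv is our choice)
!wget -O train.csv https://aiijc.com/api_v2/task/1067/56

In [None]:
# Test set
!wget -O test.csv https://aiijc.com/api_v2/task/1067/58

In [None]:
import pandas as pd
from tqdm import tqdm

In [None]:
# Assumption: the training file is a comma-separated CSV, like the test file
train = pd.read_csv('train.csv')
train.head()

In [None]:
test = pd.read_csv('test.csv', sep=',')
test.head()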

In [None]:
answers = []
for question in tqdm(test['question']):
  # The model takes a batch (list) of questions and returns a list of answers;
  # keep the single answer for this question
  answer = odqa([question])
  answers.append(answer[0])
test['answer'] = answers
test = test.drop(columns='question')
test.to_csv('my_submission.csv')
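
Calling the model once per question ignores its batch interface. The sketch below groups the questions into chunks instead; `answer_in_batches`, `batch_size` and `fake_model` are hypothetical names, and the sketch assumes the model returns one answer per question in input order.

```python
def answer_in_batches(model, questions, batch_size=8):
    """Run `model` over `questions` in chunks and return one answer per question."""
    questions = list(questions)
    answers = []
    for start in range(0, len(questions), batch_size):
        batch = questions[start:start + batch_size]
        answers.extend(model(batch))
    return answers


# Stand-in for the real model so the sketch runs without DeepPavlov
def fake_model(batch):
    return [q.upper() for q in batch]


print(answer_in_batches(fake_model, ["a", "b", "c"], batch_size=2))
```

For the submission this would look like `test['answer'] = answer_in_batches(odqa, test['question'])`. Whether larger batches are actually faster depends on the model and the available memory.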

# Training the model 

In [None]:
from deeppavlov import configs
from deeppavlov.core.commands.train import train_evaluate_model_from_config

# Fit the TF-IDF ranker first, then train the reader that extracts the answers
train_evaluate_model_from_config(configs.doc_retrieval.en_ranker_tfidf_wiki, download=True)
train_evaluate_model_from_config(configs.squad.multi_squad_noans, download=True)
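
To train on your own documents, the configuration has to point at your data instead of Wikipedia. The sketch below shows one way to do that on a config loaded as a plain dictionary. The helper name `with_own_data` is hypothetical, and the keys `dataset_reader`, `data_path` and `dataset_format` are assumptions about the config layout that should be checked against the actual JSON file.

```python
import copy


def with_own_data(config, data_path, dataset_format="txt"):
    """Return a copy of a config dict whose dataset reader points at `data_path`."""
    new_config = copy.deepcopy(config)
    # Assumed layout: the reader settings live under the "dataset_reader" key
    reader = new_config.setdefault("dataset_reader", {})
    reader["data_path"] = data_path
    reader["dataset_format"] = dataset_format
    return new_config


# Minimal stand-in for a loaded config; the values here are illustrative only
example_config = {"dataset_reader": {"data_path": "wiki", "dataset_format": "wiki"}}
own_config = with_own_data(example_config, "/content/my_docs")
print(own_config["dataset_reader"])
```

The original dictionary is left untouched, so the default Wikipedia setup stays available. The modified dictionary would then be passed to the training function in place of the config path, assuming the installed DeepPavlov version accepts a dictionary there.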
