# Query Classifier Tutorial
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial14_Query_Classifier.ipynb)

In this tutorial we introduce the query classifier. The goal of this feature is to optimize the overall flow of a Haystack pipeline by detecting the nature of user queries. Haystack can now detect three main types of queries, using either a lightweight SKLearn gradient-boosted classifier or a more robust Transformer-based classifier. The three categories of queries are as follows:


### 1. Keyword Queries: 
Such queries don't have semantic meaning and merely consist of keywords. For instance, these three are examples of keyword queries:

*   arya stark father
*   jon snow country
*   arya stark younger brothers

### 2. Interrogative Queries: 
In such queries users usually ask a question. Regardless of whether a "?" is present in the query, the goal here is to detect the user's intent: whether or not a question is being asked. For example:

*   who is the father of arya stark ?
*   which country was jon snow filmed ?
*   who are the younger brothers of arya stark ?

### 3. Declarative Queries: 
Such queries are a variation of keyword queries; however, there is a semantic relationship between the words. For example:

*   Arya stark was a daughter of a lord.
*   Jon snow was filmed in a country in UK.
*   Bran was brother of a princess.

In this tutorial, you will learn how the `TransformersQueryClassifier` and `SklearnQueryClassifier` classes can be used to intelligently route your queries based on the nature of the user query. You can choose between a lightweight gradient-boosted classifier and a transformer-based classifier.

Furthermore, there are two types of classifiers you can use out of the box from Haystack.
1. Keyword vs Statement/Question Query Classifier
2. Statement vs Question Query Classifier

As the names suggest, the first classifier distinguishes keyword search queries from semantic statements such as sentences and questions. The second classifier differentiates between question-based queries and declarative sentences.
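The three query categories and the two binary classifiers described above can be related with a small sketch. The word lists and rules below are assumptions made purely for illustration; the classifiers shipped with Haystack are trained models, not hand-written rules, and the function names here are hypothetical.

```python
# Toy stand-ins for the two out-of-the-box classifiers. The word lists are
# illustrative assumptions; the real classifiers are trained models.
QUESTION_WORDS = {"who", "what", "which", "where", "when", "why", "how"}
FUNCTION_WORDS = QUESTION_WORDS | {
    "is", "are", "was", "were", "the", "a", "an", "of", "in",
}


def is_keyword_query(query: str) -> bool:
    """Toy version of classifier 1: keyword query vs statement/question."""
    tokens = query.lower().replace("?", " ").replace(".", " ").split()
    # A query without any function words looks like a bare list of keywords.
    return not any(token in FUNCTION_WORDS for token in tokens)


def is_question(query: str) -> bool:
    """Toy version of classifier 2: question vs declarative statement."""
    tokens = query.lower().replace("?", " ").split()
    starts_with_question_word = bool(tokens) and tokens[0] in QUESTION_WORDS
    return query.strip().endswith("?") or starts_with_question_word


def query_type(query: str) -> str:
    """Chain both binary decisions to recover the three categories."""
    if is_keyword_query(query):
        return "keyword"
    return "question" if is_question(query) else "statement"


for example in [
    "arya stark father",
    "who is the father of arya stark ?",
    "Arya stark was a daughter of a lord.",
]:
    print(f"{example!r} -> {query_type(example)}")
```

Chaining the two binary decisions is only one way to look at the categories; in the pipelines in this tutorial a single classifier node is used at a time to choose between two branches.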

### Prepare environment

#### Colab: Enable the GPU runtime
Make sure you enable the GPU runtime to experience decent speed in this tutorial.  
**Runtime -> Change Runtime type -> Hardware accelerator -> GPU**

<img src="https://raw.githubusercontent.com/deepset-ai/haystack/master/docs/_src/img/colab_gpu_runtime.jpg">

These lines install Haystack through pip.

In [None]:
# Install the latest release of Haystack in your own environment
#! pip install farm-haystack

# Install the latest master of Haystack
!pip install grpcio-tools==1.34.1
!pip install --upgrade git+https://github.com/deepset-ai/haystack.git

# Install  pygraphviz
!apt install libgraphviz-dev
!pip install pygraphviz

# If you run this notebook on Google Colab, you might need to
# restart the runtime after installing haystack.

# In Colab / No Docker environments: Start Elasticsearch from source
! wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q
! tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz
! chown -R daemon:daemon elasticsearch-7.9.2

import os
from subprocess import Popen, PIPE, STDOUT
es_server = Popen(['elasticsearch-7.9.2/bin/elasticsearch'],
                   stdout=PIPE, stderr=STDOUT,
                   preexec_fn=lambda: os.setuid(1)  # as daemon
                  )
# wait until ES has started
! sleep 30


If you are running from Colab or an environment without Docker, you will want to start Elasticsearch from source.

## Initialization

Let's fetch some data (in this case, pages from the Game of Thrones wiki) and prepare it so that it can
be indexed into our `DocumentStore`.

In [1]:
from haystack.utils import print_answers, fetch_archive_from_http, convert_files_to_dicts, clean_wiki_text, launch_es
from haystack.pipelines import Pipeline, RootNode
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import ElasticsearchRetriever, DensePassageRetriever, FARMReader, TransformersQueryClassifier, SklearnQueryClassifier

#Download and prepare data - 517 Wikipedia articles for Game of Thrones
doc_dir = "data/article_txt_got"
s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt.zip"
fetch_archive_from_http(url=s3_url, output_dir=doc_dir)

# convert files to dicts containing documents that can be indexed to our datastore
got_dicts = convert_files_to_dicts(
    dir_path=doc_dir,
    clean_func=clean_wiki_text,
    split_paragraphs=True
)

# Initialize DocumentStore and index documents
launch_es()
document_store = ElasticsearchDocumentStore()
document_store.delete_documents()
document_store.write_documents(got_dicts)

# Initialize Sparse retriever
es_retriever = ElasticsearchRetriever(document_store=document_store)

# Initialize dense retriever
dpr_retriever = DensePassageRetriever(document_store)
document_store.update_embeddings(dpr_retriever, update_existing_embeddings=False)

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")

docker: Error response from daemon: driver failed programming external connectivity on endpoint determined_mclean (a60f4d3a46d61bc218b18767b664632c8eb5fa620a44b06cd80e1f5f9b72d895): Bind for 0.0.0.0:9200 failed: port is already allocated.
Tried to start Elasticsearch through Docker but this failed. It is likely that there is already an existing Elasticsearch instance running. 


6accfdef2d0e9b4c472e78beff6f970654e3280b3ae706fed035e990add5fdd9


Updating embeddings:   0%|          | 0/2357 [00:00<?, ? Docs/s]

Create embeddings:   0%|          | 0/2368 [00:00<?, ? Docs/s]

Some weights of the model checkpoint at deepset/roberta-base-squad2 were not used when initializing RobertaModel: ['qa_outputs.bias', 'qa_outputs.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at deepset/roberta-base-squad2 and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Failed to log params: Changing param values is not allowed. Param with key='pred

## Keyword vs Question/Statement Classifier

The keyword vs question/statement query classifier distinguishes between keyword queries and statements/questions, so you can intelligently route queries to different retrieval nodes based on their nature. Using this classifier can potentially yield the following benefits:

*  Better search results (e.g. by routing only proper questions to the DPR / QA branches and not keyword queries)
*  Lower GPU costs (e.g. if 50% of your traffic consists of keyword queries, you could use Elasticsearch for those and save the GPU resources for the other 50% of traffic with semantic queries)

![image](https://user-images.githubusercontent.com/6007894/127831511-f55bad86-4b4f-4b54-9889-7bba37e475c6.png)
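To make the cost argument above concrete, here is a minimal sketch of how a traffic split translates into load on the dense branch. The edge names `output_1` and `output_2` follow the pipelines in this tutorial; the label names, the hand-labelled sample traffic, and the helper functions are assumptions for illustration and are not part of Haystack.

```python
from collections import Counter

# Edge names follow the pipelines in this tutorial: "output_1" is wired to the
# dense retriever and "output_2" to the sparse retriever.
EDGE_TO_BRANCH = {"output_1": "dense", "output_2": "sparse"}

# Hypothetical label names for illustration only.
LABEL_TO_EDGE = {"question_or_statement": "output_1", "keyword": "output_2"}


def branch_counts(labels):
    """Count how many queries end up on each retriever branch."""
    return Counter(EDGE_TO_BRANCH[LABEL_TO_EDGE[label]] for label in labels)


def dense_share(labels):
    """Fraction of queries that still go through the dense retriever."""
    return branch_counts(labels)["dense"] / len(labels)


# Hand-labelled sample traffic (assumed labels, not classifier output).
sample_traffic = [
    ("arya stark father", "keyword"),
    ("jon snow country", "keyword"),
    ("who is the father of arya stark ?", "question_or_statement"),
    ("which country was jon snow filmed ?", "question_or_statement"),
]
labels = [label for _, label in sample_traffic]

print(branch_counts(labels))
print(f"share of queries on the dense branch: {dense_share(labels):.0%}")
```

Note that in the pipelines built in this tutorial the reader node receives input from both branches, so any saving of this kind applies to the retrieval step only.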


Below, we define a `SklearnQueryClassifier` and show how to use it:

Read more about the trained model and dataset used [here](https://ext-models-haystack.s3.eu-central-1.amazonaws.com/gradboost_query_classifier/readme.txt)

In [3]:
# Here we build the pipeline
sklearn_keyword_classifier = Pipeline()
sklearn_keyword_classifier.add_node(component=SklearnQueryClassifier(), name="QueryClassifier", inputs=["Query"])
sklearn_keyword_classifier.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["QueryClassifier.output_1"])
sklearn_keyword_classifier.add_node(component=es_retriever, name="ESRetriever", inputs=["QueryClassifier.output_2"])
sklearn_keyword_classifier.add_node(component=reader, name="QAReader", inputs=["ESRetriever", "DPRRetriever"])
sklearn_keyword_classifier.draw("pipeline_classifier.png")


In [5]:

# Run only the dense retriever on the full sentence query
res_1 = sklearn_keyword_classifier.run(
    query="Who is the father of Arya Stark?"
)
print("DPR Results" + "\n" + "="*15)
print_answers(res_1, details="minimal")

# Run only the sparse retriever on a keyword based query
res_2 = sklearn_keyword_classifier.run(
    query="arya stark father"
)
print("ES Results" + "\n" + "="*15)
print_answers(res_2, details="minimal")


Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.39 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.42s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.02 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.96 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.03 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.03 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:03<00:00,  3.27s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.05 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.42s/ Batches]


DPR Results
[   {   'answer': 'Eddard and Catelyn Stark',
        'context': 'Background ===\n'
                   'Arya is the third child and younger daughter of Eddard and '
                   'Catelyn Stark and is nine years old at the beginning of '
                   'the book series.  Sh'},
    {   'answer': 'Rhaegar',
        'context': ', Aemon Targaryen, Jorah Mormont, Meera Reed, Jon '
                   'Connington and Gilly.\n'
                   'Rhaegar married the Dornish princess Elia Martell of '
                   'Sunspear, and fathered wi'},
    {   'answer': 'Eddard Stark',
        'context': 'e from House Tully in the Riverlands region prior to her '
                   'marriage to Eddard Stark. She has her hair dyed dark brown '
                   'later on while in the Vale, disgui'},
    {   'answer': 'Eddard Stark and Catelyn Stark',
        'context': 'ces==\n'
                   'Sansa Stark is the second child and elder daughter of '
                   'Ed

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.84 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.45s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.02 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.98 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.88 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.04 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.05 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.39s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:02<00:00,  2.72s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.06 Batches/s]

ES Results
[   {   'answer': 'Ned',
        'context': '\n'
                   '====Season 1====\n'
                   'Arya accompanies her father Ned and her sister Sansa to '
                   "King's Landing. Before their departure, Arya's "
                   'half-brother Jon Snow gifts A'},
    {   'answer': 'Tywin',
        'context': 'Stark marrying two of his children.\n'
                   'Tyrion Lannister suspects his father Tywin, who decides '
                   'Tyrion and his barbarians will fight in the vanguard, '
                   'want'},
    {   'answer': 'Eddard',
        'context': 's Nymeria after a legendary warrior queen. She travels '
                   "with her father, Eddard, to King's Landing when he is made "
                   'Hand of the King. Before she leaves,'},
    {   'answer': 'Balon',
        'context': "sgusted, Robb acquiesces to Theon's further captivity, as "
                   "Theon's father Balon has recently died and Theon's absence 




In [6]:

# Run only the dense retriever on the full sentence query
res_3 = sklearn_keyword_classifier.run(
    query="which country was jon snow filmed ?"
)
print("DPR Results" + "\n" + "="*15)
print_answers(res_3, details="minimal")

# Run only the sparse retriever on a keyword based query
res_4 = sklearn_keyword_classifier.run(
    query="jon snow country"
)
print("ES Results" + "\n" + "="*15)
print_answers(res_4, details="minimal")

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.02 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.03 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.05 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.00 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.04 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:05<00:00,  5.86s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.02 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.43s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.40s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]


DPR Results
[   {   'answer': 'Northern Ireland',
        'context': 'ector, the scene was filmed on privately owned land in '
                   'Saintfield, Northern Ireland, and they had only 12 days to '
                   'shoot. After reading the script Sapo'},
    {   'answer': 'Iceland',
        'context': 's funeral.\n'
                   'The storylines led by Jon Snow and Daenerys Targaryen '
                   'continued to be filmed in Iceland and in the Moroccan city '
                   'of Essaouira respectively.'},
    {   'answer': 'Spain',
        'context': 'f the scenes that take place in the principality of Dorne '
                   'were filmed in Spain, beginning in October 2014. Locations '
                   'explored for the production inclu'},
    {   'answer': 'Canada',
        'context': ' in October 2015.\n'
                   'Only a very small portion of the season was filmed in '
                   'Canada (north of Calgary, Alberta): the scenes featu

Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.13s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.00 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.02 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.00 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.02 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.00 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.00 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.00 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.98 Batches/s]

ES Results
[   {   'answer': 'forests around Toome, in County Antrim, Northern Ireland',
        'context': ' Jon Snow and the wildlings were filmed in the forests '
                   'around Toome, in County Antrim, Northern Ireland. The '
                   'scenes in Northern Ireland were filmed si'},
    {   'answer': 'Season 3',
        'context': '\n'
                   '====Season 3====\n'
                   'When Jon Snow first arrives in the Wildling camp, he '
                   'initially mistakes Tormund for Mance Rayder, much to '
                   "Tormund's amusement. Mance"},
    {   'answer': 'Chris Stapleton',
        'context': '\n'
                   '=== Casting ===\n'
                   'Country singer Chris Stapleton has a cameo appearance as a '
                   'wight alongside his bass player and tour manager. '
                   'Stapleton said his manag'},
    {   'answer': '===Jon Snow',
        'context': '\n'
                   '===Jon Snow===\n'
   




In [7]:
# Run only the dense retriever on the full sentence query
res_5 = sklearn_keyword_classifier.run(
    query="who are the younger brothers of arya stark ?"
)
print("DPR Results" + "\n" + "="*15)
print_answers(res_5, details="minimal")

# Run only the sparse retriever on a keyword based query
res_6 = sklearn_keyword_classifier.run(
    query="arya stark younger brothers"
)
print("ES Results" + "\n" + "="*15)
print_answers(res_6, details="minimal")

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.49 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.92 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.84s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.29 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.41s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.00 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.99 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.00 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.98 Batches/s]


DPR Results
[   {   'answer': 'Bran, and Rickon',
        'context': 'f Lord Eddard Stark, and mother to his children Robb, '
                   'Sansa, Arya, Bran, and Rickon. She is the daughter of Lord '
                   'Hoster Tully of Riverrun; niece to Se'},
    {   'answer': 'Bran and Rickon',
        'context': 's five siblings: an older brother Robb, an older sister '
                   'Sansa, two younger brothers Bran and Rickon, and an older '
                   'illegitimate half-brother, Jon Snow.'},
    {   'answer': 'Prince Joffrey and Princess Myrcella',
        'context': 'on ===\n'
                   'Prince Tommen Baratheon is the younger brother of Prince '
                   'Joffrey and Princess Myrcella and is second in line for '
                   'the throne. Tommen is Queen Ce'},
    {   'answer': 'Jojen and Meera',
        'context': ' control of his abilities.  When Theon Greyjoy captures '
                   'Winterfell, Jojen and Meera accompany Bran 

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.83 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.97 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.41s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.02 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.02 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.05 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.03 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:02<00:00,  2.96s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.98 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.03 Batches/s]

ES Results
[   {   'answer': 'Rickon Stark and Bran Stark',
        'context': 'raised with a younger sister Arya Stark, two younger '
                   'brothers Rickon Stark and Bran Stark, as well as an older '
                   'brother Robb Stark, and an older illegi'},
    {   'answer': 'Bran and Rickon',
        'context': 's five siblings: an older brother Robb, an older sister '
                   'Sansa, two younger brothers Bran and Rickon, and an older '
                   'illegitimate half-brother, Jon Snow.'},
    {   'answer': 'Robert Baratheon',
        'context': 'Baratheon of House Baratheon, Lord of Dragonstone, is the '
                   "elder of Robert Baratheon's younger brothers. A brooding, "
                   'humorless man known for a hard and'},
    {   'answer': 'Bran',
        'context': "ns several victories against the Lannisters while Robb's "
                   'younger brother Bran rules the Northern stronghold of '
                   'Winterfell




## Transformer Keyword vs Question/Statement Classifier

First, it's essential to understand the trade-offs between the SkLearn and Transformer query classifiers. The transformer classifier is more accurate than the SkLearn classifier, but it requires more memory and most likely a GPU for fast inference, even though the transformer model itself is small (roughly 50 MB). The SkLearn classifier is less accurate, but it is much faster and doesn't require a GPU for inference.
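If you want to check the speed side of this trade-off on your own hardware, a small timing harness along the following lines may help. It is a sketch under stated assumptions: `mean_latency_ms` and `placeholder_classifier` are hypothetical names, and the placeholder is not a Haystack component. To measure a real classifier you would pass a function that wraps whatever call your classifier node exposes.

```python
import time
from statistics import mean


def mean_latency_ms(classify, queries, repeats=3):
    """Return the mean wall-clock time per call of `classify`, in milliseconds."""
    timings = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            classify(query)
            timings.append((time.perf_counter() - start) * 1000)
    return mean(timings)


def placeholder_classifier(query):
    """Hypothetical stand-in so the harness runs without any model loaded."""
    return "output_1" if "?" in query else "output_2"


queries = ["arya stark father", "who is the father of arya stark ?"]
latency = mean_latency_ms(placeholder_classifier, queries)
print(f"placeholder classifier: {latency:.4f} ms per query")
```

Running the same harness once per classifier, on the same list of queries, gives a like-for-like comparison; absolute numbers will depend on your hardware and on whether a GPU is available.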

Below, we define a `TransformersQueryClassifier` and show how to use it:

Read more about the trained model and dataset used [here](https://huggingface.co/shahrukhx01/bert-mini-finetune-question-detection)

In [8]:
# Here we build the pipeline
transformer_keyword_classifier = Pipeline()
transformer_keyword_classifier.add_node(component=TransformersQueryClassifier(), name="QueryClassifier", inputs=["Query"])
transformer_keyword_classifier.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["QueryClassifier.output_1"])
transformer_keyword_classifier.add_node(component=es_retriever, name="ESRetriever", inputs=["QueryClassifier.output_2"])
transformer_keyword_classifier.add_node(component=reader, name="QAReader", inputs=["ESRetriever", "DPRRetriever"])
transformer_keyword_classifier.draw("pipeline_classifier.png")

Downloading:   0%|          | 0.00/619 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/44.7M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/112 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/334 [00:00<?, ?B/s]

In [9]:

# Run only the dense retriever on the full sentence query
res_1 = transformer_keyword_classifier.run(
    query="Who is the father of Arya Stark?"
)
print("DPR Results" + "\n" + "="*15)
print_answers(res_1, details="minimal")

# Run only the sparse retriever on a keyword based query
res_2 = transformer_keyword_classifier.run(
    query="arya stark father"
)
print("ES Results" + "\n" + "="*15)
print_answers(res_2, details="minimal")


Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.65 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.38s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.99 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.03 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.03 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.03 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.02 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:03<00:00,  3.15s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.02 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.39s/ Batches]


DPR Results
[   {   'answer': 'Eddard and Catelyn Stark',
        'context': 'Background ===\n'
                   'Arya is the third child and younger daughter of Eddard and '
                   'Catelyn Stark and is nine years old at the beginning of '
                   'the book series.  Sh'},
    {   'answer': 'Rhaegar',
        'context': ', Aemon Targaryen, Jorah Mormont, Meera Reed, Jon '
                   'Connington and Gilly.\n'
                   'Rhaegar married the Dornish princess Elia Martell of '
                   'Sunspear, and fathered wi'},
    {   'answer': 'Eddard Stark',
        'context': 'e from House Tully in the Riverlands region prior to her '
                   'marriage to Eddard Stark. She has her hair dyed dark brown '
                   'later on while in the Vale, disgui'},
    {   'answer': 'Eddard Stark and Catelyn Stark',
        'context': 'ces==\n'
                   'Sansa Stark is the second child and elder daughter of '
                   'Ed

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.62 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.43s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.02 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.02 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.05 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.05 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.40s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:02<00:00,  2.72s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.02 Batches/s]

ES Results
[   {   'answer': 'Ned',
        'context': '\n'
                   '====Season 1====\n'
                   'Arya accompanies her father Ned and her sister Sansa to '
                   "King's Landing. Before their departure, Arya's "
                   'half-brother Jon Snow gifts A'},
    {   'answer': 'Tywin',
        'context': 'Stark marrying two of his children.\n'
                   'Tyrion Lannister suspects his father Tywin, who decides '
                   'Tyrion and his barbarians will fight in the vanguard, '
                   'want'},
    {   'answer': 'Eddard',
        'context': 's Nymeria after a legendary warrior queen. She travels '
                   "with her father, Eddard, to King's Landing when he is made "
                   'Hand of the King. Before she leaves,'},
    {   'answer': 'Balon',
        'context': "sgusted, Robb acquiesces to Theon's further captivity, as "
                   "Theon's father Balon has recently died and Theon's absence 




In [10]:

# Run only the dense retriever on the full sentence query
res_3 = transformer_keyword_classifier.run(
    query="which country was jon snow filmed ?"
)
print("DPR Results" + "\n" + "="*15)
print_answers(res_3, details="minimal")

# Run only the sparse retriever on a keyword based query
res_4 = transformer_keyword_classifier.run(
    query="jon snow country"
)
print("ES Results" + "\n" + "="*15)
print_answers(res_4, details="minimal")

Inferencing Samples: 100%|██████████| 1/1 [00:04<00:00,  4.06s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.84s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.85s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.99 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.00 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.04 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.40s/ Batches]


DPR Results
[   {   'answer': 'Iceland',
        'context': 's funeral.\n'
                   'The storylines led by Jon Snow and Daenerys Targaryen '
                   'continued to be filmed in Iceland and in the Moroccan city '
                   'of Essaouira respectively.'},
    {   'answer': 'Northern Ireland',
        'context': ' Winterfell scenes were filmed at sets in Moneyglass and '
                   'Magheramorne in Northern Ireland, with indoor scenes '
                   'filmed at Paint Hall studios in Belfast.'},
    {   'answer': 'English',
        'context': '\n'
                   '== Reception ==\n'
                   'Originally auditioning for the role of Jon Snow, English '
                   'actor Alfie Allen has received positive reviews for his '
                   'role as Theon Greyjo'},
    {   'answer': 'Iceland',
        'context': 's the House of the Undying. Scenes set north of the Wall '
                   'were filmed in Iceland in November 2011. 

Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.05s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.76 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.51 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.02 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.02 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]

ES Results
[   {   'answer': 'forests around Toome, in County Antrim, Northern Ireland',
        'context': ' Jon Snow and the wildlings were filmed in the forests '
                   'around Toome, in County Antrim, Northern Ireland. The '
                   'scenes in Northern Ireland were filmed si'},
    {   'answer': 'Season 3',
        'context': '\n'
                   '====Season 3====\n'
                   'When Jon Snow first arrives in the Wildling camp, he '
                   'initially mistakes Tormund for Mance Rayder, much to '
                   "Tormund's amusement. Mance"},
    {   'answer': 'Chris Stapleton',
        'context': '\n'
                   '=== Casting ===\n'
                   'Country singer Chris Stapleton has a cameo appearance as a '
                   'wight alongside his bass player and tour manager. '
                   'Stapleton said his manag'},
    {   'answer': '===Jon Snow',
        'context': '\n'
                   '===Jon Snow===\n'
   




In [11]:
# Run only the dense retriever on the full sentence query
res_5 = transformer_keyword_classifier.run(
    query="who are the younger brothers of arya stark ?"
)
print("DPR Results" + "\n" + "="*15)
print_answers(res_5, details="minimal")

# Run only the sparse retriever on a keyword based query
res_6 = transformer_keyword_classifier.run(
    query="arya stark younger brothers"
)
print("ES Results" + "\n" + "="*15)
print_answers(res_6, details="minimal")

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.20 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.95 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.00 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.96s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.40s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.00 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.53 Batches/s]


DPR Results
[   {   'answer': 'Bran, and Rickon',
        'context': 'f Lord Eddard Stark, and mother to his children Robb, '
                   'Sansa, Arya, Bran, and Rickon. She is the daughter of Lord '
                   'Hoster Tully of Riverrun; niece to Se'},
    {   'answer': 'Bran and Rickon',
        'context': 's five siblings: an older brother Robb, an older sister '
                   'Sansa, two younger brothers Bran and Rickon, and an older '
                   'illegitimate half-brother, Jon Snow.'},
    {   'answer': 'Prince Joffrey and Princess Myrcella',
        'context': 'on ===\n'
                   'Prince Tommen Baratheon is the younger brother of Prince '
                   'Joffrey and Princess Myrcella and is second in line for '
                   'the throne. Tommen is Queen Ce'},
    {   'answer': 'Jojen and Meera',
        'context': ' control of his abilities.  When Theon Greyjoy captures '
                   'Winterfell, Jojen and Meera accompany Bran 

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.94 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.00 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.40s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.05 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.05 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.02 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:02<00:00,  2.71s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.04 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.05 Batches/s]

ES Results
[   {   'answer': 'Rickon Stark and Bran Stark',
        'context': 'raised with a younger sister Arya Stark, two younger '
                   'brothers Rickon Stark and Bran Stark, as well as an older '
                   'brother Robb Stark, and an older illegi'},
    {   'answer': 'Bran and Rickon',
        'context': 's five siblings: an older brother Robb, an older sister '
                   'Sansa, two younger brothers Bran and Rickon, and an older '
                   'illegitimate half-brother, Jon Snow.'},
    {   'answer': 'Robert Baratheon',
        'context': 'Baratheon of House Baratheon, Lord of Dragonstone, is the '
                   "elder of Robert Baratheon's younger brothers. A brooding, "
                   'humorless man known for a hard and'},
    {   'answer': 'Bran',
        'context': "ns several victories against the Lannisters while Robb's "
                   'younger brother Bran rules the Northern stronghold of '
                   'Winterfell




## Question vs Statement Classifier

One possible use case for this classifier is to route queries after document retrieval: only questions are sent to the QA reader, while for declarative sentences the DPR/ES results are returned to the user directly. This improves the user experience by showing answers only when the user explicitly asks for them.

![image](https://user-images.githubusercontent.com/6007894/127864452-f931ea7f-2e62-4f59-85dc-056d56eb9295.png)
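
The routing idea can be sketched in plain Python, independent of Haystack. Everything in this sketch is a hypothetical stand-in: `looks_like_question` is a toy heuristic that replaces the trained classifier, and `retrieve` and `read` are placeholder callables for the retriever and the reader. It only illustrates the control flow, not the library's implementation.

```python
from typing import Callable, Dict, List

# Toy heuristic only: the tutorial uses a trained transformer model instead.
QUESTION_WORDS = {"who", "what", "when", "where", "which", "why", "how"}


def looks_like_question(query: str) -> bool:
    """Return True if the query starts with a question word or ends with '?'."""
    words = query.strip().lower().split()
    return bool(words) and (words[0] in QUESTION_WORDS or words[-1].endswith("?"))


def route(
    query: str,
    retrieve: Callable[[str], List[str]],
    read: Callable[[str, List[str]], List[str]],
) -> Dict[str, object]:
    """Always retrieve documents; call the reader only for questions."""
    documents = retrieve(query)
    result: Dict[str, object] = {"query": query, "documents": documents}
    if looks_like_question(query):
        result["answers"] = read(query, documents)
    return result


# Placeholder retriever and reader (assumptions, for illustration only).
docs = ["Arya is the younger daughter of Eddard and Catelyn Stark."]
retrieve = lambda query: docs
read = lambda query, documents: ["Eddard and Catelyn Stark"]

question = route("Who is the father of Arya Stark?", retrieve, read)
statement = route("Arya Stark was the daughter of a Lord.", retrieve, read)
print(sorted(question))   # ['answers', 'documents', 'query']
print(sorted(statement))  # ['documents', 'query']
```

A statement produces a result with documents but no answers, which is the behaviour the use case above describes.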


Below, we define a `TransformersQueryClassifier` and show how to use it:

Read more about the trained model and the dataset used [here](https://huggingface.co/shahrukhx01/question-vs-statement-classifier).

In [13]:
# Build the pipeline: retrieve first, then send only questions on to the reader
transformer_question_classifier = Pipeline()
transformer_question_classifier.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["Query"])
transformer_question_classifier.add_node(
    component=TransformersQueryClassifier(model_name_or_path="shahrukhx01/question-vs-statement-classifier"),
    name="QueryClassifier",
    inputs=["DPRRetriever"],
)
# output_1 carries queries classified as questions; statements (output_2) stop here
transformer_question_classifier.add_node(component=reader, name="QAReader", inputs=["QueryClassifier.output_1"])
transformer_question_classifier.draw("question_classifier.png")

# A question passes through the retriever and on to the QA reader
res_1 = transformer_question_classifier.run(
    query="Who is the father of Arya Stark?"
)
print("DPR Results" + "\n" + "="*15)
print_answers(res_1, details="minimal")

# A statement stops after retrieval, so only the DPR documents come back
res_2 = transformer_question_classifier.run(
    query="Arya Stark was the daughter of a Lord."
)
print("DPR Results" + "\n" + "="*15)
print_answers(res_2, details="minimal")

Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.50 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.39s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.01 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.54 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.93 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  2.02 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:03<00:00,  3.14s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  1.32 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.42s/ Batches]


DPR Results
[   {   'answer': 'Eddard and Catelyn Stark',
        'context': 'Background ===\n'
                   'Arya is the third child and younger daughter of Eddard and '
                   'Catelyn Stark and is nine years old at the beginning of '
                   'the book series.  Sh'},
    {   'answer': 'Rhaegar',
        'context': ', Aemon Targaryen, Jorah Mormont, Meera Reed, Jon '
                   'Connington and Gilly.\n'
                   'Rhaegar married the Dornish princess Elia Martell of '
                   'Sunspear, and fathered wi'},
    {   'answer': 'Eddard Stark',
        'context': 'e from House Tully in the Riverlands region prior to her '
                   'marriage to Eddard Stark. She has her hair dyed dark brown '
                   'later on while in the Vale, disgui'},
    {   'answer': 'Eddard Stark and Catelyn Stark',
        'context': 'ces==\n'
                   'Sansa Stark is the second child and elder daughter of '
                   'Ed

{'documents': [{'content': '\n=== Background ===\nArya is the third child and younger daughter of Eddard and Catelyn Stark and is nine years old at the beginning of the book series.  She has five siblings: an older brother Robb, an older sister Sansa, two younger brothers Bran and Rickon, and an older illegitimate half-brother, Jon Snow.', 'content_type': 'text', 'score': 0.7142763811587122, 'meta': {'name': '43_Arya_Stark.txt'}, 'embedding': None, 'id': 'd7a98cb66f592540fa7de20bf46a5e64'},
  {'content': '\n==Character and appearances==\nSansa Stark is the second child and elder daughter of Eddard Stark and Catelyn Stark. She was born and raised in Winterfell, until leaving with her father and sister at the beginning of the series. She was raised with a younger sister Arya Stark, two younger brothers Rickon Stark and Bran Stark, as well as an older brother Robb Stark, and an older illegitimate half-brother, Jon Snow.\nRaised as a lady, Sansa is traditionally feminine. Sansa\'s interest

## Standalone Query Classifier
Below, we run the query classifiers standalone to better understand their outputs on each of the three query types.

In [None]:
# Here we create the keyword vs question/statement query classifier
from haystack.pipelines import TransformersQueryClassifier

queries = ["arya stark father","jon snow country",
           "who is the father of arya stark","which country was jon snow filmed?"]

keyword_classifier = TransformersQueryClassifier()

for query in queries:
    result = keyword_classifier.run(query=query)
    if result[1] == "output_1":
        category = "question/statement"
    else:
        category = "keyword"

    print(f"Query: {query}, raw_output: {result}, class: {category}")


Query: arya stark father, raw_output: ({'query': 'arya stark father'}, 'output_2'), class: keyword
Query: jon snow country, raw_output: ({'query': 'jon snow country'}, 'output_2'), class: keyword
Query: who is the father of arya stark, raw_output: ({'query': 'who is the father of arya stark'}, 'output_1'), class: question/statement
Query: which country was jon snow filmed?, raw_output: ({'query': 'which country was jon snow filmed?'}, 'output_1'), class: question/statement


In [None]:
# Here we create the question vs statement query classifier 
from haystack.pipelines import TransformersQueryClassifier

queries = ["Lord Eddard was the father of Arya Stark.","Jon Snow was filmed in United Kingdom.",
           "who is the father of arya stark?","Which country was jon snow filmed in?"]

question_classifier = TransformersQueryClassifier(model_name_or_path="shahrukhx01/question-vs-statement-classifier")

for query in queries:
    result = question_classifier.run(query=query)
    if result[1] == "output_1":
        category = "question"
    else:
        category = "statement"

    print(f"Query: {query}, raw_output: {result}, class: {category}")

Query: Lord Eddard was the father of Arya Stark., raw_output: ({'query': 'Lord Eddard was the father of Arya Stark.'}, 'output_2'), class: statement
Query: Jon Snow was filmed in United Kingdom., raw_output: ({'query': 'Jon Snow was filmed in United Kingdom.'}, 'output_2'), class: statement
Query: who is the father of arya stark?, raw_output: ({'query': 'who is the father of arya stark?'}, 'output_1'), class: question
Query: Which country was jon snow filmed in?, raw_output: ({'query': 'Which country was jon snow filmed in?'}, 'output_1'), class: question
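
As the raw output above shows, both classifiers return a tuple whose second element is the name of the outgoing edge, and that name alone determines the class. The repeated `if`/`else` blocks could be replaced by a small lookup helper. This is only a sketch: `describe`, `KEYWORD_LABELS`, and `QUESTION_LABELS` are hypothetical names, not part of Haystack, and the helper works on plain tuples so it needs no Haystack import.

```python
from typing import Any, Dict, Tuple

# Edge-to-label mappings taken from the two loops above (names are hypothetical).
KEYWORD_LABELS = {"output_1": "question/statement", "output_2": "keyword"}
QUESTION_LABELS = {"output_1": "question", "output_2": "statement"}


def describe(result: Tuple[Dict[str, Any], str], labels: Dict[str, str]) -> str:
    """Map the edge name in a classifier result tuple to a readable class label."""
    _, edge = result
    return labels.get(edge, "unknown")


# Result tuples copied from the printed output above.
print(describe(({"query": "arya stark father"}, "output_2"), KEYWORD_LABELS))
# keyword
print(describe(({"query": "who is the father of arya stark?"}, "output_1"), QUESTION_LABELS))
# question
```

Keeping the mapping in a dictionary makes it explicit which edge means what for each model, which matters because `output_1` stands for a different class in each of the two classifiers.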


## Conclusion

The query classifier lets you be more creative with your pipelines and use different retrieval nodes flexibly. Moreover, as with the Question vs Statement classifier, you can also choose which queries to send to the reader.

Finally, you can also bring your own classifier and plug it in, either via `TransformersQueryClassifier(model_name_or_path="<huggingface_model_name_or_file_path>")` or via `SklearnQueryClassifier(model_name_or_path="url_to_classifier_or_file_path_as_pickle", vectorizer_name_or_path="url_to_vectorizer_or_file_path_as_pickle")`.

## About us

This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany.

We bring NLP to the industry via open source!  
Our focus: Industry-specific language models & large-scale QA systems.
  
Some of our other work: 
- [German BERT](https://deepset.ai/german-bert)
- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)
- [FARM](https://github.com/deepset-ai/FARM)

Get in touch:
[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Slack](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)

By the way: [we're hiring!](https://www.deepset.ai/jobs) 