# Answering questions from a long text passage

In the previous recipe, we extracted the answer to a question from a given context. In that pattern, the model retrieves the answer from the context and cannot answer a question whose answer the context does not contain. This is useful when the answer must come from a specific passage. A question-answering system of this type is called **Closed Domain Question Answering (CDQA)**.

Another type of question-answering system can answer questions that are general in nature. These systems are trained on larger corpora, which gives them the ability to answer open-ended questions. They are called **Open Domain Question Answering (ODQA)** systems.

Getting ready

As part of this recipe, we will use the **DeepPavlov** (https://deeppavlov.ai) ODQA system to answer an open question. We will use the deeppavlov library along with the **Knowledge Base Question Answering (KBQA)** model. This model has been trained on English wiki data as a knowledge base. It uses various NLP techniques such as entity linking and disambiguation, knowledge graphs, and so on to extract the exact answer to the question.
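To make the idea of knowledge-base question answering concrete, here is a deliberately tiny sketch: it links a question to an entity, guesses a relation from cue words, and looks up the matching fact. Everything in it (`TOY_KB`, `RELATION_CUES`, and the three helper functions) is a hypothetical illustration and not DeepPavlov's pipeline, which uses trained models and a far larger knowledge base.

```python
# Toy knowledge base of (subject, relation) -> object facts.
# Illustrative only; the real model queries a much larger knowledge base.
TOY_KB = {
    ("Egypt", "capital"): "Cairo",
    ("Bill Clinton", "spouse"): "Hillary Clinton",
}

# Hypothetical cue phrases that signal each relation.
RELATION_CUES = {
    "capital": ["capital of"],
    "spouse": ["wife", "husband", "spouse"],
}


def link_entity(question):
    """Return the longest known entity mentioned in the question, or None."""
    entities = {subject for subject, _ in TOY_KB}
    for entity in sorted(entities, key=len, reverse=True):
        if entity.lower() in question.lower():
            return entity
    return None


def detect_relation(question):
    """Return the first relation whose cue phrase occurs in the question, or None."""
    lowered = question.lower()
    for relation, cues in RELATION_CUES.items():
        if any(cue in lowered for cue in cues):
            return relation
    return None


def answer(question):
    """Look up the (entity, relation) pair in the toy knowledge base."""
    return TOY_KB.get((link_entity(question), detect_relation(question)))


print(answer("What is the capital of Egypt?"))
print(answer("who is Bill Clinton's wife?"))
```

A question whose entity or relation is not recognized returns `None`, which mirrors the general limitation that a knowledge-base system can only answer what its knowledge base covers.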

Install and download the document corpus

In [None]:
# Install Python 3.10
!sudo apt-get update -y
!sudo apt-get install python3.10 python3.10-distutils python3-pip -y

In [None]:
# Set Python 3.10 as the default 'python3' command
# !sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1
# !sudo update-alternatives --set python3 /usr/bin/python3.10

In [None]:
# Re-install Pip for the new Python version
#!curl -sS https://bootstrap.pypa.io/get-pip.py | python3

In [None]:
# !pip install deeppavlov

In [None]:
# 1. Install the specific Python 3.10 development headers
# (This provides the "blueprints" the compiler needs)
#!sudo apt-get install python3.10-dev build-essential -y

In [None]:
# 2. Install the build-time Python dependencies
# !pip install --upgrade pip setuptools wheel
# !pip install pybind11 Cython

In [None]:
# 1. Ensure pybind11 is definitely in the environment
# !pip install pybind11

In [None]:
# 2. Force install hdt using the current environment's tools
!pip install hdt==2.3 --no-build-isolation



In [None]:
# !pip install "numpy>=1.18.0" "pandas==2.2.2" "scikit-learn==1.6.1"

In [None]:
# 1. Clear the pip cache to stop it from using that corrupted .tar.gz file
# !pip cache purge

In [None]:
# 1. Install Python 3.10 and the specific distutils it needs
!apt-get install python3.10 python3.10-distutils -y

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Note, selecting 'python3-distutils' instead of 'python3.10-distutils'
python3-distutils is already the newest version (3.10.8-1~22.04).
python3.10 is already the newest version (3.10.12-1~22.04.12).
0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.


In [None]:
# 2. Get the specific pip for Python 3.10
!curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10

Collecting pip
  Using cached pip-25.3-py3-none-any.whl.metadata (4.7 kB)
Using cached pip-25.3-py3-none-any.whl (1.8 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 25.3
    Uninstalling pip-25.3:
      Successfully uninstalled pip-25.3
Successfully installed pip-25.3


In [None]:
# 3. Use the Python 3.10 Pip specifically to install DeepPavlov
# We use 'python3.10 -m pip' to avoid any confusion with the system pip
!python3.10 -m pip install deeppavlov

Collecting pybind11==2.10.3 (from deeppavlov)
  Using cached pybind11-2.10.3-py3-none-any.whl.metadata (9.4 kB)
Using cached pybind11-2.10.3-py3-none-any.whl (222 kB)
Installing collected packages: pybind11
  Attempting uninstall: pybind11
    Found existing installation: pybind11 2.2.4
    Uninstalling pybind11-2.2.4:
      Successfully uninstalled pybind11-2.2.4
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
hdt 2.3 requires pybind11==2.2.4, but you have pybind11 2.10.3 which is incompatible.
Successfully installed pybind11-2.10.3


In [None]:
# 4. Install the HDT library with the Python 3.10 pip
!python3.10 -m pip install hdt==2.3 --no-build-isolation

Collecting pybind11==2.2.4 (from hdt==2.3)
  Using cached pybind11-2.2.4-py2.py3-none-any.whl.metadata (2.4 kB)
Using cached pybind11-2.2.4-py2.py3-none-any.whl (145 kB)
Installing collected packages: pybind11
  Attempting uninstall: pybind11
    Found existing installation: pybind11 2.10.3
    Uninstalling pybind11-2.10.3:
      Successfully uninstalled pybind11-2.10.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
deeppavlov 1.7.0 requires pybind11==2.10.3, but you have pybind11 2.2.4 which is incompatible.
Successfully installed pybind11-2.2.4
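
The two resolver messages above show that deeppavlov and hdt each pin pybind11 to a different exact version, so installing one replaces the version the other asked for. As a minimal sketch of how such clashes can be spotted, the hypothetical helper `find_pin_conflicts` below takes the exact pins each package declares (copied here by hand from the messages above) and reports every dependency that is pinned to more than one version.

```python
from collections import defaultdict


def find_pin_conflicts(requirements):
    """Return dependencies that different packages pin to different exact versions.

    `requirements` maps a package name to the exact pins it declares,
    for example {"hdt": {"pybind11": "2.2.4"}}.
    """
    pins = defaultdict(dict)
    for package, deps in requirements.items():
        for dep, version in deps.items():
            pins[dep][package] = version
    # A dependency conflicts when its pins are not all the same version.
    return {
        dep: by_package
        for dep, by_package in pins.items()
        if len(set(by_package.values())) > 1
    }


# Pins taken from the pip messages recorded in this notebook.
declared = {
    "deeppavlov": {"pybind11": "2.10.3"},
    "hdt": {"pybind11": "2.2.4"},
}
print(find_pin_conflicts(declared))
```

The helper only compares exact pins; real requirement specifiers can also be ranges, which this sketch does not attempt to handle.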


In [None]:
# 5. Run the model install using the 3.10 executable
!python3.10 -m deeppavlov install kbqa_cq_en

Collecting en-core-web-sm==3.5.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 23.4 MB/s 0:00:00
Ignoring transformers: markers 'python_version < "3.8"' don't match your environment
Collecting ru-core-news-sm==3.5.0
  Downloading https://github.com/explosion/spacy-models/releases/download/ru_core_news_sm-3.5.0/ru_core_news_sm-3.5.0-py3-none-any.whl (15.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.3/15.3 MB 22.1 MB/s 0:00:00


Imports

In [None]:
import sys
sys.path.append('/usr/local/lib/python3.10/dist-packages')  # make packages installed by the Python 3.10 pip visible to this kernel (the kernel itself keeps its own Python version, 3.12 here)
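
Appending a dist-packages directory to `sys.path` changes where the running interpreter searches for packages; it does not change which interpreter runs the notebook. Compiled extension modules built for one Python minor version are generally not importable from another, so it can help to check whether the directory matches the kernel before relying on it. The sketch below is one way to do that; `path_python_version` and `matches_running_interpreter` are hypothetical helper names, and the check simply reads the version out of the path string.

```python
import re
import sys


def path_python_version(path):
    """Extract (major, minor) from a path segment like 'python3.10', or None."""
    match = re.search(r"python(\d+)\.(\d+)", path)
    return (int(match.group(1)), int(match.group(2))) if match else None


def matches_running_interpreter(path, running=None):
    """Return True when the path names no Python version or names the running one."""
    running = tuple(sys.version_info[:2]) if running is None else tuple(running)
    target = path_python_version(path)
    return target is None or target == running


site_dir = "/usr/local/lib/python3.10/dist-packages"
print("Kernel Python:", sys.version_info[:2])
print("Path targets:", path_python_version(site_dir))
print("Versions match:", matches_running_interpreter(site_dir))
```

When the versions differ, pure-Python packages from that directory may still import, but packages with compiled parts may not.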


In this step, we initialize the KBQA model by passing its configuration, configs.kbqa.kbqa_cq_en, to the build_model function. We also set the download argument to True so that any model files missing locally are downloaded:

In [None]:
import sys
import torch.nn as nn

# 1. Re-link the Python 3.10 path
if '/usr/local/lib/python3.10/dist-packages' not in sys.path:
    sys.path.append('/usr/local/lib/python3.10/dist-packages')

# 2. Use a "Guard" to prevent recursion even if the cell is run twice
if not hasattr(nn.Module, '_is_patched'):
    print("Applying atomic patch...")
    _original_load = nn.Module.load_state_dict

    def universal_load_patch(self, *args, **kwargs):
        kwargs['strict'] = False  # Ignore the 'position_ids' error
        return _original_load(self, *args, **kwargs)

    nn.Module.load_state_dict = universal_load_patch
    nn.Module._is_patched = True
    print("Patch applied safely.")
else:
    print("Patch already active, skipping to avoid recursion.")

# 3. Now load the model
from deeppavlov import build_model, configs
kbqa_model = build_model(configs.kbqa.kbqa_cq_en, download=True)

Patch already active, skipping to avoid recursion.


2026-01-08 20:01:37.496 INFO in 'deeppavlov.download'['download'] at line 138: Skipped http://files.deeppavlov.ai/kbqa/wikidata/query_prediction_eng.pickle download because of matching hashes
INFO:deeppavlov.download:Skipped http://files.deeppavlov.ai/kbqa/wikidata/query_prediction_eng.pickle download because of matching hashes
2026-01-08 20:01:39.993 INFO in 'deeppavlov.download'['download'] at line 138: Skipped http://files.deeppavlov.ai/kbqa/models/path_ranking_nll_roberta_lcquad2.tar.gz download because of matching hashes
INFO:deeppavlov.download:Skipped http://files.deeppavlov.ai/kbqa/models/path_ranking_nll_roberta_lcquad2.tar.gz download because of matching hashes
2026-01-08 20:01:41.288 INFO in 'deeppavlov.download'['download'] at line 138: Skipped http://files.deeppavlov.ai/kbqa/datasets/lcquad2.tar.gz download because of matching hashes
INFO:deeppavlov.download:Skipped http://files.deeppavlov.ai/kbqa/datasets/lcquad2.tar.gz download because of matching hashes
2026-01-08 20:01

ModuleNotFoundError: No module named 'hdt'
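
A ModuleNotFoundError like the one above can be narrowed down by asking the running interpreter where, if anywhere, it would load the module from. The sketch below uses the standard library's `importlib.util.find_spec` for top-level module names; `describe_module` is a hypothetical helper name. If the module is reported as not found even though pip installed it, the package was probably installed for a different interpreter than the one running the notebook.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether a top-level module is importable here and where it would load from."""
    spec = importlib.util.find_spec(name)
    return {
        "module": name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
    }


print("Kernel Python:", sys.version_info[:2])
print(describe_module("hdt"))
```

Running this check before `build_model` avoids waiting for the model downloads only to hit the import failure afterwards.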

We use the initialized model and pass it a couple of questions that we want to be answered:

In [None]:
from transformers import PreTrainedTokenizerFast

# 1. Save the unpatched method once, so re-running this cell never captures the patch itself
#    (capturing an already-patched method makes the patch call itself until RecursionError)
if not hasattr(PreTrainedTokenizerFast, '_unpatched_batch_encode_plus'):
    PreTrainedTokenizerFast._unpatched_batch_encode_plus = PreTrainedTokenizerFast._batch_encode_plus

_original_batch_encode = PreTrainedTokenizerFast._unpatched_batch_encode_plus

# 2. Define the "clean" patch
def final_tokenizer_patch(self, *args, **kwargs):
    # Remove the arguments that the Fast Tokenizer's C++ core rejects
    kwargs.pop('pad_to_max_length', None)
    kwargs.pop('padding', None)
    kwargs.pop('truncation', None)

    return _original_batch_encode(self, *args, **kwargs)

# 3. Apply the patch
PreTrainedTokenizerFast._batch_encode_plus = final_tokenizer_patch
print("Final Tokenizer patch applied! Inter-library argument conflicts resolved.")

# 4. Run your query
result = kbqa_model(["What is the capital of Egypt?",
                     "who is Bill Clinton's wife?"])


Final Tokenizer patch applied! Inter-library argument conflicts resolved.


RecursionError: maximum recursion depth exceeded
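
A RecursionError such as the one recorded above is a typical symptom of a monkey-patching cell that, when executed a second time, captures the already-patched method as the "original", so the wrapper ends up calling itself. The sketch below shows one re-run-safe pattern on a stand-in class: the untouched method is stored on the class exactly once and every later run reuses it. `Tokenizer`, `apply_patch`, and `_encode_unpatched` are hypothetical names for illustration and are not part of transformers.

```python
class Tokenizer:
    """Stand-in for a library class; not the transformers tokenizer."""

    def encode(self, text):
        return text.split()


def apply_patch(cls):
    # Store the untouched method once; later runs reuse it
    # instead of capturing the wrapper installed by an earlier run.
    if not hasattr(cls, "_encode_unpatched"):
        cls._encode_unpatched = cls.encode

    def patched(self, text, **kwargs):
        kwargs.pop("padding", None)  # drop an argument the original does not accept
        return cls._encode_unpatched(self, text, **kwargs)

    cls.encode = patched


# Simulate executing the notebook cell three times.
for _ in range(3):
    apply_patch(Tokenizer)

print(Tokenizer().encode("who is Bill Clinton's wife?", padding=True))
```

Because the wrapper always calls the stored original, the call depth stays constant no matter how often the cell runs. Restarting the kernel is still the simplest way to clear a patch that was applied without such a guard.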