# Hugging Face - Question Answering from PDF


**Author:** [Muhammad Talha Khan](https://www.linkedin.com/in/muhtalhakhan/)

**Description**: This notebook uses the Transformers question-answering pipeline to extract the answer to a question from a given text, which is useful for searching for an answer in a document. The example below uses an arXiv paper on multi-agent reinforcement learning, but you can substitute any PDF.

The PDF serves as the context used for question answering.

## Input

### Install Packages

In [1]:
!pip install tensorflow


In [2]:
!pip install -q transformers

Add "--user" to the install command if pip reports a permission error.

In [3]:
!pip install PyPDF2

Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


In [4]:
# Optional: the code below imports urllib.request from the standard library, not urllib3.
!pip install urllib3


### Import Libraries


In [5]:
from transformers import pipeline
import urllib.request
import PyPDF2
import io

### Add the Document Path

In [7]:
URL = 'https://arxiv.org/pdf/2301.02593.pdf'
req = urllib.request.Request(URL, headers={'User-Agent' : "Chrome"})
remote_file = urllib.request.urlopen(req).read()
remote_file_bytes = io.BytesIO(remote_file)
pdfdoc_remote = PyPDF2.PdfReader(remote_file_bytes)

You can change the URL to point to any other PDF.
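
To make swapping documents easier, the download-and-parse steps above could be wrapped in a small helper. The following is a minimal sketch and not part of the original notebook: `is_url` and `open_pdf` are hypothetical names, and the sketch assumes PyPDF2 3.x is installed as shown earlier. It accepts either an HTTP(S) URL or a path to a local file.

```python
import io
import urllib.request
from urllib.parse import urlparse


def is_url(source):
    """Return True if `source` looks like an HTTP(S) URL rather than a local path."""
    return urlparse(str(source)).scheme in ("http", "https")


def open_pdf(source):
    """Hypothetical helper: return a PyPDF2.PdfReader for a URL or a local path."""
    import PyPDF2  # imported here so is_url stays usable without PyPDF2

    if is_url(source):
        # Same request pattern as the cell above, including the User-Agent header.
        req = urllib.request.Request(source, headers={"User-Agent": "Chrome"})
        with urllib.request.urlopen(req) as response:
            data = response.read()
        return PyPDF2.PdfReader(io.BytesIO(data))
    # PdfReader also accepts a path to a file on disk.
    return PyPDF2.PdfReader(source)
```

With this in place, `pdfdoc_remote = open_pdf(URL)` would stand in for the four lines of the cell above, and a local file could be loaded by passing its path.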

## Model

### Read Text from File

In [11]:
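pdf_text = ""

# Extract the text of each page and append it to a single string.
for i in range(len(pdfdoc_remote.pages)):
    print(i)  # show progress as the page index
    page = pdfdoc_remote.pages[i]
    page_content = page.extract_text()
    pdf_text += page_content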

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
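
The loop above concatenates pages with no separator, so the last word of one page can fuse with the first word of the next. A possible variant is sketched below; `extract_text_from_pages` is a hypothetical name, and the function only assumes that each page object has an `extract_text()` method, as the PyPDF2 pages used above do.

```python
def extract_text_from_pages(pages, separator="\n"):
    """Join the text of each page, skipping pages that yield no text."""
    parts = []
    for page in pages:
        text = page.extract_text() or ""  # guard against None or empty results
        if text.strip():
            parts.append(text)
    return separator.join(parts)
```

It could be called as `pdf_text = extract_text_from_pages(pdfdoc_remote.pages)`.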


### Print the text extracted from the PDF file

In [12]:
print(pdf_text)

Multi-Agent Reinforcement Learning for Fast-Timescale
Demand Response of Residential Loads
Vincent Mai
Robotics & Embodied AI Lab, Mila
Université de Montréal, Canada
vincent.mai@umontreal.caPhilippe Maisonneuve
GERAD & Mila
Polytechnique Montréal, Canada
philippe.maisonneuve@polymtl.caTianyu Zhang
Mila
Université de Montréal, Canada
tianyu.zhang@mila.quebec
Hadi Nekoei
Mila
Université de Montréal, Canada
nekoeihe@mila.quebecLiam Paull
Robotics & Embodied AI Lab, Mila
Université de Montréal, Canada
liam.paull@umontreal.caAntoine Lesage-Landry
GERAD & Mila
Polytechnique Montréal, Canada
antoine.lesage-landry@polymtl.ca
ABSTRACT
To integrate high amounts of renewable energy resources, electrical
power grids must be able to cope with high amplitude, fast timescale
variations in power generation. Frequency regulation through de-
mand response has the potential to coordinate temporally flexible
loads, such as air conditioners, to counteract these variations. Ex-
isting approaches for discre
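
The extracted text keeps the line breaks and end-of-line hyphenation of the PDF layout (for example "de-" followed by "mand"), and these can end up inside an answer. The sketch below shows one way to clean this up with regular expressions before passing the text to the model. The function names are hypothetical, and the hyphen rule is a heuristic: it will also join genuine compounds that happen to break at a hyphen.

```python
import re


def join_hyphenated_line_breaks(text):
    """Merge words split as 'de-' + line break + 'mand' into 'demand'."""
    # Match a hyphen that follows a lowercase letter and is followed by
    # a line break and another lowercase letter.
    return re.sub(r"(?<=[a-z])-\n(?=[a-z])", "", text)


def normalize_whitespace(text):
    """Collapse runs of whitespace (including line breaks) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


sample = "residential air condition-\ners"
print(normalize_whitespace(join_hyphenated_line_breaks(sample)))
# residential air conditioners
```

Applying both functions to `pdf_text` before building the context is optional; the rest of the notebook works on the raw text as well.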

### Loading the pipeline
Create the question-answering pipeline from Transformers once the transformers and tensorflow packages are installed.

In [13]:
nlp = pipeline('question-answering', model='deepset/roberta-base-squad2', tokenizer='deepset/roberta-base-squad2')

## Output

### Ask question

In [None]:
context = pdf_text
question = input('Enter your question:\n')

# The pipeline expects the question together with the context to search.
question_set = {
    'context': context,
    'question': question
}

results = nlp(question_set)

Enter your question:
What is the bang bang controller?
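
The cell above passes the whole document as a single context. A manual alternative, sketched below under stated assumptions, is to split the text into overlapping chunks, ask the question against each chunk, and keep the highest-scoring result, which also records which part of the document the answer came from. `split_into_chunks` and `best_answer` are hypothetical helpers, the chunk sizes are arbitrary, and `qa` is assumed to be any callable that takes `question=` and `context=` keyword arguments and returns a dict with `answer` and `score` keys, as the `nlp` pipeline does. Scores from different chunks are only roughly comparable, so treat the ranking as a heuristic.

```python
def split_into_chunks(text, chunk_words=300, overlap_words=50):
    """Split text into overlapping chunks of roughly `chunk_words` words."""
    if overlap_words >= chunk_words:
        raise ValueError("overlap_words must be smaller than chunk_words")
    words = text.split()
    step = chunk_words - overlap_words
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def best_answer(qa, question, text, **chunk_kwargs):
    """Ask `qa` about every chunk and return the highest-scoring result."""
    candidates = []
    for index, chunk in enumerate(split_into_chunks(text, **chunk_kwargs)):
        result = dict(qa(question=question, context=chunk))
        result["chunk_index"] = index  # remember where the answer came from
        candidates.append(result)
    if not candidates:
        raise ValueError("text is empty")
    return max(candidates, key=lambda r: r["score"])
```

In this notebook it could be used as `results = best_answer(nlp, question, pdf_text)`, after which `results['answer']` is read the same way as before.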




### Get answer

This prints the answer to the question asked above.

In [15]:
print("\nAnswer: " + results['answer'])


Answer: residential air condition-
ers
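
Besides `answer`, the result dictionary returned by the question-answering pipeline also holds a `score` and the `start` and `end` character positions of the answer within the context. The sketch below uses these to show the answer with its score and some surrounding text, which helps judge whether a short answer like the one above is plausible. `describe_result` is a hypothetical helper and the window size is arbitrary.

```python
def describe_result(result, context, window=80):
    """Format a QA result with its confidence score and nearby context."""
    start, end = result["start"], result["end"]
    snippet = context[max(0, start - window):min(len(context), end + window)]
    snippet = " ".join(snippet.split())  # flatten line breaks for display
    return (
        f"Answer: {result['answer']!r} (score {result['score']:.2f})\n"
        f"Context: ...{snippet}..."
    )
```

For the run above this would be called as `print(describe_result(results, context))`.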
