# Question Answering
This notebook applies optimized Question Answering to long documents.

### How it works:
1. Extract the text from the PDF/DOCX
2. Split it into moderately sized *chunks*
3. Create embeddings of the chunks (SentenceTransformers)
4. Index the embeddings in **FAISS**
5. When a question arrives:
   - Search FAISS for the most relevant chunks
   - Pass only those passages to the QA model (a DistilBERT, BERT, or XLM-RoBERTa checkpoint)
---


In [None]:

!pip install transformers torch pdfplumber python-docx nltk faiss-cpu sentence-transformers


In [17]:

import pdfplumber
from docx import Document
from transformers import pipeline
import torch
import nltk
import os
import time
import faiss
from sentence_transformers import SentenceTransformer

nltk.download('punkt')

device = 0 if torch.cuda.is_available() else -1

models = {
    # "distilbert": pipeline("question-answering", model="distilbert-base-uncased-distilled-squad", device=device),
    "falconsai": pipeline("question-answering", model="Falconsai/question_answering_v2", device=device),
    "bert": pipeline("question-answering", model="usamakenway/Bert-question-answering-optimized", device=device),
    "roberta": pipeline("question-answering", model="deepset/xlm-roberta-large-squad2", device=device)
}

# embeddings
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Device set to use cuda:0


Device set to use cuda:0
Some weights of the model checkpoint at deepset/xlm-roberta-large-squad2 were not used when initializing XLMRobertaForQuestionAnswering: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing XLMRobertaForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


In [4]:

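def extract_text(file_path):
    # Pick the extraction path from the file extension (case-insensitive).
    extension = os.path.splitext(file_path)[1].lower()
    text = ""

    if extension == '.pdf':
        with pdfplumber.open(file_path) as pdf:
            for page in pdf.pages:
                page_text = page.extract_text()
                # extract_text() may return None for pages without a text layer.
                if page_text:
                    text += page_text + "\n"

    elif extension == '.docx':
        doc = Document(file_path)
        for para in doc.paragraphs:
            text += para.text + "\n"

    else:
        # Fail loudly instead of silently returning an empty string.
        raise ValueError(f"Unsupported file type: {extension!r}")

    # Collapse all runs of whitespace (including line breaks) into single spaces.
    return ' '.join(text.split())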


In [5]:

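def chunk_text(text, max_tokens=400, overlap=30):
    # Pack whole sentences into chunks of at most ~max_tokens whitespace-separated
    # words (an approximation of model tokens, not the tokenizer's own count).
    sentences = nltk.sent_tokenize(text)
    chunks, current_chunk, current_tokens = [], [], 0

    for sentence in sentences:
        tokens = len(sentence.split())
        # `current_chunk and` avoids emitting an empty chunk when the very first
        # sentence is already longer than max_tokens.
        if current_chunk and current_tokens + tokens > max_tokens:
            chunks.append(" ".join(current_chunk))
            # Carry over trailing sentences totalling at most `overlap` words.
            # The original slice current_chunk[-overlap:] kept the last 30
            # *sentences*, which usually retained the entire chunk.
            kept, kept_tokens = [], 0
            for prev in reversed(current_chunk):
                n = len(prev.split())
                if kept_tokens + n > overlap:
                    break
                kept.insert(0, prev)
                kept_tokens += n
            current_chunk, current_tokens = kept, kept_tokens
        current_chunk.append(sentence)
        current_tokens += tokens

    if current_chunk:
        chunks.append(" ".join(current_chunk))

    return chunks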
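
The packing-with-overlap idea is easier to see on a tiny input. The following is a minimal, dependency-free sketch, not the notebook's own function: `split_sentences` and `pack_sentences` are hypothetical helpers, the regex splitter is a naive stand-in for `nltk.sent_tokenize`, sizes are counted in whitespace-separated words rather than model tokens, and the overlap budget (`overlap_words`) is assumed to be measured in words.

```python
import re

def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def pack_sentences(sentences, max_words=8, overlap_words=3):
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            # Keep only the trailing sentences that fit in the overlap budget.
            kept, kept_words = [], 0
            for prev in reversed(current):
                m = len(prev.split())
                if kept_words + m > overlap_words:
                    break
                kept.insert(0, prev)
                kept_words += m
            current, count = kept, kept_words
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

text = "Asthma is chronic. It affects airways. Inhalers help. Use a spacer. Shake it first."
demo_chunks = pack_sentences(split_sentences(text))
for chunk in demo_chunks:
    print(chunk)
```

With these toy limits the text yields two chunks, and the second one starts by repeating the last sentence of the first. A single sentence longer than `max_words` still becomes its own oversized chunk, since sentences are never split.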


In [6]:

def build_faiss_index(chunks):
    embeddings = embedder.encode(chunks, convert_to_numpy=True)
    dim = embeddings.shape[1]
    index = faiss.IndexFlatL2(dim)
    index.add(embeddings)
    return index, embeddings
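
`faiss.IndexFlatL2` performs an exact, brute-force search and reports squared Euclidean distances. As a rough mental model only, the sketch below reproduces that ranking in plain Python on toy 2-D vectors. `flat_l2_search` is a hypothetical helper, not part of FAISS, and the vectors are made up; real embeddings from `all-MiniLM-L6-v2` have 384 dimensions.

```python
def flat_l2_search(vectors, query, top_k):
    # Squared Euclidean distance from the query to every stored vector.
    scored = [
        (sum((a - b) ** 2 for a, b in zip(vec, query)), i)
        for i, vec in enumerate(vectors)
    ]
    scored.sort()  # smallest distance first; ties broken by index
    top = scored[:top_k]
    return [d for d, _ in top], [i for _, i in top]

# Toy "embeddings" (assumption): four points in the plane.
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
toy_distances, toy_indices = flat_l2_search(toy_vectors, [0.9, 0.1], top_k=2)
print(toy_indices)
```

The query is closest to the vector at position 1, then to the one at position 0, so those two positions come back in that order. FAISS returns the same kind of `(distances, indices)` pair, as arrays with one row per query.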


In [18]:

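def retrieve_chunks(question, chunks, index, top_k=10):
    q_emb = embedder.encode([question], convert_to_numpy=True)
    # Never ask for more neighbours than there are chunks: FAISS pads missing
    # results with -1, which would silently select chunks[-1].
    top_k = min(top_k, len(chunks))
    distances, indices = index.search(q_emb, top_k)
    return [chunks[i] for i in indices[0] if i >= 0]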


In [19]:

def answer_with_faiss(question, chunks, index, model_key="roberta", top_k=5):
    qa_model = models[model_key]
    retrieved = retrieve_chunks(question, chunks, index, top_k=top_k)

    answers = []
    for chunk in retrieved:
        try:
            result = qa_model(question=question, context=chunk)
            answers.append(result)
        except Exception as e:
            print(f"Error: {e}")

    if answers:
        best = max(answers, key=lambda x: x['score'])
        return best, retrieved
    else:
        return None, []
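
The selection step can be tried without loading any model. The sketch below uses stubbed result dictionaries with the keys the `question-answering` pipeline returns (`score`, `start`, `end`, `answer`); the values are invented for illustration. `pick_best` and its `min_score` threshold are hypothetical additions, shown as one possible way to reject weak or empty candidates before taking the maximum.

```python
def pick_best(answers, min_score=0.0):
    # Drop empty answers and those below the (hypothetical) score threshold.
    candidates = [
        a for a in answers
        if a["answer"].strip() and a["score"] >= min_score
    ]
    # Same selection rule as above: highest score wins; None if nothing is left.
    return max(candidates, key=lambda a: a["score"]) if candidates else None

# Stubbed per-chunk results (made-up values, not real model output).
stub_answers = [
    {"score": 0.12, "start": 10, "end": 18, "answer": "a spacer"},
    {"score": 0.71, "start": 4, "end": 21, "answer": "a metered aerosol"},
    {"score": 0.90, "start": 0, "end": 0, "answer": ""},
]
best = pick_best(stub_answers, min_score=0.2)
print(best["answer"] if best else "N/A")
```

Here the highest-scoring entry is discarded because its answer is empty, and the first entry falls below the threshold, so the second one is selected.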


In [9]:
pdfText = extract_text("doencas_respiratorias_cronicas.pdf")
docText = extract_text("DICIONARIO_DE_DADOS.docx")

In [10]:
import nltk
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [22]:
chunks = chunk_text(pdfText)
index, embeddings = build_faiss_index(chunks)

In [27]:


# English: "What is the aerosol and what is its chemical composition?"
# The question stays in Portuguese because the source PDF is in Portuguese.
question = "O que é o aerosol e qual sua composição química ?"

for model_key in ["roberta"]:
    start = time.time()
    result, retrieved = answer_with_faiss(question, chunks, index, model_key=model_key, top_k=2)
    elapsed = time.time() - start
    print(f"\nModel: {model_key}")
    print(f"Chunks: {retrieved}")
    # Guard the score lookup too: result is None when no chunk yields an answer.
    if result:
        print(f"Answer: {result['answer']} (score={result['score']:.2f})")
    else:
        print("Answer: N/A")
    print(f"Time: {elapsed:.2f} seconds")


You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset



Model: roberta
Chunks: ['108 Técnica de uso do aerossol dosimetrado com espaçador e máscara Essa técnica está indicada para uso em crianças menores de seis anos, idosos ou pessoas em crise. Preparação do dispositivo • Montar o espaçador, quando necessário. • Retirar a tampa do aerossol dosimetrado. • Agitar o inalador vigorosamente antes do uso. • Acoplar o aerossol dosimetrado ao espaçador, posicionando-o verticalmente (formando um L). Preparação do usuário • Se possível, o usuário deve estar de pé ou com o tronco reto. Quando criança, colocá-la sentada ou no colo, com o tronco reto, com a cabeça erguida (não permitir o uso de chupeta durante a aplicação). • Colocar a máscara sobre o nariz e a boca. A máscara deve ser adequada ao tamanho do usuário, para evitar vazamentos. • Pressionar o cilindro metálico com o dedo indicador na parte de cima. • Com a máscara bem aderida ao rosto, respirar normalmente por 20 a 30 segundos. • Quando prescrito mais de um jato, significa que a técnica 