# Document Question Answering System 

## 1. Document Preprocessing 

### 1a) Read document PDF file  

In [38]:
import os 
import numpy as np 
import pickle
import warnings
warnings.filterwarnings('ignore')

In [2]:
from PyPDF2 import PdfReader
document_path = r"shuttle_operations_manual.pdf"

In [7]:
reader = PdfReader(document_path)
number_of_pages = len(reader.pages)
content = []
for i in range(number_of_pages):
    page = reader.pages[i]
    text = page.extract_text()
    content.append(text.replace('\n', ''))
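The loop above strips line breaks with `text.replace('\n', '')`, which can fuse the last word of one line with the first word of the next and leaves fragments such as `in-clude` in the page 221 output further down (presumably a word hyphenated at a line break; that is an assumption about the raw extraction). A minimal sketch of a gentler cleanup follows. `clean_page_text` is a hypothetical helper, not part of PyPDF2.

```python
import re

def clean_page_text(text):
    """Flatten one page of extracted text into a single line."""
    if not text:
        return ""
    # Rejoin words split by a hyphen at a line break: "in-\nclude" -> "include".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Turn the remaining line breaks into spaces so neighbouring words do not fuse.
    text = text.replace("\n", " ")
    # Collapse runs of whitespace left over from the PDF layout.
    return re.sub(r"\s+", " ", text).strip()

print(clean_page_text("Preflight requirements in-\nclude a projection\nof mission  dosage."))
# Preflight requirements include a projection of mission dosage.
```

One trade-off: a genuine compound that happens to break at its hyphen (for example `In-` / `flight`) would also be joined, so the rule is a heuristic rather than a guaranteed repair.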

In [8]:
number_of_pages

1161

In [26]:
# with open('manual.pkl', 'wb') as f:
#     pickle.dump(content, f)

In [17]:
content[221]

'USA007587 Rev. A 2. SYSTEMS 2.5 Crew Systems 2.5-14 MS BIOMED Input Connector on Panel A11  BIOMED Rotary Switches on Panel R10 Radiation Equipment The harmful biological e ffects of radiation must be minimized through mission planning based on calculated predictions and by monitoring dosage exposures.  Preflight requirements in-clude a projection of mission radiation dosage, an assessment of the probability of solar flares during the mission, and a radiation exposure history of flight crewmembers.  In-flight requirements mandate that each crewmember carry a passive dosimeter throughout the duration of the flight.  In the event of a solar flare or other radiation contingency, the crew would be requested to retrieve and read out one or more of the active dosimeters.   The space shuttle radiation instrumentation system consists of both active and passive dosimeter devices.  Active and passive radiation dosimetry devices include crew passive dosimeters (CPDs), passive radiation dosimeter

### 1b) Create Embeddings for Each Page  

In [18]:
from sentence_transformers import SentenceTransformer 
model = SentenceTransformer('all-MiniLM-L6-v2') 

In [19]:
embeddings = model.encode(content)

In [21]:
embeddings.shape 

(1161, 384)

In [28]:
# with open('vectors.pkl', 'wb') as f:
#     pickle.dump(embeddings, f)
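The two commented-out `pickle.dump` cells suggest the page text and the embeddings were cached by hand. A small sketch of a load-or-compute wrapper, assuming the cache files sit next to the notebook, could look like this. `load_or_compute` is a hypothetical helper name.

```python
import os
import pickle

def load_or_compute(path, compute):
    """Return the object pickled at `path`; compute and save it if the file is absent."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    value = compute()
    with open(path, "wb") as f:
        pickle.dump(value, f)
    return value

# Possible use in this notebook (not executed here, `model` and `content` come from earlier cells):
# embeddings = load_or_compute('vectors.pkl', lambda: model.encode(content))
```

Pickle files should only be loaded from a trusted source, and a stale cache has to be deleted by hand if the PDF or the embedding model changes.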

## 2. Ask Question about The Document 

### 2a) Convert The Question into Embeddings 

In [33]:
question = 'How to start the engine during takeoff?'

In [34]:
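def get_top3(qvector, pvectors, n):
    """Return the indices of the n pages most similar to the question.

    Indices come back in ascending order of similarity, so the best match is last.
    """
    # Cosine similarity: dot product divided by the product of the vector norms.
    dot_prod = np.sum(qvector*pvectors, axis=1)
    pnorm = np.sqrt(np.sum(pvectors**2, axis=1))
    qnorm = np.sqrt(np.sum(qvector**2))
    sim = dot_prod/(pnorm*qnorm)
    results = sim.argsort()[-n:]
    return results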
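To make the ranking easy to check by hand, here is a dependency-free sketch of the same idea on toy 2-D vectors. It differs from `get_top3` in one deliberate way: it returns the best match first, whereas `argsort()[-n:]` yields ascending order. `cosine` and `rank_pages` are hypothetical names, and the vectors are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_pages(qvector, pvectors, n):
    """Return the indices of the n most similar pages, best match first."""
    scores = [cosine(qvector, p) for p in pvectors]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n]

# Toy "page" vectors and a query that points mostly along the first axis.
pages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.2]
print(rank_pages(query, pages, 3))  # [0, 2, 1]
```

Page 0 is nearly parallel to the query (similarity about 0.98), page 2 comes next (about 0.83), and page 1 is close to orthogonal (about 0.20); page 3 points the opposite way and is dropped.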

### 2b) Find Most Relevant Pages

In [39]:
question_vec = model.encode([question])[0]     
top3 = get_top3(question_vec, embeddings, 3)
print(top3)
print(content[top3[0]])   
print(content[top3[1]])  
print(content[top3[2]])  
# Note: argsort is ascending, so top3[0] is the third-best match and top3[-1] the best.
document = content[top3[0]]

[ 88 852  31]
 USA007587  Rev. A  2. SYSTEMS  2.1 Auxiliary Power Unit/Hydraulics (APU/HYD) 2.1-7  NOTE A barberpole APU/HYD READY TO START talkback will not inhibit a start. APU OPERATE 1, 2, 3  switches are located on panel R2.  When the switches are positioned to START/RUN , the corresponding APU controller activates the start of that unit and removes electrical power automatically from the unit’s gas generator and fuel pump heaters. To start the APU, fuel expelled from the hydrazine tank flows through the open tank valves and filter to the gas generator valve module, which contains the primary and secondary fuel control valves in series.  The primary pulse control valve is normally open, and the secondary pulse control valve is energized open.  Fuel flowing through the pump bypass valve is directed to the gas generator, because the fuel pump is not being driven at that moment by the APU turbine. The fuel in the gas generator decomposes through catalytic reaction, creates hot gas, a

## 3. Prompt Template

In [41]:
prompt = f'''
Please respond as if you were talking to a non-expert. Use analogies. 
Given the following extracted parts of an operating manual of a space shuttle and a question, 
create a final answer with references from the provided document.
If you don't know the answer, just say that you don't know. 
Don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer.
DOCUMENT: {document}
QUESTION: {question}
...
=========
FINAL ANSWER:
SOURCES:
'''
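The template above is filled with a single page (`document`), although three pages are retrieved. A hedged sketch of how several pages might be packed into one prompt, each tagged with its page index so the model has something to cite under SOURCES, is shown below. `build_prompt` and the `max_chars` budget are hypothetical, and the budget counts characters, not tokens.

```python
def build_prompt(question, pages, max_chars=4000):
    """Assemble a prompt from several retrieved pages.

    `pages` is a list of (page_index, text) pairs, best match first.
    `max_chars` is a rough character budget for the document part.
    """
    parts = []
    used = 0
    for page_index, text in pages:
        remaining = max_chars - used
        if remaining <= 0:
            break  # budget spent, skip the lower-ranked pages
        # Tag each page and truncate the last one if it overshoots the budget.
        block = f"[page index {page_index}] {text}"[:remaining]
        parts.append(block)
        used += len(block)
    document = "\n".join(parts)
    return (
        "Given the following extracted parts of an operating manual of a space shuttle "
        "and a question, create a final answer with references from the provided document.\n"
        "If you don't know the answer, just say that you don't know.\n"
        'ALWAYS return a "SOURCES" part in your answer.\n'
        f"DOCUMENT: {document}\n"
        f"QUESTION: {question}\n"
        "=========\n"
        "FINAL ANSWER:\n"
    )

# Possible use in this notebook (not executed here):
# prompt = build_prompt(question, [(i, content[i]) for i in top3[::-1]])
```

Reversing `top3` in the usage line puts the best match first, so that truncation, if any, falls on the weakest page.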

In [42]:
from llama_cpp import Llama

model_path = r"Mistral-7B-Instruct-v0.1-GGUF\mistral-7b-instruct-v0.1.Q4_K_M.gguf" 
llm = Llama(
    model_path=model_path,
    n_ctx=3000,  # Context length to use
    n_gpu_layers=64
)

generation_kwargs = {
    "max_tokens":20000,  # Upper bound only; prompt plus output must still fit in n_ctx
    "stop":["</s>"],
    "echo":False, # Do not echo the prompt in the output
    "top_k":10 
}


llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from C:\Users\User\Downloads\Mistral-7B-Instruct-v0.1-GGUF\mistral-7b-instruct-v0.1.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.1
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
ll
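The prompt and the generated tokens share one context window, so with `n_ctx=3000` a `max_tokens` of 20000 can never be reached. The sketch below makes that budget explicit using the 1146-token prompt reported in the timing log of this run. `clamp_max_tokens` is a hypothetical helper, not part of llama_cpp.

```python
def clamp_max_tokens(n_ctx, prompt_tokens, requested):
    """Largest completion length that still fits in the context window."""
    available = n_ctx - prompt_tokens
    if available <= 0:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens but n_ctx is only {n_ctx}"
        )
    return min(requested, available)

# Figures from this notebook: n_ctx=3000, a 1146-token prompt, max_tokens=20000.
print(clamp_max_tokens(n_ctx=3000, prompt_tokens=1146, requested=20000))  # 1854

# In the notebook the prompt length could be measured before generating, e.g.
# prompt_tokens = len(llm.tokenize(prompt.encode("utf-8")))
```

If longer prompts are needed, for example several retrieved pages at once, raising `n_ctx` is the other lever; the loader log reports a trained context length of 32768 for this model.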

In [43]:
res = llm(prompt, **generation_kwargs) 
print(res["choices"][0]["text"])


llama_print_timings:        load time =   13259.71 ms
llama_print_timings:      sample time =      47.90 ms /   531 runs   (    0.09 ms per token, 11084.67 tokens per second)
llama_print_timings: prompt eval time =   31164.12 ms /  1146 tokens (   27.19 ms per token,    36.77 tokens per second)
llama_print_timings:        eval time =   68997.34 ms /   530 runs   (  130.18 ms per token,     7.68 tokens per second)
llama_print_timings:       total time =  101010.25 ms /  1676 tokens


USA007587 Rev. A 2. SYSTEMS 2.1 Auxiliary Power Unit/Hydraulics (APU/HYD) 2.1-7

To start the engine during takeoff, you need to activate the auxiliary power unit (APU). You can do this by following these steps:

1. Go to panel R2 on the space shuttle console and locate the APU OPERATE 1, 2, 3 switches.
2. Position the switch(s) you want to start to the START/RUN position.
3. The corresponding APU controller will automatically remove electrical power from the unit's gas generator and fuel pump heaters.
4. Fuel flows through the open tank valves and filter to the gas generator valve module, which contains the primary and secondary fuel control valves in series.
5. The primary pulse control valve is normally open, and the secondary pulse control valve is energized open.
6. Fuel flowing through the pump bypass valve is directed to the gas generator, because the fuel pump is not being driven at that moment by the APU turbine.
7. The fuel in the gas generator decomposes through catalytic re