### Load the document list

In [1]:
import os

directory_path = 'data/'
document_list = os.listdir(directory_path)
print(document_list)

['Attention-is-All-You-Need.pdf', 'GPT-4-Technical-Report.pdf', 'Lets-Verify-Step-by-Step.pdf', 'Sparks-of-AGI.pdf', 'STaR-Self-Taught-Reasoner.pdf', 'Tree-of-Thoughts.pdf']


### Parse documents into text chunks and build their metadata

In [2]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from pypdf import PdfReader
import uuid

documents = []
metadatas = []
ids = []

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap  = 100,
    length_function = len,
    is_separator_regex = False,
)

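for document_name in document_list:
    # os.listdir returns every entry in the directory, so skip non-PDF files
    if not document_name.lower().endswith('.pdf'):
        continue
    reader = PdfReader(os.path.join(directory_path, document_name))
    # page_number is the zero-based index of the page within the PDF
    for page_number, page in enumerate(reader.pages):
        page_text = page.extract_text()
        # Each page is split separately, so no chunk spans two pages
        chunks = text_splitter.create_documents([page_text])
        for chunk in chunks:
            documents.append(chunk.page_content)
            metadatas.append({'source': document_name, 'page_number': page_number})
            ids.append(str(uuid.uuid4()))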
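
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-size sliding window over characters. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph and line breaks); the function name `sliding_window_chunks` and the toy input are assumptions made for illustration only.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=100):
    """Split text into fixed-size character windows that overlap (illustrative helper)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 3  # 30 characters of toy text
demo_chunks = sliding_window_chunks(sample, chunk_size=12, chunk_overlap=4)
for chunk in demo_chunks:
    print(repr(chunk))
```

Each chunk repeats the last 4 characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one of the two neighbouring chunks when it is shorter than the overlap.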

### Create DB connection and embeddings function
This example uses an in-memory database and a free pretrained embedding model from https://www.sbert.net/docs/pretrained_models.html

In [3]:
import chromadb
from chromadb.utils import embedding_functions

chroma_client = chromadb.EphemeralClient()
sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")

### Create the collection and store the document chunks as embeddings in the vector database

In [4]:
collection = chroma_client.create_collection(name="pdf_data", embedding_function=sentence_transformer_ef)
collection.add(
    documents=documents,
    metadatas=metadatas,
    ids=ids
)
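
For intuition about what a vector database does with these embeddings, the sketch below ranks a few toy vectors by cosine similarity to a query vector. The three-dimensional vectors and the helper names `cosine_similarity` and `top_k` are made up for illustration; real embeddings from the model above have far more dimensions, and the distance metric the collection actually uses depends on how Chroma is configured, so this is not a description of its internals.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), index)
              for index, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [index for _, index in scored[:k]]


# Toy "embeddings": the first two point in almost the same direction as the query
toy_docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
toy_query = [1.0, 0.0, 0.0]
ranking = top_k(toy_query, toy_docs, k=2)
print(ranking)
```

A brute-force scan like this is fine for a handful of vectors; vector databases exist largely to avoid it at scale.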

### Get the user query (it is later matched against the DB and combined with the retrieved context)

In [5]:
user_querry = "Hi there! I would like to know why self-attention module is so important?"

### Connect to the LLM and extract key information from the user prompt
Searching the vector DB with the condensed query gives much better results than searching with the raw user message.

A smaller LLM can perform this extraction to keep costs down.

In [6]:
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
  model="gpt-3.5-turbo-1106",
  temperature=0.0,
  max_tokens=1000,
  messages=[
    {"role": "system", "content": "You will convert user question to string suitable for data base search. Have in mind that we need only essence for search."},
    {"role": "user", "content": user_querry}
  ]
)

user_querry_sumerized = completion.choices[0].message.content
print(user_querry_sumerized)

"importance of self-attention module"


### Find relevant .pdf files that might contain the answer for the user

In [7]:
results = collection.query(
    query_texts=[user_querry_sumerized],
    n_results=10
)

# Context could be improved by also fetching adjacent chunks from the same document, using the stored metadata.
# For this proof of concept, simply combining the most relevant chunks is good enough.
context = results["documents"][0]
print(f"Context: {context}")

sources = {source['source'] for source in results["metadatas"][0]}
print(f"Sources: {sources}")

Context: ['the number of operations required to relate signals from two arbitrary input or output positions grows\nin the distance between positions, linearly for ConvS2S and logarithmically for ByteNet. This makes\nit more difﬁcult to learn dependencies between distant positions [ 11]. In the Transformer this is\nreduced to a constant number of operations, albeit at the cost of reduced effective resolution due\nto averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as\ndescribed in section 3.2.\nSelf-attention, sometimes called intra-attention is an attention mechanism relating different positions\nof a single sequence in order to compute a representation of the sequence. Self-attention has been\nused successfully in a variety of tasks including reading comprehension, abstractive summarization,\ntextual entailment and learning task-independent sentence representations [4, 22, 23, 19].', '[19] Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, M
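
The adjacent-chunk idea mentioned in the code comment could start from something like the sketch below, which expands each hit to its neighbouring pages using only the `source` and `page_number` metadata stored earlier. The helper name `adjacent_pages`, the `window` parameter, and the toy hits are hypothetical; fetching the chunks for the resulting pairs (for example through a metadata filter on the collection) is left out here and would need to be checked against the vector store's documentation.

```python
def adjacent_pages(metadatas, window=1):
    """Return sorted (source, page_number) pairs covering each hit and its neighbours."""
    wanted = set()
    for meta in metadatas:
        for offset in range(-window, window + 1):
            page = meta["page_number"] + offset
            # Page indices are zero-based, so negative values cannot exist.
            # Pages past the end of a file are not clipped here; they would
            # simply match nothing when looked up.
            if page >= 0:
                wanted.add((meta["source"], page))
    return sorted(wanted)


# Toy metadata in the same shape as results["metadatas"][0]
toy_hits = [
    {"source": "a.pdf", "page_number": 0},
    {"source": "a.pdf", "page_number": 1},
    {"source": "b.pdf", "page_number": 5},
]
neighbours = adjacent_pages(toy_hits)
print(neighbours)
```

Using a set removes duplicates when two hits sit on neighbouring pages of the same file.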

### Prepare system message for LLM

In [8]:
# Basic system message; it could be improved by accounting for the fact that the context is fragmented across different files.
# Few-shot prompt examples would improve answer accuracy and consistency, but they are out of scope for this POC.
system_message = f"""\
Context information is below:
---------------------
{context}
---------------------
Given the context information and not prior knowledge answer user question.
If context does NOT contain answer, tell the user you didn't find answer.
If context contain answer - append list of source .pdf names from below:
---------------------
{sources}
---------------------
"""

print(system_message)

Context information is below:
---------------------
['the number of operations required to relate signals from two arbitrary input or output positions grows\nin the distance between positions, linearly for ConvS2S and logarithmically for ByteNet. This makes\nit more difﬁcult to learn dependencies between distant positions [ 11]. In the Transformer this is\nreduced to a constant number of operations, albeit at the cost of reduced effective resolution due\nto averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as\ndescribed in section 3.2.\nSelf-attention, sometimes called intra-attention is an attention mechanism relating different positions\nof a single sequence in order to compute a representation of the sequence. Self-attention has been\nused successfully in a variety of tasks including reading comprehension, abstractive summarization,\ntextual entailment and learning task-independent sentence representations [4, 22, 23, 19].', '[19] Zhouhan Lin,
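
As the printed message shows, the context is interpolated as the raw Python list, with brackets, quotes, and escaped `\n` sequences. One possible improvement, sketched below under the assumption that the metadata returned by the query is kept alongside the chunks, is to render each chunk as a numbered block labelled with its source. The helper name `format_context` and the toy inputs are illustrative and not part of the original notebook.

```python
def format_context(documents, metadatas):
    """Render retrieved chunks as numbered, source-labelled blocks of plain text."""
    blocks = []
    for number, (text, meta) in enumerate(zip(documents, metadatas), start=1):
        header = f"[{number}] {meta['source']}, page index {meta['page_number']}"
        blocks.append(f"{header}\n{text.strip()}")
    # A blank line between blocks keeps chunk boundaries visible to the model
    return "\n\n".join(blocks)


toy_context = format_context(
    ["first chunk text\n", "second chunk text"],
    [
        {"source": "a.pdf", "page_number": 3},
        {"source": "b.pdf", "page_number": 0},
    ],
)
print(toy_context)
```

With the notebook's variables, the call would take `results["documents"][0]` and `results["metadatas"][0]`, and the returned string would replace `{context}` in the system message.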

### Generate the answer to the user question with a high-quality LLM and the provided context

In [9]:
completion = client.chat.completions.create(
  model="gpt-3.5-turbo-1106",
  temperature=0.0,
  max_tokens=3000,
  messages=[
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_querry}
  ]
)

answer = completion.choices[0].message.content
print(answer)

The self-attention module is important because it allows the model to relate different positions within a single sequence in order to compute a representation of the sequence. This mechanism has been successfully used in various natural language processing tasks such as reading comprehension, abstractive summarization, textual entailment, and learning task-independent sentence representations. Additionally, self-attention has the advantage of connecting all positions with a constant number of sequentially executed operations, making it faster and more space-efficient in practice compared to other mechanisms like recurrent layers. The self-attention module also has the potential to yield more interpretable models, as it allows for the inspection of attention distributions and the identification of different tasks performed by individual attention heads.

Sources:
- 'Attention-is-All-You-Need.pdf'
