In [31]:
%pip install langchain chromadb sentence-transformers python-dotenv openai pypdf



Note: you may need to restart the kernel to use updated packages.


In [5]:
%pip install -U langchain-community



In [1]:
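from langchain.text_splitter import RecursiveCharacterTextSplitter
# The integrations below live in langchain-community; importing them from
# the top-level `langchain` package is deprecated.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader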

In [2]:
# Define the directory containing your PDF documents
documents_dir = "../document/"

# Create a DirectoryLoader to load all PDF files from the directory
loader = DirectoryLoader(
    documents_dir,
    glob="**/*.pdf",  # This will load all PDF files recursively
    loader_cls=PyPDFLoader,
    show_progress=True
)

# Load the documents
documents = loader.load()

# Print the number of documents loaded.
# PyPDFLoader returns one Document per PDF page, so this counts pages, not files.
print(f"Loaded {len(documents)} documents")


  0%|          | 0/3 [00:00<?, ?it/s]

100%|██████████| 3/3 [00:00<00:00,  5.11it/s]

Loaded 48 documents
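
The progress bar reports 3 files while the count is 48, because PyPDFLoader produces one Document per page. Below is a minimal sketch for checking how many pages came from each file, assuming every Document carries a `source` entry in its metadata (PyPDFLoader sets one). The helper name `pages_per_source` and the stand-in documents are hypothetical; in the notebook you would pass `documents` instead of `sample_docs`.

```python
from collections import Counter
from types import SimpleNamespace


def pages_per_source(docs):
    """Count how many page-level documents came from each source file."""
    return Counter(doc.metadata.get("source", "<unknown>") for doc in docs)


# Stand-in objects that mimic the .metadata attribute of a loaded Document.
sample_docs = [
    SimpleNamespace(metadata={"source": "a.pdf", "page": 0}),
    SimpleNamespace(metadata={"source": "a.pdf", "page": 1}),
    SimpleNamespace(metadata={"source": "b.pdf", "page": 0}),
]

for source, n_pages in pages_per_source(sample_docs).items():
    print(f"{source}: {n_pages} pages")
```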





In [3]:
# Initialize text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    separators=["\n\n", "\n", " ", ""]
)

# Split documents into chunks
texts = text_splitter.split_documents(documents)

# Print the number of chunks created
print(f"Created {len(texts)} chunks")

Created 217 chunks
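
To make the `chunk_size` and `chunk_overlap` settings concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not how RecursiveCharacterTextSplitter works internally: the real splitter tries the listed separators first so that chunks end on paragraph, line, or word boundaries. The function name `sliding_window_chunks` is hypothetical and only illustrates how consecutive chunks share text.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into chunks of at most chunk_size characters, where each
    chunk repeats the last chunk_overlap characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


demo = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.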


In [4]:
# Initialize embeddings
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Create vector store
vectorstore = Chroma.from_documents(
    documents=texts,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

# Report how many chunks were embedded (one vector per chunk)
print(f"Stored {len(texts)} vectors in the database")



Stored 217 vectors in the database
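
Each chunk is now stored as a vector, and a query is answered by embedding the query and returning the nearest stored vectors. The toy sketch below ranks three hand-written vectors by cosine similarity. It is an illustration only: the vectors, the names `cosine_similarity`, `top_k`, and `toy_store` are made up, and the distance metric Chroma actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Return the names of the k stored vectors most similar to query_vec."""
    ranked = sorted(
        stored.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions.
toy_store = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.9, 0.1, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
}

print(top_k([1.0, 0.05, 0.0], toy_store, k=2))
```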


In [5]:
results = vectorstore.similarity_search(
    "what are the main principle of Attention Block",
    k=5,
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

* Recurrent models typically factor computation along the symbol positions of the input and output
sequences. Aligning the positions to steps in computation time, they generate a sequence of hidden
states ht, as a function of the previous hidden state ht−1 and the input for position t. This inherently
sequential nature precludes parallelization within training examples, which becomes critical at longer
sequence lengths, as memory constraints limit batching across examples. Recent work has achieved
signiﬁcant improvements in computational efﬁciency through factorization tricks [18] and conditional
computation [26], while also improving model performance in case of the latter. The fundamental
constraint of sequential computation, however, remains.
Attention mechanisms have become an integral part of compelling sequence modeling and transduc-
tion models in various tasks, allowing modeling of dependencies without regard to their distance in [{'author': 'Ashish Vaswani, Noam Shazeer, Niki 

In [6]:
from typing import List

def search_documents(
    query: str,
    vectorstore: Chroma,
    k: int = 5
) -> List[str]:
    """
    Perform similarity search on the vector store using the provided query.
    
    Args:
        query (str): The search query from the user
        vectorstore (Chroma): The initialized Chroma vector store
        k (int, optional): Number of results to return. Defaults to 5.
    
    Returns:
        List[str]: List of page content from the documents
    """
    try:
        results = vectorstore.similarity_search(
            query,
            k=k
        )
        return [doc.page_content for doc in results]
    except Exception as e:
        print(f"Error performing similarity search: {str(e)}")
        return []
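
Since `search_documents` returns a list of strings, the chunks usually need to be joined into a single text block before they go into a prompt. A minimal sketch of one way to do that under a character budget follows. The helper `build_context`, its `max_chars` default, and the separator are assumptions for illustration, not part of LangChain.

```python
def build_context(chunks, max_chars=4000, separator="\n\n---\n\n"):
    """Join chunks in ranked order until adding another would exceed max_chars."""
    selected = []
    used = 0
    for chunk in chunks:
        # The separator is only needed between chunks, not before the first one.
        cost = len(chunk) + (len(separator) if selected else 0)
        if used + cost > max_chars:
            break
        selected.append(chunk)
        used += cost
    return separator.join(selected)


demo_context = build_context(
    ["first chunk", "second chunk", "third chunk"],
    max_chars=30,
    separator="\n",
)
print(demo_context)
```

Because the chunks arrive ordered by similarity, cutting from the end drops the least relevant text first.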

In [9]:
import os
from dotenv import load_dotenv
from openai import OpenAI
load_dotenv()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
client = OpenAI(api_key = OPENAI_API_KEY)
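
If the `.env` file is missing or the variable is misspelled, `os.getenv` returns `None` and the failure only surfaces later. A small sketch of a fail-fast check is shown below; `require_env` is a hypothetical helper and `DEMO_KEY` is a placeholder variable used only for the demonstration.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file"
        )
    return value


# Demonstration with a placeholder variable instead of a real API key.
os.environ["DEMO_KEY"] = "abc"
print(require_env("DEMO_KEY"))
```

In the notebook this would be called as `require_env('OPENAI_API_KEY')` after `load_dotenv()`.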


In [8]:
query = "What are the main principles of Attention Block?"

# Join the retrieved chunks into one text block; interpolating the raw list
# would put Python list syntax and escaped newlines into the prompt.
context = "\n\n".join(search_documents(query, vectorstore))

systemPrompt = f"""You are a helpful assistant. You will be given a text and must answer the user's question based on that text.

Text:
{context}
"""
conversationHistory = [
    {"role": "system", "content": systemPrompt},
    {"role": "user", "content": query}
]

def Answer_Question(conversationHistory):
    """Send the conversation to the chat model and return the raw response."""
    response = client.chat.completions.create(
        model="o3-mini",
        messages=conversationHistory,
    )
    return response

response = Answer_Question(conversationHistory)
print(response.choices[0].message.content)

Based on the text—and what is generally understood about attention mechanisms in sequence models—the main principles behind an Attention Block are as follows:

1. Dependency Modeling Beyond Sequential Order: Unlike recurrent models that build dependencies step‐by‐step in a fixed sequence order, the attention mechanism can relate any positions within an input (or between an input and output) irrespective of how far apart they are. This means that dependencies between symbols aren’t bound by their sequential proximity, allowing the model to capture long‐range relationships inherently.

2. Parallelization: Because the attention mechanism computes relationships (or “attention weights”) between all positions concurrently, it avoids the inherently sequential computation of recurrent models. This greatly improves computational efficiency, especially for long sequences where recurrence tends to slow down training.

3. Weighted Aggregation of Information: Attention operates by computing a set o