## **Set Up Rich Pretty Printing**

In [8]:
from rich.pretty import pprint
from rich import print as fprint
from rich import inspect

## **Load PDF**

In [9]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.documents import Document

loader: PyPDFLoader = PyPDFLoader(file_path="../assets/NIPS-2017-attention-is-all-you-need-Paper.pdf")

docs: list[Document] = loader.load()
pprint(docs[0])

## **Semantic Chunking**

In [10]:
!ollama list

NAME                     ID              SIZE      MODIFIED     
llama3.1:8b              46e0c10c039e    4.9 GB    2 hours ago     
nomic-embed-text:v1.5    0a109f422b47    274 MB    26 hours ago    
deepseek-r1:1.5b         e0979632db5a    1.1 GB    7 days ago      
llava:13b                0d0eb4d7f485    8.0 GB    2 months ago    
mistral:latest           3944fe81ec14    4.1 GB    2 months ago    


In [11]:
from langchain_experimental.text_splitter import SemanticChunker
from langchain_ollama import OllamaEmbeddings

embeddings: OllamaEmbeddings = OllamaEmbeddings(model="nomic-embed-text:v1.5")
semantic_splitter: SemanticChunker = SemanticChunker(embeddings=embeddings)

docs_chunks: list[Document] = semantic_splitter.split_documents(documents=docs)
fprint("Number of chunks:", len(docs_chunks))
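
To make the idea behind semantic chunking concrete, here is a minimal pure-Python sketch. It assumes each sentence already has an embedding vector and starts a new chunk whenever the cosine distance between neighbouring sentences exceeds a fixed threshold. The helper names (`cosine_distance`, `semantic_chunks`), the toy sentences, the 2-D vectors, and the threshold of 0.5 are all hypothetical. `SemanticChunker` differs in detail (for example, it derives its breakpoint from the distribution of distances rather than from a fixed number), so treat this as an illustration of the idea only.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def semantic_chunks(sentences, vectors, threshold):
    """Group consecutive sentences, starting a new chunk whenever the
    distance between neighbouring sentence vectors exceeds `threshold`."""
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine_distance(vectors[i - 1], vectors[i]) > threshold:
            chunks.append([])  # topic shift detected: open a new chunk
        chunks[-1].append(sentences[i])
    return [" ".join(chunk) for chunk in chunks]


# Toy 2-D vectors standing in for real embeddings (hypothetical values).
toy_sentences = [
    "Attention maps a query to an output.",
    "The output is a weighted sum of values.",
    "The model was trained on GPUs.",
    "Training used the Adam optimizer.",
]
toy_vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]

toy_chunks = semantic_chunks(toy_sentences, toy_vectors, threshold=0.5)
for chunk in toy_chunks:
    print(chunk)
```

The only large distance is between the second and third sentences, so the sketch splits there and keeps each topic in its own chunk.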

## **Creating Embeddings and the Vector Store**

In [12]:
from langchain_chroma.vectorstores import Chroma

vector_store = Chroma.from_documents(
    documents=docs_chunks,
    embedding=embeddings,
    persist_directory="../DB/transformer_paper_db"
)
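
Conceptually, a vector store answers a query by embedding it and returning the stored chunks whose vectors are closest. The sketch below shows that top-k lookup in pure Python with cosine similarity. The names `cosine_similarity`, `top_k`, and `toy_store`, along with the 2-D vectors, are hypothetical stand-ins. Chroma uses its own index and configured distance metric, so this is a conceptual sketch rather than a description of its internals.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, store, k=2):
    """Return the texts of the k entries most similar to the query.

    `store` is a list of (text, vector) pairs.
    """
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy store: hypothetical chunk labels with made-up 2-D vectors.
toy_store = [
    ("chunk about attention", [1.0, 0.0]),
    ("chunk about training", [0.0, 1.0]),
    ("chunk about encoders", [0.8, 0.6]),
]

results = top_k([0.9, 0.1], toy_store, k=2)
print(results)
```

A brute-force scan like this is fine for a handful of vectors. Real vector stores rely on an index so that lookups stay fast as the collection grows.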

## **Built-in RAG**

In [None]:
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.retrieval import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.runnables import RunnableLambda

llm = ChatOllama(model="mistral:latest")

prompt = ChatPromptTemplate.from_template(
    """You are a helpful assistant. Use only the context below to answer. If unsure, say "Based on provided info, I don't know."
    Context: {context}
    Question: {input}
    Answer:"""
)

stuff_chain = create_stuff_documents_chain(
    llm=llm,
    prompt=prompt,
)  # Runnable that fills the prompt's {context} with the retrieved documents, then calls the LLM to generate the answer.

retrieval_chain = create_retrieval_chain(
    retriever=vector_store.as_retriever(search_kwargs={"k":2}),
    combine_docs_chain=stuff_chain
)  # Runnable that retrieves the top-k most similar chunks for the input and passes them to stuff_chain as context.


main_chain = retrieval_chain | RunnableLambda(lambda x : x.get("answer", ""))

query = "What are transformers?"

# Stream the answer piece by piece; flush=True makes each piece appear immediately.
for chunk in main_chain.stream({"input": query}):
    print(chunk, end="", flush=True)

 Transformers, based on the provided context, are model architectures that use a multi-layer, multi-head self-attention mechanism and fully connected feed-forward networks as sub-layers. They employ residual connections and layer normalization in each of their layers to facilitate these connections. The dimensions of the output of all sub-layers and embedding layers are set to 512 (dmodel = 512). In the decoder, an additional third sub-layer is added that performs multi-head attention over the output of the encoder stack. To prevent positions from attending to subsequent positions in the decoder stack, self-attention sub-layers are modified with position masking and offset embeddings.
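
The "stuff" step in the chain above amounts to concatenating the retrieved chunks and placing them in the prompt's `{context}` slot before the model is called. The sketch below reproduces that step with plain string formatting. The function `build_stuffed_prompt`, the shortened template, and the toy chunks are hypothetical, and the blank-line separator is an assumption about how the documents are joined.

```python
def build_stuffed_prompt(template, docs, question):
    """Join the retrieved chunks and insert them into the prompt template."""
    context = "\n\n".join(docs)  # assumed separator between chunks
    return template.format(context=context, input=question)


# Shortened stand-in for the notebook's prompt (hypothetical).
toy_template = "Context: {context}\nQuestion: {input}\nAnswer:"
toy_docs = ["First retrieved chunk.", "Second retrieved chunk."]

stuffed = build_stuffed_prompt(toy_template, toy_docs, "What are transformers?")
print(stuffed)
```

Because every retrieved chunk goes into a single prompt, the value of `k` directly controls how much context the model sees and how long the prompt becomes.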