# Local RAG Demo

## Document Loading

In [16]:
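import os

# Set a user agent before importing the document loaders
os.environ['USER_AGENT'] = 'myagent'

# PyPDFLoader lives in langchain_community in current LangChain releases
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the PDF; each page becomes one Document
loader = PyPDFLoader("./Billy_Chan_Resume.pdf")
data = loader.load()

# Split the pages into chunks of at most 100 characters, overlapping by 10
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=10)
all_splits = text_splitter.split_documents(data)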
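
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal pure-Python sketch of fixed-width chunking with overlap. It is a simplification, not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph and line boundaries). The helper name `split_with_overlap` and the sample text are hypothetical.

```python
def split_with_overlap(text, chunk_size=100, chunk_overlap=10):
    """Split text into fixed-width character windows that overlap.

    Simplified illustration only: the real splitter first tries to break
    on separators (paragraphs, lines, spaces) rather than at fixed offsets.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "x" * 250  # hypothetical stand-in for a page of text
chunks = split_with_overlap(sample, chunk_size=100, chunk_overlap=10)
print([len(c) for c in chunks])  # [100, 100, 70]
```

With a small `chunk_size` such as 100, each chunk carries little context, so retrieval returns short fragments. The overlap keeps text near a boundary from being cut off in both neighbouring chunks.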

## Initialize the vector store with nomic-embed-text

In [17]:
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings

local_embeddings = OllamaEmbeddings(model="nomic-embed-text")

vectorstore = Chroma.from_documents(documents=all_splits, embedding=local_embeddings)

Test that similarity search is working:

In [18]:
question = "Who is Billy Chan?"
docs = vectorstore.similarity_search(question)
len(docs)

4
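
The result of 4 matches the default `k=4` of `similarity_search`; passing `k=` changes how many chunks come back. As a rough mental model, the store embeds the query and returns the `k` stored chunks whose vectors are closest to it. The sketch below uses cosine similarity on made-up 3-dimensional vectors. The function names, the toy vectors, and the choice of cosine similarity are assumptions for illustration, not Chroma's actual implementation or its default distance metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=4):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical chunks with made-up embeddings
toy_store = [
    ("chunk-a", [1.0, 0.0, 0.0]),
    ("chunk-b", [0.8, 0.6, 0.0]),
    ("chunk-c", [0.0, 1.0, 0.0]),
    ("chunk-d", [0.0, 0.0, 1.0]),
    ("chunk-e", [0.6, 0.0, 0.8]),
]
results = top_k([1.0, 0.0, 0.0], toy_store, k=2)
print([text for text, _ in results])  # ['chunk-a', 'chunk-b']
```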

## Load the LLM from Ollama

In [5]:
from langchain_ollama import ChatOllama

model = ChatOllama(
    model="llama3.2:3b",
)

Test the model's response without any retrieved context:

In [21]:
response_message = model.invoke(
    "Who is Billy Chan "
)

print(response_message.content)

I couldn't find any notable or well-known person with the name "Billy Chan". There may be several individuals with this name, and without more context or information, it's difficult to determine who you are referring to.

Could you please provide more details or clarify which Billy Chan you are thinking of? Is he a public figure, artist, musician, athlete, or someone else? Any additional context would help me narrow down the search.


## Using a Chain

We can create a summarization chain by passing the retrieved docs and a simple prompt to the model.

The chain formats the prompt template with the provided input values and passes the formatted prompt to the model:


In [26]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    "Summarize the main components in the retrieved docs: {docs}"
)


# Convert loaded documents into strings by concatenating their content
# and ignoring metadata
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


chain = {"docs": format_docs} | prompt | model | StrOutputParser()

question = "Who is Billy Chan?"

docs = vectorstore.similarity_search(question)

chain.invoke(docs)

'Here is a summary of the main components in the retrieved docs:\n\n* Name: Billy Chan\n* Contact Information:\n + Phone: (647)-866-8368\n + Email: billyychan.work@gmail.com\n + LinkedIn: linkedin.com/in/billyzechan\n + GitHub: github.com/billyzechan\n* Availability: Available for 8 or 12 month Software Development Co-op or Internship starting from May 2025\n* Education:\n + Currently attending McMaster University (MEng, Computing & Software)\n + Attended City University of Hong Kong\n\nNote that the provided information is limited and may not be comprehensive.'
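
The `|` operator in the chain above composes steps so that each step's output feeds the next. The following is a minimal sketch of that idea using Python's `__or__`. The `Step` class is hypothetical and far simpler than LangChain's runnables (no batching, streaming, or async), and `fake_model` is a stand-in that calls no LLM.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # (a | b).invoke(x) is equivalent to b.invoke(a.invoke(x))
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Each step mirrors one stage of the summarization chain
format_step = Step(lambda docs: "\n\n".join(docs))
prompt_step = Step(
    lambda text: f"Summarize the main components in the retrieved docs: {text}"
)
fake_model = Step(lambda prompt: f"[model received {len(prompt)} characters]")

toy_chain = format_step | prompt_step | fake_model
print(toy_chain.invoke(["first chunk", "second chunk"]))
```

In the real chain, the leading dict `{"docs": format_docs}` is additionally coerced into a step that builds the `{docs}` input for the prompt; the sketch skips that and passes the formatted string along directly.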

## Q&A
You can also perform question answering with the local model and vector store. Here is an example with a simple string prompt:

In [30]:
from langchain_core.runnables import RunnablePassthrough

RAG_TEMPLATE = """
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

<context>
{context}
</context>

Answer the following question:

{question}"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

chain = (
    # Replace the list of Documents under "context" with one formatted string
    RunnablePassthrough.assign(context=lambda x: format_docs(x["context"]))
    | rag_prompt
    | model
    | StrOutputParser()
)

question = "What Skills Or Work Experiences do Billy Chan have?"

docs = vectorstore.similarity_search(question)

# Run
chain.invoke({"context": docs, "question": question})

'Based on the context, it appears that Billy Chan has skills and experience in software development, specifically in areas such as data structures and algorithms, microservice-oriented architecture, software design, big data, and machine learning. His education at McMaster University suggests a strong foundation in computer science. No specific work experiences are mentioned in the provided context.'
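
`RunnablePassthrough.assign(...)` keeps the incoming dict and adds (or overwrites) the named keys, so here the list of documents under `context` is replaced by one formatted string before the prompt is filled in. Below is a plain-Python sketch of that data flow. The `assign` helper, the shortened `TEMPLATE`, and the placeholder chunk strings (used instead of real `Document` objects) are all hypothetical.

```python
def assign(inputs, **computed):
    """Return a copy of inputs with extra keys computed from the original dict."""
    updated = dict(inputs)
    for key, func in computed.items():
        updated[key] = func(inputs)
    return updated


# Shortened version of the RAG template, for illustration only
TEMPLATE = (
    "<context>\n{context}\n</context>\n\n"
    "Answer the following question:\n\n{question}"
)

inputs = {"context": ["chunk one", "chunk two"], "question": "What is in the chunks?"}

# Overwrite "context" with a single string; "question" passes through untouched
prepared = assign(inputs, context=lambda d: "\n\n".join(d["context"]))
prompt_text = TEMPLATE.format(**prepared)
print(prompt_text)
```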

## Q&A with retrieval
Finally, instead of manually passing in docs, you can retrieve them automatically from the vector store based on the user's question:

In [33]:
retriever = vectorstore.as_retriever()

qa_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | model
    | StrOutputParser()
)

question = "Based on the skills and experience of Billy Chan, do you think he is a good candidate for any software development co-op?"

qa_chain.invoke(question)

'Based on the provided context, it appears that Billy Chan has relevant education and courses in Computing & Software at McMaster University, as well as experience with data structures, algorithms, microservices, software design, big data, and machine learning. His availability for an 8 or 12 month co-op internship starting from May 2025 suggests that he is actively seeking opportunities. Overall, his background and skills suggest that Billy Chan could be a strong candidate for a software development co-op.'
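
In `qa_chain`, the leading dict sends the same input string down two branches: one retrieves and formats documents, the other passes the question through unchanged. Here is a minimal sketch of that fan-out, using a hypothetical keyword-overlap `toy_retriever` in place of the vector store retriever. The `fan_out` helper and the placeholder corpus are also assumptions for illustration.

```python
def toy_retriever(question, corpus, k=2):
    """Hypothetical retriever: rank chunks by number of shared lowercase words."""
    query_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def fan_out(question, branches):
    """Call every branch with the same input and collect the results by key."""
    return {key: branch(question) for key, branch in branches.items()}


corpus = ["alpha beta gamma", "beta delta", "epsilon zeta"]  # placeholder chunks
prompt_inputs = fan_out(
    "beta gamma",
    {
        "context": lambda q: "\n\n".join(toy_retriever(q, corpus)),
        "question": lambda q: q,  # plays the role of RunnablePassthrough()
    },
)
print(prompt_inputs)
```

The resulting dict has exactly the `context` and `question` keys that `rag_prompt` expects, which is why the caller only needs to supply the question string.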