## Build a Local In-Memory Vector DB Using LangChain

In [1]:
%pip install langchain langgraph langchain_community langchain_huggingface chromadb pypdf

In [3]:
from langchain_core.documents import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.vectorstores import InMemoryVectorStore
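# Import from the partner/community packages installed above;
# the old `langchain.embeddings` and `langchain.document_loaders` paths are deprecated.
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.document_loaders import PyPDFLoader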

In [21]:

# 1. Load PDF documents
# Specify the path to your PDF file
pdf_loader = PyPDFLoader("A Comprehensive Langchain Guide.pdf")
docs_list = pdf_loader.load()  # Load and parse the PDF content into Document objects

# 2. Split documents into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,  # Maximum chunk size, measured by length_function (characters here)
    chunk_overlap=100,  # Characters shared between consecutive chunks
    length_function=len,  # Function used to measure chunk length
    is_separator_regex=False,  # Treat separators as literal strings, not regex patterns
)
doc_splits = text_splitter.split_documents(docs_list)

# 3. Load the embedding model (vectors are computed when documents are added)
embedding = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")

# 4. Initialize the vector store
vectorstore = InMemoryVectorStore(embedding=embedding)

# 5. Add documents to the vector store
vectorstore.add_documents(doc_splits)

# 6. Create a retriever for querying
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})  # return the top 2 matches

# Example usage of the retriever
query = "Langchain features for building llm applications"
results = retriever.invoke(query)  # get_relevant_documents() is deprecated in favor of invoke()
docs = vectorstore.similarity_search(query)
print(docs)

# Print the top results
for i, result in enumerate(results, 1):
    print(f"Result {i}:")
    print(f"Content: {result.page_content}")
    # print(f"Metadata: {result.metadata}")
    # print("\n")


[Document(id='053492a1-c624-4258-8178-94e11318dca6', metadata={'source': 'A Comprehensive Langchain Guide.pdf', 'page': 0}, page_content='LangChain is a Python library designed tosimplify thedevelopment of applications that utilizelargelanguagemodels(LLMs), suchasthosefromOpenAI, HuggingFace, andotherproviders.As artificial intelligence evolves, LLMs have proven to be powerful tools across industries,enabling applications that generate text, answer questions, summarize documents, and evenassist withdecision-makingprocesses. However, buildingsophisticatedapplicationsusingLLMscan be challenging due to the complexities of chaining'), Document(id='38c68218-e58c-42b7-b525-584747a8d140', metadata={'source': 'A Comprehensive Langchain Guide.pdf', 'page': 1}, page_content='1. LLM Wrappers: These wrappers enable developers to interact with LLMs through a commoninterface, abstracting away the complexities of different APIs. For example, with the sameOpenAIwrapper, youcaneasilyswitchmodelsbychang
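
To build intuition for `chunk_size=500` and `chunk_overlap=100`, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification, not LangChain's algorithm: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and spaces, so its real chunks will differ. The function name `chunk_text` and the sample string are hypothetical and chosen only for illustration.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Fixed-size sliding-window chunker (a simplification, not the recursive splitter)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical 1200-character sample made of repeating digits
sample = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(sample)
print([len(c) for c in chunks])             # [500, 500, 400]
print(chunks[0][-100:] == chunks[1][:100])  # True: neighbors share 100 characters
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.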

In [26]:
retriever = vectorstore.as_retriever(search_kwargs={"k":1})
query = "what are prompttemplates in langchain"
results = retriever.invoke(query)  # get_relevant_documents() is deprecated in favor of invoke()

for i, result in enumerate(results, 1):
    print(f"Content: {result.page_content}")

Content: Prompt templatesareideal for building applications where prompts are dynamically generated, suchaschatbots, Q&Asystems, orcontent generationtools.3. Chains: LangChain’s chaining capabilities allowdeveloperstolinkmultiplecomponentstogether,creating workflows where each step relies on the previous one’s output. For instance, asummarization workflowmight involve a pre-processing step, followed by a text generationstep,
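
The retriever's `k` controls how many of the most similar chunks come back. The sketch below shows the general idea behind an in-memory store: keep each text next to its vector, score every stored vector against the query with cosine similarity, and return the top `k`. Everything here is an assumption for illustration: `ToyVectorStore`, `embed`, and `cosine` are hypothetical names, and the bag-of-words "embedding" is a stand-in for a real model such as `BAAI/bge-small-en-v1.5`.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (hypothetical stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Keeps (text, vector) pairs in a list and does a brute-force search."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self._items.append((text, embed(text)))

    def similarity_search(self, query: str, k: int = 2) -> list[str]:
        query_vec = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine(query_vec, item[1]),
            reverse=True,  # highest similarity first
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add_texts([
    "prompt templates build prompts dynamically",
    "chains link multiple components together",
    "agents choose tools at run time",
])
top = store.similarity_search("what are prompt templates", k=1)
print(top)  # ['prompt templates build prompts dynamically']
```

A brute-force scan like this is fine for a single PDF's worth of chunks; larger collections usually call for a persistent or indexed store.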
