<a href="https://colab.research.google.com/github/Saifullah785/langchain-generative-ai-journey/blob/main/Lecture_13_langchain_retrievers/Lecture_13_langchain_retrievers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [27]:
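import os
# Optional: set a Hugging Face access token. The local sentence-transformers embeddings used
# below download public models and do not need one. huggingface_hub reads the HF_TOKEN variable.
os.environ['HF_TOKEN'] = 'your_hf_token'  # replace with your own token; never commit a real one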

In [28]:
# Install LangChain, ChromaDB, FAISS, sentence-transformers, tiktoken, and the Hugging Face and Wikipedia integrations
!pip install langchain chromadb faiss-cpu sentence-transformers tiktoken langchain-huggingface langchain-community wikipedia



# **Wikipedia Retriever**

In [29]:
# Import the WikipediaRetriever class from the langchain_community module
from langchain_community.retrievers import WikipediaRetriever

In [30]:
# Initialize the WikipediaRetriever with a specified number of results and language
retriever = WikipediaRetriever(top_k_results=2, lang='en')

In [31]:
# Define the query string for the Wikipedia search
query = 'the geopolitical history of india and pakistan from the perspective of a chinese'

In [32]:
# Invoke the retriever with the query to get relevant documents from Wikipedia
docs = retriever.invoke(query)
# Display the retrieved documents
docs

[Document(metadata={'title': 'China–India relations', 'summary': "China and India maintained peaceful relations for thousands of years, but their relationship has varied since the Chinese Communist Party's victory in the Chinese Civil War in 1949 and the annexation of Tibet by the People's Republic of China. The two nations have sought economic cooperation with each other, while frequent border disputes and economic nationalism in both countries are major points of contention.\nCultural and economic relations between China and India date back to ancient times. The Silk Road not only served as a major trade route between India and China, but is also credited for facilitating the spread of Buddhism from India to East Asia. During the 19th century, China was involved in a growing opium trade with the East India Company, which exported opium grown in India. During World War II, both British India and the Republic of China (ROC) played a crucial role in halting the progress of Imperial Japa

In [33]:
# Iterate through the retrieved documents and print their content
for i, doc in enumerate(docs):
    print(f'\n--- Result {i+1} --- ')
    print(f'Content:\n{doc.page_content}..')


--- Result 1 --- 
Content:
China and India maintained peaceful relations for thousands of years, but their relationship has varied since the Chinese Communist Party's victory in the Chinese Civil War in 1949 and the annexation of Tibet by the People's Republic of China. The two nations have sought economic cooperation with each other, while frequent border disputes and economic nationalism in both countries are major points of contention.
Cultural and economic relations between China and India date back to ancient times. The Silk Road not only served as a major trade route between India and China, but is also credited for facilitating the spread of Buddhism from India to East Asia. During the 19th century, China was involved in a growing opium trade with the East India Company, which exported opium grown in India. During World War II, both British India and the Republic of China (ROC) played a crucial role in halting the progress of Imperial Japan. After India became independent in 19
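Every retriever in this notebook follows the same contract: `invoke` takes a query string and returns a list of `Document` objects. The sketch below illustrates that contract in plain Python using a toy keyword-overlap ranking. `SimpleDocument` and `KeywordRetriever` are hypothetical names, not LangChain classes. The scoring is a deliberate simplification of how a real retriever such as `WikipediaRetriever` finds and ranks its results.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Toy retriever: ranks documents by how many query words they contain."""

    def __init__(self, documents, top_k=2):
        self.documents = documents
        self.top_k = top_k  # plays the same role as top_k_results / k

    def invoke(self, query: str):
        query_words = set(query.lower().split())
        scored = []
        for doc in self.documents:
            doc_words = set(doc.page_content.lower().split())
            overlap = len(query_words & doc_words)
            if overlap > 0:
                scored.append((overlap, doc))
        # Highest overlap first; the sort is stable, so ties keep corpus order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[: self.top_k]]


# Hypothetical toy corpus for illustration only.
corpus = [
    SimpleDocument("India and Pakistan share a long border", {"title": "A"}),
    SimpleDocument("China and India have border disputes", {"title": "B"}),
    SimpleDocument("The Silk Road was a trade route", {"title": "C"}),
]
retriever = KeywordRetriever(corpus, top_k=2)
for doc in retriever.invoke("india border"):
    print(doc.metadata["title"], doc.page_content)
```

Swapping the ranking logic while keeping the `invoke(query) -> list of documents` shape is what lets the Wikipedia, vector-store and MMR retrievers below be used interchangeably.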

# **Vector Store Retriever**

In [34]:
# Import the Chroma vector store, the Hugging Face embeddings wrapper, and the Document class
from langchain_community.vectorstores import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.documents import Document

In [35]:
# Define a list of Document objects with example text content
documents = [
    Document(page_content="LangChain helps developers build LLM applications easily"),
    Document(page_content="Chroma is a vector database optimized for LLM-based search"),
    Document(page_content='Embeddings convert text into high-dimensional vectors.'),
    Document(page_content='OpenAI provides powerful embedding models.'),
]

In [36]:
# Initialize the embeddings model (all-mpnet-base-v2 is also the library default)
embeddings_model = HuggingFaceEmbeddings(model_name='sentence-transformers/all-mpnet-base-v2')

# Create a Chroma vector store from the documents and embeddings model
vectorstores = Chroma.from_documents(
    documents=documents,
    embedding=embeddings_model,
    collection_name='my_collection',
)

In [37]:
# Convert the vector store into a retriever with a specified number of search results
retriever = vectorstores.as_retriever(search_kwargs={'k': 2})

In [38]:
# Define the query string for the vector store search
query = 'what is Chroma used for?'
# Invoke the retriever with the query to get relevant documents from the vector store
results = retriever.invoke(query)
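A similarity retriever embeds the query, scores every stored vector against it and returns the `k` best matches. Here `k=2`, which comes from `search_kwargs`. The following is a minimal sketch of that ranking step in plain Python. It assumes cosine similarity and uses tiny hand-made 3-dimensional vectors in place of real `HuggingFaceEmbeddings` output. The vectors and the helper names `cosine_similarity` and `top_k_search` are hypothetical. Chroma's actual default distance metric and indexing may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_search(query_vector, store, k=2):
    """Return the k (text, score) pairs most similar to query_vector."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real models produce hundreds of dimensions.
store = [
    ("LangChain helps developers build LLM applications easily", [0.9, 0.1, 0.0]),
    ("Chroma is a vector database optimized for LLM-based search", [0.1, 0.9, 0.2]),
    ("Embeddings convert text into high-dimensional vectors.", [0.0, 0.5, 0.8]),
]
# Hypothetical vector standing in for the embedded query "what is Chroma used for?"
query_vector = [0.1, 0.8, 0.3]

for text, score in top_k_search(query_vector, store, k=2):
    print(f"{score:.3f}  {text}")
```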

In [39]:
# Iterate through the search results and print the content of each document
for i, doc in enumerate(results):
    print(f'\n--- Result {i+1} --- ')
    print(f'Content:\n{doc.page_content}..')


--- Result 1 --- 
Content:
Chroma is a vector database optimized for LLM-based search..

--- Result 2 --- 
Content:
Chroma is a vector database optimized for LLM-based search..
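Both results above are the same sentence. A likely cause, not confirmed by the notebook itself, is that the `Chroma.from_documents` cell was run more than once in the same session. The in-memory collection `my_collection` survives between calls, and each run adds the four documents again under fresh random IDs. Passing deterministic IDs through the `ids` argument of `Chroma.from_documents` should make re-runs overwrite entries instead of duplicating them. The plain-Python sketch below models that behaviour with a hypothetical dict-backed `ToyCollection`. It is not Chroma's API, only an illustration of why stable IDs make repeated inserts idempotent.

```python
import hashlib
import uuid


def stable_id(text: str) -> str:
    """Derive a deterministic ID from document text (SHA-256 hex digest)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ToyCollection:
    """Hypothetical dict-backed collection: adding an existing ID overwrites it."""

    def __init__(self):
        self.records = {}

    def add(self, texts, ids=None):
        if ids is None:
            # Mimics a store that generates a fresh random ID for every text.
            ids = [str(uuid.uuid4()) for _ in texts]
        for doc_id, text in zip(ids, texts):
            self.records[doc_id] = text


texts = [
    "Chroma is a vector database optimized for LLM-based search",
    "Embeddings convert text into high-dimensional vectors.",
]

random_ids = ToyCollection()
for _ in range(2):  # simulate running the cell twice
    random_ids.add(texts)
print("random IDs:", len(random_ids.records))  # each text stored twice

stable_ids = ToyCollection()
for _ in range(2):  # simulate running the cell twice
    stable_ids.add(texts, ids=[stable_id(t) for t in texts])
print("stable IDs:", len(stable_ids.records))  # each text stored once
```

In the notebook, the equivalent change would presumably be passing `ids=[stable_id(d.page_content) for d in documents]` to `Chroma.from_documents`. You would also need to restart the runtime so that copies added earlier under random IDs are cleared.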


In [40]:
# Iterate through the search results again and print the content of each document
for i, doc in enumerate(results):
    print(f"\n--- Result {i+1} ---")
    print(doc.page_content)


--- Result 1 ---
Chroma is a vector database optimized for LLM-based search

--- Result 2 ---
Chroma is a vector database optimized for LLM-based search


# **MMR (Maximal Marginal Relevance)**

In [41]:
# Define a list of Document objects with example text content for MMR
docs = [
    Document(page_content="LangChain makes it easy to work with LLMs."),
    Document(page_content="LangChain is used to build LLM based applications."),
    Document(page_content="Chroma is used to store and search document embeddings."),
    Document(page_content="Embeddings are vector representations of text."),
    Document(page_content="MMR helps you get diverse results when doing similarity search."),
    Document(page_content="LangChain supports Chroma, FAISS, Pinecone, and more."),
]

In [42]:
# Import the FAISS vector store class from the langchain_community module
from langchain_community.vectorstores import FAISS

In [43]:
# Initialize the embeddings model (all-mpnet-base-v2 is also the library default)
embeddings_model = HuggingFaceEmbeddings(model_name='sentence-transformers/all-mpnet-base-v2')
# Create a FAISS vector store from the documents and embeddings model
vectorstore = FAISS.from_documents(
    documents=docs,
    embedding=embeddings_model,
)

In [44]:
# Convert the FAISS vector store into a retriever that uses MMR search
retriever = vectorstore.as_retriever(
    search_type='mmr',  # Maximal Marginal Relevance search
    # k = number of results returned; lambda_mult trades relevance (1.0) against diversity (0.0)
    search_kwargs={'k': 3, 'lambda_mult': 0.5},
)
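MMR re-ranks candidates so that each new pick is relevant to the query but not redundant with what has already been picked. At each step it selects the candidate `d` with the highest `lambda_mult * sim(query, d) - (1 - lambda_mult) * max(sim(d, s) for s in selected)`. For FAISS, LangChain first fetches `fetch_k` candidates (20 by default) by plain similarity and then applies this selection to keep `k` of them.

The sketch below implements the greedy selection in plain Python using cosine similarity. The 2-D vectors, built from angles around the query, are hypothetical. The code illustrates the technique and is not LangChain's internal implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vector, candidate_vectors, k=3, lambda_mult=0.5):
    """Greedy Maximal Marginal Relevance; returns indices into candidate_vectors."""
    selected = []
    remaining = list(range(len(candidate_vectors)))
    while remaining and len(selected) < k:
        best_index, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine_similarity(query_vector, candidate_vectors[i])
            # Redundancy: highest similarity to anything already selected (0 if none yet).
            redundancy = max(
                (cosine_similarity(candidate_vectors[i], candidate_vectors[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
        remaining.remove(best_index)
    return selected


# Hypothetical unit vectors at the given angles (degrees) from the query direction.
angles = {"near_query": 10, "near_duplicate": 12, "different_angle": -25}
names = list(angles)
vectors = [[math.cos(math.radians(a)), math.sin(math.radians(a))] for a in angles.values()]
query_vector = [1.0, 0.0]

print("pure similarity:", [names[i] for i in mmr(query_vector, vectors, k=2, lambda_mult=1.0)])
print("MMR (0.5):      ", [names[i] for i in mmr(query_vector, vectors, k=2, lambda_mult=0.5)])
```

With `lambda_mult=1.0` the redundancy term vanishes and the result is the plain similarity ranking, so the near-duplicate is chosen second. At `0.5`, the near-duplicate's redundancy penalty outweighs its extra relevance, and the candidate from a different direction is chosen instead.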

In [45]:
# Define the query string for the MMR search
query = 'what is langchain?'
# Invoke the retriever with the query to get relevant documents using MMR
results = retriever.invoke(query)

In [46]:
# Iterate through the MMR search results and print the content of each document
for i, doc in enumerate(results):
    print(f'\n --- Result {i+1} --- ')
    print(f'Content:\n{doc.page_content}..')


 --- Result 1 --- 
Content:
LangChain is used to build LLM based applications...

 --- Result 2 --- 
Content:
Embeddings are vector representations of text...

 --- Result 3 --- 
Content:
LangChain supports Chroma, FAISS, Pinecone, and more...
