<a href="https://colab.research.google.com/github/A-K-0802/GEN-AI-2025/blob/main/Retrievers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install langchain chromadb faiss-cpu langchain-huggingface tiktoken langchain-community wikipedia



# Wikipedia Retriever

In [2]:
from langchain_community.retrievers import WikipediaRetriever

retriever=WikipediaRetriever(top_k_results=2,lang="en")
query = "Geopolitical history of China"

docs=retriever.invoke(query)
for i,doc in enumerate(docs):
  print(f"\n---Result {i+1}---\n")
  print(f"Content:\n{doc.page_content}...")


---Result 1---

Content:
Geopolitics (from Ancient Greek  γῆ gê 'earth, land' and  πολιτική politikḗ 'politics') is the study of the effects of Earth's geography on politics and international relations. Geopolitics usually refers to countries and relations between them, it may also focus on two other kinds of states: de facto independent states with limited international recognition and relations between sub-national geopolitical entities, such as the federated states that make up a federation, confederation, or a quasi-federal system.
At the level of international relations, geopolitics is a method of studying foreign policy to understand, explain, and predict international political behavior through geographical variables. These include area studies, climate, topography, demography, natural resources, and applied science of the region being evaluated.
Geopolitics focuses on political power linked to geographic space, in particular, territorial waters, land territory and  wealth of n

# Vector Store Retriever

In [3]:
from langchain_community.vectorstores import Chroma
from langchain_huggingface.embeddings import HuggingFaceEmbeddings


embedding_model=HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

In [4]:
from langchain_core.documents import Document

documents=[
    Document(page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose"),
    Document(page_content="One dinosaur lives in the ocean"),
    Document(page_content="One dinosaur ate another dinosaur"),
    Document(page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ..."),
    Document(page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and ..."),
    Document(page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them"),
]

In [5]:
vector_store=Chroma.from_documents(documents=documents,embedding=embedding_model,collection_name="my_collection")

In [6]:
retriever_VS=vector_store.as_retriever(search_kwargs={"k":2})

query="Where is DiCaprio?"
result=retriever_VS.invoke(query)
for i,doc in enumerate(result):
  print(f"\n---Result {i+1}---\n")
  print(f"Content:\n{doc.page_content}...")


---Result 1---

Content:
Leo DiCaprio gets lost in a dream within a dream within a dream within a ......

---Result 2---

Content:
One dinosaur lives in the ocean...
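
The ranking step behind a vector store retriever can be sketched in a few lines. This is a minimal illustration, not Chroma's implementation: it assumes cosine similarity as the metric (the real metric depends on how the store is configured), and the 2-D vectors and the names `cosine_similarity`, `top_k`, `toy_doc_vecs`, and `toy_query_vec` are hypothetical stand-ins for real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors most similar to query_vec."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical 2-D "embeddings"; real sentence embeddings have hundreds of dimensions.
toy_doc_vecs = [
    [1.0, 0.0],  # doc 0
    [0.9, 0.1],  # doc 1
    [0.0, 1.0],  # doc 2
]
toy_query_vec = [0.8, 0.2]

nearest = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(nearest)  # [1, 0]
```

Note that a top-k search always returns `k` results, even when only one document is a close match. That is likely why the second result above is an unrelated dinosaur document.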


# Maximal Marginal Relevance (MMR) Retriever

In [7]:
from langchain_community.vectorstores import FAISS

embedding_model=HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore_F=FAISS.from_documents(documents=documents,embedding=embedding_model)

retriever_F=vectorstore_F.as_retriever(search_type="mmr",
                                       search_kwargs={"k":3, "lambda_mult":1})

query="Where are dinosaurs"

In [8]:
result_F=retriever_F.invoke(query)
for i,doc in enumerate(result_F):
  print(f"\n---Result {i+1}---\n")
  print(f"Content:\n{doc.page_content}...")


---Result 1---

Content:
One dinosaur lives in the ocean...

---Result 2---

Content:
A bunch of scientists bring back dinosaurs and mayhem breaks loose...

---Result 3---

Content:
One dinosaur ate another dinosaur...
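
To see what `lambda_mult` controls, here is a rough sketch of greedy MMR selection. It follows the usual formulation (score = `lambda_mult` × relevance − (1 − `lambda_mult`) × redundancy) and is not the library's actual code; the function names and the 2-D toy vectors are assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k indices, trading off relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine_similarity(query_vec, doc_vecs[i])
            # Highest similarity to anything already picked (0 if nothing picked yet).
            redundancy = max(
                (cosine_similarity(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical vectors: docs 0 and 1 are near-duplicates, doc 2 points elsewhere.
toy_doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.8]]
toy_query_vec = [1.0, 0.2]

pure_similarity = mmr(toy_query_vec, toy_doc_vecs, k=2, lambda_mult=1.0)
balanced = mmr(toy_query_vec, toy_doc_vecs, k=2, lambda_mult=0.5)
print(pure_similarity)  # [1, 0]
print(balanced)         # [1, 2]
```

With `lambda_mult=1`, as in the cell above, the redundancy term vanishes and MMR reduces to plain similarity ranking, which is consistent with all three results being dinosaur documents. Lower values should favor more varied results.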


# Multi Query Retriever

In [9]:
docs_criket = [
    Document(
        page_content="Cricket is a bat-and-ball game played between two teams of eleven players. It originated in England and is now popular worldwide.",
        metadata={"source": "wikipedia", "title": "Cricket Overview", "category": "Sports"}
    ),
    Document(
        page_content="Sachin Tendulkar is regarded as one of the greatest batsmen in the history of cricket. He scored 100 international centuries.",
        metadata={"source": "espncricinfo", "title": "Sachin Tendulkar", "category": "Players"}
    ),
    Document(
        page_content="The Indian Premier League (IPL) is a professional T20 cricket league in India, known for its entertainment and global viewership.",
        metadata={"source": "bcci.tv", "title": "IPL", "category": "Tournaments"}
    ),
    Document(
        page_content="A Test match in cricket is played over five days with unlimited overs. It is considered the most challenging format of the game.",
        metadata={"source": "icc-cricket.com", "title": "Test Cricket", "category": "Formats"}
    ),
    Document(
        page_content="The 2011 ICC Cricket World Cup was won by India. The final was held in Mumbai, where India defeated Sri Lanka.",
        metadata={"source": "icc-cricket.com", "title": "2011 Cricket World Cup", "category": "History"}
    )
]

In [18]:
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_huggingface import HuggingFaceEndpoint



In [11]:
!pip install -U langchain-google-genai



In [19]:
from langchain_google_genai import ChatGoogleGenerativeAI

# Requires a Google API key, e.g. via the GOOGLE_API_KEY environment variable.
model=ChatGoogleGenerativeAI(model="gemini-2.0-flash")

vector_store_MQ=FAISS.from_documents(documents=docs_criket,embedding=embedding_model)
similarity_ret=vector_store_MQ.as_retriever(search_type="similarity",search_kwargs={"k":2})
# Use the cricket FAISS store here, not the Chroma movie store from earlier.
MQR=MultiQueryRetriever.from_llm(
    retriever=vector_store_MQ.as_retriever(search_kwargs={'k':3}),
    llm=model
)

In [None]:
query="Tell me about Indian Cricket"

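result_sim=similarity_ret.invoke(query)
result_mq=MQR.invoke(query)

# Print both result sets so the two retrievers can be compared.
for label,result in (("Similarity",result_sim),("Multi-query",result_mq)):
  for i,doc in enumerate(result):
    print(f"\n---{label} result {i+1}---\n{doc.page_content}")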
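
The idea behind multi-query retrieval can be sketched without an LLM: run several rephrasings of the question through a base retriever and keep the de-duplicated union of the hits. Everything below is a toy assumption, not the `MultiQueryRetriever` implementation: `keyword_retriever` ranks by word overlap instead of embeddings, and `variants` is hand-written where an LLM would generate the rephrasings.

```python
def keyword_retriever(query, corpus, k=2):
    """Toy retriever: rank documents by the number of words shared with the query."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(doc.lower().split())), idx)
        for idx, doc in enumerate(corpus)
    ]
    # Sort by overlap (descending), then by index for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for overlap, idx in scored[:k] if overlap > 0]

def multi_query_retrieve(query_variants, corpus, k=2):
    """Run every variant through the retriever and return the de-duplicated union."""
    seen = []
    for variant in query_variants:
        for idx in keyword_retriever(variant, corpus, k=k):
            if idx not in seen:
                seen.append(idx)
    return seen

toy_corpus = [
    "india won the 2011 world cup",
    "the ipl is a t20 league in india",
    "sachin tendulkar scored 100 centuries",
    "a test match lasts five days",
]

# Hand-written stand-ins for the rephrasings an LLM would generate.
variants = [
    "india cricket world cup",
    "famous batsmen centuries",
    "t20 league india",
]

single = keyword_retriever(variants[0], toy_corpus, k=2)
merged = multi_query_retrieve(variants, toy_corpus, k=2)
print(single)  # [0, 1]
print(merged)  # [0, 1, 2]
```

The merged list contains a document that the single query misses, which is the main benefit of querying from several angles. The cost is one retrieval per variant plus the LLM call that produces the variants.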