<a href="https://colab.research.google.com/github/Biswajitnahak2003/GenAIByLangchain/blob/main/Retrievers_in_langchain/Retrievers_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Retrievers in LangChain** (grouped by retrieval strategy)

*1. Maximal Marginal Relevance (MMR)*

In [1]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document  # Document is defined in langchain_core

In [2]:
import os
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
from google.colab import userdata
os.environ["GOOGLE_API_KEY"] = userdata.get("GOOGLE_API_KEY")

In [4]:
docs = [
    Document(
        page_content="Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation."
    ),
    Document(
        page_content="The syntax of Python is known for being clean and expressive, allowing developers to write clear, logical code for small and large-scale projects."
    ),
    Document(
        page_content="For web development, Python's most popular framework is Django, which enables rapid development of secure and maintainable websites."
    ),
    Document(
        page_content="In data science, Python is the dominant language, with libraries like Pandas, NumPy, and Scikit-learn providing powerful tools for analysis and machine learning."
    )
]

In [5]:
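embedding = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
# Use a separate name for the index so the `docs` list is not overwritten.
vector_store = FAISS.from_documents(docs, embedding)

In [6]:
query = "Tell me about the Python language."
# MMR balances relevance to the query against diversity among the k results.
mmr_retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 2})
mmr_results = mmr_retriever.invoke(query)

In [7]:
# The default search_type is plain similarity search, used here as a baseline.
standard_retriever = vector_store.as_retriever(search_kwargs={"k": 2})
standard_results = standard_retriever.invoke(query)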

In [8]:
for doc in standard_results:
    print(f"- {doc.page_content}")

- Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.
- The syntax of Python is known for being clean and expressive, allowing developers to write clear, logical code for small and large-scale projects.


In [9]:
for doc in mmr_results:
    print(f"- {doc.page_content}")

- Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.
- For web development, Python's most popular framework is Django, which enables rapid development of secure and maintainable websites.
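
The two outputs above differ because MMR penalizes a candidate for being similar to documents already selected. The cell below is a minimal sketch of that greedy selection, not LangChain's own implementation. The 2-D vectors are hypothetical stand-ins for real embeddings, and `mmr_select` is an illustrative helper name. Its `lambda_mult` parameter mirrors the option of the same name that can be passed in `search_kwargs` for `search_type="mmr"`: 1.0 means relevance only, and lower values favor diversity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k document indices, trading relevance against redundancy."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical toy "embeddings": documents 0 and 1 are near-duplicates,
# document 2 is relevant but covers a different direction.
query_vec = [1.0, 1.0]
doc_vecs = [[1.0, 0.5], [1.0, 0.45], [0.4, 1.0], [1.0, 0.0]]

similarity_only = mmr_select(query_vec, doc_vecs, k=2, lambda_mult=1.0)
with_diversity = mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5)
print(similarity_only)  # [0, 1]
print(with_diversity)   # [0, 2]
```

With `lambda_mult=1.0` the near-duplicate pair is returned, matching the standard retriever's behavior. With `lambda_mult=0.5` the second slot goes to the less redundant document, which is the pattern seen in the MMR output above.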


*2. MultiQueryRetriever*

In [10]:
from langchain.retrievers.multi_query import MultiQueryRetriever

In [11]:
docs = [
    Document(page_content="India won the 2024 T20 World Cup, beating South Africa by 7 runs in a thrilling final. The match was held in Barbados."),
    Document(page_content="Virat Kohli was named Player of the Match for his crucial innings of 76 runs, which anchored the Indian batting effort."),
    Document(page_content="A spectacular catch by Suryakumar Yadav at the boundary in the final over was a major turning point, dismissing David Miller."),
    Document(page_content="Hardik Pandya bowled the final over, defending 16 runs to secure the victory for India. He took the key wicket of David Miller.")
]

In [12]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
db = FAISS.from_documents(docs, embeddings)
retriever = db.as_retriever(search_kwargs={"k": 2})

In [13]:
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)

In [16]:
mq_retriever = MultiQueryRetriever.from_llm(
    retriever=retriever, llm=llm
)

In [17]:
query = "Tell me about the final match."
retrieved_docs = mq_retriever.invoke(query)

In [18]:
for doc in retrieved_docs:
    print(f"- {doc.page_content}")

- India won the 2024 T20 World Cup, beating South Africa by 7 runs in a thrilling final. The match was held in Barbados.
- Virat Kohli was named Player of the Match for his crucial innings of 76 runs, which anchored the Indian batting effort.
- A spectacular catch by Suryakumar Yadav at the boundary in the final over was a major turning point, dismissing David Miller.
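
Three documents come back even though the base retriever uses `k=2`. MultiQueryRetriever asks the LLM for several rephrasings of the query, runs the base retriever once per rephrasing, and returns the unique union of the hits. The cell below is a rough sketch of that merge step only. `generate_queries`, `retrieve`, and the canned results are hypothetical stand-ins for the LLM and the vector store, so no API key is needed.

```python
def unique_union(result_lists):
    """Merge several result lists, keeping first-seen order and dropping repeats."""
    seen = set()
    merged = []
    for results in result_lists:
        for text in results:
            if text not in seen:
                seen.add(text)
                merged.append(text)
    return merged

def multi_query_retrieve(query, generate_queries, retrieve):
    """Retrieve once per generated query variant and merge the results."""
    variants = generate_queries(query)
    return unique_union([retrieve(q) for q in variants])

# Hypothetical stand-in for the LLM: fixed rephrasings of the user query.
def generate_queries(query):
    return [
        "Who won the final?",
        "Who was Player of the Match in the final?",
        "What happened in the last over of the final?",
    ]

# Hypothetical stand-in for a k=2 retriever: canned hits per variant.
canned_results = {
    "Who won the final?": ["doc A", "doc B"],
    "Who was Player of the Match in the final?": ["doc B", "doc C"],
    "What happened in the last over of the final?": ["doc C", "doc A"],
}

def retrieve(query):
    return canned_results[query]

merged = multi_query_retrieve("Tell me about the final match.", generate_queries, retrieve)
print(merged)  # ['doc A', 'doc B', 'doc C']
```

Each variant returns only two hits, but the union contains three distinct documents, which is why the merged result can be larger than `k`.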


*3. Contextual Compression Retriever*

In [20]:
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever

In [21]:
docs = [
    Document(
        page_content="The 2024 T20 World Cup final was held in Barbados. It was a sunny day with a slight breeze, perfect for cricket. India scored 176 runs, with Virat Kohli making a crucial 76. The stadium was packed with fans from both nations."
    ),
    Document(
        page_content="South Africa's chase was intense. Heinrich Klaasen played a blistering knock, but Hardik Pandya's final over secured a 7-run victory for India. After the match, the Indian team celebrated their historic win."
    )
]


In [22]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")
db = FAISS.from_documents(docs, embeddings)
base_retriever = db.as_retriever(search_kwargs={"k": 2})

In [24]:
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)
compressor = LLMChainExtractor.from_llm(llm)

In [25]:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=base_retriever
)

In [26]:
query = "What was Virat Kohli's score?"
retrieved_docs = compression_retriever.invoke(query)

In [27]:
for doc in retrieved_docs:
    print(f"- {doc.page_content}")

- Virat Kohli making a crucial 76.
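
Only one short line is printed even though the base retriever returns both documents. The compressor keeps just the part of each document that is relevant to the query and drops documents with nothing relevant. The cell below sketches that filter-and-trim flow under a loud assumption: a simple keyword match stands in for the LLM-based extraction done by `LLMChainExtractor`, and `extract_relevant` and `compress_documents` are illustrative names, not LangChain APIs.

```python
import re

def extract_relevant(text, query_terms):
    """Keep only the sentences that mention at least one query term."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [
        s for s in sentences
        if any(term.lower() in s.lower() for term in query_terms)
    ]
    return " ".join(kept)

def compress_documents(texts, query_terms):
    """Trim each document to its relevant sentences and drop empty results."""
    compressed = []
    for text in texts:
        extract = extract_relevant(text, query_terms)
        if extract:  # documents with nothing relevant are discarded
            compressed.append(extract)
    return compressed

texts = [
    "The 2024 T20 World Cup final was held in Barbados. It was a sunny day "
    "with a slight breeze, perfect for cricket. India scored 176 runs, with "
    "Virat Kohli making a crucial 76. The stadium was packed with fans from "
    "both nations.",
    "South Africa's chase was intense. Heinrich Klaasen played a blistering "
    "knock, but Hardik Pandya's final over secured a 7-run victory for India. "
    "After the match, the Indian team celebrated their historic win.",
]

compressed = compress_documents(texts, query_terms=["Kohli"])
for text in compressed:
    print(f"- {text}")
```

The keyword version keeps the whole matching sentence, whereas the LLM extractor in the notebook returned an even tighter span. In both cases the second document is dropped because it says nothing about the query.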
