# Retrieval

- Retrieval is the centerpiece of our retrieval-augmented generation (RAG) flow.


1. Accessing / Indexing the data in the vector store

    - Basic semantic similarity
    - Maximum marginal relevance
    - Including metadata

2. LLM-aided retrieval

## Vectorstore retrieval

In [None]:
import os
import openai
import sys
sys.path.append('../..')

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.environ['OPENAI_API_KEY']

## Maximum Marginal Relevance (MMR)

- You may not always want to choose only the most similar responses.




### Compression 
- Increase the number of results you can fit in the context by shrinking each response to only its relevant information.


## Similarity Search

In [2]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
persist_directory = 'docs/chroma/'

In [3]:
embedding = OpenAIEmbeddings()
vectordb = Chroma(
    persist_directory=persist_directory,
    embedding_function=embedding
)


ValidationError: 1 validation error for OpenAIEmbeddings
__root__
  Did not find openai_api_key, please add an environment variable `OPENAI_API_KEY` which contains it, or pass `openai_api_key` as a named parameter. (type=value_error)

In [None]:
print(vectordb._collection.count())

In [4]:
texts = [
    """The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).""",
    """A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.""",
    """A. phalloides, a.k.a Death Cap, is one of the most poisonous of all known mushrooms.""",
]
question = "Tell me about all-white mushrooms with large fruiting bodies"


In [5]:
smalldb = Chroma.from_texts(texts, embedding=embedding)
# Return the k=2 texts most similar to the question
smalldb.similarity_search(question, k=2)

NameError: name 'embedding' is not defined

In [None]:
# Now let's try MMR (maximum marginal relevance)

smalldb.max_marginal_relevance_search(question, k=2, fetch_k=3)

### Why fetch_k might be greater than k

- Candidate pool: MMR first fetches the `fetch_k` documents most similar to the query, then selects `k` of them. If `fetch_k` equals `k`, every fetched candidate is returned, so there is nothing to diversify and you get the same documents as plain similarity search.

- Diversity: A larger `fetch_k` gives the selection step more candidates to choose from, so it can skip near-duplicates in favor of documents that add new information. The trade-off is that more candidates must be fetched and compared.
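To make the role of `fetch_k` concrete, here is a minimal pure-Python sketch of an MMR-style selection. It is not LangChain's implementation: the 2-D vectors are made-up stand-ins for embeddings, and `cosine` and `mmr_search` are hypothetical helpers. The parameter names `k`, `fetch_k` and `lambda_mult` are chosen to mirror the LangChain call above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mmr_search(query_vec, doc_vecs, k=2, fetch_k=3, lambda_mult=0.5):
    """Return the indices of k documents picked by a simple greedy MMR."""
    # Step 1: fetch the fetch_k candidates most similar to the query.
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    candidates = ranked[:fetch_k]

    # Step 2: greedily pick k of them, trading relevance against redundancy.
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Made-up 2-D "embeddings" (assumption, for illustration only).
query = [1.0, 0.0]
docs = [
    [1.0, 0.10],   # 0: very close to the query
    [1.0, 0.12],   # 1: near-duplicate of document 0
    [0.6, -0.80],  # 2: less similar to the query, but points elsewhere
]

print(mmr_search(query, docs, k=2, fetch_k=3))  # pool includes document 2
print(mmr_search(query, docs, k=2, fetch_k=2))  # pool is only documents 0 and 1
```

With `fetch_k=3` the second pick skips the near-duplicate and returns document 2. With `fetch_k=2` the pool contains only the two near-duplicates, so both are returned.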


### Addressing Diversity: Maximum marginal relevance

Last class we introduced one problem: how to enforce diversity in the search results.

Maximum marginal relevance strives to achieve both relevance to the query and diversity among the results.
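One common way to write this trade-off is the greedy selection rule below. The notation is an assumption for illustration and does not come from this notebook: $q$ is the query, $R$ is the set of fetched candidates, $S$ is the set of documents already selected, $\mathrm{sim}$ is a similarity measure such as cosine similarity, and $\lambda \in [0, 1]$ weights relevance against diversity.

$$
d^{*} = \arg\max_{d_i \in R \setminus S} \left[ \lambda \, \mathrm{sim}(d_i, q) - (1 - \lambda) \max_{d_j \in S} \mathrm{sim}(d_i, d_j) \right]
$$

With $\lambda = 1$ the second term vanishes and the rule reduces to ranking by similarity to the query. Smaller values of $\lambda$ penalize candidates that resemble documents already chosen.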



In [None]:
question = "what did they say about matlab?"
docs_ss = vectordb.similarity_search(question,k=3)

In [None]:
docs_mmr = vectordb.max_marginal_relevance_search(question,k=3)

### Difference between `similarity_search` and `max_marginal_relevance_search`


- **Similarity_search:** 

    - Core Idea:
        Finds documents that are most similar to the query based on some measure of similarity, often using vector representations.
    - Focuses on: 
        Individual document similarity. The documents with the highest similarity scores are considered the most relevant.
    - Applications:
        Works well when you want documents that directly address the query or closely match its content.

    Example: Searching for product descriptions based on a user's search term.

- **Max_Marginal_Relevance_Search:**

    - Core Idea:
        Aims to return a diverse set of relevant documents that not only match the query but also minimize redundancy (repetition) among themselves.
    - Focuses on:
        Both individual relevance and diversity. It prioritizes documents highly relevant to the query while also ensuring they provide unique information compared to other retrieved documents.
    - Applications:
        Useful when you want a comprehensive overview of a topic or need to avoid overwhelming users with similar results.

    Example: Searching for news articles about a current event. You might want articles from different sources with varying perspectives, not just duplicates from the same source.


- Choosing Between Them

    - **Similarity Search:** 
            Use it when you prioritize finding the documents closest to the query's content, regardless of redundancy.
    - **MMR:**
            Use it when you want a diverse set of relevant documents that cover different aspects of the topic or avoid repetitive results.



## Addressing Specificity: working with metadata

In the last lecture, we showed that a question about the third lecture can return results from other lectures as well.

To address this, many vectorstores support operations on metadata.

Metadata provides context for each embedded chunk.
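As a rough illustration of what a metadata filter does, the sketch below keeps only the chunks whose metadata matches the filter and then ranks the survivors. The chunks, their scores, the short file names and the `filtered_search` helper are all hypothetical. A real vector store computes the similarity scores itself and applies the filter internally.

```python
# Hypothetical chunks with precomputed similarity scores (illustration only).
chunks = [
    {"text": "Regression recap",       "score": 0.93,
     "metadata": {"source": "Lecture02.pdf", "page": 4}},
    {"text": "Linear regression",      "score": 0.90,
     "metadata": {"source": "Lecture03.pdf", "page": 0}},
    {"text": "Locally weighted fit",   "score": 0.85,
     "metadata": {"source": "Lecture03.pdf", "page": 6}},
    {"text": "Course logistics",       "score": 0.20,
     "metadata": {"source": "Lecture01.pdf", "page": 1}},
]

def filtered_search(chunks, metadata_filter, k=3):
    """Keep chunks matching every key/value in the filter, then take the top k."""
    matching = [
        c for c in chunks
        if all(c["metadata"].get(key) == value
               for key, value in metadata_filter.items())
    ]
    return sorted(matching, key=lambda c: c["score"], reverse=True)[:k]

results = filtered_search(chunks, {"source": "Lecture03.pdf"}, k=3)
for r in results:
    print(r["metadata"], r["text"])
```

The highest-scoring chunk overall comes from the second lecture, but the filter removes it before ranking, so only third-lecture chunks are returned.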


In [None]:
question = "what did they say about regression in the third lecture?"


docs = vectordb.similarity_search(
    question,
    k=3,
    filter={"source":"docs/cs229_lectures/MachineLearning-Lecture03.pdf"}
)

# The results are the most relevant chunks; each chunk's metadata shows its source PDF and page.

for d in docs:
    print(d.metadata)

## Addressing Specificity: working with metadata using self-query retriever

But we have an interesting challenge: we often want to infer the metadata from the query itself.

To address this, we can use SelfQueryRetriever, which uses an LLM to extract:

- The query string to use for vector search
- A metadata filter to pass in as well

Most vector databases support metadata filters, so this doesn't require any new databases or indexes.
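To show the shape of what gets extracted, here is a toy stand-in that splits a question into a search string and a metadata filter. It uses a regular expression instead of an LLM, so it only handles one phrasing. The `split_question` helper and the ordinal lookup table are hypothetical, and this is not how `SelfQueryRetriever` works internally. The source paths are the ones used elsewhere in this notebook.

```python
import re

# Hypothetical lookup from ordinal words to lecture files (illustration only).
ORDINAL_TO_SOURCE = {
    "first": "docs/cs229_lectures/MachineLearning-Lecture01.pdf",
    "second": "docs/cs229_lectures/MachineLearning-Lecture02.pdf",
    "third": "docs/cs229_lectures/MachineLearning-Lecture03.pdf",
}

def split_question(question):
    """Return (search_query, metadata_filter) for a question.

    A regex plays the role that the LLM plays in a self-query retriever.
    """
    match = re.search(r"\s*\bin the (first|second|third) lecture\b",
                      question, flags=re.IGNORECASE)
    if match is None:
        # No metadata hint found: search with the full question, no filter.
        return question, {}
    source = ORDINAL_TO_SOURCE[match.group(1).lower()]
    # Drop the metadata phrase so only the topical part is embedded.
    query = (question[:match.start()] + question[match.end():]).strip()
    return query, {"source": source}

query, metadata_filter = split_question(
    "what did they say about regression in the third lecture?"
)
print(query)
print(metadata_filter)
```

The first value would go to the vector search and the second would be passed as the `filter` argument, as in the manual example earlier.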


- We can do that without manual interaction, using the following approach.

In [4]:
from langchain.llms import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo  # describes each metadata field (name, description, type)

In [5]:
# The metadata has only two fields: source and page

metadata_field_info = [
    AttributeInfo(name="source",
                  description="The lecture the chunk is from, should be one of `docs/cs229_lectures/MachineLearning-Lecture01.pdf`, `docs/cs229_lectures/MachineLearning-Lecture02.pdf`, or `docs/cs229_lectures/MachineLearning-Lecture03.pdf`",
                  type="string"),
    AttributeInfo(name="page",
                  description="The page from the lecture",
                  type="integer")
                  ]

# This field information is passed to the LLM so it can build the metadata filter

In [None]:
document_content_description    = "Lecture note"
llm                             = OpenAI(model='gpt-3.5-turbo-instruct', temperature=0)
retriever                       = SelfQueryRetriever.from_llm(llm               = llm,
                                                              vectorstore       = vectordb,
                                                              document_contents = document_content_description,
                                                              metadata_field_info   = metadata_field_info,
                                                              verbose               = True)

In [None]:
question = "what did they say about regression in the third lecture?"

docs = retriever.get_relevant_documents(question)

for d in docs:
    print(d.metadata)

## Compression

- Another approach for improving the quality of retrieved docs is **Compression**.

- Information most relevant to a query may be buried in a document with a lot of irrelevant text.

- Passing that full document through your application can lead to more expensive LLM calls and poorer responses.

Contextual compression is meant to fix this. 
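The idea can be sketched without an LLM: keep only the sentences of a retrieved document that share a term with the query. This keyword-overlap `compress` helper, its stopword list and the sample document are hypothetical stand-ins. An LLM-based extractor decides relevance from meaning rather than exact word matches.

```python
import re

# Small hypothetical stopword list so filler words do not count as matches.
STOPWORDS = frozenset({"what", "did", "they", "say", "about", "the", "a", "is"})

def words(text):
    """Lowercase alphanumeric tokens of a text."""
    return re.findall(r"[a-z0-9]+", text.lower())

def compress(document, query):
    """Keep only sentences sharing at least one non-stopword term with the query."""
    query_terms = {w for w in words(query) if w not in STOPWORDS}
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    kept = [s for s in sentences if query_terms & set(words(s))]
    return " ".join(kept)

# Made-up document text (assumption, for illustration only).
document = (
    "The course meets twice a week. "
    "Homework uses MATLAB or Octave. "
    "Office hours are on Friday."
)
print(compress(document, "what did they say about matlab?"))
```

Only the sentence that mentions MATLAB survives, which is the kind of shortened text that would be passed on to the final LLM call.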

In [1]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

In [2]:
def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))


In [None]:
# Wrap our vectorstore
llm = OpenAI(temperature=0, model="gpt-3.5-turbo-instruct")
compressor = LLMChainExtractor.from_llm(llm)


compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever()
)

question = "what did they say about matlab?"
compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)

In [None]:
## Combining various techniques

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever(search_type = "mmr")
)

question = "what did they say about matlab?"
compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)

## Other types of retrieval

It's worth noting that a vector database is not the only kind of tool for retrieving documents.

The LangChain retriever abstraction includes other ways to retrieve documents, such as TF-IDF or SVM.
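For intuition about the TF-IDF option, the sketch below ranks texts by summing a TF-IDF weight for each query term they contain. It is a simplified pure-Python version with a hypothetical `tfidf_rank` helper and made-up texts, not what the library retriever does internally. The `+ 1.0` smoothing of the inverse document frequency is one of several common variants.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase alphanumeric tokens of a text."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_rank(query, texts):
    """Return text indices ordered from highest to lowest TF-IDF score."""
    tokenized = [tokenize(t) for t in texts]
    n_docs = len(texts)
    # Document frequency: in how many texts does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term in counts:
                tf = counts[term] / len(tokens)            # term frequency
                idf = math.log(n_docs / df[term]) + 1.0    # smoothed inverse doc. frequency
                score += tf * idf
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)

# Made-up texts (assumption, for illustration only).
texts = [
    "The class covers supervised learning and regression.",
    "Programming assignments use MATLAB or Octave.",
    "Office hours are held after class.",
]
ranking = tfidf_rank("what did they say about matlab?", texts)
print(texts[ranking[0]])
```

Unlike embedding search, this approach matches exact terms only, so a query that paraphrases the text without sharing words with it would score zero.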


In [None]:
from langchain.retrievers import SVMRetriever
from langchain.retrievers import TFIDFRetriever
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter


# Load PDF
loader = PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture01.pdf")
pages = loader.load()
all_page_text=[p.page_content for p in pages]
joined_page_text=" ".join(all_page_text)

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1500,chunk_overlap = 150)
splits = text_splitter.split_text(joined_page_text)



# Retrieve
svm_retriever = SVMRetriever.from_texts(splits,embedding)
tfidf_retriever = TFIDFRetriever.from_texts(splits)


question = "What are major topics for this class?"
docs_svm=svm_retriever.get_relevant_documents(question)
docs_svm[0]


question = "what did they say about matlab?"
docs_tfidf=tfidf_retriever.get_relevant_documents(question)
docs_tfidf[0]

In [None]:
from langchain_core