# Retrieval

Retrieval is the centerpiece of our retrieval-augmented generation (RAG) flow.

Let's get our vectorDB from before.

- **MMR** (maximum marginal relevance) increases diversity among the retrieved documents. This matters when a chunk carries important information but does not look relevant at first glance when judged by semantic similarity alone.

## Vectorstore retrieval

In [None]:
#!pip install lark

## EITHER: get your [OpenAI API Key](https://platform.openai.com/account/api-keys)

In [None]:
import os
import openai
import sys
sys.path.append('../..')

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.environ['OPENAI_API_KEY']

## OR: use [LocalAI as an OpenAI replacement](https://localai.io/howtos/easy-request-openai/)

In [1]:
import os
import openai

# Specify the port your LocalAI docker container runs on
openai.api_base = "http://localhost:8080/v1"  # default
openai.api_key = "sx-xxx"
OPENAI_API_KEY = "sx-xxx"
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

In [2]:
# Specify the model you are using
# llm_model = ""
llm_model = "lunademo"
# llm_model = "thebloke__wizardlm-13b-v1-0-uncensored-superhot-8k-ggml__wizardlm-13b-v1.0-superhot-8k.ggmlv3.q4_k_m.bin"
# llm_model = "llama2-7b-uncensored"
# llm_model = "mistral-7b"

### Similarity Search
- the problem: similarity search retrieves the parts that look most relevant, but it can omit parts that do not seem relevant yet carry important information
- here: the mushroom is actually poisonous, but a plain similarity search for the query below can miss this information

In [3]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
persist_directory = 'docs/chroma/'

In [4]:
embedding = OpenAIEmbeddings()

In [5]:
vectordb = Chroma(
    persist_directory=persist_directory,
    embedding_function=embedding
)

print(vectordb._collection.count())

287


In [6]:
texts = [
    """The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).""",
    """A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.""",
    """A. phalloides, a.k.a Death Cap, is one of the most poisonous of all known mushrooms.""",
]

In [7]:
smalldb = Chroma.from_texts(texts, embedding=embedding)

In [8]:
question = "Tell me about all-white mushrooms with large fruiting bodies"

In [9]:
smalldb.similarity_search(question, k=2)  # k=2, retrieve only 2 most relevant documents

[Document(page_content='The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).'),
 Document(page_content='A. phalloides, a.k.a Death Cap, is one of the most poisonous of all known mushrooms.')]

**Now we use MMR (maximum marginal relevance) instead of simple similarity search**
- we keep `k=2` so that 2 documents are returned, but set `fetch_k=3` so that all 3 documents are fetched as candidates before MMR picks the final 2

In [10]:
smalldb.max_marginal_relevance_search(question,k=2, fetch_k=3)

[Document(page_content='A mushroom with a large fruiting body is the Amanita phalloides. Some varieties are all-white.'),
 Document(page_content='The Amanita phalloides has a large and imposing epigeous (aboveground) fruiting body (basidiocarp).')]

### Addressing Diversity: Maximum marginal relevance

Last class we introduced one problem: how to enforce diversity in the search results.
 
`Maximum marginal relevance` strives to achieve both relevance to the query *and diversity* among the results.
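
To make the relevance/diversity trade-off concrete, here is a minimal pure-Python sketch of greedy MMR selection. It is an illustration under stated assumptions, not LangChain's implementation: the helper names (`cosine`, `mmr_select`) and the toy 3-dimensional "embeddings" are made up for this example. `lambda_mult` plays the same role as the parameter of that name in `max_marginal_relevance_search`: 1.0 means pure relevance, and lower values favor diversity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k document indices, balancing relevance and diversity."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: similarity to the closest already-selected document
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical toy embeddings: docs 0 and 1 are near-duplicates, doc 2 differs
query = [1.0, 0.0, 0.0]
docs = [[0.9, 0.1, 0.0], [0.9, 0.12, 0.0], [0.7, 0.0, 0.7]]

print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # relevance only
print(mmr_select(query, docs, k=2, lambda_mult=0.3))  # diversity favored
```

With `lambda_mult=1.0` the two near-duplicates are returned. With a lower value the second pick is penalized for being similar to the first, so the less similar document is chosen instead.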

In [11]:
question = "what did they say about matlab?"
docs_ss = vectordb.similarity_search(question,k=3)

In [12]:
docs_ss[0].page_content[:100]

'than any that you may choose to do yourself in your project. So all the homeworks can be \ndone in MA'

In [13]:
docs_ss[1].page_content[:100]

'of the tumors, like clump thic kness, uniformity of cell size, uniformity of cell shape, \n[inaudible'

**Note the difference in results with `MMR`**
- second document is different now

In [14]:
docs_mmr = vectordb.max_marginal_relevance_search(question,k=3)

In [15]:
docs_mmr[0].page_content[:100]

'than any that you may choose to do yourself in your project. So all the homeworks can be \ndone in MA'

In [16]:
docs_mmr[1].page_content[:100]

"alter medical practice as a result of me dical knowledge that's derived by applying \nlearning algori"

### Addressing Specificity: working with metadata

**In the last lecture, we showed that a question about the third lecture can include results from other lectures as well**

To address this, many vectorstores support operations on `metadata`.

`metadata` provides context for each embedded chunk.

- we use a `filter` here for metadata

In [20]:
question = "what did they say about regression in the third lecture?"

In [21]:
docs = vectordb.similarity_search(
    question,
    k=3,
    filter={"source":"docs/cs229_lectures/MachineLearning-Lecture03.pdf"}
)

In [22]:
for d in docs:
    print(d.metadata)
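
Conceptually, the metadata filter used above restricts the candidate chunks to those whose metadata matches before they are ranked. The following is a small sketch of that idea, not how Chroma implements it: the chunk texts, the word-overlap scoring, and the helper names (`matches`, `overlap_score`, `filtered_search`) are all assumptions made for illustration.

```python
# Hypothetical chunks; only the "source" paths mirror the ones used in this notebook
LECTURE_DIR = "docs/cs229_lectures/"
chunks = [
    {"text": "regression fits a line through housing prices",
     "metadata": {"source": LECTURE_DIR + "MachineLearning-Lecture02.pdf", "page": 4}},
    {"text": "locally weighted regression reweights nearby points",
     "metadata": {"source": LECTURE_DIR + "MachineLearning-Lecture03.pdf", "page": 2}},
    {"text": "logistic regression is used for classification",
     "metadata": {"source": LECTURE_DIR + "MachineLearning-Lecture03.pdf", "page": 9}},
    {"text": "matlab is used for the homework",
     "metadata": {"source": LECTURE_DIR + "MachineLearning-Lecture01.pdf", "page": 8}},
]

def matches(metadata, metadata_filter):
    """True if every key/value pair in the filter is present in the metadata."""
    return all(metadata.get(key) == value for key, value in metadata_filter.items())

def overlap_score(question, text):
    """Toy relevance score: number of shared words (stands in for embeddings)."""
    return len(set(question.lower().split()) & set(text.lower().split()))

def filtered_search(question, chunks, metadata_filter, k=3):
    """Keep only chunks matching the filter, then rank them by the toy score."""
    kept = [c for c in chunks if matches(c["metadata"], metadata_filter)]
    kept.sort(key=lambda c: overlap_score(question, c["text"]), reverse=True)
    return kept[:k]

results = filtered_search(
    "what did they say about regression",
    chunks,
    {"source": LECTURE_DIR + "MachineLearning-Lecture03.pdf"},
)
for r in results:
    print(r["metadata"])
```

Because the filter is applied before ranking, a chunk from another lecture cannot appear in the results, however similar its text is to the question.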

### Addressing Specificity: working with metadata using self-query retriever
- Instead of specifying the filter manually, we describe the metadata fields to an LLM and let it build the filter for us.

But we have an interesting challenge: we often want to infer the metadata from the query itself.

To address this, we can use `SelfQueryRetriever`, which uses an LLM to extract:
 
1. The `query` string to use for vector search
2. A metadata filter to pass in as well

Most vector databases support metadata filters, so this doesn't require any new databases or indexes.
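
The two outputs listed above can be pictured with a toy stand-in. In the sketch below, a regular expression plays the role of the LLM; this is an assumption made purely for illustration, and `SelfQueryRetriever` does not work this way internally. The function name `toy_self_query` and the `ORDINALS` table are hypothetical.

```python
import re

# Hypothetical mapping from ordinal words to lecture numbers
ORDINALS = {"first": 1, "second": 2, "third": 3}

def toy_self_query(question):
    """Split a question into (query string, metadata filter or None)."""
    match = re.search(
        r"\bin the (first|second|third) lecture\b", question, flags=re.IGNORECASE
    )
    if not match:
        # Nothing to infer: search with the full question and no filter
        return question, None
    number = ORDINALS[match.group(1).lower()]
    source = f"docs/cs229_lectures/MachineLearning-Lecture{number:02d}.pdf"
    # Remove the part that was turned into a filter from the search query
    query = question[:match.start()] + question[match.end():]
    query = re.sub(r"\s+", " ", query).strip(" ?")
    return query, {"source": source}

query, metadata_filter = toy_self_query(
    "what did they say about regression in the third lecture?"
)
print(query)
print(metadata_filter)
```

The resulting pair has the same shape as the arguments of a filtered `similarity_search` call: a query string for the vector search and a filter dictionary for the metadata.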

In [23]:
from langchain.llms import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

In [24]:
metadata_field_info = [
    AttributeInfo(
        name="source",
        description="The lecture the chunk is from, should be one of `docs/cs229_lectures/MachineLearning-Lecture01.pdf`, `docs/cs229_lectures/MachineLearning-Lecture02.pdf`, or `docs/cs229_lectures/MachineLearning-Lecture03.pdf`",
        type="string",
    ),
    AttributeInfo(
        name="page",
        description="The page from the lecture",
        type="integer",
    ),
]

In [26]:
document_content_description = "Lecture notes"
llm = OpenAI(
    model = "lunademo",
    temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectordb,
    document_content_description,
    metadata_field_info,
    verbose=True
)

In [27]:
question = "what did they say about regression in the third lecture?"

**You will receive a warning** about `predict_and_parse` being deprecated the first time you execute the next line. This can be safely ignored.

In [None]:
docs = retriever.get_relevant_documents(question)

In [None]:
for d in docs:
    print(d.metadata)

### Additional tricks: compression

Another approach for improving the quality of retrieved docs is compression.

Information most relevant to a query may be buried in a document with a lot of irrelevant text. 

Passing that full document through your application can lead to more expensive LLM calls and poorer responses.

Contextual compression is meant to fix this. 
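
As a rough picture of what compression does, the sketch below keeps only the sentences of a document that share a keyword with the question. This keyword heuristic is a deliberately simple stand-in for the LLM-based extraction and is not how `LLMChainExtractor` works. The stopword list, the helper names, and the sample text are all made up for illustration.

```python
import re

# Hypothetical, tiny stopword list for this example only
STOPWORDS = {"what", "did", "they", "say", "about", "the", "a", "in", "is", "to", "of"}

def keywords(text):
    """Lowercased words of the text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def toy_compress(document, question):
    """Keep only sentences that share at least one keyword with the question."""
    wanted = keywords(question)
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return " ".join(s for s in sentences if keywords(s) & wanted)

# Made-up document text, not taken from the lecture transcripts
doc = (
    "The class has weekly homework. "
    "Most of the homeworks can be done in MATLAB. "
    "Office hours are on Friday."
)
compressed = toy_compress(doc, "what did they say about matlab?")
print(compressed)
```

Only the compressed text would then be passed on to the final LLM call, which keeps the prompt short and focused on the question.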

In [30]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

**We define a custom function here to pretty print the docs**
- they're often very long and difficult to print

In [31]:
def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

In [32]:
# Create the LLM-based compressor; the vectorstore retriever is wrapped in the next cell
llm = OpenAI(
    model = "lunademo",
    temperature=0)
compressor = LLMChainExtractor.from_llm(llm)

In [33]:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever()
)

In [34]:
question = "what did they say about matlab?"
compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)

Document 1:

"knowledge of C or Java specifically."
----------------------------------------------------------------------------------------------------
Document 2:

- "what if your data doesn't lie in a two-dimensional or three-dimensional or even a finite dimensional space, but is it possible - what if your data actually lies in an infinite dimensional space?" - this sentence explains that the context is about medical data and the possibility of it being infinite dimensional.
----------------------------------------------------------------------------------------------------
Document 3:

- "there won't be video you can safely sit there and make faces  at me, and that won't show, okay?"
- "On the third page, there's a section that says Online Resources."
----------------------------------------------------------------------------------------------------
Document 4:

- "MATLAB is I guess part of the programming language that makes it very easy to write codes using matrices, to write co

## Combining various techniques

In [35]:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever(search_type = "mmr")
)

In [36]:
question = "what did they say about matlab?"
compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)

Document 1:

"knowledge of C or Java specifically."
----------------------------------------------------------------------------------------------------
Document 2:

- "most of you probably use learning algorithms" (relevant to the question about whether the asker uses learning algorithms in their work)
- "every time you send US mail, you are using a learning algorithm" (relevant to the question about whether the asker is aware of using learning algorithms in their work)
----------------------------------------------------------------------------------------------------
Document 3:

>>> "So, for example, this is something that my students and I work on. If I give you the keys to the city, you will have to make a sequence of decisions over time."
----------------------------------------------------------------------------------------------------
Document 4:

going to look at the size of the tumor and depe nding on the size of the tumor, we'll try to  figure out whether or not the tu mor

## Other types of retrieval

It's worth noting that a vector database is not the only kind of tool for retrieving documents.

The `LangChain` retriever abstraction includes other ways to retrieve documents, such as TF-IDF or SVM.
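
To show what TF-IDF retrieval does without any embeddings, here is a minimal pure-Python sketch. It is not LangChain's `TFIDFRetriever`: the smoothed IDF formula, the helper names, and the three sample texts are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_tfidf(texts):
    """Return (idf table, list of sparse TF-IDF vectors as dicts)."""
    tokenized = [tokenize(t) for t in texts]
    n = len(texts)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (one common variant); rarer terms get larger weights
    idf = {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return idf, vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def tfidf_search(question, texts, k=1):
    """Rank texts by TF-IDF cosine similarity to the question."""
    idf, vectors = build_tfidf(texts)
    # Terms of the question that never occur in the texts are ignored
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(question)).items() if t in idf}
    ranked = sorted(range(len(texts)), key=lambda i: cosine(q_vec, vectors[i]), reverse=True)
    return [texts[i] for i in ranked[:k]]

# Made-up splits standing in for chunks of a lecture transcript
sample_splits = [
    "Homework can be done in MATLAB or Octave.",
    "Linear regression fits a line to data.",
    "The robot dog learned to walk.",
]
top = tfidf_search("what did they say about matlab?", sample_splits, k=1)
print(top)
```

Unlike the vectorstore, this approach matches on exact terms only, so it finds chunks that literally mention a word from the question but has no notion of synonyms or paraphrases.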

In [37]:
from langchain.retrievers import SVMRetriever
from langchain.retrievers import TFIDFRetriever
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [39]:
# Load PDF
loader = PyPDFLoader("MachineLearning-Lecture01.pdf")
pages = loader.load()
all_page_text=[p.page_content for p in pages]
joined_page_text=" ".join(all_page_text)

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1500,chunk_overlap = 150)
splits = text_splitter.split_text(joined_page_text)


In [41]:
# Retrieve
svm_retriever = SVMRetriever.from_texts(splits,embedding)
tfidf_retriever = TFIDFRetriever.from_texts(splits)

In [42]:
question = "What are major topics for this class?"
docs_svm=svm_retriever.get_relevant_documents(question)
docs_svm[0]



Document(page_content="to things in web crawling and so on. But it's just  cool to show videos, so let me just show \na bunch of them. This learning algorithm was actually implemented by our head TA, Zico, of programming a four-legged dog. I guess Sam Shriver in this class also worked \non the project and Peter Renfrew and Mike and a few others. But I guess this really is a \ngood dog/bad dog since it's a robot dog.  \nThe second video on the right, some of the st udents, I guess Peter, Zico, Tonca working \non a robotic snake, again using learning algorith ms to teach a snake robot to climb over \nobstacles.  \nBelow that, this is kind of a fun example.  Ashutosh Saxena and Jeff Michaels used \nlearning algorithms to teach a car how to  drive at reasonably high speeds off roads \navoiding obstacles.  \nAnd on the lower right, that's a robot program med by PhD student Eva Roshen to teach a \nsort of somewhat strangely configured robot how to get on top of an obstacle, how to get \nover

In [43]:
question = "what did they say about matlab?"
docs_tfidf=tfidf_retriever.get_relevant_documents(question)
docs_tfidf[0]

Document(page_content="Saxena and Min Sun here did, wh ich is given an image like this, right? This is actually a \npicture taken of the Stanford campus. You can apply that sort of cl ustering algorithm and \ngroup the picture into regions. Let me actually blow that up so that you can see it more \nclearly. Okay. So in the middle, you see the lines sort of groupi ng the image together, \ngrouping the image into [inaudible] regions.  \nAnd what Ashutosh and Min did was they then  applied the learning algorithm to say can \nwe take this clustering and us e it to build a 3D model of the world? And so using the \nclustering, they then had a lear ning algorithm try to learn what the 3D structure of the \nworld looks like so that they could come up with a 3D model that you can sort of fly \nthrough, okay? Although many people used to th ink it's not possible to take a single \nimage and build a 3D model, but using a lear ning algorithm and that sort of clustering \nalgorithm is the first ste