# Retrieval Augmented Generation (RAG)

In the previous two recipes, we learned how to [obtain an embedding](./10-embeddings.ipynb), as well as how to find the [similarity](./11-similarity-search.ipynb) between two embeddings. Retrieval Augmented Generation (RAG) uses both of these to query an LLM while also providing some documents for additional context. For example, if you want the LLM to answer questions based on specific documents you have, you can feed those documents in as part of the prompt.

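A RAG system is primarily made up of two components: the retriever and the reader. The retriever finds the documents most relevant to a query, and the reader (here, the LLM) generates an answer from them.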

In [15]:
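# Document is LangChain's container for a piece of text plus its metadata
from langchain_core.documents import Document
# Reranker from langchain_dartmouth that orders documents by relevance to a query
from langchain_dartmouth.retrievers.document_compressors import DartmouthReranker
# Loaders for reading files from disk (these now live in langchain_community)
from langchain_community.document_loaders import DirectoryLoader, TextLoader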

Given a **collection** of documents, we are interested in finding the ones most relevant to our question. Once a query is posed, we can call the `compress_documents()` method of the `DartmouthReranker` class, which builds on the concept of [similarity](./11-similarity-search.ipynb), to return the documents ranked by their relevance to the query.
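
To make the idea of ranking by relevance concrete, here is a minimal sketch that uses only the standard library. It is not how `DartmouthReranker` works internally: a real reranker scores relevance with a trained model, whereas this toy version uses simple word overlap (Jaccard similarity). The function names `tokenize` and `toy_rerank` are hypothetical and exist only for this illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_rerank(query: str, documents: list[str]) -> list[str]:
    """Return the documents sorted from most to least relevant to the query.

    Relevance is Jaccard word overlap, a crude stand-in (an assumption of
    this sketch) for the score a real reranking model would compute.
    """
    query_words = tokenize(query)

    def score(document: str) -> float:
        doc_words = tokenize(document)
        union = query_words | doc_words
        return len(query_words & doc_words) / len(union) if union else 0.0

    return sorted(documents, key=score, reverse=True)


# Made-up example documents, not the contents of the recipe's files
documents = [
    "Hot sauce is made from chili peppers.",
    "Asteroids rarely hit people on Earth.",
    "The history of the printing press.",
]
ranked = toy_rerank("Are asteroids going to hit me?", documents)
print(ranked[0])
```

Only the asteroid sentence shares words with the query, so it is ranked first. Word overlap breaks down quickly on paraphrases and synonyms, which is one reason to prefer a model-based reranker in practice.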



## A Simple RAG

At its heart, RAG is akin to plugging in a chunk of text that we want the LLM to reference in its answer. An example of this can be found below.

In [16]:
from langchain_dartmouth.llms import ChatDartmouth

llm = ChatDartmouth(model_name="llama-3-1-8b-instruct")

relevant_document = "Asteroids do not generally hit people. There is a 1 in 7 billion chance for that to happen."
query = "Are asteroids going to hit me?"

# Separate context, instruction, and question with spaces so they do not run together
response = llm.invoke(relevant_document + " Considering this, answer the following question: " + query)
print(response.content)

Given the 1 in 7 billion chance of an asteroid hitting a person, the likelihood of an asteroid impacting you specifically is extremely low. 

To put it into perspective, you are more likely to win the lottery or be struck by lightning multiple times than be hit by an asteroid.
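
Plain string concatenation with `+` does not add any whitespace between the pieces, so the context, the instruction, and the question can easily run together. One way to keep the prompt readable is a small helper that makes the separators explicit. This is only a sketch: the name `build_rag_prompt` is hypothetical, and the layout of the prompt is an assumption rather than a format that the model requires.

```python
def build_rag_prompt(context: str, query: str) -> str:
    """Combine retrieved context and a question into a single prompt string."""
    return (
        f"{context.strip()}\n\n"
        "Considering this, answer the following question:\n"
        f"{query.strip()}"
    )


prompt = build_rag_prompt(
    context="Asteroids do not generally hit people.",
    query="Are asteroids going to hit me?",
)
print(prompt)
```

The resulting string can be passed to `llm.invoke()` in place of the concatenated prompt.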


## Reranking Documents

That was simple! What if we have many documents? In that case we can use the `DartmouthReranker` to find which document is the most relevant. 

<div class="alert alert-info">

**Note:** `DirectoryLoader` is a LangChain class that accepts a directory, a glob pattern, and a [loader class](./10-embeddings.ipynb). It's an easy way to load several documents that are in a directory at once.
</div>
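
To see which files a glob pattern such as `**/*.txt` selects, the sketch below builds a throwaway directory and applies the same pattern with `pathlib` from the standard library. It assumes that the loader interprets the pattern the way `pathlib` does. The file names are made up for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes").mkdir()
    # Hypothetical files: two text files and one file of another type
    (root / "asteroids.txt").write_text("top-level text file")
    (root / "notes" / "history.txt").write_text("nested text file")
    (root / "notes" / "image.png").write_bytes(b"not a text file")

    # "**/*.txt" matches .txt files in the root and in any subdirectory
    matches = sorted(
        path.relative_to(root).as_posix() for path in root.glob("**/*.txt")
    )

print(matches)
```

The `**` part of the pattern is what makes the search recursive. A pattern of just `*.txt` would only pick up text files at the top level of the directory.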

In [17]:
# Load the documents
loader = DirectoryLoader('./rag_documents', glob="**/*.txt", loader_cls=TextLoader)
collection = loader.load()

In [18]:
reranker = DartmouthReranker()

query = "Are asteroids going to hit me?"

ranked_docs = reranker.compress_documents(query=query, documents=collection)

for doc in ranked_docs:
    print(doc.metadata['source'])

rag_documents\asteroids.txt
rag_documents\hot_sauce.txt
rag_documents\history.txt


We can see that when our query is related to asteroids, the reranker correctly ranks `asteroids.txt` as the most relevant document. Now we can prepend the content of the top-ranked document to our query to get a response that considers the information in the document.

In [19]:
from langchain_dartmouth.llms import ChatDartmouth

llm = ChatDartmouth(model_name="llama-3-1-8b-instruct")
# Prepend the top-ranked document to the question, separated by spaces
response = llm.invoke(ranked_docs[0].page_content + " Considering this, answer the following question: " + query)
print(response.content)

The text states that "Asteroids are very likely to not hit you. The chance is 1 in 1 billion." This is a fake estimate made for demonstration purposes, implying that asteroids are extremely unlikely to hit you.


## Using Vector Stores
LangChain offers a useful feature called vector stores, which lets us store and search many embedded documents in one go. The best part is that we don't need to prepend an entire document: we can break the documents into smaller chunks and use only the most relevant parts. You can learn more about vector stores [here](https://python.langchain.com/docs/how_to/#vector-stores). A vector store also embeds the documents we add to it (via `embed_documents`), so we don't have to do that ourselves. By using the `CharacterTextSplitter`, we can define our **chunk_size**, which lets us avoid the batch size issue we saw in [similarity-search](./11-similarity-search.ipynb). We can then add these chunks to our vector store.

In [24]:
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_dartmouth.embeddings import DartmouthEmbeddings
from langchain_text_splitters import CharacterTextSplitter

# Initialize the text splitter with appropriate chunk size
text_splitter = CharacterTextSplitter(separator=" ", chunk_size=100, chunk_overlap=50, length_function=len)

# Load and split the documents
collection = loader.load()
chunks = text_splitter.split_documents(collection)

# Initialize vector store and add chunks
vector_store = InMemoryVectorStore(embedding=DartmouthEmbeddings())
# Assign the returned IDs to a throwaway name so the notebook does not print them
_ = vector_store.add_documents(chunks)
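
To build some intuition for what `chunk_size` and `chunk_overlap` do, here is a minimal sketch of chunking with overlap in plain Python. It is not LangChain's implementation, and its chunk boundaries will generally differ from those produced by `CharacterTextSplitter`. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Greedily pack space-separated words into chunks of at most `chunk_size`
    characters, repeating a tail of at most `chunk_overlap` characters at the
    start of the next chunk. A single word longer than `chunk_size` is kept
    whole, so such a chunk can exceed the limit."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    chunks: list[str] = []
    current: list[str] = []  # words in the chunk being built

    for word in text.split():
        if current and len(" ".join(current + [word])) > chunk_size:
            chunks.append(" ".join(current))
            # Drop words from the front until the carried-over tail fits the
            # overlap budget and still leaves room for the incoming word
            while current and (
                len(" ".join(current)) > chunk_overlap
                or len(" ".join(current + [word])) > chunk_size
            ):
                current.pop(0)
        current.append(word)

    if current:
        chunks.append(" ".join(current))
    return chunks


text = "Asteroids are very likely to not hit you. The chance is 1 in 1 billion."
chunks = split_with_overlap(text, chunk_size=30, chunk_overlap=10)
for chunk in chunks:
    print(repr(chunk))
```

Each chunk after the first begins with the last few words of the previous one. The overlap means that a sentence cut at a chunk boundary still appears with some of its surrounding context in at least one chunk.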

Now, when we search for the parts of our documents that are relevant to our question, the `similarity_search` method of the vector store will pinpoint the most relevant chunks. These can then be used as part of the prompt.

<div class="alert alert-info">

**Note:** Below we can see that the retrieved chunks all come from the asteroid document, and that the `page_content` of the top results is from the particular part of the document that discusses the chances of asteroids hitting someone.
</div>


In [26]:
docs = vector_store.similarity_search(query)
docs

[Document(id='49e6ea98-31d0-4acf-beeb-edecd87d8b0d', metadata={'source': 'rag_documents\\asteroids.txt'}, page_content='asteroids, marking a new era in space exploration. \n\nAsteroids are very likely to not hit you. The'),
 Document(id='ba86aba0-8e59-49a9-bf69-e65c621b5336', metadata={'source': 'rag_documents\\asteroids.txt'}, page_content='Asteroids are very likely to not hit you. The chance is 1 in 1 billion. This is a fake estimate'),
 Document(id='b2d699a0-ea3b-461e-a4bd-ab4394948a13', metadata={'source': 'rag_documents\\asteroids.txt'}, page_content='Asteroids**\n\nAsteroids are typically small, with diameters ranging from a few meters to hundreds of'),
 Document(id='878675eb-d58c-4e31-a681-3ed986dce468', metadata={'source': 'rag_documents\\asteroids.txt'}, page_content='implications. By studying asteroids, scientists can gain insights into the formation and evolution')]
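
Conceptually, a similarity search compares the embedding of the query with the embedding of every stored chunk and returns the closest matches. The sketch below illustrates this with cosine similarity and hand-made three-dimensional vectors. The vectors, the example texts, and the function names `cosine_similarity` and `top_k` are all assumptions made for the illustration. Real embeddings come from an embedding model and have many more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored entries most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda entry: cosine_similarity(query_vector, entry[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Made-up "embeddings" for illustration only
store = [
    ("The chance is 1 in 1 billion.", [0.9, 0.1, 0.0]),
    ("Hot sauce is made from chili peppers.", [0.0, 0.2, 0.9]),
    ("Asteroids are typically small.", [0.7, 0.6, 0.1]),
]
query_vector = [1.0, 0.2, 0.0]
results = top_k(query_vector, store, k=2)
print(results)
```

The hot sauce entry points in a very different direction from the query vector, so it is left out of the top two results.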

In [36]:
# Let's get all the relevant content from the documents
relevant_content = ' '.join(doc.page_content for doc in docs)
# Separate context, instruction, and question with spaces so they do not run together
response = llm.invoke(relevant_content + ' Considering this, answer the following question: ' + query)
print(response.content)

The chances of an asteroid hitting you are extremely low. While the exact odds are difficult to estimate, it's generally agreed that the likelihood of being hit by an asteroid is tiny.

To put things into perspective, the estimate you mentioned earlier (1 in 1 billion) is likely an exaggeration. However, even if we consider more realistic estimates, the risk is still extremely low.

For example, NASA estimates that the chances of being hit by a large asteroid (over 100 meters in diameter) are about 1 in 1 million every year. The chances of being hit by a smaller asteroid are even lower.

To give you a better idea, here are some estimates of the annual risk of being hit by an asteroid of different sizes, based on NASA's data:

* Small asteroid (less than 10 meters in diameter): 1 in 100 million
* Medium asteroid (10-100 meters in diameter): 1 in 1 million
* Large asteroid (over 100 meters in diameter): 1 in 1 billion (or even less)

So, to answer your question, the chances of an asteroi

## Summary
In this recipe, we used a query and the `DartmouthReranker` to retrieve relevant documents from a **collection**. The content of the highest-ranked document is then prepended to the prompt sent to an LLM. This is an implementation of retrieval augmented generation. We can also use a vector store to do this, and pinpoint the exact sections of the documents we want to reference using `similarity_search`.