# RAG Demo
<br>
The code in this notebook was adapted from LangChain's simple <a href='https://python.langchain.com/docs/tutorials/rag/' target='_blank'>RAG application walkthrough</a> and <a href='https://huggingface.co/spaces/cboettig/streamlit-demo/blob/main/pages/rag.py' target='_blank'>Professor Boettiger's Streamlit RAG demo</a>.  
<br>

Before running this notebook, open a terminal and run `pip install -r requirements.txt` to install the necessary packages.

<hr style="border: 5px solid #0D335F;" />
<hr style="border: 2px solid #5FAE5B;" />

# Setting up RAG

This portion of the notebook will walk through the code used to set up our RAG system for the demos.

To run all the code in this section and skip to the demo, click the table of contents icon on the left menu bar. Then right-click the title of this section and choose 'Select and Run Cell(s) for this Heading'. Finally, click the Demos heading to jump to that portion of the notebook. Note that it may take a minute for all the setup cells to finish running.

<hr style="border: 1px solid #5FAE5B;" />
    
## Initial Setup

First we'll set up the chatbot, embedding model, and embedding storage system.

### Ask for your OpenAI API key if you haven't already set one

In [137]:
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    api_key = getpass.getpass("Enter API key for OpenAI: ")
    os.environ["OPENAI_API_KEY"] = api_key
    #TODO: Remove this ^ to avoid messing with their os.environ? Or is it ok?
else:
    api_key = os.environ["OPENAI_API_KEY"]

**TODO:** Replace with some call to a secrets folder for easier use while developing? Make sure this code is safe to distribute (i.e. make sure the fact that me having run the code doesn't mean someone else can come in this notebook and access my API key)
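
One way to address the question above is to read the key without writing it back to `os.environ`. The sketch below is only an assumption about how this could be organized, not a change to the cell above: `resolve_api_key` is a hypothetical helper, and its `env` and `prompt_fn` parameters are introduced here so it can be tried without a real key.

```python
import getpass
import os
from typing import Callable, Mapping, Optional


def resolve_api_key(
    env: Optional[Mapping[str, str]] = None,
    prompt_fn: Callable[[str], str] = getpass.getpass,
    var_name: str = "OPENAI_API_KEY",
) -> str:
    """Return the API key from the environment, or prompt for it.

    Hypothetical helper: it never writes to os.environ, so the key lives
    only in the Python variable that receives the return value.
    """
    env = os.environ if env is None else env
    key = env.get(var_name)
    if key:
        return key
    # Nothing set in the environment, so ask the user (input is not echoed).
    key = prompt_fn(f"Enter a value for {var_name}: ").strip()
    if not key:
        raise ValueError("No API key provided.")
    return key


# Example with a stand-in environment and prompt, so no real key is needed.
demo_key = resolve_api_key(env={}, prompt_fn=lambda _message: "sk-demo")
```

In the notebook itself this would be called as `api_key = resolve_api_key()`. Either way, avoid printing the key in a cell, since cell outputs are saved with the notebook.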

### Set up the chatbot's language model

In [138]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="llama3", api_key=api_key, base_url="https://llm.nrp-nautilus.io", temperature=0)

### Set up the embedding model
**TODO:** [insert one-sentence explanation of embeddings, and link to further explanation]

In [171]:
#Set up the embedding model
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model = "embed-mistral", 
    api_key = api_key, 
    base_url = "https://llm.nrp-nautilus.io")
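
To make the idea of an embedding concrete, here is a toy sketch. It is not how `embed-mistral` works: `toy_embed` is a hypothetical word-count embedding over a tiny made-up vocabulary, used only to show that texts become vectors and that related texts end up with vectors pointing in similar directions.

```python
import math
from collections import Counter

# Tiny made-up vocabulary; a real embedding model learns its own dimensions.
VOCAB = ["land", "conservation", "water", "budget", "tax"]


def toy_embed(text: str) -> list[float]:
    """Map text to a vector of word counts over VOCAB (illustration only)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


question = toy_embed("land conservation")
related = toy_embed("conservation of public land and water")
unrelated = toy_embed("budget and tax")

print(cosine_similarity(question, related))    # the higher of the two scores
print(cosine_similarity(question, unrelated))  # the lower of the two scores
```

A real embedding model captures meaning rather than exact word matches, but the comparison step (scoring how closely two vectors align) follows the same idea.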

### Set up the embedding storage system (AKA the vector store)

In [172]:
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)

### Initial Setup Complete!

<hr style="border: 1px solid #5FAE5B;" />

## Data Processing Pipeline (Indexing)
Here's where we start processing the textual data in the document(s) we want our chatbot to use when answering our questions. In our case, this will involve 3 steps: 

1. Load the document(s)    
2. Split the document(s) into smaller pieces  
3. Produce vectors representing these smaller pieces, and use those vectors to organize our pieces in a database

If we want to change the document(s) our chatbot is using, we'll have to add the new documents and run through this part of the process again (hence the name 'pipeline').

### Load the document(s)
This code allows us to load the textual data from PDFs into a format that we can work with. You can also load HTML files directly from the web by following the steps described in 
<a href='https://python.langchain.com/docs/tutorials/rag/#loading-documents' target='_blank'>the 'loading documents' portion of the RAG application walkthrough</a>.

In [141]:
from langchain_community.document_loaders import PyPDFLoader, PyPDFDirectoryLoader

def pdf_loader(url):
    """
    Loads the PDF at the given url.

    Args:
        url (str): the url to the PDF you want to load

    Returns: A list of documents, one per page of the PDF, each containing
        that page's text data (and metadata).
    """
    loader = PyPDFLoader(url)
    return loader.load()

In [142]:
docs = pdf_loader('https://canature.maps.arcgis.com/sharing/rest/content/items/8da9faef231c4e31b651ae6dff95254e/data')

To load multiple PDFs: put all the PDFs in a folder, uncomment the last line of the cell below, paste in the path to your folder, and then run the cell.

In [143]:
def multiple_pdf_loader(folder_path):
    """
    Loads all PDFs in the specified folder.

    Args:
        folder_path (str): path to the folder containing all the PDFs you want to load.

    Returns: A list of documents, one per page, covering every PDF in the folder.
    """
    loader = PyPDFDirectoryLoader(folder_path)
    return loader.load()

#An example folder file path would look like: 'C:/Users/evan/Downloads/PDF Folder Name'
#Uncomment the line below and paste in the path to your pdf folder to load multiple PDFs.
#docs = multiple_pdf_loader('paste the path to your folder here')
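
Before pointing the loader at a folder, it can help to check which files the folder actually contains. This is a small sketch using only the standard library; `list_pdfs` is a hypothetical helper, and it is independent of how `PyPDFDirectoryLoader` itself selects files.

```python
from pathlib import Path


def list_pdfs(folder_path: str) -> list[str]:
    """Return the sorted names of the .pdf files directly inside folder_path."""
    folder = Path(folder_path)
    if not folder.is_dir():
        raise NotADirectoryError(f"Not a folder: {folder_path}")
    return sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


# Uncomment and paste in your folder path to see which PDFs would be found.
# print(list_pdfs('paste the path to your folder here'))
```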

**TODO:** Test the multiple_pdf_loader function  
**TODO:** Also should I remove the docstrings? Do they make the code look scarier than it is just because it makes the cell look much bigger than it should be?

### Split the document(s) into bite-sized pieces
This code will take our document(s) and split their text into smaller sub-sections, sometimes referred to as 'chunks'. There are two important parameters to note in the cell below: `chunk_size` and `chunk_overlap`. 

The `chunk_size` parameter sets the maximum number of characters in each chunk. The splitter prefers to break at natural boundaries such as paragraph breaks, so many chunks will be somewhat shorter than the maximum. The `chunk_overlap` parameter sets how many characters (at most) will be shared by any given chunk and the chunk that directly follows it in the text. The importance of `chunk_overlap` is discussed in the article (see breaking mode 1), and will be demonstrated later in this notebook.

You can read more about LangChain's text splitting methods <a href='https://python.langchain.com/docs/how_to/recursive_text_splitter/' target='_blank'>here</a>.

In [173]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # chunk size (characters)
    chunk_overlap=50,  # chunk overlap (characters)
)
all_splits = text_splitter.split_documents(docs)

print(f"Split pdf into {len(all_splits)} sub-documents.")

Split pdf into 171 sub-documents.
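
To see what the two parameters do, here is a simplified fixed-window splitter. It is a sketch, not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph breaks, line breaks, and spaces, so its chunks vary in length). The name `sliding_window_split` is hypothetical.

```python
def sliding_window_split(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters, each starting
    chunk_size - chunk_overlap characters after the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be between 0 and chunk_size - 1")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_window_split(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

With `chunk_overlap=3`, the last three characters of each chunk reappear as the first three characters of the next one, so text that straddles a boundary is never seen only in halves.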


### Add the pieces to the embedding storage system (AKA the vector store)
Under the hood, this code does two things. When we set up the vector store earlier, we told it which embedding model to use. Now, when we add the chunks of our documents to the vector store, it first calls the embedding model to create a vector representation of each chunk. It then stores each chunk alongside its vector under a unique ID (the IDs printed below). This lets us search for relevant pieces of our document(s) later by comparing a question's vector against the stored vectors.

**TODO:** fact check my description of the under-the-hood activities (I think it's true, but that's just because I don't see how else it could work)

In [174]:
document_ids = vector_store.add_documents(documents=all_splits)

assert len(all_splits) == len(document_ids), "len(all_splits) does not match len(document_ids): The vector store likely wasn't reset before adding the new splits"
print(document_ids[:3])

['1b94b0aa-c33c-401c-a0c1-50de1201af4a', '27f2dafa-fc85-444e-82f9-75e8245c7b09', '88e50023-93ee-4fd9-9a42-6d807bbef6a0']
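
The two steps described above (embed each chunk, then store it with its vector) can be sketched with a toy in-memory store. This is an illustration under assumptions, not LangChain's implementation: `ToyVectorStore` and `letter_embed` are hypothetical, and the letter-count embedding stands in for a real embedding model.

```python
import math
import uuid
from typing import Callable


def letter_embed(text: str) -> list[float]:
    """Stand-in embedding: counts of each letter a-z (illustration only)."""
    lowered = text.lower()
    return [float(lowered.count(chr(ord("a") + i))) for i in range(26)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class ToyVectorStore:
    def __init__(self, embed: Callable[[str], list[float]]):
        self.embed = embed
        # Maps each ID to a (text, vector) pair.
        self.records: dict[str, tuple[str, list[float]]] = {}

    def add_texts(self, texts: list[str]) -> list[str]:
        """Embed each text, store (text, vector) under a new ID, return the IDs."""
        ids = []
        for text in texts:
            doc_id = str(uuid.uuid4())
            self.records[doc_id] = (text, self.embed(text))
            ids.append(doc_id)
        return ids

    def similarity_search(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored texts whose vectors best match the query's vector."""
        query_vec = self.embed(query)
        ranked = sorted(
            self.records.values(),
            key=lambda record: cosine(query_vec, record[1]),
            reverse=True,
        )
        return [text for text, _vector in ranked[:k]]


store = ToyVectorStore(letter_embed)
ids = store.add_texts(["zzz", "conserve land", "land conservation"])
print(len(ids))
print(store.similarity_search("conservation of land", k=1))
```

Note that adding texts a second time would append new records rather than replace the old ones, which is why a store should be re-created before re-indexing with different settings.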


### Indexing Complete!

At this point we've completed the 'indexing' portion of our set up process. This has involved 3 steps:  

1. **Loading our document(s)**: We used PyPDFLoader to load our PDF(s) into a format we could process using code.
2. **Text Splitting**: We used a text splitter to break our document(s) into smaller pieces that our chatbot will be able to more easily digest.  
3. **Add chunks to our vector storage system**: We used an embedding model to represent the pieces of our document(s) as vectors. Using the vector embeddings we just made, we organized the pieces of our document(s) in a database.

Next we will set up a 'retriever' which will use this organized database to retrieve relevant pieces of our document(s) based on the user's question.

<hr style="border: 1px solid #5FAE5B;" />

## Retrieval and Generation

**TODO** insert Description of what this section builds

Note: The instructions in the Retrieval and Generation portion of LangChain's RAG demo, <a href='https://python.langchain.com/docs/tutorials/rag/#orchestration' target='_blank'>linked here</a>, use code that is more conducive to future modifications and integration into larger systems. That flexible code is likely preferable when developing a real RAG application, but it is more complex than necessary for demonstration purposes. In this demo, we'll take a more 'quick and dirty' approach.

### Build the retriever

We'll build a retriever that takes in a user's question, searches our vector store for semantically similar chunks, and returns those passages.

In [175]:
retriever = vector_store.as_retriever()

### Make a template for the prompt we'll pass to our chatbot

We'll do this using LangChain's ChatPromptTemplate (<a href='https://python.langchain.com/api_reference/core/prompts/langchain_core.prompts.chat.ChatPromptTemplate.html' target='_blank'>documentation linked here</a>).

In [167]:
from langchain_core.prompts import ChatPromptTemplate
system_prompt_template = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "Context: {context}"
)

prompt = ChatPromptTemplate(
    [
        ("system", system_prompt_template),
        ("human", "Question: {input}"),
    ]
)

example_prompt = prompt.invoke(
    {"context": "[I'll put the context here!]", "input": "[I'll put the user's question here!]"}
).to_messages()

print(example_prompt[0].content)
print(example_prompt[1].content)

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, say that you don't know. Use three sentences maximum and keep the answer concise.

Context: [I'll put the context here!]
Question: [I'll put the user's question here!]


### Build a way for the user to query the RAG Chatbot

In [148]:
### This code answered the question correctly, without any changes to the chunk size or other parameters ###
### However, it gave me no way to see which parts of the original document were referenced ###
### I decided to pivot to code closer to Professor Boettiger's, so we can inspect the retriever's actions ###

# def ask_RAG(question: str):
#     # Search the vector store for semantically related chunks;
#     # these are returned as LangChain Document objects
#     relevant_chunks = vector_store.similarity_search(question)
#
#     # chunk.page_content gets each chunk's text;
#     # '\n\n'.join(...) builds a string with two newlines between chunks
#     context_str = '\n\n'.join(chunk.page_content for chunk in relevant_chunks)
#
#     # Build the prompt from the context string and the user's question
#     # (the keys must match the {context} and {input} placeholders in the template)
#     prompt_with_context = prompt.invoke(
#         {'context': context_str, 'input': question}
#     )
#
#     return llm.invoke(prompt_with_context).content

In [176]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
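
Conceptually, the chain does three things: it retrieves chunks for the question, 'stuffs' their text into the prompt's `{context}` slot, and sends the filled prompt to the language model. The sketch below mimics that flow with stand-ins so it runs without an API key. `toy_rag_chain`, `fake_retriever`, and `fake_llm` are hypothetical; only the `input`, `context`, and `answer` keys mirror what `rag_chain.invoke` returns in this notebook.

```python
from typing import Callable

PROMPT_TEMPLATE = (
    "Use the following pieces of retrieved context to answer the question.\n\n"
    "Context: {context}\n"
    "Question: {input}"
)


def toy_rag_chain(
    question: str,
    retriever: Callable[[str], list[str]],
    llm: Callable[[str], str],
) -> dict:
    """Retrieve chunks, stuff them into the prompt, and ask the model."""
    chunks = retriever(question)
    context = "\n\n".join(chunks)  # blank line between chunks
    filled_prompt = PROMPT_TEMPLATE.format(context=context, input=question)
    # The real chain returns Document objects under 'context';
    # plain strings are used here to keep the sketch small.
    return {"input": question, "context": chunks, "answer": llm(filled_prompt)}


def fake_retriever(question: str) -> list[str]:
    # Stand-in: always returns the same two "chunks".
    return [
        "Pathway 1: Accelerate Regionally Led Conservation",
        "Pathway 2: Execute Strategic Land Acquisitions",
    ]


def fake_llm(prompt_text: str) -> str:
    # Stand-in: reports how much text it received instead of answering.
    return f"(model received {len(prompt_text)} characters)"


result = toy_rag_chain("What are the pathways?", fake_retriever, fake_llm)
print(result["answer"])
print(result["context"])
```

Returning the retrieved chunks alongside the answer is what makes it possible to inspect which passages the model was shown.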

### NOTES!!!

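Re-running the document splitter without clearing the vector store can produce misleading results: the chunks from the earlier run stay in the store alongside the new ones, so the changes to the document splits won't be cleanly reflected in what gets retrieved. To reset, re-run the cell that creates `vector_store`, then the cells after it that add the splits and build `retriever` and `rag_chain`.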

**Gave correct answer:** chunk size: 1000, chunk overlap: 200, 100, 50  
**Gave half answer:** chunk size: 500, chunk overlap: 50

In [177]:
results = rag_chain.invoke({'input': 'What are the 10 pathways to 30x30?'})
print(results['answer'])

The 10 pathways to achieve 30x30 are: 

1. Accelerate Regionally Led Conservation
2. Execute Strategic Land Acquisitions
3. Increase Voluntary Conservation Easements
4. Enhance Conservation of Existing Public Lands and Coastal Waters
5. Institutionalize Advance Mitigation
6. Expand and Accelerate Environmental Restoration and Stewardship
7. Strengthen Coordination Among Governments
8. Align Investments to Maximize Conservation Benefits
9. Advance and Promote Complementary Conservation Measures
10. Evaluate Conservation Outcomes and Adaptively Manage


In [183]:
show_context = False
# Change show_context to True if you want to see the passages used to produce this answer

if show_context:
    for context in results['context']:
        print(context.metadata, '\n')
        print(context.page_content, '\n')

<br>
<hr style="border: 5px solid #0D335F;" />
<hr style="border: 2px solid #5FAE5B;" />

# Demos

<hr style="border: 1px solid #5FAE5B;" />

## Breaking mode 1

<hr style="border: 1px solid #5FAE5B;" />

## Breaking mode 2

<br>
<hr style='border: 3px solid #0D335F;' />
<hr style='border: 1px solid #5FAE5B;' />

# Sources
This is a collection of all the links inserted throughout the notebook.

<a href='https://python.langchain.com/docs/tutorials/rag/' target='_blank'>LangChain RAG tutorial</a>  
<a href='https://huggingface.co/spaces/cboettig/streamlit-demo/blob/main/pages/rag.py' target='_blank'>Professor Boettiger's Streamlit RAG Demo</a>  
<a href='https://python.langchain.com/docs/how_to/recursive_text_splitter/' target='_blank'>Recursive Text Splitter Documentation</a>

### Helpful Resources

<a href='https://python.langchain.com/docs/tutorials/' target='_blank'>LangChain's Tutorials page</a>

### Dump:

**Breaking mode 1:**  
Higher chunk overlap increases the chance that, if one chunk is deemed relevant to the prompt, the chunks surrounding it will also be seen as relevant. In effect, this encourages the RAG model to read more of the context surrounding the chunk where it believes an answer is located. The downside of high chunk overlap is increased computational intensity, since higher overlap means there will be more chunks.
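
The cost side of this trade-off can be estimated with a little arithmetic. Assuming fixed-size windows (a simplification, since the notebook's splitter produces variable-length chunks), a text of `n` characters split with chunk size `s` and overlap `o` needs about `ceil((n - s) / (s - o)) + 1` chunks. `estimate_chunk_count` is a hypothetical helper, and the 10,000-character text length is a made-up example value.

```python
import math


def estimate_chunk_count(text_length: int, chunk_size: int, chunk_overlap: int) -> int:
    """Estimate how many fixed-size windows are needed to cover the text."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be between 0 and chunk_size - 1")
    if text_length <= chunk_size:
        return 1 if text_length > 0 else 0
    step = chunk_size - chunk_overlap  # how far each new window advances
    return math.ceil((text_length - chunk_size) / step) + 1


for overlap in (0, 50, 200, 500):
    count = estimate_chunk_count(10_000, chunk_size=1000, chunk_overlap=overlap)
    print(f"overlap={overlap}: about {count} chunks")
```

Under these assumptions the chunk count grows slowly for small overlaps and roughly doubles once the overlap reaches half the chunk size, and every extra chunk is one more piece of text to embed and store.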

### Notes

Assume that people have read the full paper, so avoid being over-redundant. Explaining technicalities of code is good. Repeating some stuff from the paper is ok, just avoid being too redundant.

### Questions:

How should we set up the notebook so users can conveniently enter their OpenAI API key?

What do we think about the blue and green horizontal lines? Are there tweaks we could make that would be better?

I assume the actual notebook shouldn't have the %pip install cells right?  
And how do I add the -U in requirements.txt (or I guess just the U since I assume the q just means don't fill the screen with text)?  
