# RAG Demo
<br>
The code in this notebook was adapted from LangChain's simple <a href='https://python.langchain.com/docs/tutorials/rag/' target='_blank'>RAG application walkthrough</a> and <a href='https://huggingface.co/spaces/cboettig/streamlit-demo/blob/main/pages/rag.py' target='_blank'>Professor Boettiger's Streamlit RAG demo</a>.  
<br>

Before running this notebook, open the terminal and run `pip install -r requirements.txt` to install the necessary packages.

<hr style="border: 5px solid #0D335F;" />
<hr style="border: 2px solid #5FAE5B;" />

# Setting up RAG

This portion of the notebook will walk through the code used to set up our RAG system for the demos.

To run all the code in this section and skip to the demo, click the table of contents icon on the left menu bar. Then right click the title of this section, and choose 'Select and Run Cell(s) for this Heading'. Then click the Demos heading to skip to that portion of the notebook. Note that it may take a minute for all the setup cells to finish running.

<hr style="border: 1px solid #5FAE5B;" />
    
## Initial Setup

First, we'll import all the packages we'll need in this notebook. Then we'll set up the chatbot and embedding model.

In [60]:
import getpass
import os
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import PyPDFLoader, PyPDFDirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_core.prompts import ChatPromptTemplate

### Ask for your OpenAI API key if you haven't already set one

In [61]:
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    api_key = getpass.getpass("Enter API key for OpenAI: ")
    os.environ["OPENAI_API_KEY"] = api_key
else:
    api_key = os.environ["OPENAI_API_KEY"]

### Set up the chatbot's language model

In [62]:
from langchain_openai import ChatOpenAI
llama3_llm = ChatOpenAI(model = "llama3", api_key = api_key, base_url = "https://llm.nrp-nautilus.io",  temperature=0)

### Set up the embedding model
An embedding model is a model that, in this case, takes textual data and produces a vector representation (an embedding) of that piece of text. These vector representations allow us to quickly identify semantically similar pieces of text using linear algebra. To read more about embeddings, check out <a href='https://www.ibm.com/think/topics/embedding' target='_blank'>this article from IBM.</a>

In [63]:
#Set up the embedding model
from langchain_openai import OpenAIEmbeddings

mistral_embeddings = OpenAIEmbeddings(
    model = "embed-mistral", 
    api_key = api_key, 
    base_url = "https://llm.nrp-nautilus.io")
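To make "identifying semantically similar text using linear algebra" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The three-dimensional vectors below are made up for illustration; real embedding models return vectors with hundreds or thousands of dimensions, and `cosine_similarity` and `toy_embeddings` are hypothetical names that are not used elsewhere in this notebook.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means 'pointing the same way'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example (not produced by a real model)
toy_embeddings = {
    "conserving coastal waters": [0.9, 0.1, 0.2],
    "protecting ocean habitat": [0.8, 0.2, 0.3],
    "quarterly budget report": [0.1, 0.9, 0.4],
}

query_vector = toy_embeddings["conserving coastal waters"]
for text, vector in toy_embeddings.items():
    print(f"{cosine_similarity(query_vector, vector):.3f}  {text}")
```

In this toy data, the two conservation-related phrases score much closer to each other than either does to the budget phrase, which is the property retrieval relies on.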

### Initial Setup Complete!

<hr style="border: 1px solid #5FAE5B;" />

## Data Processing Pipeline (Indexing)
Here's where we start processing the textual data in the document(s) we want our chatbot to use when answering our questions. In our case, this will involve 3 steps: 

1. Load the document(s)    
2. Split the document(s) into smaller pieces  
3. Produce vectors representing these smaller pieces, and use those vectors to organize our pieces in a database

If we want to change the document(s) our chatbot is using, we'll have to add the new documents and run through this part of the process again (hence the name 'pipeline').

### Load the document(s)
This code allows us to load the textual data from PDFs into a format that we can work with. You can also load html files directly from the web by following the steps described in 
<a href='https://python.langchain.com/docs/tutorials/rag/#loading-documents' target='_blank'>the 'loading documents' portion of the RAG application walkthrough</a>.

In [64]:
from langchain_community.document_loaders import PyPDFLoader, PyPDFDirectoryLoader

def pdf_loader(url):
    """
    Loads the PDF at the given url.

    Args:
        url (str): the url to the PDF you want to load

    Returns: A document containing the text data (and metadata) of the specified PDF.
    """
    loader = PyPDFLoader(url)
    return loader.load()

In [65]:
pathways_to_30x30_url = 'https://canature.maps.arcgis.com/sharing/rest/content/items/8da9faef231c4e31b651ae6dff95254e/data'
docs = pdf_loader(pathways_to_30x30_url)

To load multiple PDFs: put all the PDFs in a folder, place that folder in the same folder as this Jupyter notebook (or any other location where the notebook can access it), uncomment the last line of the cell below, write in the path to your folder, and then run the cell.

In [66]:
def multiple_pdf_loader(folder_path):
    """
    Loads all PDFs in the specified folder.

    Args:
        folder_path (str): path to the folder containing all the PDFs you want to load.

    Returns: A list of documents, each document representing one PDF
    """
    loader = PyPDFDirectoryLoader(folder_path)
    return loader.load()

#If the PDF folder is in the same folder as this jupyter notebook, the folder path is just the PDF folder's name
#Uncomment the line below and paste in the path to your pdf folder to load multiple PDFs.
#docs = multiple_pdf_loader('PDF Folder Path')

### Split the document(s) into bite-sized pieces
This code will take our document(s) and split their text into smaller sub-sections, sometimes referred to as 'chunks'. There are two important parameters to note in the cell below: `chunk_size` and `chunk_overlap`. 

The `chunk_size` parameter determines (approximately) how many characters will be in each chunk. The `chunk_overlap` parameter determines how many characters will be shared by any given chunk and the chunk that directly follows it in the text. The importance of `chunk_overlap` is discussed in the Breaking Mode 1 section of this demo.

You can read more about langchain's text splitting methods <a href='https://python.langchain.com/docs/how_to/recursive_text_splitter/' target='_blank'>here</a>.

In [67]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # chunk size (characters)
    chunk_overlap=200,  # chunk overlap (characters)
)
all_splits = text_splitter.split_documents(docs)

print(f"Split pdf into {len(all_splits)} sub-documents.")

Split pdf into 188 sub-documents.
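To see what `chunk_size` and `chunk_overlap` mean in isolation, here is a minimal sketch of a naive sliding-window splitter. It is a simplification, not how `RecursiveCharacterTextSplitter` works (the real splitter also tries to break on separators such as paragraphs and sentences), and `sliding_window_chunks` is a hypothetical helper written only for this illustration.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Naive splitter: fixed-size character windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks

sample_text = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_window_chunks(sample_text, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

Each printed chunk begins with the last four characters of the chunk before it. That shared text is the overlap.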


### Make an embedding storage system and add the chunks to this storage system
First, we'll initialize an embedding storage system, sometimes referred to as a vector store, that will use the embedding model we set up earlier (`mistral_embeddings`). Then, when we add the chunks of our documents to the vector store, it will call the embedding model to create vector representations of those chunks. The vector store will use those vector representations to organize the chunks within its database. This will allow us to quickly search for relevant pieces of our document(s) later.

**TODO:** fact check my description of the under-the-hood activities (I think it's true, but that's just because I don't see how else it could work)

In [68]:
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(mistral_embeddings)

document_ids = vector_store.add_documents(documents=all_splits)

print(document_ids[:3])

['f114c9c0-d550-4142-b929-14417710a8c3', '398f3280-d147-48c7-b555-9296f840276b', 'b55791e7-4f01-4f12-adb7-b7dcf5d13b2a']
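The sketch below shows one way a vector store could work under the hood: embed each chunk when it is added, then at query time embed the question and rank the stored chunks by similarity. This is a toy built on stated assumptions, not LangChain's implementation. `toy_embed` (a bag-of-words counter standing in for a real embedding model), `cosine`, and `ToyVectorStore` are hypothetical names invented for this example.

```python
import math
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    def __init__(self, embed):
        self.embed = embed
        self.entries = []  # list of (chunk_text, chunk_vector) pairs

    def add_texts(self, texts):
        # Embed each chunk once, at insertion time, and keep the vector next to the text
        for text in texts:
            self.entries.append((text, self.embed(text)))

    def similarity_search(self, query, k=2):
        # Embed the query, then return the k stored chunks most similar to it
        query_vector = self.embed(query)
        ranked = sorted(self.entries, key=lambda entry: cosine(query_vector, entry[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

toy_store = ToyVectorStore(toy_embed)
toy_store.add_texts([
    "30x30 will conserve lands and coastal waters",
    "the budget report is due in march",
    "voluntary easements protect private lands",
])
print(toy_store.similarity_search("conserve coastal waters", k=1))
```

A bag-of-words vector only matches exact words, whereas a real embedding model is meant to capture meaning, but the store-then-rank flow is the same idea.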


### Indexing Complete!

At this point we've completed the 'indexing' portion of our set up process. This has involved 3 steps:  

1. Loading our document(s): We used PyPDFLoader to load our pdf(s) into a format we could process using code.
2. Text Splitting: We used a text splitter to break our document(s) into smaller pieces that our LLM will be able to more easily digest.  
3. Add chunks to our vector storage system: We used an embedding model to represent the pieces of our document(s) as vectors. Utilizing the vector embeddings we just made, we organized the pieces of our document(s) in a database.


<hr style="border: 1px solid #5FAE5B;" />

## Retrieval and Generation

Next we will build the infrastructure to find relevant pieces of our document(s) based on our query, and then pass our query along with those relevant pieces of text to the LLM so it can generate an informed response. 

Retrieving relevant chunks of our document(s) based on our question is often referred to as 'retrieval'.

Note: The instructions in the Retrieval and Generation portion of LangChain's RAG demo, <a href='https://python.langchain.com/docs/tutorials/rag/#orchestration' target='_blank'>linked here</a>, use code more conducive to future modifications and integrations into larger systems. This flexible code is likely preferable when developing a real RAG application, but is more complex than necessary for demonstration purposes. In this demo, we'll take a more 'quick and dirty' approach.

### Make a template for the prompt we'll pass to our LLM

With a template, we can pass in our question and the relevant context and get back one complete prompt to pass to our LLM. We'll do this using LangChain's ChatPromptTemplate (<a href='https://python.langchain.com/api_reference/core/prompts/langchain_core.prompts.chat.ChatPromptTemplate.html' target='_blank'>documentation linked here</a>).

In [69]:
from langchain_core.prompts import ChatPromptTemplate
system_prompt_template = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "Context: {context}"
)

prompt = ChatPromptTemplate(
    [
        ("system", system_prompt_template),
        ("human", "Question: {input}"),
    ]
)

example_prompt = prompt.invoke(
    {"context": "[I'll put the context here!]", "input": "[I'll put the user's question here!]"}
).to_messages()

print(example_prompt[0].content)
print(example_prompt[1].content)

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, say that you don't know. Use three sentences maximum and keep the answer concise.

Context: [I'll put the context here!]
Question: [I'll put the user's question here!]


### Build a way for the user to query the RAG Chatbot

Each time you query a RAG Chatbot, 3 things happen:

1. The vector store finds the chunks of the document most relevant to your question (this is the 'retrieval' step).
2. Your question and the relevant chunks are bundled into one big prompt.
3. That prompt is passed to the LLM, and the LLM uses the relevant chunks to answer your question (this is the 'generation' step).

So, we'll build a function that does just that.

In [70]:
def ask_RAG(question: str) -> dict:
    """
    Ask our RAG Chatbot a question.

    Args: 
        question (str): the question we want to ask our RAG Chatbot

    Returns:
        dict: a dictionary with two keys, 'answer' and 'context'.
            'answer' is paired with a string containing the llm's answer to our question
            'context' is paired with a string containing the chunks of our document that
                were given to the llm to help it answer our question
    """
    relevant_chunks = vector_store.similarity_search(question) 
    #searches the vector store for chunks of our document semantically similar to the question 
    #these chunks are returned as LangChain Document objects
    
    context_str = '\n\n'.join(chunk.page_content for chunk in relevant_chunks) 
    #chunk.page_content gets the chunk's text (since each chunk is a Document object)
    #'\n\n'.join(...) builds a string with two new lines between each relevant chunk
    
    prompt_with_context = prompt.invoke(
        {'context': context_str, 'input': question}
    )
    #builds a prompt using the context string and user's question
    
    response = llama3_llm.invoke(prompt_with_context)
    #gives the prompt to the LLM and gets the LLM's response
    
    return {'answer': response.content, 'context': context_str}
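The function above relies on the default number of chunks returned by `similarity_search`. In LangChain's vector stores this is controlled by the `k` argument, which defaults to 4 as far as we know. The sketch below is a hypothetical variant, `ask_rag_top_k`, that exposes `k` and takes the vector store, prompt template, and LLM as explicit arguments instead of reading them from notebook globals. It assumes those objects behave like the ones built earlier in this notebook.

```python
def ask_rag_top_k(question, vector_store, prompt, llm, k=4):
    """
    Hypothetical variant of ask_RAG with a configurable number of retrieved chunks.

    Assumes `vector_store.similarity_search(question, k=...)` returns objects with a
    `page_content` attribute, and that `prompt` and `llm` each have an `invoke` method.
    """
    # Retrieval: fetch the k chunks most similar to the question
    relevant_chunks = vector_store.similarity_search(question, k=k)

    # Bundle the retrieved chunks into a single context string
    context_str = "\n\n".join(chunk.page_content for chunk in relevant_chunks)

    # Generation: fill in the prompt template and pass it to the LLM
    prompt_with_context = prompt.invoke({"context": context_str, "input": question})
    response = llm.invoke(prompt_with_context)

    return {"answer": response.content, "context": context_str}
```

With the objects defined in this notebook, a call might look like `ask_rag_top_k("What are the 10 pathways to 30x30?", vector_store, prompt, llama3_llm, k=6)`. A larger `k` gives the LLM more context at the cost of a longer prompt.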

In [71]:
pathway_response = ask_RAG("What is California's 30x30 initiatve?")
print(pathway_response['answer'])

California's 30x30 initiative is a state commitment to conserve 30% of its lands and coastal waters by 2030, as part of an international movement to protect natural areas and combat climate change. The initiative aims to protect and restore biodiversity, expand access to nature, and mitigate and build resilience to climate change. It also aligns with broader state commitments to advance justice, equity, diversity, and inclusion, and sustain economic prosperity.


In [72]:
print(pathway_response['context'])

state policies, practices, and systems, and 
strategic investments in parks and open 
spaces, workforce, outdoor programming, 
and new partnerships.
30x30 will conserve lands and coastal waters 
in ways that will support these two important 
initiatives and will coordinate closely with leaders 
of these efforts to ensure investments, policies 
and programs are mutually supportive and align 
with one another.  
The 30x30 initiative will utilize information 
and findings from several other state 
government reports and strategies and advance 
complementary priorities from these efforts. 
Other reports and strategies that 30x30 will 
reference and utilize include:
Enhancing Biodiversity 
2021 California Biodiversity Atlas 
2020 California Wildlife Barriers 
2018 California Biodiversity Initiative: A 
Roadmap for Protecting the State’s Natural 
Heritage 
2015 State Wildlife Action Plan 
Ocean And Coastal Protection 
Strategic Plan to Protect California’s Coast and 
Ocean 2020–2025

adminis

</br>
<hr style="border: 5px solid #0D335F;" />
<hr style="border: 2px solid #5FAE5B;" />

# Demos

<hr style="border: 1px solid #5FAE5B;" />

## RAG Builder Set Up

This is a helper function for the demos and sandbox. The function takes in various components of a RAG model, and returns an ask_RAG function built using those components.

In [73]:
def build_ask_RAG(pdf_path = pathways_to_30x30_url, chunk_size = 1000, chunk_overlap = 200, 
                  llm = llama3_llm, embedding_model = mistral_embeddings, prompt = prompt):
    """
    Build an ask_RAG function using the provided components

    Args: 
        pdf_path (str): either the URL of a single PDF or the
            relative path to a folder containing the PDF(s)
            
        chunk_size (int): the approximate size of each chunk produced in
            the text-splitting process
            
        chunk_overlap (int): the approximate overlap between each chunk
            and the following chunk produced in the text-splitting process

        llm (ChatOpenAI): the LLM used to produce responses to your queries;
            an instance of the ChatOpenAI class

        embedding_model (OpenAIEmbeddings): the embedding model used to organize
            the chunks of your pdf(s) in the vector_store; an instance of the
            OpenAIEmbeddings class

        prompt (ChatPromptTemplate): the prompt template to use when passing
            your question and the relevant context to the llm; an instance of 
            the ChatPromptTemplate class
        
    Returns: 
        function: an ask_RAG function built using the provided components
    """
    #Load the document:
    if pdf_path.startswith('http'):
        doc = pdf_loader(pdf_path)
    else:
        doc = multiple_pdf_loader(pdf_path)

    #Split the document into bite-sized pieces
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size = chunk_size,
        chunk_overlap = chunk_overlap
    )
    splits = text_splitter.split_documents(doc)

    #Build a vector store using the provided embedding model
    vector_store = InMemoryVectorStore(embedding_model)
    #Store the chunks of the document in the vector store
    vector_store.add_documents(documents=splits)

    #Build an ask_RAG function using vector_store and the provided llm
    def new_ask_RAG(question: str):
        relevant_chunks = vector_store.similarity_search(question) 
        context_str = '\n\n'.join(chunk.page_content for chunk in relevant_chunks) 
        prompt_with_context = prompt.invoke(
            {'context': context_str, 'input': question}
        )
        response = llm.invoke(prompt_with_context)
        return {'answer': response.content, 'context': context_str}
    
    return new_ask_RAG

<hr style="border: 1px solid #5FAE5B;" />

## Breaking mode 1: Chunk Cutoffs

Because the document is broken into chunks, and the LLM is only provided the chunks that the vector store thinks are most similar to the user's question, chunks neighboring each other in the text may be separated. One could be deemed relevant and passed to the LLM while the other is deemed not sufficiently relevant and left out. Low chunk_size and chunk_overlap increase the risk of this being a problem, as shown in this demo.
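As a small, self-contained illustration of a chunk cutoff, the sketch below takes the list of 10 pathways (as named in the baseline answer later in this section), cuts it into fixed-size pieces with no overlap, and counts how many pathway names survive intact in each piece. The fixed-size cut is a deliberate simplification of the real text splitter, and `complete_pathways_per_chunk` is a hypothetical helper used only here.

```python
pathways = [
    "Accelerate Regionally Led Conservation",
    "Execute Strategic Land Acquisitions",
    "Increase Voluntary Conservation Easements",
    "Enhance Conservation of Existing Public Lands and Coastal Waters",
    "Institutionalize Advance Mitigation",
    "Expand and Accelerate Environmental Restoration and Stewardship",
    "Strengthen Coordination Among Governments",
    "Align Investments to Maximize Conservation Benefits",
    "Advance and Promote Complementary Conservation Measures",
    "Evaluate Conservation Outcomes and Adaptively Manage",
]
list_text = "\n".join(f"{i}. {name}" for i, name in enumerate(pathways, start=1))

def complete_pathways_per_chunk(text, chunk_size):
    """Cut text into fixed-size, non-overlapping chunks and count the pathway
    names that appear in full inside each chunk."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return [sum(name in chunk for name in pathways) for chunk in chunks]

small_chunks = complete_pathways_per_chunk(list_text, chunk_size=200)
one_big_chunk = complete_pathways_per_chunk(list_text, chunk_size=len(list_text))

print("chunk_size=200:", small_chunks)
print(f"chunk_size={len(list_text)}:", one_big_chunk)
```

With the small chunk size, no single chunk holds the full list, and any name that straddles a cut is not intact in either neighbor. If retrieval then returns only some of those chunks, the LLM sees only part of the list.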

### Baseline RAG Agent for Comparison

The RAG Agent we built in the set up portion of this notebook split the document into chunks with `chunk_size = 1000` and `chunk_overlap = 200`. These are the values used in <a href='https://python.langchain.com/docs/tutorials/rag/#preview' target='_blank'>LangChain's RAG application walkthrough</a>. 

To get a baseline response, we'll ask this model to name the 10 pathways to 30x30 that are listed in our document. To answer this question correctly, the retrieval process will have to find chunks of our document that, when taken together, contain all 10 pathways. This shouldn't be too difficult because there are multiple pages in this document with a list of all 10 pathways (namely, pages 4, 5, and 35).

In [74]:
baseline_results = ask_RAG("What are the 10 pathways to 30x30?")
print(baseline_results['answer'])

The 10 pathways to achieve 30x30 are: 

1. Accelerate Regionally Led Conservation
2. Execute Strategic Land Acquisitions
3. Increase Voluntary Conservation Easements
4. Enhance Conservation of Existing Public Lands and Coastal Waters
5. Institutionalize Advance Mitigation
6. Expand and Accelerate Environmental Restoration and Stewardship
7. Strengthen Coordination Among Governments
8. Align Investments to Maximize Conservation Benefits
9. Advance and Promote Complementary Conservation Measures
10. Evaluate Conservation Outcomes and Adaptively Manage


#### Baseline Results

The baseline RAG agent was provided enough context to correctly identify all 10 pathways. To see the context that the retrieval process gave the model, change the value of `show_context_baseline` to `True` in the following cell.

In [75]:
show_context_baseline = False #Change this to True to see the context the baseline LLM was given to answer this question

if show_context_baseline:
    print(baseline_results['context'])

<br>

### Myopic RAG Agent

To see how low chunk size and chunk overlap can reduce the effectiveness of the retrieval process, we'll create a RAG Agent based on a document that we split with small `chunk_size` and `chunk_overlap` parameters. In this case, we'll use `chunk_size = 500` and `chunk_overlap = 100`. We'll then ask this model the same question we asked our baseline model.

In [76]:
myo_ask_RAG = build_ask_RAG(chunk_size = 500, chunk_overlap = 100)

In [77]:
myo_results = myo_ask_RAG("What are the 10 pathways to 30x30?")
print(myo_results['answer'])

The provided context does not list all 10 pathways to 30x30. However, it mentions three pathways: 

1. Accelerate Regionally Led Conservation
2. Execute Strategic Land Acquisitions
3. Increase Voluntary Conservation Easements

Additionally, it mentions two more pathways (10.11 and 10.12) but does not provide the complete list of 10 pathways.


#### Myopic Results

Clearly the myopic RAG model, with low chunk size and chunk overlap, produced an incomplete answer. Fortunately, the model admitted that it didn't know the remaining 7 pathways. Other models we've tested have not been as forthcoming. Some hallucinated the pathways they couldn't find, as we'll show in breaking mode 2.

To see the context this model was given to answer this question, change the value of `show_context_myo` to `True` in the following cell.

In [78]:
show_context_myo = False #Change this to True to see the context the myopic LLM was given to answer this question

if show_context_myo:
    print(myo_results['context'])

<br>

### Fixing the Myopic Model; Isolating Chunk Size and Chunk Overlap

If your RAG model is producing incomplete answers similar to the answer above, increasing either `chunk_size` or `chunk_overlap` may resolve the issue. Using the myopic model as a baseline, we will see how increasing either parameter can produce a correct answer.

In [33]:
larger_chunk_size_RAG = build_ask_RAG(chunk_size = 1000, chunk_overlap = 100)

In [34]:
lcs_results = larger_chunk_size_RAG("What are the 10 pathways to 30x30?")
print(lcs_results['answer'])

The 10 pathways to achieve 30x30 are: 
1. Accelerate Regionally Led Conservation
2. Execute Strategic Land Acquisitions
3. Increase Voluntary Conservation Easements
4. Enhance Conservation of Existing Public Lands and Coastal Waters
5. Institutionalize Advance Mitigation
6. Expand and Accelerate Environmental Restoration and Stewardship
7. Strengthen Coordination Among Governments
8. Align Investments to Maximize Conservation Benefits
9. Advance and Promote Complementary Conservation Measures
10. Evaluate Conservation Outcomes and Adaptively Manage


In [35]:
show_context_lcs = False

if show_context_lcs:
    print(lcs_results['context'])

Increasing `chunk_size` from `500` to `1000`, while keeping `chunk_overlap` at `100`, was enough to produce an accurate answer.

In [36]:
larger_chunk_overlap_RAG = build_ask_RAG(chunk_size = 500, chunk_overlap = 200)

In [37]:
lco_results = larger_chunk_overlap_RAG("What are the 10 pathways to 30x30?")
print(lco_results['answer'])

The 10 pathways to 30x30 are: 
1. Accelerate Regionally Led Conservation
2. Execute Strategic Land Acquisitions
3. Increase Voluntary Conservation Easements
4. Enhance Conservation of Existing Public Lands and Coastal Waters
5. Institutionalize Advance Mitigation
6. Expand and Accelerate Environmental Restoration and Stewardship
7. Strengthen Coordination Among Governments
8. Align Investments to Maximize Conservation Benefits
9. Advance and Promote Complementary Conservation Measures
10. Evaluate Conservation Outcomes and Adaptively Manage


In [39]:
show_context_lco = False

if show_context_lco:
    print(lco_results['context'])

Increasing `chunk_overlap` from `100` to `200`, while leaving the `chunk_size` at `500`, was also enough to produce an accurate answer.

<br>

### Interpretation

**TODO:** Rehash some of the interpretation? Maybe just a couple sentence summary of what was said in the paper for a refresher?

**Note:** As new versions of the packages in this demo have been released, the values of `chunk_size` and `chunk_overlap` at which the RAG Chatbot produced incorrect answers have changed. Typically, with more recent versions of these packages, the model was able to answer the prompt accurately at lower values of `chunk_size` and `chunk_overlap`. Increases in the efficiency of tokenization, allowing each chunk to contain more text using fewer tokens, could account for this change. Qualitatively, though, the patterns we observed remained the same: lower values of `chunk_size` and `chunk_overlap` increased the risk of incomplete answers.

### Potential Solutions

**Increasing chunk size** forces more neighboring text to be kept together because higher chunk size produces larger units of text. However, these large chunks may still be separated from their neighbors, and cuts between chunks may still appear in inconvenient places. Increasing chunk size reduces the total number of cuts, but does not address the underlying problem that chunks may be separated from their neighbors.

**Increasing chunk overlap** may be a more comprehensive solution. Higher chunk overlap increases the amount of text shared by two neighboring chunks, thereby increasing their similarity. Higher similarity amongst neighboring chunks means, if one chunk is deemed relevant to the user's question, neighboring chunks are also more likely to be deemed relevant. This means, rather than reducing the number of cuts, chunk overlap increases the odds that neighboring cut pieces are picked together during the retrieval process.

However, increasing chunk size and chunk overlap also comes with costs. Higher chunk sizes will increase the size of the prompt that is sent to the LLM, increasing the cost of each query. Increasing chunk overlap increases the total number of chunks produced, which can require more memory to store and more computational power to embed. Excessively high chunk overlap can also reduce the breadth of information received by the LLM, since higher chunk overlap makes the retrieved chunks more likely to be similar to each other.
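A rough back-of-the-envelope sketch of these costs is shown below. It assumes a naive sliding-window splitter (each new chunk starts `chunk_size - chunk_overlap` characters after the previous one), a hypothetical document of 100,000 characters, and 4 retrieved chunks per query. The real splitter breaks on separators, so actual counts will differ, and `estimate_chunk_count` is an illustrative helper rather than part of any library.

```python
def estimate_chunk_count(text_length, chunk_size, chunk_overlap):
    """Number of chunks a naive sliding-window splitter would produce."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap
    remaining = text_length - chunk_size  # characters left after the first chunk
    return 1 + -(-remaining // step)      # ceiling division without floats

text_length = 100_000  # hypothetical document length, in characters
retrieved_chunks = 4   # assumed number of chunks passed to the LLM per query

for chunk_size, chunk_overlap in [(500, 100), (500, 200), (1000, 100), (1000, 200)]:
    n_chunks = estimate_chunk_count(text_length, chunk_size, chunk_overlap)
    max_context = retrieved_chunks * chunk_size
    print(f"chunk_size={chunk_size}, chunk_overlap={chunk_overlap}: "
          f"~{n_chunks} chunks to embed, up to {max_context} context characters per query")
```

Under these assumptions, raising the overlap at a fixed chunk size increases the number of chunks to embed and store, while raising the chunk size increases the amount of context sent with each query.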

Chunk size and chunk overlap values must not be excessively low or excessively high. The default values in <a href='https://python.langchain.com/docs/tutorials/rag/#splitting-documents' target='_blank'>LangChain's RAG tutorial</a> (`chunk_size = 1000`, `chunk_overlap = 200`) seem to be reasonable starting points. If you are experiencing difficulty with incomplete answers, increasing chunk overlap (and then chunk size if the problem persists) may be a good place to start. The values you settle at may ultimately depend on your problems, goals, and resources.

### NOTES

Not specifying that there are _10_ pathways in the myopic model (500 chunk size, 100 chunk overlap) led to 4 pathways, vs 7 when specified

Say that we've observed qualitative pattern is the same but the quantitative numbers may differ because of things like token compression etc.

Adding a question mark at the end of 'What are the 10 pathways to 30x30(?)' changes the myopic model's answer from providing 7 pathways to providing 3.

When you ask the myopic model how many pathways there are, it will say 10. When you then ask what the pathways are (without saying there are 10), it will tell you the 6 pathways are... (or something similar). This reflects how the LLM doesn't actually digest the whole document; it relies on the retrieval process to provide it with all the relevant information it needs.

<hr style="border: 1px solid #5FAE5B;" />

## Breaking mode 2

### Baseline Results

### Changing Chunk Size and Overlap

### Changing the LLM

### Changing the Embedding Model (maybe)

</br>
<hr style="border: 5px solid #0D335F;" />
<hr style="border: 2px solid #5FAE5B;" />

# RAG Sandbox

Use this space to experiment with RAG! When testing the effects of adjusting hyperparameters or other components of the RAG chatbot, we recommend using the `build_ask_RAG` function (defined in the RAG Builder Set Up section of this notebook) for convenience. An example using the `build_ask_RAG` function is provided below for demonstration. 

**TODO**: Write an example of using build_ask_RAG with a variety of non-default parameters.  
**TODO**: Test that providing a relative path to a file of pdfs works in the build_ask_RAG function.

</br>
<hr style='border: 3px solid #0D335F;' />
<hr style='border: 1px solid #5FAE5B;' />

# Sources
This is a collection of all the links I inserted throughout the doc

<a href='https://python.langchain.com/docs/tutorials/rag/' target='_blank'>LangChain RAG tutorial</a>  
<a href='https://huggingface.co/spaces/cboettig/streamlit-demo/blob/main/pages/rag.py' target='_blank'>Professor Boettiger's Streamlit RAG Demo</a>  
<a href='https://www.ibm.com/think/topics/embedding' target='_blank'>What Is Embedding IBM Article</a>  
<a href='https://python.langchain.com/docs/how_to/recursive_text_splitter/' target='_blank'>Recursive Text Splitter Documentation</a>  
<a href='https://python.langchain.com/api_reference/core/prompts/langchain_core.prompts.chat.ChatPromptTemplate.html' target='_blank'>LangChain ChatPromptTemplate Documentation</a>

### Helpful Resources

<a href='https://python.langchain.com/docs/tutorials/' target='_blank'>LangChain's Tutorials page</a>

### Dump:

**Breaking mode 1:**  
Higher chunk overlap increases the chance that, if one chunk is deemed relevant to the prompt, the chunks surrounding it will also be seen as relevant. In effect, this encourages the RAG model to read more of the context surrounding the chunk where it believes an answer is located. The downside of high chunk overlap is increased computational intensity, since higher overlap means there will be more chunks.

### Notes

Assume that people have read the full paper, so avoid being over-redundant. Explaining technicalities of code is good. Repeating some stuff from the paper is ok, just avoid being too redundant.

### Questions:

How should we set up the notebook so users can conveniently enter their OpenAI API key?