# Retrieval Augmented Generation using Runnables and Chains w/ LangChain
Enhance generation with specialized knowledge.

**Purpose**:
This notebook teaches you how to build custom `Runnable`s with `LangChain` and compose them into your own RAG app.

## Definitions: `Runnables` and `Chains`

*Runnables*:

• A Runnable represents a unit of work that can be executed.

• It can perform a specific task or action, such as making an API call, processing data, or running a machine learning model.

• Runnables can have input and output types specified, and they can be composed together to form more complex workflows.

• They are designed to be flexible and reusable components that can be easily combined and configured.

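• Examples of Runnables include API calls, data processing functions, and machine learning models. In LangChain, prompt templates, chat models, retrievers, and output parsers all implement the Runnable interface.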
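To make the interface concrete, here is a library-free toy. It is not LangChain's actual class: `ToyRunnable` is a hypothetical name that wraps a function and exposes `invoke` (one input) and `batch` (a list of inputs), two entry points that LangChain's `Runnable` interface also offers.

```python
from typing import Callable, Generic, List, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


class ToyRunnable(Generic[In, Out]):
    """Hypothetical stand-in for a Runnable: one unit of work with typed input/output."""

    def __init__(self, func: Callable[[In], Out]):
        self.func = func

    def invoke(self, input: In) -> Out:
        # Run the unit of work on a single input
        return self.func(input)

    def batch(self, inputs: List[In]) -> List[Out]:
        # Run the same unit of work over many inputs, preserving order
        return [self.invoke(x) for x in inputs]


word_count = ToyRunnable(lambda text: len(text.split()))
print(word_count.invoke("retrieval augmented generation"))  # 3
print(word_count.batch(["a b", "c", ""]))  # [2, 1, 0]
```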

*Chains*:

• A Chain is a sequence of Runnables that are executed in a specific order.

• Chains provide a way to string together multiple Runnables to create a workflow or pipeline.

• Each Runnable in the Chain takes the output of the previous Runnable as its input.

• Chains can be used to build complex applications by combining and orchestrating the execution of multiple Runnables.

• They provide a higher-level abstraction for organizing and structuring the flow of data and operations.

• Examples of Chains include data processing pipelines, machine learning workflows, and API request/response sequences.
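In this sense, a chain is function composition. The toy below uses a hypothetical `Step` class (not LangChain code) that overloads `|` the way LangChain's expression language does, so `a | b` produces a new step that feeds `a`'s output into `b`.

```python
from typing import Any, Callable


class Step:
    """Hypothetical pipeline step; `a | b` builds a chain that runs a, then b."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, input: Any) -> Any:
        return self.func(input)

    def __or__(self, other: "Step") -> "Step":
        # The next step receives this step's output as its input
        return Step(lambda x: other.invoke(self.invoke(x)))


strip = Step(str.strip)
lower = Step(str.lower)
tokenize = Step(str.split)

toy_chain = strip | lower | tokenize
print(toy_chain.invoke("  What Is RAG?  "))  # ['what', 'is', 'rag?']
```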

---

### **Deeper explanation**:

In the process of building an AI chatbot, we often need to connect different components together to create a functional system.

One way to achieve this is by *chaining* these components, ensuring that the output of one component is properly passed to the next component for further processing. To accomplish this, we can directly call the functions or methods of each component and pass the output as arguments to the next component. 

- This straightforward approach works well when we only need to pass the output from one component to another ***without any*** additional processing or transformations in between.

However, in more complex scenarios where we require intermediate processing or transformations on the output, we can use a concept called "runnables." Runnables provide a flexible and modular way to encapsulate and compose these processing steps within a chain.

By using runnables, we can easily add additional functionality, such as filtering or modifying the output, before passing it to the next component. 

- This allows us to *customize the behavior* of the chatbot and *ensure* that the output is properly prepared for the subsequent steps.
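For example, in a RAG pipeline the retriever's output (a list of scored chunks) usually needs filtering and formatting before it can fill a prompt. The sketch below is library-free: `fake_retrieve`, the 0.5 score threshold, and the prompt wording are all hypothetical. In LangChain, each of these functions could become its own step in the chain.

```python
from typing import List, Tuple


def fake_retrieve(question: str) -> List[Tuple[str, float]]:
    # Hypothetical retriever output: (chunk text, similarity score) pairs
    return [
        ("Chunk size controls how much text each embedding covers.", 0.91),
        ("Unrelated boilerplate from a page footer.", 0.12),
        ("Smaller chunks give more focused retrieval results.", 0.84),
    ]


def drop_low_scores(hits: List[Tuple[str, float]], threshold: float = 0.5) -> List[str]:
    # Intermediate step 1: filter out weak matches
    return [text for text, score in hits if score >= threshold]


def format_context(chunks: List[str]) -> str:
    # Intermediate step 2: join the surviving chunks into one context string
    return "\n\n".join(chunks)


def build_prompt(question: str) -> str:
    # Retrieve, then transform, then fill the prompt
    context = format_context(drop_low_scores(fake_retrieve(question)))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("How does chunk size matter?"))
```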

#### "How are `Runnable`s different from normal classes?"

*Similarities*:

• Runnables can have methods and attributes, just like normal classes.

• They can define and implement their own logic and functionality.

• Runnables can have constructor arguments and can be instantiated with different configurations.

• They can be subclassed and inherit from other runnables or normal classes.

• Runnables can have static and class methods, allowing for shared functionality across instances.

*Differences*:

• Runnables are designed to be executed as steps in a larger workflow, and they can run concurrently (for example, over a batch of inputs).

• They are typically used for data processing, transformation, or analysis tasks.

• Runnables share a standard interface (`invoke`, `batch`, `stream`, and their async counterparts) that defines how they interact with other runnables and the overall system.

• They can be composed and combined with other runnables to create complex workflows.

• Runnables often have additional features specific to the LangChain platform, such as input and output schemas, configuration management, and error handling (for example, retries and fallbacks).

• They can be executed asynchronously and in parallel, which helps when steps spend most of their time waiting on I/O, such as requests to a model API.

• Runnables can be deployed as part of a larger system (for example, served as an API with LangServe), which makes updates and maintenance easier.
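The parallel-execution idea can be illustrated without LangChain. The sketch runs several independent steps on the same input in a thread pool and collects the results in a dict, which is roughly what LangChain's `RunnableParallel` does. `fan_out` and the step functions are hypothetical, and a thread pool is just one way to get this concurrency.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


def fan_out(steps: Dict[str, Callable[[Any], Any]], input: Any) -> Dict[str, Any]:
    """Run every step on the same input concurrently; return results keyed by name."""
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        # Submit all steps first so they can overlap in time
        futures = {name: pool.submit(func, input) for name, func in steps.items()}
        # Then wait for each one and collect its result
        return {name: fut.result() for name, fut in futures.items()}


result = fan_out({"length": len, "upper": str.upper}, "rag")
print(result)  # {'length': 3, 'upper': 'RAG'}
```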

#### "How do I decide to use either a `Runnable` or a `Chain`?"
Ultimately, the decision to use runnables or a more straightforward sequential approach depends on the specific requirements and complexity of the chatbot system. You might find yourself using one, both, or neither based on your needs.

In summary:
1. Chains, which are sequences of interconnected tasks, can be used effectively without writing any custom Runnables. They link the components of a system in a specific order, allowing for the smooth execution of a workflow or pipeline. This makes them particularly useful when a straightforward, sequential process is sufficient and the extra complexity of custom Runnables is not required.

2. Runnables resemble traditional classes but offer extra functionality that pays off in complex AI chatbot systems. They handle passing and processing outputs between components, which allows for customization and flexibility in system design. This makes Runnables a good fit when you need more than sequential processing, such as intermediate steps or specific data transformations.

## Set up

In [5]:
%pip install -Uq openai tiktoken chromadb langchain pypdf

Note: you may need to restart the kernel to use updated packages.


In [6]:
# Set API Key Directly
import os

#os.environ["OPENAI_API_KEY"] = ""

# Load from an .env file
# import dotenv

# dotenv.load_dotenv()

## Split Data into Chunks with `RecursiveCharacterTextSplitter`

To make effective use of the documents (files) we'll load, we need to split them into manageable chunks.

Generally speaking, smaller chunks tend to yield more precise retrieval results, but because there are more of them to embed and search, they may take longer to process.

### Go Deeper

#### Accuracy with Smaller Chunks
* **Increased Focus**: Smaller chunks of text or queries allow the system to focus on a more specific set of information. This specificity can lead to more accurate and relevant results because the system is not overwhelmed by too much or too broad information.
* **Contextual Relevance**: With a narrower focus, the system is more likely to retrieve information that is contextually relevant to specific queries, enhancing the accuracy of the response.

#### Processing Time
* **Multiple Queries**: Smaller chunks might require multiple queries to cover a topic comprehensively. Each query involves a separate retrieval process, which cumulatively can take more time.
* **Trade-off Between Depth and Breadth**: While smaller chunks allow for depth in a specific area, they might require multiple rounds of retrieval to build a broad understanding, increasing overall processing time.

#### System Limitations and Efficiency
* **Computational Load**: Smaller chunks mean more chunks to embed and more frequent calls to the retrieval system. Depending on the efficiency of the system, this can either slow down the process due to computational load or, if the system is highly efficient, have little impact on processing time.
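The trade-off can be made concrete with a back-of-the-envelope count. Assume an idealized fixed-size sliding window that advances by `chunk_size - chunk_overlap` characters each time (`RecursiveCharacterTextSplitter` splits on separators, so its real counts will differ). A document of N characters then yields 1 + ceil((N - chunk_size) / (chunk_size - chunk_overlap)) chunks, and each chunk costs one embedding and one stored vector. `fixed_window_chunk_count` is a hypothetical helper.

```python
import math


def fixed_window_chunk_count(n_chars: int, chunk_size: int, chunk_overlap: int = 0) -> int:
    """Chunks produced by a fixed-size sliding window (an idealized estimate)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if n_chars <= 0:
        return 0
    if n_chars <= chunk_size:
        return 1
    step = chunk_size - chunk_overlap  # how far each new window advances
    return 1 + math.ceil((n_chars - chunk_size) / step)


# A 10,000-character document:
for size, overlap in [(1000, 0), (500, 0), (1000, 200)]:
    print(size, overlap, fixed_window_chunk_count(10_000, size, overlap))
# 1000 0 10
# 500 0 20
# 1000 200 13
```

Halving the chunk size roughly doubles the number of embeddings to compute and store, which is where the extra processing time comes from.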

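The following cell creates the splitter we'll apply to the loaded data, using the LangChain library. We'll instantiate a variable, `text_splitter`, with the `RecursiveCharacterTextSplitter` class, configured for chunks of up to 1,000 characters (measured with `len`) and no overlap between chunks.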

In [28]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=0,
    length_function=len
)
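The "recursive" in `RecursiveCharacterTextSplitter` refers to trying a list of separators from coarse to fine (by default paragraph breaks, line breaks, spaces, and finally individual characters) until every piece fits within `chunk_size`. The toy below reproduces that idea in plain Python. `recursive_split` is a simplified, hypothetical function, not LangChain's implementation, and it ignores `chunk_overlap` and `length_function`.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    chunk_size: int,
    separators: Sequence[str] = ("\n\n", "\n", " ", ""),
) -> List[str]:
    """Toy recursive splitter: try coarse separators first, fall back to finer ones."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *finer = separators
    if sep == "":
        # Last resort: cut the text into fixed-size pieces
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Greedily merge pieces while they still fit
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too big: split it with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaa bbb\n\nccc ddd eee", chunk_size=8))
# ['aaa bbb', 'ccc ddd', 'eee']
```

The paragraph break is tried first, so "aaa bbb" stays intact; only the second paragraph, which is too long, falls back to splitting on spaces.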

## Load Directory of Files with `PyPDFDirectoryLoader`

Now that we have a text splitter, we need documents to split. We'll load them with the `PyPDFDirectoryLoader` class.

Start by passing the path of the directory directly to `PyPDFDirectoryLoader` to load the data.

For this example, we'll use `./data` and then split its contents into chunks with `text_splitter`.

Be sure to create this subdirectory in the notebook's current working directory and place the PDF files you'd like to interrogate in it.
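Before loading, a quick standard-library pre-check can confirm that the directory exists and actually contains PDFs. `list_pdfs` is a hypothetical helper; it only looks at files directly inside the directory and does not reproduce the loader's own file-matching rules.

```python
from pathlib import Path
from typing import List


def list_pdfs(directory: str = "./data") -> List[Path]:
    """Create the directory if needed and return the PDF files directly inside it."""
    path = Path(directory)
    path.mkdir(parents=True, exist_ok=True)
    pdfs = sorted(p for p in path.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")
    if not pdfs:
        print(f"No PDFs found in {path.resolve()}; add some before loading.")
    return pdfs


# Usage: for pdf in list_pdfs("./data"): print(pdf.name)
```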

In [32]:
from langchain.document_loaders import PyPDFDirectoryLoader

# Create loader
pydir_loader = PyPDFDirectoryLoader("./data")

# Load data
docs = pydir_loader.load()

In [33]:
# Split into chunks
split_docs = [text_splitter.split_text(doc.page_content) for doc in docs]

In [34]:
# Preview the chunks: print the first 4 characters of each one
for doc_chunks in split_docs:
    for chunk in doc_chunks:
        print(chunk[:4])

KENN
Befo
“The
1. P
of p
Now 
Howe
I’d 
Impo
Lear
Beca
Not 
ment
good
Node
stro
3. L
free
Whil
lear
Stud
Whil
from
4. P
Anot
Now 
they
the 
Deve
Git,
http
5. N
codi
fit.
Atte
comp
me m
If y
appr
The 
Alth
star
WEEK
This


## Retrieve Embeddings from OpenAI

This cell defines a `Runnable`, `RetrieveEmbeddingsRunnable`, that retrieves embeddings for a list of texts using the `OpenAIEmbeddings` class. Its `invoke` method takes a list of texts (for example, the chunks produced above) and returns a list of embeddings, one per text; if the API call fails, it prints the error and returns an empty list.

In [None]:
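from typing import List, Optional

from langchain.embeddings import OpenAIEmbeddings
from langchain_core.runnables import Runnable, RunnableConfig


class RetrieveEmbeddingsRunnable(Runnable[List[str], List[List[float]]]):
    """Runnable that embeds a list of texts with OpenAI, returning one vector per text."""

    def invoke(self, input: List[str], config: Optional[RunnableConfig] = None) -> List[List[float]]:
        try:
            # Create the embeddings client (reads OPENAI_API_KEY from the environment)
            embeddings_model = OpenAIEmbeddings()
            return embeddings_model.embed_documents(input)
        except Exception as e:
            # Report the failure and fall back to an empty result
            print(f"Error retrieving embeddings: {e}")
            return []


# Example: vectors = RetrieveEmbeddingsRunnable().invoke(["first chunk", "second chunk"])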

### Retrieve Embeddings
The following code defines the `EmbeddingsRetrieval` class, which loads a directory of files, splits them into chunks, embeds the chunks, and uses Chroma as the vector store and retriever. The `retrieve_embeddings` method runs this whole process and returns the embeddings; afterwards, the `vector_store` and `retriever` attributes give access to the Chroma objects for further use.

In [None]:
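from typing import List, Optional

from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain_core.runnables import Runnable, RunnableConfig


class EmbeddingsRetrieval(Runnable):
    """Load PDFs from a directory, chunk them, embed the chunks, and index them in Chroma."""

    def __init__(self, directory_path: str, chunk_size: int, overlap_ratio: float):
        self.directory_path = directory_path
        self.chunk_size = chunk_size
        self.overlap_ratio = overlap_ratio
        self.vector_store = None
        self.retriever = None

    def invoke(self, input: Optional[str] = None, config: Optional[RunnableConfig] = None) -> List[List[float]]:
        # Runnable entry point; an input path, if given, overrides directory_path
        if input:
            self.directory_path = input
        return self.retrieve_embeddings()

    def retrieve_embeddings(self) -> List[List[float]]:
        try:
            # 1. Load every PDF in the directory
            docs = PyPDFDirectoryLoader(self.directory_path).load()

            # 2. Split into chunks; overlap is expressed as a fraction of chunk_size
            splitter = RecursiveCharacterTextSplitter(
                chunk_size=self.chunk_size,
                chunk_overlap=int(self.chunk_size * self.overlap_ratio),
            )
            chunks = splitter.split_documents(docs)

            # 3. Embed the chunks and store them in Chroma in one step
            self.vector_store = Chroma.from_documents(chunks, OpenAIEmbeddings())

            # 4. Expose the vector store as a retriever for later chains
            self.retriever = self.vector_store.as_retriever()

            # Read the stored vectors back so callers can inspect them
            return self.vector_store.get(include=["embeddings"])["embeddings"]
        except Exception as e:
            print(f"Error retrieving embeddings: {e}")
            return []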

In [None]:
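# Create an instance of EmbeddingsRetrieval
retrieval = EmbeddingsRetrieval(directory_path="data", chunk_size=1000, overlap_ratio=0.2)

# Retrieve the embeddings
embeddings = retrieval.retrieve_embeddings()

# Quick check: printing every vector in full is unwieldy, so summarize instead
print(f"Number of embeddings: {len(embeddings)}")
if len(embeddings) > 0:
    print(f"Dimensions per embedding: {len(embeddings[0])}")
    print(f"First values of the first embedding: {list(embeddings[0][:5])}")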
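Conceptually, a vector-store retriever ranks stored chunks by how close their embeddings are to the query's embedding. The plain-Python sketch below uses cosine similarity, one common choice, over tiny made-up 3-dimensional vectors (real OpenAI embeddings have far more dimensions). `top_k` is a hypothetical helper; Chroma's actual distance metric and indexing are configurable and more sophisticated.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    # Cosine similarity: dot product divided by the product of vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k chunk ids whose vectors are most similar to the query."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 3-d vectors standing in for chunk embeddings
store = {
    "chunk-a": [0.9, 0.1, 0.0],
    "chunk-b": [0.0, 1.0, 0.0],
    "chunk-c": [0.7, 0.3, 0.1],
}
print([chunk_id for chunk_id, _ in top_k([1.0, 0.0, 0.0], store)])  # ['chunk-a', 'chunk-c']
```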

---

# ChatOpenAI Chatbot Chain Examples
These end-to-end examples were generated at https://chat.langchain.com/ on 12/22/23:

In [None]:
from operator import itemgetter

from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import ConversationBufferWindowMemory
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

# Create the chat model
chat_model = ChatOpenAI()

# Prompt with slots for retrieved context, chat history, and the new input
prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful chatbot. Answer using this context:\n\n{context}"),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{input}"),
    ]
)

# Create the memory with a window of the last 2 exchanges
memory = ConversationBufferWindowMemory(k=2, return_messages=True)

# Create the Chroma vector store from the documents and splitter defined earlier (`docs`, `text_splitter`)
embeddings = OpenAIEmbeddings()
vector_store = Chroma.from_documents(
    text_splitter.split_documents(docs), embeddings, collection_name="chain_demo"
)

# Create the Chroma retriever
retriever = vector_store.as_retriever()


def format_docs(documents):
    # Join the retrieved chunks into one string for the prompt
    return "\n\n".join(doc.page_content for doc in documents)


# Create the chain: add history and retrieved context, fill the prompt, call the model, parse to a string
chain = (
    RunnablePassthrough.assign(
        history=RunnableLambda(memory.load_memory_variables) | itemgetter("history"),
        context=itemgetter("input") | retriever | format_docs,
    )
    | prompt_template
    | chat_model
    | StrOutputParser()
)

# Define the user input
user_input = "Hi, how can I help you?"

# Invoke the chain; StrOutputParser already returns a plain string
output = chain.invoke({"input": user_input})

# Update the memory with the user input and model output
memory.save_context({"input": user_input}, {"output": output})

# Print the output
print(output)
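LangChain's `ConversationBufferWindowMemory(k=...)` keeps only the most recent k exchanges in the prompt. The same windowing idea can be shown with a `collections.deque`; `WindowMemory` and `toy_memory` are hypothetical, library-free names, not LangChain's classes.

```python
from collections import deque
from typing import List, Tuple


class WindowMemory:
    """Hypothetical sliding-window chat memory: keeps the last k (human, ai) exchanges."""

    def __init__(self, k: int = 2):
        self.turns = deque(maxlen=k)  # the oldest exchange is dropped automatically

    def save_context(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def history(self) -> List[Tuple[str, str]]:
        return list(self.turns)


toy_memory = WindowMemory(k=2)
for i in range(1, 4):
    toy_memory.save_context(f"question {i}", f"answer {i}")
print(toy_memory.history())  # [('question 2', 'answer 2'), ('question 3', 'answer 3')]
```

A small window keeps prompts short and cheap, at the cost of forgetting anything said before the last k exchanges.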

In [1]:
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import ConversationBufferWindowMemory
from langchain.vectorstores import Chroma

# Create the chat model
chat_model = ChatOpenAI(
    model_name="gpt-3.5-turbo-1106",
    temperature=0.25,
)

# Keep the last 5 exchanges; ConversationalRetrievalChain reads history from "chat_history"
memory = ConversationBufferWindowMemory(
    k=5,
    memory_key="chat_history",
    return_messages=True,
    output_key="answer",
)

# Create the vector store from the documents and splitter defined earlier, then its retriever
vector_store = Chroma.from_documents(
    text_splitter.split_documents(docs), OpenAIEmbeddings(), collection_name="chatbot_demo"
)
retriever = vector_store.as_retriever()

# Create the chatbot chain: condense the question with the history, retrieve chunks, then answer
chatbot_chain = ConversationalRetrievalChain.from_llm(
    llm=chat_model,
    retriever=retriever,
    memory=memory,
)

# Run the chatbot chain; the reply is returned under the "answer" key
response = chatbot_chain.invoke({"question": "Hello!"})
print(response["answer"])