## Retrieval Augmented Generation (RAG) 

One of the most powerful applications enabled by LLMs is the sophisticated question-answering (Q&A) chatbot: an application that can answer questions about specific source information. Such applications use a technique known as Retrieval Augmented Generation, or RAG.


### Q&A with RAG
https://python.langchain.com/docs/use_cases/question_answering/

A typical RAG application has two main components:

- Indexing: a pipeline for ingesting data from a source and indexing it. This usually happens offline.
- Retrieval and generation: the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.



## The most common full sequence from raw data to answer looks like:

In [8]:
# 1. Load text files 

from langchain_community.document_loaders import TextLoader

loader = TextLoader("README.md")
loader.load()

[Document(page_content="# LangChain\nComplete understanding of Open AI LangChain \n\nChapters:\n1. LangChain integaration and understanding prompt, chaining and history collecting with ConversationBufferMemory\n2. Further understanding of prompts and FewShotPromptTemplate\n3. PDF Query Using Langchain Q/A chain (Includes Text Embedding for classification, Search, recommendation, detection, etc)\n4. LLM Model with Deployment using HuggingFace Spaces\n5. RAG (Retrieval Augmentated Generation)\n\n\n## Working with langChain\n- Create your openAI API key \n- Generate HuggingFace access token (Read)\n- Install all the requirements \n- Run the required chapter python file using streamlit run\n\n## Streamlit Run\n\n- run on local host: http://localhost:8501/\n\n## Deploy to HuggingFace spaces\n\n- Create new Space with SDK paramenter 'Streamlit':\nhttps://huggingface.co/spaces/rasika-gulhane/LLM_Chatbot_Q-A\n\n- Follow the steps to deploy git repo\n- Spaces > Settings : Add the secrets for Op

In [16]:
# 2. SPLIT
# Text splitters break large Documents into smaller chunks.
# This helps both indexing and generation: large chunks are harder to search over
# and may not fit in a model's finite context window.

# At a high level, a text splitter works as follows:
# - Split the text into small, semantically meaningful pieces (often sentences).
# - Combine these pieces into a larger chunk until it reaches a certain size (as measured by some function).
# - Once that size is reached, emit the chunk as its own piece of text and start a new chunk that overlaps the previous one (to keep context between chunks).
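
The three splitting steps described above can be sketched without any framework. This is a minimal illustration, not LangChain's splitter: `split_text`, `chunk_size` (measured in characters), and `chunk_overlap` (measured in sentences) are hypothetical names chosen for this example, and the sentence detection is a naive regular expression.

```python
import re
from typing import List


def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 1) -> List[str]:
    """Greedy sentence-based splitter (hypothetical helper, not a LangChain API).

    chunk_size is measured in characters; chunk_overlap is the number of
    trailing sentences repeated at the start of the next chunk.
    """
    # Step 1: break the text into small units (here: naive sentence detection).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        # Step 2: keep combining sentences until the size limit would be exceeded.
        if current and len(candidate) > chunk_size:
            # Step 3: emit the chunk, then seed the next one with the overlap.
            chunks.append(" ".join(current))
            current = current[-chunk_overlap:] if chunk_overlap > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "Dogs are loyal. Cats are independent. Goldfish are easy. Parrots can talk."
for chunk in split_text(text, chunk_size=40, chunk_overlap=1):
    print(repr(chunk))
```

Overlapping by whole sentences keeps each chunk readable on its own. A single sentence longer than `chunk_size` is kept intact rather than cut, so chunks can occasionally exceed the limit in this sketch.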



In [15]:
# 3. Embedding and vector store:

# - Store: we need somewhere to store and index our splits so that they can later be searched over.
#   This is often done using a VectorStore and an Embeddings model.
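
To make the store step concrete, here is a toy in-memory vector store. It is only a sketch under loud assumptions: `embed` is a word-count vector standing in for a real embeddings model, and `ToyVectorStore` is a hypothetical class written for this example, not a LangChain class. Its method names are chosen to resemble a typical vector store interface.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, int]:
    """Toy embedding: a sparse word-count vector (stand-in for a real model)."""
    return dict(Counter(re.findall(r"[a-z]+", text.lower())))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps each split next to its vector."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, Dict[str, int]]] = []

    def add_texts(self, texts: List[str]) -> None:
        # Indexing: embed every split once, at ingestion time.
        for text in texts:
            self._entries.append((text, embed(text)))

    def similarity_search(self, query: str, k: int = 2) -> List[str]:
        # Searching: embed the query and rank the stored splits by similarity.
        query_vector = embed(query)
        ranked = sorted(
            self._entries, key=lambda entry: cosine(query_vector, entry[1]), reverse=True
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add_texts([
    "Dogs are great companions, known for their loyalty and friendliness.",
    "Cats are independent pets that often enjoy their own space.",
])
print(store.similarity_search("independent cats", k=1))
```

A real embeddings model maps text to dense vectors that capture meaning rather than exact word overlap, but the index-then-rank flow is the same.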

## Retrieval and generation
- Retrieve: Given a user input, relevant splits are retrieved from storage using a Retriever.
- Generate: A ChatModel / LLM produces an answer using a prompt that includes the question and the retrieved data.
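
The retrieve-then-generate flow can be sketched in plain Python before introducing LCEL. Everything here is illustrative: `retrieve` ranks splits by shared words, `answer` assembles the prompt, and `echo_llm` is a hypothetical stand-in that simply returns the prompt it receives, so no model or API key is needed.

```python
import re
from typing import Callable, List

TEMPLATE = """Answer the question based only on the following context:
{context}
Question: {question}
"""


def words(text: str) -> set:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, splits: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank splits by how many words they share with the query."""
    scored = [(len(words(query) & words(split)), split) for split in splits]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [split for _, split in scored[:k]]


def answer(question: str, splits: List[str], llm: Callable[[str], str], k: int = 2) -> str:
    # Retrieve: pick the splits most relevant to the question.
    context = "\n\n".join(retrieve(question, splits, k=k))
    # Generate: the model sees both the question and the retrieved context.
    return llm(TEMPLATE.format(context=context, question=question))


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat model: returns the prompt unchanged."""
    return prompt


splits = [
    "Dogs are loyal companions.",
    "Cats are independent pets.",
    "Parrots mimic human speech.",
]
print(answer("Are cats independent?", splits, echo_llm, k=1))
```

Swapping `echo_llm` for a real chat model call is the only change needed to turn the printed prompt into an actual answer.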

### Using Retrieval in LCEL

In [18]:
! pip install langchain_openai

Collecting langchain_openai
  Using cached langchain_openai-0.1.3-py3-none-any.whl.metadata (2.5 kB)
Using cached langchain_openai-0.1.3-py3-none-any.whl (33 kB)
Installing collected packages: langchain_openai
Successfully installed langchain_openai-0.1.3


In [28]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough


In [20]:

template = """Answer the question based only on the following context:
{context}
Question: {question}
"""

In [21]:

prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()


In [22]:

def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])


In [41]:
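# `retriever` can be any LangChain retriever and must be defined before this cell runs,
# for example the ToyRetriever defined later in this notebook.

chain = (
    # The retriever output is formatted into {context}; the raw input string becomes {question}.
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)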

chain.invoke("What did the president say about technology?")

## Custom Retriever

Let’s implement a toy retriever that returns all documents whose text contains the text in the user query.

In [32]:
from typing import List

from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever

In [33]:
class ToyRetriever(BaseRetriever):
    """A toy retriever that returns the first k documents that contain the user query.

    This retriever only implements the sync method _get_relevant_documents.

    If the retriever were to involve file access or network access, it could benefit
    from a native async implementation of `_aget_relevant_documents`.

    As usual, with Runnables, there's a default async implementation that's provided
    that delegates to the sync implementation running on another thread.
    """

    documents: List[Document]
    """List of documents to retrieve from."""

    k: int
    """Maximum number of results to return."""

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        """Sync implementation: return up to k documents containing the query (case-insensitive)."""
        matching_documents = []
        for document in self.documents:
            # Stop once k matches are collected; `>` here would return k + 1 documents.
            if len(matching_documents) >= self.k:
                return matching_documents

            if query.lower() in document.page_content.lower():
                matching_documents.append(document)
        return matching_documents

In [35]:
# Test this:

documents = [
    Document(
        page_content="Dogs are great companions, known for their loyalty and friendliness.",
        metadata={"type": "dog", "trait": "loyalty"},
    ),
    Document(
        page_content="Cats are independent pets that often enjoy their own space.",
        metadata={"type": "cat", "trait": "independence"},
    ),
    Document(
        page_content="Goldfish are popular pets for beginners, requiring relatively simple care.",
        metadata={"type": "fish", "trait": "low maintenance"},
    ),
    Document(
        page_content="Parrots are intelligent birds capable of mimicking human speech.",
        metadata={"type": "bird", "trait": "intelligence"},
    ),
    Document(
        page_content="Rabbits are social animals that need plenty of space to hop around.",
        metadata={"type": "rabbit", "trait": "social"},
    ),
]
retriever = ToyRetriever(documents=documents, k=3)

In [40]:
# Add unit or integration tests to verify that invoke and ainvoke work.

In [36]:
retriever.invoke("that")

[Document(page_content='Cats are independent pets that often enjoy their own space.', metadata={'type': 'cat', 'trait': 'independence'}),
 Document(page_content='Rabbits are social animals that need plenty of space to hop around.', metadata={'type': 'rabbit', 'trait': 'social'})]

In [37]:
await retriever.ainvoke("that")

# It’s a runnable so it’ll benefit from the standard Runnable Interface! 🤩

[Document(page_content='Cats are independent pets that often enjoy their own space.', metadata={'type': 'cat', 'trait': 'independence'}),
 Document(page_content='Rabbits are social animals that need plenty of space to hop around.', metadata={'type': 'rabbit', 'trait': 'social'})]

In [38]:
retriever.batch(["dog", "cat"])

[[Document(page_content='Dogs are great companions, known for their loyalty and friendliness.', metadata={'type': 'dog', 'trait': 'loyalty'})],
 [Document(page_content='Cats are independent pets that often enjoy their own space.', metadata={'type': 'cat', 'trait': 'independence'})]]

In [39]:
async for event in retriever.astream_events("bar", version="v1"):
    print(event)

{'event': 'on_retriever_start', 'run_id': '110b0196-169c-4208-aadd-f1cfb32cfe7b', 'name': 'ToyRetriever', 'tags': [], 'metadata': {}, 'data': {'input': 'bar'}}
{'event': 'on_retriever_stream', 'run_id': '110b0196-169c-4208-aadd-f1cfb32cfe7b', 'tags': [], 'metadata': {}, 'name': 'ToyRetriever', 'data': {'chunk': []}}
{'event': 'on_retriever_end', 'name': 'ToyRetriever', 'run_id': '110b0196-169c-4208-aadd-f1cfb32cfe7b', 'tags': [], 'metadata': {}, 'data': {'output': []}}


