# How to build a simple RAG LLM App with LangChain
A Retrieval Augmented Generation (RAG) app combines searching for information with generating new content in a single application. It's like a smart assistant that not only finds the information you need but also uses that information to create new, useful responses. Here’s how it works, explained in simple terms:

#### How a RAG App Works
1. **Question or Query**: You start by asking the RAG app a question or giving it a topic you need information about.
2. **Retrieval**: The app uses a **retriever** to search a collection of documents for passages related to your question. The sources could be articles, books, or other relevant documents. Think of this step like finding all the ingredients you need for a recipe.
3. **Augmented Generation**: Once the app has the information, it uses an LLM to generate the response. This generator takes all the information the retriever found and uses it to craft a detailed, accurate answer or content that directly addresses your question or topic.
4. **Output**: Finally, the app presents you with a response that isn’t just copied from somewhere else but is a freshly generated piece of content based on the information it retrieved.
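
The four steps above can be sketched in plain Python. This is a toy illustration, not how LangChain implements them: `embed` is a bag-of-words stand-in for a real embedding model, `fake_llm` is a hypothetical placeholder for an actual LLM call, and the sample chunks are made up.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': word counts (an assumed stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, k=2):
    """Step 2: rank chunks by similarity to the question and keep the top k."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question, context_chunks):
    """Step 3a: place the retrieved text next to the question in the prompt."""
    context = "\n\n".join(context_chunks)
    return f"Question: {question}\nContext: {context}\nAnswer:"

def fake_llm(prompt):
    """Step 3b: hypothetical placeholder for an LLM call; it only reports the prompt size."""
    return f"(an LLM would answer here, given a {len(prompt)}-character prompt)"

# Made-up sample chunks for illustration
chunks = [
    "Craigslist is run like a nonprofit but is a successful business.",
    "Bread dough needs time to rise before baking.",
    "Startups should make something people want.",
]
question = "What should startups make?"   # Step 1: the query
top = retrieve(question, chunks, k=1)
print(fake_llm(build_prompt(question, top)))  # Step 4: the output
```

In a real app the word-count vectors would be replaced by learned embeddings, which can match passages that share meaning with the question even when they share no words.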

#### Why It’s Useful
- **Accuracy and Relevance**: Because the RAG app pulls information from relevant sources before generating a response, the answers you get are usually more accurate and relevant.
- **Efficiency**: It combines the steps of searching for information and creating content, making it a powerful tool for quickly producing high-quality responses.
- **Versatility**: This type of app can be used for a wide range of applications, from helping students with their homework to assisting researchers in summarizing scientific articles.

#### RAG apps save you money
RAG apps can also reduce the cost of working with large language models (LLMs), which is typically driven by how much text the model has to process.

- **Selective Querying**: A RAG app first uses a retriever to find the passages relevant to a query, and only then calls the LLM. The model receives a short, targeted prompt instead of an entire document or document collection.
- **Reduced Computational Load**: Because locating the relevant information is handled by the retriever, the LLM only needs to synthesize the retrieved passages into an answer. Fewer input tokens per query means less computation and, with token-based pricing, a smaller bill.

In essence, RAG apps save on LLM costs by reserving the expensive resource (the model's computing time) for generating precise, contextually informed responses, and leaving the preliminary gathering of information to the retrieval step, which is typically far cheaper.
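
To make the saving concrete, here is a back-of-the-envelope sketch comparing the input size of sending a whole document with sending only a few retrieved chunks. All numbers are assumptions chosen for illustration: the 4-characters-per-token ratio is only a rough rule of thumb for English text, the price is hypothetical, and the document and chunk sizes are made up.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate; the characters-per-token ratio is an assumption."""
    return max(1, len(text) // chars_per_token)

def prompt_cost(text, price_per_1k_tokens):
    """Input cost of sending `text` as a prompt, at a hypothetical price."""
    return estimate_tokens(text) / 1000 * price_per_1k_tokens

PRICE = 0.5                 # hypothetical price per 1,000 input tokens (arbitrary units)
document = "x" * 200_000    # made-up 200,000-character document
retrieved = "x" * 4_000     # assumed retrieval result: four 1,000-character chunks

full = prompt_cost(document, PRICE)
rag = prompt_cost(retrieved, PRICE)
print(f"whole document: {full:.2f}, retrieved chunks only: {rag:.2f}")
```

The retrieval step has its own cost (embedding the documents once and embedding each query), so the sketch only covers the generation side of the bill.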

#### RAG apps and the privacy of your data
RAG apps involve two main steps: retrieving information from documents and then using that information to generate responses. Whether these apps maintain the privacy of the documents they access largely depends on how they are designed and what data they handle.

RAG is commonly deployed to address large language model shortcomings, including hallucinations and short context windows. But RAG also helps us build privacy-aware apps.

When using RAG, your data is provided to the LLM as context only at generation time; it does not need to be used for training AI models. This means your data is not stored in the model itself as knowledge. It is only shown to the LLM when we ask for a response, and the model does not remember it afterward (what a hosted provider logs on its side is a separate matter, governed by its data-retention policy).

In a production-level RAG application, successfully building a privacy-aware solution requires considering and classifying the data you plan to store upfront, and taking additional steps to keep your sensitive data safe.
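
As a minimal sketch of what "classifying data upfront" could look like, the snippet below tags and redacts text before it is ever indexed. It is an illustration under simplifying assumptions, not a complete privacy solution: the only sensitive pattern it knows is an email address, the regular expression is deliberately simple, and the `classify` and `redact` helpers and the sample records are hypothetical.

```python
import re

# Deliberately simple email pattern, used here only for illustration
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def classify(text):
    """Label a record 'sensitive' if it contains an email address, else 'public'."""
    return "sensitive" if EMAIL.search(text) else "public"

def redact(text):
    """Replace every email address with a placeholder before the text is stored."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)

# Made-up sample records
records = [
    "Contact jane@example.com for details.",
    "The office opens at nine.",
]

labels = [classify(r) for r in records]
safe_to_index = [redact(r) for r in records]
print(labels)
print(safe_to_index)
```

Only `safe_to_index` would then be split, embedded, and stored, so the redacted values never reach the vector store or the LLM prompt.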

In [1]:
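import os
from dotenv import load_dotenv, find_dotenv

# Load variables from the nearest .env file into the process environment
_ = load_dotenv(find_dotenv())
# Raises KeyError early if the key is missing; ChatOpenAI reads the same variable by default
openai_api_key = os.environ['OPENAI_API_KEY']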

In [2]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model='gpt-3.5-turbo')

In [5]:
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [7]:
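# Load the source text as LangChain documents
loader = TextLoader('./data/be-good.txt')
docs = loader.load()

# Split into chunks of up to 1000 characters, with up to 200 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# Embed each chunk and index it in a Chroma vector store
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# Expose the store as a retriever that returns the chunks most similar to a query
retriever = vectorstore.as_retriever()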
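
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-width splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one prefers to break on separators such as paragraph and line breaks); the `split_with_overlap` helper is hypothetical and only shows how consecutive chunks share text at their boundaries.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(text, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.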

In [12]:
prompt = ChatPromptTemplate(
    input_variables=['context', 'question'],
    metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'},
    messages=[
        HumanMessagePromptTemplate(
            prompt=PromptTemplate(
                input_variables=['context', 'question'],
                template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:",
            )
        )
    ],
)

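def format_docs(docs):
    # Join the retrieved chunks into one context string
    return "\n\n".join(doc.page_content for doc in docs)

# The question goes to the retriever (to build the context) and is also passed through unchanged
rag_chain = (
    {'context': retriever | format_docs, 'question': RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()  # extract the plain-text answer from the model's message
)

response = rag_chain.invoke('what is the article about?')
response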

'The article discusses the idea of creating something people want and not worrying about making money initially. It suggests that embodying this concept can lead to success, using examples like Craigslist and Octopart. Additionally, it explores the benefits of doing good and how it can attract support and aid in overcoming challenges.'
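
It can help to see the text the model actually receives. The sketch below reproduces the prompt-filling step without calling LangChain or an LLM. The `Doc` class is a hypothetical stand-in for a retrieved document (only `page_content` is modeled), the two sample chunks are made up, and `fill_prompt` mimics the first two stages of the chain with plain string formatting.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document; only page_content is modeled."""
    page_content: str

# Same template text as the prompt defined above
TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.\n"
    "Question: {question} \nContext: {context} \nAnswer:"
)

def format_docs(docs):
    """Join the retrieved chunks into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)

def fill_prompt(question, docs):
    """The question is used as-is; the retrieved docs become the context."""
    return TEMPLATE.format(question=question, context=format_docs(docs))

# Made-up chunks standing in for what a retriever would return
docs = [Doc("First retrieved chunk."), Doc("Second retrieved chunk.")]
filled = fill_prompt("what is the article about?", docs)
print(filled)
```

Printing the filled prompt like this is a quick way to check that the retrieved chunks are relevant before spending tokens on the LLM call.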