## Retrieval augmented generation (RAG)
An LLM can give unreliable answers about facts that were absent from its training data. Fine-tuning the model on those facts is one way to mitigate this, but fine-tuning is often poorly suited to factual recall and can be costly. Retrieval is the process of finding information relevant to a given input and providing it to the LLM to improve its response. Retrieval augmented generation (RAG) is the process of grounding the LLM's generation (output) in that retrieved information.

<img src="images/RAG_Landscape.png" alt="Retrieval Augmented Generation (RAG) Landscape" width="800"/>
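To make the two steps concrete before bringing in any libraries, here is a minimal sketch in plain Python. The word-overlap scoring, the `retrieve` and `build_grounded_prompt` helpers, and the sample corpus are all illustrative assumptions; the pipeline in the rest of this notebook uses embeddings and a vector store instead of word overlap.

```python
import re

# Toy corpus standing in for an external knowledge source (hypothetical data).
CORPUS = [
    "Task decomposition breaks a complex task into smaller steps.",
    "Chroma is a vector store for embeddings.",
    "Temperature controls the randomness of sampling.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, corpus, k=1):
    """Retrieval: rank passages by how many words they share with the question."""
    q_words = tokenize(question)
    ranked = sorted(corpus, key=lambda p: len(q_words & tokenize(p)), reverse=True)
    return ranked[:k]


def build_grounded_prompt(question, passages):
    """Augmentation: place the retrieved passages in the prompt as context."""
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


question = "What is task decomposition?"
top_passages = retrieve(question, CORPUS, k=1)
grounded_prompt = build_grounded_prompt(question, top_passages)
print(grounded_prompt)
```

The generation step would then send `grounded_prompt` to the LLM, so the answer is based on the retrieved passage rather than on what the model happens to remember.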


In [2]:
from dotenv import load_dotenv, dotenv_values
import google.generativeai as genai
from IPython.display import Markdown, display
import os 


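# Load variables from a local .env file into the process environment
load_dotenv()

# Configure the Gemini client
my_api_key = os.getenv("GOOGLE_API_KEY")
genai.configure(api_key=my_api_key)

# Enable LangSmith tracing only when a key is available:
# os.environ values must be strings, so assigning None would raise a TypeError
langchain_api_key = os.getenv('LANGSMITH_API_KEY')
if langchain_api_key:
    os.environ['LANGCHAIN_TRACING_V2'] = 'true'
    os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
    os.environ['LANGCHAIN_API_KEY'] = langchain_api_key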

In [None]:
!pip install langchainhub -qU

### Indexing

In [11]:
import bs4
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

##### Document Loaders 

In [12]:
# Load Documents
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

##### Splitters

In [13]:
# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000,
                                               chunk_overlap=200)
splits = text_splitter.split_documents(docs)
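The effect of `chunk_size` and `chunk_overlap` is easier to see on a tiny input. The sketch below is a simplified fixed-window splitter, not how `RecursiveCharacterTextSplitter` is implemented: the real splitter also tries to cut at natural separators such as paragraph and line breaks, so its chunks will not fall on fixed offsets. The `split_with_overlap` helper is a hypothetical name used only for illustration.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters.

    Each window starts (chunk_size - chunk_overlap) characters after the
    previous one, so neighbours share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears in full in at least one chunk, at the cost of storing some text twice.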

##### Vectorstores

In [14]:
# Call the embedding model
embedding = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

# Embed the chunks and index them in Chroma
vectorstore = Chroma.from_documents(documents=splits,
                                    embedding=embedding)

### Retrieval

In [15]:
# k=1: return only the single most similar chunk for each query
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})
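Conceptually, the retriever embeds the question and returns the `k` stored chunks whose vectors are closest to it. The sketch below illustrates that idea with hand-written three-dimensional vectors and cosine similarity. The vectors, the `top_k` helper, and the choice of cosine similarity are assumptions for illustration; a real vector store works with much higher-dimensional embeddings, and its distance metric and index depend on configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=1):
    """Return the texts of the k entries most similar to query_vec."""
    scored = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]


# (text, embedding) pairs; the numbers are made up for the example.
index = [
    ("chunk about task decomposition", [0.9, 0.1, 0.0]),
    ("chunk about memory", [0.1, 0.9, 0.0]),
    ("chunk about tool use", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of the user's question

best = top_k(query_vec, index, k=1)
print(best)
```

Raising `k` passes more context to the model, which can help when the answer is spread over several chunks but also makes the prompt longer.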

### Generation

In [16]:
# Prompt: pull a RAG prompt template from the LangChain Hub
prompt_hub_rag = hub.pull("rlm/rag-prompt")
# LLM
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)
# Post-processing: join the retrieved documents into one context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

### Chaining

In [20]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt_hub_rag
    | llm
    | StrOutputParser()
)
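The pipe syntax can hide what actually flows through the chain. The sketch below replays the same four steps with ordinary functions: the dict sends the same question to two branches, and each later step receives the previous step's output. Every function here (`fake_retriever`, `join_docs`, `fake_prompt`, `fake_llm`, `parse_output`) is a hypothetical stand-in, so no API call is made and the "answer" is only an echo of the prompt.

```python
def fake_retriever(question):
    """Stand-in for `retriever`: returns a list of context strings."""
    return ["Task decomposition splits a task into subtasks."]


def join_docs(docs):
    """Stand-in for `format_docs`: join the chunks into one context string."""
    return "\n\n".join(docs)


def fake_prompt(inputs):
    """Stand-in for the prompt template: fill in context and question."""
    return f"Context: {inputs['context']}\nQuestion: {inputs['question']}"


def fake_llm(prompt_text):
    """Stand-in for the chat model: wraps the prompt in a message-like dict."""
    return {"content": f"(answer based on) {prompt_text}"}


def parse_output(message):
    """Stand-in for `StrOutputParser`: extract the text from the message."""
    return message["content"]


def run_chain(question):
    # Step 1: both dict values are computed from the same input.
    inputs = {
        "context": join_docs(fake_retriever(question)),  # retriever | format_docs
        "question": question,                            # RunnablePassthrough()
    }
    # Steps 2-4: prompt, then model, then output parser.
    return parse_output(fake_llm(fake_prompt(inputs)))


answer = run_chain("What is Task Decomposition?")
print(answer)
```

Reading the real chain the same way, the string passed to `invoke` reaches both the retriever and the passthrough, which is why the chain takes a plain question string as input.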

### Inference

In [22]:
result = rag_chain.invoke("What is Task Decomposition?")
print(result)

Task decomposition is the process of breaking down a complex task into smaller, more manageable steps. This is often achieved through the use of chain of thought prompting, which encourages the model to think step-by-step and decompose the task into simpler subtasks. This approach helps to improve model performance on complex tasks and provides insights into the model's reasoning process. 

