# Retrieval-Augmented Generation (RAG) Implementation Guide

This notebook demonstrates a complete RAG pipeline using LangChain and Google's Gemini model. The key components are explained below.

**Key Concepts**:

-   `langchain`: Framework for building LLM applications

-   `chromadb`: Vector database for storing embeddings

-   `langchain-google-vertexai`: Google Cloud integration for Gemini models (the code in this notebook imports the related `langchain_google_genai` package)

-   `python-dotenv` (imported as `dotenv`): Environment variable management

🧠 What is RAG?
---------------

**Retrieval-Augmented Generation (RAG)** is a hybrid AI framework that combines:

1.  **Retrieval**: Fetching relevant information from external sources.

2.  **Generation**: Using a language model (LLM) to synthesize answers based on the retrieved data.

Unlike traditional LLMs that rely solely on pre-trained knowledge, RAG grounds responses in **real-time or domain-specific data**, making it ideal for dynamic or specialized applications.

🔄 How RAG Works
----------------

### 1\. **Retrieval Phase**

-   **Data Source**: A database/dataset (e.g., documents, websites, PDFs).

-   **Embeddings**: Text is converted into numerical vectors (e.g., using Google's `embedding-001`).

-   **Vector Database**: Stores embeddings for fast similarity searches (e.g., ChromaDB).

-   **Query Matching**: When a user asks a question, the system retrieves the most semantically similar text chunks.
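The query-matching step can be sketched without any external service. The snippet below is a minimal illustration, not the notebook's implementation: `embed`, `cosine_similarity`, and `retrieve` are hypothetical helpers, the bag-of-words count vector is a crude stand-in for a real embedding model, and the sample chunks are made up.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


# Made-up sample chunks, for illustration only
chunks = [
    "rebalancing moves liquidity back into the active price range",
    "the token distribution follows a vesting schedule",
    "liquidity providers earn fees from trading volume",
]
top = retrieve("how does rebalancing liquidity work", chunks, k=1)
print(top)
```

A real embedding model captures meaning rather than exact word overlap, but the ranking logic (embed the query, score every stored vector, keep the best `k`) has the same shape.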

### 2\. **Augmentation Phase**

-   The retrieved context is injected into a **prompt template**.

-   Example prompt:

    ```Answer using this context: [retrieved text]. Question: [user query]```
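As a rough sketch of the augmentation step, the example prompt can be filled in with plain string formatting. `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names chosen for this illustration, and the chunk texts are placeholders.

```python
# Template mirroring the example prompt above; the placeholders are filled at query time
PROMPT_TEMPLATE = "Answer using this context: {context}. Question: {question}"


def build_prompt(retrieved_chunks, question):
    """Join the retrieved chunks and inject them into the template."""
    context = "\n\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt_text = build_prompt(["chunk A", "chunk B"], "What is X?")
print(prompt_text)
```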

### 3\. **Generation Phase**

-   An LLM (e.g., Gemini, GPT) generates a response using both:

    -   The retrieved context

    -   Its pre-trained knowledge
 
🛠️ Key Components (In this Code)
-----------------------------------

| Component | Your Implementation | Purpose |
| --- | --- | --- |
| **Document Loader** | `WebBaseLoader` | Fetches web content |
| **Text Splitter** | `RecursiveCharacterTextSplitter` | Splits text into chunks with overlap |
| **Embedding Model** | `GoogleGenerativeAIEmbeddings` | Converts text to vectors |
| **Vector Store** | `Chroma` | Stores and searches vectors efficiently |
| **Retriever** | `vectorstore.as_retriever()` | Finds relevant chunks for a query |
| **LLM** | Gemini-2.0-Flash | Generates final answers |


🌟 Why Use RAG?
---------------

1.  **Reduces Hallucinations**\
    Grounds responses in retrieved facts rather than relying purely on memorized knowledge.

2.  **Handles Dynamic Data**\
    Perfect for applications needing up-to-date info (e.g., finance, news).

3.  **Cost-Effective**\
    No need to retrain models---just update the database.

4.  **Domain Adaptation**\
    Easily customize for specialized fields (e.g., A51 Finance use case).


**RAG Steps**:

1.  Retrieved chunks from ChromaDB about A51 Finance.

2.  Injected them into the prompt:\
    *"Use this context: [A51 Yield Supercharger docs]... Answer: What is A51 Finance?"*

3.  Gemini synthesized this response:\
    *"A51 Finance... enhances yield farming strategies using ALM 2.0..."*

In [None]:
!pip install langchain_community langchainhub chromadb langchain langchain-google-genai

In [46]:
pip install -qU langchain-google-vertexai

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
opentelemetry-proto 1.32.1 requires protobuf<6.0,>=5.0, but you have protobuf 6.31.0rc2 which is incompatible.
Note: you may need to restart the kernel to use updated packages.


In [12]:
!pip install python-dotenv



**Explanation**:

-   Loads the Google API key from the environment or a local `.env` file

-   Falls back to a `getpass` prompt, so the key is never echoed or stored in the notebook

In [16]:
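import getpass
import os
from dotenv import load_dotenv

# Load variables from a local .env file, if one exists
load_dotenv()

# Prompt for the key only when it is not already set; getpass hides the input
if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter API key for Google Gemini: ")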

Enter API key for Google Gemini:  ········


**Key Concepts**:

-   **Document Loaders**: Fetch content from various sources (websites, PDFs, databases)

-   **WebBaseLoader**: Specialized loader for web content extraction

Data Ingestion
--------------

In [22]:
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(web_paths=[
    "https://a51-finance.gitbook.io/a51-finance",
    "https://a51-finance.gitbook.io/a51-finance/protocol-features/rebalancing-mechanisms",
])
docs = loader.load()


**Why Chunking Matters**:

-   Breaks large documents into manageable pieces

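-   Maintains context across chunk boundaries with a 200-character overlap (the splitter measures length in characters by default, not tokens)

-   A good chunk size balances context retention against processing efficiency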
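To make the overlap idea concrete, here is a simplified fixed-window sketch with sizes measured in characters. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to split on separators such as paragraph and line breaks); `chunk_text` is a hypothetical helper written only for this illustration.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk begins with the last three characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when the overlap is large enough.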


Text Processing
-------------------

In [42]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Sizes are measured in characters (the default length function is len)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
print(splits[0])
print(splits[1])
print(len(splits))

page_content='A51 Yield Supercharger | A51 FinanceA51 FinanceSearch...CtrlK⚡A51 Yield Supercharger⭐A51 Liquidity Automation EngineOverviewThe A51 Finance ThesisProtocol Architecture🌎A51 EcosystemLegacy as Unipilot (v2)Protocol FeaturesMarket ModesAdvanced ModesRebalancing MechanismsSingle-Asset DepositZap InBoosted PositionsStrategies MarketplaceA51 Managed VaultsLiquidity AutomationsMarket ModesAdvanced ModesExample StrategiesVolatility-Hedged Yield Maximization StrategyTokenomicsFOO Tokenomics📃Background🗳️Become a Voter🪙What is $oA51?🪜Voting Mechanism💰Earn Revenue in $ETH📈Maximize Your Rewards$A51 TokenRevenue Model & LPsA51 UtilityGovernanceIncentivesToken Distribution & VestingAMMs SupportUniswap v3 and other AMMsUniswap v4V2 DocsPowered by GitBookOn this pageKey Features of the A51 Yield SuperchargerHow it works:A51 Yield SuperchargerNextA51 Liquidity Automation EngineLast updated 10 months agoA51 Yield Supercharger () is a powerful new tool designed to enhance your yield farming 

Vector Storage
--------------

In [48]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import Chroma

**Embedding Concepts**:

-   Converts text to numerical representations (vectors)

-   Google's `embedding-001` model generates 768-dimensional vectors

-   **ChromaDB**: Lightweight vector database for similarity search

In [136]:
vectorstore = Chroma.from_documents(documents=splits, embedding=GoogleGenerativeAIEmbeddings(model="models/embedding-001"))
print(vectorstore._collection.count())

112


In [None]:
print(vectorstore._collection.get())

In [None]:
print("\nCollection 1 - ", vectorstore._collection.get(ids=['cac7e248-68e7-4d55-a4d3-3c51ee0d8c3b'], include=["embeddings", "documents"]))
print("\nCollection 2 - ", vectorstore._collection.get(ids=['5207382c-e91f-4aa5-a198-99d77004a8d1'], include=["embeddings", "documents"]))
print("\nCollection 3 - ", vectorstore._collection.get(ids=['9f794f66-1fe0-46ed-afdc-c271e6e70186'], include=["embeddings", "documents"]))

**Retrieval Process**:

-   Searches the vector store for the document chunks most similar to the query

-   Returns the top 4 chunks by default (`k=4`)

Retrieval Setup
---------------

In [70]:
retriever = vectorstore.as_retriever()

Prompt Engineering
------------------

The `rlm/rag-prompt` template pulled from LangChain Hub contains the following instructions:

"You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise."

In [72]:
from langchain import hub
prompt = hub.pull("rlm/rag-prompt")



In [90]:
from langchain.chat_models import init_chat_model
model = init_chat_model("gemini-2.0-flash", model_provider="google_genai", temperature=0)
model

ChatGoogleGenerativeAI(model='models/gemini-2.0-flash', google_api_key=SecretStr('**********'), temperature=0.0, client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x1221dfe90>, default_metadata=())

In [80]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

In [82]:
def format_docs(docs):
  return "\n".join(doc.page_content for doc in docs)

**Pipeline Components**:

1.  **Retriever**: Fetches relevant context

2.  **Prompt**: Structures the LLM input

3.  **Gemini-2.0-Flash**: Generates final response

4.  **Output Parser**: Converts response to clean text
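The four stages amount to function composition: each stage's output becomes the next stage's input. The sketch below mimics that flow with plain Python functions so it runs without an API key. Everything prefixed `fake_`, along with the `Doc` class, `parse_output`, and `run_chain`, is a hypothetical stand-in, not LangChain's API.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for a retrieved document with a page_content field."""
    page_content: str


def fake_retriever(question):
    # A real retriever would run a similarity search; this returns fixed chunks
    return [Doc("First retrieved chunk."), Doc("Second retrieved chunk.")]


def format_docs(docs):
    return "\n".join(doc.page_content for doc in docs)


def fake_prompt(inputs):
    return f"Context: {inputs['context']}\nQuestion: {inputs['question']}"


def fake_model(prompt_text):
    # Stub "LLM": returns a message-like dict instead of calling a model
    return {"content": f"(stub answer based on {len(prompt_text)} prompt characters)"}


def parse_output(message):
    return message["content"]


def run_chain(question):
    # Retriever -> prompt -> model -> output parser
    inputs = {"context": format_docs(fake_retriever(question)), "question": question}
    return parse_output(fake_model(fake_prompt(inputs)))


print(run_chain("What is RAG?"))
```

In the notebook, the `|` operator expresses the same left-to-right hand-off between components.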

In [92]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

In [94]:
rag_chain.invoke("What is A51 Finance?")

'A51 Finance, also known as A51 Yield Supercharger, is a solution designed to enhance earnings and reduce risks for DeFi investors. It offers pre-curated strategies that adapt to different market conditions, automatically managing liquidity provision. Users can accumulate rewards and incentives on top of their base earnings, which can be reinvested to further boost returns.'

In [96]:
rag_chain.invoke("How do they rebalance user's positions?")

"Active rebalancing adjusts a user's liquidity position within the market’s price range by adjusting the distribution of tokens in their liquidity pool. If the price of ETH drops below the minimum price range, the position will be rebalanced by converting $USDC to $ETH to bring the LP position back to the range. This ensures the position is always within the active price range to keep earning fees and yields from trading volume."

In [100]:
rag_chain.invoke("How does A51 finance cater impermenanat loss on positions?")

'A51 allows users to set a rebalancing frequency to limit rebalances and prevent bigger losses. Strategy creators can monitor token prices for fluctuations to prevent impermanent loss. Staying updated with the market helps users make informed decisions.'

In [124]:
from langchain_core.runnables import RunnableLambda

def print_prompt(prompt_text):
  print("Prompt - ", prompt_text)
  return prompt_text

In [126]:

# Same chain, with a pass-through step that prints the fully formatted prompt
rag_chain_with_print = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | RunnableLambda(print_prompt)
    | model
    | StrOutputParser()
)



In [128]:
rag_chain_with_print.invoke("How does A51 finance cater impermenanat loss on positions?")

Prompt -  messages=[HumanMessage(content="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: How does A51 finance cater impermenanat loss on positions? \nContext: volatility.As a strategy creator, you can speculate if the market bounces back to your main range until your set threshold of resistance or support.Setting a Rebalancing Frequency:Determine how many times you want the protocol to rebalance your liquidity position before it pauses rebalancing.This helps you make a deterministic decision by rethinking your strategy after the set number of rebalances as the market may change in nature.It also helps prevent changes in your liquidity position too many times thus protecting you from bigger losses which usually happen in active rebalancing.By customizing these factors, you are guiding A51

'A51 allows users to set a rebalancing frequency to limit rebalances and prevent bigger losses. Strategy creators can monitor token prices for fluctuations to prevent impermanent loss. Staying updated with the market helps users make informed decisions.'

Key Considerations
------------------

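1.  **Chunk Size**: Larger chunks preserve more surrounding context; smaller chunks make retrieval more precise. Tune `chunk_size` and `chunk_overlap` to the documents.

2.  **Temperature**: Use 0 for factual, more deterministic answers; raise it for more varied, creative output.

3.  **Embedding Model**: Determines which chunks count as similar to a query, so the choice directly affects retrieval quality.

4.  **Prompt Engineering**: The instructions wrapped around the retrieved context shape how faithfully the model uses it.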
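For intuition about the temperature setting, the sketch below applies the standard softmax-with-temperature formula to a made-up list of token scores. It assumes nothing about how Gemini handles `temperature=0` internally (commonly this is treated as always picking the top-scoring token); `softmax_with_temperature` and the `logits` values are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 is usually handled as greedy argmax")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.1)
flat = softmax_with_temperature(logits, 2.0)
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

At a low temperature almost all probability mass sits on the top-scoring token, which is why low settings suit factual question answering; higher values spread the mass across alternatives.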