<div style="display: flex; justify-content: flex-start; align-items: center; gap: 15px; margin-bottom: 20px;">
  <a target="_blank" href="https://colab.research.google.com/github.com/SylphAI-Inc/AdalFlow/blob/main/notebooks/tutorials/adalflow_rag_documents.ipynb">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
  </a>
  <a href="https://github.com/SylphAI-Inc/AdalFlow/blob/main/tutorials/adalflow_rag_documents.py" target="_blank" style="display: flex; align-items: center;">
      <img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" alt="GitHub" style="height: 20px; width: 20px; margin-right: 5px;">
      <span style="vertical-align: middle;"> Open Source Code </span>
  </a>
</div>

# 🤗 Welcome to AdalFlow!
## The PyTorch library to auto-optimize any LLM task pipeline

Thanks for trying us out! We're here to provide you with the best LLM application development experience you can dream of 😊 If you have any questions or concerns, [come talk to us on Discord](https://discord.gg/ezzszrRZvT); we're always here to help! ⭐ <i>Star us on <a href="https://github.com/SylphAI-Inc/AdalFlow">GitHub</a> </i> ⭐


# Quick Links

GitHub repo: https://github.com/SylphAI-Inc/AdalFlow

Full Tutorials: https://adalflow.sylph.ai/index.html#.

Deep dive on each API: check out the [developer notes](https://adalflow.sylph.ai/tutorials/index.html).

Common use cases with auto-optimization: check out [Use cases](https://adalflow.sylph.ai/use_cases/index.html).

# Author
This notebook was created by community contributor [Ajith](https://github.com/ajithvcoder/).

# Outline

This is a quick introduction to what AdalFlow can do. We will cover:

* How to use AdalFlow for RAG over documents

AdalFlow can be used in a generic manner with any API provider, without worrying much about prompts,
model arguments, or parsing results.

**Next: Try our [adalflow-text-splitter](https://colab.research.google.com/github.com/SylphAI-Inc/AdalFlow/blob/main/notebooks/tutorials/adalflow_text_splitter.ipynb)**


# Installation

1. Use `pip` to install the `adalflow` Python package. We will need the `openai`, `groq`, and `faiss` (CPU version) extras.

  ```bash
    pip install torch --index-url https://download.pytorch.org/whl/cpu
    pip install sentence-transformers==3.3.1
    pip install adalflow[openai,groq,faiss-cpu]
  ```
2. Set up the `openai` and `groq` API keys as environment variables.

### Set Environment Variables

Note: Enter your API keys in the cell below.

In [None]:
%%writefile .env

OPENAI_API_KEY="PASTE-OPENAI_API_KEY_HERE"
GROQ_API_KEY="PASTE-GROQ_API_KEY-HERE"

Overwriting .env


In [2]:
from adalflow.utils import setup_env

# Load environment variables. Make sure OPENAI_API_KEY and GROQ_API_KEY are set in the .env file and that .env is in the current folder
setup_env(".env")
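
Before going further, it can help to confirm that the keys were actually loaded. The following is a minimal sketch and not part of AdalFlow: `missing_api_keys` is a hypothetical helper that assumes the two variable names from the `.env` cell above and treats values still starting with `PASTE-` as unset.

```python
import os

# Variable names taken from the .env cell; the helper itself is illustrative.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GROQ_API_KEY")


def missing_api_keys(env=None, required=REQUIRED_KEYS):
    """Return required keys that are unset, empty, or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for key in required:
        value = env.get(key, "")
        if not value or value.startswith("PASTE-"):
            missing.append(key)
    return missing


missing = missing_api_keys()
if missing:
    print("Missing API keys:", ", ".join(missing))
else:
    print("All API keys are set.")
```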

In [3]:
import os
import tiktoken
from typing import List, Dict, Tuple
import numpy as np
from sentence_transformers import SentenceTransformer
from faiss import IndexFlatL2

from adalflow.components.model_client import GroqAPIClient, OpenAIClient
from adalflow.core.types import ModelType
from adalflow.utils import setup_env


`AdalflowRAGPipeline` is a class that implements a Retrieval-Augmented Generation (RAG) pipeline over documents with AdalFlow. It:

- Loads large text files, splits them into chunks, embeds the chunks, and retrieves the most relevant ones.
- Counts tokens and truncates the retrieved context to fit the LLM's token budget.
- Generates responses grounded in the retrieved context.
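
The retrieval step relies on exact nearest-neighbour search over the chunk embeddings. As a rough illustration of what a flat L2 index computes, here is a pure-Python sketch over toy 2-D vectors. `l2_top_k` and the vectors are made up for this example; the pipeline itself delegates this work to FAISS on 384-dimensional embeddings.

```python
from typing import List, Sequence, Tuple


def l2_top_k(query: Sequence[float],
             vectors: Sequence[Sequence[float]],
             k: int) -> List[Tuple[float, int]]:
    """Return up to k (squared L2 distance, index) pairs, closest first."""
    scored = []
    for i, vec in enumerate(vectors):
        dist = sum((q - v) ** 2 for q, v in zip(query, vec))
        scored.append((dist, i))
    scored.sort()  # Sort by distance; ties are broken by the lower index
    return scored[:k]


# Toy "embeddings" for three chunks and one query (illustrative values)
chunk_vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
query_vector = [0.9, 1.2]

for dist, idx in l2_top_k(query_vector, chunk_vectors, k=2):
    print(f"chunk {idx}: squared distance {dist:.2f}")
```

The brute-force scan is linear in the number of chunks, which is fine for a handful of text files like the ones used in this notebook.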

In [4]:
class AdalflowRAGPipeline:
    def __init__(self,
                 model_client=None,
                 model_kwargs=None,
                 embedding_model='all-MiniLM-L6-v2', 
                 vector_dim=384, 
                 top_k_retrieval=3,
                 max_context_tokens=800):
        """
        Initialize RAG Pipeline for handling large text files
        
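        Args:
            model_client: AdalFlow model client used for generation (e.g. GroqAPIClient or OpenAIClient)
            model_kwargs (dict): Generation parameters passed to the client (model, temperature, max_tokens)
            embedding_model (str): Sentence transformer model for embeddings
            vector_dim (int): Dimension of embedding vectors; must match the embedding model's output size
            top_k_retrieval (int): Number of documents to retrieve
            max_context_tokens (int): Maximum context tokens to send to LLM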
        """
        # Initialize model client for generation
        self.model_client = model_client
        
        # Initialize tokenizer for precise token counting
        self.tokenizer = tiktoken.get_encoding("cl100k_base")
        
        # Initialize embedding model
        self.embedding_model = SentenceTransformer(embedding_model)
        
        # Initialize FAISS index for vector similarity search
        self.index = IndexFlatL2(vector_dim)
        
        # Store document texts, embeddings, and metadata
        self.documents = []
        self.document_embeddings = []
        self.document_metadata = []
        
        # Retrieval and context management parameters
        self.top_k_retrieval = top_k_retrieval
        self.max_context_tokens = max_context_tokens
        
        # Model generation parameters
        self.model_kwargs = model_kwargs

    def load_text_file(self, file_path: str) -> List[str]:
        """
        Load a large text file and split into manageable chunks
        
        Args:
            file_path (str): Path to the text file
        
        Returns:
            List[str]: List of document chunks
        """
        with open(file_path, 'r', encoding='utf-8') as file:
            # Read entire file
            content = file.read()
        
        # Split content into chunks (e.g., 10 lines per chunk)
        lines = content.split('\n')
        chunks = []
        chunk_size = 10  # Adjust based on your file structure
        
        for i in range(0, len(lines), chunk_size):
            chunk = '\n'.join(lines[i:i+chunk_size])
            chunks.append(chunk)
        
        return chunks

    def add_documents_from_directory(self, directory_path: str):
        """
        Add documents from all text files in a directory
        
        Args:
            directory_path (str): Path to directory containing text files
        """
        for filename in os.listdir(directory_path):
            if filename.endswith('.txt'):
                file_path = os.path.join(directory_path, filename)
                document_chunks = self.load_text_file(file_path)
                
                for chunk in document_chunks:
                    # Embed document chunk
                    embedding = self.embedding_model.encode(chunk)
                    
                    # Add to index and document store
                    self.index.add(np.array([embedding]))
                    self.documents.append(chunk)
                    self.document_embeddings.append(embedding)
                    self.document_metadata.append({
                        'filename': filename,
                        'chunk_index': len(self.document_metadata)
                    })

    def count_tokens(self, text: str) -> int:
        """
        Count tokens in a given text
        
        Args:
            text (str): Input text
        
        Returns:
            int: Number of tokens
        """
        return len(self.tokenizer.encode(text))

    def retrieve_and_truncate_context(self, query: str) -> str:
        """
        Retrieve relevant documents and truncate to fit token limit
        
        Args:
            query (str): Input query
        
        Returns:
            str: Concatenated context within token limit
        """
        # Retrieve relevant documents
        query_embedding = self.embedding_model.encode(query)
        distances, indices = self.index.search(
            np.array([query_embedding]), 
            self.top_k_retrieval
        )
        
        # Collect and truncate context
        context = []
        current_tokens = 0
        
        for idx in indices[0]:
            # FAISS pads results with -1 when the index holds fewer than top_k vectors
            if idx < 0:
                continue
            doc = self.documents[idx]
            doc_tokens = self.count_tokens(doc)
            
            # Check if adding this document would exceed token limit
            if current_tokens + doc_tokens <= self.max_context_tokens:
                context.append(doc)
                current_tokens += doc_tokens
            else:
                break
        
        return "\n\n".join(context)

    def generate_response(self, query: str):
        """
        Generate a response using retrieval-augmented generation
        
        Args:
            query (str): User's input query
        
        Returns:
            GeneratorOutput: Parsed model output; the generated text is in its raw_response field
        """
        # Retrieve and truncate context
        retrieved_context = self.retrieve_and_truncate_context(query)
        
        # Construct context-aware prompt
        full_prompt = f"""
        Context Documents:
        {retrieved_context}
        
        Query: {query}
        
        Generate a comprehensive response that:
        1. Directly answers the query
        2. Incorporates relevant information from the context documents
        3. Provides clear and concise information
        """
        
        # Prepare API arguments
        api_kwargs = self.model_client.convert_inputs_to_api_kwargs(
            input=full_prompt,
            model_kwargs=self.model_kwargs,
            model_type=ModelType.LLM
        )
        
        # Call API and parse response
        response = self.model_client.call(
            api_kwargs=api_kwargs, 
            model_type=ModelType.LLM
        )
        response_text = self.model_client.parse_chat_completion(response)
        
        return response_text
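
One behaviour of `retrieve_and_truncate_context` worth noting is that it stops at the first chunk that would overflow the budget, so a long top-ranked chunk can leave the context empty. Below is a minimal sketch of that greedy loop. It uses whitespace word counts as a stand-in for `tiktoken` tokens, an assumption made only to keep the example dependency-free, and `fit_to_budget` is a hypothetical name.

```python
from typing import Callable, List


def fit_to_budget(chunks: List[str],
                  max_tokens: int,
                  count_tokens: Callable[[str], int] = lambda s: len(s.split())) -> List[str]:
    """Keep ranked chunks in order until the next one would exceed max_tokens."""
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            break  # Same policy as the pipeline: stop at the first overflow
        kept.append(chunk)
        used += cost
    return kept


ranked = ["one two three", "four five six seven", "eight"]
print(fit_to_budget(ranked, max_tokens=5))  # Only the first chunk fits
print(fit_to_budget(ranked, max_tokens=2))  # Nothing fits, so the list is empty
```

With a budget of 5, the third chunk would fit but is never considered. Replacing `break` with `continue` would pack smaller chunks instead, at the cost of skipping a more relevant one.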


`run_rag_pipeline` demonstrates how to use `AdalflowRAGPipeline` for retrieval-augmented generation. It initializes the pipeline with the given model client, retrieval depth, and context token limit, loads documents from a directory, and processes a list of queries. For each query, it retrieves relevant context, generates a response, and prints the result.

In [7]:
def run_rag_pipeline(model_client, model_kwargs, documents, queries):

    # Example usage of RAG pipeline
    rag_pipeline = AdalflowRAGPipeline(
        model_client=model_client,
        model_kwargs=model_kwargs,
        top_k_retrieval=1,  # Retrieve only the single most relevant chunk
        max_context_tokens=800  # Limit context to 800 tokens
    )

    # Add documents from a directory of text files
    rag_pipeline.add_documents_from_directory(documents)
    
    # Generate responses
    for query in queries:
        print(f"\nQuery: {query}")
        response = rag_pipeline.generate_response(query)
        print(f"Response: {response}")
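
The loop above prints the whole output object. If only the generated text is needed, something like the following sketch may help. It assumes the returned object exposes `error` and `raw_response` attributes, as the `GeneratorOutput` printed in this notebook's results does. `response_text` is a hypothetical helper, and the `SimpleNamespace` stands in for a real response so the snippet runs without an API call.

```python
from types import SimpleNamespace


def response_text(output) -> str:
    """Return the generated text, or raise if the output carries an error."""
    error = getattr(output, "error", None)
    if error:
        raise RuntimeError(f"Generation failed: {error}")
    text = getattr(output, "raw_response", None)
    # Fall back to the object's string form if no text field is present
    return text if isinstance(text, str) else str(output)


# Stand-in object mirroring the fields seen in the printed output
fake_output = SimpleNamespace(
    data=None,
    error=None,
    raw_response="The Crystal Cavern was discovered in 1987 by divers.",
)
print(response_text(fake_output))
```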


In [9]:
# setup_env()

documents = '../../tutorials/assets/documents'

queries = [
    "What year was the Crystal Cavern discovered?",
    "What is the name of the rare tree in Elmsworth?",
    "What local legend claim that Lunaflits surrounds?"
]

groq_model_kwargs = {
    "model": "llama-3.2-1b-preview",
    "temperature": 0.1,
    "max_tokens": 800,
}

openai_model_kwargs = {
    "model": "gpt-3.5-turbo",
    "temperature": 0.1,
    "max_tokens": 800,
}
# The example below shows that AdalFlow can be used in a generic manner with any API provider,
# without worrying about prompts or parsing results
run_rag_pipeline(GroqAPIClient(), groq_model_kwargs, documents, queries)
run_rag_pipeline(OpenAIClient(), openai_model_kwargs, documents, queries)


Query: What year was the Crystal Cavern discovered?
Response: GeneratorOutput(id=None, data=None, error=None, usage=CompletionUsage(completion_tokens=14, prompt_tokens=203, total_tokens=217), raw_response='The Crystal Cavern was discovered in 1987 by divers.', metadata=None)

Query: What is the name of the rare tree in Elmsworth?
Response: GeneratorOutput(id=None, data=None, error=None, usage=CompletionUsage(completion_tokens=17, prompt_tokens=212, total_tokens=229), raw_response='The rare tree in Elmsworth is known as the "Moonshade Willow".', metadata=None)

Query: What local legend claim that Lunaflits surrounds?
Response: GeneratorOutput(id=None, data=None, error=None, usage=CompletionUsage(completion_tokens=19, prompt_tokens=206, total_tokens=225), raw_response='Local legend claims that Lunaflits are guardians of ancient treasure buried deep within the canyon.', metadata=None)

Query: What year was the Crystal Cavern discovered?
Response: GeneratorOutput(id=None, data=None, error