# RAG with SQL Database, Jina AI Reranker, and Hugging Face LLM using LlamaIndex

This notebook demonstrates how to build a Retrieval Augmented Generation (RAG) pipeline using LlamaIndex, where:
1.  **Data Source:** The knowledge base is stored in a SQL database (SQLite in this example).
2.  **Retrieval:** Rows are loaded from the SQL database into a `VectorStoreIndex`, and LlamaIndex retrieves the chunks most similar to the user query.
3.  **Reranking:** `JinaAIRerank` is used to rerank the retrieved documents for better relevance and quality before passing them to the language model.
4.  **Generation:** A Hugging Face LLM (via `llama-index-llms-huggingface`) generates the final answer based on the reranked, retrieved context.

This approach lets you use text stored in a SQL database as the knowledge base for a RAG system and improve answer relevance by reranking the retrieved passages before generation.

## 1. Install Dependencies

In [None]:
!pip install -qU sqlalchemy transformers einops llama-index llama-index-postprocessor-jinaai-rerank llama-index-llms-huggingface "huggingface_hub[inference]" llama-index-embeddings-huggingface

## 2. Import Libraries and Set Up API Keys

You'll need a Jina AI API key for reranking. A Hugging Face Hub token is only required for gated or private models; many open models can be downloaded and run through `HuggingFaceLLM` without one.

In [None]:
import os
import logging
import sys
import pandas as pd

# Optional: Set up logging for LlamaIndex
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG) # DEBUG level for detailed logs
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

# --- API Key Setup ---
# Jina AI API Key for the reranker
# Replace 'YOUR_JINA_API_KEY' with your actual Jina AI API key.
os.environ['JINA_API_KEY'] = 'YOUR_JINA_API_KEY'

# Hugging Face API Key (Token)
# Replace 'YOUR_HUGGINGFACE_API_KEY' with your actual Hugging Face Hub token.
# Only gated or private models need it. Note that huggingface_hub itself reads the
# token from the HF_TOKEN environment variable, not from HUGGINGFACE_API_KEY, so set
# HF_TOKEN as well if your chosen model requires authentication.
os.environ['HUGGINGFACE_API_KEY'] = 'YOUR_HUGGINGFACE_API_KEY'

# --- LlamaIndex Core Imports ---
# ServiceContext is deprecated in favor of the global Settings object, so it is not imported.
from llama_index.core import (
    SQLDatabase,
    VectorStoreIndex,
    Document,
    QueryBundle,
    Settings,
)
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import SQLRetriever

# --- LLM and Reranker Imports ---
from llama_index.llms.huggingface import HuggingFaceLLM
# The package exports the reranker class as JinaRerank; alias it to the name used in this notebook.
from llama_index.postprocessor.jinaai_rerank import JinaRerank as JinaAIRerank
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# --- Database Imports ---
from sqlalchemy import create_engine, text, Column, Integer, String, MetaData, Table

# Check if API keys are set (optional, for user feedback)
if os.environ.get('JINA_API_KEY') == 'YOUR_JINA_API_KEY' or not os.environ.get('JINA_API_KEY'):
    print("JINA_API_KEY is not set or is using the placeholder. JinaRerank might not work.")
if os.environ.get('HUGGINGFACE_API_KEY') == 'YOUR_HUGGINGFACE_API_KEY' or not os.environ.get('HUGGINGFACE_API_KEY'):
    print("HUGGINGFACE_API_KEY is not set or is using the placeholder. Model downloads might be affected for certain models.")

print("Libraries imported and API key placeholders noted.")
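
The placeholder checks above only print a warning. A stricter variant, sketched here, fails fast before any model is loaded. `require_key` is a hypothetical helper name for illustration, not part of LlamaIndex or the Jina client.

```python
import os

def require_key(name, placeholder):
    """Return the value of environment variable `name`, or raise if it is unusable."""
    value = os.environ.get(name, "").strip()
    if not value or value == placeholder:
        raise RuntimeError(f"{name} is missing or still set to the placeholder value.")
    return value

# Usage (uncomment once a real key is set):
# jina_api_key = require_key("JINA_API_KEY", "YOUR_JINA_API_KEY")
```

Raising early keeps a configuration mistake from surfacing later as a less obvious HTTP error inside the query.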

## 3. Set Up SQLite Database and Populate with Data

We'll create an in-memory SQLite database and a table named `documents` containing some text data that our RAG pipeline will query.

In [None]:
# Define the SQLite database engine (in-memory)
engine = create_engine("sqlite:///:memory:")
metadata = MetaData()

# Define the 'documents' table
documents_table = Table(
    'documents',
    metadata,
    Column('id', Integer, primary_key=True, autoincrement=True),
    Column('title', String(255)),
    Column('content', String)
)
metadata.create_all(engine)

# Sample data to insert
sample_docs_data = [
    {'title': 'Alpacas', 'content': 'Alpacas are South American camelids, smaller than llamas. They are known for their soft fleece.'},
    {'title': 'Llamas', 'content': 'Llamas are also South American camelids, often used as pack animals. They are larger than alpacas and have banana-shaped ears.'},
    {'title': 'Climate Change', 'content': 'Climate change refers to long-term shifts in temperatures and weather patterns, largely driven by human activities, especially the burning of fossil fuels.'},
    {'title': 'Photosynthesis', 'content': 'Photosynthesis is the process used by plants, algae, and some bacteria to convert light energy into chemical energy, through a process that uses sunlight, water, and carbon dioxide.'},
    {'title': 'Artificial Intelligence', 'content': 'Artificial intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning, reasoning, and self-correction.'}
]

# Insert data into the table
with engine.connect() as connection:
    for doc_data in sample_docs_data:
        stmt = documents_table.insert().values(**doc_data)
        connection.execute(stmt)
    connection.commit()

# Verify data insertion
with engine.connect() as connection:
    result = connection.execute(text("SELECT title, content FROM documents")).fetchall()
    print(f"Inserted {len(result)} documents into the SQLite database:")
    for row in result:
        print(f"  Title: {row[0]}, Content: {row[1][:50]}...")

## 4. Connect LlamaIndex to the SQL Database

We use LlamaIndex's `SQLDatabase` class to interface with our SQLite database.

In [None]:
sql_database = SQLDatabase(engine, include_tables=["documents"]) # Specify the table to consider
print("LlamaIndex SQLDatabase initialized.")

## 5. Configure LLM, Embedding Model, and Reranker

We'll set up the Hugging Face LLM for generation, a Hugging Face embedding model for retrieval, and the Jina AI Reranker.

In [None]:
# --- Configure LLM (Hugging Face) ---
# HuggingFaceLLM loads models through AutoModelForCausalLM, so it needs a decoder-only
# (causal) model. Encoder-decoder models such as google/flan-t5-small cannot be loaded this way.
# Make sure you have accepted the terms for gated models like Llama-2 if you choose them.
llm = HuggingFaceLLM(
    model_name="Qwen/Qwen2.5-0.5B-Instruct", # A small causal model for demonstration; any causal LM can be substituted
    # model_name="meta-llama/Llama-2-7b-chat-hf", # Example for a larger model
    tokenizer_name="Qwen/Qwen2.5-0.5B-Instruct",
    query_wrapper_prompt="Question: {query_str}\nAnswer:", # Adjust based on model needs
    context_window=2048, # Max input tokens passed to the model
    max_new_tokens=256,  # Max tokens to generate
    model_kwargs={"torch_dtype": "auto"}, # Use "auto" or torch.float16 for GPU
    # generate_kwargs={"do_sample": True, "temperature": 0.7, "top_k": 50, "top_p": 0.95},
    device_map="auto" # Automatically select device (CPU/GPU)
)

# --- Configure Embedding Model (Hugging Face) ---
embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2" # A good default embedding model
)

# --- Configure Reranker (Jina AI) ---
jina_rerank = JinaAIRerank(
    api_key=os.environ.get('JINA_API_KEY'), 
    model="jina-reranker-v1-base-en", # Specify the Jina reranker model
    top_n=3  # Rerank and return top 3 documents
)

# --- Set up global Settings ---
# LlamaIndex uses the global Settings object (which replaces the older ServiceContext) for the LLM, embed_model, etc.
Settings.llm = llm
Settings.embed_model = embed_model
# The node parser controls how documents are split into chunks before embedding.
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=20)

print("LLM, Embedding Model, and Reranker configured.")
if not os.environ.get('JINA_API_KEY') or os.environ.get('JINA_API_KEY') == 'YOUR_JINA_API_KEY':
    print("Warning: Jina API key is not set. Reranking will likely fail.")

## 6. Implement RAG Pipeline from SQL Data

For a RAG pipeline over SQL data where we want to apply reranking to text content, we'll first fetch the data from SQL, convert it into LlamaIndex `Document` objects, and then build a `VectorStoreIndex`. This allows us to use standard LlamaIndex retrievers and node postprocessors (like the reranker).

In [None]:
# Fetch data from SQL and convert to LlamaIndex Documents
with engine.connect() as connection:
    result = connection.execute(text("SELECT id, title, content FROM documents")).fetchall()

llama_documents = []
for row_id, title, content in result:
    doc = Document(
        text=content, 
        metadata={'title': title, 'doc_id': row_id} # Add title and original ID as metadata
    )
    llama_documents.append(doc)

print(f"Converted {len(llama_documents)} SQL rows into LlamaIndex Document objects.")

# Build VectorStoreIndex from these documents
# This uses the global Settings.embed_model for embeddings, so no service_context is needed
index = VectorStoreIndex.from_documents(llama_documents)
print("VectorStoreIndex built from SQL documents.")

# --- Construct Query Engine with Retriever and Reranker ---
retriever = index.as_retriever(similarity_top_k=5) # Retrieve top 5 most similar documents initially

# The query engine uses the global Settings.llm for answer synthesis
query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    node_postprocessors=[jina_rerank], # Rerank the retrieved nodes before synthesis
)

print("Query Engine with Jina Reranker created.")
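
Conceptually, the query engine runs two stages: the retriever returns the 5 nearest chunks by embedding similarity, then the reranker rescores those candidates against the query and keeps the best 3. The toy sketch below mimics that flow in plain Python. `overlap_score` is a made-up word-overlap scorer standing in for the Jina model, and the candidate list is hypothetical.

```python
import re

def overlap_score(query, text):
    """Toy relevance score: fraction of query words that also appear in the text."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    text_words = set(re.findall(r"[a-z]+", text.lower()))
    return len(query_words & text_words) / len(query_words)

def rerank(query, candidates, top_n):
    """Rescore first-stage candidates against the query and keep the top_n best."""
    scored = [(overlap_score(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps ties in original order
    return scored[:top_n]

# Pretend these came back from the first-stage retriever (similarity_top_k=5).
candidates = [
    "Llamas are often used as pack animals.",
    "Climate change refers to long-term shifts in temperatures.",
    "Alpacas are known for their soft fleece.",
    "Photosynthesis converts light energy into chemical energy.",
    "Artificial intelligence is the simulation of human intelligence.",
]

for score, text in rerank("soft fleece alpacas", candidates, top_n=3):
    print(f"{score:.2f}  {text}")
```

A real reranker scores each query and passage pair with a model rather than by word overlap, but the shape of the step is the same: a wide first pass, then a narrower and more precise second pass.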

## 7. Run Example Queries

Let's test the RAG pipeline with a few queries. The Jina AI API key must be set correctly for the reranker to work. With a missing or invalid key, the rerank request will most likely raise an error, which `run_query` catches and prints instead of crashing the notebook.

In [None]:
def run_query(query_str):
    print(f"\n--- Query: {query_str} ---")
    try:
        response = query_engine.query(query_str)
        print("\nResponse:")
        print(response)
        
        print("\nSource Nodes (after reranking):")
        for i, node in enumerate(response.source_nodes):
            # score can be None if no retriever or postprocessor assigned one
            score = f"{node.score:.4f}" if node.score is not None else "N/A"
            print(f"  Node {i+1}: Score = {score}, Title = {node.metadata.get('title', 'N/A')}")
            print(f"    Content: {node.text[:100]}...")
            
    except Exception as e:
        print(f"Error during query processing: {e}")
        print("This might be due to an incorrect or missing JINA_API_KEY or HUGGINGFACE_API_KEY, or issues with model loading.")

# Example Query 1: About Alpacas
run_query("Tell me about alpacas and their characteristics.")

# Example Query 2: Comparing Alpacas and Llamas
run_query("How are alpacas different from llamas?")

# Example Query 3: About AI
run_query("What is artificial intelligence?")
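
If the reranker is unavailable, one option is to fall back to the first-stage order instead of failing the whole query. The sketch below is library-agnostic: `rerank_fn` is a hypothetical stand-in for any reranking call (for instance, a wrapper around the Jina postprocessor), not a LlamaIndex API.

```python
import logging

logger = logging.getLogger(__name__)

def rerank_with_fallback(nodes, rerank_fn, top_n):
    """Try to rerank; on any failure, keep the retriever's original order."""
    try:
        return rerank_fn(nodes)[:top_n]
    except Exception as exc:  # e.g. network error or rejected API key
        logger.warning("Reranking failed (%s); using first-stage order.", exc)
        return nodes[:top_n]

def broken_reranker(nodes):
    """Simulates a reranker whose API call fails."""
    raise RuntimeError("invalid API key")

# The failing reranker falls back to the original order.
print(rerank_with_fallback(["a", "b", "c", "d"], broken_reranker, top_n=3))
# A working reranker (here: simple reversal) is used as-is.
print(rerank_with_fallback(["a", "b", "c", "d"], lambda ns: ns[::-1], top_n=3))
```

Whether a silent fallback is acceptable depends on the application; logging the failure, as done here, at least keeps the degraded mode visible.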

## 8. Conclusion

This notebook demonstrated building a RAG pipeline that sources its information from a SQL database. Key steps included:
*   Setting up a SQLite database with sample textual data.
*   Fetching this data and converting it into LlamaIndex `Document` objects.
*   Building a `VectorStoreIndex` on these documents to enable semantic retrieval.
*   Configuring a Hugging Face LLM for text generation and a Hugging Face model for embeddings.
*   Integrating the `JinaAIRerank` postprocessor to refine the retrieved results before they are passed to the LLM.
*   Creating a LlamaIndex `RetrieverQueryEngine` that combines these components.

**Important Considerations:**
*   **API Keys:** Ensure your `JINA_API_KEY` and `HUGGINGFACE_API_KEY` (if needed for your chosen model) are correctly set up as environment variables.
*   **Model Selection:** The choice of LLM and embedding model can significantly impact answer quality and resource requirements. The small LLM and the `all-MiniLM-L6-v2` embedding model used here were chosen for their relatively small size and speed, which makes them suitable for demonstration.
*   **Data Scale:** For very large SQL databases, consider strategies like incremental indexing, more sophisticated SQL querying to pre-filter relevant data, or using LlamaIndex's `SQLTableRetrieverQueryEngine` if you want to perform natural language to SQL translation first, then RAG on the results.
*   **Error Handling:** The Jina reranker will likely raise an error if the API key is invalid or missing. Robust applications should include proper error handling.
*   **LlamaIndex Updates:** LlamaIndex is a rapidly evolving library. Some class names, import paths, or methods (like `ServiceContext` usage vs. global `Settings`) might change. Always refer to the latest LlamaIndex documentation.
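
As a sketch of the pre-filtering idea under **Data Scale**, a SQL `WHERE` clause can narrow the rows before any embedding happens. The example uses the standard-library `sqlite3` module and a cut-down copy of the `documents` table. The `fetch_candidates` helper and the keyword-based filter are assumptions for illustration, not part of the pipeline above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO documents (title, content) VALUES (?, ?)",
    [
        ("Alpacas", "Alpacas are South American camelids, smaller than llamas."),
        ("Llamas", "Llamas are also South American camelids, often used as pack animals."),
        ("Photosynthesis", "Photosynthesis converts light energy into chemical energy."),
    ],
)

def fetch_candidates(connection, keyword, limit=100):
    """Return only rows whose content mentions the keyword (parameterized query)."""
    cursor = connection.execute(
        "SELECT id, title, content FROM documents "
        "WHERE content LIKE ? ORDER BY id LIMIT ?",
        (f"%{keyword}%", limit),
    )
    return cursor.fetchall()

rows = fetch_candidates(conn, "camelids")
print([title for _, title, _ in rows])
```

The filtered rows can then be converted into `Document` objects and indexed exactly as in Section 6, so only the relevant slice of a large table is embedded.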