# Building an Agentic RAG System with ArXiv Knowledge Base

This tutorial demonstrates how to create an intelligent agent that can search and reason over academic papers from ArXiv. Basic RAG retrieves context once for every query and hands it straight to the model. Agentic RAG instead lets the agent decide when to search the knowledge base and how to use what it finds, which supports more context-aware answers.

**What you'll learn:**
- Setting up Agno agents with knowledge bases
- Integrating ArXiv for academic paper retrieval  
- Using a PostgreSQL vector database for persistent storage
- Building conversational AI with domain expertise

## Step 1: Import Agno Framework Components

Setting up the core components for our agentic system:
- **Agent**: The main reasoning engine that orchestrates responses
- **OpenAIChat**: Language model for understanding and generating responses  
- **ArxivKnowledgeBase**: Specialized knowledge source for academic papers
- **PgVector**: Vector store backed by PostgreSQL with the pgvector extension, used for storing and searching embeddings

In [None]:
# Import core Agno framework components for agentic RAG
from dotenv import load_dotenv                    # Environment variable management
import os
from agno.models.openai import OpenAIChat        # OpenAI language model integration
from agno.agent import Agent                     # Main agent orchestrator
from agno.knowledge.arxiv import ArxivKnowledgeBase  # ArXiv paper knowledge source
from agno.vectordb.pgvector import PgVector      # PostgreSQL vector database

## Step 2: API Key Configuration

Setting up OpenAI API access for the language model. The agent uses OpenAI's models to:
- Understand natural language queries
- Reason over retrieved information
- Generate coherent responses

Ensure your `.env` file contains a valid `OPENAI_API_KEY`.

In [None]:
# Load and configure OpenAI API key for language model access
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise ValueError("OPENAI_API_KEY not set in .env!")
os.environ["OPENAI_API_KEY"] = api_key
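
The same check can be wrapped in a small reusable helper, which is handy once a notebook needs more than one secret. This is a minimal sketch using only the standard library; `require_env` is a hypothetical name introduced here for illustration and is not part of Agno or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.getenv(name, "").strip()
    if not value:
        raise ValueError(f"{name} is not set; add it to your .env file or shell environment")
    return value
```

After `load_dotenv()` has run, `api_key = require_env("OPENAI_API_KEY")` would behave like the cell above, with the added benefit that a whitespace-only value is rejected too.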

## Step 3: Knowledge Base Configuration

Creating a specialized ArXiv knowledge base that automatically:
- **Searches**: Finds papers using specified query terms
- **Downloads**: Retrieves full paper content from the ArXiv API
- **Processes**: Extracts and chunks text for optimal retrieval
- **Stores**: Saves vectors in PostgreSQL for persistent access

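Constructing the knowledge base only records the configuration. The papers are downloaded and embedded in Step 5, where `load(recreate=False)` reuses any data already stored in the table.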

In [None]:
# Configure ArXiv knowledge base with target research areas
knowledge_base = ArxivKnowledgeBase(
    queries=["Generative AI", "Machine Learning"],  # Topics to search for
    vector_db=PgVector(
        table_name="arxiv_documents",  # Database table for paper storage
        db_url="postgresql+psycopg://ai:ai@localhost:5432/ai",  # PostgreSQL connection
    ),
)
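
To make the "Processes" step more concrete, here is a rough sketch of fixed-size chunking with overlap, one common way to split long paper text before embedding. It is an illustration only: the function name `chunk_text` and its default sizes are assumptions, and the chunking strategy the knowledge base actually applies may differ.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split `text` into character windows of `chunk_size`, each sharing `overlap` characters with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.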

## Step 4: Agent Initialization

Creating an intelligent agent that combines reasoning with knowledge retrieval:
- **knowledge**: Connects to our ArXiv knowledge base
- **search_knowledge=True**: Enables automatic knowledge search for queries

The agent will automatically search the knowledge base when needed to answer questions.

In [None]:
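# Initialize agent with knowledge-enabled capabilities
agent = Agent(
    model=OpenAIChat(id="gpt-4o"),  # Language model; "gpt-4o" is an assumed choice, swap in any OpenAI chat model
    knowledge=knowledge_base,      # Connect to ArXiv knowledge base
    search_knowledge=True,         # Enable automatic knowledge search
)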
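
One way to picture what "searching when needed" means: the knowledge search is offered to the model as a tool, and each model turn either requests that tool or answers directly. The toy dispatcher below illustrates the idea only. The dict shape of a turn, the `TOOLS` registry, and `handle_model_turn` are all hypothetical and should not be read as Agno's internals.

```python
from typing import Callable


def search_knowledge_base(query: str) -> str:
    """Stand-in for a vector search; a real implementation would query the database."""
    return f"(passages matching '{query}')"


# Registry of tools the model is allowed to request (hypothetical structure).
TOOLS: dict[str, Callable[[str], str]] = {
    "search_knowledge_base": search_knowledge_base,
}


def handle_model_turn(turn: dict) -> str:
    """Run the requested tool if the turn names one, otherwise return the direct answer."""
    if turn.get("tool") in TOOLS:
        return TOOLS[turn["tool"]](turn["argument"])
    return turn["content"]
```

A greeting such as "hello" would produce a turn with only `content`, so no search runs, while a question about papers would produce a turn that names the search tool.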

## Step 5: Knowledge Base Population

Loading papers into our vector database. This process:
- Downloads recent papers matching our query terms
- Extracts text content and metadata
- Creates embeddings for semantic search
- Stores everything in PostgreSQL for persistence

With `recreate=False`, `load()` keeps the existing table instead of dropping and rebuilding it, so subsequent runs can reuse the stored embeddings and finish faster.

In [None]:
# Populate the knowledge base with ArXiv papers
# This downloads papers, creates embeddings, and stores them
agent.knowledge.load(recreate=False)  # Use existing data if available
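
For intuition about what "semantic search" over embeddings involves, the sketch below ranks a few documents by cosine similarity to a query vector. The three-dimensional vectors and titles are made up for illustration; real embeddings have hundreds or thousands of dimensions, and in this tutorial the comparison runs inside PostgreSQL rather than in Python.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], docs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the titles of the k documents most similar to the query vector."""
    ranked = sorted(
        docs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [title for title, _ in ranked[:k]]


# Toy embeddings (invented values, not produced by any model).
toy_embeddings = {
    "diffusion models survey": [0.9, 0.1, 0.0],
    "gradient boosting tutorial": [0.1, 0.9, 0.1],
    "text-to-image generation": [0.8, 0.2, 0.1],
}
```

Calling `top_k([1.0, 0.0, 0.0], toy_embeddings)` ranks the two generative-modeling titles ahead of the gradient boosting one, because their vectors point in nearly the same direction as the query.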

## Step 6: Interactive Query System

Testing our agentic RAG system with user questions. The agent will:
1. Understand the natural language query
2. Search the ArXiv knowledge base for relevant papers
3. Synthesize information from multiple sources
4. Generate a comprehensive, contextual response

Try queries like: "latest trends in generative AI" or "most influential papers in machine learning".
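
The retrieve-then-generate flow in the steps above can be sketched without any external services. In this simplified stand-in, word overlap replaces vector search and the assembled prompt is returned instead of being sent to a model. The names `retrieve` and `build_prompt` are hypothetical and only illustrate the shape of the pipeline.

```python
def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Rank passages by how many words they share with the question (a crude proxy for vector search)."""
    q_terms = set(question.lower().split())
    scored = [(len(q_terms & set(p.lower().split())), p) for p in passages]
    scored = [pair for pair in scored if pair[0] > 0]  # drop passages with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps input order on ties
    return [p for _, p in scored[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text a language model would receive: numbered context, then the question."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(context, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}"
    )
```

Numbering the passages gives the model a simple way to refer back to its sources when it synthesizes an answer from several of them.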

In [None]:
# Interactive query system - ask questions about the knowledge base
user_question = input("\n Ask a question from the knowledge base: ")

# Generate intelligent response using retrieved knowledge
agent.print_response(user_question, user_id="user_1", stream=True)

# Example queries to try:
# "What are the latest trends in generative AI?"
# "What are the most influential papers in machine learning?"
# "How do transformers work in natural language processing?"


 Ask a question from the knowledge base:  latest paper in generative ai

