# Getting Started with Agent TableRAG

This guide will help you get up and running with Agent TableRAG quickly.

## Prerequisites

- Python 3.8 or higher
- OpenAI API key
- Internet connection for package installation

## Step 1: Install Dependencies

Navigate to the `myspace` directory and install the required packages:

```bash
cd myspace
pip install -r requirements.txt
```

## Step 2: Quick Setup and Test

Run the quick start script to set up your configuration and test the system:

```bash
python quick_start.py
```

This will:
- Check if all dependencies are installed
- Set up your OpenAI API key
- Test the system with sample data
- Verify everything is working
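
As a rough idea of what the dependency check might involve, the sketch below looks up modules without importing them. It is not the contents of `quick_start.py`. The module names are assumptions based on the libraries this guide mentions (pandas, FAISS, OpenAI); the authoritative list is `requirements.txt`.

```python
import importlib.util
from typing import Iterable, List

# Assumed import names; the real list lives in requirements.txt,
# which this guide does not reproduce.
ASSUMED_MODULES = ["pandas", "numpy", "faiss", "openai"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(ASSUMED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All assumed modules are importable.")
```

If anything is reported missing, re-running `pip install -r requirements.txt` from the `myspace` directory is the first thing to try.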

## Step 3: Validate Installation

Run the validation test to ensure all components work:

```bash
python test_validation.py
```

## Step 4: Try the Example

Run the detailed example to see the system in action:

```bash
python example_usage.py
```

## Basic Usage

### 1. Initialize the Agent

```python
from agent_tablerag import AgentTableRAG

agent = AgentTableRAG(
    agent_explanation="You are a personal assistant that helps with meeting schedules"
)
```

### 2. Add Table Knowledge

```python
# From a pandas DataFrame
import pandas as pd
df = pd.DataFrame({
    "Date": ["2024-08-01", "2024-08-02"],
    "Meeting": ["Project Kickoff", "Team Standup"],
    "Time": ["09:00 AM", "10:00 AM"]
})

agent.add_table_knowledge(
    table_data=df,
    knowledge_explanation="Meeting schedule for August 2024",
    table_name="August_Meetings"
)

# From a CSV file
agent.add_table_knowledge(
    table_data="path/to/your/data.csv",
    knowledge_explanation="Description of what this data contains"
)
```

### 3. Query the Agent

```python
response = agent.query("What meetings do I have on August 1st?")

print(response["answer"])
print(f"Confidence: {response['confidence']}")

# Get detailed source information
if response.get("sources"):
    for source in response["sources"]:
        print(f"Source: {source['table_name']}")
        print(f"Relevance: {source['relevance_score']:.3f}")
```
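
For a chat interface it can help to turn the response dictionary into a single string. The helper below is a minimal sketch that assumes only the keys used in the snippet above (`answer`, `confidence`, `sources`, `table_name`, `relevance_score`). The function name `format_response` and the sample data are hypothetical.

```python
from typing import Any, Dict


def format_response(response: Dict[str, Any]) -> str:
    """Render a query response as plain text for a chat UI.

    Assumes the dict shape shown in this guide: "answer", "confidence",
    and an optional "sources" list with "table_name" and "relevance_score".
    """
    lines = [str(response.get("answer", "(no answer)"))]
    if "confidence" in response:
        lines.append(f"Confidence: {response['confidence']}")
    for source in response.get("sources") or []:
        name = source.get("table_name", "unknown table")
        score = source.get("relevance_score")
        if isinstance(score, (int, float)):
            lines.append(f"Source: {name} (relevance {score:.3f})")
        else:
            lines.append(f"Source: {name}")
    return "\n".join(lines)


# Hand-written sample in the shape described above, not real system output.
sample = {
    "answer": "Project Kickoff at 09:00 AM",
    "confidence": 0.9,
    "sources": [{"table_name": "August_Meetings", "relevance_score": 0.87654}],
}
print(format_response(sample))
```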

## Configuration

The system is configured through `config.json`:

```json
{
    "model": {
        "EMBEDDING_MODEL": "text-embedding-3-large",
        "GPT_MODEL": "gpt-4o-mini"
    },
    "api_key": "your-openai-api-key",
    "api_base": "https://api.openai.com/v1",
    "retrieval": {
        "top_k": 5,
        "similarity_threshold": 0.7
    },
    "table_processing": {
        "enable_filtering": true,
        "enable_clarification": true,
        "max_table_rows": 100,
        "chunk_size": 512
    }
}
```

### Key Settings

- **EMBEDDING_MODEL**: OpenAI embedding model for semantic search
- **GPT_MODEL**: OpenAI model for text generation
- **top_k**: Number of relevant chunks to retrieve
- **enable_filtering**: Whether to filter large tables for relevance
- **enable_clarification**: Whether to clarify ambiguous terms
- **max_table_rows**: Maximum rows to process (larger tables are filtered)
- **chunk_size**: Size of table chunks for better retrieval
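
A typo in `config.json` is easier to diagnose before the agent starts. The sketch below checks for the keys shown in the sample config and for the placeholder API key. The function names and the set of required keys are assumptions for illustration; the project's own loader may validate differently or not at all.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Key paths taken from the sample config.json in this guide.
REQUIRED_KEYS = [
    ("model", "EMBEDDING_MODEL"),
    ("model", "GPT_MODEL"),
    ("api_key",),
    ("retrieval", "top_k"),
    ("table_processing", "chunk_size"),
]


def find_config_problems(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for path in REQUIRED_KEYS:
        node: Any = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                problems.append("missing key: " + ".".join(path))
                break
            node = node[key]
    if config.get("api_key") == "your-openai-api-key":
        problems.append("api_key is still the placeholder value")
    retrieval = config.get("retrieval")
    top_k = retrieval.get("top_k") if isinstance(retrieval, dict) else None
    if isinstance(top_k, int) and top_k < 1:
        problems.append("retrieval.top_k should be at least 1")
    return problems


def load_config(path: str = "config.json") -> Dict[str, Any]:
    """Read config.json and raise ValueError if it looks incomplete."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    problems = find_config_problems(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config
```

Calling `load_config()` from the `myspace` directory before constructing the agent would surface these problems as a single error message.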

## Advanced Features

### Multiple Tables

```python
# Add multiple related tables
agent.add_table_knowledge(meetings_df, "Meeting schedules")
agent.add_table_knowledge(employees_df, "Employee directory")
agent.add_table_knowledge(projects_df, "Project information")

# Query across all tables
response = agent.query("Who is the project manager for the Q3 project?")
```

### Table Management

```python
# Get summary of all tables
summary = agent.get_table_summary()
print(f"Total tables: {summary['num_tables']}")

# Remove specific table
agent.remove_table_knowledge("August_Meetings")

# Clear all knowledge
agent.clear_all_knowledge()
```

### Custom Processing

```python
# Disable automatic filtering for small, precise tables
agent.table_processor.enable_filtering = False

# Adjust chunk size for different data types
agent.table_processor.chunk_size = 256  # Smaller chunks for detailed data
```
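
To see why `chunk_size` matters, the sketch below shows one plausible way to split table rows into text chunks. It is not the project's `table_processor` code. It assumes `chunk_size` is measured in characters (the guide does not state the unit), and `chunk_rows` is a hypothetical name.

```python
from typing import Dict, List


def chunk_rows(rows: List[Dict[str, str]], chunk_size: int) -> List[str]:
    """Group rows into text chunks of roughly chunk_size characters.

    Each chunk repeats the header line so it stays readable on its own.
    A chunk always holds at least one row, even if that row pushes it
    past chunk_size.
    """
    if not rows:
        return []
    columns = list(rows[0].keys())
    header = " | ".join(columns)
    chunks: List[str] = []
    current: List[str] = []
    current_len = len(header)
    for row in rows:
        line = " | ".join(str(row.get(col, "")) for col in columns)
        # +1 accounts for the newline that joins the line to the chunk.
        if current and current_len + 1 + len(line) > chunk_size:
            chunks.append("\n".join([header] + current))
            current, current_len = [], len(header)
        current.append(line)
        current_len += 1 + len(line)
    if current:
        chunks.append("\n".join([header] + current))
    return chunks


rows = [
    {"Date": "2024-08-01", "Meeting": "Project Kickoff", "Time": "09:00 AM"},
    {"Date": "2024-08-02", "Meeting": "Team Standup", "Time": "10:00 AM"},
]
for chunk in chunk_rows(rows, chunk_size=60):
    print(chunk)
    print("---")
```

With a small `chunk_size` each row lands in its own chunk, which keeps retrieval precise but produces more chunks to embed. A larger value packs both rows into one chunk.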

## Troubleshooting

### Common Issues

1. **Import Errors**
   - Make sure you're in the `myspace` directory
   - Install dependencies: `pip install -r requirements.txt`

2. **API Key Issues**
   - Ensure your OpenAI API key is valid
   - Check you have sufficient credits
   - Verify the key is correctly set in `config.json`

3. **Memory Issues with Large Tables**
   - Reduce `chunk_size` in config
   - Enable filtering: `"enable_filtering": true`
   - Process tables in smaller batches

4. **Poor Retrieval Results**
   - Increase `top_k` for more context
   - Improve your `knowledge_explanation` descriptions
   - Enable clarification: `"enable_clarification": true`

### Performance Tips

1. **For Better Accuracy**
   - Write clear, specific knowledge explanations
   - Use descriptive table names
   - Enable table clarification for domain-specific data

2. **For Better Speed**
   - Experiment with `chunk_size`: smaller chunks shorten each request but increase the number of chunks to embed
   - Disable clarification for simple tables
   - Use smaller embedding models if acceptable

3. **For Large Datasets**
   - Enable filtering to reduce noise
   - Process tables incrementally
   - Consider pre-filtering your data
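
Incremental processing can be as simple as reading a large CSV in fixed-size batches. The generator below is a sketch using only the standard library. `iter_row_batches` and the batch size are hypothetical choices, and the call to the agent is left as a comment because it needs a configured API key.

```python
import csv
import io
from typing import Dict, Iterator, List, TextIO


def iter_row_batches(handle: TextIO, batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of up to batch_size rows from an open CSV file."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Dict[str, str]] = []
    for row in csv.DictReader(handle):
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


# In-memory CSV so the sketch runs without a file on disk.
demo = io.StringIO(
    "Date,Meeting\n2024-08-01,Kickoff\n2024-08-02,Standup\n2024-08-05,Review\n"
)
for number, batch in enumerate(iter_row_batches(demo, batch_size=2), start=1):
    # With the real agent, each batch could be turned into a DataFrame and
    # passed to agent.add_table_knowledge(...) with its own explanation.
    print(f"batch {number}: {len(batch)} rows")
```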

## Support

If you encounter issues:

1. Run `python test_validation.py` to check your setup
2. Check the console output for specific error messages
3. Ensure your Python version is 3.8 or higher
4. Verify all dependencies are correctly installed

## Next Steps

- Explore `example_usage.py` for comprehensive examples
- Experiment with different table formats and configurations
- Integrate Agent TableRAG into your own chatbot or application
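
When wiring the agent into a chatbot, one common pattern is to fall back to a neutral reply when confidence is low. The wrapper below is a sketch under two assumptions that the guide does not confirm: that `confidence` is a number between 0 and 1, and that a threshold of 0.5 is sensible. `StubAgent` is a hypothetical stand-in so the example runs without an API key.

```python
from typing import Any, Dict


def answer_or_fallback(agent: Any, question: str, min_confidence: float = 0.5) -> str:
    """Return the agent's answer, or a fallback message when confidence is low.

    Assumes response["confidence"] is a number between 0 and 1; adjust the
    check if the real value has a different type or scale.
    """
    try:
        response = agent.query(question)
    except Exception as exc:  # report the failure instead of crashing the chat loop
        return f"Sorry, I could not look that up ({exc})."
    confidence = response.get("confidence")
    if isinstance(confidence, (int, float)) and confidence < min_confidence:
        return "I am not sure; I could not find a reliable match in the tables."
    return str(response.get("answer", ""))


class StubAgent:
    """Hypothetical stand-in for AgentTableRAG that returns a fixed response."""

    def __init__(self, confidence: float) -> None:
        self.confidence = confidence

    def query(self, question: str) -> Dict[str, Any]:
        return {"answer": "Project Kickoff at 09:00 AM", "confidence": self.confidence}


print(answer_or_fallback(StubAgent(0.9), "What meetings do I have on August 1st?"))
print(answer_or_fallback(StubAgent(0.1), "What meetings do I have on August 1st?"))
```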


---

# Agent TableRAG - Project Summary

## What I've Built

I've successfully reimplemented the TableRAG system as a simplified but powerful **Agent TableRAG** specifically designed for your AI chatbot use case. Here's what the system provides:

### 🎯 Core Problem Solved
Your original issue: AI agents fail to answer questions about tabular data accurately (e.g., "meetings on the first day of August" returning no results when meetings exist).

### 🚀 Key Features

1. **Smart Table Processing**
   - Intelligent filtering to keep only relevant table rows
   - LLM-based clarification of domain-specific terms
   - Automatic chunking for optimal retrieval

2. **Advanced Retrieval System**
   - FAISS-based vector search using OpenAI embeddings
   - Semantic similarity matching for better relevance
   - Multiple table support with context preservation

3. **Agent-Focused API**
   - Simple interface: `add_table_knowledge()` and `query()`
   - Built-in confidence scoring
   - Source attribution for transparency

4. **Production Ready**
   - Error handling and recovery
   - Configurable processing parameters
   - Memory-efficient chunking

## 📁 Project Structure

```
myspace/
├── agent_tablerag.py          # Main API - your entry point
├── config.json                # Configuration file
├── requirements.txt           # Dependencies
│
├── core/                      # Core components
│   ├── llm_client.py          # OpenAI API integration
│   ├── table_processor.py     # Table filtering & clarification
│   ├── faiss_retriever.py     # Vector search system
│   └── table_parser.py        # Table format handling
│
├── utils/                     # Utilities
│   ├── similarity.py          # Embedding similarity functions
│   └── text_processing.py     # Text cleaning & formatting
│
│   # Setup & examples
├── quick_start.py             # One-command setup & test
├── example_usage.py           # Detailed usage examples
├── test_validation.py         # System validation
├── setup.py                   # Installation script
├── GETTING_STARTED.md         # Comprehensive guide
└── README.md                  # Project overview
```

## 🛠 How It Works

1. **Table Ingestion**: Load tables from CSV, DataFrame, or other formats
2. **Smart Processing**: Filter large tables, clarify ambiguous terms
3. **Vectorization**: Convert table chunks to embeddings using OpenAI
4. **Indexing**: Store in FAISS vector database for fast retrieval
5. **Query Processing**: Find relevant chunks, generate contextual answers
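
The five steps above can be illustrated with a toy, dependency-free version of the retrieval part. This sketch deliberately swaps in word-count vectors for OpenAI embeddings and a linear scan for the FAISS index, so it shows the flow only, not the real components. The names `embed`, `cosine`, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for an OpenAI embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query: str, chunks: List[str], top_k: int = 5, threshold: float = 0.0
) -> List[Tuple[float, str]]:
    """Score every chunk against the query; keep the best top_k above threshold."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    kept = [pair for pair in scored if pair[0] > threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)[:top_k]


chunks = [
    "Date: 2024-08-01 | Meeting: Project Kickoff | Time: 09:00 AM",
    "Date: 2024-08-02 | Meeting: Team Standup | Time: 10:00 AM",
]
for score, chunk in retrieve("When is the project kickoff meeting?", chunks, top_k=1):
    print(f"{score:.3f}  {chunk}")
```

In the real system the retrieved chunks would then be passed to the chat model as context for answer generation. The `top_k` and `threshold` parameters here mirror the `top_k` and `similarity_threshold` settings in `config.json` only by analogy.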

## 🎯 Addressing Your Original Problem

**Before**: The agent says "no meetings on August 1st" when meetings exist.

**After**: The system will:
1. Find relevant table chunks containing August 1st data
2. Use semantic search to match date variations ("first day of August" → "2024-08-01")
3. Generate accurate answers with source attribution
4. Provide confidence scores
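
How well embeddings alone bridge "first day of August" and "2024-08-01" depends on the embedding model. One optional pre-processing idea, which this project is not stated to use, is to expand ISO dates into words before the table text is embedded, so a chunk shares terms such as "August" and "1st" with the question. The helper names below are hypothetical.

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def ordinal(day: int) -> str:
    """Return the day with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def describe_date(iso_date: str) -> str:
    """Expand 'YYYY-MM-DD' into a wordier form while keeping the original value."""
    d = date.fromisoformat(iso_date)
    words = f"{WEEKDAYS[d.weekday()]}, {MONTHS[d.month - 1]} {ordinal(d.day)}, {d.year}"
    return f"{iso_date} ({words})"


print(describe_date("2024-08-01"))  # 2024-08-01 (Thursday, August 1st, 2024)
```

Applying such a function to the `Date` column before calling `add_table_knowledge()` keeps the original value intact while adding natural-language variants.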

## 📋 Quick Start

1. **Setup** (one command):
   ```bash
   cd myspace
   python quick_start.py
   ```

2. **Use in your code**:
   ```python
   from agent_tablerag import AgentTableRAG
   
   agent = AgentTableRAG("You are a meeting assistant")
   agent.add_table_knowledge(meeting_data, "August 2024 meeting schedule")
   response = agent.query("What meetings do I have on August 1st?")
   ```

## 🔧 Key Improvements Over Original TableRAG

1. **Simplified Architecture**: Removed complex ColBERT dependencies
2. **Agent-Focused**: API designed specifically for chatbot integration
3. **Better Error Handling**: Graceful failures with helpful messages
4. **Faster Setup**: One-command installation and testing
5. **Production Ready**: Memory management and configuration options

## 📊 Configuration Options

The system is highly configurable via `config.json`:

- **Models**: Choose your OpenAI models (embedding + chat)
- **Retrieval**: Adjust number of results and similarity thresholds
- **Processing**: Enable/disable filtering and clarification
- **Performance**: Tune chunk sizes and batch processing

## 🎪 What Makes This Different

1. **Semantic Understanding**: Goes beyond keyword matching
2. **Context Awareness**: Maintains table context and relationships
3. **Intelligent Filtering**: Reduces noise while preserving relevance
4. **Multi-Table Support**: Query across multiple related tables
5. **Confidence Scoring**: Know how reliable each answer is

## 🚀 Ready to Use

The system is complete and ready for integration into your chatbot. The `example_usage.py` demonstrates exactly your use case - a meeting schedule assistant that accurately answers date-based queries.

**Next Steps**:
1. Run `python quick_start.py` to set up and test
2. Try `python example_usage.py` to see it in action
3. Integrate `AgentTableRAG` into your chatbot
4. Add your actual table data and start querying!

This implementation should solve your retrieval accuracy problems while being much easier to set up and use than the original TableRAG repository.


---

# AI Agent TableRAG

A simplified but powerful table-based retrieval augmented generation system for AI chatbots.

## Overview

This system enhances AI agent chatbots with table retrieval capabilities using FAISS vector search over OpenAI embeddings and intelligent table processing. It is designed for scenarios where agents need to answer questions based on tabular data.

## Features

- **Advanced Table Processing**: Intelligent filtering and clarification of tabular data
- **FAISS Integration**: Fast semantic search using OpenAI embeddings with FAISS vector database
- **Agent-Focused API**: Simple interface designed for chatbot integration
- **Flexible Table Format**: Support for various table formats (CSV, JSON, Markdown)
- **Smart Chunking**: Automatic table segmentation for optimal retrieval
- **LLM-Enhanced Filtering**: Uses GPT models to intelligently filter relevant table content

## Installation

```bash
# Clone or download the project
# Navigate to the myspace directory

# Install dependencies
pip install -r requirements.txt

# Quick start setup
python quick_start.py
```

## Configuration

Update `config.json` with your OpenAI API key:

```json
{
    "api_key": "your-openai-api-key"
}
```

## Quick Start

```python
from agent_tablerag import AgentTableRAG

# Initialize the system
agent_rag = AgentTableRAG(
    agent_explanation="You are a personal assistant that answers questions about meeting schedules",
    config_path="config.json"
)

# Add table knowledge
agent_rag.add_table_knowledge(
    table_data="path/to/meeting_agenda.csv",
    knowledge_explanation="This is the meeting agenda for the next 6 months"
)

# Query the agent
response = agent_rag.query("What meetings do I have on the first day of August?")
print(response)
```

## Project Structure

```
myspace/
├── config.json                 # Configuration file
├── agent_tablerag.py          # Main API interface
├── core/
│   ├── __init__.py
│   ├── table_processor.py     # Table processing and filtering
│   ├── faiss_retriever.py     # FAISS-based retrieval
│   ├── llm_client.py          # OpenAI API interface
│   └── table_parser.py        # Table parsing utilities
├── utils/
│   ├── __init__.py
│   ├── similarity.py          # Similarity calculations
│   └── text_processing.py     # Text processing utilities
├── data/
│   ├── processed/             # Processed table data
│   └── indices/               # FAISS indices
└── requirements.txt           # Dependencies
```
