Skip to content

Mozzicato/context-thread-agent

Repository files navigation

🧡 Context Thread Agent

AI-Powered Document Analysis & Q&A System for Jupyter Notebooks and Excel Files

Python 3.8+ Gradio Groq


🎯 What is Context Thread Agent?

Context Thread Agent is an intelligent document analysis platform that helps you understand and extract insights from complex Jupyter notebooks and Excel spreadsheets. Using advanced AI powered by Groq's lightning-fast LLM, it provides:

βœ… 100% Grounded Answers - No hallucinations, only facts from your document
βœ… Citation-Based Responses - Every answer references specific cells/sections
βœ… Context-Aware Analysis - Understands relationships between code sections
βœ… Conversation Memory - Maintains context across multiple questions
βœ… Key Insights Generation - AI-powered summary of main points
βœ… Professional UI - Split-screen viewer with intuitive Q&A interface


πŸš€ Quick Start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Set your Groq API key (free at console.groq.com)
export GROQ_API_KEY="your_key_here"

# 3. Generate demo files
python generate_demo_files.py

# 4. Launch the application
python main.py ui --port 7860

Open your browser to http://localhost:7860

πŸ“– Detailed Setup: See QUICKSTART.md


πŸ’Ό Major Use Cases

Use Case Description
πŸ“Š Data Analysis Review Understand complex analytical workflows instantly
πŸ” Code Audit Verify assumptions and logic in data science notebooks
πŸ“ˆ Excel Report Analysis Extract insights from large spreadsheets
πŸ€– Automated Documentation Generate summaries and key findings
πŸ’‘ Knowledge Extraction Ask questions about methodology and results
πŸ”— Dependency Tracking Understand how different code sections connect
βœ… Quality Assurance Validate calculations and transformations

✨ Key Features

1. Professional Homepage

  • Clear platform introduction
  • Comprehensive use case showcase
  • Prominent upload section
  • Feature highlights

2. Split-Screen Workspace

Left Panel - Document Viewer:

  • πŸ“„ Browse full document with syntax highlighting
  • πŸ”‘ Generate AI-powered key insights (10-30 seconds)

Right Panel - Q&A Interface:

  • πŸ’¬ Chatbot-style conversation
  • πŸ“š Automatic citations
  • βœ“ Confidence scores
  • 🧠 Context-aware responses

3. Enhanced AI Capabilities

  • Groq Integration: Lightning-fast inference (< 3 seconds)
  • Conversation History: Maintains context across questions
  • Key Points Generator: Comprehensive document summarization
  • Citation Extraction: References specific cells automatically

4. Smart Document Processing

  • Jupyter Notebooks: Full code, markdown, and output analysis
  • Excel Files: Multi-sheet support with statistics
  • Intent Recognition: Understands purpose of code sections
  • Dependency Tracking: Maps relationships between cells

πŸ“ Demo Files Included

Complex Real-World Examples

1. complex_sales_analysis.xlsx (6 sheets, 500 rows)

  • Sales transactions across 5 regions
  • Product performance analytics
  • Time series trends
  • Anomaly detection

2. financial_model.xlsx (4 sheets)

  • Income statement (5-year)
  • Balance sheet
  • Cash flow statement
  • Key financial ratios

3. customer_churn_analysis.ipynb (200+ lines)

  • 10,000 customer dataset
  • Complete ML workflow
  • Random Forest model (84.7% accuracy)
  • Business recommendations

4. stock_forecasting.ipynb

  • Time series analysis
  • ARIMA modeling
  • Forecasting with metrics

🎬 How to Use

1. Upload Your Document

  • Click "Upload & Analyze"
  • Select .ipynb or .xlsx file
  • Wait 2-5 seconds for processing

2. Generate Key Insights (Recommended)

  • Switch to "Key Points" tab
  • Click "Generate Key Insights"
  • Wait 10-30 seconds for AI analysis

3. Ask Questions

  • Type in the chat interface
  • Get instant AI-powered answers
  • Follow-up questions maintain context

4. Example Questions

- "What is this document about?"
- "What are the key findings?"
- "How was [metric] calculated?"
- "Why was [data] removed?"
- "What are the business recommendations?"
- "Are there any data quality issues?"

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                  Gradio Web UI                      β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚   Document   β”‚        β”‚    Q&A Interface     β”‚  β”‚
β”‚  β”‚    Viewer    β”‚        β”‚   (with context)     β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                      ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Context Thread Builder                  β”‚
β”‚  β€’ Parses notebooks/Excel                           β”‚
β”‚  β€’ Extracts cells and dependencies                  β”‚
β”‚  β€’ Infers intents (data loading, modeling, etc.)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                      ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           FAISS Vector Indexing                      β”‚
β”‚  β€’ Embeds cell content                              β”‚
β”‚  β€’ Enables semantic search                          β”‚
β”‚  β€’ Fast retrieval (< 100ms)                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                      ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚            Groq LLM Reasoning                        β”‚
β”‚  β€’ llama-3.3-70b-versatile                          β”‚
β”‚  β€’ Conversation history integration                 β”‚
β”‚  β€’ Citation extraction                              β”‚
β”‚  β€’ Hallucination detection                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ“Š Performance Metrics

Metric Value
Upload Processing < 2 seconds
Document Indexing < 5 seconds
Query Response 2-4 seconds
Key Points Generation 10-30 seconds
Groq Inference < 3 seconds
Context Window 8K tokens

πŸ› οΈ Technology Stack

  • Frontend: Gradio (web UI framework)
  • AI/LLM: Groq API (llama-3.3-70b-versatile)
  • Vector Search: FAISS (Facebook AI Similarity Search)
  • Data Processing: Pandas, NumPy
  • Notebook Parsing: nbformat
  • Excel Handling: openpyxl, xlsxwriter

πŸ“š Documentation


🎯 Project Structure

context-thread-agent/
β”œβ”€β”€ ui/
β”‚   └── app.py              # Gradio web interface (enhanced)
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ groq_integration.py # Groq LLM integration (optimized)
β”‚   β”œβ”€β”€ reasoning.py        # Answer generation with context
β”‚   β”œβ”€β”€ retrieval.py        # Vector search engine
β”‚   β”œβ”€β”€ indexing.py         # FAISS indexing
β”‚   β”œβ”€β”€ parser.py           # Notebook/Excel parsing
β”‚   β”œβ”€β”€ dependencies.py     # Context thread building
β”‚   └── intent.py           # Intent classification
β”œβ”€β”€ demo_files/             # Complex demo notebooks & Excel
β”œβ”€β”€ generate_demo_files.py  # Demo file generator
β”œβ”€β”€ main.py                 # Entry point
└── requirements.txt        # Dependencies

🚦 What's New in This Version

Major UI/UX Overhaul ✨

  • βœ… Professional homepage with clear value proposition
  • βœ… Split-screen workspace (viewer + Q&A)
  • βœ… Tabbed document viewer
  • βœ… Chatbot-style conversation interface
  • βœ… Loading indicators and status updates

Enhanced AI Capabilities πŸ€–

  • βœ… Conversation context maintained across questions
  • βœ… Improved Groq prompting for better accuracy
  • βœ… Key insights generation feature
  • βœ… Higher token limits (2000 tokens)
  • βœ… Better citation extraction

Professional Demo Files πŸ“

  • βœ… Complex sales analysis (500+ rows, 6 sheets)
  • βœ… Financial modeling workbook (4 statements)
  • βœ… ML notebook (200+ lines, real analysis)
  • βœ… Time series forecasting notebook

See MAJOR_UPDATES.md for complete details.


🀝 Contributing

Contributions welcome! Areas for improvement:

  • Additional file format support (CSV, JSON, etc.)
  • More visualization options
  • Export functionality for insights
  • Multi-language support
  • Advanced filtering and search

πŸ“„ License

MIT License - see LICENSE file


πŸ™ Acknowledgments

  • Groq for lightning-fast LLM inference
  • Gradio for the intuitive web framework
  • FAISS for efficient vector search
  • Open-source community for excellent tools

πŸ“ž Support

  • Issues: Open a GitHub issue
  • Questions: Check QUICKSTART.md first
  • Demos: Try the included demo files

πŸŽ‰ Get Started Now!

git clone https://github.com/Mozzicato/context-thread-agent.git
cd context-thread-agent
pip install -r requirements.txt
export GROQ_API_KEY="your_key_here"
python generate_demo_files.py
python main.py ui

Upload a document and start asking questions! πŸš€


Made with ❀️ by the Context Thread Agent team

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors