AI-Powered Document Analysis & Q&A System for Jupyter Notebooks and Excel Files
Context Thread Agent is a document analysis platform that helps you understand and extract insights from complex Jupyter notebooks and Excel spreadsheets. Using llama-3.3-70b-versatile served through Groq's low-latency inference API, it provides:
- 100% Grounded Answers - No hallucinations, only facts from your document
- Citation-Based Responses - Every answer references specific cells/sections
- Context-Aware Analysis - Understands relationships between code sections
- Conversation Memory - Maintains context across multiple questions
- Key Insights Generation - AI-powered summary of main points
- Professional UI - Split-screen viewer with intuitive Q&A interface
# 1. Install dependencies
pip install -r requirements.txt
# 2. Set your Groq API key (free at console.groq.com)
export GROQ_API_KEY="your_key_here"
# 3. Generate demo files
python generate_demo_files.py
# 4. Launch the application
python main.py ui --port 7860

Open your browser to http://localhost:7860

Detailed Setup: See QUICKSTART.md
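Before launching, it can help to confirm that the key is visible to Python. The following is a minimal sketch, assuming the application reads `GROQ_API_KEY` from the environment as the export step suggests; the helper name `check_groq_key` is hypothetical and not part of this repository.

```python
import os


def check_groq_key(env=None):
    """Return True if GROQ_API_KEY is set to a non-placeholder value."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    # Treat the README placeholder as "not configured".
    return bool(key) and key != "your_key_here"


if __name__ == "__main__":
    if check_groq_key():
        print("GROQ_API_KEY is set.")
    else:
        print("GROQ_API_KEY is missing; export it before running main.py.")
```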
| Use Case | Description |
|---|---|
| Data Analysis Review | Understand complex analytical workflows instantly |
| Code Audit | Verify assumptions and logic in data science notebooks |
| Excel Report Analysis | Extract insights from large spreadsheets |
| Automated Documentation | Generate summaries and key findings |
| Knowledge Extraction | Ask questions about methodology and results |
| Dependency Tracking | Understand how different code sections connect |
| Quality Assurance | Validate calculations and transformations |
- Clear platform introduction
- Comprehensive use case showcase
- Prominent upload section
- Feature highlights
Left Panel - Document Viewer:
- Browse full document with syntax highlighting
- Generate AI-powered key insights (10-30 seconds)
Right Panel - Q&A Interface:
- Chatbot-style conversation
- Automatic citations
- Confidence scores
- Context-aware responses
- Groq Integration: Lightning-fast inference (< 3 seconds)
- Conversation History: Maintains context across questions
- Key Points Generator: Comprehensive document summarization
- Citation Extraction: References specific cells automatically
- Jupyter Notebooks: Full code, markdown, and output analysis
- Excel Files: Multi-sheet support with statistics
- Intent Recognition: Understands purpose of code sections
- Dependency Tracking: Maps relationships between cells
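As an illustration of what dependency tracking between cells can mean, here is a minimal sketch using only the standard library. It is not the code in `src/dependencies.py`; it assumes a notebook in the nbformat 4 JSON layout and links each cell to the most recent earlier cell that bound a name it reads. All function names are hypothetical.

```python
import ast
import json


def code_cells(notebook_json):
    """Return the source of each code cell from nbformat-4 style JSON text."""
    nb = json.loads(notebook_json)
    cells = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # nbformat stores source as a string or as a list of lines.
        cells.append("".join(src) if isinstance(src, list) else src)
    return cells


def names_defined_and_used(source):
    """Split the names in a cell into those it binds and those it reads."""
    defined, used = set(), set()
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Cells with IPython magics are not valid Python; skip them here.
        return defined, used
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
    return defined, used


def cell_dependencies(cells):
    """Map each cell index to the earlier cell indices it depends on."""
    last_definer = {}  # name -> index of the latest cell that bound it
    deps = {}
    for i, source in enumerate(cells):
        defined, used = names_defined_and_used(source)
        deps[i] = sorted({last_definer[n] for n in used if n in last_definer})
        for name in defined:
            last_definer[name] = i
    return deps


nb = json.dumps({"cells": [
    {"cell_type": "code", "source": ["import math\n", "r = 2"]},
    {"cell_type": "markdown", "source": "Area"},
    {"cell_type": "code", "source": "area = math.pi * r ** 2"},
]})
print(cell_dependencies(code_cells(nb)))  # {0: [], 1: [0]}
```

The sketch works on names only, so it can report a false dependency when a cell reuses a name that an earlier cell also bound. A production version would need scope analysis.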
1. complex_sales_analysis.xlsx (6 sheets, 500 rows)
- Sales transactions across 5 regions
- Product performance analytics
- Time series trends
- Anomaly detection
2. financial_model.xlsx (4 sheets)
- Income statement (5-year)
- Balance sheet
- Cash flow statement
- Key financial ratios
3. customer_churn_analysis.ipynb (200+ lines)
- 10,000 customer dataset
- Complete ML workflow
- Random Forest model (84.7% accuracy)
- Business recommendations
4. stock_forecasting.ipynb
- Time series analysis
- ARIMA modeling
- Forecasting with metrics
- Click "Upload & Analyze"
- Select a `.ipynb` or `.xlsx` file
- Wait 2-5 seconds for processing
- Switch to "Key Points" tab
- Click "Generate Key Insights"
- Wait 10-30 seconds for AI analysis
- Type in the chat interface
- Get instant AI-powered answers
- Follow-up questions maintain context
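One common way to keep follow-up questions in context is to resend recent turns with each request while staying under a token budget. The sketch below shows that idea only; it is an assumption about the approach, not the repository's implementation. The four-characters-per-token estimate is a rough heuristic, and `trim_history` is a hypothetical helper.

```python
def estimate_tokens(text):
    """Rough token estimate: about four characters per token (heuristic)."""
    return max(1, len(text) // 4)


def trim_history(history, budget_tokens):
    """Keep the most recent messages whose estimated size fits the budget.

    `history` is a list of {"role": ..., "content": ...} dicts, oldest first.
    """
    kept = []
    used = 0
    for message in reversed(history):
        cost = estimate_tokens(message["content"])
        if used + cost > budget_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Walking the history from newest to oldest means the latest question is always kept when it fits, and older turns are dropped first.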
- "What is this document about?"
- "What are the key findings?"
- "How was [metric] calculated?"
- "Why was [data] removed?"
- "What are the business recommendations?"
- "Are there any data quality issues?"
┌─────────────────────────────────────────────────────┐
│                    Gradio Web UI                    │
│  ┌──────────────┐      ┌──────────────────────┐     │
│  │   Document   │      │    Q&A Interface     │     │
│  │    Viewer    │      │    (with context)    │     │
│  └──────────────┘      └──────────────────────┘     │
└─────────────────────────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────┐
│               Context Thread Builder                │
│  • Parses notebooks/Excel                           │
│  • Extracts cells and dependencies                  │
│  • Infers intents (data loading, modeling, etc.)    │
└─────────────────────────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────┐
│                FAISS Vector Indexing                │
│  • Embeds cell content                              │
│  • Enables semantic search                          │
│  • Fast retrieval (< 100ms)                         │
└─────────────────────────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────┐
│                 Groq LLM Reasoning                  │
│  • llama-3.3-70b-versatile                          │
│  • Conversation history integration                 │
│  • Citation extraction                              │
│  • Hallucination detection                          │
└─────────────────────────────────────────────────────┘
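The citation and hallucination checks in the last stage could work in several ways. Here is a minimal sketch of one of them, assuming answers cite cells with markers such as `[Cell 3]`. That marker format is chosen here for illustration; the project's actual format is not documented in this README. The sketch extracts the cited cell numbers and flags any that were not among the retrieved cells.

```python
import re

# Hypothetical citation marker, e.g. "[Cell 3]".
CITATION_PATTERN = re.compile(r"\[Cell\s+(\d+)\]", re.IGNORECASE)


def extract_citations(answer):
    """Return the cited cell numbers in order of first appearance."""
    seen = []
    for match in CITATION_PATTERN.finditer(answer):
        cell_id = int(match.group(1))
        if cell_id not in seen:
            seen.append(cell_id)
    return seen


def ungrounded_citations(answer, retrieved_cell_ids):
    """Return cited cell numbers that were not in the retrieved context."""
    allowed = set(retrieved_cell_ids)
    return [c for c in extract_citations(answer) if c not in allowed]


answer = "Rows with nulls were dropped [Cell 4], then scaled [Cell 7]."
print(extract_citations(answer))             # [4, 7]
print(ungrounded_citations(answer, [2, 4]))  # [7]
```

A non-empty result from `ungrounded_citations` indicates that the answer refers to material the model was never shown, which is one cheap signal that it may not be grounded.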
| Metric | Value |
|---|---|
| Upload Processing | < 2 seconds |
| Document Indexing | < 5 seconds |
| Query Response | 2-4 seconds |
| Key Points Generation | 10-30 seconds |
| Groq Inference | < 3 seconds |
| Context Window | 8K tokens |
- Frontend: Gradio (web UI framework)
- AI/LLM: Groq API (llama-3.3-70b-versatile)
- Vector Search: FAISS (Facebook AI Similarity Search)
- Data Processing: Pandas, NumPy
- Notebook Parsing: nbformat
- Excel Handling: openpyxl, xlsxwriter
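To make the vector search step concrete, here is a dependency-free sketch of retrieval over cells. It deliberately swaps in bag-of-words vectors and brute-force cosine similarity for the embeddings and FAISS index listed above, so it illustrates the shape of the step rather than the project's implementation. All names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(cells, query, k=2):
    """Return (cell_index, score) pairs for the k best-matching cells."""
    q = embed(query)
    scored = [(i, cosine(embed(c), q)) for i, c in enumerate(cells)]
    # Drop cells with no overlap so unrelated content is never returned.
    scored = [(i, s) for i, s in scored if s > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

The indices that `search` returns are what a later stage would pass to the LLM as context and cite in the answer. Word-count vectors only match exact terms, which is the gap that learned embeddings are meant to close.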
- QUICKSTART.md - Complete setup guide
- MAJOR_UPDATES.md - Detailed feature documentation
- design.md - System architecture
- HF_DEPLOYMENT_GUIDE.md - Deployment instructions
context-thread-agent/
├── ui/
│   └── app.py                  # Gradio web interface (enhanced)
├── src/
│   ├── groq_integration.py     # Groq LLM integration (optimized)
│   ├── reasoning.py            # Answer generation with context
│   ├── retrieval.py            # Vector search engine
│   ├── indexing.py             # FAISS indexing
│   ├── parser.py               # Notebook/Excel parsing
│   ├── dependencies.py         # Context thread building
│   └── intent.py               # Intent classification
├── demo_files/                 # Complex demo notebooks & Excel
├── generate_demo_files.py      # Demo file generator
├── main.py                     # Entry point
└── requirements.txt            # Dependencies
- Professional homepage with clear value proposition
- Split-screen workspace (viewer + Q&A)
- Tabbed document viewer
- Chatbot-style conversation interface
- Loading indicators and status updates
- Conversation context maintained across questions
- Improved Groq prompting for better accuracy
- Key insights generation feature
- Higher token limits (2000 tokens)
- Better citation extraction
- Complex sales analysis (500+ rows, 6 sheets)
- Financial modeling workbook (4 statements)
- ML notebook (200+ lines, real analysis)
- Time series forecasting notebook
See MAJOR_UPDATES.md for complete details.
Contributions welcome! Areas for improvement:
- Additional file format support (CSV, JSON, etc.)
- More visualization options
- Export functionality for insights
- Multi-language support
- Advanced filtering and search
MIT License - see LICENSE file
- Groq for lightning-fast LLM inference
- Gradio for the intuitive web framework
- FAISS for efficient vector search
- Open-source community for excellent tools
- Issues: Open a GitHub issue
- Questions: Check QUICKSTART.md first
- Demos: Try the included demo files
git clone https://github.com/Mozzicato/context-thread-agent.git
cd context-thread-agent
pip install -r requirements.txt
export GROQ_API_KEY="your_key_here"
python generate_demo_files.py
python main.py ui

Upload a document and start asking questions!

Made with ❤️ by the Context Thread Agent team