NLytics

Natural Language Analytics Engine

NLytics is an AI-powered data analysis platform that transforms natural language questions into executable Python code. Built for data analysts, researchers, and business users who need rapid insights without manual coding.

🚀 Live Demo on Render 🚀

Screenshots

File Upload

Query Analysis

Code & Insights

Quick Start

Prerequisites

Python 3.9+ (tested on 3.13.7)
Groq API key (free tier available at console.groq.com)

Installation

# Clone repository
git clone https://github.com/Tech-Genkai/NLytics.git
cd NLytics

# Configure environment
echo "GROQ_API_KEY=your_api_key_here" > .env

# Install dependencies
pip install -r backend/requirements.txt

# Start server
python start.py

Access the interface at http://localhost:5000

Features

🤖 AI-Powered Understanding

Groq Llama 3.3-70B for natural language understanding
Full dataset context awareness (columns, types, samples, statistics)
Conversation history tracking for contextual queries
Minimal clarifications needed

💻 Real Code Generation

Generates actual pandas/Python code, not API wrappers
Shows generated code before execution
Self-correcting with retry logic (3 attempts with structured feedback)
Validates for syntax, security, and correctness

🔒 Multi-Layer Security

AST parsing for syntax validation
Blacklist filtering - blocks dangerous operations (eval, exec, os, sys, file I/O)
Import whitelist - only pandas and numpy allowed
Sandboxed execution with restricted builtins
Timeout protection (30-second default)
Column-aware validation

📊 Smart Visualizations

Auto-detects best chart type (bar, scatter, pie, box, line)
Interactive Plotly charts with Chart.js fallback
Professional color palette
Statistical insights and narrative summaries

🎯 Transparent Execution

View generated code before execution
Clear error messages with retry feedback
Execution time tracking
Export results (CSV, Excel, JSON)

Architecture

NLytics employs a 9-phase AI pipeline that transforms conversational queries into validated, executable analytics:

1. Intent Detection       → Natural language understanding (Groq Llama 3.3-70B)
2. Query Refinement       → Semantic optimization for analytical depth
3. Multi-Step Planning    → Decomposition into logical execution steps
4. Code Generation        → Pandas/Python synthesis from natural language
5. Security Validation    → AST parsing and blacklist verification
6. Sandboxed Execution    → Isolated subprocess with restricted builtins
7. Insight Generation     → Statistical analysis and visualization config
8. Answer Synthesis       → Plain-language explanation generation
9. Result Presentation    → Interactive charts with narrative context

Pipeline Flow

User Query
    ↓
AI Intent Detection (Groq Llama 3.3)
    ↓
Query Refinement (Semantic optimization)
    ↓
Query Planning (Multi-step decomposition)
    ↓
Code Generation (Pandas/Python)
    ↓
Validation (Security, Syntax, Logic)
    ↓  [Retry loop with feedback - 3 attempts]
Safe Execution (Sandboxed, Timeout)
    ↓
Insights (Narrative, Plotly viz, Export)

Example Queries

Stock market analysis (samples/stock_data_july_2025.csv):

"highest growing stock"           → Top 10 comparison with volatility analysis
"average market cap by sector"    → Grouped aggregation with visualizations
"stocks with PE ratio below 15"   → Multi-condition filtering
"correlation between volume and price" → Statistical correlation with scatter plot

General analytics patterns:

"show me a summary"               → Statistical overview
"average [column] by [category]"  → Grouped aggregation
"distribution of [column]"        → Frequency analysis
"top 10 [column] by [metric]"     → Ranked comparisons
"outliers in [column]"            → Statistical outlier detection

Installation

System Requirements

Python 3.9 or higher
2GB RAM minimum
Internet connection (for Groq API)
Modern web browser

Step-by-Step Setup

Clone the repository:

git clone https://github.com/Tech-Genkai/NLytics.git
cd NLytics

Get your Groq API key:
- Visit console.groq.com/keys
- Sign up for free (no credit card required)
- Create and copy your API key (starts with gsk_...)

Configure environment:

# Create .env file
echo "GROQ_API_KEY=your_actual_key_here" > .env

Install dependencies:
```
pip install -r backend/requirements.txt
```
Start the server:
```
python start.py
```
Open your browser: Navigate to http://localhost:5000

Usage

Chat Interface

Upload Data
- Click "📤 Upload Data" button
- Select CSV or Excel file (max 50MB)
- Wait for preprocessing confirmation
Ask Questions
- Type natural language queries
- Press Enter to send
- View generated code, results, and insights
View Insights
- Interactive Plotly charts
- Statistical summaries
- Key findings and recommendations

Supported File Formats

CSV (.csv)
Excel (.xlsx, .xls)
Maximum file size: 50MB

REST API

NLytics provides a comprehensive REST API for programmatic access.

Base URL

Production: https://nlytics.onrender.com/api/v1
Local: http://localhost:5000/api/v1

Quick Example

# Complete analysis in one call
curl -X POST https://nlytics.onrender.com/api/v1/analyze \
  -F "file=@data.csv" \
  -F "query=highest stock by volume"

Available Endpoints

Endpoint	Method	Description
`/analyze`	POST	Upload & analyze in one call
`/query`	POST	Query existing session
`/status/<session_id>`	GET	Get session status
`/code/validate`	POST	Validate code
`/code/execute`	POST	Execute code
`/health`	GET	Health check

Response Format

{
  "success": true,
  "status": "completed",
  "query": {
    "original": "highest stock",
    "refined": "top 10 stocks comparison",
    "intent": {...}
  },
  "code": {
    "generated": "df.nlargest(10, 'Close')...",
    "execution_time": 0.23
  },
  "result": {
    "data": [...]
  },
  "visualization": {
    "type": "bar",
    "plotly": "{...}",
    "config": {...}
  },
  "insights": {
    "narrative": "...",
    "key_findings": [...]
  },
  "answer": "Based on the data, AAPL is the highest..."
}

See DEV_GUIDE.md for complete API documentation with examples.

Security

Multi-Layer Validation

AST Parsing - Syntax validation before execution
Blacklist Filtering - Blocks dangerous operations:
- eval, exec, compile, __import__
- open, input, getattr, setattr
- os, sys, subprocess, socket
- globals, locals, vars, dir
Import Whitelist - Only pandas and numpy permitted
Column Validation - Schema-aware code generation
Timeout Enforcement - 30-second execution limit

Sandboxed Execution Environment

Restricted builtins - Only safe functions available
No introspection - Blocked getattr, setattr, globals, locals
No imports - Prevented import access
No file system - No read/write operations
No network - No external connections
Isolated context - Each execution is independent

Configuration

Environment Variables

Create a .env file in the project root:

GROQ_API_KEY=your_groq_api_key_here
SECRET_KEY=your_secret_key_for_flask  # Optional
FLASK_ENV=development  # or production
PORT=5000  # Optional, default is 5000

Execution Parameters

Configurable in backend/config.py:

MAX_CONTENT_LENGTH = 50 * 1024 * 1024  # 50MB file upload limit
API_TIMEOUT = 30  # API request timeout (seconds)
MAX_RETRIES = 3  # Code generation retry attempts
EXECUTION_TIMEOUT = 300  # Code execution timeout (5 minutes)
MAX_ROWS = 1_000_000  # Maximum rows to process
MAX_COLUMNS = 500  # Maximum columns to process

Testing

Run Tests

# All unit tests
python -m pytest backend/tests/ -v

# With coverage report
python -m pytest backend/tests/ --cov=backend/services --cov-report=html

# Automated integration tests (60+ scenarios)
python backend/tests/automated_test.py

# Quick integration test
python backend/tests/automated_test.py --quick

Test Coverage

Code validation - Syntax, security, imports
Safe execution - Error handling, result capture, isolation
Insight generation - DataFrames, scalars, visualizations
Schema inspection - Column types, statistics
Integration tests - 60+ scenarios covering:
- Basic aggregations
- Growth & performance analysis
- Sector comparisons
- Complex queries
- Statistical insights
- Rankings
- Investment screening
- Natural language variations
- Edge cases

Project Scope

What This Is

Proof-of-concept validating AI-powered code generation
Educational tool demonstrating LLM-powered analytics
Personal analysis tool for local use and demos
Architecture blueprint for conversational analytics

What This Is NOT

Production-ready SaaS platform
Enterprise-grade software
Fully hardened security system
Scalable multi-user service

Design Philosophy

NLytics prioritizes transparency and education over abstraction:

Code Visibility - Generated pandas code is always visible
Query Refinement - Automatically enhances queries for analytical depth
Schema Awareness - Understands data types and relationships
Self-Correction - Retry loops with structured feedback
Educational Value - Learn data analysis by observing code generation

Design Decisions

Why no database?

Users analyze sensitive data - no storage = no liability
Sessions expire on restart by design
This is an analysis tool, not a data warehouse
Stateless architecture works for cloud deployment

Why no authentication?

Single-user tool - control access via URL sharing
For public deployment: use Render's basic auth
Adding app-level auth is overkill for demos

Why no caching?

Queries are rarely identical
Groq API responses are fast (~5s average)
Cache invalidation complexity outweighs benefits

Why no Docker?

Local development tool for single users
Python virtual environment is simpler
Docker adds complexity with no benefit

Current Scope

Single-user - Local or Render deployment
Session-based - No persistence by design
API-dependent - Requires Groq API key
Dataset limits - Optimized for <100K rows
Stateless - Works on cloud platforms

Technical Stack

Layer	Technology
AI Model	Groq Llama 3.3-70B (70B parameters)
Backend	Flask 3.1.0, Python 3.9+
Data Processing	pandas 2.2.3, numpy 2.2.4
Visualization	Plotly 5.24.1, Chart.js fallback
Frontend	Vanilla JavaScript, Marked.js
Testing	pytest, 60+ integration scenarios

Project Structure

NLytics/
├── backend/
│   ├── main.py                         # Flask application core
│   ├── config.py                       # Configuration management
│   ├── requirements.txt                # Python dependencies
│   ├── api/
│   │   └── analytics_api.py            # REST API endpoints
│   ├── services/                       # AI pipeline services
│   │   ├── ai_intent_detector.py       # Natural language understanding
│   │   ├── query_refiner.py            # Query optimization
│   │   ├── query_planner.py            # Multi-step planning
│   │   ├── code_generator.py           # LLM code synthesis
│   │   ├── code_validator.py           # Security validation
│   │   ├── safe_executor.py            # Sandboxed execution
│   │   ├── insight_generator.py        # Statistical insights
│   │   ├── answer_synthesizer.py       # Natural language answers
│   │   ├── preprocessor.py             # Data cleaning
│   │   ├── schema_inspector.py         # Column analysis
│   │   └── file_handler.py             # File operations
│   ├── models/
│   │   └── chat_message.py             # Message types
│   ├── tests/                          # Test suite
│   │   ├── automated_test.py           # Integration tests (60+)
│   │   ├── test_api.py
│   │   ├── test_code_validator.py
│   │   ├── test_insight_generator.py
│   │   ├── test_safe_executor.py
│   │   └── test_schema_inspector.py
│   └── utils/                          # Utility modules
├── frontend/
│   ├── index.html                      # Chat interface
│   └── static/
│       ├── js/app.js                   # Client logic
│       └── css/style.css               # Styling
├── data/
│   ├── uploads/                        # User files
│   └── processed/                      # Cleaned data
├── samples/                            # Example datasets
├── .env                                # Environment config
├── .gitignore
├── README.md                           # This file
├── DEV_GUIDE.md                        # Developer guide
└── start.py                            # Application entry point

Troubleshooting

Common Issues

"GROQ_API_KEY not found"

Ensure .env file exists in project root
Verify API key is correct (starts with gsk_...)
Restart server after creating .env

"Address already in use" (Port 5000)

# Windows: Kill process using port 5000
netstat -ano | findstr :5000
taskkill /PID <PID> /F

# Or change port in backend/main.py

"File type not supported"

Only CSV (.csv) and Excel (.xlsx, .xls) supported
File must be under 50MB
Try re-saving as CSV (UTF-8)

"Column not found"

Use exact column names from upload confirmation
Preprocessing normalizes names (spaces → underscores)
Check health report for actual column names

"No dataset loaded"

Upload a file before asking questions
Check if upload completed successfully
Try refreshing the page

For more troubleshooting, see DEV_GUIDE.md.

Contributing

This is a proof-of-concept project. Contributions are welcome for:

Bug fixes
Documentation improvements
Test coverage expansion
Additional visualization types
Performance optimizations

Please open an issue before submitting large changes.

License

MIT License - See LICENSE file for details.

Acknowledgments

Groq for fast LLM inference
Plotly for interactive visualizations
pandas and numpy for data processing
Flask for web framework

Built with ❤️ by Tech-Genkai

Proof-of-concept demonstrating AI-powered natural language to code generation for data analytics.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
backend		backend
data		data
frontend		frontend
samples		samples
screenshots		screenshots
.env.example		.env.example
DEV_GUIDE.md		DEV_GUIDE.md
README.md		README.md
logo.png		logo.png
start.py		start.py

Folders and files

Latest commit

History

Repository files navigation

NLytics

Screenshots

Table of Contents

Quick Start

Prerequisites

Installation

Features

🤖 AI-Powered Understanding

💻 Real Code Generation

🔒 Multi-Layer Security

📊 Smart Visualizations

🎯 Transparent Execution

Architecture

Pipeline Flow

Example Queries

Installation

System Requirements

Step-by-Step Setup

Usage

Chat Interface

Supported File Formats

REST API

Base URL

Quick Example

Available Endpoints

Response Format

Security

Multi-Layer Validation

Sandboxed Execution Environment

Configuration

Environment Variables

Execution Parameters

Testing

Run Tests

Test Coverage

Project Scope

What This Is

What This Is NOT

Design Philosophy

Design Decisions

Current Scope

Technical Stack

Project Structure

Troubleshooting

Common Issues

Contributing

License

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages