A production-ready AI-powered data analysis platform built with Next.js, LangChain, LangGraph, DeepAgents, and Google Gemini. Upload CSV files and get instant exploratory data analysis, visualizations, and expert guidance from an AI senior data scientist.
- Automatic EDA: Comprehensive exploratory data analysis with statistics, correlations, and data quality checks
- Smart Visualizations: AI-generated plots (histograms, bar charts, scatter plots) tailored to your data
- Expert Chat Interface: Interactive chat with an AI data scientist powered by Google Gemini
- LangGraph Workflows: Multi-step EDA orchestration with state management
- Persistent Storage: Neon Postgres for datasets, EDA results, and chat history
- Real-time Streaming: Live updates during analysis and chat responses
- Frontend: Next.js 14, React, TypeScript, Tailwind CSS, Recharts
- AI/ML: LangChain, LangGraph, DeepAgents, Google Gemini (via @langchain/google-genai)
- Database: Neon Postgres with Drizzle ORM
- Data Processing: PapaParse (CSV), simple-statistics
- Neon Database
  - Sign up at neon.tech
  - Create a new project and get your connection string
- Google AI Studio API Key
  - Visit Google AI Studio
  - Create an API key for Gemini
- Clone and navigate to the project:
    cd data-scientist-copilot
- Install dependencies:
    npm install --legacy-peer-deps
- Set up environment variables:
    cp .env.example .env.local
  Edit .env.local and add your credentials:
    DATABASE_URL=your_neon_connection_string
    GOOGLE_API_KEY=your_google_ai_studio_api_key
- Generate and push database schema:
    npx drizzle-kit generate
    npx drizzle-kit push
- Run the development server:
    npm run dev
  Open http://localhost:3000 to see the application.
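A missing or empty variable in .env.local tends to surface later as an opaque database or API error, so it can help to check both values once at startup. The following is a minimal sketch of such a check; `loadEnv` and `REQUIRED_ENV_VARS` are hypothetical names, not part of this repository.

```typescript
// Hypothetical helper: this README does not show how the app reads its config.
const REQUIRED_ENV_VARS = ["DATABASE_URL", "GOOGLE_API_KEY"] as const;

type RequiredEnvVar = (typeof REQUIRED_ENV_VARS)[number];
type AppEnv = Record<RequiredEnvVar, string>;

// Returns the required variables, or throws an error naming every missing one.
function loadEnv(source: Record<string, string | undefined>): AppEnv {
  const missing = REQUIRED_ENV_VARS.filter((name) => !source[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    DATABASE_URL: source.DATABASE_URL as string,
    GOOGLE_API_KEY: source.GOOGLE_API_KEY as string,
  };
}
```

On the server this would be called as `loadEnv(process.env)`, for example before the database client is created.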
- Upload Dataset: Upload a CSV file (up to 50MB)
- Wait for EDA: Automatic analysis runs in the background
- Explore Results: View statistics, correlations, and default visualizations
- Chat with AI: Ask questions about your data or request custom plots
- "What are the key characteristics of this dataset?"
- "Show me a histogram of the price column"
- "What columns have the most missing values?"
- "Create a scatter plot of age vs income"
- "How should I approach building a churn prediction model?"
- "What feature engineering steps would you recommend?"
START → load_dataset → compute_stats → generate_plots → generate_summary → save_results → END
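The workflow above is a linear chain in which each node receives the shared state and returns an updated copy. The sketch below illustrates that idea in plain TypeScript; it is a stand-in, not the LangGraph code in lib/agents/eda-graph.ts. The `EdaState` fields, `buildNodes`, and `runEda` are assumptions for illustration, and the nodes are synchronous here although real ones would likely be async.

```typescript
type Row = Record<string, number>;

// Assumed state shape; the real graph state may differ.
interface EdaState {
  datasetId: string;
  rows: Row[];
  stats: Record<string, { mean: number }>;
  plots: string[];
  summary: string;
  saved: boolean;
  trace: string[]; // node names in the order they ran
}

type EdaNode = (state: EdaState) => EdaState;

// One entry per node, in the order START → ... → END.
function buildNodes(loadRows: (datasetId: string) => Row[]): [string, EdaNode][] {
  return [
    ["load_dataset", (s) => ({ ...s, rows: loadRows(s.datasetId) })],
    ["compute_stats", (s) => {
      const stats: EdaState["stats"] = {};
      for (const column of Object.keys(s.rows[0] ?? {})) {
        const total = s.rows.reduce((sum, row) => sum + (row[column] ?? 0), 0);
        stats[column] = { mean: total / s.rows.length };
      }
      return { ...s, stats };
    }],
    ["generate_plots", (s) => ({
      ...s,
      plots: Object.keys(s.stats).map((column) => `histogram:${column}`),
    })],
    ["generate_summary", (s) => ({
      ...s,
      summary: `${s.rows.length} rows, ${Object.keys(s.stats).length} numeric columns`,
    })],
    ["save_results", (s) => ({ ...s, saved: true })], // placeholder for a DB write
  ];
}

// Runs every node in sequence, recording each node name in the trace.
function runEda(datasetId: string, loadRows: (id: string) => Row[]): EdaState {
  const initial: EdaState = {
    datasetId, rows: [], stats: {}, plots: [], summary: "", saved: false, trace: [],
  };
  return buildNodes(loadRows).reduce<EdaState>(
    (state, [name, node]) => ({ ...node(state), trace: [...state.trace, name] }),
    initial,
  );
}
```

Keeping each node a pure function of the state makes every step easy to test on its own, which is also what lets a graph runtime checkpoint the state between steps.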
- get_dataset_info: Fetch dataset metadata and EDA results
- get_column_stats: Get detailed statistics for specific columns
- generate_plot: Create visualizations on demand
- recommend_next_steps: Get analysis recommendations
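To make the get_column_stats tool more concrete, here is a minimal sketch of the kind of per-column summary it might return. This is an assumption about its behavior, not the code in lib/agents/tools.ts; `getColumnStats`, `ColumnStats`, and the choice to count non-numeric cells as missing are all hypothetical.

```typescript
type Cell = string | number | null | undefined;

interface ColumnStats {
  count: number;   // cells that parsed as finite numbers
  missing: number; // empty, null, or non-numeric cells
  mean: number | null;
  median: number | null;
  min: number | null;
  max: number | null;
}

// Parses one cell; returns null when the cell is empty or not a finite number.
function toNumber(raw: Cell): number | null {
  if (raw === null || raw === undefined) return null;
  if (typeof raw === "string" && raw.trim() === "") return null;
  const parsed = typeof raw === "number" ? raw : Number(raw);
  return Number.isFinite(parsed) ? parsed : null;
}

function getColumnStats(rows: Record<string, Cell>[], column: string): ColumnStats {
  const values: number[] = [];
  let missing = 0;
  for (const row of rows) {
    const value = toNumber(row[column]);
    if (value === null) missing++;
    else values.push(value);
  }
  if (values.length === 0) {
    return { count: 0, missing, mean: null, median: null, min: null, max: null };
  }
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid]! : (sorted[mid - 1]! + sorted[mid]!) / 2;
  const sum = values.reduce((total, v) => total + v, 0);
  return {
    count: values.length,
    missing,
    mean: sum / values.length,
    median,
    min: sorted[0]!,
    max: sorted[sorted.length - 1]!,
  };
}
```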
- datasets: Dataset metadata and sample data
- eda_runs: EDA results, correlations, and plots
- chat_sessions: Chat session management
- chat_messages: Message history
- checkpoints: LangGraph state persistence
data-scientist-copilot/
├── app/
│   ├── api/
│   │   ├── upload/            # CSV upload endpoint
│   │   ├── datasets/[id]/     # Dataset info endpoint
│   │   └── chat/              # Chat endpoint
│   ├── datasets/[id]/         # Dataset detail page
│   └── page.tsx               # Home/upload page
├── components/
│   ├── upload-panel.tsx       # File upload UI
│   ├── eda-summary.tsx        # EDA results display
│   ├── plot-panel.tsx         # Chart rendering
│   └── chat-panel.tsx         # Chat interface
├── lib/
│   ├── agents/
│   │   ├── eda-graph.ts       # LangGraph EDA workflow
│   │   ├── chat-agent.ts      # Chat agent with tools
│   │   └── tools.ts           # LangChain tools
│   ├── data/
│   │   ├── profiler.ts        # Data profiling logic
│   │   └── types.ts           # TypeScript types
│   └── db/
│       ├── schema.ts          # Drizzle schema
│       └── index.ts           # Database client
└── drizzle.config.ts          # Drizzle configuration
- Define tool in lib/agents/tools.ts
- Add to allTools array
- Update agent system prompt in lib/agents/chat-agent.ts
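The three steps above can be sketched as follows. This uses a simplified, hypothetical `ToolDefinition` shape rather than LangChain's actual tool API, and `count_missing_values` and `buildToolSection` are invented for illustration; only the `allTools` name comes from this project.

```typescript
// Hypothetical tool shape; the project's tools are LangChain tools and will differ.
interface ToolDefinition {
  name: string;
  description: string;
  run: (input: Record<string, unknown>) => string;
}

// Step 1: define the tool.
const countMissingValues: ToolDefinition = {
  name: "count_missing_values",
  description: "Count empty cells in one column of the provided rows.",
  run: (input) => {
    const rows: unknown[] = Array.isArray(input.rows) ? input.rows : [];
    const column = String(input.column);
    const missing = rows.filter((row) => {
      const value = (row as Record<string, unknown> | null)?.[column];
      return value === null || value === undefined || value === "";
    }).length;
    return JSON.stringify({ column, missing });
  },
};

// Step 2: register it so the agent can see it.
const allTools: ToolDefinition[] = [countMissingValues];

// Step 3: mention the tool in the system prompt, here by generating the list.
function buildToolSection(tools: ToolDefinition[]): string {
  return tools.map((tool) => `- ${tool.name}: ${tool.description}`).join("\n");
}
```

Generating the prompt section from `allTools` is one way to keep the prompt and the registry from drifting apart; a hand-written prompt works too, as long as it is updated with each new tool.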
- Extend PlotConfig type in lib/data/types.ts
- Add generation logic in lib/agents/tools.ts (generatePlot)
- Add rendering in components/plot-panel.tsx
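One way to keep those three places in sync is to model PlotConfig as a discriminated union, so the compiler flags any switch that forgets the new variant. The sketch below assumes a shape for `PlotConfig`; the real type in lib/data/types.ts may differ, and the `box` variant and `describePlot` are hypothetical.

```typescript
// Assumed shape; only the three existing plot kinds are taken from this README.
type PlotConfig =
  | { type: "histogram"; column: string; bins: number }
  | { type: "bar"; column: string }
  | { type: "scatter"; x: string; y: string }
  | { type: "box"; column: string }; // hypothetical new variant

// Stand-in for generation or rendering logic that must handle every variant.
function describePlot(config: PlotConfig): string {
  switch (config.type) {
    case "histogram":
      return `Histogram of ${config.column} (${config.bins} bins)`;
    case "bar":
      return `Bar chart of ${config.column}`;
    case "scatter":
      return `Scatter plot of ${config.x} vs ${config.y}`;
    case "box":
      return `Box plot of ${config.column}`;
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a variant is unhandled.
      const unhandled: never = config;
      throw new Error(`Unhandled plot type: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

With this pattern, removing the `"box"` case makes the `never` assignment a type error, which points directly at the code that still needs updating.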
- Push to GitHub
- Import project in Vercel
- Add environment variables
- Deploy
DATABASE_URL=your_neon_connection_string
GOOGLE_API_KEY=your_google_ai_studio_api_key
NODE_ENV=production
- Verify your Neon connection string
- Ensure database schema is pushed: npx drizzle-kit push
- Check API key is valid
- Verify API quotas in Google AI Studio
- Ensure file is valid CSV with headers
- Check file size (max 50MB)
- Verify column names don't have special characters
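The CSV checks above can be run locally before uploading. The following is a rough sketch, not the validation the upload endpoint performs: `checkCsv` is a hypothetical helper, 50MB is interpreted as 50 × 1024 × 1024 bytes, and "special characters" is assumed to mean anything other than letters, digits, underscores, and spaces.

```typescript
const MAX_UPLOAD_BYTES = 50 * 1024 * 1024; // assumption: 50MB counted in binary units
const SAFE_COLUMN_NAME = /^[A-Za-z0-9_ ]+$/; // assumed definition of "no special characters"

// Returns a list of human-readable problems; an empty list means no issue was found.
function checkCsv(text: string, sizeBytes: number): string[] {
  const problems: string[] = [];
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    problems.push("File is larger than 50MB");
  }
  // Strip a leading byte-order mark, then take only the first line.
  const headerLine = text.replace(/^\uFEFF/, "").split(/\r?\n/, 1)[0] ?? "";
  if (headerLine.trim() === "") {
    problems.push("Missing header row");
    return problems;
  }
  // Naive split: does not handle commas inside quoted names, which a real
  // CSV parser such as PapaParse would.
  const names = headerLine.split(",").map((name) => name.trim().replace(/^"|"$/g, ""));
  for (const name of names) {
    if (!SAFE_COLUMN_NAME.test(name)) {
      problems.push(`Column name "${name}" is empty or contains special characters`);
    }
  }
  return problems;
}
```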
MIT
Contributions welcome! Please open an issue or PR.
- Built with LangChain
- Powered by Google Gemini
- Database by Neon
- Framework by Next.js