EvoLab democratizes ML for classrooms and businesses — autonomous AI agents write and evolve custom models from your data, delivering production-ready predictions with no ML expertise required.
Built for YHack 2026.
EvoLab is an agentic AutoML platform where large language models don't just assist — they do the work. Upload a CSV, describe your goal, and a pipeline of specialized AI agents analyzes your data, generates Python model classes from scratch, trains them, scores them, and breeds better architectures across multiple generations of evolution.
No ML knowledge required. No feature engineering. No hyperparameter search. Just data and a goal.
CSV Upload → Planner → Data Agent → [Architecture → Training → Eval] × N generations → Best Model
| Agent | Role |
|---|---|
| Planner | Infers task type, decides population size, generations, and training subset. Recommends architecture families based on data type (image, tabular, sequence). |
| Data Agent | Imputation, label encoding, standard scaling, train/val split. Runs LLM analysis of data quality and class balance. |
| Architecture Agent | Generates model Python classes entirely from LLM output — no hardcoded templates. Mixes traditional ML (LightGBM, XGBoost, sklearn ensembles) with deep learning (PyTorch CNNs, MLPs, residual nets). CUDA-enabled. |
| Training Agent | exec()s generated code, fits model. On failure: sends error + code back to LLM for a one-shot fix and retries. |
| Eval Agent | Scores fitness, computes metrics. After each generation, runs a potential assessment — LLM estimates each architecture's ceiling on full data, giving underfit deep models a fair score (0.7 × current + 0.3 × potential). |
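The Training Agent's exec-and-retry step can be sketched as follows. This is a minimal illustration, not EvoLab's actual code: `llm_fix` is a hypothetical stand-in for the real Gemini call, and the broken source string is contrived to trigger the retry path.

```python
def llm_fix(code: str, error: str) -> str:
    """Hypothetical stand-in for asking the LLM to repair generated code.
    The real agent sends the error plus source back to Gemini for a one-shot fix."""
    return code.replace("Modle", "Model")  # illustrative repair

def run_generated_model(code: str, class_name: str = "Model"):
    """exec() LLM-generated source and instantiate the model class.
    On failure, request one LLM fix and retry once; a second failure propagates."""
    for attempt in range(2):
        namespace = {}
        try:
            exec(code, namespace)
            return namespace[class_name]()
        except Exception as exc:
            if attempt == 1:
                raise
            code = llm_fix(code, str(exc))

# A generated snippet with a typo in the class name exercises the retry path:
broken = (
    "class Modle:\n"
    "    def fit(self, X, y):\n"
    "        self.trained = True\n"
    "        return self\n"
)
model = run_generated_model(broken)
```

The one-shot budget matters: retrying indefinitely on pathological generated code would stall a generation, so a single repair attempt keeps the loop moving.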
- Generation 0: LLM seeds a diverse initial population matched to the data type
- Subsequent generations: 50% exploitation (hypothesis-driven hyperparameter tuning of survivors) + 50% exploration (new architectures informed by what failed and what won)
- Full retrain: Winner is retrained on the complete dataset after evolution completes
- Subset training: Candidates train on a planner-chosen subset per generation for speed; final model uses all data
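The generation loop above can be sketched as a generic routine. All names here are illustrative: `mutate` and `propose_new` stand in for the LLM-driven exploitation and exploration calls, and the toy demo uses numbers instead of model candidates.

```python
import random

def evolve(seed_population, generations, fitness_fn, mutate, propose_new):
    """Multi-generation search: rank by fitness, then refill the population
    half with tuned top survivors (exploitation) and half with fresh
    candidates informed by the full ranking (exploration)."""
    population = list(seed_population)
    for _ in range(generations):
        ranked = sorted(population, key=fitness_fn, reverse=True)
        half = len(population) // 2
        exploited = [mutate(random.choice(ranked[:2])) for _ in range(half)]
        explored = [propose_new(ranked) for _ in range(len(population) - half)]
        population = exploited + explored
    return max(population, key=fitness_fn)

# Toy demo: candidates are numbers, fitness peaks at 10.
random.seed(0)
best = evolve(
    seed_population=[0.0, 5.0, 20.0],
    generations=5,
    fitness_fn=lambda x: -abs(x - 10),
    mutate=lambda x: x + random.uniform(-1.0, 1.0),
    propose_new=lambda ranked: random.uniform(0.0, 20.0),
)
```

Passing the full ranking to `propose_new` mirrors the design above: exploration is informed by what failed as well as what won, not just the current best.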
- LLM-generated models — all model code written by Gemini at runtime, no hardcoded architectures
- Evolutionary search — multi-generation exploitation/exploration with fitness-ranked breeding
- Potential-aware fitness — prevents classical models from dominating over underfit deep models
- Real-time logs — SSE log stream shows every agent decision as it happens
- Inference playground — form-based inference with type hints, dropdowns for categoricals, and an AI chat that converts natural language into predictions
- Shareable links — generate a public URL for anyone to run inference on your trained model
- Job discussion chat — ask the LLM to explain why certain models won, what the metrics mean, and what to try next
- Python / Flask — REST API with SSE log streaming
- Gemini 2.0 Flash via Lava AI Gateway — all LLM calls
- scikit-learn, LightGBM, XGBoost, CatBoost — classical ML
- PyTorch + torchvision — deep learning (CUDA-enabled)
- cloudpickle — serializes dynamically exec'd model classes
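Why cloudpickle rather than the standard library? Plain `pickle` serializes classes by reference (module path + name), which cannot resolve a class created at runtime with `exec()`; cloudpickle instead serializes the class definition by value. A minimal stdlib-only illustration of the failure mode (the class source is illustrative):

```python
import pickle

# Define a model class the way a code-generating agent would: from a source string.
source = (
    "class Model:\n"
    "    def predict(self, x):\n"
    "        return x * 2\n"
)
namespace = {}
exec(source, namespace)
model = namespace["Model"]()

# Plain pickle records only a module-and-name reference to the class and
# re-imports it at load time; the exec()-defined class is not importable,
# so dumping fails outright.
try:
    pickle.dumps(model)
    plain_pickle_works = True
except Exception:
    plain_pickle_works = False
```

`cloudpickle.dumps(model)` succeeds on the same object because the class body itself is embedded in the payload.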
- React 18 + React Router — SPA
- Recharts — fitness evolution chart
- Server-Sent Events — live agent activity feed
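Server-Sent Events is a plain-text protocol: each message is a `data:` line (or lines) followed by a blank line. A framework-agnostic sketch of how log events could be encoded for such a stream — the event fields here are illustrative, not EvoLab's actual schema:

```python
import json

def sse_format(event: dict) -> str:
    """Encode one log event as a Server-Sent Events message:
    a single `data:` line terminated by a blank line."""
    return f"data: {json.dumps(event)}\n\n"

def log_stream(events):
    """Generator usable as a streaming HTTP response body, e.g.
    Response(log_stream(...), mimetype='text/event-stream') in Flask."""
    for event in events:
        yield sse_format(event)

chunks = list(log_stream([
    {"agent": "planner", "message": "task type: regression"},
    {"agent": "data", "message": "train/val split: 80/20"},
]))
```

On the client side, the browser's built-in `EventSource` API parses these frames and fires one `message` event per `data:` block, which is what drives the live agent activity feed.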
- Python 3.12+
- Node.js 18+
- A Lava AI Gateway API key
```bash
cd backend
pip install -r requirements.txt
```

Create `backend/.env`:

```
LAVA_SECRET_KEY=aks_live_...
GEMINI_MODEL=gemini-2.0-flash   # optional, this is the default
```

```bash
python app.py
# → http://localhost:5001
```

```bash
cd frontend
npm install
npm start
# → http://localhost:3000
```

| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/jobs` | Upload CSV and start a job |
| GET | `/api/jobs` | List all jobs |
| GET | `/api/jobs/:id` | Job status and results |
| GET | `/api/jobs/:id/logs` | SSE live log stream |
| POST | `/api/jobs/:id/predict` | Run inference |
| POST | `/api/jobs/:id/chat` | Chat about the job |
| POST | `/api/jobs/:id/chat-infer` | Natural language → inference |
| POST | `/api/jobs/:id/share` | Generate public share token |
| GET | `/api/share/:token` | Get shared job (public) |
| POST | `/api/share/:token/predict` | Inference on shared job |
| POST | `/api/preview` | Preview CSV columns |
```bash
curl -X POST http://localhost:5001/api/jobs/<job_id>/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": {"sqft": 2000, "bedrooms": 3, "bathrooms": 2}}'
```

```json
{
  "prediction": "485000",
  "raw_value": 485000.0,
  "probabilities": null
}
```

```
EvoLab/
├── backend/
│   ├── app.py                    # Flask API
│   ├── requirements.txt
│   ├── core/
│   │   ├── engine.py             # Evolutionary orchestration loop
│   │   ├── llm.py                # Gemini client (retry, logging)
│   │   ├── store.py              # In-memory job store
│   │   ├── types.py              # Job, ModelCandidate, EvolutionRound
│   │   └── error_log.py          # File-based error logging → errors.log
│   └── agents/
│       ├── planner.py
│       ├── data_agent.py
│       ├── architecture_agent.py
│       ├── training_agent.py
│       └── eval_agent.py
└── frontend/
    └── src/
        ├── pages/
        │   ├── HomePage.js       # Upload wizard
        │   ├── JobPage.js        # Evolution dashboard
        │   ├── InferencePage.js  # Inference playground + chat
        │   └── SharePage.js      # Public shared model
        └── components/
            ├── LogStream.js      # Live SSE log viewer
            ├── FitnessChart.js   # Generation fitness chart
            └── CandidateTable.js # Model leaderboard
```
Classification: 0.5 × accuracy + 0.3 × F1 + 0.2 × ROC-AUC
Regression: R² (clamped to [0, 1])
Potential adjustment (post-gen-0): 0.7 × raw_fitness + 0.3 × LLM_ceiling_estimate
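The scoring formulas above translate directly to code; a sketch with illustrative function names:

```python
def classification_fitness(accuracy: float, f1: float, roc_auc: float) -> float:
    """0.5 × accuracy + 0.3 × F1 + 0.2 × ROC-AUC."""
    return 0.5 * accuracy + 0.3 * f1 + 0.2 * roc_auc

def regression_fitness(r2: float) -> float:
    """R² clamped to [0, 1] so negative scores don't wreck the ranking."""
    return min(max(r2, 0.0), 1.0)

def potential_adjusted(raw_fitness: float, llm_ceiling: float) -> float:
    """Post-generation-0 blend crediting an underfit model's estimated ceiling."""
    return 0.7 * raw_fitness + 0.3 * llm_ceiling
```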
MIT