Constraints and Limitations
Shree Chaturvedi edited this page Jun 13, 2026 · 1 revision
This page lists constraints a new team should understand before extending or deploying the product.
- The workflow is intentionally phase-based. New major surfaces should fit upload, explorer, preprocessing, feature engineering, training, experiments, or deployment unless the phase model is deliberately redesigned.
- LLM-generated actions are advisory until users approve or execute them. Avoid adding hidden automation that mutates data without a review path.
- Preprocessing, feature engineering, and training assume notebook/workbook context. Keep generated code visible and recoverable.
- Deployment currently targets trained tabular classification/regression models with a known feature schema.
- Dataset uploads are optimized for CSV, JSON, and XLSX files.
- Dataset uploads default to a `300MB` limit; legacy `.xls` is rejected.
- CSV/JSON are parsed in memory; XLSX is stream-profiled and loaded into Postgres asynchronously, so it may be temporarily non-queryable.
- Dataset row paging defaults to `200` rows and caps at `1000`.
- Document upload is memory-backed and capped at `25MB`.
- Dataset profiling samples data for statistics and UI performance; do not assume every profile metric came from a full-table scan.
- SQL query execution is restricted to read-only SELECT/CTE statements, rejects multiple statements and references to blocked application/system tables, appends a default `LIMIT 200`, caps returned rows at `1000`, and uses a `5000ms` statement timeout by default.
- Benchmark source data should not be stored under `backend/storage`.
- Prediction logs may contain feature values and should be treated as sensitive.
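
The SQL execution and row-paging limits above can be sketched as a small guard layer. This is a minimal illustration, not the product's actual implementation: `guard_sql`, `clamp_page_size`, `QueryRejected`, and the contents of `BLOCKED_TABLES` are hypothetical names, and the string checks are deliberately naive. A real guard would more likely rely on a SQL parser, a read-only transaction, and a database-level statement timeout.

```python
import re
from typing import Optional

DEFAULT_LIMIT = 200        # default LIMIT appended to queries (value from this page)
MAX_PAGE_SIZE = 1000       # cap on returned rows / page size (value from this page)
DEFAULT_PAGE_SIZE = 200    # default dataset page size (value from this page)

# Hypothetical blocklist: the real application/system table names are not listed here.
BLOCKED_TABLES = {"users", "pg_catalog", "information_schema"}


class QueryRejected(ValueError):
    """Raised when a query violates one of the read-only rules."""


def guard_sql(query: str) -> str:
    """Validate a user query and append the default LIMIT when none is present."""
    sql = query.strip().rstrip(";").strip()
    if not sql:
        raise QueryRejected("empty query")
    # Any remaining semicolon is treated as a second statement (conservative:
    # this also rejects semicolons inside string literals).
    if ";" in sql:
        raise QueryRejected("multiple statements are not allowed")
    if not re.match(r"(?i)^(select|with)\b", sql):
        raise QueryRejected("only SELECT/CTE queries are allowed")
    identifiers = set(re.findall(r"[a-z_][a-z0-9_]*", sql.lower()))
    blocked = identifiers & BLOCKED_TABLES
    if blocked:
        raise QueryRejected(f"blocked tables referenced: {sorted(blocked)}")
    if not re.search(r"(?i)\blimit\s+\d+\s*$", sql):
        sql = f"{sql} LIMIT {DEFAULT_LIMIT}"
    return sql


def clamp_page_size(requested: Optional[int]) -> int:
    """Apply the default page size and the hard cap.

    Treating non-positive values as 1 is an assumption of this sketch.
    """
    if requested is None:
        return DEFAULT_PAGE_SIZE
    return max(1, min(requested, MAX_PAGE_SIZE))
```

For example, `guard_sql("SELECT * FROM sales;")` returns `"SELECT * FROM sales LIMIT 200"`, while `guard_sql("SELECT 1; SELECT 2")` raises `QueryRejected`. Note that the prefix check alone would not stop a data-modifying CTE, which is one reason to pair a guard like this with read-only execution at the database level.
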
- Python execution and model serving require Docker.
- The default runtime network is `automl-sandbox`, an internal Docker network. It improves isolation but blocks outbound network access from runtime containers.
- Package installation inside runtime containers may fail without network access. Use `EXECUTION_NETWORK=bridge` only when intentionally allowing internet access.
- Resource limits are controlled by `EXECUTION_TIMEOUT_MS`, `EXECUTION_MAX_MEMORY_MB`, `EXECUTION_MAX_CPU_PERCENT`, and `EXECUTION_TMPFS_MB`.
- Docker permissions, image build failures, and host resource pressure are common local setup blockers.
- The isolated Docker network can fall back to `bridge` if creation fails; treat that as an operational hardening item before production.
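
To make the runtime settings above concrete, the sketch below reads the `EXECUTION_*` variables and turns them into `docker run` flags. It is an illustration under stated assumptions: `RuntimeLimits`, `load_limits`, and `docker_run_args` are hypothetical names, the numeric fallback values are placeholders (this page does not state the real defaults), and mapping `EXECUTION_MAX_CPU_PERCENT` to `--cpus` is one plausible interpretation rather than the confirmed behavior.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class RuntimeLimits:
    network: str
    timeout_ms: int
    max_memory_mb: int
    max_cpu_percent: int
    tmpfs_mb: int


def load_limits(env: Mapping[str, str] = os.environ) -> RuntimeLimits:
    """Read runtime limits from the environment.

    Only the network default (automl-sandbox) comes from this page;
    the numeric fallbacks are placeholder assumptions.
    """
    return RuntimeLimits(
        network=env.get("EXECUTION_NETWORK", "automl-sandbox"),
        timeout_ms=int(env.get("EXECUTION_TIMEOUT_MS", "30000")),
        max_memory_mb=int(env.get("EXECUTION_MAX_MEMORY_MB", "512")),
        max_cpu_percent=int(env.get("EXECUTION_MAX_CPU_PERCENT", "100")),
        tmpfs_mb=int(env.get("EXECUTION_TMPFS_MB", "64")),
    )


def docker_run_args(limits: RuntimeLimits, image: str) -> List[str]:
    """Build a `docker run` argument list that applies the limits."""
    return [
        "docker", "run", "--rm",
        "--network", limits.network,
        "--memory", f"{limits.max_memory_mb}m",
        "--cpus", f"{limits.max_cpu_percent / 100:.2f}",  # assumed percent-to-cores mapping
        "--tmpfs", f"/tmp:size={limits.tmpfs_mb}m",
        image,
    ]


def allows_outbound_network(limits: RuntimeLimits) -> bool:
    """True when containers would run on the non-isolated bridge network."""
    return limits.network == "bridge"
```

The timeout is not a `docker run` flag, so in this sketch the caller would enforce `timeout_ms` itself, for instance by stopping the container when the deadline passes. A check such as `allows_outbound_network` is also a convenient place to log a warning when the isolated network has silently fallen back to `bridge`.
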
- Database-backed auth, notebooks, workflows, embeddings, deployments, and query cache require `DATABASE_URL`.
- Migrations must run before serving backend code that expects new tables/columns.
- File-backed artifacts and Postgres metadata must stay aligned; deleting files manually can orphan database rows.
- Local `npm run dev` manages a Postgres container, but production environments must provide durable Postgres and artifact storage.
- LLM workflows require `OPENAI_API_KEY` or compatible provider configuration.
- One active workflow per project/phase is allowed; a run older than 10 minutes is considered stale.
- Workflow turns are capped at 48 iterations, 10 calls to any single tool, and 5 identical calls.
- Model latency, rate limits, token budgets, and provider outages affect preprocessing, feature engineering, training, and NL-to-SQL UX.
- LLM outputs need validation and user review; generated SQL/code should not be trusted without execution safeguards and approval gates.
- Timeouts are configured separately for general LLM, preprocessing, thinking, and NL-to-SQL flows.
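
The workflow caps and the staleness rule above can be pictured as a small budget tracker. This is a hedged sketch, not the actual workflow engine: `TurnBudget` and `is_stale` are hypothetical names, "identical" is assumed to mean the same tool with the same arguments, and staleness is assumed to be measured from the run's start time.

```python
import json
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

MAX_ITERATIONS = 48        # values below come from this page
MAX_CALLS_PER_TOOL = 10
MAX_IDENTICAL_CALLS = 5
STALE_AFTER = timedelta(minutes=10)


class TurnBudget:
    """Tracks iteration and tool-call counts for one workflow turn."""

    def __init__(self) -> None:
        self.iterations = 0
        self.per_tool: Counter = Counter()
        self.per_call: Counter = Counter()

    def start_iteration(self) -> bool:
        """Return False once the iteration cap has been reached."""
        if self.iterations >= MAX_ITERATIONS:
            return False
        self.iterations += 1
        return True

    def allow_call(self, tool: str, args: Dict[str, Any]) -> bool:
        """Return False when the per-tool or identical-call cap is hit."""
        # Canonical JSON makes argument order irrelevant when comparing calls.
        key = (tool, json.dumps(args, sort_keys=True))
        if self.per_tool[tool] >= MAX_CALLS_PER_TOOL:
            return False
        if self.per_call[key] >= MAX_IDENTICAL_CALLS:
            return False
        self.per_tool[tool] += 1
        self.per_call[key] += 1
        return True


def is_stale(started_at: datetime, now: Optional[datetime] = None) -> bool:
    """A run older than 10 minutes is treated as stale."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current - started_at > STALE_AFTER
```

With this shape, a sixth identical call to the same tool is refused even though the per-tool cap of 10 has not been reached, and a stale run can be replaced by a new one for the same project/phase.
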
- Model comparison requires 2 to 5 models.
- Tuning allows 1 to 200 trials.
- Active deployments are capped at 5 per project.
- Prediction proxy traffic is rate-limited to 60 requests per minute per deployment.
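
The numeric limits above lend themselves to simple validation helpers. The sketch below is illustrative only: all function and class names are hypothetical, and the sliding-window algorithm is an assumption, since this page states the rate (60 requests per minute per deployment) but not how it is enforced.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Sequence

MIN_COMPARE, MAX_COMPARE = 2, 5       # model comparison bounds (from this page)
MIN_TRIALS, MAX_TRIALS = 1, 200       # tuning trial bounds (from this page)
MAX_ACTIVE_DEPLOYMENTS = 5            # per project (from this page)


def validate_comparison(model_ids: Sequence[str]) -> None:
    if not MIN_COMPARE <= len(model_ids) <= MAX_COMPARE:
        raise ValueError("model comparison requires 2 to 5 models")


def validate_trials(n_trials: int) -> None:
    if not MIN_TRIALS <= n_trials <= MAX_TRIALS:
        raise ValueError("tuning allows 1 to 200 trials")


def can_activate(active_deployments: int) -> bool:
    """True when the project is still below the active-deployment cap."""
    return active_deployments < MAX_ACTIVE_DEPLOYMENTS


class SlidingWindowLimiter:
    """Per-deployment limiter: at most `limit` requests per `window_s` seconds."""

    def __init__(self, limit: int = 60, window_s: float = 60.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, deployment_id: str, now: float) -> bool:
        hits = self._hits[deployment_id]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

In use, a caller would pass a monotonic timestamp such as `time.monotonic()` as `now`. Keeping the clock as a parameter makes the limiter easy to test, and keying by deployment ID keeps one busy deployment from consuming another's budget.
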
- Production must not use local default `JWT_SECRET` or placeholder SMTP/LLM settings.
- Keep `.env`, uploaded data, generated model artifacts, and prediction logs out of public artifacts.
- All new data-bearing routes should apply auth and project/deployment ownership middleware.
- Protected API routes require Bearer JWT and verified email unless non-production `DEV_BYPASS_EMAIL_VERIFICATION=true` is intentionally enabled.
- Deployment prediction routes use API key auth and rate limiting; preserve those controls when extending serving behavior.
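
A startup check along the following lines is one way to enforce the secret and email-verification rules above. It is a sketch under explicit assumptions: the function names are hypothetical, using `NODE_ENV=production` to detect production is an assumption about how the environment is identified, and `PLACEHOLDER_SECRETS` is an invented set because the real local default for `JWT_SECRET` is not given on this page.

```python
from typing import List, Mapping

# Hypothetical placeholder values; substitute the project's real local defaults.
PLACEHOLDER_SECRETS = {"", "changeme", "dev-secret"}


def is_production(env: Mapping[str, str]) -> bool:
    # Assumption: production is signalled by NODE_ENV=production.
    return env.get("NODE_ENV") == "production"


def email_verification_bypassed(env: Mapping[str, str]) -> bool:
    """The bypass only takes effect outside production."""
    flag = env.get("DEV_BYPASS_EMAIL_VERIFICATION", "").lower() == "true"
    return flag and not is_production(env)


def production_config_errors(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    errors: List[str] = []
    if not is_production(env):
        return errors
    if env.get("JWT_SECRET", "") in PLACEHOLDER_SECRETS:
        errors.append("JWT_SECRET is missing or set to a local default")
    if env.get("DEV_BYPASS_EMAIL_VERIFICATION", "").lower() == "true":
        errors.append("DEV_BYPASS_EMAIL_VERIFICATION must not be enabled in production")
    return errors
```

Failing fast at startup when `production_config_errors` returns anything keeps a misconfigured deployment from ever serving traffic with a default secret.
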
- GitLab CI runs install, lint, build, and test jobs for code branches and merge requests.
- Landing Vercel deploy jobs require Vercel environment variables and are scoped by branch/rules.
- Heavy benchmark and eval commands are documented as local/manual verification and are not currently required in every pipeline job.
- Align frontend phase names and backend workflow phase names explicitly before adding new phases.
- Keep API contracts close to route source so wiki/docs do not drift.
- Decide whether stale documented endpoints should be implemented or removed from repo docs during future maintenance.
- Review migration ownership/FK consistency when adding tables; newer and older tables use a mix of text IDs and UUID references.
- Keep Docker/network, database migrations, SMTP, OpenAI credentials, and persistent storage paths in the production readiness checklist.