Paste two SQL schemas (before/after). Migra tells you what will break, what data will be lost, and how risky the migration is — before you run it.
Backend
cd backend
pip install -r requirements.txt
cp ../.env.example .env # add your GROQ_API_KEY
python app.py

Frontend
cd frontend
npm install
npm start

SQL Before + SQL After
↓
parser.py → parses SQL into AST-like Python dicts + computes diff
↓
rules/ → each rule evaluates the diff independently (pure functions)
↓
analyzer.py → collects flags, computes risk score (0–100)
↓
ai_layer.py → Groq explains findings in plain English (advisory only)
↓
app.py → Flask API returns full report
API Endpoints
| Method | Route | Description |
|---|---|---|
| POST | /analyze | Run analysis, returns full report |
| GET | /analyses | List past analyses |
| GET | /analyses/:id | Full report for one analysis |
| GET | /health | Liveness check |
Rule engine as pure functions. Every rule in rules/ is a pure function of the schema diff — no DB access, no API calls, no side effects. This makes each rule trivially testable in isolation and guarantees that adding a new rule requires creating one new file with zero changes to the rest of the codebase.
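As a sketch of what such a pure rule looks like — the `RiskFlag` shape and the diff keys here are assumptions for illustration, not the project's actual types:

```python
from dataclasses import dataclass

# RiskFlag and the diff keys below are assumptions, not the
# project's actual types.
@dataclass
class RiskFlag:
    severity: str
    message: str

def dropped_column_rule(diff: dict) -> list[RiskFlag]:
    """Pure: depends only on the diff — no DB access, no API calls."""
    return [
        RiskFlag("HIGH", f"Column {table}.{col} will be dropped (data loss)")
        for table, cols in diff.get("dropped_columns", {}).items()
        for col in cols
    ]
```

Because the rule takes a plain dict and returns plain values, a test is just a function call and an assertion — no fixtures, no mocks.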
AI is advisory, never authoritative. The risk score is always computed by the deterministic rule engine. Groq provides plain-English explanation only. If the Groq call fails, the analysis still completes. This is enforced structurally — the AI layer is the last step and its output is stored in a separate field.
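That contract can be sketched as a hypothetical pipeline tail, where a failing explanation call never touches the deterministic result (the scoring formula here is illustrative, not Migra's):

```python
# Hypothetical sketch: the risk score comes from the rule engine
# alone; the AI explanation is best-effort.
def run_analysis(diff, rules, explain_with_groq):
    flags = [flag for rule in rules for flag in rule(diff)]
    score = min(100, 25 * len(flags))  # illustrative scoring only
    report = {"risk_score": score, "flags": flags, "ai_explanation": None}
    try:
        report["ai_explanation"] = explain_with_groq(flags)
    except Exception:
        pass  # a failed Groq call never blocks or alters the result
    return report
```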
Normalized DB schema. Risk flags are stored in a separate table (not a JSON blob in the analysis row). This makes them independently queryable — e.g. "show me all HIGH severity flags across all analyses."
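A minimal illustration of that layout with stdlib `sqlite3` — the table and column names here are illustrative, not necessarily the project's:

```python
import sqlite3

# Illustrative only — actual table/column names may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE analyses (id INTEGER PRIMARY KEY, created_at TEXT);
CREATE TABLE risk_flags (
    id INTEGER PRIMARY KEY,
    analysis_id INTEGER REFERENCES analyses(id),
    severity TEXT NOT NULL,
    message TEXT NOT NULL
);
""")
conn.execute("INSERT INTO analyses VALUES (1, '2024-01-01')")
conn.execute("INSERT INTO risk_flags (analysis_id, severity, message) "
             "VALUES (1, 'HIGH', 'column dropped')")

# Flags-as-rows makes cross-analysis queries plain SQL:
high = conn.execute(
    "SELECT analysis_id, message FROM risk_flags WHERE severity = 'HIGH'"
).fetchall()
```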
Validation at the boundary. Pydantic validates all incoming requests at the Flask route level. The pipeline never sees invalid input.
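A hedged sketch of that boundary — the field names (`before_sql`, `after_sql`) and the route wrapper are hypothetical, but the shape is the standard Pydantic pattern:

```python
from pydantic import BaseModel, ValidationError

class AnalyzeRequest(BaseModel):
    before_sql: str  # field names are assumptions for illustration
    after_sql: str

def analyze_route(raw: dict):
    """Validate at the boundary; the pipeline only sees typed input."""
    try:
        req = AnalyzeRequest(**raw)
    except ValidationError as exc:
        return {"error": str(exc)}, 400
    return {"ok": True, "tables_in": req.before_sql.count("CREATE TABLE")}, 200
```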
SQLite. Zero setup, ships with Python, sufficient for this use case. Swap DATABASE_URL for a Postgres connection string to migrate with no code changes.
cd backend
pytest tests/ -v

31 tests covering every rule, the parser, and the full analysis pipeline.
Claude was used for:
- Generating boilerplate (Flask routes, SQLAlchemy models)
- Scaffolding test cases
- Writing rollback SQL generation logic
All generated code was reviewed against the constraints defined in claude.md before committing. The AI guidance file (claude.md) defines hard rules that constrain AI agent behavior — specifically preventing rules from touching the DB, preventing the AI layer from affecting the risk score, and enforcing the one-file-per-rule structure.
Groq (llama3-8b-8192) is used at runtime as an advisory explanation layer only.
Column rename detection is not implemented. Detecting renames requires heuristics (dropped column + added column with same type = probable rename). This is out of scope — the system correctly reports the drop and add as separate events instead.
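For reference, the heuristic described above could be sketched like this — explicitly *not* implemented in Migra, and the dict shapes (`table -> {column: type}`) are hypothetical:

```python
# Sketch of the rename heuristic Migra deliberately leaves out.
# dropped/added map table -> {column_name: column_type}; these
# shapes are hypothetical, not the parser's real diff format.
def probable_renames(dropped: dict, added: dict) -> list[tuple]:
    renames = []
    for table, old_cols in dropped.items():
        for old_name, old_type in old_cols.items():
            for new_name, new_type in added.get(table, {}).items():
                if old_type == new_type:
                    renames.append((table, old_name, new_name))
    return renames
```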
Parser handles generic SQL only. Dialect-specific features (Postgres JSONB, MySQL ENUM, etc.) are not parsed. The parser is conservative — it will ignore lines it doesn't understand rather than misclassify them.
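A toy illustration of that conservative stance — the prefix list and function name are hypothetical; the real parser.py is more involved:

```python
# Toy version of "ignore rather than misclassify": only statements
# the parser recognizes survive. Prefixes here are illustrative.
RECOGNIZED = ("CREATE TABLE", "ALTER TABLE", "DROP TABLE")

def keep_recognized(sql: str) -> list[str]:
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    return [s for s in statements if s.upper().startswith(RECOGNIZED)]
```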
No authentication. This is a single-user developer tool. Production deployment would need API key auth on the analyze endpoint.
AI layer latency. The Groq call adds ~1–2s to analysis time. It is non-blocking in the error path but still synchronous in the happy path. A queue-based async approach would improve this for high-traffic use.
Adding a new rule:
- Create `backend/rules/your_rule.py`, inherit `BaseRule`, implement `evaluate(diff) -> list[RiskFlag]`
- Register in `backend/rules/__init__.py`
- Add tests in `backend/tests/test_rules.py`
- Nothing else changes
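A skeleton of such a rule file might look like this — `BaseRule` below is a stand-in, since the real base class's exact signature isn't shown in this README:

```python
# Hypothetical skeleton for backend/rules/your_rule.py; BaseRule is
# a stand-in here, not the project's actual base class.
class BaseRule:
    def evaluate(self, diff) -> list:
        raise NotImplementedError

class DroppedTableRule(BaseRule):
    """Flags every dropped table as HIGH risk. Pure function of the diff."""
    def evaluate(self, diff) -> list:
        return [("HIGH", f"Table {t} will be dropped")
                for t in diff.get("dropped_tables", [])]
```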
See claude.md for full constraints on how AI agents should extend this system.