Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 27 additions & 24 deletions prompts/library/altair.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,18 +10,16 @@ import altair as alt

The saved PNG must be **exactly** one of these two sizes (post-render gate in `impl-review.yml` rejects anything off by more than 16 px and re-triggers repair):

| Orientation | View dims | scale_factor | Final PNG |
|-------------|----------------|--------------|---------------|
| Landscape | width=800, height=450 | 4.0 | 3200 × 1800 |
| Square | width=600, height=600 | 4.0 | 2400 × 2400 |
| Orientation | Inner view dims | scale_factor | Final PNG |
|-------------|------------------------|--------------|---------------|
| Landscape | width=620, height=320 | 4.0 | → fits in 3200 × 1800 with title+legend padding |
| Square | width=500, height=460 | 4.0 | → fits in 2400 × 2400 with title+legend padding |

Altair's **view dimensions are NOT the saved PNG dimensions.** `vl-convert` pads the view with title, axis-title, axis-tick-label, and legend extents *outside* `width`/`height`, so a chart with `width=800, height=450` and a long title easily saves at 3404×2120 or 4036×2052. This is the documented cause of every altair drift in the May 2026 fan-out.
**Altair's view dimensions are NOT the saved PNG dimensions.** `vl-convert` pads the view with title, axis title, axis tick labels, and legend extents *outside* the chart's `width` / `height`, so the SAVED file is always **larger** than `(width * scale_factor, height * scale_factor)`. In the May 2026 fan-out, `width=800, height=450` (which naively maps to 3200×1800) actually saved at 3404×2120 or 4036×2052 because the title and legend added 100–200 px of padding on each side.

**Two-part fix, both required:**
**The fix is to size the inner view smaller** so vl-convert's extra padding still leaves the final PNG within the target. The values above (`620×320` for landscape, `500×460` for square) leave roughly 360–500 px of width and 400–500 px of height available for the title bar, axis labels, and right-side legend. Tune within ±20 px if your chart genuinely needs more room, but **stay under 800×450 / 600×600** — the old defaults are guaranteed to overshoot.

1. **Constrain the view** with `configure_view(continuousWidth=…, continuousHeight=…)` and **zero out chart padding** with `.properties(padding={"left":0, "right":0, "top":0, "bottom":0})`. This stops most of the inflation but does not guarantee exact dims when titles or legends are present.

2. **Normalize the saved PNG to exact target dims as the final step** of the implementation. Even with (1), vl-convert can over- or under-shoot by tens of pixels. After `chart.save(...)`, crop (centered) or pad (with `PAGE_BG`) so the saved file lands on the canonical target. Keep this as **inline code** — no helper function (CQ-01 KISS forbids functions/classes in plot impls):
**Then PAD the saved PNG up to exact target dims.** Even with the smaller view, vl-convert can land slightly short. After `chart.save(...)`, pad the canvas with `PAGE_BG` to land exactly on the canonical target. **Do NOT crop.** Cropping silently destroys title/axis-label content at the edges (which the gate cannot detect — it only checks pixel count) and the AI review will flag the missing text as severe edge-clipping (AR-09 auto-reject):

```python
# Right after chart.save(f'plot-{THEME}.png', scale_factor=4.0):
Expand All @@ -30,16 +28,21 @@ from PIL import Image
TW, TH = 3200, 1800 # or (2400, 2400) for square
_img = Image.open(f'plot-{THEME}.png').convert('RGB')
_w, _h = _img.size
if _w > TW or _h > TH: # crop excess centred
_l = max((_w - TW) // 2, 0)
_t = max((_h - TH) // 2, 0)
_img = _img.crop((_l, _t, _l + min(_w, TW), _t + min(_h, TH)))
_w, _h = _img.size
if _w < TW or _h < TH: # pad shortfall with PAGE_BG
if _w > TW or _h > TH:
# vl-convert overshot the inner-view target. This is a real bug in the
# chart definition (probably oversized title fontSize, oversized legend,
# or width/height too large). Fail loudly so impl-repair triggers — do
# NOT crop, because cropping clips title/axis labels and the AR-09 edge-
# clipping auto-reject will catch it anyway.
raise SystemExit(
f"altair vl-convert produced {_w}×{_h}, exceeds target {TW}×{TH}. "
f"Shrink chart .properties(width=, height=) values and re-render."
)
if _w < TW or _h < TH:
# PAD-only: centre the chart on the target canvas with PAGE_BG fill.
_canvas = Image.new('RGB', (TW, TH), PAGE_BG)
_canvas.paste(_img, ((TW - _w) // 2, (TH - _h) // 2))
_img = _canvas
_img.save(f'plot-{THEME}.png')
_canvas.save(f'plot-{THEME}.png')
```

The HTML save is left untouched — only PNGs are gated.
Expand All @@ -49,13 +52,13 @@ chart = alt.Chart(df).mark_point(size=60).encode( # size ~2-3x default, density
x='col_x:Q',
y='col_y:Q'
).properties(
width=800,
height=450,
width=620, # see Canvas table — landscape inner-view
height=320,
padding={"left": 0, "right": 0, "top": 0, "bottom": 0},
title=alt.Title(title, fontSize=16) # kept compact for the long mandated title
).configure_view(
continuousWidth=800,
continuousHeight=450,
continuousWidth=620,
continuousHeight=320,
).configure_axis(
labelFontSize=10,
titleFontSize=12
Expand Down Expand Up @@ -99,8 +102,8 @@ x='date:T'
```python
# Hard target: 3200 × 1800 (landscape) or 2400 × 2400 (square). See "Canvas" above.
chart.save(f'plot-{THEME}.png', scale_factor=4.0)
# Follow with the inline crop/pad-to-target block from the "Canvas" section above
# (no helper functionKISS).
# Then apply the inline PAD-only-to-target block from the "Canvas" section above.
# Do NOT cropcropping would clip title/axis labels and trigger AR-09.
```

**Note**: Requires `vl-convert-python` for PNG export.
Expand Down Expand Up @@ -166,7 +169,7 @@ chart = (
)

chart.save(f'plot-{THEME}.png', scale_factor=4.0)
# Apply the inline crop/pad-to-(3200,1800)-or-(2400,2400) block from the
# Apply the inline PAD-only-to-(3200,1800)-or-(2400,2400) block from the
# "Canvas" section above — MANDATORY, no helper function (KISS / CQ-01).
chart.save(f'plot-{THEME}.html')
```
Expand Down
32 changes: 28 additions & 4 deletions prompts/quality-criteria.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,9 @@ Implementation
┌─────────────────────┐
│ Stage 1: Auto-Reject │ ──► FAIL → Score = 0, regenerate
│ (8 quick checks) │
│ Stage 1: Auto-Reject │ ──► FAIL → Score = 0
│ (9 checks) │ AR-01..AR-05, AR-07: regenerate (workflow)
│ │ AR-06, AR-08, AR-09: repair via review cascade (AI)
└─────────────────────┘
│ PASS
Expand All @@ -32,7 +33,7 @@ Implementation

## Stage 1: Auto-Reject

Quick checks **before** AI evaluation. On fail: Score=0, no retry.
Checks that gate quality scoring. On fail: Score=0. Workflow-handled checks (AR-01..AR-05, AR-07) reject without retry — regenerate the whole impl. AI-handled checks (AR-06, AR-08, AR-09) set score=0 inside the review and enter the existing 5-review / 4-repair cascade.

| ID | Check | Description | Verification |
|----|-------|-------------|--------------|
Expand All @@ -44,8 +45,9 @@ Quick checks **before** AI evaluation. On fail: Score=0, no retry.
| AR-06 | NOT_FEASIBLE | Library cannot implement spec | AI decision |
| AR-07 | WRONG_FORMAT | Wrong output type | Not .png for static libraries |
| AR-08 | FAKE_FUNCTIONALITY | Static library simulates interactive features | AI decision |
| AR-09 | EDGE_CLIPPING | Title / axis label / legend clipped at canvas border | AI decision (visual) |

**Check order:** AR-01 → AR-02 → AR-03 → AR-04 → AR-05 → AR-06 → AR-07 → AR-08
**Check order:** AR-01 → AR-02 → AR-03 → AR-04 → AR-05 → AR-06 → AR-07 → AR-08 → AR-09

### AR-05: Library Usage

Expand Down Expand Up @@ -84,6 +86,28 @@ A static library (matplotlib, seaborn, plotnine) simulates interactive features
- Color encoding of time direction (arrows, gradients showing progression)
- Honest notes like "See Plotly version for interactive features"

### AR-09: Edge Clipping

Any text element — title, axis title, axis tick labels, legend, annotations — is **clipped at the canvas border**, meaning visible pixels of the element are missing because they were rendered outside the saved PNG's bounding box and chopped off.

This is distinct from VQ-05's soft "no overflow" check (deducts when text leaves its *axis* but stays on the canvas). AR-09 rejects outright when pixels are missing at the *canvas* edge. The post-render canvas-size gate enforces dimensions but cannot see what is at those edges — that's the reviewer's job.

**Triggers (auto-reject, Score = 0):**
- Title cropped at top edge (top of letters cut, descenders missing, title not fully visible above the plot area)
- Y-axis tick labels missing leftmost digit because they touch the left canvas edge ("500" rendered as "00")
- X-axis label cut at bottom edge (axis title only half-visible at the canvas bottom)
- Legend entries hidden behind / merged into the canvas edge
- Any annotation, label, or category text whose bounding box is partially outside the saved PNG

**NOT auto-reject (legitimate / handled by VQ-05 instead):**
- Tooltips or hover affordances drawn intentionally near the edge
- Decorative gridlines or borders aligned with the canvas edge
- Text that overflows its *axis bounds* but stays fully within the canvas — that's a VQ-05 deduction at most, not AR-09
- Tight-but-readable margins: every pixel of the text is visible, just close to the edge
- Touching the border without missing pixels (proximity ≠ clipping)

The bar is strict: AR-09 requires evidence that pixels were *removed*, not merely that an element sits near the boundary.

---

## Allowed Image Formats
Expand Down
24 changes: 22 additions & 2 deletions prompts/quality-evaluator.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,12 +25,17 @@ If any fail: Score = 0, no AI review needed.
Before scoring, check for:
- **AR-06: NOT_FEASIBLE** — Library cannot implement the spec
- **AR-08: FAKE_FUNCTIONALITY** — Static library simulates interactive features
- **AR-09: EDGE_CLIPPING** — Title, axis label, legend, or other text is clipped at the canvas border

Comment on lines 25 to 29
**AR-08 triggers:** Simulated tooltips, simulated selection/hover state, simulated UI controls, code comments containing "simulating hover/click/interactivity."

**AR-08 exceptions (NOT auto-reject):** Small multiples for animation, cell annotations in heatmaps, color encoding of time direction, honest notes about interactive alternatives.

If AR-06 or AR-08 triggers: Score = 0, recommendation = "reject", include `auto_reject` field in output.
**AR-09 triggers:** Title cropped at top edge of canvas, axis tick labels missing their leftmost/rightmost character because they touch the canvas edge, x-axis label cut at bottom edge, legend hidden behind canvas border, or any text whose bounding box is partially outside the saved PNG. This is the catalog's most visible failure mode — a chart that publishes with chopped-off text is broken to every viewer. AR-09 is distinct from the soft VQ-05 "no overflow" check: VQ-05 deducts when text leaves its axis but stays on the canvas; AR-09 fires when pixels are actually clipped at the canvas border. The post-render canvas-size gate enforces dimensions but cannot see what is at those edges — that's the reviewer's job.

**AR-09 exceptions (NOT auto-reject):** Tooltips or hover affordances, decorative gridlines or borders aligned with the canvas edge, text that overflows its axis but remains fully within the canvas, tight-but-readable margins.

If AR-06, AR-08, or AR-09 triggers: Score = 0, recommendation = "reject", include `auto_reject` field in output identifying which AR fired and (for AR-09) which element on which edge.

### Stage 2: Quality (your task)

Expand Down Expand Up @@ -161,7 +166,7 @@ You evaluate implementations that passed all auto-reject checks. Focus purely on

## Evaluation Process

### Step 0: Check for Fake Functionality (AR-08)
### Step 0a: Check for Fake Functionality (AR-08)

**For static libraries (matplotlib, seaborn, plotnine, ggplot2) only:**

Expand All @@ -172,6 +177,21 @@ Scan the code and image for:

If found: `auto_reject: "AR-08"`, score = 0, stop evaluation.

### Step 0b: Check for Edge Clipping (AR-09)

**For all libraries.** Inspect both `plot-light.png` and `plot-dark.png` along all four canvas borders. Trigger AR-09 only when visible pixels of an element are **actually missing** because the element was rendered partially outside the saved PNG. Proximity, touching, or tight margins are NOT AR-09 — only chopped pixels are.

Concrete triggers (each is sufficient on its own):
- Plot title cropped at the top edge (top of letters cut, descenders missing).
- Y-axis tick labels missing leftmost digit/character because they overflow the left canvas edge (e.g. "500" rendered as "00").
- X-axis label cut at the bottom edge.
- Legend entries hidden behind / merged into the canvas edge with letters chopped off.
- Any annotation or category label whose bounding box is partially outside the saved PNG.

If found: `auto_reject: "AR-09"`, score = 0, stop evaluation. Identify which element on which edge (e.g. `"title clipped at top edge of light render"`) in the rejection note so the repair step knows where to shrink.

Not AR-09 (handle via VQ-05 instead): text overflowing its axis but staying on the canvas, decorative borders aligned with the edge, tooltips, tight-but-readable margins where every pixel of the text is visible.

### Step 1: Visual Quality (30 pts)

**Inspect BOTH `plot-light.png` AND `plot-dark.png`.** The data colors (Okabe-Ito positions 1–7) must be identical across themes; only chrome (background, text, grid, legend frame) flips. If only one render is provided, that is a pipeline failure — flag in `weaknesses`, score VQ-07 accordingly.
Expand Down
25 changes: 23 additions & 2 deletions prompts/workflow-prompts/ai-quality-review.md
Original file line number Diff line number Diff line change
Expand Up @@ -104,9 +104,9 @@ Visually estimate from each PNG — no pixel measurement needed. These are soft
- "Title spans ~80% of width at fontsize=14pt." → Expected for the long mandated anyplot title; no deduction.
- "Y-axis label 'Fläche von Häusern in Quadratmetern' takes ~40% of axis length at fontsize=12pt." → Genuinely long label at sensible fontsize; no deduction as long as it doesn't overflow the axis.

### 6. Check for Auto-Reject (AR-08)
### 6. Check for Auto-Reject (AR-08, AR-09)

**For static libraries (matplotlib, seaborn, plotnine, ggplot2) only:**
**AR-08 — Fake interactivity (static libraries only — matplotlib, seaborn, plotnine, ggplot2):**

Before scoring, check if the implementation fakes interactive features:
- Simulated tooltips (annotation boxes styled as hover tooltips)
Expand All @@ -116,6 +116,27 @@ Before scoring, check if the implementation fakes interactive features:

If found: Score = 0, verdict = REJECTED, note AR-08 violation.

**AR-09 — Edge clipping (all libraries):**

Inspect both renders for any title, axis tick label, axis title, legend, or annotation that has **visible pixels chopped off at the canvas border** — i.e. the element was rendered partially outside the saved PNG's bounding box and the missing pixels are gone for good. This is the single most embarrassing failure mode for the catalog: a chart with chopped-off text publishes broken into the gallery.

**Strict definition:** AR-09 fires only when **pixels of the element are actually missing**. Proximity to the border, touching the border, or being rendered right up against the edge with all pixels visible is **not** AR-09 — that's at most a VQ-05 deduction. The bar is evidence of chopped content, not crowded margins.

Trigger AR-09 if you see ANY of:
- **Title cropped at the top edge** — top of letters cut off, descenders missing, or title not fully visible above the plot area.
- **Y-axis tick labels missing their leftmost digit/character** because the label extends past the left canvas edge (e.g. "500" rendered as "00", "1,000" as ",000").
- **X-axis label cut at the bottom edge** — axis title only partially visible, descender row chopped.
- **Legend entries hidden behind / merged into the canvas edge** with letters chopped off.
- **Any annotation, label, or category text whose bounding box is partially outside the saved PNG** so part of the text is gone.

If found: **Score = 0, verdict = REJECTED, note AR-09 violation** and identify which element(s) were clipped and on which edge (e.g. "title clipped at top edge of light render — top ~10 px of letters missing"). Repair will receive this and shrink the inner-chart dims so vl-convert / matplotlib / etc. don't push content off the canvas.

**False-positive guard — do NOT trigger AR-09 for:**
- Text that extends past the plot/axis bounds but stays *within* the canvas (VQ-05 deduction at most).
- Tooltips, legend swatches, or grid lines aligned with the canvas border by design.
- Tight-but-readable margins where every pixel of the text is visible — proximity ≠ clipping.
- Touching the border without any missing pixels — touching ≠ chopped.

### 7. Evaluate Using 6-Category Criteria

Read `prompts/quality-criteria.md` and evaluate:
Expand Down
Loading