Feature: Store full AI review data in metadata (checklist, image description) #2845

@MarkusNeusinger

Feature Request

Extend the metadata YAML files to store the complete AI review data, not just strengths/weaknesses.

Current State

plots/{spec-id}/metadata/{library}.yaml:

review:
  strengths:
    - "Clean code structure"
    - "Good use of alpha for overlapping points"
  weaknesses:
    - "Grid could be more subtle"

Proposed Extension

review:
  image_description: |
    The plot displays a scatter chart with blue (#306998) circular markers 
    showing the relationship between X and Y values. The title reads 
    "scatter-basic · matplotlib · pyplots.ai". Data points show positive 
    correlation with some scatter. Axes are labeled, grid is visible with 
    alpha=0.3. Overall clean layout with good whitespace.
  
  criteria_checklist:
    visual_quality:
      VQ-01_text_legibility:
        score: 10
        max: 10
        comment: "All text clearly readable at full size"
      VQ-02_no_overlap:
        score: 8
        max: 8
        comment: "No overlapping elements"
      VQ-03_element_visibility:
        score: 6
        max: 8
        comment: "Markers slightly small for data density"
      VQ-04_color_accessibility:
        score: 5
        max: 5
        comment: "Single color, colorblind-safe"
      VQ-05_layout_balance:
        score: 5
        max: 5
        comment: "Good proportions"
    
    spec_compliance:
      SC-01_plot_type:
        score: 8
        max: 8
        comment: "Correct scatter plot"
      SC-02_data_mapping:
        score: 5
        max: 5
        comment: "X/Y correctly assigned"
      SC-03_required_features:
        score: 5
        max: 5
        comment: "All spec features present"
      SC-06_title_format:
        score: 2
        max: 2
        comment: "Correct format: spec-id · library · pyplots.ai"
    
    data_quality:
      DQ-01_feature_coverage:
        score: 8
        max: 8
        comment: "Shows correlation pattern well"
      DQ-02_realistic_context:
        score: 7
        max: 7
        comment: "Plausible data scenario"
      DQ-03_appropriate_scale:
        score: 5
        max: 5
        comment: "Sensible axis ranges"
    
    code_quality:
      CQ-01_kiss_structure:
        score: 3
        max: 3
        comment: "No functions/classes, simple script"
      CQ-02_reproducibility:
        score: 3
        max: 3
        comment: "Fixed seed (np.random.seed(42))"
    
    library_features:
      LF-01_distinctive_features:
        score: 4
        max: 5
        comment: "Uses matplotlib idioms but could leverage more"
  
  strengths:
    - "Clean code structure"
    - "Good use of alpha for overlapping points"
  
  weaknesses:
    - "Grid could be more subtle"
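One practical consequence of the nested layout is that category and overall totals can be derived from the checklist instead of being stored separately. The following is a minimal sketch, assuming the checklist has already been loaded into a plain dict (as a YAML loader would typically return); `category_totals` is a hypothetical helper, and the sample data is a shortened excerpt of the example above.

```python
from typing import Dict, Tuple

# Shortened excerpt of the proposed criteria_checklist, as a plain dict.
checklist = {
    "visual_quality": {
        "VQ-02_no_overlap": {
            "score": 8, "max": 8, "comment": "No overlapping elements"},
        "VQ-03_element_visibility": {
            "score": 6, "max": 8,
            "comment": "Markers slightly small for data density"},
    },
    "library_features": {
        "LF-01_distinctive_features": {
            "score": 4, "max": 5,
            "comment": "Uses matplotlib idioms but could leverage more"},
    },
}


def category_totals(checklist: dict) -> Dict[str, Tuple[int, int]]:
    """Return {category: (achieved_points, maximum_points)}."""
    totals = {}
    for category, criteria in checklist.items():
        achieved = sum(item["score"] for item in criteria.values())
        maximum = sum(item["max"] for item in criteria.values())
        totals[category] = (achieved, maximum)
    return totals


totals = category_totals(checklist)
# Overall score is the sum over all categories.
overall = (
    sum(achieved for achieved, _ in totals.values()),
    sum(maximum for _, maximum in totals.values()),
)
print(totals)
print(overall)
```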

Benefits

  1. Image Description

    • Gives evidence that the AI actually looked at the image
    • Useful for accessibility (alt text generation)
    • Debugging: verify what AI "saw"
    • Could be shown on hover/detail view
  2. Criteria Checklist

    • Detailed scoring breakdown (not just total)
    • Per-criterion comments explain deductions
    • Helps identify patterns (which criteria fail most?)
    • Useful for regeneration (AI knows exactly what to fix)
    • Transparency for users
  3. Better Regeneration

    • AI can read previous detailed feedback
    • Target specific criteria that scored low
    • Preserve what worked (high-scoring criteria)
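To illustrate the regeneration benefit, the sketch below picks out the criteria that lost points so a regeneration prompt could target them. It is only an illustration under assumptions: `criteria_to_fix` is a hypothetical helper, and the input is assumed to follow the nested structure proposed above.

```python
from typing import List, Tuple

# Shortened excerpt of the proposed criteria_checklist, as a plain dict.
checklist = {
    "visual_quality": {
        "VQ-02_no_overlap": {
            "score": 8, "max": 8, "comment": "No overlapping elements"},
        "VQ-03_element_visibility": {
            "score": 6, "max": 8,
            "comment": "Markers slightly small for data density"},
    },
    "library_features": {
        "LF-01_distinctive_features": {
            "score": 4, "max": 5,
            "comment": "Uses matplotlib idioms but could leverage more"},
    },
}


def criteria_to_fix(checklist: dict) -> List[Tuple[str, int, str]]:
    """Return (criterion_id, lost_points, comment) for every criterion
    scored below its maximum, largest loss first."""
    losses = []
    for criteria in checklist.values():
        for criterion_id, item in criteria.items():
            lost = item["max"] - item["score"]
            if lost > 0:
                losses.append((criterion_id, lost, item["comment"]))
    # Sort by lost points, descending; ties keep their original order.
    return sorted(losses, key=lambda entry: entry[1], reverse=True)


to_fix = criteria_to_fix(checklist)
for criterion_id, lost, comment in to_fix:
    print(f"{criterion_id}: -{lost} ({comment})")
```

Criteria at full score are left out of the result, which matches the goal of preserving what already worked.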

Implementation

1. Update prompts/quality-evaluator.md

Instruct the AI to write the structured checklist and the image description to files:

echo '{...}' > review_checklist.json
echo "Image description text..." > review_image_description.txt

2. Update impl-review.yml

Parse both files and include their contents in the metadata update step.
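A rough sketch of what this parsing step might look like in Python, assuming the two files from step 1 sit in the working directory. `load_review_extras` and `merge_into_review` are hypothetical names, not existing project code, and writing the result back to the metadata YAML is left out because it depends on the YAML library the workflow already uses.

```python
import json
import tempfile
from pathlib import Path


def load_review_extras(workdir: Path) -> dict:
    """Read the optional review files; skip any that are missing."""
    extras = {}
    checklist_path = workdir / "review_checklist.json"
    description_path = workdir / "review_image_description.txt"
    if checklist_path.exists():
        extras["criteria_checklist"] = json.loads(
            checklist_path.read_text(encoding="utf-8"))
    if description_path.exists():
        extras["image_description"] = description_path.read_text(
            encoding="utf-8").strip()
    return extras


def merge_into_review(metadata: dict, extras: dict) -> dict:
    """Return a copy of metadata with extras added under 'review'.
    Existing strengths/weaknesses are kept unchanged."""
    review = dict(metadata.get("review") or {})
    review.update(extras)
    return {**metadata, "review": review}


# Demo with temporary files standing in for the AI's output.
with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    sample_checklist = {"code_quality": {"CQ-01_kiss_structure": {
        "score": 3, "max": 3,
        "comment": "No functions/classes, simple script"}}}
    (workdir / "review_checklist.json").write_text(
        json.dumps(sample_checklist), encoding="utf-8")
    (workdir / "review_image_description.txt").write_text(
        "The plot displays a scatter chart.\n", encoding="utf-8")

    metadata = {"review": {"strengths": ["Clean code structure"],
                           "weaknesses": ["Grid could be more subtle"]}}
    merged = merge_into_review(metadata, load_review_extras(workdir))

print(sorted(merged["review"]))
```

Treating both files as optional keeps the metadata update working for reviews produced before the prompt change.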

3. Update prompts/templates/metadata.yaml

Add the new image_description and criteria_checklist fields to the template.

Questions

  1. Store the checklist as nested YAML or as a JSON string inside the YAML?
  2. Include the new fields in the database sync or keep them only in the repo?
  3. Expose them in the API/frontend?

🤖 Generated with Claude Code

Labels: enhancement (New feature or request)
