
feat: Add Pydantic models for stricter YAML card validation #2430

@khushal-winner

Is your feature request related to a problem? Please describe.
Current YAML card loading in scripts/ (primarily in convert.py and related scripts) uses yaml.safe_load without any schema validation or type checking.

This leads to:

  • Silent failures or hard-to-debug crashes if cards miss required fields (e.g. title, description), have wrong types (e.g. mappings as string instead of dict), or contain extra/unknown fields.
  • Poor error messages for contributors/translators (no clear indication of which field or card is invalid).
  • Risk of invalid data propagating to the website/card browser, causing UI issues or incorrect threat modeling output.
  • Difficulty maintaining consistency as new suits/editions or languages are added.

Describe the solution you'd like
Introduce Pydantic models to define and enforce a strict schema for card YAML structures at load time.

Key aspects:

  • Create a base Card model with required/typed fields (title: str, description: str, mappings: Dict, etc.)
  • Use Pydantic's ValidationError for clear, field-specific errors
  • Integrate into convert.py (or central loading function) to validate before processing
  • Start with common fields across suits, later extend for suit-specific ones
  • Add unit tests for valid/invalid cases

Example:

import yaml
from pydantic import BaseModel, Field, ValidationError
from typing import Dict, List, Optional

class CardMapping(BaseModel):
    capec: Optional[str] = None
    asvs: Optional[List[str]] = None

class Card(BaseModel):
    title: str = Field(..., min_length=1)
    description: str = Field(..., min_length=10)
    mappings: Dict[str, CardMapping] = Field(default_factory=dict)
    # suit-specific: Optional[str] = None

# In convert.py (file_content and filename come from the surrounding loader)
try:
    card_data = yaml.safe_load(file_content)
    validated = Card(**card_data)  # raises ValidationError if invalid
except ValidationError as e:
    # Chain the original exception so the full traceback is preserved
    raise ValueError(f"Invalid card in {filename}: {e.errors()}") from e
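
To make the "clear, field-specific errors" point concrete, here is a minimal sketch that extends the example above. It assumes Pydantic v2 (`ConfigDict`, `model_validate`). The helpers `describe_errors` and `validate_card` and the mapping key `VE2` are hypothetical names used only for illustration. Rejecting unknown fields via `extra="forbid"` is an assumed policy, not something the current scripts do.

```python
from typing import Dict, List, Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class CardMapping(BaseModel):
    # Assumed policy: reject keys the schema does not know about.
    model_config = ConfigDict(extra="forbid")

    capec: Optional[str] = None
    asvs: Optional[List[str]] = None


class Card(BaseModel):
    model_config = ConfigDict(extra="forbid")

    title: str = Field(..., min_length=1)
    description: str = Field(..., min_length=10)
    mappings: Dict[str, CardMapping] = Field(default_factory=dict)


def describe_errors(filename: str, exc: ValidationError) -> List[str]:
    """Turn a ValidationError into one readable line per failing field."""
    lines = []
    for err in exc.errors():
        # err["loc"] is a tuple such as ("mappings", "VE2", "asvs").
        location = ".".join(str(part) for part in err["loc"])
        lines.append(f"{filename}: {location}: {err['msg']}")
    return lines


def validate_card(filename: str, data: dict) -> Card:
    """Hypothetical helper: validate one already-parsed card dict."""
    try:
        return Card.model_validate(data)
    except ValidationError as exc:
        raise ValueError("\n".join(describe_errors(filename, exc))) from exc
```

With this shape, a card that is missing `title` and carries a stray key would produce one line per problem, each prefixed with the file name and the dotted path of the offending field, which is the kind of message a translator can act on.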

Benefits:

Describe alternatives you've considered

  1. Keep current yaml.safe_load + manual checks
    → Works for basics, but is repetitive, misses nested validation, and produces unclear error messages.

  2. Use jsonschema or cerberus
    → Good, but Pydantic offers better type hints, IDE support, error messages, and Pythonic API.

  3. Do nothing
    → Risks invalid data propagating to converter/website.
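
For comparison, alternative 1 might look roughly like the sketch below. It uses only the standard library and checks the same three fields as the proposed Card model. The function name `manual_check` and the error wording are hypothetical. Every new field or nesting level needs another hand-written branch, which is the repetition mentioned above.

```python
from typing import Any, List


def manual_check(card: Any) -> List[str]:
    """Hand-written checks for title, description and mappings."""
    if not isinstance(card, dict):
        return ["card must be a mapping"]

    errors = []
    title = card.get("title")
    if not isinstance(title, str) or not title:
        errors.append("title: must be a non-empty string")

    description = card.get("description")
    if not isinstance(description, str) or len(description) < 10:
        errors.append("description: must be a string of at least 10 characters")

    mappings = card.get("mappings", {})
    if not isinstance(mappings, dict):
        errors.append("mappings: must be a mapping")
        return errors

    # Nested validation has to be spelled out level by level.
    for key, value in mappings.items():
        if not isinstance(value, dict):
            errors.append(f"mappings.{key}: must be a mapping")
            continue
        asvs = value.get("asvs")
        if asvs is not None and not (
            isinstance(asvs, list) and all(isinstance(x, str) for x in asvs)
        ):
            errors.append(f"mappings.{key}.asvs: must be a list of strings")
    return errors
```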

Additional context

  • Builds directly on #2406 (fix: explicitly use FAILSAFE_SCHEMA for yaml.load() security hardening) by adding schema enforcement
  • Low-risk: Pydantic is dev-only (Pipfile), no runtime impact on production
  • Migration is incremental: prototype in convert.py first, with tests
  • Aligns with OWASP data integrity goals for threat modeling cards
  • Happy to make a PR and do further changes
