Description
Is your feature request related to a problem? Please describe.
Currently, YAML card loading in scripts/ (primarily in convert.py and related scripts) uses yaml.safe_load without any schema validation or type checking.
This leads to:
- Silent failures or hard-to-debug crashes if cards miss required fields (e.g. title, description), have wrong types (e.g. mappings as string instead of dict), or contain extra/unknown fields.
- Poor error messages for contributors/translators (no clear indication of which field or card is invalid).
- Risk of invalid data propagating to the website/card browser, causing UI issues or incorrect threat modeling output.
- Difficulty maintaining consistency as new suits/editions or languages are added.
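As a small illustration of the first point, the sketch below uses a plain dict as a stand-in for what yaml.safe_load would return for a card whose mappings field was written as a string. The function `collect_capec_ids` is hypothetical and not taken from convert.py; it only shows how the error can surface far from the load site, with no mention of the card or field at fault.

```python
# Hypothetical stand-in for the result of yaml.safe_load on a card whose
# "mappings" field was written as a string instead of a mapping.
loaded_card = {
    "title": "Example card",
    "description": "An example card description.",
    "mappings": "capec: 1234",  # wrong type: str, not dict
}


def collect_capec_ids(card: dict) -> list:
    """Hypothetical downstream step that assumes mappings is a dict of dicts."""
    return [mapping.get("capec") for mapping in card["mappings"].values()]


failure = None
try:
    collect_capec_ids(loaded_card)
except AttributeError as exc:
    # The message talks about str attributes, not about the card or the field.
    failure = str(exc)
    print(f"Fails far from the load site: {failure}")
```

Nothing in the resulting message names the file, the card, or the mappings field, which is the debugging gap described above.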
Describe the solution you'd like
Introduce Pydantic models to define and enforce a strict schema for card YAML structures at load time.
Key aspects:
- Create a base Card model with required/typed fields (title: str, description: str, mappings: Dict, etc.)
- Use Pydantic's ValidationError for clear, field-specific errors
- Integrate into convert.py (or central loading function) to validate before processing
- Start with common fields across suits, later extend for suit-specific ones
- Add unit tests for valid/invalid cases
Example:
```python
import yaml
from pydantic import BaseModel, Field, ValidationError
from typing import Dict, List, Optional


class CardMapping(BaseModel):
    capec: Optional[str] = None
    asvs: Optional[List[str]] = None


class Card(BaseModel):
    title: str = Field(..., min_length=1)
    description: str = Field(..., min_length=10)
    mappings: Dict[str, CardMapping] = Field(default_factory=dict)
    # suit-specific: Optional[str] = None


# In convert.py (file_content and filename come from the existing loading code)
try:
    card_data = yaml.safe_load(file_content)
    validated = Card(**card_data)  # raises ValidationError if invalid
except ValidationError as e:
    # Chain the original exception so the full traceback is preserved
    raise ValueError(f"Invalid card in {filename}: {e.errors()}") from e
```
Benefits:
- Clear errors (e.g. "title required", "mappings must be dict")
- Enforces data integrity early
- Complements the YAML hardening in #2406 (fix: explicitly use FAILSAFE_SCHEMA for yaml.load() security hardening) with runtime schema checks
- Incremental: start with convert.py
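To make the "clear errors" benefit concrete, here is a minimal sketch of turning a ValidationError into one readable line per problem. The helper `describe_errors` and the file name `example-cards.yaml` are hypothetical; the models repeat the ones proposed in this issue so the snippet is self-contained. It relies only on `ValidationError.errors()`, whose entries carry `loc` and `msg` keys; the exact message wording differs between Pydantic versions.

```python
from typing import Dict, List, Optional

from pydantic import BaseModel, Field, ValidationError


class CardMapping(BaseModel):
    capec: Optional[str] = None
    asvs: Optional[List[str]] = None


class Card(BaseModel):
    title: str = Field(..., min_length=1)
    description: str = Field(..., min_length=10)
    mappings: Dict[str, CardMapping] = Field(default_factory=dict)


def describe_errors(filename: str, data: dict) -> List[str]:
    """Return one 'file: field.path: message' line per validation problem."""
    try:
        Card(**data)
    except ValidationError as exc:
        return [
            f"{filename}: {'.'.join(str(part) for part in err['loc'])}: {err['msg']}"
            for err in exc.errors()
        ]
    return []


# Missing title, and mappings given as a string instead of a dict.
bad_card = {"description": "A sufficiently long description.", "mappings": "capec"}
for line in describe_errors("example-cards.yaml", bad_card):
    print(line)
```

A contributor would then see the file and the offending field on each line instead of a raw list of error dicts.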
Describe alternatives you've considered
- Keep current yaml.safe_load + manual checks
  → Works for basics, but is repetitive, misses nested validation, and gives unclear error messages.
- Use jsonschema or cerberus
  → Good, but Pydantic offers better type hints, IDE support, error messages, and a Pythonic API.
- Do nothing
  → Risks invalid data propagating to the converter/website.
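For comparison, the first alternative (manual checks) might look roughly like the sketch below. The function `check_card` is hypothetical and not existing project code; it is only meant to show how each field and each nesting level needs its own hand-written check and message.

```python
def check_card(card: object) -> list:
    """Hand-written checks for a loaded card; returns a list of problems."""
    if not isinstance(card, dict):
        return ["card must be a mapping"]

    problems = []
    for field in ("title", "description"):
        value = card.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field}: required non-empty string")

    mappings = card.get("mappings", {})
    if not isinstance(mappings, dict):
        problems.append("mappings: must be a mapping")
    else:
        # Nested validation has to be spelled out by hand for every level.
        for name, mapping in mappings.items():
            if not isinstance(mapping, dict):
                problems.append(f"mappings.{name}: must be a mapping")
    return problems


print(check_card({"title": "", "mappings": "capec"}))
```

Every new field or suit-specific rule would add another branch here, whereas a model declaration keeps the rules next to the field definitions.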
Additional context
- Builds directly on #2406 (fix: explicitly use FAILSAFE_SCHEMA for yaml.load() security hardening) by adding schema enforcement
- Low-risk: Pydantic would be a dev-only dependency (Pipfile), with no runtime impact on production
- Incremental migration: prototype in convert.py first, with tests
- Aligns with OWASP data integrity goals for threat modeling cards
- Happy to make a PR and do further changes
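The unit tests for valid and invalid cases mentioned above could start from something like the following sketch. The test class name and the sample card data are made up for illustration, and the models mirror the ones proposed in this issue rather than any existing project code.

```python
import unittest
from typing import Dict, List, Optional

from pydantic import BaseModel, Field, ValidationError


class CardMapping(BaseModel):
    capec: Optional[str] = None
    asvs: Optional[List[str]] = None


class Card(BaseModel):
    title: str = Field(..., min_length=1)
    description: str = Field(..., min_length=10)
    mappings: Dict[str, CardMapping] = Field(default_factory=dict)


# Made-up sample data for illustration only.
VALID_CARD = {
    "title": "Example card",
    "description": "An example card description.",
    "mappings": {"example": {"capec": "1234", "asvs": ["1.1.1"]}},
}


class CardModelTests(unittest.TestCase):
    def test_valid_card_is_accepted(self):
        card = Card(**VALID_CARD)
        self.assertEqual(card.title, "Example card")

    def test_missing_title_is_rejected(self):
        data = {k: v for k, v in VALID_CARD.items() if k != "title"}
        with self.assertRaises(ValidationError):
            Card(**data)

    def test_short_description_is_rejected(self):
        with self.assertRaises(ValidationError):
            Card(**{**VALID_CARD, "description": "short"})

    def test_string_mappings_are_rejected(self):
        with self.assertRaises(ValidationError):
            Card(**{**VALID_CARD, "mappings": "capec: 1234"})


# Run the tests programmatically so the snippet also works when pasted into a script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CardModelTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In the repository these cases would more likely live in the existing test layout and run through the project's usual test command.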