Add "non-material differences" to the schema #12

@jcpitre

We should have a way to signal differences between GTFS datasets that do not affect the outcome but that may still interest a user. These are non-material differences: the data is semantically equivalent, but the raw text differs.

Examples:

  • Column position changed
  • Row position changed
  • Numeric formatting differences — e.g. 45.5000 vs 45.5
  • Quoting differences — e.g. S1,Main St,Downtown vs S1,"Main St","Downtown" (both are valid CSV)
  • Time leading zeros — e.g. 8:30:00 vs 08:30:00
  • etc.
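
To make the cell-level examples above concrete, here is a minimal sketch of how a single pair of values could be classified. It is not gtfs-diff's implementation: the function name `classify_cell_difference` and the labels `numeric_formatting`, `time_leading_zero` and `material` are hypothetical, chosen only for illustration.

```python
import re
from decimal import Decimal, InvalidOperation

# GTFS times are H:MM:SS or HH:MM:SS, and hours may exceed 24.
_TIME_RE = re.compile(r"^(\d+):(\d{2}):(\d{2})$")


def _as_decimal(text):
    """Parse text as a decimal number, or return None if it is not numeric."""
    try:
        return Decimal(text)
    except InvalidOperation:
        return None


def _as_time(text):
    """Parse text as an (hours, minutes, seconds) tuple, or return None."""
    match = _TIME_RE.match(text)
    return tuple(int(part) for part in match.groups()) if match else None


def classify_cell_difference(old, new):
    """Classify the difference between two raw cell values.

    Returns None when the strings are identical, a hypothetical
    non-material label when only the formatting differs, and
    "material" otherwise.
    """
    if old == new:
        return None
    t_old, t_new = _as_time(old), _as_time(new)
    if t_old is not None and t_old == t_new:
        return "time_leading_zero"
    d_old, d_new = _as_decimal(old), _as_decimal(new)
    if d_old is not None and d_old == d_new:
        return "numeric_formatting"
    return "material"
```

A real implementation would presumably use the field type from the GTFS specification to decide which normalization applies, rather than guessing from the value as this sketch does.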

A key use case: if gtfs-diff deems two versions of a file (e.g. stop_times.txt) semantically identical but a raw string diff shows differences, we should be able to explain the discrepancy by surfacing which non-material differences were detected. This helps users understand why the diff engine reports no changes even though the files are not byte-for-byte identical.
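
As a rough illustration of this use case, the sketch below compares two CSV texts and returns the list of file-level non-material differences it can detect. Everything here is an assumption for discussion purposes: `explain_raw_difference` and the labels `column_position_changed`, `row_position_changed` and `raw_formatting` are hypothetical and are not part of the current schema. Cell values are compared as raw strings, so value-level normalization (numeric or time formatting) is out of scope.

```python
import csv
import io


def _records(rows):
    """Key each data row by column name so column position is irrelevant."""
    header = rows[0]
    return [tuple(sorted(zip(header, row))) for row in rows[1:]]


def explain_raw_difference(old_text, new_text):
    """Return a list of non-material difference labels (hypothetical names).

    Returns None if the two texts differ materially, and an empty list
    if they are identical.
    """
    old_rows = list(csv.reader(io.StringIO(old_text)))
    new_rows = list(csv.reader(io.StringIO(new_text)))
    if not old_rows or not new_rows:
        return [] if old_rows == new_rows else None

    old_header, new_header = old_rows[0], new_rows[0]
    if sorted(old_header) != sorted(new_header):
        return None  # a column was added, removed or renamed

    old_records, new_records = _records(old_rows), _records(new_rows)
    if sorted(old_records) != sorted(new_records):
        return None  # at least one record differs in content

    found = []
    if old_header != new_header:
        found.append("column_position_changed")
    if old_records != new_records:
        found.append("row_position_changed")
    # Limitation of this sketch: raw formatting such as quoting is only
    # reported when the column and row order are unchanged.
    if not found and old_text != new_text:
        found.append("raw_formatting")
    return found
```

For the quoting example from the list above, `explain_raw_difference('stop_id,stop_name,zone\nS1,Main St,Downtown\n', 'stop_id,stop_name,zone\nS1,"Main St","Downtown"\n')` yields `["raw_formatting"]`, which is the kind of explanation a user could be shown when the diff engine reports no changes.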
