We should have a way to signal differences between GTFS datasets that do not affect the outcome but that may still interest a user. These are non-material differences: the data is semantically equivalent, but the raw text differs.
Examples:
- Column position changed
- Row position changed
- Numeric formatting differences, e.g. `45.5000` vs `45.5`
- Quoting differences, e.g. `S1,Main St,Downtown` vs `S1,"Main St","Downtown"` (both are valid CSV)
- Time leading zeros, e.g. `8:30:00` vs `08:30:00`
- etc.
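To make the value-level examples concrete, here is a minimal sketch of how two spellings of the same value could be reduced to one canonical form before comparison. The function names `canonical_number` and `canonical_time` are hypothetical and not part of gtfs-diff. The sketch assumes the caller applies each function only to columns known to hold that type, since an ID such as `007` must not be collapsed to `7`.

```python
import re
from decimal import Decimal

# Hypothetical helpers for illustration only; not an existing gtfs-diff API.
_NUMBER_RE = re.compile(r"[+-]?\d+(?:\.\d+)?")
_TIME_RE = re.compile(r"(\d{1,2}):(\d{2}):(\d{2})")


def canonical_number(raw: str) -> str:
    """Drop insignificant trailing zeros; return non-numeric input unchanged."""
    if _NUMBER_RE.fullmatch(raw) is None:
        return raw
    # normalize() strips trailing zeros; the "f" format avoids exponent notation.
    return format(Decimal(raw).normalize(), "f")


def canonical_time(raw: str) -> str:
    """Zero-pad the hour of an H:MM:SS value; return other input unchanged."""
    match = _TIME_RE.fullmatch(raw)
    if match is None:
        return raw
    hours, minutes, seconds = match.groups()
    return f"{int(hours):02d}:{minutes}:{seconds}"
```

Two raw cells differ non-materially when their raw strings differ but their canonical forms match, for example `canonical_time("8:30:00") == canonical_time("08:30:00")`.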
A key use case: if gtfs-diff deems two files semantically identical (e.g. stop_times.txt) but a raw string diff shows differences, we should be able to explain the discrepancy by surfacing which non-material differences were detected. This helps users understand why the diff engine reports no changes even though the files are not byte-for-byte identical.
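A rough sketch of how that explanation could be produced for a single file is below. The function name `non_material_differences` and the labels `column_order`, `row_order`, and `quoting` are assumptions made for illustration, not an existing gtfs-diff API. The sketch compares cell strings exactly, so value-level differences such as `45.5000` vs `45.5` would be rejected as material unless cells are normalized first, and the quoting check is only a heuristic based on counting quote characters.

```python
import csv
import io


def _parse(text: str) -> tuple[list[str], list[list[str]]]:
    """Split CSV text into a header row and body rows."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], []
    return rows[0], rows[1:]


def non_material_differences(old_text: str, new_text: str) -> list[str]:
    """Label why two semantically equal CSV files differ as raw text.

    Raises ValueError when the parsed contents differ, because the
    difference is then material and belongs in the regular diff.
    """
    old_header, old_rows = _parse(old_text)
    new_header, new_rows = _parse(new_text)

    # Represent each row as sorted (column, value) pairs so that the
    # comparison does not depend on column position.
    old_keyed = [tuple(sorted(zip(old_header, row))) for row in old_rows]
    new_keyed = [tuple(sorted(zip(new_header, row))) for row in new_rows]

    if sorted(old_header) != sorted(new_header) or sorted(old_keyed) != sorted(new_keyed):
        raise ValueError("files differ materially")

    findings = []
    if old_header != new_header:
        findings.append("column_order")
    if old_keyed != new_keyed:
        findings.append("row_order")
    # Heuristic: with equal parsed content, a different number of quote
    # characters means at least one field is quoted in only one file.
    if old_text.count('"') != new_text.count('"'):
        findings.append("quoting")
    return findings
```

An empty result together with differing raw text would indicate a category the sketch does not cover, such as line endings.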