Problem
The codebase uses three near-invisible separator characters with a strict, undocumented semantic layering, converted at parse boundaries. They are indistinguishable in diffs and in most editors, yet a wrong-separator edit is correctness-critical and would pass code review by eye:
| Char | Codepoint | Layer / meaning | Example |
|------|-----------|-----------------|---------|
| ⁚ | U+205A (TWO DOT PUNCTUATION) | stdlib model-name prefix | stdlib⁚delay1 |
| · | U+00B7 (MIDDLE DOT) | compile-time AST module·output | module·output |
| . | U+002E (ASCII FULL STOP) | datamodel-layer module.port | module.port |
canonicalize() (src/simlin-engine/src/common.rs:328, conversion at common.rs:352) rewrites ASCII . (outside quotes) to · (U+00B7) at parse time, so the datamodel layer uses . and everything downstream of canonicalization uses ·. The stdlib⁚ prefix (U+205A) is a third, separate namespace separator for stdlib/macro module names.
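To make the boundary concrete, here is a minimal sketch of only the dot-rewriting step described above. It is not the code in common.rs: the function name `rewrite_dots_outside_quotes` is hypothetical, the quote handling is simplified to a double-quote toggle, and the real canonicalize() may do more than this and handle quoting differently.

```rust
/// Hypothetical, simplified stand-in for the `.` -> `·` step only.
/// Assumption: quoted regions are delimited by plain double quotes.
fn rewrite_dots_outside_quotes(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut in_quotes = false;
    for ch in input.chars() {
        match ch {
            '"' => {
                // Toggle quoted state; the quote character itself is kept.
                in_quotes = !in_quotes;
                out.push(ch);
            }
            // ASCII dot outside quotes becomes U+00B7 (MIDDLE DOT).
            '.' if !in_quotes => out.push('\u{00B7}'),
            // Everything else, including dots inside quotes, is unchanged.
            _ => out.push(ch),
        }
    }
    out
}
```

With this sketch, a datamodel-layer name such as module.port comes out as module·port, while a dot inside a quoted segment is left alone.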
The Vensim macro epic (PR #564, branch macros, merge-base 86cc7fcb) added a second semantic dependency onto this scheme: it is now load-bearing for two subsystems (stdlib modules and macros), not just stdlib modules.
These literals appear across ~20 source files under src/simlin-engine/src/ (e.g. common.rs, db_ltm.rs, db_analysis.rs, ltm_augment.rs, model.rs, module_functions.rs, variable.rs, plus test files). The owning conversion point is a single function (common.rs::canonicalize), but the raw ·/⁚ literals are hand-written at many call sites and in test assertions.
Why it matters
- Correctness, undetectable by review: U+205A, U+00B7, and ASCII . are nearly indistinguishable in most editors and in git diff. A cross-layer copy-paste (e.g. pasting an AST-layer module·output string into a datamodel-layer code path, or hand-typing · where ⁚ is required) produces a wrong identifier that neither a human reviewer nor the compiler will flag; it silently fails to match or matches the wrong thing.
- Increasing blast radius: the macro epic made this scheme load-bearing for a second subsystem, so the cost of a wrong-separator regression is now higher and spans two features.
- No structural guard exists today: nothing prevents a raw ·/⁚ literal from being introduced in the wrong layer.
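The silent-mismatch failure mode can be shown in a few lines. This is an illustrative sketch only: the map and the function name `lookup_demo` are hypothetical and do not correspond to any structure in the engine.

```rust
use std::collections::HashMap;

/// Hypothetical demo: a table keyed by AST-layer names (U+00B7 separator)
/// is queried once with the right separator and once with an ASCII dot.
/// Returns (found_with_middle_dot, found_with_ascii_dot).
fn lookup_demo() -> (bool, bool) {
    let mut outputs: HashMap<String, f64> = HashMap::new();
    outputs.insert("module\u{00B7}output".to_string(), 1.0);

    // Same-looking keys, different codepoints.
    let hit = outputs.contains_key("module\u{00B7}output");
    let miss = outputs.contains_key("module.output");
    (hit, miss)
}
```

Both keys type-check as plain strings, so the compiler has nothing to object to; the second lookup simply finds nothing.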
Components affected
- src/simlin-engine/src/common.rs (canonicalize, line 328 / conversion at line 352), the one place that owns the . -> · boundary
- The stdlib-module / macro layer that owns stdlib⁚... (U+205A) name construction (src/simlin-engine/src/module_functions.rs, model.rs)
- ~20 files under src/simlin-engine/src/ that hand-write · / ⁚ literals (LTM synthetic-name construction in db_ltm.rs / db_analysis.rs / ltm_augment.rs is a heavy user)
Suggested fix
- Introduce named constants / newtypes that make the separator layer explicit at the type level (e.g. const STDLIB_PREFIX_SEP: char = '\u{205A}';, const AST_MODULE_SEP: char = '\u{00B7}';, or, stronger, distinct wrapper types so an AST-layer identifier cannot be passed where a datamodel-layer one is expected). Replace the hand-written literals at call sites with these names so the intended layer is legible in the source and in review.
- Add a guard test asserting that raw \u{205A} / \u{00B7} literals do not appear in src/simlin-engine/src/ outside the single module that owns each separator (a source-scanning unit test, analogous to existing repo lint-style guards). This makes a wrong-layer literal a hard CI failure instead of an invisible correctness bug.
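Both suggestions could look roughly like the sketch below. Everything beyond the two constant names quoted above is an assumption made for illustration: `DATAMODEL_SEP`, the `DatamodelIdent` / `AstIdent` wrapper types, `stdlib_model_name`, and `find_raw_separators` are hypothetical names, and the conversion here ignores quoting, which the real canonicalize() handles.

```rust
// Constant names follow the suggestion above; DATAMODEL_SEP is an added assumption.
pub const STDLIB_PREFIX_SEP: char = '\u{205A}'; // TWO DOT PUNCTUATION
pub const AST_MODULE_SEP: char = '\u{00B7}'; // MIDDLE DOT
pub const DATAMODEL_SEP: char = '.';

/// Hypothetical datamodel-layer identifier, e.g. `module.port`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DatamodelIdent(String);

/// Hypothetical AST-layer identifier, e.g. `module·output`.
/// It has no public constructor, so it can only come from a conversion.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct AstIdent(String);

impl DatamodelIdent {
    pub fn new(raw: &str) -> Self {
        Self(raw.to_string())
    }

    /// Cross the layer boundary explicitly. Simplified: ignores quoting.
    pub fn to_ast(&self) -> AstIdent {
        AstIdent(self.0.replace(DATAMODEL_SEP, &AST_MODULE_SEP.to_string()))
    }
}

impl AstIdent {
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Hypothetical helper that builds a `stdlib⁚name` model name.
pub fn stdlib_model_name(name: &str) -> String {
    format!("stdlib{}{}", STDLIB_PREFIX_SEP, name)
}

/// Guard helper: returns (1-based line number, char) for every raw
/// U+205A / U+00B7 character found in `source`.
pub fn find_raw_separators(source: &str) -> Vec<(usize, char)> {
    source
        .lines()
        .enumerate()
        .flat_map(|(i, line)| {
            line.chars()
                .filter(|c| *c == STDLIB_PREFIX_SEP || *c == AST_MODULE_SEP)
                .map(move |c| (i + 1, c))
        })
        .collect()
}
```

A guard test built on `find_raw_separators` would read each file under src/simlin-engine/src/, skip the module that owns a given separator, and fail if the result is non-empty. Note that this helper matches the characters themselves, not the `\u{...}` escape spelling, so the owning module can define the constants with escapes and still pass the scan; whether escapes elsewhere should also be rejected is a policy choice this sketch does not make.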
Context
Identified during the Vensim macro support epic retrospective (PR #564, branch macros). The epic added the second subsystem dependency onto this separator scheme, which is what elevated a pre-existing latent hazard into something worth a structural guard. Not introduced by a single commit — this is the accreted state of the canonicalization/stdlib/macro naming scheme.