engine: three visually-indistinguishable separator chars (U+205A / U+00B7 / ASCII .) are load-bearing across stdlib + macro subsystems with no type-level or lint guard #565

## Problem

The codebase uses **three near-invisible separator characters** with a strict, undocumented semantic layering, converted at parse boundaries. They are hard to tell apart in diffs and in most editors, yet using the wrong one breaks identifier matching, and such an edit would pass code review by eye:

| Char | Codepoint | Layer / meaning | Example |
|---|---|---|---|
| `⁚` | U+205A (TWO DOT PUNCTUATION) | stdlib model-name prefix | `stdlib⁚delay1` |
| `·` | U+00B7 (MIDDLE DOT) | compile-time AST `module·output` | `module·output` |
| `.` | ASCII U+002E | datamodel-layer `module.port` | `module.port` |

`canonicalize()` (`src/simlin-engine/src/common.rs:328`, conversion at `common.rs:352`) rewrites ASCII `.` (outside quotes) to `·` (U+00B7) at parse time, so the datamodel layer uses `.` and everything downstream of canonicalization uses `·`. The `stdlib⁚` prefix (U+205A) is a third, separate namespace separator for stdlib/macro module names.
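
The boundary rewrite described above can be sketched as follows. This is a simplified, hypothetical stand-in, not the real `canonicalize()`: the function name `dots_to_middle_dots` is invented for illustration, it covers only the separator step, and it assumes that "outside quotes" means outside double-quoted spans and that the quote characters themselves are kept.

```rust
/// Hypothetical, simplified stand-in for the separator step of `canonicalize()`.
/// Rewrites ASCII '.' to U+00B7 (MIDDLE DOT) unless the '.' sits inside a
/// double-quoted span. Assumption: quote characters are preserved as-is.
fn dots_to_middle_dots(ident: &str) -> String {
    let mut out = String::with_capacity(ident.len());
    let mut in_quotes = false;
    for c in ident.chars() {
        match c {
            // Toggle quote state; everything inside quotes is copied verbatim.
            '"' => {
                in_quotes = !in_quotes;
                out.push(c);
            }
            // Datamodel-layer separator becomes the AST-layer separator.
            '.' if !in_quotes => out.push('\u{00B7}'),
            _ => out.push(c),
        }
    }
    out
}
```

Under these assumptions, `dots_to_middle_dots("module.port")` yields `module·port`, while a `.` inside a quoted name is left untouched.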

The Vensim macro epic (PR #564, branch `macros`, merge-base `86cc7fcb`) added a **second subsystem dependency** on this scheme: it is now load-bearing for **two subsystems**, stdlib modules *and* macros, not just stdlib modules.

These literals appear across ~20 source files under `src/simlin-engine/src/` (e.g. `common.rs`, `db_ltm.rs`, `db_analysis.rs`, `ltm_augment.rs`, `model.rs`, `module_functions.rs`, `variable.rs`, plus test files). The owning conversion point is a single function (`common.rs::canonicalize`), but the raw `·`/`⁚` literals are hand-written at many call sites and in test assertions.

## Why it matters

- **Correctness, undetectable by review**: U+205A, U+00B7, and ASCII `.` are nearly indistinguishable in most editors and in `git diff`. A cross-layer copy-paste (e.g. pasting an AST-layer `module·output` string into a datamodel-layer code path, or hand-typing `·` where `⁚` is required) produces a wrong identifier that neither a human reviewer nor the compiler will flag. It silently fails to match, or matches the wrong thing.
- **Increasing blast radius**: the macro epic made this scheme load-bearing for a second subsystem, so the cost of a wrong-separator regression is now higher and spans two features.
- **No structural guard exists today**: nothing prevents a raw `·`/`⁚` literal from being introduced in the wrong layer.
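
To make the failure mode concrete, here is a minimal sketch showing that the three variants of a name are distinct strings even though they render almost identically. The helper `separator_variants` is hypothetical and exists only for this illustration.

```rust
/// Hypothetical helper: builds the same-looking name with each of the three
/// separators, in the order U+205A, U+00B7, ASCII '.'.
fn separator_variants(left: &str, right: &str) -> [String; 3] {
    [
        format!("{}\u{205A}{}", left, right),
        format!("{}\u{00B7}{}", left, right),
        format!("{}.{}", left, right),
    ]
}

/// Returns true only when all three variants compare equal, which never
/// happens: string equality is exact, so a lookup keyed on one variant
/// misses the other two.
fn variants_are_interchangeable(left: &str, right: &str) -> bool {
    let v = separator_variants(left, right);
    v[0] == v[1] && v[1] == v[2]
}
```

All three variants have the same number of characters, so nothing about their displayed width hints at the difference; only their encoded bytes differ.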

## Components affected

- `src/simlin-engine/src/common.rs` (`canonicalize`, line 328 / conversion at line 352 — the one place that owns the `.` -> `·` boundary)
- The stdlib-module / macro layer that owns `stdlib⁚...` (U+205A) name construction (`src/simlin-engine/src/module_functions.rs`, `model.rs`)
- ~20 files under `src/simlin-engine/src/` that hand-write `·` / `⁚` literals (LTM synthetic-name construction in `db_ltm.rs` / `db_analysis.rs` / `ltm_augment.rs` is a heavy user)

## Suggested fix

1. Introduce **named constants / newtypes** that make the separator layer explicit at the type level (e.g. `const STDLIB_PREFIX_SEP: char = '\u{205A}';`, `const AST_MODULE_SEP: char = '\u{00B7}';`, or — stronger — distinct wrapper types so an AST-layer identifier cannot be passed where a datamodel-layer one is expected). Replace the hand-written literals at call sites with these names so the intended layer is legible in the source and in review.
2. Add a **guard test** asserting that raw `\u{205A}` / `\u{00B7}` literals do not appear in `src/simlin-engine/src/` outside the single module that owns each separator (a source-scanning unit test, analogous to existing repo lint-style guards). This makes a wrong-layer literal a hard CI failure instead of an invisible correctness bug.
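
A minimal sketch of both suggestions, under stated assumptions: the constant names come from the text above, but `AstIdent`, `DatamodelIdent`, `stdlib_model_name`, and `lines_with_raw_separators` are hypothetical names, not existing engine APIs. The scanner only looks for the raw characters in a string; a real guard test would presumably walk `src/simlin-engine/src/` with `std::fs` and exempt the module that owns each separator.

```rust
/// Separator after the `stdlib` prefix in stdlib/macro model names (U+205A).
pub const STDLIB_PREFIX_SEP: char = '\u{205A}';
/// Separator between module and output in canonicalized identifiers (U+00B7).
pub const AST_MODULE_SEP: char = '\u{00B7}';

/// Hypothetical wrapper for an AST-layer (post-canonicalization) identifier.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AstIdent(String);

/// Hypothetical wrapper for a datamodel-layer identifier.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DatamodelIdent(String);

impl AstIdent {
    /// Joins module and output with the AST-layer separator.
    pub fn module_output(module: &str, output: &str) -> Self {
        AstIdent(format!("{}{}{}", module, AST_MODULE_SEP, output))
    }
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl DatamodelIdent {
    /// Joins module and port with the datamodel-layer ASCII '.'.
    pub fn module_port(module: &str, port: &str) -> Self {
        DatamodelIdent(format!("{}.{}", module, port))
    }
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Builds a stdlib model name such as `stdlib⁚delay1` from `delay1`.
pub fn stdlib_model_name(name: &str) -> String {
    format!("stdlib{}{}", STDLIB_PREFIX_SEP, name)
}

/// Returns the 1-based numbers of lines in `source` that contain a raw
/// U+205A or U+00B7 character. Escaped forms in the scanned text are not
/// matched, because they are plain ASCII there.
pub fn lines_with_raw_separators(source: &str) -> Vec<usize> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| line.contains(STDLIB_PREFIX_SEP) || line.contains(AST_MODULE_SEP))
        .map(|(i, _)| i + 1)
        .collect()
}
```

Because `AstIdent` and `DatamodelIdent` are distinct types, passing one where the other is expected becomes a compile error rather than a silent mismatch. Writing the separators as `\u{...}` escapes also keeps the guard's own source free of the raw characters it scans for.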

## Context

Identified during the Vensim macro support epic retrospective (PR #564, branch `macros`). The epic added the second subsystem dependency onto this separator scheme, which is what elevated a pre-existing latent hazard into something worth a structural guard. Not introduced by a single commit — this is the accreted state of the canonicalization/stdlib/macro naming scheme.

