
Clearer parse error for identifiers with a '-' in the middle (#7742) #7744

Open: Unisay wants to merge 2 commits into master from yura/issue-7742-uplc-parser-large-case



Unisay (Contributor) commented Apr 24, 2026

Summary

Closes #7742.

A UPLC file emitted by Scalus 0.16.0's toUplcOptimized (the HTLC sample from the issue) fails to parse with a confusing message that does not point at the real problem. The shape reduces to one bad identifier:

(program 1.1.0 (lam pubKeyHash-305478r71 (lam xIn72 xIn72)))

'-' isn't an identifier character in UPLC (see isIdentifierChar in PlutusCore.Name.Unique); it exists only as the separator between the name text and its numeric unique-suffix (foo-123 → Name "foo" (Unique 123)). So pubKeyHash-305478r71 isn't a valid unquoted identifier. The old parser still "succeeded" on it in the worst possible way: it consumed pubKeyHash as the name and -305478 as the unique-suffix, and left r71 in the stream. r71 was then picked up as the body of the enclosing (lam ...), so the parser next expected that lambda's closing ')'. Megaparsec eventually failed on an unrelated '(' with "unexpected '(' expecting ')'" at line 448 col 39, on the following line instead of at the bad name on line 447.
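
To make the failure mode concrete, here is a minimal pure-Haskell model of the old behaviour. It is a sketch, not the plutus-core parser: `isIdentChar` assumes identifier characters are alphanumerics, `_` and `'` (modelled on, not copied from, `isIdentifierChar`), and `oldSplit` is a hypothetical stand-in for the megaparsec name parser. It reads the name, then an optional `-<digits>` suffix, and hands back whatever remains without checking it.

```haskell
import Data.Char (isAlphaNum, isDigit)

-- Assumed identifier character class (an approximation of
-- isIdentifierChar in PlutusCore.Name.Unique, not a copy of it).
isIdentChar :: Char -> Bool
isIdentChar c = isAlphaNum c || c == '_' || c == '\''

-- Hypothetical model of the OLD unquoted-name lexer:
--   1. take the longest run of identifier characters as the name text;
--   2. if it is followed by '-' and at least one digit, take the digits
--      as the unique-suffix;
--   3. return the rest of the input WITHOUT checking what comes next.
oldSplit :: String -> (String, Maybe Integer, String)
oldSplit input =
  case afterName of
    '-' : more
      | (ds@(_ : _), rest) <- span isDigit more -> (name, Just (read ds), rest)
    _ -> (name, Nothing, afterName)
  where
    (name, afterName) = span isIdentChar input
```

In GHCi, `oldSplit "pubKeyHash-305478r71"` evaluates to `("pubKeyHash",Just 305478,"r71")`: the stray `r71` is what ended up parsed as the lambda body. Likewise `foo-bar` leaves `-bar` and `foo-123-456` leaves `-456` behind.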

This PR keeps the grammar as-is (it should stay narrow; see the discussion in the issue comments) but makes the diagnostic point at the real problem.

Before / after on the HTLC file

before: htlc.uplc:448:39: unexpected '(' expecting ')'
                            — on a lambda 8+ characters past the real site,
                              on the wrong line

after:  htlc.uplc:447:41: Invalid identifier 'pubKeyHash-305478r71'
        A '-' inside a name is the numeric unique-suffix separator and must be
        followed only by digits and a word boundary.
        To use this text as a name verbatim, quote it with backticks:
        `pubKeyHash-305478r71`
                            — on the offending name, with an actionable hint
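
The `file:line:col` prefix comes from the offset the error is attached to; megaparsec turns an error's offset into a source position when it pretty-prints the error bundle, so attaching the error to the first character of the identifier is presumably what moves the report to 447:41. The sketch below approximates that step with two hypothetical helpers, `offsetToLineCol` and `renderInvalidIdentifier`, and reuses the hint text above. It assumes every character, tabs included, counts as one column; megaparsec by default expands tabs to 8 columns.

```haskell
-- Hypothetical helper: 1-based (line, column) of a 0-based character
-- offset. Simplification: every character, including '\t', is one column.
offsetToLineCol :: String -> Int -> (Int, Int)
offsetToLineCol src off = go 1 1 (take off src)
  where
    go line col [] = (line, col)
    go line _ ('\n' : cs) = go (line + 1) 1 cs
    go line col (_ : cs) = go line (col + 1) cs

-- Hypothetical renderer for the message shape shown above; the position is
-- computed from the offset of the identifier's FIRST character.
renderInvalidIdentifier :: FilePath -> String -> Int -> String -> String
renderInvalidIdentifier file src off ident =
  unlines
    [ file ++ ":" ++ show line ++ ":" ++ show col
        ++ ": Invalid identifier '" ++ ident ++ "'"
    , "A '-' inside a name is the numeric unique-suffix separator and must be"
    , "followed only by digits and a word boundary."
    , "To use this text as a name verbatim, quote it with backticks:"
    , "`" ++ ident ++ "`"
    ]
  where
    (line, col) = offsetToLineCol src off
```

For example, `renderInvalidIdentifier "t.uplc" "(lam foo-bar x)" 5 "foo-bar"` starts with the line `t.uplc:1:6: Invalid identifier 'foo-bar'`, because offset 5 is the `f` of `foo-bar`.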

Unisay added 2 commits April 24, 2026 18:19

Freeze the current (unhelpful) error output for three forms of invalid
UPLC identifiers:
- `foo-bar`                — hyphen followed by non-digits
- `foo-123-456`            — double `-NNN` suffix
- `pubKeyHash-305478r71`   — hyphen + digits + more letters (the shape
  Scalus 0.16.0's `toUplcOptimized` emits, from issue #7742)

All three cases produce misleading diagnostics today; notably, the
Scalus case reports the error 8+ characters past the offending name.
Capture the status quo as goldens so that a follow-up improvement to
name-parser diagnostics shows up as an explicit golden-file diff.

When the unquoted-identifier parser finishes, require that the next char
is a real word boundary (not another identifier char and not another
'-'). Otherwise the caller wrote something like `pubKeyHash-305478r71`,
`foo-bar` or `foo-123-456`: the text consumed so far (the name, plus any
'-NNN' unique-suffix) is only a prefix of the intended token, and
accepting that prefix would silently mis-parse. Consume the remainder of
the extended identifier so the diagnostic can cite the full bad text,
then raise a new `InvalidIdentifier` custom parser error with a caret at
the start of the identifier and an actionable hint to quote it with
backticks.

For the original Scalus 0.16.0 HTLC reproducer this changes the error
from `htlc.uplc:448:39: unexpected '(' expecting ')'` (on a lambda 8+
chars past the real site) to `htlc.uplc:447:41: Invalid identifier
'pubKeyHash-305478r71'` — on the offending name itself.

The three negative goldens added in the previous commit are updated to
the new message; all 3886 tests across plutus-core/untyped-plutus-core/
plutus-ir pass unchanged.
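
The boundary rule from the second commit can be sketched in the same pure-Haskell style. This is an illustration under assumptions, not the PR's code: `lexName` and this `InvalidIdentifier` newtype are hypothetical (the real error is a megaparsec custom error carrying a source position), and the identifier character class is an assumed approximation. After the name and optional `-<digits>` suffix, the next character must be neither an identifier character nor `-`; otherwise the whole extended run is returned so the diagnostic can quote it.

```haskell
import Data.Char (isAlphaNum, isDigit)

-- Assumed identifier character class (approximation, see lead-in).
isIdentChar :: Char -> Bool
isIdentChar c = isAlphaNum c || c == '_' || c == '\''

-- Hypothetical error type; it only records the full offending text.
newtype InvalidIdentifier = InvalidIdentifier String
  deriving (Eq, Show)

-- Lex an unquoted name with an optional "-<digits>" unique-suffix, then
-- insist on a word boundary. On failure, report the whole extended token
-- (identifier chars and '-') starting from the beginning of the name.
-- Note: this sketch does not check that the name is non-empty.
lexName :: String -> Either InvalidIdentifier ((String, Maybe Integer), String)
lexName input
  | atBoundary rest = Right ((name, uniq), rest)
  | otherwise = Left (InvalidIdentifier (takeWhile extended input))
  where
    (name, afterName) = span isIdentChar input
    (uniq, rest) = case afterName of
      '-' : more
        | (ds@(_ : _), rest') <- span isDigit more -> (Just (read ds), rest')
      _ -> (Nothing, afterName)
    -- A real word boundary: end of input, or a char that can neither
    -- continue the name nor start another '-' suffix.
    atBoundary (c : _) = not (isIdentChar c || c == '-')
    atBoundary [] = True
    extended c = isIdentChar c || c == '-'
```

Collecting the extended run before failing is what lets the message quote `pubKeyHash-305478r71` in full instead of complaining about the leftover `r71`; valid inputs such as `foo-123 x` still lex as name `foo` with unique 123.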

Unisay self-assigned this Apr 24, 2026
Unisay requested a review from a team April 24, 2026 17:18
Unisay (Contributor, Author) commented Apr 25, 2026

The single Hydra failure (ci/hydra-build:x86_64-linux.ghc96-mingwW64:checks:plutus-core:test:flat-test) is unrelated to this change. The log says:

stable byte encodings: FAIL
Exception: diff: startProcess: does not exist (No such file or directory)

It's the Windows cross-compilation builder missing the diff binary that tasty-golden shells out to. The same check failed with the same message on the recently merged #7734. All native x86_64-linux / aarch64-darwin / x86_64-darwin Hydra checks pass, and running ./scripts/regen-goldens.sh locally (~12 800 tests, 7 subprojects) produces zero golden updates outside the three this PR adds.



Development

Successfully merging this pull request may close these issues.

UPLC textual parser (plutus-core 1.45.0.0): fails to parse large (case (constr 0 ...) (lam ...)) when branch body is ~490 lines
