Clearer parse error for identifiers with a '-' in the middle (#7742) #7744
Open
Freeze the current (unhelpful) error output for three forms of invalid UPLC identifier:

- `foo-bar`: hyphen followed by non-digits
- `foo-123-456`: double `-NNN` suffix
- `pubKeyHash-305478r71`: hyphen + digits + more letters (the shape Scalus 0.16.0's `toUplcOptimized` emits, from issue #7742)

All three cases produce misleading diagnostics today; notably, the Scalus case reports the error 8+ characters past the offending name. Capturing the status quo as goldens so that a follow-up improvement to name-parser diagnostics shows up as an explicit golden-file diff.

When the unquoted-identifier parser finishes, require that the next char
is a real word-boundary (not another identifier char and not another
'-'). Otherwise the caller wrote something like `pubKeyHash-305478r71`,
`foo-bar` or `foo-123-456`: the '-NNN' we just consumed as the numeric
unique-suffix is not actually terminal, and the prefix interpretation
would silently mis-parse. Consume the remainder of the extended
identifier so the diagnostic can cite the full bad text, then raise a
new `InvalidIdentifier` custom parser error with a caret on the start
of the identifier and an actionable hint to quote it with backticks.
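
The boundary check described above can be sketched outside the real parser. The following is a minimal, language-neutral illustration in Python, not the PR's Haskell/megaparsec code: `parse_name`, the `InvalidIdentifier` class shown here, and the exact identifier-character set (ASCII letters, digits, `_` and `'`) are assumptions made for the example.

```python
import re

# Assumption: identifier characters are ASCII letters, digits, '_' and "'".
IDENT_CHAR = re.compile(r"[A-Za-z0-9_']")
DIGITS = "0123456789"


class InvalidIdentifier(Exception):
    """Hypothetical stand-in for the custom parser error named in the commit."""

    def __init__(self, text, start):
        super().__init__(
            f"Invalid identifier '{text}' at offset {start}; "
            "quote it with backticks"
        )
        self.text = text
        self.start = start


def is_extended_char(ch):
    """True for characters that would continue a malformed identifier."""
    return ch == "-" or IDENT_CHAR.match(ch) is not None


def parse_name(src, pos=0):
    """Parse `name` or `name-NNN` starting at pos.

    Returns (name, unique_or_None, next_pos), or raises InvalidIdentifier
    when the text after the optional '-NNN' suffix is not a word boundary.
    Assumes the caller has already positioned pos on an identifier start.
    """
    start = pos
    while pos < len(src) and IDENT_CHAR.match(src[pos]):
        pos += 1
    name = src[start:pos]

    # Optional '-NNN' unique suffix: only taken if at least one digit follows.
    unique = None
    if pos < len(src) and src[pos] == "-":
        end = pos + 1
        while end < len(src) and src[end] in DIGITS:
            end += 1
        if end > pos + 1:
            unique = int(src[pos + 1:end])
            pos = end

    # Word-boundary check: another identifier char or another '-' means the
    # suffix was not terminal. Consume the rest so the error cites all of it.
    if pos < len(src) and is_extended_char(src[pos]):
        while pos < len(src) and is_extended_char(src[pos]):
            pos += 1
        raise InvalidIdentifier(src[start:pos], start)

    return name, unique, pos
```

With this sketch, `parse_name("foo-123)")` returns the name, the unique and the position of the closing paren, while each of the three invalid forms raises an error whose `text` is the full offending identifier and whose `start` is the offset of its first character.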
For the original Scalus 0.16.0 HTLC reproducer this changes the error
from `htlc.uplc:448:39: unexpected '(' expecting ')'` (on a lambda 8+
chars past the real site) to `htlc.uplc:447:41: Invalid identifier
'pubKeyHash-305478r71'` — on the offending name itself.
The three negative goldens added in the previous commit are updated to
the new message; all 3886 tests across plutus-core/untyped-plutus-core/
plutus-ir pass unchanged.
The single Hydra failure (
It's the Windows-cross builder missing the |
Summary
Closes #7742.
A UPLC file emitted by Scalus 0.16.0's `toUplcOptimized` (the HTLC sample from the issue) fails to parse with a confusing message that does not point at the real site. The shape reduces to one bad identifier, `pubKeyHash-305478r71`.

`-` isn't an identifier character in UPLC (see `isIdentifierChar` in `PlutusCore.Name.Unique`); it exists only as the separator between the name text and its numeric unique-suffix (`foo-123` → `Name "foo" (Unique 123)`). So `pubKeyHash-305478r71` isn't a valid unquoted identifier. The old parser still "succeeded" on it in the worst possible way: it ate `pubKeyHash` as the name and `-305478` as the unique, and left `r71` in the stream. `r71` then got picked up as the body of the enclosing `(lam ...)`, leaving the closing-paren bookkeeping off by one, and megaparsec eventually blew up on an unrelated `(` with "unexpected '(' expecting ')'" at line 448 col 39, rather than at the bad name on line 447.

This PR keeps the grammar as-is (it should stay narrow; see the discussion in the issue comments) but makes the diagnostic point at the real problem.
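
To make the old mis-parse concrete, here is a small Python sketch of a greedy "name, then optional `-NNN`" match with no boundary check. It is an illustration only, not the actual parser: `old_parse_name` and the regular expression are hypothetical, and the identifier-character set is an assumption.

```python
import re

# Assumption: name text is ASCII letters, digits, '_' and "'", starting with
# a letter or '_'; the unique suffix is '-' followed by one or more digits.
OLD_NAME = re.compile(r"([A-Za-z_][A-Za-z0-9_']*)(?:-([0-9]+))?")


def old_parse_name(src):
    """Greedy prefix parse with no word-boundary check.

    Returns (name, unique_or_None, remaining_input).
    """
    m = OLD_NAME.match(src)
    if m is None:
        raise ValueError("not an identifier")
    unique = int(m.group(2)) if m.group(2) is not None else None
    return m.group(1), unique, src[m.end():]


# The prefix parses "successfully"; the tail is silently left in the stream
# for whatever parser runs next.
name, unique, rest = old_parse_name("pubKeyHash-305478r71")
print(name, unique, repr(rest))
```

The leftover `r71` is itself a well-formed identifier, which is why the failure only surfaces later, once the surrounding parentheses no longer balance.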
Before / after on the HTLC file