[improvement](fe) Improve parse error message when reserved keyword is used as identifier#63225
[improvement](fe) Improve parse error message when reserved keyword is used as identifier#63225morrySnow wants to merge 1 commit into
Conversation
|
run buildall |
|
/review |
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
There was a problem hiding this comment.
Review result: no blocking issues found.
Critical checkpoint conclusions:
- Goal and tests: The change targets reserved-keyword-as-identifier parse errors and adds focused parser unit coverage for
load,select, and quoted identifiers. The implementation appears to accomplish that goal. - Scope and focus: The PR is small and limited to parser error-message handling, the related unit test, and the surefire
argLinedefault. - Concurrency and lifecycle: No concurrency-sensitive or special lifecycle/static initialization behavior is introduced.
- Configuration and compatibility: No runtime config, storage format, FE-BE protocol, or rolling-upgrade compatibility concern is introduced.
- Parallel paths: This is localized to the Nereids parser error listener; I did not find another modified path that must be updated for the same behavior.
- Error handling: The parser still throws
SyntaxParseException; the new path only replaces overly verbose ANTLR messages with clearer text when the mismatch is identifiable. - Test coverage: Unit coverage is added for the main behavior. I attempted to run
mvn -pl fe-core -am -Dtest=org.apache.doris.nereids.parser.NereidsParserTest#testReservedKeywordAsIdentifierError -DfailIfNoTests=false test -Dskip.doc=true, but the runner lacksthirdparty/installed/bin/thrift, so the FE reactor failed before reachingfe-core. - Observability: Not applicable; this is user-facing parse error formatting.
- Transaction, persistence, and data correctness: Not applicable; no data path, transaction, visible-version, or persistence logic is touched.
- Performance: Parse error handling is not a hot path; no meaningful performance concern found.
User focus points: .opencode-review.zuSvq6/review_focus.txt contains no additional focus points.
TPC-H: Total hot run time: 29363 ms |
TPC-DS: Total hot run time: 171113 ms |
FE Regression Coverage ReportIncrement line coverage |
b990982 to
53ba819
Compare
|
/review |
There was a problem hiding this comment.
The PR is focused on FE/Nereids parser error reporting and includes tests for reserved-keyword-as-identifier diagnostics. I found a blocking issue: the implementation emits only a generic syntax error and never produces the reserved-keyword/backtick guidance asserted by the new unit test, so the feature is not actually implemented and the added test should fail once the FE test environment is complete.
Critical checkpoint conclusions:
- Goal/test proof: Goal is clearer parse errors for reserved keywords. Current code does not accomplish the reserved-keyword guidance path; the added test expects text the code never emits.
- Scope/focus: The change is small and localized, but the implementation and tests are inconsistent.
- Concurrency/lifecycle: Not applicable; parser strategy is per parser instance and no shared mutable state was introduced.
- Config/compatibility/protocol/persistence/data writes: Not applicable.
- Parallel paths: Nereids parser path was changed; dialect/legacy parser paths were not part of this PR.
- Error handling: The new strategy changes user-visible parse exceptions, but drops expected-token details and misses the advertised reserved-keyword special handling.
- Tests: Added unit coverage exists, but it does not match implementation. I attempted
mvn -pl fe-core -Dtest=org.apache.doris.nereids.parser.NereidsParserTest#testReservedKeywordAsIdentifierError test -Dskip.doc=true; it could not reach test execution becauseorg.apache.doris:fe-foundation:1.2-SNAPSHOTwas unavailable in this runner. - Observability/performance: Not applicable beyond parser error formatting.
- User focus: No additional user-provided review focus was supplied.
|
|
||
| @Override | ||
| protected void reportInputMismatch(Parser recognizer, InputMismatchException e) { | ||
| recognizer.notifyErrorListeners(e.getOffendingToken(), |
There was a problem hiding this comment.
This path never produces the reserved-keyword diagnostic that the PR advertises and that testReservedKeywordAsIdentifierError asserts. For create database load, SLL bails out, LL reaches this reportInputMismatch, and the message is built as only Syntax error at or near 'load'; there is no reserved keyword text and no backtick suggestion such as `load`. As a result the new unit test fails and the user-facing reserved-keyword guidance is not actually implemented. Please detect the identifier-expected/reserved-token case here (or update the tests/PR scope if the intended behavior is only the generic message).
…s used as identifier
Problem Summary: When a user writes a statement like `CREATE DATABASE load`,
Doris produced an overwhelming ANTLR-generated error listing hundreds of
expected tokens:
mismatched input 'load' expecting {"{"}, {"}"}, 'ACTIONS', 'AFTER', ...
This message is cryptic and gives no actionable guidance. The root cause is
that `LOAD` is a reserved keyword, so the parser expects an identifier but
gets a keyword token, triggering a huge "expecting {non-reserved keywords}"
list.
Research into how other databases handle this:
- BigQuery: "Syntax error: Unexpected keyword LOAD" (names the keyword)
- PostgreSQL/DuckDB: Short "syntax error at or near" with position indicator
- Spark SQL: Suggests backtick quoting for keyword-as-identifier
- Trino: Same verbose ANTLR output (same problem as Doris)
The fix improves `ParseErrorListener` to:
1. Detect when a reserved keyword is used where an identifier is expected
(InputMismatchException + expected tokens include IDENTIFIER, offending
token has a literal name in the grammar and looks like a word)
2. Emit a concise, actionable message: "'load' is a reserved keyword...
please use backtick quotes: `load`"
3. Also trim excessively long expected-token lists in other mismatch errors
(> 200 chars) so they do not overwhelm users
Also adds a default empty `<argLine/>` in fe-core/pom.xml so that Maven
Surefire can run unit tests without the JaCoCo coverage profile (fixes
"could not open {argLine}" JVM fork failure).
When a reserved keyword (e.g. LOAD, SELECT) is mistakenly used as an
unquoted identifier, Doris now shows a clear error message explaining the
issue and suggesting backtick quoting (e.g. `load`) instead of dumping
hundreds of expected token names.
- Test: Regression test / Unit Test
- Added NereidsParserTest#testReservedKeywordAsIdentifierError covering
LOAD and SELECT as reserved keywords, and verifying backtick-quoted
identifiers still parse successfully.
- Existing testErrorListener passes unchanged.
- Behavior changed: Yes — parse errors for reserved-keyword-as-identifier
now show an improved human-friendly message instead of the raw ANTLR output.
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
run buildall |
TPC-H: Total hot run time: 29511 ms |
TPC-DS: Total hot run time: 171427 ms |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
…able permission ### What problem does this PR solve? Issue Number: close apache#63225 Problem Summary: Several shell scripts are invoked directly (e.g., ./build.sh) without executable permission, causing "Permission denied" errors during build, FE UT, BE UT, and broker build processes. This PR changes all such invocations to use `bash` prefix to ensure they execute correctly regardless of file permission settings. ### Release note Fix build and test scripts failing due to missing executable permission by using `bash` prefix for shell script invocations. ### Check List (For Author) - Test: Manual test - Successfully built Doris with `bash build.sh --be --fe` - Successfully ran FE unit tests with `bash run-fe-ut.sh` - Successfully built broker module - Behavior changed: No - Does this need documentation: No
Summary
When a user writes a statement like
CREATE DATABASE load, Doris previously produced an overwhelming ANTLR-generated error listing hundreds of expected tokens:This message is cryptic and gives no actionable guidance.
Research: Other Databases
Syntax error: Unexpected keyword LOAD at [1:17]— names the keyword explicitlysyntax error at or near "load"— short and conciseFix
Use a new Error Strategy to process error message:
<argLine/>property so Maven Surefire can run tests without the JaCoCo coverage profileCheck List (For Author)
NereidsParserTest#testReservedKeywordAsIdentifierError