
[improvement](fe) Improve parse error message when reserved keyword is used as identifier #63225

Open: morrySnow wants to merge 1 commit into apache:master from morrySnow:fix-25029


@morrySnow
Contributor

@morrySnow morrySnow commented May 13, 2026

Summary

When a user writes a statement like CREATE DATABASE load, Doris previously produced an overwhelming ANTLR-generated error listing hundreds of expected tokens:

mismatched input 'load' expecting {'{', '}', 'ACTIONS', 'AFTER', 'AGG_STATE', 'AGGREGATE', ...hundreds of tokens...}(line 1, pos 16)

This message is cryptic and gives no actionable guidance.

Research: Other Databases

| Database | Error message |
|---|---|
| BigQuery | Syntax error: Unexpected keyword LOAD at [1:17] (names the keyword explicitly) |
| PostgreSQL/DuckDB | syntax error at or near "load" (short and concise) |
| Spark SQL | Suggests backtick quoting for keyword-as-identifier |
| Trino | Same verbose ANTLR output (same problem) |

Fix

Add a custom ANTLR error strategy that rewrites parser error messages:

  1. Override the reporting of four error types: InputMismatch, NoViableAlternative, UnwantedToken, and MissingToken.
  2. Use a new, concise message format (inspired by BigQuery and Spark):
    Syntax error at or near 'load'(line 1, pos 16)
  3. pom.xml: Add a default empty <argLine/> property so Maven Surefire can run tests without the JaCoCo coverage profile.
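A minimal sketch of the message format in item 2, assuming (this is not taken from the PR) that each of the four overridden report methods ends up calling one shared formatter. To stay self-contained it avoids the ANTLR runtime: the hypothetical `SyntaxErrorMessages.format` takes the offending token's text, its 1-based line, and its 0-based character position, which is how ANTLR's `Token.getLine()` and `Token.getCharPositionInLine()` count.

```java
import java.util.Locale;

// Hypothetical helper, not the Doris implementation. The four overridden
// DefaultErrorStrategy report methods (reportInputMismatch,
// reportNoViableAlternative, reportUnwantedToken, reportMissingToken) are
// assumed to pass the offending token's text and position here.
final class SyntaxErrorMessages {
    private SyntaxErrorMessages() {
    }

    /**
     * Builds "Syntax error at or near '<text>'(line L, pos P)".
     *
     * @param offendingText text of the offending token; null is rendered as <EOF> (an assumption of this sketch)
     * @param line 1-based line number, as returned by Token.getLine()
     * @param charPositionInLine 0-based column, as returned by Token.getCharPositionInLine()
     */
    static String format(String offendingText, int line, int charPositionInLine) {
        String near = (offendingText == null) ? "<EOF>" : offendingText;
        return String.format(Locale.ROOT,
                "Syntax error at or near '%s'(line %d, pos %d)", near, line, charPositionInLine);
    }
}
```

For `CREATE DATABASE load`, the token `load` starts at character offset 16, so the sketch yields `Syntax error at or near 'load'(line 1, pos 16)`. BigQuery's `[1:17]` in the research notes points at the same character using a 1-based column.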

Check List (For Author)

  • Test: Unit Test — added NereidsParserTest#testReservedKeywordAsIdentifierError
  • Behavior changed: Yes — parse errors for reserved-keyword-as-identifier show a human-friendly message instead of raw ANTLR output
  • Does this need documentation: No

@morrySnow
Contributor Author

run buildall

@morrySnow
Contributor Author

/review

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Contributor

@github-actions github-actions Bot left a comment


Review result: no blocking issues found.

Critical checkpoint conclusions:

  • Goal and tests: The change targets reserved-keyword-as-identifier parse errors and adds focused parser unit coverage for load, select, and quoted identifiers. The implementation appears to accomplish that goal.
  • Scope and focus: The PR is small and limited to parser error-message handling, the related unit test, and the surefire argLine default.
  • Concurrency and lifecycle: No concurrency-sensitive or special lifecycle/static initialization behavior is introduced.
  • Configuration and compatibility: No runtime config, storage format, FE-BE protocol, or rolling-upgrade compatibility concern is introduced.
  • Parallel paths: This is localized to the Nereids parser error listener; I did not find another modified path that must be updated for the same behavior.
  • Error handling: The parser still throws SyntaxParseException; the new path only replaces overly verbose ANTLR messages with clearer text when the mismatch is identifiable.
  • Test coverage: Unit coverage is added for the main behavior. I attempted to run mvn -pl fe-core -am -Dtest=org.apache.doris.nereids.parser.NereidsParserTest#testReservedKeywordAsIdentifierError -DfailIfNoTests=false test -Dskip.doc=true, but the runner lacks thirdparty/installed/bin/thrift, so the FE reactor failed before reaching fe-core.
  • Observability: Not applicable; this is user-facing parse error formatting.
  • Transaction, persistence, and data correctness: Not applicable; no data path, transaction, visible-version, or persistence logic is touched.
  • Performance: Parse error handling is not a hot path; no meaningful performance concern found.

User focus points: .opencode-review.zuSvq6/review_focus.txt contains no additional focus points.

@hello-stephen
Contributor

TPC-H: Total hot run time: 29363 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ea8922b6ea8f89cdd551d4997831d67d29f03047, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17794	3813	3804	3804
q2	q3	10720	861	595	595
q4	4658	462	344	344
q5	7458	1329	1140	1140
q6	190	168	137	137
q7	906	943	758	758
q8	9429	1378	1273	1273
q9	6377	5382	5276	5276
q10	6294	2079	1778	1778
q11	480	280	253	253
q12	681	410	289	289
q13	18244	3333	2705	2705
q14	299	286	260	260
q15	q16	898	859	791	791
q17	1091	1036	723	723
q18	6441	5611	5633	5611
q19	1839	1334	1069	1069
q20	508	399	268	268
q21	4771	2354	1958	1958
q22	461	402	331	331
Total cold run time: 99539 ms
Total hot run time: 29363 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4735	4529	4499	4499
q2	q3	4663	4841	4186	4186
q4	2132	2215	1464	1464
q5	4936	5024	5270	5024
q6	194	170	134	134
q7	2052	1775	1617	1617
q8	3321	3055	3100	3055
q9	8446	8478	8352	8352
q10	4554	4485	4226	4226
q11	608	412	399	399
q12	704	744	529	529
q13	3261	3552	3021	3021
q14	310	305	292	292
q15	q16	787	769	708	708
q17	1344	1293	1222	1222
q18	7895	7147	7135	7135
q19	1160	1148	1189	1148
q20	2182	2212	1934	1934
q21	6112	5265	4782	4782
q22	542	475	392	392
Total cold run time: 59938 ms
Total hot run time: 54119 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 171113 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ea8922b6ea8f89cdd551d4997831d67d29f03047, data reload: false

query5	4335	675	510	510
query6	330	219	206	206
query7	4241	538	307	307
query8	323	235	223	223
query9	8816	4060	4030	4030
query10	483	357	312	312
query11	5756	2361	2255	2255
query12	179	131	129	129
query13	1298	631	470	470
query14	6130	5380	5043	5043
query14_1	4362	4397	4374	4374
query15	216	205	186	186
query16	993	456	449	449
query17	1147	780	637	637
query18	2770	510	360	360
query19	228	204	172	172
query20	137	140	129	129
query21	220	140	118	118
query22	13584	13673	13433	13433
query23	17126	16332	15998	15998
query23_1	16093	16127	16107	16107
query24	7452	1749	1357	1357
query24_1	1373	1362	1377	1362
query25	614	530	470	470
query26	1314	305	176	176
query27	2696	601	359	359
query28	4426	1973	1962	1962
query29	1021	660	539	539
query30	316	249	198	198
query31	1132	1073	957	957
query32	100	76	77	76
query33	550	373	301	301
query34	1186	1131	642	642
query35	778	777	701	701
query36	1387	1371	1211	1211
query37	156	105	94	94
query38	3213	3153	3046	3046
query39	923	932	908	908
query39_1	895	871	862	862
query40	230	154	136	136
query41	66	65	63	63
query42	112	112	108	108
query43	334	323	290	290
query44	
query45	211	200	195	195
query46	1044	1184	748	748
query47	2318	2301	2142	2142
query48	416	412	308	308
query49	639	529	439	439
query50	751	293	219	219
query51	4346	4263	4273	4263
query52	107	108	94	94
query53	248	269	201	201
query54	311	277	266	266
query55	96	89	83	83
query56	324	292	305	292
query57	1414	1386	1316	1316
query58	299	268	269	268
query59	1576	1643	1409	1409
query60	340	330	329	329
query61	168	159	159	159
query62	675	635	564	564
query63	252	201	204	201
query64	2394	821	674	674
query65	
query66	1706	508	390	390
query67	29994	29945	29823	29823
query68	
query69	453	379	297	297
query70	985	961	912	912
query71	311	283	273	273
query72	3046	2732	2456	2456
query73	864	739	457	457
query74	5073	4897	4744	4744
query75	2769	2668	2333	2333
query76	2286	1157	765	765
query77	414	439	365	365
query78	12844	12869	12414	12414
query79	1481	935	752	752
query80	1383	593	501	501
query81	526	283	234	234
query82	1290	162	126	126
query83	375	270	246	246
query84	265	141	112	112
query85	932	529	450	450
query86	464	364	318	318
query87	3421	3338	3233	3233
query88	3560	2683	2689	2683
query89	449	385	338	338
query90	1943	180	181	180
query91	180	169	140	140
query92	77	80	75	75
query93	955	950	570	570
query94	736	339	306	306
query95	659	447	345	345
query96	1093	757	322	322
query97	2719	2662	2521	2521
query98	238	228	227	227
query99	1138	1104	969	969
Total cold run time: 254573 ms
Total hot run time: 171113 ms

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 80.00% (16/20) 🎉
Increment coverage report
Complete coverage report

@morrySnow morrySnow force-pushed the fix-25029 branch 2 times, most recently from b990982 to 53ba819 on May 14, 2026 09:28
@morrySnow
Contributor Author

/review

Contributor

@github-actions github-actions Bot left a comment


The PR is focused on FE/Nereids parser error reporting and includes tests for reserved-keyword-as-identifier diagnostics. I found a blocking issue: the implementation emits only a generic syntax error and never produces the reserved-keyword/backtick guidance asserted by the new unit test, so the feature is not actually implemented and the added test should fail once the FE test environment is complete.

Critical checkpoint conclusions:

  • Goal/test proof: Goal is clearer parse errors for reserved keywords. Current code does not accomplish the reserved-keyword guidance path; the added test expects text the code never emits.
  • Scope/focus: The change is small and localized, but the implementation and tests are inconsistent.
  • Concurrency/lifecycle: Not applicable; parser strategy is per parser instance and no shared mutable state was introduced.
  • Config/compatibility/protocol/persistence/data writes: Not applicable.
  • Parallel paths: Nereids parser path was changed; dialect/legacy parser paths were not part of this PR.
  • Error handling: The new strategy changes user-visible parse exceptions, but drops expected-token details and misses the advertised reserved-keyword special handling.
  • Tests: Added unit coverage exists, but it does not match implementation. I attempted mvn -pl fe-core -Dtest=org.apache.doris.nereids.parser.NereidsParserTest#testReservedKeywordAsIdentifierError test -Dskip.doc=true; it could not reach test execution because org.apache.doris:fe-foundation:1.2-SNAPSHOT was unavailable in this runner.
  • Observability/performance: Not applicable beyond parser error formatting.
  • User focus: No additional user-provided review focus was supplied.


@Override
protected void reportInputMismatch(Parser recognizer, InputMismatchException e) {
    recognizer.notifyErrorListeners(e.getOffendingToken(),
Contributor


This path never produces the reserved-keyword diagnostic that the PR advertises and that testReservedKeywordAsIdentifierError asserts. For `CREATE DATABASE load`, SLL bails out, LL reaches this reportInputMismatch, and the resulting message is only `Syntax error at or near 'load'`: there is no reserved-keyword text and no backtick suggestion such as `load`. As a result, the new unit test fails and the user-facing reserved-keyword guidance is not actually implemented. Please detect the identifier-expected/reserved-token case here (or update the tests and PR scope if the intended behavior is only the generic message).

…s used as identifier

Problem Summary: When a user writes a statement like `CREATE DATABASE load`,
Doris produced an overwhelming ANTLR-generated error listing hundreds of
expected tokens:
  mismatched input 'load' expecting {'{', '}', 'ACTIONS', 'AFTER', ...

This message is cryptic and gives no actionable guidance. The root cause is
that `LOAD` is a reserved keyword, so the parser expects an identifier but
gets a keyword token, triggering a huge "expecting {non-reserved keywords}"
list.

Research into how other databases handle this:
- BigQuery: "Syntax error: Unexpected keyword LOAD" (names the keyword)
- PostgreSQL/DuckDB: Short "syntax error at or near" with position indicator
- Spark SQL: Suggests backtick quoting for keyword-as-identifier
- Trino: Same verbose ANTLR output (same problem as Doris)

The fix improves `ParseErrorListener` to:
1. Detect when a reserved keyword is used where an identifier is expected
   (InputMismatchException + expected tokens include IDENTIFIER, offending
   token has a literal name in the grammar and looks like a word)
2. Emit a concise, actionable message: "'load' is a reserved keyword...
   please use backtick quotes: `load`"
3. Also trim excessively long expected-token lists in other mismatch errors
   (> 200 chars) so they do not overwhelm users
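The three steps above can be sketched in plain Java as follows. This is a hypothetical illustration of the heuristic as the commit message describes it, not the code in the PR: ANTLR's `IntervalSet` and `Vocabulary` are replaced by plain string sets, the names `ReservedKeywordHint`, `reservedKeywordHint`, and `trimExpected` are invented, and the hint wording is illustrative because the exact message is elided above. It also assumes keyword literal names are quoted and upper case (for example `'LOAD'`), which is how ANTLR reports the literal name of a lexer rule such as `LOAD: 'LOAD';`.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of the heuristic from the commit message; plain sets of
// token names stand in for ANTLR's IntervalSet and Vocabulary.
final class ReservedKeywordHint {
    // Step 1: "looks like a word" (letters, digits, underscore).
    private static final Pattern WORD = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    // Step 3: length threshold taken from the commit message.
    static final int MAX_EXPECTED_CHARS = 200;

    private ReservedKeywordHint() {
    }

    /**
     * Steps 1 and 2: returns a hint if an identifier was expected but the
     * offending token is a word that the grammar spells as a literal keyword.
     */
    static Optional<String> reservedKeywordHint(Set<String> expectedTokenNames,
            Set<String> grammarLiteralNames, String offendingText) {
        boolean identifierExpected = expectedTokenNames.contains("IDENTIFIER");
        // Assumes literal names are quoted and upper case, e.g. 'LOAD'.
        String literal = "'" + offendingText.toUpperCase(Locale.ROOT) + "'";
        boolean isKeyword = grammarLiteralNames.contains(literal);
        boolean looksLikeWord = WORD.matcher(offendingText).matches();
        if (identifierExpected && isKeyword && looksLikeWord) {
            // Illustrative wording only; the PR's exact text is not shown.
            return Optional.of("'" + offendingText + "' is a reserved keyword; "
                    + "to use it as an identifier, quote it with backticks: `" + offendingText + "`");
        }
        return Optional.empty();
    }

    /** Step 3: cuts an over-long expected-token list and marks the cut with "...". */
    static String trimExpected(String expected) {
        if (expected.length() <= MAX_EXPECTED_CHARS) {
            return expected;
        }
        return expected.substring(0, MAX_EXPECTED_CHARS) + "...";
    }
}
```

The `looksLikeWord` check matters because punctuation such as `{` also has a literal name in the grammar, yet suggesting backtick quoting for it would not help, so the sketch only offers the hint for word-like tokens.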

Also adds a default empty `<argLine/>` in fe-core/pom.xml so that Maven
Surefire can run unit tests without the JaCoCo coverage profile (fixes
"could not open {argLine}" JVM fork failure).

When a reserved keyword (e.g. LOAD, SELECT) is mistakenly used as an
unquoted identifier, Doris now shows a clear error message explaining the
issue and suggesting backtick quoting (e.g. `load`) instead of dumping
hundreds of expected token names.

- Test: Regression test / Unit Test
    - Added NereidsParserTest#testReservedKeywordAsIdentifierError covering
      LOAD and SELECT as reserved keywords, and verifying backtick-quoted
      identifiers still parse successfully.
    - Existing testErrorListener passes unchanged.
- Behavior changed: Yes — parse errors for reserved-keyword-as-identifier
  now show an improved human-friendly message instead of the raw ANTLR output.
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@morrySnow
Contributor Author

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 29511 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit abfd991a3fc893b946bbba65aee037ca84f0ba8d, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17621	3888	3931	3888
q2	q3	10703	865	603	603
q4	4673	462	358	358
q5	7443	1328	1151	1151
q6	223	168	140	140
q7	927	940	754	754
q8	9735	1405	1291	1291
q9	6341	5361	5322	5322
q10	6347	2089	1804	1804
q11	478	260	266	260
q12	688	417	289	289
q13	18225	3332	2741	2741
q14	299	284	267	267
q15	q16	908	860	783	783
q17	1126	1101	749	749
q18	6537	5712	5509	5509
q19	1456	1218	1164	1164
q20	511	396	261	261
q21	4529	2282	1871	1871
q22	429	360	306	306
Total cold run time: 99199 ms
Total hot run time: 29511 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4312	4249	4165	4165
q2	q3	4643	4877	4151	4151
q4	2116	2174	1403	1403
q5	4940	4937	5240	4937
q6	187	165	132	132
q7	2018	2132	1635	1635
q8	3438	3154	3193	3154
q9	8517	8448	8399	8399
q10	4467	4500	4253	4253
q11	643	429	393	393
q12	707	752	511	511
q13	3205	3619	2879	2879
q14	291	307	277	277
q15	q16	771	775	698	698
q17	1401	1348	1315	1315
q18	8202	7159	7030	7030
q19	1159	1149	1168	1149
q20	2243	2274	1936	1936
q21	6237	5500	4893	4893
q22	579	532	432	432
Total cold run time: 60076 ms
Total hot run time: 53742 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 171427 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit abfd991a3fc893b946bbba65aee037ca84f0ba8d, data reload: false

query5	4347	647	522	522
query6	333	219	202	202
query7	4296	583	304	304
query8	328	232	220	220
query9	8817	4011	4060	4011
query10	494	351	293	293
query11	6017	2357	2199	2199
query12	195	127	128	127
query13	1271	612	434	434
query14	6896	5387	5061	5061
query14_1	4391	4403	4362	4362
query15	226	205	189	189
query16	1000	455	402	402
query17	1400	774	648	648
query18	2728	482	365	365
query19	321	210	174	174
query20	139	134	131	131
query21	221	147	116	116
query22	13566	13571	13536	13536
query23	17126	16364	16676	16364
query23_1	16442	16314	16395	16314
query24	7690	1844	1460	1460
query24_1	1407	1429	1420	1420
query25	619	552	496	496
query26	1357	322	228	228
query27	2963	619	349	349
query28	4343	1966	1931	1931
query29	1004	625	522	522
query30	300	228	199	199
query31	1130	1068	930	930
query32	85	69	69	69
query33	527	339	299	299
query34	1150	1134	642	642
query35	771	779	664	664
query36	1300	1321	1108	1108
query37	153	100	86	86
query38	3193	3107	3078	3078
query39	927	917	886	886
query39_1	899	865	861	861
query40	231	149	132	132
query41	63	102	59	59
query42	107	108	117	108
query43	318	324	281	281
query44	
query45	203	205	191	191
query46	1081	1174	755	755
query47	2320	2260	2173	2173
query48	399	431	296	296
query49	637	540	429	429
query50	702	283	207	207
query51	4269	4322	4219	4219
query52	103	103	93	93
query53	257	267	202	202
query54	305	269	247	247
query55	91	88	84	84
query56	298	307	294	294
query57	1413	1364	1310	1310
query58	283	259	254	254
query59	1561	1609	1426	1426
query60	338	331	317	317
query61	159	156	154	154
query62	671	618	559	559
query63	241	206	207	206
query64	2347	821	670	670
query65	
query66	1684	505	386	386
query67	29993	30195	29766	29766
query68	
query69	456	341	302	302
query70	1051	931	939	931
query71	305	277	258	258
query72	2884	2704	2470	2470
query73	850	733	414	414
query74	5071	4868	4735	4735
query75	2814	2645	2305	2305
query76	2334	1117	784	784
query77	418	445	359	359
query78	12972	12978	12329	12329
query79	1491	994	747	747
query80	672	574	482	482
query81	453	277	241	241
query82	1320	158	122	122
query83	355	267	245	245
query84	262	144	111	111
query85	862	521	447	447
query86	391	328	300	300
query87	3403	3339	3172	3172
query88	3557	2689	2671	2671
query89	436	377	333	333
query90	1903	187	182	182
query91	177	170	142	142
query92	78	73	69	69
query93	958	973	550	550
query94	540	341	297	297
query95	664	368	440	368
query96	1088	799	340	340
query97	2718	2688	2561	2561
query98	234	228	224	224
query99	1103	1116	961	961
Total cold run time: 255168 ms
Total hot run time: 171427 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 63.04% (29/46) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 16.96% (29/171) 🎉
Increment coverage report
Complete coverage report

heguanhui pushed a commit to heguanhui/doris that referenced this pull request May 17, 2026
…able permission

### What problem does this PR solve?

Issue Number: close apache#63225

Problem Summary: Several shell scripts are invoked directly (e.g., ./build.sh) without executable permission, causing "Permission denied" errors during build, FE UT, BE UT, and broker build processes. This PR changes all such invocations to use `bash` prefix to ensure they execute correctly regardless of file permission settings.

### Release note

Fix build and test scripts failing due to missing executable permission by using `bash` prefix for shell script invocations.

### Check List (For Author)

- Test: Manual test
    - Successfully built Doris with `bash build.sh --be --fe`
    - Successfully ran FE unit tests with `bash run-fe-ut.sh`
    - Successfully built broker module
- Behavior changed: No
- Does this need documentation: No