@suxiaogang223
Contributor

What problem does this PR solve?

close: #56328

Problem Summary:
The current close() method in the Scanner class and its derived classes is not thread-safe. In multi-threaded environments, multiple threads may call close() concurrently, leading to:

  1. Race conditions: multiple threads may check the _is_closed flag and run the close logic at the same time
  2. Double close: resources may be released or cleaned up more than once
  3. Potential memory leaks or crashes
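To illustrate the race, here is a minimal sketch built around a hypothetical DemoScanner class (not the real Doris Scanner). With a plain bool, the check-then-act sequence "if (_is_closed) return; ... _is_closed = true;" lets two threads both pass the check, and the unsynchronized access is a data race, which is undefined behavior in C++. With an atomic compare-and-swap, exactly one caller wins and runs the cleanup. The run_concurrent_close helper is also hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a scanner; not the Doris Scanner class.
class DemoScanner {
public:
    // Safe to call from many threads: only the first caller runs the cleanup.
    void close() {
        bool expected = false;
        // Atomically flip false -> true; every later caller observes true and returns.
        if (!_is_closed.compare_exchange_strong(expected, true)) {
            return;
        }
        _cleanup_count.fetch_add(1);  // stands in for releasing readers, JNI handles, etc.
    }

    int cleanup_count() const { return _cleanup_count.load(); }

private:
    std::atomic<bool> _is_closed{false};
    std::atomic<int> _cleanup_count{0};
};

// Calls close() from num_threads threads at once and reports how often cleanup ran.
int run_concurrent_close(int num_threads) {
    assert(num_threads > 0);
    DemoScanner scanner;
    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(num_threads));
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&scanner] { scanner.close(); });
    }
    for (auto& t : threads) {
        t.join();
    }
    return scanner.cleanup_count();
}
```

The cleanup counter stands in for whatever a real scanner releases; with the atomic guard it ends at 1 regardless of how many threads call close().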

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Contributor Author

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 37.50% (6/16) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.75% (18057/34231)
Line Coverage 37.98% (163710/431035)
Region Coverage 32.36% (124795/385648)
Branch Coverage 33.73% (54561/161775)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 62.50% (10/16) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.37% (23935/33536)
Line Coverage 57.79% (249000/430843)
Region Coverage 52.95% (206728/390413)
Branch Coverage 54.65% (88844/162558)

morningman
morningman previously approved these changes Oct 29, 2025
Contributor

@morningman morningman left a comment

LGTM

@morningman morningman requested a review from Copilot October 29, 2025 07:05
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Oct 29, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@morningman
Contributor

run performance

Copilot AI left a comment

Pull Request Overview

This PR makes the _is_closed flag in the Scanner class thread-safe by converting it from a plain bool to std::atomic<bool>, and updates all close() methods in scanner implementations to use atomic compare-and-swap operations instead of simple boolean checks.

  • Changed _is_closed from bool to std::atomic<bool> in the Scanner base class
  • Updated all close() implementations to use compare_exchange_strong for thread-safe double-close prevention
  • Fixed a typo in a comment from "dctor" to "destructor"
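A short sketch of the std::atomic<bool> compare_exchange_strong semantics these changes rely on; the helper names are hypothetical. One detail worth noting: when the CAS fails, it overwrites expected with the value it observed, so expected must be re-initialized to false before every attempt, as the close() implementations quoted in the review comments do by declaring it locally. For a bool flag, exchange(true) is an equivalent and slightly simpler alternative.

```cpp
#include <atomic>
#include <cassert>

// Returns true only for the call that actually flips the flag from false to true.
bool try_mark_closed(std::atomic<bool>& flag) {
    bool expected = false;  // reset on every call: a failed CAS overwrites it
    return flag.compare_exchange_strong(expected, true);
}

// Equivalent alternative: exchange() stores true and returns the previous value.
bool try_mark_closed_exchange(std::atomic<bool>& flag) {
    return !flag.exchange(true);
}

// Demonstrates that a failed CAS writes the observed value back into expected.
void check_failed_cas_updates_expected() {
    std::atomic<bool> flag{true};
    bool expected = false;
    const bool swapped = flag.compare_exchange_strong(expected, true);
    assert(!swapped);  // the stored value (true) did not match expected (false)
    assert(expected);  // so expected was overwritten with the stored value
    (void)swapped;
}
```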

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

File Description
be/src/vec/exec/scan/scanner.h Changed _is_closed to std::atomic<bool> and added <atomic> include
be/src/vec/exec/scan/scanner.cpp Updated base Scanner::close() to use atomic compare-exchange
be/src/vec/exec/scan/scan_node.h Fixed typo in comment (dctor → destructor)
be/src/vec/exec/scan/olap_scanner.cpp Updated OlapScanner::close() to use atomic compare-exchange
be/src/vec/exec/scan/meta_scanner.cpp Updated MetaScanner::close() to use atomic compare-exchange
be/src/vec/exec/scan/jdbc_scanner.cpp Updated JdbcScanner::close() to use atomic compare-exchange
be/src/vec/exec/scan/file_scanner.cpp Updated FileScanner::close() to use atomic compare-exchange
be/src/vec/exec/scan/es_scanner.cpp Updated EsScanner::close() to use atomic compare-exchange
Comments suppressed due to low confidence (3)

be/src/vec/exec/scan/meta_scanner.cpp:531

  • Double close protection is flawed: this method sets _is_closed to true at line 525, so when it later calls the parent Scanner::close(state) at line 531, the parent's own compare-exchange on _is_closed always fails (expected=false does not match the current value true), making line 531 effectively a no-op. In addition, if _reader->close() returns an error, RETURN_IF_ERROR returns early: _is_closed stays true, but Scanner::close() is never called, potentially leaking resources. The parent's close() should be called unconditionally, or the logic should be restructured.
    bool expected = false;
    if (!_is_closed.compare_exchange_strong(expected, true)) {
        return Status::OK();
    }
    if (_reader) {
        RETURN_IF_ERROR(_reader->close());
    }
    RETURN_IF_ERROR(Scanner::close(state));
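To make the first point concrete, here is a minimal, hypothetical model of the quoted pattern (BaseScanner and DerivedScanner are illustrative, not Doris classes, and Status handling is dropped). Because the derived close() flips the shared flag before delegating, the base close() sees true and skips its own cleanup.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified model of the quoted pattern; not Doris code.
struct BaseScanner {
    virtual ~BaseScanner() = default;
    virtual void close() {
        bool expected = false;
        if (!_is_closed.compare_exchange_strong(expected, true)) {
            return;  // treated as "already closed"
        }
        base_cleanup_ran = true;  // stands in for the base-class cleanup
    }
    std::atomic<bool> _is_closed{false};
    bool base_cleanup_ran = false;
};

struct DerivedScanner : BaseScanner {
    void close() override {
        bool expected = false;
        if (!_is_closed.compare_exchange_strong(expected, true)) {
            return;
        }
        reader_closed = true;  // stands in for _reader->close()
        BaseScanner::close();  // flag is already true, so this returns immediately
    }
    bool reader_closed = false;
};

// Returns true when the derived cleanup ran but the base-class cleanup did not.
bool base_cleanup_is_skipped() {
    DerivedScanner scanner;
    scanner.close();
    assert(scanner.reader_closed);  // derived cleanup ran
    return !scanner.base_cleanup_ran;
}
```

Under this model base_cleanup_is_skipped() returns true, matching the reviewer's observation that the parent call is a no-op.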

be/src/vec/exec/scan/file_scanner.cpp:1726

  • Double close protection is flawed: this method sets _is_closed to true at line 1718, so the later call to the parent Scanner::close(state) at line 1726 performs a compare-exchange that always fails, making line 1726 a no-op. Additionally, if _cur_reader->close() fails at line 1723, RETURN_IF_ERROR returns early: _is_closed stays true and the parent close is never executed, potentially leaking resources. The parent's close should be called unconditionally.
    bool expected = false;
    if (!_is_closed.compare_exchange_strong(expected, true)) {
        return Status::OK();
    }

    if (_cur_reader) {
        RETURN_IF_ERROR(_cur_reader->close());
    }

    RETURN_IF_ERROR(Scanner::close(state));
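One way to address the early-return problem, sketched under assumptions: Status, Reader, and FileScannerSketch are hypothetical stand-ins (Doris has its own Status class), and base_cleanup() represents base-class cleanup with no guard of its own. In the quoted code the parent close() does have a guard, so it would need to be split along these lines. Instead of RETURN_IF_ERROR, the reader's error is remembered, the base cleanup always runs, and the first error is returned.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical minimal Status type; not the Doris Status class.
struct Status {
    bool ok = true;
    std::string msg;
    static Status OK() { return {}; }
    static Status Error(std::string m) { return {false, std::move(m)}; }
};

// Hypothetical reader whose close() can fail.
struct Reader {
    bool fail_on_close = false;
    Status close() { return fail_on_close ? Status::Error("reader close failed") : Status::OK(); }
};

struct FileScannerSketch {
    std::atomic<bool> is_closed{false};
    Reader* cur_reader = nullptr;
    bool base_cleanup_ran = false;

    // Stand-in for base-class cleanup, deliberately without its own guard.
    Status base_cleanup() {
        base_cleanup_ran = true;
        return Status::OK();
    }

    Status close() {
        bool expected = false;
        if (!is_closed.compare_exchange_strong(expected, true)) {
            return Status::OK();
        }
        Status st = Status::OK();
        if (cur_reader != nullptr) {
            st = cur_reader->close();  // remember the error instead of returning early
        }
        Status base_st = base_cleanup();  // always runs, even after a reader failure
        return st.ok ? base_st : st;      // report the first error
    }
};

// Usage: a failing reader no longer prevents the base cleanup.
void demo_failing_reader() {
    Reader reader{true};
    FileScannerSketch scanner;
    scanner.cur_reader = &reader;
    Status st = scanner.close();
    assert(!st.ok);                    // the reader error is still reported
    assert(scanner.base_cleanup_ran);  // but base cleanup ran anyway
    assert(scanner.close().ok);        // a second close() is a no-op
}
```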

be/src/vec/exec/scan/es_scanner.cpp:203

  • Double close protection is flawed: this method sets _is_closed to true at line 195, so the later call to the parent Scanner::close(state) at line 203 performs a compare-exchange that always fails, making line 203 a no-op. Additionally, if _es_reader->close() fails at line 200, RETURN_IF_ERROR returns early: _is_closed stays true and the parent close is never executed, potentially leaking resources. The parent's close should be called unconditionally.
    bool expected = false;
    if (!_is_closed.compare_exchange_strong(expected, true)) {
        return Status::OK();
    }

    if (_es_reader != nullptr) {
        RETURN_IF_ERROR(_es_reader->close());
    }

    RETURN_IF_ERROR(Scanner::close(state));
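Since the same flaw appears in all three scanners, one possible restructuring is a template-method layout: a non-virtual close() in the base class owns the only compare-exchange, and subclasses override a hook that closes their own readers. This is a sketch, not necessarily what the PR ended up merging (the final code is not shown here); ScannerBase, EsScannerSketch, and the hook names are hypothetical, and Status propagation is omitted for brevity.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: the public close() is non-virtual and holds the only guard.
class ScannerBase {
public:
    virtual ~ScannerBase() = default;

    void close() {
        bool expected = false;
        if (!_is_closed.compare_exchange_strong(expected, true)) {
            return;  // another caller already closed this scanner
        }
        _close_subclass_resources();  // e.g. the reader close in each scanner
        _close_base_resources();      // always runs after the hook
    }

    const std::vector<std::string>& events() const { return _events; }

protected:
    // Subclasses release their own resources here and never touch the flag.
    virtual void _close_subclass_resources() {}
    void record(const std::string& e) { _events.push_back(e); }

private:
    void _close_base_resources() { record("base"); }

    std::atomic<bool> _is_closed{false};
    std::vector<std::string> _events;  // written only by the thread that wins the CAS
};

class EsScannerSketch : public ScannerBase {
protected:
    void _close_subclass_resources() override { record("es_reader"); }
};

void demo_double_close() {
    EsScannerSketch scanner;
    scanner.close();
    scanner.close();  // second call returns at the guard
    assert(scanner.events().size() == 2);
}
```

With this layout a subclass cannot forget the base cleanup or trip the guard itself; the cost is that each scanner's existing close() body has to move into the hook.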

@doris-robot

ClickBench: Total hot run time: 27.81 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 12025331ed27cd897a05a3e1678699661f289a15, data reload: false

query1	0.07	0.05	0.05
query2	0.09	0.05	0.05
query3	0.26	0.08	0.08
query4	1.60	0.11	0.12
query5	0.28	0.26	0.26
query6	1.17	0.68	0.66
query7	0.04	0.02	0.02
query8	0.05	0.04	0.04
query9	0.63	0.53	0.52
query10	0.58	0.56	0.56
query11	0.16	0.11	0.11
query12	0.16	0.13	0.12
query13	0.62	0.60	0.60
query14	1.02	1.01	1.02
query15	0.85	0.84	0.85
query16	0.40	0.40	0.39
query17	1.07	1.09	1.05
query18	0.22	0.20	0.21
query19	1.94	1.77	1.78
query20	0.01	0.02	0.01
query21	15.45	0.18	0.12
query22	5.09	0.07	0.05
query23	15.69	0.27	0.10
query24	3.28	0.53	0.87
query25	0.07	0.07	0.05
query26	0.14	0.14	0.13
query27	0.06	0.05	0.05
query28	5.34	1.14	0.93
query29	12.67	3.99	3.33
query30	0.28	0.15	0.11
query31	2.81	0.60	0.38
query32	3.23	0.56	0.48
query33	3.06	3.13	3.09
query34	15.95	5.15	4.62
query35	4.60	4.54	4.57
query36	0.67	0.50	0.49
query37	0.09	0.07	0.07
query38	0.07	0.04	0.04
query39	0.04	0.02	0.02
query40	0.17	0.16	0.14
query41	0.08	0.03	0.04
query42	0.03	0.03	0.03
query43	0.04	0.03	0.04
Total cold run time: 100.13 s
Total hot run time: 27.81 s

@suxiaogang223
Contributor Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Oct 29, 2025
@doris-robot

TPC-DS: Total hot run time: 190348 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1827e2e91b5489ecbdf5960a77557c5c27bdb456, data reload: false

query1	1092	394	395	394
query2	6578	1694	1693	1693
query3	6752	229	223	223
query4	25682	24158	23126	23126
query5	4822	628	499	499
query6	342	254	238	238
query7	4658	508	313	313
query8	314	279	258	258
query9	8704	2617	2639	2617
query10	571	357	286	286
query11	15476	15122	14958	14958
query12	187	122	115	115
query13	1672	567	421	421
query14	11220	9228	9261	9228
query15	229	195	191	191
query16	7681	698	530	530
query17	1188	801	641	641
query18	2055	442	344	344
query19	246	242	203	203
query20	151	149	143	143
query21	223	161	141	141
query22	4870	4730	4607	4607
query23	35003	33926	34152	33926
query24	8574	2538	2505	2505
query25	654	560	464	464
query26	1326	279	162	162
query27	2931	546	366	366
query28	4504	2265	2201	2201
query29	815	686	519	519
query30	318	237	225	225
query31	956	852	739	739
query32	79	74	79	74
query33	607	407	351	351
query34	854	920	553	553
query35	834	877	787	787
query36	989	987	889	889
query37	122	116	84	84
query38	3514	3555	3464	3464
query39	1484	1411	1424	1411
query40	224	131	120	120
query41	59	59	57	57
query42	125	111	110	110
query43	505	505	482	482
query44	1240	746	743	743
query45	187	179	172	172
query46	903	995	646	646
query47	1769	1799	1728	1728
query48	395	434	312	312
query49	764	503	426	426
query50	669	708	411	411
query51	3843	3866	3811	3811
query52	110	109	107	107
query53	239	270	199	199
query54	607	595	526	526
query55	86	89	86	86
query56	358	322	303	303
query57	1183	1196	1134	1134
query58	297	284	299	284
query59	2555	2627	2623	2623
query60	349	361	330	330
query61	158	156	157	156
query62	803	756	667	667
query63	234	196	193	193
query64	4497	1169	863	863
query65	4037	3971	3980	3971
query66	1147	440	340	340
query67	15634	15339	15126	15126
query68	8786	909	596	596
query69	488	330	303	303
query70	1314	1270	1242	1242
query71	504	351	333	333
query72	5947	4910	4921	4910
query73	739	592	357	357
query74	9231	9200	8713	8713
query75	4008	3331	2860	2860
query76	3669	1171	753	753
query77	805	403	315	315
query78	9738	9768	8977	8977
query79	2271	851	601	601
query80	650	595	528	528
query81	488	278	229	229
query82	428	174	134	134
query83	302	279	265	265
query84	313	124	98	98
query85	896	469	432	432
query86	344	328	283	283
query87	3775	3733	3653	3653
query88	3195	2232	2247	2232
query89	398	336	301	301
query90	2024	224	220	220
query91	171	173	142	142
query92	87	80	67	67
query93	1606	1012	653	653
query94	703	459	338	338
query95	413	331	319	319
query96	526	591	287	287
query97	2925	3002	2883	2883
query98	241	225	213	213
query99	1460	1406	1296	1296
Total cold run time: 279699 ms
Total hot run time: 190348 ms

@doris-robot

ClickBench: Total hot run time: 28.19 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1827e2e91b5489ecbdf5960a77557c5c27bdb456, data reload: false

query1	0.05	0.05	0.06
query2	0.10	0.05	0.05
query3	0.26	0.09	0.09
query4	1.61	0.12	0.12
query5	0.28	0.26	0.24
query6	1.17	0.66	0.64
query7	0.04	0.03	0.02
query8	0.06	0.05	0.05
query9	0.64	0.54	0.53
query10	0.59	0.59	0.58
query11	0.17	0.11	0.12
query12	0.16	0.13	0.12
query13	0.63	0.61	0.60
query14	1.00	1.02	1.01
query15	0.86	0.85	0.86
query16	0.40	0.39	0.42
query17	1.02	1.02	1.04
query18	0.22	0.20	0.20
query19	1.92	1.77	1.79
query20	0.02	0.02	0.01
query21	15.44	0.17	0.14
query22	5.12	0.07	0.06
query23	15.66	0.27	0.11
query24	3.08	0.76	1.38
query25	0.09	0.07	0.06
query26	0.15	0.15	0.14
query27	0.07	0.06	0.06
query28	5.81	1.17	0.94
query29	12.57	4.03	3.36
query30	0.30	0.14	0.11
query31	2.83	0.60	0.38
query32	3.23	0.58	0.48
query33	3.06	3.11	3.07
query34	15.78	5.16	4.61
query35	4.57	4.58	4.61
query36	0.71	0.52	0.49
query37	0.11	0.07	0.07
query38	0.07	0.04	0.04
query39	0.04	0.03	0.03
query40	0.19	0.16	0.15
query41	0.09	0.04	0.04
query42	0.05	0.03	0.04
query43	0.04	0.04	0.03
Total cold run time: 100.26 s
Total hot run time: 28.19 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 46.15% (6/13) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.75% (18067/34251)
Line Coverage 37.98% (163834/431353)
Region Coverage 32.35% (124946/386181)
Branch Coverage 33.71% (54614/161991)

@suxiaogang223
Contributor Author

run external

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Oct 30, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 61.54% (8/13) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.22% (23947/33625)
Line Coverage 57.62% (248722/431692)
Region Coverage 52.64% (206051/391461)
Branch Coverage 54.46% (88752/162976)

@morningman morningman merged commit bf0b87c into apache:master Oct 31, 2025
28 of 29 checks passed
github-actions bot pushed a commit that referenced this pull request Oct 31, 2025
…perations (#57436)

suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Nov 3, 2025
…perations (apache#57436)

morrySnow pushed a commit that referenced this pull request Nov 5, 2025
@suxiaogang223 suxiaogang223 deleted the fix_jni_close branch January 6, 2026 14:52
suxiaogang223 added a commit to suxiaogang223/doris that referenced this pull request Jan 6, 2026
…perations (apache#57436)


Labels

approved Indicates a PR has been approved by one committer. dev/3.0.x dev/3.1.3-merged dev/4.0.x reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] in massive scantasks case, be will crash in jni_connector.close() if scanner failed

6 participants