[fix] Fixed length error in compress.cpp#48139

Closed
lzyy2024 wants to merge 487 commits into apache:master from lzyy2024:debug-compress_function

@lzyy2024
Contributor

What problem does this PR solve?

Fixed length error in compress.cpp

Issue Number: close #xxx

Related PR: #47307

Problem Summary:
The compressed string length should be represented by 4 bytes instead of 10. I also replaced the magic value with a named constant.
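
As a rough illustration of the idea (not the actual code in compress.cpp), the sketch below reserves a fixed 4-byte length prefix in front of a compressed payload and names that size with a constant instead of a bare number. The names `kLengthPrefixSize`, `append_length_prefixed`, and `read_length_prefix` are hypothetical, and the little-endian byte order is an assumption made only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical constant: number of bytes reserved for the length prefix.
constexpr std::size_t kLengthPrefixSize = sizeof(std::uint32_t);  // 4 bytes

// Appends a 4-byte length (little-endian, an assumption of this sketch)
// followed by the compressed payload to col_data.
void append_length_prefixed(std::string& col_data, std::uint32_t length,
                            const std::string& compressed) {
    const std::size_t old_size = col_data.size();
    // Grow the buffer once: prefix bytes plus payload bytes.
    col_data.resize(old_size + kLengthPrefixSize + compressed.size());
    for (std::size_t i = 0; i < kLengthPrefixSize; ++i) {
        col_data[old_size + i] = static_cast<char>((length >> (8 * i)) & 0xFF);
    }
    std::copy(compressed.begin(), compressed.end(),
              col_data.begin() + static_cast<std::ptrdiff_t>(old_size + kLengthPrefixSize));
}

// Reads the 4-byte length prefix back from col_data starting at offset.
std::uint32_t read_length_prefix(const std::string& col_data, std::size_t offset) {
    assert(offset + kLengthPrefixSize <= col_data.size());
    std::uint32_t length = 0;
    for (std::size_t i = 0; i < kLengthPrefixSize; ++i) {
        const auto byte = static_cast<unsigned char>(col_data[offset + i]);
        length |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    return length;
}
```

Using one constant for both the resize and the encode/decode loops keeps the prefix width consistent in every place that touches the buffer.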

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@lzyy2024
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31473 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 668ad16296a32356b8b74f29295c4ae2e6a6b66a, data reload: false

------ Round 1 ----------------------------------
q1	17579	5386	5042	5042
q2	2047	306	169	169
q3	10409	1348	711	711
q4	10205	1011	534	534
q5	7528	2498	2285	2285
q6	193	179	138	138
q7	911	758	605	605
q8	9302	1366	1229	1229
q9	4845	4539	4624	4539
q10	6822	2327	1874	1874
q11	481	283	260	260
q12	353	366	216	216
q13	17769	3658	3102	3102
q14	221	235	215	215
q15	506	462	466	462
q16	624	612	578	578
q17	580	859	357	357
q18	6609	6274	6230	6230
q19	1204	941	534	534
q20	308	318	188	188
q21	2796	2150	1903	1903
q22	360	332	302	302
Total cold run time: 101652 ms
Total hot run time: 31473 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5092	5104	5133	5104
q2	221	329	234	234
q3	2164	2640	2316	2316
q4	1410	1825	1337	1337
q5	4180	4277	4129	4129
q6	211	158	120	120
q7	1872	1809	1643	1643
q8	2591	2539	2472	2472
q9	7348	7237	7157	7157
q10	3000	3169	2891	2891
q11	584	506	495	495
q12	681	758	612	612
q13	3517	3929	3282	3282
q14	284	312	295	295
q15	492	468	466	466
q16	627	664	629	629
q17	1135	1649	1295	1295
q18	7578	7304	7311	7304
q19	799	787	853	787
q20	1948	1981	1864	1864
q21	5237	4915	4855	4855
q22	606	595	550	550
Total cold run time: 51577 ms
Total hot run time: 49837 ms

@doris-robot

TPC-DS: Total hot run time: 183675 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 668ad16296a32356b8b74f29295c4ae2e6a6b66a, data reload: false

query1	962	375	378	375
query2	6519	1879	1813	1813
query3	6788	218	212	212
query4	26787	23460	23394	23394
query5	4632	657	481	481
query6	305	199	200	199
query7	4628	498	298	298
query8	291	244	227	227
query9	8632	2575	2562	2562
query10	471	311	252	252
query11	15305	15121	14894	14894
query12	153	106	102	102
query13	1655	508	409	409
query14	9551	6332	6325	6325
query15	221	188	173	173
query16	7644	606	447	447
query17	1145	684	533	533
query18	1976	386	292	292
query19	184	177	150	150
query20	119	116	114	114
query21	212	122	102	102
query22	4375	4496	4365	4365
query23	33865	32903	33017	32903
query24	7793	2358	2422	2358
query25	553	481	421	421
query26	1229	273	163	163
query27	2486	485	336	336
query28	4197	2444	2416	2416
query29	723	575	452	452
query30	231	182	151	151
query31	932	858	755	755
query32	76	69	64	64
query33	577	386	312	312
query34	772	855	512	512
query35	814	887	728	728
query36	970	993	902	902
query37	129	104	75	75
query38	4089	4102	3973	3973
query39	1455	1475	1431	1431
query40	206	113	101	101
query41	54	55	52	52
query42	134	105	101	101
query43	501	512	478	478
query44	1269	801	794	794
query45	174	163	158	158
query46	852	1023	641	641
query47	1766	1803	1724	1724
query48	386	409	299	299
query49	777	490	414	414
query50	732	742	439	439
query51	4178	4160	4109	4109
query52	108	108	96	96
query53	229	255	187	187
query54	480	477	408	408
query55	83	73	77	73
query56	264	249	277	249
query57	1147	1147	1084	1084
query58	242	241	246	241
query59	2545	2858	2639	2639
query60	289	267	250	250
query61	117	111	113	111
query62	799	725	674	674
query63	239	192	201	192
query64	4280	999	643	643
query65	3187	3113	3105	3105
query66	1051	410	317	317
query67	16069	15750	15410	15410
query68	6873	763	527	527
query69	471	293	276	276
query70	1154	1107	1126	1107
query71	399	294	255	255
query72	5668	3554	3669	3554
query73	740	739	358	358
query74	9355	9293	8914	8914
query75	3133	3165	2705	2705
query76	3245	1163	725	725
query77	463	366	290	290
query78	9962	9915	9243	9243
query79	2414	848	657	657
query80	635	529	448	448
query81	486	269	245	245
query82	657	125	96	96
query83	175	172	156	156
query84	232	96	74	74
query85	767	347	313	313
query86	378	309	279	279
query87	4443	4418	4369	4369
query88	3957	2217	2201	2201
query89	404	318	285	285
query90	1887	197	192	192
query91	140	138	109	109
query92	79	60	64	60
query93	1763	1012	588	588
query94	698	405	270	270
query95	352	267	252	252
query96	493	553	265	265
query97	2793	2818	2745	2745
query98	234	198	204	198
query99	1612	1406	1259	1259
Total cold run time: 271666 ms
Total hot run time: 183675 ms

@doris-robot

ClickBench: Total hot run time: 30.05 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 668ad16296a32356b8b74f29295c4ae2e6a6b66a, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.03	0.03
query3	0.23	0.07	0.07
query4	1.61	0.11	0.09
query5	0.42	0.42	0.44
query6	1.18	0.66	0.65
query7	0.02	0.02	0.02
query8	0.04	0.03	0.04
query9	0.60	0.51	0.52
query10	0.57	0.57	0.58
query11	0.15	0.10	0.10
query12	0.14	0.11	0.12
query13	0.62	0.59	0.60
query14	2.70	2.72	2.70
query15	0.92	0.83	0.84
query16	0.38	0.37	0.38
query17	1.09	1.06	1.02
query18	0.21	0.19	0.20
query19	1.90	1.79	2.00
query20	0.01	0.01	0.02
query21	15.35	0.92	0.54
query22	0.75	1.13	0.65
query23	14.97	1.37	0.66
query24	11.92	1.11	0.39
query25	0.31	0.08	0.14
query26	0.97	0.20	0.13
query27	0.05	0.04	0.04
query28	6.15	0.76	0.44
query29	12.54	3.85	3.28
query30	0.25	0.09	0.06
query31	2.83	0.58	0.38
query32	3.23	0.54	0.46
query33	3.02	3.13	2.99
query34	15.81	5.10	4.52
query35	4.54	4.51	4.58
query36	0.67	0.50	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.02
query40	0.16	0.14	0.12
query41	0.08	0.03	0.02
query42	0.03	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.75 s
Total hot run time: 30.05 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 43.83% (11639/26556)
Line Coverage: 33.74% (97553/289093)
Region Coverage: 32.85% (49948/152064)
Branch Coverage: 28.56% (25109/87902)
Coverage Report: http://coverage.selectdb-in.cc/coverage/668ad16296a32356b8b74f29295c4ae2e6a6b66a_668ad16296a32356b8b74f29295c4ae2e6a6b66a/report/index.html

  // Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK
  RETURN_IF_ERROR(compression_codec->compress(data, &compressed_str));
- col_data.resize(col_data.size() + 4 + compressed_str.size());
+ col_data.resize(col_data.size() + compressed_str_length + compressed_str.size());
Contributor

Please add some SQL-based Groovy regression tests and a BE unit test (beut) for this function.

zgxme and others added 23 commits February 21, 2025 19:23
…arameter does not work (apache#47578)

### What problem does this PR solve?

Related PR: apache#47262

Problem Summary:
Starting the third-party script with only the `--reserve-ports` parameter does
not work:
```bash
./run-thirdparties-docker.sh --reserve-ports                                  
```
### What problem does this PR solve?

Using Prometheus to calculate the average value is better.

Related PR: apache#43144

For example, we use `task_execution_time_ns_avg_in_last_1000_times`,
which is equal to `SUM(cost 0, ... cost 999) / 1000`, to represent the
average execution time. It has two problems:

1. Updating its data source `_task_execution_time_ns_statistic`
acquires a lock.
2. The value of `task_execution_time_ns_avg_in_last_1000_times` is not zero
when a set of tasks has just finished and there are no more tasks to run. For
example, the graph shows a continuous flat line long after all tasks have
finished.
<img width="416" alt="image"
src="https://github.com/user-attachments/assets/e874a077-1e74-4700-9dd8-4cf9625bc8f8"
/>

The problem can be fixed by:
1. Using `task_execution_time_ns_total`, an atomic counter, to store the total
execution time accumulated over all iterations.
2. Using the Prometheus `irate` function to get an equivalent
substitute, such as
`irate(doris_be_task_execution_time_ns_total[$__rate_interval])/doris_be_thread_pool_active_threads`

<img width="349" alt="image"
src="https://github.com/user-attachments/assets/2b0b41bd-8709-4cab-84c1-cf11c4cc3ac9"
/>

After all tasks have finished, the curve drops to zero, which is more
reasonable.
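
A minimal sketch of the counter-based approach, assuming a hypothetical `TaskTimeTotal` class and a hypothetical `per_second_rate` helper (neither is taken from the Doris code base). The counter is updated with a lock-free atomic add, and the rate is derived from two samples, which is roughly what Prometheus `irate` does with the last two data points of a counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical monotonic counter of total task execution time in nanoseconds.
class TaskTimeTotal {
public:
    // Lock-free update: no mutex is needed to record one task's cost.
    void add(std::uint64_t cost_ns) {
        total_ns_.fetch_add(cost_ns, std::memory_order_relaxed);
    }
    std::uint64_t value() const {
        return total_ns_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<std::uint64_t> total_ns_{0};
};

// Per-second increase between two samples of a monotonic counter.
// When no task ran between the samples, the result is 0.
double per_second_rate(std::uint64_t previous_total, std::uint64_t current_total,
                       double seconds_between_samples) {
    assert(seconds_between_samples > 0.0);
    assert(current_total >= previous_total);
    return static_cast<double>(current_total - previous_total) / seconds_between_samples;
}
```

Because the rate is computed from the difference between samples, an idle period yields identical samples and therefore a rate of zero, unlike an average over the last 1000 recorded costs.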
…d_on_limit (apache#46993)

### What problem does this PR solve?

adaptive_pipeline_task_serial_read_on_limit is unstable because there is too
much profile data. Check the profile content after each query finishes instead
of checking after all test SQL has finished.
For a scan operator in a query, FE assigns a tablet id to the specific
scan operator, and the storage engine then looks the tablet up during
execution. This lookup happens only after the runtime filter has
arrived. However, if we wait a long time for a runtime filter and
compaction / balance tasks complete in the meantime, the tablet / rowset
will be lost and we get an error.
… 1 (apache#47586)

### What problem does this PR solve?
Fix wrong result when percentile's second argument is 1.
Related to apache#34382.
### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [x] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [x] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [x] No.
    - [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
…apache#47565)

Use single tablet table to verify compaction score action, to ensure
compaction score on tablets is predictable.
Evict in advance if the current cache size is over the threshold, to avoid
synchronous eviction during queries, which may affect query performance.
selectdb/ccr-syncer#396

force_replace flag will only replace table with different schema
except for non-OLAP tables
…t crash (apache#47610)

### What problem does this PR solve?

*** Query id: 5447701417c13e4e-cea25b10f284c6a5 ***
*** is nereids: 0 ***
*** tablet id: 1738818748602 ***
*** Aborted at 1738820047 (unix time) try "date -d @1738820047" if you
are using GNU date ***
*** Current BE git commitID: 512681c ***
*** SIGSEGV invalid permissions for mapped object (@0x7f112a5df53f)
received by PID 6310 (TID 6765 OR 0x7f1384ed3640) from PID 710800703;
stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int,
siginfo_t*, void*) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/common/signal_handler.h:421
1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0]
in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
2# JVM_handle_linux_signal in
/usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 3# 0x00007F14815CC520 in /lib/x86_64-linux-gnu/libc.so.6
4# doris::vectorized::ColumnVector<unsigned
char>::insert_indices_from(doris::vectorized::IColumn const&, unsigned
int const*, unsigned int const*) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/columns/column_vector.cpp:323
5# doris::vectorized::MutableBlock::add_rows(doris::vectorized::Block
const*, unsigned int const*, unsigned int const*, std::vector<int,
std::allocator<int> > const*) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/core/block.cpp:1036
6# doris::MemTable::_put_into_output(doris::vectorized::Block&) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/olap/memtable.cpp:257
7# doris::MemTable::_to_block(std::unique_ptr<doris::vectorized::Block,
std::default_delete<doris::vectorized::Block> >*) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/olap/memtable.cpp:513
8# doris::MemTable::to_block(std::unique_ptr<doris::vectorized::Block,
std::default_delete<doris::vectorized::Block> >*) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/olap/memtable.cpp:532
9# doris::FlushToken::_do_flush_memtable(doris::MemTable*, int, long*)
at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/olap/memtable_flush_executor.cpp:144
10# doris::FlushToken::_flush_memtable(std::shared_ptr<doris::MemTable>,
int, long) in /mnt/hdd01/PERFORMANCE_ENV/be/lib/doris_be
11# doris::MemtableFlushTask::run() at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/olap/memtable_flush_executor.cpp:60
12# doris::ThreadPool::dispatch_thread() in
/mnt/hdd01/PERFORMANCE_ENV/be/lib/doris_be
13# doris::Thread::supervise_thread(void*) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/util/thread.cpp:499
14# start_thread at ./nptl/pthread_create.c:442
15# 0x00007F14816B0850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83

Problem Summary:
- When memtable insert fails (e.g., due to memory allocation failure
during add_rows),
  the memtable is left in an inconsistent state
- Under memory pressure, the system might trigger a flush operation on
this failed memtable,
  leading to crashes

Solution:
- Reset memtable immediately after insert failure
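
The following toy sketch illustrates the "reset after failed insert" idea under heavy simplification. `ToyMemTable` and `ToyWriter` are hypothetical stand-ins, not Doris classes, and a negative value is used to simulate an allocation failure partway through a batch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical in-memory table that can fail partway through a batch.
class ToyMemTable {
public:
    // Appends values one by one. A negative value simulates a failure,
    // leaving the earlier values of the batch behind (inconsistent state).
    bool insert(const std::vector<int>& batch) {
        for (int value : batch) {
            if (value < 0) {
                return false;
            }
            rows_.push_back(value);
        }
        return true;
    }
    std::size_t size() const { return rows_.size(); }

private:
    std::vector<int> rows_;
};

// Hypothetical writer that owns the current memtable.
class ToyWriter {
public:
    bool write(const std::vector<int>& batch) {
        assert(mem_table_ != nullptr);
        if (!mem_table_->insert(batch)) {
            // Reset immediately so a later flush never sees the
            // half-written table.
            mem_table_ = std::make_unique<ToyMemTable>();
            return false;
        }
        return true;
    }
    std::size_t buffered_rows() const { return mem_table_->size(); }

private:
    std::unique_ptr<ToyMemTable> mem_table_ = std::make_unique<ToyMemTable>();
};
```

In this sketch the reset also drops rows from earlier successful batches; whether that matches the real behaviour depends on how the surrounding load is failed, which the commit message does not describe.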
…ties backup and restore (apache#41925)

## Proposed changes

Issue Number: close #xxx

When backup and restore, the database properties remain
zddr and others added 23 commits February 21, 2025 19:23
…#48107)

Introduced by apache#47702

F20250218 08:55:18.762200 10135 assert_cast.h:57] Bad cast from
type:doris::vectorized::ColumnNullable to doris::vectorized::ColumnStr
*** Check failure stack trace: ***
@ 0x55bad64c6436 google::LogMessage::SendToLog()
@ 0x55bad64c2e80 google::LogMessage::Flush()
@ 0x55bad64c6c79 google::LogMessageFatal::~LogMessageFatal()
@ 0x55bacb081956
ZZ11assert_castIRKN5doris10vectorized9ColumnStrIjEEL18TypeCheckOnRelease1ERKNS1_7IColumnEET_OT1_ENKUlOSA_E_clIS9_EES5_SD
@ 0x55bacb081797 assert_cast<>()
@ 0x55bacb07f4ef doris::vectorized::ColumnStr<>::insert_from()
@ 0x55bad00b7317 doris::vectorized::IColumn::insert_from_multi_column()
@ 0x55bad5555bc1
doris::vectorized::VSortedRunMerger::get_next()::$_1::operator()()
@ 0x55bad55551ef doris::vectorized::VSortedRunMerger::get_next()
@ 0x55bad552fbc8 doris::vectorized::VDataStreamRecvr::get_next()
@ 0x55bad626bf83 doris::pipeline::ExchangeSourceOperatorX::get_block()
@ 0x55bad55fba4e
doris::pipeline::OperatorXBase::get_block_after_projects()
@ 0x55bad63ea221 doris::pipeline::PipelineTask::execute()
@ 0x55bad63fa3fd doris::pipeline::TaskScheduler::_do_work()
@ 0x55bacbffc90a doris::ThreadPool::dispatch_thread()
@ 0x55bacbff0ea1 doris::Thread::supervise_thread()
@ 0x7f945ba99ac3 (unknown)
@ 0x7f945bb2b850 (unknown)
@ (nil) (unknown)
*** Query id: dac13a78b39642cb-9218eca2d5a11638 ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1739840118 (unix time) try "date -d @1739840118" if you
are using GNU date ***
*** Current BE git commitID:
apache@a8e18b0
***
*** SIGABRT unknown detail explain (@0x1cb6) received by PID 7350 (TID
10135 OR 0x7f909bbf6640) from PID 7350; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int,
siginfo_t*, void*) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/common/signal_handler.h:421
1# 0x00007F945BA47520 in /lib/x86_64-linux-gnu/libc.so.6
2# pthread_kill at ./nptl/pthread_kill.c:89
3# raise at ../sysdeps/posix/raise.c:27
4# abort at ./stdlib/abort.c:81
5# 0x000055BAD64D0D0D in
/mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
6# 0x000055BAD64C334A in
/mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
7# google::LogMessage::SendToLog() in
/mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
8# google::LogMessage::Flush() in
/mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
9# google::LogMessageFatal::~LogMessageFatal() in
/mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
10# doris::vectorized::ColumnStr const& assert_cast const&,
(TypeCheckOnRelease)1, doris::vectorized::IColumn
const&>(doris::vectorized::IColumn
const&)::{lambda(auto:1&&)#1}::operator()(doris::vectorized::IColumn
const&) const at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/common/assert_cast.h:57
11# doris::vectorized::ColumnStr const& assert_cast const&,
(TypeCheckOnRelease)1, doris::vectorized::IColumn
const&>(doris::vectorized::IColumn const&) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/common/assert_cast.h:72
12# doris::vectorized::ColumnStr::insert_from(doris::vectorized::IColumn
const&, unsigned long) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/columns/column_string.h:174
13# doris::vectorized::IColumn::insert_from_multi_column(std::vector >
const&, std::vector >) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/columns/column.cpp:51
14#
doris::vectorized::VSortedRunMerger::get_next(doris::vectorized::Block*,
bool*)::$_1::operator()() const at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/runtime/vsorted_run_merger.cpp:172
15#
doris::vectorized::VSortedRunMerger::get_next(doris::vectorized::Block*,
bool*) in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
16#
doris::vectorized::VDataStreamRecvr::get_next(doris::vectorized::Block*,
bool*) in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
17#
doris::pipeline::ExchangeSourceOperatorX::get_block(doris::RuntimeState*,
doris::vectorized::Block*, bool*) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/pipeline/exec/exchange_source_operator.cpp:160
18#
doris::pipeline::OperatorXBase::get_block_after_projects(doris::RuntimeState*,
doris::vectorized::Block*, bool*) in
/mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
19# doris::pipeline::PipelineTask::execute(bool*) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/pipeline/pipeline_task.cpp:376
20# doris::pipeline::TaskScheduler::_do_work(unsigned long) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/pipeline/task_scheduler.cpp:138
21# doris::ThreadPool::dispatch_thread() in
/mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
22# doris::Thread::supervise_thread(void*) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/util/thread.cpp:499
23# start_thread at ./nptl/pthread_create.c:442
24# 0x00007F945BB2B850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
…#48133)

### What problem does this PR solve?
1. Some aggregate functions, such as array_union, require array sort to
ensure order in the output.
2. For the foreach function, if the input is an array, it is difficult
to guarantee the order of the output. Therefore, some unstable cases
have been removed, leaving only those related to scalars.

Problem Summary:

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [x] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [x] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [x] No.
    - [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
Remove the hash slot output judge empty logic, unless in nereids
…che#48137)

Introduced a new local variable `escaped_suffix` to store the result of
`escape_for_path_name(suffix_path)`.
This change improves performance by avoiding repeated function calls.
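
A small sketch of the same pattern, assuming a hypothetical `build_paths` helper and a stub in place of `escape_for_path_name` (the real function's behaviour is not shown here). The escaped suffix is computed once and reused for every path.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for escape_for_path_name: replaces '/' with '_'.
// The real escaping rules are not described in this commit message.
std::string escape_for_path_name_stub(const std::string& path) {
    std::string out = path;
    for (char& c : out) {
        if (c == '/') {
            c = '_';
        }
    }
    return out;
}

// Builds one path per directory, escaping the suffix only once.
std::vector<std::string> build_paths(const std::vector<std::string>& dirs,
                                     const std::string& suffix_path) {
    // Computed once outside the loop instead of once per directory.
    const std::string escaped_suffix = escape_for_path_name_stub(suffix_path);
    std::vector<std::string> paths;
    paths.reserve(dirs.size());
    for (const std::string& dir : dirs) {
        paths.push_back(dir + "/" + escaped_suffix);
    }
    return paths;
}
```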
### What problem does this PR solve?

[fix](regression) Spark reads Doris data error
… organization and maintainability. (apache#48079)

Problem Summary:
This pull request includes several changes to the ICU (International
Components for Unicode) integration within the project. The primary
focus is on updating the build configuration, refactoring the ICU
analyzer, and adding new ICU-related files.
…n inverted index filters (apache#47504)

Problem Summary:

select count() from httplogs where clientip match '232.71.0.0' and
request match 'images';

IndexFilter:
      -  HitRows:  0ns
          -  fr_clientip:  10.392K  (10392)
          -  fr_request:  28.601172M  (28601172)
      -  ExecTime:  0ns
          -  ft_clientip:  2.65ms
          -  ft_request:  201.778ms

FilteredRows: Represents the count of rows that met the filtering
conditions post-index filtering.
FilteredTime: Represents the time taken to complete the filtering
operation.
WRITE of size 1 at 0x6160007e86f0 thread T1983 (Pipe_normal [wo)
#0 0x55fc8065b975 in std::__atomic_base<bool>::store(bool,
std::memory_order)
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:457:2
#1 0x55fc8065b975 in std::__atomic_base<bool>::operator=(bool)
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:349:2
#2 0x55fc8065b975 in std::atomic<bool>::operator=(bool)
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/atomic:80:22
#3 0x55fc8065b975 in doris::pipeline::PipelineTask::set_running(bool)
/root/doris/be/src/pipeline/pipeline_task.h:192:47
#4 0x55fc8065b975 in
doris::pipeline::TaskScheduler::_do_work(int)::$_0::operator()() const
/root/doris/be/src/pipeline/task_scheduler.cpp:121:23
#5 0x55fc8065b975 in
doris::Defer<doris::pipeline::TaskScheduler::_do_work(int)::$_0>::~Defer()
/root/doris/be/src/util/defer_op.h:37:16
#6 0x55fc8065b975 in doris::pipeline::TaskScheduler::_do_work(int)
/root/doris/be/src/pipeline/task_scheduler.cpp:162:5
#7 0x55fc4c57cd19 in doris::ThreadPool::dispatch_thread()
/root/doris/be/src/util/threadpool.cpp:608:24
#8 0x55fc4c55395e in std::function<void ()>::operator()() const
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560:9
#9 0x55fc4c55395e in doris::Thread::supervise_thread(void*)
/root/doris/be/src/util/thread.cpp:498:5
#10 0x7f9ee3d25608 in start_thread
/build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:477:8
#11 0x7f9ee3fd2132 in __clone
/build/glibc-SzIz7B/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95

0x6160007e86f0 is located 624 bytes inside of 632-byte region
[0x6160007e8480,0x6160007e86f8)
freed by thread T1981 (Pipe_normal [wo) here:
#0 0x55fc47aa680d in operator delete(void*)
(/mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be+0x3376e80d)
(BuildId: 865149e62959581e)
#1 0x55fc8059db84 in
std::default_delete<doris::pipeline::PipelineTask>::operator()(doris::pipeline::PipelineTask*)
const
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85:2
#2 0x55fc8059db84 in std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >::~unique_ptr()
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:361:4
#3 0x55fc8059db84 in void
std::destroy_at<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >
>(std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >*)
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:88:15
#4 0x55fc8059db84 in void
std::_Destroy<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >
>(std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >*)
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:138:7
#5 0x55fc8059db84 in void
std::_Destroy_aux<false>::__destroy<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask>
>*>(std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >*,
std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >*)
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:152:6
#6 0x55fc8059db84 in void
std::_Destroy<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask>
>*>(std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >*,
std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >*)
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:184:7
#7 0x55fc8059db84 in void
std::_Destroy<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >*,
std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >
>(std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >*,
std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >*,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > >&)
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:746:7
#8 0x55fc8059db84 in
std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > >::~vector()
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:680:2
#9 0x55fc8052571c in void
std::destroy_at<std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > >
>(std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > >*)
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:88:15
#10 0x55fc8052571c in void
std::_Destroy<std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > >
>(std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > >*)
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:138:7
#11 0x55fc8052571c in void
std::_Destroy_aux<false>::__destroy<std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > >
>*>(std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > >*,
std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > >*)
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:152:6
#12 0x55fc8052571c in void
std::_Destroy<std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > >
>*>(std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > >*,
std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > >*)
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:184:7
#13 0x55fc8052571c in void
std::_Destroy<std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > >*,
std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > >
>(std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > >*,
std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > >*,
std::allocator<std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > > >&)
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:746:7
#14 0x55fc8052571c in
std::vector<std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > >,
std::allocator<std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > > >
>::_M_erase_at_end(std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > >*)
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:1796:6
#15 0x55fc8052571c in
std::vector<std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > >,
std::allocator<std::vector<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> >,
std::allocator<std::unique_ptr<doris::pipeline::PipelineTask,
std::default_delete<doris::pipeline::PipelineTask> > > > > >::clear()
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:1499:9
#16 0x55fc8052571c in
doris::pipeline::PipelineFragmentContext::~PipelineFragmentContext()
/root/doris/be/src/pipeline/pipeline_fragment_context.cpp:142:12
#17 0x55fc47ad30cc in
std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:168:6
#18 0x55fc80658d57 in
std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:702:11
#19 0x55fc80658d57 in std::__shared_ptr<doris::TaskExecutionContext,
(__gnu_cxx::_Lock_policy)2>::~__shared_ptr()
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1149:31
#20 0x55fc80658d57 in
doris::pipeline::close_task(doris::pipeline::PipelineTask*,
doris::Status) /root/doris/be/src/pipeline/task_scheduler.cpp:100:1
#21 0x55fc8065aa17 in doris::pipeline::TaskScheduler::_do_work(int)
/root/doris/be/src/pipeline/task_scheduler.cpp:160:36
#22 0x55fc4c57cd19 in doris::ThreadPool::dispatch_thread()
/root/doris/be/src/util/threadpool.cpp:608:24
#23 0x55fc4c55395e in std::function<void ()>::operator()() const
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560:9
#24 0x55fc4c55395e in doris::Thread::supervise_thread(void*)
/root/doris/be/src/util/thread.cpp:498:5
#25 0x7f9ee3d25608 in start_thread
/build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:477:8
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
Previously, queries like `SELECT COUNT(*) FROM table WHERE
date='2017-10-01'` required reading the `date` column in the first read
phase, even though it was only used for filtering and not in the
aggregation. This PR optimizes the execution plan to eliminate
unnecessary column reads, improving performance.
### What problem does this PR solve?

```
[==========] Running 4 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 4 tests from WorkloadSchedPolicyTest
[ RUN      ] WorkloadSchedPolicyTest.one_policy_one_condition
[       OK ] WorkloadSchedPolicyTest.one_policy_one_condition (51 ms)
[ RUN      ] WorkloadSchedPolicyTest.one_policy_mutl_condition
[       OK ] WorkloadSchedPolicyTest.one_policy_mutl_condition (50 ms)
[ RUN      ] WorkloadSchedPolicyTest.test_operator
[       OK ] WorkloadSchedPolicyTest.test_operator (0 ms)
[ RUN      ] WorkloadSchedPolicyTest.test_wg_id_set
[       OK ] WorkloadSchedPolicyTest.test_wg_id_set (0 ms)
[----------] 4 tests from WorkloadSchedPolicyTest (103 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (104 ms total)
[  PASSED  ] 4 tests.
=== Finished. Gtest output: /mnt/disk2/zouxinyi/doris/core/be/ut_build_ASAN/gtest_output
```