
[Enhancement] Optimize mem usage of partial update #14187

Merged
merged 9 commits into StarRocks:main on Dec 26, 2022

Conversation

sevev
Contributor

@sevev sevev commented Nov 28, 2022

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Which issues does this PR fix:

Fixes #11269

Problem Summary (Required):

We partially optimized the memory usage of large imports for the primary key model in #12068, but that enhancement does not apply when the load is a partial update. A transaction that performs a large number of partial updates also needs a lot of memory. This PR therefore tries to reduce the memory usage of large partial updates.

There are two reasons for the large memory usage during partial-column updates:

  1. Updating only a few columns can still produce large segment files, and we need to load all of a segment's data into memory, which costs a lot of memory.
  2. A partial update has to read the data of the non-updated columns into memory, which can take up a lot of memory if the table has many columns.

To reduce memory usage, this PR makes two adjustments:

  1. Estimate the length of the updated partial columns in each row while importing data, which reduces the size of each segment file.
  2. Instead of loading all of the rowset's data into memory at once, load and process it segment by segment (a rough sketch of this idea follows the list).
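To make the second adjustment concrete, here is a minimal, self-contained C++ sketch of the segment-by-segment idea. It is not the actual StarRocks code: `Segment`, `Rowset`, and `resolve_partial_update_for_segment()` are hypothetical stand-ins for the real structures in `be/src/storage/` (e.g. `rowset_update_state.cpp`, `tablet_updates.cpp`). The point is only that peak memory is bounded by one segment's working set instead of the whole rowset.

```
// Hedged sketch only: hypothetical types, not the StarRocks API.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

struct Segment {
    std::vector<int64_t> keys;          // primary keys of the rows in this segment
    std::vector<int64_t> partial_cols;  // values of the few updated columns
};

struct Rowset {
    std::vector<Segment> segments;
};

// Hypothetical helper: read the old values of the non-updated columns for the
// keys in this segment, assemble full rows, and write them out (omitted).
// Only this one segment's data has to be resident while it runs.
static size_t resolve_partial_update_for_segment(const Segment& seg) {
    size_t working_set_bytes = seg.keys.size() * sizeof(int64_t) * 2;  // toy estimate
    return working_set_bytes;
}

int main() {
    Rowset rowset{{{{1, 2, 3}, {10, 20, 30}}, {{4, 5}, {40, 50}}}};

    // Before: the whole rowset's update state was materialized at once, so peak
    // memory grew with the rowset. Here we process one segment, release it,
    // then move on, so peak memory is bounded by the largest single segment.
    size_t peak = 0;
    for (const Segment& seg : rowset.segments) {
        size_t working_set = resolve_partial_update_for_segment(seg);
        if (working_set > peak) peak = working_set;
    }
    std::printf("peak per-segment working set: %zu bytes\n", peak);
    return 0;
}
```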

In my test environment (one BE with two HDDs, using Stream Load), I created a table with 65 columns and 20 buckets:

```
CREATE TABLE `partial_test` (
  `col_1` bigint(20) NOT NULL COMMENT "",
  `col_2` bigint(20) NOT NULL COMMENT "",
  `col_3` bigint(20) NOT NULL COMMENT "",
  `col_4` varchar(150) NOT NULL COMMENT "",
  `col_5` varchar(150) NOT NULL COMMENT "",
  `col_6` varchar(150) NULL COMMENT "",
  `col_7` varchar(150) NULL COMMENT "",
  `col_8` varchar(1024) NULL COMMENT "",
  `col_9` varchar(120) NULL COMMENT "",
  `col_10` varchar(60) NULL COMMENT "",
  `col_11` varchar(10) NULL COMMENT "",
  `col_12` varchar(120) NULL COMMENT "",
  `col_13` varchar(524) NULL COMMENT "",
  `col_14` varchar(100) NULL COMMENT "",
  `col_15` varchar(150) NULL COMMENT "",
  `col_16` varchar(150) NULL COMMENT "",
  `col_17` varchar(150) NULL COMMENT "",
  `col_18` bigint(20) NULL COMMENT "",
  `col_19` varchar(500) NULL COMMENT "",
  `col_20` varchar(150) NULL COMMENT "",
  `col_21` tinyint(4) NULL COMMENT "",
  `col_22` int(11) NULL COMMENT "",
  `col_23` varchar(524) NULL COMMENT "",
  `col_24` bigint(20) NULL COMMENT "",
  `col_25` bigint(20) NULL COMMENT "",
  `col_26` varchar(8) NULL COMMENT "",
  `col_27` decimal64(18, 6) NULL COMMENT "",
  `col_28` decimal64(18, 6) NULL COMMENT "",
  `col_29` decimal64(18, 6) NULL COMMENT "",
  `col_30` decimal64(18, 6) NULL COMMENT "",
  `col_31` decimal64(18, 6) NULL COMMENT "",
  `col_32` decimal64(18, 6) NULL COMMENT "",
  `col_33` bigint(20) NULL COMMENT "",
  `col_34` decimal64(18, 6) NULL COMMENT "",
  `col_35` varchar(8) NULL COMMENT "",
  `col_36` decimal64(18, 6) NULL COMMENT "",
  `col_37` decimal64(18, 6) NULL COMMENT "",
  `col_38` varchar(8) NULL COMMENT "",
  `col_39` decimal64(18, 6) NULL COMMENT "",
  `col_40` decimal64(18, 6) NULL COMMENT "",
  `col_41` varchar(8) NULL COMMENT "",
  `col_42` decimal64(18, 6) NULL COMMENT "",
  `col_43` decimal64(18, 6) NULL COMMENT "",
  `col_44` decimal64(18, 6) NULL COMMENT "",
  `col_45` decimal64(18, 6) NULL COMMENT "",
  `col_46` int(11) NULL COMMENT "",
  `col_47` int(11) NOT NULL COMMENT "",
  `col_48` tinyint(4) NULL COMMENT "",
  `col_49` varchar(200) NULL COMMENT "",
  `col_50` tinyint(4) NULL COMMENT "",
  `col_51` varchar(200) NULL COMMENT "",
  `col_52` varchar(10) NULL COMMENT "",
  `col_53` tinyint(4) NULL COMMENT "",
  `col_54` tinyint(4) NULL COMMENT "",
  `col_55` varchar(150) NULL COMMENT "",
  `col_56` varchar(150) NULL COMMENT "",
  `col_57` varchar(500) NULL COMMENT "",
  `col_58` tinyint(4) NULL COMMENT "",
  `col_59` varchar(100) NULL COMMENT "",
  `col_60` varchar(150) NULL COMMENT "",
  `col_61` varchar(150) NULL COMMENT "",
  `col_62` varchar(150) NULL COMMENT "",
  `col_63` varchar(150) NULL COMMENT "",
  `col_64` datetime NULL COMMENT "",
  `col_65` datetime NULL COMMENT ""
) ENGINE=OLAP 
PRIMARY KEY(`col_1`, `col_2`, `col_3`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`col_1`, `col_2`) BUCKETS 20 
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"storage_format" = "V2",
"enable_persistent_index" = "true",
"compression" = "LZ4"
);
```

| PrimaryKey Length | RowNum | BucketNum | Column Num | Partial ColumnNum | PartialUpdate RowsNum | Load time(s) | Apply time(ms) | Peak UpdateMemory usage | Note |
|---|---|---|---|---|---|---|---|---|---|
| 12 Bytes | 300M | 20 | 65 | 5 | 100M | 135261 | 106693 | 78.9G | branch-main |
| 12 Bytes | 300M | 20 | 65 | 5 | 100M | 166449 | 149870 | 10.3G | branch-opt |
| 12 Bytes | 300M | 20 | 65 | 5 | 100K | 2078 | 529 | 60.1M | branch-main |
| 12 Bytes | 300M | 20 | 65 | 5 | 100K | 2211 | 541 | 60.2M | branch-opt |

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr will affect users' behaviors
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto backported to target branch
    • 2.5
    • 2.4
    • 2.3
    • 2.2

@sevev
Contributor Author

sevev commented Nov 28, 2022

run starrocks_admit_test

@github-actions

clang-tidy review says "All clean, LGTM! 👍"

@sevev sevev changed the title [WIP][Enhancement]Optimize mem usage of partial update [Enhancement]Optimize mem usage of partial update Nov 28, 2022
@sevev sevev changed the title [Enhancement]Optimize mem usage of partial update [Enhancement] Optimize mem usage of partial update Nov 28, 2022
@sevev
Contributor Author

sevev commented Nov 28, 2022

run starrocks_be_unittest

@github-actions

clang-tidy review says "All clean, LGTM! 👍"


@github-actions github-actions bot left a comment

clang-tidy made some suggestions

be/src/storage/rowset_update_state.h
be/src/storage/rowset_update_state.h
decster
decster previously approved these changes Dec 13, 2022
be/src/storage/memtable.h
be/src/storage/delta_writer.cpp
@sevev
Contributor Author

sevev commented Dec 22, 2022

run starrocks_be_unittest

@github-actions

clang-tidy review says "All clean, LGTM! 👍"

@sevev sevev requested a review from chaoyli December 23, 2022 12:23
@sevev
Contributor Author

sevev commented Dec 26, 2022

run starrocks_admit_test

@wanpengfei-git wanpengfei-git added the Approved Ready to merge label Dec 26, 2022
@wanpengfei-git
Collaborator

run starrocks_admit_test

@decster decster merged commit 545b7be into StarRocks:main Dec 26, 2022
@github-actions github-actions bot removed Approved Ready to merge be-build labels Dec 26, 2022
@github-actions

clang-tidy review says "All clean, LGTM! 👍"

@sonarcloud

sonarcloud bot commented Dec 26, 2022

SonarCloud Quality Gate failed.

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

No coverage information
4.8% duplication

@sevev
Contributor Author

sevev commented Dec 29, 2022

@mergify backport branch-2.5

mergify bot pushed a commit that referenced this pull request Dec 29, 2022

(cherry picked from commit 545b7be)

# Conflicts:
#	be/src/storage/memtable.h
#	be/src/storage/rowset_update_state.cpp
#	be/src/storage/rowset_update_state.h
#	be/src/storage/tablet_updates.cpp
@mergify
Copy link
Contributor

mergify bot commented Dec 29, 2022

backport branch-2.5

✅ Backports have been created

sevev added a commit to sevev/starrocks that referenced this pull request Dec 29, 2022
wanpengfei-git pushed a commit that referenced this pull request Jan 9, 2023
@sevev sevev deleted the optimize_mem_usage_of_partial_update branch August 7, 2023 01:52