[AMORO-2266] Improve combined data reader performance #2268

Merged on Nov 14, 2023 (24 commits)

@zhongqishang zhongqishang commented Nov 9, 2023

Why are the changes needed?

Closes #2266.

Brief change log

  • Add a trigger condition (eq-delete record count > data record count * 2.5); when it is met, eq-delete records are filtered before being written to the structLikeMap
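
A minimal sketch of how such a trigger and filter could fit together, assuming the filter is a Bloom filter built from the identifier keys of the data records (the commit log mentions a Guava bloom filter). The class `EqDeleteFilterSketch`, its methods, and the hand-rolled `TinyBloom` are hypothetical names used only for illustration; they are not taken from the patch, and `TinyBloom` merely stands in for a library Bloom filter so the example has no dependencies.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Illustrative only: all names here are hypothetical, not taken from the patch. */
final class EqDeleteFilterSketch {

  // Ratio quoted in the change log entry above.
  static final double TRIGGER_RATIO = 2.5;

  /** True when eq-delete records outnumber data records by more than the ratio. */
  static boolean shouldFilter(long eqDeleteCount, long dataCount) {
    return eqDeleteCount > dataCount * TRIGGER_RATIO;
  }

  /** Tiny two-hash Bloom filter; a stand-in for a library implementation. */
  static final class TinyBloom {
    private final BitSet bits;
    private final int size;

    TinyBloom(int size) {
      this.size = size;
      this.bits = new BitSet(size);
    }

    void put(Object key) {
      for (int i : indexes(key)) {
        bits.set(i);
      }
    }

    /** May return a false positive, never a false negative. */
    boolean mightContain(Object key) {
      for (int i : indexes(key)) {
        if (!bits.get(i)) {
          return false;
        }
      }
      return true;
    }

    // Two bit positions derived from the key's hash code.
    private int[] indexes(Object key) {
      int h1 = key.hashCode();
      int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15);
      return new int[] {Math.floorMod(h1, size), Math.floorMod(h2, size)};
    }
  }

  /** Returns the eq-delete keys worth loading into the in-memory map. */
  static <K> List<K> deletesToLoad(List<K> dataKeys, List<K> eqDeleteKeys) {
    if (!shouldFilter(eqDeleteKeys.size(), dataKeys.size())) {
      return new ArrayList<>(eqDeleteKeys); // below the trigger: keep everything
    }
    TinyBloom bloom = new TinyBloom(Math.max(64, dataKeys.size() * 16));
    for (K key : dataKeys) {
      bloom.put(key);
    }
    List<K> kept = new ArrayList<>();
    for (K key : eqDeleteKeys) {
      if (bloom.mightContain(key)) {
        kept.add(key); // possibly matches a data record
      }
    }
    return kept;
  }
}
```

Because a Bloom filter can report false positives but not false negatives, a filter of this kind may keep some eq-delete keys that match no data record, but it should never drop one that does. The memory saving comes from the keys it can rule out before they reach the map.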

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

Signed-off-by: Qishang Zhong <zhongqishang@gmail.com>
@github-actions bot added the module:core and module:mixed-hive labels Nov 9, 2023

codecov bot commented Nov 9, 2023

Codecov Report

Attention: 7 lines in your changes are missing coverage. Please review.

Comparison is base (233be9f) 52.40% compared to head (8c24587) 52.47%.

Files                                                   Patch %   Lines
...netease/arctic/io/reader/CombinedDeleteFilter.java   89.36%    3 Missing and 2 partials ⚠️
...com/netease/arctic/io/reader/StructLikeFunnel.java   75.00%    2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2268      +/-   ##
============================================
+ Coverage     52.40%   52.47%   +0.06%     
- Complexity     4127     4140      +13     
============================================
  Files           500      501       +1     
  Lines         28936    28975      +39     
  Branches       2825     2829       +4     
============================================
+ Hits          15165    15205      +40     
- Misses        12537    12540       +3     
+ Partials       1234     1230       -4     
Flag    Coverage Δ
core    52.73% <88.13%> (+0.09%) ⬆️
trino   51.14% <ø> (-0.07%) ⬇️


@zhongqishang zhongqishang marked this pull request as ready for review November 9, 2023 05:27
@github-actions bot removed the module:mixed-hive label Nov 10, 2023
@github-actions bot added the module:ams-dashboard label Nov 13, 2023
Signed-off-by: Qishang Zhong <zhongqishang@gmail.com>

@zhoujinsong zhoujinsong left a comment

LGTM.

@shidayang

LGTM

@zhoujinsong zhoujinsong merged commit ad8fedb into apache:master Nov 14, 2023
6 checks passed
hameizi pushed a commit to hameizi/arctic that referenced this pull request Nov 28, 2023
* [AMORO-2266] Improve combined data reader performance

Signed-off-by: Qishang Zhong <zhongqishang@gmail.com>

* Add doc

* Fix spotless apply

* [AMORO-2266] Use guava bloom filter

Signed-off-by: Qishang Zhong <zhongqishang@gmail.com>

* [AMORO-2266] Use guava bloom filter

Signed-off-by: Qishang Zhong <zhongqishang@gmail.com>

* Fix structLike serialize

* Fix `rewrittenDataRecordCnt` to constructor

* Add test case for various types

* Fix test name and add assert

* Fix comment

* Move readIdentifierData() to CombinedDeleteFilter

* Remove StructLikeWrapper

* Fix comment

* Fix comment and add test

* Add unit tests

* fix log

Signed-off-by: Qishang Zhong <zhongqishang@gmail.com>

* Fix filterEqDelete trigger logic

* Add parameter FILTER_EQ_DELETE_TRIGGER_RECORD_COUNT

* Fix test

---------

Signed-off-by: Qishang Zhong <zhongqishang@gmail.com>
(cherry picked from commit ad8fedb)
zhoujinsong pushed a commit that referenced this pull request Dec 19, 2023
(cherry picked from commit ad8fedb)
Signed-off-by: zhoujinsong <463763777@qq.com>
ShawHee pushed a commit to ShawHee/arctic that referenced this pull request Dec 29, 2023
Labels
module:ams-dashboard Ams dashboard module module:core Core module
Development

Successfully merging this pull request may close these issues.

[Improvement]: Improve optimizer's compaction performance in large delete scenarios
3 participants