-
Notifications
You must be signed in to change notification settings - Fork 34
PLUGIN-1925: Introduce Custom BigDecimal Splitter #619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PLUGIN-1925: Introduce Custom BigDecimal Splitter #619
Conversation
0bc1267
to
30a6fc0
Compare
LGTM |
database-commons/src/main/java/io/cdap/plugin/db/source/CustomBigDecimalSplitter.java
Outdated
Show resolved
Hide resolved
*/ | ||
public class CustomBigDecimalSplitter extends BigDecimalSplitter { | ||
|
||
public static final int SCALE_BUFFER = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the SCALE_BUFFER determined(add a comment)? Would this value work for all scenarios ?
If not what is the guiding principle to determine the correct scale buffer to have? If it needs to be determined externally ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the comment 9f26405. 5
gives a good enough buffer to maintain the accuracy during the division for all practical scenarios. This ensures correctness even for databases that support very large precision values. The same buffer was applied while fixing the OOM issue for the PostgreSQL Connector as well.
* Custom implementation of {@link BigDecimalSplitter} to ensures safe and precise division of BigDecimal values while | ||
* calculating split points for NUMERIC and DECIMAL types. | ||
*/ | ||
public class CustomBigDecimalSplitter extends BigDecimalSplitter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Class name like CustomBigDecimalSplitter does not give much details about the intent of the class. Can you work on a better name for this class based on what it really provide ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to SafeBigDecimalSplitter
9f26405
database-commons/src/main/java/io/cdap/plugin/db/source/DataDrivenETLDBInputFormat.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
9f26405
to
8a24390
Compare
PLUGIN-1925
Issue
The current split point calculation logic in the Oracle Plugin generates excessively large split point lists (up to ~10M), leading to high memory consumption and OOM errors. This occurs when the difference between minVal and maxVal is smaller than the number of splits, causing division to result in
0
and fallback toMIN_SPLIT_SIZE
, which explodes the number of splitsRoot Cause
BigDecimal.divide()
.MIN_SPLIT_SIZE
inflated splits drastically.Proposed Fix
BigDecimalSplitter.tryDivide()
with custom implementation.