
@sgarg-CS (Contributor) commented Sep 5, 2025

PLUGIN-1925

Issue

The current split-point calculation logic in the Oracle Plugin generates excessively large split-point lists (up to ~10M entries), leading to high memory consumption and OOM errors. This happens when the difference between minVal and maxVal is smaller than the number of splits: the division then returns 0, the splitter falls back to MIN_SPLIT_SIZE, and the number of splits explodes.

Root Cause

  • No explicit scale was set in BigDecimal.divide().
  • A division such as 1 / 4 returned 0 instead of 0.25, because without an explicit scale the result used the default scale of 0.
  • The fallback to MIN_SPLIT_SIZE then inflated the number of splits drastically.
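A minimal sketch of the failure mode, assuming the division ends up in `BigDecimal.divide(divisor, RoundingMode)`, whose result keeps the dividend's scale (0 for integer-valued bounds). The class and method names are hypothetical, and the 1e-7 fallback step is only an illustrative stand-in, since the actual value of MIN_SPLIT_SIZE is not given in this PR.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical demo class; not part of the plugin.
public class ScaleZeroDivisionDemo {

  // divide(divisor, roundingMode) keeps the dividend's scale, which is 0 for
  // integer-valued bounds, so 1 / 4 = 0.25 is rounded to 0.
  static BigDecimal divideWithoutScale(BigDecimal numerator, BigDecimal denominator) {
    return numerator.divide(denominator, RoundingMode.HALF_UP);
  }

  // Number of split points min, min + step, min + 2 * step, ... that are <= max.
  // step must be positive.
  static long countSplitPoints(BigDecimal min, BigDecimal max, BigDecimal step) {
    return max.subtract(min).divideToIntegralValue(step).longValueExact() + 1;
  }

  public static void main(String[] args) {
    BigDecimal min = BigDecimal.ZERO;
    BigDecimal max = BigDecimal.ONE;
    BigDecimal numSplits = BigDecimal.valueOf(4);

    BigDecimal splitSize = divideWithoutScale(max.subtract(min), numSplits);
    System.out.println("split size without scale: " + splitSize); // 0

    // Hypothetical fallback step standing in for MIN_SPLIT_SIZE.
    BigDecimal fallbackStep = new BigDecimal("0.0000001");
    System.out.println("points with fallback: " + countSplitPoints(min, max, fallbackStep)); // 10000001

    // With an explicit scale the quotient survives and only 5 points are needed.
    BigDecimal scaled = max.subtract(min).divide(numSplits, 5, RoundingMode.HALF_UP);
    System.out.println("split size with scale 5: " + scaled); // 0.25000
    System.out.println("points with scale 5: " + countSplitPoints(min, max, scaled)); // 5
  }
}
```

A range of 1 split four ways is harmless with a scale, but once the quotient collapses to 0 the tiny fallback step turns the same range into millions of split points.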

Proposed Fix

  • Determine the scale from the column metadata (NUMERIC/DECIMAL).
  • Add a buffer of 5 digits to that scale to preserve accuracy.
  • Override BigDecimalSplitter.tryDivide() with a custom implementation that divides using this scale.
  • Override DataDrivenETLDBInputFormat.getSplitter() so that it returns the custom splitter.
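A standalone sketch of the scale-aware division described above. It does not extend the real BigDecimalSplitter (to stay self-contained), so `ScaleAwareDivider` and `getDivisionScale()` are hypothetical names; the HALF_UP rounding mode and the clamping of negative column scales to 0 are assumptions of this sketch. In a real splitter the column scale would typically come from JDBC column metadata such as `ResultSetMetaData.getScale(column)`.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical sketch of the idea behind the custom tryDivide(); not the plugin's code.
public class ScaleAwareDivider {

  // Extra digits added on top of the column scale (the +5 buffer from this PR).
  public static final int SCALE_BUFFER = 5;

  private final int divisionScale;

  public ScaleAwareDivider(int columnScale) {
    // Assumption: negative column scales are treated as 0 before adding the buffer.
    this.divisionScale = Math.max(columnScale, 0) + SCALE_BUFFER;
  }

  public int getDivisionScale() {
    return divisionScale;
  }

  // Counterpart of tryDivide(): always divides with an explicit scale,
  // so small quotients are no longer rounded to 0.
  public BigDecimal tryDivide(BigDecimal numerator, BigDecimal denominator) {
    return numerator.divide(denominator, divisionScale, RoundingMode.HALF_UP);
  }

  public static void main(String[] args) {
    ScaleAwareDivider divider = new ScaleAwareDivider(0); // e.g. an integer NUMERIC column
    System.out.println(divider.tryDivide(BigDecimal.ONE, BigDecimal.valueOf(4))); // 0.25000
    System.out.println(divider.tryDivide(BigDecimal.ONE, BigDecimal.valueOf(3))); // 0.33333
  }
}
```

Dividing with a fixed, explicit scale also avoids the `ArithmeticException` that an exact `divide()` throws for non-terminating quotients such as 1 / 3.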

@sgarg-CS added the build label on Sep 5, 2025
@sgarg-CS marked this pull request as ready for review on September 12, 2025 11:30
@anup-cloudsufi

LGTM

*/
public class CustomBigDecimalSplitter extends BigDecimalSplitter {

public static final int SCALE_BUFFER = 5;
Contributor

How was SCALE_BUFFER determined? Please add a comment explaining it. Would this value work for all scenarios?
If not, what is the guiding principle for choosing the correct scale buffer? Does it need to be determined externally?

@sgarg-CS (Contributor Author) Sep 15, 2025

Added the comment in 9f26405. A buffer of 5 digits is enough to maintain accuracy during the division in all practical scenarios, including databases that support very large precision values. The same buffer was applied when fixing the OOM issue in the PostgreSQL connector.
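To illustrate why the buffer goes on top of the column scale rather than dividing at the column scale alone, here is a sketch assuming a DECIMAL column with scale 2, a range (maxVal - minVal) of 0.01 and four splits. `ScaleBufferDemo` and `splitSize` are hypothetical names, and HALF_UP rounding is an assumption.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical demo; compares dividing at the column scale vs. column scale + buffer.
public class ScaleBufferDemo {
  static final int SCALE_BUFFER = 5;

  static BigDecimal splitSize(BigDecimal range, int numSplits, int scale) {
    return range.divide(BigDecimal.valueOf(numSplits), scale, RoundingMode.HALF_UP);
  }

  public static void main(String[] args) {
    int columnScale = 2;                       // e.g. DECIMAL(10, 2)
    BigDecimal range = new BigDecimal("0.01"); // maxVal - minVal
    // 0.0025 rounded to 2 digits is 0, which would trigger the fallback again.
    System.out.println(splitSize(range, 4, columnScale));                // 0.00
    System.out.println(splitSize(range, 4, columnScale + SCALE_BUFFER)); // 0.0025000
  }
}
```

More generally, assuming the stored values carry at most the column's scale s, any non-zero range is at least 10^-s, so with HALF_UP rounding at scale s + 5 the split size can only round to 0 when the number of splits exceeds 200,000.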

* Custom implementation of {@link BigDecimalSplitter} that ensures safe and precise division of BigDecimal values when
* calculating split points for NUMERIC and DECIMAL types.
*/
public class CustomBigDecimalSplitter extends BigDecimalSplitter {
Contributor

A class name like CustomBigDecimalSplitter does not say much about the intent of the class. Could you choose a better name that reflects what it actually provides?

Contributor Author

Renamed to SafeBigDecimalSplitter in 9f26405.

@MrRahulSharma (Contributor) left a comment

LGTM

@sgarg-CS merged commit c1c3d5b into data-integrations:develop on Sep 18, 2025
10 checks passed