
Support default null value in data preprocessing job #7739

Merged
jackjlli merged 1 commit into master from support-default-null-value-in-preprocessing
Nov 12, 2021


@jackjlli jackjlli commented Nov 10, 2021

Description

This PR adds support for default null values in the data preprocessing job.
If the value of the partitioning column is null, the default null value is used to distribute the data to all the reducers.
If the value of the sorting column is null, the default null value is used for sorting within each reducer.
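
As a rough illustration of the substitution described above, here is a minimal Python sketch. It is not the code from this PR: the function names, the `DEFAULT_NULL_VALUES` table, and the CRC-based partition function are all assumptions made for the example. The real defaults come from the table schema, and the real partition function comes from the table config.

```python
import zlib

# Hypothetical default null values per data type (assumed for illustration;
# the actual values are taken from the schema, not from this table).
DEFAULT_NULL_VALUES = {"STRING": "null", "INT": -2**31, "LONG": -2**63}


def fill_null(value, data_type):
    """Return the default null value for data_type when value is None."""
    return DEFAULT_NULL_VALUES[data_type] if value is None else value


def partition_for(value, data_type, num_partitions):
    """Pick a partition after replacing a null partitioning value.

    A CRC32 of the string form stands in for the real partition function.
    """
    filled = fill_null(value, data_type)
    return zlib.crc32(str(filled).encode("utf-8")) % num_partitions


def sort_records(records, sort_column, data_type):
    """Sort records within one reducer, treating nulls as the default value."""
    return sorted(records, key=lambda r: fill_null(r.get(sort_column), data_type))


rows = [{"id": 1, "ts": 5}, {"id": 2, "ts": None}, {"id": 3, "ts": 1}]
print([r["id"] for r in sort_records(rows, "ts", "INT")])  # [2, 3, 1]
```

In this sketch a null partitioning value lands in the same partition as an explicit default value, and a null sorting value sorts first for numeric columns because the assumed default is the type's minimum.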

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • Yes (Please label as backward-incompat, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

  • Yes (Please label this as backward-incompat, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

Documentation

@jackjlli jackjlli requested a review from snleee November 10, 2021 19:40

codecov-commenter commented Nov 10, 2021

Codecov Report

Merging #7739 (5c8f4f3) into master (13c9ee9) will increase coverage by 0.15%.
The diff coverage is n/a.


@@             Coverage Diff              @@
##             master    #7739      +/-   ##
============================================
+ Coverage     71.49%   71.65%   +0.15%     
+ Complexity     4064     4061       -3     
============================================
  Files          1577     1577              
  Lines         80554    80595      +41     
  Branches      11965    11978      +13     
============================================
+ Hits          57592    57747     +155     
+ Misses        19078    18962     -116     
- Partials       3884     3886       +2     
Flag Coverage Δ
integration1 29.45% <ø> (+0.32%) ⬆️
integration2 27.86% <ø> (-0.02%) ⬇️
unittests1 68.56% <ø> (-0.04%) ⬇️
unittests2 14.58% <ø> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...inot/core/util/SegmentCompletionProtocolUtils.java 57.69% <0.00%> (-7.70%) ⬇️
.../helix/core/minion/MinionInstancesCleanupTask.java 77.27% <0.00%> (-4.55%) ⬇️
.../startree/v2/builder/OffHeapSingleTreeBuilder.java 87.42% <0.00%> (-4.20%) ⬇️
.../java/org/apache/pinot/spi/data/TimeFieldSpec.java 88.63% <0.00%> (-2.28%) ⬇️
...e/pinot/common/utils/FileUploadDownloadClient.java 64.37% <0.00%> (-1.88%) ⬇️
...ache/pinot/common/metadata/ZKMetadataProvider.java 82.70% <0.00%> (-0.76%) ⬇️
...e/pinot/core/transport/InstanceRequestHandler.java 60.75% <0.00%> (-0.36%) ⬇️
...roker/requesthandler/BaseBrokerRequestHandler.java 70.93% <0.00%> (-0.20%) ⬇️
...apache/pinot/spi/ingestion/batch/spec/TlsSpec.java 0.00% <0.00%> (ø)
...pinot/server/api/access/AllowAllAccessFactory.java
... and 36 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 13c9ee9...5c8f4f3.


snleee commented Nov 11, 2021

As we discussed, I don't think we should evenly distribute null values across all reducers, because this would break the partitioning contract. Instead, I think the data owner should purge the data so that the columns they sort and partition on contain no null values. As long as we do the key salting, that's probably the best we can do given the skewed data.
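
For readers unfamiliar with the term, key salting can be sketched roughly as follows. This is one possible reading of the comment above, not the preprocessing job's implementation: `partition_id`, `salted_reducer_keys`, the CRC-based hash, and the round-robin salt are all hypothetical. The idea shown is that the partition is still derived from the key alone, so the partitioning contract holds, while a salt spreads a heavily repeated key over several reducers within that partition.

```python
import itertools
import zlib


def partition_id(key, num_partitions):
    """Derive the partition from the key only (stand-in hash function)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def salted_reducer_keys(keys, num_partitions, salts_per_partition):
    """Return (partition, salt) pairs; the salt cycles round-robin.

    Records sharing a key keep the same partition but are spread over
    salts_per_partition reducers, which softens skew from a hot key.
    """
    counter = itertools.count()
    return [
        (partition_id(k, num_partitions), next(counter) % salts_per_partition)
        for k in keys
    ]


# A skewed input: the (assumed) default null value dominates the key column.
keys = ["null"] * 6 + ["a"]
for key, reducer_key in zip(keys, salted_reducer_keys(keys, 4, 3)):
    print(key, reducer_key)
```

All six `"null"` records share one partition id in this sketch, yet they are divided among three salt buckets instead of piling onto a single reducer.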

@jackjlli jackjlli force-pushed the support-default-null-value-in-preprocessing branch from 87f83bf to 08fbc20 Compare November 11, 2021 22:39

@snleee updated the PR based on the discussion.

@jackjlli jackjlli force-pushed the support-default-null-value-in-preprocessing branch from 08fbc20 to 5c8f4f3 Compare November 12, 2021 21:45

@snleee snleee left a comment


LGTM

@jackjlli jackjlli merged commit 068549c into master Nov 12, 2021
@jackjlli jackjlli deleted the support-default-null-value-in-preprocessing branch November 12, 2021 23:05
kriti-sc pushed a commit to kriti-sc/incubator-pinot that referenced this pull request Dec 12, 2021
Co-authored-by: Jack Li(Analytics Engineering) <jlli@jlli-mn1.linkedin.biz>

3 participants