Add more options to json index #9543

Merged
merged 2 commits into from Oct 10, 2022

Contributor

@Jackie-Jiang Jackie-Jiang commented Oct 6, 2022

Release Notes

Added jsonIndexConfigs (a map from column name to config) under tableIndexConfig.
The following configs can be set for each JSON column:

  • maxLevels: maximum number of levels to flatten in the JSON object (Limit the depth of Json index #9476), int, default -1 (unlimited levels)
  • excludeArray: do not flatten arrays, boolean, default false
  • disableCrossArrayUnnest: do not unnest multiple arrays (unnesting produces every unique combination of their elements), boolean, default false
  • includePaths: only flatten the given paths, set of strings, default null (include all paths), example ["$.a.b", "$.a.c[*]"]
  • excludePaths: exclude the given paths when flattening, set of strings, default null (include all paths), example ["$.a.b", "$.a.c[*]"]
  • excludeFields: exclude the given fields when flattening, set of strings, default null (include all fields), example ["b", "c"]

When jsonIndexConfigs is configured, the old jsonIndexColumns will be ignored.
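
To make the effect of disableCrossArrayUnnest concrete, here is a toy sketch in Java. It is not Pinot's flattening code: the class CrossArrayUnnestSketch, the flatten method, and the map-of-arrays input model are all hypothetical, and the behavior shown for the disabled case (each array unnested on its own) is an assumption based on the description above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CrossArrayUnnestSketch {

  // Toy model only (hypothetical, not Pinot's API): each entry maps an
  // array path such as "$.a[*]" to the scalar elements of that array.
  static List<Map<String, String>> flatten(Map<String, List<String>> arrays,
      boolean disableCrossArrayUnnest) {
    List<Map<String, String>> records = new ArrayList<>();
    if (disableCrossArrayUnnest) {
      // Assumed behavior: each array is unnested independently,
      // giving one record per element and no combinations.
      for (Map.Entry<String, List<String>> entry : arrays.entrySet()) {
        for (String value : entry.getValue()) {
          Map<String, String> record = new LinkedHashMap<>();
          record.put(entry.getKey(), value);
          records.add(record);
        }
      }
      return records;
    }
    // Default: build every combination of elements across the arrays.
    records.add(new LinkedHashMap<>());
    for (Map.Entry<String, List<String>> entry : arrays.entrySet()) {
      List<Map<String, String>> next = new ArrayList<>();
      for (Map<String, String> partial : records) {
        for (String value : entry.getValue()) {
          Map<String, String> record = new LinkedHashMap<>(partial);
          record.put(entry.getKey(), value);
          next.add(record);
        }
      }
      records = next;
    }
    return records;
  }

  public static void main(String[] args) {
    Map<String, List<String>> arrays = new LinkedHashMap<>();
    arrays.put("$.a[*]", List.of("1", "2", "3"));
    arrays.put("$.b[*]", List.of("x", "y"));
    System.out.println(flatten(arrays, false).size()); // prints 6 (3 x 2 combinations)
    System.out.println(flatten(arrays, true).size());  // prints 5 (3 + 2 elements)
  }
}
```

In this model the record count grows multiplicatively with each additional array by default and only additively when cross-array unnesting is disabled, which is presumably the motivation for the option on documents with several large arrays.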

Example:

{
  ...
  "tableIndexConfig": {
    ...
    "jsonIndexConfigs": {
      "jsonColumn": {
        "excludeArray": true
      }
    },
    ...
  },
  ...
}
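
Several of the options listed above can be combined for one column. The following fragment is only a sketch: the column name myJsonCol and every value are illustrative assumptions rather than recommendations from this PR, and the paths are reused from the examples in the release notes.

```json
{
  "tableIndexConfig": {
    "jsonIndexConfigs": {
      "myJsonCol": {
        "maxLevels": 2,
        "disableCrossArrayUnnest": true,
        "excludePaths": ["$.a.b", "$.a.c[*]"]
      }
    }
  }
}
```

Options that are left out keep the defaults given in the release notes.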

@Jackie-Jiang added the feature, release-notes, and Configuration labels on Oct 6, 2022

codecov-commenter commented Oct 6, 2022

Codecov Report

Merging #9543 (ce070e7) into master (c5d4b15) will increase coverage by 0.03%.
The diff coverage is 82.93%.

@@             Coverage Diff              @@
##             master    #9543      +/-   ##
============================================
+ Coverage     69.90%   69.94%   +0.03%     
- Complexity     4797     4876      +79     
============================================
  Files          1927     1930       +3     
  Lines        102729   102990     +261     
  Branches      15592    15622      +30     
============================================
+ Hits          71811    72032     +221     
- Misses        25847    25882      +35     
- Partials       5071     5076       +5     
Flag            Coverage Δ
integration1    26.02% <0.47%> (+0.04%) ⬆️
integration2    24.71% <0.47%> (+0.06%) ⬆️
unittests1      67.29% <82.93%> (+0.02%) ⬆️
unittests2      15.66% <0.00%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                                           Coverage Δ
...ry/optimizer/statement/JsonStatementOptimizer.java    0.00% <0.00%> (ø)
.../apache/pinot/spi/config/table/IndexingConfig.java    90.69% <33.33%> (-2.08%) ⬇️
...ment/creator/impl/DefaultIndexCreatorProvider.java    67.85% <50.00%> (-0.44%) ⬇️
...local/segment/index/loader/IndexLoadingConfig.java    71.25% <52.63%> (-1.68%) ⬇️
...t/index/loader/invertedindex/JsonIndexHandler.java    72.83% <63.63%> (-1.19%) ⬇️
...nt/creator/impl/inv/json/BaseJsonIndexCreator.java    93.75% <75.00%> (-0.10%) ⬇️
...he/pinot/segment/local/utils/TableConfigUtils.java    67.56% <75.00%> (-0.01%) ⬇️
...ot/segment/spi/creator/SegmentGeneratorConfig.java    80.89% <80.00%> (-0.23%) ⬇️
...local/realtime/impl/json/MutableJsonIndexImpl.java    86.40% <81.81%> (ø)
...apache/pinot/spi/config/table/JsonIndexConfig.java    88.00% <88.00%> (ø)
... and 51 more


private boolean _disableCrossArrayUnnest = false;
private Set<String> _includePaths;
private Set<String> _excludePaths;
private Set<String> _excludeFields;
Contributor

Would it make sense to add an "_excludeTypes"?
For example, if the node is an array, I would not need to enumerate all of its elements.

It would also be good to add a Javadoc explaining which config overrides which (e.g. when a path is in both include and exclude, etc.).

Contributor Author

excludeArray is for that purpose. IMO, excludeValue and excludeObject don't make a lot of sense for JSON.
Added a Javadoc to explain the behavior of each config.

@Jackie-Jiang Jackie-Jiang merged commit 65e01f1 into apache:master Oct 10, 2022
@Jackie-Jiang Jackie-Jiang deleted the json_index_improvement branch October 10, 2022 23:57