
Refresh ZK metadata when dimension table is updated #8133

Merged
richardstartin merged 7 commits into apache:master from mneedham:dimension-table-refresh-metadata on Mar 21, 2022

@mneedham (Contributor) commented Feb 4, 2022

This PR fixes a problem that occurs after you update a dimension table (e.g. by adding a new column and then uploading a new CSV file): the lookup function cannot read the new column. The error looks like this:

[
  {
    "message": "QueryExecutionError:\norg.apache.pinot.spi.exception.BadQueryRequestException: Caught exception while initializing transform function: lookup\n\tat org.apache.pinot.core.operator.transform.function.TransformFunctionFactory.get(TransformFunctionFactory.java:244)\n\tat org.apache.pinot.core.operator.transform.function.TransformFunctionFactory.get(TransformFunctionFactory.java:239)\n\tat org.apache.pinot.core.operator.transform.TransformOperator.<init>(TransformOperator.java:59)\n\tat org.apache.pinot.core.plan.TransformPlanNode.run(TransformPlanNode.java:71)\n...\nCaused by: java.lang.IllegalArgumentException: Column does not exist in dimension table: courses_OFFLINE:startLocation\n\tat shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:383)\n\tat org.apache.pinot.core.operator.transform.function.LookupTransformFunction.init(LookupTransformFunction.java:125)\n\tat org.apache.pinot.core.operator.transform.function.TransformFunctionFactory.get(TransformFunctionFactory.java:242)\n\t... 20 more",
    "errorCode": 200
  }
]

This happens because _propertyStore and _tableSchema in DimensionTableDataManager are not refreshed when the addSegment function is called during the refresh of the dimension table segment.

This PR refreshes those fields along with the cached lookup table.
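The failure mode described above can be reproduced in miniature. This is a minimal Java sketch, not Pinot's actual DimensionTableDataManager. The class SchemaCachingManager, the Map standing in for the ZK property store, and the refreshOnAddSegment flag are all hypothetical. The manager caches the schema it read at construction time, so a column added later is rejected unless addSegment re-reads the schema.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class SchemaRefreshSketch {

  /** Hypothetical manager; a Map stands in for the ZK property store. */
  static class SchemaCachingManager {
    private final Map<String, Set<String>> propertyStore;
    private final String tableName;
    private final boolean refreshOnAddSegment;
    // Cached column names, read once at construction time.
    private volatile Set<String> tableSchema;

    SchemaCachingManager(Map<String, Set<String>> propertyStore, String tableName,
        boolean refreshOnAddSegment) {
      this.propertyStore = propertyStore;
      this.tableName = tableName;
      this.refreshOnAddSegment = refreshOnAddSegment;
      this.tableSchema = propertyStore.get(tableName);
    }

    /** Called when the dimension table segment is (re)loaded. */
    void addSegment() {
      if (refreshOnAddSegment) {
        // The fix: re-read the schema so that new columns become visible.
        tableSchema = propertyStore.get(tableName);
      }
      // Rebuilding the cached lookup table is omitted from this sketch.
    }

    /** Mirrors the precondition behind the error message in the report above. */
    void checkColumn(String column) {
      if (!tableSchema.contains(column)) {
        throw new IllegalArgumentException(
            "Column does not exist in dimension table: " + tableName + ":" + column);
      }
    }
  }

  /** Returns true if the new column is accepted after the table is updated. */
  static boolean newColumnVisible(boolean refreshOnAddSegment) {
    Map<String, Set<String>> store = new HashMap<>();
    store.put("courses_OFFLINE", Set.of("id", "name"));
    SchemaCachingManager manager =
        new SchemaCachingManager(store, "courses_OFFLINE", refreshOnAddSegment);
    // Add a column to the schema, then "upload" a new segment.
    store.put("courses_OFFLINE", Set.of("id", "name", "startLocation"));
    manager.addSegment();
    try {
      manager.checkColumn("startLocation");
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("without refresh: " + newColumnVisible(false)); // prints "without refresh: false"
    System.out.println("with refresh: " + newColumnVisible(true)); // prints "with refresh: true"
  }
}
```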


@Jackie-Jiang (Contributor) left a comment

Good catch!

We can override the reloadSegment() method and set the schema to the one passed in.
We also need to make _tableSchema and _primaryKeyColumns volatile, because they can be accessed from a different thread.
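The suggestion above can be sketched roughly in Java as follows. The class name, the List<String> field types, and the reloadSegment signature are hypothetical simplifications and do not reflect Pinot's real API. Marking each field volatile makes a write by the reload thread visible to query threads, but the two fields are still published independently of each other.

```java
import java.util.List;

public class VolatileFieldsSketch {
  // volatile: a write by the reload thread is visible to later reads on query threads.
  private volatile List<String> tableSchema;
  private volatile List<String> primaryKeyColumns;

  public VolatileFieldsSketch(List<String> tableSchema, List<String> primaryKeyColumns) {
    this.tableSchema = tableSchema;
    this.primaryKeyColumns = primaryKeyColumns;
  }

  /** Hypothetical override: adopt the schema passed in with the reloaded segment. */
  public void reloadSegment(List<String> newSchema, List<String> newPrimaryKeyColumns) {
    tableSchema = newSchema;
    // A reader that runs between these two writes can observe the new schema
    // paired with the old primary key columns.
    primaryKeyColumns = newPrimaryKeyColumns;
  }

  public List<String> getTableSchema() {
    return tableSchema;
  }

  public List<String> getPrimaryKeyColumns() {
    return primaryKeyColumns;
  }
}
```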

@richardstartin (Member)

> Good catch!
>
> We can override reloadSegment() method and set the schema to the one passed in. We also need to change _tableSchema and _primaryKeyColumns to volatile because they can be accessed from a different thread

Actually, I think consistency needs to be maintained with the lookup table, so these fields should be updated if and only if the CAS succeeds. The best way to do this would be to store a volatile wrapper class containing all the fields, pass the schema and primary key columns into the CAS loop, construct the wrapper, and then perform the CAS on the wrapper.
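The proposal above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the merged Pinot code. The class names, the Supplier used to re-read segment rows on each attempt, and the Map-based lookup table are all hypothetical. Everything that must stay consistent lives in one immutable wrapper held in a single volatile field, and AtomicReferenceFieldUpdater performs the CAS on that field. As a result, a reader sees either the old schema with the old rows or the new schema with the new rows.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;
import java.util.function.Supplier;

public class DimensionTableCasSketch {

  /** Immutable wrapper: schema, primary keys and lookup rows are published together. */
  static final class DimensionTable {
    final List<String> schemaColumns;
    final List<String> primaryKeyColumns;
    final Map<String, Map<String, Object>> rowsByPrimaryKey;

    DimensionTable(List<String> schemaColumns, List<String> primaryKeyColumns,
        Map<String, Map<String, Object>> rowsByPrimaryKey) {
      this.schemaColumns = List.copyOf(schemaColumns);
      this.primaryKeyColumns = List.copyOf(primaryKeyColumns);
      this.rowsByPrimaryKey = Map.copyOf(rowsByPrimaryKey);
    }
  }

  // The single volatile field that readers dereference and the CAS targets.
  private volatile DimensionTable dimensionTable =
      new DimensionTable(List.of(), List.of(), Map.of());

  private static final AtomicReferenceFieldUpdater<DimensionTableCasSketch, DimensionTable>
      UPDATER = AtomicReferenceFieldUpdater.newUpdater(
          DimensionTableCasSketch.class, DimensionTable.class, "dimensionTable");

  /**
   * Builds a fresh wrapper and swaps it in. If another reload replaced the wrapper
   * in the meantime, the CAS fails and the rows are re-read before retrying.
   */
  void loadLookupTable(List<String> schemaColumns, List<String> primaryKeyColumns,
      Supplier<Map<String, Map<String, Object>>> currentRows) {
    DimensionTable snapshot;
    DimensionTable replacement;
    do {
      snapshot = dimensionTable;
      // Construct a new wrapper instead of mutating the current one.
      replacement = new DimensionTable(schemaColumns, primaryKeyColumns, currentRows.get());
    } while (!UPDATER.compareAndSet(this, snapshot, replacement));
  }

  /** One volatile read, so the schema check and the row lookup see the same version. */
  Object lookup(String primaryKey, String column) {
    DimensionTable table = dimensionTable;
    if (!table.schemaColumns.contains(column)) {
      throw new IllegalArgumentException("Column does not exist in dimension table: " + column);
    }
    Map<String, Object> row = table.rowsByPrimaryKey.get(primaryKey);
    return row == null ? null : row.get(column);
  }

  public static void main(String[] args) {
    DimensionTableCasSketch manager = new DimensionTableCasSketch();
    // Hypothetical row data for illustration only.
    manager.loadLookupTable(List.of("id", "startLocation"), List.of("id"),
        () -> Map.of("1", Map.<String, Object>of("id", "1", "startLocation", "London")));
    System.out.println(manager.lookup("1", "startLocation")); // prints "London"
  }
}
```

Because the wrapper is immutable and each attempt constructs a new one, no reader can observe a half-updated table. This also matches the spirit of the later commit "Don't mutate the state of DimensionTable". The retry only comes into play when reloads race: the thread whose CAS fails rebuilds from the latest segment data and tries again. An AtomicReference<DimensionTable> field would work equally well. The field updater simply avoids allocating an extra object per manager.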

@codecov-commenter commented Feb 7, 2022

Codecov Report

❗ No coverage uploaded for pull request base (master@e59730a).
The diff coverage is 100.00%.


@@            Coverage Diff            @@
##             master    #8133   +/-   ##
=========================================
  Coverage          ?   71.44%           
  Complexity        ?     4305           
=========================================
  Files             ?     1625           
  Lines             ?    84215           
  Branches          ?    12602           
=========================================
  Hits              ?    60167           
  Misses            ?    19952           
  Partials          ?     4096           
Flag Coverage Δ
integration1 28.76% <0.00%> (?)
integration2 27.74% <0.00%> (?)
unittests1 67.97% <100.00%> (?)
unittests2 14.20% <0.00%> (?)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...inot/core/data/manager/offline/DimensionTable.java 100.00% <100.00%> (ø)
...ata/manager/offline/DimensionTableDataManager.java 87.50% <100.00%> (ø)

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e59730a...1dc69e2.

@mneedham (Contributor, Author) commented Feb 7, 2022

@richardstartin I think I've addressed all your suggestions. The only thing I'm not sure about is whether I did the right thing for the DimensionTable.

@richardstartin (Member) left a comment

👍🏻

@richardstartin merged commit 681da61 into apache:master on Mar 21, 2022
@richardstartin mentioned this pull request on Mar 21, 2022
weixiangsun pushed a commit to weixiangsun/pinot that referenced this pull request Mar 21, 2022
* Refresh ZK metadata when dimension table is updated

* Update DimensionTableDataManagerTest.java

* all fields into a volatile class in the CAS loop (as per Richard's feedback)

* license missing

* Return DimensionTable instead of passing it in

* Don't mutate the state of DimensionTable

Labels

None yet

Projects

None yet


4 participants