Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DATE_TIME should work as the primary time column for Pinot tables #5399

Merged
merged 7 commits into from May 19, 2020

Conversation

npawar
Copy link
Contributor

@npawar npawar commented May 16, 2020

#2756

This PR ensures that Pinot can use a DateTimeFieldSpec as a primary time column for the table.

After this change, we no longer need to use TimeFieldSpec. TimeFieldSpec, TimeGranularitySpec have been marked as Deprecated.
Some tests have been modified to test both cases, some have been changed to use datetime, and some retain time.

Manually tested creating segments, querying and retention for

  1. Schema with multiple dateTimeFieldSpecs
  2. Schema with just timeFieldSpec
  3. Schema with no time
  4. Schema with both time and dateTimeFieldSpec
  5. Above all for hybrid tables as well (except the case with no time)

Some changes

  1. The segment.time.column.name property in the segment metadata is now used to keep primary time column. This will match with the time column name in tableConfig. This can be either timeFieldSpec or dateTimeFieldSpec.
    If the primary time column is dateTimeFieldSpecs, the segment metadata will look like this
    E.g: schema has dimensions d1, d2, metric m1, dateTimeFieldSpecs dt1,dt2.
    Primary time = dt1
segment.dimension.column.names = d1,d2
segment.metric.column.names = m1
segment.datetime.column.names = dt1,dt2
segment.time.column.name = dt1 // Note the repetition

If the primary time column is timeFieldSpecs, the segment metadata will look like this
E.g: schema has dimensions d1, d2, metric m1, dateTimeFieldSpecs dt1,dt2, timeFieldSpec t
Primary time = t

segment.dimension.column.names = d1,d2
segment.metric.column.names = m1
segment.datetime.column.names = dt1,dt2
segment.time.column.name = t

@codecov-io
Copy link

codecov-io commented May 16, 2020

Codecov Report

Merging #5399 into master will increase coverage by 0.03%.
The diff coverage is 73.68%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5399      +/-   ##
==========================================
+ Coverage   66.81%   66.84%   +0.03%     
==========================================
  Files        1091     1091              
  Lines       55783    55782       -1     
  Branches     8365     8363       -2     
==========================================
+ Hits        37273    37290      +17     
+ Misses      15762    15744      -18     
  Partials     2748     2748              
Impacted Files Coverage Δ
...he/pinot/controller/util/AutoAddInvertedIndex.java 0.00% <0.00%> (ø)
...t/core/data/function/FunctionEvaluatorFactory.java 97.43% <ø> (ø)
...va/org/apache/pinot/core/minion/SegmentPurger.java 76.92% <0.00%> (ø)
...tion/batch/common/SegmentGenerationTaskRunner.java 0.00% <0.00%> (ø)
...rc/main/java/org/apache/pinot/spi/data/Schema.java 75.08% <ø> (+2.45%) ⬆️
.../java/org/apache/pinot/spi/data/TimeFieldSpec.java 88.37% <ø> (ø)
...java/org/apache/pinot/spi/utils/TimeConverter.java 88.88% <ø> (ø)
...oker/routing/timeboundary/TimeBoundaryManager.java 89.87% <100.00%> (+0.12%) ⬆️
...manager/realtime/HLRealtimeSegmentDataManager.java 82.48% <100.00%> (+0.32%) ⬆️
...ment/creator/impl/SegmentColumnarIndexCreator.java 90.56% <100.00%> (+2.30%) ⬆️
... and 36 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 430d48e...9cc0d14. Read the comment docs.

@haibow
Copy link
Contributor

haibow commented May 16, 2020

This diff looks like a milestone for TimeFieldSpec deprecation. @npawar shall we include this in 0.4.0 release? We could probably hold off on the release a bit. cc @kishoreg

@npawar
Copy link
Contributor Author

npawar commented May 16, 2020

This diff looks like a milestone for TimeFieldSpec deprecation. @npawar shall we include this in 0.4.0 release? We could probably hold off on the release a bit. cc @kishoreg

Yes, this is a milestone for TimeFieldSpec deprecation. Would be great if we can include in the release

@mcvsubbu
Copy link
Contributor

you have mentioned that you tested schema with no time on hybrid tables. How does that work for query routing?

@npawar
Copy link
Contributor Author

npawar commented May 16, 2020

you have mentioned that you tested schema with no time on hybrid tables. How does that work for query routing?

updated comment

Copy link
Member

@kishoreg kishoreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @Jackie-Jiang, @mcvsubbu please look into this

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok to me other than minor comment.

@mcvsubbu
Copy link
Contributor

If you can please do these steps to test it

  • Create a hybrid table with pinot-0.3.0, let it consume some segments, and have some offline segments as well
  • Upgrade to the current version and then do the following tests:
  1. Schema upgrades fine with a new field added (we are still able to get schema, etc.)
  2. queries work fine (show data from old and new segments), even after new offline and realtime segments are pushed

Ideally, do the tests after upgrade controller, and then again after upgrading broker, and then after upgrading server

@npawar
Copy link
Contributor Author

npawar commented May 19, 2020

If you can please do these steps to test it

  • Create a hybrid table with pinot-0.3.0, let it consume some segments, and have some offline segments as well
  • Upgrade to the current version and then do the following tests:
  1. Schema upgrades fine with a new field added (we are still able to get schema, etc.)
  2. queries work fine (show data from old and new segments), even after new offline and realtime segments are pushed

Ideally, do the tests after upgrade controller, and then again after upgrading broker, and then after upgrading server

Just completed the following test. It all looks green.

Initial setup:

  1. Started ZK and Kafka. Started Controller, Broker, Server using 0.3.0
  2. Created kafka topic and hybrid table
  3. Pushed 2 offline segments. Pushed data to kafka, allowed a segment to complete.
  4. Checked getSchema, checked query, checked data correctness.
  5. Added a field to schema. Reloaded segments. Restarted server.
  6. Checked getSchema, checked query, checked data correctness.

Controller upgrade:

  1. Upgraded controller to new code
  2. Checked getSchema, checked query, checked data correctness.
  3. Added a field to schema. Reloaded segments. Restarted server.
  4. Checked getSchema, checked query, checked data correctness.
  5. Pushed one offline segment. Pushed some more realtime data letting more segments complete.
  6. Checked getSchema, checked query, checked data correctness.

Broker upgrade:

  1. Upgraded broker to new code
  2. Checked getSchema, checked query, checked data correctness.
  3. Added a field to schema. Reloaded segments. Restarted server.
  4. Checked getSchema, checked query, checked data correctness.
  5. Pushed one offline segment. Pushed some more realtime data letting more segments complete.
  6. Checked getSchema, checked query, checked data correctness.

Server upgrade:

  1. Upgraded server to new code
  2. Checked getSchema, checked query, checked data correctness.
  3. Added a field to schema. Reloaded segments. Restarted server.
  4. Checked getSchema, checked query, checked data correctness.
  5. Pushed one offline segment. Pushed some more realtime data letting more segments complete.
  6. Checked getSchema, checked query, checked data correctness.

@mcvsubbu
Copy link
Contributor

Really appreciate your diligence @npawar thanks

@npawar npawar merged commit 85474ba into apache:master May 19, 2020
@npawar npawar deleted the datetimefieldspec_take_2 branch May 19, 2020 23:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants