Skip to content

Making NormalizedDateSegmentNameGenerator use joda time to keep consistency other time handling#10721

Merged
snleee merged 3 commits intoapache:masterfrom
timsants:joda_segment_name_generator
Jun 2, 2023
Merged

Making NormalizedDateSegmentNameGenerator use joda time to keep consistency other time handling#10721
snleee merged 3 commits intoapache:masterfrom
timsants:joda_segment_name_generator

Conversation

@timsants
Copy link
Contributor

@timsants timsants commented May 4, 2023

Context

For a Kafka ingestion use case, the time column was configured with a time format that was working as expected. But when configuring a minion task leveraging the SegmentProcessorFramework, a date format exception was thrown with the same exact time column/format:

Caught exception while parsing simple date format: yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSSZ with value: 2023-03-07T21:18:05.072784135Z at org.apache.pinot.segment.spi.creator.name.NormalizedDateSegmentNameGenerator.getNormalizedDate(NormalizedDateSegmentNameGenerator.java:142)

Specifically the realtime ingestion was using SegmentColumnarIndexCreator, which uses joda time DateFormatter for parsing time format into epoch millis (See SegmentColumnarIndexCreator).

But the minion task was configured to use the NormalizedDateSegmentNameGenerator which failed since it uses another date time format library, java.text SimpleDateFormat. This library seems to be stricter that the joda time library and if a time column values has literal ‘Z’ in it, the time format must have the Z in single quotes. E.g. "2023-01-01T12:00:00.111111111Z" needs a time format of "yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS’Z’" otherwise it will fail.

Changes

This PR makes the NormalizedDateSegmentNameGenerator also use the joda date time formatter. Joda is no longer maintained and java.text seems to be outdated. This PR aims to unify the time library until we are ready to make the switch to use the newer java.time library.

Instructions:

  1. The PR has to be tagged with at least one of the following labels (*):
    1. feature
    2. bugfix
    3. performance
    4. ui
    5. backward-incompat
    6. release-notes (**)
  2. Remove these instructions before publishing the PR.

(*) Other labels to consider:

  • testing
  • dependencies
  • docker
  • kubernetes
  • observability
  • security
  • code-style
  • extension-point
  • refactor
  • cleanup

(**) Use release-notes label for scenarios like:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed

@codecov-commenter
Copy link

codecov-commenter commented May 4, 2023

Codecov Report

Merging #10721 (37aef31) into master (ab4e8fa) will increase coverage by 1.79%.
The diff coverage is 78.57%.

@@             Coverage Diff              @@
##             master   #10721      +/-   ##
============================================
+ Coverage     68.41%   70.20%   +1.79%     
- Complexity     6518     6529      +11     
============================================
  Files          2164     2164              
  Lines        116410   116412       +2     
  Branches      17609    17609              
============================================
+ Hits          79637    81725    +2088     
+ Misses        31136    28990    -2146     
- Partials       5637     5697      +60     
Flag Coverage Δ
integration1 24.03% <0.00%> (-0.02%) ⬇️
integration2 23.55% <0.00%> (?)
unittests1 67.69% <78.57%> (+<0.01%) ⬆️
unittests2 13.65% <0.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...eator/name/NormalizedDateSegmentNameGenerator.java 83.60% <76.92%> (+0.27%) ⬆️
.../org/apache/pinot/spi/data/DateTimeFormatSpec.java 83.06% <100.00%> (+0.13%) ⬆️

... and 171 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for quickly addressing this!

Can you add the testing case where the format works in joda while it's not working in java to cover this case?

@timsants
Copy link
Contributor Author

Thank you for quickly addressing this!

Can you add the testing case where the format works in joda while it's not working in java to cover this case?

Will do. Also I checked out the other references and it doesn't seem like they will be impacted by this case since they are formatting a constant string.

Tim Santos added 3 commits May 31, 2023 16:29
@timsants timsants force-pushed the joda_segment_name_generator branch from cb726d8 to 37aef31 Compare May 31, 2023 23:30
Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Can you double check on the tests?

@snleee snleee merged commit 8468857 into apache:master Jun 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants