support multiple intervals in dataSource inputSpec #1988

Merged
merged 3 commits into from
Dec 4, 2015


himanshug
Contributor

This lets users read data from multiple intervals of the input dataSource.

@@ -177,13 +177,13 @@ Here is what goes inside "ingestionSpec"
 |Field|Type|Description|Required|
 |-----|----|-----------|--------|
 |dataSource|String|Druid dataSource name from which you are loading the data.|yes|
-|interval|String|A string representing ISO-8601 Intervals.|yes|
+|interval|String|This is deprecated, please use intervals.|no|
+|intervals|List|A list representing ISO-8601 Intervals.|yes|
Member

list of strings

Contributor Author

changed
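The `intervals` field holds a list of ISO-8601 interval strings. As a rough illustration of what that list contains, here is a minimal Java sketch that turns such strings into start/end pairs. It is a stand-in under stated assumptions, not Druid code: Druid works with Joda-Time `Interval` objects (see the `org.joda.time.Interval` type in the CI error further down), which accept more ISO-8601 forms such as plain dates or periods, whereas this sketch, with its hypothetical `SimpleInterval` and `IntervalListParser` classes, only accepts UTC `<instant>/<instant>` pairs like `2015-01-01T00:00:00Z/2015-01-02T00:00:00Z`.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Joda-Time Interval: a half-open range [start, end). */
final class SimpleInterval {
  final Instant start;
  final Instant end;

  SimpleInterval(Instant start, Instant end) {
    // Reject inverted ranges such as "2015-01-02.../2015-01-01...".
    if (end.isBefore(start)) {
      throw new IllegalArgumentException("end before start: " + start + "/" + end);
    }
    this.start = start;
    this.end = end;
  }

  @Override
  public String toString() {
    return start + "/" + end;
  }
}

/** Hypothetical parser for a list of "<instant>/<instant>" strings (UTC only). */
final class IntervalListParser {
  static List<SimpleInterval> parse(List<String> isoIntervals) {
    List<SimpleInterval> result = new ArrayList<>();
    for (String s : isoIntervals) {
      String[] parts = s.split("/");
      if (parts.length != 2) {
        throw new IllegalArgumentException("expected <start>/<end>, got: " + s);
      }
      // Instant.parse expects e.g. "2015-01-01T00:00:00Z".
      result.add(new SimpleInterval(Instant.parse(parts[0]), Instant.parse(parts[1])));
    }
    return result;
  }
}
```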


Preconditions.checkArgument(
    interval != null && intervals != null && !intervals.isEmpty(),
    "pls specify intervals only"
Contributor

Please use full words too!

Contributor Author

done
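Guava's `Preconditions.checkArgument(condition, message)` throws `IllegalArgumentException` when `condition` is false, so the expression passed to it has to describe the valid state. The excerpt above is cut off and does not show how the check is combined with the deprecated field, so the following is only a minimal sketch of one way such validation could look, written in plain Java instead of Guava. `IntervalFieldResolver` and `resolve` are hypothetical names, and intervals are kept as strings for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper: merges the deprecated single "interval" field and the
 * newer "intervals" list into one list, rejecting specs that set both.
 */
final class IntervalFieldResolver {
  static List<String> resolve(String interval, List<String> intervals) {
    boolean hasList = intervals != null && !intervals.isEmpty();

    // Invalid: both the deprecated field and the replacement are set.
    if (interval != null && hasList) {
      throw new IllegalArgumentException(
          "Please specify only 'intervals'; 'interval' is deprecated");
    }

    // Backwards compatibility: treat the old field as a one-element list.
    if (interval != null) {
      return Collections.singletonList(interval);
    }

    // Neither field given: nothing to read.
    if (!hasList) {
      throw new IllegalArgumentException("'intervals' must contain at least one interval");
    }

    // Defensive, read-only copy of the caller's list.
    return Collections.unmodifiableList(new ArrayList<>(intervals));
  }
}
```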

himanshug force-pushed the multi-interval-batch-delta branch 2 times, most recently from 37a0499 to 60bcd10 on November 22, 2015 20:15
@xvrl
Member

xvrl commented Nov 24, 2015

👍

himanshug added this to the 0.9.0 milestone Dec 2, 2015
}

@Override
public List<DataSegment> getUsedSegmentsForIntervals(
Contributor

This could potentially return the same segments multiple times, I think. Is that bad?

Contributor

I'm thinking of the case where two of the intervals in the list both partially overlap the same segment.

Contributor

Or even if two intervals in the list overlap each other.

Contributor

Perhaps add a test for that?
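One way to rule out the "two intervals in the list overlap each other" case is to merge overlapping entries before any lookup, so each time range is queried once; the author's reply mentions Druid's `JodaUtils.condenseIntervals(..)` for this. The following is not that utility's code, only a minimal sketch of the usual sort-and-merge approach over a hypothetical `MillisInterval` type (half-open `[start, end)` in epoch milliseconds). It also merges intervals that merely touch, which is an assumption about the desired behavior.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical half-open interval [start, end) in epoch milliseconds. */
final class MillisInterval {
  final long start;
  final long end;

  MillisInterval(long start, long end) {
    if (end < start) {
      throw new IllegalArgumentException("end < start");
    }
    this.start = start;
    this.end = end;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof MillisInterval)) {
      return false;
    }
    MillisInterval other = (MillisInterval) o;
    return start == other.start && end == other.end;
  }

  @Override
  public int hashCode() {
    return Long.hashCode(start) * 31 + Long.hashCode(end);
  }

  @Override
  public String toString() {
    return "[" + start + ", " + end + ")";
  }
}

final class IntervalCondenser {
  /** Returns sorted, pairwise-disjoint intervals covering the same time as the input. */
  static List<MillisInterval> condense(List<MillisInterval> input) {
    // Sort by start, then end, so overlapping intervals become neighbours.
    List<MillisInterval> sorted = new ArrayList<>(input);
    sorted.sort(Comparator.comparingLong((MillisInterval i) -> i.start)
        .thenComparingLong(i -> i.end));

    List<MillisInterval> result = new ArrayList<>();
    for (MillisInterval next : sorted) {
      if (!result.isEmpty()) {
        MillisInterval last = result.get(result.size() - 1);
        // Overlapping or abutting: extend the last interval instead of adding a new one.
        if (next.start <= last.end) {
          result.set(result.size() - 1,
              new MillisInterval(last.start, Math.max(last.end, next.end)));
          continue;
        }
      }
      result.add(next);
    }
    return result;
  }
}
```

For example, `[20, 30)`, `[0, 10)`, `[5, 15)` and `[30, 40)` condense to `[0, 15)` and `[20, 40)`.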

Contributor Author

Good catch.
But it wouldn't matter, because the list of intervals is always a list of disjoint intervals, which is ensured by calling JodaUtils.condenseIntervals(..). If two intervals overlapped the same segment, this would have returned that segment twice, which again wouldn't matter because the caller uses "windowed" segments appropriately.
However, that looks odd from an API perspective, so I updated the code to remove duplicates and updated the test case to verify that.
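Even with disjoint query intervals, a single segment can overlap two of them, as the reply describes. A simple way to return each segment only once is to collect matches into a `LinkedHashSet`, which drops duplicates while keeping first-seen order. This is a minimal sketch, not the PR's code: `Segment`, `SegmentLookup` and `usedSegmentsForIntervals` are hypothetical, the segment list is an in-memory stand-in for the metadata store, each query is a `{start, end}` pair of epoch milliseconds, and two segments are treated as equal when their ids match.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical segment: an identifier plus a half-open time range [start, end). */
final class Segment {
  final String id;
  final long start;
  final long end;

  Segment(String id, long start, long end) {
    this.id = id;
    this.start = start;
    this.end = end;
  }

  /** True if this segment shares any time with the query range [qStart, qEnd). */
  boolean overlaps(long qStart, long qEnd) {
    return start < qEnd && qStart < end;
  }

  // Equality by id only (an assumption of this sketch), so the set can detect repeats.
  @Override
  public boolean equals(Object o) {
    return o instanceof Segment && ((Segment) o).id.equals(id);
  }

  @Override
  public int hashCode() {
    return id.hashCode();
  }
}

final class SegmentLookup {
  /** Returns every segment overlapping at least one query range, each at most once. */
  static Set<Segment> usedSegmentsForIntervals(List<Segment> all, List<long[]> queries) {
    // LinkedHashSet drops duplicates while keeping first-seen order.
    Set<Segment> result = new LinkedHashSet<>();
    for (long[] q : queries) {
      for (Segment s : all) {
        if (s.overlaps(q[0], q[1])) {
          result.add(s);
        }
      }
    }
    return result;
  }
}
```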

@gianm
Contributor

gianm commented Dec 3, 2015

@himanshug, this looks like a legitimate CI failure. Could you fix that and squash the commits? 👍 from me after that; everything else looks good.

[ERROR] /home/travis/build/druid-io/druid/indexing-service/src/test/java/io/druid/indexing/test/TestIndexerMetadataStorageCoordinator.java:[35,8] io.druid.indexing.test.TestIndexerMetadataStorageCoordinator is not abstract and does not override abstract method getUsedSegmentsForIntervals(java.lang.String,java.util.List<org.joda.time.Interval>) in io.druid.indexing.overlord.IndexerMetadataStorageCoordinator

@himanshug
Contributor Author

@gianm Ah, that happened because the rebase didn't update a new class. Fixed.
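The CI error is the usual consequence of adding an abstract method to a Java interface: every implementing class, including test doubles such as `TestIndexerMetadataStorageCoordinator`, stops compiling until it implements the new method. The sketch below shows the shape of such a fix using a hypothetical, heavily trimmed `SegmentCoordinator` interface and an in-memory `FakeSegmentCoordinator`; the names and the `String`-typed intervals and segment ids are simplifications, not Druid's actual signatures.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, trimmed-down coordinator interface (not Druid's real one). */
interface SegmentCoordinator {
  List<String> getUsedSegmentsForInterval(String dataSource, String interval);

  // Newly added abstract method: every implementing class, test doubles included,
  // must implement it or compilation fails with an error like the one above.
  List<String> getUsedSegmentsForIntervals(String dataSource, List<String> intervals);
}

/** In-memory test double returning canned segment ids per interval. */
final class FakeSegmentCoordinator implements SegmentCoordinator {
  private final Map<String, List<String>> segmentsByInterval;

  FakeSegmentCoordinator(Map<String, List<String>> segmentsByInterval) {
    this.segmentsByInterval = segmentsByInterval;
  }

  @Override
  public List<String> getUsedSegmentsForInterval(String dataSource, String interval) {
    return segmentsByInterval.getOrDefault(interval, List.of());
  }

  @Override
  public List<String> getUsedSegmentsForIntervals(String dataSource, List<String> intervals) {
    // Union of the per-interval results, without duplicates, in first-seen order.
    Set<String> unique = new LinkedHashSet<>();
    for (String interval : intervals) {
      unique.addAll(getUsedSegmentsForInterval(dataSource, interval));
    }
    return new ArrayList<>(unique);
  }
}
```

On Java 8 and later, declaring the new method as a `default` method that loops over the single-interval one would keep existing implementations compiling; here the test class was simply updated instead.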

gianm added a commit that referenced this pull request Dec 4, 2015
support multiple intervals in dataSource inputSpec
gianm merged commit 20544d4 into apache:master Dec 4, 2015
himanshug deleted the multi-interval-batch-delta branch December 5, 2015 17:16
fjy mentioned this pull request Feb 5, 2016
kfaraz added a commit that referenced this pull request Jun 26, 2024
Changes:
- Rename `UsedSegmentChecker` to `PublishedSegmentsRetriever`
- Remove deprecated single `Interval` argument from `RetrieveUsedSegmentsAction` as it is now unused and has been deprecated since #1988
- Return `Set` of segments instead of a `Collection` from `IndexerMetadataStorageCoordinator.retrieveUsedSegments()`
5 participants