Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for loadByPeriod rule #1362

Closed
wants to merge 1 commit into from

Conversation

Projects
None yet
5 participants
@anubhgup
Copy link
Contributor

commented May 13, 2015

If a segment (say created by merge from multiple smaller segments), has a start time less than the loadByPeriod start time, it is currently dropped by the coordinator even if it overlaps the load period. This CL fixes that.

@anubhgup anubhgup force-pushed the anubhgup:master branch from f9d5850 to 0e031fc May 13, 2015

@@ -83,6 +83,6 @@ public boolean appliesTo(DataSegment segment, DateTime referenceTimestamp)
public boolean appliesTo(Interval interval, DateTime referenceTimestamp)
{
final Interval currInterval = new Interval(period, referenceTimestamp);
return currInterval.overlaps(interval) && interval.getStartMillis() >= currInterval.getStartMillis();
return currInterval.overlaps(interval);

This comment has been minimized.

Copy link
@nishantmonu51

nishantmonu51 May 14, 2015

Member

would you mind adding a unit test too?

@fjy

This comment has been minimized.

Copy link
Contributor

commented May 14, 2015

@anubhgup This is not a bug and was intended by design. This change also breaks router logic.

Anubhav Gupta
Load a segment by period if it's interval overlaps the period in the …
…loadByPeriod rule, even if the start time of the segment is less than the period start.

@anubhgup anubhgup force-pushed the anubhgup:master branch from 0e031fc to ede7925 May 14, 2015

@anubhgup

This comment has been minimized.

Copy link
Contributor Author

commented May 14, 2015

Added unit test.

@anubhgup

This comment has been minimized.

Copy link
Contributor Author

commented May 14, 2015

What are the intended semantics of loadByPeriod? In my use case, this is breaking things since it leads to a segment getting dropped even though it contains data in my retention period. How can I work around this problem? Currently, I have a retention rule that says "loadByPeriod=7D, dropForever". When I enabled segment merging, any segment that spans that 7D boundary gets dropped by the coordinator.

@fjy

This comment has been minimized.

Copy link
Contributor

commented May 14, 2015

@anubhgup Sorry I had to run before. Writing up a more complete view on loadByPeriod.

I am actually okay with redefining the semantics of loadByPeriod, and they should be changed to make more sense. The one use case that is currently problematic is that the router uses rules to determine how to forward queries. In a setup where you have different brokers dedicated to different time ranges, if the router receives a 1 yr query, we want this query to route to the broker dedicated for slow (older) queries. The changes here would mean the query would be forwarded to the brokers dedicated for fast queries. I am 100% in favor of removing the router entirely and having the routing logic live on the brokers.

One simple workaround is to introduce a new rule type, loadByPeriodByOverlap or something and use that. The correct solution is to rethink the router and some of the rules, which may require some more time dedication.

@anubhgup

This comment has been minimized.

Copy link
Contributor Author

commented May 14, 2015

Thanks @fjy. I can introduce the new rule type. Would it be useful to merge the new rule CL to the main branch, or should I just implement that for local use?

@xvrl

This comment has been minimized.

Copy link
Member

commented May 15, 2015

@anubhgup feel free to submit a PR, I don't see a reason not to include it in the main branch.

@cheddar

This comment has been minimized.

Copy link
Contributor

commented May 18, 2015

Can we call it loadByOverlapPeriod instead?

Also, those semantics about dropping even though it has relevant data are very unintuitive, was that introduced when we added the router? Just as a general rule, the load/drop/retention rule's semantics should be defined according to what makes sense for the retention of segments. If the router's logic cannot re-use them as is, then it shouldn't be a problem with the interface, it just means that the router cannot reuse that same logic...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.