Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Union bySegment queries can get confused about which descriptors to use #1727

Closed
gianm opened this issue Sep 13, 2015 · 3 comments
Closed

Union bySegment queries can get confused about which descriptors to use #1727

gianm opened this issue Sep 13, 2015 · 3 comments
Labels

Comments

@gianm
Copy link
Contributor

gianm commented Sep 13, 2015

When historicals get a union query with multiple datasources, they can get confused and scan segments other than the ones the broker intended. This is because they use UnionTimeLineLookup.findEntry(interval, version) to resolve the descriptors into Segments, but it's possible for segments in different datasources to have the same interval and version. This is actually always going to happen if the datasources were ingested with standalone realtime, since the version is fixed to the start of the hour.

This shows a couple of union queries where one segment is scanned twice and one is not scanned at all (which one is scanned depends on the order of the datasources; see union.json vs union2.json): https://gist.github.com/gianm/fc3a1eba0dbafbf87720

@fjy
Copy link
Contributor

fjy commented Sep 13, 2015

@nishantmonu51

@nishantmonu51
Copy link
Member

Ah, the reason behind introducing UnionTimelineLookup at the historical was to distribute the merging load on both historical and broker.

It seems the historical does not have enough information to work on multiple DataSources and figure out the correct one here.
we would either need to add the dataSource info to the MultipleSegmentSpec or revert back to the old way of letting the broker handle breaking the query into individual datasources and send query for individual datasources down to the historical nodes.

I think we could revert back the breaking changes and let the broker send the query for individual DS down to the historical nodes. @fjy @gianm I will make a PR for reverting them if it sounds good to you ?

@gianm
Copy link
Contributor Author

gianm commented Sep 14, 2015

@nishantmonu51 that approach sounds good to me- I think the other alternative would be to add dataSources to SegmentDescriptors, which seems like a big change for a feature that is not very heavily used.

@gianm gianm added the Bug label Sep 14, 2015
nishantmonu51 added a commit to metamx/druid that referenced this issue Sep 29, 2015
Fixes apache#1727.
revert to doing merging for results for union queries on broker.

revert unrelated changes

Add test for union query runner

Add test

remove unused imports

fix imports

fix renamed file

fix test

update docs.
drcrallen added a commit that referenced this issue Sep 29, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants