New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Show candidate hosts for the given query #2282
Conversation
@navis can you provide some more information on why this is feature is required. You can get this information through Druid metrics. |
@fjy I'm thinking of implementing a druid storage handler for hive. marked as 'wip'. |
@navis having a hive storage adapter would be good but it appears you are thinking of fetching data from historical nodes (and realtime nodes). IMO it will be much more scalable to read segments from hdfs directly. |
Because we are supposed to use both of druid(simple select/groupby) and hive(for joins, udfs, etc) for the same dataset, it's highly possible that historical node already loaded segments to be used and we want to reuse it if possible. I think we can add a method to ask the historical node about local temporary directory it used for bypassing druid access. I don't know how the InputFormat you've mentioned is implemented. Is it support predicates to be pushed to use bitmap index in druid smooth file? If it is, there is not much work to be done for me :). |
@navis the InputFormat I refered to is https://github.com/himanshug/druid-hadoop-utils/blob/master/druid-mr/src/main/java/com/yahoo/druid/hadoop/DruidInputFormat.java which is a simple wrapper on https://github.com/druid-io/druid/blob/master/indexing-hadoop/src/main/java/io/druid/indexer/hadoop/DatasourceInputFormat.java . Regarding pushing predicates and iterating through rows in the segment, you can do that with https://github.com/druid-io/druid/blob/master/server/src/main/java/io/druid/segment/realtime/firehose/IngestSegmentFirehose.java which is used in https://github.com/druid-io/druid/blob/master/indexing-hadoop/src/main/java/io/druid/indexer/hadoop/DatasourceRecordReader.java . |
@himanshug Thanks, I'll check that. |
7b9084b
to
8f1a2df
Compare
Anyway, below is the test result in my notebook.
|
8f1a2df
to
8fd3aec
Compare
Added get method something like @himanshug I need know target segments and locations from predicates (or sarg) made in hive. |
8fd3aec
to
1485d67
Compare
d087707
to
a59b9a5
Compare
I've made patch integrating druid with hive based on this patch. Can anyone review this? I think this is fairly simple patch. |
Looks like the coordinator also have the segments info, so what time is the good to query coordinator and what time is the good to query broker? |
for (Interval interval : intervals) { | ||
for (TimelineObjectHolder<String, ServerSelector> holder : timeline.lookup(interval)) { | ||
for (PartitionChunk<ServerSelector> chunk : holder.getObject()) { | ||
ServerSelector selector = chunk.getObject(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Size can get from ServerSelector?
ServerSelector.getSegment().getSize()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot understand how I've missed that three months ago. Then we can remove controversial approxSize
part safely. Thanks.
a59b9a5
to
5f60c24
Compare
Patch looks much clean now, i think it is good. |
Need doc for how to use it. |
5f60c24
to
4fd4120
Compare
Fail of |
|
||
@JsonCreator | ||
public LocatedSegmentDescriptor( | ||
@JsonProperty("itvl") Interval interval, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for public APIs we should probably use fully qualified names instead of abbreviations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this class is extending SegmentDescriptor
and it uses "itvl" as json property name. should we create separated class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SegmentDescriptor is not exposed to the user so it's less of a problem. However, all our classes that are exposed to the user, use either interval
or intervals
so we should keep that consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good.
@@ -300,6 +303,24 @@ public int compare(Interval o1, Interval o2) | |||
return metrics; | |||
} | |||
|
|||
@GET | |||
@Path("/{dataSourceName}/candidates/intervals/{intervals}/numCandidates/{numCandidates}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding @ResourceFilters(DatasourceResourceFilter.class)
to the endpoint will add the auth logic. So the endpoint will look like this -
@GET
@Path("/{dataSourceName}/candidates/intervals/{intervals}/numCandidates/{numCandidates}")
@Produces(MediaType.APPLICATION_JSON)
@ResourceFilters(DatasourceResourceFilter.class)
public Iterable<LocatedSegmentDescriptor> getQueryTargets(
@PathParam("dataSourceName") String datasource,
@PathParam("intervals") String intervals,
@PathParam("numCandidates") String numCandidates,
@Context final HttpServletRequest req
) throws IOException
If this can be added then it's great otherwise I can do a follow up PR to add it.
{ | ||
public static final int DEFAULT_NUM_CANDIDATES = 5; | ||
|
||
public static List<LocatedSegmentDescriptor> getTargetLocations( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add couple of UTs to this core method ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
*/ | ||
public class ServerViewUtil | ||
{ | ||
public static final int DEFAULT_NUM_CANDIDATES = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: @navis someone my want to have all the server by default but maybe be you have a specific use case in mind ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really. I'll change default behavior to get all servers.
@@ -59,7 +59,17 @@ Returns the dimensions of the datasource. | |||
|
|||
Returns the metrics of the datasource. | |||
|
|||
* `/druid/v2/datasources/{dataSourceName}/candidates/intervals/{comma-separated-intervals-in-ISO8601-format}/numCandidates/{numCandidates}` | |||
|
|||
Returns segment information lists including server locations for the given datasource and intervals. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we elect to go with a short list of 5 server if numCanditates
is absent, we should add this to the doc i guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now default returns all servers. updated doc anyway.
private final List<DruidServerMetadata> locations; | ||
|
||
@JsonCreator | ||
public LocatedSegmentDescriptor( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some ser/desr test will be nice thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@navis i left same small comments
@navis i meant some not same sorry ! |
@b-slim Addressed comments |
List<Interval> intervalList = Lists.newArrayList(); | ||
for (String interval : intervals.split(",")) { | ||
intervalList.add(Interval.parse(interval.trim())); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When Intervals are specified as PathParam do we need to do any handling for '/' ?
e.g similar thing is done in DataSourcesResource.deleteDataSourceSpecificInterval
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess we need to have something like this https://github.com/druid-io/druid/blob/0d745ee1207df4990ab92082ec33c75f5777757e/server/src/main/java/io/druid/server/http/DatasourcesResource.java#L346
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to query-param for intervals/numCandidates. it looks better than before, I think.
looks good to me after #2282 (comment) |
failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
closing & reopening for travis |
All review comments seem handled, changed milestone to 0.9.2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@pjain1 this is in can you do the follow up PR for 0.9.2 ? |
* Show candidate hosts for the given query * Added test cases & minor changes to address comments * Changed path-param to query-pram for intervals/numCandidates
Provide location information for the given query. Need for calculating location of input split.