Show candidate hosts for the given query #2282

Merged
merged 3 commits into from Sep 22, 2016

navis
Contributor

@navis navis commented Jan 18, 2016

Provides location information for the given query. This is needed for calculating the locations of input splits.

@fjy
Contributor

fjy commented Jan 18, 2016

@navis can you provide some more information on why this feature is required? You can get this information through Druid metrics.

@navis navis changed the title Show candidate hosts for the given query [WIP] Show candidate hosts for the given query Jan 19, 2016
@navis
Contributor Author

navis commented Jan 19, 2016

@fjy I'm thinking of implementing a Druid storage handler for Hive. Marked as 'WIP'.

@himanshug
Contributor

@navis having a Hive storage adapter would be good, but it appears you are thinking of fetching data from historical nodes (and realtime nodes). IMO it will be much more scalable to read segments from HDFS directly.
I wrote a Hadoop InputFormat (a simple wrapper around Druid's DatasourceInputFormat) and a Pig loader that work that way (planning to move the InputFormat to Druid core), see

@fjy fjy added the Discuss label Jan 19, 2016
@navis
Contributor Author

navis commented Jan 20, 2016

Because we are supposed to use both Druid (simple select/groupBy) and Hive (for joins, UDFs, etc.) on the same dataset, it's highly likely that a historical node has already loaded the segments to be used, and we want to reuse them if possible. I think we can add a method to ask the historical node which local temporary directory it uses, in order to bypass Druid access.

I don't know how the InputFormat you mentioned is implemented. Does it support pushing predicates down to use the bitmap index in Druid's smoosh file? If it does, there is not much work left for me :).

@navis
Contributor Author

navis commented Jan 20, 2016

@himanshug Thanks, I'll check that.

@navis
Contributor Author

navis commented Feb 3, 2016

Anyway, below is the test result on my laptop.

navis@navisui-MacBook-Pro:~/druid$ explain_query.sh timeseries.json
[ {
  "itvl" : "2016-01-28T09:00:00.000+09:00/2016-01-29T09:00:00.000+09:00",
  "ver" : "2016-01-28T23:47:17.046Z",
  "part" : 0,
  "locations" : [ {
    "name" : "localhost:8083",
    "host" : "localhost:8083",
    "maxSize" : 10000000000,
    "type" : "historical",
    "tier" : "_default_tier",
    "priority" : 0
  } ]
}, {
  "itvl" : "2016-02-02T09:00:00.000+09:00/2016-02-03T09:00:00.000+09:00",
  "ver" : "2016-02-02T01:03:07.127Z",
  "part" : 0,
  "locations" : [ {
    "name" : "localhost:8100",
    "host" : "localhost:8100",
    "maxSize" : 0,
    "type" : "realtime",
    "tier" : "_default_tier",
    "priority" : 0
  } ]
} ]

@navis
Contributor Author

navis commented Mar 22, 2016

Added a GET method, something like curl "http://localhost:8082/druid/v2/candidates/?datasource=wikipedia&intervals=2013-08-31/2020-09-01".

@himanshug I need to know the target segments and their locations from the predicates (or SARGs) made in Hive. DatasourceInputFormat is a good starting point, but we still need this PR to cover our use case.
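
As an illustration of the GET method mentioned above, here is a minimal Java sketch of how a client might build the request URL. The path and the `datasource`/`intervals` parameter names are copied from the curl example in this comment (the endpoint shape changed later in the review), and the class and method names are hypothetical. The interval value is percent-encoded here as a precaution, on the assumption that the server decodes query parameters in the usual way.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Druid: builds the candidates URL from the curl example.
class CandidatesUrl {
    public static String build(String brokerHostPort, String dataSource, String intervals) {
        try {
            // URLEncoder turns "/" into "%2F", so the interval separator survives as a query value.
            return "http://" + brokerHostPort + "/druid/v2/candidates/"
                + "?datasource=" + URLEncoder.encode(dataSource, "UTF-8")
                + "&intervals=" + URLEncoder.encode(intervals, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every JVM, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("localhost:8082", "wikipedia", "2013-08-31/2020-09-01"));
    }
}
```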

@navis navis changed the title [WIP] Show candidate hosts for the given query Show candidate hosts for the given query Mar 22, 2016
@navis navis force-pushed the show-query-segment-host branch 2 times, most recently from d087707 to a59b9a5 Compare April 21, 2016 00:56
@navis
Contributor Author

navis commented Apr 25, 2016

I've made a patch integrating Druid with Hive based on this patch. Can anyone review this? I think this is a fairly simple patch.

@binlijin
Contributor

binlijin commented Apr 25, 2016

It looks like the coordinator also has the segment info, so when is it better to query the coordinator and when is it better to query the broker?

for (Interval interval : intervals) {
  for (TimelineObjectHolder<String, ServerSelector> holder : timeline.lookup(interval)) {
    for (PartitionChunk<ServerSelector> chunk : holder.getObject()) {
      ServerSelector selector = chunk.getObject();
Contributor

@binlijin binlijin Apr 25, 2016

Can the size be obtained from ServerSelector?
ServerSelector.getSegment().getSize()

Contributor Author

I cannot understand how I've missed that three months ago. Then we can remove controversial approxSize part safely. Thanks.

@binlijin
Contributor

binlijin commented Apr 26, 2016

The patch looks much cleaner now; I think it is good.

@binlijin
Contributor

Needs docs on how to use it.

@navis
Contributor Author

navis commented Apr 26, 2016

AnnouncerTest.testSanity failed, which seems definitely unrelated to this change.

@xvrl
Member

xvrl commented Apr 27, 2016

@navis this is useful, we built something similar standalone for our #2330 prototype, but we'll use this going forward since I think it makes sense for this to live on a broker.


@JsonCreator
public LocatedSegmentDescriptor(
@JsonProperty("itvl") Interval interval,
Member

for public APIs we should probably use fully qualified names instead of abbreviations.

Contributor Author

@navis navis May 2, 2016

This class extends SegmentDescriptor, which uses "itvl" as its JSON property name. Should we create a separate class?

Member

SegmentDescriptor is not exposed to the user so it's less of a problem. However, all our classes that are exposed to the user, use either interval or intervals so we should keep that consistent.

Contributor Author

good.

@@ -300,6 +303,24 @@ public int compare(Interval o1, Interval o2)
return metrics;
}

@GET
@Path("/{dataSourceName}/candidates/intervals/{intervals}/numCandidates/{numCandidates}")
Contributor

I guess after the changes in #2424 by @pjain1 we need to add auth logic?

Member

Adding @ResourceFilters(DatasourceResourceFilter.class) to the endpoint will add the auth logic. So the endpoint will look like this -

  @GET
  @Path("/{dataSourceName}/candidates/intervals/{intervals}/numCandidates/{numCandidates}")
  @Produces(MediaType.APPLICATION_JSON)
  @ResourceFilters(DatasourceResourceFilter.class)
  public Iterable<LocatedSegmentDescriptor> getQueryTargets(
      @PathParam("dataSourceName") String datasource,
      @PathParam("intervals") String intervals,
      @PathParam("numCandidates") String numCandidates,
      @Context final HttpServletRequest req
  ) throws IOException

If this can be added, great; otherwise I can do a follow-up PR to add it.

{
public static final int DEFAULT_NUM_CANDIDATES = 5;

public static List<LocatedSegmentDescriptor> getTargetLocations(
Contributor

Can we add a couple of unit tests for this core method?

Contributor Author

done

*/
public class ServerViewUtil
{
public static final int DEFAULT_NUM_CANDIDATES = 5;
Contributor

Minor: @navis someone may want to get all the servers by default, but maybe you have a specific use case in mind?

Contributor Author

not really. I'll change default behavior to get all servers.

@@ -59,7 +59,17 @@ Returns the dimensions of the datasource.

Returns the metrics of the datasource.

* `/druid/v2/datasources/{dataSourceName}/candidates/intervals/{comma-separated-intervals-in-ISO8601-format}/numCandidates/{numCandidates}`

Returns segment information lists including server locations for the given datasource and intervals.
Contributor

If we elect to go with a short list of 5 servers when numCandidates is absent, we should add this to the doc, I guess.

Contributor Author

now default returns all servers. updated doc anyway.
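
To make the agreed default concrete, here is a minimal Java sketch of one way the "return all servers unless a limit is given" rule could be expressed. It is an assumption for illustration only: the class and method names are hypothetical, a missing numCandidates is modeled as `null`, and how the merged code actually selects or orders candidates is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the PR's ServerViewUtil: caps a server list at numCandidates.
class CandidateLimit {
    public static <T> List<T> limit(List<T> servers, Integer numCandidates) {
        // Absent, non-positive, or oversized limits all fall back to "every server".
        if (numCandidates == null || numCandidates <= 0 || numCandidates >= servers.size()) {
            return new ArrayList<>(servers);
        }
        // Otherwise keep the first numCandidates entries, in their existing order.
        return new ArrayList<>(servers.subList(0, numCandidates));
    }

    public static void main(String[] args) {
        List<String> servers = new ArrayList<>();
        servers.add("localhost:8083");
        servers.add("localhost:8100");
        System.out.println(limit(servers, null));
        System.out.println(limit(servers, 1));
    }
}
```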

private final List<DruidServerMetadata> locations;

@JsonCreator
public LocatedSegmentDescriptor(
Contributor

A serialization/deserialization test would be nice, though.

Contributor Author

done

Contributor

@b-slim b-slim left a comment

@navis i left same small comments

@b-slim
Contributor

b-slim commented Sep 20, 2016

@navis i meant some not same sorry !

@b-slim b-slim added the Feature label Sep 20, 2016
@b-slim b-slim added this to the 0.9.3 milestone Sep 20, 2016
@navis
Contributor Author

navis commented Sep 21, 2016

@b-slim Addressed comments

List<Interval> intervalList = Lists.newArrayList();
for (String interval : intervals.split(",")) {
  intervalList.add(Interval.parse(interval.trim()));
}
Member

When intervals are specified as a PathParam, do we need any special handling for '/'?
E.g., a similar thing is done in DataSourcesResource.deleteDataSourceSpecificInterval.
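
The concern can be sketched in plain Java. An ISO 8601 interval contains '/', which collides with the path separator when the value is a path segment. The sketch below assumes a convention where the client writes '_' in place of '/' and the server swaps it back before parsing. That convention, the class name, and the simple start/end split (a stand-in for Joda-Time's `Interval.parse`) are assumptions for illustration, not code from this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a comma-separated interval list into {start, end} pairs.
class IntervalParam {
    public static List<String[]> parse(String param) {
        List<String[]> result = new ArrayList<>();
        for (String raw : param.split(",")) {
            // Assumed convention: '_' stands in for '/' so the value is safe inside a URL path.
            String interval = raw.trim().replace('_', '/');
            String[] bounds = interval.split("/");
            if (bounds.length != 2 || bounds[0].isEmpty() || bounds[1].isEmpty()) {
                throw new IllegalArgumentException("Not a start/end interval: " + raw);
            }
            result.add(bounds);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] bounds : parse("2016-01-28_2016-01-29, 2016-02-02/2016-02-03")) {
            System.out.println(bounds[0] + " -> " + bounds[1]);
        }
    }
}
```

Passing the intervals as a query parameter instead avoids the substitution, since '/' is allowed in a query string.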

Contributor

Contributor Author

oh, good

Contributor Author

Changed intervals/numCandidates to query params. It looks better than before, I think.

@nishantmonu51
Member

looks good to me after #2282 (comment)

@navis
Contributor Author

navis commented Sep 22, 2016

io.druid.server.lookup.PollingLookupTest failed, which seems unrelated to this change.

@drcrallen
Contributor

@navis #3480

Member

@nishantmonu51 nishantmonu51 left a comment

👍

@nishantmonu51
Member

closing & reopening for travis

@nishantmonu51 nishantmonu51 reopened this Sep 22, 2016
@nishantmonu51 nishantmonu51 modified the milestones: 0.9.2, 0.9.3 Sep 22, 2016
@nishantmonu51
Member

All review comments seem handled, changed milestone to 0.9.2.

Contributor

@b-slim b-slim left a comment

👍

@b-slim b-slim merged commit 49c0fe0 into apache:master Sep 22, 2016
@b-slim
Contributor

b-slim commented Sep 22, 2016

@pjain1 this is in; can you do the follow-up PR for 0.9.2?

drcrallen pushed a commit to metamx/druid that referenced this pull request Sep 27, 2016
* Show candidate hosts for the given query

* Added test cases & minor changes to address comments

* Changed path-param to query-param for intervals/numCandidates
seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this pull request Jan 10, 2020