Added the inner_hits feature, which allows nested hits to be included.
Inner hits allow embedding the nested inner objects, child documents, or parent document that contributed to the matching of the returned search hit as inner hits; these would otherwise be hidden.

Closes #8153
Closes #3022
Closes #3152
martijnvg committed Dec 2, 2014
1 parent 1ef8f3e commit 025c82c
Showing 25 changed files with 2,646 additions and 206 deletions.
2 changes: 2 additions & 0 deletions docs/reference/search/request-body.asciidoc
@@ -127,3 +127,5 @@ include::request/index-boost.asciidoc[]
include::request/min-score.asciidoc[]

include::request/named-queries-and-filters.asciidoc[]

include::request/inner-hits.asciidoc[]
248 changes: 248 additions & 0 deletions docs/reference/search/request/inner-hits.asciidoc
@@ -0,0 +1,248 @@
[[search-request-inner-hits]]
=== Inner hits

coming[1.5.0]

The <<mapping-parent-field, parent/child>> and <<mapping-nested-type, nested>> features allow returning documents that
have matches in a different scope. In the parent/child case, parent documents are returned based on matches in child
documents, or child documents are returned based on matches in parent documents. In the nested case, documents are returned
based on matches in nested inner objects.

In both cases, the actual matches in the different scopes that caused a document to be returned are hidden. In many cases
it is very useful to know which nested inner objects (in the nested case) or which child or parent documents (in the
parent/child case) caused a document to be returned. The inner hits feature can be used for this. For each search hit
in the search response, it returns the additional nested hits that caused that search hit to match in a different scope.

The following snippet explains the basic structure of inner hits:

[source,js]
--------------------------------------------------
"inner_hits" : {
    "<inner_hits_name>" : {
        "<path|type>" : {
            "<path-to-nested-object-field|child-or-parent-type>" : {
                <inner_hits_body>
                [,"inner_hits" : { [<sub_inner_hits>]+ } ]?
            }
        }
    }
    [,"<inner_hits_name_2>" : { ... } ]*
}
--------------------------------------------------

Inside the `inner_hits` definition, the name of the inner hit is defined first, followed by whether the inner hit
is nested (by defining `path`) or parent/child based (by defining `type`). The next object layer contains
the name of the nested object field if the inner hit is nested, or the parent or child type if the inner hit definition
is parent/child based.

Multiple inner hit definitions can be defined in a single request. In the `<inner_hits_body>`, any option for features
that `inner_hits` supports can be defined. Optionally, another `inner_hits` definition can be defined in the `<inner_hits_body>`.
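
The layering described above can be sketched with plain Java maps. This is only an illustration of the request shape, not the Elasticsearch Java API: the class `InnerHitsDefinitionSketch`, the helper `definition`, and the inner hit names `comment` and `answer` are hypothetical, and the inner hits bodies are left empty.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class InnerHitsDefinitionSketch {

    // Wraps an inner hits body in the two layers described above:
    // "<path|type>" -> "<nested field or parent/child type>" -> body.
    static Map<String, Object> definition(String scopeKind, String scopeName, Map<String, Object> body) {
        if (!scopeKind.equals("path") && !scopeKind.equals("type")) {
            throw new IllegalArgumentException("scope kind must be 'path' or 'type'");
        }
        Map<String, Object> scope = new LinkedHashMap<>();
        scope.put(scopeName, body);
        Map<String, Object> kind = new LinkedHashMap<>();
        kind.put(scopeKind, scope);
        return kind;
    }

    public static void main(String[] args) {
        // Two named definitions in one request: one nested, one parent/child.
        Map<String, Object> innerHits = new LinkedHashMap<>();
        innerHits.put("comment", definition("path", "comments", new LinkedHashMap<>()));
        innerHits.put("answer", definition("type", "answer", new LinkedHashMap<>()));
        System.out.println(innerHits);
        // prints {comment={path={comments={}}}, answer={type={answer={}}}}
    }
}
```

The printed form is Java's map notation rather than JSON; it is used here only to make the nesting order visible.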

If `inner_hits` is defined, each search hit will contain an `inner_hits` JSON object with the following structure:

[source,js]
--------------------------------------------------
"hits": [
    {
        "_index": ...,
        "_type": ...,
        "_id": ...,
        "inner_hits": {
            "<inner_hits_name>": {
                "hits": {
                    "total": ...,
                    "hits": [
                        {
                            "_type": ...,
                            "_id": ...,
                            ...
                        },
                        ...
                    ]
                }
            }
        },
        ...
    },
    ...
]
--------------------------------------------------

==== Options

Inner hits support the following options:

[horizontal]
`path`:: Defines the nested scope where hits will be collected from.
`type`:: Defines the parent or child type scope where hits will be collected from.
`query`:: Defines the query that will run in the defined nested, parent or child scope to collect and score hits. By default all documents in the scope will be matched.
`from`:: The offset of the first hit to fetch for each `inner_hits` in the returned regular search hits.
`size`:: The maximum number of hits to return per `inner_hits`. By default the top three matching hits are returned.
`sort`:: How the inner hits should be sorted per `inner_hits`. By default the hits are sorted by the score.

Either `path` or `type` must be defined. The `path` or `type` defines the scope from where hits are fetched and
used as inner hits.
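
How `from`, `size` and the default score sort interact for a single `inner_hits` can be sketched as follows. This is a simplified model and not how the feature is implemented: the class and method names are hypothetical, the scores are made up, and descending score order is an assumption, since the text above only states that hits are sorted by score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class InnerHitsWindowSketch {

    static final int DEFAULT_SIZE = 3; // documented default: top three matching hits

    // Returns the scores that remain after sorting by score (assumed
    // descending), skipping `from` hits and keeping at most `size` hits.
    static List<Double> window(List<Double> scores, int from, int size) {
        List<Double> sorted = new ArrayList<>(scores);
        sorted.sort(Comparator.reverseOrder());
        int start = Math.min(from, sorted.size());
        int end = Math.min(start + size, sorted.size());
        return new ArrayList<>(sorted.subList(start, end));
    }

    public static void main(String[] args) {
        // Made-up scores of five inner hits belonging to one search hit.
        List<Double> scores = Arrays.asList(0.2, 1.5, 0.9, 0.4, 1.1);
        System.out.println(window(scores, 0, DEFAULT_SIZE)); // [1.5, 1.1, 0.9]
        System.out.println(window(scores, 3, DEFAULT_SIZE)); // [0.4, 0.2]
    }
}
```

The window is applied per `inner_hits` of each regular search hit, so every returned hit gets its own top hits.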

Inner hits also support the following per-document features:

* <<search-request-highlighting,Highlighting>>
* <<search-request-explain,Explain>>
* <<search-request-source-filtering,Source filtering>>
* <<search-request-script-fields,Script fields>>
* <<search-request-fielddata-fields,Fielddata fields>>
* <<search-request-version,Include versions>>

[[nested-inner-hits]]
==== Nested inner hits

The nested `inner_hits` can be used to include nested inner objects as inner hits to a search hit.

The example below assumes that there is a nested object field defined with the name `comments`:

[source,js]
--------------------------------------------------
{
    "query" : {
        "nested" : {
            "path" : "comments",
            "query" : {
                "match" : {"comments.message" : "[actual query]"}
            }
        }
    },
    "inner_hits" : {
        "comment" : {
            "path" : { <1>
                "comments" : { <2>
                    "query" : {
                        "match" : {"comments.message" : "[actual query]"}
                    }
                }
            }
        }
    }
}
--------------------------------------------------

<1> The inner hit definition is nested and requires the `path` option.
<2> The `path` option refers to the nested object field `comments`.

In the example above, the query is repeated in both the `query` and the `comment` inner hit definition. At the moment there is
no query referencing support, so to make sure that only the nested inner objects that contributed to the matching of
the regular hits are returned, the inner query of the `nested` query must also be defined in the inner hits definition.
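
Since the query has to appear twice, a client can define it once and place the same object in both positions when assembling the request body. The sketch below does this with plain Java maps; it is not the Elasticsearch Java API, and the class and helper names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NestedInnerHitsRequestSketch {

    // Small helper for a single-entry, insertion-ordered map.
    static Map<String, Object> map(String key, Object value) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put(key, value);
        return m;
    }

    // Builds the request body from the example above, defining the inner
    // query once and placing it in both the nested query and the inner hit.
    static Map<String, Object> request(String path, String field, String text) {
        Map<String, Object> innerQuery = map("match", map(field, text));

        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("path", path);
        nested.put("query", innerQuery);

        Map<String, Object> body = new LinkedHashMap<>();
        body.put("query", map("nested", nested));
        body.put("inner_hits", map("comment", map("path", map(path, map("query", innerQuery)))));
        return body;
    }

    public static void main(String[] args) {
        System.out.println(request("comments", "comments.message", "[actual query]"));
    }
}
```

Keeping a single `innerQuery` object avoids the two copies drifting apart, which would make the inner hits differ from the objects that actually caused the match.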

An example of a response snippet that could be generated from the above search request:

[source,js]
--------------------------------------------------
...
"hits": {
    ...
    "hits": [
        {
            "_index": "my-index",
            "_type": "question",
            "_id": "1",
            "_source": ...,
            "inner_hits": {
                "comment": {
                    "hits": {
                        "total": ...,
                        "hits": [
                            {
                                "_type": "question",
                                "_id": "1",
                                "_nested": {
                                    "field": "comments",
                                    "offset": 2
                                },
                                "_source": ...
                            },
                            ...
                        ]
                    }
                }
            }
        },
        ...
--------------------------------------------------

The `_nested` metadata is crucial in the above example, because it defines which nested inner object this inner hit
came from. The `field` defines the object array field the nested hit is from, and the `offset` defines its location
within that array in the `_source`. Due to sorting and scoring, the position of the hit objects in `inner_hits` usually
differs from the position at which the nested inner object was defined.
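
A client that holds the `_source` of the top-level document can use `field` and `offset` to find the original nested object. The following is a minimal sketch under two assumptions that the text above does not state: the offset is zero-based, and `field` is a single top-level key rather than a dotted path. The class name, the `title` field and the sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NestedOffsetSketch {

    // Looks up the nested object an inner hit came from, given the top-level
    // document's _source and the inner hit's _nested "field" and "offset".
    // Assumes a zero-based offset and a top-level field name.
    @SuppressWarnings("unchecked")
    static Object resolve(Map<String, Object> source, String field, int offset) {
        List<Object> nestedObjects = (List<Object>) source.get(field);
        return nestedObjects.get(offset);
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("title", "hypothetical question");
        // Plain strings stand in for the nested comment objects.
        source.put("comments", Arrays.asList("first", "second", "third"));
        System.out.println(resolve(source, "comments", 2)); // third
    }
}
```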

By default the `_source` is also returned for the hit objects in `inner_hits`, but this can be changed. Via the
`_source` filtering feature, either part of the source can be returned or the source can be disabled entirely. If stored
fields are defined on the nested level, these can also be returned via the `fields` feature.

An important default is that the `_source` returned in hits inside `inner_hits` is relative to the `_nested` metadata.
So in the above example, only the comment part is returned per nested hit and not the entire source of the top-level
document that contained the comment.

[[parent-child-inner-hits]]
==== Parent/child inner hits

The parent/child `inner_hits` can be used to include parent or child documents as inner hits to a search hit.

The example below assumes that there is a `_parent` field mapping in the `comment` type:

[source,js]
--------------------------------------------------
{
    "query" : {
        "has_child" : {
            "type" : "comment",
            "query" : {
                "match" : {"message" : "[actual query]"}
            }
        }
    },
    "inner_hits" : {
        "comment" : {
            "type" : { <1>
                "comment" : { <2>
                    "query" : {
                        "match" : {"message" : "[actual query]"}
                    }
                }
            }
        }
    }
}
--------------------------------------------------

<1> This is a parent/child inner hit definition and requires the `type` option.
<2> The `type` option refers to the document type `comment`.

An example of a response snippet that could be generated from the above search request:

[source,js]
--------------------------------------------------
...
"hits": {
    ...
    "hits": [
        {
            "_index": "my-index",
            "_type": "question",
            "_id": "1",
            "_source": ...,
            "inner_hits": {
                "comment": {
                    "hits": {
                        "total": ...,
                        "hits": [
                            {
                                "_type": "comment",
                                "_id": "5",
                                "_source": ...
                            },
                            ...
                        ]
                    }
                }
            }
        },
        ...
--------------------------------------------------
@@ -34,6 +34,7 @@
import org.elasticsearch.search.Scroll;
import org.elasticsearch.search.aggregations.AbstractAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.innerhits.InnerHitsBuilder;
import org.elasticsearch.search.facet.FacetBuilder;
import org.elasticsearch.search.highlight.HighlightBuilder;
import org.elasticsearch.search.rescore.RescoreBuilder;
@@ -836,6 +837,11 @@ public SearchRequestBuilder setHighlighterExplicitFieldOrder(boolean explicitFie
return this;
}

public SearchRequestBuilder addInnerHit(String name, InnerHitsBuilder.InnerHit innerHit) {
innerHitsBuilder().addInnerHit(name, innerHit);
return this;
}

/**
* Delegates to {@link org.elasticsearch.search.suggest.SuggestBuilder#setText(String)}.
*/
@@ -1127,6 +1133,10 @@ private HighlightBuilder highlightBuilder() {
return sourceBuilder().highlighter();
}

private InnerHitsBuilder innerHitsBuilder() {
return sourceBuilder().innerHitsBuilder();
}

private SuggestBuilder suggestBuilder() {
return sourceBuilder().suggest();
}
@@ -20,12 +20,12 @@
package org.elasticsearch.index.query;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.join.ScoreMode;
import org.apache.lucene.search.join.ToParentBlockJoinQuery;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.FixedBitSet;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.lucene.search.XConstantScoreQuery;
@@ -167,11 +167,13 @@ public Query parse(QueryParseContext parseContext) throws IOException, QueryPars
}
}

- static ThreadLocal<LateBindingParentFilter> parentFilterContext = new ThreadLocal<>();
+ // TODO: Change this mechanism in favour of how parent nested object type is resolved in nested and reverse_nested agg
+ // with this also proper validation can be performed on what is a valid nested child nested object type to be used
+ public static ThreadLocal<LateBindingParentFilter> parentFilterContext = new ThreadLocal<>();

- static class LateBindingParentFilter extends Filter {
+ public static class LateBindingParentFilter extends FixedBitSetFilter {

- Filter filter;
+ public FixedBitSetFilter filter;

@Override
public int hashCode() {
@@ -180,7 +182,8 @@ public int hashCode() {

@Override
public boolean equals(Object obj) {
- return filter.equals(obj);
+ if (!(obj instanceof LateBindingParentFilter)) return false;
+ return filter.equals(((LateBindingParentFilter) obj).filter);
}

@Override
@@ -189,7 +192,7 @@ public String toString() {
}

@Override
- public DocIdSet getDocIdSet(AtomicReaderContext ctx, Bits liveDocs) throws IOException {
+ public FixedBitSet getDocIdSet(AtomicReaderContext ctx, Bits liveDocs) throws IOException {
//LUCENE 4 UPGRADE just passing on ctx and live docs here
return filter.getDocIdSet(ctx, liveDocs);
}
11 changes: 11 additions & 0 deletions src/main/java/org/elasticsearch/percolator/PercolateContext.java
@@ -58,6 +58,7 @@
import org.elasticsearch.search.fetch.FetchSearchResult;
import org.elasticsearch.search.fetch.FetchSubPhase;
import org.elasticsearch.search.fetch.fielddata.FieldDataFieldsContext;
import org.elasticsearch.search.fetch.innerhits.InnerHitsContext;
import org.elasticsearch.search.fetch.partial.PartialFieldsContext;
import org.elasticsearch.search.fetch.script.ScriptFieldsContext;
import org.elasticsearch.search.fetch.source.FetchSourceContext;
@@ -719,4 +720,14 @@ public SearchContext useSlowScroll(boolean useSlowScroll) {
public Counter timeEstimateCounter() {
throw new UnsupportedOperationException();
}

@Override
public void innerHits(InnerHitsContext innerHitsContext) {
throw new UnsupportedOperationException();
}

@Override
public InnerHitsContext innerHits() {
throw new UnsupportedOperationException();
}
}
5 changes: 5 additions & 0 deletions src/main/java/org/elasticsearch/search/SearchHit.java
@@ -199,6 +199,11 @@ public interface SearchHit extends Streamable, ToXContent, Iterable<SearchHitFie
*/
SearchShardTarget getShard();

/**
* @return Inner hits or <code>null</code> if there are none
*/
Map<String, SearchHits> getInnerHits();

/**
* Encapsulates the nested identity of a hit.
*/
2 changes: 2 additions & 0 deletions src/main/java/org/elasticsearch/search/SearchModule.java
@@ -33,6 +33,7 @@
import org.elasticsearch.search.fetch.FetchPhase;
import org.elasticsearch.search.fetch.explain.ExplainFetchSubPhase;
import org.elasticsearch.search.fetch.fielddata.FieldDataFieldsFetchSubPhase;
import org.elasticsearch.search.fetch.innerhits.InnerHitsFetchSubPhase;
import org.elasticsearch.search.fetch.matchedqueries.MatchedQueriesFetchSubPhase;
import org.elasticsearch.search.fetch.partial.PartialFieldsFetchSubPhase;
import org.elasticsearch.search.fetch.script.ScriptFieldsFetchSubPhase;
@@ -69,6 +70,7 @@ protected void configure() {
bind(VersionFetchSubPhase.class).asEagerSingleton();
bind(MatchedQueriesFetchSubPhase.class).asEagerSingleton();
bind(HighlightPhase.class).asEagerSingleton();
bind(InnerHitsFetchSubPhase.class).asEagerSingleton();

bind(SearchServiceTransportAction.class).asEagerSingleton();
bind(MoreLikeThisFetchService.class).asEagerSingleton();
