Skip to content

Latest commit

 

History

History
315 lines (251 loc) · 10.4 KB

query-string-query.asciidoc

File metadata and controls

315 lines (251 loc) · 10.4 KB

Query String Query

A query that uses a query parser in order to parse its content. Here is an example:

GET /_search
{
    "query": {
        "query_string" : {
            "default_field" : "content",
            "query" : "this AND that OR thus"
        }
    }
}

The query_string query parses the input and splits text around operators. Each textual part is analyzed independently of each other. For instance the following query:

GET /_search
{
    "query": {
        "query_string" : {
            "default_field" : "content",
            "query" : "(new york city) OR (big apple)"
        }
    }
}
  1. will be split into new york city and big apple and each part is then analyzed independently by the analyzer configured for the field.

Warning
Whitespaces are not considered operators, this means that new york city will be passed "as is" to the analyzer configured for the field. If the field is a keyword field the analyzer will create a single term new york city and the query builder will use this term in the query. If you want to query each term separately you need to add explicit operators around the terms (e.g. new AND york AND city).

When multiple fields are provided it is also possible to modify how the different field queries are combined inside each textual part using the type parameter. The possible modes are described here and the default is best_fields.

The query_string top level parameters include:

Parameter Description

query

The actual query to be parsed. See [query-string-syntax].

default_field

The default field for query terms if no prefix field is specified. Defaults to the index.query.default_field index settings, which in turn defaults to . extracts all fields in the mapping that are eligible to term queries and filters the metadata fields. All extracted fields are then combined to build a query when no prefix field is provided. There is a limit on the number of fields that can be queried at once which is defined by the indices.query.bool.max_clause_count [search-settings] which defaults to 1024.

default_operator

The default operator used if no explicit operator is specified. For example, with a default operator of OR, the query capital of Hungary is translated to capital OR of OR Hungary, and with default operator of AND, the same query is translated to capital AND of AND Hungary. The default value is OR.

analyzer

The analyzer name used to analyze the query string.

quote_analyzer

The name of the analyzer that is used to analyze quoted phrases in the query string. For those parts, it overrides other analyzers that are set using the analyzer parameter or the search_quote_analyzer setting.

allow_leading_wildcard

When set, * or ? are allowed as the first character. Defaults to true.

enable_position_increments

Set to true to enable position increments in result queries. Defaults to true.

fuzzy_max_expansions

Controls the number of terms fuzzy queries will expand to. Defaults to 50

fuzziness

Set the fuzziness for fuzzy queries. Defaults to AUTO. See [fuzziness] for allowed settings.

fuzzy_prefix_length

Set the prefix length for fuzzy queries. Default is 0.

fuzzy_transpositions

Set to false to disable fuzzy transpositions (abba). Default is true.

phrase_slop

Sets the default slop for phrases. If zero, then exact phrase matches are required. Default value is 0.

boost

Sets the boost value of the query. Defaults to 1.0.

analyze_wildcard

By default, wildcards terms in a query string are not analyzed. By setting this value to true, a best effort will be made to analyze those as well.

max_determinized_states

Limit on how many automaton states regexp queries are allowed to create. This protects against too-difficult (e.g. exponentially hard) regexps. Defaults to 10000.

minimum_should_match

A value controlling how many "should" clauses in the resulting boolean query should match. It can be an absolute value (2), a percentage (30%) or a combination of both.

lenient

If set to true will cause format based failures (like providing text to a numeric field) to be ignored.

time_zone

Time Zone to be applied to any range query related to dates. See also JODA timezone.

quote_field_suffix

A suffix to append to fields for quoted parts of the query string. This allows to use a field that has a different analysis chain for exact matching. Look here for a comprehensive example.

auto_generate_synonyms_phrase_query

Whether phrase queries should be automatically generated for multi terms synonyms. Defaults to true.

When a multi term query is being generated, one can control how it gets rewritten using the rewrite parameter.

Default Field

When not explicitly specifying the field to search on in the query string syntax, the index.query.default_field will be used to derive which field to search on. If the index.query.default_field is not specified, the query_string will automatically attempt to determine the existing fields in the index’s mapping that are queryable, and perform the search on those fields. Note that this will not include nested documents, use a nested query to search those documents.

Multi Field

The query_string query can also run against multiple fields. Fields can be provided via the "fields" parameter (example below).

The idea of running the query_string query against multiple fields is to expand each query term to an OR clause like this:

field1:query_term OR field2:query_term | ...

For example, the following query

GET /_search
{
    "query": {
        "query_string" : {
            "fields" : ["content", "name"],
            "query" : "this AND that"
        }
    }
}

matches the same words as

GET /_search
{
    "query": {
        "query_string": {
            "query": "(content:this OR name:this) AND (content:that OR name:that)"
        }
    }
}

Since several queries are generated from the individual search terms, combining them is automatically done using a dis_max query with a tie_breaker. For example (the name is boosted by 5 using ^5 notation):

GET /_search
{
    "query": {
        "query_string" : {
            "fields" : ["content", "name^5"],
            "query" : "this AND that OR thus",
            "tie_breaker" : 0
        }
    }
}

Simple wildcard can also be used to search "within" specific inner elements of the document. For example, if we have a city object with several fields (or inner object with fields) in it, we can automatically search on all "city" fields:

GET /_search
{
    "query": {
        "query_string" : {
            "fields" : ["city.*"],
            "query" : "this AND that OR thus"
        }
    }
}

Another option is to provide the wildcard fields search in the query string itself (properly escaping the sign), for example: city.\:something:

GET /_search
{
    "query": {
        "query_string" : {
            "query" : "city.\\*:(this AND that OR thus)"
        }
    }
}
Note
Since \ (backslash) is a special character in json strings, it needs to be escaped, hence the two backslashes in the above query_string.

When running the query_string query against multiple fields, the following additional parameters are allowed:

Parameter Description

type

How the fields should be combined to build the text query. See types for a complete example. Defaults to best_fields

tie_breaker

The disjunction max tie breaker for multi fields. Defaults to 0

The fields parameter can also include pattern based field names, allowing to automatically expand to the relevant fields (dynamically introduced fields included). For example:

GET /_search
{
    "query": {
        "query_string" : {
            "fields" : ["content", "name.*^5"],
            "query" : "this AND that OR thus"
        }
    }
}

Synonyms

The query_string query supports multi-terms synonym expansion with the synonym_graph token filter. When this filter is used, the parser creates a phrase query for each multi-terms synonyms. For example, the following synonym: "ny, new york" would produce:

(ny OR ("new york"))

It is also possible to match multi terms synonyms with conjunctions instead:

GET /_search
{
   "query": {
       "query_string" : {
           "default_field": "title",
           "query" : "ny city",
           "auto_generate_synonyms_phrase_query" : false
       }
   }
}

The example above creates a boolean query:

(ny OR (new AND york)) city

that matches documents with the term ny or the conjunction new AND york. By default the parameter auto_generate_synonyms_phrase_query is set to true.