Inconsistent behaviour for matched_queries field #101480

Closed
bogdantkachenko opened this issue Oct 27, 2023 · 4 comments
Assignees
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@bogdantkachenko

Elasticsearch Version

7.17.5

Installed Plugins

No response

Java Version

bundled

OS Version

Linux 69c04e0415e4 5.10.104-linuxkit #1 SMP PREEMPT Thu Mar 17 17:05:54 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

Problem Description

The matched_queries field is not populated for the match query when a should clause includes a match query and a match_bool_prefix query with the same _name.

Steps to Reproduce

  1. create index with mapping

      {
        "dynamic": false,
        "properties": {
          "id": { "type": "keyword" },
          "names": { "type": "text" }
        }
      }

  2. add two documents

      { "id": "1", "names": ["BALTAS"] }
      { "id": "2", "names": [" alias", " Test alias"] }

  3. run search query

      {
        "query": {
          "bool": {
            "should": [
              {
                "match": {
                  "names": {
                    "query": "BALTAS",
                    "operator": "AND",
                    "fuzziness": "AUTO",
                    "prefix_length": 0,
                    "max_expansions": 50,
                    "fuzzy_transpositions": true,
                    "lenient": false,
                    "zero_terms_query": "NONE",
                    "auto_generate_synonyms_phrase_query": true,
                    "_name": "names"
                  }
                }
              },
              {
                "match_bool_prefix": {
                  "names": {
                    "query": "BALTAS",
                    "operator": "AND",
                    "prefix_length": 0,
                    "max_expansions": 50,
                    "fuzzy_transpositions": true,
                    "_name": "names"
                  }
                }
              }
            ]
          }
        }
      }

  4. there is no matched_queries for the document with "id": "2"; result:

      {
        "took": 11,
        "timed_out": false,
        "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
        "hits": {
          "total": { "value": 2, "relation": "eq" },
          "max_score": 1.8713851,
          "hits": [
            {
              "_index": "named_queries",
              "_type": "_doc",
              "_id": "U6lqcosBGtR7NdaKQuTo",
              "_score": 1.8713851,
              "_source": { "id": "1", "names": ["BALTAS"] },
              "matched_queries": ["names"]
            },
            {
              "_index": "named_queries",
              "_type": "_doc",
              "_id": "VKlqcosBGtR7NdaKh-Qq",
              "_score": 0.5013448,
              "_source": { "id": "2", "names": [" alias", " Test alias"] }
            }
          ]
        }
      }

  5. if we change the name of the match query to names1, matched_queries works as expected
  6. if we run just the single match query, matched_queries works as expected
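
Step 5 suggests a workaround: give each clause its own _name. A minimal Python sketch of building such a request body follows. The helper name named_should_query and the default names "names_match" and "names_prefix" are hypothetical, chosen only for illustration; the sketch builds the JSON body and does not talk to a cluster.

```python
import json


def named_should_query(field, text, names=("names_match", "names_prefix")):
    """Build a bool/should body whose two clauses carry distinct _name values.

    The helper and the default names are illustrative assumptions,
    not part of any Elasticsearch client API.
    """
    match_name, prefix_name = names
    if match_name == prefix_name:
        # Duplicate names are what triggered the behaviour reported above.
        raise ValueError("each clause needs its own _name")
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "match": {
                            field: {
                                "query": text,
                                "operator": "AND",
                                "fuzziness": "AUTO",
                                "_name": match_name,
                            }
                        }
                    },
                    {
                        "match_bool_prefix": {
                            field: {
                                "query": text,
                                "operator": "AND",
                                "_name": prefix_name,
                            }
                        }
                    },
                ]
            }
        }
    }


body = named_should_query("names", "BALTAS")
print(json.dumps(body, indent=2))
```

With distinct names, matched_queries can also tell which of the two clauses matched a given hit.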

Logs (if relevant)

No response

@bogdantkachenko bogdantkachenko added >bug needs:triage Requires assignment of a team area label labels Oct 27, 2023
@gbanasiak
Contributor

8.10.4 shows the same behaviour.

Documentation doesn't clarify whether names should be unique or not: https://www.elastic.co/guide/en/elasticsearch/reference/8.10/query-dsl-bool-query.html#named-queries

@gbanasiak gbanasiak added the :Search/Search Search-related issues that do not fall into other categories label Oct 30, 2023
@elasticsearchmachine elasticsearchmachine added Team:Search Meta label for search team and removed needs:triage Requires assignment of a team area label labels Oct 30, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@benwtrent benwtrent self-assigned this Oct 30, 2023
@benwtrent
Member

I verified via a test:

---
"Queries with same name":
    - do:
          bulk:
              refresh: true
              body:
                  - '{ "index" : { "_index" : "test_1", "_id" : "1" } }'
                  - '{"names": [" alias", " Test alias"] }'
                  - '{ "index" : { "_index" : "test_1", "_id" : "2" } }'
                  - '{ "names": ["BALTAS"] }'

    - do:
          search:
              index: test_1
              body:
                >
                {
                  "query": {
                    "bool": {
                      "should": [
                        { "match": { "names": { "query": "BALTAS", "_name": "names" } } },
                        { "match_bool_prefix": { "names": { "query": "BALTAS", "_name": "names" } } }
                      ]
                    }
                  }
                }

    - match:  {hits.total.value: 2}
    - length: {hits.hits.0.matched_queries: 1}
    - match:  {hits.hits.0.matched_queries: ["names"]}
    - length: {hits.hits.1.matched_queries: 1}
    - match:  {hits.hits.1.matched_queries: ["names"]}

Digging into the code, we assume query names are unique. Duplicate names result in undefined behavior: depending on the order in which the queries are parsed and rewritten, the _name of one query can replace a previously used one with the same _name.
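
To make the effect concrete, here is a toy Python model. It is not Elasticsearch code: it simply assumes that named queries are collected into a map keyed by _name, so a later clause with the same name replaces an earlier one. The predicates fuzzy_match and prefix_match are hypothetical stand-ins for the two clauses in the report.

```python
# Toy model only, NOT the Elasticsearch implementation. Assumption: named
# queries are kept in a map keyed by _name, so duplicates overwrite each other.


def collect_named(clauses):
    """Collect (name, predicate) pairs into a dict keyed by name."""
    named = {}
    for name, predicate in clauses:
        named[name] = predicate  # a duplicate name silently replaces the earlier entry
    return named


def matched_queries(doc_terms, named):
    """Return the names whose predicate matches the document's terms."""
    return [name for name, predicate in named.items() if predicate(doc_terms)]


def fuzzy_match(terms):
    # Stand-in for the fuzzy match clause: pretend "alias" is within
    # the allowed edit distance of "baltas".
    return any(t in ("baltas", "alias") for t in terms)


def prefix_match(terms):
    # Stand-in for match_bool_prefix: only terms starting with "baltas" match.
    return any(t.startswith("baltas") for t in terms)


doc1 = ["baltas"]
doc2 = ["alias", "test", "alias"]

same = collect_named([("names", fuzzy_match), ("names", prefix_match)])
distinct = collect_named([("names1", fuzzy_match), ("names", prefix_match)])

print(matched_queries(doc1, same))      # ['names']
print(matched_queries(doc2, same))      # []
print(matched_queries(doc1, distinct))  # ['names1', 'names']
print(matched_queries(doc2, distinct))  # ['names1']
```

In this model the second document is still a hit (the fuzzy clause matched it), but the only entry left under "names" is the prefix clause, which does not match, so no name is reported. That mirrors the output in the report without claiming to reproduce the real code path.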

It has been this way for as far back as I can see in the source. I will add something to the docs explaining this.

Makes me wonder if we should throw an exception if duplicate query names are detected.
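
Until something like that exists server-side, a client can run a similar check before sending a request. The sketch below is a hypothetical client-side helper in Python, not Elasticsearch behaviour: it walks a query body and reports any _name value used more than once. It assumes that every "_name" key with a string value is a query name, which would misfire on an index that has a field literally called _name.

```python
from collections import Counter


def iter_query_names(node):
    """Yield every string value stored under a "_name" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "_name" and isinstance(value, str):
                yield value
            else:
                yield from iter_query_names(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_query_names(item)


def duplicate_query_names(query):
    """Return a sorted list of _name values that occur more than once."""
    counts = Counter(iter_query_names(query))
    return sorted(name for name, n in counts.items() if n > 1)


query = {
    "query": {
        "bool": {
            "should": [
                {"match": {"names": {"query": "BALTAS", "_name": "names"}}},
                {"match_bool_prefix": {"names": {"query": "BALTAS", "_name": "names"}}},
            ]
        }
    }
}

dupes = duplicate_query_names(query)
if dupes:
    print("duplicate query names:", dupes)
```

A caller could raise an error or just log a warning when the returned list is not empty.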

benwtrent added a commit that referenced this issue Oct 30, 2023
benwtrent added a commit to benwtrent/elasticsearch that referenced this issue Oct 30, 2023
mark-vieira pushed a commit to mark-vieira/elasticsearch that referenced this issue Nov 2, 2023
@benwtrent
Member

Disallowing duplicate _name values would prove tricky, especially for stored alias filters or when combining alias filters via a single query.

Closing this issue, as we have documented that sending duplicate _name values results in undefined behavior.
