Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Has child query forces default similarity #16550

Closed
gpstathis opened this issue Feb 9, 2016 · 4 comments
Closed

Has child query forces default similarity #16550

gpstathis opened this issue Feb 9, 2016 · 4 comments
Labels
>bug :Search/Search Search-related issues that do not fall into other categories

Comments

@gpstathis
Copy link
Contributor

Déjà vu of #4977 on 2.2.0. The has_child query seems to only use the default similarity when scoring child documents. Even the built-in non-default similarities don't seem to be picked up. See the curl statements below for reproducing. I reached out to @imotov who confirmed this issue on master.

No promises on a PR this time but we'll start looking into it and see how far we can get.

curl statements for reproducing:

# delete index
curl -XDELETE 'http://localhost:9200/has_child_similarity_test/?pretty=true'

# create index with proper parent/child mappings
curl -XPUT 'http://localhost:9200/has_child_similarity_test/?pretty=true' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 0
    }
  },
  "mappings": {
    "author": {
      "properties": {
        "name": { "type": "string" }
      }
    },
    "post": {
      "_parent": { "type": "author" },
      "properties": {
        "content": { "type": "string" }
      }
    }
  }
}'

# add data
curl -XPUT 'http://localhost:9200/has_child_similarity_test/author/1?pretty=true' -d '{
   "name": "George P. Stathis"
}'
curl -XPUT 'http://localhost:9200/has_child_similarity_test/post/1?parent=1&pretty=true' -d '{
  "content": "Lorem ipsum dolor sit amet."
}'
curl -XPUT 'http://localhost:9200/has_child_similarity_test/post/2?parent=1&pretty=true' -d '{
  "content": "Lorem ipsum dolor sit amet again!"
}'

echo " "
echo "Sleep for two secs to allow for indexing"
sleep 2

# Search posts directly
echo " "
echo "Run query against child docs"
curl 'http://localhost:9200/has_child_similarity_test/post/_search?pretty=1' -d '{
  "query" : {
    "query_string" : {
      "default_field": "content",
      "query" : "Lorem ipsum"
    }
  }
}' | grep '_score'

echo " "
echo "Two docs should have matched with scores 0.61871845 and 0.53033006"

# Search with has_child
echo " "
echo "Run same query as has_child query in 'sum' mode"
curl 'http://localhost:9200/has_child_similarity_test/_search?pretty=1' -d '{
  "query": {
    "has_child": {
      "type": "post",
      "query": {
        "query_string": {
          "query": "Lorem ipsum"
        }
      },
      "score_mode": "total"
    }
  }
}' | grep '_score'

echo " "
echo "One parent doc should have matched with score 1.1490486 (i.e. 0.61871845 + 0.53033006)"
sleep 5

# delete index
echo " "
echo "Start over and re-index posts using BM25 similarity"
curl -XDELETE 'http://localhost:9200/has_child_similarity_test/?pretty=true'

# create index with proper parent/child mappings
curl -XPUT 'http://localhost:9200/has_child_similarity_test/?pretty=true' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 0
    }
  },
  "mappings": {
    "author": {
      "properties": {
        "name": { "type": "string" }
      }
    },
    "post": {
      "_parent": { "type": "author" },
      "properties": {
        "content": { "type": "string", "similarity" : "BM25" }
      }
    }
  }
}'

# add data
curl -XPUT 'http://localhost:9200/has_child_similarity_test/author/1?pretty=true' -d '{
   "name": "George P. Stathis"
}'
curl -XPUT 'http://localhost:9200/has_child_similarity_test/post/1?parent=1&pretty=true' -d '{
  "content": "Lorem ipsum dolor sit amet."
}'
curl -XPUT 'http://localhost:9200/has_child_similarity_test/post/2?parent=1&pretty=true' -d '{
  "content": "Lorem ipsum dolor sit amet again!"
}'

echo " "
echo "Sleep for another two secs to allow for indexing"
sleep 2

# Search posts directly
echo " "
echo "Run query against child docs"
curl 'http://localhost:9200/has_child_similarity_test/post/_search?pretty=1' -d '{
  "query" : {
    "query_string" : {
      "default_field": "content",
      "query" : "Lorem ipsum"
    }
  }
}' | grep '_score'

echo " "
echo "NOTE!!! Two docs should now have matched with scores 0.80081946 and 0.67905 because we are using BM25"

# Search with has_child
echo " "
echo "Run same query as has_child query in 'sum' mode"
curl 'http://localhost:9200/has_child_similarity_test/_search?pretty=1' -d '{
  "query": {
    "has_child": {
      "type": "post",
      "query": {
        "query_string": {
          "query": "Lorem ipsum"
        }
      },
      "score_mode": "total"
    }
  }
}' | grep '_score'

echo " "
echo "NOTE!!! One parent doc matched but with with score 1.1490486 which is the sum of the child doc scores as computed by the default similarity (i.e. 0.61871845 + 0.53033006) not BM25. With BM25, the expected parent doc score should have been 0.80081946 + 0.67905 = 1.47986946"

@martijnvg
Copy link
Member

This regression must have sneaked in during the parent/child refactoring that was part of ES 2.0...

@martijnvg martijnvg added the >bug label Feb 11, 2016
@martijnvg
Copy link
Member

@gpstathis I see you already worked on tests, but I can't a pr or the actual fix. The actual fix should be easy, from the HasChildQueryBuilder#doToQuery(...) and HasParentQueryBuilder#doToQuery(...) we should pass the Similarity from context#getSearchSimilarity() to LateParsingQuery as an constructor argument, in LateParsingQuery the similarity should become a field too. Then in LateParsingQuery#rewrite() just pass this similarity to the IndexSearcher being created there.

@gpstathis
Copy link
Contributor Author

@martijnvg got caught up with #16594 which I stumbled across as I was creating the tests for this one. Let me take a look and see how far I can get today.

@gpstathis
Copy link
Contributor Author

@martijnvg that was easy indeed. SimilarityIT.testHasChildWithNonDefaultFieldSimilarity() is passing now. SimilarityIT.testHasChildWithNonDefaultGlobalSimilarity() is passing but wrongly so because of #16594. Once that is fixed that unit test should still pass. Will open a PR.

gpstathis added a commit to gpstathis/elasticsearch that referenced this issue Feb 12, 2016
gpstathis added a commit to gpstathis/elasticsearch that referenced this issue Feb 12, 2016
martijnvg pushed a commit to martijnvg/elasticsearch that referenced this issue Feb 15, 2016
gpstathis added a commit to Traackr/elasticsearch that referenced this issue Feb 29, 2016
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Parent/Child labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories
Projects
None yet
Development

No branches or pull requests

3 participants