Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parent / child queries force default similarity #4977

Closed
gpstathis opened this issue Jan 31, 2014 · 7 comments
Closed

Parent / child queries force default similarity #4977

gpstathis opened this issue Jan 31, 2014 · 7 comments

Comments

@gpstathis
Copy link
Contributor

@martijnvg it seems that #3822 may have introduced a regression with custom similarities as well as the non-default bm25 and drf similarities. For instance, if I were to use bm25 for a child doc field, the Top Children query will not use it. It reverts back to the default similarity.

Here is a set of curl statements for reproducing:

# delete index
curl -XDELETE 'http://localhost:9200/top_children_similarity_test/?pretty=true'

# create index with proper parent/child mappings
curl -XPUT 'http://localhost:9200/top_children_similarity_test/?pretty=true' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 0
    }
  },
  "mappings": {
    "author": {
      "properties": {
        "name": { "type": "string" }
      }
    },
    "post": {
      "_parent": { "type": "author" },
      "properties": {
        "content": { "type": "string" }
      }
    }
  }
}'

# add data
curl -XPUT 'http://localhost:9200/top_children_similarity_test/author/1?pretty=true' -d '{
   "name": "George P. Stathis"
}'
curl -XPUT 'http://localhost:9200/top_children_similarity_test/post/1?parent=1&pretty=true' -d '{
  "post" : {
    "content": "Lorem ipsum dolor sit amet."
  }
}'
curl -XPUT 'http://localhost:9200/top_children_similarity_test/post/2?parent=1&pretty=true' -d '{
  "post" : {
    "content": "Lorem ipsum dolor sit amet again!"
  }
}'

echo " "
echo "Sleep for two secs to allow for indexing"
sleep 2

# Search posts directly
echo " "
echo "Run query against child docs"
curl 'http://localhost:9200/top_children_similarity_test/post/_search?pretty=1' -d '{
  "query" : {
    "query_string" : {
      "default_field": "content",
      "query" : "Lorem ipsum"
    }
  }
}' | grep '_score'

echo " "
echo "Two docs should have matched with scores 0.61871845 and 0.53033006"

# Search with top_children
echo " "
echo "Run same query as top_children query in 'sum' mode"
curl 'http://localhost:9200/top_children_similarity_test/_search?pretty=1' -d '{
  "query": {
    "top_children": {
      "type": "post",
      "query": {
        "query_string": {
          "query": "Lorem ipsum"
        }
      },
      "score": "sum"
    }
  }
}' | grep '_score'

echo " "
echo "One parent doc should have matched with score 1.1490486 (i.e. 0.61871845 + 0.53033006)"
sleep 5

# delete index
echo " "
echo "Start over and re-index posts using BM25 similarity"
curl -XDELETE 'http://localhost:9200/top_children_similarity_test/?pretty=true'

# create index with proper parent/child mappings
curl -XPUT 'http://localhost:9200/top_children_similarity_test/?pretty=true' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 0
    }
  },
  "mappings": {
    "author": {
      "properties": {
        "name": { "type": "string" }
      }
    },
    "post": {
      "_parent": { "type": "author" },
      "properties": {
        "content": { "type": "string", "similarity" : "BM25" }
      }
    }
  }
}'

# add data
curl -XPUT 'http://localhost:9200/top_children_similarity_test/author/1?pretty=true' -d '{
   "name": "George P. Stathis"
}'
curl -XPUT 'http://localhost:9200/top_children_similarity_test/post/1?parent=1&pretty=true' -d '{
  "post" : {
    "content": "Lorem ipsum dolor sit amet."
  }
}'
curl -XPUT 'http://localhost:9200/top_children_similarity_test/post/2?parent=1&pretty=true' -d '{
  "post" : {
    "content": "Lorem ipsum dolor sit amet again!"
  }
}'

echo " "
echo "Sleep for another two secs to allow for indexing"
sleep 2

# Search posts directly
echo " "
echo "Run query against child docs"
curl 'http://localhost:9200/top_children_similarity_test/post/_search?pretty=1' -d '{
  "query" : {
    "query_string" : {
      "default_field": "content",
      "query" : "Lorem ipsum"
    }
  }
}' | grep '_score'

echo " "
echo "NOTE!!! Two docs should now have matched with scores 0.80081946 and 0.67905 because we are using BM25"

# Search with top_children
echo " "
echo "Run same query as top_children query in 'sum' mode"
curl 'http://localhost:9200/top_children_similarity_test/_search?pretty=1' -d '{
  "query": {
    "top_children": {
      "type": "post",
      "query": {
        "query_string": {
          "query": "Lorem ipsum"
        }
      },
      "score": "sum"
    }
  }
}' | grep '_score'

echo " "
echo "NOTE!!! One parent doc matched but with with score 1.1490486 which is the sum of the child doc scores as computed by the default similarity (i.e. 0.61871845 + 0.53033006) not BM25. With BM25, the expected parent doc score should have been 0.80081946 + 0.67905 = 1.47986946"

This affects every version from 0.90.6 all the way to 0.90.11-SNAPSHOT.

A proposed fix may be to carry over the similarity configured in the IndexSearcher passed to the createWeight() methods. See gpstathis@21d4a76 for an example. All org.elasticsearch.index.search.child package tests pass with these modifications.

@s1monw
Copy link
Contributor

s1monw commented Jan 31, 2014

hey @gpstathis this makes a lot of sense to me though! Would you be able to open a PullRequest for this and singe the CLA so we can pull it in? It would be great to have that in the next release which is coming very soon

@gpstathis
Copy link
Contributor Author

CLA is signed. Pull request #4979 is open.

@gpstathis
Copy link
Contributor Author

Hey @s1monw, @martijnvg, thanks for the lightning fast turnaround! I see most 0.90.11 tickets are closed. The release must be really close? ;-)

@kimchy
Copy link
Member

kimchy commented Feb 3, 2014

@gpstathis really close ;)

@gpstathis
Copy link
Contributor Author

👍

@s1monw
Copy link
Contributor

s1monw commented Feb 3, 2014

yeah I guess you had a good timing @gpstathis ;)

@gpstathis
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants