Non-deterministic inclusion of empty string by exists filter #8198

Closed
takism1 opened this issue Oct 22, 2014 · 1 comment

takism1 commented Oct 22, 2014

This happens with Elasticsearch 1.3.4. When I repeat the steps below enough times, the filter sometimes returns both documents and sometimes only the one with the non-empty string.

Delete the index

$ curl -XDELETE localhost:9200/test

Insert 2 documents using bulk API (one has empty string and one does not)

$ cat requests
    { "index" : { "_index" : "test", "_type" : "test", "_id" : "1" } }
    { "title": "" }
    { "index" : { "_index" : "test", "_type" : "test", "_id" : "2" } }
    { "title": "Test document" }
$ curl -XPOST localhost:9200/_bulk --data-binary @requests

This filter sometimes returns only document 2 and sometimes both

$ curl -XPOST 'localhost:9200/test/_search?pretty' -d '
{
    "filter": {
        "exists": {
            "field": "title"
        }
    }
}'

I suspect it has to do with the order in which the bulk API indexes the documents. However, http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_batch_processing.html states that "The bulk API executes all the actions sequentially and in order", so I am wondering whether this is a bug.

@clintongormley clintongormley self-assigned this Oct 22, 2014
Contributor

jpountz commented Oct 22, 2014

I think this is because Elasticsearch does not generate dynamic mappings for the empty string. Additionally, your documents likely go to different shards, so they are processed in parallel and the order in which they are indexed is not defined.

The mapping for title is created by the document that has a non-empty title. Sometimes this happens before the document with an empty title is indexed (in which case you get 2 matches), and sometimes after (in which case you get 1 match).

I believe this would be fixed by creating dynamic mappings for empty strings, but I'm unsure about the side effects this could have.
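
To make the ordering argument above concrete, here is a minimal Python sketch of the behaviour as described in this comment. It is a toy model, not Elasticsearch code: the names `ToyIndex`, `map_empty_strings` and `matches_per_order` are hypothetical, and the rule that "a field with no mapping and an empty value is ignored for that document" is an assumption taken from the explanation above rather than from the Elasticsearch source.

```python
from itertools import permutations

# The two documents from the reproduction steps.
DOCS = {"1": {"title": ""}, "2": {"title": "Test document"}}


class ToyIndex:
    """Toy model of dynamic mapping plus an exists filter (hypothetical)."""

    def __init__(self, map_empty_strings=False):
        self.mapped_fields = set()  # fields that currently have a mapping
        self.docs = {}              # doc id -> field names recorded for that doc
        self.map_empty_strings = map_empty_strings

    def index(self, doc_id, source):
        recorded = set()
        for field, value in source.items():
            if field not in self.mapped_fields:
                # Assumption from the thread: an empty string does not
                # create a dynamic mapping, so the field is skipped.
                if value == "" and not self.map_empty_strings:
                    continue
                self.mapped_fields.add(field)
            recorded.add(field)
        self.docs[doc_id] = recorded

    def exists(self, field):
        """Return the sorted ids of documents for which the field was recorded."""
        return sorted(d for d, fields in self.docs.items() if field in fields)


def matches_per_order(map_empty_strings):
    """Index DOCS in every possible order and collect the exists-filter hits."""
    results = {}
    for order in permutations(DOCS):
        idx = ToyIndex(map_empty_strings)
        for doc_id in order:
            idx.index(doc_id, DOCS[doc_id])
        results[order] = idx.exists("title")
    return results


if __name__ == "__main__":
    for flag in (False, True):
        print("map_empty_strings =", flag, "->", matches_per_order(flag))
```

With `map_empty_strings=False` the result depends on which document is indexed first, which mirrors the 1-match and 2-match outcomes in the report. With `map_empty_strings=True`, which stands in for the proposed change, both orders give the same result in this model.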

jpountz added a commit to jpountz/elasticsearch that referenced this issue Nov 3, 2014
This will help the exists/missing filters behave as expected in presence of
empty strings, as well as when using a default analyzer that would generate
tokens for an empty string (uncommon).

Close elastic#8198
jpountz added a commit that referenced this issue Nov 4, 2014
This will help the exists/missing filters behave as expected in presence of
empty strings, as well as when using a default analyzer that would generate
tokens for an empty string (uncommon).

Close #8198
@jpountz jpountz closed this as completed in 3501e32 Nov 4, 2014