Scripting: ScriptDocValues.getValues() returns an reused list #8576

Closed
dw opened this issue Nov 20, 2014 · 6 comments

dw commented Nov 20, 2014

Using ElasticSearch 1.4.0 from your official .deb, given an index with:

"plays": {
    "mappings": {
        "play": {
            "_source": {
                "enabled": false
            },
            "_timestamp": {
                "default": null,
                "enabled": true
            },
            "dynamic": "strict",
            "properties": {
                "artist": {
                    "index": "not_analyzed",
                    "type": "string"
                },

And:

"settings": {
    "index": {
        "creation_date": "1416491787644",
        "merge": {
            "scheduler": {
                "max_thread_count": "1"
            }
        },
        "number_of_replicas": "1",
        "number_of_shards": "1",
        "refresh_interval": "-1",
        "uuid": "XDRbDOoLSR-cMFjQAm9TjQ",
        "version": {
            "created": "1040099"
        }
    }
}

A search like:

{
    "script_fields": {
        "artist": {
            "script": "_doc['artist'].values"
        }
    }
}

Will return a result set in which every hit's artist array holds the values of the last hit, i.e. the artist arrays of all earlier hits take on the contents of the last one. It looks like an object is being reused somehow, although from a glance at ScriptDocValues.java I can't see how. Is it possible that listLoaded is not being reset?

A trivial workaround is:

{
    "script_fields": {
        "artist": {
            "script": "_doc['artist'].values.take(100)"
        }
    }
}

jpountz self-assigned this Nov 20, 2014

dw commented Nov 20, 2014

Aah, clearly the "problem" is that the list is reset and reused for every doc in the query, which looks like a performance thing. I'd be happy for this to be marked as a doc bug rather than a functional bug; the current behaviour seems fair.
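
The reuse described here can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not the Elasticsearch source: `ReusedDocValues`, `set_doc` and `get_values` are hypothetical names standing in for `ScriptDocValues`, and the model assumes a single list object is cleared and refilled for each document. Anything that keeps the returned reference ends up showing the last document loaded, while copying the list first (which is presumably why the `.take(100)` workaround helps) preserves each hit's values.

```python
class ReusedDocValues:
    """Hypothetical stand-in for a doc-values accessor that reuses one list."""

    def __init__(self, column):
        self._column = column      # per-document values, indexed by doc id
        self._values = []          # single buffer shared by every call

    def set_doc(self, doc_id):
        # Reset and refill the shared buffer for the current document.
        self._values.clear()
        self._values.extend(self._column[doc_id])

    def get_values(self):
        # Returns the shared buffer itself, not a copy.
        return self._values


column = [["me", "you"], ["him", "you"]]

# Keeping the returned reference: every hit aliases the same list.
dv = ReusedDocValues(column)
aliased_hits = []
for doc_id in range(len(column)):
    dv.set_doc(doc_id)
    aliased_hits.append(dv.get_values())

# Copying before the next document is loaded: each hit keeps its own values.
dv = ReusedDocValues(column)
copied_hits = []
for doc_id in range(len(column)):
    dv.set_doc(doc_id)
    copied_hits.append(list(dv.get_values()))

print(aliased_hits)  # [['him', 'you'], ['him', 'you']]
print(copied_hits)   # [['me', 'you'], ['him', 'you']]
```

Which document's values "win" in the aliased case depends on the order in which documents are loaded; the model only shows that all references end up identical.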

jpountz commented Nov 20, 2014

@dw I tried to reproduce your issue without success. Here is what I ran:

DELETE plays 

PUT plays
{
  "mappings": {
    "play": {
      "dynamic": "strict",
      "properties": {
        "artist": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  },
  "settings": {
    "number_of_shards": 1
  }
}

PUT plays/play/1
{
  "artist": [ "me", "you" ]
}

PUT plays/play/2
{
  "artist": [ "you", "him" ]
}

GET plays/_search
{
  "script_fields": {
    "artist": {
      "script": "doc['artist'].values"
    }
  }
}

which returned

{
   "took": 4,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 1,
      "hits": [
         {
            "_index": "plays",
            "_type": "play",
            "_id": "1",
            "_score": 1,
            "fields": {
               "artist": [
                  [
                     "me",
                     "you"
                  ]
               ]
            }
         },
         {
            "_index": "plays",
            "_type": "play",
            "_id": "2",
            "_score": 1,
            "fields": {
               "artist": [
                  [
                     "him",
                     "you"
                  ]
               ]
            }
         }
      ]
   }
}

Can you maybe guide me a bit so that I can reproduce the issue? I used a fresh install of elasticsearch 1.4.0.

dw commented Nov 20, 2014

@jpountz huh, strange. How many shards does your index have? If it has more than one and the two docs went to different shards, and the script field is executed in the context of a shard, that might explain why you don't see it.

dw commented Nov 20, 2014

(I've only been using ElasticSearch a week, so still guessing about things quite heavily)

dw commented Nov 20, 2014

Ah, doh, sorry, didn't notice you'd set the shards to 1. Hmm, not sure

jpountz commented Nov 20, 2014

OK, I found the reason: the bug only shows up among documents that live in the same segment, so you need to run an optimize to reproduce it. Here is a recreation of the bug:

DELETE plays 

PUT plays
{
  "mappings": {
    "play": {
      "dynamic": "strict",
      "properties": {
        "artist": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  },
  "settings": {
    "number_of_shards": 1
  }
}

PUT plays/play/1
{
  "artist": [ "me", "you" ]
}

PUT plays/play/2
{
  "artist": [ "you", "him" ]
}

POST plays/_optimize?max_num_segments=1

GET plays/_search
{
  "script_fields": {
    "artist": {
      "script": "doc['artist'].values"
    }
  }
}

which returns

{
   "took": 7,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 1,
      "hits": [
         {
            "_index": "plays",
            "_type": "play",
            "_id": "2",
            "_score": 1,
            "fields": {
               "artist": [
                  [
                     "me",
                     "you"
                  ]
               ]
            }
         },
         {
            "_index": "plays",
            "_type": "play",
            "_id": "1",
            "_score": 1,
            "fields": {
               "artist": [
                  [
                     "me",
                     "you"
                  ]
               ]
            }
         }
      ]
   }
}
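
Why the optimize makes a difference can be illustrated with a small Python model. It is a sketch, not Elasticsearch code: `SegmentDocValues` and `fetch_script_field` are hypothetical names, and the model assumes one reused list per segment and that the two documents sat in separate segments before the optimize.

```python
class SegmentDocValues:
    """Hypothetical per-segment accessor that reuses one list for all its docs."""

    def __init__(self, docs):
        self._docs = docs          # values of each document in this segment
        self._values = []          # one shared buffer per segment

    def values_for(self, local_doc):
        self._values.clear()
        self._values.extend(self._docs[local_doc])
        return self._values        # same object on every call


def fetch_script_field(segments):
    """Collect the returned list for every document of every segment."""
    hits = []
    for docs in segments:
        accessor = SegmentDocValues(docs)   # one accessor per segment
        for local_doc in range(len(docs)):
            hits.append(accessor.values_for(local_doc))
    return hits


# Two segments with one document each: no list is shared between hits.
before_merge = fetch_script_field([[["me", "you"]], [["him", "you"]]])

# One merged segment holding both documents: both hits alias one list.
after_merge = fetch_script_field([[["me", "you"], ["him", "you"]]])

print(before_merge)  # [['me', 'you'], ['him', 'you']]
print(after_merge)   # [['him', 'you'], ['him', 'you']]
```

The model does not try to reproduce which document's values survive in the real response; that depends on the order in which the documents are loaded. It only shows that hits from the same segment end up with identical contents.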

jpountz changed the title from "ScriptDocValues$String.getValues() appears to return an object that is reused" to "ScriptDocValues.getValues() returns an reused list" Nov 21, 2014
jpountz added a commit to jpountz/elasticsearch that referenced this issue Nov 21, 2014
Scripts currently share the same list across invocations to getValues. This
caused a bug in script fields where all documents coming from the same segment
would get the same values (basically, for the next document for which script
values have been requested). Scripts now return a fresh new list on every
invocation to `getValues`.

Close elastic#8576
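
The idea in the commit message, returning a new list on every call, can be sketched in Python as follows. `FreshDocValues` is a hypothetical name, and this models only the idea stated in the message, not the actual patch.

```python
class FreshDocValues:
    """Hypothetical accessor that hands out a new list on every call."""

    def __init__(self, column):
        self._column = column      # per-document values, indexed by doc id
        self._doc_id = None

    def set_doc(self, doc_id):
        self._doc_id = doc_id

    def get_values(self):
        # Build a fresh list each time, so callers may keep the result.
        return list(self._column[self._doc_id])


column = [["me", "you"], ["him", "you"]]
dv = FreshDocValues(column)

hits = []
for doc_id in range(len(column)):
    dv.set_doc(doc_id)
    hits.append(dv.get_values())

print(hits)  # [['me', 'you'], ['him', 'you']]
```

The trade-off is one extra list allocation per call in exchange for results that stay valid after the next document is loaded.
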
jpountz added a commit that referenced this issue Nov 25, 2014
jpountz added the :Core/Infra/Core and :Core/Infra/Scripting labels and removed the v1.3.6 and :Core/Infra/Core labels Nov 25, 2014
jpountz changed the title from "ScriptDocValues.getValues() returns an reused list" to "Scripting: ScriptDocValues.getValues() returns an reused list" Nov 25, 2014
jpountz removed the :Core/Infra/Scripting, >bug, v1.4.1, v1.5.0 and v2.0.0-beta1 labels Nov 25, 2014
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015