Skip to content

terms facet gives wrong count with n_shards > 1 #1305

Closed
@jmchambers

Description

@jmchambers

I'm working with nested documents and have noticed that my faceted search interface is giving the wrong counts when I have more than one shard. To be more specific, I'm working with RDF triples (entity > attribute > value) and I'm nesting the attributes (called predicates in my example):

{
  "_id" : "512a2c022f0b4e3daa341e6c8bcf6c2f",
  "url": "http://dbpedia.org/resource/Alan_Shepard",
  "predicates": [
    {
      "type": "type",
      "string_value": ["thing", "person", "astronaut"]
    }, {
      "type": "label",
      "string_value": ["Alan Shepard"]
    }, {
      "type": "time in space",
      "float_value": [216.950]
    },
    ... lots more
  ]
}

I've created a shell script (https://gist.github.com/1196986) that recreates the problem with a fresh index. The created data set has these totals:

  • thing (30)
  • creative work (20)
  • video game (10)
  • tv show (10)
  • people (10)

With only one shard the following query gives the correct counts no matter what the size parameter is set to:

{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "facets": {
    "type_counts": {
      "terms": {
        "field": "string_value",
        "size": 5
      },
      "nested": "predicates",
      "facet_filter": {
        "term": {
          "type": "type"
        }
      }
    }
  }
}

However, with more than one shard the size parameter affects the accuracy of the counts. If it is equal to or greater than the number of terms returned by the facet query (5 in this case) then it works fine. However, the terms at the bottom of the list start to display low counts as you reduce the size parameter:

With "size" : 4

  • thing (30)
  • creative work (20)
  • video game (10)
  • tv show (9)

With "size" : 3

  • thing (30)
  • creative work (15)
  • video game (9)

With "size" : 2

  • thing (30)
  • creative work (15)

So it looks like the sub-totals from some of the shards aren't being included for some reason. BTW I'm on ubuntu and the problem seems to affect all versions of ES I've tried (17.0, 17.1 and 17.6). Any ideas...?

P.S. absolutely loving ES - it's made my life a lot easier :)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions