Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deeply nested aggregations treated like unnested document aggregations #9280

Closed
pickypg opened this issue Jan 13, 2015 · 3 comments
Closed
Assignees

Comments

@pickypg
Copy link
Member

pickypg commented Jan 13, 2015

Given this "small" example, you can create the mapping, index two simple documents, and aggregate against it. This was tested on 1.4.2 (and internally reported on 1.3.1).

The mapping defines two nested objects -- one within the other (comments and comments.tags). The dates object just contains some queried and basic aggregated details.

The data is setup so that it should be immediately obvious if it's wrong or not. Each comment is given a unique ID with comments.cid (1, 2, 3, and 4). Only the odd-numbered (1 and 3) comments have tags added to them. The even-numbered (2 and 4) comments have a comments.identifier meant to be found in the aggregation. The tags are given unique IDs that reflect their parent comment with comments.tags.tid (22 and 44; extended versions of cid just to be unique to avoid confusion).

There are two nested aggregations toward the bottom of the aggregation, and the inner one is unexpectedly passed the tag associated with comments.cid of 2 even though that comment does not match parent aggregations (as evidenced by printouts on the command line seen after the setup)!

# Delete the index/mapping
DELETE /agg-test

PUT /agg-test
{
  "mappings": {
    "provider": {
      "properties": {
        "comments": {
          "type": "nested",
          "properties": {
            "cid": {
              "type": "long"
            },
            "identifier": {
              "type": "string",
              "index": "not_analyzed"
            },
            "tags": {
              "type": "nested",
              "properties": {
                "tid": {
                  "type": "long"
                },
                "name": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        },
        "dates": {
          "properties": {
            "day": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "month": {
              "properties": {
                "end": {
                  "type": "date",
                  "format": "dateOptionalTime"
                },
                "label": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "start": {
                  "type": "date",
                  "format": "dateOptionalTime"
                }
              }
            }
          }
        }
      }
    }
  }
}

# Feed 2 documents.
POST /agg-test/provider/_bulk
{"index": {"_id": "12208"}}
{"_id": 12208, "dates": {"month": {"label": "2014-11", "end": "2014-11-30", "start": "2014-11-01"}, "day": "2014-11-30"}, "comments": [{"cid": 3,"identifier": "29111"}, {"cid": 4,"tags": [{"tid" :44,"name": "Roles"}], "identifier": "29101"}]}
{"index": {"_id": "12222"}}
{"_id": 12222, "dates": {"month": {"label": "2014-12", "end": "2014-12-31", "start": "2014-12-01"}, "day": "2014-12-03"}, "comments": [{"cid": 1, "identifier": "29111"}, {"cid": 2,"tags": [{"tid" : 22, "name": "DataChannels"}], "identifier": "29101"}]}

# Verify the mappings if you want 
GET /agg-test/_mappings

# Run query, it should return no comments.tags because no comment has identifier 29111 _and_ a tag, but it will a tag!
GET /agg-test/provider/_search
{
  "aggregations": {
    "startDate": {
      "terms": {
        "field": "dates.month.start",
        "size": 50
      },
      "aggregations": {
        "endDate": {
          "terms": {
            "field": "dates.month.end",
            "size": 50
          },
          "aggregations": {
            "period": {
              "terms": {
                "field": "dates.month.label",
                "size": 50
              },
              "aggregations": {
                "ctxt_idfier_nested": {
                  "nested": {
                    "path": "comments"
                  },
                  "aggregations": {
                    "comment_filter": {
                      "filter": {
                        "term": {
                          "comments.identifier": "29111"
                        }
                      },
                      "aggregations": {
                        "comment_script": {
                          "terms": {
                            "script": "println \"Comment Map: ${doc.get('cid')}\"; println \"ID: ${doc['comments.cid'].value}\"; return doc['comments.identifier'].value",
                            "size": 50
                          }
                        },
                        "nested_tags": {
                          "nested": {
                            "path": "comments.tags"
                          },
                          "aggregations": {
                            "tag_script": {
                              "terms": {
                                "script": "println \"Tag Map: ${doc.get('tid')}\"; println \"ID: ${doc['comments.tags.tid'].value}\"; return doc['comments.tags.name'].value",
                                "size": 50
                              }
                            },
                            "tag": {
                              "terms": {
                                "field": "comments.tags.name",
                                "size": 50
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

The scripts exist to verify steps along the way, but this prints out (on the ES node's command line):

Comment Map: [3]
ID: 3
Comment Map: [1]
ID: 1
Tag Map: [22]
ID: 22

Since Comment Map: [2]\nID: 2 never appeared in the output, there is no reason that the Tag should have been processed!

What's more baffling is that you can remove the top two aggregations and it will execute properly:

GET /agg-test/provider/_search
{
  "aggregations": {
    "period": {
      "terms": {
        "field": "dates.month.label",
        "size": 50
      },
      "aggregations": {
        "ctxt_idfier_nested": {
          "nested": {
            "path": "comments"
          },
          "aggregations": {
            "comment_filter": {
              "filter": {
                "term": {
                  "comments.identifier": "29111"
                }
              },
              "aggregations": {
                "comment_script": {
                  "terms": {
                    "script": "println \"Comment Map: ${doc.get('cid')}\"; println \"ID: ${doc['comments.cid'].value}\"; return doc['comments.identifier'].value",
                    "size": 50
                  }
                },
                "nested_tags": {
                  "nested": {
                    "path": "comments.tags"
                  },
                  "aggregations": {
                    "tag_script": {
                      "terms": {
                        "script": "println \"Tag Map: ${doc.get('tid')}\"; println \"ID: ${doc['comments.tags.tid'].value}\"; return doc['comments.tags.name'].value",
                        "size": 50
                      }
                    },
                    "tag": {
                      "terms": {
                        "field": "comments.tags.name",
                        "size": 50
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Prints out

Comment Map: [3]
ID: 3
Comment Map: [1]
ID: 1

Without improperly finding any tags.

EDITED: I fixed the mapping, which must have been copy/pasted wrong.

@colings86
Copy link
Contributor

The mapping seems to be wrong in the above (it doesn't parse) I think it should be:

PUT /agg-test
{
  "mappings": {
    "provider": {
      "properties": {
        "comments": {
          "type": "nested",
          "properties": {
            "cid": {
              "type": "long"
            },
            "identifier": {
              "type": "string",
              "index": "not_analyzed"
            },
            "tags": {
              "type": "nested",
              "properties": {
                "tid": {
                  "type": "long"
                },
                "name": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        },
        "dates": {
          "properties": {
            "day": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "month": {
              "properties": {
                "end": {
                  "type": "date",
                  "format": "dateOptionalTime"
                },
                "label": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "start": {
                  "type": "date",
                  "format": "dateOptionalTime"
                }
              }
            }
          }
        }
      }
    }
  }
}

@colings86
Copy link
Contributor

Also, this issue doesn't seem to reproduce on the current master branch

@martijnvg
Copy link
Member

@colings86 @pickypg I think I found the bug. The nested aggregator work based on the wrong assumption. It assumes: https://github.com/elasticsearch/elasticsearch/blob/a56520d26d70ae731ed929660e29df119290114f/src/main/java/org/elasticsearch/search/aggregations/bucket/nested/NestedAggregator.java#L65

But this isn't the case if buckets are introduced during executing of aggs. If buckets are created before the actual execution this assumption is true. This results in the parent filter not being resolved correctly for nested nested aggs. (It resolves the root level as filter)

I'll open a pr and move the resolving of the parentFilter to the collect method instead, in that method we can assume that the child filter of a parent nested aggregator is resolved at that time, which can than be as a parent filter for the nested aggregator that are nested below.

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Jan 26, 2015
…y in the case the nested agg gets created on the fly for buckets that are constructed during query execution.

The fix is the move the parent filter resolving from the nextReader(...) method to the collect(...) method, because only then any parent nested filter's parent filter is then properly instantiated.

Closes elastic#9280
Closes elastic#9335
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Jan 26, 2015
…y in the case the nested agg gets created on the fly for buckets that are constructed during query execution.

The fix is the move the parent filter resolving from the nextReader(...) method to the collect(...) method, because only then any parent nested filter's parent filter is then properly instantiated.

Closes elastic#9280
Closes elastic#9335
martijnvg added a commit that referenced this issue Jan 26, 2015
…y in the case the nested agg gets created on the fly for buckets that are constructed during query execution.

The fix is the move the parent filter resolving from the nextReader(...) method to the collect(...) method, because only then any parent nested filter's parent filter is then properly instantiated.

Closes #9280
Closes #9335
martijnvg added a commit that referenced this issue Jan 26, 2015
…y in the case the nested agg gets created on the fly for buckets that are constructed during query execution.

The fix is the move the parent filter resolving from the nextReader(...) method to the collect(...) method, because only then any parent nested filter's parent filter is then properly instantiated.

Closes #9280
Closes #9335
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
…y in the case the nested agg gets created on the fly for buckets that are constructed during query execution.

The fix is the move the parent filter resolving from the nextReader(...) method to the collect(...) method, because only then any parent nested filter's parent filter is then properly instantiated.

Closes elastic#9280
Closes elastic#9335
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
…y in the case the nested agg gets created on the fly for buckets that are constructed during query execution.

The fix is the move the parent filter resolving from the nextReader(...) method to the collect(...) method, because only then any parent nested filter's parent filter is then properly instantiated.

Closes elastic#9280
Closes elastic#9335
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants