
Slow histogram request #4051

Closed
hc4 opened this Issue Aug 2, 2017 · 9 comments

4 participants

hc4 (Contributor) commented Aug 2, 2017

I've noticed slow Elasticsearch queries.
After digging into it, I found that the problem is caused by a suboptimal histogram request when the search query is empty.

Current Graylog histogram query:

{
  "query" : {
    "query_string" : {
      "query" : "*",
      "allow_leading_wildcard" : false
    }
  },
  "aggregations" : {
    "gl2_filter" : {
      "filter" : {
        "bool" : {
          "must" : [ {
            "range" : {
              "timestamp" : {
                "from" : "2017-08-02 17:59:46.786",
                "to" : "2017-08-02 18:04:46.786",
                "include_lower" : true,
                "include_upper" : true
              }
            }
          }, {
            "query_string" : {
              "query" : "streams:573c6b2e1d8ed80b7a82fcb2"
            }
          } ]
        }
      },
      "aggregations" : {
        "gl2_histogram" : {
          "date_histogram" : {
            "field" : "timestamp",
            "interval" : "1m"
          }
        }
      }
    }
  }
}

Optimized query:

{
  "query" : {
    "bool" : {
      "must" : {
        "match_all" : { }
      },
      "filter" : {
        "bool" : {
          "must" : [ {
            "range" : {
              "timestamp" : {
                "from" : "2017-08-02 17:59:46.786",
                "to" : "2017-08-02 18:04:46.786",
                "include_lower" : true,
                "include_upper" : true
              }
            }
          }, {
            "query_string" : {
              "query" : "streams:573c6b2e1d8ed80b7a82fcb2"
            }
          } ]
        }
      }
    }
  },
  "aggregations" : {
    "gl2_filter" : {
      "filter" : {
        "match_all" : { }
      },
      "aggregations" : {
        "gl2_histogram" : {
          "date_histogram" : {
            "field" : "timestamp",
            "interval" : "1m"
          }
        }
      }
    }
  }
}
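The rewrite from the first query to the second can be expressed as a transformation on the request body. Below is a minimal Python sketch over plain dicts, not Graylog's actual (Java) implementation; the helper name `move_filter_into_query` is hypothetical, and replacing a `"*"` query_string with `match_all` is an assumption based on how the data request is built.

```python
import copy


def move_filter_into_query(body):
    """Move the gl2_filter conditions (time range + stream) into the query.

    Hypothetical helper: takes the "current" histogram body and returns the
    "optimized" shape, leaving the input untouched.
    """
    body = copy.deepcopy(body)
    gl2_filter = body["aggregations"]["gl2_filter"]

    # An empty search ("*") is assumed to be equivalent to match_all.
    user_query = body["query"]
    if user_query.get("query_string", {}).get("query") == "*":
        user_query = {"match_all": {}}

    return {
        "query": {
            "bool": {
                "must": user_query,
                # Filter context: the range + stream conditions, moved here.
                "filter": gl2_filter["filter"],
            }
        },
        "aggregations": {
            "gl2_filter": {
                # The aggregation no longer has to filter anything itself.
                "filter": {"match_all": {}},
                "aggregations": gl2_filter["aggregations"],
            }
        },
    }
```

Keeping the `gl2_filter` wrapper (now a no-op) leaves the path to `gl2_histogram` in the response unchanged; any other user query is passed through as-is.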

The first version takes about 1 minute in my setup, the second one less than 1 second :)
Interestingly, the data request already uses this optimization.

Your Environment

  • Graylog Version: 2.3.0
  • Elasticsearch Version: 5.5.1

dennisoelkers (Member) commented Aug 3, 2017

Thanks for your valuable input @hc4 (as usual)! We will try that out and see if it makes sense for us to adapt our query.

dennisoelkers self-assigned this Aug 3, 2017

hc4 (Contributor) commented Aug 3, 2017

Actually, the empty filter in the aggregations section is not needed either.
So the request should look like this:

{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [{
            "range": {
              "timestamp": {
                "from": "2017-08-02 17:59:46.786",
                "to": "2017-08-02 18:04:46.786",
                "include_lower": true,
                "include_upper": true
              }
            }
          }, {
            "query_string": {
              "query": "streams:573c6b2e1d8ed80b7a82fcb2"
            }
          }]
        }
      }
    }
  },
  "aggregations": {
    "gl2_histogram": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "1m"
      }
    }
  }
}

Notice that the "query" part is identical to the data request's:

{
  "from": 0,
  "size": 150,
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [{
            "range": {
              "timestamp": {
                "from": "2017-08-02 17:59:43.393",
                "to": "2017-08-02 18:04:43.393",
                "include_lower": true,
                "include_upper": true
              }
            }
          }, {
            "query_string": {
              "query": "streams:573c6b2e1d8ed80b7a82fcb2"
            }
          }]
        }
      }
    }
  },
  "sort": [{
    "timestamp": {
      "order": "desc"
    }
  }]
}

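So the builder logic that builds the "query" for data and histogram requests could be shared (IMHO, it must be shared).
Another small bug: the time window for the data and histogram requests differs in relative mode (17:59:43.393 to 18:04:43.393 for the data request above vs. 17:59:46.786 to 18:04:46.786 for the histogram request).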
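The shared builder and the time-window fix can be sketched together: resolve the relative range once against a single `now`, build the `query` part once, and let both requests embed the same object. This Python sketch is illustrative only; all function names are hypothetical, timestamps are assumed to be UTC, and mapping an empty or `"*"` search to `match_all` follows the requests shown above.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def format_ts(dt):
    # strftime yields microseconds; trim to milliseconds ("...46.786").
    return dt.strftime(TS_FORMAT)[:-3]


def relative_range(seconds, now):
    """Resolve "last N seconds" once, against a single `now`."""
    return format_ts(now - timedelta(seconds=seconds)), format_ts(now)


def filtered_query(query, time_from, time_to, stream_id):
    """Shared "query" part for both the data and the histogram request."""
    if query.strip() in ("", "*"):
        must = {"match_all": {}}
    else:
        must = {"query_string": {"query": query}}
    time_range = {"range": {"timestamp": {
        "from": time_from, "to": time_to,
        "include_lower": True, "include_upper": True}}}
    stream = {"query_string": {"query": "streams:" + stream_id}}
    return {"bool": {"must": must,
                     "filter": {"bool": {"must": [time_range, stream]}}}}


def data_request(query, limit=150):
    return {"from": 0, "size": limit, "query": query,
            "sort": [{"timestamp": {"order": "desc"}}]}


def histogram_request(query, interval="1m"):
    return {"query": query, "aggregations": {"gl2_histogram": {
        "date_histogram": {"field": "timestamp", "interval": interval}}}}


# Both requests are built from one resolved range, so their windows match.
now = datetime(2017, 8, 2, 18, 4, 46, 786000)
time_from, time_to = relative_range(300, now)
shared = filtered_query("*", time_from, time_to, "573c6b2e1d8ed80b7a82fcb2")
data = data_request(shared)
hist = histogram_request(shared)
```

Fixing `now` before building either request is what removes the few-second drift between the two windows; computing it separately per request is how the mismatch above can arise.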

hc4 (Contributor) commented Aug 3, 2017

Also, "size": 0 could be added so that no actual documents are requested, only the histogram buckets.
I got it from Grafana's query builder :)
Like this:

{
  "size" : 0,
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [{
            "range": {
              "timestamp": {
                "from": "2017-08-02 17:59:46.786",
                "to": "2017-08-02 18:04:46.786",
                "include_lower": true,
                "include_upper": true
              }
            }
          }, {
            "query_string": {
              "query": "streams:573c6b2e1d8ed80b7a82fcb2"
            }
          }]
        }
      }
    }
  },
  "aggregations": {
    "gl2_histogram": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "1m"
      }
    }
  }
}
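With `"size": 0`, Elasticsearch still evaluates the query and the aggregation but returns an empty `hits.hits` array, so the client only has to read the buckets. A small Python sketch, assuming the usual `date_histogram` response layout (`aggregations` → `gl2_histogram` → `buckets`, each bucket carrying `key` in epoch milliseconds and `doc_count`); the helper names are hypothetical and the sample response is made up.

```python
def histogram_only(body):
    """Return a copy of a histogram request that fetches no documents."""
    return {**body, "size": 0}


def histogram_counts(response):
    """Extract (bucket start in epoch ms, document count) pairs."""
    buckets = response["aggregations"]["gl2_histogram"]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


# Made-up response for illustration only.
sample_response = {
    "hits": {"total": 42, "hits": []},  # empty because size was 0
    "aggregations": {
        "gl2_histogram": {
            "buckets": [
                {"key": 1501696740000, "doc_count": 30},
                {"key": 1501696800000, "doc_count": 12},
            ]
        }
    },
}

print(histogram_counts(sample_response))
# prints [(1501696740000, 30), (1501696800000, 12)]
```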

dennisoelkers added a commit that referenced this issue Aug 3, 2017

Filter by time range in search query instead of aggregation.
This change improves the queries generated by the
Searches#terms/fieldStats/histogram/fieldHistogram methods. By filtering
the search result based on the specified time range in the query instead
of the aggregation, overall ES runtime is improved.

Fortunately, filteredSearchRequest is already returning a matching query
+ time range search builder which can be used for those cases. Before
this change, the source builder was generated by the specific method and
contained only the query instead of query + time range.

Fixes #4051.

wafflebot (bot) added the "in progress" label Aug 3, 2017

dennisoelkers changed the title from Slow histogramm request to Slow histogram request Aug 10, 2017

wafflebot (bot) removed the "in progress" label Aug 10, 2017

kroepke added a commit that referenced this issue Aug 10, 2017

Improve filtering for standard aggregation queries. (#4056)
* Filter by time range in search query instead of aggregation.

This change improves the queries generated by the
Searches#terms/fieldStats/histogram/fieldHistogram methods. By filtering
the search result based on the specified time range in the query instead
of the aggregation, overall ES runtime is improved.

Fortunately, filteredSearchRequest is already returning a matching query
+ time range search builder which can be used for those cases. Before
this change, the source builder was generated by the specific method and
contained only the query instead of query + time range.

Fixes #4051.

* Removing now unneeded code, fetching correct aggregations.

kroepke (Member) commented Aug 10, 2017

I think #4056 should be backported to 2.3 as well. @Graylog2/engineers, opinions?

hc4 (Contributor) commented Aug 14, 2017

Maybe create a PR for the 2.3.1 branch so this fix isn't lost?

bernd (Member) commented Aug 14, 2017

@hc4 We decided not to backport this into 2.3 because there is a chance that it introduces regressions. During the 2.4 release cycle we have more time to test it.

hc4 (Contributor) commented Aug 14, 2017

Is there any ETA for 2.4?
This improvement is very important for me, because the default histogram request takes minutes to complete in my setup, while the fixed one takes less than a second.

bernd (Member) commented Aug 16, 2017

@hc4 No ETA yet, but I think it will happen in Q4/2017. No promises, though. 😉

hc4 (Contributor) commented Aug 28, 2017

Ha. The problem with the slow histogram request was partially fixed in ES 5.5.2 (elastic/elasticsearch#25872).
Now the query "*" is processed as a match_all filter.
