Add series aggregation DSL function `aggregate` #2294

hermanschaaf · 2018-08-17T15:55:37Z

This PR adds an aggregate DSL function, which allows one to combine different series in a seriesSet using a specified aggregator (currently min, max, p50, avg).

This is particularly useful when comparing data across different weeks (using the over) function. In our case, for anomaly detection, we want to compare the current day's data with an aggregated view of the same day in previous weeks. In particular, we want to compare each point in the last day to the median of each point in the corresponding day for the last 3 weeks, so that any anomalies that occurred in a previous week are ignored. This way we compare with a hypothetical "perfect" day.

For example:

$weeks = over("avg:10m-avg-zero:os.cpu", "24h", "1w", 3)
$a = aggregate($weeks, "", "p50")
merge($a, $q)

Which looks like this:

Or, if we wanted to combine series but maintain the region and color groups`, that query would look like this:

$weeks = over("avg:10m-avg-zero:os.cpu{region=*,color=*}", "24h", "1w", 3)
aggregate($weeks, "region,color", "p50")

which would result in one merged series for each unique region/color combination.

I am very happy to take suggestions for changes / improvements. With regards to naming the function, I would have probably chosen "merge", but since that is already taken, I went with the OpenTSDB terminology and used "aggregate".

kylebrandt · 2018-08-17T16:03:38Z

Oh I really like this idea. These are operations I often feel that the TSDB should do, but often they don't or not with the order of operations you may want.

Some things to think about:

Line up timestamps?
What to do on operations that include NaN values etc

Looking at how r or pandas does things can show some of the possibilities. If we just want to start with a simple function with limited behavior (for example, timestamps must line up, nan values or ignored) etc that is fine. But we might not want to take the aggregate func name for that.

kylebrandt · 2018-08-17T16:06:24Z

Another factor is aggregation and tags / dimensions, it will most like be very useful to be able to do a group by within a tagset.

So for example, says you have a series that is host=*,disk=*, you would pass an argument that would just have "host", you would get series in the set tag with each host, and the disks were aggregated across those host.

Edit: ... Oh, maybe you already do this...

hermanschaaf · 2018-08-17T18:22:51Z

Thanks for the feedback @kylebrandt!

Another factor is aggregation and tags / dimensions, it will most like be very useful to be able to do a group by within a tagset.

Indeed, the existing code behaves exactly like you described. Only if you leave the groups argument empty, it aggregates across all series regardless of tags.
NaN values: Good point, I'll add some test cases to see how that behaves now. How do you think that should work? Say we have three points that are aligned by timestamp, 1, 3, NaN, should we make the resulting point NaN regardless of the aggregation function applied?
For lining up timestamps. At the moment, I look at all the points that do line up and aggregate them, and any that don't align are considered on their own. I think this would be the expected behavior from this aggregate function (or whatever we end up calling it). In OpenTSDB at least, it's possible to make sure all points line up by using a downsample when retrieving the points.

Perhaps we can add an additional function to snap points in each series, similar to how OpenTSDB does downsampling, in a different PR?
Any suggestions for names? I chose aggregate because it matches OpenTSDB naming conventions. combine or combineSeries? groupby, maybe? That makes less sense if no group is specified.

kylebrandt · 2018-08-17T18:45:43Z

NaN: Honestly I don't think we really ever thought that out well enough. If we are sticking with the simplest possible version, I would just drop them

Lining up timestamps: Agreed, building interpolation is no small request. Not only do you have to downsample but also upsample. There are all sorts of different ways to do this

Name: I wanted to reserve aggregation because I kind of image that to be a function with options about NaN, downsampling, etc in the future. Then simpler functions that do common behaviors ( basically aggregation with common options) could be added. But you are correct that it is a good name. So I would recommend we call it something temporary like ~ zagg() for now. With the assumption being that we can alias the function to aggregation and deprecate the old name if we end up not really developing this feature much more in say the next 6 months, but in a way that won't break people's configs. Or if this will become a name suitable for one of the various time series aggregation category of functions it could change in the same way.

kylebrandt · 2018-08-17T18:51:00Z

cmd/bosun/expr/funcs.go

+	"aggregate": {
+		Args:   []models.FuncType{models.TypeSeriesSet, models.TypeString, models.TypeString},
+		Return: models.TypeSeriesSet,
+		Tags:   tagFirst,


This should be a function that extracts what the resulting tag keys will be, so it will come from groups arg to Aggregate arg[1]. The reason is that if you look in expr/parse, the nodes have a Tag method which makes sure at parse time that the resulting tags of function make sense in any sort of union operations.

Ah, thanks, that makes sense now. I updated it, please have another look when you get a chance.

kylebrandt · 2018-08-17T18:53:45Z

cmd/bosun/expr/funcs.go

+		}
+		res.Group = opentsdb.TagSet{}
+		results.Results = append(results.Results, res)
+	} else {


drop the else, return return &results, nil above in the if condition, and then the rest of the code does not need to be indented. Same idea under https://github.com/golang/go/wiki/CodeReviewComments#indent-error-flow in terms of keeping minimal indentation.

kylebrandt · 2018-08-17T18:55:23Z

cmd/bosun/expr/funcs.go

+			vals := []string{}
+			for _, grp := range grps {
+				if val, ok := result.Group[grp]; ok {
+					vals = append(vals, val)


No need for else, can `continue in the if statement, and then have the error after the if statement. Same idea of minimal indentaiton

kylebrandt · 2018-08-17T18:58:54Z

cmd/bosun/expr/funcs.go

+	res := Result{}
+	newSeries := make(Series)
+
+	switch aggregator {


Probably useful to mimic what we do in http://bosun.org/expressions#windowquery-string-duration-string-period-string-num-scalar-funcname-string-seriesset :

In addition to supporting Bosun’s reduction functions that take on argument, percentile operations may be be done by setting funcName to p followed by number that is between 0 and 1 (inclusively). For example, "p.25" will be the 25th percentile, "p.999" will be the 99.9th percentile. "p0" and "p1" are min and max respectively (However, in these cases it is recommended to use "min" and "max" for the sake of clarity.

Nice, I wasn't aware that it's done that way in the window query. Will make it so 👍

hermanschaaf · 2018-08-17T19:34:31Z

For NaN, I added a test that just asserts that NaN-aligned values remain NaN - I'm not sure we should drop them here, since that's what the dropna function is there for? (It was awkard to test, since reflect.DeepEqual doesn't consider two math.NaN values to be equal)

Left to do:

update the name to zaggr (if you're happy with that?)
use flexible p values like in the window function
the build is failing with Failed to stop bosun cleanly. Exit code 0 (124=60s test timeout reached) - any idea why?

kylebrandt · 2018-08-17T19:37:58Z

Just "aggr"? reasonable unlike a z prefix, and leaves agg and aggregate for future.

kylebrandt · 2018-08-17T19:49:25Z

I've never really been happy with dropna though, since it can result in an empty series which I think throws an error because we never really defined what to do about that.

For now I would maybe:

Just drop any NaN values from series
if one of the series in the set ends up empty because of this, drop the series
I guess error if you end up with an empty set, or return the empty set and see what bosun does with it

I say the above because I'm trying to think of what will make the function least error prone (annoying) in the context of some of the stuff that needs to be better defined in Bosun in general.

So in the bigger picture outside of your PR (I should read through existing tests, maybe some of this is covered):

What do we do with items in a set that have no value? Whatever we do, is there a reason for it - and is what we really want in the context of alerting (what most of the expression language is for, but at least mentally distinguishing between funcs that are meant just for graphing vs to be used in an alert expression might help lead to some sane path)
What do we do with a set that has no values in it (again is that what we want etc). How would an empty set behave as part of a union operation?

kylebrandt · 2018-08-17T19:51:13Z

Regarding the timeout I doubt that is you, probably just travis being slow - we use a free tier.

kylebrandt · 2018-08-17T20:43:04Z

Recent push to master will require an update, but should just be as simple as removing the T miniprofiler.Timer argument.

hermanschaaf · 2018-08-17T20:54:02Z

Ok, thanks! I'll make those changes (probably tomorrow or early next week, getting a bit late over here). I'll let you know when I've made them.

Just about the NaN values again: as it stands, nothing that the aggregation function does can result in a NaN, so the only way you can end up with it is if there were NaN values in the original series given as input. Maybe I'm missing the wider context, but wouldn't that be the expected behavior? If we drop NaN entries, I feel like we're doing things users might not expect from this function.

…ion-dsl

hermanschaaf · 2018-08-18T14:53:12Z

@kylebrandt Ok, I have updated the code to:

use flexible percentile values like in the window function,
use the aggr function name
resolve conflicts with master
test the percentile values
run goimports

NaN behavior is still to pass them along, to avoid the complexities and possibly unexpected behavior of removing them.

What do you think?

kylebrandt · 2018-08-18T14:59:40Z

cmd/bosun/expr/expr.go

@@ -290,7 +290,7 @@ func (a *Results) Equal(b *Results) (bool, error) {
 	if a.IgnoreOtherUnjoined != b.IgnoreOtherUnjoined {
 		return false, fmt.Errorf("ignoreUnjoined flag does not match a: %v, b: %v", a.IgnoreOtherUnjoined, b.IgnoreOtherUnjoined)
 	}
-	if a.NaNValue != a.NaNValue {
+	if a.NaNValue != b.NaNValue {


kylebrandt · 2018-08-18T15:09:09Z

cmd/bosun/expr/funcs.go

@@ -385,6 +411,198 @@ var builtins = map[string]parse.Func{
 	},
 }

+// Aggr combines multiple series matching the specified groups using an aggregator function. If group
+// is empty, all given series are combined, regardless of existing groups.
+// Available aggregator functions include: avg (average), p50 (median), min (minimum) and max (maximum).


update comment to reflect changes

would be good to have sum as well

Updated the comment and added sum. Also updated the documentation accordingly.

kylebrandt · 2018-08-18T15:11:41Z

cmd/bosun/expr/funcs.go

+	for _, result := range series {
+		for t, v := range result.Value.Value().(Series) {
+			newSeries[t] += v
+			counts[t] += 1


counts[t]++

That's it, no more Python for me 🐍

kylebrandt · 2018-08-18T15:33:39Z

Few more comments but in general looks good to me - I'm content to go with your take on NaN.

I think maybe the number of aggrThing functions (or at least the lines of code in them) could probably be reduced by a function that takes another reduction function as an argument or something like this ... but I'm okay with the aggrThing funcs as is -- so don't worry about it.

This will be quite useful :-) +1

hermanschaaf · 2018-08-18T18:24:58Z

Thanks, I removed two of the aggrThing functions (aggrMin and aggrMax) by using aggrPercentile. But yeah I agree, it would be nice to generalize these in the future so it can take any function. Potential future improvement :)

hermanschaaf · 2018-08-18T18:50:46Z

Ok, made some fixes to the documentation too. All done making changes now 👍

kylebrandt · 2018-08-20T13:42:26Z

@hermanschaaf Looks good! In a new PR can you add a little more to the documentation in terms of behavior: for example, timestamps needing to line up, what happens with NaN etc. ?

hermanschaaf · 2018-08-21T13:58:24Z

Thanks @kylebrandt - sure thing, I will open a new PR for that

Herman Schaaf added 4 commits August 16, 2018 10:12

Add aggregate function to DSL

07c5964

Allow specifying of groups in arguments

a59c21e

Add documentation for aggregate function

14a0446

Add missing argument name

cb4b59a

hermanschaaf changed the title ~~Add series aggregation DSL~~ Add series aggregation DSL function aggregate Aug 17, 2018

kylebrandt reviewed Aug 17, 2018

View reviewed changes

Herman Schaaf added 2 commits August 17, 2018 20:26

Test NaN case and use correct tags function

1a1c3db

Use correct sprintf placeholders

95070fd

Fix errorf formatting

1a18143

Herman Schaaf added 5 commits August 18, 2018 10:58

Run goimports

3b0fab7

Merge remote-tracking branch 'origin/master' into add-series-aggregat…

bd2c25d

…ion-dsl

Merge with upstream master

0a86bcf

Rename to aggr, use flexible percentiles

b499438

Add tests for p0 and p1

3657f97

kylebrandt reviewed Aug 18, 2018

View reviewed changes

Herman Schaaf added 2 commits August 18, 2018 19:08

Add sum aggregator, and simplify min and max code

ea6edf6

Update documentation for aggr function

e4efb67

Improve formatting on documentation

d11513f

kylebrandt merged commit d7dfefd into bosun-monitor:master Aug 20, 2018

ghost mentioned this pull request Aug 29, 2018

Inconsistent behaviour with over() compared to bandTSDB() and window() #2297

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add series aggregation DSL function `aggregate` #2294

Add series aggregation DSL function `aggregate` #2294

hermanschaaf commented Aug 17, 2018

kylebrandt commented Aug 17, 2018

kylebrandt commented Aug 17, 2018 •

edited

hermanschaaf commented Aug 17, 2018

kylebrandt commented Aug 17, 2018

kylebrandt Aug 17, 2018

hermanschaaf Aug 17, 2018

kylebrandt Aug 17, 2018

hermanschaaf Aug 17, 2018

kylebrandt Aug 17, 2018

hermanschaaf Aug 17, 2018

kylebrandt Aug 17, 2018

hermanschaaf Aug 17, 2018

hermanschaaf commented Aug 17, 2018

kylebrandt commented Aug 17, 2018

kylebrandt commented Aug 17, 2018

kylebrandt commented Aug 17, 2018

kylebrandt commented Aug 17, 2018

hermanschaaf commented Aug 17, 2018 •

edited

hermanschaaf commented Aug 18, 2018

kylebrandt Aug 18, 2018

kylebrandt Aug 18, 2018

kylebrandt Aug 18, 2018

hermanschaaf Aug 18, 2018

kylebrandt Aug 18, 2018

hermanschaaf Aug 18, 2018

kylebrandt commented Aug 18, 2018

hermanschaaf commented Aug 18, 2018

hermanschaaf commented Aug 18, 2018

kylebrandt commented Aug 20, 2018

hermanschaaf commented Aug 21, 2018

Add series aggregation DSL function aggregate #2294

Add series aggregation DSL function aggregate #2294

Conversation

hermanschaaf commented Aug 17, 2018

kylebrandt commented Aug 17, 2018

kylebrandt commented Aug 17, 2018 • edited

hermanschaaf commented Aug 17, 2018

kylebrandt commented Aug 17, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hermanschaaf commented Aug 17, 2018

kylebrandt commented Aug 17, 2018

kylebrandt commented Aug 17, 2018

kylebrandt commented Aug 17, 2018

kylebrandt commented Aug 17, 2018

hermanschaaf commented Aug 17, 2018 • edited

hermanschaaf commented Aug 18, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kylebrandt commented Aug 18, 2018

hermanschaaf commented Aug 18, 2018

hermanschaaf commented Aug 18, 2018

kylebrandt commented Aug 20, 2018

hermanschaaf commented Aug 21, 2018

Add series aggregation DSL function `aggregate` #2294

Add series aggregation DSL function `aggregate` #2294

kylebrandt commented Aug 17, 2018 •

edited

hermanschaaf commented Aug 17, 2018 •

edited