Refactoring aggregators #210

tiedotguy · 2019-01-29T02:01:12Z

One issue we have is hot aggregators: a single metric name can overwhelm one aggregator. Because that name is always handled on the same aggregator, it can't scale.

I've come up with an alternate approach, which I believe will enable scaling. The core data structure (which we'll call Consolidator until a better name comes up) is a slice of channels of MetricMaps, plus an index:

type Consolidator struct {
  next uint64 // atomic
  maps []chan *gostatsd.MetricMap
}

Data is submitted to the Consolidator by atomically incrementing the index, receiving a map from the channel at the returned index (modulo the size of the slice), adding any metrics to that map, and sending the map back.
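To make the submit path concrete, here is a minimal sketch of it. It is not the gostatsd implementation: `MetricMap` is a hypothetical stand-in for `gostatsd.MetricMap` (a plain name-to-sum map), and `NewConsolidator` and `Add` are names invented for illustration. Each slot is assumed to be a 1-buffered channel holding exactly one map.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// MetricMap is a hypothetical stand-in for gostatsd.MetricMap: name -> summed value.
type MetricMap map[string]float64

// Consolidator mirrors the struct in the proposal.
type Consolidator struct {
	next uint64 // accessed atomically
	maps []chan MetricMap
}

// NewConsolidator (assumed helper) creates n slots, each holding one empty map.
func NewConsolidator(n int) *Consolidator {
	c := &Consolidator{maps: make([]chan MetricMap, n)}
	for i := range c.maps {
		c.maps[i] = make(chan MetricMap, 1)
		c.maps[i] <- MetricMap{}
	}
	return c
}

// Add picks the next slot round-robin, borrows its map, updates it, and returns it.
func (c *Consolidator) Add(name string, value float64) {
	idx := atomic.AddUint64(&c.next, 1) % uint64(len(c.maps))
	m := <-c.maps[idx] // blocks only if another goroutine currently holds this slot's map
	m[name] += value
	c.maps[idx] <- m
}

func main() {
	c := NewConsolidator(4)
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Add("hot.metric", 1) // one hot name, spread over all slots
		}()
	}
	wg.Wait()

	// Sum the partial counts held in each slot.
	total := 0.0
	for _, ch := range c.maps {
		m := <-ch
		total += m["hot.metric"]
		ch <- m
	}
	fmt.Println(total)
}
```

Because the slot is chosen by the counter rather than by metric name, a single hot name is spread across all slots instead of landing on one.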

When it's time to do actual aggregation, the flusher will receive the map from each channel, and send a new map in its place.

All the maps can then be combined and aggregated (multiple goroutines optional). If they're combined into a master MetricMap, then Reset will still apply, so persistence is preserved.
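The swap-and-combine step could look roughly like the sketch below. As an assumption for illustration, `MetricMap` is a simplified stand-in for `gostatsd.MetricMap`, and `swapAll` and `mergeInto` are hypothetical helper names; the real type's merge and Reset behaviour is not modelled here.

```go
package main

import "fmt"

// MetricMap is a hypothetical stand-in for gostatsd.MetricMap: name -> summed value.
type MetricMap map[string]float64

// swapAll takes the current map out of every slot and puts a new one in its place.
// Each slot is assumed to be a 1-buffered channel holding exactly one map.
func swapAll(slots []chan MetricMap) []MetricMap {
	taken := make([]MetricMap, 0, len(slots))
	for _, ch := range slots {
		taken = append(taken, <-ch) // waits for any in-flight writer on this slot
		ch <- MetricMap{}           // writers can continue immediately
	}
	return taken
}

// mergeInto folds the partial maps into the master map.
func mergeInto(master MetricMap, parts []MetricMap) {
	for _, p := range parts {
		for name, v := range p {
			master[name] += v
		}
	}
}

func main() {
	slots := []chan MetricMap{make(chan MetricMap, 1), make(chan MetricMap, 1)}
	slots[0] <- MetricMap{"requests": 3}
	slots[1] <- MetricMap{"requests": 4, "errors": 1}

	master := MetricMap{}
	mergeInto(master, swapAll(slots))
	fmt.Println(master["requests"], master["errors"])
}
```

The flusher only holds each slot for the duration of one receive and one send, which is the "swap" that the properties below refer to.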

Properties:

UDP is non-blocking (except for the swap), even under extreme load with a single metric name, as the number of goroutines is fixed.
HTTP can potentially block, as the number of goroutines is unbounded.
No back pressure. A large number of tag sets would eat memory. I'm not even sure how it could be added explicitly.


tiedotguy · 2019-02-03T02:42:26Z

Benchmarking indicates it's slightly better to use a regular buffered chan. I initially avoided that to reduce fan-in pressure; however, that appears to be unnecessary, as there's enough work done between channel accesses. There's practically no code difference anyway, and the intent is clearer, so a buffered channel should be used instead.

There may be a hazard with flushing though, as it needs to drain the entire channel before adding new maps, rather than swapping.

tiedotguy · 2019-02-07T00:10:28Z

The hazard with flushing is actually a good thing: it lets us create back pressure by not pushing new maps until processing of the old ones is done.
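A rough sketch of that buffered-channel variant, showing where the back pressure comes from. This is an illustration under assumptions, not the merged code: `MetricMap` stands in for `gostatsd.MetricMap`, and `Pool`, `NewPool`, `Add`, and `Flush` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// MetricMap is a hypothetical stand-in for gostatsd.MetricMap: name -> summed value.
type MetricMap map[string]float64

// Pool holds size maps in one buffered channel.
type Pool struct {
	maps chan MetricMap
	size int
}

// NewPool (assumed helper) fills the channel with n empty maps.
func NewPool(n int) *Pool {
	p := &Pool{maps: make(chan MetricMap, n), size: n}
	for i := 0; i < n; i++ {
		p.maps <- MetricMap{}
	}
	return p
}

// Add borrows any available map; it blocks while the flusher holds them all.
func (p *Pool) Add(name string, v float64) {
	m := <-p.maps
	m[name] += v
	p.maps <- m
}

// Flush drains every map, processes the merged result, and only then refills
// the channel. Producers are blocked in Add until the refill, which is the
// back pressure.
func (p *Pool) Flush(process func(MetricMap)) {
	merged := MetricMap{}
	for i := 0; i < p.size; i++ {
		for name, v := range <-p.maps {
			merged[name] += v
		}
	}
	process(merged)
	for i := 0; i < p.size; i++ {
		p.maps <- MetricMap{}
	}
}

func main() {
	p := NewPool(4)
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.Add("hot.metric", 1)
		}()
	}
	wg.Wait()
	p.Flush(func(m MetricMap) { fmt.Println(m["hot.metric"]) })
}
```

Since exactly `size` maps are in circulation, the drain loop finishes only once every producer has returned the map it was holding, and no producer can make progress again until `Flush` pushes new maps.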

tiedotguy added the feature label Jan 29, 2019

tiedotguy mentioned this issue Jan 29, 2019

Send metrics through the pipeline in batches #211

Closed

tiedotguy mentioned this issue Feb 18, 2019

Add metric consolidation, some formatting #217

Merged

tiedotguy mentioned this issue Jun 18, 2019

Error when trying to run tester #244

Closed

tiedotguy mentioned this issue Nov 15, 2019

Question: Any plans to add TCP server support?. #276

Closed

tiedotguy mentioned this issue Jul 10, 2020

Remove dispatch metrics #331

Merged
