Refactoring aggregators #210

Open
tiedotguy opened this issue Jan 29, 2019 · 2 comments

@tiedotguy
Collaborator

One issue we have is hot aggregators: a single metric name can overwhelm one aggregator. Because every value for that name is handled by the same aggregator, it can't scale.

I've come up with an alternate approach, which I believe will enable scaling. The core data structure (which we'll call Consolidator until a better name comes up) is a slice of channels of MetricMaps and an index:

type Consolidator struct {
  next uint64 // atomic
  maps []chan *gostatsd.MetricMap
}

Data is submitted to the Consolidator by atomically incrementing the index, receiving a map from the channel at the returned index (modulo the size of the slice), adding any metrics to that map, and sending the map back on the same channel.
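A minimal sketch of that submit path, under some assumptions: `MetricMap` here is a simplified stand-in for `gostatsd.MetricMap` that only sums counters, each channel has capacity 1 and is pre-loaded with one map, and `NewConsolidator` and `Submit` are hypothetical names, not existing gostatsd APIs.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// MetricMap is a hypothetical stand-in for gostatsd.MetricMap.
// It only sums counter values by metric name.
type MetricMap struct {
	Counters map[string]int64
}

func NewMetricMap() *MetricMap {
	return &MetricMap{Counters: make(map[string]int64)}
}

// Consolidator holds one MetricMap per slot. Each channel has capacity 1,
// so it acts as a lock that also carries the map it protects.
type Consolidator struct {
	next uint64 // atomic
	maps []chan *MetricMap
}

// NewConsolidator (assumed helper) pre-loads every slot with an empty map.
func NewConsolidator(slots int) *Consolidator {
	c := &Consolidator{maps: make([]chan *MetricMap, slots)}
	for i := range c.maps {
		c.maps[i] = make(chan *MetricMap, 1)
		c.maps[i] <- NewMetricMap()
	}
	return c
}

// Submit picks the next slot round-robin, borrows its map, adds the value,
// and sends the map back on the same channel.
func (c *Consolidator) Submit(name string, value int64) {
	idx := atomic.AddUint64(&c.next, 1) % uint64(len(c.maps))
	ch := c.maps[idx]
	mm := <-ch
	mm.Counters[name] += value
	ch <- mm
}

func main() {
	c := NewConsolidator(4)
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				c.Submit("hot.metric", 1)
			}
		}()
	}
	wg.Wait()

	// Sum the same metric across all slots.
	var total int64
	for _, ch := range c.maps {
		mm := <-ch
		total += mm.Counters["hot.metric"]
		ch <- mm
	}
	fmt.Println(total) // 8000
}
```

Because the slot is chosen by a counter rather than by hashing the metric name, a single hot name is spread across every slot instead of landing on one.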

When it's time to do actual aggregation, the flusher will receive the map from each channel and send a new, empty map in its place.

All the maps can then be combined and aggregated (multiple goroutines optional). If they're combined into a master MetricMap, then the Reset will still apply, so persistence continues to work.
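The flush and combine steps could look roughly like this. Again, `MetricMap` is a simplified stand-in for `gostatsd.MetricMap`, and `Merge` and `Flush` are hypothetical names chosen for illustration; the sketch merges on a single goroutine and leaves Reset out.

```go
package main

import "fmt"

// MetricMap is a hypothetical stand-in for gostatsd.MetricMap.
type MetricMap struct {
	Counters map[string]int64
}

func NewMetricMap() *MetricMap {
	return &MetricMap{Counters: make(map[string]int64)}
}

// Merge adds every counter from other into mm.
func (mm *MetricMap) Merge(other *MetricMap) {
	for name, value := range other.Counters {
		mm.Counters[name] += value
	}
}

// Flush swaps the map in each slot for a fresh one, then combines the
// collected maps into the long-lived master map supplied by the caller.
func Flush(maps []chan *MetricMap, master *MetricMap) {
	collected := make([]*MetricMap, 0, len(maps))
	for _, ch := range maps {
		collected = append(collected, <-ch) // take the filled map
		ch <- NewMetricMap()                // hand ingestion a fresh one
	}
	// Ingestion is already unblocked, so merging happens off the hot path.
	for _, mm := range collected {
		master.Merge(mm)
	}
}

func main() {
	maps := make([]chan *MetricMap, 3)
	for i := range maps {
		maps[i] = make(chan *MetricMap, 1)
		mm := NewMetricMap()
		mm.Counters["hot.metric"] = int64(i + 1)
		maps[i] <- mm
	}

	master := NewMetricMap()
	Flush(maps, master)
	fmt.Println(master.Counters["hot.metric"]) // 6
}
```

Each slot is only held for the duration of the swap, which is the one point where ingestion can briefly wait on the flusher.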

Properties:

  • UDP is non-blocking (except for the swap), even under extreme load with a single metric name, as the number of goroutines is fixed.
  • HTTP can potentially block, as the number of goroutines is unbounded.
  • No back pressure. A large number of tag sets would eat memory, and I'm not sure how back pressure could be added explicitly.
@tiedotguy
Collaborator Author

Benchmarking indicates it's slightly better to use a regular buffered chan. I initially avoided that to reduce fan-in pressure, but that appears to be unnecessary, as there's enough work done between channel accesses. There's practically no code difference anyway, and the intent is clearer, so a buffered channel should be used instead.

There may be a hazard with flushing though, as it needs to drain the entire channel before adding new maps, rather than swapping.

@tiedotguy
Collaborator Author

The hazard with flushing is actually useful: it lets us create back pressure by not pushing new maps until processing of the old ones is done.
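A rough sketch of the buffered-channel variant with that back pressure, assuming a single channel whose capacity equals the number of maps in circulation. `MetricMap` is a simplified stand-in for `gostatsd.MetricMap`, and `NewConsolidator`, `Submit`, and `Flush` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// MetricMap is a hypothetical stand-in for gostatsd.MetricMap.
type MetricMap struct {
	Counters map[string]int64
}

func NewMetricMap() *MetricMap {
	return &MetricMap{Counters: make(map[string]int64)}
}

// Consolidator keeps size maps circulating through one buffered channel.
type Consolidator struct {
	maps chan *MetricMap
	size int
}

func NewConsolidator(size int) *Consolidator {
	c := &Consolidator{maps: make(chan *MetricMap, size), size: size}
	for i := 0; i < size; i++ {
		c.maps <- NewMetricMap()
	}
	return c
}

// Submit borrows whichever map is available, updates it, and returns it.
// It blocks while the channel is empty, for example during a flush.
func (c *Consolidator) Submit(name string, value int64) {
	mm := <-c.maps
	mm.Counters[name] += value
	c.maps <- mm
}

// Flush drains every map, processes them, and only then refills the channel.
// Submitters wait in between, which is the back pressure described above.
func (c *Consolidator) Flush(process func(*MetricMap)) {
	drained := make([]*MetricMap, 0, c.size)
	for i := 0; i < c.size; i++ {
		drained = append(drained, <-c.maps)
	}
	for _, mm := range drained {
		process(mm)
	}
	for i := 0; i < c.size; i++ {
		c.maps <- NewMetricMap()
	}
}

func main() {
	c := NewConsolidator(4)
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				c.Submit("hot.metric", 1)
			}
		}()
	}
	wg.Wait()

	var total int64
	c.Flush(func(mm *MetricMap) { total += mm.Counters["hot.metric"] })
	fmt.Println(total) // 8000
}
```

The trade-off is that a slow `process` step now stalls ingestion, so how long a flush may hold the maps is a tuning decision rather than something this sketch settles.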
