Error when trying to run tester #244

Closed
TomMizrachi opened this issue Jun 18, 2019 · 7 comments

Comments

@TomMizrachi

~/gostatsd/cmd/tester$ make setup

go get -u github.com/githubnemo/CompileDaemon
go get -u github.com/alecthomas/gometalinter
go: finding github.com/nicksnyder/go-i18n/i18n latest
go: finding github.com/alecthomas/units latest
build github.com/alecthomas/gometalinter: cannot load github.com/nicksnyder/go-i18n/i18n: cannot find module providing package github.com/nicksnyder/go-i18n/i18n
make: *** [setup] Error 1

It seems the author of go-i18n moved the i18n package into a new folder called v2.

@TomMizrachi
Author

I would have opened a PR against gometalinter myself, but the repo is archived:
https://github.com/alecthomas/gometalinter/blob/b242b54b75005af59cb3a06620085146709b598a/vendor/manifest

@tiedotguy
Collaborator

Hi

Honestly, the entire tester section hasn't been maintained, and I'm not sure if it will work, even if it gets past linting. We switched to golangci-lint a couple of months back (#234), and this portion wasn't noticed.

make check in the project root will install the binary, but that's more a side effect of the build/testing than anything else.

@TomMizrachi
Author

I see. Is there any other way I can test the throughput of gostatsd on Linux?
Or could you add some more information on this subject to the README?

Anyway, thanks for the quick response :)

@tiedotguy
Collaborator

I used to use an internal load tester, but I don't have access to it anymore. @aelse, would you mind opening the repo? I had some local changes which I didn't save, but it's probably a better start than the tester binary.

I can tell you a bit about my experience scaling and running it in production. Generally it scales linearly with the number of cores you give it, at about 15-25k metrics/second/core. On the high end of that you'll likely hit problems with packet loss, so it's important to watch that at the host layer. It's also much more intensive in packets/second than in raw bandwidth, so you may find your PPS plateaus even when you have CPU to spare. You'll also get better throughput if the clients send multiple metrics/packet.
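
To illustrate the multiple metrics/packet point, here is a minimal client-side sketch, not code from gostatsd or any particular client. It relies on the usual statsd convention of separating metrics in one datagram with newlines. The function name `packMetrics`, the metric names, and the 1472-byte limit (a 1500-byte MTU minus the 20-byte IPv4 and 8-byte UDP headers) are all assumptions for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// packMetrics groups statsd lines into newline-separated payloads of at most
// maxSize bytes each. A single line longer than maxSize still gets a payload
// of its own. (Hypothetical helper, not part of gostatsd.)
func packMetrics(lines []string, maxSize int) [][]byte {
	var packets [][]byte
	var buf bytes.Buffer
	for _, line := range lines {
		// Close the current payload if adding "\n" + line would overflow it.
		if buf.Len() > 0 && buf.Len()+1+len(line) > maxSize {
			packets = append(packets, append([]byte(nil), buf.Bytes()...))
			buf.Reset()
		}
		if buf.Len() > 0 {
			buf.WriteByte('\n')
		}
		buf.WriteString(line)
	}
	if buf.Len() > 0 {
		packets = append(packets, append([]byte(nil), buf.Bytes()...))
	}
	return packets
}

func main() {
	// Made-up metric names, one statsd line each.
	lines := []string{"api.requests:1|c", "api.latency:12|ms", "api.errors:1|c"}
	for _, p := range packMetrics(lines, 1472) {
		fmt.Printf("%d bytes: %q\n", len(p), p)
	}
}
```

Each returned payload would be sent as one UDP datagram, so the server handles fewer packets for the same number of metrics.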

The big killer for performance is hot metrics. Metrics are distributed deterministically to aggregators on a hash of name+host (not tags, and if --ignore-host is used, it might be only name). If a single aggregator is overloaded, that causes back pressure through the system, and eventual packet loss. I have a plan to fix it (#210), but haven't had the time to get to it.
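
A rough sketch of why a hot metric hurts, assuming an FNV-1a hash over name+host. The hash function, the `aggregatorFor` name, and the metric and host names are assumptions for illustration; this is not gostatsd's actual implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// aggregatorFor picks an aggregator index from the metric name and source
// host. Tags are deliberately not part of the key. FNV-1a is an assumption;
// the real hash may differ.
func aggregatorFor(name, host string, numAggregators int) int {
	h := fnv.New32a()
	h.Write([]byte(name))
	h.Write([]byte{0}) // separator so ("ab", "c") and ("a", "bc") hash differently
	h.Write([]byte(host))
	return int(h.Sum32() % uint32(numAggregators))
}

func main() {
	const aggregators = 8
	load := make([]int, aggregators)
	// 1000 samples of one hot metric, plus 1000 distinct metrics.
	for i := 0; i < 1000; i++ {
		load[aggregatorFor("checkout.requests", "web-1", aggregators)]++
		load[aggregatorFor(fmt.Sprintf("metric.%d", i), "web-1", aggregators)]++
	}
	fmt.Println(load) // samples handled per aggregator
}
```

Every sample of the hot metric lands on the same index no matter what tags it carries, while the distinct names spread across all of them.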

As the cardinality sent to the backend increases, so too does the time to flush - if it's over your flush interval, then it will skip that flush. This can lead to unexpected behaviour, such as higher incoming throughput because you're flushing half as often and not using that CPU. It can also be confusing when querying from the actual backend.

I always struggled to find a good load generator profile, because we had such a wide variety - some clients had 1 metric per packet, some had 80 (jumbo frames ftw). Some clients have everything on one metric (leading to hot aggregators), and some clients don't use tags, so the metrics are spread out very evenly.

In the end, we moved to the distributed model, with forwarder nodes that can do the majority of the work, and forward over HTTP for final aggregation. I'm not even sure what the raw metrics/second is now. It's not horizontally scalable yet, but at least one bottleneck has been removed :)

Pretty much all of this is covered by internal metrics, but only the Datadog backend has good metrics for its behavior.

@tiedotguy
Collaborator

Also on the note of updating docs - I really want to get the system horizontally scalable, and then rewrite them, documenting the different deployment models.

I want to remove limitations, not document them :)

@TomMizrachi
Author

Thanks for the detailed answer! :)
I'll close the issue

@tiedotguy
Collaborator

Hi @TomMizrachi, quick FYI - If you're still interested in the topic, I've just pushed a branch with a new load tester on it (#332). Minimal deps, and should be much simpler to build, with simple command line options.
