Sampling rate support #53
Hi. The format is supported, but the value restoration is missing from the code. I think this happened because we are not using sampling in our own systems. It should still be considered a bug and has to be fixed.
I've pushed the fix into the 0.8.0-alpha1 tag. The current fix is quick and may bring some precision loss due to the historical usage of float32 for sampling along with float64 in all other calculations. This should not be an issue, because a much bigger loss is assumed when you do the sampling itself. But anyway, a better and bigger change is going to be introduced in 0.8.0. @Karsonito, it would be great if you tested the change with your requirements and your environment.
Sorry for the delay. Thanks for the improvement! Everything works correctly.
Pushed
It's a pity I didn't notice that timing aggregates are broken when rate < 1. Compare the results for statsd and bioyino:
None of bioyino's timing aggregates are correctly restored.
Actually, statsd seems to be ignoring the rate totally in your case. All we do with the rate at the moment is divide the original value by it. This was copied from brubeck, but the original statsd docs don't say much about what it really means and how the value should be changed.
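To make the "divide by the rate" behavior concrete, here is a minimal sketch (hypothetical code, not bioyino's actual implementation; the function name `restore_counter` is made up for illustration):

```rust
// Hypothetical sketch: restoring a sampled counter value by dividing
// by the sample rate, as described above. A client that samples at
// rate r only sends each event with probability r, so each received
// value stands for value / r original events.
fn restore_counter(value: f64, sample_rate: f64) -> f64 {
    value / sample_rate
}

fn main() {
    // e.g. "metric:1|c|@0.5" -> 1.0 / 0.5 = 2.0
    assert_eq!(restore_counter(1.0, 0.5), 2.0);
    // e.g. "metric:1|c|@0.01" -> 1.0 / 0.01 = 100.0
    assert_eq!(restore_counter(1.0, 0.01), 100.0);
    println!("ok");
}
```

For counters this restoration is straightforward; the discussion below is about why the same trick is not enough for timers.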
Thanks for the attention. "Actually, statsd seems to be ignoring the rate totally in your case." All other aggregates seem correct in my opinion. I tested brubeck and it gave the same result as statsd:
"What behaviour do you actually expect here for rate > 1 and < 1? Does it have to depend on metric type?" |
This seems not totally correct to me. Let me try to model a sampling example. Imagine rate = 0.5 and an original series like this:
The
Now with this approach:
And we also have to consider at least one edge case, when metrics with different rates come at the same time. Nothing stops you from doing like this:
It seems that the only real problem here is the situation with the timer storage. I think it should be solved by interpreting sampled timers differently from unsampled ones, storing a special kind of vector processed by separate formulas on aggregation. Such vectors will obviously take more CPU on aggregation of such a metric. Not totally sure, but at first sight, for percentiles it will become O(N/2) instead of O(1) compared to unsampled timers. This also means that sampled and unsampled timers will not be mixable inside one aggregation period, so in the example case above only the first metric would be considered, and the others would be rejected because of the sampling rate mismatch. While it would take some time to program, I think this will give the correctness we want here. The price is that the results will probably not be the same as with other statsd implementations. What do you think?
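The "special kind of vector" idea above could be sketched roughly like this (hypothetical code, not bioyino's implementation; note this sketch takes the more permissive route of weighting each observation by its own rate, rather than the strict rate-mismatch-is-an-error scheme described above):

```rust
// Hypothetical sketch: store each sampled timer observation together
// with the rate it arrived with, instead of materializing 1/rate
// duplicates in memory. Aggregates are then computed with weights.
struct SampledTimer {
    // (observed value, sample rate it was sent with)
    samples: Vec<(f64, f64)>,
}

impl SampledTimer {
    fn push(&mut self, value: f64, rate: f64) {
        self.samples.push((value, rate));
    }

    // Weighted mean: each observation represents 1/rate original events.
    fn mean(&self) -> f64 {
        let weight: f64 = self.samples.iter().map(|(_, r)| 1.0 / r).sum();
        let sum: f64 = self.samples.iter().map(|(v, r)| v / r).sum();
        sum / weight
    }
}

fn main() {
    let mut t = SampledTimer { samples: Vec::new() };
    t.push(10.0, 0.5); // stands for ~2 original events of value 10
    t.push(20.0, 1.0); // 1 unsampled event of value 20
    // weighted mean = (10*2 + 20*1) / (2 + 1) = 40/3
    assert!((t.mean() - 40.0 / 3.0).abs() < 1e-9);
    println!("ok");
}
```

A weighted percentile over such a vector would need a sort plus a walk over cumulative weights, which matches the extra aggregation cost mentioned above.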
"And I would prefer the intuitive and correct algorithm here rather than blindly pursuing compatibility."
I don't see practical sense in using multiple rates in one aggregation period. I don't know how you calculate all the aggregations. Do you have to restore all 100% of the values to make correct calculations? Restoring all values is not friendly to memory and CPU, especially without a reasonable goal )
What about the rate aggregate? I mean not the sample rate, but rate, i.e. metrics per second. For a series of 300 values sampled at sample_rate = 0.1 for 30 seconds you want to have
rate = sum / interval = 300 / 30 = 10
So you want the "system" rate, the one which is received by bio, not the original one, i.e. the rate at the "source", which in my example would be
It's not a question of wish ) For example: And such logic of restoring original values should be used in all other cases. The only exception is bioyino's inner stats: it must expose counts without any restoring. Do you agree with me?
We need the original rate, which is 10 in the current example.
Not here please. I'd like issues to be world-readable just in case. Any other chat - Telegram or something - you're welcome.
I may have miscalculated this, but it is what I meant. You want the original rate inside the real application, not the rate at which bioyino receives the metrics. I've got it anyway; I thought the same thing here.
Excellent! Thank you for your time! Waiting for the implementation.
Implemented the correct sampling rate counting along the way to 0.8.0. Not sure it will go into 0.7, because of the amount of changes to be done (storing the sampling rate in memory was removed in 0.7, but it seems to be required again, so I had to bring it back with a non-trivial bunch of changes). Good news: this tag currently runs well in our production, but so far only on cluster servers, not agents. It would be great if you tried the sampling rate counting together with the other features. Please make sure to read the CHANGELOG, since it has many changes incompatible with 0.7.
Thanks for your work! 1. Not all values are received if the
I tried to push with another metric. It's not helping |
@Karsonito The UDP problem seems to be fixed, should work as intended. Please try the latest tag |
Hello! |
We are using the VictoriaMetrics TSDB, which uses a slightly modified Gorilla compression. I made an analysis of the compression and found that every 3 extra digits of precision double the record size in VM. In monitoring such precision is useless; it's not a finance tool. That's why it's important to have the ability to decrease the precision of restored values.
Actually, it still may be useful in some edge cases where counting is done on very low values. Anyway, the problem above (with 32-bit sampling) does not bring precision loss and does not happen because of it. What I can offer is using 32-bit floats instead of 64-bit. I've pushed a new tag where you can try them. To enable this feature you need to use the
We've never tried to run bioyino with f32s, so I cannot guarantee it will all be correct. It should help you with the precision, because f32 only has 6 meaningful digits instead of 15. Will be glad to hear from you how it went and whether it is counting things correctly.
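The precision difference is easy to see by printing the same fraction from both float widths (a standalone illustration, not bioyino code):

```rust
// Illustration: an f32 carries far fewer significant decimal digits
// than an f64, so serializing aggregates as f32 naturally trims the
// long mantissa tails that bloat Gorilla-style compression.
fn main() {
    let x64: f64 = 1.0 / 3.0;
    let x32: f32 = 1.0 / 3.0;
    // Rust's Display prints the shortest string that round-trips,
    // so the f32 value prints with many fewer digits.
    println!("f64: {}", x64);
    println!("f32: {}", x32);
    assert!(format!("{}", x32).len() < format!("{}", x64).len());
}
```

This only affects how many digits survive serialization; the restoration arithmetic itself is unchanged.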
Are you sure you've compiled this with the f32 flag? |
Yes, f32 is enabled, as I can see. I did all possible cleanups before the build (I'm a newbie in Rust)
It doesn't help:
But it differs from alpha5 without f32:
I think I know where the problem is. I'll try to make it emit fewer digits correctly; I'll let you know when it's ready.
Actually, that was fast. I've changed the float serializer to a more correct one. Try the 0.8.0-alpha6 tag.
Thanks. Now it's great |
Please merge the last fixes into the 0.8.0 branch
These tags were on the 0.8.0 branch already, so there's no need to.
Maybe I misunderstood, but I do not see the last commit in the v0.8.0 branch
Did not push it somehow. Now it's there
FYI, I tested resource consumption for 0.6.0, 0.7.2 and 0.8.0 with counters and timers. |
Hello!
If I understand correctly, the sampling rate is not supported the way it is in other statsd implementations.
Bioyino does not "restore" the original value in proportion to the sample rate?
For example, after sending
metric:1|c|@0.01
I expect to receive
metric 100
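The expectation in the issue can be sketched end-to-end: parse the statsd line, pick up the `@rate` suffix, and divide the value by it (hypothetical code for illustration; the function name `parse_and_restore` is made up and this is not bioyino's parser):

```rust
// Hypothetical sketch: parse a statsd counter line with an optional
// sample-rate suffix ("@0.01") and restore the original count,
// matching the expectation "metric:1|c|@0.01" -> "metric 100".
fn parse_and_restore(line: &str) -> Option<(String, f64)> {
    let (name, rest) = line.split_once(':')?;
    let mut parts = rest.split('|');
    let value: f64 = parts.next()?.parse().ok()?;
    let _metric_type = parts.next()?; // "c" for counter
    let rate: f64 = match parts.next() {
        Some(r) => r.strip_prefix('@')?.parse().ok()?,
        None => 1.0, // no suffix means unsampled
    };
    Some((name.to_string(), value / rate))
}

fn main() {
    let (name, value) = parse_and_restore("metric:1|c|@0.01").unwrap();
    assert_eq!(name, "metric");
    assert_eq!(value, 100.0);
    println!("{} {}", name, value);
}
```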