Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for sampling metrics #182

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ianks
Copy link

@ianks ianks commented May 4, 2023

This PR adds some initial support for sampling individual metrics. I made it an opt-in feature in Cargo so folks won't pull in the rand dependency by default.

Since I'm new to the codebase, I probably did some things sub-optimally. Let me know I can improve it!

resolves #179

@56quarters
Copy link
Owner

Thanks! I'll try to take a first pass over this in a bit.

@ianks
Copy link
Author

ianks commented May 4, 2023

I messed around with a slight refactor of this PR which tries a slightly different approach by moving the sampling behavior to a new type. LMK which you prefer:

diff --git i/cadence/src/types.rs w/cadence/src/types.rs
index 3ececec..ee4c227 100644
--- i/cadence/src/types.rs
+++ w/cadence/src/types.rs
@@ -8,6 +8,7 @@
 // option. This file may not be copied, modified, or distributed
 // except according to those terms.
 
+use crate::builder::SampleRate;
 use crate::builder::{MetricFormatter, MetricValue};
 use std::error;
 use std::fmt;
@@ -19,6 +20,76 @@ use std::io;
 /// types of metrics as defined in the [Statsd spec](https://github.com/b/statsd_spec).
 pub trait Metric {
     fn as_metric_str(&self) -> &str;
+
+    fn is_sampled(&self) -> bool {
+        true
+    }
+}
+
+/// Marker trait for metrics that can be sampled.
+///
+/// > A float between 0 and 1, inclusive. Only works with COUNT, HISTOGRAM,
+/// > DISTRIBUTION, and TIMER metrics. The default is 1, which samples 100% of the
+/// > time.
+/// > - via [DataDog](https://docs.datadoghq.com/developers/dogstatsd/datagram_shell)
+pub trait Sampleable: Metric {
+    /// Returns a new metric that indicates this metric was sampled.
+    fn sampled(self, sample_rate: f32) -> Result<Sampled<Self>, MetricError>
+    where
+        Self: Sized,
+    {
+        Sampled::new(self, sample_rate)
+    }
+}
+
+impl Sampleable for Counter {}
+impl Sampleable for Timer {}
+impl Sampleable for Gauge {}
+impl Sampleable for Histogram {}
+
+/// Wraps a that indicates a metric was sampled.
+pub struct Sampled<T: Sampleable> {
+    is_sampled: bool,
+    repr: String,
+    _marker: std::marker::PhantomData<T>,
+}
+
+impl<T: Sampleable> Sampled<T> {
+    pub fn new<S: TryInto<SampleRate>>(
+        metric: T,
+        sample_rate: S,
+    ) -> Result<Self, <S as std::convert::TryInto<SampleRate>>::Error> {
+        let sample_rate = sample_rate.try_into()?;
+        let mut repr = String::with_capacity(metric.as_metric_str().len() + sample_rate.kv_size());
+        repr.push_str(metric.as_metric_str());
+        repr.push('|');
+        repr.push_str(sample_rate.as_str());
+
+        #[cfg(feature = "sample-rate")]
+        use rand::Rng;
+
+        #[cfg(feature = "sample-rate")]
+        let is_sampled = rand::thread_rng().gen_bool(sample_rate.value() as f64);
+
+        #[cfg(not(feature = "sample-rate"))]
+        let is_sampled = true;
+
+        Ok(Sampled {
+            repr,
+            is_sampled,
+            _marker: std::marker::PhantomData,
+        })
+    }
+}
+
+impl<T: Sampleable> Metric for Sampled<T> {
+    fn as_metric_str(&self) -> &str {
+        &self.repr
+    }
+
+    fn is_sampled(&self) -> bool {
+        self.is_sampled
+    }
 }
 
 /// Counters are simple values incremented or decremented by a client.
@@ -307,7 +378,7 @@ pub type MetricResult<T> = Result<T, MetricError>;
 mod tests {
     #![allow(deprecated, deprecated_in_future)]
 
-    use super::{Counter, ErrorKind, Gauge, Histogram, Meter, Metric, MetricError, Set, Timer};
+    use super::{Counter, ErrorKind, Gauge, Histogram, Meter, Metric, MetricError, Sampleable, Set, Timer};
     use std::error::Error;
     use std::io;
 
@@ -421,4 +492,17 @@ mod tests {
         let our_err = MetricError::from((ErrorKind::InvalidInput, "Nope!"));
         assert!(our_err.source().is_none());
     }
+
+    #[test]
+    fn test_metrics_can_be_wrapped_as_sampled() {
+        let counter = Counter::new("my.app.", "test.counter", 4).sampled(1.0 / 3.0).unwrap();
+        let gauge = Gauge::new("my.app.", "test.gauge", 2).sampled(0.5).unwrap();
+        let histogram = Histogram::new("my.app.", "test.histogram", 45).sampled(0.5).unwrap();
+        let timer = Timer::new("my.app.", "test.timer", 34).sampled(0.5).unwrap();
+
+        assert_eq!("my.app.test.counter:4|c|@0.33333", counter.as_metric_str());
+        assert_eq!("my.app.test.timer:34|ms|@0.5", timer.as_metric_str());
+        assert_eq!("my.app.test.gauge:2|g|@0.5", gauge.as_metric_str());
+        assert_eq!("my.app.test.histogram:45|h|@0.5", histogram.as_metric_str());
+    }
 }

@56quarters
Copy link
Owner

Before I take a more thorough look, I've come around to unconditionally including the dependency on rand since it would allow us to get rid of all the conditional compilation. WDYT?

@ianks
Copy link
Author

ianks commented May 4, 2023

100% agree. Most folks will have it in their dep tree already, anyway I imagine.

Copy link
Owner

@56quarters 56quarters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The public API for this roughly looks how I want, thanks! There are a few changes I'd like to see with how this is implemented that fall into a few categories:

  • I'd like to propagate errors related to bad sample rates instead of panicking or ignoring them.
  • ByteStr is a nice touch but I'd rather do something simpler/dumber and use float formatting for write!() with a fixed number of digits.
  • Removing the feature flag as discussed and combining the no-op and rand based Samplers.

I think once the above changes are made, there might be further changes we could make to simplify this a bit. Thanks again for this feature!

@@ -14,5 +14,13 @@ autobenches = false

[dependencies]
crossbeam-channel = "0.5.1"
rand = { version = "0.8.5", optional = true }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, fine to make this required and remove the feature flag.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize it was possible to have a builder.rs and builder/. Can you you move this to builder/mod.rs?

use crate::client::{MetricBackend, StatsdClient};
use crate::types::{Metric, MetricError, MetricResult};
#[cfg(feature = "sample-rate")]
use sample_rate::SampleRate;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use full import paths everywhere: crate::builder::sample_rate

self.sample_rate = Some(sr);
self.kv_size += sr.kv_size();
}
Err(e) => panic!("invalid sample rate for metric {}: {}", self.key, e),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want Cadence to crash people's application since statsd metrics are by definition lossy (and so dropping something invalid or returning an error would be preferable). Instead, I think we should take a different approach to setting the sample rate. Described below on the MetricBuilder changes.

#[cfg(feature = "sample-rate")]
fn write_sample_rate(&self, out: &mut String) {
if let Some(sample_rate) = self.sample_rate {
if sample_rate.is_applicable_to_metric(self.type_) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the idea of silently ignoring sample rates for metrics where it doesn't make sense. This seems like the type of API guarantee that people would come to rely on even if they shouldn't. Instead, I'd rather return an error as described below.

#[cfg(feature = "sample-rate")]
pub fn with_sample_rate(mut self, sample_rate: f32) -> Self {
if let BuilderRepr::Success(ref mut formatter, _) = self.repr {
formatter.with_sample_rate(sample_rate);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should eagerly parse the sample rate here, validating it's between 0 and 1 and return an error if it's not by changing BuildRepr. Example below, sketching the idea (I haven't tested this and it doesn't need to look like this verbatim):

pub fn with_sample_rate(mut self, sample_rate: f32) -> Self {
    if let BuilderRepr::Success(ref mut formatter, type_) = self.repr {
         // note that we're passing type_ here and so can enforce that sampling is only applied when it makes sense for the metric type.
        match SampleRate::parse(sample_rate, type_) {
            Ok(rate) => { formatter.with_sample_rate(rate); },
            Err(e) => {  self.repr = BuilderRepr::Error(e); }
        }
    }  
    self
}


/// Returns the sampler to use for this metric, based on features.
#[cfg(feature = "sample-rate")]
fn sampler(&self) -> Option<Sampler> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of returning an Option<Sampler> here could we use a no-op version instead at runtime?

len: usize,
}

impl<const N: usize> ByteStr<N> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the care taken to not cause any allocations but I'd be fine with doing something less involved to format the sample rate like write!(some_buf, "|@{:.6}", rate); instead of using this type.


match formatter.sampler() {
Some(sampler) => {
if let Some(sampled_metric) = sampler.sample(&metric) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like sampler.sample(m) actually needs the metric, it's effectively just returning a bool to allow the send or not.

if formatter.sampler().allow() {
    client.send_metric(&metric)?
}

Ok(())

@@ -710,17 +798,6 @@ mod tests {
assert_eq!(1, errors.load(Ordering::Acquire));
}

#[test]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this removed on purpose?

@56quarters
Copy link
Owner

Unrelated to your changes, but make sure you're running tests and linting locally since it seems like I haven't correctly set CI to run for forks.

@56quarters
Copy link
Owner

I messed around with a slight refactor of this PR which tries a slightly different approach by moving the sampling behavior to a new type. LMK which you prefer:

My preference would be for the original implementation, adding a .with_sample_rate() method to MetricBuilder. It feels more natural to me that you can only add sampling when you're "building" a metric to send, rather than applying a sample after a Metric (Counter, Gauge, etc) has already been constructed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support being able to sample metrics
2 participants