Contributor

@ZENOTME ZENOTME commented Jan 1, 2026

  • Add binomial_bounds to support calculating lower_bound and upper_bound
  • Add get_lower_bound and get_upper_bound to ThetaSketch

cc @notfilippo @Xuanwo @PsiACE @tisonkun

/// assert!(lower_bound <= estimate);
/// assert!(estimate <= upper_bound);
/// ```
pub fn get_lower_bound(&self, num_std_devs: u32) -> Result<f64, Error> {
Member

As mentioned in the other PR, we don't need the get_ prefix.

/// assert!(lower_bound <= estimate);
/// assert!(estimate <= upper_bound);
/// ```
pub fn get_upper_bound(&self, num_std_devs: u32) -> Result<f64, Error> {
Member

Same as above

Contributor

leerho commented Jan 4, 2026

Waiting on requested change.

Comment on lines +154 to +159
pub fn lower_bound(&self, num_std_devs: u32) -> Result<f64, Error> {
    if !self.is_estimation_mode() {
        return Ok(self.num_retained() as f64);
    }
    binomial_bounds::lower_bound(self.num_retained() as u64, self.theta(), num_std_devs)
}
Member

We already have a NumStdDev enum:

/// Number of standard deviations for confidence bounds
///
/// This enum specifies the number of standard deviations to use when computing
/// upper and lower bounds for cardinality estimates. Higher values provide wider
/// confidence intervals with greater certainty that the true cardinality falls
/// within the bounds.
#[repr(u8)]
pub enum NumStdDev {
    /// One standard deviation (~68% confidence interval)
    One = 1,
    /// Two standard deviations (~95% confidence interval)
    Two = 2,
    /// Three standard deviations (~99.7% confidence interval)
    Three = 3,
}

You can reuse it by moving the enum to a shared module. Since an invalid number of standard deviations could then no longer be passed in, we would never need to return an error here.

Besides, we should consider where to place these shared types, including ResizeFactor. datasketches::{NumStdDev, ResizeFactor} is fair enough for now, but it may pollute the top-level namespace too much.
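
For illustration, the suggestion to reuse NumStdDev could look roughly like the sketch below. `SketchStub`, its fields, and the normal-approximation bound are assumptions made up for this example; they only stand in for ThetaSketch and for binomial_bounds::lower_bound, which are not reproduced here. The point is the signature: with the enum as the parameter type, an out-of-range value cannot be expressed, so the method can return f64 directly.

```rust
/// Same variants as the enum quoted above; the derives are added for this sketch.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum NumStdDev {
    One = 1,
    Two = 2,
    Three = 3,
}

/// Hypothetical stand-in for ThetaSketch (fields are assumptions).
pub struct SketchStub {
    pub retained: u64,
    pub theta: f64,
}

impl SketchStub {
    pub fn is_estimation_mode(&self) -> bool {
        self.theta < 1.0
    }

    pub fn estimate(&self) -> f64 {
        self.retained as f64 / self.theta
    }

    /// No `Result` needed: the enum rules out invalid inputs at compile time.
    /// The body is a rough normal approximation used as a placeholder,
    /// NOT the binomial-bounds computation from this PR.
    pub fn lower_bound(&self, num_std_devs: NumStdDev) -> f64 {
        let n = self.retained as f64;
        if !self.is_estimation_mode() {
            return n;
        }
        // A fieldless #[repr(u8)] enum casts to its discriminant.
        let k = num_std_devs as u8 as f64;
        let std_dev = (n * (1.0 - self.theta)).sqrt() / self.theta;
        // The true count can never be below the number of retained entries.
        (self.estimate() - k * std_dev).max(n)
    }
}
```

A caller would then write `sketch.lower_bound(NumStdDev::Two)` with no error handling at the call site.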

Comment on lines +748 to +764
// Comment for this larger test like Java
// for ci in 1..=3 {
//     let arr = run_test_aux(2000, ci, 1e-7);
//     for j in 0..5 {
//         let ratio = arr[j] / STD[i][j];
//         assert!(
//             (ratio - 1.0).abs() < TOL,
//             "ci={}, j={}: expected {}, got {}, ratio={}",
//             ci,
//             j,
//             STD[i][j],
//             arr[j],
//             ratio
//         );
//     }
//     i += 1;
// }
Member

How long does it take to run? I'd prefer not to comment out tests.
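
One possible alternative to commenting the test out, assuming it is merely slow rather than broken, is Rust's built-in `#[ignore]` attribute. The sketch below uses a made-up `expensive_check` function as a stand-in for the real test body; the test name is hypothetical as well.

```rust
/// Hypothetical stand-in for the expensive computation under test.
fn expensive_check(n: u64) -> u64 {
    (1..=n).sum()
}

#[cfg(test)]
mod tests {
    use super::*;

    // Skipped by a plain `cargo test`, but still compiled and type-checked.
    #[test]
    #[ignore = "slow; run with `cargo test -- --ignored`"]
    fn large_bounds_check() {
        assert_eq!(expensive_check(2000), 2_001_000);
    }
}
```

Unlike a commented-out block, an ignored test keeps compiling as the code around it changes, and it can still be run on demand with `cargo test -- --ignored`.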

Contributor

I can't seem to find the source for this commented-out test in either C++ or Java. Could you give me a full link?

Member

@tisonkun tisonkun left a comment
