Skip to content

Conversation

@tisonkun
Copy link
Member

Signed-off-by: tison <wander4096@gmail.com>
Signed-off-by: tison <wander4096@gmail.com>
Signed-off-by: tison <wander4096@gmail.com>
mod serialization;
mod sketch;

pub use self::serde::ItemsSerde;
Copy link
Member Author

@tisonkun tisonkun Dec 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To @PsiACE,

No need ItemsSerde if we already implement serde over concrete FrequentItemsSketch<i64> and FrequentItemsSketch<String>.

But the C++ impl does have a Serde abstraction you may investigate in https://github.com/apache/datasketches-cpp/blob/7bb979d3/common/include/serde.hpp

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @AlexanderSaydakov I'd appreciate it if you can share the story and usage of serde.hpp.

load_threshold: usize,
keys: Vec<Option<T>>,
values: Vec<i64>,
values: Vec<u64>,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All offsets and counts are non-negative. Use u64 for all these places.

/// Shifts all values by `adjust_amount`.
///
/// This is used during purges to decrement counters.
pub fn adjust_all_values_by(&mut self, adjust_amount: i64) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revoke pub when it's only used in the same file. Less publicity, easier reasoning.

Comment on lines -483 to -495
#[should_panic(expected = "count may not be negative")]
fn test_longs_negative_count_panics() {
let mut sketch: FrequentItemsSketch<i64> = FrequentItemsSketch::new(8);
sketch.update_with_count(1, -1);
}

#[test]
#[should_panic(expected = "count may not be negative")]
fn test_items_negative_count_panics() {
let mut sketch = FrequentItemsSketch::new(8);
sketch.update_with_count("a".to_string(), -1);
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never panic by type check.

Signed-off-by: tison <wander4096@gmail.com>
Comment on lines 35 to 37
pub use self::hash_table::ResizeFactor;
pub use self::sketch::ThetaSketch;
pub use self::sketch::ThetaSketchBuilder;
Copy link
Member Author

@tisonkun tisonkun Dec 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To @ZENOTME,

ThetaSketchBuilder and ResizeFactor are needed because ThetaSketch::builder exposes the builder, which has a resize_factor method that only accepts a ResizeFactor instance.

Copy link
Member

@PsiACE PsiACE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 💯 on frequencies sketch.

PsiACE

This comment was marked as duplicate.

Signed-off-by: tison <wander4096@gmail.com>
Comment on lines +35 to +45
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResizeFactor {
/// Do not resize. Sketch will be configured to full size.
X1,
/// Resize by factor of 2
X2,
/// Resize by factor of 4
X4,
/// Resize by factor of 8
X8,
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved ResizeFactor as a top-level concept. This is shared among ThetaSketch, TupleSketch, and several sampling sketches.

cc @leerho for double checking that put ResizeFactor a top-level exported symbol looks good.

@tisonkun
Copy link
Member Author

@leerho this PR is mergable. I'll have a follow-up to use the new codec utils for frequencies serde impls. But I don't extend this PR further to make it too mixed.

Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems nice!

@leerho leerho merged commit c2cbb7c into apache:main Dec 31, 2025
9 checks passed
@tisonkun tisonkun deleted the finetune branch December 31, 2025 06:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants