-
Notifications
You must be signed in to change notification settings - Fork 10
chore: fine tune frequencies and theta sketches #51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: tison <wander4096@gmail.com>
Signed-off-by: tison <wander4096@gmail.com>
| mod serialization; | ||
| mod sketch; | ||
|
|
||
| pub use self::serde::ItemsSerde; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To @PsiACE,
No need ItemsSerde if we already implement serde over concrete FrequentItemsSketch<i64> and FrequentItemsSketch<String>.
But the C++ impl does have a Serde abstraction you may investigate in https://github.com/apache/datasketches-cpp/blob/7bb979d3/common/include/serde.hpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @AlexanderSaydakov I'd appreciate it if you can share the story and usage of serde.hpp.
| load_threshold: usize, | ||
| keys: Vec<Option<T>>, | ||
| values: Vec<i64>, | ||
| values: Vec<u64>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All offsets and counts are non-negative. Use u64 for all these places.
| /// Shifts all values by `adjust_amount`. | ||
| /// | ||
| /// This is used during purges to decrement counters. | ||
| pub fn adjust_all_values_by(&mut self, adjust_amount: i64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revoke pub when it's only used in the same file. Less publicity, easier reasoning.
| #[should_panic(expected = "count may not be negative")] | ||
| fn test_longs_negative_count_panics() { | ||
| let mut sketch: FrequentItemsSketch<i64> = FrequentItemsSketch::new(8); | ||
| sketch.update_with_count(1, -1); | ||
| } | ||
|
|
||
| #[test] | ||
| #[should_panic(expected = "count may not be negative")] | ||
| fn test_items_negative_count_panics() { | ||
| let mut sketch = FrequentItemsSketch::new(8); | ||
| sketch.update_with_count("a".to_string(), -1); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never panic by type check.
datasketches/src/theta/mod.rs
Outdated
| pub use self::hash_table::ResizeFactor; | ||
| pub use self::sketch::ThetaSketch; | ||
| pub use self::sketch::ThetaSketchBuilder; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To @ZENOTME,
ThetaSketchBuilder and ResizeFactor are needed because ThetaSketch::builder exposes the builder, which has a resize_factor method that only accepts a ResizeFactor instance.
PsiACE
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm 💯 on frequencies sketch.
Signed-off-by: tison <wander4096@gmail.com>
| #[derive(Debug, Clone, Copy, PartialEq, Eq)] | ||
| pub enum ResizeFactor { | ||
| /// Do not resize. Sketch will be configured to full size. | ||
| X1, | ||
| /// Resize by factor of 2 | ||
| X2, | ||
| /// Resize by factor of 4 | ||
| X4, | ||
| /// Resize by factor of 8 | ||
| X8, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved ResizeFactor as a top-level concept. This is shared among ThetaSketch, TupleSketch, and several sampling sketches.
cc @leerho for double checking that put ResizeFactor a top-level exported symbol looks good.
|
@leerho this PR is mergable. I'll have a follow-up to use the new codec utils for frequencies serde impls. But I don't extend this PR further to make it too mixed. |
Xuanwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems nice!
cc @PsiACE @ZENOTME