Skip to content

wrapped compact theta sketch#223

Merged
AlexanderSaydakov merged 4 commits intomasterfrom
wrapped_compact_theta
Jun 30, 2021
Merged

wrapped compact theta sketch#223
AlexanderSaydakov merged 4 commits intomasterfrom
wrapped_compact_theta

Conversation

@AlexanderSaydakov
Copy link
Contributor

No description provided.

@coveralls
Copy link

coveralls commented Jun 23, 2021

Pull Request Test Coverage Report for Build 987577788

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 20 of 29 (68.97%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.07%) to 97.927%

Changes Missing Coverage Covered Lines Changed/Added Lines %
theta/include/compact_theta_sketch_parser_impl.hpp 20 29 68.97%
Totals Coverage Status
Change from base Build 961626121: -0.07%
Covered Lines: 7415
Relevant Lines: 7572

💛 - Coveralls

@AlexanderSaydakov
Copy link
Contributor Author

Theta union lgk16 wrapped compact

This is a comparison of union update time with compact sketches ready at hand, compact sketches with deserialize time included, and wrapped compact sketches (effectively avoiding some cost of deserialization).
We can see that performing a union of wrapped sketches is even slightly faster than using compact sketch objects ready at hand presumably because of a faster iterator over the wrapped sketches.

Copy link
Contributor

@jmalkin jmalkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe not a big deal, but none of the tests override the default seed to explicitly test for seed mismatch

const size_t entries_start_u64 = has_theta ? COMPACT_SKETCH_ENTRIES_ESTIMATION_U64 : COMPACT_SKETCH_ENTRIES_EXACT_U64;
const uint64_t* entries = reinterpret_cast<const uint64_t*>(ptr) + entries_start_u64;
const size_t expected_size_bytes = (entries_start_u64 + num_entries) * sizeof(uint64_t);
if (size < expected_size_bytes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would we want to count too large a size as an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure. I can imagine a larger record and a user who does not know the size of the sketch. The primary goal here was to avoid reading past the end.

const size_t expected_size_bytes = (entries_start_u64 + num_entries) * sizeof(uint64_t);
if (size < expected_size_bytes)
throw std::invalid_argument(std::to_string(expected_size_bytes)
+ " bytes expected, actual " + std::to_string(size) + ", sketch dump: " + hex_dump(reinterpret_cast<const uint8_t*>(ptr), size));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this dump can be quite large (MBs) in some cases. not sure if we need to consider the data security risk in that it'll happily dump arbitrary amounts of memory given a pointer and size

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Perhaps this should be made configurable.

return std::vector<char>(other);
}

TEST_CASE() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this test case and its supporting functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this was left there accidentally after some experimenting.

@jmalkin
Copy link
Contributor

jmalkin commented Jun 30, 2021

👍

@AlexanderSaydakov AlexanderSaydakov merged commit 31163d9 into master Jun 30, 2021
@AlexanderSaydakov AlexanderSaydakov deleted the wrapped_compact_theta branch June 30, 2021 23:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants