
reduce error tightening ratio to 0.499 #53

Merged
merged 2 commits into bitly:master on Sep 18, 2012

Conversation

ploxiln
Contributor

@ploxiln commented Sep 17, 2012

necessary for scaling bloom's false-positive rate to not increase
well beyond the desired error rate when multiple levels are full

setting it at exactly 0.5 caused a segfault I don't understand

@mreiferson
Contributor

Didn't we have some issues (perhaps separately resolved) where setting it lower was problematic?

@ploxiln
Contributor Author

ploxiln commented Sep 18, 2012

I don't think we had any issues with lower numbers. It was originally at 0.9, and then at 0.7. Lower numbers mean a bigger file and better accuracy (and are only applicable to scaling blooms). I think we want the overall false-positive rate to approach, but not exceed, the specified accuracy. It wasn't at 0.5 before because we weren't sure how we were going to judge overall accuracy; since then, I decided that the false-positive rate was probably the best measure.

My mention of 0.5 causing problems is probably due to a compiler optimization, maybe just on my distro / toolchain. Of all the values we've tried, 0.5 is the only one that can be represented exactly as a base-2 float (or double). I'll investigate a bit more.

@ploxiln
Contributor Author

ploxiln commented Sep 18, 2012

ready for re-review @mreiferson @jphines

An error tightening ratio of 0.5, combined with a capacity of 100,000 and a target error rate of 0.05, just happened to trigger a rare bug when re-opening a scaling bloom: one of the counting blooms started at, or within 12 bytes before, a memory page boundary.

@mccutchen

holy crap that sounds like a strange bug to track down

/me runs back to Python

@ploxiln
Contributor Author

ploxiln commented Sep 18, 2012

@mccutchen not really; the bug is simply that the code was reading just past the part of the file it memory-mapped. That was usually "ok" because the OS has to map an entire page at a time, so it had usually mapped past that point anyway. I'm just lucky that I happened to run into it.

@mreiferson
Contributor

squash into 2

when mapping to an existing file, we don't get access to those
numbers until after new_counting_bloom_from_scale() calls bitmap_resize()

this went unnoticed for so long because the os mmaps the file a page
at a time, so it almost always maps a bit more of the file than was
requested

necessary for scaling bloom's false-positive rate to not increase
well beyond the desired error rate when multiple levels are full
@ploxiln
Contributor Author

ploxiln commented Sep 18, 2012

squashed

mreiferson added a commit that referenced this pull request Sep 18, 2012
reduce error tightening ratio to 0.499
@mreiferson mreiferson merged commit 1727ee1 into bitly:master Sep 18, 2012