Skip to content

Loading…

Use type long for offset #55

Merged
merged 1 commit into from

3 participants

@lilydjwg

On 64bit Linux, when the filter file grows more than 2G, the 32bit offset will overflow.

@ploxiln

If you want offset to be 64 bits, I would prefer the type uint64_t, because long will be 32 bits on most 32bit linux platforms (and 64 bits on most 64 bit linux platforms), so it will be inconsistent between platforms.

As you note, a 32 bit offset limits a single filter to 2G, but single filters don't "grow", they're created at a particular size. The scaling bloom creates new single filters larger and larger, but by the time they reach 2G, isn't that way larger than you would want them to be anyway?

How large are the scaling bloom and counting bloom filters you are using?

@lilydjwg

I think long has the same size and sign as pointers on most linux platforms, and that's what offset is added to.

I just tried to add a lot of data to a scaling bloom for testing purpose. I don't think I will need such big a filter in reality. But I think it's better to allow it to be bigger by just adjusting integer size.

@mreiferson

i'd prefer the un-ambiguous uint64_t as well

@ploxiln

Well it's a good point, not much more than 2GiB can be mem-mapped on 32-bit platforms anyway... I'll take a closer look and see if this is sufficient and robust

@ploxiln ploxiln commented on an outdated diff
src/dablooms.c
@@ -252,7 +252,8 @@ counting_bloom_t *new_counting_bloom(unsigned int capacity, double error_rate, c
int counting_bloom_add(counting_bloom_t *bloom, const char *s, size_t len)
{
- unsigned int index, i, offset;
+ unsigned int index, i;
+ long offset;
@ploxiln
ploxiln added a note

This offset is not the offset of the counting bloom in the scaling bloom, it's the offset of a particular hash functions bits into a counting bloom, so it doesn't help to change this to long. Notice that this offset is only used to calculate index, which is still "unsigned int".

@ploxiln
ploxiln added a note

ditto for counting_bloom_remove() and counting_bloom_check()

@lilydjwg
lilydjwg added a note

Oops... I'll fix them soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@ploxiln ploxiln commented on an outdated diff
src/dablooms.c
@@ -319,7 +322,8 @@ int free_scaling_bloom(scaling_bloom_t *bloom)
/* creates a new counting bloom filter from a given scaling bloom filter, with count and id */
counting_bloom_t *new_counting_bloom_from_scale(scaling_bloom_t *bloom)
{
- int i, offset;
+ unsigned int i;
@ploxiln
ploxiln added a note

we don't really need this i which counts off counting blooms in the scaling bloom to go above 2 billion

@lilydjwg
lilydjwg added a note

This should be a mistake. Sorry, I should review before I sent the pull request.

@ploxiln
ploxiln added a note

don't be sorry, we (Bitly people) like to have a lot of back-and-forth in pull requests, and then squash all fixup commits together at the end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@ploxiln ploxiln commented on an outdated diff
src/dablooms.c
@@ -544,7 +545,7 @@ scaling_bloom_t *new_scaling_bloom_from_file(unsigned int capacity, double error
{
int fd;
off_t size;
- unsigned int offset;
+ long offset;
@ploxiln
ploxiln added a note

A funny thing about this offset is that it's not really needed. I think you can remove all uses of it in this function, and change
size -= offset to size -= sizeof(scaling_bloom_header_t)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@ploxiln

I didn't notice you updated this until now, sorry. Can you do one last thing, and squash all your commits into one commit? (git rebase -i bitly/master and replace "pick" with "squash" before the 2nd and 3rd commits in the list)

@lilydjwg
@ploxiln

thanks

@ploxiln ploxiln merged commit c63b0ee into bitly:master
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Jan 28, 2013
  1. @lilydjwg
Showing with 14 additions and 16 deletions.
  1. +10 −12 src/dablooms.c
  2. +4 −4 src/dablooms.h
View
22 src/dablooms.c
@@ -109,9 +109,9 @@ bitmap_t *new_bitmap(int fd, size_t bytes)
return bitmap;
}
-int bitmap_increment(bitmap_t *bitmap, unsigned int index, unsigned int offset)
+int bitmap_increment(bitmap_t *bitmap, unsigned int index, long offset)
{
- uint32_t access = index / 2 + offset;
+ long access = index / 2 + offset;
uint8_t temp;
uint8_t n = bitmap->array[access];
if (index % 2 != 0) {
@@ -132,9 +132,9 @@ int bitmap_increment(bitmap_t *bitmap, unsigned int index, unsigned int offset)
}
/* increments the four bit counter */
-int bitmap_decrement(bitmap_t *bitmap, unsigned int index, unsigned int offset)
+int bitmap_decrement(bitmap_t *bitmap, unsigned int index, long offset)
{
- uint32_t access = index / 2 + offset;
+ long access = index / 2 + offset;
uint8_t temp;
uint8_t n = bitmap->array[access];
@@ -156,9 +156,9 @@ int bitmap_decrement(bitmap_t *bitmap, unsigned int index, unsigned int offset)
}
/* decrements the four bit counter */
-int bitmap_check(bitmap_t *bitmap, unsigned int index, unsigned int offset)
+int bitmap_check(bitmap_t *bitmap, unsigned int index, long offset)
{
- unsigned int access = index / 2 + offset;
+ long access = index / 2 + offset;
if (index % 2 != 0 ) {
return bitmap->array[access] & 0x0f;
} else {
@@ -211,7 +211,7 @@ int free_counting_bloom(counting_bloom_t *bloom)
return 0;
}
-counting_bloom_t *counting_bloom_init(unsigned int capacity, double error_rate, unsigned int offset)
+counting_bloom_t *counting_bloom_init(unsigned int capacity, double error_rate, long offset)
{
counting_bloom_t *bloom;
@@ -319,7 +319,8 @@ int free_scaling_bloom(scaling_bloom_t *bloom)
/* creates a new counting bloom filter from a given scaling bloom filter, with count and id */
counting_bloom_t *new_counting_bloom_from_scale(scaling_bloom_t *bloom)
{
- int i, offset;
+ int i;
+ long offset;
double error_rate;
counting_bloom_t *cur_bloom;
@@ -544,7 +545,6 @@ scaling_bloom_t *new_scaling_bloom_from_file(unsigned int capacity, double error
{
int fd;
off_t size;
- unsigned int offset;
scaling_bloom_t *bloom;
counting_bloom_t *cur_bloom;
@@ -564,13 +564,11 @@ scaling_bloom_t *new_scaling_bloom_from_file(unsigned int capacity, double error
bloom = scaling_bloom_init(capacity, error_rate, filename, fd);
- offset = sizeof(scaling_bloom_header_t);
- size -= offset;
+ size -= sizeof(scaling_bloom_header_t);
while (size) {
cur_bloom = new_counting_bloom_from_scale(bloom);
// leave count and id as they were set in the file
size -= cur_bloom->num_bytes;
- offset += cur_bloom->num_bytes;
if (size < 0) {
free_scaling_bloom(bloom);
fprintf(stderr, "Error, Actual filesize and expected filesize are not equal\n");
View
8 src/dablooms.h
@@ -17,9 +17,9 @@ typedef struct {
bitmap_t *bitmap_resize(bitmap_t *bitmap, size_t old_size, size_t new_size);
bitmap_t *new_bitmap(int fd, size_t bytes);
-int bitmap_increment(bitmap_t *bitmap, unsigned int index, unsigned int offset);
-int bitmap_decrement(bitmap_t *bitmap, unsigned int index, unsigned int offset);
-int bitmap_check(bitmap_t *bitmap, unsigned int index, unsigned int offset);
+int bitmap_increment(bitmap_t *bitmap, unsigned int index, long offset);
+int bitmap_decrement(bitmap_t *bitmap, unsigned int index, long offset);
+int bitmap_check(bitmap_t *bitmap, unsigned int index, long offset);
int bitmap_flush(bitmap_t *bitmap);
void free_bitmap(bitmap_t *bitmap);
@@ -34,7 +34,7 @@ typedef struct {
typedef struct {
counting_bloom_header_t *header;
unsigned int capacity;
- unsigned int offset;
+ long offset;
unsigned int counts_per_func;
uint32_t *hashes;
size_t nfuncs;
Something went wrong with that request. Please try again.