Semantics of expected_max_items_in_table changed #35

Closed
GoogleCodeExporter opened this issue Apr 11, 2015 · 2 comments
Between versions 1.1 and 1.4 the semantics of the constructor argument
expected_max_items_in_table changed due to this patch:

-    : hash(hf), equals(eql), get_key(ext), num_deleted(0),
-      use_deleted(false), delval(), table(min_size(0, n)) {    // start small
+    : hash(hf), equals(eql), get_key(ext), num_deleted(0), use_deleted(false),
+      delval(), enlarge_resize_percent(HT_OCCUPANCY_FLT),
+      shrink_resize_percent(HT_EMPTY_FLT),
+      table(expected_max_items_in_table == 0
+            ? HT_DEFAULT_STARTING_BUCKETS
+            : min_size(expected_max_items_in_table, 0)) {

The argument used to be passed as the second argument to min_size
(min_buckets_wanted) in version 1.1, and is now passed as the first
argument to min_size (num_elts).

This change has a large effect on the memory requirements of
sparse_hashtable. With version 1.1,
sparse_hash_set(1<<30)
would require 256 MB of RAM, whereas with version 1.4 it requires double
that: 512 MB of RAM.

Are the semantics of version 1.4 the correct, intended semantics, or a
regression? How do versions 1.2 and 1.3 behave?

The comment to min_size says:
  // This is the smallest size a hashtable can be without being too crowded
  // If you like, you can give a min #buckets as well as a min #elts
It does not define the term `elt'. What does it mean?

Cheers,
Shaun

Original issue reported on code.google.com by sjackman on 4 Mar 2009 at 12:21


Yes, this is intentional.  See
http://groups.google.com/group/google-sparsehash/browse_thread/thread/3d6e83a8ac9b828b#.
I probably should have announced it better.

Since the table size is always a power of two, I'm not surprised you're
seeing a doubling in size.  In the past, the hashtable was really too small
to hold 1<<30 items.  Now it should be sized more appropriately for that.
If you want a smaller hashtable, it's best to give it an arg of less than
1<<30, indicating you'll be inserting fewer than a billion items.  Or you
could raise HT_OCCUPANCY_FLT.

} The comment to min_size says:
}   // This is the smallest size a hashtable can be without being too crowded
}   // If you like, you can give a min #buckets as well as a min #elts
} It does not define the term `elt'. What does it mean?

elt is an abbreviation for element.


Original comment by csilv...@gmail.com on 4 Mar 2009 at 8:56

  • Changed state: Invalid
  • Added labels: Priority-Medium, Type-Defect


Thanks for the explanation. I'll fix my code to account for
HT_OCCUPANCY_FLT. In my example above, 2^30 is a very rough estimate of the
number of expected items, but the table size being rounded up to 2^31
buckets blows my memory budget. I've simply picked a new `estimate' that is
less than 2^30*HT_OCCUPANCY_FLT.

Cheers,
Shaun

Original comment by sjackman on 4 Mar 2009 at 9:47
