
More edits to the 'how many regions' section from our man Kevin O' Dell

git-svn-id: 13f79535-47bb-0310-9956-ffa450edef68
1 parent 89469e5 · commit 378e40ab2b0168e5695386aa995ab1cd6a278cc7 · saintstack committed Jan 21, 2013
Showing with 20 additions and 4 deletions.
  1. +20 −4 src/docbkx/configuration.xml
@@ -1052,9 +1052,11 @@ index e70ebc6..96f8c27 100644
<link xlink:href="">HTableDescriptor</link>.
<section xml:id="too_many_regions">
- <title>Too many regions</title>
+ <title>How many regions per RegionServer?</title>
- Here are some issues you will run into when lots of regions per regionserver:
+ Typically you want to keep the region count per RegionServer low, for numerous reasons.
+ In practice, roughly 100 regions per RegionServer has yielded the best results.
+ Reasons for keeping the region count low include:
MSLAB requires 2MB per memstore (that's 2MB per family per region).
@@ -1069,19 +1071,33 @@ index e70ebc6..96f8c27 100644
at that point they should almost all have about 5MB of data, so
it would flush that amount. Another 5MB of inserts later, it would flush another
region that will now have a bit over 5MB of data, and so on.
+ A basic formula for sizing the number of regions per RegionServer starts from
+ the heap devoted to memstores:
+ heap * upper global memstore limit = heap devoted to memstores.
+ Dividing that by (number of regions per RS * number of column families) gives
+ the rough memstore size per region and family if every region is being written to.
+ A more accurate formula divides the heap devoted to memstores by
+ (number of actively written regions per RS * number of column families) instead.
+ If you know how many regions you will be writing to at one time, this allows a
+ higher region count from the write perspective.
<listitem><para>The master, as it stands, is allergic to large numbers of regions and will
take a lot of time assigning them and moving them around in batches.
The reason is that it is heavy on ZK usage and not very asynchronous
at the moment (this could really be improved, and has been improved a great deal
in HBase 0.96).
+ <listitem><para>
+ In older versions of HBase (pre-HFile v2, i.e. 0.90 and earlier), a large number of regions
+ on a few RegionServers can cause the store file index to grow, raising heap usage; this can
+ create memory pressure or OOMEs on the RegionServers.
+ </para></listitem>
<para>Another issue is the effect of the number of regions on MapReduce jobs, since a job over an
HBase table typically runs one map task per region.
Keeping 5 regions per RS would be too few for a job, whereas 1000 would generate too many maps.
- </para>
+ </para>
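The MSLAB figure quoted in the first hunk (2MB per memstore, i.e. per family per region) reduces to simple multiplication. The sketch below is a minimal illustration assuming that 2MB figure; the function `mslab_baseline_mb` and the region and family counts are hypothetical examples, not HBase code.

```python
"""Baseline MSLAB heap usage, assuming the 2MB-per-memstore figure from the text.

A sketch only; the names and example counts are hypothetical.
"""

MSLAB_MB_PER_MEMSTORE = 2  # from the text: 2MB per memstore (per family per region)


def mslab_baseline_mb(regions: int, families: int) -> int:
    """Heap in MB that MSLAB takes before any data is stored: one allotment per memstore."""
    return regions * families * MSLAB_MB_PER_MEMSTORE


if __name__ == "__main__":
    # Hypothetical RegionServer carrying 1000 regions, each with 2 column families.
    mb = mslab_baseline_mb(regions=1000, families=2)
    print(mb, "MB")  # 4000 MB
    print(round(mb / 1024, 1), "GB")  # 3.9 GB
    # The ~100 regions per RegionServer suggested above, still with 2 families.
    print(mslab_baseline_mb(regions=100, families=2), "MB")  # 400 MB
```

Because the cost scales with regions times families, cutting the region count by 10x cuts this fixed overhead by the same factor.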
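The two memstore formulas added in the second hunk can be sketched in a few lines of Python. This is a rough illustration, not HBase code: the function names, the 10240MB heap, the 0.4 upper global memstore limit, the two column families and the 128MB target memstore size are all assumed example values. The last function simply rearranges the same formula to solve for how many regions can take writes at once.

```python
"""Rough memstore sizing following the formulas in this section (a sketch, not HBase code).

All inputs are hypothetical example values; sizes are in megabytes.
"""


def memstore_heap_mb(heap_mb: float, upper_global_limit: float) -> float:
    """Heap devoted to memstores = heap * upper global memstore limit."""
    return heap_mb * upper_global_limit


def memstore_size_per_store_mb(heap_mb: float, upper_global_limit: float,
                               regions: int, column_families: int) -> float:
    """Rough memstore size per region and family when all `regions` are written to.

    Pass the number of *actively written* regions for the more accurate estimate.
    """
    return memstore_heap_mb(heap_mb, upper_global_limit) / (regions * column_families)


def max_written_regions(heap_mb: float, upper_global_limit: float,
                        target_memstore_mb: float, column_families: int) -> int:
    """Same formula solved for the region count: how many regions can be written
    concurrently while each memstore can still grow to `target_memstore_mb`."""
    budget = memstore_heap_mb(heap_mb, upper_global_limit)
    return int(budget // (target_memstore_mb * column_families))


if __name__ == "__main__":
    heap_mb = 10240  # hypothetical 10GB RegionServer heap
    limit = 0.4      # hypothetical upper global memstore limit
    cfs = 2          # hypothetical column families per table

    # Basic formula: all 100 regions on the RS are being written to.
    print(memstore_size_per_store_mb(heap_mb, limit, regions=100, column_families=cfs))  # 20.48
    # More accurate formula: only 16 of those regions take writes at once.
    print(memstore_size_per_store_mb(heap_mb, limit, regions=16, column_families=cfs))  # 128.0
    # Rearranged: regions that can be written at once with 128MB memstores.
    print(max_written_regions(heap_mb, limit, target_memstore_mb=128, column_families=cfs))  # 16
```

Counting only actively written regions in the denominator is what lets the second estimate support a higher total region count: the same memstore heap is shared by fewer concurrently growing memstores.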