More edits to the 'how many regions' section from our man Kevin O' Dell
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1436497 13f79535-47bb-0310-9956-ffa450edef68
saintstack committed Jan 21, 2013
1 parent 89469e5 commit 378e40a
Showing 1 changed file with 20 additions and 4 deletions.
24 changes: 20 additions & 4 deletions src/docbkx/configuration.xml
index e70ebc6..96f8c27 100644
@@ -1052,9 +1052,11 @@
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>. <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>.
</para> </para>
<section xml:id="too_many_regions"> <section xml:id="too_many_regions">
<title>Too many regions</title> <title>How many regions per RegionServer?</title>
<para> <para>
Here are some issues you will run into when lots of regions per regionserver: Typically you want to keep your region count low on HBase for numerous reasons.
Usually right around 100 regions per RegionServer has yielded the best results.
Here are some of the reasons for keeping the region count low:
<unorderedlist> <unorderedlist>
<listitem><para> <listitem><para>
MSLAB requires 2mb per memstore (that's 2mb per family per region). MSLAB requires 2mb per memstore (that's 2mb per family per region).
@@ -1069,19 +1071,33 @@
at that point they should almost all have about 5MB of data so at that point they should almost all have about 5MB of data so
it would flush that amount. 5MB inserted later, it would flush another it would flush that amount. 5MB inserted later, it would flush another
region that will now have a bit over 5MB of data, and so on. region that will now have a bit over 5MB of data, and so on.
A basic formula for the number of regions to have per RegionServer
looks like this:
Heap * upper global memstore limit = amount of heap devoted to memstore,
then amount of heap devoted to memstore / (number of regions per RS * CFs)
gives the rough memstore size if every region is being written to.
A more accurate formula is
Heap * upper global memstore limit = amount of heap devoted to memstore, then
amount of heap devoted to memstore / (number of actively written regions per RS * CFs).
This allows a higher region count from the write perspective if you know how many
regions you will be writing to at one time.
</para></listitem> </para></listitem>
<listitem><para>The master as is is allergic to tons of regions, and will <listitem><para>The master as is is allergic to tons of regions, and will
take a lot of time assigning them and moving them around in batches. take a lot of time assigning them and moving them around in batches.
The reason is that it's heavy on ZK usage, and it's not very async The reason is that it's heavy on ZK usage, and it's not very async
at the moment (could really be improved -- and has been improved a bunch at the moment (could really be improved -- and has been improved a bunch
in 0.96 hbase). in 0.96 hbase).
</para></listitem> </para></listitem>
<listitem><para>
In older versions of HBase (pre-v2 HFile, 0.90 and earlier), lots of regions
on a few RSs can cause the store file index to grow, raising heap usage and
creating memory pressure or OOMEs on the RSs.
</para></listitem>
</unorderedlist> </unorderedlist>
</para> </para>
<para>Another issue is the effect of the number of regions on mapreduce jobs. <para>Another issue is the effect of the number of regions on mapreduce jobs.
Keeping 5 regions per RS would be too low for a job, whereas 1000 will generate too many maps. Keeping 5 regions per RS would be too low for a job, whereas 1000 will generate too many maps.
</para> </para>

</section> </section>


</section> </section>
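
The sizing arithmetic described in the new paragraph can be sketched as a small calculation. This is only an illustration and not part of the commit: the function names, the 16 GB heap, the 0.4 upper global memstore limit, and the region counts are assumptions chosen for the example. The only figures taken from the text are the 2mb MSLAB cost per memstore and the formula itself.

```python
# Illustrative sketch of the sizing formulas from the doc text.
# All names and example inputs here are hypothetical.

MSLAB_MB_PER_MEMSTORE = 2  # from the text: 2mb per family per region


def mslab_overhead_mb(regions, column_families):
    """Heap reserved by MSLAB: one 2mb chunk per memstore (region * family)."""
    return regions * column_families * MSLAB_MB_PER_MEMSTORE


def memstore_mb_per_store(heap_mb, upper_global_memstore_limit,
                          regions, column_families):
    """Rough memstore size available to each (region, family) pair.

    Pass the total region count for the basic formula, or only the
    actively written region count for the more accurate one.
    """
    if regions <= 0 or column_families <= 0:
        raise ValueError("regions and column_families must be positive")
    # Heap * upper global memstore limit = heap devoted to memstore
    memstore_heap_mb = heap_mb * upper_global_memstore_limit
    return memstore_heap_mb / (regions * column_families)


if __name__ == "__main__":
    heap_mb = 16 * 1024   # assumed 16 GB RegionServer heap
    limit = 0.4           # assumed upper global memstore limit

    # Basic formula: all 100 regions (1 family each) take writes.
    print(memstore_mb_per_store(heap_mb, limit, 100, 1))
    # More accurate formula: only 20 of those regions are actively written.
    print(memstore_mb_per_store(heap_mb, limit, 20, 1))
    # MSLAB cost of 1000 regions with 2 families each.
    print(mslab_overhead_mb(1000, 2))
```

Passing the actively written region count instead of the total shows why a cluster with a known, narrow write pattern can carry more regions than the basic formula suggests.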