HBASE-23562 [operator tools] Add a RegionsMerge tool that allows for merging multiple adjacent regions until a desired number of regions is reached.

Co-authored-by: BukrosSzabolcs <bukros.szabolcs@gmail.com>

Closes #56

Signed-off-by: Josh Elser <elserj@apache.org>
wchevreuil authored and joshelser committed Apr 10, 2020
1 parent 3e9a150 commit 9aa27feea4f6b214a59842685a500b7c05c682d8
Showing 7 changed files with 719 additions and 0 deletions.
@@ -60,5 +60,10 @@
<artifactId>hbase-hbck2</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hbase.operator.tools</groupId>
<artifactId>hbase-tools</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>
</project>
@@ -0,0 +1,87 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Apache HBase Tool for merging regions

_RegionsMerger_ is a utility tool for manually merging multiple regions of
a given table. It is mainly useful when an HBase cluster has too many regions
per RegionServer, and many of these regions are small enough to be merged
together, reducing the total number of regions in the cluster and freeing
RegionServer memory.

This may happen because of mistaken pre-splits, or after a purge of table
data, as regions would not be automatically merged.

## Setup
Make sure the HBase tools jar is added to the HBase classpath:

```
export HBASE_CLASSPATH=$HBASE_CLASSPATH:./hbase-tools-1.1.0-SNAPSHOT.jar
```

## Usage

_RegionsMerger_ requires two arguments: 1) the name of the table whose regions
will be merged; 2) the desired total number of regions for that table. For
example, to merge regions of table `my-table` until it reaches a total of
5 regions, assuming the _setup_ step above has been performed:

```
$ hbase org.apache.hbase.RegionsMerger my-table 5
```

## Implementation Details

_RegionsMerger_ uses the client API
_org.apache.hadoop.hbase.client.Admin.getRegions_ to fetch the list of regions
for the specified table, then iterates through the resulting list, identifying
pairs of adjacent regions. For each pair found, it submits a merge request using
the _org.apache.hadoop.hbase.client.Admin.mergeRegionsAsync_ client API method.
This means multiple merge requests have been submitted by the time the whole
list has been iterated.
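
The pairing step can be illustrated with a minimal, self-contained sketch. It
does not use the HBase client: the `Region` class, `adjacent` and `pairAdjacent`
are hypothetical names introduced here, and adjacency is assumed to mean that
one region's end key equals the next region's start key. The real tool works on
the list returned by `Admin.getRegions` and submits each pair through
`Admin.mergeRegionsAsync` instead of collecting it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AdjacentPairsSketch {

  /** Hypothetical stand-in for a region: only its key range is modelled. */
  static final class Region {
    final String startKey;
    final String endKey;

    Region(String startKey, String endKey) {
      this.startKey = startKey;
      this.endKey = endKey;
    }

    @Override
    public String toString() {
      return "[" + startKey + ", " + endKey + ")";
    }
  }

  /** Assumption: two regions are adjacent when the first ends where the second starts. */
  static boolean adjacent(Region first, Region second) {
    return first.endKey.equals(second.startKey);
  }

  /** Walks a list sorted by start key and collects non-overlapping adjacent pairs. */
  static List<Region[]> pairAdjacent(List<Region> sortedRegions) {
    List<Region[]> pairs = new ArrayList<>();
    Region previous = null;
    for (Region current : sortedRegions) {
      if (previous != null && adjacent(previous, current)) {
        pairs.add(new Region[] {previous, current});
        previous = null; // both regions are taken; look for a fresh pair
      } else {
        previous = current;
      }
    }
    return pairs;
  }

  public static void main(String[] args) {
    List<Region> regions = Arrays.asList(
        new Region("", "b"), new Region("b", "d"), new Region("d", "f"),
        new Region("f", "h"), new Region("h", ""));
    // Five regions yield two pairs; the last region stays unpaired this round.
    for (Region[] pair : pairAdjacent(regions)) {
      System.out.println(pair[0] + " + " + pair[1]);
    }
  }
}
```

With an odd number of regions, one region is always left over for the next
round, which is why a single round cannot reduce the count below half.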

Assuming that all merges issued by _RegionsMerger_ are successful, a single
round reduces the number of regions to, at best, roughly half the original
number, since each merge combines two regions into one. This resulting total
might still be greater than the target value passed as parameter, in which case
_RegionsMerger_ performs another round of merge requests, this time over
the currently existing regions (it fetches another list of regions from
_org.apache.hadoop.hbase.client.Admin.getRegions_).
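
As a worked example of the round arithmetic, the sketch below estimates how
many rounds are needed in the best case, where every submitted merge succeeds
and no pair is skipped. The class and method names are hypothetical, and the
assumption that a round stops submitting merges once the target is within
reach is made for this sketch, not taken from the tool.

```java
final class MergeRoundsSketch {

  /** Best-case region count after one round: pair up regions, but do not go below target. */
  static int regionsAfterRound(int current, int target) {
    int maxMerges = current / 2;              // each merge consumes two regions
    int needed = Math.max(current - target, 0); // merges still required to hit the target
    return current - Math.min(maxMerges, needed);
  }

  /** Number of best-case rounds needed to get from initial down to target regions. */
  static int roundsNeeded(int initial, int target) {
    if (target < 1) {
      throw new IllegalArgumentException("target must be at least 1");
    }
    int rounds = 0;
    int current = initial;
    while (current > target) {
      current = regionsAfterRound(current, target);
      rounds++;
    }
    return rounds;
  }

  public static void main(String[] args) {
    // 100 -> 50 -> 25 -> 13 -> 7 -> 5, so five rounds
    System.out.println(roundsNeeded(100, 5));
  }
}
```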

Merge requests are processed asynchronously. HBase may take some time to
complete merge requests, so _RegionsMerger_ may sleep between rounds of
iterating over regions and sending requests. The sleep duration is configured
by the `hbase.tools.merge.sleep` property, in milliseconds, and it defaults to
`2000` (2 seconds).

While iterating through the list of regions, once a pair of adjacent regions is
detected, _RegionsMerger_ checks the current file system size of each region
(excluding MOB data) before deciding whether to submit the merge request for the
pair. If the combined size of both regions exceeds a threshold, the merge is not
attempted. This threshold is a configurable percentage of the
`hbase.hregion.max.filesize` value, and is applied to prevent merged regions
from being split again immediately after the merge completes, which would happen
automatically if the resulting region size reached the
`hbase.hregion.max.filesize` value. The percentage is expressed as a double
value (for example, `0.9` for 90%), configurable via the
`hbase.tools.merge.upper.mark` property, and it defaults to `0.9`.
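
The size check can be sketched as plain arithmetic. The class and method names
below are hypothetical, region sizes are passed in as byte counts rather than
read from the file system, and the 10 GiB value is the stock HBase default for
`hbase.hregion.max.filesize`, which a given cluster may override.

```java
final class MergeSizeCheckSketch {

  /** Stock HBase default for hbase.hregion.max.filesize (10 GiB); clusters may override it. */
  static final long DEFAULT_MAX_FILESIZE = 10L * 1024 * 1024 * 1024;

  /** Default for hbase.tools.merge.upper.mark, as stated in this README. */
  static final double DEFAULT_UPPER_MARK = 0.9;

  /** A pair qualifies only if its combined size does not exceed maxFileSize * upperMark. */
  static boolean canMerge(long sizeA, long sizeB, long maxFileSize, double upperMark) {
    double threshold = maxFileSize * upperMark;
    return (double) (sizeA + sizeB) <= threshold;
  }

  public static void main(String[] args) {
    long gib = 1024L * 1024 * 1024;
    // 3 GiB + 4 GiB = 7 GiB, below 90% of 10 GiB: the merge would be submitted
    System.out.println(canMerge(3 * gib, 4 * gib, DEFAULT_MAX_FILESIZE, DEFAULT_UPPER_MARK));
    // 5 GiB + 6 GiB = 11 GiB, above 90% of 10 GiB: the pair would be skipped
    System.out.println(canMerge(5 * gib, 6 * gib, DEFAULT_MAX_FILESIZE, DEFAULT_UPPER_MARK));
  }
}
```

Lowering the upper mark leaves more headroom before the merged region reaches
the split size, at the cost of fewer pairs qualifying for a merge.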

Given this `hbase.hregion.max.filesize` restriction on merge results, it may be
impossible to reach the desired total number of regions.
_RegionsMerger_ tracks the progress of region merges on each round.
If no progress is observed after a configurable number of rounds,
_RegionsMerger_ aborts automatically. The limit of rounds without progress is an
integer value configured via the `hbase.tools.max.iterations.blocked` property.
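
The abort condition can be sketched as a loop with a counter of rounds that
made no progress. Everything here is hypothetical: a round is modelled as a
function from the current region count to the next one, the limit of `3` is an
arbitrary illustration value, and resetting the counter whenever a round makes
progress (so that only consecutive stalled rounds count) is an assumption of
this sketch.

```java
import java.util.function.IntUnaryOperator;

final class ProgressGuardSketch {

  /**
   * Runs merge rounds until the target is reached or until maxBlockedRounds
   * consecutive rounds leave the region count unchanged. Returns the final count.
   */
  static int run(int initial, int target, int maxBlockedRounds, IntUnaryOperator round) {
    int current = initial;
    int blocked = 0;
    while (current > target && blocked < maxBlockedRounds) {
      int next = round.applyAsInt(current);
      if (next < current) {
        blocked = 0; // progress was made, so reset the counter
      } else {
        blocked++;   // nothing merged in this round
      }
      current = next;
    }
    return current;
  }

  public static void main(String[] args) {
    // Simulated table: merges succeed down to 8 regions, then every pair is too large.
    IntUnaryOperator stallsAtEight = n -> n > 8 ? (n + 1) / 2 : n;
    // 64 -> 32 -> 16 -> 8, then three stalled rounds, so the run gives up at 8
    System.out.println(run(64, 5, 3, stallsAtEight));
  }
}
```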
@@ -0,0 +1,203 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<!--
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-->
<modelVersion>4.0.0</modelVersion>
<parent>
<artifactId>hbase-operator-tools</artifactId>
<groupId>org.apache.hbase.operator.tools</groupId>
<version>1.1.0-SNAPSHOT</version>
<relativePath>..</relativePath>
</parent>


<artifactId>hbase-tools</artifactId>
<name>Apache HBase - HBase Tools</name>
<description>Utility Maintenance tools for HBase 2+</description>
<properties>
<hbase-thirdparty.version>2.2.1</hbase-thirdparty.version>
<log4j2.version>2.11.1</log4j2.version>
</properties>

<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
<version>${log4j2.version}</version>
</dependency>

<!--We want to use the shaded client but for testing, we need to rely on hbase-server.
HBASE-15666 is about how shaded-client and hbase-server won't work together.
TODO: Fix.-->

<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>${hbase.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-shaded-testing-util</artifactId>
<version>${hbase.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-zookeeper</artifactId>
<version>${hbase.version}</version>
<scope>provided</scope>
<type>test-jar</type>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>${hbase.version}</version>
<scope>provided</scope>
<type>test-jar</type>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-testing-util</artifactId>
<version>${hbase.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>
<version>2.1.0</version>
<scope>test</scope>
</dependency>
</dependencies>

<build>
<resources>
<resource>
<directory>src/main/resources</directory>
<filtering>true</filtering>
</resource>
</resources>
<testResources>
<testResource>
<directory>src/test/resources/META-INF/</directory>
<targetPath>META-INF/</targetPath>
<includes>
<include>NOTICE</include>
</includes>
<filtering>true</filtering>
</testResource>
</testResources>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-remote-resources-plugin</artifactId>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
</plugin>
<!-- Make a jar and put the sources in the jar -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
</plugin>
<!-- Used for packaging a fat jar -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.0</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<artifactSet>
<excludes>
<exclude>classworlds:classworlds</exclude>
<exclude>junit:junit</exclude>
<exclude>jmock:*</exclude>
<exclude>*:xml-apis</exclude>
<exclude>org.apache.maven:lib:tests</exclude>
<exclude>log4j:log4j:jar:</exclude>
</excludes>
</artifactSet>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-checkstyle-plugin</artifactId>
<configuration>
<failOnViolation>true</failOnViolation>
</configuration>
</plugin>
</plugins>
</build>
<profiles>
<!-- Needs to match the profile in apache parent pom -->
<profile>
<id>apache-release</id>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-resources-plugin</artifactId>
<executions>
<execution>
<id>license-javadocs</id>
<phase>prepare-package</phase>
<goals>
<goal>copy-resources</goal>
</goals>
<configuration>
<outputDirectory>${project.build.directory}/apidocs</outputDirectory>
<resources>
<resource>
<directory>src/main/javadoc/META-INF/</directory>
<targetPath>META-INF/</targetPath>
<includes>
<include>NOTICE</include>
</includes>
<filtering>true</filtering>
</resource>
</resources>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</profile>
</profiles>
</project>
