HBASE-25266 [hbase-operator-tools] Add a repair tool for moving stale… (
#78)

Signed-off-by: Sean Busbey <busbey@apache.org>
wchevreuil committed Jan 25, 2021
1 parent b461d58 commit 97603975f2abd3c3a7503e6e047df0dc5d2cad3a
Showing 5 changed files with 377 additions and 10 deletions.
@@ -25,11 +25,16 @@
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.hbase.HConstants;
@@ -103,6 +108,61 @@ public static Path getRootDir(final Configuration c) throws IOException {
}

/**
* Copy all files/subdirectories from source path to destination path.
*
* COPIED from FSUtils.copyFilesParallel
*
* @param srcFS FileSystem instance for the source path
* @param src source path
* @param dstFS FileSystem instance for the destination path
* @param dst destination path
* @param conf a valid hbase configuration object
* @param threads number of threads to execute the copy
* @return list of Path representing all items residing in the source path
* @throws IOException e
*/
public static List<Path> copyFilesParallel(FileSystem srcFS, Path src, FileSystem dstFS, Path dst,
Configuration conf, int threads) throws IOException {
ExecutorService pool = Executors.newFixedThreadPool(threads);
List<Future<Void>> futures = new ArrayList<>();
List<Path> traversedPaths;
try {
traversedPaths = copyFiles(srcFS, src, dstFS, dst, conf, pool, futures);
for (Future<Void> future : futures) {
future.get();
}
} catch (ExecutionException | InterruptedException | IOException e) {
throw new IOException("Copy snapshot reference files failed", e);
} finally {
pool.shutdownNow();
}
return traversedPaths;
}

private static List<Path> copyFiles(FileSystem srcFS, Path src, FileSystem dstFS, Path dst,
Configuration conf, ExecutorService pool, List<Future<Void>> futures) throws IOException {
List<Path> traversedPaths = new ArrayList<>();
traversedPaths.add(dst);
FileStatus currentFileStatus = srcFS.getFileStatus(src);
if (currentFileStatus.isDirectory()) {
if (!dstFS.mkdirs(dst)) {
throw new IOException("Create directory failed: " + dst);
}
FileStatus[] subPaths = srcFS.listStatus(src);
for (FileStatus subPath : subPaths) {
traversedPaths.addAll(copyFiles(srcFS, subPath.getPath(), dstFS,
new Path(dst, subPath.getPath().getName()), conf, pool, futures));
}
} else {
Future<Void> future = pool.submit(() -> {
FileUtil.copy(srcFS, src, dstFS, dst, false, false, conf);
return null;
});
futures.add(future);
}
return traversedPaths;
}
/*
*
* COPIED from CommonFSUtils.listStatus
*
@@ -16,16 +16,20 @@
limitations under the License.
-->

# Apache HBase Tool for merging regions
# Extra Tools for fixing Apache HBase inconsistencies

_RegionsMerger_ is a utility tool for manually merging a bunch of regions of
a given table. It's mainly useful in situations when an HBase cluster has too
many regions per RegionServer, and many of these regions are small enough that
they can be merged together, reducing the total number of regions in the cluster
and releasing the RegionServers' overall memory resources.
This _Operator Tools_ module provides extra tools for fixing different types of inconsistencies
in HBase. It differs from the _HBCK2_ module by defining more complex operations than the commands
available in _HBCK2_. These tools often perform a set of steps to fix the underlying issues,
sometimes combining _HBCK2_ commands with other existing tools.


The tools currently available in this module are:

- RegionsMerger;
- MissingRegionDirsRepairTool;

This may happen due to mistaken pre-splits, or after a purge of table
data, as regions would not be automatically merged.

## Setup
Make sure the HBase tools jar is added to the HBase classpath:
@@ -34,7 +38,20 @@ Make sure HBase tools jar is added to HBase classpath:
export HBASE_CLASSPATH=$HBASE_CLASSPATH:./hbase-tools-1.1.0-SNAPSHOT.jar
```

## Usage
Each of these tools is detailed below.

## RegionsMerger - Tool for merging regions

_RegionsMerger_ is a utility tool for manually merging a bunch of regions of
a given table. It's mainly useful in situations when an HBase cluster has too
many regions per RegionServer, and many of these regions are small enough that
they can be merged together, reducing the total number of regions in the cluster
and releasing the RegionServers' overall memory resources.

This may happen due to mistaken pre-splits, or after a purge of table
data, as regions would not be automatically merged.

### Usage

_RegionsMerger_ requires two parameters: 1) the name of the table
to have regions merged; 2) the desired total number of regions for the given
@@ -45,7 +62,7 @@ total of 5 regions, assuming the _setup_ step above has been performed:
$ hbase org.apache.hbase.RegionsMerger my-table 5
```

## Implementation Details
### Implementation Details

_RegionsMerger_ uses client API
_org.apache.hadoop.hbase.client.Admin.getRegions_ to fetch the list of regions
@@ -85,3 +102,40 @@ _RegionsMerger_ keeps tracking the progress of regions merges, on each round.
If no progress is observed after a configurable number of rounds,
_RegionsMerger_ aborts automatically. The limit of rounds without progress is an
integer value configured via the `hbase.tools.max.iterations.blocked` property.
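The abort-on-stall behavior can be sketched in plain Java (an illustrative model only, not RegionsMerger's actual implementation; `maxBlocked` stands in for `hbase.tools.max.iterations.blocked`):

```java
// Sketch: count rounds until a merge loop aborts because the number of
// remaining regions stopped shrinking for maxBlocked consecutive rounds.
public class ProgressGuard {
  static int roundsUntilAbort(int[] remainingPerRound, int maxBlocked) {
    int blocked = 0;
    int prev = Integer.MAX_VALUE;
    int rounds = 0;
    for (int remaining : remainingPerRound) {
      rounds++;
      if (remaining >= prev) {
        // no progress this round
        blocked++;
        if (blocked >= maxBlocked) {
          break; // abort: too many consecutive rounds without progress
        }
      } else {
        blocked = 0; // progress resets the counter
      }
      prev = remaining;
    }
    return rounds;
  }

  public static void main(String[] args) {
    // merges stall at 6 regions; with maxBlocked=2 the loop stops on round 5
    System.out.println(roundsUntilAbort(new int[]{10, 8, 6, 6, 6, 6}, 2)); // 5
  }
}
```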

## MissingRegionDirsRepairTool - Tool for sidelining region dirs of regions missing in meta

_MissingRegionDirsRepairTool_ moves region dirs that exist under a table's dir but have no
corresponding entry in meta. It is meant for cases where a region is not present in meta, but still
has a dir with hfiles on the underlying file system, and no holes have been detected in the table's
region chain.

When no _region holes_ are reported, the existing `HBCK2.addFsRegionsMissingInMeta`
command isn't appropriate, as it would bring the region back into meta and cause overlaps.

This tool performs the following actions:
1) Identifies regions present in HDFS but not in meta;
2) For each of these regions, sidelines the related dir to a temp folder;
3) Loads the hfiles from each sidelined region into the related table.

Sidelined regions are never removed from the temp folder. Operators should remove
them manually, after they have verified data integrity.
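As a concrete illustration, the location of a sidelined region dir can be sketched in plain Java (this is not the tool's code; the root dir, timestamp, and region name below are made-up values):

```java
// Sketch of the sideline path layout described in this README:
// HBASE_ROOT_DIR/.missing_dirs_repair/TS/TBL_NAME/sidelined/REGION
public class SidelinePathSketch {
  static String sidelinedRegionDir(String rootDir, long ts, String tableWithNs,
      String regionEncodedName) {
    // the namespace separator ':' is replaced with '_' to form a valid dir name
    String tbl = tableWithNs.replaceAll(":", "_");
    return rootDir + "/.missing_dirs_repair/" + ts + "/" + tbl
        + "/sidelined/" + regionEncodedName;
  }

  public static void main(String[] args) {
    System.out.println(
        sidelinedRegionDir("/hbase", 1611578000000L, "ns1:my-table", "abc123"));
  }
}
```

Operators can inspect (and eventually delete) dirs of this shape once data integrity has been confirmed.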

### Usage

This tool requires no parameters. Assuming the classpath is properly set, it can be run as follows:

```
$ hbase org.apache.hbase.MissingRegionDirsRepairTool
```


### Implementation Details

_MissingRegionDirsRepairTool_ uses `HBCK2.reportTablesWithMissingRegionsInMeta` to retrieve a
_Map<TableName,List<Path>>_ containing the list of affected regions grouped by table. For each of
the affected regions, it copies the entire region dir to a
`HBASE_ROOT_DIR/.missing_dirs_repair/TS/TBL_NAME/sidelined` directory. Then, it copies each of the
region's hfiles to a `HBASE_ROOT_DIR/.missing_dirs_repair/TS/TBL_NAME/bulkload` dir, renaming these
files with the pattern `REGION_NAME-FILENAME`. For a given table, all affected regions then
have all their files under the same directory for bulkload. _MissingRegionDirsRepairTool_ then uses
_LoadIncrementalHFiles_ to load all files for a given table at once.
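The `REGION_NAME-FILENAME` prefix matters because hfiles from different regions can share a file name; a minimal sketch (plain Java, not the tool's actual code; region and file names are illustrative) shows how the prefix keeps them distinct in the shared per-table bulkload dir:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: two regions each own an hfile named "f1"; prefixing with the
// region name yields two distinct names, so nothing is overwritten when
// both land in the single per-table bulkload directory.
public class BulkloadNamingSketch {
  static String bulkloadName(String regionName, String hfileName) {
    return regionName + "-" + hfileName;
  }

  public static void main(String[] args) {
    List<String> names = new ArrayList<>();
    for (String region : new String[]{"regionA", "regionB"}) {
      names.add(bulkloadName(region, "f1"));
    }
    System.out.println(names); // two distinct entries despite the same hfile name
  }
}
```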
@@ -51,6 +51,11 @@
<version>${log4j2.version}</version>
</dependency>

<dependency>
<groupId>org.apache.hbase.operator.tools</groupId>
<artifactId>hbase-hbck2</artifactId>
</dependency>

<!--We want to use the shaded client but for testing, we need to rely on hbase-server.
HBASE-15666 is about how shaded-client and hbase-server won't work together.
TODO: Fix.-->
@@ -0,0 +1,127 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hbase;

import java.io.IOException;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.tool.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.util.ToolRunner;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MissingRegionDirsRepairTool extends Configured implements org.apache.hadoop.util.Tool {

private static final Logger LOG =
LoggerFactory.getLogger(MissingRegionDirsRepairTool.class.getName());

private static final String WORKING_DIR = ".missing_dirs_repair";

private Configuration conf;
private HBCK2 hbck;
private LoadIncrementalHFiles bulkLoad;

public MissingRegionDirsRepairTool(Configuration conf) {
this.conf = conf;
this.hbck = new HBCK2(conf);
this.bulkLoad = new LoadIncrementalHFiles(conf);
}

@Override
public int run(String[] strings) throws Exception {
Map<TableName,List<Path>> result = hbck
.reportTablesWithMissingRegionsInMeta(new String[]{});
Path runPath = new Path(new Path(HBCKFsUtils.getRootDir(conf),
WORKING_DIR), "" + System.currentTimeMillis());
FileSystem fs = runPath.getFileSystem(conf);
LOG.info("creating temp dir at: " + runPath.getName());
fs.mkdirs(runPath);
try(Connection conn = ConnectionFactory.createConnection(conf)) {
Admin admin = conn.getAdmin();
result.forEach((t, p) -> {
if(!p.isEmpty()) {
Path tblPath =
new Path(runPath, new Path(t.getNameWithNamespaceInclAsString()
.replaceAll(":", "_")));
try {
fs.mkdirs(tblPath);
Path sidelined = new Path(tblPath, "sidelined");
fs.mkdirs(sidelined);
Path bulkload = new Path(tblPath, "bulkload");
fs.mkdirs(bulkload);
p.stream().forEach(region -> {
try {
Path sidelinedRegionDir = new Path(sidelined, region.getName());
fs.mkdirs(sidelinedRegionDir);
HBCKFsUtils.copyFilesParallel(fs, region, fs, sidelinedRegionDir, conf, 3);
admin.getDescriptor(t).getColumnFamilyNames().forEach(cf -> {
Path cfDir = new Path(region, Bytes.toString(cf));
Path tempCfDir = new Path(bulkload, cfDir.getName());
try {
if (!fs.exists(tempCfDir)) {
fs.mkdirs(tempCfDir);
}
FileStatus[] files = fs.listStatus(cfDir);
for (FileStatus file : files) {
fs.rename(file.getPath(),
new Path(tempCfDir,
region.getName() + "-" + file.getPath().getName()));
}
} catch (IOException e) {
LOG.error("Error trying to move files from inconsistent region dir: ", e);
}
});
fs.delete(region, true);
LOG.info("region dir {} moved to {}", region.toUri().getRawPath(),
sidelinedRegionDir.toUri().getRawPath());
} catch (IOException e) {
LOG.error("Error trying to fetch table descriptor: ", e);
}
});
LOG.info("Calling bulk load for: " + tblPath.toUri().getRawPath());
bulkLoad.run(bulkload.toUri().getRawPath(), t);
} catch (IOException e) {
LOG.error("Error trying to create temp dir for sideline files: ", e);
}
}
});
admin.close();
}
return 0;
}

public static void main(String [] args) throws Exception {
Configuration conf = HBaseConfiguration.create();
int errCode = ToolRunner.run(new MissingRegionDirsRepairTool(conf), args);
if (errCode != 0) {
System.exit(errCode);
}
}
}
