giant-squash is a data collector for HBase table sizes. This command-line tool does one thing: it collects data on HBase table sizes and, on shutdown, writes it out to a JSON file. The usual story is that this data is then fed to the bloom-harvester for maximum enjoyment.
To see a demo of how the data from this tool can be used, see The Story of the Big Data Elders and The Big Data Elders, Archeology Hour.
You must be able to run this jar from a gateway node or any machine on which you can do hadoop fs -du -s /bla.
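Whether a machine meets this requirement can be checked before starting a long collection run. The following is a minimal sketch, not part of giant-squash itself: the function name `can_measure` and the `hadoop_bin` parameter are hypothetical, and it assumes only that the `hadoop` client is on the PATH when the requirement is met.

```python
import subprocess


def can_measure(path, hadoop_bin="hadoop"):
    """Return True if `hadoop fs -du -s <path>` succeeds on this machine.

    `hadoop_bin` is a hypothetical parameter so that a different client
    binary (or a full path to it) can be substituted.
    """
    try:
        result = subprocess.run(
            [hadoop_bin, "fs", "-du", "-s", path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=120,  # arbitrary upper bound chosen for this sketch
        )
    except (OSError, subprocess.TimeoutExpired):
        # The binary is missing or not executable, or the call hung.
        return False
    return result.returncode == 0
```

Calling `can_measure("/bla")` with a path that exists on your cluster should return `True` on a suitable gateway node.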
- little-rabbit: The companion tool of giant-squash. This one collects job data from the job tracker.
- bloom-harvester: This is the tool that will eat your squash and rabbits and produce a spectacle for your eyes.
mvn clean install
java -Xmx3G -jar ./giant-squash-1.0-jar-with-dependencies.jar -output <giant-squashes.json> -interval <poll_interval_in_seconds> -tableNames <space delimited list of the table names>
Ctrl-C when done, or kill -2 <pid> if you run it with nohup in the background.
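For unattended runs, the launch and the shutdown signal can be scripted. The sketch below assumes the jar name and flags from the command line above; the function names `build_command` and `collect_for`, and the table and file names in the usage note, are hypothetical. It sends SIGINT, which is the same signal as Ctrl-C or kill -2.

```python
import signal
import subprocess
import time

# Jar name as produced by the build command in this README.
JAR = "./giant-squash-1.0-jar-with-dependencies.jar"


def build_command(output, interval_seconds, table_names, jar=JAR, heap="3G"):
    """Assemble the documented command line as an argument list.

    Each table name becomes its own argument, which matches the
    space-delimited list expected after -tableNames.
    """
    return [
        "java", f"-Xmx{heap}", "-jar", jar,
        "-output", output,
        "-interval", str(interval_seconds),
        "-tableNames", *table_names,
    ]


def collect_for(duration_seconds, command):
    """Run the collector for a fixed time, then send SIGINT and wait for exit."""
    process = subprocess.Popen(command)
    try:
        time.sleep(duration_seconds)
    finally:
        # Equivalent to `kill -2 <pid>`: asks the tool to shut down,
        # at which point it writes the JSON file.
        process.send_signal(signal.SIGINT)
        process.wait()
    return process.returncode
```

For example, `collect_for(3600, build_command("giant-squashes.json", 60, ["table_a", "table_b"]))` would poll the two tables every 60 seconds for an hour and then stop the collector.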