Apache Accumulo Examples
Follow the steps below to run the Accumulo examples:
Clone this repository
git clone https://github.com/apache/accumulo-examples.git
Follow Accumulo's quickstart to install and run an Accumulo instance. Accumulo has an accumulo-client.properties file in
conf/ that must be configured, because the examples use this file to connect to your instance.
Review conf/env.sh.example to see if you need to customize it. If ACCUMULO_HOME and
HADOOP_HOME are set in your shell, you may be able to skip this step. Make sure
ACCUMULO_CLIENT_PROPS is set to the location of your accumulo-client.properties.
cp conf/env.sh.example conf/env.sh
vim conf/env.sh
Build the examples repo and copy the examples jar to Accumulo's lib/ext/ directory.
./bin/build
cp target/accumulo-examples.jar /path/to/accumulo/lib/ext/
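The setup steps above can be double-checked with a small script before running any example. The following is only a minimal sketch and is not shipped with this repository: the function names `check_client_props` and `install_examples_jar` are hypothetical, and the only things taken from the steps above are the `ACCUMULO_CLIENT_PROPS` variable and the idea of copying the built jar into a target directory.

```shell
# Hypothetical helpers for illustration; not part of accumulo-examples.

# Succeeds only if ACCUMULO_CLIENT_PROPS is set and names a readable file.
check_client_props() {
  if [ -z "${ACCUMULO_CLIENT_PROPS:-}" ]; then
    echo "ACCUMULO_CLIENT_PROPS is not set"
    return 1
  fi
  if [ ! -r "$ACCUMULO_CLIENT_PROPS" ]; then
    echo "cannot read $ACCUMULO_CLIENT_PROPS"
    return 1
  fi
  echo "using $ACCUMULO_CLIENT_PROPS"
}

# Copies a built jar ($1) into a destination directory ($2),
# failing early if either argument does not exist.
install_examples_jar() {
  jar_path="$1"
  dest_dir="$2"
  if [ ! -f "$jar_path" ]; then
    echo "jar not found: $jar_path"
    return 1
  fi
  if [ ! -d "$dest_dir" ]; then
    echo "not a directory: $dest_dir"
    return 1
  fi
  cp "$jar_path" "$dest_dir/"
}
```

After a successful ./bin/build, one possible use is `check_client_props && install_examples_jar target/accumulo-examples.jar /path/to/accumulo/lib/ext`, with the destination path replaced by your own installation.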
Each Accumulo example has its own documentation and run instructions, which are linked below.
When running the examples, remember the tips below:
- Examples are run using the
runex and runmr commands, which are located in the
bin/ directory of this repo. The
runex command is a simple script that uses the examples shaded jar to run a class. The
runmr command starts a MapReduce job in YARN.
- Commands intended to be run in bash are prefixed by '$' and should be run from the root of this repository.
- Several examples use the
accumulo and accumulo-util commands, which are expected to be on your
PATH. These commands are found in the
bin/ directory of your Accumulo installation.
- Commands intended to be run in the Accumulo shell are prefixed by '>'.
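Since several examples expect Accumulo's commands to be on your PATH, a quick check can save a confusing failure later. This is a minimal sketch under stated assumptions: `require_commands` is a hypothetical helper name, not something provided by Accumulo or this repository, and it relies only on the standard `command -v` builtin.

```shell
# Hypothetical helper for illustration; not part of Accumulo or this repo.
# Prints one line per command that cannot be found on PATH and
# returns non-zero if any command is missing.
require_commands() {
  rc_missing=0
  for rc_cmd in "$@"; do
    if ! command -v "$rc_cmd" >/dev/null 2>&1; then
      echo "not on PATH: $rc_cmd"
      rc_missing=1
    fi
  done
  return "$rc_missing"
}

# Example call (commented out so sourcing this file has no side effects):
# require_commands accumulo accumulo-util
```

If the check reports a missing command, adding the bin/ directory of your Accumulo installation to PATH is the likely fix.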
Each example below highlights a feature of Apache Accumulo.
| Example | Description |
|---|---|
| batch | Using the batch writer and batch scanner |
| bloom | Creating a bloom filter enabled table to increase query performance |
| bulkIngest | Ingesting bulk data using map/reduce jobs on Hadoop |
| classpath | Using per-table classpaths |
| client | Using table operations, reading and writing data in Java. |
| combiner | Using example StatsCombiner to find min, max, sum, and count. |
| compactionStrategy | Configuring a compaction strategy |
| constraints | Using constraints with tables. Limit the mutation size to avoid running out of memory |
| deleteKeyValuePair | Deleting a key/value pair and verifying the deletion in RFile. |
| dirlist | Storing filesystem information. |
| export | Exporting and importing tables. |
| filedata | Storing file data. |
| filter | Using the AgeOffFilter to remove records more than 30 seconds old. |
| helloworld | Inserting records both inside and outside map/reduce jobs, and reading records between two rows. |
| isolation | Using the isolated scanner to ensure partial changes are not seen. |
| regex | Using MapReduce and Accumulo to find data using regular expressions. |
| reservations | Using conditional mutations to implement a simple reservation system. |
| rgbalancer | Using a balancer to spread groups of tablets within a table evenly |
| rowhash | Using MapReduce to read a table and write to a new column in the same table. |
| sample | Building and using sample data in Accumulo. |
| shard | Using the intersecting iterator with a term index partitioned by document. |
| spark | Using Accumulo as input and output for Apache Spark jobs |
| tabletofile | Using MapReduce to read a table and write one of its columns to a file in HDFS. |
| terasort | Generating random data and sorting it using Accumulo. |
| uniquecols | Using MapReduce to count unique columns in Accumulo |
| visibility | Using visibilities (or combinations of authorizations). Also shows user permissions. |
| wordcount | Using MapReduce and Accumulo to do a word count on text files |
This repository can be used to test Accumulo release candidates. See docs/release-testing.md.