A Maven project with everything set up for using the splout-hadoop API of Splout SQL (http://sploutsql.com/, https://github.com/datasalt/splout-db/). It contains a toy example that indexes and deploys a tablespace with two tables.
IMPORTANT: In local/development mode, the native libraries must be added to LD_LIBRARY_PATH. Maven downloads and uncompresses them to target/maven-shared... (see the pom.xml). You need to add that folder to your java.library.path. Here's how in Eclipse:
    mvn install
    mvn eclipse:eclipse
    Run Configurations -> ... -> JRE -> Installed JREs... -> Click -> Edit ... -> Default VM Arguments: -Djava.library.path=target/maven-shared-archive-resources/
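If you launch the examples from a shell rather than from Eclipse, the same folder can be exposed through LD_LIBRARY_PATH. The following is only a minimal sketch: it assumes you run it from the project root after `mvn install`, and that the native libraries ended up in target/maven-shared-archive-resources/ (the folder used in the VM argument above). NATIVE_DIR is just an illustrative variable name.

```shell
# Assumption: current directory is the project root and `mvn install` has already run.
NATIVE_DIR="$PWD/target/maven-shared-archive-resources"

# Prepend the folder; the ${VAR:+...} form avoids a trailing colon
# when LD_LIBRARY_PATH was previously empty or unset.
export LD_LIBRARY_PATH="$NATIVE_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

The export only affects the current shell session and the processes started from it, so it has to be repeated (or added to your shell profile) for new terminals.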
IMPORTANT: To execute the examples you need at least one QNode and one DNode running on your system (see Splout SQL's Getting started: http://sploutsql.com/gettingstarted.html).
To run the example locally, execute "GenerateTablespace" with no arguments first and then "DeployTablespace" with no arguments.
To run the example in pseudo-distributed mode, first copy the toy resources to HDFS:
hadoop fs -put src/main/resources/ src/main/resources
Then build the project, uncompress the resulting .tar.gz and run the two jobs as follows:
    mvn install
    cd target
    tar xvfz splout-hadoop-starter-0.0.1-SNAPSHOT-distro.tar.gz
    cd splout-hadoop-starter-0.0.1-SNAPSHOT
    hadoop jar splout-hadoop-starter-0.0.1-SNAPSHOT-hadoop.jar generate
    hadoop jar splout-hadoop-starter-0.0.1-SNAPSHOT-hadoop.jar deploy
You can check that everything went fine by issuing the following queries with any key:
    SELECT * FROM geonames;
    SELECT * FROM hashtags;
The example is very simple: it indexes a small file with Twitter hashtag counts per day, partitioned by hashtag. This means you can run queries like:
(key = 'california') SELECT * FROM hashtags WHERE hashtag = 'california';
It also indexes a toy "geonames" database, so you can check whether a hashtag is an alternate name for a location:
(key = 'california') SELECT * FROM geonames WHERE altname = 'california';
You can also query, within each partition, which hashtags correspond to a geographic location:
(for any partition key) SELECT * FROM hashtags, geonames WHERE altname = hashtag;