Spark Streaming HBase Example

README.txt

Create an HBase table to write to:

Launch the HBase shell:
$ hbase shell

In the shell, create the table with the column families data, alert, and stats:
create '/user/user01/sensor', {NAME=>'data'}, {NAME=>'alert'}, {NAME=>'stats'}

Commands to run the labs:

Step 1: Compile the project. In the IDE, select the project -> Run As -> Maven install.

Step 2: Use scp to copy sparkstreamhbaseapp-1.0.jar to the MapR sandbox or cluster.

To run the streaming app:

Step 3: Start the streaming app:

/opt/mapr/spark/spark-1.5.2/bin/spark-submit --driver-class-path `hbase classpath` --class examples.HBaseSensorStream sparkstreamhbaseapp-1.0.jar

Step 4: Copy the streaming data file to the stream directory:
cp sensordata.csv /user/user01/stream/.

Step 5: Scan the data written to the table. The values are stored as binary doubles, so they are not human-readable in the shell.
Launch the HBase shell, then scan the data column family and the alert column family:
$ hbase shell
scan '/user/user01/sensor', {COLUMNS=>['data'], LIMIT => 10}
scan '/user/user01/sensor', {COLUMNS=>['alert'], LIMIT => 10}
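The scan output looks unreadable because a double stored as raw bytes is just its 8-byte IEEE 754 bit pattern. The following is a minimal sketch of that encoding using only java.nio.ByteBuffer, so it runs without HBase on the classpath. The object and method names (BinaryDoubleDemo, encode, decode, toHexEscapes) are hypothetical and are not taken from this project. The hex rendering is simplified: it escapes every byte, whereas a shell may print some bytes as plain characters.

```scala
import java.nio.ByteBuffer

object BinaryDoubleDemo {
  // Encode a Double as 8 big-endian bytes (its IEEE 754 bit pattern).
  def encode(value: Double): Array[Byte] =
    ByteBuffer.allocate(8).putDouble(value).array()

  // Decode 8 big-endian bytes back into a Double.
  def decode(bytes: Array[Byte]): Double =
    ByteBuffer.wrap(bytes).getDouble

  // Simplified rendering: show every byte as a \xNN escape.
  def toHexEscapes(bytes: Array[Byte]): String =
    bytes.map(b => "\\x" + "%02X".format(b & 0xff)).mkString

  def main(args: Array[String]): Unit = {
    val raw = encode(1.0)
    println(toHexEscapes(raw)) // prints \x3F\xF0\x00\x00\x00\x00\x00\x00
    println(decode(raw))       // prints 1.0
  }
}
```

To inspect real values, read the cells back in a program and decode them as doubles rather than relying on the shell output.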

Step 6: Launch one of the programs below to read the data and calculate daily statistics.
Calculate stats for one column:
/opt/mapr/spark/spark-1.5.2/bin/spark-submit --driver-class-path `hbase classpath` --class examples.HBaseReadWrite sparkstreamhbaseapp-1.0.jar
Calculate stats for a whole row:
/opt/mapr/spark/spark-1.5.2/bin/spark-submit --driver-class-path `hbase classpath` --class examples.HBaseReadRowWriteStats sparkstreamhbaseapp-1.0.jar

Launch the HBase shell and scan the stats column family:
scan '/user/user01/sensor', {COLUMNS=>['stats']}
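
This README says only that the programs calculate "daily statistics", so the following is an illustrative sketch of what such a computation might look like, written in plain Scala collections with no Spark or HBase dependencies. It is not the code in examples.HBaseReadWrite or examples.HBaseReadRowWriteStats. The case classes, field names, the choice of min/max/mean, and the sample readings are all hypothetical.

```scala
object DailyStatsSketch {
  // Hypothetical reading: a sensor id, a date string, and one measured value.
  final case class Reading(sensorId: String, date: String, value: Double)

  // Hypothetical summary for one sensor on one day.
  final case class Stats(min: Double, max: Double, mean: Double)

  // Group readings by (sensor, date), then summarise each group.
  def dailyStats(readings: Seq[Reading]): Map[(String, String), Stats] =
    readings.groupBy(r => (r.sensorId, r.date)).map { case (key, group) =>
      val values = group.map(_.value)
      key -> Stats(values.min, values.max, values.sum / values.size)
    }

  def main(args: Array[String]): Unit = {
    // Made-up sample data, for illustration only.
    val sample = Seq(
      Reading("SENSOR_A", "2014-03-10", 80.0),
      Reading("SENSOR_A", "2014-03-10", 90.0),
      Reading("SENSOR_B", "2014-03-10", 70.0)
    )
    dailyStats(sample).foreach { case ((id, date), s) =>
      println(s"$id $date min=${s.min} max=${s.max} mean=${s.mean}")
    }
  }
}
```

In a Spark job the same grouping would typically be expressed over an RDD keyed by sensor and date; the collection version here only shows the shape of the calculation.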

