A replicated fully asynchronous Akka Persistence journal and snapshot store backed by Apache HBase.
(because I'm trying to get Hbase / HDFS to become testable on Travis... not there yet).
Akka-persistance-hbase also provides a SnapshotStore two implementations, one which stores directly into the messages HBase collection, and another one that used HDFS to store larger snapshots. You must select one of the implementations in the configuration.
The artifact is published to Maven Central, so in order to use it you just have to add the following dependency:
// build.sbt style:
libraryDependencies += "pl.project13.scala" %% "akka-persistence-hbase" % "0.3"
Please note that only versions 0.3+
are compatible with the latests Akka version (2.3-SNAPSHOT
, today is: 20 Jan 2013).
This version of the API should be stable though, so go ahead and give it a spin!
To activate the HBase journal plugin, add the following line to your Akka application.conf
:
akka.persistence.journal.plugin = "hbase-journal"
Two snapshot store implementations are provided.
To activate the HBase snapshot store implementation add the following line to your application.conf
# enable the hadoop-snapshot-store
akka.persistence.snapshot-store.plugin = "hadoop-snapshot-store"
# and select the snapshotter implementation
hadoop-snapshot-store {
impl = "akka.persistence.hbase.snapshot.HBaseSnapshotter"
}
For snapshots it may be sometimes smarter to store them in HDFS directly, instead of in HBase. This has a few advantages - it's nicer to deal with a "very big" snapshot this way, and you can easily download it by using plain hadoop command line tooling if needed.
# enable the hadoop-snapshot-store
akka.persistence.snapshot-store.plugin = "hadoop-snapshot-store"
hadoop-snapshot-store {
# and select the snapshotter implementation
impl = "akka.persistence.hbase.snapshot.HdfsSnapshotter"
# and set the directory where the snapshots should be saved to
snapshot-dir = "/akka-snapshots"
}
For more configuration options check the sources of reference.conf.
Is performance OK? I did a brief and naive test recently (won't even call it a benchmark ;-)), but in general the plugin was able to write "from !
to persisted in hbase" with around 6000 ~ 8000 messages per second.
This was tested using 3 region servers (small instances) on the Google Compute Engine, with batch writes of 200 items per batch, 50 partitions (key prefix) and 4 regions.
Recovery time for 45000 messages was around 3 seconds (keep in mind, my Actor does nothing, just receives the replayed message).
Async: Even though in the code it looks like it issues one Put
at a time, this is not the case, as writes are buffered and then batch written thanks to AsyncBase.
- Scans can be made more parallel - due to the fact we have "partitions"
- HDFS snapshots (
HBase snapshots are done already) - Stress testing and some metrics
A list is maintained here: https://gist.github.com/krasserm/8612920#file-akka-persistence-plugins-md
Apache 2.0
Heads up to Martin Krasser who helped with this project by nice links as well as his akka-persistence-cassandra plugin and eventsourced.