How to Deploy Sirius
====
Sirius is not a standalone application, so deployment depends on the application using the library. This could be a WAR running in a servlet container, a runnable JAR, or maybe even a script. That said, certain requirements need to be met for Sirius to run.
#### System Requirements
- Java 6+ / JVM application using the Sirius library (see Getting started with Sirius)
- Enough RAM for the application's in-memory backend
- Enough RAM for Live-Compaction if enabled
- Enough disk space for the transaction logs
- Akka ports are open between nodes (2552 by default)
#### Standard Setup
After writing a Sirius application, deploy it to one or more machines. In a standard setup, every node participates in consensus: each node's Akka endpoint is added to the cluster config file, which is shared across all nodes.
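The cluster config file is a plain text file listing one Akka endpoint per line. A hypothetical three-node config might look like the following (hostnames and the exact actor-path format are illustrative; match them to your deployment and Sirius version):

```
akka.tcp://sirius-system@app-host-1.example.com:2552/user/sirius
akka.tcp://sirius-system@app-host-2.example.com:2552/user/sirius
akka.tcp://sirius-system@app-host-3.example.com:2552/user/sirius
```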
Write operations require consensus (a simple majority) across the cluster, so deploying an even number of machines makes little sense. For instance, with 4 nodes, 3 healthy nodes are required for a majority, so only one node can go down before write operations begin to fail. To maintain the same fault tolerance, 3 nodes should be used instead (2 are needed for consensus, so one can go down).
In the interest of write throughput, consensus clusters are often set up with 3, 5 or 7 nodes.
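The arithmetic above generalizes: a cluster of n nodes needs floor(n/2) + 1 healthy nodes for a majority, and can therefore tolerate the loss of the rest. A quick sketch (plain shell arithmetic, nothing Sirius-specific) shows why even cluster sizes buy no extra fault tolerance:

```shell
# Majority quorum and fault tolerance for common cluster sizes.
for n in 3 4 5 6 7; do
  quorum=$(( n / 2 + 1 ))        # simple majority
  tolerated=$(( n - quorum ))    # nodes that may fail before writes stall
  echo "nodes=$n quorum=$quorum tolerated_failures=$tolerated"
done
```

Note that 4 nodes tolerate only as many failures as 3, and 6 only as many as 5.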
#### Fast Follower Setup
Not all Sirius nodes have to participate in consensus. If a node references a cluster config in which its own endpoint is not present, it effectively turns off its ordering subsystem. With this configuration, the node will not participate in ordering consensus, but it will still synchronize itself with a randomly selected node from the cluster config, "following" the nodes in the ingest cluster. These nodes will not accept writes and should be considered read-only, but they will stay nearly up to date, and they will not affect write throughput the way adding nodes to the consensus cluster would.
Additionally, while they will not participate in consensus, follower nodes will serve catchup requests to other nodes. Given this, it is possible to set up interesting (and large) follower topologies, based on a small ingest-only cluster.
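Concretely, a follower is just a node whose own endpoint is absent from the cluster config it reads. A hypothetical follower running on follower-1.example.com could reference a config that lists only the ingest nodes (endpoints are illustrative):

```
akka.tcp://sirius-system@ingest-1.example.com:2552/user/sirius
akka.tcp://sirius-system@ingest-2.example.com:2552/user/sirius
akka.tcp://sirius-system@ingest-3.example.com:2552/user/sirius
```

Because follower-1 does not appear in the list, it disables its ordering subsystem and only follows.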
#### Maintenance
The included transaction log implementation creates files on disk at the location the application specifies in the Sirius configuration when it configures the Sirius node. This directory contains the log files. The implementation may open the log files and index files at any time, since compaction can be triggered at any time; the trigger is based on the size of the log segments and the frequency of transactions.
This gives the application maintainers at least five options for backup and recovery strategies.
- The safest way to back up the transaction log files is to copy them while a node is temporarily shut down. Each node has a transaction log that can rebuild the data store to its current state, so the data can be backed up from a single node.
- If the nodes of a cluster all started from the same initial transaction, their log files will be identical from node to node. The transaction logs can therefore be copied from a temporarily stopped node to a recovered node. When started, the recovered node will process the transaction log files and come up in the same state as the node the files were copied from.
- If a node fails and is replaced without any transaction logs, the catch-up algorithm will rebuild the datastore from the other nodes in the cluster. Recovering a failed node without a transaction log backup is therefore possible, as long as at least one other node in the cluster is still healthy, by letting the catch-up process complete.
- A good use of a follower node is to gather the cluster's transactions and serve as the place where transaction log backups are taken. The backup process can then shut down the follower without affecting the performance of the cluster as a whole, since a follower does not participate in consensus or in ordering updates. While similar to the first option, this one does not require stopping an active node and degrading cluster performance during the backup.
- The application itself could back up its datastore using its own backup system.
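A minimal sketch of the follower-based backup option, assuming a hypothetical log directory of /var/lib/myapp/uberstore and a systemd-managed application called myapp; all paths and service names here are illustrative, not part of Sirius:

```shell
#!/bin/sh
# Illustrative backup of a stopped follower's transaction log directory.
LOG_DIR=/var/lib/myapp/uberstore                       # the directory Sirius was configured to log to
BACKUP=/backups/uberstore-$(date +%Y%m%d).tar.gz

systemctl stop myapp                                   # quiesce the follower so the files are stable
tar -czf "$BACKUP" -C "$(dirname "$LOG_DIR")" "$(basename "$LOG_DIR")"
systemctl start myapp                                  # the follower catches back up on restart
```

Because the node is a follower, the ingest cluster's write throughput is unaffected while it is down.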
#### JMX Monitoring
Sirius permits self-monitoring over JMX by exposing various managed beans. All managed beans are under the namespace `com.comcast.xfinity.sirius`. Please refer to a managed bean's scaladoc for attribute documentation.
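How those beans become reachable from outside the process depends on your application's JVM settings, not on Sirius itself. One common (insecure, trusted-network-only) way to make them visible to a tool such as JConsole is the standard JVM remote-JMX flags, shown here purely as an example; the port is arbitrary:

```
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9999
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
```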
#### Hints and Tricks
- Adding New Nodes - New nodes can be added at runtime by simply adding their endpoint to the cluster config file. The file is polled periodically; the interval can be set with `SiriusConfiguration.MEMBERSHIP_CHECK_INTERVAL` and defaults to 30 seconds.
- Catchup - A new node will not be online until all data has been ingested. To avoid consensus issues, it may be better to start it as a follower node and then add it to the cluster after catchup. One could also copy the UberStore to the new node and then fire it up. These methods can also be applied to a node that has been down for a long time.
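Since membership is just the polled cluster config file, adding a node can be as simple as appending its endpoint to that file on every machine. A sketch, where the file path and endpoint are illustrative placeholders for whatever your deployment actually configures:

```shell
# Append a hypothetical new node's endpoint to the shared cluster config.
CLUSTER_CONFIG=/etc/myapp/cluster.config
echo "akka.tcp://sirius-system@new-host.example.com:2552/user/sirius" >> "$CLUSTER_CONFIG"
# Within MEMBERSHIP_CHECK_INTERVAL (30 seconds by default),
# running nodes will pick up the new membership on their next poll.
cat "$CLUSTER_CONFIG"
```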
Copyright 2013-2017 Comcast Cable Communications Management, LLC