Wrapper around cassandra-snapshotter
.gitignore
README.md
run_snapshotter.py
snapshotter_config.yaml.sample


cassandra-snapshotter-wrapper

A wrapper script for cassandra-snapshotter.

Why?

cassandra-snapshotter is great for creating full backups of your Cassandra clusters in S3. However, it is a command-line tool that takes many arguments, so running it from cron can get messy. It is also quite noisy, and because it uses Fabric and remote SSH, its output doesn't respect a local command-line redirect. So this was a quick hack to:

  • suppress/redirect the output in a friendlier way
  • make it easier to manage new backup requirements via a config file
  • dynamically find my Cassandra clusters that run in Auto Scaling groups
  • provide a stop-gap that was quicker than forking and refactoring the original

Requirements

Runs on Python 2.7. The non-standard-library modules required are:

    PyYAML==3.10
    boto==2.32.1

Assumes that you have installed cassandra-snapshotter and have set up a user called cassandrasnapshotter on your cluster nodes with the relevant sudo privileges.

Beyond the S3 access that cassandra-snapshotter already needs, the only additional IAM privilege required is read-only EC2 access.

Setup

Copy snapshotter_config.yaml.sample, fill in your own settings, and save it as /etc/snapshotter_config.yaml on the host that will run the snapshotter. Most options are self-explanatory; however, note:

  • myproduct: the product name, passed as the -p argument to run_snapshotter.py (see Usage below)
  • autoscale_group: the full name of your Auto Scaling group as shown in the EC2 console

    snapshot:
        myproduct:
            aws_access_key_id: "XXXXXXXXXXXXXXXXXXXX"
            aws_secret_access_key: "XXXXXXXXXXXXXXXXXXX"
            s3_bucket_name: "mybucket"
            s3_bucket_region: "eu-west-1"
            s3_base_path: "mybackup"
            autoscale_group: "myautoscalegroup"
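Each key in a product section appears to correspond to one cassandra-snapshotter command-line option. As a minimal sketch of how a wrapper could turn such a section into a command line: the helper names below (`load_config`, `get_product_config`, `build_snapshotter_args`) are hypothetical, not the actual functions in run_snapshotter.py, and the flag names are assumed from cassandra-snapshotter's own README, so check `cassandra-snapshotter --help` for the version you have installed.

```python
# Hypothetical helpers for illustration; not taken from run_snapshotter.py.
# Written to run on both Python 2.7 and Python 3.

REQUIRED_KEYS = (
    "aws_access_key_id",
    "aws_secret_access_key",
    "s3_bucket_name",
    "s3_bucket_region",
    "s3_base_path",
    "autoscale_group",
)


def load_config(path="/etc/snapshotter_config.yaml"):
    """Parse the YAML config file (requires PyYAML)."""
    import yaml  # imported here so the other helpers work without PyYAML

    with open(path) as handle:
        return yaml.safe_load(handle)


def get_product_config(config, product):
    """Return the settings under snapshot -> <product>, checking required keys."""
    try:
        product_cfg = config["snapshot"][product]
    except KeyError:
        raise KeyError("no 'snapshot: %s:' section in config" % product)
    missing = [key for key in REQUIRED_KEYS if key not in product_cfg]
    if missing:
        raise ValueError("%s is missing: %s" % (product, ", ".join(missing)))
    return product_cfg


def build_snapshotter_args(product_cfg, hosts, user="cassandrasnapshotter"):
    """Build an argv list for one backup run (flag names are assumptions)."""
    return [
        "cassandra-snapshotter",
        "--s3-bucket-name=%s" % product_cfg["s3_bucket_name"],
        "--s3-bucket-region=%s" % product_cfg["s3_bucket_region"],
        "--s3-base-path=%s" % product_cfg["s3_base_path"],
        "--aws-access-key-id=%s" % product_cfg["aws_access_key_id"],
        "--aws-secret-access-key=%s" % product_cfg["aws_secret_access_key"],
        "backup",
        "--hosts=%s" % ",".join(hosts),
        "--user=%s" % user,
    ]
```

Returning a list rather than a single shell string means it can be handed straight to `subprocess.call` without worrying about quoting the secret key.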

If your cluster is not running in AWS (or is not in an Auto Scaling group), you can list its instances manually instead: set autoscale_group to the string "None" and give the cluster's hosts as a comma-separated string in instances:

    snapshot:
        myproduct:
            aws_access_key_id: "XXXXXXXXXXXXXXXXXXXX"
            aws_secret_access_key: "XXXXXXXXXXXXXXXXXXX"
            s3_bucket_name: "mybucket"
            s3_bucket_region: "eu-west-1"
            s3_base_path: "mybackup"
            autoscale_group: "None"
            instances: "fqdn-1,fqdn-2"
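A minimal sketch of how the two modes could be resolved into a host list. `resolve_hosts` and its `asg_lookup` parameter are hypothetical: `asg_lookup` stands in for whatever the script uses to query the Auto Scaling group (for example, a boto call), and passing it in keeps the sketch testable without AWS credentials. Accepting a YAML null as well as the quoted string "None" is a defensive assumption, not documented behaviour.

```python
def resolve_hosts(product_cfg, asg_lookup=None):
    """Return the list of Cassandra hosts to back up for one product.

    asg_lookup is a hypothetical callable that maps an Auto Scaling group
    name to a list of hostnames.
    """
    group = product_cfg.get("autoscale_group")
    # The sample config uses the quoted *string* "None", not YAML null,
    # so treat both forms as "no Auto Scaling group".
    if group is None or str(group).strip() == "None":
        raw = product_cfg.get("instances") or ""
        # "fqdn-1,fqdn-2" -> ["fqdn-1", "fqdn-2"], ignoring stray spaces/commas
        hosts = [h.strip() for h in raw.split(",") if h.strip()]
        if not hosts:
            raise ValueError("autoscale_group is 'None' but no instances given")
        return hosts
    if asg_lookup is None:
        raise ValueError("an asg_lookup function is needed for group %r" % group)
    return list(asg_lookup(group))
```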

Finally, create /var/log/snapshotter and make sure the user that will run the script has write access to it. Logs are written there as <product>_snapshotter.log, e.g. myproduct_snapshotter.log.
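A minimal sketch of a per-product log file in that directory, using the standard `logging` module; `setup_product_logger` is a hypothetical name, and run_snapshotter.py may set up its logging differently. The write-access check turns a missing or unwritable directory into a clear error instead of a traceback from deep inside `logging`.

```python
import logging
import os

LOG_DIR = "/var/log/snapshotter"


def setup_product_logger(product, log_dir=LOG_DIR):
    """Return (logger, path) for <log_dir>/<product>_snapshotter.log."""
    if not os.access(log_dir, os.W_OK):
        raise IOError("cannot write to %s; create it and fix permissions" % log_dir)
    log_path = os.path.join(log_dir, "%s_snapshotter.log" % product)
    logger = logging.getLogger("snapshotter.%s" % product)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages away from the root logger/console
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger, log_path
```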

Usage

    usage: run_snapshotter.py [-h] -p P

    Run Cassandra Snapshotter

    optional arguments:
      -h, --help  show this help message and exit
      -p P        Product as per config.yaml e.g. myproduct
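The usage text above has the shape that Python's `argparse` generates. A minimal sketch of a parser that produces it, not necessarily how run_snapshotter.py is written (note that Python 3.10 and later print "options:" instead of "optional arguments:"):

```python
import argparse


def parse_args(argv=None):
    """Parse the run_snapshotter.py command line shown in the usage text."""
    parser = argparse.ArgumentParser(description="Run Cassandra Snapshotter")
    # -p is required even though argparse lists it under optional arguments
    parser.add_argument("-p", required=True,
                        help="Product as per config.yaml e.g. myproduct")
    return parser.parse_args(argv)
```

For example, `parse_args(["-p", "myproduct"]).p` gives `"myproduct"`, while omitting `-p` makes argparse print an error and exit with status 2.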