A script to create snapshots of a volume and delete old snapshots.

Summary:
You specify how many snapshots to keep.  By default, this keeps the
most recent n/2 daily snapshots, the most recent n/4 weekly snapshots,
and the most recent n/4 monthly snapshots.
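
As a rough illustration of that default split (a minimal sketch, not the script itself; the helper name `default_split` is hypothetical), the weekly and monthly counts are each a quarter of the budget, rounded down, and the daily count takes whatever is left:

```python
def default_split(max_snapshots):
    """Return (daily, weekly, monthly) counts for a snapshot budget.

    Hypothetical helper mirroring the defaults described above: weekly
    and monthly each get a quarter (rounded down); daily gets the rest.
    """
    weekly = max_snapshots // 4
    monthly = max_snapshots // 4
    daily = max_snapshots - weekly - monthly
    if daily < 1:
        raise ValueError('Must keep at least one daily snapshot!')
    return daily, weekly, monthly


print(default_split(10))  # (6, 2, 2)
print(default_split(16))  # (8, 4, 4)
```

So "n/2 daily" is approximate: when n is not a multiple of 4, the daily count absorbs the remainder.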

Inspired by http://www.geekytidbits.com/rolling-snapshots-ec2/.

Add a note about setting up PATH

A script that makes a (rolling) snapshot of the data on toby.

Run a daily snapshot of toby's data disk at 4am.

Test Plan:
Ran the following and verified they gave an error:
  ./ec2-create-rolling-snapshot.py -n
  ./ec2-create-rolling-snapshot.py -n -d 'backup of toby data'
  ./ec2-create-rolling-snapshot.py -n -d 'backup of toby data' -v vol-06f30e77
  ./ec2-create-rolling-snapshot.py -n -d 'backup of toby data' -v vol-06f30e77 -m 10

Then ran the following and got this output:
  % ./ec2-create-rolling-snapshot.py -n -d 'backup of toby data' -v vol-06f30e77 -m 10 -K ~/aws/pk-backup-role-account.pem -C ~/aws/cert-backup-role-account.pem
  [DRY RUN] Created snap-TBD from vol-06f30e77

It's not a great test; we'll see more as time goes on.  I'll have it
email me every day until we get to see it actually start deleting
things (in about a year's time, since we're allowed 500 snapshots).
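
One way to check the deletion logic without waiting might be to exercise the date calculation on its own. The following is a standalone sketch of the keep rule (last D days, last W Sundays, last M firsts-of-month), written to run without the ec2 tools; the function name `dates_to_keep` is hypothetical, and this is an illustration rather than the code in this commit:

```python
import datetime


def dates_to_keep(num_daily, num_weekly, num_monthly, today):
    """Return the set of dates whose snapshots would be kept."""
    keep = set()

    # The most recent num_daily days, counting today.
    for i in range(num_daily):
        keep.add(today - datetime.timedelta(days=i))

    # The most recent num_weekly Sundays (Monday is weekday 0, Sunday is 6).
    last_sunday = today - datetime.timedelta(days=(today.weekday() + 1) % 7)
    for i in range(num_weekly):
        keep.add(last_sunday - datetime.timedelta(weeks=i))

    # The most recent num_monthly firsts-of-month, wrapping across years.
    year, month = today.year, today.month
    for _ in range(num_monthly):
        keep.add(datetime.date(year, month, 1))
        month -= 1
        if month == 0:
            year, month = year - 1, 12

    return keep


today = datetime.date(2013, 2, 5)  # a Tuesday
for day in sorted(dates_to_keep(3, 2, 2, today)):
    print(day.isoformat())
```

With 3 daily, 2 weekly, and 2 monthly slots on 2013-02-05, this keeps six distinct dates rather than seven, because Sunday 2013-02-03 counts as both a daily and a weekly snapshot.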

Reviewers: benkomalo, alpert

Reviewed By: benkomalo

Differential Revision: http://phabricator.khanacademy.org/D2006
commit 3e83ecb2b7d65844183863db9fb833e58b834edc 1 parent 70cf011
@csilvers csilvers authored
3  README
@@ -2,3 +2,6 @@ Config files for Amazon Web Services go here.
For instance, all the files used to set up and populate the different
classes of ec2 machines that we have.
+
+There is also the 'aws-tools' directory, which has scripts and
+programs that manage AWS, and may be useful for all of our machines.
201 aws-tools/ec2-create-rolling-snapshot.py
@@ -0,0 +1,201 @@
+#!/usr/bin/env python
+
+"""Create a snapshot of a given ec2 EBS volume, deleting old snapshots.
+
+You specify how many snapshots to keep. By default, this keeps the
+most recent n/2 daily snapshots, the most recent n/4 weekly snapshots,
+and the most recent n/4 monthly snapshots.
+
+NOTE: the ec2-* binaries must be on the path!
+
+Inspired by http://www.geekytidbits.com/rolling-snapshots-ec2/.
+"""
+
+
+import datetime
+import subprocess
+
+
+def all_snapshots(volume, description, ec2_arglist, today, dry_run):
+    """(snapshot-id, date) of all snapshots of volume matching description."""
+    if dry_run:
+        # We have to yield the one we pretended we made today.
+        yield ('snapshot-TBD', today.strftime('%Y-%m-%dT%H:%M:%S+0000'))
+
+    output = subprocess.check_output(['ec2-describe-snapshots', '--hide-tags']
+                                     + ec2_arglist)
+    for line in output.splitlines():
+        (unused_type, snapshot_id, volume_id, status, date,
+         unused_pct, unused_owner_id, unused_volume_size,
+         snapshot_description) = line.split('\t')
+        if volume_id == volume and snapshot_description == description:
+            yield (snapshot_id, date)
+
+
+def create_snapshot(volume, description, ec2_arglist, dry_run):
+    if dry_run:
+        print '[DRY RUN] Created snap-TBD from %s' % volume
+        return
+
+    output = subprocess.check_output(['ec2-create-snapshot', '-d', description]
+                                     + ec2_arglist + [volume])
+    # Output is, e.g.
+    #   SNAPSHOT\tsnap-e1cc35a1\tvol-06f30e77\tpending\t\
+    #   2013-02-05T00:08:06+0000\t759597320137\t100\ttest snapshot
+    if output.split()[3] not in ('pending', 'completed'):
+        raise RuntimeError('Snapshot state not pending or completed: "%s"'
+                           % output)
+    print 'Created %s from %s' % (output.split()[1], volume)   # snapshot-id
+
+
+def delete_snapshot(snapshot_id, snapshot_date, ec2_arglist, dry_run):
+    if dry_run:
+        print '[DRY RUN] Deleting %s (%s)' % (snapshot_id, snapshot_date)
+        return
+
+    output = subprocess.check_output(['ec2-delete-snapshot']
+                                     + ec2_arglist + [snapshot_id])
+    # Output is, e.g.
+    #   SNAPSHOT\tsnap-e1cc35a1
+    if output.strip() != 'SNAPSHOT\t%s' % snapshot_id:
+        raise RuntimeError('Unexpected output from ec2-delete-snapshot: "%s"'
+                           % output)
+    print 'Deleted %s (%s)' % (snapshot_id, snapshot_date)
+
+
+def calculate_good_snapshots(num_daily, num_weekly, num_monthly, today):
+    """Calculate the date-prefix for snapshots we should keep.
+
+    Arguments:
+        num_daily: number of daily snapshots we should keep (including today)
+        num_weekly: number of weekly snapshots we should keep
+        num_monthly: number of monthly snapshots we should keep
+        today: today, as a datetime.date, in UTC timezone.
+
+    Returns:
+        A set of datetime.date objects: all snapshots created on those
+        days we should keep.  The rule is we keep the last D daily
+        snapshots, the last W Sunday snapshots, and the last M
+        1st-of-month snapshots, starting counting with today's
+        snapshot.
+    """
+    keep = set()
+    for i in xrange(num_daily):
+        day = today - datetime.timedelta(i)
+        keep.add(day)
+
+    # Sunday has weekday 6, so this gets us to the previous sunday.
+    last_sunday = today - datetime.timedelta((today.weekday() + 1) % 7)
+    for i in xrange(num_weekly):
+        day = last_sunday - datetime.timedelta(i * 7)
+        keep.add(day)
+
+    last_first_of_month = [today.year, today.month, 1]
+    for i in xrange(num_monthly):
+        day = datetime.date(*last_first_of_month)
+        keep.add(day)
+        last_first_of_month[1] -= 1
+        if last_first_of_month[1] == 0:
+            last_first_of_month[0] -= 1
+            last_first_of_month[1] += 12
+
+    return keep
+
+
+def delete_old_snapshots(all_snapshots,
+                         num_daily, num_weekly, num_monthly, today,
+                         ec2_arglist, dry_run):
+    """all_snapshots is a list of (snapshot_id, snapshot_date) pairs."""
+    to_keep = calculate_good_snapshots(num_daily, num_weekly, num_monthly,
+                                       today)
+    for (snapshot_id, snapshot_date) in all_snapshots:
+        # date is in format 'YYYY-MM-DDTHH:MM:SS+TTZZ'
+        snapshot_dt = datetime.date(int(snapshot_date[0:4]),
+                                    int(snapshot_date[5:7]),
+                                    int(snapshot_date[8:10]))
+        if snapshot_dt not in to_keep:
+            delete_snapshot(snapshot_id, snapshot_date, ec2_arglist, dry_run)
+
+
+def main(volume, description, max_snapshots,
+         num_daily, num_weekly, num_monthly, ec2_arglist, dry_run,
+         today=datetime.date.today()):
+    """Snapshot the volume, then delete 'old' snapshots matching 'description'.
+
+    NOTE: the ec2-* binaries must be on $PATH!
+
+    Arguments:
+        volume: the ec2 EBS volume to snapshot.
+        description: used as the snapshot description.  All snapshots
+            sharing the same description are part of a 'snapshot series'.
+        max_snapshots: do not keep more than this many snapshots in
+            one snapshot series.
+        num_daily: how many daily snapshots to keep.  If None, make it
+            max_snapshots - num_weekly - num_monthly.
+        num_weekly: how many weekly snapshots to keep.  If None, make it
+            max_snapshots / 4
+        num_monthly: how many monthly snapshots to keep.  If None, make it
+            max_snapshots / 4
+        ec2_arglist: a list like ['-K', 'foo', '-C', 'foo'].  It is passed
+            directly to the ec2 snapshot commands.
+        dry_run: if True, just say what we'd do, but don't do it.
+        today: the day from which we start counting snapshots to keep.
+            It should be a datetime.date() object in UTC.
+    """
+    snapshots = list(all_snapshots(volume, description, ec2_arglist, today,
+                                   dry_run))
+
+    if num_weekly is None:
+        num_weekly = max_snapshots / 4
+    if num_monthly is None:
+        num_monthly = max_snapshots / 4
+    if num_daily is None:
+        num_daily = max_snapshots - num_weekly - num_monthly
+    if num_daily < 1:
+        raise ValueError('Must keep at least one daily snapshot!'
+                         ' (daily=%s, weekly=%s, monthly=%s)'
+                         % (num_daily, num_weekly, num_monthly))
+
+    create_snapshot(volume, description, ec2_arglist, dry_run)
+    delete_old_snapshots(snapshots, num_daily, num_weekly, num_monthly, today,
+                         ec2_arglist, dry_run)
+
+
+if __name__ == '__main__':
+    import argparse
+    parser = argparse.ArgumentParser(
+        description='Create a new snapshot and delete too-old snapshots.')
+    parser.add_argument('--description', '-d', required=True,
+                        help=('Identify related snapshots (related == share'
+                              ' a description).  Passed to ec2.'))
+    parser.add_argument('--dry_run', '-n', action='store_true',
+                        help='Say what we would do without doing it')
+    parser.add_argument('--volume', '-v', required=True,
+                        help='volume-id of the EBS volume to snapshot')
+    parser.add_argument('--max_snapshots', '-m', type=int,
+                        required=True,
+                        help='The number of snapshots to keep')
+    parser.add_argument('--max-weekly-snapshots', type=int,
+                        help=('How many weekly snapshots to keep.  Must be'
+                              ' less than --max_snapshots.  Default is'
+                              ' max_snapshots / 4'))
+    parser.add_argument('--max-monthly-snapshots', type=int,
+                        help=('How many monthly snapshots to keep.  Must be'
+                              ' less than --max_snapshots.  Default is'
+                              ' max_snapshots / 4'))
+    # max_daily_snapshots is always max_snapshots - weekly - monthly.
+    ec2_args = ('-K', '-C', '-U', '--region')
+    for ec2_arg in ec2_args:
+        parser.add_argument(ec2_arg, help='Passed directly to ec2 commands')
+
+    args = parser.parse_args()
+    ec2_arglist = []
+    for a in ec2_args:
+        a_varname = a.lstrip('-').replace('-', '_')
+        if getattr(args, a_varname, None) is not None:
+            ec2_arglist.append(a)                          # e.g. '--region'
+            ec2_arglist.append(getattr(args, a_varname))   # e.g. 'us-east-1'
+
+    main(args.volume, args.description, args.max_snapshots,
+         None, args.max_weekly_snapshots, args.max_monthly_snapshots,
+         ec2_arglist, args.dry_run)
4 internal-webserver/crontab
@@ -14,3 +14,7 @@ PATH = /usr/local/bin:/usr/bin:/bin
# Check for elevated issue rates on Github.
16 * * * * sh -c 'python $HOME/beep-boop/github_reports.py >> $HOME/beep-boop/github.log' || echo "Failed to run beep-boop for github"
+
+# Do a backup every day.
+# We freeze the xfs volume to get a consistent snapshot.
+ 0 4 * * * sh -c '$HOME/aws-config/internal-webserver/snapshot_toby.sh' || echo "Failed to snapshot toby's data"
16 internal-webserver/setup.sh
@@ -31,6 +31,9 @@
# Bail on any errors
set -e
+# Activate the multiverse! Needed for ec2-api-tools
+sudo perl -pi.orig -e 'next if /-backports/; s/^# (deb .* multiverse)$/$1/' \
+ /etc/apt/sources.list
sudo apt-get update
install_basic_packages() {
@@ -53,6 +56,18 @@ install_basic_packages() {
sudo service postfix restart
}
+install_ec2_tools() {
+    sudo apt-get install -y ec2-api-tools
+    mkdir -p "$HOME/aws"
+    echo "Copy the pk-backup-role-account.pem and cert-backup-role-account.pem"
+    echo "files from dropbox to $HOME/aws:"
+    echo "   https://www.dropbox.com/home/Khan%20Academy%20All%20Staff/Secrets"
+    echo "Also, make sure there is an IAM user called 'backup-role-account'"
+    echo "with permissions from 'backup-role-account-permissions'."
+    echo "Then hit enter to continue"
+    read prompt
+}
+
install_repositories() {
echo "Syncing internal-webserver codebase"
sudo apt-get install -y git
@@ -361,6 +376,7 @@ install_publish_notifier() {
cd "$HOME"
install_basic_packages
+install_ec2_tools
install_repositories
install_root_config_files
install_user_config_files
33 internal-webserver/snapshot_toby.sh
@@ -0,0 +1,33 @@
+#!/bin/sh
+
+# A script that makes a (rolling) snapshot of the data on toby.
+#
+# This assumes that the volume holding toby's data has a tag named
+# 'Name' with value 'toby data ...', and that this volume is an XFS
+# volume.
+
+set -e # die if any command fails
+
+VOLUME=`ec2-describe-volumes \
+    -K ~/aws/pk-backup-role-account.pem \
+    -C ~/aws/cert-backup-role-account.pem \
+    | grep -e 'TAG.*toby data' \
+    | cut -f3`
+if [ -z "$VOLUME" ] || [ `echo "$VOLUME" | wc -l` -ne 1 ]; then
+    echo "Cannot find a unique volume tagged with the name 'toby data'."
+    exit 1
+fi
+
+PATH="$PATH":/usr/sbin:/usr/bin # for xfs_freeze and for ec2-*.
+
+/usr/sbin/xfs_freeze -f /dev/xvdf1
+trap '/usr/sbin/xfs_freeze -u /dev/xvdf1' 0 1 2 3 6 15
+
+"$HOME/aws-config/aws-tools/ec2-create-rolling-snapshot.py" \
+    -m 16 \
+    -d 'backup of toby data' \
+    -v "$VOLUME" \
+    -K ~/aws/pk-backup-role-account.pem \
+    -C ~/aws/cert-backup-role-account.pem
+
+