Skip to content
This repository has been archived by the owner on Jun 27, 2019. It is now read-only.

BigglesZX/s3sb

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

//////////////////////////////////////////////////////////////////////////


  s3sb has been superseded by a Python rewrite known helpfully as pys3sb
                    http://github.com/BigglesZX/pys3sb
  no further updates to s3sb will be committed - please check out pys3sb


/////////////////////////////////////\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
//////////////////                                      \\\\\\\\\\\\\\\\\\
///////////                     _.-~^^~-._                     \\\\\\\\\\\
///////                     .-~'   s3sb   '~-.                     \\\\\\\
////                       {  S3 Site Backup  }                       \\\\
//                          \       by       /                          \\
//                           :   BigglesZX  :                           \\
//                           "              "                           \\
//                            *            *                            \\
///                                                                    \\\
// / / / / / / / / / / / / / / / / /  \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \\

// 0x00 . Intro  /////////////////////////////////////////////////////////

Welcome to the s3sb README. This file should, if you're lucky (and if
I've finished writing it), contain everything you need to know to get s3sb
running in your environment.

This file will change along with s3sb itself, as different features are
improved and old behaviours retired. The time I have available to work on
projects like s3sb varies immensely, so I'll try to make sure this file 
stays up to date - even in describing behaviours that will soon be
retired - so that you can use s3sb whatever state it's in.


// 0x01 . What is s3sb?  /////////////////////////////////////////////////

In one sentence, s3sb - S3 Site Backup - is a Bash shell script designed
to back up whole websites - files and database contents - on to Amazon S3.

It allows you to specify a directory where files are located, as well as 
connection details for the database used by that site, and when executed 
it will pack up those files into a .tar.gz file, dump the database into a 
.sql.gz file, then upload both those files to an Amazon S3 account, filing
them neatly into a bucket named after the site. Timestamps are included in
the filenames to help future retrieval. How useful!

s3sb was designed to fulfil a need I had in a shared hosting environment.
Your environment may differ, so I'm trying to ensure that s3sb can cope 
with a variety of hosting scenarios so that it can be as helpful to you as
it is to me. It probably won't be perfect yet, but I'm working on it.

s3sb makes use of the s3-bash library created by Raphael James Cohn 
(http://code.google.com/p/s3-bash/). The library hasn't been updated since
version 0.02 in late 2007, so I have made a few tweaks to the code to 
remedy a few issues. Accordingly, s3sb includes the s3-bash files required
to get the whole thing running. As s3-bash is licensed under Apache 
License 2.0 (http://www.apache.org/licenses/LICENSE-2.0), so is s3sb. See
the LICENSE file for more details.

// 0x02 . Prerequisites  /////////////////////////////////////////////////

s3sb is designed to be run from shell or crontab, so you'll need a UNIX-
like environment in which to do that. s3sb should run without issue on Mac
OS X, but this hasn't been tested as much as Linux. Windows (Cygwin) has 
not been tested at all. You're welcome to submit patches to address any 
OS-specific deficiency.

The heart of s3sb's functionality is of course its interfacing with Amazon
S3. So you'll need an Amazon Web Services (AWS) account, first and 
foremost. Once you've initialised S3 on your AWS account, you'll need your
"key" and "secret key" strings in order to allow s3sb access to your S3 
account. Instructions for setting up your key and secret keys in the s3sb 
config files are provided in the Installation and Configuration section.

Since s3sb includes the s3-bash library, you will also need to satisfy the
dependencies of that library. They are GNU coreutils, curl and openssl.
Please see the documentation for your system for instructions on getting 
those packages installed (you've probably got them installed already, 
however, if you're using any recent version of a Linux distribution).

As previously mentioned, I've tweaked the bundled version of s3-bash to 
fix a few issues that were present in the stock code. Please see the 
s3-bash Issues page on Google Code 
(http://code.google.com/p/s3-bash/issues/list) for full details on the 
bugs. The fixes are:

  > Issues #6 and #11
  A bug in the way headers were signed, fixed thanks to the patch provided
  by batlogg on Issue #11.

  > Issue #7
  A URL-forming bug preventing the use of European S3 buckets, fixed 
  thanks to the patch provided by jon.keys

  > Issue #17
  URLencode entities (e.g. %20) in filenames would break put operations. 
  Fixed thanks to the suggestion in the Issue comments from "admin".

Feel free to use the code from my updated s3-bash files to patch your own 
s3-bash installation. The Google Code project for s3-bash has seen very 
little activity since the 0.02 release in 2007 so it's unlikely that 
official updates will be forthcoming any time soon. If there are any 
updates in the future I'll try and merge them into s3sb where appropriate.


// 0x03 . Installation and Configuration  ////////////////////////////////

Installation of s3sb is pretty simple - just clone the git repos or 
grab the files by more manual means. Stick them in a sensibly named 
directory on the machine from which you wish to make your backups. You 
could call it "s3sb", "s3sitebackup", or even "awesomebackuptool". The 
choice is yours.

Once the files are in place, you'll probably need to make the main s3sb 
script executable. In most UNIX-like environments, this can be 
accomplished with the following command (substitute the name of your s3sb
directory from the previous step):

  $ cd awesomebackuptool/
  $ chmod u+x s3sb

Note that (as with all command-line examples in this README) the $ 
indicates your shell prompt, and should not be typed :).

The next step is to configure your AWS key and secret key. At the moment 
these are stored in two text files in the lib/ directory.

Note: for reasons I won't go into here, the files containing your key and 
secret key must be exactly the right length - the key file must be 20 
bytes and the secret key file 40 bytes. What I've found usually happens is
that most text editors append a newline to the end of your file when 
editing, creating a horrid undesirable extra byte. Oh noes!

Thankfully, avoiding this is easy. Browse to the lib/ directory in your 
s3sb installation, copy your AWS key (not secret key) to the clipboard, 
and do the following:

  $ echo -n "<paste your key>" > key.txt

Ignore the <angle brackets>, but keep the double quotes! This will create
key.txt, containing your AWS key and no trailing newline, phew.

Creating your secret key file is a very similar process - copy your AWS 
secret key to the clipboard, then do:

  $ echo -n "<paste your secret key>" > skey.txt

Once again, ignore the angle brackets, but keep the double quotes.

Now, MOST IMPORTANTLY, if you're installing s3sb in a shared environment 
(e.g. a server on which there are users besides yourself), you MUST change
the permissions on the key and secret key files so that they're readable 
only by your user. In fact, it's good practice to do this even if there 
aren't any other users on your server, as it's possible there will be in 
the future and you can bet you won't remember that your secret key file 
is world readable. So, back in the lib/ directory, run the following 
command:

  $ chmod 600 key.txt skey.txt

This will reset the permissions on your key files so they are only 
readable by your user. Nobody else will be able to view their contents.

You can check the permissions were applied correctly by running (in lib/):

  $ ls -lash *key.txt

You should see something like this:

  0 -rw-------  1 youruser yourgroup 20 2009-05-31 05:02 key.txt
  0 -rw-------  1 youruser yourgroup 40 2009-05-31 03:14 skey.txt

The permission settings appear in the second column. Yours should look
exactly like this. If you're in any doubt here, please consult your 
operating system documentation.

Incidentally, the reason s3sb uses files to store your keys and does't 
just read them off the command line is because anything passed as an 
argument to a command can be viewed by other users on the system (in "ps",
for example). And we don't want anyone getting hold of your S3 keys for 
obvious reasons.

Finally, create a new bucket on your S3 account named "s3sb" (this will be
configurable later). Your backups will be placed in this bucket.


// 0x04 . Usage  /////////////////////////////////////////////////////////

Currently, s3sb takes a bunch of parameters on the command line in order 
to configure the files and data that will be involved in your backup. One 
of the first things on my TODO list is to rejig s3sb so that sites are set
up with individual "profiles" in a central config location. That hasn't 
quite happened yet however, so for the time being s3sb must be given all 
the necessary parameters on the command line.

s3sb runs in one of three "modes" - "files", "db" or "all". "Files" mode 
will back up only the files in your site (not the database); "db" mode 
only backs up the database (not the files); while the "all" mode will back
up everything. These modes exist so that you can back up files and db on 
different schedules - for example I have a bunch of sites where the files 
change far less often than the data, so I might back up the files every 
week and the data every day.

Regardless of what mode you run s3sb in, the parameters it takes are the 
same. You'll need the following info:

  i.   A 'friendly name' for your site (e.g. "acmewidgets") - this is used
       to name the directory and archive files into which your backups are
       packed and stored. No spaces or non-alphanumeric characters.

  ii.  The hostname for your database server where the site's database is 
       hosted (it should also be accessible from the machine where the 
       backup is being run - chances are it will be) - you can also use an
       IP address.

  iii. The username you want to use to connect to the database.

  iv.  The password for the above user.

  v.   The name of the database you want to connect to.

  vi.  The directory on your server where the website files that you want 
       to back up reside. I usually set this to include the entire 
       website, but you may wish to only back up a specific subdirectory,
       for whatever reason.

Running s3sb with no arguments/parameters will give you the usage message,
which describes where the info above needs to go:

  $ ./s3sb
  Usage: ./s3sb [all | files | db] sitename dbhost dbuser dbpassfile 
  dbname dirname/

The only cryptic argument here should be "dbpassfile". As mentioned in the
previous section, we don't want to enter passwords as arguments on the 
command line, as these are visible to other users. So, put your db 
password (for our example "acme" site) into its own text file, as we did
with the AWS keys above:

  $ echo -n "<acmedbpass>" > acme.txt
  $ chmod 600 acme.txt

This will save your password into acme.txt and set its permissions so only
your user can read it. If this seems awkward it's because it is - in a 
near-future version we'll be moving this config into a dedicated file as 
previously mentioned.

So, to back up our example "acmewidgets" site, you'd run something like 
the following:

  $ ./s3sb all acmewidgets mysql.acmewidgets.com acmedb 
  /home/youruser/acme.txt acmewidgets /var/www/acme-public/

Note that I've split the above command onto multiple lines for easier 
readability; when you type it out in your shell it should all be on the 
same line. s3sb will fire up, and you should see output similar to the 
following:

  s3sb: starting backup of acmewidgets (mode: all)...
  dumping database...
  archiving site files...
  tar: Removing leading `../' from member names
  uploading database dump...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time
  Current  Dload  Upload   Total   Spent    Left  Speed
  100  6439  0  0  100  6439  0  29870 --:--:-- --:--:-- --:--:--  101k
  uploading site archive...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time
  Current  Dload  Upload   Total   Spent    Left  Speed
  100 10.9M  0  0  100 10.9M  0   936k  0:00:11  0:00:11 --:--:-- 1137k
  cleaning up...
  done

Start your S3 browser of choice (I can recommend Cyberduck for OS X or the
Amazon S3 Firefox Organizer - 
https://addons.mozilla.org/en-US/firefox/addon/3247) and connect to your 
S3 account. Browse to the bucket you created earlier (probably called 
s3sb). Within it you should see a directory named after your site 
(acmewidgets), and within that should be two files, one .tar.gz containing
your site files, and the other a .sql.gz containing your database dump. 
Download them, unpack them somewhere and check that all is in order - make
sure they include everything you're expecting.

You may now want to set up a cron job to run s3sb on a schedule. See the 
documentation for cron on your system for more info on this. You'll want 
to keep email notifications turned on the first few times to make sure 
your backup is working properly. Also bear in mind the point in the Known 
Issues section below about concurrency.


// 0x05 . Known Issues  //////////////////////////////////////////////////

s3sb is a work in progress - I've mostly been focussing on preparing my 
original icky code for a github release (and writing this long-ass file), 
so needless to say there are issues in the current s3sb code which you 
should be aware of. These are being worked on - they're just not fixed 
yet :).

  i.   Awkward config: As previously mentioned, having to specify all the 
       options for a backup right there on the command line every time
       s3sb is run isn't the most convenient arrangement. Soon(TM), s3sb 
       will use a new central config file that'll allow you to set up 
       individual site "profiles" that contain all the config info s3sb 
       needs, allowing you to start a backup run with a much shorter 
       command. It'll also save you from having to deal with separate 
       password files for each site. Phew.

  ii.  Error handling: Right now s3sb just runs through all the steps in
       the dump/package/upload cycle without doing much error checking to
       properly deal with failures in one or more steps. Additionally s3sb
       itself doesn't return a meaningful error code, making it tricky to
       use it in conjunction with other scripts. This is in the TODO list
       and will eventually be written in.

  iii. S3 security: Make sure you trust anyone else who has access to your
       S3 account, as the backup files that s3sb produces are not 
       encrypted. This is being considered as a future feature.

  iv.  Concurrency: I don't have any reason to suspect that this will be a
       problem, but I run all my s3sb cron jobs with a few minutes' space
       between them, to ward off any possible concurrency issues. I do 
       this only because I haven't tested it. Future versions will be 
       tested in this area :).

If you discover any other issues while using s3sb (and it's likely there
will be others), feel free to report them on github and I'll do my best to
fix 'em.


// 0x06 . Roadmap  ///////////////////////////////////////////////////////

The following things are on the list for near-future development (in no 
particular order).

  i.   Centralised config/site profiles: Mentioned a bunch of times above,
       this is what I want to get written into s3sb ASAP.

  ii.  Error handling: As mentioned in Known Issues, s3sb needs to handle
       errors properly.

  iii. Concurrency testing: Need to ensure as much as possible that s3sb 
       can run multiple instances concurrently without creating horrible
       issues in space/time.

  iv.  Backup pruning: I'd like to add the ability for s3sb to 
       automagically prune backups older than a specified age from the S3
       bucket. This strikes me as being a tiny bit tricky but it's on the
       wishlist.

  v.   Backup encryption: As mentioned above, the backups that s3sb puts
       on S3 aren't encrypted. It would be nice to have this as a feature
       in the future, perhaps using something simple like passworded AES.

  vi.  Granular DB backup: I'd like to provide more control over the 
       database side of the backup process, in as much as drilling down to
       include/exclude specific tables.

  vii. Granular file backup: As with the previous point, more granular
       control over the files included in the backup would be good, e.g. 
       the ability to exclude certain files/dirs.

  viii.Add size report for generated archives.

  viv. Check for existence of key files; s3-bash libs, etc

  vv.  Check permissions on key files when run.

That's it for now. Feel free to suggest other features you think would be
useful. I can't promise to be able to add them in a timely fashion (you 
could always submit your own patch), but I'll definitely consider all 
suggestions.

If you made it this far, congratulations! Your dedication to the README is
much appreciated. Hope you enjoy s3sb - let me know how you get on using 
it :). This is my first proper OSS release and it would give me great 
pleasure to know that someone else is using and enjoying something I 
wrote :).

Happy landings,
                                                                 BigglesZX
                                              http://github.com/BigglesZX/
                                          http://github.com/BigglesZX/s3sb


// EOF ///////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////

Releases

No releases published

Packages

No packages published

Languages