
Syncing from S3 to S3 or from S3 to local directory #344

Closed
webmat opened this Issue · 12 comments

7 participants

@webmat

Is it possible to sync from one S3 bucket to another?

As I understand it, syncers let you specify a cloud destination, and then, in the 'directories' block, you list directories local to your server.

This works if your data is on your server and you only back up to S3.

But there are other scenarios where syncing in other directions would be very useful:

  • you need to restore from S3 to your server (and s3sync won't cut it, for the same reason this gem stopped using it)
  • your prod environment stores files directly on S3 (e.g. with Paperclip or Carrierwave) and you want to back up the contents of your prod bucket elsewhere (to another S3 bucket in another region, to Rackspace Cloud Files, or to the server running the backup operation)
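Both scenarios boil down to comparing two listings and copying the difference. A minimal sketch, assuming each side has already been listed into a hypothetical `{key: (size, etag)}` map (fetching the listings is left out): `plan_sync` works out a one-way sync and, by default, never deletes from the destination, so a file removed by mistake at the source survives in the copy. Comparing on size and ETag is an assumption; ETags of multipart uploads can differ even when the content is identical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical listing format: object key -> (size in bytes, ETag).
Listing = Dict[str, Tuple[int, str]]


@dataclass
class SyncPlan:
    to_copy: List[str]
    to_delete: List[str]


def plan_sync(source: Listing, dest: Listing, delete_extra: bool = False) -> SyncPlan:
    """Plan a one-way sync from source to dest.

    A key is copied when it is missing from dest or its (size, ETag) differs.
    Keys that exist only in dest are deleted only when delete_extra is True,
    so by default deletions at the source are never propagated.
    """
    to_copy = sorted(key for key, meta in source.items() if dest.get(key) != meta)
    to_delete = sorted(key for key in dest if key not in source) if delete_extra else []
    return SyncPlan(to_copy, to_delete)


if __name__ == "__main__":
    src = {"a.jpg": (10, "e1"), "b.jpg": (20, "e2")}
    dst = {"a.jpg": (10, "e1"), "old.jpg": (5, "e9")}
    print(plan_sync(src, dst))
    # SyncPlan(to_copy=['b.jpg'], to_delete=[])
    print(plan_sync(src, dst, delete_extra=True))
    # SyncPlan(to_copy=['b.jpg'], to_delete=['old.jpg'])
```

The same plan works for S3 to S3 and for S3 to a local directory; only the listing and copy steps change.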

Has anyone else encountered this need? Is it already possible and I've simply missed how to do it?

Thanks!

@ryanstout

I'm also curious about this. I have data going directly to S3 and I want to create a copy, not to handle S3 failure (which really shouldn't happen), but to handle the situation where a bug deletes the original files.

@eric-smartlove

I am in the exact same situation as @ryanstout.

@dlackty

@ryanstout @eric-smartlove I believe AWS's new Data Pipeline service is the solution for this situation.

@eric-smartlove

Thank you @dlackty for pointing out this recent AWS feature, that is really interesting.

Although it looks overcomplicated at first sight for simply copying data, this tutorial can help a lot.

However, according to this message, AWS Data Pipeline does not seem to support copying a full bucket or directory, so it is not yet a suitable tool for easily backing up an application's data (unless all files are in one or a few directories).

For now, I will stick with copy/paste in the AWS console(!). For everyone else, see this thread, which gives some alternatives.

@brandonparsons

I really think that the same functionality Rsync::Pull provides should be available for S3 syncing. I'm not sure whether something technical makes this hard to do.

I'd like to be able not only to mirror local directories on S3, but also to mirror an S3 directory on my local drive (i.e. update my local copy to reflect changes on S3). Or am I missing something and this is already possible?

@webmat

@brandonparsons have you tried s3cmd for this (the Python tool, not the old Ruby gem)? I've used it successfully on many occasions. It does directory syncing, individual uploads/downloads (duh), bucket creation/deletion, and has a bunch of CloudFront features too.

I've pushed it to its limits with a big-ish number of files (860,000+, ~10 GB), but on a smaller scale it works great.

I also have a proof of concept for a command-line tool that can parallelize this, work at a greater scale, and also do bucket-to-bucket syncing. But since I don't need it anymore, it's been on the back burner for 6+ months... Is anybody interested in that?
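Since each object copy is an independent, I/O-bound request, a thread pool is one straightforward way to parallelize the work. A minimal sketch, not the tool mentioned above: `copy_in_parallel` and its `copy_one` callback are hypothetical names, and `copy_one` would in practice wrap a single-object copy (for example a server-side copy through boto3's `copy_object`).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional


def copy_in_parallel(
    keys: Iterable[str],
    copy_one: Callable[[str], None],
    workers: int = 16,
) -> Dict[str, Optional[Exception]]:
    """Run copy_one(key) for every key on a thread pool.

    Returns key -> None on success, or key -> the raised exception,
    so one failing object does not abort the whole run.
    """
    results: Dict[str, Optional[Exception]] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(copy_one, key): key for key in keys}
        for future in as_completed(futures):
            key = futures[future]
            try:
                future.result()
                results[key] = None
            except Exception as exc:  # record the failure and keep going
                results[key] = exc
    return results


if __name__ == "__main__":
    def fake_copy(key: str) -> None:
        # Stand-in for a real copy call; fails for one key to show error reporting.
        if key == "broken.jpg":
            raise OSError("simulated failure")

    outcome = copy_in_parallel(["a.jpg", "b.jpg", "broken.jpg"], fake_copy, workers=4)
    failed = sorted(key for key, err in outcome.items() if err is not None)
    print(failed)  # ['broken.jpg']
```

Collecting exceptions per key instead of failing fast lets a large run finish and report which objects need a retry.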

@brandonparsons
@tombruijn tombruijn added the Question label
@tombruijn
Owner

Hi all, I know this is quite an old issue, but I'm cleaning house in the Backup issue tracker.

I feel like this issue is not necessarily within the Backup gem's scope. If you're able to use Backup for this, great! Feel free to discuss your experience here, but I will close the issue.

@tombruijn tombruijn closed this
@eric-smartlove

So you mean the Backup gem is meant for backing up local data (files), but not online data?
I understand that this feature is not implemented because of a lack of developers motivated to do it (and I plead guilty), but it seems a bit strange to me to say it's outside the scope.
In the future, there will probably be more and more online data and less and less local data.

@tombruijn
Owner

I see I read too quickly: this is not just about syncing data between S3 buckets or from S3 to other servers, but about projects that host some of their data in other locations such as S3.

In that case, I have to say it's not a planned feature for v4, but I'll ping @meskyanichi to see whether he finds it interesting for v5 and can perhaps take it into consideration while designing it.

@tombruijn tombruijn reopened this
@tombruijn tombruijn added this to the Version 5 milestone
@rchristensen

+1 for syncing from S3 to a local directory. Backup fits almost every need I have for a backup project I'm working on, except that one.

@tombruijn tombruijn added Suggestion and removed Question labels
@tombruijn tombruijn referenced this issue in backup/backup-features
Open

Backing up S3 #9

@tombruijn
Owner

Issue moved to the features repository: backup/backup-features#9

@tombruijn tombruijn closed this
@tombruijn tombruijn removed this from the Version 5 milestone