I'm syncing files on our server to S3 successfully. However, one particular directory - uploads (where people upload a variety of documents) - fails with the following:
[2012/02/29 06:58:10][message] Generating checksums for /storage/data/uploads
[2012/02/29 06:58:15][error] ModelError: Backup for Backup to S3 (cnbc_s3_backup) Failed!
[2012/02/29 06:58:15][error] An Error occured which has caused this Backup to abort before completion.
[2012/02/29 06:58:15][error] Reason: ArgumentError
[2012/02/29 06:58:15][error] invalid byte sequence in UTF-8
[2012/02/29 06:58:15][error] Backtrace:
[2012/02/29 06:58:15][error] /home/mlarocque/.rvm/gems/ruby-1.9.3-p125/gems/backup-3.0.23/lib/backup/syncer/cloud.rb:101:in `split'
[2012/02/29 06:58:15][error] /home/mlarocque/.rvm/gems/ruby-1.9.3-p125/gems/backup-3.0.23/lib/backup/syncer/cloud.rb:101:in `local_files'
[2012/02/29 06:58:15][error] /home/mlarocque/.rvm/gems/ruby-1.9.3-p125/gems/backup-3.0.23/lib/backup/syncer/cloud.rb:93:in `all_file_names'
It appears that one or more of the filenames has characters which backup doesn't like.
Would validating the user's uploaded file names be an option?
I don't think backup should change the names, since those files when restored would be different.
I agree that validating file names should be done, and I believe it is now. These files are from a 5+ year old Rails app and there have been a ton of file uploads over that time. Perhaps it might be possible for backup to fail somewhat less spectacularly and identify the file that is causing the issue?
This occurs only in Ruby 1.9, which is strict about strings whose bytes are not valid in their declared encoding.
Usually this is solvable with iconv, converting the offending names either to UTF-8 (recommended) or down to ASCII. Who's in to fix it?
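For reference, the conversion suggested above can also be done with `String#encode`, since Iconv is deprecated as of Ruby 1.9.3. This is only a minimal sketch: `to_utf8` is a hypothetical helper, and ISO-8859-1 as the legacy encoding is an assumption, since the thread does not say which encoding the old uploads used.

```ruby
# Hypothetical helper (not part of backup): return the name unchanged if it
# is already valid UTF-8, otherwise reinterpret its bytes as a legacy
# encoding and transcode them to UTF-8.
# ASSUMPTION: the legacy encoding is ISO-8859-1.
def to_utf8(name, legacy_encoding = Encoding::ISO_8859_1)
  utf8 = name.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  name.dup.force_encoding(legacy_encoding).encode(Encoding::UTF_8)
end

# The bytes "caf\xE9": "café" as written by a Latin-1 client.
latin1_name = [0x63, 0x61, 0x66, 0xE9].pack('C*')
fixed = to_utf8(latin1_name)
puts fixed.valid_encoding?
```

This only produces a converted string. Renaming the files on disk would be a separate step, and it changes what gets restored later.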
update Cloud Syncers
- Syncer::Cloud namespace
`sync_with S3` -> `sync_with Cloud::S3`
`sync_with CloudFiles` -> `sync_with Cloud::CloudFiles`
- Warn user if paths contain invalid UTF-8 byte sequences (#288)
@mlarocque This has been taken care of in the 'cloud-syncers' branch if you're still needing this :)
It will simply skip any file paths with invalid UTF-8 characters and log a warning with a reference to the skipped path.
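Roughly, the skip-and-warn behaviour looks like the sketch below. This is an illustration only, not the code in the 'cloud-syncers' branch: `partition_paths` and the sample paths are hypothetical.

```ruby
# Hypothetical sketch: split a list of paths into those that are valid
# UTF-8 and those that are not, so the invalid ones can be skipped.
def partition_paths(paths)
  paths.partition do |path|
    path.dup.force_encoding(Encoding::UTF_8).valid_encoding?
  end
end

paths = [
  'uploads/report.pdf',
  # The bytes "caf\xE9.txt": a Latin-1 name, which is not valid UTF-8.
  [0x63, 0x61, 0x66, 0xE9, 0x2E, 0x74, 0x78, 0x74].pack('C*')
]

valid, skipped = partition_paths(paths)

# Log a warning that references each skipped path.
skipped.each do |path|
  warn "Skipping path with invalid UTF-8: #{path.inspect}"
end
```

Only the paths in `valid` would then be checksummed and synced.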
There were other changes made and issues addressed in this branch, as well as those addressed in 'develop' (which cloud-syncers is based on). So, it may take several days before we're comfortable releasing a gem with these updates.
You can add the following to your Gemfile for now, so you can sync and identify those bad files.
gem 'backup',
  :git => 'git://github.com/meskyanichi/backup.git',
  :branch => 'cloud-syncers'
@burns UTF-8 is allowed on the filesystem and on S3. So shouldn't a fix handle UTF-8 characters in filenames?
@bjensen The problem here was invalid UTF-8. If the file name is valid UTF-8, then there's no problem.
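To make the distinction concrete, here is a small sketch with made-up file names. The first is "café" encoded as UTF-8 and the second is the same name in Latin-1 bytes. Both contain a non-ASCII character, but only the second is an invalid UTF-8 byte sequence.

```ruby
# Example names are illustrative only.
# "café" encoded as UTF-8: the é is the two bytes C3 A9.
valid_name = [0x63, 0x61, 0x66, 0xC3, 0xA9].pack('C*').force_encoding(Encoding::UTF_8)

# "café" encoded as Latin-1: the é is the single byte E9, which on its
# own is not a complete UTF-8 sequence.
invalid_name = [0x63, 0x61, 0x66, 0xE9].pack('C*').force_encoding(Encoding::UTF_8)

puts valid_name.valid_encoding?   # true
puts invalid_name.valid_encoding? # false
```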