Feature request: easier Glacier integration #47
Comments
Thanks for thinking about this. I think Glacier is important, but just haven't had time to work on it. The .bs files are caches of data about differences between snapshots, which are used to decide what to send (e.g. which snapshots should be pulled out of Glacier). They can (mostly) be recreated, so are helpful but not required. Since the .bs files are small, I'd lean towards something like #3 -- or maybe reorganize them so they are easier to carve out with a transition rule.
Something's wrong, beyond the .bs files. As a temporary solution before putting my hands in the buttersink code, I wrote this:
Then I implemented a transition rule that moves everything with tag ToGlacier=Y to Glacier after 1 day. I run buttersink:
followed by my script, and 2 days later everything looks good:
Files on s3:
However, when I run buttersink again (same command as above), which in theory should simply recognize that everything is already aligned and exit, it malfunctions and tries to re-send the last snapshot, but with a changed (suboptimal) parent:
I think this has to do with 20170320-000001 being in Glacier, but I am not 100% sure anymore...
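For readers following along, the tag-then-transition workaround described in this comment can be sketched roughly as below. This is not the script from the thread (which is not shown here), only a minimal illustration under stated assumptions: the helper names `keys_to_tag` and `glacier_lifecycle_configuration`, the rule ID, and the example keys are all hypothetical. The only details taken from the thread are the `ToGlacier=Y` tag, the 1-day delay, and the idea of leaving `.bs` files out of Glacier.

```python
import json

# Tag named in the thread; objects carrying it are moved by the lifecycle rule.
GLACIER_TAG = {"Key": "ToGlacier", "Value": "Y"}


def keys_to_tag(keys):
    """Return the keys that should receive the Glacier tag.

    Assumption: .bs cache files stay in regular S3 storage so buttersink
    can still read them, and everything else is eligible for Glacier.
    """
    return [key for key in keys if not key.endswith(".bs")]


def glacier_lifecycle_configuration(days=1):
    """Build an S3 lifecycle configuration that transitions tagged objects.

    The returned dict has the shape S3 expects for a lifecycle
    configuration; the rule ID is a hypothetical placeholder.
    """
    return {
        "Rules": [
            {
                "ID": "tagged-to-glacier",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Tag": GLACIER_TAG},
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical object keys, for illustration only.
    example_keys = ["backups/snap-a", "backups/snap-a.bs", "backups/snap-b"]
    print(keys_to_tag(example_keys))
    print(json.dumps(glacier_lifecycle_configuration(), indent=2))
```

With boto3, the tag would typically be applied per object via `put_object_tagging`, and the configuration passed as `LifecycleConfiguration` to `put_bucket_lifecycle_configuration`. Tagging only non-`.bs` keys is what lets a single tag-filtered rule skip the cache files, which a plain prefix rule cannot do with the current layout.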
Hmmmm... thanks for trying this. It seems to recognize and keep the 20170320 in Glacier, so at least some Glacier diffs are being recognized. In the last run, the estimates might be causing it to assume (incorrectly) that it can increase the storage efficiency by transferring additional diffs. It really should be using the actual diff size in Glacier. I'm curious -- if you let the measuring transfer continue (buttersink without the -e flag), would it come up with a more appropriate plan?
Adding the -d flag gives me some new insight into what's going on:
And as you predicted, removing -e changes things a lot:
My .bs files after the latest command:
So there are two problems here:
This problem looks like it's completely unrelated to Glacier. What we need is for buttersink to re-use already existing snapshots on S3 whenever possible, even if its logic (agnostic of existing snapshots) would tell it to do otherwise for any reason. I finally tried adding a digit to the second line of 20171218-223900.bs, in an attempt to make buttersink choose the 47GB option instead of the 470GB one. It doesn't:
Writing transition rules to Amazon Glacier is problematic.
The main issue is with .bs files, as the glacier transition rules can't easily avoid them.
I see a few options: