Poor heuristic choices for sending incremental diffs #50

Closed
crusaderky opened this issue Jan 4, 2018 · 7 comments

@crusaderky

I have a read-write "current" subvolume, from which I create read-only snapshots every day.
If I didn't do anything significant on a given day, btrfs send -p <yesterday> <today> will produce a stream of only a few kilobytes.

buttersink doesn't seem to realise this, and tries to do extremely expensive transfers from a much older snapshot.

To another btrfs hard disk:

# buttersink   -n /btrfs/crusaderky/ /mnt/ext_hdd/crusaderky/
  Waiting for btrfs quota usage scan...
  Optimal synchronization:
  36.92 GiB from 3 diffs in btrfs /btrfs/crusaderky
  452.5 GiB from 1 diffs in btrfs /mnt/ext_hdd/crusaderky
  489.4 GiB from 4 diffs in TOTAL
  Keep: ca37...2c4b /mnt/ext_hdd/crusaderky/20170902-130702 from None (452.5 GiB)
  WOULD: Xfer: f970...da46 /btrfs/crusaderky/20171218-223900 from ca37...2c4b /btrfs/crusaderky/20170902-130702 (9.201 GiB)
  WOULD: Xfer: de7c...accf /btrfs/crusaderky/20180104-000001 from ca37...2c4b /btrfs/crusaderky/20170902-130702 (13.86 GiB)
  WOULD: Xfer: 5d72...eef2 /btrfs/crusaderky/20180103-013041 from ca37...2c4b /btrfs/crusaderky/20170902-130702 (13.86 GiB)

To s3:

# buttersink   -n /btrfs/crusaderky/ s3://crusaderky-buttersink/crusaderky/
  Listing S3 Bucket "crusaderky-buttersink" contents...
  measured size (27.72 GiB), estimated size (27.72 GiB)
  Optimal synchronization:
  462.9 GiB from 2 diffs in S3 Bucket "crusaderky-buttersink"
  27.72 GiB from 2 diffs in btrfs /btrfs/crusaderky
  490.7 GiB from 4 diffs in TOTAL
  Keep: ca37...2c4b /crusaderky/20170902-130702 from None (453.7 GiB)
  Keep: f970...da46 /crusaderky/20171218-223900 from ca37...2c4b /crusaderky/20170902-130702 (9.201 GiB)
  WOULD: Xfer: de7c...accf /btrfs/crusaderky/20180104-000001 from ca37...2c4b /btrfs/crusaderky/20170902-130702 (13.86 GiB)
  WOULD: Xfer: 5d72...eef2 /btrfs/crusaderky/20180103-013041 from ca37...2c4b /btrfs/crusaderky/20170902-130702 (13.86 GiB)

In the above situation,

  • the send of 20180103-013041 should use 20171218-223900 as a parent (4.7 GB) and not 20170902-130702 (13.86 GiB).
  • the send of 20180104-000001 should use 20180103-013041 as a parent (< 1 MB) and not 20170902-130702 (13.86 GiB).
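
To put a number on the difference, the two parent choices can be compared with a few lines of Python. This is only an illustrative calculation: the sizes are the approximate figures quoted in this comment, the 4.7 GB figure is assumed to be close enough to GiB for the comparison, and the sub-megabyte diff is entered as 0.001 GiB. The planned total matches the 27.72 GiB that buttersink reports for the S3 case.

```python
# Diff sizes in GiB; the "< 1 MB" diff is entered as 0.001 GiB (an assumption).
planned = {"20180103-013041": 13.86, "20180104-000001": 13.86}
preferred = {"20180103-013041": 4.7, "20180104-000001": 0.001}

planned_total = sum(planned.values())
preferred_total = sum(preferred.values())

print(f"planned:   {planned_total:.2f} GiB")
print(f"preferred: {preferred_total:.2f} GiB")
print(f"saved:     {planned_total - preferred_total:.2f} GiB")
```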
@RandomReaper

👍 Same problem here.

@AmesCornish
Owner

It's a bit hard to diagnose this without having the snapshots. Note that the heuristic considers factors other than the diff size, including how "tall" the diff stack is on the destination. That is, it won't create a thousand one-day diffs, each depending on the previous one, because if any one of those thousand goes bad, you lose everything that depends on it. Buttersink is designed to occasionally diff from an "old" snapshot, so that your diff repo is more reliable.
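
To make that trade-off concrete, here is a rough sketch of a cost function that weighs diff size against the height of the resulting chain. It is an illustration under stated assumptions, not buttersink's actual heuristic: `chain_height`, `penalized_cost`, and the `penalty_per_level` values are all hypothetical.

```python
def chain_height(snapshot, parent_of):
    """Count how many diffs must be applied to restore `snapshot`.

    `parent_of` maps each snapshot to the parent its diff depends on,
    or to None for a full (non-incremental) copy.
    """
    height = 0
    while snapshot is not None:
        height += 1
        snapshot = parent_of[snapshot]
    return height


def penalized_cost(size_gib, parent, parent_of, penalty_per_level=1.0):
    """Diff size plus a made-up penalty for each level of the new chain."""
    new_height = 1 if parent is None else chain_height(parent, parent_of) + 1
    return size_gib + penalty_per_level * new_height


# A destination holding a full copy "a" and a diff "b" that depends on it.
parent_of = {"a": None, "b": "a"}

# A tiny diff stacked on "b" versus a larger diff taken directly from "a".
for penalty in (2.0, 3.0):
    tall = penalized_cost(0.5, "b", parent_of, penalty_per_level=penalty)
    short = penalized_cost(3.0, "a", parent_of, penalty_per_level=penalty)
    print(penalty, tall, short)
```

With the smaller penalty the tiny diff on top of "b" is still cheaper; with the larger one, the bigger diff from the older snapshot wins because it keeps the chain shorter.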

In any event, I can see that it would at least be helpful to make the heuristic process more transparent, and maybe give some options for tweaking it. I'll leave this bug open to address that.

@RandomReaper

RandomReaper commented Jun 20, 2018

Don't you think sending the snapshots in the order they are taken should be sufficient?
I mean, if my source disk has enough space for storing all snapshots, a destination disk of the same size will suffice for storing them, and this is clearly not the case using the current algorithm.

@AmesCornish
Owner

Indeed. My comment should only be relevant when S3 is the destination. I'll investigate further.

@eugene-bright
Collaborator

The base for diffs should be updated once any snapshot transfer has finished.
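
One way to read this suggestion: plan the transfers in chronological order and add each snapshot to the set of usable bases as soon as its transfer completes. The sketch below illustrates that reading and is not the change made in the referenced commits; `plan_transfers` and the `estimate` callback are hypothetical names, and the sizes are the approximate figures from the original report for the S3 case.

```python
def plan_transfers(to_send, at_destination, estimate):
    """Plan transfers in order, updating the usable bases after each one.

    to_send: snapshot names, oldest first.
    at_destination: names already present on the destination.
    estimate(snapshot, parent): estimated diff size, or None if unknown.
    """
    bases = set(at_destination)
    plan = []
    for snapshot in to_send:
        options = []
        for parent in bases:
            size = estimate(snapshot, parent)
            if size is not None:
                options.append((size, parent))
        if not options:
            raise ValueError("no usable parent for " + snapshot)
        size, parent = min(options)  # smallest estimated diff wins
        plan.append((snapshot, parent, size))
        bases.add(snapshot)  # the new snapshot can now serve as a base
    return plan


# Approximate sizes (GiB) quoted in the original report.
SIZES = {
    ("20180103-013041", "20170902-130702"): 13.86,
    ("20180103-013041", "20171218-223900"): 4.7,
    ("20180104-000001", "20170902-130702"): 13.86,
    ("20180104-000001", "20180103-013041"): 0.001,
}

plan = plan_transfers(
    ["20180103-013041", "20180104-000001"],
    ["20170902-130702", "20171218-223900"],
    lambda snapshot, parent: SIZES.get((snapshot, parent)),
)
for step in plan:
    print(step)
```

Because 20180103-013041 joins the base set before 20180104-000001 is planned, the second transfer can use it as its parent instead of falling back to 20170902-130702.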

@eugene-bright
Collaborator

My extra note on optimizations #58.

eugene-bright added a commit to eugene-bright/buttersink that referenced this issue Jul 11, 2018
This makes to reuse newly transferred snapshot
as a base for subsequent one.

Resolves: AmesCornish#50
eugene-bright added a commit to eugene-bright/buttersink that referenced this issue Jul 12, 2018
This makes to reuse newly transferred snapshot
as a base for subsequent one.

Resolves: AmesCornish#50
@AmesCornish
Owner

The case of transferring into a btrfs system should be addressed in d25e71e. "Tall" diff chains will only be avoided for S3, which is storing diffs.
