Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deduplication does not work after migration from attic #787

Closed
ThomasWaldmann opened this issue Mar 22, 2016 · 2 comments · Fixed by #812
Closed

deduplication does not work after migration from attic #787

ThomasWaldmann opened this issue Mar 22, 2016 · 2 comments · Fixed by #812

Comments

@ThomasWaldmann
Copy link
Member

From the mailing list:

I was able to reproduce the problem on a clean repo and small amount of
static data:

attic init repo
attic create --stats repo::backup1 data
borg upgrade repo
rm -rf ~/.cache/borg
borg create -v --stats repo::backup2 data (deduplication doesn't work)
rm -rf ~/.cache/borg
borg create -v --stats repo::backup3 data (deduplication does work)

Also I tried a number of actions with repository to recreate cache
(borg prune for example or borg create -n), but it didn't help.
@ThomasWaldmann ThomasWaldmann self-assigned this Mar 22, 2016
@enkore
Copy link
Contributor

enkore commented Mar 22, 2016

Chunker params changed.

Quite related: Work on borg rewrite is underway, and that can already change chunker params. I.e. re-chunk old archives without extracting them[1], so that a migrated Attic repo will dedup with newly created Borg archives.

[1] What I basically do is that I have a file-like class around that sources read()s from the chunk-id-list of the source archive; I feed that back into chunkify, which then produces chunks with the current chunker params. When the rewrite is complete the old archive is deleted and so are all old chunks.

@ThomasWaldmann
Copy link
Member Author

Ah, right, the change of default chunker params at borg 1.0.0 explains that.

So, the immediate workaround if the change of chunker params is unwanted is to specify the old chunker params manually, like:

borg create --chunker-params 10,23,16,4095 repo::archive data

But please note: the change to bigger default chunks at borg 1.0.0 was for a reason (lower resource usage [chunk index size, RAM, disk usage]), so think about what you really want.

@ThomasWaldmann ThomasWaldmann removed their assignment Mar 22, 2016
enkore added a commit to enkore/borg that referenced this issue Mar 29, 2016
Use with caution: permanent data loss by specifying incorrect patterns
is easily possible. Make a dry run to make sure you got everything right.

borg rewrite has many uses:
- Can selectively remove files/dirs from old archives, e.g. to free
  space or purging picturarum biggus dickus from history
- Recompress data
- Rechunkify data, to have upgraded Attic / Borg 0.xx archives deduplicate
  with Borg 1.x archives. (Or to experiment with chunker-params for
  specific use cases

It is interrupt- and resumable.

Chunks are not freed on-the-fly.
Rationale:
  Makes only sense when rechunkifying, but logic on which new chunks to
  free what input chunks is complicated and *very* delicate.


Current TODOs:
- Detect and skip (unless --force) already recompressed chunks
  -- delayed until current PRs on borg.key APIs are decided
     borgbackup#810 borgbackup#789
- Usage example

Future TODOs:
- Refactor tests using py.test fixtures
  -- would require porting ArchiverTestCase to py.test: many changes,
     this changeset is already borderline too large.
- Possibly add a --target option to not replace the source archive
  -- with the target possibly in another Repo
     (better than "cp" due to full integrity checking, and deduplication
      at the target)

Fixes borgbackup#787 borgbackup#686 borgbackup#630 borgbackup#70 (and probably some I overlooked)
Also see borgbackup#757 and borgbackup#770
enkore added a commit to enkore/borg that referenced this issue Mar 29, 2016
Use with caution: permanent data loss by specifying incorrect patterns
is easily possible. Make a dry run to make sure you got everything right.

borg rewrite has many uses:
- Can selectively remove files/dirs from old archives, e.g. to free
  space or purging picturarum biggus dickus from history
- Recompress data
- Rechunkify data, to have upgraded Attic / Borg 0.xx archives deduplicate
  with Borg 1.x archives. (Or to experiment with chunker-params for
  specific use cases

It is interrupt- and resumable.

Chunks are not freed on-the-fly.
Rationale:
  Makes only sense when rechunkifying, but logic on which new chunks to
  free what input chunks is complicated and *very* delicate.


Current TODOs:
- Detect and skip (unless --force) already recompressed chunks
  -- delayed until current PRs on borg.key APIs are decided
     borgbackup#810 borgbackup#789
- Usage example

Future TODOs:
- Refactor tests using py.test fixtures
  -- would require porting ArchiverTestCase to py.test: many changes,
     this changeset is already borderline too large.
- Possibly add a --target option to not replace the source archive
  -- with the target possibly in another Repo
     (better than "cp" due to full integrity checking, and deduplication
      at the target)

Fixes borgbackup#787 borgbackup#686 borgbackup#630 borgbackup#70 (and probably some I overlooked)
Also see borgbackup#757 and borgbackup#770
enkore added a commit to enkore/borg that referenced this issue Mar 29, 2016
Use with caution: permanent data loss by specifying incorrect patterns
is easily possible. Make a dry run to make sure you got everything right.

borg rewrite has many uses:
- Can selectively remove files/dirs from old archives, e.g. to free
  space or purging picturarum biggus dickus from history
- Recompress data
- Rechunkify data, to have upgraded Attic / Borg 0.xx archives deduplicate
  with Borg 1.x archives. (Or to experiment with chunker-params for
  specific use cases

It is interrupt- and resumable.

Chunks are not freed on-the-fly.
Rationale:
  Makes only sense when rechunkifying, but logic on which new chunks to
  free what input chunks is complicated and *very* delicate.


Current TODOs:
- Detect and skip (unless --force) already recompressed chunks
  -- delayed until current PRs on borg.key APIs are decided
     borgbackup#810 borgbackup#789
- Usage example

Future TODOs:
- Refactor tests using py.test fixtures
  -- would require porting ArchiverTestCase to py.test: many changes,
     this changeset is already borderline too large.
- Possibly add a --target option to not replace the source archive
  -- with the target possibly in another Repo
     (better than "cp" due to full integrity checking, and deduplication
      at the target)

Fixes borgbackup#787 borgbackup#686 borgbackup#630 borgbackup#70 (and probably some I overlooked)
Also see borgbackup#757 and borgbackup#770
enkore added a commit to enkore/borg that referenced this issue Mar 29, 2016
Use with caution: permanent data loss by specifying incorrect patterns
is easily possible. Make a dry run to make sure you got everything right.

borg rewrite has many uses:
- Can selectively remove files/dirs from old archives, e.g. to free
  space or purging picturarum biggus dickus from history
- Recompress data
- Rechunkify data, to have upgraded Attic / Borg 0.xx archives deduplicate
  with Borg 1.x archives. (Or to experiment with chunker-params for
  specific use cases

It is interrupt- and resumable.

Chunks are not freed on-the-fly.
Rationale:
  Makes only sense when rechunkifying, but logic on which new chunks to
  free what input chunks is complicated and *very* delicate.


Current TODOs:
- Detect and skip (unless --force) already recompressed chunks
  -- delayed until current PRs on borg.key APIs are decided
     borgbackup#810 borgbackup#789
- Usage example

Future TODOs:
- Refactor tests using py.test fixtures
  -- would require porting ArchiverTestCase to py.test: many changes,
     this changeset is already borderline too large.
- Possibly add a --target option to not replace the source archive
  -- with the target possibly in another Repo
     (better than "cp" due to full integrity checking, and deduplication
      at the target)

Fixes borgbackup#787 borgbackup#686 borgbackup#630 borgbackup#70 (and probably some I overlooked)
Also see borgbackup#757 and borgbackup#770
enkore added a commit to enkore/borg that referenced this issue Mar 29, 2016
Use with caution: permanent data loss by specifying incorrect patterns
is easily possible. Make a dry run to make sure you got everything right.

borg rewrite has many uses:
- Can selectively remove files/dirs from old archives, e.g. to free
  space or purging picturarum biggus dickus from history
- Recompress data
- Rechunkify data, to have upgraded Attic / Borg 0.xx archives deduplicate
  with Borg 1.x archives. (Or to experiment with chunker-params for
  specific use cases

It is interrupt- and resumable.

Chunks are not freed on-the-fly.
Rationale:
  Makes only sense when rechunkifying, but logic on which new chunks to
  free what input chunks is complicated and *very* delicate.


Current TODOs:
- Detect and skip (unless --force) already recompressed chunks
  -- delayed until current PRs on borg.key APIs are decided
     borgbackup#810 borgbackup#789
- Usage example

Future TODOs:
- Refactor tests using py.test fixtures
  -- would require porting ArchiverTestCase to py.test: many changes,
     this changeset is already borderline too large.
- Possibly add a --target option to not replace the source archive
  -- with the target possibly in another Repo
     (better than "cp" due to full integrity checking, and deduplication
      at the target)

Fixes borgbackup#787 borgbackup#686 borgbackup#630 borgbackup#70 (and probably some I overlooked)
Also see borgbackup#757 and borgbackup#770
enkore added a commit to enkore/borg that referenced this issue Apr 1, 2016
Use with caution: permanent data loss by specifying incorrect patterns
is easily possible. Make a dry run to make sure you got everything right.

borg rewrite has many uses:
- Can selectively remove files/dirs from old archives, e.g. to free
  space or purging picturarum biggus dickus from history
- Recompress data
- Rechunkify data, to have upgraded Attic / Borg 0.xx archives deduplicate
  with Borg 1.x archives. (Or to experiment with chunker-params for
  specific use cases

It is interrupt- and resumable.

Chunks are not freed on-the-fly.
Rationale:
  Makes only sense when rechunkifying, but logic on which new chunks to
  free what input chunks is complicated and *very* delicate.


Current TODOs:
- Detect and skip (unless --force) already recompressed chunks
  -- delayed until current PRs on borg.key APIs are decided
     borgbackup#810 borgbackup#789
- Usage example

Future TODOs:
- Refactor tests using py.test fixtures
  -- would require porting ArchiverTestCase to py.test: many changes,
     this changeset is already borderline too large.
- Possibly add a --target option to not replace the source archive
  -- with the target possibly in another Repo
     (better than "cp" due to full integrity checking, and deduplication
      at the target)

Fixes borgbackup#787 borgbackup#686 borgbackup#630 borgbackup#70 (and probably some I overlooked)
Also see borgbackup#757 and borgbackup#770
enkore added a commit to enkore/borg that referenced this issue Apr 7, 2016
Use with caution: permanent data loss by specifying incorrect patterns
is easily possible. Make a dry run to make sure you got everything right.

borg recreate has many uses:
- Can selectively remove files/dirs from old archives, e.g. to free
  space or purging picturarum biggus dickus from history
- Recompress data
- Rechunkify data, to have upgraded Attic / Borg 0.xx archives deduplicate
  with Borg 1.x archives. (Or to experiment with chunker-params for
  specific use cases

It is interrupt- and resumable.

Chunks are not freed on-the-fly.
Rationale:
  Makes only sense when rechunkifying, but logic on which new chunks to
  free what input chunks is complicated and *very* delicate.

Future TODOs:
- Refactor tests using py.test fixtures
  -- would require porting ArchiverTestCase to py.test: many changes,
     this changeset is already borderline too large.
- Possibly add a --target option to not replace the source archive
  -- with the target possibly in another Repo
     (better than "cp" due to full integrity checking, and deduplication
      at the target)
- Detect and skip (unless --always-recompress) already recompressed chunks

Fixes borgbackup#787 borgbackup#686 borgbackup#630 borgbackup#70 (and probably some I overlooked)
Also see borgbackup#757 and borgbackup#770
enkore added a commit to enkore/borg that referenced this issue Apr 8, 2016
Use with caution: permanent data loss by specifying incorrect patterns
is easily possible. Make a dry run to make sure you got everything right.

borg recreate has many uses:
- Can selectively remove files/dirs from old archives, e.g. to free
  space or purging picturarum biggus dickus from history
- Recompress data
- Rechunkify data, to have upgraded Attic / Borg 0.xx archives deduplicate
  with Borg 1.x archives. (Or to experiment with chunker-params for
  specific use cases

It is interrupt- and resumable.

Chunks are not freed on-the-fly.
Rationale:
  Makes only sense when rechunkifying, but logic on which new chunks to
  free what input chunks is complicated and *very* delicate.

Future TODOs:
- Refactor tests using py.test fixtures
  -- would require porting ArchiverTestCase to py.test: many changes,
     this changeset is already borderline too large.
- Possibly add a --target option to not replace the source archive
  -- with the target possibly in another Repo
     (better than "cp" due to full integrity checking, and deduplication
      at the target)
- Detect and skip (unless --always-recompress) already recompressed chunks

Fixes borgbackup#787 borgbackup#686 borgbackup#630 borgbackup#70 (and probably some I overlooked)
Also see borgbackup#757 and borgbackup#770
enkore added a commit to enkore/borg that referenced this issue Apr 10, 2016
Use with caution: permanent data loss by specifying incorrect patterns
is easily possible. Make a dry run to make sure you got everything right.

borg recreate has many uses:
- Can selectively remove files/dirs from old archives, e.g. to free
  space or purging picturarum biggus dickus from history
- Recompress data
- Rechunkify data, to have upgraded Attic / Borg 0.xx archives deduplicate
  with Borg 1.x archives. (Or to experiment with chunker-params for
  specific use cases

It is interrupt- and resumable.

Chunks are not freed on-the-fly.
Rationale:
  Makes only sense when rechunkifying, but logic on which new chunks to
  free what input chunks is complicated and *very* delicate.

Future TODOs:
- Refactor tests using py.test fixtures
  -- would require porting ArchiverTestCase to py.test: many changes,
     this changeset is already borderline too large.
- Possibly add a --target option to not replace the source archive
  -- with the target possibly in another Repo
     (better than "cp" due to full integrity checking, and deduplication
      at the target)
- Detect and skip (unless --always-recompress) already recompressed chunks

Fixes borgbackup#787 borgbackup#686 borgbackup#630 borgbackup#70 (and probably some I overlooked)
Also see borgbackup#757 and borgbackup#770
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants