
memory usage is too high #16

Closed
ThomasWaldmann opened this issue May 15, 2015 · 6 comments

@ThomasWaldmann
Member

see jborg/attic#302 and jborg/attic#41.

@maltefiala

As long as this issue exists, this needs to be mentioned somewhere very public, e.g. the landing page.

@maltefiala

OK, let me rephrase. I tested on a machine with 4 GB RAM and 20 GB swap. The backup fails after 4 TB of data. This makes attic/borg unusable in a mature production environment. 4 GB RAM is an edge case, but a backup must never fail.

@ThomasWaldmann
Member Author

You could add this to attic issue 302 (attic and borg do not yet differ in this respect).
Please also give the number of files and directories.

@ThomasWaldmann ThomasWaldmann self-assigned this Jun 20, 2015
@ThomasWaldmann
Member Author

I wanted to note that I am working on improving this by making the chunker parameters configurable via a command-line option.

Attic (and currently also Borg) creates lots of rather small chunks of ~64 KiB. This is because the rolling hash mask is 16 bits: when the last 16 bits of the hash are zero, a chunk is cut, so the statistical average chunk size is 64 KiB. This provides fine-grained deduplication, but creates high management overhead (especially RAM and disk space used for the indexes).
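To make the cutting rule concrete, here is a minimal content-defined chunking sketch. This is a hypothetical illustration, not Borg's actual chunker (which uses a Buzhash over a sliding window); a toy rolling-style hash stands in, but the boundary condition is the one described above: cut whenever the low mask bits of the hash are zero.

```python
# Toy content-defined chunking sketch (NOT Borg's real chunker code).
# A chunk boundary is cut whenever the low MASK_BITS bits of the
# rolling hash are zero; with a well-mixed hash this happens on
# average every 2**MASK_BITS bytes.
MASK_BITS = 16
MASK = (1 << MASK_BITS) - 1

def chunk_boundaries(data, mask=MASK):
    h = 0
    boundaries = []
    for i, byte in enumerate(data):
        h = ((h * 31) + byte) & 0xFFFFFFFF  # toy stand-in for Buzhash
        if (h & mask) == 0:
            boundaries.append(i + 1)        # cut after this byte
            h = 0                           # restart hash for next chunk
    return boundaries
```

With a 16-bit mask the expected distance between boundaries is 64 KiB; widening the mask to 20 bits raises that to ~1 MiB, which is the change discussed below.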

By using e.g. 20 mask bits, the chunk size would be ~1 MiB, lowering disk space and RAM needs to 1/16 in the best case. Note that a (small) file always creates at least one chunk, so there can still be a lot of small chunks if you have a lot of small files. Also, deduplication becomes coarser-grained.
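The RAM impact follows directly from the chunk count, since each chunk needs an index entry. A back-of-the-envelope sketch (the function names are mine, for illustration only):

```python
# With a k-bit rolling-hash mask, the expected chunk size is 2**k
# bytes, and the number of index entries (hence RAM/disk overhead)
# scales with data_size / 2**k.
def expected_chunk_size(mask_bits):
    return 2 ** mask_bits

def approx_chunk_count(data_size, mask_bits):
    return data_size // expected_chunk_size(mask_bits)

four_tib = 4 * 2**40
print(expected_chunk_size(16))            # 65536 bytes = 64 KiB
print(expected_chunk_size(20))            # 1048576 bytes = 1 MiB
print(approx_chunk_count(four_tib, 16))   # 67108864 (~67 M chunks)
print(approx_chunk_count(four_tib, 20))   # 4194304 (~4 M chunks)
```

So for the 4 TB case reported above, going from 16 to 20 mask bits cuts the index from roughly 67 million entries to roughly 4 million, which is where the "1/16 in the best case" figure comes from.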

So bigger chunks have pros and cons, but at least they make Borg usable as backup software for people with a lot of backup data or relatively little RAM.

See also this post; one reason its author had to choose obnam was the issue mentioned above:
http://changelog.complete.org/archives/9353-roundup-of-remote-encrypted-deduplicated-backups-in-linux

(I personally tried obnam before attic, but it was way too slow for my taste, especially with encryption.)

@ThomasWaldmann
Member Author

btw, see also 4633931.

@maltefiala

Will try asap
