
reduce cache size needs (txn.active + files) #1766

Open
martin21 opened this issue Oct 26, 2016 · 16 comments

@martin21

Related to "significantly reduce cache space needs" #235. But here it isn't a large chunks.archive.d, but rather txn.active + files. This is a cache for /home, which has an excessive number of files, but it's not that large:

merkaba:~> du -sh /home
164G /home

merkaba:~> find /home | wc -l
2865746
merkaba:~> find /home -type f | wc -l
2797574
merkaba:~> find /home -type d | wc -l
64319
merkaba:~> find /home/martin/.local/share/local-mail | wc -l
2128451
merkaba:~> find /home/martin/.local/share/local-mail -type f | wc -l
2127093
merkaba:~> find /home/martin/.local/share/local-mail -type d | wc -l
1358

Today a borgbackup run ran into an exception due to insufficient free space (see issue "limit cache space needs to avoid out of space exceptions" #1765).

And really:

merkaba:~/.cache/borg> du -sch * | sort -rh | head -4
5.4G total
4.9G 673e17ea929[…]
420M 809d8850c0bd[…]
44M 723a580c358b[…]

merkaba:~/.cache/borg/673e17ea92925012e32db8d4d92c6ec18a2a08f490d431b068fe6bdaae073737> LANG=C du -sch * | sort -rh | head -5
4.9G total
2.3G txn.active
2.1G files
296M chunks.archive.d
253M chunks

@ThomasWaldmann
Member

Your "files" cache/index is huge, 2.1GB. Likely because you have a lot of files.

The txn.active directory is used for transaction processing (while the transaction is active, removed after transaction is completed).

The files cache needs some space "per file" (see the formula in the internals docs) and also keeps some "generations" of memory reaching back into the past: whatever files it has seen in past backups. Recently we added an env var to adjust the number of generations.

So, to summarize: files cache size is determined by the number of files seen in the last N backups, N being 10 or 20 or so (or whatever you set the env var to).

@ThomasWaldmann
Member

ThomasWaldmann commented Oct 26, 2016

What you could do:

  • supply enough free disk space for .cache/borg (obviously; you could mount another fs there, if needed)
  • switch off the files cache (likely causing a big negative performance impact)
  • reduce N (see the environment variables docs, BORG_FILES_CACHE_TTL; might have a similar impact, depending on what your backup sets / backup schedule look like)
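Of these, the last option is controlled by an environment variable set before invoking borg. A minimal sketch (the value 2 is purely illustrative, not a recommendation; see the borg environment variables docs):

```shell
# Reduce the number of files cache "generations" (N) for subsequent
# borg runs in this shell. 2 is an example value only.
export BORG_FILES_CACHE_TTL=2
echo "BORG_FILES_CACHE_TTL=$BORG_FILES_CACHE_TTL"
```

A smaller N means files not seen in the last N backups drop out of the cache sooner, shrinking it at the cost of re-chunking such files when they reappear.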

@enkore
Contributor

enkore commented Oct 26, 2016

This can in fact be optimized a fair bit by not keeping a backup copy of the cache around (which is what txn.active is for), but writing these files directly. That halves the files + main cache space needs, but also means that an aborted transaction / program run is much more expensive, even more so if the repository is remote / on the internet rather than local.

The files cache is coded relatively densely: at minimum about 80 bytes per file (for files with some contents, not empty files), i.e. a 32-byte file ID and at least one 32-byte chunk ID, plus some integers. Because of the dense coding, these entries are practically incompressible.

Other possible optimizations for the files cache include simply not caching small files (below a couple of kB). That would probably do a lot for you here.
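As a sanity check on these numbers, here is the back-of-envelope arithmetic using the file count posted earlier in this issue (pure arithmetic, not borg internals):

```shell
# ~80 bytes/entry minimum: 32-byte file ID + one 32-byte chunk ID + ints.
files=2797574          # regular files under /home (from the find output)
min_entry=80           # bytes, minimum per cached file
echo "minimum files cache: $(( files * min_entry / 1024 / 1024 )) MiB"
# The observed files cache was ~2.1 GB, so the real per-file average is
# larger (more chunk IDs per file, plus older generations):
observed=2100000000
echo "observed average: $(( observed / files )) bytes/file"
```

So the 2.1 GB cache works out to roughly 750 bytes per file on average, consistent with most files having several chunks and/or multiple generations being retained.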

@martin21
Author

Thomas, I switched off the files cache and got:

2016-10-26T12:20:33+02:00 Backing up /home.
2016-10-26T13:19:11+02:00 /home backed up.

That's not too bad.

The last backups with the files cache enabled were:

2016-10-26T10:33:32+02:00 Backing up /home.
2016-10-26T11:06:14+02:00 /home backed up.

2016-10-17T12:36:56+02:00 Backing up /home.
2016-10-17T13:38:13+02:00 /home backed up.

2016-10-14T10:02:19+02:00 Backing up /home.
2016-10-14T11:28:09+02:00 /home backed up.

Looking at the variations, I find it difficult to say what the performance impact will be. I am using borgbackup only from office / work related networks with a 1 GBit link to the backup VM, and ping times today of about 1 to 1.5 ms. Backup will be too slow over my current DSL uplink tomorrow.

The txn.active directory is used for transaction processing (while the transaction is active, removed after transaction is completed).

So theoretically I could make a symlink from txn.active to /tmp/txn.active to store this directory in tmpfs. That would add about another 2 GiB of RAM usage, on top of the roughly 5.4 GiB RSIZE borgbackup used, but on a laptop with 16 GiB that would be bearable.

@ThomasWaldmann
Member

txn.active is dynamically created and deleted again.

If you back up via a DSL uplink, that might heavily influence your backup times.

Also, if these are maildirs with a lot of tiny files, maybe there isn't a big difference between stat() and stat()+open()?

@enkore
Contributor

enkore commented Oct 26, 2016

Disabling the files cache doesn't require more bandwidth to the repository.

@martin21
Author

Well, I am open to other suggestions – maybe using this new environment variable to reduce generations – but right now this works. I do not mind much whether it takes one hour or half an hour, or whether it needs to read all local files. This is a dual-SSD BTRFS RAID 1; I barely notice the read I/O activity while working at the laptop.

@ThomasWaldmann
Member

@enkore oops, right. It has to query for chunk presence, but it does that in the local chunks cache, in memory.

@enkore
Contributor

enkore commented Oct 26, 2016

It doesn't look like the files cache makes much difference for you at all... from the data you posted, you can probably just always turn it off.

@jdchristensen
Contributor

Would it make sense to not put "small" files in the files cache? That might save some space, and not take much extra time, since the small file could be quickly chunked and matched with the chunks cache. It might even save time in some situations. The definition of "small" could be tunable by an environment variable.
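A minimal sketch of that idea, with hypothetical names (should_cache and SMALL_FILE_THRESHOLD are illustrations, not borg code), assuming GNU stat:

```shell
# Only consider files at or above a tunable size threshold cache-worthy.
SMALL_FILE_THRESHOLD=${SMALL_FILE_THRESHOLD:-4096}   # bytes, tunable via env

should_cache() {
    # GNU stat -c %s prints the file size in bytes; exit status reports
    # whether the file is large enough to record in the files cache.
    [ "$(stat -c %s "$1")" -ge "$SMALL_FILE_THRESHOLD" ]
}

tmp=$(mktemp)
head -c 10000 /dev/zero > "$tmp"
if should_cache "$tmp"; then echo "cache: $tmp"; else echo "skip: $tmp"; fi
rm "$tmp"
```

With a threshold of a few kB, a maildir tree like the one in this issue (2.1M mostly small mail files) would contribute far fewer cache entries.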

@ThomasWaldmann
Member

ThomasWaldmann commented Oct 26, 2016

Maybe, maybe not. Access time might be ~10 ms for hard disks (much less for SSDs), which limits a spinning disk to well under 100 files per second. Just measure it?
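In the spirit of "just measure it", a crude harness one could use to compare stat-only against stat+read over the same set of small files (file count and size are made up; run it on the filesystem you actually care about):

```shell
# Create 200 small test files, then time the two access patterns.
d=$(mktemp -d)
for i in $(seq 1 200); do head -c 1024 /dev/zero > "$d/f$i"; done

echo "stat only:"
time sh -c 'for f in "$1"/f*; do stat "$f" > /dev/null; done' _ "$d"
echo "stat + open + read:"
time sh -c 'for f in "$1"/f*; do stat "$f" > /dev/null; cat "$f" > /dev/null; done' _ "$d"

count=$(ls "$d" | wc -l)
rm -r "$d"
echo "timed $count files"
```

Note that a freshly written tmpfs directory will be fully cached, so for a realistic number you would point this at cold data on the real backup source.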

@ThomasWaldmann
Member

Related: #235.

@enkore
Contributor

enkore commented May 24, 2017

Unclear whether any issue remains. Closing. Feel free to reopen if this assessment is not accurate.

@enkore enkore closed this as completed May 24, 2017
@ThomasWaldmann
Member

@jdchristensen had an interesting idea of not putting small files into the files cache - it seems there is some benchmarking left to do.

@ThomasWaldmann
Member

ThomasWaldmann commented Oct 7, 2017

see #3096 about the small files performance.

@ThomasWaldmann
Member

@jdchristensen see #3096.
