Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

borg diff / borg extract --progress memory leak? #5162

Closed
MorgothSauron opened this issue Apr 28, 2020 · 23 comments
Closed

borg diff / borg extract --progress memory leak? #5162

MorgothSauron opened this issue Apr 28, 2020 · 23 comments
Milestone

Comments

@MorgothSauron
Copy link

MorgothSauron commented Apr 28, 2020

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes. I didn't find anything that would match the problem I'm experiencing. I also used the IRC channel but nobody could help on this issue.

Is this a BUG / ISSUE report or a QUESTION?

Issue

System information. For client/server mode post info for both machines.

Repository hosted on rsync.net

Your borg version (borg -V).

  • Local: 1.1.11
  • rsycn.net: 1.1.11

Operating system (distribution) and version.

Fedora 31 (Python 3.7.7).

$ uname -a
Linux morgoth.home 5.6.6-200.fc31.x86_64 #1 SMP Tue Apr 21 15:34:22 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ 
$ lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: Fedora
Description:    Fedora release 31 (Thirty One)
Release:        31
Codename:       ThirtyOne
$ 

Hardware / network configuration, and filesystems used.

  • Operating System: Fedora 31
  • Kernel Version: 5.6.6-200.fc31.x86_64
  • OS Type: 64-bit
  • Processors: 4 × Intel® Core™ i3-2100 CPU @ 3.10GHz
  • Memory: 7.7 GiB of RAM
  • Gigabit Ethernet
  • Repository is on rsync.net (ZFS ?)

How much data is handled by borg?

  • Repository informations:
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
All archives:                1.16 TB              1.14 TB             87.66 GB

                       Unique chunks         Total chunks
Chunk index:                   97959              1197682
  • Details for first archive (Number of files: 63991):
Archive name: myhost-2020-04-16T21:41:58-all
Archive fingerprint: 8999270d899feefc4d4d9c67011372fdb5f7997d40aea1d2c0556a277d328704
Comment: 
Hostname: myhost
Username: root
Time (start): Thu, 2020-04-16 21:42:06
Time (end): Thu, 2020-04-16 21:47:39
Duration: 5 minutes 33.08 seconds
Number of files: 63991
Command line: /usr/bin/borg create --show-rc --show-version --progress --compression lz4 --stats --exclude-from ./Borg/conf/exclude.conf '::{hostname}-{now}-all' /etc /home/root /home/myuser
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:               90.72 GB             88.42 GB             21.76 MB
All archives:                1.16 TB              1.14 TB             87.66 GB

                       Unique chunks         Total chunks
Chunk index:                   97959              1197682
  • Details for second archive (Number of files: 53178):
Archive name: myhost-2020-04-25T00:33:52-all
Archive fingerprint: 5077a20f10ae9b3551ac4631b8e13fe9aa83015c3e96277013fe9b0602b9fbae
Comment: 
Hostname: myhost
Username: root
Time (start): Sat, 2020-04-25 00:34:01
Time (end): Sat, 2020-04-25 01:28:52
Duration: 54 minutes 51.53 seconds
Number of files: 53178
Command line: /usr/bin/borg create --show-rc --show-version --progress --compression lz4 --stats --exclude-from ./Borg/conf/exclude.conf '::{hostname}-{now}-all' /etc /home/root /home/myuser
Utilization of maximum supported archive size: 0%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:               88.24 GB             86.39 GB              4.92 MB
All archives:                1.16 TB              1.14 TB             87.66 GB

                       Unique chunks         Total chunks
Chunk index:                   97959              1197682
  • There are less than 20 snapshots.

Full borg commandline that lead to the problem (leave away excludes and passwords)

export BORG_REPO=myuser@server.rsync.net:borg-backup/myhost
export BORG_RSH="ssh -i /home/myuser/.ssh/id_ed25519_backup -oPermitLocalCommand=no -o ClearAllForwardings=yes"

/usr/bin/borg diff ::myhost-2020-04-16T21:41:58-all myhost-2020-04-25T00:33:52-all`

Describe the problem you're observing.

The command will start to display diff information. After few files it will stop processing. The command will run for hours before being killed by Linux. The process is killed because the system run out of memory. My Internet download speed is capped when borg is running.

  • memory usage recorded with nmon:
    nmon
  • memory usage recorded with psrecord:
    psrecord

Can you reproduce the problem?

  • Yes. The diff will stop each time on the same file and borg will start using more memory. It happens using any archive in the repository.
  • To rule out problem with my user profile I created a new user on my machine. Same problem happens.
  • No error reported when checking the repository.
  • However the problem does NOT happen when I start borg with python trace. In this case the diff will complete within a minute like usual.
python -m trace --trackcalls /usr/bin/borg diff ::myhost-2020-04-16T21:41:58-all myhost-2020-04-25T00:33:52-all

Include any warning/errors/backtraces from the system logs

borg is killed by Linux (return code 137). no backtrace, even with debug enabled

[16421.399684] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-6.scope,task=borg,pid=12460,uid=1000
[16421.399697] Out of memory: Killed process 12460 (borg) total-vm:11062144kB, anon-rss:6732344kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:21292kB oom_score_adj:0
[16422.852832] oom_reaper: reaped process 12460 (borg), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
@ThomasWaldmann
Copy link
Member

So you're saying borg diff under tracing works correct (correct output), faster and with less memory needs than the same without tracing? That's curious.

@MorgothSauron
Copy link
Author

So you're saying borg diff under tracing works correct (correct output), faster and with less memory needs than the same without tracing? That's curious.

Correct. When running with tracing the memory usage will go up to about 350Mb and the execution completes within a minute. I don't understand either: it's something I noticed after trying a lot of things to try to figure out what is happening.

@ThomasWaldmann
Copy link
Member

If somebody wants to help: it would be nice to independently verify this behavious:

Different host, different OS (can be linux, but maybe a different distribution), maybe also different python version, but same borg version.

Can this be reproduced elsewhere?

@ThomasWaldmann ThomasWaldmann added this to the 1.1.12 milestone Apr 28, 2020
@borgbackup borgbackup deleted a comment from MorgothSauron Apr 28, 2020
@ThomasWaldmann
Copy link
Member

Did you create both archives with the same --chunker-params?

If you never used this option, it means it used the same default value, so the answer would be yes.

@MorgothSauron
Copy link
Author

I never used the --chunker-params option at any moment on this repository.

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Apr 28, 2020

I had a look at the code.

If chunker params are same, there is nothing super expensive.
If they are not, it emits a warning about doing expensive stuff (means: it will read all same-path-same-size files' contents and compare them to find changes).

Other than that, it builds some data structures in memory that are about O(filecount*chunksperfile) size.

~60k files is not much. Neither do your archives seem to have many huge files.

@MorgothSauron
Copy link
Author

I made few tests

  • Fresh Fedora 31 VM - KDE Spin (x86_64): Same problem

    • 2Gb RAM
    • borg v1.1.11
    • Python 3.7.7
    • borg kept running until the system ran out of memory
    • the "workaround" with python tracing worked
    • nmon: localhost_200428_1939.nmon
  • Ubuntu server 19.10 (aarch64): No problem

    • 4Gb RAM
    • borg v1.1.10
    • python 3.7.5

@ThomasWaldmann
Copy link
Member

@MorgothSauron can you try 1.1.11 on that ubuntu machine? if that also works, it would likely rule out a bug in borg.

@MorgothSauron
Copy link
Author

I started few new diffs on ubuntu server 19.10 with borg 1.1.10 and the same problem did happen once in 5 runs. What is interesting is that when it gets stuck it is always after displaying diff stats for the same file

+15.3 MB  -15.3 MB home/myuser/Music/.beets/BeetMusicLibrary.blb

@ThomasWaldmann How do I get v1.1.11 ? Download the binary or try to install from source ?

@ThomasWaldmann
Copy link
Member

https://launchpad.net/~costamagnagianfranco/+archive/ubuntu/borgbackup

not sure if any of these work on 19.10, but you could try.

the fat binary from github releases should work on most linux dists, but the ppa package would be better.

@ThomasWaldmann
Copy link
Member

@MorgothSauron is there something special with the BeetMusicLibrary.blb file or the file that comes directly after that in the archive?

@MorgothSauron
Copy link
Author

@ThomasWaldmann As far as I know there is nothing special with BeetMusicLibrary.blb (file command tells it's a SQLite 3.x database)

I checked the few files before and after. Nothing special there

  +6.3 kB   -5.6 kB home/myuser/Development/projects/docker/.git/logs/HEAD
    +41 B     -41 B home/myuser/Development/projects/docker/.git/ORIG_HEAD
  +3.0 kB   -3.1 kB home/myuser/Development/projects/docker/.git/index
   +540 B    -639 B home/myuser/Development/projects/docker/nextcloud/Dockerfile-server
   +906 B    -813 B home/myuser/Development/projects/docker/pihole/docker-compose.yml
   +237 B    -193 B home/myuser/Development/projects/docker/pihole/.env
 +15.3 MB  -15.3 MB home/myuser/Music/.beets/BeetMusicLibrary.blb
 +18.1 kB  -20.9 kB home/myuser/.bash_history
  +4.9 kB   -4.4 kB home/myuser/.pvpn-cli/connect.ovpn
+145.3 kB -112.4 kB home/myuser/.pvpn-cli/pvpn-cli.log
   +392 B    -392 B home/myuser/.pvpn-cli/pvpn-cli.cfg
  +4.7 kB   -4.7 kB home/myuser/.pvpn-cli/ovpn.log
   +773 B    -776 B home/myuser/.lesshst

I'll try to test v1.1.11 on my existing ubuntu server. I want try the fat binary first: I avoid everything from PPA. For testing I may setup a new ubuntu VM.

I'll do few more tests tomorrow. I'll document them and post back. Thanks for the help.

@MorgothSauron
Copy link
Author

  • I tried to do few tests in hope to see something that could help. Hopefully it will make sense.

  • The issue reported is about borg being killed when the system runs out of memory (memory used by borg).

    • For obvious reason I cannot wait the test system to run out of memory evey time
    • it would take hours just to run one test.
  • The problem is accompanied by data download from rsync.net: I cannot run multiple tests in parallel.

  • I mark a test as successful if the diff completes within 10 minutes

    • a working diff takes less than a minute to complete
  • For each test I will use the following set of commands:

### basic repository settings
export BORG_REPO=myuser@server.rsync.net:borg-backup/myhost 
export BORG_REMOTE_PATH=/usr/local/bin/borg1/borg1

### normal run
/usr/bin/borg diff ::myhost-2020-04-16T21:41:58-all myhost-2020-04-25T00:33:52-all

### trace run
python3 -m trace --trackcalls <normal-command>
OS Borg Python Normal Trace
fedora 31 x86_64 1.1.11 3.7.7 FAILED PASS
fedora 31 x86_64 1.1.10 (f) 3.7.7 FAILED N/A
fedora 31 x86_64 1.1.11 3.6.10 FAILED PASS
ubuntu 19.10 aarch64 1.1.11 3.7.5 PASS PASS
ubuntu 20.04 x86_64 1.1.11 3.8.2 FAILED PASS
ubuntu 20.04 x86_64 1.1.10 (f) 3.8.2 FAILED N/A

(f): "Fat" binary

@MorgothSauron
Copy link
Author

MorgothSauron commented Apr 30, 2020

I forgot 2 things:

  1. Executing the diff remotely on the repo server works without problem (ssh -t rsync borg1 diff ./borg-backup/myhost::myhost-2020-04-16T21:41:58-all myhost-2020-04-25T00:33:52-all). No delays.
  2. No problem found when checking the repository (ssh -t rsync borg1 check --verbose --show-rc --progress ./borg-backup/myhost)

Based on that I would say that the repository is in a good state.

@ThomasWaldmann
Copy link
Member

Just to clarify previous post ^^^^ - the commands were locally running on the backup repo server in these cases.

@domrim
Copy link

domrim commented May 3, 2020

Just stumbled over this bug. I think i have the same issue (or at least related to it).
I was trying to restore a large backup on my Workstation (f31, borg 1.1.11) with 32GB Ram (with borg extract --progress server:archive::backup, and after some time the borg process gets killed by the OOM.
But i noticed that the emory consumption is only that large if i execute the extract command with --progress. Maybe this helps to identify the error. If you need more information or some tests executed i will try to provide them.

@taandrighetti
Copy link
Contributor

Hey, thanks for the suggestion about --progress. I can reproduce this as well with the 1.1.11 standalone binary.

On my computer:

$ \time -f "%e s | %M KB" ./borg-linux64 extract remote@repo::archive some/specific/path/
154.48 s | 817976 KB
$ \time -f "%e s | %M KB" ./borg-linux64 --progress extract remote@repo::archive some/specific/path/
Command terminated by signal 9
259.45 s | 3958164 KB

On server:

$ \time -f "%e s | %M KB" borg extract local_repo::archive some/specific/path/
529.92 s | 636048 KB
$ \time -f "%e s | %M KB" borg --progress extract local_repo::archive some/specific/path/
751.29 s | 675212 KB

(Server is 32-bit Linux, that should explain lower memory usage there)

@ThomasWaldmann
Copy link
Member

OK, so it looks like:

  • borg diff has a memory leak with remote repos
  • borg extract --progress has a memory leak with remote repos

@ThomasWaldmann ThomasWaldmann changed the title borg diff - Out of memory: Killed process borg borg diff / borg extract --progress memory leak? May 4, 2020
@ThomasWaldmann
Copy link
Member

could be a duplicate of #5202.

@ThomasWaldmann
Copy link
Member

@MorgothSauron could you do a test with current 1.1-maint branch code?

The fix for #5202 was merged into 1.1-maint, so maybe your issue is also fixed.

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Jun 3, 2020

@domrim @taandrighetti ^^^ could you also test?

@MorgothSauron
Copy link
Author

@ThomasWaldmann I built borg from 1.1-maint (1.2.0a9.dev13+g00b09370) and ran the same diff as before:

borg diff ::myhost-2020-04-16T21:41:58-all myhost-2020-04-25T00:33:52-all

  • v1.2.0a9: The diff completes in less than 2 minutes
  • v1.1.11: No change here, the diff gets stuck.

It looks like it is fixed.

@ThomasWaldmann
Copy link
Member

OK, thanks for testing. Closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants