
borg uses too much memory (with misc. operations) #5202

Closed
antlarr opened this issue May 30, 2020 · 11 comments

@antlarr
Contributor

antlarr commented May 30, 2020

Is this a BUG / ISSUE report or a QUESTION?

ISSUE

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

client: 1.1.11 (openSUSE TW system package)
server: 1.1.11 (built locally from debian package)

Operating system (distribution) and version.

client: openSUSE Tumbleweed (20200516)
server: osmc (2020.03-1), a Debian-based distribution for media centers.

Hardware / network configuration, and filesystems used.

client: x86_64
server: aarch64 (a Vero 4K media center with an external USB hard disk)

How much data is handled by borg?

Original size of all archives: 2.11 TB
Each archive's original size is between 31 and 60 GB.
There are 43 archives in total.

Full borg command line that led to the problem (leave away excludes and passwords)

borg recreate -e 'home/user/large-directory' -e 'home/user/large-file' ssh://borgbackup@server/home/borgbackup/repositories/hostname::hostname-2019-09-30T00:01:57

Describe the problem you're observing.

Recreating the archive consumes a lot of memory. I checked that it kept growing during the execution of the recreate command, peaking at the very end at a VmSize of 34155864 kB (~34 GB).
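
For anyone who wants to reproduce the measurement: on Linux, VmSize can be polled from /proc. A hypothetical helper (not part of borg):

    def vm_size_kb(pid):
        """Return the VmSize of a process in kB, as reported by /proc/<pid>/status."""
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmSize:"):
                    return int(line.split()[1])  # e.g. "VmSize:  34155864 kB"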

After some debugging, I could observe the problem lies in the self.responses dictionary in remote.py, which keeps all the messages received during the processing.

In the middle of a debugging session I got this:

(Pdb) p total_size(self.responses)
25272256788
(Pdb) p len(self.responses.keys())
89057

(total_size is the function from https://code.activestate.com/recipes/577504/ which returns the total size in memory of an object and its contents)
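
For reference, a condensed version of that recipe looks roughly like this (simplified from the original, which also supports verbose output):

    import sys
    from itertools import chain
    from collections import deque

    def total_size(o, handlers={}):
        """Approximate memory footprint of an object and all of its contents."""
        all_handlers = {tuple: iter, list: iter, deque: iter,
                        set: iter, frozenset: iter,
                        dict: lambda d: chain.from_iterable(d.items())}
        all_handlers.update(handlers)  # user-supplied handlers take precedence
        seen = set()                   # track ids so shared objects count once

        def sizeof(o):
            if id(o) in seen:
                return 0
            seen.add(id(o))
            s = sys.getsizeof(o, sys.getsizeof(0))
            for typ, handler in all_handlers.items():
                if isinstance(o, typ):
                    s += sum(map(sizeof, handler(o)))
                    break
            return s

        return sizeof(o)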

It seems the only way for an item to be removed from the responses dictionary is by first getting its msgid into the waiting_for list. I checked (in a different debugging session than the one above) that waiting_for doesn't seem to change at all during the execution; it kept the same single value every time I checked:

(Pdb) p waiting_for               
[45112]
(Pdb) p max(self.responses.keys())
24628

Of course, the max msgid value of self.responses.keys() kept growing every time I checked, as did its length.

I don't know where waiting_for got that value from, but I'd say maybe waiting_for is waiting for the last msgid coming from the server, so every response will just be stored in self.responses and never discarded.
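
To make that concrete, here is a much-simplified sketch of the bookkeeping pattern described above (names taken from the debugging session; this is an illustration, not borg's actual remote.py code):

    class ResponseBuffer:
        """Illustration only, not borg's actual remote.py code."""

        def __init__(self):
            self.responses = {}    # msgid -> response parked for later pickup
            self.waiting_for = []  # msgids the caller is currently blocked on

        def on_response(self, msgid, response):
            if msgid in self.waiting_for:
                self.waiting_for.remove(msgid)
                return response    # consumed immediately by the waiting caller
            # Otherwise the response is parked. If waiting_for never advances
            # past these msgids, nothing ever removes them: unbounded growth.
            self.responses[msgid] = response
            return None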

Note that all this happens inside the self.repository.commit() call in Archive.save. This is the backtrace that can be seen by stopping the execution at almost any time during the recreate (a few seconds after starting):

Traceback (most recent call last):
  File "/usr/lib64/python3.8/site-packages/borg/archiver.py", line 164, in wrapper
    return method(self, args, repository=repository, cache=cache_, **kwargs)
  File "/usr/lib64/python3.8/site-packages/borg/archiver.py", line 1727, in do_recreate
    if not recreater.recreate(name, args.comment, args.target):
  File "/usr/lib64/python3.8/site-packages/borg/archive.py", line 1718, in recreate
    self.save(archive, target, comment, replace_original=replace_original)
  File "/usr/lib64/python3.8/site-packages/borg/archive.py", line 1841, in save
    target.save(comment=comment, timestamp=self.timestamp,
  File "/usr/lib64/python3.8/site-packages/borg/archive.py", line 519, in save
    self.repository.commit()
  File "/usr/lib64/python3.8/site-packages/borg/remote.py", line 527, in do_rpc
    return self.call(f.__name__, named, **extra)
  File "/usr/lib64/python3.8/site-packages/borg/remote.py", line 768, in call
    for resp in self.call_many(cmd, [args], **kw):
  File "/usr/lib64/python3.8/site-packages/borg/remote.py", line 897, in call_many
    self.unpacker.feed(data)

Please note that the line numbers in archive.py and remote.py might be different than yours since I added some debug code to those files.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

I can reproduce it every time.

Include any warning/errors/backtraces from the system logs

The system logs don't show any problem on the client or the server. I also ran a borg check, which didn't find any problem.

@ThomasWaldmann
Member

Can you reproduce the issue again:

  • first using the command you gave in the top post
  • then without any excludes
  • and again using --dry-run?

Would be interesting whether these also use a lot of memory.

@antlarr
Contributor Author

antlarr commented Jun 1, 2020

Even better, let me copy what I wrote on IRC after textshell and elho mentioned that this behaved as if preload were set to True, while I had checked that it was being set to False:

<antlarr> ouch I think I found a problem
<antlarr>     def unpack_many(self, ids, filter=None, partial_extract=False, preload=False, hardlink_masters=None): 
<antlarr> that's a preload variable
<antlarr>             def preload(chunks):
<antlarr> that function is shadowing the variable
<antlarr> after defining that function, the code does:
<antlarr>             if preload:

so the problem is that the if condition is always True: it tests the locally defined function object, which is always truthy, instead of the preload boolean passed as a parameter to DownloadPipeline.unpack_many.
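
A minimal, self-contained illustration of the shadowing (not the actual borg code):

    def unpack_many(ids, preload=False):
        def preload(chunks):        # local def shadows the keyword argument
            pass
        if preload:                 # tests the function object: always truthy
            print("preloading!")    # runs even though the caller passed False

    unpack_many([], preload=False)  # prints "preloading!"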

I tested that renaming the preload function fixed the problem (consuming less than 150 MB instead of ~34 GB and taking less than half the time to finish), but elho was faster to create a commit, so I guess he will submit it soon: elho@be64610
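
The fix amounts to giving the inner function a name that doesn't collide with the argument, roughly like this (the name below is illustrative; the actual name chosen in elho's commit may differ):

    def unpack_many(ids, preload=False):
        def preload_chunks(chunks):  # renamed: no longer shadows the argument
            pass
        if preload:                  # now tests the caller-supplied boolean again
            preload_chunks(ids)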

elho added a commit to elho/borg that referenced this issue Jun 1, 2020
…same name

The locally defined preload() function overwrites the preload boolean keyword
argument, always evaluating to true, so preloading is done, even when not
requested by the caller, causing a memory leak.
Also move its definition outside of the loop.

This issue was found by Antonio Larrosa in borg issue borgbackup#5202.
@elho
Contributor

elho commented Jun 1, 2020

Yes, I built a borg 1.2 from that branch plus a 1.1.11 with that patch on top; the testsuite passed. I will submit the actual PR after some real-world tests with them have completed successfully.

@ThomasWaldmann
Member

ThomasWaldmann commented Jun 1, 2020

Oh, oops. Thanks for finding this!

This affects pretty much everything using Archive.iter_items() or Archive.extract_item().

@ThomasWaldmann ThomasWaldmann added this to the 1.1.12 milestone Jun 1, 2020
@ThomasWaldmann ThomasWaldmann changed the title borg uses too much memory recreating an archive borg uses too much memory (with misc. operations) Jun 1, 2020
elho added commits to elho/borg that referenced this issue Jun 2, 2020 (same commit message as above)
@ThomasWaldmann
Member

now fixed in master and 1.1-maint (will be in borg 1.1.12).

thanks to @elho and @antlarr for finding and fixing!

@ThomasWaldmann
Member

@LocutusOfBorg @FelixSchwarz as this affects many operations, likely including borg extract, it would be good to have 1.1.12 packaged soon.

Guess no one wants an "out of memory" surprise at full restore time...

@ThomasWaldmann
Member

crap, there seems to be a cython code generation issue, so at least the pip pkg is broken on py38, see #5214.

@enkore
Contributor

enkore commented Jun 6, 2020

Shouldn't the linter have caught this? I know PyCharm does, but I thought pylint would have as well...

@ThomasWaldmann
Member

@enkore the problem is that pycharm warns about quite a lot in borg source.

some is invalid. some is valid, but harmless. and some is valid and not harmless.

would be quite some effort to clean up the code to remove the warnings about the harmless stuff.

@FelixSchwarz
Contributor

@antlarr Thank you for spotting the problem :-)

@ThomasWaldmann also thank you for making me aware of the problem. I built 1.1.13 for Fedora 31/32/rawhide and EPEL 7+8.

@LocutusOfBorg
Contributor

1.1.12 available in Debian and Ubuntu
