
borg uses too much memory (with misc. operations) #5202

Closed
antlarr opened this issue May 30, 2020 · 11 comments

@antlarr
Contributor

antlarr commented May 30, 2020

Is this a BUG / ISSUE report or a QUESTION?

ISSUE

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

client: 1.1.11 (openSUSE TW system package)
server: 1.1.11 (built locally from debian package)

Operating system (distribution) and version.

client: openSUSE Tumbleweed (20200516)
server: osmc (2020.03-1), a Debian-based distribution for media centers.

Hardware / network configuration, and filesystems used.

client: x86_64
server: aarch64 (a Vero 4K media center with an external USB hard disk)

How much data is handled by borg?

Original size of all archives: 2.11 TB
Each archive's original size is between 31 and 60 GB.
There are 43 archives in total.

Full borg command line that led to the problem (leave away excludes and passwords)

borg recreate -e 'home/user/large-directory' -e 'home/user/large-file' ssh://borgbackup@server/home/borgbackup/repositories/hostname::hostname-2019-09-30T00:01:57

Describe the problem you're observing.

Recreating the archive consumes a lot of memory. I checked that it kept growing during the execution of the recreate command, peaking at the very end at a VmSize of 34155864 kB (~34 GB).
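
For anyone who wants to reproduce the measurement: on Linux, VmSize can be polled from /proc. A hypothetical helper (not part of borg):

    def vm_size_kb(pid):
        """Return the VmSize of a process in kB, as reported by /proc/<pid>/status."""
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmSize:"):
                    return int(line.split()[1])  # e.g. "VmSize:  34155864 kB"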

After some debugging, I could observe the problem lies in the self.responses dictionary in remote.py, which keeps all the messages received during the processing.

In the middle of a debugging session I got this:

(Pdb) p total_size(self.responses)
25272256788
(Pdb) p len(self.responses.keys())
89057

(total_size is the function from https://code.activestate.com/recipes/577504/ which returns the total size in memory of an object and its contents)
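
For reference, a condensed version of that recipe looks roughly like this (simplified from the original, which also supports verbose output):

    import sys
    from itertools import chain
    from collections import deque

    def total_size(o, handlers={}):
        """Approximate memory footprint of an object and all of its contents."""
        all_handlers = {tuple: iter, list: iter, deque: iter,
                        set: iter, frozenset: iter,
                        dict: lambda d: chain.from_iterable(d.items())}
        all_handlers.update(handlers)  # user-supplied handlers take precedence
        seen = set()                   # track ids so shared objects count once

        def sizeof(o):
            if id(o) in seen:
                return 0
            seen.add(id(o))
            s = sys.getsizeof(o, sys.getsizeof(0))
            for typ, handler in all_handlers.items():
                if isinstance(o, typ):
                    s += sum(map(sizeof, handler(o)))
                    break
            return s

        return sizeof(o)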

It seems the only way for an item to be removed from the responses dictionary is by first getting its msgid into the waiting_for list. I checked (in a different debugging session than the one above) that waiting_for doesn't seem to change at all during the execution; it kept the same single value every time I checked:

(Pdb) p waiting_for               
[45112]
(Pdb) p max(self.responses.keys())
24628

Of course, the max msgid value of self.responses.keys() kept growing every time I checked, as did its length.

I don't know where waiting_for got that value from, but I'd say maybe waiting_for is waiting for the last msgid coming from the server, so every response will just be stored in self.responses and never discarded.
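
To make that concrete, here is a much-simplified sketch of the bookkeeping pattern described above (names taken from the debugging session; this is an illustration, not borg's actual remote.py code):

    class ResponseBuffer:
        """Illustration only, not borg's actual remote.py code."""

        def __init__(self):
            self.responses = {}    # msgid -> response parked for later pickup
            self.waiting_for = []  # msgids the caller is currently blocked on

        def on_response(self, msgid, response):
            if msgid in self.waiting_for:
                self.waiting_for.remove(msgid)
                return response    # consumed immediately by the waiting caller
            # Otherwise the response is parked. If waiting_for never advances
            # past these msgids, nothing ever removes them: unbounded growth.
            self.responses[msgid] = response
            return None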

Note that all this happens inside the self.repository.commit() call in Archive.save. This is the backtrace that can be seen by stopping the execution at almost any time during the recreate (a few seconds after starting):

Traceback (most recent call last):
  File "/usr/lib64/python3.8/site-packages/borg/archiver.py", line 164, in wrapper
    return method(self, args, repository=repository, cache=cache_, **kwargs)
  File "/usr/lib64/python3.8/site-packages/borg/archiver.py", line 1727, in do_recreate
    if not recreater.recreate(name, args.comment, args.target):
  File "/usr/lib64/python3.8/site-packages/borg/archive.py", line 1718, in recreate
    self.save(archive, target, comment, replace_original=replace_original)
  File "/usr/lib64/python3.8/site-packages/borg/archive.py", line 1841, in save
    target.save(comment=comment, timestamp=self.timestamp,
  File "/usr/lib64/python3.8/site-packages/borg/archive.py", line 519, in save
    self.repository.commit()
  File "/usr/lib64/python3.8/site-packages/borg/remote.py", line 527, in do_rpc
    return self.call(f.__name__, named, **extra)
  File "/usr/lib64/python3.8/site-packages/borg/remote.py", line 768, in call
    for resp in self.call_many(cmd, [args], **kw):
  File "/usr/lib64/python3.8/site-packages/borg/remote.py", line 897, in call_many
    self.unpacker.feed(data)

Please note that the line numbers in archive.py and remote.py might be different than yours since I added some debug code to those files.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

I can reproduce it every time.

Include any warning/errors/backtraces from the system logs

The system logs don't show any problem on the client or the server. I also ran a borg check, which didn't find any problem.

@ThomasWaldmann
Member

Can you reproduce the issue again:

  • first using the command you gave in the top post
  • then without any excludes
  • and again using --dry-run?

Would be interesting whether these also use a lot of memory.

@antlarr
Contributor Author

antlarr commented Jun 1, 2020

Even better, let me copy what I wrote on IRC after textshell and elho mentioned that this behaved as if preload were set to True, while I had checked that it was being set to False:

<antlarr> ouch I think I found a problem
<antlarr>     def unpack_many(self, ids, filter=None, partial_extract=False, preload=False, hardlink_masters=None): 
<antlarr> that's a preload variable
<antlarr>             def preload(chunks):
<antlarr> that function is shadowing the variable
<antlarr> after defining that function, the code does:
<antlarr>             if preload:

so the problem is that the if condition is always True: it tests the locally defined function object, which is always truthy, instead of the preload boolean passed as a parameter to DownloadPipeline.unpack_many.
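
A minimal, self-contained illustration of the shadowing (not the actual borg code):

    def unpack_many(ids, preload=False):
        def preload(chunks):        # local def shadows the keyword argument
            pass
        if preload:                 # tests the function object: always truthy
            print("preloading!")    # runs even though the caller passed False

    unpack_many([], preload=False)  # prints "preloading!"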

I tested that renaming the preload function fixed the problem (consuming less than 150 MB instead of ~34 GB and taking less than half the time to finish), but elho was faster to create a commit, so I guess he will submit it soon: elho@be64610
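
The fix amounts to giving the inner function a name that doesn't collide with the argument, roughly like this (the name below is illustrative; the actual name chosen in elho's commit may differ):

    def unpack_many(ids, preload=False):
        def preload_chunks(chunks):  # renamed: no longer shadows the argument
            pass
        if preload:                  # now tests the caller-supplied boolean again
            preload_chunks(ids)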

elho added a commit to elho/borg that referenced this issue Jun 1, 2020
…same name

The locally defined preload() function overwrites the preload boolean keyword
argument, always evaluating to true, so preloading is done, even when not
requested by the caller, causing a memory leak.
Also move its definition outside of the loop.

This issue was found by Antonio Larrosa in borg issue borgbackup#5202.
@elho
Contributor

elho commented Jun 1, 2020

Yes, I built a borg 1.2 from that branch plus a 1.1.11 with that patch on top; the testsuite passed. I will submit the actual PR after some real-world tests with them have completed successfully.

@ThomasWaldmann
Member

ThomasWaldmann commented Jun 1, 2020

Oh, oops. Thanks for finding this!

This affects pretty much everything using Archive.iter_items() or Archive.extract_item().

@ThomasWaldmann ThomasWaldmann added this to the 1.1.12 milestone Jun 1, 2020
@ThomasWaldmann ThomasWaldmann changed the title borg uses too much memory recreating an archive borg uses too much memory (with misc. operations) Jun 1, 2020
elho added commits to elho/borg that referenced this issue Jun 2, 2020 (same commit message as above)
@ThomasWaldmann
Member

now fixed in master and 1.1-maint (will be in borg 1.1.12).

thanks to @elho and @antlarr for finding and fixing!

@ThomasWaldmann
Member

@LocutusOfBorg @FelixSchwarz as this affects many operations, likely including borg extract, it would be good to have 1.1.12 packaged soon.

Guess no one wants an "out of memory" surprise at full restore time...

@ThomasWaldmann
Member

crap, there seems to be a cython code generation issue, so at least the pip pkg is broken on py38, see #5214.

@enkore
Contributor

enkore commented Jun 6, 2020

Shouldn't the linter have caught this? I know PyCharm does, but I thought pylint would have as well...

@ThomasWaldmann
Member

@enkore the problem is that pycharm warns about quite a lot in borg source.

some is invalid. some is valid, but harmless. and some is valid and not harmless.

would be quite some effort to clean up the code to remove the warnings about the harmless stuff.

@FelixSchwarz
Contributor

@antlarr Thank you for spotting the problem :-)

@ThomasWaldmann also thank you for making me aware of the problem. I built 1.1.13 for Fedora 31/32/rawhide and EPEL 7+8.

@LocutusOfBorg
Contributor

1.1.12 available in Debian and Ubuntu
