Memory leak in .paragraph_iter
#20
Comments
Misdiagnosed, may be XPath related after all, in particular |
.paragraph_iter
The bigger part of the memory leak was handled in KWARC/rust-libxml#42 and was due to There seems to be a much slower leak also present, leaking about 1 MB per 100 documents. For comparison, the one fixed in the PR leaked 1 MB per 10 documents, so 10x faster. It's particularly annoying that I cannot use valgrind on the |
Larger leak officially patched in rust-libxml 0.2.4. The smaller leak is still observable, can also add that the Processing a million documents allocates 3.8 GB of RAM, to be exact. |
Great breakthrough in the debugging process: I can now run valgrind on the examples again! The key was not using jemalloc allocations, as suggested at: rust-lang/rust#49183 (comment) Adding this to the example preamble did the trick:
#![feature(alloc_system, allocator_api)]
extern crate alloc_system;
use alloc_system::System;
#[global_allocator]
static A: System = System; |
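The feature gates in the preamble above date from when jemalloc was Rust's default allocator and `alloc_system` was nightly-only. As a hedged sketch for current toolchains: `std::alloc::System` and `#[global_allocator]` have been stable since Rust 1.28, so the same opt-in can be written without any feature flags. The names `GLOBAL` and `build_buffer` are illustrative and not taken from the original example.

```rust
use std::alloc::System;

// Route every heap allocation through the platform allocator
// (malloc/free on Linux), which valgrind knows how to intercept.
#[global_allocator]
static GLOBAL: System = System;

/// Hypothetical helper: allocates on the heap so that a memory checker
/// has something to track.
fn build_buffer(len: usize) -> Vec<u8> {
    vec![0u8; len]
}

fn main() {
    let buf = build_buffer(1024);
    println!("allocated {} bytes via the system allocator", buf.len());
}
```

Since Rust 1.32 the system allocator is the default, so this declaration mainly matters for older toolchains or for projects that have explicitly opted into a different global allocator.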
With valgrind's help, the last culprit has been identified and patched - once libxml advances to merge and release Node::null, I can ship the DNM::default patch and close this issue. |
More precisely, the corpus iterators do. This is a recent regression with the new Node implementation in rust-libxml 0.2.3, I believe, which is in itself correct.
My current theory is that the excessive use of Rc pointers in the data structures creates dependencies that are impossible to deallocate, leaving entire DocumentRef objects allocated long after the document itself has been used and has gone out of scope.
High priority to fix, if any of the corpus workflows are to be possible with an arXiv-sized corpus.
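To make the Rc theory concrete, here is a minimal sketch of the general mechanism, not of the rust-libxml internals: the `Doc`, `Node` and `BackRef` types are hypothetical stand-ins. A document owns its nodes, and each node holds a pointer back to the document. With a strong `Rc` back-pointer the two keep each other alive; with a `Weak` back-pointer the document is freed as soon as its last outside handle goes out of scope.

```rust
use std::cell::{Cell, RefCell};
use std::rc::{Rc, Weak};

// Hypothetical back-pointer from a node to its owning document.
#[allow(dead_code)]
enum BackRef {
    Strong(Rc<Doc>), // keeps the document alive: forms a cycle
    Weak(Weak<Doc>), // does not keep the document alive
}

struct Node {
    _owner: BackRef,
}

struct Doc {
    nodes: RefCell<Vec<Node>>,
    dropped: Rc<Cell<bool>>, // flipped when the document is really freed
}

impl Drop for Doc {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

/// Builds a one-node document, lets the only outside handle go out of
/// scope, and reports whether the document was actually deallocated.
fn doc_freed_after_scope(use_weak: bool) -> bool {
    let dropped = Rc::new(Cell::new(false));
    {
        let doc = Rc::new(Doc {
            nodes: RefCell::new(Vec::new()),
            dropped: Rc::clone(&dropped),
        });
        let owner = if use_weak {
            BackRef::Weak(Rc::downgrade(&doc))
        } else {
            BackRef::Strong(Rc::clone(&doc))
        };
        doc.nodes.borrow_mut().push(Node { _owner: owner });
    } // the `doc` handle is dropped here
    dropped.get()
}

fn main() {
    // prints "strong back-pointer, freed: false"
    println!("strong back-pointer, freed: {}", doc_freed_after_scope(false));
    // prints "weak back-pointer, freed: true"
    println!("weak back-pointer, freed: {}", doc_freed_after_scope(true));
}
```

In the strong case the document's reference count never reaches zero, so every processed document stays allocated, which is the shape of leak the theory describes. Whether this is what actually happened in the corpus iterators is a separate question that the comments in this thread address.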