Memory leak in mecab-python3 0.996.2? #34
When parsing a large text many times with mecab-python3 0.996.2, the Python process consumes far too much memory. Eventually, the process slows down significantly.
For example, after about 200 iterations of the loop in the following code, the Python process consumes 80% of the machine's memory. The consumed memory does not seem to be released, and the process stops making progress altogether.
```python
import MeCab

def main():
    with open('kokoro.utf8.txt') as f:
        text = f.read()
    tagger = MeCab.Tagger()
    tagger.parse('')  # Hack for mecab-python3 0.7
    for i in range(10000):
        if i % 10 == 0:
            print(i)
        parse(tagger, text)

def parse(tagger, text):
    node = tagger.parseToNode(text)
    while node:
        node = node.next

if __name__ == "__main__":
    main()
```
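To see the growth as numbers rather than watching a process monitor, the loop can sample the process's peak resident set size every few iterations. The sketch below is an illustration, not code from this thread: `peak_rss_mib` and `watch_memory` are hypothetical helper names, and it assumes a Unix system where the standard `resource` module is available.

```python
import resource
import sys
from typing import Callable, List, Tuple


def peak_rss_mib() -> float:
    """Peak resident set size of this process so far, in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in kilobytes on Linux.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def watch_memory(work: Callable[[], None], iterations: int,
                 report_every: int = 10) -> List[Tuple[int, float]]:
    """Call work() `iterations` times, sampling peak RSS every `report_every` calls."""
    samples: List[Tuple[int, float]] = []
    for i in range(iterations):
        work()
        if i % report_every == 0:
            mib = peak_rss_mib()
            samples.append((i, mib))
            print(f"iteration {i}: peak RSS {mib:.1f} MiB")
    return samples
```

With the repro above, something like `watch_memory(lambda: parse(tagger, text), 1000)` would print one line every ten iterations. Because peak RSS never decreases, a flat series means no growth, while a steady climb matches the behaviour reported here.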
I can confirm this memory leak. It appears to be related to the
This code leaks:
```python
import MeCab

tagger = MeCab.Tagger()
node = tagger.parseToNode("text")
while node:
    node = node.next
```
This code does not leak:
```python
import MeCab

tagger = MeCab.Tagger()
parse_result = tagger.parse("text")
for parse_line in parse_result.split("\n"):
    if parse_line == "EOS":
        break
```
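If the string-based approach is used as a workaround, the per-token information that `parseToNode` would provide can be recovered from the text output. This is a minimal sketch, assuming the tagger uses MeCab's default output format (one token per line as surface, a tab, then comma-separated features, terminated by an `EOS` line); other output modes such as `-Owakati` would need different handling. `split_mecab_output` is a hypothetical helper name, and the example data below is illustrative.

```python
from typing import List, Tuple


def split_mecab_output(output: str) -> List[Tuple[str, str]]:
    """Turn MeCab's default text output into (surface, feature) pairs.

    Assumes the default format: "surface<TAB>features" per line,
    terminated by a line containing only "EOS".
    """
    tokens: List[Tuple[str, str]] = []
    for line in output.split("\n"):
        if line == "EOS":
            break
        if not line:
            continue  # skip any blank lines defensively
        surface, _, feature = line.partition("\t")
        tokens.append((surface, feature))
    return tokens
```

In use this would be `split_mecab_output(tagger.parse(text))`. Whether this pattern avoids the growth in every version is only as certain as the observation above; it was reported not to leak in this thread.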
I can confirm this bug exists. When adding a patch for 0.996.1 we checked for memory leaks (see here), so this was a bit surprising. My linked code for checking memory usage shows no growth with 0.996.1 but does show growth (a leak) with 0.996.2.
I changed the example code above a little: I moved the tagger creation out of the loop and made it print memory usage. We should check the tagger separately later, but I wanted to narrow down the issue first. Here's my code:
Here's the start of output with 0.7:
So for all three versions memory use increases over time, though it grows noticeably faster with 0.996.2. I'm not sure what causes the memory increase in 0.996.1...
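One way to make "noticeably faster" concrete is to record (iteration, MiB) pairs for each version and compare the least-squares slope of memory against iteration count. The helper below is a hypothetical sketch, not something from the thread; it only does the arithmetic and assumes the samples were collected the same way for every version.

```python
from typing import Sequence, Tuple


def growth_per_iteration(samples: Sequence[Tuple[int, float]]) -> float:
    """Least-squares slope of memory (MiB) against iteration number."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    # Slope = covariance(x, y) / variance(x), without the common 1/n factor.
    num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    den = sum((x - mean_x) ** 2 for x, _ in samples)
    if den == 0:
        raise ValueError("need at least two distinct iteration numbers")
    return num / den
```

For example, samples of 10, 12 and 14 MiB at iterations 0, 10 and 20 give a slope of 0.2 MiB per iteration; running the same loop under each version and comparing the slopes turns the visual impression into a single number per version.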
I'll look into the cause of this in more detail later, thanks for pointing it out!
Although the SWIG code isn't ideal, I don't think it's the problem here.
I spent a while looking at the
My best guess right now is that some horrific heap fragmentation is happening and hurting a lot, but I have no evidence to support that theory.
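One cheap, if indirect, way to probe the fragmentation theory on Linux with glibc is to compare resident memory just before and after calling glibc's `malloc_trim(0)` once the parse loop has run. If RSS drops substantially, much of the "leaked" memory was already freed but still held by the allocator, which fits the fragmentation idea; if it barely moves, live allocations (a genuine leak) look more likely. This is a hedged diagnostic sketch, not proof either way: it assumes Linux (`/proc/self/statm`) and glibc, and the helper names are hypothetical.

```python
import ctypes
import ctypes.util
import os
from typing import Optional, Tuple


def current_rss_bytes() -> int:
    """Current resident set size, read from /proc/self/statm (Linux only)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def rss_before_after_trim() -> Optional[Tuple[int, int]]:
    """Return (rss_before, rss_after) around glibc's malloc_trim(0).

    Returns None when malloc_trim is unavailable (e.g. macOS or musl).
    """
    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        return None
    libc = ctypes.CDLL(libc_name)
    trim = getattr(libc, "malloc_trim", None)  # glibc-specific symbol
    if trim is None:
        return None
    trim.argtypes = [ctypes.c_size_t]
    trim.restype = ctypes.c_int
    before = current_rss_bytes()
    trim(0)  # ask glibc to return free heap memory to the OS
    after = current_rss_bytes()
    return before, after
```

Calling `rss_before_after_trim()` right after the `parseToNode` loop, and again after the `parse()` loop for comparison, would show whether the two patterns leave behind different amounts of reclaimable memory.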