Script uses a lot of RAM #7

Closed
simonw opened this issue Jul 24, 2019 · 3 comments

Comments

@simonw
Collaborator

commented Jul 24, 2019

I'm using an XML pull parser, which should avoid the need to slurp the whole XML file into memory, but it's not working: according to Activity Monitor, the script still uses over 1GB of RAM when it runs.

I think this is because the full root element is still being built up in memory incrementally, just in case I try to access it later.

http://effbot.org/elementtree/iterparse.htm says I should use elem.clear() as I go. It also says:

The above pattern has one drawback; it does not clear the root element, so you will end up with a single element with lots of empty child elements. If your files are huge, rather than just large, this might be a problem. To work around this, you need to get your hands on the root element.

So I will try that recipe and see if it helps.
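
For reference, the recipe described above can be sketched roughly as follows. This is a minimal illustration using the standard library's `xml.etree.ElementTree`, not the code from this project: the `HealthData` and `Record` element names, the attributes, and the `iter_records` helper are all assumptions made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a large export; element and attribute names are made up.
SAMPLE = b"<HealthData>" + b"".join(
    b'<Record type="steps" value="%d"/>' % i for i in range(1000)
) + b"</HealthData>"


def iter_records(fp, tag="Record"):
    """Yield the attributes of each `tag` element while keeping the tree small."""
    context = iter(ET.iterparse(fp, events=("start", "end")))
    # The first event is the "start" of the root element, which gives us a handle on it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # Copy what we need before clearing, since clear() empties elem.attrib.
            yield dict(elem.attrib)
            elem.clear()  # drop this element's attributes, text and children
            root.clear()  # drop the emptied children that pile up on the root


records = list(iter_records(io.BytesIO(SAMPLE)))
print(len(records), records[0])
```

Requesting `"start"` events is only needed to capture the root element; the real work happens on `"end"`, when an element has been fully parsed.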

simonw added the bug label Jul 24, 2019

@simonw

Collaborator Author

commented Jul 24, 2019

I'm using https://pypi.org/project/memory-profiler/ to explore this in more detail:

$ pip install memory-profiler matplotlib

Then:

$ mprof run healthkit-to-sqlite ~/Downloads/healthkit-export.zip healthkit.db
$ mprof plot

[Screenshot: mprof memory profile plot, 2019-07-24 8:17 AM]

simonw added a commit that referenced this issue Jul 24, 2019

@simonw

Collaborator Author

commented Jul 24, 2019

Adding el.clear() got me a huge improvement:

[Screenshot: mprof memory profile plot, 2019-07-24 8:23 AM]

simonw added a commit that referenced this issue Jul 24, 2019

@simonw

Collaborator Author

commented Jul 24, 2019

Clearing the root element each time saved even more:

[Screenshot: mprof memory profile plot, 2019-07-24 8:30 AM]
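
The same three-way comparison (no clearing, el.clear() only, el.clear() plus clearing the root) can be roughly reproduced on synthetic data. The sketch below swaps in the standard library's `tracemalloc` instead of `mprof`, so it only sees allocations made through Python's allocator and its numbers are not comparable to the process-level figures in the graphs. The document shape and the helper names (`make_xml`, `parse`, `peak_bytes`) are assumptions for illustration.

```python
import io
import tracemalloc
import xml.etree.ElementTree as ET


def make_xml(n):
    """Build a synthetic document; the element names are hypothetical."""
    rows = b"".join(b'<Record type="steps" value="%d"/>' % i for i in range(n))
    return b"<HealthData>" + rows + b"</HealthData>"


def parse(data, clear_elem, clear_root):
    """Count Record elements, optionally clearing the element and the root as we go."""
    context = iter(ET.iterparse(io.BytesIO(data), events=("start", "end")))
    _, root = next(context)  # first event is the start of the root element
    count = 0
    for event, elem in context:
        if event == "end" and elem.tag == "Record":
            count += 1
            if clear_elem:
                elem.clear()
            if clear_root:
                root.clear()
    return count


def peak_bytes(data, **kwargs):
    """Peak Python-level allocation during one parse, as reported by tracemalloc."""
    tracemalloc.start()
    try:
        parse(data, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


data = make_xml(20000)
for label, opts in [
    ("no clearing", dict(clear_elem=False, clear_root=False)),
    ("el.clear()", dict(clear_elem=True, clear_root=False)),
    ("el.clear() + root.clear()", dict(clear_elem=True, clear_root=True)),
]:
    print(f"{label:28s} {peak_bytes(data, **opts) / 1024:10.0f} KiB")
```

Exact figures will vary by Python version and platform, so treat the output as a relative comparison only.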

simonw referenced this issue Jul 24, 2019

simonw closed this in #8 Jul 24, 2019

simonw added a commit that referenced this issue Jul 24, 2019

Use less RAM (#8)
* Call el.clear() for each element
* Clear root element each time

Memory profile graphs here: #7