
Fix various memory leaks discovered through valgrind #3258

Merged — 3 commits merged into matplotlib:v1.4.x from ft2font-memleak on Jul 16, 2014

Conversation

@mdboom (Member) commented Jul 14, 2014

This fixes what #3207 started. It turns out there were two problems with #3207: the data array ended up with a refcount of 2 at the end of the function, and then, when it was later passed to _image.Image.frombuffer, that method also leaked a reference. Sigh.

Also, this fixes a leak in the collection transform array, and another in the "generic" font I/O used to load fonts from a BytesIO buffer (very rarely used, but the test suite does exercise it).

@mdboom mdboom added this to the v1.4.0 milestone Jul 14, 2014
@mdboom (Member, Author) commented Jul 14, 2014

So, it seems this works, but it still doesn't address the out-of-memory failures on Travis:

https://travis-ci.org/matplotlib/matplotlib/jobs/29920943

@tacaswell (Member):

Maybe we should turn down the number of processes we are using. Is there some way to query the Travis machine we are running on? I would expect that some of their machines are underpowered and that we get the failures when we are randomly assigned to one of those.

If they have a fixed core-to-memory ratio and we get a machine with fewer than 8 cores, this might make sense.

@mdboom (Member, Author) commented Jul 14, 2014

That's a good point. I'm not quite sure how they get provisioned, but we could add lshw (or something similar) to the top of our scripts to collect some output for a while and see if there's any correlation between the failures and the hardware (though I guess if it's a virtual machine, lshw will lie anyway)...

@efiring (Member) commented Jul 15, 2014

@mdboom, is there any reason not to go ahead and merge this now?

@mdboom (Member, Author) commented Jul 15, 2014

Stay tuned... I think I have a solution for the bigger problem in the works. Just running again over all the tests now...

This works by avoiding storing a reference to the figure in the
"do_test" closure in the image_comparison decorator.  nose keeps a
reference to all of its test functions, and thereby holds on to every
figure in every test that uses the image_comparison decorator (which is
most of them).  Instead of using the figure from the outer scope, this
just uses the figure number, which is then looked up when the test
function is actually run.

Also adds gc.collect() so we can ensure there are zero figures in memory
at the start of each test.  This is probably not strictly necessary, but
it makes analysing memory usage easier.
@mdboom (Member, Author) commented Jul 15, 2014

This now (I think) addresses the steady growth of memory usage in the test suite. Fortunately, it is something specific to the test suite itself and does not indicate a deeper issue that would affect user code.

The image_comparison decorator generates a single nose test for each combination of figure and output format that a given test produces. These are managed as function closures, each of which includes a reference to its figure. nose retains all of these function closures for the duration of the run (probably so it can display a test's name if it fails, etc.). The fix here is to pass the figure number, rather than the figure itself, to the closure.

Here are the results from valgrind's massif tool:

BEFORE:

    GB
2.751^                                                              #         
     |                                                             @#         
     |                                                             @# :  @:@:@
     |                                              @::::          @#::@:@:@:@
     |                                              @: :       @:@@@#::@:@:@:@
     |                                         @@   @: : ::::::@ @ @#::@:@:@:@
     |                                      :::@ :::@: : ::: : @ @ @#::@:@:@:@
     |                          ::         ::: @ :::@: : ::: : @ @ @#::@:@:@:@
     |                       :::: :::::::::::: @ :::@: : ::: : @ @ @#::@:@:@:@
     |                   @::::: : ::: : :: ::: @ :::@: : ::: : @ @ @#::@:@:@:@
     |                   @: ::: : ::: : :: ::: @ :::@: : ::: : @ @ @#::@:@:@:@
     |                   @: ::: : ::: : :: ::: @ :::@: : ::: : @ @ @#::@:@:@:@
     |                   @: ::: : ::: : :: ::: @ :::@: : ::: : @ @ @#::@:@:@:@
     |                  :@: ::: : ::: : :: ::: @ :::@: : ::: : @ @ @#::@:@:@:@
     |                  :@: ::: : ::: : :: ::: @ :::@: : ::: : @ @ @#::@:@:@:@
     |                @::@: ::: : ::: : :: ::: @ :::@: : ::: : @ @ @#::@:@:@:@
     |            @:::@::@: ::: : ::: : :: ::: @ :::@: : ::: : @ @ @#::@:@:@:@
     |         :@@@: :@::@: ::: : ::: : :: ::: @ :::@: : ::: : @ @ @#::@:@:@:@
     |     ::@::@ @: :@::@: ::: : ::: : :: ::: @ :::@: : ::: : @ @ @#::@:@:@:@
     |  :::::@::@ @: :@::@: ::: : ::: : :: ::: @ :::@: : ::: : @ @ @#::@:@:@:@
   0 +----------------------------------------------------------------------->Gi
     0                                                                   866.7

AFTER:

    MB
810.6^                   #                                                    
     |                   #                                                    
     |                   #                                                    
     |                   #                                                    
     |                   #                                                    
     |                   #                                                    
     |                   #                                                    
     |                   #                                                    
     |                   #                                                    
     |                   #                                                    
     |                   #                                                    
     |                 ::#                                                    
     |                 : #                                                    
     |                :: #::                                                  
     |                :: #:                                                   
     |                :: #:                                             :: :@:
     |                :: #:                                     ::::::::::::@:
     |               ::: #:                                  :::::: ::: ::::@:
     |    :      @@::::: #:  @@           ::::@@        ::@@:: :::: ::: ::::@:
     | ::::::::@@@ : ::: #: :@ ::::::::::::: :@ @:::::::: @ :: :::: ::: ::::@:
   0 +----------------------------------------------------------------------->Gi
     0                                                                   939.1

There is still obviously one test whose peak usage is quite large (it might be nice to get to the bottom of that), but the monotonically increasing problem is gone.

@efiring (Member) commented Jul 15, 2014

Encouraging! The specgram tests are using much more memory (and time) than I think they should need for testing purposes; they are using rather large data arrays and making dense plots. There might be other culprits as well.

While you are looking at testing: the new option to include or exclude pep8 is good, but at present it is causing all the tests for 2.7 to be run twice.

@mdboom (Member, Author) commented Jul 15, 2014

> Encouraging! The specgram tests are using much more memory (and time) than I think they should need for testing purposes; they are using rather large data arrays and making dense plots. There might be other culprits as well.

Yeah -- now we are down to one-off problems that we can fix incrementally.

> While you are looking at testing: the new option to include or exclude pep8 is good, but at present it is causing all the tests for 2.7 to be run twice.

It is? Looking at Travis, I see a 2.7 run with everything-but-PEP8 and a 2.7 run with only PEP8.

@efiring (Member) commented Jul 15, 2014

Now I see you are right, of course. I think that, looking at earlier Travis results on some other PR, I saw the pep8 build taking about as long as the others, and I mistakenly concluded it was running everything plus pep8 without looking at the output.

@mdboom (Member, Author) commented Jul 16, 2014

The Travis failure appears to be a networking issue downloading "travis-artifacts" (why that's not locally mirrored at Travis, I don't know). In any case, no OOM errors. I'm going to restart the whole suite to see if I can trigger the memory errors again.

@tacaswell (Member):

ERROR:  Error installing travis-artifacts:
    fog-brightbox requires fog-json (>= 0)

It looks like their install scripts are broken, as fog-json does not appear anywhere else in the logs.

mdboom added a commit that referenced this pull request Jul 16, 2014
Fix various memory leaks discovered through valgrind
@mdboom mdboom merged commit c893d56 into matplotlib:v1.4.x Jul 16, 2014
@mdboom mdboom deleted the ft2font-memleak branch August 7, 2014 13:52
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions.