
imshow intermediate memory usage #6952

Closed
LevN0 opened this issue Aug 18, 2016 · 21 comments
Labels
Difficulty: Medium · Performance · status: closed as inactive · status: inactive

Comments

@LevN0
Contributor

LevN0 commented Aug 18, 2016

On my machine, imshow uses 3.5 GB of RAM at peak during its plotting process for a 385 MB array, even if this array is already a float64. That is nearly 10x the RAM required for the data alone. Once the plot is finished, usage goes down significantly. Is this a bug, or is this just how it is?

I cannot simply downsample because I would like to show the user the image at its true size, but allow them to scroll around to different parts via the scrollbars. Is there any approach someone can suggest to do this without having to have 10x the RAM?

Here is a minimal example via pyplot that reproduces this issue exactly; I used task manager to measure the RAM usage.

import matplotlib.pyplot as plt
import numpy as np

img = np.random.rand(3000, 16024)  # float64 by default, ~385 MB
imgplot = plt.imshow(img)

plt.show()

Python 2.7 and 3.4
matplotlib 1.5.1 (installed via pip)
Using tkagg backend.
Windows 7

@jenshnielsen
Member

I agree that seems too high, but we probably need to do some memory profiling to see if anything can be done.

Have you considered something like http://matplotlib.org/examples/event_handling/resample.html, which downsamples the data and dynamically replots only the fraction needed? I have not tested this or tried extending it to image data.
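A rough, untested sketch of extending that idea to image data, in the spirit of the viewlims event-handling example (the ~1024-column stride target and the callback wiring here are illustrative assumptions, not existing matplotlib code):

import numpy as np
import matplotlib.pyplot as plt

img = np.random.rand(3000, 16024)  # full-resolution data kept in memory

fig, ax = plt.subplots()
# Default origin='upper': extent is (left, right, bottom, top).
im = ax.imshow(img[::8, ::8], extent=(0, img.shape[1], img.shape[0], 0))
ax.set_autoscale_on(False)  # avoid feedback when set_extent updates data limits

def ax_update(ax):
    # Crop to the visible region and stride it down to roughly screen
    # resolution, so the artist never holds the full-resolution array.
    x0, x1 = sorted(ax.get_xlim())
    y0, y1 = sorted(ax.get_ylim())
    x0, x1 = max(int(x0), 0), min(int(x1), img.shape[1])
    y0, y1 = max(int(y0), 0), min(int(y1), img.shape[0])
    stride = max(1, (x1 - x0) // 1024)  # 1024 is an arbitrary screen-width target
    im.set_data(img[y0:y1:stride, x0:x1:stride])
    im.set_extent((x0, x1, y1, y0))

ax.callbacks.connect('xlim_changed', ax_update)
ax.callbacks.connect('ylim_changed', ax_update)
plt.show()

This keeps the large array in plain NumPy and only hands matplotlib a screen-sized slice, so the norm/RGBA transients scale with the window rather than with the data.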

@jenshnielsen jenshnielsen added this to the 2.0.1 (next bug fix release) milestone Aug 18, 2016
@jenshnielsen
Member

Also the image code has been largely rewritten for 2.0 so it would be great if you can test 2.0 beta 3 to see how that behaves.

@WeatherGod
Member

Indeed, the image code has been greatly improved for v2.0, so I would see if it improves for your case. I have also come across odd situations of what seems like a memory leak, but only if I load the data from a netcdf file using a certain version of the netcdf4 package.

If all else fails, you might be interested in using DataShader.


@anntzer
Contributor

anntzer commented Aug 18, 2016

I can confirm the issue with mpl2.0b3 (profiled with memory_profiler), although only with ~6x memory usage. One of the copies occurs because matplotlib always copies the input data, which I think is going to be difficult to change. The other 4x is used in _ImageBase._make_image:

                A = self.norm(A)
                if A.dtype.kind == 'f':
                    # If the image is greyscale, convert to RGBA and
                    # use the extra channels for resizing the over,
                    # under, and bad pixels.  This is needed because
                    # Agg's resampler is very aggressive about
                    # clipping to [0, 1] and we use out-of-bounds
                    # values to carry the over/under/bad information
                    rgba = np.empty((A.shape[0], A.shape[1], 4), dtype=A.dtype)
                    rgba[..., 0] = A  # normalized data
                    rgba[..., 1] = A < 0  # under data
                    rgba[..., 2] = A > 1  # over data
                    rgba[..., 3] = ~A.mask  # bad data
                    A = rgba
                    output = np.zeros((out_height, out_width, 4),
                                      dtype=A.dtype)
                    alpha = 1.0
                    created_rgba_mask = True

(rgba then gets passed to matplotlib._image.resample.)
Most normalizers will lead to this code path being taken (btw the comment seems incorrect here). In practice it seems that the last three layers of rgba are going to be unused most of the time, so they could be passed as a separate, optional boolean array (possibly packed as a bitmask).
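For illustration only (this is a sketch of the packing idea, not matplotlib code), the three extra planes could in principle collapse into a single small flag array:

import numpy as np

A = np.ma.masked_invalid(np.random.rand(3000, 16024))

# One uint8 plane (bit 0: under, bit 1: over, bit 2: bad) instead of three
# extra float64 planes: 1 byte of side information per pixel instead of 24.
flags = np.zeros(A.shape, dtype=np.uint8)
flags |= (A < 0).filled(False).astype(np.uint8)
flags |= (A > 1).filled(False).astype(np.uint8) << 1
flags |= np.ma.getmaskarray(A).astype(np.uint8) << 2

How such a plane would then be carried through Agg's resampler is a separate question, which is presumably where the difficulty lies.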

@LevN0
Contributor Author

LevN0 commented Aug 22, 2016

Took me a bit to do something similar to what @anntzer did. This is what I’ve found, both for MPL 1.5 and 2:

In matplotlib 1.5.2:

Inside image.py, there is a _get_unsampled_image() method. This method calls x = self.to_rgba(…, bytes=False) at some point. to_rgba with those parameters always returns float64 if the input is a float type, and since the result is RGBA [i.e., 4 channels] this means at least 4x the size of the original data is required (up to +1 actually, since the original data also still exists, and another up to +1 for the initial copy MPL makes). It would be 8x instead of the initial 4x if the original data were e.g. float32. Then there is this call,

x = (x*255).astype(np.uint8)

For a reason I cannot determine, the multiplication step of this doubles RAM temporarily as it runs. Tried on Windows 7 and CentOS6, via memory_profiler. Even in isolation,

import numpy as np
from memory_profiler import memory_usage

def f():
    test = np.random.rand(3000, 16024, 4)  # ~1536 MB, float64
    test = (test * 255).astype('uint8')

mem_usage = memory_usage(f, interval=.01)
print('Maximum memory usage: %s' % max(mem_usage))

Gives 3150 MB on two machines with different OSes and different versions of NumPy, which also agrees with Task Manager on Windows. Tried separating the multiplication from the astype, tried np.multiply, tried copy=False, etc. All give the same memory usage, except a separate np.multiply followed by a separate astype, which uses about 200 MB less. I do not understand this memory usage; I guess it has something to do with NumPy internals? In any case, it doubles the RAM temporarily as it runs.

Once this conversion to uint8 is finished, RAM obviously goes down very significantly.
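That transient is consistent with x * 255 materializing a full float64 temporary before the uint8 cast. A sketch of sidestepping it with an in-place multiply (only safe when the array is already a private copy, which appears to be the case at that point in image.py, since it is already modified in place for the premultiply):

import numpy as np

x = np.random.rand(3000, 16024, 4)  # float64 working copy, ~1.5 GB

# Multiplying in place avoids allocating a second full-size float64 temporary;
# only the final uint8 array (~190 MB) is newly allocated by astype.
np.multiply(x, 255, out=x)
x8 = x.astype(np.uint8)

Converting in row blocks would bound the transient similarly if mutating x were not acceptable.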

But even in the best case, during the intermediate operations, that is at least 9x the original data size worth of RAM, and in the float32 case it is 16-18x. So if I have a 1 GB data file with a float32 array, displaying it requires almost 20 GB of RAM. These 1 GB sizes do happen in astronomy.

In matplotlib 2.0.0b3:

Actually @anntzer covered it. What he said is identical to what I get. As before, the result of 6x more memory than original becomes 10x if the original data is float32 and not float64.

Any suggestions? Both 10x and nearly 20x more memory are a really large problem for some images we have (e.g., on 1.5.2 a 512 MB float32 grayscale image requires at least 9.5 GB of RAM; I checked that with just the sample code from the original issue description).

Regarding resample for image data, I want this data to be displayed with scrollbars (i.e., it will not all fit on the screen at the same time but it should be scrollable), so I do not know how I could use resample to help (although suggested approaches would be appreciated). One method that might work is to use extent + imshow to split up the drawing, but it seems very hacky and complicates other parts of the code.

I have a medium-sized application for astronomers whose visualization bits are based on MPL, so it would take at least a number of months to switch to some other toolkit, even if one could do everything MPL does (including this image aspect) with less RAM, and it is ultimately packaged via PyInstaller, which adds another layer of complication. Therefore I am very incentivized to continue with MPL, but I really wish there were not a 10x-20x memory issue, or that I could work around it in some way.

One thing I am wondering about is this piece of relevant code from image.py in MPL 1.5.2:

if self._rgbacache is None:
    x = self.to_rgba(A, bytes=False)
    # Avoid side effects: to_rgba can return its argument
    # unchanged.
    if np.may_share_memory(x, A):
        x = x.copy()
    # premultiply the colors
    x[..., 0:3] *= x[..., 3:4]
    x = (x*255).astype(np.uint8)
    self._rgbacache = x
else:
    x = self._rgbacache
im = _image.frombyte(x[yslice, xslice, :], 0)
if self._A.ndim == 2:
    im.is_grayscale = self.cmap.is_gray()
else:
    im.is_grayscale = False

Why in that code is to_rgba() called with bytes=False? Is it for the premultiplying-the-colors step (because the step after that is seemingly exactly what bytes=True does)? The only way I can see alpha being anything but 1 (and thus premultiplying colors having any effect) is if the data is already an RGBA array, but that could be trivially checked in this method, calling with bytes=False only when the input is already RGBA. This would avoid calling bytes=False in all non-RGBA cases, which saves a lot of RAM (experimentally I get 3.5 GB instead of 9.5 GB for one case; still too much given the original data is only 512 MB of float32s, but much better). I am not sure whether I am misunderstanding something completely trivial here, though.
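For a sense of scale, the two paths can be compared by hand through the public ScalarMappable.to_rgba that the image classes inherit (sizes approximate; 'viridis' is just an example colormap):

import numpy as np
from matplotlib.cm import ScalarMappable

data = np.random.rand(3000, 16024).astype(np.float32)
sm = ScalarMappable(cmap='viridis')

rgba_float = sm.to_rgba(data, bytes=False)  # (3000, 16024, 4) float64, ~1.5 GB
rgba_bytes = sm.to_rgba(data, bytes=True)   # (3000, 16024, 4) uint8,  ~190 MB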

@tacaswell
Member

The reason we pass a full NxMx4 array is that we are using the Agg resampler, which only works on RGBA images. In principle @anntzer is right; however, it would involve changing what library is used to do the resampling under the hood.

I do not understand why float32 is worse, do we upcast when we copy?

As for resampling, have a look at https://github.com/bokeh/datashader/pull/200/files which hides a DataShader behind an Image. If you need to tie the artist closely to scroll bars I would go down that route (although I think our pan/zoom tools do a better job anyway 😉).

It is probably worth putting some logic in _make_image to only use the section of the data array which is plausibly visible (we know the axes limits, the image extent and the image size at draw time); that will help with the zoomed-in case, but not with the zoomed-out/full view case.

@anntzer
Contributor

anntzer commented Aug 23, 2016

I'm a bit confused now. A quick look at the docs of Agg (which I know nothing about) suggests that it supports the pixfmt_gray8 pixel format, so could we just make Agg believe that the image is grayscale during the resampling phase?

@LevN0
Contributor Author

LevN0 commented Aug 24, 2016

I do not understand why float32 is worse, do we upcast when we copy?

Sort of. When calling norm(A) in _make_image, the result of passing a float32 is always float64. The reason is Normalize.process_value in colors.py:

        dtype = (np.float32 if dtype.itemsize <= 2
                 else np.promote_types(dtype, float))

I think this will always yield float64 for a float32 input. The docstring reads,

    Returns *result*, *is_scalar*, where *result* is a
    masked array matching *value*.  Float dtypes are preserved;
    integer types with two bytes or smaller are converted to
    np.float32, and larger types are converted to np.float.
    Preserving float32 when possible, and using in-place operations,
    can greatly improve speed for large arrays.

It is not clear to me whether the upcast from float32 to float64 is intentional or not from that description, but I lean toward not.
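The promotion itself is easy to confirm in isolation:

import numpy as np

# float32 has itemsize 4, so the else branch runs and promotes to float64:
print(np.promote_types(np.float32, float))  # float64, i.e. the float32 input is upcast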

That upcast/call to norm() takes place prior to the code @anntzer gave, and if unsample=False (in which case the code @anntzer gave will not be run) it takes place prior to the call to cmap, which is the second place of maximum RAM usage. So yes, there is an upcast, and it is carried forward to all future operations (some of which make their own copies) that compound the memory usage even further.


The reason we pass a full NxMx4 array is that we are using the Agg resampler which only works on RGBA images.

Is this referring to the _image.resample call, or a later call? When is the resampling done? The docstring of resample says non-RGBA and non-float input is accepted. If at least non-float were accepted, and it were possible to never create the NxMx4 float array but instead pass in data already cmapped to uint8, that too would save a bunch of memory.

Other sources of memory usage (aside from above paragraph and call to norm() converting float32 to float64) are:

(1) Initial copy MPL makes on intake + initial data that still exists

(2) Although _make_image runs A = self.norm(A), norm makes a copy internally, meaning the original A is preserved (guessing this is intentional, to avoid changing A).

(3) cmap makes a copy (maybe unavoidable). There is a call in cmap, xa = xa.astype(int); I am not exactly certain what is happening there, but it looks like in the majority of cases these values will be 0-255, so perhaps something like the below (or maybe MPL already has a convenience function for this) can be done:

dtype = np.promote_types(np.min_scalar_type(int(xa.min())), np.min_scalar_type(int(xa.max())))

That only saves a bit of memory though, unless making the copy can be refactored away somehow.
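For scale, with the array size from this issue (hypothetical stand-in array; a 64-bit default int is assumed), the index dtype alone is a noticeable chunk:

import numpy as np

xa = np.random.rand(3000, 16024) * 255      # stand-in for the scaled index array
print(xa.astype(int).nbytes / 1e6)          # ~385 MB of int64 indices
print(xa.astype(np.uint16).nbytes / 1e6)    # ~96 MB with a 16-bit index type

uint8 would not quite work, since the lookup table typically has a few entries past 255 for the over/under/bad colors.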

@tacaswell
Member

It is the _image.resample. Sorry, I was a bit wrong: it does take 2D grayscale input; however, it will clip all floats to [0, 1]. The over/under/bad machinery relies on being able to have values outside of this range, hence the trick we are doing to carry over/under/bad on the extra channels. See the discussion at https://github.com/matplotlib/matplotlib/pull/6581/files for more details.

For the data usage points:

  1. I would be amenable to a flag to imshow (or maybe just on the AxesImage, which you would have to make 'by hand' to access?) to control whether we make a copy or not

  2. We need to keep both the original and normed data so that if the norm is changed we can re-compute it. If you want to drop this, I think you will need to make a new Artist.

  3. what makes it into cmap should be down-sampled to screen resolution for the region visible (this is why relying on our pan/zoom tools (or hooking the scroll bars up to the axes limits) is better than forcing it to render the whole thing and adding scroll bars)

Given your use case, it may make sense to make a new image Artist that norms on construction (none of our norms operate in-place), only deals with over/under by clipping, and forbids 'bad' (np.nan or np.inf) values, so it can implement a _make_image without any copies. I suspect you could get that down to at worst 2-3x transients.
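Short of writing a new Artist, a rough approximation of that idea is to do the norm and colormap lookup once up front and hand imshow an already-built uint8 RGBA array. A sketch (this trades away live norm/colormap changes, data-linked colorbars, and over/under/bad handling; 'viridis' is just an example colormap):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

data = np.random.rand(3000, 16024).astype(np.float32)

norm = Normalize(vmin=float(data.min()), vmax=float(data.max()), clip=True)
rgba = plt.get_cmap('viridis')(norm(data), bytes=True)  # (M, N, 4) uint8, ~190 MB
del data  # only the uint8 image stays resident

fig, ax = plt.subplots()
ax.imshow(rgba)  # RGBA uint8 input skips the per-draw norm/cmap steps
plt.show()

The one-time norm and lookup still have transients of a few times the data size, but they are paid once rather than on every draw.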


The upcast in norm looks like a bug, can you or @anntzer put in a PR changing that to float32?

@anntzer
Contributor

anntzer commented Aug 24, 2016

Where is this over/under/bad machinery located? I assume it doesn't come from Agg itself?
The float32 issue has been fixed by #6837.

@tacaswell
Member

The over/under/bad logic is in the Colormap base class __call__ implementation: https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/colors.py#L429

Sigh, I even merged the float32 fix. @LevN0 can you please update to the latest master (or v2.x) branch?

@anntzer
Contributor

anntzer commented Aug 25, 2016

This doesn't seem to work on a (m, n, 4) array as input? Or did I read the code wrong?

@tacaswell
Member

@anntzer I have lost track of which 'this' we are talking about.

The pipeline (iirc) is :

  1. 2D data | maybe masked range: [-inf, inf] + {np.nan}
    • user input
  2. 2D normed data | masked range: [-inf, inf] (but can be thought of as [0, 1] + {-1, 2})
    • created via one of the norm classes
  3. RGBA for resampling | MxNx4 float
    • created in _make_image and eventually passed to _image.resample
  4. display resolution | masked range: [0, 1] + {-1, 2}
    • restructured output from _image.resample where each color plane is used for one of
      {0: in range data, 1: under, 2: over, 3: bad/masked}
  5. display resolution | unmasked RGBA
    • created in the ColorMap.__call__ method. This goes back into the rendering pipeline for compositing with the other artists.

Passing 2 directly into _image.resample loses the mask and the {-1, 2} values, which are needed later in the pipeline.

Another possible way to reduce the memory usage would be to call _image.resample 4 times with a single plane each time. Not sure what other performance impacts that would have.

@anntzer
Contributor

anntzer commented Aug 25, 2016

I think I need to sit down and figure this out... I'm not there yet.

@LevN0
Contributor Author

LevN0 commented Mar 10, 2017

I rewrote some of my code such that it hooks scrollbars to axis limits. This resolves some memory inefficiencies in my own code, but it does nothing to address the huge memory usage by the trivial code in the original issue report above.

An even worse pathological situation has been pointed out to me: an image that has a data type of byte (i.e. 0 to 255). On norming, this takes memory usage to 4x since it goes to float32. On RGBA creation, this then takes it another 4x to 16x. This is without any other steps in the above pipeline by @tacaswell. There has to be some more efficient way of doing this. Even for relatively small images, 16x is a very large memory penalty.

A comment by @tacaswell above read,

what makes it into cmap should be down-sampled screen resolution for the region visible (this is why relying on our pan/zoom tools (or hooking the scroll bars up to the axes limits) is better than forcing it to render the whole thing and adding scroll bars)

However, hooking scrollbars up to axis limits does not seem to affect what makes it to cmap in any way. Nor, in fact, does matplotlib appear to downsample itself (e.g., with the original 5-liner at the top, the displayed image is downsampled, yet the transient memory usage is that of a completely non-downsampled image).

@tacaswell
Member

The scroll bars need to call set_array + set_extent with the down-sampled / cropped image. We could probably do this internally (it is not hard conceptually, just lots of details to get right).
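A minimal sketch of that hand-driven update from a scrollbar callback (img, im and the viewport sizes are placeholders for whatever the application already has):

def on_scroll(row0, col0, view_rows=800, view_cols=1200):
    # img and im are assumed to exist already: the full-resolution array and
    # the AxesImage returned by imshow. Hand the artist only the visible crop
    # and tell it where that crop sits in data coordinates.
    crop = img[row0:row0 + view_rows, col0:col0 + view_cols]
    im.set_data(crop)
    im.set_extent((col0, col0 + view_cols, row0 + view_rows, row0))
    im.axes.figure.canvas.draw_idle()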

What version are you testing with? The pipeline I described is for 2.0+; if you are on 1.x we do indeed map the full-sized image to RGBA and then down-sample to screen resolution.

Optimizing imshow is on our list of Google Summer of Code projects; let's see if we get any takers.

@WeatherGod
Member

WeatherGod commented Mar 10, 2017 via email

@LevN0
Contributor Author

LevN0 commented Mar 10, 2017

I am using 2.0 for testing. For clarity, the way I have it set up is that set_array and set_extent use the full array and full extent: this lets me take advantage of numerous things MPL does for me (the app exposes many MPL features to the user, including different norms, colorbars, changing axis limits, upsampling or downsampling, etc.). However, via xlim and ylim, only the portion of the image that can fit onto the screen is selected, and it is displayed in a canvas that is only the size of the window. So ideally the full image should never be displayed or drawn (unless the user zooms out to get a view similar to MPL's default result from calling imshow).

The comment said,

this is why relying on our pan/zoom tools (or hooking the scroll bars up to the axes limits)

I understood this to mean axis limits. I should note that MPL's pan/zoom tools seem to do the same thing my app does: _make_image receives, normalizes and creates an RGBA array for the full array, no matter how zoomed in/panned you are.

Edit: I should say that I was aware axis limits would not solve this issue before I refactored my code. But I am still hoping there is some way to remove the more-than-16x memory multiplier on int8 data and the somewhat smaller but still quite large (at least 6x) multipliers on other data types.

@WeatherGod
Member

WeatherGod commented Mar 10, 2017 via email

@tacaswell
Member

What norm are you using? In most cases, the RGBA is created in https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/image.py#L428, which is called with the resampled (normalized) array. Zooming way out will help with memory; zooming in could very well make it much worse (because we do not crop).

Sorting out which positions in A are visible at https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/image.py#L365 and cropping the array we are working with is probably the best place to save memory / compute. It looks like we are already doing this on the output (https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/image.py#L447); that logic just needs to be moved up.

@QuLogic QuLogic modified the milestones: 2.0.1 (next bug fix release), 2.0.2 (next bug fix release) May 3, 2017
@tacaswell tacaswell mentioned this issue Jul 30, 2017
6 tasks
@tacaswell tacaswell removed this from the 2.1.1 (next bug fix release) milestone Oct 9, 2017
@tacaswell tacaswell added this to the 2.2 (next feature release) milestone Oct 9, 2017
@github-actions

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

@github-actions github-actions bot added the status: inactive Marked by the “Stale” Github Action label Mar 28, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Apr 28, 2023
@rcomer rcomer added the status: closed as inactive Issues closed by the "Stale" Github Action. Please comment on any you think should still be open. label May 30, 2023