Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contourf method significantly slower than basemap #1291

Closed
crazyapril opened this issue Apr 3, 2019 · 30 comments · Fixed by #1690
Closed

contourf method significantly slower than basemap #1291

crazyapril opened this issue Apr 3, 2019 · 30 comments · Fixed by #1690
Milestone

Comments

@crazyapril
Copy link

Description

I'm migrating my codes from Basemap to Cartopy. Generally Cartopy is always faster, however I find contourf method can take me seconds. Here's my example code.

import time

import cartopy
import matplotlib.pyplot as plt
import numpy as np
import scipy.ndimage as sn
from mpl_toolkits.basemap import Basemap


#-------------------
#  Create data
#-------------------
x, y = np.mgrid[60:140:0.25, 0:45:0.25]
r = np.random.randint(0, 255, size=(320, 180), dtype='uint8')
r = sn.filters.gaussian_filter(r, (3, 3), mode='constant')

#-------------------
#  Cartopy
#-------------------
ax = plt.axes(projection=cartopy.crs.LambertConformal(
    central_longitude=100.0, central_latitude=35.0))
ax.set_extent((60,140,0,45))
for step in (20, 10, 5):
    tic = time.time()
    col = ax.contourf(x, y, r, transform=cartopy.crs.PlateCarree(),
        levels=np.arange(100,161,step))
    toc = time.time()
    points = sum(len(e.get_paths()) for e in col.collections)
    print('[Cartopy] Step: {} Points: {}  Time: {:.0f}ms'.format(
        step, points, (toc - tic) * 1000))


#-------------------
#  Basemap
#-------------------
_map = Basemap(projection='lcc', resolution=None,
    lon_0=100.0, lat_0=35.0, llcrnrlat=0,
    urcrnrlat=45, llcrnrlon=60, urcrnrlon=140)
for step in (20, 10, 5):
    tic = time.time()
    col = _map.contourf(x, y, r, levels=np.arange(100,161,step))
    toc = time.time()
    points = sum(len(e.get_paths()) for e in col.collections)
    print('[Basemap] Step: {} Points: {}  Time: {:.0f}ms'.format(
        step, points, (toc - tic) * 1000))

The result:

[Cartopy] Step: 20 Points: 123  Time: 729ms
[Cartopy] Step: 10 Points: 273  Time: 1433ms
[Cartopy] Step: 5 Points: 528  Time: 2887ms
[Basemap] Step: 20 Points: 123  Time: 118ms
[Basemap] Step: 10 Points: 273  Time: 151ms
[Basemap] Step: 5 Points: 528  Time: 179ms

which means Cartopy can be up to 16x slower on my old laptop. After some profiling work I find the culprit is https://github.com/SciTools/cartopy/blob/master/lib/cartopy/mpl/geoaxes.py#L1505, it seems this step is to calculate new datalim and autoscale the extent. This is how far I can get to since I don't have enough time to dig in.

  • Why does this step take so much time?
  • As a quick workaround, can we just check if the extent is already set (which is the most scenario imo) before autoscaling?

Environment

  • OS: Windows 10
  • cartopy: 0.17.0
  • basemap: 1.2.0
  • matplotlib: 3.0.2
  • shapely: 1.6.4.post1

Thanks!

@efiring
Copy link

efiring commented Sep 2, 2019

This is a critical problem. In trying to translate one of my oceanography class notebooks from basemap to cartopy, a 2-second basemap plot took over 2 minutes with cartopy. That's prohibitive. As @crazyapril notes, the problem seems to be coming from a calculation added in 2016:
d74eedd
@pelson, do you remember why this was needed, and why it is done only for filled contours, not for line contours? Is there a good test case that shows the need for this calculation?

@pelson
Copy link
Member

pelson commented Sep 10, 2019

Fundamentally cartopy's contour implementation is doing a lot more work than Basemap needs to do, as we are projecting the contour lines once they have been calculated. The commit referenced is ensuring that the data limits are respected by transforming the coordinates from source to target projection.

If this calculation is a prohibitive/expensive operation, then it is always possible to fall back to the equivalent Basemap behaviour - namely, you transform your data yourself, and pass the reprojected data to be contoured in the target project. It isn't ideal as the whole point of cartopy is that you shouldn't have to worry about this stuff (like datelines and poles), but it does highlight that cartopy is a superset of the capabilities available with Basemap.

To illustrate my meaning:

Current code:

ax = plt.axes(projection=target)
ax.contourf(src_x, src_y, data, transform=source)

Becomes:

ax = plt.axes(projection=target)
target_x, target_y, target_data = whatever_used_to_be_done_in_basemap(src_x, src_y, data)
ax.contourf(target_x, target_y, target_data, transform=target)

@efiring
Copy link

efiring commented Sep 10, 2019

In the case of contouring, whatever_used_to_be_done_in_basemap is simply applying the projection transformation from lon, lat to the projected coordinates (meters?) via the __call__ method of the Basemap object, which invokes the corresponding proj4 function. The cartopy equivalent would be via the CRS.transform_points method, correct?

There is still the question of why the expensive calculation to check data limits is invoked only for contourf and not for contour.

And, I suspect the calculation is itself much more expensive than it might need to be, because it is having to navigate a tangle of python steps to do many transform calls instead of just transforming a very large number of points in one call into proj4.

I appreciate the convenience, generality, and elegance of cartopy's design. I'm hoping that performance in some of the worst cases can be greatly improved with some optimizations.

@Laura-Guillory
Copy link

I've experienced this issue as well, generating a lot of contourf maps in Basemap takes me 15 minutes but in Cartopy it takes 1 hour 4 minutes.

I tried working around it as suggested above with CRS.transform_points, but got ValueError: x and y arrays must have the same length . I think I'm missing something here, since surely you should be able to transform longitudes and latitudes that don't form a perfect square (same length)?

Anyway, hoping that whatever causes this slowdown can be fixed, or a workaround appears.

@dopplershift
Copy link
Contributor

To do it manually with lat/lon arrays that each define 1 dimension of a 2D region, you need to do the work of assembling them into the same size arrays. You can accomplish this using np.meshgrid.

@crazyapril
Copy link
Author

So one year later I restarted the migration based on the suggestion above and in case someone may need it, just replace ax.contourf(x, y, data, transform=source_proj) with:

ret = ax.projection.transform_points(source_proj, np.array(x),
    np.array(y), np.array(data)) # This method only accept ndarray!
xx = ret[..., 0]
yy = ret[..., 1]
data = ret[..., 2]
ax.contourf(xx, yy, data)

Still, I think in this case the slowdown it brings somewhat outweigh the convenience. It would be better if this behavior is explained in the documentation.
Also, I find if latitudes cross the dateline, using only np.meshgrid as @dopplershift suggested would cause a blank output. A bug?

@dopplershift
Copy link
Contributor

Unless your data actually represent a height above ground, you shouldn't be passing that to transform_points, so I would try the above as:

ret = ax.projection.transform_points(source_proj, np.array(x),
    np.array(y)) # This method only accept ndarray!
xx = ret[..., 0]
yy = ret[..., 1]
ax.contourf(xx, yy, data)

@karlwx
Copy link

karlwx commented Nov 1, 2020

I don't think the data transformation is the bottleneck. It may make it slower, but the problem here as noted above is the calculation of the data limits. After doing some testing, even without a data transformation cartopy is MUCH slower than matplotlib with contouring.

I use cartopy in several operational scripts and it has gotten to the point where the plotting speed (specifically when using contourf) is a major limiting factor; the plots just can't keep up with the data coming in (unless we had 100's of cpu's available). As the community continues to migrate to cartopy, I think others will run into the same problem. I think this is an issue which should get some more attention to find out where exactly the bottlenecks are and what can be done. I would think that all of the contour math calculations should be vectorized, if possible, as a start.

@greglucas
Copy link
Contributor

I remember profiling all of this a while ago and the bottleneck is in the transform_non_affine function because we are projecting the contour lines from the source projection to target projection as mentioned above. There is a slow for loop to go through all of the geometries and project them into the new projection and currently there really isn't a great way to get around that. There is ongoing work to merge pygeos and Shapely capabilities so that you can do more vectorized calculations which I think would directly apply to this situation.

If you need it operationally fast, did you try the workaround mentioned above?

@karlwx
Copy link

karlwx commented Nov 2, 2020

Greg, I'm not sure I'm understanding that workaround correctly. In my test, the data are already in axes coordinates. Cartopy is slower even when the coordinates do not need to be transformed. To explain what I mean, here's an example - I am not passing any transform kwarg to contourf - but the contourf call is far slower with a cartopy axes instance than matplotlib:

import numpy as np
import cartopy.crs as ccrs
import matplotlib.pyplot as plt

lat = np.arange(-10,90,0.1)
lon = np.arange(0,100,0.1)
lons,lats = np.meshgrid(lon,lat)
data = np.random.rand(len(lat), len(lon))

fig = plt.figure(figsize=(12,9))
ax = fig.add_subplot(111, projection=ccrs.PlateCarree()) # cartopy axes
#ax = fig.add_subplot(111) # contouring on regular matplotlib axes is much faster

ax.contourf(lons, lats, data) # no data transform

Here's a screenshot of the profile for contouring on the cartopy axes:
Screenshot from 2020-11-02 11-43-25

What takes ~15 seconds to do w/ cartopy takes only ~5 seconds with matplotlib alone.

@greglucas
Copy link
Contributor

Thanks @karlwx! I did not realize that was the case even without transforms. I also don't get why the same accurate bounding box calculations aren't computed for the contour case, but only for the filled version here.

I agree this seems like a pretty big speed problem, especially if someone is already setting the extent manually and doesn't care about the limits being updated accurately. I think the lowest hanging fruit would be to add a keyword argument to contourf to tell it to ignore the accurate limit calculations, leave it True by default and allow a user to override it if they want the fast path.

# We need to compute the dataLim correctly for contours.
bboxes = [col.get_datalim(self.transData)
for col in result.collections
if col.get_paths()]
if bboxes:
extent = mtransforms.Bbox.union(bboxes)
self.dataLim.update_from_data_xy(extent.get_points())

@karlwx
Copy link

karlwx commented Nov 9, 2020

After digging into this, it appears the issue runs deeper than we thought. While changing the above code improves speed of the contourf call itself, It turns out there are some serious slow-downs when the image is drawn for display. I added plt.savefig() to the code and here is the result:
Screenshot from 2020-11-09 16-58-53
It looks like something to do with the matplotlib Axes draw wrapper, but I'm not entirely sure how it all fits together.

@Gwi7d31
Copy link

Gwi7d31 commented Nov 23, 2020

I have run into this issue as well trying to use contourf via MetPy. Right now, an RTMA for 2mtr temperatures across the CONUS takes ~5min to plot the contour on an X5650. contour is just fine and I don't see any speed degradations.

# @param [String] passed_run_hour
# @param [String] passed_final_out_directory
# @param [String] passed_netcdf_file_path
def __makeTheGraphics(self, passed_run_hour, passed_final_out_directory, passed_netcdf_file_path):
    data = self.__getNetCDFData(passed_netcdf_file_path)
    print(list(data.variables))
    self.logging.info(f'Establishing variables...')
    # Establish the variables
    lat = data['latitude']
    lon = data['longitude']
    times = data['time']
    temp2mtr = data['TMP_2maboveground'][0]
    # Convert temp to °F
    temp2mtr.metpy.convert_units('degF')
    dp2mtr = data['DPT_2maboveground'][0]
    visibility = data['VIS_surface'][0]
    ####THIS IS WHERE MULTIPROCESSING SHOULD START####
    self.logging.info(f'Beginning creation of plot...')
    # Establish the projection
    crs = ccrs.Mercator()
    # Based upon 100dpi as default 1500x1000 Matplotlib
    fig = plt.figure(figsize=(15., 10.), frameon=False)
    # Matplotlib Adds an Axes to the figure as part of a subplot arrangement
    ax = fig.add_subplot(1, 1, 1, projection=crs)
    # Set the lat/long bounds of the image
    region = "conus"
    bounds = [(-127, -65, 23, 50)]
    ax.set_extent(*bounds, crs=ccrs.PlateCarree())
    # Add details to the map like borders and counties
    ax.add_feature(cfeature.COASTLINE.with_scale('50m'), linewidth=0.75)
    ax.add_feature(cfeature.STATES.with_scale('50m'), linewidth=0.5) # edgecolor='#0099ff'
    ax.add_feature(cfeature.BORDERS.with_scale('50m'), linewidth=1.0)
    #ax.add_feature(USCOUNTIES.with_scale('5m'))
    # Apply Title
    #plt.suptitle('2mtr Temp Test', x=0.25, y=0.25, horizontalalignment='left', verticalalignment='top', fontsize = 16)
    ax.set_title(f'Valid: {passed_run_hour}Z', loc='left')
    self.logging.info(f'Adding contours to plot...')
    #Contour Plot Of 2mtr Temp
    cf = ax.contourf(lon, lat, temp2mtr, range(-40, 120, 5), cmap=plt.cm.gist_earth_r, transform=ccrs.PlateCarree())
    c32f = ax.contour(lon, lat, temp2mtr, [32], transform=ccrs.PlateCarree(), colors='r')
    cb = fig.colorbar(cf, orientation='horizontal', extend='max')
    self.logging.info(f'Saving file -- {passed_final_out_directory}{passed_run_hour}/{region}-2mTemp.jpg')
    fig.savefig(f'{passed_final_out_directory}{passed_run_hour}/{region}-2mTemp.jpg', pad_inches=0.0, format="jpg")

image

@karlwx
Copy link

karlwx commented Nov 23, 2020

I had some success improving the speed of hi-res plots like that by using xarray coarsen to resample the data to a lower resolution (did this with National Blend of Models), but it kind of defeats the purpose of the hi-res guidance in the first place. The bottom line is, the plotting time is still excessive.

@Gwi7d31
Copy link

Gwi7d31 commented Nov 23, 2020

I'll mess around with things a little more this evening. I pulled up some code from when I was playing with GOES16 data a year ago or so. In that code, I used pcolormesh to generate the image. I can't remember the difference between that and contourf right now.

@karlwx
Copy link

karlwx commented Nov 23, 2020

So the difference is pcolormesh does filled grid boxes, while contourf is filled contour lines. At that resolution on a national map, pcolormesh might look okay. It will also be MUCH faster. The disadvantage of pcolormesh is it produces a "blocky" look with low resolution data. contourf is always going to be more visually-appealing, but the trade-off is how slow it is.

@Gwi7d31
Copy link

Gwi7d31 commented Nov 23, 2020

I'll run a benchmark between the two and also drop the images to show comparisons in their quality. This could be an alternative at least when dealing with hi-res data being displayed in CONUS.

@greglucas
Copy link
Contributor

Putting a few cross-referenced issues here with other libraries dealing with the non-affine transforms being slow.

MPL issue with slow non-affine contours:
matplotlib/matplotlib#11464

Astropy's fix:
astropy/astropy#7568

I think there will still be issues with the contours if one of them gets wrapped and needs to be cut along a boundary or something like that though, so the Astropy fix may not be totally suitable for our use-case.

@greglucas
Copy link
Contributor

@karlwx and anyone else interested, I opened up PR #1690 with a faster contour option. Feel free to take a look at that and comment there if you have other thoughts/ideas on it, or if it doesn't work for your use-case.

@johnrobertlawson
Copy link

johnrobertlawson commented Jun 16, 2021

@karlwx and anyone else interested, I opened up PR #1690 with a faster contour option. Feel free to take a look at that and comment there if you have other thoughts/ideas on it, or if it doesn't work for your use-case.

Hi greglucas: was this ever implemented? I've hunted around for any speed optimisations recently and I don't see anything. I've just benchmarked a whole suite of plotting jobs in NCL, basemap, and cartopy, and NCL beats cartopy in over half the time, and I'm banging my head on a wall trying to rectify this. Any help - or a lay-of-the-land in case I missed something - would be greatly appeared, thanks.

@Gwi7d31
Copy link

Gwi7d31 commented Jun 16, 2021 via email

@johnrobertlawson
Copy link

johnrobertlawson commented Jun 16, 2021

Hi John, I can tell you that the code changes do indeed work. I helped to verify the code when this PR came up because I had the exact same issue you are describing. With the issue still being open, the changes were not merged in. What I've personally done is make an Anaconda build with the latest MetPy build and then take the code changes from the PR and apply them to your build. Or if you're not an Anaconda type person and you just need the cartopy build, then clone the PR'd branch here: https://github.com/greglucas/cartopy/tree/contour_fastpath Best, Garrett GwiazdaMeteorologist

Well, that is highly appreciated, thank you! I am a "conda person" so I'll crack on with this. Many thanks.

@greglucas
Copy link
Contributor

Would you two mind commenting on the PR with your thoughts? If this is a feature that would help you it would be great to get feedback from users on it and try to get it in a release! :)

@Gwi7d31
Copy link

Gwi7d31 commented Jun 16, 2021 via email

@wxninja
Copy link

wxninja commented Sep 15, 2021

Hi all--

Is this PR still open and ongoing? I'll admit it's been an exciting find. We're currently in the middle of a (long overdue) migration from basemap, but are really struggling to engineer a solution to this problem. Contourf just takes too long, especially the dozens--if not hundreds--of images we create routinely through one programming cycle.

I'd love to test this, if it's still an option. I'm working in a Conda framework, which I readily admit is not my specialty, so I'm a little out of my element. I've attempted to PIP Greg's clone for the fast_transform (https://github.com/greglucas/cartopy.git) using 'git' inside the Conda framework, but I must have fouled up the process. When I attempt to employ the 'fast' or 'fast_transform' in code, it fails, telling me it can't recognize that kwarg:

ax.contourf(lons, lats, data, transform=ccrs.PlateCarree(), fast=True)
UserWarning: The following kwargs were not used by contour: 'fast'

Anyway, I'm sure I'm doing something wrong...but if anybody can give me a kick in the pants so I can give this a try, I'd appreciate it.

As always, thanks!

@greglucas
Copy link
Contributor

@wxninja, you'll have to be able to install from source to get it to work. But, you will want to checkout the specific branch contour_fastpath, not the master branch. If you go over to the PR #1690, you should be able to keep up-to-date with what is going on.

Here are the GitHub suggested commands you need to check that PR out locally and test it.

git checkout -b greglucas-contour_fastpath master
git pull https://github.com/greglucas/cartopy.git contour_fastpath

Feel free to chime in on the PR over there with comments/suggestions!

@wxninja
Copy link

wxninja commented Sep 16, 2021

@greglucas Thanks! I'm up and running now. Turns out I just kept getting the syntax wrong when trying to pull 1690. Duh.

Just wanted to chime in and say thank you for engineering this potential solution. I think it'll be a great help, and expand the 'usability' of Cartopy for those who need to make large amounts of graphics, or make graphics quickly.

@johnrobertlawson
Copy link

johnrobertlawson commented Sep 16, 2021 via email

@dopplershift dopplershift added this to the 0.20 milestone Sep 16, 2021
@greglucas
Copy link
Contributor

Thanks for the kind words, @johnrobertlawson!

Note to everyone in this thread: The PR addressing this issue has just been merged and you will be able to use the keyword argument transform_first=True to get the faster behavior in the next release. Be warned that it does not handle wrapped coordinates as well, but on limited domains it should provide a significant speed improvement!

@Gwi7d31
Copy link

Gwi7d31 commented Sep 17, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants