Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interpolate_to_grid - Unexpected behavior when using Pandas dataframe #1029

Closed
dtho143 opened this issue Apr 22, 2019 · 4 comments · Fixed by #1034
Closed

Interpolate_to_grid - Unexpected behavior when using Pandas dataframe #1029

dtho143 opened this issue Apr 22, 2019 · 4 comments · Fixed by #1034
Labels
Area: Calc Pertains to calculations Type: Bug Something is not working like it should
Milestone

Comments

@dtho143
Copy link

dtho143 commented Apr 22, 2019

I was recently working with the interpolate_to_grid function, following the example from https://unidata.github.io/MetPy/latest/examples/gridding/Point_Interpolation.html, and I was encountering some unexpected behavior. In my case, I was looking at stations in a particular state instead of the entire country, and I had loaded my data from a csv into a Pandas dataframe. When I started interpolating my data, several interpolation methods (linear, cubic, rbf, Cressman, Barnes) worked fine, but when I tried working with natural neighbor and nearest neighbor, it would not work. When I changed the input x and y coordinates from df.Long and df.Lat to df.Long.values and df.Lat.values, the interpolation worked. The other interpolation methods I tried do not have a problem with using df.Long and df.Lat, only the natural and nearest neighbor methods had an issue.

  • In the case of natural neighbor, either it would return an error if the resolution in hres was too small, or it would generate one solid color cell that covered all of my stations. If my resolution was too large, it would not have any output at all.

Code example (works)

gx, gy, img = interpolate_to_grid(df.Long.values, df.Lat.values, df.MaxTemp.values, 
                                  interp_type = "natural_neighbor", hres = 0.1)

Code example (does not work)

gx, gy, img = interpolate_to_grid(df.Long, df.Lat, df.MaxTemp, 
                                  interp_type = "natural_neighbor", hres = 0.1)

Error Message

KeyError                                  Traceback (most recent call last)
<ipython-input-32-2ded1cd705ce> in <module>
     19 
     20 gx, gy, img = interpolate_to_grid(df.Long, df.Lat, df.MaxTemp, 
---> 21                                   interp_type = "natural_neighbor", hres = 0.1)
     22 img = np.ma.masked_where(np.isnan(img), img)
     23 test = plt.pcolormesh(gx, gy, img, cmap=cmap, norm=norm)

~\Anaconda3\envs\GEOG4016\lib\site-packages\metpy\interpolate\grid.py in interpolate_to_grid(x, y, z, interp_type, hres, minimum_neighbors, gamma, kappa_star, search_radius, rbf_func, rbf_smooth, boundary_coords)
    329                                 minimum_neighbors=minimum_neighbors, gamma=gamma,
    330                                 kappa_star=kappa_star, search_radius=search_radius,
--> 331                                 rbf_func=rbf_func, rbf_smooth=rbf_smooth)
    332 
    333     return grid_x, grid_y, img.reshape(grid_x.shape)

~\Anaconda3\envs\GEOG4016\lib\site-packages\metpy\interpolate\points.py in interpolate_to_points(points, values, xi, interp_type, minimum_neighbors, gamma, kappa_star, search_radius, rbf_func, rbf_smooth)
    343     # If this is natural neighbor, hand it along to `natural_neighbor`
    344     elif interp_type == 'natural_neighbor':
--> 345         return natural_neighbor_to_points(points, values, xi)
    346 
    347     # If this is Barnes/Cressman, determine search_radios and hand it along to

~\Anaconda3\envs\GEOG4016\lib\site-packages\metpy\interpolate\points.py in natural_neighbor_to_points(points, values, xi)
    201             points_transposed = np.array(points).transpose()
    202             img[ind] = natural_neighbor_point(points_transposed[0], points_transposed[1],
--> 203                                               values, xi[grid], tri, neighbors, triangle_info)
    204 
    205     return img

~\Anaconda3\envs\GEOG4016\lib\site-packages\metpy\interpolate\points.py in natural_neighbor_point(xp, yp, variable, grid_loc, tri, neighbors, triangle_info)
    145             total_area += cur_area
    146 
--> 147             area_list.append(cur_area * value[0])
    148 
    149         except (ZeroDivisionError, qhull.QhullError) as e:

~\Anaconda3\envs\GEOG4016\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    765         key = com._apply_if_callable(key, self)
    766         try:
--> 767             result = self.index.get_value(self, key)
    768 
    769             if not is_scalar(result):

~\Anaconda3\envs\GEOG4016\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   3116         try:
   3117             return self._engine.get_value(s, k,
-> 3118                                           tz=getattr(series.dtype, 'tz', None))
   3119         except KeyError as e1:
   3120             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 0
  • For nearest neighbor:

Code example (works)

gx, gy, img = interpolate_to_grid(df.Long.values, df.Lat.values, df.MaxTemp.values, 
                                  interp_type = "nearest", hres = 0.1)

Code example (does not work)

gx, gy, img = interpolate_to_grid(df.Long, df.Lat, df.MaxTemp, 
                                  interp_type = "nearest", hres = 0.1)

Error Message

AttributeError                            Traceback (most recent call last)
<ipython-input-33-e06e6b03318d> in <module>
     18 norm = BoundaryNorm(levels, ncolors=cmap.N, clip=True)
     19 
---> 20 gx, gy, img = interpolate_to_grid(df.Long, df.Lat, df.MaxTemp, interp_type = "nearest", hres = 0.1)
     21 img = np.ma.masked_where(np.isnan(img), img)
     22 test = plt.pcolormesh(gx, gy, img, cmap=cmap, norm=norm)

~\Anaconda3\envs\GEOG4016\lib\site-packages\metpy\interpolate\grid.py in interpolate_to_grid(x, y, z, interp_type, hres, minimum_neighbors, gamma, kappa_star, search_radius, rbf_func, rbf_smooth, boundary_coords)
    331                                 rbf_func=rbf_func, rbf_smooth=rbf_smooth)
    332 
--> 333     return grid_x, grid_y, img.reshape(grid_x.shape)
    334 
    335 

~\Anaconda3\envs\GEOG4016\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   4374             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   4375                 return self[name]
-> 4376             return object.__getattribute__(self, name)
   4377 
   4378     def __setattr__(self, name, value):

AttributeError: 'Series' object has no attribute 'reshape'

Technical Details:
Windows 7 Enterprise
Metpy version 0.10.0
Anaconda version 4.5.12
Python 3.7.1

@ahuang11
Copy link
Contributor

ahuang11 commented May 1, 2019

The first error is probably related to some pandas attempting to align the index
image

Actually, it's just trying to locate the 0 index of your pd.Series which doesn't exist
area_list.append(cur_area * value[0])

image

The second error is what the error states: it doesn't have a reshape method.

I think the fix for the function (and other functions in Metpy) would simply be checking whether the input is a pandas dataframe/series, and if so, casting it to a numpy array as you've done.

if isinstance(x, (pd.DataFrame, pd.Series)):
    x = x.values

Or on the flip side, implictly assume the values are pandas Series, since I don't think metpy imports pandas:

if not isinstance(x, np.ndarray):
    x = x.values

@ahaberlie
Copy link
Contributor

I had @dtho143 post this so we could see if @dopplershift et al. want to handle this conversion/check or if we are still only going to accept "array_like". The docs clearly mention accepting only "array_like", but the odd behavior of this working for some interpolations and not others could cause some confusion. I do like either of @ahuang11 's solutions, but does the former open a can of worms (checking for xr.Dataset, xr.DataArray, etc.)?

@jthielen
Copy link
Collaborator

jthielen commented May 2, 2019

I'd be interested in seeing a way to handle this as well (and would be willing to work on it), but I'm not sure if always forcing it directly to NumPy arrays is the best approach, given the plans to move to an xarray-centric model in the near future. However, adding the special case for DataFrames/Series may be the best solution in the interim until the xarray-based refactoring could be done?

@dopplershift
Copy link
Member

@dtho143 Thanks for the excellent report, and sorry for not getting back to you sooner. It's been 🔥 🔥 🚒 around here.

Yuck. So:

  1. I consider a Dataframe to be "array_like", so I would have hoped it would work.
  2. Since, right now at least, I envision DataFrame to be our data model for point data, it really should work

Maybe instead of checking for the type, we could get away with checking for values? Like:

if hasattr(x, 'values'):
    x = x.values

It still makes my skin crawl a bit, but the ugliness would be hidden until we could do better. I do wonder in the case of reshape, if we could tweak things to no longer need the operation. Regardless, it also indicates the need for some tests using pandas in the test suite.

I should also add that since xarray is now a required dependency, we can add a dependency on pandas essentially for free.

@dopplershift dopplershift added this to the 0.11 milestone Aug 30, 2019
@dopplershift dopplershift added Area: Calc Pertains to calculations Type: Bug Something is not working like it should labels Aug 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Calc Pertains to calculations Type: Bug Something is not working like it should
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants