How does cross_section recognize projections? #1354

Closed
thedmv opened this issue Apr 17, 2020 · 12 comments · Fixed by #1430
Labels
Area: Cross-sections Pertains to making cross-sections through data Area: Xarray Pertains to xarray integration Type: Question Issues does not require work, but answers a user question
Milestone

Comments

@thedmv

thedmv commented Apr 17, 2020

Problem Description

I am attempting to use the cross_section function on NARR data that I have downloaded from PSL @ NOAA. I keep getting errors about how to handle the coordinates properly.

My Code

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from datetime import datetime
from siphon.catalog import TDSCatalog
from netCDF4 import Dataset, num2date, date2index, date2num
from metpy.interpolate import cross_section

base_url = 'https://psl.noaa.gov/thredds/catalog/Datasets/NARR/pressure/'
dtselect = datetime(2018, 7, 2, 18)
ttdscat = TDSCatalog(f'{base_url}/catalog.xml').datasets[f'air.{dtselect:%Y}{dtselect:%m}.nc'].access_urls['OPENDAP']
ttds = Dataset(ttdscat)

# Now we select our time of interest from the data for the temperature variable.
dtunits = ttds.variables['time'].units # time epoch
t0time = date2num(dates = dtselect, units = dtunits)
t0index = int(np.where(ttds.variables['time'][:] == t0time)[0]) # start time index
times = num2date(ttds.variables['time'][t0index], units = dtunits) # Datetimes, from file 
tt = ttds.variables['air'][t0index, :, :, :]

# Extract the level of interest and the coordinates
lat = ttds.variables['lat'][:]
lon = ttds.variables['lon'][:]
level = ttds.variables['level'][:]
dataproj = ccrs.PlateCarree()

ttx = xr.Dataset({'temperature': (['level', 'y', 'x'], tt),
                  'lat': (['y', 'x'], lat),
                  'lon': (['y', 'x'], lon)},
                 coords = {'x': ttds.variables['x'][:],
                           'y': ttds.variables['y'][:],
                           'time': dtselect,
                           'level': level,
                           'crs': dataproj})
ttx = ttx.metpy.parse_cf()

# Create a line going through NYC
start = (41.26, -73.95)
end = (40.04, -73.70)

cross = cross_section(ttx, start, end).set_coords(('lon', 'lat'))

My error

[screenshot: error traceback]

Looking at cross_section.py for help

I was hoping to learn from the example in order to fix my problem, but I can't get past this error when running the example script cross_section.py.
[screenshot: error from the example script]

I'm not sure where to begin addressing the problem of defining the CRS for my data. I was hoping to find guidance in your cross_section.py code, but now I'm stuck with nothing working for me.

-David
Python version: Python 3.7.6
MetPy version: 0.12.0
I am using Python on Windows using Miniconda 3.

@thedmv thedmv added the Type: Bug Something is not working like it should label Apr 17, 2020
@jthielen
Collaborator

jthielen commented Apr 20, 2020

To answer your direct question:

How does cross_section recognize projections?

As of MetPy 0.12, it does so by looking for a CFProjection object in the crs coordinate of a DataArray/Dataset, which is created when .metpy.parse_cf() parses the CF projection attributes designated by the grid_mapping attribute. In MetPy 1.0, an .assign_crs method has also been added to attach the proper CRS directly from CF projection attributes, without requiring them to live in a grid mapping variable.

In your example, this was one of the issues you encountered, since MetPy won't recognize a Cartopy CRS instead of a CFProjection in the 'crs' coordinate. Another issue (as seen in your second screenshot) is that xarray 0.15.1 broke MetPy 0.12.0's accessor functionality. This was fixed in the recent MetPy 0.12.1 release, which I would recommend upgrading to.
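To make that concrete, here is a minimal sketch of the kind of CF grid-mapping attribute set involved. The values are copied from this NARR dataset's Lambert_Conformal variable (printed later in this thread); the commented-out assign_crs call is the MetPy 1.0 method mentioned above, left unexecuted here since it needs a real Dataset:

```python
# CF grid-mapping attributes for NARR's Lambert conformal grid, as a
# plain dict. MetPy builds a CFProjection from exactly this kind of
# mapping when .metpy.parse_cf() follows the grid_mapping attribute.
narr_cf_attrs = {
    'grid_mapping_name': 'lambert_conformal_conic',
    'latitude_of_projection_origin': 50.0,
    'longitude_of_central_meridian': -107.0,
    'standard_parallel': [50.0, 50.0],
    'false_easting': 5632642.22547,
    'false_northing': 4612545.65137,
}

# With MetPy >= 1.0, these attributes can be attached directly
# (not executed here, since it needs a real xarray Dataset):
# ttx = ttx.metpy.assign_crs(narr_cf_attrs)

print(narr_cf_attrs['grid_mapping_name'])
```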


To resolve your issue now, I would recommend taking advantage of the convenience functionality in xarray and MetPy, and modifying your code to something like

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from datetime import datetime
from metpy.interpolate import cross_section

dtselect = datetime(2018, 7, 2, 18)
access_url = dtselect.strftime('https://psl.noaa.gov/thredds/dodsC/Datasets/NARR/pressure/air.%Y%m.nc')
ttds = xr.open_dataset(access_url).metpy.parse_cf()
ttx = ttds.sel(time=dtselect)

# Create a line going through NYC
start = (41.26, -73.95)
end = (40.04, -73.70)
cross = cross_section(ttx, start, end, steps=20).set_coords(('lon', 'lat'))

print(cross)
<xarray.Dataset>
Dimensions:            (index: 20, level: 29)
Coordinates:
    time               datetime64[ns] 2018-07-02T18:00:00
    crs                object Projection: lambert_conformal_conic
  * level              (level) float32 1000.0 975.0 950.0 ... 150.0 125.0 100.0
    lat                (index) float64 41.26 41.2 41.13 ... 40.17 40.1 40.04
    lon                (index) float64 -73.95 -73.94 -73.93 ... -73.72 -73.7
    x                  (index) float64 8.336e+06 8.34e+06 ... 8.41e+06 8.414e+06
    y                  (index) float64 4.244e+06 4.238e+06 ... 4.129e+06
  * index              (index) int64 0 1 2 3 4 5 6 7 ... 12 13 14 15 16 17 18 19
Data variables:
    Lambert_Conformal  int32 ...
    air                (level, index) float64 304.2 304.3 304.4 ... 204.6 204.6
Attributes:
    Conventions:                     CF-1.2
    centerlat:                       50.0
    centerlon:                       -107.0
    comments:                        
    institution:                     National Centers for Environmental Predi...
    latcorners:                      [ 1.000001  0.897945 46.3544   46.63433 ]
    loncorners:                      [-145.5       -68.32005    -2.569891  14...
    platform:                        Model
    standardpar1:                    50.0
    standardpar2:                    50.000001
    title:                           8x Daily NARR
    history:                         created Sat Feb 27 07:07:49 MST 2016 by ...
    dataset_title:                   NCEP North American Regional Reanalysis ...
    references:                      https://www.esrl.noaa.gov/psd/data/gridd...
    source:                          http://www.emc.ncep.noaa.gov/mmb/rreanl/...
    References:                      
    DODS_EXTRA.Unlimited_Dimension:  time

(given that NARR is not a high-resolution dataset, it's probably a good idea to reduce the number of cross section steps from the default of 100 for the closely spaced start and end points.)

@jthielen jthielen added Area: Cross-sections Pertains to making cross-sections through data Area: Xarray Pertains to xarray integration Status: Not A Bug Describes behavior that is working as expected Type: Question Issues does not require work, but answers a user question and removed Type: Bug Something is not working like it should labels Apr 20, 2020
@thedmv
Author

thedmv commented Apr 20, 2020

Thank you @jthielen for your response. I noticed that a new release of MetPy came out, and it did finally fix the issues I was having with MetPy's example cross_section.py script.

However, when I tried your fix on my specific dataset, I ran into an issue at the xr.open_dataset(access_url).metpy.parse_cf() step. I copied and pasted your recommended script and I get this error:

(C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2) C:\Users\melev\Downloads>python my_narr_cross_section_v2.py
C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\xarray\conventions.py:494: SerializationWarning: variable 'air' has multiple fill values {9.96921e+36, -9.96921e+36}, decoding all values to NaN.
  use_cftime=use_cftime,
oc_open: server error retrieving url: code=403 message="Request too big=2781.088864 Mbytes, max=500.0"
[previous line repeated 7 times in the terminal output]
Traceback (most recent call last):
  File "my_narr_cross_section_v2.py", line 13, in <module>
    ttds = xr.open_dataset(access_url).metpy.parse_cf()
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\metpy\xarray.py", line 620, in parse_cf
    for single_varname in varname])
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\metpy\xarray.py", line 620, in <listcomp>
    for single_varname in varname])
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\metpy\xarray.py", line 661, in parse_cf
    return var.metpy.quantify()
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\metpy\xarray.py", line 164, in quantify
    not isinstance(self._data_array.data, units.Quantity)
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\xarray\core\dataarray.py", line 548, in data
    return self.variable.data
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\xarray\core\variable.py", line 343, in data
    return self.values
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\xarray\core\variable.py", line 451, in values
    return _as_array_or_item(self._data)
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\xarray\core\variable.py", line 254, in _as_array_or_item
    data = np.asarray(data)
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\xarray\core\indexing.py", line 677, in __array__
    self._ensure_cached()
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\xarray\core\indexing.py", line 674, in _ensure_cached
    self.array = NumpyIndexingAdapter(np.asarray(self.array))
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\xarray\core\indexing.py", line 653, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\xarray\core\indexing.py", line 557, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\xarray\coding\variables.py", line 72, in __array__
    return self.func(self.array)
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\xarray\coding\variables.py", line 138, in _apply_mask
    data = np.asarray(data, dtype=dtype)
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\numpy\core\_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\xarray\core\indexing.py", line 557, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\xarray\backends\netCDF4_.py", line 73, in __getitem__
    key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\xarray\core\indexing.py", line 837, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\xarray\backends\netCDF4_.py", line 85, in _getitem
    array = getitem(original_array, key)
  File "C:\Users\melev\Desktop\FORCE\NARRviz\narrviz2\lib\site-packages\xarray\backends\common.py", line 54, in robust_getitem
    return array[key]
  File "netCDF4\_netCDF4.pyx", line 4408, in netCDF4._netCDF4.Variable.__getitem__
  File "netCDF4\_netCDF4.pyx", line 5352, in netCDF4._netCDF4.Variable._get
  File "netCDF4\_netCDF4.pyx", line 1887, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Access failure

I could be wrong, but this looks like xr.open_dataset is trying to download the data locally, which I tried to avoid when I initially wrote the script (the one in my original post). But my original method did not preserve the NARR metadata, which supplies cross_section with the needed CRS information (via metpy.parse_cf()). I don't know how you were able to get further with the same code, and I suspect that your xarray is not the same version as mine.

I am using conda environments and am attaching the environment .yml file I am using. Note that MetPy is installed via pip, since the latest version was not on conda-forge at the time I wrote it.
environment_gitmetpy.txt

For completeness, here is a list of the package versions being used for this script (I ran import <package-name>; <package-name>.__version__ for each):
Python: 3.7.6
Cartopy: 0.17.0
Matplotlib: 3.2.1
Numpy: 1.18.1
Xarray: 0.15.0
MetPy: 1.0.0rc1.post49+gfd319f50 (not sure why it looks this way, but it's supposed to be the latest GitHub version)

Also, thank you for your recommendation on the step size for cross_section. Once I get it working, I will definitely use fewer than 100 points, since that makes sense at this resolution.

@thedmv
Author

thedmv commented Apr 20, 2020

I just tried running your recommended script with an updated xarray (0.15.1), and the same problem remains. Is there a way to use something like my original method, where I have an array of values, and then read only the metadata with xr.open_dataset and attach it to my array slice so that metpy.parse_cf() reads the information the way it likes? I always thought xr.open_dataset worked on the data remotely, but this error seems to suggest otherwise in this case.

@jthielen
Collaborator

jthielen commented Apr 21, 2020

Hmm...that's quite interesting! Indeed, (on Ubuntu and RHEL) I've had no such issue no matter which xarray version I test with: remote OPeNDAP URLs lazy-load properly and only download the variable data when used, not the whole dataset. That being said, I suspect that the error

oc_open: server error retrieving url: code=403 message="Request too big=2781.088864 Mbytes, max=500.0"

is the culprit...leading me to believe this is more of a Windows/NetCDF4 issue than something on MetPy's or even xarray's end. @dopplershift Do you have any thoughts here? See @dopplershift's reply below. My assumption was mistakenly based on using the MetPy 0.12.x branch rather than master.

Though, it should still be possible to work around it. Assuming that, as in your original example, you can get ttds just fine, then something like this should work (and be simplest):

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from datetime import datetime
from siphon.catalog import TDSCatalog
from netCDF4 import Dataset
from metpy.interpolate import cross_section

base_url = 'https://psl.noaa.gov/thredds/catalog/Datasets/NARR/pressure/'
dtselect = datetime(2018, 7, 2, 18)
ttdscat = TDSCatalog(f'{base_url}/catalog.xml').datasets[f'air.{dtselect:%Y}{dtselect:%m}.nc'].access_urls['OPENDAP']
ttds = Dataset(ttdscat)

ttds = xr.open_dataset(xr.backends.NetCDF4DataStore(ttds)).metpy.parse_cf()
ttx = ttds.sel(time=dtselect)

# Create a line going through NYC
start = (41.26, -73.95)
end = (40.04, -73.70)
cross = cross_section(ttx, start, end, steps=20).set_coords(('lon', 'lat'))

If that doesn't work, and it really is a memory issue, so that you have to perform the index-based subsetting with the netCDF4 dataset, then the workaround is a bit more involved. It would look something like this:

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from datetime import datetime
from siphon.catalog import TDSCatalog
from netCDF4 import Dataset, num2date, date2index, date2num
from metpy.interpolate import cross_section
from metpy.plots.mapping import CFProjection

base_url = 'https://psl.noaa.gov/thredds/catalog/Datasets/NARR/pressure/'
dtselect = datetime(2018, 7, 2, 18)
ttdscat = TDSCatalog(f'{base_url}/catalog.xml').datasets[f'air.{dtselect:%Y}{dtselect:%m}.nc'].access_urls['OPENDAP']
ttds = Dataset(ttdscat)

# Now we select our time of interest from the data for the temperature variable.
dtunits = ttds.variables['time'].units # time epoch
t0time = date2num(dates = dtselect, units = dtunits)
t0index = int(np.where(ttds.variables['time'][:] == t0time)[0]) # start time index
times = num2date(ttds.variables['time'][t0index], units = dtunits) # Datetimes, from file 
tt = ttds.variables['air'][t0index, :, :, :]

# Extract the level of interest and the coordinates
lat = ttds.variables['lat'][:]
lon = ttds.variables['lon'][:]
level = ttds.variables['level'][:]
dataproj = CFProjection({
    key: ttds.variables['Lambert_Conformal'].getncattr(key) for key in ttds.variables['Lambert_Conformal'].ncattrs()
})
ttx = xr.Dataset({'temperature': (['level', 'y', 'x'], tt),
                  'lat': (['y', 'x'], lat),
                  'lon': (['y', 'x'], lon)},
                 coords = {'x': ttds.variables['x'][:],
                           'y': ttds.variables['y'][:],
                           'time': dtselect,
                           'level': level,
                           'crs': dataproj})
ttx = ttx.metpy.parse_cf()

# Create a line going through NYC
start = (41.26, -73.95)
end = (40.04, -73.70)

cross = cross_section(ttx, start, end).set_coords(('lon', 'lat'))

@jthielen jthielen added Status: Upstream Needs work done in upstream project and removed Status: Not A Bug Describes behavior that is working as expected labels Apr 21, 2020
@dopplershift
Member

So the problem is that when we do .metpy.parse_cf(), it tries to add units to the data variable, which requires getting the data as a numpy array. That forces it to try to download the entire 4D array, and the server complains that this is far too much data to request over OPeNDAP (the whole point of OPeNDAP is to subset; at that point you might as well just download the whole file).

Luckily, we can work around this by changing the order of some operations:

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from datetime import datetime
from metpy.interpolate import cross_section

dtselect = datetime(2018, 7, 2, 18)
access_url = dtselect.strftime('https://psl.noaa.gov/thredds/dodsC/Datasets/NARR/pressure/air.%Y%m.nc')
ttds = xr.open_dataset(access_url)
subset = ttds.sel(time=dtselect)
ttx = subset.metpy.parse_cf()

# Create a line going through NYC
start = (41.26, -73.95)
end = (40.04, -73.70)
cross = cross_section(ttx, start, end, steps=20).set_coords(('lon', 'lat'))

print(cross)

That works for me (and works quite quickly!).
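To make the ordering point concrete, here's a toy sketch (pure Python stand-ins, not xarray's actual classes) of why touching values before subsetting is so costly with a lazily loaded remote array:

```python
# Toy illustration: converting a lazy remote array to a concrete array
# pulls everything selected so far, so subsetting before any
# value-touching step keeps the transfer small.

class LazyRemoteArray:
    """Stands in for an OPeNDAP-backed variable; records how many
    elements a concrete read would fetch."""
    def __init__(self, shape):
        self.shape = shape

    def sel(self, axis):
        # Index-based subsetting stays lazy: it only narrows the shape.
        return LazyRemoteArray(self.shape[:axis] + self.shape[axis + 1:])

    def load(self):
        # Analogue of np.asarray()/.values: "fetches" every element.
        n = 1
        for dim in self.shape:
            n *= dim
        return n

# NARR-like (time, level, y, x) shape for one month of 8x-daily data
full = LazyRemoteArray((248, 29, 277, 349))

fetched_all = full.load()          # "parse_cf first": whole month fetched
fetched_sub = full.sel(0).load()   # "sel(time=...) first": one time step

print(fetched_all, fetched_sub)
```

The second path fetches 248x less data, which is exactly the difference between tripping the server's 500 MB OPeNDAP limit and a quick request.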

@jthielen any idea if there's anything we can do to keep the internal call to .quantify() from forcing coercion to a numpy array? Or is that really a "flaw" in how xarray presents its virtual view of an OPeNDAP array?

@jthielen jthielen removed the Status: Upstream Needs work done in upstream project label Apr 21, 2020
@jthielen
Collaborator

jthielen commented Apr 21, 2020

@dopplershift Good catch, and thanks for the correction! I tripped myself up with the different versions at play. I realize now that my initial and second suggestions only worked on MetPy 0.12.x, which lacks quantify(); they fail for me with the same memory error on current master. Sorry for the mistaken directions @thedmv!

But this subtle point raises a big issue. metpy.quantify() currently calls .metpy.unit_array, which in turn calls .values; that was a mistake on my part...it should be .data instead, to allow duck array types through. However, that doesn't help here with the virtual view: it indeed looks like there's a fundamental incompatibility between how xarray lazily loads these arrays and inserting an intermediate wrapper (like Quantity) in between.

Before the problematic call to either .values or .data (they actually end up in the same path in this case), the underlying array is the following:

ttds['air'].variable._data
MemoryCachedArray(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=_ElementwiseFunctionArray(LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x7f84e7c85140>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None), slice(None, None, None)))), func=functools.partial(<function _apply_mask at 0x7f84ea3b2550>, encoded_fill_values={9.96921e+36, -9.96921e+36}, decoded_fill_value=nan, dtype=dtype('float32')), dtype=dtype('float32')), key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None), slice(None, None, None))))))

which is definitely not something that Pint can wrap at the moment, and even if it were, xarray wouldn't be able to handle it being inside an external wrapper right now either.

I'm now at a bit of a loss, though, about what the best way forward on this would be, since this is a somewhat major downside to forcing wrapping in Quantity that I neglected to consider in #1304 / #1325. To at least put some ideas out there, should we:

@jthielen
Collaborator

Also, side note, @dopplershift I realize I've been a bit too overeager with the labels, leading to some inaccuracies that have needed to be corrected later. I'll dial it back a bit.

@jthielen
Collaborator

After thinking about this a bit more, I think the best way forward is to:

  • remove applying quantify() in parse_cf() (leaving parse_cf() strictly focused on coordinate parsing)
  • add a quantify() method to the Dataset accessor to separately quantify all data variables (as in pint-xarray: https://github.com/TomNicholas/pint-xarray/issues/6), adding a note that this will load data into memory
  • continue accepting DataArrays with either units-on-attribute or Quantity-in-DataArray as function inputs (was initially a workaround for dimension coordinates, which can't have Quantity-in-DataArray, but will now also be for non-quantified data variables)
  • continue outputting Quantity-in-DataArray in Improved xarray compatibility on function input and output #1353
  • make the documentation very clear about the gotchas involved when dealing with xarray and units (e.g., don't trust unit operations outside MetPy's functions with units-on-attribute variables)
  • continue to work upstream for improved xarray > Pint Quantity > Dask support as the way forward to having Quantity-wrapped, lazy-loaded data inside xarray objects. I'd like to try for this as a MetPy v1.1 goal.
  • maybe someday having units on the dtype will make this all easier
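As a bare-bones illustration of the second bullet, a Dataset-level quantify() would amount to something like this. Everything here (the Quantity stand-in and the dict-of-dicts "dataset") is a toy sketch of the idea, not MetPy's or xarray's real types:

```python
# Toy sketch of a Dataset-level quantify(): walk the data variables,
# pop each one's 'units' attribute, and wrap its values in a Quantity.
# With real xarray/Pint, this wrapping step is what loads lazy
# (e.g. OPeNDAP-backed) data into memory.
from dataclasses import dataclass


@dataclass
class Quantity:
    """Stand-in for pint.Quantity."""
    magnitude: list
    units: str


def quantify(dataset):
    """Return a new {name: {'data': ..., 'attrs': {...}}} mapping with
    each variable's data wrapped in a Quantity."""
    out = {}
    for name, var in dataset.items():
        attrs = dict(var['attrs'])
        units = attrs.pop('units', 'dimensionless')
        out[name] = {'data': Quantity(var['data'], units), 'attrs': attrs}
    return out


ds = {'air': {'data': [304.2, 304.3], 'attrs': {'units': 'kelvin'}}}
q = quantify(ds)
print(q['air']['data'].units)
```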

Does this seem reasonable?

@dopplershift
Member

That seems reasonable to me. The only question I'd double-check is: does any of this interfere with a dtype-units (NEP 41) world? I know we're low on details as of yet, but I just want to make sure we're not going down any path that obviously interferes.

@thedmv
Author

thedmv commented Apr 21, 2020

For documentation purposes, here is the play-by-play of how each code suggestion performed, with my takeaways at the end. Thank you @jthielen and @dopplershift for your feedback!!

@jthielen's Suggestions

Suggestion 1: "Simple"

Though, it still should be possible to work around it. I would think that, assuming in your original example, you can get ttds just fine, then something like this should work (and be simplest):

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from datetime import datetime
from siphon.catalog import TDSCatalog
from netCDF4 import Dataset
from metpy.interpolate import cross_section

base_url = 'https://psl.noaa.gov/thredds/catalog/Datasets/NARR/pressure/'
dtselect = datetime(2018, 7, 2, 18)
ttdscat = TDSCatalog(f'{base_url}/catalog.xml').datasets[f'air.{dtselect:%Y}{dtselect:%m}.nc'].access_urls['OPENDAP']
ttds = Dataset(ttdscat)

ttds = xr.open_dataset(xr.backends.NetCDF4DataStore(ttds)).metpy.parse_cf()
ttx = ttds.sel(time=dtselect)

# Create a line going through NYC
start = (41.26, -73.95)
end = (40.04, -73.70)
cross = cross_section(ttx, start, end, steps=20).set_coords(('lon', 'lat'))

Result: Same error occurred.

Suggestion 2: More Involved

If that doesn't work, and it really is a memory issue so that you have to perform that index-based sub-setting with the NetCDF4 dataset...then it's a bit more involved workaround. It would then look something like this:

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from datetime import datetime
from siphon.catalog import TDSCatalog
from netCDF4 import Dataset, num2date, date2index, date2num
from metpy.interpolate import cross_section
from metpy.plots.mapping import CFProjection

base_url = 'https://psl.noaa.gov/thredds/catalog/Datasets/NARR/pressure/'
dtselect = datetime(2018, 7, 2, 18)
ttdscat = TDSCatalog(f'{base_url}/catalog.xml').datasets[f'air.{dtselect:%Y}{dtselect:%m}.nc'].access_urls['OPENDAP']
ttds = Dataset(ttdscat)

# Now we select our time of interest from the data for the temperature variable.
dtunits = ttds.variables['time'].units # time epoch
t0time = date2num(dates = dtselect, units = dtunits)
t0index = int(np.where(ttds.variables['time'][:] == t0time)[0]) # start time index
times = num2date(ttds.variables['time'][t0index], units = dtunits) # Datetimes, from file 
tt = ttds.variables['air'][t0index, :, :, :]

# Extract the level of interest and the coordinates
lat = ttds.variables['lat'][:]
lon = ttds.variables['lon'][:]
level = ttds.variables['level'][:]
dataproj = CFProjection({
    key: ttds.variables['Lambert_Conformal'].getncattr(key) for key in ttds.variables['Lambert_Conformal'].ncattrs()
})
ttx = xr.Dataset({'temperature': (['level', 'y', 'x'], tt),
                  'lat': (['y', 'x'], lat),
                  'lon': (['y', 'x'], lon)},
                 coords = {'x': ttds.variables['x'][:],
                           'y': ttds.variables['y'][:],
                           'time': dtselect,
                           'level': level,
                           'crs': dataproj})
ttx = ttx.metpy.parse_cf()

# Create a line going through NYC
start = (41.26, -73.95)
end = (40.04, -73.70)

cross = cross_section(ttx, start, end).set_coords(('lon', 'lat'))

Result: This code worked!! I added a line to print cross, and it output:
[screenshot: printed cross-section Dataset]

@dopplershift's Suggestion

Suggestion 3: Change Order of Operations

That clarification on .metpy.parse_cf() was helpful!

Luckily, we can work around this by changing the order of some operations:

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from datetime import datetime
from metpy.interpolate import cross_section

dtselect = datetime(2018, 7, 2, 18)
access_url = dtselect.strftime('https://psl.noaa.gov/thredds/dodsC/Datasets/NARR/pressure/air.%Y%m.nc')
ttds = xr.open_dataset(access_url)
subset = ttds.sel(time=dtselect)
ttx = subset.metpy.parse_cf()

# Create a line going through NYC
start = (41.26, -73.95)
end = (40.04, -73.70)
cross = cross_section(ttx, start, end, steps=20).set_coords(('lon', 'lat'))

print(cross)

Result: This also worked for me. The output:
[screenshot: printed cross-section Dataset]

My Takeaways

Thank you @jthielen and @dopplershift! I have a way forward now, and I look forward to producing the corresponding plots for these cross-sections (similar to the one in your MetPy cross_section.py example).

My first takeaway is to be careful about the order of my operations, so that metpy.parse_cf() doesn't download my data before my slice is ready. I understand that there may be further issues involved, but this will be helpful to me in the meantime.

My other takeaway is that projection information can be supplied in an 'ad hoc' manner using the CFProjection class. I will probably use this more 'complex' method again. Although I am now using NARR data, I can imagine wanting to overlay this NARR cross section with a contour of data from a model, say WRF (e.g., a very rough comparison of NARR vs. My-WRF-Run by looking at contours of temperature along the same cross-section). I suspect that my WRF netCDF data is not written in a CF-compliant manner (though it does have the projection information in its metadata!), so I would probably proceed as @jthielen did in the more complex case: extract the values for the WRF variable as an array, extract the relevant CRS information from the WRF metadata, and fill in the crs keys and values, using this code as guidance:

>>> for key in ttds.variables['Lambert_Conformal'].ncattrs():
...      print(key, ' ', ttds.variables['Lambert_Conformal'].getncattr(key))
...
false_easting   5632642.22547
false_northing   4612545.65137
grid_mapping_name   lambert_conformal_conic
latitude_of_projection_origin   50.0
longitude_of_central_meridian   -107.0
standard_parallel   [50. 50.]
>>>

If my WRF run uses a "Lambert Conformal" projection, then I would use these identical keys and just swap in my values. But if I were using a different projection, I would have to find the correct parameters for that projection.

After some digging, I found that those keys appear verbatim in the CF Metadata conventions, in Table 4 under Lambert Conformal (LCC-1SP). The code snippets y'all provided, together with that link to the CF Metadata conventions, seem to be the complete answer to my question. My apologies if this link (or very similar material) was already referenced somewhere in the code and I simply missed it.
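As a sketch of what that swap might look like for WRF, something like the following. The WRF attribute names (MOAD_CEN_LAT, STAND_LON, TRUELAT1, TRUELAT2) are the usual wrfout globals, but everything here is a hypothetical illustration with made-up values; check your own file's metadata before relying on it:

```python
# Hedged sketch: translate common WRF Lambert-conformal global
# attributes into the CF grid-mapping keys that metpy's CFProjection
# expects. Attribute names are the usual wrfout globals; verify them
# against your own file.

def wrf_to_cf_lambert(wrf_attrs):
    """Map WRF Lambert-conformal globals onto CF grid-mapping keys."""
    return {
        'grid_mapping_name': 'lambert_conformal_conic',
        'latitude_of_projection_origin': wrf_attrs['MOAD_CEN_LAT'],
        'longitude_of_central_meridian': wrf_attrs['STAND_LON'],
        'standard_parallel': [wrf_attrs['TRUELAT1'], wrf_attrs['TRUELAT2']],
    }

# Example with made-up CONUS-like values:
cf_attrs = wrf_to_cf_lambert(
    {'MOAD_CEN_LAT': 39.0, 'STAND_LON': -97.0,
     'TRUELAT1': 30.0, 'TRUELAT2': 60.0}
)
print(cf_attrs['grid_mapping_name'], cf_attrs['standard_parallel'])
```

The resulting dict could then be handed to CFProjection, as in the 'complex' workaround above.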

@jthielen
Collaborator

That seems reasonable to me. The only question I'd double-check is: does any of this interfere with a dtype-units (NEP 41) world? I know we're low on details as of yet, but I just want to make sure we're not going down any path that obviously interferes.

I would guess that it does not interfere implementation-wise (or at least does not change anything in that respect). I would think that altering the xarray accessor (which is what this comes down to) to operate on a dtype-backed unit array instead of a Pint Quantity as its preferred unit array type would be almost trivial compared to the modifications needed throughout the rest of the library. Where it would be good to keep this in mind, however, is in writing documentation: making sure not to give the impression that this way of working with units, and the workarounds still needed, is how it will be from now on.

@jthielen
Collaborator

Also, thank you @thedmv for the great synopsis!

I would have liked to say, in regard to your WRF takeaway, that there are already helpers in place for that (that was my plan earlier this year: https://ams.confex.com/ams/2020Annual/meetingapp.cgi/Paper/369355), but they aren't quite there yet. For now, I'd point you to https://gist.github.com/jthielen/8881d32d08d625c75c906a7e8ad7583f and #1089, which may give you a head start.
