OPeNDAP loading error for datasets with many zeros and NaNs #1667
Is it possible to get a C program that duplicates this issue? Otherwise, you may want to open this over at the python project, https://github.com/Unidata/netcdf4-python, and it will be linked back here if there is an issue related to the core C library. As it stands, I can't determine whether this is an issue in the C library or in the upstream python package.
Unfortunately, the issue is due to the size of the request, although that does not seem to make sense when looking at the compressed file size. Although the data compress very well due to the repeated zeros and NaNs, the size check is applied to the uncompressed request. The `#log` output shows what the client is actually doing:

```
Warning:fetch: https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/AS-RCEC/TaiESM1/historical/r1i1p1f1/SImon/siconc/gn/v20200218/siconc_SImon_TaiESM1_historical_r1i1p1f1_gn_185001-201412.nc.dods?siconc.siconc[0][0][0:319]
Warning:fetch complete: 0.057 secs
Warning:fetch: https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/AS-RCEC/TaiESM1/historical/r1i1p1f1/SImon/siconc/gn/v20200218/siconc_SImon_TaiESM1_historical_r1i1p1f1_gn_185001-201412.nc.dods?siconc.siconc[0][1][0:319]
Warning:fetch complete: 0.053 secs
Warning:fetch: https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/AS-RCEC/TaiESM1/historical/r1i1p1f1/SImon/siconc/gn/v20200218/siconc_SImon_TaiESM1_historical_r1i1p1f1_gn_185001-201412.nc.dods?siconc.siconc[0][2][0:319]
Warning:fetch complete: 0.056 secs
Warning:fetch: https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/AS-RCEC/TaiESM1/historical/r1i1p1f1/SImon/siconc/gn/v20200218/siconc_SImon_TaiESM1_historical_r1i1p1f1_gn_185001-201412.nc.dods?siconc.siconc[0][3][0:319]
Warning:fetch complete: 0.056 secs
...
```

So each request to the server is for a single latitude row (320 longitude values) of a single time slice. Other data access services offered by the TDS do not have the same size limits on requests. For example, I can use Siphon and the cdmremote service instead.
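The per-row access pattern in the log above can be reproduced by building the DAP constraint expressions directly. A small sketch (the slicing syntax follows the log lines; the URL and variable name are from this dataset):

```python
# Sketch: build the per-row DAP request URLs seen in the log above.
# Each constraint asks for one latitude row (320 longitudes) of one
# time slice, which is why every individual fetch is tiny and fast.
base = ("https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/"
        "AS-RCEC/TaiESM1/historical/r1i1p1f1/SImon/siconc/gn/v20200218/"
        "siconc_SImon_TaiESM1_historical_r1i1p1f1_gn_185001-201412.nc.dods")
urls = [f"{base}?siconc.siconc[0][{j}][0:319]" for j in range(384)]

print(urls[1].rsplit("?", 1)[1])   # siconc.siconc[0][1][0:319]
print(len(urls))                   # 384
```

Fetching a whole time slice this way costs 384 round trips, which is why clients prefer one large request per slice when the server allows it.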
@lesserwhirls, thank you very much for your detailed response! Good to know that Siphon + cdmremote might be a solution, but it would require a large refactoring of our workflow. We are part of the Pangeo project. I still wonder why I am able to request very large datasets (multilevel, as well as the same time, latitude and longitude) through TDS with no difficulties, while these very compressible datasets are the ones causing trouble... I guess the data providers (modellers) were much more conservative in the time chunking of the files for the larger datasets.
Thanks @lesserwhirls!
Very informative discussion! The key point appears to be this:
It sounds like other software (e.g. ncview, panoply) is smart / lazy enough not to just grab all the data. Fortunately, we can easily do this in the Pangeo stack as well, by using Dask to request the data in more manageable chunks:

```python
import xarray as xr

OPENDAP_url = 'http://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/AS-RCEC/TaiESM1/historical/r1i1p1f1/SImon/siconc/gn/v20200218/siconc_SImon_TaiESM1_historical_r1i1p1f1_gn_185001-201412.nc'

# the decode_times=False bit is required for the cftime time index
ds_opendap = xr.open_dataset(OPENDAP_url, chunks={'time': '100MB'}, decode_times=False)
display(ds_opendap)
```

This works fine and runs immediately. We can then load the data as

```python
data = ds_opendap.siconc.compute()
```

This was quite slow for me (about 5 min to get < 1 GB of data), but it did complete successfully. Maybe a different chunk size would have better performance.
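For a rough sense of what the `'100MB'` chunk spec means here: assuming siconc is stored as a 4-byte float on the 384 x 320 grid (the dtype is an assumption; the grid size is from the dataset description), each Dask chunk covers about 200 time slices, so the whole 1980-step dataset loads in roughly ten moderate server requests:

```python
# Rough arithmetic (assumption: siconc is a 4-byte float; the
# 384 x 320 grid and 1980 time steps are from the dataset above).
bytes_per_slice = 384 * 320 * 4                    # uncompressed bytes per time slice
slices_per_chunk = int(100e6 // bytes_per_slice)   # slices fitting in a '100MB' chunk
n_chunks = -(-1980 // slices_per_chunk)            # ceiling division over 1980 steps

print(bytes_per_slice)    # 491520
print(slices_per_chunk)   # 203
print(n_chunks)           # 10
```

Each ~100 MB request stays comfortably under a server-side cap of the kind discussed later in this thread, which is why the chunked read succeeds where the single full-array read fails.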
We should be looking into how to integrate siphon into our CMIP6 pipeline. It has a lot of powerful capabilities that could help us out a lot. I raised an issue with some questions about this in Unidata/siphon#258. That issue has a few code snippets that could be useful.
Another point that this issue raises is error messages. If the opendap server had surfaced a clear "request too big" message to the user, this would have been much easier to diagnose. Is there a way we could propagate more informative errors through this stack?
Yes. If you suffix your url with the string "#log" you should get some output that helps to figure out what is going on. The problem is that the netcdf library cannot itself recognize that this happened.
There is a bug in 5.0 in which the response body isn't being returned in the server response (but still a 403 status). I will fix that. However, in 4.6 we do return a message. So are you saying this is as good as it gets when reading through netCDF-C?

```python
from netCDF4 import Dataset

ds = Dataset("https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GEFS/Global_1p0deg_Ensemble/members/Best#log")
data = ds.variables['v-component_of_wind_isobaric_ens'][:]
```

```
Note:oc_open: server error retrieving url: code=403 message="Request too big=12216.71808 Mbytes, max=500.0"
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-55-addea7083b31> in <module>
----> 1 data = ds2.variables["v-component_of_wind_isobaric_ens"][:]

netCDF4\_netCDF4.pyx in netCDF4._netCDF4.Variable.__getitem__()
netCDF4\_netCDF4.pyx in netCDF4._netCDF4.Variable._get()
netCDF4\_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: Access failure
```

There is no way to propagate the "Note:oc_open: server error ..." message upward, or capture it and raise a more informative error.
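Until the library propagates the server message itself, one client-side workaround is to catch the generic RuntimeError and attach a hint. A sketch only; `read_slice` is a hypothetical helper, not part of the netCDF4 API:

```python
# Sketch of a client-side wrapper; read_slice is a hypothetical helper,
# not part of the netCDF4 API. It attaches a hint to the generic
# "NetCDF: Access failure" error that DAP size-limit failures produce.
def read_slice(var, key):
    try:
        return var[key]
    except RuntimeError as exc:
        raise RuntimeError(
            f"{exc} -- the server may have rejected an oversized request; "
            "try a smaller slice, or append '#log' to the URL for details"
        ) from exc
```

This does not recover the server's actual message (the client never sees it through this stack); it only makes the failure mode less mysterious.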
That looks like a good error message to me! Not sure if @naomi-henderson saw that in her stack trace...
Ah, that would have been very helpful! Never saw it. Here is what I see with your example:
This is with netCDF4.__version__ = '1.5.1.2'. Thank you very much for following through with this issue!
So, issuing the oversized request by hand shows what the server actually returns:

```
curl -i "https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GEFS/Global_1p0deg_Ensemble/members/Best.dods?v-component_of_wind_isobaric_ens"

HTTP/1.1 403 403
Date: Wed, 11 Mar 2020 16:08:05 GMT
Server: Apache
X-Frame-Options: SAMEORIGIN
Strict-Transport-Security: max-age=63072000; includeSubdomains;
Access-Control-Allow-Origin: *
XDODS-Server: opendap/3.7
Content-Description: dods-error
Content-Security-Policy: frame-ancestors 'self'
Transfer-Encoding: chunked
Content-Type: text/plain

Error {
    code = 403;
    message = "Request too big=12216.721864 Mbytes, max=500.0";
};
```

That's the kind of info the C library should have access to, and that's how the "Note:oc_open: server error ..." message is generated.

@naomi-henderson, do you know what version of netCDF-C you are using?
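The DAP error body above has a simple fixed structure, so a client that can see the response text could pull the code and message out of it. A sketch (the regex is my own; netCDF-C does its own parsing internally):

```python
import re

# Sketch: extract code and message from a DAP2 "Error { ... }" body.
# The regex is my own illustration, not anything netCDF-C actually uses.
def parse_dap_error(body):
    code = re.search(r'code\s*=\s*(\d+)\s*;', body)
    msg = re.search(r'message\s*=\s*"([^"]*)"\s*;', body)
    return (int(code.group(1)) if code else None,
            msg.group(1) if msg else None)

body = '''Error {
    code = 403;
    message = "Request too big=12216.721864 Mbytes, max=500.0";
};'''
print(parse_dap_error(body))
# (403, 'Request too big=12216.721864 Mbytes, max=500.0')
```

If the library surfaced that tuple instead of a bare "Access failure", the user would know immediately that the request exceeded a 500 MB limit.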
You need to review the RFC for URLs. The fragment suffix is interpreted entirely on the client side and is never sent to the server.
The 403 in the message is the standard HTTP access-failure code, which is why the library reports it as "NetCDF: Access failure".
100% true, but I still say it isn't intuitive for a lot of users.
First, in the case of netCDF-C and the TDS, we control both sides of the problem, so the least we can do for our users is to tell them something useful, no matter how we make it happen. Second, do we need pattern matching? Can we pass on the plain text of the response message if the status is in the error range (4xx or 5xx)?
The problem being addressed is how to send information down the stack.
As usual, I am overthinking things.
This is the right solution. I will implement it.
re: Unidata#1667 Make DAP (2 and 4) forcibly report an error message when an error response is received from the DAP servlet.
Hi, I have been struggling to read some opendap URLs using netcdf-python. I have tried with versions '1.5.1.2' and '1.4.0', with the same issue. The same URLs work fine with many other opendap-enabled tools: ncview, Panoply, ncdump, and the NCO tools all work fine, but I am having trouble reading into python using the netCDF4-python package.
I raised this issue at Unidata/netcdf4-python#998 and @jswhit
kindly suggested I raise it here "since this is definitely happening in the C library".
Here is an example.
The OPeNDAP version of the data is:
http://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/AS-RCEC/TaiESM1/historical/r1i1p1f1/SImon/siconc/gn/v20200218/siconc_SImon_TaiESM1_historical_r1i1p1f1_gn_185001-201412.nc
or it can be downloaded directly from:
http://esgf-data1.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/AS-RCEC/TaiESM1/historical/r1i1p1f1/SImon/siconc/gn/v20200218/siconc_SImon_TaiESM1_historical_r1i1p1f1_gn_185001-201412.nc
The downloaded file ( <150M ) gives me no problems.
This particular OPeNDAP file is global seaice concentration, 'siconc' (time: 1980, latitude: 384, longitude: 320). Although it throws an error when the whole dataset is requested,
it loads fine if the request is for a small enough chunk (in this case the first 1017 of 1980 time slices can be successfully loaded, but the first 1018 cannot). Note that the dataset consists of mostly NaNs (land values) and zeros (clear ocean values). The non-zeros are only at high latitude ocean grid points, so it is highly compressible.
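(A back-of-the-envelope check, added here for context: the 1017/1018 boundary is exactly what a 500 MB cap on the uncompressed response would produce, assuming siconc is a 4-byte float. Both the dtype and the limit are assumptions, but the numbers fit:)

```python
# Assumptions: siconc is a 4-byte float on the 384 x 320 grid, and the
# server caps uncompressed responses at 500 MB (decimal megabytes).
bytes_per_slice = 384 * 320 * 4        # uncompressed bytes per time slice
limit = 500e6                          # assumed 500 MB cap

print(1017 * bytes_per_slice / 1e6)    # 499.87584  -> just under the cap
print(1018 * bytes_per_slice / 1e6)    # 500.36736  -> just over the cap
```

This is consistent with the request-size explanation given later in the thread: compression is irrelevant because the limit applies to the uncompressed data.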
Since the file opens in ncview and displays the whole dataset from beginning to end, I am assuming the trouble is not in the OPeNDAP server at "llnl.gov". Is it possible that the netcdf-c package has a check on the expected cache size which cannot handle all of the zeros and NaNs? Any ideas, anyone?

Python code:
```python
import netCDF4

OPENDAP_url = 'http://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/AS-RCEC/TaiESM1/historical/r1i1p1f1/SImon/siconc/gn/v20200218/siconc_SImon_TaiESM1_historical_r1i1p1f1_gn_185001-201412.nc'
ncds = netCDF4.Dataset(OPENDAP_url)
ncds['siconc'][:]      # does not work
ncds['siconc'][:1000]  # works
```
with the following error:

```
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
      5 i = nc_fid.variables['i'][:]
      6 j = nc_fid.variables['j'][:]
----> 7 siconc = nc_fid.variables['siconc'][:,:,:]

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__getitem__()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._get()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: Access failure
```