This repository has been archived by the owner on Sep 1, 2022. It is now read-only.

Exception opening remote GRIB datasets #797

Open
rschmunk opened this issue Apr 8, 2017 · 6 comments


@rschmunk
Contributor

rschmunk commented Apr 8, 2017

I am encountering an exception trying to use 4.6.8 to open some remote GRIB-2 datasets. There is no problem if I download a dataset to disk and open it locally. This doesn't seem to be a server problem, as I have uploaded it to a local server and tried remote-loading from there.

The failing call to acquire the dataset looks like:

NetcdfDataset.acquireDataset ("http://ftp.opc.ncep.noaa.gov/data/ascat_ab/as_swath_1704021232.grb2", true, null);

and the stack trace is

Caused by: java.lang.IllegalStateException: No records found in dataset latest_ascat_swath_data.grb2
at ucar.nc2.grib.collection.Grib2CollectionBuilder.makeGroups(Grib2CollectionBuilder.java:155)
at ucar.nc2.grib.collection.GribCollectionBuilder.createMultipleRuntimeCollections(GribCollectionBuilder.java:150)
at ucar.nc2.grib.collection.GribCollectionBuilder.createIndex(GribCollectionBuilder.java:138)
at ucar.nc2.grib.collection.GribCdmIndex.openGribCollectionFromDataFile(GribCdmIndex.java:780)
at ucar.nc2.grib.collection.GribCdmIndex.openGribCollectionFromDataFile(GribCdmIndex.java:764)
at ucar.nc2.grib.collection.GribCdmIndex.openGribCollectionFromRaf(GribCdmIndex.java:734)
at ucar.nc2.grib.collection.GribIosp.open(GribIosp.java:213)
at ucar.nc2.NetcdfFile.<init>(NetcdfFile.java:1560)
at ucar.nc2.NetcdfFile.open(NetcdfFile.java:835)
at ucar.nc2.NetcdfFile.open(NetcdfFile.java:424)
... 14 more

Further tests against a local server show that the same exception occurs with GRIB-1 datasets. Remote netCDF and HDF datasets load without trouble.
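
Since opening a downloaded copy works, one stopgap is to fetch the file to a temporary location first and open that. The following is a minimal sketch using only the JDK; the class name `RemoteGribFetcher`, the method `downloadToTemp`, and the temp-file prefix and suffix are hypothetical names chosen for illustration, not part of netcdf-java.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not part of netcdf-java): copy a remote file to local disk.
final class RemoteGribFetcher {

    /** Copies the resource at the given URL into a new temporary file and returns its path. */
    static Path downloadToTemp(String url) throws IOException {
        // The ".grb2" suffix is an assumption; use whatever suffix suits the data.
        Path tmp = Files.createTempFile("remote-grib-", ".grb2");
        try (InputStream in = URI.create(url).toURL().openStream()) {
            // createTempFile already created the file, so it must be overwritten.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }
}
```

The returned path can then be handed to the same call as above in place of the URL, for example `NetcdfDataset.acquireDataset(tmp.toString(), true, null)`, and the temporary file deleted once the dataset is closed.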

@cofinoa
Contributor

cofinoa commented Apr 8, 2017

@msdsoftware the actual problem is an earlier exception:

SEVERE: Grib2CollectionBuilder as_swath_1704021232.grb2 : reading/Creating gbx9 index for file http:/ftp.opc.ncep.noaa.gov/data/ascat_ab/as_swath_1704021232.grb2 failed
java.io.FileNotFoundException: http:/ftp.opc.ncep.noaa.gov/data/ascat_ab/as_swath_1704021232.grb2 (No such file or directory)

To read GRIB1/2 files, netcdf-java needs some auxiliary index files, and if they are not found it creates them locally.

This does not work in netcdf-java v4.3.23 [1] either, but it does work in v4.2 [2].

The cache strategy for remote GRIB files has been broken from v4.3 onwards, but I'm not sure whether this was intentional.

[1] ftp://ftp.unidata.ucar.edu/pub/netcdf-java/v4.3/
[2] ftp://ftp.unidata.ucar.edu/pub/netcdf-java/v4.2/

@rschmunk
Contributor Author

rschmunk commented Apr 9, 2017

Ah, right you are, @cofinoa. There was a lot of stack trace dumped, much of it specific to my own code. I thought I had extracted the important part to post here, but on going back and trying another example, I see I also get

000006 SEVERE: Grib2CollectionBuilder makeGroups - Grib2CollectionBuilder as_swath_1704031257.grb2 : reading/Creating gbx9 index for file http:/ftp.opc.ncep.noaa.gov/data/ascat_ab/as_swath_1704031257.grb2 failed
000007 WARNING: Grib2CollectionBuilder makeGroups - No records found in files. Check Grib1/Grib2 for collection as_swath_1704031257.grb2. If wrong, delete gbx9.

@cofinoa
Contributor

cofinoa commented Apr 13, 2017

@msdsoftware I can't find an immediate workaround for this.

The isValidFile method on the GRIB IOSP is able to identify the remote file as GRIB2:

https://github.com/Unidata/thredds/blob/v4.6.8/grib/src/main/java/ucar/nc2/grib/collection/Grib2Iosp.java#L357

https://github.com/Unidata/thredds/blob/v4.6.8/grib/src/main/java/ucar/nc2/grib/collection/GribCdmIndex.java#L160

but the open method in the IOSP breaks this by assuming the file is local, because a File object is used:
https://github.com/Unidata/thredds/blob/v4.6.8/grib/src/main/java/ucar/nc2/grib/collection/GribIosp.java#L200
https://github.com/Unidata/thredds/blob/v4.6.8/grib/src/main/java/ucar/nc2/grib/collection/GribCdmIndex.java#L757

In fact, the actual problem occurs when netcdf-java tries to build the gbx and ncx files: it creates a new local RandomAccessFile for the remote GRIB file instead of using an HTTPRandomAccessFile, which is the object originally passed to the isValidFile and open methods.
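
The `http:/` prefix (single slash) in the logged path is consistent with the URL having passed through `java.io.File`, which normalizes repeated separators. A small sketch to illustrate this; the class `UrlAsFileDemo` and method `asFilePath` are hypothetical names, and this is only a demonstration of the JDK behaviour, not the netcdf-java code path itself.

```java
import java.io.File;

// Hypothetical demo class: shows what java.io.File does to a URL string.
final class UrlAsFileDemo {

    /** Returns the path that java.io.File stores for the given location string. */
    static String asFilePath(String location) {
        // File treats the string as a local path and collapses duplicate separators.
        return new File(location).getPath();
    }

    public static void main(String[] args) {
        String url = "http://ftp.opc.ncep.noaa.gov/data/ascat_ab/as_swath_1704021232.grb2";
        // The exact result is platform dependent (Windows also rewrites the separators).
        System.out.println(asFilePath(url));
    }
}
```

On a Unix-like JVM this should print the same `http:/ftp.opc.ncep.noaa.gov/...` form seen in the FileNotFoundException above, which no longer names anything that can be opened, either locally or over HTTP.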

As I mentioned in my previous message, this worked in netcdf-java v4.2 but has not worked from v4.3 through v4.6.

@lesserwhirls
Collaborator

Since GRIB files cannot be treated as random access files until they are indexed, and since you need to read the entire contents of the file in order to index it, does it make sense to try to read the file remotely, or is it better to download it locally first (you basically have to do this to index it)?

@cofinoa
Contributor

cofinoa commented Apr 13, 2017

@lesserwhirls you don't need to read the entire file to index a GRIB file. Only the header sections of the GRIB messages are required, but you do need to read them for every message in the file.
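
To illustrate the point, here is a simplified sketch of walking a GRIB2 file by reading only the 16-byte Indicator Section (Section 0) of each message and seeking past the rest. Per the GRIB2 layout, octets 1-4 are "GRIB", octet 8 is the edition number, and octets 9-16 hold the total message length. The class `Grib2MessageScanner` and its members are hypothetical names, this is not how netcdf-java implements indexing, and it assumes messages are packed back to back with no padding between them. A real index also needs the grid and product definition sections, though still not the data section. Over HTTP, each `seek` plus `readFully` would correspond to a byte-range request.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: locate GRIB2 messages by reading only their 16-byte Section 0.
final class Grib2MessageScanner {

    /** Start offset and total length, in bytes, of one GRIB2 message. */
    public static final class MessageSpan {
        public final long offset;
        public final long length;

        MessageSpan(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Assumes messages are contiguous; throws if a header does not look like GRIB2. */
    static List<MessageSpan> scan(RandomAccessFile raf) throws IOException {
        List<MessageSpan> spans = new ArrayList<>();
        byte[] header = new byte[16];
        long fileLength = raf.length();
        long pos = 0;
        while (pos + header.length <= fileLength) {
            raf.seek(pos);
            raf.readFully(header);
            String magic = new String(header, 0, 4, StandardCharsets.US_ASCII);
            int edition = header[7] & 0xFF; // octet 8: edition number
            if (!magic.equals("GRIB") || edition != 2) {
                throw new IOException("No GRIB2 message at offset " + pos);
            }
            long length = 0; // octets 9-16: total message length, big-endian
            for (int i = 8; i < 16; i++) {
                length = (length << 8) | (header[i] & 0xFF);
            }
            if (length < header.length) {
                throw new IOException("Implausible message length at offset " + pos);
            }
            spans.add(new MessageSpan(pos, length));
            pos += length; // skip the remaining sections without reading them
        }
        return spans;
    }
}
```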

This is (sort of) like using record (unlimited) dimensions in netCDF3 files served from HTTP endpoints: you need to query every slice of the netCDF3 file to discover the values of the coordinate variable (typically time).

IMO this is something that we would need to recover from version v4.2.

Moreover, after debugging the IOSP code, it looks like the IOSP will use the .gbx and .ncx files if they are provided over HTTP side by side with the GRIB file (this needs to be tested).

My approach would be, at minimum, to do what is done for compressed files: if necessary, download the complete GRIB file, store the gbx and ncx files in the DiskCache, and then purge the grb file from the cache when required. You would then have the remote grb file indexed locally whenever the indexes are not available on the remote side.

The difficulty I had debugging this is that the code triggering the exception is very deep in the stack, and it runs when the IOSP reopens the file instead of using the RAF object already passed by the NetcdfFile class to the IOSP's open method. Here is the problem:
https://github.com/Unidata/thredds/blob/v4.6.8/grib/src/main/java/ucar/nc2/grib/grib2/Grib2Index.java#L240

@lesserwhirls
Collaborator

@cofinoa - you're correct in that we don't have to stream through the data blocks, just the headers.

I have not tried to remotely read a GRIB file from a server via http when the gbx and ncx files live side-by-side and are also exposed via http. It would be pretty neat if that worked.

I don't like the idea of breaking things, and the fact that this worked in 4.2 but not since 4.3 gives me a little pause. I'm not sure if it was dropped on purpose, or just an oversight on getting the big GRIB refactor done (4.3, 4.4, 4.5, and 4.6 each had their own refactor of GRIB). At any rate, I will try to take a look tomorrow to see if anything jumps out at me. If it looks like it will take quite a bit of work, then it will likely have to wait until we get the first 5.0 release out the door.

Thanks for digging into this!
