double encoding of opendap URLs (works with 4.6.0 but fails in 4.7.3) #1876
Comments
This has been an ongoing issue because the DAP2 protocol chose to …
The actual URL that is being sent to Thredds is double encoded.
My understanding was that Apache and/or Tomcat should do the first decode.
The DAP code should then decode a second time to give this:
It looks like the first decode by Apache/Tomcat is not happening. When you use curl, can you tell from the logs what URL is being …
I also considered what would happen if you sent the unencoded form:
It appears that libcurl (which netcdf uses) encodes this to be:
So that is no help. I will see if Sean has any insight.
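The two-stage decoding described in this comment can be illustrated in Python. Here `urllib.parse` only stands in for whatever Apache/Tomcat and the DAP code actually do, and the dataset name `tmp test.nc` is taken from the original report:

```python
from urllib.parse import quote, unquote

raw = "tmp test.nc"

# Double encoding: the "%" produced by the first pass is escaped as "%25".
sent = quote(quote(raw))
print(sent)  # tmp%2520test.nc

# Expected chain: the container (Apache/Tomcat) decodes once,
# then the DAP code decodes a second time.
after_container = unquote(sent)
after_dap = unquote(after_container)
print(after_container)  # tmp%20test.nc
print(after_dap)        # tmp test.nc
```

If the container skips its decode, a single decode in the DAP code yields `tmp%20test.nc` rather than `tmp test.nc`.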
For your information, when I use curl:
I see the following in the server logs:
That is consistent with the fact that curl does not encode URLs by default.
This is probably related to this PR #1835
I notice that one change in that PR was to use %20 instead of + for encoding blanks.
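The difference between the two blank encodings can be seen with Python's `urllib.parse`, used here purely for illustration: `quote` produces the RFC 3986 form, `quote_plus` the HTML-form form, and only a form-aware decoder maps `+` back to a blank.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

name = "tmp test.nc"

# RFC 3986 percent-encoding: a blank becomes %20 (valid in path and query).
print(quote(name))       # tmp%20test.nc
# application/x-www-form-urlencoded: a blank becomes "+".
print(quote_plus(name))  # tmp+test.nc

# "+" means "blank" only to a decoder that assumes form encoding;
# a plain percent-decoder keeps it as a literal plus sign.
print(unquote("tmp+test.nc"))       # tmp+test.nc
print(unquote_plus("tmp+test.nc"))  # tmp test.nc
```

This is why %20 is the safer choice for blanks in the path: a server that does not apply form decoding would otherwise look for a file literally named `tmp+test.nc`.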
OK, thanks. I tried out c8a3c51, and ncdump does work when I replace %20 by a space:
LD_LIBRARY_PATH=$HOME/opt/netcdf-c8a3c51c/lib/ ~/opt/netcdf-c8a3c51c/bin/ncdump -h 'http://opendap2.oceanbrowser.net/thredds/dodsC/data/emodnet-domains-1/tmp test.nc'
# netcdf library version 4.8.0-development of Nov 3 2020 09:55:54 $
I get the correct result:
Currently, the "Data URL" field of the thredds OPeNDAP Dataset Access Form presents the encoded URL (with %20 instead of a space). Should thredds therefore present the URL to the user in unencoded form? (Sorry for my confusion.)
Good question. The problem is the client being used. So if it uses the netcdf library,
For what it is worth, I checked with the current version of the netCDF4 module in Python. It works with the URL-encoded OPeNDAP links, but not with links where %20 is replaced by a space. It seems that the Python module bundles its own version of the NetCDF library, currently version 4.6.3, which is consistent with the behavior of ncdump 4.6.x. I wonder whether this old behavior should be the preferred one; to me it seems to be the one with the fewest surprises for users. But of course, I do not have the same view of the project, and I know what kinds of problems it creates elsewhere. Accepting both (URL-encoded and non-URL-encoded strings) might indeed be a good practical solution.
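Accepting both forms could look like the following Python sketch: decode whatever was given, then encode exactly once. `normalize_path` is a hypothetical helper, not part of netcdf-c or netCDF4-python, and it assumes file names never contain a literal `%xx` sequence.

```python
from urllib.parse import quote, unquote


def normalize_path(path):
    """Hypothetical helper: return `path` percent-encoded exactly once,
    whether it arrived raw ("tmp test.nc") or already encoded ("tmp%20test.nc").
    """
    return quote(unquote(path), safe="/")


print(normalize_path("/data/tmp test.nc"))    # /data/tmp%20test.nc
print(normalize_path("/data/tmp%20test.nc"))  # /data/tmp%20test.nc
```

The catch is ambiguity: a file really named `a%20b.nc` is sent as `a%20b.nc` (that is, `a b.nc` once the server decodes it) instead of the correct `a%2520b.nc`, so this trades strictness for convenience.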
I think the problem is on the server side. The netcdf library has to generate something
Never mind; I see that the server is down, either temporarily or permanently.
In reviewing all this, I am now more confused than ever.
re: Unidata#1876
and: Unidata#1835
and: Unidata/netcdf4-python#1041
The change in PR 1835 was correct with respect to using %20 instead of '+' for encoding blanks. However, it was a mistake to assume everything was unencoded and then to do the encoding ourselves. The problem is that different servers do different things, with Columbia being an outlier.
So, I have added a set of client controls that can at least give the caller some control over this. The caller can append the following fragment to the URL to control what gets encoded before it is sent to the server. The syntax is as follows:
````
https://<host>/<path>/<query>#encode=path|query|all|none
````
The possible values:
* path -- URL encode (i.e. %xx encode) as needed in the path part of the URL.
* query -- URL encode as needed in the query part of the URL.
* all -- equivalent to ````#encode=path,query````.
* none -- do not URL encode any part of the URL sent to the server; not strictly necessary, so mostly for completeness.
Note that if "encode=" is used, then before it is processed, all encoding is turned off, so that ````#encode=path```` will only encode the path and not the query.
The default is ````#encode=query````, so the path is left untouched, but the query is always encoded.
Internally, this required changes to pass the encode flags down into the OC2 library.
Misc. Unrelated Changes:
* Shut up those irritating warnings from putget.m4
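The fragment rules above can be mimicked in a few lines of Python. This is only a sketch of the documented semantics, not the netcdf-c/OC2 code: `encode_flags` and `prepare_url` are hypothetical names, the fragment is assumed to contain nothing but the `encode=` entry, and keeping `=` and `&` literal in the query is also an assumption.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_flags(fragment):
    """Return the set of URL parts to encode for a given fragment."""
    if not fragment.startswith("encode="):
        return {"query"}  # documented default: encode the query only
    flags = set()  # "encode=" given: all encoding is turned off first
    for value in fragment[len("encode="):].split(","):
        if value == "all":
            flags |= {"path", "query"}
        elif value in ("path", "query"):
            flags.add(value)
        # "none" (or anything unrecognized) adds nothing
    return flags


def prepare_url(url):
    """Hypothetical helper: apply the #encode= rules, then drop the fragment."""
    parts = urlsplit(url)
    flags = encode_flags(parts.fragment)
    path = quote(parts.path, safe="/") if "path" in flags else parts.path
    query = quote(parts.query, safe="=&") if "query" in flags else parts.query
    # The fragment is a client-side control and is not sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


url = "http://opendap2.oceanbrowser.net/thredds/dodsC/data/emodnet1-domains/tmp%20test.nc"
print(prepare_url(url))  # unchanged: the default leaves the path alone
print(prepare_url("http://host/data/tmp test.nc#encode=path"))  # http://host/data/tmp%20test.nc
```

With the default, the already-encoded `%20` from the thredds "Data URL" is passed through untouched, while `#encode=path` is only appropriate for a raw path such as `tmp test.nc`.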
Mitigated by PR #1880
Thanks a lot for your patch!
I accidentally deleted this branch before I got it merged to master.
Never mind; I found a copy on another machine.
netcdf library version 4.7.3
Ubuntu 20.04, gcc 9.3.0
Accessing this OPeNDAP URL works fine with netcdf version 4.6.0 (from Ubuntu 18.04):
ncdump -h 'http://opendap2.oceanbrowser.net/thredds/dodsC/data/emodnet1-domains/tmp%20test.nc'
This URL, with the space encoded as %20, is the one reported by thredds, and I think it is correct.
However, with version 4.7.3 the same ncdump -h command fails with the error "NetCDF: file not found".
Inspecting the thredds log files shows that the server looks for the file
tmp%2520test.nc.dds
(i.e. %20 is encoded a second time). Directly accessing the .dds resource works, which makes me think that this issue could be in netcdf-c:
curl 'http://opendap2.oceanbrowser.net/thredds/dodsC/data/emodnet1-domains/tmp%20test.nc.dds'
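Where `tmp%2520test.nc.dds` comes from can be reproduced by percent-encoding a path that already contains `%20`: the `%` itself gets escaped. A minimal Python sketch, assuming the client encodes the path once more before sending it (`urllib.parse.quote` stands in for whatever encoder netcdf-c actually uses):

```python
from urllib.parse import quote

# Path as reported by thredds: the blank is already encoded once.
path = "/thredds/dodsC/data/emodnet1-domains/tmp%20test.nc.dds"

# Encoding it again turns "%" into "%25", giving the name seen in the log.
resent = quote(path, safe="/")
print(resent)  # /thredds/dodsC/data/emodnet1-domains/tmp%2520test.nc.dds
```

curl, which sends the URL as given, avoids this second pass, which is consistent with the direct request above succeeding.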
Thank you very much for your work on NetCDF which I used for more than 15 years!