double encoding of opendap URLs (works with 4.6.0 but fails in 4.7.3) #1876

Alexander-Barth · 2020-10-30T10:16:09Z

the version of the software with which you are encountering an issue

netcdf library version 4.7.3

environmental information (i.e. Operating System, compiler info, java version, python version, etc.)

Ubuntu 20.04, gcc 9.3.0

a description of the issue with the steps needed to reproduce it

Accessing this OPENDAP URL, works fine in version netcdf 4.6.0 (from Ubuntu 18.04).

ncdump -h 'http://opendap2.oceanbrowser.net/thredds/dodsC/data/emodnet1-domains/tmp%20test.nc'

This URL with the space encoded as %20 is reported by thredds and I think it is correct.

However, ncdump -h fails in version 4.7.3 with the error NetCDF: file not found

Inspecting the thredds log files shows the servers looks for the file tmp%2520test.nc.dds (%20 is again encoded).

Directly accessing the dds resource works, which makes me think that this issue could be in netcdf-c:

curl 'http://opendap2.oceanbrowser.net/thredds/dodsC/data/emodnet1-domains/tmp%20test.nc.dds'

Thank you very much for your work on NetCDF which I used for more than 15 years!

The text was updated successfully, but these errors were encountered:

DennisHeimbigner · 2020-10-30T17:28:49Z

This has been an ongoing issue because the DAP2 protocol chose to
use URL encoding for its escape mechanism.

The actual URL that is being sent to Thredds is double encoded,
so it looks like this:

http://opendap2.oceanbrowser.net/thredds/dodsC/data/emodnet1-domains/tmp%2520test.nc.dds

My understanding was that Apache and/or Tomcat should do the
initial decode to the URL. This should mean that the DAP
processor in Thredds sees this:

http://opendap2.oceanbrowser.net/thredds/dodsC/data/emodnet1-domains/tmp%20test.nc.dds

The DAP code should decode a second time to give this:

http://opendap2.oceanbrowser.net/thredds/dodsC/data/emodnet1-domains/tmp test.nc.dds

It looks like the first decode by Apache/Tomcat is not happening.

When you use curl, can you tell from the logs, what URL is being
shown in the Thredds log?

I also considered what would happen if you sent the unencoded form:

tmp test.nc.dds

It appears that libcurl (which netcdf uses) encodes this to be:

tmp+test.nc.dds

So that is no help.

I will see if Sean has any insight.

Alexander-Barth · 2020-11-01T20:54:03Z

For your information, when I use curl:

curl 'http://opendap2.oceanbrowser.net/thredds/dodsC/data/emodnet1-domains/tmp%20test.nc.dds'

I see the following in the server logs:

172.17.0.1 - - [01/Nov/2020:20:47:09 +0000] "GET /thredds/dodsC/data/emodnet1-domains/tmp%20test.nc.dds HTTP/1.1" 200 3349

DennisHeimbigner · 2020-11-02T01:36:36Z

That is consistent with the fact that curl does not encode urls by default.

DennisHeimbigner · 2020-11-02T02:57:03Z

This is probably related to this PR #1835
which has this note:

The DAP2 standard specifies the use of URL %xx escaping as also
the escape mechanism for DAP2 itself. This can cause some confusion.
To try to simplify this, the netcdf-c library DAP2 code assumes
that all URL path and constraint information is unescaped. It is assumed
that just before transmission, the constructed URL will be properly escaped.

I notice that one change in that PR was to use %20 instead of + for
encoding spaces. This change has not made it into any release yet.
Additionally, according to the above note, you should not attempt to pre-encode what you give to ncdump.
This is all a bit of mess, but I am stuck with it.

Alexander-Barth · 2020-11-03T09:27:49Z

OK, thanks. I tried out c8a3c51 and ncdump does work when I replace %20 by space:

LD_LIBRARY_PATH=$HOME/opt/netcdf-c8a3c51c/lib/ ~/opt/netcdf-c8a3c51c/bin/ncdump -h 'http://opendap2.oceanbrowser.net/thredds/dodsC/data/emodnet-domains-1/tmp test.nc'
# netcdf library version 4.8.0-development of Nov  3 2020 09:55:54 $

I get, the correct result:

netcdf tmp\ test {
dimensions:
	depth = 13 ;
...

Currently, the "Data URL" thredds OPeNDAP Dataset Access Form presents the encoded URLs (including %20 instead of space). Should thredds therefore present the URL to the user in an unencoded form ?
Or should the user (or higher-level python, julia, ... library) make this substitution?

(Sorry for my confusion).

DennisHeimbigner · 2020-11-03T18:07:48Z

Good question. The problem is the client being used. So it if uses netcdf library,
then thos %'s need changing. But for other clients, e.g. curl, they need to stay.
So from Thredds point of view, there is no right answer.
Frankly I am still unsatisfied with the way that netcdf handles this because
of the confusion. One possibility is that we scan the URL and if any %xx are detected,
we unescape them internally.

Alexander-Barth · 2020-11-04T14:34:16Z

For what it is worth, I checked with the current version of the netCDF4 module in python. It works for the URL-encoded OPeNDAP links, but not with links where the %20 is replaced by a space. It seems that the python module bundles its own version of the NetCDF library and uses currently version 4.6.3. This is thus consistent with ncdump of version 4.6.x.

I am wondering if this old behavior should not be the preferred one. To me its seems the one with the least surprises for the users. But of course, I do not have the same view of the project and I know what kind of problem it creates elsewhere.

Accepting both (URL-encoded and non-URL-encoded strings) might be indeed a good practical solution.

DennisHeimbigner · 2020-11-04T17:54:39Z

I think the problem is the server side. The netcdf library has to generate something
that the server will properly handle.
The original problem in Unidata/netcdf4-python#1041
was that this URL was failing:

http://iridl.ldeo.columbia.edu/SOURCES/.Indices/.soi/.c8110/.anomaly/T/%28Jan%201979%29/VALUE/dods
Can you see if your python still fails on it? How about netcdf-4.6.0?
Unfortunately, we cannot see what the server logs show, so it was unclear
what was happening on the server side.

DennisHeimbigner · 2020-11-04T17:59:50Z

Never mind, I see that server is down either temporary or permanent.

DennisHeimbigner · 2020-11-04T18:35:48Z

In reviewing all this, I am now more confused than ever.
I can understand the columbia server URL problem. That needs special casing.
What I cannot understand is why the oceanbrowser.net server is not
handling the double encoding. It must mean that Apache+Tomcat is not
doing a URL decode.
This is all to much for me. I think Sean's original suggestion was correct.
I am going to add a control parameter that tells netcdf either to encode
or to not-encode the url &/oor query with the default being to not-encode the url part
but to encode the query part. Ugh!

re: Unidata#1876 and: Unidata#1835 and: Unidata/netcdf4-python#1041 The change in PR 1835 was correct with respect to using %20 instead of '+' for encoding blanks. However, it was a mistake to assume everything was unencoded and then to do encoding ourselves. The problem is that different servers do different things, with Columbia being an outlier. So, I have added a set of client controls that can at least give the caller some control over this. The caller can append the following fragment to his URL to control what gets encoded before sending it to the server. The syntax is as follows: ```` https://<host>/<path>/<query>#encode=path|query|all|none ```` The possible values: * path -- URL encode (i.e. %xx encode) as needed in the path part of the URL. * query -- URL encode as needed in the query part of the URL. * all -- equivalent to ````#encode=path,query````. * none -- do not url encode any part of the URL sent to the server; not strictly necessary, so mostly for completeness. Note that if "encode=" is used, then before it is processed, all encoding is turned of so that ````#encode=path```` will only encode the path and not the query. The default is ````#encode=query````, so the path is left untouched, but the query is always encoded. Internally, this required changes to pass the encode flags down into the OC2 library. Misc. Unrelated Changes: * Shut up those irritating warning from putget.m4

DennisHeimbigner · 2020-11-11T18:10:46Z

Mitigated by PR #1880

Alexander-Barth · 2020-11-12T09:33:20Z

Thanks a lot for your patch!

DennisHeimbigner · 2020-11-12T22:22:22Z

I accidentally deleted this branch before I got it merged to master.
Since I would prefer not to recreate it, do you by any chance have a copy of
the branch that you can send me. If nothing else, a zip file
of the checked-out branch would be enough.

DennisHeimbigner · 2020-11-12T22:34:25Z

Never mind; I found a copy on another machine.

DennisHeimbigner mentioned this issue Nov 11, 2020

Yet another fix for DAP2 double URL encoding. #1880

Closed

Alexander-Barth closed this as completed Nov 12, 2020

DennisHeimbigner mentioned this issue Nov 12, 2020

Provide ability to control URL encoding for DAP2 #1884

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

double encoding of opendap URLs (works with 4.6.0 but fails in 4.7.3) #1876

double encoding of opendap URLs (works with 4.6.0 but fails in 4.7.3) #1876

Alexander-Barth commented Oct 30, 2020

DennisHeimbigner commented Oct 30, 2020

Alexander-Barth commented Nov 1, 2020

DennisHeimbigner commented Nov 2, 2020 •

edited

Loading

DennisHeimbigner commented Nov 2, 2020

Alexander-Barth commented Nov 3, 2020

DennisHeimbigner commented Nov 3, 2020

Alexander-Barth commented Nov 4, 2020

DennisHeimbigner commented Nov 4, 2020

DennisHeimbigner commented Nov 4, 2020

DennisHeimbigner commented Nov 4, 2020

DennisHeimbigner commented Nov 11, 2020

Alexander-Barth commented Nov 12, 2020

DennisHeimbigner commented Nov 12, 2020

DennisHeimbigner commented Nov 12, 2020

double encoding of opendap URLs (works with 4.6.0 but fails in 4.7.3) #1876

double encoding of opendap URLs (works with 4.6.0 but fails in 4.7.3) #1876

Comments

Alexander-Barth commented Oct 30, 2020

DennisHeimbigner commented Oct 30, 2020

Alexander-Barth commented Nov 1, 2020

DennisHeimbigner commented Nov 2, 2020 • edited Loading

DennisHeimbigner commented Nov 2, 2020

Alexander-Barth commented Nov 3, 2020

DennisHeimbigner commented Nov 3, 2020

Alexander-Barth commented Nov 4, 2020

DennisHeimbigner commented Nov 4, 2020

DennisHeimbigner commented Nov 4, 2020

DennisHeimbigner commented Nov 4, 2020

DennisHeimbigner commented Nov 11, 2020

Alexander-Barth commented Nov 12, 2020

DennisHeimbigner commented Nov 12, 2020

DennisHeimbigner commented Nov 12, 2020

DennisHeimbigner commented Nov 2, 2020 •

edited

Loading