Docs clarification: /vsicurl?list_dir=no should actually be /vsicurl?empty_dir=yes #7163

Open
scottyhq opened this issue Feb 1, 2023 · 3 comments
Labels
documentation Issues and contributions to the documentation content

Comments

@scottyhq
Contributor

scottyhq commented Feb 1, 2023

Expected behavior and actual behavior.

https://gdal.org/user/virtual_file_systems.html#vsicurl-http-https-ftp-files-random-access

Describes the option to not list directories

- list_dir=yes/no: whether an attempt to read the file list of the directory where the file is located should be done. Default to YES.

But looking at the log output, list_dir=no doesn't appear to do anything; instead, empty_dir=yes has the intended effect:

Steps to reproduce the problem.

CPL_DEBUG=ON gdalinfo '/vsicurl?pc_url_signing=yes&list_dir=no&url=https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2021/045/031/LC08_L2SP_045031_20210107_20210307_02_T1/LC08_L2SP_045031_20210107_20210307_02_T1_ST_B10.TIF'

Operating system

OSX

GDAL version and provenance

gdal                      3.6.2           py311h619941e_3    conda-forge
libgdal                   3.6.2                h623d8b8_3    conda-forge
@rouault
Member

rouault commented Feb 4, 2023

It does have an effect, but it is mostly visible when using low-level I/O primitives, and much less so with gdalinfo, which will still try to probe side-car files even if the initial directory listing is disabled.

Perhaps this could be rephrased as:

  • list_dir=yes/no: whether the initial attempt to read the file list of the directory where the file is located should be done at file opening. Defaults to YES. Note: setting list_dir=no does not prevent higher-level logic in GDAL drivers from probing for individual side-car files. You need to use empty_dir=yes for that.
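
A minimal sketch of how the two options would appear in a /vsicurl? path, following the unencoded form used in the commands in this thread. The helper name `vsicurl_path` and the example.com URL are hypothetical; only the option names list_dir and empty_dir come from the discussion.

```python
def vsicurl_path(url, **options):
    """Build a '/vsicurl?opt=val&...&url=...' string (hypothetical helper).

    Options are emitted in the order given and 'url' is placed last,
    as in the commands in this thread. Values are not URL-encoded here.
    """
    parts = [f"{key}={value}" for key, value in options.items()]
    parts.append(f"url={url}")
    return "/vsicurl?" + "&".join(parts)


# Placeholder URL for illustration, not a real dataset.
url = "https://example.com/data/scene_B10.TIF"

# Skips only the initial directory listing at file opening.
no_listing = vsicurl_path(url, list_dir="no")

# Per the proposed wording, this is what stops side-car probing.
no_sidecars = vsicurl_path(url, empty_dir="yes")

print(no_listing)
print(no_sidecars)
```

Either string can be passed wherever GDAL accepts a filename, for example gdal.Open(no_sidecars) or the gdalinfo command line. A URL containing & or other reserved characters would need URL-encoding, which this sketch does not do.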

Compare without list_dir=no, which attempts a GET on the directory of the file:

$ CPL_CURL_VERBOSE=YES python -c "from osgeo import gdal; f = gdal.VSIFOpenL('/vsicurl?pc_url_signing=yes&url=https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2021/045/031/LC08_L2SP_045031_20210107_20210307_02_T1/LC08_L2SP_045031_20210107_20210307_02_T1_ST_B10.TIF', 'rb')"
* Couldn't find host landsateuwest.blob.core.windows.net in the .netrc file; using defaults
*   Trying 20.150.76.4:443...
* TCP_NODELAY set
* Connected to landsateuwest.blob.core.windows.net (20.150.76.4) port 443 (#0)
* found 376 certificates in /etc/ssl/certs
* ALPN, offering h2
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_256_GCM_SHA384
* 	 server certificate verification OK
* 	 server certificate status verification SKIPPED
* 	 common name: *.blob.core.windows.net (matched)
* 	 server certificate expiration date OK
* 	 server certificate activation date OK
* 	 certificate public key: RSA
* 	 certificate version: #3
* 	 subject: CN=*.blob.core.windows.net
* 	 start date: Sun, 25 Dec 2022 02:12:54 GMT
* 	 expire date: Mon, 25 Dec 2023 02:12:54 GMT
* 	 issuer: C=US,O=Microsoft Corporation,CN=Microsoft RSA TLS CA 02
* ALPN, server did not agree to a protocol
> GET /landsat-c2/level-2/standard/oli-tirs/2021/045/031/LC08_L2SP_045031_20210107_20210307_02_T1/ HTTP/1.1
Host: landsateuwest.blob.core.windows.net
User-Agent: GDAL/3.7.0
Accept: */*

* Mark bundle as not supporting multiuse
< HTTP/1.1 404 The specified resource does not exist.
< Content-Length: 223
< Content-Type: application/xml
< Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
< x-ms-request-id: 1db1b053-101e-0022-178b-381e13000000
< x-ms-version: 2014-02-14
< Access-Control-Expose-Headers: x-ms-request-id,Server,x-ms-version,Content-Length,Date,Transfer-Encoding
< Access-Control-Allow-Origin: *
< Date: Sat, 04 Feb 2023 11:22:58 GMT
< 
[....]

With list_dir=no, the directory GET is skipped and the file is accessed directly (here the first request is actually the URL signing call):

$ CPL_CURL_VERBOSE=YES python -c "from osgeo import gdal; f = gdal.VSIFOpenL('/vsicurl?pc_url_signing=yes&list_dir=no&url=https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2021/045/031/LC08_L2SP_045031_20210107_20210307_02_T1/LC08_L2SP_045031_20210107_20210307_02_T1_ST_B10.TIF', 'rb')"
* Couldn't find host planetarycomputer.microsoft.com in the .netrc file; using defaults
*   Trying 2620:1ec:4f:1::42:443...
* TCP_NODELAY set
* Connected to planetarycomputer.microsoft.com (2620:1ec:4f:1::42) port 443 (#0)
* found 376 certificates in /etc/ssl/certs
* ALPN, offering h2
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
* 	 server certificate verification OK
* 	 server certificate status verification SKIPPED
* 	 common name: planetarycomputer.microsoft.com (matched)
* 	 server certificate expiration date OK
* 	 server certificate activation date OK
* 	 certificate public key: RSA
* 	 certificate version: #3
* 	 subject: C=US,ST=Washington,L=Redmond,O=Microsoft Corporation,CN=planetarycomputer.microsoft.com
* 	 start date: Wed, 31 Aug 2022 00:00:00 GMT
* 	 expire date: Wed, 30 Aug 2023 23:59:59 GMT
* 	 issuer: C=US,O=DigiCert Inc,CN=DigiCert TLS RSA SHA256 2020 CA1
* ALPN, server accepted to use h2
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x271da90)
> GET /api/sas/v1/sign?href=https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2021/045/031/LC08_L2SP_045031_20210107_20210307_02_T1/LC08_L2SP_045031_20210107_20210307_02_T1_ST_B10.TIF HTTP/2
Host: planetarycomputer.microsoft.com
user-agent: GDAL/3.7.0
accept: */*
accept-encoding: gzip

* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 200 
< date: Sat, 04 Feb 2023 11:24:25 GMT
< content-type: application/json
< content-length: 531
< strict-transport-security: max-age=15724800; includeSubDomains
< request-context: appId=cid-v1:75161b1b-6883-4b66-9410-715040c44427
< x-azure-ref: 20230204T112425Z-zdmpvd196551r6z8qen6retaf400000001q0000000001t6v
< x-cache: CONFIG_NOCACHE
< accept-ranges: bytes
[...]
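
To make the difference between the two traces easier to see, the request lines can be pulled out of the CPL_CURL_VERBOSE output. This is only a sketch: `request_lines` is a hypothetical helper, and the two log excerpts are shortened copies of the traces above.

```python
def request_lines(curl_verbose_log):
    """Return (method, target) for each '> METHOD target HTTP/x' line."""
    requests = []
    for line in curl_verbose_log.splitlines():
        if not line.startswith("> "):
            continue
        fields = line[2:].split()
        if len(fields) == 3 and fields[2].startswith("HTTP/"):
            requests.append((fields[0], fields[1]))
    return requests


# Shortened excerpt of the trace without list_dir=no.
default_log = """\
* Connected to landsateuwest.blob.core.windows.net (20.150.76.4) port 443 (#0)
> GET /landsat-c2/level-2/standard/oli-tirs/2021/045/031/LC08_L2SP_045031_20210107_20210307_02_T1/ HTTP/1.1
< HTTP/1.1 404 The specified resource does not exist.
"""

# Shortened excerpt of the trace with list_dir=no.
no_list_dir_log = """\
* Connected to planetarycomputer.microsoft.com (2620:1ec:4f:1::42) port 443 (#0)
> GET /api/sas/v1/sign?href=https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2021/045/031/LC08_L2SP_045031_20210107_20210307_02_T1/LC08_L2SP_045031_20210107_20210307_02_T1_ST_B10.TIF HTTP/2
< HTTP/2 200
"""

default_first = request_lines(default_log)[0]
no_list_first = request_lines(no_list_dir_log)[0]

# A target ending in '/' is the directory listing attempt.
print("default first request is a directory:", default_first[1].endswith("/"))
print("list_dir=no first request is a directory:", no_list_first[1].endswith("/"))
```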

Seeing this, if pc_url_signing=yes is set, we should probably disable directory listing automatically, as it can't work.
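
The proposed behavior could be sketched as a small option-normalization step. This only illustrates the idea and is not GDAL's implementation; `effective_options` is a hypothetical name.

```python
def effective_options(options):
    """Return a copy of options with list_dir forced to 'no'
    whenever pc_url_signing=yes (hypothetical illustration)."""
    result = dict(options)
    if result.get("pc_url_signing", "no").lower() == "yes":
        # Directory listing cannot work on signed URLs, so skip it.
        result["list_dir"] = "no"
    return result


signed = effective_options({"pc_url_signing": "yes"})
unsigned = effective_options({"list_dir": "yes"})
print(signed)
print(unsigned)
```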

@scottyhq
Contributor Author

scottyhq commented Feb 6, 2023

setting list_dir=no does not prevent higher level logic in GDAL drivers to probe for individual side-car files

Thanks for the clarification @rouault!

if pc_url_signing=yes is set, we should actually likely automatically disable directory listing as it can't work.

Makes sense to me. For what it's worth, the Planetary Computer JupyterHub automatically sets GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR.
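
For anyone outside that environment, a minimal sketch of setting the same variable from Python for a limited scope. The `temporary_env` helper is hypothetical, and the sketch assumes GDAL picks the value up from the process environment when a dataset is opened.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name, value):
    """Set an environment variable, then restore its previous state."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


with temporary_env("GDAL_DISABLE_READDIR_ON_OPEN", "EMPTY_DIR"):
    inside = os.environ["GDAL_DISABLE_READDIR_ON_OPEN"]
    # A GDAL open call would go here.

print(inside)
```

With the GDAL Python bindings, gdal.SetConfigOption("GDAL_DISABLE_READDIR_ON_OPEN", "EMPTY_DIR") is an alternative that avoids touching the process environment.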

That said, is there a reason not to reuse the list_dir key and add empty_dir as an additional value (list_dir=yes|no|empty_dir)? From the docs alone, it's not clear whether each of these URL modifiers has a corresponding environment variable and overrides it. Happy to submit a PR to clarify the wording if that is helpful.

@rouault
Member

rouault commented Feb 6, 2023

That said, is there a reason not to reuse the list_dir key and add empty_dir as an additional value (list_dir=yes|no|empty_dir)?

Well, the GDAL_DISABLE_READDIR_ON_OPEN=YES/NO/EMPTY_DIR naming is quite hard to comprehend (a double negation, plus a non-boolean value EMPTY_DIR where the DISABLE prefix suggests a boolean), so the split into list_dir=yes/no and empty_dir=yes/no was an (apparently bad) attempt at making things easier to comprehend.

Happy to submit a PR to clarify the wording if that is helpful.

That would be welcome.

@rouault rouault added the documentation Issues and contributions to the documentation content label Feb 13, 2023
2 participants