Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VSI_CACHE causing bad range requests in some circumstance (/vsizip//vsis3/) #9658

Closed
olsen232 opened this issue Apr 11, 2024 · 3 comments · Fixed by #9669
Closed

VSI_CACHE causing bad range requests in some circumstance (/vsizip//vsis3/) #9658

olsen232 opened this issue Apr 11, 2024 · 3 comments · Fixed by #9669
Assignees
Labels

Comments

@olsen232
Copy link

olsen232 commented Apr 11, 2024

What is the bug?

I have a 4211788 byte (4MiB) zip file containing a ~10MiB tiff file in an S3 bucket.
I'm using gdal from python to open the last tiff file in the zip file.
The following works perfectly for me:
gdal.Open("/vsizip//vsis3/cdn-misc.kx.gd/foss-tickets/file_example_TIFF_10MB.zip", gdal.GA_ReadOnly)
However if I first turn on the VSI cache with gdal.SetConfigOption("VSI_CACHE", "TRUE")
Then it fails with:
S3: Request at offset 4227072, after end of file
Note that this is after the end of the file, and that it corresponds to the start of the 129th cache block - VSI cache blocks default to 32KiB in size.
So, it looks like the VSI cache is for some reason requesting a block outside the file.

I found out this works with /vsizip//vsicurl/ too, not just /vsizip/vsis3/

Steps to reproduce the issue

I've only figured out how to show this from python so far.

from osgeo import gdal

gdal.UseExceptions()

opts = {
    'VSI_CACHE': 'TRUE',
    'CPL_DEBUG': 'ON',
}

for k, v in opts.items():
    gdal.SetConfigOption(k, v)

# Use either /vsizip//vsis3/
path = '/vsizip//vsis3/cdn-misc.kx.gd/foss-tickets/file_example_TIFF_10MB.zip'
# or /vsizip//vsicurl/
path = '/vsizip//vsicurl/https://cdn-misc.kx.gd/foss-tickets/file_example_TIFF_10MB.zip'

gdal_tile = gdal.Open(path, gdal.GA_ReadOnly)
gdal_tile = None

You should see one or other of the following:
S3: Request at offset 4227072, after end of file
VSICURL: Request at offset 4227072, after end of file

I think in this case the program succeeds anyway in spite of trying to read beyond the end of the file, but I'm not sure if that's always the case.

Versions and provenance

macOS Sonoma 14.3.1
GDAL 3.8.5

Additional context

Wasn't too hard to make a new test file that exhibits this behaviour, so I think this must be affecting the majority of zip files, at least when the vsi-cache is on and the user tries to access the last file in the zip.

@elpaso elpaso added the bug label Apr 11, 2024
@olsen232
Copy link
Author

Updated with steps-to-reproduce - file_example_TIFF_10MB.zip should be accessible at the URLs provided

@elpaso elpaso self-assigned this Apr 15, 2024
@elpaso
Copy link
Collaborator

elpaso commented Apr 15, 2024

I looked at the code an this seems a false positive to me: the VSIZIP code tries to read chunk of data adding a Z_BUFSIZE of 65536 bytes, the last read request starts from (uncompressed) offset of 10147800 (which is in valid range of the uncompressed files size 10151438) but by adding the Z_BUFSIZE it lands after the file end, the cache filling code tries to honor the request asking CURL to download the data but CURL cowardly refuses to download data out of the (known) range and emits the error.

I am looking at a way to suppress this warning when possible.

@rouault
Copy link
Member

rouault commented Apr 15, 2024

but by adding the Z_BUFSIZE it lands after the file end,

it should probably not try to read extra bytes if the previous Read() returned less than Z_BUFSIZE bytes

elpaso added a commit to elpaso/gdal that referenced this issue Apr 15, 2024
Do not read past EOF even if the client (e.g. VSIZIP) code say so.

Fix OSGeo#9658
olsen232 pushed a commit to koordinates/gdal that referenced this issue Apr 15, 2024
Do not read past EOF even if the client (e.g. VSIZIP) code say so.

Fix OSGeo#9658
rouault pushed a commit that referenced this issue Apr 16, 2024
Do not read past EOF even if the client (e.g. VSIZIP) code say so.

Fix #9658
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants