Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

autotest: Add a VSIFile wrapper class #8222

Merged
merged 5 commits into from
May 30, 2024

Conversation

dbaston
Copy link
Member

@dbaston dbaston commented Aug 14, 2023

What does this PR do?

This PR adds a gdal.vsi_open() function that returns a file-like object, allowing virtual files to be used as "regular" Python files in some cases. For example, a zipped CSV can be read using the Python csv module:

with gdal.vsi_open(f'/vsizip/prime_meridian.csv') as f:
    records = [x for x in csv.DictReader(f)]

assert records[2]['INFORMATION_SOURCE'] == "Institut Geographique National (IGN), Paris"

Still need to correctly handle different linebreak/OS combinations, ensure errors are raised when accessing a closed file, handle append mode (or raise error), and probably other things.

What are related issues/pull requests?

Tasklist

  • Add test case(s)
  • Add documentation
  • Updated Python API documentation (swig/include/python/docs/)
  • Review
  • Adjust for comments
  • All CI builds and checks have passed

@coveralls
Copy link
Collaborator

coveralls commented Aug 14, 2023

Coverage Status

coverage: 69.146% (-0.002%) from 69.148%
when pulling 3e747a9 on dbaston:python-vsifile-class
into 0890073 on OSGeo:master.

self._binary = "b" in mode
self._encoding = encoding

self._fp = VSIFOpenL(self._path, self._mode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably throw an exception is None is returned

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, also switched to VSIFOpenExL to capture the error message

return self

def __next__(self):
chunk_size = 1024
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably that CPLReadLineL() could be used to avoid this reimplementatoin

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@rcoup
Copy link
Member

rcoup commented Aug 16, 2023

If it's helpful, here's an implementation we have been using at Koordinates for a while, with a set of tests: feel free to crib anything that's useful. I've licensed it as MIT. Any questions most welcome.

https://gist.github.com/rcoup/5f4b49149ca26084eed4dccfd1b12895

  • was written quite a while ago — I would guess it could be simpler today incorporating some stdlib updates in recent Python versions and/or other bugfixes. (eg: not having open() essentially C&P)
  • Inheriting from io.RawIOBase means it can fit under/into the standard python buffered & text IO wrapper classes
  • I've manually removed/replaced/inlined use of a handful of internal library functions & test setup. If there's anything you get stuck on please ask.

@dbaston
Copy link
Member Author

dbaston commented Aug 16, 2023

Fantastic, thank you!

@dbaston
Copy link
Member Author

dbaston commented Apr 17, 2024

I have not forgotten about this. I'm leaning towards moving it to gdaltest to focus on internal use for now.

Copy link

The GDAL project highly values your contribution and would love to see this work merged! Unfortunately this PR has not had any activity in the last 28 days and is being automatically marked as "stale". If you think this pull request should be merged, please check

  • that all unit tests are passing

  • that all comments by reviewers have been addressed

  • that there is enough information for reviewers, in particular link
    to any issues which this pull request fixes

  • that you have written unit tests where possible
    In case you should have any uncertainty, please leave a comment and we will be happy to help you proceed with this pull request.
    If there is no further activity on this pull request, it will be closed in 2 weeks.

@github-actions github-actions bot added the stale label May 16, 2024
@dbaston dbaston force-pushed the python-vsifile-class branch 2 times, most recently from 8532886 to 6a5158c Compare May 21, 2024 13:26
@dbaston dbaston changed the title WIP: Python bindings: Add a VSIFile wrapper class autotest: Add a VSIFile wrapper class May 21, 2024
@github-actions github-actions bot removed the stale label May 22, 2024
gdal.VSIFWriteL(x, 1, len(x), self._fp)

def seek(self, offset, whence=0):
gdal.VSIFSeekL(self._fp, offset, whence)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to be careful here. Due to VSIFSeekL() taking only unsigned offsets, negative seeking with SEEK_CUR must be emulated doing VSIFSeekL(fp, VSIFTellL(fp) + offset, SEEK_SET)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where should the emulation happen?

  • VSIFile.seek
  • gdal.VSIFSeekL (via %pythonprepend%)
  • gdal.VSIFSeekL (via a new wrapper function in cpl_i. The signature here already uses signed offsets, so it seems reasonable that it should handle them correctly)

I'm still scratching my head a bit as to why the tests using negative seek work correctly as-is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • gdal.VSIFSeekL (via a new wrapper function in cpl_i. The signature here already uses signed offsets, so it seems reasonable that it should handle them correctly)

sounds good here. By the way, I've just realized yesterday, that is is not needed to use a "wrapper_foo" + %rename tricks. You can directly create a inline function with the target name. cf 5bcbc5c#diff-56ee4d91557b7ba46f850e46abbdc73e617c981c662a47a27d00eaef97136061R972

I'm still scratching my head a bit as to why the tests using negative seek work correctly as-is.

I suppose that the negative offsets get turned into unsigned values within SWIG .cpp code, and then unsigned overflow on additions within the virtual file system implementations accidentally lead to the correct result. But we'd better not rely on that. In particular the CI sanitize target and our configuration of OSSFuzz errors out on unsigned integer overflows, as they are most of the time bugs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added code to handle negative values in afe7ca8. I think the wrapper_ and %rename are needed in this case because the function name exposed by SWIG is the same as the underlying function name.

assert type(x) is str
x = x.encode(self._encoding)

gdal.VSIFWriteL(x, 1, len(x), self._fp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably assert the returned value to be len(x) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, and to be consistent, we should check the return code of VSIFSeekL and throw an exception on failure?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should check the return code of VSIFSeekL and throw an exception on failure?

sounds good

autotest/pymod/gdaltest.py Outdated Show resolved Hide resolved
Co-authored-by: Even Rouault <even.rouault@spatialys.com>
@rouault rouault added this to the 3.10.0 milestone May 30, 2024
@rouault rouault merged commit 5394ddf into OSGeo:master May 30, 2024
32 of 33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants