
Provide granular control to SpatialRDD sampling utils #116

Merged
merged 1 commit into from
Sep 11, 2017


jiayuasu
Member

No description provided.

jiayuasu merged commit 964d3bc into apache:0.8 on Sep 11, 2017
Kontinuation added a commit to Kontinuation/sedona that referenced this pull request Mar 21, 2024
Kontinuation added a commit to Kontinuation/sedona that referenced this pull request Mar 21, 2024
Kontinuation added a commit to Kontinuation/sedona that referenced this pull request Mar 22, 2024
jiayuasu pushed a commit that referenced this pull request Mar 22, 2024
* [SEDONA-406] Raster deserializer for PySpark (#116)

* Update documentation

* Add documentation for writing Python UDF to work with raster data
Kontinuation added a commit to Kontinuation/sedona that referenced this pull request Oct 11, 2024
* Refactor the serializer of RasterUDT to use a language- and library-neutral serialization format

* Changed the band data serializer (DataBufferSerializer) to a language-neutral format.

* Add tests for serializers for raster components

* Out-db rasters now use a language-neutral serialization format

* Fixed serializing JAI images with non-zero offset (minX != 0 or minY != 0)

* Refactored the constructor of DeepCopiedRenderedImage and removed imageBounds from the serialization format, since it can be reconstructed exactly from the minX, minY, width, and height of the image

* A draft design of the PySpark RasterType deserializer

* Implemented affine transformation translation for pixel anchors and images with non-zero origins.

* Support all band data types

* Add a test constructor for testing Python raster deserialization

* Implemented as_numpy for ComponentSampleModel and PixelInterleavedSampleModel

* Implemented deserializer for SinglePixelPackedSampleModel

* Implemented deserializer for MultiPixelPackedSampleModel

* Added RS_MakeRasterForTesting to Sedona Spark SQL. It will be used by Python unit tests to check the correctness of the raster deserializer

* Switch from calling struct.unpack in a loop to numpy.frombuffer, which is much faster.

* Move the prototype code to sedona package

* Parsed the byte sequence of serialized out-db rasters. We can now start implementing a raster data reader using rasterio's WarpedVRT.

* Support reading band data as a numpy array using rasterio's WarpedVRT

* The out-db raster deserializer is fully functional, though as_rasterio is still slow.

* Added unit tests for PySpark raster integration

* Added unit test for raster deserialization

* Fixed CI errors

* Don't shade JAI jars into the sedona-spark shaded jar; otherwise, JAI imageio won't work correctly.

* Added a test case for a Pandas UDF that takes a raster as a parameter

* Skip the Pandas UDF test if the Spark version is lower than 3.4

* Minor formatting fixes

* Fix a bug in as_rasterio() when retrieving the number of bands of an in-db Sedona raster

* Added toPandas() to raster serde test cases
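One commit above translates the raster's affine transform to account for pixel anchors and non-zero image origins. The following is a minimal sketch of that kind of translation, not Sedona's implementation. It assumes a hypothetical 6-tuple `(a, b, c, d, e, f)` with `x = a*col + b*row + c` and `y = d*col + e*row + f` (the same ordering the `affine` package uses), and a transform that may be anchored either at pixel centers or at pixel corners. The function name `translate_transform` is illustrative only.

```python
from typing import Tuple

# Hypothetical convention for this sketch (same ordering as the `affine` package):
# x = a*col + b*row + c,  y = d*col + e*row + f
Affine6 = Tuple[float, float, float, float, float, float]


def translate_transform(t: Affine6, min_x: int = 0, min_y: int = 0,
                        center_anchored: bool = False) -> Affine6:
    """Return a transform that maps array index (col, row) to the
    upper-left corner of that pixel."""
    a, b, c, d, e, f = t
    # Array element [0, 0] is pixel (min_x, min_y) in t's pixel space.
    # A center-anchored transform needs an extra half-pixel shift to reach the corner.
    shift = 0.5 if center_anchored else 0.0
    px, py = min_x - shift, min_y - shift
    # Only the translation terms change; scale and rotation are untouched.
    return (a, b, c + a * px + b * py, d, e, f + d * px + e * py)


# 10-unit pixels, north-up, transform anchored at pixel centers.
print(translate_transform((10.0, 0.0, 100.0, 0.0, -10.0, 500.0), center_anchored=True))
# → (10.0, 0.0, 95.0, 0.0, -10.0, 505.0)
```

Because the image origin and the half-pixel anchor shift are both just offsets in pixel space, they can be folded into a single pixel-space point `(px, py)` and pushed through the linear part once.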
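The commit implementing `as_numpy` for ComponentSampleModel and PixelInterleavedSampleModel relies on the Java AWT addressing rule for those models: sample `(x, y, b)` is stored at `y*scanlineStride + x*pixelStride + bandOffsets[b]` in its bank. A minimal numpy sketch of that gather is shown below. It assumes a single data bank (bank indices are ignored) and a flat sample buffer that has already been decoded; `component_as_numpy` is a hypothetical helper, not the function Sedona ships.

```python
import numpy as np


def component_as_numpy(data, width, height, pixel_stride, scanline_stride, band_offsets):
    """Gather a (bands, height, width) array from a single-bank, ComponentSampleModel-style
    buffer: sample (x, y, b) lives at y*scanline_stride + x*pixel_stride + band_offsets[b]."""
    flat = np.asarray(data)
    # Offset of the first sample of each pixel, shape (height, width).
    base = (np.arange(height)[:, None] * scanline_stride
            + np.arange(width)[None, :] * pixel_stride)
    # Fancy indexing copies each band's samples out of the flat buffer.
    return np.stack([flat[base + off] for off in band_offsets])


# Pixel-interleaved RGB, 2x1 pixels: pixel_stride = 3, scanline_stride = 6.
rgb = component_as_numpy([1, 2, 3, 4, 5, 6], width=2, height=1,
                         pixel_stride=3, scanline_stride=6, band_offsets=[0, 1, 2])
print(rgb.tolist())  # → [[[1, 4]], [[2, 5]], [[3, 6]]]
```

PixelInterleavedSampleModel is the special case where `pixel_stride` equals the number of bands, so the same index arithmetic covers both models.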
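For the SinglePixelPackedSampleModel deserializer, each data element (a Java `int`) holds all bands of one pixel, and each band is selected by a bit mask. A possible numpy sketch follows, assuming the elements are available as Python or numpy integers (possibly negative, since Java ints are signed) and that the buffer has no leading offset. The helper name `single_pixel_packed_as_numpy` is hypothetical.

```python
import numpy as np


def single_pixel_packed_as_numpy(data, width, height, scanline_stride, bit_masks):
    """Unpack SinglePixelPackedSampleModel-style data, where one integer holds all
    bands of a pixel, into a (bands, height, width) array."""
    # Java ints are signed; casting through int64 to uint32 reinterprets them
    # as unsigned 32-bit words so masking and shifting behave as in Java's >>>.
    elems = np.asarray(data, dtype=np.int64).astype(np.uint32)
    # Drop scanline padding beyond `width`.
    elems = elems.reshape(height, scanline_stride)[:, :width]
    bands = []
    for mask in bit_masks:
        # Bit offset of a band = number of trailing zero bits in its mask.
        shift = (mask & -mask).bit_length() - 1
        bands.append((elems & np.uint32(mask)) >> np.uint32(shift))
    return np.stack(bands)


# Two RGB pixels packed as 0xRRGGBB.
out = single_pixel_packed_as_numpy([0x112233, 0xAABBCC], 2, 1, 2,
                                   [0xFF0000, 0x00FF00, 0x0000FF])
print(out.tolist())  # → [[[17, 170]], [[34, 187]], [[51, 204]]]
```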
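MultiPixelPackedSampleModel works the other way around: several single-band pixels share one data element, with the leftmost pixel in the most significant bits. A minimal sketch, assuming byte-sized data elements (Java's `DataBuffer.TYPE_BYTE`), 1, 2, or 4 bits per pixel, and a data bit offset of zero. AWT also allows `ushort` and `int` elements, which this hypothetical `multi_pixel_packed_as_numpy` helper does not handle.

```python
import numpy as np


def multi_pixel_packed_as_numpy(data, width, height, scanline_stride, bits):
    """Unpack MultiPixelPackedSampleModel-style byte data (1, 2 or 4 bits per pixel,
    most significant bits first) into a (1, height, width) array."""
    rows = np.asarray(data, dtype=np.uint8).reshape(height, scanline_stride)
    # Right-shift amounts for the pixels inside one byte, leftmost pixel first.
    shifts = np.arange(8 - bits, -1, -bits).astype(np.uint8)
    mask = np.uint8((1 << bits) - 1)
    pixels = (rows[:, :, None] >> shifts) & mask  # shape (height, stride, 8 // bits)
    # Flatten each scanline, drop padding pixels beyond `width`, add a band axis.
    return pixels.reshape(height, -1)[None, :, :width]


# A 10-pixel, 1-bit scanline stored in 2 bytes.
print(multi_pixel_packed_as_numpy([0b10110000, 0b01000000], 10, 1, 2, 1).tolist())
# → [[[1, 0, 1, 1, 0, 0, 0, 0, 0, 1]]]
```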
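One commit above replaces per-sample `struct.unpack` calls with `numpy.frombuffer`. The sketch below contrasts the two approaches on a made-up payload: a little-endian 4-byte count header followed by float64 samples. That layout is an assumption chosen for illustration, not Sedona's actual serialization format, and both function names are hypothetical.

```python
import struct

import numpy as np


def decode_with_struct(buf: bytes, count: int, offset: int = 0) -> list:
    # One struct.unpack_from call per sample: simple, but slow for large bands.
    return [struct.unpack_from("<d", buf, offset + 8 * i)[0] for i in range(count)]


def decode_with_numpy(buf: bytes, count: int, offset: int = 0) -> np.ndarray:
    # One call reinterprets the whole byte range as little-endian float64.
    # The result is a read-only view over `buf`; no bytes are copied.
    return np.frombuffer(buf, dtype="<f8", count=count, offset=offset)


payload = struct.pack("<i3d", 3, 1.5, -2.0, 3.25)  # 4-byte count header, then samples
n = struct.unpack_from("<i", payload, 0)[0]
print(decode_with_struct(payload, n, offset=4))          # → [1.5, -2.0, 3.25]
print(decode_with_numpy(payload, n, offset=4).tolist())  # → [1.5, -2.0, 3.25]
```

The speedup comes from doing a single C-level reinterpretation of the buffer instead of one Python call per sample; the `offset` parameter lets the array start right after a header without slicing (and copying) the bytes first.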