[SEDONA-549] Make RS_Union_aggr support combining all bands of multi-band rasters #1375
Changes from 14 commits
8d8f174
f669d03
635ac12
5e131f3
075392a
72260c7
a5147fd
f7d52cc
6d652d2
556b92e
f5089ff
4ba3de3
aaa8b6d
63f9598
48f2fc2
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,9 @@ | ||
## RS_Union_Aggr | ||
|
||
Introduction: Returns a raster containing bands by specified indexes from all rasters in the provided column. Extracts the first bands from each raster and combines them into the output raster based on the input index values. | ||
Introduction: This function combines multiple rasters into a single multiband raster by stacking the bands of each input raster sequentially. The function arranges the bands in the output raster according to the order specified by the index column in the input. It is typically used in scenarios where rasters are grouped by certain criteria (e.g., time and/or location) and an aggregated raster output is desired. | ||
Does this change the behavior of the old function? The old function only takes the first band of all input raster. Now it takes all bands of each raster? @prantogg
Yes, it essentially stacks all bands of each raster in order of Input dataframe -
resultant raster -
|
||
|
||
!!!Note | ||
RS_Union_Aggr can take multiple banded rasters as input, but it would only extract the first band to the resulting raster. RS_Union_Aggr expects the following input, if not satisfied then will throw an IllegalArgumentException: | ||
RS_Union_Aggr expects the following input, if not satisfied then will throw an IllegalArgumentException: | ||
|
||
- Indexes to be in an arithmetic sequence without any gaps. | ||
- Indexes to be unique and not repeated. | ||
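The two index requirements above can be modeled with a small standalone check. This is only a sketch under stated assumptions: `validateIndexes` is a hypothetical helper, not Sedona's implementation, and it assumes "arithmetic sequence without any gaps" means consecutive integers with a step of 1.

```scala
object IndexCheckSketch {
  // Hypothetical helper, not part of Sedona. Returns the indexes in sorted
  // order when they are unique and consecutive (assumed step of 1);
  // otherwise `require` throws IllegalArgumentException.
  def validateIndexes(indexes: Seq[Int]): Seq[Int] = {
    require(indexes.nonEmpty, "at least one index is required")
    val sorted = indexes.sorted
    require(sorted.distinct.length == sorted.length, "indexes must be unique")
    // Compare each index with its successor; a single index passes trivially.
    val consecutive = sorted.zip(sorted.tail).forall { case (a, b) => b - a == 1 }
    require(consecutive, "indexes must form a sequence without gaps")
    sorted
  }
}
```

Under these assumptions, `Seq(3, 1, 2)` passes, while `Seq(1, 2, 4)` (gap) and `Seq(1, 1, 2)` (repeat) are rejected.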
|
@@ -13,30 +13,55 @@ Format: `RS_Union_Aggr(A: rasterColumn, B: indexColumn)` | |
|
||
Since: `v1.5.1` | ||
|
||
SQL Example | ||
SQL Example: | ||
|
||
Contents of `raster_table`. | ||
First, define a window specification that partitions by geographic location and orders by time. This will prepare your data by assigning an index to each raster based on its timestamp within each location group. | ||
|
||
``` | ||
+------------------------------+-----+ | ||
| raster|index| | ||
+------------------------------+-----+ | ||
|GridCoverage2D["geotiff_cov...| 1| | ||
|GridCoverage2D["geotiff_cov...| 2| | ||
|GridCoverage2D["geotiff_cov...| 3| | ||
|GridCoverage2D["geotiff_cov...| 4| | ||
|GridCoverage2D["geotiff_cov...| 5| | ||
+------------------------------+-----+ | ||
val windowSpec = Window.partitionBy("geometry").orderBy("timestamp") | ||
val indexedRasters = df.withColumn("index", row_number().over(windowSpec)) | ||
|
||
indexedRasters.show() | ||
``` | ||
|
||
The indexed rasters will appear as follows, showing that each raster is tagged with a sequential index (ordered by timestamp) within its group (geometry). | ||
|
||
``` | ||
+-------------------+------------------------------+--------------+-----+ | ||
| timestamp| raster| geometry|index| | ||
+-------------------+------------------------------+--------------+-----+ | ||
|2021-01-01T00:00:00|GridCoverage2D["geotiff_cov...|POINT (72 120)| 1| | ||
|2021-01-02T00:00:00|GridCoverage2D["geotiff_cov...|POINT (72 120)| 2| | ||
|2021-01-03T00:00:00|GridCoverage2D["geotiff_cov...|POINT (72 120)| 3| | ||
|2021-01-04T00:00:00|GridCoverage2D["geotiff_cov...|POINT (72 120)| 4| | ||
|2021-01-05T00:00:00|GridCoverage2D["geotiff_cov...|POINT (72 120)| 5| | ||
|2021-01-02T00:00:00|GridCoverage2D["geotiff_cov...|POINT (84 132)| 1| | ||
|2021-01-03T00:00:00|GridCoverage2D["geotiff_cov...|POINT (84 132)| 2| | ||
|2021-01-04T00:00:00|GridCoverage2D["geotiff_cov...|POINT (84 132)| 3| | ||
|2021-01-05T00:00:00|GridCoverage2D["geotiff_cov...|POINT (84 132)| 4| | ||
|2021-01-06T00:00:00|GridCoverage2D["geotiff_cov...|POINT (84 132)| 5| | ||
|2021-01-07T00:00:00|GridCoverage2D["geotiff_cov...|POINT (84 132)| 6| | ||
+-------------------+------------------------------+--------------+-----+ | ||
``` | ||
|
||
To create a stacked raster by grouping on geometry. | ||
|
||
``` | ||
SELECT RS_Union_Aggr(raster, index) FROM raster_table | ||
SELECT geometry, RS_Union_Aggr(raster, index) AS raster, RS_NumBands(raster) AS Num_Bands | ||
FROM indexedRasters | ||
WHERE index <= 4 | ||
GROUP BY geometry | ||
``` | ||
|
||
Output: | ||
|
||
This output raster contains the first band of each raster in the `raster_table` at specified index. | ||
The query yields rasters grouped by geometry, each containing the first four time steps combined into a single multiband raster, where each band represents one time step. | ||
I think this is a great example and sufficient for merging. But can it be followed up by a PR showing how to groupby different time intervals, instead of or in addition to taking the first four timesteps? For example, we might want to group by a single year, then take the first 4 timesteps in that specific year. Or, we might also want to sample scenes at a monthly, or year-quarterly frequency. Would be good to show an example of group by interval that matches pandas and sample interval pandas example:
Thanks Ryan! I see your point, this is definitely a more realistic example. I've updated the example to show a groupby over a quarterly time interval in addition to geometry.
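A rough sketch of the quarterly grouping discussed above, in plain Scala without Spark. It assumes each scene has a location and a timestamp, and assigns a 1-based index by timestamp within each (geometry, year, quarter) group, which is roughly what a window partitioned by those three keys and ordered by timestamp would produce. `Scene`, `quarterKey`, and `indexWithinQuarter` are hypothetical names used only for illustration.

```scala
import java.time.LocalDateTime
import java.time.temporal.IsoFields

object QuarterlyIndexSketch {
  // Toy record standing in for one row of the raster dataframe (assumption).
  final case class Scene(geometry: String, timestamp: LocalDateTime)

  // Group key: location plus calendar year and quarter (1 to 4).
  def quarterKey(s: Scene): (String, Int, Int) =
    (s.geometry, s.timestamp.getYear, s.timestamp.get(IsoFields.QUARTER_OF_YEAR))

  // Within each group, order scenes by timestamp and number them from 1.
  def indexWithinQuarter(scenes: Seq[Scene]): Map[(String, Int, Int), Seq[(Int, Scene)]] =
    scenes.groupBy(quarterKey).map { case (key, group) =>
      val ordered = group.sortWith((x, y) => x.timestamp.isBefore(y.timestamp))
      key -> ordered.zipWithIndex.map { case (s, i) => (i + 1, s) }
    }
}
```

Each group's indexes start at 1 and have no gaps, so they satisfy the input requirements that RS_Union_Aggr states for its index column.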
||
|
||
``` | ||
GridCoverage2D["geotiff_coverage", GeneralEnvel... | ||
+--------------+--------------------+---------+ | ||
| geometry| raster|Num_Bands| | ||
+--------------+--------------------+---------+ | ||
|POINT (72 120)|GridCoverage2D["g...| 4| | ||
|POINT (84 132)|GridCoverage2D["g...| 4| | ||
+--------------+--------------------+---------+ | ||
``` |
Can you provide some examples of using this function to stack rasters, using at least the example you are discussing in slack for two RGB rasters? I'm unclear on how the index column controls stacking.
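A toy sketch of how the index column could control stacking for two RGB rasters, assuming the behavior described in this PR (all bands of each raster are stacked, in index order). It does not call Sedona; `ToyRaster` and `stack` are hypothetical stand-ins that track only band labels.

```scala
object StackingSketch {
  // Toy stand-in for a raster: a name and an ordered list of band labels.
  final case class ToyRaster(name: String, bands: Seq[String])

  // Sort the inputs by index, then concatenate their bands in that order.
  def stack(indexed: Seq[(Int, ToyRaster)]): Seq[String] =
    indexed.sortBy(_._1).flatMap { case (_, r) => r.bands.map(b => s"${r.name}.$b") }

  def main(args: Array[String]): Unit = {
    val a = ToyRaster("A", Seq("R", "G", "B"))
    val b = ToyRaster("B", Seq("R", "G", "B"))
    // Rows arrive out of order; the index decides the band order.
    println(stack(Seq(2 -> b, 1 -> a)).mkString(", "))
    // prints: A.R, A.G, A.B, B.R, B.G, B.B
  }
}
```

In this model the output has six bands: the three bands of the raster with index 1 come first, followed by the three bands of the raster with index 2, regardless of row order in the input.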