[SEDONA-549] Make RS_Union_aggr support combining all bands of multi-band rasters #1375
# Conflicts:
# spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/AggregateFunctions.scala
@@ -1,9 +1,9 @@
 ## RS_Union_Aggr
Can you provide some examples of using this function to stack rasters, including at least the example you were discussing in Slack with two RGB rasters? I'm unclear on how the index column controls stacking.
@@ -1,9 +1,9 @@
 ## RS_Union_Aggr

-Introduction: Returns a raster containing bands by specified indexes from all rasters in the provided column. Extracts the first bands from each raster and combines them into the output raster based on the input index values.
+Introduction: This function combines multiple rasters into a single multiband raster by stacking the bands of each input raster sequentially. The function arranges the bands in the output raster according to the order specified by the index column in the input. It is typically used in scenarios where rasters are grouped by certain criteria (e.g., time or location) and an aggregated raster output is desired.
Can we change this to "time and/or location"? Depending on how users have filtered the dataset before the groupby operation, they may need to group by both location and time to get, for example, exactly 4 time steps that span two years for the same geographic location.
 ```

 Output:

-This output raster contains the first band of each raster in the `raster_table` at specified index.
+This output raster contains all bands of each raster in the `raster_table`.
Can we show a groupby example as well? Grouping by time and geography, similar to what we are doing with the segmentation dataset prep?
 ```

 Output:

-This output raster contains the first band of each raster in the `raster_table` at specified index.
+The query yields rasters grouped by geometry, each containing the first four time steps combined into a single multiband raster, where each band represents one time step.
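A minimal sketch of the grouping this output describes, written in plain Python rather than Sedona SQL: scenes are grouped by a geometry key, ordered by acquisition date, and the first four become the bands of the stacked raster, with date order standing in for the index column. The row layout, tile IDs, and `first_n_time_steps` are hypothetical; the real query runs in Spark with `RS_Union_Aggr`.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (geometry_id, acquisition_date, raster_name).
SceneRow = Tuple[str, date, str]


def first_n_time_steps(rows: Iterable[SceneRow], n: int = 4) -> Dict[str, List[str]]:
    """Group scenes by geometry, order each group by date, keep the first n."""
    groups: Dict[str, List[Tuple[date, str]]] = defaultdict(list)
    for geom, when, raster in rows:
        groups[geom].append((when, raster))
    result: Dict[str, List[str]] = {}
    for geom, scenes in groups.items():
        scenes.sort()  # date order plays the role of the index column
        result[geom] = [raster for _, raster in scenes[:n]]
    return result


rows = [
    ("tile_a", date(2021, 3, 1), "a_mar"),
    ("tile_a", date(2021, 1, 1), "a_jan"),
    ("tile_a", date(2021, 2, 1), "a_feb"),
    ("tile_a", date(2021, 5, 1), "a_may"),
    ("tile_a", date(2021, 4, 1), "a_apr"),
    ("tile_b", date(2021, 1, 15), "b_jan"),
    ("tile_b", date(2021, 2, 15), "b_feb"),
]
print(first_n_time_steps(rows))
# {'tile_a': ['a_jan', 'a_feb', 'a_mar', 'a_apr'], 'tile_b': ['b_jan', 'b_feb']}
```

In this sketch, groups with fewer than four scenes (like `tile_b`) simply yield fewer bands; filtering them out beforehand is one way to guarantee exactly four time steps per output raster.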
I think this is a great example and sufficient for merging, but could it be followed up by a PR showing how to group by different time intervals, instead of or in addition to taking the first four time steps?
For example, we might want to group by a single year, then take the first 4 time steps in that specific year.
Or we might want to sample scenes at a monthly or quarterly frequency. It would be good to show an example of grouping by interval that matches the pandas resample intervals.
pandas example:
import pandas as pd
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
# Take the first scene in each calendar quarter (quarters never cross a year boundary)
quarterly_sample = df.resample('QS').first()
# Take the first scene in each calendar month
monthly_sample = df.resample('MS').first()
Thanks Ryan! I see your point; this is definitely a more realistic example. I've updated the example to show a groupby over a quarterly time interval in addition to geometry.
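As a rough model of the geometry-plus-quarter grouping described here, the sketch below builds a `(geometry, year, quarter)` key for each scene and orders scenes by date inside each bucket, so each bucket would become one multiband raster. It is plain Python under an assumed row layout; in Spark SQL the corresponding keys would come from the built-in `year()` and `quarter()` functions, and `group_by_geometry_and_quarter` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (geometry_id, acquisition_date, raster_name).
SceneRow = Tuple[str, date, str]
Key = Tuple[str, int, int]  # (geometry_id, year, quarter)


def quarter_of(d: date) -> int:
    """Calendar quarter (1-4) of a date."""
    return (d.month - 1) // 3 + 1


def group_by_geometry_and_quarter(rows: Iterable[SceneRow]) -> Dict[Key, List[str]]:
    """Bucket scenes by geometry and (year, quarter), ordered by date within each bucket."""
    buckets: Dict[Key, List[Tuple[date, str]]] = defaultdict(list)
    for geom, when, raster in rows:
        buckets[(geom, when.year, quarter_of(when))].append((when, raster))
    # Sorting by date gives each bucket its band order (the index column's role).
    return {key: [r for _, r in sorted(scenes)] for key, scenes in buckets.items()}


rows = [
    ("tile_a", date(2021, 2, 1), "a_feb"),
    ("tile_a", date(2021, 1, 1), "a_jan"),
    ("tile_a", date(2021, 4, 1), "a_apr"),
    ("tile_a", date(2022, 1, 1), "a_jan22"),
]
for key, bands in group_by_geometry_and_quarter(rows).items():
    print(key, bands)
# ('tile_a', 2021, 1) ['a_jan', 'a_feb']
# ('tile_a', 2021, 2) ['a_apr']
# ('tile_a', 2022, 1) ['a_jan22']
```

Including the year in the key keeps Q1 of 2021 and Q1 of 2022 in separate buckets, which matches grouping by year first and then sampling quarterly.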
Looks great, thanks for the doc updates!
@@ -1,9 +1,9 @@
 ## RS_Union_Aggr

-Introduction: Returns a raster containing bands by specified indexes from all rasters in the provided column. Extracts the first bands from each raster and combines them into the output raster based on the input index values.
+Introduction: This function combines multiple rasters into a single multiband raster by stacking the bands of each input raster sequentially. The function arranges the bands in the output raster according to the order specified by the index column in the input. It is typically used in scenarios where rasters are grouped by certain criteria (e.g., time and/or location) and an aggregated raster output is desired.
Does this change the behavior of the old function? The old function only takes the first band of each input raster. Now it takes all bands of each raster? @prantogg
Yes, it essentially stacks all bands of each raster in order of index.
Input DataFrame:

idx | Raster
--------------------------------
1   | raster1 (R1 | G1 | B1)
2   | raster2 (R2 | G2 | B2)
3   | raster3 (R3 | G3 | B3)

RS_Union_aggr(Raster, idx)

Resulting raster:

raster (R1 | G1 | B1 | R2 | G2 | B2 | R3 | G3 | B3)
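The stacking rule in this example can be modeled in plain Python. This is a sketch of the semantics only, not Sedona's implementation: each raster is an assumed list of band names, and `union_first_band` and `union_all_bands` are hypothetical helpers contrasting the previous behavior (first band of each raster) with the new one (every band), both ordered by `idx`.

```python
from typing import List, Sequence, Tuple

# Hypothetical model: each row is (idx, raster), where a raster is a list of band names.
Row = Tuple[int, List[str]]


def union_first_band(rows: Sequence[Row]) -> List[str]:
    """Previous behavior: keep only the first band of each raster, ordered by idx."""
    return [bands[0] for _, bands in sorted(rows, key=lambda r: r[0])]


def union_all_bands(rows: Sequence[Row]) -> List[str]:
    """New behavior: stack every band of each raster, ordered by idx."""
    stacked: List[str] = []
    for _, bands in sorted(rows, key=lambda r: r[0]):
        stacked.extend(bands)
    return stacked


rows = [
    (2, ["R2", "G2", "B2"]),
    (1, ["R1", "G1", "B1"]),
    (3, ["R3", "G3", "B3"]),
]
print(union_first_band(rows))  # ['R1', 'R2', 'R3']
print(union_all_bands(rows))   # ['R1', 'G1', 'B1', 'R2', 'G2', 'B2', 'R3', 'G3', 'B3']
```

The rows are deliberately listed out of order to show that, in this model, the index column rather than row order decides where each raster's bands land.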
Did you read the Contributor Guide?
Is this PR related to a JIRA ticket?
[SEDONA-XXX] my subject
What changes were proposed in this PR?
How was this patch tested?
Did this PR include necessary documentation updates?