[SEDONA-549] Make RS_Union_aggr support combining all bands of multi-band rasters #1375

Merged: 15 commits into apache:master on Apr 29, 2024

Conversation

prantogg
Contributor

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

What changes were proposed in this PR?

  • Allows combining all bands of multi-band rasters when aggregating using RS_Union_aggr

How was this patch tested?

  • Passes new and existing tests

Did this PR include necessary documentation updates?

  • Yes, I have updated the documentation.

@@ -1,9 +1,9 @@
## RS_Union_Aggr
Contributor

Can you provide some examples of using this function to stack rasters, including at least the example you are discussing in Slack for two RGB rasters? I'm unclear on how the index column controls stacking.

@prantogg requested a review from rbavery April 26, 2024 23:57
@prantogg marked this pull request as ready for review April 27, 2024 00:55
@prantogg requested a review from jiayuasu as a code owner April 27, 2024 00:55
@@ -1,9 +1,9 @@
## RS_Union_Aggr

-Introduction: Returns a raster containing bands by specified indexes from all rasters in the provided column. Extracts the first bands from each raster and combines them into the output raster based on the input index values.
+Introduction: This function combines multiple rasters into a single multiband raster by stacking the bands of each input raster sequentially. The function arranges the bands in the output raster according to the order specified by the index column in the input. It is typically used in scenarios where rasters are grouped by certain criteria (e.g., time or location) and an aggregated raster output is desired.
Contributor

Can we change this to "time and/or location"? Depending on how users have filtered the dataset before the groupby operation, they may need to group by both location and time in order to get, for example, exactly 4 time steps that span two years for the same geographic location.

```

Output:

-This output raster contains the first band of each raster in the `raster_table` at specified index.
+This output raster contains all bands of each raster in the `raster_table`.
Contributor

Can we show a groupby example as well? Grouping by time and geography, similar to what we are doing with the segmentation dataset prep?

@prantogg requested a review from rbavery April 29, 2024 16:27
```

Output:

-This output raster contains the first band of each raster in the `raster_table` at specified index.
+The query yields rasters grouped by geometry, each containing the first four time steps combined into a single multiband raster, where each band represents one time step.
Contributor

I think this is a great example and sufficient for merging. But can it be followed up by a PR showing how to group by different time intervals, instead of or in addition to taking the first four timesteps?

For example, we might want to group by a single year, then take the first 4 timesteps in that specific year.

Or we might want to sample scenes at a monthly or quarterly frequency. It would be good to show an example of grouping by an interval and sampling at an interval, matching what pandas offers.

pandas example:

import pandas as pd

# `df` is assumed to be an existing DataFrame with a 'date' column.
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

# Sample every quarter: keep the first observation in each quarter
quarterly_sample = df.resample('QS').first()

# Sample every month: keep the first observation in each month
monthly_sample = df.resample('MS').first()
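
To make the "group by a single year, then take the first 4 timesteps" idea concrete, here is a minimal standard-library sketch. It does not call Sedona or pandas; the record layout, the `first_n_per_year` helper, and the tile and raster names are all hypothetical and only illustrate the grouping logic.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical scene record: (location id, acquisition date, raster name).
Scene = Tuple[str, date, str]


def first_n_per_year(scenes: List[Scene], n: int = 4) -> Dict[Tuple[str, int], List[str]]:
    """Group scenes by (location, year) and keep the n earliest in each group."""
    groups: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    # Sort by location, then date, so each group fills in chronological order.
    for location, acquired, raster in sorted(scenes, key=lambda s: (s[0], s[1])):
        key = (location, acquired.year)
        if len(groups[key]) < n:
            groups[key].append(raster)
    return dict(groups)


scenes = [
    ("tile_a", date(2021, 11, 3), "a5"),
    ("tile_a", date(2021, 1, 15), "a1"),
    ("tile_a", date(2021, 4, 2), "a2"),
    ("tile_a", date(2021, 7, 20), "a3"),
    ("tile_a", date(2021, 9, 9), "a4"),
    ("tile_a", date(2022, 2, 1), "a6"),
    ("tile_b", date(2021, 3, 8), "b1"),
]
selected = first_n_per_year(scenes)
print(selected[("tile_a", 2021)])  # ['a1', 'a2', 'a3', 'a4']
```

Each resulting group would then correspond to one aggregated multiband raster, with the position in the list playing the role of the index column.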

Contributor Author

Thanks Ryan! I see your point, this is definitely a more realistic example. I've updated the example to show a groupby over a quarterly time interval in addition to geometry.
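
A grouping over a quarterly interval plus geometry could look roughly like the sketch below. This is a plain-Python illustration under assumed names (`quarter_of`, `group_key`, `indexed_groups`, and the sample scene tuples are hypothetical), not the SQL from the updated docs. It derives a (geometry, year, quarter) key and assigns a 1-based index within each group, which is the kind of value an index column would carry.

```python
from datetime import date
from itertools import groupby
from typing import Dict, List, Tuple

# Hypothetical scene record: (geometry id, acquisition date, raster name).
Scene = Tuple[str, date, str]


def quarter_of(day: date) -> int:
    """Return the calendar quarter (1-4) that a date falls in."""
    return (day.month - 1) // 3 + 1


def group_key(scene: Scene) -> Tuple[str, int, int]:
    """Key a scene by geometry, year, and quarter."""
    geometry, acquired, _ = scene
    return (geometry, acquired.year, quarter_of(acquired))


def indexed_groups(scenes: List[Scene]) -> Dict[Tuple[str, int, int], List[Tuple[int, str]]]:
    """Group scenes by key and number them chronologically within each group."""
    ordered = sorted(scenes, key=lambda s: (group_key(s), s[1]))
    result: Dict[Tuple[str, int, int], List[Tuple[int, str]]] = {}
    for key, members in groupby(ordered, key=group_key):
        result[key] = [(idx, s[2]) for idx, s in enumerate(members, start=1)]
    return result


scenes = [
    ("g1", date(2021, 3, 5), "s2"),
    ("g2", date(2021, 2, 2), "t1"),
    ("g1", date(2021, 4, 1), "s3"),
    ("g1", date(2021, 1, 10), "s1"),
]
groups = indexed_groups(scenes)
print(groups[("g1", 2021, 1)])  # [(1, 's1'), (2, 's2')]
```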

@prantogg requested a review from rbavery April 29, 2024 19:07
Contributor

@rbavery left a comment

Looks great, thanks for the doc updates!

@@ -1,9 +1,9 @@
## RS_Union_Aggr

-Introduction: Returns a raster containing bands by specified indexes from all rasters in the provided column. Extracts the first bands from each raster and combines them into the output raster based on the input index values.
+Introduction: This function combines multiple rasters into a single multiband raster by stacking the bands of each input raster sequentially. The function arranges the bands in the output raster according to the order specified by the index column in the input. It is typically used in scenarios where rasters are grouped by certain criteria (e.g., time and/or location) and an aggregated raster output is desired.
Member

Does this change the behavior of the old function? The old function only takes the first band of each input raster. Now it takes all bands of each raster? @prantogg

Contributor Author

Yes, it essentially stacks all bands of each raster in order of index.

Input dataframe:

idx | Raster
----|-----------------------
1   | raster1 (R1 | G1 | B1)
2   | raster2 (R2 | G2 | B2)
3   | raster3 (R3 | G3 | B3)

RS_Union_aggr(Raster, idx)

Resulting raster:

raster (R1 | G1 | B1 | R2 | G2 | B2 | R3 | G3 | B3)
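
The stacking order described above can be modeled in a few lines of plain Python. This is only a sketch of the semantics as stated in this thread (all bands of each raster, concatenated in index order); it does not use Sedona, and `union_aggr`, `Band`, and `Raster` are hypothetical stand-ins where a band is represented by its label.

```python
from typing import List, Sequence, Tuple

Band = str           # stand-in for a band's pixel data
Raster = List[Band]  # a raster modeled as an ordered list of bands


def union_aggr(rows: Sequence[Tuple[int, Raster]]) -> Raster:
    """Stack all bands of every raster, ordered by the index column."""
    stacked: Raster = []
    for _, bands in sorted(rows, key=lambda row: row[0]):
        stacked.extend(bands)
    return stacked


# Rows are given out of order on purpose: the index decides the position.
rows = [
    (2, ["R2", "G2", "B2"]),
    (1, ["R1", "G1", "B1"]),
    (3, ["R3", "G3", "B3"]),
]
result = union_aggr(rows)
print(result)
# ['R1', 'G1', 'B1', 'R2', 'G2', 'B2', 'R3', 'G3', 'B3']
```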

@jiayuasu added this to the sedona-1.6.0 milestone Apr 29, 2024
@jiayuasu merged commit 170871a into apache:master Apr 29, 2024
38 of 50 checks passed