Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SEDONA-493] Update default behavior of RS_NormalizeAll #1234

Merged
merged 6 commits into from
Feb 9, 2024

Conversation

prantogg
Copy link
Contributor

@prantogg prantogg commented Feb 8, 2024

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

What changes were proposed in this PR?

This PR, updates the default behavior of RS_NormalizeAll -

  • A safety mode is introduced for cases where the noDataValue is null, setting it to maxLim and normalizing data values to the range [minLim, maxLim-1] to prevent the loss of valid data that matches the new noDataValue.
  • Additionally, refactors the function signatures to facilitate invoking safety mode for both normalization methods - individually on bands and across bands.

How was this patch tested?

  • Passes new and existing tests

Did this PR include necessary documentation updates?

  • Yes, I have updated the documentation update.

@prantogg prantogg changed the title [SEDONA-493][EWT-586] Update default behavior of RS_NormalizeAll [SEDONA-493] Update default behavior of RS_NormalizeAll Feb 8, 2024
- `minValue` and `maxValue` (Optional): Optionally, specific minimum and maximum values of the input raster can be provided. If not provided, these values are computed from the raster data.
- `normalizeAcrossBands` (Optional): A boolean flag to determine the normalization method. If set to true (default), normalization is performed across all bands based on global min and max values. If false, each band is normalized individually based on its own min and max values.

A safety mode is triggered when `noDataValue` is null, setting it to `maxLim` and normalizing data values to the range [minLim, maxLim-1]. This is to avoid replacing valid data that might coincide with the new `noDataValue`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does your current implementation allow null as the input of a Sedona Spark (not just sedona-common)? Do we have a test?

If yes, then the sentence here should be when noDataValueis null or not given). If no, it should bewhen noDataValue is not given`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an important consideration, thanks!
Current implementation allows noDataValue to be set to null in Sedona Spark as well. Confirmed this by testing it in Sedona-spark.
Have added tests for the same under Sedona-common. Would it be better to add tests under Sedona-spark as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accordingly updated the documentation too

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prantogg https://github.com/apache/sedona/blob/master/spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/MapAlgebra.scala#L182

InferredExpression is null safe, which means it returns null directly if one of the input is null. Please double check.

In addition, RS_NormarlizeAll should not be put in MapAlgebra.scala. It should be put in RasterEditors.scala.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I misunderstood

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InferredExpression is indeed null safe. Have updated the docs accordingly and moved RS_NormalizeAll to RasterEditors.

A safety mode is triggered when `noDataValue` is not given. This sets `noDataValue` to `maxLim` and normalizes valid data values to the range [minLim, maxLim-1]. This is to avoid replacing valid data that might coincide with the new `noDataValue`.

!!! Warning
Using a noDataValue that falls within the normalization range can lead to loss of valid data. If any data value within a raster band matches the specified noDataValue, it will be replaced and cannot be distinguished or recovered later. Exercise caution in selecting a noDataValue to avoid unintentional data alteration.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no indention here. Doc website won't show this correctly.

@jiayuasu jiayuasu merged commit 89e17c8 into apache:master Feb 9, 2024
47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants