-
Notifications
You must be signed in to change notification settings - Fork 655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SEDONA-493] Update default behavior of RS_NormalizeAll #1234
Conversation
docs/api/sql/Raster-operators.md
Outdated
- `minValue` and `maxValue` (Optional): Optionally, specific minimum and maximum values of the input raster can be provided. If not provided, these values are computed from the raster data. | ||
- `normalizeAcrossBands` (Optional): A boolean flag to determine the normalization method. If set to true (default), normalization is performed across all bands based on global min and max values. If false, each band is normalized individually based on its own min and max values. | ||
|
||
A safety mode is triggered when `noDataValue` is null, setting it to `maxLim` and normalizing data values to the range [minLim, maxLim-1]. This is to avoid replacing valid data that might coincide with the new `noDataValue`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does your current implementation allow null
as the input of a Sedona Spark (not just sedona-common)? Do we have a test?
If yes, then the sentence here should be when
noDataValueis null or not given). If no, it should be
when noDataValue
is not given`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an important consideration, thanks!
Current implementation allows noDataValue
to be set to null
in Sedona Spark as well. Confirmed this by testing it in Sedona-spark.
Have added tests for the same under Sedona-common. Would it be better to add tests under Sedona-spark as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accordingly updated the documentation too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prantogg https://github.com/apache/sedona/blob/master/spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/MapAlgebra.scala#L182
InferredExpression is null safe, which means it returns null directly if one of the input is null. Please double check.
In addition, RS_NormarlizeAll
should not be put in MapAlgebra.scala
. It should be put in RasterEditors.scala
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, I misunderstood
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InferredExpression is indeed null safe. Have updated the docs accordingly and moved RS_NormalizeAll
to RasterEditors.
docs/api/sql/Raster-operators.md
Outdated
A safety mode is triggered when `noDataValue` is not given. This sets `noDataValue` to `maxLim` and normalizes valid data values to the range [minLim, maxLim-1]. This is to avoid replacing valid data that might coincide with the new `noDataValue`. | ||
|
||
!!! Warning | ||
Using a noDataValue that falls within the normalization range can lead to loss of valid data. If any data value within a raster band matches the specified noDataValue, it will be replaced and cannot be distinguished or recovered later. Exercise caution in selecting a noDataValue to avoid unintentional data alteration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no indention here. Doc website won't show this correctly.
Did you read the Contributor Guide?
Is this PR related to a JIRA ticket?
[SEDONA-XXX] my subject
.What changes were proposed in this PR?
This PR, updates the default behavior of RS_NormalizeAll -
noDataValue
is null, setting it tomaxLim
and normalizing data values to the range [minLim, maxLim-1] to prevent the loss of valid data that matches the newnoDataValue
.How was this patch tested?
Did this PR include necessary documentation updates?