Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SEDONA-475] Add RS_NormalizeAll #1221

Merged
merged 9 commits into from
Feb 3, 2024

Conversation

prantogg
Copy link
Contributor

@prantogg prantogg commented Jan 30, 2024

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

What changes were proposed in this PR?

Adds RS_NormalizeAll, to normalize values in all bands of a raster between a given normalization range.
Key features -

  • Flexible Normalization Range: Optionally specify normalization range through minLim and maxLim. By default, the values are normalized to range [0, 255].
  • Handling Uniform Bands: For bands where all pixel values are identical, normalized values are set to lower limit of normalization range.
  • Handling NoDataValues: Allows defining noDataValue to be used for missing or invalid data in raster bands. By default, noDataValue is set to maxLim.
  • Optional Min/Max range of raster: Min and Max range of the input raster's pixel values can be specified via minValue and maxValue to skip computation of them during runtime.
  • Data Type Integrity: The function maintains the data type of raster values ensuring normalized values are cast back to the original raster's data type.
  • Normalization across bands: normalizeAcrossBands flag, when set, normalization is performed across all bands based on global min and max values. If false, each band is normalized individually based on its own min and max values.

How was this patch tested?

Passes new and existing tests

Did this PR include necessary documentation updates?

@prantogg prantogg changed the title [SEDONA-475] Add RS_NormalizeAll [SEDONA-475][EWT-558] Add RS_NormalizeAll Jan 30, 2024
@prantogg prantogg changed the title [SEDONA-475][EWT-558] Add RS_NormalizeAll [SEDONA-475] Add RS_NormalizeAll Jan 30, 2024
@jiayuasu jiayuasu added this to the sedona-1.6.0 milestone Jan 31, 2024

// Normalize the band values
for (int i = 0; i < bandValues.length; i++) {
bandValues[i] = minLim + ((bandValues[i] - minValue) * (maxLim - minLim)) / (maxValue - minValue);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if all band values are the same. Then the denominator becomes zero and hence will lead to unexpected behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if all band values are zero?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens to the NoDataValue of a raster after normalization? @rbavery any experience?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing with @rbavery, we should exclude NoDataValue when performing the normalization. Moreover, we should provide an optional parameter for users to set the new NoDataValue.

double normalizedMax1 = Arrays.stream(normalizedBand1).max().getAsDouble();
double normalizedMin2 = Arrays.stream(normalizedBand2).min().getAsDouble();
double normalizedMax2 = Arrays.stream(normalizedBand2).max().getAsDouble();
double[] expected1 = {0.0, 17.0, 34.0, 51.0, 68.0, 85.0, 102.0, 119.0, 136.0, 153.0, 170.0, 187.0, 204.0, 221.0, 238.0, 255.0};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Please add tests for bands that have all same values.
  2. Please add tests for rasters that have different pixel types. NormalizeAll is not supposed to change the pixel types. Moreover, the data truncation should happen as expected. If the input raster is integer, the normalized output raster should still have integer values, which means some decimal places derived from normalization will get truncated. One example is from MapAlgebra: https://sedona.apache.org/1.5.1/api/sql/Raster-map-algebra/#map-algebra

// Find min and max values in the band
double minValue = Arrays.stream(bandValues).min().orElse(Double.NaN);
double maxValue = Arrays.stream(bandValues).max().orElse(Double.NaN);
System.out.println("minValue: "+minValue);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not commit System.out.println

Copy link
Member

@jiayuasu jiayuasu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In short, this function should have the following variants:

RS_NormalizeAll (raster: Raster)

RS_NormalizeAll (raster: Raster, minLim: Double, maxLim: Double)

RS_NormalizeAll (raster: Raster, minLim: Double, maxLim: Double, NoDataValue:Double)

RS_NormalizeAll (raster: Raster, minLim: Double, maxLim: Double, NoDataValue:Double, minValue:Double, maxValue:Double)


if (minValue == maxValue) {
// If all values in the band are same - set middle value of range as default value.
double defaultValue = (minLim + maxLim) / 2.0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this happens, please choose minLim as the default value.

Please describe the behavior in the document.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Have described the current behavior in the PR description above.

@@ -437,6 +437,90 @@ public static double[] normalize(double[] band) {
return result;
}

public static GridCoverage2D normalizeAll(GridCoverage2D rasterGeom) {
return normalizeAll(rasterGeom, 0d, 255d, -9999, -99999, -99999);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use magic numbers here. Can you use null?

// Get the band values as an array
double[] bandValues = bandAsArray(rasterGeom, bandIndex);

// Find min and max values in the band, excluding NoDataValue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should fetch the NoDataValue from each band. There are operators functions in RasterUtils can already do it. The NoDataValue of the input is to be used when storing in the raster band.

}

// Unset minmaxFlag if minValue and maxValue are not given;
boolean minmaxFlag = minValue != -99999;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check null instead of a magic number.

// Normalize the band values, setting NoDataValue to maxLim
for (int i = 0; i < bandValues.length; i++) {
if (bandValues[i] == noDataValue) {
bandValues[i] = maxLim;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the band value equals to the no data value of a band, then you should set it as noDataValue from the input.

If the input noDataValue is not given, then set it to maxLim.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiayuasu setting noDataValue to the maxLim by default will corrupt real data values.

instead I think the default should be to scale real values from 0, 254 and set the noDataValue to 255 if it is not given.

Or, scale real values between 1-255, set noDataValue to 0 by default. this makes more sense to me, 0 is used as a noDataValue more often than 255.

@jiayuasu jiayuasu merged commit adebdac into apache:master Feb 3, 2024
47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants