Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SEDONA-475] Add RS_NormalizeAll #1221

Merged
merged 9 commits into from
Feb 3, 2024
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -437,6 +437,90 @@ public static double[] normalize(double[] band) {
return result;
}

public static GridCoverage2D normalizeAll(GridCoverage2D rasterGeom) {
return normalizeAll(rasterGeom, 0d, 255d, -9999, -99999, -99999);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use magic numbers here. Can you use null?

}

public static GridCoverage2D normalizeAll(GridCoverage2D rasterGeom, double minLim, double maxLim) {
return normalizeAll(rasterGeom, minLim, maxLim, -9999, -99999, -99999);
}

public static GridCoverage2D normalizeAll(GridCoverage2D rasterGeom, double minLim, double maxLim, double noDataValue) {
return normalizeAll(rasterGeom, minLim, maxLim, noDataValue, -99999, -99999);
}

/**
*
* @param rasterGeom Raster to be normalized
* @param minLim Lower limit of normalization range
* @param maxLim Upper limit of normalization range
* @param noDataValue NoDataValue used in raster
* @param minValue Minimum value in raster
* @param maxValue Maximum value in raster
* @return a raster with all values in all bands normalized between minLim and maxLim
*/
public static GridCoverage2D normalizeAll(GridCoverage2D rasterGeom, double minLim, double maxLim, double noDataValue, double minValue, double maxValue) {
if (minLim > maxLim) {
throw new IllegalArgumentException("minLim cannot be greater than maxLim");
}

// Unset minmaxFlag if minValue and maxValue are not given;
boolean minmaxFlag = minValue != -99999;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check null instead of a magic number.


int numBands = rasterGeom.getNumSampleDimensions();
RenderedImage renderedImage = rasterGeom.getRenderedImage();
int rasterDataType = renderedImage.getSampleModel().getDataType();

for (int bandIndex = 1; bandIndex <= numBands; bandIndex++) {
// Get the band values as an array
double[] bandValues = bandAsArray(rasterGeom, bandIndex);

// Find min and max values in the band, excluding NoDataValue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should fetch the NoDataValue from each band. There are operators functions in RasterUtils can already do it. The NoDataValue of the input is to be used when storing in the raster band.

if (!minmaxFlag) {
minValue = Arrays.stream(bandValues).filter(val -> val != noDataValue).min().orElse(Double.NaN);
maxValue = Arrays.stream(bandValues).filter(val -> val != noDataValue).max().orElse(Double.NaN);
}

if (minValue == maxValue) {
jiayuasu marked this conversation as resolved.
Show resolved Hide resolved
// Set default value for constant bands to minLim
Arrays.fill(bandValues, minLim);
} else {
// Normalize the band values, setting NoDataValue to maxLim
for (int i = 0; i < bandValues.length; i++) {
if (bandValues[i] == noDataValue) {
bandValues[i] = maxLim;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the band value equals to the no data value of a band, then you should set it as noDataValue from the input.

If the input noDataValue is not given, then set it to maxLim.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiayuasu setting noDataValue to the maxLim by default will corrupt real data values.

instead I think the default should be to scale real values from 0, 254 and set the noDataValue to 255 if it is not given.

Or, scale real values between 1-255, set noDataValue to 0 by default. this makes more sense to me, 0 is used as a noDataValue more often than 255.

} else {
double normalizedValue = minLim + ((bandValues[i] - minValue) * (maxLim - minLim)) / (maxValue - minValue);
bandValues[i] = castRasterDataType(normalizedValue, rasterDataType);
}
}
}

// Update the raster with the normalized band
rasterGeom = addBandFromArray(rasterGeom, bandValues, bandIndex);
}

return rasterGeom;
}

private static double castRasterDataType(double value, int dataType) {
switch (dataType) {
case DataBuffer.TYPE_BYTE:
return (byte) value;
case DataBuffer.TYPE_SHORT:
return (short) value;
case DataBuffer.TYPE_INT:
return (int) value;
case DataBuffer.TYPE_USHORT:
return (char) value;
case DataBuffer.TYPE_FLOAT:
return (float) value;
case DataBuffer.TYPE_DOUBLE:
default:
return value;
}
}

/**
* @param band1 band values
* @param band2 band values
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,8 @@
import org.opengis.referencing.FactoryException;

import java.awt.image.DataBuffer;
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.Random;

import static org.junit.Assert.*;
Expand Down Expand Up @@ -321,6 +323,107 @@ public void testNormalize() {
assertArrayEquals(expected, actual, 0.1d);
}

@Test
public void testNormalizeAll() throws FactoryException {
GridCoverage2D raster1 = RasterConstructors.makeEmptyRaster(2, 4, 4, 0, 0, 1);
GridCoverage2D raster2 = RasterConstructors.makeEmptyRaster(2, 4, 4, 0, 0, 1);
GridCoverage2D raster3 = RasterConstructors.makeEmptyRaster(2, "I", 4, 4, 0, 0, 1);
GridCoverage2D raster4 = RasterConstructors.makeEmptyRaster(2, 4, 4, 0, 0, 1);

for (int band = 1; band <= 2; band++) {
double[] bandValues1 = new double[4 * 4];
double[] bandValues2 = new double[4 * 4];
double[] bandValues3 = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,-9999};
double[] bandValues4 = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,0};
for (int i = 0; i < bandValues1.length; i++) {
bandValues1[i] = (i) * band;
bandValues2[i] = (1) * (band-1);
}
raster1 = MapAlgebra.addBandFromArray(raster1, bandValues1, band);
raster2 = MapAlgebra.addBandFromArray(raster2, bandValues2, band);
raster3 = MapAlgebra.addBandFromArray(raster3, bandValues3, band);
raster4 = MapAlgebra.addBandFromArray(raster4, bandValues4, band);
}

GridCoverage2D normalizedRaster1 = MapAlgebra.normalizeAll(raster1);
GridCoverage2D normalizedRaster2 = MapAlgebra.normalizeAll(raster1, 256d, 511d);
GridCoverage2D normalizedRaster3 = MapAlgebra.normalizeAll(raster2);
GridCoverage2D normalizedRaster4 = MapAlgebra.normalizeAll(raster3, 0, 255);
GridCoverage2D normalizedRaster5 = MapAlgebra.normalizeAll(raster4, 0, 255, 0, 1, 15);

for (int band = 1; band <= 2; band++) {
double[] normalizedBand1 = MapAlgebra.bandAsArray(normalizedRaster1, band);
double[] normalizedBand2 = MapAlgebra.bandAsArray(normalizedRaster2, band);
double[] normalizedBand3 = MapAlgebra.bandAsArray(normalizedRaster3, band);
double[] normalizedBand4 = MapAlgebra.bandAsArray(normalizedRaster4, band);
double[] normalizedBand5 = MapAlgebra.bandAsArray(normalizedRaster5, band);
double normalizedMin1 = Arrays.stream(normalizedBand1).min().getAsDouble();
double normalizedMax1 = Arrays.stream(normalizedBand1).max().getAsDouble();
double normalizedMin2 = Arrays.stream(normalizedBand2).min().getAsDouble();
double normalizedMax2 = Arrays.stream(normalizedBand2).max().getAsDouble();
double[] expected1 = {0.0, 17.0, 34.0, 51.0, 68.0, 85.0, 102.0, 119.0, 136.0, 153.0, 170.0, 187.0, 204.0, 221.0, 238.0, 255.0};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Please add tests for bands that have all same values.
  2. Please add tests for rasters that have different pixel types. NormalizeAll is not supposed to change the pixel types. Moreover, the data truncation should happen as expected. If the input raster is integer, the normalized output raster should still have integer values, which means some decimal places derived from normalization will get truncated. One example is from MapAlgebra: https://sedona.apache.org/1.5.1/api/sql/Raster-map-algebra/#map-algebra

double[] expected2 = {256.0, 273.0, 290.0, 307.0, 324.0, 341.0, 358.0, 375.0, 392.0, 409.0, 426.0, 443.0, 460.0, 477.0, 494.0, 511.0};
double[] expected3 = {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0};
double[] expected4 = {0.0, 18.0, 36.0, 54.0, 72.0, 91.0, 109.0, 127.0, 145.0, 163.0, 182.0, 200.0, 218.0, 236.0, 255.0, 255.0};
double[] expected5 = {0.0, 18.214285714285715, 36.42857142857143, 54.642857142857146, 72.85714285714286, 91.07142857142857, 109.28571428571429, 127.5, 145.71428571428572, 163.92857142857142, 182.14285714285714, 200.35714285714286, 218.57142857142858, 236.78571428571428, 255.0, 255.0};

assertEquals(0d, normalizedMin1, 0.01d);
assertEquals(255d, normalizedMax1, 0.01d);
assertEquals(256d, normalizedMin2, 0.01d);
assertEquals(511d, normalizedMax2, 0.01d);
assertEquals(Arrays.toString(expected1), Arrays.toString(normalizedBand1));
assertEquals(Arrays.toString(expected2), Arrays.toString(normalizedBand2));
assertEquals(Arrays.toString(expected3), Arrays.toString(normalizedBand3));
assertEquals(Arrays.toString(expected4), Arrays.toString(normalizedBand4));
assertEquals(Arrays.toString(expected5), Arrays.toString(normalizedBand5));
}
}

@Test
public void testNormalizeAll2() throws FactoryException {
String[] pixelTypes = {"B", "I", "S", "US", "F", "D"}; // Byte, Integer, Short, Unsigned Short, Float, Double
for (String pixelType : pixelTypes) {
testNormalizeAll2(10, 10, pixelType);
}
}

private void testNormalizeAll2(int width, int height, String pixelType) throws FactoryException {
// Create an empty raster with the specified pixel type
GridCoverage2D raster = RasterConstructors.makeEmptyRaster(1, pixelType, width, height, 10, 20, 1);

// Fill raster
double[] bandValues = new double[width * height];
for (int i = 0; i < bandValues.length; i++) {
bandValues[i] = i;
}
raster = MapAlgebra.addBandFromArray(raster, bandValues, 1);

GridCoverage2D normalizedRaster = MapAlgebra.normalizeAll(raster, 0, 255);

// Check the normalized values and data type
double[] normalizedBandValues = MapAlgebra.bandAsArray(normalizedRaster, 1);
for (int i = 0; i < bandValues.length; i++) {
double expected = (bandValues[i] - 0) * (255 - 0) / (99 - 0);
double actual = normalizedBandValues[i];
switch (normalizedRaster.getRenderedImage().getSampleModel().getDataType()) {
case DataBuffer.TYPE_BYTE:
case DataBuffer.TYPE_SHORT:
case DataBuffer.TYPE_USHORT:
case DataBuffer.TYPE_INT:
assertEquals((int) expected, (int) actual);
break;
default:
assertEquals(expected, actual, 0.01);
}
}

// Assert the data type remains as expected
int resultDataType = normalizedRaster.getRenderedImage().getSampleModel().getDataType();
int expectedDataType = RasterUtils.getDataTypeCode(pixelType);
assertEquals(expectedDataType, resultDataType);
}


@Test
public void testNormalizedDifference() {
double[] band1 = new double[] {960, 1067, 107, 20, 1868};
Expand Down
34 changes: 34 additions & 0 deletions docs/api/sql/Raster-operators.md
Original file line number Diff line number Diff line change
Expand Up @@ -2480,6 +2480,40 @@ Spark SQL Example:
SELECT RS_Normalize(band)
```

### RS_NormalizeAll

Introduction: Normalizes values in all bands of a raster between a given normalization range. By default, the values are normalized to range [0, 255]. RS_NormalizeAll can take upto 6 of the following arguments.

- `raster`: The raster to be normalized.
- `minLim` and `maxLim` (Optional): The lower and upper limits of the normalization range. By default, normalization range is set to [0, 255].
- `noDataValue` (Optional): NoDataValues in rasters represent missing or invalid data. By default, -9999 is taken as NoDataValue.
- `minValue` and `maxValue` (Optional): Optionally, specific minimum and maximum values of the input raster can be provided. If not provided, these values are computed from the raster data.

!!!Note
The function maintains the data type of the raster values. It ensures that the normalized values are cast back to the original data type of each band in the raster.

Formats:
```
RS_NormalizeAll (raster: Raster)`
```
```
RS_NormalizeAll (raster: Raster, minLim: Double, maxLim: Double)
```
```
RS_NormalizeAll (raster: Raster, minLim: Double, maxLim: Double, noDataValue: Double)
```
```
RS_NormalizeAll (raster: Raster, minLim: Double, maxLim: Double, noDataValue: Double, minValue: Double, maxValue: Double)
```

Since: `v1.6.0`

Spark SQL Example:

```sql
SELECT RS_NormalizeAll(raster, 0, 1)
```

### RS_NormalizedDifference

Introduction: Returns Normalized Difference between two bands(band2 and band1) in a Geotiff image(example: NDVI, NDBI)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -199,6 +199,7 @@ object Catalog {
function[RS_LogicalOver](),
function[RS_Array](),
function[RS_Normalize](),
function[RS_NormalizeAll](),
function[RS_AddBandFromArray](),
function[RS_BandAsArray](),
function[RS_MapAlgebra](null),
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -179,6 +179,14 @@ case class RS_Normalize(inputExpressions: Seq[Expression]) extends InferredExpre
}
}

case class RS_NormalizeAll(inputExpressions: Seq[Expression]) extends InferredExpression(
inferrableFunction1(MapAlgebra.normalizeAll), inferrableFunction3(MapAlgebra.normalizeAll), inferrableFunction4(MapAlgebra.normalizeAll), inferrableFunction6(MapAlgebra.normalizeAll)
) {
protected def withNewChildrenInternal(newChildren: IndexedSeq[Expression]) = {
copy(inputExpressions = newChildren)
}
}

case class RS_AddBandFromArray(inputExpressions: Seq[Expression])
extends InferredExpression(nullTolerantInferrableFunction3(MapAlgebra.addBandFromArray), nullTolerantInferrableFunction4(MapAlgebra.addBandFromArray), inferrableFunction2(MapAlgebra.addBandFromArray)) {
protected def withNewChildrenInternal(newChildren: IndexedSeq[Expression]) = {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -216,6 +216,15 @@ class rasteralgebraTest extends TestBaseScala with BeforeAndAfter with GivenWhen
assert(df.first().getAs[mutable.WrappedArray[Double]](0)(1) == 255)
}

it("should pass RS_NormalizeAll") {
var df = sparkSession.read.format("binaryFile").load(resourceFolder + "raster/test1.tiff")
df = df.selectExpr("RS_FromGeoTiff(content) as raster")
val result1 = df.selectExpr("RS_NormalizeAll(raster, 0, 255) as normalized").first().get(0)
val result2 = df.selectExpr("RS_NormalizeAll(raster, 0, 255, 0) as normalized").first().get(0)
assert(result1.isInstanceOf[GridCoverage2D])
assert(result2.isInstanceOf[GridCoverage2D])
}

it("should pass RS_Array") {
val df = sparkSession.sql("SELECT RS_Array(6, 1e-6) as band")
val result = df.first().getAs[mutable.WrappedArray[Double]](0)
Expand Down
Loading