statistics not excluding nodata correctly v4.1.9 - Need PER BAND Nodata Mask #579

Closed
AndrewAnnex opened this issue Mar 3, 2023 · 6 comments


@AndrewAnnex

I am using the statistics endpoint in titiler on a set of multiband float32 GeoTIFFs with a nodata value of 65535, but the statistics include the nodata value as valid pixels, which throws off the histogram, percentiles, max, majority, and standard deviation. The valid and masked pixel counts seem correct, so something odd is going on either with my data or with rio-tiler.

Running gdalinfo -stats or -hist on the same files computes correct per-band statistics as expected (values within the range -1 to 1), so GDAL appears to exclude nodata correctly. My guess is that nodata is handled differently between GDAL and the numpy implementation here.

@vincentsarago
Member

can you share the file @AndrewAnnex ?

@AndrewAnnex
Author

@vincentsarago I can, but it's a bit larger than the 50 MB file limit. I also suspect there may be other issues with the data, which makes me want to close this issue for now. It's hyperspectral data, so there may be some very large values that should be considered nodata but are not actually set to the nodata value. I made progress with a partial workaround by using the expression syntax to exclude values, but that didn't work consistently because of those large values. In any case, I can try to share the file next week, but I think it's fair to close this issue.
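For readers hitting the same problem, the idea behind that workaround (treating both the declared nodata value and implausibly large values as invalid) can be sketched offline with numpy. This is only an illustration, not the titiler expression syntax: the sample values and the `MAX_PLAUSIBLE` cutoff are hypothetical, chosen because the valid data is expected to fall within -1 to 1.

```python
import numpy as np

NODATA = 65535.0
MAX_PLAUSIBLE = 1.0  # hypothetical cutoff; valid values are expected within -1 to 1

# Hypothetical samples: three plausible values, one stray large value, one nodata.
band = np.array([-0.25, 0.125, 0.5, 1.0e4, NODATA], dtype="float32")

# Mark a pixel invalid if it equals nodata OR falls outside the plausible range.
invalid = (band == NODATA) | (np.abs(band) > MAX_PLAUSIBLE)
clean = np.ma.masked_array(band, mask=invalid)

print(clean.count())       # 3
print(float(clean.max()))  # 0.5
```

A fixed cutoff like this only works when the valid range is known in advance, which may be why the workaround was inconsistent across bands.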

@AndrewAnnex
Author

@vincentsarago here's a URL to a file exhibiting the issue on band 16: http://murray-lab.caltech.edu/temp/annex/hrl0001fc92_07_sr182j_mtr3.tif (~53 MB).

Using titiler I get the following stats on that band:

{'b16': {'min': -0.06552428752183914,
  'max': 65535.0,
  'mean': 4509.337045408054,
  'count': 200427.0,
  'sum': 903792896.0,
  'std': 16588.712325390792,
  'median': -0.0015708755236119032,
  'majority': 65535.0,
  'minority': -0.06552428752183914,
  'unique': 185869.0,
  'histogram': [[186636.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 13791.0],
   [-0.06552428752183914,
    6553.44091796875,
    13106.947265625,
    19660.455078125,
    26213.9609375,
    32767.466796875,
    39320.97265625,
    45874.48046875,
    52427.98828125,
    58981.4921875,
    65535.0]],
  'valid_percent': 72.23,
  'masked_pixels': 77061.0,
  'valid_pixels': 200427.0,
  'percentile_2': -0.018522651717066765,
  'percentile_98': 65535.0}}

and I get reasonable values from gdalinfo:

Band 16 Block=256x256 Type=Float32, ColorInterp=Undefined
  Description = BD1300
  Minimum=-0.066, Maximum=0.071, Mean=-0.001, StdDev=0.010
  NoData Value=65535
  Metadata:
    STATISTICS_MAXIMUM=0.070705354213715
    STATISTICS_MEAN=-0.0014813615619181
    STATISTICS_MINIMUM=-0.065524287521839
    STATISTICS_STDDEV=0.0098897473891675
    STATISTICS_VALID_PERCENT=67.26

Interestingly, I am just noticing how the valid percents differ here.
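The gap between the two valid percentages can be reproduced from the numbers already posted. The quick check below assumes that all 13791 values in the top histogram bin are 65535 nodata pixels; that is an assumption, though it is consistent with both the max and the majority being 65535.

```python
# Numbers copied from the titiler output above.
valid_pixels = 200427
masked_pixels = 77061
nodata_in_top_bin = 13791  # assumed: every value in the last histogram bin is 65535

total_pixels = valid_pixels + masked_pixels

# Valid percent as reported by titiler.
titiler_valid_percent = round(100 * valid_pixels / total_pixels, 2)

# Valid percent once the assumed nodata pixels are no longer counted as valid.
corrected_valid_percent = round(
    100 * (valid_pixels - nodata_in_top_bin) / total_pixels, 2
)

print(titiler_valid_percent)    # 72.23
print(corrected_valid_percent)  # 67.26
```

Under that assumption, the corrected figure matches the STATISTICS_VALID_PERCENT=67.26 reported by gdalinfo, which suggests the extra "valid" pixels are exactly the band 16 nodata pixels that the mask did not exclude.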

@vincentsarago
Member

I think I know what's going on! This is a mask problem. In rio-tiler we get the mask using rasterio's dataset_mask (https://rasterio.readthedocs.io/en/stable/topics/masks.html#dataset-masks)

https://github.com/cogeotiff/rio-tiler/blob/9b344d1a1551b89b6a73d8cb93b947181f1db958/rio_tiler/reader.py

and for some reason we don't get the expected result. 👇 difference between dataset_mask and the nodata mask
[Screenshot: difference between dataset_mask and the nodata mask]

I always preferred the per-dataset mask approach because it eases the output image creation, but maybe it's time to revisit this (and work on rio-tiler v5 🤷)
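To illustrate why a dataset-wide mask can let nodata leak into one band's statistics, here is a minimal numpy sketch. It does not call rasterio or rio-tiler; it emulates the nodata fallback described in the rasterio documentation, where the dataset mask is the binary OR of the band masks, so a pixel counts as valid if any band holds data there. The two-band toy raster is hypothetical and not taken from the file above.

```python
import numpy as np

NODATA = 65535.0

# Toy 2-band, 1x4 raster (hypothetical values).
# Band 1 is nodata only at the last pixel; band 2 at the last two pixels.
bands = np.array(
    [
        [[0.01, 0.02, 0.03, NODATA]],
        [[-0.01, 0.00, NODATA, NODATA]],
    ],
    dtype="float32",
)

# Per-band validity: True where that band holds real data.
band_valid = bands != NODATA

# Dataset-wide validity (emulated): valid if ANY band is valid at the pixel.
dataset_valid = band_valid.any(axis=0)

band2 = bands[1]
with_dataset_mask = np.ma.masked_array(band2, mask=~dataset_valid)
with_band_mask = np.ma.masked_array(band2, mask=~band_valid[1])

print(with_dataset_mask.count(), float(with_dataset_mask.max()))  # 3 65535.0
print(with_band_mask.count(), float(with_band_mask.max()))        # 2 0.0
```

With the dataset-wide mask, the third pixel of band 2 is treated as valid because band 1 has data there, so 65535 shows up as the band's maximum. With the per-band mask it is excluded, which is the behavior gdalinfo shows.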

@AndrewAnnex
Author

ah okay that makes sense now. You'd be more aware of the ramifications of changing the default behavior than me, but maybe using the nodata mask could be optional?

@vincentsarago
Member

> maybe using the nodata mask could be optional?

@AndrewAnnex, sadly no, because rio-tiler deals with other mask types (alpha, internal mask), so it's not as simple.

@vincentsarago changed the title from "statistics not excluding nodata correctly v4.1.9" to "statistics not excluding nodata correctly v4.1.9 - Need PER BAND Nodata Mask" on Mar 27, 2023