fix statistics for coverage #684

vincentsarago · 2024-03-12T14:46:37Z

closes #680
supersedes #681

Take the coverage array into account more accurately when computing statistics. This PR tries to match exactextract results.

vincentsarago · 2024-03-12T14:47:28Z

rio_tiler/utils.py

+) -> float:
+    i = numpy.argsort(values)
+    c = numpy.cumsum(weights[i])
+    return values[i[numpy.searchsorted(c, numpy.array(quantiles) * c[-1])]]


This helper will be removed once we can use numpy 2.0.
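To make the argsort / cumsum / searchsorted chain above easier to follow, here is a plain-Python restatement: sort the values, accumulate their weights, and return the first value whose cumulative weight reaches the requested fraction of the total. This is only a sketch, not the rio-tiler code; the name `weighted_quantile` and the sample data are made up for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def weighted_quantile(values, weights, quantile=0.5):
    """Hypothetical pure-Python restatement of the helper in this diff."""
    # Sort the values and carry each weight along with its value.
    pairs = sorted(zip(values, weights))
    sorted_values = [v for v, _ in pairs]
    # Running total of the weights, in sorted-value order.
    cumulative = list(accumulate(w for _, w in pairs))
    # First position whose cumulative weight reaches the target
    # (same "left" behaviour as the default of numpy.searchsorted).
    target = quantile * cumulative[-1]
    return sorted_values[bisect_left(cumulative, target)]


# Illustrative data only: four cells, three of them barely covered.
values = [1, 2, 3, 4]
coverage = [0.25, 0.25, 1.0, 0.25]

unweighted_median = weighted_quantile(values, [1, 1, 1, 1])
weighted_median = weighted_quantile(values, coverage)
```

With equal weights the result is 2; down-weighting the three edge cells moves it to 3. Like the helper in the diff, this returns one of the input values and never interpolates between them.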

vincentsarago · 2024-03-12T14:49:18Z

rio_tiler/utils.py

    # Avoid non masked nan/inf values
    numpy.ma.fix_invalid(data, copy=False)

    for b in range(data.shape[0]):
-        keys, counts = numpy.unique(data[b].compressed(), return_counts=True)
+        data_comp = data[b].compressed()


data[b].compressed() was called multiple times

vincentsarago · 2024-03-12T14:53:28Z

rio_tiler/utils.py

+                # Population standard deviation of cell values, taking into account coverage fraction.
+                "std": _weighted_stdev(data_comp, masked_coverage.compressed()),
+                # Median value of cells, weighted by the percent of each cell that is covered.
+                "median": _weighted_quantiles(data_comp, masked_coverage.compressed()),


std and median are now weighted by the coverage array
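The body of `_weighted_stdev` is not shown in this excerpt. As a rough illustration of what a coverage-weighted population standard deviation looks like, here is a sketch that assumes the usual formula: a weighted mean, then the weighted mean of squared deviations. The function names and data below are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Mean of `values`, each scaled by its coverage fraction."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_population_std(values, weights):
    """Population standard deviation weighted by coverage (assumed formula)."""
    mean = weighted_mean(values, weights)
    variance = weighted_mean([(v - mean) ** 2 for v in values], weights)
    return math.sqrt(variance)


# Illustrative data only: two cells, fully covered vs. unevenly covered.
values = [1.0, 3.0]
std_full = weighted_population_std(values, [1.0, 1.0])
std_partial = weighted_population_std(values, [0.75, 0.25])
```

When both cells are fully covered this reduces to the ordinary population standard deviation (1.0 here). With uneven coverage the mean shifts toward the better-covered cell and the spread shrinks accordingly.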

vincentsarago · 2024-03-12T14:55:00Z

tests/test_utils.py

    assert stats[0]["count"] == 1.75
+    assert stats[0]["median"] == 3  # 2 in exactextract


I have no idea why the median gives a different result. I tested a new numpy 2.0 method and it gives 3, while exactextract gives 2. I don't want to over-engineer the median calculation.

dbaston · 2024-03-12

For a raster of type T, exactextract returns type T for the quantile and median calculations. Here T is int64, so the median of 2.5 is truncated to 2. Maybe quantile/median should return float64 instead.

vincentsarago · 2024-03-12

oO that makes sense 🙏 thanks for having a look
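The truncation described above is easy to reproduce. In this small sketch, Python's `int()` stands in for the cast back to the raster's integer type; the two cell values are illustrative and are not the test fixture.

```python
import statistics

# Two illustrative integer cell values whose true median is fractional.
cells = [2, 3]

true_median = statistics.median(cells)  # midpoint of the two values
as_integer = int(true_median)           # int() truncates toward zero
```

Here `true_median` is 2.5 and `as_integer` is 2, which matches the explanation of why exactextract reports 2. The 3 asserted in the test comes from a different cause: the helper in this PR always returns one of the input values.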

vincentsarago · 2024-03-12T14:55:42Z

tests/test_utils.py

+    assert stats[0]["max"] == 9
+    # exactextract takes coverage into account, we don't
+    assert stats[0]["minority"] == 1  # 1 in exactextract
+    assert stats[0]["majority"] == 1  # 5 in exactextract


minority and majority do not take coverage into account. We might do this later if needed
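If coverage-aware minority/majority is ever needed, one possible approach is to sum the coverage fraction per distinct value instead of counting cells. The sketch below is hypothetical and is not part of this PR or of rio-tiler; the data is chosen only to show how the two definitions can disagree.

```python
from collections import Counter, defaultdict


def coverage_totals(values, coverage):
    """Sum the coverage fraction contributed by each distinct value."""
    totals = defaultdict(float)
    for value, fraction in zip(values, coverage):
        totals[value] += fraction
    return dict(totals)


# Illustrative data only: value 1 appears twice but is barely covered.
values = [1, 1, 5]
coverage = [0.1, 0.1, 1.0]

# Plain cell counts, ignoring coverage.
plain_majority = Counter(values).most_common(1)[0][0]

# Coverage-weighted totals per value.
totals = coverage_totals(values, coverage)
weighted_majority = max(totals, key=totals.get)
weighted_minority = min(totals, key=totals.get)
```

Counting cells picks 1 as the majority, while weighting by coverage picks 5, since value 1 only accumulates 0.2 of a cell in total.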

vincentsarago · 2024-03-12T14:56:54Z

Note: I've been working on making exactextract available on PyPI so we can integrate it into our CI (isciences/exactextract#87).

fix statistics for coverage

699a902

vincentsarago force-pushed the patch/better-statistics-for-coverage branch from b1e8d73 to 699a902 on March 12, 2024 14:48

round

a73c5f4

vincentsarago requested a review from kylebarron March 12, 2024 14:57

make sure we can serialize the stats

5e9b679

vincentsarago merged commit 2e76ad7 into main Mar 22, 2024
7 checks passed

vincentsarago deleted the patch/better-statistics-for-coverage branch March 22, 2024 09:40

This was referenced Apr 5, 2024

Update example for get_array_statistics #692

Merged

Validate API results for grid cell area weighted zonal statistics US-GHG-Center/veda-config-ghg#251

Closed
