Fix weighted statistics #681

j08lue · 2024-02-26T13:24:22Z

Fixes #680

Change the simple tests for get_array_statistics to reflect more details of weighted stats - tests will fail ❌
Add tests for more weighted stats
Implement correct weighted statistics in get_array_statistics

vincentsarago · 2024-03-07T08:21:27Z

tests/test_utils.py

+    assert stats[0]["min"] == 2
+    assert stats[0]["max"] == 3
+    assert stats[0]["mean"] == (1 * 0 + 2 * 0.25 + 3 * 1.0 + 4 * 0) / 1.25
+    assert stats[0]["count"] == 1.25


Let's look at the data

data = np.ma.array((1, 2, 3, 4)).reshape((1, 2, 2))
coverage = np.array((0, 0.25, 1, 0)).reshape((2, 2))

data * coverage
>> masked_array(
     data=[[[0. , 0.5],
            [3. , 0. ]]],
     mask=False,
     fill_value=1e+20)

the stats should then be:

min: 0
max: 3
mean: (0 + 0.5 + 3.0 + 0.) / 4 = 0.875
sum: 0 + 0.5 + 3.0 + 0. = 3.5
count: 0 + 0.25 + 1 + 0 = 1.25

I'm not sure I understand why the mean should be the sum of data * coverage divided by the sum of the coverage. We already apply the coverage factor, so (IMO) we just need to divide by the number of pixels.
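
To make the two candidate definitions of the mean concrete, here is a minimal sketch in plain Python using the values from the test above. The variable names are illustrative only and are not part of rio-tiler.

```python
# Pixel values and per-pixel coverage fractions from the test case, flattened.
values = [1, 2, 3, 4]
coverage = [0, 0.25, 1, 0]

# Coverage-weighted sum of the pixel values.
weighted_sum = sum(v * c for v, c in zip(values, coverage))

# Candidate A: divide the weighted sum by the number of pixels.
mean_by_pixel_count = weighted_sum / len(values)

# Candidate B: divide the weighted sum by the sum of the coverage
# (this is the value the new test asserts).
mean_by_coverage_sum = weighted_sum / sum(coverage)

print(weighted_sum, mean_by_pixel_count, mean_by_coverage_sum)
```

Candidate A gives 0.875, which is smaller than every pixel value in the array. Candidate B gives 2.8, which lies between the two pixels that have non-zero coverage (2 and 3).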

j08lue · 2024-03-07

Because the coverage does not sum to 1. Its total is arbitrary and bears no fixed relation to the number of cells, so dividing by the number of cells skews the result.

A simple case illustrates this: imagine 2 x 2 cells, all containing the pixel value 20, with the same coverage of 0.1 for every cell.

Since the coverage / weight is the same for all cells, you would expect that their weighted average is the same as their simple average, namely 20, right?

Simple average:

(20 + 20 + 20 + 20) / 4 = 20

Weighted sum:

(20 * 0.1 + 20 * 0.1 + 20 * 0.1 + 20 * 0.1) = 8

If you divide that by 4 (the number of cells), you get 2.

You need to divide by the sum of the weights to get the expected result:

8 / (0.1 + 0.1 + 0.1 + 0.1) = 20

This is not a proof, but perhaps still helps?
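
The uniform-coverage case above can be checked with a few lines of plain Python. This is only a sketch of the arithmetic; the variable names are made up for illustration.

```python
# Four cells with the same value and the same coverage.
values = [20, 20, 20, 20]
weights = [0.1, 0.1, 0.1, 0.1]

# Unweighted average of the cell values.
simple_mean = sum(values) / len(values)

# Sum of value * weight over all cells.
weighted_sum = sum(v * w for v, w in zip(values, weights))

# Dividing by the number of cells shrinks the result by the coverage factor.
divided_by_count = weighted_sum / len(values)

# Dividing by the sum of the weights recovers the simple average.
divided_by_weights = weighted_sum / sum(weights)

print(simple_mean, weighted_sum, divided_by_count, divided_by_weights)
```

Up to floating-point rounding, the printed values are 20, 8, 2 and 20, matching the numbers worked out by hand.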

https://en.wikipedia.org/wiki/Weighted_arithmetic_mean

$\bar{x} = \frac{w_1 x_1 + w_2 x_2 + ... + w_n x_n}{w_1 + w_2 + ... + w_n}$
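
As a rough illustration of the formula, the sketch below implements it as a small plain-Python function. `weighted_mean` is a hypothetical helper written for this example, not the rio-tiler implementation. It also shows that rescaling all weights by a constant leaves the result unchanged, which is why the absolute size of the weights does not matter.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        # With no weight at all the mean is undefined.
        raise ValueError("sum of weights is zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


values = [1, 2, 3, 4]
coverage = [0, 0.25, 1, 0]

# Mean with the coverage fractions used as weights.
mean_original = weighted_mean(values, coverage)

# Same weights multiplied by 4: numerator and denominator scale together.
mean_rescaled = weighted_mean(values, [4 * c for c in coverage])

print(mean_original, mean_rescaled)
```

Both calls return the same value (about 2.8), because the scale factor cancels between the numerator and the denominator. NumPy users can get the same quantity from `numpy.average`, which accepts a `weights` argument.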

vincentsarago · 2024-03-07

Well, I'm not sure I fully understand, but in our case we don't use weights but the percentage of each pixel that is covered (the coverage), so the sum of all the weights should be the number of pixels, not the sum of the coverage.

j08lue added 2 commits February 26, 2024 14:19

Catch more coverage cases

a181cd0

Merge branch 'main' into coverage-stats

beff9b1

j08lue changed the title ~~Fix weighted statistics~~ WIP: Fix weighted statistics Feb 26, 2024

j08lue changed the title ~~WIP: Fix weighted statistics~~ [WIP] Fix weighted statistics Feb 26, 2024

j08lue marked this pull request as draft February 27, 2024 12:57

j08lue changed the title ~~[WIP] Fix weighted statistics~~ Fix weighted statistics Feb 27, 2024

Merge branch 'main' of https://github.com/cogeotiff/rio-tiler into HEAD

95dd2ca

vincentsarago reviewed Mar 7, 2024

vincentsarago mentioned this pull request Mar 12, 2024

fix statistics for coverage #684

Merged

vincentsarago closed this Mar 22, 2024
