Fix weighted statistics #681
assert stats[0]["min"] == 2
assert stats[0]["max"] == 3
assert stats[0]["mean"] == (1 * 0 + 2 * 0.25 + 3 * 1.0 + 4 * 0) / 1.25
assert stats[0]["count"] == 1.25
Let's look at the data
data = np.ma.array((1, 2, 3, 4)).reshape((1, 2, 2))
coverage = np.array((0, 0.25, 1, 0)).reshape((2, 2))
data * coverage
>> masked_array(
data=[[[0. , 0.5],
[3. , 0. ]]],
mask=False,
fill_value=1e+20)
the stats should then be:
min: 0
max: 3
mean: (0 + 0.5 + 3.0 + 0.) / 4 = 0.875
sum: 0 + 0.5 + 3.0 + 0. = 3.5
count: 0 + 0.25 + 1 + 0 = 1.25
I don't understand why the mean should be the sum of data * coverage divided by the sum of the coverage. We already apply the coverage factor, so (IMO) we just need to divide by the number of pixels.
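For reference, the two candidate definitions of the mean under discussion can be compared side by side. This is only a minimal sketch using the arrays from this comment, not the library's implementation; the variable names `weighted`, `mean_by_pixel_count` and `mean_by_coverage_sum` are illustrative assumptions.

```python
import numpy as np

# Same arrays as in the comment above.
data = np.ma.array((1, 2, 3, 4)).reshape((1, 2, 2))
coverage = np.array((0, 0.25, 1, 0)).reshape((2, 2))

# Coverage-weighted values for the single band.
weighted = data[0] * coverage

# Candidate A: divide the weighted sum by the number of pixels.
mean_by_pixel_count = float(weighted.sum() / weighted.size)

# Candidate B: divide the weighted sum by the sum of the coverage
# (the value the test in this PR asserts).
mean_by_coverage_sum = float(weighted.sum() / coverage.sum())

print(mean_by_pixel_count, mean_by_coverage_sum)
```

Candidate A gives 0.875 and candidate B gives 2.8; the latter lies between the two pixel values that have non-zero coverage (2 and 3).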
Because coverage does not sum up to 1. Its size is arbitrary and not related to the overall weight of all cells, so it skews the overall quantity.
A simple case illustrates this: imagine 2 x 2 cells, all containing pixel value 20, with the same coverage, 0.1, for every cell.
Since the coverage / weight is the same for all cells, you would expect that their weighted average is the same as their simple average, namely 20, right?
Simple average:
(20 + 20 + 20 + 20) / 4 = 20
Weighted sum:
(20 * 0.1 + 20 * 0.1 + 20 * 0.1 + 20 * 0.1) = 8
If you divide that by 4 (the number of cells), you get 2.
You need to divide by the sum of the weights to get the expected result:
8 / (0.1 + 0.1 + 0.1 + 0.1) = 20
This is not a proof, but perhaps still helps?
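The uniform-coverage example can be checked numerically. The sketch below is an illustration under the assumptions of this comment (2 x 2 cells, value 20, coverage 0.1); it uses `np.average` with its `weights` argument, which divides by the sum of the weights, and the variable names are hypothetical.

```python
import math

import numpy as np

values = np.full((2, 2), 20.0)    # every cell holds pixel value 20
coverage = np.full((2, 2), 0.1)   # every cell is covered by the same fraction

weighted_sum = float((values * coverage).sum())

# Dividing by the number of cells shrinks the result by the coverage factor.
divided_by_cells = weighted_sum / values.size

# Dividing by the sum of the weights recovers the simple average.
divided_by_weights = weighted_sum / float(coverage.sum())

# NumPy's weighted average normalises by the sum of the weights as well.
np_weighted = float(np.average(values, weights=coverage))

print(divided_by_cells, divided_by_weights, np_weighted)
assert math.isclose(divided_by_weights, np_weighted)
```

With equal weights, only the normalisation by the sum of the weights reproduces the simple average of 20 (up to floating-point rounding).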
https://en.wikipedia.org/wiki/Weighted_arithmetic_mean
Well, I'm not sure I fully understand, but in our case we don't use weights but the percentage of each pixel (coverage), so the sum of all the weights should be the number of pixels, not the sum of the coverage.
Fixes #680
get_array_statistics
to reflect more details of weighted stats - tests will fail ❌get_array_stats