Edge and Island bias from cell aggregation #12

Closed
ethanplunkett opened this issue Feb 22, 2023 · 5 comments · Fixed by #118

@ethanplunkett
Contributor

preprocess_species() currently aggregates raster data into larger cells by taking the mean of the values from the component smaller cells after dropping no-data values. For a large cell that overlaps a few smaller data cells along with a bunch of cells without data, we are essentially up-weighting the data cells. If the no data represented unknown, and we assumed that the nearby data was the best estimate for that unknown, then the current approach would be valid. But I think much of the time the no data is open water, which is very different from the nearby data cells and consequently has very different values; for most of these species in most of the landscape, no data represents a much lower probability of occurrence.

The initial red flag for me was looking at Dickcissel, where a few pixels that only barely overlapped Trinidad had really high values in the distribution. Essentially, a tiny sliver of high value was turned into a much larger, coarse cell of high value, and thus the proportion of the total distribution that was associated with Tobago was increased substantially.

I think the solution is to use the sum when aggregating rather than the mean. Ultimately we re-standardize to a sum of 1, so the absolute magnitude of the cells (increased by taking the sum) doesn't matter, but the relative magnitude does, and with the sum the proportion of the distribution that's associated with Tobago should stay constant through the change in scale. Using the sum changes our assumption from "no data is similar to what it's near" to "no data represents zero probability of occurrence". Neither assumption is correct, but I think the second will serve us better.
dickci_spring_migration
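
The effect described above can be reproduced on a toy grid. The following is only a minimal sketch in plain Python, not the BirdFlowR code (which is R); the helper names `aggregate` and `standardize` and the 4x4 grid are made up for illustration. It aggregates by a factor of 2 using either the mean of non-NA cells or the sum with NA treated as zero, then re-standardizes to a sum of 1 and compares the share of the distribution that falls in the "island" block.

```python
import math

NA = float("nan")


def aggregate(grid, factor, how):
    """Aggregate a 2D list into factor x factor blocks (hypothetical helper).

    how="mean": mean of the non-NA cells; NA if the whole block is NA.
    how="sum":  sum of the non-NA cells, i.e. NA is treated as zero.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            vals = [grid[i][j]
                    for i in range(r, min(r + factor, rows))
                    for j in range(c, min(c + factor, cols))
                    if not math.isnan(grid[i][j])]
            if how == "sum":
                out_row.append(sum(vals))
            elif vals:
                out_row.append(sum(vals) / len(vals))
            else:
                out_row.append(NA)
        out.append(out_row)
    return out


def standardize(grid):
    """Rescale so that the non-NA cells sum to 1 (NA cells stay NA)."""
    total = sum(v for row in grid for v in row if not math.isnan(v))
    return [[v / total for v in row] for row in grid]


# Toy fine-resolution grid: the left half is "mainland" (all data); the
# top-right 2x2 block is an "island" with one data cell surrounded by water (NA).
fine = [
    [1.0, 1.0, 4.0, NA],
    [1.0, 1.0, NA, NA],
    [1.0, 1.0, NA, NA],
    [1.0, 1.0, NA, NA],
]

share_fine = standardize(fine)[0][2]                         # island share before aggregation
share_mean = standardize(aggregate(fine, 2, "mean"))[0][1]   # island block, mean aggregation
share_sum = standardize(aggregate(fine, 2, "sum"))[0][1]     # island block, sum aggregation

print(share_fine, share_mean, share_sum)
```

In this toy case the island holds one third of the distribution at fine resolution. Aggregating with the mean doubles that share to two thirds, while aggregating with the sum leaves it at one third.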

@ethanplunkett ethanplunkett self-assigned this Feb 27, 2023
@bmvandoren
Contributor

Thanks for this helpful summary. I agree that using the sum (i.e. treating no-data cells as zero) sounds like a good way to treat these situations. I'm in support of that update.

@dsheldon
Contributor

Sounds good to me as well, thanks Ethan!

@ethanplunkett
Contributor Author

ethanplunkett commented Jun 13, 2023

Along with replacing NA with 0, I'm going to switch from nearest neighbor to bilinear interpolation while reprojecting. This will generally improve the output and will also cause the footprint of non-zero cells to expand slightly (relative to nearest neighbor) - a good thing given the truncation in the eBird S&T data.

In my head this has been linked to replacing NAs with 0, but rereading the above I realize I didn't mention or explain it.

Currently, when BirdFlowR reprojects the eBird S&T data it uses nearest neighbor interpolation to avoid the expansion of NA that would occur with bilinear interpolation, but if we treat NAs as zero we can use bilinear interpolation.
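
The interaction between NA handling and the interpolation method can be seen in one dimension. This is a rough sketch in plain Python, not what BirdFlowR or its raster library actually does; `nearest` and `linear` are hypothetical helpers, and linear interpolation stands in as the 1D analogue of bilinear. A row of cells is resampled onto a grid shifted by half a cell, once with NA kept and once with NA replaced by 0.

```python
import math

NA = float("nan")


def nearest(values, x):
    """Nearest-neighbor sample at fractional index x (hypothetical helper)."""
    i = min(max(int(math.floor(x + 0.5)), 0), len(values) - 1)
    return values[i]


def linear(values, x):
    """Linear interpolation at fractional index x (1D analogue of bilinear)."""
    i0 = min(max(int(math.floor(x)), 0), len(values) - 1)
    i1 = min(i0 + 1, len(values) - 1)
    w = x - i0
    # Any NA neighbor makes the result NA, which is how NA "expands".
    return (1 - w) * values[i0] + w * values[i1]


source = [NA, NA, 2.0, 4.0, NA, NA]                        # two data cells surrounded by NA
zero_filled = [0.0 if math.isnan(v) else v for v in source]
targets = [0.5, 1.5, 2.5, 3.5, 4.5]                        # new grid, shifted by half a cell

near_na = [nearest(source, x) for x in targets]            # keep NA, nearest neighbor
lin_na = [linear(source, x) for x in targets]              # keep NA, linear
lin_zero = [linear(zero_filled, x) for x in targets]       # NA as zero, linear

n_near_na = sum(not math.isnan(v) for v in near_na)        # cells with data
n_lin_na = sum(not math.isnan(v) for v in lin_na)          # cells with data
n_lin_zero = sum(v > 0 for v in lin_zero)                  # non-zero footprint

print(n_near_na, n_lin_na, n_lin_zero)
print(lin_zero)
```

With NA kept, nearest neighbor preserves two data cells while linear interpolation shrinks the data to a single cell, because every target cell with an NA neighbor becomes NA. With NA treated as zero, linear interpolation yields three non-zero cells, so the footprint grows slightly instead.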

@ethanplunkett
Contributor Author

Comparing the old (keep NA, nearest neighbor):
image
To the new (treat NA as zero and bilinear interpolation):
image
I think this is working the way we'd hoped, both in not overestimating the Trinidad density and in spreading out the non-zero footprint a little.

@bmvandoren
Contributor

bmvandoren commented Jun 20, 2023 via email
