Issues with auk_zerofill #79

Closed
mhesselbarth opened this issue Feb 8, 2024 · 3 comments

mhesselbarth commented Feb 8, 2024

Dear auk Team,

First of all, thank you for providing this amazing package!

I am currently trying to zero-fill data for several EU countries, for which I download the eBird data using the Custom Download option. My code works for most countries, but some return an error message (see code example below).

The issue seems to be that the check in L162 (# ensure all checklist in ebd are in sampling file) fails.

Would it be appropriate to remove all IDs that are not present in the checklists dataset? I can think of two options: 1) using which(observations$checklist_id %in% checklists$checklist_id) in the filter call applied to the observations dataset, or 2) using intersect(checklists$checklist_id, observations$checklist_id).
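To make the two options concrete, here is a minimal base R sketch on toy data. The data frames and their values are hypothetical stand-ins for the real EBD and sampling data; only the checklist_id column name comes from the reprex below.

```r
# Toy stand-ins for the real data (hypothetical values)
checklists <- data.frame(checklist_id = c("S1", "S2", "S3", "G10"))
observations <- data.frame(
  checklist_id = c("S1", "S1", "S2", "G10", "G99"),
  scientific_name = c("Parus major", "Pica pica", "Parus major",
                      "Pica pica", "Parus major")
)

# Option 1: keep only observation rows whose checklist_id is in checklists
keep <- observations$checklist_id %in% checklists$checklist_id
obs_opt1 <- observations[keep, , drop = FALSE]

# Option 2: compute the shared IDs first, then subset with them
shared_ids <- intersect(checklists$checklist_id, observations$checklist_id)
obs_opt2 <- observations[observations$checklist_id %in% shared_ids, , drop = FALSE]

# Both options retain the same observation rows
identical(obs_opt1, obs_opt2)

# Subsetting checklists by shared_ids as well would drop "S3", a checklist
# with no observations at all
chk_opt2 <- checklists[checklists$checklist_id %in% shared_ids, , drop = FALSE]
```

For the observations the two options are equivalent. Applying the intersection to the checklists too is probably not what zero-filling needs, since checklists without any matching observation are the ones that should become zeros.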

Given how big the total EBD is, I would really like to use the custom download method for single countries if possible.

library(auk)
library(dplyr)

# set path
f_ebd <- "C:/Users/hesselbarth/Desktop/ebd_HU_smp_relDec-2023.txt"
f_sed <- "C:/Users/hesselbarth/Desktop/ebd_HU_smp_relDec-2023_sampling.txt"

# read data
checklists <- auk::read_sampling(f_sed, unique = TRUE) |> 
  dplyr::filter(all_species_reported)

observations <- auk::read_ebd(f_ebd, unique = TRUE, rollup = TRUE) |>
  dplyr::filter(all_species_reported)

auk::auk_zerofill(x = observations, sampling_events = checklists, collapse = TRUE)
#> Error in auk_zerofill.data.frame(x = observations, sampling_events = checklists, : Some checklists in EBD are
#> missing from sampling event data.

Created on 2024-02-08 with reprex v2.1.0

Session info:

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
 
auk_0.7.0

mhesselbarth commented Feb 8, 2024

I am only seeing this now, but this looks related to #46 and #23.


mstrimas commented Feb 8, 2024

This is a bug that can arise when some checklists in a set of shared checklists are complete and others are incomplete. You can identify the problem checklists with:

anti_join(observations, checklists, by = "checklist_id") |> 
  distinct(sampling_event_identifier)
#> # A tibble: 25 × 1
#>    sampling_event_identifier
#>    <chr>                    
#>  1 S32923769                
#>  2 S32923882                
#>  3 S32929530,S32929531      
#>  4 S33712624,S53448768      
#>  5 S35463195                
#>  6 S35463195,S35463196      
#>  7 S35757591                
#>  8 S39141210                
#>  9 S40068949                
#> 10 S40887336                
#> # ℹ 15 more rows

Then you can look up the first checklist, https://ebird.org/checklist/S32923769, on eBird.

Notice that this checklist is complete, but the other two checklists in the group, https://ebird.org/checklist/S31680386 and https://ebird.org/checklist/S32923770, are incomplete and are missing some species that the first checklist has.
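As a rough illustration of how such a mixed group can leave orphaned observations, here is a toy sketch in base R. It assumes a simplified picture of the collapsed data in which the sampling row for the group carries the incomplete flag while one observation row came from the complete member. The IDs and values are hypothetical, and this is not a description of auk's internals.

```r
# Hypothetical collapsed data: one shared group "G1" and one solo checklist "S5".
# Assumption: the sampling row for G1 is flagged incomplete, while one of the
# G1 observation rows came from the complete member of the group.
checklists <- data.frame(
  checklist_id = c("G1", "S5"),
  all_species_reported = c(FALSE, TRUE)
)
observations <- data.frame(
  checklist_id = c("G1", "G1", "S5"),
  scientific_name = c("Parus major", "Pica pica", "Parus major"),
  all_species_reported = c(TRUE, FALSE, TRUE)
)

# Apply the same completeness filter to both tables, as in the reprex
checklists_f <- checklists[checklists$all_species_reported, , drop = FALSE]
observations_f <- observations[observations$all_species_reported, , drop = FALSE]

# G1 is gone from the checklists, but one G1 observation survives the filter
orphans <- setdiff(observations_f$checklist_id, checklists_f$checklist_id)
orphans
#> [1] "G1"
```

Under that assumption, the surviving G1 observation has no matching row in the filtered checklists, which is the situation the zero-fill check rejects.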

In the long term, I need to come up with a solution for this bug, but in the short term, I think the best solution is to remove the problematic observations:

checklists <- read_sampling(f_sed, unique = TRUE) |> 
  filter(all_species_reported)
observations <- read_ebd(f_ebd, unique = TRUE, rollup = TRUE) |>
  filter(all_species_reported) |> 
  semi_join(checklists, by = "checklist_id")

auk::auk_zerofill(x = observations, sampling_events = checklists, collapse = TRUE)
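When running this for several countries, it may help to see how much data the workaround removes. Below is a minimal base R sketch of a reporting helper; drop_orphan_observations() is a hypothetical name and is not part of auk, and the toy data is made up for illustration.

```r
# Hypothetical helper (not part of auk): drop observations whose checklist_id
# has no match in the sampling data and report how much was removed.
drop_orphan_observations <- function(observations, checklists) {
  matched <- observations$checklist_id %in% checklists$checklist_id
  orphan_ids <- unique(observations$checklist_id[!matched])
  if (length(orphan_ids) > 0) {
    message(sum(!matched), " observation row(s) from ",
            length(orphan_ids), " checklist(s) removed")
  }
  observations[matched, , drop = FALSE]
}

# Toy usage with made-up IDs
toy_checklists <- data.frame(checklist_id = c("S1", "S2"))
toy_observations <- data.frame(
  checklist_id = c("S1", "S2", "G7", "G7"),
  scientific_name = c("Parus major", "Pica pica", "Parus major", "Pica pica")
)
toy_clean <- drop_orphan_observations(toy_observations, toy_checklists)

# Every remaining observation now has a matching checklist
all(toy_clean$checklist_id %in% toy_checklists$checklist_id)
```

The helper keeps the same rows as the semi_join() above; the only addition is the message, so an unexpectedly large number of dropped rows for a country does not go unnoticed.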

mhesselbarth commented

Thanks for the quick reply and the help. This seems to work 👍
