R session crashing when querying local database #2
Comments
@Pakillo Thanks so much for the detailed report! I can reproduce this crash. Looks like this might be a
For comparison, can you try the same query in pure arrow?
gbif <- arrow::open_dataset("/home/shared-data/gbif/occurrence/2021-11-01/occurrence.parquet/")
pinsapo <- gbif %>%
  filter(species == "Abies pinsapo",
         countrycode == "ES",
         year > 2010)
pinsapo %>% collect()
That query works for me (though it is still somewhat intensive). It would also be great if you could test with adding a
(Obviously these are things we'd want to at least document! It's also possible we can avoid them, at least on the local side, by reading the parquet files into the native duckdb backend first. I'm totally still learning here, so thanks for exploring this with me!) |
Thanks for the quick reply @cboettig! You're totally right about
con <- gbif_conn(dir = "/media/frs/Elements/gbifdb/occurrence.parquet/")
gbif <- tbl(con, "gbif")
pinsapo <- gbif %>%
  filter(species == "Abies pinsapo",
         countrycode == "ES",
         year > 2010) %>%
  select(species, year, decimallatitude, decimallongitude)
pinsapo
BUT from several tests I've done selecting different columns, it looks to me like the problem may be with the last two columns in the database:
So, I can select up to 48 of the 50 available columns without any problem, as long as I don't select either of those two columns. Does this make sense? What can be done (apart from documenting this in the package, of course)? If you avoid those two columns, all my queries seem to work fine. |
Nice tracking that down! Yeah, that makes sense. I think we can potentially work around this issue, then, by excluding those 2 columns by default (e.g. internally inside |
Excluding those two columns looks like a good solution to me, at least for now 👍 I had submitted the review before seeing your response, just to move things forward, but I'm happy to keep an eye on this if I can be of help.
Cheers |
Hi @cboettig,
I've tried your examples for querying a local database and they work fine. But I'm unable to do a basic query of species occurrences (<150 records). What am I doing wrong?
My code:
When I call 'pinsapo' my R session crashes. I am attaching images of two attempts:
My session info:
This is the last bit I need to solve to finish my review!
Thanks