error in macOS: IOException - Could not remove tempfile #203

Open
rafapereirabr opened this issue Jul 17, 2024 · 4 comments

Comments

@rafapereirabr

Hi all, thanks for such an excellent package! I'm using {duckplyr} as a dependency in my package {censobr} and it works very nicely, but it triggers a strange error in my R CMD check runs on macOS. The error occurs only on macOS.

Error: Error: R CMD check found ERRORs
Execution halted
180.900 129.583 310.242
libc++abi: terminating due to uncaught exception of type duckdb::IOException: {"exception_type":"IO","exception_message":"Could not remove file ".tmp/duckdb_temp_storage-0.tmp": No such file or directory","errno":"2"}

1 error ✖ | 0 warnings ✔ | 0 notes ✔
Error: Process completed with exit code 1.

The error seems to occur when GitHub Actions checks this function:

censobr::read_population(year = 2000,
                         merge_households = TRUE)

Internally, this is what the function does: it (1) opens two .parquet files using arrow, (2) converts them to duckdb, (3) performs a left join, and (4) converts the result back to arrow.

library(censobr)
library(duckplyr)
library(dplyr)

df <- censobr::read_population(year = 2000)
df_household <- censobr::read_households(year = 2000)

key_vars <- c('code_muni', 'code_state', 'abbrev_state','name_state',
              'code_region', 'name_region', 'code_weighting', 'V0300')

# drop repeated vars
all_common_vars <- names(df)[names(df) %in% names(df_household)]
vars_to_drop <- setdiff(all_common_vars, key_vars)
df_household <- dplyr::select(df_household, -all_of(vars_to_drop))

# convert to duckdb
df <- arrow::to_duckdb(df)
df_household <- arrow::to_duckdb(df_household)

# merge
df_geo <- duckplyr::left_join(df, df_household)

# back to arrow
df_geo <- arrow::to_arrow(df_geo)

@krlmlr
Collaborator

krlmlr commented Jul 17, 2024

Thanks. We might need to close the duckdb connection and shut down duckdb (or just call gc()) when duckplyr is unloaded. Would you like to contribute a PR? Beware of dragons!
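
A minimal sketch of that idea, assuming the package keeps its default connection in an internal environment. The names `.pkg_env`, `get_default_con()`, and `close_default_con()` are hypothetical and are not duckplyr internals; only `DBI::dbConnect()`, `DBI::dbDisconnect()`, `duckdb::duckdb()`, `gc()`, and the `.onUnload()` hook are real. Whether this actually resolves the macOS tempfile error is untested.

```r
# Hypothetical package-level cache for the default duckdb connection.
.pkg_env <- new.env(parent = emptyenv())

# Lazily create the connection on first use (hypothetical helper).
get_default_con <- function() {
  if (is.null(.pkg_env$con)) {
    .pkg_env$con <- DBI::dbConnect(duckdb::duckdb())
  }
  .pkg_env$con
}

# Disconnect and drop the cached connection, if there is one.
close_default_con <- function() {
  con <- .pkg_env$con
  if (!is.null(con)) {
    # shutdown = TRUE asks duckdb to stop the embedded database as well;
    # try() keeps a failed disconnect from aborting the unload.
    try(DBI::dbDisconnect(con, shutdown = TRUE), silent = TRUE)
    .pkg_env$con <- NULL
  }
  # Run the garbage collector so pending finalizers run now rather than
  # at process exit.
  invisible(gc())
}

# R calls this hook when the package namespace is unloaded.
.onUnload <- function(libpath) {
  close_default_con()
}
```

Calling `close_default_con()` when no connection was ever opened is a no-op apart from the `gc()` call, so the hook is safe to run unconditionally.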

@rafapereirabr
Author

Hi @krlmlr, I just had a look at the code of duckplyr::left_join() and couldn't find an explicit call to dbConnect(), so I'm not sure how to close the duckdb connection here.

Perhaps the connection to duckdb should be closed within arrow::to_arrow()?

@krlmlr
Collaborator

krlmlr commented Jul 18, 2024

See create_default_duckdb_connection().

@rafapereirabr
Author

Hi @krlmlr, I checked the code, but I'm not really familiar with the internals of {duckplyr}, so I couldn't find a change to suggest here.
