[R] Connection is garbage-collected, use dbDisconnect() to avoid this (when using to_duckdb()) #38382

Closed
PMassicotte opened this issue Oct 21, 2023 · 0 comments · Fixed by #38495

PMassicotte commented Oct 21, 2023

I have the following code:

```r
library(tidyverse)
library(arrow)
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:lubridate':
#> 
#>     duration
#> The following object is masked from 'package:utils':
#> 
#>     timestamp

bb <- s3_bucket(
  bucket = "cdoc",
  endpoint_override = "s3.valeria.science",
  anonymous = TRUE
)

open_dataset(bb) |>
  to_duckdb() |>
  summarise(mean_doc = mean(doc, na.rm = TRUE), .by = ecosystem) |>
  collect()
#> # A tibble: 5 × 2
#>   ecosystem mean_doc
#>   <chr>        <dbl>
#> 1 coastal      253. 
#> 2 river        527. 
#> 3 lake        1323. 
#> 4 ocean         60.2
#> 5 estuary      235.
```

When I quit R, I get these warnings:

```
Warning messages:
Connection is garbage-collected, use dbDisconnect() to avoid this.
Database is garbage-collected, use dbDisconnect(con, shutdown=TRUE) or duckdb_shutdown(drv) to avoid this.
```

One way to avoid the warnings is to create the connection explicitly, pass it to `to_duckdb()`, and close it yourself afterwards (credit: https://discord.com/channels/909674491309850675/921100826884341781/1165053445657608222):

```r
library(DBI)
library(duckdb)

drv <- duckdb()
con <- dbConnect(drv)

open_dataset(bb) |>
  to_duckdb(con = con) |>
  summarise(mean_doc = mean(doc, na.rm = TRUE), .by = ecosystem) |>
  collect()
#> # A tibble: 5 × 2
#>   ecosystem mean_doc
#>   <chr>        <dbl>
#> 1 lake        1323. 
#> 2 river        527. 
#> 3 coastal      253. 
#> 4 ocean         60.2
#> 5 estuary      235.

dbDisconnect(con)
duckdb_shutdown(drv)
```
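A minimal sketch, assuming you want the explicit-connection workaround to clean up even when the query fails: the hypothetical helper `with_duck_con()` (not part of arrow, DBI or duckdb) closes the connection through `on.exit()`. It uses a small in-memory `arrow_table()` as a stand-in for `open_dataset(bb)` so it runs offline.

```r
library(DBI)
library(duckdb)
library(arrow)
library(dplyr)

# Hypothetical helper (not part of arrow, DBI or duckdb): open a private
# DuckDB connection, pass it to `f`, and always close it on the way out,
# even if `f` throws an error.
with_duck_con <- function(f) {
  con <- dbConnect(duckdb())
  # shutdown = TRUE also shuts down the embedded database, as the second
  # warning suggests, so nothing is left for the garbage collector
  on.exit(dbDisconnect(con, shutdown = TRUE), add = TRUE)
  f(con)
}

# Small in-memory stand-in for open_dataset(bb) so the example runs offline.
# collect() must be called inside `f`: the lazy table returned by
# to_duckdb() is only usable while `con` is open.
res <- with_duck_con(function(con) {
  arrow_table(data.frame(ecosystem = c("lake", "lake", "river"),
                         doc = c(1, 3, 5))) |>
    to_duckdb(con = con) |>
    summarise(mean_doc = mean(doc, na.rm = TRUE), .by = ecosystem) |>
    collect()
})
```

With the real data, replace the `arrow_table(...)` call with `open_dataset(bb)`. Because `collect()` runs inside `f`, the result is an ordinary tibble that stays usable after the connection has been closed.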

Is this expected, or should this be done automatically?

Created on 2023-10-21 with reprex v2.0.2

Session info
```r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16)
#>  os       Ubuntu 23.04
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_CA:en
#>  collate  en_CA.UTF-8
#>  ctype    en_CA.UTF-8
#>  tz       America/Toronto
#>  date     2023-10-21
#>  pandoc   2.17.1.1 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version  date (UTC) lib source
#>  arrow       * 13.0.0.1 2023-09-22 [1] RSPM (R 4.3.0)
#>  assertthat    0.2.1    2019-03-21 [1] RSPM (R 4.3.0)
#>  bit           4.0.5    2022-11-15 [1] RSPM (R 4.3.0)
#>  bit64         4.0.5    2020-08-30 [1] RSPM (R 4.3.0)
#>  blob          1.2.4    2023-03-17 [1] RSPM (R 4.3.0)
#>  cli           3.6.1    2023-03-23 [1] RSPM (R 4.3.0)
#>  colorspace    2.1-0    2023-01-23 [1] RSPM (R 4.3.0)
#>  DBI         * 1.1.3    2022-06-18 [1] RSPM (R 4.3.0)
#>  dbplyr        2.3.4    2023-09-26 [1] RSPM (R 4.3.0)
#>  digest        0.6.33   2023-07-07 [1] RSPM (R 4.3.0)
#>  dplyr       * 1.1.3    2023-09-03 [1] RSPM (R 4.3.0)
#>  duckdb      * 0.9.1    2023-10-13 [1] RSPM (R 4.3.0)
#>  evaluate      0.22     2023-09-29 [1] CRAN (R 4.3.1)
#>  fansi         1.0.5    2023-10-08 [1] RSPM (R 4.3.0)
#>  fastmap       1.1.1    2023-02-24 [1] RSPM (R 4.3.0)
#>  forcats     * 1.0.0    2023-01-29 [1] RSPM (R 4.3.0)
#>  fs            1.6.3    2023-07-20 [1] RSPM (R 4.3.0)
#>  generics      0.1.3    2022-07-05 [1] RSPM (R 4.3.0)
#>  ggplot2     * 3.4.4    2023-10-12 [1] RSPM (R 4.3.0)
#>  glue          1.6.2    2022-02-24 [1] RSPM (R 4.3.0)
#>  gtable        0.3.4    2023-08-21 [1] RSPM (R 4.3.0)
#>  hms           1.1.3    2023-03-21 [1] RSPM (R 4.3.0)
#>  htmltools     0.5.6.1  2023-10-06 [1] RSPM (R 4.3.0)
#>  knitr         1.44     2023-09-11 [1] RSPM (R 4.3.0)
#>  lifecycle     1.0.3    2022-10-07 [1] RSPM (R 4.3.0)
#>  lubridate   * 1.9.3    2023-09-27 [1] RSPM (R 4.3.0)
#>  magrittr      2.0.3    2022-03-30 [1] RSPM (R 4.3.0)
#>  munsell       0.5.0    2018-06-12 [1] RSPM (R 4.3.0)
#>  pillar        1.9.0    2023-03-22 [1] RSPM (R 4.3.0)
#>  pkgconfig     2.0.3    2019-09-22 [1] RSPM (R 4.3.0)
#>  purrr       * 1.0.2    2023-08-10 [1] RSPM (R 4.3.0)
#>  R.cache       0.16.0   2022-07-21 [1] RSPM (R 4.3.0)
#>  R.methodsS3   1.8.2    2022-06-13 [1] RSPM (R 4.3.0)
#>  R.oo          1.25.0   2022-06-12 [1] RSPM (R 4.3.0)
#>  R.utils       2.12.2   2022-11-11 [1] RSPM (R 4.3.0)
#>  R6            2.5.1    2021-08-19 [1] RSPM (R 4.3.0)
#>  readr       * 2.1.4    2023-02-10 [1] RSPM (R 4.3.0)
#>  reprex        2.0.2    2022-08-17 [1] RSPM (R 4.3.0)
#>  rlang         1.1.1    2023-04-28 [1] RSPM (R 4.3.0)
#>  rmarkdown     2.25     2023-09-18 [1] RSPM (R 4.3.0)
#>  scales        1.2.1    2022-08-20 [1] RSPM (R 4.3.0)
#>  sessioninfo   1.2.2    2021-12-06 [1] RSPM (R 4.3.0)
#>  stringi       1.7.12   2023-01-11 [1] CRAN (R 4.3.0)
#>  stringr     * 1.5.0    2022-12-02 [1] RSPM (R 4.3.0)
#>  styler        1.10.2   2023-08-29 [1] RSPM (R 4.3.0)
#>  tibble      * 3.2.1    2023-03-20 [1] RSPM (R 4.3.0)
#>  tidyr       * 1.3.0    2023-01-24 [1] RSPM (R 4.3.0)
#>  tidyselect    1.2.0    2022-10-10 [1] RSPM (R 4.3.0)
#>  tidyverse   * 2.0.0    2023-02-22 [1] RSPM (R 4.3.0)
#>  timechange    0.2.0    2023-01-11 [1] RSPM (R 4.3.0)
#>  tzdb          0.4.0    2023-05-12 [1] RSPM (R 4.3.0)
#>  utf8          1.2.3    2023-01-31 [1] RSPM (R 4.3.0)
#>  vctrs         0.6.4    2023-10-12 [1] RSPM (R 4.3.0)
#>  withr         2.5.1    2023-09-26 [1] RSPM (R 4.3.0)
#>  xfun          0.40     2023-08-09 [1] RSPM (R 4.3.0)
#>  yaml          2.3.7    2023-01-23 [1] RSPM (R 4.3.0)
#> 
#>  [1] /home/filoche/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /usr/bin/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
```
@PMassicotte PMassicotte changed the title Connection is garbage-collected, use dbDisconnect() to avoid this. [R[ Connection is garbage-collected, use dbDisconnect() to avoid this (when using to_duckdb()) Oct 21, 2023
@PMassicotte PMassicotte changed the title [R[ Connection is garbage-collected, use dbDisconnect() to avoid this (when using to_duckdb()) [R] Connection is garbage-collected, use dbDisconnect() to avoid this (when using to_duckdb()) Oct 21, 2023
thisisnic pushed a commit that referenced this issue Nov 1, 2023
…38495)

### Rationale for this change

We get many warning messages about unclosed connections when running tests, and users see them on exit when they weren't expecting them.

### What changes are included in this PR?

A finalizer that runs on exit was added to close the global `arrow_duck_con` connection that we cache in the global options.

### Are these changes tested?

Yes, the finalizer will run in every test that runs `to_duckdb()` with the default connection.

### Are there any user-facing changes?

No.
* Closes: #38382

Authored-by: Dewey Dunnington <dewey@fishandwhistle.net>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
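A rough sketch of the mechanism the commit message describes, assuming only what it states: one connection cached in the global option `arrow_duck_con`, plus a finalizer that runs on exit. The names `cached_duck_con()`, `close_cached_duck_con()` and `duck_con_closer` are hypothetical, and this is not arrow's actual implementation; `reg.finalizer(..., onexit = TRUE)` is base R's way of asking for a finalizer to run when the session ends.

```r
library(DBI)
library(duckdb)

# Hypothetical sketch, not arrow's internal code. reg.finalizer() needs an
# environment (or external pointer) to attach the finalizer to.
duck_con_closer <- new.env(parent = emptyenv())

close_cached_duck_con <- function(e) {
  con <- getOption("arrow_duck_con")
  if (!is.null(con) && dbIsValid(con)) {
    # shutdown = TRUE also shuts down the embedded database, which is what
    # the second warning in the report asks for
    dbDisconnect(con, shutdown = TRUE)
  }
  options(arrow_duck_con = NULL)
  invisible(NULL)
}

cached_duck_con <- function() {
  con <- getOption("arrow_duck_con")
  if (is.null(con) || !dbIsValid(con)) {
    con <- dbConnect(duckdb())
    options(arrow_duck_con = con)
  }
  # Register the finalizer only once; onexit = TRUE asks R to run it at the
  # end of the session if duck_con_closer has not been collected by then
  if (is.null(duck_con_closer$registered)) {
    reg.finalizer(duck_con_closer, close_cached_duck_con, onexit = TRUE)
    duck_con_closer$registered <- TRUE
  }
  con
}
```

Repeated calls to `cached_duck_con()` return the same connection and register the finalizer only once, so quitting R should close the connection explicitly instead of leaving it to the garbage collector.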
@thisisnic thisisnic added this to the 15.0.0 milestone Nov 1, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…exit (apache#38495)

assignUser pushed a commit that referenced this issue Jan 10, 2024
…38495)

dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…exit (apache#38495)
