[R] parallel as.data.frame.Table hangs indefinitely on Windows #26715

Closed
asfimport opened this issue Nov 30, 2020 · 10 comments

@asfimport

asfimport commented Nov 30, 2020

On Windows only

Tested on 2 machines, mingw. 

Reprex

install.packages("arrow", repos = "https://arrow-r-nightly.s3.amazonaws.com")
remotes::install_github("meztez/bigrquerystorage", INSTALL_opts = "--no-multiarch")

library(bigrquerystorage)
Sys.info()
sessionInfo()
Sys.setenv("BIGQUERY_TEST_PROJECT"="labo-brunotremblay-253317")
con <- bigrquery::dbConnect(
  bigrquery::bigquery(),
  project = "bigquery-public-data",
  dataset = "usa_names",
  billing = bigrquery:::bq_test_project(),
  quiet = FALSE)

# Does not hang
options(arrow.use_threads = FALSE)
dt <- bigrquerystorage::dbReadTable(con, "bigquery-public-data.usa_names.usa_1910_current")

# Hangs
options(arrow.use_threads = TRUE)
dt <- bigrquerystorage::dbReadTable(con, "bigquery-public-data.usa_names.usa_1910_current")

 

Session details

 

> Sys.info()
       sysname        release        version       nodename        machine          login           user effective_user 
     "Windows"       "10 x64"  "build 19042"   "C000055787"       "x86-64"     "gen01914"     "gen01914"     "gen01914" 
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] bigrquery_1.3.2.9001

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5           rstudioapi_0.13      magrittr_1.5         tidyselect_1.1.0     bit_4.0.4            R6_2.5.0            
 [7] rlang_0.4.8          dplyr_1.0.2          httr_1.4.2           tools_4.0.3          arrow_2.0.0.20201130 DBI_1.1.0           
[13] dbplyr_2.0.0         ellipsis_0.3.1       remotes_2.2.0        bit64_4.0.5          assertthat_0.2.1     gargle_0.5.0        
[19] tibble_3.0.4         lifecycle_0.2.0      crayon_1.3.4         purrr_0.3.4          fs_1.5.0             vctrs_0.3.4         
[25] glue_1.4.2           compiler_4.0.3       pillar_1.4.6         generics_0.1.0       jsonlite_1.7.1       pkgconfig_2.0.3  

 


**Reporter**: [Bruno Tremblay](https://issues.apache.org/jira/browse/ARROW-10773)
**Assignee**: [Neal Richardson](https://issues.apache.org/jira/browse/ARROW-10773) / @nealrichardson
#### Related issues:
- [[R] Investigate/fix thread safety issues (esp. Windows)](https://github.com/apache/arrow/issues/24563) (relates to)

<sub>**Note**: *This issue was originally created as [ARROW-10773](https://issues.apache.org/jira/browse/ARROW-10773). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>
@asfimport

Neal Richardson / @nealrichardson:
[~meztez] could you install arrow from our nightly repository and retry? This sounds similar to ARROW-10080.

@asfimport

Bruno Tremblay:
Same results.

 

@asfimport

Neal Richardson / @nealrichardson:
I see, so this sounds instead like ARROW-8379. Is it possible to keep the result from bigquery as an arrow Table and share that? Or even better since I presume that data is quite big, get the Table and then try to slice it down to a more minimal reproducer (that hangs when you call as.data.frame() on it) that you can share?

@asfimport

Bruno Tremblay:
Here is the fun part.

The Table is built from a single raw vector (RAWSXP) representing an IPC stream.

When this raw vector is saved to disk using saveRDS and then reread using readRDS, the resulting Table has no problem being converted to a data.frame, even with multithreading.

The problem only occurs with multithreading when the vector stays in memory.

Mind you, building the Table itself is not an issue, and querying the Table for every row also yields the expected results.

It's pretty hard for me to nail the problem down, as I do not yet have any notion of how threads are handled in C++.

 

But I'm pretty sure it has to do with either memory management or the length/capacity of the in memory vector. 

 

Next up is doing a memory dump to compare the in-memory-only method with the memory-disk-memory method.
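As a rough illustration of the byte-level comparison described above, here is a minimal base R sketch. The `ipc_bytes` vector is a hypothetical stand-in made of random bytes, not a real IPC stream; in the actual case it would be the raw vector returned by the BigQuery Storage read. The sketch only checks content, so it cannot reveal differences in how the memory is owned or allocated.

```r
# Hypothetical stand-in for the raw vector holding the IPC stream.
set.seed(1)
ipc_bytes <- as.raw(sample(0:255, 1024, replace = TRUE))

# Round-trip the vector through disk, as in the workaround described above.
tmp <- tempfile(fileext = ".rds")
saveRDS(ipc_bytes, tmp)
ipc_bytes_disk <- readRDS(tmp)
unlink(tmp)

# Compare the in-memory vector with the one reread from disk.
same_bytes <- identical(ipc_bytes, ipc_bytes_disk)
same_length <- length(ipc_bytes) == length(ipc_bytes_disk)

# Index of the first differing byte; NA when nothing differs.
first_diff <- which(ipc_bytes != ipc_bytes_disk)[1]
```

If `same_bytes` is `TRUE` for the real vector as well, the difference between the two paths is unlikely to be in the bytes themselves. Note that readRDS returns a vector freshly allocated by R, which may be relevant if the original vector's memory is managed differently.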

@asfimport

Neal Richardson / @nealrichardson:
Yeah, that is interesting. If the raw vector itself is fine when you readRDS it and convert it to a Table, then I wonder whether the issue lies in your C++ code that makes the query and receives the raw vector, specifically in how it owns the memory. Comparing the bits also sounds like a good idea; let me know what you find.

@asfimport

Neal Richardson / @nealrichardson:
[~meztez] is this still an issue for you?

@asfimport

Bruno Tremblay:
I've disabled threading on Windows for the moment. I do not think I will revisit this issue soon, as I could not find any difference in the bytes themselves and could not get rid of the issue after several hours.

 

I tested with the latest R just now and it still hangs. It made me think about comparing pointer ownership.

 

Should I reinstall arrow too?
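For anyone wanting the same workaround, one possible sketch of disabling threading on Windows only is shown below. The `arrow.use_threads` option is the one used in the reprex; the helper name `disable_arrow_threads_on_windows` and its `os_type` parameter are hypothetical and not part of arrow or bigrquerystorage.

```r
# Hypothetical helper: turn off Arrow's multithreaded conversion on Windows.
# `os_type` is a parameter only so the behavior can be checked on any platform.
disable_arrow_threads_on_windows <- function(os_type = .Platform$OS.type) {
  on_windows <- identical(os_type, "windows")
  if (on_windows) {
    # Same option as in the reprex above.
    options(arrow.use_threads = FALSE)
  }
  # Invisibly report whether the option was changed.
  invisible(on_windows)
}

disable_arrow_threads_on_windows()
```

Calling this before the Table is converted to a data.frame leaves other platforms on their existing setting, at the cost of slower conversion on Windows.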

@asfimport

Bruno Tremblay:
So I've done further testing. I still encounter the same issue.

Comparing memory dumps did not yield satisfactory results.

I will try to get my hands on a different Windows install, or maybe a different CPU.

 

I'll update if I find something.

 

@asfimport

Neal Richardson / @nealrichardson:
We believe that this has been resolved in ARROW-8379. If you still experience this with version 6.0.0 or greater (after it is released in mid-October), please open a new issue.

@asfimport

Neal Richardson / @nealrichardson:
Correct, that's what we're going for in the 6.0.0 release, just to make the feature stable on Windows. After the release we're planning to experiment with some approaches to re-enable multithreading on Windows, but we wanted to eliminate the hanging behavior first, even if that meant a tradeoff in performance in the cases where it didn't deadlock.
