[R] Segmentation fault when using write_parquet() #34211
Comments
Thanks for reporting this @PMassicotte. A couple of things to suggest:
|
Thank you @thisisnic for replying.
r$> arrow::arrow_info()
Arrow package version: 11.0.0.2
Capabilities:
dataset TRUE
substrait FALSE
parquet TRUE
json TRUE
s3 TRUE
gcs TRUE
utf8proc TRUE
re2 TRUE
snappy TRUE
gzip TRUE
brotli TRUE
zstd TRUE
lz4 TRUE
lz4_frame TRUE
lzo FALSE
bz2 TRUE
jemalloc TRUE
mimalloc TRUE
Memory:
Allocator jemalloc
Current 0 bytes
Max 112.63 Mb
Runtime:
SIMD Level avx2
Detected SIMD Level avx2
Build:
C++ Library Version 11.0.0
C++ Compiler GNU
C++ Compiler Version 7.5.0
Thank you very much. |
I encountered a segfault when writing tables using Python's ParquetWriter; in particular, compression=gzip would segfault every time. I noticed that making the code non-threaded fixed the issue, so perhaps the compression types are using too many resources? |
Just wanted to chime in here since I'm experiencing a very similar error. It happens when I'm writing a data frame to an Arrow/Feather file. As for the OP, it works in arrow 10 but not 11, and it originates from the same line. Here is the top of the stack trace from running R with a debugger attached:
It happens during unit testing in my package, so I can reproduce it as often as I want locally; it also happens when running GitHub Actions on Ubuntu, Windows and macOS. Unfortunately, I have not been able to create a minimal reproducible example: calling the function that triggers the issue on its own, with the same inputs, does not give the error. So it seems to depend on something that happens earlier during my unit testing. I've tried turning off threading using
Attached are my sessionInfo and arrowInfo. If you have any ideas on how to debug further, please let me know.
SessionInfo
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.10
Matrix products: default
locale:
attached base packages:
other attached packages:
loaded via a namespace (and not attached):
arrowInfo
Arrow package version: 11.0.0.2
Capabilities: dataset TRUE
Memory: Allocator jemalloc
Runtime: SIMD Level avx2
Build: C++ Library Version 11.0.0 |
@egillax Thanks for reporting this. @PMassicotte Would you mind trying again with the debugger attached, but pasting the complete output from when it crashes? Sometimes the top line isn't quite the right one to give us enough clues, and I want to see if your error is the same as the one reported by @egillax. @paleolimbot Would you mind taking a look at this? If this is a common error for both of the folks above, the line where the crash seems to happen was modified in #14277, so you'll know more than me about that bit. |
@thisisnic Sure. How can I do that? Is there documentation showing how to attach the output of the debugger? |
@egillax Is the package you're testing available publicly? It would help me a lot to fix this if I can reproduce locally. @PMassicotte I don't recall if we have instructions but it looks like you're on Ubuntu so you could probably do |
Based on @jiggunjer's observation, you all could also try setting:
(We added some cancellation features in 11.0.0 and that is a way to turn them off) |
Here is what I am getting:
|
Biogeochemical Argo!!!! (Cool to see it here...that's what I did in my previous job!) Since all these seem related to |
Ah, cool to meet a fellow bioargo colleague :)
glimpse(bioargo_dark_corrected)
Rows: 326,727
Columns: 111
$ filename <chr> "4902602_Sprof.nc", "4…
$ floatname <chr> "4902602", "4902602", …
$ takuse <chr> "takuse002b", "takuse0…
$ date_time <dttm> 2021-10-26 14:23:06, …
$ juld <dbl> 26231.6, 26231.6, 2623…
$ juld_qc <chr> "1", "1", "1", "1", "1…
$ juld_location <dbl> 26231.62, 26231.62, 26…
$ pres <dbl> 0.00, 0.10, 0.13, 0.28…
$ pres_qc <chr> "1", "1", "1", "1", "1…
$ pres_adjusted <dbl> NA, NA, NA, NA, -0.050…
$ pres_adjusted_qc <chr> " ", " ", " ", " ", "1…
$ pres_adjusted_error <dbl> NA, NA, NA, NA, NA, NA…
$ temp <dbl> -0.5600167, -0.5598500…
$ temp_qc <chr> "3", "3", "3", "3", "3…
$ temp_d_pres <dbl> -0.05, 0.03, 0.00, 0.0…
$ temp_adjusted <dbl> -0.5680000, -0.5680000…
$ temp_adjusted_qc <chr> "8", "1", "8", "8", "1…
$ temp_adjusted_error <dbl> NA, NA, NA, NA, NA, NA…
$ psal <dbl> 31.58545, 31.58433, 31…
$ psal_qc <chr> "3", "3", "3", "3", "3…
$ psal_d_pres <dbl> -0.05, 0.03, 0.00, 0.0…
$ psal_adjusted <dbl> 31.58900, 31.58900, 31…
$ psal_adjusted_qc <chr> "8", "1", "8", "8", "1…
$ psal_adjusted_error <dbl> NA, NA, NA, NA, NA, NA…
$ doxy <dbl> NA, NA, NA, NA, NA, NA…
$ doxy_qc <chr> " ", " ", " ", " ", " …
$ doxy_d_pres <dbl> NA, NA, NA, NA, NA, NA…
$ doxy_adjusted <dbl> NA, NA, NA, NA, NA, NA…
$ doxy_adjusted_qc <chr> " ", " ", " ", " ", " …
$ doxy_adjusted_error <dbl> NA, NA, NA, NA, NA, NA…
$ down_irradiance380 <dbl> 0.004928912, 0.0049844…
$ down_irradiance380_qc <chr> "1", "1", " ", " ", "8…
$ down_irradiance380_d_pres <dbl> 0.02, 0.02, NA, NA, -0…
$ down_irradiance380_adjusted <dbl> NA, NA, NA, NA, NA, NA…
$ down_irradiance380_adjusted_qc <chr> " ", " ", " ", " ", " …
$ down_irradiance380_adjusted_error <dbl> NA, NA, NA, NA, NA, NA…
$ down_irradiance412 <dbl> 0.007823820, 0.0079019…
$ down_irradiance412_qc <chr> "1", "1", " ", " ", "8…
$ down_irradiance412_d_pres <dbl> 0.02, 0.02, NA, NA, -0…
$ down_irradiance412_adjusted <dbl> NA, NA, NA, NA, NA, NA…
$ down_irradiance412_adjusted_qc <chr> " ", " ", " ", " ", " …
$ down_irradiance412_adjusted_error <dbl> NA, NA, NA, NA, NA, NA…
$ down_irradiance490 <dbl> 0.007710332, 0.0077336…
$ down_irradiance490_qc <chr> "1", "1", " ", " ", "8…
$ down_irradiance490_d_pres <dbl> 0.02, 0.02, NA, NA, -0…
$ down_irradiance490_adjusted <dbl> NA, NA, NA, NA, NA, NA…
$ down_irradiance490_adjusted_qc <chr> " ", " ", " ", " ", " …
$ down_irradiance490_adjusted_error <dbl> NA, NA, NA, NA, NA, NA…
$ downwelling_par <dbl> 7.388872, 7.413715, NA…
$ downwelling_par_qc <chr> "1", "1", " ", " ", "8…
$ downwelling_par_d_pres <dbl> 0.02, 0.02, NA, NA, -0…
$ downwelling_par_adjusted <dbl> NA, NA, NA, NA, NA, NA…
$ downwelling_par_adjusted_qc <chr> " ", " ", " ", " ", " …
$ downwelling_par_adjusted_error <dbl> NA, NA, NA, NA, NA, NA…
$ chla <dbl> NA, 0.489100, NA, NA, …
$ chla_qc <chr> " ", "3", " ", " ", "3…
$ chla_d_pres <dbl> NA, 0.0, NA, NA, 0.0, …
$ chla_adjusted <dbl> NA, 0.2993, NA, NA, 0.…
$ chla_adjusted_qc <chr> " ", "5", " ", " ", "5…
$ chla_adjusted_error <dbl> NA, NA, NA, NA, NA, NA…
$ bbp700 <dbl> NA, 0.0007781871, NA, …
$ bbp700_qc <chr> " ", "2", " ", " ", "2…
$ bbp700_d_pres <dbl> NA, 0.0, NA, NA, 0.0, …
$ bbp700_adjusted <dbl> NA, NA, NA, NA, NA, NA…
$ bbp700_adjusted_qc <chr> " ", " ", " ", " ", " …
$ bbp700_adjusted_error <dbl> NA, NA, NA, NA, NA, NA…
$ cdom <dbl> NA, 0.7080, NA, NA, 0.…
$ cdom_qc <chr> " ", "0", " ", " ", "0…
$ cdom_d_pres <dbl> NA, 0.0, NA, NA, 0.0, …
$ cdom_adjusted <dbl> NA, NA, NA, NA, NA, NA…
$ cdom_adjusted_qc <chr> " ", " ", " ", " ", " …
$ cdom_adjusted_error <dbl> NA, NA, NA, NA, NA, NA…
$ nitrate <dbl> NA, NA, NA, NA, NA, NA…
$ nitrate_qc <chr> " ", " ", " ", " ", " …
$ nitrate_d_pres <dbl> NA, NA, NA, NA, NA, NA…
$ nitrate_adjusted <dbl> NA, NA, NA, NA, NA, NA…
$ nitrate_adjusted_qc <chr> " ", " ", " ", " ", " …
$ nitrate_adjusted_error <dbl> NA, NA, NA, NA, NA, NA…
$ n_levels <int> 4, 5, 6, 7, 8, 9, 10, …
$ n_prof <int> 1, 1, 1, 1, 1, 1, 1, 1…
$ cycle_number <int> 1, 1, 1, 1, 1, 1, 1, 1…
$ direction <chr> "A", "A", "A", "A", "A…
$ latitude <dbl> 72.71641, 72.71641, 72…
$ longitude <dbl> -66.70568, -66.70568, …
$ position_qc <chr> "1", "1", "1", "1", "1…
$ config_mission_number <int> 1, 1, 1, 1, 1, 1, 1, 1…
$ profile_pres_qc <chr> "A", "A", "A", "A", "A…
$ profile_temp_qc <chr> "B", "B", "B", "B", "B…
$ profile_psal_qc <chr> "B", "B", "B", "B", "B…
$ profile_doxy_qc <chr> "B", "B", "B", "B", "B…
$ profile_down_irradiance380_qc <chr> "A", "A", "A", "A", "A…
$ profile_down_irradiance412_qc <chr> "A", "A", "A", "A", "A…
$ profile_down_irradiance490_qc <chr> "A", "A", "A", "A", "A…
$ profile_downwelling_par_qc <chr> "A", "A", "A", "A", "A…
$ profile_chla_qc <chr> "B", "B", "B", "B", "B…
$ profile_bbp700_qc <chr> "A", "A", "A", "A", "A…
$ profile_cdom_qc <chr> " ", " ", " ", " ", " …
$ profile_nitrate_qc <chr> "A", "A", "A", "A", "A…
$ position <lgl> NA, NA, NA, NA, NA, NA…
$ profile_pres <lgl> NA, NA, NA, NA, NA, NA…
$ profile_temp <lgl> NA, NA, NA, NA, NA, NA…
$ profile_psal <lgl> NA, NA, NA, NA, NA, NA…
$ profile_doxy <lgl> NA, NA, NA, NA, NA, NA…
$ profile_down_irradiance380 <lgl> NA, NA, NA, NA, NA, NA…
$ profile_down_irradiance412 <lgl> NA, NA, NA, NA, NA, NA…
$ profile_down_irradiance490 <lgl> NA, NA, NA, NA, NA, NA…
$ profile_downwelling_par <lgl> NA, NA, NA, NA, NA, NA…
$ profile_chla <lgl> NA, NA, NA, NA, NA, NA…
$ profile_bbp700 <lgl> NA, NA, NA, NA, NA, NA…
$ profile_cdom <lgl> NA, NA, NA, NA, NA, NA…
$ profile_nitrate <lgl> NA, NA, NA, NA, NA, NA…
r$> str(bioargo_dark_corrected[integer(0), ])
tibble [0 × 111] (S3: tbl_df/tbl/data.frame)
$ filename : chr(0)
$ floatname : chr(0)
$ takuse : Named chr(0)
..- attr(*, "names")= chr(0)
$ date_time : 'POSIXct' num(0)
- attr(*, "tzone")= chr "UTC"
$ juld : num(0)
$ juld_qc : chr(0)
$ juld_location : num(0)
$ pres : num(0)
$ pres_qc : chr(0)
$ pres_adjusted : num(0)
$ pres_adjusted_qc : chr(0)
$ pres_adjusted_error : num(0)
$ temp : num(0)
$ temp_qc : chr(0)
$ temp_d_pres : num(0)
$ temp_adjusted : num(0)
$ temp_adjusted_qc : chr(0)
$ temp_adjusted_error : num(0)
$ psal : num(0)
$ psal_qc : chr(0)
$ psal_d_pres : num(0)
$ psal_adjusted : num(0)
$ psal_adjusted_qc : chr(0)
$ psal_adjusted_error : num(0)
$ doxy : num(0)
$ doxy_qc : chr(0)
$ doxy_d_pres : num(0)
$ doxy_adjusted : num(0)
$ doxy_adjusted_qc : chr(0)
$ doxy_adjusted_error : num(0)
$ down_irradiance380 : num(0)
$ down_irradiance380_qc : chr(0)
$ down_irradiance380_d_pres : num(0)
$ down_irradiance380_adjusted : num(0)
$ down_irradiance380_adjusted_qc : chr(0)
$ down_irradiance380_adjusted_error: num(0)
$ down_irradiance412 : num(0)
$ down_irradiance412_qc : chr(0)
$ down_irradiance412_d_pres : num(0)
$ down_irradiance412_adjusted : num(0)
$ down_irradiance412_adjusted_qc : chr(0)
$ down_irradiance412_adjusted_error: num(0)
$ down_irradiance490 : num(0)
$ down_irradiance490_qc : chr(0)
$ down_irradiance490_d_pres : num(0)
$ down_irradiance490_adjusted : num(0)
$ down_irradiance490_adjusted_qc : chr(0)
$ down_irradiance490_adjusted_error: num(0)
$ downwelling_par : num(0)
$ downwelling_par_qc : chr(0)
$ downwelling_par_d_pres : num(0)
$ downwelling_par_adjusted : num(0)
$ downwelling_par_adjusted_qc : chr(0)
$ downwelling_par_adjusted_error : num(0)
$ chla : num(0)
$ chla_qc : chr(0)
$ chla_d_pres : num(0)
$ chla_adjusted : num(0)
$ chla_adjusted_qc : chr(0)
$ chla_adjusted_error : num(0)
$ bbp700 : num(0)
$ bbp700_qc : chr(0)
$ bbp700_d_pres : num(0)
$ bbp700_adjusted : num(0)
$ bbp700_adjusted_qc : chr(0)
$ bbp700_adjusted_error : num(0)
$ cdom : num(0)
$ cdom_qc : chr(0)
$ cdom_d_pres : num(0)
$ cdom_adjusted : num(0)
$ cdom_adjusted_qc : chr(0)
$ cdom_adjusted_error : num(0)
$ nitrate : num(0)
$ nitrate_qc : chr(0)
$ nitrate_d_pres : num(0)
$ nitrate_adjusted : num(0)
$ nitrate_adjusted_qc : chr(0)
$ nitrate_adjusted_error : num(0)
$ n_levels : int(0)
$ n_prof : int(0)
$ cycle_number : int(0)
$ direction : chr(0)
$ latitude : num(0)
$ longitude : num(0)
$ position_qc : chr(0)
$ config_mission_number : int(0)
$ profile_pres_qc : chr(0)
$ profile_temp_qc : chr(0)
$ profile_psal_qc : chr(0)
$ profile_doxy_qc : chr(0)
$ profile_down_irradiance380_qc : chr(0)
$ profile_down_irradiance412_qc : chr(0)
$ profile_down_irradiance490_qc : chr(0)
$ profile_downwelling_par_qc : chr(0)
$ profile_chla_qc : chr(0)
$ profile_bbp700_qc : chr(0)
$ profile_cdom_qc : chr(0)
$ profile_nitrate_qc : chr(0)
$ position : logi(0)
[list output truncated] |
If I'm not mistaken, I believe you're even using a package I wrote to read the NetCDFs! I don't see any odd column types here, but there are certainly a lot of columns, and that may help us make a reproducer. |
Yes! I just tried:
This does not crash. So it looks like the problem (data type?) disappears if I save/read using another format. |
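The code that was tried is not shown above, so the following is only a sketch of the idea, assuming the round trip goes through R's native RDS format (an assumption; any non-Arrow format would illustrate the same point). The small data frame is a hypothetical stand-in for the real table. Columns read back this way are rebuilt from the file, so they presumably no longer carry whatever Arrow-specific state the in-memory table had, which would be consistent with the crash going away.

```r
# Hypothetical stand-in data; the real table has 326,727 rows and 111 columns.
df <- data.frame(
  pres = c(0.00, 0.10, 0.13),
  chla = c(NA, 0.4891, NA),
  pres_qc = c("1", "1", "1"),
  stringsAsFactors = FALSE
)

# Round trip through R's native RDS format instead of Parquet.
path <- tempfile(fileext = ".rds")
saveRDS(df, path)
restored <- readRDS(path)

# The values and column types survive the round trip unchanged.
round_trip_ok <- identical(df, restored)
```

If a table restored this way can then be passed to write_parquet() without crashing, that points at the in-memory representation of the columns rather than at their values.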
I think I was able to reproduce it.
library(tidyverse)
library(arrow)
file <- curl::curl_download("https://download849.mediafire.com/r4csstfcwwwgGquvCho4H6GtScoCJac108RL-q6X9MtoWuPDQvZOQAWhxQqlCjLj2RmsyzikhTZ0ijBElIAs5in5whbp-w/7dk60h8gnj4n1qj/bioargo_correction_b.parquet", destfile = tempfile(fileext = ".parquet"))
bioargo <- read_parquet(file)
bioargo
bioargo |>
group_by(takuse, date_time, n_prof) |>
filter(pres == max(pres)) |>
ggplot(aes(x = pres)) +
geom_histogram(binwidth = 10, color = "white")
bioargo_dark_corrected <- bioargo |>
group_by(takuse, date_time, n_prof) |>
mutate(chla = chla - min(chla, na.rm = TRUE)) |>
ungroup()
write_parquet(bioargo_dark_corrected, tempfile())
Can you confirm? |
Brilliant! I had to download the file separately but this is fantastic.
library(tidyverse)
library(arrow)
# Download from:
# https://download849.mediafire.com/r4csstfcwwwgGquvCho4H6GtScoCJac108RL-q6X9MtoWuPDQvZOQAWhxQqlCjLj2RmsyzikhTZ0ijBElIAs5in5whbp-w/7dk60h8gnj4n1qj/bioargo_correction_b.parquet
file <- "~/Desktop/bioargo_correction_b.parquet"
bioargo <- read_parquet(file)
bioargo
pdf(tempfile())
bioargo |>
group_by(takuse, date_time, n_prof) |>
filter(pres == max(pres)) |>
ggplot(aes(x = pres)) +
geom_histogram(binwidth = 10, color = "white")
dev.off()
bioargo_dark_corrected <- bioargo |>
group_by(takuse, date_time, n_prof) |>
mutate(chla = chla - min(chla, na.rm = TRUE)) |>
ungroup()
write_parquet(bioargo_dark_corrected, tempfile())
This reprex appears to crash R.
Standard output and error
*** caught segfault ***
address 0x18, cause 'invalid permissions'
Traceback:
1: Table__from_dots(dots, schema, option_use_threads())
2: Table$create(x, schema = schema)
3: as_arrow_table.data.frame(x)
4: as_arrow_table(x)
5: doTryCatch(return(expr), name, parentenv, handler)
6: tryCatchOne(expr, names, parentenv, handlers[[1L]])
7: tryCatchList(expr, classes, parentenv, handlers)
8: tryCatch(as_arrow_table(x), arrow_no_method_as_arrow_table = function(e) { abort("Object must be coercible to an Arrow Table using `as_arrow_table()`", parent = e, call = caller_env(2))})
9: as_writable_table(x)
10: write_parquet(bioargo_dark_corrected, tempfile())
11: eval(expr, envir, enclos)
12: eval(expr, envir, enclos)
13: eval_with_user_handlers(expr, envir, enclos, user_handlers)
14: withVisible(eval_with_user_handlers(expr, envir, enclos, user_handlers))
15: withCallingHandlers(withVisible(eval_with_user_handlers(expr, envir, enclos, user_handlers)), warning = wHandler, error = eHandler, message = mHandler)
16: doTryCatch(return(expr), name, parentenv, handler)
17: tryCatchOne(expr, names, parentenv, handlers[[1L]])
18: tryCatchList(expr, classes, parentenv, handlers)
19: tryCatch(expr, error = function(e) { call <- conditionCall(e) if (!is.null(call)) { if (identical(call[[1L]], quote(doTryCatch))) call <- sys.call(-4L) dcall <- deparse(call, nlines = 1L) prefix <- paste("Error in", dcall, ": ") LONG <- 75L sm <- strsplit(conditionMessage(e), "\n")[[1L]] w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w") if (is.na(w)) w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], type = "b") if (w > LONG) prefix <- paste0(prefix, "\n ") } else prefix <- "Error : " msg <- paste0(prefix, conditionMessage(e), "\n") .Internal(seterrmessage(msg[1L])) if (!silent && isTRUE(getOption("show.error.messages"))) { cat(msg, file = outFile) .Internal(printDeferredWarnings()) } invisible(structure(msg, class = "try-error", condition = e))})
20: try(f, silent = TRUE)
21: handle(ev <- withCallingHandlers(withVisible(eval_with_user_handlers(expr, envir, enclos, user_handlers)), warning = wHandler, error = eHandler, message = mHandler))
22: timing_fn(handle(ev <- withCallingHandlers(withVisible(eval_with_user_handlers(expr, envir, enclos, user_handlers)), warning = wHandler, error = eHandler, message = mHandler)))
23: evaluate_call(expr, parsed$src[[i]], envir = envir, enclos = enclos, debug = debug, last = i == length(out), use_try = stop_on_error != 2L, keep_warning = keep_warning, keep_message = keep_message, output_handler = output_handler, include_timing = include_timing)
24: evaluate::evaluate(...)
25: evaluate(code, envir = env, new_device = FALSE, keep_warning = if (is.numeric(options$warning)) TRUE else options$warning, keep_message = if (is.numeric(options$message)) TRUE else options$message, stop_on_error = if (is.numeric(options$error)) options$error else { if (options$error && options$include) 0L else 2L }, output_handler = knit_handlers(options$render, options))
26: in_dir(input_dir(), expr)
27: in_input_dir(evaluate(code, envir = env, new_device = FALSE, keep_warning = if (is.numeric(options$warning)) TRUE else options$warning, keep_message = if (is.numeric(options$message)) TRUE else options$message, stop_on_error = if (is.numeric(options$error)) options$error else { if (options$error && options$include) 0L else 2L }, output_handler = knit_handlers(options$render, options)))
28: eng_r(options)
29: block_exec(params)
30: call_block(x)
31: process_group.block(group)
32: process_group(group)
33: withCallingHandlers(if (tangle) process_tangle(group) else process_group(group), error = function(e) { setwd(wd) cat(res, sep = "\n", file = output %n% "") message("Quitting from lines ", paste(current_lines(i), collapse = "-"), " (", knit_concord$get("infile"), ") ") })
34: process_file(text, output)
35: knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
36: rmarkdown::render(input, quiet = TRUE, envir = globalenv(), encoding = "UTF-8")
37: (function (input) { rmarkdown::render(input, quiet = TRUE, envir = globalenv(), encoding = "UTF-8")})(input = base::quote("loyal-rat_reprex.R"))
38: (function (what, args, quote = FALSE, envir = parent.frame()) { if (!is.list(args)) stop("second argument must be a list") if (quote) args <- lapply(args, enquote) .Internal(do.call(what, args, envir))})(base::quote(function (input) { rmarkdown::render(input, quiet = TRUE, envir = globalenv(), encoding = "UTF-8")}), base::quote(list(input = "loyal-rat_reprex.R")), envir = base::quote(<environment>), quote = base::quote(TRUE))
39: do.call(do.call, c(readRDS("/var/folders/p5/sxv05ml96sd1n2p3ssfhzzth0000gn/T//Rtmpod6Iee/callr-fun-768e1cc53b50"), list(envir = .GlobalEnv, quote = TRUE)), envir = .GlobalEnv, quote = TRUE)
40: saveRDS(do.call(do.call, c(readRDS("/var/folders/p5/sxv05ml96sd1n2p3ssfhzzth0000gn/T//Rtmpod6Iee/callr-fun-768e1cc53b50"), list(envir = .GlobalEnv, quote = TRUE)), envir = .GlobalEnv, quote = TRUE), file = "/var/folders/p5/sxv05ml96sd1n2p3ssfhzzth0000gn/T//Rtmpod6Iee/callr-res-768e58b90ff1", compress = FALSE)
41: withCallingHandlers({ NULL saveRDS(do.call(do.call, c(readRDS("/var/folders/p5/sxv05ml96sd1n2p3ssfhzzth0000gn/T//Rtmpod6Iee/callr-fun-768e1cc53b50"), list(envir = .GlobalEnv, quote = TRUE)), envir = .GlobalEnv, quote = TRUE), file = "/var/folders/p5/sxv05ml96sd1n2p3ssfhzzth0000gn/T//Rtmpod6Iee/callr-res-768e58b90ff1", compress = FALSE) flush(stdout()) flush(stderr()) NULL invisible()}, error = function(e) { { callr_data <- as.environment("tools:callr")$`__callr_data__` err <- callr_data$err if (FALSE) { assign(".Traceback", .traceback(4), envir = callr_data) dump.frames("__callr_dump__") assign(".Last.dump", .GlobalEnv$`__callr_dump__`, envir = callr_data) rm("__callr_dump__", envir = .GlobalEnv) } e <- err$process_call(e) e2 <- err$new_error("error in callr subprocess") class(e2) <- c("callr_remote_error", class(e2)) e2 <- err$add_trace_back(e2) cut <- which(e2$trace$scope == "global")[1] if (!is.na(cut)) { e2$trace <- e2$trace[-(1:cut), ] } saveRDS(list("error", e2, e), file = paste0("/var/folders/p5/sxv05ml96sd1n2p3ssfhzzth0000gn/T//Rtmpod6Iee/callr-res-768e58b90ff1", ".error")) }}, interrupt = function(e) { { callr_data <- as.environment("tools:callr")$`__callr_data__` err <- callr_data$err if (FALSE) { assign(".Traceback", .traceback(4), envir = callr_data) dump.frames("__callr_dump__") assign(".Last.dump", .GlobalEnv$`__callr_dump__`, envir = callr_data) rm("__callr_dump__", envir = .GlobalEnv) } e <- err$process_call(e) e2 <- err$new_error("error in callr subprocess") class(e2) <- c("callr_remote_error", class(e2)) e2 <- err$add_trace_back(e2) cut <- which(e2$trace$scope == "global")[1] if (!is.na(cut)) { e2$trace <- e2$trace[-(1:cut), ] } saveRDS(list("error", e2, e), file = paste0("/var/folders/p5/sxv05ml96sd1n2p3ssfhzzth0000gn/T//Rtmpod6Iee/callr-res-768e58b90ff1", ".error")) }}, callr_message = function(e) { try(signalCondition(e))})
42: doTryCatch(return(expr), name, parentenv, handler)
43: tryCatchOne(expr, names, parentenv, handlers[[1L]])
44: tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
45: doTryCatch(return(expr), name, parentenv, handler)
46: tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]), names[nh], parentenv, handlers[[nh]])
47: tryCatchList(expr, classes, parentenv, handlers)
48: tryCatch(withCallingHandlers({ NULL saveRDS(do.call(do.call, c(readRDS("/var/folders/p5/sxv05ml96sd1n2p3ssfhzzth0000gn/T//Rtmpod6Iee/callr-fun-768e1cc53b50"), list(envir = .GlobalEnv, quote = TRUE)), envir = .GlobalEnv, quote = TRUE), file = "/var/folders/p5/sxv05ml96sd1n2p3ssfhzzth0000gn/T//Rtmpod6Iee/callr-res-768e58b90ff1", compress = FALSE) flush(stdout()) flush(stderr()) NULL invisible()}, error = function(e) { { callr_data <- as.environment("tools:callr")$`__callr_data__` err <- callr_data$err if (FALSE) { assign(".Traceback", .traceback(4), envir = callr_data) dump.frames("__callr_dump__") assign(".Last.dump", .GlobalEnv$`__callr_dump__`, envir = callr_data) rm("__callr_dump__", envir = .GlobalEnv) } e <- err$process_call(e) e2 <- err$new_error("error in callr subprocess") class(e2) <- c("callr_remote_error", class(e2)) e2 <- err$add_trace_back(e2) cut <- which(e2$trace$scope == "global")[1] if (!is.na(cut)) { e2$trace <- e2$trace[-(1:cut), ] } saveRDS(list("error", e2, e), file = paste0("/var/folders/p5/sxv05ml96sd1n2p3ssfhzzth0000gn/T//Rtmpod6Iee/callr-res-768e58b90ff1", ".error")) }}, interrupt = function(e) { { callr_data <- as.environment("tools:callr")$`__callr_data__` err <- callr_data$err if (FALSE) { assign(".Traceback", .traceback(4), envir = callr_data) dump.frames("__callr_dump__") assign(".Last.dump", .GlobalEnv$`__callr_dump__`, envir = callr_data) rm("__callr_dump__", envir = .GlobalEnv) } e <- err$process_call(e) e2 <- err$new_error("error in callr subprocess") class(e2) <- c("callr_remote_error", class(e2)) e2 <- err$add_trace_back(e2) cut <- which(e2$trace$scope == "global")[1] if (!is.na(cut)) { e2$trace <- e2$trace[-(1:cut), ] } saveRDS(list("error", e2, e), file = paste0("/var/folders/p5/sxv05ml96sd1n2p3ssfhzzth0000gn/T//Rtmpod6Iee/callr-res-768e58b90ff1", ".error")) }}, callr_message = function(e) { try(signalCondition(e))}), error = function(e) { NULL if (TRUE) { try(stop(e)) } else { invisible() }}, interrupt = function(e) { NULL 
if (TRUE) { e } else { invisible() }})
An irrecoverable exception occurred. R is aborting now ... |
It looks like this is a problem with ALTREP bypass: the arrays (probably most of them) are already ALTREP vectors coming from Arrow, and the segfault occurs when we try to get the chunked array back:
|
Nice! thank you for your rapid response! |
…ting to access the underlying ChunkedArray (#34489)

### Rationale for this change

When we attempt to re-use an object that Arrow itself created previously by wrapping a chunked array, we will get a crash if this object has been materialized (i.e., R values have been accessed and the ChunkedArray reference deleted). This behaviour changed between 10.0.0 and 11.0.0 because I redid the ALTREP implementation just after the 10.0.0 release.

The following test crashes R on main and 11.0.0 but passes after this PR:

``` r
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
library(testthat, warn.conflicts = FALSE)

withr::local_namespace("arrow")

test_that("Materialized ALTREP arrays don't cause arrow to crash when attempting to bypass", {
  a_int <- Array$create(c(1L, 2L, 3L))
  b_int <- a_int$as_vector()
  expect_true(is_arrow_altrep(b_int))
  expect_false(test_arrow_altrep_is_materialized(b_int))

  # Some operations that use altrep bypass
  expect_equal(infer_type(b_int), int32())
  expect_equal(as_arrow_array(b_int), a_int)

  # Still shouldn't have materialized yet
  expect_false(test_arrow_altrep_is_materialized(b_int))

  # Force it to materialize and check again
  test_arrow_altrep_force_materialize(b_int)
  expect_true(test_arrow_altrep_is_materialized(b_int))
  expect_equal(infer_type(b_int), int32())
  expect_equal(as_arrow_array(b_int), a_int)
})
#> Test passed 🎉
```

### What changes are included in this PR?

We used a function called `is_arrow_altrep()` to check if we could safely access the ChunkedArray reference; however, *materialized* ALTREP arrays still cause this to return `true`. I added a new function `is_unmaterialized_arrow_altrep()` and replaced usage that depended on the ChunkedArray actually existing to use it.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.

* Closes: #34211

Authored-by: Dewey Dunnington <dewey@voltrondata.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
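The distinction the fix relies on can be illustrated with a toy model in base R. This is only a sketch under stated assumptions: none of the functions below belong to arrow, all names are hypothetical, and a plain environment stands in for the ALTREP object and its ChunkedArray reference. It shows why a check that only asks "is this one of our wrappers?" stays true after materialization, while the stricter check also confirms that the backing reference still exists.

```r
# Toy model only: these helpers are hypothetical and are NOT arrow's API.
# A "lazy vector" keeps a reference to its backing store until it is
# materialized, at which point the reference is dropped.
new_lazy_vector <- function(backing) {
  state <- new.env()
  state$backing <- backing  # stands in for the ChunkedArray reference
  state$values <- NULL      # stands in for the materialized R values
  structure(list(state = state), class = "lazy_vector")
}

materialize <- function(x) {
  state <- x$state
  if (is.null(state$values)) {
    state$values <- state$backing
    state$backing <- NULL   # reference released on materialization
  }
  invisible(x)
}

# Analogue of the original check: still TRUE after materialization.
is_lazy_vector <- function(x) inherits(x, "lazy_vector")

# Analogue of the stricter check: TRUE only while the backing store exists.
is_unmaterialized_lazy_vector <- function(x) {
  is_lazy_vector(x) && !is.null(x$state$backing)
}

# Bypass path: reuse the backing store only when it is safe to do so,
# otherwise fall back to the materialized values.
get_backing_or_values <- function(x) {
  if (is_unmaterialized_lazy_vector(x)) x$state$backing else x$state$values
}

v <- new_lazy_vector(c(1L, 2L, 3L))
before <- is_unmaterialized_lazy_vector(v)
materialize(v)
after <- is_unmaterialized_lazy_vector(v)
```

In this model, guarding the bypass with `is_lazy_vector()` alone would hand back a missing backing store once `materialize()` has run, which is the toy analogue of dereferencing the deleted ChunkedArray reference.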
Describe the bug, including details regarding any error messages, version, and platform.
I am randomly getting a segfault when using write_parquet() with the latest release (the same code works well with v10.0.1). Following this guide (https://arrow.apache.org/docs/7.0/r/articles/developers/debugging.html), here is the exact line where the code crashes.
Component(s)
R