
fwrite "segfault from C stack overflow" for very wide (1,344,389 column) DT #1903

Closed
ambils opened this Issue Nov 7, 2016 · 5 comments


ambils commented Nov 7, 2016

I am using the latest development version of data.table and when I am trying to fwrite a huge table, it errors out with the message "segfault from C stack overflow".

The github version which I installed two weeks back on another system is working perfectly for the same file.

Could you please fix this?


Member

mattdowle commented Nov 7, 2016

I'll need much more information, please. How many rows and columns does the data.table have? What types are the columns? Please run with verbose=TRUE and provide the output. Which version of R? Which operating system?


Member

mattdowle commented Nov 7, 2016

If you are on Windows, please make sure to purge the old version fully; a reboot may be required, even these days. We've seen problems before where Windows appears to keep the old .dll hanging around, and when that happens it causes a segfault like the one you described. Also, please try the very latest version as of today, as there have been changes in the last few days. (The verbose=TRUE output will give me more info now, for example, if you can provide that.)
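A clean reinstall along these lines usually rules out a stale .dll (a sketch, not official guidance; it assumes the devtools package is available and uses the Rdatatable/data.table GitHub repository):

```r
# Remove the installed copy, restart R (reboot on Windows if the .dll is
# locked), then reinstall the current development version from GitHub.
remove.packages("data.table")
# -- restart R here --
install.packages("devtools")                      # if not already installed
devtools::install_github("Rdatatable/data.table")
packageVersion("data.table")                      # confirm the new build loaded
```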

@mattdowle mattdowle added this to the v1.9.8 milestone Nov 7, 2016


ambils commented Nov 8, 2016

Sorry about the lack of clarity.
fwrite works fine when I write the original table (1344388 rows, 2015 cols), but it fails when I try to save the transposed table (2010 rows, 1344389 cols, including the rownames column).

Please find all information below:

library(data.table)
data.table 1.9.7 IN DEVELOPMENT built 2016-11-05 10:54:12 UTC; travis
The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
Release notes, videos and slides: http://r-datatable.com
hapsPop=as.data.frame(fread("Q1005__aachanged.haps",stringsAsFactors=F,header=F))
Read 1344388 rows and 2015 (of 2015) columns from 5.078 GB file in 00:02:08
hapsPop=hapsPop[nchar(as.character(hapsPop[,4]))==1 & nchar(as.character(hapsPop[,5]))==1, ]
bin = hapsPop
a=bin[,6:ncol(bin)]
a=as.matrix(a)
a[a == 1] = 2
a[a == 0] = 1
a[is.na(a)] = 0   # a == NA never matches anything; is.na() is needed here
t.a=transposeBigData(a)
t.a = setDT(as.data.frame(t.a), keep.rownames = TRUE)[]
fwrite(t.a,file="test.txt",col.names=F,sep=" ",quote=T,verbose=TRUE)
Error: segfault from C stack overflow
dim(t.a)
[1] 2010 1344389
fwrite(hapsPop,file="test.txt",col.names=F,sep=" ",quote=T,verbose=TRUE)
maxLineLen=4083 from sample. Found in 0.100s
Writing column names ... done in 0.000s
Writing 1344388 rows in 1310 batches of 1027 rows (each buffer size 8MB, turbo=1, showProgress=1, nth=32) ...
Written 13.2% of 1344388 rows in 2 secs using 32 threads. anyBufferGrown=no; maxBuffUsed=49%. Finished in 13 secs.
Written 32.8% of 1344388 rows in 3 secs using 32 threads. anyBufferGrown=no; maxBuffUsed=49%. Finished in 6 secs.
Written 52.3% of 1344388 rows in 4 secs using 32 threads. anyBufferGrown=no; maxBuffUsed=49%. Finished in 3 secs.
Written 69.4% of 1344388 rows in 5 secs using 32 threads. anyBufferGrown=no; maxBuffUsed=49%. Finished in 2 secs.
Written 86.6% of 1344388 rows in 6 secs using 32 threads. anyBufferGrown=no; maxBuffUsed=49%. Finished in 0 secs.
done (actual nth=32, anyBufferGrown=no, maxBuffUsed=49%)
dim(hapsPop)
[1] 1344388 2015
sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

locale:
[1] LC_CTYPE=en_IN LC_NUMERIC=C LC_TIME=en_IN
[4] LC_COLLATE=en_IN LC_MONETARY=en_IN LC_MESSAGES=en_IN
[7] LC_PAPER=en_IN LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.9.7

@jangorecki jangorecki added the fwrite label Nov 9, 2016

@mattdowle mattdowle changed the title from fwrite errors out with "segfault from C stack overflow" for large files to fwrite "segfault from C stack overflow" for very wide (1,344,389 column) DT Nov 10, 2016


Member

mattdowle commented Nov 10, 2016

Excellent, thanks. I'll try to reproduce. It's quite feasible that the rows-per-thread calculation is going wrong in this very wide case.
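To see how that could fail, here is a hypothetical illustration (not fwrite's actual code) of how dividing a fixed per-thread buffer by the length of a very wide row can collapse the batch size to a degenerate value:

```r
# Rows-per-batch is roughly buffer size %/% max line length. With 1,344,389
# columns the line is enormous, so the integer division approaches zero --
# a value that, if not guarded against, can lead to out-of-bounds writes.
buffSize   <- 8 * 1024^2          # 8MB buffer per thread (as in verbose output)
nCols      <- 1344389
buffSize %/% (nCols * 2)          # ~2 bytes per field ("0 ", "1 "): 3 rows/batch
buffSize %/% (nCols * 10)         # wider fields (e.g. 10 bytes): 0 rows/batch
```

The narrow table in the verbose output above (maxLineLen=4083) gives a comfortable 1027 rows per batch, which is why only the transposed, very wide table triggers the crash.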


Member

mattdowle commented Nov 11, 2016

Reproduced and fixed. Thanks again.

@mattdowle mattdowle closed this in 4fb148f Nov 11, 2016
