
fwrite "segfault from C stack overflow" for very wide (1,344,389 column) DT #1903

Closed
asivadas opened this issue Nov 7, 2016 · 5 comments


asivadas commented Nov 7, 2016

I am using the latest development version of data.table, and when I try to fwrite a huge table it errors out with the message "segfault from C stack overflow".

The GitHub version I installed two weeks ago on another system works perfectly for the same file.

Could you please fix this?

@mattdowle

I'll need much more information, please. How many rows and columns does the data.table have? What types are the columns? Please run with verbose=TRUE and provide the output. Which version of R? Which operating system?

@mattdowle

If you are on Windows, please ensure you purge the old version fully; a reboot may be required, even these days. We've seen problems before where Windows appears to keep the old .dll hanging around, and when that happens it causes a segfault like the one you describe. Also, please try the very latest version as of today, as there have been changes in the last few days. (The verbose=TRUE output will give me more information too, if you can provide it.)

@mattdowle mattdowle added this to the v1.9.8 milestone Nov 7, 2016
@asivadas

asivadas commented Nov 8, 2016

Sorry about the lack of clarity.
fwrite works fine when I write the file itself (1,344,388 rows, 2,015 cols), but it fails when I try to save the transposed file (2,010 rows, 1,344,388 cols).

Please find all information below:

library(data.table)
data.table 1.9.7 IN DEVELOPMENT built 2016-11-05 10:54:12 UTC; travis
The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
Release notes, videos and slides: http://r-datatable.com
hapsPop=as.data.frame(fread("Q1005__aachanged.haps",stringsAsFactors=F,header=F))
Read 1344388 rows and 2015 (of 2015) columns from 5.078 GB file in 00:02:08
hapsPop=hapsPop[nchar(as.character(hapsPop[,4]))==1 & nchar(as.character(hapsPop[,5]))==1, ]
bin = hapsPop
a=bin[,6:ncol(bin)]
a=as.matrix(a)
a[a == 1] = 2
a[a == 0] = 1
a[is.na(a)] = 0  # a == NA is always NA; use is.na() to recode missing values
t.a=transposeBigData(a)
t.a = setDT(as.data.frame(t.a), keep.rownames = TRUE)[]
fwrite(t.a,file="test.txt",col.names=F,sep=" ",quote=T,verbose=TRUE)
Error: segfault from C stack overflow
dim(t.a)
[1] 2010 1344389
fwrite(hapsPop,file="test.txt",col.names=F,sep=" ",quote=T,verbose=TRUE)
maxLineLen=4083 from sample. Found in 0.100s
Writing column names ... done in 0.000s
Writing 1344388 rows in 1310 batches of 1027 rows (each buffer size 8MB, turbo=1, showProgress=1, nth=32) ...
Written 13.2% of 1344388 rows in 2 secs using 32 threads. anyBufferGrown=no; maxBuffUsed=49%. Finished in 13 secs.
Written 32.8% of 1344388 rows in 3 secs using 32 threads. anyBufferGrown=no; maxBuffUsed=49%. Finished in 6 secs.
Written 52.3% of 1344388 rows in 4 secs using 32 threads. anyBufferGrown=no; maxBuffUsed=49%. Finished in 3 secs.
Written 69.4% of 1344388 rows in 5 secs using 32 threads. anyBufferGrown=no; maxBuffUsed=49%. Finished in 2 secs.
Written 86.6% of 1344388 rows in 6 secs using 32 threads. anyBufferGrown=no; maxBuffUsed=49%. Finished in 0 secs.
done (actual nth=32, anyBufferGrown=no, maxBuffUsed=49%)
dim(hapsPop)
[1] 1344388 2015
sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

locale:
[1] LC_CTYPE=en_IN LC_NUMERIC=C LC_TIME=en_IN
[4] LC_COLLATE=en_IN LC_MONETARY=en_IN LC_MESSAGES=en_IN
[7] LC_PAPER=en_IN LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.9.7

@mattdowle mattdowle changed the title fwrite errors out with "segfault from C stack overflow" for large files fwrite "segfault from C stack overflow" for very wide (1,344,389 column) DT Nov 10, 2016
@mattdowle

Excellent, thanks. I'll try to reproduce. It's quite feasible that the rows-per-thread calculation is going wrong in this very wide case.

@mattdowle

Reproduced and fixed. Thanks again.
