Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SEGFAULT in helper_feature, corrupt matrix from weight_matrix_cpp? #281

Open
niklasfries opened this issue Sep 27, 2021 · 4 comments
Open

Comments

@niklasfries
Copy link

Occasionally I get SEGFAULT on R/features.R:391: dt[, sample_frequence := as.integer(.N), by = cnms].
The error claims that the stack size is exceeded, but that's not true (unless there's a bug in data.table).
My guess is that the matrix returned from weight_matrix_cpp is corrupted, and that modifying the matrix causes the problem.
A simple and robust workaround could be to create a new table with the frequencies instead of reusing the returned feature matrix.

@martinju
Copy link
Member

Hmm, I have not experienced something like this, and it puzzles me, as the cpp-function is very simple (I guess you mean feature_matrix_cpp, not weight_matrix_cpp?)

Could you provide an example where you (occasionally) have experienced this error?

What system/R-version are you on?

@niklasfries
Copy link
Author

Yes I mean feature_matrix_cpp, sorry about that.
I am using Ubuntu with R 4.1.0, RcppArmadillo 0.10.6.0.0, data.table 1.14.0, shapr 0.2.0.900.

I agree that it's not a problem in the shapr code, it's probably something in RcppArmadillo, maybe an allocator problem.
A flaw in this theory is that the return value from feature_matrix_cpp is used to create a new data.table on R/features.R:388: dt <- data.table::data.table(x), and my guess is that this is pass by value.

I wouldn't have reported the issue if I saw a problem with the workaround, but I have no experience with data.table so I don't know how much work that would be.
I can try to construct an example if you feel like this is worthwhile.

@martinju
Copy link
Member

That is almost exactly the same versions as me (except I am on R 4.1.1). What Rcpp version do you have? I am on 1.0.7.

I don't quite understand your workaround. Do you mean keeping the frequencies separate from the (possibly corrupted) dt?
One of the benefits of data.table is to avoid duplicating data, and we one to keep it like that whenever possible.

@niklasfries
Copy link
Author

I have Rcpp 1.0.7 as well.
In helper_feature the matrix from feature_matrix_cpp is used to create a data.table with m columns, and then two more columns are added: sample_frequence and is_duplicate.
Next, the original columns are deleted, and only the two new columns are returned.
My suggestion is that a new table is created for the new columns, so that the original data.table is treated as read only.
I'm not sure whether I would fix this if it was my project, but at least now you know that there is a potential problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants