
xgboost x - segmentation fault #4

Open

mglowacki100 opened this issue Jul 19, 2016 · 7 comments
@mglowacki100
Segmentation fault (core dumped) occurs randomly when I run the xgboost x model through R (02-models.build.R)...
I need to double-check, but as far as I remember there was no such problem with model.xgbx.x.stack.py (run in Spyder) with hard-coded options.

@diefimov
Owner

It happened to me before, when the data did not fit in RAM. Try decreasing the dataset and check whether you get the same error.
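One quick way to test whether the crash is memory-related is to load only a prefix of the file before training. A minimal sketch using pandas' nrows argument (synthetic in-memory data stands in for the real train.csv; the column names are hypothetical):

```python
import io
import pandas as pd

# Simulate a large CSV in memory (stands in for the real train.csv).
big_csv = "id,feat,label\n" + "\n".join(f"{i},{i * 2},{i % 3}" for i in range(100))

# Load only the first 10 rows; the same trick with nrows=10000 on the
# real file lets you check whether the segfault goes away on a small set.
small = pd.read_csv(io.StringIO(big_csv), nrows=10)
print(len(small))  # 10
```

If the error disappears at small sizes and reappears as nrows grows, that points at memory pressure rather than a bad row.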

@mglowacki100
Author

I have 64 GB of RAM, and I've additionally set up a 256 GB swapfile, but the problem still occurs.
Now I suspect this line: sys.path.append('/Users/ef/xgboost/wrapper'). I didn't update it.

@mglowacki100
Author

That line doesn't matter.
I've tried running directly in Spyder with hard-coded params; now the script stalls instead of segfaulting. All cores are still in use, but total usage is only about 20%...
This is a similar issue:
dmlc/xgboost#209

@diefimov
Owner

Did you try reducing the dataset (taking only the first 10,000 lines, for example)? My guess is that if it is not a memory problem, then xgboost stalls because of some problem with the data.

@mglowacki100
Author

I've shrunk both test.csv (first 10k rows) and train.csv (random 10k rows, so that all labels are represented), and there is no segmentation fault or stalling.
By the way, how much RAM do you use for the full dataset?
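Sampling "random rows so that all labels are represented" can be done as a stratified sample per class. A minimal pandas sketch (toy frame stands in for train.csv; the 'label' column name is an assumption):

```python
import pandas as pd

# Toy frame standing in for train.csv: 90 rows, three classes of 30 each.
df = pd.DataFrame({"x": range(90), "label": [0, 1, 2] * 30})

# Sample a fixed fraction from each class so every label survives the cut.
small = df.groupby("label", group_keys=False).sample(frac=0.1, random_state=0)
print(sorted(small["label"].unique().tolist()))  # [0, 1, 2]
print(len(small))  # 9
```

A plain random sample could, by chance, drop a rare class entirely, which would change the shape of the prediction matrix downstream; sampling per group avoids that.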

@diefimov
Owner

I work on a Mac with 32 GB of RAM. Try increasing the dataset gradually. I suspect the problem is with some line in the dataset; probably some entries are not appropriate for xgboost (it could be a problem with NA values, for example).

@mglowacki100
Author

I've tried a few combinations, and this time, besides the segmentation fault in the logs, I got a more meaningful message in RStudio:

train - all, test - 10k first
combine.preds("train_raw/model.xgbx", 10)
Reading epoch 0 ...
Reading epoch 1 ...
Reading epoch 2 ...
Reading epoch 3 ...
Reading epoch 4 ...
Reading epoch 5 ...
Reading epoch 6 ...
Reading epoch 7 ...
Reading epoch 8 ...
Reading epoch 9 ...
Error in rowSums(actual * predicted) :
error in evaluating the argument 'x' in selecting a method for function 'rowSums': Error in actual * predicted : non-conformable arrays

train - all random, test - all random
...
Error in rowSums(actual * predicted) :
error in evaluating the argument 'x' in selecting a method for function 'rowSums': Error in actual * predicted : non-conformable arrays

train - 60k random, test - 60k random
...
Error in rowSums(actual * predicted) :
error in evaluating the argument 'x' in selecting a method for function 'rowSums': Error in actual * predicted : non-conformable arrays

train - 30k random, test - 10k first
Error in cbind(data.pred, as.matrix(data.pred.epoch)) :
number of rows of matrices must match (see arg 2)
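The "non-conformable arrays" error suggests that actual and predicted ended up with different row counts, e.g. labels from the full training set multiplied elementwise against predictions for only the truncated test set. A minimal NumPy sketch of the same kind of mismatch (shapes and variable names are hypothetical, chosen to mirror R's actual * predicted):

```python
import numpy as np

# 'actual' carries the full set's rows, 'predicted' only a truncated set:
# elementwise multiplication then fails, analogous to R's
# "non-conformable arrays" inside rowSums(actual * predicted).
actual = np.zeros((20, 3))     # e.g. one-hot labels, 20 rows
predicted = np.zeros((10, 3))  # predictions for a truncated set, 10 rows

try:
    actual * predicted
except ValueError:
    print("shape mismatch")
```

This is consistent with the last case above, where cbind() complains directly that the number of rows of the matrices must match.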
