
Does a DMatrix initialized with a dense vs. sparse matrix lead to different results? #1634

Closed
breakhearts opened this issue Oct 4, 2016 · 2 comments

Comments

@breakhearts

I tried both the R and Python packages. When I initialize DMatrix with a dense matrix

dmodel = xgb.DMatrix(model, label=Y.iloc[40000:].values, feature_names=dt.columns)

or a sparse one

dmodel = xgb.DMatrix(csc_matrix(model), label=Y.iloc[40000:].values, feature_names=dt.columns)

the training accuracy differs. Is that correct? And should I always use a sparse matrix?

@khotilov
Member

khotilov commented Oct 4, 2016

> The training accuracy is different, is that correct?

Yes. "Sparse" elements are treated as "missing" by the tree booster and as zeros by the linear booster.

> And should I always use sparse matrix?

Use whichever works better for you.
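To make the dense-vs-sparse distinction concrete, here is a small sketch (synthetic data, not the original poster's `model`/`Y` variables). A SciPy CSC matrix stores only nonzero entries, so when you pass it to `xgb.DMatrix`, the zeros are simply absent; the tree booster then treats those absent entries as missing and routes them along each split's learned default direction, whereas a dense matrix presents the same zeros as observed feature values:

```python
import numpy as np
from scipy.sparse import csc_matrix

# A small dense matrix containing exact zeros.
dense = np.array([[1.0, 0.0],
                  [0.0, 2.0]])

# CSC format keeps only nonzero entries; the zeros are not stored at all.
sparse = csc_matrix(dense)
print(sparse.nnz)          # 2 stored values, not 4

# Consequently, a tree booster trained on xgb.DMatrix(dense) sees four
# observed values, while one trained on xgb.DMatrix(sparse) sees two
# observed values and two missing ones -- hence different models.
```

The same input values therefore describe two different datasets depending on the container, which is why the training accuracy can legitimately differ.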

@adamwlev

This was totally unexpected for me. Thanks for raising this @breakhearts

@tqchen tqchen closed this as completed Jul 4, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Oct 24, 2018