
Does a DMatrix initialized with a dense vs. sparse matrix lead to different results? #1634

Closed
breakhearts opened this issue Oct 4, 2016 · 2 comments

Comments

@breakhearts

I tried both the R and Python packages. When I initialize DMatrix with a dense matrix

dmodel = xgb.DMatrix(model, label=Y.iloc[40000:].values, feature_names=dt.columns)

or a sparse one

dmodel = xgb.DMatrix(csc_matrix(model), label=Y.iloc[40000:].values, feature_names=dt.columns)

the training accuracy differs. Is that correct? And should I always use a sparse matrix?

@khotilov
Member

khotilov commented Oct 4, 2016

> The training accuracy is different, is that correct?

Yes. "Sparse" elements are treated as "missing" by the tree booster and as zeros by the linear booster.

> And should I always use sparse matrix?

Use whichever works better for you.
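To make the dense-vs-sparse distinction concrete, here is a small sketch (synthetic data, not the original poster's `model`/`Y` variables). A SciPy CSC matrix stores only nonzero entries, so when you pass it to `xgb.DMatrix`, the zeros are simply absent; the tree booster then treats those absent entries as missing and routes them along each split's learned default direction, whereas a dense matrix presents the same zeros as observed feature values:

```python
import numpy as np
from scipy.sparse import csc_matrix

# A small dense matrix containing exact zeros.
dense = np.array([[1.0, 0.0],
                  [0.0, 2.0]])

# CSC format keeps only nonzero entries; the zeros are not stored at all.
sparse = csc_matrix(dense)
print(sparse.nnz)          # 2 stored values, not 4

# Consequently, a tree booster trained on xgb.DMatrix(dense) sees four
# observed values, while one trained on xgb.DMatrix(sparse) sees two
# observed values and two missing ones -- hence different models.
```

The same input values therefore describe two different datasets depending on the container, which is why the training accuracy can legitimately differ.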

@adamwlev

This was totally unexpected for me. Thanks for raising this @breakhearts

@tqchen tqchen closed this as completed Jul 4, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Oct 24, 2018