Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecation warning for lists passed into DMatrix #3970

Merged
merged 5 commits into from
Dec 14, 2018

Conversation

scwilkinson
Copy link
Contributor

The documentation does not include lists as an allowed type for the data inputted into DMatrix. Despite this, a list can be passed in without an error. This change would prevent a list form being passed in directly.

I experienced an issue where passing in a list vs a np.array resulted in different predictions (sometimes over 10% relative difference) for the same data. Though these differences were infrequent (~1.5% of cases tested), in certain applications this could cause serious issues.

The documentation does not include lists as an allowed type for the data inputted into DMatrix. Despite this, a list can be passed in without an error. This change would prevent a list form being passed in directly.

I experienced an issue where passing in a list vs a np.array resulted in different predictions (sometimes over 10% relative difference) for the same data. Though these differences were infrequent (~1.5% of cases tested), in certain applications this could cause serious issues.
@trivialfis
Copy link
Member

Internally the list is converted to csr_matrix from scipy. If we were to disable such a conversion, we might just disable it by removing:

try:
csr = scipy.sparse.csr_matrix(data)
self._init_from_csr(csr)
except:

@scwilkinson @hcho3 WDYT?

@scwilkinson
Copy link
Contributor Author

@trivialfis As far as I can tell, lists are the only problem datatype that can currently get through DMatrix. It seems then like the simplest way to avoid any issues they cause would be to specifically catch them.

@trivialfis
Copy link
Member

@scwilkinson Make sense to me. :) But this breaks a test from distributed tests. So I think maybe it's more appropriate to display a deprecation warning for now and remove it completely after the next release.
I can help changing your code if you don't mind.

@scwilkinson
Copy link
Contributor Author

@trivialfis That sounds like a good idea! Also handy for people who may be used to passing in lists, so they don't get surprised by their code suddenly throwing TypeErrors.

I'd appreciate the help!

@scwilkinson scwilkinson changed the title Ensure lists cannot be passed into DMatrix Deprecation warning for lists passed into DMatrix Dec 7, 2018
@trivialfis
Copy link
Member

@scwilkinson It doesn't seem I have the permission to push to your branch. Here is a small patch I made, thanks.

From c5e1f558b1bc8fb2661d3c38039f376eaef8a1f9 Mon Sep 17 00:00:00 2001
From: fis <ybbs.daans@hotmail.com>
Date: Mon, 10 Dec 2018 01:35:39 +1300
Subject: [PATCH] Use warning instead.

---
 python-package/xgboost/core.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/python-package/xgboost/core.py b/python-package/xgboost/core.py
index 225fe756..748a3da0 100644
--- a/python-package/xgboost/core.py
+++ b/python-package/xgboost/core.py
@@ -14,6 +14,7 @@ import ctypes
 import os
 import re
 import sys
+import warnings
 
 import numpy as np
 import scipy.sparse
@@ -382,7 +383,8 @@ class DMatrix(object):
         weight = _maybe_dt_array(weight)
 
         if isinstance(data, list):
-            raise TypeError('can not initialize DMatrix from list')
+            warnings.warn('Initializing DMatrix from List is deprecated.',
+                          DeprecationWarning)
         elif isinstance(data, STRING_TYPES):
             self.handle = ctypes.c_void_p()
             _check_call(_LIB.XGDMatrixCreateFromFile(c_str(data),
-- 
2.19.1

@scwilkinson
Copy link
Contributor Author

@trivialfis I've added a change similar to yours, but it continues with initialisation after raising the warning. Hopefully this will fix the tests!

@codecov-io
Copy link

codecov-io commented Dec 13, 2018

Codecov Report

Merging #3970 into master will increase coverage by 0.3%.
The diff coverage is 66.66%.

Impacted file tree graph

@@             Coverage Diff             @@
##             master    #3970     +/-   ##
===========================================
+ Coverage     56.35%   56.66%   +0.3%     
  Complexity      205      205             
===========================================
  Files           186      187      +1     
  Lines         14775    14826     +51     
  Branches        498      498             
===========================================
+ Hits           8327     8401     +74     
+ Misses         6209     6186     -23     
  Partials        239      239
Impacted Files Coverage Δ Complexity Δ
python-package/xgboost/core.py 77.38% <66.66%> (-0.06%) 0 <0> (ø)
src/tree/param.h 86.66% <0%> (-0.26%) 0% <0%> (ø)
src/tree/updater_refresh.cc 98.76% <0%> (-0.02%) 0% <0%> (ø)
tests/cpp/tree/test_tree_model.cc 100% <0%> (ø) 0% <0%> (?)
src/tree/updater_colmaker.cc 1.59% <0%> (ø) 0% <0%> (ø) ⬇️
src/tree/updater_histmaker.cc 2.89% <0%> (ø) 0% <0%> (ø) ⬇️
src/tree/updater_skmaker.cc 2.21% <0%> (+0.02%) 0% <0%> (ø) ⬇️
src/tree/updater_quantile_hist.cc 33.82% <0%> (+0.08%) 0% <0%> (ø) ⬇️
src/objective/multiclass_obj.cu 93.68% <0%> (+1.05%) 0% <0%> (ø) ⬇️
include/xgboost/tree_model.h 62.37% <0%> (+4.61%) 0% <0%> (ø) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 48dddfd...732e4f9. Read the comment docs.

@trivialfis trivialfis closed this Dec 14, 2018
@trivialfis trivialfis reopened this Dec 14, 2018
@trivialfis
Copy link
Member

Seems Travis hangs. Just trying to trigger a rerun.

@trivialfis trivialfis merged commit fd722d6 into dmlc:master Dec 14, 2018
@trivialfis
Copy link
Member

@scwilkinson Thanks for the good contribution !

@scwilkinson
Copy link
Contributor Author

@trivialfis Thank you for the help!

@lock lock bot locked as resolved and limited conversation to collaborators Mar 14, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants