Handle empty dataset in ranking metric. #6272

Closed
pseudotensor opened this issue Oct 22, 2020 · 12 comments · Fixed by #7297

@pseudotensor
Contributor

Same setup as here: #6232 (comment)

but now with xgboost 1.2.0 (not 1.1.0), when I run:

import pandas as pd
def fun():
    from dask.distributed import Client, wait
    from dask_cuda import LocalCUDACluster

    with LocalCUDACluster() as cluster:
        with Client(cluster) as client:

            import xgboost as xgb
            import dask_cudf

            target = "default payment next month"
            Xpd = pd.read_csv("creditcard.csv")
            Xpd = Xpd[['AGE', target]]
            Xpd.to_csv("creditcard_1.csv")
            X = dask_cudf.read_csv("creditcard_1.csv")
            y = X[target]
            X = X.drop(target, axis=1)

            kwargs_fit = {}
            kwargs_cudf_fit = kwargs_fit.copy()

            valid_X = dask_cudf.read_csv("creditcard_1.csv")
            valid_y = valid_X[target]
            valid_X = valid_X.drop(target, axis=1)
            kwargs_cudf_fit['eval_set'] = [(valid_X, valid_y)]

            params = {}  # copy.deepcopy(self.model.get_params())
            params['tree_method'] = 'gpu_hist'

            dask_model = xgb.dask.DaskXGBClassifier(**params)
            dask_model.fit(X, y, verbose=True) #, eval_set=kwargs_cudf_fit.get('eval_set'),
#                           sample_weight_eval_set=kwargs_cudf_fit.get('sample_weight_eval_set'), verbose=True)
            print("here")

if __name__ == '__main__':
    fun()

I get many warnings:

[23:33:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:33:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:33:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:33:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:33:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:33:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:33:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:33:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:33:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:33:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:33:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:33:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:33:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:33:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:33:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:33:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:33:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.

etc.

If I uncomment the commented-out parts for the eval_set, I get more warnings:

[23:36:56] WARNING: /root/repo/xgboost/src/metric/elementwise_metric.cu:336: label set is empty
[23:36:56] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:36:56] WARNING: /root/repo/xgboost/src/metric/elementwise_metric.cu:336: label set is empty
[23:36:56] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:36:56] WARNING: /root/repo/xgboost/src/metric/elementwise_metric.cu:336: label set is empty
[23:36:56] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:36:56] WARNING: /root/repo/xgboost/src/metric/elementwise_metric.cu:336: label set is empty
[23:36:56] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:36:56] WARNING: /root/repo/xgboost/src/metric/elementwise_metric.cu:336: label set is empty
[23:36:56] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:36:56] WARNING: /root/repo/xgboost/src/metric/elementwise_metric.cu:336: label set is empty
[23:36:56] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:36:56] WARNING: /root/repo/xgboost/src/metric/elementwise_metric.cu:336: label set is empty
[23:36:56] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:36:56] WARNING: /root/repo/xgboost/src/metric/elementwise_metric.cu:336: label set is empty
[23:36:56] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:36:56] WARNING: /root/repo/xgboost/src/metric/elementwise_metric.cu:336: label set is empty
[23:36:56] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:36:56] WARNING: /root/repo/xgboost/src/metric/elementwise_metric.cu:336: label set is empty
[23:36:56] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
[23:36:56] WARNING: /root/repo/xgboost/src/metric/elementwise_metric.cu:336: label set is empty

etc.

Crucially, if I add an eval_metric like:


import pandas as pd
def fun():
    from dask.distributed import Client, wait
    from dask_cuda import LocalCUDACluster

    with LocalCUDACluster() as cluster:
        with Client(cluster) as client:

            import xgboost as xgb
            import dask_cudf

            target = "default payment next month"
            Xpd = pd.read_csv("creditcard.csv")
            Xpd = Xpd[['AGE', target]]
            Xpd.to_csv("creditcard_1.csv")
            X = dask_cudf.read_csv("creditcard_1.csv")
            y = X[target]
            X = X.drop(target, axis=1)

            kwargs_fit = {}
            kwargs_cudf_fit = kwargs_fit.copy()

            valid_X = dask_cudf.read_csv("creditcard_1.csv")
            valid_y = valid_X[target]
            valid_X = valid_X.drop(target, axis=1)
            kwargs_cudf_fit['eval_set'] = [(valid_X, valid_y)]

            params = {}  # copy.deepcopy(self.model.get_params())
            params['tree_method'] = 'gpu_hist'
            params['eval_metric'] = 'auc'

            dask_model = xgb.dask.DaskXGBClassifier(**params)
            dask_model.fit(X, y, eval_set=kwargs_cudf_fit.get('eval_set'),
                           sample_weight_eval_set=kwargs_cudf_fit.get('sample_weight_eval_set'), verbose=True)
            print("here")

if __name__ == '__main__':
    fun()

This issue becomes fatal:

/home/jon/minicondadai/lib/python3.6/site-packages/distributed/client.py:3479: RuntimeWarning: coroutine 'Client._update_scheduler_info' was never awaited
  self.sync(self._update_scheduler_info)
task [xgboost.dask]:tcp://127.0.0.1:34473 connected to the tracker
task [xgboost.dask]:tcp://127.0.0.1:35613 connected to the tracker
task [xgboost.dask]:tcp://127.0.0.1:34473 got new rank 0
task [xgboost.dask]:tcp://127.0.0.1:35613 got new rank 1
worker tcp://127.0.0.1:35613 has an empty DMatrix.  All workers associated with this DMatrix: {'tcp://127.0.0.1:34473'}
worker tcp://127.0.0.1:35613 has an empty DMatrix.  All workers associated with this DMatrix: {'tcp://127.0.0.1:34473'}
[23:46:04] WARNING: /root/repo/xgboost/src/objective/regression_obj.cu:59: Label set is empty.
task [xgboost.dask]:tcp://127.0.0.1:34473 connected to the tracker
distributed.worker - WARNING -  Compute Failed
Function:  dispatched_train
args:      ('tcp://127.0.0.1:34473', {'feature_names': None, 'feature_types': None, 'has_label': True, 'has_weights': False, 'missing': nan, 'worker_map': defaultdict(<class 'list'>, {'tcp://127.0.0.1:34473': [<Future: finished, key: tuple-d5f09e67-afdf-4395-8eaa-5744207937cb>]}), 'is_quantile': False}, [({'feature_names': None, 'feature_types': None, 'has_label': True, 'has_weights': False, 'missing': nan, 'worker_map': defaultdict(<class 'list'>, {'tcp://127.0.0.1:34473': [<Future: finished, key: tuple-1b7129e7-1910-4ace-9220-3c27ed63fbac>]}), 'is_quantile': False}, 'validation_0')])
kwargs:    {}
Exception: XGBoostError('[23:46:05] /root/repo/xgboost/src/metric/rank_metric.cc:242: Check failed: info.labels_.Size() != 0U (0 vs. 0) : label set cannot be empty\nStack trace:\n  [bt] (0) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x6a) [0x7f3444028b3a]\n  [bt] (1) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(xgboost::metric::EvalAuc::Eval(xgboost::HostDeviceVector<float> const&, xgboost::MetaInfo const&, bool)+0xde2) [0x7f344415bee2]\n  [bt] (2) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(xgboost::LearnerImpl::EvalOneIter(int, std::vector<std::shared_ptr<xgboost::DMatrix>, std::allocator<std::shared_ptr<xgboost::DMatrix> > > const&, std::vector<std::string, std::allocator<std::string> > const&)+0x46f) [0x7f3444133acf]\n  [bt] (3) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(XGBoosterEvalOneIter+0x323) [0x7f3444032463]\n  [bt] (4) /home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f35331f0630]\n  [bt] (5) /home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7f35331effed]\n  [bt] (6) /home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7f3533206f9e]\n  [bt] (7) /home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x139d5) [0x7f35332079d5]\n  [bt] (8) dask-worker [tcp://127.0.0.1:35613](_PyObject_FastCallDict+0x8b) [0x556257d3500b]\n\n',)

task [xgboost.dask]:tcp://127.0.0.1:34473 got new rank 0
/home/jon/minicondadai/lib/python3.6/site-packages/distributed/client.py:4773: RuntimeWarning: coroutine 'Client._close' was never awaited
  c.close(timeout=2)
/home/jon/minicondadai/lib/python3.6/site-packages/distributed/client.py:4773: RuntimeWarning: coroutine 'Client._close' was never awaited
  c.close(timeout=2)
Traceback (most recent call last):
  File "dask_cudf_scitkit_example.py", line 38, in <module>
    fun()
  File "dask_cudf_scitkit_example.py", line 34, in fun
    sample_weight_eval_set=kwargs_cudf_fit.get('sample_weight_eval_set'), verbose=True)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/dask.py", line 1080, in fit
    eval_set, sample_weight_eval_set, verbose)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/client.py", line 824, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/utils.py", line 339, in sync
    raise exc.with_traceback(tb)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/utils.py", line 323, in f
    result[0] = yield future
  File "/home/jon/minicondadai/lib/python3.6/site-packages/tornado/gen.py", line 735, in run
    value = future.result()
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/dask.py", line 1067, in _fit_async
    evals=evals, verbose_eval=verbose)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/dask.py", line 633, in _train_async
    results = await client.gather(futures)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/distributed/client.py", line 1833, in _gather
    raise exception.with_traceback(traceback)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/dask.py", line 618, in dispatched_train
    **kwargs)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/training.py", line 222, in train
    xgb_model=xgb_model, callbacks=callbacks)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/training.py", line 85, in _train_internal
    bst_eval_set = bst.eval_set(evals, i, feval)
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/core.py", line 1230, in eval_set
    ctypes.byref(msg)))
  File "/home/jon/minicondadai/lib/python3.6/site-packages/xgboost/core.py", line 188, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [23:46:05] /root/repo/xgboost/src/metric/rank_metric.cc:242: Check failed: info.labels_.Size() != 0U (0 vs. 0) : label set cannot be empty
Stack trace:
  [bt] (0) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x6a) [0x7f3444028b3a]
  [bt] (1) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(xgboost::metric::EvalAuc::Eval(xgboost::HostDeviceVector<float> const&, xgboost::MetaInfo const&, bool)+0xde2) [0x7f344415bee2]
  [bt] (2) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(xgboost::LearnerImpl::EvalOneIter(int, std::vector<std::shared_ptr<xgboost::DMatrix>, std::allocator<std::shared_ptr<xgboost::DMatrix> > > const&, std::vector<std::string, std::allocator<std::string> > const&)+0x46f) [0x7f3444133acf]
  [bt] (3) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(XGBoosterEvalOneIter+0x323) [0x7f3444032463]
  [bt] (4) /home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f35331f0630]
  [bt] (5) /home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7f35331effed]
  [bt] (6) /home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7f3533206f9e]
  [bt] (7) /home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x139d5) [0x7f35332079d5]
  [bt] (8) dask-worker [tcp://127.0.0.1:35613](_PyObject_FastCallDict+0x8b) [0x556257d3500b]

I'm guessing this is related to #6232 and the recent callback fixes, but I can't be sure.
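For what it's worth, a quick diagnostic sketch (my own rough illustration, not part of xgboost or the script above) of how one could check whether any partition ends up empty, assuming map_partitions(len) works on dask_cudf frames the same way it does on plain dask dataframes:

# Hypothetical diagnostic: count rows per partition of the dask_cudf frame
# to check whether some partitions (and hence some workers) end up empty.
part_sizes = X.map_partitions(len).compute()
print("rows per partition:", list(part_sizes))
print("empty partitions:", sum(1 for n in part_sizes if n == 0))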

@teju85

@pseudotensor
Contributor Author

pseudotensor commented Oct 22, 2020

Also, this problem only happens with multiple GPUs. It is probably also related to #6268, since that issue is likewise eval_set-specific and only happens with multiple GPUs.

@pseudotensor
Contributor Author

I should clarify that, to work around the #6268 issue that blocks that script from even running, I'm appending the following to the bottom of site-packages/xgboost/sklearn.py:

class XGBoostLabelEncoder(object):
    def __init__(self):
        pass
    
    def fit(self, input_array, y=None):
        return self
    
    def transform(self, input_array, y=None):
        return input_array

    def fit_transform(self, input_array, y=None):
        return input_array

    def inverse_transform(self, input_array, y=None):
        return input_array

This redefines the label encoder to a dummy that does nothing, and I make sure that I label-encode beforehand or already have correct y values (otherwise it fails with an error saying labels should be encoded from 0 to num_classes - 1).
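For reference, a rough sketch of the kind of manual encoding I mean (encode_labels is just a hypothetical helper for illustration, not part of xgboost), assuming the target is still a pandas Series at that point:

import pandas as pd

# Hypothetical helper: map arbitrary class values to 0..num_classes-1,
# which is what the booster expects once the built-in encoder is bypassed.
def encode_labels(y: pd.Series):
    classes = sorted(y.unique())
    mapping = {c: i for i, c in enumerate(classes)}
    return y.map(mapping).astype("int32"), mapping

# y_enc, mapping = encode_labels(Xpd[target])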

@pseudotensor
Contributor Author

The work-around for now is to not pass the eval_set: since there is no callback support anyway, its only purpose would be to print out the score in verbose mode, which is not useful except for debugging.

@trivialfis
Member

This should be caused by #6268 (comment) in combination with the ranking metric.

@trivialfis trivialfis changed the title dask_cudf: Label set is empty and Check failed: info.labels_.Size() != 0U (0 vs. 0) : label set cannot be empty Handle empty dataset in ranking metric. Oct 29, 2020
@pseudotensor
Contributor Author

pseudotensor commented Dec 24, 2020

@trivialfis FYI, I still get this with the latest 1.3.1 or 1.4.0.

Exception: XGBoostError('[03:49:27] /workspace/xgboost/src/metric/rank_metric.cc:611: Check failed: info.labels_.Size() != 0U (0 vs. 0) : label set cannot be empty\nStack trace:\n  [bt] (0) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x54) [0x14f237b04f64]\n  [bt] (1) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(xgboost::metric::EvalAucPR::Eval(xgboost::HostDeviceVector<float> const&, xgboost::MetaInfo const&, bool)+0x9cb) [0x14f237c5790b]\n  [bt] (2) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(xgboost::LearnerImpl::EvalOneIter(int, std::vector<std::shared_ptr<xgboost::DMatrix>, std::allocator<std::shared_ptr<xgboost::DMatrix> > > const&, std::vector<std::string, std::allocator<std::string> > const&)+0x4f4) [0x14f237c2e964]\n  [bt] (3) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(XGBoosterEvalOneIter+0x22d) [0x14f237b0cb6d]\n  [bt] (4) /home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x14f31c200630]\n  [bt] (5) /home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x14f31c1fffed]\n  [bt] (6) /home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x14f31c216f9e]\n  [bt] (7) /home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x139d5) [0x14f31c2179d5]\n  [bt] (8) dask-worker [tcp://172.16.2.210:38823](_PyObject_FastCallDict+0x8b) [0x555f2de8000b]\n\n',)
Exception: XGBoostError('[04:11:10] /workspace/xgboost/src/metric/rank_metric.cc:242: Check failed: info.labels_.Size() != 0U (0 vs. 0) : label set cannot be empty\nStack trace:\n  [bt] (0) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x54) [0x14f237b04f64]\n  [bt] (1) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(xgboost::metric::EvalAuc::Eval(xgboost::HostDeviceVector<float> const&, xgboost::MetaInfo const&, bool)+0x9cb) [0x14f237c58a8b]\n  [bt] (2) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(xgboost::LearnerImpl::EvalOneIter(int, std::vector<std::shared_ptr<xgboost::DMatrix>, std::allocator<std::shared_ptr<xgboost::DMatrix> > > const&, std::vector<std::string, std::allocator<std::string> > const&)+0x4f4) [0x14f237c2e964]\n  [bt] (3) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(XGBoosterEvalOneIter+0x22d) [0x14f237b0cb6d]\n  [bt] (4) /home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x14f31c200630]\n  [bt] (5) /home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x14f31c1fffed]\n  [bt] (6) /home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x14f31c216f9e]\n  [bt] (7) /home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x139d5) [0x14f31c2179d5]\n  [bt] (8) dask-worker [tcp://172.16.2.210:38823](_PyObject_FastCallDict+0x8b) [0x555f2de8000b]\n\n',)

See these messages:

[03:48:39] WARNING: /workspace/xgboost/src/learner.cc:1219: Empty dataset at worker: 1

and

Exception: XGBoostError('[03:49:27] /workspace/xgboost/rabit/include/rabit/internal/utils.h:90: Allreduce failed',)

I only see these since enabling early stopping (the run above was on 1 node with 2 GPUs).

@trivialfis
Member

Yup. I haven't been able to get to those metrics yet, but I will look into them in 1.4.

@pseudotensor
Contributor Author

It's not critical; I just happen to be using the same chunksize for eval_set as for X, and the eval_set was sufficiently smaller that there was not enough data for each worker.

Quick question: is there any constraint in xgboost dask on the number of partitions for X/y/sample_weight vs. eval_set/sample_weight_eval_set?

I'm assuming each of those two groups can have a different number of partitions, while within each group they should have the same. Is that right?
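For concreteness, something like this is what I have in mind (hypothetical partition counts, reusing the X/y and valid_X/valid_y from the scripts above):

# Hypothetical illustration: the training group and the eval_set group are
# repartitioned independently; only members of the same group need to match.
X = X.repartition(npartitions=16)
y = y.repartition(npartitions=16)              # must match X
valid_X = valid_X.repartition(npartitions=4)
valid_y = valid_y.repartition(npartitions=4)   # must match valid_X
# dask_model.fit(X, y, eval_set=[(valid_X, valid_y)], verbose=True)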

@trivialfis
Member

Different dask DMatrix objects do not impose constraints on each other. Your assumption is right.

@pseudotensor
Contributor Author

Hmm, as in #6551, I'm hitting this error even when I ensure X/y and valid_X/valid_y have many partitions for the workers.

Just 2 GPUs on 1 node, and sometimes, for no apparent reason, I hit this error.

[19:17:59] WARNING: /workspace/xgboost/src/learner.cc:1219: Empty dataset at worker: 1
[19:17:59] WARNING: /workspace/xgboost/src/learner.cc:1219: Empty dataset at worker: 1
2020-12-24 19:18:00,283 - distributed.worker - WARNING -  Compute Failed
Function:  dispatched_train
args:      ('tcp://172.16.2.210:33765', [b'DMLC_NUM_WORKER=2', b'DMLC_TRACKER_URI=172.16.2.210', b'DMLC_TRACKER_PORT=9091', b'DMLC_TASK_ID=[xgboost.dask]:tcp://172.16.2.210:33765'], {'feature_names': None, 'feature_types': None, 'feature_weights': None, 'meta_names': ['labels'], 'missing': nan, 'parts': [(          0_v1   100_v88   101_v89  ...    98_v86     99_v87    9_v108
0     1.335739  3.321300  0.095678  ...  0.866426   9.551836  2.382692
1          NaN       NaN  2.678584  ...       NaN   9.848003  1.825361
2     0.943877  3.367346  0.111388  ...  1.071429   8.447465  1.375753
3     0.797415  1.408046  0.039051  ...  1.242817  10.747144  2.230754
4          NaN       NaN       NaN  ...       NaN        NaN       NaN
...        ...       ...       ...  ...       ...        ...       ...
5711  1.267744  1.145375  0.054302  ...  1.174744   8.252816  1.779999
5712       NaN       NaN       NaN  ...       NaN        NaN       NaN
5713       NaN       NaN       NaN  ...       NaN        NaN     
kwargs:    {}
Exception: XGBoostError('[19:17:59] /workspace/xgboost/src/metric/rank_metric.cc:611: Check failed: info.labels_.Size() != 0U (0 vs. 0) : label set cannot be empty\nStack trace:\n  [bt] (0) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x54) [0x14f055b64f64]\n  [bt] (1) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(xgboost::metric::EvalAucPR::Eval(xgboost::HostDeviceVector<float> const&, xgboost::MetaInfo const&, bool)+0x9cb) [0x14f055cb790b]\n  [bt] (2) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(xgboost::LearnerImpl::EvalOneIter(int, std::vector<std::shared_ptr<xgboost::DMatrix>, std::allocator<std::shared_ptr<xgboost::DMatrix> > > const&, std::vector<std::string, std::allocator<std::string> > const&)+0x4f4) [0x14f055c8e964]\n  [bt] (3) /home/jon/minicondadai/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(XGBoosterEvalOneIter+0x22d) [0x14f055b6cb6d]\n  [bt] (4) /home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x14f13201c630]\n  [bt] (5) /home/jon/minicondadai/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x14f13201bfed]\n  [bt] (6) /home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x14f132032f9e]\n  [bt] (7) /home/jon/minicondadai/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x139d5) [0x14f1320339d5]\n  [bt] (8) dask-worker [tcp://172.16.2.210:33765](_PyObject_FastCallDict+0x8b) [0x55bbdf1e500b]\n\n',)

2020-12-24 19:18:00,360 - distributed.worker - WARNING -  Compute Failed
Function:  dispatched_train
args:      ('tcp://172.16.2.210:37553', [b'DMLC_NUM_WORKER=2', b'DMLC_TRACKER_URI=172.16.2.210', b'DMLC_TRACKER_PORT=9091', b'DMLC_TASK_ID=[xgboost.dask]:tcp://172.16.2.210:37553'], {'feature_names': None, 'feature_types': None, 'feature_weights': None, 'meta_names': ['labels'], 'missing': nan, 'parts': [(           0_v1   100_v88   101_v89  ...    98_v86     99_v87    9_v108
5716   1.256061  0.969933  1.058731  ...  2.056256  13.306870  2.642831
5717   2.107512  3.972944  1.589945  ...  0.640798   9.601439  1.986946
5718   1.232747  2.627320  2.895753  ...  0.713946  10.347195  1.876453
5719        NaN       NaN       NaN  ...       NaN        NaN       NaN
5720   1.672046  0.335700  1.455212  ...  2.537121  11.874986  1.966222
...         ...       ...       ...  ...       ...        ...       ...
11427       NaN       NaN       NaN  ...       NaN        NaN       NaN
11428  0.497645  1.950236  4.318180  ...  1.735036  10.435363  2.076428
11429       NaN       NaN       NaN  ...       NaN      
kwargs:    {}
Exception: XGBoostError('[19:17:59] /workspace/xgboost/rabit/include/rabit/internal/utils.h:90: Allreduce failed',)

Like the other issue, this happens despite having plenty of partitions. I print the partition counts for the X/y group and the valid_X/valid_y group:

2020-12-24 19:17:55,954 C: NA  D:  NA    M:  NA    NODE:SERVER      19354  PDEBUG | ('num_workers: 2',)
2020-12-24 19:17:56,319 C: NA  D:  NA    M:  NA    NODE:SERVER      19354  PDEBUG | to_dask duration for X_shape=(91457, 128): 0.000324011 0.306071 0.0581458
2020-12-24 19:17:56,332 C: NA  D:  NA    M:  NA    NODE:SERVER      19354  PDEBUG | ('Xy npartitions: 16 16',)
2020-12-24 19:17:56,585 C: NA  D:  NA    M:  NA    NODE:SERVER      19354  PDEBUG | to_dask duration for X_shape=(22864, 128): 0.000548124 0.192724 0.0530603
2020-12-24 19:17:56,585 C: NA  D:  NA    M:  NA    NODE:SERVER      19354  PDEBUG | ('valid Xy npartitions: 4 4',)

You can see this is for the relevant time window.

@pseudotensor
Contributor Author

Note that in the to_dask() function I have, I basically:

  1. put y and sample_weight (if it exists) together with X and call it X_pd
  2. X_dask = dd.from_pandas(X_pd, chunksize=chunksize).persist()
  3. X_dask = dask_cudf.from_dask_dataframe(X_dask)
  4. then extract the actual X and y back out of X_dask using drop etc.

I do the exact same thing for valid_X/valid_y/sample_weight_eval_set; a rough sketch of the helper is below.
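Roughly (my own simplified sketch; to_dask, chunksize and sample_weight_col are my names, not xgboost API):

import dask.dataframe as dd
import dask_cudf

def to_dask(X_pd, target, chunksize, sample_weight_col=None):
    # y (and sample_weight, if present) travel together with X in one frame.
    X_dask = dd.from_pandas(X_pd, chunksize=chunksize).persist()
    # Convert the CPU dask dataframe to a dask_cudf (GPU) dataframe.
    X_dask = dask_cudf.from_dask_dataframe(X_dask)
    # Extract y (and sample_weight) back out and drop them from X.
    y = X_dask[target]
    drop_cols = [target]
    sample_weight = None
    if sample_weight_col is not None:
        sample_weight = X_dask[sample_weight_col]
        drop_cols.append(sample_weight_col)
    X = X_dask.drop(columns=drop_cols)
    return X, y, sample_weight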

So I'm pretty convinced that sometimes, despite having several partitions per worker that should be scattered upon "persist", either they are not actually scattered (i.e. a bug in distributed/dask) or xgboost is mis-managing the pieces of data and erroneously seeing empty frames.

@pseudotensor
Contributor Author

FYI, I found the exact pickle (we save pickled state when bad things happen) and re-ran it, and it doesn't fail in the same way, even with the same 2-node cluster still running other work.

So there is some inconsistent behavior.

I could share what I have, but it's too wrapped up in our local code, and since it doesn't reproduce, it wouldn't help much.

@pseudotensor
Contributor Author

pseudotensor commented Dec 25, 2020

In case this info helps:

Xy npartitions: 16 16
valid Xy npartitions: 4 4
DaskXGBClassifier(booster='gbtree', colsample_bytree=0.55,
                  debug_verbose=2,
                  early_stopping_limit=None, early_stopping_rounds=20, eval_metric='aucpr',
                  gamma=0.0, gpu_id=0,
                  grow_policy='lossguide',
                  learning_rate=0.15, max_bin=256,
                  max_delta_step=0.0, max_depth=0, max_leaves=256,
                  min_child_weight=1,
                  n_estimators=600, n_jobs=9, ...)
{'base_score': None, 'booster': 'gbtree', 'colsample_bylevel': None, 'colsample_bynode': None, 'colsample_bytree': 0.55, 'gamma': 0.0, 'gpu_id': 0, 'importance_type': 'gain', 'interaction_constraints': None, 'learning_rate': 0.15, 'max_delta_step': 0.0, 'max_depth': 0, 'min_child_weight': 1, 'missing': nan, 'monotone_constraints': None, 'n_estimators': 600, 'n_jobs': 9, 'num_parallel_tree': None, 'objective': 'binary:logistic', 'random_state': 278438169, 'reg_alpha': 0.0, 'reg_lambda': 2.0, 'scale_pos_weight': 1.0, 'subsample': 0.5, 'tree_method': 'gpu_hist', 'validate_parameters': None, 'verbosity': None, 'use_label_encoder': False, 'model_class_name': 'XGBoostGBMDaskModel', 'num_class': 1, 'labels': [0, 1], 'score_f_name': 'LOGLOSS', 'time_column': None, 'encoder': None, 'tgc': None, 'pred_gap': None, 'pred_periods': None, 'target': None, 'tsp': None, 'early_stopping_rounds': 20, 'max_bin': 256, 'grow_policy': 'lossguide', 'max_leaves': 256, 'eval_metric': 'aucpr', 'early_stopping_threshold': 1e-05, 'monotonicity_constraints': False, 'silent': 0, 'debug_verbose': 2, 'seed': 278438169, 'disable_gpus': False, 'lossguide': False, 'accuracy': 7, 'time_tolerance': 10, 'interpretability': 1, 'ensemble_level': 3, 'train_shape': (114321, 133), 'valid_shape': None, 'model_origin': 'DefaultIndiv: do_te:True,interp:11,depth:6,num_as_cat:False', 'resumed_experiment_id': 'bedd7566-45e6-11eb-bb81-0cc47adb058f', 'str_uuid': 'ret_ff6609f1-e952-4a09-af96-9388336d482c', 'experiment_description': '3.cineweru', 'train_dataset_name': 'train.csv.zip', 'valid_data_name': '[Valid]', 'test_data_name': '[Test]', 'ngenes': 127, 'ngenes_max': 133, 'uses_gpu': True, 'early_stopping_limit': None}
(Delayed('int-46fdf95d-3bc2-4c36-b9e1-d5c25b2f5bfe'), 127)
(dd.Scalar<size-ag..., dtype=int64>,)
