Sketching from adapters #5365
Conversation
I'm a little concerned about removing weights; we should run some experiments to estimate the impact on various datasets with different characteristics.
The weighted quantile sketch is one of the prominent contributions of the XGBoost paper (2016). Is it necessary to remove it?
We can't currently access weights via the adapter constructor; they get added afterwards in separate c_api functions. If we can change the c_api to receive weights at DMatrix construction, we can do it.
Can we create an empty DMatrix handle first?
The current implementation preserves existing behaviour, using weights to compute quantiles when they are available from the DMatrix. When we implement DMatrix construction from adapters, we can revisit how to access weights in that case.
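For context on why weights matter here, this is a minimal, illustrative sketch of a weighted quantile (an exact computation, not XGBoost's actual streaming, mergeable sketch): each value contributes its weight to the cumulative rank rather than a count of 1.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Illustrative only: the q-quantile of weighted values is the first value
// whose cumulative weight reaches q * total_weight. XGBoost's sketch
// approximates this on streams; this exact version just shows the role
// the per-instance weights play.
float WeightedQuantile(std::vector<std::pair<float, float>> value_weight,
                       float q) {
  std::sort(value_weight.begin(), value_weight.end());
  float total = 0.0f;
  for (auto const& vw : value_weight) total += vw.second;
  float cumulative = 0.0f;
  for (auto const& vw : value_weight) {
    cumulative += vw.second;  // weight acts as the value's rank mass
    if (cumulative >= q * total) return vw.first;
  }
  return value_weight.back().first;
}
```

Dropping weights silently reduces this to the unweighted case on datasets with non-uniform instance weights, which is the concern raised above.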
Branch force-pushed from 7a4e713 to 85a7c62.
```cpp
}
// Count the entries in each column and exclusive scan
void GetColumnSizesScan(int device,
                        dh::caching_device_vector<size_t>* column_sizes_scan,
```
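As a hedged illustration of the count-then-exclusive-scan pattern that the comment above names (the function and variable names here are simplified stand-ins, not the real GetColumnSizesScan):

```cpp
#include <cstddef>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/scan.h>

// Sketch: turn "which column does each entry belong to" into a CSC-style
// offset array. offsets[i] is where column i's entries begin and the last
// element is the total entry count. The real code counts on-device (e.g.
// with atomicAdd); a host loop keeps this sketch short.
void ColumnOffsets(thrust::host_vector<int> const& column_of_entry,
                   std::size_t num_columns,
                   thrust::device_vector<std::size_t>* offsets) {
  thrust::host_vector<std::size_t> counts(num_columns + 1, 0);
  for (int c : column_of_entry) { counts[c] += 1; }
  *offsets = counts;  // copy per-column counts to the device
  // Exclusive scan converts per-column counts into begin offsets.
  thrust::exclusive_scan(offsets->begin(), offsets->end(), offsets->begin());
}
```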
I defined bst_row_t and bst_feature_t, and we should use them more often.
Noted. If something is a sum, I think it is reasonable to use size_t; for example, a sum of bst_row_t values can exceed 32 bits.
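A quick demonstration of that overflow concern (assuming bst_row_t is a 32-bit type, as the comment implies):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Each row count fits in 32 bits individually, but their sum does not:
  // a 32-bit accumulator silently wraps modulo 2^32, while a size_t
  // (64-bit on common platforms) accumulator does not for realistic input.
  std::uint32_t narrow_sum = 0;
  std::size_t wide_sum = 0;
  for (int i = 0; i < 3; ++i) {
    std::uint32_t rows = 3'000'000'000u;
    narrow_sum += rows;  // wraps: 9e9 mod 2^32
    wide_sum += rows;    // correct: 9e9
  }
  std::printf("narrow: %u, wide: %zu\n", narrow_sum, wide_sum);
}
```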
Branch force-pushed from 85a7c62 to 5852f48.
I don't think the maker function can be the cause of the fatal error, right? It's just a helper function for type deduction.
This is the whole definition of make_transform_iterator:

```cpp
template <class AdaptableUnaryFunction, class Iterator>
inline __host__ __device__
transform_iterator<AdaptableUnaryFunction, Iterator>
make_transform_iterator(Iterator it, AdaptableUnaryFunction fun)
{
  return transform_iterator<AdaptableUnaryFunction, Iterator>(it, fun);
} // end make_transform_iterator
```

Other than the fact that it's a host/device function, there's no difference from your implementation. Could you take a deeper look into the fatal error you encountered? I think it's worth the debugging effort.
My implementation explicitly specifies the return type. The issue has always been automatic return type deduction on MSVC. I can look further.
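An explicit-return-type maker along the lines described might look like this (a sketch; the name and template parameter order are illustrative, not a quote of the actual XGBoost helper):

```cpp
#include <thrust/iterator/transform_iterator.h>

// Sketch of a maker that names the resulting element type explicitly
// (the ReturnT slot) instead of relying on thrust's use_default deduction.
// This matches the visible difference in the two printed types below: one
// carries xgboost::data::COOTuple in the third template slot, the other
// thrust::use_default.
template <typename ReturnT, typename FuncT, typename IterT>
__host__ __device__ thrust::transform_iterator<FuncT, IterT, ReturnT>
MakeTransformIterator(IterT iter, FuncT func) {
  return thrust::transform_iterator<FuncT, IterT, ReturnT>(iter, func);
}
```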
Yup, thanks! You can just print the return type with a debugger or use …
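For what it's worth, one non-debugger way to dump a deduced type is typeid (the names are already human-readable on MSVC; GCC/Clang would need abi::__cxa_demangle):

```cpp
#include <iostream>
#include <typeinfo>

// Print the static type of any expression's value. Note that typeid drops
// references and top-level cv-qualifiers, which is fine for iterator objects.
template <typename T>
void PrintType(T const&) { std::cout << typeid(T).name() << "\n"; }

// Usage: PrintType(make_transform_iterator(it, fun));
```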
These are the two return types for my version and thrust's:

```
thrust::transform_iterator<__nv_dl_wrapper_t<__nv_dl_tag<void (__cdecl*)(xgboost::data::CupyAdapter *,unsigned __int64,unsigned __int64,float,xgboost::common::SketchContainer *,int),&xgboost::common::ProcessBatch<xgboost::data::CupyAdapter>,1>,xgboost::data::CupyAdapterBatch const >,thrust::counting_iterator<unsigned __int64,thrust::use_default,thrust::use_default,thrust::use_default>,xgboost::data::COOTuple,thrust::use_default>

thrust::transform_iterator<__nv_dl_wrapper_t<__nv_dl_tag<void (__cdecl*)(xgboost::data::CupyAdapter *,unsigned __int64,unsigned __int64,float,xgboost::common::SketchContainer *,int),&xgboost::common::ProcessBatch<xgboost::data::CupyAdapter>,2>,xgboost::data::CupyAdapterBatch const >,thrust::counting_iterator<unsigned __int64,thrust::use_default,thrust::use_default,thrust::use_default>,thrust::use_default,thrust::use_default>
```
Let me take another look today. It seems weird that the …
This reverts commit a38e7bd.
This reverts commit 6a85632.
Overhaul of GPU sketching code to support sketching on external data.
Todo:
- Verify peak memory usage

Some performance charts: (charts omitted)