[REVIEW] ENH Decision Tree new backend `computeSplit*Kernel` histogram calculation optimization #3674

venkywonka · 2021-03-31T07:28:19Z

This is a follow-up to PR #3616 and should be merged after it.
This PR introduces:

* Modularizing the pdf_to_cdf conversion (an inclusive sum scan) into a device function so that both `ML::DecisionTree::computeSplitClassificationKernel` and `ML::DecisionTree::computeSplitRegressionKernel` can reuse it.
* Integrating that device function into `ML::DecisionTree::computeSplitRegressionKernel` to calculate the prediction sums and counts. These histograms are used for node splitting in decision trees for regression.

The rationale for this optimization is the same as the one given in PR #3616 (ENH Decision Tree new backend computeSplitClassificationKernel histogram calculation and occupancy optimization).
As of now, only the first pass has been optimized using sum scans.
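
To illustrate the pdf_to_cdf step described above, here is a minimal host-side C++ sketch. It is not the cuML device function: the name `pdf_to_cdf_host` is hypothetical, and `std::partial_sum` stands in for the block-wide inclusive sum scan that the kernel runs over its shared-memory histograms.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical host-side analogue of a pdf-to-cdf conversion.
// pdf[b] is the per-bin value (for example, a sample count for bin b);
// the returned cdf[b] is the sum of pdf[0..b], i.e. an inclusive prefix sum.
std::vector<int> pdf_to_cdf_host(const std::vector<int>& pdf) {
  std::vector<int> cdf(pdf.size());
  std::partial_sum(pdf.begin(), pdf.end(), cdf.begin());
  assert(cdf.empty() || cdf.front() == pdf.front());  // inclusive: first entry is unchanged
  return cdf;
}
```

For a histogram `{2, 0, 3, 1}` the sketch yields `{2, 2, 5, 6}`; the last entry is the total over all bins.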

teju85 · 2021-03-31T07:42:19Z

cpp/src/decisiontree/batched-levelalgo/kernels.cuh

+  typedef cub::BlockScan<DataT, TPB> BlockScan;
+  __shared__ typename BlockScan::TempStorage temp_storage;
+
+  for (IdxT tix = threadIdx.x; tix < max(TPB, nbins); tix += blockDim.x) {


If nbins > TPB, the resulting scan will be incorrect, because the total sum from the previous iteration is not carried forward into the next iteration.

IOW, InclusiveSum has an overload that also returns the total sum: https://nvlabs.github.io/cub/classcub_1_1_block_scan.html#a99222ab9b122e6df879ee04b4e8244da

venkywonka (Apr 1, 2021): 😅 Thank you for that, I have rectified it 👍🏻
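
The carry-forward issue raised in this thread can be sketched on the host as follows. This is an illustrative C++ analogue under stated assumptions, not the kernel code: `tiled_inclusive_sum` and its `tile` parameter are hypothetical names, with `tile` playing the role of TPB and `carry` playing the role of the block-wide total that the InclusiveSum overload linked above can return.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive prefix sum computed tile by tile, the way a block of `tile`
// threads would sweep over more bins than it has threads.
std::vector<int> tiled_inclusive_sum(const std::vector<int>& pdf, std::size_t tile) {
  assert(tile > 0);
  std::vector<int> cdf(pdf.size());
  int carry = 0;  // sum of all bins in earlier tiles
  for (std::size_t start = 0; start < pdf.size(); start += tile) {
    const std::size_t end = std::min(start + tile, pdf.size());
    int running = 0;  // scan that is local to the current tile
    for (std::size_t i = start; i < end; ++i) {
      running += pdf[i];
      cdf[i] = running + carry;  // dropping `+ carry` restarts every tile at zero
    }
    carry += running;  // forward this tile's aggregate to the next tile
  }
  return cdf;
}
```

With `pdf = {1, 2, 3, 4, 5}` and `tile = 2`, the result is `{1, 3, 6, 10, 15}`. Without the carry, the second and third tiles would restart at zero and produce `{1, 3, 3, 7, 5}`, which is the kind of error the review comment describes.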

teju85 · 2021-03-31T07:51:34Z

cpp/src/decisiontree/batched-levelalgo/kernels.cuh

+    // locations
+    offset_cdf += nbins;
+    //convert pdf to cdf
+    pdf_to_cdf<int, IdxT, TPB>(pdf_shist + offset_pdf, cdf_shist + offset_cdf,


Why not just compute this cdf from the cdf above and its total sum? That way, we could avoid an extra block-scan operation.

venkywonka (Apr 1, 2021): I did think of that initially, but leveraging the total sum from the previous sum scan did not occur to me! I have added it, Thejaswi 🙏🏻
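
A minimal host-side sketch of the suggestion in this thread: once the left-to-right cdf and its total are known, the complementary (right-side) values follow by subtraction, so no second scan is needed. The helper name `right_from_left` is hypothetical, and the exact indexing convention used in the kernel (inclusive versus exclusive of the split bin) is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Given an inclusive left-to-right cdf, derive for each bin b the sum of all
// bins strictly after b as (total - left_cdf[b]).
std::vector<int> right_from_left(const std::vector<int>& left_cdf) {
  std::vector<int> right(left_cdf.size());
  if (left_cdf.empty()) return right;
  const int total = left_cdf.back();  // last inclusive entry is the total sum
  for (std::size_t b = 0; b < left_cdf.size(); ++b) {
    right[b] = total - left_cdf[b];
    assert(right[b] >= 0 || left_cdf[b] > total);  // non-negative for non-decreasing cdfs
  }
  return right;
}
```

For a left cdf of `{2, 2, 5, 6}` this gives `{4, 4, 1, 0}`: one subtraction per bin replaces a full right-to-left scan.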

codecov-io · 2021-04-03T05:20:21Z

Codecov Report

Merging #3674 (d4ee98f) into branch-0.19 (fd9ec89) will increase coverage by 2.21%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##           branch-0.19    #3674      +/-   ##
===============================================
+ Coverage        80.70%   82.92%   +2.21%     
===============================================
  Files              227      227              
  Lines            17615    17591      -24     
===============================================
+ Hits             14217    14587     +370     
+ Misses            3398     3004     -394

| Flag | Coverage Δ | |
|---|---|---|
| dask | `45.31% <ø> (+0.32%)` | ⬆️ |
| non-dask | `74.95% <ø> (+2.03%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| python/cuml/__init__.py | `95.58% <ø> (+0.20%)` | ⬆️ |
| ...cuml/_thirdparty/sklearn/preprocessing/__init__.py | `100.00% <ø> (ø)` | |
| ...on/cuml/_thirdparty/sklearn/preprocessing/_data.py | `64.27% <ø> (+1.16%)` | ⬆️ |
| ...hirdparty/sklearn/preprocessing/_discretization.py | `83.33% <ø> (-0.88%)` | ⬇️ |
| ...l/_thirdparty/sklearn/preprocessing/_imputation.py | `85.54% <ø> (+22.74%)` | ⬆️ |
| python/cuml/_thirdparty/sklearn/utils/extmath.py | `56.89% <ø> (ø)` | |
| ...cuml/_thirdparty/sklearn/utils/skl_dependencies.py | `79.54% <ø> (+25.62%)` | ⬆️ |
| ...ython/cuml/_thirdparty/sklearn/utils/validation.py | `18.41% <ø> (-4.04%)` | ⬇️ |
| python/cuml/cluster/__init__.py | `100.00% <ø> (ø)` | |
| python/cuml/cluster/agglomerative.pyx | `96.47% <ø> (ø)` | |
| ... and 152 more | | |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1554f14...d4ee98f.

teju85 · 2021-04-04T07:27:47Z

@venkywonka please resolve conflicts

hcho3

LGTM. This pull request produces a ~1.5x speedup on the few regression data sets I've tried. It does not impact the classification task.

(only showing public data sets here)

teju85

Changes LGTM.

JohnZed · 2021-04-06T02:36:08Z

@gpucibot merge

venkywonka added 20 commits March 20, 2021 14:58

🌾 kernel perf improvements👇🏻

b4e2801

* using atomics to calculate PDFs and then using blockScan to get the required CDFs; the original code was issuing too many atomicAdds to shared memory

✔ accuracy bug fix

144607e

♻ fused left and right scans into single loop

06149d9

♻ computing blks_for_rows based on occupancy-calculator

3e000c4

* was earlier hard-coded

🎨 clang-format fix

dca9394

♻ reduce smem bank conflicts by changing rw patterns

71c1097

♻ changing to ceil-based dynamic blks_for_rows strategy

7446bf3

* dynamically assigning based on occupancy while ceil-ing it to minimum 4 blocks

♻ changing to occupancy-based dynamic blks_for_rows strategy

6eafd08

🎨 clang format fix

b537a10

🔀 merge branch-0.19

24a50be

♻ review changes

368b5c5

* pruning unnecessary comments and code
* improving doxygen comments
* adding some explanatory comments

© copyright fix

dbb3111

🔀 merge branch-0.19

32a773f

🔀 Merge remote-tracking branch 'upstream/branch-0.19' into enh-ext-hi…

aa99a2e

…stogram-calculation-optimization-for-computesplitclassificationkernel

♻ modularize pdf-cdf conversion

7dedb41

* shift the blockscan code to a reusable device function
* change appropriately in `computeSplitClassificationKernel`

♻ pdf to cdf blockscan impl for first pass of regression kernel

25bfbcf

♻ prune debug code and add explanatory comments

105b9de

🎨 clang format fix

b1946b3

🔀 merge with upstream/branch-0.19

3a1f282

Merge remote-tracking branch 'upstream/branch-0.19' into enh-ext-hist…

9ab85c1

…ogram-calculation-optimization-computeSplitRegressionKernel

github-actions bot added the CUDA/C++ label Mar 31, 2021

venkywonka added the improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change), and Perf (Related to runtime performance of the underlying code) labels Mar 31, 2021

teju85 requested changes Mar 31, 2021

venkywonka added 3 commits April 1, 2021 18:05

Merge remote-tracking branch 'upstream/branch-0.19' into enh-ext-hist…

58cfc32

…ogram-calculation-optimization-computeSplitRegressionKernel

♻ review changes

addb8fc

* incorporating block_aggregate in inclusive sumscan
* using total sum instead of doing a right-to-left sumscan

🎨 clang format fix

6eb748d

venkywonka changed the title ~~[WIP] ENH Decision Tree new backend computeSplitRegressionKernel histogram calculation optimization~~ [REVIEW] ENH Decision Tree new backend computeSplitRegressionKernel histogram calculation optimization Apr 1, 2021

venkywonka marked this pull request as ready for review April 1, 2021 13:10

venkywonka requested a review from a team as a code owner April 1, 2021 13:10

venkywonka changed the title ~~[REVIEW] ENH Decision Tree new backend computeSplitRegressionKernel histogram calculation optimization~~ [REVIEW] ENH Decision Tree new backend computeSplit*Kernel histogram calculation optimization Apr 1, 2021

venkywonka mentioned this pull request Apr 1, 2021

ENH Decision Tree new backend computeSplitClassificationKernel histogram calculation and occupancy optimization #3616

Merged

Merge remote-tracking branch 'upstream/branch-0.19' into enh-ext-hist…

d4ee98f

…ogram-calculation-optimization-computeSplitRegressionKernel

🔀 Merge remote-tracking branch 'upstream/branch-0.19' into enh-ext-hi…

6e39fef

…stogram-calculation-optimization-computeSplitRegressionKernel

venkywonka force-pushed the enh-ext-histogram-calculation-optimization-computeSplitRegressionKernel branch from 607a926 to 6e39fef Compare April 4, 2021 08:05

hcho3 approved these changes Apr 6, 2021

teju85 approved these changes Apr 6, 2021

rapids-bot bot merged commit 9feecfb into rapidsai:branch-0.19 Apr 6, 2021
