-
Notifications
You must be signed in to change notification settings - Fork 511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] ENH Decision Tree new backend computeSplit*Kernel
histogram calculation optimization
#3674
Conversation
* using atomics to calculate PDFs and then using blockScan to get required CDFs that was originally issueing too many atomicAdds to shared memory
* was earlier hard-coded
* dynamically assigning based on occupancy while ceil-ing it to minimum 4 blocks
* pruning unnecessary comments and code * improving doxygen comments * adding some explanatory comments
…stogram-calculation-optimization-for-computesplitclassificationkernel
* shift the blockscan code to a reusable device function * change appropriately in `computeSplitClassificationKernel`
…ogram-calculation-optimization-computeSplitRegressionKernel
typedef cub::BlockScan<DataT, TPB> BlockScan; | ||
__shared__ typename BlockScan::TempStorage temp_storage; | ||
|
||
for (IdxT tix = threadIdx.x; tix < max(TPB, nbins); tix += blockDim.x) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If nbins > TPB
, then the resulting scan will be incorrect (because the total sum of the previous iteration is not being carried forward for the next iteration).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IOW, InclusiveSum
provides an option to also get the total sum with this function: https://nvlabs.github.io/cub/classcub_1_1_block_scan.html#a99222ab9b122e6df879ee04b4e8244da
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😅 Thank you for that, have rectified it 👍🏻
// locations | ||
offset_cdf += nbins; | ||
//convert pdf to cdf | ||
pdf_to_cdf<int, IdxT, TPB>(pdf_shist + offset_pdf, cdf_shist + offset_cdf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just compute this cdf using the cdf from the above and its total sum? That way, we could avoid an extra block-scan operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did think of that initially, but leveraging total-sum from previous sumscan did not strike me! I have added it thejaswi 🙏🏻
…ogram-calculation-optimization-computeSplitRegressionKernel
* incorporating block_aggregate in inclusive sumscan * using total sum instead of doing a right-to-left sumscan
computeSplit*Kernel
histogram calculation optimization
…ogram-calculation-optimization-computeSplitRegressionKernel
Codecov Report
@@ Coverage Diff @@
## branch-0.19 #3674 +/- ##
===============================================
+ Coverage 80.70% 82.92% +2.21%
===============================================
Files 227 227
Lines 17615 17591 -24
===============================================
+ Hits 14217 14587 +370
+ Misses 3398 3004 -394
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
@venkywonka please resolve conflicts |
…stogram-calculation-optimization-computeSplitRegressionKernel
607a926
to
6e39fef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM.
@gpucibot merge |
This is a follow-up of PR #3616 and should be merged after that.
This PR introduces:
pdf_to_cdf
conversion using inclusive-sumscan into a device function so that it can be reused by both theML::DecisionTree::computeSplitClassificationKernel
andML::DecisionTree::computeSplitRegressionKernel
ML::DecisionTree::computeSplitRegressionKernel
. These histograms are used for node-splitting in decision trees for the task of regression.computeSplitClassificationKernel
histogram calculation and occupancy optimization #3616