We studied how to parallelize decision tree construction in a multiprocessor setting. We first parallelized the recursive algorithm with the Cilk Plus keywords cilk_spawn, cilk_sync, and cilk_for, then tested its correctness and measured its running time and speedup. Next, we looked at an approximation algorithm, the Streaming Parallel Decision Tree (SPDT), which is designed for classifying large data sets and streaming data. SPDT assumes that the data set is so large that it must be stored in a distributed fashion across the processors. We implemented SPDT on a shared-memory machine, measured its accuracy and performance, and compared it with our recursive decision tree. Finally, we used the decision tree algorithm as a subroutine to build a Bagging Forest, and observed some interesting behavior in accuracy, running time, and speedup.
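The recursive build parallelizes naturally because the left and right subtrees of a node are built from disjoint subsets of the training data. The sketch below illustrates that spawn/sync structure under stated assumptions; it is not this repository's code. Because Cilk Plus was removed from GCC in version 8, it swaps in OpenMP tasks as a stand-in: `#pragma omp task` plays the role of `cilk_spawn` and `#pragma omp taskwait` the role of `cilk_sync`. The `Sample` and `Node` types, the Gini split criterion, binary 0/1 labels, and the depth cutoff `kSpawnDepth` are hypothetical choices for illustration. Compiled without `-fopenmp`, GCC ignores the pragmas and the same code runs sequentially.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical types for this sketch: dense numeric features, binary labels {0, 1}.
struct Sample {
    std::vector<double> x;
    int y;
};

struct Node {
    int feature = -1;        // -1 marks a leaf
    double threshold = 0.0;  // go left when x[feature] <= threshold
    int label = 0;           // majority label, used at leaves
    std::unique_ptr<Node> left, right;
};

// Binary Gini impurity of a node with `pos` positives among `n` samples.
static double gini(int pos, int n) {
    if (n == 0) return 0.0;
    double p = static_cast<double>(pos) / n;
    return 2.0 * p * (1.0 - p);
}

// Assumed cutoff: only spawn tasks near the root, where subtrees are large enough to pay off.
constexpr int kSpawnDepth = 4;

// Builds the subtree for samples idx[lo, hi). Pointers (not references) are passed so that
// OpenMP's implicit firstprivate copies inside the task still point at the shared data.
static std::unique_ptr<Node> build(const std::vector<Sample>* data, std::vector<int>* idx,
                                   int lo, int hi, int depth, int maxDepth) {
    auto node = std::make_unique<Node>();
    const int n = hi - lo;
    int pos = 0;
    for (int i = lo; i < hi; ++i) pos += (*data)[(*idx)[i]].y;
    node->label = (2 * pos >= n) ? 1 : 0;
    if (depth == maxDepth || pos == 0 || pos == n) return node;  // depth limit or pure node

    // Exhaustive split search: sort the range by each feature and scan prefix label counts.
    double bestScore = gini(pos, n) * n;  // a split must reduce the weighted impurity
    int bestF = -1;
    double bestT = 0.0;
    const int numFeatures = static_cast<int>((*data)[0].x.size());
    for (int f = 0; f < numFeatures; ++f) {
        std::sort(idx->begin() + lo, idx->begin() + hi,
                  [&](int a, int b) { return (*data)[a].x[f] < (*data)[b].x[f]; });
        int leftPos = 0;
        for (int i = lo; i + 1 < hi; ++i) {
            leftPos += (*data)[(*idx)[i]].y;
            const double v = (*data)[(*idx)[i]].x[f];
            if (v == (*data)[(*idx)[i + 1]].x[f]) continue;  // split only between distinct values
            const int nl = i - lo + 1, nr = n - nl;
            const double score = gini(leftPos, nl) * nl + gini(pos - leftPos, nr) * nr;
            if (score < bestScore) { bestScore = score; bestF = f; bestT = v; }
        }
    }
    if (bestF < 0) return node;  // no split improves impurity

    node->feature = bestF;
    node->threshold = bestT;
    const int m = static_cast<int>(
        std::partition(idx->begin() + lo, idx->begin() + hi,
                       [&](int s) { return (*data)[s].x[bestF] <= bestT; }) - idx->begin());

    // Children own disjoint ranges idx[lo, m) and idx[m, hi), so they can be built
    // concurrently: the task stands in for cilk_spawn, taskwait for cilk_sync.
    Node* self = node.get();
    #pragma omp task if (depth < kSpawnDepth)
    self->left = build(data, idx, lo, m, depth + 1, maxDepth);
    self->right = build(data, idx, m, hi, depth + 1, maxDepth);
    #pragma omp taskwait
    return node;
}

std::unique_ptr<Node> buildTree(const std::vector<Sample>& data, int maxDepth) {
    std::vector<int> idx(data.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::unique_ptr<Node> root;
    // One thread starts the recursion; the rest of the team executes spawned tasks.
    #pragma omp parallel
    #pragma omp single
    root = build(&data, &idx, 0, static_cast<int>(idx.size()), 0, maxDepth);
    return root;
}

int predict(const Node* node, const std::vector<double>& x) {
    assert(node != nullptr);
    while (node->feature >= 0)
        node = (x[node->feature] <= node->threshold) ? node->left.get() : node->right.get();
    return node->label;
}
```

For example, on the four points of a two-feature AND function, `buildTree(data, 3)` splits first on feature 0 and then on feature 1. The depth cutoff reflects the usual grain-size trade-off: spawning work for every tiny subtree costs more in scheduling overhead than it gains in parallelism, which applies equally to `cilk_spawn`.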
XXYXie/ParallelDecisionTree
About
Parallel recursive and streaming decision tree implemented in CilkPlus.