adambloniarz/silo

# SILO: Supervised Local modeling method for distributing random forests

This code implements the method introduced in:

Bloniarz, A., Wu, C., Yu, B., & Talwalkar, A. (2016). Supervised Neighborhoods for Distributed Nonparametric Regression. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (pp. 1450-1459). (link)

To compile and assemble the jar, run `sbt assembly`. This creates `./target/scala-2.10/DistributedForest-assembly-1.0.jar`.

## Basic usage
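
The example in this section assumes that `trainingDataRDD` and `testData` already exist. As a minimal sketch of how such inputs might be built for a quick experiment, the helper below creates a tiny toy data set. It assumes that `LabeledPoint` and `Vector` are the Spark MLlib types (`org.apache.spark.mllib`), which this README does not state explicitly. The `ToyData` object, its method names, and the numeric values are hypothetical and are not part of SILO.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper for illustration only; not part of SILO.
// Assumes the Spark MLlib LabeledPoint and Vector types.
object ToyData {
  // Each LabeledPoint pairs a response value with a 3-dimensional feature vector.
  def trainingData(sc: SparkContext): RDD[LabeledPoint] =
    sc.parallelize(Seq(
      LabeledPoint(1.0, Vectors.dense(0.1, 0.2, 0.3)),
      LabeledPoint(2.0, Vectors.dense(0.4, 0.5, 0.6)),
      LabeledPoint(3.0, Vectors.dense(0.7, 0.8, 0.9)),
      LabeledPoint(4.0, Vectors.dense(1.0, 1.1, 1.2))
    ))

  // Test points are plain feature vectors without labels.
  val testData: IndexedSeq[Vector] = IndexedSeq(
    Vectors.dense(0.2, 0.3, 0.4),
    Vectors.dense(0.8, 0.9, 1.0)
  )
}
```

With an existing `SparkContext` named `sc` (for example, the one provided by `spark-shell`), `val trainingDataRDD = ToyData.trainingData(sc)` and `val testData = ToyData.testData` would supply inputs of the types the example expects. A data set this small is only useful for checking that the pipeline runs.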

```scala
import edu.berkeley.statistics.DistributedForest.DistributedForest
import edu.berkeley.statistics.SerialForest.{RandomForestParameters, TreeParameters}

// Set random forest parameters
val forestParameters = RandomForestParameters(100,                    // Number of trees
                                              true,                   // Resample with replacement?
                                              TreeParameters(3,       // mtry
                                                             10))     // max number of training points in leaf node

// Train the models
// assumes trainingDataRDD is an RDD[LabeledPoint]
val forests = DistributedForest.train(trainingDataRDD, forestParameters)

// persist the trained forests in memory
forests.persist()

// set batch size for processing test points
val batchSize = 100

// set parameter to limit the size of supervised neighborhoods
// set to a large value to use the entire neighborhood (recommended)
val numPNNsPerPartition = 100000

// Make predictions at test points with local regression
// assumes testData is an IndexedSeq[Vector]
val predictions = DistributedForest.predictWithLocalRegressionBatch(
      testData, forests, numPNNsPerPartition, batchSize)
```
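
Once predictions are available, a common next step is to compare them against held-out responses. The sketch below computes root mean squared error in plain Scala. It is not part of SILO: `rootMeanSquaredError` is a hypothetical helper, and because this README does not state the return type of `predictWithLocalRegressionBatch`, the sketch assumes the predictions can be converted to a `Seq[Double]` in the same order as the test points.

```scala
// Hypothetical helper for illustration only; not part of SILO.
// Assumes predictions and true responses are aligned Seq[Double] values.
def rootMeanSquaredError(predicted: Seq[Double], actual: Seq[Double]): Double = {
  require(predicted.length == actual.length, "predicted and actual must have the same length")
  require(predicted.nonEmpty, "at least one prediction is required")
  // Squared difference for each (prediction, true response) pair
  val squaredErrors = predicted.zip(actual).map { case (p, a) => (p - a) * (p - a) }
  math.sqrt(squaredErrors.sum / squaredErrors.length)
}
```

For example, `rootMeanSquaredError(predictedValues, testResponses)` returns a single error figure, where `predictedValues` and `testResponses` are placeholder names for the converted predictions and the true responses.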
