The purpose of these experiments is to evaluate the predictive performance **(test accuracy)** of Mondrian forests (MF) as a function of

(i) the fraction of training data and

(ii) the training time.

- We divide the training data into **100 mini-batches** and compare the performance of online random forests (MF, ORF-Saffari [20]) to batch random forests (Breiman-RF, ERT-k, ERT-1) trained on the same fraction of the training data.
- We evaluate on four of the five datasets used in [20]; we excluded the mushroom dataset because even very simple logical rules achieve > 99% accuracy on it.
- We re-scaled the datasets such that each feature takes on values in the range [0, 1] (by subtracting the min value along that dimension and dividing by the range along that dimension, where range = max − min).
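
The preprocessing described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code used for the experiments: the helper names `rescale_features` and `split_into_minibatches` are hypothetical, the toy data is synthetic, mapping constant features (range = 0) to 0 is an assumption, and the mini-batches keep the data in its given order because the text does not say whether it was shuffled.

```python
import numpy as np

def rescale_features(X):
    """Rescale each feature to [0, 1]: subtract the min, divide by (max - min)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Assumption: a constant feature (range = 0) is mapped to 0 instead of dividing by zero.
    safe_range = np.where(col_range > 0, col_range, 1.0)
    return (X - col_min) / safe_range

def split_into_minibatches(X, y, num_batches=100):
    """Split (X, y) into num_batches consecutive chunks of near-equal size."""
    index_chunks = np.array_split(np.arange(len(X)), num_batches)
    return [(X[idx], y[idx]) for idx in index_chunks]

# Synthetic toy data, only for illustration.
rng = np.random.RandomState(0)
X = rng.randn(1000, 2) * 5.0 + 3.0
y = (X[:, 0] > 3.0).astype(int)

X_scaled = rescale_features(X)
batches = split_into_minibatches(X_scaled, y, num_batches=100)
print(X_scaled.min(axis=0), X_scaled.max(axis=0))
print(len(batches), batches[0][0].shape)
```

An online forest would consume `batches` one at a time, while a batch forest would be refit on the concatenation of the first k mini-batches.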

As is common in the random forest literature [2], we set **the number of trees M = 100.**

For Mondrian forests, we set the lifetime λ = ∞ and the HNSP discount parameter γ = 10D. For ORF-Saffari, we set num epochs = 20 (number of passes through the training data) and set the other hyperparameters to the values used in [20]. For Breiman-RF and ERT, the hyperparameters are set to their default values. We repeat each algorithm with five random initializations and report the mean performance. The results are shown in Figure 3. (The * in Breiman-RF* indicates the scikit-learn implementation.)

# ORF-Saffari

In [3]:
import numpy as np
import math
from ORFpy import ORF, dataRange
#from ORFpy import ORT # if you want online random tree

def f(x):
    return int(x[0]*x[0] + x[1]*x[1] < 1)

n = 1000
X = np.random.randn(n,2)
y = list(map(f, X))  # list() so that y[i] can be indexed under Python 3 as well

# setting parameters for ORF. For more details: >>> help(ORF).
param = {'minSamples': 100, 'minGain': .01, 'numClasses': 2, 'xrng': dataRange(X), 'maxDepth': 4}
orf = ORF(param,numTrees=50)
for i in range(n):
    orf.update(X[i,:],y[i])

orf.forest[0].draw()
#                             _________________________________________________X2 < 1.26_
#                _____________X2 < -0.6________________                                 0
#   _____________X2 < -0.92_             _____________X1 < 0.72___
# __X1 < -0.31_            1           __X1 < -1.52_           __X1 < 1.74_
# 0           0                        0           1           0          0

xtest = np.random.randn(n, 2)
ytest = list(map(f, xtest))  # list() so ytest can be iterated more than once under Python 3
preds = orf.predicts(xtest)

predAcc = sum(map(lambda z: int(z[0] == z[1]), zip(preds,ytest))) / float(len(preds))
conf = orf.confusion(xtest,ytest)
orf.printConfusion(conf)
# y\pred   0      1
# 0       578     52
# 1       1      369
print("Accuracy: " + str(round(predAcc * 100,2)) + "%")
# Accuracy: 94.7%


                           _____________________________________X2 < 1.15_ 
  _________________________X1 < 0.5___                                   0
__X1 < -1.08___                    __X2 < -1.37___                        
0           __X2 < 0.65_           0           __X2 < -0.02_              
            1          0                       0           0              

y\pred	 0	1
0	584	32
1	24	360
Accuracy: 94.4%


# Breiman-RF

In [4]:
from sklearn.ensemble import RandomForestClassifier
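
A possible way to run the batch baseline under the protocol described earlier (M = 100 trees, five random initializations, mean test accuracy per training fraction) is sketched below. It is an illustration under assumptions rather than the experiment code: the data is the synthetic circle problem from the ORF cell, the three fractions are chosen arbitrarily, and all other hyperparameters are left at scikit-learn's defaults.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def f(x):
    # Same toy labelling rule as in the ORF cell: 1 inside the unit circle.
    return int(x[0] * x[0] + x[1] * x[1] < 1)

rng = np.random.RandomState(0)
X_train = rng.randn(1000, 2)
y_train = np.array([f(x) for x in X_train])
X_test = rng.randn(1000, 2)
y_test = np.array([f(x) for x in X_test])

fractions = [0.1, 0.5, 1.0]  # illustrative fractions of the training data
num_seeds = 5                # five random initializations
mean_acc_by_fraction = {}

for frac in fractions:
    n_used = int(frac * len(X_train))
    accs = []
    for seed in range(num_seeds):
        # Batch forest: refit from scratch on the first n_used training points.
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(X_train[:n_used], y_train[:n_used])
        accs.append(clf.score(X_test, y_test))
    mean_acc_by_fraction[frac] = float(np.mean(accs))
    print("fraction %.1f: mean test accuracy %.3f" % (frac, mean_acc_by_fraction[frac]))
```

Because a batch forest cannot be updated incrementally, it is refit at every fraction; that refitting cost is what the training-time comparison against the online methods measures.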

# ERT

In [6]:
from sklearn.ensemble import ExtraTreesClassifier
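
The two ERT variants might be set up along these lines. The mapping is an assumption, not something stated above: ERT-k is taken to be `ExtraTreesClassifier` with its default `max_features`, and ERT-1 to be the same estimator with `max_features=1` (a single candidate feature per split). The toy data adds two noise features to the circle problem so that the two settings actually differ.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.RandomState(0)

def make_data(n, num_features=4):
    # Circle rule on the first two features; the remaining features are pure noise.
    X = rng.randn(n, num_features)
    y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)
    return X, y

X_train, y_train = make_data(1000)
X_test, y_test = make_data(1000)

# Assumed correspondence between the names used in the text and scikit-learn settings.
models = {
    "ERT-k": ExtraTreesClassifier(n_estimators=100, random_state=0),
    "ERT-1": ExtraTreesClassifier(n_estimators=100, max_features=1, random_state=0),
}

ert_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    ert_accuracy[name] = model.score(X_test, y_test)
    print("%s test accuracy: %.3f" % (name, ert_accuracy[name]))
```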