## Example for extending Mondrian Trees

In [1]:
import Base.copy
using Distributions
using MLBase
include("Mondrian_Forest_Classifier.jl"); 
include("Mondrian_extention.jl")

expand!

This notebook shows how to extend already trained Mondrian Trees and Mondrian Forests with new data. Because the extension algorithm is online, data points can be incorporated one at a time instead of retraining on the whole dataset.

### Functions

All functions used in this section can be found in the files "Mondrian_extention.jl" and "Mondrian_extention_utils.jl". 

In [25]:
?Extend_Mondrian_Tree!


`function Extend_Mondrian_Tree!(T::Mondrian_Tree,λ::Float64,X::Array{Float64} where N,Y::Int64)`

This function extends an existing Mondrian Tree by a single new data point, which is incorporated into the tree.

`Input`: Mondrian Tree T (abstract type Mondrian_Tree), lifetime parameter λ (Float64), one-dimensional feature array X (Array of Float64), class label Y (Int64)

`Output`: Mondrian Tree

`Files needed to run this function`: "Mondrian_Forest_Classifier.jl", "Mondrian_extention.jl"

This function calls the function Extend_Mondrian_Block.

The function "expand!" is recommended for expanding Mondrian Trees, as it has a more convenient interface.
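Extend_Mondrian_Tree! delegates the actual work to Extend_Mondrian_Block. To illustrate what such a block-extension step typically decides, here is a minimal sketch of the per-node step of the "Extend Mondrian Block" procedure from the original Mondrian Forest paper (Lakshminarayanan et al., 2014). It is not taken from Mondrian_extention.jl: the name `sample_extension_split` and its arguments are hypothetical. It assumes each node stores an axis-aligned bounding box `[lower, upper]` of its data and a split time `τ_node`, with a leaf's split time equal to the lifetime λ.

```julia
using Random

# Hypothetical helper (not from Mondrian_extention.jl): one step of the
# "Extend Mondrian Block" procedure. Given a node with bounding box
# [lower, upper] and split time τ_node, whose parent split at τ_parent,
# decide whether the new point x triggers a new split above the node.
# Returns `nothing` if not, otherwise the time, dimension and location.
function sample_extension_split(lower::Vector{Float64}, upper::Vector{Float64},
                                x::Vector{Float64}, τ_parent::Float64,
                                τ_node::Float64,
                                rng::AbstractRNG = Random.default_rng())
    # distance by which x lies outside the box in each dimension
    extent = max.(lower .- x, 0.0) .+ max.(x .- upper, 0.0)
    cum = cumsum(extent)
    rate = cum[end]
    rate == 0.0 && return nothing             # x lies inside the box
    E = randexp(rng) / rate                   # E ~ Exponential(rate)
    τ_parent + E >= τ_node && return nothing  # no new split before τ_node
    # split dimension chosen with probability proportional to its extent
    u = rand(rng) * rate
    d = something(findfirst(c -> c > u, cum), length(cum))
    # split location uniform between the box edge and x in dimension d
    loc = if x[d] > upper[d]
        upper[d] + rand(rng) * (x[d] - upper[d])   # x is above the box
    else
        x[d] + rand(rng) * (lower[d] - x[d])       # x is below the box
    end
    return (time = τ_parent + E, dim = d, location = loc)
end
```

In the paper's procedure, a `nothing` result means the node's box is enlarged to contain x and, for an internal node, x is passed on to the child on its side of the split; otherwise a new parent node carrying the returned split is inserted above the node, with a new leaf for x as its other child.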


In [27]:
?expand!


`function expand!(T::Mondrian_Tree,X::Array{Float64,N} where N,Y::Array{Int64},λ::Float64)`

This function expands an already sampled Mondrian Tree by any number of new data points. 

`Input`: Mondrian Tree T (abstract type Mondrian_Tree), array of features X (Array of Float64), array of class labels Y (1-dim array of Int64), lifetime parameter λ (Float64)

Each row of X is one feature vector; the corresponding entry of Y is its class label. 

`Output`: Mondrian Tree with the new data points incorporated

`Files needed to run this function`: "Mondrian_Forest_Classifier.jl", "Mondrian_extention.jl"

This function calls the function Extend_Mondrian_Tree! for each new data point. 
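The documentation describes expand! as a batch wrapper: each row of X is one feature vector, the matching entry of Y is its label, and each point is handed to Extend_Mondrian_Tree!. The sketch below shows that row-by-row pattern in isolation. `expand_rows!`, `LabelLog` and `record!` are hypothetical names used only for illustration, and the per-point update is passed in as a function so that the example runs without the Mondrian files.

```julia
# Hypothetical sketch, not the implementation in Mondrian_extention.jl:
# hand the data to a per-point update function one row at a time.
function expand_rows!(extend_one!::Function, model, X::AbstractMatrix{Float64},
                      Y::AbstractVector{<:Integer}, λ::Float64)
    size(X, 1) == length(Y) ||
        throw(DimensionMismatch("X has $(size(X, 1)) rows but Y has $(length(Y)) labels"))
    for i in 1:size(X, 1)
        # row i of X is one feature vector, Y[i] is its class label
        model = extend_one!(model, X[i, :], Int(Y[i]), λ)
    end
    return model
end

# Toy stand-in for a tree: it only records the labels it receives.
struct LabelLog
    labels::Vector{Int}
end
record!(m::LabelLog, x, y, λ) = (push!(m.labels, y); m)

seen = expand_rows!(record!, LabelLog(Int[]), randn(3, 2), [1, 2, 1], 1e9)
# seen.labels == [1, 2, 1]
```

With the notebook's files loaded, one could presumably pass `(T, x, y, λ) -> Extend_Mondrian_Tree!(T, λ, x, y)` as the update function, which reorders the arguments to the documented signature; this has not been checked against the actual implementation.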


### 1) Extend a single tree

We generate two sets of fake data and sample a Mondrian Tree on the first one. Then we extend the tree with the second one and compute the training accuracy on the merged data. For comparison, we also sample a Mondrian Tree directly on the merged dataset and evaluate its training accuracy.

In [2]:
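# generate fake data: labels come from thresholding one random linear projection
function FakedataClassif(n_obs, d)
    x = randn(n_obs, d)
    w = randn(d)                  # a single random weight vector
    s = x * w                     # linear score of each observation
    y = Int.(s .> mean(s)) .+ 1   # class labels 1 and 2
    return x, y
end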

FakedataClassif (generic function with 2 methods)

In [3]:
data = FakedataClassif(10000,100)  #base dataset
d_new=FakedataClassif(1000,100);   #extension dataset

In [4]:
T = Mondrian_Tree()
T_online = Sample_Mondrian_Tree!(T,1e9,data[1],data[2]);  #Sample tree on dataset "data"

In [5]:
T_online=expand!(T_online,d_new[1],d_new[2],1e9);  #Expand the tree on the extension dataset

In [5]:
All_data=vcat(data[1],d_new[1]);
all_labels = vcat(data[2],d_new[2]);

In [8]:
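T_full = Sample_Mondrian_Tree!(Mondrian_Tree(),1e9,All_data,all_labels);  #sample a fresh tree on the merged dataset (Sample_Mondrian_Tree! modifies its argument, so reusing T could overwrite T_online)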

In [9]:
compute_predictive_posterior_distribution!(T_full,10*size(All_data,2))

In [10]:
pred = Int[]
for i in 1:size(All_data,1)
    p = predict!(T_full,All_data[i,:],10*size(All_data,2))
    push!(pred, p[1] > p[2] ? 1 : 2)   # predict the more probable of the two classes
end
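The loop above labels each point with the more probable of two classes. Since testing on more classes is listed as a to-do at the end of this notebook, a sketch of the K-class version may help: pick the index of the largest probability. It assumes, as the loop suggests, that `predict!` returns a vector of class probabilities indexed by label; `most_probable_labels` is a hypothetical helper.

```julia
# Hypothetical helper: map each class-probability vector to the label (index)
# of its most probable class. Works for any number of classes K.
most_probable_labels(probs::AbstractVector{<:AbstractVector{<:Real}}) =
    [argmax(p) for p in probs]

most_probable_labels([[0.2, 0.8], [0.7, 0.3], [0.1, 0.3, 0.6]])  # returns [2, 1, 3]
```

For two classes this agrees with `p[1] > p[2] ? 1 : 2` except on exact ties, where `argmax` returns the first maximum (class 1) rather than class 2.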


In [11]:
counter = 0
for i=1:size(pred)[1]
    if pred[i]==all_labels[i]
        counter = counter+1
    end
end
counter = counter /size(pred)[1]

0.5629090909090909
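The counting loop computes the fraction of correctly predicted labels. The same quantity can be written as a mean over an element-wise comparison; the helper name `accuracy` is hypothetical. MLBase's `correctrate`, used further down in this notebook, returns the same value for integer label vectors.

```julia
using Statistics

# Fraction of positions where the prediction equals the true label.
accuracy(pred, labels) = mean(pred .== labels)

accuracy([1, 2, 2, 1], [1, 2, 1, 1])  # returns 0.75
```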

In [12]:
compute_predictive_posterior_distribution!(T_online,10*size(All_data,2))

In [13]:
pred = Int[]
for i in 1:size(All_data,1)
    p = predict!(T_online,All_data[i,:],10*size(All_data,2))
    push!(pred, p[1] > p[2] ? 1 : 2)   # predict the more probable of the two classes
end

In [14]:
counter = 0
for i=1:size(pred)[1]
    if pred[i]==all_labels[i]
        counter = counter+1
    end
end
counter = counter /size(pred)[1]

0.5629090909090909

### 2) Extend Mondrian Forests

Now we train a Mondrian Forest classifier on the same base dataset as above, extend it with the extension dataset, and, for comparison, train a new classifier on the merged dataset.

In [15]:
function Extend_Mondrian_Forest_Classifier(MF, X_extend, Y_extend, λ)
    # Optional sanity check (disabled): the new data must have as many
    # features as the data the forest was trained on.
    # if size(MF.X, 2) != size(X_extend, 2)
    #     error("the number of features in the new data doesn't match the original data")
    # end
    # Extend every tree of the forest with the new data points
    for i in 1:MF.n_trees
        MF.Trees[i] = expand!(MF.Trees[i], X_extend, Y_extend, λ)
    end
    return MF
end

Extend_Mondrian_Forest_Classifier (generic function with 1 method)
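Extend_Mondrian_Forest_Classifier leaves its feature-count check commented out. The sketch below shows one way to enable such a check before extending each tree. `ToyForest` and `extend_forest!` are hypothetical stand-ins rather than the notebook's Mondrian_Forest_Classifier type, and the per-tree update is passed in as a function so the example runs on its own; with the real classifier one would pass `expand!` and the `Trees` field instead. The check is skipped while the stored training matrix is still empty, as it is for a freshly initialised classifier.

```julia
# Hypothetical stand-ins: ToyForest mimics only the X and Trees fields of a
# forest; extend_tree plays the role of expand!(tree, X, Y, λ).
struct ToyForest
    X::Matrix{Float64}      # features the forest was trained on (may be 0×0)
    trees::Vector{Any}
end

function extend_forest!(extend_tree, MF::ToyForest, X_new::Matrix{Float64},
                        Y_new::Vector{Int}, λ::Float64)
    # the feature-count check, skipped while no training data is stored yet
    if !isempty(MF.X) && size(MF.X, 2) != size(X_new, 2)
        throw(DimensionMismatch("new data has $(size(X_new, 2)) features, " *
                                "but the forest was trained on $(size(MF.X, 2))"))
    end
    for i in eachindex(MF.trees)
        MF.trees[i] = extend_tree(MF.trees[i], X_new, Y_new, λ)
    end
    return MF
end

# Two toy "trees" that simply count the points they have been extended with
forest = ToyForest(randn(5, 3), Any[0, 0])
extend_forest!((t, X, Y, λ) -> t + size(X, 1), forest, randn(4, 3), [1, 2, 1, 2], 1e9)
# forest.trees == [4, 4]
```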

In [19]:
MF = Mondrian_Forest_Classifier(50)  #initialise Mondrian Forest Classifier with 50 trees

Mondrian_Forest_Classifier(50, Mondrian_Tree[], Array{Float64}(0,0), Int64[])

In [None]:
train!(MF, data[1], data[2], 1e9);  #Train the classifier on the same dataset as before

In [20]:
MF_ex = Extend_Mondrian_Forest_Classifier(MF,d_new[1],d_new[2],1e9);  #extend the classifier

LoadError: MethodError: no method matching get(::Array{Int64,1})
Closest candidates are:
  get(::AbstractArray, ::Integer, ::Any) at abstractarray.jl:1010
  get(::AbstractArray, ::Tuple{}, ::Any) at abstractarray.jl:1011
  get(::AbstractArray, ::Tuple{Vararg{Int64,N}} where N, ::Any) at abstractarray.jl:1012
  ...

In [6]:
MF_control = Mondrian_Forest_Classifier(50)   #initialise control classifier with 50 trees
train!(MF_control, All_data, all_labels, 1e9)  # train control classifier

1-element Array{Future,1}:
 Future(1, 1, 1, #NULL)

In [7]:
pred=predict!(MF_control, All_data);
println("Train Accuracy")
println(correctrate(all_labels,convert(Array{Int,1},pred)))

Train Accuracy
0.7377272727272727


## To Do:

- Test on more classes
- Implement paused Mondrians
- Finish setting up the framework