DecisionTree notebook #5

tlienart · 2018-04-09T08:53:18Z

# I would strongly suggest not running past n=4
n = 5

is a bit funny.

Re the benchmark with Python, do you

- have some idea why DecisionTree.jl is a factor of 3-4 slower?
- know whether the results are identical (tree obtained, classification accuracy)? This is not necessarily the case, as DT.jl may be using a different algorithm.

Also, I think it would be interesting to test on pure decision trees (not on a forest).

harveydevereux · 2018-04-09T09:19:05Z

I'm returning to the DT notebook today so I'll have some answers shortly.

Thanks for spotting the n=5!

harveydevereux · 2018-04-09T12:32:30Z

My best guess is that Python uses sparse data representations for the training data, and perhaps also that in Python the trees are represented as arrays (Cython pointers) rather than as large sets of nested node and leaf objects. It might be useful to look deeper, say into memory allocations in Julia.
Both DT.jl and Python's scikit-learn implement the same model (CART) and seem to produce the same decision tree and prediction accuracy on the Titanic data, so I think the benchmarks are directly comparable.
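
To make the arrays-versus-nested-objects guess concrete, here is a minimal pure-Python sketch of the two layouts. It is only an illustration, not the actual code of DT.jl or scikit-learn: the names `Node`, `Leaf`, `FlatTree`, `flatten`, `predict_nested` and `predict_flat` are all hypothetical, and the `-1` leaf marker is an assumption made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Union


@dataclass
class Leaf:
    label: int


@dataclass
class Node:
    feature: int                 # index of the feature to test
    threshold: float             # go left if x[feature] <= threshold
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def predict_nested(tree: Union[Node, Leaf], x: Sequence[float]) -> int:
    """Walk a tree built from nested node/leaf objects."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.label


@dataclass
class FlatTree:
    """Same tree stored as parallel arrays; feature == -1 marks a leaf."""
    feature: List[int] = field(default_factory=list)
    threshold: List[float] = field(default_factory=list)
    left: List[int] = field(default_factory=list)
    right: List[int] = field(default_factory=list)
    label: List[int] = field(default_factory=list)


def flatten(tree: Union[Node, Leaf]) -> FlatTree:
    """Copy a nested tree into the parallel-array layout (pre-order)."""
    flat = FlatTree()

    def add(node: Union[Node, Leaf]) -> int:
        idx = len(flat.feature)
        # Reserve a slot first so children get larger indices.
        flat.feature.append(-1)
        flat.threshold.append(0.0)
        flat.left.append(-1)
        flat.right.append(-1)
        flat.label.append(-1)
        if isinstance(node, Leaf):
            flat.label[idx] = node.label
        else:
            flat.feature[idx] = node.feature
            flat.threshold[idx] = node.threshold
            flat.left[idx] = add(node.left)
            flat.right[idx] = add(node.right)
        return idx

    add(tree)
    return flat


def predict_flat(flat: FlatTree, x: Sequence[float]) -> int:
    """Walk the array layout using integer indices instead of object references."""
    i = 0
    while flat.feature[i] != -1:
        i = flat.left[i] if x[flat.feature[i]] <= flat.threshold[i] else flat.right[i]
    return flat.label[i]
```

Both predictors should agree on every input. The difference is that the flat version allocates a handful of arrays in total rather than one object per node, which is the kind of thing an allocation profile could confirm or rule out.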

I've now added a pure decision tree benchmark. Julia seems to be slower by an order of magnitude, and the gap seems to widen with more data.
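
For checking how the timings scale with data size, a small harness along these lines could help. It is a sketch using only the standard library: `make_data`, `time_fit` and `dummy_fit` are hypothetical names, the data is synthetic rather than the Titanic set, and `dummy_fit` is a placeholder to be swapped for the real tree-fitting call.

```python
import random
import statistics
import time
from typing import Callable, Dict, List, Sequence, Tuple

Rows = List[List[float]]


def make_data(n_rows: int, n_features: int = 5, seed: int = 0) -> Tuple[Rows, List[int]]:
    """Synthetic data: uniform features, label depends on the first feature."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_rows)]
    y = [int(row[0] > 0.5) for row in X]
    return X, y


def time_fit(
    fit: Callable[[Rows, List[int]], object],
    sizes: Sequence[int],
    repeats: int = 5,
) -> Dict[int, float]:
    """Return the median wall-clock time (seconds) of fit(X, y) for each size."""
    results: Dict[int, float] = {}
    for n in sizes:
        X, y = make_data(n)
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            fit(X, y)
            samples.append(time.perf_counter() - start)
        results[n] = statistics.median(samples)
    return results


def dummy_fit(X: Rows, y: List[int]) -> None:
    """Placeholder workload: sort every feature column once."""
    for j in range(len(X[0])):
        sorted(row[j] for row in X)


if __name__ == "__main__":
    for n, seconds in time_fit(dummy_fit, [1_000, 10_000, 100_000]).items():
        print(f"n={n:>7}: {seconds:.4f} s")
```

Running the same sizes through both implementations and comparing the ratio per size would show whether the slowdown really grows with the data or stays roughly constant.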

tlienart · 2018-04-14T07:55:44Z

Wow, an order of magnitude! Well, that'd be a nice side project to work on: build a decent DecisionTreeFast.jl. There's really no reason for it to be much slower than the Python one...

I think their code is not based on scikit-learn but rather on something else (that I don't know), like ml toolkit or some similar name. It'd be interesting to see whether it compares favourably to that one or not (probably not).

dominusmi assigned harveydevereux Apr 9, 2018
