A rudimentary implementation of various Decision Tree algorithms, written in Python without any external machine learning libraries
Binary Decision Tree and Bagged Decision Trees implemented from scratch in Python
Given a dataset, the goal of the algorithm is to generate a Binary Decision Tree that accurately predicts the value of a new example
At each iteration it calculates the entropy using the formula

H = -[ x · log2(x) + (1 - x) · log2(1 - x) ]

where x is the ratio of true cases to total cases
More about this formula and the calculation of entropy in binary trees can be found under the name binary entropy function
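A minimal sketch of this computation in plain Python (the function name `entropy` and the guard for pure nodes are illustrative choices, not taken from the notebooks):

```python
import math

def entropy(p):
    """Binary entropy, where p is the ratio of true cases to total cases."""
    if p == 0 or p == 1:
        return 0.0  # a pure node (all true or all false) has zero entropy
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
```

For example, entropy(0.5) returns 1.0 (maximum uncertainty), while entropy(1.0) returns 0.0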
It further calculates information gain, based on which it decides where to split
Information gain is calculated as follows:
Assume that before splitting the entropy was Hroot and there were n elements in total
Now the tree splits on some feature, sending a and b elements to the left and right branches respectively, with entropies Hleft and Hright

gain = Hroot - [ (a/n) · Hleft + (b/n) · Hright ]
The goal is to maximise the gain at each iteration and for every branch
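Building on the entropy sketch above, a hedged illustration of the gain computation (it assumes labels are lists of booleans; the names are hypothetical, not the repository's API):

```python
def information_gain(parent, left, right):
    """Entropy reduction from splitting `parent` into `left` and `right` branches.

    Each argument is a list of boolean labels,
    with len(parent) == len(left) + len(right).
    """
    n, a, b = len(parent), len(left), len(right)
    h_root = entropy(sum(parent) / n)
    h_left = entropy(sum(left) / a) if a else 0.0
    h_right = entropy(sum(right) / b) if b else 0.0
    return h_root - ((a / n) * h_left + (b / n) * h_right)
```

A splitter would evaluate this quantity for every candidate feature and keep the split with the highest gain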
Given a dataset, the goal of the algorithm is to generate a set of n Binary Decision Trees to predict the probability of a new example being true or false
Each Binary Decision Tree is trained on its own dataset, generated by sampling with replacement (bootstrapping) from the original dataset
The generation of each Binary Decision Tree follows the same process as before, only on its resampled dataset
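A rough sketch of the bagging step (bootstrap_sample, bagged_probability, and the tree.predict interface are assumed names for illustration, not the repository's actual API):

```python
import random

def bootstrap_sample(dataset):
    """Draw len(dataset) examples with replacement to form one tree's training set."""
    return [random.choice(dataset) for _ in range(len(dataset))]

def bagged_probability(trees, example):
    """Estimate P(true) as the fraction of trees voting True for the example."""
    return sum(tree.predict(example) for tree in trees) / len(trees)
```

Each of the n trees is grown on its own bootstrap sample, and the ensemble's prediction is the average of the individual votes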
An example result of binarydecisiontree.ipynb (implementation of the Binary Decision Tree):

Requirements:
- Python 3.x
- graphviz 9.0.0 (required only for visualisation)
- dsplot 0.9.0 (required only for visualisation)
- Clone the repository: $ git clone https://github.com/AngadBasandrai/decision-tree-python.git
- Open the .ipynb files
To install graphviz:
- Debian/Ubuntu: sudo apt install graphviz
- Fedora/RHEL/CentOS: sudo yum install graphviz
- MacPorts (macOS): sudo port install graphviz
- Homebrew (macOS): brew install graphviz
- To install dsplot, refer to the DSPlot installation guide
NOTE: The notebooks were created in Kaggle and there may be some portability issues, so it is recommended to import them into Kaggle
Made with ❤️ by Angad Basandrai