A rudimentary implementation of Decision Trees made in python without using any external machine learning libraries

AngadBasandrai/decision-tree-python


  Decision Tree Python


A rudimentary implementation of several Decision Tree algorithms, written in Python without using any external machine learning libraries


Introduction

Binary Decision Tree and Bagged Decision Trees implemented from scratch in Python

Binary Decision Tree

Given a dataset, the goal of the algorithm is to generate a Binary Decision Tree that accurately predicts the value of a new example

In each iteration it calculates the entropy using the formula

- { x · log2(x)  +  (1 - x) · log2(1 - x) }

where x is the ratio of true cases to total cases

More about this formula and the calculation of entropy in Binary Trees
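The entropy formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the notebooks: the function name `binary_entropy` is an assumption, and the pure-node case (x equal to 0 or 1) is handled with the usual convention that 0 · log2(0) counts as 0.

```python
import math


def binary_entropy(x: float) -> float:
    """Entropy (in bits) of a node where x is the ratio of true cases to total cases."""
    # Assumption: a pure node has zero entropy, using the convention 0 * log2(0) = 0.
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * math.log2(x) + (1 - x) * math.log2(1 - x))


if __name__ == "__main__":
    # An evenly mixed node is maximally uncertain; a pure node is not uncertain at all.
    print(binary_entropy(0.5))
    print(binary_entropy(1.0))
```

Entropy is highest (1 bit) when true and false cases are evenly mixed and falls to 0 as a node becomes pure, which is why a good split is one that lowers it.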

It further calculates information gain, based on which it decides where to split

Information gain is calculated as follows:

Assume that before the split the entropy was Hroot and there were n elements in total

Now it splits at some feature and sends a and b elements to the left and right branches respectively, with their entropies being Hleft and Hright

gain = Hroot - { [ a/n ] · Hleft + [ b/n ] · Hright }

The goal is to maximise the gain in each iteration and for every branch
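The gain calculation and the search for the best split might look like the sketch below. It rests on assumptions that the text above does not state: features and labels are encoded as 0/1, a split sends rows with feature value 1 to the left branch and 0 to the right, and the names `entropy`, `information_gain` and `best_split` are hypothetical rather than taken from the notebooks.

```python
import math


def entropy(labels):
    """Entropy (in bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0  # an empty branch contributes nothing
    x = sum(labels) / len(labels)  # ratio of true cases to total cases
    if x == 0.0 or x == 1.0:
        return 0.0  # pure node
    return -(x * math.log2(x) + (1 - x) * math.log2(1 - x))


def information_gain(root, left, right):
    """gain = Hroot - (a/n * Hleft + b/n * Hright)."""
    n, a, b = len(root), len(left), len(right)
    return entropy(root) - (a / n * entropy(left) + b / n * entropy(right))


def best_split(rows, labels):
    """Return (feature_index, gain) for the feature with the highest gain.

    Assumes every row is a list of 0/1 feature values.
    Returns (None, 0.0) when no feature gives a positive gain.
    """
    best_feature, best_gain = None, 0.0
    for feature in range(len(rows[0])):
        left = [y for row, y in zip(rows, labels) if row[feature] == 1]
        right = [y for row, y in zip(rows, labels) if row[feature] == 0]
        gain = information_gain(labels, left, right)
        if gain > best_gain:
            best_feature, best_gain = feature, gain
    return best_feature, best_gain


if __name__ == "__main__":
    rows = [[1, 0], [1, 1], [0, 0], [0, 1]]
    labels = [1, 1, 0, 0]
    # Feature 0 separates the labels perfectly, feature 1 not at all.
    print(best_split(rows, labels))
```

Building the tree then amounts to applying `best_split` recursively to the left and right subsets until a branch is pure or no split gives a positive gain.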

Bagged Decision Trees Algorithm

Given a dataset, the goal of the algorithm is to generate a set of 'n' Binary Decision Trees to predict the probability of a new example being true or false

Each Binary Decision Tree is built on its own dataset, generated by sampling with replacement from the original dataset

The generation of each Binary Decision Tree follows the same process as before, only with a different dataset
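The bagging procedure can be sketched as follows. Several details here are assumptions, not statements from the source: each trained tree is represented as a callable that returns 0 or 1, the probability is taken as the fraction of trees voting true, and `fit_majority` is a hypothetical stand-in for the real tree builder so that the example runs on its own.

```python
import random


def bootstrap_sample(rows, labels, rng):
    """Draw len(rows) examples with replacement from the original dataset."""
    n = len(rows)
    picks = [rng.randrange(n) for _ in range(n)]
    return [rows[i] for i in picks], [labels[i] for i in picks]


def fit_bagged(rows, labels, n_trees, fit_tree, seed=0):
    """Train n_trees predictors, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_tree(*bootstrap_sample(rows, labels, rng)) for _ in range(n_trees)]


def predict_probability(trees, example):
    """Assumed aggregation: the fraction of trees that predict true (1)."""
    votes = [tree(example) for tree in trees]
    return sum(votes) / len(votes)


def fit_majority(rows, labels):
    """Hypothetical stand-in for the tree builder: always predicts the majority label."""
    majority = 1 if 2 * sum(labels) >= len(labels) else 0
    return lambda example: majority


if __name__ == "__main__":
    rows = [[0], [1], [0], [1], [1]]
    labels = [0, 1, 0, 1, 1]
    trees = fit_bagged(rows, labels, n_trees=10, fit_tree=fit_majority)
    print(predict_probability(trees, [1]))
```

Because sampling is done with replacement, a bootstrap sample usually repeats some rows and leaves others out, so the trees differ from one another even though they are built by the same procedure.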

Screenshots

Screenshot

An example result of binarydecisiontree.ipynb (implementation of Binary Decision Tree)

Dependencies

  • Python 3.x
  • graphviz 9.0.0 (required only for visualisation)
  • dsplot 0.9.0 (required only for visualisation)

Instructions

Directions to Install

$ git clone https://github.com/AngadBasandrai/decision-tree-python.git

Directions to Run

  • Open .ipynb files

Graphviz Installation (required only for visualisation)

Windows

Linux

  • Ubuntu and Debian packages

sudo apt install graphviz

  • Fedora packages or RedHat Enterprise and CentOS systems

sudo yum install graphviz

Mac

  • MacPorts

sudo port install graphviz

  • Homebrew

brew install graphviz

DSPlot installation (required only for visualisation)


NOTE: The files were made in Kaggle, so there may be some portability issues; it is recommended to import them into Kaggle

Contributors

Angad Basandrai


License


Made with ❤️ by Angad Basandrai
