Implement and test the decision tree learning algorithm.
-
Download the two datasets available in the repo. Each dataset is divided into three sets: the training set, the validation set, and the test set. The datasets are in CSV format. The first line in each file gives the attribute names. Each line after that is a training (or test) example containing a list of attribute values separated by commas. The last attribute is the class variable. Assume that all attributes take values from the domain {0, 1}.
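As a rough sketch, datasets in this format can be read with Python's standard csv module; the function name load_dataset below is illustrative and not part of the assignment.

    import csv

    def load_dataset(path):
        """Read a CSV dataset and return (attribute_names, examples)."""
        with open(path, newline="") as f:
            rows = list(csv.reader(f))
        attributes = rows[0]  # the first line holds the attribute names
        # Every remaining line is an example; the last column is the class
        # variable and all values are assumed to be 0 or 1.
        examples = [[int(v) for v in row] for row in rows[1:] if row]
        return attributes, examples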
-
Implemented the decision tree learning algorithm. The main step in decision tree learning is choosing the next attribute to split on. The following two heuristics for selecting the next attribute are implemented:
- Information gain heuristic.
- Variance impurity heuristic, described below.

Let K denote the number of examples in the training set, K0 the number of training examples that have class = 0, and K1 the number that have class = 1. The variance impurity of the training set S is defined as:

    VI(S) = (K0 / K) * (K1 / K)

Notice that the impurity is 0 when the data is pure. The gain for this impurity is defined as usual:

    Gain(S, X) = VI(S) - sum over x of Pr(x) * VI(Sx)

where X is an attribute, Sx denotes the set of training examples that have X = x, and Pr(x) is the fraction of the training examples that have X = x (i.e., the number of training examples that have X = x divided by the number of training examples in S).
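As a minimal sketch of both heuristics (assuming examples are lists of 0/1 integers with the class value in the last position, as produced by the loader above; the helper names are illustrative):

    import math

    def entropy(examples):
        # Information gain heuristic: entropy of the class variable.
        k = len(examples)
        if k == 0:
            return 0.0
        k1 = sum(ex[-1] for ex in examples)
        k0 = k - k1
        result = 0.0
        for count in (k0, k1):
            if count:
                p = count / k
                result -= p * math.log2(p)
        return result

    def variance_impurity(examples):
        # Variance impurity: VI(S) = (K0 / K) * (K1 / K).
        k = len(examples)
        if k == 0:
            return 0.0
        k1 = sum(ex[-1] for ex in examples)
        k0 = k - k1
        return (k0 / k) * (k1 / k)

    def gain(examples, attr_index, impurity):
        # Gain(S, X) = impurity(S) - sum over x of Pr(x) * impurity(Sx).
        result = impurity(examples)
        for value in (0, 1):
            subset = [ex for ex in examples if ex[attr_index] == value]
            if subset:
                result -= (len(subset) / len(examples)) * impurity(subset)
        return result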
Implemented a function to print the decision tree to standard output, using the following format.
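For illustration, a printed tree in this format might look like the one below (a reconstruction consistent with the description that follows, not the original example):

    wesley = 0 :
    | honor = 0 :
    | | barclay = 0 : 1
    | | barclay = 1 : 0
    | honor = 1 : 0
    wesley = 1 : 0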
According to this tree, if wesley = 0 and honor = 0 and barclay = 0, then the class value of the corresponding instance should be 1. In other words, the value appearing before a colon is an attribute value, and the value appearing after a colon is a class value.
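A minimal sketch of such a printing routine, assuming leaves are stored as plain integers (the class value) and internal nodes carry an attribute name plus one child per attribute value; these field names are illustrative:

    def print_tree(node, depth=0):
        # "| " repeated once per level indents the subtree, matching the format above.
        for value, child in ((0, node.left), (1, node.right)):
            prefix = "| " * depth + f"{node.attribute} = {value} :"
            if isinstance(child, int):
                print(prefix, child)  # leaf: the class value appears after the colon
            else:
                print(prefix)
                print_tree(child, depth + 1)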
a. Place the file DecisionTree.py in a directory.
b. Use the command below to run the script:
python DecisionTree.py
c. The script will then prompt for its parameters. Provide them in the following format:
<Training dataset path> <Validation dataset path> <Test dataset path> <Print tree? yes/no> <Heuristic? h1/h2>
Example:
D:\data_TEMP\training_set.csv D:\data_TEMP\validation_set.csv D:\data_TEMP\test_set.csv yes h1
d. That's it! The output will show the accuracies on the training, validation, and test data, along with the decision tree if printing was requested.
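For reference, a rough sketch of how the script might read and split these parameters; the actual DecisionTree.py may handle input differently.

    if __name__ == "__main__":
        # All five parameters are expected on one line, as in the example above.
        params = input("Enter parameters: ").split()
        training_path, validation_path, test_path, print_flag, heuristic = params
        print_requested = print_flag.lower() == "yes"  # "yes" or "no"
        use_info_gain = heuristic.lower() == "h1"      # "h1" (info gain) or "h2" (variance impurity)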