
DECISION TREE


WHAT IS A DECISION TREE

A decision tree can be used to visually and explicitly represent decisions and decision making. As the name suggests, it uses a tree-like model of decisions. Tree-based methods cover a wide area of machine learning, including both classification and regression.

Hunt's Algorithm for Decision Tree Induction


Let Dt be the set of training records that reach a node t. The tree is built by applying the following steps recursively:

  • If Dt contains records that all belong to the same class yt, then t is a leaf node labeled as yt.
  • If Dt is an empty set, then t is a leaf node labeled by the default class, yd.
  • If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets, and apply the procedure to each subset.
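Below is a minimal sketch of this recursion in Python. Binary splits on numeric attributes and Gini impurity (defined under SPLITTING CRITERIA) as the attribute test are illustrative assumptions; Hunt's algorithm itself leaves the choice of attribute test open.

```python
import numpy as np

def gini(y):
    # Gini impurity of a label vector: 1 - sum_i p_i^2.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def hunt(X, y, default_class):
    # Dt is empty -> leaf labeled with the default class yd.
    if len(y) == 0:
        return {"leaf": default_class}
    # All records in Dt share one class yt -> leaf labeled yt.
    if len(np.unique(y)) == 1:
        return {"leaf": y[0]}
    # Mixed classes -> pick an attribute test and recurse on the subsets.
    best = None
    for attr in range(X.shape[1]):
        for threshold in np.unique(X[:, attr]):
            mask = X[:, attr] <= threshold
            if mask.all() or not mask.any():
                continue  # split leaves one side empty; skip it
            score = (mask.sum() * gini(y[mask])
                     + (~mask).sum() * gini(y[~mask])) / len(y)
            if best is None or score < best[0]:
                best = (score, attr, threshold, mask)
    if best is None:
        # No attribute separates the records; fall back to the majority class.
        values, counts = np.unique(y, return_counts=True)
        return {"leaf": values[np.argmax(counts)]}
    _, attr, threshold, mask = best
    return {"attr": attr, "threshold": threshold,
            "left": hunt(X[mask], y[mask], default_class),
            "right": hunt(X[~mask], y[~mask], default_class)}
```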

SPLITTING CRITERIA

  • GINI IMPURITY
    Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it was randomly labeled according to the distribution of labels in the subset.

  • ENTROPY
    Entropy measures the impurity of the class distribution at a node: it is zero when all records belong to a single class and maximal when the classes are equally likely.

Source: https://en.wikipedia.org/wiki/Decision_tree_learning
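For class proportions p_i at a node, Gini impurity is 1 - Σ p_i^2 and entropy is -Σ p_i log2(p_i). A minimal sketch computing both (NumPy assumed available):

```python
import numpy as np

def gini(y):
    # 1 - sum_i p_i^2 over the class proportions p_i at the node.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(y):
    # -sum_i p_i * log2(p_i): 0 for a pure node, log2(k) for k equally likely classes.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(gini(["a", "a", "b", "b"]))     # 0.5 (a 50/50 two-class node is maximally impure)
print(entropy(["a", "a", "b", "b"]))  # 1.0
```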

OVERFITTING IN DECISION TREES

[Figure: illustration of overfitting. Source: https://www.semanticscholar.org/paper/Image-Classification-and-Text-Extraction-using-Deepa-Lalwani/6a4095ad8df6bbcc5cdf8f586ca2b420381df5ce]

  • Generally, the data is split into a training set and a testing set before model training.
  • The model is trained on the training data, and its accuracy is then estimated on the held-out testing set.
  • A very common complication that arises while training the model is called overfitting.
  • Overfitting refers to a model that has learned the training data so well that it fails to perform accurately on the validation or test set.
  • In effect, the model learns the noise and fine details of the training data, which negatively impacts its performance on new (test) data.
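A minimal sketch of the symptom using scikit-learn (an assumed dependency; any decision-tree library would show the same effect): on noisy data, an unconstrained tree fits the training set almost perfectly yet scores worse on the held-out test set than a depth-limited tree.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, deliberately noisy data (10% of labels flipped).
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):  # None -> grow until every leaf is pure (overfits)
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"max_depth={depth}: train accuracy={tree.score(X_train, y_train):.2f}, "
          f"test accuracy={tree.score(X_test, y_test):.2f}")
```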

TREE PRUNING

  • A decision tree is pruned to reduce overfitting and improve generalization to new data.
  • Pruning reduces the size of the tree by removing parts that provide little power to classify instances.
  • A simple pre-pruning strategy: at each stage of splitting the tree, check the cross-validation error, and stop growing once the error no longer decreases significantly. An alternative is post-pruning: grow the full tree, then cut back branches that do not pay for their complexity, as sketched below.
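One widely used form of post-pruning is cost-complexity pruning, which scikit-learn (an assumed dependency) exposes through the `ccp_alpha` parameter: compute the pruning path on the training data, then keep the pruned tree that scores best on a validation split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Effective alphas at which the fully grown tree would be pruned back.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)

best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    tree.fit(X_train, y_train)
    score = tree.score(X_val, y_val)  # larger alpha -> smaller, simpler tree
    if score > best_score:
        best_alpha, best_score = alpha, score
print(f"best ccp_alpha={best_alpha:.4f}, validation accuracy={best_score:.2f}")
```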