# Decision Trees in Python
### Decision Tree Algorithm
A decision tree is a flowchart-like tree structure in which each internal node represents a feature (or attribute), each branch represents a decision rule, and each leaf node represents an outcome. The topmost node is referred to as the *Root* node. The tree learns to partition the data on the basis of attribute values.
<br>
<br> Partitions are made recursively, and the resulting structure helps with decision making. Its closest relative is a generic flowchart, which mimics the way people reason through a decision step by step.
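
To make the node, branch, and leaf vocabulary concrete, here is a minimal sketch of a hand-written tree stored as nested dictionaries. The feature names (`outlook`, `windy`), the outcomes, and the `predict` helper are all made up for illustration; they are not part of any library.

```python
# A hand-written tree: internal nodes test a feature, branches are the
# possible answers, and leaves (plain strings) are the outcomes.
tree = {
    "feature": "outlook",  # root node
    "branches": {
        "sunny": {
            "feature": "windy",
            "branches": {"yes": "stay in", "no": "play"},
        },
        "rainy": "stay in",
    },
}


def predict(node, sample):
    """Walk from the root to a leaf, following the branch that matches the sample."""
    while isinstance(node, dict):
        node = node["branches"][sample[node["feature"]]]
    return node


print(predict(tree, {"outlook": "sunny", "windy": "no"}))  # play
```

Reading a prediction off the tree is just a walk from the root to a leaf, which is why the structure is easy to explain to non-specialists.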

### How Does it Work?
Here is the basic idea behind building a decision tree:
1. Select the best attribute using Attribute Selection Measures to split the data records.
2. Make that attribute a decision node and break the dataset into smaller subsets.
3. Build the tree by repeating this process recursively for each child until one of the following conditions is met:
    * All the tuples belong to the same class
    * There are no more remaining attributes
    * There are no more instances
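
The steps above can be sketched in plain Python. This is a simplified illustration, not a production implementation: it assumes categorical attributes stored as dictionaries, uses weighted entropy as the selection measure, and all names (`build_tree`, `best_attribute`, the toy data) are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Step 1: pick the attribute whose split leaves the lowest weighted entropy."""
    def weighted_entropy(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return min(attributes, key=weighted_entropy)


def build_tree(rows, labels, attributes):
    # Stopping condition: no instances left.
    if not labels:
        return None
    # Stopping condition: every tuple has the same class, so return a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # Stopping condition: no attributes left, so return the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    # Step 2: make the chosen attribute a decision node.
    node = {"attribute": attr, "children": {}}
    # Step 3: recurse into the subset of records for each attribute value.
    for value in sorted({row[attr] for row in rows}):
        subset = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
        node["children"][value] = build_tree(
            [r for r, _ in subset], [l for _, l in subset], remaining
        )
    return node


# Toy data, invented for this example.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
toy_tree = build_tree(rows, labels, ["outlook", "windy"])
print(toy_tree)
```

On this toy data `outlook` separates the classes perfectly, so the sketch stops after a single split and never needs `windy`.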

### Attribute Selection Measures
An Attribute Selection Measure (ASM) is a heuristic for selecting the splitting criterion that partitions the data in the best possible manner. It is also known as a 'splitting rule' because it helps determine breakpoints for the tuples at a given node. An ASM ranks each feature (or attribute) by how well it explains the given dataset. The attribute with the best score (for example, the highest information gain or the lowest Gini index) is selected as the splitting attribute. Popular measures are Information Gain, Gain Ratio, and Gini Index.

#### Information Gain
Shannon introduced the concept of entropy, which measures the impurity of the input set. In physics and mathematics, entropy refers to the randomness or impurity of a system. In information theory, it refers to the impurity in a group of examples. Information Gain is the decrease in entropy: it is the difference between the entropy before the split and the weighted average entropy after the split on a given attribute's values. The **ID3 (Iterative Dichotomiser 3)** decision tree algorithm uses Information Gain.

$$ Gain(A) = Info(D) - Info_{A}(D) $$
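
As a rough illustration of the formula, the sketch below assumes the usual definitions: `Info(D)` is the Shannon entropy of the class labels, and `Info_A(D)` is the entropy of each partition weighted by its share of the records. The function names and the toy columns are invented for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Info(D): Shannon entropy (in bits) of the class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Gain(A) = Info(D) - Info_A(D) for one attribute column."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Info_A(D): entropy of each partition, weighted by partition size.
    info_a = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - info_a


# Toy columns, invented for this example.
labels = ["yes", "yes", "no", "no"]
outlook = ["sunny", "sunny", "rainy", "rainy"]  # separates the classes perfectly
windy = ["no", "yes", "no", "yes"]              # tells us nothing about the class

print(information_gain(outlook, labels))  # 1.0
print(information_gain(windy, labels))    # 0.0
```

An attribute that removes all uncertainty gains the full entropy of the labels (1 bit here), while an uninformative attribute gains nothing.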

#### Gain Ratio
Information Gain is biased toward attributes that have many outcomes. This means IG prefers attributes with a *large number* of distinct values. For instance, an attribute that acts as a unique identifier, such as customer_ID, splits the data into single-record partitions. Every partition is pure, so the entropy after the split is zero. This maximizes information gain but creates a useless partitioning.
<br>
<br>
**C4.5**, an improvement of the **ID3** algorithm, uses an extension to information gain known as the *Gain Ratio*. Gain Ratio handles the issue of bias by normalizing the information gain using split info.

$$ GainRatio(A) = \frac{Gain(A)}{SplitInfo_{A}(D)} $$

The attribute with the highest gain ratio is chosen as the splitting attribute.
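
The following sketch shows how the normalization penalizes an identifier-like attribute. It assumes the usual definition of split info as the entropy of the attribute's own value distribution. The helper names and the toy columns are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of any list of hashable items."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def information_gain(values, labels):
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    info_a = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - info_a


def gain_ratio(values, labels):
    """GainRatio(A) = Gain(A) / SplitInfo_A(D)."""
    # SplitInfo_A(D) is the entropy of the attribute values themselves.
    split_info = entropy(values)
    if split_info == 0:  # attribute has a single value, so it cannot split
        return 0.0
    return information_gain(values, labels) / split_info


# Toy columns, invented for this example.
labels = ["yes", "yes", "no", "no"]
customer_id = [101, 102, 103, 104]              # unique per record
outlook = ["sunny", "sunny", "rainy", "rainy"]

print(information_gain(customer_id, labels), information_gain(outlook, labels))  # 1.0 1.0
print(gain_ratio(customer_id, labels), gain_ratio(outlook, labels))              # 0.5 1.0
```

Both attributes have the same information gain, but the identifier's large split info halves its gain ratio, so `outlook` is preferred.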

#### Gini Index
Another decision tree algorithm, **CART (Classification and Regression Trees)**, uses the Gini method to create split points. The Gini Index considers a binary split for each attribute, and a split is scored by the weighted sum of the impurity of each partition. If a binary split on attribute *A* partitions data *D* into *D1* and *D2*, the Gini Index of *D* for that split is:
<br> 

$$ Gini_{A}(D) = \frac{|D_{1}|}{|D|}Gini(D_{1})+\frac{|D_{2}|}{|D|} Gini(D_{2}) $$
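
A small sketch of this weighted sum follows. It assumes the usual definition of Gini impurity, one minus the sum of squared class proportions. The threshold search over midpoints is one simple way to generate candidate binary splits for a numeric attribute, not necessarily how any particular library does it. All names and data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def gini_split(left, right):
    """Gini index of a binary split: impurity of each side weighted by its size."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)


def best_threshold(values, labels):
    """Try the midpoint between each pair of adjacent distinct values."""
    distinct = sorted(set(values))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = gini_split(left, right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best  # (threshold, gini index), or None if there is only one value


# Toy data, invented for this example.
ages = [22, 25, 30, 35, 40, 50]
bought = ["no", "no", "no", "yes", "yes", "yes"]

print(best_threshold(ages, bought))  # (32.5, 0.0)
```

The split with the lowest Gini index wins. Here a threshold of 32.5 yields two pure partitions, so the index is zero.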

<br>
<br>