# Decision Trees in Python
### Decision Tree Algorithm
A decision tree is a flowchart-like tree structure in which each internal node represents a feature (or attribute), each branch represents a decision rule, and each leaf node represents an outcome. The topmost node is referred to as the *Root* node, and it learns to partition the data on the basis of attribute values.
<br>
<br> Partitions are made recursively, and this structure helps with decision making. Its closest relative is a generic flowchart, which essentially mimics human-level thinking.

### How Does it Work?
Here is the basic idea behind the decision tree algorithm:
1. Select the best attribute using Attribute Selection Measures (ASM) to split the data records.
2. Make that attribute a decision node and break the dataset into smaller subsets.
3. Build the tree by repeating this process recursively for each child until one of these conditions is met:
    * All the tuples belong to the same class
    * There are no more remaining attributes
    * There are no more instances
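
The three steps above can be sketched in a few lines of plain Python. This is only an illustration, not how Scikit-learn builds its trees: the function name `build_tree`, the toy weather data, and the choice of Gini impurity as the selection measure are all assumptions made for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, attributes):
    """Recursively build a tree from dict-shaped rows (illustrative sketch)."""
    # Stop: no instances (guard only; the loop below never creates empty subsets).
    if not rows:
        return None
    # Stop: all tuples share one class, or no attributes remain -> majority-class leaf.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Step 1: pick the attribute whose split has the lowest weighted impurity.
    def weighted_impurity(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())

    best = min(attributes, key=weighted_impurity)

    # Step 2: make it a decision node and split the data into subsets.
    node = {"attribute": best, "children": {}}
    remaining = [a for a in attributes if a != best]
    for value in sorted({row[best] for row in rows}):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in subset]
        sub_labels = [l for _, l in subset]
        # Step 3: repeat recursively for each child.
        node["children"][value] = build_tree(sub_rows, sub_labels, remaining)
    return node

# Hypothetical toy data: four records, two categorical attributes.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "play", "stay"]
tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
```

In this toy run both attributes tie on impurity, so the first one listed (`outlook`) becomes the root, and only the `rainy` branch needs a further split on `windy`.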

### Attribute Selection Measures
An attribute selection measure (ASM) is a heuristic for selecting the splitting criterion that partitions the data in the best possible manner. It is also known as a 'splitting rule' because it helps us determine breakpoints for tuples at a given node. An ASM ranks each feature (or attribute) by how well it explains the given dataset. The attribute with the best score (the highest value for Information Gain and Gain Ratio, the lowest for the Gini Index) is selected as the splitting attribute. Popular measures are Information Gain, Gain Ratio and Gini Index.

#### Information Gain
Shannon introduced the concept of entropy to information theory, where it measures the impurity of the input set. In physics and mathematics, entropy refers to the randomness or impurity of a system. In information theory, it refers to the impurity in a group of examples. Information Gain is the decrease in entropy: it computes the difference between the entropy before the split and the weighted average entropy after the split based on the given attribute values. The **Iterative Dichotomiser 3 (ID3)** decision tree algorithm uses the IG method.

$$ Gain(A) = Info(D) - Info_{A}(D) $$
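
As a rough illustration of the formula, the sketch below computes $Info(D)$ as the Shannon entropy $-\sum_i p_i \log_2 p_i$ of the class labels and $Info_{A}(D)$ as the size-weighted entropy of the partitions produced by attribute $A$. The function names and the four-record toy data are assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D): Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain(A) = Info(D) - Info_A(D) for one categorical attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Info_A(D): entropy of each partition, weighted by partition size.
    info_a = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - info_a

labels = ["yes", "yes", "no", "no"]
perfect = ["a", "a", "b", "b"]   # separates the classes completely
useless = ["a", "b", "a", "b"]   # tells us nothing about the class

print(information_gain(perfect, labels))  # 1.0
print(information_gain(useless, labels))  # 0.0
```

An attribute that separates the classes perfectly removes the full bit of uncertainty, while one that leaves each partition as mixed as the original set gains nothing.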

#### Gain Ratio
Information Gain is biased toward attributes that have many outcomes, meaning it prefers attributes with a *large number* of distinct values. For instance, an attribute that is a unique identifier, such as customer_ID, splits the data into pure single-record partitions, so the entropy after the split is zero. This maximizes information gain but creates a useless partitioning.
<br>
<br>
**C4.5**, an improvement of the **ID3** algorithm, uses an extension to information gain known as the *Gain Ratio*. Gain Ratio handles the issue of bias by normalizing the information gain using split info.

$$ GainRatio(A) = \frac{Gain(A)}{SplitInfo_{A}(D)} $$

The attribute with the highest gain ratio is chosen as the splitting attribute.
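
The customer_ID problem can be reproduced with a small sketch. Here $SplitInfo_{A}(D)$ is taken to be the entropy of the partition sizes, $-\sum_j \frac{|D_j|}{|D|}\log_2\frac{|D_j|}{|D|}$; the function names, the `plan` attribute and the eight-record dataset are hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of the value distribution in a list."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())

def gain(values, labels):
    """Information gain of splitting the labels by a categorical attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def gain_ratio(values, labels):
    """GainRatio(A) = Gain(A) / SplitInfo_A(D)."""
    split_info = entropy(values)  # entropy of the partition sizes
    return gain(values, labels) / split_info if split_info > 0 else 0.0

labels      = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
customer_id = [1, 2, 3, 4, 5, 6, 7, 8]                  # unique per record
plan        = ["a", "a", "b", "b", "a", "b", "a", "b"]  # two meaningful groups

print(gain(customer_id, labels), gain(plan, labels))  # 1.0 1.0
print(gain_ratio(customer_id, labels))                # about 0.333
print(gain_ratio(plan, labels))                       # 1.0
```

Both attributes reach the maximum information gain, but dividing by the split info (3 bits for eight singleton partitions, 1 bit for two equal halves) penalizes the identifier and favors `plan`.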

#### Gini Index
Another decision tree algorithm, **CART (Classification and Regression Tree)**, uses the Gini method to create split points. The Gini Index considers a binary split for each attribute. You can compute a weighted sum of the impurity of each partition. If a binary split on attribute *A* partitions data *D* into *D1* and *D2*, the Gini Index of *D* given that split is:
<br> 

$$ Gini_{A}(D) = \frac{|D_{1}|}{|D|}Gini(D_{1})+\frac{|D_{2}|}{|D|} Gini(D_{2}) $$

<br>
<br>
For a discrete-valued attribute, the subset that gives the minimum Gini Index for that attribute is selected as the splitting subset. For a continuous-valued attribute, the strategy is to consider the midpoint between each pair of adjacent sorted values as a possible split point, and the point with the smallest Gini Index is chosen as the splitting point.
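
The continuous-valued case can be sketched as follows, assuming $Gini(D) = 1 - \sum_i p_i^2$ for the impurity of a single partition. The helper names are hypothetical, and the four glucose readings with their labels are just a toy sample, not a claim about where a real tree would split.

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(left, right):
    """Gini_A(D): size-weighted impurity of a binary split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_threshold(values, labels):
    """Try the midpoint between each pair of adjacent distinct values."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = None  # stays None if there is only one distinct value
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        score = gini_split(left, right)
        if best is None or score < best[1]:
            best = (t, score)
    return best

glucose = [148, 85, 183, 89]
label = [1, 0, 1, 0]
print(best_threshold(glucose, label))  # (118.5, 0.0)
```

The candidate thresholds are 87, 118.5 and 165.5; only 118.5 produces two pure partitions, so it has the smallest Gini Index.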

### Decision Tree Classifier Building in Scikit-learn

In [1]:
#Import Libraries
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Import metrics module for accuracy calculation
from sklearn import metrics

In [2]:
# Load our data (Pima Indian Diabetes Data)
col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']
# load dataset
pima = pd.read_csv("diabetes.csv", header=None, names=col_names)

In [3]:
pima.head()

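| | pregnant | glucose | bp | skin | insulin | bmi | pedigree | age | label |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
| 1 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 2 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 3 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 4 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |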


#### Feature Selection
Here, we need to divide the given columns into two types of variables:
1. Dependent (or target variable)
2. Independent (or feature variables)

In [4]:
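# Drop the first row (index 0), which holds the CSV's own header names,
# then convert the remaining string values to numbers
pima = pima.drop(index=pima.index[0]).apply(pd.to_numeric)
pima.head(3)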

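| | pregnant | glucose | bp | skin | insulin | bmi | pedigree | age | label |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 2 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 3 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |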


In [5]:
# Split dataset into features and target variable
feature_cols = ['pregnant', 'insulin', 'bmi', 'age','glucose','bp','pedigree']
X = pima[feature_cols] # Features
y = pima.label # Target variable

#### Splitting Data
To understand model performance, dividing the dataset into training and test sets is recommended. We can use the function train_test_split(), passing three parameters: the features, the target, and the test set size.

In [6]:
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1) # 70% training and 30% test
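
To see what these parameters do without needing the CSV file, here is a small sketch on made-up arrays (the `X_demo` and `y_demo` names and their values are placeholders). It shows how `test_size` controls the split sizes and how a fixed `random_state` makes the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 2 features each, and a binary target.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1] * 5)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)
print(X_tr.shape, X_te.shape)  # (7, 2) (3, 2)

# Reusing the same random_state reproduces the same split.
_, X_te_again, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)
print(np.array_equal(X_te, X_te_again))  # True
```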

#### Building the decision tree model

In [7]:
# Create Decision Tree classifier object
clf = DecisionTreeClassifier()

# Train Decision Tree classifier
clf = clf.fit(X_train,y_train)

# Predict the response for test dataset
y_pred = clf.predict(X_test)

#### Evaluate the Model
Let's estimate how accurately the classifier can predict whether a patient has diabetes. Accuracy can be computed by comparing actual test set values and predicted values.

In [8]:
# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

Accuracy: 0.683982683982684


An accuracy of about 68% is a reasonable starting point, and we may be able to improve it by fine-tuning the parameters of the decision tree algorithm.
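
As a hedged sketch of what such tuning might look like, the example below compares an unrestricted tree with one limited by `max_depth`, using `criterion="entropy"` instead of the default Gini. It runs on synthetic data from `make_classification` rather than the diabetes dataset, so the printed accuracies say nothing about the results above; the specific parameter values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the Pima dataset).
X_demo, y_demo = make_classification(n_samples=500, n_features=7, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)

# Compare a fully grown tree (max_depth=None) with a shallow one.
for depth in (None, 3):
    model = DecisionTreeClassifier(
        criterion="entropy", max_depth=depth, random_state=1
    )
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    print(f"max_depth={depth}: depth={model.get_depth()}, accuracy={acc:.3f}")
```

Limiting the depth is a form of pre-pruning: a shallower tree is easier to read and often generalizes better, although whether it helps on a given dataset has to be checked empirically.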

#### Visualizing Decision Tree
You can use Scikit-learn's *export_graphviz* function to display the tree within Jupyter. For plotting, you will also need to install Graphviz (the system package that provides the `dot` executable) and the pydotplus Python package.
<br>
<br> *export_graphviz* converts the decision tree classifier into a dot file, and pydotplus converts this dot file to PNG or another format that Jupyter can display.

In [10]:
from sklearn.tree import export_graphviz
from io import StringIO  
from IPython.display import Image  
import pydotplus

dot_data = StringIO()
export_graphviz(clf, out_file=dot_data,  
                filled=True, rounded=True,
                special_characters=True,feature_names = feature_cols,class_names=['0','1'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())  
graph.write_png('diabetes.png')
Image(graph.create_png())

InvocationException: GraphViz's executables not found
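
This error means pydotplus could not find the Graphviz executables on the system path; installing the Python packages alone does not install the Graphviz binaries. If installing Graphviz is not an option, one alternative is Scikit-learn's *export_text*, which prints the tree as plain text and needs no external tools (*plot_tree* in the same module draws it with matplotlib instead). The sketch below uses synthetic data and a small stand-in classifier `demo_clf`, so the feature names are only labels attached for illustration and the rules are not those of the diabetes model.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Names borrowed from the tutorial; the synthetic columns do not
# actually correspond to these measurements.
feature_names = ['pregnant', 'insulin', 'bmi', 'age', 'glucose', 'bp', 'pedigree']

# Synthetic stand-in data and a shallow tree so the printout stays short.
X_demo, y_demo = make_classification(n_samples=200, n_features=7, random_state=1)
demo_clf = DecisionTreeClassifier(max_depth=2, random_state=1).fit(X_demo, y_demo)

# Render the fitted tree as indented text rules.
report = export_text(demo_clf, feature_names=feature_names)
print(report)
```

With the tutorial's own model, the same call would be `export_text(clf, feature_names=feature_cols)`.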