# Decision Tree

The general goal of a decision tree is to build a model that predicts the class or value of a target variable by learning decision rules inferred from prior data (the training data).

Assumptions: 
    1. At the beginning, the whole training set is considered as the root.
    2. Feature values are preferred to be categorical. 
       If the values are continuous then they are discretized prior to building the model.
    3. Records are distributed recursively on the basis of attribute values.
    4. The order in which attributes are placed as the root or internal nodes of the tree is decided using a statistical measure.

Types:
    1. ID3 - uses entropy and information gain as metrics.
        Calculate the information gain of each feature (the drop in entropy after splitting on it) and select the feature with the highest information gain.
    2. CART (Classification and Regression Trees) - uses the Gini index as the metric for classification.
        A Gini score indicates how good a split is by how mixed the classes are in the two groups created by the split.
        Calculate the score for every candidate split (each row's feature value can serve as a split point) and split the data on the one with the best score.
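
The ID3 selection step can be sketched in plain Python. This is a minimal illustration, not a full ID3 implementation: the helper names `entropy` and `information_gain` and the toy `outlook`/`windy`/`play` data are hypothetical and chosen only to show the arithmetic.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after grouping by feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: two categorical features and a binary target.
outlook = ["sunny", "sunny", "rain", "rain"]
windy = [True, False, True, False]
play = ["no", "no", "yes", "yes"]

gain_outlook = information_gain(outlook, play)  # outlook separates the classes perfectly
gain_windy = information_gain(windy, play)      # windy leaves both groups fully mixed

# ID3 would pick the feature with the highest gain as the next split.
best_feature = max([("outlook", gain_outlook), ("windy", gain_windy)], key=lambda t: t[1])[0]
print(best_feature, gain_outlook, gain_windy)
```

In this toy case `outlook` would be chosen, since splitting on it removes all uncertainty about the target while splitting on `windy` removes none.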

The Gini impurity of a node is the probability that a randomly chosen sample in the node would be incorrectly labeled if it were labeled at random according to the distribution of labels in the node. For example, in the top (root) node, there is a 44.4% chance of incorrectly classifying a data point chosen at random based on the sample labels in the node.
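
As a rough sketch of this definition, the Gini impurity is one minus the sum of squared class proportions. The 44.4% figure is what a two-class node with a 1/3 vs 2/3 split would give; the class counts below are an assumption made for illustration, and `gini_impurity` and `weighted_gini` are hypothetical helper names.

```python
from collections import Counter


def gini_impurity(labels):
    """Probability of mislabeling a random sample drawn from `labels`."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def weighted_gini(left, right):
    """Gini score of a binary split: impurity of each side weighted by its size."""
    total = len(left) + len(right)
    return len(left) / total * gini_impurity(left) + len(right) / total * gini_impurity(right)


# Assumed root node: 2 samples of class 0 and 4 samples of class 1.
root_labels = [0, 0, 1, 1, 1, 1]
root_gini = gini_impurity(root_labels)  # 1 - ((1/3)**2 + (2/3)**2) = 4/9

# A split that separates the two classes completely leaves no impurity.
pure_split_score = weighted_gini([0, 0], [1, 1, 1, 1])

print(round(root_gini, 3), pure_split_score)
```

A lower weighted score means the split produces purer groups, which is the quantity CART compares across candidate splits.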

Two approaches can be used to avoid overfitting:
    1. Pre-pruning: stop growing a branch as soon as a split fails to pass a chosen threshold.
    2. Post-pruning: grow the tree first, then use cross-validation data to test whether expanding a node actually improves the result, and remove the expansions that do not.

Hyperparameters:
    1. Minimum samples per leaf node
    2. Tree depth
    3. Number of splits per node
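
The pruning approaches and hyperparameters above map roughly onto scikit-learn arguments. The sketch below is only an illustration under stated assumptions: it uses a synthetic dataset from `make_classification` rather than the diabetes data, and the values `max_depth=3`, `min_samples_leaf=10`, and `ccp_alpha=0.01` are arbitrary. In practice `ccp_alpha` (cost-complexity pruning, one form of post-pruning) would be chosen with cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used here only so the example is self-contained.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=1)

# Unconstrained tree: grows until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=1).fit(X_demo, y_demo)

# Pre-pruning: limit depth and leaf size while the tree is being grown.
pre_pruned = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=10, random_state=1
).fit(X_demo, y_demo)

# Post-pruning: grow the full tree, then prune it back by cost-complexity.
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=1).fit(X_demo, y_demo)

for name, tree in [("full", full_tree), ("pre-pruned", pre_pruned), ("post-pruned", post_pruned)]:
    print(name, "depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())
```

Comparing depth and leaf counts gives a quick sense of how strongly each setting constrains the tree; the effect on accuracy still has to be checked on held-out data.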

Time Complexity (n = number of training samples, p = number of features):
    1. During training: O(n^2 p)
    2. During prediction: O(p)

In [None]:
#Import libraries
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

In [None]:
#Load Data
pima = pd.read_csv('D:/Datasets/prima-indian-diabetes.csv')
pima.head()

In [None]:
pima.columns

In [None]:
#Assign Features and target 
feature_cols = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
       'BMI', 'DiabetesPedigreeFunction', 'Age']
X = pima[feature_cols]
# Select the target as a 1-D Series, the shape scikit-learn expects for a single label column
Y = pima['Outcome']

In [None]:
# import pandas_profiling 
# pandas_profiling.ProfileReport(pima)

In [None]:
#Split into train and test
# Hold out 20% of the rows for testing; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=1)

In [None]:
# Define the decision tree
clf = DecisionTreeClassifier()
# Train the decision tree classifier
clf = clf.fit(X_train, y_train)

#Predict the response for test dataset
y_pred = clf.predict(X_test)

In [None]:
# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

In [None]:
# !pip install graphviz
# !pip install pydotplus

In [None]:
#Visualize the decision tree
from sklearn.tree import export_graphviz
from io import StringIO  # sklearn.externals.six was removed from recent scikit-learn releases
from IPython.display import Image  
import pydotplus

dot_data = StringIO()
export_graphviz(clf, out_file=dot_data,  
                filled=True, rounded=True,
                special_characters=True,feature_names = feature_cols,class_names=['0','1'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())  
# graph.write_png('diabetes.png')
Image(graph.create_png())

Pros: 
    1. Decision trees are easy to interpret and visualize.
    2. They can easily capture non-linear patterns.
    3. They require less data preprocessing from the user; for example, there is no need to normalize columns.
    4. They can be used for feature engineering, such as predicting missing values, and are suitable for variable selection.
    5. Decision trees make no assumptions about the data distribution because the algorithm is non-parametric.

Cons: 
    1. Sensitive to noisy data; a tree can overfit the noise.
    2. A small variation in the data can result in a different decision tree (high variance). This can be reduced by ensemble methods such as bagging and boosting.
    3. Decision trees are biased by imbalanced datasets, so it is recommended to balance the dataset before building the tree.