### Classification trees are used to assign samples to categories, e.g. male/female.

### We split the data, starting from the parent node, into left and right child nodes, and keep splitting until every node is pure (all samples at a node belong to one class).

### If every node has only a left (L) and a right (R) child, the tree is a binary tree.

#### We split the data so as to maximize Information Gain (IG). IG is the impurity of the parent node minus the weighted sum of the impurities of the L and R child nodes, where each child is weighted by its share of the parent's samples. [The smaller the children's impurities, the larger the IG.]

An example of splitting a data set, shown on a scatter plot:
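As a rough illustration of the IG calculation, the sketch below uses entropy as the impurity measure and weights each child by its share of the parent's samples. The function names `entropy` and `information_gain` and the toy labels are made up for this example; they are not part of scikit-learn.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of the class labels at one node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))

def information_gain(parent, left, right, impurity=entropy):
    """Parent impurity minus the size-weighted impurities of the two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted_children

# Toy parent node: four samples of class 0 and four of class 1.
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# A split that separates the classes completely.
perfect = information_gain(parent, parent[:4], parent[4:])

# A split that leaves both children as mixed as the parent.
poor = information_gain(parent, parent[[0, 1, 4, 5]], parent[[2, 3, 6, 7]])

print(perfect, poor)  # 1.0 0.0
```

The split with pure children recovers all of the parent's entropy as gain, while the split that leaves both children mixed gains nothing.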

![title](splits_2.png)

### Decision tree of the example above:

![title](splits_1.png)

## Random Forest and Gradient Boosting algorithms are built on Decision Trees.

Decision Trees are relatively simple and are often not used on their own, but they are very useful in the combinations above. Applications include facial recognition, Wii games, etc.

##### When an algorithm is not based on Euclidean distance (a decision tree, for example), we do not need to scale our data with StandardScaler.
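One way to check this claim is to fit the same tree on raw and on standardized features and compare the predictions. The sketch below uses synthetic data (an assumption, not the notebook's CSV) with two features on very different scales. Because a tree only compares each feature against a threshold, the two models are expected to agree on essentially every test point.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): two features on very different
# scales, loosely resembling an age and a salary column.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(18, 60, 400), rng.uniform(15_000, 150_000, 400)])
y = ((X[:, 0] > 40) | (X[:, 1] > 90_000)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit the scaler on the training split only, then fit one tree per representation.
scaler = StandardScaler().fit(X_train)
raw_tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X_train, y_train)
scaled_tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(
    scaler.transform(X_train), y_train
)

# Fraction of test points on which the two trees make the same prediction.
agreement = np.mean(raw_tree.predict(X_test) == scaled_tree.predict(scaler.transform(X_test)))
print(f"share of identical test predictions: {agreement:.2f}")
```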

In [1]:
import pandas as pd
import numpy as np

In [3]:
file = pd.read_csv("Social_Network_Ads.csv")
file.head(4)

|   | User ID | Gender | Age | EstimatedSalary | Purchased |
|---|---|---|---|---|---|
| 0 | 15624510 | Male | 19 | 19000 | 0 |
| 1 | 15810944 | Male | 35 | 20000 | 0 |
| 2 | 15668575 | Female | 26 | 43000 | 0 |
| 3 | 15603246 | Female | 27 | 57000 | 0 |


In [12]:
# Features: Age and EstimatedSalary (columns 2 and 3); target: Purchased (column 4)
x = file.iloc[:, [2, 3]].values
y = file.iloc[:, 4].values


In [19]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.25, random_state=0)

In [22]:
from sklearn.tree import DecisionTreeClassifier

# Split on entropy instead of the default Gini impurity; fix the seed for reproducibility
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)
pred = classifier.predict(x_test)

##### `criterion` is a parameter of DecisionTreeClassifier() that selects the measure of split quality. It defaults to "gini", but here we use "entropy". The objective is for the child nodes to be homogeneous: the more homogeneous a node is, the lower its entropy, and entropy = 0 means every sample in that node belongs to the same class.
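To make the two criteria concrete, the sketch below computes both impurity measures by hand for a homogeneous node and for an evenly mixed two-class node. The helper functions are hypothetical and written for this example only; scikit-learn computes these internally.

```python
import numpy as np

def class_proportions(labels):
    """Fraction of samples in each class present at the node."""
    _, counts = np.unique(labels, return_counts=True)
    return counts / counts.sum()

def entropy(labels):
    """Shannon entropy in bits: sum of p * log2(1 / p) over the classes."""
    p = class_proportions(labels)
    return float(np.sum(p * np.log2(1.0 / p)))

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    p = class_proportions(labels)
    return float(1.0 - np.sum(p ** 2))

pure = [1, 1, 1, 1]   # homogeneous node
mixed = [0, 0, 1, 1]  # evenly mixed two-class node

print(entropy(pure), entropy(mixed))  # 0.0 1.0
print(gini(pure), gini(mixed))        # 0.0 0.5
```

Both measures are zero for a homogeneous node and largest for an even mix, so they usually rank candidate splits in a similar way.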

In [23]:
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test,pred)
cm

array([[62,  6],
       [ 3, 29]], dtype=int64)
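The matrix can be summarized with a few standard metrics. The sketch below copies the numbers from the output above and assumes scikit-learn's layout, in which rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
cm = np.array([[62, 6],
               [3, 29]])

# For a binary problem, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all test samples classified correctly
precision = tp / (tp + fp)        # of the predicted purchasers, how many actually purchased
recall = tp / (tp + fn)           # of the actual purchasers, how many were found

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# accuracy=0.91 precision=0.83 recall=0.91
```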

### Visualizing the results on colour map:

![title](graph.png)

This is an example of over-fitting to the training set: the tree carves out small regions just to catch individual red points inside the green region.
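A common way to limit this behaviour is to cap the depth of the tree. The sketch below uses synthetic noisy data from `make_classification` (an assumption, not the notebook's data set) and compares an unrestricted tree with one limited to `max_depth=3`. The unrestricted tree fits the training split perfectly, which is the usual symptom of over-fitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-feature data (assumption); flip_y assigns random labels
# to roughly 15% of the points, which acts as label noise.
X, y = make_classification(n_samples=400, n_features=2, n_informative=2,
                           n_redundant=0, flip_y=0.15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

scores = {}
for depth in (None, 3):  # None lets the tree grow until every leaf is pure
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    scores[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(f"max_depth={depth}: train={scores[depth][0]:.2f} test={scores[depth][1]:.2f}")
```

A large gap between training and test accuracy for the unrestricted tree suggests it has memorized noise rather than learned the underlying pattern.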

In [34]:
iris = pd.read_csv("iris.data", names=["sepal length cm","sepal width cm","petal length cm","petal width cm","class"])
iris.head()

|   | sepal length cm | sepal width cm | petal length cm | petal width cm | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


In [35]:
# Features: petal length and petal width (columns 2 and 3); target: class (column 4)
feature = iris.iloc[:, [2, 3]].values
target = iris.iloc[:, 4].values

In [38]:
f_train, f_test, t_train, t_test = train_test_split(feature,target,test_size=0.25, random_state=0)

In [42]:
# max_depth=4 caps the depth of the tree, which helps limit over-fitting
model = DecisionTreeClassifier(criterion="entropy", max_depth=4)
model.fit(f_train, t_train)
pred = model.predict(f_test)
cm = confusion_matrix(t_test, pred)
cm

array([[13,  0,  0],
       [ 0, 15,  1],
       [ 0,  0,  9]], dtype=int64)
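To see which splits a tree like this one learned, scikit-learn's `export_text` can print the rules as text. The sketch below is self-contained, so it loads the data with `load_iris` rather than from `iris.data` (an assumption). The targets are therefore integers 0 to 2 instead of class-name strings, and `random_state=0` is added so that the printed tree is reproducible.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris_data = load_iris()
X = iris_data.data[:, [2, 3]]  # petal length and petal width, as in the cell above
feature_names = ["petal length cm", "petal width cm"]

X_train, X_test, y_train, y_test = train_test_split(
    X, iris_data.target, test_size=0.25, random_state=0
)

tree_model = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
tree_model.fit(X_train, y_train)

# One line per node: the feature threshold tested, or the predicted class at a leaf.
rules = export_text(tree_model, feature_names=feature_names)
print(rules)
```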