# Decision-Tree Breakdown

- Classification algorithm
- Splits by the "purity" of the child nodes, i.e. it prefers splits whose child nodes each contain more of one class than of the other classes
- ...

## Keywords 

- **Impurity**
  - Degree to which classes are mixed within a node
  - Low impurity: one class has more instances than any other
  - Zero impurity: the node contains only one class
  - High impurity: the classes are equally represented in the node

## Metrics to Split

- Gini impurity, entropy, information gain, and log loss
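
To make two of these metrics concrete, here is a minimal sketch of entropy and information gain using their standard definitions (entropy as $-\sum P(i) \log_2 P(i)$, information gain as parent entropy minus the size-weighted entropy of the children). The helper names `entropy` and `information_gain` are illustrative and not taken from any library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class labels in one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = [0, 0, 1, 1]
# A split that separates the classes perfectly.
gain_perfect = information_gain(parent, [[0, 0], [1, 1]])
# A split that leaves both children as mixed as the parent.
gain_useless = information_gain(parent, [[0, 1], [0, 1]])
```

With this toy input, the perfect split gains the full 1 bit of the parent's entropy, while the uninformative split gains nothing.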

### Gini Impurity

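- Measures, for each child node, how mixed its classes are: the probability that a sample drawn at random from the node would be mislabeled if it were labeled at random according to the node's class distribution
- $Gini = 1 - \sum_i P(i)^2$
  - $i$ = class
  - $P(i)$ = the probability of samples belonging to class $i$ in a given node.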
- https://www.coursera.org/learn/the-nuts-and-bolts-of-machine-learning/supplement/zShQK/explore-decision-trees
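
The formula above can be sketched in a few lines of plain Python. The helpers `gini_impurity` and `weighted_gini` are illustrative names; weighting each child's impurity by its share of the samples is the usual way to score a candidate split, stated here as an assumption rather than something spelled out in these notes.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(P(i)^2) over the classes present in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(left, right):
    """Score a split: child impurities weighted by each child's share of samples."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


pure = gini_impurity([1, 1, 1, 1])           # only one class: zero impurity
mixed = gini_impurity([0, 0, 1, 1])          # two classes, equally represented
split = weighted_gini([0, 0, 0, 1], [1, 1])  # one mixed child, one pure child
```

A lower weighted Gini means purer children, so a tree builder would prefer the candidate split with the smallest score.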


## Intuitions & Steps

## Caveats

- Advantages:
  - Require relatively few pre-processing steps
  - Can work easily with all types of variables (continuous, categorical, discrete)
  - Do not require normalization or scaling
  - Decisions are transparent
  - Not affected by extreme univariate values
- Disadvantages:
  - Can be computationally expensive relative to other algorithms
  - Small changes in the training data can result in a very different tree, and therefore significant changes in predictions (high variance)
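
The "no normalization or scaling" point can be illustrated with a small sketch: a threshold split depends only on the ordering of a feature's values, so min-max scaling leaves the chosen partition unchanged. `best_split` is a hypothetical single-feature helper written for this illustration, not scikit-learn's implementation, and the five values are the `Balance` and `Exited` entries from the first rows of the churn dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of the class labels in one node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (left_row_indices, weighted_gini) for the best threshold on one feature."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best = None
    for k in range(1, len(order)):
        # A threshold can only fall between two distinct feature values.
        if values[order[k - 1]] == values[order[k]]:
            continue
        left, right = order[:k], order[k:]
        score = (
            len(left) * gini([labels[i] for i in left])
            + len(right) * gini([labels[i] for i in right])
        ) / len(order)
        if best is None or score < best[1]:
            best = (frozenset(left), score)
    return best


balance = [0.0, 83807.86, 159660.8, 0.0, 125510.82]
exited = [1, 0, 1, 0, 0]

# Min-max scale the feature to [0, 1]; the ordering of the rows is preserved.
lo, hi = min(balance), max(balance)
scaled = [(v - lo) / (hi - lo) for v in balance]

raw_split = best_split(balance, exited)
scaled_split = best_split(scaled, exited)
```

Both calls select the same rows for the left child with the same impurity score; only the numeric threshold would differ between the raw and scaled feature.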

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

from sklearn.metrics import (
    ConfusionMatrixDisplay,
    confusion_matrix,
    recall_score,
    precision_score,
    f1_score,
    accuracy_score,
)

sns.set()
plt.rcParams["figure.figsize"] = (12, 4)
```

```python
# https://www.coursera.org/learn/the-nuts-and-bolts-of-machine-learning/ungradedLab/HVyMU/annotated-follow-along-guide-build-a-decision-tree
df = pd.read_csv("../z.data/Churn_Modelling.csv").drop(columns=["RowNumber"])
df.head()
```

| | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |
