<a href="https://colab.research.google.com/github/SangamSilwal/Machine-learning-Series/blob/main/Decision_Tree_28.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hyperparameters in DecisionTreeClassifier

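1. max_depth - The maximum depth of the tree. Limiting it prevents the tree from growing too deep and overfitting. If **None**, nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples.

2. min_samples_split - The minimum number of samples required to split an internal node. (Larger value -> fewer splits -> more general tree)

3. min_samples_leaf - The minimum number of samples required at a leaf node. It ensures leaves aren't too small. Higher values make the model more conservative.

4. max_leaf_nodes - Limits the number of leaf nodes. If **None**, the tree grows without this constraint.

5. max_features - The number of features to consider when looking for the best split. Options are: "sqrt", "log2", an integer, a float (a fraction of the features), or **None** (use all features). Older scikit-learn versions also accepted "auto", which has been removed in recent releases.

6. criterion - The function used to measure the quality of a split. For classification: "gini" or "entropy" (recent versions also accept "log_loss", which is equivalent to "entropy").

7. splitter - The strategy used to choose the split at each node: "best" (choose the best split) or "random" (choose the best random split).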
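To make the effect of these constraints concrete, here is a minimal sketch that fits one unconstrained tree and one constrained tree on the breast cancer dataset and compares their size. The specific values (max_depth=3, min_samples_leaf=5) and the variable names are arbitrary choices for illustration, not recommended settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Unconstrained tree: keeps splitting until leaves are pure or cannot be split further
full_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Constrained tree (illustrative values): depth capped at 3, at least 5 samples per leaf
small_tree = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=5, random_state=42
).fit(X, y)

print("full tree  -> depth:", full_tree.get_depth(), "leaves:", full_tree.get_n_leaves())
print("small tree -> depth:", small_tree.get_depth(), "leaves:", small_tree.get_n_leaves())
```

A tree of depth 3 can have at most 2^3 = 8 leaves, so the constrained tree is guaranteed to be small regardless of the data.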

We can tune the above parameters using ***GridSearchCV*** or ***RandomizedSearchCV***.
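As a rough sketch of the randomized alternative: RandomizedSearchCV samples a fixed number of parameter combinations instead of trying every one. The distributions, n_iter=20, and cv=5 below are assumptions chosen for illustration, not tuned values.

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Lists are sampled uniformly; randint(low, high) draws integers in [low, high)
param_distributions = {
    "max_depth": [2, 3, 4, 5, None],
    "min_samples_split": randint(2, 11),  # integers 2..10
    "min_samples_leaf": randint(1, 5),    # integers 1..4
    "criterion": ["gini", "entropy"],
}

random_search = RandomizedSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=20,          # number of sampled combinations (illustrative)
    cv=5,
    scoring="accuracy",
    random_state=42,    # makes the sampled combinations reproducible
)
random_search.fit(X_train, y_train)

print("Best Parameters: ", random_search.best_params_)
print("Best Score: ", random_search.best_score_)
print("test accuracy: ", random_search.score(X_test, y_test))
```

Randomized search tends to be the more practical choice when the grid is large, because its cost is set by n_iter rather than by the product of all the value lists.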

***Visualizing the root node and leaf nodes***

In [1]:
#                  [Root Node: Is it Sunny?]
#                  /                       \
#            Yes /                           \ No
#               /                             \
#      [Node: Is humidity > 70?]         [Leaf: Play Tennis]
#           /         \
#    Yes /             \ No
#       /               \
#  [Leaf: Don't Play]   [Leaf: Play]
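The hand-drawn diagram above can be compared with a real fitted tree. The sketch below is one possible way to do that: it prints a shallow tree as text with scikit-learn's export_text and counts its leaves. The choice of max_depth=2 is only there to keep the printout short.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(data.data, data.target)

# Each "|---" level is one step down from the root; lines ending in "class: ..." are leaves
print(export_text(tree, feature_names=list(data.feature_names)))

# In the fitted tree structure, node 0 is the root and leaves have no children (-1)
is_leaf = tree.tree_.children_left == -1
print("total nodes:", tree.tree_.node_count, "leaves:", int(is_leaf.sum()))
```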


In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

In [4]:
from sklearn.datasets import load_breast_cancer
X,y = load_breast_cancer(return_X_y=True)
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=42)

In [13]:
dt = DecisionTreeClassifier(random_state=42)

param_grid = {
    'max_depth': [2,3,4,5,None],
    'min_samples_split':[2,5,10],
    'min_samples_leaf':[1,2,4],
    'criterion':['gini','entropy']
}

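# Exhaustive grid search: 5 * 3 * 3 * 2 = 90 parameter combinations, each scored with 10-fold CV
from sklearn.model_selection import GridSearchCV
grid_search = GridSearchCV(estimator=dt, param_grid=param_grid, cv=10, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)

# best_score_ is the mean cross-validated accuracy of the best combination;
# score() evaluates the best estimator (refit on all of X_train) on the held-out test set
print("Best Parameters: ",grid_search.best_params_)
print("Best Score: ",grid_search.best_score_)
print("test accuracy: ",grid_search.score(X_test,y_test))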

Best Parameters:  {'criterion': 'entropy', 'max_depth': 5, 'min_samples_leaf': 2, 'min_samples_split': 10}
Best Score:  0.9419871794871796
test accuracy:  0.9590643274853801
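One way to sanity-check the tuned model is to compare its training and test accuracy: a large gap would suggest overfitting. The sketch below refits a tree with the best parameters printed above, using the same split. The variable names are illustrative, and exact numbers may vary with the scikit-learn version.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Parameters copied from the "Best Parameters" output of the grid search
best_params = {
    "criterion": "entropy",
    "max_depth": 5,
    "min_samples_leaf": 2,
    "min_samples_split": 10,
}

tuned_tree = DecisionTreeClassifier(random_state=42, **best_params).fit(X_train, y_train)
train_acc = tuned_tree.score(X_train, y_train)
test_acc = tuned_tree.score(X_test, y_test)

print("train accuracy: ", train_acc)
print("test accuracy: ", test_acc)
print("depth:", tuned_tree.get_depth(), "leaves:", tuned_tree.get_n_leaves())
```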
