# Decision Tree
* A decision tree is a supervised machine learning algorithm that is widely used for both classification and regression tasks.
* It is a tree-structured model in which internal nodes test the features of a dataset, branches represent the decision rules, and each leaf node represents an outcome.
* A decision tree has two types of nodes: decision nodes and leaf nodes. Decision nodes test a feature and split into multiple branches, whereas leaf nodes hold the output of those decisions and have no further branches.
* The main disadvantage of a decision tree is its tendency to overfit the training data.
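
The points above can be illustrated with a small sketch. It uses synthetic data from `make_classification` as a stand-in (an assumption, not the heart-disease dataset used in this notebook), and the feature names `f0`...`f4` are hypothetical. It prints the decision rules of a shallow tree, then fits an unrestricted tree to show how far training and test accuracy can drift apart.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (assumption: not the heart-disease dataset)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.75, random_state=0)

# A shallow tree: a handful of decision nodes, easy to read as rules
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_tr, y_tr)
print(export_text(shallow, feature_names=feature_names))

# An unrestricted tree keeps splitting until every leaf is pure,
# so it memorises the training set
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
train_acc = deep.score(X_tr, y_tr)
test_acc = deep.score(X_te, y_te)
print(f"unrestricted tree: depth={deep.get_depth()}, leaves={deep.get_n_leaves()}")
print(f"train accuracy={train_acc:.3f}, test accuracy={test_acc:.3f}")
```

A large gap between the two accuracies is the usual sign of overfitting; limiting `max_depth` or raising `min_samples_leaf` is one common way to reduce it.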

### Importing libraries

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report,confusion_matrix
from sklearn.model_selection import GridSearchCV

In [None]:
df=pd.read_csv("/kaggle/input/heart-disease/heart.csv")

In [None]:
df.head()

### Splitting Data

In [None]:
x=df.drop("target",axis=1)
y=df['target']

In [None]:
x_train,x_test,y_train,y_test=train_test_split(x,y,train_size=0.75,random_state=42) # fixed seed (arbitrary value) so the split is reproducible

In [None]:
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)

A decision tree is not a distance-based algorithm: each split compares a single feature against a threshold, so feature scaling is not required.
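
A quick way to see this is to rescale the features and check that the fitted tree predicts the same labels. The sketch below is a minimal illustration on synthetic data (an assumption, not the heart-disease dataset); the per-column scale factors are arbitrary powers of two, and both trees share the same `random_state` so that tie-breaking between features is identical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the heart-disease dataset)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.75, random_state=0)

# Rescale each column by a different positive constant (arbitrary choice)
scales = 2.0 ** np.arange(X.shape[1])  # 1, 2, 4, 8, 16
X_tr_scaled, X_te_scaled = X_tr * scales, X_te * scales

# Same hyperparameters and seed; only the feature scale differs
tree_raw = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(X_tr_scaled, y_tr)

pred_raw = tree_raw.predict(X_te)
pred_scaled = tree_scaled.predict(X_te_scaled)
same_predictions = bool(np.array_equal(pred_raw, pred_scaled))
print("identical predictions:", same_predictions)
```

Rescaling preserves the ordering of values within each feature, which is all a threshold split depends on. Distance-based models such as k-nearest neighbours do not share this property.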

In [None]:
dtc=DecisionTreeClassifier(criterion='gini',
                           max_depth=None,
                           min_samples_split=2,
                           min_samples_leaf=1,
                           max_features=None) # taking all default parameter values.
dtc.fit(x_train,y_train)

In [None]:
y_pred=dtc.predict(x_test)

In [None]:
print(classification_report(y_test,y_pred))

In [None]:
sns.heatmap(confusion_matrix(y_test,y_pred),annot=True,fmt='d') # fmt='d' shows the counts as integers

### Hyperparameter Tuning 

In [None]:
# Number of features to consider at every split
# ('auto' was removed from DecisionTreeClassifier in scikit-learn 1.3; it was equivalent to 'sqrt')
max_features = [None, 'sqrt']
# Maximum number of levels in tree
max_depth = [2,4]
# Minimum number of samples required to split a node
min_samples_split = [2, 5]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2]
# criterion
criteria = [ 'gini','entropy']

In [None]:
# Create the param grid
param_grid = {
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'criterion':criteria}

print(param_grid)

In [None]:
gs_cv=GridSearchCV(dtc,param_grid=param_grid,cv = 3, verbose=2, n_jobs = 4)

In [None]:
gs_cv.fit(x_train,y_train)

In [None]:
gs_cv.best_params_

In [None]:
depth=gs_cv.best_params_['max_depth']
features=gs_cv.best_params_['max_features']
leaf=gs_cv.best_params_['min_samples_leaf']
split=gs_cv.best_params_['min_samples_split']
criteria=gs_cv.best_params_['criterion']

In [None]:
dtc=DecisionTreeClassifier(max_depth= depth,
                           max_features= features,
                           min_samples_leaf= leaf,
                           min_samples_split= split,
                           criterion=criteria
                          )


dtc.fit(x_train,y_train)
y_pred=dtc.predict(x_test)
print(classification_report(y_test,y_pred))

In [None]:
sns.heatmap(confusion_matrix(y_test,y_pred),annot=True,fmt='d') # fmt='d' shows the counts as integers

### Conclusion:
* After tuning the model with GridSearchCV, the model's performance on the test set improved.
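
One way to check a conclusion like this is to score the default and the tuned tree on the same held-out split. The sketch below is an illustration under stated assumptions: it uses synthetic data rather than the heart-disease dataset, a reduced parameter grid, and plain accuracy as the metric. The numbers it prints depend on the data and the split, so they say nothing about the results in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the heart-disease dataset)
X, y = make_classification(n_samples=300, n_features=13, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.75, random_state=0)

# Baseline: default hyperparameters
baseline = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Tuned: small illustrative grid, 3-fold cross-validation on the training set
param_grid = {
    "max_depth": [2, 4],
    "min_samples_leaf": [1, 2],
    "criterion": ["gini", "entropy"],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=3)
search.fit(X_tr, y_tr)
tuned = search.best_estimator_  # already refit on all training data (refit=True)

# Compare both models on the same untouched test split
acc_baseline = accuracy_score(y_te, baseline.predict(X_te))
acc_tuned = accuracy_score(y_te, tuned.predict(X_te))
print(f"baseline test accuracy: {acc_baseline:.3f}")
print(f"tuned test accuracy:    {acc_tuned:.3f}")
print("best parameters:", search.best_params_)
```

Using `best_estimator_` avoids rebuilding the classifier by hand from `best_params_`. Because a single train/test split is noisy, repeating the comparison over several seeds gives a more reliable picture.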