# 20MAI0005_Abhishek_Kumar

# Decision Tree 

A decision tree is a map of the possible outcomes of a series of related choices. It allows an individual or organization to weigh possible actions against one another based on their costs, probabilities, and benefits.

As the name suggests, a decision tree uses a tree-like model of decisions. It can be used either to drive informal discussion or to map out an algorithm that predicts the best choice mathematically.

A decision tree typically starts with a single node, which branches into possible outcomes. Each of those outcomes leads to additional nodes, which branch off into other possibilities. This gives it a tree-like shape.
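
In the machine-learning setting used in the rest of this notebook, the same node-and-branch structure is learned from data rather than drawn by hand. The following is a minimal sketch on a made-up toy data set (the feature names `hours_studied` and `hours_slept` and all values are hypothetical, chosen only for illustration). It fits a small tree and prints its root node and branches as indented text.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: [hours_studied, hours_slept] -> passed (1) or failed (0)
X_toy = [[1, 4], [2, 8], [3, 5], [6, 7], [7, 4], [8, 8]]
y_toy = [0, 0, 0, 1, 1, 1]

# Fit an unrestricted tree; random_state only fixes tie-breaking between features
toy_tree = DecisionTreeClassifier(random_state=0)
toy_tree.fit(X_toy, y_toy)

# Show the learned structure: each "|---" line is a node or a leaf
tree_text = export_text(toy_tree, feature_names=["hours_studied", "hours_slept"])
print(tree_text)

# Classify two unseen points by following the branches from the root
toy_predictions = toy_tree.predict([[2, 6], [7, 6]])
print(toy_predictions)
```

Each printed condition corresponds to one branch, and each `class:` line is a leaf where the path ends in a prediction.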

# KNN

In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric method first developed by Evelyn Fix and Joseph Hodges in 1951,[1] and later expanded by Thomas Cover.[2] It is used for classification and regression. In both cases, the input consists of the k closest training examples in the data set. The output depends on whether k-NN is used for classification or regression:

In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.

In k-NN regression, the output is the property value for the object. This value is the average of the values of the k nearest neighbors.
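
The two modes described above can be sketched from scratch in a few lines. This is an illustrative implementation, not the one scikit-learn uses: the function name `knn_predict`, the choice of Euclidean distance, and the toy points are all assumptions made for the example.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, query, k=3, mode="classification"):
    """Predict for a single query point from its k nearest training examples."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - np.asarray(query, dtype=float), axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    if mode == "classification":
        # Plurality vote among the neighbors' labels
        return Counter(y_train[nearest].tolist()).most_common(1)[0][0]
    # Regression: average of the neighbors' values
    return float(np.mean(y_train[nearest]))


# Hypothetical toy data: two well-separated groups of points
points = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [8.0, 8.0], [9.0, 8.5]]
labels = ["a", "a", "a", "b", "b"]
values = [10.0, 12.0, 14.0, 50.0, 54.0]

knn_label = knn_predict(points, labels, [1.2, 1.4], k=3)
knn_value = knn_predict(points, values, [1.2, 1.4], k=3, mode="regression")
knn_single = knn_predict(points, labels, [8.5, 8.0], k=1)
print(knn_label, knn_value, knn_single)
```

With `k=1` the function simply returns the label of the single closest point, which matches the special case described above.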

In [4]:

# Suppress Warnings

import warnings
warnings.filterwarnings('ignore')

In [1]:
#Import the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
#Read the dataset onto a variable

train=pd.read_csv("titanic_data.csv") #titanic dataset
train.head(5)

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


In [3]:
#Select the response and predictor variables (copy so later edits do not touch a view of train)
df=train[['Survived','Pclass','Sex','Age','Fare']].copy()

In [5]:
#Change male to '1' and female to '0'
df["Sex"]=df["Sex"].apply(lambda sex:1 if sex=="male" else 0)

In [6]:
df['Sex'].value_counts()

1    577
0    314
Name: Sex, dtype: int64

In [7]:
#Handling the Missing Values - count missing values per column before imputation
df.isna().sum()

Survived      0
Pclass        0
Sex           0
Age         177
Fare          0
dtype: int64

In [8]:
#Data Imputation - fill missing Age values with the median age
df['Age']=df['Age'].fillna(df['Age'].median())


In [9]:
#Take a look at the Dataframe
df.head()

| | Survived | Pclass | Sex | Age | Fare |
|---|---|---|---|---|---|
| 0 | 0 | 3 | 1 | 22.0 | 7.25 |
| 1 | 1 | 1 | 0 | 38.0 | 71.2833 |
| 2 | 1 | 3 | 0 | 26.0 | 7.925 |
| 3 | 1 | 1 | 0 | 35.0 | 53.1 |
| 4 | 0 | 3 | 1 | 35.0 | 8.05 |


In [10]:
#Set the Predictor (X) and Response (Y) variables
X=df.drop("Survived", axis=1)
Y=df["Survived"]

In [11]:
#Splitting into training and test set
from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3,random_state=25)

# DECISION TREE

In [12]:
# Call the Decision Tree Model
from sklearn.tree import DecisionTreeClassifier
dtree=DecisionTreeClassifier(max_depth=10, random_state=101, max_features=None, min_samples_leaf=15)
dtree.fit(X_train,Y_train)

DecisionTreeClassifier(max_depth=10, min_samples_leaf=15, random_state=101)

In [13]:
#Compute the Predictions or Y hat
Y_pred_d= dtree.predict(X_test)
Y_pred_d

array([0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0,
       0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1,
       0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1,
       1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1,
       1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1,
       1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1,
       1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1,
       0, 0, 1, 0], dtype=int64)

In [14]:
#Confusion Matrix
from sklearn.metrics import confusion_matrix
cm_d=confusion_matrix(Y_test,Y_pred_d)
cm_d

array([[143,  22],
       [ 33,  70]], dtype=int64)

In [15]:
confusion_d=pd.crosstab(Y_pred_d,Y_test,rownames=['predicted'],colnames=['actual'])
print("\n The Confusion Matrix is:")
confusion_d


 The Confusion Matrix is:


| predicted (rows) / actual (columns) | 0 | 1 |
|---|---|---|
| 0 | 143 | 33 |
| 1 | 22 | 70 |


In [16]:
#Accuracy Score
from sklearn.metrics import accuracy_score
accuracy_dtree=accuracy_score(Y_test,Y_pred_d)
print("Accuracy using Decision Tree Model is: ",accuracy_dtree)

Accuracy using Decision Tree Model is:  0.7947761194029851


In [17]:
#Classification Report
from sklearn.metrics import classification_report
report=classification_report(Y_test,Y_pred_d)
print(report)

              precision    recall  f1-score   support

           0       0.81      0.87      0.84       165
           1       0.76      0.68      0.72       103

    accuracy                           0.79       268
   macro avg       0.79      0.77      0.78       268
weighted avg       0.79      0.79      0.79       268
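
The accuracy and the class-1 row of the report can be recomputed by hand from the `confusion_matrix` output shown earlier for the decision tree. The sketch below hard-codes those four counts. The variable names (`tn`, `fp`, `fn`, `tp` and so on) are introduced here for illustration only.

```python
import numpy as np

# Decision tree confusion matrix from confusion_matrix(Y_test, Y_pred_d):
# rows = actual class, columns = predicted class
cm = np.array([[143, 22],
               [33, 70]])

# For a 2x2 matrix, ravel() yields: true neg, false pos, false neg, true pos
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()      # share of all test rows predicted correctly
precision_1 = tp / (tp + fp)         # of predicted survivors, share that survived
recall_1 = tp / (tp + fn)            # of actual survivors, share that were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(round(accuracy, 4), round(precision_1, 2), round(recall_1, 2), round(f1_1, 2))
# -> 0.7948 0.76 0.68 0.72
```

These values agree with the accuracy score and the class-1 precision, recall, and f1-score printed by `classification_report`.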



# KNN

In [18]:
#K Nearest Neighbor Model
from sklearn.neighbors import KNeighborsClassifier
knn=KNeighborsClassifier(n_neighbors=15)
knn.fit(X_train,Y_train)

KNeighborsClassifier(n_neighbors=15)

In [19]:
#Compute the Predictions
Y_pred_k=knn.predict(X_test)
Y_pred_k

array([0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1,
       1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1,
       1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1,
       1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0,
       0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 1, 1], dtype=int64)

In [20]:
#Confusion Matrix
from sklearn.metrics import confusion_matrix
cm_k=confusion_matrix(Y_test,Y_pred_k)
cm_k

array([[138,  27],
       [ 53,  50]], dtype=int64)

In [21]:
confusion_k=pd.crosstab(Y_pred_k,Y_test,rownames=['predicted'],colnames=['actual'])
print("\n The Confusion Matrix is:")
confusion_k


 The Confusion Matrix is:


| predicted (rows) / actual (columns) | 0 | 1 |
|---|---|---|
| 0 | 138 | 53 |
| 1 | 27 | 50 |


In [22]:
#Accuracy Score
from sklearn.metrics import accuracy_score
accuracy_knn=accuracy_score(Y_test,Y_pred_k)
print("Accuracy using K Nearest Neighbor Model is: ",accuracy_knn)

Accuracy using K Nearest Neighbor Model is:  0.7014925373134329


In [23]:
#Classification Report
from sklearn.metrics import classification_report
report=classification_report(Y_test,Y_pred_k)
print(report)

              precision    recall  f1-score   support

           0       0.72      0.84      0.78       165
           1       0.65      0.49      0.56       103

    accuracy                           0.70       268
   macro avg       0.69      0.66      0.67       268
weighted avg       0.69      0.70      0.69       268
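
KNN classifies by distance, so a feature measured on a large scale (here, possibly `Fare` or `Age`) can outweigh features on a small scale such as `Sex` or `Pclass`. Whether this explains the lower KNN accuracy above has not been checked on the Titanic data. The sketch below only illustrates the general effect on synthetic stand-in data, using a `StandardScaler` step in front of the same `KNeighborsClassifier(n_neighbors=15)`. All names ending in `_demo` and the generated data are assumptions for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the Titanic file):
# one informative small-scale feature and one uninformative large-scale feature
rng = np.random.default_rng(0)
n_rows = 400
signal = rng.normal(size=n_rows)                  # informative, scale about 1
noise = rng.normal(scale=100.0, size=n_rows)      # uninformative, scale about 100
X_demo = np.column_stack([signal, noise])
y_demo = (signal > 0).astype(int)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=25
)

# Same model twice: once on raw features, once with standardized features
raw_knn = KNeighborsClassifier(n_neighbors=15).fit(Xd_train, yd_train)
scaled_knn = make_pipeline(
    StandardScaler(), KNeighborsClassifier(n_neighbors=15)
).fit(Xd_train, yd_train)

raw_score = raw_knn.score(Xd_test, yd_test)
scaled_score = scaled_knn.score(Xd_test, yd_test)
print("raw:", raw_score, "scaled:", scaled_score)
```

Applying the same pipeline to `X_train` and `Y_train` from this notebook would be one way to test whether scaling changes the KNN result reported here.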

