# Decision Tree Classifier

A decision tree is a supervised machine learning algorithm. In a decision tree classifier, the target variable is categorical. The model uses a binary tree (each internal node has two children) to assign a target value to each data sample. The target values are stored in the leaves of the tree. To reach a leaf, a sample is propagated through the nodes, starting at the root node.
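
To make the propagation concrete, here is a minimal sketch of how a sample travels from the root to a leaf. The `Node` class, the split thresholds, and the leaf labels are hypothetical and written by hand for illustration; they are not learned from the dataset used below.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of a hand-built binary decision tree (illustrative only)."""
    feature: Optional[str] = None      # feature tested at an internal node
    threshold: Optional[float] = None  # go left if sample[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None        # set only on leaves


def predict_one(node: Node, sample: dict) -> str:
    """Propagate a sample from the root down to a leaf and return its label."""
    while node.label is None:
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


# Hypothetical tree: thresholds and labels are made up, not fitted.
root = Node(
    feature="age", threshold=30,
    left=Node(label="Samosa"),
    right=Node(
        feature="gender", threshold=0.5,
        left=Node(label="Pakora"),
        right=Node(label="Biryani"),
    ),
)

print(predict_one(root, {"age": 41, "gender": 1}))  # Biryani
print(predict_one(root, {"age": 25, "gender": 0}))  # Samosa
```

A fitted `DecisionTreeClassifier` follows the same idea, except that the features and thresholds at each node are chosen from the training data.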

## Step-1 Import Libraries and dataset

In [50]:
import pandas as pd

In [51]:
df = pd.read_excel('ML_DTreeClassifier_Dataset.xlsx')

In [52]:
df.head()

| | age | weight | gender | likeness | height |
|---|---|---|---|---|---|
| 0 | 27 | 76.0 | Male | Biryani | 170.688 |
| 1 | 41 | 70.0 | Male | Biryani | 165.0 |
| 2 | 29 | 80.0 | Male | Biryani | 171.0 |
| 3 | 27 | 102.0 | Male | Biryani | 173.0 |
| 4 | 29 | 67.0 | Male | Biryani | 164.0 |


## Step-2 Convert gender into a dummy variable

In [53]:
# encode gender as a dummy variable: Male -> 1, Female -> 0
df['gender'] = df['gender'].replace({"Male": 1, "Female": 0})
df.head()

| | age | weight | gender | likeness | height |
|---|---|---|---|---|---|
| 0 | 27 | 76.0 | 1 | Biryani | 170.688 |
| 1 | 41 | 70.0 | 1 | Biryani | 165.0 |
| 2 | 29 | 80.0 | 1 | Biryani | 171.0 |
| 3 | 27 | 102.0 | 1 | Biryani | 173.0 |
| 4 | 29 | 67.0 | 1 | Biryani | 164.0 |
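
The manual replacement in this step encodes one binary column. As an alternative sketch, pandas can build dummy (one-hot) columns with `pd.get_dummies`. The small `toy` frame here is made up for illustration and stands in for the real dataset.

```python
import pandas as pd

# Toy data standing in for the real spreadsheet (values are made up).
toy = pd.DataFrame({"age": [27, 41, 33], "gender": ["Male", "Male", "Female"]})

# drop_first=True keeps a single indicator column for a two-category feature;
# categories are sorted, so "Female" is dropped and "gender_Male" remains.
encoded = pd.get_dummies(toy, columns=["gender"], drop_first=True, dtype=int)
print(encoded)
```

The resulting `gender_Male` column carries the same 1/0 coding as the manual approach, and the same call extends to features with more than two categories.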


## Step-3 Splitting dataset into training and testing data

In [54]:
# selection of input and output variables
X = df[['age','gender']]
y = df['likeness']

In [55]:
X.head()

| | age | gender |
|---|---|---|
| 0 | 27 | 1 |
| 1 | 41 | 1 |
| 2 | 29 | 1 |
| 3 | 27 | 1 |
| 4 | 29 | 1 |


In [56]:
y.head()

0    Biryani
1    Biryani
2    Biryani
3    Biryani
4    Biryani
Name: likeness, dtype: object

## Step-4 Fit DecisionTreeClassifier

In [74]:
# machine learning algorithm
from sklearn.tree import DecisionTreeClassifier

# create and fit the model on the full dataset (X, y)
model = DecisionTreeClassifier().fit(X, y)

## Step-5 Prediction

In [76]:
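# prediction for a single sample (age 23, gender 0 = Female)
single_pred = model.predict(pd.DataFrame([[23, 0]], columns=['age', 'gender']))
# predictions for the test set (X_test comes from the train/test split cell, In [58])
y_pred = model.predict(X_test)
y_pred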

array(['Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani',
       'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani', 'Biryani',
       'Biryani'], dtype=object)

## Step-6 Accuracy Score

In [75]:
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# y_test and y_pred come from the train/test split and prediction cells
result = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(result)
result1 = classification_report(y_test, y_pred)
print("Classification Report:")
print(result1)
result2 = accuracy_score(y_test, y_pred)
print("Accuracy:", result2)

Confusion Matrix:
[[30  0  0]
 [ 8  0  0]
 [11  0  0]]
Classification Report:
              precision    recall  f1-score   support

     Biryani       0.61      1.00      0.76        30
      Pakora       0.00      0.00      0.00         8
      Samosa       0.00      0.00      0.00        11

    accuracy                           0.61        49
   macro avg       0.20      0.33      0.25        49
weighted avg       0.37      0.61      0.46        49

Accuracy: 0.6122448979591837
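
The figures in the report can be checked by hand from the confusion matrix, where rows are true classes and columns are predicted classes. The sketch below uses NumPy only. The matrix is copied from the output above; treating an undefined precision (a class that is never predicted) as 0 is an assumption that mirrors the 0.00 entries in the report.

```python
import numpy as np

# Confusion matrix from the output above: rows = true class, columns = predicted class.
labels = ["Biryani", "Pakora", "Samosa"]
cm = np.array([[30, 0, 0],
               [8, 0, 0],
               [11, 0, 0]])

true_pos = np.diag(cm)
predicted_per_class = cm.sum(axis=0)  # column sums
actual_per_class = cm.sum(axis=1)     # row sums (the "support" column)

# Precision is undefined when a class is never predicted; report 0 in that case.
precision = np.divide(true_pos, predicted_per_class,
                      out=np.zeros(len(labels)), where=predicted_per_class > 0)
recall = true_pos / actual_per_class
accuracy = true_pos.sum() / cm.sum()

for name, p, r in zip(labels, precision, recall):
    print(f"{name:8s} precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.4f}")
```

Because every test sample was predicted as Biryani, the precision for Pakora and Samosa has a zero denominator, which is why those rows show 0.00 in the report.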




In [58]:

# split data into train and test sets (80/20)
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# 80% training and 20% testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# create the model
model = DecisionTreeClassifier()
# fit the model on the training data only
model.fit(X_train, y_train)

predicted_values = model.predict(X_test)

# check the score; y_test holds the actual values
score = accuracy_score(y_test, predicted_values)
score

0.5918367346938775
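
When the classes are unbalanced, as the support column of the classification report suggests (30, 8 and 11 test samples), a plain random split can shift the class proportions between the training and test parts. One option, shown here as a sketch on made-up labels, is the `stratify` argument of `train_test_split`, which keeps the proportions the same in both parts.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Made-up, imbalanced labels: 60 / 20 / 20 samples per class.
y_toy = ["Biryani"] * 60 + ["Pakora"] * 20 + ["Samosa"] * 20
X_toy = [[i] for i in range(len(y_toy))]  # placeholder single feature

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=1, stratify=y_toy
)

# The 20% test part keeps the 60/20/20 class ratio of the full data.
print(Counter(y_te))
```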

In [59]:
# how to train and save your model

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
import joblib

model = DecisionTreeClassifier().fit(X,y)
joblib.dump(model, "foodie.joblib")

['foodie.joblib']
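
A saved model can be restored later with `joblib.load` and used without retraining. The sketch below is self-contained: it trains on a few made-up rows and writes to a temporary directory, so the file name and data are placeholders rather than the `foodie.joblib` file created above.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up training rows with the same column layout as X (age, gender).
X_toy = pd.DataFrame({"age": [20, 22, 40, 45], "gender": [0, 1, 0, 1]})
y_toy = ["Samosa", "Samosa", "Biryani", "Biryani"]

model_toy = DecisionTreeClassifier(random_state=0).fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_model.joblib")
    joblib.dump(model_toy, path)    # save the fitted model
    restored = joblib.load(path)    # load it back

# Passing a DataFrame with the training column names avoids feature-name warnings.
new_sample = pd.DataFrame({"age": [23], "gender": [0]})
print(restored.predict(new_sample))
```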

## Step-7 Plotting

In [70]:
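# graph
from sklearn import tree
import graphviz
import os

# make the Graphviz binaries visible (Windows install path; adjust as needed)
os.environ["PATH"] += os.pathsep + 'C:/GraphViz/bin'
model = DecisionTreeClassifier().fit(X, y)
# graphic evaluation: write the tree to foodie.dot
# (export_graphviz returns None when out_file is given, so nothing is assigned)
tree.export_graphviz(model, out_file="foodie.dot",
                     feature_names=['age', 'gender'],
                     class_names=sorted(y.unique()),
                     label='all',
                     rounded=True,
                     filled=True)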
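
`export_graphviz` only writes the `.dot` file; turning it into an image needs the Graphviz tools. As a lighter alternative, scikit-learn's `export_text` prints the learned rules as plain text. This sketch uses a small made-up dataset in place of the real one.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up rows with the same features as the tutorial (age, gender).
X_toy = pd.DataFrame({"age": [20, 22, 40, 45], "gender": [0, 1, 0, 1]})
y_toy = ["Samosa", "Samosa", "Biryani", "Biryani"]

tree_toy = DecisionTreeClassifier(random_state=0).fit(X_toy, y_toy)

# One line per node: the split condition, then the predicted class at each leaf.
rules = export_text(tree_toy, feature_names=["age", "gender"])
print(rules)
```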
