### Classification using Supervised ML

`Decision Tree Classification`

> A decision tree is a flowchart-like tree structure in which each internal node represents a test on a feature (or attribute), each branch represents a decision rule, and each leaf node represents an outcome. The topmost node in a decision tree is known as the root node.

> The tree learns to partition the data on the basis of attribute values. It splits the data recursively, a process called recursive partitioning. This flowchart-like structure helps with decision making.

> Its visualization resembles a flowchart, which closely mimics human reasoning. That is why decision trees are easy to understand and interpret.
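
To make the flowchart analogy concrete, here is a minimal sketch that fits a tree on a tiny, made-up dataset and prints the learned rules as indented text. The toy values and the names `X_toy`, `y_toy`, and `toy_tree` are illustrative assumptions and are not part of the project data.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: [age, salary] -> bought (1) or did not buy (0)
X_toy = [[22, 25000], [25, 32000], [47, 90000], [52, 110000]]
y_toy = [0, 0, 1, 1]

toy_tree = DecisionTreeClassifier(random_state=0).fit(X_toy, y_toy)

# Render the learned decision rules as an indented, flowchart-like text tree
rules = export_text(toy_tree, feature_names=["age", "salary"])
print(rules)
```

Each indented line in the printout is a decision rule on one feature, and each `class:` line is a leaf. Because the toy classes are cleanly separable, a single split at the root is enough here.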

`Project Objective`
* As a marketing manager, you want to identify the customers who are most likely to purchase your product. Targeting this audience saves marketing budget. Sorting customers into potential and non-potential buyers is a classification problem.

* Classification is a two-step process: a learning step and a prediction step. In the learning step, the model is built from the given training data. In the prediction step, the model is used to predict the response for new data. The decision tree is one of the most popular classification algorithms and one of the easiest to understand and interpret. It can be used for both classification and regression problems.
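
The two steps map directly onto scikit-learn's `fit` and `predict` methods. The sketch below uses made-up numbers (the `spend` target in particular is a hypothetical regression target) to show the same pattern for a classifier and a regressor.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical toy data: [age, salary]
X_known = [[20, 20000], [25, 30000], [45, 90000], [50, 120000]]
bought = [0, 0, 1, 1]                # class labels (classification)
spend = [0.0, 0.0, 800.0, 1000.0]    # numeric target (regression)

# Learning step: build each model from the training data
clf = DecisionTreeClassifier(random_state=0).fit(X_known, bought)
reg = DecisionTreeRegressor(random_state=0).fit(X_known, spend)

# Prediction step: apply the models to a row they have not seen
X_new = [[48, 100000]]
predicted_class = clf.predict(X_new)
predicted_spend = reg.predict(X_new)
print(predicted_class, predicted_spend)
```

The classifier returns one of the known class labels, while the regressor returns the numeric value stored in the leaf that the new row falls into.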

### 1. Importing the required Libraries.

In [1]:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
import numpy as np

### 2. Importing Data from the Dataset.

In [2]:
data = pd.read_csv("iphone_purchase_records.csv")  # Read the dataset from the CSV file
print("------------Data imported successfully---------------\n")
data.info()
data.head(10)  # head(10) returns the first 10 rows of the dataset

------------Data imported successfully---------------

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 4 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Gender           400 non-null    object
 1   Age              400 non-null    int64 
 2   Salary           400 non-null    int64 
 3   Purchase Iphone  400 non-null    int64 
dtypes: int64(3), object(1)
memory usage: 12.6+ KB


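| | Gender | Age | Salary | Purchase Iphone |
|---|---|---|---|---|
| 0 | Male | 19 | 19000 | 0 |
| 1 | Male | 35 | 20000 | 0 |
| 2 | Female | 26 | 43000 | 0 |
| 3 | Female | 27 | 57000 | 0 |
| 4 | Male | 19 | 76000 | 0 |
| 5 | Male | 27 | 58000 | 0 |
| 6 | Female | 27 | 84000 | 0 |
| 7 | Female | 32 | 150000 | 1 |
| 8 | Male | 25 | 33000 | 0 |
| 9 | Female | 35 | 65000 | 0 |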


### 3. Preparing the Data

In [3]:
# scikit-learn decision trees require numerical input, so the
# non-numerical 'Gender' column must be converted to numbers.

x = data.iloc[:, :-1].values  # features: Gender, Age, Salary
y = data.iloc[:, -1].values   # target: Purchase Iphone

from sklearn.preprocessing import LabelEncoder
labelEncoder_gender = LabelEncoder()
x[:, 0] = labelEncoder_gender.fit_transform(x[:, 0])  # Female -> 0, Male -> 1

# Hold out 25% of the rows (100 of 400) as the test set
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Preview the first 10 test inputs next to their true labels
df = pd.DataFrame({'Input [X_test]': list(X_test), 'Output [Y_test]': Y_test})
df.head(10)

| | Input [X_test] | Output [Y_test] |
|---|---|---|
| 0 | [1, 30, 87000] | 0 |
| 1 | [0, 38, 50000] | 0 |
| 2 | [1, 35, 75000] | 0 |
| 3 | [0, 30, 79000] | 0 |
| 4 | [0, 35, 50000] | 0 |
| 5 | [1, 27, 20000] | 0 |
| 6 | [0, 31, 15000] | 0 |
| 7 | [1, 36, 144000] | 1 |
| 8 | [0, 18, 68000] | 0 |
| 9 | [1, 47, 43000] | 0 |
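
The first element of each input row is the encoded gender. As a small sketch of how `LabelEncoder` assigns those codes, the snippet below encodes a hypothetical list of values and maps the codes back to the original strings. The variable names `encoder` and `codes` are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
codes = encoder.fit_transform(["Male", "Female", "Female", "Male"])

print(list(encoder.classes_))                    # classes are stored in sorted order
print(list(codes))                               # position of each value in classes_
print(list(encoder.inverse_transform([0, 1])))   # map codes back to strings
```

Since the classes are sorted, `Female` becomes 0 and `Male` becomes 1. The scikit-learn documentation describes `LabelEncoder` as a tool for target labels; for feature columns, `OrdinalEncoder` or `OneHotEncoder` may be a better fit, although with only two categories the resulting codes are equivalent.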


### 4. Training the Algorithm

In [4]:
# DecisionTreeClassifier was already imported in step 1.
# criterion="entropy" chooses splits by information gain; random_state fixes tie-breaking.
classifier = DecisionTreeClassifier(criterion="entropy", random_state=0)
classifier.fit(X_train, Y_train)
print("xxxxxxxxxxxxxx Training the Model is completed.xxxxxxxxxxxxxxxxxxx")

xxxxxxxxxxxxxx Training the Model is completed.xxxxxxxxxxxxxxxxxxx
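
With `criterion="entropy"`, each split is scored by how much it reduces the Shannon entropy of the class labels, H = -Σ pᵢ log₂ pᵢ, a quantity known as information gain. The sketch below computes both by hand on made-up labels. The helper names `entropy` and `information_gain` are illustrative and are not scikit-learn functions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy reduction obtained by splitting `parent` into `left` and `right`."""
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
    return entropy(parent) - weighted

parent = [0, 0, 1, 1]  # hypothetical labels: two non-buyers, two buyers
print(entropy(parent))
print(information_gain(parent, [0, 0], [1, 1]))  # split that separates the classes
print(information_gain(parent, [0, 1], [0, 1]))  # split that separates nothing
```

A split that isolates the classes removes all of the parent's entropy, while a split that leaves both children as mixed as the parent gains nothing. The tree prefers the candidate split with the largest gain.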


### 5. Making Predictions

In [5]:
y_pred = classifier.predict(X_test)
df = pd.DataFrame({'Input [X_test]': list(X_test), 'Output [y_pred]': y_pred})  # predict() already returns integer class labels
df.head(10)

| | Input [X_test] | Output [y_pred] |
|---|---|---|
| 0 | [1, 30, 87000] | 0 |
| 1 | [0, 38, 50000] | 0 |
| 2 | [1, 35, 75000] | 0 |
| 3 | [0, 30, 79000] | 0 |
| 4 | [0, 35, 50000] | 0 |
| 5 | [1, 27, 20000] | 0 |
| 6 | [0, 31, 15000] | 0 |
| 7 | [1, 36, 144000] | 1 |
| 8 | [0, 18, 68000] | 0 |
| 9 | [1, 47, 43000] | 0 |


In [6]:
# Compare actual vs. predicted labels side by side
df = pd.DataFrame({'Actual': Y_test, 'Classified': y_pred})
display(df.head(10))

| | Actual | Classified |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 0 |
| 2 | 0 | 0 |
| 3 | 0 | 0 |
| 4 | 0 | 0 |
| 5 | 0 | 0 |
| 6 | 0 | 0 |
| 7 | 1 | 1 |
| 8 | 0 | 0 |
| 9 | 0 | 0 |


### 6. Evaluating the Model

In [7]:
from sklearn import metrics
cm = metrics.confusion_matrix(Y_test, y_pred)  # rows: actual class, columns: predicted class
print(cm)
accuracy = metrics.accuracy_score(Y_test, y_pred)  # fraction of test rows classified correctly
print("Accuracy score:", accuracy)

[[63  5]
 [ 3 29]]
Accuracy score: 0.92
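
The accuracy can be recomputed by hand from the confusion matrix, which also yields precision and recall for the "purchase" class. This sketch hard-codes the matrix printed above and relies on scikit-learn's convention that rows are actual classes and columns are predicted classes. The variable names are illustrative.

```python
# Confusion matrix from the evaluation output: rows = actual, columns = predicted
cm = [[63, 5],
      [3, 29]]
(tn, fp), (fn, tp) = cm

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # share of predicted buyers who actually bought
recall = tp / (tp + fn)     # share of actual buyers the model identified

print(f"accuracy={accuracy:.2f} precision={precision:.3f} recall={recall:.3f}")
```

Of the 100 test rows, 92 are classified correctly, which matches the reported accuracy. Precision is 29/34 and recall is 29/32, so the model misses 3 of the 32 actual buyers and wrongly flags 5 non-buyers.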
