# **Concepts of Classification**

In this lesson, we will cover the following concepts of classification with the help of a business use case:
* Naive Bayes theorem
* Support vector machines
* Decision Tree Classification
* Random Forest Classification
* K Nearest Neighbor Classification
* Model Comparison
* Hyperparameter Tuning - GridSearchCV()
* Evaluating the model using accuracy and confusion matrix

## **Naive Bayes Theorem**

The Naive Bayes classifier applies Bayes' theorem under the "naive" assumption that, given the class, the presence of a particular feature is unrelated to the presence of any other feature.



![NB1](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/0.7_Supervised_Learning_-_Regression_and_Classification/Trainer_PPT_and_IPYNB/0.4_Classification/NB1.JPG)

###### **Example**

* As a first step toward prediction using Naive Bayes, estimate the frequency of every attribute.

  ![NB2](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/0.7_Supervised_Learning_-_Regression_and_Classification/Trainer_PPT_and_IPYNB/0.4_Classification/NB2.JPG)

* Calculating the likelihood of each attribute: 

  ![NB3](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/0.7_Supervised_Learning_-_Regression_and_Classification/Trainer_PPT_and_IPYNB/0.4_Classification/NB3.JPG)
  <br><br><br>

  ![NB4](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/0.7_Supervised_Learning_-_Regression_and_Classification/Trainer_PPT_and_IPYNB/0.4_Classification/NB4.JPG)

* Let us find the probability of playing golf under the following conditions:
	* Outlook = Rain
	* Humidity = High
	* Wind = Weak
	* Play = ?




* Solution:

  * Calculation:
  
  ![NB5](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/0.7_Supervised_Learning_-_Regression_and_Classification/Trainer_PPT_and_IPYNB/0.4_Classification/NB5.JPG)

  * Prediction:
  
  ![NB6](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/0.7_Supervised_Learning_-_Regression_and_Classification/Trainer_PPT_and_IPYNB/0.4_Classification/NB6.JPG)
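
The same calculation can be sketched in a few lines of Python. This is only an illustration: it assumes the widely used 14-row play-golf table (with the temperature column omitted), which may not match the counts shown in the figures above, and the names `rows` and `naive_bayes_posterior` are hypothetical.

```python
from collections import Counter

# Assumed data: the classic 14-row play-golf table as
# (Outlook, Humidity, Wind, Play). The figures above may use different counts.
rows = [
    ("Sunny", "High", "Weak", "No"),
    ("Sunny", "High", "Strong", "No"),
    ("Overcast", "High", "Weak", "Yes"),
    ("Rain", "High", "Weak", "Yes"),
    ("Rain", "Normal", "Weak", "Yes"),
    ("Rain", "Normal", "Strong", "No"),
    ("Overcast", "Normal", "Strong", "Yes"),
    ("Sunny", "High", "Weak", "No"),
    ("Sunny", "Normal", "Weak", "Yes"),
    ("Rain", "Normal", "Weak", "Yes"),
    ("Sunny", "Normal", "Strong", "Yes"),
    ("Overcast", "High", "Strong", "Yes"),
    ("Overcast", "Normal", "Weak", "Yes"),
    ("Rain", "High", "Strong", "No"),
]

def naive_bayes_posterior(rows, query):
    """Return normalized P(class | query) under the naive independence assumption."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / len(rows)  # prior P(class)
        for i, value in enumerate(query):
            matches = sum(1 for row in rows if row[-1] == label and row[i] == value)
            score *= matches / n_label  # likelihood P(attribute value | class)
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

posterior = naive_bayes_posterior(rows, ("Rain", "High", "Weak"))
prediction = max(posterior, key=posterior.get)
print(posterior, prediction)
```

With these assumed counts the posterior for "Yes" is 25/49, roughly 0.51, so the prediction is a narrow "Yes". No smoothing is applied here, so an attribute value never seen with a class would force that class's score to zero.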

## **Support Vector Machine**

Let us understand the following basics before moving on to the mathematics behind SVM:
* Linear Separators
* Optimal Separation
* Classification Margin

**1. Linear Separators:**

Binary classification can be viewed as the task of separating two classes in the feature space.

![SVM1](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/0.7_Supervised_Learning_-_Regression_and_Classification/Trainer_PPT_and_IPYNB/0.4_Classification/SVM1.JPG)

**2. Optimal Separation:**

Many different separators can split the same training data, so we need a criterion for choosing the optimal one.


![SVM2](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/0.7_Supervised_Learning_-_Regression_and_Classification/Trainer_PPT_and_IPYNB/0.4_Classification/SVM2.JPG)

**3. Classification Margin:**

* Concept of classification margin:
The diagram below illustrates a separator and its classification margin.

![SVM3](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/0.7_Supervised_Learning_-_Regression_and_Classification/Trainer_PPT_and_IPYNB/0.4_Classification/SVM3.JPG)

* Need for maximizing the classification margin:
  * It helps the model generalize: a wide margin reduces overfitting to the training data and tends to improve performance on the test data.
  * The separator depends only on the support vectors; the other training examples can be ignored.

![SVM4](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/0.7_Supervised_Learning_-_Regression_and_Classification/Trainer_PPT_and_IPYNB/0.4_Classification/SVM4.JPG)



###### **Linear SVM**

![SVM5](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/0.7_Supervised_Learning_-_Regression_and_Classification/Trainer_PPT_and_IPYNB/0.4_Classification/SVM5.JPG)

* Formulate the quadratic optimization problem: 

  ![SVM6](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/0.7_Supervised_Learning_-_Regression_and_Classification/Trainer_PPT_and_IPYNB/0.4_Classification/SVM6.JPG)


* Reformulate the problem as:

  ![SVM7](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/0.7_Supervised_Learning_-_Regression_and_Classification/Trainer_PPT_and_IPYNB/0.4_Classification/SVM7.JPG)

  ![SVM8](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/0.7_Supervised_Learning_-_Regression_and_Classification/Trainer_PPT_and_IPYNB/0.4_Classification/SVM8.JPG)
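
As a small numeric illustration of the margin idea, the sketch below assumes the commonly used hard-margin formulation, which may be written differently in the figures above: a separator $w^T x + b = 0$ must satisfy $y_i(w^T x_i + b) \geq 1$ for every training point, and the margin width is $2/\lVert w \rVert$. The toy points and the two candidate separators are made up for illustration.

```python
import math

# Hypothetical toy data: two points per class, labels in {-1, +1}.
points = [((0.0, 0.0), -1), ((0.0, 1.0), -1), ((2.0, 0.0), 1), ((2.0, 1.0), 1)]

def is_feasible(w, b):
    """Check the hard-margin constraints y_i * (w . x_i + b) >= 1."""
    return all(y * (w[0] * x[0] + w[1] * x[1] + b) >= 1 - 1e-12 for x, y in points)

def margin_width(w):
    """Width of the margin band, 2 / ||w||."""
    return 2.0 / math.hypot(w[0], w[1])

# Two separators that both classify the toy data correctly.
candidates = {"vertical": ((1.0, 0.0), -1.0), "tilted": ((1.5, 0.5), -2.0)}

for name, (w, b) in candidates.items():
    print(name, is_feasible(w, b), round(margin_width(w), 3))

# Among feasible separators, prefer the one with the widest margin.
best = max(candidates, key=lambda name: margin_width(candidates[name][0]))
print("widest margin:", best)
```

Both candidates satisfy the constraints, but the vertical one has the smaller $\lVert w \rVert$ and therefore the wider margin. Minimizing $\lVert w \rVert$ subject to the constraints is what makes this a quadratic optimization problem.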

**Feature Spaces:**

  ![SVM9](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/0.7_Supervised_Learning_-_Regression_and_Classification/Trainer_PPT_and_IPYNB/0.4_Classification/SVM9.JPG)


The original feature space can always be mapped to some higher-dimensional feature space where the training set is separable.


### **Kernel Trick**

* The linear classifier relies on the inner product between vectors: $K(x_i, x_j) = x_i^T x_j$.

* If every data point is mapped into a high-dimensional space via some transformation $\Phi: x \rightarrow \varphi(x)$, the inner product becomes<br>

    $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$
    <br>

* A **kernel function** is a function that is equivalent to an inner product in a feature space.
<br>

* **Example:** 
	
  ![SVM10](https://labcontent.simplicdn.net/data-content/content-assets/Data_and_AI/Applied_Machine_Learning/Images/0.7_Supervised_Learning_-_Regression_and_Classification/Trainer_PPT_and_IPYNB/0.4_Classification/SVM10.JPG)

* Thus, a kernel function implicitly maps data to a high-dimensional space (without the need to compute each φ(x) explicitly).
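
To see the kernel trick numerically, the sketch below checks that a degree-2 polynomial kernel gives the same value as an inner product in an explicit six-dimensional feature space. The choice of kernel, $K(x, z) = (1 + x^T z)^2$ on 2-D inputs, is an assumption and may differ from the example in the figure; the function names are hypothetical.

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel K(x, z) = (1 + x . z)^2 for 2-D inputs."""
    return (1.0 + x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit 6-D feature map whose inner product equals poly_kernel."""
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x[0], r2 * x[1], x[0] ** 2, x[1] ** 2, r2 * x[0] * x[1])

def dot(u, v):
    """Plain inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, -1.0)
kernel_value = poly_kernel(x, z)      # computed in the original 2-D space
explicit_value = dot(phi(x), phi(z))  # computed in the mapped 6-D space
print(kernel_value, explicit_value)
```

The two values agree up to floating-point rounding, yet the kernel never builds the six-dimensional vectors. For kernels such as RBF the mapped space is infinite-dimensional, so the implicit route is the only practical one.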

## **Problem Statement**


Our aim in this project is to predict whether a person will buy an iPhone based on their gender, age, and income. We will also compare different classification algorithms.


### **Dataset**

iphone_purchase_records.csv

### **Solution**

Note: In logistic regression, we have used PCA for feature selection, but let us take another route this time.

#### **Import Libraries**

In Python, pandas is used for data manipulation and analysis. NumPy provides a multidimensional array object along with a number of derived objects. Matplotlib is a visualization library for 2D plots of arrays. Seaborn is an open-source visualization library built on top of Matplotlib.

These libraries are loaded with the `import` keyword.

---


In [1]:
#import required libraries
import pandas as pd

#import required libraries for visualization
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import seaborn as sns

#### **Data Acquisition**

In [6]:
# Step 1 - Load Data
data_set = pd.read_csv("iphone_purchase_records.csv")
# Features: Gender, Age, Salary (every column except the last)
X = data_set.iloc[:, :-1].values
# Target: Purchase Iphone (the last column)
y = data_set.iloc[:, -1].values

In [3]:
# Preview the first five rows of the data
data_set.head(5)

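|   | Gender | Age | Salary | Purchase Iphone |
|---|--------|-----|--------|-----------------|
| 0 | Male   | 19  | 19000  | 0               |
| 1 | Male   | 35  | 20000  | 0               |
| 2 | Female | 26  | 43000  | 0               |
| 3 | Female | 27  | 57000  | 0               |
| 4 | Male   | 19  | 76000  | 0               |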


In [4]:
#Check the data type
data_set.dtypes

Gender             object
Age                 int64
Salary              int64
Purchase Iphone     int64
dtype: object

#### **Feature Extraction**

The code below uses the scikit-learn (`sklearn`) library, which provides many tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction.

**1. Use LabelEncoder to convert gender to number**

In [8]:
from sklearn.preprocessing import LabelEncoder
labelEncoder_gender =  LabelEncoder()
X[:,0] = labelEncoder_gender.fit_transform(X[:,0])

# Optional - convert X from an object array to float
X = X.astype(np.float64)
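
To make the encoding step concrete, here is a minimal pure-Python sketch of what label encoding does: each distinct category receives an integer according to its sorted order, which is how `LabelEncoder` assigns its codes. The helper `encode_labels` is hypothetical, and the sample values are the five rows from the preview.

```python
def encode_labels(values):
    """Map each distinct label to an integer code based on sorted order."""
    classes = sorted(set(values))
    code_of = {label: code for code, label in enumerate(classes)}
    return classes, [code_of[value] for value in values]

genders = ["Male", "Male", "Female", "Female", "Male"]  # first five rows of the preview
classes, codes = encode_labels(genders)
print(classes)  # category order, position = code
print(codes)    # encoded column
```

Here "Female" becomes 0 and "Male" becomes 1. For a two-valued feature this is harmless, but for a feature with more than two unordered categories the integer codes imply an order that does not exist, and one-hot encoding is usually preferred.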

#### **Splitting Datasets**

In [9]:
# Step 3 - Split data into training and testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, random_state=0)

In [10]:
X_train.shape

(360, 3)

#### **Feature Scaling**

In [11]:
# Step 4 - Feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
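
Standardization rescales each feature to zero mean and unit variance using statistics learned from the training split only, which is why the code calls `fit_transform` on the training data and `transform` on the test data. A minimal sketch of the same idea for a single feature follows; the salary values and helper names are made up for illustration.

```python
import statistics

def fit_scaler(train_values):
    """Learn the mean and population standard deviation from training data."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)

def apply_scaler(values, mean, std):
    """Standardize values with statistics learned elsewhere."""
    return [(value - mean) / std for value in values]

train_salary = [19000.0, 20000.0, 43000.0, 57000.0, 76000.0]  # illustrative values
test_salary = [30000.0, 60000.0]

mean, std = fit_scaler(train_salary)
train_scaled = apply_scaler(train_salary, mean, std)
test_scaled = apply_scaler(test_salary, mean, std)  # reuses the training statistics
print(train_scaled)
print(test_scaled)
```

Reusing the training mean and standard deviation keeps information about the test set out of the preprocessing step, so the test score remains an honest estimate.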

#### **Hyperparameter Tuning Using GridSearchCV**

In [12]:
# Define the parameter grid to search
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.1, 0.01, 0.001, 0.0001],
    'kernel': ['rbf', 'linear', 'poly']
}
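
Grid search evaluates every combination in the grid, so its cost grows multiplicatively with each value added. The sketch below counts the combinations for the grid defined above using only the standard library; the five folds mirror the `cv=5` setting used with `GridSearchCV` in this lesson.

```python
from itertools import product

param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.1, 0.01, 0.001, 0.0001],
    "kernel": ["rbf", "linear", "poly"],
}

# Every candidate is one choice per hyperparameter.
names = list(param_grid)
candidates = [dict(zip(names, combo)) for combo in product(*param_grid.values())]

n_folds = 5
total_fits = len(candidates) * n_folds
print(len(candidates), "candidates,", total_fits, "fits")
```

This grid yields 48 candidates and 240 model fits. Because `gamma` has no effect with the linear kernel, several of those candidates are equivalent; `GridSearchCV` also accepts a list of grids, which is one way to avoid such redundant combinations.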

In [13]:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
svm = SVC()

In [14]:
# Perform grid search cross-validation
grid_search = GridSearchCV(svm, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)



In [15]:
# Get the best parameters and the best score
best_params = grid_search.best_params_
best_score = grid_search.best_score_

print("Best Parameters:", best_params)
print("Best Score:", best_score)



Best Parameters: {'C': 100, 'gamma': 0.1, 'kernel': 'rbf'}
Best Score: 0.9027777777777779


In [16]:
# Evaluate the model on the test set
best_model = grid_search.best_estimator_
print(best_model)
test_score = best_model.score(X_test, y_test)
print("Test Set Score:", test_score)

SVC(C=100, gamma=0.1)
Test Set Score: 0.925


In [17]:
support_vectors = best_model.support_vectors_

In [18]:
support_vectors.shape

(89, 3)

#### **Train Naive Bayes**

In [19]:
from sklearn.naive_bayes import GaussianNB

In [20]:
# Define the Naive Bayes classifier
naive_bayes = GaussianNB()

# Define hyperparameters grid for grid search
param_grid = {
    'var_smoothing': [1e-9, 1e-8, 1e-7, 1e-6, 1e-5]  # Adjust var_smoothing hyperparameter
}

In [21]:
# Perform grid search cross-validation
grid_search = GridSearchCV(naive_bayes, param_grid, cv=5)
grid_search.fit(X_train, y_train)



In [22]:
# Print the best hyperparameters found
print("Best hyperparameters:", grid_search.best_params_)

# Evaluate the model on the test set
accuracy = grid_search.score(X_test, y_test)
print("Test set accuracy:", accuracy)

Best hyperparameters: {'var_smoothing': 1e-09}
Test set accuracy: 0.95


In [23]:
# Inspect the per-class feature means learned by Gaussian Naive Bayes
# (rows: classes 0 and 1; columns: Gender, Age, Salary, all standardized)
class_means = grid_search.best_estimator_.theta_
print("Per-class feature means (theta_):")
print(class_means)

Per-class feature means (theta_):
[[ 0.03223466 -0.47437776 -0.27091672]
 [-0.05372443  0.79062961  0.45152786]]


Interpretation: `theta_` holds the mean of each feature within each class, not a probability or a feature importance score. Gaussian Naive Bayes does not compute feature importances; it models every feature, per class, as a Gaussian with these means and the variances stored in `var_`. The means still give a rough indication of which features separate the classes: here the class means differ most for Age (-0.47 versus 0.79) and Salary (-0.27 versus 0.45), while Gender is nearly identical across the two classes.

#### **Train Decision Tree**

In [None]:
# Decision tree model
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier()

In [25]:
param_grid = {
    'criterion': ['gini', 'entropy'],
    'splitter': ['best', 'random'],
    'max_depth': [None, 10, 20, 30, 40, 50],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    # 'auto' is rejected by recent scikit-learn versions and makes those fits fail
    'max_features': [None, 'sqrt', 'log2']
}

In [26]:
# Perform grid search cross-validation
grid_search = GridSearchCV(estimator=dt, param_grid=param_grid, cv=5, scoring='accuracy', n_jobs=-1, verbose=1)
grid_search.fit(X_train, y_train)

Fitting 5 folds for each of 648 candidates, totalling 3240 fits



In [27]:
# Get the best parameters and the best score
best_params = grid_search.best_params_
best_score = grid_search.best_score_

print("Best Parameters:", best_params)
print("Best Cross-Validation Score:", best_score)

Best Parameters: {'criterion': 'gini', 'max_depth': 10, 'max_features': None, 'min_samples_leaf': 4, 'min_samples_split': 2, 'splitter': 'random'}
Best Cross-Validation Score: 0.9055555555555556


In [28]:
# Evaluate the model on the test set
best_model = grid_search.best_estimator_
test_score = best_model.score(X_test, y_test)
print("Test Set Score:", test_score)

Test Set Score: 0.875


#### **Train Random Forest Model**

In [29]:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier()

In [30]:
# Define the parameter grid to search
param_grid = {
    'n_estimators': [100, 200, 300, 400],
    'max_depth': [None, 10, 20, 30, 40, 50],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    # 'auto' is rejected by recent scikit-learn versions and makes those fits fail
    'max_features': ['sqrt', 'log2'],
    'bootstrap': [True, False]
}

In [31]:
# Perform grid search cross-validation
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5, scoring='accuracy', n_jobs=-1, verbose=2)
grid_search.fit(X_train, y_train)

Fitting 5 folds for each of 864 candidates, totalling 4320 fits



In [32]:
# Get the best parameters and the best score
best_params = grid_search.best_params_
best_score = grid_search.best_score_
print("Best Parameters:", best_params)
print("Best Cross-Validation Score:", best_score)

Best Parameters: {'bootstrap': True, 'max_depth': None, 'max_features': 'sqrt', 'min_samples_leaf': 4, 'min_samples_split': 5, 'n_estimators': 100}
Best Cross-Validation Score: 0.9166666666666666


In [41]:
from pprint import pprint
pprint(best_params)

{'bootstrap': True,
 'max_depth': None,
 'max_features': 'sqrt',
 'min_samples_leaf': 4,
 'min_samples_split': 5,
 'n_estimators': 100}


In [43]:
# Evaluate the model on the test set
best_model = grid_search.best_estimator_
test_score = best_model.score(X_test, y_test)
print("Test Set Score:", test_score)

Test Set Score: 0.925


### **Model Comparison**

Note: This section compares the cross-validated accuracy of the classifiers used above, along with logistic regression and K nearest neighbor.

In [44]:
X.shape

(400, 3)

In [45]:
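# Step 8 - Feature scaling on the full dataset for the cross-validated comparison
# Note: fitting the scaler on all rows lets the held-out folds influence the scaling;
# wrapping the scaler and the model in a Pipeline would avoid this
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X = sc.fit_transform(X)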

In [46]:
# Step 9 - Compare classification algorithms
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

In [47]:
classification_models = []
classification_models.append(('Logistic Regression', LogisticRegression(solver="liblinear")))
classification_models.append(('K Nearest Neighbor', KNeighborsClassifier(n_neighbors=5, metric="minkowski",p=2)))
classification_models.append(('Kernel SVM', SVC(kernel = 'rbf',gamma='scale')))
classification_models.append(('Naive Bayes', GaussianNB()))
classification_models.append(('Decision Tree', DecisionTreeClassifier(criterion = "entropy")))
classification_models.append(('Random Forest', RandomForestClassifier(n_estimators=100, criterion="entropy")))


In [48]:
classification_models

[('Logistic Regression', LogisticRegression(solver='liblinear')),
 ('K Nearest Neighbor', KNeighborsClassifier()),
 ('Kernel SVM', SVC()),
 ('Naive Bayes', GaussianNB()),
 ('Decision Tree', DecisionTreeClassifier(criterion='entropy')),
 ('Random Forest', RandomForestClassifier(criterion='entropy'))]

In [49]:
# Use the same shuffled 10-fold split for every model
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
for name, model in classification_models:
    result = cross_val_score(model, X, y, cv=kfold, scoring='accuracy')
    print("%s: Mean Accuracy = %.2f%% - SD Accuracy = %.2f%%" % (name, result.mean()*100, result.std()*100))

Logistic Regression: Mean Accuracy = 84.00% - SD Accuracy = 6.24%
K Nearest Neighbor: Mean Accuracy = 91.25% - SD Accuracy = 5.15%
Kernel SVM: Mean Accuracy = 90.75% - SD Accuracy = 4.88%
Naive Bayes: Mean Accuracy = 88.75% - SD Accuracy = 5.15%
Decision Tree: Mean Accuracy = 85.75% - SD Accuracy = 5.71%
Random Forest: Mean Accuracy = 89.25% - SD Accuracy = 4.48%
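
For intuition, k-fold cross-validation partitions the samples into k disjoint test folds and trains on the remaining samples each time. A minimal standard-library sketch follows; the helper `k_fold_indices` is hypothetical, and its shuffling will not reproduce the exact folds that scikit-learn's `KFold` generates.

```python
import random

def k_fold_indices(n_samples, n_splits, seed):
    """Yield (train_indices, test_indices) pairs for shuffled k-fold CV."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_size, remainder = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `remainder` folds get one extra sample.
        stop = start + fold_size + (1 if fold < remainder else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop

folds = list(k_fold_indices(n_samples=400, n_splits=10, seed=7))
print(len(folds), len(folds[0][0]), len(folds[0][1]))
```

With 400 samples and 10 splits, every model is trained on 360 samples and scored on the remaining 40, ten times over. The mean and standard deviation printed above summarize those ten scores.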


## **Confusion Matrix**

A confusion matrix tabulates the number of correct and incorrect predictions for each class.

*   If an individual has not purchased an iPhone and the model predicts that they have not, it is a true negative (TN), i.e., the actual value is 0 and the predicted value is also 0.

*   If an individual has not purchased an iPhone but the model predicts that they have, it is a false positive (FP), i.e., the actual value is 0 and the predicted value is 1.

*   If an individual has purchased an iPhone but the model predicts that they have not, it is a false negative (FN), i.e., the actual value is 1 and the predicted value is 0.

*   If an individual has purchased an iPhone and the model predicts that they have, it is a true positive (TP), i.e., the actual value is 1 and the predicted value is also 1.

**Accuracy score:** This is the most common metric for evaluating a classifier. It is the proportion of correct predictions out of the total number of predictions.

**Accuracy score = (TP+TN)/(TP+TN+FP+FN)**

**Recall score:** It is the proportion of actual positives that the model correctly predicted as positive.

**Recall score = TP/(TP+FN)**

**Precision score:** It is the proportion of predicted positives that are actually positive.

**Precision score = TP/(TP+FP)**





In [50]:
# Ensure we have the best model from the Random Forest GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Define Random Forest and parameter grid
rf = RandomForestClassifier()
param_grid = {
    'n_estimators': [100, 200],
    'max_features': ['sqrt', 'log2'],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 4],
    'bootstrap': [True, False]
}

# Perform grid search (using a smaller parameter grid for faster execution)
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=3, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)

# Get the best model
best_model = grid_search.best_estimator_
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-validation score: {grid_search.best_score_:.4f}")

In [51]:
y_pred = best_model.predict(X_test)

In [53]:
# Step 7 - Confusion matrix
from sklearn import metrics
cm = metrics.confusion_matrix(y_test, y_pred) 
print("Confusion Matrix:\n",cm)
accuracy = metrics.accuracy_score(y_test, y_pred) 
print("Accuracy score:",accuracy)
precision = metrics.precision_score(y_test, y_pred) 
print("Precision score:",precision)
recall = metrics.recall_score(y_test, y_pred) 
print("Recall score:",recall)


Confusion Matrix:
 [[30  2]
 [ 1  7]]
Accuracy score: 0.925
Precision score: 0.7777777777777778
Recall score: 0.875
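
The reported scores can be reproduced by hand from the confusion matrix. The sketch below assumes scikit-learn's layout, in which rows are actual classes and columns are predicted classes, so the matrix reads `[[TN, FP], [FN, TP]]`.

```python
# Confusion matrix from the output above, in scikit-learn's layout:
# rows are actual classes, columns are predicted classes.
cm = [[30, 2],
      [1, 7]]
tn, fp = cm[0]
fn, tp = cm[1]

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)
```

Accuracy is 37/40, precision is 7/9, and recall is 7/8, matching the values printed above. With only 8 actual buyers in the test set, a single additional error would shift precision or recall noticeably.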


### **Conclusion**

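From the cross-validation results, KNN (91.25%) and Kernel SVM (90.75%) achieved the highest mean accuracy on this dataset, so we will shortlist these two for this project. The gaps between models are smaller than one standard deviation of the fold scores, and the single-split test scores obtained earlier rest on only 40 test samples, so this ranking should be treated as indicative rather than conclusive.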