# Classification Modeling Examples

* This notebook fits several simple baseline classification models to the `Titanic` dataset.
* We'll also note how performance changes from one model to the next.
* The task is to predict whether a passenger survived the `Titanic` shipwreck, so it's a `Classification Problem` (0 = the person didn't survive, 1 = the person survived).

# Importing Libraries and Data

In [None]:
# Importing Common Data Science Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# Modeling & Accuracy Metrics
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import classification_report, confusion_matrix

In [None]:
# Reading in the Data
data = pd.read_csv('titanic.csv')

In [None]:
# Visualising the First 5 Records of Data
data.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38,1,0,PC 17599,71.2833,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,S
4,5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,S


In [None]:
# Shape of the Dataframe
data.shape

(891, 11)
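
scikit-learn estimators expect a numeric feature matrix, so text columns such as `Name`, `Sex`, `Ticket`, and `Embarked` in the preview above would need to be dropped or encoded before fitting if they are still present in the CSV. The following is only a minimal sketch of one way to do that, not the preprocessing actually used for the results in this notebook. The helper name `prepare_features` and the choice of columns to drop are assumptions made for illustration, and missing values (for example in `Age`) are not handled here.

```python
import pandas as pd

# Five rows copied from the preview above (Name and Ticket left out for brevity)
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22, 38, 26, 35, 35],
    "SibSp": [1, 1, 0, 1, 0],
    "Parch": [0, 0, 0, 0, 0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
    "Embarked": ["S", "C", "S", "S", "S"],
})


def prepare_features(df):
    """Hypothetical helper: drop identifier-like columns and one-hot encode categories."""
    # errors="ignore" lets the helper run even when a listed column is absent
    df = df.drop(columns=["PassengerId", "Name", "Ticket"], errors="ignore")
    # One-hot encode the remaining text columns as 0/1 integer columns
    return pd.get_dummies(df, columns=["Sex", "Embarked"], dtype=int)


prepared = prepare_features(sample)
print(prepared.dtypes)
```

One-hot encoding is used here because it makes no assumption about an ordering between categories; other encodings may suit tree-based models equally well.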

# Splitting the Data into Train - Test

In [None]:
# Splitting Data into Train and Test
x_train, x_test, y_train, y_test = train_test_split(
    data.drop('Survived', axis = 1),
    data.Survived,
    test_size = 0.25,
    random_state = 0
)

# ML Modeling

Note: the metrics used to test the base models are:

1.   Accuracy
2.   Recall
3.   Precision
4.   F1-Score

They are read from the following:

1.   Classification Report
2.   Confusion Matrix
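
To make the four metrics listed above concrete, here is a small sketch that computes them by hand from the counts in a binary confusion matrix. The function name `binary_metrics` and the toy labels are made up for illustration; in the notebook itself these numbers come from `classification_report` and `confusion_matrix`.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (not taken from the Titanic data)
y_true_toy = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred_toy = [0, 1, 1, 0, 1, 0, 1, 0]

toy_metrics = binary_metrics(y_true_toy, y_pred_toy)
print(toy_metrics)
```

Precision asks how many predicted survivors really survived, while recall asks how many actual survivors were found, which is why the two can differ sharply for the same model.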



In [None]:
# Sizes of the DataFrames we're going to do Operations on
print(f"X-Train = {len(x_train)}, Y-Train = {len(y_train)}")
print(f"X-Test = {len(x_test)}, Y-Test = {len(y_test)}")

X-Train = 668, Y-Train = 668
X-Test = 223, Y-Test = 223


## **1. Logistic Regression Classifier**

In [None]:
# Fitting Logistic Regression to the Training Set
classifier = LogisticRegression()
classifier.fit(x_train, y_train)

LogisticRegression()

In [None]:
# Predicting the Test Set Results
y_pred = classifier.predict(x_test)
y_pred[:10]

array([0, 0, 0, 1, 1, 0, 1, 1, 1, 1], dtype=int64)

__Checking Accuracies__

In [None]:
# Confusion Matrix
confusion_matrix(y_test, y_pred)

array([[118,  21],
       [ 26,  58]], dtype=int64)

In [None]:
# Classification Report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.82      0.85      0.83       139
           1       0.73      0.69      0.71        84

    accuracy                           0.79       223
   macro avg       0.78      0.77      0.77       223
weighted avg       0.79      0.79      0.79       223



## **2. Kernel - SVM Classifier**

In [None]:
# Fitting Kernel SVM to the Training Set
classifier = SVC(kernel = 'rbf')
classifier.fit(x_train, y_train)

SVC()

In [None]:
# Predicting the Test Set Results
y_pred = classifier.predict(x_test)
y_pred[:10]

array([0, 0, 0, 1, 0, 0, 1, 1, 1, 0], dtype=int64)

__Checking Accuracies__

In [None]:
# Confusion Matrix
confusion_matrix(y_test, y_pred)

array([[133,   6],
       [ 56,  28]], dtype=int64)

In [None]:
# Classification Report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.70      0.96      0.81       139
           1       0.82      0.33      0.47        84

    accuracy                           0.72       223
   macro avg       0.76      0.65      0.64       223
weighted avg       0.75      0.72      0.68       223
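
The low recall for class 1 may be related to feature scale: an RBF kernel works on distances between samples, so a wide-ranging column such as `Fare` can dominate the others. One common remedy is to standardize the features before the SVM. The sketch below shows the pattern on synthetic stand-in data (an assumption, since it has not been run on the Titanic data here), and whether it improves the scores above is untested.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; one feature is inflated to mimic a large-scale column
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)
X_demo[:, 0] *= 1000

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# The scaler is fitted on the training data only, then reused on the test data
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scaled_svm.fit(X_tr, y_tr)

svm_predictions = scaled_svm.predict(X_te)
print(f"Accuracy with scaling: {scaled_svm.score(X_te, y_te):.2f}")
```

Wrapping the scaler and the classifier in one pipeline keeps test-set statistics out of the fitting step.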



## **3. Linear - SVM  Classifier**

In [None]:
# Fitting Linear SVM to the Training Set
classifier = SVC(kernel = 'linear')
classifier.fit(x_train, y_train)

SVC(kernel='linear')

In [None]:
# Predicting the Test Set Results
y_pred = classifier.predict(x_test)

__Checking Accuracies__

In [None]:
# Confusion Matrix
confusion_matrix(y_test, y_pred)

array([[116,  23],
       [ 25,  59]], dtype=int64)

In [None]:
# Classification Report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.82      0.83      0.83       139
           1       0.72      0.70      0.71        84

    accuracy                           0.78       223
   macro avg       0.77      0.77      0.77       223
weighted avg       0.78      0.78      0.78       223



## **4. K-NN Classifier**

In [None]:
# Fitting KNN to the Training Set
classifier = KNeighborsClassifier()
classifier.fit(x_train, y_train)

KNeighborsClassifier()

In [None]:
# Predicting the Test Set Results
y_pred = classifier.predict(x_test)
y_pred[:10]

array([0, 0, 0, 0, 0, 1, 1, 1, 1, 0], dtype=int64)

__Checking Accuracies__

In [None]:
# Confusion Matrix
confusion_matrix(y_test, y_pred)

array([[114,  25],
       [ 37,  47]], dtype=int64)

In [None]:
# Classification Report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.75      0.82      0.79       139
           1       0.65      0.56      0.60        84

    accuracy                           0.72       223
   macro avg       0.70      0.69      0.69       223
weighted avg       0.72      0.72      0.72       223



## **5. Decision Tree  Classifier**

In [None]:
# Fitting Decision Tree Classifier to the Training Set
classifier = DecisionTreeClassifier()
classifier.fit(x_train, y_train)

DecisionTreeClassifier()

In [None]:
# Predicting the Test Set Results
y_pred = classifier.predict(x_test)
y_pred[:10]

array([0, 0, 0, 1, 0, 0, 1, 1, 1, 0], dtype=int64)

__Checking Accuracies__

In [None]:
# Confusion Matrix
confusion_matrix(y_test, y_pred)

array([[115,  24],
       [ 29,  55]], dtype=int64)

In [None]:
# Classification Report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.80      0.83      0.81       139
           1       0.70      0.65      0.67        84

    accuracy                           0.76       223
   macro avg       0.75      0.74      0.74       223
weighted avg       0.76      0.76      0.76       223



## __6. Random Forest  Classifier__

In [None]:
# Fitting Random Forest Classifier to the Training Set
classifier = RandomForestClassifier()
classifier.fit(x_train, y_train)

RandomForestClassifier()

In [None]:
# Predicting the Test Set Results
y_pred = classifier.predict(x_test)
y_pred[:10]

array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1], dtype=int64)

__Checking Accuracies__

In [None]:
# Confusion Matrix
confusion_matrix(y_test, y_pred)

array([[126,  13],
       [ 22,  62]], dtype=int64)

In [None]:
# Classification Report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.85      0.91      0.88       139
           1       0.83      0.74      0.78        84

    accuracy                           0.84       223
   macro avg       0.84      0.82      0.83       223
weighted avg       0.84      0.84      0.84       223



## **7. XGBoost Classifier**

In [None]:
# Fitting XGBoost to the Training Set
classifier = XGBClassifier()
classifier.fit(x_train, y_train)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.300000012, max_delta_step=0, max_depth=6,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=100, n_jobs=0, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

In [None]:
# Predicting the Test Set Results
y_pred = classifier.predict(x_test)

__Checking Accuracies__

In [None]:
# Confusion Matrix
confusion_matrix(y_test, y_pred)

array([[117,  22],
       [ 19,  65]], dtype=int64)

In [None]:
# Classification Report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.84      0.88      0.86       139
           1       0.78      0.73      0.75        84

    accuracy                           0.82       223
   macro avg       0.81      0.80      0.81       223
weighted avg       0.82      0.82      0.82       223
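
The fit, predict, and evaluate steps repeated in sections 1 to 7 can also be written as a single loop over a dictionary of models. This is only a sketch: it runs on synthetic stand-in data (an assumption) rather than `x_train` and `x_test`, it leaves out `XGBClassifier` so that scikit-learn is the only dependency, and it sets `random_state` and `max_iter` values that the cells above do not use.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; swap in the notebook's own train/test frames
X_all, y_all = make_classification(n_samples=400, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.25, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Kernel-SVM": SVC(kernel="rbf"),
    "Linear-SVM": SVC(kernel="linear"),
    "K-NN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Fit each model once and record its test accuracy
accuracies = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    accuracies[name] = accuracy_score(y_te, model.predict(X_te))

for name, acc in accuracies.items():
    print(f"{name:<22}{acc:.2f}")
```

Fixing `random_state` for the tree-based models makes repeated runs give the same numbers, which is useful when comparing models on a small test set.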



## __Score Card of the Models' Performance__

The accuracies of the base models are:

```
   Model Name                             Accuracy
1. Logistic Regression :                    0.79
2. Kernel-SVM :                             0.72
3. Linear-SVM :                             0.78
4. K-NN :                                   0.72
5. Decision Tree :                          0.76
6. Random Forest :                          0.84
7. XGBoost Classifier :                     0.82
```

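* The two ensemble models scored highest on this split (Random Forest 0.84, XGBoost 0.82), but accuracy does not rise uniformly with model complexity: Logistic Regression (0.79) outperformed Kernel-SVM and K-NN (both 0.72) as well as the Decision Tree (0.76).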
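
As a cross-check, the accuracies can be recomputed from the confusion matrices printed in each section, since accuracy is the sum of the diagonal divided by the total. The sketch below copies those matrices by hand; the helper name `accuracy_from_matrix` is made up for illustration.

```python
# Confusion matrices copied from the sections above, laid out as [[TN, FP], [FN, TP]]
confusion_matrices = {
    "Logistic Regression": [[118, 21], [26, 58]],
    "Kernel-SVM": [[133, 6], [56, 28]],
    "Linear-SVM": [[116, 23], [25, 59]],
    "K-NN": [[114, 25], [37, 47]],
    "Decision Tree": [[115, 24], [29, 55]],
    "Random Forest": [[126, 13], [22, 62]],
    "XGBoost": [[117, 22], [19, 65]],
}


def accuracy_from_matrix(matrix):
    """Accuracy = correctly classified samples / all samples."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


# Rank the models from highest to lowest accuracy
score_card = sorted(
    ((name, accuracy_from_matrix(m)) for name, m in confusion_matrices.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, acc in score_card:
    print(f"{name:<22}{acc:.2f}")
```

Accuracy alone hides the difference between the two error types; Kernel-SVM and K-NN tie on accuracy here even though their class 1 recall differs considerably.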

# End