# Spam Email Detection Baseline Models
Prepared By Deepa Francis<br>
For BrainStation<br>
On July 31, 2023

# Table of Contents
[1. Configuring Resources](#cr) <br>
- [1.1. Setting up Libraries](#sl) <br>
- [1.2. Load Data](#ld) <br>
- [1.3. Summary Statistics](#ss) <br>

[2. Logistic Regression](#lr) <br>
- [2.1. xxx](#xxx) <br>

[3. PCA](#pca) <br>
- [3.1. xxx](#xxx) <br>

[4. SVM](#svm) <br>
- [4.1. xxx](#xxx) <br>

[5. Random Forest](#rf) <br>
- [5.1. xxx](#xxx) <br>

[6. Naive Bayes](#nb) <br>
- [6.1. xxx](#xxx) <br>

[7. XGBoost](#xgb) <br>
- [7.1. xxx](#xxx) <br>

<a id = "cr"></a>
## 1. Configuring Resources

A baseline model is a simple model that serves as a reference point in a machine learning project. Its main function is to put the results of later, more elaborate models into context.
Baseline models usually lack complexity and may have little predictive power. They are still essential: without them there is no way to tell whether a more complex model's score is a real improvement.
Here, we develop six baseline models for comparison and use hyperparameter optimization, specifically grid search, to find the best tuning parameters for each.
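As a rough illustration of the workflow described above, the sketch below tunes one hyperparameter of a logistic regression with `GridSearchCV` and compares it against a trivial reference. Everything here is an assumption made for illustration: the data is synthetic (`make_classification`), the `DummyClassifier` reference and the grid of `C` values are not taken from this notebook, and the names `X_demo`, `y_demo`, `pipe`, and `search` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption: NOT the email features used in this notebook)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

# Trivial reference: always predict the most frequent class
dummy = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
dummy_acc = dummy.score(X_te, y_te)

# Scaling and model live in one pipeline so the scaler is refit inside each CV fold
pipe = Pipeline([
    ("scaler", MinMaxScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid over the inverse regularization strength C
param_grid = {"model__C": [0.01, 0.1, 1, 10]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

best_C = search.best_params_["model__C"]
tuned_acc = search.score(X_te, y_te)
print(f"Dummy accuracy: {dummy_acc:.3f}")
print(f"Best C: {best_C}, tuned accuracy: {tuned_acc:.3f}")
```

Putting the scaler inside the pipeline is a design choice rather than a requirement: it keeps the cross-validation folds from seeing statistics computed on held-out data.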

<a id = "sl"></a>
### 1.1. Setting up Libraries

In [8]:
# import relevant packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from scipy import stats
from scipy.stats import norm

from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV


import warnings
warnings.filterwarnings('ignore')

# display all columns in dataframe
pd.set_option('display.max_columns', None)

<a id = "ld"></a>
### 1.2. Load Data

In [2]:
# Load the data
X_train = pd.read_csv('X_train.csv') 
X_test = pd.read_csv('X_test.csv') 
X_validation = pd.read_csv('X_validation.csv') 

y_train = pd.read_csv('y_train.csv') 
y_test = pd.read_csv('y_test.csv') 
y_validation = pd.read_csv('y_validation.csv') 

In [3]:
# Check the shapes
print(f'The shape of X_train is {X_train.shape}')
print(f'The shape of X_test is {X_test.shape}')
print(f'The shape of X_validation is {X_validation.shape}')

The shape of X_train is (22400, 228)
The shape of X_test is (12000, 228)
The shape of X_validation is (5600, 228)


Check for any null values

In [4]:
X_train.isna().sum().sum()

0

In [5]:
X_test.isna().sum().sum()

0

In [6]:
X_validation.isna().sum().sum()

0
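The three separate null checks can also be folded into a single loop that verifies the splits share the same columns at the same time. The sketch below is one possible way to do this; the helper `check_splits` and the toy frames in `demo_splits` are hypothetical and only stand in for `X_train`, `X_validation`, and `X_test`.

```python
import pandas as pd

def check_splits(splits):
    """Return the total null count per split and verify all splits share the same columns."""
    reference_cols = None
    null_counts = {}
    for name, df in splits.items():
        if reference_cols is None:
            # The first split defines the expected column layout
            reference_cols = list(df.columns)
        elif list(df.columns) != reference_cols:
            raise ValueError(f"{name} has different columns than the first split")
        null_counts[name] = int(df.isna().sum().sum())
    return null_counts

# Toy frames standing in for the real splits (assumption: illustrative values only)
demo_splits = {
    "X_train": pd.DataFrame({"a": [1.0, 2.0], "b": [0, 1]}),
    "X_validation": pd.DataFrame({"a": [3.0], "b": [1]}),
    "X_test": pd.DataFrame({"a": [4.0, None], "b": [0, 0]}),
}

null_counts = check_splits(demo_splits)
print(null_counts)
```

With the real data, the same call would be `check_splits({"X_train": X_train, "X_validation": X_validation, "X_test": X_test})`.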

<a id = "ss"></a>
### 1.3. Summary Statistics

In [7]:
# Summary statistics
X_train.describe()

Unnamed: 0,subject_Word_Count_clipped,message_Word_Count_clipped,email_from_hash_0,email_from_hash_1,email_from_hash_2,email_from_hash_3,email_from_hash_4,email_from_hash_5,email_from_hash_6,email_from_hash_7,email_from_hash_8,email_from_hash_9,email_from_hash_10,email_from_hash_11,email_from_hash_12,email_from_hash_13,email_from_hash_14,email_from_hash_15,email_from_hash_16,email_from_hash_17,email_from_hash_18,email_from_hash_19,email_from_hash_20,email_from_hash_21,email_from_hash_22,email_from_hash_23,email_from_hash_24,email_from_hash_25,email_from_hash_26,email_from_hash_27,email_from_hash_28,email_from_hash_29,email_from_hash_30,email_from_hash_31,email_from_hash_32,email_from_hash_33,email_from_hash_34,email_from_hash_35,email_from_hash_36,email_from_hash_37,email_from_hash_38,email_from_hash_39,email_from_hash_40,email_from_hash_41,email_from_hash_42,email_from_hash_43,email_from_hash_44,email_from_hash_45,email_from_hash_46,email_from_hash_47,email_from_hash_48,email_from_hash_49,email_from_hash_50,email_from_hash_51,email_from_hash_52,email_from_hash_53,email_from_hash_54,email_from_hash_55,email_from_hash_56,email_from_hash_57,email_from_hash_58,email_from_hash_59,email_from_hash_60,email_from_hash_61,email_from_hash_62,email_from_hash_63,email_from_hash_64,email_from_hash_65,email_from_hash_66,email_from_hash_67,email_from_hash_68,email_from_hash_69,host_transform__from_host_Others,host_transform__from_host_gmail,host_transform__from_host_gmx,host_transform__from_host_hotmail,host_transform__from_host_mail,host_transform__from_host_yahoo,domain_transform__from_domain_Others,domain_transform__from_domain_ca,domain_transform__from_domain_com,domain_transform__from_domain_edu,domain_transform__from_domain_net,domain_transform__from_domain_org,domain_transform__from_domain_uk,prominent_topic_0,prominent_topic_1,prominent_topic_2,prominent_topic_3,prominent_topic_4,subject_alert,subject_best,subject_branch,subject_bush,subject_cnn,subject_commit,subject_data
,subject_discount,subject_dont,subject_file,subject_function,subject_help,subject_iphealth,subject_make,subject_medic,subject_mhln,subject_need,subject_new,subject_news,subject_notif,subject_patch,subject_perl,subject_pill,subject_price,subject_problem,subject_question,subject_rev,subject_samba,subject_ship,subject_softwar,subject_stock,subject_sugar,subject_svn,subject_test,subject_time,subject_use,subject_viagra,subject_want,message_ad,message_altern,message_avail,message_back,message_base,message_best,message_bit,message_boundari,message_ca,message_call,message_chang,message_check,message_click,message_code,message_come,message_contact,message_could,message_current,message_data,message_day,message_de,message_dear,message_en,message_end,message_even,message_file,message_find,message_first,message_follow,message_free,message_gener,message_gif,message_give,message_go,message_good,message_guid,message_help,message_hi,message_high,message_id,message_imag,message_inform,message_iso,message_know,message_last,message_list,message_look,message_mailman,message_make,message_mani,message_math,message_may,message_messag,message_minim,message_much,message_na,message_name,message_need,message_net,message_new,message_news,message_one,message_peopl,message_pleas,message_post,message_price,message_printabl,message_project,message_provid,message_read,message_receiv,message_reproduc,message_right,message_run,message_said,message_say,message_see,message_self,message_servic,message_set,message_st,message_system,message_take,message_th,message_thank,message_think,message_time,message_tri,message_two,message_type,message_us,message_use,message_version,message_want,message_way,message_well,message_work,message_world,message_wrote,message_ye
count,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0,22400.0
mean,6.306563,328.429688,0.0,-0.000848,-0.003571,0.001071,-0.012366,0.007634,-0.002054,-0.005625,-0.003571,-0.002634,0.001429,0.001384,-0.012411,0.002277,0.002232,-0.000536,0.00375,0.008929,-0.003304,0.003839,0.004821,-0.002589,0.005625,0.00058,-0.002723,-0.091473,-0.00625,-0.003482,-0.000536,-0.006652,-0.006652,-0.000625,-0.000313,0.005804,-0.002366,0.002232,0.002098,0.006562,0.004196,-0.000268,-0.005357,-0.004554,0.006652,-0.001652,-0.000893,-0.006964,-0.002232,0.001607,0.003438,-0.002188,0.002812,-0.012277,0.002277,0.009241,0.004018,-0.000179,0.004018,-0.001652,-0.005089,-0.001116,-0.003929,-0.000134,-0.022054,0.00817,-0.003929,0.002946,-0.0075,-0.00567,-0.006562,-0.001384,0.87317,0.065759,0.004643,0.008214,0.023616,0.024598,0.193125,0.029107,0.503929,0.022054,0.089196,0.132723,0.029866,0.246518,0.197589,0.219643,0.177902,0.158348,0.017988,0.01129,0.006537,0.00718,0.008889,0.024202,0.009152,0.007315,0.010639,0.01042,0.008207,0.012533,0.009982,0.008534,0.011015,0.012602,0.014536,0.022145,0.012638,0.010304,0.010593,0.019197,0.011724,0.01491,0.01704,0.011675,0.012979,0.025483,0.007607,0.009736,0.010726,0.010646,0.024743,0.010343,0.008464,0.013634,0.009435,0.010381,0.029214,0.019074,0.026198,0.018309,0.028589,0.03135,0.042066,0.015197,0.047545,0.022682,0.018284,0.02266,0.036055,0.036728,0.020681,0.023768,0.027538,0.020712,0.031096,0.044476,0.026102,0.024237,0.019464,0.022542,0.024951,0.035288,0.027507,0.022129,0.021479,0.032589,0.026884,0.029343,0.020893,0.038315,0.023643,0.040364,0.061951,0.028185,0.030398,0.036956,0.021282,0.02412,0.032341,0.031194,0.018696,0.048364,0.03259,0.033285,0.034199,0.021325,0.023545,0.047286,0.030219,0.020188,0.024689,0.023973,0.039005,0.035846,0.026962,0.05062,0.045403,0.049378,0.026467,0.048892,0.045013,0.04954,0.052142,0.023781,0.034116,0.035188,0.025511,0.020492,0.022636,0.020323,0.032066,0.017965,0.040524,0.025875,0.027583,0.021093,0.016622,0.021018,0.027051,0.017377,0.034946,0.024007,0.043985,0.033148,0.020547,0.043694,0.064698,0.05
248,0.025415,0.031209,0.028406,0.021314,0.042186,0.024018,0.041451,0.022782
std,3.610184,375.479508,0.161193,0.176146,0.176997,0.175764,0.1887,0.182948,0.164199,0.201704,0.170313,0.157671,0.170082,0.155979,0.177354,0.146827,0.161731,0.155841,0.152905,0.170901,0.14233,0.148157,0.149926,0.161173,0.159143,0.148657,0.153072,0.330697,0.164365,0.151739,0.203324,0.153825,0.170351,0.153241,0.157976,0.161089,0.148639,0.154676,0.138377,0.153538,0.150835,0.14174,0.173898,0.155776,0.192498,0.170735,0.135291,0.176896,0.160623,0.163386,0.143421,0.1453,0.174598,0.184881,0.160205,0.210137,0.158911,0.152951,0.151136,0.143141,0.1721,0.144388,0.158069,0.148958,0.203227,0.1894,0.167663,0.176503,0.20559,0.182286,0.16798,0.163252,0.33279,0.247866,0.067982,0.090262,0.151853,0.154901,0.394759,0.168111,0.499996,0.146861,0.285033,0.339283,0.170222,0.430993,0.398189,0.414014,0.382439,0.365075,0.121297,0.097159,0.063099,0.071329,0.082705,0.11451,0.09195,0.076621,0.09716,0.097754,0.087237,0.104958,0.097287,0.086332,0.096081,0.108505,0.110993,0.142996,0.107747,0.08732,0.091843,0.127487,0.100798,0.110738,0.126363,0.102599,0.10913,0.120734,0.076014,0.094799,0.089056,0.099304,0.116652,0.093091,0.08834,0.110309,0.09365,0.096761,0.103655,0.04968,0.100995,0.069793,0.078581,0.103117,0.102094,0.047281,0.12961,0.088692,0.067551,0.08851,0.110219,0.092984,0.07707,0.081514,0.08019,0.074101,0.115328,0.115627,0.104701,0.088,0.076886,0.076525,0.082709,0.124124,0.090056,0.07302,0.075217,0.106089,0.097476,0.100387,0.078574,0.096321,0.080369,0.110751,0.124185,0.091817,0.106454,0.092776,0.066048,0.083534,0.108543,0.093508,0.066853,0.10426,0.096466,0.08902,0.094473,0.079492,0.069183,0.119694,0.096693,0.056704,0.0915,0.096665,0.099217,0.093266,0.111771,0.11775,0.146389,0.097823,0.090501,0.101254,0.117928,0.151637,0.110172,0.064987,0.072494,0.080262,0.085548,0.057779,0.080477,0.081419,0.114201,0.068997,0.101756,0.078968,0.094897,0.083686,0.06569,0.082426,0.086187,0.065708,0.091645,0.078867,0.099699,0.090943,0.071699,0.101139,0.13728,0.113728,0.088191,0.089744,0.081826,0.071927,0.115282,0.090
743,0.107065,0.081394
min,1.0,1.0,-2.0,-1.0,-1.0,-2.0,-2.0,-2.0,-1.0,-2.0,-1.0,-1.0,-2.0,-2.0,-1.0,-2.0,-2.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-2.0,-1.0,-1.0,-2.0,-1.0,-1.0,-2.0,-1.0,-1.0,-2.0,-2.0,-1.0,-1.0,-2.0,-2.0,-2.0,-1.0,-1.0,-2.0,-2.0,-1.0,-2.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-2.0,-2.0,-1.0,-2.0,-1.0,-2.0,-2.0,-1.0,-1.0,-2.0,-1.0,-2.0,-2.0,-2.0,-1.0,-1.0,-1.0,-1.0,-1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,4.0,99.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,6.0,196.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,8.0,398.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037218,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.067218,0.0,0.050252,0.0,0.0,0.06967,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064328,0.053137,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,20.0,2000.0,2.0,2.0,1.0,2.0,1.0,2.0,2.0,2.0,1.0,2.0,1.0,1.0,2.0,1.0,2.0,2.0,2.0,1.0,1.0,2.0,1.0,1.0,1.0,2.0,1.0,2.0,1.0,1.0,1.0,2.0,2.0,2.0,2.0,1.0,1.0,1.0,2.0,1.0,1.0,2.0,2.0,2.0,2.0,1.0,1.0,2.0,2.0,2.0,2.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,1.0,2.0,2.0,2.0,1.0,2.0,2.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.72658,1.0,1.0,0.708329,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.99466,0.705975,1.0,0.786965,1.0,1.0,1.0,0.756766,1.0,1.0,0.995216,0.989087,1.0,1.0,1.0,1.0,1.0,1.0,0.998586,1.0,1.0,1.0,0.991966,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.926122,1.0,1.0,1.0,1.0,0.727616,1.0,1.0,1.0,1.0,0.929324,1.0,0.88723,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.8428,1.0,1.0,0.575818,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.976022,1.0,1.0,0.954763,1.0,1.0,1.0,0.675366,1.0,1.0,0.949583,1.0,1.0,1.0,0.999257,0.998934,1.0,0.996128,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.995305,1.0,0.837117,1.0,1.0,1.0,1.0,0.997227


The data is a mix of numerical features (for example, the word counts, which reach 2000 for the message body) and binary indicator features. The choice of feature scaling method depends on the nature of the features and on the algorithm we plan to use.<br>

- For numerical features, it is usually a good idea to bring them to a similar range so that features with large magnitudes do not dominate the model. Two common scalers for numerical features are Min-Max scaling and Standardization (Z-score scaling).<br>
- For binary features, scaling is not necessary since they already have a fixed range (0 or 1).<br>

To keep the code simple, we choose a single scaling technique that handles both numerical and binary features: MinMax scaling.<br>

MinMax scaling maps every feature to a fixed range, typically [0, 1], by computing (x - min) / (max - min) per column. Numerical features are rescaled linearly into that range. For binary features, the minimum maps to 0 and the maximum to 1, so columns that already take the values 0 and 1 are left unchanged.
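A minimal sketch of this behaviour is shown below, using a small made-up array rather than the notebook's data. The arrays `train_demo` and `valid_demo` are hypothetical: the first column mimics a word count, the second a feature ranging from -2 to 2, and the third a binary indicator. The scaler is fit on the training rows only and then reused on the validation rows, which is an assumption about how the scaling would be applied here.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Columns: word count, feature in [-2, 2], binary indicator (illustrative values)
train_demo = np.array([
    [1.0, -2.0, 0.0],
    [10.0, 0.0, 1.0],
    [20.0, 2.0, 0.0],
])
valid_demo = np.array([
    [5.0, 1.0, 1.0],
    [25.0, -1.0, 0.0],
])

scaler = MinMaxScaler()
# Learn per-column min and max from the training data only
train_scaled = scaler.fit_transform(train_demo)
# Reuse the training min and max on the validation data
valid_scaled = scaler.transform(valid_demo)

print(train_scaled)
print(valid_scaled)
```

Note that validation values outside the training range (the word count of 25 here) land outside [0, 1] after scaling; the fixed range is only guaranteed on the data the scaler was fit on.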

<a id = "lr"></a>
## 2. Logistic Regression