# Regression Benchmark 

### Probem Example (Regression) - Big Mart Sales:
To build a predictive model and find out the sales of each product at the store.

Good starting point:
- Mean - whats has been total sales of each product month on month
- Mean with respect to another variable

The two most commonly used measures of central tendency for numerical data are the mean and the median. Since the regression problem deals with continuous data, mean and median are the correct measures.


To evaluate the model:

Mean Absolute Error -> sum of abs different between every observation, divided by number of obs


In [1]:
#importing libraries 

import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt

In [2]:
data = pd.read_csv('train_bm.csv')

In [4]:
data.shape

(8523, 12)

In [56]:
data.describe()

Unnamed: 0,Item_Weight,Item_Visibility,Item_MRP,Outlet_Establishment_Year,Item_Outlet_Sales
count,7060.0,8523.0,8523.0,8523.0,8523.0
mean,12.857645,0.066132,140.992782,1997.831867,2181.288914
std,4.643456,0.051598,62.275067,8.37176,1706.499616
min,4.555,0.0,31.29,1985.0,33.29
25%,8.77375,0.026989,93.8265,1987.0,834.2474
50%,12.6,0.053931,143.0128,1999.0,1794.331
75%,16.85,0.094585,185.6437,2004.0,3101.2964
max,21.35,0.328391,266.8884,2009.0,13086.9648


In [6]:
data.isnull().sum()

Item_Identifier                 0
Item_Weight                  1463
Item_Fat_Content                0
Item_Visibility                 0
Item_Type                       0
Item_MRP                        0
Outlet_Identifier               0
Outlet_Establishment_Year       0
Outlet_Size                  2410
Outlet_Location_Type            0
Outlet_Type                     0
Item_Outlet_Sales               0
dtype: int64

### Shuffling and Creating Train and Test Set

In [8]:
from sklearn.utils import shuffle

In [12]:
#shuffle dataset

data = shuffle(data, random_state=42)

#creating 4 division of data
div = int(data.shape[0]/4)

# 3 parts to train set and 1 part to test set
train = data.iloc[:3*div+1,:]
test = data.iloc[3*div+1:]

In [13]:
train.shape, test.shape, data.shape

((6391, 12), (2132, 12), (8523, 12))

## simple mean model (benchmark)

Find out the simple mean of train model and save it as a column in test dataset


In [15]:
test['simple_mean'] = train.Item_Outlet_Sales.mean()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  test['simple_mean'] = train.Item_Outlet_Sales.mean()


Calculate the error between the mean values created in line above with the actual value in Item_Outlet_Sales()


In [57]:
from sklearn.metrics import mean_absolute_error as MAE

simple_mean_error = MAE(test.Item_Outlet_Sales, test.simple_mean)
simple_mean_error

1348.3091635746123

This is the benchmark value for model we will create to predict accuracy of the models.

## Mean Item Outlet Sales with respect to Outlet_Type

Now, we will try to improve the above prediction by predicting based on outlet tyoe


In [27]:
out_type = pd.pivot_table(train, values = 'Item_Outlet_Sales', index=['Outlet_Type'], aggfunc=np.mean)

out_type

Unnamed: 0_level_0,Item_Outlet_Sales
Outlet_Type,Unnamed: 1_level_1
Grocery Store,334.106148
Supermarket Type1,2293.636762
Supermarket Type2,2034.330733
Supermarket Type3,3684.008727


In [29]:
# initializing new column to zero
test['Out_type_mean'] = 0

for i in test.Outlet_Type.unique():
    test['Out_type_mean'][test.Outlet_Type == str(i)] = train['Item_Outlet_Sales'][train.Outlet_Type == str(i)].mean()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  test['Out_type_mean'] = 0
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  test['Out_type_mean'][test.Outlet_Type == str(i)] = train['Item_Outlet_Sales'][train.Outlet_Type == str(i)].mean()
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._where(~key, value, inplace=True)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-d

In [30]:
test.head()

Unnamed: 0,Item_Identifier,Item_Weight,Item_Fat_Content,Item_Visibility,Item_Type,Item_MRP,Outlet_Identifier,Outlet_Establishment_Year,Outlet_Size,Outlet_Location_Type,Outlet_Type,Item_Outlet_Sales,simple_mean,Out_type_mean
432,FDF10,15.5,Regular,0.157172,Snack Foods,149.1418,OUT049,1999,Medium,Tier 1,Supermarket Type1,588.5672,2169.533,2293.636762
4451,FDZ37,,Regular,0.019673,Canned,86.4198,OUT027,1985,Medium,Tier 3,Supermarket Type3,1918.8356,2169.533,3684.008727
1412,DRF23,4.61,Low Fat,0.123346,Hard Drinks,172.5396,OUT017,2007,,Tier 2,Supermarket Type1,3663.2316,2169.533,2293.636762
1329,NCQ41,,Low Fat,0.019386,Health and Hygiene,194.5794,OUT027,1985,Medium,Tier 3,Supermarket Type3,3511.4292,2169.533,3684.008727
6874,NCM17,7.93,Low Fat,0.071426,Health and Hygiene,45.9086,OUT018,2009,Medium,Tier 3,Supermarket Type2,1070.6064,2169.533,2034.330733


In [32]:
#mean suqare error

err = MAE(test.Item_Outlet_Sales, test.Out_type_mean)
err

1114.8889656414237

Conclusion: As we can see, the MAE did improve by using Outlet Type.

## Mean Item Outlet Sales with respect to Outlet_Establishment_Year


In [35]:
establis_year = pd.pivot_table(data, values='Item_Outlet_Sales', index= ['Outlet_Establishment_Year'], aggfunc=np.mean)

establis_year

Unnamed: 0_level_0,Item_Outlet_Sales
Outlet_Establishment_Year,Unnamed: 1_level_1
1985,2483.677474
1987,2298.995256
1997,2277.844267
1998,339.351662
1999,2348.354635
2002,2192.384798
2004,2438.841866
2007,2340.675263
2009,1995.498739


In [42]:
test['establisment_year_mean'] = 0

for i in data.Outlet_Establishment_Year.unique():
    test['establisment_year_mean'][test.Outlet_Establishment_Year == str(i)] = train['Item_Outlet_Sales'][train['Outlet_Establishment_Year'] == str(i)].mean()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  test['establisment_year_mean'] = 0
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  test['establisment_year_mean'][test.Outlet_Establishment_Year == str(i)] = train['Item_Outlet_Sales'][train['Outlet_Establishment_Year'] == str(i)].mean()
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._where(~key, value, inplace=True)


In [43]:
errorrr = MAE(test.Item_Outlet_Sales, test.establisment_year_mean)

errorrr

2216.5290828330203

Conclusion: As we can see, the MAE did not improve by using establishment year.

## Mean Item_Outlet_Sales with respect to both Outlet_Location_Type and Outlet_Establishment_Year

Now we will use 2 features to create our model. Making the model more complex

In [46]:
combo_mean = pd.pivot_table(train, values='Item_Outlet_Sales', index = ['Outlet_Location_Type', 'Outlet_Establishment_Year'], aggfunc=np.mean)

combo_mean

Unnamed: 0_level_0,Unnamed: 1_level_0,Item_Outlet_Sales
Outlet_Location_Type,Outlet_Establishment_Year,Unnamed: 2_level_1
Tier 1,1985,332.70906
Tier 1,1997,2249.438082
Tier 1,1999,2368.598566
Tier 2,2002,2105.096784
Tier 2,2004,2435.711052
Tier 2,2007,2350.448072
Tier 3,1985,3684.008727
Tier 3,1987,2254.35211
Tier 3,1998,335.469243
Tier 3,2009,2034.330733


In [47]:
# Initiating new empty column
test['Super_mean'] = 0

# Assigning variables to strings ( to shorten code length)
s2 = 'Outlet_Location_Type'
s1 = 'Outlet_Establishment_Year'

# For every Unique Value in s1
for i in test[s1].unique():
  # For every Unique Value in s2
  for j in test[s2].unique():
    # Calculate and Assign mean to new column, corresponding to both unique values of s1 and s2 simultaneously
    test['Super_mean'][(test[s1] == i) & (test[s2]==str(j))] = train['Item_Outlet_Sales'][(train[s1] == i) & (train[s2]==str(j))].mean()

here(~key, value, inplace=True)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  test['Super_mean'][(test[s1] == i) & (test[s2]==str(j))] = train['Item_Outlet_Sales'][(train[s1] == i) & (train[s2]==str(j))].mean()
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._where(~key, value, inplace=True)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  test['Super_mean'][(test[s1] == i) & (test[s2]==str(j))] = train['Item_Outlet_Sales'][(train[s1] == i) & (train[s2]==str(j))].mean()
A value is trying to be set on a copy o

In [48]:
#calculating mean absolute error
super_mean_error = MAE(test['Item_Outlet_Sales'] , test['Super_mean'] )
super_mean_error

1118.0230715619844

# Classification Benchmark

### Probem Example (Classification) - Titanic:
To predict wether a passenger of titanic would have survived or not?

Good point to start: Mode

To Evaluate: Accuracy - Correctly predict observation upon total observation


In [50]:
#importing libraries 
import pandas as pd 
import numpy as np
from sklearn.metrics import accuracy_score

In [54]:
data2 = pd.read_csv("train.csv")
data2.shape

(891, 12)

In [60]:
data2.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [62]:
data2.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

### Shuffling and Creating Train and Test Set

In [63]:
from sklearn.utils import shuffle

# Shuffling the Dataset
data2 = shuffle(data2, random_state = 42)

#creating 4 divisions
div = int(data2.shape[0]/4)

# 3 parts to train set and 1 part to test set
train2 = data2.loc[:3*div+1,:]
test2 = data2.loc[3*div+1:]

train2.shape, test2.shape

((621, 12), (271, 12))

## Simple Mode

In [65]:
train2.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
709,710,1,3,"Moubarek, Master. Halim Gonios (""William George"")",male,,1,1,2661,15.2458,,C
439,440,0,2,"Kvillner, Mr. Johan Henrik Johannesson",male,31.0,0,0,C.A. 18723,10.5,,S
840,841,0,3,"Alhomaki, Mr. Ilmari Rudolf",male,20.0,0,0,SOTON/O2 3101287,7.925,,S
720,721,1,2,"Harper, Miss. Annie Jessie ""Nina""",female,6.0,0,1,248727,33.0,,S
39,40,1,3,"Nicola-Yarred, Miss. Jamila",female,14.0,1,0,2651,11.2417,,C


In [77]:
test2['simple_mode'] = train2['Survived'].mode()[0]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  test2['simple_mode'] = train2['Survived'].mode()[0]


In [75]:
test2.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,simple_mode
667,668,0,3,"Rommetvedt, Mr. Knud Paust",male,,0,0,312993,7.775,,S,0
571,572,1,1,"Appleton, Mrs. Edward Dale (Charlotte Lamson)",female,53.0,2,0,11769,51.4792,C101,S,0
636,637,0,3,"Leinonen, Mr. Antti Gustaf",male,32.0,0,0,STON/O 2. 3101292,7.925,,S,0
714,715,0,2,"Greenberg, Mr. Samuel",male,52.0,0,0,250647,13.0,,S,0
262,263,0,1,"Taussig, Mr. Emil",male,52.0,1,1,110413,79.65,E67,S,0


In [78]:
simple_mode_accuracy = accuracy_score(test2.Survived, test2.simple_mode)
simple_mode_accuracy

0.6346863468634686

## Mode based on gender

In [81]:
gender_mode = pd.crosstab(train2.Survived, train2.Sex)
gender_mode

Sex,female,male
Survived,Unnamed: 1_level_1,Unnamed: 2_level_1
0,57,321
1,167,76


In [89]:
test2['gender_mode'] = test2.Survived

#for every unique value in the columns
for i in test2.Sex.unique():
    test2['gender_mode'][test2.Sex == str(i)] = train2['Survived'][train2.Sex == str(i)].mode()[0]


male
female
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  test2['gender_mode'] = test2.Survived
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  test2['gender_mode'][test2.Sex == str(i)] = train2['Survived'][train2.Sex == str(i)].mode()[0]


In [114]:
test2[test2['Sex'] == "female"].head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,simple_mode,gender_mode,pclass_mode
571,572,1,1,"Appleton, Mrs. Edward Dale (Charlotte Lamson)",female,53.0,2,0,11769,51.4792,C101,S,0,1,0
610,611,0,3,"Andersson, Mrs. Anders Johan (Alfrida Konstant...",female,39.0,1,5,347082,31.275,,S,0,1,0
297,298,0,1,"Allison, Miss. Helen Loraine",female,2.0,1,2,113781,151.55,C22 C26,S,0,1,0
147,148,0,3,"Ford, Miss. Robina Maggie ""Ruby""",female,9.0,2,2,W./C. 6608,34.375,,S,0,1,0
325,326,1,1,"Young, Miss. Marie Grice",female,36.0,0,0,PC 17760,135.6333,C32,C,0,1,0


In [83]:
gender_mode_accuracy = accuracy_score(test2.Survived, test2.gender_mode)
gender_mode_accuracy

0.7896678966789668

So, by using just gender's mode, the accuracy jumped from 0.63 to 0.78.

### Lets check other variables for identifying which can be used to mode in prediction

In [90]:
train2.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
709,710,1,3,"Moubarek, Master. Halim Gonios (""William George"")",male,,1,1,2661,15.2458,,C
439,440,0,2,"Kvillner, Mr. Johan Henrik Johannesson",male,31.0,0,0,C.A. 18723,10.5,,S
840,841,0,3,"Alhomaki, Mr. Ilmari Rudolf",male,20.0,0,0,SOTON/O2 3101287,7.925,,S
720,721,1,2,"Harper, Miss. Annie Jessie ""Nina""",female,6.0,0,1,248727,33.0,,S
39,40,1,3,"Nicola-Yarred, Miss. Jamila",female,14.0,1,0,2651,11.2417,,C


## Mode in Pclass

In [91]:
#lets check Pclass

pclass_mode = pd.crosstab(train2.Pclass, train2.Survived)
pclass_mode

Survived,0,1
Pclass,Unnamed: 1_level_1,Unnamed: 2_level_1
1,50,97
2,66,65
3,262,81


In [127]:
#Lets use this to idenify the accuracy. So, as per mode, anyone who belongs to Pclass 1 will survive, anyone from Pclass 2 will not survive, anyone from Pclass 3 will not survive

test2['pclass_mode'] = 1

for i in test2['Pclass'].unique():
    test2['pclass_mode'][test2.Pclass == (i)] = train2['Survived'][train2.Pclass == (i)].mode()[0]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  test2['pclass_mode'] = 1
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  test2['pclass_mode'][test2.Pclass == (i)] = train2['Survived'][train2.Pclass == (i)].mode()[0]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._where(~key, value, inplace=True)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide

In [128]:
pclass_mode_accuracy = accuracy_score(test2.Survived, test2.pclass_mode)

pclass_mode_accuracy

0.6678966789667896

In [136]:
test2[test2['Pclass'] == 2].head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,simple_mode,gender_mode,pclass_mode
714,715,0,2,"Greenberg, Mr. Samuel",male,52.0,0,0,250647,13.0,,S,0,0,0
150,151,0,2,"Bateman, Rev. Robert James",male,51.0,0,0,S.O.P. 1166,12.525,,S,0,0,0
705,706,0,2,"Morley, Mr. Henry Samuel (""Mr Henry Marshall"")",male,39.0,0,0,250655,26.0,,S,0,0,0
463,464,0,2,"Milling, Mr. Jacob Christian",male,48.0,0,0,234360,13.0,,S,0,0,0
123,124,1,2,"Webber, Miss. Susan",female,32.5,0,0,27267,13.0,E101,S,0,1,0


The accuracy improved from 0.63 to 0.66

PassengerId    621
Survived         2
Pclass           3
Name           621
Sex              2
Age             82
SibSp            7
Parch            7
Ticket         507
Fare           215
Cabin          111
Embarked         3
dtype: int64

## Mode on Embarked

In [140]:
embarked_mode = pd.crosstab(train2.Survived, train2.Embarked)
embarked_mode

Embarked,C,Q,S
Survived,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0,56,39,283
1,70,19,152


In [143]:
test2['Embarked_mode'] = 1

for i in test2['Embarked'].unique():
    test2['Embarked_mode'][test2.Embarked == str(i)] = train2['Survived'][train2.Embarked == str(i)].mode()[0]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  test2['Embarked_mode'] = 1
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  test2['Embarked_mode'][test2.Embarked == str(i)] = train2['Survived'][train2.Embarked == str(i)].mode()[0]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._where(~key, value, inplace=True)


In [144]:
pclass_mode_accuracy = accuracy_score(test2.Survived, test2.Embarked_mode)

pclass_mode_accuracy

0.6494464944649446

Pclass is only slightly better than based on Embark

## Survival with respect to both Gender_mode and Pclass_mode

Now we will use 2 features to create our model. Making the model more complex

In [150]:
    from scipy.stats import mode


In [162]:
all_mode = pd.pivot_table(train2, values='Survived', index=['Sex', 'Pclass'], aggfunc='count')
all_mode

Unnamed: 0_level_0,Unnamed: 1_level_0,Survived
Sex,Pclass,Unnamed: 2_level_1
female,1,66
female,2,59
female,3,99
male,1,81
male,2,72
male,3,244


In [164]:
# Initiating new empty column
test2['all_mode'] = 0

# Assigning variables to strings ( to shorten code length)
s2 = 'Sex'
s1 = 'Pclass'

# For every Unique Value in s1
for i in test2[s1].unique():
  # For every Unique Value in s2
  for j in test2[s2].unique():
    # Calculate and Assign mean to new column, corresponding to both unique values of s1 and s2 simultaneously
    test2['all_mode'][(test2[s1] == i) & (test2[s2]==str(j))] = train2['Survived'][(train2[s1] == i) & (train2[s2]==str(j))].mode()[0]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  test2['all_mode'] = 0
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  test2['all_mode'][(test2[s1] == i) & (test2[s2]==str(j))] = train2['Survived'][(train2[s1] == i) & (train2[s2]==str(j))].mode()[0]
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._where(~key, value, inplace=True)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.

In [165]:
all_mode_accuracy = accuracy_score(test2.all_mode, test2.Survived)
all_mode_accuracy

0.7859778597785978

Accuracy didn't change much as compared to accuracy based on Gender