## <center> Liquor Sales Imputation  </center>

### Imputation
*    The different techniques and models used for imputation are:
*    - Mean for numerical variables
     - Simple Imputer with mean
     - Simple Imputer with median
     - Simple Imputer with mode
     - Iterative Imputer with mean 
     - Iterative Imputer with median 
     - Iterative Imputer with mode 
     - Iterative Imputer with extra tree regressor
     - Iterative Imputer with bayesian ridge
     - Imputation with KNN imputer
     

*    Columns imputed are sale_dollars, pack, bottle_volume_ml, state_bottle_cost, state_bottle_retail, bottles_sold, 
     volume_sold_liters and volume_sold_gallons. Rest of the columns are object type and were left as is because imputing 
     them with mode would have given biased data. Intuition behind this decision is that if a specific liquor category is 
     imputed (due to mode) it will not be a true representative of the data. It will show an exorbitant number of sales of 
     that category when that in reality does not hold true. 

In [1]:
from sklearn.experimental import enable_iterative_imputer  
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.impute import KNNImputer
from sklearn.linear_model import BayesianRidge
from sklearn.ensemble import ExtraTreesRegressor
import numpy as np
from numpy import nan
from numpy import isnan
import pandas as pd

In [2]:
liquor = pd.read_csv("Iowa Liq Final.csv")

In [3]:
liquor.head()

Unnamed: 0,invoice_and_item_number,date,store_number,store_name,address,city,zip_code,county_number,county,category,...,item_number,item_description,pack,bottle_volume_ml,state_bottle_cost,state_bottle_retail,bottles_sold,volume_sold_liters,volume_sold_gallons,sale_dollars
0,INV-33179700135,4/1/2021,2576.0,Hy-Vee Wine and Spirits / Storm Lake,1250 N Lake St,Storm Lake,50588.0,11.0,BUENA VIST,1081600.0,...,64870.0,Fireball Cinnamon Whiskey,48.0,100.0,0.9,1.35,48.0,4.8,1.26,64.8
1,INV-33196200106,4/1/2021,2649.0,Hy-Vee #3 / Dubuque,400 Locust St,Dubuque,52001.0,31.0,DUBUQUE,1081200.0,...,65200.0,Tequila Rose Liqueur,12.0,750.0,11.5,17.25,4.0,3.0,0.79,69.0
2,INV-33184300011,4/1/2021,2539.0,Hy-Vee Food Store / Iowa Falls,640 S. Oak,Iowa Falls,50126.0,42.0,HARDIN,1031100.0,...,38008.0,Smirnoff 80prf PET,6.0,1750.0,14.75,22.13,6.0,10.5,2.77,132.78
3,INV-33184100015,4/1/2021,4024.0,Wal-Mart 1546 / Iowa Falls,840 S Oak,Iowa Falls,50126.0,42.0,HARDIN,1031100.0,...,36648.0,Caliber Vodka,12.0,750.0,3.31,4.97,12.0,9.0,2.37,59.64
4,INV-33174200025,4/1/2021,5385.0,Vine Food & Liquor,2704 Vine St.,West Des Moines,50265.0,77.0,POLK,1012200.0,...,4626.0,Buchanan Deluxe 12YR,12.0,750.0,20.99,31.49,2.0,1.5,0.39,62.98


In [4]:
liquor.shape

(983741, 23)

In [5]:
liquor.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 983741 entries, 0 to 983740
Data columns (total 23 columns):
 #   Column                   Non-Null Count   Dtype  
---  ------                   --------------   -----  
 0   invoice_and_item_number  979697 non-null  object 
 1   date                     979287 non-null  object 
 2   store_number             958205 non-null  float64
 3   store_name               960273 non-null  object 
 4   address                  947182 non-null  object 
 5   city                     950748 non-null  object 
 6   zip_code                 949218 non-null  float64
 7   county_number            950646 non-null  float64
 8   county                   917218 non-null  object 
 9   category                 935031 non-null  float64
 10  category_name            932255 non-null  object 
 11  vendor_number            933668 non-null  float64
 12  vendor_name              909350 non-null  object 
 13  item_number              912275 non-null  float64
 14  item

#### Missing Values Percentage

In [6]:
#Missing Values 
liquor.isnull().sum().sum()/liquor.size*100

4.49051564164357

In [7]:
#Missing value by percentage per column
liquor.isnull().sum() / liquor.shape[0] * 100.00

invoice_and_item_number    0.411084
date                       0.452761
store_number               2.595805
store_name                 2.385587
address                    3.716324
city                       3.353830
zip_code                   3.509359
county_number              3.364199
county                     6.762247
category                   4.951507
category_name              5.233695
vendor_number              5.090059
vendor_name                7.562051
item_number                7.264717
item_description           7.854405
pack                       8.718657
bottle_volume_ml           8.079464
state_bottle_cost          4.210763
state_bottle_retail        3.162214
bottles_sold               3.256752
volume_sold_liters         2.930853
volume_sold_gallons        4.641974
sale_dollars               3.773554
dtype: float64

#### Imputation 
#### Imputation by Mean Columnwise

In [8]:
#making a copy of data to run imputations
liquorA = liquor.copy()

In [9]:
mean1 = liquorA['sale_dollars'].mean()
mean1

def impute_mean(variable):
    liquorA['sale_dollars'+'_mean'] = liquorA['sale_dollars'].fillna(liquorA['sale_dollars'].mean())
    
impute_mean('sale_dollars')

liquorA['sale_dollars_mean'].isna().sum()

0

In [10]:
mean3 = liquorA['bottle_volume_ml'].mean()
mean3

def impute_mean(variable):
    liquorA['bottle_volume_ml'+'_mean'] = liquorA['bottle_volume_ml'].fillna(liquorA['bottle_volume_ml'].mean())
    
impute_mean('bottle_volume_ml')

liquorA['bottle_volume_ml_mean'].isna().sum()

0

In [11]:
mean4 = liquorA['state_bottle_cost'].mean()
mean4

def impute_mean(variable):
    liquorA['state_bottle_cost'+'_mean'] = liquorA['state_bottle_cost'].fillna(liquorA['state_bottle_cost'].mean())
    
impute_mean('state_bottle_cost')

liquorA['state_bottle_cost_mean'].isna().sum()

0

In [12]:
mean5 = liquorA['state_bottle_retail'].mean()
mean5

def impute_mean(variable):
    liquorA['state_bottle_retail'+'_mean'] = liquorA['state_bottle_retail'].fillna(liquorA['state_bottle_retail'].mean())
    
impute_mean('state_bottle_retail')

liquorA['state_bottle_retail_mean'].isna().sum()

0

In [13]:
mean6 = liquorA['bottles_sold'].mean()
mean6

def impute_mean(variable):
    liquorA['bottles_sold'+'_mean'] = liquorA['bottles_sold'].fillna(liquorA['bottles_sold'].mean())
    
impute_mean('bottles_sold')

liquorA['bottles_sold_mean'].isna().sum()

0

In [14]:
mean7 = liquorA['volume_sold_liters'].mean()
mean7

def impute_mean(variable):
    liquorA['volume_sold_liters'+'_mean'] = liquorA['volume_sold_liters'].fillna(liquorA['volume_sold_liters'].mean())
    
impute_mean('volume_sold_liters')

liquorA['volume_sold_liters_mean'].isna().sum()

0

In [15]:
mean8 = liquorA['volume_sold_gallons'].mean()
mean8

def impute_mean(variable):
    liquorA['volume_sold_gallons'+'_mean'] = liquorA['volume_sold_gallons'].fillna(liquorA['volume_sold_gallons'].mean())
    
impute_mean('volume_sold_gallons')

liquorA['volume_sold_gallons_mean'].isna().sum()

0

In [16]:
liquorA.drop(['pack', 'bottle_volume_ml', 'state_bottle_cost', 'state_bottle_retail',
       'bottles_sold', 'volume_sold_liters', 'volume_sold_gallons',
       'sale_dollars'], axis = 1, inplace = True)

In [17]:
liquorA.head()

Unnamed: 0,invoice_and_item_number,date,store_number,store_name,address,city,zip_code,county_number,county,category,...,vendor_name,item_number,item_description,sale_dollars_mean,bottle_volume_ml_mean,state_bottle_cost_mean,state_bottle_retail_mean,bottles_sold_mean,volume_sold_liters_mean,volume_sold_gallons_mean
0,INV-33179700135,4/1/2021,2576.0,Hy-Vee Wine and Spirits / Storm Lake,1250 N Lake St,Storm Lake,50588.0,11.0,BUENA VIST,1081600.0,...,SAZERAC COMPANY INC,64870.0,Fireball Cinnamon Whiskey,64.8,100.0,0.9,1.35,48.0,4.8,1.26
1,INV-33196200106,4/1/2021,2649.0,Hy-Vee #3 / Dubuque,400 Locust St,Dubuque,52001.0,31.0,DUBUQUE,1081200.0,...,McCormick Distilling Co.,65200.0,Tequila Rose Liqueur,69.0,750.0,11.5,17.25,4.0,3.0,0.79
2,INV-33184300011,4/1/2021,2539.0,Hy-Vee Food Store / Iowa Falls,640 S. Oak,Iowa Falls,50126.0,42.0,HARDIN,1031100.0,...,DIAGEO AMERICAS,38008.0,Smirnoff 80prf PET,132.78,1750.0,14.75,22.13,6.0,10.5,2.77
3,INV-33184100015,4/1/2021,4024.0,Wal-Mart 1546 / Iowa Falls,840 S Oak,Iowa Falls,50126.0,42.0,HARDIN,1031100.0,...,SAZERAC NORTH AMERICA,36648.0,Caliber Vodka,59.64,750.0,3.31,4.97,12.0,9.0,2.37
4,INV-33174200025,4/1/2021,5385.0,Vine Food & Liquor,2704 Vine St.,West Des Moines,50265.0,77.0,POLK,1012200.0,...,DIAGEO AMERICAS,4626.0,Buchanan Deluxe 12YR,62.98,750.0,20.99,31.49,2.0,1.5,0.39


#### Testing the Imputation by Mean columnwise
*      Running a Linear Regression model to test the imputation by predicting sales dollars against the columns imputed

In [18]:
X_M = liquorA.iloc[:,16:]
y_m = liquorA['sale_dollars_mean']

In [19]:
from sklearn.model_selection import train_test_split

X_trainm, X_testm, y_trainm, y_testm = train_test_split(X_M, y_m, test_size = 0.3, random_state=42)

In [20]:
from sklearn.linear_model import LinearRegression

linear_reg = LinearRegression()

linear_reg.fit(X_trainm, y_trainm)

print('The score with manual column wise mean imputation:', linear_reg.score(X_testm, y_testm))

The score with manual column wise mean imputation: 0.7228135649284693


#### Simple Imputer with Mean

In [21]:
#making a copy of data to run imputations
liquor2 = liquor.copy()

In [22]:
#all numerical columns from 15 till end
liquor_X2 = liquor2.iloc[:, 15:]
liquor_X2.columns

Index(['pack', 'bottle_volume_ml', 'state_bottle_cost', 'state_bottle_retail',
       'bottles_sold', 'volume_sold_liters', 'volume_sold_gallons',
       'sale_dollars'],
      dtype='object')

In [23]:
#SIMPLE IMPUTER WITH MEAN

from sklearn.impute import SimpleImputer

#define the imputer
simp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')

#print missing values before imputation
print('Missing Values: %d' % sum(isnan(liquor_X2).sum()))

#transforming the dataset
liquor_imputed = simp_mean.fit_transform(liquor_X2)

#count the number of NaN values in each column
print('Missing: %d' % isnan(liquor_imputed).sum().flatten())

Missing Values: 381438
Missing: 0


In [24]:
#transferring it back to pandas dataframe
liquor_imputed = pd.DataFrame.from_records(liquor_imputed, columns=liquor_X2.columns)

In [25]:
liquor_imputed.head()

Unnamed: 0,pack,bottle_volume_ml,state_bottle_cost,state_bottle_retail,bottles_sold,volume_sold_liters,volume_sold_gallons,sale_dollars
0,48.0,100.0,0.9,1.35,48.0,4.8,1.26,64.8
1,12.0,750.0,11.5,17.25,4.0,3.0,0.79,69.0
2,6.0,1750.0,14.75,22.13,6.0,10.5,2.77,132.78
3,12.0,750.0,3.31,4.97,12.0,9.0,2.37,59.64
4,12.0,750.0,20.99,31.49,2.0,1.5,0.39,62.98


In [26]:
X = liquor_imputed.iloc[:,:7]
y =  liquor_imputed.iloc[:,7]

In [27]:
from sklearn.model_selection import train_test_split
X = liquor_imputed.iloc[:,:7]
y =  liquor_imputed.iloc[:,7]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state=42)

In [28]:
from sklearn.linear_model import LinearRegression

linear_reg = LinearRegression()

linear_reg.fit(X_train, y_train)

print('The score with simple imputer with mean imputation:',linear_reg.score(X_test, y_test))

The score with simple imputer with mean imputation: 0.7255197758885583


#### Simple Imputer with Median 

In [29]:
#SIMPLE IMPUTER WITH MEDIAN

from sklearn.impute import SimpleImputer

#define the imputer
simp_median2 = SimpleImputer(missing_values=np.nan, strategy='median')

#print missing values before imputation
print('Missing Values: %d' % sum(isnan(liquor_X2).sum()))

#transforming the dataset
liquor_imputed2 = simp_median2.fit_transform(liquor_X2)

#count the number of NaN values in each column
print('Missing Values after imputation: %d' % isnan(liquor_imputed2).sum().flatten())

Missing Values: 381438
Missing Values after imputation: 0


In [30]:
liquor_imputed2 = pd.DataFrame.from_records(liquor_imputed2, columns=liquor_X2.columns)

In [31]:
from sklearn.model_selection import train_test_split
X2 = liquor_imputed2.iloc[:,:7]
y2 =  liquor_imputed2.iloc[:,7]

X_train2, X_test2, y_train2, y_test2 = train_test_split(X2, y2, test_size = 0.3, random_state=42)

In [32]:
from sklearn.linear_model import LinearRegression

linear_reg2 = LinearRegression()

linear_reg2.fit(X_train2, y_train2)

print('The score with simple imputer with median imputation:',linear_reg2.score(X_test2, y_test2))

The score with simple imputer with median imputation: 0.7256714821256673


#### Simple Imputer with Mode

In [33]:
#SIMPLE IMPUTER WITH MODE

from sklearn.impute import SimpleImputer

#define the imputer
simp_mode3 = SimpleImputer(missing_values=np.nan, strategy='most_frequent')

#print missing values before imputation
print('Missing Values: %d' % sum(isnan(liquor_X2).sum()))

#transforming the dataset
liquor_imputed3 = simp_mode3.fit_transform(liquor_X2)

#count the number of NaN values in each column
print('Missing Values after imputation: %d' % isnan(liquor_imputed3).sum().flatten())

Missing Values: 381438
Missing Values after imputation: 0


In [34]:
liquor_imputed3 = pd.DataFrame.from_records(liquor_imputed3, columns=liquor_X2.columns)

In [35]:
from sklearn.model_selection import train_test_split
X3 = liquor_imputed3.iloc[:,:7]
y3 =  liquor_imputed3.iloc[:,7]

X_train3, X_test3, y_train3, y_test3 = train_test_split(X3, y3, test_size = 0.3, random_state=42)

In [36]:
from sklearn.linear_model import LinearRegression

linear_reg3 = LinearRegression()

linear_reg3.fit(X_train3, y_train3)

print('The score with simple imputer with mode imputation:',linear_reg3.score(X_test3, y_test3))

The score with simple imputer with mode imputation: 0.7253684680456922


#### Iterative Imputer with Mean

In [37]:
#ITERATIVE IMPUTER WITH MEAN

from sklearn.experimental import enable_iterative_imputer  
from sklearn.impute import IterativeImputer
#defining the imputer
imp_mean = IterativeImputer(max_iter = 10, random_state = 42)

# print total missing before imputation
print('Missing Values: %d' % sum(isnan(liquor_X2).sum()))

#transform the dataset
liquor_imputed4 = imp_mean.fit_transform(liquor_X2)

#print total missing values after mean imputation
print('Missing Values After Imputation: %d' % sum(isnan(liquor_imputed4).sum().flatten()))

Missing Values: 381438
Missing Values After Imputation: 0


In [38]:
liquor_imputed4 = pd.DataFrame.from_records(liquor_imputed4, columns=liquor_X2.columns)

In [39]:
from sklearn.model_selection import train_test_split
X4 = liquor_imputed4.iloc[:,:7]
y4 =  liquor_imputed4.iloc[:,7]

X_train4, X_test4, y_train4, y_test4 = train_test_split(X4, y4, test_size = 0.3, random_state=42)

In [40]:
from sklearn.linear_model import LinearRegression

linear_reg4 = LinearRegression()

linear_reg4.fit(X_train4, y_train4)

print('The score with iterative imputer with mean imputation:',linear_reg4.score(X_test4, y_test4))

The score with iterative imputer with mean imputation: 0.7282854684908151


#### Iterative Imputer with Median

In [41]:
liquor3 = liquor.copy()

In [42]:
#all numerical columns from 15 till end
liquor_X3 = liquor3.iloc[:, 15:]
liquor_X3.columns

Index(['pack', 'bottle_volume_ml', 'state_bottle_cost', 'state_bottle_retail',
       'bottles_sold', 'volume_sold_liters', 'volume_sold_gallons',
       'sale_dollars'],
      dtype='object')

In [43]:
#ITERATIVE IMPUTER WITH MEDIAN

from sklearn.experimental import enable_iterative_imputer  
from sklearn.impute import IterativeImputer
#defining the imputer
imp_median = IterativeImputer(initial_strategy = 'median', max_iter = 10, random_state = 42)

# print total missing before imputation
print('Missing Values: %d' % sum(isnan(liquor_X3).sum()))

#transform the dataset
liquor_imputed5 = imp_median.fit_transform(liquor_X3)

#print total missing values after mean imputation
print('Missing Values After Imputation: %d' % sum(isnan(liquor_imputed5).sum().flatten()))

Missing Values: 381438
Missing Values After Imputation: 0


In [44]:
liquor_imputed5 = pd.DataFrame.from_records(liquor_imputed5, columns=liquor_X3.columns)

In [45]:
from sklearn.model_selection import train_test_split
X5 = liquor_imputed5.iloc[:,:7]
y5 =  liquor_imputed5.iloc[:,7]

X_train5, X_test5, y_train5, y_test5 = train_test_split(X5, y5, test_size = 0.3, random_state=42)

In [46]:
from sklearn.linear_model import LinearRegression

linear_reg5 = LinearRegression()

linear_reg5.fit(X_train5, y_train5)

print('The score with iterative imputer with median imputation:',linear_reg5.score(X_test5, y_test5))

The score with iterative imputer with median imputation: 0.7284178754816684


#### Iterative Imputer with Mode

In [47]:
#ITERATIVE IMPUTER WITH MODE

from sklearn.experimental import enable_iterative_imputer  
from sklearn.impute import IterativeImputer
#defining the imputer
imp_mode = IterativeImputer(initial_strategy = 'most_frequent', max_iter = 10, random_state = 42)

# print total missing before imputation
print('Missing Values: %d' % sum(isnan(liquor_X2).sum()))

#transform the dataset
liquor_imputed6 = imp_mode.fit_transform(liquor_X2)

#print total missing values after mean imputation
print('Missing Values After Imputation: %d' % sum(isnan(liquor_imputed6).sum().flatten()))

Missing Values: 381438
Missing Values After Imputation: 0


In [48]:
liquor_imputed6 = pd.DataFrame.from_records(liquor_imputed6, columns=liquor_X3.columns)

In [49]:
from sklearn.model_selection import train_test_split
X6 = liquor_imputed6.iloc[:,:7]
y6 =  liquor_imputed6.iloc[:,7]

X_train6, X_test6, y_train6, y_test6 = train_test_split(X6, y6, test_size = 0.3, random_state=42)

In [50]:
from sklearn.linear_model import LinearRegression

linear_reg6 = LinearRegression()

linear_reg6.fit(X_train6, y_train6)

linear_reg6.score(X_test6, y_test6)

0.7283068871810945

####  Iterative Imputer with Extra Tree Regressor

In [None]:
#ITERATIVE IMPUTER WITH EXTRA TREE REGRESSOR

from sklearn.ensemble import ExtraTreesRegressor

imp_tree = IterativeImputer(
    estimator=ExtraTreesRegressor(), max_iter=10, random_state=42)

#transform the dataset
liquor_imputed7 = imp_tree.fit_transform(liquor_X3)

In [None]:
liquor_imputed7 = pd.DataFrame.from_records(liquor_imputed7, columns=liquor_X3.columns)

In [None]:
from sklearn.model_selection import train_test_split
X7 = liquor_imputed7.iloc[:,:7]
y7 =  liquor_imputed7.iloc[:,7]

X_train7, X_test7, y_train7, y_test7 = train_test_split(X7, y7, test_size = 0.3, random_state=42)

In [None]:
from sklearn.linear_model import LinearRegression

linear_reg7 = LinearRegression()

linear_reg7.fit(X_train7, y_train7)

linear_reg7.score(X_test7, y_test7)
#0.7267532647058694

#### Iterative Imputer with Bayesian Ridge

In [None]:
#ITERATIVE IMPUTER WITH BAYESIAN RIDGE

from sklearn.linear_model import BayesianRidge

imp_bridge = IterativeImputer(
    estimator=BayesianRidge(), max_iter=10, random_state=42)

#transform the dataset
liquor_imputed8 = imp_tree.fit_transform(liquor_X2)

In [None]:
liquor_imputed8 = pd.DataFrame.from_records(liquor_imputed8, columns=liquor_X3.columns)

In [None]:
from sklearn.model_selection import train_test_split
X8 = liquor_imputed8.iloc[:,:7]
y8 =  liquor_imputed8.iloc[:,7]

X_train8, X_test8, y_train8, y_test8 = train_test_split(X8, y8, test_size = 0.3, random_state=42)

In [None]:
from sklearn.linear_model import LinearRegression

linear_reg8 = LinearRegression()

linear_reg8.fit(X_train8, y_train8)

linear_reg8.score(X_test8, y_test8)
#0.7272483805724858

#### Iterative Imputer with KNN Imputer

In [None]:
#IMPUTATION WITH KNN IMPUTER

from sklearn.impute import KNNImputer

knn_imp = KNNImputer(n_neighbors=3)
# Fit/transform
liquor_imputed9 = knn_imp.fit_transform(liquor_X2)

In [None]:
liquor_imputed9 = pd.DataFrame.from_records(liquor_imputed9, columns=liquor_X3.columns)

In [None]:
from sklearn.model_selection import train_test_split
X9 = liquor_imputed8.iloc[:,:7]
y9 =  liquor_imputed8.iloc[:,7]

X_train9, X_test9, y_train9, y_test9 = train_test_split(X9, y9, test_size = 0.3, random_state=42)

In [None]:
from sklearn.linear_model import LinearRegression

linear_reg9 = LinearRegression()

linear_reg9.fit(X_train9, y_train9)

linear_reg9.score(X_test9, y_test9)
#0.7272483805724858

In [51]:
#Continue running from here
liquor_imputed4.head()

Unnamed: 0,pack,bottle_volume_ml,state_bottle_cost,state_bottle_retail,bottles_sold,volume_sold_liters,volume_sold_gallons,sale_dollars
0,48.0,100.0,0.9,1.35,48.0,4.8,1.26,64.8
1,12.0,750.0,11.5,17.25,4.0,3.0,0.79,69.0
2,6.0,1750.0,14.75,22.13,6.0,10.5,2.77,132.78
3,12.0,750.0,3.31,4.97,12.0,9.0,2.37,59.64
4,12.0,750.0,20.99,31.49,2.0,1.5,0.39,62.98


In [54]:
#liquorA.drop(['pack_mean', 'bottle_volume_ml_mean', 'state_bottle_cost_mean', 'state_bottle_retail_mean',
#       'bottles_sold_mean', 'volume_sold_liters_mean', 'volume_sold_gallons_mean',
#       'sale_dollars_mean'], axis = 1, inplace = True)

#### Follwing shows that Imputed Data now has 0 missing values

In [53]:
#our imputed data now has 0 missing values
liquor_imputed5.isnull().sum().sum()/liquor.size*100

0.0

#### Conclusive Remarks for Imputation
*    Linear regression model was run to evaluate the imputations run on the data. From the imputed data sale_dollars variable
     was taken as an independent variable and rest of the imputed columns as dependent variables and the scores were compared.
     Iterative imputer with median gave the best performance with the highest score when tested on the test data. With 
     this investigative technique it showed that it is statistically significant then using mean techniques on this data. 
     
     
*    Summary of scores:
*    - Mean for numerical variables : 0.7255
     - Simple Imputer with mean : 0.7256
     - Simple Imputer with median : 0.7253
     - Simple Imputer with mode : 0.7252
     - Iterative Imputer with mean : 0.7254
     - Iterative Imputer with median : 0.738
     - Iterative Imputer with mode : 0.7203
     - Iterative Imputer with extra tree regressor : 0.726 
     - Iterative Imputer with bayesian ridge : 0.7272
     - Imputation with KNN imputer : 0.7272