BUSINESS DESCRIPTION:

The dataset offers an opportunity to analyze tipping behavior in our restaurant. It records factors such as total bill, tip amount, customer gender, smoking preference, day of the week, time of day, and dining party size, which together shed light on tipping patterns. By analyzing this data, we aim to uncover the correlations and trends that drive tipping behavior. The results will guide staffing allocation during peak periods, customer experiences tailored to demographics and dining contexts, and focused marketing initiatives to attract and retain customers.


Pre-Processing of Data:

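Handle Missing Values: Fill the null values in the total_bill, smoker, sex, day, and time columns using measures of central tendency: the mean for the numeric total_bill column and the most frequent value (mode) for the categorical columns.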
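The same mean/mode rule can also be applied directly in pandas before converting to NumPy. This is a minimal sketch on a few hypothetical rows (the values are made up; only the column names come from the dataset), shown as an alternative to the SimpleImputer cells used later in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical rows using the tips column names; values are illustrative only.
df = pd.DataFrame({
    "total_bill": [16.99, np.nan, 21.01, 23.68],
    "smoker": ["Yes", "No", np.nan, "No"],
    "time": ["Dinner", "Lunch", "Lunch", np.nan],
})

# Numeric column: replace NaN with the column mean (NaN is skipped when computing it).
df["total_bill"] = df["total_bill"].fillna(df["total_bill"].mean())

# Categorical columns: replace NaN with the most frequent value.
for col in ["smoker", "time"]:
    # mode() returns its results in sorted order, so [0] picks the smallest on a tie.
    df[col] = df[col].fillna(df[col].mode()[0])

print(df)
```

Here the missing bill becomes (16.99 + 21.01 + 23.68) / 3 = 20.56, the missing smoker value becomes "No", and the missing time becomes "Lunch".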


Encode Categorical Variables: Convert the categorical variables into numerical format. The binary columns smoker and sex are label-encoded as 0/1, and the multi-category columns day and time are one-hot encoded with OneHotEncoder.
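A pandas-only version of the same encoding scheme is sketched below on three illustrative rows. The explicit 0/1 mappings are an assumption chosen to match the alphabetical coding LabelEncoder produces (No=0, Yes=1; Female=0, Male=1); `pd.get_dummies` stands in for OneHotEncoder here.

```python
import pandas as pd

# Illustrative rows with the dataset's categorical columns.
df = pd.DataFrame({
    "smoker": ["Yes", "No", "No"],
    "sex": ["Female", "Male", "Male"],
    "day": ["Sun", "Sat", "Fri"],
    "time": ["Dinner", "Lunch", "Breakfast"],
})

# Binary columns: map each category to 0/1 explicitly so the coding is documented.
df["smoker"] = df["smoker"].map({"No": 0, "Yes": 1})
df["sex"] = df["sex"].map({"Female": 0, "Male": 1})

# Multi-category columns: one 0/1 indicator column per category.
encoded = pd.get_dummies(df, columns=["day", "time"], dtype=int)
print(encoded.columns.tolist())
```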


Feature Scaling: Standardize or normalize numerical features such as total_bill.
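Standardization maps each value to z = (x - mean) / std, where StandardScaler uses the population standard deviation (ddof=0). The sketch below, using the first four rows' total_bill, smoker, and sex values from the notebook, scales only the continuous total_bill column and passes the already-encoded 0/1 columns through unchanged. Restricting scaling to total_bill is a design choice for illustration; the notebook itself scales every column.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Columns: total_bill, smoker (0/1), sex (0/1).
X = np.array([[16.99, 1, 0],
              [10.34, 0, 1],
              [21.01, 0, 1],
              [23.68, 1, 1]])

# Standardize only column 0; keep the binary columns as they are.
scaler = ColumnTransformer([("scale", StandardScaler(), [0])], remainder="passthrough")
X_scaled = scaler.fit_transform(X)
print(X_scaled)
```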


Data Splitting: Split the dataset into training and testing sets.
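With 13 rows and `test_size=0.2`, scikit-learn rounds the test size up: ceil(0.2 × 13) = 3 test rows and 10 training rows. The sketch below confirms this on placeholder arrays of the same length (the values are arbitrary, not the tips data).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 13 placeholder rows, the same count as the notebook's dataset.
X = np.arange(13 * 2).reshape(13, 2)
y = np.arange(13)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# The rows are shuffled, then ceil(0.2 * 13) = 3 go to the test set.
print(X_train.shape, X_test.shape)  # → (10, 2) (3, 2)
```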


In [31]:
import pandas as pd
import matplotlib.pyplot as plt

In [32]:
df = pd.read_csv("tips.csv")
df

   total_bill smoker     sex  day    time   tip
0       16.99    Yes  Female  Sun  Dinner  1.01
1       10.34     No    Male  Sun  Dinner  1.66
2       21.01     No    Male  Sun   Lunch  3.50
3       23.68    Yes    Male  Sun   Lunch  1.02
4       24.59     No     NaN  Sat   Lunch  3.61
5       25.29    Yes    Male  Sun     NaN  4.71
6        8.77     No     NaN  Sun  Dinner  2.00
7       26.88    NaN    Male  Mon   Lunch  3.12
8       15.04    NaN  Female  Sun  Dinner  1.96
9         NaN     No    Male  NaN   Lunch  3.23
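Before imputing, it helps to confirm which columns actually contain gaps. In the notebook this is simply `df.isna().sum()` on the loaded frame; the sketch below rebuilds the ten displayed rows by hand so it runs without tips.csv.

```python
import numpy as np
import pandas as pd

# The ten rows displayed above; NaN marks the blank cells.
rows = [
    [16.99, "Yes", "Female", "Sun", "Dinner", 1.01],
    [10.34, "No", "Male", "Sun", "Dinner", 1.66],
    [21.01, "No", "Male", "Sun", "Lunch", 3.50],
    [23.68, "Yes", "Male", "Sun", "Lunch", 1.02],
    [24.59, "No", np.nan, "Sat", "Lunch", 3.61],
    [25.29, "Yes", "Male", "Sun", np.nan, 4.71],
    [8.77, "No", np.nan, "Sun", "Dinner", 2.00],
    [26.88, np.nan, "Male", "Mon", "Lunch", 3.12],
    [15.04, np.nan, "Female", "Sun", "Dinner", 1.96],
    [np.nan, "No", "Male", np.nan, "Lunch", 3.23],
]
df = pd.DataFrame(rows, columns=["total_bill", "smoker", "sex", "day", "time", "tip"])

# Count missing values per column to decide which imputation each column needs.
missing = df.isna().sum()
print(missing)
```

In these rows every feature column has at least one gap, while the tip target has none.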


In [33]:
X = df.iloc[:, :-1].values  # features: every column except the last (tip)
y = df.iloc[:, -1].values   # target: the last column (tip)
X

array([[16.99, 'Yes', 'Female', 'Sun', 'Dinner'],
       [10.34, 'No', 'Male', 'Sun', 'Dinner'],
       [21.01, 'No', 'Male', 'Sun', 'Lunch'],
       [23.68, 'Yes', 'Male', 'Sun', 'Lunch'],
       [24.59, 'No', nan, 'Sat', 'Lunch'],
       [25.29, 'Yes', 'Male', 'Sun', nan],
       [8.77, 'No', nan, 'Sun', 'Dinner'],
       [26.88, nan, 'Male', 'Mon', 'Lunch'],
       [15.04, nan, 'Female', 'Sun', 'Dinner'],
       [nan, 'No', 'Male', nan, 'Lunch'],
       [10.27, 'Yes', 'Male', 'Sun', 'Dinner'],
       [35.26, 'No', 'Female', 'Fri', 'Breakfast'],
       [15.42, 'Yes', 'Male', nan, nan]], dtype=object)

In [34]:
y

array([1.01, 1.66, 3.5 , 1.02, 3.61, 4.71, 2.  , 3.12, 1.96, 3.23, 1.71,
       5.  , 2.7 ])
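Positional slicing with `iloc` silently breaks if the CSV's column order changes. Selecting by column name is a more robust alternative; the sketch below uses two rows from the dataset to show it.

```python
import pandas as pd

# Two sample rows with the notebook's columns.
df = pd.DataFrame({
    "total_bill": [16.99, 10.34],
    "smoker": ["Yes", "No"],
    "sex": ["Female", "Male"],
    "day": ["Sun", "Sun"],
    "time": ["Dinner", "Dinner"],
    "tip": [1.01, 1.66],
})

# Select by name: the target is always "tip", wherever it sits in the file.
X = df.drop(columns="tip").values
y = df["tip"].values
print(X.shape, y)
```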

In [35]:
from sklearn.impute import SimpleImputer

In [36]:
# Replace the missing total_bill value (column 0) with the column mean
imputer = SimpleImputer(strategy="mean")
imputer = imputer.fit(X[:,0:1])
X[:,0:1] = imputer.transform(X[:,0:1])
X

array([[16.99, 'Yes', 'Female', 'Sun', 'Dinner'],
       [10.34, 'No', 'Male', 'Sun', 'Dinner'],
       [21.01, 'No', 'Male', 'Sun', 'Lunch'],
       [23.68, 'Yes', 'Male', 'Sun', 'Lunch'],
       [24.59, 'No', nan, 'Sat', 'Lunch'],
       [25.29, 'Yes', 'Male', 'Sun', nan],
       [8.77, 'No', nan, 'Sun', 'Dinner'],
       [26.88, nan, 'Male', 'Mon', 'Lunch'],
       [15.04, nan, 'Female', 'Sun', 'Dinner'],
       [19.461666666666666, 'No', 'Male', nan, 'Lunch'],
       [10.27, 'Yes', 'Male', 'Sun', 'Dinner'],
       [35.26, 'No', 'Female', 'Fri', 'Breakfast'],
       [15.42, 'Yes', 'Male', nan, nan]], dtype=object)
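The fill value 19.461666... is the mean of the 12 observed bills (233.54 / 12). The fitted imputer stores it in `statistics_`, which the sketch below checks against `np.nanmean` using the total_bill column from the output above.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# total_bill column from the notebook, including its one missing value (row 9).
total_bill = np.array([16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77,
                       26.88, 15.04, np.nan, 10.27, 35.26, 15.42]).reshape(-1, 1)

imputer = SimpleImputer(strategy="mean")
filled = imputer.fit_transform(total_bill)

# The learned fill value equals the mean of the non-missing entries.
print(imputer.statistics_)
```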

In [37]:
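# Replace missing smoker, sex, day and time values (columns 1-4) with each
# column's most frequent value; on a tie, SimpleImputer keeps the smallest value
imputer = SimpleImputer(strategy="most_frequent")
imputer = imputer.fit(X[:,1:5])
X[:,1:5] = imputer.transform(X[:,1:5])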
X

array([[16.99, 'Yes', 'Female', 'Sun', 'Dinner'],
       [10.34, 'No', 'Male', 'Sun', 'Dinner'],
       [21.01, 'No', 'Male', 'Sun', 'Lunch'],
       [23.68, 'Yes', 'Male', 'Sun', 'Lunch'],
       [24.59, 'No', 'Male', 'Sat', 'Lunch'],
       [25.29, 'Yes', 'Male', 'Sun', 'Dinner'],
       [8.77, 'No', 'Male', 'Sun', 'Dinner'],
       [26.88, 'No', 'Male', 'Mon', 'Lunch'],
       [15.04, 'No', 'Female', 'Sun', 'Dinner'],
       [19.461666666666666, 'No', 'Male', 'Sun', 'Lunch'],
       [10.27, 'Yes', 'Male', 'Sun', 'Dinner'],
       [35.26, 'No', 'Female', 'Fri', 'Breakfast'],
       [15.42, 'Yes', 'Male', 'Sun', 'Dinner']], dtype=object)
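In the time column, "Dinner" and "Lunch" each appear five times. With the "most_frequent" strategy, SimpleImputer breaks such ties by returning the smallest value, and "Dinner" sorts before "Lunch", which is why rows 5 and 12 were filled with "Dinner". The sketch below reproduces this on the time column alone.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# time column from the notebook (rows 0-12); rows 5 and 12 are missing.
time = np.array(["Dinner", "Dinner", "Lunch", "Lunch", "Lunch", np.nan, "Dinner",
                 "Lunch", "Dinner", "Lunch", "Dinner", "Breakfast", np.nan],
                dtype=object).reshape(-1, 1)

imputer = SimpleImputer(strategy="most_frequent")
filled = imputer.fit_transform(time)

# Dinner and Lunch tie at 5; the lexicographically smallest ("Dinner") wins.
print(imputer.statistics_)
```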

In [38]:
from sklearn.preprocessing import LabelEncoder
# Encode smoker (column 1) alphabetically: No -> 0, Yes -> 1
label_encoder = LabelEncoder()
X[:,1] = label_encoder.fit_transform(X[:,1])

In [39]:
# Encode sex (column 2) alphabetically: Female -> 0, Male -> 1
label_encoder = LabelEncoder()
X[:,2] = label_encoder.fit_transform(X[:,2])

In [40]:
X

array([[16.99, 1, 0, 'Sun', 'Dinner'],
       [10.34, 0, 1, 'Sun', 'Dinner'],
       [21.01, 0, 1, 'Sun', 'Lunch'],
       [23.68, 1, 1, 'Sun', 'Lunch'],
       [24.59, 0, 1, 'Sat', 'Lunch'],
       [25.29, 1, 1, 'Sun', 'Dinner'],
       [8.77, 0, 1, 'Sun', 'Dinner'],
       [26.88, 0, 1, 'Mon', 'Lunch'],
       [15.04, 0, 0, 'Sun', 'Dinner'],
       [19.461666666666666, 0, 1, 'Sun', 'Lunch'],
       [10.27, 1, 1, 'Sun', 'Dinner'],
       [35.26, 0, 0, 'Fri', 'Breakfast'],
       [15.42, 1, 1, 'Sun', 'Dinner']], dtype=object)
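scikit-learn documents LabelEncoder as intended for target labels (y). For input features, OrdinalEncoder does the same alphabetical integer coding but accepts several columns at once. The sketch below encodes sample smoker and sex values from the dataset; note that OrdinalEncoder returns floats.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Columns: smoker, sex (sample rows from the dataset).
features = np.array([["Yes", "Female"],
                     ["No", "Male"],
                     ["No", "Male"]], dtype=object)

encoder = OrdinalEncoder()
codes = encoder.fit_transform(features)

# categories_ lists each column's sorted categories; a value's code is its index there.
print(encoder.categories_)
print(codes)
```

The codes match the notebook's LabelEncoder output (Yes/Female row becomes 1, 0), but a single fitted encoder handles both columns.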

In [41]:
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
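# One-hot encode day (column 3) and time (column 4); the remaining columns
# (total_bill, smoker, sex) are passed through and appended after the encoded ones
ct = ColumnTransformer(
    [("day", OneHotEncoder(), [3]), ("time", OneHotEncoder(), [4])],
    remainder="passthrough",
)
X = ct.fit_transform(X)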

In [42]:
X

array([[0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 16.99, 1, 0],
       [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 10.34, 0, 1],
       [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 21.01, 0, 1],
       [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 23.68, 1, 1],
       [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 24.59, 0, 1],
       [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 25.29, 1, 1],
       [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 8.77, 0, 1],
       [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 26.88, 0, 1],
       [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 15.04, 0, 0],
       [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 19.461666666666666, 0, 1],
       [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 10.27, 1, 1],
       [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 35.26, 0, 0],
       [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 15.42, 1, 1]], dtype=object)

In [43]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2, random_state = 0)


In [45]:
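from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
# Learn the mean and standard deviation from the training set only
X_train = sc_X.fit_transform(X_train)
print(X_train)
# Apply the training statistics to the test set (transform, not fit_transform)
X_test = sc_X.transform(X_test)
print(X_test)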

[[ 0.         -0.33333333  0.          0.33333333  0.          0.81649658
  -0.81649658 -1.47083674  1.          0.5       ]
 [ 0.         -0.33333333  0.          0.33333333  0.         -1.22474487
   1.22474487  0.46310845 -1.          0.5       ]
 [ 0.         -0.33333333  0.          0.33333333  0.          0.81649658
  -0.81649658 -0.61190578 -1.         -2.        ]
 [ 0.         -0.33333333  0.          0.33333333  0.          0.81649658
  -0.81649658 -1.45823189 -1.          0.5       ]
 [ 0.          3.          0.         -3.          0.         -1.22474487
   1.22474487  1.52011573 -1.          0.5       ]
 [ 0.         -0.33333333  0.          0.33333333  0.         -1.22474487
   1.22474487  0.18430101 -1.          0.5       ]
 [ 0.         -0.33333333  0.          0.33333333  0.         -1.22474487
   1.22474487  0.9438937   1.          0.5       ]
 [ 0.         -0.33333333  0.          0.33333333  0.          0.81649658
  -0.81649658 -0.26077048  1.         -2.        ]
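Calling `fit_transform` on the test set would rescale it with its own statistics, leaking test information and putting the two sets on different scales. The sketch below, on a few total_bill values from the dataset, checks that `transform` reuses the mean and standard deviation learned from the training rows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Sample total_bill values split into training and test rows.
X_train = np.array([[16.99], [10.34], [21.01], [23.68]])
X_test = np.array([[24.59], [8.77]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean_ and scale_ from training rows
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics

print(scaler.mean_, scaler.scale_)
print(X_test_scaled)
```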
