## Waiter Tips Case Study


Description of the dataset:

1. total_bill: total bill in dollars, including tax
2. tip: tip given to the waiter, in dollars
3. sex: sex of the person paying the bill
4. smoker: whether there were smokers in the party
5. day: day of the week (Thur, Fri, Sat, Sun)
6. time: meal time, Lunch or Dinner
7. size: number of people at the table

In [6]:
# Import necessary libraries
# Note: trendline="ols" in the scatter plots below also requires statsmodels to be installed.

import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go


In [2]:
# Load the data
data = pd.read_csv("tips.csv")
print(data.head())

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
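Before plotting, it can help to confirm that the loaded columns match the dataset description above. A minimal sketch, assuming the file uses exactly the column names shown in the `head()` output and the category labels used by the encoding cell later in this notebook; `check_tips_schema` is a hypothetical helper, and the example frame reuses the five rows printed above.

```python
import pandas as pd

# Column names and category labels taken from the dataset description and the
# encoding cell later in this notebook; other files may use different labels.
EXPECTED_COLUMNS = ["total_bill", "tip", "sex", "smoker", "day", "time", "size"]
ALLOWED_VALUES = {
    "sex": {"Female", "Male"},
    "smoker": {"No", "Yes"},
    "day": {"Thur", "Fri", "Sat", "Sun"},
    "time": {"Lunch", "Dinner"},
}


def check_tips_schema(df: pd.DataFrame) -> list:
    """Return a list of problems; an empty list means the frame matches the description."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    for col, allowed in ALLOWED_VALUES.items():
        if col in df.columns:
            # Any label outside the allowed set would become NaN in the encoding step.
            unexpected = set(df[col].dropna().unique()) - allowed
            if unexpected:
                problems.append(f"unexpected values in {col!r}: {sorted(unexpected)}")
    return problems


# The five rows printed by data.head() above.
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "smoker": ["No"] * 5,
    "day": ["Sun"] * 5,
    "time": ["Dinner"] * 5,
    "size": [2, 3, 3, 2, 4],
})
print(check_tips_schema(sample))  # → []
```

On the full file you would call `check_tips_schema(data)` right after `pd.read_csv`.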


In [7]:
# Scatter plot of tip against total bill, colored by day and sized by party size,
# to see whether the relationship between bill and tip differs across days.

figure = px.scatter(data_frame=data, x="total_bill", y="tip",
                    size="size", color="day", trendline="ols")
figure.show()
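With `trendline="ols"` and `color="day"`, Plotly Express fits a separate ordinary-least-squares line for each color group. A rough NumPy equivalent, assuming you only want the slope and intercept of `tip` on `total_bill` per group; `ols_by_group` is a hypothetical helper (not part of Plotly), and the toy rows are made up so the result is easy to check by eye.

```python
import numpy as np
import pandas as pd


def ols_by_group(df: pd.DataFrame, group: str, x: str = "total_bill", y: str = "tip") -> pd.DataFrame:
    """Fit y = intercept + slope * x separately for each value of `group`.

    Each group needs at least two distinct x values for the fit to be defined.
    """
    rows = []
    for key, sub in df.groupby(group):
        # np.polyfit with deg=1 returns [slope, intercept] of the least-squares line.
        slope, intercept = np.polyfit(sub[x], sub[y], deg=1)
        rows.append({group: key, "slope": slope, "intercept": intercept, "n": len(sub)})
    return pd.DataFrame(rows)


# Hypothetical rows: Thur follows tip = 0.5 + 0.1 * bill, Sat follows tip = 0.2 * bill.
toy = pd.DataFrame({
    "total_bill": [10, 20, 30, 10, 20, 30],
    "tip": [1.5, 2.5, 3.5, 2.0, 4.0, 6.0],
    "day": ["Thur"] * 3 + ["Sat"] * 3,
})
print(ols_by_group(toy, "day"))
```

Running `ols_by_group(data, "day")` on the real frame gives one slope per day, which is a numeric way to compare the trendlines in the chart.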

In [8]:
# Share of total tips by day (each slice is the sum of tips for that day)

figure = px.pie(data, values="tip", names="day", hole=0.5)
figure.show()

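According to the chart above, Saturday accounts for the largest share of the total tip amount. Because each slice sums the tips for that day, this reflects how many bills were recorded on each day as well as how large the individual tips were.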
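To separate those two effects, you can put the total, the average, and the number of bills side by side. A small sketch, assuming the same `tip` column as `data`; `tip_summary` is a hypothetical helper, and the toy rows are made up purely to show a case where the largest slice does not come from the largest average tip.

```python
import pandas as pd


def tip_summary(df: pd.DataFrame, by: str) -> pd.DataFrame:
    """Total, average, count and share of total tips per category of `by`."""
    out = df.groupby(by)["tip"].agg(total="sum", mean="mean", count="count")
    out["share"] = out["total"] / out["total"].sum()  # what each pie slice shows
    return out.sort_values("total", ascending=False)


# Hypothetical rows: "Sat" has more bills but a smaller average tip than "Fri".
toy = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Fri"],
    "tip": [2.0, 2.0, 2.0, 5.0],
})
print(tip_summary(toy, "day"))
```

On the real data, `tip_summary(data, "day")` works before the encoding cell further below replaces the day labels with integers.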

In [4]:
# Tips by total bill, colored by the sex of the person paying the bill
# and sized by the number of people at the table

figure = px.scatter(data_frame=data, x="total_bill", y="tip",
                    size="size", color="sex", trendline="ols")
figure.show()

In [9]:
# Share of total tips by sex of the person paying the bill
figure = px.pie(data, values="tip", names="sex", hole=0.5)
figure.show()

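According to the chart above, bills paid by men account for the larger share of the total tip amount. As with the daily breakdown, this combines the number of bills paid by each group with the size of their tips, so on its own it does not show that men tip more per bill.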
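A common way to compare generosity across groups is the tip rate, `tip / total_bill`, which removes the effect of bill size. A minimal sketch, assuming the `data` column names above; `mean_tip_rate` is a hypothetical helper, and the example uses only the five rows printed earlier, which are far too few to draw conclusions from.

```python
import pandas as pd


def mean_tip_rate(df: pd.DataFrame, by: str) -> pd.Series:
    """Average tip as a fraction of the total bill, per category of `by`."""
    rate = df["tip"] / df["total_bill"]
    return rate.groupby(df[by]).mean()


# The first five rows printed earlier in this notebook.
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})
print(mean_tip_rate(sample, "sex").round(3))
```

On the full dataset, `mean_tip_rate(data, "sex")` gives the per-bill comparison that the pie chart cannot.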

In [5]:
# Tips by total bill, colored by meal time (lunch or dinner)
# and sized by the number of people at the table

figure = px.scatter(data_frame=data, x="total_bill", y="tip",
                    size="size", color="time", trendline="ols")
figure.show()

In [12]:
# Share of total tips by meal time
figure = px.pie(data, values="tip", names="time", hole=0.5)
figure.show()

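### Observations from the visualizations above

1. Dinner accounts for a larger share of the total tip amount than lunch.
2. In this sample, bills paid by men account for a larger share of the total tip amount than bills paid by women.
3. The higher the total bill, the larger the tip tends to be, as the upward-sloping OLS trendlines show.
4. Saturday accounts for a larger share of the total tip amount than any other day of the week.

Because the pie charts sum tip amounts, observations 1, 2 and 4 reflect how many bills fall into each group as well as how large those tips are.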
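Observation 3 can be summarized as a single number: the Pearson correlation between `total_bill` and `tip`, which is positive when larger bills go with larger tips. A minimal sketch using pandas' `Series.corr` on the five rows printed earlier; on the full dataset you would call `data["total_bill"].corr(data["tip"])` instead.

```python
import pandas as pd

# The first five rows printed earlier in this notebook.
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
})

# Pearson correlation (pandas' default method); ranges from -1 to 1.
r = sample["total_bill"].corr(sample["tip"])
print(f"correlation between total_bill and tip: {r:.2f}")
```

A correlation only describes how the two columns move together; it says nothing about why.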

## Waiter Tips Prediction Model

In [14]:
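# Encode the categorical columns as integers so the model can use them.
# Run this cell only once: re-running it would map the already-encoded integers to NaN.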

data["sex"] = data["sex"].map({"Female": 0, "Male": 1})
data["smoker"] = data["smoker"].map({"No": 0, "Yes": 1})
data["day"] = data["day"].map({"Thur": 0, "Fri": 1, "Sat": 2, "Sun": 3})
data["time"] = data["time"].map({"Lunch": 0, "Dinner": 1})
data.head()


   total_bill   tip  sex  smoker  day  time  size
0       16.99  1.01    0       0    3     1     2
1       10.34  1.66    1       0    3     1     3
2       21.01  3.50    1       0    3     1     3
3       23.68  3.31    1       0    3     1     2
4       24.59  3.61    0       0    3     1     4
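The integer codes above treat `day` as evenly spaced points on a scale, so a linear model is forced to assume the effect of moving from Thursday to Friday equals the effect of moving from Saturday to Sunday. A common alternative is one-hot encoding, which gives each day its own indicator column. A sketch using `pandas.get_dummies` on the original string labels (that is, applied instead of the mapping cell); the rows are hypothetical, and whether this improves the model here has not been tested.

```python
import pandas as pd

# Hypothetical rows with the original string labels.
raw = pd.DataFrame({
    "size": [2, 3, 4],
    "day": ["Sun", "Sat", "Thur"],
})

# One indicator column per day label. drop_first=True drops the first label in
# sorted order ("Sat" here), which avoids a redundant column next to the intercept.
encoded = pd.get_dummies(raw, columns=["day"], drop_first=True, dtype=int)
print(encoded)
```

A row with zeros in every `day_` column belongs to the dropped category, so each remaining coefficient is measured relative to that day.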


In [15]:
# Build the feature matrix and target, then hold out 20% of the rows as a test set

x = np.array(data[["total_bill", "sex", "smoker", "day", 
                   "time", "size"]])
y = np.array(data["tip"])

from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, 
                                                test_size=0.2, 
                                                random_state=42)
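`train_test_split` with `test_size=0.2` holds out 20% of the rows (rounded up) for testing and shuffles them first; `random_state=42` makes that shuffle reproducible. A quick sketch on a small stand-in array confirms the proportions and that each feature row stays paired with its target.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 10 rows, 2 features. Row i is [2i, 2i + 1] and its target is i,
# so pairing between X and y is easy to check after the shuffle.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # → (8, 2) (2, 2)

# Each feature row still lines up with its own target.
assert (X_test[:, 0] == 2 * y_test).all()
assert (X_train[:, 0] == 2 * y_train).all()
```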

In [16]:
# Train ML model

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(xtrain, ytrain)

LinearRegression()
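The notebook fits the model but does not check it against the held-out rows. A minimal sketch of that step, assuming the `model`, `xtest` and `ytest` names from the cells above; so that the snippet runs on its own, the data here is synthetic (a known linear rule plus noise), not the tips dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in: tip ≈ 0.15 * bill + 0.2 * size + noise (not the real data).
bill = rng.uniform(5, 50, size=200)
size = rng.integers(1, 7, size=200)
x = np.column_stack([bill, size])
y = 0.15 * bill + 0.2 * size + rng.normal(0, 0.3, size=200)

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(xtrain, ytrain)

# Score the model only on rows it never saw during fitting.
pred = model.predict(xtest)
print("R^2 on test set:", round(r2_score(ytest, pred), 3))
print("MAE on test set:", round(mean_absolute_error(ytest, pred), 3))
# One coefficient per feature column, in the same order as the columns of x.
print("coefficients:", model.coef_, "intercept:", model.intercept_)
```

With the real data, the last five lines apply unchanged; the mean absolute error is then in dollars of tip.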

In [17]:
# Feature order: [total_bill, sex, smoker, day, time, size]
# Example: $24.50 bill, male payer, non-smoker, Thursday, dinner, party of 4
features = np.array([[24.50, 1, 0, 0, 1, 4]])
model.predict(features)

array([3.73742609])
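Building the feature row by hand means remembering both the column order and the integer codes. A hypothetical helper, `encode_row`, that reuses the mappings from the encoding cell reproduces the row `[24.50, 1, 0, 0, 1, 4]` from readable labels. The second part shows, on a small synthetic model rather than the notebook's fitted one, that `LinearRegression.predict` equals the intercept plus the dot product of the coefficients with the row.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same mappings and column order as the encoding and split cells above.
SEX = {"Female": 0, "Male": 1}
SMOKER = {"No": 0, "Yes": 1}
DAY = {"Thur": 0, "Fri": 1, "Sat": 2, "Sun": 3}
TIME = {"Lunch": 0, "Dinner": 1}


def encode_row(total_bill: float, sex: str, smoker: str, day: str, time: str, size: int) -> np.ndarray:
    """Return a 1 x 6 array in the column order the model was trained on."""
    return np.array([[total_bill, SEX[sex], SMOKER[smoker], DAY[day], TIME[time], size]], dtype=float)


features = encode_row(24.50, "Male", "No", "Thur", "Dinner", 4)
print(features)

# Synthetic stand-in model (not the notebook's): an exact linear rule over 6 features.
rng = np.random.default_rng(1)
X_demo = rng.uniform(0, 5, size=(30, 6))
y_demo = X_demo @ np.array([0.1, 0.2, -0.1, 0.05, 0.3, 0.25]) + 1.0
demo_model = LinearRegression().fit(X_demo, y_demo)

# predict() is the intercept plus the weighted sum of the features.
manual = demo_model.intercept_ + features @ demo_model.coef_
assert np.allclose(demo_model.predict(features), manual)
```

With the notebook's own `model`, `model.intercept_ + features @ model.coef_` gives the same value as `model.predict(features)`.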