#### Waiter Tips prediction with machine learning

The dataset was recorded by a food server and has the following columns:  
**total_bill**: total bill in dollars, including taxes  
**tip**: tip given to the waiter, in dollars  
**sex**: gender of the person paying the bill  
**smoker**: whether the person smoked or not  
**day**: day of the week  
**time**: lunch or dinner  
**size**: number of people at the table  

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

print('libraries imported successfully')

libraries imported successfully


In [15]:
# Load dataset
path = "tips.csv"
df = pd.read_csv(path)
df.head()

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4


In [18]:
# Column names, dtypes, and non-null counts
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   total_bill  244 non-null    float64
 1   tip         244 non-null    float64
 2   sex         244 non-null    object 
 3   smoker      244 non-null    object 
 4   day         244 non-null    object 
 5   time        244 non-null    object 
 6   size        244 non-null    int64  
dtypes: float64(2), int64(1), object(4)
memory usage: 13.5+ KB


In [19]:
# Summary statistics for the numeric columns
df.describe()

       total_bill         tip        size
count  244.000000  244.000000  244.000000
mean    19.785943    2.998279    2.569672
std      8.902412    1.383638    0.951100
min      3.070000    1.000000    1.000000
25%     13.347500    2.000000    2.000000
50%     17.795000    2.900000    2.000000
75%     24.127500    3.562500    3.000000
max     50.810000   10.000000    6.000000
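
The summary statistics allow a quick sanity check on tipping levels: dividing the mean tip by the mean bill gives the overall tip rate across all 244 bills. The sketch below uses only the two means printed by `df.describe()`; the variable names are illustrative and do not appear in the notebook.

```python
# Means copied from the df.describe() output
mean_total_bill = 19.785943
mean_tip = 2.998279

# Ratio of means: total tips divided by total bills.
# This is not the same as the average of per-bill tip rates.
overall_tip_rate = mean_tip / mean_total_bill

print(f"Overall tip rate: {overall_tip_rate:.1%}")  # prints "Overall tip rate: 15.2%"
```

So across the whole dataset, tips amount to roughly 15% of the billed total.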


In [22]:
# Count missing values per column
df.isnull().sum()

total_bill    0
tip           0
sex           0
smoker        0
day           0
time          0
size          0
dtype: int64

#### EDA - Let's get some insights into how tips relate to:
**Total bill paid, number of people at a table, and day of the week**

In [10]:
# trendline="ols" fits a least-squares line per day; it requires the statsmodels package
figure = px.scatter(data_frame=df, x="total_bill", y="tip", size="size", color="day", trendline="ols", title="Tips vs. total bill by day of the week")
figure.show()
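
With `trendline="ols"`, Plotly Express by default draws one ordinary least squares line of `tip` on `total_bill` for each color group. As a rough illustration of what such a fit computes, the sketch below runs `numpy.polyfit` on just the five rows shown by `df.head()`, so its coefficients are not those of the full dataset. The variable names are illustrative.

```python
import numpy as np

# The five rows shown by df.head() (all Sunday dinners)
total_bill = np.array([16.99, 10.34, 21.01, 23.68, 24.59])
tip = np.array([1.01, 1.66, 3.50, 3.31, 3.61])

# Degree-1 least-squares fit: tip ≈ intercept + slope * total_bill
slope, intercept = np.polyfit(total_bill, tip, deg=1)

# Fitted values and residuals for this small sample
fitted = intercept + slope * total_bill
residuals = tip - fitted

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The slope is the estimated increase in tip, in dollars, per extra dollar on the bill. With an intercept in the model, the residuals of a least-squares fit sum to zero, which is a handy check on the computation.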

#### Tips given to waiters based on gender (male, female):
**Total bill paid, number of people, gender**

In [12]:
figure1 = px.scatter(data_frame=df, x="total_bill", y="tip", size="size", color="sex", trendline="ols", title="Tips vs. total bill by gender")
figure1.show()

#### Tips given to a waiter based on the:
**Total bill paid, Number of people at the table, and time of meal**

In [13]:
figure2 = px.scatter(data_frame=df, x="total_bill", y="tip", size="size", color="time", trendline="ols", title="Tips vs. total bill by time of day")
figure2.show()

#### Let's see what share of total tips each day accounts for

In [8]:
most_tips = px.pie(data_frame=df, values='tip', names='day', title='Tips by Day')
most_tips.show()
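
A pie chart built with `values='tip'` and `names='day'` sums the `tip` column within each day and draws each day's share of the grand total. The same numbers can be read off a `groupby`. The sketch below uses a small made-up frame (not rows from `tips.csv`) so that it runs on its own; `toy`, `share_by_day`, and `mean_tip_by_day` are illustrative names.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT taken from tips.csv
toy = pd.DataFrame({
    "day": ["Thur", "Sat", "Sat", "Sun", "Sun", "Sun"],
    "tip": [2.0, 3.0, 1.0, 2.0, 1.0, 1.0],
})

# Total tips per day, then each day's share of all tips (what the pie slices show)
tips_by_day = toy.groupby("day")["tip"].sum()
share_by_day = tips_by_day / tips_by_day.sum()

# Mean tip per bill, which does not grow just because a day has more bills
mean_tip_by_day = toy.groupby("day")["tip"].mean()

print(share_by_day)
print(mean_tip_by_day)
```

In the toy data, Sunday has one of the largest slices only because it has the most bills, while its mean tip per bill is the lowest. Running the same two `groupby` calls on `df` instead of `toy` would show whether the notebook's data behaves similarly.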

#### Gender of the person paying tips

In [14]:
person_paying = px.pie(data_frame=df, values='tip', names='sex', hole=0.5, title='Gender of the person giving tips')
person_paying.show()

#### Share of total tips paid by smokers and non-smokers

In [24]:
smoker_tips = px.pie(data_frame=df, values='tip', names='smoker', title='Tips by Smoker')
smoker_tips.show()
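
A pie of summed tips shows which group contributed more tip money overall, but not how generously each group tips per bill. One way to compare the groups on an equal footing is to compute the tip as a fraction of the bill for every row and average it per group. The sketch below does this on made-up rows (not from `tips.csv`); the `tip_rate` column and the `summary` name are assumptions introduced for illustration.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT taken from tips.csv
toy = pd.DataFrame({
    "smoker": ["Yes", "Yes", "No", "No"],
    "total_bill": [20.0, 10.0, 40.0, 20.0],
    "tip": [3.0, 2.0, 4.0, 4.0],
})

# Tip as a fraction of the bill, computed per row
toy["tip_rate"] = toy["tip"] / toy["total_bill"]

# Number of bills and average tip rate in each group
summary = toy.groupby("smoker")["tip_rate"].agg(["count", "mean"])

print(summary)
```

The `count` column matters when reading the pie: a group with more bills contributes a larger slice even if its average tip rate is lower.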

#### When are most tips given: lunch or dinner?

In [25]:
time_most_tips = px.pie(data_frame=df, values='tip', names='time', title='Time of the Day most tips are given', hole=0.5)
time_most_tips.show()