### 🧩 Case Study: Analyzing Factors Affecting Restaurant Tips
🎯 Objective

Use data analysis and visualization to understand what factors influence the amount of tip customers leave in restaurants.

You’ll explore relationships between variables such as:

- Total bill
- Gender of the customer
- Day of the week
- Time of day (Lunch/Dinner)
- Party size

📦 1. Dataset

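Use Seaborn’s tips example dataset. sns.load_dataset() downloads it on first use and caches it locally, so the first call needs an internet connection: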

import seaborn as sns
tips = sns.load_dataset("tips")
tips.head()


Columns include:

- total_bill: Total bill amount (in dollars)
- tip: Tip amount (in dollars)
- sex: Gender of the customer
- smoker: Whether the customer was a smoker or not
- day: Day of the week
- time: Time of meal (Lunch/Dinner)
- size: Number of people in the party

🔧 2. Tools Used

- Pandas → Data cleaning, aggregation, and manipulation
- NumPy → Numerical operations, statistics, and derived metrics
- Matplotlib & Seaborn → Data visualization and pattern discovery

🧹 3. Data Cleaning & Preprocessing

Tasks:

Check for missing values and duplicates.

Create a new column tip_percent = (tip / total_bill) * 100.

Use NumPy to calculate summary stats:

import numpy as np
np.mean(tips['tip_percent'])
np.median(tips['tip_percent'])
np.std(tips['tip_percent'])


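Convert categorical data types if needed. In Seaborn’s copy of the dataset, sex, smoker, day, and time already have the category dtype, so no conversion is required there.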
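The tasks above can be sketched in a few lines. This is a minimal illustration, not the notebook's own solution: to stay runnable offline it rebuilds a nine-row stand-in from the rows visible in the notebook preview of tips, and the variable names (missing_per_column, n_duplicates, pop_std, sample_std) are assumptions introduced here.

```python
import numpy as np
import pandas as pd

# Stand-in sample: the nine rows visible in the notebook preview of `tips`.
# With network access, sns.load_dataset("tips") returns the full 244 rows.
rows = [
    (16.99, 1.01, "Female", "No", "Sun", "Dinner", 2),
    (10.34, 1.66, "Male", "No", "Sun", "Dinner", 3),
    (21.01, 3.50, "Male", "No", "Sun", "Dinner", 3),
    (23.68, 3.31, "Male", "No", "Sun", "Dinner", 2),
    (24.59, 3.61, "Female", "No", "Sun", "Dinner", 4),
    (29.03, 5.92, "Male", "No", "Sat", "Dinner", 3),
    (27.18, 2.00, "Female", "Yes", "Sat", "Dinner", 2),
    (22.67, 2.00, "Male", "Yes", "Sat", "Dinner", 2),
    (17.82, 1.75, "Male", "No", "Sat", "Dinner", 2),
]
columns = ["total_bill", "tip", "sex", "smoker", "day", "time", "size"]
tips = pd.DataFrame(rows, columns=columns)

# Missing values per column, and the number of fully duplicated rows.
# Whether to drop duplicates is a judgment call: two parties can
# legitimately have identical bills, so this sketch only counts them.
missing_per_column = tips.isna().sum()
n_duplicates = int(tips.duplicated().sum())

# Derived metric: tip as a percentage of the total bill.
tips["tip_percent"] = tips["tip"] / tips["total_bill"] * 100

# Convert the text columns to the category dtype (a no-op on Seaborn's copy).
for col in ["sex", "smoker", "day", "time"]:
    tips[col] = tips[col].astype("category")

# Summary statistics with NumPy on the underlying array.
values = tips["tip_percent"].to_numpy()
mean_pct = np.mean(values)
median_pct = np.median(values)
pop_std = np.std(values)             # population std (ddof=0), NumPy's default
sample_std = np.std(values, ddof=1)  # sample std, matches Series.std()
```

One detail worth keeping in mind: np.std() defaults to the population standard deviation (ddof=0), while the pandas Series.std() method defaults to the sample version (ddof=1), so the two can disagree slightly on the same column.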

📊 4. Exploratory Data Analysis (EDA)

Use Seaborn and Matplotlib to visualize patterns.

Example visualizations:
| Question | Visualization | Function |
| --- | --- | --- |
| What is the relationship between total bill and tip? | Scatter plot | sns.scatterplot() |
| Do men or women tip more? | Box plot | sns.boxplot() |
| How does tipping differ across days? | Bar plot | sns.barplot() |
| Does party size affect the tip amount? | Line plot | sns.lineplot() |
| What’s the overall distribution of tip percentage? | Histogram | sns.histplot() |

Add titles, axis labels, and annotations using plt.title(), plt.xlabel(), and plt.ylabel().
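As an illustration of two of these plots with titles and axis labels, here is a hedged sketch. It draws on a nine-row stand-in built from the rows visible in the notebook preview, so the shapes it produces say nothing about the full dataset. The panel layout, label text, and variable names are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in sample: the nine rows visible in the notebook preview of `tips`.
rows = [
    (16.99, 1.01, "Female", "No", "Sun", "Dinner", 2),
    (10.34, 1.66, "Male", "No", "Sun", "Dinner", 3),
    (21.01, 3.50, "Male", "No", "Sun", "Dinner", 3),
    (23.68, 3.31, "Male", "No", "Sun", "Dinner", 2),
    (24.59, 3.61, "Female", "No", "Sun", "Dinner", 4),
    (29.03, 5.92, "Male", "No", "Sat", "Dinner", 3),
    (27.18, 2.00, "Female", "Yes", "Sat", "Dinner", 2),
    (22.67, 2.00, "Male", "Yes", "Sat", "Dinner", 2),
    (17.82, 1.75, "Male", "No", "Sat", "Dinner", 2),
]
columns = ["total_bill", "tip", "sex", "smoker", "day", "time", "size"]
tips = pd.DataFrame(rows, columns=columns)
tips["tip_percent"] = tips["tip"] / tips["total_bill"] * 100

# Two panels side by side: bill vs. tip, and tip percentage by gender.
fig, (ax_scatter, ax_box) = plt.subplots(1, 2, figsize=(11, 4))

sns.scatterplot(data=tips, x="total_bill", y="tip", hue="sex", ax=ax_scatter)
ax_scatter.set_title("Tip vs. total bill")
ax_scatter.set_xlabel("Total bill ($)")
ax_scatter.set_ylabel("Tip ($)")

sns.boxplot(data=tips, x="sex", y="tip_percent", ax=ax_box)
ax_box.set_title("Tip percentage by gender")
ax_box.set_xlabel("Gender")
ax_box.set_ylabel("Tip (% of bill)")

fig.tight_layout()
```

With a single plot, plt.title(), plt.xlabel(), and plt.ylabel() do the same job. The Axes methods (set_title and friends) are used here only because the figure has two panels and each needs its own labels.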

📈 5. Insights & Conclusions

Summarize what you found:

- Do larger bills result in higher tips (in absolute or percentage terms)?
- Does gender or smoking status influence tipping behavior?
- Which day or time yields the highest average tip?
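Each of these questions maps onto a short pandas expression. The sketch below shows one possible way to phrase them. It runs on a nine-row stand-in taken from the notebook preview, so any numbers it yields describe that sample only and are not conclusions about the full dataset. All variable names are assumptions introduced for illustration.

```python
import pandas as pd

# Stand-in sample: the nine rows visible in the notebook preview of `tips`.
rows = [
    (16.99, 1.01, "Female", "No", "Sun", "Dinner", 2),
    (10.34, 1.66, "Male", "No", "Sun", "Dinner", 3),
    (21.01, 3.50, "Male", "No", "Sun", "Dinner", 3),
    (23.68, 3.31, "Male", "No", "Sun", "Dinner", 2),
    (24.59, 3.61, "Female", "No", "Sun", "Dinner", 4),
    (29.03, 5.92, "Male", "No", "Sat", "Dinner", 3),
    (27.18, 2.00, "Female", "Yes", "Sat", "Dinner", 2),
    (22.67, 2.00, "Male", "Yes", "Sat", "Dinner", 2),
    (17.82, 1.75, "Male", "No", "Sat", "Dinner", 2),
]
columns = ["total_bill", "tip", "sex", "smoker", "day", "time", "size"]
tips = pd.DataFrame(rows, columns=columns)
tips["tip_percent"] = tips["tip"] / tips["total_bill"] * 100

# Larger bills vs. tips: Pearson correlation in absolute and percentage terms.
r_abs = tips["total_bill"].corr(tips["tip"])
r_pct = tips["total_bill"].corr(tips["tip_percent"])

# Gender and smoking status: compare central tendency and group size.
by_sex = tips.groupby("sex", observed=True)["tip_percent"].agg(["mean", "median", "count"])
by_smoker = tips.groupby("smoker", observed=True)["tip_percent"].agg(["mean", "median", "count"])

# Day with the highest average tip (swap "day" for "time" to compare meals).
avg_tip_by_day = tips.groupby("day", observed=True)["tip"].mean()
best_day = avg_tip_by_day.idxmax()
```

Reporting the group sizes next to the means is a deliberate choice: a group with only a handful of rows can show a high average purely by chance.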

🧠 6. Bonus Extensions

If you want to make it more advanced:

- Perform a correlation analysis using sns.heatmap().
- Add regression lines with sns.lmplot().
- Use groupby() in Pandas to compute aggregated insights (e.g., average tip by day and gender).
- Export final results to a .csv file, or visualizations to a .pdf report.
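The aggregation and export ideas can be sketched as follows, again on the nine-row stand-in from the notebook preview rather than the full dataset. The names avg_tip, corr, and csv_text are assumptions for this example, and the plotting calls are mentioned only in comments.

```python
import pandas as pd

# Stand-in sample: the nine rows visible in the notebook preview of `tips`.
rows = [
    (16.99, 1.01, "Female", "No", "Sun", "Dinner", 2),
    (10.34, 1.66, "Male", "No", "Sun", "Dinner", 3),
    (21.01, 3.50, "Male", "No", "Sun", "Dinner", 3),
    (23.68, 3.31, "Male", "No", "Sun", "Dinner", 2),
    (24.59, 3.61, "Female", "No", "Sun", "Dinner", 4),
    (29.03, 5.92, "Male", "No", "Sat", "Dinner", 3),
    (27.18, 2.00, "Female", "Yes", "Sat", "Dinner", 2),
    (22.67, 2.00, "Male", "Yes", "Sat", "Dinner", 2),
    (17.82, 1.75, "Male", "No", "Sat", "Dinner", 2),
]
columns = ["total_bill", "tip", "sex", "smoker", "day", "time", "size"]
tips = pd.DataFrame(rows, columns=columns)

# Average tip by day and gender, reshaped into a day x sex table.
avg_tip = (
    tips.groupby(["day", "sex"], observed=True)["tip"]
    .mean()
    .unstack("sex")
)

# Correlation matrix of the numeric columns. This is the kind of table
# that sns.heatmap(corr, annot=True) would render as a heatmap.
corr = tips[["total_bill", "tip", "size"]].corr()

# Export: to_csv() with no path returns the CSV text; pass a file name
# such as "avg_tip_by_day_and_sex.csv" (hypothetical) to write to disk.
csv_text = avg_tip.to_csv()
```

For figures, Matplotlib's Figure.savefig() chooses the output format from the file extension, so a name ending in .pdf produces a PDF.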

📁 Deliverables

- Jupyter Notebook (.ipynb) with:
  - Data cleaning
  - EDA and visualizations
  - Conclusions & insights
- (Optional) Report or dashboard summarizing findings.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
sns.get_dataset_names()

['anagrams',
 'anscombe',
 'attention',
 'brain_networks',
 'car_crashes',
 'diamonds',
 'dots',
 'dowjones',
 'exercise',
 'flights',
 'fmri',
 'geyser',
 'glue',
 'healthexp',
 'iris',
 'mpg',
 'penguins',
 'planets',
 'seaice',
 'taxis',
 'tips',
 'titanic']

In [3]:
tips = sns.load_dataset("tips")

In [4]:
tips

,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.50,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3
240,27.18,2.00,Female,Yes,Sat,Dinner,2
241,22.67,2.00,Male,Yes,Sat,Dinner,2
242,17.82,1.75,Male,No,Sat,Dinner,2


### Checking whether the dataset contains null values

In [5]:
tips.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   total_bill  244 non-null    float64 
 1   tip         244 non-null    float64 
 2   sex         244 non-null    category
 3   smoker      244 non-null    category
 4   day         244 non-null    category
 5   time        244 non-null    category
 6   size        244 non-null    int64   
dtypes: category(4), float64(2), int64(1)
memory usage: 7.4 KB


In [6]:
tips.isna()

,total_bill,tip,sex,smoker,day,time,size
0,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...
239,False,False,False,False,False,False,False
240,False,False,False,False,False,False,False
241,False,False,False,False,False,False,False
242,False,False,False,False,False,False,False


In [7]:
tips["total_bill"].isna()

0      False
1      False
2      False
3      False
4      False
       ...  
239    False
240    False
241    False
242    False
243    False
Name: total_bill, Length: 244, dtype: bool

In [8]:
tips.isnull().sum()

total_bill    0
tip           0
sex           0
smoker        0
day           0
time          0
size          0
dtype: int64

In [9]:
tips.isnull().sum().sum()

np.int64(0)

In [10]:
tips.isnull().mean()

total_bill    0.0
tip           0.0
sex           0.0
smoker        0.0
day           0.0
time          0.0
size          0.0
dtype: float64
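The checks in cells 8 to 10 can be folded into one small table of counts and percentages. The helper below, missing_report, is a hypothetical name introduced for this sketch and not part of pandas. It is demonstrated on a nine-row stand-in from the notebook preview, plus a deliberately damaged copy to show what a non-zero result looks like.

```python
import numpy as np
import pandas as pd

# Stand-in sample: the nine rows visible in the notebook preview of `tips`.
rows = [
    (16.99, 1.01, "Female", "No", "Sun", "Dinner", 2),
    (10.34, 1.66, "Male", "No", "Sun", "Dinner", 3),
    (21.01, 3.50, "Male", "No", "Sun", "Dinner", 3),
    (23.68, 3.31, "Male", "No", "Sun", "Dinner", 2),
    (24.59, 3.61, "Female", "No", "Sun", "Dinner", 4),
    (29.03, 5.92, "Male", "No", "Sat", "Dinner", 3),
    (27.18, 2.00, "Female", "Yes", "Sat", "Dinner", 2),
    (22.67, 2.00, "Male", "Yes", "Sat", "Dinner", 2),
    (17.82, 1.75, "Male", "No", "Sat", "Dinner", 2),
]
columns = ["total_bill", "tip", "sex", "smoker", "day", "time", "size"]
tips = pd.DataFrame(rows, columns=columns)


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column missing-value counts and percentages (hypothetical helper)."""
    return pd.DataFrame(
        {
            "n_missing": df.isna().sum(),
            "pct_missing": df.isna().mean() * 100,
        }
    )


# Clean data: every count is zero, and a single boolean confirms it.
report = missing_report(tips)
has_missing = bool(tips.isna().any().any())

# Damaged copy: blank out one tip to see how the report changes.
damaged = tips.copy()
damaged.loc[0, "tip"] = np.nan
damaged_report = missing_report(damaged)
```

isna() and isnull() are aliases in pandas, so the notebook's mix of the two gives the same results either way.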