## Project Overview

This notebook presents a customer segmentation analysis based on restaurant transaction data.
The goal is to identify high-value customer segments and derive practical business recommendations.


# Customer Segmentation and Business Analysis


## Introduction


This analysis focuses on understanding customer behavior and identifying the restaurant's most valuable customer segments.
It uses frequency analysis (`value_counts`) and basic customer segmentation to examine who visits the restaurant, when they visit, and how their spending and tipping patterns differ.

The insights from this analysis can help support business decisions related to staffing, marketing focus, and revenue optimization.

## Data Overview


The dataset contains information about restaurant transactions, including total bill amounts, tips, customer characteristics, and visit details such as day of the week and time of day.

Key variables used in this analysis include:

- `total_bill`: total amount of the customer's bill
- `tip`: tip given by the customer
- `day`: day of the week when the visit occurred
- `time`: time of day (Lunch or Dinner)
- `sex`: customer gender
- `size`: number of people at the table

The dataset is used to explore customer behavior patterns and to support basic customer segmentation and business insights.


In [1]:
import pandas as pd

df = pd.read_csv("tips.csv")
df.head()


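|   | total_bill | tip  | sex    | smoker | day | time   | size |
|---|------------|------|--------|--------|-----|--------|------|
| 0 | 16.99      | 1.01 | Female | No     | Sun | Dinner | 2    |
| 1 | 10.34      | 1.66 | Male   | No     | Sun | Dinner | 3    |
| 2 | 21.01      | 3.50 | Male   | No     | Sun | Dinner | 3    |
| 3 | 23.68      | 3.31 | Male   | No     | Sun | Dinner | 2    |
| 4 | 24.59      | 3.61 | Female | No     | Sun | Dinner | 4    |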


## Value Counts Analysis


The value counts analysis provides an overview of the customer and order structure in the restaurant.
By examining the frequency of orders across different categories, it is possible to identify when the restaurant experiences the highest traffic and which customer groups dominate.

The analysis shows clear differences in order distribution by day of the week and time of day. Dinner service accounts for the majority of orders, indicating that evenings are the primary operating period. Weekend days, particularly Saturday, generate the highest number of transactions, suggesting increased customer demand during this time.

From a customer perspective, the distribution by sex is skewed toward male customers (about 64%), and most visits involve small tables, with parties of two accounting for nearly two thirds of orders. The restaurant is therefore mainly visited by pairs and small groups rather than solo diners or large parties.

These findings help establish a baseline understanding of customer behavior and serve as a foundation for further customer segmentation and business-focused analysis.


- Share of orders by day (%)

In [2]:
df["day"].value_counts(normalize=True) * 100

day
Sat     35.655738
Sun     31.147541
Thur    25.409836
Fri      7.786885
Name: proportion, dtype: float64

- Share of orders by time of day, Lunch vs Dinner (%)

In [3]:
df["time"].value_counts(normalize=True) * 100

time
Dinner    72.131148
Lunch     27.868852
Name: proportion, dtype: float64

- Share of customers by sex (%)

In [4]:
df["sex"].value_counts(normalize=True) * 100

sex
Male      64.344262
Female    35.655738
Name: proportion, dtype: float64

- Share of orders by group size (%)

In [5]:
df["size"].value_counts(normalize=True) * 100

size
2    63.934426
3    15.573770
4    15.163934
5     2.049180
1     1.639344
6     1.639344
Name: proportion, dtype: float64
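The four cells above repeat the same `value_counts(normalize=True) * 100` pattern. As a minimal sketch, the pattern could be wrapped in a small helper. The function name `percentage_shares` is an assumption made for illustration, and the data below is only the five rows shown by `df.head()`, not the full dataset.

```python
import pandas as pd

# The five rows displayed by df.head(); the real analysis runs on all of tips.csv.
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "day": ["Sun"] * 5,
    "time": ["Dinner"] * 5,
    "size": [2, 3, 3, 2, 4],
})


def percentage_shares(frame, columns):
    """Return a dict mapping each column name to its percentage shares."""
    return {
        col: (frame[col].value_counts(normalize=True) * 100).round(2)
        for col in columns
    }


# Hypothetical helper applied to the same four columns as the cells above.
shares = percentage_shares(sample, ["day", "time", "sex", "size"])
for col, series in shares.items():
    print(series, end="\n\n")
```

With the full dataset, passing `df` instead of `sample` should give the same percentages as the individual cells, rounded to two decimals.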

### Conclusions (Value Counts)

#### Orders by day of week

- Saturday generates the highest share of orders (~35.66%), followed by Sunday (~31.15%).
- Thursday is also strong (~25.41%), while Friday is clearly the weakest day (~7.79%).
- Overall, demand is concentrated on the weekend, with Thursday acting as an additional high-traffic day.

#### Orders by time (Lunch vs Dinner)

- Most orders happen at Dinner (~72.13%), while Lunch represents only ~27.87%.
- This suggests the restaurant is primarily a dinner-driven business, so staffing and inventory planning should prioritize evening service.

#### Client distribution by sex

- The dataset is male-skewed: Male ~64.34% vs Female ~35.66%.
- Any customer behavior analysis (tips, total bill, etc.) should account for this imbalance to avoid misleading comparisons.

#### Group size distribution

- The most common group size is 2 people (~63.93%), indicating that couples/pairs are the dominant customer type.
- Medium groups are much less frequent: 3 people ~15.57% and 4 people ~15.16%.
- Large groups are rare (5 people ~2.05%, 6 people ~1.64%), and solo dining is equally uncommon (1 person ~1.64%).
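The note on the male/female imbalance can be made concrete: totals favor the larger group simply because it has more orders, so per-order averages are the fairer comparison. The sketch below uses only the five rows shown by `df.head()`. The `tip_pct` column is a hypothetical helper that is not part of the original notebook.

```python
import pandas as pd

# The five rows displayed by df.head(); not the full dataset.
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})

# Hypothetical helper column: tip as a percentage of the bill.
sample["tip_pct"] = sample["tip"] / sample["total_bill"] * 100

# Totals scale with the number of orders; means do not.
by_sex = sample.groupby("sex").agg(
    orders=("tip", "size"),
    total_tips=("tip", "sum"),
    mean_tip=("tip", "mean"),
    mean_tip_pct=("tip_pct", "mean"),
)
print(by_sex)
```

Comparing `mean_tip` or `mean_tip_pct` instead of `total_tips` keeps the larger group from looking more generous only because it places more orders.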

## Customer Segmentation
Analysis by bill size.


- Definition

Customer segmentation is the process of dividing customers into groups based on similar behavior and characteristics.
In this analysis, orders are segmented by bill value into three groups (Low, Medium, High) to identify meaningful differences in spending and tipping patterns.

In [6]:
# Bins are right-inclusive: Low = (0, 15], Medium = (15, 30], High = (30, max]
df["bill_segment"] = pd.cut(
    df["total_bill"],
    bins=[0, 15, 30, df["total_bill"].max()],
    labels=["Low", "Medium", "High"],
)
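To show how `pd.cut` treats values that fall exactly on a bin edge, here is a small sketch with illustrative bill amounts. The values are chosen to sit on and around the edges and are not taken from the dataset.

```python
import pandas as pd

# Illustrative bill amounts placed on and just past the bin edges.
bills = pd.Series([3.07, 15.00, 15.01, 30.00, 30.01, 50.81])

segments = pd.cut(
    bills,
    bins=[0, 15, 30, bills.max()],
    labels=["Low", "Medium", "High"],
)

# By default each bin includes its right edge and excludes its left edge.
print(pd.DataFrame({"total_bill": bills, "bill_segment": segments}))

# A bill of exactly 0 would fall outside (0, 15] and become NaN
# unless include_lowest=True is passed.
zero_bill = pd.cut(pd.Series([0.0]), bins=[0, 15, 30, 50])
print(zero_bill.isna().all())
```

A bill of exactly 15 lands in Low and a bill of exactly 30 lands in Medium. Using the column maximum as the last edge ensures the largest bill is still assigned to High.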


- Number of customers per segment

In [7]:
df["bill_segment"].value_counts()

bill_segment
Medium    132
Low        80
High       32
Name: count, dtype: int64

- Average tip per segment

In [8]:
df.groupby("bill_segment", observed=True)["tip"].mean()



bill_segment
Low       2.050250
Medium    3.188636
High      4.583125
Name: tip, dtype: float64

* Note:
`observed=True` ensures that only bill segments actually present in the data
are included in the aggregation, avoiding empty categories.
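The effect of `observed` is easiest to see on a categorical column that has an unused category. The toy frame below is made up for illustration and is not taken from `tips.csv`.

```python
import pandas as pd

# Toy data: the "High" category is declared but never occurs.
toy = pd.DataFrame({
    "bill_segment": pd.Categorical(
        ["Low", "Low", "Medium"],
        categories=["Low", "Medium", "High"],
    ),
    "tip": [1.0, 2.0, 3.0],
})

# observed=False keeps every declared category; empty ones get NaN.
with_empty = toy.groupby("bill_segment", observed=False)["tip"].mean()

# observed=True keeps only categories that appear in the data.
only_present = toy.groupby("bill_segment", observed=True)["tip"].mean()

print(with_empty)
print(only_present)
```

In the notebook all three segments contain rows, so both settings would give the same three means. Passing `observed=True` explicitly mainly keeps the result stable across pandas versions.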


## Business Case: Most Valuable Customers
Key findings and conclusions.

### Definition of Most Valuable Customers

For the purpose of this analysis, the most valuable customers are defined as those who:
- generate higher than average total bills,
- leave above average tips,
- place orders during high-revenue periods (e.g. weekends or dinner time).

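This definition focuses on both revenue and tipping behavior rather than order volume alone. The cells in this section apply the first two criteria; the timing criterion is not used as a filter.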

In [9]:
avg_bill = df["total_bill"].mean()
avg_tip = df["tip"].mean()

# Flag orders above the dataset-wide averages
df["total_bill_above_avg"] = df["total_bill"] > avg_bill
df["tip_above_avg"] = df["tip"] > avg_tip

# Count each combination of the two flags
df[["total_bill_above_avg", "tip_above_avg"]].value_counts()


total_bill_above_avg  tip_above_avg
False                 False            102
True                  True              78
False                 True              43
True                  False             21
Name: count, dtype: int64

Orders that exceed both the average bill and the average tip form the second-largest group: 78 of 244 orders (about 32%). This is a smaller but potentially high-value segment.


In [10]:
most_valuable = df[
    (df["total_bill_above_avg"]) &
    (df["tip_above_avg"])
]

most_valuable.shape


(78, 10)
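The filter above uses the bill and tip criteria only. As a sketch of how the timing criterion from the definition could be added, the example below assumes that a "high-revenue period" means a Saturday or Sunday visit or a Dinner visit. That reading, the `peak_period` name, and the toy rows are all assumptions made for illustration.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from tips.csv.
toy = pd.DataFrame({
    "total_bill": [40.0, 35.0, 12.0, 10.0],
    "tip": [6.0, 5.0, 1.5, 1.0],
    "day": ["Sat", "Thur", "Sun", "Fri"],
    "time": ["Dinner", "Lunch", "Dinner", "Lunch"],
})

above_avg_bill = toy["total_bill"] > toy["total_bill"].mean()
above_avg_tip = toy["tip"] > toy["tip"].mean()

# Assumed definition of a high-revenue period: weekend day OR dinner service.
peak_period = toy["day"].isin(["Sat", "Sun"]) | (toy["time"] == "Dinner")

two_criteria = toy[above_avg_bill & above_avg_tip]
three_criteria = toy[above_avg_bill & above_avg_tip & peak_period]

print(len(two_criteria), len(three_criteria))
```

In the toy data the Thursday lunch order passes the bill and tip thresholds but is dropped once the timing condition is applied, which is the kind of difference the stricter definition would introduce.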

In [11]:
most_valuable["total_bill"].mean(), df["total_bill"].mean()


(np.float64(28.94794871794872), np.float64(19.78594262295082))

In [12]:
most_valuable["tip"].mean(), df["tip"].mean()


(np.float64(4.383461538461539), np.float64(2.99827868852459))

### Key Finding

Orders with both above-average bills and above-average tips make up about 32% of all orders (78 of 244), yet their average bill (28.95 vs 19.79) and average tip (4.38 vs 3.00) are well above the dataset averages. As a result, this segment accounts for roughly 47% of total bill value and roughly 47% of total tips.


In [13]:
other_customers = df[~df.index.isin(most_valuable.index)]


In [14]:
comparison = {
    "Average Bill": [
        most_valuable["total_bill"].mean(),
        other_customers["total_bill"].mean()
    ],
    "Average Tip": [
        most_valuable["tip"].mean(),
        other_customers["tip"].mean()
    ],
    "Number of Orders": [
        most_valuable.shape[0],
        other_customers.shape[0]
    ]
}

comparison_df = pd.DataFrame(
    comparison,
    index=["Most Valuable Customers", "Other Customers"]
)

comparison_df


|                         | Average Bill | Average Tip | Number of Orders |
|-------------------------|--------------|-------------|------------------|
| Most Valuable Customers | 28.947949    | 4.383462    | 78               |
| Other Customers         | 15.480904    | 2.347410    | 166              |


The comparison shows that the most valuable customers generate nearly 1.9 times the average bill (28.95 vs 15.48) and average tip (4.38 vs 2.35) of other customers, despite accounting for only 78 of the 244 orders.
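Per-order averages can be turned into shares of total revenue and tips by multiplying each average by its order count. The sketch below rebuilds the comparison table from the figures printed above; the `shares` frame and its column names are assumptions made for illustration.

```python
import pandas as pd

# Figures copied from the comparison table above.
comparison_df = pd.DataFrame(
    {
        "Average Bill": [28.947949, 15.480904],
        "Average Tip": [4.383462, 2.34741],
        "Number of Orders": [78, 166],
    },
    index=["Most Valuable Customers", "Other Customers"],
)

orders = comparison_df["Number of Orders"]

# average * number of orders = group total
totals = comparison_df[["Average Bill", "Average Tip"]].mul(orders, axis=0)

# Each group's share of the overall total, in percent.
shares = (totals / totals.sum() * 100).round(1)
shares.columns = ["Bill share (%)", "Tip share (%)"]
shares["Order share (%)"] = (orders / orders.sum() * 100).round(1)

print(shares)
```

Setting the order share next to the bill and tip shares shows how much more than its proportional part the high-value segment contributes.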


## Business Recommendation

Based on the analysis, the restaurant should focus on retaining and attracting customers who generate above-average bills and tips, particularly during high-revenue periods such as weekend dinners.

Potential actions include targeted promotions, personalized offers, or loyalty incentives aimed at encouraging repeat visits from this high-value customer segment.
