# 🛒 **Supermarket Sales Data Analysis** 📊

<h2 style="font-family: 'poppins'; font-weight: bold;">👨‍💻 Author: Muhammad Hassan Saboor</h2>

[![GitHub](https://img.shields.io/badge/GitHub-Profile-blue?style=for-the-badge&logo=github)](https://github.com/MuhammadHassanSaboor) 
[![Kaggle](https://img.shields.io/badge/Kaggle-Profile-blue?style=for-the-badge&logo=kaggle)](https://www.kaggle.com/mhassansaboor) 
[![LinkedIn](https://img.shields.io/badge/LinkedIn-Profile-blue?style=for-the-badge&logo=linkedin)](https://www.linkedin.com/in/muhammad-hassan-saboor/)  
[![Facebook](https://img.shields.io/badge/Facebook-Profile-blue?style=for-the-badge&logo=facebook)](https://www.facebook.com/profile.php?id=61555194218257) 
[![Twitter/X](https://img.shields.io/badge/Twitter-Profile-blue?style=for-the-badge&logo=twitter)](https://twitter.com/MUHAMMA84929767) 
[![Instagram](https://img.shields.io/badge/Instagram-Profile-blue?style=for-the-badge&logo=instagram)](https://www.instagram.com/m_hassan_saboor/) 

### 📋 **Metadata**:

This dataset contains **sales transactions** from a **supermarket** across multiple branches in three major cities: **New York**, **Chicago**, and **Los Angeles**. The data includes key information about the sales transactions, customer demographics, and product details. Below is an overview of the columns present in the dataset:

- **Sale ID**: Unique sales identifier for each transaction.
- **Branch**: Code of the supermarket branch where the sale occurred (for example, `A` or `B`).
- **City**: The city where the supermarket branch is located (New York, Chicago, Los Angeles).
- **Customer Type**: Customer classification as **Member** or **Normal**. Members earn **reward points** for each purchase.
- **Gender**: Gender of the customer (**Male** or **Female**).
- **Product Name**: Name of the product sold in the transaction.
- **Product Category**: The category of the product sold.
- **Unit Price**: The price of a single unit of the product.
- **Quantity**: The number of units of the product sold in the transaction.
- **Tax**: Sales tax charged on the transaction, 7% of `unit_price * quantity`.
- **Total Price**: The total price of the transaction after tax.
- **Reward Points**: Reward points earned by **Member** customers based on the total transaction amount.
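
The **Tax** and **Total Price** definitions above can be checked against the data. The sketch below rebuilds the five rows that `df.head()` prints further down and tests whether `tax` is about 7% of `unit_price * quantity` and whether `total_price` equals that subtotal plus tax. The `sample` frame and the tolerance are assumptions made for illustration; on the full dataset the same check could be run on `df`.

```python
import numpy as np
import pandas as pd

# Five rows copied from the df.head() output of this notebook (illustration only).
sample = pd.DataFrame({
    "unit_price": [5.5, 2.75, 1.2, 7.8, 3.5],
    "quantity": [3, 10, 15, 5, 7],
    "tax": [1.16, 1.93, 1.26, 2.73, 1.72],
    "total_price": [17.66, 29.43, 19.26, 41.73, 26.22],
})

subtotal = sample["unit_price"] * sample["quantity"]

# Tax is stored rounded to two decimals, so allow slightly more than half a cent.
tax_ok = np.isclose(sample["tax"], 0.07 * subtotal, rtol=0, atol=0.006)
total_ok = np.isclose(sample["total_price"], subtotal + sample["tax"], rtol=0, atol=0.006)

print("tax matches 7% of subtotal:", bool(tax_ok.all()))
print("total matches subtotal + tax:", bool(total_ok.all()))
```

If either check fails on the full data, the affected rows are worth inspecting before any revenue totals are interpreted.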

---

### 📊 **Exploratory Data Analysis (EDA)**:

In this **Exploratory Data Analysis (EDA)**, we aimed to derive **key insights** from the **supermarket sales data** through a comprehensive set of **visualizations** and **statistical analysis**. Below are the **major aspects** that we explored:

#### 1. 🏪 **Branch and City Analysis**:
   - **Sales Distribution**: We examined how sales are distributed across different **branches** and **cities** (New York, Chicago, and Los Angeles).
   - **Top Branch/City**: We identified the **highest** and **lowest** sales by analyzing total sales and quantities sold in each branch and city.
   - **Pricing Analysis**: We compared the **average unit price** and **total price** across branches and cities to uncover pricing trends.

#### 2. 👥 **Customer Analysis**:
   - **Customer Type**: We analyzed the sales distribution based on **customer type** (Members vs. Normal), highlighting differences in purchasing behavior.
   - **Gender-based Sales**: We explored sales based on **gender**, identifying which gender purchased more (in terms of quantity) and the **average spending** for each gender.

#### 3. 📦 **Product Analysis**:
   - **Top-Selling Products**: We identified the **top-selling products** based on **quantity** and **total price**.
   - **Product Category Contribution**: We examined which **product categories** generated the most revenue.
   - **Average Pricing**: We calculated the **average unit price** of products across different categories to understand price distribution.
   - **City/Branch Popularity**: We explored which products are most **popular** in each city and branch.

#### 4. 💸 **Pricing and Revenue Analysis**:
   - **Correlation Analysis**: We explored the relationships between **unit price**, **quantity**, **tax**, and **total price** to identify pricing patterns.
   - **Price vs. Sales**: We investigated **products with high unit prices** but low sales (and vice versa) to uncover pricing inefficiencies.
   - **Tax Contribution**: We examined the tax contribution across different branches and cities to understand how tax affects revenue generation.

#### 5. 💎 **Reward Points Analysis**:
   - **Average Reward Points**: We analyzed the **average reward points** earned per transaction.
   - **Reward Points Effectiveness**: We explored the **relationship between reward points** and **total spending**, evaluating how reward points impact customer purchases.
   - **Reward Points by Branch**: We examined which branches had the highest **average reward points** per sale.

#### 6. 📈 **Performance Metrics**:
   - **Revenue Analysis**: We identified the **highest revenue-generating branch**, **product**, and **customer type**.
   - **Sales Trends**: The dataset has no date or time column, so **monthly** or **daily sales trends** are left for future work.

#### 7. 🧐 **Outliers and Anomalies**:
   - **Extreme Sales Values**: We identified **unusual sales data**, such as transactions with **extremely high or low total prices**.
   - **Unusual Quantities**: We explored transactions with **extremely high quantities** purchased to detect anomalies.

#### 8. 🏙️ **Demographic Insights**:
   - **Gender Preferences**: We analyzed **gender preferences** across different **product categories** and **branches**.
   - **City-wise Behavior**: We explored **city-wise** differences in **customer behavior**, such as average spending and preferred product categories.

#### 9. 💰 **Tax and Reward System Efficiency**:
   - **Tax-to-Total Price Ratio**: We calculated the **tax-to-total price ratio** across different **branches** and **products** to analyze tax efficiency.
   - **Effectiveness of Reward Points**: We analyzed the **effectiveness of reward points** in encouraging higher purchases and increasing customer spending.

---

### 🏁 **Conclusion**:

This **EDA** has provided valuable insights into various aspects of the supermarket’s sales data. We have gained a better understanding of sales trends, customer behavior, product performance, pricing strategies, and the effectiveness of reward points. These insights can help the supermarket optimize operations, refine pricing strategies, and design more targeted marketing campaigns for different customer segments.

---

🔍 **Next Steps**:
- **Predictive Modeling**: Implementing machine learning models to predict sales, customer spending, and the likelihood of reward point redemptions.
- **Sales Forecasting**: Creating models to forecast future sales trends and identify peak seasons for promotions and discounts.
- **Customer Segmentation**: Analyzing customer data for segmentation to design personalized offers based on customer behavior.


# 📚 Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# 📥 Loading the Dataset

In [2]:
df = pd.read_csv("/kaggle/input/supermarket-sales/sales.csv")

# 📊 Exploring the Dataset

In [3]:
df.head()

|   | sale_id | branch | city | customer_type | gender | product_name | product_category | unit_price | quantity | tax | total_price | reward_points |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | A | New York | Member | Male | Shampoo | Personal Care | 5.5 | 3 | 1.16 | 17.66 | 1 |
| 1 | 2 | B | Los Angeles | Normal | Female | Notebook | Stationery | 2.75 | 10 | 1.93 | 29.43 | 0 |
| 2 | 3 | A | New York | Member | Female | Apple | Fruits | 1.2 | 15 | 1.26 | 19.26 | 1 |
| 3 | 4 | A | Chicago | Normal | Male | Detergent | Household | 7.8 | 5 | 2.73 | 41.73 | 0 |
| 4 | 5 | B | Los Angeles | Member | Female | Orange Juice | Beverages | 3.5 | 7 | 1.72 | 26.22 | 2 |


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   sale_id           1000 non-null   int64  
 1   branch            1000 non-null   object 
 2   city              1000 non-null   object 
 3   customer_type     1000 non-null   object 
 4   gender            1000 non-null   object 
 5   product_name      1000 non-null   object 
 6   product_category  1000 non-null   object 
 7   unit_price        1000 non-null   float64
 8   quantity          1000 non-null   int64  
 9   tax               1000 non-null   float64
 10  total_price       1000 non-null   float64
 11  reward_points     1000 non-null   int64  
dtypes: float64(3), int64(3), object(6)
memory usage: 93.9+ KB
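
`df.info()` confirms there are no missing values, but two other quick checks may be worth running before the analysis: whether `sale_id` is really unique, and how branch codes map to cities. The sketch below uses only the five rows printed by `df.head()`; the `sample` frame is an illustration, and on the real data the same calls would be applied to `df`.

```python
import pandas as pd

# Five rows copied from the df.head() output above (illustration only).
sample = pd.DataFrame({
    "sale_id": [1, 2, 3, 4, 5],
    "branch": ["A", "B", "A", "A", "B"],
    "city": ["New York", "Los Angeles", "New York", "Chicago", "Los Angeles"],
})

# True when no sale_id value repeats.
ids_unique = sample["sale_id"].is_unique

# Rows counted per (branch, city) pair.
branch_city_counts = pd.crosstab(sample["branch"], sample["city"])

# Number of distinct cities in which each branch code appears.
cities_per_branch = sample.groupby("branch")["city"].nunique()

print(ids_unique)
print(branch_city_counts)
print(cities_per_branch)
```

In these five rows branch `A` appears in both New York and Chicago, so a branch code does not seem to identify a single city. That is why the later charts group by branch and city together.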


# 📊 **Exploratory Data Analysis (EDA)**:

## 1. 🏪 **Branch and City Analysis**:

#### Sales distribution across different branches and cities

In [5]:
branch_city_sales = df.groupby(["branch", "city"])["total_price"].sum().reset_index()
fig1 = px.bar(
    branch_city_sales,
    x="branch",
    y="total_price",
    color="city",
    title="Sales Distribution by Branch and City",
    text="total_price",
    color_discrete_sequence=px.colors.qualitative.Bold
)
fig1.update_layout(
    template="plotly_dark",
    xaxis_title="Branch",
    yaxis_title="Total Sales",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig1.show()

#### Highest and lowest sales by city

In [6]:
city_sales = df.groupby("city")["total_price"].sum().reset_index()
fig2 = px.bar(
    city_sales,
    x="city",
    y="total_price",
    title="Total Sales by City",
    text="total_price",
    color="total_price",
    color_continuous_scale="reds"
)
fig2.update_layout(
    template="plotly_dark",
    xaxis_title="City",
    yaxis_title="Total Sales",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig2.show()
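
The bar chart shows the ranking visually; the top and bottom cities can also be read off directly with `idxmax` and `idxmin`. This is a minimal sketch on the five `df.head()` rows, so the names it prints describe that tiny sample and not the full dataset.

```python
import pandas as pd

# Five rows copied from the df.head() output (illustration only).
sample = pd.DataFrame({
    "city": ["New York", "Los Angeles", "New York", "Chicago", "Los Angeles"],
    "total_price": [17.66, 29.43, 19.26, 41.73, 26.22],
})

# Total sales per city, indexed by city name.
city_totals = sample.groupby("city")["total_price"].sum()

top_city = city_totals.idxmax()
bottom_city = city_totals.idxmin()

print(city_totals.round(2))
print("highest:", top_city, "| lowest:", bottom_city)
```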

#### Compare the average unit price and total price per branch/city

In [7]:
branch_city_avg = df.groupby(["branch", "city"])[["unit_price", "total_price"]].mean().reset_index()
fig3 = px.bar(
    branch_city_avg,
    x="branch",
    y="unit_price",
    color="city",
    title="Average Unit Price per Branch/City",
    text="unit_price",
    barmode="group",
    color_discrete_sequence=px.colors.qualitative.Dark24
)
fig3.update_layout(
    template="plotly_dark",
    xaxis_title="Branch",
    yaxis_title="Average Unit Price",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig3.show()

## 2. 👥 **Customer Analysis**:

#### Distribution of sales by customer type

In [8]:
customer_sales = df.groupby("customer_type")["total_price"].sum().reset_index()
fig1 = px.pie(
    customer_sales,
    values="total_price",
    names="customer_type",
    title="Sales Distribution by Customer Type",
    color_discrete_sequence=px.colors.sequential.RdBu
)
fig1.update_layout(
    template="plotly_dark",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig1.show()

#### Gender-based quantity analysis

In [9]:
gender_quantity = df.groupby("gender")["quantity"].sum().reset_index()
fig2 = px.bar(
    gender_quantity,
    x="gender",
    y="quantity",
    title="Total Quantity Purchased by Gender",
    text="quantity",
    color="gender",
    color_discrete_sequence=px.colors.qualitative.Vivid
)
fig2.update_layout(
    template="plotly_dark",
    xaxis_title="Gender",
    yaxis_title="Total Quantity",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig2.show()

#### Average spending by gender

In [10]:
gender_spending = df.groupby("gender")["total_price"].mean().reset_index()
fig3 = px.bar(
    gender_spending,
    x="gender",
    y="total_price",
    title="Average Spending by Gender",
    text="total_price",
    color="gender",
    color_discrete_sequence=px.colors.qualitative.Pastel
)
fig3.update_layout(
    template="plotly_dark",
    xaxis_title="Gender",
    yaxis_title="Average Spending (Total Price)",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig3.show()

## 3. 📦 **Product Analysis**:

#### Top-selling products based on quantity

In [11]:
top_products_quantity = df.groupby("product_name")["quantity"].sum().reset_index().sort_values(by="quantity", ascending=False).head(10)
fig1 = px.bar(
    top_products_quantity,
    x="product_name",
    y="quantity",
    title="Top-Selling Products by Quantity",
    text="quantity",
    color="quantity",
    color_continuous_scale="viridis"
)
fig1.update_layout(
    template="plotly_dark",
    xaxis_title="Product Name",
    yaxis_title="Total Quantity Sold",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig1.show()

#### Top-selling products based on total price

In [12]:
top_products_revenue = df.groupby("product_name")["total_price"].sum().reset_index().sort_values(by="total_price", ascending=False).head(10)
fig2 = px.bar(
    top_products_revenue,
    x="product_name",
    y="total_price",
    title="Top-Selling Products by Revenue",
    text="total_price",
    color="total_price",
    color_continuous_scale="plasma"
)
fig2.update_layout(
    template="plotly_dark",
    xaxis_title="Product Name",
    yaxis_title="Total Revenue",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig2.show()

#### Product categories contributing the most to revenue

In [13]:
category_revenue = df.groupby("product_category")["total_price"].sum().reset_index().sort_values(by="total_price", ascending=False)
fig3 = px.pie(
    category_revenue,
    values="total_price",
    names="product_category",
    title="Product Categories Contribution to Revenue",
    color_discrete_sequence=px.colors.sequential.Tealgrn
)
fig3.update_layout(
    template="plotly_dark",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig3.show()

#### Average unit price of products across categories

In [14]:
category_avg_price = df.groupby("product_category")["unit_price"].mean().reset_index().sort_values(by="unit_price", ascending=False)
fig4 = px.bar(
    category_avg_price,
    x="product_category",
    y="unit_price",
    title="Average Unit Price Across Categories",
    text="unit_price",
    color="unit_price",
    color_continuous_scale="cividis"
)
fig4.update_layout(
    template="plotly_dark",
    xaxis_title="Product Category",
    yaxis_title="Average Unit Price",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig4.show()

#### Popular products in each branch

In [15]:
popular_products_branch_city = df.groupby(["branch", "product_name"])["quantity"].sum().reset_index().sort_values(by="quantity", ascending=False)
fig5 = px.sunburst(
    popular_products_branch_city,
    path=["branch", "product_name"],
    values="quantity",
    title="Popular Products in Each Branch",
    color="quantity",
    color_continuous_scale="reds"
)
fig5.update_layout(
    template="plotly_dark",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig5.show()
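
To name the single best-selling product per branch without reading it off the sunburst, one option is a grouped `idxmax`. The sketch below again uses the five `df.head()` rows as a stand-in for `df`; `top_per_branch` is a name introduced here for illustration.

```python
import pandas as pd

# Five rows copied from the df.head() output (illustration only).
sample = pd.DataFrame({
    "branch": ["A", "B", "A", "A", "B"],
    "product_name": ["Shampoo", "Notebook", "Apple", "Detergent", "Orange Juice"],
    "quantity": [3, 10, 15, 5, 7],
})

# Total quantity per (branch, product) pair.
branch_product_qty = (
    sample.groupby(["branch", "product_name"])["quantity"].sum().reset_index()
)

# For each branch, pick the row with the largest quantity.
top_rows = branch_product_qty.loc[
    branch_product_qty.groupby("branch")["quantity"].idxmax()
]
top_per_branch = top_rows.set_index("branch")["product_name"].to_dict()

print(top_per_branch)
```

Grouping by `city` instead of `branch` would give the per-city view that the section heading in the overview mentions.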

## 4. 💸 **Pricing and Revenue Analysis**:

#### Correlation between unit_price, quantity, tax, and total_price

In [16]:
fig1 = px.scatter_matrix(
    df,
    dimensions=["unit_price", "quantity", "tax", "total_price"],
    title="Correlation Between Unit Price, Quantity, Tax, and Total Price",
    color="branch",
    color_discrete_sequence=px.colors.qualitative.Plotly
)
fig1.update_layout(
    template="plotly_dark",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig1.show()
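
The scatter matrix shows the pairwise relationships visually; Pearson coefficients put a number on them. The sketch below computes the correlation matrix on the five `df.head()` rows as a stand-in for `df`. Because tax is described as a fixed percentage of the subtotal, `tax` and `total_price` should be almost perfectly correlated by construction, so that pair says little about customer behavior.

```python
import pandas as pd

# Five rows copied from the df.head() output (illustration only).
sample = pd.DataFrame({
    "unit_price": [5.5, 2.75, 1.2, 7.8, 3.5],
    "quantity": [3, 10, 15, 5, 7],
    "tax": [1.16, 1.93, 1.26, 2.73, 1.72],
    "total_price": [17.66, 29.43, 19.26, 41.73, 26.22],
})

# DataFrame.corr uses the Pearson coefficient by default.
corr = sample.corr()

print(corr.round(3))
```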

#### Products with the highest unit price but least sales (or vice versa)

In [17]:
product_price_sales = df.groupby("product_name").agg(
    total_quantity=("quantity", "sum"),
    avg_unit_price=("unit_price", "mean")
).reset_index()

fig2 = px.scatter(
    product_price_sales,
    x="avg_unit_price",
    y="total_quantity",
    text="product_name",
    title="Products: High Unit Price vs. Low Sales (or Vice Versa)",
    color="total_quantity",
    size="avg_unit_price",
    color_continuous_scale="Viridis"
)
fig2.update_traces(textposition="top center")
fig2.update_layout(
    template="plotly_dark",
    xaxis_title="Average Unit Price",
    yaxis_title="Total Quantity Sold",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig2.show()
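
One way to turn the scatter plot into a list of candidates is to split products at the median price and the median quantity. The median split is an assumption chosen for simplicity, not the only reasonable threshold, and the frame below holds just the five `df.head()` products in the same shape as `product_price_sales`.

```python
import pandas as pd

# Same columns as product_price_sales, filled from the five df.head() rows.
product_price_sales = pd.DataFrame({
    "product_name": ["Shampoo", "Notebook", "Apple", "Detergent", "Orange Juice"],
    "avg_unit_price": [5.5, 2.75, 1.2, 7.8, 3.5],
    "total_quantity": [3, 10, 15, 5, 7],
})

price_median = product_price_sales["avg_unit_price"].median()
qty_median = product_price_sales["total_quantity"].median()

# Expensive products that sell below the median quantity.
high_price_low_sales = product_price_sales[
    (product_price_sales["avg_unit_price"] > price_median)
    & (product_price_sales["total_quantity"] < qty_median)
]

# Cheap products that sell above the median quantity.
low_price_high_sales = product_price_sales[
    (product_price_sales["avg_unit_price"] < price_median)
    & (product_price_sales["total_quantity"] > qty_median)
]

print(high_price_low_sales["product_name"].tolist())
print(low_price_high_sales["product_name"].tolist())
```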


#### Tax contribution across different branches/cities

In [18]:
branch_tax = df.groupby("branch")["tax"].sum().reset_index()
city_tax = df.groupby("city")["tax"].sum().reset_index()

fig = make_subplots(
    rows=1, cols=2,  
    subplot_titles=("Tax Contribution Across Branches", "Tax Contribution Across Cities"),
    horizontal_spacing=0.2
)

fig.add_trace(
    go.Bar(
        x=branch_tax["branch"],
        y=branch_tax["tax"],
        text=branch_tax["tax"],
        marker=dict(color=px.colors.qualitative.Dark24),
        name="Branch Tax"
    ),
    row=1, col=1
)

# Add the bar chart for cities
fig.add_trace(
    go.Bar(
        x=city_tax["city"],
        y=city_tax["tax"],
        text=city_tax["tax"],
        marker=dict(color=px.colors.qualitative.Vivid),
        name="City Tax"
    ),
    row=1, col=2
)

fig.update_layout(
    template="plotly_dark",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black",
    xaxis1_title="Branch",
    yaxis1_title="Total Tax Contribution",
    xaxis2_title="City",
    yaxis2_title="Total Tax Contribution",
    showlegend=False 
)

fig.show()

## 5. 💎 **Reward Points Analysis**:

#### Average reward points earned per transaction

In [19]:
# sale_id is unique per transaction, so this per-sale mean is just each row's reward_points
avg_reward_points = df.groupby("sale_id")["reward_points"].mean().reset_index()
fig1 = px.histogram(
    avg_reward_points,
    x="reward_points",
    title="Average Reward Points Earned Per Transaction",
    color_discrete_sequence=["#f39c12"]
)
fig1.update_layout(
    template="plotly_dark",
    xaxis_title="Average Reward Points",
    yaxis_title="Count of Transactions",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig1.show()
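
Since only **Member** customers earn points, an average over all transactions mixes in the zeros from **Normal** customers. The sketch below compares the overall average with the per-type average on the five `df.head()` rows; the numbers describe that sample only.

```python
import pandas as pd

# Five rows copied from the df.head() output (illustration only).
sample = pd.DataFrame({
    "customer_type": ["Member", "Normal", "Member", "Normal", "Member"],
    "reward_points": [1, 0, 1, 0, 2],
})

# Average over every transaction, including Normal customers with 0 points.
avg_all = sample["reward_points"].mean()

# Average within each customer type.
avg_by_type = sample.groupby("customer_type")["reward_points"].mean()

print(round(avg_all, 2))
print(avg_by_type.round(2))
```

Reporting the Member-only average alongside the overall one makes it clearer how many points a rewarded purchase typically earns.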

#### Relationship between reward_points and total_price

In [20]:
fig2 = px.scatter(
    df,
    x="reward_points",
    y="total_price",
    title="Relationship Between Reward Points and Total Price",
    color="branch",
    color_discrete_sequence=px.colors.qualitative.Set1
)
fig2.update_layout(
    template="plotly_dark",
    xaxis_title="Reward Points",
    yaxis_title="Total Price",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig2.show()

#### Branches with the highest average reward points per sale

In [21]:
branch_avg_reward = df.groupby("branch")["reward_points"].mean().reset_index().sort_values(by="reward_points", ascending=False)
fig3 = px.bar(
    branch_avg_reward,
    x="branch",
    y="reward_points",
    title="Branches with the Highest Average Reward Points per Sale",
    text="reward_points",
    color="reward_points",
    color_continuous_scale="Blues"
)
fig3.update_layout(
    template="plotly_dark",
    xaxis_title="Branch",
    yaxis_title="Average Reward Points",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig3.show()

## 6. 📈 **Performance Metrics**:

#### Highest revenue-generating branch

In [22]:
branch_revenue = df.groupby("branch")["total_price"].sum().reset_index().sort_values(by="total_price", ascending=False)
fig1 = px.bar(
    branch_revenue,
    x="branch",
    y="total_price",
    title="Highest Revenue-Generating Branches",
    text="total_price",
    color="total_price",
    color_continuous_scale="Teal"
)
fig1.update_layout(
    template="plotly_dark",
    xaxis_title="Branch",
    yaxis_title="Total Revenue",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig1.show()

#### Highest revenue-generating product

In [23]:
product_revenue = df.groupby("product_name")["total_price"].sum().reset_index().sort_values(by="total_price", ascending=False)
fig2 = px.bar(
    product_revenue,
    x="product_name",
    y="total_price",
    title="Highest Revenue-Generating Products",
    text="total_price",
    color="total_price",
    color_continuous_scale="Viridis"
)
fig2.update_layout(
    template="plotly_dark",
    xaxis_title="Product Name",
    yaxis_title="Total Revenue",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig2.show()

#### Highest revenue-generating customer type

In [24]:
customer_type_revenue = df.groupby("customer_type")["total_price"].sum().reset_index().sort_values(by="total_price", ascending=False)
fig3 = px.bar(
    customer_type_revenue,
    x="customer_type",
    y="total_price",
    title="Revenue-Generating Customer Type",
    text="total_price",
    color="total_price",
    color_continuous_scale="Cividis"
)
fig3.update_layout(
    template="plotly_dark",
    xaxis_title="Customer Type",
    yaxis_title="Total Revenue",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig3.show()

## 7. 🧐 **Outliers and Anomalies**:

#### Identify extreme outliers for total_price using IQR (Interquartile Range)

In [25]:
Q1 = df['total_price'].quantile(0.25)
Q3 = df['total_price'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

In [26]:
outliers = df[(df['total_price'] < lower_bound) | (df['total_price'] > upper_bound)]
fig1 = px.scatter(
    outliers,
    x="sale_id",
    y="total_price",
    title="Outliers in Total Price",
    color="total_price",
    color_continuous_scale="Reds"
)
fig1.update_layout(
    template="plotly_dark",
    xaxis_title="Sale ID",
    yaxis_title="Total Price",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig1.show()
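
To make the IQR rule concrete, here is a small worked example on made-up prices (the `prices` values are hypothetical and not taken from the dataset). It follows the same steps as the cells above: quartiles, IQR, the 1.5 × IQR fences, then the filter.

```python
import pandas as pd

# Hypothetical total_price values with one obviously extreme entry.
prices = pd.Series([10, 12, 14, 16, 18, 20, 100], name="total_price")

q1 = prices.quantile(0.25)   # 13.0 with pandas' default linear interpolation
q3 = prices.quantile(0.75)   # 19.0
iqr = q3 - q1                # 6.0

lower_bound = q1 - 1.5 * iqr  # 4.0
upper_bound = q3 + 1.5 * iqr  # 28.0

# Keep only values outside the fences.
outliers = prices[(prices < lower_bound) | (prices > upper_bound)]

print(lower_bound, upper_bound)
print(outliers.tolist())
```

Only the value 100 falls outside the fences here. If the filter returns an empty frame on the real data, the scatter plot above will simply be blank.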

#### Identify the transactions with the highest quantity purchased

In [27]:
max_quantity = df.sort_values(by="quantity", ascending=False).head(10)
fig2 = px.bar(
    max_quantity,
    x="sale_id",
    y="quantity",
    title="Transactions with Highest Quantity Purchased",
    text="quantity",
    color="quantity",
    color_continuous_scale="Blues"
)
fig2.update_layout(
    template="plotly_dark",
    xaxis_title="Sale ID",
    yaxis_title="Quantity Purchased",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig2.show()

## 8. 🏙️ **Demographic Insights**:

#### Gender preference across product categories

In [28]:
gender_product_category = df.groupby(["gender", "product_category"])["total_price"].sum().reset_index()
fig1 = px.bar(
    gender_product_category,
    x="product_category",
    y="total_price",
    color="gender",
    title="Gender Preference for Product Categories",
    barmode="group",
    color_discrete_sequence=["#FF69B4", "#1E90FF"]
)
fig1.update_layout(
    template="plotly_dark",
    xaxis_title="Product Category",
    yaxis_title="Total Revenue",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig1.show()

#### Gender preference across branches

In [29]:
gender_branch = df.groupby(["gender", "branch"])["total_price"].sum().reset_index()
fig2 = px.bar(
    gender_branch,
    x="branch",
    y="total_price",
    color="gender",
    title="Gender Preference for Branches",
    barmode="group",
    color_discrete_sequence=["#FF69B4", "#1E90FF"]
)
fig2.update_layout(
    template="plotly_dark",
    xaxis_title="Branch",
    yaxis_title="Total Revenue",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig2.show()

#### Average spending by city

In [30]:
city_avg_spending = df.groupby("city")["total_price"].mean().reset_index()
fig3 = px.bar(
    city_avg_spending,
    x="city",
    y="total_price",
    title="Average Spending by City",
    color="total_price",
    color_continuous_scale="Viridis"
)
fig3.update_layout(
    template="plotly_dark",
    xaxis_title="City",
    yaxis_title="Average Spending",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig3.show()

#### Preferred product categories in each city

In [31]:
city_product_category = df.groupby(["city", "product_category"])["total_price"].sum().reset_index()
fig4 = px.bar(
    city_product_category,
    x="product_category",
    y="total_price",
    color="city",
    title="Preferred Product Categories in Each City",
    barmode="group",
    color_discrete_sequence=px.colors.qualitative.Set2
)
fig4.update_layout(
    template="plotly_dark",
    xaxis_title="Product Category",
    yaxis_title="Total Revenue",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig4.show()

## 9. 💰 **Tax and Reward System Efficiency**:

#### Calculate tax-to-total_price ratio

In [32]:
df['tax_to_total_price'] = df['tax'] / df['total_price']
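
A note on what to expect from this ratio: if tax is a flat 7% of the subtotal, as the Metadata section states, then total price is 1.07 times the subtotal and the ratio is 0.07 / 1.07 ≈ 0.0654 for every transaction. The sketch below checks this on the five `df.head()` rows; the `sample` frame is for illustration only.

```python
import pandas as pd

# Five rows copied from the df.head() output (illustration only).
sample = pd.DataFrame({
    "tax": [1.16, 1.93, 1.26, 2.73, 1.72],
    "total_price": [17.66, 29.43, 19.26, 41.73, 26.22],
})

ratio = sample["tax"] / sample["total_price"]

# Ratio implied by a flat 7% tax on the pre-tax subtotal.
expected = 0.07 / 1.07

# Largest deviation from the implied ratio across the sample rows.
max_gap = (ratio - expected).abs().max()

print(ratio.round(4).tolist())
print(round(expected, 4), round(max_gap, 5))
```

Under that assumption, any differences between branches or product categories in the charts that follow would come from rounding tax to two decimals, not from different tax treatment.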

#### Tax-to-total_price ratio by branch

In [33]:
tax_branch = df.groupby("branch")["tax_to_total_price"].mean().reset_index()
fig1 = px.bar(
    tax_branch,
    x="branch",
    y="tax_to_total_price",
    title="Tax-to-Total Price Ratio by Branch",
    color="tax_to_total_price",
    color_continuous_scale="Blues"
)
fig1.update_layout(
    template="plotly_dark",
    xaxis_title="Branch",
    yaxis_title="Tax-to-Total Price Ratio",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig1.show()

#### Tax-to-total_price ratio by product category

In [34]:
tax_product_category = df.groupby("product_category")["tax_to_total_price"].mean().reset_index()
fig2 = px.bar(
    tax_product_category,
    x="product_category",
    y="tax_to_total_price",
    title="Tax-to-Total Price Ratio by Product Category",
    color="tax_to_total_price",
    color_continuous_scale="Reds"
)
fig2.update_layout(
    template="plotly_dark",
    xaxis_title="Product Category",
    yaxis_title="Tax-to-Total Price Ratio",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig2.show()

#### Average total price at each reward points level

In [35]:
reward_effectiveness = df.groupby("reward_points")["total_price"].mean().reset_index()
fig3 = px.scatter(
    reward_effectiveness,
    x="reward_points",
    y="total_price",
    title="Effectiveness of Reward Points on Purchases",
    color="total_price",
    color_continuous_scale="Viridis"
)
fig3.update_layout(
    template="plotly_dark",
    xaxis_title="Reward Points",
    yaxis_title="Average Total Price",
    font=dict(color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)
fig3.show()
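
The chart above shows the average total price at each reward points level. An actual correlation coefficient can be computed with `Series.corr`, sketched below on the five `df.head()` rows. Two caveats apply: Normal customers always have 0 points, so it may be more informative to look at Members separately, and points are earned from the transaction amount, so a positive correlation among Members reflects how points are calculated and does not by itself show that rewards drive spending.

```python
import pandas as pd

# Five rows copied from the df.head() output (illustration only).
sample = pd.DataFrame({
    "customer_type": ["Member", "Normal", "Member", "Normal", "Member"],
    "total_price": [17.66, 29.43, 19.26, 41.73, 26.22],
    "reward_points": [1, 0, 1, 0, 2],
})

# Pearson correlation over all transactions.
r_all = sample["reward_points"].corr(sample["total_price"])

# Pearson correlation restricted to Member transactions.
members = sample[sample["customer_type"] == "Member"]
r_members = members["reward_points"].corr(members["total_price"])

print(round(r_all, 3), round(r_members, 3))
```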

## 🙏 **Thank You!**

Thank you for taking the time to explore this **Supermarket Sales Data Analysis** with me. I hope the insights and visualizations presented have provided valuable information and a deeper understanding of the supermarket's performance across various factors like **sales trends**, **customer behavior**, and **product performance**.

Feel free to reach out with any questions, comments, or suggestions. I would love to hear your thoughts! 😊

Wishing you all the best in your own data analysis journey. Happy analyzing! 🚀📊

---

**Muhammad Hassan Saboor**  
*Data Analyst | ML & Deep Learning Enthusiast*  
*December 2024*
