<div style="background-color:#f4f8ff; padding:16px; border-left:6px solid #1f4fd8; border-radius:6px; color:#000;">

<h2 style="margin-top:0; color:#000;">Data Segmentation Analysis</h2>

<h4 style="color:#000;">Purpose</h4>
<ul>
  <li>Group data into meaningful categories for targeted insights.</li>
  <li>Support customer segmentation, product categorization, and regional analysis.</li>
</ul>

<h4 style="color:#000;">SQL Constructs Used</h4>
<ul>
  <li><b>CASE</b> – Assigns each row a segment label based on custom conditions.</li>
  <li><b>GROUP BY</b> – Aggregates rows by segment label so each segment can be counted.</li>
</ul>

</div>
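The CASE plus GROUP BY pattern can be tried without a database server. The following is a minimal sketch using Python's built-in `sqlite3` module; the `products` table and its six rows are hypothetical sample data, not rows from `gold.dim_products`.

```python
import sqlite3

# Hypothetical sample data (name, cost); chosen to sit on the segment boundaries.
rows = [("A", 50), ("B", 100), ("C", 500), ("D", 750), ("E", 1000), ("F", 1500)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, cost REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)", rows)

segment_counts = conn.execute(
    """
    WITH product_segments AS (
        SELECT
            product_name,
            CASE                                  -- first matching branch wins
                WHEN cost < 100 THEN 'Below 100'
                WHEN cost <= 500 THEN '100-500'
                WHEN cost <= 1000 THEN '500-1000'
                ELSE 'Above 1000'
            END AS cost_range
        FROM products
    )
    SELECT cost_range, COUNT(*) AS total_products
    FROM product_segments
    GROUP BY cost_range
    ORDER BY total_products DESC, cost_range
    """
).fetchall()
conn.close()

print(segment_counts)
```

Because CASE evaluates its branches in order, a cost of exactly 500 lands in '100-500' and a cost of exactly 1000 lands in '500-1000'.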


<div style="background-color:#f4f8ff; padding:10px; border-left:6px solid #1f4fd8; border-radius:6px; color:#000; font-size:20px;">
  <b>1. Segment products into cost ranges and count how many products fall into each segment</b><br>
</div>

In [2]:
query = """
WITH product_segments AS (
    SELECT
        product_key,
        product_name,
        cost,
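        CASE 
            -- Branches are evaluated top to bottom; the first match wins.
            WHEN cost < 100 THEN 'Below 100'
            WHEN cost <= 500 THEN '100-500'    -- 100 <= cost <= 500
            WHEN cost <= 1000 THEN '500-1000'  -- 500 < cost <= 1000
            ELSE 'Above 1000'                  -- also catches NULL cost
        END AS cost_range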
    FROM gold.dim_products
)
SELECT 
    cost_range,
    COUNT(product_key) AS total_products
FROM product_segments
GROUP BY cost_range
ORDER BY total_products DESC;

"""

df = pd.read_sql(query, engine)
display(HTML(df.to_html(index=False)))

cost_range,total_products
Below 100,110
100-500,101
500-1000,45
Above 1000,39


<div style="background-color:#f4f8ff; padding:16px; border-left:6px solid #1f4fd8; border-radius:6px; color:#000;">

<strong style="font-size:20px;">
2. Group customers into three segments based on their spending behavior:
</strong>

<ul>
  <li>VIP: Customers with at least 12 months of history and spending more than €5,000.</li>
  <li>Regular: Customers with at least 12 months of history but spending €5,000 or less.</li>
  <li>New: Customers with less than 12 months of history (months between first and last order).</li>
</ul>

<strong>
Then count the total number of customers in each segment.
</strong>

</div>
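The three rules above can be written as a small Python function to check boundary cases before running the SQL. This is only a sketch: the function name `customer_segment` and its parameter names are illustrative and do not appear in the notebook.

```python
def customer_segment(lifespan_months: int, total_spending: float) -> str:
    """Return 'VIP', 'Regular', or 'New' using the thresholds listed above."""
    if lifespan_months >= 12 and total_spending > 5000:
        return "VIP"
    if lifespan_months >= 12:
        # At least 12 months of history, spending of 5,000 or less.
        return "Regular"
    # Less than 12 months of history, regardless of spending.
    return "New"


# Boundary checks with made-up values.
print(customer_segment(12, 5000))    # Regular
print(customer_segment(12, 5000.01)) # VIP
print(customer_segment(11, 10000))   # New
```

Note that a customer with high spending but a short history is still classified as New, because the history condition is checked in every VIP and Regular rule.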


In [3]:
query = """
WITH customer_spending AS (
    SELECT
        c.customer_key,
        SUM(f.sales_amount) AS total_spending,
        MIN(f.order_date) AS first_order,
        MAX(f.order_date) AS last_order,
        -- lifespan: months between the first and last order
        DATEDIFF(month, MIN(f.order_date), MAX(f.order_date)) AS lifespan
    FROM gold.fact_sales f
    LEFT JOIN gold.dim_customers c
        ON f.customer_key = c.customer_key
    GROUP BY c.customer_key
)
SELECT 
    customer_segment,
    COUNT(customer_key) AS total_customers
FROM (
    SELECT 
        customer_key,
        CASE 
            WHEN lifespan >= 12 AND total_spending > 5000 THEN 'VIP'
            WHEN lifespan >= 12 AND total_spending <= 5000 THEN 'Regular'
            ELSE 'New'
        END AS customer_segment
    FROM customer_spending
) AS segmented_customers
GROUP BY customer_segment
ORDER BY total_customers DESC;

"""

df = pd.read_sql(query, engine)
display(HTML(df.to_html(index=False)))

customer_segment,total_customers
New,14631
Regular,2198
VIP,1655
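The size of the New segment depends on how `lifespan` is measured. Assuming the query runs on SQL Server, `DATEDIFF(month, ...)` counts the calendar-month boundaries crossed between two dates rather than full elapsed months. The sketch below mimics that behavior in Python; the helper name `month_boundaries_crossed` and the dates are hypothetical.

```python
from datetime import date


def month_boundaries_crossed(start: date, end: date) -> int:
    """Count calendar-month boundaries between two dates, ignoring the day of month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# One day apart, but a month boundary is crossed.
print(month_boundaries_crossed(date(2013, 1, 31), date(2013, 2, 1)))   # 1

# Almost a full year apart, but only 11 boundaries are crossed.
print(month_boundaries_crossed(date(2013, 1, 1), date(2013, 12, 31)))  # 11

# About 11 months apart, yet 12 boundaries are crossed.
print(month_boundaries_crossed(date(2013, 1, 31), date(2014, 1, 1)))   # 12
```

Under this assumption, customers near the 12-month threshold can fall on either side of it depending on the day of the month of their first and last orders.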
