# Grouping Data in SQL

## Introduction

The `GROUP BY` clause collects rows that share the same values in one or more columns so that aggregate functions can be applied to each group.  
Instead of returning individual rows, a grouped query returns one summary row per category.

It is commonly used to compute metrics such as counts, totals, averages, minimums, and maximums for each group.

Grouping forms the foundation of analytical reporting, KPI calculation, and business intelligence workflows.

---

## Topics Covered

- `GROUP BY`
- `HAVING`
- `GROUP BY` with multiple columns

### Database Connection 

In [2]:
%reload_ext sql
%config SqlMagic.style = '_DEPRECATED_DEFAULT'
%sql mysql+pymysql://root:Bhavesh%402025@localhost/customers

### Database Testing 

In [3]:
%%sql
SELECT version();

 * mysql+pymysql://root:***@localhost/customers
1 rows affected.


version()
8.0.42


In [4]:
%%sql 
Select * from customers ;

 * mysql+pymysql://root:***@localhost/customers
5 rows affected.


customer_id,name,city,age,signup_date,spend_amount,customer_type,is_active
1,Rahul,Mumbai,24,2023-01-10,12000,Regular,0
2,Ananya,Delhi,27,2023-02-15,18000,Regular,1
3,Aman,Pune,22,2023-03-01,7000,Regular,1
4,Neha,Mumbai,29,2023-01-20,22000,Premium,0
5,Karan,Bangalore,31,2023-04-05,15000,Premium,1


## GROUP BY
`GROUP BY` groups rows that have the same values in the specified columns and applies the aggregate functions to each group.
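
As a minimal sketch of per-group aggregation, the snippet below loads the five sample rows into an in-memory SQLite database through Python's `sqlite3` module. SQLite is an assumption made here so the example runs without the MySQL server used in this notebook, and the table is trimmed to the columns the query needs. The name `summary` is illustrative only.

```python
import sqlite3

# Sample rows copied from the customers table shown above:
# (name, city, age, spend_amount, customer_type, is_active)
ROWS = [
    ("Rahul", "Mumbai", 24, 12000, "Regular", 0),
    ("Ananya", "Delhi", 27, 18000, "Regular", 1),
    ("Aman", "Pune", 22, 7000, "Regular", 1),
    ("Neha", "Mumbai", 29, 22000, "Premium", 0),
    ("Karan", "Bangalore", 31, 15000, "Premium", 1),
]

# Assumption: in-memory SQLite stands in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (name TEXT, city TEXT, age INTEGER, "
    "spend_amount INTEGER, customer_type TEXT, is_active INTEGER)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# One output row per customer_type; every aggregate is computed
# only over the rows that fall into that group.
summary = conn.execute(
    """
    SELECT customer_type,
           COUNT(*)          AS total_customers,
           SUM(spend_amount) AS total_spend,
           MIN(spend_amount) AS min_spend,
           MAX(spend_amount) AS max_spend
    FROM customers
    GROUP BY customer_type
    ORDER BY customer_type
    """
).fetchall()

for row in summary:
    print(row)
```

With the sample data this should print one tuple for `Premium` and one for `Regular`, each carrying its own count, total, minimum, and maximum spend.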

### How many customers are there in each city?

In [5]:
%%sql
select city,count(*) as total_customers 
from customers 
group by city ;

 * mysql+pymysql://root:***@localhost/customers
4 rows affected.


city,total_customers
Mumbai,2
Delhi,1
Pune,1
Bangalore,1


### What is the average age of customers in each city?

In [6]:
%%sql 
select city,avg(age) as average_age from customers 
group by city;

 * mysql+pymysql://root:***@localhost/customers
4 rows affected.


city,average_age
Mumbai,26.5
Delhi,27.0
Pune,22.0
Bangalore,31.0


## HAVING
`HAVING` filters aggregated results, not individual rows.  
`WHERE` → filters rows before grouping  
`HAVING` → filters groups after aggregation  
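
To make the difference concrete, here is a small sketch that uses both clauses in one query. It assumes an in-memory SQLite database (via Python's `sqlite3`) in place of the notebook's MySQL connection and reuses the five sample rows. The variable name `active_groups` is hypothetical.

```python
import sqlite3

# Sample rows copied from the customers table shown above:
# (name, city, age, spend_amount, customer_type, is_active)
ROWS = [
    ("Rahul", "Mumbai", 24, 12000, "Regular", 0),
    ("Ananya", "Delhi", 27, 18000, "Regular", 1),
    ("Aman", "Pune", 22, 7000, "Regular", 1),
    ("Neha", "Mumbai", 29, 22000, "Premium", 0),
    ("Karan", "Bangalore", 31, 15000, "Premium", 1),
]

# Assumption: in-memory SQLite stands in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (name TEXT, city TEXT, age INTEGER, "
    "spend_amount INTEGER, customer_type TEXT, is_active INTEGER)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# WHERE discards inactive customers before any groups are formed.
# HAVING then discards groups that are left with only one row.
active_groups = conn.execute(
    """
    SELECT customer_type, COUNT(*) AS active_customers
    FROM customers
    WHERE is_active = 1
    GROUP BY customer_type
    HAVING COUNT(*) > 1
    """
).fetchall()

print(active_groups)
```

Only the `Regular` group should survive: three customers are active, two of them are `Regular`, and the single active `Premium` customer is removed by `HAVING`.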

### Show only cities having more than 1 customer.

In [7]:
%%sql 
select city,count(*) as total_customers
from customers
group by city 
having count(*) > 1 

 * mysql+pymysql://root:***@localhost/customers
1 rows affected.


city,total_customers
Mumbai,2


### Find cities where average age is greater than 25.

In [8]:
%%sql 
select city,avg(age) as average_age 
from customers 
group by city
having average_age > 25

 * mysql+pymysql://root:***@localhost/customers
3 rows affected.


city,average_age
Mumbai,26.5
Delhi,27.0
Bangalore,31.0


## GROUP BY with Multiple Columns
Grouping by multiple columns creates sub-groups for deeper insights.
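
The effect of adding a second grouping column can be sketched by running the same count twice, once per city and once per city and customer type. As an assumption for portability, the example uses Python's `sqlite3` with an in-memory database rather than the MySQL server, and the names `by_city` and `by_city_type` are illustrative.

```python
import sqlite3

# Sample rows copied from the customers table shown above:
# (name, city, age, spend_amount, customer_type, is_active)
ROWS = [
    ("Rahul", "Mumbai", 24, 12000, "Regular", 0),
    ("Ananya", "Delhi", 27, 18000, "Regular", 1),
    ("Aman", "Pune", 22, 7000, "Regular", 1),
    ("Neha", "Mumbai", 29, 22000, "Premium", 0),
    ("Karan", "Bangalore", 31, 15000, "Premium", 1),
]

# Assumption: in-memory SQLite stands in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (name TEXT, city TEXT, age INTEGER, "
    "spend_amount INTEGER, customer_type TEXT, is_active INTEGER)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Single grouping column: one row per distinct city.
by_city = conn.execute(
    "SELECT city, COUNT(*) FROM customers GROUP BY city ORDER BY city"
).fetchall()

# Two grouping columns: one row per distinct (city, customer_type) pair,
# so the Mumbai group splits into a Premium and a Regular sub-group.
by_city_type = conn.execute(
    """
    SELECT city, customer_type, COUNT(*)
    FROM customers
    GROUP BY city, customer_type
    ORDER BY city, customer_type
    """
).fetchall()

print(len(by_city), "groups by city")
print(len(by_city_type), "groups by city and customer_type")
```

With the sample data, grouping by city alone yields four groups, while grouping by city and customer type yields five because Mumbai contains one customer of each type.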

### Count customers by city and age.

In [9]:
%%sql 
select city,age, count(*) as customers_count
from customers 
group by city, age 

 * mysql+pymysql://root:***@localhost/customers
5 rows affected.


city,age,customers_count
Mumbai,24,1
Delhi,27,1
Pune,22,1
Mumbai,29,1
Bangalore,31,1


### Customer count by city and signup month.

In [10]:
%%sql 
select city,
    EXTRACT(MONTH FROM signup_date) AS signup_month,
    count(*) AS total_customers
from customers
group by city, EXTRACT(MONTH FROM signup_date);

 * mysql+pymysql://root:***@localhost/customers
4 rows affected.


city,signup_month,total_customers
Mumbai,1,2
Delhi,2,1
Pune,3,1
Bangalore,4,1


### Analysis Questions (Grouping Data)

### 1. Which city has the highest average age?

In [11]:
%%sql
SELECT
    city,
    AVG(age) AS avg_age
FROM customers
GROUP BY city
ORDER BY avg_age DESC
LIMIT 1;

 * mysql+pymysql://root:***@localhost/customers
1 rows affected.


city,avg_age
Bangalore,31.0


### 2. Which signup month has the most customers?

In [12]:
%%sql
select extract(month from signup_date) as signup_month,
       count(*) as total_customers
from customers
group by signup_month
order by total_customers desc
limit 1;

 * mysql+pymysql://root:***@localhost/customers
1 rows affected.


signup_month,total_customers
1,2


### Which cities have a customer count above the overall average?

In [13]:
%%sql
SELECT
    city,
    COUNT(*) AS city_customers
FROM customers
GROUP BY city
HAVING COUNT(*) > (
    SELECT AVG(city_count)
    FROM (
        SELECT COUNT(*) AS city_count
        FROM customers
        GROUP BY city
    ) sub
);

 * mysql+pymysql://root:***@localhost/customers
1 rows affected.


city,city_customers
Mumbai,2


### Compare customers per city who signed up before Feb 2023 vs. in or after Feb 2023

In [14]:
%%sql
SELECT
    city,
    SUM(CASE WHEN signup_date < '2023-02-01' THEN 1 ELSE 0 END) AS before_feb,
    SUM(CASE WHEN signup_date >= '2023-02-01' THEN 1 ELSE 0 END) AS after_feb
FROM customers
GROUP BY city;

 * mysql+pymysql://root:***@localhost/customers
4 rows affected.


city,before_feb,after_feb
Mumbai,2,0
Delhi,0,1
Pune,0,1
Bangalore,0,1


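### Identify cities with consistent monthly signups
Here a city counts as consistent when its signups span more than one distinct month. No city in the sample data meets that condition, so the result is empty.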

In [15]:
%%sql
SELECT
    city,
    COUNT(DISTINCT EXTRACT(MONTH FROM signup_date)) AS active_months
FROM customers
GROUP BY city
HAVING COUNT(DISTINCT EXTRACT(MONTH FROM signup_date)) > 1;

 * mysql+pymysql://root:***@localhost/customers
0 rows affected.


city,active_months
