# Using DISTINCT, GROUP BY, and Aggregation in PostgreSQL

This notebook covers how to remove duplicates, group rows by columns, and perform aggregate calculations with PostgreSQL.

In [2]:
%load_ext sql

In [3]:
%sql postgresql://fahad:secret@localhost:5432/people

---
## 1. Create Sample Table

We'll create a `sales` table that will hold some duplicate entries, so we can demonstrate `DISTINCT` and `GROUP BY`.

In [4]:
%%sql
DROP TABLE IF EXISTS sales CASCADE;
CREATE TABLE sales (
    id SERIAL PRIMARY KEY,
    product VARCHAR(50),
    region VARCHAR(50),
    amount NUMERIC
);

 * postgresql://fahad:***@localhost:5432/people
Done.
Done.


[]

---
## 2. Insert Sample Data

In [5]:
%%sql
INSERT INTO sales (product, region, amount)
VALUES
('Laptop', 'Dubai', 1200),
('Laptop', 'Dubai', 1200),
('Laptop', 'Abu Dhabi', 1100),
('Phone', 'Dubai', 800),
('Phone', 'Abu Dhabi', 750),
('Tablet', 'Dubai', 400),
('Tablet', 'Dubai', 400);

 * postgresql://fahad:***@localhost:5432/people
7 rows affected.


[]

---
## 3. Using DISTINCT

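`DISTINCT` removes duplicate rows from the result, so each value (or combination of values) appears only once. The cell below runs two statements; only the result of the last one is displayed.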

In [6]:
%%sql
-- Unique products
SELECT DISTINCT product FROM sales;

-- Unique combinations of product and region
SELECT DISTINCT product, region FROM sales;

 * postgresql://fahad:***@localhost:5432/people
3 rows affected.
5 rows affected.


product,region
Laptop,Abu Dhabi
Phone,Abu Dhabi
Phone,Dubai
Tablet,Dubai
Laptop,Dubai
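
Because the cell above displays only its last result, the sketch below shows both `DISTINCT` queries side by side, plus `COUNT(DISTINCT ...)` for counting unique values. It is a minimal sketch that assumes SQLite (through Python's built-in `sqlite3`) as a stand-in for the PostgreSQL connection, so it runs without a server. The rows are copied from section 2, the variable names are illustrative, and `ORDER BY` is added to make the output order predictable.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the PostgreSQL database used in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, region TEXT, amount NUMERIC)"
)
conn.executemany(
    "INSERT INTO sales (product, region, amount) VALUES (?, ?, ?)",
    [("Laptop", "Dubai", 1200), ("Laptop", "Dubai", 1200), ("Laptop", "Abu Dhabi", 1100),
     ("Phone", "Dubai", 800), ("Phone", "Abu Dhabi", 750),
     ("Tablet", "Dubai", 400), ("Tablet", "Dubai", 400)],
)

# Unique values of a single column
unique_products = conn.execute(
    "SELECT DISTINCT product FROM sales ORDER BY product"
).fetchall()

# Unique combinations of two columns
unique_pairs = conn.execute(
    "SELECT DISTINCT product, region FROM sales ORDER BY product, region"
).fetchall()

# Number of unique values, without listing them
n_products = conn.execute("SELECT COUNT(DISTINCT product) FROM sales").fetchone()[0]

print(unique_products)    # [('Laptop',), ('Phone',), ('Tablet',)]
print(len(unique_pairs))  # 5
print(n_products)         # 3
```

The three SQL statements use only standard syntax, so they can be pasted into a `%%sql` cell unchanged.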


---
## 4. GROUP BY with Aggregation

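`GROUP BY` collapses rows that share the same value in the grouping columns into one row per group; aggregates such as `SUM`, `AVG`, `COUNT`, `MAX`, and `MIN` are then computed for each group.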

In [7]:
%%sql
-- Total sales per product
SELECT product, SUM(amount) AS total_sales
FROM sales
GROUP BY product;

-- Count of sales per region
SELECT region, COUNT(*) AS num_sales
FROM sales
GROUP BY region;

 * postgresql://fahad:***@localhost:5432/people
3 rows affected.
2 rows affected.


region,num_sales
Abu Dhabi,2
Dubai,5
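
Several aggregates can be computed in a single grouped query. The following is a minimal sketch under the assumption that SQLite (Python's built-in `sqlite3`) stands in for PostgreSQL so the example is self-contained. The rows are copied from section 2, and the column aliases are illustrative.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the PostgreSQL database used in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, region TEXT, amount NUMERIC)"
)
conn.executemany(
    "INSERT INTO sales (product, region, amount) VALUES (?, ?, ?)",
    [("Laptop", "Dubai", 1200), ("Laptop", "Dubai", 1200), ("Laptop", "Abu Dhabi", 1100),
     ("Phone", "Dubai", 800), ("Phone", "Abu Dhabi", 750),
     ("Tablet", "Dubai", 400), ("Tablet", "Dubai", 400)],
)

# One output row per product, with five aggregates computed over each group
summary = conn.execute(
    """
    SELECT product,
           COUNT(*)              AS num_sales,
           SUM(amount)           AS total_sales,
           MIN(amount)           AS min_sale,
           MAX(amount)           AS max_sale,
           ROUND(AVG(amount), 2) AS avg_sale
    FROM sales
    GROUP BY product
    ORDER BY product
    """
).fetchall()

for row in summary:
    print(row)
```

For this data the Laptop group has 3 sales totalling 3500, Phone has 2 totalling 1550, and Tablet has 2 totalling 800. PostgreSQL returns `NUMERIC` values for the average where SQLite returns floats, so the displayed formatting may differ slightly.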


---
## 5. GROUP BY Multiple Columns

Listing several columns in `GROUP BY` creates one group for each distinct combination of their values.

In [8]:
%%sql
SELECT product, region, SUM(amount) AS total_sales
FROM sales
GROUP BY product, region
ORDER BY product, region;

 * postgresql://fahad:***@localhost:5432/people
5 rows affected.


product,region,total_sales
Laptop,Abu Dhabi,1100
Laptop,Dubai,2400
Phone,Abu Dhabi,750
Phone,Dubai,800
Tablet,Dubai,800
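
Adding `COUNT(*)` to the multi-column grouping shows which combinations contain repeated rows, and the group keys themselves match what `SELECT DISTINCT product, region` returns. This is a minimal sketch, again assuming SQLite (Python's built-in `sqlite3`) in place of PostgreSQL, with rows copied from section 2 and illustrative variable names.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the PostgreSQL database used in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, region TEXT, amount NUMERIC)"
)
conn.executemany(
    "INSERT INTO sales (product, region, amount) VALUES (?, ?, ?)",
    [("Laptop", "Dubai", 1200), ("Laptop", "Dubai", 1200), ("Laptop", "Abu Dhabi", 1100),
     ("Phone", "Dubai", 800), ("Phone", "Abu Dhabi", 750),
     ("Tablet", "Dubai", 400), ("Tablet", "Dubai", 400)],
)

# One group per (product, region) combination, with row count and total
grouped = conn.execute(
    """
    SELECT product, region, COUNT(*) AS num_sales, SUM(amount) AS total_sales
    FROM sales
    GROUP BY product, region
    ORDER BY product, region
    """
).fetchall()

# The same combinations, obtained with DISTINCT instead of GROUP BY
distinct_pairs = conn.execute(
    "SELECT DISTINCT product, region FROM sales ORDER BY product, region"
).fetchall()

group_keys = [(product, region) for product, region, _, _ in grouped]
print(group_keys == distinct_pairs)  # True

for row in grouped:
    print(row)
```

Here the Laptop/Dubai and Tablet/Dubai groups each report `num_sales` of 2, which is where the duplicate rows inserted earlier end up.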


---
## 6. Filtering Groups with HAVING

Use `HAVING` to filter groups after aggregation, in the same way that `WHERE` filters individual rows before grouping.

In [9]:
%%sql
-- Only products with total sales above 1500
SELECT product, SUM(amount) AS total_sales
FROM sales
GROUP BY product
HAVING SUM(amount) > 1500;

 * postgresql://fahad:***@localhost:5432/people
2 rows affected.


product,total_sales
Phone,1550
Laptop,3500
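
To make the difference between `WHERE` and `HAVING` concrete, the sketch below runs the query above twice: once as written, and once with an added `WHERE region = 'Dubai'` filter. It is a minimal sketch that assumes SQLite (Python's built-in `sqlite3`) as a stand-in for PostgreSQL; the rows are copied from section 2 and the variable names are illustrative.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the PostgreSQL database used in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, region TEXT, amount NUMERIC)"
)
conn.executemany(
    "INSERT INTO sales (product, region, amount) VALUES (?, ?, ?)",
    [("Laptop", "Dubai", 1200), ("Laptop", "Dubai", 1200), ("Laptop", "Abu Dhabi", 1100),
     ("Phone", "Dubai", 800), ("Phone", "Abu Dhabi", 750),
     ("Tablet", "Dubai", 400), ("Tablet", "Dubai", 400)],
)

# HAVING alone: every row is grouped, then groups with a total of 1500 or less are dropped
having_only = conn.execute(
    """
    SELECT product, SUM(amount) AS total_sales
    FROM sales
    GROUP BY product
    HAVING SUM(amount) > 1500
    ORDER BY product
    """
).fetchall()

# WHERE + HAVING: rows outside Dubai are removed first, then the remaining rows are grouped
where_then_having = conn.execute(
    """
    SELECT product, SUM(amount) AS total_sales
    FROM sales
    WHERE region = 'Dubai'
    GROUP BY product
    HAVING SUM(amount) > 1500
    ORDER BY product
    """
).fetchall()

print(having_only)        # [('Laptop', 3500), ('Phone', 1550)]
print(where_then_having)  # [('Laptop', 2400)]
```

Phone drops out of the second result because only its Dubai sale (800) survives the `WHERE` filter, so its group total no longer exceeds 1500.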


---
## Notes

* `DISTINCT` removes duplicate rows from the result and works on one or multiple columns.
* `GROUP BY` forms one group per distinct combination of the grouping columns, so aggregates can be computed per group.
* Common aggregates: `SUM`, `AVG`, `COUNT`, `MIN`, `MAX`.
* Use `HAVING` to filter groups; use `WHERE` to filter rows before grouping.
* Without `ORDER BY`, PostgreSQL does not guarantee row order, so add it for readable, reproducible output, especially in notebooks meant for a portfolio.