![SuperStore_ERD](../../imgs/SuperStore_ERD.png)

# Cleaning a PostgreSQL Database
![Clean PostgreSQL Database](../../imgs/Project_Image.jpeg)

In this project, you will work with data from a hypothetical Super Store to practice and strengthen your SQL data-cleaning skills. The tasks include identifying the most profitable products in each category and detecting and imputing missing values.

## Data Dictionary:

### `orders`:
| Column | Definition | Data type | Comments |
|--------|------------|-----------|----------|
| `row_id` | Unique Record ID | `INTEGER` | |
| `order_id` | Identifier for each order in table | `TEXT` | Connects to `order_id` in `returned_orders` table |
| `order_date` | Date when order was placed | `TEXT` | |
| `market` | Market the order belongs to | `TEXT` | |
| `region` | Region the Customer belongs to | `TEXT` | Connects to `region` in `people` table |
| `product_id` | Identifier of Product bought | `TEXT` | Connects to `product_id` in `products` table |
| `sales` | Total Sales Amount for the Line Item | `DOUBLE PRECISION` | |
| `quantity` | Total Quantity for the Line Item | `DOUBLE PRECISION` | |
| `discount` | Discount applied for the Line Item | `DOUBLE PRECISION` | |
| `profit` | Total Profit earned on the Line Item | `DOUBLE PRECISION` | |

### `returned_orders`:
| Column | Definition | Data type |
|--------|------------|-----------|
| `returned` | `Yes` when the Order / Line Item was returned | `TEXT` |
| `order_id` | Identifier for each order in table | `TEXT` |
| `market` | Market the order belongs to | `TEXT` |

### `people`:
| Column | Definition | Data type |
|--------|------------|-----------|
| `person`| Name of Salesperson credited with Order | `TEXT` |
| `region` | Region the Salesperson is operating in | `TEXT` |

### `products`:
| Column | Definition | Data type |
|--------|------------|-----------|
| `product_id`| Unique Identifier for the Product | `TEXT` |
| `category` | Category Product belongs to | `TEXT` |
| `sub_category` | Sub Category Product belongs to | `TEXT` |
| `product_name` | Detailed Name of the Product | `TEXT` |
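The Comments column of the `orders` table describes how the tables connect. Before relying on those joins, it can help to check for keys that have no match. The query below is a minimal sketch of such a check for `product_id`; the alias `orphan_product_rows` is only illustrative, and the same pattern could be applied to `region` and `people`.

```sql
-- Sketch: count line items in orders whose product_id has no match in products.
-- A non-zero result means an inner join on product_id would silently drop rows.
select count(*) as orphan_product_rows
from orders as o
left join products as p
    on p.product_id = o.product_id
where p.product_id is null;
```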

As you can see in the Data Dictionary above, date fields are stored in the `orders` table as `TEXT`, and numeric fields such as `sales` and `profit` are stored as `DOUBLE PRECISION`. You will need to account for these types in some of the queries. This project is an excellent opportunity to apply your SQL skills in a practical setting and gain valuable experience in data cleaning and analysis. Good luck, and happy querying!
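As a rough illustration of what handling these types can look like, the sketch below parses the text date and rounds a numeric column. The `'YYYY-MM-DD'` format mask is an assumption, since the stored date format is not stated here; adjust it to match the data. The cast to `NUMERIC` is needed because PostgreSQL's two-argument `round` is defined for `NUMERIC`, not for `DOUBLE PRECISION`.

```sql
-- Sketch only: the 'YYYY-MM-DD' mask is an assumed format for order_date.
select
    order_id,
    to_date(order_date, 'YYYY-MM-DD') as order_date_parsed,  -- TEXT -> DATE
    round(cast(profit as numeric), 2) as profit_rounded      -- DOUBLE PRECISION -> NUMERIC
from orders
limit 5;
```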

```sql
-- top_five_products_each_category
-- Rank products within each category by total profit and keep the top five.
with tmp as (
    select
        p.category as category,
        p.product_name as product_name,
        sum(o.profit) as total_profit,
        sum(o.sales) as total_sales,
        row_number() over (
            partition by p.category
            order by sum(o.profit) desc
        ) as product_rank
    from products as p
    inner join orders as o
        on p.product_id = o.product_id
    group by p.category, p.product_name
)
select *
from tmp
where product_rank <= 5;
```

| category | product_name | total_profit | total_sales | product_rank |
|----------|--------------|--------------|-------------|--------------|
| Furniture | Sauder Classic Bookcase, Traditional | 10672.073 | 39108.303 | 1 |
| Furniture | Harbour Creations Executive Leather Armchair, ... | 10427.326 | 50121.516 | 2 |
| Furniture | Bush Classic Bookcase, Pine | 7477.4665 | 20887.0273 | 3 |
| Furniture | SAFCO Executive Leather Armchair, Black | 7154.28 | 41923.53 | 4 |
| Furniture | Dania Classic Bookcase, Pine | 6565.0146 | 25630.4946 | 5 |
| Office Supplies | Hoover Stove, Red | 11651.681 | 32644.131 | 1 |
| Office Supplies | Fellowes PB500 Electric Punch Plastic Comb Bin... | 7753.039 | 27453.384 | 2 |
| Office Supplies | Rogers Lockers, Single Width | 6755.184 | 20493.364 | 3 |
| Office Supplies | Eldon Lockers, Industrial | 6485.4615 | 17825.3415 | 4 |
| Office Supplies | Hamilton Beach Stove, Silver | 5989.749 | 28657.049 | 5 |


```sql
-- salesperson_market_sales_details
-- Note: order_counts counts distinct orders, while orders_returned and the
-- sales sums are computed over the joined line-item rows.
select
    ppl.person as person,
    ord.market as market,
    -- bin each line item by its sales amount
    case
        when ord.sales < 100 then '0-100'
        when ord.sales < 500 then '100-500'
        else '500+'
    end as sales_bin,
    count(distinct ord.order_id) as order_counts,
    sum(case when rtn_ord.returned = 'Yes' then 1 else 0 end) as orders_returned,
    sum(ord.sales) as total_sales,
    sum(case when rtn_ord.returned = 'Yes' then ord.sales else 0 end) as returned_sales
from people as ppl
inner join orders as ord
    on ppl.region = ord.region
left join returned_orders as rtn_ord
    on ord.order_id = rtn_ord.order_id
    and ord.market = rtn_ord.market
group by sales_bin, ppl.person, ord.market
order by ppl.person, ord.market, sales_bin;
```

| person | market | sales_bin | order_counts | orders_returned | total_sales | returned_sales |
|--------|--------|-----------|--------------|-----------------|-------------|----------------|
| Alejandro Ballentine | APAC | 0-100 | 995 | 69 | 66548.57 | 3440.565 |
| Alejandro Ballentine | APAC | 100-500 | 797 | 59 | 263646.2 | 15190.2387 |
| Alejandro Ballentine | APAC | 500+ | 428 | 25 | 554228.4 | 25047.594 |
| Anna Andreadi | EU | 0-100 | 1662 | 159 | 123939.8 | 7951.059 |
| Anna Andreadi | EU | 100-500 | 1524 | 162 | 529997.1 | 37894.53 |
| Anna Andreadi | EU | 500+ | 818 | 61 | 1066616.0 | 61685.2095 |
| Anna Andreadi | LATAM | 0-100 | 1080 | 77 | 70412.08 | 2964.17896 |
| Anna Andreadi | LATAM | 100-500 | 734 | 43 | 217929.4 | 10151.38944 |
| Anna Andreadi | LATAM | 500+ | 286 | 9 | 312168.6 | 9244.36624 |
| Anna Andreadi | US | 0-100 | 904 | 0 | 42547.38 | 0.0 |


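```sql
-- impute_missing_values
-- Line items whose quantity is missing.
with missing as (
    select product_id, discount, market, region, sales, quantity
    from orders as ord
    where ord.quantity is null
)
-- For each missing row, take the unit price (sales / quantity) from rows with
-- the same product and discount, then divide the missing row's sales by it.
-- The original right join is written as an inner join: the filter on
-- ord.quantity already discarded unmatched rows, so the result is the same.
select distinct
    mis.*,
    round(cast(mis.sales / (ord.sales / ord.quantity) as numeric), 0) as calculated_quantity
from missing as mis
inner join orders as ord
    on mis.product_id = ord.product_id
    and mis.discount = ord.discount
where ord.quantity is not null
order by product_id;
```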

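| product_id | discount | market | region | sales | quantity | calculated_quantity |
|------------|----------|--------|--------|-------|----------|---------------------|
| FUR-ADV-10000571 | 0.0 | EMEA | EMEA | 438.96 | | 4 |
| FUR-ADV-10004395 | 0.0 | EMEA | EMEA | 84.12 | | 2 |
| FUR-BO-10001337 | 0.15 | US | West | 308.499 | | 3 |
| TEC-STA-10003330 | 0.0 | Africa | Africa | 506.64 | | 2 |
| TEC-STA-10004542 | 0.0 | Africa | Africa | 160.32 | | 4 |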
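
The imputation above divides a row's sales by a unit price taken from comparable rows. The standalone sketch below replays that arithmetic on literal values. Only the sales figure 438.96 comes from the output above; the reference row (329.22 in sales for 3 units) is a made-up example, not a row from the dataset.

```sql
-- Sketch with hypothetical reference values (ref_sales, ref_quantity are invented).
with example (missing_sales, ref_sales, ref_quantity) as (
    values (438.96::double precision, 329.22::double precision, 3::double precision)
)
select
    ref_sales / ref_quantity as unit_price,
    round(cast(missing_sales / (ref_sales / ref_quantity) as numeric), 0) as calculated_quantity
from example;
```

With these numbers the unit price is about 109.74, so `calculated_quantity` comes out as 4.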
