# Smart E-commerce Catalog Data Analysis

## 2. Data Analysis Part 1 - SQL Based EDA on E-commerce Data

### 1. Basic Exploration:
- Getting the total number of products

In [0]:
%sql 
-- 1. Total Number of Products
SELECT COUNT(DISTINCT product_id) AS total_products 
FROM product_db.product_table;

total_products
1000


- There are 1000 products in the catalog.
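`COUNT(DISTINCT product_id)` counts unique IDs, so it matches the row count only when no ID is repeated. The following is a minimal sketch of that check using Python's built-in `sqlite3` module on a made-up three-row table; the table contents are hypothetical and are not the catalog data.

```python
import sqlite3

# Hypothetical toy table: product_id 2 appears twice on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_table (product_id INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO product_table VALUES (?, ?)",
    [(1, "Toys"), (2, "Books"), (2, "Books")],
)

total_rows, distinct_ids = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT product_id) FROM product_table"
).fetchone()

# A gap between the two counts signals repeated product_id values.
duplicate_rows = total_rows - distinct_ids
print(total_rows, distinct_ids, duplicate_rows)
conn.close()
```

Running both counts side by side against `product_db.product_table` would be one way to confirm that `product_id` is unique there.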

- Grouping products by category and showing the count in descending order

In [0]:
%sql
-- 2. Category Distribution
SELECT category, COUNT(*) AS count 
FROM product_db.product_table 
GROUP BY category 
ORDER BY count DESC;


category,count
Toys,207
Books,206
Home,200
Electronics,199
Clothing,188


There are:

- `207` from category `Toys`

- `206` from category `Books`

- `200` from category `Home`

- `199` from category `Electronics`

- `188` from category `Clothing`
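To put these counts in proportion, the short sketch below converts them into percentage shares of the catalog. The counts are copied from the query output; the variable names are only illustrative.

```python
# Counts copied from the category distribution output.
category_counts = {
    "Toys": 207,
    "Books": 206,
    "Home": 200,
    "Electronics": 199,
    "Clothing": 188,
}

total = sum(category_counts.values())

# Share of the catalog per category, in percent.
shares = {cat: round(n * 100 / total, 1) for cat, n in category_counts.items()}

for cat, pct in shares.items():
    print(f"{cat}: {pct}%")
```

The counts sum to 1000, which agrees with the total number of products, and every category holds roughly a fifth of the catalog.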

- Calculating the percentage of missing values in the price and rating columns

In [0]:
%sql
-- 3. Missing Data Percentages
SELECT 
    COUNT(CASE WHEN price IS NULL THEN 1 END) * 100.0 / COUNT(*) AS price_missing_pct,
    COUNT(CASE WHEN rating IS NULL THEN 1 END) * 100.0 / COUNT(*) AS rating_missing_pct
FROM product_db.product_table;

price_missing_pct,rating_missing_pct
0.3,0.0


- `price_missing_pct`: `0.3%` of the entries in the price column are missing (set to NULL), which corresponds to 3 of the 1000 products.
- `rating_missing_pct`: the result may be displayed as `0E-14`. This is exactly zero written in scientific notation (a decimal zero with 14 fractional digits), not a small nonzero number. In other words, it is `0%`: there are no missing values in the rating column.
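The reading of `0E-14` can be checked with Python's `decimal` module, which keeps the exponent of a decimal zero in much the same way. This is only a sketch, assuming the SQL result is a fixed-scale decimal; it also converts the price percentage back into a row count.

```python
from decimal import Decimal

# 0E-14 is zero with an exponent of -14, not a tiny positive number.
rating_missing_pct = Decimal("0E-14")
print(rating_missing_pct == 0)  # exact comparison against zero

# Convert the price percentage back into a number of rows.
total_products = 1000
price_missing_pct = Decimal("0.3")
missing_price_rows = price_missing_pct * total_products / 100
print(missing_price_rows)
```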

### 2. Additional Metrics:
- Average and median prices per category:

In [0]:
%sql
SELECT 
    category, 
    AVG(price) AS avg_price, 
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY price) AS median_price
FROM product_db.product_table 
WHERE price IS NOT NULL
GROUP BY category;


category,avg_price,median_price
Home,265.2524508237839,271.5400085449219
Electronics,240.55414140585697,213.01499938964844
Clothing,243.02590379309143,242.629997253418
Books,256.9183824529835,254.7199935913086
Toys,265.2074882166397,271.25


**Key Insights**
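- The `Home` category has the highest average (about 265.25) and median (about 271.54) prices, with `Toys` nearly identical on both measures (about 265.21 and 271.25).
- The `Electronics` category has both the lowest average (about 240.55) and the lowest median (about 213.01) price. Its average exceeds its median by about 27.5, the largest gap of any category, which suggests a few high-priced items pulling the mean up.
- `Books` and `Toys` have fairly close average prices (about 256.92 and 265.21), but `Toys` has a noticeably higher median (271.25 versus about 254.72).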
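Comparing each category's mean with its median gives a rough hint about skew in the price distribution. The sketch below computes that gap from the values in the query output, rounded to two decimals; treating the gap as a skew indicator is a rule of thumb rather than a formal test.

```python
# (avg_price, median_price) copied from the query output, rounded to 2 decimals.
price_stats = {
    "Home": (265.25, 271.54),
    "Electronics": (240.55, 213.01),
    "Clothing": (243.03, 242.63),
    "Books": (256.92, 254.72),
    "Toys": (265.21, 271.25),
}

# Positive gap: mean above median (hint of a right tail); negative: the reverse.
gaps = {cat: round(avg - med, 2) for cat, (avg, med) in price_stats.items()}

for cat, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cat}: {gap:+.2f}")

# Category whose mean and median disagree the most.
largest_gap = max(gaps, key=lambda cat: abs(gaps[cat]))
```

`Electronics` stands out with the largest gap, while `Home` and `Toys` have medians slightly above their means.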


- Average and total stock levels per category:

In [0]:
%sql
SELECT 
    category, 
    AVG(stock) AS avg_stock, 
    SUM(stock) AS total_stock 
FROM product_db.product_table 
GROUP BY category 
ORDER BY total_stock DESC;

category,avg_stock,total_stock
Toys,51.03864734299517,10565
Books,50.74271844660194,10453
Clothing,52.09574468085106,9794
Electronics,48.8643216080402,9724
Home,46.125,9225


**Key Insights**
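- The `Clothing` category has the highest average stock (about 52.10), closely followed by `Toys` (about 51.04).
- The `Home` category has the lowest average stock (about 46.13), with `Electronics` not far behind (about 48.86).
- The `Toys` category leads in total stock (10565 units), helped by having the most products (207). `Clothing` ranks only third in total stock despite its high average because it has the fewest products (188).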
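Average and total stock can rank categories differently because the categories differ in size. The sketch below combines the product counts from the category distribution with the stock figures to compare both rankings and to check that `avg_stock * count` reproduces `total_stock`. Because `AVG` ignores NULLs while `COUNT(*)` does not, a match is consistent with the stock column having no missing values, although that is an inference rather than something the queries test directly.

```python
# (product_count, avg_stock, total_stock) copied from the query outputs.
stock_stats = {
    "Toys": (207, 51.03864734299517, 10565),
    "Books": (206, 50.74271844660194, 10453),
    "Clothing": (188, 52.09574468085106, 9794),
    "Electronics": (199, 48.8643216080402, 9724),
    "Home": (200, 46.125, 9225),
}

# avg_stock * product_count should reproduce total_stock.
consistent = {
    cat: round(avg * n) == total for cat, (n, avg, total) in stock_stats.items()
}

# Rank categories by average stock and by total stock, highest first.
rank_by_avg = sorted(stock_stats, key=lambda c: stock_stats[c][1], reverse=True)
rank_by_total = sorted(stock_stats, key=lambda c: stock_stats[c][2], reverse=True)

print(consistent)
print(rank_by_avg)
print(rank_by_total)
```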