# **Exploratory data analysis**

Understand our datasets and find insights in them.

In [None]:
-- Select the correct database
USE DataWarehouse
-- Explore all objects (tables and views) in the database
SELECT * FROM INFORMATION_SCHEMA.TABLES

In [None]:
-- Explore all columns of the crm_cust_info table
SELECT * FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'crm_cust_info'

## **Dimension Exploration**

### **Purpose**

- **To explore the structure of dimension tables**

**SQL Functions Used**

1. **DISTINCT**
2. **ORDER BY**

In [None]:
-- Explore distinct country values to see which countries the customers come from
SELECT DISTINCT
    country
FROM
    gold.dim_customers
ORDER BY country


In [None]:
-- Explore all categories (the major divisions), with their subcategories and products
SELECT DISTINCT 
    category, subcategory, product_name 
FROM 
    gold.dim_products
ORDER BY category, subcategory, product_name
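`DISTINCT` removes duplicate rows across *all* selected columns, so adding columns to the select list can only keep or increase the number of rows returned. A minimal sketch on a few made-up rows, held in a hypothetical temp table `#toy_products` (not the real `gold.dim_products` data), assuming SQL Server 2016 or later for `DROP TABLE IF EXISTS`:

```sql
-- Hypothetical toy rows standing in for gold.dim_products (made-up values)
DROP TABLE IF EXISTS #toy_products;

CREATE TABLE #toy_products (
    category     NVARCHAR(50),
    subcategory  NVARCHAR(50),
    product_name NVARCHAR(50)
);

INSERT INTO #toy_products (category, subcategory, product_name) VALUES
    (N'Bikes',       N'Road Bikes',     N'Road-150'),
    (N'Bikes',       N'Road Bikes',     N'Road-250'),
    (N'Bikes',       N'Mountain Bikes', N'Mountain-100'),
    (N'Accessories', N'Helmets',        N'Sport-100 Helmet'),
    (N'Accessories', N'Helmets',        N'Sport-100 Helmet');  -- exact duplicate of the row above

-- One column: one row per category
SELECT DISTINCT category
FROM #toy_products
ORDER BY category;

-- Two columns: one row per unique (category, subcategory) pair
SELECT DISTINCT category, subcategory
FROM #toy_products
ORDER BY category, subcategory;

-- Three columns: only the exact duplicate row collapses
SELECT DISTINCT category, subcategory, product_name
FROM #toy_products
ORDER BY category, subcategory, product_name;
```

With these rows the three queries return 2, 3 and 4 rows respectively; the fifth row disappears from the last query because it matches the fourth in every selected column.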

## **Date Range Exploration**

**Purpose**

- **To determine the temporal boundaries of key data points**
- **To understand the range of historical data**

**SQL Functions Used**

1. **MIN**
2. **MAX**
3. **DATEDIFF**

In [None]:
-- Find the first order date (MIN) and the latest order date (MAX),
-- and how many calendar years the order history spans
SELECT
    MIN(order_date) AS first_order_date,
    MAX(order_date) AS last_order_date,
    DATEDIFF(year, MIN(order_date), MAX(order_date)) AS order_range_years
FROM gold.fact_sales

In [None]:
-- Find the youngest and oldest customers
SELECT
    MIN(birth_date) AS oldest_birth_date,
    MAX(birth_date) AS youngest_birth_date,
    DATEDIFF(year, MIN(birth_date), GETDATE()) AS oldest_customer_age,
    DATEDIFF(year, MAX(birth_date), GETDATE()) AS youngest_customer_age,
    DATEDIFF(year, MIN(birth_date), MAX(birth_date)) AS age_range_years
FROM gold.dim_customers
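`DATEDIFF(year, start, end)` counts how many calendar-year boundaries lie between the two dates, not how many full years have elapsed, so an age computed this way can be one year too high for anyone whose birthday has not yet come round in the current year. A minimal sketch of one common correction, on made-up birth dates and a fixed, hypothetical as-of date `@as_of` so the output is reproducible (in a real query it would be today's date), assuming SQL Server 2016 or later:

```sql
-- Hypothetical birth dates and a fixed as-of date (made-up values)
DROP TABLE IF EXISTS #toy_ages;
DECLARE @as_of DATE = '2025-01-01';

SELECT
    b.birth_date,
    -- Counts year boundaries crossed, not completed years
    DATEDIFF(year, b.birth_date, @as_of) AS year_boundaries,
    -- Subtract 1 when this year's birthday falls after the as-of date
    DATEDIFF(year, b.birth_date, @as_of)
        - CASE
              WHEN DATEADD(year, DATEDIFF(year, b.birth_date, @as_of), b.birth_date) > @as_of
              THEN 1
              ELSE 0
          END AS exact_age
INTO #toy_ages
FROM (VALUES
    (CAST('1990-01-01' AS DATE)),
    (CAST('1990-06-15' AS DATE)),
    (CAST('2000-12-31' AS DATE))
) AS b(birth_date);

SELECT *
FROM #toy_ages
ORDER BY birth_date;
```

For a birth date of 2000-12-31, `DATEDIFF` reports 25 year boundaries as of 2025-01-01, while the corrected expression gives 24.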

## **Measures Exploration**

**Purpose**

- To calculate the key metrics of the business (the "big numbers")
- To identify overall trends
- To identify outliers

In [None]:

-- Generate a report that shows all key metrics of the business
SELECT 'Total sales' AS [Measure Name], SUM(sales) AS [Measure Value] FROM gold.fact_sales
UNION ALL
SELECT 'Total quantity', SUM(quantity) FROM gold.fact_sales
UNION ALL
SELECT 'Average price', AVG(price) FROM gold.fact_sales
UNION ALL
SELECT 'Total order lines', COUNT(order_number) FROM gold.fact_sales
UNION ALL
SELECT 'Total orders', COUNT(DISTINCT order_number) FROM gold.fact_sales
UNION ALL
SELECT 'Total products', COUNT(product_id) FROM gold.dim_products
UNION ALL
SELECT 'Total customers', COUNT(customer_id) FROM gold.dim_customers
UNION ALL
SELECT 'Total customers who ordered', COUNT(DISTINCT customer_key) FROM gold.fact_sales
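The metrics report relies on `COUNT(*)`, `COUNT(column)`, `COUNT(DISTINCT column)` and `AVG`, which treat duplicates, NULLs and integer inputs differently. A minimal sketch on four made-up order lines in a hypothetical temp table `#toy_sales` shows the differences. Whether `AVG(price)` truncates in the real report depends on the actual data type of `price`; an integer column is an assumption here. Assumes SQL Server 2016 or later.

```sql
-- Hypothetical order lines: order SO1 has two lines, one line has no order_number
DROP TABLE IF EXISTS #toy_sales;
DROP TABLE IF EXISTS #toy_counts;

CREATE TABLE #toy_sales (
    order_number NVARCHAR(10) NULL,
    customer_key INT,
    price        INT
);

INSERT INTO #toy_sales (order_number, customer_key, price) VALUES
    (N'SO1', 1, 10),
    (N'SO1', 1, 15),
    (N'SO2', 2, 20),
    (NULL,   2, 5);

SELECT
    COUNT(*)                           AS all_rows,              -- every row, NULLs included
    COUNT(order_number)                AS non_null_order_lines,  -- skips the NULL order_number
    COUNT(DISTINCT order_number)       AS distinct_orders,       -- SO1 counted once, NULL ignored
    COUNT(DISTINCT customer_key)       AS customers_who_ordered,
    AVG(price)                         AS avg_price_int,         -- integer input: result is truncated
    AVG(CAST(price AS DECIMAL(10, 2))) AS avg_price_decimal      -- decimal input keeps the fraction
INTO #toy_counts
FROM #toy_sales;

SELECT * FROM #toy_counts;
```

For these rows the counts are 4, 3, 2 and 2; the integer average is 12 while the decimal average is 12.5 (SQL Server may display extra trailing zeros).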

## **Magnitude Analysis**

Compare measure values across different categories and dimensions.

**Purpose:**

- To quantify data and group results by specific dimensions.
- To understand how data is distributed across categories.

**SQL Functions Used:**

- Aggregate functions: SUM(), COUNT(), AVG()
- Clauses: GROUP BY, ORDER BY

In [None]:
-- Find total customers by country
SELECT
    country,
    COUNT(customer_id) AS [Total customers]
FROM
    gold.dim_customers
GROUP BY country
ORDER BY [Total customers] DESC

In [None]:
-- Find total customers by gender
SELECT
    gender,
    COUNT(customer_key) AS [Total customers]
FROM
    gold.dim_customers
GROUP BY gender

In [None]:
-- Total products by category
SELECT
    category,
    COUNT(product_key) AS [Total products]
FROM
    gold.dim_products
GROUP BY category
ORDER BY [Total products] DESC


In [None]:
-- Average cost in each category
SELECT
    category,
    AVG(cost) AS [Avg cost]
FROM
    gold.dim_products
GROUP BY category
ORDER BY [Avg cost]

In [None]:
-- Total revenue for each category
SELECT
    p.category,
    SUM(f.sales) AS [Total sales]
FROM
    gold.fact_sales AS f
    LEFT JOIN
    gold.dim_products AS p
    ON f.product_key = p.product_key
GROUP BY
    p.category
ORDER BY [Total sales] DESC
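The category revenue query joins `gold.fact_sales` to the product dimension with a `LEFT JOIN`, which keeps every sale even when its key has no match in the dimension. A minimal sketch on a hypothetical miniature star schema (`#toy_dim_products` and `#toy_fact_sales`, made-up values, SQL Server 2016 or later) shows where such a sale ends up: in a group whose category is NULL. It also orders by a bracketed alias, which SQL Server accepts; a quoted string such as `'Total sales'` in `ORDER BY` is a constant and is rejected.

```sql
-- Hypothetical mini star schema: product_key 3 is in the fact table but not in the dimension
DROP TABLE IF EXISTS #toy_fact_sales;
DROP TABLE IF EXISTS #toy_dim_products;

CREATE TABLE #toy_dim_products (product_key INT PRIMARY KEY, category NVARCHAR(50));
INSERT INTO #toy_dim_products (product_key, category) VALUES
    (1, N'Bikes'),
    (2, N'Accessories');

CREATE TABLE #toy_fact_sales (product_key INT, sales INT);
INSERT INTO #toy_fact_sales (product_key, sales) VALUES
    (1, 100),
    (1, 200),
    (2, 15),
    (3, 30);  -- no matching product in the dimension

-- LEFT JOIN keeps every fact row; the unmatched sale forms a NULL category group
-- (an INNER JOIN would drop it silently)
SELECT
    p.category,
    SUM(f.sales) AS [Total sales]
FROM #toy_fact_sales AS f
LEFT JOIN #toy_dim_products AS p
    ON f.product_key = p.product_key
GROUP BY p.category
ORDER BY [Total sales] DESC;
```

With these rows the result is Bikes 300, NULL 30 and Accessories 15; an `INNER JOIN` would total only 315 and lose the unmatched 30 without warning.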

In [None]:
-- Total revenue generated by each customer
SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    SUM(f.sales) AS [Total sales]
FROM
    gold.fact_sales AS f
    LEFT JOIN
    gold.dim_customers AS c
    ON f.customer_key = c.customer_key
GROUP BY
    c.customer_id,
    c.first_name,
    c.last_name
ORDER BY [Total sales] DESC

In [None]:
-- Distribution of sales revenue across countries
SELECT
    c.country,
    SUM(f.sales) AS [Total sales]
FROM
    gold.fact_sales AS f
    LEFT JOIN
    gold.dim_customers AS c
    ON f.customer_key = c.customer_key
GROUP BY
    c.country
ORDER BY [Total sales] DESC

## **Ranking Analysis**

**Purpose:**

- To rank items (e.g., products, customers) based on performance or other metrics.
- To identify top performers or laggards.

**SQL Functions Used:**

- Window ranking functions: RANK(), DENSE\_RANK(), ROW\_NUMBER()
- Row limiting: TOP
- Clauses: GROUP BY, ORDER BY
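The three window ranking functions differ only in how they handle ties. A minimal sketch on made-up revenue figures, where products B and C tie, stores the three rankings side by side in a hypothetical temp table `#toy_ranked` (SQL Server 2016 or later). `ROW_NUMBER()` needs a tie-breaker to be deterministic; `product_name` is an arbitrary choice here.

```sql
-- Hypothetical per-product revenue; B and C are tied (made-up values)
DROP TABLE IF EXISTS #toy_ranked;

SELECT
    product_name,
    revenue,
    -- ROW_NUMBER: always 1, 2, 3, ...; product_name breaks the tie
    ROW_NUMBER() OVER (ORDER BY revenue DESC, product_name) AS row_number_rank,
    -- RANK: tied rows share a rank and the next rank is skipped
    RANK()       OVER (ORDER BY revenue DESC)               AS rank_with_gaps,
    -- DENSE_RANK: tied rows share a rank and no rank is skipped
    DENSE_RANK() OVER (ORDER BY revenue DESC)               AS dense_rank_no_gaps
INTO #toy_ranked
FROM (VALUES
    (N'A', 500),
    (N'B', 300),
    (N'C', 300),
    (N'D', 100)
) AS r(product_name, revenue);

SELECT *
FROM #toy_ranked
ORDER BY row_number_rank;
```

Products A, B, C and D get `ROW_NUMBER` 1, 2, 3, 4; `RANK` 1, 2, 2, 4; and `DENSE_RANK` 1, 2, 2, 3. A filter such as `rank <= 5` therefore returns exactly five rows with `ROW_NUMBER`, but possibly more with `RANK` or `DENSE_RANK` when there are ties.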

In [None]:
-- Top 5 products generating the highest revenue
SELECT TOP 5
    p.product_name AS [product name],
    SUM(f.sales) AS [Total revenue]
FROM
    gold.dim_products AS p
    RIGHT JOIN
    gold.fact_sales AS f
    ON p.product_key = f.product_key
GROUP BY p.product_name
ORDER BY [Total revenue] DESC

In [None]:
-- Bottom 5 products by revenue
SELECT TOP 5
    p.product_name AS [product name],
    SUM(f.sales) AS [Total revenue]
FROM
    gold.dim_products AS p
    RIGHT JOIN
    gold.fact_sales AS f
    ON p.product_key = f.product_key
GROUP BY p.product_name
ORDER BY [Total revenue] ASC
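`TOP n` returns exactly n rows, so when several products tie at the cut-off, which of them appears in a top-5 or bottom-5 list is not guaranteed. T-SQL's `TOP n WITH TIES` also returns every row that ties with the last one. A minimal sketch on made-up revenues in a hypothetical temp table `#toy_revenue` (SQL Server 2016 or later):

```sql
-- Hypothetical revenues: B and C tie for second-lowest (made-up values)
DROP TABLE IF EXISTS #toy_revenue;

CREATE TABLE #toy_revenue (product_name NVARCHAR(50), revenue INT);
INSERT INTO #toy_revenue (product_name, revenue) VALUES
    (N'A', 10),
    (N'B', 20),
    (N'C', 20),
    (N'D', 40);

-- Exactly two rows: A plus either B or C (which one is not guaranteed)
SELECT TOP 2 product_name, revenue
FROM #toy_revenue
ORDER BY revenue ASC;

-- Every row tied with the second one is kept: A, B and C
SELECT TOP 2 WITH TIES product_name, revenue
FROM #toy_revenue
ORDER BY revenue ASC;
```

`WITH TIES` requires an `ORDER BY`, and the number of rows it returns can exceed n, so it suits reports where dropping a tied product would be misleading.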

In [None]:
-- Rank products by revenue and keep the top 5
SELECT
    *
FROM
    (SELECT
        p.product_name AS [product name],
        SUM(f.sales) AS [Total revenue],
        ROW_NUMBER() OVER (ORDER BY SUM(f.sales) DESC) AS rank_products
    FROM
        gold.dim_products AS p
        RIGHT JOIN
        gold.fact_sales AS f
        ON p.product_key = f.product_key
    GROUP BY p.product_name) AS T
WHERE rank_products <= 5
ORDER BY rank_products
