**Zackaria Mamdouh | Group 6 | Project 1**

_Written in collaboration with ChatGPT from OpenAI to improve understanding, assist with explaining the queries, and enhance the formatting and display of the queries._

**Complex Queries (from Pokémon Databases)**

**Query 1 (BEST QUERY)**

**Proposition**: The goal of this query is to compare the average Hit Points (HP), Attack, and Defense statistics for Pokémon across the first three generations.

**Tables**

The query involves three tables from a Pokémon database, one for each of the first three generations. Each table contains various statistics about Pokémon:

- `PokemonGen1.dbo.PokemonGen1`
- `PokemonGen2.dbo.PokemonGen2`
- `PokemonGen3.dbo.PokemonGen3`

**Columns**

The columns involved from each table are:

- `HP` (Hit Points)
- `Attack`
- `Defense`

**Predicate**

For each generation's table, the query:

1. Calculates the average `HP`, `Attack`, and `Defense` values.
2. Labels the calculated averages with the respective generation name.

**Combined Statistics**

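A Common Table Expression (CTE) named `CombinedStats` unifies the averages from all three tables. The final output of the query returns each generation with its corresponding average `HP`, `Attack`, and `Defense`. Because SQL Server's `AVG` returns an integer when its input is an integer column, the averages shown are truncated to whole numbers (assuming these stats are stored as integers). Averaging `CAST(HP AS FLOAT)` instead would keep the fractional part.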

The `UNION ALL` operator is used to combine the selections from the different generations into a single result set, which is then used to display the overall comparison.

```sql
--QUERY 1-- This query combines and compares the average HP, Attack, and Defense stats across the three Pokémon generations.

WITH CombinedStats AS (
    -- Selects the average HP, Attack, and Defense stats from Generation 1
    SELECT 'Gen1' AS Generation, AVG(HP) AS AvgHP, AVG(Attack) AS AvgAttack, AVG(Defense) AS AvgDefense
    FROM PokemonGen1.dbo.PokemonGen1
    UNION ALL
    -- Selects the average HP, Attack, and Defense stats from Generation 2
    SELECT 'Gen2', AVG(HP), AVG(Attack), AVG(Defense)
    FROM PokemonGen2.dbo.PokemonGen2
    UNION ALL
    -- Selects the average HP, Attack, and Defense stats from Generation 3
    SELECT 'Gen3', AVG(HP), AVG(Attack), AVG(Defense)
    FROM PokemonGen3.dbo.PokemonGen3
)
-- The final SELECT fetches the combined average stats for each generation
SELECT Generation, AvgHP, AvgAttack, AvgDefense
FROM CombinedStats;
```

| Generation | AvgHP | AvgAttack | AvgDefense |
| --- | --- | --- | --- |
| Gen1 | 63 | 72 | 68 |
| Gen2 | 70 | 68 | 69 |
| Gen3 | 65 | 73 | 69 |

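The CTE-plus-`UNION ALL` pattern above can be tried without the Pokémon databases. Below is a minimal, hypothetical sketch. It uses Python's built-in `sqlite3` module, two made-up tables (`Gen1`, `Gen2`) and invented rows in place of the real `PokemonGenN` tables. One difference is worth knowing. SQLite's `AVG` always returns a floating-point value, whereas SQL Server's `AVG` over an `INT` column returns a truncated integer, so the `50.5` below would come back as `50` there.

```python
import sqlite3

# Hypothetical stand-ins for PokemonGen1/PokemonGen2; names and stats are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Gen1 (Name TEXT, HP INTEGER, Attack INTEGER, Defense INTEGER);
CREATE TABLE Gen2 (Name TEXT, HP INTEGER, Attack INTEGER, Defense INTEGER);
INSERT INTO Gen1 VALUES ('MonA', 40, 50, 60), ('MonB', 61, 70, 80);
INSERT INTO Gen2 VALUES ('MonC', 90, 30, 20);
""")

# Same shape as Query 1: one aggregate row per generation, stacked with UNION ALL.
query = """
WITH CombinedStats AS (
    SELECT 'Gen1' AS Generation, AVG(HP) AS AvgHP, AVG(Attack) AS AvgAttack, AVG(Defense) AS AvgDefense
    FROM Gen1
    UNION ALL
    SELECT 'Gen2', AVG(HP), AVG(Attack), AVG(Defense)
    FROM Gen2
)
SELECT Generation, AvgHP, AvgAttack, AvgDefense
FROM CombinedStats
ORDER BY Generation;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```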

**Query 2 (SECOND BEST QUERY)**

**Proposition**: This query aims to find the five Pokémon across the first three generations with the highest combined Defense and Special Defense. The goal is to identify the bulkiest defensive Pokémon.

**Tables** The query utilizes data from three tables within a Pokémon database, corresponding to different Pokémon generations:

- `PokemonGen1.dbo.PokemonGen1`
- `PokemonGen2.dbo.PokemonGen2`
- `PokemonGen3.dbo.PokemonGen3`

**Columns**

The columns used from these tables are:

- `Name` (the Pokémon's name)
- `Type1` (the Pokémon's primary type, aliased as `MainType`)
- `Defense` (the Pokémon's defense statistic)
- `SpecialDefense` (the Pokémon's special defense statistic)

**Predicate** The query executes the following steps:

1. In a subquery, it selects the `Name`, `Type1` (as `MainType`), `Defense`, and `SpecialDefense` columns from the `PokemonGen1` table.
2. It repeats this selection for `PokemonGen2` and `PokemonGen3`.
3. The results from all three generations are combined into a single dataset, `CombinedPokemons`, using `UNION ALL`.

**Calculating Combined Defense** In the main query:

- It groups the rows by `Name` and `MainType` and computes `AVG(Defense + SpecialDefense)` as `AverageDefense`. For a Pokémon that appears in only one table, this is simply its Defense plus its Special Defense. If the same name and type appeared in more than one table, their totals would be averaged.
- It orders the groups by `AverageDefense` in descending order (`ORDER BY AverageDefense DESC`) and keeps the first five rows with `SELECT TOP 5`.

```sql
--QUERY 2 -- Query to find the top 5 Pokémon with the highest average combined Defense and Special Defense across all generations.

SELECT TOP 5 Name, MainType, AVG(Defense + SpecialDefense) as AverageDefense
FROM (
    -- Combining Pokémon data from Generation 1. Selecting Name, primary Type, Defense, and SpecialDefense.
    SELECT Name, Type1 as MainType, Defense, SpecialDefense FROM PokemonGen1.dbo.PokemonGen1
    UNION ALL
    -- Repeating the process for Generation 2 Pokémon.
    SELECT Name, Type1, Defense, SpecialDefense FROM PokemonGen2.dbo.PokemonGen2
    UNION ALL
    -- Repeating the process for Generation 3 Pokémon.
    SELECT Name, Type1, Defense, SpecialDefense FROM PokemonGen3.dbo.PokemonGen3
) AS CombinedPokemons
-- Grouping the results by Pokémon Name and their primary Type.
GROUP BY Name, MainType
-- Ordering the Pokémon based on their average combined Defense and SpecialDefense in descending order.
ORDER BY AverageDefense DESC;
```


| Name | MainType | AverageDefense |
| --- | --- | --- |
| Shuckle | Bug | 460 |
| Regice | Ice | 300 |
| Regirock | Rock | 300 |
| Registeel | Steel | 300 |
| Lugia | Psychic | 284 |

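Here is a minimal sketch of the same `UNION ALL` → `GROUP BY` → top-N pattern. It uses Python's `sqlite3` with invented rows (`MonA` to `MonD` and their stats are hypothetical). SQLite has no `TOP`, so `LIMIT 2` plays that role here. `MonA` is deliberately listed in both tables to show what `AVG` does when the same name and type appear more than once.

```python
import sqlite3

# Hypothetical generation tables with invented Pokémon and stats.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Gen1 (Name TEXT, Type1 TEXT, Defense INTEGER, SpecialDefense INTEGER);
CREATE TABLE Gen2 (Name TEXT, Type1 TEXT, Defense INTEGER, SpecialDefense INTEGER);
INSERT INTO Gen1 VALUES ('MonA', 'Rock', 160, 45), ('MonB', 'Electric', 40, 50);
INSERT INTO Gen2 VALUES ('MonC', 'Bug', 230, 230), ('MonD', 'Fairy', 65, 65),
                        ('MonA', 'Rock', 200, 45);
""")

query = """
SELECT Name, MainType, AVG(Defense + SpecialDefense) AS AverageDefense
FROM (
    SELECT Name, Type1 AS MainType, Defense, SpecialDefense FROM Gen1
    UNION ALL
    SELECT Name, Type1, Defense, SpecialDefense FROM Gen2
) AS CombinedPokemons
GROUP BY Name, MainType          -- MonA's two rows (205 and 245) are averaged to 225
ORDER BY AverageDefense DESC
LIMIT 2;                         -- SQLite's counterpart of SELECT TOP 2
"""
rows = conn.execute(query).fetchall()
print(rows)
```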

**Query 3 (THIRD BEST QUERY)**

**Proposition**: This query aims to count the legendary Pokémon in each generation, using the 'Pressure' ability as a proxy for legendary status. The proxy is imperfect. Many legendary Pokémon have Pressure, but so do some non-legendary Pokémon (for example, Aerodactyl and Absol), so the counts are approximate.

**Tables** The query accesses data from three tables within a Pokémon database, each corresponding to a different generation:

- `PokemonGen1.dbo.PokemonGen1`
- `PokemonGen2.dbo.PokemonGen2`
- `PokemonGen3.dbo.PokemonGen3`

**Columns**

Columns involved in the query are:

- `Name` (the name of the Pokémon)
- `Ability1`, `Ability2`, `Ability3` (different abilities a Pokémon may have)

**Predicate** The query consists of the following steps:

1. Selects Pokémon from the `PokemonGen1` table that have the 'Pressure' ability in any of their ability slots.
2. Applies the same selection criteria to the `PokemonGen2` and `PokemonGen3` tables.
3. Combines the results from all three generations using `UNION ALL`, forming a unified dataset of legendary Pokémon across generations.

**Counting Legendary Pokémon** In the final stage, the query:

- Groups the collected data by `Generation`.
- Uses the `COUNT` function to determine the total number of legendary Pokémon for each generation.

**Purpose** The outcome of this query provides insight into the distribution of legendary Pokémon, denoted by the 'Pressure' ability, across the first three generations of the Pokémon series.

```sql
-- QUERY 3 -- This query estimates the number of legendary Pokémon in each generation by searching for the 'Pressure' ability, used as a proxy because it is common among (but not exclusive to) legendary Pokémon.
SELECT Generation, COUNT(*) AS LegendaryCount
FROM (
    -- Selects all Pokémon from Generation 1 that have the 'Pressure' ability, which is assumed to be a marker of legendary Pokémon.
    SELECT 'Gen1' AS Generation, Name 
    FROM PokemonGen1.dbo.PokemonGen1 
    WHERE Ability1 = 'Pressure' OR Ability2 = 'Pressure' OR Ability3 = 'Pressure'
    UNION ALL
    -- Selects all Pokémon from Generation 2 with the 'Pressure' ability.
    SELECT 'Gen2', Name 
    FROM PokemonGen2.dbo.PokemonGen2 
    WHERE Ability1 = 'Pressure' OR Ability2 = 'Pressure' OR Ability3 = 'Pressure'
    UNION ALL
    -- Selects all Pokémon from Generation 3 with the 'Pressure' ability.
    SELECT 'Gen3', Name 
    FROM PokemonGen3.dbo.PokemonGen3
    WHERE Ability1 = 'Pressure' OR Ability2 = 'Pressure' OR Ability3 = 'Pressure'
) AS LegendaryPokemons -- The subquery results are aliased as LegendaryPokemons.
GROUP BY Generation; -- Groups the results by generation so that COUNT(*) counts the Pokémon in each group.
```

| Generation | LegendaryCount |
| --- | --- |
| Gen1 | 5 |
| Gen2 | 5 |
| Gen3 | 5 |

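The three OR-ed equality tests in Query 3 can also be written as a single `IN` list, `'Pressure' IN (Ability1, Ability2, Ability3)`, which T-SQL accepts as well. This hypothetical sketch uses Python's `sqlite3` with invented rows and checks that both forms return the same counts. It assumes, as the original query implicitly does, that an empty ability slot is `NULL`. A `NULL` slot makes the test unknown rather than true, so such rows are filtered out either way.

```python
import sqlite3

# Invented rows; NULL marks an empty ability slot (an assumption about the real data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Gen1 (Name TEXT, Ability1 TEXT, Ability2 TEXT, Ability3 TEXT);
CREATE TABLE Gen2 (Name TEXT, Ability1 TEXT, Ability2 TEXT, Ability3 TEXT);
INSERT INTO Gen1 VALUES ('MonA', 'Pressure', NULL, NULL),
                        ('MonB', 'Overgrow', NULL, 'Pressure'),
                        ('MonC', 'Blaze', NULL, NULL);
INSERT INTO Gen2 VALUES ('MonD', 'Levitate', 'Pressure', NULL);
""")

# Query 3's shape, with the three OR-ed tests written as one IN list.
in_query = """
SELECT Generation, COUNT(*) AS LegendaryCount
FROM (
    SELECT 'Gen1' AS Generation, Name FROM Gen1
    WHERE 'Pressure' IN (Ability1, Ability2, Ability3)
    UNION ALL
    SELECT 'Gen2', Name FROM Gen2
    WHERE 'Pressure' IN (Ability1, Ability2, Ability3)
) AS LegendaryPokemons
GROUP BY Generation
ORDER BY Generation;
"""
# The original OR form, for comparison.
or_query = in_query.replace(
    "'Pressure' IN (Ability1, Ability2, Ability3)",
    "(Ability1 = 'Pressure' OR Ability2 = 'Pressure' OR Ability3 = 'Pressure')",
)

rows_in = conn.execute(in_query).fetchall()
rows_or = conn.execute(or_query).fetchall()
print(rows_in, rows_or)
```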

**Query 4 (FIRST "WORST" QUERY)**

**Proposition** The aim of this query is to identify the most frequently occurring Pokémon type from the combined data of the first three generations. 

**Tables** This query involves three tables from a Pokémon database, each representing a different generation of Pokémon:

- `PokemonGen1.dbo.PokemonGen1`
- `PokemonGen2.dbo.PokemonGen2`
- `PokemonGen3.dbo.PokemonGen3`

**Columns**

Each table includes two columns for Pokémon types:

- `Type1`
- `Type2`

**Predicate** The query performs the following actions:

1. Selects all non-null instances of `Type1` and `Type2` from the `PokemonGen1` table.
2. Repeats the selection for `Type1` and `Type2` from the `PokemonGen2` and `PokemonGen3` tables.
3. Uses the `UNION ALL` operator to combine these results into a single dataset.

**Determining the Most Common Type** A subquery named `CombinedTypes` is created to hold the unified list of all types. The query then:

- Groups the results by `Type`.
- Counts the occurrences of each type.
- Orders the grouped count in descending order.

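The `SELECT TOP 1` clause then retrieves only the most common type from the dataset. If several types tied for the highest count, `TOP 1` would return just one of them; `SELECT TOP 1 WITH TIES` would return all of them.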

```sql
-- QUERY 4 -- This query finds the most common Pokémon type across all three generations.

SELECT TOP 1 Type, COUNT(*) AS TypeCount
FROM (
    -- Include all 'Type1' from Gen1
    SELECT Type1 AS Type FROM PokemonGen1.dbo.PokemonGen1 WHERE Type1 IS NOT NULL
    UNION ALL
    -- Include all 'Type2' from Gen1
    SELECT Type2 AS Type FROM PokemonGen1.dbo.PokemonGen1 WHERE Type2 IS NOT NULL
    UNION ALL
    -- Include all 'Type1' from Gen2
    SELECT Type1 AS Type FROM PokemonGen2.dbo.PokemonGen2 WHERE Type1 IS NOT NULL
    UNION ALL
    -- Include all 'Type2' from Gen2
    SELECT Type2 AS Type FROM PokemonGen2.dbo.PokemonGen2 WHERE Type2 IS NOT NULL
    UNION ALL
    -- Include all 'Type1' from Gen3
    SELECT Type1 AS Type FROM PokemonGen3.dbo.PokemonGen3 WHERE Type1 IS NOT NULL
    UNION ALL
    -- Include all 'Type2' from Gen3
    SELECT Type2 AS Type FROM PokemonGen3.dbo.PokemonGen3 WHERE Type2 IS NOT NULL
) AS CombinedTypes
GROUP BY Type
ORDER BY TypeCount DESC;
```

| Type | TypeCount |
| --- | --- |
| Water | 78 |

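Query 4's approach stacks `Type1` and `Type2` into one column with `UNION ALL` and then counts each value. It can be sketched on toy data with Python's `sqlite3` (rows invented). The sketch keeps the full ranking rather than only the first row. It also adds `Type` as a tie-breaker so the order is deterministic. In SQL Server, `SELECT TOP 1 WITH TIES ... ORDER BY TypeCount DESC` would return every type sharing the highest count instead of an arbitrary one.

```python
import sqlite3

# One invented generation table; NULL means "no second type".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Gen1 (Name TEXT, Type1 TEXT, Type2 TEXT);
INSERT INTO Gen1 VALUES ('MonA', 'Water', NULL), ('MonB', 'Water', 'Flying'),
                        ('MonC', 'Grass', 'Poison'), ('MonD', 'Bug', 'Flying'),
                        ('MonE', 'Normal', 'Water');
""")

# Unpivot both type slots into a single Type column, then count each type.
query = """
SELECT Type, COUNT(*) AS TypeCount
FROM (
    SELECT Type1 AS Type FROM Gen1 WHERE Type1 IS NOT NULL
    UNION ALL
    SELECT Type2 FROM Gen1 WHERE Type2 IS NOT NULL
) AS CombinedTypes
GROUP BY Type
ORDER BY TypeCount DESC, Type;
"""
ranking = conn.execute(query).fetchall()
most_common = ranking[0]  # the row TOP 1 would keep
print(most_common)
```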

**Query 5 (SECOND "WORST" QUERY)**

**Proposition**: This query is designed to determine which Pokémon has the highest aggregate statistics, known as total stats (the sum of HP, Attack, Defense, SpecialAttack, SpecialDefense, and Speed), across the first three generations of Pokémon.

**Tables** The query accesses three separate tables within a Pokémon database, with each table representing a different generation of Pokémon:

- `PokemonGen1.dbo.PokemonGen1`
- `PokemonGen2.dbo.PokemonGen2`
- `PokemonGen3.dbo.PokemonGen3`

**Columns**

From each table, it utilizes the following columns:

- `Name` (the name of the Pokémon)
- `HP` (Hit Points)
- `Attack`
- `Defense`
- `SpecialAttack`
- `SpecialDefense`
- `Speed`

**Predicate** The logic of the query executes these steps:

1. For each Pokémon in the `PokemonGen1` table, it computes the total of the aforementioned stats and assigns a label 'Gen1'.
2. It repeats the calculation for `PokemonGen2` and `PokemonGen3`, labeling the results 'Gen2' and 'Gen3', respectively.
3. It unifies the results from all three generations using `UNION ALL`.

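**Aggregate and Grouping** The query then performs an aggregation to:

- Group the results by each Pokémon's name and generation.
- Use the `MAX` function to find the highest total stats within each name-and-generation group.

Because each group normally holds a single Pokémon, `MAX` simply returns that Pokémon's own total. As a result, the query lists every Pokémon's total stats (unsorted) rather than the single highest. Removing the `GROUP BY` and using `SELECT TOP 1 ... ORDER BY Total DESC` would return only the top Pokémon.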

```sql
--QUERY 5 -- This query computes the total stats (sum of all individual stats) for each Pokémon across all generations; because of the GROUP BY, it returns one row per Pokémon rather than a single overall maximum.

SELECT MAX(CombinedTotal.Total) AS MaxStat, CombinedTotal.Name, CombinedTotal.Generation
FROM (
    -- This subquery calculates the total stats for each Pokémon in Generation 1 and labels them 'Gen1'.
    SELECT Name, (HP + Attack + Defense + SpecialAttack + SpecialDefense + Speed) AS Total, 'Gen1' AS Generation FROM PokemonGen1.dbo.PokemonGen1
    UNION ALL
    -- Similarly, it calculates the total stats for each Pokémon in Generation 2 and labels them 'Gen2'.
    SELECT Name, (HP + Attack + Defense + SpecialAttack + SpecialDefense + Speed), 'Gen2' FROM PokemonGen2.dbo.PokemonGen2
    UNION ALL
    -- And does the same for Generation 3, labeling them 'Gen3'.
    SELECT Name, (HP + Attack + Defense + SpecialAttack + SpecialDefense + Speed), 'Gen3' FROM PokemonGen3.dbo.PokemonGen3
) AS CombinedTotal
-- Grouping the results by Pokémon name and generation.
GROUP BY CombinedTotal.Name, CombinedTotal.Generation;
```

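| MaxStat | Name | Generation |
| --- | --- | --- |
| 485 | Rhydon | Gen1 |
| 435 | Tangela | Gen1 |
| 310 | Abra | Gen1 |
| 515 | Aerodactyl | Gen1 |
| 500 | Alakazam | Gen1 |
| 448 | Arbok | Gen1 |
| 555 | Arcanine | Gen1 |
| 580 | Articuno | Gen1 |
| 395 | Beedrill | Gen1 |
| 300 | Bellsprout | Gen1 |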

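A toy dataset shows the difference between Query 5 as written and a query that returns only the single highest total. This is a hypothetical sketch with Python's `sqlite3` and invented Pokémon. In SQL Server, the second query would be written with `SELECT TOP 1 ... ORDER BY Total DESC` instead of `LIMIT 1`.

```python
import sqlite3

# Invented stats for three hypothetical Pokémon across two generations.
conn = sqlite3.connect(":memory:")
cols = "Name TEXT, HP INT, Attack INT, Defense INT, SpecialAttack INT, SpecialDefense INT, Speed INT"
conn.executescript(f"""
CREATE TABLE Gen1 ({cols});
CREATE TABLE Gen2 ({cols});
INSERT INTO Gen1 VALUES ('MonA', 50, 50, 50, 50, 50, 50), ('MonB', 100, 100, 100, 100, 100, 100);
INSERT INTO Gen2 VALUES ('MonC', 80, 80, 80, 80, 80, 80);
""")

# Per-Pokémon totals from both generations, as in Query 5's subquery.
combined = """
SELECT Name, HP + Attack + Defense + SpecialAttack + SpecialDefense + Speed AS Total, 'Gen1' AS Generation FROM Gen1
UNION ALL
SELECT Name, HP + Attack + Defense + SpecialAttack + SpecialDefense + Speed, 'Gen2' FROM Gen2
"""

# Query 5's shape: GROUP BY Name, Generation leaves one row per Pokémon.
per_pokemon = conn.execute(
    f"SELECT MAX(Total) AS MaxStat, Name, Generation FROM ({combined}) AS CombinedTotal "
    "GROUP BY Name, Generation"
).fetchall()

# Single highest total: sort by Total and keep the first row.
top = conn.execute(
    f"SELECT Total AS MaxStat, Name, Generation FROM ({combined}) AS CombinedTotal "
    "ORDER BY Total DESC LIMIT 1"
).fetchall()

print(len(per_pokemon), top)
```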

**Query 6 (THIRD "WORST" QUERY)**

**Proposition**: This query is structured to analyze the average Speed of Pokémon, distinguishing between those with a single type (Single) and those with two types (Dual), across three generations.

**Tables** The data is sourced from three different tables in the Pokémon database, each corresponding to a distinct generation of Pokémon:

- `PokemonGen1.dbo.PokemonGen1`
- `PokemonGen2.dbo.PokemonGen2`
- `PokemonGen3.dbo.PokemonGen3`

**Columns**

The columns assessed from these tables are:

- `Speed` (a statistic of the Pokémon)
- `Type2` (the second type of the Pokémon, if present)

**Predicate** The query executes the following operations:

1. Uses a `CASE` expression within a subquery to classify each Pokémon in the `PokemonGen1` table as Single- or Dual-typed, based on whether `Type2` is present (non-null).
2. It replicates this classification for `PokemonGen2` and `PokemonGen3`.
3. All three selections are unified using `UNION ALL` to include all generations in a common format.

**Calculating Average Speed** Post-unification, the query:

- Groups the data by `Generation` and `TypeCategory`.
- Calculates the average `Speed` for each group using the `AVG` function.

```sql
-- QUERY 6 --  This query calculates the average speed of Pokémon based on their type classification (Single or Dual) for each generation.

SELECT Generation, TypeCategory, AVG(Speed) AS AverageSpeed
FROM (
    -- Create a subquery that selects generation, type category, and speed for Gen1 Pokémon
    SELECT 'Gen1' AS Generation, 
           CASE WHEN Type2 IS NOT NULL THEN 'Dual' ELSE 'Single' END AS TypeCategory, 
           Speed 
    FROM PokemonGen1.dbo.PokemonGen1
    UNION ALL
    -- Repeat the same selection for Gen2 Pokémon, classifying type category based on the presence of Type2
    SELECT 'Gen2', 
           CASE WHEN Type2 IS NOT NULL THEN 'Dual' ELSE 'Single' END, 
           Speed 
    FROM PokemonGen2.dbo.PokemonGen2
    UNION ALL
    -- Repeat the selection for Gen3 Pokémon, continuing the type category classification
    SELECT 'Gen3', 
           CASE WHEN Type2 IS NOT NULL THEN 'Dual' ELSE 'Single' END, 
           Speed 
    FROM PokemonGen3.dbo.PokemonGen3
) AS CombinedTypes
-- Group the results by generation and type category
GROUP BY Generation, TypeCategory;
```

| Generation | TypeCategory | AverageSpeed |
| --- | --- | --- |
| Gen1 | Dual | 65 |
| Gen2 | Dual | 64 |
| Gen3 | Dual | 64 |
| Gen1 | Single | 71 |
| Gen2 | Single | 58 |
| Gen3 | Single | 59 |

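Query 6's `CASE`-then-`GROUP BY` technique can be tried on a toy dataset. This hedged sketch uses Python's `sqlite3` with invented rows. Like the original, it assumes a missing second type is stored as `NULL`. If the real tables used empty strings instead, `NULLIF(Type2, '')` would be needed inside the `CASE`. A generation with no Single-typed rows simply produces no Single group.

```python
import sqlite3

# Invented rows; NULL in Type2 means the Pokémon is single-typed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Gen1 (Name TEXT, Type2 TEXT, Speed INTEGER);
CREATE TABLE Gen2 (Name TEXT, Type2 TEXT, Speed INTEGER);
INSERT INTO Gen1 VALUES ('MonA', NULL, 60), ('MonB', 'Flying', 90), ('MonC', NULL, 81);
INSERT INTO Gen2 VALUES ('MonD', 'Steel', 30), ('MonE', 'Dark', 50);
""")

query = """
SELECT Generation, TypeCategory, AVG(Speed) AS AverageSpeed
FROM (
    SELECT 'Gen1' AS Generation,
           CASE WHEN Type2 IS NOT NULL THEN 'Dual' ELSE 'Single' END AS TypeCategory,
           Speed
    FROM Gen1
    UNION ALL
    SELECT 'Gen2',
           CASE WHEN Type2 IS NOT NULL THEN 'Dual' ELSE 'Single' END,
           Speed
    FROM Gen2
) AS CombinedTypes
GROUP BY Generation, TypeCategory
ORDER BY Generation, TypeCategory;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # Gen2 has no Single row because every Gen2 row has a Type2
```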

**Query 7**

**Proposition**: This query aims to list Pokémon from all generations, ordered by their Defense to Attack ratio in descending order, to understand how defensive each Pokémon is relative to its offensive capabilities.

**Tables** The query utilizes data from three tables within a Pokémon database, corresponding to different Pokémon generations:

- `PokemonGen1.dbo.PokemonGen1`
- `PokemonGen2.dbo.PokemonGen2`
- `PokemonGen3.dbo.PokemonGen3`

**Columns**

The columns used from these tables are:

- `Name` (the Pokémon's name)
- `Defense` (the Pokémon's defense statistic)
- `Attack` (the Pokémon's attack statistic)

**Predicate** The query executes the following steps:

1. In a subquery, it selects the `Name`, `Defense`, and `Attack` stats from the `PokemonGen1` table and labels the generation as 'Gen1'.
2. It repeats this selection for `PokemonGen2` and `PokemonGen3`, labeling the results 'Gen2' and 'Gen3', respectively.
3. The results from all three generations are combined into a single dataset using `UNION ALL`.

**Calculating Defense to Attack Ratio** In the main query:

- It creates a new column `DefenseAttackRatio` by dividing the `Defense` stat by the `Attack` stat for each Pokémon. `Defense` is cast to `FLOAT` before the division because dividing two integers in SQL Server would truncate the result to a whole number.
- The final results are then ordered by this ratio in descending order (`ORDER BY DefenseAttackRatio DESC`), so Pokémon with higher defensive capabilities relative to their attack are listed first.

```sql
--QUERY 7 --  This query lists the Pokémon from all generations ordered by their Defense to Attack ratio in descending order.

-- Create a virtual table that combines the Name, Defense, and Attack stats for all Pokémon across generations.
SELECT Generation, Name, Defense, Attack, CAST(Defense AS FLOAT) / Attack AS DefenseAttackRatio
FROM (
    -- Select Name, Defense, and Attack stats from Generation 1 Pokémon.
    SELECT 'Gen1' AS Generation, Name, Defense, Attack FROM PokemonGen1.dbo.PokemonGen1
    UNION ALL
    -- Repeat the selection for Generation 2.
    SELECT 'Gen2', Name, Defense, Attack FROM PokemonGen2.dbo.PokemonGen2
    UNION ALL
    -- Repeat the selection for Generation 3.
    SELECT 'Gen3', Name, Defense, Attack FROM PokemonGen3.dbo.PokemonGen3
) AS CombinedStats
-- Order the results by the Defense to Attack ratio, from highest to lowest.
ORDER BY DefenseAttackRatio DESC;
```

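| Generation | Name | Defense | Attack | DefenseAttackRatio |
| --- | --- | --- | --- | --- |
| Gen2 | Shuckle | 230 | 10 | 23.0 |
| Gen1 | Magikarp | 55 | 10 | 5.5 |
| Gen1 | Onix | 160 | 45 | 3.555555555555556 |
| Gen2 | Togepi | 65 | 20 | 3.25 |
| Gen3 | Nosepass | 135 | 45 | 3.0 |
| Gen1 | Metapod | 55 | 20 | 2.75 |
| Gen2 | Marill | 50 | 20 | 2.5 |
| Gen1 | Omanyte | 100 | 40 | 2.5 |
| Gen2 | Magcargo | 120 | 50 | 2.4 |
| Gen2 | Steelix | 200 | 85 | 2.3529411764705883 |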

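Toy data shows why the `CAST` in Query 7 matters. This sketch (Python's `sqlite3`, invented rows) computes the ratio with and without the cast. Like SQL Server, SQLite truncates when both operands are integers. SQLite maps the type name `FLOAT` to its `REAL` storage class, so the cast is written exactly as in the original query.

```python
import sqlite3

# Invented Defense/Attack values for three hypothetical Pokémon.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Gen1 (Name TEXT, Defense INTEGER, Attack INTEGER);
INSERT INTO Gen1 VALUES ('MonA', 55, 10), ('MonB', 160, 45), ('MonC', 40, 80);
""")

# Integer / integer: the fractional part is discarded.
int_ratios = conn.execute(
    "SELECT Name, Defense / Attack FROM Gen1 ORDER BY Name"
).fetchall()

# Casting one operand first keeps the decimal ratio, as in Query 7.
ratios = conn.execute(
    "SELECT Name, CAST(Defense AS FLOAT) / Attack AS DefenseAttackRatio "
    "FROM Gen1 ORDER BY DefenseAttackRatio DESC"
).fetchall()

print(int_ratios)
print(ratios)
```

If a row could ever have `Attack = 0`, writing the divisor as `NULLIF(Attack, 0)` would turn SQL Server's divide-by-zero error into a `NULL` ratio.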

**Below this line are the 13 medium queries using the Northwinds and AdventureWorks databases. The first 3 are the "best" queries, followed by the 3 "worst" queries.**

---

**SQL Query Analysis 1 (BEST QUERY)**

**Proposition** This query aims to identify customers with total sales over $1000 from the AdventureWorks2017 database. It's designed to calculate the total sales per customer and display the names of those whose total sales exceed this amount, sorted in descending order of their sales value.

**Tables** The query utilizes data from the following tables in the AdventureWorks2017 database:

1. `AdventureWorks2017.Sales.SalesOrderHeader` (aliased as SOH)
2. `AdventureWorks2017.Sales.SalesOrderDetail` (aliased as SOD)
3. `AdventureWorks2017.Person.Person` (aliased as P)

**Columns** The columns involved in this query are:

- From `SalesOrderHeader`:
    
    - `CustomerID`
    - `SalesOrderID`
- From `SalesOrderDetail`:
    
    - `LineTotal`
    - `SalesOrderID`
- From `Person`:
    
    - `FirstName`
    - `LastName`
    - `BusinessEntityID`

**Common Table Expression (CTE)**

- Named `CustomerSalesCTE`.
- It selects:
    - `CustomerID` from `SalesOrderHeader`.
    - Sums up `LineTotal` from `SalesOrderDetail` as `TotalSales`.
- Involves a JOIN between `SalesOrderHeader` and `SalesOrderDetail` on `SalesOrderID`.
- Groups the results by `CustomerID`.

**Main Query**

- Selects:
    - Customer name by concatenating `FirstName` and `LastName` from `Person` table as `CustomerName`.
    - `TotalSales` from the CTE.
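- Involves a JOIN between `Person` and the CTE on `BusinessEntityID` = `CustomerID`. In AdventureWorks2017 these are different keys. `SalesOrderHeader.CustomerID` refers to `Sales.Customer`, whose `PersonID` column holds the matching `Person.BusinessEntityID`. For the names to belong to the right customers, the query would need to join through `Sales.Customer` (`CS.CustomerID = C.CustomerID` and `C.PersonID = P.BusinessEntityID`), so the names in the output below should be read with that caveat.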
- Filters to include only those records where `TotalSales` is greater than $1000.
- Orders the result by `TotalSales` in descending order.

**Objective** To provide a list of customers who have made significant purchases (over $1000) along with the total value of their purchases, ranked from highest to lowest spender.

```sql
WITH CustomerSalesCTE AS (
    SELECT 
        SOH.CustomerID, 
        SUM(SOD.LineTotal) AS TotalSales
    FROM 
        AdventureWorks2017.Sales.SalesOrderHeader SOH
        JOIN AdventureWorks2017.Sales.SalesOrderDetail SOD 
        ON SOH.SalesOrderID = SOD.SalesOrderID
    GROUP BY SOH.CustomerID
)
SELECT 
    P.FirstName + ' ' + P.LastName AS CustomerName, 
    CS.TotalSales
FROM 
    AdventureWorks2017.Person.Person P
    JOIN CustomerSalesCTE CS 
    ON P.BusinessEntityID = CS.CustomerID
WHERE 
    CS.TotalSales > 1000
ORDER BY 
    CS.TotalSales DESC;
```


| CustomerName | TotalSales |
| --- | --- |
| Hannah Clark | 13295.38 |
| Taylor Jones | 13294.27 |
| Hannah Lee | 13269.27 |
| Kelli Chander | 13265.99 |
| Colleen She | 13242.7 |
| Madison Miller | 13215.65 |
| Taylor Smith | 13195.64 |
| Colleen Tang | 13173.19 |
| Hannah Garcia | 13164.64 |
| Jenny Chen | 12909.6682 |

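In AdventureWorks2017, `SalesOrderHeader.CustomerID` points to `Sales.Customer`, and it is `Sales.Customer.PersonID` that matches `Person.BusinessEntityID`. A small, hypothetical dataset in Python's `sqlite3` shows why that bridge matters. The table names mirror AdventureWorks, but the schema is cut down and the rows are invented. The IDs are chosen so that a customer's `CustomerID` differs from its `PersonID`, which makes the direct join attach the wrong name. SQLite concatenates strings with `||` rather than T-SQL's `+`.

```python
import sqlite3

# Cut-down, invented versions of the AdventureWorks tables involved.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (BusinessEntityID INTEGER, FirstName TEXT, LastName TEXT);
CREATE TABLE Customer (CustomerID INTEGER, PersonID INTEGER);
CREATE TABLE SalesOrderHeader (SalesOrderID INTEGER, CustomerID INTEGER);
CREATE TABLE SalesOrderDetail (SalesOrderID INTEGER, LineTotal REAL);
INSERT INTO Person VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim');
INSERT INTO Customer VALUES (1, 2), (2, 1);  -- CustomerID differs from PersonID
INSERT INTO SalesOrderHeader VALUES (100, 1), (101, 2);
INSERT INTO SalesOrderDetail VALUES (100, 1500.0), (100, 250.0), (101, 400.0);
""")

cte = """
WITH CustomerSalesCTE AS (
    SELECT SOH.CustomerID, SUM(SOD.LineTotal) AS TotalSales
    FROM SalesOrderHeader SOH
    JOIN SalesOrderDetail SOD ON SOH.SalesOrderID = SOD.SalesOrderID
    GROUP BY SOH.CustomerID
)
"""

# Joining Person directly on CustomerID.
direct = conn.execute(cte + """
SELECT P.FirstName || ' ' || P.LastName AS CustomerName, CS.TotalSales
FROM Person P JOIN CustomerSalesCTE CS ON P.BusinessEntityID = CS.CustomerID
WHERE CS.TotalSales > 1000 ORDER BY CS.TotalSales DESC;
""").fetchall()

# Bridging through Customer.PersonID.
bridged = conn.execute(cte + """
SELECT P.FirstName || ' ' || P.LastName AS CustomerName, CS.TotalSales
FROM CustomerSalesCTE CS
JOIN Customer C ON C.CustomerID = CS.CustomerID
JOIN Person P ON P.BusinessEntityID = C.PersonID
WHERE CS.TotalSales > 1000 ORDER BY CS.TotalSales DESC;
""").fetchall()

print(direct)   # wrong person for customer 1
print(bridged)  # correct person for customer 1
```

In the real database, store customers have no `PersonID`, so an inner join through `Sales.Customer` would also leave them out.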

**SQL Query Analysis 2 (SECOND BEST QUERY)**

**Proposition** This query aims to identify the top 10 products by total sales value in the AdventureWorks2017 database, with the total quantity sold reported alongside. Only products whose total sales exceed the average `LineTotal` of a single order line are considered.

**Table** The query utilizes data from:

- `AdventureWorks2017.Sales.SalesOrderDetail`

**Columns** The columns involved in this query are:

- **`ProductID`**: Identifier for each product.
- **`OrderQty`**: The quantity of the product ordered.
- **`LineTotal`**: The total sales value for each line item.

**Query Structure**

- **Selection**: Retrieves the top 10 records based on total sales.
- **Aggregation**:
    - Calculates **`TotalQuantity`** as the sum of `OrderQty`.
    - Calculates **`TotalSales`** as the sum of `LineTotal`.
- **Grouping**: Groups the results by `ProductID`.
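- **Condition**: Includes only those products whose summed `LineTotal` is greater than the average `LineTotal` of an individual order line across the whole table.
    - This condition is implemented with a subquery in the `HAVING` clause that calculates the average `LineTotal`. Because it compares a per-product total with a per-line average, a product passes as soon as its combined sales exceed the value of one average order line.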
- **Ordering**: Orders the results by `TotalSales` in descending order.

**Objective** The query is designed to identify top-performing products, highlighting those with the highest sales totals. This analysis helps in understanding product performance and can guide inventory and marketing strategies.

```sql
SELECT 
    TOP 10 
    ProductID, 
    TotalQuantity = SUM(OrderQty), 
    TotalSales = SUM(LineTotal)
FROM 
    AdventureWorks2017.Sales.SalesOrderDetail
GROUP BY 
    ProductID
HAVING 
    SUM(LineTotal) > (SELECT AVG(LineTotal) FROM AdventureWorks2017.Sales.SalesOrderDetail)
ORDER BY 
    TotalSales DESC;
```

| ProductID | TotalQuantity | TotalSales |
| --- | --- | --- |
| 782 | 2977 | 4400592.8004 |
| 783 | 2664 | 4009494.761841 |
| 779 | 2394 | 3693678.025272 |
| 780 | 2234 | 3438478.860423 |
| 781 | 2216 | 3434256.941928 |
| 784 | 2111 | 3309673.216908 |
| 793 | 1642 | 2516857.314918 |
| 794 | 1498 | 2347655.953454 |
| 795 | 1245 | 2012447.775 |
| 753 | 664 | 1847818.628 |

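The `HAVING` condition in this query compares each product's total with the average of individual order lines, not with the average product total. A toy sketch in Python's `sqlite3` (invented rows, cut-down table) shows the difference. The second query is a hypothetical alternative threshold offered for comparison, not part of the original analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesOrderDetail (ProductID INTEGER, OrderQty INTEGER, LineTotal REAL);
INSERT INTO SalesOrderDetail VALUES
    (1, 2, 100.0), (1, 1, 50.0),
    (2, 1, 30.0),
    (3, 4, 400.0), (3, 1, 100.0);
""")

select_part = """
SELECT ProductID, SUM(OrderQty) AS TotalQuantity, SUM(LineTotal) AS TotalSales
FROM SalesOrderDetail
GROUP BY ProductID
"""

# Threshold used by the query: the average single line (680 / 5 = 136).
per_line = conn.execute(select_part + """
HAVING SUM(LineTotal) > (SELECT AVG(LineTotal) FROM SalesOrderDetail)
ORDER BY TotalSales DESC;
""").fetchall()

# Alternative threshold: the average product total (680 / 3, about 226.67).
per_product = conn.execute(select_part + """
HAVING SUM(LineTotal) > (
    SELECT AVG(ProductTotal)
    FROM (SELECT SUM(LineTotal) AS ProductTotal
          FROM SalesOrderDetail GROUP BY ProductID) AS ProductTotals
)
ORDER BY TotalSales DESC;
""").fetchall()

print(per_line)
print(per_product)
```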

**SQL Query Analysis 3 (THIRD BEST QUERY)**

**Proposition** This query aims to analyze sales data by region in the AdventureWorks2017 database. It calculates the total and average sales values for each sales territory.

**Tables** The query involves the following tables:

1. `AdventureWorks2017.Sales.SalesOrderHeader` (aliased as SOH)
2. `AdventureWorks2017.Sales.SalesTerritory` (aliased as ST)

**Columns** Key columns used in this query are:

- From `SalesOrderHeader`:
    - `SubTotal`
    - `TerritoryID`
- From `SalesTerritory`:
    - `Name`
    - `TerritoryID`

**Join**

- An INNER JOIN connects SOH and ST based on `TerritoryID`.

**Grouping and Aggregation**

- The query groups the data by `Name` (sales region) from the ST table.
- It computes two aggregated values:
    - `TotalSales`: the sum of `SubTotal` from SOH for each group.
    - `AverageOrderValue`: the average of `SubTotal` for each group.

**Objective** To provide a summary of sales performance by region, highlighting both the total and average sales for each territory. This analysis is key for understanding regional market trends and sales efficiency.

```sql
SELECT 
    SalesRegion = ST.Name, 
    TotalSales = SUM(SOH.SubTotal), 
    AverageOrderValue = AVG(SOH.SubTotal)
FROM 
    AdventureWorks2017.Sales.SalesOrderHeader SOH
    INNER JOIN AdventureWorks2017.Sales.SalesTerritory ST 
    ON SOH.TerritoryID = ST.TerritoryID
GROUP BY 
    ST.Name;
```

| SalesRegion | TotalSales | AverageOrderValue |
| --- | --- | --- |
| Australia | 10655335.9598 | 1557.1147 |
| Central | 7909009.0062 | 20542.8805 |
| Canada | 16355770.4553 | 4021.5811 |
| France | 7251555.6473 | 2713.9055 |
| Northwest | 16084942.5482 | 3501.2935 |
| United Kingdom | 7670721.0356 | 2382.9515 |
| Southwest | 24184609.6011 | 3885.702 |
| Southeast | 7879655.0731 | 16213.282 |
| Northeast | 6939374.4813 | 19714.132 |
| Germany | 4915407.596 | 1873.964 |

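Here is a toy version of the join-and-aggregate pattern, sketched with Python's `sqlite3` and invented territories and orders. SQLite does not accept T-SQL's `alias = expression` form, so `AS` aliases are used instead. The sketch also adds `ORDER BY TotalSales DESC`, which the original omits, to make the row order deterministic. The two aggregates are linked: the average is the total divided by the number of orders. A region with a high `AverageOrderValue` but a modest `TotalSales` (such as Central above) therefore has relatively few, large orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesTerritory (TerritoryID INTEGER, Name TEXT);
CREATE TABLE SalesOrderHeader (SalesOrderID INTEGER, TerritoryID INTEGER, SubTotal REAL);
INSERT INTO SalesTerritory VALUES (1, 'North'), (2, 'South');
INSERT INTO SalesOrderHeader VALUES (10, 1, 100.0), (11, 1, 300.0), (12, 2, 50.0);
""")

query = """
SELECT ST.Name AS SalesRegion,
       SUM(SOH.SubTotal) AS TotalSales,
       AVG(SOH.SubTotal) AS AverageOrderValue
FROM SalesOrderHeader SOH
INNER JOIN SalesTerritory ST ON SOH.TerritoryID = ST.TerritoryID
GROUP BY ST.Name
ORDER BY TotalSales DESC;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```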

**SQL Query Analysis 4 (FIRST "WORST" QUERY)**

**Proposition** This query focuses on analyzing discounted sales transactions in the AdventureWorks2017 database. It retrieves detailed information about each sale that had a discount applied, including the discount amount calculated.

**Tables** The query involves data from two tables:

1. `AdventureWorks2017.Sales.SalesOrderHeader` (abbreviated as SOH)
2. `AdventureWorks2017.Sales.SalesOrderDetail` (abbreviated as SOD)

**Columns** The selected columns in this query are:

- From `SalesOrderHeader`:
    - `SalesOrderID`
    - `OrderDate`
- From `SalesOrderDetail`:
    - `ProductID`
    - `OrderQty`
    - `UnitPrice`
    - `UnitPriceDiscount`

**Calculated Column**

- `DiscountAmount`: A new column calculated as `SOD.OrderQty * SOD.UnitPrice * SOD.UnitPriceDiscount`.

**Join**

- An INNER JOIN is used to combine SOH and SOD on the common column `SalesOrderID`.

**Filter**

- The `WHERE` clause filters the data to include only those transactions where `SOD.UnitPriceDiscount` is greater than 0, indicating a discount was applied.

**Ordering**

- The results are ordered by `DiscountAmount` in descending order.

**Objective** The goal of this query is to provide insights into the sales transactions that involved discounts, highlighting the scale of these discounts in terms of the discounted amount. This analysis can be crucial for understanding the impact of discounts on sales and for making informed decisions about pricing strategies.

```sql
SELECT 
    SOH.SalesOrderID, 
    SOH.OrderDate, 
    SOD.ProductID, 
    SOD.OrderQty,
    SOD.UnitPrice,
    SOD.UnitPriceDiscount,
    DiscountAmount = SOD.OrderQty * SOD.UnitPrice * SOD.UnitPriceDiscount
FROM 
    AdventureWorks2017.Sales.SalesOrderHeader SOH
INNER JOIN 
    AdventureWorks2017.Sales.SalesOrderDetail SOD 
    ON SOH.SalesOrderID = SOD.SalesOrderID
WHERE 
    SOD.UnitPriceDiscount > 0
ORDER BY 
    DiscountAmount DESC;
```

| SalesOrderID | OrderDate | ProductID | OrderQty | UnitPrice | UnitPriceDiscount | DiscountAmount |
| --- | --- | --- | --- | --- | --- | --- |
| 51823 | 2013-06-30 00:00:00.000 | 957 | 21 | 953.628 | 0.2 | 4005.2376 |
| 46380 | 2012-04-30 00:00:00.000 | 776 | 11 | 843.7475 | 0.35 | 3248.4279 |
| 55282 | 2013-08-30 00:00:00.000 | 954 | 26 | 1192.035 | 0.1 | 3099.291 |
| 51131 | 2013-05-30 00:00:00.000 | 954 | 16 | 953.628 | 0.2 | 3051.6096 |
| 51858 | 2013-06-30 00:00:00.000 | 957 | 16 | 953.628 | 0.2 | 3051.6096 |
| 46380 | 2012-04-30 00:00:00.000 | 777 | 10 | 843.7475 | 0.35 | 2953.1163 |
| 53535 | 2013-07-31 00:00:00.000 | 957 | 15 | 953.628 | 0.2 | 2860.884 |
| 46380 | 2012-04-30 00:00:00.000 | 771 | 9 | 849.9975 | 0.35 | 2677.4921 |
| 46334 | 2012-04-30 00:00:00.000 | 771 | 9 | 849.9975 | 0.35 | 2677.4921 |
| 53535 | 2013-07-31 00:00:00.000 | 954 | 14 | 953.628 | 0.2 | 2670.1584 |

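Below is a minimal sketch of the computed-column, filter, and sort steps on invented rows, using Python's `sqlite3` and cut-down tables. As in the T-SQL version, the `ORDER BY` can refer to the `DiscountAmount` alias because `ORDER BY` is evaluated after the `SELECT` list. In SQL Server, the `WHERE` clause, which is logically evaluated earlier, could not refer to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesOrderHeader (SalesOrderID INTEGER, OrderDate TEXT);
CREATE TABLE SalesOrderDetail (
    SalesOrderID INTEGER, ProductID INTEGER, OrderQty INTEGER,
    UnitPrice REAL, UnitPriceDiscount REAL
);
INSERT INTO SalesOrderHeader VALUES (1, '2013-06-30'), (2, '2013-07-31');
INSERT INTO SalesOrderDetail VALUES
    (1, 10, 2, 100.0, 0.25),
    (1, 11, 1, 80.0, 0.0),   -- no discount: filtered out
    (2, 12, 4, 50.0, 0.5);
""")

query = """
SELECT SOH.SalesOrderID, SOH.OrderDate, SOD.ProductID, SOD.OrderQty,
       SOD.UnitPrice, SOD.UnitPriceDiscount,
       SOD.OrderQty * SOD.UnitPrice * SOD.UnitPriceDiscount AS DiscountAmount
FROM SalesOrderHeader SOH
INNER JOIN SalesOrderDetail SOD ON SOH.SalesOrderID = SOD.SalesOrderID
WHERE SOD.UnitPriceDiscount > 0
ORDER BY DiscountAmount DESC;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```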

**SQL Query Analysis 5 (SECOND "WORST" QUERY)**

**Proposition** This query is intended to identify and analyze customers with more than five orders in the AdventureWorks2017 database. It focuses on calculating the total number of orders per customer and their average order value.

**Tables** The query utilizes data from:

1. `AdventureWorks2017.Sales.SalesOrderHeader`

**Columns** Important columns used in this query include:

- `CustomerID`
- `TotalDue`

**Aggregations**

- **`NumberOfOrders`**: This is a calculated field representing the count of orders for each customer.
- **`AverageOrderTotal`**: This field calculates the average `TotalDue` amount per customer.

**Grouping**

- The data is grouped by `CustomerID`.

**Having Clause**

- The `HAVING` clause filters the grouped data to only include customers with more than five orders (`COUNT(*) > 5`).

**Ordering**

- The results are sorted by `NumberOfOrders` in descending order.

**Objective** The primary objective of this query is to segment customers based on their ordering frequency, focusing on those who are frequent buyers. By calculating the average order value alongside the number of orders, the query provides a dual perspective on customer activity and spending habits.

```sql
SELECT 
    CustomerID, 
    NumberOfOrders = COUNT(*), 
    AverageOrderTotal = AVG(TotalDue)
FROM 
    AdventureWorks2017.Sales.SalesOrderHeader
GROUP BY 
    CustomerID
HAVING 
    COUNT(*) > 5
ORDER BY 
    NumberOfOrders DESC;
```

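| CustomerID | NumberOfOrders | AverageOrderTotal |
| --- | --- | --- |
| 11176 | 28 | 52.0932 |
| 11091 | 28 | 46.936 |
| 11200 | 27 | 59.8902 |
| 11277 | 27 | 58.7283 |
| 11223 | 27 | 49.3313 |
| 11262 | 27 | 46.1161 |
| 11185 | 27 | 66.149 |
| 11300 | 27 | 61.4101 |
| 11711 | 27 | 45.1732 |
| 11276 | 27 | 40.4479 |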

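The `GROUP BY` ... `HAVING COUNT(*) > 5` filter can be sketched on generated toy orders with Python's `sqlite3`; the customer IDs and amounts are invented. Customer 3 has exactly five orders and is excluded, because the condition is strictly greater than five. The condition belongs in `HAVING` rather than `WHERE` because it depends on the per-customer count, which exists only after grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SalesOrderHeader (CustomerID INTEGER, TotalDue REAL)")

orders = [(1, 10.0 * k) for k in range(1, 8)]  # customer 1: 7 orders, 10.0 to 70.0
orders += [(2, 25.0)] * 6                       # customer 2: 6 orders
orders += [(3, 99.0)] * 5                       # customer 3: exactly 5 orders
conn.executemany("INSERT INTO SalesOrderHeader VALUES (?, ?)", orders)

query = """
SELECT CustomerID, COUNT(*) AS NumberOfOrders, AVG(TotalDue) AS AverageOrderTotal
FROM SalesOrderHeader
GROUP BY CustomerID
HAVING COUNT(*) > 5
ORDER BY NumberOfOrders DESC;
"""
rows = conn.execute(query).fetchall()
print(rows)
```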

**SQL Query Analysis 6 (THIRD "WORST" QUERY)**

**Proposition** This query is designed to calculate total sales and average freight cost per customer in the Northwinds2022TSQLV7 database. It aims to provide insights into customer spending and associated shipping costs.

**Tables** The query involves data from the following tables:

1. `Northwinds2022TSQLV7.Sales.[Order]` (aliased as OH)
2. `Northwinds2022TSQLV7.Sales.OrderDetail` (aliased as OD)
3. `Northwinds2022TSQLV7.Sales.Customer` (aliased as C)

**Columns** Important columns used include:

- From `Order`:
    - `CustomerID`
    - `Freight`
- From `OrderDetail`:
    - `UnitPrice`
    - `Quantity`
    - `DiscountPercentage`
- From `Customer`:
    - `CustomerCompanyName`

**Common Table Expression (CTE)**

- Named `TotalSalesCTE`.
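- It performs the following computations:
    - Calculates `TotalSales` for each customer as the sum of `UnitPrice * Quantity * (1 - DiscountPercentage)` from `OrderDetail`.
    - Computes `AverageFreight` as the average `Freight` value from `Order`. `Freight` is stored once per order, but the join to `OrderDetail` repeats it for every order line. This average is therefore weighted by the number of lines in each order rather than being a simple per-order average.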

**Join Operations**

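- The CTE involves two INNER JOINs:
    - Between `Order` and `OrderDetail` on `OrderID`.
    - Between `Order` and `Customer` on `CustomerID`. No `Customer` columns are used inside the CTE, so this join only drops orders without a matching customer.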

**Grouping**

- Data is grouped by `CustomerID` in the CTE.

**Main Query**

- Selects `CustomerCompanyName` from the `Customer` table.
- Retrieves `TotalSales` and `AverageFreight` from the CTE.
- Joins the `Customer` table with the CTE on `CustomerID`.

**Ordering**

- Orders the results by `TotalSales` in descending order.

**Objective** The query is crafted to provide a comprehensive view of each customer's total sales and the average cost of freight for their orders. This analysis can be vital for understanding customer value and logistics efficiency within the business.

```sql
WITH TotalSalesCTE AS (
    SELECT 
        OH.CustomerID, 
        TotalSales = SUM(OD.UnitPrice * OD.Quantity * (1 - OD.DiscountPercentage)),
        AverageFreight = AVG(OH.Freight)
    FROM 
        Northwinds2022TSQLV7.Sales.[Order] OH  -- ORDER is a reserved word, so the table name is bracketed
    INNER JOIN 
        Northwinds2022TSQLV7.Sales.OrderDetail OD 
        ON OH.OrderID = OD.OrderID
    INNER JOIN 
        Northwinds2022TSQLV7.Sales.Customer C 
        ON OH.CustomerID = C.CustomerID
    GROUP BY 
        OH.CustomerID
)
SELECT 
    C.CustomerCompanyName,
    TS.TotalSales,
    TS.AverageFreight
FROM 
    Northwinds2022TSQLV7.Sales.Customer C
INNER JOIN 
    TotalSalesCTE TS 
    ON C.CustomerID = TS.CustomerID
ORDER BY 
    TS.TotalSales DESC;
```


| CustomerCompanyName | TotalSales | AverageFreight |
| --- | --- | --- |
| Customer IRRVL | 110277.305 | 242.5712 |
| Customer THHDP | 104874.9785 | 240.558 |
| Customer LCOUJ | 104361.95 | 228.74 |
| Customer NYUHS | 51097.8005 | 95.4252 |
| Customer FRXZL | 49979.905 | 131.1725 |
| Customer IBVRG | 32841.37 | 48.5578 |
| Customer GLLAG | 30908.384 | 77.7756 |
| Customer CYZTN | 29567.5625 | 118.0208 |
| Customer PVDZC | 28872.19 | 128.7846 |
| Customer YBQTI | 27363.605 | 100.433 |

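`Freight` is stored per order, while the CTE joins it to every order line. As a result, `AVG(OH.Freight)` in this query counts an order's freight once per line. The toy sketch below reproduces the effect with Python's `sqlite3`, invented rows, and a hypothetical table named `Orders` in place of `[Order]`. It then shows one possible fix: aggregate the order lines per order first, then join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER, Freight REAL);
CREATE TABLE OrderDetail (OrderID INTEGER, UnitPrice REAL, Quantity INTEGER, DiscountPercentage REAL);
INSERT INTO Orders VALUES (1, 7, 10.0), (2, 7, 100.0);
-- Order 1 has three lines, order 2 has one.
INSERT INTO OrderDetail VALUES
    (1, 5.0, 2, 0.0), (1, 5.0, 2, 0.0), (1, 5.0, 2, 0.0),
    (2, 20.0, 1, 0.5);
""")

# Same join shape as TotalSalesCTE: freight is repeated for every order line.
fanned_out = conn.execute("""
SELECT OH.CustomerID,
       SUM(OD.UnitPrice * OD.Quantity * (1 - OD.DiscountPercentage)) AS TotalSales,
       AVG(OH.Freight) AS AverageFreight
FROM Orders OH
JOIN OrderDetail OD ON OH.OrderID = OD.OrderID
GROUP BY OH.CustomerID;
""").fetchall()

# Aggregate lines per order first, so each order contributes one freight value.
per_order = conn.execute("""
WITH OrderTotals AS (
    SELECT OrderID, SUM(UnitPrice * Quantity * (1 - DiscountPercentage)) AS OrderSales
    FROM OrderDetail
    GROUP BY OrderID
)
SELECT OH.CustomerID, SUM(OT.OrderSales) AS TotalSales, AVG(OH.Freight) AS AverageFreight
FROM Orders OH
JOIN OrderTotals OT ON OH.OrderID = OT.OrderID
GROUP BY OH.CustomerID;
""").fetchall()

print(fanned_out)  # freight averaged over 4 lines
print(per_order)   # freight averaged over 2 orders
```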

**SQL Query Analysis 7**

**Proposition**
This query is structured to analyze the freight costs associated with each customer's orders in the Northwinds2022TSQLV7 database. It calculates the total number of orders and total freight cost per customer, and orders the customers based on their total freight expenses.

**Tables**
The query utilizes data from the following tables:
1. `Northwinds2022TSQLV7.Sales.Customer` (aliased as `c`)
2. `Northwinds2022TSQLV7.Sales.[Order]` (aliased as `o`)

**Columns**
The key columns involved are:
- From `Customer`:
  - `CustomerID`
  - `CustomerCompanyName`
- From `Order`:
  - `OrderId`
  - `Freight`

**Common Table Expression (CTE)**
- Named `CustomerSales`.
- It computes:
  - The number of orders per customer (`NumberOfOrders`) as a count of `OrderId`.
  - The total freight cost (`TotalFreight`) as a sum of `Freight`.

**Join**
- An inner JOIN links the `Customer` and `Order` tables on `CustomerID`, so customers who have placed no orders do not appear in the results.

**Grouping**
- Grouping is done by `CustomerID` and `CustomerCompanyName`. The company name must appear in the GROUP BY so that it can be selected alongside the aggregates.

**Main Query**
- Selects `CustomerID`, `CustomerCompanyName`, `NumberOfOrders`, and `TotalFreight` from the `CustomerSales` CTE.
- Orders the results by `TotalFreight` in descending order.

**Objective**
The aim of this query is to show how much freight each customer's orders have cost, ranked from highest to lowest. Comparing `TotalFreight` with `NumberOfOrders` shows whether a customer's shipping costs come from many orders or from a few expensive ones, which can inform logistics and customer-management decisions.


In [14]:
WITH CustomerSales AS (
    SELECT 
        c.CustomerID,
        c.CustomerCompanyName,
        COUNT(o.OrderId) AS NumberOfOrders,
        SUM(o.Freight) AS TotalFreight
    FROM Northwinds2022TSQLV7.Sales.Customer c
    JOIN Northwinds2022TSQLV7.Sales.[Order] o ON c.CustomerID = o.CustomerID
    GROUP BY c.CustomerID, c.CustomerCompanyName
)
SELECT 
    CustomerID,
    CustomerCompanyName,
    NumberOfOrders,
    TotalFreight
FROM CustomerSales
ORDER BY TotalFreight DESC;

CustomerID,CustomerCompanyName,NumberOfOrders,TotalFreight
71,Customer LCOUJ,31,6683.7
20,Customer THHDP,30,6205.39
63,Customer IRRVL,28,5605.63
37,Customer FRXZL,19,2755.24
65,Customer NYUHS,18,2134.21
62,Customer WFIZJ,13,1982.7
24,Customer CYZTN,19,1678.08
5,Customer HGVLZ,18,1559.52
25,Customer AZJED,15,1403.44
51,Customer PVDZC,13,1394.22
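The CTE pattern above can be tried on toy data with Python's `sqlite3` module, used here as a stand-in for SQL Server; the table and customer names are hypothetical. The sketch also shows the effect of the join type: the inner join drops customers with no orders, while the hypothetical `include_zero_order_customers` flag switches to a LEFT JOIN that keeps them with zero counts.

```python
import sqlite3


def customer_freight(include_zero_order_customers=False):
    """Aggregate order counts and freight per customer in a CTE, highest freight first."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerCompanyName TEXT);
        CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Freight REAL);
        -- Hypothetical data: Customer C has no orders.
        INSERT INTO Customer VALUES (1, 'Customer A'), (2, 'Customer B'), (3, 'Customer C');
        INSERT INTO Orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 20.0);
    """)
    # The join keyword is chosen from a fixed pair of literals, never from user input.
    join = "LEFT JOIN" if include_zero_order_customers else "JOIN"
    rows = con.execute(f"""
        WITH CustomerSales AS (
            SELECT c.CustomerID,
                   c.CustomerCompanyName,
                   COUNT(o.OrderID) AS NumberOfOrders,        -- 0 when no order matched
                   COALESCE(SUM(o.Freight), 0) AS TotalFreight -- SUM of no rows is NULL
            FROM Customer c
            {join} Orders o ON c.CustomerID = o.CustomerID
            GROUP BY c.CustomerID, c.CustomerCompanyName
        )
        SELECT CustomerID, CustomerCompanyName, NumberOfOrders, TotalFreight
        FROM CustomerSales
        ORDER BY TotalFreight DESC
    """).fetchall()
    con.close()
    return rows


print(customer_freight())      # [(2, 'Customer B', 1, 20.0), (1, 'Customer A', 2, 12.5)]
print(customer_freight(True))  # Customer C is added with 0 orders and 0 freight
```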


**SQL Query Analysis 8**

**Proposition**
This query is designed to calculate the total sales quantity and amount for each product in the Northwinds2022TSQLV7 database, taking into account any applied discounts. The aim is to identify the most successful products in terms of total revenue generated after discounts.

**Tables**
The query accesses data from:
1. `Northwinds2022TSQLV7.Sales.OrderDetail` (aliased as `od`)

**Columns**
Key columns involved in this query are from `OrderDetail`:
- `ProductID`
- `Quantity`
- `UnitPrice`
- `DiscountPercentage`

**Aggregations**
- **`TotalQuantity`**: Sum of `Quantity` for each product.
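- **`TotalAmountAfterDiscount`**: The sum of `UnitPrice * Quantity * (1 - DiscountPercentage)` over all order lines for the product, i.e., total revenue net of discounts. The formula treats `DiscountPercentage` as a fraction (for example, 0.15 for a 15% discount), despite the column's name.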

**Grouping**
- The data is grouped by `ProductID`.

**Ordering**
- Orders the results by `TotalAmountAfterDiscount` in descending order.

**Objective**
The goal of this query is to show which products generate the most revenue after discounts are applied. The results can inform inventory and marketing decisions and identify the main revenue drivers in the product catalog. Because only `ProductID` is returned, a join to the product table would be needed to display product names.


In [15]:
SELECT 
    od.ProductID,
    SUM(od.Quantity) AS TotalQuantity,
    SUM(od.UnitPrice * od.Quantity * (1 - od.DiscountPercentage)) AS TotalAmountAfterDiscount
FROM Northwinds2022TSQLV7.Sales.OrderDetail od
GROUP BY od.ProductID
ORDER BY TotalAmountAfterDiscount DESC;

ProductID,TotalQuantity,TotalAmountAfterDiscount
38,623,141396.735
29,746,80368.672
59,1496,71155.7
62,1083,47234.97
60,1577,46825.48
56,1263,42593.06
51,886,41819.65
17,978,32698.38
18,539,29171.875
28,640,25696.64
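The revenue formula is easy to check by hand. This sketch recomputes `TotalQuantity` and `TotalAmountAfterDiscount` in plain Python for a few hypothetical order lines, assuming the discount is a fraction between 0 and 1; `Decimal` is used to avoid binary floating-point rounding on currency values. For example, 3 units at 10.00 with a 0.15 discount contribute 10.00 × 3 × 0.85 = 25.50.

```python
from collections import defaultdict
from decimal import Decimal


def revenue_by_product(lines):
    """Sum quantity and discounted revenue per product, highest revenue first.

    lines: iterable of (product_id, unit_price, quantity, discount) tuples,
    where discount is assumed to be a fraction between 0 and 1.
    """
    total_qty = defaultdict(int)
    total_amount = defaultdict(Decimal)  # Decimal() starts each total at 0
    for product_id, unit_price, quantity, discount in lines:
        total_qty[product_id] += quantity
        total_amount[product_id] += unit_price * quantity * (1 - discount)
    # Equivalent of ORDER BY TotalAmountAfterDiscount DESC
    return sorted(
        ((p, total_qty[p], total_amount[p]) for p in total_qty),
        key=lambda row: row[2],
        reverse=True,
    )


# Hypothetical order lines: (ProductID, UnitPrice, Quantity, Discount)
sample = [
    (1, Decimal("10.00"), 3, Decimal("0.15")),  # 25.50 after discount
    (1, Decimal("10.00"), 2, Decimal("0")),     # 20.00, no discount
    (2, Decimal("50.00"), 1, Decimal("0.25")),  # 37.50 after discount
]
print(revenue_by_product(sample))
# [(1, 5, Decimal('45.5000')), (2, 1, Decimal('37.5000'))]
```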


**SQL Query Analysis 9**

**Proposition**
This query counts the orders that each customer in the Northwinds2022TSQLV7 database placed in each year and month.

**Tables**
The query utilizes data from the following tables:
1. `Northwinds2022TSQLV7.Sales.Customer` (aliased as `c`)
2. `Northwinds2022TSQLV7.Sales.[Order]` (aliased as `o`)

**Columns**
Key columns used in this query are:
- From `Customer`:
  - `CustomerID`
  - `CustomerCompanyName`
- From `Order`:
  - `OrderDate`

**Join**
- An inner JOIN links the `Customer` and `Order` tables on `CustomerID`.

**Grouping**
- The data is grouped by `CustomerID`, `CustomerCompanyName`, and the year and month of the `OrderDate`.

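**Derived Columns and Aggregation**
- `OrderYear` and `OrderMonth`: the year and month extracted from `OrderDate` with `YEAR()` and `MONTH()`. These are grouping keys rather than aggregates.
- `OrderCount`: the number of orders in each customer/year/month group. Months in which a customer placed no orders produce no row.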

**Ordering**
- The results are ordered by `CustomerID`, then by `OrderYear`, and finally by `OrderMonth`.

**Objective**
The main purpose of this query is to show each customer's ordering activity over time, broken down by year and month. This supports analysis of seasonal trends, repeat purchasing, and changes in ordering behavior.


In [16]:
SELECT 
    c.CustomerID,
    c.CustomerCompanyName,
    YEAR(o.OrderDate) AS OrderYear,
    MONTH(o.OrderDate) AS OrderMonth,
    COUNT(*) AS OrderCount
FROM Northwinds2022TSQLV7.Sales.Customer c
JOIN Northwinds2022TSQLV7.Sales.[Order] o ON c.CustomerID = o.CustomerID
GROUP BY c.CustomerID, c.CustomerCompanyName, YEAR(o.OrderDate), MONTH(o.OrderDate)
ORDER BY c.CustomerID, OrderYear, OrderMonth;

CustomerID,CustomerCompanyName,OrderYear,OrderMonth,OrderCount
1,Customer NRZBB,2015,8,1
1,Customer NRZBB,2015,10,2
1,Customer NRZBB,2016,1,1
1,Customer NRZBB,2016,3,1
1,Customer NRZBB,2016,4,1
2,Customer MLTDN,2014,9,1
2,Customer MLTDN,2015,8,1
2,Customer MLTDN,2015,11,1
2,Customer MLTDN,2016,3,1
3,Customer KBUDE,2014,11,1
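The grouping logic can be mirrored in a few lines of Python. This sketch assumes orders arrive as hypothetical `(customer_id, order_date)` pairs, counts them per `(customer, year, month)` key, and sorts the result the same way as the query's ORDER BY. As in the SQL, a month with no orders simply has no entry.

```python
from collections import Counter
from datetime import date


def monthly_order_counts(orders):
    """Count orders per (customer, year, month), ordered by customer, year, month."""
    counts = Counter((cid, d.year, d.month) for cid, d in orders)
    return [(cid, year, month, n) for (cid, year, month), n in sorted(counts.items())]


# Hypothetical orders: (CustomerID, OrderDate)
orders = [
    (1, date(2015, 8, 25)),
    (1, date(2015, 10, 3)),
    (1, date(2015, 10, 13)),
    (2, date(2014, 9, 18)),
]
print(monthly_order_counts(orders))
# [(1, 2015, 8, 1), (1, 2015, 10, 2), (2, 2014, 9, 1)]
```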


**SQL Query Analysis 10**

**Proposition**
This query aims to analyze the shipping costs associated with orders in the Northwinds2022TSQLV7 database. It calculates the total freight cost for each unique shipping address and orders the results based on these costs.

**Table**
The query uses data from:
1. `Northwinds2022TSQLV7.Sales.[Order]` (aliased as `o`)

**Columns**
Key columns involved in the query are:
- From `Order`:
  - `ShipToName`
  - `ShipToAddress`
  - `ShipToCity`
  - `ShipToRegion`
  - `ShipToPostalCode`
  - `ShipToCountry`
  - `Freight`

**Aggregation**
- `TotalFreightCost`: The sum of `Freight` for each shipping address.

**Grouping**
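- Grouping is done by all the shipping address components (`ShipToName`, `ShipToAddress`, `ShipToCity`, `ShipToRegion`, `ShipToPostalCode`, `ShipToCountry`). GROUP BY treats NULLs as equal, so orders with no `ShipToRegion` (blank in the output) still group together when the other components match.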

**Ordering**
- The query orders the results by `TotalFreightCost` in descending order.

**Objective**
The primary objective of this query is to show which shipping destinations account for the most freight cost. This helps explain how shipping expenses are distributed and can guide logistics and shipping decisions.


In [17]:
SELECT 
    o.ShipToName,
    o.ShipToAddress,
    o.ShipToCity,
    o.ShipToRegion,
    o.ShipToPostalCode,
    o.ShipToCountry,
    SUM(o.Freight) AS TotalFreightCost
FROM Northwinds2022TSQLV7.Sales.[Order] o
GROUP BY o.ShipToName, o.ShipToAddress, o.ShipToCity, o.ShipToRegion, o.ShipToPostalCode, o.ShipToCountry
ORDER BY TotalFreightCost DESC;

ShipToName,ShipToAddress,ShipToCity,ShipToRegion,ShipToPostalCode,ShipToCountry,TotalFreightCost
Destination CUVPF,Kirchgasse 1234,Graz,,10159,Austria,3516.48
Ship to 71-A,7890 Suffolk Ln.,Boise,ID,10305,USA,2965.15
Ship to 63-B,Taucherstraße 2345,Cunewalde,,10280,Germany,2248.76
Ship to 63-C,Taucherstraße 3456,Cunewalde,,10281,Germany,1994.52
Ship to 71-C,9012 Suffolk Ln.,Boise,ID,10307,USA,1872.65
Ship to 71-B,8901 Suffolk Ln.,Boise,ID,10306,USA,1845.9
Destination RVDMF,Kirchgasse 9012,Graz,,10157,Austria,1679.21
Destination DGKOU,6789 Johnstown Road,Cork,Co. Cork,10204,Ireland,1381.93
Ship to 63-A,Taucherstraße 1234,Cunewalde,,10279,Germany,1362.35
Ship to 62-A,"Alameda dos Canàrios, 8901",Sao Paulo,SP,10276,Brazil,1276.42
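The NULL-grouping behavior can be checked with a minimal sketch that uses Python's `sqlite3` module as a stand-in for SQL Server. The `Orders` table here is a simplified, hypothetical version with only a city, a nullable region, and freight; the two Graz rows with a NULL region land in the same group.

```python
import sqlite3


def freight_by_address():
    """Sum freight per (city, region); NULL regions group together."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Orders (ShipToCity TEXT, ShipToRegion TEXT, Freight REAL);
        -- Hypothetical rows: two Graz orders with no region, one Boise order.
        INSERT INTO Orders VALUES
            ('Graz', NULL, 100.0),
            ('Graz', NULL, 50.0),
            ('Boise', 'ID', 80.0);
    """)
    rows = con.execute("""
        SELECT ShipToCity, ShipToRegion, SUM(Freight) AS TotalFreightCost
        FROM Orders
        GROUP BY ShipToCity, ShipToRegion
        ORDER BY TotalFreightCost DESC
    """).fetchall()
    con.close()
    return rows


print(freight_by_address())  # [('Graz', None, 150.0), ('Boise', 'ID', 80.0)]
```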


**SQL Query Analysis 11**

**Proposition**
This query is structured to identify the top 10 highest-grossing products in the AdventureWorks2017 database, based on total revenue generated. It calculates both the total quantity sold and the total revenue for each product.

**Tables**
The query uses data from the following tables:
1. `AdventureWorks2017.Sales.SalesOrderDetail`
2. `AdventureWorks2017.Production.Product` (aliased as P)

**Columns**
Key columns involved in this query are:
- From `SalesOrderDetail`:
  - `ProductID`
  - `OrderQty`
  - `LineTotal` (a computed column in AdventureWorks: `UnitPrice * (1 - UnitPriceDiscount) * OrderQty`)
- From `Product`:
  - `Name`

**Common Table Expression (CTE)**
- Named `ProductSales`.
- It aggregates for each product:
  - `TotalQuantity`: the sum of `OrderQty`.
  - `TotalRevenue`: the sum of `LineTotal`.

**Grouping**
- The data in the CTE is grouped by `ProductID`.

**Main Query**
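- Selects the top 10 rows by `TotalRevenue`. Because `TOP` is used without `WITH TIES`, a product tied with the tenth row may be left out.
- Selects the product `Name` from the `Product` table, aliased as `ProductName`.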
- Retrieves `TotalQuantity` and `TotalRevenue` from the `ProductSales` CTE.
- Joins `ProductSales` with the `Product` table on `ProductID`.

**Ordering**
- Orders the results by `TotalRevenue` in descending order.

**Objective**
The purpose of this query is to highlight the products that contribute most to the company's revenue. By reporting both units sold and revenue, it shows which products lead in sales volume and which lead in financial performance.


In [18]:
WITH ProductSales AS (
    SELECT 
        ProductID, 
        TotalQuantity = SUM(OrderQty), 
        TotalRevenue = SUM(LineTotal)
    FROM AdventureWorks2017.Sales.SalesOrderDetail
    GROUP BY ProductID
)
SELECT TOP 10 
    P.Name AS ProductName,
    PS.TotalQuantity,
    PS.TotalRevenue
FROM ProductSales PS
JOIN AdventureWorks2017.Production.Product P ON PS.ProductID = P.ProductID
ORDER BY PS.TotalRevenue DESC;


ProductName,TotalQuantity,TotalRevenue
"Mountain-200 Black, 38",2977,4400592.8004
"Mountain-200 Black, 42",2664,4009494.761841
"Mountain-200 Silver, 38",2394,3693678.025272
"Mountain-200 Silver, 42",2234,3438478.860423
"Mountain-200 Silver, 46",2216,3434256.941928
"Mountain-200 Black, 46",2111,3309673.216908
"Road-250 Black, 44",1642,2516857.314918
"Road-250 Black, 48",1498,2347655.953454
"Road-250 Black, 52",1245,2012447.775
"Road-150 Red, 56",664,1847818.628
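The CTE-then-join pattern can be reproduced on toy data with Python's `sqlite3` module. This is a sketch under stated assumptions: the tables are simplified, hypothetical versions of `SalesOrderDetail` and `Product` with made-up rows, and SQLite uses `LIMIT` where SQL Server uses `TOP`.

```python
import sqlite3


def top_products(n):
    """Aggregate per product in a CTE, join to names, return the top n by revenue."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE SalesOrderDetail (ProductID INTEGER, OrderQty INTEGER, LineTotal REAL);
        CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT);
        -- Hypothetical products and order lines.
        INSERT INTO Product VALUES (1, 'Widget'), (2, 'Gadget'), (3, 'Gizmo');
        INSERT INTO SalesOrderDetail VALUES
            (1, 2, 200.0), (1, 1, 100.0), (2, 5, 500.0), (3, 1, 50.0);
    """)
    rows = con.execute("""
        WITH ProductSales AS (
            SELECT ProductID,
                   SUM(OrderQty) AS TotalQuantity,
                   SUM(LineTotal) AS TotalRevenue
            FROM SalesOrderDetail
            GROUP BY ProductID
        )
        SELECT P.Name AS ProductName, PS.TotalQuantity, PS.TotalRevenue
        FROM ProductSales PS
        JOIN Product P ON PS.ProductID = P.ProductID
        ORDER BY PS.TotalRevenue DESC
        LIMIT ?  -- SQLite's counterpart to SELECT TOP n
    """, (n,)).fetchall()
    con.close()
    return rows


print(top_products(2))  # [('Gadget', 5, 500.0), ('Widget', 3, 300.0)]
```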


**SQL Query Analysis 12**

**Proposition**
This query is aimed at summarizing the purchase activity of customers in the AdventureWorks2017 database. It calculates each customer's total purchases and average freight cost.

**Tables**
The query involves the following tables:
1. `AdventureWorks2017.Sales.SalesOrderHeader` (aliased as SOH)
2. `AdventureWorks2017.Sales.SalesOrderDetail` (aliased as SOD)
3. `AdventureWorks2017.Person.Person` (aliased as P)

**Columns**
The primary columns used are:
- From `SalesOrderHeader`:
  - `CustomerID`
  - `SalesOrderID`
  - `Freight`
- From `SalesOrderDetail`:
  - `LineTotal`
- From `Person`:
  - `FirstName`
  - `LastName`
  - `BusinessEntityID`

**Joins**
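- The query includes inner JOIN operations:
  - Between SOH and SOD on `SalesOrderID`.
  - Between SOH and P on `CustomerID = BusinessEntityID`. In the standard AdventureWorks schema, `SalesOrderHeader.CustomerID` references `Sales.Customer`, not `Person.Person`, so this condition matches each customer ID to whichever person happens to share that number. Reaching the customer's actual name requires joining through `Sales.Customer` (`SOH.CustomerID = C.CustomerID` and `C.PersonID = P.BusinessEntityID`), so the names in the output below should not be read as the customers' real names.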

**Aggregations**
- **`TotalPurchases`**: The sum of `LineTotal` from SOD, representing the total purchase amount by each customer.
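- **`AverageFreight`**: The average of `Freight` from SOH for each customer. Because `Freight` is stored once per order but the join to SOD repeats it for every order line, this average is weighted by line count rather than computed per order.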

**Grouping**
- The results are grouped by `CustomerID`, `FirstName`, and `LastName`. The query has no ORDER BY, so the row order shown in the output is not guaranteed.

**Objective**
The main purpose of this query is to summarize each customer's total spending and the shipping costs on their orders. This information can support customer segmentation, targeted marketing, and logistics decisions.


In [19]:
SELECT 
    SOH.CustomerID, 
    P.FirstName + ' ' + P.LastName AS CustomerName, 
    SUM(SOD.LineTotal) AS TotalPurchases,
    AVG(SOH.Freight) AS AverageFreight
FROM AdventureWorks2017.Sales.SalesOrderHeader SOH
JOIN AdventureWorks2017.Sales.SalesOrderDetail SOD ON SOH.SalesOrderID = SOD.SalesOrderID
JOIN AdventureWorks2017.Person.Person P ON SOH.CustomerID = P.BusinessEntityID
GROUP BY SOH.CustomerID, P.FirstName, P.LastName;


CustomerID,CustomerName,TotalPurchases,AverageFreight
16867,Aaron Adams,2381.94,59.5485
16901,Adam Adams,39.98,0.9995
16724,Alex Adams,39.98,0.9995
16699,Angel Adams,67.58,0.8546
16691,Carlos Adams,4471.74,39.1168
16872,Connor Adams,2359.24,58.981
16858,Elijah Adams,4.99,0.1248
16902,Eric Adams,211.97,2.9496
16730,Evan Adams,140.94,1.7543
16850,Gabriel Adams,2354.98,58.8745
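The join path matters here. Assuming the standard AdventureWorks schema, in which `Sales.Customer.PersonID` links a customer to `Person.Person.BusinessEntityID`, the sketch below contrasts joining `CustomerID` directly to `BusinessEntityID` with joining through a customer table. It uses Python's `sqlite3` module as a stand-in for SQL Server, simplified hypothetical tables and rows, and SQLite's `||` operator in place of T-SQL's `+` for string concatenation.

```python
import sqlite3


def customer_names():
    """Contrast a direct CustomerID-to-person join with a join through Customer."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Person (BusinessEntityID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
        CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, PersonID INTEGER);
        CREATE TABLE SalesOrderHeader (SalesOrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Freight REAL);
        CREATE TABLE SalesOrderDetail (SalesOrderID INTEGER, LineTotal REAL);
        -- Hypothetical rows: customer 11000 is person 5, but a person 11000 also exists.
        INSERT INTO Person VALUES (5, 'Ana', 'Diaz'), (11000, 'Unrelated', 'Person');
        INSERT INTO Customer VALUES (11000, 5);
        INSERT INTO SalesOrderHeader VALUES (1, 11000, 8.0);
        INSERT INTO SalesOrderDetail VALUES (1, 60.0), (1, 40.0);
    """)
    # Direct join: matches the customer number against an unrelated person's ID.
    direct = con.execute("""
        SELECT P.FirstName || ' ' || P.LastName
        FROM SalesOrderHeader SOH
        JOIN Person P ON SOH.CustomerID = P.BusinessEntityID
    """).fetchone()[0]
    # Join through Customer: follows CustomerID -> PersonID -> BusinessEntityID.
    via_customer = con.execute("""
        SELECT SOH.CustomerID, P.FirstName || ' ' || P.LastName, SUM(SOD.LineTotal)
        FROM SalesOrderHeader SOH
        JOIN SalesOrderDetail SOD ON SOH.SalesOrderID = SOD.SalesOrderID
        JOIN Customer C ON SOH.CustomerID = C.CustomerID
        JOIN Person P ON C.PersonID = P.BusinessEntityID
        GROUP BY SOH.CustomerID, P.FirstName, P.LastName
    """).fetchall()
    con.close()
    return direct, via_customer


print(customer_names())  # ('Unrelated Person', [(11000, 'Ana Diaz', 100.0)])
```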


**SQL Query Analysis 13**

**Proposition**
This query tracks the year-over-year change in the number of unique customers who placed orders in AdventureWorks2017. It compares each year's count of distinct ordering customers with the previous year's count.

**Tables**
The query utilizes data from:
1. `AdventureWorks2017.Sales.SalesOrderHeader` (aliased as SOH)

**Columns**
The relevant columns used in this query include:
- `OrderDate` (from SOH)
- `CustomerID` (from SOH)

**Common Table Expression (CTE)**
- Named `CustomerYearlyGrowth`.
- It calculates:
  - `OrderYear`: the year extracted from `OrderDate`.
  - `UniqueCustomers`: the count of distinct `CustomerID`s for each year.

**Grouping**
- The data in the CTE is grouped by `OrderYear`.

**Main Query**
- Selects `OrderYear` and `UniqueCustomers` from the `CustomerYearlyGrowth` CTE, referred to as `CurrentYear`.
- Implements a LEFT JOIN with `CustomerYearlyGrowth` itself, aliased as `PreviousYear`, on `CurrentYear.OrderYear = PreviousYear.OrderYear + 1`.
- Calculates `Growth` as the difference between the current year's and the previous year's unique customer counts. The `ISNULL` function treats a missing previous year as 0, so the first year's growth equals its own customer count.

**Ordering**
- The results are ordered by `OrderYear`.

**Objective**
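The goal of this query is to analyze year-over-year change in AdventureWorks2017's active customer base. Because it counts distinct customers with at least one order in each year, not first-time customers, it measures customer activity rather than acquisition. Partial years also affect the figures: the sample orders run from mid-2011 to mid-2014, which likely explains the negative growth reported for 2014.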


In [20]:
WITH CustomerYearlyGrowth AS (
    SELECT 
        YEAR(SOH.OrderDate) AS OrderYear,
        COUNT(DISTINCT SOH.CustomerID) AS UniqueCustomers
    FROM AdventureWorks2017.Sales.SalesOrderHeader SOH
    GROUP BY YEAR(SOH.OrderDate)
)
SELECT 
    CurrentYear.OrderYear,
    CurrentYear.UniqueCustomers,
    PreviousYear.UniqueCustomers AS PreviousYearCustomers,
    (CurrentYear.UniqueCustomers - ISNULL(PreviousYear.UniqueCustomers, 0)) AS Growth
FROM CustomerYearlyGrowth CurrentYear
LEFT JOIN CustomerYearlyGrowth PreviousYear ON CurrentYear.OrderYear = PreviousYear.OrderYear + 1
ORDER BY CurrentYear.OrderYear;

OrderYear,UniqueCustomers,PreviousYearCustomers,Growth
2011,1406,,1406
2012,3162,1406,1756
2013,11095,3162,7933
2014,10354,11095,-741
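The LEFT self-join on `CurrentYear.OrderYear = PreviousYear.OrderYear + 1` amounts to looking up the prior year's count and treating a missing year as 0. This plain-Python sketch (the function name is hypothetical) reproduces the `Growth` column from the yearly counts in the output above.

```python
def yearly_growth(unique_customers):
    """Return (year, customers, previous_year_customers, growth) rows.

    unique_customers maps year -> count of distinct ordering customers.
    A missing previous year gives None, like the unmatched LEFT JOIN side,
    and is treated as 0 when computing growth, like ISNULL(..., 0).
    """
    rows = []
    for year in sorted(unique_customers):
        current = unique_customers[year]
        previous = unique_customers.get(year - 1)
        growth = current - (previous if previous is not None else 0)
        rows.append((year, current, previous, growth))
    return rows


print(yearly_growth({2011: 1406, 2012: 3162, 2013: 11095, 2014: 10354}))
# [(2011, 1406, None, 1406), (2012, 3162, 1406, 1756), (2013, 11095, 3162, 7933), (2014, 10354, 11095, -741)]
```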
