# 📚 SQL GROUP BY 
___

<div style="font-family: Avenir, sans-serif; font-size: 16px; line-height: 1.6; color: white; background-color: #333; padding: 10px; border-radius: 5px;">
This series of notebooks collects my SQL projects. I have worked with SQL for a few years across various projects, and I created this portfolio to showcase those skills. I hope you enjoy it!

</div>

Welcome to my SQL portfolio! This notebook delves into GROUP BY, demonstrating my understanding of key SQL concepts, including:

- Using **aliases** for enhanced readability and clarity
- Performing **Inner Joins** to combine rows from related tables on a shared key
- Leveraging **aggregations** such as `COUNT` and `SUM` to summarize data
- Filtering aggregated results with **HAVING clauses** for precise queries

To showcase the practical applications of SQL in data analysis and database management, I’ve integrated Python to connect to the database, execute SQL queries, and display results directly within this notebook.

### Database Overview

This notebook uses the `classicmodels` sample database, which includes key tables like **customers**, **orders**, **products**, and **orderdetails**. These tables represent real-world relationships among customers, products, and sales orders, providing a realistic foundation for practicing grouping, aggregation, and joins.

### Key Tables and Columns

- **customers**: `customerNumber`, `customerName`, `contactLastName`, `contactFirstName`
- **orders**: `orderNumber`, `orderDate`, `status`, `customerNumber`
- **orderdetails**: `orderNumber`, `productCode`, `quantityOrdered`, `priceEach`
- **products**: `productCode`, `productName`, `productLine`, `MSRP`

By combining SQL techniques with Python's database connectivity, this portfolio demonstrates practical ways to analyze business data and summarize customer orders, products, and sales.

```python
# Import necessary libraries
import os

import mysql.connector
import pandas as pd

# Establish the database connection.
# The password is read from an environment variable (the name MYSQL_PASSWORD is
# an assumption of this notebook) instead of being hard-coded in the source.
connection = mysql.connector.connect(
    user='root',
    password=os.environ.get('MYSQL_PASSWORD', ''),
    host='localhost',
    database='classicmodels'
)

# Execute a SQL query and return the results as a pandas DataFrame
def execute_query(query):
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        result = cursor.fetchall()
        # cursor.description holds one entry per result column; item 0 is its name
        columns = [desc[0] for desc in cursor.description]
    finally:
        # Close the cursor even if the query raises an error
        cursor.close()
    return pd.DataFrame(result, columns=columns)
```


____

### GROUP BY syntax

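```sql
SELECT
   column1,
   column2,
   column3,
   aggregate_function(column4)
FROM
   table_name
WHERE
   condition_statement
GROUP BY
   column1, column2, column3
HAVING
   aggregate_function(column4) condition;
```

`WHERE` filters individual rows before they are grouped, and `HAVING` filters the groups after the aggregates have been computed. Under MySQL's default `ONLY_FULL_GROUP_BY` SQL mode, each non-aggregated column in the `SELECT` list should also appear in `GROUP BY` (or be functionally dependent on it).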

___

### Question 1  

**Objective:** Understand how to use the `GROUP BY` clause to organize query results based on a specific column.

**Task**  

Using the `orders` table, write a query to:  

- Retrieve the `status` of all orders.  
- Group the results by the `status` column so that each distinct status is displayed once.  

**Requirements**  

- Use the `GROUP BY` clause to categorize the `status` values.  
- Ensure that each unique order status appears only once in the result set.  

**Solution**  

```sql
SELECT 
    status 
FROM 
    orders 
GROUP BY 
    status;
```
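As a quick cross-check outside MySQL, the sketch below runs the same `GROUP BY status` pattern against a handful of made-up rows in an in-memory SQLite database (Python's built-in `sqlite3` module, used here only as a stand-in for the `classicmodels` server). It suggests that, with no aggregate function, grouping by a column returns the same rows as `SELECT DISTINCT`. The `ORDER BY` is added only because SQL does not guarantee the row order of a grouped result.

```python
import sqlite3

# A tiny, invented stand-in for the `orders` table (SQLite in memory, not MySQL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (orderNumber, status) VALUES (?, ?)",
    [(1, "Shipped"), (2, "Shipped"), (3, "On Hold"), (4, "Cancelled"), (5, "Shipped")],
)

# GROUP BY collapses rows that share a status into a single output row
grouped = conn.execute(
    "SELECT status FROM orders GROUP BY status ORDER BY status"
).fetchall()

# With no aggregate, this matches SELECT DISTINCT
distinct = conn.execute(
    "SELECT DISTINCT status FROM orders ORDER BY status"
).fetchall()

print(grouped)              # [('Cancelled',), ('On Hold',), ('Shipped',)]
print(grouped == distinct)  # True
```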

```python
sql_query = """

SELECT
    status
FROM
    orders
GROUP BY
    status;

"""

# Execute the query and display the results
result_df = execute_query(sql_query)
result_df
```

|   | status |
|---|---|
| 0 | Shipped |
| 1 | Resolved |
| 2 | Cancelled |
| 3 | On Hold |
| 4 | Disputed |
| 5 | In Process |


___

### Question 2  

**Objective:** Learn how to use the `GROUP BY` clause with aggregate functions in MySQL.

**Task**  

Using the `orders` table, write a query to:  

- Retrieve the `status` of orders.  
- Count the number of orders for each `status`.  

**Requirements**  

- Use the `COUNT` function to count the number of rows in each group.  
- Use the `GROUP BY` clause to group rows into sets based on the `status` column.  

**Solution**  

```sql
SELECT 
    status, 
    COUNT(*) 
FROM 
    orders 
GROUP BY 
    status;
```
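A minimal sketch, again assuming an in-memory SQLite table filled with invented statuses rather than the real `orders` data: `COUNT(*)` returns one count per group, and the same numbers can be reproduced with Python's `collections.Counter`. The alias `order_count` is an illustrative name, not a column from the original query.

```python
import sqlite3
from collections import Counter

# Invented sample statuses, one per order
statuses = ["Shipped", "Shipped", "On Hold", "Cancelled", "Shipped", "On Hold"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (orderNumber, status) VALUES (?, ?)",
    list(enumerate(statuses, start=1)),
)

# COUNT(*) counts the rows in each status group; the alias names the result column
rows = conn.execute(
    "SELECT status, COUNT(*) AS order_count FROM orders GROUP BY status ORDER BY status"
).fetchall()
sql_counts = dict(rows)

# The same per-group counts computed in plain Python
python_counts = Counter(statuses)

print(rows)                                # [('Cancelled', 1), ('On Hold', 2), ('Shipped', 3)]
print(sql_counts == dict(python_counts))   # True
```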

```python
sql_query = """

SELECT
    status,
    COUNT(*)
FROM
    orders
GROUP BY
    status;

"""

# Execute the query and display the results
result_df = execute_query(sql_query)
result_df
```

|   | status | `COUNT(*)` |
|---|---|---|
| 0 | Shipped | 303 |
| 1 | Resolved | 4 |
| 2 | Cancelled | 6 |
| 3 | On Hold | 4 |
| 4 | Disputed | 3 |
| 5 | In Process | 6 |


___

### Question 3  

**Objective:** Calculate the total amount of all orders by status using `GROUP BY` with the `SUM` function.

**Task**  

Using the `orders` and `orderdetails` tables, write a query to:  

- Retrieve the `status` of orders.  
- Calculate the total amount of all orders for each `status` by multiplying `quantityOrdered` by `priceEach`.  

**Requirements**  

- Use the `SUM` function to calculate the total amount per order status.  
- Join the `orders` table with the `orderdetails` table using `INNER JOIN`.  
- Use the `GROUP BY` clause to group rows based on the `status` column.  

**Solution**  

```sql
SELECT 
    status, 
    SUM(quantityOrdered * priceEach) AS amount 
FROM 
    orders 
    INNER JOIN orderdetails USING (orderNumber) 
GROUP BY 
    status;
```
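The join-then-aggregate order can be checked on toy data. In this sketch (SQLite in memory, invented rows, and only the columns the query needs), each order line contributes `quantityOrdered * priceEach`, and `SUM` adds those line amounts per status. Order 13 has no order lines, so the `INNER JOIN` drops it and its status never reaches the grouping step.

```python
import sqlite3

# Invented sample rows; orderdetails keeps only the columns the query needs
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE orderdetails (orderNumber INTEGER, quantityOrdered INTEGER, priceEach REAL);
INSERT INTO orders VALUES (10, 'Shipped'), (11, 'Shipped'), (12, 'Cancelled'), (13, 'On Hold');
INSERT INTO orderdetails VALUES (10, 2, 5.0), (10, 1, 20.0), (11, 3, 5.0), (12, 4, 20.0);
""")

# The join yields one row per order line; SUM adds the line amounts within each status.
# Order 13 has no order lines, so INNER JOIN removes it before grouping.
rows = conn.execute("""
    SELECT status, SUM(quantityOrdered * priceEach) AS amount
    FROM orders
    INNER JOIN orderdetails USING (orderNumber)
    GROUP BY status
    ORDER BY status
""").fetchall()

print(rows)  # [('Cancelled', 80.0), ('Shipped', 45.0)]
```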


```python
sql_query = """

SELECT
    status,
    SUM(quantityOrdered * priceEach) AS amount
FROM
    orders
    INNER JOIN orderdetails USING (orderNumber)
GROUP BY
    status;

"""

# Execute the query and display the results
result_df = execute_query(sql_query)
result_df
```

|   | status | amount |
|---|---|---|
| 0 | Shipped | 8865094.64 |
| 1 | Resolved | 134235.88 |
| 2 | Cancelled | 238854.18 |
| 3 | On Hold | 169575.61 |
| 4 | Disputed | 61158.78 |
| 5 | In Process | 135271.52 |


___

### Question 4  

**Objective:** Use the `GROUP BY` clause with an expression (the year extracted from `orderDate`) rather than a plain column.

**Task**  

Using the `orders` and `orderdetails` tables, write a query to:  

- Retrieve the year from `orderDate`.  
- Calculate the total sales for each year by multiplying `quantityOrdered` by `priceEach`.  
- Include only orders where the status is `'Shipped'`.  

**Requirements**  

- Use the `YEAR` function to extract the year from `orderDate`.  
- Use the `SUM` function to calculate the total sales for each year.  
- Use `INNER JOIN` to join `orders` and `orderdetails` tables.  
- Use the `WHERE` clause to filter results to only include `'Shipped'` orders.  
- Ensure that the expression in the `SELECT` clause matches the one in the `GROUP BY` clause.  

**Solution**  

```sql
SELECT 
    YEAR(orderDate) AS year, 
    SUM(quantityOrdered * priceEach) AS total 
FROM 
    orders 
    INNER JOIN orderdetails USING (orderNumber) 
WHERE 
    status = 'Shipped' 
GROUP BY 
    YEAR(orderDate);
```
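SQLite has no `YEAR()` function, so this sketch substitutes `CAST(strftime('%Y', orderDate) AS INTEGER)`, assuming dates are stored as `YYYY-MM-DD` text. Everything else mirrors the MySQL solution: the `WHERE` filter runs before grouping, and the same year expression appears in both `SELECT` and `GROUP BY`. The rows are invented for illustration.

```python
import sqlite3

# Invented sample rows; only the columns the query needs
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, orderDate TEXT, status TEXT);
CREATE TABLE orderdetails (orderNumber INTEGER, quantityOrdered INTEGER, priceEach REAL);
INSERT INTO orders VALUES
    (10, '2003-01-06', 'Shipped'), (11, '2003-05-20', 'Shipped'),
    (12, '2004-02-10', 'Shipped'), (13, '2004-03-01', 'Cancelled');
INSERT INTO orderdetails VALUES (10, 2, 5.0), (11, 1, 20.0), (12, 3, 5.0), (13, 4, 20.0);
""")

# strftime('%Y', ...) stands in for MySQL's YEAR(); it returns text, hence the CAST.
# The same expression is used in SELECT and GROUP BY; the cancelled order is filtered out by WHERE.
rows = conn.execute("""
    SELECT CAST(strftime('%Y', orderDate) AS INTEGER) AS year,
           SUM(quantityOrdered * priceEach) AS total
    FROM orders
    INNER JOIN orderdetails USING (orderNumber)
    WHERE status = 'Shipped'
    GROUP BY CAST(strftime('%Y', orderDate) AS INTEGER)
    ORDER BY year
""").fetchall()

print(rows)  # [(2003, 30.0), (2004, 15.0)]
```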


```python
sql_query = """

SELECT
    YEAR(orderDate) AS year,
    SUM(quantityOrdered * priceEach) AS total
FROM
    orders
    INNER JOIN orderdetails USING (orderNumber)
WHERE
    status = 'Shipped'
GROUP BY
    YEAR(orderDate);

"""

# Execute the query and display the results
result_df = execute_query(sql_query)
result_df
```

|   | year | total |
|---|---|---|
| 0 | 2003 | 3223095.8 |
| 1 | 2004 | 4300602.99 |
| 2 | 2005 | 1341395.85 |


----

### Question 5  

**Objective:** Filter grouped results using the `HAVING` clause.

**Task**  

Using the `orders` and `orderdetails` tables, write a query to:  

- Retrieve the year from `orderDate`.  
- Calculate the total sales for each year by multiplying `quantityOrdered` by `priceEach`.  
- Include only orders where the status is `'Shipped'`.  
- Filter the results to show only years after 2003.  

**Requirements**  

- Use the `YEAR` function to extract the year from `orderDate`.  
- Use the `SUM` function to calculate the total sales for each year.  
- Use `INNER JOIN` to join `orders` and `orderdetails` tables.  
- Use the `WHERE` clause to filter results to only include `'Shipped'` orders.  
- Use the `HAVING` clause to filter results where the year is greater than 2003.  

**Solution**  

```sql
SELECT 
    YEAR(orderDate) AS year, 
    SUM(quantityOrdered * priceEach) AS total 
FROM 
    orders 
    INNER JOIN orderdetails USING (orderNumber) 
WHERE 
    status = 'Shipped' 
GROUP BY 
    YEAR(orderDate) 
HAVING 
    YEAR(orderDate) > 2003;
```
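To contrast `WHERE` and `HAVING`, the sketch below (SQLite in memory, invented rows, and the same `strftime` substitute for `YEAR()`) runs three variants of the yearly-totals query through a hypothetical helper, `yearly_totals`. Filtering on the year gives the same rows whether it is done in `WHERE` (before grouping) or in `HAVING` (after grouping), whereas a condition on the aggregated total can only be expressed in `HAVING`. The SQL fragments are assembled from constants in the code, never from user input.

```python
import sqlite3

# Invented sample rows; only the columns the query needs
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, orderDate TEXT, status TEXT);
CREATE TABLE orderdetails (orderNumber INTEGER, quantityOrdered INTEGER, priceEach REAL);
INSERT INTO orders VALUES
    (10, '2003-01-06', 'Shipped'), (11, '2003-05-20', 'Shipped'),
    (12, '2004-02-10', 'Shipped'), (13, '2005-03-01', 'Shipped'),
    (14, '2004-07-15', 'Cancelled');
INSERT INTO orderdetails VALUES
    (10, 2, 5.0), (11, 1, 20.0), (12, 3, 5.0), (13, 4, 20.0), (14, 1, 100.0);
""")

# Stand-in for MySQL's YEAR(orderDate)
YEAR = "CAST(strftime('%Y', orderDate) AS INTEGER)"

def yearly_totals(extra_where="", having=""):
    """Hypothetical helper: shipped sales per year, with optional extra filters."""
    sql = f"""
        SELECT {YEAR} AS year, SUM(quantityOrdered * priceEach) AS total
        FROM orders
        INNER JOIN orderdetails USING (orderNumber)
        WHERE status = 'Shipped' {extra_where}
        GROUP BY {YEAR}
        {having}
        ORDER BY year
    """
    return conn.execute(sql).fetchall()

# The year filter applied to groups (HAVING) versus to individual rows (WHERE)
after_2003_having = yearly_totals(having=f"HAVING {YEAR} > 2003")
after_2003_where = yearly_totals(extra_where=f"AND {YEAR} > 2003")

# A filter on the aggregate itself can only go in HAVING
big_years = yearly_totals(having="HAVING SUM(quantityOrdered * priceEach) > 20")

print(after_2003_having)                      # [(2004, 15.0), (2005, 80.0)]
print(after_2003_having == after_2003_where)  # True
print(big_years)                              # [(2003, 30.0), (2005, 80.0)]
```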


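```python
# This cell varies the HAVING condition from the solution above: it filters the
# groups on the aggregated total instead of on the year. Every yearly total
# exceeds 10,000, so all three years are returned.
sql_query = """

SELECT
    YEAR(orderDate) AS year,
    SUM(quantityOrdered * priceEach) AS Total_Amount
FROM
    orders
    INNER JOIN orderdetails USING (orderNumber)
WHERE
    status = 'Shipped'
GROUP BY
    year
HAVING
    Total_Amount > 10000;

"""

# Execute the query and display the results
result_df = execute_query(sql_query)
result_df
```

|   | year | Total_Amount |
|---|---|---|
| 0 | 2003 | 3223095.8 |
| 1 | 2004 | 4300602.99 |
| 2 | 2005 | 1341395.85 |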


___

### Question 6  

**Objective:** Group data by multiple columns to get detailed aggregation results.

**Task**  

Using the `orders` and `orderdetails` tables, write a query to:  

- Retrieve the year from `orderDate`.  
- Retrieve the `status` of orders.  
- Calculate the total sales for each combination of year and order status.  

**Requirements**  

- Use the `YEAR` function to extract the year from `orderDate`.  
- Use the `SUM` function to calculate the total sales for each group.  
- Use `INNER JOIN` to join `orders` and `orderdetails` tables.  
- Use `GROUP BY` with both `YEAR(orderDate)` and `status`.  
- Use `ORDER BY` to sort the results by year.  

**Solution**  

```sql
SELECT 
    YEAR(orderDate) AS year, 
    status, 
    SUM(quantityOrdered * priceEach) AS total 
FROM 
    orders 
    INNER JOIN orderdetails USING (orderNumber) 
GROUP BY 
    YEAR(orderDate), 
    status 
ORDER BY 
    YEAR(orderDate);
```
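Grouping by two expressions creates one group per distinct (year, status) pair. The sketch below, using invented rows in an in-memory SQLite database and the `strftime` substitute for `YEAR()`, compares the SQL result with the same totals accumulated in a Python dictionary keyed by that pair. It sorts by both year and status; sorting by year alone, as in the query above, does not guarantee the order of statuses within a year.

```python
import sqlite3
from collections import defaultdict

# Invented sample rows; only the columns the query needs
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, orderDate TEXT, status TEXT);
CREATE TABLE orderdetails (orderNumber INTEGER, quantityOrdered INTEGER, priceEach REAL);
INSERT INTO orders VALUES
    (10, '2003-01-06', 'Shipped'), (11, '2003-05-20', 'Cancelled'),
    (12, '2004-02-10', 'Shipped'), (13, '2004-03-01', 'Shipped'),
    (14, '2004-07-15', 'On Hold');
INSERT INTO orderdetails VALUES
    (10, 2, 5.0), (11, 1, 20.0), (12, 3, 5.0), (13, 1, 5.0), (14, 4, 20.0);
""")

# One group per distinct (year, status) pair; sorting on both keys fixes the row order
sql_rows = conn.execute("""
    SELECT CAST(strftime('%Y', orderDate) AS INTEGER) AS year,
           status,
           SUM(quantityOrdered * priceEach) AS total
    FROM orders
    INNER JOIN orderdetails USING (orderNumber)
    GROUP BY CAST(strftime('%Y', orderDate) AS INTEGER), status
    ORDER BY year, status
""").fetchall()

# The same totals accumulated in Python, keyed by (year, status)
totals = defaultdict(float)
for order_date, status, qty, price in conn.execute(
    "SELECT orderDate, status, quantityOrdered, priceEach "
    "FROM orders INNER JOIN orderdetails USING (orderNumber)"
):
    totals[(int(order_date[:4]), status)] += qty * price
python_rows = [(year, status, total) for (year, status), total in sorted(totals.items())]

print(sql_rows)
# [(2003, 'Cancelled', 20.0), (2003, 'Shipped', 10.0), (2004, 'On Hold', 80.0), (2004, 'Shipped', 20.0)]
print(sql_rows == python_rows)  # True
```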


```python
sql_query = """

SELECT
    YEAR(orderDate) AS year,
    status,
    SUM(quantityOrdered * priceEach) AS total
FROM
    orders
    INNER JOIN orderdetails USING (orderNumber)
GROUP BY
    YEAR(orderDate),
    status
ORDER BY
    YEAR(orderDate);
"""

# Execute the query and display the results
result_df = execute_query(sql_query)
result_df
```

|   | year | status | total |
|---|---|---|---|
| 0 | 2003 | Cancelled | 67130.69 |
| 1 | 2003 | Resolved | 27121.9 |
| 2 | 2003 | Shipped | 3223095.8 |
| 3 | 2004 | Cancelled | 171723.49 |
| 4 | 2004 | On Hold | 23014.17 |
| 5 | 2004 | Resolved | 20564.86 |
| 6 | 2004 | Shipped | 4300602.99 |
| 7 | 2005 | Disputed | 61158.78 |
| 8 | 2005 | In Process | 135271.52 |
| 9 | 2005 | On Hold | 146561.44 |


___

### Summary and Personal Reflection

This notebook has demonstrated the power of the `GROUP BY` clause in SQL, showcasing its ability to categorize, aggregate, and summarize data effectively. By combining `GROUP BY` with aggregate functions like `COUNT` and `SUM`, and with grouping expressions such as `YEAR(orderDate)`, we can generate valuable insights from complex datasets, such as total sales by year or order status.

In my experience, mastering `GROUP BY` is essential for any SQL practitioner, as it enables us to organize and analyze data efficiently, providing a solid foundation for advanced queries and data manipulation tasks. By practicing these exercises, I have honed my SQL skills and gained a deeper understanding of how to leverage `GROUP BY` for insightful data analysis.

I hope you found this notebook informative and engaging. Feel free to explore the other notebooks in this series to discover more SQL projects and enhance your database skills. Thank you for reading!
