<div align="right" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/Logo blue_dark.png"  style="width:25px" align="right";/>
</div>

# Joins and set operations
© ExploreAI Academy

In this exercise, we will use SQL joins and set operations to solve problems that involve combining data from multiple tables.



## Learning objectives

In this exercise, we will:
- Understand and use different types of SQL joins to combine data from different tables.
- Use the `UNION`, `INTERSECT`, and `MINUS/EXCEPT` operators to combine rows from different tables.
- Understand the difference between joins and set operations (`UNION`, `INTERSECT`, `MINUS/EXCEPT`): joins combine tables column-wise, while set operations combine them row-wise.
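
To make the column-wise versus row-wise distinction concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The tables `staff`, `cities`, and `contractors` are hypothetical toy tables created in memory for illustration; they are not part of Northwind.

```python
import sqlite3

# Hypothetical toy tables (not part of Northwind), built in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (id INTEGER, name TEXT, city_id INTEGER);
    CREATE TABLE cities (city_id INTEGER, city TEXT);
    CREATE TABLE contractors (id INTEGER, name TEXT, city_id INTEGER);
    INSERT INTO staff VALUES (1, 'Ana', 10), (2, 'Ben', 20);
    INSERT INTO cities VALUES (10, 'London'), (20, 'Seattle');
    INSERT INTO contractors VALUES (3, 'Cleo', 10);
""")

# A join widens the result: columns from both tables sit side by side.
joined = conn.execute("""
    SELECT s.name, c.city
    FROM staff AS s
    INNER JOIN cities AS c ON s.city_id = c.city_id
    ORDER BY s.name
""").fetchall()

# A set operation lengthens the result: rows are stacked under the same column.
stacked = conn.execute("""
    SELECT name FROM staff
    UNION
    SELECT name FROM contractors
    ORDER BY name
""").fetchall()

print(joined)   # two columns per row
print(stacked)  # one column, more rows
conn.close()
```

The join result has one row per matching pair and gains a column from the second table, while the `UNION` result keeps a single column and gains rows.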

First, let's load our sample database:

In [1]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook.
%load_ext sql

In [2]:
# Load the Northwind database stored on your local machine.
# Make sure the file is saved in the same folder as this notebook.
%sql sqlite:///Northwind.db

Here is a view of all of our tables in the database:

<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/Northwind_ERD.png"  style="width:70%";/>
<br>
<br>
    <em>Figure 1: Northwind database ERD</em>
</div>

## Overview
Run the necessary queries that will provide us with the following information. Compare your queries with the solutions at the end of this notebook.


In [7]:
%config SqlMagic.displaylimit = None

### Exercise 1

Using the Northwind database, determine the list of products and their respective suppliers. 

In [19]:
%%sql
SELECT
    p.ProductName,
    s.CompanyName
FROM
    Products AS p
LEFT JOIN
    Suppliers AS s
ON
    p.SupplierID = s.SupplierID

ProductName,CompanyName
Chai,Exotic Liquids
Chang,Exotic Liquids
Aniseed Syrup,Exotic Liquids
Chef Anton's Cajun Seasoning,New Orleans Cajun Delights
Chef Anton's Gumbo Mix,New Orleans Cajun Delights
Grandma's Boysenberry Spread,Grandma Kelly's Homestead
Uncle Bob's Organic Dried Pears,Grandma Kelly's Homestead
Northwoods Cranberry Sauce,Grandma Kelly's Homestead
Mishi Kobe Niku,Tokyo Traders
Ikura,Tokyo Traders


### Exercise 2

Generate a list of all cities where customers and employees are located.

In [13]:
%%sql

SELECT
    c.ContactName AS Name,
    c.City,
    'Customer' AS Type
FROM
    customers AS c
UNION
SELECT
    e.FirstName || ' ' || e.LastName,
    e.City,
    'Employee' AS Type
FROM
    employees AS e

Name,City,Type
Alejandra Camino,Madrid,Customer
Alexander Feuer,Leipzig,Customer
Ana Trujillo,México D.F.,Customer
Anabela Domingues,Sao Paulo,Customer
André Fonseca,Campinas,Customer
Andrew Fuller,Tacoma,Employee
Ann Devon,London,Customer
Anne Dodsworth,London,Employee
Annette Roulet,Toulouse,Customer
Antonio Moreno,México D.F.,Customer


### Exercise 3

Create a list of all unique cities in which both customers are located and employees live.

In [27]:
%%sql

SELECT
    c.City
FROM
    customers AS c
INTERSECT
SELECT
    e.City
FROM
    employees AS e

City
Kirkland
London
Seattle


### 4. Challenge question

As the new sales analyst, you've been asked to use the Northwind database to prepare a sales report for the last quarter. Your report should include:
1. Customer names and their countries.
2. The total amount spent by each customer.
3. The list of products each customer purchased.
4. The total quantity of each product purchased by each customer.

Only include customers who have spent more than 5000 in the last quarter. 

In [40]:
%%sql

SELECT
    cus.ContactName,
    cus.Country,
    od.Quantity,
    SUM(od.Quantity * od.UnitPrice) AS Total, 
    od.ProductID
FROM
    Customers AS cus
LEFT JOIN
    Orders AS o
ON
    cus.CustomerID = o.CustomerID
LEFT JOIN
    OrderDetails AS od
ON
    o.OrderID = od.OrderID
GROUP BY
    cus.ContactName
HAVING
    SUM((od.Quantity * od.UnitPrice)*(1-od.Discount)) > 5000

ContactName,Country,Quantity,Total,ProductID
Alexander Feuer,Germany,20,5042.2,28
Anabela Domingues,Brazil,20,7310.62,20
André Fonseca,Brazil,14,8702.23,31
Ann Devon,UK,30,15033.66,69
Annette Roulet,France,15,10272.350000000002,50
Antonio Moreno,Mexico,24,7515.349999999999,11
Art Braunschweiger,USA,24,12489.7,33
Bernardo Batista,Brazil,20,6973.63,21
Carlos González,Venezuela,20,17825.06,15
Carlos Hernández,Venezuela,25,23611.58,27


## Solutions

### Exercise 1

This query combines rows from the `Products` and `Suppliers` tables wherever `Products.SupplierID` equals `Suppliers.SupplierID`.

Expected outcome: The result should be a table with the columns `ProductName` and `CompanyName`, showing each product alongside its supplier.

In [None]:
%%sql

SELECT 
    product.ProductName, 
    supplier.CompanyName
FROM 
    Products AS product
INNER JOIN 
    Suppliers AS supplier
    ON product.SupplierID = supplier.SupplierID;
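
The attempt in the Overview section uses `LEFT JOIN`, while this solution uses `INNER JOIN`. The two only differ when a product has no matching supplier. The sketch below illustrates this with hypothetical toy rows (`'Widget'`, `'Orphan item'`, `'Supplier A'`) in an in-memory database; whether Northwind itself contains such unmatched products is not checked here.

```python
import sqlite3

# In-memory stand-ins for the two tables, with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductName TEXT, SupplierID INTEGER);
    CREATE TABLE Suppliers (SupplierID INTEGER, CompanyName TEXT);
    INSERT INTO Suppliers VALUES (1, 'Supplier A');
    INSERT INTO Products VALUES ('Widget', 1), ('Orphan item', NULL);
""")

# Same query text; only the join type changes.
query = """
    SELECT p.ProductName, s.CompanyName
    FROM Products AS p
    {join_type} JOIN Suppliers AS s ON p.SupplierID = s.SupplierID
    ORDER BY p.ProductName
"""

# INNER JOIN drops products without a matching supplier.
inner_rows = conn.execute(query.format(join_type="INNER")).fetchall()

# LEFT JOIN keeps every product and fills the supplier with NULL (None in Python).
left_rows = conn.execute(query.format(join_type="LEFT")).fetchall()

print(inner_rows)
print(left_rows)
conn.close()
```

If every product has a valid `SupplierID`, both join types return the same rows, so either is acceptable for this exercise.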

### Exercise 2

The `UNION` operator is used to combine the results of two `SELECT` statements. It removes duplicate rows from the results.

Expected outcome: A single-column table of unique cities where customers and employees are located.

In [14]:
%%sql

SELECT 
    City 
FROM 
    Customers
UNION
SELECT 
    City 
FROM 
    Employees;

City
""
Aachen
Albuquerque
Anchorage
Barcelona
Barquisimeto
Bergamo
Berlin
Bern
Boise


### Exercise 3

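The `INNER JOIN` query below returns cities that appear in both the `Customers` and `Employees` tables (i.e. it looks for matches). Because it returns one row for every matching customer-employee pair, the same city can appear several times; adding `DISTINCT`, or using `INTERSECT` instead, returns each shared city only once. By contrast, the `UNION` query from Exercise 2 returns all unique cities from both tables, eliminating any duplicates.

Expected outcome: A single-column table of the cities shared by customers and employees. With the query as written, a city is repeated once per matching pair.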

In [26]:
%%sql

SELECT 
    Customers.City 
FROM 
    Customers 
INNER JOIN 
    Employees 
    ON Customers.City = Employees.City;


City
London
London
London
London
London
London
London
London
London
London
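
The repeated `London` rows come from the join pairing every customer in a city with every employee in that city. The sketch below, on hypothetical toy data in an in-memory database, contrasts this with `INTERSECT`, which returns each shared city once, and with `EXCEPT` (SQLite's spelling of `MINUS`), which returns cities found only in the first query.

```python
import sqlite3

# Two customers and two employees share London (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (City TEXT);
    CREATE TABLE Employees (City TEXT);
    INSERT INTO Customers VALUES ('London'), ('London'), ('Madrid');
    INSERT INTO Employees VALUES ('London'), ('London'), ('Seattle');
""")

# INNER JOIN: one row per matching customer-employee pair (2 x 2 = 4 rows).
join_rows = conn.execute("""
    SELECT Customers.City
    FROM Customers
    INNER JOIN Employees ON Customers.City = Employees.City
""").fetchall()

# INTERSECT: cities present in both queries, without duplicates.
intersect_rows = conn.execute("""
    SELECT City FROM Customers
    INTERSECT
    SELECT City FROM Employees
""").fetchall()

# EXCEPT: cities with customers but no employees.
except_rows = conn.execute("""
    SELECT City FROM Customers
    EXCEPT
    SELECT City FROM Employees
""").fetchall()

print(len(join_rows), intersect_rows, except_rows)
conn.close()
```

For a list of unique shared cities, `INTERSECT` (or `SELECT DISTINCT` on the joined result) is the more direct fit.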


### 4. Challenge question

In [36]:
%%sql

SELECT 
    customer.ContactName, 
    customer.Country,
    SUM(orderDetails.Quantity * orderDetails.UnitPrice) AS TotalSalesAmount,
    product.ProductName,
    SUM(orderDetails.Quantity) AS TotalUnitsSold
FROM 
    Customers AS customer
JOIN 
    Orders AS orders 
    ON customer.CustomerID = orders.CustomerID
JOIN 
    OrderDetails AS orderDetails 
    ON orders.OrderID = orderDetails.OrderID
JOIN 
    Products AS product 
    ON orderDetails.ProductID = product.ProductID
WHERE 
    orders.OrderDate BETWEEN '1996-08-01' AND '1998-01-30'
GROUP BY 
    customer.ContactName,
    customer.Country, 
    product.ProductName
HAVING 
    SUM(orderDetails.Quantity * orderDetails.UnitPrice) > 5000;


ContactName,Country,TotalSalesAmount,ProductName,TotalUnitsSold
Georg Pipps,Austria,10540.0,Côte de Blaye,50
Horst Kloss,Germany,6222.0,Camembert Pierrot,194
Horst Kloss,Germany,7905.0,Côte de Blaye,30
Horst Kloss,Germany,5268.0,Schoggi Schokolade,120
Howard Snyder,USA,11857.5,Côte de Blaye,45
Jean Fresnière,Canada,10329.2,Côte de Blaye,49
Jose Pavarotti,USA,6832.4400000000005,Thüringer Rostbratwurst,60
Jytte Petersen,Denmark,10540.0,Côte de Blaye,50
Lúcia Carvalho,Brazil,8432.0,Côte de Blaye,40
Patricia McKenna,Ireland,5830.0,Raclette Courdavault,112


This problem involves pulling data from multiple tables (`Customers`, `Orders`, `OrderDetails`, and `Products`) to create a detailed report. It tests your ability to use `JOIN`s, aggregate functions, grouping, and the `HAVING` clause.

Here's how to break down the problem:

1. **Identify the tables you need to pull data from**: For this problem, you'll need data from the `Customers`, `Orders`, `OrderDetails`, and `Products` tables.
2. **Join these tables**: Use SQL joins to connect these tables. The `Orders` table is connected to `Customers` via `CustomerID`. The `OrderDetails` table is connected to `Orders` via `OrderID` and to `Products` via `ProductID`.
3. **Filter the data**: Use the `WHERE` clause to keep only orders placed within the reporting period. The query above filters on `orders.OrderDate BETWEEN '1996-08-01' AND '1998-01-30'`; adjust the two dates to match the quarter you are reporting on.
4. **Aggregate the data**: Aggregate functions are needed to calculate the amount spent (`SUM` of `Quantity` * `UnitPrice`) and the quantity purchased (`SUM` of `Quantity`). This is done in the `SELECT` clause.
5. **Group the data**: Use the `GROUP BY` clause to segment the data by `ContactName`, `Country`, and `ProductName`, so that each total is calculated per customer and product.
6. **Apply a condition on an aggregate:** Lastly, use the `HAVING` clause to keep only groups whose total exceeds 5000. Because the data is grouped by customer and product, the threshold applies to each customer-product total rather than to a customer's overall spend. Remember, when you want to use a condition on an aggregate, you should use the `HAVING` clause, not the `WHERE` clause. The `WHERE` clause is used to filter rows, while the `HAVING` clause is used to filter groups.

The result is a detailed report with customer names, their countries, the products each customer purchased, and the amount spent on and quantity purchased of each product during the filtered period. Only customer-product combinations with a total above 5000 are included.
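
The difference between filtering rows with `WHERE` and filtering groups with `HAVING`, and the effect of the `GROUP BY` columns on which totals are compared to the threshold, can be sketched on a small example. The `Sales` table, its rows, and the date range below are hypothetical and exist only for illustration; they are not Northwind data.

```python
import sqlite3

# Hypothetical flattened sales table, built in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (
        Customer TEXT, Product TEXT, OrderDate TEXT,
        Quantity INTEGER, UnitPrice REAL
    );
    INSERT INTO Sales VALUES
        ('Ana',  'Tea',    '2023-05-01', 100, 40.0),
        ('Ana',  'Coffee', '2023-05-10',  50, 30.0),
        ('Ben',  'Tea',    '2023-05-02', 150, 40.0),
        ('Ben',  'Tea',    '2023-01-15', 100, 40.0),
        ('Cleo', 'Coffee', '2023-06-01',  10, 30.0);
""")

# WHERE removes rows outside the period before any grouping happens;
# HAVING then removes whole groups whose total is too small.
per_customer = conn.execute("""
    SELECT Customer, SUM(Quantity * UnitPrice) AS Total
    FROM Sales
    WHERE OrderDate BETWEEN '2023-04-01' AND '2023-06-30'
    GROUP BY Customer
    HAVING SUM(Quantity * UnitPrice) > 5000
    ORDER BY Customer
""").fetchall()

# Adding Product to GROUP BY makes the threshold apply per customer-product pair.
per_customer_product = conn.execute("""
    SELECT Customer, Product, SUM(Quantity * UnitPrice) AS Total
    FROM Sales
    WHERE OrderDate BETWEEN '2023-04-01' AND '2023-06-30'
    GROUP BY Customer, Product
    HAVING SUM(Quantity * UnitPrice) > 5000
    ORDER BY Customer, Product
""").fetchall()

print(per_customer)
print(per_customer_product)
conn.close()
```

In this toy data, Ana passes the per-customer threshold (4000 + 1500) but disappears once the grouping includes the product, because neither of her product totals exceeds 5000 on its own.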


<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/EAI_Blue_Dark.png"  style="width:200px";/>
</div>