# SQL Subqueries
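Subqueries are queries nested inside another query. They can often be used in place of `JOIN` statements, and they can also filter on aggregated values or act as a derived table for an outer query.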

```python
import sqlite3
import pandas as pd

# Connect to the lab's SQLite database (adjust the path to wherever data.sqlite lives)
db = sqlite3.connect("data.sqlite")
```

```python
# Query all the employees based in US offices
# Using a JOIN
pd.read_sql("""
SELECT firstName, lastName
FROM employees
JOIN offices
    USING (officeCode)
WHERE country = 'USA';
""", db)
```

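|   | firstName | lastName |
|---|-----------|----------|
| 0 | Anthony | Bow |
| 1 | Jeff | Firrelli |
| 2 | Leslie | Jennings |
| 3 | Diane | Murphy |
| 4 | Mary | Patterson |
| 5 | Leslie | Thompson |
| 6 | Julie | Firrelli |
| 7 | Steve | Patterson |
| 8 | Foon Yue | Tseng |
| 9 | George | Vanauf |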


```python
# Query all the employees based in US offices
# Using a subquery
pd.read_sql("""
SELECT firstName, lastName
FROM employees
WHERE officeCode IN (
    SELECT officeCode
    FROM offices
    WHERE country = 'USA'
);
""", db)
```

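|   | firstName | lastName |
|---|-----------|----------|
| 0 | Diane | Murphy |
| 1 | Mary | Patterson |
| 2 | Jeff | Firrelli |
| 3 | Anthony | Bow |
| 4 | Leslie | Jennings |
| 5 | Leslie | Thompson |
| 6 | Julie | Firrelli |
| 7 | Steve | Patterson |
| 8 | Foon Yue | Tseng |
| 9 | George | Vanauf |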
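
Both queries return the same ten employees, just in a different order: without an `ORDER BY`, SQL makes no promise about row order. A minimal sketch for checking that two queries agree is to sort both results the same way and compare them. It uses a throwaway in-memory database whose tables only mimic `employees` and `offices`; every name and office code in it is made up.

```python
import sqlite3

# Hypothetical toy data that mimics the employees/offices tables above
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE offices (officeCode TEXT PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE employees (employeeNumber INTEGER PRIMARY KEY, firstName TEXT,
                        lastName TEXT, officeCode TEXT);
INSERT INTO offices VALUES ('1', 'San Francisco', 'USA'), ('2', 'Boston', 'USA'),
                           ('4', 'Paris', 'France');
INSERT INTO employees VALUES (1, 'Ann', 'Lee', '2'), (2, 'Bo', 'Kim', '4'),
                             (3, 'Cy', 'Diaz', '1'), (4, 'Di', 'Ng', '1');
""")

# Same ORDER BY on both versions so the row order is deterministic
join_sql = """
SELECT firstName, lastName
FROM employees
JOIN offices USING (officeCode)
WHERE country = 'USA'
ORDER BY lastName, firstName;
"""

subquery_sql = """
SELECT firstName, lastName
FROM employees
WHERE officeCode IN (SELECT officeCode FROM offices WHERE country = 'USA')
ORDER BY lastName, firstName;
"""

join_rows = conn.execute(join_sql).fetchall()
subquery_rows = conn.execute(subquery_sql).fetchall()
print(join_rows == subquery_rows)  # True
print(subquery_rows)  # [('Cy', 'Diaz'), ('Ann', 'Lee'), ('Di', 'Ng')]
```

Against the real database, the same check could be done by adding `ORDER BY lastName, firstName` to both `pd.read_sql` queries and comparing the two DataFrames with `.equals()`.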


### Subqueries for Filtering Based on an Aggregation

```python
# Find all of the employees from offices
# with at least 5 employees
pd.read_sql("""
SELECT lastName, firstName, officeCode
FROM employees
WHERE officeCode IN (
    SELECT officeCode
    FROM offices
    JOIN employees
        USING (officeCode)
    GROUP BY 1  -- 1 = the first selected column, officeCode
    HAVING COUNT(employeeNumber) >= 5
);
""", db)
```

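|   | lastName | firstName | officeCode |
|---|----------|-----------|------------|
| 0 | Murphy | Diane | 1 |
| 1 | Patterson | Mary | 1 |
| 2 | Firrelli | Jeff | 1 |
| 3 | Bondur | Gerard | 4 |
| 4 | Bow | Anthony | 1 |
| 5 | Jennings | Leslie | 1 |
| 6 | Thompson | Leslie | 1 |
| 7 | Bondur | Loui | 4 |
| 8 | Hernandez | Gerard | 4 |
| 9 | Castillo | Pamela | 4 |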
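
A habit that makes nested queries easier to follow: run the inner query on its own first to see which office codes it returns, then wrap it in the outer `WHERE ... IN (...)`. The sketch below does that on made-up head counts. It also drops the `JOIN offices` from the inner query, assuming (as the `USING (officeCode)` joins above suggest) that `employees` already carries `officeCode`, so the count can be grouped on `employees` directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employeeNumber INTEGER PRIMARY KEY, officeCode TEXT)")
# Hypothetical head counts: office '1' has 5 staff, '2' has 1, '4' has 5
staff = [("1",)] * 5 + [("2",)] + [("4",)] * 5
conn.executemany("INSERT INTO employees (officeCode) VALUES (?)", staff)

# Step 1: the inner query alone, to see which offices qualify.
# No JOIN to offices: employees already has officeCode.
inner_sql = """
SELECT officeCode
FROM employees
GROUP BY officeCode
HAVING COUNT(employeeNumber) >= 5
ORDER BY officeCode;
"""
big_offices = [row[0] for row in conn.execute(inner_sql)]
print(big_offices)  # ['1', '4']

# Step 2: use that query as the IN filter of the outer query
outer_sql = """
SELECT COUNT(*)
FROM employees
WHERE officeCode IN (
    SELECT officeCode
    FROM employees
    GROUP BY officeCode
    HAVING COUNT(employeeNumber) >= 5
);
"""
n_in_big_offices = conn.execute(outer_sql).fetchone()[0]
print(n_in_big_offices)  # 10
```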


```python
# Same filter, but also join offices in the outer query
# so each employee's city can be shown
pd.read_sql("""
SELECT lastName, firstName, city
FROM employees
JOIN offices
    USING (officeCode)
WHERE officeCode IN (
    SELECT officeCode
    FROM offices
    JOIN employees
        USING (officeCode)
    GROUP BY officeCode
    HAVING COUNT(employeeNumber) >= 5
);
""", db)
```

|   | lastName | firstName | city |
|---|----------|-----------|------|
| 0 | Murphy | Diane | San Francisco |
| 1 | Patterson | Mary | San Francisco |
| 2 | Firrelli | Jeff | San Francisco |
| 3 | Bondur | Gerard | Paris |
| 4 | Bow | Anthony | San Francisco |
| 5 | Jennings | Leslie | San Francisco |
| 6 | Thompson | Leslie | San Francisco |
| 7 | Bondur | Loui | Paris |
| 8 | Hernandez | Gerard | Paris |
| 9 | Castillo | Pamela | Paris |
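
The query above joins `offices` only to pick up `city` and uses the subquery purely as a filter, so the head count itself never shows up. One alternative, sketched here rather than offered as the lab's solution, is to put the aggregation in the `FROM` clause as a derived table and join to it, the same derived-table idea used in the next cell. The rows are hypothetical and the threshold is lowered to 2 to keep the toy data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE offices (officeCode TEXT PRIMARY KEY, city TEXT);
CREATE TABLE employees (employeeNumber INTEGER PRIMARY KEY, lastName TEXT,
                        officeCode TEXT);
INSERT INTO offices VALUES ('1', 'San Francisco'), ('2', 'Boston');
-- Hypothetical staff: two in San Francisco, one in Boston
INSERT INTO employees (lastName, officeCode)
VALUES ('Murphy', '1'), ('Bow', '1'), ('Tseng', '2');
""")

# Join to a derived table of per-office head counts, so the count can be
# displayed as a column instead of only being used as a filter.
sql = """
SELECT e.lastName, o.city, c.headCount
FROM employees AS e
JOIN offices AS o ON o.officeCode = e.officeCode
JOIN (
    SELECT officeCode, COUNT(*) AS headCount
    FROM employees
    GROUP BY officeCode
    HAVING COUNT(*) >= 2
) AS c ON c.officeCode = e.officeCode
ORDER BY e.lastName;
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('Bow', 'San Francisco', 2), ('Murphy', 'San Francisco', 2)]
```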


```python
# Find the average of individual customers' average payments
pd.read_sql("""
SELECT AVG(customerAvgPayment) AS averagePayment
FROM (
    SELECT AVG(amount) AS customerAvgPayment
    FROM payments
    JOIN customers
        USING (customerNumber)
    GROUP BY customerNumber
);
""", db)
```

|   | averagePayment |
|---|----------------|
| 0 | 31489.754582 |
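
An average of per-customer averages is not the same as the overall average payment: the first weights every customer equally, the second weights every payment equally. A small worked example with made-up payments (the one-table `payments` layout is a simplified stand-in) shows the difference:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (customerNumber INTEGER, amount REAL)")
# Hypothetical payments: customer 1 pays twice, customer 2 pays once
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [(1, 100.0), (1, 300.0), (2, 1000.0)],
)

# Derived table: one average per customer, then the average of those
avg_of_avgs = conn.execute("""
SELECT AVG(customerAvgPayment)
FROM (
    SELECT AVG(amount) AS customerAvgPayment
    FROM payments
    GROUP BY customerNumber
);
""").fetchone()[0]

# Plain average over every payment row
overall_avg = conn.execute("SELECT AVG(amount) FROM payments;").fetchone()[0]

print(avg_of_avgs)  # 600.0
print(overall_avg)  # about 466.67
```

Customer 1 averages (100 + 300) / 2 = 200 and customer 2 averages 1000, so the average of averages is (200 + 1000) / 2 = 600, while the plain average is 1400 / 3, about 466.67.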


Subqueries can also match keys that have different names in different tables. For example, `employeeNumber` in the `employees` table matches `salesRepEmployeeNumber` in the `customers` table, so the query below lists the employees who are the sales rep for at least one US customer.

```python
# Employees who are the sales rep for at least one US customer
pd.read_sql("""
SELECT lastName, firstName, employeeNumber
FROM employees
WHERE employeeNumber IN (
    SELECT salesRepEmployeeNumber
    FROM customers
    WHERE country = 'USA'
);
""", db)
```

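|   | lastName | firstName | employeeNumber |
|---|----------|-----------|----------------|
| 0 | Jennings | Leslie | 1165 |
| 1 | Thompson | Leslie | 1166 |
| 2 | Firrelli | Julie | 1188 |
| 3 | Patterson | Steve | 1216 |
| 4 | Tseng | Foon Yue | 1286 |
| 5 | Vanauf | George | 1323 |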
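
One practical difference between `IN` and a `JOIN` shows up here: a sales rep with several US customers appears once in the `IN` version, but once per matching customer in a `JOIN` version (unless you add `DISTINCT`). A sketch with hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employeeNumber INTEGER PRIMARY KEY, lastName TEXT);
CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY, country TEXT,
                        salesRepEmployeeNumber INTEGER);
INSERT INTO employees VALUES (1, 'Jennings'), (2, 'Bondur');
-- Hypothetical: rep 1 handles two US customers, rep 2 one French customer
INSERT INTO customers VALUES (1, 'USA', 1), (2, 'USA', 1), (3, 'France', 2);
""")

# IN only asks "is this employeeNumber in the list?", so each rep appears once
in_rows = conn.execute("""
SELECT lastName FROM employees
WHERE employeeNumber IN (SELECT salesRepEmployeeNumber
                         FROM customers WHERE country = 'USA');
""").fetchall()

# JOIN produces one row per matching customer, so rep 1 appears twice
join_rows = conn.execute("""
SELECT lastName FROM employees AS e
JOIN customers AS c ON c.salesRepEmployeeNumber = e.employeeNumber
WHERE c.country = 'USA';
""").fetchall()

print(in_rows)    # [('Jennings',)]
print(join_rows)  # [('Jennings',), ('Jennings',)]
```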


This is pure sorcery

# SQL Subqueries Lab

The following query works using a `JOIN`. Rewrite it so that it uses a subquery instead.

```sql
SELECT
    customerNumber,
    contactLastName,
    contactFirstName
FROM customers
JOIN orders 
    USING(customerNumber)
WHERE orderDate = '2003-01-31'
;
```

```python
pd.read_sql("""
SELECT customerNumber, contactLastName, contactFirstName
FROM customers
WHERE customerNumber IN (
    SELECT customerNumber
    FROM orders
    WHERE orderDate = '2003-01-31'
);
""", db)
```

|   | customerNumber | contactLastName | contactFirstName |
|---|----------------|-----------------|------------------|
| 0 | 141 | Freyre | Diego |
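
To rerun this for other dates, one option is to pass the date as a query parameter instead of pasting it into the SQL string. This sketch uses the `?` placeholder style of Python's built-in `sqlite3` module on toy tables; the `customers_who_ordered_on` helper and the second customer are hypothetical. `pd.read_sql` accepts the same values through its `params` argument.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY, contactLastName TEXT,
                        contactFirstName TEXT);
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, customerNumber INTEGER,
                     orderDate TEXT);
INSERT INTO customers VALUES (141, 'Freyre', 'Diego'), (200, 'Doe', 'Jane');
-- Hypothetical orders; dates stored as 'YYYY-MM-DD' text
INSERT INTO orders VALUES (1, 141, '2003-01-31'), (2, 200, '2003-02-01');
""")

sql = """
SELECT customerNumber, contactLastName, contactFirstName
FROM customers
WHERE customerNumber IN (
    SELECT customerNumber
    FROM orders
    WHERE orderDate = ?   -- placeholder filled in by the driver
);
"""

def customers_who_ordered_on(day):
    """Return (customerNumber, lastName, firstName) rows for one order date."""
    return conn.execute(sql, (day,)).fetchall()

print(customers_who_ordered_on("2003-01-31"))  # [(141, 'Freyre', 'Diego')]
print(customers_who_ordered_on("2003-02-01"))  # [(200, 'Doe', 'Jane')]
```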


**Select the Total Number of Orders for Each Product Name**

Sort the results by the total number of items sold for that product.

```python
pd.read_sql("""
SELECT productName, COUNT(*) AS orderCount
FROM products AS p
JOIN orderdetails AS od
    ON p.productCode = od.productCode
GROUP BY p.productName
ORDER BY COUNT(*) DESC;
""", db)
```

|   | productName | orderCount |
|---|-------------|------------|
| 0 | 1992 Ferrari 360 Spider red | 53 |
| 1 | P-51-D Mustang | 28 |
| 2 | HMS Bounty | 28 |
| 3 | F/A 18 Hornet 1/72 | 28 |
| 4 | Diamond T620 Semi-Skirted Tanker | 28 |
| ... | ... | ... |
| 104 | 1932 Alfa Romeo 8C2300 Spider Sport | 25 |
| 105 | 1917 Grand Touring Sedan | 25 |
| 106 | 1911 Ford Town Car | 25 |
| 107 | 1957 Ford Thunderbird | 24 |
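
The prompt asks to sort by the total number of *items sold*, but `COUNT(*)` counts order lines, that is, how many orders include the product. To sort by units, you would sum the quantity on each line. The sketch below assumes the quantity column is called `quantityOrdered` (it does not appear in the query above) and uses made-up rows where the two orderings disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (productCode TEXT PRIMARY KEY, productName TEXT);
-- quantityOrdered is an assumed column name for the units on each order line
CREATE TABLE orderdetails (orderNumber INTEGER, productCode TEXT,
                           quantityOrdered INTEGER);
INSERT INTO products VALUES ('S1', 'Model A'), ('S2', 'Model B');
-- Hypothetical lines: Model A is on 3 small orders, Model B on 1 big order
INSERT INTO orderdetails VALUES
    (10, 'S1', 2), (11, 'S1', 3), (12, 'S1', 1),
    (10, 'S2', 50);
""")

# Report both measures, but sort by units sold rather than by order count
sql = """
SELECT p.productName,
       COUNT(*)                AS orderCount,
       SUM(od.quantityOrdered) AS unitsSold
FROM products AS p
JOIN orderdetails AS od
    ON p.productCode = od.productCode
GROUP BY p.productName
ORDER BY unitsSold DESC;
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('Model B', 1, 50), ('Model A', 3, 6)]
```

Sorting the same rows by `orderCount` would put Model A first, which is why the choice of `COUNT` versus `SUM` matters for this prompt.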


Yeah. This is tough for me