# Subqueries and joins in SQL

## Using subqueries
* What are subqueries?
    * Queries embedded into other queries
    * Relational databases store data in multiple tables
    * Subqueries merge data from multiple sources together
    * Helps with adding other filtering criteria

* Problem setup: subqueries to filter
    * Need to know the region of each customer who has placed an order with freight over 100
        1. Retrieve all customer IDs for orders with freight over 100
        2. Retrieve customer information
        3. Combine the two queries

```sql
SELECT
CustomerID
,CompanyName
,Region
FROM Customers
WHERE CustomerID IN (SELECT CustomerID
    FROM Orders
    WHERE Freight > 100);
```
* Working with subquery statements
    * The DBMS always evaluates the innermost SELECT first
    * The DBMS is performing two operations:
        1. Getting the customer IDs for orders with freight over 100
        2. Passing that list to the WHERE clause and processing the outer SELECT statement
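
The two operations above can be run one at a time to see what the DBMS does. This is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the sample rows (`C1`, `Company A`, and so on) are made up for illustration, and only the table and column names come from the notes.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the notes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT, Region TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT, Freight REAL);
INSERT INTO Customers VALUES
    ('C1', 'Company A', 'North'),
    ('C2', 'Company B', 'South'),
    ('C3', 'Company C', 'West');
INSERT INTO Orders VALUES
    (1, 'C1', 29.5),
    (2, 'C3', 140.0),
    (3, 'C3', 210.2),
    (4, 'C2', 101.0);
""")

# Step 1: the inner query on its own returns the matching customer IDs.
inner_ids = [row[0] for row in conn.execute(
    "SELECT DISTINCT CustomerID FROM Orders WHERE Freight > 100 ORDER BY CustomerID"
)]

# Step 2: the full statement filters Customers by that list of IDs.
combined = conn.execute("""
    SELECT CustomerID, CompanyName, Region
    FROM Customers
    WHERE CustomerID IN (SELECT CustomerID FROM Orders WHERE Freight > 100)
    ORDER BY CustomerID
""").fetchall()

print(inner_ids)
print(combined)
```

Customer `C3` has two qualifying orders but appears only once in the final result, because `IN` only tests membership in the list.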

## Subquery best practices and considerations

* Best practices with subqueries
    * There is no limit to the number of subqueries you can have
    * Performance slows when you nest too deeply
    * A subquery used with IN or a comparison operator can only retrieve a single column

* [PoorSQL](http://www.poorsql.com) website
    * Formats SQL code for you
    * Uses proper indenting
    * Code is easier to read and troubleshoot

* The power of subqueries
    * Subqueries are powerful tools
    * Not always the best option due to performance
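
The single-column restriction listed under best practices is easy to see in practice. The sketch below uses Python's built-in `sqlite3` module with made-up sample rows; the exact error text differs between database systems, so it is captured rather than assumed.

```python
import sqlite3

# Hypothetical sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT, Freight REAL);
INSERT INTO Customers VALUES ('C1', 'Company A'), ('C2', 'Company B');
INSERT INTO Orders VALUES (1, 'C1', 20.0), (2, 'C2', 150.0);
""")

# Two columns inside an IN subquery: one CustomerID cannot be compared to a pair.
bad_sql = """
    SELECT CompanyName FROM Customers
    WHERE CustomerID IN (SELECT CustomerID, Freight FROM Orders WHERE Freight > 100)
"""
try:
    conn.execute(bad_sql).fetchall()
    error = None
except sqlite3.Error as exc:
    error = str(exc)

# The same filter with a single-column subquery works.
good = conn.execute("""
    SELECT CompanyName FROM Customers
    WHERE CustomerID IN (SELECT CustomerID FROM Orders WHERE Freight > 100)
""").fetchall()

print("two-column subquery failed:", error)
print("single-column subquery:", good)
```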

## Joining tables: an introduction

* Benefits of breaking data into tables
    * Efficient storage (avoids duplicate information)
    * Easier manipulation
    * Greater scalability
    * Logically models a process

* Tables are related through common values (keys)

* Joins
    * Associate correct records from each table on the fly
    * Allows data retrieval from multiple tables in one query
    * **Joins are not physical** - they persist only for the duration of the query execution

## Cartesian (cross) joins

* What is a cross join?
    * Each row from the first table joins with all the rows of another table

* They aren't frequently used, but may be helpful in specific cases

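* It doesn't match on anything; it simply pairs every row of one table with every row of the other

* It is computationally taxing because the result has (rows in table 1) x (rows in table 2) rows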

```sql
SELECT product_name
,unit_price
,company_name
FROM suppliers CROSS JOIN products;
```
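
To make the cost of a cross join concrete, the sketch below runs the query above against two tiny tables using Python's built-in `sqlite3` module. The rows are invented for illustration; the point is only that 2 suppliers and 3 products produce 2 x 3 = 6 result rows.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY, company_name TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT, unit_price REAL);
INSERT INTO suppliers VALUES (1, 'Supplier A'), (2, 'Supplier B');
INSERT INTO products VALUES (1, 'Tea', 4.5), (2, 'Coffee', 7.0), (3, 'Cocoa', 5.25);
""")

rows = conn.execute("""
    SELECT product_name, unit_price, company_name
    FROM suppliers CROSS JOIN products
""").fetchall()

n_suppliers = conn.execute("SELECT COUNT(*) FROM suppliers").fetchone()[0]
n_products = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]

# Every supplier is paired with every product, whether or not they are related.
print(len(rows), "rows =", n_suppliers, "x", n_products)
```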

## Inner joins

* What is an inner join?
    * The INNER JOIN keyword selects records that have matching values in both tables (intersection)

* Inner join syntax
    * Join type is specified (INNER JOIN)
    * Join condition is in the FROM clause and uses the ON clause
    * You can join more than two tables; there is no fixed limit
    * Each additional join adds work, so joining more tables affects overall database performance
    * List all the tables, then define conditions

```sql
SELECT Suppliers.CompanyName
,ProductName
,UnitPrice
FROM Suppliers INNER JOIN Products ON
Suppliers.SupplierID = Products.SupplierID;
```
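
A small runnable sketch of the "intersection" behavior, using Python's built-in `sqlite3` module. The rows are made up: `Supplier C` has no products and the product `Unassigned` has no supplier, so neither appears in the inner join result.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Products (
    ProductID INTEGER PRIMARY KEY,
    ProductName TEXT,
    UnitPrice REAL,
    SupplierID INTEGER
);
INSERT INTO Suppliers VALUES (1, 'Supplier A'), (2, 'Supplier B'), (3, 'Supplier C');
INSERT INTO Products VALUES
    (10, 'Tea', 4.5, 1),
    (11, 'Coffee', 7.0, 1),
    (12, 'Cocoa', 5.25, 2),
    (13, 'Unassigned', 1.0, NULL);
""")

# Only rows with a matching SupplierID on both sides survive the join.
rows = conn.execute("""
    SELECT Suppliers.CompanyName, Products.ProductName, Products.UnitPrice
    FROM Suppliers
    INNER JOIN Products ON Suppliers.SupplierID = Products.SupplierID
    ORDER BY Products.ProductID
""").fetchall()

for row in rows:
    print(row)
```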

```sql
-- Multiple tables
SELECT o.OrderID, c.CompanyName, e.LastName
FROM ((Orders o INNER JOIN Customers c ON
o.CustomerID = c.CustomerID)
INNER JOIN Employees e ON o.EmployeeID = e.EmployeeID);
```

* Best practices with inner joins
    * Pre-qualify column names with their table name (for example, Suppliers.CompanyName), especially when a column exists in both tables
    * Do not make unnecessary joins
    * Think about the type of join you are making
    * How are you connecting records?
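
Pre-qualifying names matters most when both tables share a column name. The sketch below, using Python's built-in `sqlite3` module and made-up rows, selects the shared `SupplierID` column first without and then with a table prefix. The error wording is specific to each DBMS, so it is captured rather than assumed.

```python
import sqlite3

# Hypothetical sample data; SupplierID exists in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT, SupplierID INTEGER);
INSERT INTO Suppliers VALUES (1, 'Supplier A');
INSERT INTO Products VALUES (10, 'Tea', 1);
""")

join_clause = """
    FROM Suppliers
    INNER JOIN Products ON Suppliers.SupplierID = Products.SupplierID
"""

# Unqualified: the DBMS cannot tell which table's SupplierID is meant.
try:
    conn.execute("SELECT SupplierID, ProductName" + join_clause).fetchall()
    error = None
except sqlite3.Error as exc:
    error = str(exc)

# Qualified: the table prefix removes the ambiguity.
rows = conn.execute(
    "SELECT Suppliers.SupplierID, Products.ProductName" + join_clause
).fetchall()

print("unqualified query failed:", error)
print("qualified query:", rows)
```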

## Aliases and Self Joins

* What is an alias?
    * SQL aliases give a table or a column a temporary name
    * Make column names more readable
    * An alias only exists for the duration of the query

```sql
SELECT column_name
FROM table_name AS alias_name;

-- Example
SELECT v.vendor_name
,p.product_name
,p.product_price
FROM Vendors AS v, Products AS p
WHERE v.vendor_id = p.vendor_id;
```
* Self joins
    * Join the original table to itself
    * Take the table and treat it like two separate tables by giving it two aliases
    * Example: match customers from the same city

```sql
SELECT column_name(s)
FROM table1 T1, table1 T2
WHERE condition;

-- Example: pairs of different customers in the same city
SELECT A.CustomerName AS CustomerName1,
B.CustomerName AS CustomerName2, A.City
FROM Customers A, Customers B
WHERE A.CustomerID <> B.CustomerID
AND A.City = B.City
ORDER BY A.City;
```
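
A runnable sketch of a self join that pairs different customers in the same city, using Python's built-in `sqlite3` module with invented rows. The ID comparison uses `<>` so that a customer is never paired with itself; as a side effect each pair shows up twice, once in each order.

```python
import sqlite3

# Hypothetical sample data: two customers in Berlin, one in London.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, City TEXT);
INSERT INTO Customers VALUES
    (1, 'Ann', 'Berlin'),
    (2, 'Bob', 'Berlin'),
    (3, 'Cy', 'London');
""")

# The same table appears twice under the aliases A and B.
pairs = conn.execute("""
    SELECT A.CustomerName AS CustomerName1,
           B.CustomerName AS CustomerName2,
           A.City
    FROM Customers A, Customers B
    WHERE A.CustomerID <> B.CustomerID
      AND A.City = B.City
    ORDER BY A.City, A.CustomerID
""").fetchall()

for pair in pairs:
    print(pair)
```

Replacing `<>` with `<` would list each pair only once.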

## Advanced Joins: Left, Right and Full Outer Joins

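* SQLite vs. other SQL DBMS
    * SQLite versions before 3.39.0 only support LEFT joins among the outer joins; RIGHT and FULL OUTER JOIN were added in 3.39.0
    * Most other database management systems support all of these join types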

* Left join
    * Returns all records from the left table (table1), and the matched records from the right table (table2)
    * Columns from the right table are NULL when there is no match

```sql
SELECT C.CustomerName, O.OrderID
FROM Customers C 
LEFT JOIN Orders O ON C.CustomerID = O.CustomerID
ORDER BY C.CustomerName;
```
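
The sketch below runs a left join on made-up rows with Python's built-in `sqlite3` module. The customer `Bob` has no orders, so his row is kept and the `OrderID` column comes back as NULL (`None` in Python).

```python
import sqlite3

# Hypothetical sample data: Ann has two orders, Bob has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Orders VALUES (100, 1), (101, 1);
""")

rows = conn.execute("""
    SELECT C.CustomerName, O.OrderID
    FROM Customers C
    LEFT JOIN Orders O ON C.CustomerID = O.CustomerID
    ORDER BY C.CustomerName, O.OrderID
""").fetchall()

for row in rows:
    print(row)
```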

* Right join
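    * Returns all records from the right table (table2), and the matched records from the left table (table1)
    * The only difference between a right join and a left join is the order of the tables
    * Left joins can be turned into right joins by reversing the order of the tables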

```sql
SELECT Orders.OrderID,
Employees.LastName,
Employees.FirstName
FROM Orders
RIGHT JOIN Employees ON
Orders.EmployeeID = Employees.EmployeeID
ORDER BY Orders.OrderID;
```
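
Because a right join is a left join with the tables swapped, the same result can be produced on a DBMS that lacks `RIGHT JOIN`. The sketch below is one way to do that with Python's built-in `sqlite3` module and invented rows: `Employees` moves to the left side of a `LEFT JOIN`, so every employee is kept even without orders.

```python
import sqlite3

# Hypothetical sample data: the second employee has no orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, EmployeeID INTEGER);
INSERT INTO Employees VALUES (1, 'Lee', 'Ana'), (2, 'Kim', 'Raj');
INSERT INTO Orders VALUES (100, 1);
""")

# Equivalent to: FROM Orders RIGHT JOIN Employees ON ...
rows = conn.execute("""
    SELECT Orders.OrderID, Employees.LastName, Employees.FirstName
    FROM Employees
    LEFT JOIN Orders ON Orders.EmployeeID = Employees.EmployeeID
    ORDER BY Employees.EmployeeID
""").fetchall()

for row in rows:
    print(row)
```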

* Full outer join
    * Returns all records from both tables: matching rows are combined, and rows without a match on either side are kept with NULLs for the other table's columns

```sql
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
FULL OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;
```
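
Where `FULL OUTER JOIN` is not available, a common workaround is to combine a left join with the unmatched rows from the other table. The sketch below shows that workaround (not the `FULL OUTER JOIN` keyword itself) using Python's built-in `sqlite3` module and made-up rows: one customer has no orders and one order points at a customer ID that does not exist.

```python
import sqlite3

# Hypothetical sample data: Bob has no orders; order 101 has no matching customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Orders VALUES (100, 1), (101, 9);
""")

rows = conn.execute("""
    -- All customers, with their orders where they exist
    SELECT c.CustomerName, o.OrderID
    FROM Customers c
    LEFT JOIN Orders o ON c.CustomerID = o.CustomerID
    UNION ALL
    -- Plus the orders that matched no customer
    SELECT NULL, o.OrderID
    FROM Orders o
    LEFT JOIN Customers c ON c.CustomerID = o.CustomerID
    WHERE c.CustomerID IS NULL
""").fetchall()

for row in rows:
    print(row)
```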

## Unions

* The UNION operator is used to combine the result-set of two or more SELECT statements
* Each SELECT statement within UNION must have the same number of columns
* Columns must have similar data types
* The columns in each SELECT statement must be in the same order

```sql
-- Basic union setup
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;

-- Example: Which German cities have customers or suppliers?
SELECT City, Country FROM Customers
WHERE Country = 'Germany'
UNION
SELECT City, Country FROM Suppliers
WHERE Country = 'Germany'
ORDER BY City;
```
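
`UNION` removes duplicate rows from the combined result, while `UNION ALL` keeps them. The sketch below runs both on made-up rows with Python's built-in `sqlite3` module; Berlin appears twice in `Customers` and once in `Suppliers`.

```python
import sqlite3

# Hypothetical sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, City TEXT, Country TEXT);
CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY, City TEXT, Country TEXT);
INSERT INTO Customers VALUES
    (1, 'Berlin', 'Germany'),
    (2, 'Berlin', 'Germany'),
    (3, 'Munich', 'Germany'),
    (4, 'Paris', 'France');
INSERT INTO Suppliers VALUES
    (1, 'Berlin', 'Germany'),
    (2, 'Frankfurt', 'Germany');
""")

query = """
    SELECT City, Country FROM Customers WHERE Country = 'Germany'
    {op}
    SELECT City, Country FROM Suppliers WHERE Country = 'Germany'
    ORDER BY City
"""

distinct_rows = conn.execute(query.format(op="UNION")).fetchall()
all_rows = conn.execute(query.format(op="UNION ALL")).fetchall()

print(distinct_rows)  # each city once
print(len(all_rows))  # every matching row from both tables
```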

## Best practices using joins

* It is easy to get results - you must make sure they are the right results
* Start small: one table at a time
* Check the number of records each time you make a new join
* Does the count seem logical given the kind of join you are performing?
* Check for duplicates
* Are you getting the results you expected?

* "Slowly Do":
    * Think first about what your query is trying to do
    * Map how you are joining the tables
    * Thinking first will save time and frustration later

* Joins and Database performance
    * The more tables you join, the slower the query tends to run
    * Be strategic: take only the data you need
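
The record-count and duplicate checks above can be done with a few `COUNT(*)` queries. The sketch below is one possible routine, using Python's built-in `sqlite3` module and invented rows: it compares the row count before and after a join and lists the customers that appear more than once in the joined result.

```python
import sqlite3

# Hypothetical sample data: C1 has two orders, C2 has one, C3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT);
INSERT INTO Customers VALUES ('C1', 'Company A'), ('C2', 'Company B'), ('C3', 'Company C');
INSERT INTO Orders VALUES (1, 'C1'), (2, 'C1'), (3, 'C2');
""")

def count(sql):
    """Return the single number produced by a COUNT(*) query."""
    return conn.execute(sql).fetchone()[0]

join = "FROM Customers c {kind} JOIN Orders o ON c.CustomerID = o.CustomerID"

n_customers = count("SELECT COUNT(*) FROM Customers")
n_inner = count("SELECT COUNT(*) " + join.format(kind="INNER"))
n_left = count("SELECT COUNT(*) " + join.format(kind="LEFT"))

# Customers that appear more than once after the join (one row per order).
repeated = conn.execute(
    "SELECT c.CustomerID, COUNT(*) " + join.format(kind="LEFT")
    + " GROUP BY c.CustomerID HAVING COUNT(*) > 1"
).fetchall()

print(n_customers, n_inner, n_left)
print(repeated)
```

A left join can never return fewer rows than the left table, so `n_left` being at least `n_customers` is a quick sanity check; the repeated customers explain why it is larger.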

## Suggested readings

* [SQL and Python](https://mode.com/blog/learning-python-sql/)
* [Union and Union all](https://blog.sqlauthority.com/2009/03/11/sql-server-difference-between-union-vs-union-all-optimal-performance-comparison/)