# Assignment 4




## Queries combining multiple tables

Read Chapters 7-11.

In this assignment, you will work on the "Problems for You to Solve" sections from Chapters 8, 9, and 11 of *SQL Queries for Mere Mortals*.

We will use DataJoint to solve these problems and then review the same solutions in SQL.




# Queries with DataJoint

## Connect to the database server


In [1]:
import datajoint as dj

In [2]:
# Execute this only if you have not connected with DataJoint before.
import json

# cred.json is expected to hold the keys "host", "user", and "password"
with open('cred.json') as f:
    creds = json.load(f)

dj.config['database.host'] = creds['host']
dj.config['database.user'] = creds['user']
dj.config['database.password'] = creds['password']

# Save the settings to dj_local_conf.json in the working directory
dj.config.save_local()

# Queries with SQL

First, let's connect to the database and activate the SQL magic for Jupyter.

### Increasing complexity of queries

When working with queries that address multiple tables, rely on the structure of the foreign keys and primary keys to guide the design of your queries. 

We will work with two basic patterns to solve most problems in the book:

1. Using a subquery in the `WHERE` clause
2. Using an inner natural join

The database schemas in the book are designed so that foreign key columns have the same names as the primary key columns of their parent tables. Because a natural join matches rows on all identically named columns, it joins tables related by a foreign key in a logically meaningful way.
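The two patterns can be tried without a database server. The following is a minimal sketch that uses Python's built-in `sqlite3` module and two toy tables; the rows and product names are made up for illustration and are not taken from `shared_sales`. It suggests that, for a foreign key that shares its name with the parent's primary key, the subquery and the natural join select the same products.

```python
import sqlite3

# Toy tables with hypothetical data; CategoryID is the shared key name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (
        CategoryID INTEGER PRIMARY KEY,
        CategoryDescription TEXT NOT NULL
    );
    CREATE TABLE products (
        ProductNumber INTEGER PRIMARY KEY,
        ProductName TEXT NOT NULL,
        CategoryID INTEGER NOT NULL REFERENCES categories (CategoryID)
    );
    INSERT INTO categories VALUES (1, 'Bikes'), (2, 'Helmets');
    INSERT INTO products VALUES
        (10, 'Road bike', 1), (11, 'Trail bike', 1), (12, 'Kids helmet', 2);
""")

# Pattern 1: subquery in the WHERE clause
subquery_rows = conn.execute("""
    SELECT ProductName
    FROM products
    WHERE CategoryID IN (
        SELECT CategoryID
        FROM categories
        WHERE CategoryDescription = 'Bikes')
    ORDER BY ProductNumber
""").fetchall()

# Pattern 2: inner natural join, matched on the shared CategoryID column
join_rows = conn.execute("""
    SELECT ProductName
    FROM products NATURAL JOIN categories
    WHERE CategoryDescription = 'Bikes'
    ORDER BY ProductNumber
""").fetchall()

print(subquery_rows)
print(join_rows)
```

Both queries return the two bike products here. The sketch uses single-quoted string literals, which are standard SQL and work in MySQL as well.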

In [None]:
# Prepare the connection for the SQL magic

import json
import pymysql

# Let pymysql stand in for the MySQLdb driver expected by "mysql://" URLs
pymysql.install_as_MySQLdb()

with open('cred.json') as f:
    creds = json.load(f)

connection_string = "mysql://{user}:{password}@{host}".format(**creds)
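A connection URL built with plain string formatting can be misparsed when the password contains reserved characters such as `@` or `/`. One possible safeguard, sketched below, is to percent-encode the user name and password with `urllib.parse.quote_plus` before formatting. The credentials shown are hypothetical placeholders standing in for the contents of `cred.json`.

```python
from urllib.parse import quote_plus

# Hypothetical credentials standing in for the contents of cred.json
creds = {"user": "student", "password": "p@ss/word", "host": "db.example.org"}

# Percent-encode user and password so reserved characters survive URL parsing
safe = {
    "user": quote_plus(creds["user"]),
    "password": quote_plus(creds["password"]),
    "host": creds["host"],
}

connection_string = "mysql://{user}:{password}@{host}".format(**safe)
print(connection_string)  # mysql://student:p%40ss%2Fword@db.example.org
```

If the password contains only letters and digits, the encoded string is identical to the unencoded one, so this step is harmless in the common case.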

In [None]:
%load_ext sql
%config SqlMagic.autocommit=True
%sql $connection_string

Let's solve the following query in the `shared_sales` database:
> Show all the customers who have bought a bicycle, i.e., a product in the "Bikes" category.

Review the schema diagram to follow how we build the query step-by-step.

In [None]:
%%sql

USE shared_sales

In [None]:
%%sql

SHOW TABLES


In [None]:
%%sql

-- Step 1
-- The category for bikes

SELECT * 
    FROM categories 
    WHERE CategoryDescription = "Bikes"

In [None]:
%%sql

-- Step 2
-- Products from that category, using a subquery

SELECT * 
FROM products
WHERE CategoryID IN (
    SELECT CategoryID 
    FROM categories
    WHERE CategoryDescription = "Bikes"
)

In [None]:
%%sql

-- Step 2 (alternative)
-- Products from that category, using a natural join

SELECT * 
    FROM products NATURAL JOIN categories
    WHERE CategoryDescription = "Bikes"

In [None]:
%%sql

-- Step 3
-- Order details that contain such products, using a subquery

SELECT * 
FROM order_details
WHERE ProductNumber IN (
    SELECT ProductNumber 
    FROM products NATURAL JOIN categories
    WHERE CategoryDescription = "Bikes")

In [None]:
%%sql

-- Step 3 (alternative)
-- Order details that contain such products, using natural joins

SELECT * 
FROM order_details NATURAL JOIN products NATURAL JOIN categories
WHERE CategoryDescription = "Bikes"

In [None]:
%%sql

-- Step 4
-- Orders that contain such products 

SELECT * 
FROM orders 
WHERE OrderNumber IN (
    SELECT OrderNumber
    FROM order_details NATURAL JOIN products NATURAL JOIN categories
    WHERE CategoryDescription = "Bikes")

Why can we not convert this into a pure join query?
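One way to explore this question is to run both forms on a small example. The sketch below uses Python's `sqlite3` module with hypothetical toy rows in which order 100 contains two different bikes. It illustrates one possible reason, not necessarily the only one: joining `orders` to `order_details` yields one row per matching order line, so the result no longer contains each order exactly once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (CategoryID INTEGER PRIMARY KEY, CategoryDescription TEXT);
    CREATE TABLE products (ProductNumber INTEGER PRIMARY KEY, CategoryID INTEGER);
    CREATE TABLE orders (OrderNumber INTEGER PRIMARY KEY, CustomerID INTEGER);
    CREATE TABLE order_details (
        OrderNumber INTEGER,
        ProductNumber INTEGER,
        PRIMARY KEY (OrderNumber, ProductNumber));
    INSERT INTO categories VALUES (1, 'Bikes'), (2, 'Helmets');
    INSERT INTO products VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO orders VALUES (100, 7), (101, 8);
    -- Order 100 contains two different bikes; order 101 contains only a helmet.
    INSERT INTO order_details VALUES (100, 10), (100, 11), (101, 12);
""")

# Subquery form: each qualifying order appears once
in_rows = conn.execute("""
    SELECT OrderNumber
    FROM orders
    WHERE OrderNumber IN (
        SELECT OrderNumber
        FROM order_details NATURAL JOIN products NATURAL JOIN categories
        WHERE CategoryDescription = 'Bikes')
""").fetchall()

# Pure join form: one row per matching order line, so order 100 repeats
join_rows = conn.execute("""
    SELECT OrderNumber
    FROM orders NATURAL JOIN order_details NATURAL JOIN products NATURAL JOIN categories
    WHERE CategoryDescription = 'Bikes'
""").fetchall()

print(in_rows)
print(join_rows)
```

In this toy data the subquery form returns order 100 once, while the join form returns it twice, once for each bike on the order.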

In [None]:
%%sql

-- Step 5
-- Customers on these orders

SELECT * 
FROM customers
WHERE CustomerID IN (
    SELECT CustomerID
    FROM orders 
    WHERE OrderNumber IN (
        SELECT OrderNumber
        FROM order_details NATURAL JOIN products NATURAL JOIN categories
        WHERE CategoryDescription = "Bikes"))

These two types of queries, equi-joins (including natural joins) and subqueries in the `WHERE` clause, can be used to solve the majority of problems in the assignment.
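To practice the combined pattern offline, the complete Step 5 query can be run against a small in-memory database. This is a minimal sketch using Python's `sqlite3` module; the table layouts are reduced to the key columns, and all rows and customer names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (CategoryID INTEGER PRIMARY KEY, CategoryDescription TEXT);
    CREATE TABLE products (ProductNumber INTEGER PRIMARY KEY, CategoryID INTEGER);
    CREATE TABLE customers (CustomerID INTEGER PRIMARY KEY, CustLastName TEXT);
    CREATE TABLE orders (OrderNumber INTEGER PRIMARY KEY, CustomerID INTEGER);
    CREATE TABLE order_details (
        OrderNumber INTEGER,
        ProductNumber INTEGER,
        PRIMARY KEY (OrderNumber, ProductNumber));
    INSERT INTO categories VALUES (1, 'Bikes'), (2, 'Helmets');
    INSERT INTO products VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO customers VALUES (7, 'Rivera'), (8, 'Chen'), (9, 'Okafor');
    -- Customer 7 placed two bike orders, customer 8 bought only a helmet,
    -- and customer 9 placed no orders.
    INSERT INTO orders VALUES (100, 7), (101, 8), (102, 7);
    INSERT INTO order_details VALUES (100, 10), (101, 12), (102, 11);
""")

# Nested subqueries combined with a natural join, as in Step 5
bike_customers = conn.execute("""
    SELECT CustomerID, CustLastName
    FROM customers
    WHERE CustomerID IN (
        SELECT CustomerID
        FROM orders
        WHERE OrderNumber IN (
            SELECT OrderNumber
            FROM order_details NATURAL JOIN products NATURAL JOIN categories
            WHERE CategoryDescription = 'Bikes'))
    ORDER BY CustomerID
""").fetchall()

print(bike_customers)  # [(7, 'Rivera')]
```

Customer 7 appears only once even though two of their orders contain bikes, because the `IN` subqueries restrict the rows of `customers` rather than multiplying them.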