# Assignment 4




## Queries combining multiple tables

Read Chapters 7-11.

In this assignment, you will work on "Problems for you to Solve" from Chapters 8, 9, and 11 in *SQL Queries for Mere Mortals*. 

We will use DataJoint to solve these problems and then review the same solutions in SQL.




# Queries with DataJoint

## Connect to the database server


In [None]:
import datajoint as dj

In [None]:
# Run this cell only if you have not yet saved DataJoint connection settings.
import json
with open('cred.json') as f:
    creds = json.load(f)

dj.config['database.host'] = creds['host']
dj.config['database.user'] = creds['user']
dj.config['database.password'] = creds['password']

dj.config.save_local()
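
The cell above reads three keys from `cred.json`: `host`, `user`, and `password`. As a minimal sketch of what such a file could look like, the snippet below writes a template with placeholder values. The helper name `write_cred_template` and every value are assumptions for illustration; substitute the credentials you were given.

```python
import json
import tempfile
from pathlib import Path

def write_cred_template(path):
    """Write a credentials file with the three keys the notebook reads (hypothetical helper)."""
    creds = {
        "host": "db.example.org",     # placeholder: database server address
        "user": "your_username",      # placeholder
        "password": "your_password",  # placeholder
    }
    Path(path).write_text(json.dumps(creds, indent=2))
    return creds

# Demonstrate in a temporary directory so that no real cred.json is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "cred.json"
    write_cred_template(target)
    loaded = json.loads(target.read_text())
```

To use it for real, call `write_cred_template('cred.json')` in the notebook's working directory and edit the values.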

In [None]:
dj.list_schemas()

In [None]:
schema = dj.schema('shared_dj_sales')
schema.spawn_missing_classes()

In [None]:
dj.Diagram(schema)

If the diagram does not render, follow the DataJoint documentation to install `pydotplus`: https://docs.datajoint.io/python/setup/02-DataJoint-Python-Windows-Install-Guide.html

## Simple Query

In [None]:
Customers()

## Fetching

In [None]:
arr = Customers().fetch()  # retrieve as a numpy recarray

In [None]:
dicts = Customers().fetch(as_dict=True)  # retrieve as a list of dictionaries

In [None]:
df = Customers().fetch(format='frame')  # retrieve as a pandas dataframe

## Restriction

In [None]:
Customers() & {'cust_state': 'WA'}  # restrict by a dict

In [None]:
Customers() & 'cust_state ="WA"'   # restrict by string

In [None]:
Customers() & {'customer_id': 1001}

In [None]:
keys = Customers.fetch('KEY')

In [None]:
Customers & keys[2]

In [None]:
# Give me all the products that cost more than $1000
Products() & 'retail_price > 1000'

## Negative restriction
Customers who are not from WA

In [None]:
Customers() - {'cust_state': 'WA'}
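
For comparison with SQL, restriction (`&`) plays the role of a `WHERE` clause and negative restriction (`-`) plays the role of its negation. The following is a minimal sketch using Python's built-in `sqlite3` module on a made-up four-row `customers` table; the table name and the rows are assumptions for illustration, not the contents of `shared_dj_sales`.

```python
import sqlite3

# Toy in-memory table (hypothetical data, not the course database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, cust_state TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "WA"), (2, "TX"), (3, "WA"), (4, "OR")],
)

# Counterpart of: Customers() & {'cust_state': 'WA'}
in_wa = conn.execute(
    "SELECT customer_id FROM customers WHERE cust_state = 'WA' ORDER BY customer_id"
).fetchall()

# Counterpart of: Customers() - {'cust_state': 'WA'}
not_wa = conn.execute(
    "SELECT customer_id FROM customers WHERE NOT (cust_state = 'WA') ORDER BY customer_id"
).fetchall()
```

Every customer lands in exactly one of the two results here, which is a quick sanity check when writing a restriction and its negation.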

## Projection 

In [None]:
# always includes primary key
Customers.proj()

In [None]:
q = Customers.proj('cust_last_name', 'cust_first_name')

In [None]:
q

In [None]:
Products()

In [None]:
q = Products().proj('product_name', 'retail_price', 
                    stock_value='retail_price * quantity_on_hand')
q 

In [None]:
# give me all products whose stock value is over 5000
(q & 'stock_value > 5000').proj('stock_value')
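
The projection above first computes `stock_value` and then restricts on it. In standard SQL, a `WHERE` clause cannot refer to an alias defined in the same `SELECT` list, so one way to express the same idea is to compute the column in a subquery. The sketch below uses Python's built-in `sqlite3` module with a toy `products` table; the primary key name `product_number` and all rows are assumptions for illustration.

```python
import sqlite3

# Toy in-memory table (hypothetical rows; product_number is an assumed key name).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_number INTEGER PRIMARY KEY,
        product_name TEXT,
        retail_price INTEGER,
        quantity_on_hand INTEGER
    );
    INSERT INTO products VALUES
        (1, 'Bike', 1200, 5),
        (2, 'Helmet', 75, 20),
        (3, 'Rack', 180, 30);
""")

# Inner query: the projection with the computed attribute.
# Outer query: the restriction on that computed attribute.
rows = conn.execute("""
    SELECT product_number, stock_value
    FROM (
        SELECT product_number,
               retail_price * quantity_on_hand AS stock_value
        FROM products
    ) AS q
    WHERE stock_value > 5000
    ORDER BY product_number
""").fetchall()
```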

## Restricting with another query 


In [None]:
dj.Diagram(schema)

In [None]:
# All customers who have made an order

Customers & Orders

In [None]:
# All customers who have not made an order

Customers - Orders

In [None]:
(Customers & Orders).make_sql()
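
Restricting one table by another keeps the rows that have a match in the other table without adding columns or duplicating rows. In SQL this is commonly written with `IN` and `NOT IN` subqueries. The sketch below illustrates the idea with Python's built-in `sqlite3` module on toy tables; the rows are made up, `order_number` is an assumed key name, and the SQL that `make_sql()` prints may be phrased differently.

```python
import sqlite3

# Toy in-memory tables (hypothetical data, not the course database).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (order_number INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2), (3);
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

def ids(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Counterpart of: Customers & Orders
with_orders = ids(
    "SELECT customer_id FROM customers "
    "WHERE customer_id IN (SELECT customer_id FROM orders) ORDER BY customer_id"
)

# Counterpart of: Customers - Orders
without_orders = ids(
    "SELECT customer_id FROM customers "
    "WHERE customer_id NOT IN (SELECT customer_id FROM orders) ORDER BY customer_id"
)

# A join, by contrast, returns one row per matching order.
joined = ids("SELECT customer_id FROM customers NATURAL JOIN orders ORDER BY customer_id")
```

Customer 1 placed two orders, so it appears twice in `joined` but only once in `with_orders`. That difference is the main reason to restrict rather than join when the question is "which customers".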

In [None]:
# Give me all orders for items whose price exceeds $1000
expensive_products = Products & 'retail_price > 1000'
big_orders = Orders & (OrderDetails & expensive_products)
big_orders

In [None]:
big_orders.make_sql()

In [None]:
# Show all products from the category "Bikes"
bikes = Categories() & 'category_description = "Bikes"'
Products() & bikes

In [None]:
(Products & bikes).make_sql()

In [None]:
# All order IDs where the customer was from Washington
washington_customers = Customers & 'cust_state="WA"'
(Orders & washington_customers).proj()

## Joins 

In [None]:
Products * Categories

In [None]:
(Products * Categories & 'category_description = "Bikes"').proj('product_name', 'retail_price')

In [None]:
(Products & (Categories & 'category_description = "Bikes"')).proj('product_name', 'retail_price')

In [None]:
# All the orders that took more than 2 days to ship
Orders().proj(days_to_ship='DATEDIFF(ship_date, order_date)') & 'days_to_ship > 2'
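
In MySQL, `DATEDIFF(a, b)` returns the number of days from `b` to `a`, so the argument order above yields a positive number when an order ships after it was placed. The plain-Python sketch below mirrors that calculation on three made-up orders; the function `datediff` and the rows are illustrative assumptions.

```python
from datetime import date

# Hypothetical toy orders: (order_number, order_date, ship_date)
orders = [
    (1, date(2017, 9, 1), date(2017, 9, 2)),  # shipped after 1 day
    (2, date(2017, 9, 1), date(2017, 9, 5)),  # shipped after 4 days
    (3, date(2017, 9, 3), date(2017, 9, 5)),  # shipped after 2 days
]

def datediff(later, earlier):
    """Days from `earlier` to `later`, mirroring the argument order of DATEDIFF."""
    return (later - earlier).days

# Orders that took more than 2 days to ship.
slow = [number for number, ordered, shipped in orders if datediff(shipped, ordered) > 2]
```

Order 3 took exactly 2 days, so the strict comparison `> 2` excludes it.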

In [None]:
ProductVendors()

In [None]:
discounts = OrderDetails * Products & 'quoted_price < retail_price'
(Orders & discounts).proj('order_date')

In [None]:
dj.Diagram(schema)

In [None]:
# Find all orders with products that have vendors from TX
Orders & (OrderDetails * Products * ProductVendors * Vendors & 'vend_state = "TX"')

In [None]:
# Give all products that have at least one vendor from TX
Products & (ProductVendors * Vendors & 'vend_state="TX"')

In [None]:
# Advanced: Give all products where all vendors are from TX
Products - (ProductVendors * Vendors - 'vend_state="TX"')
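
The "all vendors" query works by double negation: first find the products that have at least one vendor outside TX, then remove those products. The sketch below reproduces that logic in SQL through Python's built-in `sqlite3` module. The toy rows and the key names `product_number` and `vendor_id` are assumptions for illustration.

```python
import sqlite3

# Toy in-memory tables (hypothetical data and key names).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_number INTEGER PRIMARY KEY);
    CREATE TABLE vendors (vendor_id INTEGER PRIMARY KEY, vend_state TEXT);
    CREATE TABLE product_vendors (
        product_number INTEGER,
        vendor_id INTEGER,
        PRIMARY KEY (product_number, vendor_id)
    );
    INSERT INTO products VALUES (1), (2), (3), (4);
    INSERT INTO vendors VALUES (100, 'TX'), (101, 'TX'), (102, 'WA');
    -- product 1: only TX vendors; product 2: mixed; product 3: only WA; product 4: no vendors
    INSERT INTO product_vendors VALUES (1, 100), (1, 101), (2, 100), (2, 102), (3, 102);
""")

def ids(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Some vendor is from TX.
some_tx = ids("""
    SELECT product_number FROM products
    WHERE product_number IN (
        SELECT product_number FROM product_vendors NATURAL JOIN vendors
        WHERE vend_state = 'TX')
    ORDER BY product_number
""")

# All vendors are from TX: exclude products that have any non-TX vendor.
all_tx = ids("""
    SELECT product_number FROM products
    WHERE product_number NOT IN (
        SELECT product_number FROM product_vendors NATURAL JOIN vendors
        WHERE vend_state <> 'TX')
    ORDER BY product_number
""")
```

Product 4 has no vendors at all, so it satisfies "all vendors are from TX" vacuously and appears in `all_tx`. If that is not wanted, the result can be restricted further to products that have at least one vendor.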

In [None]:
# Rename "retail_price" to "price"; the ellipsis keeps all other attributes
Products().proj(..., price='retail_price')

In [None]:
Customers() * Products()

In [None]:
OrderDetails() * Orders()

# Homework

Write the following queries in DataJoint:

In [None]:
# 1. List all products that need to be restocked (fewer than 10 left on hand)
# (6 items)

In [None]:
# 2. List all orders made since "2018-01-01"
# (350 items)

In [None]:
# 3. List customers who have never ordered anything.
# (1 item)

In [None]:
# 4. List all orders for bicycles
# (586 items)

In [None]:
# 5. List customers whose first name is Liz.
# (1 item)

In [None]:
# 6. List all orders made by Liz.
# (39 items)

In [None]:
# 7. List all orders made by employees named Susan.
# (130 items)

In [None]:
# 8. List all employees who have not sold anything to Liz since 2018-01-01
# (1 item)

In [None]:
# 9. List the names of all the customers to whom Susan sold something in September 2017
# (17 items)

In [None]:
# 10. List the names of products sold on 2017-09-05
# (14 items)

In [None]:
# 11. List employees who did not make a sale on 2017-08-05
# (5 items)

In [None]:
# 12. List empty orders (without any items)
# (11 items)

In [None]:
# 13. List all orders where the customer and the salesperson were from the same city.
# (41 items)

In [None]:
# 14. List all customers who have never bought an item with a quoted price over 1000
# (5 items)