# Database Design and Load Exercise

### Steps
 1. Analyze - Completed last module
 2. Design - Completed last module
 3. Data Carpentry
 4. Data Loading
 5. Analytical Queries

# Smart Stores ERD

In Module 2, everyone developed an ERD for a smart store based on the entities and attributes below.



### orders :
* `order_id`: order identifier
* `user_id`: customer identifier
* `eval_set`: which evaluation set this order belongs in (see `SET` described below)
* `order_number`: the order sequence number for this user (1 = first, n = nth)
* `order_dow`: the day of the week the order was placed on
* `order_hour_of_day`: the hour of the day the order was placed on
* `days_since_prior_order`: days since the last order, capped at 30 (with NAs for `order_number` = 1)

### products :
* `product_id`: product identifier
* `product_name`: name of the product
* `aisle_id`: foreign key
* `department_id`: foreign key

### aisles :
* `aisle_id`: aisle identifier
* `aisle`: the name of the aisle

### departments :
* `department_id`: department identifier
* `department`: the name of the department

### order_products :
* `order_id`: foreign key
* `product_id`: foreign key
* `add_to_cart_order`: order in which each product was added to cart
* `reordered`: 1 if this product has been ordered by this user in the past, 0 otherwise

#### where SET is one of the following three evaluation sets (`eval_set` in orders):

* "prior": orders prior to that user's most recent order (~3.2m orders)
* "train": training data supplied to participants (~131k orders)
* "test": test data reserved for machine learning competitions (~75k orders)

# ERD Diagram

For this assignment, everyone should use the ERD below, which is based on the requirements above. It is fine if your diagram from last week's exercise differed slightly, as long as it captured all of the requirements and relationships.

![erd](../images/smart_store.png)


### 1. Create the tables in the database using SQLAlchemy

Convert the entities and attributes into a database schema for Postgres. **Remember to specify your Primary Keys and Foreign Keys for each table!**

In [2]:
import pandas as pd
import getpass
mypasswd = getpass.getpass()
username = 'bmgwd9'
host = 'pgsql.dsa.lan'
database = 'dsa_student'

# Connect to the database
from sqlalchemy.engine.url import URL
from sqlalchemy import create_engine

# SQLAlchemy connection parameters
postgres_db = {'drivername': 'postgresql',  # newer SQLAlchemy releases no longer accept the 'postgres' alias
               'username': username,
               'password': mypasswd,
               'host': host,
               'database': database}
engine = create_engine(URL(**postgres_db), echo=False)
del mypasswd  # Do not keep the password around after the engine is created

········
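
The cell above calls `URL(...)` directly, which matches older SQLAlchemy releases. On SQLAlchemy 1.4 and later the documented constructor is `URL.create`. A minimal sketch, assuming SQLAlchemy 1.4+ is installed; the password is a placeholder and no connection is opened:

```python
from sqlalchemy.engine import URL

# Build the connection URL from the same parameters used above.
# The password here is a placeholder, not a real credential.
url = URL.create(
    drivername="postgresql",
    username="bmgwd9",
    password="not-a-real-password",
    host="pgsql.dsa.lan",
    database="dsa_student",
)

# Render the URL with the password masked, which is safe to print or log.
safe_url = url.render_as_string(hide_password=True)
print(safe_url)
```

The resulting `url` object can be passed to `create_engine` in place of `URL(**postgres_db)`.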


In [42]:
query = """
DROP TABLE IF EXISTS aisles CASCADE;
"""

with engine.connect() as connection:
    res = connection.execute(query)
    print(res)

<sqlalchemy.engine.result.ResultProxy object at 0x7fac3adc8278>
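
Passing a plain string to `connection.execute` works on the SQLAlchemy version used here, but SQLAlchemy 2.0 requires raw SQL to be wrapped in `text()`. The sketch below shows that pattern against an in-memory SQLite database, an assumption made so that it runs without the course Postgres server; `engine_demo` is a hypothetical name.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for Postgres so the sketch is self-contained.
engine_demo = create_engine("sqlite://")

# engine.begin() opens a transaction and commits it when the block exits.
with engine_demo.begin() as connection:
    connection.execute(text("DROP TABLE IF EXISTS aisles"))
    connection.execute(text(
        "CREATE TABLE aisles (aisle_id INT PRIMARY KEY, aisle VARCHAR(100))"
    ))
    connection.execute(text(
        "INSERT INTO aisles VALUES (1, 'prepared soups salads')"
    ))

# Read the row count back to confirm the statements ran.
with engine_demo.connect() as connection:
    aisle_count = connection.execute(text("SELECT COUNT(*) FROM aisles")).scalar()

print(aisle_count)
```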


In [45]:
query = """
DROP TABLE IF EXISTS orders CASCADE; -- CASCADE so the cell can be rerun after order_products exists
CREATE TABLE orders (
    order_id                INT, -- Integer
    user_id                 INT, 
    eval_set                varchar(100), -- Character String, varied length
    order_number            INT, 
    order_dow               INT,
    order_hour_of_day       INT,
    days_since_prior_order  REAL, -- Floating point number
    PRIMARY KEY (order_id)
)
"""

with engine.connect() as connection:
    res = connection.execute(query)
    print(res)

<sqlalchemy.engine.result.ResultProxy object at 0x7fac549fcc50>


In [46]:
query = """
DROP TABLE IF EXISTS aisles;
CREATE TABLE aisles (
    aisle_id             INT,
    aisle                varchar(100),
    PRIMARY KEY (aisle_id)
)
"""

with engine.connect() as connection:
    res = connection.execute(query)
    print(res)

<sqlalchemy.engine.result.ResultProxy object at 0x7fac549fc860>


In [47]:
query = """
DROP TABLE IF EXISTS departments CASCADE; -- Lets the cell be rerun, like the other tables
CREATE TABLE departments (
    department_id              INT,
    department                 varchar(100),
    PRIMARY KEY (department_id)
)
"""

with engine.connect() as connection:
    res = connection.execute(query)
    print(res)

<sqlalchemy.engine.result.ResultProxy object at 0x7fac5d3f4eb8>


In [48]:
query = """
DROP TABLE IF EXISTS products CASCADE; -- CASCADE so the cell can be rerun after order_products exists
CREATE TABLE products (
    product_id           INT,
    department_id        INT,
    aisle_id             INT,
    product_name         varchar(200),
    PRIMARY KEY (product_id),
    FOREIGN KEY (department_id) REFERENCES departments(department_id),
    FOREIGN KEY (aisle_id) REFERENCES aisles(aisle_id)
)
"""

with engine.connect() as connection:
    res = connection.execute(query)
    print(res)

<sqlalchemy.engine.result.ResultProxy object at 0x7fac5d91d048>


In [49]:
query = """
DROP TABLE IF EXISTS order_products;
CREATE TABLE order_products (
    order_id             INT,
    product_id           INT,
    add_to_cart_order    INT,
    reordered            INT, -- 1 or 0, so an integer is enough
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
)
"""

with engine.connect() as connection:
    res = connection.execute(query)
    print(res)

<sqlalchemy.engine.result.ResultProxy object at 0x7fac54e73400>
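
The foreign keys declared above also constrain the order in which data can be loaded: a child row is rejected until the parent row it references exists. The following is a minimal sketch of that behavior using Python's built-in `sqlite3` module rather than Postgres (an assumption, so the example runs anywhere). The table definitions are trimmed copies of the ones above, and the aisle name is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE aisles (aisle_id INTEGER PRIMARY KEY, aisle TEXT)")
conn.execute("""
    CREATE TABLE products (
        product_id   INTEGER PRIMARY KEY,
        aisle_id     INTEGER,
        product_name TEXT,
        FOREIGN KEY (aisle_id) REFERENCES aisles(aisle_id)
    )
""")

def try_insert_product(connection):
    """Try to insert one product that references aisle 61; return True on success."""
    try:
        connection.execute(
            "INSERT INTO products VALUES (1, 61, 'Chocolate Sandwich Cookies')"
        )
        return True
    except sqlite3.IntegrityError:
        return False

# Aisle 61 does not exist yet, so the child row is rejected.
child_before_parent = try_insert_product(conn)

# After the parent row is inserted, the same child row is accepted.
conn.execute("INSERT INTO aisles VALUES (61, 'placeholder aisle name')")
child_after_parent = try_insert_product(conn)

print(child_before_parent, child_after_parent)
```

For this schema the same rule means `aisles` and `departments` are loaded before `products`, and `orders` and `products` before `order_products`.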


### 2. Load data from the following files using SQLAlchemy:

#### `/dsa/data/all_datasets/instacart/orders.csv`
 * 3421084 Rows
 * File Preview 
```
order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order
2539329,1,prior,1,2,08,
2398795,1,prior,2,3,07,15.0
473747,1,prior,3,3,12,21.0
2254736,1,prior,4,4,07,29.0
```
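
The first preview row has an empty `days_since_prior_order`, which is the "NA for `order_number` = 1" case described earlier. A quick sketch, using only the preview rows shown above, of how pandas reads that column:

```python
import io
import pandas as pd

# The header and four data rows from the orders.csv preview above.
preview = io.StringIO(
    "order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order\n"
    "2539329,1,prior,1,2,08,\n"
    "2398795,1,prior,2,3,07,15.0\n"
    "473747,1,prior,3,3,12,21.0\n"
    "2254736,1,prior,4,4,07,29.0\n"
)
orders_preview = pd.read_csv(preview)

# The empty field becomes NaN, which makes the whole column floating point.
missing_count = int(orders_preview["days_since_prior_order"].isna().sum())
column_dtype = str(orders_preview["days_since_prior_order"].dtype)

print(missing_count, column_dtype)
```

This is consistent with declaring `days_since_prior_order` as `REAL` in the `orders` table.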

#### `/dsa/data/all_datasets/instacart/products.csv`
 * 49689 Rows
 * File Preview 
```
product_id,product_name,aisle_id,department_id
1,Chocolate Sandwich Cookies,61,19
2,All-Seasons Salt,104,13
3,Robust Golden Unsweetened Oolong Tea,94,7
4,Smart Ones Classic Favorites Mini Rigatoni With Vodka Cream Sauce,38,1
```

#### `/dsa/data/all_datasets/instacart/aisles.csv`
 * 135 Rows
 * File Preview 
```
aisle_id,aisle
1,prepared soups salads
2,specialty cheeses
3,energy granola bars
4,instant foods
```

#### `/dsa/data/all_datasets/instacart/departments.csv`
 * 22 Rows
 * File Preview 
```
department_id,department
1,frozen
2,other
3,bakery
4,produce
```

#### `/dsa/data/all_datasets/instacart/order_products.csv`
 * 1384618 Rows
 * File Preview 
```
order_id,product_id,add_to_cart_order,reordered
1,49302,1,1
1,11109,2,1
1,10246,3,0
1,49683,4,0
```
     

## In each designated cell, load the data using Python



### 2.1 - Orders 

In [50]:
orders = pd.read_csv('/dsa/data/all_datasets/instacart/orders.csv')

orders.to_sql('orders',            # The table to load
              engine,              # The engine created above
              schema=username,     # The schema where the table lives (our pawprint)
              if_exists='append',  # If the table already exists, append rows to it
              index=False,         # Do not write the DataFrame index as a column
              chunksize=100,       # Insert 100 rows from the DataFrame at a time
              method='multi')      # Send multiple rows in each INSERT statement
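
`orders.csv` is the largest file in this exercise, and the cell above reads all of it into memory before writing. One alternative is to read and write it in chunks. The following is a minimal sketch, assuming an in-memory SQLite database in place of Postgres and the preview rows in place of the full file; `load_csv_in_chunks` and `rows_per_chunk` are hypothetical names.

```python
import io
import sqlite3
import pandas as pd

def load_csv_in_chunks(csv_source, table, connection, rows_per_chunk=2):
    """Append a CSV to `table` one chunk at a time; return the rows written."""
    total_rows = 0
    # With chunksize set, read_csv yields DataFrames instead of one big frame.
    for chunk in pd.read_csv(csv_source, chunksize=rows_per_chunk):
        chunk.to_sql(table, connection, if_exists="append", index=False)
        total_rows += len(chunk)
    return total_rows

# The header and four data rows from the orders.csv preview.
sample_csv = io.StringIO(
    "order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order\n"
    "2539329,1,prior,1,2,08,\n"
    "2398795,1,prior,2,3,07,15.0\n"
    "473747,1,prior,3,3,12,21.0\n"
    "2254736,1,prior,4,4,07,29.0\n"
)

sqlite_conn = sqlite3.connect(":memory:")
rows_loaded = load_csv_in_chunks(sample_csv, "orders", sqlite_conn)
rows_in_table = sqlite_conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

print(rows_loaded, rows_in_table)
```

Against the course database, `connection` would be the `engine` created earlier, and a much larger `rows_per_chunk` would be a reasonable starting point.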


### 2.2 - Products







In [54]:
# products references aisles and departments, so run the cells in 2.3 and 2.4 first
products = pd.read_csv('/dsa/data/all_datasets/instacart/products.csv')

products.to_sql('products',          # The table to load
                engine,              # The engine created above
                schema=username,     # The schema where the table lives (our pawprint)
                if_exists='append',  # If the table already exists, append rows to it
                index=False,         # Do not write the DataFrame index as a column
                chunksize=10)        # Insert 10 rows from the DataFrame at a time


### 2.3 - Aisles

In [52]:
aisles = pd.read_csv('/dsa/data/all_datasets/instacart/aisles.csv')

aisles.to_sql('aisles',            # The table to load
              engine,              # The engine created above
              schema=username,     # The schema where the table lives (our pawprint)
              if_exists='append',  # If the table already exists, append rows to it
              index=False,         # Do not write the DataFrame index as a column
              chunksize=100,       # Insert 100 rows from the DataFrame at a time
              method='multi')      # Send multiple rows in each INSERT statement


### 2.4 - Departments

In [53]:
departments = pd.read_csv('/dsa/data/all_datasets/instacart/departments.csv')

departments.to_sql('departments',       # The table to load
                   engine,              # The engine created above
                   schema=username,     # The schema where the table lives (our pawprint)
                   if_exists='append',  # If the table already exists, append rows to it
                   index=False,         # Do not write the DataFrame index as a column
                   chunksize=100,       # Insert 100 rows from the DataFrame at a time
                   method='multi')      # Send multiple rows in each INSERT statement


### 2.5 - Order Products

In [55]:
order_products = pd.read_csv('/dsa/data/all_datasets/instacart/order_products.csv')

order_products.to_sql('order_products',    # The table to load
                      engine,              # The engine created above
                      schema=username,     # The schema where the table lives (our pawprint)
                      if_exists='append',  # If the table already exists, append rows to it
                      index=False,         # Do not write the DataFrame index as a column
                      chunksize=100,       # Insert 100 rows from the DataFrame at a time
                      method='multi')      # Send multiple rows in each INSERT statement
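
After each load it is worth confirming that the table holds as many rows as the DataFrame that was written. The following is a minimal sketch, assuming an in-memory SQLite database in place of Postgres and the four departments from the file preview; `loaded_row_count` is a hypothetical helper.

```python
import sqlite3
import pandas as pd

def loaded_row_count(connection, table):
    """Return the number of rows currently stored in `table`."""
    return connection.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

# The four rows shown in the departments.csv preview.
departments_demo = pd.DataFrame({
    "department_id": [1, 2, 3, 4],
    "department": ["frozen", "other", "bakery", "produce"],
})

demo_conn = sqlite3.connect(":memory:")
departments_demo.to_sql("departments", demo_conn, if_exists="append", index=False)

# The load is complete when the table row count equals the DataFrame length.
counts_match = loaded_row_count(demo_conn, "departments") == len(departments_demo)
print(counts_match)
```

Comparing against the DataFrame length, rather than the row counts listed with each file, avoids having to know whether those counts include the header line.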


# Save your notebook, then `File > Close and Halt`

---