# Lesson 2 Exercise 2: Creating Denormalized Tables

![](../images/postgresSQLlogo.png)

## Walk through the basics of modeling data from normalized form to denormalized form. We will create tables in PostgreSQL, insert rows of data, and run simple SQL `JOIN` queries to show how these tables work together.

#### Where you see `#####` you will need to fill in code. This exercise will be more challenging than the last. Use the information provided to create the tables and write the insert statements.

#### Remember the examples shown are simple, but imagine these situations at scale with large datasets, many users, and the need for quick response time. 

### Import the library 
Note: An error might pop up after this command has executed. If it does, read it carefully before ignoring it.

In [None]:
import psycopg2
from src.database import get_pg_connection, get_pg_cursor

### Create a connection to the database, get a cursor, and set autocommit to true

In [None]:
conn = get_pg_connection()
cur = get_pg_cursor(conn)
conn.set_session(autocommit=True)

#### Let's start with the normalized (3NF) set of tables from the last exercise, plus a new table, `sales`.

```
Table Name: transactions
Column 0: Transaction Id
Column 1: Customer Name
Column 2: Employee Id
Column 3: Year
```
![](images/table16.png)

```
Table Name: albums_sold
Column 0: Album Id
Column 1: Transaction Id
Column 2: Album Name
```
![](images/table15.png)

```
Table Name: employees
Column 0: Employee Id
Column 1: Employee Name
```
![](images/table17.png)

```
Table Name: sales
Column 0: Transaction Id
Column 1: Amount Spent
```
![](images/table18.png)

### TO-DO: Add all `CREATE` statements for all tables and `INSERT` data into the tables


In [None]:
from src.database import drop_pg_table, create_pg_table, insert_pg_rows

# transactions table
table_name = "transactions"
column_names = ["Transaction_ID", "Customer_Name", "Employee_ID", "Year"]
column_types = ["int", "varchar", "int", "int"]
columns = {name: type for name, type in zip(column_names, column_types)}
drop_pg_table(conn, table_name)
create_pg_table(conn, table_name, columns)
insert_pg_rows(
    conn,
    table_name,
    column_names,
    [
        (1, "Amanda", 1, 2000),
        (2, "Toby", 1, 2000),
        (3, "Max", 2, 2018),
    ],
)

# albums sold table
table_name = "albums_sold"
column_names = [
    "Album_ID",
    "Album_Name",
    "Transaction_ID",
]
column_types = ["int", "varchar", "int"]
columns = {name: type for name, type in zip(column_names, column_types)}
drop_pg_table(conn, table_name)
create_pg_table(conn, table_name, columns)
insert_pg_rows(
    conn,
    table_name,
    column_names,
    [
        (1, "Rubber Soul", 1),
        (2, "Let It Be", 1),
        (3, "My Generation", 2),
        (4, "Meet the Beatles", 3),
        (5, "Help!", 3),
    ],
)
# employee table
table_name = "employees"
column_names = [
    "Employee_ID",
    "Employee_Name",
]
column_types = ["int", "varchar"]
columns = {name: type for name, type in zip(column_names, column_types)}
drop_pg_table(conn, table_name)
create_pg_table(conn, table_name, columns)
insert_pg_rows(
    conn,
    table_name,
    column_names,
    [
        (1, "Sam"),
        (2, "Bob"),
    ],
)

# sales table
table_name = "sales"
column_names = [
    "Transaction_ID",
    "Amount_Spent",
]
column_types = ["int", "int"]
columns = {name: type for name, type in zip(column_names, column_types)}
drop_pg_table(conn, table_name)
create_pg_table(conn, table_name, columns)
insert_pg_rows(
    conn,
    table_name,
    column_names,
    [
        (1, 40),
        (2, 19),
        (3, 45),
    ],
)
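The `src.database` helpers used above are not shown in this notebook. As a rough idea of the SQL they might issue, here is a minimal sketch. The function names `build_create_table_sql` and `build_insert_sql` are hypothetical and are not part of `src.database`; the only assumption about psycopg2 is its `%s` placeholder style for parameters.

```python
# Hypothetical helpers: a guess at the SQL that functions such as
# create_pg_table and insert_pg_rows could generate. Not the real src.database code.


def build_create_table_sql(table_name, columns):
    """Return a CREATE TABLE statement from a {column_name: column_type} dict."""
    column_defs = ", ".join(f"{name} {col_type}" for name, col_type in columns.items())
    return f"CREATE TABLE IF NOT EXISTS {table_name} ({column_defs});"


def build_insert_sql(table_name, column_names):
    """Return a parameterized INSERT statement with psycopg2-style %s placeholders."""
    placeholders = ", ".join(["%s"] * len(column_names))
    return f"INSERT INTO {table_name} ({', '.join(column_names)}) VALUES ({placeholders});"


columns = {"Transaction_ID": "int", "Amount_Spent": "int"}
create_sql = build_create_table_sql("sales", columns)
insert_sql = build_insert_sql("sales", list(columns))
print(create_sql)
print(insert_sql)
```

With a live connection, statements like these could be passed to `cur.execute(create_sql)` and `cur.executemany(insert_sql, rows)`, keeping the row values out of the SQL string itself.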

#### TO-DO: Confirm with a `SELECT` statement that the data was added correctly

In [None]:
for table in ["transactions", "albums_sold", "employees", "sales"]:
    print(f"\nTable: {table}\n")
    try:
        cur.execute(f"SELECT * FROM {table};")
    except psycopg2.Error as e:
        print("Error: select *")
        print(e)
        continue  # skip fetching when the query failed

    row = cur.fetchone()
    while row:
        print(row)
        row = cur.fetchone()

### Let's say you need to do a query that gives:

```
transaction_id
customer_name
employee_name
year
album_name
amount_spent
```


### TO-DO: Complete the statement below to join the 4 tables you have created (this takes three `JOIN`s).

In [None]:
try:
    cur.execute(
        "SELECT\
                    t.Transaction_ID,\
                    t.Customer_Name,\
                    e.Employee_Name,\
                    t.Year,\
                    a.Album_Name,\
                    s.Amount_Spent \
                FROM \
                    ((transactions t JOIN employees e ON t.Employee_ID = e.Employee_ID) \
                    JOIN albums_sold a ON t.Transaction_ID = a.Transaction_ID) \
                    JOIN sales s ON t.Transaction_ID = s.Transaction_ID"
    )


except psycopg2.Error as e:
    print("Error: JOIN query")
    print(e)
else:
    # Fetch only if the query succeeded
    row = cur.fetchone()
    while row:
        print(row)
        row = cur.fetchone()
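If no PostgreSQL server is at hand, the same join can be tried out with Python's built-in `sqlite3` module. This is only a stand-in sketch: it assumes an in-memory SQLite database instead of the PostgreSQL connection used in this notebook, and the variable names `demo` and `joined_rows` are illustrative.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE transactions (transaction_id int, customer_name text, employee_id int, year int);
    CREATE TABLE albums_sold (album_id int, album_name text, transaction_id int);
    CREATE TABLE employees (employee_id int, employee_name text);
    CREATE TABLE sales (transaction_id int, amount_spent int);
    INSERT INTO transactions VALUES (1, 'Amanda', 1, 2000), (2, 'Toby', 1, 2000), (3, 'Max', 2, 2018);
    INSERT INTO albums_sold VALUES (1, 'Rubber Soul', 1), (2, 'Let It Be', 1),
        (3, 'My Generation', 2), (4, 'Meet the Beatles', 3), (5, 'Help!', 3);
    INSERT INTO employees VALUES (1, 'Sam'), (2, 'Bob');
    INSERT INTO sales VALUES (1, 40), (2, 19), (3, 45);
    """
)

# Three JOINs connect the four tables; ORDER BY makes the row order deterministic.
joined_rows = demo.execute(
    """
    SELECT t.transaction_id, t.customer_name, e.employee_name, t.year, a.album_name, s.amount_spent
    FROM transactions t
    JOIN employees e ON t.employee_id = e.employee_id
    JOIN albums_sold a ON t.transaction_id = a.transaction_id
    JOIN sales s ON t.transaction_id = s.transaction_id
    ORDER BY a.album_id
    """
).fetchall()

for row in joined_rows:
    print(row)
demo.close()
```

The result has one row per album sold (five rows), so the amount spent on a transaction repeats once for every album in that transaction. Summing `amount_spent` over this joined result would therefore double-count.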

#### Great, we were able to get the data we wanted.

### But we had to perform three `JOIN`s to get there. While it's great that we had that flexibility, `JOIN`s can be slow on large tables, and if we have a read-heavy workload that requires low-latency queries, we want to reduce the number of `JOIN`s. Let's think about denormalizing our normalized tables.

### With denormalization, you want to think about the queries you are running and how to reduce the number of `JOIN`s, even if that means duplicating data. The following are the queries you need to run.

- **Query 1**: `SELECT transaction_id, customer_name, amount_spent FROM <min number of tables>`
  It should return the amount spent on each transaction.
- **Query 2**: `SELECT cashier_name, SUM(amount_spent) FROM <min number of tables> GROUP BY cashier_name`
  It should return the total sales by cashier. In the tables used here the cashier is the employee, so `cashier_name` corresponds to `Employee_Name`.

###  Query 1: `SELECT transaction_id, customer_name, amount_spent FROM <min number of tables>`

One way to do this would be to `JOIN` the `sales` and `transactions` tables, but we want to minimize the use of `JOIN`s.

To reduce the number of tables, first add `amount_spent` to the `transactions` table so that you do not need a `JOIN` at all.

```
Table Name: transactions 
Column 0: Transaction Id
Column 1: Customer Name
Column 2: Employee Id
Column 3: Year
Column 4: Amount Spent
```


![](images/table19.png)


### TO-DO: Add the tables as part of the denormalization process

In [None]:
# TO-DO: Create all tables
drop_pg_table(conn, "sales")
table_name = "transactions"
column_names = [
    "Transaction_ID",
    "Customer_Name",
    "Employee_ID",
    "Year",
    "Amount_Spent",
]
column_types = ["int", "varchar", "int", "int", "int"]
columns = {name: type for name, type in zip(column_names, column_types)}
drop_pg_table(conn, table_name)
create_pg_table(conn, table_name, columns)
insert_pg_rows(
    conn,
    table_name,
    column_names,
    [
        (1, "Amanda", 1, 2000, 40),
        (2, "Toby", 1, 2000, 19),
        (3, "Max", 2, 2018, 45),
    ],
)
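The cell above retypes the rows by hand. As an alternative sketch, the denormalized table could be derived from the normalized ones with `CREATE TABLE ... AS SELECT`, which PostgreSQL also supports. The example below assumes an in-memory SQLite database as a stand-in, and the table name `transactions_denorm` is hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE transactions (transaction_id int, customer_name text, employee_id int, year int);
    CREATE TABLE sales (transaction_id int, amount_spent int);
    INSERT INTO transactions VALUES (1, 'Amanda', 1, 2000), (2, 'Toby', 1, 2000), (3, 'Max', 2, 2018);
    INSERT INTO sales VALUES (1, 40), (2, 19), (3, 45);

    -- Pay for the JOIN once, at write time, and store the combined rows.
    CREATE TABLE transactions_denorm AS
        SELECT t.transaction_id, t.customer_name, t.employee_id, t.year, s.amount_spent
        FROM transactions t
        JOIN sales s ON t.transaction_id = s.transaction_id;
    """
)

# Query 1 now reads from a single table, with no JOIN at read time.
denorm_rows = demo.execute(
    "SELECT transaction_id, customer_name, amount_spent "
    "FROM transactions_denorm ORDER BY transaction_id"
).fetchall()
print(denorm_rows)
demo.close()
```

Deriving the table this way keeps the denormalized copy consistent with the source tables at the moment it is built, although it still has to be rebuilt or updated when the source data changes.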

### Now you should be able to run a simplified query to get the information you need. No `JOIN` is needed.

In [None]:
try:
    cur.execute("SELECT Transaction_ID, Customer_Name, Amount_Spent FROM transactions")

except psycopg2.Error as e:
    print("Error: select *")
    print(e)
else:
    # Fetch only if the query succeeded
    row = cur.fetchone()
    while row:
        print(row)
        row = cur.fetchone()

#### Your output for the above cell should be the following:
(1, 'Amanda', 40)<br>
(2, 'Toby', 19)<br>
(3, 'Max', 45)

### Query 2: `SELECT cashier_name, SUM(amount_spent) FROM <min number of tables> GROUP BY cashier_name`

To avoid using any `JOIN`s, first create a new table with just the information we need.

    Table Name: employee_transactions
    Column 0: Transaction Id
    Column 1: Employee Name
    Column 2: Employee Id
    Column 3: Amount Spent
![](images/table20.png)

In [None]:
# TO-DO: Create a new table with just the information you need.
table_name = "employee_transactions"
column_names = [
    "Transaction_ID",
    "Employee_Name",
    "Employee_ID",
    "Amount_Spent",
]
column_types = ["int", "varchar", "int", "int"]
columns = {name: type for name, type in zip(column_names, column_types)}
drop_pg_table(conn, table_name)
create_pg_table(conn, table_name, columns)
insert_pg_rows(
    conn,
    table_name,
    column_names,
    [
        (1, "Sam", 1, 40),
        (2, "Sam", 1, 19),
        (3, "Bob", 2, 45),
    ],
)

### Run the query

In [None]:
try:
    cur.execute(
        "select employee_name, SUM(amount_spent) FROM employee_transactions GROUP BY employee_name"
    )

except psycopg2.Error as e:
    print("Error: aggregate query")
    print(e)
else:
    # Fetch only if the query succeeded
    row = cur.fetchone()
    while row:
        print(row)
        row = cur.fetchone()

#### Your output for the above cell should be the following:
('Sam', 59)<br>
('Bob', 45)
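As a sanity check, the denormalized query should return the same totals as the equivalent query with a `JOIN` on the normalized tables. The sketch below assumes an in-memory SQLite database as a stand-in for PostgreSQL. It adds `ORDER BY` because the row order of a `GROUP BY` result is not guaranteed without one, so the rows may print in a different order than shown above.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE transactions (
        transaction_id int, customer_name text, employee_id int, year int, amount_spent int
    );
    CREATE TABLE employees (employee_id int, employee_name text);
    CREATE TABLE employee_transactions (
        transaction_id int, employee_name text, employee_id int, amount_spent int
    );
    INSERT INTO transactions VALUES
        (1, 'Amanda', 1, 2000, 40), (2, 'Toby', 1, 2000, 19), (3, 'Max', 2, 2018, 45);
    INSERT INTO employees VALUES (1, 'Sam'), (2, 'Bob');
    INSERT INTO employee_transactions VALUES (1, 'Sam', 1, 40), (2, 'Sam', 1, 19), (3, 'Bob', 2, 45);
    """
)

# Normalized route: one JOIN is needed to reach the employee name.
with_join = demo.execute(
    """
    SELECT e.employee_name, SUM(t.amount_spent)
    FROM transactions t
    JOIN employees e ON t.employee_id = e.employee_id
    GROUP BY e.employee_name
    ORDER BY e.employee_name
    """
).fetchall()

# Denormalized route: the name is already stored next to the amount.
without_join = demo.execute(
    """
    SELECT employee_name, SUM(amount_spent)
    FROM employee_transactions
    GROUP BY employee_name
    ORDER BY employee_name
    """
).fetchall()

print(with_join)
print(without_join)
demo.close()
```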


#### We have successfully taken normalized tables and denormalized them in order to speed up our reads and allow simpler queries to be executed.
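The price of this speed-up is the duplicated data mentioned earlier: a value stored in several rows has to be changed in several rows. A minimal sketch of that cost, assuming an in-memory SQLite database as a stand-in for PostgreSQL and a made-up new name, `Samantha`, purely for illustration:

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE employees (employee_id int, employee_name text);
    CREATE TABLE employee_transactions (
        transaction_id int, employee_name text, employee_id int, amount_spent int
    );
    INSERT INTO employees VALUES (1, 'Sam'), (2, 'Bob');
    INSERT INTO employee_transactions VALUES (1, 'Sam', 1, 40), (2, 'Sam', 1, 19), (3, 'Bob', 2, 45);
    """
)

# Renaming employee 1 ("Samantha" is a made-up value) touches one normalized row...
normalized_updates = demo.execute(
    "UPDATE employees SET employee_name = ? WHERE employee_id = ?", ("Samantha", 1)
).rowcount

# ...but every copy of the name in the denormalized table.
denormalized_updates = demo.execute(
    "UPDATE employee_transactions SET employee_name = ? WHERE employee_id = ?", ("Samantha", 1)
).rowcount

print(normalized_updates, denormalized_updates)
demo.close()
```

With only three transactions the difference is one row versus two, but it grows with the number of transactions each employee has handled.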

### Drop the tables

In [None]:
from src.database import close_pg_connection

# drop all tables
for table in [
    "transactions",
    "albums_sold",
    "employee_transactions",
    "employees",
    "sales",
]:
    drop_pg_table(conn, table)

# close connection
close_pg_connection(conn)