# Lesson 2 Exercise 1: Creating Normalized Tables

![](../images/postgresSQLlogo.png)

## In this exercise we are going to walk through the basics of modeling data in normalized form. We will create tables in PostgreSQL, insert rows of data, and do simple JOIN SQL queries to show how these multiple tables can work together.


#### Where you see ##### you will need to fill in code.


#### Import the library 
Note: An error might pop up after this command has executed. If it does, read it carefully before ignoring it.

In [None]:
import psycopg2
from src.database import get_pg_connection, get_pg_cursor

__Create a connection to the database, get a cursor, and set autocommit to true__

In [None]:
conn = get_pg_connection()
conn.set_session(autocommit=True)
cur = get_pg_cursor(conn)

#### Let's imagine we have a table called Music Store. 

`Table Name: music_store
column 0: Transaction Id
column 1: Customer Name
column 2: Cashier Name
column 3: Year 
column 4: Albums Purchased`


## Now to translate this information into a CREATE Table Statement and insert the data


![table12](images/table12.png)

In [None]:
# TO-DO: Add the CREATE Table Statement and INSERT statements to add the data in the table
from src.database import create_pg_table, insert_pg_rows, drop_pg_table

table_name = "music_store"
column_names = [
    "Transaction_ID",
    "Customer_Name",
    "Cashier_Name",
    "Year",
    "Albums_Purchased",
]
column_types = ["int", "varchar", "varchar", "int", "varchar[]"]
columns = dict(zip(column_names, column_types))
drop_pg_table(conn, table_name)
create_pg_table(conn, table_name, columns)
insert_pg_rows(
    conn,
    table_name,
    column_names,
    [
        (1, "Amanda", "Sam", 2000, ["Rubber Soul", "Let It Be"]),
        (2, "Toby", "Sam", 2000, ["My Generation"]),
        (3, "Max", "Bob", 2018, ["Meet the Beatles", "Help!"]),
    ],
)

try:
    cur.execute(f"SELECT * FROM {table_name};")
except psycopg2.Error as e:
    print("Error: select *")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()

#### Moving to 1st Normal Form (1NF)

### TO-DO: This data has not been normalized. To get this data into 1st normal form, you need to remove any collections or lists of data and break up the list of albums into individual rows.



In [None]:
## TO-DO: Complete the CREATE table statements and INSERT statements
table_name = "music_store2"
column_names = [
    "Transaction_ID",
    "Customer_Name",
    "Cashier_Name",
    "Year",
    "Album_Purchased",
]
column_types = ["int", "varchar", "varchar", "int", "varchar"]
columns = dict(zip(column_names, column_types))
drop_pg_table(conn, table_name)
create_pg_table(conn, table_name, columns)
insert_pg_rows(
    conn,
    table_name,
    column_names,
    [
        (1, "Amanda", "Sam", 2000, "Rubber Soul"),
        (1, "Amanda", "Sam", 2000, "Let It Be"),
        (2, "Toby", "Sam", 2000, "My Generation"),
        (3, "Max", "Bob", 2018, "Meet the Beatles"),
        (3, "Max", "Bob", 2018, "Help!"),
    ],
)

try:
    cur.execute(f"SELECT * FROM {table_name};")
except psycopg2.Error as e:
    print("Error: select *")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()
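
The rows inserted by hand above can also be derived from the original data. The following is a minimal sketch in plain Python, with no database involved, that turns each list of albums into one row per album. The function name `to_first_normal_form` and the variable names are illustrative assumptions, not part of the exercise's `src.database` helpers.

```python
# Rows as stored in music_store: the last field is a list of albums (not 1NF).
music_store_rows = [
    (1, "Amanda", "Sam", 2000, ["Rubber Soul", "Let It Be"]),
    (2, "Toby", "Sam", 2000, ["My Generation"]),
    (3, "Max", "Bob", 2018, ["Meet the Beatles", "Help!"]),
]


def to_first_normal_form(rows):
    """Emit one row per album so that every field holds a single value."""
    flat_rows = []
    for transaction_id, customer, cashier, year, albums in rows:
        for album in albums:
            flat_rows.append((transaction_id, customer, cashier, year, album))
    return flat_rows


first_nf_rows = to_first_normal_form(music_store_rows)
for row in first_nf_rows:
    print(row)
```

Note that the transaction id now repeats across rows, which is what the next step addresses.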

#### Moving to 2nd Normal Form (2NF)
You have now moved the data into 1NF, which is the first step in moving to 2nd Normal Form. The table is not yet in 2NF: each record in the table is unique, but the intended primary key (transaction id) repeats across rows, so it cannot identify a single record.

### TO-DO: Break up the table into two tables, transactions and albums sold. 


In [None]:
# transactions table
table_name = "transactions"
column_names = ["Transaction_ID", "Customer_Name", "Cashier_Name", "Year"]
column_types = ["int", "varchar", "varchar", "int"]
columns = dict(zip(column_names, column_types))
drop_pg_table(conn, table_name)
create_pg_table(conn, table_name, columns)
insert_pg_rows(
    conn,
    table_name,
    column_names,
    [
        (1, "Amanda", "Sam", 2000),
        (2, "Toby", "Sam", 2000),
        (3, "Max", "Bob", 2018),
    ],
)

# albums sold table
table_name = "albums_sold"
column_names = [
    "Album_ID",
    "Transaction_ID",
    "Album_Purchased",
]
column_types = ["int", "int", "varchar"]
columns = dict(zip(column_names, column_types))
drop_pg_table(conn, table_name)
create_pg_table(conn, table_name, columns)
insert_pg_rows(
    conn,
    table_name,
    column_names,
    [
        (1, 1, "Rubber Soul"),
        (2, 1, "Let It Be"),
        (3, 2, "My Generation"),
        (4, 3, "Meet the Beatles"),
        (5, 3, "Help!"),
    ],
)

print("Table: transactions\n")
try:
    cur.execute("SELECT * FROM transactions;")
except psycopg2.Error as e:
    print("Error: select *")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()

print("\nTable: albums_sold\n")
try:
    cur.execute("SELECT * FROM albums_sold;")
except psycopg2.Error as e:
    print("Error: select *")
    print(e)
row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()
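
The split into two tables can be sketched in plain Python as well. This is only an illustration under the assumption that album ids are assigned sequentially in row order, as in the hand-written rows above; `split_for_second_normal_form` is a hypothetical name.

```python
# The 1NF rows from music_store2: transaction details repeat for every album.
first_nf_rows = [
    (1, "Amanda", "Sam", 2000, "Rubber Soul"),
    (1, "Amanda", "Sam", 2000, "Let It Be"),
    (2, "Toby", "Sam", 2000, "My Generation"),
    (3, "Max", "Bob", 2018, "Meet the Beatles"),
    (3, "Max", "Bob", 2018, "Help!"),
]


def split_for_second_normal_form(rows):
    """Return (transactions, albums_sold) built from the 1NF rows."""
    transactions = {}  # keyed by transaction id, so each one is stored once
    albums_sold = []
    for album_id, (tid, customer, cashier, year, album) in enumerate(rows, start=1):
        transactions[tid] = (tid, customer, cashier, year)
        albums_sold.append((album_id, tid, album))
    return list(transactions.values()), albums_sold


transactions_rows, albums_sold_rows = split_for_second_normal_form(first_nf_rows)
print(transactions_rows)
print(albums_sold_rows)
```

After the split, the transaction id is unique in `transactions_rows`, and each album row refers back to its transaction by that id.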

### TO-DO: Do a `JOIN` on these tables to get all the information in the original first Table. 

In [None]:
## TO-DO: Complete the join on the transactions and albums_sold tables

try:
    cur.execute(
        """
        SELECT transactions.Transaction_ID,
               transactions.Customer_Name,
               transactions.Cashier_Name,
               transactions.Year,
               albums_sold.Album_ID,
               albums_sold.Album_Purchased
        FROM transactions
        JOIN albums_sold
          ON transactions.Transaction_ID = albums_sold.Transaction_ID;
        """
    )
except psycopg2.Error as e:
    print("Error: JOIN query")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()
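
If no PostgreSQL server is at hand, the same join can be tried against an in-memory SQLite database. This is a sketch only: `sqlite3` from the Python standard library stands in for PostgreSQL here, and the names `demo_conn` and `demo_cur` are assumptions chosen so they do not clash with the notebook's `conn` and `cur`. An `ORDER BY` is added because SQL does not guarantee row order without one.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for PostgreSQL.
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()

demo_cur.execute(
    "CREATE TABLE transactions "
    "(transaction_id INTEGER, customer_name TEXT, cashier_name TEXT, year INTEGER)"
)
demo_cur.execute(
    "CREATE TABLE albums_sold "
    "(album_id INTEGER, transaction_id INTEGER, album_purchased TEXT)"
)
demo_cur.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [(1, "Amanda", "Sam", 2000), (2, "Toby", "Sam", 2000), (3, "Max", "Bob", 2018)],
)
demo_cur.executemany(
    "INSERT INTO albums_sold VALUES (?, ?, ?)",
    [
        (1, 1, "Rubber Soul"),
        (2, 1, "Let It Be"),
        (3, 2, "My Generation"),
        (4, 3, "Meet the Beatles"),
        (5, 3, "Help!"),
    ],
)

# Each album row is matched to its transaction through the shared id.
demo_cur.execute(
    """
    SELECT t.transaction_id, t.customer_name, t.cashier_name, t.year,
           a.album_id, a.album_purchased
    FROM transactions AS t
    JOIN albums_sold AS a ON t.transaction_id = a.transaction_id
    ORDER BY a.album_id
    """
)
joined_rows = demo_cur.fetchall()
for row in joined_rows:
    print(row)

demo_conn.close()
```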

#### Moving to 3rd Normal Form (3NF)
Check the table for any transitive dependencies. 
_HINT:_ In _Transactions_, _Cashier Name_ can be moved into its own table, called _Employees_, which will leave us with 3 tables. 


### TO-DO: Create the third table named *employees* to move to 3rd NF. 


In [None]:
# transactions table
table_name = "transactions2"
column_names = ["Transaction_ID", "Customer_Name", "Employee_ID", "Year"]
column_types = ["int", "varchar", "int", "int"]
columns = dict(zip(column_names, column_types))
drop_pg_table(conn, table_name)
create_pg_table(conn, table_name, columns)
insert_pg_rows(
    conn,
    table_name,
    column_names,
    [
        (1, "Amanda", 1, 2000),
        (2, "Toby", 1, 2000),
        (3, "Max", 2, 2018),
    ],
)

# employee table
table_name = "employees"
column_names = [
    "Employee_ID",
    "Employee_Name",
]
column_types = ["int", "varchar"]
columns = dict(zip(column_names, column_types))
drop_pg_table(conn, table_name)
create_pg_table(conn, table_name, columns)
insert_pg_rows(
    conn,
    table_name,
    column_names,
    [
        (1, "Sam"),
        (2, "Bob"),
    ],
)

print("Table: transactions2\n")
try:
    cur.execute("SELECT * FROM transactions2;")
except psycopg2.Error as e:
    print("Error: select *")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()

print("\nTable: albums_sold\n")
try:
    cur.execute("SELECT * FROM albums_sold;")
except psycopg2.Error as e:
    print("Error: select *")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()

print("\nTable: employees\n")
try:
    cur.execute("SELECT * FROM employees;")
except psycopg2.Error as e:
    print("Error: select *")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()
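
Moving the cashier names into their own table can also be sketched in plain Python. The sketch assumes employee ids are handed out in order of first appearance, which matches the hand-written rows above; `extract_employees` is a hypothetical helper name.

```python
# Rows from the transactions table, still carrying the cashier's name.
transactions_rows = [
    (1, "Amanda", "Sam", 2000),
    (2, "Toby", "Sam", 2000),
    (3, "Max", "Bob", 2018),
]


def extract_employees(rows):
    """Replace each cashier name with an employee id and collect the names."""
    employee_ids = {}  # cashier name -> employee id
    transactions2 = []
    for tid, customer, cashier, year in rows:
        # A cashier seen for the first time gets the next free id.
        employee_id = employee_ids.setdefault(cashier, len(employee_ids) + 1)
        transactions2.append((tid, customer, employee_id, year))
    employees = [(eid, name) for name, eid in employee_ids.items()]
    return transactions2, employees


transactions2_rows, employees_rows = extract_employees(transactions_rows)
print(transactions2_rows)
print(employees_rows)
```

Each cashier name is now stored once, so renaming an employee touches a single row in `employees_rows`.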

### TO-DO: Complete the two `JOIN`s across these 3 tables so we can get all the information we had in our first table. 

In [None]:
try:
    cur.execute(
        "SELECT transactions2.Transaction_ID,\
            transactions2.Customer_Name,\
            employees.Employee_Name,\
            transactions2.Year,\
            albums_sold.Album_Purchased\
        FROM (transactions2 JOIN albums_sold ON transactions2.Transaction_ID = albums_sold.Transaction_ID) JOIN \
            employees ON transactions2.Employee_ID = employees.Employee_ID;"
    )
except psycopg2.Error as e:
    print("Error: select *")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()

### Your output for the above cell should be (row order may vary):

(1, 'Amanda', 'Sam', 2000, 'Rubber Soul')<br>
(1, 'Amanda', 'Sam', 2000, 'Let It Be')<br>
(2, 'Toby', 'Sam', 2000, 'My Generation')<br>
(3, 'Max', 'Bob', 2018, 'Meet the Beatles')<br>
(3, 'Max', 'Bob', 2018, 'Help!')<br>
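
The three-table join can be tried without a PostgreSQL server too. The sketch below uses an in-memory SQLite database from the Python standard library as a stand-in, and selects the same five columns as the query in the cell above. The names `demo3_conn`, `demo3_cur`, and `three_table_rows` are illustrative assumptions.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for PostgreSQL.
demo3_conn = sqlite3.connect(":memory:")
demo3_cur = demo3_conn.cursor()
demo3_cur.executescript(
    """
    CREATE TABLE transactions2
        (transaction_id INTEGER, customer_name TEXT, employee_id INTEGER, year INTEGER);
    CREATE TABLE albums_sold
        (album_id INTEGER, transaction_id INTEGER, album_purchased TEXT);
    CREATE TABLE employees (employee_id INTEGER, employee_name TEXT);
    """
)
demo3_cur.executemany(
    "INSERT INTO transactions2 VALUES (?, ?, ?, ?)",
    [(1, "Amanda", 1, 2000), (2, "Toby", 1, 2000), (3, "Max", 2, 2018)],
)
demo3_cur.executemany(
    "INSERT INTO albums_sold VALUES (?, ?, ?)",
    [
        (1, 1, "Rubber Soul"),
        (2, 1, "Let It Be"),
        (3, 2, "My Generation"),
        (4, 3, "Meet the Beatles"),
        (5, 3, "Help!"),
    ],
)
demo3_cur.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "Sam"), (2, "Bob")])

# First join attaches albums to transactions, second resolves the employee name.
demo3_cur.execute(
    """
    SELECT t.transaction_id, t.customer_name, e.employee_name, t.year,
           a.album_purchased
    FROM transactions2 AS t
    JOIN albums_sold AS a ON t.transaction_id = a.transaction_id
    JOIN employees AS e ON t.employee_id = e.employee_id
    ORDER BY a.album_id
    """
)
three_table_rows = demo3_cur.fetchall()
for row in three_table_rows:
    print(row)

demo3_conn.close()
```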


### Awesome work!! You have Normalized the dataset! 

### And finally close your cursor and connection. 

In [None]:
from src.database import close_pg_connection

# drop all tables
for table in [
    "music_store",
    "music_store2",
    "transactions",
    "albums_sold",
    "transactions2",
    "employees",
]:
    drop_pg_table(conn, table)

# close the cursor, then the connection
cur.close()
close_pg_connection(conn)