# Creating Denormalized Tables

### Walk through the basics of modeling data from normalized form to denormalized form. We will create tables in PostgreSQL, insert rows of data, and run simple JOIN queries to show how these multiple tables can work together.


**Remember the examples shown are simple, but imagine these situations at scale with large datasets, many users, and the need for quick response time.**

In [2]:
# import library
import psycopg2 as pg

In [3]:
# connection to database
try:
    con = pg.connect("dbname=udacity user=postgres host=127.0.0.1 password=admin")
except pg.Error as e:
    print("Error: Could not make connection to database")
    print(e)

In [6]:
# get a cursor

try:
    cur = con.cursor()
except pg.Error as e:
    print("Error: Could not get cursor")
    print(e)
con.set_session(autocommit=True)

**In this exercise, we use the same tables as in the previous exercise and add a new table, sales. Note that this database is normalized (3NF).**

Table Name: transaction2

    column 0: Transaction Id
    column 1: Customer Name
    column 2: Cashier Id
    column 3: Year

Table Name: album_sold

    column 0: Album Id
    column 1: Transaction Id
    column 2: Album Name

Table Name: employee

    column 0: Employee Id
    column 1: Employee Name

Table Name: sales

    column 0: Transaction Id
    column 1: Amount Spent


In [7]:
# add sales table to database
try: 
    cur.execute("CREATE TABLE IF NOT EXISTS sales (transaction_id int, amount_spent int);")
except pg.Error as e: 
    print("Error: Issue creating table")
    print (e)

In [8]:
# insert data into sales table

try: 
    cur.execute("INSERT INTO sales (transaction_id, amount_spent) \
                 VALUES (%s, %s)", \
                 (1, 45))
except pg.Error as e: 
    print("Error: Inserting Rows")
    print (e)

try: 
    cur.execute("INSERT INTO sales (transaction_id, amount_spent) \
                 VALUES (%s, %s)", \
                 (2, 30))
except pg.Error as e: 
    print("Error: Inserting Rows")
    print (e)
    
try: 
    cur.execute("INSERT INTO sales (transaction_id, amount_spent) \
                 VALUES (%s, %s)", \
                 (3, 60))
except pg.Error as e: 
    print("Error: Inserting Rows")
    print (e)
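
The three inserts above differ only in their values, so they could be sent in one `executemany` call. The sketch below is a hedged illustration rather than part of the exercise: it assumes Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server, and the names `demo_con`, `demo_cur`, and `sales_rows` are hypothetical. With psycopg2 the placeholders would be `%s` instead of `?`.

```python
import sqlite3  # assumption: stand-in for psycopg2 so the sketch runs without a server

# In-memory database; with PostgreSQL this would be the connection opened earlier.
demo_con = sqlite3.connect(":memory:")
demo_cur = demo_con.cursor()
demo_cur.execute(
    "CREATE TABLE IF NOT EXISTS sales (transaction_id int, amount_spent int);"
)

# The same three rows as above, kept together in one list.
sales_rows = [(1, 45), (2, 30), (3, 60)]

# sqlite3 uses "?" placeholders; psycopg2 would use "%s" instead.
demo_cur.executemany(
    "INSERT INTO sales (transaction_id, amount_spent) VALUES (?, ?)",
    sales_rows,
)
demo_con.commit()

# Read the rows back to confirm what was stored.
inserted = demo_cur.execute(
    "SELECT transaction_id, amount_spent FROM sales ORDER BY transaction_id"
).fetchall()
print(inserted)
demo_con.close()
```

Keeping the rows in a list also makes it harder to fix a typo in one statement and miss it in the others.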

In [11]:
# print results
try:
    cur.execute("select * from transaction2;")
except pg.Error as e:
    print("Error: select *")
    print(e)

print("transaction table:")
row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()
    
try:
    cur.execute("select * from album_sold;")
except pg.Error as e:
    print("Error: select *")
    print(e)

print("\nalbum table:")
row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()

try:
    cur.execute("select * from employee;")
except pg.Error as e:
    print("Error: select *")
    print(e)

print("\nemployee table:")
row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()

try:
    cur.execute("select * from sales;")
except pg.Error as e:
    print("Error: select *")
    print(e)

print("\nsales table:")
row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()
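
The cell above repeats the same execute, print, and fetch loop four times. One possible way to factor it out is sketched below. It assumes `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere, and the helper names `fetch_all_rows` and `print_table` are hypothetical. With psycopg2 the `except` clause would catch `pg.Error` instead.

```python
import sqlite3  # assumption: stand-in for psycopg2 so the sketch runs without a server


def fetch_all_rows(cur, query):
    """Run a query and return all rows, or an empty list on a database error."""
    try:
        cur.execute(query)
    except sqlite3.Error as e:
        print("Error: could not run query")
        print(e)
        return []
    return cur.fetchall()


def print_table(cur, label, query):
    """Print a label followed by every row the query returns."""
    print(f"\n{label}:")
    rows = fetch_all_rows(cur, query)
    for row in rows:
        print(row)
    return rows


# Small demo using the employee rows shown in this notebook.
demo_con = sqlite3.connect(":memory:")
demo_cur = demo_con.cursor()
demo_cur.execute("CREATE TABLE employee (employee_id int, employee_name varchar)")
demo_cur.executemany(
    "INSERT INTO employee VALUES (?, ?)", [(1, "Ali"), (2, "Ahmed")]
)

employee_rows = print_table(
    demo_cur, "employee table", "SELECT * FROM employee ORDER BY employee_id;"
)
# A failing query is reported and yields no rows instead of stopping the cell.
missing_rows = print_table(demo_cur, "missing table", "SELECT * FROM no_such_table;")
demo_con.close()
```

Returning an empty list on error also avoids calling `fetchone()` on a cursor whose query failed.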

transaction table:
(1, 'ibrahim', 1, 2000)
(2, 'ibrahim', 1, 2000)
(3, 'Jamal', 2, 2000)
(4, 'Jamal', 2, 2000)
(5, 'Ameen', 1, 2000)
(6, 'Ameen', 1, 2000)

album table:
(1, 1, 'Hello')
(2, 1, 'world beauty')
(3, 2, 'yes')
(4, 2, 'test')
(5, 3, 'team')
(6, 3, 'FCB')

employee table:
(1, 'Ali')
(2, 'Ahmed')

sales table:
(1, 45)
(2, 30)
(3, 60)


**Let's say we need to do a query that gives us:**

     transaction_id
     customer_name
     cashier_name
     year
     album_name
     amount_spent

We will need to perform a three-way JOIN across the four tables we have created.

In [17]:
try:
    cur.execute("select t.transaction_id, t.customer_name, emp.employee_name, t.year,\
    ab.album_name, sale.amount_spent from transaction2 t \
    join employee emp on t.cashier_id = emp.employee_id \
    join album_sold ab on t.transaction_id = ab.transaction_id \
    join sales sale on t.transaction_id = sale.transaction_id")
except pg.Error as e:
    print("Error: Select *")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()

(1, 'ibrahim', 'Ali', 2000, 'Hello', 45)
(1, 'ibrahim', 'Ali', 2000, 'world beauty', 45)
(2, 'ibrahim', 'Ali', 2000, 'yes', 30)
(2, 'ibrahim', 'Ali', 2000, 'test', 30)
(3, 'Jamal', 'Ahmed', 2000, 'team', 60)
(3, 'Jamal', 'Ahmed', 2000, 'FCB', 60)
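
To experiment with this JOIN without a PostgreSQL server, the sketch below rebuilds the four tables from the printed rows and runs the same query. It assumes `sqlite3` as a stand-in, so the placeholders are `?` rather than `%s`. The `ORDER BY` clause and the names `demo_con`, `demo_cur`, `join_query`, and `joined_rows` are additions for this illustration only.

```python
import sqlite3  # assumption: stand-in for psycopg2 so the sketch runs without a server

demo_con = sqlite3.connect(":memory:")
demo_cur = demo_con.cursor()

# Recreate the four normalized tables.
demo_cur.executescript("""
CREATE TABLE transaction2 (transaction_id int, customer_name varchar, cashier_id int, year int);
CREATE TABLE album_sold (album_id int, transaction_id int, album_name varchar);
CREATE TABLE employee (employee_id int, employee_name varchar);
CREATE TABLE sales (transaction_id int, amount_spent int);
""")

# Rows copied from the printed output above.
demo_cur.executemany("INSERT INTO transaction2 VALUES (?, ?, ?, ?)", [
    (1, "ibrahim", 1, 2000), (2, "ibrahim", 1, 2000), (3, "Jamal", 2, 2000),
    (4, "Jamal", 2, 2000), (5, "Ameen", 1, 2000), (6, "Ameen", 1, 2000),
])
demo_cur.executemany("INSERT INTO album_sold VALUES (?, ?, ?)", [
    (1, 1, "Hello"), (2, 1, "world beauty"), (3, 2, "yes"),
    (4, 2, "test"), (5, 3, "team"), (6, 3, "FCB"),
])
demo_cur.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ali"), (2, "Ahmed")])
demo_cur.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 45), (2, 30), (3, 60)])

# A triple-quoted string avoids the backslash line continuations.
join_query = """
    SELECT t.transaction_id, t.customer_name, emp.employee_name, t.year,
           ab.album_name, sale.amount_spent
    FROM transaction2 t
    JOIN employee emp ON t.cashier_id = emp.employee_id
    JOIN album_sold ab ON t.transaction_id = ab.transaction_id
    JOIN sales sale ON t.transaction_id = sale.transaction_id
    ORDER BY t.transaction_id, ab.album_id
"""
joined_rows = demo_cur.execute(join_query).fetchall()
for row in joined_rows:
    print(row)
demo_con.close()
```

Transactions 4 to 6 do not appear in the result because they have no matching rows in `album_sold` or `sales`, and an inner JOIN keeps only rows that match in every table.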


**Great, we were able to get the data we wanted.**

But we had to perform a three-way JOIN to get there. While it's great that we had that flexibility, we need to remember that JOINs can be slow on large tables, and if we have a read-heavy workload that requires low-latency queries, we want to reduce the number of JOINs. Let's think about denormalizing our normalized tables.

With denormalization, we want to think about the queries we are running and how we can reduce the number of JOINs, even if that means duplicating data.

Query 1: select transaction_id, customer_name, amount_spent FROM <min number of tables>
This should give the amount spent on each transaction.

Query 2: select cashier_name, SUM(amount_spent) FROM <min number of tables> GROUP BY cashier_name
This should give the total sales by cashier.
    
Query 1: select transaction_id, customer_name, amount_spent FROM <min number of tables>
There are two ways to do this. We could JOIN the sales and transaction2 tables, but we want to minimize the use of JOINs.

Let's add amount_spent to the transactions table so that we will not need to do a JOIN at all.
    
Table Name: transactions 
    
    column 0: transaction Id
    column 1: Customer Name
    column 2: Cashier Id
    column 3: Year
    column 4: amount_spent


In [18]:
# create transaction table.

try: 
    cur.execute("CREATE TABLE IF NOT EXISTS transactions (transaction_id int, \
                                                           customer_name varchar, cashier_id int, \
                                                           year int, amount_spent int);")
except pg.Error as e: 
    print("Error: Issue creating table")
    print (e)


In [19]:
# insert data into transactions table.

try: 
    cur.execute("INSERT INTO transactions (transaction_id, customer_name, cashier_id, year, amount_spent) \
                 VALUES (%s, %s, %s, %s, %s)", \
                 (1, "ibrahim", 1, 2000, 40))
except pg.Error as e: 
    print("Error: Inserting Rows")
    print (e)

try: 
    cur.execute("INSERT INTO transactions (transaction_id, customer_name, cashier_id, year, amount_spent) \
                 VALUES (%s, %s, %s, %s, %s)", \
                 (2, "Jamal", 2, 2000, 30))
except pg.Error as e: 
    print("Error: Inserting Rows")
    print (e)

try: 
    cur.execute("INSERT INTO transactions (transaction_id, customer_name, cashier_id, year, amount_spent) \
                 VALUES (%s, %s, %s, %s, %s)", \
                 (3, "Ali", 1, 2000, 10))
except pg.Error as e: 
    print("Error: Inserting Rows")
    print (e)
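
Instead of typing the denormalized rows by hand, the table could be filled from the normalized tables with a single `INSERT ... SELECT`. The sketch below is one possible approach, not the method used in this exercise. It assumes `sqlite3` as a stand-in for PostgreSQL, and the names `demo_con`, `demo_cur`, and `denormalized` are hypothetical. Because the values are derived from transaction2 and sales, they differ from the hand-entered rows above.

```python
import sqlite3  # assumption: stand-in for psycopg2 so the sketch runs without a server

demo_con = sqlite3.connect(":memory:")
demo_cur = demo_con.cursor()

# Two normalized source tables plus the denormalized target table.
demo_cur.executescript("""
CREATE TABLE transaction2 (transaction_id int, customer_name varchar, cashier_id int, year int);
CREATE TABLE sales (transaction_id int, amount_spent int);
CREATE TABLE transactions (transaction_id int, customer_name varchar, cashier_id int,
                           year int, amount_spent int);
""")
demo_cur.executemany("INSERT INTO transaction2 VALUES (?, ?, ?, ?)", [
    (1, "ibrahim", 1, 2000), (2, "ibrahim", 1, 2000), (3, "Jamal", 2, 2000),
])
demo_cur.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 45), (2, 30), (3, 60)])

# Pay for the JOIN once, at write time, so later reads need no JOIN.
demo_cur.execute("""
    INSERT INTO transactions (transaction_id, customer_name, cashier_id, year, amount_spent)
    SELECT t.transaction_id, t.customer_name, t.cashier_id, t.year, s.amount_spent
    FROM transaction2 t
    JOIN sales s ON t.transaction_id = s.transaction_id
""")
demo_con.commit()

# Query 1 now reads from a single table.
denormalized = demo_cur.execute(
    "SELECT transaction_id, customer_name, amount_spent "
    "FROM transactions ORDER BY transaction_id"
).fetchall()
print(denormalized)
demo_con.close()
```

Deriving the copy from the source tables keeps the duplicated `amount_spent` values in step with `sales` at the moment the copy is made.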

In [21]:
# now do a simplified query to get the information we need. No JOIN is needed.

try:
    cur.execute('select transaction_id, customer_name, amount_spent FROM transactions;')
except pg.Error as e:
    print('Error: select *')
    print(e)
    
row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()

(1, 'ibrahim', 40)
(2, 'Jamal', 30)
(3, 'Ali', 10)


In [22]:
# Query 2: select cashier_name, SUM(amount_spent) FROM <min number of tables> GROUP BY cashier_name
try: 
    cur.execute("CREATE TABLE IF NOT EXISTS cashier_sales (transaction_id int, cashier_name varchar, \
                                                           cashier_id int, amount_spent int);")
except pg.Error as e: 
    print("Error: Issue creating table")
    print (e)

try: 
    cur.execute("INSERT INTO cashier_sales (transaction_id, cashier_name, cashier_id, amount_spent) \
                 VALUES (%s, %s, %s, %s)", \
                 (1, "Ali", 1, 40 ))
except pg.Error as e: 
    print("Error: Inserting Rows")
    print (e)

try: 
    cur.execute("INSERT INTO cashier_sales (transaction_id, cashier_name, cashier_id, amount_spent) \
                 VALUES (%s, %s, %s, %s)", \
                 (2, "Ahmed", 2, 40 ))
except pg.Error as e: 
    print("Error: Inserting Rows")
    print (e)
try: 
    cur.execute("INSERT INTO cashier_sales (transaction_id, cashier_name, cashier_id, amount_spent) \
                 VALUES (%s, %s, %s, %s)", \
                 (3, "Ahmed", 2, 10 ))
except pg.Error as e: 
    print("Error: Inserting Rows")
    print (e)

try: 
    cur.execute("INSERT INTO cashier_sales (transaction_id, cashier_name, cashier_id, amount_spent) \
                 VALUES (%s, %s, %s, %s)", \
                 (4, "Ali", 1, 70 ))
except pg.Error as e: 
    print("Error: Inserting Rows")
    print (e)

In [26]:
# implement query2:
try: 
    cur.execute("select cashier_name,sum(amount_spent) from cashier_sales group by cashier_name;")
except pg.Error as e:
    print("Error: select *")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()

('Ali', 110)
('Ahmed', 50)
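
Because denormalization duplicates data, it can be useful to check that the denormalized table still agrees with the normalized source. The sketch below shows one hedged way to do that for Query 2: it builds `cashier_sales` from the normalized tables and compares the per-cashier totals from both sides. It assumes `sqlite3` as a stand-in for PostgreSQL, and the names `demo_con`, `demo_cur`, `join_totals`, and `denorm_totals` are hypothetical. The totals come from the normalized rows, so they differ from the hand-entered `cashier_sales` rows above.

```python
import sqlite3  # assumption: stand-in for psycopg2 so the sketch runs without a server

demo_con = sqlite3.connect(":memory:")
demo_cur = demo_con.cursor()

demo_cur.executescript("""
CREATE TABLE transaction2 (transaction_id int, customer_name varchar, cashier_id int, year int);
CREATE TABLE employee (employee_id int, employee_name varchar);
CREATE TABLE sales (transaction_id int, amount_spent int);
CREATE TABLE cashier_sales (transaction_id int, cashier_name varchar,
                            cashier_id int, amount_spent int);
""")
demo_cur.executemany("INSERT INTO transaction2 VALUES (?, ?, ?, ?)", [
    (1, "ibrahim", 1, 2000), (2, "ibrahim", 1, 2000), (3, "Jamal", 2, 2000),
])
demo_cur.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ali"), (2, "Ahmed")])
demo_cur.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 45), (2, 30), (3, 60)])

# Build the denormalized table from the normalized ones.
demo_cur.execute("""
    INSERT INTO cashier_sales (transaction_id, cashier_name, cashier_id, amount_spent)
    SELECT t.transaction_id, emp.employee_name, t.cashier_id, s.amount_spent
    FROM transaction2 t
    JOIN employee emp ON t.cashier_id = emp.employee_id
    JOIN sales s ON t.transaction_id = s.transaction_id
""")

# Totals per cashier computed with JOINs on the normalized tables.
join_totals = demo_cur.execute("""
    SELECT emp.employee_name, SUM(s.amount_spent)
    FROM transaction2 t
    JOIN employee emp ON t.cashier_id = emp.employee_id
    JOIN sales s ON t.transaction_id = s.transaction_id
    GROUP BY emp.employee_name
    ORDER BY emp.employee_name
""").fetchall()

# The same totals from the denormalized table, with no JOIN.
denorm_totals = demo_cur.execute("""
    SELECT cashier_name, SUM(amount_spent)
    FROM cashier_sales
    GROUP BY cashier_name
    ORDER BY cashier_name
""").fetchall()

print(join_totals)
print(denorm_totals)
print("consistent:", join_totals == denorm_totals)
demo_con.close()
```

A check like this could run after each load, since nothing in the database itself forces the duplicated columns to stay equal.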


In [27]:
# close cursor and connection
cur.close()
con.close()
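
If an earlier cell raises, the explicit `close()` calls above are never reached. A minimal sketch of one alternative uses `contextlib.closing`, which calls `close()` when the block exits, even after an exception. It assumes `sqlite3` as a stand-in for PostgreSQL so that it runs without a server, and the names `demo_con`, `demo_cur`, and `value` are hypothetical.

```python
import sqlite3  # assumption: stand-in for psycopg2 so the sketch runs without a server
from contextlib import closing

# closing() calls .close() on exit, whether or not the body raised.
with closing(sqlite3.connect(":memory:")) as demo_con:
    with closing(demo_con.cursor()) as demo_cur:
        demo_cur.execute("SELECT 1")
        value = demo_cur.fetchone()[0]

print(value)
```

Note that using a psycopg2 connection directly in a `with` statement ends the transaction but does not close the connection, which is why `closing` is used here.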