In this notebook I will walk through the basics of modeling data from normalized form to denormalized form. I'll create tables in PostgreSQL, insert rows of data, and run simple JOIN queries to show how multiple tables can work together.

#### Remember the examples here are simple, but imagine these situations at scale with large datasets, many users, and the need for quick response time.

In [1]:
import psycopg2

Create a connection to the database, get a cursor, and set autocommit to true:

In [2]:
try:
    conn = psycopg2.connect("dbname=udacity")
except psycopg2.Error as e:
    print("Error: Could not connect to database")
    print(e)

In [3]:
try:
    cur = conn.cursor()
except psycopg2.Error as e:
    print("Error: Could not get cursor to the database")
    print(e)
conn.set_session(autocommit=True)

Let's start with the normalized (3NF) set of tables we had in the last exercise, plus one new table, song_length:

Table Name: album_library<br>
column 0: Album Id<br>
column 1: Album Name<br>
column 2: Artist Id<br>
column 3: Year<br>
<br>
Table Name: song_library<br>
column 0: Song Id<br>
column 1: Album Id<br>
column 2: Song Name<br>
<br>
Table Name: artist_library<br>
column 0: Artist Id<br>
column 1: Artist Name<br>
<br>
Table Name: song_length<br>
column 0: Song Id<br>
column 1: Song length in seconds<br>
<br>

In [5]:
try:
    cur.execute("CREATE TABLE IF NOT EXISTS album_library (album_id int, album_name varchar, artist_id int, year int);")
except psycopg2.Error as e:
    print("Error: Issue creating table")
    print(e)
    
try:
    cur.execute("CREATE TABLE IF NOT EXISTS artist_library (artist_id int, artist_name varchar);")
except psycopg2.Error as e:
    print("Error: Issue creating table")
    print(e)
    
try:
    cur.execute("CREATE TABLE IF NOT EXISTS song_library (song_id int, album_id int, song_name varchar);")
except psycopg2.Error as e:
    print("Error: Issue creating table")
    print(e)
    
try:
    cur.execute("CREATE TABLE IF NOT EXISTS song_length (song_id int, song_length int);")
except psycopg2.Error as e:
    print("Error: Issue creating table")
    print(e)

In [6]:
try:
    cur.execute("INSERT INTO song_length (song_id, song_length) VALUES (%s, %s)", \
                (1, 163))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO song_length (song_id, song_length) VALUES (%s, %s)", \
                (2, 137))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO song_length (song_id, song_length) VALUES (%s, %s)", \
                (3, 145))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO song_length (song_id, song_length) VALUES (%s, %s)", \
                (4, 240))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO song_length (song_id, song_length) VALUES (%s, %s)", \
                (5, 227))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO song_library (song_id, album_id, song_name) VALUES (%s, %s, %s)", \
                (1, 1, "Michelle"))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO song_library (song_id, album_id, song_name) VALUES (%s, %s, %s)", \
                (2, 1, "Think For Yourself"))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO song_library (song_id, album_id, song_name) VALUES (%s, %s, %s)", \
                (3, 1, "In My Life"))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO song_library (song_id, album_id, song_name) VALUES (%s, %s, %s)", \
                (4, 2, "Let It Be"))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO song_library (song_id, album_id, song_name) VALUES (%s, %s, %s)", \
                (5, 2, "Across the Universe"))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    

try:
    cur.execute("INSERT INTO album_library (album_id, album_name, artist_id, year) VALUES (%s, %s, %s, %s)", \
                (1, "Rubber Soul", 1, 1965))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO album_library (album_id, album_name, artist_id, year) VALUES (%s, %s, %s, %s)", \
                (2, "Let It Be", 1, 1970))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO artist_library (artist_id, artist_name) VALUES (%s, %s)", (1, "The Beatles"))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)

In [7]:
print("Table: album_library\n")
try:
    cur.execute("SELECT * FROM album_library;")
except psycopg2.Error as e:
    print("Error: select *")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()
    
print("\nTable: song_library\n")
try:
    cur.execute("SELECT * FROM song_library;")
except psycopg2.Error as e:
    print("Error: select *")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()
    
print("\nTable: artist_library\n")
try:
    cur.execute("SELECT * FROM artist_library;")
except psycopg2.Error as e:
    print("Error: select *")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()
    
print("\nTable: song_length\n")
try:
    cur.execute("SELECT * FROM song_length;")
except psycopg2.Error as e:
    print("Error: select *")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()

Table: album_library

(1, 'Rubber Soul', 1, 1965)
(2, 'Let It Be', 1, 1970)

Table: song_library

(1, 1, 'Michelle')
(2, 1, 'Think For Yourself')
(3, 1, 'In My Life')
(4, 2, 'Let It Be')
(5, 2, 'Across the Universe')

Table: artist_library

(1, 'The Beatles')

Table: song_length

(1, 163)
(2, 137)
(3, 145)
(4, 240)
(5, 227)


To consolidate all of this data into one result, we need to JOIN all four tables, which takes three JOINs. JOINs can be slow, and for a read-heavy workload that needs low latency, we want to reduce the number of required JOINs. To do this, let's denormalize our normalized tables.

With denormalization, we want to think about the queries we are running and how we can reduce the number of JOINs, even if it means duplicating data.
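For reference, the JOIN-heavy query we are trying to avoid could be sketched as below. To keep the sketch runnable without a PostgreSQL server, it uses Python's built-in `sqlite3` with an in-memory database as a stand-in (an assumption for illustration, not part of this notebook's setup). The `demo_*` names are hypothetical, and the same `SELECT` should also work through the psycopg2 cursor `cur`.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL here.
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()

# Recreate the four normalized tables with the same rows as above.
demo_cur.executescript("""
CREATE TABLE album_library (album_id int, album_name varchar, artist_id int, year int);
CREATE TABLE artist_library (artist_id int, artist_name varchar);
CREATE TABLE song_library (song_id int, album_id int, song_name varchar);
CREATE TABLE song_length (song_id int, song_length int);
INSERT INTO album_library VALUES (1, 'Rubber Soul', 1, 1965), (2, 'Let It Be', 1, 1970);
INSERT INTO artist_library VALUES (1, 'The Beatles');
INSERT INTO song_library VALUES (1, 1, 'Michelle'), (2, 1, 'Think For Yourself'),
    (3, 1, 'In My Life'), (4, 2, 'Let It Be'), (5, 2, 'Across the Universe');
INSERT INTO song_length VALUES (1, 163), (2, 137), (3, 145), (4, 240), (5, 227);
""")

# Query 1 against the normalized tables: four tables, three JOINs.
normalized_query = """
SELECT artist_name, album_name, year, song_name, song_length.song_length
FROM song_library
JOIN album_library ON song_library.album_id = album_library.album_id
JOIN artist_library ON album_library.artist_id = artist_library.artist_id
JOIN song_length ON song_library.song_id = song_length.song_id
ORDER BY song_library.song_id
"""
normalized_rows = demo_cur.execute(normalized_query).fetchall()
for row in normalized_rows:
    print(row)

demo_conn.close()
```

Every row needs a lookup in all four tables, which is the cost denormalization tries to remove.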

Query 1: ```SELECT artist_name, album_name, year, song_name, song_length FROM <min number of tables>```<br><br>
I want a list of all my songs
<br><br>

Query 2: ```SELECT album_name, SUM(song_length) FROM <min number of tables> GROUP BY album_name```<br><br>
I want to know the length of each album in seconds
<br><br>

Query 1: ```SELECT artist_name, album_name, year, song_name, song_length FROM <min number of tables>```<br>

Reducing the number of tables for this query is straightforward: let's first add ```song_length``` to the ```song_library``` table and ```artist_name``` to ```album_library```.

<br>

Table Name: album_library_1<br>
column 0: Album Id<br>
column 1: Album Name<br>
column 2: Artist Name<br>
column 3: Year<br>
<br>
Table Name: song_library_1<br>
column 0: Song Id<br>
column 1: Album Id<br>
column 2: Song Name<br>
column 3: Song Length
<br>

In [8]:
# Create new tables
try:
    cur.execute("CREATE TABLE IF NOT EXISTS album_library_1 (album_id int, album_name varchar, artist_name varchar, year int);")
except psycopg2.Error as e:
    print("Error: Issue creating table")
    print(e)
    
try:
    cur.execute("CREATE TABLE IF NOT EXISTS song_library_1 (song_id int, album_id int, song_name varchar, song_length int);")
except psycopg2.Error as e:
    print("Error: Issue creating table")
    print(e)

In [10]:
# Insert into new tables:
try:
    cur.execute("INSERT INTO song_library_1 (song_id, album_id, song_name, song_length) VALUES (%s, %s, %s, %s)", \
                (1, 1, "Michelle", 163))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO song_library_1 (song_id, album_id, song_name, song_length) VALUES (%s, %s, %s, %s)", \
                (2, 1, "Think For Yourself", 137))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO song_library_1 (song_id, album_id, song_name, song_length) VALUES (%s, %s, %s, %s)", \
                (3, 1, "In My Life", 145))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO song_library_1 (song_id, album_id, song_name, song_length) VALUES (%s, %s, %s, %s)", \
                (4, 2, "Let It Be", 240))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO song_library_1 (song_id, album_id, song_name, song_length) VALUES (%s, %s, %s, %s)", \
                (5, 2, "Across the Universe", 227))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)

In [12]:
try:
    cur.execute("INSERT INTO album_library_1 (album_id, album_name, artist_name, year) VALUES (%s, %s, %s, %s)", \
                (1, "Rubber Soul", "The Beatles", 1965))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO album_library_1 (album_id, album_name, artist_name, year) VALUES (%s, %s, %s, %s)", \
                (2, "Let It Be", "The Beatles", 1970))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
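The rows above were typed in by hand to keep the example explicit. As a sketch of an alternative, the denormalized tables could also be filled directly from the normalized ones with `INSERT ... SELECT`. This version again uses an in-memory `sqlite3` database as a stand-in for PostgreSQL (an assumption so the cell runs on its own), and the `demo_*` names and table aliases are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL here.
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()

demo_cur.executescript("""
-- Normalized source tables, same rows as earlier in the notebook.
CREATE TABLE album_library (album_id int, album_name varchar, artist_id int, year int);
CREATE TABLE artist_library (artist_id int, artist_name varchar);
CREATE TABLE song_library (song_id int, album_id int, song_name varchar);
CREATE TABLE song_length (song_id int, song_length int);
INSERT INTO album_library VALUES (1, 'Rubber Soul', 1, 1965), (2, 'Let It Be', 1, 1970);
INSERT INTO artist_library VALUES (1, 'The Beatles');
INSERT INTO song_library VALUES (1, 1, 'Michelle'), (2, 1, 'Think For Yourself'),
    (3, 1, 'In My Life'), (4, 2, 'Let It Be'), (5, 2, 'Across the Universe');
INSERT INTO song_length VALUES (1, 163), (2, 137), (3, 145), (4, 240), (5, 227);

-- Denormalized target tables.
CREATE TABLE album_library_1 (album_id int, album_name varchar, artist_name varchar, year int);
CREATE TABLE song_library_1 (song_id int, album_id int, song_name varchar, song_length int);

-- Copy the artist name into each album row.
INSERT INTO album_library_1 (album_id, album_name, artist_name, year)
SELECT a.album_id, a.album_name, ar.artist_name, a.year
FROM album_library a
JOIN artist_library ar ON a.artist_id = ar.artist_id;

-- Copy the song length into each song row.
INSERT INTO song_library_1 (song_id, album_id, song_name, song_length)
SELECT s.song_id, s.album_id, s.song_name, l.song_length
FROM song_library s
JOIN song_length l ON s.song_id = l.song_id;
""")

albums_1 = demo_cur.execute("SELECT * FROM album_library_1 ORDER BY album_id").fetchall()
songs_1 = demo_cur.execute("SELECT * FROM song_library_1 ORDER BY song_id").fetchall()
print(albums_1)
print(songs_1)

demo_conn.close()
```

The JOIN cost is paid once at load time, so the read queries that follow no longer need it.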

### Cool, so now we can do a simplified query to get the information we need. Only one ```JOIN``` is needed.

Query 1: ```SELECT artist_name, album_name, year, song_name, song_length FROM <min number of tables>```<br><br>
I want a list of all my songs
<br><br>

Table Name: album_library_1<br>
column 0: Album Id<br>
column 1: Album Name<br>
column 2: Artist Name<br>
column 3: Year<br>
<br>
Table Name: song_library_1<br>
column 0: Song Id<br>
column 1: Album Id<br>
column 2: Song Name<br>
column 3: Song Length
<br>

In [15]:
try:
    cur.execute("SELECT artist_name, album_name, year, song_name, song_length FROM album_library_1 JOIN song_library_1 ON album_library_1.album_id = song_library_1.album_id ")
except psycopg2.Error as e:
    print("Error: Issue running JOIN query")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()

('The Beatles', 'Rubber Soul', 1965, 'Michelle', 163)
('The Beatles', 'Rubber Soul', 1965, 'Think For Yourself', 137)
('The Beatles', 'Rubber Soul', 1965, 'In My Life', 145)
('The Beatles', 'Let It Be', 1970, 'Let It Be', 240)
('The Beatles', 'Let It Be', 1970, 'Across the Universe', 227)


Query 2: ```SELECT album_name, SUM(song_length) FROM <min number of tables> GROUP BY album_name```<br>
We could also do a ```JOIN``` on the tables we have created, but what if we don't want any ```JOIN``` at all? Why not create a new table with just the information we need:
<br>

Table Name: album_length<br>
column 0: Song Id<br>
column 1: Album Name<br>
column 2: Song Length<br>
<br>

In [16]:
try:
    cur.execute("CREATE TABLE IF NOT EXISTS album_length (song_id int, album_name varchar, song_length int);")
except psycopg2.Error as e:
    print("Error: Issue creating table")
    print(e)

In [17]:
# Insert into the new table:

try:
    cur.execute("INSERT INTO album_length (song_id, album_name, song_length) VALUES (%s, %s, %s)", \
                (1, "Rubber Soul", 163))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO album_length (song_id, album_name, song_length) VALUES (%s, %s, %s)", \
                (2, "Rubber Soul", 137))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO album_length (song_id, album_name, song_length) VALUES (%s, %s, %s)", \
                (3, "Rubber Soul", 145))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO album_length (song_id, album_name, song_length) VALUES (%s, %s, %s)", \
                (4, "Let It Be", 240))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)
    
try:
    cur.execute("INSERT INTO album_length (song_id, album_name, song_length) VALUES (%s, %s, %s)", \
                (5, "Let It Be", 227))
except psycopg2.Error as e:
    print("Error: Inserting rows")
    print(e)

#### Now let's run our query:

Query 2: ```SELECT album_name, SUM(song_length) FROM <min number of tables> GROUP BY album_name```<br><br>
I want to know the length of each album in seconds
<br><br>

In [19]:
try:
    cur.execute("SELECT album_name, SUM(song_length) FROM album_length GROUP BY album_name")
except psycopg2.Error as e:
    print("Error: Issue running aggregate query")
    print(e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()

('Rubber Soul', 445)
('Let It Be', 467)
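For comparison, the JOIN-based alternative mentioned earlier could be sketched as below and checked against the ```album_length``` result. As a stand-in for PostgreSQL, this sketch uses an in-memory `sqlite3` database (an assumption so that it runs on its own); the `demo_*` names are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL here.
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()

demo_cur.executescript("""
CREATE TABLE album_library_1 (album_id int, album_name varchar, artist_name varchar, year int);
CREATE TABLE song_library_1 (song_id int, album_id int, song_name varchar, song_length int);
CREATE TABLE album_length (song_id int, album_name varchar, song_length int);
INSERT INTO album_library_1 VALUES (1, 'Rubber Soul', 'The Beatles', 1965),
    (2, 'Let It Be', 'The Beatles', 1970);
INSERT INTO song_library_1 VALUES (1, 1, 'Michelle', 163), (2, 1, 'Think For Yourself', 137),
    (3, 1, 'In My Life', 145), (4, 2, 'Let It Be', 240), (5, 2, 'Across the Universe', 227);
INSERT INTO album_length VALUES (1, 'Rubber Soul', 163), (2, 'Rubber Soul', 137),
    (3, 'Rubber Soul', 145), (4, 'Let It Be', 240), (5, 'Let It Be', 227);
""")

# Query 2 with one JOIN on the two denormalized tables.
joined_totals = dict(demo_cur.execute("""
SELECT album_name, SUM(song_length)
FROM album_library_1
JOIN song_library_1 ON album_library_1.album_id = song_library_1.album_id
GROUP BY album_name
""").fetchall())

# Query 2 with no JOIN, reading only album_length.
denormalized_totals = dict(demo_cur.execute(
    "SELECT album_name, SUM(song_length) FROM album_length GROUP BY album_name"
).fetchall())

print(joined_totals)
print(denormalized_totals)
print("Totals match:", joined_totals == denormalized_totals)

demo_conn.close()
```

Both versions should agree as long as the duplicated data is kept in sync, which is the price of denormalizing.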


### We have successfully taken normalized tables and denormalized them in order to speed up our performance and allow for simpler queries to be executed.

In [20]:
# Close cursor and connection:
cur.close()
conn.close()