# Demo : Database Normalization

In this demo, we will create a normalized database for the hero/villain teams from the previous demo.  
  
We will learn how to:
  1. Create tables and relationships in PostgreSQL
  2. Insert rows
  3. Run simple SQL `JOIN` queries to show how these normalized tables work together
  
This is the normalized database diagram we will use.

<img src="img/postgresql-normalization-diagram.png" alt="postgresql-normalization-diagram" width="600" align="left"/>

### NOTE: Before you start, make sure you can connect to a PostgreSQL instance. See the [previous demo](postgresql-connect.ipynb) for an example.

Import the `psycopg2` library.

In [None]:
import psycopg2

`psycopg2.extras` is also needed later, for `execute_batch`.

In [None]:
import psycopg2.extras

Open a connection to the PostgreSQL server (adjust the connection string to your environment), then get a cursor to execute SQL queries later.

In [None]:
try: 
    conn = psycopg2.connect("host=127.0.0.1 dbname=postgres user=postgres password=postgres")
    conn.set_session(autocommit=True)
    
    # Open cursor
    cur = conn.cursor()
except Exception as e: 
    print("Error: cannot open cursor for SQL interaction")
    print(e)
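
The connection string above hardcodes the demo credentials. As a minimal sketch, it could be assembled from environment variables instead. The helper name `build_dsn` is hypothetical, and the variable names `PGHOST`, `PGDATABASE`, `PGUSER`, and `PGPASSWORD` are assumed here because they are the ones libpq itself recognizes. The fallbacks are the values used in this demo.

```python
import os


def build_dsn(env=None):
    """Build a libpq-style connection string (hypothetical helper).

    Falls back to the demo defaults when a variable is not set.
    """
    env = os.environ if env is None else env
    parts = {
        "host": env.get("PGHOST", "127.0.0.1"),
        "dbname": env.get("PGDATABASE", "postgres"),
        "user": env.get("PGUSER", "postgres"),
        "password": env.get("PGPASSWORD", "postgres"),
    }
    return " ".join("{}={}".format(key, value) for key, value in parts.items())


# With no overrides, this reproduces the connection string used above
print(build_dsn({}))
# → host=127.0.0.1 dbname=postgres user=postgres password=postgres
```

The returned string can be passed to `psycopg2.connect()`. Values that contain spaces would need quoting, which this sketch does not handle.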

Drop the tables in case they already exist.

In [None]:
try: 
    cur.execute("DROP TABLE IF EXISTS homelands CASCADE")
    cur.execute("DROP TABLE IF EXISTS superpowers CASCADE")
    cur.execute("DROP TABLE IF EXISTS member_superpowers CASCADE")
    cur.execute("DROP TABLE IF EXISTS members CASCADE")
    cur.execute("DROP TABLE IF EXISTS headquarters CASCADE")
    cur.execute("DROP TABLE IF EXISTS teams CASCADE")
    print("Success: tables dropped")
except Exception as e: 
    print("Error: cannot drop tables")
    print(e)

Create the `teams` table.  
For simplicity, all columns are nullable (except the primary key) and have no other constraints.

In [None]:
try:
    cur.execute("""
        CREATE TABLE teams (
            team_id SERIAL PRIMARY KEY,
            team_name VARCHAR,
            is_hero BOOLEAN,
            origin VARCHAR);
    """)
    print("Success : table created")
except Exception as e:
    print("Error: cannot create table")
    print(e)

Create the `headquarters` table.  
For simplicity, all columns are nullable and unconstrained, except the primary key and the `team_id` foreign key, which is `NOT NULL`.

In [None]:
try:
    cur.execute("""
        CREATE TABLE headquarters (
            team_id INTEGER NOT NULL REFERENCES teams,
            headquarter_id SERIAL PRIMARY KEY,
            name VARCHAR,
            location VARCHAR);
    """)
    print("Success : table created")
except Exception as e:
    print("Error: cannot create table")
    print(e)

Create the `members` table.  
For simplicity, all columns are nullable and unconstrained, except the primary key and the `team_id` foreign key, which is `NOT NULL`.

In [None]:
try:
    cur.execute("""
        CREATE TABLE members (
            team_id INTEGER NOT NULL REFERENCES teams,
            member_id SERIAL PRIMARY KEY,
            name VARCHAR,
            real_name VARCHAR,
            alias VARCHAR);
    """)
    print("Success : table created")
except Exception as e:
    print("Error: cannot create table")
    print(e)

Create the `superpowers` table.  
For simplicity, all columns are nullable (except the primary key) and have no other constraints.

In [None]:
try:
    cur.execute("""
        CREATE TABLE superpowers (
            superpower_id SERIAL PRIMARY KEY,
            name VARCHAR,
            description VARCHAR);
    """)
    print("Success : table created")
except Exception as e:
    print("Error: cannot create table")
    print(e)

Create the `member_superpowers` table. This is the many-to-many join table between `members` and `superpowers`.

In [None]:
try:
    cur.execute("""
        CREATE TABLE member_superpowers (
            member_id INTEGER NOT NULL REFERENCES members,
            superpower_id INTEGER NOT NULL REFERENCES superpowers);
    """)
    print("Success : table created")
except Exception as e:
    print("Error: cannot create table")
    print(e)
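
As created above, the join table accepts the same (member, superpower) pair more than once. One common way to prevent that is a composite primary key over both columns. The sketch below is an illustration only: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in so it runs without a server, and it omits the foreign key references because the other tables are not created here. The `PRIMARY KEY (member_id, superpower_id)` clause itself is standard SQL and is also accepted by PostgreSQL.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for this illustration
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("""
    CREATE TABLE member_superpowers (
        member_id INTEGER NOT NULL,
        superpower_id INTEGER NOT NULL,
        PRIMARY KEY (member_id, superpower_id))
""")

demo_conn.execute("INSERT INTO member_superpowers VALUES (1, 1)")
try:
    # Same pair again: the composite primary key rejects it
    demo_conn.execute("INSERT INTO member_superpowers VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = demo_conn.execute("SELECT COUNT(*) FROM member_superpowers").fetchone()[0]
print(duplicate_rejected, row_count)  # → True 1
```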

Create the `homelands` table.  
For simplicity, all columns are nullable (except the primary key) and have no other constraints.

In [None]:
try:
    cur.execute("""
        CREATE TABLE homelands (
            homeland_id SERIAL PRIMARY KEY,
            name VARCHAR,
            is_exists BOOLEAN);
    """)
    print("Success : table created")
except Exception as e:
    print("Error: cannot create table")
    print(e)

Add a relationship between `members` and `homelands`, using `homeland_id` as the foreign key.

In [None]:
try:
    cur.execute("""
        ALTER TABLE members 
            ADD COLUMN homeland_id INTEGER REFERENCES homelands
    """)
    print("Success : table relationship created")
except Exception as e:
    print("Error: cannot create table relationship")
    print(e)

Insert a few sample teams.

In [None]:
try:
    cur.execute("INSERT INTO teams(team_name, is_hero, origin) VALUES (%s, %s, %s)",
            ("Avengers", True, "Marvel"))
    cur.execute("INSERT INTO teams(team_name, is_hero, origin) VALUES (%s, %s, %s)",
            ("Justice League", True, "DC"))
    print("Success : data inserted")
except Exception as e:
    print("Error: cannot insert data")
    print(e)

Insert _Avengers_ headquarters.

In [None]:
try:
    cur.execute("SELECT team_id FROM teams WHERE LOWER(team_name) = LOWER(%s)", ("Avengers",))
    team_id = cur.fetchone()[0]

    sql = "INSERT INTO headquarters(name, location, team_id) VALUES(%s, %s, %s)"

    cur.execute(sql, ("Avengers Tower", "New York", team_id))
    cur.execute(sql, ("New Avengers Facility", "New York", team_id))

    print("Success : data inserted")
except Exception as e:
    print("Error: cannot insert data")
    print(e)

Insert _Justice League_ headquarters.

In [None]:
try:
    cur.execute("SELECT team_id FROM teams WHERE LOWER(team_name) = LOWER(%s)", ("Justice League",))
    team_id = cur.fetchone()[0]

    cur.execute("INSERT INTO headquarters(name, location, team_id) VALUES(%s, %s, %s)", 
                ("Justice League Watchtower", "Earth's Moon", team_id))

    print("Success : data inserted")
except Exception as e:
    print("Error: cannot insert data")
    print(e)

Insert several _Avengers_ members. We can also insert from a Python list, where each element is a tuple (or list) containing one record to insert.

In [None]:
members = [
    ("Captain Marvel", "Carol Danvers", "Vers"),
    ("Black Panther", "T'Challa", "King of Wakanda")
]

In [None]:
try:    
    # get Avengers team ID
    # note that the parameter is Python tuple
    cur.execute("SELECT team_id FROM teams WHERE LOWER(team_name) = LOWER(%s)", ("Avengers",))
    team_id = cur.fetchone()[0]
    
    for member in members:
        # Pass team_id as a query parameter too, instead of formatting it into the SQL string
        cur.execute("""
            INSERT INTO members(team_id, name, real_name, alias) VALUES(%s, %s, %s, %s) 
        """, (team_id, *member))
    
    print("Success : {} data inserted".format(len(members)))
except Exception as e:
    print("Error: cannot insert data")
    print(e)

Let's insert a few _Justice League_ members. Note that the parameters can also be a list of lists.

In [None]:
members = [
    ["Batman", "Bruce Wayne", "World's Greatest Detective"],
    ["Wonder Woman", "Diana Prince", "Princess of Amazon"]
]

This time, use `cur.executemany()` instead of looping over the list.

In [None]:
try:    
    # get Justice League team ID
    cur.execute("SELECT team_id FROM teams WHERE LOWER(team_name) = LOWER(%s)", ("Justice League",))
    team_id = cur.fetchone()[0]
    
    # Use executemany with a list of parameter rows, each prefixed with team_id
    cur.executemany("""
        INSERT INTO members(team_id, name, real_name, alias) VALUES(%s, %s, %s, %s) 
    """, [[team_id, *member] for member in members])
    
    print("Success : {} data inserted".format(len(members)))
except Exception as e:
    print("Error: cannot insert data")
    print(e)

Insert some homelands.  
When inserting thousands of rows where performance matters, we can also use `execute_batch` from `psycopg2.extras`.

In [None]:
homelands = [
    ("Themyscira", True), ("Gotham City", True), ("Wakanda", True), ("Boston", True)
]

try:
    psycopg2.extras.execute_batch(cur, "INSERT INTO homelands(name, is_exists) VALUES(%s, %s)", homelands)
    
    print("Success : {} data inserted".format(len(homelands)))
except Exception as e:
    print("Error: cannot insert data")
    print(e)    
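
`execute_batch` accepts a `page_size` argument (100 by default) and sends the statements to the server in groups of that size, which reduces the number of round trips. The sketch below only illustrates how a parameter list splits into pages. The `paginate` helper is hypothetical and is not psycopg2's implementation.

```python
def paginate(seq, page_size=100):
    """Yield consecutive slices of seq holding at most page_size items each."""
    for start in range(0, len(seq), page_size):
        yield seq[start:start + page_size]


# Same sample data as above
homelands = [
    ("Themyscira", True), ("Gotham City", True), ("Wakanda", True), ("Boston", True)
]

# A page size of 3 is chosen only to make the split visible on four rows
pages = list(paginate(homelands, page_size=3))
print([len(page) for page in pages])  # → [3, 1]
```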

Now that we have homelands, let's relate each member to their homeland.

In [None]:
try:
    sql = """
        UPDATE members m 
            SET homeland_id = (
                SELECT homeland_id FROM homelands h WHERE LOWER(h.name) = LOWER(%s)
            )
        WHERE LOWER(m.name) = LOWER(%s)"""

    cur.execute(sql, ("Gotham City", "Batman"))
    cur.execute(sql, ("Themyscira", "Wonder Woman"))
    cur.execute(sql, ("Wakanda", "Black Panther"))
    cur.execute(sql, ("Boston", "Captain Marvel"))
    
    print("Success : Relationship updated")
except Exception as e:
    print("Error: cannot update data")
    print(e) 
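
An `UPDATE` whose `WHERE` clause matches no member succeeds without changing anything, so the success message alone does not confirm that a row was updated. A DB-API cursor exposes `rowcount`, which can be checked after each statement. The sketch below shows the same update pattern against Python's built-in `sqlite3` as a stand-in, so it runs without a server. Note that `sqlite3` uses `?` placeholders where psycopg2 uses `%s`, and that "Robin" is a made-up name that is deliberately absent from the table.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for this illustration
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE homelands (homeland_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE members (
        member_id INTEGER PRIMARY KEY,
        name TEXT,
        homeland_id INTEGER REFERENCES homelands);
    INSERT INTO homelands(name) VALUES ('Gotham City');
    INSERT INTO members(name) VALUES ('Batman');
""")

sql = """
    UPDATE members
        SET homeland_id = (
            SELECT homeland_id FROM homelands h WHERE LOWER(h.name) = LOWER(?)
        )
    WHERE LOWER(name) = LOWER(?)
"""

# rowcount reports how many member rows the UPDATE touched
matched = demo_conn.execute(sql, ("gotham city", "BATMAN")).rowcount
missed = demo_conn.execute(sql, ("Gotham City", "Robin")).rowcount
print(matched, missed)  # → 1 0
```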

Insert some superpowers.

In [None]:
superpowers = [
    ("Super strength", "Extraordinary strength"),
    ("Rich", "Money is the real power"),
    ("Flight", "Able to fly off the ground"),
    ("Super speed", "Move very fast"),
    ("Energy projection", "Who needs guns if you can blast heatwave from your hand?")
]

try:    
    cur.executemany("""
        INSERT INTO superpowers(name, description) VALUES(%s, %s) 
    """, superpowers)
    
    print("Success : {} data inserted".format(len(superpowers)))
except Exception as e:
    print("Error: cannot insert data")
    print(e)

Assign superpowers to the members through the many-to-many join table (`member_superpowers`):
  - Wonder Woman : flight, super strength
  - Batman : rich
  - Black Panther : rich, super strength
  - Captain Marvel : flight, energy projection, super strength

In [None]:
member_superpowers = [
    ('Wonder Woman', 'Flight'),
    ('Wonder Woman', 'Super strength'),
    ('Batman', 'Rich'),
    ('Black Panther', 'Rich'),
    ('Black Panther', 'Super strength'),
    ('Captain Marvel', 'Flight'),
    ('Captain Marvel', 'Energy projection'),
    ('Captain Marvel', 'Super strength'),
]

try:
    sql = """
        INSERT INTO member_superpowers (member_id, superpower_id)
        VALUES(
            (SELECT member_id FROM members m WHERE LOWER(m.name) = LOWER(%s)),
            (SELECT superpower_id FROM superpowers s WHERE LOWER(s.name) = LOWER(%s)))
    """
    
    cur.executemany(sql, member_superpowers)
    
    print("Success : {} data inserted".format(len(member_superpowers)))
except Exception as e:
    print("Error: cannot insert data")
    print(e)

### Great Job! Sample Data Inserted.

Our sample data were all inserted.  
Let's do some simple SQL `SELECT`.

Get each team and its headquarters.

In [None]:
try:
    sql = """
        SELECT t.team_name, h.name, h.location
        FROM teams t
        INNER JOIN headquarters h ON t.team_id = h.team_id
    """
    
    cur.execute(sql)
    
    for row in cur.fetchall():
        print(row)
except Exception as e:
    print("Error: cannot select data")
    print(e)

Get each member's superpowers.

In [None]:
try:
    sql = """
        SELECT
            m.name, m.alias, s.name, s.description
        FROM members m
        INNER JOIN member_superpowers ms ON
            m.member_id = ms.member_id
        INNER JOIN superpowers s ON
            s.superpower_id = ms.superpower_id
        ORDER BY m.name, s.name
    """
    
    cur.execute(sql)
    
    for m_name, m_alias, s_name, s_desc in cur.fetchall():
        print("{}, also known as {}, has superpower '{}', which means {}".format(m_name, m_alias, s_name, s_desc))
except Exception as e:
    print("Error: cannot select data")
    print(e)
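
The join returns one row per (member, superpower) pair, so a member with several superpowers appears several times. If one line per member is preferred, the rows can be grouped in Python. This is a minimal sketch on hardcoded pairs that mirror the sample data inserted above, in the order produced by `ORDER BY m.name, s.name`. In the notebook, the pairs would come from `cur.fetchall()` instead.

```python
from collections import defaultdict

# (member name, superpower name) pairs mirroring the sample data above
rows = [
    ("Batman", "Rich"),
    ("Black Panther", "Rich"),
    ("Black Panther", "Super strength"),
    ("Captain Marvel", "Energy projection"),
    ("Captain Marvel", "Flight"),
    ("Captain Marvel", "Super strength"),
    ("Wonder Woman", "Flight"),
    ("Wonder Woman", "Super strength"),
]

# Collect every superpower under its member, preserving row order
powers_by_member = defaultdict(list)
for member_name, power_name in rows:
    powers_by_member[member_name].append(power_name)

for member_name, powers in powers_by_member.items():
    print("{}: {}".format(member_name, ", ".join(powers)))
```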

Get the full team data, including each member's homeland and superpowers.

In [None]:
try:
    sql = """
        SELECT t.team_name, t.origin, m.name, m.alias, h.name, s.name
        FROM teams t
        INNER JOIN members m ON
            m.team_id = t.team_id
        INNER JOIN member_superpowers ms ON
            m.member_id = ms.member_id
        INNER JOIN superpowers s ON
            s.superpower_id = ms.superpower_id
        INNER JOIN homelands h ON
            h.homeland_id = m.homeland_id
        ORDER BY t.team_name, m.name
    """
    
    cur.execute(sql)
    res = cur.fetchall()
    
    print("Found {} rows\n".format(len(res)))
    
    for row in res:
        print(row)
except Exception as e:
    print("Error: cannot select data")
    print(e)
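
Because the query above uses `INNER JOIN` throughout, a member whose `homeland_id` is `NULL` or who has no superpowers would not appear in the result. All four sample members have both, so nothing is lost here. The sketch below illustrates the difference between `INNER JOIN` and `LEFT JOIN` using Python's built-in `sqlite3` as a stand-in, so it runs without a server. "Unknown Hero" is a made-up member added only to show the effect.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for this illustration
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE homelands (homeland_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE members (
        member_id INTEGER PRIMARY KEY,
        name TEXT,
        homeland_id INTEGER REFERENCES homelands);
    INSERT INTO homelands(homeland_id, name) VALUES (1, 'Wakanda');
    INSERT INTO members(name, homeland_id) VALUES ('Black Panther', 1);
    INSERT INTO members(name, homeland_id) VALUES ('Unknown Hero', NULL);
""")

query = """
    SELECT m.name, h.name
    FROM members m
    {join} JOIN homelands h ON h.homeland_id = m.homeland_id
    ORDER BY m.name
"""

inner_rows = demo_conn.execute(query.format(join="INNER")).fetchall()
left_rows = demo_conn.execute(query.format(join="LEFT")).fetchall()

print(inner_rows)  # → [('Black Panther', 'Wakanda')]
print(left_rows)   # → [('Black Panther', 'Wakanda'), ('Unknown Hero', None)]
```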