# Database Normalization: From 0NF to 3NF

This notebook demonstrates the **process of database normalization**, taking a single, un-normalized table and applying the rules of First, Second, and Third Normal Form to reduce data redundancy and improve data integrity.

*Built and documented by Fahad Shah (1FahadShah) — representing my personal learning journey through the PostgreSQL for Everybody Specialization.*


In [1]:
%load_ext sql
%sql postgresql://fahad:secret@localhost:5432/music

## The Problem: An Un-Normalized Table (0NF)

We start with a single "flat file" table that contains all of our music data. This is considered **Un-Normalized (0NF)**.

| track_title         | track_len | artist_name  | album_title     | genre_name |
| :------------------ | :-------- | :----------- | :-------------- | :--------- |
| Black Dog           | 296       | Led Zeppelin | Led Zeppelin IV | Rock       |
| Stairway to Heaven  | 482       | Led Zeppelin | Led Zeppelin IV | Rock       |
| Hells Bells         | 312       | AC/DC        | Back in Black   | Metal      |
| Back in Black       | 255       | AC/DC        | Back in Black   | Metal      |

**Problems with this design:**
- **Data Redundancy**: The names of artists, albums, and genres are repeated for every single track.
- **Update Anomaly**: If Led Zeppelin changes their name, we have to find and update it in multiple rows.
- **Insertion Anomaly**: We can't add a new artist until they have at least one track.
- **Deletion Anomaly**: If we delete the last track by an artist, we lose the information that the artist ever existed.
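
The update and deletion anomalies above can be reproduced in a few lines. This is a minimal sketch, not part of the original notebook: it assumes an in-memory SQLite database (Python's built-in `sqlite3`) in place of the PostgreSQL server, and the table name `track_raw` and the new name `'Led Zep'` are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the notebook's PostgreSQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE track_raw (track_title TEXT, track_len INTEGER, "
    "artist_name TEXT, album_title TEXT, genre_name TEXT)"
)
rows = [
    ("Black Dog", 296, "Led Zeppelin", "Led Zeppelin IV", "Rock"),
    ("Stairway to Heaven", 482, "Led Zeppelin", "Led Zeppelin IV", "Rock"),
    ("Hells Bells", 312, "AC/DC", "Back in Black", "Metal"),
    ("Back in Black", 255, "AC/DC", "Back in Black", "Metal"),
]
conn.executemany("INSERT INTO track_raw VALUES (?, ?, ?, ?, ?)", rows)

# Update anomaly: renaming one artist has to touch every one of their rows.
cur = conn.execute(
    "UPDATE track_raw SET artist_name = 'Led Zep' "
    "WHERE artist_name = 'Led Zeppelin'"
)
rows_touched = cur.rowcount

# Deletion anomaly: removing AC/DC's tracks also removes the artist itself.
conn.execute("DELETE FROM track_raw WHERE artist_name = 'AC/DC'")
remaining_artists = [
    r[0] for r in conn.execute("SELECT DISTINCT artist_name FROM track_raw")
]

print(rows_touched, remaining_artists)
```

The rename touches two rows for a single logical change, and after the delete no trace of AC/DC remains anywhere in the database.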

--- 
## Step 1: First Normal Form (1NF) - Atomic Values

**Rule**: Each column in a table must hold a single, atomic (indivisible) value, and each row must be unique.

**Action**: Our table already meets the 1NF criteria. Each cell contains only one piece of information, and no two rows are identical. For example, we don't have a column storing multiple genres like `'Rock, Blues'`.
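
For contrast, here is a rough sketch of what fixing a 1NF violation could look like. The `packed` rows are hypothetical (our table has no such column); they only illustrate how a multi-valued cell like `'Rock, Blues'` is split into one atomic value per row.

```python
# Hypothetical non-1NF rows: the genre column packs several values into one string.
packed = [
    ("Black Dog", "Rock, Blues"),
    ("Hells Bells", "Metal"),
]

# One row per (track, genre) pair makes every value atomic.
atomic = [
    (title, genre.strip())
    for title, genres in packed
    for genre in genres.split(",")
]

# dict.fromkeys drops exact duplicates while keeping order, so each row is unique.
atomic = list(dict.fromkeys(atomic))

print(atomic)
```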

--- 
## Step 2: Second Normal Form (2NF) - Removing Partial Dependencies

**Rule**: The table must be in 1NF, and all non-key attributes must depend on the **entire** primary key. This rule is relevant when a table has a composite primary key.

**Action**: Let's assume our primary key is `(track_title, artist_name)`. 
- `album_title` depends only on `artist_name`, not the full key. This is a **partial dependency**.
- `genre_name` depends only on `track_title` (in this simple model), not the full key. This is also a **partial dependency**.
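
A dependency such as `artist_name → album_title` can be checked against the sample rows: it holds if no artist maps to more than one album. The sketch below assumes an in-memory SQLite copy of the flat table; `track_raw` and `fd_violations` are hypothetical names. Note that passing this check only shows the dependency holds in the sample data, not that it holds for every artist in the real world.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE track_raw (track_title TEXT, track_len INTEGER, "
    "artist_name TEXT, album_title TEXT, genre_name TEXT)"
)
conn.executemany(
    "INSERT INTO track_raw VALUES (?, ?, ?, ?, ?)",
    [
        ("Black Dog", 296, "Led Zeppelin", "Led Zeppelin IV", "Rock"),
        ("Stairway to Heaven", 482, "Led Zeppelin", "Led Zeppelin IV", "Rock"),
        ("Hells Bells", 312, "AC/DC", "Back in Black", "Metal"),
        ("Back in Black", 255, "AC/DC", "Back in Black", "Metal"),
    ],
)


def fd_violations(determinant, dependent):
    """Return determinant values that map to more than one dependent value."""
    # Column names are interpolated directly, so pass trusted identifiers only.
    sql = (
        f"SELECT {determinant} FROM track_raw "
        f"GROUP BY {determinant} "
        f"HAVING COUNT(DISTINCT {dependent}) > 1 "
        f"ORDER BY {determinant}"
    )
    return [row[0] for row in conn.execute(sql)]


# Empty list: each artist has exactly one album in the sample, so the
# dependency artist_name -> album_title holds here.
album_by_artist = fd_violations("artist_name", "album_title")

# Non-empty list: an artist has several tracks, so artist_name alone
# does not determine track_title.
track_by_artist = fd_violations("artist_name", "track_title")

print(album_by_artist, track_by_artist)
```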

**Solution**: We must break the table apart to remove these dependencies. We move the repeated values into their own tables, so each name is stored once and referenced by its `id`.

We create separate tables for `artist` and `genre`.

In [2]:
%%sql

CREATE TABLE IF NOT EXISTS artist (
    id SERIAL PRIMARY KEY,
    name VARCHAR(128) NOT NULL UNIQUE
);

CREATE TABLE IF NOT EXISTS genre (
    id SERIAL PRIMARY KEY,
    name VARCHAR(128) NOT NULL UNIQUE
);

 * postgresql://fahad:***@localhost:5432/music
Done.
Done.


[]
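
The notebook creates these lookup tables but does not show how they are filled. One possible approach, sketched here against an in-memory SQLite database rather than the PostgreSQL server, is `INSERT ... SELECT DISTINCT` from the flat table. The `track_raw` table name is an assumption, and SQLite's `INTEGER PRIMARY KEY` stands in for `SERIAL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE track_raw (track_title TEXT, track_len INTEGER, "
    "artist_name TEXT, album_title TEXT, genre_name TEXT)"
)
conn.executemany(
    "INSERT INTO track_raw VALUES (?, ?, ?, ?, ?)",
    [
        ("Black Dog", 296, "Led Zeppelin", "Led Zeppelin IV", "Rock"),
        ("Stairway to Heaven", 482, "Led Zeppelin", "Led Zeppelin IV", "Rock"),
        ("Hells Bells", 312, "AC/DC", "Back in Black", "Metal"),
        ("Back in Black", 255, "AC/DC", "Back in Black", "Metal"),
    ],
)

# SQLite's INTEGER PRIMARY KEY auto-assigns ids, standing in for SERIAL.
conn.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
conn.execute("CREATE TABLE genre (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")

# Copy each distinct name exactly once from the flat table.
conn.execute("INSERT INTO artist (name) SELECT DISTINCT artist_name FROM track_raw")
conn.execute("INSERT INTO genre (name) SELECT DISTINCT genre_name FROM track_raw")

artists = [r[0] for r in conn.execute("SELECT name FROM artist ORDER BY name")]
genres = [r[0] for r in conn.execute("SELECT name FROM genre ORDER BY name")]

# The UNIQUE constraint refuses a second copy of an existing name.
try:
    conn.execute("INSERT INTO artist (name) VALUES ('AC/DC')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(artists, genres, duplicate_rejected)
```

Four flat rows collapse to two artists and two genres, and the `UNIQUE` constraint keeps it that way.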

--- 
## Step 3: Third Normal Form (3NF) - Removing Transitive Dependencies

**Rule**: The table must be in 2NF, and there should be no transitive dependencies (where a non-key attribute depends on another non-key attribute).

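**Action**: In Step 2 we treated `artist_name` as part of the key. If each track is instead identified by its own surrogate key (an `id` column), `artist_name` becomes a non-key attribute. An album determines its artist, so the artist depends on the track only through the album (`track → album_title → artist_name`). This is a **transitive dependency**.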

**Solution**: We create a separate `album` table that references `artist`, so the album-to-artist link is stored once per album instead of once per track. This continues the split we started for 2NF. The final step is to create our `track` table, which links everything together using foreign keys.

--- 
## The Final Normalized Schema (3NF)

After applying the normalization rules, we arrive at our final, clean schema. Each artist, album, and genre name is stored exactly once, and foreign keys enforce the relationships between the tables.

In [3]:
%%sql

-- Create the album table, which depends on the artist table
CREATE TABLE IF NOT EXISTS album (
    id SERIAL PRIMARY KEY,
    title VARCHAR(128) NOT NULL,
    artist_id INTEGER NOT NULL REFERENCES artist(id) ON DELETE CASCADE,
    UNIQUE(title, artist_id)
);

-- Create the final track table, which links everything together
CREATE TABLE IF NOT EXISTS track (
    id SERIAL PRIMARY KEY,
    title VARCHAR(128) NOT NULL,
    len INTEGER,
    rating INTEGER,
    count INTEGER,
    album_id INTEGER NOT NULL REFERENCES album(id) ON DELETE CASCADE,
    genre_id INTEGER NOT NULL REFERENCES genre(id) ON DELETE CASCADE,
    UNIQUE(title, album_id)
);

 * postgresql://fahad:***@localhost:5432/music
Done.
Done.


[]
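
To see what the constraints in this schema buy us, the sketch below recreates it in an in-memory SQLite database (an assumption made so the example runs without the PostgreSQL server) and exercises three behaviours: an artist can exist without tracks, a dangling `artist_id` is rejected, and `ON DELETE CASCADE` removes dependent rows. The artist `'Pink Floyd'`, the album `'Ghost Album'`, and the id `999` are hypothetical test values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on; PostgreSQL always does.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE genre  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE album (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    artist_id INTEGER NOT NULL REFERENCES artist(id) ON DELETE CASCADE,
    UNIQUE(title, artist_id)
);
CREATE TABLE track (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    len INTEGER,
    rating INTEGER,
    count INTEGER,
    album_id INTEGER NOT NULL REFERENCES album(id) ON DELETE CASCADE,
    genre_id INTEGER NOT NULL REFERENCES genre(id) ON DELETE CASCADE,
    UNIQUE(title, album_id)
);
""")

conn.execute("INSERT INTO artist (id, name) VALUES (1, 'AC/DC')")
conn.execute("INSERT INTO genre (id, name) VALUES (1, 'Metal')")
conn.execute("INSERT INTO album (id, title, artist_id) VALUES (1, 'Back in Black', 1)")
conn.execute(
    "INSERT INTO track (title, len, album_id, genre_id) "
    "VALUES ('Hells Bells', 312, 1, 1)"
)

# Insertion anomaly resolved: an artist can be stored before any track exists.
conn.execute("INSERT INTO artist (name) VALUES ('Pink Floyd')")
artist_count = conn.execute("SELECT COUNT(*) FROM artist").fetchone()[0]

# Referential integrity: an album cannot point at an artist that does not exist.
try:
    conn.execute("INSERT INTO album (title, artist_id) VALUES ('Ghost Album', 999)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# ON DELETE CASCADE: deleting the artist removes its albums and their tracks.
conn.execute("DELETE FROM artist WHERE name = 'AC/DC'")
tracks_left = conn.execute("SELECT COUNT(*) FROM track").fetchone()[0]

print(artist_count, fk_rejected, tracks_left)
```

Cascading deletes are a design choice: they keep the tables consistent automatically, but a single `DELETE` on `artist` can remove many rows.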

--- 
## The Payoff: Querying with JOINs

Now that our data is properly separated, we can use `JOIN` to bring it back together in meaningful ways.

In [4]:
%%sql
SELECT t.title AS track, a.title AS album, ar.name AS artist, g.name AS genre
FROM track t
JOIN album a ON t.album_id = a.id
JOIN artist ar ON a.artist_id = ar.id
JOIN genre g ON t.genre_id = g.id;

 * postgresql://fahad:***@localhost:5432/music
(psycopg2.errors.InsufficientPrivilege) permission denied for table track

[SQL: SELECT t.title AS track, a.title AS album, ar.name AS artist, g.name AS genre
FROM track t
JOIN album a ON t.album_id = a.id
JOIN artist ar ON a.artist_id = ar.id
JOIN genre g ON t.genre_id = g.id;]
(Background on this error at: https://sqlalche.me/e/20/f405)
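
The error above says the connected role has no permission to read `track`; it is a privilege problem on the server, not a problem with the query. To show what the same `JOIN` returns, the sketch below loads the four sample rows into an in-memory SQLite copy of the schema (an assumption, standing in for PostgreSQL; the `rating` and `count` columns are omitted for brevity). The loading loop and the added `ORDER BY t.id` are illustrative and not part of the original notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE genre  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE album (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    artist_id INTEGER NOT NULL REFERENCES artist(id),
    UNIQUE(title, artist_id)
);
CREATE TABLE track (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    len INTEGER,
    album_id INTEGER NOT NULL REFERENCES album(id),
    genre_id INTEGER NOT NULL REFERENCES genre(id),
    UNIQUE(title, album_id)
);
""")

rows = [
    ("Black Dog", 296, "Led Zeppelin", "Led Zeppelin IV", "Rock"),
    ("Stairway to Heaven", 482, "Led Zeppelin", "Led Zeppelin IV", "Rock"),
    ("Hells Bells", 312, "AC/DC", "Back in Black", "Metal"),
    ("Back in Black", 255, "AC/DC", "Back in Black", "Metal"),
]

for title, length, artist, album, genre in rows:
    # INSERT OR IGNORE skips names that the UNIQUE constraints already hold.
    conn.execute("INSERT OR IGNORE INTO artist (name) VALUES (?)", (artist,))
    conn.execute("INSERT OR IGNORE INTO genre (name) VALUES (?)", (genre,))
    conn.execute(
        "INSERT OR IGNORE INTO album (title, artist_id) "
        "SELECT ?, id FROM artist WHERE name = ?",
        (album, artist),
    )
    # Look up the foreign keys by name while inserting the track.
    conn.execute(
        "INSERT INTO track (title, len, album_id, genre_id) VALUES (?, ?, "
        "(SELECT a.id FROM album a JOIN artist ar ON a.artist_id = ar.id "
        " WHERE a.title = ? AND ar.name = ?), "
        "(SELECT id FROM genre WHERE name = ?))",
        (title, length, album, artist, genre),
    )

# Same query as the notebook cell, with ORDER BY added for a stable row order.
result = conn.execute("""
    SELECT t.title AS track, a.title AS album, ar.name AS artist, g.name AS genre
    FROM track t
    JOIN album a ON t.album_id = a.id
    JOIN artist ar ON a.artist_id = ar.id
    JOIN genre g ON t.genre_id = g.id
    ORDER BY t.id
""").fetchall()

for row in result:
    print(row)
```

The join reassembles the same four rows as the original flat table, while each artist, album, and genre name is stored only once.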


## Key Takeaways

1. **Normalization is a process**: It's a step-by-step refinement of your data structure.
2. **1NF, 2NF, 3NF are rules**: They help eliminate redundancy by moving each distinct kind of data (artists, albums, genres, tracks) into its own table.
3. **Foreign Keys are the glue**: They link the separated tables back together, maintaining relational integrity.
4. **The result is efficiency**: A normalized database stores each fact once, so it is easier to maintain and update, it prevents the update, insertion, and deletion anomalies, and it can still be queried as a whole with `JOIN`.