## Loading Music Data from CSV into a Relational SQLite Database

In this section, we connect everything together by **reading data from a CSV file**
and inserting it into a **normalized relational database** using Python and SQLite.

This demonstrates how relational design, foreign keys, and `JOIN`s work in practice.

---

## Step 1: Database Setup (Creating Tables)

We first create a fresh SQLite database and define our tables.

In [9]:
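import sqlite3

# Open (or create) the database file and get a cursor for running SQL
conn = sqlite3.connect("trackdb.sqlite")
cur = conn.cursor()

# Rebuild the schema from scratch on every run
cur.executescript("""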
DROP TABLE IF EXISTS Artist;
DROP TABLE IF EXISTS Album;
DROP TABLE IF EXISTS Track;

CREATE TABLE Artist (
    id     INTEGER PRIMARY KEY,
    name   TEXT UNIQUE
);

CREATE TABLE Album (
    id        INTEGER PRIMARY KEY,
    artist_id INTEGER,
    title     TEXT UNIQUE
);

CREATE TABLE Track (
    id       INTEGER PRIMARY KEY,
    title    TEXT UNIQUE,
    album_id INTEGER,
    len      INTEGER,
    rating   INTEGER,
    count    INTEGER
);
""")

## What This Does

- Drops existing tables to start with a clean database
- Creates **normalized tables**:
  - **Artist** stores each artist only once
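  - **Album** belongs to an artist through `artist_id`
  - **Track** belongs to an album through `album_id`
- Uses `INTEGER PRIMARY KEY` so SQLite assigns a unique ID automatically when a row is inserted without one
- Uses `UNIQUE` constraints to prevent duplicate records
- Does not declare `FOREIGN KEY` constraints: `artist_id` and `album_id` are plain integer columns, so the inserting code is responsible for keeping the links valid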
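
The schema above links tables by convention only. As a minimal sketch of one possible variation (not the schema this section uses), the links can be declared with `REFERENCES` so that SQLite rejects an album whose artist does not exist. The in-memory database and the "Orphan Album" row are assumptions made for illustration; note that SQLite enforces foreign keys only when `PRAGMA foreign_keys` is turned on for the connection.

```python
import sqlite3

# In-memory database so the sketch does not touch trackdb.sqlite
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on for the connection
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Artist (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE
);

CREATE TABLE Album (
    id        INTEGER PRIMARY KEY,
    artist_id INTEGER REFERENCES Artist(id),  -- declared link to Artist
    title     TEXT UNIQUE
);
""")

conn.execute("INSERT INTO Artist (name) VALUES (?)", ("Queen",))
conn.execute(
    "INSERT INTO Album (title, artist_id) VALUES (?, ?)",
    ("Greatest Hits", 1),
)

# An album pointing at a non-existent artist should now be rejected
try:
    conn.execute(
        "INSERT INTO Album (title, artist_id) VALUES (?, ?)",
        ("Orphan Album", 999),  # hypothetical row with no matching artist
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```

Whether to declare the constraint is a design choice: it catches broken links at insert time, at the cost of requiring parents to be inserted before their children.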

---

## Step 2: Reading the CSV File

The input file (`tracks.csv`) contains raw music data in comma-separated format.

Example line from `tracks.csv`:

```text
Another One Bites The Dust,Queen,Greatest Hits,55,100,217103
```

Column positions:

- `0`: title
- `1`: artist
- `2`: album
- `3`: count
- `4`: rating
- `5`: length

Python code to read the file:


In [10]:
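# Open the CSV; the with-block closes the file when the loop finishes
with open("tracks.csv") as handle:
    for line in handle:
        line = line.strip()
        # Naive split: assumes no field itself contains a comma
        pieces = line.split(',')
        # Skip blank or incomplete lines
        if len(pieces) < 6:
            continue

        name   = pieces[0]
        artist = pieces[1]
        album  = pieces[2]
        count  = pieces[3]
        rating = pieces[4]
        length = pieces[5]

        # The inserts from Steps 3 to 5 go here, inside the loop, once per row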
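
Splitting on commas breaks as soon as a field contains a comma of its own. As a hedged alternative, the standard-library `csv` module handles quoted fields. The sketch below reads from an in-memory string rather than `tracks.csv`; the second and third sample rows are hypothetical and exist only to show a quoted title and a short line being skipped.

```python
import csv
import io

# Hypothetical sample standing in for tracks.csv
sample = io.StringIO(
    'Another One Bites The Dust,Queen,Greatest Hits,55,100,217103\n'
    '"Hello, Goodbye",The Beatles,Magical Mystery Tour,10,80,210000\n'
    'too,short\n'
)

rows = []
for pieces in csv.reader(sample):
    # Skip blank or incomplete lines, as in the loop above
    if len(pieces) < 6:
        continue
    name, artist, album = pieces[0], pieces[1], pieces[2]
    # Convert the numeric columns explicitly instead of storing strings
    count, rating, length = int(pieces[3]), int(pieces[4]), int(pieces[5])
    rows.append((name, artist, album, count, rating, length))

print(rows[1][0])
```

With a real file, the same loop would wrap `csv.reader(handle)` where `handle` comes from `open("tracks.csv", newline="")`.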


---

## Step 3: Inserting Artists (Parent Table)

Artists are **parent records** in the data model.
Each artist is stored **only once** and identified by a unique ID.
The statements in Steps 3 to 5 run inside the `for` loop from Step 2, once for each CSV row.

In [11]:
cur.execute(
    "INSERT OR IGNORE INTO Artist (name) VALUES (?)",
    (artist,)
)

cur.execute(
    "SELECT id FROM Artist WHERE name = ?",
    (artist,)
)
artist_id = cur.fetchone()[0]


## Why This Matters

- Each artist exists **only once** in the database  
- `artist_id` becomes a **foreign key** for linking albums to artists  
- `INSERT OR IGNORE` prevents duplicate artist records  

Because the `SELECT` runs whether the insert succeeded or was ignored, `artist_id` always holds the ID of the stored row.
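
The insert-then-select pattern can be wrapped in a small function. This is a minimal sketch: `get_artist_id` is a hypothetical helper name, and the in-memory database is used only so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
cur = conn.cursor()
cur.execute("CREATE TABLE Artist (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")


def get_artist_id(cur, name):
    """Hypothetical helper: insert the artist if new, then return its id."""
    cur.execute("INSERT OR IGNORE INTO Artist (name) VALUES (?)", (name,))
    cur.execute("SELECT id FROM Artist WHERE name = ?", (name,))
    return cur.fetchone()[0]


first = get_artist_id(cur, "Queen")
second = get_artist_id(cur, "Queen")  # insert is ignored, same id comes back
other = get_artist_id(cur, "AC/DC")

cur.execute("SELECT COUNT(*) FROM Artist")
print(first == second, first != other, cur.fetchone()[0])  # True True 2
```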

---

## Step 4: Inserting Albums (Using Foreign Keys)

Albums **belong to artists**, so each album must reference
an existing artist using `artist_id`.

In [12]:
cur.execute(
    "INSERT OR IGNORE INTO Album (title, artist_id) VALUES (?, ?)",
    (album, artist_id)
)

cur.execute(
    "SELECT id FROM Album WHERE title = ?",
    (album,)
)
album_id = cur.fetchone()[0]


## Key Point

- `Album.artist_id` links each album to its artist  
- We never store artist names inside `Album`  
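
One caveat of this step: `Album.title` is `UNIQUE` and the lookup matches on title alone, so two artists who each released a "Greatest Hits" would end up sharing a single `Album` row. A possible variation, sketched below under the assumption that uniqueness should apply per artist, is to make the pair `(artist_id, title)` unique and include `artist_id` in the lookup. The helper name `get_album_id` is hypothetical, and this is not the schema used in the rest of the section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Artist (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE Album (
    id        INTEGER PRIMARY KEY,
    artist_id INTEGER,
    title     TEXT,
    UNIQUE (artist_id, title)  -- assumption: unique per artist, not globally
);
""")


def get_album_id(cur, artist, album):
    """Hypothetical helper: return the album id for this artist/title pair."""
    cur.execute("INSERT OR IGNORE INTO Artist (name) VALUES (?)", (artist,))
    cur.execute("SELECT id FROM Artist WHERE name = ?", (artist,))
    artist_id = cur.fetchone()[0]

    cur.execute(
        "INSERT OR IGNORE INTO Album (title, artist_id) VALUES (?, ?)",
        (album, artist_id),
    )
    # Match on both columns so same-titled albums stay separate
    cur.execute(
        "SELECT id FROM Album WHERE title = ? AND artist_id = ?",
        (album, artist_id),
    )
    return cur.fetchone()[0]


queen_hits = get_album_id(cur, "Queen", "Greatest Hits")
abba_hits = get_album_id(cur, "ABBA", "Greatest Hits")
print(queen_hits != abba_hits)  # True
```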

---

## Step 5: Inserting Tracks (Child Table)

Tracks belong to albums.


In [13]:
cur.execute(
    """INSERT OR REPLACE INTO Track
       (title, album_id, len, rating, count)
       VALUES (?, ?, ?, ?, ?)""",
    (name, album_id, length, rating, count)
)


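## What Happens Here

- `Track.album_id` connects the track to its album  
- `INSERT OR REPLACE` overwrites an existing track with the same title, so re-running the import refreshes the row instead of failing on the `UNIQUE` constraint  
- No artist or album names are duplicated  
- The database remains normalized  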
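
To see what `INSERT OR REPLACE` does when the same title arrives twice, the following minimal sketch imports one track two times with different play counts. It uses an in-memory database and an `album_id` of `1` as placeholder assumptions; the second play count is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
cur = conn.cursor()
cur.execute("""CREATE TABLE Track (
    id       INTEGER PRIMARY KEY,
    title    TEXT UNIQUE,
    album_id INTEGER,
    len      INTEGER,
    rating   INTEGER,
    count    INTEGER
)""")

insert_sql = """INSERT OR REPLACE INTO Track
                (title, album_id, len, rating, count)
                VALUES (?, ?, ?, ?, ?)"""

# First import of the row, then a second import with a higher play count
cur.execute(insert_sql, ("Another One Bites The Dust", 1, 217103, 100, 55))
cur.execute(insert_sql, ("Another One Bites The Dust", 1, 217103, 100, 56))

cur.execute("SELECT COUNT(*), MAX(count) FROM Track")
total, plays = cur.fetchone()
print(total, plays)  # 1 56
```

SQLite implements the replacement by deleting the conflicting row and inserting a new one, so the track's `id` may change between imports. That is harmless here because no other table refers to `Track.id`.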

---

## Step 6: Commit Changes

In [14]:
conn.commit()


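Calling `commit()` once after the loop, rather than after every row, writes all the inserts to disk in a single transaction and avoids a separate disk write per row.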
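
Putting the steps together, the sketch below loads a few rows and then reassembles them with a `JOIN`, which is where the IDs stored in `album_id` and `artist_id` pay off. It is a self-contained illustration, not the notebook's exact code: it uses an in-memory database, the `csv` module, and sample rows in which only the first line comes from the example above and the rest are hypothetical.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for trackdb.sqlite
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Artist (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE Album  (id INTEGER PRIMARY KEY, artist_id INTEGER, title TEXT UNIQUE);
CREATE TABLE Track  (id INTEGER PRIMARY KEY, title TEXT UNIQUE, album_id INTEGER,
                     len INTEGER, rating INTEGER, count INTEGER);
""")

# Sample rows in the tracks.csv column order (rows 2 and 3 are hypothetical)
sample = io.StringIO(
    "Another One Bites The Dust,Queen,Greatest Hits,55,100,217103\n"
    "Bohemian Rhapsody,Queen,Greatest Hits,40,100,358000\n"
    "Back In Black,AC/DC,Back In Black,30,90,255000\n"
)

for name, artist, album, count, rating, length in csv.reader(sample):
    # Step 3: parent row first
    cur.execute("INSERT OR IGNORE INTO Artist (name) VALUES (?)", (artist,))
    cur.execute("SELECT id FROM Artist WHERE name = ?", (artist,))
    artist_id = cur.fetchone()[0]

    # Step 4: album linked to its artist
    cur.execute(
        "INSERT OR IGNORE INTO Album (title, artist_id) VALUES (?, ?)",
        (album, artist_id),
    )
    cur.execute("SELECT id FROM Album WHERE title = ?", (album,))
    album_id = cur.fetchone()[0]

    # Step 5: track linked to its album
    cur.execute(
        """INSERT OR REPLACE INTO Track (title, album_id, len, rating, count)
           VALUES (?, ?, ?, ?, ?)""",
        (name, album_id, length, rating, count),
    )

# Step 6: one commit after the loop
conn.commit()

# Follow the stored IDs back to readable names
cur.execute("""
    SELECT Track.title, Album.title, Artist.name
    FROM Track
    JOIN Album  ON Track.album_id = Album.id
    JOIN Artist ON Album.artist_id = Artist.id
    ORDER BY Artist.name, Track.title
""")
rows = cur.fetchall()
for row in rows:
    print(row)
```

Each artist and album name is stored once, yet the query returns the full title, album, and artist for every track.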
