# Database Design




## From Excel to DB to Answers

For this lab you will take an MS Excel file __with three tabs__ of data, use it to create a database in SQLite, and run queries against that database.

The file is located under datasets as [Module4Data.xlsx](../../../datasets/Module4Data.xlsx).

### Data 
The three tabs of data are:
  1. Artist - contains artist name, genre, and year formed
  1. Albums - contains artist name, album title, and year produced
  1. Songs - contains album title, song name, and song length

You will notice that the last tab, Songs, contains unnormalized data of the kind discussed previously.


### Methodology

Please recall the steps of the database design process!

  1. Discovery
  1. Modelling
  1. Defining

### Discovery 

You should now take a moment to go and look at the data.  Identify the entities that are relevant for your database, and what their respective attributes will be.  Then, contemplate the identifiers for those entities.

### Modelling
Once you have identified all the aspects of the database, try to sketch out a model.  
  1. How are the entities connected through relationships?
  1. Is there one table per file tab? 
  1. Are you going to use __id__ columns as done in the previous examples?
  1. Which columns of data create overlaps that we can use as foreign keys to reference other tables?
  1. Is your model normalized to remove redundancy?
  
Your resulting model should be able to hold all the data that you have been given. 
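
As one way to think about the overlap and redundancy questions above, the following Python sketch replaces a repeated album title with a generated id. It is only an illustration: the rows are invented placeholders shaped like the Songs tab, not values from `Module4Data.xlsx`, and the names `flat_songs`, `album_ids`, and `songs` are hypothetical.

```python
# Hypothetical rows shaped like the Songs tab: (album title, song name, song length).
flat_songs = [
    ("Album A", "Song 1", "3:05"),
    ("Album A", "Song 2", "4:10"),
    ("Album B", "Song 1", "2:58"),
]

# Give each distinct album title an integer id, in order of first appearance.
album_ids = {}
for album_title, _song_name, _length in flat_songs:
    album_ids.setdefault(album_title, len(album_ids) + 1)

# Store the id instead of the repeated title in each song row.
songs = [
    (album_ids[album_title], song_name, length)
    for album_title, song_name, length in flat_songs
]

print(album_ids)
print(songs)
```

Each album title is now stored once, and the song rows refer to it through an integer, which is the role a foreign key column plays in the database.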

This portion of the activity is one of the most important so don't be afraid to take a while to design the system.



Once you have thought through your own model, take a look at this one: [Music](../resources/music.jpg). Some assumptions were made based on the data given: there are no collaborative albums, and every song must be associated with an album. We will use this diagram as the basis for the rest of the lab.

### Defining

Now that you have finalized your model, you need to create a SQLite database, named "../__songs.db__", and write the CREATE TABLE statements (the DDL).





In [1]:
## Load SQL Extension and open the database file
%load_ext sql
%sql sqlite:///../databases/songs.db

'Connected: None@../databases/songs.db'

The following cells are here so that you can go through the lab more than once. Each command deletes a table, if it exists, along with all of the data stored in it.

In [2]:
%sql DROP TABLE IF EXISTS Artist;

Done.


[]

In [3]:
%sql DROP TABLE IF EXISTS Album;

Done.


[]

In [4]:
%sql DROP TABLE IF EXISTS Song;

Done.


[]
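
The effect of the `DROP TABLE IF EXISTS` cells above can also be checked outside the notebook with Python's built-in `sqlite3` module. This is a minimal sketch that assumes an in-memory database and a made-up row in place of `songs.db`; it shows that the statement removes the table together with its rows, and that running it again is harmless.

```python
import sqlite3

# In-memory stand-in for songs.db (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Artist (artist_id INT, name varchar(100))")
conn.execute("INSERT INTO Artist VALUES (1, 'Example Artist')")  # made-up row

# The first DROP removes the table and its data.
conn.execute("DROP TABLE IF EXISTS Artist")
# The second DROP finds nothing to remove and raises no error.
conn.execute("DROP TABLE IF EXISTS Artist")

# List the tables that remain in the database.
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining)
```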

### Create Table for Artists

Write the Create Table statement for the artist table.
Ensure you use appropriate data types and column/table constraints as necessary.  

**Remember** to define a primary key for the table.

## Artist SQL

```SQL
CREATE TABLE Artist (
    artist_id INT, 
    name varchar(100), 
    genre varchar(100), 
    year_formed INT,
    PRIMARY KEY (artist_id)
);
```

We choose to use an `artist_id` because artists may share the same name, and an integer (counting) id is well suited for `FOREIGN KEY` usage.
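
To illustrate why the id matters, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The artist rows are invented for the example and are not taken from the lab data. Two artists with the same name can coexist because their `artist_id` values differ, while reusing an `artist_id` is rejected by the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
conn.execute("""
    CREATE TABLE Artist (
        artist_id INT,
        name varchar(100),
        genre varchar(100),
        year_formed INT,
        PRIMARY KEY (artist_id)
    )
""")

# Two hypothetical artists that share a name but have different ids.
conn.executemany(
    "INSERT INTO Artist VALUES (?, ?, ?, ?)",
    [(1, "The Examples", "Rock", 1990), (2, "The Examples", "Jazz", 2005)],
)
same_name = conn.execute(
    "SELECT artist_id, genre FROM Artist WHERE name = ? ORDER BY artist_id",
    ("The Examples",),
).fetchall()

# Reusing artist_id 1 violates the primary key.
try:
    conn.execute("INSERT INTO Artist VALUES (1, 'Another Act', 'Pop', 2010)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(same_name, duplicate_rejected)
```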


In [5]:
%%sql 
CREATE TABLE Artist (
    artist_id INT, 
    name varchar(100), 
    genre varchar(100), 
    year_formed INT,
    PRIMARY KEY (artist_id)
);


Done.


[]

### Create Table for Albums

Write the Create Table statement for the album table.
Ensure you use appropriate data types and column/table constraints as necessary.  

**Remember** to link the Albums records to the Artist records via a foreign key relationship.

## Albums Definition

Recall, an Album is recorded by an Artist.
Therefore, we expect the album to have a foreign key reference back to the Artist table.

```SQL
CREATE TABLE Album (
    album_id INT,
    artist_id INT, 
    title varchar(100), 
    year_produced INT,
    PRIMARY KEY (album_id),
    FOREIGN KEY (artist_id)
        REFERENCES Artist(artist_id)
);
```
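
One SQLite detail is worth knowing here: foreign key constraints are only enforced on connections where `PRAGMA foreign_keys = ON` has been run. The sketch below uses Python's built-in `sqlite3` module with an in-memory database and invented rows to show the constraint rejecting an album whose `artist_id` has no matching artist. Whether the notebook's own connection has the pragma enabled is not covered here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
# Enforcement is off by default and must be enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE Artist (
        artist_id INT, name varchar(100), genre varchar(100), year_formed INT,
        PRIMARY KEY (artist_id)
    )
""")
conn.execute("""
    CREATE TABLE Album (
        album_id INT, artist_id INT, title varchar(100), year_produced INT,
        PRIMARY KEY (album_id),
        FOREIGN KEY (artist_id) REFERENCES Artist(artist_id)
    )
""")

# Made-up rows: one artist and one album that refers to it.
conn.execute("INSERT INTO Artist VALUES (1, 'Example Artist', 'Rock', 1990)")
conn.execute("INSERT INTO Album VALUES (10, 1, 'Example Album', 1995)")

# artist_id 99 does not exist, so this insert should be refused.
try:
    conn.execute("INSERT INTO Album VALUES (11, 99, 'Orphan Album', 2000)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

album_count = conn.execute("SELECT COUNT(*) FROM Album").fetchone()[0]
print(orphan_rejected, album_count)
```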


In [6]:
%%sql
CREATE TABLE Album (
    album_id INT,
    artist_id INT, 
    title varchar(100), 
    year_produced INT,
    PRIMARY KEY (album_id),
    FOREIGN KEY (artist_id)
        REFERENCES Artist(artist_id)
);

Done.


[]

## Songs Definition

Albums have tracks, so we can number the tracks sequentially on the album.
We will work with the definition of Album given above, where the primary key is `album_id`.
Therefore, we expect the song to have a foreign key reference back to the Album table.

We can define the song to be a proper child of the album by making `album_id` the first part of the key structure for Song.

```SQL
CREATE TABLE Song (
    album_id INT,
    track INT,
    title varchar(100), 
    length varchar(20),
    PRIMARY KEY (album_id, track),
    FOREIGN KEY (album_id)
        REFERENCES Album(album_id)
);
```

Notice the primary key is a composite of `(album_id, track)`.
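
To see what the composite key `PRIMARY KEY (album_id, track)` in the statement above allows and rejects, here is a short sketch using Python's built-in `sqlite3` module. It assumes an in-memory database and invented song rows, and it leaves out the foreign key clause so that the example needs only the one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the sketch
# Same columns and key as Song; the FOREIGN KEY clause is omitted on purpose.
conn.execute("""
    CREATE TABLE Song (
        album_id INT,
        track INT,
        title varchar(100),
        length varchar(20),
        PRIMARY KEY (album_id, track)
    )
""")

# Track 1 may appear on two different albums: the pairs (1, 1) and (2, 1) differ.
conn.executemany(
    "INSERT INTO Song VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Opening", "3:05"),
        (1, 2, "Second Song", "4:10"),
        (2, 1, "Other Opening", "2:58"),
    ],
)

# A second track 1 on album 1 repeats the pair (1, 1) and is refused.
try:
    conn.execute("INSERT INTO Song VALUES (1, 1, 'Duplicate Track', '1:00')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

song_count = conn.execute("SELECT COUNT(*) FROM Song").fetchone()[0]
print(duplicate_rejected, song_count)
```

Neither column is unique on its own; only the combination has to be.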



In [7]:
%%sql
CREATE TABLE Song (
    album_id INT,
    track INT,
    title varchar(100),
    length varchar(20),
    PRIMARY KEY (album_id, track),
    FOREIGN KEY (album_id)
        REFERENCES Album(album_id)
);

Done.


[]

#### Note
You will add to this database and populate it with data in a practice.




# PLEASE SAVE YOUR NOTEBOOK