# HW3.  Database Design

### Objectives
In this assignment, you will practice:

How to use Design Theory to refine a database you have designed.

How to use Normalization Theory, in particular:

* How to determine which Functional Dependencies (FDs) are implied.

* How to determine which Inclusion Dependencies (INDs) are implied.

* How to find all candidate keys and prime attributes of the schema.

* How to determine whether a given schema is in BCNF.



Useful resources for writing math in the Jupyter notebook:

https://towardsdatascience.com/write-markdown-latex-in-the-jupyter-notebook-10985edb91fd

## Q1 (10 points) 

Imagine we want to build a music database with the following characteristics:
 - An **artist** is known by their name. We also keep an artist's *genre*, *hometown*, *bio*, and *homepage* in the database.
 - An **album** has an artist. It is produced by a *recording company*.
 - An album is known by *name* of the album and the *name* of its artist. We also keep *year*, *number of tracks* (at least one), and the *recording studio* for an album.
 - An album has songs on the album. 
 - A **recording company** is known by its name. We also keep *address*, *homepage*, and *telephone number* for a *recording company*.
 - A **song** is known by its *name*, *name* of its artist, and the *album* it is part of. We also keep *length* and *track number* for the song. A song might have *guest musicians*. A song may have a *tablature*.
 - A **tablature** is known by the *URL*. We keep *date*, *transcriber*, and transcriber *email* for a tablature.
 - A **musician** is known by their *name*. A musician should have an *instrument*. We also keep *hometown* for a musician.
 - Musicians and an artist can be in a group.
 - Artists might influence a musician.
 
Design the ERM to capture this database. Please note that you do not need to submit your ERM design, but you need to use it to answer the following questions.

Q1.1 (3 points) Please list your schemas (not create table statements). Follow the style of the bank schema in HW2 for your answers.

PK: Primary Key, FK: Foreign Key


Schema for the Artist entity:
Artist (ArtistID PK, Name, Genre, Hometown, Bio, Homepage)

Schema for the Album entity:
Album (AlbumID PK, Name, Year, NumberOfTracks, RecordingStudio, ArtistID FK, CompanyID FK)

Schema for the RecordingCompany entity:
RecordingCompany (CompanyID PK, Name, Address, Homepage, Telephone)

Schema for the Song entity:
Song (SongID PK, Name, Length, TrackNumber, AlbumID FK)

Schema for the Tablature entity:
Tablature (TablatureID PK, URL, Date, Transcriber, TranscriberEmail, SongID FK)

Schema for the Musician entity:
Musician (MusicianID PK, Name, Instrument, Hometown)

Schema for the Group entity:
Group (GroupID PK)

Schema for the Influence entity:
Influence (InfluenceID PK)

Q1.2 (2 points) Ensure your relations in Q1.1 are in BCNF. Modify your answer for Q1.1 if necessary.

Schema for the Artist entity:
Artist (ArtistID PK, Name, Genre, Hometown, Bio, Homepage)

Schema for the Album entity:
Album (AlbumID PK, Name, Year, NumberOfTracks, RecordingStudio, ArtistID FK)

Schema for the RecordingCompany entity:
RecordingCompany (CompanyID PK, Name, Address, Homepage, Telephone)

Schema for the Song entity:
Song (SongID PK, Name, Length, TrackNumber, AlbumID FK)

Schema for the Tablature entity:
Tablature (TablatureID PK, URL, Date, Transcriber, TranscriberEmail, SongID FK)

Schema for the Musician entity:
Musician (MusicianID PK, Name, Instrument, Hometown)

Schema for the Group entity:
Group (GroupID PK)

Schema for the Influence entity:
Influence (InfluenceID PK)

Q1.3 (5 points) Please list the functional dependencies in your relations, based on your answer in Q1.2.

Functional dependencies in the Artist relation:
ArtistID -> Name, Genre, Hometown, Bio, Homepage

Functional dependencies in the Album relation:
AlbumID -> Name, Year, NumberOfTracks, RecordingStudio, ArtistID

Functional dependencies in the RecordingCompany relation:
CompanyID -> Name, Address, Homepage, Telephone

Functional dependencies in the Song relation:
SongID -> Name, Length, TrackNumber, AlbumID

Functional dependencies in the Tablature relation:
TablatureID -> URL, Date, Transcriber, TranscriberEmail, SongID

Functional dependencies in the Musician relation:
MusicianID -> Name, Instrument, Hometown

Functional dependencies in the Group relation:
No non-trivial functional dependencies (GroupID is the only attribute)

Functional dependencies in the Influence relation:
No non-trivial functional dependencies (InfluenceID is the only attribute)

## Q2 (12 points)

Consider the following set of FDs:

$D \rightarrow AC$

$AB \rightarrow DE$

$FD \rightarrow E$

$C \rightarrow F$           

Q2.1 (6 points) Determine whether each of the following FDs is implied by the FDs above:

 (a) $EF \rightarrow BC$
 
 (b) $BC \rightarrow BF$
 
 (c) $DEF \rightarrow AB$
 
 (d) $DF \rightarrow AE$
 
 (e) $CD \rightarrow ED$
 
 (f) $DE \rightarrow AF$


a) $ \{BC\} \not\subseteq C_{\sum}(EF) = EF$ (no FD has a left-hand side contained in $EF$), so $\sum \not\models EF \rightarrow BC$

b) $ \{BF\} \subseteq C_{\sum}(BC) = BCF$, so $\sum \models BC \rightarrow BF$

c) $ \{AB\} \not\subseteq C_{\sum}(DEF) = DEFAC$, so $\sum \not\models DEF \rightarrow AB$

d) $ \{AE\} \subseteq C_{\sum}(DF) = ACDFE$, so $\sum \models DF \rightarrow AE$

e) $ \{ED\} \subseteq C_{\sum}(CD) = CDFAE$, so $\sum \models CD \rightarrow ED$

f) $ \{AF\} \subseteq C_{\sum}(DE) = DEACF$, so $\sum \models DE \rightarrow AF$
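
The closures above can be checked mechanically. The following is a minimal Python sketch and not part of the assignment; the helper names `closure` and `implies`, and the encoding of attribute sets as strings, are assumptions made for illustration.

```python
# FDs from Q2, encoded as (left-hand side, right-hand side) attribute strings.
FDS = [("D", "AC"), ("AB", "DE"), ("FD", "E"), ("C", "F")]

def closure(attrs, fds):
    """Return the attribute closure of `attrs` under `fds` as a set."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # An FD fires once its whole left-hand side is in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """X -> Y is implied exactly when Y is contained in the closure of X."""
    return set(rhs) <= closure(lhs, fds)

# The six candidate FDs from Q2.1.
CANDIDATES = {"a": ("EF", "BC"), "b": ("BC", "BF"), "c": ("DEF", "AB"),
              "d": ("DF", "AE"), "e": ("CD", "ED"), "f": ("DE", "AF")}

for label, (lhs, rhs) in CANDIDATES.items():
    shown = "".join(sorted(closure(lhs, FDS)))
    print(label, lhs, "->", rhs, "closure:", shown, "implied:", implies(FDS, lhs, rhs))
```

The loop prints each closure with its attributes in alphabetical order, so the sets can be compared directly with the hand-computed ones.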





Q2.2 (6 points) For each of the FDs in Q2.1 that are implied, give a derivation using Armstrong's axioms.


b) 1. $C \rightarrow F$ given in $\Sigma $  

2. $ BC \rightarrow BF$ Augmentation on 1 $\newline$

d) 1. $ D \rightarrow AC$ $\quad$ given in $\Sigma $ 

2. $ DF \rightarrow ACF$ $\quad$  Augmentation on 1

3. $ FD \rightarrow E$ $\quad$ given in $\Sigma $

4. $ DF \rightarrow ACFE$ $\quad$ union on 2,3

5. $ DF \rightarrow AE$ $\quad$ Decomposition on 4 $\newline$

e) 1. $ C \rightarrow F$ $\quad$ given in $\Sigma $ 

2. $ CD \rightarrow FD$ $\quad$ Augmentation on 1

3. $ FD \rightarrow E$ $\quad$ given in $\Sigma $

4. $ CD \rightarrow E$ $\quad$ Transitivity on 2,3

5. $ CD \rightarrow ED$ $\quad$ Augmentation on 4

f) 1. $ D \rightarrow AC$ $\quad$ given in $\Sigma $ 

2. $ D \rightarrow A$ and $ D \rightarrow C$ $\quad$ Decomposition on 1

3. $ C \rightarrow F$ $\quad$ given in $\Sigma $ 

4. $ D \rightarrow F$ $\quad$ Transitivity on 2 ($D \rightarrow C$), 3

5. $ D \rightarrow AF$ $\quad$ Union on 2 ($D \rightarrow A$), 4

6. $ DE \rightarrow D$ $\quad$ Reflexivity

7. $ DE \rightarrow AF$ $\quad$ Transitivity on 6,5
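
The steps labelled union and decomposition in the derivations above are derived rules rather than Armstrong's axioms themselves (reflexivity, augmentation, transitivity). As a short worked justification, for arbitrary attribute sets $X$, $Y$, $Z$ (symbols introduced here for illustration), both rules follow from the three axioms.

Union, from $X \rightarrow Y$ and $X \rightarrow Z$:

1. $ X \rightarrow Y$ $\quad$ given

2. $ X \rightarrow XY$ $\quad$ Augmentation on 1 with $X$

3. $ X \rightarrow Z$ $\quad$ given

4. $ XY \rightarrow YZ$ $\quad$ Augmentation on 3 with $Y$

5. $ X \rightarrow YZ$ $\quad$ Transitivity on 2,4

Decomposition, from $X \rightarrow YZ$:

1. $ X \rightarrow YZ$ $\quad$ given

2. $ YZ \rightarrow Y$ $\quad$ Reflexivity

3. $ X \rightarrow Y$ $\quad$ Transitivity on 1,2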






## Q3 (4 points)

Let $R$, $S$ and $T$ be relations on attributes $A,B,C$. Given the following
  set of INDs:
  
   $R[A,B] \subseteq S[B,C] $
   
   $ S[B,C] \subseteq T[C,A]$
   
  Determine which of the following INDs are implied:  
  (a) $R[C] \subseteq T[C]$
  
  (b) $R[B] \subseteq T[C]$
  
  (c) $R[A] \subseteq T[C]$
  
  (d) $R[C] \subseteq T[A]$
  

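a) Not implied. The given INDs constrain only $R[A]$ and $R[B]$; no IND in the set mentions $R[C]$, so $R[C] \subseteq T[C]$ cannot be derived.

b) Not implied. Transitivity on the two given INDs yields $R[A,B] \subseteq T[C,A]$, and projecting onto the second position gives $R[B] \subseteq T[A]$, not $R[B] \subseteq T[C]$.

c) Implied:

1. $R[A,B] \subseteq T[C,A]$ $\quad$ Transitivity on the two given INDs

2. $R[A] \subseteq T[C]$ $\quad$ Projection on 1

d) Not implied. As in (a), no IND in the set mentions $R[C]$, so $R[C] \subseteq T[A]$ cannot be derived.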
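
The same reasoning can be sketched in Python for unary INDs. This is an illustrative sketch only: the function names `unary_inds` and `implied` and the tuple encoding are assumptions, and the approach covers only single-attribute targets derived by projection and transitivity, with no FDs involved.

```python
# Each IND R[X] <= S[Y] is encoded as ((relation, attributes), (relation, attributes)).
INDS = [(("R", "AB"), ("S", "BC")), (("S", "BC"), ("T", "CA"))]

def unary_inds(inds):
    """Project every IND position by position, then close under transitivity."""
    edges = set()
    for (r, xs), (s, ys) in inds:
        for x, y in zip(xs, ys):
            # Projection: the i-th attribute on the left maps to the i-th on the right.
            edges.add(((r, x), (s, y)))
    changed = True
    while changed:
        changed = False
        for a, b in list(edges):
            for c, d in list(edges):
                if b == c and (a, d) not in edges:
                    edges.add((a, d))
                    changed = True
    return edges

def implied(inds, left, right):
    """Check a unary IND such as ('R', 'A') <= ('T', 'C'); reflexive INDs always hold."""
    return left == right or (left, right) in unary_inds(inds)

QUERIES = {"a": (("R", "C"), ("T", "C")), "b": (("R", "B"), ("T", "C")),
           "c": (("R", "A"), ("T", "C")), "d": (("R", "C"), ("T", "A"))}

for label, (left, right) in QUERIES.items():
    print(label, implied(INDS, left, right))
```

Only query (c) is reported as implied, matching the derivation above; the sketch also finds $R[B] \subseteq T[A]$, the other projection of the transitive IND.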
  
    

## Q4  (10 points)

Consider the schema over attributes $A,B,C,D,E,F$ and the following set of FDs:

$EF \rightarrow BC $

$A \rightarrow D $

$B \rightarrow AE $

$BD \rightarrow C$


Q4.1 (6 points) Find all candidate keys and prime attributes of the schema.

Every key must contain F, because F does not appear on the right-hand side of any FD. It therefore suffices to check the closures of attribute sets that contain F.

$C_{\sum}(F) = F$, F not a key

$C_{\sum}(AF) = AFD$, AF not a key

$C_{\sum}(BF) = BFAEDC,$ BF is a candidate key

$C_{\sum}(CF) = CF$,  CF is not a key

$C_{\sum}(DF) = DF$, DF not a key

$C_{\sum}(EF) = EFBCAD$, EF is a candidate key


Extensions of the non-keys above:

$C_{\sum}(ACF) = ACFD$, ACF not a key

$C_{\sum}(CDF) = CDF$, CDF not a key

$C_{\sum}(ADF) = ADF$, ADF not a key

Further extensions of the non-keys above, excluding B and E: any set that contains BF or EF is a superset of a candidate key and is therefore not minimal.

$C_{\sum}(ACDF) = ACDF$, ACDF not key

Therefore, the candidate keys are BF and EF.

Prime attributes: B, E, F (the attributes that appear in at least one candidate key)
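
The search above can be reproduced by brute force. The following Python sketch is an illustration under assumptions (the names `closure` and `candidate_keys` and the string encoding are not from the assignment); it enumerates attribute subsets by increasing size and keeps those whose closure is the whole schema.

```python
from itertools import combinations

ATTRS = "ABCDEF"
# FDs from Q4, encoded as (left-hand side, right-hand side) attribute strings.
FDS = [("EF", "BC"), ("A", "D"), ("B", "AE"), ("BD", "C")]

def closure(attrs, fds):
    """Return the attribute closure of `attrs` under `fds` as a set."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure covers every attribute."""
    keys = []
    # Smaller subsets are visited first, so any superset of a found key is non-minimal.
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys

KEYS = candidate_keys(ATTRS, FDS)
# A prime attribute is one that occurs in at least one candidate key.
PRIME = set().union(*KEYS)
print(sorted("".join(sorted(key)) for key in KEYS), sorted(PRIME))
```

The enumeration is exponential in the number of attributes, which is acceptable for a six-attribute schema; restricting candidates to sets containing F, as in the hand computation, would shrink the search further.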

Q4.2 (4 points) Is the schema in BCNF? Justify your answer.

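This schema is not in BCNF. BCNF requires the left-hand side of every non-trivial FD to be a superkey, and three of the given FDs fail this test.

From the previous question, a set of attributes is a superkey only if it contains BF or EF. The left-hand sides $A$, $B$, and $BD$ contain neither: $C_{\sum}(A) = AD$ and $C_{\sum}(B) = C_{\sum}(BD) = ABCDE$, none of which includes F.

Therefore, BCNF is violated by the FDs $A \rightarrow D$, $B \rightarrow AE$, and $BD \rightarrow C$. Only $EF \rightarrow BC$ satisfies the condition, since EF is a candidate key.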
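
A BCNF test for the whole schema only needs to inspect the given FDs: if none of them violates the condition, no implied FD does either. The Python sketch below illustrates that check; the function names `closure` and `bcnf_violations` and the string encoding are assumptions introduced for this example.

```python
ATTRS = "ABCDEF"
# FDs from Q4, encoded as (left-hand side, right-hand side) attribute strings.
FDS = [("EF", "BC"), ("A", "D"), ("B", "AE"), ("BD", "C")]

def closure(attrs, fds):
    """Return the attribute closure of `attrs` under `fds` as a set."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Return the non-trivial FDs whose left-hand side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        superkey = closure(lhs, fds) == set(attrs)
        if not trivial and not superkey:
            violations.append((lhs, rhs))
    return violations

for lhs, rhs in bcnf_violations(ATTRS, FDS):
    print(lhs, "->", rhs, "violates BCNF")
```

An empty result would mean the schema is in BCNF; here the function reports the three FDs named above.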

## Submission

Complete the code in this notebook [hw3.ipynb](hw3.ipynb) and submit it to the Canvas activity Homework(3). Please note that you can insert additional cells if required for your answers.