# Assignment 4. Database Design

## Objectives

This assignment has two parts.

* In Part 1, you will draw an E/R diagram (Task 1) and transform it into relational schemas (Task 2).
* In Part 2, you will practice important database normalization techniques (Tasks 3-5).

Download [A4.zip](A4.zip). Answer the questions in A4.ipynb.

## Part 1. Entity-Relationship Model (10 points)

You will design a database for SFU. This database will include information about departments, students, courses (and their offerings):

* Information about **students** includes their SID, name and age. The SID of a student is assumed to be unique, not shared by any other student. Each student is either a **graduate** or an **undergraduate**.
 - Each student must be in one category or the other, and cannot be in both categories simultaneously.
 - For graduate students, we record what their research field is.
 - For undergraduate students, we record their concentration.
 
 
 
* Information about **departments** includes their name and address. The name of a department is assumed to be unique, not shared by any other department.



* We need to be able to associate students with the departments with which they are affiliated. Each student has to be affiliated with exactly one department.



* Information about a course includes its number (e.g., "354"), name (e.g., "Introduction to Databases"), and capacity (e.g., 110). We also need to be able to know the unique department that owns each course: no cross-listing of courses across departments is allowed, and every course is owned by exactly one department.
 * Note: you cannot assume that course number uniquely identifies a course; in fact, you cannot assume even that course number together with course name uniquely identify a course. However, course number uniquely identifies courses within a department.
 
 
 
* Finally, we need to record all terms -- identified as semester (e.g., "fall") and year (e.g., "2018") -- in which each course has been offered in the history of the university.



* Assume that for a course to be offered during a term, it has at least one student enrolled. Also a course is offered at most once during each term. In other words, a course cannot have multiple sections during one term.



* Finally, assume that a student can take courses “owned” by departments with which the student is not affiliated. And a student should be enrolled in at least one course.


Please note that the following two sentences are not constraints (you don't need to enforce them in your ER-diagram). They just tell you what data might be like. 

1. Assume that for a course to be offered during a term, it has at least one student enrolled

2. Assume that a student can take courses “owned” by departments with which the student is not affiliated


### Task 1: E/R Diagram (5 points) 

Render the SFU database in the version of the E/R model that we studied in class, with *exactly* the constraints and requirements specified above.


<img src="ER-diagram.png" alt="Drawing" style="width: 800px;"/>

### Task 2: From E/R Diagram to Relational Schemas (5 points).

Following the E/R diagram above, write SQL statements that create the required tables in `sfu.db`.

In [1]:
%load_ext sql

%sql sqlite:///sfu.db

'Connected: @sfu.db'

In [2]:
%%sql
CREATE TABLE Student(SID INTEGER, name CHAR(50), age INTEGER, PRIMARY KEY (SID));

CREATE TABLE Undergraduate(concentration CHAR(50), SID INTEGER, PRIMARY KEY (SID), FOREIGN KEY (SID) REFERENCES Student);

CREATE TABLE Graduate(research_Field CHAR(50), SID INTEGER, PRIMARY KEY (SID), FOREIGN KEY (SID) REFERENCES Student);

CREATE TABLE Department(name CHAR(50), address CHAR(50), PRIMARY KEY (name));

CREATE TABLE Course(number INTEGER, name CHAR(50), capacity INTEGER, department_name CHAR(50), PRIMARY KEY (department_name, number),
                    FOREIGN KEY (department_name) REFERENCES Department);

CREATE TABLE Term(year CHAR(4), semester CHAR(50), PRIMARY KEY (year, semester));

CREATE TABLE is_in(SID INTEGER, name CHAR(50) NOT NULL, PRIMARY KEY (SID), FOREIGN KEY (SID) REFERENCES Student,
                            FOREIGN KEY (name) REFERENCES Department);

CREATE TABLE enrolled(SID INTEGER, number INTEGER, department_name CHAR(50), PRIMARY KEY (SID, number, department_name),
                        FOREIGN KEY (SID) REFERENCES Student,
                        FOREIGN KEY (department_name, number) REFERENCES Course(department_name, number));

CREATE TABLE offers(year CHAR(4), semester CHAR(50), department_name CHAR(50), number INTEGER, PRIMARY KEY (year, semester, department_name, number),
                        FOREIGN KEY (year, semester) REFERENCES Term(year, semester),
                        FOREIGN KEY (department_name, number) REFERENCES Course(department_name, number));

 * sqlite:///sfu.db
Done.


[]
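
A point worth checking when writing these tables: SQLite parses `FOREIGN KEY` clauses but only enforces them on connections where `PRAGMA foreign_keys = ON` has been issued, and a two-column key such as (department, course number) has to be referenced by a single composite `FOREIGN KEY` clause. The following is a minimal sketch using Python's built-in `sqlite3` module on a throwaway in-memory database (not `sfu.db`). It uses a simplified subset of the tables, and the sample rows ('CMPT', 'Burnaby') are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # throwaway database, not sfu.db
conn.execute("PRAGMA foreign_keys = ON")    # SQLite does not enforce FKs by default

conn.executescript("""
CREATE TABLE Department(name CHAR(50), address CHAR(50), PRIMARY KEY (name));
CREATE TABLE Course(number INTEGER, name CHAR(50), capacity INTEGER, department_name CHAR(50),
                    PRIMARY KEY (department_name, number),
                    FOREIGN KEY (department_name) REFERENCES Department);
CREATE TABLE Term(year CHAR(4), semester CHAR(50), PRIMARY KEY (year, semester));
CREATE TABLE offers(year CHAR(4), semester CHAR(50), department_name CHAR(50), number INTEGER,
                    PRIMARY KEY (year, semester, department_name, number),
                    FOREIGN KEY (year, semester) REFERENCES Term(year, semester),
                    FOREIGN KEY (department_name, number) REFERENCES Course(department_name, number));
""")

# Illustrative rows only.
conn.execute("INSERT INTO Department VALUES ('CMPT', 'Burnaby')")
conn.execute("INSERT INTO Course VALUES (354, 'Introduction to Databases', 110, 'CMPT')")
conn.execute("INSERT INTO Term VALUES ('2018', 'fall')")
conn.execute("INSERT INTO offers VALUES ('2018', 'fall', 'CMPT', 354)")  # valid offering


def try_insert(row):
    """Return True if the offers row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO offers VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


ok_unknown_course = try_insert(("2018", "fall", "CMPT", 999))  # no such course
ok_duplicate = try_insert(("2018", "fall", "CMPT", 354))       # same course, same term
print(ok_unknown_course, ok_duplicate)
```

Both inserts are rejected: the first by the composite foreign key, the second by the primary key, which is what captures "a course is offered at most once during each term".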

## Part 2. Normalization (10 points)

### Task 3. Decompose a relational schema into BCNF

Consider a relational schema and a set of functional dependencies: 

* $R(A,B,C,D,E)$ with functional dependencies $A \rightarrow E$, $BC \rightarrow A$, $DE \rightarrow B$

**Decompose $R(A,B,C,D,E)$ into BCNF. Show all of your work and explain, at each step, which dependency violations you are correcting. You have to write down a description of your decomposition steps. (2 points)**

A relation is in Boyce-Codd Normal Form (BCNF) if it has no "bad" functional dependencies (FDs).

For every nontrivial FD X -> Y that holds in the relation, the left-hand side X must be a superkey, i.e. {X}+ must contain every attribute of the relation. An FD whose left-hand side is a superkey is a "good" FD; any other nontrivial FD is a "bad" FD.

1. In R(A,B,C,D,E), A -> E is a bad FD: A is not a superkey, because not all attributes can be derived from A.

2. Specifically, {A}+ = {A,E}, which does not equal {A,B,C,D,E}.

3. Decompose R(A,B,C,D,E) into R1(A,E) and R2(A,B,C,D).

4. In R1(A,E), A is a key and determines E, so A -> E is a good FD and R1 is in BCNF.

5. In R2(A,B,C,D), BC -> A is a bad FD because BC is not a superkey of R2: it does not determine D.

6. The closure is {B,C}+ = {A,B,C,E}, which restricted to R2 is {A,B,C} and does not equal {A,B,C,D}. So we decompose R2(A,B,C,D) into R21(A,B,C) and R22(B,C,D). In R21, BC is a key; R22 has no nontrivial FDs. Both are in BCNF, so the final decomposition is R1(A,E), R21(A,B,C), and R22(B,C,D).
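
The closures used in these steps can be checked mechanically. The sketch below is one possible way to do it, not part of the assignment: `closure` and `bcnf_violation` are hypothetical helper names, and attributes are written as single-character strings. A relation S is in BCNF exactly when, for every subset X of S, the closure of X restricted to S is either X itself or all of S.

```python
from itertools import combinations

# FDs of R(A,B,C,D,E), written as (left-hand side, right-hand side).
FDS = [("A", "E"), ("BC", "A"), ("DE", "B")]


def closure(attrs, fds):
    """Return the closure of attrs under fds as a set of attribute names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violation(relation, fds):
    """Return one X whose closure within relation is neither X nor relation, else None."""
    rel = set(relation)
    for size in range(1, len(rel)):
        for subset in combinations(sorted(rel), size):
            x = set(subset)
            x_plus = closure(x, fds) & rel   # project the closure onto the relation
            if x_plus != x and x_plus != rel:
                return "".join(subset)
    return None


print(sorted(closure("A", FDS)))                 # closure of A in R
print(sorted(closure("BC", FDS) & set("ABCD")))  # closure of BC restricted to R2
for name, rel in [("R", "ABCDE"), ("R1", "AE"), ("R21", "ABC"), ("R22", "BCD")]:
    print(name, "violation:", bcnf_violation(rel, FDS))
```

Note that `bcnf_violation` returns the first violating left-hand side it finds, which need not be the one chosen in a hand-written decomposition. Picking a different violation can lead to a different, equally valid BCNF decomposition.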

### Task 4. Find a set of FDs that is consistent with a closed attribute set

A set of attributes $X$ is called closed (with respect to a given set of functional dependencies) if
$X^+=X$. Consider a relation with schema $R(A,B,C,D)$ and an unknown set of functional dependencies. For each closed attribute set below, give a set of functional dependencies that is consistent with it.

**a. All sets of attributes are closed (1 point)**
   * A -> A
   * B -> B
   * C -> C
   * D -> D

**b. The only closed sets are $\{\}$ and $\{A,B,C,D\}$ (1 point)**
* A -> B
* B -> C
* C -> D
* D -> A

**c. The only closed sets are $\{\}$, $\{A,B\}$, and $\{A,B,C,D\}$ (1 point)**
* A -> B
* B -> A
* C -> BD
* D -> AC
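
A candidate FD set for this task can be checked by brute force, since $R(A,B,C,D)$ has only 16 attribute subsets. The sketch below is a hypothetical helper, not part of the assignment. It lists every $X$ with $X^+ = X$ for three FD sets: no FDs at all, the cycle A -> B -> C -> D -> A, and a set in which A and B determine each other while C and D each determine everything.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def closed_sets(attributes, fds):
    """List every subset X of attributes with X+ == X, each as a sorted string."""
    found = []
    for size in range(len(attributes) + 1):
        for subset in combinations(sorted(attributes), size):
            if closure(subset, fds) == set(subset):
                found.append("".join(subset))
    return found


no_fds = []                                                   # nothing nontrivial holds
cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
pair_then_all = [("A", "B"), ("B", "A"), ("C", "BD"), ("D", "AC")]

print(len(closed_sets("ABCD", no_fds)))      # every subset is closed
print(closed_sets("ABCD", cycle))
print(closed_sets("ABCD", pair_then_all))
```

The empty string in the output stands for the empty attribute set $\{\}$.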

### Task 5. Normalize a database

Suppose Mike is the owner of a small store. He uses the following database ([mike.db](mike.db)) to store monthly sales of his store. 
* `Sales`(name, discount, month, price)

In [3]:
%load_ext sql
%sql sqlite:///mike.db

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


u'Connected: @mike.db'

In [4]:
%sql select * from Sales limit 5

 * sqlite:///mike.db
   sqlite:///sfu.db
Done.


name,discount,month,price
bar1,0.15,apr,19
bar8,0.15,apr,19
gizmo3,0.15,apr,19
gizmo7,0.15,apr,19
mouse1,0.15,apr,19



However, Mike finds that the database is difficult to update (i.e., when inserting new data into the database). Your job is to help Mike normalize his database by completing the following steps (a-d):

**a.** Find all *nontrivial* functional dependencies in the database.
This is a reverse engineering task, so expect to proceed in a trial and error fashion. Search first for the simple dependencies, say $name \rightarrow discount$ then try the more complex ones, like $name, discount \rightarrow month$, as needed. To check each functional dependency you have to write a SQL query.

Your challenge is to write this SQL query for every candidate functional dependency that you check, such that:

 - the query's answer is always short (say: no more than ten lines - remember that 0 results can be instructive as well)

 - you can determine whether the FD holds or not by looking at the query's answer. Try to be clever in order not to check too many dependencies, but don't miss potential relevant dependencies. For example, if you have A → B and C → D, you do not need to derive AC → BD as well.
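
One way to meet both requirements is to group by the left-hand side and count distinct right-hand-side values: any group with more than one value is a counterexample, and an empty result means the FD holds on the current data. The sketch below illustrates the idea with Python's built-in `sqlite3` module on an in-memory table. The four rows are made up and are not taken from `mike.db`, and `fd_violations` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # toy data, not mike.db
conn.execute("CREATE TABLE Sales(name TEXT, discount REAL, month TEXT, price INTEGER)")
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    ("bar1", 0.15, "apr", 19),
    ("bar1", 0.33, "may", 19),
    ("gizmo3", 0.15, "apr", 29),
    ("gizmo3", 0.33, "may", 29),
])


def fd_violations(lhs, rhs):
    """Return LHS groups mapped to more than one RHS value (empty list: the FD holds).

    Column names are interpolated directly, so they must come from trusted input.
    """
    lhs_cols = ", ".join(lhs)
    query = (
        f"SELECT {lhs_cols}, COUNT(DISTINCT {rhs}) FROM Sales "
        f"GROUP BY {lhs_cols} HAVING COUNT(DISTINCT {rhs}) > 1 LIMIT 10"
    )
    return conn.execute(query).fetchall()


print(fd_violations(["name"], "price"))      # empty: name -> price holds on the toy rows
print(fd_violations(["month"], "discount"))  # empty: month -> discount holds on the toy rows
print(fd_violations(["name"], "discount"))   # non-empty: name -> discount fails on the toy rows
```

The `LIMIT 10` keeps the answer short even when an FD fails for many groups. The same `GROUP BY ... HAVING` query can be pasted into a `%%sql` cell against the real table.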

**Write down all FDs that you found. (1 point)**

* name -> discount
* name,discount,month -> price
* name,discount -> month
* name -> price
* name,price -> month
* name,month,price -> discount
* name,month -> discount
* name,month -> price
* name,discount -> price
* month -> discount
* month,price -> discount


**For each FD above, write down the SQL query that discovered it (remember short queries are preferred) (1 point)**

In [5]:
%%sql

-- Checks month -> discount: the FD holds if this query returns no rows.
SELECT * FROM Sales s1, Sales s2
WHERE s1.month = s2.month
AND s1.discount != s2.discount

In [None]:
%%sql

-- Checks month,price -> discount: the FD holds if this query returns no rows.
SELECT * FROM Sales s1, Sales s2
WHERE s1.month = s2.month AND s1.discount != s2.discount
AND s1.price = s2.price

**b. Decompose the `Sales` table into BCNF. Like Task 3, show a description of your decomposition steps. (1 point)**

* We start with R(name,discount,month,price) and focus on the closures {name}+ = {name,price} and {month}+ = {month,discount}.
* name -> price is a bad FD because {name}+ = {name,price} does not equal R, so we decompose R(name,discount,month,price) into R1(name,discount,month) and R2(name,price).
* R2(name,price) is in BCNF because name is its key.
* In R1(name,discount,month), month -> discount is a bad FD because {month}+ = {month,discount} does not equal R1. One side of the split is R11(month,discount); the other is month plus the remaining attributes, R12(name,month).
* R11(month,discount) is in BCNF because month is its key.
* R12(name,month) needs no further decomposition.
* Finally, R(name,discount,month,price) is decomposed into R11(month,discount), R12(name,month), and R2(name,price).
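
The steps above can be sketched as a small recursive procedure. This is an illustrative implementation of the textbook BCNF decomposition, under the assumption that name -> price and month -> discount are the only FDs that matter for `Sales`. The helper names `closure` and `bcnf_decompose` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs set, rhs set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_decompose(relation, fds):
    """Split relation on the first BCNF violation found, recursively."""
    relation = frozenset(relation)
    for size in range(1, len(relation)):
        for subset in combinations(sorted(relation), size):
            x = set(subset)
            x_plus = closure(x, fds) & relation  # closure projected onto this relation
            if x_plus != x and x_plus != relation:
                # X is not a superkey: split into X+ and X union (relation minus X+).
                left = frozenset(x_plus)
                right = frozenset(x | (relation - x_plus))
                return bcnf_decompose(left, fds) + bcnf_decompose(right, fds)
    return [relation]  # no violation: already in BCNF


SALES = {"name", "discount", "month", "price"}
FDS = [({"name"}, {"price"}), ({"month"}, {"discount"})]  # assumed FDs

tables = bcnf_decompose(SALES, FDS)
for table in tables:
    print(sorted(table))
```

The procedure may pick violations in a different order than a hand-written derivation, so the intermediate relations can differ even when the final set of tables is the same.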

**c.  Write down SQL queries to create the BCNF tables in the [mike.db](mike.db). Create keys and foreign keys where appropriate. (1 point)**

In [6]:
%%sql
PRAGMA table_info(Sales)

In [None]:
%%sql

CREATE TABLE monthDiscount(month varchar(3),discount float,
                           PRIMARY KEY(month))

In [None]:
%%sql

CREATE TABLE namePrice (name varchar(50),price int,
                           PRIMARY KEY(name))

In [None]:
%%sql

CREATE TABLE nameMonth(name varchar(50),month varchar(3),
                           PRIMARY KEY(name, month),
                           FOREIGN KEY(name) references namePrice(name),
                           FOREIGN KEY(month) references monthDiscount(month)
                      )

**d.  Populate the BCNF tables using the data from the sales table. (1 point)**

*Hint:* see [SQL INSERT INTO SELECT Statement](https://www.w3schools.com/sql/sql_insert_into_select.asp)

In [1]:
%%sql

INSERT INTO monthDiscount
SELECT DISTINCT month,discount
FROM Sales



In [None]:
%%sql

INSERT INTO namePrice
SELECT DISTINCT name,price
FROM Sales

In [None]:
%%sql

INSERT INTO nameMonth
SELECT DISTINCT name,month
FROM Sales
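
After populating the tables, it is worth confirming that the decomposition is lossless, i.e. that joining the three tables reproduces the distinct rows of `Sales`. The sketch below shows one way to check this with Python's built-in `sqlite3` module. It runs on an in-memory database with made-up rows and simplified column types rather than on `mike.db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # toy data, not mike.db
conn.executescript("""
CREATE TABLE Sales(name TEXT, discount REAL, month TEXT, price INTEGER);
INSERT INTO Sales VALUES ('bar1', 0.15, 'apr', 19),
                         ('bar1', 0.33, 'may', 19),
                         ('gizmo3', 0.15, 'apr', 29);

CREATE TABLE monthDiscount(month TEXT PRIMARY KEY, discount REAL);
CREATE TABLE namePrice(name TEXT PRIMARY KEY, price INTEGER);
CREATE TABLE nameMonth(name TEXT, month TEXT, PRIMARY KEY (name, month));

INSERT INTO monthDiscount SELECT DISTINCT month, discount FROM Sales;
INSERT INTO namePrice SELECT DISTINCT name, price FROM Sales;
INSERT INTO nameMonth SELECT DISTINCT name, month FROM Sales;
""")

# Rebuild Sales by joining the three BCNF tables on their shared columns.
rejoined = conn.execute("""
    SELECT nm.name, md.discount, nm.month, np.price
    FROM nameMonth nm
    JOIN namePrice np ON np.name = nm.name
    JOIN monthDiscount md ON md.month = nm.month
""").fetchall()

original = conn.execute(
    "SELECT DISTINCT name, discount, month, price FROM Sales"
).fetchall()

lossless = sorted(rejoined) == sorted(original)
print(lossless)
```

If an assumed FD did not actually hold in the data, the `INSERT INTO ... SELECT DISTINCT` into a table with a primary key would fail with a uniqueness error, which is itself a useful signal.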

## Submission

Download [A4.zip](A4.zip). Answer the questions in A4.ipynb. Put `A4.ipynb`, `ER-diagram.png`, `sfu.db`, and `mike.db` (with populated BCNF tables) into A4-submission.zip.

Submit A4-submission.zip to the CourSys activity Assignment 4. 