# Assignment 4. Database Design

## Objectives

This assignment has two parts.

* In Part 1, you will draw an E/R diagram (Task 1) and translate it into relational schemas (Task 2).
* In Part 2, you will practice key database normalization techniques (Tasks 3-5).

Download [A4.zip](A4.zip). Answer the questions in A4.ipynb.

## Part 1. Entity-Relationship Model (10 points)

You will design a database for SFU. This database will include information about departments, students, courses (and their offerings):

* Information about **students** includes their SID, name and age. The SID of a student is assumed to be unique, not shared by any other student. Each student is either a **graduate** or an **undergraduate**.
 - Each student must be in one category or the other, and cannot be in both categories simultaneously.
 - For graduate students, we record what their research field is.
 - For undergraduate students, we record their concentration.
 
 
 
* Information about **departments** includes their name and address. The name of a department is assumed to be unique, not shared by any other department.



* We need to be able to associate students with the departments with which they are affiliated. Each student has to be affiliated with exactly one department.



* Information about a course includes its number (e.g., "354"), name (e.g., "Introduction to Databases"), and capacity (e.g., 110). We also need to be able to know the unique department that owns each course: no cross-listing of courses across departments is allowed, and every course is owned by exactly one department.
 * Note: you cannot assume that course number uniquely identifies a course; in fact, you cannot assume even that course number together with course name uniquely identify a course. However, course number uniquely identifies courses within a department.
 
 
 
* We also need to record all terms (each identified by a semester, e.g., "fall", and a year, e.g., "2018") in which each course has been offered in the history of the university.



* Assume that for a course to be offered during a term, it has at least one student enrolled. Also a course is offered at most once during each term. In other words, a course cannot have multiple sections during one term.



* Finally, assume that a student can take courses “owned” by departments with which the student is not affiliated. And a student should be enrolled in at least one course.


Please note that the following two sentences are not constraints (you don't need to enforce them in your E/R diagram). They just tell you what the data might look like.

1. Assume that for a course to be offered during a term, it has at least one student enrolled.

2. Assume that a student can take courses “owned” by departments with which the student is not affiliated.


### Task 1: E/R Diagram (5 points) 

Render the SFU database in the version of the E/R model that we studied in class, with *exactly* the constraints and requirements specified above.


<img src="ER-diagram.png" alt="Drawing" style="width: 800px;"/>

### Task 2: From E/R Diagram to Relational Schemas (5 points)

Following the E/R diagram above, write SQL statements that create the required tables in `sfu.db`.

```python
%load_ext sql
%sql sqlite:///sfu.db
```

```sql
%%sql
CREATE TABLE Students(
    SID integer,
    name varchar(100),
    age integer,
    -- every student is affiliated with exactly one department
    Department_name varchar(100) NOT NULL,
    PRIMARY KEY(SID),
    FOREIGN KEY(Department_name) REFERENCES Department
);
```

```sql
%%sql
CREATE TABLE Undergraduate(
    SID integer,
    concentration varchar(100),
    PRIMARY KEY(SID),
    FOREIGN KEY(SID) REFERENCES Students
);
```

```sql
%%sql
CREATE TABLE Graduate(
    SID integer,
    research_field varchar(100),
    PRIMARY KEY(SID),
    FOREIGN KEY(SID) REFERENCES Students
);
```

```sql
%%sql
CREATE TABLE Department(
    name varchar(100),
    address varchar(200),
    PRIMARY KEY(name)
);
```

```sql
%%sql
CREATE TABLE Course(
    name varchar(100),
    number integer,
    capacity integer,
    Department_name varchar(100),
    PRIMARY KEY(number, Department_name),
    FOREIGN KEY(Department_name) REFERENCES Department
);
```

```sql
%%sql
CREATE TABLE Term(
    semester varchar(10),
    year integer,
    PRIMARY KEY(semester, year)
);
```

```sql
%%sql
CREATE TABLE Enrolled(
    SID integer,
    Department_name varchar(100),
    number integer,
    semester varchar(10),
    year integer,
    PRIMARY KEY(SID, Department_name, number, semester, year),
    FOREIGN KEY(number, Department_name) REFERENCES Course,
    FOREIGN KEY(semester, year) REFERENCES Term,
    FOREIGN KEY(SID) REFERENCES Students
);
```
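The DDL above can be exercised end to end with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: it re-declares the tables in compact form in an in-memory database, omits `Undergraduate` and `Graduate` for brevity, declares `Students.Department_name` as a foreign key to `Department`, and uses made-up rows (department `CMPT`, course 354, term fall 2018). SQLite only enforces `FOREIGN KEY` clauses when `PRAGMA foreign_keys = ON` has been issued on the connection, which the `%sql` cells above do not do.

```python
import sqlite3

# In-memory database for illustration only; table shapes follow Task 2.
conn = sqlite3.connect(":memory:")
# SQLite ignores FOREIGN KEY clauses unless this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Department(name varchar(100), address varchar(200), PRIMARY KEY(name));
CREATE TABLE Students(SID integer, name varchar(100), age integer,
    Department_name varchar(100) NOT NULL, PRIMARY KEY(SID),
    FOREIGN KEY(Department_name) REFERENCES Department);
CREATE TABLE Course(name varchar(100), number integer, capacity integer,
    Department_name varchar(100), PRIMARY KEY(number, Department_name),
    FOREIGN KEY(Department_name) REFERENCES Department);
CREATE TABLE Term(semester varchar(10), year integer, PRIMARY KEY(semester, year));
CREATE TABLE Enrolled(SID integer, Department_name varchar(100), number integer,
    semester varchar(10), year integer,
    PRIMARY KEY(SID, Department_name, number, semester, year),
    FOREIGN KEY(number, Department_name) REFERENCES Course,
    FOREIGN KEY(semester, year) REFERENCES Term,
    FOREIGN KEY(SID) REFERENCES Students);

-- Hypothetical sample rows, not real SFU data.
INSERT INTO Department VALUES ('CMPT', 'Burnaby');
INSERT INTO Students VALUES (1, 'Alice', 20, 'CMPT');
INSERT INTO Course VALUES ('Introduction to Databases', 354, 110, 'CMPT');
INSERT INTO Term VALUES ('fall', 2018);
""")

def try_enroll(row):
    """Insert one Enrolled row; return False if SQLite rejects it."""
    try:
        conn.execute("INSERT INTO Enrolled VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_enroll((1, "CMPT", 354, "fall", 2018)))  # True: student, course, term exist
print(try_enroll((1, "CMPT", 999, "fall", 2018)))  # False: no course CMPT 999
print(try_enroll((1, "CMPT", 354, "fall", 2018)))  # False: duplicate primary key
```

Because the key of `Enrolled` includes the term, the same student could take CMPT 354 again in another term, as long as that term is first recorded in `Term`.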

## Part 2. Normalization (10 points)

### Task 3. Decompose a relational schema into BCNF

Consider a relational schema and a set of functional dependencies: 

* $R(A,B,C,D,E)$ with functional dependencies $A \rightarrow E$, $BC \rightarrow A$, $DE \rightarrow B$

**Decompose $R(A,B,C,D,E)$ into BCNF. Show all of your work and explain, at each step, which dependency violations you are correcting. You have to write down a description of your decomposition steps. (2 points)**

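* R(A,B,C,D,E) with FDs: A -> E, BC -> A, DE -> B.
* Step 1: take X = A. A+ = {A,E}, which is neither {A} nor all of R, so A -> E violates BCNF. Split R into R1(A,E) (that is, A+) and R2(A,B,C,D) (that is, A together with the attributes not in A+).
  * R1(A,E) is in BCNF: its only nontrivial FD is A -> E, and A is a key of R1.
* Step 2: take X = BC in R2(A,B,C,D). BC+ = {A,B,C,E}, so BC -> A holds in R2, but BC is not a superkey of R2 because it does not determine D. Split R2 into R3(A,B,C) and R4(B,C,D).
  * R3(A,B,C) is in BCNF: BC is its key, and no other subset determines an attribute outside itself.
  * R4(B,C,D) is in BCNF: no nontrivial FD holds on it.
  * Closures are always computed with the original FDs, so R2 also had the violation AD -> B (AD+ = {A,B,D,E}). Splitting on BC -> A removes it as well, because no resulting relation contains A, B and D together.
* Result: three BCNF relations R1(A,E), R3(A,B,C) and R4(B,C,D).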
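The BCNF condition used in these steps can be checked mechanically. The following is a small Python sketch (the helper names `closure` and `bcnf_violation` are illustrative, not from any library): it computes attribute closures under the three given FDs and, for a candidate relation, searches for a subset X whose closure, restricted to the relation, adds attributes beyond X without covering the whole relation. Such an X is exactly a BCNF violation.

```python
from itertools import combinations

# FDs of Task 3, each as (left-hand side, right-hand side).
FDS = [({"A"}, {"E"}), ({"B", "C"}, {"A"}), ({"D", "E"}, {"B"})]

def closure(attrs, fds):
    """Attribute closure X+: keep applying FDs whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violation(relation, fds):
    """Return the first subset X (as a sorted tuple) violating BCNF, or None.

    X violates BCNF when X+ restricted to the relation adds something
    beyond X but does not cover the whole relation."""
    rel = set(relation)
    for size in range(1, len(rel)):
        for x in combinations(sorted(rel), size):
            reach = closure(x, fds) & rel
            if reach != set(x) and reach != rel:
                return x
    return None

for rel in ["AE", "ABC", "BCD"]:
    print(rel, bcnf_violation(rel, FDS))  # each line ends in None
print(bcnf_violation("ABCD", FDS))        # ('A', 'D')
```

For the intermediate relation (A,B,C,D) the check reports ('A', 'D'), because AD+ = {A,B,D,E}. Trying every subset is exponential in the number of attributes, which is acceptable for a five-attribute schema.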
  

### Task 4. Find a set of FDs that is consistent with a closed attribute set

A set of attributes $X$ is called closed (with respect to a given set of functional dependencies) if
$X^+=X$. Consider a relation with schema $R(A,B,C,D)$ and an unknown set of functional dependencies. For each closed attribute set below, give a set of functional dependencies that is consistent with it.

**a. All sets of attributes are closed (1 point)**
* A -> A
* B -> B
* C -> C 
* D -> D

**b. The only closed sets are $\{\}$ and $\{A,B,C,D\}$ (1 point)**
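* A -> B
* B -> C
* C -> D
* D -> A

Every single attribute reaches all the others around this cycle, so every nonempty set has closure {A,B,C,D}; {} stays closed because no FD has an empty left-hand side.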

**c. The only closed sets are $\{\}$, $\{A,B\}$, and $\{A,B,C,D\}$ (1 point)**
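* A -> B
* B -> A
* C -> D
* D -> C
* C -> A

A and B determine only each other, so {A,B} is closed while {A} and {B} are not; any set containing C or D reaches all four attributes.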
### Task 5. Normalize a database

Suppose Mike is the owner of a small store. He uses the following database ([mike.db](mike.db)) to store monthly sales of his store. 
* `Sales`(name, discount, month, price)

```python
%load_ext sql
%sql sqlite:///mike.db
```

```sql
%sql SELECT * FROM Sales s LIMIT 5
```

| name | discount | month | price |
|------|----------|-------|-------|
| bar1 | 0.15 | apr | 19 |
| bar8 | 0.15 | apr | 19 |
| gizmo3 | 0.15 | apr | 19 |
| gizmo7 | 0.15 | apr | 19 |
| mouse1 | 0.15 | apr | 19 |



However, Mike finds that the database is difficult to update (for example, when inserting new data). Your job is to help Mike normalize his database by completing the following steps (a-d):

**a.** Find all *nontrivial* functional dependencies in the database.
This is a reverse-engineering task, so expect to proceed by trial and error. Search first for simple dependencies, say $name \rightarrow discount$, then try more complex ones, like $name, discount \rightarrow month$, as needed. To check each candidate functional dependency, write a SQL query.

Your challenge is to write this SQL query for every candidate functional dependency that you check, such that:

 - the query's answer is always short (say, no more than ten lines; remember that an empty result can be instructive as well)

 - you can determine whether the FD holds or not by looking at the query's answer. Try to be clever in order not to check too many dependencies, but don't miss potential relevant dependencies. For example, if you have A → B and C → D, you do not need to derive AC → BD as well.
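One way to meet both requirements is to group by the left-hand side and keep only the groups with more than one distinct right-hand-side value: an empty answer means the FD holds on the current data, and any rows returned are counterexamples. Below is a minimal Python sketch of that idea using the standard `sqlite3` module; the helper `fd_violations` and the in-memory sample rows are hypothetical, made up for illustration rather than taken from `mike.db`. Column names are spliced into the SQL text, so it should only be given trusted identifiers.

```python
import sqlite3

def fd_violations(conn, table, lhs, rhs, limit=10):
    """Return up to `limit` LHS groups that map to more than one RHS value.

    An empty list means lhs -> rhs holds on the data currently in `table`.
    Identifiers are interpolated directly, so pass trusted names only."""
    cols = ", ".join(lhs)
    sql = (f"SELECT {cols} FROM {table} GROUP BY {cols} "
           f"HAVING COUNT(DISTINCT {rhs}) > 1 ORDER BY {cols} LIMIT {limit}")
    return conn.execute(sql).fetchall()

# Hypothetical toy data with the same columns as Sales.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales(name TEXT, discount REAL, month TEXT, price INTEGER)")
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    ("widget", 0.10, "jan", 5),
    ("widget", 0.20, "feb", 5),
    ("gadget", 0.10, "jan", 8),
    ("gadget", 0.20, "feb", 8),
])

print(fd_violations(conn, "Sales", ["name"], "price"))      # []: name -> price holds
print(fd_violations(conn, "Sales", ["month"], "discount"))  # []: month -> discount holds
print(fd_violations(conn, "Sales", ["name"], "discount"))  # [('gadget',), ('widget',)]
```

For a composite left-hand side, pass several columns, e.g. `fd_violations(conn, "Sales", ["name", "discount"], "month")`.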

**Write down all FDs that you found. (1 point)**

Nontrivial FDs found (both violate BCNF):

* name -> price
* month -> discount

**For each FD above, write down the SQL query that discovered it (remember short queries are preferred) (1 point)**

```sql
%sql SELECT name, price FROM Sales ORDER BY name, price;
```

| name | price |
|------|-------|
| bar1 | 19 |
| bar1 | 19 |
| bar1 | 19 |
| bar1 | 19 |
| bar1 | 19 |
| bar1 | 19 |
| bar1 | 19 |
| bar1 | 19 |
| bar1 | 19 |
| bar1 | 19 |


```sql
%sql SELECT month, discount FROM Sales ORDER BY month, discount;
```

| month | discount |
|-------|----------|
| apr | 0.15 |
| apr | 0.15 |
| apr | 0.15 |
| apr | 0.15 |
| apr | 0.15 |
| apr | 0.15 |
| apr | 0.15 |
| apr | 0.15 |
| apr | 0.15 |
| apr | 0.15 |


**b. Decompose the `Sales` table into BCNF. Like Task 3, show a description of your decomposition steps. (1 point)**

* Sales(name, discount, month, price) with FDs: name -> price, month -> discount.
* Step 1: take X = name. name+ = {name, price}, which is neither {name} nor all of Sales, so name -> price violates BCNF. Split Sales into R1(name, price) (that is, name+) and R2(name, discount, month) (that is, name together with the attributes not in name+).
  * R1(name, price) is in BCNF: its only nontrivial FD is name -> price, and name is a key of R1.
* Step 2: apply the same procedure to R2(name, discount, month) and take X = month. month+ = {month, discount}, and month is not a superkey of R2 because it does not determine name, so month -> discount violates BCNF. Split R2 into R3(month, discount) and R4(name, month).
  * R3(month, discount) is in BCNF: month is its key.
  * R4(name, month) is in BCNF: no nontrivial FD holds on it.
* Result: three BCNF relations R1(name, price), R3(month, discount) and R4(name, month).
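The two steps above follow the standard recursive BCNF decomposition algorithm: find a set X whose closure, restricted to the current relation, is neither X nor the whole relation, split into X+ and X together with the attributes outside X+, and recurse. Below is a Python sketch under the two FDs found in part (a); the helper names are illustrative. Because it tries attributes in alphabetical order, it happens to split on month -> discount before name -> price, yet it ends with the same three relations.

```python
from itertools import combinations

# The two nontrivial FDs found for Sales, as (lhs, rhs) sets.
FDS = [({"name"}, {"price"}), ({"month"}, {"discount"})]
SALES = ["name", "discount", "month", "price"]

def closure(attrs, fds):
    """Attribute closure X+: repeatedly apply FDs whose left side is covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_decompose(relation, fds):
    """Split on the first BCNF-violating X found, then recurse on both halves."""
    rel = set(relation)
    for size in range(1, len(rel)):
        for x in combinations(sorted(rel), size):
            reach = closure(x, fds) & rel
            if reach != set(x) and reach != rel:
                # R1 = X+ within rel; R2 = X plus the attributes X+ does not reach.
                r1 = reach
                r2 = set(x) | (rel - reach)
                return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)
    return [rel]

for r in bcnf_decompose(SALES, FDS):
    print(sorted(r))
# ['discount', 'month']
# ['name', 'price']
# ['month', 'name']
```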

**c.  Write down SQL queries to create the BCNF tables in the [mike.db](mike.db). Create keys and foreign keys where appropriate. (1 point)**

```sql
%%sql
CREATE TABLE NamePrice(
    name varchar(100),
    price integer,
    -- name -> price, so name alone is the key (and can be referenced by NameMonth)
    PRIMARY KEY(name)
);
```

```sql
%%sql
CREATE TABLE MonthDiscount(
    month char(3),
    discount double,
    PRIMARY KEY(month)
);
```

```sql
%%sql
CREATE TABLE NameMonth(
    name varchar(100),
    month char(3),
    PRIMARY KEY(name, month),
    FOREIGN KEY(name) REFERENCES NamePrice,
    FOREIGN KEY(month) REFERENCES MonthDiscount
);
```

**d.  Populate the BCNF tables using the data from the sales table. (1 point)**

*Hint:* see [SQL INSERT INTO SELECT Statement](https://www.w3schools.com/sql/sql_insert_into_select.asp)

```sql
%%sql
INSERT INTO NamePrice(name, price)
SELECT DISTINCT name, price FROM Sales;
```

36 rows affected.

```sql
%%sql
INSERT INTO MonthDiscount(month, discount)
SELECT DISTINCT month, discount FROM Sales;
```

12 rows affected.

```sql
%%sql
INSERT INTO NameMonth(name, month)
SELECT DISTINCT name, month FROM Sales;
```

426 rows affected.
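After populating the tables, one way to confirm that the decomposition lost nothing is to join the three tables back together and compare the result with `Sales` in both directions using `EXCEPT`. Below is a self-contained Python sketch with the standard `sqlite3` module; the three-row `Sales` sample is made up for illustration, and against the real `mike.db` the same two `EXCEPT` queries could be run in `%%sql` cells instead. `EXCEPT` uses set semantics, so duplicate rows in `Sales` would not be detected by this check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sample of Sales (not the real mike.db contents).
CREATE TABLE Sales(name TEXT, discount REAL, month TEXT, price INTEGER);
INSERT INTO Sales VALUES
    ('widget', 0.10, 'jan', 5),
    ('widget', 0.20, 'feb', 5),
    ('gadget', 0.10, 'jan', 8);

-- BCNF tables, populated with INSERT INTO ... SELECT DISTINCT as in part (d).
CREATE TABLE NamePrice(name TEXT PRIMARY KEY, price INTEGER);
CREATE TABLE MonthDiscount(month TEXT PRIMARY KEY, discount REAL);
CREATE TABLE NameMonth(name TEXT, month TEXT, PRIMARY KEY(name, month));
INSERT INTO NamePrice SELECT DISTINCT name, price FROM Sales;
INSERT INTO MonthDiscount SELECT DISTINCT month, discount FROM Sales;
INSERT INTO NameMonth SELECT DISTINCT name, month FROM Sales;
""")

# Reassemble Sales from the three BCNF tables (same column order as Sales).
REJOIN = """
SELECT nm.name, md.discount, nm.month, np.price
FROM NameMonth nm
JOIN NamePrice np ON np.name = nm.name
JOIN MonthDiscount md ON md.month = nm.month
"""

def rows_only_in(left, right):
    """Count rows produced by `left` but not by `right` (set difference)."""
    sql = f"SELECT COUNT(*) FROM ({left} EXCEPT {right})"
    return conn.execute(sql).fetchone()[0]

missing = rows_only_in("SELECT * FROM Sales", REJOIN)  # rows lost by the decomposition
extra = rows_only_in(REJOIN, "SELECT * FROM Sales")    # spurious rows from the join
print(missing, extra)  # 0 0
```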

## Submission

Download [A4.zip](A4.zip). Answer the questions in A4.ipynb. Put `A4.ipynb`, `ER-diagram.png`, `sfu.db`, and the `mike.db` (with populated BCNF tables) into A4-submission.zip. 

Submit A4-submission.zip to the CourSys activity Assignment 4. 