### Introduction
Normalization is the process of analyzing how much redundancy a relation contains and decomposing the relation to reduce it. It builds on the theory of functional dependencies.

### Functional Dependencies
We consider the following relations:  

![Relations](https://i.imgur.com/d1ViNY0.png)

In the Student relation above, assume that priority is decided by GPA, so two tuples with the same GPA have the same priority. We can write this as:

$$\forall t,u \in Student$$
$$t.GPA = u.GPA \implies t.Priority = u.Priority$$

This states that for all tuples $t$ and $u$ in the Student relation, if the two tuples have the same GPA, then they also have the same priority. This can be written as:
$$GPA \rightarrow Priority$$

More generally,
$$\forall t,u \in R$$
$$t.A = u.A \implies t.B = u.B$$
$$A \rightarrow B$$

Extending this to multiple attributes,
$$\forall t,u \in R$$
$$t.[A_1, A_2, ..., A_n] = u.[A_1, A_2, ..., A_n] \implies t.[B_1, B_2, ..., B_m] = u.[B_1, B_2, ..., B_m]$$
$$A_1, A_2, ..., A_n \rightarrow B_1, B_2, ..., B_m$$
$$\overline{A} \rightarrow \overline{B}$$

The functional dependencies of a relation are based on real-world knowledge, and every instance of the relation must adhere to them. The meaning of $\overline{A} \rightarrow \overline{B}$ is made clearer by the table diagram below:

![FD](https://i.imgur.com/DQNSfYw.png)
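
The definition can be checked mechanically against a concrete instance. The sketch below is a minimal illustration, not part of the original material: the helper name `fd_holds` and the sample rows are hypothetical. Note that an instance can only refute an FD; a passing check does not prove that the FD holds for every possible instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[b] for b in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same lhs values, different rhs values
    return True


# Hypothetical sample of the Student relation (only a few attributes shown).
students = [
    {"SSN": "111", "Name": "Ann", "GPA": 3.9, "Priority": 1},
    {"SSN": "222", "Name": "Bob", "GPA": 3.9, "Priority": 1},
    {"SSN": "333", "Name": "Cid", "GPA": 3.2, "Priority": 2},
]

print(fd_holds(students, ["GPA"], ["Priority"]))  # True
print(fd_holds(students, ["Priority"], ["SSN"]))  # False
```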

Some FDs evident from the Student relation:  
$SSN \rightarrow Name$  
$SchoolCode \rightarrow SchoolName, SchoolCity$  
$SchoolName, SchoolCity \rightarrow SchoolCode$  
$SSN \rightarrow GPA$  
$GPA \rightarrow Priority$

Some FDs from the Application relation:  
$SSN, CollegeName \rightarrow Course$  
$SSN \rightarrow State$ if a student can apply to colleges in only one state

### Types of Functional Dependency
**Trivial FD:** A functional dependency $\overline{A} \rightarrow \overline{B}$ is trivial if $\overline{B} \subseteq \overline{A}$.  
![Trivial](https://i.imgur.com/jm222iV.png)  

**Non-Trivial FD:** A functional dependency $\overline{A} \rightarrow \overline{B}$ is non-trivial if $\overline{B} \nsubseteq \overline{A}$.

![Non](https://i.imgur.com/1rVqzNX.png)  

**Completely Non-Trivial FD:** A functional dependency $\overline{A} \rightarrow \overline{B}$ is completely non-trivial if $\overline{B} \cap \overline{A} = \emptyset$.  
![Complete Non](https://i.imgur.com/OvDv2FE.png)  
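
The three definitions reduce to simple set comparisons. As a rough sketch (the function name `classify_fd` is hypothetical and attribute sets are modeled as Python sets):

```python
def classify_fd(lhs, rhs):
    """Classify the FD lhs -> rhs using the set conditions given above."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"  # B is a subset of A
    if lhs & rhs:
        return "non-trivial"  # B is not a subset of A, but the two overlap
    return "completely non-trivial"  # A and B share no attribute


print(classify_fd({"SSN", "Name"}, {"Name"}))         # trivial
print(classify_fd({"SSN", "Name"}, {"Name", "GPA"}))  # non-trivial
print(classify_fd({"SSN"}, {"GPA"}))                  # completely non-trivial
```

Every completely non-trivial FD also satisfies the non-trivial condition; the function reports the most specific category.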

### Functional Dependency Rules
**Splitting:** We can split the right-hand side:
$$\overline{A} \rightarrow B_1, B_2, ..., B_m$$
$$\implies \overline{A} \rightarrow B_1, \overline{A} \rightarrow B_2, ...$$

Splitting the left-hand side, however, is generally not valid. For example,
$$SchoolName, SchoolCity \rightarrow SchoolCode$$
cannot be written as
$$SchoolName \rightarrow SchoolCode$$
$$SchoolCity \rightarrow SchoolCode$$  
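
A small counterexample makes this concrete. In the sketch below, the school rows and the helper `determines` are made up for illustration: two schools share a name but sit in different cities, so the combined FD holds while neither split FD does.

```python
def determines(rows, lhs, rhs):
    """True if each combination of lhs values appears with exactly one combination of rhs values."""
    pairs = {(tuple(r[a] for a in lhs), tuple(r[b] for b in rhs)) for r in rows}
    lhs_values = {lhs_part for lhs_part, _ in pairs}
    return len(pairs) == len(lhs_values)


# Hypothetical data: same school name in two cities, two schools in one city.
schools = [
    {"SchoolName": "Lincoln High", "SchoolCity": "Austin", "SchoolCode": "A1"},
    {"SchoolName": "Lincoln High", "SchoolCity": "Boston", "SchoolCode": "B7"},
    {"SchoolName": "Central High", "SchoolCity": "Austin", "SchoolCode": "A2"},
]

print(determines(schools, ["SchoolName", "SchoolCity"], ["SchoolCode"]))  # True
print(determines(schools, ["SchoolName"], ["SchoolCode"]))                # False
print(determines(schools, ["SchoolCity"], ["SchoolCode"]))                # False
```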

**Combining:** We can combine multiple FDs as one:
$$\overline{A} \rightarrow B_1$$
$$\overline{A} \rightarrow B_2$$
$$\vdots$$
$$\overline{A} \rightarrow B_m$$
$$\implies \overline{A} \rightarrow B_1, B_2, ..., B_m$$

**Transitive:** If we have
$$A \rightarrow B, B \rightarrow C$$
Then
$$A \rightarrow C$$

Using the rules above, we can compute the *closure of a set of attributes*: given a relation and a set of FDs, find all attributes $B$ such that $\overline{A} \rightarrow B$. This set is written $\overline{A}^{+}$. To compute the closure,
- start with the given set of attributes: $\{A_1, A_2, ..., A_n\}$
- repeat until the set stops changing: for every FD $\overline{X} \rightarrow \overline{Y}$ whose left-hand side $\overline{X}$ is entirely contained in the current set, add $\overline{Y}$ to the set.

As an example, consider the Student relation with the following functional dependencies:  
![Student FDs](https://i.imgur.com/2VRkOPC.png)  

Let us find the closure for the attributes SSN and SchoolCode. First we create a set with these two attributes  
$$\{SSN, SchoolCode\}$$
Then, based on the functional dependencies above, we add attributes to this set. First we add Name, Address and GPA, which are determined by SSN:
$$\{SSN, SchoolCode, Name, Address, GPA\}$$
Now we add Priority since $GPA \rightarrow Priority$
$$\{SSN, SchoolCode, Name, Address, GPA, Priority\}$$
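Finally we add SchoolName and SchoolCity since $SchoolCode \rightarrow SchoolName, SchoolCity$. No further attributes can be added, so
$$\{SSN, SchoolCode\}^{+} = \{SSN, SchoolCode, Name, Address, GPA, Priority, SchoolName, SchoolCity\}$$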
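
The closure procedure is a short fixed-point loop. The following is a minimal sketch, assuming FDs are represented as `(lhs, rhs)` pairs of sets; the function name `closure` and the FD list are assumptions transcribed from the walkthrough text rather than taken from the figure.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, where fds is a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            # Apply an FD only when its whole left-hand side is already in the set.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Assumed FDs for the Student relation, following the walkthrough above.
student_fds = [
    ({"SSN"}, {"Name", "Address", "GPA"}),
    ({"GPA"}, {"Priority"}),
    ({"SchoolCode"}, {"SchoolName", "SchoolCity"}),
]

print(sorted(closure({"SSN", "SchoolCode"}, student_fds)))
print(sorted(closure({"GPA"}, student_fds)))  # ['GPA', 'Priority']
```

The first call yields all eight attributes of the walkthrough, while the closure of GPA alone stops at GPA and Priority.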

### Closure and Keys
Once the closure is computed, we can see that in this case SSN and SchoolCode together functionally determine *all* the attributes of the relation. When this is the case, we say that SSN and SchoolCode together form a *key* for the relation.

Two questions come up often:
- Given $\overline{A}$, is it a key? Compute $\overline{A}^{+}$; if it contains all the attributes of the relation, then $\overline{A}$ is a key.
- Given a set of FDs, how do we find the keys? Enumerate the subsets of attributes, starting with the smallest, and compute the closure of each; a subset is a key if its closure contains every attribute.

**Superkey:** an attribute, or combination of attributes, that functionally determines all of the table's other attributes. Equivalently, a superkey is a set of attributes whose closure is the set of all attributes. Suppose that for a relation $R(A, B, C, D, E)$, the following FDs hold:
$$A \rightarrow B, BC \rightarrow E, ED \rightarrow A$$  
All superkeys are:
$$\{ABCDE, BCDE, ACDE, ABCD, ACD, BCD, CDE\}$$  

**Candidate Key:** a minimal superkey, that is, a superkey from which no attribute can be removed without losing the superkey property. In the above example, the candidate keys are:
$$\{ACD, BCD, CDE\}$$
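
Both lists can be reproduced by brute force: compute the closure of every non-empty subset of attributes, keep the subsets whose closure is the whole relation, and then filter out any that strictly contain another superkey. This is only a sketch with hypothetical helper names, assuming single-letter attribute names so that a string such as `"BC"` can stand for the set $\{B, C\}$; it is exponential in the number of attributes and meant for small examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def superkeys(all_attrs, fds):
    """All attribute subsets whose closure is the full attribute set, smallest first."""
    all_attrs = set(all_attrs)
    found = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            if closure(combo, fds) == all_attrs:
                found.append(frozenset(combo))
    return found


def candidate_keys(all_attrs, fds):
    """Superkeys that have no proper subset which is also a superkey."""
    sks = superkeys(all_attrs, fds)
    return [k for k in sks if not any(other < k for other in sks)]


fds = [("A", "B"), ("BC", "E"), ("ED", "A")]
print(["".join(sorted(k)) for k in superkeys("ABCDE", fds)])
# ['ACD', 'BCD', 'CDE', 'ABCD', 'ACDE', 'BCDE', 'ABCDE']
print(["".join(sorted(k)) for k in candidate_keys("ABCDE", fds)])
# ['ACD', 'BCD', 'CDE']
```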

**Primary Key:** one of the candidate keys, chosen to identify tuples. Its attributes must not contain null values.

### First Normal Form
The following rules must be fulfilled:
- The domain of an attribute must include only atomic (simple, indivisible) values.
- The value of any attribute in a tuple must be a single value from the domain of that attribute.

![1NF](https://i.imgur.com/Qi8WM04.png)  

Bringing a relation into first normal form this way, however, introduces redundancy in the data.
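
To see where the redundancy comes from, consider flattening a multi-valued attribute into one row per value. The sketch below uses made-up rows and a hypothetical helper `to_first_normal_form`; it is not taken from the figure above.

```python
def to_first_normal_form(rows, multi_attr):
    """Emit one row per value of the multi-valued attribute multi_attr."""
    flat = []
    for row in rows:
        for value in row[multi_attr]:
            new_row = dict(row)          # copy the single-valued attributes
            new_row[multi_attr] = value  # replace the list with one atomic value
            flat.append(new_row)
    return flat


# Hypothetical non-1NF data: Courses holds a list of values.
enrolments = [
    {"StudentID": 1, "StudentName": "Ann", "Courses": ["DB", "OS"]},
    {"StudentID": 2, "StudentName": "Bob", "Courses": ["DB"]},
]

for row in to_first_normal_form(enrolments, "Courses"):
    print(row)
```

After flattening, every value is atomic, but StudentName is now stored once per course taken, which is the kind of redundancy the higher normal forms address.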

### Second Normal Form
For a schema to be in second normal form, it must first be in first normal form. In addition, no non-prime attribute (an attribute that is not part of any candidate key) may be functionally dependent on a proper subset of a candidate key.

![Student](https://i.imgur.com/CFgmu0F.png)  

We can see that the following dependencies can be established:
$$StudentID \rightarrow StudentName, CourseID \rightarrow CourseName$$
The candidate key in this relation is:
$$\{StudentID, CourseID\}$$

The above relation violates second normal form because the non-prime attributes StudentName and CourseName each depend on only part of the candidate key:
1. $StudentID \rightarrow StudentName$
2. $CourseID \rightarrow CourseName$

So we break this one relation into two:  
![2NF 1](https://i.imgur.com/T71qtJB.png)

The second relation is still not in second normal form because CourseName is functionally dependent on CourseID, a proper subset of the candidate key $\{StudentID, CourseID\}$. So we need to decompose it further:  
![2NF 2](https://i.imgur.com/IoTdhvr.png)
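
Partial dependencies can be found by computing the closure of every proper subset of the candidate key and checking which non-prime attributes show up. The sketch below assumes a single candidate key and only the four attributes named in the text; the helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, all_attrs, fds):
    """List (key_subset, attribute) pairs where a non-prime attribute
    is determined by a proper, non-empty subset of the candidate key."""
    key = set(key)
    non_prime = set(all_attrs) - key  # valid only if key is the sole candidate key
    violations = []
    for size in range(1, len(key)):   # proper subsets only
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & non_prime):
                violations.append((subset, attr))
    return violations


attrs = {"StudentID", "CourseID", "StudentName", "CourseName"}
fds = [({"StudentID"}, {"StudentName"}), ({"CourseID"}, {"CourseName"})]

print(partial_dependencies({"StudentID", "CourseID"}, attrs, fds))
# [(('CourseID',), 'CourseName'), (('StudentID',), 'StudentName')]
```

An empty result means the relation satisfies the second normal form condition with respect to that key.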

### Third Normal Form
The schema must first be in second normal form. In addition, third normal form requires that every non-prime attribute depend directly on the candidate keys, with no transitive dependency through another non-prime attribute.

For example:  
![Before 3NF](https://i.imgur.com/CPcZtPB.png)  

Here the combination $\{Tournament, Year\}$ forms the candidate key, and Winner and WinnerDOB are non-prime attributes. The schema is in second normal form.
$$Tournament, Year \rightarrow Winner$$
$$Winner \rightarrow WinnerDOB$$
The schema violates third normal form due to the following transitive dependency:
$$Tournament, Year \rightarrow Winner \rightarrow WinnerDOB$$
$$\implies Tournament, Year \rightarrow WinnerDOB$$

So we need to decompose as:  
![3NF](https://i.imgur.com/x4zrz8E.png)
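
A common way to test the third normal form condition is to check, for each given FD, that either its left-hand side is a superkey or every attribute it adds on the right is prime. The sketch below follows that formulation; the function names are hypothetical, WinnerDOB is used as the name of the attribute that depends on Winner, and only the FDs passed in are examined.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(all_attrs, fds, candidate_keys):
    """List (lhs, attribute) pairs where lhs is not a superkey and
    the determined attribute is non-prime."""
    all_attrs = set(all_attrs)
    prime = set().union(*candidate_keys)  # attributes that appear in some candidate key
    violations = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if closure(lhs, fds) == all_attrs:
            continue  # lhs is a superkey, so this FD is allowed
        for attr in sorted(rhs - lhs - prime):
            violations.append((tuple(sorted(lhs)), attr))
    return violations


# Original relation from the example.
attrs = {"Tournament", "Year", "Winner", "WinnerDOB"}
fds = [({"Tournament", "Year"}, {"Winner"}), ({"Winner"}, {"WinnerDOB"})]
print(third_nf_violations(attrs, fds, [{"Tournament", "Year"}]))
# [(('Winner',), 'WinnerDOB')]

# The two relations after decomposition.
print(third_nf_violations({"Tournament", "Year", "Winner"}, fds[:1], [{"Tournament", "Year"}]))  # []
print(third_nf_violations({"Winner", "WinnerDOB"}, fds[1:], [{"Winner"}]))  # []
```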