# Database normalization

In this lab, we will go through two examples and decompose the tables to achieve different normal forms. 


## Example 1

Consider the following table, which describes a multidisciplinary course taught by multiple instructors.

**Course_Lecture(Dept, Instructor, Module, Texts)**

|Module|Dept|Instructor|Texts|
|---|---|---|---|
|M1|D1|I1|T1,T2|
|M2|D1|I1|T1,T3|
|M3|D1|I2|T4|
|M4|D2|I3|T1,T5|
|M5|D2|I4|T6|

### Task: Identify the current form of the table. Decompose the table so that the database is in 3NF.

The table is in unnormalized form (0-NF): the Texts field is multivalued, so the table is not in 1NF.


### 1-NF

To bring the database into 1NF, we need to keep one text per cell. The following table is in 1NF.


**Course_Lecture(Dept, Instructor, Module, Text):**
 

|Module|Dept|Instructor|Text|
|---|---|---|---|
|M1|D1|I1|T1|
|M1|D1|I1|T2|
|M2|D1|I1|T1|
|M2|D1|I1|T3|
|M3|D1|I2|T4|
|M4|D2|I3|T1|
|M4|D2|I3|T5|
|M5|D2|I4|T6|
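
The flattening step above can be sketched in Python. This is a minimal illustration, not part of the original lab: the function name `to_first_normal_form` and the tuple layout (Module, Dept, Instructor, Texts) are assumptions chosen for the example.

```python
# Rows of the unnormalized Course_Lecture table, copied from the lab.
# Tuple layout (an assumption for this sketch): (Module, Dept, Instructor, Texts)
unnormalized = [
    ("M1", "D1", "I1", "T1,T2"),
    ("M2", "D1", "I1", "T1,T3"),
    ("M3", "D1", "I2", "T4"),
    ("M4", "D2", "I3", "T1,T5"),
    ("M5", "D2", "I4", "T6"),
]


def to_first_normal_form(rows):
    """Emit one row per (module, text) pair so that every cell is atomic."""
    flat = []
    for module, dept, instructor, texts in rows:
        for text in texts.split(","):
            flat.append((module, dept, instructor, text.strip()))
    return flat


course_lecture_1nf = to_first_normal_form(unnormalized)
for row in course_lecture_1nf:
    print(row)
```

Each multi-valued cell becomes several rows that repeat the other columns, which is exactly the redundancy the later normal forms remove.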

### 2-NF

Once a table is in 1NF, we need to check whether it is in 2NF. For this, we address the following question: 

Q. Is there any partial dependency in this table? 

A partial dependency exists when a non-key (non-prime) attribute depends on a proper subset of a candidate key. So we first need to identify a primary key for this table.


What could be the primary key/candidate key of the above table? As no single attribute works, the next step is to search for a pair of columns that identifies each row uniquely. The only pair that works as a primary key is (module, text). In functional dependency notation, we can write this as follows:

```
{module, text} ==> {dept, instructor}
```
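
The key search described above can be sketched as a brute-force check over column subsets. This is only an illustration on the sample rows: the helper names `is_unique` and `minimal_unique_sets` are assumptions, and uniqueness in a sample merely suggests a key, since the real key follows from what the data means.

```python
from itertools import combinations

COLUMNS = ("Module", "Dept", "Instructor", "Text")
ROWS = [
    ("M1", "D1", "I1", "T1"),
    ("M1", "D1", "I1", "T2"),
    ("M2", "D1", "I1", "T1"),
    ("M2", "D1", "I1", "T3"),
    ("M3", "D1", "I2", "T4"),
    ("M4", "D2", "I3", "T1"),
    ("M4", "D2", "I3", "T5"),
    ("M5", "D2", "I4", "T6"),
]


def is_unique(rows, columns, subset):
    """True if no two rows agree on every column in `subset`."""
    idx = [columns.index(c) for c in subset]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)


def minimal_unique_sets(rows, columns):
    """Try subsets from smallest to largest, skipping supersets of found keys."""
    found = []
    for size in range(1, len(columns) + 1):
        for subset in combinations(columns, size):
            if any(set(key) <= set(subset) for key in found):
                continue
            if is_unique(rows, columns, subset):
                found.append(subset)
    return found


candidate_keys = minimal_unique_sets(ROWS, COLUMNS)
print(candidate_keys)  # [('Module', 'Text')]
```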
Now we need to search for other dependencies between columns in the table. E.g.,

```
{module} ==> {instructor}
{module} ==> {dept}
{instructor} ==> {dept}
```

As a non-key attribute (e.g., instructor) depends on a proper subset of the primary key (e.g., {module}), there exists a partial dependency, so the table is not in 2NF.
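
The dependencies listed above can be checked against the sample rows with a small helper. This is a sketch under the assumption that rows are plain tuples; the name `holds` is hypothetical. A sample can only refute a dependency, it cannot prove that the dependency holds in general.

```python
COLUMNS = ("Module", "Dept", "Instructor", "Text")
ROWS = [
    ("M1", "D1", "I1", "T1"),
    ("M1", "D1", "I1", "T2"),
    ("M2", "D1", "I1", "T1"),
    ("M2", "D1", "I1", "T3"),
    ("M3", "D1", "I2", "T4"),
    ("M4", "D2", "I3", "T1"),
    ("M4", "D2", "I3", "T5"),
    ("M5", "D2", "I4", "T6"),
]


def holds(rows, columns, lhs, rhs):
    """True if the rows are consistent with the dependency lhs ==> rhs."""
    lhs_idx = [columns.index(c) for c in lhs]
    rhs_idx = [columns.index(c) for c in rhs]
    seen = {}
    for row in rows:
        left = tuple(row[i] for i in lhs_idx)
        right = tuple(row[i] for i in rhs_idx)
        # Two rows with the same left side must agree on the right side.
        if seen.setdefault(left, right) != right:
            return False
    return True


print(holds(ROWS, COLUMNS, ["Module"], ["Instructor"]))  # True
print(holds(ROWS, COLUMNS, ["Instructor"], ["Dept"]))    # True
print(holds(ROWS, COLUMNS, ["Text"], ["Module"]))        # False
```

Because `{module}` is a proper subset of the key `{module, text}` and still determines instructor and dept, the check confirms the partial dependency on this data.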


How do we remove the partial dependency? Since the dependency `{module} ==> {instructor, dept}` introduces the partial dependency, we move its right-hand side out of the table into a new relation keyed by `module`. So we get two relations here:

```
1. module_text(module, text)
2. module_instructor(module, dept, instructor)
```

The two tables will be as follows: 

**module_text(module, text):**

primary key: {module, text}

|Module|Text|
|---|---|
|M1|T1|
|M1|T2|
|M2|T1|
|M2|T3|
|M3|T4|
|M4|T1|
|M4|T5|
|M5|T6|

**module_instructor(module, dept, instructor):**

primary key: {module}

|Module|Dept|Instructor|
|---|---|---|
|M1|D1|I1|
|M2|D1|I1|
|M3|D1|I2|
|M4|D2|I3|
|M5|D2|I4|
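
The decomposition can be sketched as two projections of the 1NF table, followed by a natural join on `Module` to confirm that no information was lost. The helper name `project` and the tuple layout are assumptions made for this illustration.

```python
COLUMNS = ("Module", "Dept", "Instructor", "Text")
ROWS = [
    ("M1", "D1", "I1", "T1"),
    ("M1", "D1", "I1", "T2"),
    ("M2", "D1", "I1", "T1"),
    ("M2", "D1", "I1", "T3"),
    ("M3", "D1", "I2", "T4"),
    ("M4", "D2", "I3", "T1"),
    ("M4", "D2", "I3", "T5"),
    ("M5", "D2", "I4", "T6"),
]


def project(rows, columns, keep):
    """Keep only the columns in `keep` and drop duplicate rows."""
    idx = [columns.index(c) for c in keep]
    result = []
    for row in rows:
        projected = tuple(row[i] for i in idx)
        if projected not in result:
            result.append(projected)
    return result


module_text = project(ROWS, COLUMNS, ("Module", "Text"))
module_instructor = project(ROWS, COLUMNS, ("Module", "Dept", "Instructor"))

# Natural join on Module rebuilds the original 1NF rows.
rejoined = [
    (module, dept, instructor, text)
    for (module, text) in module_text
    for (other_module, dept, instructor) in module_instructor
    if module == other_module
]

print(len(module_text), len(module_instructor))  # 8 5
print(sorted(rejoined) == sorted(ROWS))          # True
```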

### 3-NF

In module_text(module, text), both columns together form the primary key. There are no non-key attributes, so this table is in 2-NF and 3-NF.

In module_instructor(module, dept, instructor), the module column is the primary key. This table is in 2-NF, but not in 3-NF. Why?

The dependencies in this table are as follows: 

```
{module} ==> {dept}
{module} ==> {instructor}
{instructor} ==> {dept}
```

There exists a transitive dependency (`{module} ==> {instructor}, {instructor} ==> {dept}`) here: dept depends on the key only through instructor. We need to take this dependency out of the table and form a new relation.


```
1. module_text(module, text)
2. module_instructor(module, instructor)
3. instructor_dept(instructor, dept)
```

The final snapshot of the database is as follows.


**module_text(module, text):**

primary key: {module, text}

|Module|Text|
|---|---|
|M1|T1|
|M1|T2|
|M2|T1|
|M2|T3|
|M3|T4|
|M4|T1|
|M4|T5|
|M5|T6|


**module_instructor(module,instructor)**

primary key: {module}

|Module|Instructor|
|---|---|
|M1|I1|
|M2|I1|
|M3|I2|
|M4|I3|
|M5|I4|


**instructor_dept(instructor, dept)**

primary key: {instructor}

|Instructor|Dept|
|---|---|
|I1|D1|
|I2|D1|
|I3|D2|
|I4|D2|
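
The final schema can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the primary keys come from the tables above, while the foreign keys and `NOT NULL` constraints are assumptions added for illustration and are not part of the original lab.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE instructor_dept (
    instructor TEXT PRIMARY KEY,
    dept       TEXT NOT NULL
);
CREATE TABLE module_instructor (
    module     TEXT PRIMARY KEY,
    instructor TEXT NOT NULL REFERENCES instructor_dept(instructor)  -- assumed FK
);
CREATE TABLE module_text (
    module TEXT NOT NULL REFERENCES module_instructor(module),       -- assumed FK
    text   TEXT NOT NULL,
    PRIMARY KEY (module, text)
);
""")

conn.executemany("INSERT INTO instructor_dept VALUES (?, ?)",
                 [("I1", "D1"), ("I2", "D1"), ("I3", "D2"), ("I4", "D2")])
conn.executemany("INSERT INTO module_instructor VALUES (?, ?)",
                 [("M1", "I1"), ("M2", "I1"), ("M3", "I2"), ("M4", "I3"), ("M5", "I4")])
conn.executemany("INSERT INTO module_text VALUES (?, ?)",
                 [("M1", "T1"), ("M1", "T2"), ("M2", "T1"), ("M2", "T3"),
                  ("M3", "T4"), ("M4", "T1"), ("M4", "T5"), ("M5", "T6")])

# Joining the three relations rebuilds the rows of the 1NF table.
rows = conn.execute("""
    SELECT mt.module, d.dept, mi.instructor, mt.text
    FROM module_text AS mt
    JOIN module_instructor AS mi ON mi.module = mt.module
    JOIN instructor_dept AS d ON d.instructor = mi.instructor
    ORDER BY mt.module, mt.text
""").fetchall()

for row in rows:
    print(row)
```

With this layout, each instructor's department is stored once, so changing it means updating a single row in `instructor_dept`.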

## Example 2

**Task: The following movie database `movie(title,year,length,genre,studio_name,studio_addr,star_names)` is in unnormalized (0-NF/UNF) form. Decompose the table so that all the relations are in 3-NF.**

This example is taken from the book Database Systems: The Complete Book by H. Garcia-Molina, J. D. Ullman, and J. Widom.

|title|year|length|genre|studio_name|studio_addr|star_names|
|---|---|---|---|---|---|---|
|Star Wars|1977|124|SciFi|Fox|Hollywood|Carrie Fisher,Mark Hamill,Harrison Ford|
|Gone With the Wind|1939|231|drama|MGM|Buena Vista|Vivien Leigh| 
|Wayne’s World|1992|95|comedy|Paramount|Hollywood|Dana Carvey,Mike Meyers|





### 1-NF:

There is one multi-valued attribute (star_names). We can convert it into a column (star_name) and keep one value per field/cell. The relation is then in 1-NF.


**movie(title,year,length,genre,studio_name,studio_addr,star_name):**

|title|year|length|genre|studio_name|studio_addr|star_name|
|---|---|---|---|---|---|---|
|Star Wars|1977|124|SciFi|Fox|Hollywood|Carrie Fisher|
|Star Wars|1977|124|SciFi|Fox|Hollywood|Mark Hamill|
|Star Wars|1977|124|SciFi|Fox|Hollywood|Harrison Ford|
|Gone With the Wind|1939|231|drama|MGM|Buena Vista|Vivien Leigh| 
|Wayne’s World|1992|95|comedy|Paramount|Hollywood|Dana Carvey|
|Wayne’s World|1992|95|comedy|Paramount|Hollywood|Mike Meyers|




### 2-NF:

We need to identify the primary key/candidate keys and all the dependencies in the table. Once we determine the primary key of the table, we need to ask whether there are any partial dependencies in the table.

No single column can serve as a key. E.g., title identifies year, length, genre, and studio_name, but not star_name. We then check pairs of columns, followed by triplets of columns.

After checking the different options, we can say that `{title, year, star_name}` forms a key, as these three columns uniquely determine each row of the table.

Q. Why is `{title,year}` not a key? Because title and year do not determine star_name.

Q. Why is `{year, star_name}` not a key? Because a star could appear in two movies in the same year.

Q. Why is `{title,star_name}` not a key? Because two movies with the same title, made in different years, occasionally have a star in common.


Now we need to search for the other dependencies in the table:

```

{title, year} ==> {length}   # partial dependency
{title, year} ==> {genre}   # partial dependency
{title, year} ==> {studio_name}  # partial dependency
{title, year} ==> {studio_addr}  # partial dependency
{studio_name} ==> {studio_addr}

```

There exist partial dependencies in the above table. So the table is not in 2NF.
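
The key and the partial dependencies can also be derived from the dependencies themselves by computing attribute closures. The sketch below assumes a reduced dependency set (the `{title, year} ==> {studio_addr}` entry is left out because it follows from the other two); the function names `closure` and `candidate_keys` are hypothetical.

```python
from itertools import combinations

ATTRIBUTES = {"title", "year", "length", "genre",
              "studio_name", "studio_addr", "star_name"}

# Dependencies from the list above, as (left side, right side) pairs.
FDS = [
    ({"title", "year"}, {"length", "genre", "studio_name"}),
    ({"studio_name"}, {"studio_addr"}),
]


def closure(attrs, fds):
    """All attributes determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(sorted(attributes), size):
            candidate = set(subset)
            if any(key <= candidate for key in keys):
                continue  # a superset of a key is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


keys = candidate_keys(ATTRIBUTES, FDS)
prime = set().union(*keys)

# Non-prime attributes determined by a proper subset of the key.
partial = closure({"title", "year"}, FDS) - prime

print([sorted(key) for key in keys])  # [['star_name', 'title', 'year']]
print(sorted(partial))  # ['genre', 'length', 'studio_addr', 'studio_name']
```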


How to decompose? Take the subset of the key that introduces the partial dependency. In this case it is `{title, year}`, and we can write the partial dependencies concisely: `{title, year} ==> {length, genre, studio_name, studio_addr}`. Using `{title, year}`, we can decompose the table into two relations:

```
1. Movie_star(title, year, star_name)
2. Movie_details(title, year, length, genre, studio_name, studio_addr)
```




**Movie_star(title,year,star_name):**

primary key: {title, year, star_name}

|title|year|star_name|
|---|---|---|
|Star Wars|1977|Carrie Fisher|
|Star Wars|1977|Mark Hamill|
|Star Wars|1977|Harrison Ford|
|Gone With the Wind|1939|Vivien Leigh| 
|Wayne’s World|1992|Dana Carvey|
|Wayne’s World|1992|Mike Meyers|


**Movie_details(title, year, length, genre, studio_name, studio_addr):**

primary key: {title, year}

|title|year|length|genre|studio_name|studio_addr|
|---|---|---|---|---|---|
|Star Wars|1977|124|SciFi|Fox|Hollywood|
|Gone With the Wind|1939|231|drama|MGM|Buena Vista|
|Wayne’s World|1992|95|comedy|Paramount|Hollywood|


### 3-NF:

The `Movie_star(title,year,star_name)` table is in 3NF as there are no transitive dependencies in it. `Movie_details(title, year, length, genre, studio_name, studio_addr)` is not in 3NF as it contains a transitive dependency:

```
{title, year} ==> {studio_name} 
{studio_name} ==> {studio_addr}
```
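
The 3NF condition can be sketched as a check over the listed dependencies: each one must either have a superkey on its left side or only prime attributes on its right side. The function name `third_nf_violations` is hypothetical, and the sketch inspects only the dependencies passed in.

```python
def closure(attrs, fds):
    """All attributes determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attributes, fds, keys):
    """Return (left side, attribute) pairs that break the 3NF condition."""
    prime = set().union(*keys)
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attributes:
            continue  # the left side is a superkey, so this dependency is fine
        for attr in sorted(rhs - lhs):
            if attr not in prime:
                violations.append((sorted(lhs), attr))
    return violations


details_attrs = {"title", "year", "length", "genre", "studio_name", "studio_addr"}
details_fds = [
    ({"title", "year"}, {"length", "genre", "studio_name"}),
    ({"studio_name"}, {"studio_addr"}),
]
before = third_nf_violations(details_attrs, details_fds, [{"title", "year"}])
print(before)  # [(['studio_name'], 'studio_addr')]

# After moving studio_addr into its own relation, both pieces pass the check.
after_details = third_nf_violations(
    details_attrs - {"studio_addr"},
    [({"title", "year"}, {"length", "genre", "studio_name"})],
    [{"title", "year"}],
)
after_studio = third_nf_violations(
    {"studio_name", "studio_addr"},
    [({"studio_name"}, {"studio_addr"})],
    [{"studio_name"}],
)
print(after_details, after_studio)  # [] []
```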

We need to decompose `Movie_details` to reach 3NF by moving the studio address into its own relation.

```
1. Movie_star(title, year, star_name)
2. Movie_details(title, year, length, genre, studio_name)
3. Studio(studio_name, studio_addr)
```



**Movie_star(title,year,star_name):**

primary key: {title, year, star_name}

|title|year|star_name|
|---|---|---|
|Star Wars|1977|Carrie Fisher|
|Star Wars|1977|Mark Hamill|
|Star Wars|1977|Harrison Ford|
|Gone With the Wind|1939|Vivien Leigh| 
|Wayne’s World|1992|Dana Carvey|
|Wayne’s World|1992|Mike Meyers|


**Movie_details(title, year, length, genre, studio_name):**

primary key: {title, year}

|title|year|length|genre|studio_name|
|---|---|---|---|---|
|Star Wars|1977|124|SciFi|Fox|
|Gone With the Wind|1939|231|drama|MGM|
|Wayne’s World|1992|95|comedy|Paramount|

**Studio(studio_name, studio_addr):**

primary key: {studio_name}

|studio_name|studio_addr|
|---|---|
|Fox|Hollywood|
|MGM|Buena Vista|
|Paramount|Hollywood|