# 3. Normalization and its concepts in DBMS
## 1. Functional Dependency

A functional dependency is a constraint between two sets of attributes in a relation: the value of one set determines the value of the other. It typically exists between the primary key and the non-key attributes of a table.

## Definition

### Notation
X → Y (read as "X functionally determines Y": any two rows that agree on X must also agree on Y)

- The left side of the functional dependency (FD) is known as a **determinant**.
- The right side of the functional dependency is known as a **dependent**.

For example:

Assume we have an employee table with attributes: `Emp_Id`, `Emp_Name`, `Emp_Address`.

Here, the `Emp_Id` attribute can uniquely identify the `Emp_Name` attribute of the employee table because if we know the `Emp_Id`, we can tell the employee name associated with it.

Functional dependency can be written as:

Emp_Id → Emp_Name   

We can say that `Emp_Name` is functionally dependent on `Emp_Id`.
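
The definition can be checked against sample data. The following is a minimal sketch, assuming a table is held as a list of dictionaries; the helper `fd_holds` and the sample rows are hypothetical and only illustrate the idea.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample rows for the employee table described above
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Delhi"},
]

print(fd_holds(employees, ["Emp_Id"], ["Emp_Name"]))  # True
print(fd_holds(employees, ["Emp_Name"], ["Emp_Id"]))  # False: two employees share a name
```

Note that sample data can only refute a functional dependency. Whether `Emp_Id → Emp_Name` holds in general is a rule of the schema, not something a few rows can prove.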

## Types of Functional Dependency

### 1. Trivial Functional Dependency

- A → B has trivial functional dependency if B is a subset of A.
- Dependencies such as A → A and B → B are also trivial.

**Example:**

Consider a table with two columns, Employee_Id and Employee_Name.  
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency because  
Employee_Id is a subset of {Employee_Id, Employee_Name}.  
Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies as well.

### 2. Non-Trivial Functional Dependency



- A → B is a non-trivial functional dependency if B is not a subset of A.
- When the intersection of A and B is empty, A → B is called completely non-trivial.

**Example:**

ID → Name,  
Name → DOB  

In these examples, `Name` is not a subset of `ID`, and `DOB` is not a subset of `Name`, making these non-trivial functional dependencies.
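
The three cases (trivial, non-trivial, completely non-trivial) come down to simple set tests. A minimal sketch, where `classify_fd` is a hypothetical helper that takes the two sides of a dependency as sets of attribute names:

```python
def classify_fd(lhs, rhs):
    """Classify the dependency lhs → rhs by comparing the two attribute sets."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # B is a subset of A
    if lhs & rhs:
        return "non-trivial"             # B is not a subset of A, but they overlap
    return "completely non-trivial"      # A and B share no attribute

print(classify_fd({"Employee_Id", "Employee_Name"}, {"Employee_Id"}))  # trivial
print(classify_fd({"ID"}, {"Name"}))                                   # completely non-trivial
print(classify_fd({"ID", "Name"}, {"Name", "DOB"}))                    # non-trivial
```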

## 2. Inference Rule (IR)

Armstrong's axioms are the basic inference rules used to derive functional dependencies in a relational database. An inference rule is an assertion that can be applied to a set of functional dependencies (FDs) to derive other FDs that must also hold.

The following six inference rules are commonly used. The first three (IR1 to IR3) are Armstrong's axioms; the remaining three can be derived from them.

#### 1. Reflexive Rule (IR1)

In the reflexive rule, if Y is a subset of X, then X determines Y.

    If X ⊇ Y then X → Y

**Example:**

    X = {a, b, c, d, e}  
    Y = {a, b, c}  
    Since Y ⊆ X, X → Y holds.

#### 2. Augmentation Rule (IR2)

The augmentation rule states that if X determines Y, then XZ determines YZ for any Z.

    If X → Y then XZ → YZ

**Example:**

    For R(ABCD), if A → B then AC → BC  

#### 3. Transitive Rule (IR3)

In the transitive rule, if X determines Y and Y determines Z, then X must also determine Z.

    If X → Y and Y → Z then X → Z

#### 4. Union Rule (IR4)

The union rule states that if X determines Y and X determines Z, then X must also determine Y and Z.

    If X → Y and X → Z then X → YZ

**Proof:**

1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X, where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)

#### 5. Decomposition Rule (IR5)

The decomposition rule, also known as the projection rule, is the reverse of the union rule. This rule states that if X determines Y and Z, then X determines Y and X determines Z separately.

    If X → YZ then X → Y and X → Z

**Proof:**

1. X → YZ (given)
2. YZ → Y (using IR1)
3. X → Y (using IR3 on 1 and 2)
4. YZ → Z (using IR1)
5. X → Z (using IR3 on 1 and 4)

#### 6. Pseudo-transitive Rule (IR6)

In the pseudo-transitive rule, if X determines Y and YZ determines W, then XZ determines W.

    If X → Y and YZ → W then XZ → W

**Proof:**

1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)
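
In practice, the inference rules are usually applied through the attribute closure: the set of all attributes that a given set determines. The sketch below is one common way to compute it, assuming FDs are given as pairs of Python sets; the function names `closure` and `implies` are illustrative.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already determine lhs, we also determine rhs
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """Return True if lhs → rhs can be derived from fds."""
    return set(rhs) <= closure(lhs, fds)

# Pseudo-transitive rule: from X → Y and YZ → W, derive XZ → W
pseudo = [({"X"}, {"Y"}), ({"Y", "Z"}, {"W"})]
print(implies(pseudo, {"X", "Z"}, {"W"}))  # True
print(implies(pseudo, {"X"}, {"W"}))       # False: Z is needed as well

# Union rule: from X → Y and X → Z, derive X → YZ
union = [({"X"}, {"Y"}), ({"X"}, {"Z"})]
print(implies(union, {"X"}, {"Y", "Z"}))   # True
```

An FD follows from a given set exactly when its right side is contained in the closure of its left side, so this one routine covers all six rules.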

## Normalization
A large database defined as a single relation may result in data duplication. This repetition of data can lead to:

- Making relations very large.
- Difficulty in maintaining and updating data, since an update may involve searching many records in the relation.
- Wastage and poor utilization of disk space and resources.
- Increased likelihood of errors and inconsistencies.

To handle these problems, we should analyze and decompose the relations with redundant data into smaller, simpler, and well-structured relations that satisfy desirable properties. Normalization is a process of decomposing the relations into relations with fewer attributes.

#### What is Normalization?

- **Normalization** is the process of organizing the data in the database.
- Normalization is used to minimize redundancy in a relation or set of relations and to eliminate undesirable characteristics like insertion, update, and deletion anomalies.
- Normalization divides larger tables into smaller ones and links them using relationships.
- Normal forms are the successive levels of this process; each one removes a further kind of redundancy from database tables.

#### Why do we need Normalization?

The main reason for normalizing relations is to remove anomalies. Failure to eliminate anomalies leads to data redundancy and can cause data integrity and other problems as the database grows. Normalization consists of a series of guidelines that help you create a good database structure.

#### Data modification anomalies can be categorized into three types:

- **Insertion Anomaly:** A new tuple cannot be inserted into a relation because some other, unrelated data is not yet available.
- **Deletion Anomaly:** Deleting data unintentionally removes other important data stored in the same tuple.
- **Update Anomaly:** Changing a single data value requires updating multiple rows, and missing any of them leaves the data inconsistent.
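
To make the update anomaly concrete, here is a small sketch with hypothetical rows in which a teacher's age is repeated once per subject. Updating only one of the copies leaves the table contradicting itself.

```python
# Hypothetical denormalized rows: the teacher's age is repeated per subject
rows = [
    {"teacher_id": 25, "subject": "Chemistry", "age": 30},
    {"teacher_id": 25, "subject": "Biology", "age": 30},
]

# Update only the first matching row, as a careless application might
for row in rows:
    if row["teacher_id"] == 25:
        row["age"] = 31
        break

ages = sorted({row["age"] for row in rows if row["teacher_id"] == 25})
print(ages)  # [30, 31]: the same teacher now has two different ages
```

If the age were stored once in its own table, a single update would be enough and this inconsistency could not occur.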

### Types of Normal Forms

Normalization works through a series of stages called normal forms. The normal forms apply to individual relations. A relation is said to be in a particular normal form if it satisfies specific constraints.

Following are the various types of normal forms:

| **Normal Form** | **Description** |
|-----------------|-----------------|
| **1NF**         | A relation is in 1NF if it contains atomic values. |
| **2NF**         | A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key. |
| **3NF**         | A relation will be in 3NF if it is in 2NF and no transitive dependency exists. |
| **BCNF**        | A stronger version of 3NF, known as Boyce-Codd normal form: for every non-trivial functional dependency X → Y, X is a superkey. |
| **4NF**         | A relation will be in 4NF if it is in BCNF and has no non-trivial multi-valued dependency whose determinant is not a superkey. |
| **5NF**         | A relation is in 5NF if it is in 4NF and every non-trivial join dependency in it is implied by its candidate keys. |

### Advantages of Normalization

- Helps to minimize data redundancy.
- Greater overall database organization.
- Data consistency within the database.
- Much more flexible database design.
- Enforces the concept of relational integrity.

### Disadvantages of Normalization

- You cannot start building the database before knowing what the user needs.
- Performance may degrade when normalizing the relations to higher normal forms, i.e., 4NF, 5NF.
- It is very time-consuming and difficult to normalize relations of a higher degree.
- Careless decomposition may lead to a bad database design, resulting in serious problems.

![](https://static.javatpoint.com/dbms/images/dbms-normalization.png)

## 3. Types of Normalization

**First Normal Form (1NF)**

A relation is in 1NF if:
- It contains atomic values.
- Each attribute holds a single value in each row.
- It disallows multi-valued attributes, composite attributes, and their combinations.

**Example:**

Original EMPLOYEE table not in 1NF due to multi-valued attribute EMP_PHONE:

| EMP_ID | EMP_NAME | EMP_PHONE           | EMP_STATE |
|--------|----------|---------------------|-----------|
| 14     | John     | 7272826385, 9064738238 | UP        |
| 20     | Harry    | 8574783832          | Bihar     |
| 12     | Sam      | 7390372389, 8589830302 | Punjab    |

Decomposed EMPLOYEE table into 1NF:

| EMP_ID | EMP_NAME | EMP_PHONE    | EMP_STATE |
|--------|----------|--------------|-----------|
| 14     | John     | 7272826385   | UP        |
| 14     | John     | 9064738238   | UP        |
| 20     | Harry    | 8574783832   | Bihar     |
| 12     | Sam      | 7390372389   | Punjab    |
| 12     | Sam      | 8589830302   | Punjab    |
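
The conversion to 1NF can be sketched in a few lines. This example assumes the multi-valued `EMP_PHONE` cell is stored as a comma-separated string; the helper name `to_1nf` is hypothetical.

```python
employee = [
    {"EMP_ID": 14, "EMP_NAME": "John", "EMP_PHONE": "7272826385, 9064738238", "EMP_STATE": "UP"},
    {"EMP_ID": 20, "EMP_NAME": "Harry", "EMP_PHONE": "8574783832", "EMP_STATE": "Bihar"},
    {"EMP_ID": 12, "EMP_NAME": "Sam", "EMP_PHONE": "7390372389, 8589830302", "EMP_STATE": "Punjab"},
]

def to_1nf(rows, column, sep=","):
    """Emit one row per value found in the multi-valued column."""
    out = []
    for row in rows:
        for value in row[column].split(sep):
            # Copy the row and replace the multi-valued cell with a single value
            out.append({**row, column: value.strip()})
    return out

employee_1nf = to_1nf(employee, "EMP_PHONE")
for row in employee_1nf:
    print(row["EMP_ID"], row["EMP_NAME"], row["EMP_PHONE"], row["EMP_STATE"])
```

The result has five rows, one per phone number, matching the 1NF table above.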

---

**Second Normal Form (2NF)**

A relation is in 2NF if:
- It is in 1NF.
- All non-key attributes are fully functionally dependent on the primary key.

**Example:**

The original TEACHER table is not in 2NF: the candidate key is {TEACHER_ID, SUBJECT}, but TEACHER_AGE depends only on TEACHER_ID, which is a partial dependency:

| TEACHER_ID | SUBJECT   | TEACHER_AGE |
|------------|-----------|-------------|
| 25         | Chemistry | 30          |
| 25         | Biology   | 30          |
| 47         | English   | 35          |
| 83         | Math      | 38          |
| 83         | Computer  | 38          |

Decomposed TEACHER table into 2NF:

TEACHER_DETAIL table:

| TEACHER_ID | TEACHER_AGE |
|------------|-------------|
| 25         | 30          |
| 47         | 35          |
| 83         | 38          |

TEACHER_SUBJECT table:

| TEACHER_ID | SUBJECT   |
|------------|-----------|
| 25         | Chemistry |
| 25         | Biology   |
| 47         | English   |
| 83         | Math      |
| 83         | Computer  |
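
A decomposition is only useful if the original table can be rebuilt from its parts. The following sketch, which represents each table as a collection of tuples, projects TEACHER onto the two smaller tables and checks that joining them on TEACHER_ID gives back the original rows.

```python
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# Projections: using sets removes the duplicate rows
teacher_detail = {(tid, age) for tid, _, age in teacher}
teacher_subject = {(tid, subject) for tid, subject, _ in teacher}

# Natural join of the two projections on TEACHER_ID
rejoined = {
    (tid, subject, age)
    for tid, subject in teacher_subject
    for tid2, age in teacher_detail
    if tid == tid2
}

print(len(teacher_detail))       # 3
print(rejoined == set(teacher))  # True: the decomposition is lossless
```

Each teacher's age is now stored once (three rows instead of five), and no information is lost.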

---

**Third Normal Form (3NF)**

A relation is in 3NF if:
- It is in 2NF.
- It does not have transitive dependencies.

**Example:**

The original EMPLOYEE_DETAIL table is not in 3NF because EMP_STATE and EMP_CITY depend on EMP_ZIP, which in turn depends on EMP_ID (a transitive dependency):

| EMP_ID | EMP_NAME  | EMP_ZIP | EMP_STATE | EMP_CITY   |
|--------|-----------|---------|-----------|------------|
| 222    | Harry     | 201010  | UP        | Noida      |
| 333    | Stephan   | 02228   | US        | Boston     |
| 444    | Lan       | 60007   | US        | Chicago    |
| 555    | Katharine | 06389   | UK        | Norwich    |
| 666    | John      | 462007  | MP        | Bhopal     |

Decomposed EMPLOYEE_DETAIL table into 3NF:

EMPLOYEE table:

| EMP_ID | EMP_NAME  | EMP_ZIP |
|--------|-----------|---------|
| 222    | Harry     | 201010  |
| 333    | Stephan   | 02228   |
| 444    | Lan       | 60007   |
| 555    | Katharine | 06389   |
| 666    | John      | 462007  |

EMPLOYEE_ZIP table:

| EMP_ZIP | EMP_STATE | EMP_CITY |
|---------|-----------|----------|
| 201010  | UP        | Noida    |
| 02228   | US        | Boston   |
| 60007   | US        | Chicago  |
| 06389   | UK        | Norwich  |
| 462007  | MP        | Bhopal   |
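
The 3NF condition can also be tested mechanically from a list of FDs. The sketch below assumes the dependencies EMP_ID → {EMP_NAME, EMP_ZIP} and EMP_ZIP → {EMP_STATE, EMP_CITY} and the candidate key {EMP_ID}; these are read off the example rather than stated by any schema, and `third_nf_violations` is a hypothetical helper. An FD X → A violates 3NF when X is not a superkey and A is not part of any candidate key.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def third_nf_violations(fds, all_attrs, candidate_keys):
    """List (determinant, attribute) pairs that break the 3NF condition."""
    prime = set().union(*candidate_keys)  # attributes that belong to some key
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= all_attrs
        for attr in sorted(rhs - lhs):
            if not is_superkey and attr not in prime:
                violations.append((sorted(lhs), attr))
    return violations

attrs = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
fds = [
    ({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"}),
    ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"}),
]
print(third_nf_violations(fds, attrs, [{"EMP_ID"}]))
# [(['EMP_ZIP'], 'EMP_CITY'), (['EMP_ZIP'], 'EMP_STATE')]

# After decomposition, the EMPLOYEE table has no violations left
employee_fds = [({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"})]
print(third_nf_violations(employee_fds, {"EMP_ID", "EMP_NAME", "EMP_ZIP"}, [{"EMP_ID"}]))  # []
```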

---

**Boyce Codd Normal Form (BCNF)**

A relation is in BCNF if:
- It is in 3NF.
- For every non-trivial functional dependency X → Y, X is a superkey.

**Example:**

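The original EMPLOYEE table is not in BCNF. The decomposition below reflects the dependencies EMP_ID → EMP_COUNTRY and EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}, and neither EMP_ID nor EMP_DEPT alone is a superkey of the table: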

| EMP_ID | EMP_COUNTRY | EMP_DEPT     | DEPT_TYPE | EMP_DEPT_NO |
|--------|-------------|--------------|-----------|-------------|
| 264    | India       | Designing    | D394      | 283         |
| 264    | India       | Testing      | D394      | 300         |
| 364    | UK          | Stores       | D283      | 232         |
| 364    | UK          | Developing   | D283      | 549         |

Decomposed EMPLOYEE table into BCNF:

EMP_COUNTRY table:

| EMP_ID | EMP_COUNTRY |
|--------|-------------|
| 264    | India       |
| 364    | UK          |

EMP_DEPT table:

| EMP_DEPT    | DEPT_TYPE | EMP_DEPT_NO |
|-------------|-----------|-------------|
| Designing   | D394      | 283         |
| Testing     | D394      | 300         |
| Stores      | D283      | 232         |
| Developing  | D283      | 549         |

EMP_DEPT_MAPPING table:

| EMP_ID | EMP_DEPT    |
|--------|-------------|
| 264    | Designing   |
| 264    | Testing     |
| 364    | Stores      |
| 364    | Developing  |
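
To see why the original table fails BCNF, we can compute its candidate keys and then test each determinant. This is a brute-force sketch suitable only for small schemas; the two FDs are assumptions read off the decomposition above, and `candidate_keys` is a hypothetical helper.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Find all minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            # Skip supersets of keys already found: they are not minimal
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys

attrs = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
fds = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]

keys = [sorted(key) for key in candidate_keys(attrs, fds)]
violators = [sorted(lhs) for lhs, _ in fds if closure(lhs, fds) != attrs]
print(keys)       # [['EMP_DEPT', 'EMP_ID']]
print(violators)  # [['EMP_ID'], ['EMP_DEPT']]
```

Under these assumptions the only candidate key is {EMP_ID, EMP_DEPT}, and both determinants are proper subsets of it, so both FDs violate BCNF.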

---

**Fourth Normal Form (4NF)**

A relation is in 4NF if:
- It is in BCNF.
- It has no non-trivial multi-valued dependencies.

**Example:**

The original STUDENT table is not in 4NF: COURSE and HOBBY are independent multi-valued facts about STU_ID (STU_ID →→ COURSE and STU_ID →→ HOBBY):

| STU_ID | COURSE     | HOBBY      |
|--------|------------|------------|
| 21     | Computer   | Dancing    |
| 21     | Math       | Singing    |
| 34     | Chemistry  | Dancing    |
| 74     | Biology    | Cricket    |
| 59     | Physics    | Hockey     |

Decomposed STUDENT table into 4NF:

STUDENT_COURSE table:

| STU_ID | COURSE    |
|--------|-----------|
| 21     | Computer  |
| 21     | Math      |
| 34     | Chemistry |
| 74     | Biology   |
| 59     | Physics   |

STUDENT_HOBBY table:

| STU_ID | HOBBY     |
|--------|-----------|
| 21     | Dancing   |
| 21     | Singing   |
| 34     | Dancing   |
| 74     | Cricket   |
| 59     | Hockey    |
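
Joining the two 4NF tables back together shows what a multi-valued dependency really asserts. The sketch below uses the sample rows from the tables above and represents each table as a set of tuples.

```python
student = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]

student_course = {(sid, course) for sid, course, _ in student}
student_hobby = {(sid, hobby) for sid, _, hobby in student}

# Natural join on STU_ID pairs every course of a student with every hobby
joined = {
    (sid, course, hobby)
    for sid, course in student_course
    for sid2, hobby in student_hobby
    if sid == sid2
}

extra = sorted(joined - set(student))
print(len(joined))  # 7
print(extra)        # [(21, 'Computer', 'Singing'), (21, 'Math', 'Dancing')]
```

The join produces two combinations for student 21 that the five sample rows do not list. If courses and hobbies are truly independent, a single-table design would have to store every course and hobby combination per student, which is exactly the redundancy that 4NF removes.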

---

**Fifth Normal Form (5NF)**

A relation is in 5NF if:
- It is in 4NF.
- It has no non-trivial join dependency that is not implied by its candidate keys, and any decomposition that is performed must be lossless.

**Example:**

Original SUBJECT table not in 5NF due to join dependency:

| SUBJECT   | LECTURER  | SEMESTER   |
|-----------|-----------|------------|
| Computer  | Anshika   | Semester 1 |
| Computer  | John      | Semester 1 |
| Math      | John      | Semester 1 |
| Math      | Akash     | Semester 2 |
| Chemistry | Praveen   | Semester 1 |

Decomposed SUBJECT table into 5NF:

P1 table:

| SEMESTER   | SUBJECT   |
|------------|-----------|
| Semester 1 | Computer  |
| Semester 1 | Math      |
| Semester 1 | Chemistry |
| Semester 2 | Math      |

P2 table:

| SUBJECT   | LECTURER  |
|-----------|-----------|
| Computer  | Anshika   |
| Computer  | John      |
| Math      | John      |
| Math      | Akash     |
| Chemistry | Praveen   |

P3 table:

| SEMESTER   | LECTURER  |
|------------|-----------|
| Semester 1 | Anshika   |
| Semester 1 | John      |
| Semester 2 | Akash     |
| Semester 1 | Praveen   |
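
The join dependency can be checked on the sample rows: joining only two of the projections produces spurious rows, while joining all three gives back the original table. This is a sketch on the data shown above, with each table held as a set of tuples.

```python
subject = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# The three projections P1, P2 and P3
p1 = {(sem, subj) for subj, _, sem in subject}
p2 = {(subj, lect) for subj, lect, _ in subject}
p3 = {(sem, lect) for _, lect, sem in subject}

# Join P1 and P2 on SUBJECT, then keep only rows whose (SEMESTER, LECTURER) is in P3
p1_p2 = {(subj, lect, sem) for sem, subj in p1 for subj2, lect in p2 if subj == subj2}
all_three = {row for row in p1_p2 if (row[2], row[1]) in p3}

print(len(p1_p2))            # 7: two spurious rows after joining only P1 and P2
print(all_three == subject)  # True: all three projections together are lossless
```

A single instance satisfying the join dependency does not prove it holds in general. Whether it does depends on the rules of the domain being modelled.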



### Simplified Example: Student Hobbies

Imagine we have a table that keeps track of students and their hobbies.

| StudentID | StudentName | Hobby1     | Hobby2     |
|-----------|-------------|------------|------------|
| 1         | Alice       | Reading    | Swimming   |
| 2         | Bob         | Painting   | Dancing    |
| 3         | Carol       | Swimming   | Painting   |

### 1NF (First Normal Form)

**Rule:** Ensure that each column contains atomic (indivisible) values.

Our table is not in 1NF because hobbies are stored in repeating columns (`Hobby1`, `Hobby2`) within a single row. We need to create a separate row for each hobby.

**1NF Table:**

| StudentID | StudentName | Hobby     |
|-----------|-------------|-----------|
| 1         | Alice       | Reading   |
| 1         | Alice       | Swimming  |
| 2         | Bob         | Painting  |
| 2         | Bob         | Dancing   |
| 3         | Carol       | Swimming  |
| 3         | Carol       | Painting  |
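
Turning the repeating `Hobby1` and `Hobby2` columns into rows is a simple reshaping step. A minimal sketch, assuming the table is held as a list of dictionaries:

```python
students_wide = [
    {"StudentID": 1, "StudentName": "Alice", "Hobby1": "Reading", "Hobby2": "Swimming"},
    {"StudentID": 2, "StudentName": "Bob", "Hobby1": "Painting", "Hobby2": "Dancing"},
    {"StudentID": 3, "StudentName": "Carol", "Hobby1": "Swimming", "Hobby2": "Painting"},
]

# One output row per (student, hobby column) pair
students_1nf = [
    {"StudentID": row["StudentID"], "StudentName": row["StudentName"], "Hobby": row[column]}
    for row in students_wide
    for column in ("Hobby1", "Hobby2")
]

for row in students_1nf:
    print(row["StudentID"], row["StudentName"], row["Hobby"])
```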

### 2NF (Second Normal Form)

**Rule:** Ensure that the table is in 1NF and all non-key attributes are fully functionally dependent on the primary key.

Here, our primary key is a composite of `StudentID` and `Hobby`. In this table, `StudentName` is dependent only on `StudentID`, not on the combination of `StudentID` and `Hobby`. We need to separate the student information from the hobbies.

**2NF Tables:**

**Students Table:**

| StudentID | StudentName |
|-----------|-------------|
| 1         | Alice       |
| 2         | Bob         |
| 3         | Carol       |

**Hobbies Table:**

| StudentID | Hobby     |
|-----------|-----------|
| 1         | Reading   |
| 1         | Swimming  |
| 2         | Painting  |
| 2         | Dancing   |
| 3         | Swimming  |
| 3         | Painting  |

### 3NF (Third Normal Form)

**Rule:** Ensure that the table is in 2NF and all the attributes are functionally dependent only on the primary key.

Our tables are now in 2NF and there are no transitive dependencies. This means each non-key attribute depends only on the primary key. In this case, we don't have further attributes to normalize.

### BCNF (Boyce-Codd Normal Form)

**Rule:** Ensure that the table is in 3NF and for every functional dependency (X → Y), X is a super key.

Our current tables already satisfy BCNF because the left side of every dependency (e.g., `StudentID` in `Students Table` and `StudentID` & `Hobby` in `Hobbies Table`) is a super key.

### 4NF (Fourth Normal Form)

**Rule:** Ensure that the table is in BCNF and contains no multi-valued dependencies.

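The only multi-valued dependency in our current tables (`StudentID` →→ `Hobby` in the Hobbies table) is trivial, because those two attributes make up the whole table. Both tables are therefore in 4NF.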

### 5NF (Fifth Normal Form)

**Rule:** Ensure that the table is in 4NF and cannot be decomposed into any smaller tables without losing information (join dependency).

Our tables are already in 5NF because we cannot decompose them further without losing information.

### Summary

By going through these steps, we have organized our data more efficiently. Let's look at the final structure again:

**Students Table:**

| StudentID | StudentName |
|-----------|-------------|
| 1         | Alice       |
| 2         | Bob         |
| 3         | Carol       |

**Hobbies Table:**

| StudentID | Hobby     |
|-----------|-----------|
| 1         | Reading   |
| 1         | Swimming  |
| 2         | Painting  |
| 2         | Dancing   |
| 3         | Swimming  |
| 3         | Painting  |

This structure reduces redundancy (no repeated student names) and ensures data integrity, making our database more efficient and easier to maintain.

### Visualization

To help visualize:

1. **1NF:** Separate multiple values into individual rows.
2. **2NF:** Move attributes that depend on only part of a composite key into their own table.
3. **3NF:** Ensure no transitive dependencies (attributes depend only on the primary key).
4. **BCNF:** Ensure all dependencies have super keys on the left.
5. **4NF & 5NF:** Ensure there are no non-trivial multi-valued dependencies and that no further lossless decomposition is needed.


#### Prepared By,
Ahamed Basith