In this chapter, **`database normalization`** is introduced. 

[Database normalization](https://github.com/Nhan121/Lectures_notes-teaching-in-VN-/blob/master/SQL%20practices/Track:%20SQL%20for%20Database%20Administrators/Course2:%20Database%20Design/database-schemas-and-normalization.ipynb) maintains data integrity and reduces data duplication. 

`1st`, `2nd`, and `3rd` **`Normal Form`** are defined as steps in the process of `normalizing a database`. 

Examples to clarify concepts are provided throughout the chapter.

## 1. The importance of data normalization.
### Examples.
#### EX1. Redundant data.
- Data redundancy can be problematic, for example

                    CREATE TABLE loan(
                                       borrower_id  INTEGER  REFERENCES borrower(id),
                                       bank_name  VARCHAR(50) DEFAULT NULL,
                                       bank_id INTEGER REFERENCES bank(id),
                                       ...
                                      );
                    CREATE TABLE bank(
                                       id SERIAL PRIMARY KEY,
                                       name VARCHAR(50) DEFAULT NULL,
                                       ...
                                     );
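$ \Rightarrow$ The bank's name is stored twice, in `loan.bank_name` and in `bank.name`, which causes problems:

> - Different `banks` can share the same `name` while having distinct `ids`, so `bank_name` alone does not identify a bank.
> - When a bank's `name` changes, an update to the `bank` table does not touch `loan`, so `loan.bank_name` becomes stale.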
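
The stale-copy problem can be reproduced in a few lines. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for PostgreSQL, keeps only the columns shown above, and the bank names are made up.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bank (id INTEGER PRIMARY KEY, name VARCHAR(50));
    CREATE TABLE loan (
        borrower_id INTEGER,
        bank_name   VARCHAR(50),               -- redundant copy of bank.name
        bank_id     INTEGER REFERENCES bank(id)
    );
    INSERT INTO bank VALUES (1, 'First Example Bank');      -- hypothetical data
    INSERT INTO loan VALUES (10, 'First Example Bank', 1);  -- hypothetical data
""")

# Rename the bank: only the bank table is updated.
conn.execute("UPDATE bank SET name = 'Renamed Example Bank' WHERE id = 1")

# Find loans whose stored bank_name no longer matches the bank table.
stale_rows = conn.execute("""
    SELECT loan.bank_name, bank.name
    FROM loan JOIN bank ON bank.id = loan.bank_id
    WHERE loan.bank_name <> bank.name
""").fetchall()
print(stale_rows)
```

Dropping `bank_name` from `loan` and reading the name through `bank_id` removes the second copy, so there is nothing left to go stale.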

#### EX2. Consolidating records.
First, assume that we have the following two tables, `applicant` and `borrower`:

**`applicant`**

| `id` | `name` |
|:-|:-|
|1| Jane Simmons |
|2| Rick Demps |
|3| Pam Jones |

**`borrower`**

| `id` | `name` |
|:-|:-|
| 1| Jack Smith |
| 2| Sara William |
| 3| Jenifer Valdez|
| 4| Pam Jones |

`Pam Jones` appears in both tables. Consolidating the records removes the duplicate from `applicant`:

**`applicant`**

| `id` | `name` |
|:-|:-|
|1| Jane Simmons |
|2| Rick Demps |

**`borrower`**

| `id` | `name` |
|:-|:-|
| 1| Jack Smith |
| 2| Sara William |
| 3| Jenifer Valdez|
| 4| Pam Jones |

**`SQL` command.**

                    CREATE TABLE borrower(
                                           id SERIAL PRIMARY KEY,
                                           name VARCHAR(50) NOT NULL,
                                           approved BOOLEAN DEFAULT NULL
                                          );
Here the `approved` column encodes the status of each record:
- `approved IS NULL`: an `applicant` whose application has not been decided
- `approved = TRUE`: a `borrower`
- `approved = FALSE`: a `denied application`
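
As a rough sketch of how such a consolidated table could be queried, the example below again uses Python's `sqlite3` module as a stand-in for PostgreSQL. The `approved` values assigned to each person are invented for illustration. Note that `NULL` has to be tested with `IS NULL`, since `approved = NULL` is never true in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL (assumption)
conn.execute("""
    CREATE TABLE borrower (
        id       INTEGER PRIMARY KEY,
        name     VARCHAR(50) NOT NULL,
        approved BOOLEAN DEFAULT NULL
    )
""")

# The approved values below are hypothetical.
conn.executemany(
    "INSERT INTO borrower (name, approved) VALUES (?, ?)",
    [("Jane Simmons", None), ("Rick Demps", False), ("Pam Jones", True)],
)

# Translate the three possible states of approved into a readable status.
statuses = conn.execute("""
    SELECT name,
           CASE
               WHEN approved IS NULL THEN 'applicant'
               WHEN approved THEN 'borrower'
               ELSE 'denied application'
           END AS status
    FROM borrower
    ORDER BY id
""").fetchall()
print(statuses)
```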

**Why use `normalization`?**
- ***Reduces*** data `duplication`. This helps to *optimize the storage requirements* of a database.
- ***Increases*** data `consistency`. `Normalized data` is more `consistent` because updates to `entities` are *not scattered* across tables.
- ***Improves*** data `organization`. Records in database tables map more closely to the `real world entities` which they represent.

### EXERCISEs
#### Exercise 1.1. Reasons for normalizing databases
Which of the following **is not a reason to normalize a database?**

A. To reduce data duplication

B. To simplify data queries

C. To increase data consistency

D. To improve data organization


**Answers & comments.**

*A, C, D. Wrong!!* Look at the preceding reasons!

**B. Correct!!!** `Data queries` are often more complicated when a `database is normalized` due to an `increase in the number of tables` and the need to `perform joins`.

#### Exercise 1.2. Reducing data redundancy
A previous employee of the `Small Business Administration` developed an initial version of the database. `Location information` is utilized throughout the database for borrowers, banks, and projects. 

Each of the corresponding tables for these entities utilizes `city`, `state`, and `zip_code` columns creating redundant data. It is your responsibility to `normalize this location data`. 

You will have the opportunity to put your data normalization knowledge to work for you by creating a `place` table to consolidate location data.

#### Instructions
- Create the `place` table with `zip_code` as a five-character **`PRIMARY KEY`**, `city` as a `text-type` with up to 50 characters, and `state` as a `two-character` column.
- Remove the `city`, `state`, and `zip_code` columns from the `borrower` table definition.
- Add a column named `place_id` (a [`foreign key`](https://github.com/Nhan121/Lectures_notes-teaching-in-VN-/blob/master/SQL%20practices/Track:%20SQL%20for%20Database%20Administrators/Course1:%20Introduction%20to%20Relational%20Databases%20in%20SQL/glue-tables-with-foreign-keys.ipynb)) to the `borrower` table that `references` the `zip_code` column of the `place` table

**SOLUTION.**

                -- Create the place table
                CREATE TABLE place (
                                  -- Define zip_code column
                                  zip_code CHAR(5) PRIMARY KEY,
                                  -- Define city column
                                  city VARCHAR(50) NOT NULL,
                                  -- Define state column
                                  state CHAR(2) NOT NULL
                                );
                CREATE TABLE borrower(
                                      id SERIAL PRIMARY KEY,
                                      name VARCHAR(50) NOT NULL,
                                      approved BOOLEAN DEFAULT NULL,
                                      
                                      --Remove zip_code, city & state column
                                      --
                                      -- Add column referencing place table
                                      place_id CHAR(5) REFERENCES place(zip_code) 
                                    );

#### Exercise 1.3. Improving `object`-to-data mapping
The `Small Business Development Center client` table was previously defined without the inclusion of a point of contact for the client. 

The initial instinct of the database team was to simply add `contact_name` and `contact_email` columns to the `client` table. However, you object to this plan due to your instincts regarding proper data organization. In the future, a contact might be referenced in multiple tables. 

In this exercise, you will define table structures for the `client` and `contact` information that better separates the `client` and `contact` `objects`.

Recall the previous definition of the `client` table:

                CREATE TABLE client(
                                        id SERIAL PRIMARY KEY,
                                        name VARCHAR(50),
                                        site_url VARCHAR(50),
                                        num_employees SMALLINT,
                                        num_customers INTEGER
                                    );
#### Instructions
Create a `contact` table with columns `id` (a `primary key`), `name` (max length of 50), and `email` (max length of 50).

Alter the `client` table by adding a `contact_id` column as a `foreign key`.

**SOLUTION.**

                -- Create the contact table
                CREATE TABLE contact(
                                    -- Define the id primary key column
                                    id SERIAL PRIMARY KEY,
                                    -- Define the name column
                                    name VARCHAR(50) NOT NULL,
                                    -- Define the email column
                                    email VARCHAR(50) NOT NULL
                                    );

                -- Add contact_id to the client table
                ALTER TABLE client ADD contact_id INTEGER NOT NULL;

                -- Add a FOREIGN KEY constraint to the client table
                ALTER TABLE client ADD CONSTRAINT fk_c_id FOREIGN KEY (contact_id) REFERENCES contact(id);
**Comment.** Your `table design` helps to keep `client` and `contact` information separated. This choice will help to ensure the `integrity` of this `data` if the identity of a `client` contact changes in the future.

## 2. The `1st Normal Form`


**Reminders: `1 st NF rules`.**
- Each `record` must be `unique`; no duplicate rows
- Each `cell` must hold `one value`.

The `1st, 2nd, 3rd,... normal forms` were first introduced in [Section 3, chapter 2; course 2](https://github.com/Nhan121/Lectures_notes-teaching-in-VN-/blob/master/SQL%20practices/Track:%20SQL%20for%20Database%20Administrators/Course2:%20Database%20Design/database-schemas-and-normalization.ipynb); here we work through examples.

> **Motivating example. Maintaining student records.**

                        CREATE TABLE student(
                                              id SERIAL PRIMARY KEY,
                                              name VARCHAR(50) NOT NULL, 
                                              courses VARCHAR(50) NOT NULL,
                                              home_room SMALLINT NOT NULL
                                              );
> **Problems.** Update, insertion, and deletion errors.
- **Case 1. Duplicated data after update**   
 
`before update`

| `id` | `name` | `courses` | `home_room` |
|:-|:-|:-|:-|
|121| Susan Roth | Algebra I, Physics, Spanish II | 101 |
|244| Leona Yin | History, Geometric, Biology | 204 |
|411| Peter Wright | English III, Bio-Stats, Finance I | 102 |

`then,`

| `id` | `name` | `courses` | `home_room` |
|:-|:-|:-|:-|
|121| Susan Roth | Algebra I, Chemistry, Spanish II, Chemistry | 101 |
|244| Leona Yin | History, Geometric, Biology | 204 |
|411| Peter Wright | English III, Bio-Stats, Finance I | 102 |

The subject `Chemistry` now appears twice in the `courses` column for the student with `id = 121`: the update introduced a duplicate.

> $ $
- **Case 2. Insertions with a column restriction**

`before insertion`

| `id` | `name` | `courses` | `home_room` |
|:-|:-|:-|:-|
|121| Susan Roth | Algebra I, Physics, Spanish II | 101 |
|244| Leona Yin | History, Geometric, Biology | 204 |
|411| Peter Wright | English III, Bio-Stats, Finance I | 102 |

`then,`

| `id` | `name` | `courses` | `home_room` |
|:-|:-|:-|:-|
|121| Susan Roth | Algebra I, Physics, Spanish II | 101 |
|244| Leona Yin | History, Geometric, Biology, French Literature | 204 |
|411| Peter Wright | English III, Bio-Stats, Finance I | 102 |

Appending `French Literature` lengthens the string stored in `courses`. With enough courses, the value runs into the `VARCHAR(50)` length restriction on the column.

> $ $
- **Case 3. Data integrity impacted by deleting record.**

`before deletion`

| `id` | `name` | `courses` | `home_room` |
|:-|:-|:-|:-|
|121| Susan Roth | Algebra I, Physics, Spanish II | 101 |
|244| Leona Yin | History, Geometric, Biology | 204 |
|411| Peter Wright | English III, Bio-Stats, Finance I | 102 |

`then,`

| `id` | `name` | `courses` | `home_room` |
|:-|:-|:-|:-|
|121| Susan Roth | Algebra I, Physics, Spanish II | 101 |
|244| Leona Yin | History, Geometric, Biology | 204 |
|411| Peter Wright | ??? | 102 |

Deleting all of the courses for the student with `id = 411` leaves `courses` without a valid value, even though the column is declared `NOT NULL`.

**SOLUTION. Making the `student` table satisfy `1NF`**

The `1NF requirement` is that ***the `table values` must be `atomic`***: each cell holds a single value.

**`SQL command`.**

                        CREATE TABLE student(
                                              id INTEGER,
                                              name VARCHAR(50) NOT NULL, 
                                              courses VARCHAR(50) NOT NULL,
                                              home_room SMALLINT NOT NULL
                                              );              
**`query example`**

| `id` | `name` | `courses` | `home_room` |
|:-|:-|:-|:-|
|121| Susan Roth | Algebra I | 101 |
|121| Susan Roth | Physics | 101 |
|121| Susan Roth | Spanish II | 101 |
|244| Leona Yin | History | 204 |
|244| Leona Yin | Geometric | 204 |
|244| Leona Yin | Biology | 204 |
|411| Peter Wright | English III | 102 |
|411| Peter Wright | Bio-stats | 102 |
|411| Peter Wright | Finance I | 102 |
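
With one course per row, the three problem cases above reduce to single-row statements. The following sketch (Python's `sqlite3` as a stand-in for PostgreSQL; the added course `Biology` is a made-up change) swaps, adds, and drops a course for the student with `id = 121`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL (assumption)
conn.execute("""
    CREATE TABLE student (
        id        INTEGER,
        name      VARCHAR(50) NOT NULL,
        courses   VARCHAR(50) NOT NULL,
        home_room SMALLINT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (121, "Susan Roth", "Algebra I", 101),
        (121, "Susan Roth", "Physics", 101),
        (121, "Susan Roth", "Spanish II", 101),
    ],
)

# Update: swap Physics for Chemistry by touching exactly one row.
conn.execute(
    "UPDATE student SET courses = 'Chemistry' WHERE id = 121 AND courses = 'Physics'"
)
# Insertion: a new course is a new row, so no string grows in length.
conn.execute("INSERT INTO student VALUES (121, 'Susan Roth', 'Biology', 101)")
# Deletion: dropping one course removes one row and leaves the others intact.
conn.execute("DELETE FROM student WHERE id = 121 AND courses = 'Spanish II'")

susan_courses = [
    row[0]
    for row in conn.execute(
        "SELECT courses FROM student WHERE id = 121 ORDER BY courses"
    )
]
print(susan_courses)
```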

**Example 2.2. Another `student` table satisfying `1 NF`**


                        CREATE TABLE student(
                                              id INTEGER,
                                              first_name VARCHAR(50) NOT NULL,
                                              last_name VARCHAR(50) NOT NULL,
                                              courses VARCHAR(50) NOT NULL,
                                              home_room SMALLINT NOT NULL
                                              );              
**`query example`**

| `id` | `first_name` | `last_name` | `courses` | `home_room` |
|:-|:-|:-|:-|:-|
|121| Susan | Roth | Algebra I | 101 |
|121| Susan | Roth | Physics | 101 |
|121| Susan | Roth | Spanish II | 101 |
|244| Leona | Yin | History | 204 |
|244| Leona | Yin | Geometric | 204 |
|244| Leona | Yin | Biology | 204 |
|411| Peter | Wright | English III | 102 |
|411| Peter | Wright | Bio-stats | 102 |
|411| Peter | Wright | Finance I | 102 |

### EXERCISE.
#### Exercise 2.1. Simplifying database records
One teacher from the high school heard rumblings about efforts to better organize student records. He would like to organize `student grades` in his `courses`. The teacher proposes the following table structure for the `test_grades` table:

                    CREATE TABLE test_grades(
                                                student_id INTEGER NOT NULL,
                                                course_name VARCHAR(50) NOT NULL,
                                                grades TEXT NOT NULL
                                             );
Each record represents a student from one of the teacher's classes identified by the student's id, the course name, and the student's test grades. 

The teacher finds that managing the database with this structure is difficult. Inserting new grades requires a complex `query`. In addition, doing calculations on the grades is not very easy. In this exercise, you will help to put this table in `1st Normal Form (1NF)`.

#### Instructions
Define a new version of the table with the name `test_grade`.

Include `student_id` and `course_name` columns as defined in the `test_grades` table.

In place of a `grades` column, include a numeric column named `grade`.

**SOLUTION.**

                -- Create the test_grade table
                CREATE TABLE test_grade(
                                      -- Include a column for the student id
                                      student_id INTEGER NOT NULL,

                                      -- Include a column for the course name
                                      course_name VARCHAR(50) NOT NULL,

                                      -- Add a column to capture a single test grade
                                      grade NUMERIC NOT NULL
                                    );
**Comments.** This new table design will make it possible to easily add new `test grades` for each `student`. Performing `queries` to calculate statistics like the average test score or the lowest test score will be enabled by this design. Your excellent work shows how satisfying `1NF` improves the use and organization of the teacher's data.
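
To make the comment about statistics concrete, here is a small sketch using Python's `sqlite3` module as a stand-in for PostgreSQL. The grades are invented sample data; the aggregate functions `AVG` and `MIN` are standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL (assumption)
conn.execute("""
    CREATE TABLE test_grade (
        student_id  INTEGER NOT NULL,
        course_name VARCHAR(50) NOT NULL,
        grade       NUMERIC NOT NULL
    )
""")

# Hypothetical grades: one row per test, so adding a grade is a plain INSERT.
conn.executemany(
    "INSERT INTO test_grade VALUES (?, ?, ?)",
    [(1, "Algebra I", 80), (1, "Algebra I", 90), (2, "Algebra I", 70)],
)

# Average and lowest grade per student.
summary = conn.execute("""
    SELECT student_id, AVG(grade), MIN(grade)
    FROM test_grade
    GROUP BY student_id
    ORDER BY student_id
""").fetchall()
print(summary)
```

With the original `grades TEXT` column, the same statistics would require splitting a string before any arithmetic could be done.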

#### Exercise 2.2. Too much normalization
Recall the definition of the `loan` table.

                CREATE TABLE loan (
                                    borrower_id INTEGER REFERENCES borrower(id),
                                    bank_id INTEGER REFERENCES bank(id),
                                    approval_date DATE NOT NULL DEFAULT CURRENT_DATE,
                                    gross_approval DECIMAL(9, 2) NOT NULL,
                                    term_in_months SMALLINT NOT NULL,
                                    revolver_status BOOLEAN NOT NULL DEFAULT FALSE,
                                    initial_interest_rate DECIMAL(4, 2) NOT NULL
                                );
A new design for this table has been suggested to satisfy `1NF`. The revised table definition replaces `approval_date` with `approval_month`, `approval_day`, and `approval_year`:

                CREATE TABLE loan(
                                ...
                                approval_month SMALLINT,
                                approval_day SMALLINT,
                                approval_year SMALLINT,
                                ...
                                );
This exercise demonstrates `how too much normalization` can allow for the insertion of `invalid data`.

#### Instructions
Among the following commands, remove the **`INSERT INTO`** statement that,

                INSERT INTO loan(
                                borrower_id, bank_id, approval_month, approval_day,
                                approval_year, gross_approval, term_in_months,
                                revolver_status, initial_interest_rate
                                ) 
                                VALUES (12, 14, 12, 1, 2013, 421115, 120, false, 4.42);

                INSERT INTO loan(
                                borrower_id, bank_id, approval_month, approval_day,
                                approval_year, gross_approval, term_in_months,
                                revolver_status, initial_interest_rate
                                ) 
                                VALUES (3, 201, 6, 42, 2017, 30015, 60, true, 3.25);

                INSERT INTO loan(
                                borrower_id, bank_id, approval_month, approval_day,
                                approval_year, gross_approval, term_in_months,
                                revolver_status, initial_interest_rate
                                ) 
                                VALUES (19, 5, 8, 19, 2018, 200000, 120, false, 6.3);
if executed, would result in invalid data being inserted into the table.

**SOLUTION.** Delete the second `INSERT INTO` statement: its date components form `2017/06/42`, and a day value of `42` does not exist. Such a value can slip in through a simple typo. It is invalid data, yet the table definition allows it, because three `SMALLINT` columns cannot check that together they form a real date. `PostgreSQL` already provides an atomic `DATE` data type, so the components of a date do not need to be split into separate columns.
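
The same contrast can be sketched in Python, where `datetime.date` plays a role loosely analogous to PostgreSQL's `DATE` type. This is an analogy chosen for illustration, and the helper `to_date` is hypothetical: three separate integers accept any values, while a date type rejects impossible combinations.

```python
from datetime import date


def to_date(year, month, day):
    """Return a date, or None when the parts do not form a real calendar date."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


# (approval_year, approval_month, approval_day) from the three INSERT statements.
candidates = [(2013, 12, 1), (2017, 6, 42), (2018, 8, 19)]

# True where the three integers form a valid date.
valid = [to_date(*parts) is not None for parts in candidates]
print(valid)
```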

## 3. The `2nd Normal Form (2 NF).`
Firstly, look at the following command:

                    CREATE TABLE textbook(
                                            id SERIAL PRIMARY KEY,
                                            title VARCHAR(100) NOT NULL,
                                            publisher_name VARCHAR(100) NOT NULL,
                                            publisher_site VARCHAR(50),
                                            quantity SMALLINT NOT NULL DEFAULT 0
                                          );
Assume that a query on this table returns the following result.

| `id` | `title` | `publisher_name` | `publisher_site` | `quantity` |
|:-|:-|:-|:-|:-|
| 25 | Functional Analyses | ABC publisher | www.abc.com | 122 |
| 29 | Asymptotic Statistics | ABC publisher | www.abc.com | 129 |
| 122| Data Analyst | DEF publisher | www.def.com | 140 |

$\diamond$ **problem: inconsistency from updating the `url`**, for instance

| `id` | `title` | `publisher_name` | `publisher_site` | `quantity` |
|:-|:-|:-|:-|:-|
| 25 | Functional Analyses | ABC publisher | www.newabc.com | 122 |
| 29 | Asymptotic Statistics | ABC publisher | www.abc.com | 129 |
| 122| Data Analyst | DEF publisher | www.def.com | 140 |

$\diamond$ **problem: adding `publisher` without `textbook`**, e.g.,

| `id` | `title` | `publisher_name` | `publisher_site` | `quantity` |
|:-|:-|:-|:-|:-|
| 25 | Functional Analyses | ABC publisher | www.abc.com | 122 |
| 29 | Asymptotic Statistics | ABC publisher | www.abc.com | 129 |
| 122| Data Analyst | DEF publisher | www.def.com | 140 |
| 174| ??? | XYZ publisher | www.xyz.com | ?? |

$\diamond$ **problem: removing a `textbook` can also remove its `publisher`**, e.g.,

| `id` | `title` | `publisher_name` | `publisher_site` | `quantity` |
|:-|:-|:-|:-|:-|
| 25 | Functional Analyses | ABC publisher | www.abc.com | 122 |
| 29 | Asymptotic Statistics | ABC publisher | www.abc.com | 129 |

$\Rightarrow$ So, 
- The `publisher` requires a **`separate table`**.
- Without it, `insertions` and `deletions` cause data anomalies.

$\Rightarrow$ To satisfy the `2NF`, we need 
- the `1 NF` holds
- All `non-key` columns are **dependent** on the table's entire **`PRIMARY KEY`**, not on only a part of it.

#### Example: `textbooks` & `publishers` in the `2 NF`.

                        -- publisher is created first so that textbook can reference it
                        CREATE TABLE publisher(
                                                id SERIAL PRIMARY KEY,
                                                name VARCHAR(100) NOT NULL,
                                                site VARCHAR(50)
                                               );

                        CREATE TABLE textbook(
                                               id SERIAL PRIMARY KEY,
                                               name VARCHAR(50) NOT NULL,
                                               quantity SMALLINT NOT NULL DEFAULT 0,
                                               publisher_id INTEGER REFERENCES publisher(id)
                                              );
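
A short sketch of how this split addresses the three problems listed earlier, using Python's `sqlite3` module as a stand-in for PostgreSQL (so `SERIAL` is replaced by `INTEGER PRIMARY KEY`, an assumption of this sketch). The data mirrors the sample rows above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL (assumption)
conn.executescript("""
    CREATE TABLE publisher (
        id   INTEGER PRIMARY KEY,
        name VARCHAR(100) NOT NULL,
        site VARCHAR(50)
    );
    CREATE TABLE textbook (
        id           INTEGER PRIMARY KEY,
        name         VARCHAR(50) NOT NULL,
        quantity     SMALLINT NOT NULL DEFAULT 0,
        publisher_id INTEGER REFERENCES publisher(id)
    );
    -- A publisher can be added before it has any textbook (XYZ publisher).
    INSERT INTO publisher VALUES
        (1, 'ABC publisher', 'www.abc.com'),
        (2, 'DEF publisher', 'www.def.com'),
        (3, 'XYZ publisher', 'www.xyz.com');
    INSERT INTO textbook VALUES
        (25, 'Functional Analyses', 122, 1),
        (29, 'Asymptotic Statistics', 129, 1),
        (122, 'Data Analyst', 140, 2);
""")

# Updating the url touches one row, so every ABC textbook sees the same site.
conn.execute("UPDATE publisher SET site = 'www.newabc.com' WHERE id = 1")
sites = conn.execute("""
    SELECT DISTINCT p.site
    FROM textbook t JOIN publisher p ON p.id = t.publisher_id
    WHERE p.name = 'ABC publisher'
""").fetchall()

# Removing the only DEF textbook does not remove the publisher record.
conn.execute("DELETE FROM textbook WHERE id = 122")
publishers = [row[0] for row in conn.execute("SELECT name FROM publisher ORDER BY id")]
print(sites, publishers)
```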

### EXERCISEs.
#### Exercise 3.1. Designing a course table
The school's administration decides to use its database to store course details. Given that this is the first attempt at building the database, they are unsure of which columns to include in the course table. Below is a list of possible columns and a description of the data type for each. In this exercise, you will choose the appropriate columns for this table from the list of possible column choices:

- `id` - a **`PRIMARY KEY`** for the course
- `name` - a variable length (max 50, not `NULL`) string for the course name
- `meeting_time` - a time representing the meeting time of the course
- `student_name` - a variable length (max 50, not `NULL`) string representing an enrolled student
- `max_students` - an integer for maximum student enrollment (classrooms can only fit 30 desks safely)

#### Instructions
Create a `course` table which satisfies `2NF` using 3 columns from the list above.

**SOLUTION.** The `meeting_time` and `student_name` must be contained in another table: `student`; so

                -- Create the course table
                CREATE TABLE course (
                    -- Primary key identifying the course
                    id SERIAL PRIMARY KEY,

                    -- Name of the course
                    name VARCHAR(50) NOT NULL,

                    -- Maximum student enrollment
                    max_students SMALLINT
                );

**Comment.** It is a good idea to brainstorm which columns to include in a table before creating the table. Once a potential list of columns is defined, normalization rules can be used to determine which columns to include in your table.

#### Exercise 3.2. Streamlining meal options
The cafeteria staff hears about all of the great work happening at the high school to organize data for important aspects of school operations. This group now wants to join these efforts. 

In particular, the staff wants to keep track of the different meal options that are available throughout the school year. With the help of the `IT` staff, the following table is defined for this purpose:

                    CREATE TABLE meal (
                                        id INTEGER,
                                        name VARCHAR(50) NOT NULL,
                                        ingredients VARCHAR(150), -- comma separated list
                                        avg_student_rating NUMERIC,
                                        date_served DATE,
                                        total_calories SMALLINT NOT NULL
                                    );
Using your knowledge of database normalization, you will provide a better design for the meal table.

#### Instructions
- Complete the definition of the `ingredient` table for storage of ingredients.
- Make the id column of the meal table a `PRIMARY KEY` and remove the 2 columns from the `meal` table that are no longer needed so that the meal table satisfies `2NF`.
- Complete the definition of `meal_date` to store dates on which a `meal` is served.
- Complete the definition of `meal_ingredient` so that ingredients in the ingredient table can be referenced from the `meal_ingredient` table.

**SOLUTION.**

            CREATE TABLE ingredient( 
                                      id SERIAL PRIMARY KEY, -- Add PRIMARY KEY for table
                                      name VARCHAR(50) NOT NULL
                                    );
            CREATE TABLE meal(
                            id SERIAL PRIMARY KEY, -- Make id a PRIMARY KEY
                            name VARCHAR(50) NOT NULL,

                            -- Remove columns that do not satisfy 2NF: ingredients & date_served
                            avg_student_rating NUMERIC,
                            total_calories SMALLINT NOT NULL
                            );
            CREATE TABLE meal_date(
                                  meal_id INTEGER REFERENCES meal(id), -- Define a column referencing the meal table
                                  date_served DATE NOT NULL
                                  );
            CREATE TABLE meal_ingredient(
                                        meal_id INTEGER REFERENCES meal(id),
                                        ingredient_id INTEGER REFERENCES ingredient(id)
                                        );
                                        
**Comments.** It may seem like a lot of extra work to `normalize` a database; here, a single table was split into 4 different tables to satisfy `2NF`. The benefits from `normalization` are worth the effort.                                       
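
One payoff of the extra tables is that questions about ingredients become ordinary joins. The sketch below uses Python's `sqlite3` module as a stand-in for PostgreSQL. The meals and ingredients are invented, and `meal_date` is omitted to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL (assumption)
conn.executescript("""
    CREATE TABLE ingredient (
        id   INTEGER PRIMARY KEY,
        name VARCHAR(50) NOT NULL
    );
    CREATE TABLE meal (
        id                 INTEGER PRIMARY KEY,
        name               VARCHAR(50) NOT NULL,
        avg_student_rating NUMERIC,
        total_calories     SMALLINT NOT NULL
    );
    CREATE TABLE meal_ingredient (
        meal_id       INTEGER REFERENCES meal(id),
        ingredient_id INTEGER REFERENCES ingredient(id)
    );
    -- Hypothetical sample data.
    INSERT INTO ingredient VALUES (1, 'rice'), (2, 'beans'), (3, 'cheese');
    INSERT INTO meal VALUES (1, 'burrito bowl', NULL, 650), (2, 'quesadilla', NULL, 500);
    INSERT INTO meal_ingredient VALUES (1, 1), (1, 2), (1, 3), (2, 3);
""")

# Ingredients of one meal.
bowl_ingredients = [
    row[0]
    for row in conn.execute("""
        SELECT i.name
        FROM meal_ingredient mi JOIN ingredient i ON i.id = mi.ingredient_id
        WHERE mi.meal_id = 1
        ORDER BY i.name
    """)
]

# Meals that use a given ingredient, found without any string matching.
meals_with_cheese = [
    row[0]
    for row in conn.execute("""
        SELECT m.name
        FROM meal m
        JOIN meal_ingredient mi ON mi.meal_id = m.id
        JOIN ingredient i ON i.id = mi.ingredient_id
        WHERE i.name = 'cheese'
        ORDER BY m.name
    """)
]
print(bowl_ingredients, meals_with_cheese)
```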

## 4. The `3rd Normal Form (3 NF)`
### Requirements.
- Satisfy the `2 NF`
- **`No transitive dependencies exist!`** This means that *all non-key* columns are **dependent only on** the `PRIMARY KEY`.
#### What are `transitive dependencies`?
For example, suppose our table has 3 columns `X, Y, Z` such that:
- a change in `X` implies a change in `Y`
- a change in `Y` implies a change in `Z`

Then a change in `X` implies a change in `Z` through `Y`: `Z` depends on `X` transitively.

For instance, consider the **`course_room_assignment`** table:

| `id` | `name` | `teacher` | `num` |
|:-|:-|:-|:-:|
| 157 | Linear Algebra | Maggie Winters | 288 |
| 162 | Computer Vision | Maggie Winters | 288 |
| 211 | French I | Laurrent Massart | 333 |
| 365 | Big Data | Steven Turring | 121 |
| 411 | French III | Laurrent Massart | 333 |

$\diamond$ We can see the following dependencies, the last of which is the `transitive dependency`:
- `course name --> teacher`
- `teacher --> room_number` (the `num` column)
- `course name --> room_number` (through `teacher`)
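
These dependencies can be checked against the sample rows with a small helper. The function `determines` below is a hypothetical illustration, not a library routine. A passing check only shows that the sample data is consistent with a dependency, not that the dependency holds for all possible data.

```python
def determines(rows, x, y):
    """Return True if, within rows, equal values of column x always pair with equal values of y."""
    seen = {}
    for row in rows:
        # Remember the first y seen for each x and compare later rows against it.
        if seen.setdefault(row[x], row[y]) != row[y]:
            return False
    return True


# Sample rows of the course_room_assignment table.
rows = [
    {"id": 157, "name": "Linear Algebra", "teacher": "Maggie Winters", "num": 288},
    {"id": 162, "name": "Computer Vision", "teacher": "Maggie Winters", "num": 288},
    {"id": 211, "name": "French I", "teacher": "Laurrent Massart", "num": 333},
    {"id": 365, "name": "Big Data", "teacher": "Steven Turring", "num": 121},
    {"id": 411, "name": "French III", "teacher": "Laurrent Massart", "num": 333},
]

name_to_teacher = determines(rows, "name", "teacher")
teacher_to_num = determines(rows, "teacher", "num")
name_to_num = determines(rows, "name", "num")
num_to_name = determines(rows, "num", "name")  # room 288 hosts two different courses
print(name_to_teacher, teacher_to_num, name_to_num, num_to_name)
```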

$\diamond$ So, what happens when we are
- `updating` the `room number` of a teacher who has several courses?
- `adding` new `teachers` who have no course yet?
- `deleting` all courses of a `teacher`?

$\Rightarrow$ How do we change the `structure of our data` **in order to alleviate** these `potential problems` **?**

We must make sure that our tables satisfy the `3 NF rules`, so we separate the `course_room_assignment` table into `teachers` and `course_assignments` tables; for instance,

**`teachers`**

| `id` | `name` | `num` |
|:-|:-|:-:|
| 1 | Laurrent Massart | 333 |
| 2 | Maggie Winters | 288 |
| 3 | Steven Turring | 121 |

**`course_assignments`**

| `id` | `name` | `teacher_id` |
|:-|:-|:-:|
|157| Linear Algebra | 2 |
|162| Computer Vision | 2 |
|211| French I | 1 |
|365| Big Data | 3 |
|411| French III | 1 |
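
The following sketch shows the effect of the split on a room change, using Python's `sqlite3` module as a stand-in for PostgreSQL. The column types and the new room number `290` are assumptions made for this illustration, since the notes give only the table contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL (assumption)
conn.executescript("""
    -- Column types are assumed; the notes list only the rows.
    CREATE TABLE teachers (
        id   INTEGER PRIMARY KEY,
        name VARCHAR(50) NOT NULL,
        num  SMALLINT
    );
    CREATE TABLE course_assignments (
        id         INTEGER PRIMARY KEY,
        name       VARCHAR(50) NOT NULL,
        teacher_id INTEGER REFERENCES teachers(id)
    );
    INSERT INTO teachers VALUES
        (1, 'Laurrent Massart', 333),
        (2, 'Maggie Winters', 288),
        (3, 'Steven Turring', 121);
    INSERT INTO course_assignments VALUES
        (157, 'Linear Algebra', 2),
        (162, 'Computer Vision', 2),
        (211, 'French I', 1),
        (365, 'Big Data', 3),
        (411, 'French III', 1);
""")

# Moving a teacher to another room (290 is a made-up value) updates one row.
conn.execute("UPDATE teachers SET num = 290 WHERE name = 'Maggie Winters'")

# Every course taught by that teacher now reports the same room.
rooms = conn.execute("""
    SELECT c.name, t.num
    FROM course_assignments c JOIN teachers t ON t.id = c.teacher_id
    WHERE t.name = 'Maggie Winters'
    ORDER BY c.id
""").fetchall()
print(rooms)
```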

### EXERCISEs.
#### Exercise 4.1. Identifying transitive dependencies
Imagine that a nation-wide database of schools exists. Someone who is unfamiliar with database normalization proposes the following structure for the `school` table:

            CREATE TABLE school (
                                id serial PRIMARY KEY,
                                name VARCHAR(100) NOT NULL,
                                street_address VARCHAR(100) NOT NULL,
                                city VARCHAR(50) NOT NULL,
                                state VARCHAR(50) NOT NULL,
                                zip_code INTEGER NOT NULL
                            );
Identify the **`transitive dependency`** introduced by this table definition.

A. `id` determines `name`.

B. `zip_code` determines `city` and `state`.

C. `address` determines `city`.

D. `name` determines `zip_code`.

**Answers & comments.**

*A. Incorrect!!* Noting that the `id` is the **`PRIMARY KEY`** so the dependence of the `school name` on the `id` is **valid**.

**B. Correct!!!** The `city` and `state` are dependent on `zip_code` and this **`transitive dependency`** must be removed to satisfy `3rd Normal Form`. 

*C. Wrong!!* The `school's city` is not dependent on the `school's address`.

*D. Incorrect!!* The `school's zip_code` is not dependent on the `school's name`.

#### Exercise 4.2. Table definitions for `3rd Normal Form`.
Recall the definition of the `school` table from the previous exercise:

                    CREATE TABLE school(
                                        id serial PRIMARY KEY,
                                        name VARCHAR(100) NOT NULL,
                                        street_address VARCHAR(100) NOT NULL,
                                        city VARCHAR(50) NOT NULL,
                                        state VARCHAR(50) NOT NULL,
                                        zip_code INTEGER NOT NULL
                                      );
We can define a new table called `zip` to help satisfy `3rd Normal Form`.

#### Instructions
Add a **`PRIMARY KEY`** named `code` to complete the definition of the `zip` table.

Update the definition of school to satisfy `3NF`.

**SOLUTION.**

                    -- Complete the definition of the table for zip codes
                    CREATE TABLE zip(
                                      code INTEGER PRIMARY KEY,
                                      city VARCHAR(50) NOT NULL,
                                      state VARCHAR(50) NOT NULL
                                    );

                    -- Complete the definition of the "zip_code" column
                    CREATE TABLE school(
                                        id serial PRIMARY KEY,
                                        name VARCHAR(100) NOT NULL,
                                        street_address VARCHAR(100) NOT NULL,
                                        zip_code INTEGER REFERENCES zip(code)
                                        );
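
As a sketch of what the `REFERENCES zip(code)` clause provides, the example below uses Python's `sqlite3` module as a stand-in for PostgreSQL. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, and all names and the zip code `12345` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for PostgreSQL (assumption)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.executescript("""
    CREATE TABLE zip (
        code  INTEGER PRIMARY KEY,
        city  VARCHAR(50) NOT NULL,
        state VARCHAR(50) NOT NULL
    );
    CREATE TABLE school (
        id             INTEGER PRIMARY KEY,
        name           VARCHAR(100) NOT NULL,
        street_address VARCHAR(100) NOT NULL,
        zip_code       INTEGER REFERENCES zip(code)
    );
    -- Hypothetical sample data.
    INSERT INTO zip VALUES (12345, 'Example City', 'Example State');
    INSERT INTO school VALUES (1, 'Example High School', '1 Example Street', 12345);
""")

# A school pointing at a zip code that is not in the zip table is rejected.
try:
    conn.execute(
        "INSERT INTO school VALUES (2, 'Other School', '2 Example Street', 99999)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# City and state are looked up through the zip table instead of being stored twice.
location = conn.execute("""
    SELECT s.name, z.city, z.state
    FROM school s JOIN zip z ON z.code = s.zip_code
""").fetchall()
print(rejected, location)
```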

#### Exercise 4.3. Working through the normalization process
`Table normalization` is an important action to undertake prior to creation of a new database to ensure that `data redundancy` is reduced and the `integrity` of your data is properly managed.

In this exercise, you will have an opportunity to practice normalizing database tables related to the `Small Business Administration` loan program:
- a `borrower` table will be altered to satisfy the requirements for `1st Normal Form (1NF)`
- a `bank` and a `loan` table will be altered to satisfy the requirements for `2nd Normal Form (2NF)`
- the `loan` table will be altered again to satisfy the requirements for `3rd Normal Form (3NF)`

**`loan`**

|`borrower_id`|`bank_id`|`approval_date`|`program`|`max_amount`|`gross_approval`|`term_in_months`|`revolver_status`|`bank_zip`|`initial_interest_rate`|
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
|9|1|1990-03-21|7a|5000000.00|25000.00|36|true|13126-1014|10.12 |
|2|2|1997-02-09|504|6000000.00|126030.12|60|true|72209|6.30 |
|3|2|2002-01-13|504|6000000.00|30000.90|48|false|72209|5.15 |
|12|4|2010-11-25|7a|5000000.00|1000200.00|120|false|32084-3981|9.25 |
|5|10|2017-08-01|7a|5000000.00|500205.00|84|true|02816|8.70 |
|23|3|1995-09-24|504|6000000.00|900005.83|96|false|60056|4.30 |

After completing this exercise, you should feel more confident in your ability to normalize database tables.

#### Instructions
**Step 1.** The `borrower` table is not in `1NF`.

                        CREATE TABLE borrower (
                            id serial PRIMARY KEY,
                            full_name VARCHAR (100) NOT NULL
                        );
Add `first_name` and `last_name` columns. Remove the `full_name` column to satisfy `1NF` for this table.

**SOLUTION.**

                    -- Add new columns to the borrower table
                    ALTER TABLE borrower
                    ADD COLUMN first_name VARCHAR (50) NOT NULL,
                    ADD COLUMN last_name VARCHAR (50) NOT NULL;

                    -- Remove column from borrower table to satisfy 1NF
                    ALTER TABLE borrower
                    DROP COLUMN full_name;
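
Dropping `full_name` discards the stored names unless they are copied into the new columns first. A minimal sketch of that migration is given here, assuming SQLite (through Python's `sqlite3` module) as a stand-in for PostgreSQL, a hypothetical target table `borrower_1nf`, and names that split cleanly at the first space; the two sample names are illustrative only.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE borrower (
        id INTEGER PRIMARY KEY,
        full_name VARCHAR(100) NOT NULL
    );
    INSERT INTO borrower (full_name) VALUES ('Jack Smith'), ('Sara William');

    -- Hypothetical 1NF version: one atomic value per column
    CREATE TABLE borrower_1nf (
        id INTEGER PRIMARY KEY,
        first_name VARCHAR(50) NOT NULL,
        last_name VARCHAR(50) NOT NULL
    );

    -- Split each name at its first space (an assumption about the name format)
    INSERT INTO borrower_1nf (id, first_name, last_name)
    SELECT id,
           substr(full_name, 1, instr(full_name, ' ') - 1),
           substr(full_name, instr(full_name, ' ') + 1)
    FROM borrower;
""")

borrowers = conn.execute(
    "SELECT id, first_name, last_name FROM borrower_1nf ORDER BY id"
).fetchall()
print(borrowers)
```

Names with middle names, suffixes, or no space at all would need their own handling before a split like this could be trusted.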

**Step 2.** The `loan` table contains a `bank_zip` column, but a zip code describes the bank rather than the loan: it depends only on `bank_id`. The `bank` table is defined below:

                    CREATE TABLE bank (
                        id serial PRIMARY KEY,
                        name VARCHAR(100) NOT NULL
                    );
Add a new column named `zip` to `bank`, and alter `loan` to satisfy `2NF`.

**SOLUTION.**

                    -- Add a new column named 'zip' to the 'bank' table 
                    ALTER TABLE bank
                    ADD COLUMN zip VARCHAR(10) NOT NULL;

                    -- Remove corresponding column from 'loan' to satisfy 2NF
                    ALTER TABLE loan
                    DROP COLUMN bank_zip;
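
Before `bank_zip` is dropped, its values can be carried over to `bank.zip`. The sketch below shows one way to do that, again assuming SQLite through Python's `sqlite3` module rather than PostgreSQL. The bank names are hypothetical; the ids and zip codes are taken from the first three rows of the `loan` table above. The `zip` column is left nullable here so it can be backfilled first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bank (
        id INTEGER PRIMARY KEY,
        name VARCHAR(100) NOT NULL,
        zip VARCHAR(10)               -- nullable until it is backfilled
    );
    CREATE TABLE loan (
        borrower_id INTEGER,
        bank_id INTEGER REFERENCES bank(id),
        bank_zip VARCHAR(10)
    );
    -- Bank names are hypothetical; ids and zip codes follow the loan table
    INSERT INTO bank (id, name) VALUES (1, 'Bank A'), (2, 'Bank B');
    INSERT INTO loan VALUES (9, 1, '13126-1014'), (2, 2, '72209'), (3, 2, '72209');
""")

# A bank with two different zips in loan would signal an update anomaly
conflicts = conn.execute("""
    SELECT bank_id FROM loan
    GROUP BY bank_id
    HAVING COUNT(DISTINCT bank_zip) > 1
""").fetchall()

# Copy each bank's zip from loan into bank
conn.execute("""
    UPDATE bank
    SET zip = (SELECT MIN(bank_zip) FROM loan WHERE loan.bank_id = bank.id)
""")

bank_zips = conn.execute("SELECT id, zip FROM bank ORDER BY id").fetchall()
print(conflicts)
print(bank_zips)
```

The zip code for bank `2` is stored twice in `loan` but only once in `bank`, which is the redundancy that this step removes.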

**Step 3.** Let's also track the type of program for the `loan`. Create a new table named `program` that stores program records with `id`, `description`, and `max_amount` columns.

**SOLUTION.**

                    -- Define 'program' table with max amount for each program
                    CREATE TABLE program(
                                            id serial PRIMARY KEY,
                                            description text NOT NULL,
                                            max_amount DECIMAL(9,2) NOT NULL
                                        );

**Step 4.** The `max_amount` of a `loan` depends only on the loan's `program`, not on the loan itself, which is a transitive dependency. Once `loan` holds a foreign key `program_id` that references the `program` table, `max_amount` can be looked up there, so the `program` and `max_amount` columns are no longer needed in `loan`. Alter `loan` to satisfy `3NF`.

**SOLUTION.**

                    -- Alter the 'loan' table to satisfy 3NF
                    ALTER TABLE loan
                    ADD COLUMN program_id INTEGER REFERENCES program (id), 
                    DROP COLUMN max_amount,
                    DROP COLUMN program;
                    
**Comments.** You have now updated the structure of the `loan` table to satisfy `3NF`. This helps to avoid `data redundancy` and to protect the `integrity` of your data.
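
To see how the `3NF` change keeps `max_amount` recoverable, the following sketch populates `program` from the distinct `(program, max_amount)` pairs in `loan` and then reads the value back through a join. It assumes SQLite through Python's `sqlite3` module as a stand-in for PostgreSQL, uses `REAL` instead of `DECIMAL(9,2)`, and stores the program code in `description`, which is an assumption about how that column is used. The rows are the first three from the `loan` table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan (
        borrower_id INTEGER,
        program TEXT,
        max_amount REAL,
        gross_approval REAL,
        program_id INTEGER
    );
    INSERT INTO loan (borrower_id, program, max_amount, gross_approval) VALUES
        (9, '7a', 5000000.00, 25000.00),
        (2, '504', 6000000.00, 126030.12),
        (3, '504', 6000000.00, 30000.90);

    CREATE TABLE program (
        id INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        max_amount REAL NOT NULL
    );

    -- One row per distinct program, so each max_amount is stored once
    INSERT INTO program (description, max_amount)
    SELECT DISTINCT program, max_amount FROM loan;

    -- Point every loan at its program row
    UPDATE loan
    SET program_id = (SELECT id FROM program
                      WHERE program.description = loan.program);
""")

program_count = conn.execute("SELECT COUNT(*) FROM program").fetchone()[0]

# max_amount is now looked up through the foreign key, not stored per loan
loan_limits = conn.execute("""
    SELECT l.borrower_id, p.description, p.max_amount
    FROM loan AS l
    JOIN program AS p ON p.id = l.program_id
    ORDER BY l.borrower_id
""").fetchall()
print(program_count)
print(loan_limits)
```

After this backfill, the `program` and `max_amount` columns in `loan` carry no information that the join cannot reproduce, which is why they can be dropped.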