<div align="right" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/Logo blue_dark.png"  style="width:25px" align="right";/>
</div>

# The Second Normal Form – 2NF
© ExploreAI Academy

In this train, we explain the Second Normal Form (2NF) in database design, with an emphasis on eliminating partial dependencies by moving non-key attributes into separate tables.

> ⚠️ This notebook will not run on Google Colab because it cannot connect to a local database. Please make sure that this notebook is running on the same local machine as your MySQL Workbench installation and MySQL `company_data` database.

## Learning objectives

In this train, we will learn to:
- Understand the requirements for a table to meet the Second Normal Form (2NF) in database design.
- Identify and eliminate partial dependencies by creating separate tables for non-key attributes.

## Connecting to the database

Using our `Company_employees` table in our `company_data` database that was created in MySQL Workbench, we want to answer some questions about our dataset. We can apply the same queries we used in MySQL Workbench in this notebook if we connect to our MySQL server by running the cells below.


In [1]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook. 
# If you get an error here, make sure that mysql and pymysql are installed correctly. 

%load_ext sql 

In [4]:
# Establish a connection to the local database using the '%sql' magic command.
# Replace 'password' with your connection password and 'company_data' with your database name if it differs.
# If you get an error here, please make sure the database name and password are correct.

%sql mysql+pymysql://root:password@localhost:3306/company_data

'Connected: root@company_data'

## Overview 

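1NF ensures that we can identify each record uniquely using a key, but it doesn't eliminate redundancy, i.e. the repetition of information. In the `Company_employees` table below, for example, each employee's name and home state are repeated for every job they hold.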

In [5]:
%%sql
SELECT 
	* 
FROM 
	Company_employees;

 * mysql+pymysql://root:***@localhost:3306/company_data
   mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


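| Employee_id | Name | Job_code | Job_title | State_code | Home_state |
| --- | --- | --- | --- | --- | --- |
| E001 | Carmel | J01 | Chef | 26 | Cape Town |
| E001 | Carmel | J02 | Waiter | 26 | Cape Town |
| E002 | Stefanie | J02 | Waiter | 56 | Joburg |
| E002 | Stefanie | J03 | Bartender | 56 | Joburg |
| E003 | Lisa | J01 | Chef | 5 | Nairobi |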
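
To make the redundancy concrete, here is a minimal sketch that counts how often each employee's details are stored. It assumes an in-memory SQLite database (Python's built-in `sqlite3` module) as a stand-in for the MySQL `company_data` database, loaded with the five rows shown above, so it can run without a MySQL connection.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL `company_data` database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Company_employees ("
    "Employee_id TEXT, Name TEXT, Job_code TEXT, "
    "Job_title TEXT, State_code INTEGER, Home_state TEXT)"
)
rows = [
    ("E001", "Carmel", "J01", "Chef", 26, "Cape Town"),
    ("E001", "Carmel", "J02", "Waiter", 26, "Cape Town"),
    ("E002", "Stefanie", "J02", "Waiter", 56, "Joburg"),
    ("E002", "Stefanie", "J03", "Bartender", 56, "Joburg"),
    ("E003", "Lisa", "J01", "Chef", 5, "Nairobi"),
]
conn.executemany("INSERT INTO Company_employees VALUES (?, ?, ?, ?, ?, ?)", rows)

# Find employees whose personal details are stored in more than one row.
repeated = conn.execute(
    "SELECT Employee_id, Name, COUNT(*) AS copies "
    "FROM Company_employees "
    "GROUP BY Employee_id, Name "
    "HAVING COUNT(*) > 1 "
    "ORDER BY Employee_id"
).fetchall()

print(repeated)  # [('E001', 'Carmel', 2), ('E002', 'Stefanie', 2)]
```

Carmel and Stefanie each hold two jobs, so their names and home states are stored twice. This is the repetition that 2NF aims to remove.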


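For a table to be in the Second Normal Form, two things must be true about the table:
- It should already be in 1NF.
- It should have no partial dependencies, i.e. every non-key attribute should depend on the whole primary key and not on just one part of a composite key.

We already established that the `Company_employees` table is in 1NF, where each row is identified by the combination of `Employee_id` and `Job_code`. However, `Name`, `State_code`, and `Home_state` depend only on `Employee_id`, and `Job_title` depends only on `Job_code`. We therefore need to move these non-key attributes into separate tables in which their values are not repeated.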
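
As a rough illustration of how partial dependencies can be spotted, the sketch below checks, for each part of the composite key, which non-key columns take exactly one value per key value. It assumes an in-memory SQLite database as a stand-in for MySQL, and the helper name `determines` is hypothetical. Note that sample data can only suggest a dependency; whether it truly holds is a question about the business rules.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL `company_data` database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Company_employees ("
    "Employee_id TEXT, Name TEXT, Job_code TEXT, "
    "Job_title TEXT, State_code INTEGER, Home_state TEXT)"
)
conn.executemany(
    "INSERT INTO Company_employees VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("E001", "Carmel", "J01", "Chef", 26, "Cape Town"),
        ("E001", "Carmel", "J02", "Waiter", 26, "Cape Town"),
        ("E002", "Stefanie", "J02", "Waiter", 56, "Joburg"),
        ("E002", "Stefanie", "J03", "Bartender", 56, "Joburg"),
        ("E003", "Lisa", "J01", "Chef", 5, "Nairobi"),
    ],
)

key_columns = ["Employee_id", "Job_code"]
non_key_columns = ["Name", "Job_title", "State_code", "Home_state"]


def determines(conn, key_part, attribute):
    """Return True if every value of key_part maps to a single attribute value."""
    # Column names come from the fixed lists above, so the f-string is safe here.
    query = (
        f"SELECT COUNT(*) FROM ("
        f"SELECT {key_part} FROM Company_employees "
        f"GROUP BY {key_part} HAVING COUNT(DISTINCT {attribute}) > 1"
        f") AS violations"
    )
    return conn.execute(query).fetchone()[0] == 0


# A non-key column determined by only one part of the key is a partial dependency.
partial_dependencies = [
    (key_part, attribute)
    for key_part in key_columns
    for attribute in non_key_columns
    if determines(conn, key_part, attribute)
]

for key_part, attribute in partial_dependencies:
    print(f"{attribute} depends only on {key_part}")
```

On this data, the check reports that `Name`, `State_code`, and `Home_state` depend only on `Employee_id`, while `Job_title` depends only on `Job_code`.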

## Exercise


### Exercise 1

Create a new table from `Company_employees` called `Employees`. Each row in the `Employees` table can be uniquely identified by the `Employee_id` column, which is the primary key. The non-key attributes `Name`, `State_code`, and `Home_state` should be included.

There should be no partial dependencies.

In [6]:
%%sql

DROP TABLE IF EXISTS Employees;

CREATE TABLE Employees (PRIMARY KEY(`Employee_id`)) AS
SELECT
    DISTINCT Employee_id,
    Name,
    State_code,
    Home_state
FROM
    Company_employees;

 * mysql+pymysql://root:***@localhost:3306/company_data
   mysql+pymysql://root:***@localhost:3306/united_nations
0 rows affected.
3 rows affected.


ResourceClosedError: This result object does not return rows. It has been closed automatically.

### Exercise 2

Similar to `Employees`, create a table called `Jobs` from the `Company_employees` table. Each job is identified by its `Job_code` and has a `Job_title`. Each row can be uniquely identified by the primary key, which is `Job_code`.

There should be no partial dependencies.

In [7]:
%%sql

DROP TABLE IF EXISTS Jobs;

CREATE TABLE Jobs (PRIMARY KEY(`Job_code`)) AS
SELECT 
    DISTINCT Job_code, 
    Job_title
FROM 
    Company_employees;

 * mysql+pymysql://root:***@localhost:3306/company_data
   mysql+pymysql://root:***@localhost:3306/united_nations
0 rows affected.
3 rows affected.


ResourceClosedError: This result object does not return rows. It has been closed automatically.

### Exercise 3

The final step is to map each employee to the jobs they hold. To do this, define a table called `Employee_roles` with the columns `Employee_id` and `Job_code`. Each row can be uniquely identified by the composite primary key (`Employee_id` and `Job_code`).

There should be no partial dependencies. 

In [8]:
%%sql

DROP TABLE IF EXISTS Employee_roles;

CREATE TABLE Employee_roles (PRIMARY KEY(`Employee_id`, `Job_code`)) AS
SELECT 
    Employee_id, 
    Job_code
FROM 
    Company_employees;

 * mysql+pymysql://root:***@localhost:3306/company_data
   mysql+pymysql://root:***@localhost:3306/united_nations
0 rows affected.
5 rows affected.


ResourceClosedError: This result object does not return rows. It has been closed automatically.

## Summary

We have transformed the original table into Second Normal Form, resulting in this relational structure:

<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/2NF%20ERD.png"  style="width:90%";/>
<br>
<br>
    <em>Figure 1: Second normal form ERD</em>
</div>
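
One way to check that splitting the table lost no information is to join the three tables back together and compare the result with the original. The sketch below does this under the assumption of an in-memory SQLite database standing in for MySQL. SQLite does not support the `CREATE TABLE ... (PRIMARY KEY ...) AS SELECT` form used in the exercises, so each table is created first and then filled with `INSERT ... SELECT`.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL `company_data` database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Company_employees ("
    "Employee_id TEXT, Name TEXT, Job_code TEXT, "
    "Job_title TEXT, State_code INTEGER, Home_state TEXT)"
)
conn.executemany(
    "INSERT INTO Company_employees VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("E001", "Carmel", "J01", "Chef", 26, "Cape Town"),
        ("E001", "Carmel", "J02", "Waiter", 26, "Cape Town"),
        ("E002", "Stefanie", "J02", "Waiter", 56, "Joburg"),
        ("E002", "Stefanie", "J03", "Bartender", 56, "Joburg"),
        ("E003", "Lisa", "J01", "Chef", 5, "Nairobi"),
    ],
)

# Build the three 2NF tables, mirroring Exercises 1 to 3.
conn.executescript(
    """
    CREATE TABLE Employees (
        Employee_id TEXT PRIMARY KEY, Name TEXT,
        State_code INTEGER, Home_state TEXT
    );
    INSERT INTO Employees
    SELECT DISTINCT Employee_id, Name, State_code, Home_state
    FROM Company_employees;

    CREATE TABLE Jobs (Job_code TEXT PRIMARY KEY, Job_title TEXT);
    INSERT INTO Jobs
    SELECT DISTINCT Job_code, Job_title FROM Company_employees;

    CREATE TABLE Employee_roles (
        Employee_id TEXT, Job_code TEXT,
        PRIMARY KEY (Employee_id, Job_code)
    );
    INSERT INTO Employee_roles
    SELECT Employee_id, Job_code FROM Company_employees;
    """
)

# Rebuild the original rows by joining the three tables on their keys.
rebuilt = conn.execute(
    "SELECT r.Employee_id, e.Name, r.Job_code, j.Job_title, "
    "e.State_code, e.Home_state "
    "FROM Employee_roles AS r "
    "JOIN Employees AS e ON e.Employee_id = r.Employee_id "
    "JOIN Jobs AS j ON j.Job_code = r.Job_code "
    "ORDER BY r.Employee_id, r.Job_code"
).fetchall()

original = conn.execute(
    "SELECT * FROM Company_employees ORDER BY Employee_id, Job_code"
).fetchall()

print(rebuilt == original)  # True
```

The join reproduces all five original rows, while each employee's details and each job title are now stored only once.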


<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/EAI_Blue_Dark.png"  style="width:200px";/>
</div>