<div align="right" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/Logo blue_dark.png"  style="width:25px" align="right"/>
</div>

# Intro to the First Normal Form – 1NF
© ExploreAI Academy

In this train, we get an overview of database normalisation techniques and learn how to apply the First Normal Form – 1NF to a dataset. We will cover the steps to identify and address data inconsistencies, add a unique identifier column, create a new normalised table, and transfer data from the original table to the normalised one using SQL queries.



> ⚠️ Make sure that you download the `dam_levels.db` file before continuing with the train.
 
> ⚠️ Since the queries here will modify the database, you will have to get a fresh copy of the database in order to redo the code cells.

## Learning objectives

By the end of this train, you should:
- Understand the concept of database normalisation and its significance in data management.
- Learn the criteria for a table to be in the First Normal Form (1NF), including atomicity, primary keys, and uniqueness.
- Apply the concepts learned to normalise a table, copy data into it, and remove the old table to achieve the desired database structure.

## 1. Overview
Normalisation is a database technique for evaluating and reorganising table structures to minimise redundancy, improve data integrity and storage efficiency, and reduce the need to redesign the database when new data are introduced. It removes inconsistencies that can complicate the analysis of our data; these typically arise when records are updated, inserted, or deleted. It also removes duplicate records, which saves storage and is a step toward giving every record a unique identifier, called a key.

Normalisation is segmented into ordered categories: 1NF, 2NF, 3NF, and so on, up to higher levels of normalisation.
2NF is preferred to 1NF, 3NF to 2NF, etc.

<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/Normalisation%20-%20Normal%20forms.png"  style="width:80%"/>
<br>
<br>
    <em>Figure 1: Normal forms</em>
</div>

A table is in 1NF if:
- Each cell in the table holds no more than one value, a property referred to as atomicity.
- The table has a primary key for identification.
- The table has no duplicated rows or columns.
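
As a rough illustration of these three criteria, the sketch below uses Python's built-in `sqlite3` module on a small in-memory table holding three rows taken from the dams data. The table name `sample`, the reduced column set, and the use of `;` as the marker for a multi-valued cell are assumptions made for this example only.

```python
import sqlite3

# In-memory database with a few illustrative rows (an assumed sample, not the real file).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (year INTEGER, dam_name TEXT, water_level TEXT)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [
        (2012, "WEMMERSHOEK", "48.2"),
        (2012, "STEENBRAS LOWER;STEENBRAS UPPER", "20.3;24.2"),
        (2012, "VOËLVLEI", "15"),
    ],
)

# Atomicity: a cell holding several ';'-separated values is not atomic.
non_atomic = conn.execute(
    "SELECT year, dam_name FROM sample WHERE dam_name LIKE '%;%'"
).fetchall()

# Uniqueness: any group of fully identical rows indicates duplicates.
duplicates = conn.execute(
    "SELECT year, dam_name, water_level, COUNT(*) FROM sample "
    "GROUP BY year, dam_name, water_level HAVING COUNT(*) > 1"
).fetchall()

# Primary key: PRAGMA table_info reports pk > 0 (sixth field) for key columns.
pk_columns = [row[1] for row in conn.execute("PRAGMA table_info(sample)") if row[5] > 0]

print("Non-atomic rows:", non_atomic)
print("Duplicate rows:", duplicates)
print("Primary key columns:", pk_columns)
```

In this sample, the Steenbras row fails the atomicity check and the table has no primary key, so it is not in 1NF.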

To convert an unnormalised table to 1NF, do either of the following:

- Flatten the table and change the primary key.
- Decompose the table into smaller tables – one for the repeating groups and one for the non-repeating groups.
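
To make the first option more concrete, here is a minimal sketch of flattening one multi-valued row into atomic rows in Python. The helper `flatten_row` is hypothetical (it is not part of this train or of any library), and it assumes that the values in each multi-valued column are separated by `;` and line up position by position.

```python
def flatten_row(row, multi_cols, sep=";"):
    """Split a row whose multi_cols hold sep-separated values into atomic rows."""
    parts = {col: str(row[col]).split(sep) for col in multi_cols}
    counts = {len(values) for values in parts.values()}
    if len(counts) != 1:
        # The columns must carry the same number of values to be paired up.
        raise ValueError("multi-valued columns have different numbers of values")
    n = counts.pop()

    flat = []
    for i in range(n):
        new_row = dict(row)  # copy the single-valued columns unchanged
        for col in multi_cols:
            new_row[col] = parts[col][i].strip()
        flat.append(new_row)
    return flat


lumped = {
    "year": 2012,
    "dam_name": "STEENBRAS LOWER;STEENBRAS UPPER",
    "Assessment_Officer": "V. Mokere",
    "Officer_Reg": 201124,
    "water_level": "20.3;24.2",
    "dam_latitude": "-34.180527;-34.166702",
    "dam_longitude": "18.866688;18.90976",
}

atomic_rows = flatten_row(
    lumped, ["dam_name", "water_level", "dam_latitude", "dam_longitude"]
)
for r in atomic_rows:
    print(r)
```

The split values are still strings here; converting them to numbers and choosing a key for the flattened rows are separate steps.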

<div>
Let's look at normalising the dams database – a <a href="https://en.wikipedia.org/wiki/Flat-file_database">flat-file database</a> containing information about Cape Town dam water levels leading up to the 2018 Cape Town water crisis.
</div>

## Connecting to the database

In [1]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook. 
# If you get an error here, make sure that the package providing the `sql` extension (for example, ipython-sql) is installed correctly. 

%load_ext sql

In [2]:
# Load SQLite database
%sql sqlite:///dam_levels-a-3987.db

'Connected: @dam_levels-a-3987.db'

Before we proceed, let's first take a closer look at the `dam_levels` table:

In [3]:
%%sql

SELECT *
FROM dam_levels

 * sqlite:///dam_levels-a-3987.db
Done.


| year | dam_name | Assessment_Officer | Officer_Reg | water_level | dam_latitude | dam_longitude |
|---|---|---|---|---|---|---|
| 2012 | WEMMERSHOEK | P. M. Naidoo | 201013 | 48.2 | -33.826246 | 19.091828 |
| 2012 | STEENBRAS LOWER;STEENBRAS UPPER | V. Mokere | 201124 | 20.3;24.2 | -34.180527;-34.166702 | 18.866688;18.90976 |
| 2012 | VOËLVLEI | A. T. Sijovu | 202256 | 15 | -33.34178 | 19.04105 |
| 2012 | HELY-HUTCHINSON | P. M. Naidoo | 201013 | 14.2 | -33.976929 | 18.409568 |
| 2012 | WOODHEAD | A. T. Sijovu | 202256 | 35.3 | -33.977341 | 18.404046 |
| 2013 | WEMMERSHOEK | P. M. Naidoo | 201013 | 53.3 | -33.826246 | 19.091828 |
| 2013 | STEENBRAS LOWER;STEENBRAS UPPER | V. Mokere | 201124 | 22.4;24.6 | -34.180527;-34.166702 | 18.866688;18.90976 |
| 2013 | VOËLVLEI | A. T. Sijovu | 202256 | 16.6 | -33.34178 | 19.04105 |
| 2013 | HELY-HUTCHINSON | P. M. Naidoo | 201013 | 15.2 | -33.976929 | 18.409568 |
| 2013 | WOODHEAD | A. T. Sijovu | 202256 | 35.9 | -33.977341 | 18.404046 |


## Exercise


### Exercise 1

It seems that data for two dams have been lumped together in some rows. Write a query to delete these rows.

In [4]:
%%sql

DELETE 
FROM 
    dam_levels
WHERE 
    Assessment_Officer = 'V. Mokere';

 * sqlite:///dam_levels-a-3987.db
3 rows affected.


ResourceClosedError: This result object does not return rows. It has been closed automatically.
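
Because a `DELETE` cannot be undone once it is committed, one cautious approach is to run it inside a transaction, check how many rows it touches, and roll back if the count is unexpected. The sketch below assumes plain Python `sqlite3` with its default transaction handling rather than the `%%sql` magic used in this train, and the table `demo` with its three rows is purely illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo (year INTEGER, dam_name TEXT, Assessment_Officer TEXT)"
)
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?)",
    [
        (2012, "WEMMERSHOEK", "P. M. Naidoo"),
        (2012, "STEENBRAS LOWER;STEENBRAS UPPER", "V. Mokere"),
        (2013, "STEENBRAS LOWER;STEENBRAS UPPER", "V. Mokere"),
    ],
)
conn.commit()  # make the sample rows permanent before experimenting

# With default settings, sqlite3 opens a transaction implicitly before a DELETE.
cursor = conn.execute("DELETE FROM demo WHERE Assessment_Officer = 'V. Mokere'")
deleted = cursor.rowcount  # number of rows the DELETE removed

# Undo the delete; calling conn.commit() here instead would make it permanent.
conn.rollback()
remaining = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]

print("Rows deleted in the transaction:", deleted)
print("Rows present after rollback:", remaining)
```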

### Exercise 2

Now re-insert the deleted data correctly, with one dam per row so that every cell is atomic.

**HINT:** Use the values from the `dam_levels` table we loaded above (third code cell).

In [5]:
%%sql

INSERT INTO dam_levels (year,dam_name,Assessment_Officer,Officer_Reg,water_level,dam_latitude,dam_longitude)
VALUES 
    (2012, 'STEENBRAS LOWER', 'V. Mokere', 201124, 20.3, -34.180527, 18.866688),
    (2012, 'STEENBRAS UPPER', 'V. Mokere', 201124, 24.2, -34.166702, 18.90976),
    (2013, 'STEENBRAS LOWER', 'V. Mokere', 201124, 22.4, -34.180527, 18.866688),
    (2013, 'STEENBRAS UPPER', 'V. Mokere', 201124, 24.6, -34.166702, 18.90976),
    (2015, 'STEENBRAS LOWER', 'V. Mokere', 201124, 22.7, -34.180527, 18.866688),
    (2015, 'STEENBRAS UPPER', 'V. Mokere', 201124, 24.6, -34.166702, 18.90976);

 * sqlite:///dam_levels-a-3987.db
6 rows affected.


ResourceClosedError: This result object does not return rows. It has been closed automatically.

### Exercise 3

Next, we need to make sure that the rows are uniquely identifiable. The easiest way to achieve this is to add an `ID` column to the table. However, SQLite's `ALTER TABLE ... ADD COLUMN` cannot add a `PRIMARY KEY` or `UNIQUE` column to an existing table. We are therefore better off creating a new table that includes the `ID` column, copying the existing columns across, and deleting the old table.
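
The restriction can be seen in a small experiment. This is a sketch using Python's built-in `sqlite3` module on a throwaway in-memory table; the table name `demo` and the column name `id` are illustrative, and the exact error text may differ between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (year INTEGER, dam_name TEXT)")

try:
    # SQLite refuses to add a PRIMARY KEY column to an existing table.
    conn.execute("ALTER TABLE demo ADD COLUMN id INTEGER PRIMARY KEY")
    alter_failed = False
except sqlite3.Error as err:
    alter_failed = True
    print("ALTER TABLE was rejected:", err)

# The table structure is unchanged after the failed statement.
columns = [row[1] for row in conn.execute("PRAGMA table_info(demo)")]
print("Columns:", columns)
```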

Start by creating the structure of the new table `dam_levels_1nf`. 

In [6]:
%%sql

CREATE TABLE dam_levels_1nf (
    AssessmentId INTEGER PRIMARY KEY AUTOINCREMENT,
    year INTEGER,
    dam_name VARCHAR(100),
    Assessment_Officer VARCHAR(100),
    Officer_Reg INTEGER,
    water_level NUMERIC(10,1),
    dam_latitude NUMERIC(9,6),
    dam_longitude NUMERIC(9,6)
);

 * sqlite:///dam_levels-a-3987.db
Done.


ResourceClosedError: This result object does not return rows. It has been closed automatically.

### Exercise 4

Next, copy the data from the original `dam_levels` table into the newly created `dam_levels_1nf` table.

In [7]:
%%sql

INSERT INTO 
    dam_levels_1nf(
            year, 
            dam_name, 
            Assessment_Officer, 
            Officer_Reg, 
            water_level, 
            dam_latitude, 
            dam_longitude
        )
SELECT 
    year, 
    dam_name, 
    Assessment_Officer, 
    Officer_Reg, 
    water_level, 
    dam_latitude, 
    dam_longitude
FROM 
    dam_levels
ORDER BY year;

 * sqlite:///dam_levels-a-3987.db
18 rows affected.


ResourceClosedError: This result object does not return rows. It has been closed automatically.
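
Before dropping the original table, it can be worth confirming that the copy is complete. The sketch below shows one way to do this with Python's built-in `sqlite3` module; the tables `old_t` and `new_t` and their three sample rows are stand-ins for `dam_levels` and `dam_levels_1nf`, not the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_t (year INTEGER, dam_name TEXT, water_level REAL)")
conn.executemany(
    "INSERT INTO old_t VALUES (?, ?, ?)",
    [
        (2013, "WEMMERSHOEK", 53.3),
        (2012, "WEMMERSHOEK", 48.2),
        (2012, "WOODHEAD", 35.3),
    ],
)
conn.execute(
    """CREATE TABLE new_t (
           AssessmentId INTEGER PRIMARY KEY AUTOINCREMENT,
           year INTEGER,
           dam_name TEXT,
           water_level REAL
       )"""
)
# Same pattern as the exercise: list the target columns and let SQLite fill the key.
conn.execute(
    "INSERT INTO new_t (year, dam_name, water_level) "
    "SELECT year, dam_name, water_level FROM old_t ORDER BY year"
)

# Check 1: both tables hold the same number of rows, and every key is distinct.
old_count = conn.execute("SELECT COUNT(*) FROM old_t").fetchone()[0]
new_count, distinct_ids = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT AssessmentId) FROM new_t"
).fetchone()

# Check 2: no row of the old table is missing from the new one.
missing = conn.execute(
    "SELECT year, dam_name, water_level FROM old_t "
    "EXCEPT SELECT year, dam_name, water_level FROM new_t"
).fetchall()

print(old_count, new_count, distinct_ids, missing)
```

If the counts match and `missing` is empty, the original table can be dropped with more confidence.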

### Exercise 5

Delete the redundant table, `dam_levels`.

In [8]:
%%sql

DROP TABLE dam_levels;

 * sqlite:///dam_levels-a-3987.db
Done.


ResourceClosedError: This result object does not return rows. It has been closed automatically.

## Summary

And with that, we have a table in 1NF!

**Important**: After completing this exercise, ensure that you save the `dam_levels.db` file, as it will be used as a foundation for the next exercise.


<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/EAI_Blue_Dark.png"  style="width:200px"/>
</div>