# Normalisation continued & Predict overview

In this webinar:
- Database table creation
    - Creating, dropping, and populating tables
- Modifying and updating tables
    - Deleting, updating and altering tables
- Practical normalisation (a simple example!)
- Predict overview + Q&A

## Database table creation

We may start with a completely unnormalised table, but we need to create new tables to hold our data as we normalise it. We work through normalisation by **creating** each of our tables (essentially setting up the structure) and then **inserting** the relevant data into it.


### CREATE TABLE

We start off with the "CREATE TABLE" phrase. With this, we are able to set up our table's structure and the data types that SQLite should expect. We can also tell SQLite which columns are to be considered our primary or foreign keys!

```sql

CREATE TABLE "table_name" (
    "column_1" column_datatype,
    "column_2" column_datatype,
    "column_3" column_datatype,
    PRIMARY KEY ("column_1"),
    FOREIGN KEY("column_3") REFERENCES "other_table" ("foreign_column_name")
);
```

*NB: some other flavours of SQL define primary and foreign keys in a slightly different way, so be aware of this if you're ever looking up anything about setting PK/FKs.*
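
To see what the key definitions actually do, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The `departments` and `staff` tables are made up for illustration and are not part of the webinar data. One thing worth knowing: SQLite only enforces foreign keys on a connection once `PRAGMA foreign_keys = ON` has been run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ignores foreign key constraints unless this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables, used only to demonstrate PRIMARY KEY and FOREIGN KEY.
conn.executescript("""
CREATE TABLE "departments" (
    "dept_id"   INTEGER,
    "dept_name" VARCHAR(100) NOT NULL,
    PRIMARY KEY ("dept_id")
);
CREATE TABLE "staff" (
    "staff_id" INTEGER,
    "name"     VARCHAR(100) NOT NULL,
    "dept_id"  INTEGER,
    PRIMARY KEY ("staff_id"),
    FOREIGN KEY ("dept_id") REFERENCES "departments" ("dept_id")
);
""")

conn.execute("INSERT INTO departments VALUES (1, 'Kitchen')")
conn.execute("INSERT INTO staff VALUES (1, 'Alice', 1)")  # parent row exists

try:
    conn.execute("INSERT INTO staff VALUES (2, 'Bob', 99)")  # no department 99
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

try:
    conn.execute("INSERT INTO staff VALUES (1, 'Carol', 1)")  # staff_id 1 is taken
    pk_rejected = False
except sqlite3.IntegrityError:
    pk_rejected = True

print(fk_rejected, pk_rejected)
```

Both of the bad inserts are refused, so only Alice's row ends up in `staff`.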

### DROP TABLE

Maybe we made a mistake in our table, and we need to get rid of it so we can recreate it. This is done with a simple DROP TABLE statement:

```sql
DROP TABLE table_name;
```

Be careful with this: if you accidentally drop a table that you don't have the code to recreate, the table is removed from your .db file and you may need to re-download your data.

The DROP TABLE statement is very useful when creating tables. If you make a mistake when creating a table, you need to drop the table before recreating it, so having this statement written above your CREATE TABLE query can make creating a table a bit easier (we'll see this in a bit). You'll sometimes see this phrase written as DROP TABLE IF EXISTS, which stops SQL from throwing an error if the table doesn't exist yet.
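
The drop-then-create pattern can be sketched as follows, again with Python's `sqlite3` module and an in-memory database. The `demo` and `not_there` table names are placeholders chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Placeholder script: drop-then-create, so it can be re-run safely.
REBUILD = """
DROP TABLE IF EXISTS "demo";
CREATE TABLE "demo" (
    "id"    INTEGER,
    "label" VARCHAR(100),
    PRIMARY KEY ("id")
);
"""

conn.executescript(REBUILD)  # first run: nothing to drop, table is created
conn.executescript(REBUILD)  # second run: old table is dropped, then recreated

# Without IF EXISTS, dropping a table that is not there raises an error.
try:
    conn.execute('DROP TABLE "not_there"')
    plain_drop_failed = False
except sqlite3.OperationalError:
    plain_drop_failed = True

# Without the DROP, creating a table whose name is already in use fails too.
try:
    conn.execute('CREATE TABLE "demo" ("id" INTEGER)')
    recreate_failed = False
except sqlite3.OperationalError:
    recreate_failed = True

print(plain_drop_failed, recreate_failed)
```

This is why the cells later in this notebook start with `DROP TABLE IF EXISTS`: they can be re-run as many times as needed.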


### INSERT INTO

Once our table is created and its structure is ready to be populated, we need to put the data into it. We do that using the INSERT INTO phrase:

```sql
INSERT INTO table_name (column_1, column_2, column_3)
SELECT DISTINCT 
    old_column_1, 
    old_column_2, 
    old_column_3
FROM previous_table
```

It's important to get the column order right when you're populating tables: the columns that follow the INSERT INTO statement should be in the same order as those in the SELECT statement. SQL pairs columns by position, not by name, so the first selected column goes into the first listed column, the second into the second, and so on, even if the column names are different.
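
A small sketch of that positional pairing, using Python's `sqlite3` module. The `source`, `target_ok` and `target_bad` tables are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented tables: one source and two identical targets.
conn.executescript("""
CREATE TABLE source (code TEXT, label TEXT);
INSERT INTO source VALUES ('J01', 'Chef');
CREATE TABLE target_ok  (job_code TEXT, job TEXT);
CREATE TABLE target_bad (job_code TEXT, job TEXT);
""")

# Column order matches: code -> job_code, label -> job.
conn.execute(
    "INSERT INTO target_ok (job_code, job) SELECT DISTINCT code, label FROM source"
)
# Column order swapped: SQLite does not complain, it just pairs by position.
conn.execute(
    "INSERT INTO target_bad (job_code, job) SELECT DISTINCT label, code FROM source"
)

ok_row = conn.execute("SELECT job_code, job FROM target_ok").fetchone()
bad_row = conn.execute("SELECT job_code, job FROM target_bad").fetchone()
print(ok_row)   # ('J01', 'Chef')
print(bad_row)  # ('Chef', 'J01')
```

No error is raised for the swapped version, so it is worth checking the result with a quick SELECT after every insert.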
    

---

## Modifying and updating tables

### DELETE FROM

If we need to delete records from our table, we can use the DELETE FROM statement. We can either delete everything, or we can delete records based on a condition. This will work on entire records (rows) rather than individual cell values.

```sql
DELETE FROM table_name;

DELETE FROM table_name
WHERE condition;
```
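
Because a DELETE cannot easily be undone, one cautious habit is to run the same WHERE condition in a SELECT first. The sketch below assumes a made-up `orders` table and uses Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up table for illustration.
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "open"), (2, "cancelled"), (3, "open"), (4, "cancelled")],
)

# Preview: how many rows would the condition remove?
preview = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status = 'cancelled'"
).fetchone()[0]

# Conditional delete: whole rows that match the condition are removed.
cur = conn.execute("DELETE FROM orders WHERE status = 'cancelled'")
deleted = cur.rowcount

remaining = [row[0] for row in conn.execute("SELECT order_id FROM orders ORDER BY order_id")]

# No WHERE clause: every remaining row goes, but the empty table stays.
conn.execute("DELETE FROM orders")
left = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

print(preview, deleted, remaining, left)
```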

### UPDATE 

The UPDATE statement can be used to edit values in particular columns where needed. Without a WHERE clause the change is applied to every row in the column; with a condition it is applied only to the rows that match.

```sql
UPDATE table_name
    SET column_1 = value_1,
        column_2 = value_2, ...
;

UPDATE table_name
    SET column_1 = value_1
    WHERE condition;
```
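
Both forms of UPDATE can be sketched with Python's `sqlite3` module. The `menu` table and its values are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented table for illustration.
conn.execute(
    "CREATE TABLE menu (item TEXT PRIMARY KEY, price REAL, category TEXT, available INTEGER)"
)
conn.executemany(
    "INSERT INTO menu (item, price, category) VALUES (?, ?, ?)",
    [("Espresso", 20.0, "drink"), ("Latte", 30.0, "drink"), ("Toastie", 50.0, "food")],
)

# Conditional update: two columns change, but only on the matching rows.
conn.execute(
    "UPDATE menu SET price = price + 5, category = 'hot drink' WHERE category = 'drink'"
)

# No WHERE clause: the whole column is updated.
conn.execute("UPDATE menu SET available = 1")

rows = conn.execute(
    "SELECT item, price, category, available FROM menu ORDER BY item"
).fetchall()
for row in rows:
    print(row)
```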


### ALTER TABLE

The ALTER TABLE statement can be used to make a few different changes to a table or its columns.

```sql
ALTER TABLE table_name
RENAME TO new_name;

ALTER TABLE table_name
ADD COLUMN new_column column_definition;

ALTER TABLE table_name
DROP COLUMN column_name;
```
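
Here is a rough sketch of those three changes, run through Python's `sqlite3` module against a made-up `staff` table. Note that dropping a column is only supported from SQLite 3.35.0 onwards, so the sketch checks the library version before trying it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up table for illustration.
conn.execute("CREATE TABLE staff (staff_id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO staff VALUES ('S1', 'Thandi')")

conn.execute("ALTER TABLE staff RENAME TO team")
conn.execute("ALTER TABLE team ADD COLUMN email TEXT")


def column_names(table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


after_add = column_names("team")
# Rows that already existed get NULL in the newly added column.
existing_email = conn.execute("SELECT email FROM team").fetchone()[0]

# DROP COLUMN needs SQLite 3.35.0 or newer.
if sqlite3.sqlite_version_info >= (3, 35, 0):
    conn.execute("ALTER TABLE team DROP COLUMN email")

print(after_add, existing_email, column_names("team"))
```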


### Other data modifications

Remember that some of the methods we've learnt previously can also be applied when modifying tables! (Think back to the String Manipulation and Data Transformations trains...)
- CASE WHEN
- IIF
- COALESCE
- COALESCE
- CAST
- etc...
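
As a small illustration, these expressions can sit on the right-hand side of an UPDATE. The sketch below uses Python's `sqlite3` module with an invented `shifts` table, and sticks to CASE WHEN, CAST and COALESCE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented table: hours is stored as text on purpose, so CAST has work to do.
conn.execute("CREATE TABLE shifts (staff TEXT, hours TEXT, band TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO shifts (staff, hours, note) VALUES (?, ?, ?)",
    [("Alice", "8", None), ("Bob", "3", "late start")],
)

conn.execute("""
UPDATE shifts
SET band = CASE WHEN CAST(hours AS INTEGER) >= 6 THEN 'full' ELSE 'part' END,
    note = COALESCE(note, 'none')  -- replace missing notes with a default
""")

rows = conn.execute("SELECT staff, band, note FROM shifts ORDER BY staff").fetchall()
for row in rows:
    print(row)
```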

---

## Practical Normalisation

Let's try working through a smaller, simpler table taking it from unnormalised through to 3NF. Let's first load the sql extension and our database:

In [1]:
# Load extension and database
%load_ext sql

In [2]:
%%sql

sqlite:///employees_db.db

Run the code below to see what tables we have in our database so far:

In [3]:
%%sql

SELECT name, type FROM sqlite_master WHERE type IN ('table') AND name NOT LIKE 'sqlite_%' ORDER BY 1

 * sqlite:///employees_db.db
Done.


name,type
employees,table


In [4]:
%%sql

SELECT *
FROM employees

 * sqlite:///employees_db.db
Done.


Employee_ID,Name,Job_Code,Job,Province_Code,Home_Province
E001,Alice,"J01, J02","Chef, Waiter",1,Gauteng
E002,Bob,"J02, J03","Waiter, Bartender",2,Western Cape
E003,Leona,J01,Chef,2,Western Cape
E004,John,"J04, J03","Manager, Bartender",1,Gauteng
E005,Siyanda,J05,CEO,4,Limpopo
E006,Sipho,J02,Waiter,5,Northern Cape
E007,Alex,J06,Head Chef,5,Northern Cape
E008,Brad,J04,Manager,3,Eastern Cape
E009,Saveshnee,J01,Chef,6,KwaZulu Natal
E010,Adrian,J03,Bartender,7,Mpumalange


# 1NF

For a table to be in 1NF, it should have no repeating groups.

![](1NF.png)

### Create and populate our table in 1NF
We need to modify our data where we have more than one value in a cell. First, we'll create our 1NF table structure and call it employees_1NF:

In [5]:
%%sql
--Creating our table

DROP TABLE IF EXISTS "employees_1NF";

CREATE TABLE "employees_1NF" (
    "Employee_ID"   VARCHAR(100)  NOT NULL,
    "Name"  VARCHAR(100)  NOT NULL,
    "Job_Code"  VARCHAR(100) NOT NULL,
    "Job" VARCHAR(100) NOT NULL,
    "Province_Code" NUMERIC(10) NOT NULL,
    "Home_Province" VARCHAR(100) NOT NULL,
    PRIMARY KEY ("Employee_ID", "Job_Code")
);

 * sqlite:///employees_db.db
Done.


[]

In [6]:
%%sql
--Inserting data from the employees table into our newly created employees_1NF table

DELETE FROM employees_1NF; 

INSERT INTO employees_1NF (Employee_ID, Name, Job_Code, Job, Province_Code, Home_Province)
SELECT DISTINCT 
    Employee_ID,
    Name, 
    Job_Code, 
    Job, 
    Province_Code,
    Home_Province
FROM employees

 * sqlite:///employees_db.db
Done.
10 rows affected.


[]

Check that we've successfully inserted our data:

In [7]:
%%sql
SELECT *
FROM employees_1NF

 * sqlite:///employees_db.db
Done.


Employee_ID,Name,Job_Code,Job,Province_Code,Home_Province
E001,Alice,"J01, J02","Chef, Waiter",1,Gauteng
E002,Bob,"J02, J03","Waiter, Bartender",2,Western Cape
E003,Leona,J01,Chef,2,Western Cape
E004,John,"J04, J03","Manager, Bartender",1,Gauteng
E005,Siyanda,J05,CEO,4,Limpopo
E006,Sipho,J02,Waiter,5,Northern Cape
E007,Alex,J06,Head Chef,5,Northern Cape
E008,Brad,J04,Manager,3,Eastern Cape
E009,Saveshnee,J01,Chef,6,KwaZulu Natal
E010,Adrian,J03,Bartender,7,Mpumalange


We still need to deal with the rows where there are multiple values. Since there aren't too many, we can do this manually:

- delete the rows (ensure we have the data recorded somewhere so we know what we want to put back in)
- manually input the corrected rows into our table
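
With only three affected rows, the manual route is fine. For a bigger table the same split could be scripted. The sketch below is one possible approach, not the method used in this notebook: it uses Python's `sqlite3` module on an in-memory copy with just two sample rows and without the province columns, and it assumes the codes and job titles in a cell are listed in the same order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down in-memory copy of the tables (province columns left out).
conn.executescript("""
CREATE TABLE employees (Employee_ID TEXT, Name TEXT, Job_Code TEXT, Job TEXT);
INSERT INTO employees VALUES
    ('E001', 'Alice', 'J01, J02', 'Chef, Waiter'),
    ('E003', 'Leona', 'J01', 'Chef');
CREATE TABLE employees_1NF (
    Employee_ID TEXT NOT NULL,
    Name        TEXT NOT NULL,
    Job_Code    TEXT NOT NULL,
    Job         TEXT NOT NULL,
    PRIMARY KEY (Employee_ID, Job_Code)
);
""")

rows = conn.execute("SELECT Employee_ID, Name, Job_Code, Job FROM employees").fetchall()
for emp_id, name, codes, jobs in rows:
    # Assumption: codes and jobs hold the same number of items, in matching order.
    for code, job in zip(codes.split(","), jobs.split(",")):
        conn.execute(
            "INSERT INTO employees_1NF VALUES (?, ?, ?, ?)",
            (emp_id, name, code.strip(), job.strip()),
        )

result = conn.execute(
    "SELECT * FROM employees_1NF ORDER BY Employee_ID, Job_Code"
).fetchall()
for row in result:
    print(row)
```

Each multi-valued row becomes one row per job, and single-valued rows pass through unchanged.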

In [8]:
%%sql
DELETE FROM employees_1NF
WHERE Employee_ID = 'E001'
    OR Employee_ID = 'E002'
    OR Employee_ID = 'E004'

 * sqlite:///employees_db.db
3 rows affected.


[]

In [9]:
%%sql
SELECT * 
FROM employees_1NF

 * sqlite:///employees_db.db
Done.


Employee_ID,Name,Job_Code,Job,Province_Code,Home_Province
E003,Leona,J01,Chef,2,Western Cape
E005,Siyanda,J05,CEO,4,Limpopo
E006,Sipho,J02,Waiter,5,Northern Cape
E007,Alex,J06,Head Chef,5,Northern Cape
E008,Brad,J04,Manager,3,Eastern Cape
E009,Saveshnee,J01,Chef,6,KwaZulu Natal
E010,Adrian,J03,Bartender,7,Mpumalange


In [10]:
%%sql
INSERT INTO employees_1NF (Employee_ID,Name,Job_Code,Job,Province_Code,Home_Province)
VALUES
    ('E001', 'Alice', 'J01', 'Chef', 1, 'Gauteng'),
    ('E001', 'Alice', 'J02', 'Waiter', 1, 'Gauteng'),
    ('E002', 'Bob', 'J02', 'Waiter', 2, 'Western Cape'),
    ('E002', 'Bob', 'J03', 'Bartender', 2, 'Western Cape'),
    ('E004', 'John', 'J03', 'Bartender', 1, 'Gauteng'),
    ('E004', 'John', 'J04', 'Manager', 1, 'Gauteng')

 * sqlite:///employees_db.db
6 rows affected.


[]

In [11]:
%%sql
SELECT *
FROM employees_1NF

 * sqlite:///employees_db.db
Done.


Employee_ID,Name,Job_Code,Job,Province_Code,Home_Province
E003,Leona,J01,Chef,2,Western Cape
E005,Siyanda,J05,CEO,4,Limpopo
E006,Sipho,J02,Waiter,5,Northern Cape
E007,Alex,J06,Head Chef,5,Northern Cape
E008,Brad,J04,Manager,3,Eastern Cape
E009,Saveshnee,J01,Chef,6,KwaZulu Natal
E010,Adrian,J03,Bartender,7,Mpumalange
E001,Alice,J01,Chef,1,Gauteng
E001,Alice,J02,Waiter,1,Gauteng
E002,Bob,J02,Waiter,2,Western Cape


In [12]:
%%sql

SELECT name, type FROM sqlite_master WHERE type IN ('table') AND name NOT LIKE 'sqlite_%' ORDER BY 1

 * sqlite:///employees_db.db
Done.


name,type
employees,table
employees_1NF,table


# 2NF

For a table to be in 2NF, it needs to already be in 1NF (no repeating groups), and should have no partial dependencies.

![](2NF.png)
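
A partial dependency means a column depends on only part of the composite key. One rough way to spot candidates is to check whether each value of a single key column always comes with the same value of another column. The sketch below uses Python's `sqlite3` module on a small in-memory sample of `employees_1NF`; the `determines` helper is a hypothetical function written for this example. Keep in mind that the data can only suggest a dependency, it cannot prove one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Small in-memory sample of the 1NF table (province columns left out).
conn.executescript("""
CREATE TABLE employees_1NF (
    Employee_ID TEXT, Name TEXT, Job_Code TEXT, Job TEXT,
    PRIMARY KEY (Employee_ID, Job_Code)
);
INSERT INTO employees_1NF VALUES
    ('E001', 'Alice', 'J01', 'Chef'),
    ('E001', 'Alice', 'J02', 'Waiter'),
    ('E002', 'Bob',   'J02', 'Waiter'),
    ('E003', 'Leona', 'J01', 'Chef');
""")


def determines(conn, table, key_col, dependent_col):
    """Hypothetical helper: True if each key_col value maps to one dependent_col value."""
    query = f'''
        SELECT COUNT(*) FROM (
            SELECT "{key_col}" FROM "{table}"
            GROUP BY "{key_col}"
            HAVING COUNT(DISTINCT "{dependent_col}") > 1
        )
    '''
    return conn.execute(query).fetchone()[0] == 0


name_on_id = determines(conn, "employees_1NF", "Employee_ID", "Name")
job_on_code = determines(conn, "employees_1NF", "Job_Code", "Job")
code_on_id = determines(conn, "employees_1NF", "Employee_ID", "Job_Code")
print(name_on_id, job_on_code, code_on_id)
```

In this sample, Name follows from Employee_ID alone and Job follows from Job_Code alone, which is what points us towards separate employee and job tables. Job_Code does not follow from Employee_ID, so the pairing of the two needs a table of its own.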

### Create and populate the employees_2NF table

In [13]:
%%sql

DROP TABLE IF EXISTS "employees_2NF";

CREATE TABLE "employees_2NF" (
    "Employee_ID"   VARCHAR(100)  NOT NULL,
    "Name"  VARCHAR(100)  NOT NULL,
    "Province_Code" NUMERIC(10) NOT NULL,
    "Home_Province" VARCHAR(100) NOT NULL,
    PRIMARY KEY ("Employee_ID")
);

 * sqlite:///employees_db.db
Done.
Done.


[]

In [14]:
%%sql
INSERT INTO employees_2NF (Employee_ID,Name,Province_Code,Home_Province)
SELECT DISTINCT 
    Employee_ID,
    Name,
    Province_Code,
    Home_Province
FROM employees_1NF

 * sqlite:///employees_db.db
10 rows affected.


[]

In [15]:
%%sql
SELECT *
FROM employees_2NF

 * sqlite:///employees_db.db
Done.


Employee_ID,Name,Province_Code,Home_Province
E001,Alice,1,Gauteng
E002,Bob,2,Western Cape
E003,Leona,2,Western Cape
E004,John,1,Gauteng
E005,Siyanda,4,Limpopo
E006,Sipho,5,Northern Cape
E007,Alex,5,Northern Cape
E008,Brad,3,Eastern Cape
E009,Saveshnee,6,KwaZulu Natal
E010,Adrian,7,Mpumalange


### Create and populate the jobs_2NF table

In [16]:
%%sql

DROP TABLE IF EXISTS "jobs_2NF";

CREATE TABLE "jobs_2NF" (
    "Job_Code"  VARCHAR(100),
    "Job" VARCHAR(100),
    PRIMARY KEY ("Job_Code")
);

 * sqlite:///employees_db.db
Done.
Done.


[]

In [17]:
%%sql
INSERT INTO jobs_2NF (Job_Code,Job)
SELECT DISTINCT 
    Job_Code,
    Job
FROM employees_1NF

 * sqlite:///employees_db.db
6 rows affected.


[]

In [18]:
%%sql
SELECT *
FROM jobs_2NF

 * sqlite:///employees_db.db
Done.


Job_Code,Job
J01,Chef
J05,CEO
J02,Waiter
J06,Head Chef
J04,Manager
J03,Bartender


### Create and populate the employee_roles_2NF table

In [19]:
%%sql

DROP TABLE IF EXISTS "employee_roles_2NF";

CREATE TABLE "employee_roles_2NF" (
    "Employee_ID"  VARCHAR(100),
    "Job_Code" VARCHAR(100),
    FOREIGN KEY("Employee_ID") REFERENCES "employees_2NF" ("Employee_ID"),
    FOREIGN KEY("Job_Code") REFERENCES "jobs_2NF" ("Job_Code")
);

 * sqlite:///employees_db.db
Done.
Done.


[]

In [20]:
%%sql
INSERT INTO employee_roles_2NF (Employee_ID,Job_Code)
SELECT DISTINCT 
    Employee_ID,
    Job_Code
FROM employees_1NF

 * sqlite:///employees_db.db
13 rows affected.


[]

In [21]:
%%sql
SELECT *
FROM employee_roles_2NF

 * sqlite:///employees_db.db
Done.


Employee_ID,Job_Code
E001,J01
E001,J02
E002,J02
E002,J03
E003,J01
E004,J03
E004,J04
E005,J05
E006,J02
E007,J06


In [22]:
%%sql

SELECT name, type FROM sqlite_master WHERE type IN ('table','view') AND name NOT LIKE 'sqlite_%' ORDER BY 2

 * sqlite:///employees_db.db
Done.


name,type
employees,table
employees_1NF,table
employees_2NF,table
jobs_2NF,table
employee_roles_2NF,table
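
With the three 2NF tables in place, a useful sanity check is that joining them back together reproduces the 1NF rows, so no information was lost in the split. The sketch below rebuilds a small in-memory sample with Python's `sqlite3` module (province columns left out, and only three sample rows) rather than touching employees_db.db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# In-memory sample that mirrors the notebook's table names.
conn.executescript("""
CREATE TABLE employees_1NF (Employee_ID TEXT, Name TEXT, Job_Code TEXT, Job TEXT);
INSERT INTO employees_1NF VALUES
    ('E001', 'Alice', 'J01', 'Chef'),
    ('E001', 'Alice', 'J02', 'Waiter'),
    ('E003', 'Leona', 'J01', 'Chef');

CREATE TABLE employees_2NF (Employee_ID TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE jobs_2NF (Job_Code TEXT PRIMARY KEY, Job TEXT);
CREATE TABLE employee_roles_2NF (Employee_ID TEXT, Job_Code TEXT);

INSERT INTO employees_2NF SELECT DISTINCT Employee_ID, Name FROM employees_1NF;
INSERT INTO jobs_2NF SELECT DISTINCT Job_Code, Job FROM employees_1NF;
INSERT INTO employee_roles_2NF SELECT DISTINCT Employee_ID, Job_Code FROM employees_1NF;
""")

# Join the three 2NF tables back into the 1NF shape.
rebuilt = conn.execute("""
    SELECT e.Employee_ID, e.Name, j.Job_Code, j.Job
    FROM employee_roles_2NF AS r
    JOIN employees_2NF AS e ON e.Employee_ID = r.Employee_ID
    JOIN jobs_2NF AS j ON j.Job_Code = r.Job_Code
    ORDER BY e.Employee_ID, j.Job_Code
""").fetchall()

original = conn.execute("""
    SELECT Employee_ID, Name, Job_Code, Job
    FROM employees_1NF
    ORDER BY Employee_ID, Job_Code
""").fetchall()

print(rebuilt == original)
```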


# 3NF

For a table to be in 3NF, it needs to already be in 2NF, and should have no transitive dependencies.

Try the following yourself! Have a look at the steps taken getting into the previous NFs and follow the process of creating the table, then populating.

![](3NF.png)

### Create and populate the province_info_3NF table

In [None]:
%%sql

DROP TABLE IF EXISTS "province_info_3NF";

CREATE TABLE "province_info_3NF" (
    ...,
    ..., 
    PRIMARY KEY (...)
);

In [None]:
%%sql
INSERT INTO ... (...)
SELECT DISTINCT 
    ...,
    ...
FROM ...

In [None]:
%%sql
SELECT *
FROM province_info_3NF

### Create and populate the employees_3NF table

In [None]:
%%sql

DROP ...;

CREATE TABLE "employees_3NF" (
    ...,
    ...,
    ....
    PRIMARY KEY ...
    FOREIGN KEY ...
);

In [None]:
%%sql
INSERT INTO ...
SELECT DISTINCT 
    ...
FROM ...

In [None]:
%%sql
SELECT *
FROM employees_3NF

### What about the jobs and employee_roles tables?

These two tables we created in 2NF are already in 3NF. However, we can create new tables so that they align with our 3NF naming. *Alternatively, we could also just rename those two tables to align with the naming!*

### Create and populate the jobs_3NF and employee_roles_3NF tables

In [None]:
%%sql
DROP ...;
DROP ...;

CREATE TABLE ...;

CREATE TABLE ...;

In [None]:
%%sql

DELETE FROM ...;
DELETE FROM ...;

INSERT INTO ...
SELECT DISTINCT 
    ...
FROM ...;

INSERT INTO ...
SELECT DISTINCT 
    ...
FROM ...;

In [None]:
%%sql

SELECT * 
FROM jobs_3NF

In [None]:
%%sql

SELECT * 
FROM employee_roles_3NF

## Now that everything is in 3NF...

Now let's finally view all of the tables we have. We should have all the tables we need, and we could even go ahead and drop those we no longer need, so that all the data we keep is in 3NF.

In [None]:
%%sql

SELECT name, type FROM sqlite_master WHERE type IN ('table','view') AND name NOT LIKE 'sqlite_%' ORDER BY 1

## Exercises

1. Our one intern Alex has completed their internship with us, and has moved on to new adventures. How would we drop their record from the table?

2. Our restaurant/bar/cafe has had a sudden influx of coffee-lovers over the past few months! The CEO has decided that Bartenders are to train as Baristas as well, and wants to update their title to 'Bartender/Barista' (don't worry, their salary is being updated too...) How would we update this to reflect in our database?

3. The CEO has also decided that we should start storing employees' phone numbers and emails in our database as well. Where would you store this, and how would you add it to the database?

4. The restaurant is planning to start keeping record of all sales within this database from now on. Create a new 'Sales' table, that has the columns Employee_ID, Date, Time, Item_Count, and Total_Price. We only need to create the structure for now in preparation for the incoming data.

5. We belatedly realise that *Mpumalanga* is misspelt in our database! How would you change this to the correct spelling?

# On to the Predict...