# Relational Databases and Normal Form

This is adapted from _MongoDB Applied Design Patterns_, Rick Copeland (O’Reilly). ISBN 978-1-449-34004-9.



In [None]:
from IPython.display import Image
import sqlite3 as sq

Image("http://imgs.xkcd.com/comics/exploits_of_a_mom.png")

By far the most common kind of database is the relational database, such as [Oracle](https://www.oracle.com/au/database/), [SQL Server](https://www.microsoft.com/en-gb/sql-server/), or [PostgreSQL](https://www.postgresql.org/).

Relational databases are guided by the concept of [__first normal form (1NF)__](https://en.wikipedia.org/wiki/First_normal_form).


>For our purposes, we can consider 1NF data to be any data that’s tabular (composed of rows and columns), with each row-column intersection (“cell”) containing __exactly one value.__ (*MongoDB Applied Design Patterns,* p 18)

## A Motivating Example

Consider a database that is designed to keep track of people, their phone numbers, and their zip codes (called ZIP codes in the USA and postcodes in Australia).




|id  | name  | phone\_number| zip code |
|----|:-----:|:-------------|---------:|
|1   | Rick  |555-111-1234  | 30062    |
|2   | Mike  | 555-222-2345 | 30062|
|3   | Jenny |555-333-3456  |01209|


This is straightforward enough, but what if...

...people have more than one phone number?

### Two Numbers?

- Cell number
- Home number

## What if people have more than one phone number?
### Is this first normal form?

|id  | name  | phone\_number| zip code |
|----|:-----:|:-------------|---------:|
|1   | Rick  |555-111-1234  | 30062    |
|2   | Mike  | 555-222-2345;555-212-2322 | 30062|
|3   | Jenny |555-333-3456;555-334-3411 |01209|
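One way to judge this layout is to try querying it. The following is a minimal sketch, assuming an in-memory SQLite database; the table name `people_packed` and the column name `zip_code` are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people_packed ("
    "id INTEGER PRIMARY KEY, name TEXT, phone_number TEXT, zip_code TEXT)"
)
conn.executemany(
    "INSERT INTO people_packed VALUES (?, ?, ?, ?)",
    [
        (1, "Rick", "555-111-1234", "30062"),
        (2, "Mike", "555-222-2345;555-212-2322", "30062"),
        (3, "Jenny", "555-333-3456;555-334-3411", "01209"),
    ],
)

number = "555-212-2322"

# An exact match finds nothing, because the cell holds two values.
exact = conn.execute(
    "SELECT name FROM people_packed WHERE phone_number = ?", (number,)
).fetchall()

# A substring search works, but it must scan the text of every cell.
fuzzy = conn.execute(
    "SELECT name FROM people_packed WHERE phone_number LIKE ?",
    ("%" + number + "%",),
).fetchall()

print(exact)  # []
print(fuzzy)  # [('Mike',)]
```

Because a cell can hold more than one value, the table is not in first normal form, and even a simple lookup has to fall back on string matching.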


### What do you think of this solution?

|id  | name  | phone\_number0|phone\_number1| zip code |
|----|:-----:|:-------------|:--------------|---------:|
|1   | Rick  |555-111-1234  |NULL | 30062    |
|2   | Mike  | 555-222-2345 |555-212-2322 | 30062|
|3   | Jenny |555-333-3456 | 555-334-3411 |01209|
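To weigh up this layout, it helps to see what a lookup and a schema change look like in practice. This is a rough sketch, assuming an in-memory SQLite database; the table name `people_wide` and the extra column `phone_number2` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people_wide ("
    "id INTEGER PRIMARY KEY, name TEXT, "
    "phone_number0 TEXT, phone_number1 TEXT, zip_code TEXT)"
)
conn.executemany(
    "INSERT INTO people_wide VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Rick", "555-111-1234", None, "30062"),  # None is stored as NULL
        (2, "Mike", "555-222-2345", "555-212-2322", "30062"),
        (3, "Jenny", "555-333-3456", "555-334-3411", "01209"),
    ],
)

number = "555-334-3411"

# Every lookup has to mention every phone column.
owner = conn.execute(
    "SELECT name FROM people_wide "
    "WHERE phone_number0 = ? OR phone_number1 = ?",
    (number, number),
).fetchall()
print(owner)  # [('Jenny',)]

# A third number for anyone means changing the schema for everyone.
conn.execute("ALTER TABLE people_wide ADD COLUMN phone_number2 TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(people_wide)")]
print(columns)
```

Each cell now holds a single value, but the number of phone numbers per person is fixed by the schema, and people with fewer numbers carry `NULL` columns.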



## Is this first normal form?

|id  | name  | phone\_number| zip code |
|----|:-----:|:-------------|---------:|
|1   | Rick  |555-111-1234  | 30062    |
|2   | Mike  | 555-222-2345 | 30062|
|2   | Mike  | 555-212-2322 | 30062|
|3   | Jenny |555-333-3456  |01209|
|3   | Jenny |555-334-3411  |01209|

### Any drawbacks?

## What is the cost of our solution?

We have introduced redundancy.

* Increased data storage
* **Opportunities for data inconsistency**
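The inconsistency risk can be illustrated with a small sketch, assuming an in-memory SQLite database. The table name `people_flat` and the new zip code `30075` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (person, phone number), so name and zip code repeat.
conn.execute(
    "CREATE TABLE people_flat ("
    "id INTEGER, name TEXT, phone_number TEXT, zip_code TEXT)"
)
conn.executemany(
    "INSERT INTO people_flat VALUES (?, ?, ?, ?)",
    [
        (1, "Rick", "555-111-1234", "30062"),
        (2, "Mike", "555-222-2345", "30062"),
        (2, "Mike", "555-212-2322", "30062"),
        (3, "Jenny", "555-333-3456", "01209"),
        (3, "Jenny", "555-334-3411", "01209"),
    ],
)

# Mike moves, but the update only touches one of his two rows.
# (30075 is a hypothetical new zip code.)
conn.execute(
    "UPDATE people_flat SET zip_code = ? WHERE phone_number = ?",
    ("30075", "555-222-2345"),
)

mike_zips = conn.execute(
    "SELECT DISTINCT zip_code FROM people_flat WHERE id = 2 ORDER BY zip_code"
).fetchall()
print(mike_zips)  # [('30062',), ('30075',)]
```

The database now holds two different zip codes for the same person, and nothing in the schema prevents it.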

## Here is the Normal Form Solution

|id  | name  | zip code |
|----|:-----:|---------:|
|1   | Rick  | 30062    |
|2   | Mike  |30062|
|3   | Jenny |01209|


|id  |phone\_number| 
|----|:-------------|
|1   |555-111-1234  | 
|2   |555-222-2345 | 
|2   |555-212-2322 | 
|3   |555-333-3456  |
|3   |555-334-3411  |
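The two tables above can be sketched in SQLite as follows. This is one possible layout, not the only one: the table names `people` and `phone_numbers`, the column name `person_id` (the `id` column of the second table), and the zip code `30075` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default

conn.execute(
    "CREATE TABLE people ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, zip_code TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE phone_numbers ("
    "person_id INTEGER NOT NULL REFERENCES people(id), "
    "phone_number TEXT NOT NULL)"
)

conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Rick", "30062"), (2, "Mike", "30062"), (3, "Jenny", "01209")],
)
conn.executemany(
    "INSERT INTO phone_numbers VALUES (?, ?)",
    [
        (1, "555-111-1234"),
        (2, "555-222-2345"),
        (2, "555-212-2322"),
        (3, "555-333-3456"),
        (3, "555-334-3411"),
    ],
)

# A join reassembles the one-row-per-number view when it is needed.
rows = conn.execute(
    "SELECT p.name, n.phone_number, p.zip_code "
    "FROM people AS p JOIN phone_numbers AS n ON n.person_id = p.id "
    "ORDER BY p.id, n.phone_number"
).fetchall()
for row in rows:
    print(row)

# Each person's zip code is stored once, so one update changes it everywhere.
conn.execute("UPDATE people SET zip_code = ? WHERE id = ?", ("30075", 2))
mike_zips = conn.execute(
    "SELECT DISTINCT p.zip_code "
    "FROM people AS p JOIN phone_numbers AS n ON n.person_id = p.id "
    "WHERE p.id = 2"
).fetchall()
print(mike_zips)  # [('30075',)]
```

A person can now have any number of phone numbers without a schema change, and the name and zip code are no longer repeated per number.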