# Views & Introduction to Normalisation

### In today's webinar:

- Views - what are they and why do we use them?
- Views vs CTEs
- View types and examples
- What is normalisation?
- Normal forms and examples

---

# Views
### What are they?

Views are a kind of 'virtual table'. Imagine you have a huge database, with many, many tables in it that hold all kinds of different information. Maybe this data even ranges over several decades. This is a lot of information, a lot of which may not even be important to us or the task we are trying to complete, and sometimes querying it can be complicated.

Views allow us to create a saved 'window' onto the data, showing just the parts that we need, giving us something reusable and easier to query.

Views do not store their results - the resulting table from our view is not what is stored, but rather the query that is used to create the view. That query is saved in the database under the view's name and is re-run each time we select from the view, so a view takes up almost no storage space and always reflects the current data in the underlying tables.
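
To see that a view holds a query rather than a copy of the data, here is a minimal sketch using Python's built-in `sqlite3` module instead of the `%%sql` magic, so that it runs on its own. The table `tracks_demo` and the view `long_tracks` are made-up names for illustration and are not part of chinook.db.

```python
import sqlite3

# In-memory database with a small hypothetical table (not part of chinook.db).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks_demo (name TEXT, seconds INTEGER)")
conn.executemany(
    "INSERT INTO tracks_demo VALUES (?, ?)",
    [("Intro", 60), ("Long Jam", 600)],
)

# The view saves only this SELECT statement, not its result rows.
conn.execute(
    "CREATE VIEW long_tracks AS "
    "SELECT name FROM tracks_demo WHERE seconds > 300"
)
before = conn.execute("SELECT name FROM long_tracks ORDER BY name").fetchall()

# Add a row to the underlying table; the view itself is not touched.
conn.execute("INSERT INTO tracks_demo VALUES ('Epic Finale', 900)")
after = conn.execute("SELECT name FROM long_tracks ORDER BY name").fetchall()

# The saved definition can be read back from the database catalogue.
definition = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'view' AND name = 'long_tracks'"
).fetchone()[0]

print(before)      # [('Long Jam',)]
print(after)       # [('Epic Finale',), ('Long Jam',)]
print(definition)  # the CREATE VIEW statement text
```

The new row shows up in the view without the view being recreated, because the saved query is evaluated each time the view is selected from.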


### Why do we use them?
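- Reduce complexity: a long or complicated query can be saved once and then queried like a simple table
- Security reasons: users can be given access to a view that exposes only certain columns or rows, rather than to the full underlying tables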

### Views vs CTEs
- We looked last webinar at CTEs. They seem pretty similar...but what's the difference?
- Both CTEs and views can be used to simplify and organise our queries
- CTEs are **temporary results** - we create them in a query, and they are valid for that query only. We can't reuse them in the next query.
- Views, on the other hand, **can** be reused in subsequent queries. The view query (not the resulting table) is stored in the database.
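
The difference in lifetime can be sketched as follows, again assuming Python's built-in `sqlite3` module and an in-memory database. The names `sales_demo`, `north` and `north_view` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_demo (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales_demo VALUES (?, ?)",
    [("North", 10), ("North", 5), ("South", 7)],
)

# A CTE exists only inside the statement that defines it.
cte_total = conn.execute(
    "WITH north AS (SELECT amount FROM sales_demo WHERE region = 'North') "
    "SELECT SUM(amount) FROM north"
).fetchone()[0]

# Referring to the CTE name in a later statement fails: it is gone.
try:
    conn.execute("SELECT * FROM north")
    cte_reusable = True
except sqlite3.OperationalError:
    cte_reusable = False

# A view is saved under its name and can be queried again and again.
conn.execute(
    "CREATE VIEW north_view AS "
    "SELECT amount FROM sales_demo WHERE region = 'North'"
)
view_total = conn.execute("SELECT SUM(amount) FROM north_view").fetchone()[0]
view_count = conn.execute("SELECT COUNT(*) FROM north_view").fetchone()[0]

print(cte_total, cte_reusable)  # 15 False
print(view_total, view_count)   # 15 2
```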

### Basic view query structure

``` sql
CREATE VIEW ViewName AS
SELECT columnnames
FROM tablename;
```

### Types of view

#### 1. Look Up View
- Used to select certain columns from a single table. 
    - but we can always do this with a simple SELECT-FROM query, what's the point?
    - You might have a table that has tens of columns, and you need a subset of, say, 15 - typing out all 15 columns every single time is inefficient, and a lookup view makes this process a lot easier
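
A lookup view might look like the sketch below. It assumes Python's built-in `sqlite3` module and a hypothetical wide table called `customers_demo`; the view `customer_contacts` and the sample row are also invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A hypothetical table with more columns than we usually need.
conn.execute(
    "CREATE TABLE customers_demo ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, "
    "company TEXT, address TEXT, city TEXT, country TEXT, "
    "phone TEXT, fax TEXT, email TEXT)"
)
conn.execute(
    "INSERT INTO customers_demo VALUES "
    "(1, 'Thandi', 'Nkosi', NULL, '1 Main Rd', 'Cape Town', "
    "'South Africa', '021 555 0100', NULL, 'thandi@example.com')"
)

# Lookup view: name the columns we care about once...
conn.execute(
    "CREATE VIEW customer_contacts AS "
    "SELECT first_name, last_name, email FROM customers_demo"
)

# ...and from then on a short SELECT * is enough.
cursor = conn.execute("SELECT * FROM customer_contacts")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(columns)  # ['first_name', 'last_name', 'email']
print(rows)     # [('Thandi', 'Nkosi', 'thandi@example.com')]
```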


#### 2. Join View
- Used to display selected columns from multiple tables at once, using JOIN statements

#### 3. Aggregating View
- Used to select certain columns from a table, but also makes use of summary metrics (average, sum, max/min, etc.) to create an additional column/columns in the resulting table that aggregates the data as needed.
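
As a rough illustration of an aggregating view, the sketch below uses Python's built-in `sqlite3` module with a hypothetical `invoices_demo` table; the view name `customer_totals` and the figures are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices_demo (customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO invoices_demo VALUES (?, ?)",
    [("Ann", 10.0), ("Ann", 5.0), ("Ben", 8.0)],
)

# Aggregating view: one row per customer, with summary columns added.
conn.execute(
    "CREATE VIEW customer_totals AS "
    "SELECT customer, "
    "       COUNT(*)   AS invoice_count, "
    "       SUM(total) AS total_spent, "
    "       AVG(total) AS average_invoice "
    "FROM invoices_demo "
    "GROUP BY customer"
)

summary = conn.execute(
    "SELECT * FROM customer_totals ORDER BY customer"
).fetchall()

print(summary)  # [('Ann', 2, 15.0, 7.5), ('Ben', 1, 8.0, 8.0)]
```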


### I made a view, but I made it wrong! What do I do?
- Not to worry! If you made your view incorrectly, you can use the `DROP` statement to drop a view
- If you need to edit your view, make sure to `DROP` it before re-running its creation, or you'll run into an error

```sql
DROP VIEW viewname;
```
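
The drop-then-recreate routine can be sketched like this, assuming Python's built-in `sqlite3` module and hypothetical names (`albums_demo`, `album_view`). It uses `DROP VIEW IF EXISTS`, which SQLite supports and which does not fail if the view is not there yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE albums_demo (title TEXT, year INTEGER)")
conn.execute("INSERT INTO albums_demo VALUES ('Big Ones', 1994)")

# First attempt at the view: we forgot the year column.
conn.execute("CREATE VIEW album_view AS SELECT title FROM albums_demo")

# Re-running CREATE VIEW under the same name raises an error.
try:
    conn.execute("CREATE VIEW album_view AS SELECT title, year FROM albums_demo")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# Drop the old definition, then create the corrected one.
conn.execute("DROP VIEW IF EXISTS album_view")
conn.execute("CREATE VIEW album_view AS SELECT title, year FROM albums_demo")
rows = conn.execute("SELECT * FROM album_view").fetchall()

print(error_message)  # an "... already exists" message
print(rows)           # [('Big Ones', 1994)]
```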

In [16]:
#Load the SQL extension to use magic commands.
%load_ext sql



In [17]:
# Load SQLite database
%sql sqlite:///chinook.db

In [19]:
%%sql

SELECT name FROM sqlite_master WHERE type IN ('table','view') AND name NOT LIKE 'sqlite_%' ORDER BY 1

 * sqlite:///chinook.db
Done.


name
albums
artists
customers
employees
genres
invoice_items
invoices
media_types
playlist_track
playlists


In [24]:
%%sql
CREATE VIEW Albums_View AS
SELECT Title AS Album_Title, Name AS Artist
FROM Albums
INNER JOIN Artists
    ON Albums.ArtistId = Artists.ArtistId

 * sqlite:///chinook.db
Done.


[]

In [25]:
%%sql
SELECT *
FROM Albums_View
LIMIT 5;

 * sqlite:///chinook.db
Done.


| Album_Title | Artist |
| --- | --- |
| For Those About To Rock We Salute You | AC/DC |
| Balls to the Wall | Accept |
| Restless and Wild | Accept |
| Let There Be Rock | AC/DC |
| Big Ones | Aerosmith |


In [23]:
%%sql
DROP VIEW Albums_View

 * sqlite:///chinook.db
Done.


[]

In [27]:
%%sql

SELECT name, type FROM sqlite_master WHERE type IN ('table','view') AND name NOT LIKE 'sqlite_%' ORDER BY 1

 * sqlite:///chinook.db
Done.


| name | type |
| --- | --- |
| Albums_View | view |
| albums | table |
| artists | table |
| customers | table |
| employees | table |
| genres | table |
| invoice_items | table |
| invoices | table |
| media_types | table |
| playlist_track | table |


---

# Normalisation

- If you were working with a very large database, with many tables and an extensive amount of data, you'd want it to be organised in some way, right?
   - Normalisation is the process of doing this.
   - Normalisation is a technique used to reduce redundancies, improve data integrity, improve storage efficiency, and reduce the need to re-design the database if new data is introduced
   - It removes inconsistencies, and removes unnecessary duplication of data that takes up storage space
- Normalisation works up from Unnormalised > 1NF > 2NF > 3NF > BCNF > 4NF onwards
    - You have to meet the requirements of a particular normal form before moving up a level!

# Normal forms

[Example source](https://www.freecodecamp.org/news/database-normalization-1nf-2nf-3nf-table-examples/)

### Unnormalised database
This is the most basic form of a database. You might have a database that meets some requirements of the normal forms, but not all/not consistently. From an unnormalised database, we can work towards getting it into First Normal Form (1NF):

### 1NF
For a table to be in 1NF, it needs to have **no repeating groups**. Cells should not hold more than one piece of information. For example:

|EMPLOYEE_ID | NAME|	JOB_CODE|	JOB	|PROVINCE_CODE|	HOME_PROVINCE |
| --- | --- | --- | --- | --- | --- |
|E001|	Alice|	J01, J02	|Chef, Waiter|	26	|Gauteng|
|E002|	Bob	|J02, J03|	Waiter, Bartender|	56	|Western Cape|
|E003|	Alice|	J01|	Chef	|56|	Western Cape|



A table like the above has more than one piece of information in some cells of the 'Job Code' and 'Job' columns. In order to resolve this, we ensure that each cell has only one value:

| EMPLOYEE_ID|	NAME|	JOB_CODE|	JOB|	PROVINCE_CODE|	HOME_PROVINCE|
| --- | --- | --- | --- | --- | --- |
|E001|	Alice|	J01|	Chef|	26	|Gauteng|
|E001|	Alice|	J02|	Waiter|	26	|Gauteng|
|E002|	Bob|	J02|	Waiter|	56	|Western Cape|
|E002|	Bob|	J03|	Bartender|	56|	Western Cape|
|E003|	Alice|	J01|	Chef	|56|	Western Cape|
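
The step from the unnormalised table to 1NF can be sketched in plain Python. This is only an illustration: it assumes the job codes and job titles in each cell are listed in the same order, which holds for the sample rows above but is not guaranteed in real data.

```python
# Rows from the unnormalised example table above.
unnormalised = [
    ("E001", "Alice", "J01, J02", "Chef, Waiter", 26, "Gauteng"),
    ("E002", "Bob", "J02, J03", "Waiter, Bartender", 56, "Western Cape"),
    ("E003", "Alice", "J01", "Chef", 56, "Western Cape"),
]

first_normal_form = []
for emp_id, name, job_codes, jobs, province_code, province in unnormalised:
    # Split each multi-valued cell into its individual values.
    codes = [code.strip() for code in job_codes.split(",")]
    titles = [title.strip() for title in jobs.split(",")]
    # Emit one row per (job code, job title) pair.
    for code, title in zip(codes, titles):
        first_normal_form.append(
            (emp_id, name, code, title, province_code, province)
        )

for row in first_normal_form:
    print(row)
```

The three original rows become five rows, each holding a single job code and a single job title.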


### 2NF

A table is in 2NF if it contains no repeating groups (**i.e. is already in 1NF**) and no *partial functional dependencies*.

To understand partial functional dependencies, let's have a look at the following example:

| EMPLOYEE_ID|	NAME|	JOB_CODE|	JOB|	PROVINCE_CODE|	HOME_PROVINCE|
| --- | --- | --- | --- | --- | --- |
|E001|	Alice|	J01|	Chef|	26	|Gauteng|
|E001|	Alice|	J02|	Waiter|	26	|Gauteng|
|E002|	Bob|	J02|	Waiter|	56	|Western Cape|
|E002|	Bob|	J03|	Bartender|	56|	Western Cape|
|E003|	Alice|	J01|	Chef	|56|	Western Cape|


Here, neither Employee_ID nor Job_Code can uniquely identify a row on its own: E001 appears in two rows, and so does J01. Together, however, they form a *composite primary key*, because each combination of Employee_ID and Job_Code appears only once.

Columns like Name, Province_Code and Home_Province depend on Employee_ID, but not on Job_Code. Similarly, Job depends on Job_Code, but not on Employee_ID. When a non-key column depends on only part of a composite key like this, it is called a partial functional dependency. We can solve this by, again, decomposing the table into smaller tables to eliminate any partial functional dependencies:

**Employees table**

|EMPLOYEE_ID|	NAME|	PROVINCE_CODE|	HOME_PROVINCE|
| --- | --- | --- | --- |
|E001	|Alice	|26|	Gauteng|
|E002	|Bob	|56|	Western Cape|
|E003	|Alice	|56|	Western Cape|


**Jobs table**

|JOB_CODE	|JOB|
| --- | --- |
|J01	|Chef|
|J02	|Waiter|
|J03	|Bartender|


**Employee_roles table**

|EMPLOYEE_ID	|JOB_CODE|
| --- | --- |
|E001	|J01|
|E001	|J02|
|E002	|J02|
|E002	|J03|
|E003	|J01|


2NF aims to reduce data redundancies - multiple employees may be working the same role. If we had a large company with 100 sales reps and had not split out a Jobs table, we'd have to repeat the job title with every occurrence of the job code, which takes up unnecessary space and makes it easy for the copies to become inconsistent.
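
The three 2NF tables above could be declared roughly as follows. This is a sketch using Python's built-in `sqlite3` module; the lowercase table and column names and the choice of column types are assumptions made for the example. Joining the tables back together recovers the original five rows, so nothing is lost by splitting them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (
        employee_id   TEXT PRIMARY KEY,
        name          TEXT,
        province_code INTEGER,
        home_province TEXT
    );
    CREATE TABLE jobs (
        job_code TEXT PRIMARY KEY,
        job      TEXT
    );
    -- Linking table: the composite key is (employee_id, job_code).
    CREATE TABLE employee_roles (
        employee_id TEXT REFERENCES employees (employee_id),
        job_code    TEXT REFERENCES jobs (job_code),
        PRIMARY KEY (employee_id, job_code)
    );
    INSERT INTO employees VALUES
        ('E001', 'Alice', 26, 'Gauteng'),
        ('E002', 'Bob',   56, 'Western Cape'),
        ('E003', 'Alice', 56, 'Western Cape');
    INSERT INTO jobs VALUES
        ('J01', 'Chef'), ('J02', 'Waiter'), ('J03', 'Bartender');
    INSERT INTO employee_roles VALUES
        ('E001', 'J01'), ('E001', 'J02'),
        ('E002', 'J02'), ('E002', 'J03'),
        ('E003', 'J01');
    """
)

# Rebuild the 1NF table by joining the three smaller tables.
rebuilt = conn.execute(
    """
    SELECT e.employee_id, e.name, r.job_code, j.job,
           e.province_code, e.home_province
    FROM employee_roles AS r
    INNER JOIN employees AS e ON e.employee_id = r.employee_id
    INNER JOIN jobs AS j ON j.job_code = r.job_code
    ORDER BY e.employee_id, r.job_code
    """
).fetchall()

# Each job title is now stored exactly once.
job_title_rows = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]

for row in rebuilt:
    print(row)
print(job_title_rows)  # 3
```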


### 3NF

In order to get data into 3NF, it needs to already be in 2NF, and we need to ensure that there are no *transitive functional dependencies* - this means that every non-key column should be **directly** dependent on the primary key ONLY, and not on another non-key column. Let's have a look at the below table:

**Employees table**

|EMPLOYEE_ID|	NAME|	PROVINCE_CODE	|HOME_PROVINCE|
| --- | --- | --- | --- |
|E001|	Alice|	26	|Gauteng|
|E002|	Bob|	56	|Western Cape|
|E003|	Alice|	56	|Western Cape|

Here, the primary key is Employee_ID. The column Home_Province is *transitively dependent* on the primary key. This is because the employee's home province is related to the Employee_ID column *through* the Province_Code column. 

In order to get this into 3NF we can decompose the table into smaller tables:

**Employees table**

|EMPLOYEE_ID	|NAME|	PROVINCE_CODE|
| --- | --- | --- | 
|E001|	Alice|	26|
|E002|	Bob	|56|
|E003|	Alice	|56|


**Provinces table**

|PROVINCE_CODE	|HOME_PROVINCE|
| --- | --- |
|26	|Gauteng|
|56	|Western Cape|

3NF helps to improve data integrity by ensuring all non-key columns are dependent on ONLY the primary key, and also helps to reduce data redundancy further. 
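
One way to see the integrity benefit is the sketch below, which assumes Python's built-in `sqlite3` module and lowercase versions of the Employees and Provinces tables above. The changed province name is invented purely to show the effect of an update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE provinces (
        province_code INTEGER PRIMARY KEY,
        home_province TEXT
    );
    CREATE TABLE employees (
        employee_id   TEXT PRIMARY KEY,
        name          TEXT,
        province_code INTEGER REFERENCES provinces (province_code)
    );
    INSERT INTO provinces VALUES (26, 'Gauteng'), (56, 'Western Cape');
    INSERT INTO employees VALUES
        ('E001', 'Alice', 26), ('E002', 'Bob', 56), ('E003', 'Alice', 56);
    """
)

query = """
    SELECT e.employee_id, e.name, p.home_province
    FROM employees AS e
    INNER JOIN provinces AS p ON p.province_code = e.province_code
    ORDER BY e.employee_id
"""
before = conn.execute(query).fetchall()

# Changing a province name (hypothetical new value) touches one row only...
cursor = conn.execute(
    "UPDATE provinces SET home_province = 'Western Cape Province' "
    "WHERE province_code = 56"
)
rows_changed = cursor.rowcount

# ...yet every employee in that province sees the new name.
after = conn.execute(query).fetchall()

print(before)
print(rows_changed)  # 1
print(after)
```

In the single-table version, the same change would have to be made in every employee row for that province, and missing one would leave the data inconsistent.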

### BCNF, 4NF and further... [out of scope for this course!]

We can take our database further than 3NF, but 3NF is generally accepted as sufficient for most practical purposes, leaving our data in a state that is efficient to work with. Anything further than 3NF is out of scope for this course, but if you're curious, you can read up on [BCNF](https://www.geeksforgeeks.org/boyce-codd-normal-form-bcnf/) and [4NF](https://www.geeksforgeeks.org/introduction-of-4th-and-5th-normal-form-in-dbms/).

## Next time:
- Normalisation continued: looking at some code!
- Predict introduction and explanation