# S01 Relational Databases

For a simple tutorial on database design, see [Introduction to Database Design](https://www.datanamic.com/support/lt-dez005-introduction-db-modeling.html)

For a deep dive, see [Database Design for Mere Mortals](https://www.amazon.com/Database-Design-Mere-Mortals-Hands/dp/0321884493/ref=dp_ob_title_bk)

## 0. Packages for working with relational databases in Python

- [Python Database API Specification v2.0](https://www.python.org/dev/peps/pep-0249/) - The standard Python Database API
- [sqlite3](https://docs.python.org/3.7/library/sqlite3.html) - API for the built-in `sqlite3` package
- [Database drivers](https://github.com/vinta/awesome-python#database-drivers) - For connecting to other databases
- [ipython-sql](https://github.com/catherinedevlin/ipython-sql) - SQL magic in Jupyter
- [SQLAlchemy](https://www.sqlalchemy.org) - Most well-known Object Relational Mapper (ORM)
- [Pony ORM](https://ponyorm.com) - Alternative ORM

-----

## 1. Motivation

Why relational databases and SQL?

- History of databases
- ACID
- Data integrity
- Schema

### Notes: [Motivation] History of databases

- flat files
    - rows & columns
    - like R & Python data frame
- hierarchical databases
    - tree instead of flat files
    - HDF file format (.hdf5, .h5)
        - MATLAB stores its files in this format
        - NetCDF
- network databases (now largely obsolete)
- Relational DB (proposed in 1970)
- [NoSQL](https://en.wikipedia.org/wiki/NoSQL)

### Notes: [Motivation] Relational database
- **store relations**
    - example person <-----> job, name, location, ......
    - [referential integrity](https://en.wikipedia.org/wiki/Referential_integrity)
        - person <-----> job, name, location
        - job    <-----> title, 
        - every row needs to be unique
        - when a value changes, you update it in one place and leave most other information untouched
            - primary key and foreign key
        - remove redundancy
- [**relational database management system (RDBMS)**](https://techterms.com/definition/rdbms)
    - memory
    - disk
    - dictionary
        - ex: column properties
    - query language (SQL)
        - most vendors share a common core and add their own extensions
        - => most of the syntax is the same across different RDBMSs
- Structure
    - Table   (Relation)
    - Columns (Attribute)
    - Rows (Tuple)
    - Note: 
        - the person who invented it was a mathematician; 
        - therefore, the terms are often from set theory

![Structure of relational DB](https://cdn.tutsplus.com/net/authors/lalith-polepeddi/relational-databases-for-dummies-fig1.png)

### Notes: [Motivation] ACID Properties
- Atomic 
    - a transaction (ex: insert, delete, etc)
        - Begin --- End => cannot go half way
- Consistent
    - every transaction leaves the database in a valid state, so all simultaneous queries -> same answer
- Isolated 
    - queries from different users do not interfere with one another
- Durable
    - store in disk
    - so if the power goes off, the data should not be lost
    
    
**Problem: relatively slow**  
During synchronization, clients are not allowed to run queries (to maintain consistency).

**Imagine: a bank transferring money**  
Imagine money is being transferred from bank A to bank B when the power goes off. If the ACID properties do not hold, bank A may have sent your money out while bank B never received it.
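
A minimal sketch of atomicity using the built-in `sqlite3` module. The `account` table, the `transfer` helper, and the amounts are hypothetical, and the "power failure" is simulated with an exception rather than a real crash.

```python
import sqlite3

# Hypothetical in-memory database holding two account balances.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 0)")
conn.commit()


def transfer(conn, amount, fail_midway=False):
    """Move `amount` from A to B as a single transaction."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if an exception escapes it.
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        if fail_midway:
            raise RuntimeError("simulated power failure")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))


try:
    transfer(conn, 50, fail_midway=True)
except RuntimeError:
    pass
# The debit from A was rolled back, so no money is lost.
after_failure = dict(conn.execute("SELECT name, balance FROM account"))

transfer(conn, 50)
after_success = dict(conn.execute("SELECT name, balance FROM account"))
print(after_failure, after_success)
```

Either both updates happen or neither does; the half-finished state is never visible after the rollback.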

### Notes: [Motivation] Schema

- Examples
    - Person
        - name **varchar(255)**
- schema makes the system rigid
    - constraints on type, range, ...

-----

## 2. RDBMS

- Memory
- Storage
- Dictionary
- Query language

## 3. Anatomy

### Table (Relation)

Represents a *subject* or an *event*.

### Column (Attribute)

Represents a single *variable* or *feature*.

### Row (Tuple)

Represents an *observation*.

### Notes: [Anatomy] table, column, row
- a table  -> 1 entity
- a column -> 1 variable
- a row    -> 1 observation

![](https://upload.wikimedia.org/wikipedia/commons/thumb/7/7c/Relational_database_terms.svg/350px-Relational_database_terms.svg.png)

## 4. Concepts

### Constraints

You can impose constraints on the values a column may take. For example, you can specify that values are compulsory (NOT NULL), must be UNIQUE, or must fall within a certain range.

#### Notes of [Concepts] Constraints
- Constraints
    - example: SSN
        - UNIQUE (most of the time)
        - not null
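
A small sketch of how these constraints behave in SQLite, using the built-in `sqlite3` module. The table, the SSN values, and the age range are made up for illustration; the range check is written with a `CHECK` clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: ssn is compulsory and unique, age must lie in a range.
conn.execute("""
    CREATE TABLE person (
        ssn TEXT NOT NULL UNIQUE,
        age INTEGER CHECK (age BETWEEN 0 AND 150)
    )
""")
conn.execute("INSERT INTO person VALUES ('123-45-6789', 30)")

bad_rows = [
    (None, 40),            # violates NOT NULL
    ("123-45-6789", 50),   # violates UNIQUE
    ("987-65-4321", -5),   # violates CHECK
]
rejected = []
for row in bad_rows:
    try:
        conn.execute("INSERT INTO person VALUES (?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))

for message in rejected:
    print(message)
row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

Each bad row is refused by the database itself, so only the first, valid row is stored.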

### Referential integrity

- Primary key represents a unique identifier of a row. It may be simple or composite.
  - Unique
  - Non-null
  - Never optional
- Foreign key is a column containing the primary key of a different table. It enforces *referential integrity*.

#### Notes of [Concepts] Referential integrity
- Keys
    - a column (or set of columns) that uniquely identifies a row
    - **primary keys**
        - *unique* and *not null*
        - example: country code
            - first and last names are not guaranteed to be unique, so they cannot serve as keys
            - => use an artificial id instead
        - composite primary key
            - a combination of columns that together is unique for each row
            - used as primary key
    - **foreign key**
        - primary key of different table
        - this is how you link tables together
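
A minimal sketch of a foreign key being enforced, assuming SQLite through the built-in `sqlite3` module. The table contents are illustrative. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute("CREATE TABLE country (country_id TEXT PRIMARY KEY, country_name TEXT)")
conn.execute("""
    CREATE TABLE person (
        person_id  INTEGER PRIMARY KEY,
        name       TEXT,
        country_id TEXT NOT NULL REFERENCES country(country_id)
    )
""")
conn.execute("INSERT INTO country VALUES ('FR', 'France')")
# 'FR' exists in country, so this row is accepted.
conn.execute("INSERT INTO person (name, country_id) VALUES ('Marie', 'FR')")

try:
    # 'XX' is not a known country, so the database refuses the row.
    conn.execute("INSERT INTO person (name, country_id) VALUES ('Nobody', 'XX')")
    fk_error = None
except sqlite3.IntegrityError as exc:
    fk_error = str(exc)

print(fk_error)
person_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```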

![](https://i.ytimg.com/vi/x2udY8IBXQ4/maxresdefault.jpg)

### Relationships

- One to one
- One to many
- Many to many

- What happens on delete?
  - Restrict
  - Cascade

#### Notes of Relationships
- one-to-many
- one-to-one
    - three columns
        - person id (id) --- name --- salary
    - => for security reasons, split into two tables (confidentiality)
        - pid --- name
        - pid --- salary (create another one-to-one table)
- many-to-many
    - example: 
        - student --- multiple classes
        - class   --- multiple students
        - construction:
            - three tables:
                - student:
                    - pid --- name, gender, ......
                - class:
                    - cid --- ......
                - student-class (called a linker table)
                    - pid --- cid
            - if you delete a class from its table
                - either refuse (restrict) or cascade
                - only the administrator needs to worry about it
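
The student/class construction above can be sketched with `sqlite3` as follows. The sample names and ids are invented, and the choice of `CASCADE` for classes and `RESTRICT` for students is only one possible policy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE student (pid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class   (cid INTEGER PRIMARY KEY, title TEXT);
    -- Linker table: one row per (student, class) pair.
    CREATE TABLE student_class (
        pid INTEGER REFERENCES student(pid) ON DELETE RESTRICT,
        cid INTEGER REFERENCES class(cid)   ON DELETE CASCADE,
        PRIMARY KEY (pid, cid)
    );
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO class   VALUES (10, 'Math'), (20, 'Latin');
    INSERT INTO student_class VALUES (1, 10), (1, 20), (2, 10);
""")

# Cascade: deleting a class also deletes its enrolment rows.
conn.execute("DELETE FROM class WHERE cid = 10")
links = conn.execute("SELECT pid, cid FROM student_class ORDER BY pid, cid").fetchall()

# Restrict: a student who is still enrolled cannot be deleted.
try:
    conn.execute("DELETE FROM student WHERE pid = 1")
    restricted = False
except sqlite3.IntegrityError:
    restricted = True

print(links, restricted)
```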

### Indexes

An index is a data structure that allows fast search of a column (typically from linear to log time complexity). Most databases will automatically build an index for every primary key column, but you can also manually specify columns to build indexes for. 

#### Notes of [Concepts] Indexes
- indexes
    - a kind of data structure
    - Why do we need a data structure for a table?
        - imagine you want to search for an item ---> too slow if searching line by line
        - hash table
        - tree
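
A rough way to see an index at work in SQLite is `EXPLAIN QUERY PLAN`. The sketch below uses a hypothetical `person` table and index name. The exact wording of the plan varies between SQLite versions, so it is printed rather than quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, last_name TEXT)")
conn.executemany(
    "INSERT INTO person (last_name) VALUES (?)",
    [(f"name{i}",) for i in range(1000)],
)

query = "EXPLAIN QUERY PLAN SELECT * FROM person WHERE last_name = 'name500'"

# Without an index, SQLite has to scan the whole table.
plan_before = conn.execute(query).fetchall()[0][-1]

# With an index on last_name, it can search the index instead.
conn.execute("CREATE INDEX idx_person_last_name ON person (last_name)")
plan_after = conn.execute(query).fetchall()[0][-1]

print(plan_before)
print(plan_after)
```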

### Views

- Temporary virtual table returned as a result of a *query*.
- Views only specify the structure of a table - the contents are constructed on the fly from existing tables.
- Queries return a Result Set

#### Notes of [Concepts] Views
- view
    - saved query
    - a virtual table that combines a set of tables together
    - denormalized table
        - allows duplicated values (which the underlying tables store as foreign keys)  
    - example
        - `CREATE VIEW foo AS SELECT blah`
        - => now foo is a pseudo-table represented by a SQL statement
        - => every time you query foo, it runs that SQL statement again
        - Note: 
            - in some situations you might not want a view, since the join is repeated on every query
            - instead of creating a view, you can create a new denormalized table
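
A minimal sketch of a view as a saved query, using `sqlite3` with made-up tables and rows. Because the view stores only the `SELECT` statement, a row inserted after the view was created still shows up when the view is queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (country_id TEXT PRIMARY KEY, country_name TEXT);
    CREATE TABLE person  (person_id INTEGER PRIMARY KEY, name TEXT, country_id TEXT);
    INSERT INTO country VALUES ('FR', 'France');
    INSERT INTO person (name, country_id) VALUES ('Marie', 'FR');

    -- The view stores the query, not its result.
    CREATE VIEW person_country AS
        SELECT name, country_name
        FROM person JOIN country USING (country_id);
""")

before = conn.execute("SELECT * FROM person_country").fetchall()
conn.execute("INSERT INTO person (name, country_id) VALUES ('Louis', 'FR')")
# The join is re-run here, so the new person appears.
after = conn.execute("SELECT * FROM person_country ORDER BY name").fetchall()
print(before, after)
```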

## 5. Design

### Columns

- Use singular form for name 
- Use informative names
- Use unique names not shared by any other table (except foreign keys)
- Column must be an attribute of the table's subject
- Eliminate multi-part columns
- Eliminate multi-value columns
- Eliminate redundant columns

#### Notes of [Design] Columns
- Use singular form for name 
    - (X) first-names
    - (O) first-name
- Use informative names
    - do not use abbreviations
- Use unique names not shared by any other table (except foreign keys)
    - for example, *person-id, person-first, person-last, person-age*
- Column must be an attribute of the table's subject
- Eliminate multi-part columns
    - example: a full name packs first and last name into one column
- Eliminate multi-value columns
    - example: student --- classes as column is a mess
        - math, latin
        - math, bio,   geo
        - math, music, history, 
- Eliminate redundant columns
    - example: don't store bmi when you have height and weight
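
As a small illustration of the last point, a derived value such as BMI can be computed in the query instead of being stored. The table is hypothetical, and the sketch assumes height in metres and weight in kilograms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY, height REAL, weight REAL)"
)
conn.execute("INSERT INTO person (height, weight) VALUES (1.72, 78)")

# BMI is derived on demand, so it can never disagree with height and weight.
bmi = conn.execute(
    "SELECT weight / (height * height) AS bmi FROM person"
).fetchone()[0]
print(round(bmi, 1))
```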

### Tables

- Use singular/plural forms for name (controversial)
- Ensure every table has a primary key
- Eliminate duplicate columns

### Relationships

- Establish participation type and degree of relationship
  - One to one
  - One to many
  - Many to many

## 6. Example

Use the `%sql` magic (from `ipython-sql`) as an alternative to using the `sqlite3` driver directly.

In [1]:
%load_ext sql

Connect to SQLite3 database on disk (creates it if it does not exist)

In [2]:
%sql sqlite:///data/dummy.db

'Connected: @data/dummy.db'

SQL for table deletion and creation

In [3]:
%%sql

DROP TABLE IF EXISTS Country;
DROP TABLE IF EXISTS Person;

CREATE TABLE Country (
    country_id varchar(2) PRIMARY KEY,
    country_name varchar(255)
);

CREATE TABLE Person (
    person_id INTEGER PRIMARY KEY,
    person_first varchar(255),
    person_last varchar(255),
    country_id INTEGER NOT NULL,
      FOREIGN KEY (country_id) REFERENCES Country(country_id)
);

 * sqlite:///data/dummy.db
Done.
Done.
Done.
Done.


[]

SQL to insert rows.

In [4]:
%%sql

INSERT INTO Country(country_id, country_name) 
VALUES ('FR', 'France'), ('CU', 'CUBA');

 * sqlite:///data/dummy.db
2 rows affected.


[]

In [5]:
%%sql

INSERT INTO Person(person_first, person_last, country_id) 
VALUES 
('Napolean', 'Bonaparte', 'FR'),
('Luis','Alvarez', 'CU');

 * sqlite:///data/dummy.db
2 rows affected.


[]

Accessing the RDBMS dictionary.

In [6]:
%%sql

SELECT name FROM sqlite_master 
WHERE type = 'table';

 * sqlite:///data/dummy.db
Done.


name
Country
Person


In [7]:
%%sql

SELECT sql FROM sqlite_master 
WHERE name='Person';

 * sqlite:///data/dummy.db
Done.


sql
"CREATE TABLE Person (  person_id INTEGER PRIMARY KEY,  person_first varchar(255),  person_last varchar(255),  country_id INTEGER NOT NULL,  FOREIGN KEY (country_id) REFERENCES Country(country_id) )"


SQL as a Query Language.

In [8]:
%%sql

SELECT person_first AS first, person_last AS last, country_name AS nationality
FROM Person 
INNER JOIN Country 
ON Person.country_id = Country.country_id;

 * sqlite:///data/dummy.db
Done.


first,last,nationality
Napolean,Bonaparte,France
Luis,Alvarez,CUBA
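
For comparison, a sketch of the same create / insert / join steps using the built-in `sqlite3` driver instead of the `%sql` magic. It makes two assumptions: an in-memory database stands in for `data/dummy.db`, and `Person.country_id` is declared as `varchar(2)` so that it matches the key in `Country`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for data/dummy.db
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE Country (
        country_id varchar(2) PRIMARY KEY,
        country_name varchar(255)
    );
    CREATE TABLE Person (
        person_id INTEGER PRIMARY KEY,
        person_first varchar(255),
        person_last varchar(255),
        country_id varchar(2) NOT NULL,
        FOREIGN KEY (country_id) REFERENCES Country(country_id)
    );
""")

# Parameterized inserts: the driver fills in each ? placeholder.
cur.executemany(
    "INSERT INTO Country (country_id, country_name) VALUES (?, ?)",
    [("FR", "France"), ("CU", "CUBA")],
)
cur.executemany(
    "INSERT INTO Person (person_first, person_last, country_id) VALUES (?, ?, ?)",
    [("Napolean", "Bonaparte", "FR"), ("Luis", "Alvarez", "CU")],
)
conn.commit()

rows = cur.execute("""
    SELECT person_first AS first, person_last AS last, country_name AS nationality
    FROM Person
    INNER JOIN Country ON Person.country_id = Country.country_id
    ORDER BY person_id
""").fetchall()
print(rows)
```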


Visualizing the entity-relationship diagram (ERD).

In [10]:
import os
from eralchemy import render_er

if not os.path.exists('erd_from_sqlalchemy.png'):
    render_er('sqlite:///data/dummy.db', 'erd_from_sqlalchemy.png')

![](erd_from_sqlalchemy.png)

## Homework walk-through

Convert the flat file data in `data/flat.csv` into a well-structured relational database in SQLite3 stored as `data/faculty.db`. Note - salary information is confidential and should be kept in a separate table from other personal data.

In [11]:
import pandas as pd

In [12]:
flat = pd.read_csv('data/flat.csv', keep_default_na=False)
flat.sample(10)

Unnamed: 0,name,gender,age,height,weight,salary,nationality,code,country,language1,language2,language3,first,last
1137,Ping O'neil,Female,52,1.72,78,64000,Mexican,ME,Mexico,Elixir,Assembly,,Ping,O'neil
212,Celena Carney,Female,43,1.87,76,77000,Bolivian,BO,Bolivia,Clojure,Elixir,,Celena,Carney
1324,Solomon Merrill,Male,66,1.88,47,97000,Spanish,SP,Spain,,,,Solomon,Merrill
1399,Timothy Duffy,Female,19,1.52,57,72000,Romanian,RO,Romania,,,,Timothy,Duffy
923,Mahalia Clark,Female,30,1.77,63,119000,Iranian,IR,Iran,,,,Mahalia,Clark
340,Denny Harris,Female,31,1.86,84,115000,Taiwanese,TW,Taiwan,Groovy,,,Denny,Harris
189,Carie Ray,Female,22,1.85,71,136000,Mexican,ME,Mexico,Dylan,Ruby,Objective-C,Carie,Ray
659,Jarrod Hall,Male,53,1.75,81,143000,Irish,IE,Ireland,Prolog,Lua,Dylan,Jarrod,Hall
734,Karl Sandoval,Female,20,1.79,41,103000,Brazilian,BR,Brazil,,,,Karl,Sandoval
946,Marcus Nieves,Male,36,1.79,66,90000,Australian,AU,Australia,Io,Bash,,Marcus,Nieves


Brainstorming: How do we create tables?
- person
- language
- country
- confidential (salary) (one-to-one table)
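
One possible way to turn this brainstorm into tables, offered as a sketch and not as the official solution. It uses three rows copied from the sample above instead of reading `data/flat.csv`, and an in-memory database instead of `data/faculty.db`. The table and column names, and the `person_language` linker table, are assumptions.

```python
import sqlite3

columns = ["first", "last", "gender", "age", "height", "weight", "salary",
           "nationality", "code", "country", "language1", "language2", "language3"]
sample = [
    ("Carie", "Ray", "Female", 22, 1.85, 71, 136000,
     "Mexican", "ME", "Mexico", "Dylan", "Ruby", "Objective-C"),
    ("Jarrod", "Hall", "Male", 53, 1.75, 81, 143000,
     "Irish", "IE", "Ireland", "Prolog", "Lua", "Dylan"),
    ("Solomon", "Merrill", "Male", 66, 1.88, 47, 97000,
     "Spanish", "SP", "Spain", "", "", ""),
]
flat_rows = [dict(zip(columns, values)) for values in sample]

conn = sqlite3.connect(":memory:")  # the homework target is data/faculty.db
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE country (code TEXT PRIMARY KEY, country TEXT, nationality TEXT);
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY, first TEXT, last TEXT, gender TEXT,
        age INTEGER, height REAL, weight REAL,
        code TEXT REFERENCES country(code)
    );
    -- One-to-one with person: salary is kept apart for confidentiality.
    CREATE TABLE confidential (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        salary INTEGER
    );
    CREATE TABLE language (language_id INTEGER PRIMARY KEY, language_name TEXT UNIQUE);
    -- Linker table for the many-to-many person/language relationship.
    CREATE TABLE person_language (
        person_id INTEGER REFERENCES person(person_id),
        language_id INTEGER REFERENCES language(language_id),
        PRIMARY KEY (person_id, language_id)
    );
""")

for r in flat_rows:
    conn.execute("INSERT OR IGNORE INTO country VALUES (?, ?, ?)",
                 (r["code"], r["country"], r["nationality"]))
    pid = conn.execute(
        "INSERT INTO person (first, last, gender, age, height, weight, code) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (r["first"], r["last"], r["gender"], r["age"],
         r["height"], r["weight"], r["code"]),
    ).lastrowid
    conn.execute("INSERT INTO confidential VALUES (?, ?)", (pid, r["salary"]))
    for key in ("language1", "language2", "language3"):
        if r[key]:  # an empty string means the slot is unused
            conn.execute("INSERT OR IGNORE INTO language (language_name) VALUES (?)",
                         (r[key],))
            lid = conn.execute("SELECT language_id FROM language WHERE language_name = ?",
                               (r[key],)).fetchone()[0]
            conn.execute("INSERT INTO person_language VALUES (?, ?)", (pid, lid))
conn.commit()

n_languages = conn.execute("SELECT COUNT(*) FROM language").fetchone()[0]
n_links = conn.execute("SELECT COUNT(*) FROM person_language").fetchone()[0]
print(n_languages, n_links)
```

The three `language1`..`language3` columns collapse into one `language` table plus a linker table, so a language shared by two people (here `Dylan`) is stored only once.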