# Relational Databases

## PLEASE READ TUTORIALS BEFORE CONTINUING
* [A tutorial](http://www.ntu.edu.sg/home/ehchua/programming/sql/relational_database_design.html)
* [Another tutorial](https://docs.oracle.com/javase/tutorial/jdbc/overview/database.html)

In [None]:
from IPython.display import YouTubeVideo

YouTubeVideo("fKyf9e7Xmi8")

## Review of Databases

## Databases
* Relational (SQL)
    * MySQL
    * PostgreSQL
    * Oracle
    * Access
    * SQLite
* NoSQL
    * [ZODB (Zope Database)](http://www.zodb.org/en/latest/)
    * [MongoDB](https://api.mongodb.org/python/current/)
    * [Cassandra](https://github.com/datastax/python-driver)
    * And more
    

## Relational Databases
* What is a Relational Database?
>The relational model's central idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values. The content of the database at any given time is a finite (logical) model of the database, i.e. a set of relations, one per predicate variable, such that all predicates are satisfied. A request for information from the database (a database query) is also a predicate. ([Wikipedia, "Relational Model"](https://en.wikipedia.org/wiki/Relational_model))
>
>The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n domains. In the mathematical model, reasoning about such data is done in two-valued predicate logic, meaning there are two possible evaluations for each proposition: either true or false (and in particular no third value such as unknown, or not applicable, either of which are often associated with the concept of NULL). Data are operated upon by means of a relational calculus or relational algebra, these being equivalent in expressive power. ([Wikipedia, "Relational Model"](https://en.wikipedia.org/wiki/Relational_model))

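The definition quoted above can be made concrete with a small sketch. The domains and the `works_in` relation below are hypothetical, chosen only to illustrate "a relation is a subset of a Cartesian product" and "a query is a predicate".

```python
from itertools import product

# Two small, hypothetical domains (assumptions for illustration only).
names = {"alice", "bob"}
departments = {"sales", "research"}

# The Cartesian product holds every possible (name, department) pair.
all_pairs = set(product(names, departments))

# A relation is any subset of that product: the tuples for which the
# predicate "this person works in this department" is true.
works_in = {("alice", "sales"), ("bob", "research")}
assert works_in <= all_pairs

# A query is also a predicate; here, "who works in research?"
research_staff = {name for (name, dept) in works_in if dept == "research"}
print(research_staff)
```

A real database adds storage, indexing, and NULL handling on top of this idea, so treat the sketch as a mental model rather than as how any engine is implemented.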
## Why Are Relational Databases Popular?
* Because they are based on first-order logic
    * This makes it possible to define a query language precisely
* SQL
    * First standard published in 1986

## Basic SQL Concepts
* **Tables (CREATE TABLE):**
>Before you can do anything, you have to understand tables. If you don't have a table, you have nothing to work on. The table is the standard unit of information in a relational database. Everything revolves around tables. Tables are composed of rows and columns. And while that sounds simple, the sad truth is that tables are not simple. (*The Definitive Guide to SQLite*)

* **Modifying Tables (INSERT, ALTER, DELETE)**
* **Querying Tables (SELECT):**
>If the SELECT command is the most complex command in SQL, then the WHERE  clause is the most complex clause in SELECT. (*The Definitive Guide to SQLite*)

* A collection of tables is a database

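As a rough sketch of the commands listed above, the following uses the standard-library `sqlite3` module with an in-memory database. The `books` table and its columns are made up for illustration.

```python
import sqlite3

# An in-memory database disappears when the connection is closed.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE TABLE: define the columns of a hypothetical "books" table.
cur.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")

# INSERT: add rows.
cur.execute("INSERT INTO books (title, year) VALUES (?, ?)", ("Book A", 1999))
cur.execute("INSERT INTO books (title, year) VALUES (?, ?)", ("Book B", 2010))

# ALTER TABLE: add a column; existing rows get NULL in it.
cur.execute("ALTER TABLE books ADD COLUMN publisher TEXT")

# DELETE: remove the rows that match the WHERE clause.
cur.execute("DELETE FROM books WHERE year < ?", (2000,))
conn.commit()

# SELECT: query the remaining rows, filtered by a WHERE clause.
rows = cur.execute(
    "SELECT title, year, publisher FROM books WHERE year >= ?", (2000,)
).fetchall()
print(rows)
conn.close()
```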



## Python DB API
* The Python DB API ([PEP 249](https://www.python.org/dev/peps/pep-0249/)) defines a set of features that all Python database interfaces are expected to provide.
    * This shields the user from the details of each database's own API.
    * That is, application code should look largely the same regardless of the database being used.

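A minimal sketch of what "looking the same" means in practice: the hypothetical helper `fetch_all` below uses only the connection and cursor methods that the DB API specifies, so in principle it should work with a connection from any compliant driver. It is shown here with `sqlite3`; the table `t` is invented for the example.

```python
import sqlite3


def fetch_all(conn, sql):
    """Run a query through a DB-API 2.0 connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()


# Only the connect() call is specific to the driver being used.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (x INTEGER)")
cur.execute("INSERT INTO t VALUES (1)")
cur.execute("INSERT INTO t VALUES (2)")
conn.commit()

result = fetch_all(conn, "SELECT x FROM t ORDER BY x")
print(result)
conn.close()
```

The SQL text itself can still differ between databases (dialects, placeholder style), so the uniformity applies to the Python calls more than to the queries.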
### Module (Global) Variables
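* apilevel
    * "1.0" or "2.0"; if the attribute is missing, a DB-API 1.0 interface should be assumed
* [threadsafety](https://www.python.org/dev/peps/pep-0249/#threadsafety) (0, 1, 2, 3)
    * 0: threads may not share the module
    * 1: threads may share the module, but not connections
    * 2: threads may share the module and connections, but not cursors
    * 3: threads may share the module, connections, and cursors
* [paramstyle](https://www.python.org/dev/peps/pep-0249/#paramstyle): how parameter placeholders are written in SQL queries
    * format: C-style format codes, e.g. ...WHERE name=%s
    * pyformat: Python extended format codes, e.g. ...WHERE name=%(name)s
    * numeric: numbered positional placeholders, e.g. ...WHERE name=:1
    * named: named placeholders, e.g. ...WHERE name=:name
    * qmark: question marks, e.g. ...WHERE name=?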

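To make two of these styles concrete, here is a short sketch using `sqlite3`, which accepts both qmark and named placeholders. The `points` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE points (x INTEGER, y INTEGER)")

# qmark style: positional "?" placeholders filled from a sequence.
cur.execute("INSERT INTO points VALUES (?, ?)", (1, 2))

# named style: ":name" placeholders filled from a mapping.
cur.execute("INSERT INTO points VALUES (:x, :y)", {"x": 3, "y": 4})

points = cur.execute("SELECT x, y FROM points ORDER BY x").fetchall()
print(points)
conn.close()
```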

## What are the API parameters for the sqlite3 package?

In [None]:
import sqlite3 as sqlite
print(sqlite.apilevel)
print(sqlite.threadsafety)
print(sqlite.paramstyle)

### Example of how paramstyle is used

```Python
c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)
```

#### Note: Always let the cursor substitute the values into the SQL statement rather than formatting them into the string yourself
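A runnable sketch of the `executemany` call above, assuming a five-column `stocks` table (the schema and the sample rows are assumptions for illustration). The last part hints at why substitution is left to the cursor: a value containing quote characters is treated as plain data, not as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# Assumed schema: the original example does not show the table definition.
c.execute("CREATE TABLE stocks (date TEXT, trans TEXT, symbol TEXT, qty REAL, price REAL)")

# Made-up sample rows, one tuple per row to insert.
purchases = [
    ("2006-03-28", "BUY", "IBM", 1000, 45.00),
    ("2006-04-05", "BUY", "MSFT", 1000, 72.00),
    ("2006-04-06", "SELL", "IBM", 500, 53.00),
]
c.executemany("INSERT INTO stocks VALUES (?,?,?,?,?)", purchases)
conn.commit()

# A hostile-looking value is compared as an ordinary string, so it
# matches no rows instead of changing the meaning of the query.
symbol = "IBM' OR '1'='1"
matches = c.execute("SELECT * FROM stocks WHERE symbol = ?", (symbol,)).fetchall()
print(matches)

ibm_rows = c.execute(
    "SELECT trans, qty FROM stocks WHERE symbol = ? ORDER BY date", ("IBM",)
).fetchall()
print(ibm_rows)
conn.close()
```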

### Let's Install Some Python Packages for MySQL and PostgreSQL

#### Install a pure-Python MySQL package

In [None]:
!conda install pymysql -y

In [None]:
import pymysql
print(pymysql.apilevel)
print(pymysql.threadsafety)
print(pymysql.paramstyle)

### Example of how paramstyle is used

```Python
cursor.execute('INSERT INTO tz_data VALUES (%s, %s, %s)',
               (v1, v2, v3))
```

#### Install the official MySQL Python bindings

In [None]:
!conda install mysql-connector-python -y

In [None]:
import mysql.connector

print(mysql.connector.apilevel)
print(mysql.connector.threadsafety)
print(mysql.connector.paramstyle)

```Python
stmt = "INSERT INTO employees (first_name, hire_date) VALUES (%s, %s)"
cursor.executemany(stmt, data)
```

#### Install the PostgreSQL Python package

In [None]:
!conda install psycopg2 -y

In [None]:
import psycopg2
print(psycopg2.apilevel)
print(psycopg2.threadsafety)
print(psycopg2.paramstyle)

### Example of how paramstyle is used

```Python
cur.execute("INSERT INTO test (num, data) VALUES (%s, %s)",
            (100, "abc'def"))
```
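Because the placeholder syntax differs between drivers, code that targets several databases sometimes builds the placeholder list from the module's `paramstyle`. The helper below, `placeholders`, is a hypothetical sketch and not part of any driver. It assumes positional parameters, and it assumes that `pyformat` drivers accept a bare `%s`, as the MySQL and PostgreSQL examples above do. It is exercised here only with `sqlite3`.

```python
import sqlite3


def placeholders(paramstyle, count):
    """Return a comma-separated list of positional placeholders."""
    if paramstyle == "qmark":
        marks = ["?"] * count
    elif paramstyle == "numeric":
        marks = [f":{i}" for i in range(1, count + 1)]
    elif paramstyle in ("format", "pyformat"):
        # Assumption: the driver accepts positional %s placeholders.
        marks = ["%s"] * count
    else:
        # "named" needs parameter names, which this sketch does not handle.
        raise ValueError(f"unsupported paramstyle: {paramstyle}")
    return ", ".join(marks)


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test (num INTEGER, data TEXT)")

# Only the placeholders are formatted into the SQL text; the values
# themselves are still passed separately and substituted by the cursor.
sql = f"INSERT INTO test (num, data) VALUES ({placeholders(sqlite3.paramstyle, 2)})"
cur.execute(sql, (100, "abc'def"))

row = cur.execute("SELECT num, data FROM test").fetchone()
print(row)
conn.close()
```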