### ST445 Managing and Visualizing Data
# Creating and managing databases
### Week 3 Lecture, MT 2017 - Kenneth Benoit, Christian Mueller

## Plan for today

- Relational and non-relational databases
- SQL and SQLite:
    - Creating tables
    - Querying data
    - Filtering
    - Grouping and aggregation
    - Joining tables
    - Adding indexes

## Database systems

#### Relational databases

- Mainly implementations and extensions of the SQL Standard ([ISO/IEC 9075:2016](https://www.iso.org/standard/63556.html))
- Transactions are always **ACID** (atomic, consistent, isolated, durable)
- The structure of the data (tables, columns, and data types) must be declared before any data can be stored
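
To make the "atomic" part of ACID concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The `accounts` table and the simulated failure are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE accounts (name TEXT, balance INT)")  # hypothetical table
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    # "with conn" commits if the block succeeds and rolls back if it raises
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        raise RuntimeError("simulated failure before the matching credit to bob")
except RuntimeError:
    pass

# The half-finished transfer was rolled back, so both balances are unchanged
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

Either both updates of a transfer happen or neither does; the debit alone never becomes visible.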

#### Non-relational databases

- Key-value storage types (e.g. Amazon DynamoDB) or document storage types (e.g. CouchDB, MongoDB)
- Sometimes labelled as providing **ACID** transactions but often only _eventually consistent_

- FYI if you click on the SQL standard link above: the standard is open, i.e. anyone can get it, but only for a fee

## Basic SQL concepts

(Most of this will be familiar from last week's lecture and class on managing data)

- The basic unit is the **database**
- It might be stored on disk in a single file or a range of files managed by a server
- The database consists of **tables** which store the actual data
- A table consists of at least one **column** whose name and data type need to be declared
- Data is stored in the **rows** of a table

## Basic SQL syntax

- Defining data: `CREATE TABLE`
- Accessing data: `SELECT`
- Most functionality is part of the `SELECT` statement:
    - Filter: `SELECT ... WHERE`
    - Sort: `SELECT ... ORDER BY`
    - Aggregate: `SELECT ... GROUP BY`
    - Aggregate and filter: `SELECT ... GROUP BY ... HAVING`
    - Combining data: `SELECT ... JOIN`
- Adding constraints: inside `CREATE TABLE`, or later with `ALTER TABLE ... ADD CONSTRAINT` (not supported by SQLite)
- Adding indexes: `CREATE INDEX`
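
The statements listed above can be sketched end to end as follows. This is an illustrative run through Python's built-in `sqlite3` module; the `students` and `marks` tables and their contents are invented for the example and are not part of the lecture data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INT, name TEXT, dept TEXT);
    CREATE TABLE marks (student_id INT, mark INT);
    INSERT INTO students VALUES (1, 'Ann', 'Stats'), (2, 'Bob', 'Stats'), (3, 'Cy', 'Econ');
    INSERT INTO marks VALUES (1, 70), (1, 80), (2, 55), (3, 65);
    CREATE INDEX idx_marks_student ON marks (student_id);
""")

# Filter and sort: WHERE ... ORDER BY
filtered = conn.execute(
    "SELECT name FROM students WHERE dept = 'Stats' ORDER BY name"
).fetchall()

# Combine, aggregate, and filter the aggregates: JOIN ... GROUP BY ... HAVING
averages = conn.execute("""
    SELECT s.name, AVG(m.mark) AS avg_mark
    FROM students AS s
    JOIN marks AS m ON m.student_id = s.id
    GROUP BY s.name
    HAVING AVG(m.mark) >= 60
    ORDER BY avg_mark DESC
""").fetchall()

print(filtered)
print(averages)
```

Note that `WHERE` filters rows before grouping, whereas `HAVING` filters the groups after aggregation.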

## SQL Syntax caveats

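- SQL keywords and unquoted names are **case-insensitive** (`select` is the same as `SELECT`); string values are not necessarily
- `;` has to be added at the end of each statement to terminate it (as in C-family languages, JavaScript, ...); a statement may span several lines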
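
Both caveats can be checked with a small sketch in SQLite, again via Python's `sqlite3` module. The `demo` table is hypothetical. The last query relies on SQLite's default behaviour of comparing text case-sensitively; other database systems may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three statements in one script: each ';' ends a statement, line breaks do not
conn.executescript("""
    create table Demo (Word TEXT);
    INSERT INTO demo
        VALUES ('abc');
    insert into DEMO values ('ABC');
""")

# Keywords, table names, and column names match regardless of case
rows_upper = conn.execute("SELECT word FROM demo ORDER BY rowid").fetchall()
rows_lower = conn.execute("select WORD from DEMO order by rowid").fetchall()

# The stored string values keep their case and are compared case-sensitively here
matches = conn.execute("SELECT word FROM demo WHERE word = 'abc'").fetchall()

print(rows_upper, rows_lower, matches)
```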


## Setting up the SQLite command line


#### Installation via Anaconda

```
conda install sqlite
```

#### Connecting to a database

```sh
sqlite3 st445-week03.db
```

```
SQLite version 3.20.1 2017-08-24 16:21:36
Enter ".help" for usage hints.
sqlite>
```
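
The same database file can also be opened from a script instead of the command line. The sketch below assumes Python's built-in `sqlite3` module and, purely to avoid cluttering the working directory, places the file in a temporary directory; the path is therefore an assumption, not where the lecture file lives.

```python
import os
import sqlite3
import tempfile

# Hypothetical location for the example; use your own path to st445-week03.db
db_path = os.path.join(tempfile.mkdtemp(), "st445-week03.db")

conn = sqlite3.connect(db_path)  # opens the file, creating it if necessary
version = conn.execute("SELECT sqlite_version()").fetchone()[0]
print("SQLite version", version)
conn.close()
```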

## Creating a table

```SQL
CREATE TABLE table_name (column_name column_type [, column_name column_type]) ;
```

Creating a table involves two things:

1. Giving the table a name consisting of alphanumeric characters and `_`
2. Giving each column a name and a **data type**

The SQL Standard defines several common data types and most SQL implementations provide additional ones:

| Type | Description                |
|:-----|:---------------------------|
| INT, BIGINT | Integer (4- and 8-byte). |
| FLOAT, DOUBLE | Single or double precision floating point number (4 or 8 bytes). |
| TEXT | String, stored using the database encoding (UTF-8, ...).|
| BLOB | Raw binary data. |
| BOOLEAN | True or false. |
| DATE, TIMESTAMP | Date and date-time. |

Compare data types available in [SQLite](https://www.sqlite.org/datatype3.html), [MySQL](https://dev.mysql.com/doc/refman/5.7/en/data-type-overview.html), and [PostgreSQL](https://www.postgresql.org/docs/current/static/datatype.html#DATATYPE-TABLE).

#### Example

```SQL
CREATE TABLE my_table (my_integer INT, my_float FLOAT, my_text TEXT) ;
```
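
To confirm what was created, the table definition can be read back. The following sketch runs the example statement through Python's `sqlite3` module on an in-memory database and uses SQLite's `PRAGMA table_info`, which is specific to SQLite rather than part of standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_integer INT, my_float FLOAT, my_text TEXT)")

# Each row of table_info is (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(my_table)")]
print(columns)
```

In the `sqlite3` command line shell, `.schema my_table` gives similar information.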

## Adding data to a table

You will probably not do this by hand, but the syntax is:

```SQL
INSERT INTO table_name [(column_name [, column_name])] VALUES (value1 [, value2]);
```

#### Example

```SQL
INSERT INTO my_table VALUES (1, 1.3, 'abc') ;
INSERT INTO my_table VALUES (4, 4.3, 'def') ;
```
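
In practice rows are usually inserted from a program rather than typed in. A minimal sketch with Python's `sqlite3` module follows; it uses `?` placeholders so that values are passed separately from the SQL text. The `records` list simply repeats the two example rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_integer INT, my_float FLOAT, my_text TEXT)")

records = [(1, 1.3, "abc"), (4, 4.3, "def")]

# executemany runs the same INSERT once per tuple; "with conn" commits at the end
with conn:
    conn.executemany(
        "INSERT INTO my_table (my_integer, my_float, my_text) VALUES (?, ?, ?)",
        records,
    )

n_rows = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
print(n_rows)
```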

## Retrieving data

Whenever data should be retrieved, the statement starts with `SELECT`.

```SQL
SELECT column_name [, column_name] | * FROM table_name ;
```

- The simplest invocation selects all the columns with `*`

#### Example 1: Dummy table created above

```SQL
SELECT * FROM my_table ;
```

```
1|1.3|abc
4|4.3|def
```
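
The same query can be issued from Python, where each row comes back as a tuple. This sketch rebuilds the dummy table in memory so that it runs on its own; the second query shows that naming columns instead of `*` also controls which columns are returned and in what order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_integer INT, my_float FLOAT, my_text TEXT)")
conn.execute("INSERT INTO my_table VALUES (1, 1.3, 'abc'), (4, 4.3, 'def')")

cursor = conn.execute("SELECT * FROM my_table")
column_names = [d[0] for d in cursor.description]  # names of the selected columns
all_rows = cursor.fetchall()

# Selecting named columns instead of *
subset = conn.execute("SELECT my_text, my_integer FROM my_table").fetchall()

print(column_names)
print(all_rows)
print(subset)
```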

#### Example 2: Actual data


```SQL
SELECT * FROM lecture_TBD ;
```

```
TBD
```

- This returns all the rows in the table, which is not practical for tables with many rows
- To glance at just the first 20 rows of the data:
    ```SQL
    SELECT * FROM lecture_TBD LIMIT 20 ;
    ```


- This works like the `head(dat, n = 20)` function in R from last week
- **NB: There is no implicit ordering of rows in SQL**
- If the data is not explicitly ordered (will be explained shortly), the order in which rows are returned is not guaranteed
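
A short sketch of `LIMIT` with and without an explicit ordering, using Python's `sqlite3` module. The `readings` table is hypothetical and stands in for the lecture data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INT, squared INT)")  # hypothetical table
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(i, i * i) for i in range(1, 101)],
)

# Without ORDER BY we get 20 rows, but SQL does not promise which 20
first_rows = conn.execute("SELECT * FROM readings LIMIT 20").fetchall()

# With ORDER BY the result is well defined: the three largest ids
top_ids = [
    row[0]
    for row in conn.execute("SELECT id FROM readings ORDER BY id DESC LIMIT 3")
]

print(len(first_rows), top_ids)
```

SQLite will often return rows in insertion order when no `ORDER BY` is given, but that is an implementation detail and should not be relied on.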

* **Lab**: Working with a relational database manager
* **Next week**: Using data from the Internet