# Background

### What is a database
- Allows us to store, manipulate and retrieve data
- Often stored on a computer server

### Relational databases and SQL
- Relational databases: data is represented as tuples, which are grouped into *relations*
- SQL (Structured Query Language): a language used to query and maintain relational databases
- In SQL, relations are represented as tables: rows are tuples ("entries") and columns are attributes ("variables")
- SQL is old: introduced in 1974 as SEQUEL
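
To make the table picture concrete, here is a minimal sketch of a relation whose rows come back as tuples and whose columns are the attributes. It assumes Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for a real database server, and the `pets` table is a made-up example.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for a real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (name TEXT, species TEXT, age INT);")
conn.execute(
    "INSERT INTO pets (name, species, age) "
    "VALUES ('Rex', 'dog', 4), ('Tom', 'cat', 7);"
)

cursor = conn.execute("SELECT * FROM pets;")
attributes = [column[0] for column in cursor.description]  # column names
rows = cursor.fetchall()                                    # one tuple per row

print(attributes)  # ['name', 'species', 'age']
print(rows)        # [('Rex', 'dog', 4), ('Tom', 'cat', 7)]
conn.close()
```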

### PostgreSQL
- Postgres: A relational database management system (RDBMS)
- Other examples: Oracle Database and Microsoft SQL Server (proprietary), MySQL (open source, owned by Oracle), ...
- SQL dialects differ between RDBMSs, so queries are not always portable, but the systems have a LOT in common

![relation_image](relational_database.png)

# Writing queries

### Styling and some key conventions
- Style is the difference between readable and unreadable code, even though the computer does not care
- Reserved keywords ('instructions') are written in ALL CAPS
- Table and attribute names are written in lower case, with underscores in place of spaces
- Use indentation, white space and line breaks generously; for longer queries this makes a world of difference

### Things I often forgot, especially in the beginning
- Separate arguments within a single instruction by commas
- Strings are indicated by a *single* quote on each side ('a string'). Double quotes mark identifiers such as column names, so "a string" is read as a column name and usually causes an error in PostgreSQL
- Equality testing involves a *single* equality sign: 3 = 4 is false, 3 == 4 causes an error
- Every query ends with a semicolon
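
The quoting and equality rules above can be tried out in a small sketch. It assumes Python's built-in `sqlite3` module as a stand-in for PostgreSQL and a made-up `pets` table. SQLite is more forgiving than PostgreSQL (it also accepts `==`, for instance), so only behaviour the two share is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (name TEXT, age INT);")
conn.execute("INSERT INTO pets (name, age) VALUES ('Rex', 4), ('Tom', 7);")

# Single quotes make a string literal; a single = tests equality.
matches = conn.execute(
    "SELECT name, age FROM pets WHERE name = 'Rex';"
).fetchall()

# Double quotes name an identifier: "name" is the column, not a string.
# This compares the column with itself, so every row matches.
all_rows = conn.execute(
    'SELECT name FROM pets WHERE "name" = name;'
).fetchall()

# SQLite reports comparison results as 0/1; PostgreSQL shows false/true.
three_equals_four = conn.execute("SELECT 3 = 4;").fetchone()[0]

print(matches)            # [('Rex', 4)]
print(len(all_rows))      # 2
print(three_equals_four)  # 0
conn.close()
```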

Example from https://www.sqlstyle.guide/:

In [None]:
select f.species_name, avg(f.height) as average_height, avg(f.diameter) as average_diameter from flora as f where f.species_name = 'Banksia' or f.species_name = 'Sheoak' or f.species_name = 'Wattle' group by f.species_name, f.observation_date;

In [None]:
SELECT  f.species_name,
        AVG(f.height) AS average_height,
        AVG(f.diameter) AS average_diameter
   FROM flora AS f
  WHERE f.species_name = 'Banksia'
     OR f.species_name = 'Sheoak'
     OR f.species_name = 'Wattle'
  GROUP BY f.species_name, f.observation_date;

# CRUD: Create, read, update and delete data

## Create table: CREATE TABLE

In [None]:
CREATE TABLE customers (
    id INT,
    first_name TEXT,
    last_name TEXT,
    email TEXT,
    gender TEXT,
    ip_address TEXT
);

See https://www.postgresql.org/docs/12/datatype.html for all types in Postgres

## Create data: INSERT INTO

In [None]:
INSERT INTO customers (id, first_name, last_name, email, gender, ip_address)
VALUES  (1, 'Arly', 'Llewellin', 'allewellin0@mtv.com', 'Female', '248.43.200.219'),
        (2, 'Belita', 'Faulkner', 'bfaulkner1@last.fm', 'Female', '218.205.154.252');

Not all fields are required unless we add constraints (see below); columns that are left out are set to NULL

In [None]:
INSERT INTO customers (id, last_name, gender)
VALUES (3, 'Gadd', 'Male');
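
As a quick check of what happens to the columns left out of an INSERT, here is a minimal sketch. It assumes Python's built-in `sqlite3` module as a stand-in for PostgreSQL; SQL NULL shows up in Python as `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INT, first_name TEXT, last_name TEXT, "
    "email TEXT, gender TEXT, ip_address TEXT);"
)

# Only three of the six columns receive a value.
conn.execute(
    "INSERT INTO customers (id, last_name, gender) VALUES (3, 'Gadd', 'Male');"
)

row = conn.execute(
    "SELECT id, first_name, last_name, email FROM customers;"
).fetchone()

print(row)  # (3, None, 'Gadd', None)
conn.close()
```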

##### Why types are important

In [None]:
-- Fails: 'Brian' is not a valid value for the INT column id
INSERT INTO customers (id, first_name)
VALUES ('Brian', 3);

But Postgres typing is not flawless: the integer 3 is silently cast to the text '3'

In [None]:
INSERT INTO customers (id, first_name)
VALUES (3, 3);

## Constraints

Our customers table permits adding customers without a customer id, name or email address <br>
Constraints allow us to prevent such behaviour by setting hard requirements on the data

In [None]:
CREATE TABLE customers_constrained (
    id INT UNIQUE NOT NULL,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    age INT CHECK (age >= 0),
    email TEXT,
    gender TEXT,
    ip_address TEXT
);

In [None]:
INSERT INTO customers_constrained (id, first_name, last_name, age, email, gender, ip_address)
VALUES  (1, 'Arly', 'Llewellin', 32, 'allewellin0@mtv.com', 'Female', '248.43.200.219'),
        (2, 'Belita', 'Faulkner', 46, 'bfaulkner1@last.fm', 'Female', '218.205.154.252');

In [None]:
-- Rejected: first_name is NOT NULL, but no value is given
INSERT INTO customers_constrained (id, last_name, gender)
VALUES (3, 'Gadd', 'Male');
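
To see the constraints reject bad rows, here is a minimal sketch with a shortened version of the table. It assumes Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and `try_insert` is a hypothetical helper written for this example. PostgreSQL's error messages are worded differently, but it rejects the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers_constrained (
        id INT UNIQUE NOT NULL,
        first_name TEXT NOT NULL,
        last_name TEXT NOT NULL,
        age INT CHECK (age >= 0)
    );
""")

def try_insert(columns, values):
    """Run one INSERT and report whether a constraint rejected it."""
    sql = f"INSERT INTO customers_constrained ({columns}) VALUES ({values});"
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as error:
        return f"rejected: {error}"

all_columns = "id, first_name, last_name, age"
results = [
    try_insert(all_columns, "1, 'Arly', 'Llewellin', 32"),  # valid row
    try_insert("id, last_name", "3, 'Gadd'"),               # first_name missing (NOT NULL)
    try_insert(all_columns, "1, 'Belita', 'Faulkner', 46"), # id 1 already used (UNIQUE)
    try_insert(all_columns, "2, 'Belita', 'Faulkner', -5"), # negative age (CHECK)
]

for result in results:
    print(result)
conn.close()
```

Only the first row is stored; each of the other three violates one constraint and leaves the table unchanged.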

When we look at joins, two more constraints (primary key and foreign key) will be important

## Read data/table: SELECT
Use SELECT * to show all columns

In [None]:
SELECT *
FROM customers;

We can select columns in any order

In [None]:
SELECT first_name, id, email
FROM customers;

We can give columns custom names in a query ("aliasing")

In [None]:
SELECT  first_name AS alias_1,
        last_name AS alias_2
FROM customers;

## Update data/table: UPDATE

In [None]:
UPDATE customers
SET gender = 'Male';

We can conveniently use arithmetic operations

In [None]:
UPDATE customers
SET id = 2*id + 15;
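
Both UPDATE statements above change every row, because nothing restricts them. The sketch below combines UPDATE with a WHERE clause (introduced in the filtering section) to change a single row, then applies the arithmetic update to all rows. It assumes Python's built-in `sqlite3` module as a stand-in for PostgreSQL and a shortened `customers` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INT, first_name TEXT, gender TEXT);")
conn.execute(
    "INSERT INTO customers (id, first_name, gender) "
    "VALUES (1, 'Arly', 'Female'), (2, 'Belita', 'Female'), (3, NULL, NULL);"
)

# With WHERE, only the matching row is updated.
updated = conn.execute(
    "UPDATE customers SET gender = 'Male' WHERE id = 3;"
).rowcount
genders = conn.execute(
    "SELECT id, gender FROM customers ORDER BY id;"
).fetchall()

# Without WHERE, the arithmetic is applied to every row.
conn.execute("UPDATE customers SET id = 2*id + 15;")
ids = [row[0] for row in conn.execute("SELECT id FROM customers ORDER BY id;")]

print(updated)  # 1
print(genders)  # [(1, 'Female'), (2, 'Female'), (3, 'Male')]
print(ids)      # [17, 19, 21]
conn.close()
```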

## Delete data: DELETE

In [None]:
-- DELETE removes whole rows, not columns; WHERE selects which rows
DELETE FROM customers
WHERE last_name = 'Gadd';

To delete every row in the table (the empty table itself remains)

In [None]:
DELETE FROM customers;

## Delete table: DROP TABLE

In [None]:
DROP TABLE customers;
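
The difference between DELETE and DROP TABLE is easy to mix up, so here is a minimal sketch that runs both. It assumes Python's built-in `sqlite3` module as a stand-in for PostgreSQL and a shortened `customers` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INT, last_name TEXT);")
conn.execute(
    "INSERT INTO customers (id, last_name) "
    "VALUES (1, 'Llewellin'), (2, 'Faulkner'), (3, 'Gadd');"
)

# DELETE with WHERE removes only the matching rows.
conn.execute("DELETE FROM customers WHERE last_name = 'Gadd';")
after_filtered_delete = conn.execute(
    "SELECT id FROM customers ORDER BY id;"
).fetchall()

# DELETE without WHERE removes every row, but the empty table still exists.
conn.execute("DELETE FROM customers;")
after_full_delete = conn.execute("SELECT id FROM customers;").fetchall()

# DROP TABLE removes the table itself, so querying it now fails.
conn.execute("DROP TABLE customers;")
try:
    conn.execute("SELECT id FROM customers;")
    table_exists = True
except sqlite3.OperationalError:
    table_exists = False

print(after_filtered_delete)  # [(1,), (2,)]
print(after_full_delete)      # []
print(table_exists)           # False
conn.close()
```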

# Filtering, sorting, grouping

## Filtering results: WHERE clause

In [None]:
SELECT *
FROM customers
WHERE first_name = 'Arly';

## Sorting results: ORDER BY clause

In [None]:
SELECT *
FROM customers
ORDER BY last_name;

## Grouping results: GROUP BY clause

##### Note: typically used with aggregate functions (covered later)

In [None]:
SELECT last_name, SUM(id)
FROM customers
GROUP BY last_name;
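
To show what grouping does to the result, here is a minimal sketch in which two customers share a last name. It assumes Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the sample rows and the alias `id_sum` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INT, last_name TEXT);")
conn.execute(
    "INSERT INTO customers (id, last_name) "
    "VALUES (1, 'Gadd'), (2, 'Faulkner'), (3, 'Gadd');"
)

# One output row per distinct last_name; SUM adds up the ids in each group.
grouped = conn.execute("""
    SELECT last_name, SUM(id) AS id_sum
    FROM customers
    GROUP BY last_name
    ORDER BY last_name;
""").fetchall()

print(grouped)  # [('Faulkner', 2), ('Gadd', 4)]
conn.close()
```

Selecting the grouping column next to the aggregate makes it clear which group each sum belongs to.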