# SQL Queries 01

For more SQL examples in the SQLite3 dialect, see the [SQLite3 tutorial](https://www.techonthenet.com/sqlite/index.php).

For a deep dive, see [SQL Queries for Mere Mortals](https://www.amazon.com/SQL-Queries-Mere-Mortals-Hands/dp/0134858336/ref=dp_ob_title_bk).

## Data

In [None]:
%load_ext sql

In [None]:
%sql sqlite:///data/faculty.db

In [None]:
%%sql

SELECT * FROM sqlite_master WHERE type='table';

Note: You can save results as a variable

In [None]:
%%sql master <<

SELECT * FROM sqlite_master WHERE type='table'

In [None]:
master.DataFrame()

## Basic Structure

```SQL
SELECT DISTINCT value_expression AS alias
FROM tables AS alias
WHERE predicate
ORDER BY value_expression
```

### Types

- Character (Fixed width, variable width)
- National Character (Fixed width, variable width)
- Binary
- Numeric (Exact, Approximate)
- Boolean
- DateTime
- Interval

**CHAR** and **NCHAR** are vendor-dependent. Sometimes they mean the same thing, and sometimes CHAR means bytes and NCHAR means Unicode.

The SQL standard specifies that character strings and datetime literals are enclosed in single quotes. Two consecutive single quotes within a string are interpreted as one literal single quote.

```sql
'Gilligan''s island'
```
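A minimal sketch of the quoting rule, using Python's built-in `sqlite3` module with an in-memory database (an assumption made for illustration; the notebook itself uses the `%sql` magic). It also shows that a bound parameter avoids the manual doubling.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# The doubled single quote inside the literal is read as one quote character.
(title,) = con.execute("SELECT 'Gilligan''s island'").fetchone()

# A bound parameter needs no escaping at all.
(bound,) = con.execute("SELECT ?", ("Gilligan's island",)).fetchone()

print(title)  # Gilligan's island
con.close()
```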

#### The CAST function

```sql
CAST(X as CHARACTER(10))
```
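As a rough illustration of `CAST` in SQLite specifically (other vendors may round or reject some of these conversions), the sketch below runs three casts through Python's `sqlite3` module on an in-memory database.

```python
import sqlite3

con = sqlite3.connect(":memory:")

row = con.execute(
    """
    SELECT CAST('42' AS INTEGER),  -- text to integer
           CAST(3.7 AS INTEGER),   -- SQLite truncates toward zero
           CAST(42 AS TEXT)        -- integer to text
    """
).fetchone()

print(row)  # (42, 3, '42')
con.close()
```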

### Value expression

- Literal
- Column reference
- Function
- CASE
- (Value expression)

which may be prefixed with unary operators `-` and `+` and combined with binary operators appropriate for the data type.

Literal

In [None]:
%sql SELECT 23

Column reference

In [None]:
%sql SELECT first, last FROM person LIMIT 3

Function

In [None]:
%sql SELECT count(*) FROM person

Cases

In [None]:
%%sql

SELECT first, last, age,
CASE
    WHEN age < 50 THEN 'Whippersnapper'
    WHEN age < 70 THEN 'Old codger'
    ELSE 'Dinosaur'
END comment
FROM person
LIMIT 4

Value expression

In [None]:
%%sql

SELECT first || ' ' || last AS name, age, age - 10 AS fake_age
FROM person
LIMIT 3

### Binary operators

#### Concatenation

```SQL
A || B
```

#### Mathematical

```SQL
A + B
A - B
A * B
A / B
```

#### Date and time arithmetic

```SQL
'2018-08-29' + 3
'11:59' + '00:01'
```
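The expressions above follow the general SQL idea of date arithmetic, but SQLite has no dedicated date type, so `+` on a date string falls back to numeric coercion. A hedged sketch of what SQLite does instead, using its `date()` and `time()` functions with modifiers (run here through Python's `sqlite3` module on an in-memory database):

```python
import sqlite3

con = sqlite3.connect(":memory:")

row = con.execute(
    """
    SELECT date('2018-08-29', '+3 days'),  -- proper date arithmetic
           time('11:59', '+1 minutes'),    -- proper time arithmetic
           '2018-08-29' + 3                -- string coerced to the number 2018
    """
).fetchone()

print(row)  # ('2018-09-01', '12:00:00', 2021)
con.close()
```

The last column shows why the plain `+` form is best avoided in SQLite: only the leading digits of the string take part in the addition.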

In [None]:
%%sql

SELECT DISTINCT language_name
FROM language
LIMIT 5;

### Sorting

```SQL
SELECT DISTINCT value_expression AS alias
FROM tables AS alias
ORDER BY value_expression
```
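A small sketch of sorting on more than one key with explicit directions. The `pet` table and its rows are hypothetical and exist only for this example; they are not part of `faculty.db`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pet(name TEXT, age INTEGER)")  # hypothetical table
con.executemany(
    "INSERT INTO pet VALUES (?, ?)",
    [("Rex", 3), ("Ada", 5), ("Bo", 3)],
)

# Oldest first; ties on age are broken alphabetically by name.
rows = con.execute(
    "SELECT name, age FROM pet ORDER BY age DESC, name ASC"
).fetchall()

print(rows)  # [('Ada', 5), ('Bo', 3), ('Rex', 3)]
con.close()
```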

In [None]:
%%sql

SELECT DISTINCT language_name
FROM language
ORDER BY language_name ASC
LIMIT 5;

In [None]:
%%sql

SELECT DISTINCT language_name
FROM language
ORDER BY random()
LIMIT 5;

### Filtering

For efficiency, it can help to place the most stringent filters first, although many query planners reorder predicates on their own.

```SQL
SELECT DISTINCT value_expression AS alias
FROM tables AS alias
WHERE predicate
ORDER BY value_expression
```

#### Predicates for filtering rows

- Comparison operators (=, <>, <, >, <=, >=)
- BETWEEN start AND end
- IN(A, B, C)
- LIKE
- IS NULL
- REGEXP (in SQLite, this requires a user-supplied `regexp` function)

Use NOT prefix for negation
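The predicates listed above can be tried out on a tiny table. This is a sketch under stated assumptions: the `staff` table, its rows, and the `names` helper are hypothetical, and the `regexp` function is registered by hand because SQLite parses `X REGEXP Y` as a call to `regexp(Y, X)` without shipping an implementation.

```python
import re
import sqlite3

con = sqlite3.connect(":memory:")


def regexp(pattern, value):
    """Back the REGEXP operator; SQLite passes the pattern first."""
    return value is not None and re.search(pattern, value) is not None


con.create_function("regexp", 2, regexp)

con.execute("CREATE TABLE staff(name TEXT, age INTEGER)")  # hypothetical table
con.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("Ann", 34), ("Bob", 51), ("Cy", None), ("Dee", 67)],
)


def names(predicate):
    """Return the names matching a WHERE predicate, in alphabetical order."""
    sql = f"SELECT name FROM staff WHERE {predicate} ORDER BY name"
    return [name for (name,) in con.execute(sql)]


between = names("age BETWEEN 34 AND 51")          # both end points included
in_list = names("age IN (34, 67)")
like = names("name LIKE '%e%'")                   # % matches any substring
is_null = names("age IS NULL")
not_between = names("age NOT BETWEEN 34 AND 51")  # the NULL age never matches
regex = names("name REGEXP '^[A-C]'")

print(between, in_list, like, is_null, not_between, regex)
con.close()
```

Note that the row with a `NULL` age appears in neither the `BETWEEN` nor the `NOT BETWEEN` result; only `IS NULL` finds it.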

#### Combining predicates

```sql
AND
OR
```

Use parentheses to indicate the order of evaluation for compound predicates.
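A quick sketch of why the parentheses matter: `AND` binds more tightly than `OR`, so the same three operands give different answers depending on grouping. SQLite represents true and false as 1 and 0.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Without parentheses this is read as 1 OR (0 AND 0).
row = con.execute("SELECT 1 OR 0 AND 0, (1 OR 0) AND 0").fetchone()

print(row)  # (1, 0)
con.close()
```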

In [None]:
%%sql

SELECT first, last, age
FROM person
WHERE age BETWEEN 16 AND 17
LIMIT 5;

### Joins

Joins combine data from 1 or more tables to form a new result set.

Note: To join on multiple columns just use `AND` in the `ON` expression
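A minimal sketch of a join on two columns. The `score` and `fee` tables and their contents are hypothetical, made up purely to show the `AND` inside `ON`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- Hypothetical tables keyed by (student, term)
    CREATE TABLE score(student TEXT, term INTEGER, mark INTEGER);
    CREATE TABLE fee(student TEXT, term INTEGER, paid INTEGER);
    INSERT INTO score VALUES ('ann', 1, 80), ('ann', 2, 90), ('bob', 1, 70);
    INSERT INTO fee VALUES ('ann', 1, 100), ('ann', 2, 0), ('bob', 2, 100);
    """
)

# A row pair matches only when both the student and the term agree.
rows = con.execute(
    """
    SELECT score.student, score.term, mark, paid
    FROM score
    INNER JOIN fee
        ON score.student = fee.student AND score.term = fee.term
    ORDER BY score.student, score.term
    """
).fetchall()

print(rows)  # [('ann', 1, 80, 100), ('ann', 2, 90, 0)]
con.close()
```

Bob drops out because his `score` row is for term 1 while his `fee` row is for term 2.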

#### Natural join

Uses all common columns in Tables 1 and 2 for JOIN

```SQL
FROM Table1 
NATURAL INNER JOIN Table2
```

#### Inner join

General form of INNER JOIN using ON

```SQL
FROM Table1 
INNER JOIN Table2
ON Table1.Column = Table2.Column
```

**Note**: An INNER JOIN with an equality condition is equivalent to the implicit equijoin written with a comma and `WHERE`, but the `ON` form is more flexible in that additional join conditions can be specified.

```SQL
SELECT * 
FROM Table1, Table2
WHERE Table1.Column = Table2.Column
```

If there is a common column in both tables

```SQL
FROM Table1
INNER JOIN Table2
USING (Column)
```
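The three inner-join spellings shown so far (`ON`, comma with `WHERE`, and `USING`) should return the same rows when the join condition is a single equality. A sketch that checks this on hypothetical `author` and `book` tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- Hypothetical tables sharing the column author_id
    CREATE TABLE author(author_id INTEGER, name TEXT);
    CREATE TABLE book(author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cal');
    INSERT INTO book VALUES (1, 'Engines'), (1, 'Notes'), (2, 'Tides');
    """
)

queries = {
    "on": """SELECT name, title FROM author
             INNER JOIN book ON author.author_id = book.author_id
             ORDER BY title""",
    "where": """SELECT name, title FROM author, book
                WHERE author.author_id = book.author_id
                ORDER BY title""",
    "using": """SELECT name, title FROM author
                INNER JOIN book USING (author_id)
                ORDER BY title""",
}
results = {label: con.execute(sql).fetchall() for label, sql in queries.items()}

print(results["on"])  # [('Ada', 'Engines'), ('Ada', 'Notes'), ('Ben', 'Tides')]
con.close()
```

Cal has no books, so no spelling of the inner join returns a row for him.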

Joining more than two tables

```SQL
FROM (Table1
      INNER JOIN Table2
      ON Table1.Column1 = Table2.Column1)
      INNER JOIN Table3
      ON Table3.Column2 = Table2.Column2
```

#### Outer join

General form of OUTER JOIN using ON

```SQL
FROM Table1 
RIGHT OUTER JOIN Table2
ON Table1.Column = Table2.Column
```

```SQL
FROM Table1 
LEFT OUTER JOIN Table2
ON Table1.Column = Table2.Column
```

```SQL
FROM Table1 
FULL OUTER JOIN Table2
ON Table1.Column = Table2.Column
```
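A sketch of what an outer join adds over an inner join, again on hypothetical `author` and `book` tables. Only `LEFT OUTER JOIN` is run here because, as far as I know, `RIGHT` and `FULL OUTER JOIN` are only accepted by SQLite 3.39 and later.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- Hypothetical tables; Cal has no matching book
    CREATE TABLE author(author_id INTEGER, name TEXT);
    CREATE TABLE book(author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cal');
    INSERT INTO book VALUES (1, 'Engines'), (1, 'Notes'), (2, 'Tides');
    """
)

# Every author is kept; missing book columns are filled with NULL (None).
rows = con.execute(
    """
    SELECT name, title
    FROM author
    LEFT OUTER JOIN book
        ON author.author_id = book.author_id
    ORDER BY name, title
    """
).fetchall()

print(rows)
con.close()
```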

In [None]:
%%sql

SELECT first, last, language_name 
FROM person
INNER JOIN person_language 
    ON person.person_id = person_language.person_id
INNER JOIN language 
    ON language.language_id = person_language.language_id
LIMIT 10;

### Set operations 

```SQL
SELECT a, b 
FROM table1
SetOp
SELECT a, b 
FROM table2
```

where SetOp is `INTERSECT`, `EXCEPT`, `UNION` or `UNION ALL`.

#### Intersection

```sql
INTERSECT
```

Alternative using `INNER JOIN`
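A sketch of the `INNER JOIN` alternative, on hypothetical tables `t1` and `t2`. The join needs `DISTINCT` to match the duplicate elimination that `INTERSECT` performs, and the two forms can differ when the compared columns contain `NULL`, since `NULL = NULL` is not true in a join condition.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- Hypothetical tables with the same column layout
    CREATE TABLE t1(a INTEGER, b TEXT);
    CREATE TABLE t2(a INTEGER, b TEXT);
    INSERT INTO t1 VALUES (1, 'x'), (2, 'y'), (2, 'y'), (3, 'z');
    INSERT INTO t2 VALUES (2, 'y'), (3, 'z'), (4, 'w');
    """
)

intersect = con.execute(
    "SELECT a, b FROM t1 INTERSECT SELECT a, b FROM t2 ORDER BY a"
).fetchall()

# Join on every selected column, then remove duplicates.
joined = con.execute(
    """
    SELECT DISTINCT t1.a, t1.b
    FROM t1
    INNER JOIN t2
        ON t1.a = t2.a AND t1.b = t2.b
    ORDER BY t1.a
    """
).fetchall()

print(intersect)  # [(2, 'y'), (3, 'z')]
con.close()
```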

#### Union

```SQL
UNION
UNION ALL -- does not eliminate duplicate rows
```
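The difference between the two operators is easy to see on a pair of hypothetical single-column tables that share one value:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- Hypothetical tables; the value 2 appears in both
    CREATE TABLE t1(a INTEGER);
    CREATE TABLE t2(a INTEGER);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES (2), (3);
    """
)

union = con.execute(
    "SELECT a FROM t1 UNION SELECT a FROM t2 ORDER BY a"
).fetchall()
union_all = con.execute(
    "SELECT a FROM t1 UNION ALL SELECT a FROM t2 ORDER BY a"
).fetchall()

print(union)      # [(1,), (2,), (3,)]
print(union_all)  # [(1,), (2,), (2,), (3,)]
con.close()
```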

#### Difference

```SQL
EXCEPT
```

Alternative using `OUTER JOIN` with test for `NULL`
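A sketch of the outer-join alternative on hypothetical tables `t1` and `t2`: rows of `t1` with no partner in `t2` come back with `NULL` in the `t2` columns, and the `IS NULL` test keeps exactly those rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- Hypothetical tables; only the value 2 is in both
    CREATE TABLE t1(a INTEGER);
    CREATE TABLE t2(a INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2);
    """
)

except_rows = con.execute(
    "SELECT a FROM t1 EXCEPT SELECT a FROM t2 ORDER BY a"
).fetchall()

# Keep the t1 rows for which the outer join found no match in t2.
anti_join = con.execute(
    """
    SELECT DISTINCT t1.a
    FROM t1
    LEFT OUTER JOIN t2
        ON t1.a = t2.a
    WHERE t2.a IS NULL
    ORDER BY t1.a
    """
).fetchall()

print(except_rows)  # [(1,), (3,)]
con.close()
```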

In [None]:
%%sql

DROP VIEW IF EXISTS language_view;
CREATE VIEW language_view AS
SELECT first, last, language_name 
FROM person
INNER JOIN person_language 
    ON person.person_id = person_language.person_id
INNER JOIN language 
    ON language.language_id = person_language.language_id
;

In [None]:
%%sql

SELECT *
FROM language_view 
LIMIT 10;

In [None]:
%%sql

SELECT *
FROM language_view 
WHERE language_name = 'Python'
UNION
SELECT *
FROM language_view 
WHERE language_name = 'Haskell'
LIMIT 10;

In [None]:
%%sql

SELECT *
FROM language_view 
WHERE language_name IN ('Python', 'Haskell')
ORDER BY first
LIMIT 10;

### Aggregate functions

```SQL
COUNT
MIN
MAX
AVG
SUM
```
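One detail worth seeing in action is how the aggregates treat `NULL`: `COUNT(*)` counts rows, while `COUNT(column)` and the other functions skip `NULL` values. A sketch on a hypothetical `reading` table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE reading(x INTEGER)")  # hypothetical table
con.executemany("INSERT INTO reading VALUES (?)", [(10,), (20,), (None,)])

row = con.execute(
    "SELECT COUNT(*), COUNT(x), MIN(x), MAX(x), AVG(x), SUM(x) FROM reading"
).fetchone()

print(row)  # (3, 2, 10, 20, 15.0, 30)
con.close()
```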

In [None]:
%%sql

SELECT count(language_name) 
FROM language_view;

### Grouping

```SQL
SELECT a, MIN(b) AS min_b, MAX(b) AS max_b, AVG(b) AS mean_b
FROM table
GROUP BY a
HAVING mean_b > 5
```

The `HAVING` clause is analogous to the `WHERE` clause, but filters on aggregate conditions. Note that the `WHERE` clause filters rows BEFORE the grouping is done.

Note: Any column in the SELECT part that is not inside an aggregate function needs to be in the GROUP BY part.

```SQL
SELECT a, b, c, COUNT(d)
FROM table
GROUP BY a, b, c
```
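To make the order of operations concrete, here is a sketch that uses both `WHERE` and `HAVING` in one query. The `sales` table and its rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales(region TEXT, amount INTEGER)")  # hypothetical
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 50), ("west", 5), ("west", 6), ("north", 100)],
)

rows = con.execute(
    """
    SELECT region, COUNT(*) AS n, SUM(amount) AS total
    FROM sales
    WHERE amount >= 6          -- row filter, applied before grouping
    GROUP BY region
    HAVING SUM(amount) > 20    -- group filter, applied after grouping
    ORDER BY region
    """
).fetchall()

print(rows)  # [('east', 2, 60), ('north', 1, 100)]
con.close()
```

The `WHERE` clause first removes the west sale of 5, leaving west with a total of 6, and `HAVING` then drops the whole west group.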

In [None]:
%%sql

SELECT language_name, count(*) AS n
FROM language_view
GROUP BY language_name
HAVING n > 45;

### The CASE switch

#### Simple CASE

```SQL
SELECT name,
(CASE sex 
 WHEN 'M' THEN 1.5*dose
 WHEN 'F' THEN dose
 END) as adjusted_dose
FROM table
```

#### Searched CASE

```SQL
SELECT name,
(CASE  
 WHEN sex = 'M' THEN 1.5*dose
 WHEN sex = 'F' THEN dose
 END) as adjusted_dose
FROM table
```
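The two forms should give identical results for equality tests. The sketch below checks this on a hypothetical `patient` table and also shows that, with no `ELSE` branch, an unmatched row yields `NULL`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE patient(name TEXT, sex TEXT, dose REAL)")  # hypothetical
con.executemany(
    "INSERT INTO patient VALUES (?, ?, ?)",
    [("Al", "M", 10.0), ("Bea", "F", 10.0), ("Kit", None, 10.0)],
)

simple = con.execute(
    """
    SELECT name,
    (CASE sex
     WHEN 'M' THEN 1.5*dose
     WHEN 'F' THEN dose
     END) AS adjusted_dose
    FROM patient
    ORDER BY name
    """
).fetchall()

searched = con.execute(
    """
    SELECT name,
    (CASE
     WHEN sex = 'M' THEN 1.5*dose
     WHEN sex = 'F' THEN dose
     END) AS adjusted_dose
    FROM patient
    ORDER BY name
    """
).fetchall()

print(simple)  # [('Al', 15.0), ('Bea', 10.0), ('Kit', None)]
con.close()
```

The searched form is the more general of the two, since each `WHEN` can hold any predicate rather than only an equality test against one expression.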

In [None]:
%%sql

SELECT first, last, language_name,
(CASE
    WHEN language_name LIKE 'H%' THEN 'Hire'
    ELSE 'Fire'
END
) AS outcome
FROM language_view
LIMIT 10;

## User-defined functions (UDF)

In [None]:
import sqlite3

In [None]:
import random
import statistics

In [None]:
con = sqlite3.connect(":memory:")

#### Row functions

In [None]:
con.create_function("rnorm", 2, random.normalvariate)

In [None]:
cr = con.cursor()

In [None]:
cr.execute('CREATE TABLE foo(num REAL);')

In [None]:
cr.execute("""
INSERT INTO foo(num) 
VALUES
(rnorm(0,1)), 
(rnorm(0,1)), 
(rnorm(0,1)), 
(rnorm(0,1)), 
(rnorm(0,1)),
(rnorm(0,1)), 
(rnorm(0,1)),
(rnorm(0,1))
""")

In [None]:
cr.execute('SELECT * from foo')
cr.fetchall()

#### Aggregate functions

In [None]:
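class Var:
    """Aggregate UDF that returns the sample variance of a column."""

    def __init__(self):
        self.acc = []

    def step(self, value):
        # Called once per row; skip SQL NULLs, which arrive as None.
        if value is not None:
            self.acc.append(value)

    def finalize(self):
        # Called once at the end; variance needs at least two values.
        if len(self.acc) < 2:
            return 0
        else:
            return statistics.variance(self.acc)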

In [None]:
con.create_aggregate("Var", 1, Var)

In [None]:
cr.execute('SELECT Var(num) FROM foo')
cr.fetchall()

In [None]:
con.close()