# SQL Workshop

## Introductions

* Name
* Background/Role
* Prior Experiences
* Expectations
* Hobby

* Michael Burgess
    * michael.burgess@decoded.com
* Head of Technical Solutions -- design commercial educational programmes
    * Data, Analytics, & AI -- Physics, Contracting (defence, telephony, mobile)
* Arguing, Podcasts, YouTube


* John
    * Fidelity, Infrastructure Architect
        * Java, SQL (prior experience)
    * Applying SQL within a Jupyter environment
    * Travel, boy scouting, camping
* Sofiah
    * Fidelity, Brunei, BA in Innovation/AI
        * no prior experience
        * last day of work 
    * Use SQL
    * Theatre, Music, Orchestra
* Philip
    * Service Reporting, Service Management Reporting
    * Some prior SQL, connecting & pulling -- no analytics
        * python & SQL
    * Rugby, Sci-Fi
    * Be Bold
* Gema (BGC)
    * Internal Audit (Snr Audit Analyst)
        * minimal prior exp. with coding, SQL
        * apply within testing analyst
    * Galleries, Creative... 

The `sqlite3` module ships with Python's standard library, so it is available by default:

In [51]:
import sqlite3

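The `sqlalchemy` library provides a single interface to many databases (SQLite, PostgreSQL, MySQL, and others); to switch database, you change the connection URL passed to `create_engine`: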

In [52]:
from sqlalchemy import create_engine

In [53]:
db = create_engine("sqlite:///newd3.db")
con = db.connect()
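A minimal sketch of the same idea with only the standard-library `sqlite3` module, assuming an in-memory database (`":memory:"`) instead of the `newd3.db` file and some hypothetical sample rows:

```python
import sqlite3

# ":memory:" creates a throwaway database that exists only while this connection is open
con = sqlite3.connect(":memory:")

# Create a small table and fill it with sample rows (hypothetical data)
con.execute("CREATE TABLE people (Age INTEGER, Name TEXT)")
con.executemany(
    "INSERT INTO people (Age, Name) VALUES (?, ?)",
    [(10, "Alice"), (20, "Eve"), (30, "Bob"), (40, "Dan")],
)

# Each result row comes back as a plain Python tuple
rows = con.execute("SELECT Age, Name FROM people ORDER BY Age").fetchall()
print(rows)  # [(10, 'Alice'), (20, 'Eve'), (30, 'Bob'), (40, 'Dan')]
```

With `sqlalchemy`, pointing `create_engine` at a different URL is what lets the same pandas code target another database; the `sqlite3` module on its own only talks to SQLite.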

---

## Using Pandas with SQL

Pandas is like Excel: it gives you a spreadsheet-like table (a DataFrame) to work with. Pandas always needs to "open" ("read") a dataset first.

Sources: CSV, JSON, ... or a database connection.

In [54]:
import pandas as pd

In [55]:
data = pd.DataFrame({
    "Age": [10, 20, 30, 40],
    "Name": ["Alice", "Eve", "Bob", "Dan"]
})

In [56]:
data

|   | Age | Name |
|---|-----|------|
| 0 | 10 | Alice |
| 1 | 20 | Eve |
| 2 | 30 | Bob |
| 3 | 40 | Dan |


In [57]:
data.to_sql('people',con, index=False)

4

In [58]:
food = pd.DataFrame({
    "Food": ["Honey", "Milk", "Bread"],
    "Name": ["Alice", "Eve", "Bob"]
})

In [59]:
food.to_sql('food',con, index=False)

3

In [60]:
pd.read_sql("SELECT * FROM people", con)

|   | Age | Name |
|---|-----|------|
| 0 | 10 | Alice |
| 1 | 20 | Eve |
| 2 | 30 | Bob |
| 3 | 40 | Dan |


In [61]:
pd.read_sql("SELECT * FROM food", con)

|   | Food | Name |
|---|------|------|
| 0 | Honey | Alice |
| 1 | Milk | Eve |
| 2 | Bread | Bob |


In [62]:
pd.read_sql("SELECT * FROM food, people", con)

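|   | Food | Name | Age | Name.1 |
|---|------|------|-----|--------|
| 0 | Honey | Alice | 10 | Alice |
| 1 | Honey | Alice | 20 | Eve |
| 2 | Honey | Alice | 30 | Bob |
| 3 | Honey | Alice | 40 | Dan |
| 4 | Milk | Eve | 10 | Alice |
| 5 | Milk | Eve | 20 | Eve |
| 6 | Milk | Eve | 30 | Bob |
| 7 | Milk | Eve | 40 | Dan |
| 8 | Bread | Bob | 10 | Alice |
| 9 | Bread | Bob | 20 | Eve |
| 10 | Bread | Bob | 30 | Bob |
| 11 | Bread | Bob | 40 | Dan |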
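A quick way to see what the comma in `FROM food, people` does: a sketch, on in-memory tables mirroring the workshop's `food` and `people`, showing that the cross join's row count is the product of the two tables' sizes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Recreate the two workshop tables in memory
con.executescript("""
    CREATE TABLE people (Age INTEGER, Name TEXT);
    CREATE TABLE food (Food TEXT, Name TEXT);
    INSERT INTO people VALUES (10, 'Alice'), (20, 'Eve'), (30, 'Bob'), (40, 'Dan');
    INSERT INTO food VALUES ('Honey', 'Alice'), ('Milk', 'Eve'), ('Bread', 'Bob');
""")

def count(sql):
    # Run a COUNT(*) query and return the single number it produces
    return con.execute(sql).fetchone()[0]

n_food = count("SELECT COUNT(*) FROM food")
n_people = count("SELECT COUNT(*) FROM people")
# Two tables in FROM with no condition: every food row is paired with every person
n_pairs = count("SELECT COUNT(*) FROM food, people")

print(n_food, n_people, n_pairs)  # 3 4 12
```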


In [63]:
pd.read_sql("SELECT * FROM food, people WHERE food.Name = people.Name", con)

|   | Food | Name | Age | Name.1 |
|---|------|------|-----|--------|
| 0 | Honey | Alice | 10 | Alice |
| 1 | Milk | Eve | 20 | Eve |
| 2 | Bread | Bob | 30 | Bob |


In [64]:
pd.read_sql("SELECT * FROM food JOIN people ON food.Name = people.Name", con)

|   | Food | Name | Age | Name.1 |
|---|------|------|-----|--------|
| 0 | Honey | Alice | 10 | Alice |
| 1 | Milk | Eve | 20 | Eve |
| 2 | Bread | Bob | 30 | Bob |
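A sketch, on in-memory copies of the same two tables, checking that the comma-plus-`WHERE` form and the explicit `JOIN ... ON` form return the same rows; the `ORDER BY` is added only so the comparison does not depend on row order:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# In-memory copies of the workshop tables
con.executescript("""
    CREATE TABLE people (Age INTEGER, Name TEXT);
    CREATE TABLE food (Food TEXT, Name TEXT);
    INSERT INTO people VALUES (10, 'Alice'), (20, 'Eve'), (30, 'Bob'), (40, 'Dan');
    INSERT INTO food VALUES ('Honey', 'Alice'), ('Milk', 'Eve'), ('Bread', 'Bob');
""")

# Cross join, then keep only the pairs whose names match
implicit = con.execute("""
    SELECT food.Food, people.Name, people.Age
    FROM food, people
    WHERE food.Name = people.Name
    ORDER BY people.Age
""").fetchall()

# Explicit join: the matching condition lives in the ON clause
explicit = con.execute("""
    SELECT food.Food, people.Name, people.Age
    FROM food JOIN people ON food.Name = people.Name
    ORDER BY people.Age
""").fetchall()

print(implicit == explicit)  # True
print(explicit)  # [('Honey', 'Alice', 10), ('Milk', 'Eve', 20), ('Bread', 'Bob', 30)]
```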


In [71]:
pd.read_sql("SELECT * FROM people LEFT JOIN food ON food.Name = people.Name", con)

|   | Age | Name | Food | Name.1 |
|---|-----|------|------|--------|
| 0 | 10 | Alice | Honey | Alice |
| 1 | 20 | Eve | Milk | Eve |
| 2 | 30 | Bob | Bread | Bob |
| 3 | 40 | Dan |  |  |
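Through the raw `sqlite3` module, the missing match for Dan (the empty cells above) comes back as Python `None`. A minimal sketch with in-memory copies of the tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE people (Age INTEGER, Name TEXT);
    CREATE TABLE food (Food TEXT, Name TEXT);
    INSERT INTO people VALUES (10, 'Alice'), (20, 'Eve'), (30, 'Bob'), (40, 'Dan');
    INSERT INTO food VALUES ('Honey', 'Alice'), ('Milk', 'Eve'), ('Bread', 'Bob');
""")

# LEFT JOIN keeps every row of people; unmatched food columns are filled with NULL
rows = con.execute("""
    SELECT people.Name, food.Food
    FROM people LEFT JOIN food ON food.Name = people.Name
    ORDER BY people.Age
""").fetchall()

print(rows)  # [('Alice', 'Honey'), ('Eve', 'Milk'), ('Bob', 'Bread'), ('Dan', None)]
```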


In [82]:
pd.read_sql("""

SELECT * 
FROM people
WHERE Name NOT IN (SELECT Name FROM food) 

""", con)

|   | Age | Name |
|---|-----|------|
| 0 | 40 | Dan |
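The same "people with no food" question can also be written as a `LEFT JOIN` that keeps only the unmatched rows. A sketch comparing the two on in-memory copies of the tables; the extra `('Jam', NULL)` row is hypothetical and shows a well-known `NOT IN` pitfall: if the subquery returns a NULL, `NOT IN` matches nothing.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE people (Age INTEGER, Name TEXT);
    CREATE TABLE food (Food TEXT, Name TEXT);
    INSERT INTO people VALUES (10, 'Alice'), (20, 'Eve'), (30, 'Bob'), (40, 'Dan');
    INSERT INTO food VALUES ('Honey', 'Alice'), ('Milk', 'Eve'), ('Bread', 'Bob');
""")

not_in_sql = "SELECT Name FROM people WHERE Name NOT IN (SELECT Name FROM food)"
anti_join_sql = """
    SELECT people.Name
    FROM people LEFT JOIN food ON food.Name = people.Name
    WHERE food.Name IS NULL
"""

before = (con.execute(not_in_sql).fetchall(), con.execute(anti_join_sql).fetchall())

# Hypothetical row with a missing Name: NOT IN now compares against a NULL
con.execute("INSERT INTO food VALUES ('Jam', NULL)")
after = (con.execute(not_in_sql).fetchall(), con.execute(anti_join_sql).fetchall())

print(before)  # ([('Dan',)], [('Dan',)])
print(after)   # ([], [('Dan',)])
```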


In [78]:
pd.read_sql("""

SELECT *
FROM (SELECT * FROM people)
WHERE 1 IN (SELECT rowid FROM people)  -- rowid is SQLite's built-in row number; people has no id column

""", con)

|   | Age | Name |
|---|-----|------|
| 0 | 10 | Alice |
| 1 | 20 | Eve |
| 2 | 30 | Bob |
| 3 | 40 | Dan |
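Two common subquery shapes, sketched against an in-memory `people` table: a scalar subquery in `WHERE` (comparing each row with the average age) and a derived table in `FROM`. SQLite does not require the `AS p` alias on the derived table, but several other databases do, so it is included here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE people (Age INTEGER, Name TEXT);
    INSERT INTO people VALUES (10, 'Alice'), (20, 'Eve'), (30, 'Bob'), (40, 'Dan');
""")

# Scalar subquery: the inner SELECT returns one value (the average, 25.0)
above_avg = con.execute("""
    SELECT Name, Age
    FROM people
    WHERE Age > (SELECT AVG(Age) FROM people)
    ORDER BY Age
""").fetchall()

# Derived table: the inner SELECT acts as a temporary table named p
middle = con.execute("""
    SELECT p.Name
    FROM (SELECT * FROM people WHERE Age >= 20) AS p
    WHERE p.Age < 40
    ORDER BY p.Age
""").fetchall()

print(above_avg)  # [('Bob', 30), ('Dan', 40)]
print(middle)     # [('Eve',), ('Bob',)]
```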


---

### Template

In [18]:
query = """
    SELECT *
    FROM people
"""

pd.read_sql(query, con)

|   | Age | Name |
|---|-----|------|
| 0 | 10 | Alice |
| 1 | 20 | Eve |
| 2 | 30 | Bob |
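Once the template works, changing values is easier and safer with query parameters than by editing the SQL string. A minimal sketch with the standard-library `sqlite3` module, where `?` placeholders are filled from a tuple; the helper `run` is hypothetical. (pandas' `read_sql` also takes a `params` argument, but the placeholder style depends on the database driver.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE people (Age INTEGER, Name TEXT);
    INSERT INTO people VALUES (10, 'Alice'), (20, 'Eve'), (30, 'Bob');
""")

def run(con, query, params=()):
    # The driver binds each value to a ? placeholder, so values are never
    # pasted into the SQL text by hand
    return con.execute(query, params).fetchall()

query = """
    SELECT Age, Name
    FROM people
    WHERE Age >= ?
    ORDER BY Age
"""

print(run(con, query, (20,)))  # [(20, 'Eve'), (30, 'Bob')]
print(run(con, query, (30,)))  # [(30, 'Bob')]
```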


---

In [22]:
query = """
    SELECT Age
    FROM people
"""

pd.read_sql(query, con)

|   | Age |
|---|-----|
| 0 | 10 |
| 1 | 20 |
| 2 | 30 |


In [23]:
query = """
    SELECT AVG(Age)
    FROM people
"""

pd.read_sql(query, con)

|   | AVG(Age) |
|---|----------|
| 0 | 20.0 |


In [20]:
query = """
    SELECT Age, Name
    FROM people
"""

pd.read_sql(query, con)

|   | Age | Name |
|---|-----|------|
| 0 | 10 | Alice |
| 1 | 20 | Eve |
| 2 | 30 | Bob |


In [21]:
query = """
    SELECT Age, Name
    FROM people
    ORDER BY Name
"""

pd.read_sql(query, con)

|   | Age | Name |
|---|-----|------|
| 0 | 10 | Alice |
| 1 | 30 | Bob |
| 2 | 20 | Eve |
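`ORDER BY` sorts ascending unless told otherwise; `DESC` reverses it. A sketch on an in-memory copy of the three-row `people` table used in this section:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE people (Age INTEGER, Name TEXT);
    INSERT INTO people VALUES (10, 'Alice'), (20, 'Eve'), (30, 'Bob');
""")

default_order = con.execute("SELECT Age, Name FROM people ORDER BY Name").fetchall()
explicit_asc = con.execute("SELECT Age, Name FROM people ORDER BY Name ASC").fetchall()
descending = con.execute("SELECT Age, Name FROM people ORDER BY Name DESC").fetchall()

print(default_order == explicit_asc)  # True
print(descending)  # [(20, 'Eve'), (30, 'Bob'), (10, 'Alice')]
```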


In [26]:
query = """
    SELECT 
        AVG(Age) AS AvgAge
    FROM people
"""

pd.read_sql(query, con)

|   | AvgAge |
|---|--------|
| 0 | 20.0 |
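The alias is what the result column is called, which matters when code reads results by column name. A sketch using `cursor.description` from the standard-library `sqlite3` module; SQLite's documentation leaves the name of an un-aliased expression column unspecified, so only the aliased name is relied on here:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE people (Age INTEGER, Name TEXT);
    INSERT INTO people VALUES (10, 'Alice'), (20, 'Eve'), (30, 'Bob');
""")

cur = con.execute("SELECT AVG(Age) AS AvgAge FROM people")
# description holds one tuple per result column; item 0 is the column name
column_names = [col[0] for col in cur.description]
avg_age = cur.fetchone()[0]

print(column_names, avg_age)  # ['AvgAge'] 20.0
```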


---

## Activity: Workbook Notebook 1 (25 min, until 10:45am)

---

In [1]:
# Import create_engine from sqlalchemy to connect to the database
from sqlalchemy import create_engine

# Import pandas
import pandas as pd

# Create an engine to the database sqlite-sakila.db
engine = create_engine("sqlite:///sqlite-sakila.db")

# Establish a connection to the database
conn = engine.connect()

* Find a long film that is not adult-rated and has the lowest replacement cost.
    * long = at least 2 hours (`length >= 120`, in minutes)
    * not adult = rated neither R nor NC-17

In [20]:
query = """
    SELECT title
    FROM film
    WHERE 
        (length >= 120)
    AND (NOT rating IN ('R', 'NC-17'))
    ORDER BY replacement_cost ASC
    LIMIT 1
"""

pd.read_sql_query(query, conn)

|   | title |
|---|-------|
| 0 | CONTROL ANTHEM |


In [21]:
query = """
    SELECT title
    FROM film
    WHERE 
        (length >= 120)
    AND (rating != 'R')
    AND (rating != 'NC-17')
    ORDER BY replacement_cost ASC
    LIMIT 1
"""

pd.read_sql_query(query, conn)

|   | title |
|---|-------|
| 0 | CONTROL ANTHEM |
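A check, on a tiny hypothetical stand-in for the Sakila `film` table (titles and values made up), that the `NOT IN` filter and the `!= ... AND != ...` filter above pick the same film:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical rows, not real Sakila data
con.execute("CREATE TABLE film (title TEXT, length INTEGER, rating TEXT, replacement_cost REAL)")
con.executemany("INSERT INTO film VALUES (?, ?, ?, ?)", [
    ("FILM A", 130, "PG", 19.99),
    ("FILM B", 125, "R", 9.99),       # cheapest long film, but adult
    ("FILM C", 150, "NC-17", 10.99),  # adult
    ("FILM D", 90, "G", 8.99),        # cheapest overall, but too short
    ("FILM E", 140, "G", 14.99),      # expected answer
])

filters = {
    "NOT IN": "rating NOT IN ('R', 'NC-17')",
    "AND": "rating != 'R' AND rating != 'NC-17'",
}
results = {}
for name, rating_filter in filters.items():
    # The filters are fixed strings written here (not user input), so formatting them in is safe
    sql = f"""
        SELECT title FROM film
        WHERE length >= 120 AND ({rating_filter})
        ORDER BY replacement_cost ASC
        LIMIT 1
    """
    results[name] = con.execute(sql).fetchall()

print(results)  # {'NOT IN': [('FILM E',)], 'AND': [('FILM E',)]}
```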


In [18]:
pd.read_sql_query("SELECT * FROM film LIMIT 2", conn)

|   | film_id | title | description | release_year | language_id | original_language_id | rental_duration | rental_rate | length | replacement_cost | rating | special_features | last_update |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | ACADEMY DINOSAUR | A Epic Drama of a Feminist And a Mad Scientist... | 2006 | 1 |  | 6 | 0.99 | 86 | 20.99 | PG | Deleted Scenes,Behind the Scenes | 2021-03-06 15:52:00 |
| 1 | 2 | ACE GOLDFINGER | A Astounding Epistle of a Database Administrat... | 2006 | 1 |  | 3 | 4.99 | 48 | 12.99 | G | Trailers,Deleted Scenes | 2021-03-06 15:52:00 |


### Applying Filtering Conditions to Rows

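```

Filter: length >= 120 AND (rating != 'R' OR rating != 'NC-17')

  length  rating   evaluation
A 120     PG       T AND (T OR T) = T
B 120     G        T AND (T OR T) = T
C 120     NC-17    T AND (T OR F) = T   <-- OOPS! NC-17 passes
D 120     R        T AND (F OR T) = T   <-- OOPS! R passes
E 120     G        T AND (T OR T) = T
F 120     R        T AND (F OR T) = T   <-- OOPS! R passes

Fix: rating != 'R' AND rating != 'NC-17'   (same as rating NOT IN ('R', 'NC-17'))

```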
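The truth table can be reproduced in plain Python to see why the `OR` version never excludes anything: a rating cannot equal both `'R'` and `'NC-17'`, so at least one `!=` is always true. This sketch ignores SQL's NULL handling, where a comparison with NULL is neither true nor false.

```python
def or_filter(rating):
    # Buggy version: true for every rating
    return rating != "R" or rating != "NC-17"

def and_filter(rating):
    # Intended version: excludes both R and NC-17
    return rating != "R" and rating != "NC-17"

for rating in ["PG", "G", "NC-17", "R"]:
    print(rating, or_filter(rating), and_filter(rating))
# PG True True
# G True True
# NC-17 True False
# R True False
```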

In [16]:
query = """
    SELECT title
    FROM film
    WHERE 
        (length >= 120)
    AND (rating = 'R' OR rating != 'NC-17')  -- BUG: same as rating != 'NC-17', so R-rated films still pass
    ORDER BY replacement_cost ASC
    LIMIT 1
"""

pd.read_sql_query(query, conn)

|   | title |
|---|-------|
| 0 | CONTROL ANTHEM |


In [2]:
# Writing an SQL query 
query = """SELECT title
FROM film
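WHERE length > 120  -- note: > excludes films of exactly 120 minutes (2 hours)
    AND (rating != 'R' OR rating != 'PG-13')  -- BUG: true for every non-NULL rating; intended: rating NOT IN ('R', 'NC-17')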
ORDER BY replacement_cost ASC
LIMIT 1"""

# Querying the database
pd.read_sql_query(query, conn)

|   | title |
|---|-------|
| 0 | CONTROL ANTHEM |


In [26]:
pd.read_sql("SELECT * FROM actor, film LIMIT 10", conn)

|   | actor_id | first_name | last_name | last_update | film_id | title | description | release_year | language_id | original_language_id | rental_duration | rental_rate | length | replacement_cost | rating | special_features | last_update.1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | PENELOPE | GUINESS | 2021-03-06 15:51:59 | 1 | ACADEMY DINOSAUR | A Epic Drama of a Feminist And a Mad Scientist... | 2006 | 1 |  | 6 | 0.99 | 86 | 20.99 | PG | Deleted Scenes,Behind the Scenes | 2021-03-06 15:52:00 |
| 1 | 1 | PENELOPE | GUINESS | 2021-03-06 15:51:59 | 2 | ACE GOLDFINGER | A Astounding Epistle of a Database Administrat... | 2006 | 1 |  | 3 | 4.99 | 48 | 12.99 | G | Trailers,Deleted Scenes | 2021-03-06 15:52:00 |
| 2 | 1 | PENELOPE | GUINESS | 2021-03-06 15:51:59 | 3 | ADAPTATION HOLES | A Astounding Reflection of a Lumberjack And a ... | 2006 | 1 |  | 7 | 2.99 | 50 | 18.99 | NC-17 | Trailers,Deleted Scenes | 2021-03-06 15:52:00 |
| 3 | 1 | PENELOPE | GUINESS | 2021-03-06 15:51:59 | 4 | AFFAIR PREJUDICE | A Fanciful Documentary of a Frisbee And a Lumb... | 2006 | 1 |  | 5 | 2.99 | 117 | 26.99 | G | Commentaries,Behind the Scenes | 2021-03-06 15:52:00 |
| 4 | 1 | PENELOPE | GUINESS | 2021-03-06 15:51:59 | 5 | AFRICAN EGG | A Fast-Paced Documentary of a Pastry Chef And ... | 2006 | 1 |  | 6 | 2.99 | 130 | 22.99 | G | Deleted Scenes | 2021-03-06 15:52:00 |
| 5 | 1 | PENELOPE | GUINESS | 2021-03-06 15:51:59 | 6 | AGENT TRUMAN | A Intrepid Panorama of a Robot And a Boy who m... | 2006 | 1 |  | 3 | 2.99 | 169 | 17.99 | PG | Deleted Scenes | 2021-03-06 15:52:00 |
| 6 | 1 | PENELOPE | GUINESS | 2021-03-06 15:51:59 | 7 | AIRPLANE SIERRA | A Touching Saga of a Hunter And a Butler who m... | 2006 | 1 |  | 6 | 4.99 | 62 | 28.99 | PG-13 | Trailers,Deleted Scenes | 2021-03-06 15:52:00 |
| 7 | 1 | PENELOPE | GUINESS | 2021-03-06 15:51:59 | 8 | AIRPORT POLLOCK | A Epic Tale of a Moose And a Girl who must Con... | 2006 | 1 |  | 6 | 4.99 | 54 | 15.99 | R | Trailers | 2021-03-06 15:52:00 |
| 8 | 1 | PENELOPE | GUINESS | 2021-03-06 15:51:59 | 9 | ALABAMA DEVIL | A Thoughtful Panorama of a Database Administra... | 2006 | 1 |  | 3 | 2.99 | 114 | 21.99 | PG-13 | Trailers,Deleted Scenes | 2021-03-06 15:52:00 |
| 9 | 1 | PENELOPE | GUINESS | 2021-03-06 15:51:59 | 10 | ALADDIN CALENDAR | A Action-Packed Tale of a Man And a Lumberjack... | 2006 | 1 |  | 6 | 4.99 | 63 | 24.99 | NC-17 | Trailers,Deleted Scenes | 2021-03-06 15:52:00 |


---

In [75]:
query = """
select customer_id, address
from customer
join store
join address
ON
    store.store_id = customer.customer_id
    address.address_id = customer.address_id
WHERE
store.store_id = 1

"""
pd.read_sql(query, conn)

OperationalError: (sqlite3.OperationalError) near "address": syntax error
[SQL: 
select customer_id, address
from customer
join store
join address
ON
    store.store_id = customer.customer_id
    address.address_id = customer.address_id
WHERE
store.store_id = 1

]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
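The error comes from the `ON` clause: two conditions sit side by side with no `AND` between them, and the store condition compares `store_id` with `customer_id`. A sketch of a corrected version, giving each `JOIN` its own `ON`, run against tiny in-memory tables whose column names are assumed to mirror Sakila's `customer`, `store` and `address` tables (the rows are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Tiny stand-ins for Sakila's address, store and customer tables (made-up rows)
con.executescript("""
    CREATE TABLE address (address_id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, address_id INTEGER);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, store_id INTEGER, address_id INTEGER);
    INSERT INTO address VALUES (1, '1 Store Street'), (2, '2 Store Street'),
                               (10, '10 Elm Road'), (11, '11 Oak Lane'), (12, '12 Pine Way');
    INSERT INTO store VALUES (1, 1), (2, 2);
    INSERT INTO customer VALUES (100, 1, 10), (101, 2, 11), (102, 1, 12);
""")

# Each JOIN gets its own ON clause; a customer's store is customer.store_id
rows = con.execute("""
    SELECT customer.customer_id, address.address
    FROM customer
    JOIN store   ON store.store_id = customer.store_id
    JOIN address ON address.address_id = customer.address_id
    WHERE store.store_id = 1
    ORDER BY customer.customer_id
""").fetchall()

print(rows)  # [(100, '10 Elm Road'), (102, '12 Pine Way')]
```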

---

## Errata

* bug: logic/filter
* ERD: tidy up links
* question clarity

---

## Q&A

* when JOIN?
    * almost always, when combining tables
    * RDBMSs are heavily optimized for JOINs
* when subquery?
    * rarely
    * they can be harder for the DB to optimize

* really useful (copy the result of a query into another table):
```sql
INSERT INTO target_table SELECT ... FROM other_table
```
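A minimal sketch of `INSERT ... SELECT` with the standard-library `sqlite3` module; the `adults` table and the age threshold are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE people (Age INTEGER, Name TEXT);
    INSERT INTO people VALUES (10, 'Alice'), (20, 'Eve'), (30, 'Bob'), (40, 'Dan');
    CREATE TABLE adults (Name TEXT, Age INTEGER);
""")

# No VALUES keyword: the SELECT itself supplies the rows to insert
con.execute("""
    INSERT INTO adults (Name, Age)
    SELECT Name, Age FROM people WHERE Age >= 18
""")
con.commit()

rows = con.execute("SELECT Name, Age FROM adults ORDER BY Age").fetchall()
print(rows)  # [('Eve', 20), ('Bob', 30), ('Dan', 40)]
```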