# Introduction to SQL, inside a Python-based Jupyter Notebook

This Notebook focuses on SQL, while using the Python-based IPython kernel. For this, we take advantage of the [extensible IPython magic command system](https://ipython.readthedocs.io/en/stable/interactive/magics.html), as described below.

Our first step is to load the [ipython-sql extension](https://github.com/catherinedevlin/ipython-sql), which exposes the `%%sql` cell magic.

## One-time setup - temporary fix

_Note:_ I accidentally missed updating the hub to include the SQL IPython support. We've [requested a fix on the hub](https://github.com/berkeley-dsep-infra/datahub/issues/3897), but in the meantime, the cell below will do an install if necessary.

Please use similar code in your other uses of SQL on the hub until this is fixed (which should happen soon). Once it is, the cell below can be replaced with simply

```python
%load_ext sql
```

In [1]:
import sql
sql??

Type:        module
String form: <module 'sql' from '/home/jovyan/.local/lib/python3.9/site-packages/sql/__init__.py'>
File:        ~/.local/lib/python3.9/site-packages/sql/__init__.py
Source:      from .magic import *


In [2]:
try:
    %load_ext sql
except ModuleNotFoundError:
    %pip install ipython-sql
    import sys
    print('#'*80 + 
          '\n\nPlease restart your kernel before continuing.\n\n' + 
          '#'*80 ,
          file=sys.stderr)

In [3]:
# %load_ext sql

# We'll also connect directly to our DB with Pandas later
import sqlalchemy
import pandas as pd

## Connection and basic queries

Next, we connect to the database. In this lecture example, the database is stored in a single file on our own computer called `lec18_basic_examples.db`.

Note that starting a cell with `%%sql` tells Jupyter that you are running SQL code, not Python code.

In actual practice, you'd usually connect to some database server via a network connection, e.g. hosted on some computer on the internet.
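
For comparison, a SQLite database can also be opened from plain Python with the standard-library `sqlite3` module. The sketch below is only an illustration and not part of the lecture setup: it assumes an in-memory database (`:memory:`) in place of the lecture file so that it runs anywhere, and it recreates the `Dragon` table from the schema and rows displayed in this notebook.

```python
import sqlite3

# Assumption: an in-memory database stands in for data/lec18_basic_examples.db.
conn = sqlite3.connect(":memory:")

# Schema copied from the sqlite_master listing shown in this notebook.
conn.execute("""
    CREATE TABLE Dragon (
        name TEXT PRIMARY KEY,
        year INTEGER CHECK (year >= 2000),
        cute INTEGER
    )
""")

# Rows copied from the displayed contents of the Dragon table.
dragons = [("hiccup", 2010, 10), ("drogon", 2011, -100), ("dragon 2", 2019, 0)]
conn.executemany("INSERT INTO Dragon VALUES (?, ?, ?)", dragons)
conn.commit()

# Plain-Python counterpart of a `SELECT * FROM Dragon;` cell.
rows = conn.execute("SELECT * FROM Dragon;").fetchall()
for row in rows:
    print(row)
```

Each row comes back as a plain Python tuple, which is the main practical difference from the table rendering that the `%%sql` magic provides.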

In [4]:
%%sql
sqlite:///data/lec18_basic_examples.db

Now that we're connected, we can, for example, display the contents of the Dragon table.

In [5]:
%%sql
SELECT * FROM Dragon;

 * sqlite:///data/lec18_basic_examples.db
Done.


name,year,cute
hiccup,2010,10
drogon,2011,-100
dragon 2,2019,0


In [6]:
%%sql
SELECT * FROM sqlite_master WHERE type='table'

 * sqlite:///data/lec18_basic_examples.db
Done.


type,name,tbl_name,rootpage,sql
table,sqlite_sequence,sqlite_sequence,7,"CREATE TABLE sqlite_sequence(name,seq)"
table,Dragon,Dragon,2,"CREATE TABLE Dragon (  name TEXT PRIMARY KEY,  year INTEGER CHECK (year >= 2000),  cute INTEGER )"
table,Dish,Dish,4,"CREATE TABLE Dish (  name TEXT PRIMARY KEY,  type TEXT,  cost INTEGER CHECK (cost >= 0) )"
table,Scene,Scene,6,"CREATE TABLE Scene (  id INTEGER PRIMARY KEY AUTOINCREMENT,  biome TEXT NOT NULL,  city TEXT NOT NULL,  visitors INTEGER CHECK (visitors >= 0),  created_at DATETIME DEFAULT (DATETIME('now')) )"


In [7]:
%%sql
SELECT * FROM Dragon;

 * sqlite:///data/lec18_basic_examples.db
Done.


name,year,cute
hiccup,2010,10
drogon,2011,-100
dragon 2,2019,0


In [8]:
%%sql
SELECT cute, year FROM Dragon;

 * sqlite:///data/lec18_basic_examples.db
Done.


cute,year
10,2010
-100,2011
0,2019


In [9]:
%%sql
SELECT cute AS cuteness, year AS birth FROM Dragon;

 * sqlite:///data/lec18_basic_examples.db
Done.


cuteness,birth
10,2010
-100,2011
0,2019


In [10]:
%%sql
SELECT cute AS cuteness,
       year AS birth 
FROM Dragon;

 * sqlite:///data/lec18_basic_examples.db
Done.


cuteness,birth
10,2010
-100,2011
0,2019


In [11]:
%%sql
SELECT name, year
FROM Dragon
WHERE cute > 0;

 * sqlite:///data/lec18_basic_examples.db
Done.


name,year
hiccup,2010


In [12]:
%%sql
SELECT name, cute, year
FROM Dragon
WHERE cute > 0 OR year > 2015;

 * sqlite:///data/lec18_basic_examples.db
Done.


name,cute,year
hiccup,10,2010
dragon 2,0,2019


In [13]:
%%sql
SELECT *
FROM Dragon
ORDER BY cute DESC;

 * sqlite:///data/lec18_basic_examples.db
Done.


name,year,cute
hiccup,2010,10
dragon 2,2019,0
drogon,2011,-100


In [14]:
%%sql
SELECT *
FROM Dragon
LIMIT 2;

 * sqlite:///data/lec18_basic_examples.db
Done.


name,year,cute
hiccup,2010,10
drogon,2011,-100


In [15]:
%%sql
SELECT *
FROM Dragon
LIMIT 2
OFFSET 1;

 * sqlite:///data/lec18_basic_examples.db
Done.


name,year,cute
drogon,2011,-100
dragon 2,2019,0
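
One caveat about `LIMIT` and `OFFSET`: SQL does not guarantee any row order unless the query has an `ORDER BY`, so paging through a table is only reproducible when the rows are sorted first. Here is a minimal sketch using the standard-library `sqlite3` module, assuming an in-memory copy of the `Dragon` rows shown above; `fetch_page` is a hypothetical helper introduced only for this illustration.

```python
import sqlite3

# Assumption: in-memory copy of the Dragon rows displayed in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dragon (name TEXT PRIMARY KEY, year INTEGER, cute INTEGER)")
conn.executemany(
    "INSERT INTO Dragon VALUES (?, ?, ?)",
    [("hiccup", 2010, 10), ("drogon", 2011, -100), ("dragon 2", 2019, 0)],
)

def fetch_page(page, page_size=2):
    """Hypothetical helper: return one page of dragons, sorted by year."""
    query = "SELECT name, year FROM Dragon ORDER BY year LIMIT ? OFFSET ?"
    return conn.execute(query, (page_size, page * page_size)).fetchall()

first_page = fetch_page(0)   # rows 1-2 in year order
second_page = fetch_page(1)  # the remaining row
print(first_page)
print(second_page)
```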


## Python-SQL combinations

Using the syntax

```python
%%sql PYTHON_VARIABLE <<
BODY OF QUERY
```

lets us capture the result of the SQL statement into a Python variable. [The docs provide more details](https://github.com/catherinedevlin/ipython-sql#assignment).

For example, let's run the same query as above, but store the output in a variable named `drag2`:

In [16]:
%%sql drag2 << 
SELECT *
FROM Dragon
LIMIT 2
OFFSET 1;

 * sqlite:///data/lec18_basic_examples.db
Done.
Returning data to local variable drag2


We can now inspect this variable and its type. While it looks a lot like a data frame, it's not one:

In [17]:
drag2

name,year,cute
drogon,2011,-100
dragon 2,2019,0


In [18]:
type(drag2)

sql.run.ResultSet

### Pandas-SQL interplay

First, the above SQL results can be easily converted into true Pandas DataFrames, by calling their `DataFrame` method:

In [19]:
drag2.DataFrame()

,name,year,cute
0,drogon,2011,-100
1,dragon 2,2019,0


But we can also use Pandas to connect to the same database, in pure Python. Sometimes this may be more convenient than using the SQL magic syntax - in the homework you'll practice some of this as well. You should be familiar with both methods, and use the one that best fits your needs.

In [20]:
engine = sqlalchemy.create_engine("sqlite:///data/lec18_basic_examples.db")
connection = engine.connect()

In [22]:
query = """
SELECT *
FROM Dragon
LIMIT 2
OFFSET 1;
"""

pd.read_sql(query, connection)

,name,year,cute
0,drogon,2011,-100
1,dragon 2,2019,0


As you can see, the direct pandas query and the conversion of `drag2` to a DataFrame produce identical results (as they should!).

Finally, now that we have the `query` variable with the contents of our query, we can use it directly in the SQL magic too! It supports `{var}` syntax (as well as `$var`) and will expand the Python variable `var` into its value before making the query:

In [26]:
%%sql
{query}

 * sqlite:///data/lec18_basic_examples.db
Done.


name,year,cute
drogon,2011,-100
dragon 2,2019,0


In [27]:
%%sql
$query

 * sqlite:///data/lec18_basic_examples.db
Done.


name,year,cute
drogon,2011,-100
dragon 2,2019,0


In [33]:
query = """
SELECT *
FROM Dragon
LIMIT {limit}
OFFSET 1;
"""
print(query.format(limit=3))


SELECT *
FROM Dragon
LIMIT 3
OFFSET 1;



In [32]:
%%sql
{query.format(limit=1)}

 * sqlite:///data/lec18_basic_examples.db
Done.


name,year,cute
drogon,2011,-100
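
The `{var}` and `.format` approaches above paste a value into the SQL text before the database sees it. Outside the magic, database drivers also offer parameter binding, where the SQL text stays fixed and the values travel separately; that is generally the safer choice when a value comes from user input. The sketch below contrasts the two using the standard-library `sqlite3` module. It assumes an in-memory copy of the `Dragon` rows, and it adds an `ORDER BY year` (not in the original query) so that the result is deterministic.

```python
import sqlite3

# Assumption: in-memory copy of the Dragon rows displayed in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dragon (name TEXT PRIMARY KEY, year INTEGER, cute INTEGER)")
conn.executemany(
    "INSERT INTO Dragon VALUES (?, ?, ?)",
    [("hiccup", 2010, 10), ("drogon", 2011, -100), ("dragon 2", 2019, 0)],
)

# Text substitution: the value becomes part of the SQL string itself.
template = "SELECT * FROM Dragon ORDER BY year LIMIT {limit} OFFSET 1;"
formatted_rows = conn.execute(template.format(limit=1)).fetchall()

# Parameter binding: `?` is a placeholder and the value is passed separately.
bound_rows = conn.execute(
    "SELECT * FROM Dragon ORDER BY year LIMIT ? OFFSET 1;", (1,)
).fetchall()

print(formatted_rows)
print(bound_rows)
```

Both calls return the same single row here; they differ only in how the value `1` reaches the database.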


## GROUP BY operations

In [24]:
%%sql
SELECT *
FROM Dish;

 * sqlite:///data/lec18_basic_examples.db
Done.


name,type,cost
ravioli,entree,10
pork bun,entree,7
taco,entree,7
edamame,appetizer,4
fries,appetizer,4
potsticker,appetizer,4
ice cream,dessert,5


In [25]:
%%sql
SELECT type
FROM Dish;

 * sqlite:///data/lec18_basic_examples.db
Done.


type
entree
entree
entree
appetizer
appetizer
appetizer
dessert


In [26]:
%%sql
SELECT type
FROM Dish
GROUP BY type;

 * sqlite:///data/lec18_basic_examples.db
Done.


type
appetizer
dessert
entree


In [27]:
%%sql
SELECT type, MAX(cost)
FROM Dish
GROUP BY type;

 * sqlite:///data/lec18_basic_examples.db
Done.


type,MAX(cost)
appetizer,4
dessert,5
entree,10


In [28]:
%%sql
SELECT type, 
       SUM(cost), 
       MIN(cost),
       MAX(name)
FROM Dish
GROUP BY type;

 * sqlite:///data/lec18_basic_examples.db
Done.


type,SUM(cost),MIN(cost),MAX(name)
appetizer,12,4,potsticker
dessert,5,5,ice cream
entree,24,7,taco
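
It can help to think of `GROUP BY` as two steps: first split the rows into groups that share the same `type`, then collapse each group into a single row of aggregates. The sketch below spells out those two steps in plain Python and cross-checks the result against SQLite. It assumes an in-memory copy of the `Dish` rows shown above and is meant as a mental model, not as a description of how SQLite is implemented.

```python
import sqlite3
from collections import defaultdict

# Rows copied from the displayed contents of the Dish table.
dishes = [
    ("ravioli", "entree", 10),
    ("pork bun", "entree", 7),
    ("taco", "entree", 7),
    ("edamame", "appetizer", 4),
    ("fries", "appetizer", 4),
    ("potsticker", "appetizer", 4),
    ("ice cream", "dessert", 5),
]

# Step 1: split the rows into groups keyed by `type`.
groups = defaultdict(list)
for name, dish_type, cost in dishes:
    groups[dish_type].append((name, cost))

# Step 2: collapse each group into one row of aggregates.
manual = [
    (
        dish_type,
        sum(cost for _, cost in members),   # SUM(cost)
        min(cost for _, cost in members),   # MIN(cost)
        max(name for name, _ in members),   # MAX(name), alphabetical order
    )
    for dish_type, members in sorted(groups.items())
]

# Cross-check against SQLite (assumption: in-memory copy of the table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dish (name TEXT PRIMARY KEY, type TEXT, cost INTEGER)")
conn.executemany("INSERT INTO Dish VALUES (?, ?, ?)", dishes)
from_sql = conn.execute(
    "SELECT type, SUM(cost), MIN(cost), MAX(name) "
    "FROM Dish GROUP BY type ORDER BY type"
).fetchall()

print(manual)
print(manual == from_sql)
```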


## Trickier GROUP BY

In [29]:
%%sql
SELECT type, COUNT(cost)
FROM Dish
GROUP BY type;

 * sqlite:///data/lec18_basic_examples.db
Done.


type,COUNT(cost)
appetizer,3
dessert,1
entree,3


In [31]:
%%sql
SELECT type, COUNT(*)
FROM Dish
GROUP BY type;

 * sqlite:///data/lec18_basic_examples.db
Done.


type,COUNT(*)
appetizer,3
dessert,1
entree,3
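
On the `Dish` table, `COUNT(cost)` and `COUNT(*)` agree because no `cost` is missing. They differ once a column contains `NULL`: `COUNT(column)` counts only the non-NULL values in that column, while `COUNT(*)` counts rows. A minimal sketch with the standard-library `sqlite3` module, assuming an in-memory copy of `Dish` plus one hypothetical dish with an unknown cost (`mystery special`, which is not in the lecture database):

```python
import sqlite3

# Assumption: in-memory copy of the Dish rows displayed in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dish (name TEXT PRIMARY KEY, type TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO Dish VALUES (?, ?, ?)",
    [
        ("ravioli", "entree", 10),
        ("pork bun", "entree", 7),
        ("taco", "entree", 7),
        ("edamame", "appetizer", 4),
        ("fries", "appetizer", 4),
        ("potsticker", "appetizer", 4),
        ("ice cream", "dessert", 5),
        # Hypothetical row, added only to illustrate NULL handling.
        ("mystery special", "entree", None),
    ],
)

counts = conn.execute(
    "SELECT type, COUNT(*), COUNT(cost) FROM Dish GROUP BY type ORDER BY type"
).fetchall()
for row in counts:
    print(row)
```

Only the `entree` group differs: it has four rows, but only three of them have a non-NULL `cost`.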


In [32]:
%%sql
SELECT type, cost
FROM Dish
GROUP BY type, cost;

 * sqlite:///data/lec18_basic_examples.db
Done.


type,cost
appetizer,4
dessert,5
entree,7
entree,10


Remember you can rename columns if you want, so instead of seeing a column named `COUNT(*)`, we get something more descriptive:

In [34]:
%%sql
SELECT type, cost, COUNT(*) as size
FROM Dish
GROUP BY type, cost;

 * sqlite:///data/lec18_basic_examples.db
Done.


type,cost,size
appetizer,4,3
dessert,5,1
entree,7,2
entree,10,1


In [35]:
%%sql
SELECT type, COUNT(*)
FROM Dish
GROUP BY type;

 * sqlite:///data/lec18_basic_examples.db
Done.


type,COUNT(*)
appetizer,3
dessert,1
entree,3


In [36]:
%%sql
SELECT type, COUNT(*)
FROM Dish
GROUP BY type
HAVING MAX(cost) < 8

 * sqlite:///data/lec18_basic_examples.db
Done.


type,COUNT(*)
appetizer,3
dessert,1


In [37]:
%%sql
SELECT type, COUNT(*)
FROM Dish
WHERE cost < 8
GROUP BY type;

 * sqlite:///data/lec18_basic_examples.db
Done.


type,COUNT(*)
appetizer,3
dessert,1
entree,2
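
The last two queries differ only in where the condition is applied. `WHERE` filters individual rows before they are grouped, so the $10 ravioli is dropped and the `entree` group survives with two rows. `HAVING` filters whole groups after aggregation, so the entire `entree` group is dropped because its maximum cost is 10. The sketch below runs both side by side with the standard-library `sqlite3` module, assuming an in-memory copy of the `Dish` rows; the `ORDER BY type` is added only to make the output order deterministic.

```python
import sqlite3

# Assumption: in-memory copy of the Dish rows displayed in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dish (name TEXT PRIMARY KEY, type TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO Dish VALUES (?, ?, ?)",
    [
        ("ravioli", "entree", 10),
        ("pork bun", "entree", 7),
        ("taco", "entree", 7),
        ("edamame", "appetizer", 4),
        ("fries", "appetizer", 4),
        ("potsticker", "appetizer", 4),
        ("ice cream", "dessert", 5),
    ],
)

# WHERE: discard rows with cost >= 8, then group what is left.
where_counts = conn.execute("""
    SELECT type, COUNT(*)
    FROM Dish
    WHERE cost < 8
    GROUP BY type
    ORDER BY type
""").fetchall()

# HAVING: group all rows, then discard groups whose MAX(cost) >= 8.
having_counts = conn.execute("""
    SELECT type, COUNT(*)
    FROM Dish
    GROUP BY type
    HAVING MAX(cost) < 8
    ORDER BY type
""").fetchall()

print("WHERE: ", where_counts)
print("HAVING:", having_counts)
```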


## DISTINCT

In [38]:
%%sql
SELECT DISTINCT cost
FROM Dish;

 * sqlite:///data/lec18_basic_examples.db
Done.


cost
10
7
4
5


In [39]:
%%sql
SELECT DISTINCT type 
FROM Dish
WHERE cost < 9;

 * sqlite:///data/lec18_basic_examples.db
Done.


type
entree
appetizer
dessert


In [40]:
%%sql
SELECT DISTINCT type, cost 
FROM Dish
WHERE cost < 9;

 * sqlite:///data/lec18_basic_examples.db
Done.


type,cost
entree,7
appetizer,4
dessert,5


In [41]:
%%sql
SELECT DISTINCT type, cost 
FROM Dish;

 * sqlite:///data/lec18_basic_examples.db
Done.


type,cost
entree,10
entree,7
appetizer,4
dessert,5


In [42]:
%%sql
SELECT type, AVG(cost)
FROM Dish
GROUP BY type;

 * sqlite:///data/lec18_basic_examples.db
Done.


type,AVG(cost)
appetizer,4.0
dessert,5.0
entree,8.0


In [43]:
%%sql
SELECT type, AVG(DISTINCT cost)
FROM Dish
GROUP BY type;

 * sqlite:///data/lec18_basic_examples.db
Done.


type,AVG(DISTINCT cost)
appetizer,4.0
dessert,5.0
entree,8.5
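
The only group whose average changes is `entree`, because it is the only one with a repeated cost (10, 7, 7). `AVG(cost)` uses all three values, giving 24 / 3 = 8.0, while `AVG(DISTINCT cost)` first removes the duplicate 7, giving 17 / 2 = 8.5. The sketch below reproduces that arithmetic in plain Python and cross-checks it against SQLite, assuming an in-memory table that holds just the three entree rows shown earlier.

```python
import sqlite3

# Costs of the three entrees displayed in this notebook.
entree_costs = [10, 7, 7]

# AVG(cost): every row contributes, duplicates included.
plain_avg = sum(entree_costs) / len(entree_costs)          # 24 / 3

# AVG(DISTINCT cost): duplicates are removed before averaging.
distinct_costs = set(entree_costs)
distinct_avg = sum(distinct_costs) / len(distinct_costs)   # 17 / 2

# Cross-check against SQLite (assumption: in-memory table with the entrees only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dish (name TEXT PRIMARY KEY, type TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO Dish VALUES (?, ?, ?)",
    [("ravioli", "entree", 10), ("pork bun", "entree", 7), ("taco", "entree", 7)],
)
sql_avgs = conn.execute(
    "SELECT AVG(cost), AVG(DISTINCT cost) FROM Dish WHERE type = 'entree'"
).fetchone()

print(plain_avg, distinct_avg)
print(sql_avgs)
```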
