# SQL I

Introducing SQL and databases.

## Starting Up SQL

Before we look at SQL syntax in detail, let's first get ourselves set up to run SQL queries in Jupyter.

#### Approach: `pd.read_sql`

`pandas` provides `pd.read_sql`, a function that runs a SQL query against a database and returns the result as a `pandas` DataFrame. We pass in the query as a string, together with a database connection or engine, as follows.

**1. Connect to a database**

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import sqlalchemy
import pandas as pd

engine = sqlalchemy.create_engine("sqlite:////content/basic_examples.db")
connection = engine.connect()

**2. Run a simple SQL query**

In [None]:
query = """
SELECT *
FROM Dragon;
"""

pd.read_sql(query, engine)

Unnamed: 0,name,year,cute
0,hiccup,2010,10
1,drogon,2011,-100
2,dragon 2,2019,0
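
To experiment without the course database file, the same table can be rebuilt in memory. The sketch below is a stand-in, not the course setup: it uses Python's built-in `sqlite3` module instead of `sqlalchemy`, and the column types are assumptions guessed from the output above.

```python
import sqlite3

# Assumption: a throwaway in-memory database stands in for basic_examples.db.
conn = sqlite3.connect(":memory:")

# Assumption: column types are inferred from the values shown in the output.
conn.execute("CREATE TABLE Dragon (name TEXT, year INTEGER, cute INTEGER)")
conn.executemany(
    "INSERT INTO Dragon VALUES (?, ?, ?)",
    [("hiccup", 2010, 10), ("drogon", 2011, -100), ("dragon 2", 2019, 0)],
)

# Run the same query as above and fetch the result as a list of tuples.
rows = conn.execute("SELECT * FROM Dragon").fetchall()
for row in rows:
    print(row)
```

Each row comes back as a plain tuple rather than a DataFrame row, which is enough for checking what a query returns.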


## Basic Queries

Every SQL query that reads from a table needs both a `SELECT` and a `FROM` clause.

* `SELECT`: specify the column(s) to return in the output
* `FROM`: specify the database table from which to extract data

**Question:** Select all columns from the **Dragon** table.

In [None]:
pd.read_sql("select *  from dragon", connection)

Unnamed: 0,name,year,cute
0,hiccup,2010,10
1,drogon,2011,-100
2,dragon 2,2019,0


**Question:** Select columns **cute** and **year** from the **Dragon** table.

In [None]:
pd.read_sql('select cute,year from dragon', connection)

Unnamed: 0,cute,year
0,10,2010
1,-100,2011
2,0,2019


**Aliasing** with `AS`

**Question:** Repeat the last exercise with aliasing.

In [None]:
pd.read_sql('select cute as Cuteness, year as Birth from Dragon', connection)

Unnamed: 0,Cuteness,Birth
0,10,2010
1,-100,2011
2,0,2019


**Uniqueness** with `DISTINCT`

**Question:** Select all the unique years in the **Dragon** table.

In [None]:
pd.read_sql('select distinct year from dragon',connection)

Unnamed: 0,year
0,2010
1,2011
2,2019
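
Every year in the table above is already unique, so the effect of `DISTINCT` is not visible there. The sketch below assumes an in-memory copy of the table with one extra, hypothetical row (`toothless`) that repeats the year 2010; it uses the built-in `sqlite3` module rather than `pd.read_sql`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dragon (name TEXT, year INTEGER, cute INTEGER)")
conn.executemany(
    "INSERT INTO Dragon VALUES (?, ?, ?)",
    [
        ("hiccup", 2010, 10),
        ("drogon", 2011, -100),
        ("dragon 2", 2019, 0),
        ("toothless", 2010, 9),  # hypothetical row that repeats the year 2010
    ],
)

# Without DISTINCT, the repeated year appears once per row.
all_years = [r[0] for r in conn.execute("SELECT year FROM Dragon ORDER BY year")]

# With DISTINCT, each year appears only once.
unique_years = [
    r[0] for r in conn.execute("SELECT DISTINCT year FROM Dragon ORDER BY year")
]

print(all_years)     # [2010, 2010, 2011, 2019]
print(unique_years)  # [2010, 2011, 2019]
```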


**Filtering** with `WHERE`

**Question:** Select the **name** and **year** columns from the **Dragon** table such that the cute value is greater than 0.

In [None]:
pd.read_sql('select name, year from dragon where cute > 0 ', connection)

Unnamed: 0,name,year
0,hiccup,2010


**Question:** Select the **name**, **cute** and **year** columns from the **Dragon** table such that the cute value is greater than 0 or the year is greater than 2013.

In [None]:
pd.read_sql('select name, cute, year from dragon where cute > 0 or year > 2013', connection)

Unnamed: 0,name,cute,year
0,hiccup,10,2010
1,dragon 2,0,2019


**Question:** Select the **name** and **year** columns from the **Dragon** table such that the name is either 'puff' or 'hiccup'.

In [None]:
pd.read_sql("select name , year from dragon where name in ('puff' , 'hiccup')", connection)

Unnamed: 0,name,year
0,hiccup,2010


**Question:** Get the name and cute value of all dragons whose cute value is not null.

In [None]:
pd.read_sql('select name, cute from dragon where cute is not null', connection)

Unnamed: 0,name,cute
0,hiccup,10
1,drogon,-100
2,dragon 2,0
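
The table above has no missing values, so the filter keeps every row. To see why `IS NULL` and `IS NOT NULL` exist, the sketch below assumes an in-memory copy of the table with one hypothetical row (`puff`) whose `cute` value is NULL. The helper `names` is also hypothetical, written only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dragon (name TEXT, year INTEGER, cute INTEGER)")
conn.executemany(
    "INSERT INTO Dragon VALUES (?, ?, ?)",
    [
        ("hiccup", 2010, 10),
        ("drogon", 2011, -100),
        ("dragon 2", 2019, 0),
        ("puff", 2010, None),  # hypothetical row; None is stored as NULL
    ],
)


def names(where_clause):
    """Return the dragon names matching a WHERE clause, sorted by name."""
    sql = f"SELECT name FROM Dragon WHERE {where_clause} ORDER BY name"
    return [row[0] for row in conn.execute(sql)]


# A comparison with NULL is never true, so `= NULL` matches nothing.
equals_null = names("cute = NULL")

# IS NULL and IS NOT NULL are the operators that test for missing values.
is_null = names("cute IS NULL")
is_not_null = names("cute IS NOT NULL")

print(equals_null)  # []
print(is_null)      # ['puff']
print(is_not_null)  # ['dragon 2', 'drogon', 'hiccup']
```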


**Ordering** data using `ORDER BY`

**Question:** Sort the **Dragon** table in descending order of cuteness.

In [None]:
pd.read_sql('select * from dragon order by cute desc', connection)

Unnamed: 0,name,year,cute
0,hiccup,2010,10
1,dragon 2,2019,0
2,drogon,2011,-100


**Restricting** output with `LIMIT` and `OFFSET`

**Question:** Query the first two rows of the **Dragon** table.

In [None]:
pd.read_sql('select * from dragon limit 2', connection)

Unnamed: 0,name,year,cute
0,hiccup,2010,10
1,drogon,2011,-100


**Question:** Query the two rows after the first row of the **Dragon** table.

In [None]:
pd.read_sql('select * from dragon limit 2 offset 1 ', connection)

Unnamed: 0,name,year,cute
0,drogon,2011,-100
1,dragon 2,2019,0
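
SQL does not guarantee any particular row order unless the query has an `ORDER BY` clause, so `LIMIT` and `OFFSET` are usually paired with one. The sketch below assumes an in-memory copy of the **Dragon** table and uses the built-in `sqlite3` module to pick rows by cuteness rank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dragon (name TEXT, year INTEGER, cute INTEGER)")
conn.executemany(
    "INSERT INTO Dragon VALUES (?, ?, ?)",
    [("hiccup", 2010, 10), ("drogon", 2011, -100), ("dragon 2", 2019, 0)],
)

# The two cutest dragons: sort first, then keep the first two rows.
top_two = [
    r[0]
    for r in conn.execute("SELECT name FROM Dragon ORDER BY cute DESC LIMIT 2")
]

# Skip the cutest dragon, then take the next two rows.
next_two = [
    r[0]
    for r in conn.execute(
        "SELECT name FROM Dragon ORDER BY cute DESC LIMIT 2 OFFSET 1"
    )
]

print(top_two)   # ['hiccup', 'dragon 2']
print(next_two)  # ['dragon 2', 'drogon']
```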


## Grouping Data with `GROUP BY`

**Question:** Get all rows and columns of the **Dish** table.

In [None]:
pd.read_sql('select * from dish', connection)

Unnamed: 0,name,type,cost
0,ravioli,entree,10
1,ramen,entree,7
2,taco,entree,7
3,edamame,appetizer,4
4,fries,appetizer,4
5,potsticker,appetizer,4
6,ice cream,dessert,5


A small note: `type` may be highlighted in green below because Jupyter assumes that we are writing Python code, where `type` is a built-in function. `type` does *not* have a special meaning in SQL, so the color does not indicate any special functionality. The query itself is an ordinary string that is sent to the database as SQL.

**Question:** Select the **type** column of the **Dish** table.

In [None]:
pd.read_sql("select type from dish", connection)

Unnamed: 0,type
0,entree
1,entree
2,entree
3,appetizer
4,appetizer
5,appetizer
6,dessert


**Question:** Get all the dish types using GROUP BY.

In [None]:
pd.read_sql('select type from dish group by type' , connection)

Unnamed: 0,type
0,appetizer
1,dessert
2,entree


**Question:** Query the total cost of each type of dish.

In [None]:
pd.read_sql('select type , sum(cost) as Total_Cost from dish group by type', connection)

Unnamed: 0,type,Total_Cost
0,appetizer,12
1,dessert,5
2,entree,24


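**Question:** For each type of dish, query the minimum cost and the maximum cost, along with the name of a dish at that cost.

Note that `name` is neither grouped nor aggregated in these queries. SQLite fills it in from the row that holds the minimum or maximum; many other databases reject such a query.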

In [None]:
pd.read_sql('select type, name, MIN(cost) from Dish GROUP BY type', connection)

Unnamed: 0,type,name,MIN(cost)
0,appetizer,edamame,4
1,dessert,ice cream,5
2,entree,ramen,7


In [None]:
pd.read_sql('select type, name, MAX(cost) from Dish GROUP BY type', connection)

Unnamed: 0,type,name,MAX(cost)
0,appetizer,edamame,4
1,dessert,ice cream,5
2,entree,ravioli,10
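
Several aggregates can also be computed in a single grouped query. The sketch below assumes an in-memory copy of the **Dish** table shown earlier (column types are guessed) and returns the total, minimum, and maximum cost of each type using the built-in `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: column types are inferred from the Dish output shown earlier.
conn.execute("CREATE TABLE Dish (name TEXT, type TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO Dish VALUES (?, ?, ?)",
    [
        ("ravioli", "entree", 10),
        ("ramen", "entree", 7),
        ("taco", "entree", 7),
        ("edamame", "appetizer", 4),
        ("fries", "appetizer", 4),
        ("potsticker", "appetizer", 4),
        ("ice cream", "dessert", 5),
    ],
)

# One output row per type, with three aggregates computed over each group.
summary = conn.execute(
    """
    SELECT type, SUM(cost), MIN(cost), MAX(cost)
    FROM Dish
    GROUP BY type
    ORDER BY type
    """
).fetchall()

for row in summary:
    print(row)
# ('appetizer', 12, 4, 4)
# ('dessert', 5, 5, 5)
# ('entree', 24, 7, 10)
```

Every selected column here is either the grouping column or an aggregate, so the query does not depend on SQLite's handling of bare columns.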


**Question:** Count the number of rows in each year in the **Dragon** table.

In [None]:
pd.read_sql('select year ,count(*) as row_count from dragon group by year ', connection)

Unnamed: 0,year,row_count
0,2010,1
1,2011,1
2,2019,1


**Question:** Count the total number of rows, including rows with NULLs, in the **Dragon** table.

In [None]:
pd.read_sql('select COUNT(*) from Dragon', connection)

Unnamed: 0,COUNT(*)
0,3
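
`COUNT(*)` counts every row, while `COUNT(column)` skips rows where that column is NULL. The **Dragon** table has no NULLs, so the sketch below assumes an in-memory copy with one hypothetical row (`puff`) whose `cute` value is missing, to make the difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dragon (name TEXT, year INTEGER, cute INTEGER)")
conn.executemany(
    "INSERT INTO Dragon VALUES (?, ?, ?)",
    [
        ("hiccup", 2010, 10),
        ("drogon", 2011, -100),
        ("dragon 2", 2019, 0),
        ("puff", 2010, None),  # hypothetical row with a NULL cute value
    ],
)

# Whole table: all rows versus rows with a non-NULL cute value.
total_rows, non_null_cute = conn.execute(
    "SELECT COUNT(*), COUNT(cute) FROM Dragon"
).fetchone()

# The same two counts, computed separately for each year.
per_year = conn.execute(
    "SELECT year, COUNT(*), COUNT(cute) FROM Dragon GROUP BY year ORDER BY year"
).fetchall()

print(total_rows, non_null_cute)  # 4 3
print(per_year)  # [(2010, 2, 1), (2011, 1, 1), (2019, 1, 1)]
```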
