In [1]:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql+psycopg2://postgres:pass1234@localhost:5432/nutrition')

# `SELECT` and `FROM`
---

SQL is used to pull specific data from a database. The two clauses that appear in almost every query are `SELECT` and `FROM`:

- `SELECT`: chooses which columns to return (`*` means all of them).
- `FROM`: specifies which table to query. Since a database usually contains many tables, you need to say which one you mean.

```SQL
-- Returns all columns from the users table
SELECT * 
FROM users;
```
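If you want to experiment without the lecture's PostgreSQL database, Python's built-in `sqlite3` module is enough for a minimal sketch. The `users` table and its two rows are hypothetical; `SELECT *` returns every column, and `cursor.description` lists their names.

```python
import sqlite3

# Hypothetical users table in a throwaway in-memory database
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    INSERT INTO users (name, salary) VALUES ('Ana', 52000), ('Ben', 98000);
""")

# SELECT * returns all columns; FROM names the table to read from
cur = conn.execute("SELECT * FROM users;")
columns = [col[0] for col in cur.description]
rows = cur.fetchall()

print(columns)  # ['id', 'name', 'salary']
print(rows)     # [(1, 'Ana', 52000), (2, 'Ben', 98000)]
```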

**Challenge: Show me all the columns from the `restaurants` table.**

<details>
    <summary>ERD</summary>
    <img src="../erd.png">
</details>

In [2]:
sql = """
SELECT *
FROM restaurants
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | id | name | inserted_at | updated_at | slug | url |
|---|---|---|---|---|---|---|
| 0 | 1 | A&W Restaurants | 2020-01-17 21:14:18 | 2020-01-17 21:14:18 | aw-restaurants | http://www.awrestaurants.com/ |
| 1 | 2 | Applebee's | 2020-01-17 21:14:18 | 2020-01-17 21:14:18 | applebees | http://www.applebees.com/ |
| 2 | 3 | Arby's | 2020-01-17 21:14:18 | 2020-01-17 21:14:18 | arbys | http://www.arbys.com/ |
| 3 | 4 | Atlanta Bread Company | 2020-01-17 21:14:18 | 2020-01-17 21:14:18 | atlanta-bread-company | http://www.atlantabread.com/ |
| 4 | 5 | Bojangle's Famous Chicken 'n Biscuits | 2020-01-17 21:14:18 | 2020-01-17 21:14:18 | bojangles-famous-chicken-n-biscuits | http://www.bojangles.com |


We can also get a subset of the columns from a given table.

```SQL
-- Returns the name and salary columns from the users table
SELECT name, salary
FROM users;
```
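A similar sketch for column subsets, again with Python's `sqlite3` and a hypothetical `users` table: only the columns named after `SELECT` come back, in the order they are listed.

```python
import sqlite3

# Hypothetical users table with more columns than the query needs
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, age INTEGER);
    INSERT INTO users (name, salary, age) VALUES ('Ana', 52000, 29), ('Ben', 98000, 41);
""")

# Only name and salary are returned; id and age are left out
cur = conn.execute("SELECT name, salary FROM users;")
columns = [col[0] for col in cur.description]
rows = cur.fetchall()

print(columns)  # ['name', 'salary']
print(rows)     # [('Ana', 52000), ('Ben', 98000)]
```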


**Challenge: Show me the `name` and `calories` columns from the `foods` table**

In [3]:
sql = """
SELECT name, calories 
FROM foods;
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | name | calories |
|---|---|---|
| 0 | Boneless Buffalo Wings w/ Bleu Cheese | 1490 |
| 1 | Bottomless Tostada Chips w/ Salsa | 1020 |
| 2 | Classic Nachos - Beef (12) | 1720 |
| 3 | Classic Nachos - Beef (8) | 1170 |
| 4 | Classic Nachos - Chicken (12) | 1670 |


**Challenge: Give me the names of all the categories**

In [4]:
sql = """
SELECT name
FROM categories
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | name |
|---|---|
| 0 | Appetizers |
| 1 | Bread |
| 2 | Breads |
| 3 | Breakfast |
| 4 | Breakfast Entrees |


# Namespacing
---

Sometimes you'll see columns prefixed by their table name. This is overkill when you're querying just one table, but it becomes important once you query multiple tables that share column names.
```SQL
SELECT users.name, users.salary 
FROM users;
```
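The payoff shows up with two tables. In this sketch (Python's `sqlite3`; the `users` and `posts` tables are hypothetical and both have a `name` column), an unprefixed `name` is rejected as ambiguous, while `users.name` and `posts.name` work. The `INNER JOIN` syntax is covered later in this lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, name TEXT, user_id INTEGER);
    INSERT INTO users (name) VALUES ('Ana');
    INSERT INTO posts (name, user_id) VALUES ('Hello SQL', 1);
""")

# Unqualified: both tables have a "name" column, so the database refuses to guess
try:
    conn.execute("SELECT name FROM users INNER JOIN posts ON posts.user_id = users.id")
    ambiguous = False
except sqlite3.OperationalError as err:
    ambiguous = True
    print(err)  # ambiguous column name: name

# Qualified: the table prefix says exactly which "name" is meant
rows = conn.execute(
    "SELECT users.name, posts.name FROM users INNER JOIN posts ON posts.user_id = users.id"
).fetchall()
print(rows)  # [('Ana', 'Hello SQL')]
```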

**Challenge: Show me the `name` and `calories` columns from the `foods` table using namespacing.**

<details>
    <summary>ERD</summary>
    <img src="../erd.png">
</details>

In [5]:
sql = """
SELECT foods.name, foods.calories
FROM foods
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | name | calories |
|---|---|---|
| 0 | Boneless Buffalo Wings w/ Bleu Cheese | 1490 |
| 1 | Bottomless Tostada Chips w/ Salsa | 1020 |
| 2 | Classic Nachos - Beef (12) | 1720 |
| 3 | Classic Nachos - Beef (8) | 1170 |
| 4 | Classic Nachos - Chicken (12) | 1670 |


**Challenge: Give me the names of all the `categories` using namespacing.**

In [6]:
sql = """
SELECT categories.name
FROM categories
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | name |
|---|---|
| 0 | Appetizers |
| 1 | Bread |
| 2 | Breads |
| 3 | Breakfast |
| 4 | Breakfast Entrees |


You can also namespace a wildcard.
```SQL
SELECT users.* 
FROM users;
```

**Challenge: Give me every column from the `foods` table. Use namespacing.**

In [7]:
sql = """
SELECT foods.*
FROM foods
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | id | name | calories | carbs | fat | restaurant_id | inserted_at | updated_at |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Boneless Buffalo Wings w/ Bleu Cheese | 1490 | 94 | 88 | 12 | 2020-01-17 21:14:22 | 2020-01-17 21:14:22 |
| 1 | 2 | Bottomless Tostada Chips w/ Salsa | 1020 | 125 | 51 | 12 | 2020-01-17 21:14:22 | 2020-01-17 21:14:22 |
| 2 | 3 | Classic Nachos - Beef (12) | 1720 | 86 | 108 | 12 | 2020-01-17 21:14:22 | 2020-01-17 21:14:22 |
| 3 | 4 | Classic Nachos - Beef (8) | 1170 | 59 | 74 | 12 | 2020-01-17 21:14:22 | 2020-01-17 21:14:22 |
| 4 | 5 | Classic Nachos - Chicken (12) | 1670 | 83 | 103 | 12 | 2020-01-17 21:14:22 | 2020-01-17 21:14:22 |


# Aliasing
---

Writing out the same table name over and over gets cumbersome. Thankfully, we can give a table a _temporary_ (and hopefully shorter) name within a query. This is called aliasing.

```SQL
SELECT u.name, u.salary 
FROM users AS u;
```
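A minimal check of how aliases behave, assuming a hypothetical `users` table in an in-memory SQLite database: the aliased query returns the same rows as the fully written-out one, and once `users` is aliased to `u`, the original name can no longer be used as a prefix in that query (PostgreSQL behaves the same way).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    INSERT INTO users (name, salary) VALUES ('Ana', 52000), ('Ben', 98000);
""")

# The alias "u" exists only for the duration of this one query
aliased = conn.execute("SELECT u.name, u.salary FROM users AS u").fetchall()
full = conn.execute("SELECT users.name, users.salary FROM users").fetchall()
print(aliased == full)  # True

# After aliasing, refer to the table by its alias; "users.name" is no longer in scope
try:
    conn.execute("SELECT users.name FROM users AS u")
    alias_required = False
except sqlite3.OperationalError:
    alias_required = True
print(alias_required)  # True
```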

**Challenge: Query the `name` and `calories` columns from the `foods` table, and rename the `foods` table to `f`**

<details>
    <summary>ERD</summary>
    <img src="../erd.png">
</details>

In [8]:
sql = """
SELECT f.name, f.calories
FROM foods AS f
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | name | calories |
|---|---|---|
| 0 | Boneless Buffalo Wings w/ Bleu Cheese | 1490 |
| 1 | Bottomless Tostada Chips w/ Salsa | 1020 |
| 2 | Classic Nachos - Beef (12) | 1720 |
| 3 | Classic Nachos - Beef (8) | 1170 |
| 4 | Classic Nachos - Chicken (12) | 1670 |


**Challenge: Give me the names of all the `restaurants`, and alias the table**

In [9]:
sql = """
SELECT r.name
FROM restaurants AS r
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | name |
|---|---|
| 0 | A&W Restaurants |
| 1 | Applebee's |
| 2 | Arby's |
| 3 | Atlanta Bread Company |
| 4 | Bojangle's Famous Chicken 'n Biscuits |


Recall that the `foods`, `restaurants` and `categories` tables all have a `name` column. When we start combining tables into a single query, we might want to give each name column an alias as well.
```SQL
-- Returns the name column from users table. "name" is renamed to "user"
SELECT u.name AS user
FROM users AS u;
```
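To see why column aliases matter, this sketch (Python's `sqlite3`, hypothetical `users` and `posts` tables) joins two tables that both have a `name` column. Without `AS`, the result has two columns labelled `name`; with `AS`, each gets its own label. Only the output is renamed; the tables themselves are untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, name TEXT, user_id INTEGER);
    INSERT INTO users (name) VALUES ('Ana');
    INSERT INTO posts (name, user_id) VALUES ('Hello SQL', 1);
""")

join = "FROM users AS u INNER JOIN posts AS p ON p.user_id = u.id"

# Without column aliases, both result columns come back labelled "name"
plain = conn.execute(f"SELECT u.name, p.name {join}")
plain_labels = [col[0] for col in plain.description]

# AS gives each output column its own label
aliased = conn.execute(f"SELECT u.name AS user, p.name AS post {join}")
aliased_labels = [col[0] for col in aliased.description]
rows = aliased.fetchall()

print(plain_labels)    # ['name', 'name']
print(aliased_labels)  # ['user', 'post']
print(rows)            # [('Ana', 'Hello SQL')]
```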

**Show me the `name` (temporarily renamed to food) and `calories` columns from the `foods` table**

In [10]:
sql = """
SELECT f.name AS food, f.calories
FROM foods AS f
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | food | calories |
|---|---|---|
| 0 | Boneless Buffalo Wings w/ Bleu Cheese | 1490 |
| 1 | Bottomless Tostada Chips w/ Salsa | 1020 |
| 2 | Classic Nachos - Beef (12) | 1720 |
| 3 | Classic Nachos - Beef (8) | 1170 |
| 4 | Classic Nachos - Chicken (12) | 1670 |


**Challenge: Give me all the category names, rename the `name` column to `category`**

In [11]:
sql = """
SELECT c.name AS category
FROM categories AS c
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | category |
|---|---|
| 0 | Appetizers |
| 1 | Bread |
| 2 | Breads |
| 3 | Breakfast |
| 4 | Breakfast Entrees |


# `DISTINCT`
---

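`DISTINCT` removes duplicate rows from the result, so applying it to a single column returns that column's **unique** values. (This example also drops the optional `AS` from the table alias; `users u` means the same as `users AS u`.)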
```SQL
-- Returns the unique universities represented by all the users
SELECT DISTINCT u.university
FROM users u;
```
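A small sketch with a hypothetical `users` table in SQLite, showing that `DISTINCT` collapses repeated values, and that with several columns it de-duplicates whole rows rather than each column on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, university TEXT);
    INSERT INTO users (name, university) VALUES
        ('Ana', 'MIT'), ('Ben', 'UCLA'), ('Cy', 'MIT'), ('Di', 'UCLA'), ('Ed', 'NYU');
""")

# DISTINCT collapses duplicates; ORDER BY only makes the output order predictable
unique = conn.execute(
    "SELECT DISTINCT u.university FROM users u ORDER BY u.university"
).fetchall()
print(unique)  # [('MIT',), ('NYU',), ('UCLA',)]

# With two columns, every (name, university) pair is already unique, so nothing is removed
pairs = conn.execute("SELECT DISTINCT u.name, u.university FROM users u").fetchall()
print(len(pairs))  # 5
```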

**Show me the unique names (renamed to `food`) from the `foods` table.**

<details>
    <summary>ERD</summary>
    <img src="../erd.png">
</details>

In [12]:
sql = """
SELECT DISTINCT f.name AS food
FROM foods AS f
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | food |
|---|---|
| 0 | Hash Browns - small |
| 1 | French Fries - small (Salted) |
| 2 | Brewed Tea- Black |
| 3 | Cheesy @ the Plate (add side) |
| 4 | Wings with Hot Sauce |


**Challenge: From the `foods` table, query the unique values from the `restaurant_id` column**

In [13]:
sql = """
SELECT DISTINCT f.restaurant_id
FROM foods AS f
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | restaurant_id |
|---|---|
| 0 | 42 |
| 1 | 29 |
| 2 | 4 |
| 3 | 34 |
| 4 | 41 |


# `ORDER BY`
---

Sometimes it makes sense to order your results by a certain column. `ASC` (ascending, the default) and `DESC` (descending) set the direction. For example, we might want to get a list of users sorted alphabetically:
```SQL
SELECT users.name
FROM users
ORDER BY users.name ASC;
```
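A quick sketch of sort direction using Python's `sqlite3` and a hypothetical `users` table. Leaving out the direction gives the same result as `ASC`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users (name) VALUES ('Cy'), ('Ana'), ('Ben');
""")

ascending = conn.execute("SELECT users.name FROM users ORDER BY users.name ASC").fetchall()
descending = conn.execute("SELECT users.name FROM users ORDER BY users.name DESC").fetchall()
default = conn.execute("SELECT users.name FROM users ORDER BY users.name").fetchall()

print(ascending)             # [('Ana',), ('Ben',), ('Cy',)]
print(descending)            # [('Cy',), ('Ben',), ('Ana',)]
print(default == ascending)  # True: ASC is the default direction
```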

**Show me the `name` (temporarily renamed to `food`) and `calories` columns from the `foods` table, ordered from most caloric to least**

<details>
    <summary>ERD</summary>
    <img src="../erd.png">
</details>

In [14]:
sql = """
SELECT f.name AS food, f.calories
FROM foods AS f
ORDER BY f.calories DESC
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | food | calories |
|---|---|---|
| 0 | 20 piece & 10 biscuit box serves 9-11) | 8820 |
| 1 | 12 piece & 6 biscuit box (serves 5-7) | 5300 |
| 2 | Country Large Oblong - VG (whole) | 4220 |
| 3 | White Rye Oblong - VG (whole) | 3540 |
| 4 | 8 piece & 4 biscuit box (serves 3-5) | 3534 |


**Give me all the `categories` in reverse alphabetical order.**

In [15]:
sql = """
SELECT c.name
FROM categories c
ORDER BY c.name DESC
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | name |
|---|---|
| 0 | Wraps |
| 1 | Wings |
| 2 | Wing Sauces |
| 3 | Tortillas |
| 4 | Toppings |


**Retrieve all the columns from `foods`, from most `carbs` to least**

In [16]:
sql = """
SELECT f.*
FROM foods f
ORDER BY f.carbs DESC
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | id | name | calories | carbs | fat | restaurant_id | inserted_at | updated_at |
|---|---|---|---|---|---|---|---|---|
| 0 | 3573 | Chili - Bowl | 710 | 881 | 24 | 16 | 2020-01-17 21:14:36 | 2020-01-17 21:14:36 |
| 1 | 1815 | Country Large Oblong - VG (whole) | 4220 | 862 | 16 | 15 | 2020-01-17 21:14:29 | 2020-01-17 21:14:29 |
| 2 | 533 | Whopper JR.® Sandwich with Cheese w/o Mayo | 300 | 660 | 14 | 7 | 2020-01-17 21:14:24 | 2020-01-17 21:14:24 |
| 3 | 1821 | White Rye Oblong - VG (whole) | 3540 | 649 | 59 | 15 | 2020-01-17 21:14:29 | 2020-01-17 21:14:29 |
| 4 | 1555 | The Commuter Croissant (Egg Whites) | 600 | 600 | 600 | 15 | 2020-01-17 21:14:28 | 2020-01-17 21:14:28 |


**What food has the most fat?**

In [17]:
sql = """
SELECT f.*
FROM foods f
ORDER BY f.fat DESC
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | id | name | calories | carbs | fat | restaurant_id | inserted_at | updated_at |
|---|---|---|---|---|---|---|---|---|
| 0 | 1555 | The Commuter Croissant (Egg Whites) | 600 | 600 | 600 | 15 | 2020-01-17 21:14:28 | 2020-01-17 21:14:28 |
| 1 | 2432 | 20 piece & 10 biscuit box serves 9-11) | 8820 | 545 | 540 | 5 | 2020-01-17 21:14:32 | 2020-01-17 21:14:32 |
| 2 | 1554 | Ham & Swiss Panini (Whole Eggs) | 490 | 490 | 490 | 15 | 2020-01-17 21:14:28 | 2020-01-17 21:14:28 |
| 3 | 1556 | Anaheim Panini (Egg Whites) | 470 | 470 | 470 | 15 | 2020-01-17 21:14:28 | 2020-01-17 21:14:28 |
| 4 | 1557 | Chicken Apple Sausage Panini (Egg Whites) | 460 | 460 | 460 | 15 | 2020-01-17 21:14:28 | 2020-01-17 21:14:28 |


You can order on multiple columns. Priority is given from left to right in your `ORDER BY` clause.

```SQL
-- Returns users from oldest to youngest. If they have the same age, they will then be sorted alphabetically
SELECT u.name, u.age
FROM users u
ORDER BY u.age DESC, u.name ASC
```
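A tie-breaking sketch (hypothetical `users` table, Python's `sqlite3`): Ana and Cy are both 30, so the second sort key, `name`, decides their order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO users (name, age) VALUES ('Cy', 30), ('Ana', 30), ('Ben', 45), ('Di', 22);
""")

# age DESC is applied first; name ASC only matters among rows with the same age
rows = conn.execute("""
    SELECT u.name, u.age
    FROM users u
    ORDER BY u.age DESC, u.name ASC
""").fetchall()
print(rows)  # [('Ben', 45), ('Ana', 30), ('Cy', 30), ('Di', 22)]
```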

**Give me the `restaurant_id` (renamed to `rid`), `name` and `calories` from `foods`. Order first by `restaurant_id` from smallest to biggest, then by `calories` from biggest to smallest**

In [18]:
sql = """
SELECT f.restaurant_id AS rid, f.name, f.calories
FROM foods f
ORDER BY f.restaurant_id ASC, f.calories DESC
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | rid | name | calories |
|---|---|---|---|
| 0 | 1 | A&W® Root Beer | 1760 |
| 1 | 1 | Reese's® Peanut Butter Fudge Blendrrr | 1360 |
| 2 | 1 | Cheese Curds (Large) | 1140 |
| 3 | 1 | Chocolate Fudge Blendrrr | 1010 |
| 4 | 1 | Orange Freeze (medium) | 970 |


**Give me the `calories` and `name` from the `foods` table. Ordered first by `calories` in descending order, then by `name` in alphabetical order**

In [19]:
sql = """
SELECT f.name, f.calories
FROM foods f
ORDER BY f.calories DESC, f.name ASC
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | name | calories |
|---|---|---|
| 0 | 20 piece & 10 biscuit box serves 9-11) | 8820 |
| 1 | 12 piece & 6 biscuit box (serves 5-7) | 5300 |
| 2 | Country Large Oblong - VG (whole) | 4220 |
| 3 | White Rye Oblong - VG (whole) | 3540 |
| 4 | 8 piece & 4 biscuit box (serves 3-5) | 3534 |


# `LIMIT`
---

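Rather than returning ALL rows from a given table, you might only want a subset. This can be achieved with the `LIMIT` clause. Pair it with `ORDER BY`: without an explicit ordering, the database does not guarantee which rows come back.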

```SQL
-- Returns the top ten highest paid users
SELECT users.name, users.salary
FROM users
ORDER BY users.salary DESC
LIMIT 10
```
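A minimal sketch with a hypothetical `users` table in SQLite: `ORDER BY` decides which rows count as the top, and `LIMIT` keeps only the first N of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO users (name, salary) VALUES (?, ?)",
    [("Ana", 52000), ("Ben", 98000), ("Cy", 75000), ("Di", 120000)],
)

# Sort by salary, highest first, then keep the first two rows
top_two = conn.execute("""
    SELECT users.name, users.salary
    FROM users
    ORDER BY users.salary DESC
    LIMIT 2
""").fetchall()
print(top_two)  # [('Di', 120000), ('Ben', 98000)]
```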

**Give me the `name` and `calories` of the 20 most caloric items from the `foods` table.**

<details>
    <summary>ERD</summary>
    <img src="../erd.png">
</details>

In [20]:
sql = """
SELECT f.name, f.calories
FROM foods f
ORDER BY f.calories DESC
LIMIT 20
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | name | calories |
|---|---|---|
| 0 | 20 piece & 10 biscuit box serves 9-11) | 8820 |
| 1 | 12 piece & 6 biscuit box (serves 5-7) | 5300 |
| 2 | Country Large Oblong - VG (whole) | 4220 |
| 3 | White Rye Oblong - VG (whole) | 3540 |
| 4 | 8 piece & 4 biscuit box (serves 3-5) | 3534 |


**What are the top ten most fatty foods?**

In [21]:
sql = """
SELECT f.*
FROM foods f
ORDER BY f.fat DESC
LIMIT 10
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | id | name | calories | carbs | fat | restaurant_id | inserted_at | updated_at |
|---|---|---|---|---|---|---|---|---|
| 0 | 1555 | The Commuter Croissant (Egg Whites) | 600 | 600 | 600 | 15 | 2020-01-17 21:14:28 | 2020-01-17 21:14:28 |
| 1 | 2432 | 20 piece & 10 biscuit box serves 9-11) | 8820 | 545 | 540 | 5 | 2020-01-17 21:14:32 | 2020-01-17 21:14:32 |
| 2 | 1554 | Ham & Swiss Panini (Whole Eggs) | 490 | 490 | 490 | 15 | 2020-01-17 21:14:28 | 2020-01-17 21:14:28 |
| 3 | 1556 | Anaheim Panini (Egg Whites) | 470 | 470 | 470 | 15 | 2020-01-17 21:14:28 | 2020-01-17 21:14:28 |
| 4 | 1557 | Chicken Apple Sausage Panini (Egg Whites) | 460 | 460 | 460 | 15 | 2020-01-17 21:14:28 | 2020-01-17 21:14:28 |


# `WHERE`
---

One of the more important skills in SQL is the ability to filter your queries based on a condition. This is done with the `WHERE` clause.

```SQL
-- Returns all users who make less than $30k
SELECT users.*
FROM users
WHERE users.salary < 30000
```

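Comparison operators work much like Python's, with one difference: equality is a single `=` rather than `==`.
- Greater than: `>`
- Greater than or equal to: `>=`
- Less than: `<`
- Less than or equal to: `<=`
- Equal to: `=`
- Not equal to: `<>` (PostgreSQL also accepts `!=`)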
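The comparison operators, exercised against a hypothetical `users` table in SQLite. The helper `names_where` is made up for this sketch and pastes a hard-coded condition into the query; don't build SQL this way from user input. The sketch also uses the not-equal operator `<>`, which both SQLite and PostgreSQL support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO users (name, salary) VALUES (?, ?)",
    [("Ana", 25000), ("Ben", 30000), ("Cy", 45000)],
)

def names_where(condition):
    """Return sorted names matching a hard-coded WHERE condition (illustration only)."""
    sql = f"SELECT users.name FROM users WHERE {condition} ORDER BY users.name"
    return [row[0] for row in conn.execute(sql)]

print(names_where("users.salary < 30000"))   # ['Ana']
print(names_where("users.salary <= 30000"))  # ['Ana', 'Ben']
print(names_where("users.salary >= 30000"))  # ['Ben', 'Cy']
print(names_where("users.salary = 30000"))   # ['Ben']  (a single "=", not Python's "==")
print(names_where("users.salary <> 30000"))  # ['Ana', 'Cy']
```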

**Give me the `name` and `calories` of `foods` with more than 1,000 calories.**

<details>
    <summary>ERD</summary>
    <img src="../erd.png">
</details>

In [22]:
sql = """
SELECT f.name, f.calories
FROM foods f
WHERE f.calories > 1000
ORDER BY f.calories
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | name | calories |
|---|---|---|
| 0 | Parmesan Shrimp Pasta (with linguini) | 1001 |
| 1 | Bleu Ribbon Burger | 1001 |
| 2 | Southwestern Jalapeño Burger | 1005 |
| 3 | Bella Chicken Crepê | 1007 |
| 4 | Bacon Cheeseburger\* | 1007 |


**What are the top 10 least caloric foods over 100 calories?**

In [23]:
sql = """
SELECT f.name, f.calories
FROM foods f
WHERE f.calories > 100
ORDER BY f.calories
LIMIT 10
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | name | calories |
|---|---|---|
| 0 | Fried Mozzarella | 101 |
| 1 | Chicken Tenders Basket | 102 |
| 2 | Peach Iced Tea | 103 |
| 3 | Raspberry Iced Tea | 103 |
| 4 | Catfish | 105 |


# `AND`/`OR`
---

Multiple conditions in a `WHERE` clause can be combined with `AND` and `OR`, much like combining boolean expressions with `and`/`or` in Python. `AND` takes precedence over `OR`, so use parentheses when mixing them.

```SQL
-- Returns all users under the age of 30 who are making more than $100k
SELECT users.*
FROM users
WHERE users.age < 30
AND users.salary > 100000
```
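A sketch of combining conditions, assuming a hypothetical `users` table in SQLite. The last two queries show why parentheses matter: `AND` is evaluated before `OR`, so the unparenthesized version answers a different question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age, salary) VALUES (?, ?, ?)",
    [("Ana", 27, 120000), ("Ben", 35, 150000), ("Cy", 24, 40000), ("Di", 52, 20000)],
)

def names(sql):
    """Run a query that returns names and sort them for easy comparison."""
    return sorted(row[0] for row in conn.execute(sql))

both = names("SELECT name FROM users WHERE age < 30 AND salary > 100000")
either = names("SELECT name FROM users WHERE age < 30 OR salary > 100000")
# Read as: age < 30 OR (age > 50 AND salary > 100000)
no_parens = names("SELECT name FROM users WHERE age < 30 OR age > 50 AND salary > 100000")
# Read as written: (age < 30 OR age > 50) AND salary > 100000
with_parens = names("SELECT name FROM users WHERE (age < 30 OR age > 50) AND salary > 100000")

print(both)         # ['Ana']
print(either)       # ['Ana', 'Ben', 'Cy']
print(no_parens)    # ['Ana', 'Cy']
print(with_parens)  # ['Ana']
```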

**Give me the `name`, `calories` and `carbs` from all `foods` over 1,000 calories or having _at least_ 30g of carbs.**

<details>
    <summary>ERD</summary>
    <img src="../erd.png">
</details>

In [24]:
sql = """
SELECT f.name, f.calories, f.carbs
FROM foods f
WHERE f.calories > 1000
OR f.carbs >= 30
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | name | calories | carbs |
|---|---|---|---|
| 0 | Boneless Buffalo Wings w/ Bleu Cheese | 1490 | 94 |
| 1 | Bottomless Tostada Chips w/ Salsa | 1020 | 125 |
| 2 | Classic Nachos - Beef (12) | 1720 | 86 |
| 3 | Classic Nachos - Beef (8) | 1170 | 59 |
| 4 | Classic Nachos - Chicken (12) | 1670 | 83 |


**Give me all foods from Jimmy Johns (`restaurant_id` = 27) with 700 or more calories**

In [25]:
sql = """
SELECT f.*
FROM foods f
WHERE f.restaurant_id = 27
AND f.calories >= 700
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | id | name | calories | carbs | fat | restaurant_id | inserted_at | updated_at |
|---|---|---|---|---|---|---|---|---|


# `BETWEEN`
---

You can also filter numerically based on a range with `BETWEEN`.

```SQL
-- Returns all users between 18 and 25 years old (inclusive)
SELECT users.*
FROM users
WHERE users.age BETWEEN 18 AND 25
```
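A sketch showing that `BETWEEN` includes both endpoints (hypothetical `users` table, Python's `sqlite3`): it matches exactly what the explicit `>=`/`<=` pair matches, so ages 18 and 25 are both kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Ana", 17), ("Ben", 18), ("Cy", 21), ("Di", 25), ("Ed", 26)],
)

between = conn.execute(
    "SELECT name FROM users WHERE age BETWEEN 18 AND 25 ORDER BY name"
).fetchall()
# BETWEEN a AND b is shorthand for >= a AND <= b
explicit = conn.execute(
    "SELECT name FROM users WHERE age >= 18 AND age <= 25 ORDER BY name"
).fetchall()

print(between)              # [('Ben',), ('Cy',), ('Di',)]
print(between == explicit)  # True
```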

**Give me the `name` and `calories` of `foods` that are between 0 and 10 calories (inclusive)**

<details>
    <summary>ERD</summary>
    <img src="../erd.png">
</details>

In [26]:
sql = """
SELECT f.name, f.calories
FROM foods f
WHERE f.calories BETWEEN 0 AND 10
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | name | calories |
|---|---|---|
| 0 | Buffalo Dipping Sauce Adds | 10 |
| 1 | Coffee | 0 |
| 2 | Desert Heat® Based on 1/2 tsp | 8 |
| 3 | Fresh Tomato Salsa | 5 |
| 4 | A&W® Diet Root Beer | 0 |


# Filtering on String Values
---

When filtering on a string value, wrap it in single quotes (in PostgreSQL, double quotes denote identifiers such as table and column names):

```SQL
SELECT users.*
FROM users
WHERE users.role = 'admin'
```
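A minimal sketch of string filtering with a hypothetical `restaurants` table in SQLite. It uses the doubled-apostrophe escape needed for names like McDonald's, and shows that `=` on strings is an exact, case-sensitive comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO restaurants (name) VALUES (?)",
    [("McDonald's",), ("Arby's",), ("Subway",)],
)

# Single quotes delimit the string; the apostrophe inside it is doubled
match = conn.execute(
    "SELECT id, name FROM restaurants WHERE name = 'McDonald''s'"
).fetchall()

# "=" compares strings exactly, including letter case
wrong_case = conn.execute(
    "SELECT id, name FROM restaurants WHERE name = 'mcdonald''s'"
).fetchall()

print(match)       # [(1, "McDonald's")]
print(wrong_case)  # []
```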

**Find McDonald's in the `restaurants` table. NOTE: Inside a string literal, an apostrophe is written as two single quotes (`''`).**

<details>
    <summary>ERD</summary>
    <img src="../erd.png">
</details>

In [27]:
sql = """
SELECT r.*
FROM restaurants r
WHERE r.name = 'McDonald''s'
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | id | name | inserted_at | updated_at | slug | url |
|---|---|---|---|---|---|---|
| 0 | 30 | McDonald's | 2020-01-17 21:14:18 | 2020-01-17 21:14:18 | mcdonalds | http://www.mcdonalds.com/us/en/home.html |


# Wildcards
---

We can use `LIKE` with wildcards to broaden our string filters: `%` matches any sequence of zero or more characters, and `_` matches exactly one character.

```SQL
-- Returns all users whose name begins with a capital "R"
SELECT users.*
FROM users
WHERE users.name LIKE 'R%'
```

```SQL
-- Returns all users whose name ends with a capital "R"
SELECT users.*
FROM users
WHERE users.name LIKE '%R'
```

```SQL
-- Returns all users with a capital "R" somewhere in the name
SELECT users.*
FROM users
WHERE users.name LIKE '%R%'
```

Notes:
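- `ILIKE` is a case-insensitive `LIKE` (a PostgreSQL extension, not standard SQL).
- You can negate a `LIKE` with `NOT LIKE`.
- _In this lecture, every `%` wildcard must be doubled_ (`%%`). The query string passes through psycopg2, which treats `%` as the start of a parameter placeholder; in a plain SQL client, a single `%` is correct.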
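A sketch of the wildcard patterns, run through Python's `sqlite3` against a hypothetical `users` table. Two caveats: SQLite's `LIKE` is case-insensitive for ASCII letters by default (PostgreSQL's is case-sensitive), so the patterns here were chosen to behave the same in both; and `ILIKE` is PostgreSQL-only, so `LOWER(name) LIKE ...` stands in for it. No `%%` escaping is needed here, because this code does not go through psycopg2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [("Rosa",), ("Omar",), ("Peter",), ("Ron",)],
)

def names(clause):
    """Return sorted names matching a hard-coded WHERE clause (illustration only)."""
    return sorted(row[0] for row in conn.execute(f"SELECT name FROM users WHERE {clause}"))

starts_with_r = names("name LIKE 'R%'")           # % = any run of characters
ends_with_r = names("name LIKE '%r'")
second_letter_o = names("name LIKE '_o%'")         # _ = exactly one character
no_e = names("name NOT LIKE '%e%'")
has_o_any_case = names("LOWER(name) LIKE '%o%'")   # portable stand-in for ILIKE

print(starts_with_r)    # ['Ron', 'Rosa']
print(ends_with_r)      # ['Omar', 'Peter']
print(second_letter_o)  # ['Ron', 'Rosa']
print(no_e)             # ['Omar', 'Ron', 'Rosa']
print(has_o_any_case)   # ['Omar', 'Ron', 'Rosa']
```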

**Find all the Whoppers in the `foods` table.**

<details>
    <summary>ERD</summary>
    <img src="../erd.png">
</details>

In [28]:
sql = """
SELECT f.*
FROM foods f
WHERE f.name ILIKE '%%Whopper%%'
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | id | name | calories | carbs | fat | restaurant_id | inserted_at | updated_at |
|---|---|---|---|---|---|---|---|---|
| 0 | 519 | Double Whopper® Sandwich w/o Mayo | 740 | 51 | 39 | 7 | 2020-01-17 21:14:24 | 2020-01-17 21:14:24 |
| 1 | 529 | Mustard Whopper® Sandwich | 530 | 52 | 23 | 7 | 2020-01-17 21:14:24 | 2020-01-17 21:14:24 |
| 2 | 520 | Double Whopper® Sandwich with Cheese | 990 | 53 | 65 | 7 | 2020-01-17 21:14:24 | 2020-01-17 21:14:24 |
| 3 | 530 | Whopper JR.® Sandwich | 340 | 28 | 19 | 7 | 2020-01-17 21:14:24 | 2020-01-17 21:14:24 |
| 4 | 521 | Double Whopper® Sandwich with Cheese w/o Mayo | 830 | 53 | 47 | 7 | 2020-01-17 21:14:24 | 2020-01-17 21:14:24 |


# Null values
---

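Databases can contain null values (roughly the SQL equivalent of pandas' `NaN`). Because any comparison with `NULL`, including `= NULL`, evaluates to unknown rather than true, filter them with `IS NULL` and `IS NOT NULL`: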

```SQL
-- Returns all users with no name
SELECT users.*
FROM users
WHERE users.name IS NULL
```

```SQL
-- Returns all users with a value for their name
SELECT users.*
FROM users
WHERE users.name IS NOT NULL
```
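A sketch of why `IS NULL` is needed, assuming a hypothetical `users` table in SQLite where one row has no name (Python's `None` is stored as `NULL`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Ana",), (None,), ("Ben",)])

# "= NULL" compares against an unknown value, which is never true, so nothing matches
eq_null = conn.execute("SELECT id FROM users WHERE name = NULL").fetchall()
is_null = conn.execute("SELECT id FROM users WHERE name IS NULL").fetchall()
not_null = conn.execute("SELECT id FROM users WHERE name IS NOT NULL ORDER BY id").fetchall()

print(eq_null)   # []
print(is_null)   # [(2,)]
print(not_null)  # [(1,), (3,)]
```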

**Give me everything from the foods table with null values in the `calories` column.**


<details>
    <summary>ERD</summary>
    <img src="../erd.png">
</details>

In [29]:
sql = """
SELECT f.*
FROM foods f
WHERE f.calories IS NULL
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | id | name | calories | carbs | fat | restaurant_id | inserted_at | updated_at |
|---|---|---|---|---|---|---|---|---|


# Joining Tables
---

It's very common to want to combine information from multiple tables into one query. 

For example, we might want to find the user's name for a given blog post. We can do this by joining. 

```SQL
-- Returns all blog posts with the associated user name. 
-- A user has many posts, and a post belongs to a user. Therefore the foreign key (user_id)
-- is on the child table: posts.
SELECT posts.*, users.name
FROM posts
INNER JOIN users ON users.id = posts.user_id
```

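There are several types of joins:
- `INNER JOIN`: returns only the rows that have a match in both tables.
- `LEFT JOIN` / `RIGHT JOIN` (also written `LEFT OUTER JOIN` / `RIGHT OUTER JOIN`; the `OUTER` keyword is optional): returns every row from the left (or right) table, with `NULL`s where the other table has no match.
- `FULL OUTER JOIN`: returns every row from both tables, matched where possible.

`UNION` is related but is not a join: it stacks the rows of two queries on top of each other instead of matching them up.

NOTE: The most common join is the inner join, and a bare `JOIN` means `INNER JOIN`.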
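A sketch contrasting `INNER JOIN` and `LEFT JOIN` on hypothetical `users` and `posts` tables in SQLite. Ben has no posts, so he disappears from the inner join but survives the left join with `NULL` (Python `None`) in the post column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, user_id INTEGER REFERENCES users(id));
    INSERT INTO users (name) VALUES ('Ana'), ('Ben');  -- Ben has no posts
    INSERT INTO posts (title, user_id) VALUES ('Hello SQL', 1), ('Joins 101', 1);
""")

# INNER JOIN keeps only users with at least one matching post
inner = conn.execute("""
    SELECT users.name, posts.title
    FROM users
    INNER JOIN posts ON posts.user_id = users.id
    ORDER BY posts.id
""").fetchall()

# LEFT JOIN keeps every user; posts columns are NULL where nothing matches
left = conn.execute("""
    SELECT users.name, posts.title
    FROM users
    LEFT JOIN posts ON posts.user_id = users.id
    ORDER BY users.id, posts.id
""").fetchall()

print(inner)  # [('Ana', 'Hello SQL'), ('Ana', 'Joins 101')]
print(left)   # [('Ana', 'Hello SQL'), ('Ana', 'Joins 101'), ('Ben', None)]
```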

**Give me the names of every food item with their associated restaurant. Be sure to alias the columns.**

<details>
    <summary>ERD</summary>
    <img src="../erd.png">
</details>

In [30]:
sql = """
SELECT f.name AS food, r.name AS restaurant
FROM foods f
INNER JOIN restaurants r ON r.id = f.restaurant_id
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | food | restaurant |
|---|---|---|
| 0 | Boneless Buffalo Wings w/ Bleu Cheese | Chili's |
| 1 | Bottomless Tostada Chips w/ Salsa | Chili's |
| 2 | Classic Nachos - Beef (12) | Chili's |
| 3 | Classic Nachos - Beef (8) | Chili's |
| 4 | Classic Nachos - Chicken (12) | Chili's |


**Give me the names of every food item from Burger King.**

In [31]:
sql = """
SELECT f.*, r.name AS restaurant
FROM foods f
INNER JOIN restaurants r ON r.id = f.restaurant_id
WHERE r.name = 'Burger King'
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | id | name | calories | carbs | fat | restaurant_id | inserted_at | updated_at | restaurant |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 519 | Double Whopper® Sandwich w/o Mayo | 740 | 51 | 39 | 7 | 2020-01-17 21:14:24 | 2020-01-17 21:14:24 | Burger King |
| 1 | 529 | Mustard Whopper® Sandwich | 530 | 52 | 23 | 7 | 2020-01-17 21:14:24 | 2020-01-17 21:14:24 | Burger King |
| 2 | 539 | Bacon Cheeseburger | 330 | 28 | 16 | 7 | 2020-01-17 21:14:24 | 2020-01-17 21:14:24 | Burger King |
| 3 | 549 | Original Chicken Sandwich | 630 | 46 | 39 | 7 | 2020-01-17 21:14:24 | 2020-01-17 21:14:24 | Burger King |
| 4 | 559 | Chicken Tenders® (20 pc) | 950 | 50 | 55 | 7 | 2020-01-17 21:14:24 | 2020-01-17 21:14:24 | Burger King |


**Give me the names of every food item with their associated category.**

In [32]:
sql = """
SELECT f.name AS food, c.name AS category
FROM foods f
INNER JOIN categories_foods cf ON cf.food_id = f.id
INNER JOIN categories c ON c.id = cf.category_id
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | food | category |
|---|---|---|
| 0 | Boneless Buffalo Wings w/ Bleu Cheese | Appetizers |
| 1 | Bottomless Tostada Chips w/ Salsa | Appetizers |
| 2 | Classic Nachos - Beef (12) | Appetizers |
| 3 | Classic Nachos - Beef (8) | Appetizers |
| 4 | Classic Nachos - Chicken (12) | Appetizers |


**Give me the `name`, `restaurant`, `category` and `calories` of every food item. Sorted alphabetically by `restaurant`, then by `calories` in descending order.**

In [33]:
sql = """
SELECT f.name AS food, f.calories, c.name AS category, r.name AS restaurant
FROM foods f
INNER JOIN categories_foods cf ON cf.food_id = f.id
INNER JOIN categories c ON c.id = cf.category_id
INNER JOIN restaurants r ON r.id = f.restaurant_id
ORDER BY r.name, f.calories DESC
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | food | calories | category | restaurant |
|---|---|---|---|---|
| 0 | A&W® Root Beer | 1760 | Drinks | A&W Restaurants |
| 1 | Reese's® Peanut Butter Fudge Blendrrr | 1360 | Desserts | A&W Restaurants |
| 2 | Cheese Curds (Large) | 1140 | Sides | A&W Restaurants |
| 3 | Chocolate Fudge Blendrrr | 1010 | Desserts | A&W Restaurants |
| 4 | Orange Freeze (medium) | 970 | Shakes | A&W Restaurants |


**Give me the names of every food item in the Desserts category.**

In [34]:
sql = """
SELECT f.name AS food, c.name AS category
FROM foods f
INNER JOIN categories_foods cf ON cf.food_id = f.id
INNER JOIN categories c ON c.id = cf.category_id
WHERE c.name = 'Desserts'
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | food | category |
|---|---|---|
| 0 | Brownie Sundae | Desserts |
| 1 | Cheesecake | Desserts |
| 2 | Chocolate Chip Paradise Pie | Desserts |
| 3 | Molten Chocolate Cake | Desserts |
| 4 | Soft Serve Cone | Desserts |


**Give me the `name`, `restaurant` and `calories` of the top ten most caloric "Kid's Meals" (this is a category)**

In [35]:
sql = """
SELECT f.name AS food, f.calories, c.name AS category, r.name AS restaurant
FROM foods f
INNER JOIN categories_foods cf ON cf.food_id = f.id
INNER JOIN categories c ON c.id = cf.category_id
INNER JOIN restaurants r ON r.id = f.restaurant_id
WHERE c.name = 'Kid''s Meals'
ORDER BY f.calories DESC
LIMIT 10
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | food | calories | category | restaurant |
|---|---|---|---|---|
| 0 | The Lorax's Breakfast with (2) Truffula Chip P... | 1150 | Kid's Meals | IHOP |
| 1 | Captain's Catch | 1060 | Kid's Meals | Joe's Crab Shack |
| 2 | The Lorax's Breakfast with (2) Buttermilk Panc... | 1060 | Kid's Meals | IHOP |
| 3 | The Lorax's Breakfast with (1) Rooty Tooty Bar... | 1050 | Kid's Meals | IHOP |
| 4 | Peanut Butter & Jelly Sandwich on Harvest Brea... | 810 | Kid's Meals | Corner Bakery Cafe |


# Aggregating
---

Sometimes we might want to reduce our query to a single value. For example, we may want to know how many users are in our database:

```SQL
SELECT COUNT(users.id)
FROM users
```

The `COUNT` in the previous query is an aggregate function. The most common aggregate functions are:

- `COUNT`
- `AVG`
- `MIN`
- `MAX`
- `SUM`
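A sketch of the aggregate functions over a hypothetical `users` table in SQLite where one salary is missing. `COUNT(*)` counts rows, while `COUNT(column)` and the other aggregates skip `NULL`s. Assuming `id` is a primary key (and so never `NULL`), `COUNT(users.id)` counts every user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO users (name, salary) VALUES (?, ?)",
    [("Ana", 40000), ("Ben", 60000), ("Cy", None)],  # Cy's salary is NULL
)

# Each aggregate collapses the whole table into one value
row = conn.execute("""
    SELECT COUNT(*), COUNT(users.salary), SUM(users.salary),
           AVG(users.salary), MIN(users.salary), MAX(users.salary)
    FROM users
""").fetchone()
print(row)  # (3, 2, 100000, 50000.0, 40000, 60000)
```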

**What is the total number of calories in the entire `foods` table?**

<details>
    <summary>ERD</summary>
    <img src="../erd.png">
</details>

In [36]:
sql = """
SELECT SUM(f.calories)
FROM foods f
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | sum |
|---|---|
| 0 | 2626332 |


# `GROUP BY`
---

Often we'll want to group our data into buckets and then run an aggregate function on each bucket.

```SQL
-- Returns how much each user spends on average
SELECT users.id, AVG(payments.amount)
FROM users
INNER JOIN payments ON payments.user_id = users.id
GROUP BY users.id;
```

NOTE: Every column you're returning that isn't being aggregated needs to be in the `GROUP BY` clause:

```SQL
-- Returns how much each user spends on average
SELECT users.id, users.name, AVG(payments.amount)
FROM users
INNER JOIN payments ON payments.user_id = users.id
GROUP BY users.id, users.name
```
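A sketch of the per-user average with hypothetical `users` and `payments` tables in SQLite. One caution: SQLite is more lenient than PostgreSQL and will accept non-aggregated columns that are missing from `GROUP BY`, so the rule above is worth following even when SQLite doesn't complain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE payments (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users (name) VALUES ('Ana'), ('Ben');
    INSERT INTO payments (user_id, amount) VALUES (1, 10.0), (1, 30.0), (2, 5.0);
""")

# One output row per group (user); AVG runs over each user's payments separately
rows = conn.execute("""
    SELECT users.id, users.name, AVG(payments.amount)
    FROM users
    INNER JOIN payments ON payments.user_id = users.id
    GROUP BY users.id, users.name
    ORDER BY users.id
""").fetchall()
print(rows)  # [(1, 'Ana', 20.0), (2, 'Ben', 5.0)]
```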

**Return the average number of calories for each restaurant, sorted alphabetically**

<details>
    <summary>ERD</summary>
    <img src="../erd.png">
</details>

In [37]:
sql = """
SELECT r.name, AVG(f.calories)
FROM foods f
INNER JOIN restaurants r ON r.id = f.restaurant_id
GROUP BY r.name
ORDER BY r.name
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | name | avg |
|---|---|---|
| 0 | A&W Restaurants | 468.278689 |
| 1 | Applebee's | 684.75 |
| 2 | Arby's | 419.893617 |
| 3 | Atlanta Bread Company | 450.697674 |
| 4 | Bojangle's Famous Chicken 'n Biscuits | 745.068493 |


# `HAVING`
---

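Sometimes you might want to use the result of an aggregate function as a filter. We can do this with `HAVING`, which is like `WHERE` but for aggregates: `WHERE` filters individual rows before they are grouped, while `HAVING` filters the groups after the aggregates are computed.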

```SQL
-- Returns the users who average more than $1,000 per purchase
SELECT users.id, users.name, AVG(payments.amount)
FROM users
INNER JOIN payments ON payments.user_id = users.id
GROUP BY users.id, users.name
HAVING AVG(payments.amount) > 1000;
```
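A sketch contrasting `HAVING` with `WHERE`, using hypothetical `users` and `payments` tables in SQLite. `HAVING` keeps only Ana, because her average payment exceeds 1,000; the `WHERE` version first discards every payment of 1,000 or less and then averages what remains, so both users appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE payments (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users (name) VALUES ('Ana'), ('Ben');
    INSERT INTO payments (user_id, amount) VALUES (1, 1500.0), (1, 900.0), (2, 200.0), (2, 1400.0);
""")

# HAVING filters groups after AVG has been computed (Ana 1200, Ben 800)
having = conn.execute("""
    SELECT users.name, AVG(payments.amount)
    FROM users
    INNER JOIN payments ON payments.user_id = users.id
    GROUP BY users.id, users.name
    HAVING AVG(payments.amount) > 1000
""").fetchall()

# WHERE filters individual payments before grouping, which answers a different question
where = conn.execute("""
    SELECT users.name, AVG(payments.amount)
    FROM users
    INNER JOIN payments ON payments.user_id = users.id
    WHERE payments.amount > 1000
    GROUP BY users.id, users.name
    ORDER BY users.id
""").fetchall()

print(having)  # [('Ana', 1200.0)]
print(where)   # [('Ana', 1500.0), ('Ben', 1400.0)]
```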

**Give me the `name` and average `calories` for all `restaurants` with more than 700 calories per menu item on average.**

<details>
    <summary>ERD</summary>
    <img src="../erd.png">
</details>

In [38]:
sql = """
SELECT r.name, AVG(f.calories)
FROM foods f
INNER JOIN restaurants r ON r.id = f.restaurant_id
GROUP BY r.name
HAVING AVG(f.calories) > 700
ORDER BY r.name
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | name | avg |
|---|---|---|
| 0 | Bojangle's Famous Chicken 'n Biscuits | 745.068493 |
| 1 | Chili's | 718.771429 |
| 2 | Joe's Crab Shack | 871.293103 |


**What `categories` average more than 750 `calories` per item? Order your results from most caloric to least.**

In [39]:
sql = """
SELECT c.name, AVG(f.calories)
FROM foods f
INNER JOIN categories_foods cf ON cf.food_id = f.id
INNER JOIN categories c ON c.id = cf.category_id
GROUP BY c.name
HAVING AVG(f.calories) > 750
ORDER BY AVG(f.calories) DESC
"""

df = pd.read_sql_query(sql, engine)
df.head()

| | name | avg |
|---|---|---|
| 0 | Ribs | 1499.0 |
| 1 | Appetizers | 798.038298 |
| 2 | Seafood | 789.284314 |
| 3 | Burgers | 771.37037 |
