# SQL tutorial using a database of drinks and bars

## Things we will be doing:
- Creating a PostgreSQL database using the `psql` shell tool
- Importing existing data to fill the database (the `drinks.sql` file in this repository)
- Using the `psycopg2` Python package to interact with the database (we could have alternatively used a dedicated SQL IDE like SQL Workbench)
- Querying the database to explore SQL concepts such as:
    - Filtering, ordering, limiting, etc.
    - Joining tables
    - Grouping records
    - Aggregate functions
    
## Requirements:
- Install PostgreSQL (and potentially give the `postgres` user a password)
- Install the `psycopg2` Python package in your Anaconda virtual environment of choice. **Note**: installing with pip may build the package from source, which needs a C compiler plus the Python and PostgreSQL (`libpq`) development headers. Look up the package names for your operating system.
    - Using conda: `conda install -c anaconda psycopg2`
    - Using pip: `pip install psycopg2`

# Creating the database using `psql`

1. Start up your terminal.
2. Switch to the "postgres" user that was created automatically when installing PostgreSQL (find the equivalent command for Windows). \
`sudo -u postgres -i`
3. Start the `psql` shell tool. \
`psql`
4. List the PostgreSQL databases that exist on your computer (press "q" when you want to exit the list view). \
`\l`
5. Create a new empty database called "drinks". \
`CREATE DATABASE drinks;`
6. Confirm that the "drinks" database was created (press "q" when you want to exit the list view). \
`\l`
7. We can now leave the `psql` tool. \
`exit`
8. And also log out of the "postgres" user. \
`exit`

# Create a Python connection to the database and load in some data from the `drinks.sql` file

In [3]:
import psycopg2
import pandas as pd

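# Create a connection to the drinks database we just made
# (substitute the password you set for the "postgres" user)
con = psycopg2.connect(database='drinks', user='postgres', password='postgres',
                       host='127.0.0.1', port='5432')
cur = con.cursor()

# Run the SQL script that creates and fills the tables, then commit.
# Running this a second time raises DuplicateTable because the tables
# already exist; call con.rollback() before issuing further queries.
with open('drinks.sql', 'r') as sql_file:
    cur.execute(sql_file.read())
con.commit()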

DuplicateTable: relation "drink_info" already exists


In [2]:
# A triple-quoted string can span several lines and contain single or
# double quotes without escaping them, which is convenient for SQL queries
query = """
SELECT *
FROM drinks
LIMIT 5;
"""

# First execute the query, then fetch its result.
# Note: cur.execute() returns None, so chaining
# cur.execute(query).fetchall() in a single statement
# raises an AttributeError.
cur.execute(query)
response = cur.fetchall()

# Let's look at the format of the response we get back
print(f'This is the raw response we get back:\n{response}\n')

# You can alternatively use pandas to get a nicely formatted DataFrame
pandas_response = pd.read_sql_query(query, con)
print('This is the nicely structured pandas response:\n')
pandas_response

UndefinedTable: relation "drinks" does not exist
LINE 3: FROM drinks
             ^
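The raw response above is a list of tuples, one per row, with no column names attached. Here is a minimal sketch of how to recover the names from the cursor itself. It uses the standard-library `sqlite3` module purely as a stand-in so that it runs without a PostgreSQL server (both drivers follow the Python DB-API, where `cursor.description` lists the result columns). The `con_demo` and `cur_demo` names and the three toy rows are assumptions made for illustration, not the contents of `drinks.sql`.

```python
import sqlite3

# sqlite3 is only a stand-in here; the rows are toy data for illustration
con_demo = sqlite3.connect(":memory:")
cur_demo = con_demo.cursor()
cur_demo.execute("CREATE TABLE drinks (drink_id TEXT, type TEXT)")
cur_demo.executemany(
    "INSERT INTO drinks VALUES (?, ?)",
    [("drink 1", "cocktail"), ("drink 2", "wine"), ("drink 3", "rum")],
)

# Execute first, then fetch: fetchall() returns a list of tuples
cur_demo.execute("SELECT drink_id, type FROM drinks ORDER BY drink_id LIMIT 2")
rows = cur_demo.fetchall()

# Each entry of cursor.description describes one column; item 0 is its name
columns = [col[0] for col in cur_demo.description]

# Pair every value with its column name
records = [dict(zip(columns, row)) for row in rows]

print(rows)
print(records)
con_demo.close()
```

This is roughly the bookkeeping that `pd.read_sql_query` saves us from doing by hand.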


In [13]:
# Before doing anything else, let's wrap the steps above in a function
def execute_query(query_string, return_pandas=True):
    """Run a query and return a DataFrame (default) or a list of row tuples."""
    if return_pandas:
        response = pd.read_sql_query(query_string, con)
    else:
        cur.execute(query_string)
        response = cur.fetchall()
    return response


# Let's try out our function to make sure it does
# the same as what we have above
execute_query(query)

Unnamed: 0,drink_id,type
0,drink 1,cocktail
1,drink 2,wine
2,drink 3,rum
3,drink 4,cocktail
4,drink 5,cocktail


# Some SQL references before we get started
![](sql-cheat-sheet.png)

## Order of operations
![](order_of_operations.png)
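To make the order of operations concrete, here is a small sketch that annotates each clause with the step at which it is usually evaluated logically. It uses the standard-library `sqlite3` module with an in-memory table as a stand-in for our PostgreSQL database; the `con_demo` name and the toy rows are assumptions made up for illustration only.

```python
import sqlite3

# In-memory stand-in table with toy rows (not the data from drinks.sql)
con_demo = sqlite3.connect(":memory:")
con_demo.executescript("""
CREATE TABLE menu_items (drink_id TEXT, bar TEXT, price REAL);
INSERT INTO menu_items VALUES
    ('drink 1', 'bar 1', 4.0),
    ('drink 2', 'bar 1', 12.0),
    ('drink 1', 'bar 2', 6.0),
    ('drink 3', 'bar 2', 8.0),
    ('drink 2', 'bar 3', 20.0),
    ('drink 3', 'bar 4', 6.0);
""")

query_demo = """
SELECT bar, AVG(price) AS avg_price  -- 5. compute the output columns
FROM menu_items                      -- 1. choose the source rows
WHERE price > 5                      -- 2. filter individual rows
GROUP BY bar                         -- 3. collapse the rows into groups
HAVING AVG(price) < 15               -- 4. filter whole groups
ORDER BY avg_price DESC              -- 6. sort the result
LIMIT 2;                             -- 7. keep the first rows only
"""

cur_demo = con_demo.cursor()
cur_demo.execute(query_demo)
result = cur_demo.fetchall()
print(result)  # [('bar 1', 12.0), ('bar 2', 7.0)]
con_demo.close()
```

Notice that the average for bar 1 is 12.0 rather than 8.0: `WHERE` removed the 4.0 row before the groups were formed, whereas `HAVING` removed bar 3 only after its average had been computed.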

# Time to start writing some queries

## Problem 1
Get the bar name and average price of drinks at each bar.
<table style="border: 5px; width: 100%">
 <tr>
    <td><b style="font-size:30px">menu_items</b></td>
    <td><b style="font-size:30px">orders</b></td>
    <td><b style="font-size:30px">drinks</b></td>
 </tr>
 <tr>
    <td style="font-size:20px"><b>drink_id</b>: string</td>
    <td style="font-size:20px"><b>drink_id</b>: string</td>
    <td style="font-size:20px"><b>drink_id</b>: string</td>
 </tr>
 <tr>
    <td style="font-size:20px"><b>bar</b>: string</td>
    <td style="font-size:20px"><b>bar</b>: string</td>
    <td style="font-size:20px"><b>type</b>: string</td>
 </tr>
 <tr>
    <td style="font-size:20px"><b>price</b>: real</td>
    <td style="font-size:20px"><b>person</b>: string</td>
    <td style="font-size:20px"></td>
 </tr>
 <tr>
    <td style="font-size:20px"></td>
    <td style="font-size:20px"><b>date</b>: string</td>
    <td style="font-size:20px"></td>
 </tr>
 <tr>
    <td style="font-size:20px"></td>
    <td style="font-size:20px"><b>quantity</b>: integer</td>
    <td style="font-size:20px"></td>
 </tr>
</table>

In [None]:
query = '''
SELECT ...
'''
execute_query(query)


## Problem 2
Get the bars with the top 5 average prices.
<table style="border: 5px; width: 100%">
 <tr>
    <td><b style="font-size:30px">menu_items</b></td>
    <td><b style="font-size:30px">orders</b></td>
    <td><b style="font-size:30px">drinks</b></td>
 </tr>
 <tr>
    <td style="font-size:20px"><b>drink_id</b>: string</td>
    <td style="font-size:20px"><b>drink_id</b>: string</td>
    <td style="font-size:20px"><b>drink_id</b>: string</td>
 </tr>
 <tr>
    <td style="font-size:20px"><b>bar</b>: string</td>
    <td style="font-size:20px"><b>bar</b>: string</td>
    <td style="font-size:20px"><b>type</b>: string</td>
 </tr>
 <tr>
    <td style="font-size:20px"><b>price</b>: real</td>
    <td style="font-size:20px"><b>person</b>: string</td>
    <td style="font-size:20px"></td>
 </tr>
 <tr>
    <td style="font-size:20px"></td>
    <td style="font-size:20px"><b>date</b>: string</td>
    <td style="font-size:20px"></td>
 </tr>
 <tr>
    <td style="font-size:20px"></td>
    <td style="font-size:20px"><b>quantity</b>: integer</td>
    <td style="font-size:20px"></td>
 </tr>
</table>

## Problem 3
Get the bar with the cheapest drink, along with the drink and price.
<table style="border: 5px; width: 100%">
 <tr>
    <td><b style="font-size:30px">menu_items</b></td>
    <td><b style="font-size:30px">orders</b></td>
    <td><b style="font-size:30px">drinks</b></td>
 </tr>
 <tr>
    <td style="font-size:20px"><b>drink_id</b>: string</td>
    <td style="font-size:20px"><b>drink_id</b>: string</td>
    <td style="font-size:20px"><b>drink_id</b>: string</td>
 </tr>
 <tr>
    <td style="font-size:20px"><b>bar</b>: string</td>
    <td style="font-size:20px"><b>bar</b>: string</td>
    <td style="font-size:20px"><b>type</b>: string</td>
 </tr>
 <tr>
    <td style="font-size:20px"><b>price</b>: real</td>
    <td style="font-size:20px"><b>person</b>: string</td>
    <td style="font-size:20px"></td>
 </tr>
 <tr>
    <td style="font-size:20px"></td>
    <td style="font-size:20px"><b>date</b>: string</td>
    <td style="font-size:20px"></td>
 </tr>
 <tr>
    <td style="font-size:20px"></td>
    <td style="font-size:20px"><b>quantity</b>: integer</td>
    <td style="font-size:20px"></td>
 </tr>
</table>

## Problem 4
Get the number of beers sold by each bar in descending order.
<table style="border: 5px; width: 100%">
 <tr>
    <td><b style="font-size:30px">menu_items</b></td>
    <td><b style="font-size:30px">orders</b></td>
    <td><b style="font-size:30px">drinks</b></td>
 </tr>
 <tr>
    <td style="font-size:20px"><b>drink_id</b>: string</td>
    <td style="font-size:20px"><b>drink_id</b>: string</td>
    <td style="font-size:20px"><b>drink_id</b>: string</td>
 </tr>
 <tr>
    <td style="font-size:20px"><b>bar</b>: string</td>
    <td style="font-size:20px"><b>bar</b>: string</td>
    <td style="font-size:20px"><b>type</b>: string</td>
 </tr>
 <tr>
    <td style="font-size:20px"><b>price</b>: real</td>
    <td style="font-size:20px"><b>person</b>: string</td>
    <td style="font-size:20px"></td>
 </tr>
 <tr>
    <td style="font-size:20px"></td>
    <td style="font-size:20px"><b>date</b>: string</td>
    <td style="font-size:20px"></td>
 </tr>
 <tr>
    <td style="font-size:20px"></td>
    <td style="font-size:20px"><b>quantity</b>: integer</td>
    <td style="font-size:20px"></td>
 </tr>
</table>

## Problem 5
For each person, find the bar they visit, and the type(s) and price(s) of the drink(s) they drink during those visits.
<table style="border: 5px; width: 100%">
 <tr>
    <td><b style="font-size:30px">menu_items</b></td>
    <td><b style="font-size:30px">orders</b></td>
    <td><b style="font-size:30px">drinks</b></td>
 </tr>
 <tr>
    <td style="font-size:20px"><b>drink_id</b>: string</td>
    <td style="font-size:20px"><b>drink_id</b>: string</td>
    <td style="font-size:20px"><b>drink_id</b>: string</td>
 </tr>
 <tr>
    <td style="font-size:20px"><b>bar</b>: string</td>
    <td style="font-size:20px"><b>bar</b>: string</td>
    <td style="font-size:20px"><b>type</b>: string</td>
 </tr>
 <tr>
    <td style="font-size:20px"><b>price</b>: real</td>
    <td style="font-size:20px"><b>person</b>: string</td>
    <td style="font-size:20px"></td>
 </tr>
 <tr>
    <td style="font-size:20px"></td>
    <td style="font-size:20px"><b>date</b>: string</td>
    <td style="font-size:20px"></td>
 </tr>
 <tr>
    <td style="font-size:20px"></td>
    <td style="font-size:20px"><b>quantity</b>: integer</td>
    <td style="font-size:20px"></td>
 </tr>
</table>

# Below are my answers

## Answer 1

In [14]:
execute_query("""
SELECT bar, AVG(price) AS avg_price
FROM menu_items
GROUP BY bar;
""")

Unnamed: 0,bar,avg_price
0,bar 16,121.364365
1,bar 12,152.071032
2,bar 7,61.567487
3,bar 11,31.562558
4,bar 3,129.809513
5,bar 18,105.214779
6,bar 19,18.224434
7,bar 17,107.329703
8,bar 13,150.560439
9,bar 8,67.919044


## Answer 2

In [15]:
execute_query("""
SELECT bar, AVG(price) AS avg_price
FROM menu_items
GROUP BY bar
ORDER BY avg_price DESC
LIMIT 5;
""")

Unnamed: 0,bar,avg_price
0,bar 12,152.071032
1,bar 13,150.560439
2,bar 3,129.809513
3,bar 16,121.364365
4,bar 17,107.329703


## Answer 3

In [16]:
execute_query("""
SELECT bar, drink_id, price
FROM menu_items
ORDER BY price ASC
LIMIT 1;
""")

Unnamed: 0,bar,drink_id,price
0,bar 18,drink 43,3.477886
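Answer 3 returns exactly one row, so if several menu items shared the lowest price, `LIMIT 1` would show only one of them. Here is a sketch of one possible alternative that compares each price with the overall minimum in a subquery. It runs against an in-memory `sqlite3` table as a stand-in for our database, and the toy rows (including the deliberate tie) are made up for illustration.

```python
import sqlite3

# In-memory stand-in table; two toy rows deliberately share the lowest price
con_demo = sqlite3.connect(":memory:")
con_demo.executescript("""
CREATE TABLE menu_items (drink_id TEXT, bar TEXT, price REAL);
INSERT INTO menu_items VALUES
    ('drink 1', 'bar 1', 3.5),
    ('drink 2', 'bar 2', 3.5),
    ('drink 3', 'bar 2', 9.0);
""")

cur_demo = con_demo.cursor()
cur_demo.execute("""
SELECT bar, drink_id, price
FROM menu_items
WHERE price = (SELECT MIN(price) FROM menu_items)
ORDER BY bar, drink_id;
""")
cheapest = cur_demo.fetchall()
print(cheapest)  # [('bar 1', 'drink 1', 3.5), ('bar 2', 'drink 2', 3.5)]
con_demo.close()
```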


## Answer 4

In [17]:
execute_query("""
SELECT orders.bar, SUM(orders.quantity) AS beers_sold
FROM orders
JOIN drinks ON drinks.drink_id = orders.drink_id
WHERE drinks.type LIKE '%beer%'
GROUP BY orders.bar
ORDER BY beers_sold DESC;
""")

Unnamed: 0,bar,beers_sold
0,bar 20,176
1,bar 5,111
2,bar 3,108
3,bar 11,80
4,bar 2,79
5,bar 10,66
6,bar 17,36
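Because the `WHERE` clause drops every non-beer order before grouping, bars with no beer orders do not appear in the result at all. If you would rather list them with a count of zero, one option is to move the condition inside the aggregate. The sketch below shows both variants side by side on an in-memory `sqlite3` stand-in; the toy rows and the `con_demo` name are assumptions for illustration only.

```python
import sqlite3

# In-memory stand-in tables with toy rows (not the data from drinks.sql)
con_demo = sqlite3.connect(":memory:")
con_demo.executescript("""
CREATE TABLE drinks (drink_id TEXT, type TEXT);
CREATE TABLE orders (drink_id TEXT, bar TEXT, quantity INTEGER);
INSERT INTO drinks VALUES ('drink 1', 'beer'), ('drink 2', 'wine');
INSERT INTO orders VALUES
    ('drink 1', 'bar 1', 3),
    ('drink 1', 'bar 1', 2),
    ('drink 2', 'bar 2', 4),
    ('drink 1', 'bar 3', 1);
""")
cur_demo = con_demo.cursor()

# Variant used in Answer 4: non-beer rows are filtered out before grouping
cur_demo.execute("""
SELECT o.bar, SUM(o.quantity) AS beers_sold
FROM orders AS o
JOIN drinks AS d ON d.drink_id = o.drink_id
WHERE d.type LIKE '%beer%'
GROUP BY o.bar
ORDER BY beers_sold DESC, o.bar;
""")
only_beer_bars = cur_demo.fetchall()

# Conditional aggregation: every bar is kept, non-beer rows count as 0
cur_demo.execute("""
SELECT o.bar,
       SUM(CASE WHEN d.type LIKE '%beer%' THEN o.quantity ELSE 0 END) AS beers_sold
FROM orders AS o
JOIN drinks AS d ON d.drink_id = o.drink_id
GROUP BY o.bar
ORDER BY beers_sold DESC, o.bar;
""")
all_bars = cur_demo.fetchall()

print(only_beer_bars)  # [('bar 1', 5), ('bar 3', 1)]
print(all_bars)        # [('bar 1', 5), ('bar 3', 1), ('bar 2', 0)]
con_demo.close()
```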


## Answer 5

In [18]:
execute_query("""
SELECT o.person, o.bar, d.type, d.drink_id, m.price
FROM orders AS o
JOIN menu_items AS m ON (o.drink_id = m.drink_id AND o.bar = m.bar)
JOIN drinks AS d ON o.drink_id = d.drink_id
GROUP BY o.person, o.bar, d.type, d.drink_id, m.price;
""")

Unnamed: 0,person,bar,type,drink_id,price
0,person 1,bar 14,wine,drink 37,304.054170
1,person 1,bar 14,wine,drink 6,6.283809
2,person 1,bar 18,soda,drink 43,3.477886
3,person 1,bar 19,cocktail,drink 1,11.775652
4,person 1,bar 19,whisky,drink 42,34.507183
...,...,...,...,...,...
1363,person 98,bar 7,vodka,drink 15,47.559166
1364,person 98,bar 7,whisky,drink 28,91.364480
1365,person 99,bar 11,beer,drink 38,6.242310
1366,person 99,bar 11,whisky,drink 42,46.720566
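The `GROUP BY` in Answer 5 has no aggregate function; listing every selected column there simply removes duplicate rows (for example, the same person ordering the same drink at the same bar on two different dates). `SELECT DISTINCT` expresses the same intent more directly. The sketch below checks that the two forms agree on an in-memory `sqlite3` stand-in; the toy orders and dates are invented for illustration.

```python
import sqlite3

# In-memory stand-in table; person 1 orders the same drink twice (toy data)
con_demo = sqlite3.connect(":memory:")
con_demo.executescript("""
CREATE TABLE orders (drink_id TEXT, bar TEXT, person TEXT, date TEXT, quantity INTEGER);
INSERT INTO orders VALUES
    ('drink 1', 'bar 1', 'person 1', '2020-01-01', 1),
    ('drink 1', 'bar 1', 'person 1', '2020-01-02', 2),
    ('drink 2', 'bar 1', 'person 2', '2020-01-01', 1);
""")
cur_demo = con_demo.cursor()

# Deduplicate by grouping on every selected column
cur_demo.execute("""
SELECT person, bar, drink_id
FROM orders
GROUP BY person, bar, drink_id
ORDER BY person, bar, drink_id;
""")
grouped = cur_demo.fetchall()

# Deduplicate with DISTINCT
cur_demo.execute("""
SELECT DISTINCT person, bar, drink_id
FROM orders
ORDER BY person, bar, drink_id;
""")
distinct = cur_demo.fetchall()

print(grouped == distinct)  # True
print(distinct)
con_demo.close()
```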


# Remember to close your database connection when you're done
**Note**: After you run this cell, any further query will fail until you
re-run the cell that initialized the connection.

In [19]:
con.close()
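If you would rather not rely on remembering that last cell, `contextlib.closing` can close the connection for you, even if a query raises an error. Here is a minimal sketch using the standard-library `sqlite3` module as a stand-in; since `closing` only calls `.close()`, the same pattern should carry over to a `psycopg2` connection. Note that, as far as I know, a plain `with con:` block in `psycopg2` only commits or rolls back the transaction and does not close the connection.

```python
import sqlite3
from contextlib import closing

# closing() calls .close() on exit, whether or not an exception occurred
with closing(sqlite3.connect(":memory:")) as con_demo:
    with closing(con_demo.cursor()) as cur_demo:
        cur_demo.execute("SELECT 1")
        value = cur_demo.fetchone()[0]

# The connection is closed now, so any further query fails
try:
    con_demo.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(value, still_open)  # 1 False
```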