# DS3 SQL Workshop

### Monday, January 28, 2023
### Baraa Zekeria and Ojas Vashishtha

## Setup

In [None]:
# import sys first so that sys.executable is defined for the pip commands below
import sys

# upgrade pip
!{sys.executable} -m pip install --upgrade pip

# install external libraries
!{sys.executable} -m pip install pandas # data manipulation & analysis
# sqlite3 ships with Python's standard library, so it needs no pip install

# import libraries
import pandas as pd
import sqlite3

# notebook configurations
pd.options.display.max_colwidth = 1000

import warnings
warnings.filterwarnings("ignore")

print("\n**ALL LIBRARIES IMPORTED SUCCESSFULLY**")

## Connect to SQLite Database

### Step 1: Import Library

In [None]:
import sqlite3

### Step 2: Create a connection

In [None]:
connection = sqlite3.connect("ds3.db")


### Step 3: Create a cursor object

In [None]:
crsr = connection.cursor()
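
A cursor is the object that sends SQL statements to the database and hands back the resulting rows. The following is a minimal sketch of that round trip, assuming a throwaway in-memory database (`:memory:`) and a hypothetical `demo` table rather than `ds3.db`; the names `demo_connection`, `demo_crsr`, and `demo` are for illustration only.

```python
import sqlite3

# in-memory database so nothing is written to disk (assumption: not ds3.db)
demo_connection = sqlite3.connect(":memory:")
demo_crsr = demo_connection.cursor()

# hypothetical table, for illustration only
demo_crsr.execute("CREATE TABLE demo (name TEXT, score INTEGER)")
demo_crsr.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("a", 1), ("b", 2), ("c", 3)],
)
demo_connection.commit()

# run a query through the cursor, then fetch every result row as a tuple
demo_crsr.execute("SELECT name, score FROM demo WHERE score >= 2 ORDER BY score")
rows = demo_crsr.fetchall()
print(rows)

demo_connection.close()
```

The rest of this notebook mostly uses `pd.read_sql`, which runs the query and returns a DataFrame in one call, so the cursor is rarely needed directly.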

### Step 4: Convert pandas DataFrames to SQL tables

In [None]:
independents = pd.read_csv("data/Independents100.csv")
independents = independents.assign(sales = independents["sales"].apply(int))
ratings = pd.read_csv("data/independents_ratings.csv")

In [None]:
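# if_exists = "replace" lets this cell be re-run without a "table already exists" error
independents.to_sql("independents", con = connection, if_exists = "replace")
ratings.to_sql("ratings", con = connection, if_exists = "replace")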

#### Datasets info
- ```independents```: 100 highest-grossing independent restaurants in the US
    - Sorted by ```sales``` in *descending* order
- ```ratings```: Yelp ratings from March 2021 for restaurants in ```independents```

### Step 5: Create queries!

In [None]:
pd.read_sql('''SELECT * 
            FROM independents''', connection)

In [None]:
pd.read_sql('''SELECT * 
            FROM ratings''', connection)

## Basic Commands

| Keyword | Description
| :- | :- 
| **SELECT**|Selects data from a database
| **AS**| Renames a column or table with an alias
| **FROM**| Specifies which table to select or delete data from
| **WHERE**| Filters a result set to include only records that fulfill a specified condition
| **JOIN**| Joins tables (right, left, inner, outer, etc.)
| **GROUP BY**| Groups the result set (used with aggregate functions: ```COUNT```, ```MAX```, ```MIN```, ```SUM```, ```AVG```)
| **HAVING**| Filters groups; used instead of ```WHERE``` for conditions on aggregate functions
| **ORDER BY**| Sorts the result set in ascending or descending order
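
To see the keywords above working together, here is a small sketch on toy data. It assumes an in-memory database and two hypothetical tables, `shops` and `scores`, that are not part of the workshop datasets. It uses only the standard-library `sqlite3` module so that it runs on its own.

```python
import sqlite3

# hypothetical toy tables in an in-memory database (not ds3.db)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shops (shop TEXT, state TEXT, avg_check INTEGER);
    CREATE TABLE scores (shop TEXT, rating REAL);
    INSERT INTO shops VALUES
        ('A', 'X', 30), ('B', 'X', 50), ('C', 'Y', 80), ('D', 'Z', 20);
    INSERT INTO scores VALUES ('A', 4.0), ('B', 3.5), ('C', 4.5);
""")

# SELECT + AS + FROM + WHERE + ORDER BY:
# rename a column, keep rows under a threshold, sort descending
cheap = conn.execute("""
    SELECT shop AS name, avg_check
    FROM shops
    WHERE avg_check < 60
    ORDER BY avg_check DESC
""").fetchall()

# GROUP BY + HAVING: aggregate per state, then keep only groups
# that satisfy a condition on the aggregate
busy_states = conn.execute("""
    SELECT state, COUNT(*) AS n_shops, AVG(avg_check) AS mean_check
    FROM shops
    GROUP BY state
    HAVING COUNT(*) > 1
""").fetchall()

# JOIN (inner by default): shop 'D' has no row in scores, so it is dropped
joined = conn.execute("""
    SELECT shops.shop, shops.state, scores.rating
    FROM shops
    JOIN scores ON shops.shop = scores.shop
    ORDER BY shops.shop
""").fetchall()

print(cheap)
print(busy_states)
print(joined)
conn.close()
```

Note that `WHERE` filters individual rows before grouping, while `HAVING` filters whole groups after aggregation.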

## Ex 1: Write a query to output all of the restaurants in California and New York

|    |   rank | restaurant                                  |    sales |   avg_check | city           | state   |   meals_served |
|---:|-------:|:--------------------------------------------|---------:|------------:|:---------------|:--------|---------------:|
|  0 |      1 | Carmine's (Times Square)                    | 39080335 |          40 | New York       | N.Y.    |         469803 |
|  3 |      4 | LAVO Italian Restaurant & Nightclub         | 26916180 |          90 | New York       | N.Y.    |         198500 |
|  4 |      5 | Bryant Park Grill & Cafe                    | 26900000 |          62 | New York       | N.Y.    |         403000 |
|  8 |      9 | Balthazar                                   | 24547800 |          87 | New York       | N.Y.    |         519000 |
|  9 |     10 | Smith & Wollensky                           | 24501000 |         107 | New York       | N.Y.    |         257364 |
| ... |     ... | ...                     | ... |          ... | ...       | ...    |         ... |
| 92 |     93 | Virgil's Real Barbecue                      | 12245998 |          31 | New York       | N.Y.    |         251800 |
| 94 |     95 | Franciscan Crab Restaurant                  | 12218147 |          59 | San Francisco  | Calif.  |         240000 |
| 95 |     96 | George's at the Cove                        | 12194000 |          80 | La Jolla       | Calif.  |         250000 |
| 96 |     97 | Le Coucou                                   | 12187523 |          95 | New York       | N.Y.    |          87070 |
| 98 |     99 | Upland                                      | 11965564 |          52 | New York       | N.Y.    |         171825 |

33 rows x 7 columns

In [None]:
pd_ca_ny = independents[independents["state"].isin(["Calif.", "N.Y."])]
pd_ca_ny

In [None]:
# SQL string literals take single quotes; double quotes are meant for identifiers
sql_ca_ny = pd.read_sql("""SELECT *
                        FROM independents
                        WHERE state IN ('Calif.', 'N.Y.')""", connection)
sql_ca_ny
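
`IN` is shorthand for a chain of `OR` equality checks. The sketch below illustrates this on a hypothetical `places` table in an in-memory database (not the workshop data). It also shows one way to pass the list of values as query parameters instead of typing them into the SQL string.

```python
import sqlite3

# hypothetical toy table, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [("P1", "Calif."), ("P2", "N.Y."), ("P3", "Fla."), ("P4", "N.Y.")],
)

# IN with a list of values
with_in = conn.execute(
    "SELECT name FROM places WHERE state IN ('Calif.', 'N.Y.') ORDER BY name"
).fetchall()

# the same filter written as chained OR conditions
with_or = conn.execute(
    "SELECT name FROM places WHERE state = 'Calif.' OR state = 'N.Y.' ORDER BY name"
).fetchall()

# the same filter with one "?" placeholder per value
wanted = ["Calif.", "N.Y."]
placeholders = ", ".join("?" for _ in wanted)
with_params = conn.execute(
    f"SELECT name FROM places WHERE state IN ({placeholders}) ORDER BY name",
    wanted,
).fetchall()

print(with_in == with_or == with_params)
conn.close()
```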

## Q1: Write a query to output the names of restaurants in California that have an average check less than 50.

|    | restaurant                    |
|---:|:------------------------------|
| 33 | Bottega Louie                 |
| 39 | Original Joe's Westlake       |
| 46 | Harris Ranch Inn & Restaurant |
| 70 | Cliff House                   |
| 78 | Paradise Cove Beach Cafe      |
| 80 | Original Joe's                |

6 rows x 1 column

In [None]:
# include Pandas code here (if needed)
...

In [None]:
# include SQL code here
...

## Q2: Write a query to merge ```independents``` and ```ratings```. Assign to ```sql_merge```.

|    |   rank | restaurant                                  |    sales |   avg_check | city            | state   |   meals_served |   rating |
|---:|-------:|:--------------------------------------------|---------:|------------:|:----------------|:--------|---------------:|---------:|
|  0 |      1 | Carmine's (Times Square)                    | 39080335 |          40 | New York        | N.Y.    |         469803 |      4   |
|  1 |      2 | The Boathouse Orlando                       | 35218364 |          43 | Orlando         | Fla.    |         820819 |      4   |
|  2 |      4 | LAVO Italian Restaurant & Nightclub         | 26916180 |          90 | New York        | N.Y.    |         198500 |      2.5 |
|  3 |      5 | Bryant Park Grill & Cafe                    | 26900000 |          62 | New York        | N.Y.    |         403000 |      3.5 |
|  4 |      6 | Gibsons Bar & Steakhouse                    | 25409952 |          80 | Chicago         | Ill.    |         348567 |      4   |
|  ... |      ... | ...              | ... |         ... | ...       | ...    |         ... |      ... |
| 86 |     95 | Franciscan Crab Restaurant                  | 12218147 |          59 | San Francisco   | Calif.  |         240000 |      3.5 |
| 87 |     96 | George's at the Cove                        | 12194000 |          80 | La Jolla        | Calif.  |         250000 |      4   |
| 88 |     97 | Le Coucou                                   | 12187523 |          95 | New York        | N.Y.    |          87070 |      4   |
| 89 |     99 | Upland                                      | 11965564 |          52 | New York        | N.Y.    |         171825 |      4   |
| 90 |    100 | Virgil's Real Barbecue                      | 11391678 |          27 | Las Vegas       | Nev.    |         208276 |      4   |

91 rows x 8 columns

In [None]:
# include Pandas code here (if needed)
...

In [None]:
# include SQL code here
...

Let's save ```sql_merge``` as a SQL table for easy access.

If you accidentally delete the code below, run the following to save your table within ```ds3.db```:

```py
sql_merge.to_sql("sql_merge", con = connection)
```

In [None]:
sql_merge.to_sql("sql_merge", con = connection)

## Q3: Write a query that finds the average check per state. Then, for the most expensive state, find the most expensive restaurant. Is that also the highest-rated restaurant for that state? Output all the columns for this question.

In [None]:
# include Pandas code here (if needed)
...

In [None]:
# include SQL code here
...

## Q4: Write your own query! Use the data given and the commands to find useful insights/something interesting in the data. Each group will share their findings!

In [None]:
...