# DS-SF-36 | 04 | Databases and Scraping | Assignment | Starter Code

## `SQLite` and Bistro

In this assignment, we explore the `bistro` dataset.  The previous assignment used `pandas`; today, we answer the same questions using `SQLite`.  In some situations, `pandas` is the better tool.  In others, `SQL` makes more sense.  As you gain experience, you'll learn which one to reach for.

> ### Question 1.  Import the `sqlite3` package.

In [11]:
import os

import pandas as pd
pd.set_option('display.max_rows', 10)
pd.set_option('display.notebook_repr_html', True)
pd.set_option('display.max_columns', 10)

import sqlite3

> ### Question 2.  Connect to the `dataset-04-bistro.db` database.  The rest of this assignment focuses on the `bistro` table.

In [12]:
db = sqlite3.connect(os.path.join('..','datasets','dataset-04-bistro.db'))
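Once a connection is open, it can help to confirm which tables and columns the database actually contains before querying it. The sketch below is an illustration only: it uses an in-memory database with a made-up, reduced `bistro` table as a stand-in for `dataset-04-bistro.db`, and relies on SQLite's built-in `sqlite_master` catalog and `PRAGMA table_info`.

```python
import sqlite3

# Hypothetical stand-in for dataset-04-bistro.db: an in-memory database
# with a tiny `bistro` table (the row is made up for illustration).
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE bistro (day TEXT, name TEXT, "check" REAL, tip REAL)')
db.execute("INSERT INTO bistro VALUES ('Sunday', 'Ann', 20.0, 3.0)")

# sqlite_master is SQLite's built-in catalog of schema objects.
tables = [row[0] for row in db.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in db.execute('PRAGMA table_info(bistro)')]

print(tables)
print(columns)
```

The same two queries should work unchanged against the real connection, since they only depend on SQLite itself.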

> ### Question 3.  How many samples (i.e., rows) are in this dataset?

In [22]:
r = pd.io.sql.read_sql(
'''
SELECT *
    FROM bistro
;
''', con = db)

In [35]:
r.index.size

244

Answer: 244
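Loading every row just to count them works for a small table, but the count can also be computed inside SQLite. A minimal sketch, assuming a made-up in-memory `bistro` table rather than the real dataset:

```python
import sqlite3

# Hypothetical toy table; the rows are invented for illustration.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE bistro (day TEXT, tip REAL)')
db.executemany('INSERT INTO bistro VALUES (?, ?)',
               [('Sunday', 1.0), ('Sunday', 2.5), ('Friday', 3.0)])

# COUNT(*) counts rows inside SQLite, so only a single number is returned.
(n_rows,) = db.execute('SELECT COUNT(*) FROM bistro').fetchone()
print(n_rows)  # 3 for the toy rows above
```

On a large table this avoids transferring all rows into a `DataFrame` only to read its length.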

> ### Question 4.  Print the first two rows of the table to the console.

In [46]:
r = pd.io.sql.read_sql(
'''
SELECT *
    FROM bistro
LIMIT 2
;
''', con = db)

In [47]:
r

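|   | index | day | time | name | gender | is_smoker | party | check | tip |
|---|-------|-----|------|------|--------|-----------|-------|-------|-----|
| 0 | 0 | Sunday | Dinner | Kimberly | Female | 0 | 2 | 16.99 | 1.01 |
| 1 | 1 | Sunday | Dinner | Nicholas | Male | 0 | 3 | 10.34 | 1.66 |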


> ### Question 5.  For which days of the week does the dataset have data?

In [57]:
r = pd.io.sql.read_sql(
'''
    SELECT DISTINCT
        b.day
    FROM 
        bistro b
;
''', con = db)

In [58]:
r

|   | day |
|---|-----|
| 0 | Sunday |
| 1 | Saturday |
| 2 | Thursday |
| 3 | Friday |


Answer: Sunday, Saturday, Thursday, Friday

> ### Question 6.  How often was the bistro patronized for each week day?

In [56]:
r = pd.io.sql.read_sql(
'''
    SELECT
        b.day,
        COUNT(*) AS count
    FROM
        bistro b
    GROUP BY
        b.day
;
''', con = db)

In [55]:
r

|   | day | count |
|---|-----|-------|
| 0 | Friday | 19 |
| 1 | Saturday | 87 |
| 2 | Sunday | 76 |
| 3 | Thursday | 62 |


Answer: 

    Saturday    87        
    Sunday      76
    Thursday    62
    Friday      19

> ### Question 7.  How much tip did waiters collect for each week day?

In [62]:
r = pd.io.sql.read_sql(
'''
    SELECT
        b.day,
        SUM(b.tip) AS tip
    FROM
        bistro b
    GROUP BY
        b.day
;
''', con = db)

In [63]:
r

|   | day | tip |
|---|-----|-----|
| 0 | Friday | 51.96 |
| 1 | Saturday | 260.4 |
| 2 | Sunday | 247.39 |
| 3 | Thursday | 171.83 |


Answer: Friday \$51.96, Saturday \$260.40, Sunday \$247.39, Thursday \$171.83

> ### Question 8.  What is the average tip per check (in absolute \$) for each week day?

In [70]:
r = pd.io.sql.read_sql(
'''
    -- Total tip and number of checks per day
    WITH total_tip AS
    (
        SELECT
            b.day,
            SUM(b.tip) AS tip,
            COUNT(b.tip) AS count
        FROM
            bistro b
        GROUP BY
            b.day
    )

    -- Average tip per check = total tip / number of checks
    SELECT
        t.day AS day,
        (t.tip / t.count) AS average_tip
    FROM
        total_tip t
;
''', con = db)

In [71]:
r

|   | day | average_tip |
|---|-----|-------------|
| 0 | Friday | 2.734737 |
| 1 | Saturday | 2.993103 |
| 2 | Sunday | 3.255132 |
| 3 | Thursday | 2.771452 |


Answer: Friday \$2.73, Saturday \$2.99, Sunday \$3.26, Thursday \$2.77 (rounded to the cent)
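Dividing a sum by a count is what the SQL aggregate `AVG()` does, so the query above could likely be written without the common table expression. A minimal sketch on a made-up in-memory table (the rows are hypothetical) showing that the two agree:

```python
import sqlite3

# Hypothetical toy table; the rows are invented for illustration.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE bistro (day TEXT, tip REAL)')
db.executemany('INSERT INTO bistro VALUES (?, ?)',
               [('Friday', 2.0), ('Friday', 4.0),
                ('Sunday', 1.0), ('Sunday', 2.0)])

# AVG(tip) and SUM(tip) / COUNT(tip) give the same value per group.
rows = db.execute('''
    SELECT day,
           AVG(tip)              AS average_tip,
           SUM(tip) / COUNT(tip) AS manual_average
    FROM bistro
    GROUP BY day
    ORDER BY day
''').fetchall()

for row in rows:
    print(row)
```

One caveat worth keeping in mind: if a column holds integers, `SUM(x) / COUNT(x)` performs integer division in SQLite, whereas `AVG(x)` always returns a floating-point value.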

> ### Question 9.  What is the average tip per check (as a percentage of the check) for each week day?

(`CHECK` is a reserved keyword in SQL; put the name between backticks, as in `` `check` ``, to reference the `check` column.)

In [73]:
r = pd.io.sql.read_sql(
'''
    -- Total tip and total check amount per day
    WITH total_tip AS
    (
        SELECT
            b.day,
            SUM(b.tip) AS tip,
            SUM(b.`check`) AS bill
        FROM
            bistro b
        GROUP BY
            b.day
    )

    -- Total tip as a percentage of the total check amount
    SELECT
        t.day AS day,
        (t.tip * 100 / t.bill) AS percentage_tip
    FROM
        total_tip t
;
''', con = db)

In [74]:
r

|   | day | percentage_tip |
|---|-----|----------------|
| 0 | Friday | 15.944519 |
| 1 | Saturday | 14.642375 |
| 2 | Sunday | 15.203791 |
| 3 | Thursday | 15.673201 |


Answer: TODO
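The query above divides the total tips by the total checks for each day. Another reading of "average tip per check" is the mean of the per-check percentages, and the two can differ. The sketch below contrasts them on a made-up two-row table (hypothetical data, with the `check` column quoted using standard double quotes):

```python
import sqlite3

# Hypothetical toy table: a 20% tip on a small check, 10% on a large one.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE bistro (day TEXT, "check" REAL, tip REAL)')
db.executemany('INSERT INTO bistro VALUES (?, ?, ?)',
               [('Friday', 10.0, 2.0), ('Friday', 40.0, 4.0)])

rows = db.execute('''
    SELECT day,
           100.0 * SUM(tip) / SUM("check") AS pct_of_totals,
           AVG(100.0 * tip / "check")      AS mean_of_pcts
    FROM bistro
    GROUP BY day
''').fetchall()

print(rows)  # pct_of_totals is 12.0, mean_of_pcts is 15.0 for these rows
```

The ratio of totals weights large checks more heavily, while the mean of percentages treats every check equally; which one is wanted depends on how the question is interpreted.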

> ### Question 10.  Are there any names in common between male and female patrons?  (E.g., `Chris` can refer to either a man or a woman)

In [87]:
r = pd.io.sql.read_sql(
'''

        SELECT DISTINCT 
           b1.name, 
           b1.gender
        FROM 
            bistro b1
            INNER JOIN bistro b2 on b1.name = b2.name and b1.gender != b2.gender
        
;
''', con = db)

In [88]:
r

|   | name | gender |
|---|------|--------|
| 0 | Casey | Male |
| 1 | Casey | Female |


Answer: Yes, `Casey` appears as both a male and a female patron.
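The self-join above is one way to find shared names. An alternative sketch, shown here on a made-up in-memory table (hypothetical rows), groups by name and keeps the names that occur with more than one gender:

```python
import sqlite3

# Hypothetical toy table; the rows are invented for illustration.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE bistro (name TEXT, gender TEXT)')
db.executemany('INSERT INTO bistro VALUES (?, ?)',
               [('Casey', 'Male'), ('Casey', 'Female'),
                ('Mary', 'Female'), ('Mary', 'Female'),
                ('David', 'Male')])

# HAVING filters groups: keep names seen with more than one distinct gender.
shared = [row[0] for row in db.execute('''
    SELECT name
    FROM bistro
    GROUP BY name
    HAVING COUNT(DISTINCT gender) > 1
    ORDER BY name
''')]

print(shared)  # ['Casey'] for the toy rows above
```

This version scans the table once and returns each shared name a single time, rather than once per gender.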

> ### Question 11.  If no patrons share the same name, how many unique patrons are in the dataset?

In [92]:
r = pd.io.sql.read_sql(
'''

        SELECT DISTINCT 
           b1.name, 
           b1.gender
        FROM 
            bistro b1
        
;
''', con = db)

In [93]:
r

|   | name | gender |
|---|------|--------|
| 0 | Kimberly | Female |
| 1 | Nicholas | Male |
| 2 | Larry | Male |
| 3 | Joseph | Male |
| 4 | Janice | Female |
| ... | ... | ... |
| 177 | Darwin | Male |
| 178 | Henry | Male |
| 179 | Jeremy | Male |
| 180 | Dorothy | Female |


Answer: TODO
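The cell above lists distinct (`name`, `gender`) pairs, so a name used by both genders, such as `Casey` in Question 10, appears twice. Depending on how the question is read, either the number of distinct names or the number of distinct pairs may be the intended count. A minimal sketch of both, on a made-up in-memory table (hypothetical rows):

```python
import sqlite3

# Hypothetical toy table; the rows are invented for illustration.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE bistro (name TEXT, gender TEXT)')
db.executemany('INSERT INTO bistro VALUES (?, ?)',
               [('Casey', 'Male'), ('Casey', 'Female'),
                ('Mary', 'Female'), ('Mary', 'Female'),
                ('David', 'Male')])

# Number of distinct names, ignoring gender.
(n_names,) = db.execute(
    'SELECT COUNT(DISTINCT name) FROM bistro').fetchone()

# Number of distinct (name, gender) pairs, counted via a subquery.
(n_pairs,) = db.execute('''
    SELECT COUNT(*)
    FROM (SELECT DISTINCT name, gender FROM bistro)
''').fetchone()

print(n_names, n_pairs)  # 3 and 4 for the toy rows above
```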

> ### Question 12.  How many times did `Kevin` patronize the bistro?  How about `Alice`?

In [96]:
r = pd.io.sql.read_sql(
'''

        SELECT b1.name, count(*) count
        FROM 
            bistro b1
        WHERE
            b1.name ='Kevin' or b1.name = 'Alice'
        GROUP BY
            b1.name
        
;
''', con = db)

In [97]:
r

|   | name | count |
|---|------|-------|
| 0 | Alice | 2 |
| 1 | Kevin | 4 |


Answer: `Kevin` visited 4 times; `Alice` visited 2 times.
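When the names to look up come from a variable rather than being typed into the query, `sqlite3` supports `?` placeholders so the values are passed separately from the SQL text. A minimal sketch on a made-up in-memory table (hypothetical rows):

```python
import sqlite3

# Hypothetical toy table; the rows are invented for illustration.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE bistro (name TEXT)')
db.executemany('INSERT INTO bistro VALUES (?)',
               [('Kevin',), ('Kevin',), ('Alice',), ('Bob',)])

# Each ? is filled from the tuple, in order; values are never pasted
# into the SQL string, which avoids quoting mistakes and SQL injection.
names = ('Kevin', 'Alice')
rows = db.execute('''
    SELECT name, COUNT(*) AS visits
    FROM bistro
    WHERE name IN (?, ?)
    GROUP BY name
    ORDER BY name
''', names).fetchall()

print(rows)  # [('Alice', 1), ('Kevin', 2)] for the toy rows above
```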

> ### Question 13.  Who are the top 3 female and male patrons?

In [105]:
r = pd.io.sql.read_sql(
'''

        SELECT 
            b1.name,
            count(*) count
        FROM 
            bistro b1        
        WHERE
            b1.gender ='Female'
        GROUP BY
            b1.name
        ORDER BY
            2 DESC
        LIMIT 3
        
;
''', con = db)

In [106]:
r

|   | name | count |
|---|------|-------|
| 0 | Mary | 4 |
| 1 | Casey | 3 |
| 2 | Laura | 3 |


In [107]:
r = pd.io.sql.read_sql(
'''

        SELECT 
            b1.name,
            count(*) count
        FROM 
            bistro b1        
        WHERE
            b1.gender ='Male'
        GROUP BY
            b1.name
        ORDER BY
            2 DESC
        LIMIT 3
        
;
''', con = db)

In [108]:
r

|   | name | count |
|---|------|-------|
| 0 | David | 8 |
| 1 | Casey | 5 |
| 2 | James | 5 |


Answer: TODO
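The two cells above run the same query once per gender. A single-query alternative is sketched below using the window function `ROW_NUMBER()`, which assumes SQLite 3.25 or newer. The table is a made-up in-memory stand-in (hypothetical rows), and the sketch keeps the top 2 per gender only because the toy data is small:

```python
import sqlite3

# Hypothetical toy table; the rows are invented for illustration.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE bistro (name TEXT, gender TEXT)')
visits = ([('Mary', 'Female')] * 3 + [('Laura', 'Female')] * 2 +
          [('Ann', 'Female')] + [('David', 'Male')] * 2 +
          [('James', 'Male')])
db.executemany('INSERT INTO bistro VALUES (?, ?)', visits)

# Count visits per (gender, name), rank within each gender, keep the top 2.
# Ties in the visit count are broken alphabetically by name.
rows = db.execute('''
    WITH visits AS (
        SELECT gender, name, COUNT(*) AS n
        FROM bistro
        GROUP BY gender, name
    ),
    ranked AS (
        SELECT gender, name, n,
               ROW_NUMBER() OVER (
                   PARTITION BY gender ORDER BY n DESC, name
               ) AS rk
        FROM visits
    )
    SELECT gender, name, n
    FROM ranked
    WHERE rk <= 2
    ORDER BY gender, rk
''').fetchall()

for row in rows:
    print(row)
```

Note that any fixed cutoff, whether `LIMIT 3` or `rk <= 3`, picks among tied patrons arbitrarily unless a tie-breaker is given, so patrons with equal visit counts deserve a second look.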

> ### Question 14.  Who's the best tipper (as a fraction of all tips over all check totals)?  Who's the worst?  How many times did they patronize the bistro?

In [144]:
r = pd.io.sql.read_sql(
'''
-- Single check with the highest tip-to-check ratio
WITH best_tipper AS (
    SELECT
        b.name,
        b.tip / b.`check` AS fraction
    FROM
        bistro b
    ORDER BY
        fraction DESC
    LIMIT 1
),

-- Single check with the lowest tip-to-check ratio
worst_tipper AS (
    SELECT
        b.name,
        b.tip / b.`check` AS fraction
    FROM
        bistro b
    ORDER BY
        fraction ASC
    LIMIT 1
)

    -- Attach the number of visits for each of the two patrons
    SELECT
        *,
        (SELECT COUNT(*) FROM bistro b WHERE b.name = bt.name) AS count
    FROM
        best_tipper bt

    UNION ALL

    SELECT
        *,
        (SELECT COUNT(*) FROM bistro b WHERE b.name = wt.name) AS count
    FROM
        worst_tipper wt
;
''', con = db)

In [145]:
r

|   | name | fraction | count |
|---|------|----------|-------|
| 0 | Daniel | 0.710345 | 2 |
| 1 | Jeremy | 0.035638 | 1 |


Answer: TODO
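The cell above ranks individual checks by `tip / check`. The question's wording ("all tips over all check totals") can also be read as a per-patron ratio, that is, each patron's total tips divided by their total checks. The sketch below shows that reading on a made-up in-memory table (hypothetical names and amounts); it is one possible interpretation, not necessarily the intended one.

```python
import sqlite3

# Hypothetical toy table; names and amounts are invented for illustration.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE bistro (name TEXT, "check" REAL, tip REAL)')
db.executemany('INSERT INTO bistro VALUES (?, ?, ?)',
               [('Dan', 10.0, 5.0),   # one generous visit (50%)...
                ('Dan', 90.0, 5.0),   # ...and one stingy visit
                ('Eve', 20.0, 4.0),
                ('Joe', 50.0, 2.0)])

# One row per patron: total tips over total checks, plus number of visits.
rows = db.execute('''
    SELECT name,
           SUM(tip) / SUM("check") AS fraction,
           COUNT(*)                AS visits
    FROM bistro
    GROUP BY name
    ORDER BY fraction DESC
''').fetchall()

best, worst = rows[0], rows[-1]
print('best: ', best)
print('worst:', worst)
```

In the toy data, Dan has the best single check (50%) but Eve has the best overall ratio, which illustrates how the two readings can name different patrons.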