### 02 Queries, Take 1

Add more descriptive title and summary later.

In [26]:
import sqlite3

In [27]:
%load_ext sql
%sql sqlite:///nyc_inspections.db

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


Before diving in, let's explore the base tables in the database to look for missing or out-of-place values.

In [28]:
%%sql
SELECT boro, COUNT(camis) AS n_restaurants
FROM restaurants
WHERE boro != '0'
GROUP BY boro
LIMIT 10;

 * sqlite:///nyc_inspections.db
Done.


BORO,n_restaurants
Bronx,19620
Brooklyn,55205
Manhattan,77600
Queens,48972
Staten Island,6664


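Unlabeled boroughs were marked as "0". The above query filters them out. Note that COUNT(camis) counts rows rather than distinct restaurants: the five totals sum to 208,061, far more than the 28,348 distinct CAMIS values found in the inspections table below, so the restaurants table likely holds more than one row per restaurant.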

In [29]:
%%sql
SELECT * FROM inspections
ORDER BY INSPECDATE
LIMIT 5;

 * sqlite:///nyc_inspections.db
Done.


INSPECDATE,INSPECTYPE,CAMIS,SCORE,GRADE,VIOLCODE
1900-01-01 00:00:00,,50125332,,,
1900-01-01 00:00:00,,50113608,,,
1900-01-01 00:00:00,,50132947,,,
1900-01-01 00:00:00,,50131641,,,
1900-01-01 00:00:00,,50107034,,,


The dataset documentation says that entries with an inspection date of 1900-01-01 are new restaurants that have not yet been inspected. These will have to be filtered out in later analyses.
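
As a minimal sketch of that filter, the cell below builds a small in-memory table and drops the placeholder date. The rows are invented for illustration, and it assumes dates are stored as text in the 'YYYY-MM-DD HH:MM:SS' form shown in the output above.

```python
import sqlite3

# Toy stand-in for the inspections table; these rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inspections (inspecdate TEXT, camis INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO inspections VALUES (?, ?, ?)",
    [
        ("1900-01-01 00:00:00", 1, None),  # placeholder date: not yet inspected
        ("2022-07-13 00:00:00", 2, 11.0),
        ("2023-01-17 00:00:00", 3, 9.0),
    ],
)

# date() strips the time part, so the comparison is against the day only.
inspected = conn.execute(
    """
    SELECT camis, inspecdate, score
    FROM inspections
    WHERE date(inspecdate) != '1900-01-01'
    ORDER BY inspecdate
    """
).fetchall()
print(inspected)
```

The same WHERE clause could be added to any of the %%sql cells that read from inspections.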

Let's start the analyses by seeing how many restaurants are in this dataset in total.

In [30]:
%%sql
SELECT COUNT(DISTINCT camis) AS n_restaurants
FROM inspections;

 * sqlite:///nyc_inspections.db
Done.


n_restaurants
28348


In [31]:
# Save result to workspace
n_restau = _

Next we'll try filtering the results for inspections that resulted in a valid grade. On initial inspections, a grade is given only if the score is <= 13; re-inspections and reopening inspections are kept regardless of score. If a restaurant receives a score > 13 on an initial inspection, the grade recorded for that inspection is still the grade received during the last inspection cycle and is therefore not reflective of their "current" grade.

In [32]:
%%sql
SELECT camis, MAX(inspecdate) AS lastinspection, grade, score, violcode
FROM inspections
WHERE (
      inspectype IN (                                            
            'Cycle Inspection / Re-inspection',
            'Pre-permit (Operational) / Re-inspection'
            )
      OR (inspectype IN (
            'Cycle Inspection / Initial Inspection',                                  
            'Pre-permit (Operational) / Initial Inspection'
            )
            AND SCORE <= 13
            )
      OR (inspectype IN (                                                    
            'Pre-permit (Operational) / Reopening Inspection',
            'Cycle Inspection / Reopening Inspection'
            )))
      AND GRADE IN ('A', 'B', 'C', 'P', 'Z')
GROUP BY camis
LIMIT 10;

 * sqlite:///nyc_inspections.db
Done.


CAMIS,lastinspection,GRADE,SCORE,VIOLCODE
30075445,2023-02-03 00:00:00,Z,13.0,02G
30112340,2022-07-13 00:00:00,A,11.0,10F
30191841,2022-01-04 00:00:00,A,12.0,10F
40356018,2022-02-01 00:00:00,A,7.0,02G
40356483,2022-08-19 00:00:00,A,2.0,10F
40356731,2023-01-17 00:00:00,A,9.0,08A
40357217,2021-07-28 00:00:00,A,10.0,02G
40359480,2022-05-03 00:00:00,A,9.0,06D
40359705,2022-02-10 00:00:00,A,12.0,10F
40360045,2023-01-31 00:00:00,A,13.0,04M
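
The query above selects grade, score and violcode without aggregating them. This relies on SQLite-specific behavior: when a query contains a single MAX() (or MIN()) aggregate, the bare columns take their values from the row that holds the maximum. Here is a small sketch of that behavior on an in-memory table with invented rows:

```python
import sqlite3

# Toy table with two inspections per restaurant; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inspections (camis INTEGER, inspecdate TEXT, grade TEXT, score REAL)"
)
conn.executemany(
    "INSERT INTO inspections VALUES (?, ?, ?, ?)",
    [
        (1, "2021-05-01 00:00:00", "B", 20.0),
        (1, "2022-06-01 00:00:00", "A", 9.0),   # latest row for camis 1
        (2, "2023-01-10 00:00:00", "A", 12.0),  # latest row for camis 2
        (2, "2022-03-15 00:00:00", "C", 30.0),
    ],
)

# grade and score are "bare" columns: SQLite fills them from the MAX(inspecdate) row.
latest = conn.execute(
    """
    SELECT camis, MAX(inspecdate) AS lastinspection, grade, score
    FROM inspections
    GROUP BY camis
    ORDER BY camis
    """
).fetchall()
print(latest)
```

Most other database engines reject this form and need a window function or a self-join instead. Also, if several rows share the latest date (for example, one row per violation from the same inspection), SQLite picks the bare columns from one of them, so the violcode shown may be just one of several.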


How many restaurants have a valid grade?

In [33]:
%%sql
SELECT COUNT(camis) AS n_restaurants
FROM (
      SELECT camis, MAX(inspecdate) AS lastinspection, grade, score
      FROM inspections
      WHERE (
            inspectype IN (                                            
                  'Cycle Inspection / Re-inspection',
                  'Pre-permit (Operational) / Re-inspection'
                  )
            OR (inspectype IN (
                  'Cycle Inspection / Initial Inspection',                                  
                  'Pre-permit (Operational) / Initial Inspection'
                  )
                  AND SCORE <= 13
                  )
            OR (inspectype IN (                                                    
                  'Pre-permit (Operational) / Reopening Inspection',
                  'Cycle Inspection / Reopening Inspection'
                  )))
            AND GRADE IN ('A', 'B', 'C', 'P', 'Z')
      GROUP BY camis
      )

 * sqlite:///nyc_inspections.db
Done.


n_restaurants
22228


In [34]:
n_restau_graded = _

How many restaurants in the dataset did not have valid grades?

In [35]:
df_restau = n_restau.DataFrame()
df_restau_graded = n_restau_graded.DataFrame()

# Each result is a 1x1 table; pull out the single count as a plain int
total = int(df_restau.iloc[0, 0])
graded = int(df_restau_graded.iloc[0, 0])
invalid = total - graded
print(f'{invalid} out of {total} restaurants had invalid grades.')

6120 out of 28348 restaurants had invalid grades.
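
The same difference could also be computed in a single query with conditional counting. The sketch below is a simplified illustration on invented rows: it checks only the grade letter and leaves out the inspection-type and score conditions used in the queries above.

```python
import sqlite3

# Toy table; rows are invented, and only the grade letter is considered here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inspections (camis INTEGER, grade TEXT)")
conn.executemany(
    "INSERT INTO inspections VALUES (?, ?)",
    [(1, "A"), (1, None), (2, None), (3, "B"), (4, "N")],
)

# The CASE yields NULL for rows without a valid grade, and COUNT ignores NULLs.
n_total, n_graded = conn.execute(
    """
    SELECT COUNT(DISTINCT camis),
           COUNT(DISTINCT CASE WHEN grade IN ('A', 'B', 'C', 'P', 'Z')
                               THEN camis END)
    FROM inspections
    """
).fetchone()
n_ungraded = n_total - n_graded
print(f"{n_ungraded} out of {n_total} restaurants had no valid grade.")
```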


Let's make the query for valid grades into a temporary view to simplify follow-on queries.

In [36]:
%%sql
DROP VIEW IF EXISTS has_grade;

CREATE TEMP VIEW has_grade AS

SELECT camis, MAX(inspecdate) AS lastinspection, grade, score, violcode
FROM inspections
WHERE (
      inspectype IN (                                            
            'Cycle Inspection / Re-inspection',
            'Pre-permit (Operational) / Re-inspection'
            )
      OR (inspectype IN (
            'Cycle Inspection / Initial Inspection',                                  
            'Pre-permit (Operational) / Initial Inspection'
            )
            AND SCORE <= 13
            )
      OR (inspectype IN (                                                    
            'Pre-permit (Operational) / Reopening Inspection',
            'Cycle Inspection / Reopening Inspection'
            )))
      AND GRADE IN ('A', 'B', 'C', 'P', 'Z')
GROUP BY camis;

 * sqlite:///nyc_inspections.db
Done.
Done.


[]

It would be interesting to visualize the distribution of grades across the city. Let's join the has_grade view with the restaurants table so that we can plot the restaurant locations.

In [37]:
%%sql
SELECT has_grade.camis, dba, has_grade.grade, 
    has_grade.score, latitude, longitude
FROM has_grade
JOIN restaurants
ON has_grade.camis = restaurants.camis
LIMIT 10;

 * sqlite:///nyc_inspections.db
Done.


CAMIS,DBA,GRADE,SCORE,Latitude,Longitude
50072080,SIMON & THE WHALE,A,7.0,40.739691505171,-73.984591210857
40825908,POLANCO RESTAURANT BBQ,A,10.0,0.0,0.0
50040406,DOUGH,A,12.0,40.75450075314,-73.975950427165
50012650,LOS POBLANOS RESTAURANT,Z,39.0,40.651469577237,-74.010656511979
50015482,WING HING CHINESE CUISINE,A,7.0,0.0,0.0
50119654,PEPERINO,Z,49.0,40.74665432138,-73.980359852787
40381595,PIETROS,A,4.0,40.751069173359,-73.973114812431
40536414,LA CABANA RESTAURANT,B,22.0,0.0,0.0
50054796,SUBWAY,A,12.0,40.753751186581,-73.87200411027
41161346,LE BASKET,A,7.0,40.727906397739,-73.994685452724
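
Several rows above have a latitude and longitude of 0.0, which is not a location in New York City. Assuming these are placeholders for missing coordinates (an assumption based on the output, not on the dataset documentation), they would need to be excluded before plotting. A minimal sketch, using three of the rows shown above loaded into a hypothetical in-memory table named graded:

```python
import sqlite3

# Hypothetical table holding a few rows copied from the join output above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE graded (camis INTEGER, dba TEXT, grade TEXT, latitude REAL, longitude REAL)"
)
conn.executemany(
    "INSERT INTO graded VALUES (?, ?, ?, ?, ?)",
    [
        (50072080, "SIMON & THE WHALE", "A", 40.739691505171, -73.984591210857),
        (40825908, "POLANCO RESTAURANT BBQ", "A", 0.0, 0.0),
        (50040406, "DOUGH", "A", 40.75450075314, -73.975950427165),
    ],
)

# Keep only rows whose coordinates are not the assumed 0.0 placeholder.
plottable = conn.execute(
    """
    SELECT dba, latitude, longitude
    FROM graded
    WHERE latitude != 0 AND longitude != 0
    ORDER BY dba
    """
).fetchall()
print(plottable)
```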
