### 02 Queries, Take 1

Add more descriptive title and summary later.

In [8]:
import sqlite3

In [9]:
%load_ext sql
%sql sqlite:///nyc_inspections.db

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


Before diving too deep, let's explore the base tables in the database to spot any missing or out-of-place values.

In [10]:
%%sql
SELECT boro, COUNT(camis) AS n_restaurants FROM restaurants
GROUP BY boro
LIMIT 5;

 * sqlite:///nyc_inspections.db
Done.


BORO,n_restaurants
0,78
Bronx,19620
Brooklyn,55205
Manhattan,77600
Queens,48972


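The borough value `0` (78 rows) looks like a placeholder for a missing borough. Staten Island is absent from this output, most likely because `LIMIT 5` cuts the result off after the first five groups rather than because of an error in the data.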
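One way to check whether Staten Island is really missing is to run the same aggregation without the `LIMIT` clause. The sketch below uses Python's built-in `sqlite3` module against a small in-memory table; the rows are made up for illustration and are not taken from `nyc_inspections.db`.

```python
import sqlite3

# Toy stand-in for the restaurants table; every row here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (camis INTEGER, boro TEXT)")
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?)",
    [
        (1, "0"),
        (2, "Bronx"),
        (3, "Brooklyn"),
        (4, "Manhattan"),
        (5, "Queens"),
        (6, "Staten Island"),
        (7, "Staten Island"),
    ],
)

# No LIMIT, plus an explicit ORDER BY, so every borough group is listed.
boro_counts = conn.execute(
    "SELECT boro, COUNT(camis) AS n_restaurants "
    "FROM restaurants GROUP BY boro ORDER BY boro"
).fetchall()

for boro, n_restaurants in boro_counts:
    print(boro, n_restaurants)
```

With six distinct values in the toy table, a `LIMIT 5` would hide the last group, which is the effect suspected in the real output.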

In [11]:
%%sql
SELECT * FROM inspections
ORDER BY INSPECDATE
LIMIT 5;

 * sqlite:///nyc_inspections.db
Done.


INSPECDATE,INSPECTYPE,CAMIS,SCORE,GRADE,VIOLCODE
1900-01-01 00:00:00,,50125332,,,
1900-01-01 00:00:00,,50113608,,,
1900-01-01 00:00:00,,50132947,,,
1900-01-01 00:00:00,,50131641,,,
1900-01-01 00:00:00,,50107034,,,


Dataset documentation states any entries with an inspection date of 1900-01-01 are new restaurants without an inspection yet. These will have to be filtered out in later analyses.
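A minimal sketch of that filter, assuming the dates are stored as text in the `YYYY-MM-DD HH:MM:SS` form shown in the output above. The table and rows below are an in-memory toy, not the real database.

```python
import sqlite3

# Toy inspections table with one not-yet-inspected restaurant (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inspections (inspecdate TEXT, camis INTEGER, score REAL, grade TEXT)"
)
conn.executemany(
    "INSERT INTO inspections VALUES (?, ?, ?, ?)",
    [
        ("1900-01-01 00:00:00", 101, None, None),  # placeholder: no inspection yet
        ("2022-07-13 00:00:00", 102, 11.0, "A"),
        ("2023-01-17 00:00:00", 103, 9.0, "A"),
    ],
)

# Assumption: the placeholder is stored exactly as this string.
PLACEHOLDER_DATE = "1900-01-01 00:00:00"

inspected = conn.execute(
    "SELECT camis, inspecdate FROM inspections "
    "WHERE inspecdate <> ? ORDER BY camis",
    (PLACEHOLDER_DATE,),
).fetchall()
print(inspected)
```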

Let's start by counting the total number of restaurants in this dataset.

In [12]:
%%sql
SELECT COUNT(DISTINCT camis) AS n_restaurants
FROM inspections;

 * sqlite:///nyc_inspections.db
Done.


n_restaurants
28348


In [13]:
# Save result to workspace
n_restau = _

Next we'll filter the results for inspections that resulted in a valid grade. On initial inspections, a grade is given only if the score is <= 13; re-inspections and reopening inspections are graded regardless of score. If a restaurant receives a score > 13 on an initial inspection, the grade recorded for that inspection is still the one received during the last inspection cycle and is therefore not reflective of its "current" grade.
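The filter in the query that follows can also be read as a row-level rule. The Python sketch below mirrors its `WHERE` clause; the function name `has_valid_grade` and the set names are hypothetical and are used only to make the logic explicit.

```python
# Inspection types that are graded only when the score is 13 or lower.
INITIAL = {
    "Cycle Inspection / Initial Inspection",
    "Pre-permit (Operational) / Initial Inspection",
}
# Inspection types that are accepted regardless of score.
FOLLOW_UP = {
    "Cycle Inspection / Re-inspection",
    "Pre-permit (Operational) / Re-inspection",
    "Pre-permit (Operational) / Reopening Inspection",
    "Cycle Inspection / Reopening Inspection",
}
VALID_GRADES = {"A", "B", "C", "P", "Z"}


def has_valid_grade(inspectype, score, grade):
    """Return True when a row would pass the grade filter."""
    if grade not in VALID_GRADES:
        return False
    if inspectype in FOLLOW_UP:
        return True
    if inspectype in INITIAL:
        # A missing score never passes, matching SQL's NULL comparison.
        return score is not None and score <= 13
    return False


print(has_valid_grade("Cycle Inspection / Initial Inspection", 12.0, "A"))
print(has_valid_grade("Cycle Inspection / Initial Inspection", 20.0, "A"))
```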

In [14]:
%%sql
SELECT camis, MAX(inspecdate) AS lastinspection, grade, score
FROM inspections
WHERE (
      inspectype IN (                                            
            'Cycle Inspection / Re-inspection',
            'Pre-permit (Operational) / Re-inspection'
            )
      OR (inspectype IN (
            'Cycle Inspection / Initial Inspection',                                  
            'Pre-permit (Operational) / Initial Inspection'
            )
            AND SCORE <= 13
            )
      OR (inspectype IN (                                                    
            'Pre-permit (Operational) / Reopening Inspection',
            'Cycle Inspection / Reopening Inspection'
            )))
      AND GRADE IN ('A', 'B', 'C', 'P', 'Z')
GROUP BY camis
LIMIT 10;

 * sqlite:///nyc_inspections.db
Done.


CAMIS,lastinspection,GRADE,SCORE
30075445,2023-02-03 00:00:00,Z,13.0
30112340,2022-07-13 00:00:00,A,11.0
30191841,2022-01-04 00:00:00,A,12.0
40356018,2022-02-01 00:00:00,A,7.0
40356483,2022-08-19 00:00:00,A,2.0
40356731,2023-01-17 00:00:00,A,9.0
40357217,2021-07-28 00:00:00,A,10.0
40359480,2022-05-03 00:00:00,A,9.0
40359705,2022-02-10 00:00:00,A,12.0
40360045,2023-01-31 00:00:00,A,13.0
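Note that this query leans on a SQLite-specific behavior: when an aggregate query contains a single `MAX()`, the bare columns (`grade`, `score`) take their values from the row that holds the maximum. Other databases may reject the query or return arbitrary values. A more portable sketch joins each restaurant's latest date back to its rows. The version below runs on a toy in-memory table with made-up rows and, for brevity, keeps only the grade check from the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inspections (inspecdate TEXT, camis INTEGER, score REAL, grade TEXT)"
)
conn.executemany(
    "INSERT INTO inspections VALUES (?, ?, ?, ?)",
    [
        ("2021-05-01 00:00:00", 1, 20.0, "B"),
        ("2022-07-13 00:00:00", 1, 11.0, "A"),
        ("2022-07-13 00:00:00", 1, 11.0, "A"),   # same inspection, second violation row
        ("2023-01-17 00:00:00", 2, 9.0, "A"),
        ("2023-02-01 00:00:00", 2, 30.0, None),  # ungraded, dropped by the filter
    ],
)

query = """
WITH graded AS (
    -- Simplified filter: the inspection-type conditions are omitted here.
    SELECT camis, inspecdate, grade, score
    FROM inspections
    WHERE grade IN ('A', 'B', 'C', 'P', 'Z')
),
latest AS (
    SELECT camis, MAX(inspecdate) AS lastinspection
    FROM graded
    GROUP BY camis
)
-- DISTINCT collapses the repeated rows of a single inspection.
SELECT DISTINCT g.camis, g.inspecdate AS lastinspection, g.grade, g.score
FROM graded AS g
JOIN latest AS l
  ON l.camis = g.camis AND l.lastinspection = g.inspecdate
ORDER BY g.camis
"""
latest_grades = conn.execute(query).fetchall()
print(latest_grades)
```

The join makes it explicit that `grade` and `score` come from the most recent graded inspection rather than from an arbitrary row in the group.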


How many restaurants does this leave us with?

In [15]:
%%sql
SELECT COUNT(camis) AS n_restaurants
FROM (
      SELECT camis, MAX(inspecdate) AS lastinspection, grade, score
      FROM inspections
      WHERE (
            inspectype IN (                                            
                  'Cycle Inspection / Re-inspection',
                  'Pre-permit (Operational) / Re-inspection'
                  )
            OR (inspectype IN (
                  'Cycle Inspection / Initial Inspection',                                  
                  'Pre-permit (Operational) / Initial Inspection'
                  )
                  AND SCORE <= 13
                  )
            OR (inspectype IN (                                                    
                  'Pre-permit (Operational) / Reopening Inspection',
                  'Cycle Inspection / Reopening Inspection'
                  )))
            AND GRADE IN ('A', 'B', 'C', 'P', 'Z')
      GROUP BY camis
      )

 * sqlite:///nyc_inspections.db
Done.


n_restaurants
22228


In [16]:
n_restau_graded = _

How many restaurants in the dataset did not have valid grades?

In [17]:
# Each result set holds a single count; pull it out as a plain int
total = int(n_restau.DataFrame().iloc[0, 0])
graded = int(n_restau_graded.DataFrame().iloc[0, 0])

invalid = total - graded
print(f'{invalid} out of {total} restaurants had invalid grades.')

6120 out of 28348 restaurants had invalid grades.
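The same difference could also be computed in one pass, without saving intermediate results from `_`. The sketch below is one possible approach using conditional aggregation on a toy in-memory table. The rows are hypothetical, and the filter is reduced to the grade check only, so it does not reproduce the counts above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inspections (camis INTEGER, grade TEXT)")
conn.executemany(
    "INSERT INTO inspections VALUES (?, ?)",
    [(1, "A"), (1, None), (2, None), (3, "B"), (4, "N")],
)

# COUNT(DISTINCT ...) ignores NULLs, so the CASE expression counts only
# restaurants that have at least one row with a valid grade.
total, graded = conn.execute(
    """
    SELECT COUNT(DISTINCT camis),
           COUNT(DISTINCT CASE WHEN grade IN ('A', 'B', 'C', 'P', 'Z')
                               THEN camis END)
    FROM inspections
    """
).fetchone()

invalid = total - graded
print(f"{invalid} out of {total} restaurants had invalid grades.")
```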
