

# Joe Troia - Notebook 1
## Dataset: DOHMH New York City Restaurant Inspection Results

The objectives for this dataset are to:

1. Calculate the number of restaurants per borough (graded and ungraded) and record the number of each grade attained per borough.
2. Use these counts to find the percentage of each grade per borough.
3. Find the percentage share of each borough per grade.
4. Calculate the average inspection score per borough (a numerical score in which lower is better).
5. Calculate the average number of violations per borough.

*Note that a restaurant in the dataset can have multiple inspection dates, and multiple violations can be recorded per inspection, so there are duplicate records for many restaurants. Only the single grade and score recorded per inspection count toward the grade totals. Also, not every citation has a grade attached, so this needs to be taken into account when parsing the records.*

Following the specifications provided by Professor Chuang, many of the problems were tackled with a focus on list comprehension, `lambda`, and the following Python functions:
* `map()`
* `reduce()`
* `filter()`

### Inspection Grades:
*  **A** 
*  **B**
*  **C**
*  **P** - Grade Pending
*  **Z** - Grade Pending issued on re-opening following an initial inspection that resulted in a closure
*  **N** - Not Yet Graded

Importing required libraries:

In [0]:
import functools
import json
import urllib.request

The API endpoint of the data to be examined is at https://data.cityofnewyork.us/resource/43nn-pn8j.json

By default, the API returns only 1,000 results, so we pass the query parameters `$limit=100000&$offset=0` to get back JSON containing many, but not all, of the elements in the dataset.

Colaboratory seems to hang when trying to process all 380,000+ elements in the data, so this compromise needs to be made.
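If the full dataset were ever needed, one option is to request it in pages rather than in one call. The sketch below only builds the page URLs and does not fetch anything; `page_urls` is a hypothetical helper, and the `$order=:id` parameter is an assumption based on Socrata's general advice to pair `$offset` with a stable sort order so that pages neither overlap nor skip rows.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.cityofnewyork.us/resource/43nn-pn8j.json"

def page_urls(base_url, total_rows, page_size):
    """Build one SoQL query URL per page of `page_size` rows (hypothetical helper)."""
    urls = []
    for offset in range(0, total_rows, page_size):
        params = {
            "$limit": page_size,
            "$offset": offset,
            "$order": ":id",  # assumed stable ordering so consecutive pages don't overlap
        }
        # safe="$:" keeps the '$' prefixes and ':id' readable instead of percent-encoding them
        urls.append(base_url + "?" + urlencode(params, safe="$:"))
    return urls

# 400,000 is just a round number above the ~380,000 rows in the dataset
for u in page_urls(BASE_URL, total_rows=400000, page_size=100000):
    print(u)
```

Each URL could then be opened with `urllib.request.urlopen` and the parsed lists concatenated, although this does not remove the memory pressure that made Colaboratory hang.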

In [0]:
url = "https://data.cityofnewyork.us/resource/43nn-pn8j.json?$limit=100000&$offset=0"

Opening the URL, reading the response body, and parsing the JSON string into the list `data` (one dictionary per record):

In [0]:
response = urllib.request.urlopen(url)
jsonObj = response.read()
data = json.loads(jsonObj)
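To show what `data` holds: `json.loads` turns a JSON array of objects into a Python list of dictionaries. The two records below are invented for illustration and trimmed to the field names this notebook uses (`camis`, `boro`, `inspection_date`, `grade`, `score`, `violation_code`). The score is written as a string on the assumption that Socrata's JSON output encodes numbers as strings, and the second record simply omits `grade`, which is why later cells test `'grade' in record` before reading it.

```python
import json

# Hypothetical response body: two invented records, trimmed to the fields used in this notebook
body = b'''[
  {"camis": "12345678", "boro": "Queens", "inspection_date": "2019-01-15T00:00:00.000",
   "grade": "A", "score": "12", "violation_code": "10F"},
  {"camis": "12345678", "boro": "Queens", "inspection_date": "2019-03-02T00:00:00.000",
   "score": "20", "violation_code": "04L"}
]'''

records = json.loads(body)                      # json.loads accepts bytes as well as str
print(type(records).__name__, len(records))     # list 2
print(records[0]["score"], type(records[0]["score"]).__name__)  # 12 str
print("grade" in records[1])                    # False: the field is absent, not empty
```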

Checking the total number of elements in the list

In [17]:
dataSize = len(data)
print(dataSize)

100000


Defining an Inspection class and a Restaurant class to hold accumulator fields and other data that will allow us to make the final calculations.

*Aside: "Restaurant" is one of those words that look weird when you spell it out, amirite?*

In [0]:
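class Inspection():
    def __init__(self):
        self.date = ""          # inspection_date string from the API
        self.grade = ""         # letter grade; stays "" if none was issued
        self.score = 0          # numerical score (lower is better)
        self.numViolations = 0  # violations recorded during this inspection

    def incrViolations(self):
        self.numViolations += 1


class Restaurant():
    def __init__(self):
        self.inspections = {}   # inspection_date -> Inspection
        self.camis = ""         # unique restaurant ID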
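The same two containers could be written more compactly with the standard-library `dataclasses` module. This is an alternative sketch, not what the notebook uses; it keeps the same attribute names, so `Inspection()` and `Restaurant()` still work with no arguments. `field(default_factory=dict)` gives every `Restaurant` its own empty dictionary rather than one shared between instances.

```python
from dataclasses import dataclass, field

@dataclass
class Inspection:
    date: str = ""
    grade: str = ""          # "" means no grade was issued
    score: int = 0           # lower is better
    numViolations: int = 0

    def incrViolations(self):
        self.numViolations += 1

@dataclass
class Restaurant:
    camis: str = ""
    # default_factory builds a fresh dict per instance; a bare {} default is rejected by dataclasses
    inspections: dict = field(default_factory=dict)

r1, r2 = Restaurant("111"), Restaurant("222")
r1.inspections["2019-01-15"] = Inspection(date="2019-01-15", grade="A", score=12)
r1.inspections["2019-01-15"].incrViolations()
print(r1.inspections["2019-01-15"].numViolations, len(r2.inspections))  # 1 0
```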

Here I'm defining a Borough class whose fields and methods manipulate and aggregate the data for each borough, including a dictionary of the Restaurant objects belonging to it (keyed by camis ID).

In [0]:
class Borough():
    def __init__(self, name):
        self.name = name
        self.restaurants = {}
        self.allCitations = []
        self.allGrades = []
        self.num_A = 0
        self.num_B = 0
        self.num_C = 0
        self.num_P = 0
        self.num_Z = 0
        self.num_N = 0
        self.totalRest = 0
        self.totalCitations = 0
        self.totalGrades = 0
        
    def addRestaurant(self, camis, restaurant):
        self.restaurants[camis] = restaurant
        
    def isNewRestaurant(self, camis):
        if camis in self.restaurants:
            return False
        
        return True
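Calling `isNewRestaurant` and then `addRestaurant` is a get-or-create pattern. For comparison, a plain dictionary can do the same in one step with `dict.setdefault`. The sketch uses a hypothetical stand-in, `RestaurantStub`, so it runs on its own without replacing the notebook's classes. One trade-off: `setdefault` evaluates its default argument on every call, so a throwaway object is built even when the key already exists.

```python
class RestaurantStub:
    # Hypothetical stand-in mirroring the notebook's Restaurant fields
    def __init__(self, camis=""):
        self.camis = camis
        self.inspections = {}

restaurants = {}

def get_or_create(restaurants, camis):
    # Return the stored object for this camis ID, inserting a new one if the key is missing
    return restaurants.setdefault(camis, RestaurantStub(camis))

a = get_or_create(restaurants, "111")
b = get_or_create(restaurants, "111")   # same key: the stored object comes back
c = get_or_create(restaurants, "222")
print(a is b, len(restaurants))          # True 2
```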

Instantiating an object for each borough and putting them into the `boroughs` list 
* The lambda function is defined to call `Borough`'s constructor
* List comprehension is used to create the list of `Borough` objects

In [51]:
constructBoroughs = lambda x: Borough(x)

boroughs = ["Brooklyn", "Manhattan", "Queens", "Bronx", "Staten Island"]
boroughs = [constructBoroughs(b) for b in boroughs]

for b in boroughs:
    print(b.name)

Brooklyn
Manhattan
Queens
Bronx
Staten Island
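Because a class is itself callable, the wrapper lambda is optional: `Borough` can be passed straight to `map()` or called inside the comprehension, and all three forms build the same list. The sketch uses a hypothetical `BoroughStub` with only a `name` field so it runs on its own.

```python
class BoroughStub:
    # Hypothetical stand-in for the notebook's Borough class
    def __init__(self, name):
        self.name = name

names = ["Brooklyn", "Manhattan", "Queens", "Bronx", "Staten Island"]

constructBoroughs = lambda x: BoroughStub(x)
via_lambda = [constructBoroughs(n) for n in names]    # the notebook's approach
via_map = list(map(BoroughStub, names))               # the class itself is the mapped function
via_comprehension = [BoroughStub(n) for n in names]

print([b.name for b in via_map])
```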


Using `map()`, I call a function that itself calls `filter()` on the bulk data, keeping only the records whose `boro` field matches the borough's name. The resulting list is stored in that borough object's `allCitations` field.

In [52]:
def buildCitationLists(borough):
    # Keep only the records whose 'boro' field matches this borough's name
    # (.get() avoids a KeyError if a record has no 'boro' field)
    citList = list(filter(lambda x: x.get('boro') == borough.name, data))
    borough.allCitations = citList
    return borough

boroughs = list(map(buildCitationLists, boroughs))

print("CITATIONS TOTALS")
for b in boroughs:
    b.totalCitations = len(b.allCitations)
    print(b.name + ":", b.totalCitations)

CITATIONS TOTALS
Brooklyn: 25578
Manhattan: 39252
Queens: 22811
Bronx: 8945
Staten Island: 3384
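The `map()`/`filter()` approach scans `data` once per borough, five passes in total. If that ever became a bottleneck, the records could be grouped in a single pass; the sketch below does this with `functools.reduce` to stay in the functional style the spec asks for. `add_record` and `group_by_boro` are hypothetical helpers and the sample records are invented.

```python
import functools

def add_record(groups, record):
    # Append the record to the list for its borough; records without 'boro' go under None
    groups.setdefault(record.get("boro"), []).append(record)
    return groups

def group_by_boro(records):
    # A fresh {} is created on each call, so results never leak between calls
    return functools.reduce(add_record, records, {})

sample = [
    {"camis": "1", "boro": "Queens"},
    {"camis": "2", "boro": "Bronx"},
    {"camis": "1", "boro": "Queens"},
]

groups = group_by_boro(sample)
print({boro: len(rows) for boro, rows in groups.items()})  # {'Queens': 2, 'Bronx': 1}
```

With the real data, `groups.get(b.name, [])` would then give each borough's `allCitations` list.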


Using `map()` again, each borough's `restaurants` dictionary is built, mapping each unique 'camis' ID to its own `Restaurant` object. A new `Restaurant` is created only when a record's camis is not already a key in that `Borough` object's `restaurants` dictionary.

Next, each record is filed under its inspection date. If the inspection is new, an `Inspection` object is created with the record's date, grade and score (and a violation count of 1 if the line notes a violation); if the inspection was already recorded in the `Restaurant` object, its violation count is incremented when the line notes a violation.

In [53]:
def buildRestaurantDicts(record, borough):
    camis = record['camis']
    date = record['inspection_date']

    # Create a Restaurant the first time this camis ID is seen
    if borough.isNewRestaurant(camis):
        restaurant = Restaurant()
        restaurant.camis = camis
        borough.addRestaurant(camis, restaurant)

    inspections = borough.restaurants[camis].inspections

    if date in inspections:
        # Same inspection, another citation line: count the extra violation
        if 'violation_code' in record:
            inspections[date].incrViolations()
    else:
        # First line seen for this inspection: record its details
        inspection = Inspection()
        inspection.date = date

        if 'grade' in record:
            inspection.grade = record['grade']

        if 'score' in record:
            # Convert to int so scores can be summed later
            # (the JSON value may be a string)
            inspection.score = int(record['score'])

        if 'violation_code' in record:
            inspection.numViolations = 1

        inspections[date] = inspection


def buildBoroughObjects(borough):
    for record in borough.allCitations:
        buildRestaurantDicts(record, borough)

    return borough

boroughs = list(map(buildBoroughObjects, boroughs))

print("RESTAURANTS PER BOROUGH")
for b in boroughs:
    b.totalRest = len(b.restaurants)
    print(b.name + ":", b.totalRest)

RESTAURANTS PER BOROUGH
Brooklyn: 5871
Manhattan: 9271
Queens: 5351
Bronx: 2118
Staten Island: 847
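The bookkeeping in `buildRestaurantDicts` boils down to one rule: citation lines that share a camis ID and inspection date belong to one inspection, which keeps a single grade and score, and each line with a `violation_code` adds one violation. The sketch below expresses that rule with a flat dictionary keyed by `(camis, inspection_date)` tuples instead of nested objects. `collapse_inspections` is a hypothetical name and the sample rows are invented.

```python
def collapse_inspections(records):
    """Map (camis, inspection_date) -> {'grade', 'score', 'violations'} (hypothetical helper)."""
    inspections = {}
    for r in records:
        key = (r["camis"], r["inspection_date"])
        if key not in inspections:
            # First line of this inspection: take its grade and score once
            inspections[key] = {
                "grade": r.get("grade", ""),
                "score": int(r["score"]) if "score" in r else 0,
                "violations": 0,
            }
        if "violation_code" in r:
            inspections[key]["violations"] += 1
    return inspections

rows = [
    {"camis": "1", "inspection_date": "2019-01-15", "grade": "A", "score": "12", "violation_code": "10F"},
    {"camis": "1", "inspection_date": "2019-01-15", "grade": "A", "score": "12", "violation_code": "08A"},
    {"camis": "1", "inspection_date": "2019-06-01"},   # line with no violation recorded
    {"camis": "2", "inspection_date": "2019-01-15", "score": "20", "violation_code": "04L"},
]

result = collapse_inspections(rows)
print(len(result))                          # 3 inspections
print(result[("1", "2019-01-15")])          # {'grade': 'A', 'score': 12, 'violations': 2}
print(len({camis for camis, _ in result}))  # 2 restaurants
```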


Find the number of grades per borough:
* Total number of grades given


In [54]:
def accumulateGrades(borough):
    boroGrades = []

    for restKey in borough.restaurants:
        restaurant = borough.restaurants[restKey]

        for dateKey in restaurant.inspections:
            inspection = restaurant.inspections[dateKey]
            # Ungraded inspections have grade "" and are skipped
            if inspection.grade != "":
                boroGrades.append(inspection.grade)

    return boroGrades

allGrades = list(map(accumulateGrades, boroughs))

print("TOTAL GRADES GIVEN")
for b, grades in zip(boroughs, allGrades):
    b.allGrades = grades
    b.totalGrades = len(grades)
    print(b.name + ":", b.totalGrades)

TOTAL GRADES GIVEN
Brooklyn: 10042
Manhattan: 15648
Queens: 9287
Bronx: 3675
Staten Island: 1382
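Since the spec emphasises list comprehension, the nested loops in `accumulateGrades` can also be written as a single nested comprehension. The sketch uses `types.SimpleNamespace` objects as lightweight stand-ins for the notebook's `Borough`, `Restaurant` and `Inspection` classes; `accumulate_grades` is a hypothetical name.

```python
from types import SimpleNamespace as NS

def accumulate_grades(borough):
    # One entry per graded inspection, across every restaurant in the borough
    return [inspection.grade
            for restaurant in borough.restaurants.values()
            for inspection in restaurant.inspections.values()
            if inspection.grade != ""]

borough = NS(restaurants={
    "111": NS(inspections={"d1": NS(grade="A"), "d2": NS(grade="")}),
    "222": NS(inspections={"d1": NS(grade="B")}),
})
print(accumulate_grades(borough))   # ['A', 'B']
```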


* Number of each grade attained per borough

In [63]:
def tallyGrades(borough):
    borough.num_A = len(list(filter(lambda x: x == "A", borough.allGrades)))
    borough.num_B = len(list(filter(lambda x: x == "B", borough.allGrades)))
    borough.num_C = len(list(filter(lambda x: x == "C", borough.allGrades)))
    borough.num_P = len(list(filter(lambda x: x == "P", borough.allGrades)))
    borough.num_Z = len(list(filter(lambda x: x == "Z", borough.allGrades)))
    borough.num_N = len(list(filter(lambda x: x == "N", borough.allGrades)))
    return borough

boroughs = list(map(tallyGrades, boroughs))

print("NUMBER OF EACH GRADE GIVEN")
for b in boroughs:
    print(b.name, "- A:", b.num_A, " B:", b.num_B, " C:", b.num_C,
          " P:", b.num_P, " Z:", b.num_Z, " N:", b.num_N)


NUMBER OF EACH GRADE GIVEN
Brooklyn - A: 8151  B: 1089  C: 323  P: 164  Z: 224  N: 90
Manhattan - A: 13006  B: 1571  C: 509  P: 189  Z: 283  N: 89
Queens - A: 7649  B: 932  C: 265  P: 145  Z: 216  N: 80
Bronx - A: 2947  B: 450  C: 123  P: 44  Z: 92  N: 19
Staten Island - A: 1147  B: 154  C: 35  P: 13  Z: 20  N: 13
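Objectives 2 and 3 follow directly from these tallies. A minimal sketch using the counts printed above: the percentage of a grade within a borough divides by that borough's total, while a borough's share of a grade divides by the total of that grade across all boroughs. The helper names are hypothetical, and `reduce()` does the summing to match the notebook's style. The denominator needs a deliberate choice: in the printed output, the six tallies for Brooklyn and Manhattan each sum to one less than the TOTAL GRADES GIVEN figures (10,041 vs. 10,042 and 15,647 vs. 15,648), which suggests one grade in each falls outside A/B/C/P/Z/N. The sketch divides by the sum of the six tallies.

```python
import functools

# Grade tallies copied from the "NUMBER OF EACH GRADE GIVEN" output above
counts = {
    "Brooklyn":      {"A": 8151,  "B": 1089, "C": 323, "P": 164, "Z": 224, "N": 90},
    "Manhattan":     {"A": 13006, "B": 1571, "C": 509, "P": 189, "Z": 283, "N": 89},
    "Queens":        {"A": 7649,  "B": 932,  "C": 265, "P": 145, "Z": 216, "N": 80},
    "Bronx":         {"A": 2947,  "B": 450,  "C": 123, "P": 44,  "Z": 92,  "N": 19},
    "Staten Island": {"A": 1147,  "B": 154,  "C": 35,  "P": 13,  "Z": 20,  "N": 13},
}

def total(values):
    return functools.reduce(lambda acc, x: acc + x, values, 0)

def grade_percentages(tally):
    # Objective 2: share of each grade within one borough
    denom = total(tally.values())
    return {g: 100 * n / denom for g, n in tally.items()}

def borough_shares(counts, grade):
    # Objective 3: share of one grade held by each borough
    denom = total(c[grade] for c in counts.values())
    return {boro: 100 * c[grade] / denom for boro, c in counts.items()}

print(round(grade_percentages(counts["Staten Island"])["A"], 1))  # 83.0
print(round(borough_shares(counts, "A")["Manhattan"], 1))          # 39.5
```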


Total the inspection scores and find the average for each borough:

`reduce()` is used to add up the total number of violations. We then divide by the number of restaurants, using `map()` to do this for each borough.
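A minimal sketch of this step, under stated assumptions: the violation total is divided by the number of restaurants, as described above, but the score total is divided by the number of inspections, since summing several inspections' scores per restaurant would inflate a per-restaurant figure. Both denominators are judgment calls. `int()` is applied defensively in case scores are still strings, `SimpleNamespace` objects stand in for the notebook's classes, and `borough_averages` is a hypothetical name.

```python
import functools
from types import SimpleNamespace as NS

def borough_averages(borough):
    # Flatten every inspection of every restaurant in the borough
    inspections = [i for r in borough.restaurants.values() for i in r.inspections.values()]
    total_score = functools.reduce(lambda acc, i: acc + int(i.score), inspections, 0)
    total_violations = functools.reduce(lambda acc, i: acc + i.numViolations, inspections, 0)
    n_inspections = len(inspections)
    n_restaurants = len(borough.restaurants)
    return {
        "name": borough.name,
        # Assumed denominator: inspections (each inspection carries one score)
        "avg_score": total_score / n_inspections if n_inspections else 0.0,
        # Denominator from the description above: restaurants
        "avg_violations": total_violations / n_restaurants if n_restaurants else 0.0,
    }

queens = NS(name="Queens", restaurants={
    "111": NS(inspections={"d1": NS(score="12", numViolations=2),
                           "d2": NS(score=7, numViolations=1)}),
    "222": NS(inspections={"d1": NS(score="20", numViolations=3)}),
})

print(list(map(borough_averages, [queens])))
# [{'name': 'Queens', 'avg_score': 13.0, 'avg_violations': 3.0}]
```

In the notebook's `Inspection` class, an inspection with no score keeps the default `0`, which would pull the average down, so filtering those out before averaging may be worth considering.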

It was a challenge for me to break out of the object-oriented way of thinking. While writing the code, I felt like I was iterating through the original dataset many more times than I would have in a programming style I was more familiar with. I found myself breaking the flow of the program into more numerous, yet admittedly more succinct, parts in order to fulfill the spec.