

# Joe Troia - Notebook 1
## Dataset: DOHMH New York City Restaurant Inspection Results

The objectives for this dataset:

1. Count the restaurants in each borough (graded and ungraded) and record how many of each grade were given per borough.
2. Use those counts to find the percentage share of each grade per borough.
3. Calculate the average number of violations per restaurant per borough.

*Note that the restaurants in the dataset can have multiple inspection dates and multiple violations recorded per inspection so there will be duplicate records for restaurants. Only the one grade and score recorded per inspection will count towards the grade totals. Also, not every citation has a grade attached to it so this needs to be taken into account when parsing the records.*

To follow the specifications provided by Professor Chuang, many of the problems were tackled with a focus on list comprehensions, `lambda`, and the following Python functions:
* `map()`
* `reduce()`
* `filter()`
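
As a quick refresher on the tools named above, here is a minimal sketch on made-up numbers. The list `scores` is illustrative only and is not taken from the dataset.

```python
from functools import reduce

# Hypothetical inspection scores, for illustration only
scores = [7, 12, 30, 9, 21]

# List comprehension: build a new list from an existing one
doubled = [s * 2 for s in scores]

# map(): apply a function to every element
labels = list(map(lambda s: "low" if s < 14 else "high", scores))

# filter(): keep only the elements that pass a test
low_scores = list(filter(lambda s: s < 14, scores))

# reduce(): fold the list down to a single value
total = reduce(lambda x, y: x + y, scores, 0)

print(doubled, labels, low_scores, total)
```

In Python 3, `map()` and `filter()` return lazy iterators, which is why the results are wrapped in `list()` throughout this notebook.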

### Inspection Grades:
*  **A** 
*  **B**
*  **C**
*  **P** - Grade Pending
*  **Z** - Grade Pending issued on re-opening following an initial inspection that resulted in a closure
*  **N** - Not Yet Graded










Importing required libraries:

In [0]:
from functools import reduce
import json
import urllib.request

The API endpoint of the data to be examined is at https://data.cityofnewyork.us/resource/43nn-pn8j.json

By default, the endpoint returns only 1000 results, so we pass the query parameters `$limit=400000&$offset=0` so that the JSON returned contains every row in the dataset.

In [0]:
url = "https://data.cityofnewyork.us/resource/43nn-pn8j.json?$limit=400000&$offset=0"
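
The same URL can also be assembled from its parts, which makes the limit and offset easy to change. This is only a sketch: `build_url` is a hypothetical helper, not part of the notebook or of the API.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.cityofnewyork.us/resource/43nn-pn8j.json"

def build_url(limit, offset=0):
    # Hypothetical helper. safe="$" keeps the leading "$" of the
    # parameter names from being percent-encoded.
    query = urlencode({"$limit": limit, "$offset": offset}, safe="$")
    return BASE_URL + "?" + query

print(build_url(400000))
```

Calling `build_url(400000)` reproduces the string assigned to `url` above.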

Opening the URL, reading the response body, and parsing the JSON string into the list `data`:

In [0]:
response = urllib.request.urlopen(url)
jsonObj = response.read()
data = json.loads(jsonObj)
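
To make the shape of `data` concrete, here is a small sketch that parses two made-up rows. The field names match the ones used later in this notebook, but the values are invented for illustration.

```python
import json

# Two made-up rows shaped like the API response (values are illustrative)
sample = b'''[
  {"camis": "1", "boro": "Brooklyn", "inspection_date": "2019-01-02T00:00:00.000",
   "violation_code": "10F", "grade": "A"},
  {"camis": "2", "boro": "Queens", "inspection_date": "2019-03-04T00:00:00.000"}
]'''

# json.loads accepts bytes directly (Python 3.6+), as returned by response.read()
rows = json.loads(sample)

# Optional fields are simply absent, so test for the key before using it
graded = [r for r in rows if "grade" in r]
print(len(rows), len(graded))
```

The result is a list of dictionaries, one per row, which is why the later cells test `'grade' in record` and `'violation_code' in record` before reading those fields.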

Checking the total number of elements in the list

In [68]:
dataSize = len(data)
print(dataSize)

386321


Defining an Inspection class and a Restaurant class to hold accumulator fields and other data that will allow us to make the final calculations.

* The `inspections` dictionary maps dates to `Inspection` objects

*Aside: "Restaurant" is one of those words that look weird when you spell it out, amirite?*

In [0]:
class Inspection():
    def __init__(self):
        self.date = ""
        self.grade = ""
        self.score = 0
        self.numViolations = 0

    def incrViolations(self):
        self.numViolations += 1


class Restaurant():
    def __init__(self):
        self.inspections = {}
        self.camis = ""

Here I'm defining a Borough class whose fields and methods manipulate and aggregate the data for each borough, including a dictionary of the Restaurant objects belonging to it.
 * The `restaurants` dictionary maps camis ids to `Restaurant` objects

In [0]:
class Borough():
    def __init__(self, name):
        self.name = name
        self.restaurants = {}
        self.allCitations = []
        self.allGrades = []
        self.num_A = 0
        self.num_B = 0
        self.num_C = 0
        self.num_P = 0
        self.num_Z = 0
        self.num_N = 0
        self.totalRest = 0
        self.totalCitations = 0
        self.totalViolations = 0
        
    def addRestaurant(self, camis, restaurant):
        self.restaurants[camis] = restaurant
        
    def isNewRestaurant(self, camis):
        if camis in self.restaurants:
            return False
        
        return True

Instantiating an object for each borough and putting them into the `boroughs` list 
* The lambda function is defined to call `Borough`'s constructor
* List comprehension is used to create the list of `Borough` objects

In [71]:
constructBoroughs = lambda x: Borough(x)

boroughs = ["Brooklyn", "Manhattan", "Queens", "Bronx", "Staten Island"]
boroughs = [constructBoroughs(b) for b in boroughs]

for b in boroughs:
    print(b.name)

Brooklyn
Manhattan
Queens
Bronx
Staten Island


Using `map()`, I call a function that itself calls `filter()` on the bulk data.  This will filter `data` based on the borough field. The resulting list is stored in the current borough object's `allCitations` field.                               

"Citation" here refers to an individual data row in the dataset.

In [72]:
def buildCitationLists(borough):
      citList = list(filter(lambda x: x['boro'] == borough.name , data))
      borough.allCitations = citList
      return borough      

boroughs = list(map(buildCitationLists, boroughs))

boroughs[0].totalCitations = len(boroughs[0].allCitations)
boroughs[1].totalCitations = len(boroughs[1].allCitations)
boroughs[2].totalCitations = len(boroughs[2].allCitations)
boroughs[3].totalCitations = len(boroughs[3].allCitations)
boroughs[4].totalCitations = len(boroughs[4].allCitations)

print("CITATIONS IN DATASET")
print("Brooklyn:", len(boroughs[0].allCitations))
print("Manhattan:", len(boroughs[1].allCitations))
print("Queens:", len(boroughs[2].allCitations))
print("Bronx:", len(boroughs[3].allCitations))
print("Staten Island:", len(boroughs[4].allCitations))

CITATIONS IN DATASET
Brooklyn: 98071
Manhattan: 152428
Queens: 87787
Bronx: 34821
Staten Island: 13102
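
The `filter()` approach scans the full dataset once per borough. As a point of comparison, the sketch below buckets made-up rows in a single pass with `collections.defaultdict`. This is an alternative technique, not the method used in this notebook, and the value `"Unknown"` is a hypothetical stand-in for any `boro` value outside the five names.

```python
from collections import defaultdict

# Made-up rows; only the 'boro' field matters here
rows = [
    {"camis": "1", "boro": "Brooklyn"},
    {"camis": "2", "boro": "Queens"},
    {"camis": "1", "boro": "Brooklyn"},
    {"camis": "3", "boro": "Unknown"},  # hypothetical non-borough value
]

names = ["Brooklyn", "Manhattan", "Queens", "Bronx", "Staten Island"]

# filter() approach from the cell above: one pass over the rows per borough
by_filter = {n: list(filter(lambda r: r["boro"] == n, rows)) for n in names}

# Alternative: a single pass that buckets every row by its 'boro' value
by_group = defaultdict(list)
for r in rows:
    by_group[r["boro"]].append(r)

# Rows whose 'boro' matches none of the five names are left out of by_filter
unmatched = sum(len(v) for k, v in by_group.items() if k not in names)
print({n: len(by_filter[n]) for n in names}, unmatched)
```

In the run above, the five borough counts add up to 386,209 of the 386,321 rows, so 112 rows matched none of the five names and were left out by the filter.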


Using `map()` again, the `restaurants` dictionary of each borough is built. A unique 'camis' id key maps to each unique Restaurant object. Restaurant objects are created for each new camis by testing if the key is present in that `Borough` object's `restaurants` dictionary.

Additionally, the details of the citation are recorded if it belongs to a new inspection. If that inspection is already recorded in the `Restaurant` object, its violation count is incremented whenever the row carries a violation.

In [73]:
def buildRestaurantDicts(record, borough):    
      if borough.isNewRestaurant(record['camis']):
          restaurant = Restaurant()
          restaurant.camis = record['camis']      
          borough.addRestaurant(record['camis'], restaurant)
 
      if record['inspection_date'] in borough.restaurants[record['camis']].inspections:
          if 'violation_code' in record:
              borough.restaurants[record['camis']].inspections[record['inspection_date']].incrViolations()
      else:
          inspection = Inspection()
          inspection.date = record['inspection_date']
          
          if 'grade' in record:
              inspection.grade = record['grade']
              
          if 'violation_code' in record:
              inspection.numViolations = 1
          
          borough.restaurants[record['camis']].inspections[record['inspection_date']] = inspection
          
          
def buildBoroughObjects(borough):       
      for record in borough.allCitations:
          buildRestaurantDicts(record, borough)

      return borough

boroughs = list(map(buildBoroughObjects, boroughs))

boroughs[0].totalRest = len(boroughs[0].restaurants)
boroughs[1].totalRest = len(boroughs[1].restaurants)
boroughs[2].totalRest = len(boroughs[2].restaurants)
boroughs[3].totalRest = len(boroughs[3].restaurants)
boroughs[4].totalRest = len(boroughs[4].restaurants)

print("RESTAURANTS PER BOROUGH")
print("Brooklyn:", len(boroughs[0].restaurants))
print("Manhattan:", len(boroughs[1].restaurants))
print("Queens:", len(boroughs[2].restaurants))
print("Bronx:", len(boroughs[3].restaurants))
print("Staten Island:", len(boroughs[4].restaurants))

RESTAURANTS PER BOROUGH
Brooklyn: 6717
Manhattan: 10668
Queens: 6070
Bronx: 2413
Staten Island: 973
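
The deduplication idea can be seen in isolation on a few made-up rows. This sketch uses plain dictionaries keyed by `(camis, inspection_date)` instead of the `Restaurant` and `Inspection` classes, so it is a simplified stand-in rather than the notebook's implementation. The violation codes are invented.

```python
# Made-up rows: restaurant "1" has two inspections and three violation rows,
# restaurant "2" has one inspection with no violation recorded
rows = [
    {"camis": "1", "inspection_date": "2019-01-02", "violation_code": "10F", "grade": "A"},
    {"camis": "1", "inspection_date": "2019-01-02", "violation_code": "08A", "grade": "A"},
    {"camis": "1", "inspection_date": "2018-06-15", "violation_code": "04L"},
    {"camis": "2", "inspection_date": "2019-02-20"},
]

inspections = {}  # (camis, inspection_date) -> {"grade": str, "violations": int}
for r in rows:
    key = (r["camis"], r["inspection_date"])
    # setdefault creates the entry the first time an inspection is seen,
    # taking the grade (if any) from that first row
    entry = inspections.setdefault(key, {"grade": r.get("grade", ""), "violations": 0})
    if "violation_code" in r:
        entry["violations"] += 1

restaurants = {camis for camis, _ in inspections}
print(len(restaurants), len(inspections))
```

Four rows collapse into two restaurants and three inspections, and the grade for each inspection is counted once no matter how many violation rows share it.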


Finding total number of grades given
* `boroughs` is passed to the map function along with the `accumulateGrades()` function
* `accumulateGrades()` walks every inspection of every restaurant in a `borough` and returns the list of grades given in that borough


In [74]:
def accumulateGrades(borough):
      boroGrades = []
    
      for restKey in borough.restaurants:
        restaurant = borough.restaurants[restKey]
        
        for dateKey in restaurant.inspections:
            inspection = restaurant.inspections[dateKey]
            if inspection.grade != "":
              boroGrades.append(inspection.grade)
              
      return boroGrades
          
allGrades = list(map(accumulateGrades, boroughs))

boroughs[0].allGrades = allGrades[0]
boroughs[1].allGrades = allGrades[1]
boroughs[2].allGrades = allGrades[2]
boroughs[3].allGrades = allGrades[3]
boroughs[4].allGrades = allGrades[4]

print("TOTAL GRADES GIVEN")
print("Brooklyn:", len(allGrades[0]))
print("Manhattan:", len(allGrades[1]))
print("Queens:", len(allGrades[2]))
print("Bronx:", len(allGrades[3]))
print("Staten Island:", len(allGrades[4]))

TOTAL GRADES GIVEN
Brooklyn: 20237
Manhattan: 32159
Queens: 18841
Bronx: 7639
Staten Island: 2819


Finding the number of each grade attained per borough
* `boroughs` is passed to the map function along with the `tallyGrades()` function
* For each possible grade, `filter()` is passed a lambda function that tests for equality with that grade, and the `len()` of the resulting list is the count

In [75]:
def tallyGrades(borough):
      borough.num_A = len(list(filter(lambda x: x == "A", borough.allGrades)))
      borough.num_B = len(list(filter(lambda x: x == "B", borough.allGrades)))
      borough.num_C = len(list(filter(lambda x: x == "C", borough.allGrades)))
      borough.num_P = len(list(filter(lambda x: x == "P", borough.allGrades)))
      borough.num_Z = len(list(filter(lambda x: x == "Z", borough.allGrades)))
      borough.num_N = len(list(filter(lambda x: x == "N", borough.allGrades)))
      return borough

boroughs = list(map(tallyGrades, boroughs))

print("NUMBER OF EACH GRADE GIVEN")
print("Brooklyn - A:", boroughs[0].num_A, " B:", boroughs[0].num_B,
      " C:", boroughs[0].num_C, " P:", boroughs[0].num_P, 
      " Z:", boroughs[0].num_Z, " N:", boroughs[0].num_N)
print("Manhattan - A:", boroughs[1].num_A, " B:", boroughs[1].num_B,
      " C:", boroughs[1].num_C, " P:", boroughs[1].num_P, 
      " Z:", boroughs[1].num_Z, " N:", boroughs[1].num_N)
print("Queens - A:", boroughs[2].num_A, " B:", boroughs[2].num_B,
      " C:", boroughs[2].num_C, " P:", boroughs[2].num_P, 
      " Z:", boroughs[2].num_Z, " N:", boroughs[2].num_N)
print("Bronx - A:", boroughs[3].num_A, " B:", boroughs[3].num_B,
      " C:", boroughs[3].num_C, " P:", boroughs[3].num_P, 
      " Z:", boroughs[3].num_Z, " N:", boroughs[3].num_N)
print("Staten Island - A:", boroughs[4].num_A, " B:", boroughs[4].num_B,
      " C:", boroughs[4].num_C, " P:", boroughs[4].num_P, 
      " Z:", boroughs[4].num_Z, " N:", boroughs[4].num_N)


NUMBER OF EACH GRADE GIVEN
Brooklyn - A: 17114  B: 1616  C: 529  P: 501  Z: 348  N: 127
Manhattan - A: 27923  B: 2318  C: 759  P: 556  Z: 466  N: 136
Queens - A: 16215  B: 1413  C: 420  P: 388  Z: 301  N: 104
Bronx - A: 6460  B: 664  C: 186  P: 152  Z: 144  N: 33
Staten Island - A: 2433  B: 236  C: 57  P: 44  Z: 27  N: 22
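
For comparison, the standard library's `collections.Counter` produces the same tallies in one pass over the grade list. The sketch below runs both approaches on a made-up list standing in for a borough's `allGrades`. It is an alternative, not what the cell above does.

```python
from collections import Counter

# Made-up grade list standing in for a borough's allGrades
grades = ["A", "A", "B", "A", "Z", "C", "A", "P"]

# filter() approach from the cell above: one pass per grade
num_A = len(list(filter(lambda g: g == "A", grades)))

# Counter tallies every grade in a single pass
tally = Counter(grades)

# A grade that never appears counts as 0 rather than raising KeyError
print(num_A, tally["A"], tally["N"])
```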


`reduce()` is employed to sum the total number of violations. Then we divide by the number of restaurants, using `map()` to do this for each borough
* `boroughs` is passed to the map function along with the `accumulateViolations()` function
* `accumulateViolations()` walks every inspection of every restaurant in a `borough` and returns the list of per-inspection violation counts for that borough
* `reduce()` is passed a lambda function which sums those counts

In [76]:
def accumulateViolations(borough):
      boroViolations = []
    
      for restKey in borough.restaurants:
        restaurant = borough.restaurants[restKey]
        
        for dateKey in restaurant.inspections:
            inspection = restaurant.inspections[dateKey]
            boroViolations.append(inspection.numViolations)
              
      return boroViolations
  

boroughViolations = list(map(accumulateViolations, boroughs))

boroughs[0].totalViolations = reduce((lambda x, y: x + y), boroughViolations[0])
boroughs[1].totalViolations = reduce((lambda x, y: x + y), boroughViolations[1])
boroughs[2].totalViolations = reduce((lambda x, y: x + y), boroughViolations[2])
boroughs[3].totalViolations = reduce((lambda x, y: x + y), boroughViolations[3])
boroughs[4].totalViolations = reduce((lambda x, y: x + y), boroughViolations[4])

print("TOTAL VIOLATIONS")
print("Brooklyn:", boroughs[0].totalViolations)
print("Manhattan:", boroughs[1].totalViolations)
print("Queens:", boroughs[2].totalViolations)
print("Bronx:", boroughs[3].totalViolations)
print("Staten Island:", boroughs[4].totalViolations)


TOTAL VIOLATIONS
Brooklyn: 96349
Manhattan: 150243
Queens: 86542
Bronx: 34355
Staten Island: 12959
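
One detail worth knowing about `reduce()`: without an initial value it raises `TypeError` on an empty sequence. The sketch below, on made-up counts, passes `0` as the third argument to guard against that. The cell above does not pass it, which is fine as long as every borough has at least one inspection.

```python
from functools import reduce

counts = [3, 0, 5, 2]  # made-up violation counts per inspection

total = reduce(lambda x, y: x + y, counts, 0)

# With the initial value 0, an empty list folds to 0
empty_total = reduce(lambda x, y: x + y, [], 0)

# Without an initial value, an empty list raises TypeError
try:
    reduce(lambda x, y: x + y, [])
    raised = False
except TypeError:
    raised = True

print(total, empty_total, raised)
```

For plain addition, the built-in `sum(counts)` gives the same result as the `reduce()` call.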


Calculation of average number of violations per restaurant per borough

In [77]:
def calculateViolationAvg(borough):
      return borough.totalViolations / borough.totalRest

avgViolations = list(map(calculateViolationAvg, boroughs))

print("AVERAGE NUMBER OF VIOLATIONS PER RESTAURANT")
print("Brooklyn:", avgViolations[0])
print("Manhattan:", avgViolations[1])
print("Queens:", avgViolations[2])
print("Bronx:", avgViolations[3])
print("Staten Island:", avgViolations[4])

AVERAGE NUMBER OF VIOLATIONS PER RESTAURANT
Brooklyn: 14.34405240434718
Manhattan: 14.083520809898763
Queens: 14.257331136738056
Bronx: 14.23746373808537
Staten Island: 13.318602261048305
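
As a worked check of the Brooklyn figure, the sketch below redoes the division with the totals printed earlier. `safe_avg` is a hypothetical helper that also covers the case of a borough with no restaurants, which `calculateViolationAvg()` does not handle.

```python
def safe_avg(total_violations, total_restaurants):
    # Hypothetical helper: return 0.0 rather than dividing by zero
    if total_restaurants == 0:
        return 0.0
    return total_violations / total_restaurants

# Brooklyn figures as printed by the cells above
brooklyn_avg = safe_avg(96349, 6717)
print(round(brooklyn_avg, 2))
```

Note that the numerator counts violations across every inspection on record, so the average is per restaurant over its whole inspection history, not per inspection.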


Finally, we calculate the percentage share of each grade per borough from the number of each grade given and the total number of grades given in that borough.
* For each borough: `(number of grade X) / (total grades given) * 100`

*Aside: Please pardon the brutishness of some of this code. I simply did not have enough time to refactor.*

In [78]:
def calculateGradePct(borough):
      pctList = []
      pctList.append((borough.num_A / len(borough.allGrades)) * 100)
      pctList.append((borough.num_B / len(borough.allGrades)) * 100)
      pctList.append((borough.num_C / len(borough.allGrades)) * 100)
      pctList.append((borough.num_P / len(borough.allGrades)) * 100)
      pctList.append((borough.num_Z / len(borough.allGrades)) * 100)
      pctList.append((borough.num_N / len(borough.allGrades)) * 100)
      return pctList

gradePctList = list(map(calculateGradePct, boroughs))

print("PERCENTAGE OF EACH GRADE")
print("Brooklyn - A:", gradePctList[0][0], "%  B:", gradePctList[0][1],
      "%  C:", gradePctList[0][2], "%  P:", gradePctList[0][3], 
      "%  Z:", gradePctList[0][4], "%  N:", gradePctList[0][5], "%")
print("Manhattan - A:", gradePctList[1][0], "%  B:", gradePctList[1][1],
      "%  C:", gradePctList[1][2], "%  P:", gradePctList[1][3], 
      "%  Z:", gradePctList[1][4], "%  N:", gradePctList[1][5], "%")
print("Queens - A:", gradePctList[2][0], "%  B:", gradePctList[2][1],
      "%  C:", gradePctList[2][2], "%  P:", gradePctList[2][3], 
      "%  Z:", gradePctList[2][4], "%  N:", gradePctList[2][5], "%")
print("Bronx - A:", gradePctList[3][0], "%  B:", gradePctList[3][1],
      "%  C:", gradePctList[3][2], "%  P:", gradePctList[3][3], 
      "%  Z:", gradePctList[3][4], "%  N:", gradePctList[3][5], "%")
print("Staten Island - A:", gradePctList[4][0], "%  B:", gradePctList[4][1],
      "%  C:", gradePctList[4][2], "%  P:", gradePctList[4][3], 
      "%  Z:", gradePctList[4][4], "%  N:", gradePctList[4][5], "%")

PERCENTAGE OF EACH GRADE
Brooklyn - A: 84.56787073182784 %  B: 7.985373326085882 %  C: 2.6140238177595494 %  P: 2.4756633888422197 %  Z: 1.719622473686811 %  N: 0.627563374017888 %
Manhattan - A: 86.82794863024348 %  B: 7.207935570135887 %  C: 2.3601480145526916 %  P: 1.728909481016201 %  Z: 1.4490500326502689 %  N: 0.4228987219751858 %
Queens - A: 86.06231091767953 %  B: 7.499601931956902 %  C: 2.229181041346001 %  P: 2.0593386762910675 %  Z: 1.5975797462979673 %  N: 0.5519876864285335 %
Bronx - A: 84.5660426757429 %  B: 8.69223720382249 %  C: 2.4348736745647335 %  P: 1.9897892394292447 %  Z: 1.8850634899856005 %  N: 0.43199371645503337 %
Staten Island - A: 86.30720113515432 %  B: 8.37176303653778 %  C: 2.0219936147570063 %  P: 1.5608371763036537 %  Z: 0.9577864490954239 %  N: 0.7804185881518269 %
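
The percentages are easier to read when rounded. The sketch below recomputes Brooklyn's shares from the counts printed earlier and formats them to two decimals with f-strings. It is a presentation suggestion, not a change to the calculation.

```python
# Brooklyn figures as printed by the earlier cells
counts = {"A": 17114, "B": 1616, "C": 529, "P": 501, "Z": 348, "N": 127}
total_grades = 20237  # "TOTAL GRADES GIVEN" for Brooklyn

pct = {g: n / total_grades * 100 for g, n in counts.items()}

# {p:.2f} rounds each share to two decimal places
line = "  ".join(f"{g}: {p:.2f}%" for g, p in pct.items())
print("Brooklyn -", line)
```

The six counts sum to 20,235 while 20,237 grades were recorded, so two Brooklyn grades fall outside the six listed codes and the shares add up to slightly less than 100%.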


It was a challenge for me to break out of the object-oriented way of thinking. In writing the code, I felt like I was iterating through the original dataset many more times than I would have if I'd followed a programming style that I was more familiar with. I found myself breaking the flow of the program into more numerous, yet admittedly more succinct, parts in order to fulfill the spec. Having little experience with Python and these functions, the biggest obstacle was fitting these functions into my plan.