In [1]:
import pandas as pd
pd.set_option("display.max_columns", None)
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
%matplotlib inline

The goal of this notebook is to generate the files needed for a treemap of New York City 311 call complaint types, and to do so in a reproducible way.

In [2]:
calls = pd.read_csv("../data/311_Service_Requests_from_2010_to_Present.csv", index_col=0, encoding='latin-1')


In [3]:
len(calls)

2230652

In [4]:
calls['Complaint Type'].value_counts()

Noise - Residential                  209823
HEAT/HOT WATER                       195258
Illegal Parking                      115522
Blocked Driveway                     111245
Street Condition                      86869
Street Light Condition                84642
UNSANITARY CONDITION                  76314
Water System                          70430
Noise - Street/Sidewalk               59729
PAINT/PLASTER                         56833
Noise                                 56179
PLUMBING                              48655
Noise - Commercial                    46433
Homeless Person Assistance            43909
Traffic Signal Condition              35184
DOOR/WINDOW                           35020
Sanitation Condition                  33547
Dirty Conditions                      32608
Sewer                                 31641
Rodent                                30184
WATER LEAK                            29988
General Construction/Plumbing         28794
Missed Collection (All Materials

These categories are a mix of too coarse, too granular, and oddly named. For a proper treemap, we'll need to generate our own categories.

The cells below represent almost two hours of work doing so...

In [5]:
len(calls['Complaint Type'].value_counts())

250

In [6]:
complaint_types = calls['Complaint Type'].value_counts().index

In [7]:
i = -1

In [8]:
complaint_types

Index(['Noise - Residential', 'HEAT/HOT WATER', 'Illegal Parking',
       'Blocked Driveway', 'Street Condition', 'Street Light Condition',
       'UNSANITARY CONDITION', 'Water System', 'Noise - Street/Sidewalk',
       'PAINT/PLASTER',
       ...
       'Construction', 'VACANT APARTMENT', 'Advocate-SCRIE/DRIE',
       'Building Condition', 'Asbestos/Garbage Nuisance', 'MOLD', 'LEAD',
       'Trapping Pigeon', 'Gas Station Discharge Lines', 'Advocate - RPIE'],
      dtype='object', length=250)

In [128]:
complaint_types[75]

'Special Projects Inspection Team (SPIT)'

In [130]:
calls['Complaint Type'].value_counts().iloc[75]

3407

In [165]:
calls['Complaint Type'].value_counts().index[138]

'City Vehicle Placard Complaint'

In [166]:
complaint_typification = [
    ['Noise', 'Residential Noise'],
    ['Heat or Water', 'Unspecified Heat or Water Issue'],
    ['Illegal Parking', 'Unspecified Parking Violation'],
    ['Illegal Parking', 'Blocked Driveway'],
    ['Street Condition', 'Unspecified Street Condition'],
    ['Street Condition', 'Damaged Street Light'],
    ['Sanitation', 'Unspecified Sanitation Issue'],
    ['Heat or Water', 'Water System'],
    ['Noise', 'Street or Sidewalk'],
    ['Building Maintenance', 'Paint or Plaster'],
    ['Noise', 'Unspecified Noise'],
    ['Heat or Water', 'Plumbing'],
    ['Noise', 'Commercial Noise'],
    ['Homelessness', 'Homelessness Assistance'],
    ['Street Condition', 'Damaged Traffic Signal'],
    ['Building Maintenance', 'Door or Window'],
    ['Sanitation', 'Unspecified Sanitation Issue'],
    ['Sanitation', 'Unspecified Sanitation Issue'],        
    ['Sanitation', 'Sewer'],
    ['Sanitation', 'Rodents'],
    ['Heat or Water', 'Water Leak'],
    ['Building Maintenance', 'General Construction'],
    ['Sanitation', 'Missed Collections'],
    ['Illegal Parking', 'Derelict Vehicle'],
    ['Paperwork', 'Literature Request'],
    ['Illegal Parking', 'Derelict Vehicle'],
    ['Building Maintenance', 'Electrical'],
    ['Building Maintenance', 'Illegal Conversion'],
    ['Consumer Complaint', 'Unspecified Consumer Complaint'],
    ['Paperwork', 'Benefit Card Replacement'],
    ['Greenery', 'Damaged Tree'],
    ['Street Condition', 'Broken Meter'],
    ['Building Maintenance', 'Flooring or Stairs'],
    ['Noise', 'Vehicle Noise'],
    ['Greenery', 'New Tree Request'],
    ['Consumer Complaint', 'Unspecified Consumer Complaint'],
    ['Consumer Complaint', 'Taxi Complaint'],
    ['Building Maintenance', 'Unspecified Maintenance Issue'],    
    ['Greenery', 'Overgrowth'],
    ['Crime', 'Graffiti'],
    ['General', 'Miscellaneous Issue'],
    ['Building Maintenance', 'Elevator'],
    ['Building Maintenance', 'Lead'],
    ['Greenery', 'Dead Tree'],
    ['Consumer Complaint', 'Food Establishment Complaint'],
    ['Animal Issues', 'Animal Abuse'],
    ['Property and Housing', 'Property Reduction'],
    ['Crime', 'Non-Emergency Police Matter'],
    ['Air Quality', 'Unspecified Air Quality Issue'],
    ['Crime', 'Unspecified Crime'],
    ['Building Maintenance', 'Sewer or Sidewalk'],
    ['General', 'Miscellaneous Issue'],  # Other Enforcement
    ['Street Condition', 'Damaged Sign'],
    ['Property and Housing', 'Rent Increase Exemption'],
    ['Property and Housing', 'Property Account Update'],
    ['Street Condition', 'Sidewalk Condition'],
    ['Property and Housing', 'Senior Citizen Housing'],
    ['Consumer Complaint', 'For Hire Vehicle'],
    ['Property and Housing', 'Document Request'],
    ['Sanitation', 'Standing Water'],
    ['Street Condition', 'Traffic'],
    ['Street Condition', 'Snow'],
    ['Homelessness', 'Homeless Encampment'],
    ['Fire Safety', 'Fire Safety Director On-Site Exam Scheduling'],
    ['Consumer Complaint', 'Street Vendor Complaint'],
    ['Street Condition', 'Missing Sign'],    
    ['Air Quality', 'Indoor Air Quality'],
    ['Heat or Water', 'Water Conservation'],
    ['Heat or Water', 'Plumbing'],
    ['Property and Housing', 'Payment Issue'],
    ['Noise', 'Park Noise'],
    ['Paperwork', 'Parking Status Request'],
    ['Street Condition', 'Highway Problem'],
    ['General', 'Agency Issue'],  # Agency Issue
    ['General', 'Special Enforcement'],  # Special Enforcement
    ['General', 'Special Building Inspection'],
    ['Paperwork', 'Literature Request'],
    ['Consumer Complaint', 'Food Poisoning'],
    ['Building Maintenance', 'Electrical'], # TODO: Possibly street condition though?
    ['Sanitation', 'Hazardous Materials'],
    ['Greenery', 'Illegal Tree Damage'],
    ['Paperwork', 'Literature Request'],
    ['Sanitation', 'Electronic Waste Disposal'],
    ['Street Condition', 'Curb Condition'],
    ['Consumer Complaint', 'Taxi Complaint'],
    ['Sanitation', 'Litter Basket Request'],
    ['Greenery', 'Parks Violation'],
    ['Animal Issues', 'Unsanitary Animal'],
    ['Sanitation', 'Vacant Lot Condition'],
    ['Street Condition', 'Damaged Sign'],
    ['Building Maintenance', 'Asbestos'],
    ['Animal Issues', 'Animal in a Park'],
    ['Greenery', 'Parks Violation'],
    ['Paperwork', 'DCA or DOH License'],
    ['Street Condition', 'Derelict Bicycle'],
    ['Property and Housing', 'Unspecified Issue'],
    ['Building Maintenance', 'School Maintenance'],
    ['Property and Housing', 'Unspecified Issue'],
    ['Heat or Water', 'Boilers'],
    ['Air Quality', 'Smoking'],
    ['Crime', 'Alcohol'],
    ['Heat or Water', 'Water Quality'],
    ['Sanitation', 'Industrial Waste'],
    ['Sanitation', 'Sewage'],
    ['Sanitation', 'Overflowing Garbage Can'],
    ['Crime', 'Posting Advertisements'],
    ['Paperwork', 'Parking Status Request'],
    ['General', 'Miscellaneous Issue'],
    ['Noise', 'Helicopter Noise'],
    ['Sanitation', 'Mosquitoes'],
    ['Consumer Complaint', 'Cable Complaint'],
    ['Heat or Water', 'Non-Residential Heat'],
    ['Building Maintenance', 'Elevator'],
    ['Street Condition', 'Broken Parking Meter'],
    ['Noise', 'Place of Worship Noise'],
    ['Fire Safety', 'Inspection'], # possibly Fire?
    ['General', 'Miscellaneous Issue'],
    ['Sanitation', 'Sweeping'],
    ['Sanitation', 'Pigeons'],
    ['Sanitation', 'Recycling Enforcement'],
    ['Paperwork', 'Parking Status Request'],
    ['Animal Issues', 'Unleashed Dog'],
    ['Public Transit', 'Ferry Inquiry or Complaint'],
    ['Property and Housing', 'Rent Increase Exemption'],
    ['Paperwork', 'Parking Status Request'],
    ['General', 'Daycare Inquiry'],
    ['Construction', 'Site Safety'],
    ['Homelessness', 'Panhandling'],
    ['Crime', 'Urinating in Public'],
    ['Crime', 'Elder Abuse'],
    ['Paperwork', 'Literature Request'],
    ['Consumer Complaint', 'Taxi Complaint'],
    ['General', 'Miscellaneous Issue'],
    ['Consumer Complaint', 'Paid Professional Complaint'],
    ['Consumer Complaint', 'For Hire Vehicle'],
    ['Consumer Complaint', 'Senior Center Complaint'],
    ['Greenery', 'Dead or Dying Tree'],
    ['Noise', 'Public Assembly Noise'],
    ['Illegal Parking', 'City Vehicle Placard Complaint'],
    ['Construction', 'Site Safety'],
    ['Street Condition', 'Bridge or Tunnel'],
    ['Street Condition', 'Payphone or LinkNYC'],
    ['Sanitation', 'Mold'],
    ['General', 'Found Property'],
    ['Animal Issues', 'Illegal Animal'],
    ['Construction', 'Site Safety'],
    ['Fire Safety', 'Fire Alarm'],
    ['Sanitation', 'Sweeping'],
    ['Crime', 'Drug Activity'],
    ['Paperwork', 'Literature Request'],
    ['Property and Housing', 'Property Statement Issue'],
    ['Crime', 'Disorderly Youth'],
    ['Greenery', 'Unspecified Greenery Issue'],
    ['General', 'Alzheimer Care'],
    ['Public Transit', 'Ferry Inquiry or Complaint'],
    ['Crime', 'Illegal Fireworks'],
    ['Property and Housing', 'Property Reduction'],
    ['Paperwork', 'Miscellaneous Paperwork'],
    ['Public Transit', 'Bus Stop Shelter Placement'],
    ['Noise', 'Sanitation Truck Noise'],
    ['General', 'Complaint'],
    ['Animal Issues', 'Harboring Bees or Wasps'],
    ['Consumer Complaint', 'Beach or Pool or Sauna'],
    ['General', 'Special Enforcement'],
    ['Sanitation', 'Litter Basket Request'],
    ['Heat or Water', 'Water Quality'],
    ['Greenery', 'Poison Ivy'],
    ['Fire Safety', 'Fire Alarm'],
    ['Paperwork', 'Parking Status Request'],
    ['Noise', 'Public Assembly'],
    ['Street Condition', 'Bike Rack'],
    ['Consumer Complaint', 'Home Delivered Meal Complaint'],
    ['Fire Safety', 'Fire Alarm'],
    ['Paperwork', 'Literature Request'],
    ['General', 'Compliment'],
    ['Greenery', 'Parks Violation'],
    ['Consumer Complaint', 'Home Delivered Meal Complaint'],
    ['Paperwork', 'Miscellaneous Paperwork'],
    ['Public Transit', 'Ferry Inquiry or Complaint'],
    ['Street Condition', 'Damaged Sign'],
    ['Paperwork', 'Literature Request'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['Paperwork', 'Literature Request'],
    ['Consumer Complaint', 'Parking Facility Complaint'],
    ['General', 'Miscellaneous Issue'],
    ['Animal Issues', 'Animal Facility Issue'],
    ['Animal Issues', 'Illegal Animal'],
    ['Animal Issues', 'Animal Facility Issue'],
    ['Building Maintenance', 'Door or Window'],
    ['General', 'Miscellaneous Issue'],
    ['Illegal Parking', 'Derelict Vehicle'],
    ['Street Condition', 'Payphone or LinkNYC'],
    ['Sanitation', 'Public Toilet'],
    ['General', 'Miscellaneous Issue'],
    ['Sanitation', 'Overflowing Litter Basket'],
    ['Sanitation', 'Sweeping'],
    ['Fire Safety', 'Food Preparation Fire Safety'],
    ['General', 'Research Inquiry'], # "Research"
    ['General', 'Miscellaneous Issue'],
    ['Consumer Complaint', 'Home Care Provider Complaint'],
    ['General', 'Miscellaneous Issue'],
    ['Fire Safety', 'Food Preparation Fire Safety'],
    ['Street Condition', 'Missing Sign'],
    ['Street Condition', 'Damaged Sign'],
    ['Heat or Water', 'Water System'],
    ['Paperwork', 'Parking Status Request'],
    ['General', 'Miscellaneous Issue'],
    ['Paperwork', 'Literature Request'],
    ['Fire Safety', 'Fire Alarm'],
    ['General', 'Miscellaneous Issue'],
    ['Consumer Complaint', 'Paid Professional Complaint'],
    ['General', 'Miscellaneous Issue'],
    ['Property and Housing', 'Property Account Update'],
    ['Street Condition', 'Bridge or Tunnel'],
    ['Fire Safety', 'Fire Alarm'],
    ['Consumer Complaint', 'For Hire Vehicle'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['Sanitation', 'Hazardous Materials'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['Sanitation', 'Hazardous Materials'],
    ['Noise', 'Public Assembly'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'], # Squeegee!
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['Construction', 'Site Safety'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue'], # Trapping Pigeon...?
    ['Building Maintenance', 'Unspecified Maintenance Issue'],
    ['Building Maintenance', 'Unspecified Maintenance Issue'],
    ['Sanitation', 'Mold'],
    ['General', 'Miscellaneous Issue'],
    ['General', 'Miscellaneous Issue']
]

In [167]:
# Pair each raw complaint type (by frequency rank) with its hand-assigned generalized type.
complaint_type_map = dict(zip(calls['Complaint Type'].value_counts().index,
                              (t[0] for t in complaint_typification)))

In [168]:
# Same pairing, but with the hand-assigned subtype.
complaint_subtype_map = dict(zip(calls['Complaint Type'].value_counts().index,
                                 (t[1] for t in complaint_typification)))

In [169]:
calls['Generalized Complaint Type'] = calls['Complaint Type'].map(complaint_type_map)

In [170]:
calls['Generalized Complaint Subtype'] = calls['Complaint Type'].map(complaint_subtype_map)
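
The two maps depend on `complaint_typification` lining up, entry for entry, with the `value_counts()` index. Since `zip` stops at the shorter input without complaining, a small check along these lines could catch a miscounted list before the maps are built. This is only a sketch: `check_alignment` is a hypothetical helper, and the toy lists stand in for the real 250 complaint types.

```python
def check_alignment(categories, typification):
    """Raise if the hand-written list does not match the categories one-to-one."""
    if len(categories) != len(typification):
        raise ValueError(
            f"{len(categories)} categories but {len(typification)} typification entries"
        )
    bad = [pair for pair in typification if len(pair) != 2]
    if bad:
        raise ValueError(f"entries that are not [type, subtype] pairs: {bad}")
    # Key by complaint name; each value is a (type, subtype) tuple.
    return dict(zip(categories, (tuple(pair) for pair in typification)))


# Toy stand-ins for calls['Complaint Type'].value_counts().index
# and for complaint_typification.
toy_categories = ['Noise - Residential', 'HEAT/HOT WATER', 'Illegal Parking']
toy_typification = [
    ['Noise', 'Residential Noise'],
    ['Heat or Water', 'Unspecified Heat or Water Issue'],
    ['Illegal Parking', 'Unspecified Parking Violation'],
]

pairs = check_alignment(toy_categories, toy_typification)
print(pairs['HEAT/HOT WATER'])
```

A length check like this cannot tell whether the entries are in the right order, only that none are missing or extra.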

This map only works on the data in its current form. The entries in `complaint_typification` are matched to complaint types purely by their position in the `value_counts()` ordering, so a single new complaint type, or two types swapping ranks as call volumes change in a later version of the dataset, would misalign every entry after that point and mean starting all over again.

Obviously we don't want to do that.

The solution is to save the intermediate products, the maps, to CSV and Pickle files.

In a future version of this code, running in a future notebook, we will load these intermediates and use them to classify the bulk of the records that we have, leaving just a few left over that we still need to deal with.
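
A minimal sketch of that future step, assuming the mapping CSV keeps the raw complaint type in its first column, as the `complaint_mapping.csv` file written in this notebook should. The in-memory CSV and the `new_calls` frame are made-up stand-ins so the example runs on its own, and 'Drone Noise' is an invented complaint type used only to show a leftover.

```python
import io

import pandas as pd

# Stand-in for ../data/complaint_mapping.csv (first column is the raw complaint type).
mapping_csv = io.StringIO(
    "Complaint Type,Generalized Complaint Type,Generalized Complaint Subtype\n"
    "Noise - Residential,Noise,Residential Noise\n"
    "Illegal Parking,Illegal Parking,Unspecified Parking Violation\n"
)
mapping = pd.read_csv(mapping_csv, index_col=0)

# Stand-in for a newer extract of the 311 data.
new_calls = pd.DataFrame(
    {"Complaint Type": ["Illegal Parking", "Noise - Residential", "Drone Noise"]}
)

# Look up by complaint name rather than by rank, so reordering cannot break it.
new_calls["Generalized Complaint Type"] = new_calls["Complaint Type"].map(
    mapping["Generalized Complaint Type"]
)
new_calls["Generalized Complaint Subtype"] = new_calls["Complaint Type"].map(
    mapping["Generalized Complaint Subtype"]
)

# Complaint types with no saved mapping still need to be classified by hand.
unmapped = new_calls["Generalized Complaint Type"].isna()
leftovers = sorted(new_calls.loc[unmapped, "Complaint Type"].unique())
print(leftovers)
```

Because the lookup is keyed on the complaint name, only genuinely new complaint types end up in `leftovers`.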

In [171]:
pd.DataFrame(
    index=calls['Complaint Type'].value_counts().index,
    data={
        'Generalized Complaint Type': [complaint_type_map[t] for t in calls['Complaint Type'].value_counts().index],
        'Generalized Complaint Subtype': [complaint_subtype_map[t] for t in calls['Complaint Type'].value_counts().index],
    },
).to_csv("../data/complaint_mapping.csv")

In [172]:
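import pickle
# Context managers make sure each file is flushed and closed after writing.
with open("../data/complaint_type_map.p", "wb") as f:
    pickle.dump(complaint_type_map, f)
with open("../data/complaint_subtype_map.p", "wb") as f:
    pickle.dump(complaint_subtype_map, f)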

Let's save the version of the data that we've generated. Since this is a large file, we'll save it in place, meaning that we'll overwrite the existing version of the file. This should be safe, since we haven't modified any columns, just added new ones.
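
Overwriting a large CSV in place leaves a window where an interrupted write could damage the only copy. One cautious pattern, sketched here with a hypothetical `save_in_place` helper and a throwaway temporary file rather than the real 311 extract, is to write to a sibling temporary file and then swap it in with `os.replace`.

```python
import os
import tempfile

import pandas as pd


def save_in_place(df, path, **to_csv_kwargs):
    """Write df to a temporary file next to path, then swap it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    os.close(fd)  # pandas reopens the file by name
    try:
        df.to_csv(tmp_path, **to_csv_kwargs)
        os.replace(tmp_path, path)  # the swap happens only after a full write
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Demonstration on a throwaway file, not the real dataset.
with tempfile.TemporaryDirectory() as workdir:
    demo_path = os.path.join(workdir, "calls.csv")
    demo = pd.DataFrame({"Complaint Type": ["Rodent"]}, index=[101])
    demo.to_csv(demo_path)

    # Add a column, then overwrite the original file.
    demo["Generalized Complaint Type"] = "Sanitation"
    save_in_place(demo, demo_path)

    reloaded = pd.read_csv(demo_path, index_col=0)

print(list(reloaded.columns))
```

For the real file, the call would presumably also need `encoding='latin-1'` to match how the data was read in.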

Now let's aggregate the call counts by generalized type and subtype and save them to a CSV file for the treemap.

In [173]:
pd.set_option('display.max_rows', 500)

In [174]:
calls.groupby(['Generalized Complaint Type', 'Generalized Complaint Subtype'])['Complaint Type'].count()

Generalized Complaint Type  Generalized Complaint Subtype               
Air Quality                 Indoor Air Quality                                4137
                            Smoking                                           1541
                            Unspecified Air Quality Issue                     8349
Animal Issues               Animal Abuse                                      9410
                            Animal Facility Issue                              112
                            Animal in a Park                                  1791
                            Harboring Bees or Wasps                            176
                            Illegal Animal                                     328
                            Unleashed Dog                                      587
                            Unsanitary Animal                                 2119
Building Maintenance        Asbestos                                          1999
              

In [175]:
types = calls.groupby(['Generalized Complaint Type', 'Generalized Complaint Subtype'])['Complaint Type'].count().reset_index()

In [176]:
types.columns = ['Type', 'Subtype', 'Count']

In [177]:
types.to_csv("complaint_types_2.csv", index=False)

In [178]:
types.to_csv("C:/Users/Alex/Desktop/threshold-tree/data/complaint_types_2.csv", index=False)

In [179]:
types.to_csv("C:/Users/Alex/Desktop/d3-data/data/complaint_types_2.csv", index=False)

The issue: I want to build a treemap using this data, but many of these numbers are just too small! Plus, building a treemap using D3 is *hard*.

The other issue: this is not appropriate for the data science project.

The other other issue: this is only interesting with a full year's data.

The solution: wait until a full year of data comes in, then return to this project as the first of my major D3 projects in the new year.
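
For the too-small counts, one option when the project is picked back up might be to fold subtypes below some threshold into a single 'Other' cell per type before handing the data to D3. The sketch below assumes a frame shaped like `types` (columns `Type`, `Subtype`, `Count`); the `fold_small_subtypes` helper and the threshold of 500 are illustrative choices, not part of the analysis above. The counts are copied from the 'Animal Issues' rows of the groupby output.

```python
import pandas as pd


def fold_small_subtypes(types, min_count):
    """Relabel subtypes with Count < min_count as 'Other' and re-aggregate."""
    folded = types.copy()
    small = folded["Count"] < min_count
    folded.loc[small, "Subtype"] = "Other"
    # Summing merges all the relabelled rows of a type into one 'Other' row.
    return folded.groupby(["Type", "Subtype"], as_index=False)["Count"].sum()


# A few rows copied from the 'Animal Issues' output.
toy_types = pd.DataFrame(
    {
        "Type": ["Animal Issues"] * 4,
        "Subtype": [
            "Animal Abuse",
            "Animal Facility Issue",
            "Illegal Animal",
            "Unsanitary Animal",
        ],
        "Count": [9410, 112, 328, 2119],
    }
)

folded = fold_small_subtypes(toy_types, min_count=500)
print(folded)
```

Folding keeps the total count for each type unchanged, so the top-level areas of the treemap would be unaffected.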