### Problem Statement

New Yorkers use the 311 system to report non-emergency problems to local authorities, and each complaint is routed to the city agency responsible for it. The New York City Department of Housing Preservation and Development (HPD) processes the 311 complaints related to housing and buildings.

HPD needs answers to several questions, and each answer must be supported by data and analytics:

- Which complaint type should HPD focus on first?
- For the complaint type identified in the first question, should HPD focus on particular boroughs, ZIP codes, or streets where the complaints are most severe?
- Does the complaint type identified in the first question have an obvious relationship with any particular characteristics of the houses or buildings?
- Can a predictive model be built to estimate the likelihood of future complaints of the type identified in the first question?


### Import Libraries

In [1]:
import pickle
import warnings
from pickle import dump, load

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
# sns.set() resets the style, so set style and font scale in a single call
sns.set(style='dark', font_scale=1.5)

from sklearn.model_selection import cross_val_score, train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.preprocessing import LabelEncoder, StandardScaler, MinMaxScaler, OneHotEncoder
from sklearn.metrics import confusion_matrix, classification_report, mean_absolute_error, mean_squared_error, r2_score
from sklearn.metrics import accuracy_score, auc, f1_score, precision_score, recall_score, roc_auc_score
# plot_confusion_matrix, plot_precision_recall_curve and plot_roc_curve were removed in scikit-learn 1.2;
# the Display classes replace them
from sklearn.metrics import ConfusionMatrixDisplay, PrecisionRecallDisplay, RocCurveDisplay

# feature_engine >= 1.0 renamed missing_data_imputers -> imputation and outlier_removers -> outliers
import feature_engine.imputation as mdi
from feature_engine.outliers import Winsorizer

warnings.filterwarnings('ignore')

pd.options.display.max_columns = None
#pd.options.display.max_rows = None

### Data Exploration

In [2]:
df = pd.read_csv("partone.csv")

In [3]:
df

| | ComplaintType | Zipcode | Street | Borough |
|---|---|---|---|---|
| 0 | HEAT/HOT WATER | 10019.0 | WEST 52 STREET | MANHATTAN |
| 1 | UNSANITARY CONDITION | 11204.0 | 67 STREET | BROOKLYN |
| 2 | HEAT/HOT WATER | 11372.0 | 37 AVENUE | QUEENS |
| 3 | HEAT/HOT WATER | 10458.0 | SOUTHERN BOULEVARD | BRONX |
| 4 | APPLIANCE | 11209.0 | 78 STREET | BROOKLYN |
| ... | ... | ... | ... | ... |
| 5939141 | HEAT/HOT WATER | 10029.0 | EAST 108 STREET | MANHATTAN |
| 5939142 | HEAT/HOT WATER | 10461.0 | BRUCKNER BOULEVARD | BRONX |
| 5939143 | HEAT/HOT WATER | 10034.0 | SHERMAN AVENUE | MANHATTAN |
| 5939144 | HEAT/HOT WATER | 10467.0 | WEST GUN HILL ROAD | BRONX |


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5939146 entries, 0 to 5939145
Data columns (total 4 columns):
 #   Column         Dtype  
---  ------         -----  
 0   ComplaintType  object 
 1   Zipcode        float64
 2   Street         object 
 3   Borough        object 
dtypes: float64(1), object(3)
memory usage: 181.2+ MB


In [5]:
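# Treat ZIP codes as categorical labels rather than numbers (the underlying values are still floats)
df['Zipcode'] = df['Zipcode'].astype('object')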
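Because `astype('object')` leaves the ZIP codes as floats, they can display as `10019.0` and would not join cleanly against ZIP codes stored as text in another dataset. A minimal sketch of one alternative, assuming the column has already been read as floats; the sample frame is hypothetical, not the notebook's data:

```python
import pandas as pd

# Hypothetical sample: ZIP codes read from CSV as floats, with one missing value
df = pd.DataFrame({"Zipcode": [10019.0, 11204.0, None, 10004.0]})

# Cast to pandas' nullable integer type first so 10019.0 becomes 10019 and NaN stays <NA>,
# then to the nullable string dtype; zfill(5) guards against lost leading zeros
# (a no-op for New York City's 1xxxx codes, but harmless)
df["Zipcode"] = df["Zipcode"].astype("Int64").astype("string").str.zfill(5)

print(df["Zipcode"].tolist())
```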

In [6]:
df.describe(include='all')

| | ComplaintType | Zipcode | Street | Borough |
|---|---|---|---|---|
| count | 5939146 | 5939146.0 | 5939146 | 5939146 |
| unique | 29 | 202.0 | 6567 | 6 |
| top | HEAT/HOT WATER | 11226.0 | GRAND CONCOURSE | BROOKLYN |
| freq | 1254458 | 215709.0 | 91983 | 1731202 |


In [7]:
df.shape

(5939146, 4)

In [8]:
df.columns

Index(['ComplaintType', 'Zipcode', 'Street', 'Borough'], dtype='object')

In [9]:
df

| | ComplaintType | Zipcode | Street | Borough |
|---|---|---|---|---|
| 0 | HEAT/HOT WATER | 10019 | WEST 52 STREET | MANHATTAN |
| 1 | UNSANITARY CONDITION | 11204 | 67 STREET | BROOKLYN |
| 2 | HEAT/HOT WATER | 11372 | 37 AVENUE | QUEENS |
| 3 | HEAT/HOT WATER | 10458 | SOUTHERN BOULEVARD | BRONX |
| 4 | APPLIANCE | 11209 | 78 STREET | BROOKLYN |
| ... | ... | ... | ... | ... |
| 5939141 | HEAT/HOT WATER | 10029 | EAST 108 STREET | MANHATTAN |
| 5939142 | HEAT/HOT WATER | 10461 | BRUCKNER BOULEVARD | BRONX |
| 5939143 | HEAT/HOT WATER | 10034 | SHERMAN AVENUE | MANHATTAN |
| 5939144 | HEAT/HOT WATER | 10467 | WEST GUN HILL ROAD | BRONX |


In [10]:
df2 = df.groupby('ComplaintType').count()

In [11]:
df2

| ComplaintType | Zipcode | Street | Borough |
|---|---|---|---|
| AGENCY | 8 | 8 | 8 |
| APPLIANCE | 112677 | 112677 | 112677 |
| Appliance | 4 | 4 | 4 |
| CONSTRUCTION | 5044 | 5044 | 5044 |
| DOOR/WINDOW | 205133 | 205133 | 205133 |
| ELECTRIC | 306447 | 306447 | 306447 |
| ELEVATOR | 6720 | 6720 | 6720 |
| Electric | 1 | 1 | 1 |
| FLOORING/STAIRS | 137313 | 137313 | 137313 |
| GENERAL | 151176 | 151176 | 151176 |


In [12]:
df3 = df.groupby('Zipcode').count()

In [13]:
df3

| Zipcode | ComplaintType | Street | Borough |
|---|---|---|---|
| 10001.0 | 9031 | 9031 | 9031 |
| 10002.0 | 32385 | 32385 | 32385 |
| 10003.0 | 25574 | 25574 | 25574 |
| 10004.0 | 329 | 329 | 329 |
| 10005.0 | 440 | 440 | 440 |
| ... | ... | ... | ... |
| 11692.0 | 12498 | 12498 | 12498 |
| 11693.0 | 4768 | 4768 | 4768 |
| 11694.0 | 10557 | 10557 | 10557 |
| 11697.0 | 269 | 269 | 269 |


In [14]:
df4 = df.groupby('Street').count()
df4

| Street | ComplaintType | Zipcode | Borough |
|---|---|---|---|
| 1 AVENUE | 15308 | 15308 | 15308 |
| 1 COURT | 1 | 1 | 1 |
| 1 PLACE | 76 | 76 | 76 |
| 1 STREET | 261 | 261 | 261 |
| 10 AVENUE | 7629 | 7629 | 7629 |
| ... | ... | ... | ... |
| ZEREGA AVENUE | 1438 | 1438 | 1438 |
| ZION STREET | 1 | 1 | 1 |
| ZOE STREET | 8 | 8 | 8 |
| ZOLLER ROAD | 35 | 35 | 35 |
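`groupby(...).count()` repeats the same number in every column, because it counts the non-null values of each remaining column separately. A sketch of two shorter ways to get one count per group, on a hypothetical four-row sample:

```python
import pandas as pd

# Hypothetical miniature of the complaint table
df = pd.DataFrame({
    "ComplaintType": ["HEAT/HOT WATER", "PLUMBING", "HEAT/HOT WATER", "APPLIANCE"],
    "Zipcode": [10019.0, 11204.0, 11372.0, 11209.0],
    "Street": ["WEST 52 STREET", "67 STREET", "37 AVENUE", "78 STREET"],
    "Borough": ["MANHATTAN", "BROOKLYN", "QUEENS", "BROOKLYN"],
})

# size() returns a single row count per group instead of one column per field
per_street = df.groupby("Street").size()

# value_counts() gives the same kind of counts, sorted from most to least frequent
per_borough = df["Borough"].value_counts()

print(per_borough)
```

`size()` counts rows including those with missing values, while `count()` skips them; at this stage the dataset has no missing values, so the two agree.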


### Data Preprocessing

### Standardize Complaint Type Labels

In [15]:
df['ComplaintType'].value_counts()

HEAT/HOT WATER          1254458
HEATING                  875942
PLUMBING                 709126
GENERAL CONSTRUCTION     498752
UNSANITARY CONDITION     451236
PAINT - PLASTER          359741
PAINT/PLASTER            346166
ELECTRIC                 306447
NONCONST                 259999
DOOR/WINDOW              205133
WATER LEAK               193468
GENERAL                  151176
FLOORING/STAIRS          137313
APPLIANCE                112677
SAFETY                    51495
OUTSIDE BUILDING           7133
ELEVATOR                   6720
Unsanitary Condition       5486
CONSTRUCTION               5044
General                    1157
Safety                      424
STRUCTURAL                   16
Plumbing                     11
AGENCY                        8
VACANT APARTMENT              6
Outside Building              6
Appliance                     4
Mold                          1
Electric                      1
Name: ComplaintType, dtype: int64
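The listing shows two kinds of inconsistency: labels that differ only in letter case (`Plumbing` vs `PLUMBING`) and labels this notebook treats as synonyms (`HEATING` vs `HEAT/HOT WATER`, `PAINT - PLASTER` vs `PAINT/PLASTER`). A sketch for finding the first kind automatically, on a hypothetical subset of the labels; synonyms still need a manual mapping:

```python
import pandas as pd

# Hypothetical subset of the labels listed above
labels = pd.Series(["HEAT/HOT WATER", "Plumbing", "PLUMBING", "General", "GENERAL", "Mold"])

# Group the distinct labels by their upper-cased form; any group holding more than
# one spelling is a case-only duplicate that should be merged
distinct = pd.Series(labels.unique())
variants = distinct.groupby(distinct.str.upper()).agg(list)
case_duplicates = variants[variants.map(len) > 1]

print(case_duplicates)
```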

In [16]:
# Scope the replacement to ComplaintType so no other column can be changed by accident
df['ComplaintType'] = df['ComplaintType'].replace('HEATING', 'HEAT/HOT WATER')

In [17]:
df['ComplaintType'].value_counts()

HEAT/HOT WATER          2130400
PLUMBING                 709126
GENERAL CONSTRUCTION     498752
UNSANITARY CONDITION     451236
PAINT - PLASTER          359741
PAINT/PLASTER            346166
ELECTRIC                 306447
NONCONST                 259999
DOOR/WINDOW              205133
WATER LEAK               193468
GENERAL                  151176
FLOORING/STAIRS          137313
APPLIANCE                112677
SAFETY                    51495
OUTSIDE BUILDING           7133
ELEVATOR                   6720
Unsanitary Condition       5486
CONSTRUCTION               5044
General                    1157
Safety                      424
STRUCTURAL                   16
Plumbing                     11
AGENCY                        8
VACANT APARTMENT              6
Outside Building              6
Appliance                     4
Mold                          1
Electric                      1
Name: ComplaintType, dtype: int64

In [18]:
df['ComplaintType'] = df['ComplaintType'].replace('PAINT - PLASTER', 'PAINT/PLASTER')

In [19]:
df['ComplaintType'].value_counts()

HEAT/HOT WATER          2130400
PLUMBING                 709126
PAINT/PLASTER            705907
GENERAL CONSTRUCTION     498752
UNSANITARY CONDITION     451236
ELECTRIC                 306447
NONCONST                 259999
DOOR/WINDOW              205133
WATER LEAK               193468
GENERAL                  151176
FLOORING/STAIRS          137313
APPLIANCE                112677
SAFETY                    51495
OUTSIDE BUILDING           7133
ELEVATOR                   6720
Unsanitary Condition       5486
CONSTRUCTION               5044
General                    1157
Safety                      424
STRUCTURAL                   16
Plumbing                     11
AGENCY                        8
Outside Building              6
VACANT APARTMENT              6
Appliance                     4
Mold                          1
Electric                      1
Name: ComplaintType, dtype: int64

In [20]:
df['ComplaintType'] = df['ComplaintType'].replace('CONSTRUCTION', 'GENERAL CONSTRUCTION')

In [21]:
df['ComplaintType'].value_counts()

HEAT/HOT WATER          2130400
PLUMBING                 709126
PAINT/PLASTER            705907
GENERAL CONSTRUCTION     503796
UNSANITARY CONDITION     451236
ELECTRIC                 306447
NONCONST                 259999
DOOR/WINDOW              205133
WATER LEAK               193468
GENERAL                  151176
FLOORING/STAIRS          137313
APPLIANCE                112677
SAFETY                    51495
OUTSIDE BUILDING           7133
ELEVATOR                   6720
Unsanitary Condition       5486
General                    1157
Safety                      424
STRUCTURAL                   16
Plumbing                     11
AGENCY                        8
VACANT APARTMENT              6
Outside Building              6
Appliance                     4
Mold                          1
Electric                      1
Name: ComplaintType, dtype: int64

In [22]:
df['ComplaintType'] = df['ComplaintType'].replace('Unsanitary Condition', 'UNSANITARY CONDITION')

In [23]:
df['ComplaintType'].value_counts()

HEAT/HOT WATER          2130400
PLUMBING                 709126
PAINT/PLASTER            705907
GENERAL CONSTRUCTION     503796
UNSANITARY CONDITION     456722
ELECTRIC                 306447
NONCONST                 259999
DOOR/WINDOW              205133
WATER LEAK               193468
GENERAL                  151176
FLOORING/STAIRS          137313
APPLIANCE                112677
SAFETY                    51495
OUTSIDE BUILDING           7133
ELEVATOR                   6720
General                    1157
Safety                      424
STRUCTURAL                   16
Plumbing                     11
AGENCY                        8
VACANT APARTMENT              6
Outside Building              6
Appliance                     4
Mold                          1
Electric                      1
Name: ComplaintType, dtype: int64

In [24]:
df['ComplaintType'] = df['ComplaintType'].replace('Electric', 'ELECTRIC')

In [25]:
df['ComplaintType'].value_counts()

HEAT/HOT WATER          2130400
PLUMBING                 709126
PAINT/PLASTER            705907
GENERAL CONSTRUCTION     503796
UNSANITARY CONDITION     456722
ELECTRIC                 306448
NONCONST                 259999
DOOR/WINDOW              205133
WATER LEAK               193468
GENERAL                  151176
FLOORING/STAIRS          137313
APPLIANCE                112677
SAFETY                    51495
OUTSIDE BUILDING           7133
ELEVATOR                   6720
General                    1157
Safety                      424
STRUCTURAL                   16
Plumbing                     11
AGENCY                        8
VACANT APARTMENT              6
Outside Building              6
Appliance                     4
Mold                          1
Name: ComplaintType, dtype: int64

In [26]:
df['ComplaintType'] = df['ComplaintType'].replace('General', 'GENERAL')

In [27]:
df['ComplaintType'].value_counts()

HEAT/HOT WATER          2130400
PLUMBING                 709126
PAINT/PLASTER            705907
GENERAL CONSTRUCTION     503796
UNSANITARY CONDITION     456722
ELECTRIC                 306448
NONCONST                 259999
DOOR/WINDOW              205133
WATER LEAK               193468
GENERAL                  152333
FLOORING/STAIRS          137313
APPLIANCE                112677
SAFETY                    51495
OUTSIDE BUILDING           7133
ELEVATOR                   6720
Safety                      424
STRUCTURAL                   16
Plumbing                     11
AGENCY                        8
VACANT APARTMENT              6
Outside Building              6
Appliance                     4
Mold                          1
Name: ComplaintType, dtype: int64

In [28]:
df['ComplaintType'] = df['ComplaintType'].replace('Safety', 'SAFETY')

In [29]:
df['ComplaintType'].value_counts()

HEAT/HOT WATER          2130400
PLUMBING                 709126
PAINT/PLASTER            705907
GENERAL CONSTRUCTION     503796
UNSANITARY CONDITION     456722
ELECTRIC                 306448
NONCONST                 259999
DOOR/WINDOW              205133
WATER LEAK               193468
GENERAL                  152333
FLOORING/STAIRS          137313
APPLIANCE                112677
SAFETY                    51919
OUTSIDE BUILDING           7133
ELEVATOR                   6720
STRUCTURAL                   16
Plumbing                     11
AGENCY                        8
VACANT APARTMENT              6
Outside Building              6
Appliance                     4
Mold                          1
Name: ComplaintType, dtype: int64

In [30]:
df['ComplaintType'] = df['ComplaintType'].replace('Plumbing', 'PLUMBING')

In [31]:
df['ComplaintType'].value_counts()

HEAT/HOT WATER          2130400
PLUMBING                 709137
PAINT/PLASTER            705907
GENERAL CONSTRUCTION     503796
UNSANITARY CONDITION     456722
ELECTRIC                 306448
NONCONST                 259999
DOOR/WINDOW              205133
WATER LEAK               193468
GENERAL                  152333
FLOORING/STAIRS          137313
APPLIANCE                112677
SAFETY                    51919
OUTSIDE BUILDING           7133
ELEVATOR                   6720
STRUCTURAL                   16
AGENCY                        8
VACANT APARTMENT              6
Outside Building              6
Appliance                     4
Mold                          1
Name: ComplaintType, dtype: int64

In [32]:
df['ComplaintType'] = df['ComplaintType'].replace('Appliance', 'APPLIANCE')

In [33]:
df['ComplaintType'].value_counts()

HEAT/HOT WATER          2130400
PLUMBING                 709137
PAINT/PLASTER            705907
GENERAL CONSTRUCTION     503796
UNSANITARY CONDITION     456722
ELECTRIC                 306448
NONCONST                 259999
DOOR/WINDOW              205133
WATER LEAK               193468
GENERAL                  152333
FLOORING/STAIRS          137313
APPLIANCE                112681
SAFETY                    51919
OUTSIDE BUILDING           7133
ELEVATOR                   6720
STRUCTURAL                   16
AGENCY                        8
Outside Building              6
VACANT APARTMENT              6
Mold                          1
Name: ComplaintType, dtype: int64

In [34]:
df['ComplaintType'] = df['ComplaintType'].replace('Outside Building', 'OUTSIDE BUILDING')

In [35]:
df['ComplaintType'].value_counts()

HEAT/HOT WATER          2130400
PLUMBING                 709137
PAINT/PLASTER            705907
GENERAL CONSTRUCTION     503796
UNSANITARY CONDITION     456722
ELECTRIC                 306448
NONCONST                 259999
DOOR/WINDOW              205133
WATER LEAK               193468
GENERAL                  152333
FLOORING/STAIRS          137313
APPLIANCE                112681
SAFETY                    51919
OUTSIDE BUILDING           7139
ELEVATOR                   6720
STRUCTURAL                   16
AGENCY                        8
VACANT APARTMENT              6
Mold                          1
Name: ComplaintType, dtype: int64

In [36]:
df['ComplaintType'] = df['ComplaintType'].replace('Mold', 'MOLD')

In [37]:
df['ComplaintType'].value_counts()

HEAT/HOT WATER          2130400
PLUMBING                 709137
PAINT/PLASTER            705907
GENERAL CONSTRUCTION     503796
UNSANITARY CONDITION     456722
ELECTRIC                 306448
NONCONST                 259999
DOOR/WINDOW              205133
WATER LEAK               193468
GENERAL                  152333
FLOORING/STAIRS          137313
APPLIANCE                112681
SAFETY                    51919
OUTSIDE BUILDING           7139
ELEVATOR                   6720
STRUCTURAL                   16
AGENCY                        8
VACANT APARTMENT              6
MOLD                          1
Name: ComplaintType, dtype: int64
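The eleven one-label-at-a-time replacement cells above can be collapsed into a single pass: upper-case every label, then map the remaining synonyms with one dictionary. A minimal sketch on a hypothetical sample; the `SYNONYMS` name is illustrative, and its contents mirror the three non-case merges made above:

```python
import pandas as pd

# Hypothetical miniature of the raw ComplaintType column
df = pd.DataFrame({"ComplaintType": [
    "HEATING", "HEAT/HOT WATER", "Plumbing", "PAINT - PLASTER",
    "CONSTRUCTION", "Unsanitary Condition", "Mold",
]})

# Synonyms that differ by more than letter case, merged as in the cells above
SYNONYMS = {
    "HEATING": "HEAT/HOT WATER",
    "PAINT - PLASTER": "PAINT/PLASTER",
    "CONSTRUCTION": "GENERAL CONSTRUCTION",
}

# Upper-case first so case-only variants collapse, then map the remaining synonyms;
# Series.replace with a dict matches whole values, so "GENERAL CONSTRUCTION" is untouched
df["ComplaintType"] = df["ComplaintType"].str.upper().replace(SYNONYMS)

print(df["ComplaintType"].value_counts())
```

Both steps are idempotent, so re-running the cell leaves the labels unchanged.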

In [38]:
#df.to_csv('parttwo.csv',index=False)

In [39]:
df2 = df.groupby('ComplaintType').count()

In [40]:
df2

| ComplaintType | Zipcode | Street | Borough |
|---|---|---|---|
| AGENCY | 8 | 8 | 8 |
| APPLIANCE | 112681 | 112681 | 112681 |
| DOOR/WINDOW | 205133 | 205133 | 205133 |
| ELECTRIC | 306448 | 306448 | 306448 |
| ELEVATOR | 6720 | 6720 | 6720 |
| FLOORING/STAIRS | 137313 | 137313 | 137313 |
| GENERAL | 152333 | 152333 | 152333 |
| GENERAL CONSTRUCTION | 503796 | 503796 | 503796 |
| HEAT/HOT WATER | 2130400 | 2130400 | 2130400 |
| MOLD | 1 | 1 | 1 |


In [41]:
df5 = df.groupby('Borough').count()

In [42]:
df5

| Borough | ComplaintType | Zipcode | Street |
|---|---|---|---|
| BRONX | 1609837 | 1609837 | 1609837 |
| BROOKLYN | 1731202 | 1731202 | 1731202 |
| MANHATTAN | 1049360 | 1049360 | 1049360 |
| QUEENS | 641741 | 641741 | 641741 |
| STATEN ISLAND | 87187 | 87187 | 87187 |
| Unspecified | 819819 | 819819 | 819819 |


In [43]:
df

| | ComplaintType | Zipcode | Street | Borough |
|---|---|---|---|---|
| 0 | HEAT/HOT WATER | 10019.0 | WEST 52 STREET | MANHATTAN |
| 1 | UNSANITARY CONDITION | 11204.0 | 67 STREET | BROOKLYN |
| 2 | HEAT/HOT WATER | 11372.0 | 37 AVENUE | QUEENS |
| 3 | HEAT/HOT WATER | 10458.0 | SOUTHERN BOULEVARD | BRONX |
| 4 | APPLIANCE | 11209.0 | 78 STREET | BROOKLYN |
| ... | ... | ... | ... | ... |
| 5939141 | HEAT/HOT WATER | 10029.0 | EAST 108 STREET | MANHATTAN |
| 5939142 | HEAT/HOT WATER | 10461.0 | BRUCKNER BOULEVARD | BRONX |
| 5939143 | HEAT/HOT WATER | 10034.0 | SHERMAN AVENUE | MANHATTAN |
| 5939144 | HEAT/HOT WATER | 10467.0 | WEST GUN HILL ROAD | BRONX |


In [44]:
unspecifiedareas = df[df['Borough']=='Unspecified']

In [45]:
unspecifiedareas

| | ComplaintType | Zipcode | Street | Borough |
|---|---|---|---|---|
| 150007 | HEAT/HOT WATER | 10032.0 | WEST 173 STREET | Unspecified |
| 150008 | HEAT/HOT WATER | 11210.0 | NEW YORK AVENUE | Unspecified |
| 150009 | HEAT/HOT WATER | 11210.0 | NEW YORK AVENUE | Unspecified |
| 150010 | HEAT/HOT WATER | 11235.0 | HOMECREST AVENUE | Unspecified |
| 150011 | HEAT/HOT WATER | 10029.0 | EAST 115 STREET | Unspecified |
| ... | ... | ... | ... | ... |
| 5904481 | GENERAL | 10040.0 | HILLSIDE AVENUE | Unspecified |
| 5904487 | GENERAL | 10030.0 | ADAM C POWELL BOULEVARD | Unspecified |
| 5904651 | UNSANITARY CONDITION | 10030.0 | ADAM C POWELL BOULEVARD | Unspecified |
| 5904697 | GENERAL | 10040.0 | HILLSIDE AVENUE | Unspecified |


In [46]:
# Mark 'Unspecified' boroughs as missing; only the Borough column is touched
df['Borough'] = df['Borough'].replace('Unspecified', np.nan)

In [47]:
df['Borough'].value_counts()

BROOKLYN         1731202
BRONX            1609837
MANHATTAN        1049360
QUEENS            641741
STATEN ISLAND      87187
Name: Borough, dtype: int64

In [48]:
df.isnull().sum()

ComplaintType         0
Zipcode               0
Street                0
Borough          819819
dtype: int64

In [49]:
df.dropna(subset=['Borough'], inplace=True)

In [50]:
df.reset_index(drop=True,inplace=True)
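Dropping the `Unspecified` rows removes 819,819 of the 5,939,146 complaints (about 14%). Every row still carries a ZIP code, so one alternative, not the approach this notebook takes, is to infer the missing borough from the borough most often recorded for the same ZIP code. A sketch on a hypothetical sample:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature: rows whose Borough was 'Unspecified' (now NaN)
df = pd.DataFrame({
    "Zipcode": [10032.0, 10032.0, 11210.0, 11210.0, 10040.0],
    "Borough": ["MANHATTAN", np.nan, "BROOKLYN", np.nan, np.nan],
})

# For every ZIP code, take the most frequent known borough among labelled rows
zip_to_borough = (
    df.dropna(subset=["Borough"])
      .groupby("Zipcode")["Borough"]
      .agg(lambda s: s.mode().iloc[0])
)

# Fill only the missing boroughs; ZIPs never seen with a known borough stay NaN
df["Borough"] = df["Borough"].fillna(df["Zipcode"].map(zip_to_borough))

print(df)
```

A ZIP code can cross a borough boundary, so the inferred labels are approximate; checking them against an authoritative ZIP-to-borough table would be sensible before relying on them.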

In [51]:
df

| | ComplaintType | Zipcode | Street | Borough |
|---|---|---|---|---|
| 0 | HEAT/HOT WATER | 10019.0 | WEST 52 STREET | MANHATTAN |
| 1 | UNSANITARY CONDITION | 11204.0 | 67 STREET | BROOKLYN |
| 2 | HEAT/HOT WATER | 11372.0 | 37 AVENUE | QUEENS |
| 3 | HEAT/HOT WATER | 10458.0 | SOUTHERN BOULEVARD | BRONX |
| 4 | APPLIANCE | 11209.0 | 78 STREET | BROOKLYN |
| ... | ... | ... | ... | ... |
| 5119322 | HEAT/HOT WATER | 10029.0 | EAST 108 STREET | MANHATTAN |
| 5119323 | HEAT/HOT WATER | 10461.0 | BRUCKNER BOULEVARD | BRONX |
| 5119324 | HEAT/HOT WATER | 10034.0 | SHERMAN AVENUE | MANHATTAN |
| 5119325 | HEAT/HOT WATER | 10467.0 | WEST GUN HILL ROAD | BRONX |


### Create and save processed dataset

In [52]:
df.to_csv("parttwo.csv", index=False)

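### Focusing on Heat/Hot Water

HEAT/HOT WATER is by far the most frequent complaint type: after cleaning, it accounts for 1,847,755 of the 5,119,327 complaints with a known borough (about 36%). The rest of this section therefore focuses on it.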

In [53]:
df = pd.read_csv("parttwo.csv")

In [54]:
df

| | ComplaintType | Zipcode | Street | Borough |
|---|---|---|---|---|
| 0 | HEAT/HOT WATER | 10019.0 | WEST 52 STREET | MANHATTAN |
| 1 | UNSANITARY CONDITION | 11204.0 | 67 STREET | BROOKLYN |
| 2 | HEAT/HOT WATER | 11372.0 | 37 AVENUE | QUEENS |
| 3 | HEAT/HOT WATER | 10458.0 | SOUTHERN BOULEVARD | BRONX |
| 4 | APPLIANCE | 11209.0 | 78 STREET | BROOKLYN |
| ... | ... | ... | ... | ... |
| 5119322 | HEAT/HOT WATER | 10029.0 | EAST 108 STREET | MANHATTAN |
| 5119323 | HEAT/HOT WATER | 10461.0 | BRUCKNER BOULEVARD | BRONX |
| 5119324 | HEAT/HOT WATER | 10034.0 | SHERMAN AVENUE | MANHATTAN |
| 5119325 | HEAT/HOT WATER | 10467.0 | WEST GUN HILL ROAD | BRONX |


In [55]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5119327 entries, 0 to 5119326
Data columns (total 4 columns):
 #   Column         Dtype  
---  ------         -----  
 0   ComplaintType  object 
 1   Zipcode        float64
 2   Street         object 
 3   Borough        object 
dtypes: float64(1), object(3)
memory usage: 156.2+ MB


In [56]:
df2 = df.groupby('ComplaintType').count()
df2

| ComplaintType | Zipcode | Street | Borough |
|---|---|---|---|
| AGENCY | 8 | 8 | 8 |
| APPLIANCE | 95368 | 95368 | 95368 |
| DOOR/WINDOW | 205133 | 205133 | 205133 |
| ELECTRIC | 260880 | 260880 | 260880 |
| ELEVATOR | 6720 | 6720 | 6720 |
| FLOORING/STAIRS | 137313 | 137313 | 137313 |
| GENERAL | 152317 | 152317 | 152317 |
| GENERAL CONSTRUCTION | 352062 | 352062 | 352062 |
| HEAT/HOT WATER | 1847755 | 1847755 | 1847755 |
| MOLD | 1 | 1 | 1 |


In [57]:
df3 = df.groupby('Zipcode').count()
df3

| Zipcode | ComplaintType | Street | Borough |
|---|---|---|---|
| 10001.0 | 7595 | 7595 | 7595 |
| 10002.0 | 28917 | 28917 | 28917 |
| 10003.0 | 22957 | 22957 | 22957 |
| 10004.0 | 303 | 303 | 303 |
| 10005.0 | 415 | 415 | 415 |
| ... | ... | ... | ... |
| 11692.0 | 10618 | 10618 | 10618 |
| 11693.0 | 4255 | 4255 | 4255 |
| 11694.0 | 9510 | 9510 | 9510 |
| 11697.0 | 229 | 229 | 229 |


In [58]:
df4 = df.groupby('Street').count()
df4

| Street | ComplaintType | Zipcode | Borough |
|---|---|---|---|
| 1 AVENUE | 13748 | 13748 | 13748 |
| 1 COURT | 1 | 1 | 1 |
| 1 PLACE | 71 | 71 | 71 |
| 1 STREET | 237 | 237 | 237 |
| 10 AVENUE | 6909 | 6909 | 6909 |
| ... | ... | ... | ... |
| ZEREGA AVENUE | 1230 | 1230 | 1230 |
| ZION STREET | 1 | 1 | 1 |
| ZOE STREET | 8 | 8 | 8 |
| ZOLLER ROAD | 33 | 33 | 33 |


In [59]:
df5 = df.groupby('Borough').count()
df5

| Borough | ComplaintType | Zipcode | Street |
|---|---|---|---|
| BRONX | 1609837 | 1609837 | 1609837 |
| BROOKLYN | 1731202 | 1731202 | 1731202 |
| MANHATTAN | 1049360 | 1049360 | 1049360 |
| QUEENS | 641741 | 641741 | 641741 |
| STATEN ISLAND | 87187 | 87187 | 87187 |


In [60]:
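# .copy() makes heatissue an independent frame, so the in-place reset_index below does not act on a view of df
heatissue = df[df['ComplaintType']=='HEAT/HOT WATER'].copy()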

In [61]:
heatissue

| | ComplaintType | Zipcode | Street | Borough |
|---|---|---|---|---|
| 0 | HEAT/HOT WATER | 10019.0 | WEST 52 STREET | MANHATTAN |
| 2 | HEAT/HOT WATER | 11372.0 | 37 AVENUE | QUEENS |
| 3 | HEAT/HOT WATER | 10458.0 | SOUTHERN BOULEVARD | BRONX |
| 5 | HEAT/HOT WATER | 10456.0 | MORRIS AVENUE | BRONX |
| 7 | HEAT/HOT WATER | 11372.0 | 81 STREET | QUEENS |
| ... | ... | ... | ... | ... |
| 5119320 | HEAT/HOT WATER | 10029.0 | EAST 108 STREET | MANHATTAN |
| 5119322 | HEAT/HOT WATER | 10029.0 | EAST 108 STREET | MANHATTAN |
| 5119323 | HEAT/HOT WATER | 10461.0 | BRUCKNER BOULEVARD | BRONX |
| 5119324 | HEAT/HOT WATER | 10034.0 | SHERMAN AVENUE | MANHATTAN |


In [62]:
heatissue.reset_index(drop=True,inplace=True)

In [63]:
heatissue

| | ComplaintType | Zipcode | Street | Borough |
|---|---|---|---|---|
| 0 | HEAT/HOT WATER | 10019.0 | WEST 52 STREET | MANHATTAN |
| 1 | HEAT/HOT WATER | 11372.0 | 37 AVENUE | QUEENS |
| 2 | HEAT/HOT WATER | 10458.0 | SOUTHERN BOULEVARD | BRONX |
| 3 | HEAT/HOT WATER | 10456.0 | MORRIS AVENUE | BRONX |
| 4 | HEAT/HOT WATER | 11372.0 | 81 STREET | QUEENS |
| ... | ... | ... | ... | ... |
| 1847750 | HEAT/HOT WATER | 10029.0 | EAST 108 STREET | MANHATTAN |
| 1847751 | HEAT/HOT WATER | 10029.0 | EAST 108 STREET | MANHATTAN |
| 1847752 | HEAT/HOT WATER | 10461.0 | BRUCKNER BOULEVARD | BRONX |
| 1847753 | HEAT/HOT WATER | 10034.0 | SHERMAN AVENUE | MANHATTAN |


In [64]:
zipcodes = heatissue.pivot_table(index=['Zipcode'],aggfunc=len)
zipcodes

| Zipcode | Borough | ComplaintType | Street |
|---|---|---|---|
| 10001.0 | 3327 | 3327 | 3327 |
| 10002.0 | 12879 | 12879 | 12879 |
| 10003.0 | 9763 | 9763 | 9763 |
| 10004.0 | 141 | 141 | 141 |
| 10005.0 | 175 | 175 | 175 |
| ... | ... | ... | ... |
| 11692.0 | 2600 | 2600 | 2600 |
| 11693.0 | 1042 | 1042 | 1042 |
| 11694.0 | 4019 | 4019 | 4019 |
| 11697.0 | 60 | 60 | 60 |


In [65]:
zipcodes['Borough'].max()

59673

In [66]:
zipcodes[zipcodes['Borough']==zipcodes['Borough'].max()]

| Zipcode | Borough | ComplaintType | Street |
|---|---|---|---|
| 11226.0 | 59673 | 59673 | 59673 |


## ZIP code 11226 has the most HEAT/HOT WATER complaints (59,673)
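Filtering on the maximum count works, but pandas can return the top label directly, and a ranked shortlist of ZIP codes is arguably more useful for the second question than a single winner. A sketch on a hypothetical sample, using `value_counts` in place of `pivot_table(..., aggfunc=len)`:

```python
import pandas as pd

# Hypothetical miniature of the heatissue frame
heatissue = pd.DataFrame({
    "Zipcode": [11226.0, 11226.0, 10458.0, 11226.0, 10458.0, 10019.0],
    "Borough": ["BROOKLYN", "BROOKLYN", "BRONX", "BROOKLYN", "BRONX", "MANHATTAN"],
})

# One count per ZIP code, sorted in descending order, without the repeated columns
zip_counts = heatissue["Zipcode"].value_counts()

top_zip = zip_counts.idxmax()      # label with the largest count, no hard-coded number
top_five = zip_counts.nlargest(5)  # ranked shortlist (fewer rows if fewer ZIPs exist)

print(top_zip, zip_counts[top_zip])
print(top_five)
```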

In [67]:
borough = heatissue.pivot_table(index=['Borough'],aggfunc=len)
borough

| Borough | ComplaintType | Street | Zipcode |
|---|---|---|---|
| BRONX | 600147 | 600147 | 600147.0 |
| BROOKLYN | 569310 | 569310 | 569310.0 |
| MANHATTAN | 418432 | 418432 | 418432.0 |
| QUEENS | 241660 | 241660 | 241660.0 |
| STATEN ISLAND | 18206 | 18206 | 18206.0 |


In [68]:
borough['ComplaintType'].max()

600147

In [69]:
borough[borough['ComplaintType']==borough['ComplaintType'].max()]

| Borough | ComplaintType | Street | Zipcode |
|---|---|---|---|
| BRONX | 600147 | 600147 | 600147.0 |


## The Bronx has the most HEAT/HOT WATER complaints (600,147)
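These are raw counts, so they partly reflect how many complaints each borough files overall. Using the totals printed earlier, HEAT/HOT WATER makes up roughly 37% of the Bronx's complaints (600,147 of 1,609,837) but roughly 40% of Manhattan's (418,432 of 1,049,360). A sketch of computing that share for every borough with a row-normalised crosstab, on a hypothetical sample:

```python
import pandas as pd

# Hypothetical miniature of df (all complaint types, known boroughs only)
df = pd.DataFrame({
    "ComplaintType": ["HEAT/HOT WATER", "PLUMBING", "HEAT/HOT WATER", "HEAT/HOT WATER",
                      "PAINT/PLASTER", "HEAT/HOT WATER"],
    "Borough": ["BRONX", "BRONX", "BRONX", "MANHATTAN", "BROOKLYN", "BROOKLYN"],
})

# Row-normalised crosstab: within each borough, the fraction of complaints of each type
share = pd.crosstab(df["Borough"], df["ComplaintType"], normalize="index")

# Share of each borough's complaints that concern heat/hot water, highest first
heat_share = share["HEAT/HOT WATER"].sort_values(ascending=False)
print(heat_share)
```

Raw counts answer "where is the most work?", while shares answer "where does this problem dominate?"; both are reasonable inputs when deciding where to focus.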

In [70]:
streets = heatissue.pivot_table(index=['Street'],aggfunc=len)
streets

| Street | Borough | ComplaintType | Zipcode |
|---|---|---|---|
| 1 AVENUE | 5111 | 5111 | 5111.0 |
| 1 COURT | 1 | 1 | 1.0 |
| 1 PLACE | 22 | 22 | 22.0 |
| 1 STREET | 109 | 109 | 109.0 |
| 10 AVENUE | 2425 | 2425 | 2425.0 |
| ... | ... | ... | ... |
| ZEREGA AVENUE | 605 | 605 | 605.0 |
| ZION STREET | 1 | 1 | 1.0 |
| ZOE STREET | 6 | 6 | 6.0 |
| ZOLLER ROAD | 12 | 12 | 12.0 |


In [71]:
streets['ComplaintType'].max()

33010

In [72]:
streets[streets['ComplaintType']==streets['ComplaintType'].max()]

| Street | Borough | ComplaintType | Zipcode |
|---|---|---|---|
| GRAND CONCOURSE | 33010 | 33010 | 33010.0 |


## GRAND CONCOURSE has the most HEAT/HOT WATER complaints (33,010)
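Street names are not unique across the city (there is a BROADWAY in Manhattan, Brooklyn and Queens, for example), so counting by `Street` alone can pool unrelated streets. GRAND CONCOURSE runs only through the Bronx, so its ranking is unlikely to change, but a sketch of grouping on borough and street together, on hypothetical rows, shows how pooling can alter the ranking:

```python
import pandas as pd

# Hypothetical rows: the same street name appears in more than one borough
heatissue = pd.DataFrame({
    "Street": ["BROADWAY", "BROADWAY", "BROADWAY", "GRAND CONCOURSE", "GRAND CONCOURSE"],
    "Borough": ["MANHATTAN", "BROOKLYN", "QUEENS", "BRONX", "BRONX"],
})

# Counting by street name alone merges unrelated streets that share a name;
# grouping on (Borough, Street) keeps them apart
by_name = heatissue["Street"].value_counts()
by_borough_street = heatissue.groupby(["Borough", "Street"]).size().sort_values(ascending=False)

print(by_name.idxmax())            # BROADWAY wins when names are pooled
print(by_borough_street.idxmax())  # ('BRONX', 'GRAND CONCOURSE') once boroughs are separated
```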

In [73]:
heatissue

| | ComplaintType | Zipcode | Street | Borough |
|---|---|---|---|---|
| 0 | HEAT/HOT WATER | 10019.0 | WEST 52 STREET | MANHATTAN |
| 1 | HEAT/HOT WATER | 11372.0 | 37 AVENUE | QUEENS |
| 2 | HEAT/HOT WATER | 10458.0 | SOUTHERN BOULEVARD | BRONX |
| 3 | HEAT/HOT WATER | 10456.0 | MORRIS AVENUE | BRONX |
| 4 | HEAT/HOT WATER | 11372.0 | 81 STREET | QUEENS |
| ... | ... | ... | ... | ... |
| 1847750 | HEAT/HOT WATER | 10029.0 | EAST 108 STREET | MANHATTAN |
| 1847751 | HEAT/HOT WATER | 10029.0 | EAST 108 STREET | MANHATTAN |
| 1847752 | HEAT/HOT WATER | 10461.0 | BRUCKNER BOULEVARD | BRONX |
| 1847753 | HEAT/HOT WATER | 10034.0 | SHERMAN AVENUE | MANHATTAN |


In [74]:
# Save the HEAT/HOT WATER subset as CSV
heatissue.to_csv("heat.csv", index=False)