# Problem Statement

New Yorkers use the 311 system to report non-emergency problems to local authorities. These complaints are assigned to various agencies in New York City. The Department of Housing Preservation and Development (HPD) of New York City is the agency that processes 311 complaints related to housing and buildings.

In the last few years, the number of 311 complaints coming to the Department of Housing Preservation and Development has increased significantly. Although these complaints are not necessarily urgent, their large volume and sudden increase are reducing the overall efficiency of the agency's operations.

Therefore, the Department of Housing Preservation and Development has approached your organization to help them manage the large volume of 311 complaints they are receiving every year.

The agency needs answers to several questions, and each answer must be supported by data and analytics.

## These are their questions:

1. Which type of complaint should the Department of Housing Preservation and Development of New York City focus on first?
2. Should the Department of Housing Preservation and Development of New York City focus on any particular set of boroughs, ZIP codes, or streets (where the complaints are severe) for the specific type of complaint identified in response to Question 1?
3. Does the complaint type identified in response to Question 1 have an obvious relationship with any particular characteristic or characteristics of the houses or buildings?
4. Can a predictive model be built to predict the possibility of future complaints of the type identified in response to Question 1?

Your organization has assigned you as the lead data scientist to provide the answers to these questions. You need to work on getting answers to them in this Capstone Project by following the standard approach of data science and machine learning.

In [2]:
import pandas as pd
import numpy as np
import os
#from sodapy import Socrata

In [3]:
!pip install sodapy



In [5]:
from sodapy import Socrata

sodapy is a Python client for the Socrata Open Data API (SODA). Let's set up the API parameters.

In [6]:
SODA_API ='https://data.cityofnewyork.us/resource/erm2-nwe9.json'
socrata_domain = 'data.cityofnewyork.us'
socrata_dataset_identifier = 'erm2-nwe9'


#socrate_token = os.environ.get()

Create a Socrata client object to handle the API requests.

In [7]:
client = Socrata(socrata_domain, 'qJQUD7YIiS4kgvLS1km08uZxh')
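Hard-coding an app token in a notebook makes it easy to leak when the notebook is shared. Below is a minimal sketch of reading it from the environment instead, in the spirit of the commented-out `os.environ.get()` line above. The variable name `SOCRATA_APP_TOKEN` and the helper `load_app_token` are hypothetical and not part of sodapy.

```python
import os

def load_app_token(env_var="SOCRATA_APP_TOKEN"):
    """Return the app token stored in the given environment variable.

    Returns None if the variable is unset or blank. The variable name is an
    assumption for illustration; use whatever name you actually export.
    """
    token = os.environ.get(env_var, "").strip()
    return token or None

# Usage (not executed here): pass the token to the client instead of a literal.
# client = Socrata(socrata_domain, load_app_token())
```

If the helper returns `None`, the client can still be created, but requests without an app token are throttled more strictly.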

In [8]:
print('Domain: {domain:}\nSession: {session:}\nURI Prefix: {uri_prefix:}'.format(**client.__dict__))

Domain: data.cityofnewyork.us
Session: <requests.sessions.Session object at 0x7f8e2a8ae630>
URI Prefix: https://


Let's find out which columns are in the dataset.

In [9]:
meta = client.get_metadata(socrata_dataset_identifier)
[x['name'] for x in meta['columns']]

['Unique Key',
 'Created Date',
 'Closed Date',
 'Agency',
 'Agency Name',
 'Complaint Type',
 'Descriptor',
 'Location Type',
 'Incident Zip',
 'Incident Address',
 'Street Name',
 'Cross Street 1',
 'Cross Street 2',
 'Intersection Street 1',
 'Intersection Street 2',
 'Address Type',
 'City',
 'Landmark',
 'Facility Type',
 'Status',
 'Due Date',
 'Resolution Description',
 'Resolution Action Updated Date',
 'Community Board',
 'BBL',
 'Borough',
 'X Coordinate (State Plane)',
 'Y Coordinate (State Plane)',
 'Open Data Channel Type',
 'Park Facility Name',
 'Park Borough',
 'Vehicle Type',
 'Taxi Company Borough',
 'Taxi Pick Up Location',
 'Bridge Highway Name',
 'Bridge Highway Direction',
 'Road Ramp',
 'Bridge Highway Segment',
 'Latitude',
 'Longitude',
 'Location',
 'Zip Codes',
 'Community Districts',
 'Borough Boundaries',
 'City Council Districts',
 'Police Precincts']
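The names listed above are display names, while SoQL queries use API field names such as `created_date`. Below is a small sketch of building a lookup between the two. It assumes each column entry in the metadata carries a `fieldName` key next to `name`, as Socrata dataset metadata normally does. `field_name_map` is a hypothetical helper, and `sample_meta` is a hand-written stand-in for `meta`.

```python
def field_name_map(metadata):
    """Map each column's display name to its API field name.

    Columns that lack a 'fieldName' key are skipped rather than guessed.
    """
    return {
        col["name"]: col["fieldName"]
        for col in metadata.get("columns", [])
        if "name" in col and "fieldName" in col
    }

# Hand-written stand-in for the dict returned by client.get_metadata(...)
sample_meta = {
    "columns": [
        {"name": "Unique Key", "fieldName": "unique_key"},
        {"name": "Created Date", "fieldName": "created_date"},
        {"name": "Landmark"},  # no fieldName: skipped
    ]
}

names = field_name_map(sample_meta)
print(names["Created Date"])  # created_date
```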

We don't need all of these columns. We will select only the complaints handled by the Department of Housing Preservation and Development, and only the cases that haven't been closed yet.
We are going to write a query and send it with the API request.

In [10]:
# SoQL query: keep only the columns we need, restricted to HPD complaints that are still open
query = """select unique_key, created_date, agency, agency_name, complaint_type, descriptor,
location_type, incident_zip, street_name, landmark, status, due_date, borough, latitude, longitude
where agency = 'HPD' and closed_date is null"""
result = client.get(socrata_dataset_identifier, query=query)
result[0:3]

[{'unique_key': '45921136',
  'created_date': '2020-03-30T08:51:47.000',
  'agency': 'HPD',
  'agency_name': 'Department of Housing Preservation and Development',
  'complaint_type': 'GENERAL',
  'descriptor': 'CABINET',
  'location_type': 'RESIDENTIAL BUILDING',
  'incident_zip': '10029',
  'street_name': 'EAST  109 STREET',
  'status': 'Open',
  'borough': 'MANHATTAN',
  'latitude': '40.792336317680174',
  'longitude': '-73.9405012909439'},
 {'unique_key': '45921162',
  'created_date': '2020-03-30T08:36:25.000',
  'agency': 'HPD',
  'agency_name': 'Department of Housing Preservation and Development',
  'complaint_type': 'HEAT/HOT WATER',
  'descriptor': 'APARTMENT ONLY',
  'location_type': 'RESIDENTIAL BUILDING',
  'incident_zip': '11208',
  'street_name': 'CLEVELAND STREET',
  'status': 'Open',
  'borough': 'BROOKLYN',
  'latitude': '40.668888026964495',
  'longitude': '-73.88301437191929'},
 {'unique_key': '45921151',
  'created_date': '2020-03-30T17:04:38.000',
  'agency': 'HPD',


In [14]:
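# Assumed completion (the original statement was left unfinished):
# load the list of record dicts returned by the query into a DataFrame
df_complaint = pd.DataFrame.from_records(result)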
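As the sample output of the query shows, the API returns every value as a string, including timestamps and coordinates. Below is a hedged sketch of converting one record to typed values before analysis, using only the standard library. `parse_record` is a hypothetical helper, and the timestamp format is inferred from the sample records above.

```python
from datetime import datetime

def parse_record(record):
    """Return a copy of a 311 record with timestamp and coordinates converted.

    Missing fields are left out rather than filled with defaults.
    """
    parsed = dict(record)
    if "created_date" in parsed:
        parsed["created_date"] = datetime.strptime(
            parsed["created_date"], "%Y-%m-%dT%H:%M:%S.%f"
        )
    for key in ("latitude", "longitude"):
        if key in parsed:
            parsed[key] = float(parsed[key])
    return parsed

sample = {
    "unique_key": "45921136",
    "created_date": "2020-03-30T08:51:47.000",
    "latitude": "40.792336317680174",
    "longitude": "-73.9405012909439",
}
typed = parse_record(sample)
print(typed["created_date"].hour, typed["latitude"] > 40)  # 8 True
```

With the records in a DataFrame, the same conversions can be applied column-wise with `pd.to_datetime` and `pd.to_numeric`.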

In [15]:
type(meta)

dict
