# Abstract

Data on the time, location, and content of thousands of 311 calls in New York City is recorded every day. By studying trends in this data, government agencies can respond more effectively to the non-emergency requests and issues raised by the populations they serve. Using public data on 311 calls and community districts in NYC, this project explores which types of calls are most common, how daily call volume varies across community districts, and how calls are distributed among the government agencies that respond to them. Using natural language processing and the Keras library, this project also develops a neural network that classifies which government agency responded to a call, given the call's description as input. The best-performing model correctly classified 73% of calls in a test subset of the data. Two features of the data make this task difficult: calls in the dataset were assigned to 14 different government agencies, and the agency variable is heavily imbalanced, with the New York Police Department (NYPD) responding to just over 50% of all 311 calls. Currently, most non-emergency service requests are handled by phone. If developed further, this type of classifier could help automatically assign non-emergency requests to the appropriate agency in online channels where requests are submitted as text descriptions.
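
The imbalance matters when reading the 73% figure: a classifier that always predicts the most common agency scores roughly that agency's share of calls. The sketch below computes this majority-class baseline with pandas. The `agency` column name follows the 311 dataset, but `majority_baseline` is a hypothetical helper and the toy labels are made up for illustration, not drawn from the project's data.

```python
import pandas as pd


def majority_baseline(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    # value_counts(normalize=True) gives each label's share, sorted in descending order
    shares = labels.value_counts(normalize=True)
    return shares.index[0], float(shares.iloc[0])


# Toy labels for illustration only (not the real 311 distribution).
toy = pd.Series(["NYPD", "NYPD", "NYPD", "HPD", "DSNY", "NYPD", "DOT", "HPD"], name="agency")
label, acc = majority_baseline(toy)
print(label, acc)  # NYPD 0.5
```

On the real data, `majority_baseline(df["agency"])` should, per the figures above, return NYPD with a share just over 0.5, which is the floor a useful classifier has to beat.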


# Data Sources

The data used in this project was obtained from two sources: 

- [NYC Open Data's 311 Service Requests from 2010 to Present](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9)
  - This dataset contains information about the time, location, complaint type, and status of more than 24 million service requests made to 311 in New York City over the past decade. This project uses a subset of the 2020 data, accessed with the [Socrata Open Data (SODA) API](https://dev.socrata.com/consumers/getting-started.html).
- [NYC Department of City Planning’s Community District Profiles](https://communityprofiles.planning.nyc.gov/)
  - The Indicators Data can be downloaded from any profile page on the Community District Profiles website under "Download the Data." This dataset contains development and population information for each community district in New York City. Community board names, which correspond to community districts, also appear in the 311 dataset, so the two sources can be linked by district.
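
A 2020 subset like the one described above can also be requested directly instead of being filtered after download. The sketch below builds a SoQL `$where` clause, passed through sodapy's `where` keyword, that restricts a request to one calendar year. The helper name `year_query` is hypothetical, and the author's actual query may have differed.

```python
from datetime import date


def year_query(year, date_field="created_date"):
    """Build sodapy keyword arguments that restrict a request to one calendar year.

    Uses a half-open interval [Jan 1 of `year`, Jan 1 of `year + 1`) so that
    timestamps late on December 31 are still included.
    """
    start = date(year, 1, 1).isoformat() + "T00:00:00"
    end = date(year + 1, 1, 1).isoformat() + "T00:00:00"
    return {
        "where": f"{date_field} >= '{start}' AND {date_field} < '{end}'",
        # A fixed sort order keeps results consistent if they are paged with `offset`.
        "order": f"{date_field} ASC",
    }


params = year_query(2020)
print(params["where"])
# created_date >= '2020-01-01T00:00:00' AND created_date < '2021-01-01T00:00:00'

# Usage with the real API (network access required; not run here):
# from sodapy import Socrata
# client = Socrata("data.cityofnewyork.us", None)
# results = client.get("erm2-nwe9", limit=50000, **params)
```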



# Loading Dependencies

```python
# Install the sodapy client for the Socrata Open Data API
%pip install sodapy
```

```python
from sodapy import Socrata      # client for the SODA API
import pandas as pd             # tabular data handling
from google.colab import drive  # mounting Google Drive (Colab only)
```

# Obtaining and Exporting the 311 Data

The [API Documentation](https://dev.socrata.com/foundry/data.cityofnewyork.us/erm2-nwe9) for this dataset explains how to request filtered versions of the data. Below, we simply request the 1,500,000 most recent 311 calls. The line ``client.timeout = 1000`` raises the request timeout from sodapy's default of 10 seconds to 1,000 seconds so that this large request does not time out.

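```python
client = Socrata("data.cityofnewyork.us", None)  # no app token, so stricter throttling applies

client.timeout = 1000  # seconds; sodapy's default is 10
# SODA does not guarantee row order unless asked, so sort newest first to get the most recent calls.
results = client.get("erm2-nwe9", order="created_date DESC", limit=1500000)
```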
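
Requesting 1,500,000 rows in a single call relies on a long timeout. A common alternative, sketched below under our own assumptions, is to page through the results with sodapy's `limit` and `offset` keywords (SoQL `$limit` and `$offset`) while holding the sort order fixed with `order`, so that consecutive pages neither overlap nor skip rows. `fetch_pages` and `fake_fetch` are hypothetical helpers, and the page size of 50,000 in the commented usage is an arbitrary choice.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, str]


def fetch_pages(fetch: Callable[[int, int], List[Record]],
                page_size: int, max_rows: int) -> Iterator[Record]:
    """Yield up to `max_rows` records, requesting `page_size` rows at a time.

    `fetch(limit, offset)` should return the next slice of an ordered result
    set, e.g. a thin wrapper around sodapy's `client.get`.
    """
    offset = 0
    while offset < max_rows:
        limit = min(page_size, max_rows - offset)
        page = fetch(limit, offset)
        yield from page
        if len(page) < limit:  # a short page means the result set is exhausted
            break
        offset += limit


# Offline stand-in for the API so the pager can be tried without a network.
fake_rows = [{"unique_key": str(i)} for i in range(23)]


def fake_fetch(limit: int, offset: int) -> List[Record]:
    return fake_rows[offset:offset + limit]


rows = list(fetch_pages(fake_fetch, page_size=10, max_rows=100))
print(len(rows))  # 23

# With the real client (network access required; not run here):
# fetch = lambda limit, offset: client.get(
#     "erm2-nwe9", order="created_date DESC", limit=limit, offset=offset)
# results = list(fetch_pages(fetch, page_size=50000, max_rows=1500000))
```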

The results are stored in a pandas DataFrame and exported to Google Drive as a CSV file:

```python
df = pd.DataFrame.from_records(results)  # one row per 311 service request
```
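
The SODA API returns JSON, so scalar fields such as timestamps start out as strings in `df`. The sketch below parses `created_date` with `pd.to_datetime` and counts calls per community board per day, the kind of table needed to compare daily call volume across districts. The column names follow the 311 dataset; the records themselves are made up for illustration.

```python
import pandas as pd

# Made-up records shaped like SODA JSON output: timestamps arrive as strings.
records = [
    {"created_date": "2020-12-01T08:15:00.000", "community_board": "12 MANHATTAN", "agency": "NYPD"},
    {"created_date": "2020-12-01T22:40:00.000", "community_board": "12 MANHATTAN", "agency": "HPD"},
    {"created_date": "2020-12-01T09:05:00.000", "community_board": "01 BROOKLYN", "agency": "NYPD"},
    {"created_date": "2020-12-02T11:30:00.000", "community_board": "12 MANHATTAN", "agency": "DSNY"},
]
calls = pd.DataFrame.from_records(records)

# Parse the timestamp strings so that date-based grouping works.
calls["created_date"] = pd.to_datetime(calls["created_date"])

# Number of calls per community board per calendar day.
daily = (
    calls.groupby(["community_board", calls["created_date"].dt.date])
    .size()
    .rename("calls")
)

# Average daily volume per board. Days with zero calls do not appear in `daily`,
# so this average only covers days on which the board received at least one call.
mean_daily = daily.groupby(level="community_board").mean()
print(mean_daily)
```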

```python
drive.mount("/content/gdrive")  # prompts for Google account authorization
```

```python
df.to_csv('/content/gdrive/My Drive/Colab Notebooks/311.csv', header=True)
```
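
Because `to_csv` writes the row index by default, reading `311.csv` back without `index_col=0` adds an extra `Unnamed: 0` column. The sketch below demonstrates this with an in-memory buffer and two made-up rows in place of the Drive file; passing `index=False` when saving is the other common fix.

```python
import io

import pandas as pd

# Two made-up rows standing in for the 311 DataFrame.
df_demo = pd.DataFrame({
    "agency": ["NYPD", "HPD"],
    "complaint_type": ["Noise - Residential", "HEAT/HOT WATER"],
})

buffer = io.StringIO()  # in-memory stand-in for the CSV file on Drive
df_demo.to_csv(buffer, header=True)  # index=True is the default, so the row index is written too

buffer.seek(0)
naive = pd.read_csv(buffer)
print(list(naive.columns))  # ['Unnamed: 0', 'agency', 'complaint_type']

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # treat the first column as the index again
print(list(restored.columns))  # ['agency', 'complaint_type']
```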

# Google Drive Links

Both datasets can also be downloaded from the publicly shared Google Drive files below.

- [311 Service Requests](https://drive.google.com/file/d/1wzwvb1gEN9o9sIhRPoZAt8ISLIoQrdrT/view?usp=sharing)

- [Community District Indicators](https://drive.google.com/file/d/1CBg0A-Y1IocnqQTtBQ_35kMZNwa5wxR4/view?usp=sharing)
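
Joining the two datasets requires a shared key for each community district. One option, sketched below under stated assumptions, is to convert the 311 `community_board` strings (assumed here to look like `"12 MANHATTAN"`) into the Department of City Planning's BoroCD convention: the borough code (Manhattan 1, Bronx 2, Brooklyn 3, Queens 4, Staten Island 5) times 100, plus the district number. Whether the downloaded Indicators file stores districts this way should be checked before merging; `board_to_borocd` is a hypothetical helper.

```python
import re
from typing import Optional

# Borough codes used in DCP's BoroCD identifiers.
BOROUGH_CODES = {"MANHATTAN": 1, "BRONX": 2, "BROOKLYN": 3, "QUEENS": 4, "STATEN ISLAND": 5}

# A 1-2 digit district number, whitespace, then the borough name.
_BOARD_PATTERN = re.compile(r"^\s*(\d{1,2})\s+(.+?)\s*$")


def board_to_borocd(board) -> Optional[int]:
    """Convert a 311 `community_board` value such as "12 MANHATTAN" to a BoroCD code (112).

    Returns None for missing values and for entries that do not name a numbered
    district in a known borough, e.g. "Unspecified QUEENS" or "0 Unspecified".
    """
    if not isinstance(board, str):  # e.g. NaN for missing values
        return None
    match = _BOARD_PATTERN.match(board)
    if match is None:
        return None
    district, borough = int(match.group(1)), match.group(2).upper()
    if borough not in BOROUGH_CODES or district == 0:
        return None
    return BOROUGH_CODES[borough] * 100 + district


print(board_to_borocd("12 MANHATTAN"))  # 112
```

Applied to the full table, `df["community_board"].map(board_to_borocd)` would produce a key column for `pd.merge`, with unassigned requests left as missing values.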