Skip to content

allisoncstafford/walk_risk_engine

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Walk Risk Engine

This project provides the risk of walking between user-supplied start and end points in Manhattan, New York City.

Data sources

Arrest data was downloaded from NYC Open Data's NYPD Arrests databases. This data set includes the latitude and longitude of arrests to the 0.000001 precision. Although this is an imperfect proxy for crimes and safety, considering the documented differences in policing, especially in communities of color. Unfortunately, more-precise proxies for disturbances such as 911 calls requiring police responses were not available.

Historic weather data was retrieved from darksky API for New York City over the date range provided by the NYC Arrest databases.

The recommended path is chosen via a call to the Google Directions API.

How it works

Data cleaning

The arrest data was binned into latitude and longitude coordinates with a precision of 0.001 to reduce the number of lat/long pairs, while maintaining an umcertainty range of plus or minus ~150 ft.

The weather data returned from DarkSkyAPI contains NaN where the value is zero, for example no measurable precipitation. It has no Ozone measurements prior to November 10, 2016. Ozone was removed as a feature.

All latitude, longitude points existing within Manhattan were determined using shape files provided by NYC Open Data. See manhattan_lat_longs.ipynb.

These latitudes and longitudes, along with a list of the days in the arrest dataset allowed these "zero-arrest" rows to be added to the database.

To reduce file size and floating point issues, weather features were scaled so that they could be saved as integers without loosing precision.

Preprocessing

The locations were combined into latlong strings, and then OneHotEncoded.

Model

The random forest classifier predicts the likelyhood of one or more arrest occuring at each location given the 'latlong', latitude, longitude, and projected apparent high temperature.

See it work

You can check out the flask app online here. Or the repo containing all of the functions needed to take user inputs and display the results on github here.

Attribution

Powered by Dark Sky

NYPD Arrests Data (Historic)

NYPD Arrests Data (Year to Date)

powered by Google

About

This project provides the risk of walking between user-supplied start and end points in New York City.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published