This project provides the risk of walking between user-supplied start and end points in Manhattan, New York City.
Arrest data was downloaded from NYC Open Data's NYPD Arrests databases. This data set includes the latitude and longitude of arrests to the 0.000001 precision. Although this is an imperfect proxy for crimes and safety, considering the documented differences in policing, especially in communities of color. Unfortunately, more-precise proxies for disturbances such as 911 calls requiring police responses were not available.
Historic weather data was retrieved from darksky API for New York City over the date range provided by the NYC Arrest databases.
The recommended path is chosen via a call to the Google Directions API.
The arrest data was binned into latitude and longitude coordinates with a precision of 0.001 to reduce the number of lat/long pairs, while maintaining an umcertainty range of plus or minus ~150 ft.
The weather data returned from DarkSkyAPI contains NaN where the value is zero, for example no measurable precipitation. It has no Ozone measurements prior to November 10, 2016. Ozone was removed as a feature.
All latitude, longitude points existing within Manhattan were determined using shape files provided by NYC Open Data. See manhattan_lat_longs.ipynb.
These latitudes and longitudes, along with a list of the days in the arrest dataset allowed these "zero-arrest" rows to be added to the database.
To reduce file size and floating point issues, weather features were scaled so that they could be saved as integers without loosing precision.
The locations were combined into latlong strings, and then OneHotEncoded.
The random forest classifier predicts the likelyhood of one or more arrest occuring at each location given the 'latlong', latitude, longitude, and projected apparent high temperature.
You can check out the flask app online here. Or the repo containing all of the functions needed to take user inputs and display the results on github here.