<a href="https://colab.research.google.com/github/28Protons/BuffStateDSAFall2021Challenge/blob/main/DSA_fall_2021_project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Your First Day On The Job**
  You have just been hired by the city of Buffalo to work on their new high-tech data analytics team. The Mayor is planning to receive real-time messages from citizens through a custom mobile app and wants help getting to the heart of what life is like for each person they interact with.

## The Project Scope
  Given any geocoordinates in the city, display the neighborhood the point is in and tell us a story about that neighborhood. It could be crime stats for that neighborhood, the most common 311 call concerns, demographic breakdowns, a nice chart, a summarized series, or a table.

  Fortunately, a volunteer developer has provided some mostly-working code that includes several data files and helper functions to access them as well as a simple demo. 

  An example function, `show_summary_for_coordinates(coordinates)`, has been included, and it is called by default at the bottom of the script with test coordinates.


## Notes from the Mayor
  You might want to provide one or more talking points about the data. You could say "Theft is high in this area, this citizen might care about public safety policies."

  Your job is to tell any story that you think will help me understand the neighborhood from where the citizen is messaging.

## A Note from the Programmer
   One of the datasets, Public 311 data, is missing a lot of neighborhood names. If you want to use this dataset, you may have to use geocoordinate distances with Turfpy, create a computed column using the `find_neighborhood_by_point()` function, or write a script to fix the data. It's not necessary, but I encourage you to give it a try.
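The computed-column approach mentioned above might look something like the following minimal sketch. The `latitude` and `longitude` column names are assumptions about the 311 file, and `stub_lookup` with its "West"/"East" labels is a hypothetical placeholder so the example runs without the map data; in the notebook you would pass a wrapper around `find_neighborhood_by_point()` instead.

```python
import pandas as pd

def backfill_neighborhood(df, lookup, lat_col="latitude", lon_col="longitude"):
    """Return a copy of df with blank 'neighborhood' values filled in.

    lookup is any callable taking (lon, lat) and returning a neighborhood name.
    lat_col and lon_col are ASSUMED column names; adjust them to the real file.
    """
    out = df.copy()
    out["neighborhood"] = out["neighborhood"].astype("object")
    # A row counts as missing if the value is null or an empty/blank string.
    missing = out["neighborhood"].isna() | (
        out["neighborhood"].astype(str).str.strip() == ""
    )
    if missing.any():
        rows = out.loc[missing]
        out.loc[missing, "neighborhood"] = [
            lookup(lon, lat) for lon, lat in zip(rows[lon_col], rows[lat_col])
        ]
    return out

# Hypothetical stand-in for the real point-in-polygon lookup.
def stub_lookup(lon, lat):
    return "West" if lon < -78.85 else "East"

calls = pd.DataFrame({
    "neighborhood": ["Fillmore-Leroy", None, ""],
    "latitude": [42.93353, 42.90, 42.95],
    "longitude": [-78.83889, -78.88, -78.80],
})
fixed = backfill_neighborhood(calls, stub_lookup)
print(fixed["neighborhood"].tolist())
```

With the notebook's helpers loaded, the lookup could be `lambda lon, lat: find_neighborhood_by_point(neighborhood_map, Feature(geometry=Point([lon, lat])))`. Only rows that are actually missing a name are looked up, which keeps the brute-force polygon search from running on every record.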

## Constraints
  You can use the preconfigured data included in the project or anything else from [Open Data Buffalo](https://data.buffalony.gov/), but keep in mind that, unfortunately, the mobile app will go live tomorrow, so time is of the essence. You shouldn't worry too much about bells and whistles.

  You do not need to worry about any mobile application development or building a mobile interface for your solution. Assume the mobile app is able to display the standard output from a command-line process and any image or markup formats. You can output your solution to standard out or write it to a file. It only needs to process one citizen at a time.

## Environment Setup
  A GitHub repository has been provided with a base set of data. You can use this data to accomplish the task, add your own repository, or connect via any API to Buffalo Open Data.

### Polygons and Turfpy 
A library called Turfpy is being used to build and query a shape map of neighborhoods. You can also use it to find the distance between two coordinates, perform geocoordinate transformations, and carry out other mapping operations. [Turfpy Docs](https://turfpy.readthedocs.io/en/latest/)
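To make the idea of a distance between two coordinates concrete, here is a dependency-free sketch of the haversine great-circle formula. It is a stand-in for illustration, not Turfpy's own code; the function name `haversine_km`, the Earth radius constant, and the second sample point are assumptions introduced here. Points use the same `[longitude, latitude]` order as the rest of the notebook.

```python
from math import asin, cos, radians, sin, sqrt

# Mean Earth radius in kilometers (an assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0088

def haversine_km(a, b):
    """Great-circle distance in km between two [lon, lat] points in degrees."""
    lon1, lat1 = map(radians, a)
    lon2, lat2 = map(radians, b)
    dlon, dlat = lon2 - lon1, lat2 - lat1
    h = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

# The notebook's test coordinates and a second, hypothetical nearby point.
citizen = [-78.83889, 42.93353]
other = [-78.87000, 42.90000]
print(f"{haversine_km(citizen, other):.2f} km")
```

A helper like this can rank 311 records by distance from the citizen when a record has coordinates but no neighborhood name.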

## Data Reference

All of the data is from Open Data Buffalo. Here are links directly to the documentation for each provided dataset. Note that some of our data has been filtered to decrease file size and all of the provided datasets have a 'neighborhood' column. 

[Neighborhood Metrics](https://data.buffalony.gov/Economic-Neighborhood-Development/Neighborhood-Metrics/adai-75jt) 'metrics'

[Crime Incidents](https://data.buffalony.gov/Public-Safety/Crime-Incidents/d6g9-xbgu) 'crime'

[Public 311 Calls](https://data.buffalony.gov/Quality-of-Life/311-Service-Requests/whkc-e5vr) '311'

[Historic Landmarks](https://data.buffalony.gov/Economic-Neighborhood-Development/Historic-Local-Landmarks/c3aq-3eh4) 'landmarks'

You can access a named dataset easily using the provided function: `load_data_for_source(source_tag)`

In [97]:
!pip install turfpy



In [154]:
!git clone https://github.com/28Protons/BuffStateDSAFall2021Challenge.git

Cloning into 'BuffStateDSAFall2021Challenge'...
remote: Enumerating objects: 30, done.
remote: Counting objects: 100% (30/30), done.
remote: Compressing objects: 100% (28/28), done.
remote: Total 30 (delta 9), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (30/30), done.


In [155]:
import pandas as pd
from   turfpy.measurement import boolean_point_in_polygon
from   geojson import Point, MultiPolygon, Feature
from   os import path 

project_meta = {
    "map":     "/content/BuffStateDSAFall2021Challenge/neighborhoods.json", #Docs 
    "crime":   "/content/BuffStateDSAFall2021Challenge/crime_incidents.csv",
    "311":     "/content/BuffStateDSAFall2021Challenge/public311.json",
    "metrics": "/content/BuffStateDSAFall2021Challenge/neighborhood_metrics.csv",
    "landmarks": "/content/BuffStateDSAFall2021Challenge/historic_landmarks.csv"
}

def fill_neighborhood_map():
    """Fill the neighborhood map.
    Build a dict of neighborhood name -> polygon Feature from the 'map' source specified in project_meta
    """
    neighborhood_map = {}
    neighborhood_map_data = load_data_for_source("map")

    for index, row in neighborhood_map_data.iterrows():
        shapeMap = MultiPolygon(row.the_geom['coordinates'])
        polygon  = Feature(geometry=shapeMap)
        neighborhood_map[row.nbhdname] = polygon
    
    return neighborhood_map

def find_neighborhood_by_point(neighborhood_map, point):
    """Given a point Feature, find the neighborhood that point lies in.
    Example point: Feature(geometry=Point([-78.83889, 42.93353]))
    A brute-force solution is provided: test the point against each neighborhood
    shape with turfpy's boolean_point_in_polygon(point, polygon).
    Returns an empty string if no neighborhood contains the point.
    """
    ret_val = ""
    for key, value in neighborhood_map.items():
        if boolean_point_in_polygon(point, value):
            ret_val = key
            break

    return ret_val

def load_data_for_source(source_tag,subkey=None):
    """Load Data for source.
    Trivially read data from a source file. Data type is inferred from the file extension
    Source Tag uses the project_meta dictionary to lookup the file name
    """
    data_file = project_meta[source_tag]
    file_ext  = path.splitext(data_file)[1]
    if file_ext == ".csv":
        data_frame = pd.read_csv(data_file)
    elif file_ext == ".json":
        data_frame = pd.read_json(data_file)
    else:
        raise Exception(f"Couldn't load data file {data_file}. Is there an extension handler for {file_ext}?")

    return data_frame

def filter_data_by_neighborhood(data_frame, neighborhood):
    """Given a data_frame with a 'neighborhood' column and a neighborhood label,
    return the rows for that neighborhood as a dataframe
    """
    return data_frame[data_frame['neighborhood'] == neighborhood]

def show_summary_for_coordinates(coordinates):
    """Given [longitude, latitude] coordinates, print the neighborhood they fall in,
    followed by selected neighborhood metrics and crime incident counts
    """
    # Renamed from `map` to avoid shadowing the Python builtin
    neighborhood_map = fill_neighborhood_map()
    # Use the function argument, not the global `coords` variable
    point  = Feature(geometry=Point(coordinates))

    neighborhood = find_neighborhood_by_point(neighborhood_map, point)
   
    metrics_df   = load_data_for_source("metrics")
    metrics_df_filtered = filter_data_by_neighborhood(metrics_df, neighborhood)

    crime_df = load_data_for_source("crime")
    crime_df_filtered = filter_data_by_neighborhood(crime_df, neighborhood)
    
    print(neighborhood)
    print("-------------")
    print(metrics_df_filtered[['Employment Rate','Median Income','Percent Non-Family Households']])
    print("-------------")
    print(crime_df_filtered['incident_type_primary'].value_counts())


# Example usage
coords = [-78.83889, 42.93353]
show_summary_for_coordinates(coords)

Fillmore-Leroy
-------------
    Employment Rate  Median Income  Percent Non-Family Households
11            82.79          22910                          45.19
-------------
LARCENY/THEFT        96
ASSAULT              74
BURGLARY             27
UUV                  26
ROBBERY              17
MURDER                4
THEFT OF SERVICES     1
Name: incident_type_primary, dtype: int64
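
The Mayor's notes ask for talking points about the data. As one possible sketch, the incident counts printed above could be turned into a sentence like this. The function name `crime_talking_point` and its wording are illustrative assumptions; the counts are copied from the Fillmore-Leroy output.

```python
def crime_talking_point(counts):
    """Build a one-sentence talking point from incident-type counts.

    counts is any mapping-like object with .items(), such as a dict or the
    pandas Series returned by value_counts().
    """
    pairs = list(counts.items())
    if not pairs:
        return "No crime incidents are recorded for this neighborhood."
    total = sum(n for _, n in pairs)
    top_type, top_n = max(pairs, key=lambda kv: kv[1])
    share = top_n / total
    return (f"{top_type} is the most common incident type here "
            f"({share:.0%} of {total} recorded incidents).")

# Counts copied from the example output for Fillmore-Leroy.
fillmore_leroy = {
    "LARCENY/THEFT": 96,
    "ASSAULT": 74,
    "BURGLARY": 27,
    "UUV": 26,
    "ROBBERY": 17,
    "MURDER": 4,
    "THEFT OF SERVICES": 1,
}
print(crime_talking_point(fillmore_leroy))
```

Inside `show_summary_for_coordinates`, the same function could be called on `crime_df_filtered['incident_type_primary'].value_counts()` directly, since a Series also provides `.items()`.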
