## Assignment 4
### Vanita Thompson

Module 4
In this module we’ll be looking at data from the New York City tree census:
https://data.cityofnewyork.us/Environment/2015-Street-Tree-Census-Tree-Data/uvpi-gqnh
This data is collected by volunteers across the city, and is meant to catalog information
about every single tree in the city.
Build a Dash app for an arborist studying the health of various tree species (as defined by the
variable ‘spc_common’) across each borough (defined by the variable ‘borough’). This
arborist would like to answer the following two questions for each species and in each
borough:
1. What proportion of trees are in good, fair, or poor health according to the ‘health’
variable?
2. Are stewards (steward activity measured by the ‘steward’ variable) having an impact
on the health of trees?
Please see the accompanying notebook for an introduction and some notes on the Socrata
API.
Deployment: Dash deployment is more complicated than deploying Shiny apps, so
deployment in this case is optional (and will result in extra credit). You can read instructions
on deploying a Dash app to Heroku here: https://dash.plot.ly/deployment

In [1]:
import pandas as pd
import numpy as np

## Data Import and Analysis

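All of the column names used are from the data dictionary. Note that the API exposes the borough column as 'boroname', which is the name used in the queries below.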

In [2]:
# Importing data from the NYC tree census: extracting only the columns necessary for question one (as aggregated counts), filtering out 'NaN' and 'nan' values, and reading the result into a pandas data frame.

url1 = ('https://data.cityofnewyork.us/resource/nwxe-4ae8.json?$limit=700000&' +\
    '$offset=0&$select=count(tree_id),boroname,spc_common,health,steward&' +\
    '$where=health!=%27NaN%27%20and%20spc_common!=%27nan%27' +\
    '&$group=boroname,health,steward,spc_common').replace(' ', '%20')
trees_all = pd.read_json(url1)
# Importing data from the NYC tree census: extracting only the columns necessary for question two (one row per tree), filtering out 'NaN' and 'nan' values, and reading the result into a pandas data frame.
url2 = ('https://data.cityofnewyork.us/resource/nwxe-4ae8.json?$limit=700000&' +\
    '$offset=0&$select=tree_id,boroname,spc_common,health,steward&' +\
    '$where=health!=%27NaN%27%20and%20spc_common!=%27nan%27').replace(' ', '%20')
trees_2d = pd.read_json(url2)
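
The query URLs above are assembled by hand, with quotes and spaces percent-encoded manually. As a rough sketch only, the same kind of URL could be built from readable SoQL clauses with the standard library. The helper name `build_soql_url` and its parameters are hypothetical and not part of the original notebook; only the endpoint and the `$select`, `$where`, `$group`, `$limit` and `$offset` parameters come from the code above.

```python
from urllib.parse import urlencode, quote

# Endpoint taken from the notebook's own queries
BASE_URL = 'https://data.cityofnewyork.us/resource/nwxe-4ae8.json'

def build_soql_url(select, where=None, group=None, limit=700000, offset=0):
    """Assemble a Socrata query URL from SoQL clauses (hypothetical helper)."""
    params = {'$limit': limit, '$offset': offset, '$select': select}
    if where:
        params['$where'] = where
    if group:
        params['$group'] = group
    # quote_via=quote encodes spaces as %20; '$', ',', '(' and ')' are left
    # unescaped so the URL stays readable
    return BASE_URL + '?' + urlencode(params, quote_via=quote, safe='$,()')

# Example mirroring the aggregated query used for question one
example_url = build_soql_url(
    select='count(tree_id),boroname,spc_common,health,steward',
    where="health!='NaN' and spc_common!='nan'",
    group='boroname,health,steward,spc_common',
)
print(example_url)
```

The resulting string can be passed to `pd.read_json` in the same way as `url1` and `url2`.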

In [3]:
# Recoding the 'steward' value 'None' as '0' so the steward categories sort sensibly in the visualizations
def label_steward(row):
    if row['steward'] == 'None':
        return '0'
    return row['steward']

trees_all['nbr_stewards'] = trees_all.apply(label_steward, axis=1)
trees_2d['nbr_stewards'] = trees_2d.apply(label_steward, axis=1)
# Renaming by name rather than by position, so the result does not depend on the column order returned by the API
trees_2d = trees_2d.rename(columns={'boroname': 'Borough', 'health': 'Health',
                                    'spc_common': 'Species', 'nbr_stewards': 'Nbr Stewards'})
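
Question two asks whether steward activity is associated with tree health. A 2D histogram shows raw counts, which are dominated by the largest steward category, so a row-normalized cross-tabulation may be easier to read. The sketch below uses a small made-up table; the category labels and counts are illustrative only and are not results from the census.

```python
import pandas as pd

# Made-up per-tree records standing in for trees_2d (illustrative values only)
toy_trees = pd.DataFrame({
    'Nbr Stewards': ['0', '0', '0', '0', '1or2', '1or2', '3or4', '3or4'],
    'Health': ['Good', 'Fair', 'Poor', 'Good', 'Good', 'Good', 'Good', 'Fair'],
})

# normalize='index' turns each row into proportions that sum to 1, so
# steward levels with very different tree counts can be compared directly
steward_health = pd.crosstab(
    toy_trees['Nbr Stewards'], toy_trees['Health'], normalize='index'
)
print(steward_health)
```

Applied to the real data, each row would give the share of Good, Fair and Poor trees at one steward level.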

In [4]:
# Using the trees_all dataframe to build a dataframe for question 1 based on borough
# Tree counts per borough and health status
trees_borough1_df = trees_all.groupby(['boroname','health'], as_index=False)['count_tree_id'].sum()
trees_borough1_df = trees_borough1_df.rename(columns={'count_tree_id': 'Tree Counts'})
# Total tree counts per borough
trees_borough2_df = trees_all.groupby('boroname', as_index=False)['count_tree_id'].sum()
trees_borough2_df = trees_borough2_df.rename(columns={'count_tree_id': 'Total Tree Counts'})
# Attaching the borough totals to each (borough, health) row and computing the proportion
trees_borough_health = pd.merge(trees_borough1_df, trees_borough2_df, how='left', on='boroname')
trees_borough_health = trees_borough_health.rename(columns={'boroname': 'Borough', 'health': 'Health'})
trees_borough_health['Proportion'] = trees_borough_health['Tree Counts'] / trees_borough_health['Total Tree Counts']

In [5]:
# Using the trees_all dataframe to build a dataframe for question 1 based on species
# Tree counts per species and health status
trees_species1_df = trees_all.groupby(['spc_common','health'], as_index=False)['count_tree_id'].sum()
trees_species1_df = trees_species1_df.rename(columns={'count_tree_id': 'Tree Counts'})
# Total tree counts per species
trees_species2_df = trees_all.groupby('spc_common', as_index=False)['count_tree_id'].sum()
trees_species2_df = trees_species2_df.rename(columns={'count_tree_id': 'Total Tree Counts'})
# Attaching the species totals to each (species, health) row and computing the proportion
trees_species_health = pd.merge(trees_species1_df, trees_species2_df, how='left', on='spc_common')
trees_species_health = trees_species_health.rename(columns={'spc_common': 'Species', 'health': 'Health'})
trees_species_health['Proportion'] = trees_species_health['Tree Counts'] / trees_species_health['Total Tree Counts']
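
The borough and species cells follow the same pattern: sum the counts per group and health status, then divide by the group total. As a minimal sketch, the pattern could be folded into one function using `groupby(...).transform('sum')` instead of a merge. The function name `health_proportions` is hypothetical, and the toy numbers below are made up for illustration.

```python
import pandas as pd

# Toy counts standing in for the aggregated API result (made-up numbers)
toy = pd.DataFrame({
    'boroname': ['Bronx', 'Bronx', 'Bronx', 'Queens', 'Queens'],
    'health': ['Good', 'Fair', 'Poor', 'Good', 'Fair'],
    'count_tree_id': [60, 30, 10, 80, 20],
})

def health_proportions(df, group_col):
    """Share of trees in each health status within each group (hypothetical helper)."""
    counts = df.groupby([group_col, 'health'], as_index=False)['count_tree_id'].sum()
    # transform('sum') broadcasts each group's total back onto its rows,
    # which avoids building and merging a separate totals table
    totals = counts.groupby(group_col)['count_tree_id'].transform('sum')
    counts['Proportion'] = counts['count_tree_id'] / totals
    return counts

result = health_proportions(toy, 'boroname')
print(result)
```

Calling the same function with `'spc_common'` as `group_col` would give the per-species table.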

## Data Visualization

In [None]:
import dash
import dash_core_components as dcc
import dash_html_components as html

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

# setting up datasets 
df_boro = trees_borough_health
df_spec = trees_species_health

# setting up drop-downs 
borough_drop_down = df_boro['Borough'].unique()
species_drop_down = df_spec['Species'].unique()

app = dash.Dash(__name__, external_stylesheets=external_stylesheets)

app.layout = html.Div([
    html.H1('Proportions of Health Status by Borough'),
    html.Div('''
        Borough
    '''),
    dcc.Dropdown(
        id='borough-dropdown',
        options=[{'label': i, 'value': i} for i in borough_drop_down],
        multi=True, 
        value='Bronx'
    ),
    dcc.Graph(
        id='borough-graph'    
    ),
    html.H1('Proportions of Health Status by Species'),
    html.Div('''
        Species
    '''),
    dcc.Dropdown(
        id='species-dropdown',
        options=[{'label': i, 'value': i} for i in species_drop_down], 
        multi=True, 
        value='American beech'
    ),
    dcc.Graph(
        id='species-graph'    
    ), 
    html.H1('Borough Tree Health vs Stewards 2d Histogram'),
    dcc.Graph(
        id='borough-steward-graph'
    ), 
    html.H1('Species Tree Health vs Stewards 2d Histogram'),
    dcc.Graph(
        id='species-steward-graph'
    )
    ])

# health by borough
@app.callback(
    dash.dependencies.Output('borough-graph', 'figure'),
    [dash.dependencies.Input('borough-dropdown', 'value')])
def update_boro_output(borough_dropdown_value):
    # With multi=True the dropdown value may be a single string (the initial value)
    # or a list of selections, so both cases are normalized to a list
    if isinstance(borough_dropdown_value, str):
        selected = [borough_dropdown_value]
    else:
        selected = borough_dropdown_value or []
    dff = df_boro[df_boro['Borough'].isin(selected)]
    figure = {
            'data': [
                {'x': dff.Borough[dff['Health'] == 'Good'], 'y': dff.Proportion[dff['Health'] == 'Good'], 'type': 'bar', 'name': 'Good'},
                {'x': dff.Borough[dff['Health'] == 'Fair'], 'y': dff.Proportion[dff['Health'] == 'Fair'], 'type': 'bar', 'name': 'Fair'},
                {'x': dff.Borough[dff['Health'] == 'Poor'], 'y': dff.Proportion[dff['Health'] == 'Poor'], 'type': 'bar', 'name': 'Poor'}
            ],
            'layout': {
                'title': 'Proportions of Health Status by Borough'
                    }
            }
    return figure

# health by species
@app.callback(
    dash.dependencies.Output('species-graph', 'figure'),
    [dash.dependencies.Input('species-dropdown', 'value')])
def update_spec_output(species_dropdown_value):
    # With multi=True the dropdown value may be a single string (the initial value)
    # or a list of selections, so both cases are normalized to a list
    if isinstance(species_dropdown_value, str):
        selected = [species_dropdown_value]
    else:
        selected = species_dropdown_value or []
    dff = df_spec[df_spec['Species'].isin(selected)]
    figure = {
            'data': [
                {'x': dff.Species[dff['Health'] == 'Good'], 'y': dff.Proportion[dff['Health'] == 'Good'], 'type': 'bar', 'name': 'Good'},
                {'x': dff.Species[dff['Health'] == 'Fair'], 'y': dff.Proportion[dff['Health'] == 'Fair'], 'type': 'bar', 'name': 'Fair'},
                {'x': dff.Species[dff['Health'] == 'Poor'], 'y': dff.Proportion[dff['Health'] == 'Poor'], 'type': 'bar', 'name': 'Poor'}
            ],
            'layout': {
                'title': 'Proportions of Health Status by Species'
                    }
            }
    return figure

# histogram of steward count vs health count by borough
@app.callback(
    dash.dependencies.Output('borough-steward-graph', 'figure'),
    [dash.dependencies.Input('borough-dropdown', 'value')])
def update_boro_steward_output(borough_dropdown_value):
    # With multi=True the dropdown value may be a single string (the initial value)
    # or a list of selections, so both cases are normalized to a list
    if isinstance(borough_dropdown_value, str):
        selected = [borough_dropdown_value]
    else:
        selected = borough_dropdown_value or []
    dff = trees_2d[trees_2d['Borough'].isin(selected)]
    figure = {
            'data': [
                {'x': dff['Health'], 'y': dff['Nbr Stewards'], 'type': 'histogram2d'}
            ],
            'layout': {
                'title': 'Borough Tree Health vs Stewards 2d Histogram'
                    }
            }
    return figure

# histogram of steward count vs health count by species
@app.callback(
    dash.dependencies.Output('species-steward-graph', 'figure'),
    [dash.dependencies.Input('species-dropdown', 'value')])
def update_spec_steward_output(species_dropdown_value):
    # With multi=True the dropdown value may be a single string (the initial value)
    # or a list of selections, so both cases are normalized to a list
    if isinstance(species_dropdown_value, str):
        selected = [species_dropdown_value]
    else:
        selected = species_dropdown_value or []
    dff = trees_2d[trees_2d['Species'].isin(selected)]
    figure = {
            'data': [
                {'x': dff['Health'], 'y': dff['Nbr Stewards'], 'type': 'histogram2d'}
            ],
            'layout': {
                'title': 'Species Tree Health vs Stewards 2d Histogram'
                    }
            }
    return figure


if __name__ == '__main__':
    app.run_server(debug=False)

Dash is running on http://127.0.0.1:8050/

 * Serving Flask app "__main__" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off


The graphs can be viewed by opening the Dash link above (http://127.0.0.1:8050/) in a browser while the app is running.
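
Both dropdowns in the app are declared with `multi=True` but start from a single string value, so a callback may receive a string, a list, or nothing at all once the selection is cleared. The sketch below shows one way to handle all three cases before filtering. The helper name `as_selection_list` is hypothetical, and the small data frame is made up for illustration.

```python
import pandas as pd

def as_selection_list(value):
    """Normalize a dropdown value to a list (hypothetical helper)."""
    if value is None:
        return []           # nothing selected
    if isinstance(value, str):
        return [value]      # single string, e.g. the initial value
    return list(value)      # list of selections from a multi-select

# Made-up proportions, used only to demonstrate the filtering step
toy = pd.DataFrame({
    'Borough': ['Bronx', 'Queens', 'Brooklyn'],
    'Proportion': [0.8, 0.7, 0.75],
})

# isin accepts a list, so one filter covers one, many, or zero selections
one = toy[toy['Borough'].isin(as_selection_list('Bronx'))]
many = toy[toy['Borough'].isin(as_selection_list(['Bronx', 'Queens']))]
none = toy[toy['Borough'].isin(as_selection_list(None))]
print(len(one), len(many), len(none))
```

An equality comparison against a list would not select rows this way, which is why `isin` is used here.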