# <span style="color:Purple">Data Exploration: </span>
# <span>Using Fatal Accidents Data to Find Systemic Problems in US Interstates</span>

# <span style="color:Navy">Notebook 2: </span><span> Data Exploration and Visualization </span>

Our data exploration and analysis is in service of answering the question: 
##### <span style="color:Navy">"Where does the US interstate network exhibit the most significant patterns of systemic problems leading to fatal traffic accidents?"</span>

To this end, a data set composed of all fatal accidents in the nation from 2010 to 2016 has been compiled. Let's start with some basic data exploration and visualization to ask and answer basic questions.

### Preliminary Setup

#### Import the needed modules, including ArcPy, the ArcGIS API for Python, and other useful modules

In [2]:
import arcpy
arcpy.env.overwriteOutput = True
import arcgis
from arcgis import features
import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor
from arcgis.geoenrichment import *
from arcgis.geometry import project
import os
import time

In [5]:
# Imports for plotting in Bokeh
import numpy as np
import bokeh
from bokeh.io import output_notebook, output_file, show
from bokeh.plotting import figure
from bokeh.models import Legend, Range1d
from bokeh.embed import file_html
from bokeh.resources import CDN
# Set bokeh to output plots in the notebook
output_notebook()

In [7]:
# gis = arcgis.gis.GIS("home", verify_cert=False)
gis = arcgis.gis.GIS("https://esrifederal.maps.arcgis.com", username="Anieto_esrifederal", verify_cert=False)

Enter password: ········


In [11]:
# Create a helper function that receives a spatially-enabled dataframe and column as input, and returns a map widget, symbolized layer, and bokeh histogram using the layer's colormap
def create_map_and_histogram(map_location, sdf, column, method='esriClassifyNaturalBreaks', class_count=5, cmap='OrRd', alpha=0.8, plot_height=600, plot_width=600):
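    """
    create_map_and_histogram
    inputs:
        map_location: Location for the map widget. The entry will be geocoded and used as the starting extent for the map widget. Example: "Pittsburgh"
        sdf: spatially-enabled dataframe that is plotted on a map and histogram
        column: column to use for layer symbology and histogram
        method: classification method for the class breaks renderer
        class_count: number of classes in the renderer, also used as the number of histogram bins
        cmap: name of the colormap used to symbolize the layer
        alpha: opacity of the layer symbols (0 is transparent, 1 is opaque)
        plot_height, plot_width: size of the bokeh plot in pixels
    return: map widget, bokeh plot
    """

    # Create map
    map_obj = gis.map(map_location)
    # Pass the caller's alpha through instead of a hard-coded 0.8
    sdf.spatial.plot(map_widget=map_obj, renderer_type='c', method=method, class_count=class_count, col=column, cmap=cmap, alpha=alpha)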
    
    # Extract the layer's class breaks and colors
    class_breaks = map_obj.layers[0].layer.layerDefinition.drawingInfo.renderer.classBreakInfos
    cbs_list = []
    cmap_list = []
    for cb in class_breaks:
        cbs_list.append(cb.classMaxValue)
        cmap_list.append('#%02x%02x%02x' % (cb.symbol.color[0], cb.symbol.color[1], cb.symbol.color[2]))
    
    # Create a histogram of the selected column's values
    hist, edges = np.histogram(sdf[column],
                              bins=class_count)

    # Put the information in a dataframe
    hist_df = pd.DataFrame({column: hist,
                            'left': edges[:-1],
                            'right': edges[1:]})
    
    # Add colors to each hist_df record
    hist_df['color'] = pd.Series(cmap_list)

    # Create the blank plot
    p = figure(plot_height = plot_height, plot_width = plot_width, 
               title = 'Histogram',
               y_axis_label = 'Feature Count',
               x_axis_label = column)

    # Add a quad glyph
    p.quad(bottom=0, top=hist_df[column], 
           left=hist_df['left'], right=hist_df['right'],
           line_color='white', fill_color=hist_df['color'])

    # Return outputs
    return map_obj, p
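
The helper above bins the histogram into `class_count` equal-width bins and then colors the bars with the renderer's class colors, so a bar does not necessarily cover the same value range as the map class that shares its color. A minimal sketch of one way to align the two is to count values per renderer class instead, using the `classMaxValue` entries as upper bounds. The function names and sample values below are hypothetical, and treating each class maximum as an inclusive upper bound is an assumption about the renderer's semantics.

```python
import numpy as np

def rgb_to_hex(color):
    """Convert an [r, g, b, ...] sequence of 0-255 integers to '#rrggbb'."""
    return '#%02x%02x%02x' % tuple(color[:3])

def counts_per_class(values, class_max_values):
    """Count values per class, treating each class maximum as an inclusive upper bound."""
    values = np.asarray(values, dtype=float)
    breaks = np.sort(np.asarray(class_max_values, dtype=float))
    # Index of the first class whose maximum is >= the value
    idx = np.searchsorted(breaks, values, side='left')
    # Values above the last break (should not occur) fall into the last class
    idx = np.clip(idx, 0, len(breaks) - 1)
    counts = np.bincount(idx, minlength=len(breaks))
    # Left and right edges for a quad glyph: data minimum, then the class maxima
    left = np.concatenate(([values.min()], breaks[:-1]))
    return counts, left, breaks

# Hypothetical values and class maxima, for illustration only
sample_values = [0, 1, 1, 2, 2, 3, 5, 8, 23]
sample_breaks = [1, 3, 8, 23]
counts, left, right = counts_per_class(sample_values, sample_breaks)
bar_color = rgb_to_hex([255, 0, 16, 204])
```

The returned `counts`, `left`, and `right` arrays could stand in for `hist`, `edges[:-1]`, and `edges[1:]` when building `hist_df`, so that each bar spans exactly one map class.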

# <span style="color:purple">1) Retrieve fatal accident data</span>

We published an item containing fatal accidents to ArcGIS Enterprise. The cells below instead read the accident feature class from a local file geodatabase and load it into the notebook as a spatially enabled DataFrame.

In [8]:
fars_allyears_fc = r"C:\Users\albe9057\Desktop\NHTSA_Pro_Project\fars_allyears.gdb\FARSaux_Accidents_Project"

In [9]:
fars_sedf = pd.DataFrame.spatial.from_featureclass(fars_allyears_fc)
fars_sedf.head()

|  | OBJECTID | STATE | COUNTY | MONTH | DAY | HOUR | MINUTE | VE_FORMS | PERSONS | PEDS | ... | BIA | SPJ_INDIAN | INDIAN_RES | RUR_URB | FUNC_SYS | RD_OWNER | DATE | TIME | DATETIME_2 | SHAPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 21 | 1 | 1 | 7 | 30 | 1 | 1 | 0 | ... |  |  |  |  |  |  | 2003-01-01 | 07:30:00 | 2003-01-01 07:30:00 | {"x": 825546.1922000013, "y": -781400.58239999... |
| 1 | 2 | 1 | 71 | 1 | 1 | 15 | 50 | 1 | 1 | 0 | ... |  |  |  |  |  |  | 2003-01-01 | 15:50:00 | 2003-01-01 15:50:00 | {"x": 873057.9888000004, "y": -573902.54509999... |
| 2 | 3 | 1 | 51 | 1 | 5 | 12 | 19 | 1 | 1 | 0 | ... |  |  |  |  |  |  | 2003-01-05 | 12:19:00 | 2003-01-05 12:19:00 | {"x": 897368.2910000011, "y": -824484.13279999... |
| 3 | 4 | 1 | 111 | 1 | 4 | 7 | 50 | 1 | 1 | 0 | ... |  |  |  |  |  |  | 2003-01-04 | 07:50:00 | 2003-01-04 07:50:00 | {"x": 942991.9792000018, "y": -746566.58009999... |
| 4 | 5 | 1 | 13 | 1 | 1 | 19 | 30 | 2 | 3 | 0 | ... |  |  |  |  |  |  | 2003-01-01 | 19:30:00 | 2003-01-01 19:30:00 | {"x": 834351.8916999996, "y": -938639.39039999... |


# <span style="color:purple">2) Basic data descriptions</span>

In [12]:
fars_sedf.shape

(514185, 110)

In [16]:
fars_sedf.columns.tolist()

['OBJECTID',
 'STATE',
 'COUNTY',
 'MONTH',
 'DAY',
 'HOUR',
 'MINUTE',
 'VE_FORMS',
 'PERSONS',
 'PEDS',
 'NHS',
 'ROAD_FNC',
 'ROUTE',
 'SP_JUR',
 'HARM_EV',
 'MAN_COLL',
 'REL_JUNC',
 'REL_ROAD',
 'TRAF_FLO',
 'NO_LANES',
 'SP_LIMIT',
 'ALIGNMNT',
 'PROFILE',
 'PAVE_TYP',
 'SUR_COND',
 'TRA_CONT',
 'T_CONT_F',
 'HIT_RUN',
 'LGT_COND',
 'WEATHER',
 'C_M_ZONE',
 'NOT_HOUR',
 'NOT_MIN',
 'ARR_HOUR',
 'ARR_MIN',
 'HOSP_HR',
 'HOSP_MN',
 'SCH_BUS',
 'CF1',
 'CF2',
 'CF3',
 'FATALS',
 'DAY_WEEK',
 'DRUNK_DR',
 'ST_CASE',
 'CITY',
 'MILEPT',
 'YEAR',
 'TWAY_ID',
 'RAIL',
 'latitude',
 'LONGITUDE',
 'A_CRAINJ',
 'A_REGION',
 'A_RU',
 'A_INTER',
 'A_RELRD',
 'A_INTSEC',
 'A_ROADFC',
 'A_JUNC',
 'A_MANCOL',
 'A_RD',
 'A_TOD',
 'A_DOW',
 'A_HR',
 'A_CT',
 'A_LT',
 'A_MC',
 'A_SPCRA',
 'A_PED',
 'A_PEDAL',
 'A_ROLL',
 'A_POLPUR',
 'A_POSBAC',
 'A_D15_19',
 'A_D16_19',
 'A_D15_20',
 'A_D16_20',
 'A_D65PLS',
 'A_D21_24',
 'A_D16_24',
 'A_DIST',
 'A_DROWSY',
 'DATETIME',
 'X',
 'Y',
 'X_Y_VALID',


FARS metadata documentation: https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/812602

In [19]:
fars_sedf.describe()

|  | OBJECTID | STATE | COUNTY | MONTH | DAY | HOUR | MINUTE | VE_FORMS | PERSONS | PEDS | ... | A_D16_20 | A_D65PLS | A_D21_24 | A_D16_24 | A_DIST | A_DROWSY | DATETIME | X | Y | X_Y_VALID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 514185.0 | 514185.0 | 514185.0 | 514185.0 | 514185.0 | 514185.0 | 514185.0 | 514185.0 | 514185.0 | 514185.0 | ... | 514185.0 | 514185.0 | 514185.0 | 514185.0 | 514185.0 | 514185.0 | 514185.0 | 514185.0 | 514185.0 | 514185.0 |
| mean | 257093.0 | 27.531188 | 90.21672 | 6.711557 | 15.6521 | 13.284244 | 28.837041 | 1.502907 | 2.440089 | 0.180692 | ... | 1.840326 | 1.834062 | 1.847143 | 1.697391 | 1.883589 | 1.974542 | 19907530000000.0 | -91.743374 | 36.914874 | 0.972259 |
| std | 148432.568419 | 16.226781 | 93.867852 | 3.359763 | 8.867418 | 10.464198 | 18.574252 | 0.795356 | 1.902167 | 0.459987 | ... | 0.366304 | 0.372026 | 0.35985 | 0.459388 | 0.320718 | 0.157511 | 1891509000000.0 | 14.798868 | 5.234091 | 0.16423 |
| min | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | -99.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | -99.0 | -174.204181 | 18.960467 | 0.0 |
| 25% | 128547.0 | 12.0 | 31.0 | 4.0 | 8.0 | 7.0 | 14.0 | 1.0 | 1.0 | 0.0 | ... | 2.0 | 2.0 | 2.0 | 1.0 | 2.0 | 2.0 | 20050420000000.0 | -97.555647 | 33.407386 | 1.0 |
| 50% | 257093.0 | 27.0 | 71.0 | 7.0 | 16.0 | 14.0 | 30.0 | 1.0 | 2.0 | 0.0 | ... | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 20080920000000.0 | -87.415439 | 36.837156 | 1.0 |
| 75% | 385639.0 | 42.0 | 115.0 | 10.0 | 23.0 | 19.0 | 44.0 | 2.0 | 3.0 | 0.0 | ... | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 20121200000000.0 | -81.0 | 40.833239 | 1.0 |
| max | 514185.0 | 56.0 | 999.0 | 12.0 | 99.0 | 99.0 | 99.0 | 92.0 | 158.0 | 74.0 | ... | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 20161230000000.0 | -66.993169 | 71.324003 | 1.0 |
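
In the summary above, HOUR ranges from -99 to 99 and DAY and MINUTE reach 99, so the means and standard deviations of those columns include placeholder codes rather than real clock or calendar values. A minimal sketch of one way to mask them before summarizing follows. The valid ranges are assumptions to check against the FARS documentation, and the sample frame and the `mask_out_of_range` helper are hypothetical.

```python
import pandas as pd

# Hypothetical sample mimicking the HOUR and MINUTE columns
sample = pd.DataFrame({
    'HOUR':   [7, 15, 99, -99, 23],
    'MINUTE': [30, 50, 99, 0, 59],
})

def mask_out_of_range(df, valid_ranges):
    """Return a copy of df where values outside [low, high] are replaced by NaN."""
    cleaned = df.copy()
    for col, (low, high) in valid_ranges.items():
        # between() is inclusive on both ends; where() keeps matching values only
        cleaned[col] = cleaned[col].where(cleaned[col].between(low, high))
    return cleaned

# Assumed valid ranges: hours 0-23 and minutes 0-59
cleaned = mask_out_of_range(sample, {'HOUR': (0, 23), 'MINUTE': (0, 59)})
hour_mean = cleaned['HOUR'].mean()
```

Calling `describe()` on the cleaned frame skips the masked entries, since pandas ignores NaN in summary statistics.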


In [20]:
fars_sedf.FATALS.describe()

count    514185.000000
mean          1.099507
std           0.385133
min           1.000000
25%           1.000000
50%           1.000000
75%           1.000000
max          23.000000
Name: FATALS, dtype: float64

In [21]:
fars_sedf.loc[fars_sedf.FATALS == 23]

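|  | OBJECTID | STATE | COUNTY | MONTH | DAY | HOUR | MINUTE | VE_FORMS | PERSONS | PEDS | ... | BIA | SPJ_INDIAN | INDIAN_RES | RUR_URB | FUNC_SYS | RD_OWNER | DATE | TIME | DATETIME_2 | SHAPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 112920 | 112921 | 48 | 113 | 9 | 23 | 6 | 7 | 1 | 38 | 0 | ... |  |  |  |  |  |  | 2005-09-23 | 06:07:00 | 2005-09-23 06:07:00 | {"x": -60116.101999999955, "y": -870215.196000... |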


This unfortunate event occurred during the Hurricane Rita evacuation, when a motorcoach carrying evacuees caught fire on Interstate 45 near Wilmer, Texas. The NTSB accident report describes it:
https://www.ntsb.gov/investigations/AccidentReports/Pages/HAR0701.aspx

In [23]:
fars_interstates_sedf = fars_sedf.loc[fars_sedf['FUNC_SYS'] == 1]
fars_interstates_sedf

|  | OBJECTID | STATE | COUNTY | MONTH | DAY | HOUR | MINUTE | VE_FORMS | PERSONS | PEDS | ... | BIA | SPJ_INDIAN | INDIAN_RES | RUR_URB | FUNC_SYS | RD_OWNER | DATE | TIME | DATETIME_2 | SHAPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 408717 | 408718 | 1 | 83 | 1 | 1 | 22 | 13 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 1 | 1 | 1 | 2015-01-01 | 22:13:00 | 2015-01-01 22:13:00.000000 | {"x": 786904.8220999986, "y": -560344.17369999... |
| 408736 | 408737 | 1 | 115 | 1 | 23 | 13 | 48 | 2 | 4 | 0 | ... | 0 | 0 | 0 | 1 | 1 | 1 | 2015-01-23 | 13:48:00 | 2015-01-23 13:48:00.000000 | {"x": 836022.4448999986, "y": -713257.25250000... |
| 408759 | 408760 | 1 | 73 | 1 | 31 | 3 | 10 | 1 | 2 | 0 | ... | 0 | 0 | 0 | 2 | 1 | 1 | 2015-01-31 | 03:10:00 | 2015-01-31 03:10:00.000000 | {"x": 815359.2540999986, "y": -734958.9057, "s... |
| 408760 | 408761 | 1 | 97 | 1 | 31 | 8 | 19 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 2 | 1 | 1 | 2015-01-31 | 08:19:00 | 2015-01-31 08:19:00.000000 | {"x": 723280.0604999997, "y": -1066510.2558000... |
| 408761 | 408762 | 1 | 17 | 2 | 8 | 2 | 15 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 2 | 1 | 1 | 2015-02-08 | 02:15:00 | 2015-02-08 02:15:00.000000 | {"x": 962744.9134999998, "y": -783883.2396, "s... |
| 408779 | 408780 | 1 | 97 | 2 | 4 | 0 | 19 | 1 | 1 | 1 | ... | 0 | 0 | 0 | 2 | 1 | 1 | 2015-02-04 | 00:19:00 | 2015-02-04 00:19:00.000000 | {"x": 722648.8920000009, "y": -1057724.7522999... |
| 408789 | 408790 | 1 | 73 | 2 | 16 | 5 | 17 | 4 | 4 | 0 | ... | 0 | 0 | 0 | 2 | 1 | 1 | 2015-02-16 | 05:17:00 | 2015-02-16 05:17:00.000000 | {"x": 811166.6207000017, "y": -729103.25779999... |
| 408794 | 408795 | 1 | 73 | 3 | 7 | 4 | 45 | 1 | 2 | 0 | ... | 0 | 0 | 0 | 2 | 1 | 1 | 2015-03-07 | 04:45:00 | 2015-03-07 04:45:00.000000 | {"x": 796953.184799999, "y": -738085.992499999... |
| 408808 | 408809 | 1 | 73 | 3 | 9 | 5 | 55 | 3 | 3 | 0 | ... | 0 | 0 | 0 | 2 | 1 | 1 | 2015-03-09 | 05:55:00 | 2015-03-09 05:55:00.000001 | {"x": 827072.8751999997, "y": -716307.3262, "s... |
| 408809 | 408810 | 1 | 53 | 3 | 6 | 20 | 25 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 1 | 1 | 1 | 2015-03-06 | 20:25:00 | 2015-03-06 20:25:00.000000 | {"x": 791680.3949000016, "y": -993271.98269999... |


In [27]:
fars_interstates_sedf['YEAR'].unique().tolist()

[2015, 2016]
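
Filtering on `FUNC_SYS == 1` keeps only 2015 and 2016, which is consistent with `FUNC_SYS` being empty in the 2003 rows previewed in section 1. A minimal sketch of one way to keep interstate crashes from the earlier years as well is to combine `FUNC_SYS` with the older `ROAD_FNC` column. The `ROAD_FNC` codes used here (1 for rural interstate, 11 for urban interstate) are an assumption to confirm against the FARS documentation linked above, and the sample records and the `is_interstate` helper are hypothetical.

```python
import pandas as pd

# Hypothetical records: FUNC_SYS is empty before 2015, ROAD_FNC is empty afterwards
sample = pd.DataFrame({
    'YEAR':     [2003, 2008, 2014, 2015, 2016, 2016],
    'ROAD_FNC': [1,    11,   3,    None, None, None],
    'FUNC_SYS': [None, None, None, 1,    1,    3],
})

def is_interstate(df):
    """Boolean mask that is True for crashes coded as interstate in either column."""
    # Assumption: ROAD_FNC 1 and 11 are the rural and urban interstate codes
    legacy = df['ROAD_FNC'].isin([1, 11])
    # FUNC_SYS == 1 is the filter already used in this notebook
    current = df['FUNC_SYS'] == 1
    return legacy | current

# copy() avoids chained-assignment warnings if columns are added later
interstates = sample.loc[is_interstate(sample)].copy()
interstate_years = sorted(interstates['YEAR'].unique().tolist())
```

Applied to `fars_sedf`, the same mask would replace the single `FUNC_SYS` comparison. Checking `interstates['YEAR'].unique()` afterwards shows whether the earlier years are now included.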

In [31]:
us_map = gis.map("USA")
us_map

MapView(layout=Layout(height='400px', width='100%'))

In [32]:
fars_interstates_sedf.spatial.plot(map_widget=us_map)

True

In [33]:
gis.content.import_data(fars_interstates_sedf, title="fars_interstates_2015and2016")

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

SystemError: <da.funcInfo object at 0x00000217A12E3030> returned NULL without setting an error
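
The `import_data` call above failed, and the cause is not clear from the error text alone. One alternative to try is the spatially enabled DataFrame's own publishing method, `spatial.to_featurelayer`. The sketch below wraps it in a hypothetical helper, `publish_sdf`. Whether it avoids the error above is untested here, so treat it as something to try rather than a confirmed fix.

```python
def publish_sdf(sdf, gis, title, tags=None):
    """Publish a spatially enabled DataFrame as a hosted feature layer item."""
    # The .spatial accessor is registered on DataFrames when
    # arcgis.features.GeoAccessor is imported, as in the setup cell
    return sdf.spatial.to_featurelayer(title=title, gis=gis, tags=tags)

# Usage in this notebook (needs the `gis` connection and DataFrame defined earlier):
# item = publish_sdf(fars_interstates_sedf, gis, "fars_interstates_2015and2016")
```

If publishing still fails, reducing the DataFrame to the columns needed for the analysis before publishing can help isolate which field triggers the error.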

In [None]:
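# Next step (not yet run): cluster the interstate accident points.
# find_point_clusters needs an input point layer and a minimum number of
# features per cluster, so the bare call below is only a placeholder.
arcgis.features.analyze_patterns.find_point_clusters()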