In [None]:
import folium
import requests
import pandas as pd

# https://www.kaggle.com/datasets/adityadesai13/11000-bike-crash-data
crash_table = pd.read_csv("NCDOT_BikePedCrash.csv")

crash_table.head(10)

Unnamed: 0,X,Y,OBJECTID,AmbulanceR,BikeAge,BikeAgeGrp,BikeAlcDrg,BikeAlcFlg,BikeDir,BikeInjury,...,RdConfig,RdDefects,RdFeature,RdSurface,Region,RuralUrban,SpeedLimit,TraffCntrl,Weather,Workzone
0,-78.883896,36.03949,1,Yes,11,11-15,.,No,With Traffic,B: Suspected Minor Injury,...,"Two-Way, Divided, Unprotected Median",,No Special Feature,Smooth Asphalt,Piedmont,Urban,30 - 35 MPH,No Control Present,Clear,No
1,-78.7828,35.751118,2,Yes,20,20-24,.,No,Facing Traffic,C: Possible Injury,...,"Two-Way, Divided, Unprotected Median",,Four-Way Intersection,Smooth Asphalt,Piedmont,Urban,30 - 35 MPH,Stop And Go Signal,Clear,No
2,-80.69782,35.084732,3,Yes,37,30-39,.,No,Unknown,B: Suspected Minor Injury,...,"Two-Way, Not Divided",,No Special Feature,Smooth Asphalt,Piedmont,Urban,20 - 25 MPH,No Control Present,Cloudy,No
3,-80.47932,35.6844,4,Yes,30,30-39,.,No,With Traffic,C: Possible Injury,...,"Two-Way, Not Divided",,Four-Way Intersection,Smooth Asphalt,Piedmont,Urban,30 - 35 MPH,No Control Present,Cloudy,No
4,-78.90445,34.999428,5,Yes,45,40-49,.,No,With Traffic,B: Suspected Minor Injury,...,"Two-Way, Not Divided",,No Special Feature,Coarse Asphalt,Coastal,Urban,30 - 35 MPH,"Double Yellow Line, No Passing Zone",Clear,No
5,-80.47759,35.666668,6,Yes,58,50-59,.,No,With Traffic,B: Suspected Minor Injury,...,"Two-Way, Not Divided",,Four-Way Intersection,Smooth Asphalt,Piedmont,Urban,30 - 35 MPH,Stop And Go Signal,Clear,No
6,-79.59008,35.839287,7,No,51,50-59,.,Yes,With Traffic,A: Suspected Serious Injury,...,"Two-Way, Not Divided",,No Special Feature,Coarse Asphalt,Piedmont,Rural,50 - 55 MPH,"Double Yellow Line, No Passing Zone",Cloudy,No
7,-76.65675,34.717346,8,Yes,13,11-15,.,No,With Traffic,C: Possible Injury,...,"Two-Way, Not Divided",,No Special Feature,Smooth Asphalt,Coastal,Rural,20 - 25 MPH,Stop Sign,Clear,No
8,-81.57649,35.890682,9,Yes,18,16-19,.,No,With Traffic,C: Possible Injury,...,"Two-Way, Not Divided",,No Special Feature,Coarse Asphalt,Mountains,Urban,Unknown,No Control Present,Cloudy,No
9,-79.75971,36.512688,10,No,39,30-39,.,No,With Traffic,B: Suspected Minor Injury,...,"Two-Way, Not Divided",Unknown,No Special Feature,Smooth Asphalt,Piedmont,Urban,30 - 35 MPH,No Control Present,Clear,No


The dataset I chose is North Carolina bike crash data from 2007 to 2018. It comes from the [Town of Chapel Hill GIS & Analytics](https://opendata-townofchapelhill.hub.arcgis.com/) website and was provided by the North Carolina Department of Transportation.

The dataset contains many categorical fields, covering the demographics of both the biker and the driver in each crash, the traffic situation, weather conditions, injuries, and more. For simplicity, I selected the subset of columns below for analysis.

In [None]:
crash_table = crash_table[['Latitude', 'Longitude', 'AmbulanceR', 'BikeAge', 'BikeAgeGrp', 'BikeDir', 'BikeInjury', 'BikeRace', 'BikeSex', 'City', 'CrashAlcoh', 'CrashYear']]
crash_table.head(10)

Unnamed: 0,Latitude,Longitude,AmbulanceR,BikeAge,BikeAgeGrp,BikeDir,BikeInjury,BikeRace,BikeSex,City,CrashAlcoh,CrashYear
0,36.03949,-78.883896,Yes,11,11-15,With Traffic,B: Suspected Minor Injury,Black,Male,Durham,No,2007
1,35.751118,-78.7828,Yes,20,20-24,Facing Traffic,C: Possible Injury,Hispanic,Male,Cary,No,2007
2,35.084732,-80.69782,Yes,37,30-39,Unknown,B: Suspected Minor Injury,Black,Male,Stallings,No,2007
3,35.6844,-80.47932,Yes,30,30-39,With Traffic,C: Possible Injury,White,Male,Salisbury,No,2007
4,34.999428,-78.90445,Yes,45,40-49,With Traffic,B: Suspected Minor Injury,Black,Male,Fayetteville,No,2007
5,35.666668,-80.47759,Yes,58,50-59,With Traffic,B: Suspected Minor Injury,White,Male,Salisbury,No,2007
6,35.839287,-79.59008,No,51,50-59,With Traffic,A: Suspected Serious Injury,Black,Male,None - Rural Crash,Yes,2007
7,34.717346,-76.65675,Yes,13,11-15,With Traffic,C: Possible Injury,White,Male,Beaufort,No,2007
8,35.890682,-81.57649,Yes,18,16-19,With Traffic,C: Possible Injury,White,Male,Lenoir,No,2007
9,36.512688,-79.75971,No,39,30-39,With Traffic,B: Suspected Minor Injury,White,Male,Eden,No,2007


In [None]:
# Count the rows that have a latitude value
crash_table["Latitude"].notna().sum()

11266

In [None]:
crash_table['BikeInjury'].unique()

array(['B: Suspected Minor Injury', 'C: Possible Injury',
       'A: Suspected Serious Injury', 'K: Killed', 'Unknown Injury',
       'O: No Injury'], dtype=object)

In [None]:
# Keep only Raleigh crashes from 2014 onward; both masks are built on the same frame
filtered = crash_table.loc[(crash_table['City'] == 'Raleigh') & (crash_table['CrashYear'] >= 2014)]

# Marker color for each injury severity level
colors = {
    'Unknown Injury': 'lightgray',
    'O: No Injury': 'green',
    'C: Possible Injury': 'lightgreen',
    'B: Suspected Minor Injury': 'orange',
    'A: Suspected Serious Injury': 'red',
    'K: Killed': 'black',
}

icons = {'Yes' : 'exclamation', 'No' : 'circle'}

map_osm = folium.Map(location=[35.8596, -78.6282], zoom_start=13)
# Add one marker per crash: color encodes injury severity, icon encodes ambulance response
for _, row in filtered.iterrows():
    folium.Marker(
        location=[row["Latitude"], row["Longitude"]],
        icon=folium.Icon(color=colors[row['BikeInjury']], icon=icons[row['AmbulanceR']], prefix='fa'),
    ).add_to(map_osm)
map_osm

For this map, I filtered the dataset to crashes in Raleigh, the capital of North Carolina, for the five most recent years available in the dataset (2014-2018). Raleigh is one of the fastest-growing metropolitan areas in the United States and is bound to experience high levels of traffic, so I thought it would be insightful to focus on crashes within the city. Filtering also reduces the number of markers on the map, which makes it more readable and easier to analyze.

As shown in the map above, I have color-coded the markers to represent the severity of injury for each crash incident:

- Light Gray: Unknown
- Green: No Injury
- Light Green: Possible Injury
- Orange: Suspected Minor Injury
- Red: Suspected Serious Injury
- Black: Killed

The icon inside each marker denotes whether an ambulance was required at the scene of the crash: an exclamation mark (!) means an ambulance was required, and a circle means no ambulance was required.
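The direct lookups `colors[row['BikeInjury']]` and `icons[row['AmbulanceR']]` raise a `KeyError` if a row carries a value outside the two mappings, such as a missing entry. Below is a minimal sketch of a more forgiving lookup. The helper name `marker_style` and the fallback values (`'lightgray'` and `'question'`) are my own assumptions for illustration and are not part of the notebook above.

```python
# Mappings as defined in the notebook: injury severity -> color, ambulance flag -> icon
colors = {
    'Unknown Injury': 'lightgray',
    'O: No Injury': 'green',
    'C: Possible Injury': 'lightgreen',
    'B: Suspected Minor Injury': 'orange',
    'A: Suspected Serious Injury': 'red',
    'K: Killed': 'black',
}
icons = {'Yes': 'exclamation', 'No': 'circle'}


def marker_style(injury, ambulance, default_color='lightgray', default_icon='question'):
    """Return (color, icon) for a crash, using the defaults for unmapped values.

    The defaults are hypothetical choices, not values taken from the dataset.
    """
    return colors.get(injury, default_color), icons.get(ambulance, default_icon)


print(marker_style('K: Killed', 'Yes'))      # mapped values
print(marker_style(float('nan'), 'Maybe'))   # unmapped values fall back to the defaults
```

The returned pair could then be passed to `folium.Icon(color=..., icon=..., prefix='fa')` in place of the direct dictionary lookups.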

Looking at the markers on the map above, it is apparent that most bike crashes occurred on streets closer to the center of the city, which experience higher traffic. Specifically, high concentrations of crashes were reported on Hillsborough Street and in the commercial area around Union, Nash, and Moore Squares. Outside these areas, crashes were spread out and sporadic across the suburban areas.
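The concentration described above is read off the map by eye. One way to put a number on it is to bin crash coordinates into a coarse grid and count crashes per cell. The sketch below is illustrative only: the function name `grid_counts` and the 0.1-degree cell size are my own assumptions, and the input is the ten sample rows shown earlier rather than the Raleigh subset.

```python
import math
from collections import Counter

# (Latitude, Longitude) pairs copied from the ten sample rows shown earlier
sample_points = [
    (36.03949, -78.883896), (35.751118, -78.7828), (35.084732, -80.69782),
    (35.6844, -80.47932), (34.999428, -78.90445), (35.666668, -80.47759),
    (35.839287, -79.59008), (34.717346, -76.65675), (35.890682, -81.57649),
    (36.512688, -79.75971),
]


def grid_counts(points, cell_deg=0.1):
    """Count points per square grid cell whose side is cell_deg degrees."""
    return Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for lat, lon in points
    )


counts = grid_counts(sample_points)
busiest_cell, n = counts.most_common(1)[0]
print(busiest_cell, n)  # the two Salisbury crashes share one cell
```

For the Raleigh map, the same function could be applied to `zip(filtered['Latitude'], filtered['Longitude'])` with a much smaller cell. A suitable cell size would need tuning, and I have not run this on the full dataset.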

From this data, I think local officials can gain a general idea of which areas to investigate further, both to find the causes of the high concentration of bike crashes and to implement new traffic regulations that decrease the risk of bike crashes in the city.