In [2]:
import os

In [1]:
from pandas import DataFrame
from dbfread import DBF

In [3]:
DATADIR = os.path.expanduser("~/data/fars/")

## Data Exploration

In [4]:
# Load in the accidents from 2011
acc11 = DataFrame(iter(DBF(DATADIR + "2011/accident.dbf")))

Usually the first thing I do when looking at a table is check the dimensions and the data types. From the documentation I expect approximately 40,000 rows and 50 columns.

In [6]:
acc11.shape

(29867, 50)

In [5]:
acc11.dtypes

ARR_HOUR        int64
ARR_MIN         int64
CF1             int64
CF2             int64
CF3             int64
CITY            int64
COUNTY          int64
DAY             int64
DAY_WEEK        int64
DRUNK_DR        int64
FATALS          int64
HARM_EV         int64
HOSP_HR         int64
HOSP_MN         int64
HOUR            int64
LATITUDE      float64
LGT_COND        int64
LONGITUD      float64
MAN_COLL        int64
MILEPT          int64
MINUTE          int64
MONTH           int64
NHS             int64
NOT_HOUR        int64
NOT_MIN         int64
PEDS            int64
PERMVIT         int64
PERNOTMVIT      int64
PERSONS         int64
PVH_INVL        int64
RAIL           object
RELJCT1         int64
RELJCT2         int64
REL_ROAD        int64
ROAD_FNC        int64
ROUTE           int64
SCH_BUS         int64
SP_JUR          int64
STATE           int64
ST_CASE         int64
TWAY_ID        object
TWAY_ID2       object
TYP_INT         int64
VE_FORMS        int64
VE_TOTAL        int64
WEATHER   

These are mostly integers, because they're coded according to a manual. Think categorical variables, or factors in R. To actually make sense of these we'll need to have all the lookup tables. If you can find them online please let me know!
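Once a lookup table is in hand, decoding a column is a one-liner. The sketch below is only an illustration on a toy frame: the `DAY_WEEK_LABELS` dictionary is a hypothetical lookup that I typed in by hand, and its labels should be checked against the FARS manual before being trusted.

```python
import pandas as pd

# Hypothetical lookup table for DAY_WEEK (assumption: verify against the manual).
DAY_WEEK_LABELS = {
    1: "Sunday", 2: "Monday", 3: "Tuesday", 4: "Wednesday",
    5: "Thursday", 6: "Friday", 7: "Saturday",
}

# Toy data standing in for acc11; 9 plays the role of an unmapped code.
toy = pd.DataFrame({"DAY_WEEK": [1, 7, 3, 9]})

# Map integer codes to labels; codes missing from the lookup become NaN.
toy["DAY_WEEK_LABEL"] = toy["DAY_WEEK"].map(DAY_WEEK_LABELS).astype("category")
```

Storing the result as a `category` dtype is the closest pandas analogue to a factor in R.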

## Maps

Let's map accidents that occurred around Davis / Sacramento.

We'll use folium: https://folium.readthedocs.io/en/latest/ 

(Thanks Nick for the recommendation)

In [7]:
import folium

In [17]:
sac = folium.Map(location=[38.5449, -121.7405])
sac

That's a nice-looking map. Now we'll add points for accidents.

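First we need to pick out the accidents to display. We can subset the rows by passing a boolean mask in square brackets; the mask here keeps a one-degree box of latitude and longitude around Sacramento.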

In [22]:
acc_sac = acc11[(38 < acc11["LATITUDE"])
                & (acc11["LATITUDE"] < 39)
                & (-122 < acc11["LONGITUD"])
                & (acc11["LONGITUD"] < -121)
               ]
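If we want to look at other regions later, the same mask can be wrapped in a small helper. This is just a sketch: `in_box` is a name I made up for illustration, and the toy frame stands in for `acc11`.

```python
import pandas as pd

def in_box(df, lat_min, lat_max, lon_min, lon_max):
    """Return the rows of df strictly inside the latitude/longitude box."""
    mask = (
        (df["LATITUDE"] > lat_min) & (df["LATITUDE"] < lat_max)
        & (df["LONGITUD"] > lon_min) & (df["LONGITUD"] < lon_max)
    )
    return df[mask]

# Toy data: only the first row falls inside the Sacramento box.
toy = pd.DataFrame({
    "LATITUDE": [38.5, 40.0, 38.9],
    "LONGITUD": [-121.7, -121.5, -123.0],
})
near_sac = in_box(toy, 38, 39, -122, -121)
```

Calling `in_box(acc11, 38, 39, -122, -121)` should give the same rows as the cell above.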

In [34]:
for loc in acc_sac[["LATITUDE", "LONGITUD"]].itertuples(index=False):
    folium.Marker(loc).add_to(sac)

In [35]:
sac

## The Golden Hour

gold·en hour (noun) MEDICINE

> The first hour after the occurrence of a traumatic injury, considered the most critical for successful emergency treatment.

Let's look at how much time passed between when an accident occurred and when the first responders arrived on the scene.

Here's some relevant info from the docs:

```
C9B Minute of Crash

Definition: This data element records the minutes after the hour at which the crash occurred.

Additional Information: All time is 24-hour military time.

The time of the crash/arrival of the emergency medical service can occur in a different day than
the arrival of emergency medical service at the crash scene/hospital.
This data element also appears in the Vehicle and Person data files and in the Parkwork data
file as PMINUTE.

SAS Name: MINUTE

1975-2008  2009   2010-Later
00-59      00-59  00-59       Minute
--         88     --          Not Applicable or Not Notified
99         99     99          Unknown


C30B Minute of Arrival at Scene

Definition: This data element records the minutes after the hour that emergency medical
service arrived on the crash scene.

Prior to 2015, this data element’s Locator Code or Data Element Number was C29B.
SAS Name: ARR_MIN

1975-1998  1999-2008  2009-Later
00-59      00-59      00-59       Minute
00         --         --          Not Notified or Officially Cancelled (when ARR_HOUR = 00)
--         00         --          Not Notified (when ARR_HOUR = 00)
--         --         88          Not Applicable or Not Notified
--         97         97          Officially Cancelled
--         98         98          Unknown if Arrived
99         99         99          Unknown Minutes
```

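So we can combine the crash time (HOUR, MINUTE) with the arrival time (ARR_HOUR, ARR_MIN) to see how long it takes for the emergency medical services to arrive. The coded values such as 88, 97, 98 and 99 are not real times, so they have to be set aside first.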
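Here is a rough sketch of that calculation for a single crash. The function name `response_minutes` is my own, and it rests on two assumptions: any hour outside 0-23 or minute outside 0-59 is treated as a code rather than a time, and an arrival that looks earlier than the crash is taken to mean the clock passed midnight once.

```python
from typing import Optional

def response_minutes(hour: int, minute: int,
                     arr_hour: int, arr_min: int) -> Optional[int]:
    """Minutes from crash to EMS arrival, or None if any field is a code."""
    # Assumption: only real clock values count; 88, 97, 98, 99 are codes.
    if not (0 <= hour <= 23 and 0 <= arr_hour <= 23):
        return None
    if not (0 <= minute <= 59 and 0 <= arr_min <= 59):
        return None
    crash = 60 * hour + minute
    arrival = 60 * arr_hour + arr_min
    # Assumption: a negative gap means arrival fell on the next day.
    return (arrival - crash) % (24 * 60)
```

For example, a crash at 23:50 with arrival at 00:10 gives 20 minutes, and any row with an unknown arrival time gives `None`. One way to use it on the table would be row-wise, e.g. `acc11.apply(lambda r: response_minutes(r["HOUR"], r["MINUTE"], r["ARR_HOUR"], r["ARR_MIN"]), axis=1)`.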