In [1]:
import os

In [2]:
from pandas import DataFrame
from dbfread import DBF

# Might need to install this from command line:
#
# $ pip install dbfread

In [3]:
# This is the module found in my current directory, the fars.py file
import fars

# Change this according to what's convenient for you
DATADIR = os.path.expanduser("~/data/fars/")

The lines below are commented since I don't 

You'll see output like this:

```
downloaded FARS2010.zip
downloaded FARS2011.zip
unzipped FARS2010.zip
unzipped FARS2011.zip
```

In [4]:
# Uncomment the following lines to fetch the data and unzip it
fars.download(datadir = DATADIR)
fars.unzip_all(DATADIR)


downloaded FARS2010.zip


KeyboardInterrupt: 

## Exercise 1

Here are a couple of functions that were helpful when downloading the data.

Your task: fill in the body of each function so that it does what its docstring describes. Check whether you're correct by running the doctests at the bottom of the cell.

This is the time to get your feet wet trying Python.

In [None]:
def before2012(x, start=2010):
    """
    Looking at the top level directory here: ftp://ftp.nhtsa.dot.gov/fars/
    We need to find those that are only years.
    
    In 2012 the directory pattern changed, so we'll just look at those
    years before that.

    start year can be as early as 1975.
    
    >>> before2012("2011")
    True
    >>> before2012("Auxiliary_FARS_Files_Formats/")
    False
    
    """
    return False
    
    

def isfars(fname):
    """
    Return True if the filename looks like a FARS file
    
    >>> isfars("FARS2011.zip")
    True
    >>> isfars("MI2011DBF.zip")
    False
    
    """
    return False


import doctest
doctest.testmod(verbose=True)
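If doctests are new to you, here is a minimal sketch of how they work, using a toy function that is unrelated to the exercise. The name `is_even` is made up for illustration. The `>>>` lines in the docstring are run as code and the lines after them are the expected output.

```python
import doctest


def is_even(n):
    """
    Return True if n is an even integer.

    >>> is_even(4)
    True
    >>> is_even(7)
    False
    """
    return n % 2 == 0


# Parse the examples out of the docstring and run them.
# doctest.testmod() does the same thing for every docstring in a module.
parser = doctest.DocTestParser()
test = parser.get_doctest(is_even.__doc__, {"is_even": is_even}, "is_even", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.attempted, "examples run,", results.failed, "failed")
```

While a function still returns a placeholder value, the examples that expect something else are reported as failures. That is how you can tell whether your solution is finished.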

## Data Exploration

If you're having trouble downloading with the script you can try the FTP server from your web browser: [ftp://ftp.nhtsa.dot.gov/fars/](ftp://ftp.nhtsa.dot.gov/fars/)

If that doesn't work there's one year available on our web server: http://anson.ucdavis.edu/~clarkf/

In [None]:
# Load in the accidents from 2011
acc11 = DataFrame(iter(DBF(DATADIR + "2011/accident.dbf")))

Usually the first thing I do when looking at a table is check the dimensions and the data types. From the documentation I expect approximately 40,000 rows and 50 columns.

In [None]:
acc11.shape

In [None]:
acc11.dtypes

In [None]:
acc11.head()

These are mostly integers, because they're coded according to a manual. Think categorical variables, or factors in R. To actually make sense of these we'll need to have all the lookup tables. If you can find them online please let me know!
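As a rough sketch of what decoding could look like once a lookup table is available, the example below maps integer codes to labels with pandas. The column name `LIGHT_CODE` and the codes and labels in `light_lookup` are made up for illustration; they are not taken from the FARS manual.

```python
import pandas as pd

# Hypothetical lookup table: these codes and labels are invented
# for this example and are NOT the real FARS coding.
light_lookup = {1: "daylight", 2: "dark", 3: "dawn or dusk"}

toy = pd.DataFrame({"LIGHT_CODE": [1, 2, 2, 3, 1]})

# Map each integer code to its label, then store the result as a
# categorical column (similar to a factor in R).
toy["LIGHT_LABEL"] = toy["LIGHT_CODE"].map(light_lookup).astype("category")

print(toy)
```

Any code missing from the lookup dictionary becomes `NaN` after `map`, which makes gaps in a lookup table easy to spot.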

## Maps

Let's map accidents that occurred around Davis / Sacramento.

We'll use folium: https://folium.readthedocs.io/en/latest/ 

(Thanks Nick for the recommendation)

In [None]:
# Might need to install this from command line:
#
# $ pip install folium

import folium

In [None]:
sac = folium.Map(location=[38.5449, -121.7405])
sac

That's a nice-looking map. Now we'll add points for accidents.

First we need to pick out those accidents to display. We can subset using square brackets.

In [None]:
acc_sac = acc11[(38 < acc11["LATITUDE"])
                & (acc11["LATITUDE"] < 39)
                & (-122 < acc11["LONGITUD"])
                & (acc11["LONGITUD"] < -121)
               ]
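To see what the square brackets are doing, here is the same kind of subset on a small made-up table. The four coordinate pairs are invented for illustration. Each comparison produces a column of True/False values, `&` combines them row by row, and indexing with the result keeps only the True rows.

```python
import pandas as pd

# Toy data, not from FARS: two points inside the box, two outside.
toy = pd.DataFrame({
    "LATITUDE": [38.54, 37.77, 38.58, 40.71],
    "LONGITUD": [-121.74, -122.42, -121.49, -74.01],
})

# Each comparison yields a boolean Series with one entry per row.
in_lat = (38 < toy["LATITUDE"]) & (toy["LATITUDE"] < 39)
in_lon = (-122 < toy["LONGITUD"]) & (toy["LONGITUD"] < -121)

# Indexing with a boolean Series keeps the rows where it is True.
subset = toy[in_lat & in_lon]
print(subset)
```

The parentheses around each comparison matter: `&` binds more tightly than `<` in Python, so leaving them out raises an error or gives the wrong mask.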

In [None]:
for loc in acc_sac[["LATITUDE", "LONGITUD"]].itertuples(index = False):
    folium.Marker(loc).add_to(sac)

In [None]:
sac

## The Golden Hour

gold·en hour (noun) MEDICINE

> The first hour after the occurrence of a traumatic injury, considered the most critical for successful emergency treatment.

Let's look at how much time passed between when an accident occurred and when the first responders arrived on the scene.

Here's some relevant info from the docs:

```
C9B Minute of Crash

Definition: This data element records the minutes after the hour at which the crash occurred.

Additional Information: All time is 24-hour military time.

The time of the crash/arrival of the emergency medical service can occur in a different day than
the arrival of emergency medical service at the crash scene/hospital.
This data element also appears in the Vehicle and Person data files and in the Parkwork data
file as PMINUTE.

SAS Name: MINUTE

1975-2008  2009   2010-Later  Meaning
00-59      00-59  00-59       Minute
--         88     --          Not Applicable or Not Notified
99         99     99          Unknown


C30B Minute of Arrival at Scene

Definition: This data element records the minutes after the hour that emergency medical
service arrived on the crash scene.

Prior to 2015, this data element’s Locator Code or Data Element Number was C29B.
SAS Name: ARR_MIN

1975-1998  1999-2008  2009-Later  Meaning
00-59      00-59      00-59       Minute
00         --         --          Not Notified or Officially Cancelled (when ARR_HOUR = 00)
--         00         --          Not Notified (when ARR_HOUR = 00)
--         --         88          Not Applicable or Not Notified
--         97         97          Officially Cancelled
--         98         98          Unknown if Arrived
99         99         99          Unknown Minutes
```

So we can use the time elements in the data to see how long it takes for the emergency medical services to arrive.
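Here is one way the calculation might go, as a sketch rather than a finished analysis. The function name `response_minutes` is made up, and it assumes the crash hour is available alongside `MINUTE`, `ARR_HOUR` and `ARR_MIN`. It also assumes that any value outside the normal clock range (such as 88, 97, 98 or 99) is a special code and should be treated as missing, and that an arrival time earlier than the crash time means the arrival happened after midnight.

```python
def response_minutes(hour, minute, arr_hour, arr_min):
    """Minutes from crash to EMS arrival, or None if any field is a special code."""
    # Valid clock values are 0-23 for hours and 0-59 for minutes.
    # Codes such as 88, 97, 98 and 99 fall outside those ranges.
    if not (0 <= hour <= 23 and 0 <= arr_hour <= 23):
        return None
    if not (0 <= minute <= 59 and 0 <= arr_min <= 59):
        return None

    crash = 60 * hour + minute
    arrival = 60 * arr_hour + arr_min

    # Assumption: if the arrival looks earlier than the crash,
    # EMS arrived on the following day, so wrap around midnight.
    return (arrival - crash) % (24 * 60)


print(response_minutes(14, 30, 14, 42))  # same hour
print(response_minutes(23, 50, 0, 5))    # crosses midnight
print(response_minutes(14, 99, 14, 42))  # unknown crash minute
```

One caveat from the documentation above: in data from before 2009, an arrival time of 00:00 can mean "Not Notified" instead of midnight, so this sketch would need an extra check for those years.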