# Visualising Delaware Restaurant Violations for 2016

## 1. Installing dependencies

First, we have to make sure all the Python packages we want to use are installed.

Prefixing a line with `!` will run the rest of the line as a shell command.

The following is just like using `pip` from the shell.

In [23]:
# Install dependencies
# Note: folium is pinned to exactly version 0.3 here; the `geo_path` argument
#       used in the choropleth step may not be accepted by newer releases
! pip install folium==0.3 pandas python-dateutil cufflinks requests



Note: pip is the most common way to install Python dependencies.
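
To confirm which versions actually ended up installed, something like the following can be run in a cell. It is a minimal sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is made up for this example and is not part of pip.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the packages this notebook relies on
for name in ['folium', 'pandas', 'python-dateutil', 'cufflinks', 'requests']:
    print(name, installed_version(name) or 'not installed')
```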

## 2. Importing packages before we start working

Before we can use the dependencies we installed in step 1, we have to import them.

In [30]:
# Import builtin python packages
import os

# Import installed python packages
import pandas as pd
import cufflinks
import folium
import dateutil.parser  # import the parser submodule explicitly, it is used in step 5
import requests

## 3. Downloading data

### Restaurant violation data

Next, let's download some data from _http://data.delaware.gov_.

The Health category contains a dataset of _Restaurant Inspection Violations_.


In [85]:
# Download Delaware 'Restaurant Inspection Violations' data from
# https://data.delaware.gov/Health/Restaurant-Inspection-Violations/384s-wygj

# Make sure that the data dir exists
os.makedirs('./data', exist_ok=True)

# Original source:
# restaurant_inspection_violations_url = 'https://data.delaware.gov/api/views/384s-wygj/rows.csv?accessType=DOWNLOAD'
# Note: The notebooks.azure.com service limits where data can be downloaded from.
#       We've copied the data to GitHub, which IS accessible within notebooks.azure.com,
#       so the GitHub url below is the active one. To use the original source instead,
#       uncomment the url above and comment out the one below.
restaurant_inspection_violations_url = 'https://raw.githubusercontent.com/tomdottom/putting-data-on-the-web/master/data/restaurant_inspection_violations.csv'

# Download the data, and only write the file if the request succeeded
response = requests.get(restaurant_inspection_violations_url)
if response.status_code == 200:
    with open('./data/restaurant_inspection_violations.csv', 'wb') as fh:
        fh.write(response.content)
    print("Download succeeded")
else:
    print("Download failed")
    print(response.reason)

Download succeeded
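
If `requests` is not available, or you want a failed download to leave any existing file untouched, a similar helper can be sketched with the standard library. This is an alternative under stated assumptions, not what the notebook uses: the name `download_to` is hypothetical, and `urlopen` raises an exception for HTTP error responses instead of returning a status code to check.

```python
import os
import urllib.request


def download_to(url, dest_path, timeout=30):
    """Fetch url and save it to dest_path, replacing the file only on success."""
    dest_dir = os.path.dirname(dest_path)
    if dest_dir:
        os.makedirs(dest_dir, exist_ok=True)
    tmp_path = dest_path + '.part'
    # urlopen raises urllib.error.HTTPError / URLError on failure,
    # so the lines below only run when a response actually arrived
    with urllib.request.urlopen(url, timeout=timeout) as response:
        content = response.read()
    with open(tmp_path, 'wb') as fh:
        fh.write(content)
    os.replace(tmp_path, dest_path)  # swap the finished file into place
    return len(content)
```

Calling `download_to(restaurant_inspection_violations_url, './data/restaurant_inspection_violations.csv')` would then return the number of bytes written.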


### GeoJSON shapes for Delaware

https://en.wikipedia.org/wiki/GeoJSON
> GeoJSON is an open standard format designed for representing simple geographical features ...

State, county, census tract, congressional district, and zip code GeoJSON shapes can be found at https://github.com/tomdottom/delaware_geojson

If you are using the Microsoft Azure Notebooks service, the command below may fail.
In that case, follow the instructions at https://notebooks.azure.com/faq#upload_data
 

In [86]:
# Download Delaware zip code GeoJSON shapes

zcta_geojson_url = 'https://raw.githubusercontent.com/tomdottom/delaware_geojson/master/zcta.geojson'
# Note: The notebooks.azure.com service limits where data can be downloaded from.
#       We've copied the data to GitHub, which IS accessible within notebooks.azure.com.
#       Just uncomment the url below and comment out the one above
# zcta_geojson_url = 'https://raw.githubusercontent.com/tomdottom/putting-data-on-the-web/master/data/de_zcta.geojson'

# Download the data, and only write the file if the request succeeded
response = requests.get(zcta_geojson_url)
if response.status_code == 200:
    with open('./data/de_zcta.geojson', 'wb') as fh:
        fh.write(response.content)
    print("Download succeeded")
else:
    print("Download failed")
    print(response.reason)

Download succeeded
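
Before mapping, it can help to look at which properties each GeoJSON feature carries, since the choropleth step needs the name of the property that holds the zip code. The sketch below uses only the standard library; `feature_property_names` is a hypothetical helper, and the tiny `example` collection is hand-made stand-in data, not taken from the real file.

```python
import json


def feature_property_names(geojson):
    """Return the sorted property names found on the first feature."""
    features = geojson.get('features', [])
    if not features:
        return []
    # 'properties' may be null in GeoJSON, so fall back to an empty dict
    return sorted(features[0].get('properties') or {})


# A tiny hand-made FeatureCollection standing in for the real file
example = {
    'type': 'FeatureCollection',
    'features': [{
        'type': 'Feature',
        'properties': {'name': 'example area', 'code': '00000'},
        'geometry': {'type': 'Point', 'coordinates': [-75.5, 39.0]},
    }],
}
print(feature_property_names(example))

# With the downloaded file it would be used like this:
# with open('./data/de_zcta.geojson') as fh:
#     print(feature_property_names(json.load(fh)))
```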


## 4. Load the restaurant data

In [87]:
# Read the restaurant violation data into a pandas dataframe
violations_df = pd.read_csv('./data/restaurant_inspection_violations.csv')

In [88]:
# Look at the first 3 rows of the loaded data
violations_df.head(3)

Unnamed: 0,Food Establishment Name,Food Establishment Street Address,Food Establishment City,Food Establishment Zip Code,Inspection Date,Inspection Type,Violation 1,Violation 1 Description,Violation 2,Violation 2 Description,...,Violation 9 Description,Violation 10,Violation 10 Description,Violation 11,Violation 11 Description,Violation 12,Violation 12 Desription,Violation 13,Violation 13 Description,Geocoded Location
0,A'LATTE SOUL,1053 B WALNUT STREET,MILFORD,19963,01/19/2016,Routine,,,,,...,,,,,,,,,,"1053 B WALNUT STREET\nMILFORD, DE 19963\n"
1,WAL-MART STORE 1736-99,4898 N. DUPONT HWY.,CHESWOLD,19936,08/05/2016,Followup,,,,,...,,,,,,,,,,"4898 N. DUPONT HWY.\nCHESWOLD, DE 19936\n"
2,CHINA TASTE,28263 LEXUS DRIVE UNIT 2,MILFORD,19963,08/08/2016,Routine,6-501_111 Controlling Pests Pf ( C),Violation - Corrected on Site,,,...,,,,,,,,,,"28263 LEXUS DRIVE UNIT 2\nMILFORD, DE 19963\n"


## 5. Prepare the data

This step can often be skipped if the CSV already contains data in the correct format.

In this case we will use the powerful [Python Pandas](http://pandas.pydata.org/) library to manipulate the data.

This could be done just as easily with Excel if you are more comfortable with it.

We need two columns: one with the `zip code` and the other with the `violation count`.

In [89]:
# Add a new year column by extracting year from inspection date
violations_df['Inspection Year'] = violations_df['Inspection Date'].map(lambda x: dateutil.parser.parse(x).year)
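
pandas can also parse the dates itself, which avoids calling the parser once per row from Python. A sketch, assuming every value follows the `MM/DD/YYYY` pattern seen in the rows printed above; the small frame here is made-up sample data.

```python
import pandas as pd

# Made-up sample rows in the same date format as the csv
sample_df = pd.DataFrame({'Inspection Date': ['01/19/2016', '08/05/2016', '12/30/2015']})

# Parse the whole column at once, then take the year of each date
parsed = pd.to_datetime(sample_df['Inspection Date'], format='%m/%d/%Y')
sample_df['Inspection Year'] = parsed.dt.year

print(sample_df['Inspection Year'].tolist())
```

With an explicit `format`, a value that does not match raises an error by default, which makes malformed dates visible early.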

In [90]:
# Just grab 2016 data
violations_2016_df = violations_df[violations_df['Inspection Year'] == 2016]

In [91]:
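# Count the rows for each zip code.
# Note: each row of the csv is one inspection record, and some rows (like row 0 above)
#       list no violations, so this counts inspection records per zip code.
violations_by_zipcode = violations_2016_df['Food Establishment Zip Code'].value_counts()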
violations_by_zipcode = violations_by_zipcode.to_frame()
violations_by_zipcode = violations_by_zipcode.reset_index()
violations_by_zipcode.columns = ['zip code', 'violation count']

# We can save the transformed data to a csv
violations_by_zipcode.to_csv('./de_restaurant_violation_by_zipcode.csv')
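
If you would prefer to count the individual violations listed on each row, one option is to count the non-empty `Violation N` columns per row and sum them by zip code. This is a sketch of an alternative, not what the cell above does; it assumes the code columns are named `Violation 1` to `Violation 13` as in the printed header, and `violations_per_zip` is a hypothetical helper.

```python
import pandas as pd


def violations_per_zip(df, zip_col='Food Establishment Zip Code', n_slots=13):
    """Sum the filled 'Violation N' columns of every row, grouped by zip code."""
    wanted = ['Violation {}'.format(i) for i in range(1, n_slots + 1)]
    code_cols = [col for col in wanted if col in df.columns]
    per_row = df[code_cols].notna().sum(axis=1)      # violations on each row
    out = per_row.groupby(df[zip_col]).sum().reset_index()
    out.columns = ['zip code', 'violation count']
    return out.sort_values('violation count', ascending=False, ignore_index=True)


# Made-up sample: one clean inspection and two inspections with violations
sample = pd.DataFrame({
    'Food Establishment Zip Code': [19963, 19936, 19963],
    'Violation 1': [None, 'code a', 'code b'],
    'Violation 2': [None, None, 'code c'],
})
print(violations_per_zip(sample))
```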

In [92]:
# Look at the first 3 rows of the transformed data
violations_by_zipcode.head(3)

Unnamed: 0,zip code,violation count
0,19720,431
1,19971,419
2,19801,415


## 6. Creating the `choropleth` map

https://en.wikipedia.org/wiki/Choropleth_map

> A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per-capita income.

The [Folium](http://python-visualization.github.io/folium/) package will let us create a choropleth to visualise the violation counts for each zip code.
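
To make the idea concrete, the shading boils down to sorting each area's value into a class and giving each class a colour. The sketch below illustrates that idea only; it is not how Folium is implemented, and the class boundaries are made up. The hex values are intended to be the 6-class `YlOrRd` ColorBrewer palette.

```python
from bisect import bisect_right

YLORRD_6 = ['#ffffb2', '#fed976', '#feb24c', '#fd8d3c', '#f03b20', '#bd0026']
THRESHOLDS = [50, 100, 200, 300, 400]  # hypothetical class boundaries


def colour_for(value):
    """Pick the palette colour for the class that value falls into."""
    return YLORRD_6[bisect_right(THRESHOLDS, value)]


for count in [12, 150, 431]:
    print(count, colour_for(count))
```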

In [93]:
# Create a map centered and zoomed into Delaware
violations_map = folium.Map(location=[39, -75], zoom_start=8)
    
# Add the choropleth layer to the map
violations_map.choropleth(
    geo_path='./data/de_zcta.geojson',
    data=violations_by_zipcode,
    columns=['zip code', 'violation count'],   # These are the names of the columns of our data
    key_on='feature.properties.STATEFP10',     # This must name the GeoJSON property that holds the zip code.
                                               # In Census TIGER naming STATEFP10 is the state FIPS code and
                                               # ZCTA5CE10 is the 5-digit zip code area, so open the file and
                                               # check its properties if the shading looks wrong.
    fill_color='YlOrRd',                       # This is a brewer color (https://bl.ocks.org/mbostock/5577023)
    highlight=True,
    legend_name='Restaurant Inspection Violations 2016'
)
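
An area can only be shaded from the data if its key appears in the table, so it is worth checking how many zip codes in the table match a value in the GeoJSON. A standard-library sketch: `match_report` and the property name `code` are hypothetical, and the values are compared as strings because GeoJSON properties often store codes as text while pandas reads them as integers.

```python
def match_report(geojson, data_keys, property_name):
    """Compare data keys with one GeoJSON property, both taken as strings."""
    shape_keys = {
        str(feature['properties'][property_name])
        for feature in geojson['features']
    }
    data_keys = {str(key) for key in data_keys}
    return {
        'matched': sorted(shape_keys & data_keys),
        'only_in_data': sorted(data_keys - shape_keys),
        'only_in_shapes': sorted(shape_keys - data_keys),
    }


# Made-up shapes and keys standing in for the real file and table
shapes = {'features': [
    {'properties': {'code': '19720'}},
    {'properties': {'code': '19801'}},
]}
report = match_report(shapes, [19720, 19971], 'code')
print(report)
```

With the real data, the second argument would be `violations_by_zipcode['zip code']` and the third the property name chosen for `key_on`.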

In [94]:
# To view the map in the notebook just type the variable name of the map.
violations_map

## 7. Save the map

We can save the map to an HTML document.

To use this in our own website we can open up this file and copy the needed code.

In [59]:
violations_map.save('./data/restaurant_violations.html')

In [60]:
# Open the saved HTML map in a new tab
import webbrowser
webbrowser.open_new_tab('./data/restaurant_violations.html')

True
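
Another way to use the saved map in a website, without copying code out of the file, is to embed it in a page with an `<iframe>`. A sketch, assuming the map file sits next to the page; the function name and the `index.html` output file are made up for illustration.

```python
import html


def iframe_page(map_file, title, height=600):
    """Build a small HTML page that embeds map_file in an iframe."""
    template = (
        '<!DOCTYPE html>\n'
        '<html>\n'
        '<head><meta charset="utf-8"><title>{title}</title></head>\n'
        '<body>\n'
        '<h1>{title}</h1>\n'
        '<iframe src="{src}" width="100%" height="{height}"></iframe>\n'
        '</body>\n'
        '</html>\n'
    )
    return template.format(
        title=html.escape(title),
        src=html.escape(map_file, quote=True),
        height=int(height),
    )


page = iframe_page('restaurant_violations.html', 'Delaware Restaurant Violations 2016')
print(page)

# To write it next to the saved map:
# with open('./data/index.html', 'w') as fh:
#     fh.write(page)
```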
