# Playing around with Pittsburgh 311 Data


This notebook explores the [311 Data](https://data.wprdc.org/dataset/311-data) from the [Western Pennsylvania Regional Data Center](http://www.wprdc.org/).


I have taken the liberty of downloading the 311 data into the `data` directory as `pgh-311.csv`.


In [None]:
# use the %ls magic to list the files in the current directory.
%ls

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
three11s = pd.read_csv("data/pgh-311.csv", parse_dates=['CREATED_ON'])

In [None]:
three11s.dtypes

In [None]:
three11s.head()

In [None]:
three11s.loc[0]

## Embedded Plots

In [None]:
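# Plot the number of 311 requests per month

# Group the requests by the month in which they were created
month_counts = three11s.groupby(three11s.CREATED_ON.dt.month)

# y: number of requests in each month group
y = month_counts.size()
# x: timestamp of the first request in each group, used as the x-axis position
x = month_counts.CREATED_ON.first()

axes = pd.Series(y.values, index=x).plot(figsize=(15,5))

plt.ylim(0)
plt.xlabel('Month')
plt.ylabel('Number of requests')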
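
One thing to watch for: grouping on `dt.month` alone pools the same month from different years into one bucket. If the data span more than one year, grouping on a year-month period keeps each calendar month separate. The sketch below uses a small made-up frame (`toy`, not the real `three11s`) to show the difference.

```python
import pandas as pd

# Toy stand-in for the 311 data; the dates are made up for illustration.
toy = pd.DataFrame({
    "CREATED_ON": pd.to_datetime([
        "2015-05-03", "2015-05-20", "2015-06-01",
        "2016-05-11", "2016-06-07", "2016-06-30",
    ])
})

# Grouping by month number pools May 2015 with May 2016.
by_month_only = toy.groupby(toy.CREATED_ON.dt.month).size()

# Grouping by a year-month period keeps each calendar month separate.
by_year_month = toy.groupby(toy.CREATED_ON.dt.to_period("M")).size()

print(by_month_only)
print(by_year_month)
```

With the real data, the same idea would apply to `three11s.CREATED_ON`, assuming that column was parsed as dates as in the `read_csv` call above.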


## Exploring Request Types

In [None]:
grouped_by_type = three11s.groupby(three11s.REQUEST_TYPE)

size = grouped_by_type.size()
size
#len(size)
#size[size > 200]
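
The commented-out lines above hint at two follow-up questions: how many distinct request types are there, and which ones occur often? A minimal sketch on made-up data (the request type names and the threshold of 2 are illustrative, not taken from the real file):

```python
import pandas as pd

# Made-up request types standing in for the REQUEST_TYPE column.
toy = pd.DataFrame({
    "REQUEST_TYPE": [
        "Potholes", "Potholes", "Weeds/Debris",
        "Potholes", "Snow/Ice removal", "Weeds/Debris",
    ]
})

# value_counts() counts rows per type and sorts from most to least common.
counts = toy.REQUEST_TYPE.value_counts()

# Number of distinct request types.
n_types = counts.size

# Keep only the types that appear at least twice (illustrative threshold).
frequent = counts[counts >= 2]

print(n_types)
print(frequent)
```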


There are too many request types (268) to visualize directly. We need some higher-level categories to make this more comprehensible. Fortunately, there is an [Issue and Category codebook](https://data.wprdc.org/dataset/311-data/resource/40ddfbed-f225-4320-b4d2-7f1e09da72a4) that we can use to map each low-level request type to a higher-level category.

In [None]:
codebook = pd.read_csv('data/codebook.csv')
codebook.head()

In [None]:
merged_data = pd.merge(three11s, 
                       codebook[['Category', 'Issue']], 
                       how='left',
                       left_on="REQUEST_TYPE", 
                       right_on="Issue")

In [None]:
merged_data.head()
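
A left merge keeps every request, so any request type missing from the codebook ends up with an empty `Category`. The sketch below shows one way to spot those rows using the `indicator` option of `pd.merge`. The frames and category names here are made up for illustration, and `validate="many_to_one"` is an extra safeguard that assumes each issue appears only once in the codebook.

```python
import pandas as pd

# Made-up requests; "Unlisted Type" has no codebook entry on purpose.
requests = pd.DataFrame({
    "REQUEST_TYPE": ["Potholes", "Weeds/Debris", "Unlisted Type"]
})

# Made-up codebook with the same column names as the real one.
toy_codebook = pd.DataFrame({
    "Category": ["Road/Street Issues", "Neighborhood Issues"],
    "Issue": ["Potholes", "Weeds/Debris"],
})

merged = pd.merge(requests,
                  toy_codebook[["Category", "Issue"]],
                  how="left",
                  left_on="REQUEST_TYPE",
                  right_on="Issue",
                  indicator=True,           # adds a _merge column
                  validate="many_to_one")   # raises if an Issue is duplicated

# Rows marked left_only found no matching Issue in the codebook.
unmatched = merged.loc[merged["_merge"] == "left_only", "REQUEST_TYPE"]
print(unmatched.tolist())
```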

In [None]:
grouped_by_type = merged_data.groupby(merged_data.Category)
size = grouped_by_type.size()
size

That is a more manageable list of categories for data visualization. Let's take a look at the distribution of requests per category in the dataset.

In [None]:
size.plot(kind='barh', figsize=(8,6))

## Looking at requests at the neighborhood level


Thankfully, the 311 data from the WPRDC already includes neighborhood information for each request in the NEIGHBORHOOD column. We can take advantage of this to filter and count requests by neighborhood.

In [None]:
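# Count requests per neighborhood, largest first.
# Series.sort() was removed from pandas; sort_values() is its replacement.
merged_data.groupby(merged_data.NEIGHBORHOOD).size().sort_values(ascending=False)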

Here are the same counts in graph form:

In [None]:
# Sort ascending so the largest bar ends up at the top of the horizontal bar chart.
merged_data.groupby(merged_data.NEIGHBORHOOD).size().sort_values(ascending=True).plot(kind="barh", figsize=(5,20))

The graph above shows that Brookline makes the most 311 requests, followed by the South Side Slopes, Carrick, and South Side Flats. It would be interesting to get some neighborhood population data and compute the number of requests per capita.

I bet those data are available, **maybe YOU could create that graph!**
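
As a starting point, here is a minimal sketch of the per capita calculation. The neighborhood names and population figures are made up; real numbers would have to come from an actual population dataset whose neighborhood names match the NEIGHBORHOOD column.

```python
import pandas as pd

# Made-up requests: one row per request, as in the 311 data.
toy_requests = pd.DataFrame({
    "NEIGHBORHOOD": ["Hood A", "Hood A", "Hood A", "Hood B", "Hood B", "Hood C"]
})

# Made-up population figures, indexed by neighborhood name.
toy_population = pd.Series({"Hood A": 1500, "Hood B": 500, "Hood C": 2000},
                           name="population")

# Requests per neighborhood.
request_counts = toy_requests.groupby("NEIGHBORHOOD").size()

# Division aligns on the neighborhood index; scale to requests per 1,000 residents.
per_1000 = (request_counts / toy_population * 1000).sort_values(ascending=False)
print(per_1000)
```

Neighborhoods that appear in only one of the two series would come out as `NaN`, which is a quick way to find naming mismatches between the datasets.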

## Widgets

Jupyter Notebooks have a very powerful [widget](https://github.com/ipython/ipywidgets) framework that allows you to easily add interactive components to live notebooks. 

In [None]:
# create a function that charts request categories for one neighborhood
def issues_by_neighborhood(neighborhood):
    """Generates a plot of issue categories for the given neighborhood"""
    # Keep only the rows for this neighborhood, then count requests per category
    in_neighborhood = merged_data[merged_data['NEIGHBORHOOD'] == neighborhood]
    size = in_neighborhood.groupby('Category').size()
    size.plot(kind='barh', figsize=(8,6))

In [None]:
issues_by_neighborhood('Greenfield')

In [None]:
issues_by_neighborhood('Brookline')

In [None]:
issues_by_neighborhood('Garfield')

In [None]:
from ipywidgets import interact

# Passing a list to interact builds a dropdown of neighborhood names;
# missing values are dropped before sorting.
@interact(hood=sorted(three11s.NEIGHBORHOOD.dropna().unique()))
def issues_by_neighborhood(hood):
    """Generates a plot of issue categories for the selected neighborhood"""
    in_neighborhood = merged_data[merged_data['NEIGHBORHOOD'] == hood]
    size = in_neighborhood.groupby('Category').size()
    size.plot(kind='barh', figsize=(8,6))