Reshaping DataFrames
---

### Grouping Data

Let's work with some real data from Pittsburgh in this example. I got this data from the [Western Pennsylvania Regional Data Center](http://www.wprdc.org/). First, we should get an idea of the shape of the data:

In [None]:
import pandas as pd

df = pd.read_csv("311.csv")
df.head()

This data was collected by the city of Pittsburgh from 311 calls. We are going to use the `groupby` functionality to extract some information from this data.

I want you to extract some data for your neighborhood. First we will create the groupby object for the column `"NEIGHBORHOOD"`.

In [None]:
neighborhood = df.groupby(by="NEIGHBORHOOD")

- To get the groups, you can use the `groups` attribute.
- We can determine the number of 311 calls in each group by using the `count` method on the grouped `DataFrame`, which counts the non-null entries in each column (I use `head` below to reduce the amount of output)

In [None]:
neighborhood.count().head()
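Here is a minimal sketch of what `groups`, `count`, and `size` return. The `toy` frame and its values are made up for illustration and are not taken from the 311 data; only the column names mirror the ones used above.

```python
import pandas as pd

# Hypothetical toy data standing in for the 311 table
toy = pd.DataFrame(
    {
        "NEIGHBORHOOD": ["Oakland", "Oakland", "Shadyside", "Oakland"],
        "REQUEST_TYPE": ["Potholes", "Weeds/Debris", "Potholes", None],
    }
)
grouped = toy.groupby(by="NEIGHBORHOOD")

# groups maps each group label to the row labels that belong to it
group_labels = sorted(grouped.groups.keys())

# size() counts rows per group; count() counts non-null values per column
rows_per_group = grouped.size()
non_null_per_group = grouped.count()

print(group_labels)
print(rows_per_group)
print(non_null_per_group)
```

Note that `count` can report different numbers for different columns when some entries are missing, while `size` always reports the number of rows in each group.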

### Tasks

1. Select one of the columns from the grouped `DataFrame` and print the counts for all neighborhoods
2. Did your neighborhood make the list?
3. Which neighborhood has the most 311 calls?

For the neighborhood with the most 311 calls, let's group again by `"REQUEST_TYPE"`.

To get a single group from a groupby object you can use the `get_group` method, for example:

In [None]:
neighborhood.get_group("Allegheny Center")
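The result of `get_group` is an ordinary `DataFrame`, so it can be grouped again. A small sketch on made-up data (the `toy` frame below is hypothetical, not the 311 data):

```python
import pandas as pd

# Hypothetical toy data standing in for the 311 table
toy = pd.DataFrame(
    {
        "NEIGHBORHOOD": ["Oakland", "Oakland", "Shadyside", "Oakland"],
        "REQUEST_TYPE": ["Potholes", "Weeds/Debris", "Potholes", "Potholes"],
    }
)
by_neighborhood = toy.groupby(by="NEIGHBORHOOD")

# get_group returns a regular DataFrame holding only that group's rows
oakland = by_neighborhood.get_group("Oakland")

# Because it is a DataFrame, it can be grouped again on another column
oakland_by_type = oakland.groupby(by="REQUEST_TYPE").size()
print(oakland_by_type)
```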

### Tasks

1. Using the `get_group` and `groupby` methods, narrow the `neighborhood` groupby object down to the neighborhood with the most 311 calls and determine how many different types of requests were made

- If we wanted to see all 311 calls for a particular neighborhood and request type, we could simply make a groupby object for both columns!

In [None]:
requests_by_neighborhood = df.groupby(by=["NEIGHBORHOOD", "REQUEST_TYPE"])
requests_by_neighborhood.get_group(("Allegheny Center", "Potholes"))
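When grouping by two columns, each group is identified by a tuple with one value per column. The sketch below shows this on a hypothetical `toy` frame with made-up values:

```python
import pandas as pd

# Hypothetical toy data standing in for the 311 table
toy = pd.DataFrame(
    {
        "NEIGHBORHOOD": ["Oakland", "Oakland", "Shadyside", "Oakland"],
        "REQUEST_TYPE": ["Potholes", "Weeds/Debris", "Potholes", "Potholes"],
    }
)
by_pair = toy.groupby(by=["NEIGHBORHOOD", "REQUEST_TYPE"])

# size() is indexed by (NEIGHBORHOOD, REQUEST_TYPE) pairs
pair_counts = by_pair.size()

# The key passed to get_group is a tuple in the same column order
oakland_potholes = by_pair.get_group(("Oakland", "Potholes"))

print(pair_counts)
print(oakland_potholes)
```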

- Grouping is very useful when you want to aggregate based on duplicate entries
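As a rough illustration of aggregating over duplicate entries, the sketch below collapses rows that share the same neighborhood and request type into a single row with a call count. The `toy` frame, the `REQUEST_ID` column, and the `n_calls` name are assumptions made for this example.

```python
import pandas as pd

# Hypothetical toy data; REQUEST_ID is an assumed identifier column
toy = pd.DataFrame(
    {
        "NEIGHBORHOOD": ["Oakland", "Oakland", "Shadyside", "Oakland"],
        "REQUEST_TYPE": ["Potholes", "Potholes", "Potholes", "Weeds/Debris"],
        "REQUEST_ID": [101, 102, 103, 104],
    }
)

# One output row per distinct (NEIGHBORHOOD, REQUEST_TYPE) pair,
# with the number of original rows that shared that pair
summary = (
    toy.groupby(by=["NEIGHBORHOOD", "REQUEST_TYPE"])
    .agg(n_calls=("REQUEST_ID", "count"))
    .reset_index()
)
print(summary)
```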

### Pivoting

- We can use pivoting to change the shape of our data. For example, suppose we wanted the police zones as our columns and the neighborhoods as our values:

In [None]:
police_zones = df.pivot(values="NEIGHBORHOOD", columns="POLICE_ZONE")
police_zones.head()
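To see what this kind of pivot does to the shape, here is a sketch on a three-row hypothetical frame (the zone assignments are made up). Each distinct value of `POLICE_ZONE` becomes a column, the original row index is kept, and every row has a value only in the column for its own zone.

```python
import pandas as pd

# Hypothetical toy data; the zone numbers are invented for illustration
toy = pd.DataFrame(
    {
        "NEIGHBORHOOD": ["Oakland", "Shadyside", "Oakland"],
        "POLICE_ZONE": [4.0, 4.0, 6.0],
    }
)

# No index is given, so the existing row index (0, 1, 2) is reused
wide = toy.pivot(columns="POLICE_ZONE", values="NEIGHBORHOOD")
print(wide)
```

Row 2 has `"Oakland"` under column `6.0` and a missing value under `4.0`, so the result is mostly empty cells with one entry per row.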

- Now we have a new `DataFrame` with a few columns: `nan`, `1.0`, `2.0`, `3.0`, `4.0`, `5.0`, and `6.0`
- My guess is that the `nan` column exists because there are cases where the police zone is not specified. Let's remove it:

In [None]:
# Drop the first column (the `nan` one) by position and keep the rest
police_zones = police_zones.iloc[:, 1:]
police_zones.head()
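Slicing by position assumes the `nan` column comes first. If you are not sure of the column order, one alternative is to select columns by whether their label is missing. The `wide` frame below is a hand-built, hypothetical stand-in for the pivoted data.

```python
import numpy as np
import pandas as pd

# Hypothetical pivot-like frame with a NaN column label
wide = pd.DataFrame(
    [["Unknown area", "Oakland", None], [None, None, "Shadyside"]],
    columns=[np.nan, 1.0, 2.0],
)

# Keep only the columns whose label is not missing, wherever they sit
labeled = wide.loc[:, wide.columns.notna()]
print(labeled)
```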

- For each column, let's get the unique entries:

In [None]:
for col in police_zones.columns:
    print(col)
    print(police_zones[col].unique())
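Because most cells in the pivoted frame are empty, `unique` will also report the missing value for each column. A small sketch of dropping it first, again on a made-up `wide` frame rather than the real data:

```python
import pandas as pd

# Hypothetical pivot-like frame: one entry per row, missing elsewhere
wide = pd.DataFrame(
    {
        1.0: ["Oakland", None, "Oakland"],
        2.0: [None, "Shadyside", None],
    }
)

# Drop missing values before collecting the distinct names per column
zones = {col: sorted(wide[col].dropna().unique()) for col in wide.columns}

for col, names in zones.items():
    print(col, names)
```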