<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Pandas Joins Lab

_Instructor: Aymeric Flaisler_
___


Very often we need to transform our data before we can merge it. For example, if we want to provide information on events in a given area, we might have a dataset that looks like the following:

| Block | Event   |
|------|------|
|   1  | Block Party|
|   2  | Block Party|
|   1  | House Party|
|   1  | Open Bar|

In this example, there are multiple rows per Block (3 rows for Block 1 and 1 row for Block 2). If we joined this table to another table on the Block key, we'd be doing what's known as a one-to-many join. We will revisit that topic later.
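
Here is a minimal sketch of that one-to-many join. It merges the toy events table above with a hypothetical `blocks` table that has one row per Block. The `Neighborhood` column and its values are invented for illustration. Every Block 1 event picks up the same block-level value:

```python
import pandas as pd

# The toy events table from above: several rows share the same Block key.
events = pd.DataFrame({
    "Block": [1, 2, 1, 1],
    "Event": ["Block Party", "Block Party", "House Party", "Open Bar"],
})

# Hypothetical block-level table with exactly one row per Block (values invented).
blocks = pd.DataFrame({
    "Block": [1, 2],
    "Neighborhood": ["North", "South"],
})

# Each events row is matched to its single block row, so Block 1's
# neighborhood is repeated on all three Block 1 events.
joined = events.merge(blocks, on="Block", how="left", validate="many_to_one")
print(joined)
```

Passing `validate="many_to_one"` is optional. It makes pandas raise a `MergeError` if the right-hand table unexpectedly has duplicate keys, which would otherwise silently multiply rows.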

Another option is to apply an aggregate function (a count, a mean, a median, etc.) so that our dataset has only one row per key. If we counted the number of events per Block in our toy dataset above, it would look like:

| Block | count(Event)   |
|------|------|
|   1  | 3|
|   2  | 1|
    
This sort of groupby aggregation lets us join a larger dataset to a smaller one, provided we can summarize the larger dataset with an aggregate function that leaves one row per key.
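
Here is a minimal sketch of that aggregation on the same toy data. It uses `groupby` with `size()` to count rows per Block, then `reset_index` to turn the result back into a two-column table like the one above:

```python
import pandas as pd

# The toy events table from above: Block 1 appears three times, Block 2 once.
events = pd.DataFrame({
    "Block": [1, 2, 1, 1],
    "Event": ["Block Party", "Block Party", "House Party", "Open Bar"],
})

# size() counts the rows in each Block group; reset_index turns the
# resulting Series back into a DataFrame with one row per Block.
event_counts = (
    events.groupby("Block")
    .size()
    .reset_index(name="count(Event)")
)
print(event_counts)  # Block 1 has 3 events, Block 2 has 1
```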

# Your Mission, Should You Choose to Accept it

In this lab, you will take the role of an enterprising researcher, making use of the numerous free datasets available at the [City of Chicago Data Portal](https://data.cityofchicago.org/). You have a hunch that different types of reporting to 311, the City's information line, might be correlated with demographic characteristics of the 77 [community areas of Chicago](https://en.wikipedia.org/wiki/Community_areas_in_Chicago). You have downloaded some of this data in the following forms:

- **2008-2012-chi-census.csv** - A few selected Census outcomes from 2008-2012, aggregated by the Community Area
- **chicago_311_abandoned_vehicles.csv** - Calls to 311 for abandoned vehicles in 2008-2012
- **chicago_311_graffiti.csv** - Calls to 311 for graffiti removal in 2008-2012
- **chicago_311_vacant_abandoned_building.csv** - Calls to 311 about vacant or abandoned buildings in 2008-2012

Firing up your trusty laptop with Python, NumPy, and Pandas, you get to work. One way to join two of the datasets together, you realize, is with the following code:

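```Python
import pandas as pd

census_data = pd.read_csv('2008-2012-chi-census.csv')
abandoned_vehicles = pd.read_csv('chicago_311_abandoned_vehicles.csv')

# Count per Community Area (the area numbers become the index), then join
# that index to the census table's 'Community Area Number' column.
census_data.merge(abandoned_vehicles.groupby('Community Area').count(),
                  left_on='Community Area Number',
                  right_index=True, how='inner')
```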

**Note:** We're doing a couple of things here that we haven't done before!

1. If our keys are named differently in each dataset, we can identify them with **left_on** and **right_on**, which point to the key columns in the left and right datasets respectively.
2. We are grouping **abandoned_vehicles** by **'Community Area'** and calling **.count()**. This returns, for every column, the number of non-null values in each group (use **.size()** if you just want the number of rows per group). For the rest of this exercise, feel free to use that construction. If you're interested in learning more, [df.groupby](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html) contains the documentation for the **groupby** method in Pandas.
3. We can also merge on the index of a dataframe, using **left_index=True** (to join on the left dataset's index) or **right_index=True** (to join on the right dataset's index). Here the **groupby** result is indexed by **'Community Area'**, which is why we use **right_index=True**.
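
The merge options above can be tried on tiny, hypothetical stand-ins for the two tables. Only the key column names (`Community Area Number`, `Community Area`) come from the real files. All values are made up, including one call with no area recorded:

```python
import pandas as pd

# Hypothetical stand-ins: only the key column names match the real files.
toy_census = pd.DataFrame({
    "Community Area Number": [1.0, 2.0, 3.0],
    "Community Area Name": ["Rogers Park", "West Ridge", "Uptown"],
})
toy_311 = pd.DataFrame({
    "Community Area": [1.0, 1.0, 3.0, None],  # one call has no area recorded
})

# Calls per area; groupby puts the area numbers in the index and skips NaN keys.
counts = toy_311.groupby("Community Area").size().rename("calls").to_frame()

# (1) Keys in columns with different names: left_on / right_on.
by_columns = toy_census.merge(counts.reset_index(),
                              left_on="Community Area Number",
                              right_on="Community Area", how="inner")

# (3) Key in the right frame's index: left_on / right_index.
by_index = toy_census.merge(counts, left_on="Community Area Number",
                            right_index=True, how="inner")
print(by_index[["Community Area Name", "calls"]])
```

With `how='inner'`, West Ridge (area 2) drops out because it has no calls in the toy data. `how='left'` would keep it with a missing count.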

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline

### 1. Load each CSV into Python and take a few minutes to explore each one. What sort of data does each provide?

In [2]:
# A:
census = pd.read_csv('./datasets/2008-2012-chi-census.csv')
vehicles = pd.read_csv('./datasets/chicago_311_abandoned_vehicles.csv')
graffiti = pd.read_csv('./datasets/chicago_311_graffiti.csv')
buildings = pd.read_csv('./datasets/chicago_311_vacant_abandoned_building.csv')

In [3]:
census.head()

| Unnamed: 0 | Community Area Number | Community Area Name | Percent Housing Crowded | Percent Households Below Poverty | Percent Aged 16+ Unemployed | Percent Aged 25+ Without HS Diploma | Percent Aged Under 18 or Over 64 | Per Capita Income | Hardship Index |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | Rogers Park | 7.7 | 23.6 | 8.7 | 18.2 | 27.5 | 23939 | 39.0 |
| 1 | 2.0 | West Ridge | 7.8 | 17.2 | 8.8 | 20.8 | 38.5 | 23040 | 46.0 |
| 2 | 3.0 | Uptown | 3.8 | 24.0 | 8.9 | 11.8 | 22.2 | 35787 | 20.0 |
| 3 | 4.0 | Lincoln Square | 3.4 | 10.9 | 8.2 | 13.4 | 25.5 | 37524 | 17.0 |
| 4 | 5.0 | North Center | 0.3 | 7.5 | 5.2 | 4.5 | 26.2 | 57123 | 6.0 |


In [4]:
vehicles.head()

| Unnamed: 0 | Creation Date | Status | Completion Date | Service Request Number | Type of Service Request | License Plate | Vehicle Make/Model | Vehicle Color | Current Activity | Most Recent Action | ... | ZIP Code | X Coordinate | Y Coordinate | Ward | Police District | Community Area | SSA | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11/06/2008 | Completed | 04/14/2011 | 08-02247245 | Abandoned Vehicle Complaint | ECV9366 | General Motors Corp. | Maroon |  |  | ... | 60651.0 | 1146868.0 | 1905922.0 | 37.0 | 11.0 | 23.0 |  | 41.897894 | -87.735872 | (41.89789380014405, -87.73587218370734) |
| 1 | 02/03/2009 | Completed - Dup | 06/20/2012 | 09-00193526 | Abandoned Vehicle Complaint | 43930H | Ford | White | FVI - Outcome | Create Work Order | ... | 60639.0 | 1143057.0 | 1913138.0 | 31.0 | 25.0 | 19.0 |  | 41.918219 | -87.749987 | (41.91821929083758, -87.74998656996482) |
| 2 | 03/19/2009 | Completed | 03/22/2011 | 09-00478279 | Abandoned Vehicle Complaint | X397109 | Ford | White | FVI - Outcome | Vehicle was moved from original address requested | ... | 60628.0 |  |  | 8.0 | 5.0 | 50.0 | 51.0 | 41.72106 | -87.595821 | (41.72105953472156, -87.59582056221775) |
| 3 | 05/13/2009 | Completed | 01/26/2011 | 09-00806953 | Abandoned Vehicle Complaint |  |  |  |  |  | ... | 60639.0 | 1138769.0 | 1911409.0 | 29.0 | 25.0 | 25.0 |  | 41.913093 | -87.765781 | (41.91309264271512, -87.76578124615286) |
| 4 | 05/13/2009 | Completed | 01/26/2011 | 09-00806954 | Abandoned Vehicle Complaint |  |  |  |  |  | ... | 60623.0 |  |  | 24.0 | 10.0 | 29.0 |  | 41.861539 | -87.715425 | (41.861538991418534, -87.71542483156753) |


In [5]:
graffiti.head()

| Unnamed: 0 | Creation Date | Status | Completion Date | Service Request Number | Type of Service Request | What Type of Surface is the Graffiti on? | Where is the Graffiti located? | Street Address | ZIP Code | X Coordinate | Y Coordinate | Ward | Police District | Community Area | SSA | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 04/08/2008 | Completed | 05/25/2014 | 08-00601980 | Graffiti Removal | Wood - Painted | Front | 249 W CERMAK RD | 60616.0 | 1174940.0 | 1889749.0 | 25.0 | 21.0 | 34.0 |  | 41.852717 | -87.633935 | (41.85271673337672, -87.6339345627447) |
| 1 | 04/08/2008 | Completed - Dup | 10/22/2014 | 08-00601980 | Graffiti Removal | Wood - Painted | Front | 249 W CERMAK RD | 60616.0 | 1174940.0 | 1889749.0 | 25.0 | 21.0 | 34.0 |  | 41.852717 | -87.633935 | (41.85271673337672, -87.6339345627447) |
| 2 | 04/17/2009 | Completed | 09/18/2015 | 09-00652518 | Graffiti Removal | Metal | Pole | 50 W WACKER DR | 60601.0 | 1175891.0 | 1902174.0 | 42.0 | 1.0 | 32.0 |  | 41.887042 | -87.62987 | (41.887041950950504, -87.62987031372374) |
| 3 | 04/17/2009 | Completed | 09/21/2015 | 09-00652518 | Graffiti Removal | Metal - Painted | Pole | 50 W WACKER DR | 60601.0 | 1175891.0 | 1902174.0 | 42.0 | 1.0 | 32.0 |  | 41.887042 | -87.62987 | (41.887041950950504, -87.62987031372374) |
| 4 | 06/16/2009 | Completed | 05/18/2016 | 09-01018187 | Graffiti Removal | Brick - Unpainted | Rear | 1803 W MONTROSE AVE | 60613.0 | 1163432.0 | 1929256.0 | 47.0 | 19.0 | 5.0 | 31.0 | 41.961407 | -87.674628 | (41.96140670845814, -87.67462801002456) |


In [6]:
buildings.head()

| Unnamed: 0 | SERVICE REQUEST TYPE | SERVICE REQUEST NUMBER | DATE SERVICE REQUEST WAS RECEIVED | LOCATION OF BUILDING ON THE LOT (IF GARAGE, CHANGE TYPE CODE TO BGD). | IS THE BUILDING DANGEROUS OR HAZARDOUS? | IS BUILDING OPEN OR BOARDED? | IF THE BUILDING IS OPEN, WHERE IS THE ENTRY POINT? | IS THE BUILDING CURRENTLY VACANT OR OCCUPIED? | IS THE BUILDING VACANT DUE TO FIRE? | ANY PEOPLE USING PROPERTY? (HOMELESS, CHILDEN, GANGS) | ... | ADDRESS STREET SUFFIX | ZIP CODE | X COORDINATE | Y COORDINATE | Ward | Police District | Community Area | LATITUDE | LONGITUDE | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Vacant/Abandoned Building | 08-00109075 | 01/18/2008 |  |  |  |  |  |  |  | ... | ST | 60613.0 |  |  |  |  |  |  |  |  |
| 1 | Vacant/Abandoned Building | 08-00577896 | 04/03/2008 |  |  | Building is Open / Unsecure |  | Vacant |  |  | ... | ST | 60621.0 | 1170179.0 | 1858859.0 | 17.0 | 7.0 | 68.0 | 41.768198 | -87.651771 | (41.76819814695611, -87.65177097869127) |
| 2 | Vacant/Abandoned Building | 08-00588295 | 04/05/2008 |  |  | Building is Open / Unsecure | GARAGE, VAGRANTS BROKE INTO GARAGE AND USE IT ... | Vacant |  | True | ... | AVE | 60619.0 | 1182657.0 | 1850683.0 | 6.0 | 6.0 | 44.0 | 41.745482 | -87.606287 | (41.745482414802325, -87.60628681474407) |
| 3 | Vacant/Abandoned Building | 08-01476976 | 07/30/2008 |  |  | Building is Open / Unsecure | REAR | Vacant |  |  | ... | AVE | 60621.0 | 1174523.0 | 1857609.0 | 6.0 | 7.0 | 68.0 | 41.764674 | -87.635884 | (41.764673747551555, -87.63588403606937) |
| 4 | Vacant/Abandoned Building | 08-01559367 | 08/07/2008 |  |  | Building is Open / Unsecure | FRONT AND REAR | Vacant |  |  | ... | ST | 60636.0 | 1169023.0 | 1855703.0 | 17.0 | 7.0 | 67.0 | 41.759564 | -87.656096 | (41.75956423181548, -87.65609637199394) |


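Beyond `head()`, a few quick checks can show how much cleaning each 311 table needs. For example, the graffiti preview already contains a repeated request number marked "Completed - Dup". The helper below, `quick_profile`, is hypothetical, and the toy rows are loosely modeled on that preview (the row with a missing area is invented):

```python
import pandas as pd

def quick_profile(df, key_col, id_col):
    """Hypothetical helper: row count, missing join keys, and repeated request IDs."""
    return {
        "rows": len(df),
        "missing_key": int(df[key_col].isna().sum()),
        "duplicate_ids": int(df[id_col].duplicated().sum()),
    }

# Toy rows loosely based on the graffiti preview; the third row is invented.
toy_graffiti = pd.DataFrame({
    "Service Request Number": ["08-00601980", "08-00601980", "09-01018187"],
    "Status": ["Completed", "Completed - Dup", "Completed"],
    "Community Area": [34.0, 34.0, None],
})
profile = quick_profile(toy_graffiti, "Community Area", "Service Request Number")
print(profile)  # {'rows': 3, 'missing_key': 1, 'duplicate_ids': 1}
```

On the real data you would pass the matching column names. For example, the buildings file uses `'SERVICE REQUEST NUMBER'` in upper case. Whether duplicate or "Completed - Dup" requests should count toward the totals is a judgment call for your analysis.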
### 2. Join each of the 311 call datasets to the census data (using pandas' groupby function) to get a count of each type of call per Community Area

In [7]:
# A:
_ls_df = [census,
          vehicles,
          graffiti,
          buildings]
[_df.columns.values for _df in _ls_df]

In [None]:
# Apply the groupby function on the community area, for each type of service call


In [None]:
# Rename the columns


In [None]:
# Merge the data altogether


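One way to fill in the three cells above (group, rename, merge) is sketched below on hypothetical miniature tables. They are named `toy_*` so they don't overwrite the real dataframes, and only the key column names come from the real CSVs. The sketch uses `size()` so each call type yields a single count column. It also assumes that an area missing from a 311 table had zero calls of that type:

```python
import pandas as pd

# Hypothetical miniature tables; only the key column names mirror the real
# CSVs ("Community Area Number" in the census file, "Community Area" in the
# three 311 files). All values are invented.
toy_census = pd.DataFrame({
    "Community Area Number": [1.0, 2.0, 3.0],
    "Community Area Name": ["Rogers Park", "West Ridge", "Uptown"],
})
toy_calls = {
    "vehicles": pd.DataFrame({"Community Area": [1.0, 2.0, 2.0, None]}),
    "graffiti": pd.DataFrame({"Community Area": [3.0, 3.0, 1.0]}),
    "buildings": pd.DataFrame({"Community Area": [2.0]}),
}

merged = toy_census
for name, calls in toy_calls.items():
    # Group: one row count per Community Area (rows with no area are dropped).
    counts = calls.groupby("Community Area").size()
    # Rename: label the count with its call type so the columns don't clash.
    counts = counts.rename(f"{name}_calls").to_frame()
    # Merge: left join the census key column to the counts' index.
    merged = merged.merge(counts, left_on="Community Area Number",
                          right_index=True, how="left")

# Assumption: an area missing from a 311 table had zero calls of that type.
count_cols = [f"{name}_calls" for name in toy_calls]
merged[count_cols] = merged[count_cols].fillna(0).astype(int)
print(merged)
```

The left join keeps every community area from the census table even when a call type never occurs there, whereas an inner join would drop such areas. The real `vehicles`, `graffiti`, and `buildings` frames all name their key column `Community Area`, so they could be dropped into the dictionary unchanged.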
### 3. What sort of trends can you identify? How would you do so (via plotting, analysis, etc.)?


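One possible starting point is a correlation table between call counts and census measures. The sketch below uses an invented four-row table. Its census-style column names are modeled on the census file, and the count column names (`vehicles_calls`, `graffiti_calls`) are hypothetical. With real merged data, a scatter plot such as `df.plot.scatter(x='Hardship Index', y='vehicles_calls')` would be a natural companion:

```python
import pandas as pd

# Invented values for four hypothetical community areas: two census-style
# measures plus per-area call counts (the *_calls names are hypothetical).
toy_merged = pd.DataFrame({
    "Hardship Index": [10, 40, 70, 90],
    "Per Capita Income": [60000, 35000, 20000, 12000],
    "vehicles_calls": [5, 12, 20, 31],
    "graffiti_calls": [30, 22, 18, 9],
})

# Spearman correlation works on ranks, so it picks up any monotonic trend
# and is less affected by a few extreme areas than Pearson correlation.
corr = toy_merged.corr(method="spearman")
print(corr.loc[["vehicles_calls", "graffiti_calls"],
               ["Hardship Index", "Per Capita Income"]])
```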

In [None]:
# A:

### 4. What sorts of questions would you want to use this data to answer?

In [None]:
# A: