# Chicago crime and weather

Let's combine the data from the previous two sections and analyse them side by side to see whether temperature and crime rates are correlated. (Anyone seen "[Body Heat](https://www.imdb.com/title/tt0082089/)"?)

In [2]:
%matplotlib inline

import pandas as pd
import numpy as np

I've gone through the crimes CSV file and it turns out we haven't actually covered all of the 2016 data: the rows are in no particular order. Let's filter the file, keep only the 2016 rows and throw away the rest (mainly because of memory constraints on the machines people might run these examples on). We'll use the `iterator` and `chunksize` options of the `read_csv` function to read the file piece by piece (for such slightly more advanced uses see [the docs](http://pandas.pydata.org/pandas-docs/stable/io.html#csv-text-files)).

In [16]:
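file_path = 'data/chicago_crime/Crimes_-_2001_to_present.csv'
chunksize = 100000  # rows per chunk, which keeps memory use bounded
df_chunks = []

# Column 2 is the crime's date: parse it and use it as the index
for df in pd.read_csv(file_path, parse_dates=[2], index_col=[2], chunksize=chunksize, iterator=True):
    # Label slices include both endpoints, so rows dated 2017-01-01 are kept too
    df = df.loc['2016-01-01':'2017-01-01']
    df_chunks.append(df)

# Stitch the filtered chunks back into a single DataFrame
crimes2016 = pd.concat(df_chunks)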

# Serialising & compressing the data
crimes_file_path = 'data/chicago_crime/crimes.csv.gzip'
crimes2016.sort_index(ascending=True).to_csv(crimes_file_path, compression='gzip')

crimes2016.head()

| Date | ID | Case Number | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | Beat | ... | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2016-06-20 09:00:00 | 10606686 | HZ358366 | 047XX S INDIANA AVE | 1562 | SEX OFFENSE | AGG CRIMINAL SEXUAL ABUSE | RESIDENCE | True | False | 224 | ... | 3.0 | 38.0 | 17 | 1178398.0 | 1873627.0 | 2016 | 03/01/2018 03:52:35 PM | 41.808541 | -87.621195 | (41.808540691, -87.621194998) |
| 2016-11-12 07:30:00 | 10750326 | HZ512286 | 038XX W 61ST ST | 031A | ROBBERY | ARMED: HANDGUN | SIDEWALK | True | False | 823 | ... | 13.0 | 65.0 | 03 | 1151816.0 | 1863880.0 | 2016 | 03/01/2018 03:52:35 PM | 41.782357 | -87.718948 | (41.782356535, -87.718947917) |
| 2016-05-03 21:08:00 | 22451 | HZ250365 | 074XX S MAPLEWOOD AVE | 0110 | HOMICIDE | FIRST DEGREE MURDER | STREET | False | False | 835 | ... | 18.0 | 66.0 | 01A | 1160663.0 | 1855290.0 | 2016 | 05/10/2016 03:56:50 PM | 41.758606 | -87.686749 | (41.758606301, -87.686748849) |
| 2016-05-03 14:00:00 | 22452 | HZ249724 | 049XX W MONROE ST | 0110 | HOMICIDE | FIRST DEGREE MURDER | STREET | False | False | 1533 | ... | 28.0 | 25.0 | 01A | 1143598.0 | 1899211.0 | 2016 | 05/10/2016 03:56:50 PM | 41.879467 | -87.748196 | (41.879467146, -87.748195577) |
| 2017-01-01 00:01:00 | 11227508 | JB146365 | 027XX S WHIPPLE ST | 1754 | OFFENSE INVOLVING CHILDREN | AGG SEX ASSLT OF CHILD FAM MBR | RESIDENCE | False | False | 1033 | ... | 12.0 | 30.0 | 02 | NaN | NaN | 2017 | 02/11/2018 03:57:41 PM | NaN | NaN | NaN |
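
Note the last row of the output: it is dated 2017-01-01, because a `.loc` label slice keeps both endpoints. If you want strictly 2016, a boolean mask on the index year is one alternative. It also doesn't rely on the index being sorted, which may matter because, as far as I know, recent pandas versions can refuse label slices on an unsorted `DatetimeIndex`. Here's a minimal sketch; the `filter_year` helper is my own name (not part of pandas) and the three-row frame just reuses a few timestamps from the output above as a stand-in for one chunk.

```python
import pandas as pd


def filter_year(chunk: pd.DataFrame, year: int) -> pd.DataFrame:
    """Keep only the rows whose DatetimeIndex falls in the given year."""
    # Hypothetical helper: a boolean mask works on unsorted indexes too
    return chunk[chunk.index.year == year]


# Stand-in for one chunk of the crimes file (three rows, unsorted)
chunk = pd.DataFrame(
    {"Primary Type": ["ROBBERY", "OFFENSE INVOLVING CHILDREN", "HOMICIDE"]},
    index=pd.to_datetime(
        ["2016-11-12 07:30:00", "2017-01-01 00:01:00", "2016-05-03 21:08:00"]
    ),
)

filtered = filter_year(chunk, 2016)
print(filtered)
```

Inside the chunk loop this would replace the `df.loc[...]` line with `df = filter_year(df, 2016)`.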


Now we can quickly deserialise the crimes data.

In [25]:
crimes2016 = pd.read_csv(crimes_file_path, index_col=0, parse_dates=True, compression='gzip')
crimes2016
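
If you want to convince yourself that the gzip round trip preserves the data, here's a small self-contained sketch. It uses a made-up two-row frame and a temporary directory instead of the real crimes file; the `sample.csv.gzip` file name is an assumption chosen to mirror the one above. As far as I know, pandas only infers gzip compression from a `.gz` extension, so with a `.gzip` suffix it's safest to pass `compression='gzip'` explicitly on both the write and the read.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the filtered crimes data
frame = pd.DataFrame(
    {"Primary Type": ["HOMICIDE", "ROBBERY"]},
    index=pd.DatetimeIndex(
        ["2016-05-03 21:08:00", "2016-11-12 07:30:00"], name="Date"
    ),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.csv.gzip")
    # Write compressed, then read back with the same explicit compression
    frame.to_csv(path, compression="gzip")
    restored = pd.read_csv(path, index_col=0, parse_dates=True, compression="gzip")

print(restored)
```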