# Analytical Detective
#### By John Bobo based on a problem set from MIT’s Analytics Edge MOOC
#### April 25, 2016


Crime is an international concern, but it is documented and handled in very different ways in different countries. In the United States, violent crimes and property crimes are recorded by the Federal Bureau of Investigation (FBI).  Additionally, each city documents crime, and some cities release data regarding crime rates. The city of Chicago, Illinois releases crime data from 2001 onward [online](https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2).

Chicago is the third most populous city in the United States, with a population of over 2.7 million people.

There are two main types of crime: violent crime and property crime. This problem focuses on one specific type of property crime, "motor vehicle theft" (sometimes referred to as grand theft auto), which is the act of stealing, or attempting to steal, a car. We'll use some basic data analysis in Python with pandas (the original problem set uses R) to understand motor vehicle thefts in Chicago.

Please download the file [mvtWeek1.csv](https://d37djvu3ytnwxt.cloudfront.net/asset-v1:MITx+15.071x_3+1T2016+type@asset+block/mvtWeek1.csv) for this exploration. Do not open the file in spreadsheet software before completing this problem, because doing so might change the format of the Date field. The variables are described below:

  * **ID**: a unique identifier for each observation  
  * **Date**: the date the crime occurred  
  * **LocationDescription**: the location where the crime occurred  
  * **Arrest**: whether or not an arrest was made for the crime (TRUE if an arrest was made, and FALSE if an arrest was not made)    
  * **Domestic**: whether or not the crime was a domestic crime, meaning that it was committed against a family member (TRUE if it was domestic, and FALSE if it was not domestic)  
  * **Beat**: the area, or "beat", in which the crime occurred. This is the smallest regional division defined by the Chicago Police Department.  
  * **District**: the police district in which the crime occurred. Each district is composed of many beats and is defined by the Chicago Police Department.  
  * **CommunityArea**: the community area in which the crime occurred. Since the 1920s, Chicago has been divided into what are called "community areas", of which there are now 77. The community areas were devised in an attempt to create socially homogeneous regions.   
  * **Year**: the year in which the crime occurred.  
  * **Latitude**: the latitude of the location at which the crime occurred.  
  * **Longitude**: the longitude of the location at which the crime occurred.

## Loading the Data

In [10]:
import numpy as np
import pandas as pd

%matplotlib inline

import seaborn as sns

in_file = '/Users/johnbobo/analytics_edge/data/mvtWeek1.csv'
mvt = pd.read_csv(in_file, low_memory=False)


In [12]:
mvt.head()

| | ID | Date | LocationDescription | Arrest | Domestic | Beat | District | CommunityArea | Year | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8951354 | 12/31/12 23:15 | STREET | False | False | 623 | 6 | 69 | 2012 | 41.756284 | -87.621645 |
| 1 | 8951141 | 12/31/12 22:00 | STREET | False | False | 1213 | 12 | 24 | 2012 | 41.898788 | -87.661303 |
| 2 | 8952745 | 12/31/12 22:00 | RESIDENTIAL YARD (FRONT/BACK) | False | False | 1622 | 16 | 11 | 2012 | 41.969186 | -87.76767 |
| 3 | 8952223 | 12/31/12 22:00 | STREET | False | False | 724 | 7 | 67 | 2012 | 41.769329 | -87.657726 |
| 4 | 8951608 | 12/31/12 21:30 | STREET | False | False | 211 | 2 | 35 | 2012 | 41.837568 | -87.621761 |
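
To see how pandas interpreted each column after loading, one option is to inspect `dtypes`. The sketch below is an illustration only: it rebuilds the first two rows displayed above from an in-memory string so it runs without the downloaded file, and the names `csv_text` and `sample` are not part of the original notebook.

```python
import io

import pandas as pd

# Two rows copied from the head() output above, held in memory so this
# sketch does not depend on the local path used in the notebook.
csv_text = """ID,Date,LocationDescription,Arrest,Domestic,Beat,District,CommunityArea,Year,Latitude,Longitude
8951354,12/31/12 23:15,STREET,False,False,623,6,69,2012,41.756284,-87.621645
8951141,12/31/12 22:00,STREET,False,False,1213,12,24,2012,41.898788,-87.661303
"""

sample = pd.read_csv(io.StringIO(csv_text))

# One inferred type per column; Arrest and Domestic are read as booleans,
# while Date is left as text until it is parsed explicitly.
print(sample.dtypes)
```

Running the same `dtypes` check on the full `mvt` frame may report different types for columns that contain missing values.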


**How many observations does our dataset hold?**

In [15]:
mvt.shape[0]

191641

**How many variables are in this dataset?**

In [16]:
mvt.shape[1]

11

**What is the maximum value of the variable `ID`?**

In [17]:
mvt.ID.max()

9181151

**What is the minimum value of the variable `Beat`?**

In [19]:
mvt.Beat.min()

111
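
The minimum and maximum of several columns can also be computed in a single call. The following is a minimal sketch using only the five rows shown by `mvt.head()`; the `sample` and `extremes` names are illustrative, and the results apply to those five rows rather than to the full dataset.

```python
import pandas as pd

# ID and Beat values copied from the five rows displayed by mvt.head().
sample = pd.DataFrame({
    "ID": [8951354, 8951141, 8952745, 8952223, 8951608],
    "Beat": [623, 1213, 1622, 724, 211],
})

# agg() with a list of function names returns one row per statistic
# and one column per variable.
extremes = sample[["ID", "Beat"]].agg(["min", "max"])
print(extremes)
```

On the full data, the same pattern would be `mvt[["ID", "Beat"]].agg(["min", "max"])`.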

**How many observations have value `True` in the `Arrest` variable?**

In [23]:
mvt.Arrest.sum()

15536

**How many observations have a LocationDescription value of `ALLEY`?**

In [24]:
(mvt.LocationDescription == "ALLEY").sum()

2308
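
Rather than testing one location at a time, `value_counts()` tallies every distinct value at once. This is a small sketch on the five `LocationDescription` values shown by `mvt.head()`, not on the full dataset; the `locations` and `counts` names are illustrative.

```python
import pandas as pd

# LocationDescription values copied from the five rows displayed by mvt.head().
locations = pd.Series(
    ["STREET", "STREET", "RESIDENTIAL YARD (FRONT/BACK)", "STREET", "STREET"],
    name="LocationDescription",
)

# Counts are sorted from most to least frequent.
counts = locations.value_counts()
print(counts)

# .get() avoids a KeyError for a value that does not occur in this sample.
print(counts.get("ALLEY", 0))
```

Applied to the full frame, `mvt.LocationDescription.value_counts()` would list `ALLEY` alongside every other location.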

## Understanding Dates

**In what format are the entries in the variable `Date`?**

In [26]:
type(mvt.Date[0])

str
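
Because the entries are plain strings, they need to be converted before any date arithmetic is possible. A minimal sketch, assuming the month/day/two-digit-year layout suggested by the values shown by `mvt.head()` (the format string and the `dates` and `parsed` names are assumptions, not part of the original notebook):

```python
import pandas as pd

# Date strings copied from the rows displayed by mvt.head().
dates = pd.Series(["12/31/12 23:15", "12/31/12 22:00", "12/31/12 21:30"])

# Assumed layout: month/day/two-digit year, then 24-hour time.
parsed = pd.to_datetime(dates, format="%m/%d/%y %H:%M")

# The .dt accessor exposes components such as year and hour.
print(parsed.dt.year.tolist())
print(parsed.dt.hour.tolist())
```

If the assumption holds for the whole column, the same call on `mvt.Date` would yield a datetime column; a mismatch in any row would raise an error instead of silently misparsing.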