## Final Project - Part-time Data Science
## Rationing Resources for the Los Angeles Police Department
## Joseph Roeges - 4-10-19

## A. Introduction


### The Los Angeles Police Department has long suffered from extensive underfunding and the region has had a notoriously high crime rate, affecting how the precincts provide service and coverage for the corresponding neighborhoods. To help alleviate the strain that is put on officers, administrators, and the Los Angeles City Council, data scientists can use previously cataloged data to determine the best usage of specialized staff and training procedures, the correct equipment to best protect local residents, and the region that requires the most resources, by the time of day that the crime is occurring. 

## B. Project Goal

### The goal of this project is to determine what variables would improve the institution’s ability to predict the time of day that a crime would take place to help redistribute resources throughout the department. The model based on our training data should surpass randomized predictions. 


## C. Data Set

### The dataset for this project includes over 1 million rows of crime reports between 2010 and 2017 and the columns will include information about the date of the crime, the time that it occurred, the type of crime, the area that it took place, the description of the crime, the age of the victim, the sex of the victim, the description of the premise, and the weapon used. The database was uploaded as “Crime Data from 2010 to Present,” from Data.gov. 
### My initial hypothesis is that violent crimes and those involving weapons will occur most often between the late evening (9pm) and early morning (3am). We will require a strong correlation between specific variables and the time of day to determine if the model is a good predictor.
### There are also limitations to and assumptions for our analysis. Most crimes are not reported so we cannot determine how this model will affect all victims of a crime. This data also does not consider the individual background of the victim or perpetrator, the unique harbinger that led to the event, or the result of due process of the report through the judicial system. We will only focus on the superficial aspects of the reported cases. Once the database is analyzed, we hope to better understand how, where, what type, and when crimes are occurring in Los Angeles to improve the efficiency of the institution.

## D. Exploratory Data Analysis

In [165]:
import pandas as pd #Reading in data languages to use in this analysis
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

plt.style.use('fivethirtyeight') #Choosing a style for the presentation of data in plots.
%matplotlib inline

# Increasing default figure and font sizes for easier viewing.
plt.rcParams['figure.figsize'] = (7, 5)
plt.rcParams['font.size'] = 14

In [166]:
#Reading information from the .csv file and placing it in 'LA_Crime' to manipulate.
LA_Crime = pd.read_csv('~/Documents/General-Assembly/Crime_Data_from_2010_to_Present.csv')

In [167]:
#Presenting the first five rows of data.
LA_Crime.head()

Unnamed: 0,DR Number,Date Reported,Date Occurred,Time Occurred,Area ID,Area Name,Reporting District,Crime Code,Crime Code Description,MO Codes,...,Weapon Description,Status Code,Status Description,Crime Code 1,Crime Code 2,Crime Code 3,Crime Code 4,Address,Cross Street,Location
0,151521112,11/4/2015,11/3/2015,2230,,N Hollywood,1555,330,BURGLARY FROM VEHICLE,344,...,,IC,Invest Cont,330.0,,,,11100 CAMARILLO ST,,"(34.1577, -118.3727)"
1,151521113,11/4/2015,10/30/2015,200,,N Hollywood,1548,330,BURGLARY FROM VEHICLE,0344 1609 1307,...,,IC,Invest Cont,330.0,,,,11100 CHANDLER BL,,"(34.1681, -118.3724)"
2,151521117,11/4/2015,11/4/2015,1400,,N Hollywood,1506,930,CRIMINAL THREATS - NO WEAPON DISPLAYED,421,...,VERBAL THREAT,JA,Juv Arrest,930.0,,,,7300 BAKMAN AV,,"(34.203, -118.3779)"
3,151521121,11/4/2015,4/28/2015,2125,,N Hollywood,1567,121,"RAPE, FORCIBLE",2000 0429 1241 0416 0400 0527 1813 2002,...,"STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)",AO,Adult Other,121.0,,,,10700 LANDALE ST,,"(34.1513, -118.3642)"
4,151521123,11/5/2015,10/27/2015,600,,N Hollywood,1515,354,THEFT OF IDENTITY,0100 1822,...,,IC,Invest Cont,354.0,,,,11700 LEMAY ST,,"(34.1912, -118.3891)"


In [168]:
#There are 1,048,575 rows of individual cases and 26 columns of case identifiers. 
LA_Crime.shape

(1048575, 26)

In [169]:
#The 26 different case identifiers are listed below.
LA_Crime.columns

Index(['DR Number', 'Date Reported', 'Date Occurred', 'Time Occurred',
       'Area ID', 'Area Name', 'Reporting District', 'Crime Code',
       'Crime Code Description', 'MO Codes', 'Victim Age', 'Victim Sex',
       'Victim Descent', 'Premise Code', 'Premise Description',
       'Weapon Used Code', 'Weapon Description', 'Status Code',
       'Status Description', 'Crime Code 1', 'Crime Code 2', 'Crime Code 3',
       'Crime Code 4', 'Address', 'Cross Street', 'Location '],
      dtype='object')

In [170]:
# I renamed columns as to be represent the data and make information easier to access.
LA_Crime.rename(columns={'DR Number': 'File_Number', 'Date Reported':'Date_Reported','Date Occurred':'Date_Occured',
                     'Time Occured':'Time_Occured','Area ID':'Area_ID','Area Name':'Area_Name','Reporting District':
                     'R_District','Crime Code':'Crime_Code','Crime Code Description':'Crime_Code_Desc','MO Codes':
                     'MO_Codes','Victim Age':'Victim_Age','Victim Sex':'Victim_Sex','Victim Descent':'Victim_Descent',
                     'Premise Code':'Premise_Code','Premise Description':'Premise_Desc','Weapon Used Code':'Weapon_Code',
                     'Weapon Description':'Weapon_Desc','Status Code':'Status_Code','Status Description':'Status_Desc',
                     'Crime Code 1':'Crime_Code_1','Crime Code 2':'Crime_Code_2','Crime Code 3':'Crime_Code_3','Crime Code 4':
                     'Crime_Code_4','Address':'Address_Com','Cross Street':'Cross_Str_Com','Location ':'Loc_Com'}, 
            inplace=True)

In [171]:
LA_Crime.head() #The changes to the columns can be viewed below.

Unnamed: 0,File_Number,Date_Reported,Date_Occured,Time Occurred,Area_ID,Area_Name,R_District,Crime_Code,Crime_Code_Desc,MO_Codes,...,Weapon_Desc,Status_Code,Status_Desc,Crime_Code_1,Crime_Code_2,Crime_Code_3,Crime_Code_4,Address_Com,Cross_Str_Com,Loc_Com
0,151521112,11/4/2015,11/3/2015,2230,,N Hollywood,1555,330,BURGLARY FROM VEHICLE,344,...,,IC,Invest Cont,330.0,,,,11100 CAMARILLO ST,,"(34.1577, -118.3727)"
1,151521113,11/4/2015,10/30/2015,200,,N Hollywood,1548,330,BURGLARY FROM VEHICLE,0344 1609 1307,...,,IC,Invest Cont,330.0,,,,11100 CHANDLER BL,,"(34.1681, -118.3724)"
2,151521117,11/4/2015,11/4/2015,1400,,N Hollywood,1506,930,CRIMINAL THREATS - NO WEAPON DISPLAYED,421,...,VERBAL THREAT,JA,Juv Arrest,930.0,,,,7300 BAKMAN AV,,"(34.203, -118.3779)"
3,151521121,11/4/2015,4/28/2015,2125,,N Hollywood,1567,121,"RAPE, FORCIBLE",2000 0429 1241 0416 0400 0527 1813 2002,...,"STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)",AO,Adult Other,121.0,,,,10700 LANDALE ST,,"(34.1513, -118.3642)"
4,151521123,11/5/2015,10/27/2015,600,,N Hollywood,1515,354,THEFT OF IDENTITY,0100 1822,...,,IC,Invest Cont,354.0,,,,11700 LEMAY ST,,"(34.1912, -118.3891)"


In [172]:
LA_Crime.dtypes #This is showing the types for each description. All of these categories make sense.

File_Number          int64
Date_Reported       object
Date_Occured        object
Time Occurred        int64
Area_ID            float64
Area_Name           object
R_District           int64
Crime_Code           int64
Crime_Code_Desc     object
MO_Codes            object
Victim_Age         float64
Victim_Sex          object
Victim_Descent      object
Premise_Code       float64
Premise_Desc        object
Weapon_Code        float64
Weapon_Desc         object
Status_Code         object
Status_Desc         object
Crime_Code_1       float64
Crime_Code_2       float64
Crime_Code_3       float64
Crime_Code_4       float64
Address_Com         object
Cross_Str_Com       object
Loc_Com             object
dtype: object

In [173]:
LA_Crime.isnull().sum() #We now want to review if we have any missing data. We have several columns (15) with missing data.

File_Number              0
Date_Reported            0
Date_Occured             0
Time Occurred            0
Area_ID             883384
Area_Name                0
R_District               0
Crime_Code               0
Crime_Code_Desc          0
MO_Codes            119756
Victim_Age            1065
Victim_Sex          107051
Victim_Descent      107061
Premise_Code            17
Premise_Desc            79
Weapon_Code         687296
Weapon_Desc         687297
Status_Code              1
Status_Desc              0
Crime_Code_1             6
Crime_Code_2        978250
Crime_Code_3       1046505
Crime_Code_4       1048511
Address_Com              0
Cross_Str_Com       867930
Loc_Com                  0
dtype: int64

In [174]:
LA_Crime['Crime_Code_1'].nunique() #There are 6 missing values for this column, but it should be the same as the Crime Code

145

In [175]:
# We have use the 'Crime_Code' since is meant to mimic the entry for 'Crime_Code_1'.
LA_Crime.fillna(value={'Crime_Code_1': LA_Crime['Crime_Code'] }, inplace=True)
LA_Crime.head(10)

Unnamed: 0,File_Number,Date_Reported,Date_Occured,Time Occurred,Area_ID,Area_Name,R_District,Crime_Code,Crime_Code_Desc,MO_Codes,...,Weapon_Desc,Status_Code,Status_Desc,Crime_Code_1,Crime_Code_2,Crime_Code_3,Crime_Code_4,Address_Com,Cross_Str_Com,Loc_Com
0,151521112,11/4/2015,11/3/2015,2230,,N Hollywood,1555,330,BURGLARY FROM VEHICLE,344,...,,IC,Invest Cont,330.0,,,,11100 CAMARILLO ST,,"(34.1577, -118.3727)"
1,151521113,11/4/2015,10/30/2015,200,,N Hollywood,1548,330,BURGLARY FROM VEHICLE,0344 1609 1307,...,,IC,Invest Cont,330.0,,,,11100 CHANDLER BL,,"(34.1681, -118.3724)"
2,151521117,11/4/2015,11/4/2015,1400,,N Hollywood,1506,930,CRIMINAL THREATS - NO WEAPON DISPLAYED,421,...,VERBAL THREAT,JA,Juv Arrest,930.0,,,,7300 BAKMAN AV,,"(34.203, -118.3779)"
3,151521121,11/4/2015,4/28/2015,2125,,N Hollywood,1567,121,"RAPE, FORCIBLE",2000 0429 1241 0416 0400 0527 1813 2002,...,"STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)",AO,Adult Other,121.0,,,,10700 LANDALE ST,,"(34.1513, -118.3642)"
4,151521123,11/5/2015,10/27/2015,600,,N Hollywood,1515,354,THEFT OF IDENTITY,0100 1822,...,,IC,Invest Cont,354.0,,,,11700 LEMAY ST,,"(34.1912, -118.3891)"
5,151521128,11/5/2015,11/4/2015,2200,,N Hollywood,1548,510,VEHICLE - STOLEN,,...,,IC,Invest Cont,510.0,,,,5200 CAHUENGA BL,,"(34.1667, -118.3702)"
6,151521130,11/4/2015,11/4/2015,1930,,N Hollywood,1548,341,"THEFT-GRAND ($950.01 & OVER)EXCPT,GUNS,FOWL,LI...",0344 1202,...,,IC,Invest Cont,341.0,,,,5300 LANKERSHIM BL,,"(34.167, -118.3759)"
7,151521131,11/4/2015,10/5/2015,1200,,N Hollywood,1522,331,THEFT FROM MOTOR VEHICLE - GRAND ($400 AND OVER),344,...,,IC,Invest Cont,331.0,,,,13100 WELBY WY,,"(34.193, -118.4181)"
8,151521132,11/4/2015,11/4/2015,730,,N Hollywood,1523,745,VANDALISM - MISDEAMEANOR ($399 OR UNDER),329,...,,IC,Invest Cont,745.0,,,,6200 BEEMAN AV,,"(34.1831, -118.4073)"
9,151521134,11/4/2015,11/4/2015,2230,,N Hollywood,1548,440,THEFT PLAIN - PETTY ($950 & UNDER),0344 0104 0325,...,,AA,Adult Arrest,440.0,,,,5300 VINELAND AV,,"(34.167, -118.3703)"


In [176]:
# We will fill in all null responses with No Weapon in the Weapon Description column.
LA_Crime.fillna(value={'Weapon_Desc': 'No Weapon'}, inplace=True)
LA_Crime.head(10)

Unnamed: 0,File_Number,Date_Reported,Date_Occured,Time Occurred,Area_ID,Area_Name,R_District,Crime_Code,Crime_Code_Desc,MO_Codes,...,Weapon_Desc,Status_Code,Status_Desc,Crime_Code_1,Crime_Code_2,Crime_Code_3,Crime_Code_4,Address_Com,Cross_Str_Com,Loc_Com
0,151521112,11/4/2015,11/3/2015,2230,,N Hollywood,1555,330,BURGLARY FROM VEHICLE,344,...,No Weapon,IC,Invest Cont,330.0,,,,11100 CAMARILLO ST,,"(34.1577, -118.3727)"
1,151521113,11/4/2015,10/30/2015,200,,N Hollywood,1548,330,BURGLARY FROM VEHICLE,0344 1609 1307,...,No Weapon,IC,Invest Cont,330.0,,,,11100 CHANDLER BL,,"(34.1681, -118.3724)"
2,151521117,11/4/2015,11/4/2015,1400,,N Hollywood,1506,930,CRIMINAL THREATS - NO WEAPON DISPLAYED,421,...,VERBAL THREAT,JA,Juv Arrest,930.0,,,,7300 BAKMAN AV,,"(34.203, -118.3779)"
3,151521121,11/4/2015,4/28/2015,2125,,N Hollywood,1567,121,"RAPE, FORCIBLE",2000 0429 1241 0416 0400 0527 1813 2002,...,"STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)",AO,Adult Other,121.0,,,,10700 LANDALE ST,,"(34.1513, -118.3642)"
4,151521123,11/5/2015,10/27/2015,600,,N Hollywood,1515,354,THEFT OF IDENTITY,0100 1822,...,No Weapon,IC,Invest Cont,354.0,,,,11700 LEMAY ST,,"(34.1912, -118.3891)"
5,151521128,11/5/2015,11/4/2015,2200,,N Hollywood,1548,510,VEHICLE - STOLEN,,...,No Weapon,IC,Invest Cont,510.0,,,,5200 CAHUENGA BL,,"(34.1667, -118.3702)"
6,151521130,11/4/2015,11/4/2015,1930,,N Hollywood,1548,341,"THEFT-GRAND ($950.01 & OVER)EXCPT,GUNS,FOWL,LI...",0344 1202,...,No Weapon,IC,Invest Cont,341.0,,,,5300 LANKERSHIM BL,,"(34.167, -118.3759)"
7,151521131,11/4/2015,10/5/2015,1200,,N Hollywood,1522,331,THEFT FROM MOTOR VEHICLE - GRAND ($400 AND OVER),344,...,No Weapon,IC,Invest Cont,331.0,,,,13100 WELBY WY,,"(34.193, -118.4181)"
8,151521132,11/4/2015,11/4/2015,730,,N Hollywood,1523,745,VANDALISM - MISDEAMEANOR ($399 OR UNDER),329,...,No Weapon,IC,Invest Cont,745.0,,,,6200 BEEMAN AV,,"(34.1831, -118.4073)"
9,151521134,11/4/2015,11/4/2015,2230,,N Hollywood,1548,440,THEFT PLAIN - PETTY ($950 & UNDER),0344 0104 0325,...,No Weapon,AA,Adult Arrest,440.0,,,,5300 VINELAND AV,,"(34.167, -118.3703)"


In [177]:
LA_Crime['Weapon_Code'].value_counts().head() #This reviews the style of 

400.0    215558
500.0     30005
511.0     29032
102.0     18029
109.0      6923
Name: Weapon_Code, dtype: int64

In [178]:
LA_Crime.fillna(value={'Weapon_Code': 0 }, inplace=True)

In [179]:
LA_Crime.fillna(value={'Victim_Age':LA_Crime['Victim_Age'].mean()}, inplace=True)

In [180]:
LA_Crime.fillna(value={'Crime_Code_2':'Empty','Crime_Code_3':'Empty','Crime_Code_4':'Empty'}, inplace=True)

In [181]:
LA_Crime['Premise_Desc'].value_counts().head() #Reviewing the premise description column.

STREET                                          247667
SINGLE FAMILY DWELLING                          204597
MULTI-UNIT DWELLING (APARTMENT, DUPLEX, ETC)    126208
PARKING LOT                                      77153
SIDEWALK                                         52549
Name: Premise_Desc, dtype: int64

In [183]:
LA_Crime.fillna(value={'Premise_Desc': "STREET" }, inplace=True)
LA_Crime.isnull().sum()

MemoryError: 

In [None]:
LA_Crime.fillna(value={'Premise_Code': 0,'Cross_Str_Com':'None','MO_Codes':'None','Status_Code':0 }, inplace=True)
LA_Crime.isnull().sum()

In [164]:
LA_Crime2 = LA_Crime
LA_Crime2.dropna(subset=['Victim_Sex'],inplace=True)
LA_Crime2.dropna(subset=['Victim_Descent'],inplace=True)
LA_Crime2.drop(columns='Area_ID',inplace=True)
LA_Crime2.isnull().sum()

KeyError: "['Area_ID'] not found in axis"

In [13]:
LA_Crime.dtypes

DR Number                   int64
Date Reported              object
Date Occurred              object
Time Occurred               int64
Area ID                   float64
Area Name                  object
Reporting District          int64
Crime Code                  int64
Crime Code Description     object
MO Codes                   object
Victim Age                float64
Victim Sex                 object
Victim Descent             object
Premise Code              float64
Premise Description        object
Weapon Used Code          float64
Weapon Description         object
Status Code                object
Status Description         object
Crime Code 1              float64
Crime Code 2              float64
Crime Code 3              float64
Crime Code 4              float64
Address                    object
Cross Street               object
Location                   object
dtype: object

In [14]:
LA_Crime.corr()

Unnamed: 0,DR Number,Time Occurred,Area ID,Reporting District,Crime Code,Victim Age,Premise Code,Weapon Used Code,Crime Code 1,Crime Code 2,Crime Code 3,Crime Code 4
DR Number,1.0,-0.010665,0.013788,0.099133,0.011063,0.023642,3.1e-05,0.018252,0.010698,-0.076767,0.01951,0.181208
Time Occurred,-0.010665,1.0,0.009255,0.011872,0.009885,-0.048485,-0.084068,-0.000297,0.009928,0.000653,-0.008852,-0.108404
Area ID,0.013788,0.009255,1.0,0.998975,-0.012509,0.010492,0.021929,-0.005879,-0.012557,-0.039095,0.024952,0.206373
Reporting District,0.099133,0.011872,0.998975,1.0,-0.008109,0.012441,0.004888,-0.006799,-0.008282,-0.08469,-0.035812,0.053496
Crime Code,0.011063,0.009885,-0.012509,-0.008109,1.0,-0.026076,0.115432,0.416985,0.999411,0.045081,0.13468,0.068785
Victim Age,0.023642,-0.048485,0.010492,0.012441,-0.026076,1.0,0.179899,0.080718,-0.025927,-0.03061,-0.009287,-0.165875
Premise Code,3.1e-05,-0.084068,0.021929,0.004888,0.115432,0.179899,1.0,0.211965,0.115519,-0.035774,-0.018244,0.241395
Weapon Used Code,0.018252,-0.000297,-0.005879,-0.006799,0.416985,0.080718,0.211965,1.0,0.41762,-0.130877,-0.016315,-0.028386
Crime Code 1,0.010698,0.009928,-0.012557,-0.008282,0.999411,-0.025927,0.115519,0.41762,1.0,0.059488,0.166796,0.059683
Crime Code 2,-0.076767,0.000653,-0.039095,-0.08469,0.045081,-0.03061,-0.035774,-0.130877,0.059488,1.0,0.330473,0.336878
