
# *BA780 - Team 6*
## Team Project

# **Boston Crime Scene Analytics**

### *Problem definition:*

As current students living in the Boston area, we are interested in analyzing crime in Boston. We want to find out which types of crime are most common, where each type is most likely to occur, and whether the frequency of crime changes over specific time spans (e.g. day, week, and year). Our goal is to answer questions such as: “Where is a tourist most likely to become the victim of a crime at a given time of day, or in a given month or season of the year?”

**Data Source:** Analyze Boston

https://data.boston.gov/dataset/crime-incident-reports-august-2015-to-date-source-new-system

*Crime incident reports are provided by Boston Police Department (BPD) to document the initial details surrounding an incident to which BPD officers respond. This is a dataset containing records from the new crime incident report system, which includes a reduced set of fields focused on capturing the type of incident as well as when and where it occurred. Records in the new system begin in June of 2015.*

## **Data Cleaning**

Importing Required Packages

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Reading all files into the environment

In [3]:
# reading the crime data sets into the environment
crimes2015 = pd.read_csv('https://raw.githubusercontent.com/AlexVonSchwerdtner/BA780-Team6/main/crime-incident-reports-2015.csv')
crimes2016 = pd.read_csv('https://raw.githubusercontent.com/AlexVonSchwerdtner/BA780-Team6/main/crime-incident-reports-2016.csv')
crimes2017 = pd.read_csv('https://raw.githubusercontent.com/AlexVonSchwerdtner/BA780-Team6/main/crime-incident-reports-2017.csv')
crimes2018 = pd.read_csv('https://raw.githubusercontent.com/AlexVonSchwerdtner/BA780-Team6/main/crime-incident-reports-2018.csv')
crimes2019 = pd.read_csv('https://raw.githubusercontent.com/AlexVonSchwerdtner/BA780-Team6/main/crime-incident-reports-2019.csv')
crimes2020 = pd.read_csv('https://raw.githubusercontent.com/AlexVonSchwerdtner/BA780-Team6/main/crime-incident-reports-2020.csv')
crimes2021 = pd.read_csv('https://raw.githubusercontent.com/AlexVonSchwerdtner/BA780-Team6/main/crime-incident-reports-2021.csv')
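
The seven near-identical `read_csv` calls could also be written as a loop. The following is a minimal sketch, assuming every yearly file follows the naming pattern used above. The names `BASE_URL`, `crime_url`, and `load_crimes` are illustrative and not part of the original notebook. Passing `low_memory=False` is one common way to avoid pandas' mixed-dtype warning on large CSV files; whether it is needed for these files is an assumption.

```python
import pandas as pd

# Assumption: all yearly files follow the naming pattern used above.
BASE_URL = "https://raw.githubusercontent.com/AlexVonSchwerdtner/BA780-Team6/main"


def crime_url(year):
    """Build the URL of the crime incident report file for one year."""
    return f"{BASE_URL}/crime-incident-reports-{year}.csv"


def load_crimes(years, reader=pd.read_csv):
    """Read one file per year and stack them into a single DataFrame.

    `reader` is injectable so the function can be tried without network access.
    """
    frames = [reader(crime_url(year), low_memory=False) for year in years]
    return pd.concat(frames, ignore_index=True)


# Usage (downloads the files):
# crimes_all = load_crimes(range(2015, 2022))
```

Note that `ignore_index=True` renumbers the rows without keeping the per-file row number as an `index` column, unlike the `reset_index()` call used later in this notebook.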


In [4]:
# reading the offense codes into the environment
offense_codes = pd.read_excel('https://raw.githubusercontent.com/AlexVonSchwerdtner/BA780-Team6/main/rmsoffensecodes.xlsx')
offense_codes.head()

| | CODE | NAME |
|---|---|---|
| 0 | 612 | LARCENY PURSE SNATCH - NO FORCE |
| 1 | 613 | LARCENY SHOPLIFTING |
| 2 | 615 | LARCENY THEFT OF MV PARTS & ACCESSORIES |
| 3 | 1731 | INCEST |
| 4 | 3111 | LICENSE PREMISE VIOLATION |


In [7]:
# checking for any duplicates
print(len(offense_codes))
print(len(offense_codes.drop_duplicates(subset='CODE', keep='first')))

576
425
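
The gap between 576 and 425 means that some codes appear more than once. Before dropping the repeats, it may be worth checking which codes repeat and whether the repeated entries carry different names. A minimal sketch, using made-up toy data and the illustrative helper names `duplicated_codes` and `codes_with_conflicting_names`:

```python
import pandas as pd


def duplicated_codes(codes):
    """Return all rows whose CODE appears more than once, sorted by CODE."""
    mask = codes.duplicated(subset="CODE", keep=False)
    return codes[mask].sort_values("CODE")


def codes_with_conflicting_names(codes):
    """Return the CODE values that map to more than one distinct NAME."""
    names_per_code = codes.groupby("CODE")["NAME"].nunique()
    return names_per_code[names_per_code > 1].index.tolist()


# Toy data (made up for illustration, not taken from rmsoffensecodes.xlsx)
toy = pd.DataFrame({
    "CODE": [612, 612, 613, 615, 615],
    "NAME": ["A", "A", "B", "C", "C ALT"],
})
print(duplicated_codes(toy))
print(codes_with_conflicting_names(toy))  # [615]
```

If the second function returns a non-empty list, `keep='first'` silently discards the alternative names for those codes.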


In [8]:
# dropping duplicates
offense_codes = offense_codes.drop_duplicates(subset='CODE', keep='first').reset_index(drop=True)

In [10]:
# concatenating all crime datasets
frames = [crimes2015, crimes2016, crimes2017, crimes2018, crimes2019, crimes2020, crimes2021]
Crimes_all_years = pd.concat(frames).reset_index()

In [11]:
# checking for any duplicates
print(len(Crimes_all_years))
print(len(Crimes_all_years.drop_duplicates(subset='INCIDENT_NUMBER', keep='first')))

494281
452208
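
Rows that share an `INCIDENT_NUMBER` are not necessarily exact duplicates: in the output further down, incident I162097933 appears with two different offense codes (1106 and 3109). A minimal sketch for telling exact duplicate rows apart from incidents that span several rows, using made-up toy data and the illustrative helper name `duplicate_summary`:

```python
import pandas as pd


def duplicate_summary(df, key="INCIDENT_NUMBER"):
    """Count rows, exact duplicate rows, and keys that occur more than once."""
    per_key = df.groupby(key).size()
    return {
        "rows": len(df),
        "exact_duplicate_rows": int(df.duplicated().sum()),
        "keys_with_multiple_rows": int((per_key > 1).sum()),
        "unique_keys": int(per_key.size),
    }


# Toy data (made up for illustration)
toy = pd.DataFrame({
    "INCIDENT_NUMBER": ["I1", "I1", "I2", "I3", "I3"],
    "OFFENSE_CODE": [1106, 3109, 802, 724, 724],
})
print(duplicate_summary(toy))
```

In the toy data, `I1` has two different offenses while `I3` is a true duplicate. Keeping only the first row per incident would remove the second offense of `I1` as well. If the frame carries a helper column such as `index`, it would have to be excluded before checking for exact duplicates.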


In [12]:
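# dropping duplicates; note that the result is stored under a different name,
# `Crime_all_years` (no "s"), so `Crimes_all_years` still contains every row
Crime_all_years = Crimes_all_years.drop_duplicates(subset='INCIDENT_NUMBER', keep='first').reset_index(drop=True)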

In [13]:
# filling missing values in 'OFFENSE_CODE_GROUP' with "Other"
Crimes_all_years['OFFENSE_CODE_GROUP'] = Crimes_all_years['OFFENSE_CODE_GROUP'].fillna("Other")

In [14]:
Crimes_all_years.isna().sum()

index                       0
INCIDENT_NUMBER             0
OFFENSE_CODE                0
OFFENSE_CODE_GROUP          0
OFFENSE_DESCRIPTION         0
DISTRICT                 3120
REPORTING_AREA              0
SHOOTING               351798
OCCURRED_ON_DATE            0
YEAR                        0
MONTH                       0
DAY_OF_WEEK                 0
HOUR                        0
UCR_PART               141125
STREET                  11886
Lat                     22530
Long                    22530
Location                    0
dtype: int64
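
Raw counts are easier to judge as a share of all rows. For example, 351798 missing `SHOOTING` values would be roughly 71% if the frame still has the 494281 rows printed earlier. A minimal sketch that reports both the count and the percentage per column, using made-up toy data and the illustrative helper name `missing_report`:

```python
import pandas as pd


def missing_report(df):
    """Return count and percentage of missing values per column, largest first."""
    report = pd.DataFrame({
        "missing": df.isna().sum(),
        "percent": (df.isna().mean() * 100).round(2),
    })
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Toy data (made up for illustration)
toy = pd.DataFrame({
    "DISTRICT": ["A1", None, "C6", "E5"],
    "SHOOTING": [None, None, None, "Y"],
    "YEAR": [2015, 2016, 2017, 2018],
})
print(missing_report(toy))
```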

In [16]:
# rows with missing values in the Lat column (all rows, before deduplication)
Crimes_all_years[Crimes_all_years['Lat'].isnull()]

Unnamed: 0,index,INCIDENT_NUMBER,OFFENSE_CODE,OFFENSE_CODE_GROUP,OFFENSE_DESCRIPTION,DISTRICT,REPORTING_AREA,SHOOTING,OCCURRED_ON_DATE,YEAR,MONTH,DAY_OF_WEEK,HOUR,UCR_PART,STREET,Lat,Long,Location
16,16,I182039429,1107,Fraud,FRAUD - IMPERSONATION,C6,226,,2015-11-26 08:00:00,2015,11,Thursday,8,Part Two,E FOURTH ST,,,"(0.00000000, 0.00000000)"
45,45,I172061344,1102,Fraud,FRAUD - FALSE PRETENSE / SCHEME,A1,92,,2015-10-13 12:00:00,2015,10,Tuesday,12,Part Two,COURT ST,,,"(0.00000000, 0.00000000)"
83,83,I162101249,1102,Fraud,FRAUD - FALSE PRETENSE / SCHEME,E18,,,2015-12-24 20:00:00,2015,12,Thursday,20,Part Two,CUMMINS HWY,,,"(0.00000000, 0.00000000)"
91,91,I162097933,1106,Confidence Games,FRAUD - CREDIT CARD / ATM FRAUD,E5,739,,2015-12-01 00:00:00,2015,12,Tuesday,0,Part Two,ADDELAIDE PL,,,"(0.00000000, 0.00000000)"
92,92,I162097933,3109,Police Service Incidents,SERVICE TO OTHER PD INSIDE OF MA.,E5,739,,2015-12-01 00:00:00,2015,12,Tuesday,0,Part Three,ADDELAIDE PL,,,"(0.00000000, 0.00000000)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
353153,98788,I172022524,2660,Other,OTHER OFFENSE,A1,171,,2018-12-21 02:15:00,2018,12,Friday,2,Part Two,BOYLSTON ST,,,"(0.00000000, 0.00000000)"
353154,98789,I172022524,1810,Drug Violation,DRUGS - SALE / MANUFACTURING,A1,171,,2018-12-21 02:15:00,2018,12,Friday,2,Part Two,BOYLSTON ST,,,"(0.00000000, 0.00000000)"
353168,98803,I172002908,802,Simple Assault,ASSAULT SIMPLE - BATTERY,A1,,,2018-01-16 16:00:00,2018,1,Tuesday,16,Part Two,,,,"(0.00000000, 0.00000000)"
353169,98804,I172002908,3125,Warrant Arrests,WARRANT ARREST,A1,,,2018-01-16 16:00:00,2018,1,Tuesday,16,Part Three,,,,"(0.00000000, 0.00000000)"


In [17]:
# rows with missing values in the Long column (deduplicated rows)
Crime_all_years[Crime_all_years['Long'].isnull()]

Unnamed: 0,index,INCIDENT_NUMBER,OFFENSE_CODE,OFFENSE_CODE_GROUP,OFFENSE_DESCRIPTION,DISTRICT,REPORTING_AREA,SHOOTING,OCCURRED_ON_DATE,YEAR,MONTH,DAY_OF_WEEK,HOUR,UCR_PART,STREET,Lat,Long,Location
13,16,I182039429,1107,Fraud,FRAUD - IMPERSONATION,C6,226,,2015-11-26 08:00:00,2015,11,Thursday,8,Part Two,E FOURTH ST,,,"(0.00000000, 0.00000000)"
42,45,I172061344,1102,Fraud,FRAUD - FALSE PRETENSE / SCHEME,A1,92,,2015-10-13 12:00:00,2015,10,Tuesday,12,Part Two,COURT ST,,,"(0.00000000, 0.00000000)"
78,83,I162101249,1102,Fraud,FRAUD - FALSE PRETENSE / SCHEME,E18,,,2015-12-24 20:00:00,2015,12,Thursday,20,Part Two,CUMMINS HWY,,,"(0.00000000, 0.00000000)"
86,91,I162097933,1106,Confidence Games,FRAUD - CREDIT CARD / ATM FRAUD,E5,739,,2015-12-01 00:00:00,2015,12,Tuesday,0,Part Two,ADDELAIDE PL,,,"(0.00000000, 0.00000000)"
106,112,I162083921,1102,Fraud,FRAUD - FALSE PRETENSE / SCHEME,D4,,,2015-09-20 12:00:00,2015,9,Sunday,12,Part Two,CLARENDON ST,,,"(0.00000000, 0.00000000)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
311094,98649,I182000010,3802,Motor Vehicle Accident Response,M/V ACCIDENT - PROPERTY DAMAGE,B3,,,2018-01-01 00:15:00,2018,1,Monday,0,Part Three,CUMMINS HWY,,,"(0.00000000, 0.00000000)"
311098,98655,I172107333,724,Auto Theft,AUTO THEFT,A1,,,2018-01-11 00:50:00,2018,1,Thursday,0,Part One,BOYLSTON ST,,,"(0.00000000, 0.00000000)"
311147,98781,I172022524,3125,Warrant Arrests,WARRANT ARREST,A1,171,,2018-12-21 02:15:00,2018,12,Friday,2,Part Three,BOYLSTON ST,,,"(0.00000000, 0.00000000)"
311153,98803,I172002908,802,Simple Assault,ASSAULT SIMPLE - BATTERY,A1,,,2018-01-16 16:00:00,2018,1,Tuesday,16,Part Two,,,,"(0.00000000, 0.00000000)"


In [None]:
# Location values recorded for the rows where Lat is missing
Crimes_all_years.loc[Crimes_all_years['Lat'].isnull(), 'Location'].unique()

array(['(0.00000000, 0.00000000)'], dtype=object)
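
The output shows that rows without `Lat` carry the placeholder string `(0.00000000, 0.00000000)` in `Location`, which is why `Location` reports no missing values. One possible way to keep the placeholder from being mistaken for a real coordinate is to mask it. A minimal sketch, using made-up toy data and the illustrative names `PLACEHOLDER` and `mask_placeholder_location`:

```python
import pandas as pd

PLACEHOLDER = "(0.00000000, 0.00000000)"  # value seen in the output above


def mask_placeholder_location(df):
    """Return a copy in which the placeholder Location is replaced by NA."""
    out = df.copy()
    out["Location"] = out["Location"].mask(out["Location"] == PLACEHOLDER)
    return out


# Toy data (made up for illustration)
toy = pd.DataFrame({
    "Lat": [42.35, None],
    "Long": [-71.06, None],
    "Location": ["(42.35000000, -71.06000000)", PLACEHOLDER],
})
cleaned = mask_placeholder_location(toy)
print(cleaned["Location"].isna().tolist())  # [False, True]
```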

## **General Overview**

In [None]:
Crimes_all_years.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 493638 entries, 0 to 493637
Data columns (total 18 columns):
 #   Column               Non-Null Count   Dtype  
---  ------               --------------   -----  
 0   index                493638 non-null  int64  
 1   INCIDENT_NUMBER      493638 non-null  object 
 2   OFFENSE_CODE         493638 non-null  int64  
 3   OFFENSE_CODE_GROUP   352610 non-null  object 
 4   OFFENSE_DESCRIPTION  493638 non-null  object 
 5   DISTRICT             490518 non-null  object 
 6   REPORTING_AREA       493638 non-null  object 
 7   SHOOTING             142181 non-null  object 
 8   OCCURRED_ON_DATE     493638 non-null  object 
 9   YEAR                 493638 non-null  int64  
 10  MONTH                493638 non-null  int64  
 11  DAY_OF_WEEK          493638 non-null  object 
 12  HOUR                 493638 non-null  int64  
 13  UCR_PART             352516 non-null  object 
 14  STREET               481764 non-null  object 
 15  Lat              

In [9]:
offense_codes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 425 entries, 0 to 424
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   CODE    425 non-null    int64 
 1   NAME    425 non-null    object
dtypes: int64(1), object(1)
memory usage: 6.8+ KB


## **By Hour and Weekday**
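
A minimal sketch of how incident counts could be tabulated by weekday and hour, assuming the `DAY_OF_WEEK` and `HOUR` columns shown in the data cleaning section. The toy data is made up, and `WEEKDAYS` and `incidents_by_weekday_hour` are illustrative names rather than part of the original analysis.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def incidents_by_weekday_hour(df):
    """Count rows per (DAY_OF_WEEK, HOUR), with weekdays in calendar order."""
    table = pd.crosstab(df["DAY_OF_WEEK"], df["HOUR"])
    # Weekdays absent from the data become rows of zeros.
    return table.reindex(WEEKDAYS, fill_value=0)


# Toy data (made up for illustration)
toy = pd.DataFrame({
    "DAY_OF_WEEK": ["Thursday", "Tuesday", "Tuesday", "Friday"],
    "HOUR": [8, 12, 0, 2],
})
table = incidents_by_weekday_hour(toy)
print(table)
```

The resulting table has one row per weekday and one column per hour present in the data, so it could be passed to `sns.heatmap` for a visual overview.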

## **By Season**

## **By Year**

## **Map**