## ANALYZING CRIME IN LOS ANGELES

### Answer the following questions: 
 - Which hour has the highest frequency of crimes? Store as an integer variable called `peak_crime_hour`.
 - Which area has the largest frequency of night crimes (crimes committed between 10PM and 3:59AM)? Save as a string
 variable called `peak_night_crime_location`.
 - Identify the number of crimes committed against victims of different age groups. Save as a pandas Series called victim_ages, with age group labels `"0-17"`, `"18-25"`, `"26-34"`, `"35-44"`, `"45-54"`, `"55-64"`, and `"65+"` as the index and the frequency of crimes as the values.


In [97]:
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns

crime = pd.read_csv('../Data/crimes.csv')

In [98]:
crime.head(3)

Unnamed: 0,DR_NO,Date Rptd,DATE OCC,TIME OCC,AREA NAME,Crm Cd Desc,Vict Age,Vict Sex,Vict Descent,Weapon Desc,Status Desc,LOCATION
0,220314085,2022-07-22,2020-05-12,1110,Southwest,THEFT OF IDENTITY,27,F,B,,Invest Cont,2500 S SYCAMORE AV
1,222013040,2022-08-06,2020-06-04,1620,Olympic,THEFT OF IDENTITY,60,M,H,,Invest Cont,3300 SAN MARINO ST
2,220614831,2022-08-18,2020-08-17,1200,Hollywood,THEFT OF IDENTITY,28,M,H,,Invest Cont,1900 TRANSIENT


In [99]:
crime.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 185715 entries, 0 to 185714
Data columns (total 12 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   DR_NO         185715 non-null  int64 
 1   Date Rptd     185715 non-null  object
 2   DATE OCC      185715 non-null  object
 3   TIME OCC      185715 non-null  int64 
 4   AREA NAME     185715 non-null  object
 5   Crm Cd Desc   185715 non-null  object
 6   Vict Age      185715 non-null  int64 
 7   Vict Sex      185704 non-null  object
 8   Vict Descent  185705 non-null  object
 9   Weapon Desc   73502 non-null   object
 10  Status Desc   185715 non-null  object
 11  LOCATION      185715 non-null  object
dtypes: int64(3), object(9)
memory usage: 17.0+ MB


In [100]:
crime.shape

(185715, 12)

In [101]:
# Converting 'Date Rptd' and 'DATE OCC' column data type into datetime

crime['Date Rptd'] = pd.to_datetime(crime['Date Rptd'], format='%Y-%m-%d')
crime['DATE OCC'] = pd.to_datetime(crime['DATE OCC'], format='%Y-%m-%d')
crime['DATE OCC']


0        2020-05-12
1        2020-06-04
2        2020-08-17
3        2020-01-27
4        2020-07-14
            ...    
185710   2023-05-25
185711   2023-01-26
185712   2023-03-22
185713   2023-04-12
185714   2023-03-05
Name: DATE OCC, Length: 185715, dtype: datetime64[ns]

In [102]:
# Converting 'TIME OCC' to a timestamp

crime['TIME OCC'] = crime['TIME OCC'].astype(str)
crime['TIME OCC'] = crime['TIME OCC'].str.zfill(4)
dt_time_format = pd.to_datetime(crime['TIME OCC'], format='%H%M')
crime['TIME OCC'] = dt_time_format.dt.time

In [None]:
# Creating a new column to extract the hour component for each crime occurrence 

crime['Hour Occurred'] = dt_time_format.dt.hour

# Counting the frequency of crimes for each hour to see which one 
# has the highest number of occurrences

crime_freq_per_hr = crime['Hour Occurred'].value_counts()

# Answer for Question #1

peak_crime_hour = crime_freq_per_hr.idxmax()

In [112]:
crime.head(5)

Unnamed: 0,DR_NO,Date Rptd,DATE OCC,TIME OCC,AREA NAME,Crm Cd Desc,Vict Age,Vict Sex,Vict Descent,Weapon Desc,Status Desc,LOCATION,Hour Occurred
0,220314085,2022-07-22,2020-05-12,11:10:00,Southwest,THEFT OF IDENTITY,27,F,B,,Invest Cont,2500 S SYCAMORE AV,11
1,222013040,2022-08-06,2020-06-04,16:20:00,Olympic,THEFT OF IDENTITY,60,M,H,,Invest Cont,3300 SAN MARINO ST,16
2,220614831,2022-08-18,2020-08-17,12:00:00,Hollywood,THEFT OF IDENTITY,28,M,H,,Invest Cont,1900 TRANSIENT,12
3,231207725,2023-02-27,2020-01-27,06:35:00,77th Street,THEFT OF IDENTITY,37,M,H,,Invest Cont,6200 4TH AV,6
4,220213256,2022-07-14,2020-07-14,09:00:00,Rampart,THEFT OF IDENTITY,79,M,B,,Invest Cont,1200 W 7TH ST,9


In [130]:
crimes_at_night = crime[
    (crime['Hour Occurred'] >= 20) | (crime['Hour Occurred'] <= 3)]

crimes_at_night['AREA NAME'].value_counts()

AREA NAME
Central        4909
77th Street    3642
Hollywood      3581
Southwest      3455
Olympic        3162
Southeast      3105
Pacific        2954
Newton         2940
Rampart        2673
N Hollywood    2660
Van Nuys       2537
Wilshire       2527
Northeast      2451
West Valley    2374
Topanga        2329
West LA        2157
Mission        2112
Devonshire     2108
Harbor         2053
Hollenbeck     1933
Foothill       1801
Name: count, dtype: int64

In [None]:
# Answer to question #2

peak_night_crime_location = crimes_at_night['AREA NAME'].value_counts().idxmax()
peak_night_crime_location

'Central'