![Los Angeles skyline](la_skyline.jpg)

Los Angeles, California 😎. The City of Angels. Tinseltown. The Entertainment Capital of the World! 

Los Angeles is known for its warm weather, palm trees, sprawling coastline, and Hollywood, and for producing some of the most iconic films and songs. However, as with any highly populated city, it isn't always glamorous, and there can be a large volume of crime. That's where you can help!

You have been asked to support the Los Angeles Police Department (LAPD) by analyzing crime data to identify patterns in criminal behavior. They plan to use your insights to allocate resources effectively to tackle various crimes in different areas.

## The Data

They have provided you with a single dataset to use. A summary and preview are provided below.

It is a modified version of the original data, which is publicly available from Los Angeles Open Data.

### crimes.csv

| Column     | Description              |
|------------|--------------------------|
| `'DR_NO'` | Division of Records Number: Official file number made up of a 2-digit year, area ID, and 5 digits. |
| `'Date Rptd'` | Date reported - MM/DD/YYYY. |
| `'DATE OCC'` | Date of occurrence - MM/DD/YYYY. |
| `'TIME OCC'` | Time of occurrence, in 24-hour military time. |
| `'AREA NAME'` | The 21 Geographic Areas or Patrol Divisions are also given a name designation that references a landmark or the surrounding community that it is responsible for. For example, the 77th Street Division is located at the intersection of South Broadway and 77th Street, serving neighborhoods in South Los Angeles. |
| `'Crm Cd Desc'` | Indicates the crime committed. |
| `'Vict Age'` | Victim's age in years. |
| `'Vict Sex'` | Victim's sex: `F`: Female, `M`: Male, `X`: Unknown. |
| `'Vict Descent'` | Victim's descent:<ul><li>`A` - Other Asian</li><li>`B` - Black</li><li>`C` - Chinese</li><li>`D` - Cambodian</li><li>`F` - Filipino</li><li>`G` - Guamanian</li><li>`H` - Hispanic/Latin/Mexican</li><li>`I` - American Indian/Alaskan Native</li><li>`J` - Japanese</li><li>`K` - Korean</li><li>`L` - Laotian</li><li>`O` - Other</li><li>`P` - Pacific Islander</li><li>`S` - Samoan</li><li>`U` - Hawaiian</li><li>`V` - Vietnamese</li><li>`W` - White</li><li>`X` - Unknown</li><li>`Z` - Asian Indian</li></ul> |
| `'Weapon Desc'` | Description of the weapon used (if applicable). |
| `'Status Desc'` | Crime status. |
| `'LOCATION'` | Street address of the crime. |
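
The `'DR_NO'` description above can be illustrated with a small sketch. It assumes a 9-digit layout of a 2-digit year, a 2-digit area ID, and a 5-digit sequence number; the 2-digit width of the area ID is an assumption inferred from the preview rows, and `split_dr_no` is a hypothetical helper, not part of the dataset or of any library.

```python
def split_dr_no(dr_no: int) -> dict:
    """Split a DR_NO into year, area ID, and sequence number.

    Assumed layout (not confirmed by the data provider): YY + AA + NNNNN.
    """
    digits = str(dr_no).zfill(9)  # pad in case a leading zero was dropped
    return {
        "year": int(digits[:2]),      # 2-digit year
        "area_id": int(digits[2:4]),  # assumed 2-digit area ID
        "sequence": int(digits[4:]),  # remaining 5 digits
    }


parts = split_dr_no(220314085)
print(parts)  # {'year': 22, 'area_id': 3, 'sequence': 14085}
```

If the assumption holds, the same split could be used to cross-check `'DR_NO'` against `'AREA NAME'`.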

In [41]:
# Re-run this cell
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
crimes = pd.read_csv("crimes.csv", parse_dates=["Date Rptd", "DATE OCC"], dtype={"TIME OCC": str})
crimes.head()

|   | DR_NO | Date Rptd | DATE OCC | TIME OCC | AREA NAME | Crm Cd Desc | Vict Age | Vict Sex | Vict Descent | Weapon Desc | Status Desc | LOCATION |
|---|-------|-----------|----------|----------|-----------|-------------|----------|----------|--------------|-------------|-------------|----------|
| 0 | 220314085 | 2022-07-22 | 2020-05-12 | 1110 | Southwest | THEFT OF IDENTITY | 27 | F | B |  | Invest Cont | 2500 S SYCAMORE AV |
| 1 | 222013040 | 2022-08-06 | 2020-06-04 | 1620 | Olympic | THEFT OF IDENTITY | 60 | M | H |  | Invest Cont | 3300 SAN MARINO ST |
| 2 | 220614831 | 2022-08-18 | 2020-08-17 | 1200 | Hollywood | THEFT OF IDENTITY | 28 | M | H |  | Invest Cont | 1900 TRANSIENT |
| 3 | 231207725 | 2023-02-27 | 2020-01-27 | 635 | 77th Street | THEFT OF IDENTITY | 37 | M | H |  | Invest Cont | 6200 4TH AV |
| 4 | 220213256 | 2022-07-14 | 2020-07-14 | 900 | Rampart | THEFT OF IDENTITY | 79 | M | B |  | Invest Cont | 1200 W 7TH ST |


In [42]:
# Start coding here
# Use as many cells as you need
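# Inspect column dtypes, missing values, and the number of crimes per area
crimes.info()
crimes.isnull().sum()
crimes['AREA NAME'].value_counts()

# Zero-pad TIME OCC to four digits (e.g. '635' -> '0635'), then parse it into time objects
crimes['TIME OCC'] = crimes['TIME OCC'].str.zfill(4)
crimes['TIME OCC'] = pd.to_datetime(crimes['TIME OCC'], format='%H%M').dt.time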

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 185715 entries, 0 to 185714
Data columns (total 12 columns):
 #   Column        Non-Null Count   Dtype         
---  ------        --------------   -----         
 0   DR_NO         185715 non-null  int64         
 1   Date Rptd     185715 non-null  datetime64[ns]
 2   DATE OCC      185715 non-null  datetime64[ns]
 3   TIME OCC      185715 non-null  object        
 4   AREA NAME     185715 non-null  object        
 5   Crm Cd Desc   185715 non-null  object        
 6   Vict Age      185715 non-null  int64         
 7   Vict Sex      185704 non-null  object        
 8   Vict Descent  185705 non-null  object        
 9   Weapon Desc   73502 non-null   object        
 10  Status Desc   185715 non-null  object        
 11  LOCATION      185715 non-null  object        
dtypes: datetime64[ns](2), int64(2), object(8)
memory usage: 17.0+ MB
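
The `'TIME OCC'` conversion in the cell above can be checked on a few hand-made values. This is a minimal sketch on a toy Series (`raw_times` is made up for illustration, not taken from `crimes.csv`); it shows how padding gives every value a fixed four-character `HHMM` shape before parsing.

```python
import pandas as pd

# Toy values in the same style as the raw TIME OCC strings (illustrative only)
raw_times = pd.Series(["635", "1110", "5", "0"])

# Left-pad with zeros so every string has the form HHMM
padded = raw_times.str.zfill(4)

# Parse as hour + minute and keep only the time-of-day part
parsed = pd.to_datetime(padded, format="%H%M").dt.time

print(padded.tolist())  # ['0635', '1110', '0005', '0000']
print(parsed.tolist())
```

After this step the column holds `datetime.time` objects, which is why `info()` still reports it as `object`.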


In [43]:
crimes.describe()


|       | DR_NO       | Vict Age  |
|-------|-------------|-----------|
| count | 185715.0    | 185715.0  |
| mean  | 225578100.0 | 39.999257 |
| std   | 5017438.0   | 15.450227 |
| min   | 200907200.0 | 2.0       |
| 25%   | 221010800.0 | 28.0      |
| 50%   | 222011400.0 | 37.0      |
| 75%   | 231004400.0 | 50.0      |
| max   | 239909700.0 | 99.0      |


In [44]:
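# Show the rows where the victim's sex or descent is missing
null_victims = crimes[crimes['Vict Sex'].isnull() | crimes['Vict Descent'].isnull()]
display(null_victims)

# Fill missing victim fields with 'X' (Unknown) and missing weapon descriptions with 'Unknown'
crimes['Vict Sex'] = crimes['Vict Sex'].fillna('X')
crimes['Vict Descent'] = crimes['Vict Descent'].fillna('X')
crimes['Weapon Desc'] = crimes['Weapon Desc'].fillna('Unknown')

# Confirm that no missing values remain in these columns
print(crimes[['Vict Sex', 'Vict Descent', 'Weapon Desc']].isnull().sum())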


|   | DR_NO | Date Rptd | DATE OCC | TIME OCC | AREA NAME | Crm Cd Desc | Vict Age | Vict Sex | Vict Descent | Weapon Desc | Status Desc | LOCATION |
|---|-------|-----------|----------|----------|-----------|-------------|----------|----------|--------------|-------------|-------------|----------|
| 3457 | 221615369 | 2022-12-21 | 2022-12-19 | 08:00:00 | Foothill | THEFT-GRAND ($950.01 & OVER)EXCPT,GUNS,FOWL,LI... | 21 |  | H |  | Invest Cont | 11000 ARMINTA ST |
| 15004 | 220314540 | 2022-08-01 | 2022-07-30 | 03:00:00 | Southwest | THEFT PLAIN - PETTY ($950 & UNDER) | 21 |  | B |  | Invest Cont | 2400 S WESTERN AV |
| 32086 | 221813489 | 2022-06-29 | 2022-06-29 | 11:30:00 | Southeast | DISCHARGE FIREARMS/SHOTS FIRED | 22 |  |  | SEMI-AUTOMATIC PISTOL | Adult Arrest | 600 W 119TH ST |
| 38511 | 221615373 | 2022-12-21 | 2022-12-20 | 16:30:00 | Foothill | THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND ... | 21 |  |  |  | Invest Cont | 8500 TERHUNE AV |
| 38738 | 220218443 | 2022-10-12 | 2022-10-12 | 23:25:00 | Rampart | VANDALISM - FELONY ($400 & OVER, ALL CHURCH VA... | 21 |  |  |  | Invest Cont | 4100 ROSEWOOD AV |
| 48786 | 220211337 | 2022-06-04 | 2022-06-04 | 02:26:00 | Rampart | VANDALISM - FELONY ($400 & OVER, ALL CHURCH VA... | 21 |  |  |  | Invest Cont | 1400 W 12TH PL |
| 65540 | 221517206 | 2022-11-09 | 2022-11-08 | 10:00:00 | N Hollywood | BURGLARY | 83 |  |  |  | Invest Cont | 13000 VICTORY BL |
| 92698 | 220219989 | 2022-11-09 | 2022-11-09 | 17:55:00 | Rampart | VANDALISM - MISDEAMEANOR ($399 OR UNDER) | 21 |  |  | STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE) | Invest Cont | 400 WITMER ST |
| 110168 | 230111779 | 2023-05-03 | 2023-05-02 | 22:30:00 | Central | BURGLARY FROM VEHICLE | 36 |  |  |  | Invest Cont | 1800 S MAIN ST |
| 119171 | 230114251 | 2023-06-14 | 2023-06-14 | 02:50:00 | Central | BURGLARY | 40 |  |  |  | Invest Cont | 1000 S HOPE ST |


Vict Sex        0
Vict Descent    0
Weapon Desc     0
dtype: int64


In [45]:
# Overview of Crime Categories
crime_counts = crimes['Crm Cd Desc'].value_counts()
print(crimes['Date Rptd'].value_counts())
pd.set_option('display.max_rows', None)
display(crime_counts)

2023-02-03    762
2023-02-02    721
2023-01-03    716
2022-12-02    700
2022-06-02    687
2022-12-05    667
2022-09-02    660
2022-11-03    658
2022-08-03    653
2022-11-02    649
2022-10-03    643
2022-08-01    642
2022-12-12    638
2022-08-02    631
2023-02-06    628
2023-01-04    608
2022-12-08    607
2022-10-04    603
2022-06-21    597
2022-07-05    589
2022-08-08    586
2022-06-06    585
2023-02-01    584
2022-06-01    578
2022-09-06    577
2022-10-05    574
2022-11-01    574
2022-12-01    565
2022-09-12    563
2023-01-30    559
2022-06-13    558
2022-06-03    558
2023-04-03    552
2022-12-06    545
2022-10-24    544
2023-02-13    543
2023-02-09    543
2023-02-21    542
2023-06-12    540
2023-03-03    540
2022-12-23    539
2022-06-10    539
2022-12-03    538
2022-08-24    538
2022-09-19    537
2023-06-26    536
2022-06-07    533
2022-09-07    532
2023-06-08    531
2022-07-02    531
2022-12-09    529
2023-03-06    529
2023-01-06    529
2022-09-08    528
2023-03-13    528
2022-12-20

THEFT OF IDENTITY                                           22670
BATTERY - SIMPLE ASSAULT                                    19694
BURGLARY FROM VEHICLE                                       13799
ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT              13215
INTIMATE PARTNER - SIMPLE ASSAULT                           11981
THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)         11484
VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)     10719
THEFT PLAIN - PETTY ($950 & UNDER)                          10603
BURGLARY                                                    10268
THEFT-GRAND ($950.01 & OVER)EXCPT,GUNS,FOWL,LIVESTK,PROD     7057
ROBBERY                                                      6470
CRIMINAL THREATS - NO WEAPON DISPLAYED                       5133
VANDALISM - MISDEAMEANOR ($399 OR UNDER)                     4090
BRANDISH WEAPON                                              3662
INTIMATE PARTNER - AGGRAVATED ASSAULT                        3129
VIOLATION 

In [46]:
# Hour with the highest frequency of crimes

# Extract the hour
crimes['HOUR OCC'] = crimes['TIME OCC'].apply(lambda x: x.hour)

# Count occurrences per hour.
hourly_crime_counts = crimes['HOUR OCC'].value_counts().sort_index()

# Find hour with the highest frequency of crime.
peak_crime_hour = hourly_crime_counts.idxmax()
print(f'{peak_crime_hour} : 00')




12 : 00
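
The peak-hour logic above can be sanity-checked on a tiny hand-made example. This sketch uses a toy Series of zero-padded `HHMM` strings (`toy_times` is illustrative, not real data) and takes the hour from the first two characters, which is an alternative to extracting `.hour` from parsed time objects rather than the approach used in the cell above.

```python
import pandas as pd

# Toy HHMM strings: three of the five fall in the 12:00 hour
toy_times = pd.Series(["0635", "1200", "1230", "1245", "2310"])

# The first two characters are the hour
toy_hours = toy_times.str[:2].astype(int)

# Count crimes per hour and pick the hour with the highest count
toy_counts = toy_hours.value_counts().sort_index()
toy_peak = int(toy_counts.idxmax())

print(f"{toy_peak:02d}:00")  # 12:00
```

Note that `idxmax()` returns only the first label with the maximum count, so ties between hours would need a separate check.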


In [47]:
# Finding the area which has the largest frequency of night crimes (crimes committed between 10pm and 3:59am).

# Define the start and end time for night crimes
start_time = pd.to_datetime("22:00", format="%H:%M").time()  # 10 PM
end_time = pd.to_datetime("03:59", format="%H:%M").time()    # 3:59 AM

# Filter the crimes
night_crimes = crimes[(crimes['TIME OCC'] >= start_time) | (crimes['TIME OCC'] <= end_time)]

# Count the occurrences of crimes by area
night_crime_counts = night_crimes['AREA NAME'].value_counts()

# Find the area with the highest frequency of night crimes
peak_night_crime_location = night_crime_counts.idxmax()

# Display the result
print(peak_night_crime_location)

Central
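
The night-crime filter above combines its two conditions with `|` rather than `&` because the window wraps past midnight: no single time is both at or after 22:00 and at or before 03:59. A minimal sketch of that wrap-around test on plain `datetime.time` values follows; `is_night` is a hypothetical helper written for illustration.

```python
import datetime as dt


def is_night(t: dt.time,
             start: dt.time = dt.time(22, 0),
             end: dt.time = dt.time(3, 59)) -> bool:
    """Return True if t falls in the window [start, end], allowing wrap-around."""
    if start <= end:
        # Ordinary window within a single day
        return start <= t <= end
    # Window crosses midnight: late evening OR early morning
    return t >= start or t <= end


sample = [dt.time(21, 59), dt.time(22, 0), dt.time(0, 30), dt.time(3, 59), dt.time(4, 0)]
flags = [is_night(t) for t in sample]
print(flags)  # [False, True, True, True, False]
```

Both endpoints are inclusive here, matching the `>=` and `<=` comparisons in the cell above.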


In [48]:
# Number of crimes committed against victims of different age groups

# Define the bins and labels for the age groups
bins = [0, 18, 26, 35, 45, 55, 65, float('inf')]
labels = ['0-17', '18-25', '26-34', '35-44', '45-54', '55-64', '65+']

# Create a new column for age group
crimes['AGE GROUP'] = pd.cut(crimes['Vict Age'], bins=bins, labels=labels, right=False)

# Count the frequency of crimes in each age group
victim_ages = crimes['AGE GROUP'].value_counts().sort_index()

# Display the resulting Series
print(victim_ages)

0-17      4528
18-25    28291
26-34    47470
35-44    42157
45-54    28353
55-64    20169
65+      14747
Name: AGE GROUP, dtype: int64
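
With `right=False`, each bin includes its left edge and excludes its right edge, which is what makes the bin edges `18, 26, 35, ...` line up with the labels `'0-17'`, `'18-25'`, `'26-34'`, and so on. A short sketch on boundary ages (`boundary_ages` is a made-up Series for illustration) shows where each edge value lands:

```python
import pandas as pd

bins = [0, 18, 26, 35, 45, 55, 65, float("inf")]
labels = ["0-17", "18-25", "26-34", "35-44", "45-54", "55-64", "65+"]

# Ages that sit on or next to the bin edges
boundary_ages = pd.Series([17, 18, 25, 26, 64, 65])

# right=False makes the intervals [0, 18), [18, 26), ..., [65, inf)
groups = pd.cut(boundary_ages, bins=bins, labels=labels, right=False)

print(groups.astype(str).tolist())
# ['0-17', '18-25', '18-25', '26-34', '55-64', '65+']
```

With the default `right=True`, an 18-year-old would instead fall into the interval `(0, 18]` and be labeled `'0-17'`.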
