# Analyzing Crime In Los Angeles

### Background
Los Angeles, California 😎. The City of Angels. Tinseltown. The Entertainment Capital of the World! The city is known for its warm weather, palm trees, sprawling coastline, and Hollywood, and it has produced some of the most iconic films and songs!

However, as with any densely populated city, life there isn't always glamorous, and the volume of crime can be high. That is where this project comes in.

### Objective
Support the Los Angeles Police Department (LAPD) by cleaning and analyzing the crime data to identify patterns in criminal behavior. The LAPD plans to use our insights to allocate resources effectively to tackle various crimes in different areas.

### TO DO
- Clean and prepare the dataset for analysis
- Perform exploratory data analysis
- Formulate and address questions related to crime trends, patterns, and factors influencing crime rates.

<hr>
<hr>

In [1]:
# Importing Necessary Libraries

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import datetime

<hr>

### Data Collection & Inspection

In [2]:
print("Loading Dataset.....")

df = pd.read_csv("crimes.csv")
df.head()

Loading Dataset.....


| | DR_NO | Date Rptd | DATE OCC | TIME OCC | AREA | AREA NAME | Rpt Dist No | Part 1-2 | Crm Cd | Crm Cd Desc | ... | Status | Status Desc | Crm Cd 1 | Crm Cd 2 | Crm Cd 3 | Crm Cd 4 | LOCATION | Cross Street | LAT | LON |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10304468 | 01/08/2020 12:00:00 AM | 01/08/2020 12:00:00 AM | 2230 | 3 | Southwest | 377 | 2 | 624 | BATTERY - SIMPLE ASSAULT | ... | AO | Adult Other | 624.0 | | | | 1100 W 39TH PL | | 34.0141 | -118.2978 |
| 1 | 190101086 | 01/02/2020 12:00:00 AM | 01/01/2020 12:00:00 AM | 330 | 1 | Central | 163 | 2 | 624 | BATTERY - SIMPLE ASSAULT | ... | IC | Invest Cont | 624.0 | | | | 700 S HILL ST | | 34.0459 | -118.2545 |
| 2 | 200110444 | 04/14/2020 12:00:00 AM | 02/13/2020 12:00:00 AM | 1200 | 1 | Central | 155 | 2 | 845 | SEX OFFENDER REGISTRANT OUT OF COMPLIANCE | ... | AA | Adult Arrest | 845.0 | | | | 200 E 6TH ST | | 34.0448 | -118.2474 |
| 3 | 191501505 | 01/01/2020 12:00:00 AM | 01/01/2020 12:00:00 AM | 1730 | 15 | N Hollywood | 1543 | 2 | 745 | VANDALISM - MISDEAMEANOR ($399 OR UNDER) | ... | IC | Invest Cont | 745.0 | 998.0 | | | 5400 CORTEEN PL | | 34.1685 | -118.4019 |
| 4 | 191921269 | 01/01/2020 12:00:00 AM | 01/01/2020 12:00:00 AM | 415 | 19 | Mission | 1998 | 2 | 740 | VANDALISM - FELONY ($400 & OVER, ALL CHURCH VA... | ... | IC | Invest Cont | 740.0 | | | | 14400 TITUS ST | | 34.2198 | -118.4468 |


In [3]:
# df.shape is a (rows, columns) tuple
n_rows, n_cols = df.shape
print(f"Number of records (rows) in dataset : {n_rows}")
print(f"Number of attributes (columns) in dataset : {n_cols}")

Number of records (rows) in dataset : 820599
Number of attributes (columns) in dataset : 28


In [4]:
print("Information On Dataset")
df.info()

Information On Dataset
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 820599 entries, 0 to 820598
Data columns (total 28 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   DR_NO           820599 non-null  int64  
 1   Date Rptd       820599 non-null  object 
 2   DATE OCC        820599 non-null  object 
 3   TIME OCC        820599 non-null  int64  
 4   AREA            820599 non-null  int64  
 5   AREA NAME       820599 non-null  object 
 6   Rpt Dist No     820599 non-null  int64  
 7   Part 1-2        820599 non-null  int64  
 8   Crm Cd          820599 non-null  int64  
 9   Crm Cd Desc     820599 non-null  object 
 10  Mocodes         707114 non-null  object 
 11  Vict Age        820599 non-null  int64  
 12  Vict Sex        712653 non-null  object 
 13  Vict Descent    712645 non-null  object 
 14  Premis Cd       820589 non-null  float64
 15  Premis Desc     820116 non-null  object 
 16  Weapon Used Cd  286078 non-null  
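
The `info()` listing reports `Date Rptd` and `DATE OCC` as `object` (plain strings) and `TIME OCC` as `int64`, so date arithmetic is not possible until they are converted. The sketch below shows one way this could be done. It runs on a small hypothetical `sample` frame copied from the preview rows rather than on the full file, and it assumes that the dates are month-first and that `TIME OCC` is a 24-hour `HHMM` integer. The new column names (`hour_occ`, `minute_occ`, `report_delay_days`) are illustrative, not part of the dataset.

```python
import pandas as pd

# Hypothetical sample mirroring the first three preview rows
sample = pd.DataFrame({
    "Date Rptd": ["01/08/2020 12:00:00 AM", "01/02/2020 12:00:00 AM", "04/14/2020 12:00:00 AM"],
    "DATE OCC": ["01/08/2020 12:00:00 AM", "01/01/2020 12:00:00 AM", "02/13/2020 12:00:00 AM"],
    "TIME OCC": [2230, 330, 1200],
})

# Assumption: month/day/year with a 12-hour clock, as the preview suggests
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"
for col in ["Date Rptd", "DATE OCC"]:
    sample[col] = pd.to_datetime(sample[col], format=DATE_FORMAT)

# Assumption: TIME OCC encodes HHMM, so 330 means 03:30 and 2230 means 22:30
sample["hour_occ"] = sample["TIME OCC"] // 100
sample["minute_occ"] = sample["TIME OCC"] % 100

# Days between the incident and the report
sample["report_delay_days"] = (sample["Date Rptd"] - sample["DATE OCC"]).dt.days

print(sample.dtypes)
print(sample[["hour_occ", "minute_occ", "report_delay_days"]])
```

Passing an explicit `format` avoids per-row format inference, which tends to matter on a file with several hundred thousand rows.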

#### Descriptive Statistics Of Dataset

In [5]:
df.describe(include=[np.number]).round(3)

# Equivalent here to df.describe().round(3),
# because describe() summarizes only numeric columns by default

| | DR_NO | TIME OCC | AREA | Rpt Dist No | Part 1-2 | Crm Cd | Vict Age | Premis Cd | Weapon Used Cd | Crm Cd 1 | Crm Cd 2 | Crm Cd 3 | Crm Cd 4 | LAT | LON |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 820599.0 | 820599.0 | 820599.0 | 820599.0 | 820599.0 | 820599.0 | 820599.0 | 820589.0 | 286078.0 | 820589.0 | 60413.0 | 2025.0 | 60.0 | 820599.0 | 820599.0 |
| mean | 216129900.0 | 1335.627 | 10.712 | 1117.592 | 1.414 | 500.804 | 29.806 | 305.759 | 362.917 | 500.542 | 957.478 | 983.615 | 990.75 | 33.605 | -116.726 |
| std | 10830450.0 | 654.021 | 6.094 | 609.361 | 0.493 | 207.808 | 21.777 | 216.67 | 123.754 | 207.596 | 111.524 | 52.845 | 27.908 | 3.97 | 13.786 |
| min | 817.0 | 1.0 | 1.0 | 101.0 | 1.0 | 110.0 | -3.0 | 101.0 | 101.0 | 110.0 | 210.0 | 310.0 | 821.0 | 0.0 | -118.668 |
| 25% | 210204600.0 | 900.0 | 6.0 | 621.0 | 1.0 | 331.0 | 7.0 | 101.0 | 310.0 | 331.0 | 998.0 | 998.0 | 998.0 | 34.01 | -118.429 |
| 50% | 220117600.0 | 1415.0 | 11.0 | 1142.0 | 1.0 | 442.0 | 31.0 | 203.0 | 400.0 | 442.0 | 998.0 | 998.0 | 998.0 | 34.058 | -118.319 |
| 75% | 222005600.0 | 1900.0 | 16.0 | 1617.0 | 2.0 | 626.0 | 45.0 | 501.0 | 400.0 | 626.0 | 998.0 | 998.0 | 998.0 | 34.162 | -118.273 |
| max | 239916500.0 | 2359.0 | 21.0 | 2199.0 | 2.0 | 956.0 | 120.0 | 976.0 | 516.0 | 956.0 | 999.0 | 999.0 | 999.0 | 34.334 | 0.0 |
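
A few extremes in this summary look like placeholders rather than real measurements: `Vict Age` has a minimum of -3, `LAT` has a minimum of 0.0, and `LON` has a maximum of 0.0, while the bulk of the coordinates sit near 34 and -118. The sketch below shows one possible way to mark such values as missing before analysis. It uses a small hypothetical `sample` frame, and the rules themselves (non-positive ages mean "unknown", a 0/0 coordinate pair means "no location") are assumptions that should be checked against the dataset's documentation.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: plausible values mixed with placeholder-like ones
sample = pd.DataFrame({
    "Vict Age": [31, -3, 0, 45],
    "LAT": [34.0141, 0.0, 34.0459, 34.2198],
    "LON": [-118.2978, 0.0, -118.2545, -118.4468],
})

cleaned = sample.copy()

# Assumption: ages of zero or below stand for "unknown"; where() keeps
# values that satisfy the condition and sets the rest to NaN
cleaned["Vict Age"] = cleaned["Vict Age"].where(cleaned["Vict Age"] > 0)

# Assumption: a (0, 0) coordinate pair means the location was not recorded
no_coords = (cleaned["LAT"] == 0) & (cleaned["LON"] == 0)
cleaned.loc[no_coords, ["LAT", "LON"]] = np.nan

# Count how many values were flagged in each column
print(cleaned.isna().sum())
```

Flagging the values as `NaN` rather than dropping the rows keeps the records available for questions that do not depend on age or location.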


In [6]:
df.describe(include=[object])

# Equivalent to
# df.describe(include=['O'])

| | Date Rptd | DATE OCC | AREA NAME | Crm Cd Desc | Mocodes | Vict Sex | Vict Descent | Premis Desc | Weapon Desc | Status | Status Desc | LOCATION | Cross Street |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 820599 | 820599 | 820599 | 820599 | 707114 | 712653 | 712645 | 820116 | 286078 | 820599 | 820599 | 820599 | 131214 |
| unique | 1385 | 1385 | 21 | 138 | 273856 | 5 | 20 | 306 | 79 | 6 | 6 | 63719 | 9688 |
| top | 02/03/2023 12:00:00 AM | 12/02/2022 12:00:00 AM | Central | VEHICLE - STOLEN | 344 | M | H | STREET | STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE) | IC | Invest Cont | 800 N ALAMEDA ST | BROADWAY |
| freq | 924 | 1130 | 55209 | 87888 | 33541 | 338824 | 251794 | 207601 | 153313 | 656897 | 656897 | 1486 | 2184 |
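
The `count` rows above and in the earlier outputs fall short of the 820599 total for several columns, which means those columns contain missing values. As a rough illustration, the sketch below turns the non-null counts reported in this notebook into a percentage of missing values per column. The variable names (`TOTAL_ROWS`, `non_null`, `missing_pct`) are illustrative only.

```python
import pandas as pd

# Total number of records, as reported by df.shape
TOTAL_ROWS = 820599

# Non-null counts copied from the info() and describe() outputs
non_null = pd.Series({
    "Mocodes": 707114,
    "Vict Sex": 712653,
    "Vict Descent": 712645,
    "Premis Cd": 820589,
    "Premis Desc": 820116,
    "Weapon Used Cd": 286078,
    "Weapon Desc": 286078,
    "Crm Cd 1": 820589,
    "Crm Cd 2": 60413,
    "Crm Cd 3": 2025,
    "Crm Cd 4": 60,
    "Cross Street": 131214,
})

# Share of missing values per column, in percent, largest first
missing_pct = ((1 - non_null / TOTAL_ROWS) * 100).round(2).sort_values(ascending=False)
print(missing_pct)
```

On the loaded frame itself, `df.isna().mean().mul(100).round(2)` should give the same percentages without copying counts by hand.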


In [7]:
print("Column Names")
df.columns

Column Names


Index(['DR_NO', 'Date Rptd', 'DATE OCC', 'TIME OCC', 'AREA', 'AREA NAME',
       'Rpt Dist No', 'Part 1-2', 'Crm Cd', 'Crm Cd Desc', 'Mocodes',
       'Vict Age', 'Vict Sex', 'Vict Descent', 'Premis Cd', 'Premis Desc',
       'Weapon Used Cd', 'Weapon Desc', 'Status', 'Status Desc', 'Crm Cd 1',
       'Crm Cd 2', 'Crm Cd 3', 'Crm Cd 4', 'LOCATION', 'Cross Street', 'LAT',
       'LON'],
      dtype='object')
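
The column names mix upper and lower case, spaces, and hyphens (for example `Part 1-2` and `Rpt Dist No`), which makes them awkward to type during cleaning. One optional convention, shown here only as a sketch, is to normalize them to snake_case. The `rename_map` name and the snake_case choice are assumptions for illustration, not something the dataset requires.

```python
import pandas as pd

# Column names exactly as listed by df.columns
columns = pd.Index([
    'DR_NO', 'Date Rptd', 'DATE OCC', 'TIME OCC', 'AREA', 'AREA NAME',
    'Rpt Dist No', 'Part 1-2', 'Crm Cd', 'Crm Cd Desc', 'Mocodes',
    'Vict Age', 'Vict Sex', 'Vict Descent', 'Premis Cd', 'Premis Desc',
    'Weapon Used Cd', 'Weapon Desc', 'Status', 'Status Desc', 'Crm Cd 1',
    'Crm Cd 2', 'Crm Cd 3', 'Crm Cd 4', 'LOCATION', 'Cross Street', 'LAT',
    'LON',
])

# Lowercase, then collapse every run of non-alphanumeric characters into "_"
snake = (
    columns.str.strip()
    .str.lower()
    .str.replace(r"[^a-z0-9]+", "_", regex=True)
    .str.strip("_")
)

# Mapping from original to normalized names
rename_map = dict(zip(columns, snake))
print(rename_map)
```

With the real frame this mapping could be applied through `df.rename(columns=rename_map)`, which leaves the original names recoverable from the dictionary.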

<hr>
<hr>