# Introduction

Welcome to this project analyzing crime data from Los Angeles, California, known as the "City of Angels." 🌴✨

Los Angeles is famous for its sunny weather, palm trees, and entertainment industry. However, like any major city, it also faces challenges, including a high crime rate. This project aims to explore a crime dataset to identify patterns in criminal behavior and provide valuable insights to the Los Angeles Police Department (LAPD).

By analyzing the `crimes.csv` file, we will address the following questions:

- **Which hour has the highest frequency of crimes?** 
- **Which area has the largest frequency of night crimes (crimes committed between 10 PM and 3:59 AM)?** 
- **What is the number of crimes committed against victims of different age groups?** 

These analyses will assist the LAPD in effectively allocating resources to tackle crime in various areas. Let’s dive into the data!

In [2]:
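```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Parse the two date columns as datetimes and keep TIME OCC as text,
# so its HHMM digits (including any leading zeros) are preserved.
crimes = pd.read_csv("Dataset/crimes.csv", parse_dates=["Date Rptd", "DATE OCC"], dtype={"TIME OCC": str})
crimes.head()
```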

|   | DR_NO | Date Rptd | DATE OCC | TIME OCC | AREA NAME | Crm Cd Desc | Vict Age | Vict Sex | Vict Descent | Weapon Desc | Status Desc | LOCATION |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 220314085 | 2022-07-22 | 2020-05-12 | 1110 | Southwest | THEFT OF IDENTITY | 27 | F | B | | Invest Cont | 2500 S SYCAMORE AV |
| 1 | 222013040 | 2022-08-06 | 2020-06-04 | 1620 | Olympic | THEFT OF IDENTITY | 60 | M | H | | Invest Cont | 3300 SAN MARINO ST |
| 2 | 220614831 | 2022-08-18 | 2020-08-17 | 1200 | Hollywood | THEFT OF IDENTITY | 28 | M | H | | Invest Cont | 1900 TRANSIENT |
| 3 | 231207725 | 2023-02-27 | 2020-01-27 | 635 | 77th Street | THEFT OF IDENTITY | 37 | M | H | | Invest Cont | 6200 4TH AV |
| 4 | 220213256 | 2022-07-14 | 2020-07-14 | 900 | Rampart | THEFT OF IDENTITY | 79 | M | B | | Invest Cont | 1200 W 7TH ST |
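The `dtype={"TIME OCC": str}` argument matters for the hour extraction that follows. The minimal sketch below uses two hypothetical rows in an in-memory CSV (not the real `crimes.csv`) to show what happens to zero-padded HHMM values with and without it:

```python
import io

import pandas as pd

# Two hypothetical rows in the same DATE OCC / TIME OCC layout as crimes.csv.
csv_text = "DATE OCC,TIME OCC\n2021-03-01,0635\n2021-03-02,0900\n"

# Default parsing: pandas infers an integer column and drops the leading zero.
as_int = pd.read_csv(io.StringIO(csv_text))
# dtype=str keeps the raw text, so every value stays in four-digit HHMM form.
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"TIME OCC": str})

# Slicing the text of the integer column would give the wrong "hours".
wrong_hours = as_int["TIME OCC"].astype(str).str[:2].tolist()

print(as_int["TIME OCC"].tolist())  # [635, 900]
print(as_str["TIME OCC"].tolist())  # ['0635', '0900']
print(wrong_hours)                  # ['63', '90']
```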


In [4]:
```python
crimes.shape
```

```
(185715, 12)
```

## 1 Which hour has the highest frequency of crimes?

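The `TIME OCC` column stores the time of occurrence in 24-hour HHMM format (for example, `1620` is 4:20 PM). Because we read this column as a string, we can extract the hour by keeping only its first two characters, which assumes every value is zero-padded to four digits (e.g., `0635` for 6:35 AM).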
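A minimal sketch of this step on a few sample `TIME OCC` strings (illustrative values in the same HHMM format, not read from the file) compares slicing with an arithmetic alternative:

```python
import pandas as pd

# Sample TIME OCC values in 24-hour HHMM text form.
time_occ = pd.Series(["1110", "1620", "1200", "0635", "0900"], name="TIME OCC")

# Keep the first two characters (HH) and convert them to integers.
hours_from_text = time_occ.str[:2].astype(int)

# Equivalent arithmetic route: integer-dividing HHMM by 100 drops the minutes.
hours_from_math = time_occ.astype(int) // 100

print(hours_from_text.tolist())                 # [11, 16, 12, 6, 9]
print(hours_from_text.equals(hours_from_math))  # True
```

The integer-division route does not depend on the leading zeros being present (`635 // 100` is still `6`), so it is a reasonable fallback if the column was ever parsed as numbers.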

In [19]:
```python
hour_crimes = crimes["TIME OCC"].str[:2]
```


In [38]:
```python
hour_crimes_count = hour_crimes.value_counts()
```

In [39]:
```python
hour_crimes_count
```

```
TIME OCC
12    13663
18    10125
17     9964
20     9579
15     9393
19     9262
16     9224
14     8872
11     8787
0      8728
21     8701
22     8531
13     8474
10     8440
8      7523
23     7419
9      7092
1      5836
6      5621
7      5403
2      4726
3      3943
4      3238
5      3171
Name: count, dtype: int64
```

The raw counts are easier to compare when expressed as relative frequencies, that is, as percentages of all recorded crimes.
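The percentage calculation below (counts divided by their total, times 100) is the same quantity pandas returns from `value_counts(normalize=True)`. A short sketch on a handful of hypothetical hour values shows that the two routes agree:

```python
import pandas as pd

# Hypothetical hour values standing in for hour_crimes.
hours = pd.Series([12, 12, 18, 0, 12, 5, 18, 12])

# Manual route: each count divided by the total count, times 100.
manual = hours.value_counts() / hours.value_counts().sum() * 100
# One-step route: normalize=True returns proportions that sum to 1.
normalized = hours.value_counts(normalize=True) * 100

print(manual.loc[12])     # 50.0
print(normalized.sum())   # 100.0
```

Here `hours.value_counts().sum()` equals `len(hours)` because the series has no missing values; `value_counts` drops `NaN` by default, so the two totals can differ on data with gaps.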

In [29]:
```python
# Convert the two-character hour strings (e.g. "06") to integers 0-23.
hour_crimes = pd.to_numeric(hour_crimes)
```

In [37]:
```python
hour_crimes
```

```
0         11
1         16
2         12
3          6
4          9
          ..
185710    11
185711    18
185712    10
185713    16
185714     9
Name: TIME OCC, Length: 185715, dtype: int64
```

In [30]:
```python
total_counts = hour_crimes.value_counts().sum()
```

In [36]:
```python
total_counts
```

```
185715
```

In [40]:
```python
percentage_crimes = (hour_crimes_count / total_counts) * 100
```

In [41]:
```python
percentage_crimes
```

```
TIME OCC
12    7.356972
18    5.451902
17    5.365210
20    5.157903
15    5.057750
19    4.987212
16    4.966750
14    4.777212
11    4.731443
0     4.699674
21    4.685136
22    4.593598
13    4.562906
10    4.544598
8     4.050831
23    3.994831
9     3.818755
1     3.142449
6     3.026681
7     2.909297
2     2.544759
3     2.123146
4     1.743532
5     1.707455
Name: count, dtype: float64
```

In [42]:
```python
percentage_crimes.sum()
```

```
100.0
```

The highest frequency of crimes occurs during the 12 PM hour (12:00 to 12:59, i.e., around noon), which accounts for about 7.4% of all recorded incidents. The lowest frequency occurs during the 5 AM hour (5:00 to 5:59), at only about 1.7% of the total.
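The peak and quietest hours can also be read off programmatically with `idxmax` and `idxmin`, which avoids confusing hour 12 (noon) with hour 0 (midnight) when translating to a 12-hour clock. The sketch below uses a few rounded values copied from the percentage output above; `hour_label` is a hypothetical helper, not part of pandas:

```python
import pandas as pd


def hour_label(hour: int) -> str:
    """Convert a 0-23 hour to a 12-hour clock label (0 -> '12 AM', 12 -> '12 PM')."""
    suffix = "AM" if hour < 12 else "PM"
    return f"{hour % 12 or 12} {suffix}"


# A subset of percentage_crimes, rounded, keyed by 24-hour hour.
percentages = pd.Series({12: 7.36, 18: 5.45, 0: 4.70, 5: 1.71})

peak_hour = percentages.idxmax()      # index label with the largest share
quietest_hour = percentages.idxmin()  # index label with the smallest share

print(hour_label(peak_hour), hour_label(quietest_hour))  # 12 PM 5 AM
```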