# Analyzing Crimes in Los Angeles

Los Angeles is known for its warm weather, palm trees, sprawling coastline, and Hollywood, as well as for producing some of the most iconic films and songs. However, as with any highly populated city, it isn't always glamorous, and there can be a large volume of crime. That's where you can help!

You have been asked to support the Los Angeles Police Department (LAPD) by analyzing crime data to identify patterns in criminal behavior. They plan to use your insights to allocate resources effectively to tackle various crimes in different areas.

For a detailed explanation and narrative of this notebook, please visit this **[link](https://noeyislearning.craft.me/AVqMasVLPtbTtc)**.

## I. Information

They have provided you with a single dataset to use: `crimes.csv`. A summary and a preview are provided below.

It is a modified version of the original data, which is publicly available from Los Angeles Open Data.

| Column     | Description              |
|------------|--------------------------|
| `'DR_NO'` | Division of Records Number: Official file number made up of a 2-digit year, area ID, and 5 digits. |
| `'Date Rptd'` | Date reported - MM/DD/YYYY. |
| `'DATE OCC'` | Date of occurrence - MM/DD/YYYY. |
| `'TIME OCC'` | Time of occurrence, in 24-hour military time. |
| `'AREA NAME'` | Each of the 21 Geographic Areas or Patrol Divisions is given a name that references a landmark or the surrounding community it is responsible for. For example, the 77th Street Division is located at the intersection of South Broadway and 77th Street, serving neighborhoods in South Los Angeles. |
| `'Crm Cd Desc'` | Indicates the crime committed. |
| `'Vict Age'` | Victim's age in years. |
| `'Vict Sex'` | Victim's sex: `F`: Female, `M`: Male, `X`: Unknown. |
| `'Vict Descent'` | Victim's descent:<ul><li>`A` - Other Asian</li><li>`B` - Black</li><li>`C` - Chinese</li><li>`D` - Cambodian</li><li>`F` - Filipino</li><li>`G` - Guamanian</li><li>`H` - Hispanic/Latin/Mexican</li><li>`I` - American Indian/Alaskan Native</li><li>`J` - Japanese</li><li>`K` - Korean</li><li>`L` - Laotian</li><li>`O` - Other</li><li>`P` - Pacific Islander</li><li>`S` - Samoan</li><li>`U` - Hawaiian</li><li>`V` - Vietnamese</li><li>`W` - White</li><li>`X` - Unknown</li><li>`Z` - Asian Indian</li></ul> |
| `'Weapon Desc'` | Description of the weapon used (if applicable). |
| `'Status Desc'` | Crime status. |
| `'LOCATION'` | Street address of the crime. |
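
The `DR_NO` layout described in the table can be unpacked with string slicing. The sketch below assumes the number is always 9 digits, split as a 2-digit year, a 2-digit area ID, and a 5-digit sequence. The table does not state the width of the area ID, so treat that split as an assumption. The sample values are copied from the dataset preview in Section III.

```python
import pandas as pd

# DR_NO values copied from the preview of crimes.csv
dr_no = pd.Series([220314085, 222013040, 231207725], name="DR_NO")

# Assumed layout: 2-digit year + 2-digit area ID + 5-digit sequence (9 digits total)
dr_str = dr_no.astype(str).str.zfill(9)
parts = pd.DataFrame({
    "year": dr_str.str[:2].astype(int),
    "area_id": dr_str.str[2:4].astype(int),
    "sequence": dr_str.str[4:].astype(int),
})
print(parts)
```

In the preview rows, this split is consistent with the other columns. The year prefix matches the year of `Date Rptd`, and each area ID pairs with a single area name (for example, `03` with Southwest and `12` with 77th Street).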
    
## II. Import Libraries

```python
import pandas as pd
import numpy as np
```

## III. Load and Preview the Dataset

- Parse the columns `Date Rptd` and `DATE OCC` as datetime objects. This is useful for any date-related operations that we might want to perform later.
- Read the `TIME OCC` column as a string (text) rather than letting pandas infer a numeric type. This keeps any leading zeros as written and makes string methods such as `.str.zfill()` available later.
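
To show why the `dtype` argument matters, the sketch below reads a tiny, made-up two-column CSV twice: once with default type inference and once with `TIME OCC` forced to `str`. The sample rows are hypothetical. Only the `read_csv` arguments mirror the notebook.

```python
import io
import pandas as pd

# A tiny made-up CSV with two of the dataset's columns
csv_text = "DATE OCC,TIME OCC\n2020-01-27,0635\n2020-07-14,900\n"

# Default type inference: TIME OCC becomes an integer column
as_number = pd.read_csv(io.StringIO(csv_text), parse_dates=["DATE OCC"])

# Same call with TIME OCC forced to str, as in the notebook
as_text = pd.read_csv(io.StringIO(csv_text), parse_dates=["DATE OCC"],
                      dtype={"TIME OCC": str})

print(as_number["TIME OCC"].tolist())        # [635, 900]
print(as_text["TIME OCC"].tolist())          # ['0635', '900']
print(as_text["DATE OCC"].dt.year.tolist())  # [2020, 2020]
```

With default inference, the leading zero of `0635` is lost. Because `DATE OCC` is parsed as a datetime, the `.dt` accessor is available on it.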

```python
crimes = pd.read_csv("crimes.csv", parse_dates=["Date Rptd", "DATE OCC"], dtype={"TIME OCC": str})
crimes.head()
```

| | DR_NO | Date Rptd | DATE OCC | TIME OCC | AREA NAME | Crm Cd Desc | Vict Age | Vict Sex | Vict Descent | Weapon Desc | Status Desc | LOCATION |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 220314085 | 2022-07-22 | 2020-05-12 | 1110 | Southwest | THEFT OF IDENTITY | 27 | F | B | | Invest Cont | 2500 S SYCAMORE AV |
| 1 | 222013040 | 2022-08-06 | 2020-06-04 | 1620 | Olympic | THEFT OF IDENTITY | 60 | M | H | | Invest Cont | 3300 SAN MARINO ST |
| 2 | 220614831 | 2022-08-18 | 2020-08-17 | 1200 | Hollywood | THEFT OF IDENTITY | 28 | M | H | | Invest Cont | 1900 TRANSIENT |
| 3 | 231207725 | 2023-02-27 | 2020-01-27 | 635 | 77th Street | THEFT OF IDENTITY | 37 | M | H | | Invest Cont | 6200 4TH AV |
| 4 | 220213256 | 2022-07-14 | 2020-07-14 | 900 | Rampart | THEFT OF IDENTITY | 79 | M | B | | Invest Cont | 1200 W 7TH ST |


## IV. Solutions

### 1. Which hour has the highest frequency of crimes? 

- Store as an integer variable called `peak_crime_hour`
- Extract the hour from the `TIME OCC` column
- Find the hour with the highest frequency of crimes
- Preview result

```python
# Extract the hour
crimes['HOUR OCC'] = crimes['TIME OCC'].str.zfill(4).str[:2].astype(int)

# Find the hour
peak_crime_hour = crimes['HOUR OCC'].mode()[0]

# Preview
print(f'Hour which has the highest frequency of crimes: {peak_crime_hour}')
```

```
Hour which has the highest frequency of crimes: 12
```


**Code Explanation**
- `crimes['HOUR OCC'] = crimes['TIME OCC'].str.zfill(4).str[:2].astype(int)`
    - `.str.zfill(4)`: This pads the time string with leading zeros until it is at least 4 characters long. For example, `'635'` becomes `'0635'`, while `'1110'` is already 4 characters long and stays `'1110'`.
    -  `.str[:2]`: This slices the first two characters of the padded string, which are the hour digits. For example, `'0635'` yields `'06'` and `'1110'` yields `'11'`.
    -  `.astype(int)`: This converts the extracted hour string to an integer type. The resulting column `HOUR OCC` will contain the hour as integers.
-  `peak_crime_hour = crimes['HOUR OCC'].mode()[0]`
    - `crimes['HOUR OCC'].mode()`: This calculates the mode of the `HOUR OCC` column, which is the hour that appears most frequently in the dataset.
    - `[0]`: `mode()` always returns a Series, which can hold more than one value when several hours tie for the highest count. The values are sorted, so `[0]` selects the smallest of them, and that value is stored as the peak crime hour.
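
You can check the padding-and-slicing step on a few hand-picked values. The sketch below also shows an alternative that is *not* the notebook's method. Because `TIME OCC` encodes `HHMM`, integer division by 100 drops the minutes and leaves the hour. The sample times are hypothetical.

```python
import pandas as pd

# Hypothetical 'TIME OCC' strings in HHMM form, some without leading zeros
time_occ = pd.Series(["1110", "1620", "1200", "635", "900", "1215", "5"])

# Notebook approach: pad to 4 characters, then keep the first two (the hour)
hour_str = time_occ.str.zfill(4).str[:2].astype(int)

# Alternative (not the notebook's method): read HHMM as a number and
# drop the last two digits (the minutes) with integer division
hour_num = time_occ.astype(int) // 100

print(hour_str.tolist())                 # [11, 16, 12, 6, 9, 12, 0]
print(hour_str.equals(hour_num))         # True
print(hour_num.value_counts().idxmax())  # 12
```

The two approaches can disagree when hours are tied for the highest count. `mode()[0]` returns the smallest tied hour, whereas `value_counts().idxmax()` returns whichever tied hour `value_counts` lists first.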

---
### 2. Which area has the largest frequency of night crimes (crimes committed between 10pm and 3:59am)? 

- Save as a string variable called `peak_night_crime_location`
- Define the range for night crimes, as given (between 10pm and 3:59am)
- Filter the DataFrame for night crimes
- Find the area with the largest frequency of night crimes
- Preview result

```python
# Define the range
night_crime_hours = list(range(22, 24)) + list(range(0, 4))

# Filter the DataFrame
night_crimes = crimes[crimes['HOUR OCC'].isin(night_crime_hours)]

# Find the area with the largest frequency
peak_night_crime_location = night_crimes['AREA NAME'].mode()[0]

# Preview
print(f'Area which has the largest frequency of night crimes: {peak_night_crime_location}')
```

```
Area which has the largest frequency of night crimes: Central
```


**Code Explanation**

- `night_crime_hours = list(range(22, 24)) + list(range(0, 4))`
    -  `list(range(22, 24))`: This creates a list of integers representing the hours from 22 (10 PM) to 23 (11 PM), which results in `[22, 23]`.
    -  `list(range(0, 4))`: This creates a list of integers representing the hours from 0 (midnight) to 3 (3 AM), resulting in `[0, 1, 2, 3]`.
    -  The `+` operator combines these two lists, resulting in `night_crime_hours = [22, 23, 0, 1, 2, 3]`. This list represents the hours considered as "night" for the analysis.
-  `night_crimes = crimes[crimes['HOUR OCC'].isin(night_crime_hours)]`
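    -  `crimes['HOUR OCC'].isin(night_crime_hours)`: This checks each value in the `HOUR OCC` column to see if it is in the `night_crime_hours` list. It returns a Boolean mask (`True`/`False`) for each row.
    -  `crimes[...]`: Indexing the DataFrame with this mask keeps only the rows where the mask is `True`, i.e., the night crimes.
-  `peak_night_crime_location = night_crimes['AREA NAME'].mode()[0]`
    -  As in question 1, `.mode()[0]` returns the most frequent value. Here it is the area name that appears most often among the night crimes.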
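
Because the night window wraps past midnight, you can also write it as two comparisons joined with `|` instead of an explicit list of hours. The sketch below uses a small hypothetical DataFrame and checks that both filters select the same rows. The comparison form is an alternative, not the notebook's approach.

```python
import pandas as pd

# Hypothetical sample: hour of occurrence and area for six crimes
sample = pd.DataFrame({
    "HOUR OCC": [23, 1, 4, 22, 12, 3],
    "AREA NAME": ["Central", "Central", "Rampart", "Hollywood", "Central", "Central"],
})

# List-based filter, as in the notebook
night_crime_hours = list(range(22, 24)) + list(range(0, 4))
by_list = sample[sample["HOUR OCC"].isin(night_crime_hours)]

# Comparison-based alternative: the window wraps past midnight,
# so a night hour is "22 or later" OR "earlier than 4"
by_range = sample[(sample["HOUR OCC"] >= 22) | (sample["HOUR OCC"] < 4)]

print(by_list.equals(by_range))         # True
print(by_range["HOUR OCC"].tolist())    # [23, 1, 22, 3]
print(by_range["AREA NAME"].mode()[0])  # Central
```

The comparison form avoids enumerating hours, which helps if the window boundaries change. The list form makes the included hours explicit.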

---
### 3. Identify the number of crimes committed against victims of different age groups/brackets.

- Save as a pandas Series called `victim_ages`, with age bracket labels `"0-17"`, `"18-25"`, `"26-34"`, `"35-44"`, `"45-54"`, `"55-64"`, and `"65+"` as the index and the frequency of crimes as the values.
- Define the age bracket bins and labels
- Split the `Vict Age` column into age brackets
- Count the number of crimes in each age bracket
- Preview result

```python
# Define the group bins and labels
bins = [0, 17, 25, 34, 44, 54, 64, np.inf]
labels = ['0-17', '18-25', '26-34', '35-44', '45-54', '55-64', '65+']

# Split the 'Vict Age' column
crimes['Age Bracket'] = pd.cut(crimes['Vict Age'], bins=bins, labels=labels)

# Count the number of crimes
victim_ages = crimes['Age Bracket'].value_counts()

# Preview
print(f'Number of crimes committed in different age groups:\n{victim_ages}')
```

```
Number of crimes committed in different age groups:
26-34    47470
35-44    42157
45-54    28353
18-25    28291
55-64    20169
65+      14747
0-17      4528
Name: Age Bracket, dtype: int64
```


**Code Explanation**

- `bins = [0, 17, 25, 34, 44, 54, 64, np.inf]` and `labels = ['0-17', '18-25', '26-34', '35-44', '45-54', '55-64', '65+']`
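    -  `bins`: This list defines the edges of the age groups. Each pair of adjacent edges forms one interval. By default, `pd.cut` makes intervals right-inclusive, so the groups are (0, 17], (17, 25], and so on:
        - ... 65 and above (using `np.inf` to represent infinity)
    - `labels`: This list provides the labels for each of the age groups defined by `bins`.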
- `crimes['Age Bracket'] = pd.cut(crimes['Vict Age'], bins=bins, labels=labels)`
    -  `pd.cut(crimes['Vict Age'], bins=bins, labels=labels)`: This function cuts or splits the `Vict Age` column into discrete intervals defined by bins and assigns the corresponding labels to each interval.
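
Which bracket a boundary age lands in depends on `pd.cut`'s defaults. The sketch below passes hypothetical ages that sit on or next to the bin edges through the same `bins` and `labels`. With the default `right=True`, 17 goes to `0-17` and 18 goes to `18-25`. An age of exactly 0 matches no bin and becomes `NaN`. Passing `include_lowest=True` is one way to keep 0 in the first bracket.

```python
import numpy as np
import pandas as pd

bins = [0, 17, 25, 34, 44, 54, 64, np.inf]
labels = ['0-17', '18-25', '26-34', '35-44', '45-54', '55-64', '65+']

# Hypothetical ages chosen to sit on or next to the bin edges
ages = pd.Series([0, 1, 17, 18, 25, 26, 64, 65, 99])

# Default right=True: intervals are (0, 17], (17, 25], ..., (64, inf]
brackets = pd.cut(ages, bins=bins, labels=labels)
print(pd.DataFrame({'age': ages, 'bracket': brackets}))
print(brackets.isna().sum())  # 1 (age 0 is outside (0, 17])

# include_lowest=True closes the first interval on the left, so 0 -> '0-17'
brackets_incl = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)

# sort=False keeps the brackets in label order instead of sorting by count
print(brackets_incl.value_counts(sort=False))
```

If the data contains any ages of exactly 0, they would be left out of `victim_ages` under the default settings. The `sort=False` option is also a way to list the brackets in age order rather than by frequency, as in the output above.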