# Los Angeles Crime Analysis

![Los Angeles Crime Map](https://cdn.pixabay.com/photo/2016/08/05/08/36/los-angeles-1571740_1280.jpg)

## Introduction

This notebook analyzes crime data from the Los Angeles Police Department (LAPD) to identify patterns in criminal behavior. The insights from this analysis can help the LAPD allocate resources effectively to tackle various crimes in different areas of the city.

## Data Loading and Initial Exploration

We'll start by loading the crime data and examining its structure.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Set visualization style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('viridis')
warnings.filterwarnings('ignore')

# Load the dataset
crimes = pd.read_csv("crimes.csv", dtype={"TIME OCC": str})

# Display the first few rows
crimes.head()

In [None]:
# Check basic information about the dataset
print(f"Dataset shape: {crimes.shape}")
print("\nColumn data types:")
print(crimes.dtypes)
print("\nMissing values per column:")
print(crimes.isnull().sum())

In [None]:
# Get a statistical summary of the dataset
crimes.describe(include='all')

## Data Cleaning and Preprocessing

We'll now clean the data by handling missing values, removing duplicates, and addressing outliers.

In [None]:
# Check for duplicates
duplicate_count = crimes.duplicated().sum()
print(f"Number of duplicate records: {duplicate_count}")

# Remove duplicates if any
if duplicate_count > 0:
    crimes = crimes.drop_duplicates()
    print(f"After removing duplicates, dataset shape: {crimes.shape}")

In [None]:
# Convert date columns to datetime
crimes['Date Rptd'] = pd.to_datetime(crimes['Date Rptd'])
crimes['DATE OCC'] = pd.to_datetime(crimes['DATE OCC'])

# Extract time components
crimes['TIME OCC'] = crimes['TIME OCC'].str.zfill(4)  # Ensure 4 digits (e.g., '0900' instead of '900')
crimes['Hour'] = crimes['TIME OCC'].str[:2].astype(int)
crimes['Minute'] = crimes['TIME OCC'].str[2:].astype(int)

# Define time periods
conditions = [
    (crimes['Hour'] >= 5) & (crimes['Hour'] < 12),
    (crimes['Hour'] >= 12) & (crimes['Hour'] < 17),
    (crimes['Hour'] >= 17) & (crimes['Hour'] < 22),
    (crimes['Hour'] >= 22) | (crimes['Hour'] < 5)
]
choices = ['Morning', 'Afternoon', 'Evening', 'Night']
crimes['Time_Period'] = np.select(conditions, choices, default='Unknown')

# Check the distribution of time periods
crimes['Time_Period'].value_counts()
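
As a quick check of the bucketing logic, the sketch below applies the same zero-padding and `np.select` conditions to a handful of made-up `TIME OCC` strings. The `toy` frame and its values are illustrative assumptions, not rows from `crimes.csv`.

```python
import numpy as np
import pandas as pd

# Made-up times in 'HHMM' style, some without leading zeros (illustrative only)
toy = pd.DataFrame({"TIME OCC": ["5", "930", "1200", "1659", "2159", "2200", "0459"]})

# Pad to four digits, then take the first two characters as the hour
toy["TIME OCC"] = toy["TIME OCC"].str.zfill(4)
toy["Hour"] = toy["TIME OCC"].str[:2].astype(int)

# Same period boundaries as in the notebook; the last condition wraps past midnight
conditions = [
    (toy["Hour"] >= 5) & (toy["Hour"] < 12),
    (toy["Hour"] >= 12) & (toy["Hour"] < 17),
    (toy["Hour"] >= 17) & (toy["Hour"] < 22),
    (toy["Hour"] >= 22) | (toy["Hour"] < 5),
]
choices = ["Morning", "Afternoon", "Evening", "Night"]
toy["Time_Period"] = np.select(conditions, choices, default="Unknown")

print(toy)
```

Because the four conditions cover every hour from 0 to 23, the `'Unknown'` default should only appear if an hour value falls outside that range.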

In [None]:
# Handle missing values
# For weapon description, fill NaN with 'No Weapon'
crimes['Weapon Desc'] = crimes['Weapon Desc'].fillna('No Weapon')

# For victim demographics with missing values, fill with 'X' (the code used here for unknown)
crimes['Vict Sex'] = crimes['Vict Sex'].fillna('X')
crimes['Vict Descent'] = crimes['Vict Descent'].fillna('X')

# Check for outliers in victim age
plt.figure(figsize=(10, 6))
sns.boxplot(x=crimes['Vict Age'])
plt.title('Boxplot of Victim Age')
plt.show()

# Statistical check for outliers
q1 = crimes['Vict Age'].quantile(0.25)
q3 = crimes['Vict Age'].quantile(0.75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

print(f"Victim Age - Lower bound: {lower_bound}, Upper bound: {upper_bound}")
age_outliers = ((crimes['Vict Age'] < lower_bound) | (crimes['Vict Age'] > upper_bound)).sum()
print(f"Number of outliers in Victim Age: {age_outliers}")
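
To make the 1.5 × IQR rule concrete, here is a minimal sketch on made-up ages (the `ages` values are assumptions for illustration). It also shows a limitation: an impossible value such as a negative age can sit inside the IQR fences, so the rule does not replace a validity check.

```python
import pandas as pd

# Made-up ages, including one invalid negative value and one extreme value
ages = pd.Series([-2, 0, 19, 25, 31, 38, 44, 52, 120])

# Quartiles and the 1.5 * IQR fences
q1, q3 = ages.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag values outside the fences
outlier_mask = (ages < lower) | (ages > upper)

print(f"Fences: [{lower}, {upper}]")
print("Flagged:", ages[outlier_mask].tolist())
print("Negative but not flagged:", ages[(ages < 0) & ~outlier_mask].tolist())
```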

In [None]:
# Handle outliers - cap at reasonable limits
crimes['Vict Age'] = crimes['Vict Age'].clip(lower=0, upper=100)

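# Create age groups for analysis
# Bins are right-inclusive so each label matches its range (17 -> '0-17', 25 -> '18-25');
# include_lowest=True keeps age 0 in the first group, and np.inf keeps the capped age 100 in '65+'
age_bins = [0, 17, 25, 34, 44, 54, 64, np.inf]
age_labels = ['0-17', '18-25', '26-34', '35-44', '45-54', '55-64', '65+']
crimes['Age Group'] = pd.cut(crimes['Vict Age'], bins=age_bins, labels=age_labels, right=True, include_lowest=True)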

# Check distribution of age groups
crimes['Age Group'].value_counts().sort_index()
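
How `pd.cut` treats bin edges decides which group a boundary age such as 17 or 25 lands in. The sketch below compares right-closed and left-closed bins on made-up boundary ages. The open-ended top edge `np.inf` and the sample ages are assumptions of this sketch.

```python
import numpy as np
import pandas as pd

# Made-up ages sitting on or next to the bin edges
ages = pd.Series([0, 17, 18, 25, 26, 64, 65, 100])
bins = [0, 17, 25, 34, 44, 54, 64, np.inf]
labels = ['0-17', '18-25', '26-34', '35-44', '45-54', '55-64', '65+']

# Right-closed intervals: (17, 25] etc.; include_lowest keeps age 0 in the first bin
right_closed = pd.cut(ages, bins=bins, labels=labels, right=True, include_lowest=True)

# Left-closed intervals: [17, 25) etc.; boundary ages shift into the next label
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

comparison = pd.DataFrame({"age": ages, "right=True": right_closed, "right=False": left_closed})
print(comparison)
```

With these labels, only the right-closed version places 17 in `'0-17'` and 25 in `'18-25'`; the left-closed version moves each of those ages one group up.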

## Exploratory Data Analysis

Now we'll explore patterns in the data through visualizations.

In [None]:
# Crime frequency by hour
plt.figure(figsize=(14, 6))
hourly_crimes = crimes['Hour'].value_counts().sort_index()
sns.barplot(x=hourly_crimes.index, y=hourly_crimes.values)
plt.title('Crime Frequency by Hour of Day')
plt.xlabel('Hour (24-hour format)')
plt.ylabel('Number of Crimes')
plt.xticks(range(0, 24))
plt.grid(axis='y', alpha=0.3)
plt.show()

# Find the peak crime hour
peak_crime_hour = hourly_crimes.idxmax()
print(f"Peak crime hour: {peak_crime_hour}:00 with {hourly_crimes.max()} crimes")
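
One caveat with a categorical bar plot: bars are placed at positions 0, 1, 2, ... in index order, so hour labels only line up with `range(0, 24)` ticks when every hour is present. A minimal sketch, on made-up hours, of filling absent hours with zero counts before plotting:

```python
import pandas as pd

# Made-up hours in which most of the day has no records at all
hours = pd.Series([0, 0, 12, 12, 12, 23])

# value_counts() only returns hours that occur; reindex adds the missing ones as 0
hourly = hours.value_counts().reindex(range(24), fill_value=0)

print(hourly)
print("Peak hour:", hourly.idxmax())
```

If the real data already contains all 24 hours, the reindex is a no-op and the plot is unchanged.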

In [None]:
# Crime types distribution
plt.figure(figsize=(12, 8))
top_crimes = crimes['Crm Cd Desc'].value_counts().head(15)
sns.barplot(x=top_crimes.values, y=top_crimes.index)
plt.title('Top 15 Crime Types in Los Angeles')
plt.xlabel('Number of Crimes')
plt.tight_layout()
plt.show()

In [None]:
# Night crimes analysis (10pm to 3:59am)
# Note: this window is narrower than the 'Night' Time_Period above, which runs to 4:59am
night_crimes = crimes[(crimes['Hour'] >= 22) | (crimes['Hour'] < 4)]
night_crime_locations = night_crimes['AREA NAME'].value_counts()

plt.figure(figsize=(12, 8))
sns.barplot(x=night_crime_locations.values[:10], y=night_crime_locations.index[:10])
plt.title('Top 10 Areas with Night Crimes (10pm - 3:59am)')
plt.xlabel('Number of Night Crimes')
plt.tight_layout()
plt.show()

# Find the location with the most night crimes
peak_night_crime_location = night_crime_locations.idxmax()
print(f"Area with most night crimes: {peak_night_crime_location} with {night_crime_locations.max()} crimes")
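
The night window crosses midnight, so the filter has to combine its two halves with OR rather than AND (no hour is both at least 22 and below 4). A small sketch on made-up rows, where the area names `A`, `B`, and `C` are placeholders:

```python
import pandas as pd

# Made-up hours and placeholder area names (illustrative only)
toy = pd.DataFrame({
    "Hour": [21, 22, 23, 0, 3, 4, 12],
    "AREA NAME": ["A", "A", "B", "B", "B", "A", "C"],
})

# 10pm to 3:59am: hours 22 and 23, plus hours 0 through 3
is_night = (toy["Hour"] >= 22) | (toy["Hour"] < 4)

# Count night rows per area and pick the largest
night_counts = toy.loc[is_night, "AREA NAME"].value_counts()

print(night_counts)
print("Area with most night rows:", night_counts.idxmax())
```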

In [None]:
# Crime analysis by victim demographics
plt.figure(figsize=(14, 10))

# Subplot for victim age groups
plt.subplot(2, 2, 1)
victim_ages = crimes['Age Group'].value_counts().sort_index()
sns.barplot(x=victim_ages.index, y=victim_ages.values)
plt.title('Crimes by Victim Age Group')
plt.xticks(rotation=45)
plt.ylabel('Number of Crimes')

# Subplot for victim sex
plt.subplot(2, 2, 2)
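sex_map = {'F': 'Female', 'M': 'Male', 'X': 'Unknown'}
# Any code outside the map is grouped as 'Unknown' instead of silently becoming NaN
crimes['Vict Sex Desc'] = crimes['Vict Sex'].map(sex_map).fillna('Unknown')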
sex_counts = crimes['Vict Sex Desc'].value_counts()
sns.barplot(x=sex_counts.index, y=sex_counts.values)
plt.title('Crimes by Victim Sex')
plt.ylabel('Number of Crimes')

# Subplot for top victim descents
plt.subplot(2, 2, 3)
descent_counts = crimes['Vict Descent'].value_counts().head(8)
sns.barplot(x=descent_counts.index, y=descent_counts.values)
plt.title('Top 8 Victim Descents')
plt.ylabel('Number of Crimes')

plt.tight_layout()
plt.show()

## Answering the Specific Questions

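The cells below answer the three questions this analysis set out to address: the peak crime hour, the area with the most night crimes, and the number of crimes per victim age group.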

In [None]:
# Question 1: Which hour has the highest frequency of crimes?
peak_crime_hour = hourly_crimes.idxmax()
print(f"Peak crime hour: {peak_crime_hour} (integer variable)")

# Question 2: Which area has the largest frequency of night crimes (10pm-3:59am)?
peak_night_crime_location = night_crime_locations.idxmax()
print(f"Peak night crime location: '{peak_night_crime_location}' (string variable)")

# Question 3: Identify the number of crimes committed against victims of different age groups
victim_ages = crimes['Age Group'].value_counts().sort_index()
print("\nVictim ages by age group (pandas Series):")
print(victim_ages)

In [None]:
# Create variables in the required format
peak_crime_hour = int(hourly_crimes.idxmax())
peak_night_crime_location = night_crime_locations.idxmax()
victim_ages = crimes['Age Group'].value_counts().sort_index()

# Verify variable types
print(f"Type of peak_crime_hour: {type(peak_crime_hour)}")
print(f"Type of peak_night_crime_location: {type(peak_night_crime_location)}")
print(f"Type of victim_ages: {type(victim_ages)}")

## Conclusion

From our analysis of Los Angeles crime data, we've discovered several key insights:

1. The peak hour for crimes (`peak_crime_hour`) falls in the afternoon/evening hours.
2. Specific areas in LA record higher counts of night crimes (10pm to 3:59am); the area with the most is stored in `peak_night_crime_location`, which could inform patrol allocations.
3. The number of crimes differs across victim age groups (`victim_ages`), with certain age groups appearing as victims more often than others.

These findings can help the LAPD allocate resources more effectively to address crime patterns in different areas and protect vulnerable populations.