## 1. Import Libraries & Load Data <a id='1'></a>

Let's import the necessary libraries and load our transformed dataset.

In [None]:
# Import essential libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.float_format', '{:.2f}'.format)

# Set style for visualizations
sns.set_style('whitegrid')
sns.set_palette('husl')
plt.rcParams['figure.figsize'] = (14, 7)
plt.rcParams['font.size'] = 11

print("✓ Libraries imported successfully!")

In [None]:
# Load the transformed dataset
df = pd.read_csv('Crime_Data_Transformed.csv')

# Convert date columns to datetime
df['Date Rptd'] = pd.to_datetime(df['Date Rptd'])
df['DATE OCC'] = pd.to_datetime(df['DATE OCC'])

print(f"✓ Dataset loaded successfully!")
print(f"  Shape: {df.shape}")
print(f"  Date range: {df['DATE OCC'].min()} to {df['DATE OCC'].max()}")

In [None]:
# Display first few rows
df.head(10)

In [None]:
# Display dataset information
df.info()

## 2. Descriptive Statistics <a id='2'></a>

Let's explore the basic statistics of our dataset to understand the distribution and central tendencies.

In [None]:
# Descriptive statistics for numerical columns
print("=" * 80)
print("DESCRIPTIVE STATISTICS - NUMERICAL FEATURES")
print("=" * 80)
df.describe()

In [None]:
# Descriptive statistics for categorical columns
print("=" * 80)
print("DESCRIPTIVE STATISTICS - CATEGORICAL FEATURES")
print("=" * 80)
df.describe(include=['object'])

### 2.1 Crime Categories Distribution

In [None]:
# Top 15 most common crime types
print("\n" + "=" * 80)
print("TOP 15 MOST COMMON CRIME TYPES")
print("=" * 80)
crime_counts = df['Crm Cd Desc'].value_counts().head(15)
print(crime_counts)
print(f"\nTotal unique crime types: {df['Crm Cd Desc'].nunique()}")

In [None]:
# Crime category distribution
print("\n" + "=" * 80)
print("CRIME CATEGORY DISTRIBUTION")
print("=" * 80)
category_counts = df['crime_category'].value_counts()
print(category_counts)
print(f"\nPercentage breakdown:")
print(df['crime_category'].value_counts(normalize=True) * 100)

### 2.2 Geographic Distribution

In [None]:
# Area distribution
print("\n" + "=" * 80)
print("CRIME DISTRIBUTION BY AREA")
print("=" * 80)
area_counts = df['AREA NAME'].value_counts()
print(area_counts)
print(f"\nTotal unique areas: {df['AREA NAME'].nunique()}")

### 2.3 Victim Demographics

In [None]:
# Victim age group distribution
print("\n" + "=" * 80)
print("VICTIM AGE GROUP DISTRIBUTION")
print("=" * 80)
age_group_counts = df['victim_age_group'].value_counts()
print(age_group_counts)
print(f"\nPercentage breakdown:")
print(df['victim_age_group'].value_counts(normalize=True) * 100)

In [None]:
# Victim sex distribution
print("\n" + "=" * 80)
print("VICTIM SEX DISTRIBUTION")
print("=" * 80)
sex_counts = df['Vict Sex'].value_counts()
print(sex_counts)

### 2.4 Temporal Patterns

In [None]:
# Time period distribution
print("\n" + "=" * 80)
print("CRIMES BY TIME PERIOD")
print("=" * 80)
time_period_counts = df['time_period'].value_counts().sort_index()
print(time_period_counts)

In [None]:
# Weekday vs Weekend
print("\n" + "=" * 80)
print("WEEKDAY VS WEEKEND CRIMES")
print("=" * 80)
weekend_counts = df['is_weekend'].value_counts()
weekend_counts.index = ['Weekday', 'Weekend']
print(weekend_counts)
print(f"\nPercentage breakdown:")
print((weekend_counts / weekend_counts.sum()) * 100)

### 2.5 Weapon Usage

In [None]:
# Weapon involvement
print("\n" + "=" * 80)
print("WEAPON INVOLVEMENT IN CRIMES")
print("=" * 80)
weapon_counts = df['weapon_involved'].value_counts()
print(weapon_counts)
print(f"\nPercentage breakdown:")
print((weapon_counts / weapon_counts.sum()) * 100)

In [None]:
# Weapon category distribution (when weapon is involved)
print("\n" + "=" * 80)
print("WEAPON CATEGORY DISTRIBUTION")
print("=" * 80)
weapon_category_counts = df['weapon_category'].value_counts()
print(weapon_category_counts)

## 3. Aggregations and Group Operations <a id='3'></a>

Let's use groupby, pivot tables, and crosstabs to analyze patterns and relationships.

### 3.1 Crime Severity by Area

In [None]:
# Group by area and crime severity
print("=" * 80)
print("CRIME SEVERITY BY AREA")
print("=" * 80)
area_severity = df.groupby(['AREA NAME', 'crime_severity']).size().unstack(fill_value=0)
area_severity['Total'] = area_severity.sum(axis=1)
area_severity = area_severity.sort_values('Total', ascending=False)
print(area_severity)

### 3.2 Crime Category by Time Period

In [None]:
# Crosstab: Crime category vs Time period
print("\n" + "=" * 80)
print("CRIME CATEGORY BY TIME PERIOD")
print("=" * 80)
crime_time = pd.crosstab(df['crime_category'], df['time_period'], margins=True)
print(crime_time)

### 3.3 Pivot Table: Average Area Risk Score

In [None]:
# Pivot table: Area risk score by year and crime severity
print("\n" + "=" * 80)
print("AVERAGE AREA RISK SCORE BY YEAR AND CRIME SEVERITY")
print("=" * 80)
risk_pivot = df.pivot_table(
    values='area_risk_score', 
    index='year', 
    columns='crime_severity', 
    aggfunc='mean'
)
print(risk_pivot)

### 3.4 Weapon Usage by Crime Category

In [None]:
# Pivot table: Weapon involvement by crime category
print("\n" + "=" * 80)
print("WEAPON INVOLVEMENT BY CRIME CATEGORY")
print("=" * 80)
weapon_crime = pd.crosstab(
    df['crime_category'], 
    df['weapon_involved'], 
    normalize='index'
) * 100
print(weapon_crime)

### 3.5 Top Areas by Demographic and Crime Type

In [None]:
# Group by area and aggregate multiple statistics
print("\n" + "=" * 80)
print("COMPREHENSIVE AREA STATISTICS")
print("=" * 80)
area_stats = df.groupby('AREA NAME').agg({
    'DR_NO': 'count',
    'Vict Age': 'mean',
    'weapon_involved': lambda x: (x == 1).sum(),
    'area_risk_score': 'mean',
    'reporting_delay_days': 'mean',
    'population': 'first',
    'median_income': 'first'
}).round(2)

area_stats.columns = ['Total_Crimes', 'Avg_Victim_Age', 'Crimes_With_Weapon', 
                       'Avg_Risk_Score', 'Avg_Reporting_Delay', 'Population', 'Median_Income']
area_stats = area_stats.sort_values('Total_Crimes', ascending=False)
print(area_stats.head(15))

### 3.6 Monthly Crime Patterns by Category

In [None]:
# Group by month and crime category
print("\n" + "=" * 80)
print("MONTHLY CRIME PATTERNS BY CATEGORY")
print("=" * 80)
monthly_category = df.groupby(['month_name', 'crime_category']).size().unstack(fill_value=0)

# Reorder months
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 
               'July', 'August', 'September', 'October', 'November', 'December']
monthly_category = monthly_category.reindex(month_order)
print(monthly_category)

## 4. Time Series Analysis <a id='4'></a>

Let's analyze temporal patterns using time series techniques.

### 4.1 Daily Crime Trends

In [None]:
# Set date as index for time series analysis
df_ts = df.set_index('DATE OCC').sort_index()

# Daily crime counts
daily_crimes = df_ts.resample('D').size()
print("=" * 80)
print("DAILY CRIME STATISTICS")
print("=" * 80)
print(f"Average daily crimes: {daily_crimes.mean():.2f}")
print(f"Max daily crimes: {daily_crimes.max()}")
print(f"Min daily crimes: {daily_crimes.min()}")
print(f"Standard deviation: {daily_crimes.std():.2f}")

### 4.2 Weekly and Monthly Aggregations

In [None]:
# Weekly resampling
weekly_crimes = df_ts.resample('W').size()
print("\n" + "=" * 80)
print("WEEKLY CRIME STATISTICS")
print("=" * 80)
print(f"Average weekly crimes: {weekly_crimes.mean():.2f}")
print(f"Max weekly crimes: {weekly_crimes.max()}")
print(f"Min weekly crimes: {weekly_crimes.min()}")

# Monthly resampling
monthly_crimes = df_ts.resample('M').size()
print("\n" + "=" * 80)
print("MONTHLY CRIME STATISTICS")
print("=" * 80)
print(f"Average monthly crimes: {monthly_crimes.mean():.2f}")
print(f"Max monthly crimes: {monthly_crimes.max()}")
print(f"Min monthly crimes: {monthly_crimes.min()}")

### 4.3 Rolling Averages (7-day and 30-day)

In [None]:
# Calculate rolling averages
daily_crimes_df = pd.DataFrame({
    'Daily': daily_crimes,
    'Rolling_7day': daily_crimes.rolling(window=7).mean(),
    'Rolling_30day': daily_crimes.rolling(window=30).mean()
})

print("\n" + "=" * 80)
print("ROLLING AVERAGES - SAMPLE")
print("=" * 80)
print(daily_crimes_df.tail(10))

### 4.4 Year-over-Year Comparison

In [None]:
# Yearly aggregation
yearly_crimes = df.groupby('year').agg({
    'DR_NO': 'count',
    'weapon_involved': lambda x: (x == 1).sum(),
    'area_risk_score': 'mean'
}).round(2)

yearly_crimes.columns = ['Total_Crimes', 'Crimes_With_Weapon', 'Avg_Risk_Score']
yearly_crimes['Weapon_Percentage'] = (yearly_crimes['Crimes_With_Weapon'] / yearly_crimes['Total_Crimes'] * 100).round(2)

print("\n" + "=" * 80)
print("YEAR-OVER-YEAR COMPARISON")
print("=" * 80)
print(yearly_crimes)

### 4.5 Seasonal Patterns (Quarterly Analysis)

In [None]:
# Quarterly aggregation
quarterly_crimes = df.groupby(['year', 'quarter']).agg({
    'DR_NO': 'count',
    'crime_category': lambda x: x.value_counts().index[0]
})

quarterly_crimes.columns = ['Total_Crimes', 'Most_Common_Category']
print("\n" + "=" * 80)
print("QUARTERLY CRIME ANALYSIS")
print("=" * 80)
print(quarterly_crimes)

### 4.6 Shift Analysis (Lag Features)

In [None]:
# Create lag features for weekly data
weekly_df = pd.DataFrame({
    'Current_Week': weekly_crimes,
    'Previous_Week': weekly_crimes.shift(1),
    'Two_Weeks_Ago': weekly_crimes.shift(2)
})

weekly_df['Week_over_Week_Change'] = weekly_df['Current_Week'] - weekly_df['Previous_Week']
weekly_df['WoW_Percent_Change'] = ((weekly_df['Current_Week'] - weekly_df['Previous_Week']) / weekly_df['Previous_Week'] * 100).round(2)

print("\n" + "=" * 80)
print("WEEK-OVER-WEEK ANALYSIS - LAST 10 WEEKS")
print("=" * 80)
print(weekly_df.dropna().tail(10))

## 5. Correlation Analysis <a id='5'></a>

Let's examine relationships between numerical variables.

In [None]:
# Select numerical columns for correlation analysis
numerical_cols = ['Vict Age', 'reporting_delay_days', 'area_risk_score', 
                  'population', 'median_income', 'area_size_sq_miles', 
                  'total_cases', 'crimes_per_1000', 'weapon_involved', 
                  'is_weekend', 'hour', 'year', 'month', 'quarter']

# Filter columns that exist in the dataframe
available_cols = [col for col in numerical_cols if col in df.columns]

# Correlation matrix
correlation_matrix = df[available_cols].corr()
print("=" * 80)
print("CORRELATION MATRIX")
print("=" * 80)
print(correlation_matrix.round(3))

In [None]:
# Find strongest correlations
print("\n" + "=" * 80)
print("TOP 15 STRONGEST CORRELATIONS (excluding diagonal)")
print("=" * 80)

# Get upper triangle of correlation matrix
upper_triangle = correlation_matrix.where(
    np.triu(np.ones(correlation_matrix.shape), k=1).astype(bool)
)

# Find strongest correlations
correlations = upper_triangle.unstack().dropna().sort_values(ascending=False)
print(correlations.head(15))

## 6. Visualizations <a id='6'></a>

Let's create comprehensive visualizations to understand our data better.

### 6.1 Crime Category Distribution

In [None]:
# Visualization 1: Crime Category Distribution
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Bar plot
crime_cat = df['crime_category'].value_counts()
axes[0].bar(crime_cat.index, crime_cat.values, color=['#FF6B6B', '#4ECDC4', '#45B7D1'])
axes[0].set_title('Crime Distribution by Category', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Crime Category', fontsize=12)
axes[0].set_ylabel('Number of Crimes', fontsize=12)
axes[0].tick_params(axis='x', rotation=45)
for i, v in enumerate(crime_cat.values):
    axes[0].text(i, v + 500, str(v), ha='center', fontsize=10)

# Pie chart
colors = ['#FF6B6B', '#4ECDC4', '#45B7D1']
axes[1].pie(crime_cat.values, labels=crime_cat.index, autopct='%1.1f%%', 
            colors=colors, startangle=90, textprops={'fontsize': 11})
axes[1].set_title('Crime Category Percentage', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.savefig('eda_crime_category_distribution.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Visualization 1 saved: eda_crime_category_distribution.png")

### 6.2 Top 10 Crime Types

In [None]:
# Visualization 2: Top 10 Crime Types
fig, ax = plt.subplots(figsize=(14, 8))

top_crimes = df['Crm Cd Desc'].value_counts().head(10)
bars = ax.barh(range(len(top_crimes)), top_crimes.values, color=sns.color_palette('viridis', len(top_crimes)))
ax.set_yticks(range(len(top_crimes)))
ax.set_yticklabels(top_crimes.index, fontsize=11)
ax.set_xlabel('Number of Cases', fontsize=12, fontweight='bold')
ax.set_title('Top 10 Most Common Crime Types', fontsize=14, fontweight='bold', pad=20)
ax.invert_yaxis()

# Add value labels
for i, (bar, value) in enumerate(zip(bars, top_crimes.values)):
    ax.text(value + 50, i, f'{value:,}', va='center', fontsize=10)

plt.tight_layout()
plt.savefig('eda_top10_crime_types.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Visualization 2 saved: eda_top10_crime_types.png")

### 6.3 Time Series: Daily Crime Trends with Rolling Average

In [None]:
# Visualization 3: Time Series Analysis
fig, ax = plt.subplots(figsize=(16, 7))

# Plot daily crimes (lighter)
ax.plot(daily_crimes.index, daily_crimes.values, alpha=0.3, linewidth=0.8, 
        color='gray', label='Daily Crimes')

# Plot 7-day rolling average
rolling_7 = daily_crimes.rolling(window=7).mean()
ax.plot(rolling_7.index, rolling_7.values, linewidth=2, 
        color='#FF6B6B', label='7-Day Moving Average')

# Plot 30-day rolling average
rolling_30 = daily_crimes.rolling(window=30).mean()
ax.plot(rolling_30.index, rolling_30.values, linewidth=2.5, 
        color='#4ECDC4', label='30-Day Moving Average')

ax.set_title('Daily Crime Trends with Rolling Averages', fontsize=14, fontweight='bold', pad=20)
ax.set_xlabel('Date', fontsize=12, fontweight='bold')
ax.set_ylabel('Number of Crimes', fontsize=12, fontweight='bold')
ax.legend(loc='best', fontsize=11)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('eda_time_series_trends.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Visualization 3 saved: eda_time_series_trends.png")

### 6.4 Geographic Distribution - Top 15 Areas

In [None]:
# Visualization 4: Geographic Distribution
fig, ax = plt.subplots(figsize=(14, 8))

top_areas = df['AREA NAME'].value_counts().head(15)
bars = ax.barh(range(len(top_areas)), top_areas.values, 
               color=sns.color_palette('rocket', len(top_areas)))
ax.set_yticks(range(len(top_areas)))
ax.set_yticklabels(top_areas.index, fontsize=11)
ax.set_xlabel('Number of Crimes', fontsize=12, fontweight='bold')
ax.set_title('Top 15 Areas by Crime Count', fontsize=14, fontweight='bold', pad=20)
ax.invert_yaxis()

# Add value labels
for i, (bar, value) in enumerate(zip(bars, top_areas.values)):
    ax.text(value + 30, i, f'{value:,}', va='center', fontsize=10)

plt.tight_layout()
plt.savefig('eda_geographic_distribution.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Visualization 4 saved: eda_geographic_distribution.png")

### 6.5 Temporal Patterns

In [None]:
# Visualization 5: Temporal Patterns
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# 5a: Crimes by Day of Week
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
day_counts = df['day_name'].value_counts().reindex(day_order)
colors_day = ['#FF6B6B' if day in ['Saturday', 'Sunday'] else '#4ECDC4' for day in day_order]
axes[0, 0].bar(range(len(day_counts)), day_counts.values, color=colors_day)
axes[0, 0].set_xticks(range(len(day_counts)))
axes[0, 0].set_xticklabels(day_order, rotation=45, ha='right')
axes[0, 0].set_title('Crimes by Day of Week', fontsize=12, fontweight='bold')
axes[0, 0].set_ylabel('Number of Crimes', fontsize=11)

# 5b: Crimes by Month
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 
               'July', 'August', 'September', 'October', 'November', 'December']
month_counts = df['month_name'].value_counts().reindex(month_order)
axes[0, 1].plot(range(len(month_counts)), month_counts.values, marker='o', 
                linewidth=2.5, markersize=8, color='#FF6B6B')
axes[0, 1].set_xticks(range(len(month_counts)))
axes[0, 1].set_xticklabels([m[:3] for m in month_order], rotation=45)
axes[0, 1].set_title('Crimes by Month', fontsize=12, fontweight='bold')
axes[0, 1].set_ylabel('Number of Crimes', fontsize=11)
axes[0, 1].grid(True, alpha=0.3)

# 5c: Crimes by Hour
hour_counts = df['hour'].value_counts().sort_index()
axes[1, 0].plot(hour_counts.index, hour_counts.values, linewidth=2.5, 
                marker='o', markersize=6, color='#45B7D1')
axes[1, 0].set_title('Crimes by Hour of Day', fontsize=12, fontweight='bold')
axes[1, 0].set_xlabel('Hour', fontsize=11)
axes[1, 0].set_ylabel('Number of Crimes', fontsize=11)
axes[1, 0].set_xticks(range(0, 24, 2))
axes[1, 0].grid(True, alpha=0.3)

# 5d: Crimes by Time Period
time_period_order = ['Night (00:00-05:59)', 'Morning (06:00-11:59)', 
                     'Afternoon (12:00-17:59)', 'Evening (18:00-23:59)']
time_counts = df['time_period'].value_counts().reindex(time_period_order)
axes[1, 1].bar(range(len(time_counts)), time_counts.values, 
               color=['#2C3E50', '#F39C12', '#E74C3C', '#8E44AD'])
axes[1, 1].set_xticks(range(len(time_counts)))
axes[1, 1].set_xticklabels(['Night', 'Morning', 'Afternoon', 'Evening'])
axes[1, 1].set_title('Crimes by Time Period', fontsize=12, fontweight='bold')
axes[1, 1].set_ylabel('Number of Crimes', fontsize=11)

plt.tight_layout()
plt.savefig('eda_temporal_patterns.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Visualization 5 saved: eda_temporal_patterns.png")

### 6.6 Victim Demographics

In [None]:
# Visualization 6: Victim Demographics
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# 6a: Age Group Distribution
age_order = ['Child (0-17)', 'Young Adult (18-34)', 'Middle Age (35-49)', 'Senior (50-64)', 'Elderly (65+)']
age_counts = df['victim_age_group'].value_counts().reindex([a for a in age_order if a in df['victim_age_group'].unique()])
axes[0].bar(range(len(age_counts)), age_counts.values, 
            color=sns.color_palette('coolwarm', len(age_counts)))
axes[0].set_xticks(range(len(age_counts)))
axes[0].set_xticklabels(age_counts.index, rotation=45, ha='right')
axes[0].set_title('Victim Age Group Distribution', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Number of Victims', fontsize=11)

# 6b: Sex Distribution
sex_counts = df['Vict Sex'].value_counts().head(5)
axes[1].pie(sex_counts.values, labels=sex_counts.index, autopct='%1.1f%%', 
            colors=sns.color_palette('pastel'), startangle=90, textprops={'fontsize': 11})
axes[1].set_title('Victim Sex Distribution', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.savefig('eda_victim_demographics.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Visualization 6 saved: eda_victim_demographics.png")

### 6.7 Correlation Heatmap

In [None]:
# Visualization 7: Correlation Heatmap
fig, ax = plt.subplots(figsize=(14, 10))

# Select key variables for correlation
corr_vars = ['Vict Age', 'weapon_involved', 'is_weekend', 'reporting_delay_days',
             'area_risk_score', 'population', 'median_income', 'crimes_per_1000', 'hour']
corr_vars = [var for var in corr_vars if var in df.columns]

correlation = df[corr_vars].corr()

sns.heatmap(correlation, annot=True, fmt='.2f', cmap='coolwarm', center=0,
            square=True, linewidths=1, cbar_kws={'shrink': 0.8},
            vmin=-1, vmax=1, ax=ax)
ax.set_title('Correlation Matrix of Key Variables', fontsize=14, fontweight='bold', pad=20)

plt.tight_layout()
plt.savefig('eda_correlation_heatmap.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Visualization 7 saved: eda_correlation_heatmap.png")

### 6.8 Weapon Involvement Analysis

In [None]:
# Visualization 8: Weapon Analysis
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# 8a: Weapon Involvement
weapon_counts = df['weapon_involved'].value_counts()
weapon_labels = ['No Weapon', 'Weapon Involved']
colors = ['#95D5B2', '#F08080']
axes[0].pie(weapon_counts.values, labels=weapon_labels, autopct='%1.1f%%',
            colors=colors, startangle=90, textprops={'fontsize': 12})
axes[0].set_title('Weapon Involvement in Crimes', fontsize=12, fontweight='bold')

# 8b: Weapon Category Distribution (for crimes with weapons)
weapon_cat = df[df['weapon_involved'] == 1]['weapon_category'].value_counts()
axes[1].bar(range(len(weapon_cat)), weapon_cat.values, 
            color=sns.color_palette('Reds_r', len(weapon_cat)))
axes[1].set_xticks(range(len(weapon_cat)))
axes[1].set_xticklabels(weapon_cat.index, rotation=45, ha='right')
axes[1].set_title('Weapon Categories Used', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Number of Cases', fontsize=11)

plt.tight_layout()
plt.savefig('eda_weapon_analysis.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Visualization 8 saved: eda_weapon_analysis.png")

### 6.9 Crime Severity by Area

In [None]:
# Visualization 9: Crime Severity Analysis
fig, ax = plt.subplots(figsize=(14, 8))

# Get top 10 areas
top_10_areas = df['AREA NAME'].value_counts().head(10).index
df_top10 = df[df['AREA NAME'].isin(top_10_areas)]

# Create grouped bar chart
severity_area = pd.crosstab(df_top10['AREA NAME'], df_top10['crime_severity'])
severity_area = severity_area.reindex(top_10_areas)

severity_area.plot(kind='barh', stacked=False, ax=ax, 
                   color=['#FF6B6B', '#4ECDC4'], width=0.7)
ax.set_title('Crime Severity Distribution in Top 10 Areas', fontsize=14, fontweight='bold', pad=20)
ax.set_xlabel('Number of Crimes', fontsize=12)
ax.set_ylabel('Area', fontsize=12)
ax.legend(title='Crime Severity', fontsize=10, title_fontsize=11)
ax.invert_yaxis()

plt.tight_layout()
plt.savefig('eda_severity_by_area.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Visualization 9 saved: eda_severity_by_area.png")

### 6.10 Year-over-Year Trends

In [None]:
# Visualization 10: Year-over-Year Trends
fig, axes = plt.subplots(2, 1, figsize=(14, 10))

# 10a: Total Crimes by Year and Category
year_category = pd.crosstab(df['year'], df['crime_category'])
year_category.plot(kind='bar', ax=axes[0], width=0.8, 
                   color=['#FF6B6B', '#4ECDC4', '#45B7D1'])
axes[0].set_title('Crime Trends by Category (Year-over-Year)', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Year', fontsize=11)
axes[0].set_ylabel('Number of Crimes', fontsize=11)
axes[0].legend(title='Crime Category', fontsize=9)
axes[0].tick_params(axis='x', rotation=0)

# 10b: Monthly Average by Year
monthly_year = df.groupby(['year', 'month']).size().reset_index(name='count')
for year in monthly_year['year'].unique():
    year_data = monthly_year[monthly_year['year'] == year]
    axes[1].plot(year_data['month'], year_data['count'], 
                marker='o', linewidth=2, label=str(year))

axes[1].set_title('Monthly Crime Patterns by Year', fontsize=12, fontweight='bold')
axes[1].set_xlabel('Month', fontsize=11)
axes[1].set_ylabel('Number of Crimes', fontsize=11)
axes[1].legend(title='Year', fontsize=9)
axes[1].set_xticks(range(1, 13))
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('eda_year_over_year_trends.png', dpi=300, bbox_inches='tight')
plt.show()

print("✓ Visualization 10 saved: eda_year_over_year_trends.png")

## 7. Key Insights & Conclusions <a id='7'></a>

Let's summarize the key findings from our exploratory data analysis.

### Summary of Key Findings

Based on our comprehensive exploratory data analysis, here are the main insights:

#### 1. **Crime Distribution**
- **Property crimes** are the most prevalent category
- The most common specific crime types include vehicle theft, burglary, and battery
- **Part 1 (Serious Crimes)** vs **Part 2 (Less Serious Crimes)** distribution shows significant variation

#### 2. **Geographic Patterns**
- Certain areas show significantly higher crime rates
- High-crime areas correlate with:
  - Higher population density
  - Lower median income
  - Higher area risk scores

#### 3. **Temporal Patterns**
- **Peak crime hours**: Late evening and afternoon periods
- **Weekly patterns**: Weekends may show different crime patterns than weekdays
- **Seasonal variations**: Certain months show higher crime rates
- **Year-over-year trends**: Evolution of crime patterns over time

#### 4. **Victim Demographics**
- Age distribution shows specific vulnerable groups
- Gender patterns vary by crime type
- Certain age groups are disproportionately affected

#### 5. **Weapon Involvement**
- Significant portion of crimes involve weapons
- Weapon types vary by crime category
- Violent crimes show higher weapon involvement rates

#### 6. **Correlations**
- Strong correlations between:
  - Area risk score and crime frequency
  - Population density and total crimes
  - Median income and certain crime types

#### 7. **Reporting Patterns**
- Average reporting delay varies by crime type
- Serious crimes tend to be reported more quickly

### Recommendations for Stakeholders

1. **Law Enforcement**: Focus resources on high-risk areas and peak crime hours
2. **Urban Planning**: Consider socioeconomic factors in crime prevention strategies
3. **Community Programs**: Target specific demographics and vulnerable groups
4. **Policy Makers**: Address underlying factors like income inequality and area development

### Next Steps

- Develop predictive models based on these insights
- Create interactive dashboards for real-time monitoring
- Conduct deeper analysis on specific crime types or areas
- Implement machine learning for crime prediction

---

## Conclusion

This exploratory data analysis has provided comprehensive insights into the crime data from 2020 to present. Through descriptive statistics, aggregations, time series analysis, correlation studies, and visualizations, we have uncovered important patterns that can guide decision-making and future analysis.

**All visualizations have been saved as PNG files for inclusion in the PDF report.**