# 📄 Documentation and Workflow Description

This notebook focuses on exploring temporal crime patterns from the dataset `crime_dataset_india_2023.csv`, covering incidents reported between 2001 and 2014. The primary goal is to identify **seasonal**, **monthly**, and **hourly** trends in criminal activities, along with regional insights.

---

## 🔍 Analysis of Crime Data (2001–2014)

We will analyze crime data using the following temporal dimensions (if timestamp data is available):

- **Seasonality**: Identify how crime rates vary across different seasons.
- **Monthly Distribution**: Spot month-wise patterns in crimes.
- **Time of Day**: Understand crime frequency during the morning, afternoon, evening, and night.
- **Hourly Trends**: Analyze hourly distribution of crimes (if possible).
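
As a minimal sketch of how these four dimensions can be derived from a single timestamp column, the example below works on a small made-up DataFrame. The column name `Time of Occurrence` matches the dataset; the toy rows, the `SEASON_BY_MONTH` lookup, and the hour boundaries in `time_of_day` are assumptions chosen for illustration.

```python
import pandas as pd

# Toy timestamps, made up for illustration (not taken from the dataset)
toy = pd.DataFrame({
    "Time of Occurrence": pd.to_datetime([
        "2023-01-15 02:30",
        "2023-04-10 09:00",
        "2023-07-22 13:45",
        "2023-10-05 19:20",
    ])
})

# Month and hour come straight from the datetime accessor
toy["Month"] = toy["Time of Occurrence"].dt.month
toy["Hour"] = toy["Time of Occurrence"].dt.hour

# Assumed month-to-season lookup (three-month meteorological seasons)
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}
toy["Season"] = toy["Month"].map(SEASON_BY_MONTH)


def time_of_day(hour):
    """Bucket an hour (0-23) into an assumed time-of-day label."""
    if 5 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 21:
        return "Evening"
    return "Night"  # hours 21-23 and 0-4


toy["TimeOfDay"] = toy["Hour"].map(time_of_day)
print(toy[["Month", "Hour", "Season", "TimeOfDay"]])
```

Keeping the season lookup as a dictionary makes it easy to swap in a different season definition later without touching the rest of the pipeline.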

### 🗂️ Dataset Used

**Filename**: `crime_dataset_india_2023.csv`

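The columns referenced in this notebook include `Report Number`, `Date of Occurrence`, `Time of Occurrence`, `City`, `Crime Code`, `Crime Description`, `Weapon Used`, `Crime Domain`, `Police Deployed`, `Case Closed`, and `Date Case Closed`. The full list is printed from `df_temporal_data.columns` in the code cells.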

---

## 🧠 Planned Analyses

We aim to answer the following:

- What types of crimes are more common during **specific months** or **seasons**?
- Which **cities** or **states** report higher crime rates during particular **times of the day** (morning, noon, night)?
- Are there crime types that frequently occur at **night**?
- How does **seasonality** affect different crime types?

---

## ⏳ To-Do (For Future Work)

- 🔔 Analyze crime patterns around **festivals**.
- 🏖️ Study trends during **public holidays**.
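
A possible starting point for the holiday analysis is to flag each incident whose date falls on a known holiday. The sketch below is only illustrative: `NATIONAL_HOLIDAYS` is a hypothetical lookup containing India's three fixed-date national holidays, and the toy rows are made up. Festivals whose dates move each year would need an external calendar and are not handled here.

```python
import pandas as pd

# Hypothetical lookup of fixed-date national holidays, keyed by (month, day)
NATIONAL_HOLIDAYS = {
    (1, 26): "Republic Day",
    (8, 15): "Independence Day",
    (10, 2): "Gandhi Jayanti",
}

# Toy incidents, made up for illustration
toy = pd.DataFrame({
    "Time of Occurrence": pd.to_datetime([
        "2023-01-26 10:00",
        "2023-03-03 22:15",
        "2023-08-15 06:40",
    ])
})

# Look up each timestamp's (month, day); non-holidays get None
toy["Holiday"] = [
    NATIONAL_HOLIDAYS.get((ts.month, ts.day)) for ts in toy["Time of Occurrence"]
]
toy["IsHoliday"] = toy["Holiday"].notna()

# Count incidents on holidays versus ordinary days
print(toy.groupby("IsHoliday").size())
```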

In [None]:
from pathlib import Path
import pandas as pd

root = Path().resolve().parent
df_temporal_data = pd.read_csv(root / 'dataset/Detailed crime data from various cities in India for the year 2023/crime_dataset_india_2023.csv')

In [None]:
df_temporal_data.head()

In [None]:
# Dimensions to analyse:
#   - Month
#   - Time of day
#   - Season
# For each: which crime, which city, which state, total crimes in a specific
# period (season; time of day: morning, noon, night), and crime types at night.

# To do later: festivals, holidays


In [None]:
# Columns
columns = list(df_temporal_data.columns)
columns

In [None]:
# Datatypes of columns
df_temporal_data.dtypes

In [None]:
# Removing unwanted columns
df_temporal_data = df_temporal_data.drop(columns=["Date Case Closed", "Case Closed", "Police Deployed", "Crime Code", "Weapon Used"])
df_temporal_data

In [None]:
# Checking for null values
print(df_temporal_data.isnull().sum())

In [None]:
# Checking for duplicates
print(df_temporal_data.duplicated().sum())

In [None]:
# Cities in the dataset
df_temporal_data["City"].unique()

In [None]:
# Define the mapping of cities to states
city_to_state = {
    "Ahmedabad": "Gujarat",
    "Chennai": "Tamil Nadu",
    "Ludhiana": "Punjab",
    "Pune": "Maharashtra",
    "Delhi": "Delhi (National Capital Territory)",
    "Mumbai": "Maharashtra",
    "Surat": "Gujarat",
    "Visakhapatnam": "Andhra Pradesh",
    "Bangalore": "Karnataka",
    "Kolkata": "West Bengal",
    "Ghaziabad": "Uttar Pradesh",
    "Hyderabad": "Telangana",
    "Jaipur": "Rajasthan",
    "Lucknow": "Uttar Pradesh",
    "Bhopal": "Madhya Pradesh",
    "Patna": "Bihar",
    "Kanpur": "Uttar Pradesh",
    "Varanasi": "Uttar Pradesh",
    "Nagpur": "Maharashtra",
    "Meerut": "Uttar Pradesh",
    "Thane": "Maharashtra",
    "Indore": "Madhya Pradesh",
    "Rajkot": "Gujarat",
    "Vasai": "Maharashtra",
    "Agra": "Uttar Pradesh",
    "Kalyan": "Maharashtra",
    "Nashik": "Maharashtra",
    "Srinagar": "Jammu and Kashmir",
    "Faridabad": "Haryana"
}

# Map the cities to their respective states and create a new column
df_temporal_data['State'] = df_temporal_data['City'].map(city_to_state)

# Display the updated DataFrame
df_temporal_data.head()
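
Because `Series.map` returns `NaN` for any key missing from the dictionary, a city that is absent from `city_to_state` would silently drop out of every state-level aggregation. The sketch below shows one way to surface such cities; `city_to_state_demo` and the toy rows are made up for illustration, with "Mysore" deliberately left unmapped.

```python
import pandas as pd

# Illustrative subset of a city-to-state mapping
city_to_state_demo = {"Pune": "Maharashtra", "Surat": "Gujarat"}

# Toy data; "Mysore" is intentionally missing from the mapping
toy = pd.DataFrame({"City": ["Pune", "Surat", "Mysore"]})
toy["State"] = toy["City"].map(city_to_state_demo)

# Cities whose state lookup failed end up with NaN in 'State'
unmapped = sorted(toy.loc[toy["State"].isna(), "City"].unique())
print("Cities without a state:", unmapped)
```

Running the same check on `df_temporal_data` after the mapping step would confirm whether every city in the dataset has a state.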

In [None]:
# Ensure 'Time of Occurrence' is in datetime format with the correct format specified
df_temporal_data['Time of Occurrence'] = pd.to_datetime(
    df_temporal_data['Time of Occurrence'], 
    format='%d-%m-%Y %H:%M',  # Specify the correct format
    errors='coerce'  # Coerce invalid parsing to NaT (Not a Time)
)

# Check for rows where parsing failed
invalid_rows = df_temporal_data[df_temporal_data['Time of Occurrence'].isna()]
print("Rows with invalid 'Time of Occurrence':")
print(invalid_rows)

# Display the updated DataFrame
df_temporal_data.head()

In [None]:
df_temporal_data['Year'] = df_temporal_data['Time of Occurrence'].dt.year
df_temporal_data['Month'] = df_temporal_data['Time of Occurrence'].dt.month
df_temporal_data['Day'] = df_temporal_data['Time of Occurrence'].dt.day
df_temporal_data['Time'] = df_temporal_data['Time of Occurrence'].dt.time

In [None]:
# Extract the year, month, day and time from 'Date of Occurrence'.
# Parse the column once and reuse the result instead of re-parsing it for each field.
date_of_occurrence = pd.to_datetime(df_temporal_data['Date of Occurrence'])
df_temporal_data['Year'] = date_of_occurrence.dt.year
df_temporal_data['Month'] = date_of_occurrence.dt.month
df_temporal_data['Day'] = date_of_occurrence.dt.day
df_temporal_data['Time'] = date_of_occurrence.dt.time

In [None]:
df_temporal_data

In [None]:
df_temporal_data.columns

In [None]:
# Unique Crime Types
print(df_temporal_data['Crime Domain'].unique())

print(df_temporal_data['Crime Description'].unique())

In [None]:
# 1. Extract the date (without time) as a separate column
df_temporal_data['DateOnly'] = df_temporal_data['Time of Occurrence'].dt.date

# 2. Group by 'DateOnly' instead of the full 'Time of Occurrence'
crimes_per_day = df_temporal_data.groupby('DateOnly').size().reset_index(name='Total Crimes')
print(crimes_per_day.head())

In [None]:
# Extract month from 'Time of Occurrence'
df_temporal_data['Month'] = df_temporal_data['Time of Occurrence'].dt.month

# Group by month
crimes_per_month = df_temporal_data.groupby('Month').size().reset_index(name='Total Crimes')
print(crimes_per_month.head(12))

In [None]:
# Ensure 'Year' column exists
df_temporal_data['Year'] = df_temporal_data['Time of Occurrence'].dt.year

# Total crimes recorded per year
crimes_per_year = df_temporal_data.groupby('Year').size().reset_index(name='Total Crimes')
print(crimes_per_year.head())

#### Visualisation

In [None]:
# Bar chart of crimes per month
import matplotlib.pyplot as plt  # needed here; otherwise plt is first imported in a later cell

plt.figure(figsize=(10, 6))
plt.bar(crimes_per_month['Month'], crimes_per_month['Total Crimes'], color='skyblue')
plt.title('Total Crimes per Month')
plt.xlabel('Month')
plt.ylabel('Total Crimes')
plt.xticks(range(1, 13))
plt.show()

In [None]:
# Line plot of crimes per year
plt.figure(figsize=(10, 6))
plt.plot(crimes_per_year['Year'], crimes_per_year['Total Crimes'], marker='o')
plt.title('Total Crimes per Year')
plt.xlabel('Year')
plt.ylabel('Total Crimes')
plt.show()

In [None]:
# Seasonality Analysis

import matplotlib.pyplot as plt
import seaborn as sns

# Define seasons based on month
def get_season(month):
    if pd.isna(month):  # rows whose timestamp failed to parse have no season
        return None
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    else:  # 9, 10, 11
        return 'Fall'

# Create a Season column
df_temporal_data['Season'] = df_temporal_data['Month'].apply(get_season)

# Group by season to get crime count
crimes_per_season = df_temporal_data.groupby('Season').size().reset_index(name='Total Crimes')

# Ensure proper season order
season_order = ['Winter', 'Spring', 'Summer', 'Fall']
crimes_per_season['Season'] = pd.Categorical(crimes_per_season['Season'], categories=season_order, ordered=True)
crimes_per_season = crimes_per_season.sort_values('Season')

# Plot
plt.figure(figsize=(10, 6))
plt.bar(crimes_per_season['Season'], crimes_per_season['Total Crimes'], color='skyblue')
plt.title('Total Crimes by Season')
plt.xlabel('Season')
plt.ylabel('Total Crimes')
plt.show()

In [None]:
# Time-of-Day Distribution

# Define time of day categories
def get_time_of_day(hour):
    if pd.isna(hour):  # rows whose timestamp failed to parse have no time of day
        return None
    if 5 <= hour < 12:
        return 'Morning'
    elif 12 <= hour < 17:
        return 'Afternoon'
    elif 17 <= hour < 21:
        return 'Evening'
    else:  # 21-4
        return 'Night'

# Extract hour from Time of Occurrence
df_temporal_data['Hour'] = df_temporal_data['Time of Occurrence'].dt.hour

# Create Time of Day column
df_temporal_data['TimeOfDay'] = df_temporal_data['Hour'].apply(get_time_of_day)

# Group by time of day
crimes_by_time_of_day = df_temporal_data.groupby('TimeOfDay').size().reset_index(name='Total Crimes')

# Ensure proper time of day order
time_order = ['Morning', 'Afternoon', 'Evening', 'Night']
crimes_by_time_of_day['TimeOfDay'] = pd.Categorical(crimes_by_time_of_day['TimeOfDay'], 
                                                   categories=time_order, ordered=True)
crimes_by_time_of_day = crimes_by_time_of_day.sort_values('TimeOfDay')

# Plot
plt.figure(figsize=(10, 6))
plt.bar(crimes_by_time_of_day['TimeOfDay'], crimes_by_time_of_day['Total Crimes'], color='skyblue')
plt.title('Crime Distribution by Time of Day')
plt.xlabel('Time of Day')
plt.ylabel('Total Crimes')
plt.show()

In [None]:
# Hourly Heatmap
# Extract day of week and hour for heatmap
df_temporal_data['DayOfWeek'] = df_temporal_data['Time of Occurrence'].dt.day_name()
df_temporal_data['Hour'] = df_temporal_data['Time of Occurrence'].dt.hour

# Create pivot table for heatmap
hourly_crimes = df_temporal_data.pivot_table(index='DayOfWeek', 
                                           columns='Hour', 
                                           values='Report Number', 
                                           aggfunc='count', 
                                           fill_value=0)

# Ensure days are in correct order
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
hourly_crimes = hourly_crimes.reindex(day_order)

# Create heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(hourly_crimes, cmap='YlOrRd', linewidths=.5, annot=False)
plt.title('Crime Frequency by Day and Hour')
plt.xlabel('Hour of Day')
plt.ylabel('Day of Week')
plt.show()
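
A pivot table only contains the hours and weekdays that actually occur in the data, so a sparse dataset can yield a heatmap with missing columns or `NaN` rows after reindexing. The sketch below, on made-up rows, shows one way to force a complete 7 x 24 grid with zeros; the variable names `toy` and `grid` are illustrative.

```python
import pandas as pd

# Toy incidents covering only two weekdays and two hours
toy = pd.DataFrame({
    "DayOfWeek": ["Monday", "Monday", "Sunday"],
    "Hour": [0, 23, 23],
    "Report Number": [1, 2, 3],
})

grid = toy.pivot_table(
    index="DayOfWeek",
    columns="Hour",
    values="Report Number",
    aggfunc="count",
    fill_value=0,
)

# Reindex both axes so every weekday and every hour 0-23 is present
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
grid = grid.reindex(index=day_order, columns=range(24), fill_value=0)
print(grid.shape)
```

Passing `fill_value=0` to `reindex` keeps the counts as integers and avoids blank cells in the heatmap.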

In [None]:
# Crime Types by Time
# Group by crime domain and time of day
crime_types_by_time = df_temporal_data.groupby(['Crime Domain', 'TimeOfDay']).size().unstack(fill_value=0)

# Plot stacked bar chart
# DataFrame.plot creates its own figure, so a separate plt.figure() call would leave an empty one behind
crime_types_by_time.plot(kind='bar', stacked=True, figsize=(14, 8))
plt.title('Crime Types by Time of Day')
plt.xlabel('Crime Domain')
plt.ylabel('Number of Crimes')
plt.xticks(rotation=45, ha='right')
plt.legend(title='Time of Day')
plt.tight_layout()
plt.show()

In [None]:
# Regional Comparisons
# Create pivot table for state comparisons across months
state_month_crimes = df_temporal_data.pivot_table(index='State', 
                                                columns='Month', 
                                                values='Report Number',
                                                aggfunc='count', 
                                                fill_value=0)

# Select top 10 states by total crime for clearer visualization
top_states = state_month_crimes.sum(axis=1).sort_values(ascending=False).head(10).index
state_month_top = state_month_crimes.loc[top_states]

# Plot heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(state_month_top, cmap='YlOrRd', annot=True, fmt='d')
plt.title('Crime Counts by State and Month (Top 10 States)')
plt.xlabel('Month')
plt.ylabel('State')
plt.show()
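
The heatmap above shows raw counts, so states with more recorded incidents dominate the colour scale. To compare seasonal shape rather than volume, each row can be converted to a percentage of that state's total. The sketch below uses made-up counts for two hypothetical states; `counts` stands in for a table like `state_month_crimes`.

```python
import pandas as pd

# Made-up monthly counts for two hypothetical states
counts = pd.DataFrame(
    {1: [30, 5], 2: [50, 5], 3: [20, 10]},
    index=pd.Index(["State A", "State B"], name="State"),
)
counts.columns.name = "Month"

# Divide each row by its own total so every state sums to 100%
shares = counts.div(counts.sum(axis=1), axis=0) * 100
print(shares.round(1))
```

In this toy example "State B" has far fewer incidents overall, yet half of them fall in month 3, a pattern the raw counts would hide.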

#### Geospatial 

In [None]:
# 1. What types of crimes are more common during specific months or seasons?

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import plotly.express as px

# For crimes by month and crime type
# Create a pivot table of crime domains by month
crime_month_pivot = df_temporal_data.pivot_table(
    index='Month',
    columns='Crime Domain',
    aggfunc='size',
    fill_value=0
)

# Create a heatmap showing crime types across months
plt.figure(figsize=(14, 8))
sns.heatmap(crime_month_pivot, cmap='YlOrRd', annot=True, fmt='d')
plt.title('Crime Types Distribution by Month')
plt.xlabel('Crime Domain')
plt.ylabel('Month')
plt.show()

# For crimes by season and crime type
# Create a pivot table of crime domains by season
crime_season_pivot = df_temporal_data.pivot_table(
    index='Season', 
    columns='Crime Domain',
    aggfunc='size',
    fill_value=0
)

# Ensure proper season order
season_order = ['Winter', 'Spring', 'Summer', 'Fall']
crime_season_pivot = crime_season_pivot.reindex(season_order)

# Create a heatmap
plt.figure(figsize=(14, 8))
sns.heatmap(crime_season_pivot, cmap='YlOrRd', annot=True, fmt='d')
plt.title('Crime Types Distribution by Season')
plt.xlabel('Crime Domain')
plt.ylabel('Season')
plt.show()

# Interactive version using Plotly
crime_season_data = df_temporal_data.groupby(['Season', 'Crime Domain']).size().reset_index(name='Count')
fig_crime_season = px.bar(
    crime_season_data,
    x='Season',
    y='Count',
    color='Crime Domain',
    barmode='group',
    category_orders={'Season': season_order},
    title='Crime Types by Season'
)
fig_crime_season.update_layout(
    xaxis_title='Season',
    yaxis_title='Number of Crimes',
    legend_title='Crime Domain'
)
fig_crime_season.show()

In [None]:
# 2. Which cities or states report higher crime rates during particular times of the day?
# Create a pivot table for state by time of day
state_time_crimes = df_temporal_data.pivot_table(
    index='State',
    columns='TimeOfDay',
    values='Report Number',
    aggfunc='count',
    fill_value=0
)

# Select top 10 states by total crime
top_states = state_time_crimes.sum(axis=1).sort_values(ascending=False).head(10).index
state_time_top = state_time_crimes.loc[top_states]

# Plot heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(state_time_top, cmap='YlOrRd', annot=True, fmt='d')
plt.title('Crime Counts by State and Time of Day (Top 10 States)')
plt.xlabel('Time of Day')
plt.ylabel('State')
plt.show()

# Interactive version using Plotly
state_time_data = df_temporal_data.groupby(['State', 'TimeOfDay']).size().reset_index(name='Count')
top_states_data = state_time_data[state_time_data['State'].isin(top_states)]

fig_state_time = px.bar(
    top_states_data,
    x='State',
    y='Count',
    color='TimeOfDay',
    barmode='group',
    category_orders={'TimeOfDay': ['Morning', 'Afternoon', 'Evening', 'Night']},
    title='Crime Distribution by State and Time of Day (Top 10 States)'
)
fig_state_time.update_layout(
    xaxis_title='State',
    yaxis_title='Number of Crimes',
    legend_title='Time of Day'
)
fig_state_time.show()

In [None]:
# 3. Are there crime types that frequently occur at night?
# Filter for night crimes only
night_crimes = df_temporal_data[df_temporal_data['TimeOfDay'] == 'Night']

# Create a bar chart showing distribution of crime types at night
night_crime_counts = night_crimes.groupby('Crime Domain').size().sort_values(ascending=False)

plt.figure(figsize=(12, 6))
night_crime_counts.plot(kind='bar', color='darkblue')
plt.title('Crime Types Occurring at Night')
plt.xlabel('Crime Domain')
plt.ylabel('Number of Crimes')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

# Compare night crime proportion with other times
# Calculate percentage of each crime type occurring at night
crime_time_distribution = pd.crosstab(
    df_temporal_data['Crime Domain'], 
    df_temporal_data['TimeOfDay'], 
    normalize='index'
) * 100

# Sort by night percentage
crime_time_distribution = crime_time_distribution.sort_values('Night', ascending=False)

# Plot
# DataFrame.plot creates its own figure, so pass figsize directly instead of calling plt.figure() first
crime_time_distribution.plot(kind='bar', stacked=True, figsize=(14, 8))
plt.title('Percentage Distribution of Crimes by Time of Day')
plt.xlabel('Crime Domain')
plt.ylabel('Percentage')
plt.xticks(rotation=45, ha='right')
plt.legend(title='Time of Day')
plt.tight_layout()
plt.show()

# Interactive version using Plotly
night_crime_data = night_crimes.groupby('Crime Domain').size().reset_index(name='Count')
fig_night_crimes = px.bar(
    night_crime_data.sort_values('Count', ascending=False),
    x='Crime Domain',
    y='Count',
    color='Count',
    color_continuous_scale='Reds',
    title='Crime Types Occurring at Night'
)
fig_night_crimes.update_layout(
    xaxis_title='Crime Domain',
    yaxis_title='Number of Crimes',
    coloraxis_showscale=False
)
fig_night_crimes.show()

In [None]:
# 4. How does seasonality affect different crime types?
# Create a grouped bar chart to compare crime types across seasons
crime_season_data = df_temporal_data.groupby(['Crime Domain', 'Season']).size().reset_index(name='Count')

# Plot with Plotly for interactivity
fig_crime_by_season = px.bar(
    crime_season_data,
    x='Crime Domain',
    y='Count',
    color='Season',
    barmode='group',
    category_orders={'Season': ['Winter', 'Spring', 'Summer', 'Fall']},
    title='Crime Types by Season'
)
fig_crime_by_season.update_layout(
    xaxis_title='Crime Domain',
    yaxis_title='Number of Crimes',
    legend_title='Season',
    xaxis={'categoryorder': 'total descending'}
)
fig_crime_by_season.show()

# For specific crime types, show seasonal trends
top_crimes = df_temporal_data['Crime Domain'].value_counts().head(5).index.tolist()
top_crime_data = df_temporal_data[df_temporal_data['Crime Domain'].isin(top_crimes)]

# Group by crime type and season
top_crime_season = top_crime_data.groupby(['Crime Domain', 'Season']).size().reset_index(name='Count')

# Plot for top crimes
fig_top_crimes_season = px.line(
    top_crime_season,
    x='Season',
    y='Count',
    color='Crime Domain',
    markers=True,
    category_orders={'Season': ['Winter', 'Spring', 'Summer', 'Fall']},
    title='Seasonal Trends for Top Crime Types'
)
fig_top_crimes_season.update_layout(
    xaxis_title='Season',
    yaxis_title='Number of Crimes',
    legend_title='Crime Type'
)
fig_top_crimes_season.show()

In [None]:
import plotly.express as px
import geopandas as gpd

# You'd need to obtain an India states GeoJSON file
# Example code assuming you have one:

# Get state-level crime counts
state_crimes = df_temporal_data.groupby('State').size().reset_index(name='Total_Crimes')

# Assuming you have a geodataframe with state boundaries
# india_states = gpd.read_file('india_states.geojson')

# Merge crime data with geodataframe 
# state_map = india_states.merge(state_crimes, left_on='state_name', right_on='State')

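# Create choropleth map
# NOTE: without a GeoJSON, Plotly's built-in location modes only recognise
# countries and US states, so Indian state names will not be drawn. Switch to
# the commented-out geojson/locations arguments once state_map is available.
fig_state_crimes = px.choropleth(
    state_crimes,  # Replace with state_map if you have GeoJSON
    # geojson=state_map.geometry.__geo_interface__,  # Uncomment if you have GeoJSON
    # locations=state_map.index,  # Uncomment if you have GeoJSON
    locations='State',  # Placeholder; remove when using the GeoJSON arguments above
    color='Total_Crimes',
    hover_name='State',
    scope='asia',  # Choose appropriate scope for India
    color_continuous_scale='OrRd',
    labels={'Total_Crimes': 'Total Crimes'}
)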

fig_state_crimes.update_layout(
    title_text="Crime Distribution by State in India",
    margin={"r": 0, "t": 40, "l": 0, "b": 0}
)
fig_state_crimes.show()

In [None]:
import plotly.express as px
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt

# 1. Time of Day and Geographic Distribution of Crimes

# First, let's create a state-level crime map showing crimes by time of day
# Aggregate crimes by state and time of day
state_time_crime = df_temporal_data.groupby(['State', 'TimeOfDay']).size().reset_index(name='Crimes')

# Create separate datasets for each time of day
morning_crimes = state_time_crime[state_time_crime['TimeOfDay'] == 'Morning'].rename(columns={'Crimes': 'Morning_Crimes'})
afternoon_crimes = state_time_crime[state_time_crime['TimeOfDay'] == 'Afternoon'].rename(columns={'Crimes': 'Afternoon_Crimes'})
evening_crimes = state_time_crime[state_time_crime['TimeOfDay'] == 'Evening'].rename(columns={'Crimes': 'Evening_Crimes'})
night_crimes = state_time_crime[state_time_crime['TimeOfDay'] == 'Night'].rename(columns={'Crimes': 'Night_Crimes'})

# Merge with the GeoJSON data - first simplify to state level
india_states = gpd.read_file('/Users/ananthakrishnab/Desktop/Projects/Community Risk Profiling Using FIR Data/Utils/Geospatial_info/india_district.geojson')

# Aggregate to state level
india_states['STATE'] = india_states['NAME_1']
state_geo = india_states.dissolve(by='STATE').reset_index()

# Map state names to match your dataframe - create a mapping dictionary if needed
state_name_mapping = {
    # Add mappings as needed based on your data
    'Uttar Pradesh': 'Uttar Pradesh',
    'Maharashtra': 'Maharashtra',
    'Delhi': 'Delhi (National Capital Territory)'
    # Add more as needed
}

# Apply mapping if needed
# state_geo['STATE'] = state_geo['STATE'].map(lambda x: state_name_mapping.get(x, x))

# Create choropleth map for night crimes
night_state_map = state_geo.merge(night_crimes, left_on='STATE', right_on='State', how='left')
night_state_map['Night_Crimes'] = night_state_map['Night_Crimes'].fillna(0)

fig_night_state = px.choropleth(
    night_state_map,
    geojson=night_state_map.geometry.__geo_interface__,
    locations=night_state_map.index,
    color='Night_Crimes',
    hover_name='STATE',
    projection='mercator',
    color_continuous_scale='Plasma',
    labels={'Night_Crimes': 'Night Crimes'}
)

fig_night_state.update_geos(fitbounds="locations", visible=False)
fig_night_state.update_layout(
    title_text="🌙 Crime Distribution by State at Night",
    margin={"r": 0, "t": 40, "l": 0, "b": 0}
)
fig_night_state.show()

# 2. Seasonal Crime Patterns by State

# Aggregate crimes by state and season
state_season_crime = df_temporal_data.groupby(['State', 'Season']).size().reset_index(name='Crimes')

# Create separate datasets for each season
summer_crimes = state_season_crime[state_season_crime['Season'] == 'Summer'].rename(columns={'Crimes': 'Summer_Crimes'})
winter_crimes = state_season_crime[state_season_crime['Season'] == 'Winter'].rename(columns={'Crimes': 'Winter_Crimes'})

# Create seasonal maps
summer_state_map = state_geo.merge(summer_crimes, left_on='STATE', right_on='State', how='left')
summer_state_map['Summer_Crimes'] = summer_state_map['Summer_Crimes'].fillna(0)

fig_summer_state = px.choropleth(
    summer_state_map,
    geojson=summer_state_map.geometry.__geo_interface__,
    locations=summer_state_map.index,
    color='Summer_Crimes',
    hover_name='STATE',
    projection='mercator',
    color_continuous_scale='Oranges',
    labels={'Summer_Crimes': 'Summer Crimes'}
)

fig_summer_state.update_geos(fitbounds="locations", visible=False)
fig_summer_state.update_layout(
    title_text="☀️ Crime Distribution by State in Summer",
    margin={"r": 0, "t": 40, "l": 0, "b": 0}
)
fig_summer_state.show()

# 3. Crime Types at Night - Geographic Distribution

# Get the most common night crime type for each state
night_crime_types = df_temporal_data[df_temporal_data['TimeOfDay'] == 'Night'].groupby(['State', 'Crime Domain']).size().reset_index(name='Count')
night_crime_types = night_crime_types.sort_values(['State', 'Count'], ascending=[True, False])
most_common_night_crime = night_crime_types.groupby('State').first().reset_index()
most_common_night_crime = most_common_night_crime.rename(columns={'Crime Domain': 'Most_Common_Night_Crime'})

# Merge with state geo data
night_crime_state_map = state_geo.merge(
    most_common_night_crime,
    left_on='STATE',
    right_on='State',
    how='left'
)

# Create a categorical map of most common night crimes
fig_night_crime_types = px.choropleth(
    night_crime_state_map,
    geojson=night_crime_state_map.geometry.__geo_interface__,
    locations=night_crime_state_map.index,
    color='Most_Common_Night_Crime',
    hover_name='STATE',
    projection='mercator',
    color_discrete_sequence=px.colors.qualitative.Dark24,
    labels={'Most_Common_Night_Crime': 'Most Common Crime at Night'}
)

fig_night_crime_types.update_geos(fitbounds="locations", visible=False)
fig_night_crime_types.update_layout(
    title_text="🌃 Most Common Crime Type at Night by State",
    margin={"r": 0, "t": 40, "l": 0, "b": 0}
)
fig_night_crime_types.show()

# 4. Seasonal Variation in Crime Types - Geographic Distribution

# For each state, calculate the crime type with the biggest seasonal variation
# First, pivot the data to get crime counts by season and crime type for each state
seasonal_variation = df_temporal_data.pivot_table(
    index=['State', 'Crime Domain'],
    columns='Season',
    aggfunc='size',
    fill_value=0
).reset_index()

# Calculate the range of seasonal variation for each crime type in each state
season_cols = ['Winter', 'Spring', 'Summer', 'Fall']
seasonal_variation['Seasonal_Range'] = (
    seasonal_variation[season_cols].max(axis=1) - seasonal_variation[season_cols].min(axis=1)
)

# Get the crime type with the highest seasonal variation for each state
highest_variation = seasonal_variation.sort_values(['State', 'Seasonal_Range'], ascending=[True, False])
highest_variation_crime = highest_variation.groupby('State').first().reset_index()
highest_variation_crime = highest_variation_crime[['State', 'Crime Domain', 'Seasonal_Range']]
highest_variation_crime = highest_variation_crime.rename(columns={'Crime Domain': 'Highest_Seasonal_Variation_Crime'})

# Merge with state geo data
seasonal_var_map = state_geo.merge(
    highest_variation_crime,
    left_on='STATE',
    right_on='State',
    how='left'
)

# Create a categorical map of crimes with highest seasonal variation
fig_seasonal_var = px.choropleth(
    seasonal_var_map,
    geojson=seasonal_var_map.geometry.__geo_interface__,
    locations=seasonal_var_map.index,
    color='Highest_Seasonal_Variation_Crime',
    hover_name='STATE',
    projection='mercator',
    color_discrete_sequence=px.colors.qualitative.Pastel,
    labels={'Highest_Seasonal_Variation_Crime': 'Crime with Highest Seasonal Variation'}
)

fig_seasonal_var.update_geos(fitbounds="locations", visible=False)
fig_seasonal_var.update_layout(
    title_text="🍂❄️☀️🌱 Crimes with Highest Seasonal Variation by State",
    margin={"r": 0, "t": 40, "l": 0, "b": 0}
)
fig_seasonal_var.show()