## 2024 Global Crime Rate Analysis
**In this notebook, we will analyze the "2024 Crime Rate by Country" dataset using various visualization techniques.**

In [None]:
# Importing Library for Data Analysis and Manipulation
import pandas as pd

# Importing Libraries for Visualization
import matplotlib.pyplot as plt  # For basic plots 
import seaborn as sns  # For statistical data visualization 

# Import Library for handling geospatial data
import geopandas as gpd

# Import Library for clustering
from sklearn.cluster import KMeans

# Enabling Plotly in Kaggle Notebook 
from plotly.offline import init_notebook_mode  
init_notebook_mode(connected=True)  # Ensures Plotly visualizations work in Kaggle

# To suppress warnings
import warnings
warnings.filterwarnings("ignore")

## 1. Load and Preview Data
**Before starting the analysis, we load the dataset and inspect its structure.**

In [None]:
# Load the dataset
df = pd.read_csv('/kaggle/input/global-crime-rates-analysis-a-2024-overview/crime_rate_by_country_2024.csv')

# Preview the first and last five rows 
display(df.head())
display(df.tail())

In [None]:
# Display the structure and summary of the data
print("\nDataset Information:")
df.info()

# Check for missing values
print("\nMissing Values:")
print(df.isnull().sum())

**The dataset contains columns for country, crime index, criminality score, markets score, actors score, resilience score, and safety index.**<br>
**The check above reports no missing values.**

## 2. Data Formatting and Consistency
**We will standardize all float values to one decimal place using the round method to ensure consistency.**

In [None]:
# Round the data to one decimal place
df = df.round(1)

# Preview the first and last five rows 
display(df.head())
display(df.tail())
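
As a quick illustration of what `round(1)` does on a frame that mixes text and float columns, here is a minimal sketch. The `toy` frame and its values are hypothetical stand-ins, not rows from the dataset.

```python
import pandas as pd

# Hypothetical toy frame mimicking the dataset's mix of text and float columns
toy = pd.DataFrame({
    "Country": ["A", "B"],
    "CrimeIndex": [45.678, 12.349],
    "SafetyIndex": [54.322, 87.651],
})

# DataFrame.round only touches numeric columns; the text column is left unchanged
rounded = toy.round(1)
print(rounded)
```

Note that rounding changes the stored values, so every later computation in the notebook works on the rounded numbers.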

## 3. Exploratory Data Analysis
### 3.1 Bivariate Analysis
**Exploratory Data Analysis helps uncover patterns and relationships within the dataset. In this section, we perform bivariate analyses of three pairs of variables: CrimeIndex and ResilienceScore, CriminalityScore and SafetyIndex, and MarketsScore and SafetyIndex. The scatter plots show how the variables in each pair move together and point to potential correlations in the data.**

In [None]:
# Each tuple: (x column, y column, x-axis label, y-axis label)
pairs = [
    ('CrimeIndex', 'ResilienceScore', 'Crime Index', 'Resilience Score'),
    ('CriminalityScore', 'SafetyIndex', 'Criminality Score', 'Safety Index'),
    ('MarketsScore', 'SafetyIndex', 'Markets Score', 'Safety Index'),
]

# Draw one scatter plot per pair, coloring the points by country
for x_col, y_col, x_label, y_label in pairs:
    plt.figure(figsize=(8, 6))
    sns.scatterplot(data=df, x=x_col, y=y_col, hue='Country', palette='tab20', legend='full')
    plt.title(f'{x_col} vs. {y_col}')
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    # Horizontal legend placed below the plot
    plt.legend(title='Country', bbox_to_anchor=(0.5, -0.1), loc='upper center',
               fontsize='small', title_fontsize='large', markerscale=0.8, ncol=5)
    plt.tight_layout()
    plt.show()

**The plots above provide insights into the relationships between various metrics:**
- **CrimeIndex vs. ResilienceScore**: This comparison helps us explore whether areas with higher crime rates tend to have lower resilience scores, which could indicate a potential vulnerability to social challenges or instability.
- **CriminalityScore vs. SafetyIndex**: By analyzing this relationship, we aim to determine if higher criminality scores are associated with lower safety perceptions, suggesting a possible link between crime rates and public safety concerns.
- **MarketsScore vs. SafetyIndex**: This analysis investigates whether regions with better market conditions (as indicated by the MarketsScore) are correlated with higher safety levels, which could imply that stronger economic conditions contribute to a safer environment.
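
To put a number on each of the three relationships, a Pearson correlation coefficient can be computed per pair. The sketch below uses a hypothetical `toy` frame with made-up values; in the notebook, `df` would take its place.

```python
import pandas as pd

# Hypothetical values for illustration only; not taken from the dataset
toy = pd.DataFrame({
    "CrimeIndex":       [20.0, 35.0, 50.0, 65.0, 80.0],
    "ResilienceScore":  [7.5, 6.0, 5.5, 4.0, 3.0],
    "CriminalityScore": [3.0, 4.5, 5.0, 6.5, 7.0],
    "MarketsScore":     [3.5, 4.0, 5.5, 6.0, 7.5],
    "SafetyIndex":      [80.0, 65.0, 50.0, 35.0, 20.0],
})

pairs = [
    ("CrimeIndex", "ResilienceScore"),
    ("CriminalityScore", "SafetyIndex"),
    ("MarketsScore", "SafetyIndex"),
]

# Series.corr defaults to the Pearson coefficient
pair_corr = {(x, y): toy[x].corr(toy[y]) for x, y in pairs}
for (x, y), r in pair_corr.items():
    print(f"{x} vs. {y}: r = {r:.2f}")
```

A coefficient near -1 or 1 suggests a strong linear relationship, while one near 0 suggests little linear association; it does not establish causation.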

### 3.2 Correlation Heatmap
**Correlation analysis applies to the numerical columns. A heatmap visualizes the pairwise correlation coefficients, which range from -1 to 1. The magnitude indicates the strength of a linear relationship (values near 0 mean a weak one), while the sign indicates its direction: positive when two variables rise together, negative when one falls as the other rises.**

In [None]:
# Compute correlation matrix
correlation_matrix = df.select_dtypes(include=['number']).corr() # Only use numeric columns

# Check the correlation matrix 
print("Correlation Matrix:")
print(correlation_matrix)

# Plot heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.1f', linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()
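
Because strength depends on magnitude rather than sign, it can help to list the variable pairs sorted by absolute correlation. This is a minimal sketch on a hypothetical `toy` frame with placeholder column names `a`, `b`, and `c`; the same steps should apply to `correlation_matrix`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "b" is perfectly inverse to "a", "c" loosely follows "a"
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [10.0, 8.0, 6.0, 4.0, 2.0],
    "c": [2.0, 1.0, 4.0, 3.0, 5.0],
})
corr = toy.corr()

# Keep each pair once: upper triangle only, excluding the diagonal
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Sort by magnitude so strong negative correlations rank with strong positive ones
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```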

### 3.3 Clustering
**We cluster the countries with the KMeans algorithm, using the crime index, criminality score, and safety index as features.**

In [None]:
# Select relevant columns ('CrimeIndex', 'CriminalityScore', 'SafetyIndex') for clustering
X = df[['CrimeIndex', 'CriminalityScore', 'SafetyIndex']]

# Perform KMeans clustering with 3 clusters
kmeans = KMeans(n_clusters=3, random_state=0).fit(X)

# Add the cluster labels as a new column in the DataFrame
df['Cluster'] = kmeans.labels_

# Create a scatter plot with 'CrimeIndex' on the x-axis and 'SafetyIndex' on the y-axis
# The data points are colored based on their assigned cluster
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='CrimeIndex', y='SafetyIndex', hue='Cluster', palette='viridis')

# Set plot title, axis labels, and display the plot
plt.title('KMeans Clustering of Countries')
plt.xlabel('Crime Index')
plt.ylabel('Safety Index')
plt.show()

**The plot displays the countries grouped into three clusters, with each cluster represented by a distinct color. The KMeans algorithm has categorized the countries based on their levels of crime and safety.**
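
KMeans relies on Euclidean distance, so features on larger numeric scales can dominate the grouping. One common variation, not used in the cell above, is to standardize the features first. The sketch below runs on synthetic data; the 0-100 and 0-10 ranges are assumptions for illustration, not properties confirmed for this dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins: two features on an assumed 0-100 scale, one on an assumed 0-10 scale
X = np.column_stack([
    rng.uniform(0, 100, 60),
    rng.uniform(0, 10, 60),
    rng.uniform(0, 100, 60),
])

# Rescale every feature to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)

# Fit KMeans on the standardized features; n_init is set explicitly for reproducibility
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
labels = kmeans.labels_
print(np.bincount(labels))  # number of points per cluster
```

Whether scaling improves the clusters here depends on the actual ranges of the three columns, which `df.describe()` would show.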

## 4. Merging the Data with a Geospatial World Map
**This process involves combining crime data with a world map, using country names as the key for merging. The result is a visual representation where each country is colored based on its respective crime rate, allowing for an easy comparison of crime rates across different regions of the world.**

In [None]:
# Select relevant columns for crime data: Country and CrimeIndex
crime_data = df[['Country', 'CrimeIndex']]

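# Load the world map shapefile using GeoPandas
# Note: gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0, so this call needs an older version
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))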

# Merge the world map data with the crime data based on country names
world = world.merge(crime_data, how="left", left_on="name", right_on="Country")

# Plot the world map, coloring each country by its CrimeIndex
fig, ax = plt.subplots(1, 1, figsize=(15, 10))
world.boundary.plot(ax=ax, linewidth=1)
world.plot(column='CrimeIndex', ax=ax, legend=True,
           legend_kwds={'label': "Crime Rate by Country",
                        'orientation': "horizontal"},
           cmap='coolwarm')

ax.set_title('2024 Global Crime Rate Distribution', fontsize=16)
plt.show()
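
Because the merge is a left join on country names, any country spelled differently in the two sources ends up with a missing `CrimeIndex` and is left uncolored on the map. A minimal sketch of how such mismatches could be detected and fixed follows; the two small frames, their values, and the `name_fixes` mapping are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the map's "name" column and the crime table
world_names = pd.DataFrame({"name": ["France", "United States of America", "Japan"]})
crime = pd.DataFrame({
    "Country": ["France", "United States", "Japan"],
    "CrimeIndex": [55.0, 49.0, 23.0],  # made-up values
})

# indicator=True adds a "_merge" column that records where each row came from
merged = world_names.merge(crime, how="left", left_on="name",
                           right_on="Country", indicator=True)

# Rows marked "left_only" are map countries with no matching crime record
unmatched = merged.loc[merged["_merge"] == "left_only", "name"].tolist()
print(unmatched)

# Hypothetical mapping that aligns the crime table's spelling with the map's
name_fixes = {"United States": "United States of America"}
crime_fixed = crime.assign(Country=crime["Country"].replace(name_fixes))
merged_fixed = world_names.merge(crime_fixed, how="left",
                                 left_on="name", right_on="Country")
```

In the notebook, the same check could be run on `world` and `crime_data` before plotting to see which countries would otherwise be missing from the map.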