## 1. Load and Inspect the Data

In [None]:
import pandas as pd

# Load the dataset
suicide_df = pd.read_csv(r"C:\Users\HP\global_suicide_analysis\data\master.csv")
suicide_df.head()

In [None]:
# Shape of Data Set
print(f"Rows: {suicide_df.shape[0]}, Columns: {suicide_df.shape[1]}")

# Data info
suicide_df.info()

# Basic statistics for numeric columns
suicide_df.describe()

In [None]:
# Check for null values in each column
suicide_df.isnull().sum()

In [None]:
# Preview unique countries
print("Countries:", suicide_df['country'].nunique())
print(suicide_df['country'].unique()[:10])  # First 10 countries

# Preview years
print("Years:", suicide_df['year'].unique())

# Preview genders
print("Sex:", suicide_df['sex'].unique())

# Preview age groups
print("Age groups:", suicide_df['age'].unique())


### Data Cleaning

Handle missing values, standardize column names, and convert data types (e.g., year to datetime if needed).
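As a rough illustration of the three steps above, here is a minimal sketch on a toy frame. The frame `toy` and its values are hypothetical; only the column-name patterns are modelled on the dataset.

```python
import pandas as pd

# Hypothetical toy frame with the kinds of problems the steps above target.
toy = pd.DataFrame({
    " GDP for year ($) ": ["1,000", "2,500", "3,750"],
    "HDI for year": [0.7, None, None],
    "year": ["1987", "1988", "1989"],
})

# 1. Missing values: share of nulls per column (keyed by the original names).
null_share = toy.isnull().mean()

# 2. Standardize column names; regex=False treats each pattern literally.
toy.columns = (
    toy.columns
    .str.strip()
    .str.lower()
    .str.replace(" ", "_", regex=False)
    .str.replace("(", "", regex=False)
    .str.replace(")", "", regex=False)
)

# 3. Convert data types.
toy["year"] = toy["year"].astype("int64")
toy["gdp_for_year_$"] = (
    toy["gdp_for_year_$"].str.replace(",", "", regex=False).astype("int64")
)

print(null_share.round(2).to_dict())
print(toy.columns.tolist())  # ['gdp_for_year_$', 'hdi_for_year', 'year']
```

Whether a mostly empty column should be dropped or imputed depends on the analysis; the null share is only a starting point for that decision.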


In [None]:
# Check for incomplete or null values in each column
suicide_df.isnull().sum()

In [None]:
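# Clean column names first
# regex=False treats '(' and ')' as literal characters on every pandas version
suicide_df.columns = (
    suicide_df.columns
    .str.strip()
    .str.lower()
    .str.replace(' ', '_', regex=False)
    .str.replace('/', '_', regex=False)
    .str.replace('(', '', regex=False)
    .str.replace(')', '', regex=False)
)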

# Now check the new column names
print(suicide_df.columns.tolist())


In [None]:
# Drop two columns by their cleaned names ('country-year' keeps its hyphen)
suicide_df = suicide_df.drop(columns=['country-year', 'hdi_for_year'])

In [None]:
suicide_df

In [None]:
# Ensure year is treated as an integer
suicide_df['year'] = suicide_df['year'].astype(int)

# Optionally, convert to datetime (if using for time-series plots)
# suicide_df['year'] = pd.to_datetime(suicide_df['year'], format='%Y') 

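# Clean GDP columns (remove commas, convert to 64-bit integers).
# Yearly GDP values can exceed the 32-bit range that plain int maps to on some platforms.
suicide_df['gdp_for_year_$'] = suicide_df['gdp_for_year_$'].str.replace(',', '', regex=False).astype('int64')
suicide_df['gdp_per_capita_$'] = suicide_df['gdp_per_capita_$'].astype('int64')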

In [None]:
suicide_df

The cell below adds a new column, suicide_rate, to the DataFrame. It holds the number of suicides per 100,000 people for each row, where a row represents one combination of country, year, sex, and age group.

In [None]:
suicide_df['suicide_rate'] = (suicide_df['suicides_no'] / suicide_df['population']) * 100000
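To make the arithmetic concrete, here is a small worked example of the same formula. The frame `toy` and its values are illustrative assumptions, not rows quoted from the dataset.

```python
import pandas as pd

# Illustrative rows: each one stands for a country-year-sex-age group.
toy = pd.DataFrame({
    "suicides_no": [21, 0, 150],
    "population": [312_900, 21_800, 1_500_000],
})

# Suicides per 100,000 people: (count / population) * 100,000.
toy["suicide_rate"] = toy["suicides_no"] / toy["population"] * 100_000

print(toy.round(2))
# First row: 21 / 312,900 * 100,000 is about 6.71 per 100,000.
```

Small populations make the rate volatile: one additional case in a group of 21,800 people moves the rate by about 4.6 per 100,000.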


In [None]:
suicide_df

### Exploratory Visualizations

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style='whitegrid')

#### Global Suicide Rate Over Time (1985–2016)

In [None]:
# Group and calculate suicide rate per year
yearly_data = suicide_df.groupby('year')[['suicides_no', 'population']].sum().reset_index()
yearly_data['suicide_rate'] = (yearly_data['suicides_no'] / yearly_data['population']) * 100000

# Plot
plt.figure(figsize=(12, 6))
sns.lineplot(data=yearly_data, x='year', y='suicide_rate', marker='o')
plt.title('Global Suicide Rate Over Time (per 100,000 people)')
plt.xlabel('Year')
plt.ylabel('Suicide Rate')
plt.tight_layout()
plt.show()

The global suicide rate rose gradually from the late 1980s, peaked around the mid-1990s, and has generally declined from 1995 through 2016. These shifts could be related to global economic downturns, changes in reporting, or growing awareness. Note that "global" here means the countries present in the dataset for each year.
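The yearly figures above are pooled rates: suicides and population are summed first and divided afterwards. A minimal sketch, using hypothetical numbers, shows how this can differ from averaging the row-level `suicide_rate` column, which gives small groups the same weight as large ones.

```python
import pandas as pd

# Hypothetical year with one small and one large population group.
toy = pd.DataFrame({
    "year": [2000, 2000],
    "suicides_no": [10, 100],
    "population": [100_000, 10_000_000],
})
toy["suicide_rate"] = toy["suicides_no"] / toy["population"] * 100_000

# Pooled rate: total suicides over total population (population-weighted).
totals = toy.groupby("year")[["suicides_no", "population"]].sum()
pooled = totals["suicides_no"] / totals["population"] * 100_000

# Unweighted mean of the row-level rates.
mean_of_rates = toy.groupby("year")["suicide_rate"].mean()

print(round(pooled.loc[2000], 2))         # about 1.09
print(round(mean_of_rates.loc[2000], 2))  # 5.5
```

The pooled rate is the one that answers "how many suicides per 100,000 people overall", which is why the grouping cells in this notebook sum before dividing.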

#### 3.2 Suicide Rate by Gender

In [None]:
# Group by year and sex
gender_data = suicide_df.groupby(['year', 'sex'])[['suicides_no', 'population']].sum().reset_index()
gender_data['suicide_rate'] = (gender_data['suicides_no'] / gender_data['population']) * 100000

# Plot
plt.figure(figsize=(12, 6))
sns.lineplot(data=gender_data, x='year', y='suicide_rate', hue='sex', marker='o')
plt.title('Suicide Rate by Gender Over Time')
plt.xlabel('Year')
plt.ylabel('Suicide Rate')
plt.legend(title='Gender')
plt.tight_layout()
plt.show()


Suicide rates were consistently higher for males across all years. In many countries, male suicide rates are 3–4 times higher than female rates, possibly due to social expectations, underdiagnosed mental health issues, or reluctance to seek help.
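One way to check the size of the gap is to reshape the per-year rates so that each sex becomes a column and then take the ratio. The sketch below uses a hypothetical frame `gender_toy` with made-up rates; on the real data, the same steps could be applied to `gender_data`.

```python
import pandas as pd

# Hypothetical per-year rates in the same long layout as gender_data.
gender_toy = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001],
    "sex": ["female", "male", "female", "male"],
    "suicide_rate": [5.0, 18.0, 4.0, 16.0],
})

# One row per year, one column per sex.
wide = gender_toy.pivot(index="year", columns="sex", values="suicide_rate")

# Male-to-female rate ratio for each year.
wide["male_to_female"] = wide["male"] / wide["female"]

print(wide)
```

A ratio that stays roughly constant while both rates fall indicates that the gap is proportional rather than absolute.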

#### Average Suicide Rate by Gender

In [None]:
# Group by gender (sex)
gender_data = suicide_df.groupby('sex')[['suicides_no', 'population']].sum().reset_index()

# Calculate suicide rate per 100k
gender_data['suicide_rate'] = (gender_data['suicides_no'] / gender_data['population']) * 100000

plt.figure(figsize=(8, 5))
sns.barplot(data=gender_data, x='sex', y='suicide_rate', hue='sex', palette='Set2')

plt.title('Average Global Suicide Rate by Gender (1985–2016)')
plt.xlabel('Gender')
plt.ylabel('Suicide Rate per 100,000')
plt.tight_layout()
plt.show()

The pooled suicide rate for males is substantially higher than for females, often 2–4 times higher depending on the country. This gender gap is consistent globally and may relate to social, psychological, and cultural factors.

#### 3.3 Suicide Rate by Age Group

In [None]:
# Group by age
age_data = suicide_df.groupby('age')[['suicides_no', 'population']].sum().reset_index()
age_data['suicide_rate'] = (age_data['suicides_no'] / age_data['population']) * 100000

# Sort for better visual
age_data = age_data.sort_values(by='suicide_rate', ascending=True)

# Plot
plt.figure(figsize=(10, 5))
sns.barplot(data=age_data, x='age', y='suicide_rate', hue='age', palette='magma')
plt.title('Average Suicide Rate by Age Group')
plt.xlabel('Age Group')
plt.ylabel('Suicide Rate')
plt.tight_layout()
plt.show()


The 55–74 and 75+ age groups had the highest suicide rates on average. This aligns with global studies linking suicide in older populations to loneliness, illness, and reduced social support systems.
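The chart above orders the bars by rate. If a chronological age order is preferred instead, an ordered categorical is one option. This is a sketch that assumes the age labels look like the ones listed in `age_order`; they should be checked against `suicide_df['age'].unique()`. The rates in `age_toy` are made up.

```python
import pandas as pd

# Assumed age labels, youngest to oldest (verify against the real column).
age_order = [
    "5-14 years", "15-24 years", "25-34 years",
    "35-54 years", "55-74 years", "75+ years",
]

# Hypothetical rates for three of the groups, in arbitrary order.
age_toy = pd.DataFrame({
    "age": ["75+ years", "5-14 years", "35-54 years"],
    "suicide_rate": [24.0, 0.6, 15.0],
})

# An ordered categorical sorts by age_order rather than alphabetically.
age_toy["age"] = pd.Categorical(age_toy["age"], categories=age_order, ordered=True)
age_toy = age_toy.sort_values("age").reset_index(drop=True)

print(age_toy)
```

A plain string sort would place "15-24 years" before "5-14 years", which is why the explicit order is needed.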

#### Top 10 Countries with Highest Average Suicide Rates

In [None]:
# Group by country
country_data = suicide_df.groupby('country')[['suicides_no', 'population']].sum().reset_index()
country_data['suicide_rate'] = (country_data['suicides_no'] / country_data['population']) * 100000

# Sort top 10
top_countries = country_data.sort_values(by='suicide_rate', ascending=False).head(10)

# Plot
plt.figure(figsize=(12, 6))
sns.barplot(data=top_countries, y='country', x='suicide_rate', hue='country', palette='Reds_r')
plt.title('Top 10 Countries by Suicide Rate')
plt.xlabel('Suicide Rate per 100,000')
plt.ylabel('Country')
plt.tight_layout()
plt.show()


Countries like Lithuania, Russia, and South Korea consistently appeared at the top. Cultural, societal, and economic pressures may contribute to higher rates. Some countries may also have more complete suicide reporting systems, which can make their recorded rates appear higher than those of countries that under-report.
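Because reporting coverage differs between countries, it can help to see how many years each country contributes before reading too much into a ranking. The sketch below runs on a hypothetical frame; `min_years` is an arbitrary illustrative threshold, not a value taken from this analysis.

```python
import pandas as pd

# Hypothetical data: country A reports three years, country B only one.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B"],
    "year": [1990, 1991, 1992, 2005],
    "suicides_no": [30, 32, 31, 9],
    "population": [200_000, 201_000, 202_000, 20_000],
})

# Per-country totals plus the number of distinct years reported.
summary = toy.groupby("country").agg(
    years_reported=("year", "nunique"),
    suicides_no=("suicides_no", "sum"),
    population=("population", "sum"),
)
summary["suicide_rate"] = summary["suicides_no"] / summary["population"] * 100_000

# Keep only countries with enough years of data (threshold is an assumption).
min_years = 3
reliable = summary[summary["years_reported"] >= min_years]

print(summary.round(2))
print(reliable.index.tolist())  # ['A']
```

In this toy case country B has the highest pooled rate, but it rests on a single year and a small population, so it drops out once the threshold is applied.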

#### GDP vs Suicide Rate (Correlation)

In [None]:
# Scatter plot
plt.figure(figsize=(10, 6))
sns.scatterplot(data=suicide_df, x='gdp_per_capita_$', y='suicide_rate', hue='sex', alpha=0.6)
plt.title('GDP per Capita vs Suicide Rate')
plt.xlabel('GDP per Capita ($)')
plt.ylabel('Suicide Rate per 100k')
plt.tight_layout()
plt.show()


The scatter plot shows no strong direct correlation between GDP per capita and suicide rate. High-income countries are not immune. This suggests that mental health does not depend solely on wealth; access to healthcare, cultural stigma, and social support may be more influential.
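The visual impression can be backed up with a correlation coefficient. The sketch below computes Pearson and Spearman correlations on a hypothetical frame; on the real data, the same two lines could be applied to the `gdp_per_capita_$` and `suicide_rate` columns of `suicide_df`. Spearman is computed here as Pearson on ranks, which avoids any dependency beyond pandas.

```python
import pandas as pd

# Hypothetical values; not taken from the dataset.
toy = pd.DataFrame({
    "gdp_per_capita_$": [1000, 5000, 20000, 40000, 60000],
    "suicide_rate": [12.0, 9.0, 14.0, 10.0, 13.0],
})

# Pearson: strength of the linear relationship.
pearson = toy["gdp_per_capita_$"].corr(toy["suicide_rate"])

# Spearman: Pearson correlation of the ranks (monotonic relationship).
spearman = toy["gdp_per_capita_$"].rank().corr(toy["suicide_rate"].rank())

print(round(pearson, 3))
print(round(spearman, 3))  # 0.3
```

Values near zero would support the reading above, though row-level correlations mix countries, years, and age groups, so they describe association in the pooled data rather than within any single country.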

