Author: Niamh Hogan

# **Suicide Mortality in Ireland: Demographic Trends and EU Comparison (2012–2019)**

## **Project Overview**

In [41]:
# Imports

import pandas as pd
import numpy as np

## **Data Loading**  

In this section, I load the datasets required for the analysis. This includes Irish mortality data by age, sex, and county, as well as EU mortality and population data. I perform initial checks to inspect the structure and contents of each dataset, ensuring that the data has loaded correctly and is ready for cleaning and analysis.

**Irish Deaths by Year, Age, and Sex**

In this dataset, I load Irish mortality data published by the Central Statistics Office (CSO). The dataset (VSA35) contains annual death counts in Ireland broken down by year, sex, cause of death, and age group at death. The unit of measurement is "Number", so each value is a count of deaths. I downloaded it as a CSV file from the [Central Statistics Office data portal](https://data.cso.ie/#); it allows me to analyse mortality patterns across different age groups and sexes over time.  

I load the CSV file *irishdata_year_age_sex_cso.csv* into a pandas DataFrame called *irish_age_sex_df* ([Pandas](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html)).
I then view the first five rows (the default for `.head()`) to inspect the data and ensure it has loaded correctly:

In [42]:
irish_age_sex_df = pd.read_csv(
    "./data/irishdata_year_age_sex_cso.csv"
)

irish_age_sex_df.head()

Unnamed: 0,Statistic Label,Year,Sex,Cause of Death,Age Group at Death,UNIT,VALUE
0,Revised Deaths Occurring,2007,Both sexes,X60-X84 Intentional self-harm,Under 1 year,Number,
1,Revised Deaths Occurring,2007,Both sexes,X60-X84 Intentional self-harm,1 - 4 years,Number,
2,Revised Deaths Occurring,2007,Both sexes,X60-X84 Intentional self-harm,5 - 9 years,Number,
3,Revised Deaths Occurring,2007,Both sexes,X60-X84 Intentional self-harm,10 - 14 years,Number,8.0
4,Revised Deaths Occurring,2007,Both sexes,X60-X84 Intentional self-harm,15 - 19 years,Number,27.0
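The blank VALUE cells in the output above are worth noting: pandas reads an empty CSV field as NaN, and because NaN is a float, the whole VALUE column is parsed as float64. Below is a minimal sketch of this behaviour. It reads two rows copied from the output above from an in-memory string via `io.StringIO`, which stands in for the real file.

```python
import io

import pandas as pd

# Two rows copied from the output above; the first has an empty VALUE field
sample_csv = """Statistic Label,Year,Sex,Cause of Death,Age Group at Death,UNIT,VALUE
Revised Deaths Occurring,2007,Both sexes,X60-X84 Intentional self-harm,Under 1 year,Number,
Revised Deaths Occurring,2007,Both sexes,X60-X84 Intentional self-harm,10 - 14 years,Number,8
"""

sample_df = pd.read_csv(io.StringIO(sample_csv))

# head() returns at most five rows by default; head(3) returns at most three
print(sample_df.head())

# The empty VALUE field is read as NaN, which makes the column float64
print(sample_df["VALUE"].isna().tolist())  # [True, False]
print(sample_df["VALUE"].dtype)            # float64
```

This explains why VALUE appears as `8.0` rather than `8` until it is converted later in the notebook.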


**Irish Deaths by County and Sex**

In this dataset, I load Irish mortality data published by the Central Statistics Office (CSO) for years 2015-2022, containing annual death counts broken down by year, sex, county, and cause of death. The values represent overall counts of deaths, allowing for comparison of mortality patterns across different counties and between the sexes. This dataset (VSA112) was downloaded as a CSV file from the [Central Statistics Office data portal](https://data.cso.ie/#) and supports geographic analysis of mortality trends within Ireland.  

I load the CSV file *irishdata_year_counties_sex_cso.csv* into a pandas DataFrame called *irish_counties_df* ([Pandas](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html)).
I then view the first three rows to inspect the data and ensure it has loaded correctly:

In [43]:
irish_counties_df = pd.read_csv(
    "./data/irishdata_year_counties_sex_cso.csv"
)

irish_counties_df.head(3)

Unnamed: 0,Statistic Label,Year,Sex,County,Cause of Death,UNIT,VALUE
0,Deaths Occuring,2015,Both sexes,Ireland,Intentional self-harm (X60-X84),Number,500.0
1,Deaths Occuring,2015,Both sexes,Carlow County Council,Intentional self-harm (X60-X84),Number,7.0
2,Deaths Occuring,2015,Both sexes,Dublin City Council,Intentional self-harm (X60-X84),Number,54.0


**EU Deaths by Country and Sex**  

In this dataset, I load European mortality data from the World Health Organization (WHO), containing annual death counts (1969 - 2022) by country and sex. The values represent overall counts of deaths, allowing me to compare mortality patterns across different European countries and between the sexes. This dataset was downloaded as a CSV file from the [WHO data portal](https://gateway.euro.who.int/en/indicators/hfamdb_761-deaths-suicide-and-intentional-self-harm/#id=31291) and allows me to analyse trends in suicide mortality across Europe.  

I load the WHO EU deaths CSV file into a pandas DataFrame called *eu_deaths_df* ([Pandas](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html)). I skip the first 30 rows because they contain metadata and notes rather than data. I also set *low_memory=False*, so pandas infers each column's data type from the whole file in one pass instead of chunk by chunk, which prevents mixed-type warnings ([Pandas](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html)). I then view the first three rows to check the structure and confirm the data loaded correctly for analysis:

In [44]:
eu_deaths_df = pd.read_csv(
    "./data/who_eu_deaths.csv",
    skiprows=30,
    low_memory=False,
)

eu_deaths_df.head(3)

Unnamed: 0,COUNTRY,COUNTRY_GRP,AGE_GRP_LIST,SEX,SUBNATIONAL_MDB,YEAR,VALUE
0,ALB,,TOTAL,FEMALE,,1987.0,25.0
1,ALB,,TOTAL,FEMALE,,1988.0,22.0
2,ALB,,TOTAL,FEMALE,,1989.0,15.0
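The effect of `skiprows` can be seen on a small file. Below is a minimal sketch that uses a hypothetical in-memory file with two metadata lines above the header; the real WHO file needs `skiprows=30`. The data rows are copied from the output above. With the default `low_memory=True`, the C parser works through large files in chunks, so a column's inferred type can differ between chunks. Passing `low_memory=False`, or giving explicit types through `dtype=`, avoids this.

```python
import io

import pandas as pd

# Hypothetical file: two metadata lines sit above the real header row
raw = """Source: example metadata line
Note: another line of notes
COUNTRY,SEX,YEAR,VALUE
ALB,FEMALE,1987,25
ALB,FEMALE,1988,22
"""

# skiprows=2 discards the metadata lines, so the third line becomes the header
df = pd.read_csv(io.StringIO(raw), skiprows=2, low_memory=False)

print(df.columns.tolist())  # ['COUNTRY', 'SEX', 'YEAR', 'VALUE']
print(df["VALUE"].tolist())  # [25, 22]
```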


**EU Population**  

In this dataset, I load European population data for 2012–2022, containing total population counts by country, year, age group, and sex. The values represent overall population counts. I include this dataset to convert EU death counts into crude mortality rates (deaths per 100,000 population), which allows me to compare mortality across countries of different population sizes. Using rates rather than raw counts means that countries with larger populations do not appear to have disproportionately higher mortality ([Health Knowledge](https://www.healthknowledge.org.uk/e-learning/epidemiology/specialists/standardisation)). This dataset was downloaded as a CSV file from [Eurostat](https://ec.europa.eu/eurostat/databrowser/view/demo_pjan/default/table) and supports fair cross-country comparisons of mortality trends.  

Below, I load the CSV file *eu_pop_2012_2022.csv* into a pandas DataFrame called *eu_pop_df* ([Pandas](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html)).
I then view the first three rows to inspect the data and ensure it has loaded correctly:

In [45]:
eu_pop_df = pd.read_csv(
    "./data/eu_pop_2012_2022.csv"
)

eu_pop_df.head(3)

Unnamed: 0,Time,geo,Value,age,sex,unit
0,2012,AT,8408121,TOTAL,T,NR
1,2012,BE,11075889,TOTAL,T,NR
2,2012,BG,7327224,TOTAL,T,NR


## **Data Cleansing**

**Cleaning irish_age_sex_df**

In [46]:
# drop unnecessary columns for irish_age_sex_df
drop_col_list1 = ["Statistic Label", "Cause of Death", "UNIT"]

irish_age_sex_df.drop(columns=drop_col_list1, inplace=True)

# sanity check
print(irish_age_sex_df.head(3))

   Year         Sex Age Group at Death  VALUE
0  2007  Both sexes       Under 1 year    NaN
1  2007  Both sexes        1 - 4 years    NaN
2  2007  Both sexes        5 - 9 years    NaN


In [47]:
# Inspect column types and non-null counts for irish_age_sex_df
print(irish_age_sex_df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 960 entries, 0 to 959
Data columns (total 4 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Year                960 non-null    int64  
 1   Sex                 960 non-null    object 
 2   Age Group at Death  960 non-null    object 
 3   VALUE               782 non-null    float64
dtypes: float64(1), int64(1), object(2)
memory usage: 30.1+ KB
None


In [48]:
# Print Age Group at Death values
print(irish_age_sex_df["Age Group at Death"].unique())

['Under 1 year' '1 - 4 years' '5 - 9 years' '10 - 14 years'
 '15 - 19 years' '20 - 24 years' '25 - 29 years' '30 - 34 years'
 '35 - 39 years' '40 - 44 years' '45 - 49 years' '50 - 54 years'
 '55 - 59 years' '60 - 64 years' '65 - 69 years' '70 - 74 years'
 '75 - 79 years' '80 - 84 years' '85 years and over' 'All ages']


In [49]:
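# Convert each Age Group at Death label to an int: the lower bound of the age band
def age_to_int(age_str):
    # "All ages" is a total row, not an age band; mark it as missing for now
    if age_str == 'All ages':
        return np.nan
    if age_str == 'Under 1 year':
        return 0
    # e.g. "85 years and over" -> 85
    if 'and over' in age_str:
        return int(age_str.split()[0])
    # e.g. "15 - 19 years" -> 15
    return int(age_str.split(' - ')[0])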

# Apply to the column
irish_age_sex_df["Age Group at Death"] = irish_age_sex_df["Age Group at Death"].apply(age_to_int).astype('Int64')


In [50]:
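# Convert the "All ages" rows (currently <NA>) to 42, roughly the midpoint of 0-85
# Note: these rows hold totals across all ages, so 42 is a placeholder code, not a real age band
all_ages_midpoint = 42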

# Fill <NA> values with the midpoint
irish_age_sex_df["Age Group at Death"] = irish_age_sex_df["Age Group at Death"].fillna(all_ages_midpoint)

# Check result
print(irish_age_sex_df["Age Group at Death"].unique())

<IntegerArray>
[0, 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 42]
Length: 20, dtype: Int64


In [51]:
print(irish_age_sex_df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 960 entries, 0 to 959
Data columns (total 4 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Year                960 non-null    int64  
 1   Sex                 960 non-null    object 
 2   Age Group at Death  960 non-null    Int64  
 3   VALUE               782 non-null    float64
dtypes: Int64(1), float64(1), int64(1), object(1)
memory usage: 31.1+ KB
None


I check for missing values in the dataset using pandas. This includes `NaN` values, empty strings, and strings containing only whitespace. I combine `.isna()` to detect `NaN`, equality checks for empty strings, and `.str.strip()` to identify whitespace-only entries ([pandas documentation](https://pandas.pydata.org/docs/)).

In [52]:
# Check for NaN, empty strings, or strings with only whitespace
# (the masks are OR-ed so each problem cell is counted once)
irish_age_sex_df.apply(
    lambda col: (
        col.isna()
        | (col == "")
        | col.astype(str).str.strip().eq("")
    ).sum()
)

Year                    0
Sex                     0
Age Group at Death      0
VALUE                 178
dtype: int64
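Adding the three checks separately counts an empty string twice, because it both equals `""` and strips to `""`. Below is a minimal sketch on a made-up column. It compares that approach with combining the masks using `|` first, which counts each problem cell exactly once.

```python
import pandas as pd

# Made-up column: one valid value, one missing value, one empty string, one whitespace-only string
col = pd.Series(["a", None, "", "   "])

# Summing the three checks separately counts the empty string twice
separate = (
    col.isna().sum()
    + (col == "").sum()
    + col.astype(str).str.strip().eq("").sum()
)

# OR-ing the boolean masks first counts each problem cell once
combined = (
    col.isna() | (col == "") | col.astype(str).str.strip().eq("")
).sum()

print(separate, combined)  # 4 3
```

On this dataset the two approaches happen to agree, because it contains no empty strings.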

The VALUE column contains 178 missing entries, which I remove because missing death counts cannot be included in statistical analysis. Because VALUE is stored as float64 at this point, its missing entries are already standard NaN. Replacing pandas' nullable missing value `<NA>` ([Pandas](https://pandas.pydata.org/docs/user_guide/missing_data.html)) with NaN is therefore only a defensive step, since `.dropna()` recognises both. Dropping rows with missing VALUE leaves only valid death counts for analysis.

In [53]:
# Convert pandas <NA> to np.nan
irish_age_sex_df["VALUE"] = irish_age_sex_df["VALUE"].replace({pd.NA: np.nan})

# Now drop rows with missing VALUE
irish_age_sex_df = irish_age_sex_df.dropna(subset=["VALUE"])

In [54]:
# Confirm
irish_age_sex_df.isna().sum()

Year                  0
Sex                   0
Age Group at Death    0
VALUE                 0
dtype: int64
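A missing entry may be stored as `np.nan` (in float64 columns) or as `pd.NA` (in nullable dtypes such as Int64). Either way, `.isna()` and `.dropna()` treat it as missing. Below is a small sketch with made-up values.

```python
import numpy as np
import pandas as pd

# Made-up frame: a float64 column holding np.nan and a nullable Int64 column holding pd.NA
df = pd.DataFrame({
    "float_value": [1.0, np.nan, 3.0],
    "int_value": pd.array([1, 2, pd.NA], dtype="Int64"),
})

# isna() flags both kinds of missing marker
print(df.isna().sum().tolist())  # [1, 1]

# dropna() removes rows holding either marker, with no conversion step needed
print(len(df.dropna()))  # 1
```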

In [55]:
# Convert VALUE to int 
irish_age_sex_df["VALUE"] = irish_age_sex_df["VALUE"].astype("Int64")

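I create a pivot table to summarize death counts by year, age group, and sex. Combinations of year, age group, and sex with no remaining row (because their VALUE was missing and dropped) appear as `<NA>` in the pivot. I replace these with 0, treating them as no recorded deaths.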

In [56]:
# Convert to pivot table for analysis
irish_deaths_by_age_sex_pivot = pd.pivot_table(
    irish_age_sex_df,
    values="VALUE",
    index="Year",
    columns=["Age Group at Death", "Sex"],
    aggfunc="sum",
)

# Replace <NA> / missing values with 0
irish_deaths_by_age_sex_pivot = irish_deaths_by_age_sex_pivot.fillna(0)

print(irish_deaths_by_age_sex_pivot.head(3))

# Save cleaned pivot table
irish_deaths_by_age_sex_pivot.to_csv(
    "./data/irish_deaths_by_age_sex_pivot.csv"
)

Age Group at Death         10                     15                     20  \
Sex                Both sexes Female Male Both sexes Female Male Both sexes   
Year                                                                          
2007                        8      1    7         27      7   20         69   
2008                        4      1    3         47     18   29         51   
2009                        5      3    2         38      7   31         48   

Age Group at Death                     25  ...   70         75              \
Sex                Female Male Both sexes  ... Male Both sexes Female Male   
Year                                       ...                               
2007                    9   60         69  ...   15          7      2    5   
2008                    9   42         72  ...    9          7      3    4   
2009                    7   41         70  ...    7          8      1    7   

Age Group at Death         80                     85    
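Below is a small sketch on made-up data showing how `fillna(0)` affects combinations that have no underlying row. `pivot_table` also accepts `fill_value=0`, which does the same thing in one step.

```python
import pandas as pd

# Made-up long-format counts; there is no 2008/Female row (as if it had been dropped as missing)
long_df = pd.DataFrame({
    "Year": [2007, 2007, 2008],
    "Sex": ["Female", "Male", "Male"],
    "VALUE": [1, 7, 3],
})

# The 2008/Female cell is NaN because no row exists for that combination
wide_raw = pd.pivot_table(long_df, values="VALUE", index="Year", columns="Sex", aggfunc="sum")
print(wide_raw)

# Treat the absent combination as zero deaths (an interpretive choice, as in the notebook)
wide = wide_raw.fillna(0)

# Equivalent one-step version using pivot_table's fill_value parameter
wide_alt = pd.pivot_table(
    long_df, values="VALUE", index="Year", columns="Sex", aggfunc="sum", fill_value=0
)
print(wide.loc[2008, "Female"], wide_alt.loc[2008, "Female"])
```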

**Cleaning irish_counties_df**

In [57]:
# drop unnecessary columns for irish_counties_df
drop_col_list2 = ["Statistic Label", "Cause of Death", "UNIT"]

irish_counties_df.drop(columns=drop_col_list2, inplace=True)

# sanity check
print(irish_counties_df.head(3))

   Year         Sex                 County  VALUE
0  2015  Both sexes                Ireland  500.0
1  2015  Both sexes  Carlow County Council    7.0
2  2015  Both sexes    Dublin City Council   54.0


In [58]:
# Inspect column types and non-null counts for irish_counties_df
print(irish_counties_df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Year    768 non-null    int64  
 1   Sex     768 non-null    object 
 2   County  768 non-null    object 
 3   VALUE   745 non-null    float64
dtypes: float64(1), int64(1), object(2)
memory usage: 24.1+ KB
None


In [59]:
# Convert VALUE to the nullable integer type Int64 (missing entries become <NA>)
irish_counties_df["VALUE"] = irish_counties_df["VALUE"].astype("Int64")

print(irish_counties_df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Year    768 non-null    int64 
 1   Sex     768 non-null    object
 2   County  768 non-null    object
 3   VALUE   745 non-null    Int64 
dtypes: Int64(1), int64(1), object(2)
memory usage: 24.9+ KB
None


As with *irish_age_sex_df*, I check this dataset for `NaN` values, empty strings, and whitespace-only strings, using `.isna()`, equality checks, and `.str.strip()` ([pandas documentation](https://pandas.pydata.org/docs/)).

In [60]:
# Check for NaN, empty strings, or strings with only whitespace
# (the masks are OR-ed so each problem cell is counted once)
irish_counties_df.apply(
    lambda col: (
        col.isna()
        | (col == "")
        | col.astype(str).str.strip().eq("")
    ).sum()
)

Year       0
Sex        0
County     0
VALUE     23
dtype: int64

The VALUE column has 23 missing entries. I drop these rows because missing death counts cannot be included in statistical analysis.

In [61]:
# Drop rows with missing VALUE
irish_counties_df = irish_counties_df.dropna(subset=["VALUE"])

# Re-run the missing-value check (each problem cell counted once)
irish_counties_df.apply(
    lambda col: (
        col.isna()
        | (col == "")
        | col.astype(str).str.strip().eq("")
    ).sum()
)

Year      0
Sex       0
County    0
VALUE     0
dtype: int64

In [62]:
# Convert to pivot table for analysis
irish_county_deaths_pivot = pd.pivot_table(
    irish_counties_df,
    values="VALUE",
    index="Year",
    columns=["County", "Sex"],
    aggfunc="sum",
)

print(irish_county_deaths_pivot.head(3))

# Save cleaned pivot table
irish_county_deaths_pivot.to_csv(
    "./data/irish_county_deaths_pivot.csv"
)


County Carlow County Council             Cavan County Council              \
Sex               Both sexes Female Male           Both sexes Female Male   
Year                                                                        
2015                       7      3    4                    7      1    6   
2016                       9      1    8                   16      5   11   
2017                       8      3    5                   12      2   10   

County Clare County Council             Cork City Council  ...  \
Sex              Both sexes Female Male        Both sexes  ...   
Year                                                       ...   
2015                     17      3   14                14  ...   
2016                     15      3   12                24  ...   
2017                     20      2   18                13  ...   

County Waterford City & County Council Westmeath County Council              \
Sex                               Male               Both sex

**Cleaning eu_deaths_df**

In [63]:
# drop unnecessary columns for eu_deaths_df
drop_col_list3 = ["COUNTRY_GRP", "AGE_GRP_LIST", "SUBNATIONAL_MDB"]

eu_deaths_df.drop(columns=drop_col_list3, inplace=True)

print(eu_deaths_df.head(3))

  COUNTRY     SEX    YEAR  VALUE
0     ALB  FEMALE  1987.0   25.0
1     ALB  FEMALE  1988.0   22.0
2     ALB  FEMALE  1989.0   15.0


In [64]:
# Keep only rows for both sexes
eu_deaths_df = eu_deaths_df[eu_deaths_df["SEX"] == "ALL"]

print(eu_deaths_df.head(3))

     COUNTRY  SEX    YEAR  VALUE
4011     ALB  ALL  1987.0   73.0
4012     ALB  ALL  1988.0   63.0
4013     ALB  ALL  1989.0   68.0


In [65]:
# EU-27 member states (ISO 3166-1 alpha-3 codes)
eu_members = [
    "AUT", "BEL", "BGR", "HRV", "CYP", "CZE", "DNK", "EST", "FIN", "FRA",
    "DEU", "GRC", "HUN", "IRL", "ITA", "LVA", "LTU", "LUX", "MLT", "NLD",
    "POL", "PRT", "ROU", "SVK", "SVN", "ESP", "SWE"
]

# Drop non-EU member states 
eu_deaths_df = eu_deaths_df[eu_deaths_df["COUNTRY"].isin(eu_members)]

# Print countries alphabetically
countries = sorted(eu_deaths_df["COUNTRY"].unique())
print(countries)

['AUT', 'BEL', 'BGR', 'CYP', 'CZE', 'DEU', 'DNK', 'ESP', 'EST', 'FIN', 'FRA', 'GRC', 'HRV', 'HUN', 'IRL', 'ITA', 'LTU', 'LUX', 'LVA', 'MLT', 'NLD', 'POL', 'PRT', 'ROU', 'SVK', 'SVN', 'SWE']


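I filter the dataset to include only the years 2012–2016 because earlier and later years contain too much missing data for reliable analysis. Limiting the range means the mortality rates I calculate rest on largely complete and comparable data across countries. Coverage is still not perfect: 27 countries over 5 years would give 135 rows, but the `info()` output below shows 129, so 6 country-years are missing.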

In [66]:
# Keep only rows where YEAR is between 2012 and 2016
eu_deaths_df = eu_deaths_df[
    (eu_deaths_df["YEAR"] >= 2012) & (eu_deaths_df["YEAR"] <= 2016)
]

eu_deaths_df["YEAR"].unique()

array([2012., 2013., 2014., 2015., 2016.])
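One way to check which years are complete enough is to count how many countries report a value in each year. Below is a minimal sketch on made-up data; the real check would run on the EU-filtered deaths table before the year filter is applied.

```python
import pandas as pd

# Made-up deaths table in the same long format as eu_deaths_df
deaths = pd.DataFrame({
    "COUNTRY": ["AUT", "BEL", "AUT", "BEL", "AUT"],
    "YEAR": [2011, 2011, 2012, 2012, 2013],
    "VALUE": [10, 20, 11, 21, 12],
})

n_countries = deaths["COUNTRY"].nunique()

# Count how many distinct countries report a non-missing value in each year
coverage = deaths.dropna(subset=["VALUE"]).groupby("YEAR")["COUNTRY"].nunique()

# Keep only the years in which every country is present
complete_years = coverage[coverage == n_countries].index.tolist()

print(coverage.to_dict())  # {2011: 2, 2012: 2, 2013: 1}
print(complete_years)      # [2011, 2012]
```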

In [67]:
# Inspect column types and non-null counts for eu_deaths_df
print(eu_deaths_df.info())

<class 'pandas.core.frame.DataFrame'>
Index: 129 entries, 4121 to 5875
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   COUNTRY  129 non-null    object 
 1   SEX      129 non-null    object 
 2   YEAR     129 non-null    float64
 3   VALUE    129 non-null    float64
dtypes: float64(2), object(2)
memory usage: 5.0+ KB
None


In [68]:
# Convert VALUE to int
eu_deaths_df["VALUE"] = eu_deaths_df["VALUE"].astype("Int64")

# Convert YEAR to int
eu_deaths_df["YEAR"] = eu_deaths_df["YEAR"].astype("Int64")

print(eu_deaths_df.info())

<class 'pandas.core.frame.DataFrame'>
Index: 129 entries, 4121 to 5875
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   COUNTRY  129 non-null    object
 1   SEX      129 non-null    object
 2   YEAR     129 non-null    Int64 
 3   VALUE    129 non-null    Int64 
dtypes: Int64(2), object(2)
memory usage: 5.3+ KB
None


In [69]:
# Check for NaN, empty strings, or strings with only whitespace
# (the masks are OR-ed so each problem cell is counted once)
eu_deaths_df.apply(
    lambda col: (
        col.isna()
        | (col == "")
        | col.astype(str).str.strip().eq("")
    ).sum()
)

COUNTRY    0
SEX        0
YEAR       0
VALUE      0
dtype: int64

**Cleaning eu_pop_df**

In [70]:
# drop unnecessary columns for eu_pop_df
drop_col_list4 = ["age", "sex", "unit"]

eu_pop_df.drop(columns=drop_col_list4, inplace=True)

# sanity check
print(eu_pop_df.head(3)) 

   Time geo     Value
0  2012  AT   8408121
1  2012  BE  11075889
2  2012  BG   7327224


In [71]:
# Keep only rows where Time (the year) is between 2012 and 2016, matching the deaths data
eu_pop_df = eu_pop_df[(eu_pop_df["Time"] >= 2012) & (eu_pop_df["Time"] <= 2016)]

eu_pop_df["Time"].unique()

array([2012, 2013, 2014, 2015, 2016])

In [72]:
print(eu_pop_df.info())

<class 'pandas.core.frame.DataFrame'>
Index: 135 entries, 0 to 134
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Time    135 non-null    int64 
 1   geo     135 non-null    object
 2   Value   135 non-null    int64 
dtypes: int64(2), object(1)
memory usage: 4.2+ KB
None


**Standardizing EU Mortality Data for Cross-Country Analysis**  

I merge the EU deaths dataset with the population data so that the mortality counts can be standardized. 
For each country and year, I calculate the mortality rate per 100,000 population by dividing the number of deaths by the population and multiplying by 100,000. 
This removes the effect of differences in population size and allows meaningful cross-country comparison. Because this is a crude rate rather than an age-standardised one, it does not adjust for differences in age structure between countries.
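The calculation described above can be checked by hand. Below is a minimal sketch of a helper function, `rate_per_100k`, which is illustrative only and not part of the notebook. The death counts and populations are the Austria and Belgium 2012 figures that appear in the merged table later in this notebook.

```python
def rate_per_100k(deaths: int, population: int) -> float:
    """Crude mortality rate: deaths per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * 100_000


# Austria and Belgium, 2012 (figures from the merged EU table in this notebook)
at_rate = rate_per_100k(1275, 8_408_121)
be_rate = rate_per_100k(2023, 11_075_889)

print(round(at_rate, 2), round(be_rate, 2))  # 15.16 18.26
```

Belgium records more deaths than Austria and also has the higher rate, but the rate is the figure that can be compared directly, because it no longer depends on population size.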

In [73]:
# Rename columns in population data for clarity
eu_pop_df.rename(
    columns={"geo": "COUNTRY", "Time": "YEAR", "Value": "POPULATION"},
    inplace=True,
)

In [74]:
# print all country codes for both datasets
print(sorted(eu_deaths_df["COUNTRY"].unique()))
print(sorted(eu_pop_df["COUNTRY"].unique()))

['AUT', 'BEL', 'BGR', 'CYP', 'CZE', 'DEU', 'DNK', 'ESP', 'EST', 'FIN', 'FRA', 'GRC', 'HRV', 'HUN', 'IRL', 'ITA', 'LTU', 'LUX', 'LVA', 'MLT', 'NLD', 'POL', 'PRT', 'ROU', 'SVK', 'SVN', 'SWE']
['AT', 'BE', 'BG', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL', 'ES', 'FI', 'FR', 'HR', 'HU', 'IE', 'IT', 'LT', 'LU', 'LV', 'MT', 'NL', 'PL', 'PT', 'RO', 'SE', 'SI', 'SK']


In [75]:
# Map from 3-letter codes (eu_deaths_df) to 2-letter codes (eu_pop_df)
country_code_map = {
    "AUT": "AT", "BEL": "BE", "BGR": "BG", "CYP": "CY", "CZE": "CZ",
    "DEU": "DE", "DNK": "DK", "ESP": "ES", "EST": "EE", "FIN": "FI",
    "FRA": "FR", "GRC": "EL", "HRV": "HR", "HUN": "HU", "IRL": "IE",
    "ITA": "IT", "LTU": "LT", "LUX": "LU", "LVA": "LV", "MLT": "MT",
    "NLD": "NL", "POL": "PL", "PRT": "PT", "ROU": "RO", "SVK": "SK",
    "SVN": "SI", "SWE": "SE"
}
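`Series.map()` with a dictionary returns NaN for any code the dictionary does not contain. A gap in the code map would therefore silently drop that country from the merge. Below is a small sketch of a helper, `check_code_map`, which is hypothetical and not part of the original notebook, for checking coverage before the conversion. The demo uses a few codes from this notebook plus a deliberately unmapped "GBR".

```python
def check_code_map(source_codes, target_codes, code_map):
    """Return (source codes with no mapping, mapped codes absent from the target codes)."""
    unmapped = sorted(set(source_codes) - set(code_map))
    mapped = {code_map[c] for c in source_codes if c in code_map}
    not_in_target = sorted(mapped - set(target_codes))
    return unmapped, not_in_target


# Demo: Eurostat uses "EL" for Greece; "GBR" is included to show an unmapped code
demo_map = {"AUT": "AT", "GRC": "EL", "IRL": "IE"}
demo_source = ["AUT", "GRC", "IRL", "GBR"]
demo_target = ["AT", "EL", "IE"]

print(check_code_map(demo_source, demo_target, demo_map))  # (['GBR'], [])
```

On the notebook's data, this would be called as `check_code_map(eu_deaths_df["COUNTRY"].unique(), eu_pop_df["COUNTRY"].unique(), country_code_map)` before `COUNTRY` is overwritten with the two-letter codes. Two empty lists would mean every code maps and every mapped code has population data.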

In [76]:
# Convert COUNTRY column in eu_deaths_df to 2-letter codes
eu_deaths_df["COUNTRY"] = eu_deaths_df["COUNTRY"].map(country_code_map)

In [77]:
# Merge EU deaths with population data on COUNTRY and YEAR
eu_merged_df = eu_deaths_df.merge(
    eu_pop_df[["COUNTRY", "YEAR", "POPULATION"]],
    on=["COUNTRY", "YEAR"],
    how="left",
)

# Calculate mortality rate per 100,000 population
eu_merged_df["MORTALITY_RATE"] = (
    eu_merged_df["VALUE"] / eu_merged_df["POPULATION"]
) * 100000

# Optional: check the first few rows
eu_merged_df.head(10) 

Unnamed: 0,COUNTRY,SEX,YEAR,VALUE,POPULATION,MORTALITY_RATE
0,AT,ALL,2012,1275,8408121,15.163911
1,AT,ALL,2013,1291,8451860,15.274744
2,AT,ALL,2014,1313,8507786,15.432922
3,AT,ALL,2015,1249,8584926,14.548757
4,AT,ALL,2016,1198,8700471,13.769369
5,BE,ALL,2012,2023,11075889,18.2649
6,BE,ALL,2013,1895,11137974,17.013866
7,BE,ALL,2014,1898,11180840,16.975469
8,BE,ALL,2015,1866,11237274,16.605451
9,BE,ALL,2016,1905,11311117,16.841838
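Because the merge is a left join, any country-year without a population row keeps its deaths but gets a NaN POPULATION, and therefore a NaN MORTALITY_RATE. pandas' `merge` can flag such rows with `indicator=True`, and `validate="many_to_one"` raises an error if the population table has duplicate keys. Below is a sketch on made-up inputs, in which Belgium's population row has been deliberately omitted.

```python
import pandas as pd

# Made-up inputs: Belgium's population row is deliberately missing
deaths = pd.DataFrame({"COUNTRY": ["AT", "BE"], "YEAR": [2012, 2012], "VALUE": [1275, 2023]})
population = pd.DataFrame({"COUNTRY": ["AT"], "YEAR": [2012], "POPULATION": [8408121]})

# indicator=True adds a "_merge" column; validate="many_to_one" raises a MergeError
# if the population table has more than one row for a COUNTRY/YEAR key
merged = deaths.merge(
    population,
    on=["COUNTRY", "YEAR"],
    how="left",
    indicator=True,
    validate="many_to_one",
)

# "left_only" rows found no population match, so their rate would be NaN
unmatched = merged.loc[merged["_merge"] == "left_only", ["COUNTRY", "YEAR"]]
print(unmatched.to_dict("records"))  # [{'COUNTRY': 'BE', 'YEAR': 2012}]
```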


In [78]:
# Convert merged EU dataset to pivot table
eu_mortality_pivot = pd.pivot_table(
    eu_merged_df,
    values="MORTALITY_RATE",
    index="YEAR",
    columns="COUNTRY",
)

# Print the pivot: only a cell's final expression is displayed automatically in Jupyter
print(eu_mortality_pivot.head(10))

# Save pivot table to CSV
eu_mortality_pivot.to_csv("./data/eu_mortality_rate_2012_2016.csv")

# END