This Jupyter notebook analyses age differences between the sexes in Ireland, using CSO dataset FY006A. It covers:
- Weighted mean age (by sex)
- The difference between the sexes by age

In [None]:
# Importing the pandas library
import pandas as pd

In [None]:
# Load dataset FY006A from the CSO PxStat API as CSV
url = "https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.ReadDataset/FY006A/CSV/1.0/en"
df = pd.read_csv(url)

In [None]:
# Get the list of column headers
df.columns


In [None]:
# Store the column headers as a plain Python list
headers = df.columns.tolist()
headers

The calculation needs three columns:
- Sex (male/female)
- Single Year of Age
- VALUE

In [None]:
# Dropping unnecessary columns
drop_columns = [
    'STATISTIC',
    'Statistic Label',
    'TLIST(A1)',
    'CensusYear',
    'C02199V02655',
    'C02076V03371',
    'C03789V04537',
    'Administrative Counties',
    'UNIT'
]

# Removing the specified columns from the DataFrame
df.drop(columns=drop_columns, inplace=True)

# Get the list of column headers after dropping unnecessary columns
df.columns
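
An alternative to listing the columns to drop is to keep only the three that are needed, which keeps working if the source adds or renames other columns. A minimal sketch on a small made-up frame (the `toy` data and its values are invented for illustration; only the column names `Sex`, `Single Year of Age`, `UNIT` and `VALUE` come from this notebook):

```python
import pandas as pd

# Toy frame with made-up values; 'UNIT' stands in for the columns that are not needed.
toy = pd.DataFrame({
    "Sex": ["Male", "Female"],
    "Single Year of Age": ["Under 1 year", "Under 1 year"],
    "UNIT": ["Number", "Number"],
    "VALUE": [10, 12],
})

keep_columns = ["Sex", "Single Year of Age", "VALUE"]
toy_small = toy[keep_columns].copy()  # .copy() gives an independent frame to modify later
print(toy_small.columns.tolist())
```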


In [None]:
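# Filtering out rows where "Single Year of Age" is "All ages"
# .copy() makes the result an independent DataFrame, so later column assignments
# do not trigger SettingWithCopyWarning
df = df[df["Single Year of Age"] != "All ages"].copy()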

In [None]:
# Output what we have in the 'Single Year of Age' column
df["Single Year of Age"].unique()

Converting the age labels to numeric form.

In [None]:
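# "Under 1 year" contains the digit 1, so map it to 0 before stripping non-digits
df['Single Year of Age'] = df['Single Year of Age'].str.replace('Under 1 year', '0', regex=False)
# Remove every non-digit character; the raw string avoids an invalid-escape warning for \D
df['Single Year of Age'] = df['Single Year of Age'].str.replace(r'\D', '', regex=True)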
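
To see what the two replacements do, here is a small sketch on made-up labels. The label texts are assumptions written in the style of the `Single Year of Age` column, not values read from the dataset:

```python
import pandas as pd

# Made-up labels in the style of the 'Single Year of Age' column (assumed, not from the dataset).
labels = pd.Series(["Under 1 year", "1 year", "25 years", "100 years and over"])

# Step 1: handle the special label first, otherwise it would become "1".
cleaned = labels.str.replace("Under 1 year", "0", regex=False)
# Step 2: strip every non-digit character.
cleaned = cleaned.str.replace(r"\D", "", regex=True)
print(cleaned.tolist())
```

If the real column contains an open-ended top band such as the assumed "100 years and over", this approach treats it as exactly 100.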


Keeping only males and females; the "Both sexes" rows are dropped.

In [None]:
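# .copy() keeps the filtered frame independent, so the numeric conversion below can assign to it safely
df = df[df["Sex"] != "Both sexes"].copy()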
df["Sex"].unique()

Converting the age and population columns to numeric types.

In [37]:
df["Single Year of Age"] = pd.to_numeric(df["Single Year of Age"], errors="coerce")
df["VALUE"] = pd.to_numeric(df["VALUE"], errors="coerce")
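
Because `errors="coerce"` silently turns unparseable values into `NaN`, it can be worth counting how many were introduced. A minimal sketch on made-up strings (the series `raw` is hypothetical):

```python
import pandas as pd

# Made-up values; "unknown" mimics an entry that cannot be parsed as a number.
raw = pd.Series(["0", "25", "unknown", "100"])
numeric = pd.to_numeric(raw, errors="coerce")

# Number of values that could not be converted and became NaN
n_missing = int(numeric.isna().sum())
print(n_missing)
```

The same check could be applied to `df["Single Year of Age"]` and `df["VALUE"]` after the conversion above.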

Weighted mean age for males
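
The next cell computes a population-weighted mean. Writing $a_i$ for each single year of age and $p_i$ for the population count (`VALUE`) at that age, with both symbols introduced here for illustration, the quantity is:

$$\bar{a} = \frac{\sum_i a_i \, p_i}{\sum_i p_i}$$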

In [None]:
df_male = df[df["Sex"] == "Male"]
total_age = (df_male["Single Year of Age"] * df_male["VALUE"]).sum()
total_population = df_male["VALUE"].sum()
weighted_mean_male = total_age / total_population

print(weighted_mean_male)

37.7394477371039


Weighted mean age for females

In [42]:
df_female = df[df["Sex"] == "Female"]
total_age = (df_female["Single Year of Age"] * df_female["VALUE"]).sum()
total_population = df_female["VALUE"].sum()
weighted_mean_female = total_age / total_population

print(weighted_mean_female)

38.9397958987787
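
The male and female cells repeat the same arithmetic. One possible way to avoid the duplication is a small helper applied to each group. This is a sketch on made-up counts, and `weighted_mean_age` is a hypothetical name, not part of the original notebook:

```python
import pandas as pd

# Made-up counts for illustration only; not CSO figures.
toy = pd.DataFrame({
    "Sex": ["Male", "Male", "Female", "Female"],
    "Single Year of Age": [10, 30, 10, 30],
    "VALUE": [1, 3, 1, 1],
})

def weighted_mean_age(group):
    # sum(age * population) / sum(population), the same formula as in the cells above
    return (group["Single Year of Age"] * group["VALUE"]).sum() / group["VALUE"].sum()

# One weighted mean per sex
means = {sex: weighted_mean_age(group) for sex, group in toy.groupby("Sex")}
print(means["Male"], means["Female"])
```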


Difference between the weighted mean ages (female minus male)

In [44]:
age_difference = weighted_mean_female - weighted_mean_male
print(age_difference)


1.2003481616747962
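
The introduction also lists the difference between the sexes at each age, which the cells above do not compute. One way it might be done is to pivot the data so that each sex becomes a column and then subtract. The sketch below uses made-up counts; the `Difference` column name is an assumption for illustration:

```python
import pandas as pd

# Made-up counts for illustration only; not CSO figures.
toy = pd.DataFrame({
    "Sex": ["Male", "Female", "Male", "Female"],
    "Single Year of Age": [0, 0, 1, 1],
    "VALUE": [105, 100, 98, 101],
})

# One row per age, one column per sex
by_age = toy.pivot_table(index="Single Year of Age", columns="Sex",
                         values="VALUE", aggfunc="sum")
# Positive values mean more females than males at that age
by_age["Difference"] = by_age["Female"] - by_age["Male"]
print(by_age)
```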
