# Population Analysis (Ireland)

*Author : Maroua EL imame*

This notebook, `assignment05-population.ipynb`, is split into 3 parts:
- 1st part :
- 2nd part :
- 3rd part :

The purpose is to analyse the differences between the sexes by age in Ireland. Part I computes:

- the weighted mean age (by sex)
- the difference between the sexes by age



## Part I : 
___

### 1.1 Retrieving DATA 


In [1]:
# import python libraries/packages 
import pandas as pd

In [2]:
# read in the data from the CSO url below
url = "https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.ReadDataset/FY006A/CSV/1.0/en"
df = pd.read_csv(url)
df.tail(7)

| | STATISTIC | Statistic Label | TLIST(A1) | CensusYear | C02199V02655 | Sex | C02076V03371 | Single Year of Age | C03789V04537 | Administrative Counties | UNIT | VALUE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9785 | FY006AC01 | Population | 2022 | 2022 | 2 | Female | 650 | 100 years and over | 2ae19629-148e-13a3-e055-000000000001 | Leitrim County Council | Number | 4 |
| 9786 | FY006AC01 | Population | 2022 | 2022 | 2 | Female | 650 | 100 years and over | 2ae19629-1493-13a3-e055-000000000001 | Mayo County Council | Number | 25 |
| 9787 | FY006AC01 | Population | 2022 | 2022 | 2 | Female | 650 | 100 years and over | 2ae19629-1497-13a3-e055-000000000001 | Roscommon County Council | Number | 7 |
| 9788 | FY006AC01 | Population | 2022 | 2022 | 2 | Female | 650 | 100 years and over | 2ae19629-1498-13a3-e055-000000000001 | Sligo County Council | Number | 9 |
| 9789 | FY006AC01 | Population | 2022 | 2022 | 2 | Female | 650 | 100 years and over | 2ae19629-149d-13a3-e055-000000000001 | Cavan County Council | Number | 12 |
| 9790 | FY006AC01 | Population | 2022 | 2022 | 2 | Female | 650 | 100 years and over | 2ae19629-14a4-13a3-e055-000000000001 | Donegal County Council | Number | 31 |
| 9791 | FY006AC01 | Population | 2022 | 2022 | 2 | Female | 650 | 100 years and over | 2ae19629-1495-13a3-e055-000000000001 | Monaghan County Council | Number | 7 |


In [3]:
# show column names as a list
headers = df.columns.tolist()
headers

['STATISTIC',
 'Statistic Label',
 'TLIST(A1)',
 'CensusYear',
 'C02199V02655',
 'Sex',
 'C02076V03371',
 'Single Year of Age',
 'C03789V04537',
 'Administrative Counties',
 'UNIT',
 'VALUE']

### 1.2 Cleaning DATA

In [4]:
# list the columns that are not relevant to this analysis in drop_col_list, then drop them from the dataframe
drop_col_list = ['STATISTIC', 'Statistic Label','TLIST(A1)','CensusYear','C02199V02655','C02076V03371','C03789V04537','UNIT']
df.drop(columns=drop_col_list, inplace=True)

# show the new dataframe with relevant columns
df.head()


| | Sex | Single Year of Age | Administrative Counties | VALUE |
|---|---|---|---|---|
| 0 | Both sexes | All ages | Ireland | 5149139 |
| 1 | Both sexes | All ages | Carlow County Council | 61968 |
| 2 | Both sexes | All ages | Dublin City Council | 592713 |
| 3 | Both sexes | All ages | Dún Laoghaire Rathdown County Council | 233860 |
| 4 | Both sexes | All ages | Fingal County Council | 330506 |


In [5]:
# the 'Sex' column includes 'Both sexes' and the 'Single Year of Age' column includes 'All ages'; these aggregate rows are removed because they are not needed for this analysis
df = df[df["Single Year of Age"] != "All ages"]
df = df[df["Sex"] != "Both sexes"]

# show the new dataframe with relevant data
df.head(7)

| | Sex | Single Year of Age | Administrative Counties | VALUE |
|---|---|---|---|---|
| 3296 | Male | Under 1 year | Ireland | 29610 |
| 3297 | Male | Under 1 year | Carlow County Council | 346 |
| 3298 | Male | Under 1 year | Dublin City Council | 3188 |
| 3299 | Male | Under 1 year | Dún Laoghaire Rathdown County Council | 1269 |
| 3300 | Male | Under 1 year | Fingal County Council | 2059 |
| 3301 | Male | Under 1 year | South Dublin County Council | 1855 |
| 3302 | Male | Under 1 year | Kildare County Council | 1550 |
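
The two filters in cell In [5] can also be written as one boolean mask. The sketch below is only an illustration on a small hand-made frame (the rows and the names `sample`, `mask` and `filtered` are assumptions, not CSO data). It also calls `.copy()` so that later column assignments act on an independent frame.

```python
import pandas as pd

# Tiny illustrative frame; the values are made up, not taken from the CSO dataset.
sample = pd.DataFrame({
    "Sex": ["Both sexes", "Male", "Female", "Male"],
    "Single Year of Age": ["All ages", "All ages", "Under 1 year", "1 year"],
    "VALUE": [100, 48, 2, 3],
})

# Keep only rows that refer to a single sex and a single year of age.
mask = (sample["Sex"] != "Both sexes") & (sample["Single Year of Age"] != "All ages")
filtered = sample[mask].copy()
print(filtered)
```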


In [6]:
# replace 'Under 1 year' by '0'
df['Single Year of Age'] = df['Single Year of Age'].str.replace('Under 1 year', '0')
# strip every non-digit character, e.g. '100 years and over' becomes '100' (raw string avoids an invalid-escape warning)
df['Single Year of Age'] = df['Single Year of Age'].str.replace(r'\D', '', regex=True)


In [7]:
# show the updated version of df 
df.head(7)

| | Sex | Single Year of Age | Administrative Counties | VALUE |
|---|---|---|---|---|
| 3296 | Male | 0 | Ireland | 29610 |
| 3297 | Male | 0 | Carlow County Council | 346 |
| 3298 | Male | 0 | Dublin City Council | 3188 |
| 3299 | Male | 0 | Dún Laoghaire Rathdown County Council | 1269 |
| 3300 | Male | 0 | Fingal County Council | 2059 |
| 3301 | Male | 0 | South Dublin County Council | 1855 |
| 3302 | Male | 0 | Kildare County Council | 1550 |
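
The order of the two replacements in cell In [6] matters: if the non-digit characters were stripped first, 'Under 1 year' would become '1' instead of '0'. A minimal sketch on a few illustrative labels (only '100 years and over' and 'Under 1 year' appear in the outputs above; the other labels are assumed to follow the same pattern):

```python
import pandas as pd

# Illustrative labels in the style of the 'Single Year of Age' column.
ages = pd.Series(["Under 1 year", "1 year", "25 years", "100 years and over"])

# Replace the special label first; otherwise stripping non-digits would turn it into "1".
cleaned = ages.str.replace("Under 1 year", "0", regex=False)
# Raw string so that \D (any non-digit) reaches the regex engine unchanged.
cleaned = cleaned.str.replace(r"\D", "", regex=True).astype("int64")

print(cleaned.tolist())  # [0, 1, 25, 100]
```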


In [8]:
# show df info and check the Dtype of each column, to confirm every column is in the right format for the analysis
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 6464 entries, 3296 to 9791
Data columns (total 4 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   Sex                      6464 non-null   object
 1   Single Year of Age       6464 non-null   object
 2   Administrative Counties  6464 non-null   object
 3   VALUE                    6464 non-null   int64 
dtypes: int64(1), object(3)
memory usage: 252.5+ KB


In [9]:
# the Dtype of 'Single Year of Age' shows as object; since age is a number, it needs to be converted to an integer type
df['Single Year of Age'] = df['Single Year of Age'].astype('int64')
# show df updated info
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 6464 entries, 3296 to 9791
Data columns (total 4 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   Sex                      6464 non-null   object
 1   Single Year of Age       6464 non-null   int64 
 2   Administrative Counties  6464 non-null   object
 3   VALUE                    6464 non-null   int64 
dtypes: int64(2), object(2)
memory usage: 252.5+ KB
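
`astype('int64')` raises an error if any label could not be reduced to digits. As an optional check, and only as a sketch on made-up values, `pd.to_numeric` with `errors="coerce"` turns unparseable entries into NaN so they can be counted before converting:

```python
import pandas as pd

# "n/a" is a made-up bad value used to show the behaviour; it is not in the CSO data.
raw = pd.Series(["0", "1", "100", "n/a"])

# Unparseable entries become NaN instead of raising an error.
numeric = pd.to_numeric(raw, errors="coerce")

print(numeric.isna().sum())  # 1
```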


In [10]:
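# convert the df into a pivot table. This part does not need the regions, so they are excluded from the pivot table.
# note: pivot_table defaults to aggfunc='mean', so each cell is the mean of VALUE over all area rows
# (the 'Ireland' total and the individual councils), not a head count.
df_anal = pd.pivot_table(df, values='VALUE', index='Single Year of Age', columns='Sex')
# show pivot table head
df_anal.head(7)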

| Single Year of Age | Female | Male |
|---|---|---|
| 0 | 1761.625 | 1850.625 |
| 1 | 1721.5625 | 1804.6875 |
| 2 | 1810.875 | 1889.75 |
| 3 | 1842.6875 | 1937.5625 |
| 4 | 1863.6875 | 1980.375 |
| 5 | 1958.875 | 2042.75 |
| 6 | 2038.875 | 2130.75 |
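
The pivot values are fractional because `pd.pivot_table` averages by default (`aggfunc='mean'`), and the data contains an 'Ireland' row alongside the council rows for every age and sex. The sketch below, on made-up numbers with hypothetical areas 'Area A' and 'Area B', contrasts that default with one possible way to get head counts: keep only the national row and sum.

```python
import pandas as pd

# Made-up numbers: one national row plus two areas that add up to it.
sample = pd.DataFrame({
    "Sex": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "Single Year of Age": [0, 0, 0, 0, 0, 0],
    "Administrative Counties": ["Ireland", "Area A", "Area B"] * 2,
    "VALUE": [30, 10, 20, 28, 12, 16],
})

# Default aggregation: mean over all rows sharing an age and a sex.
mean_pivot = pd.pivot_table(sample, values="VALUE",
                            index="Single Year of Age", columns="Sex")

# Head counts: keep the national row only, then sum.
national = sample[sample["Administrative Counties"] == "Ireland"]
count_pivot = pd.pivot_table(national, values="VALUE",
                             index="Single Year of Age", columns="Sex",
                             aggfunc="sum")

print(mean_pivot)   # Male: (30 + 10 + 20) / 3 = 20.0
print(count_pivot)  # Male: 30
```

The weighted mean age is a ratio, so it comes out the same under either aggregation; only totals and differences depend on the scale.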


### 1.3 Analyzing data  

Weighted mean age = **Σ** (age × population_at_age) ÷ **Σ** (population_at_age)
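
As a quick check of the formula, here is a minimal sketch on made-up numbers (the arrays `ages` and `population` are illustrative, not CSO figures). It computes the weighted mean by hand and with `np.average`, and shows that dividing every weight by the same constant leaves the result unchanged.

```python
import numpy as np

# Made-up example: ages and the number of people at each age.
ages = np.array([0, 1, 2, 3])
population = np.array([10, 20, 30, 40])

# Direct translation of the formula: sum(age * count) / sum(count).
manual = (ages * population).sum() / population.sum()

# np.average with weights computes the same quantity.
builtin = np.average(ages, weights=population)

# Rescaling all weights by a constant does not change a weighted mean.
rescaled = np.average(ages, weights=population / 16)

print(manual, builtin)  # 2.0 2.0
```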

In [11]:
# store the pivot table's column names in a variable
headers = list(df_anal.columns)
headers

['Female', 'Male']

Calculate the weighted mean age for Females 

In [12]:
# sum of the 'Female' column of the pivot table (denominator of the weighted mean)
females_count = df_anal['Female'].sum()
females_count

162786.875

In [13]:
# multiply each age by the female value at that age, then sum (numerator of the weighted mean)
cumages_female = df_anal['Female'].mul(df_anal.index, axis=0).sum()
cumages_female

6338887.6875

In [14]:
# weighted mean age of the female population
weighted_mean_female = cumages_female/females_count
weighted_mean_female

38.9397958987787

Calculate the weighted mean age for Males 

In [15]:
# sum of the 'Male' column of the pivot table (denominator of the weighted mean)
males_count = df_anal['Male'].sum()
males_count

159034.3125

In [16]:
# multiply each age by the male count at that age, then sum
cumages_male = df_anal['Male'].mul(df_anal.index, axis=0).sum()
cumages_male

6001867.125

In [17]:
# weighted mean age of the male population
weighted_mean_male = cumages_male/males_count
weighted_mean_male


37.7394477371039

Overall difference between total females and total males

In [18]:
# calculate the difference between the summed 'Female' and 'Male' columns
difference_by_age = females_count - males_count
# show the difference. Since the result is positive, females outnumber males overall.
# The pivot holds means across area rows rather than head counts, so 3752.5625 is on that averaged scale, not a number of people.
difference_by_age

3752.5625

### Difference between Females and Males by age 

In [19]:
# calculate the difference between Females and Males at each age and store it in a new column
df_anal['difference_females_males'] = df_anal['Female'] - df_anal['Male']  

In [20]:
print("Weighted mean of female population in Ireland is  : ", weighted_mean_female)
print("Weighted mean of male population in Ireland is    : ", weighted_mean_male)
print(df_anal['difference_females_males'].head())
print(df_anal['difference_females_males'].tail())

Weighted mean of female population in Ireland is  :  38.9397958987787
Weighted mean of male population in Ireland is    :  37.7394477371039
Single Year of Age
0    -89.0000
1    -83.1250
2    -78.8750
3    -94.8750
4   -116.6875
Name: difference_females_males, dtype: float64
Single Year of Age
96     39.3125
97     32.1875
98     22.6250
99     14.4375
100    26.8750
Name: difference_females_males, dtype: float64


The difference between 'Females' and 'Males' by age, explained:  
A negative value indicates that there are fewer females than males at that age, as for ages 0 to 4 above. A positive value indicates that there are more females than males, as for ages 96 to 100.  
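
To see which ages fall on each side, the sign of the difference can be used as a filter. This is a minimal sketch on made-up counts (the frame `counts` and its values are assumptions, not CSO figures):

```python
import pandas as pd

# Made-up counts indexed by age; not CSO figures.
counts = pd.DataFrame({"Female": [95, 100, 120], "Male": [100, 100, 90]},
                      index=[0, 50, 90])
counts.index.name = "Single Year of Age"

diff = counts["Female"] - counts["Male"]

# Ages at which females outnumber males (positive difference).
more_females = diff[diff > 0].index.tolist()
# Ages at which males outnumber females (negative difference).
more_males = diff[diff < 0].index.tolist()

print(more_females, more_males)  # [90] [0]
```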