# Data collection

* Demographic

https://hub.worldpop.org/project/categories?id=3
https://population.un.org/wpp/downloads?folder=Standard%20Projections&group=Most%20used
https://dhsprogram.com/data/available-datasets.cfm

* Economic

https://data.imf.org/en/Datasets#t=coveo117bcfc4&sort=%40idata_publication_date%20descending
https://www.oecd.org/en/data.html
https://unctadstat.unctad.org/EN/

* Social

https://www.who.int/data/gho
https://hdr.undp.org/data-center
https://genderdata.worldbank.org/en/home

In [43]:
import pandas as pd
import re

## Demographic and health data

In [44]:
dem = pd.read_excel("./WPP2024_GEN_F01_DEMOGRAPHIC_INDICATORS_COMPACT.xlsx", 
                    sheet_name="Estimates", skiprows=16, index_col="Index")

Extract the data for Benin

In [45]:
def get_one_value_var(database):
    """Return the columns that hold a single distinct value or only missing values."""
    one_val_var = []
    for col in database.columns:
        # value_counts() ignores NaN, so a length of 1 means one distinct non-missing value
        if len(database[col].value_counts()) == 1 or database[col].isna().all():
            one_val_var.append(col)
    return one_val_var

def extract_region_data(dataset, region: str):
    """Keep the rows for one region, then drop the columns that carry no information."""
    extracted_df = dataset[dataset['Region, subregion, country or area *'] == region]
    extracted_df = extracted_df.drop(columns=get_one_value_var(extracted_df))
    return extracted_df
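
To make the effect of these helpers concrete, here is a minimal sketch on a toy table. The column names and values are invented for illustration and do not reflect the real WPP layout, and `constant_columns` is a hypothetical compact variant that relies on `nunique`, which should flag the same columns as the loop above.

```python
import numpy as np
import pandas as pd

# Toy table: hypothetical columns and invented values, not the WPP file
toy = pd.DataFrame({
    "Region": ["Benin", "Benin", "Togo"],
    "Type": ["Country/Area", "Country/Area", "Country/Area"],  # constant
    "Notes": [np.nan, np.nan, np.nan],                          # all missing
    "Year": [2022, 2023, 2023],
})

def constant_columns(df):
    """Columns with at most one distinct non-missing value."""
    return [col for col in df.columns if df[col].nunique(dropna=True) <= 1]

# Filter one region, then drop the columns that no longer vary
benin_toy = toy[toy["Region"] == "Benin"]
dropped = constant_columns(benin_toy)
kept = benin_toy.drop(columns=dropped)

print(dropped)
print(list(kept.columns))
```

Once the rows are filtered to a single region, the region column itself becomes constant, which is why it is dropped along with the other uninformative columns.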

In [46]:
benin_data = extract_region_data(dem, "Benin")

Check that there are no missing values

In [47]:
benin_data.isna().sum().sort_values(ascending=False)

Year                                                                                              0
Total Population, as of 1 January (thousands)                                                     0
Total Population, as of 1 July (thousands)                                                        0
Male Population, as of 1 July (thousands)                                                         0
Female Population, as of 1 July (thousands)                                                       0
Population Density, as of 1 July (persons per square km)                                          0
Population Sex Ratio, as of 1 July (males per 100 females)                                        0
Median Age, as of 1 July (years)                                                                  0
Natural Change, Births minus Deaths (thousands)                                                   0
Rate of Natural Change (per 1,000 population)                                                     0
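
The listing above is checked by eye. As a hedged alternative, the same check can be written so that it only reports columns with gaps. The sketch below runs on a toy frame with invented values, and `columns_with_missing` is a hypothetical helper name, not part of the notebook.

```python
import numpy as np
import pandas as pd

def columns_with_missing(df):
    """Per-column count of missing values, keeping only non-zero counts."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)

# Toy data with invented values: one gap in "Median Age"
toy = pd.DataFrame({
    "Year": [2021, 2022, 2023],
    "Median Age": [17.0, np.nan, 17.2],
})

missing = columns_with_missing(toy)
print(missing)

# After dropping incomplete rows, nothing should be reported
complete = toy.dropna()
print(columns_with_missing(complete).empty)
```

An empty result means the table is complete, which is easier to assert in a script than scanning a long column of zeros.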


In [48]:
benin_data.sample(5)

Index,Year,"Total Population, as of 1 January (thousands)","Total Population, as of 1 July (thousands)","Male Population, as of 1 July (thousands)","Female Population, as of 1 July (thousands)","Population Density, as of 1 July (persons per square km)","Population Sex Ratio, as of 1 July (males per 100 females)","Median Age, as of 1 July (years)","Natural Change, Births minus Deaths (thousands)","Rate of Natural Change (per 1,000 population)",...,"Male Mortality before Age 60 (deaths under age 60 per 1,000 male live births)","Female Mortality before Age 60 (deaths under age 60 per 1,000 female live births)","Mortality between Age 15 and 50, both sexes (deaths under age 50 per 1,000 alive at age 15)","Male Mortality between Age 15 and 50 (deaths under age 50 per 1,000 males alive at age 15)","Female Mortality between Age 15 and 50 (deaths under age 50 per 1,000 females alive at age 15)","Mortality between Age 15 and 60, both sexes (deaths under age 60 per 1,000 alive at age 15)","Male Mortality between Age 15 and 60 (deaths under age 60 per 1,000 males alive at age 15)","Female Mortality between Age 15 and 60 (deaths under age 60 per 1,000 females alive at age 15)",Net Number of Migrants (thousands),"Net Migration Rate (per 1,000 population)"
5957,1982.0,4096.347,4156.596,2021.254,2135.343,36.862,94.657,16.082,122.747,29.531,...,559.263,462.27,231.943,269.63,196.824,349.968,403.589,300.638,-2.253,-0.542
5987,2012.0,10243.939,10397.657,5184.26,5213.397,92.211,99.441,16.874,306.174,29.447,...,388.168,358.574,174.209,180.658,167.838,276.688,290.16,263.748,1.26,0.121
5986,2011.0,9943.308,10093.623,5028.977,5064.646,89.514,99.296,16.797,296.346,29.36,...,388.437,359.726,173.519,179.634,167.494,275.77,288.789,263.306,4.283,0.424
5944,1969.0,2965.544,2998.222,1436.795,1561.428,26.589,92.018,17.771,70.158,23.399,...,626.532,553.576,271.531,300.916,245.132,400.396,443.566,360.824,-4.792,-1.598
5925,1950.0,2242.408,2250.476,1055.264,1195.212,19.958,88.291,22.824,20.734,9.213,...,711.278,650.586,321.131,346.981,296.913,460.797,499.812,424.744,-4.603,-2.045


## Demographic data by age

In [49]:
dem_age = pd.read_excel("./WPP_POP_5-YEAR_AGE_GROUPS.xlsx", sheet_name="Estimates", skiprows=16, index_col="Index")

In [50]:
benin_age = extract_region_data(dem_age, "Benin")

In [51]:
benin_age.tail()

Index,Year,0-4,5-9,10-14,15-19,20-24,25-29,30-34,35-39,40-44,...,55-59,60-64,65-69,70-74,75-79,80-84,85-89,90-94,95-99,100+
5994,2019.0,2092.3305,1807.271,1557.247,1327.1545,1151.6015,980.569,832.724,693.7485,553.619,...,294.6915,217.8055,156.9945,111.1135,68.977,33.205,12.255,3.018,0.439,0.035
5995,2020.0,2116.112,1859.698,1599.0325,1365.8045,1177.034,1013.8375,856.8085,718.8575,574.4505,...,306.682,226.844,162.393,113.1305,71.357,34.4405,12.509,3.032,0.433,0.0335
5996,2021.0,2135.8595,1910.8,1640.477,1408.263,1206.1245,1043.758,881.233,742.482,597.6365,...,317.9935,236.4485,168.0235,115.467,73.373,35.6995,12.699,3.005,0.414,0.0305
5997,2022.0,2154.968,1957.322,1683.2985,1453.222,1236.6395,1073.106,905.375,765.456,622.3625,...,328.8635,246.8635,174.3795,118.432,75.0465,37.0095,13.0405,3.0435,0.4085,0.029
5998,2023.0,2176.4425,1996.616,1730.4715,1496.5645,1269.0645,1102.765,929.7095,788.971,647.879,...,339.823,257.763,181.4615,121.8785,76.7765,38.473,13.547,3.148,0.418,0.0295


## Merging the two data tables

In [52]:
print(benin_data.shape)
print(benin_age.shape)


(74, 55)
(74, 22)


Merge on the year

In [53]:
benin_df = pd.merge(benin_age, benin_data, on="Year")
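
Both tables have 74 rows, apparently one per year, so the merge on `Year` is expected to be one-to-one. The sketch below shows one way to make that expectation explicit, using toy frames with invented numbers. `validate` and `indicator` are standard `pd.merge` parameters. The frame names `age_toy` and `ind_toy` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the two tables (values are invented)
age_toy = pd.DataFrame({"Year": [2021, 2022, 2023], "0-4": [2.1, 2.2, 2.3]})
ind_toy = pd.DataFrame({"Year": [2021, 2022, 2023], "Median Age": [17.0, 17.1, 17.2]})

# validate raises MergeError if Year is duplicated on either side;
# how="outer" with indicator=True exposes years present in only one table
merged = pd.merge(age_toy, ind_toy, on="Year", how="outer",
                  validate="one_to_one", indicator=True)

# Rows that did not find a partner in the other table
unmatched = merged[merged["_merge"] != "both"]

print(merged.shape)
print(len(unmatched))
```

With the default inner join, years missing from one table would be dropped silently. The outer join with an indicator column makes such gaps visible before they affect the analysis.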

In [54]:
# Add the row index as an explicit ID column, placed first
benin_df.insert(0, "ID", benin_df.index)

Export the data to Parquet or Excel

In [55]:
# benin_df.to_excel("Donnees_ben_pop.xlsx", index=False)

In [56]:
benin_df.to_parquet("Donnees_ben_pop.parquet", engine="pyarrow")