[List of Belgian provinces by life expectancy](https://en.wikipedia.org/wiki/List_of_Belgian_provinces_by_life_expectancy) 
<i>([Copy saved in my private space](https://en.wikipedia.org/wiki/User:Lady3mlnm/List_of_Belgian_provinces_by_life_expectancy))</i>/ 
[Продолжительность жизни в провинциях Бельгии](https://ru.wikipedia.org/wiki/Продолжительность_жизни_в_провинциях_Бельгии)<br>
Data source: "[For a healthy Belgium](https://www.healthybelgium.be/en/health-status/life-expectancy-and-quality-of-life/life-expectancy)", [Statbel](https://statbel.fgov.be/en/themes/population/mortality-life-expectancy-and-causes-death/life-expectancy-and-life-tables)<br>
[MapChart](https://www.mapchart.net/belgium.html)

In [3]:
import pandas as pd
import math
import re
from collections import namedtuple
import json

import sys
sys.path.append("..")
import mal_moduls_private.mal_total as mal

In [4]:
WRITE_TABLES_TO_FILES = False

In [5]:
# Load data for provinces for 2020-2022 (this does not include data for the capital or the country as a whole).
# Some provinces are named differently in the "total" CSV file than in the other files, so this file requires special handling.
df_total = pd.read_csv('data/provinces_total_2020-2022.csv', index_col='Province', decimal=',')
df_male = pd.read_csv('data/provinces_men_2020-2022.csv', index_col='Category')
df_female = pd.read_csv('data/provinces_women_2020-2022.csv', index_col='Category')

df_total.index = [st.strip() for st in df_total.index.to_list()]
df_total.rename(index={'Antwerp': 'Antwerpen',
                       'Walloon Brabant': 'Brabant Wallon',
                       'East Flanders': 'Oost-Vlaanderen',
                       'Flemish Brabant': 'Vlaams-Brabant',
                       'West Flanders': 'West-Vlaanderen'}, inplace=True)

# Combine data for provinces
df_provinces_2020_22 = pd.concat([df_total, df_male, df_female], axis='columns')
df_provinces_2020_22.columns = ['total', 'male', 'female']

df_provinces_2020_22.rename(index={'Liège': 'Liege'}, inplace=True)

df_provinces_2020_22

Unnamed: 0,total,male,female
Antwerpen,82.34,80.4382,84.2331
Limburg,82.46,80.5259,84.4099
Oost-Vlaanderen,82.19,79.9905,84.3563
Vlaams-Brabant,82.93,80.8062,84.986
West-Vlaanderen,82.29,80.0362,84.5656
Brabant Wallon,82.48,80.2764,84.5243
Hainaut,78.7,75.8689,81.4813
Liege,79.49,77.1272,81.8337
Luxembourg,80.07,77.5624,82.6682
Namur,79.78,77.165,82.3869


In [6]:
def load_data_for_provinces_for_period(period):
    '''Load data for provinces for a period (e.g. 2021-2023).
       This data does not include data for the capital or the country as a whole.'''
    df_total = pd.read_csv(f'data/provinces_total_{period}.csv', index_col='Category')
    df_male = pd.read_csv(f'data/provinces_men_{period}.csv', index_col='Category')
    df_female = pd.read_csv(f'data/provinces_women_{period}.csv', index_col='Category')

    # Combine data for provinces
    df_provinces = pd.concat([df_total, df_male, df_female], axis='columns')
    df_provinces.columns = ['total', 'male', 'female']
    df_provinces.index.name = ''
    return df_provinces


# load data for provinces for the 2021-2023 period
df_provinces_2021_23 = load_data_for_provinces_for_period('2021-2023')
df_provinces_2021_23

Unnamed: 0,total,male,female
Vlaams-Brabant,83.3717,81.2945,85.3701
Brabant Wallon,82.9191,80.7391,84.9308
Limburg,82.8262,80.9159,84.7411
Antwerpen,82.7958,80.9536,84.6048
West-Vlaanderen,82.6911,80.477,84.9143
Oost-Vlaanderen,82.5006,80.339,84.6237
Luxembourg,80.3164,77.778,82.9134
Namur,80.2993,77.8727,82.6635
Liege,80.2394,77.9885,82.4239


In [7]:
# analogously, load data for provinces for the 2022-2024 period
df_provinces_2022_24 = load_data_for_provinces_for_period('2022-2024')
df_provinces_2022_24

Unnamed: 0,total,male,female
Limburg,83.8098,82.0387,85.5445
Vlaams-Brabant,83.6502,81.7151,85.4963
Oost-Vlaanderen,83.1077,81.0913,85.0601
Antwerpen,83.0889,81.3317,84.8097
West-Vlaanderen,82.9808,80.8823,85.0744
Brabant Wallon,82.9651,80.8268,84.9271
Luxembourg,80.6341,78.2577,83.0497
Namur,80.6159,78.3374,82.8198
Liege,80.4712,78.3652,82.4967


In [8]:
# Load data for regions and the country as a whole
df_regions_total = pd.read_csv('data/regions_total_2024.csv', index_col='Category')
df_regions_male = pd.read_csv('data/regions_men_2024.csv', index_col='Category')
df_regions_female = pd.read_csv('data/regions_women_2024.csv', index_col='Category')

df_regions_total.tail()

Category,Belgium,Brussels,Flanders,Wallonia
2020,80.7881,79.61,81.9976,78.943
2021,81.65,81.2756,82.6816,79.8717
2022,81.6875,81.5818,82.6182,80.0154
2023,82.2775,82.1767,83.1667,80.6443
2024,82.3547,82.1498,84.0539,80.6988


<br>
<br>
<br>

Something is wrong with the data. Exploration: compare the published value for Belgium with a population-weighted mean of the three regions.

In [10]:
dfe = df_regions_total.copy()
dfe.index.name = ''

In [11]:
pop_wallonia = 3.692  # population of Wallonia in 2024 (millions people)
pop_brussels = 1.250  # population of Brussels
pop_flanders = 6.822  # population of Flanders
pop_sum = pop_wallonia + pop_brussels + pop_flanders
pop_sum

11.764

In [12]:
dfe['Belgium_my_estimate'] = (dfe['Wallonia']*pop_wallonia + dfe['Brussels']*pop_brussels + dfe['Flanders']*pop_flanders) / pop_sum
dfe['Δ'] = dfe['Belgium_my_estimate'] - dfe['Belgium']
dfe.round(2)

Unnamed: 0,Belgium,Brussels,Flanders,Wallonia,Belgium_my_estimate,Δ
2000,77.79,77.94,78.49,76.5,77.81,0.02
2001,78.09,78.03,78.89,76.67,78.1,0.01
2002,78.17,78.06,78.95,76.81,78.18,0.01
2003,78.32,78.11,79.15,76.9,78.33,0.01
2004,78.97,78.87,79.77,77.54,78.97,0.0
2005,79.05,79.04,79.88,77.57,79.07,0.02
2006,79.38,79.12,80.23,77.91,79.38,0.0
2007,79.53,79.58,80.38,77.99,79.54,0.01
2008,79.59,79.43,80.47,78.04,79.6,0.01

WARNING: Something is wrong with the data for 2024.<br>
I decided to withdraw the update of the life expectancy tables from Wikipedia.
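
The check behind this warning can be reproduced without pandas. The snippet below is a minimal sketch, assuming the regional populations from cell 11 and the annual values shown in the `df_regions_total.tail()` output; the names `POPULATION`, `LIFE_EXPECTANCY` and `weighted_estimate` are introduced here for illustration only. A population-weighted mean of regional life expectancies only approximates the national figure, so small deviations are expected.

```python
# Populations in millions (2024), as used in cell 11 of this notebook.
POPULATION = {"Wallonia": 3.692, "Brussels": 1.250, "Flanders": 6.822}

# Annual life expectancy, copied from the df_regions_total.tail() output.
LIFE_EXPECTANCY = {
    2020: {"Belgium": 80.7881, "Brussels": 79.61, "Flanders": 81.9976, "Wallonia": 78.943},
    2021: {"Belgium": 81.65, "Brussels": 81.2756, "Flanders": 82.6816, "Wallonia": 79.8717},
    2022: {"Belgium": 81.6875, "Brussels": 81.5818, "Flanders": 82.6182, "Wallonia": 80.0154},
    2023: {"Belgium": 82.2775, "Brussels": 82.1767, "Flanders": 83.1667, "Wallonia": 80.6443},
    2024: {"Belgium": 82.3547, "Brussels": 82.1498, "Flanders": 84.0539, "Wallonia": 80.6988},
}


def weighted_estimate(row):
    """Population-weighted mean of the three regional values in `row`."""
    total_population = sum(POPULATION.values())
    weighted_sum = sum(row[region] * pop for region, pop in POPULATION.items())
    return weighted_sum / total_population


# Difference between the estimate and the published national value, per year.
deltas = {year: weighted_estimate(row) - row["Belgium"]
          for year, row in LIFE_EXPECTANCY.items()}

for year, delta in deltas.items():
    print(f"{year}: estimate - published = {delta:+.2f}")
```

With these inputs the deviation stays within a few hundredths of a year for 2020 to 2023 and jumps to roughly 0.44 for 2024, which is the inconsistency flagged above.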

<br>
<br>

Combine data for regions and provinces

In [15]:
def concat_means_for_total_male_female_df(df_total, df_male, df_female, *, period_start, period_end):
    '''Compute the column means of the given DataFrames over the period from period_start to period_end (inclusive).
       Then combine the results into one DataFrame with the columns total, male, female.'''
    df = pd.concat([df_total.loc[period_start:period_end, :].mean(),
                    df_male.loc[period_start:period_end, :].mean(),
                    df_female.loc[period_start:period_end, :].mean()], axis='columns')
    df.columns = ['total', 'male', 'female']
    return df

In [16]:
df_regions_2022_24 = concat_means_for_total_male_female_df(df_regions_total, df_regions_male, df_regions_female,
                                                           period_start=2022, period_end=2024)

df_2022_24 = pd.concat([df_regions_2022_24, df_provinces_2022_24], axis='index')

# change order of records
df_2022_24.sort_values(by=['total', 'male', 'female'], ascending=False, inplace=True)
df_2022_24 = pd.concat([df_2022_24.loc[['Belgium']], df_2022_24.drop(['Belgium', 'Flanders', 'Wallonia']), df_2022_24.loc[['Flanders', 'Wallonia']]])

df_2022_24['fΔm'] = df_2022_24['female'] - df_2022_24['male']
df_2022_24

Unnamed: 0,total,male,female,fΔm
Belgium,82.106567,79.9954,84.158567,4.163167
Limburg,83.8098,82.0387,85.5445,3.5058
Vlaams-Brabant,83.6502,81.7151,85.4963,3.7812
Oost-Vlaanderen,83.1077,81.0913,85.0601,3.9688
Antwerpen,83.0889,81.3317,84.8097,3.478
West-Vlaanderen,82.9808,80.8823,85.0744,4.1921
Brabant Wallon,82.9651,80.8268,84.9271,4.1003
Brussels,81.969433,79.514133,84.214833,4.7007
Luxembourg,80.6341,78.2577,83.0497,4.792
Namur,80.6159,78.3374,82.8198,4.4824


In [17]:
# This list will be used for sorting of the next dataFrames.
ls_indexes_in_correct_order = df_2022_24.index.to_list()

In [18]:
df_regions_2021_23 = concat_means_for_total_male_female_df(df_regions_total, df_regions_male, df_regions_female,
                                                           period_start=2021, period_end=2023)

df_2021_23 = pd.concat([df_regions_2021_23, df_provinces_2021_23], axis='index')

# change order of records
df_2021_23 = df_2021_23.loc[ls_indexes_in_correct_order]

df_2021_23['fΔm'] = df_2021_23['female'] - df_2021_23['male']
df_2021_23

Unnamed: 0,total,male,female,fΔm
Belgium,81.871667,79.656133,84.036433,4.3803
Limburg,82.8262,80.9159,84.7411,3.8252
Vlaams-Brabant,83.3717,81.2945,85.3701,4.0756
Oost-Vlaanderen,82.5006,80.339,84.6237,4.2847
Antwerpen,82.7958,80.9536,84.6048,3.6512
West-Vlaanderen,82.6911,80.477,84.9143,4.4373
Brabant Wallon,82.9191,80.7391,84.9308,4.1917
Brussels,81.678033,79.127933,84.0129,4.884967
Luxembourg,80.3164,77.778,82.9134,5.1354
Namur,80.2993,77.8727,82.6635,4.7908


In [19]:
# analogously, process another period   (duplicated code - not the best practice)
df_regions_2020_22 = concat_means_for_total_male_female_df(df_regions_total, df_regions_male, df_regions_female,
                                                           period_start=2020, period_end=2022)

df_2020_22 = pd.concat([df_regions_2020_22, df_provinces_2020_22], axis='index')

# change order of records
df_2020_22 = df_2020_22.loc[ls_indexes_in_correct_order]

df_2020_22['fΔm'] = df_2020_22['female'] - df_2020_22['male']
df_2020_22

Unnamed: 0,total,male,female,fΔm
Belgium,81.3752,79.102667,83.6181,4.515433
Limburg,82.46,80.5259,84.4099,3.884
Vlaams-Brabant,82.93,80.8062,84.986,4.1798
Oost-Vlaanderen,82.19,79.9905,84.3563,4.3658
Antwerpen,82.34,80.4382,84.2331,3.7949
West-Vlaanderen,82.29,80.0362,84.5656,4.5294
Brabant Wallon,82.48,80.2764,84.5243,4.2479
Brussels,80.822467,78.194833,83.273033,5.0782
Luxembourg,80.07,77.5624,82.6682,5.1058
Namur,79.78,77.165,82.3869,5.2219


<br />
<br />

In [21]:
# just for general interest, explore results
mal.min_and_max_values(df_2020_22, row_center=['Brussels', 'Belgium'], max_lng=15)

Number of records: 14


Unnamed: 0,total,male,female,fΔm
max,82.93 -Vlaams-Brabant,80.81 -Vlaams-Brabant,84.99 -Vlaams-Brabant,5.61 -Hainaut
max_2,82.48 -Brabant Wallon,80.53 -Limburg,84.57 -West-Vlaanderen,5.22 -Namur
max_3,82.46 -Limburg,80.44 -Antwerpen,84.52 -Brabant Wallon,5.11 -Wallonia
Brussels,– 80.82 –,– 78.19 –,– 83.27 –,– 5.08 –
Belgium,– 81.38 –,– 79.1 –,– 83.62 –,– 4.52 –
min_3,79.61 -Wallonia,77.13 -Liege,82.14 -Wallonia,4.15 -Flanders
min_2,79.49 -Liege,77.04 -Wallonia,81.83 -Liege,3.88 -Limburg
min,78.7 -Hainaut,75.87 -Hainaut,81.48 -Hainaut,3.79 -Antwerpen


In [22]:
mal.min_and_max_values(df_2021_23, row_center=['Brussels', 'Belgium'], max_lng=15)

Number of records: 14


Unnamed: 0,total,male,female,fΔm
max,83.37 -Vlaams-Brabant,81.29 -Vlaams-Brabant,85.37 -Vlaams-Brabant,5.52 -Hainaut
max_2,82.92 -Brabant Wallon,80.95 -Antwerpen,84.93 -Brabant Wallon,5.14 -Luxembourg
max_3,82.83 -Limburg,80.92 -Limburg,84.91 -West-Vlaanderen,4.93 -Wallonia
Brussels,– 81.68 –,– 79.13 –,– 84.01 –,– 4.88 –
Belgium,– 81.87 –,– 79.66 –,– 84.04 –,– 4.38 –
min_3,80.24 -Liege,77.78 -Luxembourg,82.61 -Wallonia,4.05 -Flanders
min_2,80.18 -Wallonia,77.68 -Wallonia,82.42 -Liege,3.83 -Limburg
min,79.28 -Hainaut,76.48 -Hainaut,82.0 -Hainaut,3.65 -Antwerpen


In [23]:
mal.min_and_max_values(df_2022_24, row_center=['Brussels', 'Belgium'], max_lng=15)

Number of records: 14


Unnamed: 0,total,male,female,fΔm
max,83.81 -Limburg,82.04 -Limburg,85.54 -Limburg,5.39 -Hainaut
max_2,83.65 -Vlaams-Brabant,81.72 -Vlaams-Brabant,85.5 -Vlaams-Brabant,4.79 -Luxembourg
max_3,83.28 -Flanders,81.37 -Flanders,85.15 -Flanders,4.71 -Wallonia
Brussels,– 81.97 –,– 79.51 –,– 84.21 –,– 4.7 –
Belgium,– 82.11 –,– 80.0 –,– 84.16 –,– 4.16 –
min_3,80.47 -Liege,78.26 -Luxembourg,82.77 -Wallonia,3.78 -Vlaams-Brabant
min_2,80.45 -Wallonia,78.06 -Wallonia,82.5 -Liege,3.51 -Limburg
min,79.63 -Hainaut,76.9 -Hainaut,82.29 -Hainaut,3.48 -Antwerpen


<br>
<br>

In [25]:
# print('dd_replacement = {')
# for region in sorted(df_2021_23.index.to_list()):
#     print(f'    "{region}"   : {{"en": ("", ""), "ru": ("", "")}},')
# print('}')

In [26]:
dd_replacement_provinces = {
    'Belgium'   : {'en': ('Belgium on average', 'Belgium'), 'ru': ('Бельгия в целом', 'Бельгия')},
    'Flanders'   : {'en': ('Flanders on average', 'Flanders'), 'ru': ('Фла́ндрия в среднем', 'Фландрия (историческая область)')},
    'Wallonia'   : {'en': ('Wallonia on average', 'Wallonia'), 'ru': ('Валлония в среднем', 'Валлония')},
    'Antwerpen'   : {'en': ('Antwerp', 'Antwerp Province'), 'ru': ('Антве́рпен', 'Антверпен (провинция)')},
    'Brabant Wallon'   : {'en': ('Walloon Brabant', 'Walloon Brabant'), 'ru': ('Валлонский Браба́нт', 'Валлонский Брабант')},
    'Brussels'   : {'en': ('Brussels', 'Brussels'), 'ru': ('Брюссе́ль', 'Брюссельский столичный регион')},
    'Hainaut'   : {'en': ('Hainaut', 'Hainaut Province'), 'ru': ('Эно́', 'Эно')},
    'Limburg'   : {'en': ('Limburg', 'Limburg (Belgium)'), 'ru': ('Ли́мбург', 'Лимбург (провинция Бельгии)')},
    'Liege'   : {'en': ('Liège', 'Liège Province'), 'ru': ('Льеж', 'Льеж (провинция)')},
    'Luxembourg'   : {'en': ('Belgian Luxembourg', 'Luxembourg (Belgium)'), 'ru': ('Люксембу́рг', 'Люксембург (провинция)')},
    'Namur'   : {'en': ('Namur', 'Namur Province'), 'ru': ('Намю́р', 'Намюр (провинция)')},
    'Oost-Vlaanderen'   : {'en': ('East Flanders', 'East Flanders'), 'ru': ('Восточная Фла́ндрия', 'Восточная Фландрия')},
    'Vlaams-Brabant'   : {'en': ('Flemish Brabant', 'Flemish Brabant'), 'ru': ('Фламандский Браба́нт', 'Фламандский Брабант')},
    'West-Vlaanderen'   : {'en': ('West Flanders', 'West Flanders'), 'ru': ('Западная Фла́ндрия', 'Западная Фландрия')}
}

In [27]:
# create code for placing info in Wikipedia
def create_table_provinces(df1, df2, df3, dd_replacement=dd_replacement_provinces, file_header='', lang='en'):

    def if_value(x, prec=2):
        # format x with `prec` decimals; NaN becomes a dash, negative values get a typographic minus (U+2212)
        return '—' if math.isnan(x) else \
               f"{x:0.{prec}f}"  if x>=0 else \
               f"−{-x:0.{prec}f}"
    
    def chval(x, prec=2, *, add_par=''):  # change_value
        return f'style="color:silver;{add_par}"| —' if math.isnan(x) else \
               f'style="color:darkgreen;{add_par}"| {x:0.{prec}f}' if x>0 else \
               f'style="color:crimson;{add_par}"| −{-x:0.{prec}f}' if x<0 else \
               f'style="color:darkgray;{add_par}"| {x:0.{prec}f}'
    
    def chval_bold(x, prec=2, *, add_par=''):  # change_value
        return f'style="color:silver;{add_par}"| \'\'\'—\'\'\'' if math.isnan(x) else \
               f'style="color:darkgreen;{add_par}"| \'\'\'{x:0.{prec}f}\'\'\'' if x>0 else \
               f'style="color:crimson;{add_par}"| \'\'\'−{-x:0.{prec}f}\'\'\'' if x<0 else \
               f'style="color:darkgray;{add_par}"| \'\'\'{x:0.{prec}f}\'\'\''

    with open('design/' + file_header, mode='r', encoding="utf-8") as fh:
        table_header = fh.read()

    st = ''
    for i in range(len(df1)):
        ser1 = df1.iloc[i]  # order of provinces in the dataFrames the same
        ser2 = df2.iloc[i]
        ser3 = df3.iloc[i]
        assert ser1.name == ser2.name == ser3.name, f"names of regions don't coincide: {ser1.name} vs {ser2.name} vs {ser3.name}"
        
        padding_size = '3.5ex' if lang=='ru' else '3.2ex'
        
        name_link = dd_replacement[ser1.name][lang][1]
        name_visible = dd_replacement[ser1.name][lang][0]
        name_inserted = name_link if name_link == name_visible else f"{name_link}|{name_visible}"
        
        if ser1.name in ['Belgium', 'Flanders', 'Wallonia']:
            st += '\n' + '|-class=static-row-header\n' + \
                  f'| \'\'\'[[{name_inserted}]]\'\'\' ' + \
                  f'||style="background:#e0ffd8;"| \'\'\'{if_value(ser1.total)}\'\'\' ' + \
                  f'||style="background:#eaf3ff;"| \'\'\'{if_value(ser1.male)}\'\'\' ' + \
                  f'||style="background:#fee7f6;"| \'\'\'{if_value(ser1.female)}\'\'\' ' + \
                  f'|| \'\'\'{if_value(ser1.fΔm)}\'\'\' ' + \
                  f'||{chval_bold(ser2.total-ser1.total, add_par="padding-right:"+padding_size+";border-left-width:2px;")} ' + \
                  f'||style="background:#e0ffd8;border-left-width:2px;"| \'\'\'{if_value(ser2.total)}\'\'\' ' + \
                  f'||style="background:#eaf3ff;"| \'\'\'{if_value(ser2.male)}\'\'\' ' + \
                  f'||style="background:#fee7f6;"| \'\'\'{if_value(ser2.female)}\'\'\' ' + \
                  f'|| \'\'\'{if_value(ser2.fΔm)}\'\'\' ' + \
                  f'||{chval_bold(ser3.total-ser2.total, add_par="padding-right:"+padding_size+";border-left-width:2px;")} ' + \
                  f'||style="background:#e0ffd8;border-left-width:2px;"| \'\'\'{if_value(ser3.total)}\'\'\' ' + \
                  f'||style="background:#eaf3ff;"| \'\'\'{if_value(ser3.male)}\'\'\' ' + \
                  f'||style="background:#fee7f6;"| \'\'\'{if_value(ser3.female)}\'\'\' ' + \
                  f'|| \'\'\'{if_value(ser3.fΔm)}\'\'\''
        else:
            st += '\n' + '|-\n' + \
                  f'| [[{name_inserted}]] ' + \
                  f'||style="background:#e0ffd8;"| {if_value(ser1.total)} ' + \
                  f'||style="background:#eaf3ff;"| {if_value(ser1.male)} ' + \
                  f'||style="background:#fee7f6;"| {if_value(ser1.female)} ' + \
                  f'|| {if_value(ser1.fΔm)} ' + \
                  f'||{chval(ser2.total-ser1.total, add_par="padding-right:"+padding_size+";border-left-width:2px;")} ' + \
                  f'||style="background:#e0ffd8;border-left-width:2px;"| {if_value(ser2.total)} ' + \
                  f'||style="background:#eaf3ff;"| {if_value(ser2.male)} ' + \
                  f'||style="background:#fee7f6;"| {if_value(ser2.female)} ' + \
                  f'|| {if_value(ser2.fΔm)} ' + \
                  f'||{chval(ser3.total-ser2.total, add_par="padding-right:"+padding_size+";border-left-width:2px;")} ' + \
                  f'||style="background:#e0ffd8;border-left-width:2px;"| {if_value(ser3.total)} ' + \
                  f'||style="background:#eaf3ff;"| {if_value(ser3.male)} ' + \
                  f'||style="background:#fee7f6;"| {if_value(ser3.female)} ' + \
                  f'|| {if_value(ser3.fΔm)}'
            
            
    if lang == 'ru':
        st = re.sub('(?<=\\d)\\.(?=\\d)', ',', st)  # replace '.' with a comma when it sits between two digits
        st = st.replace('padding-right:3,5ex;', 'padding-right:3.5ex;')  # restore the CSS value changed by the line above

    st = table_header + st + '\n|}'
    
    # gray color for missing values
    # st = st.replace(';"|—', ';color:silver;"|—')

    return st


if WRITE_TABLES_TO_FILES:
    table_code = create_table_provinces(df1=df_2020_22, df2=df_2021_23, df3=df_2022_24, file_header='Belgian_header_en -2024_provinces.txt', lang='en')
    with open('output/Table code for Belgian provinces -2024, en.txt', 'w', encoding="utf-8") as fh:
        fh.write(table_code)
    
    table_code = create_table_provinces(df1=df_2020_22, df2=df_2021_23, df3=df_2022_24, file_header='Belgian_header_ru -2024_provinces.txt', lang='ru')
    with open('output/Table code for Belgian provinces -2024, ru.txt', 'w', encoding="utf-8") as fh:
        fh.write(table_code)
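
The number formatting used by the table builder can be tried in isolation. The sketch below re-creates the idea with standalone helpers; `format_value` and `to_decimal_comma` are illustrative names, not functions from this notebook. It also shows why cell 27 has to restore `padding-right:3.5ex` after the regex runs: the pattern rewrites every dot between two digits, including the one inside the CSS value.

```python
import math
import re


def format_value(x, prec=2):
    """Format x with `prec` decimals; NaN -> dash, negatives get U+2212."""
    if math.isnan(x):
        return "\u2014"
    sign = "\u2212" if x < 0 else ""
    return f"{sign}{abs(x):0.{prec}f}"


def to_decimal_comma(text):
    """Replace '.' with ',' wherever it sits between two digits."""
    return re.sub(r"(?<=\d)\.(?=\d)", ",", text)


# A table cell similar to the ones produced for the Russian table.
cell = f'style="padding-right:3.5ex;"| {format_value(-0.456)}'
converted = to_decimal_comma(cell)
print(converted)  # the CSS value is converted too, which is not wanted

# Undo the conversion inside the CSS value, as cell 27 does.
repaired = converted.replace("padding-right:3,5ex;", "padding-right:3.5ex;")
print(repaired)
```

A more targeted alternative would be to convert only the formatted numbers before they are placed into the markup, which avoids the repair step altogether.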

<br>
<br>

In [29]:
dd_replacement_regions = {
    'Belgium'   : {'en': ('Belgium', ''), 'ru': ('Бельгия', '')},
    'Flanders'   : {'en': ('Flanders', 'Flanders'), 'ru': ('Фла́ндрия', 'Фландрия (историческая область)')},
    'Wallonia'   : {'en': ('Wallonia', 'Wallonia'), 'ru': ('Валлония', 'Валлония')},
    'Brussels'   : {'en': ('Brussels', 'Brussels'), 'ru': ('Брюссе́ль', 'Брюссельский столичный регион')}
}

In [30]:
# create code for placing info in Wikipedia
def create_table_regions(df_total, df_male, df_female, dd_replacement=dd_replacement_regions, file_header='', lang='en'):

    def if_value(x, prec=2):
        # format x with `prec` decimals; NaN becomes a dash, negative values get a typographic minus (U+2212)
        return '—' if math.isnan(x) else \
               f"{x:0.{prec}f}"  if x>=0 else \
               f"−{-x:0.{prec}f}"
    
    def chval(x, prec=2, *, add_par=''):  # change_value
        return f'style="color:silver;{add_par}"| —' if math.isnan(x) else \
               f'style="color:darkgreen;{add_par}"| {x:0.{prec}f}' if x>0 else \
               f'style="color:crimson;{add_par}"| −{-x:0.{prec}f}' if x<0 else \
               f'style="color:darkgray;{add_par}"| {x:0.{prec}f}'
    
    def chval_bold(x, prec=2, *, add_par=''):  # change_value
        return f'style="color:silver;{add_par}"| \'\'\'—\'\'\'' if math.isnan(x) else \
               f'style="color:darkgreen;{add_par}"| \'\'\'{x:0.{prec}f}\'\'\'' if x>0 else \
               f'style="color:crimson;{add_par}"| \'\'\'−{-x:0.{prec}f}\'\'\'' if x<0 else \
               f'style="color:darkgray;{add_par}"| \'\'\'{x:0.{prec}f}\'\'\''

    with open('design/' + file_header, mode='r', encoding="utf-8") as fh:
        table_header = fh.read()

    st = ''
    for i in range(len(df_total)):
        ser_total = df_total.iloc[i]
        ser_male = df_male.loc[ser_total.name]
        ser_female = df_female.loc[ser_total.name]
        assert ser_total.name == ser_male.name == ser_female.name
        
        if ser_total.name in ['Belgium']:
            st += '\n' + '|-\n' + \
                  f'| \'\'\'{dd_replacement[ser_total.name][lang][0]}\'\'\' ' + \
                  f'||style="background:#e0ffd8;"| \'\'\'{if_value(ser_total[2024], 2)}\'\'\' ' + \
                  f'||style="background:#eaf3ff;"| \'\'\'{if_value(ser_male[2024], 2)}\'\'\' ' + \
                  f'||style="background:#fee7f6;"| \'\'\'{if_value(ser_female[2024], 2)}\'\'\' ' + \
                  f'||style="background:#fff8dc;"| \'\'\'{if_value(ser_female[2024]-ser_male[2024], 2)}\'\'\' ' + \
                  f'||style="border-left-width:2px;padding-left:1em;"| \'\'\'{if_value(ser_total[2019], 2)}\'\'\' ' + \
                  f'||{chval_bold(ser_total[2020]-ser_total[2019], 2)} ' + \
                  f'|| \'\'\'{if_value(ser_total[2020], 2)}\'\'\' ' + \
                  f'||{chval_bold(ser_total[2021]-ser_total[2020], 2)} ' + \
                  f'|| \'\'\'{if_value(ser_total[2021], 2)}\'\'\' ' + \
                  f'||{chval_bold(ser_total[2022]-ser_total[2021], 2)} ' + \
                  f'|| \'\'\'{if_value(ser_total[2022], 2)}\'\'\' ' + \
                  f'||{chval_bold(ser_total[2023]-ser_total[2022], 2)} ' + \
                  f'|| \'\'\'{if_value(ser_total[2023], 2)}\'\'\' ' + \
                  f'||{chval_bold(ser_total[2024]-ser_total[2023], 2)} ' + \
                  f'||style="background:#e0ffd8;"| \'\'\'{if_value(ser_total[2024], 2)}\'\'\' ' + \
                  f'||{chval_bold(ser_total[2024]-ser_total[2019], 2, add_par="border-left-width:2px;")}'
        else:
            name_link = dd_replacement[ser_total.name][lang][1]
            name_visible = dd_replacement[ser_total.name][lang][0]
            name_inserted = name_link if name_link == name_visible else f"{name_link}|{name_visible}"
            st += '\n' + '|-\n' + \
                  f'| [[{name_inserted}]] ' + \
                  f'||style="background:#e0ffd8;"| {if_value(ser_total[2024], 2)} ' + \
                  f'||style="background:#eaf3ff;"| {if_value(ser_male[2024], 2)} ' + \
                  f'||style="background:#fee7f6;"| {if_value(ser_female[2024], 2)} ' + \
                  f'||style="background:#fff8dc;"| {if_value(ser_female[2024]-ser_male[2024], 2)} ' + \
                  f'||style="border-left-width:2px;padding-left:1em;"| {if_value(ser_total[2019], 2)} ' + \
                  f'||{chval(ser_total[2020]-ser_total[2019], 2)} ' + \
                  f'|| {if_value(ser_total[2020], 2)} ' + \
                  f'||{chval(ser_total[2021]-ser_total[2020], 2)} ' + \
                  f'|| {if_value(ser_total[2021], 2)} ' + \
                  f'||{chval(ser_total[2022]-ser_total[2021], 2)} ' + \
                  f'|| {if_value(ser_total[2022], 2)} ' + \
                  f'||{chval(ser_total[2023]-ser_total[2022], 2)} ' + \
                  f'|| {if_value(ser_total[2023], 2)} ' + \
                  f'||{chval(ser_total[2024]-ser_total[2023], 2)} ' + \
                  f'||style="background:#e0ffd8;"| {if_value(ser_total[2024], 2)} ' + \
                  f'||{chval(ser_total[2024]-ser_total[2019], 2, add_par="border-left-width:2px;")}'

    if lang == 'ru':
        st = re.sub('(?<=\\d)\\.(?=\\d)', ',', st)  # replace '.' with a comma when it sits between two digits

    st = table_header + st + '\n|}'
    
    # gray color for missing values
    # st = st.replace(';"|—', ';color:silver;"|—')

    return st


ls_regions = ['Flanders', 'Belgium', 'Brussels', 'Wallonia']

if WRITE_TABLES_TO_FILES:
    table_code = create_table_regions(df_total =df_regions_total.T.loc[ls_regions],
                                      df_male  =df_regions_male.T.loc[ls_regions],
                                      df_female=df_regions_female.T.loc[ls_regions],
                                      file_header='Belgian_header_en -2024_regions.txt', lang='en')
    with open('output/Table code for Belgian regions -2024, en.txt', 'w', encoding="utf-8") as fh:
        fh.write(table_code)

    table_code = create_table_regions(df_total =df_regions_total.T.loc[ls_regions],
                                      df_male  =df_regions_male.T.loc[ls_regions],
                                      df_female=df_regions_female.T.loc[ls_regions],
                                      file_header='Belgian_header_ru -2024_regions.txt', lang='ru')
    with open('output/Table code for Belgian regions -2024, ru.txt', 'w', encoding="utf-8") as fh:
        fh.write(table_code)

<br>
<br>
<br>
<hr>

<h3>Map creation</h3>

In [32]:
CountryGroup = namedtuple('CountryGroup', ['group_label', 'color', 'countries'])

In [33]:
df_map = df_2022_24.copy()                   \
           .drop(['Belgium', 'Flanders', 'Wallonia']) \
           .rename(index={
               'Antwerpen' : 'Antwerp province',
               'Brussels' : 'Brussels province',
               'Oost-Vlaanderen' : 'East Flanders province',
               'Vlaams-Brabant' : 'Flemish Brabant province',
               'Hainaut' : 'Hainaut province',
               'Limburg' : 'Limburg province',
               'Liege' : 'Liege province',
               'Luxembourg' : 'Luxembourg province',
               'Namur' : 'Namur province',
               'Brabant Wallon' : 'Waloon Brabant province',
               'West-Vlaanderen' : 'West Flanders province'
           })        

df_map.sort_values(by=['total', 'male'], ascending=False).round(2)

Unnamed: 0,total,male,female,fΔm
Limburg province,83.81,82.04,85.54,3.51
Flemish Brabant province,83.65,81.72,85.5,3.78
East Flanders province,83.11,81.09,85.06,3.97
Antwerp province,83.09,81.33,84.81,3.48
West Flanders province,82.98,80.88,85.07,4.19
Waloon Brabant province,82.97,80.83,84.93,4.1
Brussels province,81.97,79.51,84.21,4.7
Luxembourg province,80.63,78.26,83.05,4.79
Namur province,80.62,78.34,82.82,4.48
Liege province,80.47,78.37,82.5,4.13


<br />
<br />

In [35]:
dd_legend = {
    '83.5–83.99' : '00a000',
    '83.0–83.49' : '00b800',
    '82.5–82.99' : '00d800',
    '82.0–82.49' : '00f100',
    '81.5–81.99' : '82ff00',
    '81.0–81.49' : 'c6ff00',
    '80.5–80.99' : 'ffff00',
    '80.0–80.49' : 'ffd800',
    '79.5–79.99' : 'ffac00',
    '79.0–79.49' : 'ff7800',
    '78.5–78.99' : 'ff0000'
}

# version with step 0.25
    # '83.50–83.74' : '002000',
    # '83.25–83.49' : '004800',
    # '83.00–83.24' : '006800',
    # '82.75–82.99' : '009000',
    # '82.50–82.74' : '00b000',
    # '82.25–82.49' : '00d400',
    # '82.00–82.24' : '00ec00',
    # '81.75–81.99' : '78ff00',
    # '81.50–81.74' : 'c8ff00',
    # '81.25–81.49' : 'ffff00',
    # '81.00–81.24' : 'ffe000',
    # '80.75–80.99' : 'ffc000',
    # '80.50–80.74' : 'ffa000',
    # '80.25–80.49' : 'ff8000',
    # '80.00–80.24' : 'ff5400',
    # '79.75–79.99' : 'ff1800',
    # '79.50–79.74' : 'd80000',
    # '79.25–79.49' : 'b40000',
    # '79.00–79.24' : '900000',
    # '78.75–78.99' : '680000',
    # '78.50–78.74' : '400000'

def create_legend_code(dd_legend):
    for k, v in dd_legend.items():
        print(f"{{{{Legend|#{v}|{k}}}}}")

create_legend_code(dd_legend)

{{Legend|#00a000|83.5–83.99}}
{{Legend|#00b800|83.0–83.49}}
{{Legend|#00d800|82.5–82.99}}
{{Legend|#00f100|82.0–82.49}}
{{Legend|#82ff00|81.5–81.99}}
{{Legend|#c6ff00|81.0–81.49}}
{{Legend|#ffff00|80.5–80.99}}
{{Legend|#ffd800|80.0–80.49}}
{{Legend|#ffac00|79.5–79.99}}
{{Legend|#ff7800|79.0–79.49}}
{{Legend|#ff0000|78.5–78.99}}


<br />
<br />

In [37]:
def filter_df(df, selected_column):
    filtered_df = df.loc[:, [selected_column]]   \
                    .sort_values(by=selected_column, ascending=False) \
                    .dropna()

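    # Label each value with its 0.5-wide bin: x * 4 // 2 / 2 equals floor(2x) / 2, the lower edge of the bin.
    filtered_df['group_label'] = filtered_df[selected_column].map(lambda x: f"{x * 4 // 2 / 2:.1f}–{x * 4 // 2 / 2 + 0.49:.2f}")
                                 # for step 0.25: lambda x: f"{x * 8 // 2 / 4:.2f}–{x * 8 // 2 / 4 + 0.24:.2f}"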
    
    
    # filtered_df['group_label'] = filtered_df['group_label'].replace(['79.5–79.99', '79.0–79.49', '78.5–78.99', '78.0–78.49'], '78.0–79.99')
    
    min_value = filtered_df[selected_column].min()
    max_value = filtered_df[selected_column].max()
    
    print(f"           ——— {selected_column} ———")
    print(f"Range: {min_value:.2f} – {max_value:.2f}   " +
          f"({filtered_df[selected_column].idxmin()} – {filtered_df[selected_column].idxmax()})")
    print(f"Number of groups: {filtered_df['group_label'].nunique()}")
    print(f"Number of values: {len(filtered_df)}")

    return filtered_df


df_sel = filter_df(df_map, selected_column='total')
df_sel.round(2)

           ——— total ———
Range: 79.63 – 83.81   (Hainaut province – Limburg province)
Number of groups: 7
Number of values: 11


Unnamed: 0,total,group_label
Limburg province,83.81,83.5–83.99
Flemish Brabant province,83.65,83.5–83.99
East Flanders province,83.11,83.0–83.49
Antwerp province,83.09,83.0–83.49
West Flanders province,82.98,82.5–82.99
Waloon Brabant province,82.97,82.5–82.99
Brussels province,81.97,81.5–81.99
Luxembourg province,80.63,80.5–80.99
Namur province,80.62,80.5–80.99
Liege province,80.47,80.0–80.49
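
The label expression in `filter_df` is compact, so here is a hedged standalone version of the same binning. `bin_label` is a name made up for this sketch; it assumes bins of width 0.5 whose upper bound is printed as the lower edge plus 0.49, as in the keys of `dd_legend`.

```python
import math


def bin_label(x):
    """Return the legend label of the 0.5-wide bin that contains x."""
    lower = math.floor(x * 2) / 2   # lower edge: 83.81 -> 83.5
    upper = lower + 0.49            # printed upper bound: 83.5 -> 83.99
    return f"{lower:.1f}–{upper:.2f}"


# A few rounded totals from the table above.
for value in (83.81, 82.98, 81.97, 80.47):
    print(value, "->", bin_label(value))
```

Writing the lower edge as `floor(2x) / 2` makes the bin width explicit; for a step of 0.25 the same idea would use `floor(4x) / 4` and an offset of 0.24.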


In [38]:
def extract_indexes(subdf, dd_legend = dd_legend):
    group_label = subdf['group_label'].iloc[0]
    countries = subdf.index.to_list()
    color = dd_legend[group_label]
    
    ls_grouping.append(CountryGroup(group_label=group_label, countries=countries, color=color))

    return pd.Series([color, countries], index=['color', 'regions'])


ls_grouping = []
df_grouped = df_sel.groupby(['group_label'])[['group_label']].apply(extract_indexes).loc[::-1]

df_grouped

group_label,color,regions
83.5–83.99,00a000,"[Limburg province, Flemish Brabant province]"
83.0–83.49,00b800,"[East Flanders province, Antwerp province]"
82.5–82.99,00d800,"[West Flanders province, Waloon Brabant province]"
81.5–81.99,82ff00,[Brussels province]
80.5–80.99,ffff00,"[Luxembourg province, Namur province]"
80.0–80.49,ffd800,[Liege province]
79.5–79.99,ffac00,[Hainaut province]


In [39]:
def create_map_code_regions(ls_grouping, title=''):
    true, false  = True, False
    
    jo = {
        "groups": { },
        "title": title,
        "hidden": [],
        "background": "#ffffff",
        "borders": "#000",
        "legendFont": "Century Gothic",
        "legendFontColor": "#000",
        "legendBorderColor": "#00000000",
        "legendBgColor": "#00000000",
        "legendWidth": 150,
        "legendBoxShape": "square",
        "areBordersShown": true,
        "defaultColor": "#d1dbdd",
        "labelsColor": "#6a0707",
        "labelsFont": "Arial",
        "strokeWidth": "medium",
        "areLabelsShown": true,
        "uncoloredScriptColor": "#ffff33",
        "v6": true,
        "levelsVisibility": [
            "show",
            "show",
            "transparent"
        ],
        "legendPosition": "custom",
        "legendX": 120,
        "legendY": 512,
        "legendSize": "larger",
        "legendTranslateX": "0.00",
        "legendStatus": "show",
        "scalingPatterns": true,
        "legendRowsSameColor": true,
        "legendColumnCount": 1
    }
    
    for group_label, color, regions in ls_grouping[::-1]:
        jo['groups'][f'#{color}'] = {'label': group_label, 'paths': [region.replace(' ', '_',) for region in regions]}
    return jo


jo = create_map_code_regions(ls_grouping, title='2022 – 2024')
print(json.dumps(jo))

{"groups": {"#00a000": {"label": "83.5\u201383.99", "paths": ["Limburg_province", "Flemish_Brabant_province"]}, "#00b800": {"label": "83.0\u201383.49", "paths": ["East_Flanders_province", "Antwerp_province"]}, "#00d800": {"label": "82.5\u201382.99", "paths": ["West_Flanders_province", "Waloon_Brabant_province"]}, "#82ff00": {"label": "81.5\u201381.99", "paths": ["Brussels_province"]}, "#ffff00": {"label": "80.5\u201380.99", "paths": ["Luxembourg_province", "Namur_province"]}, "#ffd800": {"label": "80.0\u201380.49", "paths": ["Liege_province"]}, "#ffac00": {"label": "79.5\u201379.99", "paths": ["Hainaut_province"]}}, "title": "2022 \u2013 2024", "hidden": [], "background": "#ffffff", "borders": "#000", "legendFont": "Century Gothic", "legendFontColor": "#000", "legendBorderColor": "#00000000", "legendBgColor": "#00000000", "legendWidth": 150, "legendBoxShape": "square", "areBordersShown": true, "defaultColor": "#d1dbdd", "labelsColor": "#6a0707", "labelsFont": "Arial", "strokeWidth": "m

In [40]:
# with open(f"output/JSON MapChart.txt", 'w', encoding="utf-8") as fh:
#     json.dump(jo, fh, indent=2, ensure_ascii=False)
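
For reference, the `groups` part of the MapChart configuration can also be assembled from plain dictionaries, without the `groupby` step. This is a sketch that only mirrors the structure printed in cell 39 (colour key, `label`, `paths`); whether MapChart accepts the result depends on the rest of the configuration, and `build_groups`, `LABEL_BY_REGION` and `COLOR_BY_LABEL` are illustrative names with a shortened data set.

```python
import json

# Subset of the legend colours and region labels shown in this notebook.
COLOR_BY_LABEL = {
    "83.5–83.99": "00a000",
    "83.0–83.49": "00b800",
    "80.0–80.49": "ffd800",
}

LABEL_BY_REGION = {
    "Limburg province": "83.5–83.99",
    "Flemish Brabant province": "83.5–83.99",
    "East Flanders province": "83.0–83.49",
    "Antwerp province": "83.0–83.49",
    "Liege province": "80.0–80.49",
}


def build_groups(label_by_region, color_by_label):
    """Group region names by colour, in the shape used under 'groups'."""
    groups = {}
    for region, label in label_by_region.items():
        key = f"#{color_by_label[label]}"
        group = groups.setdefault(key, {"label": label, "paths": []})
        # Path names use underscores instead of spaces, as in cell 39.
        group["paths"].append(region.replace(" ", "_"))
    return groups


groups = build_groups(LABEL_BY_REGION, COLOR_BY_LABEL)
print(json.dumps(groups, ensure_ascii=False, indent=2))
```

Because Python dictionaries keep insertion order, the groups appear in the order in which their first region is encountered, so sorting the input by value beforehand yields a legend sorted from high to low.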

<br>
<br>
<br>

Something is wrong with the data. Manual exploration:

In [42]:
df_regions_total.tail()

Category,Belgium,Brussels,Flanders,Wallonia
2020,80.7881,79.61,81.9976,78.943
2021,81.65,81.2756,82.6816,79.8717
2022,81.6875,81.5818,82.6182,80.0154
2023,82.2775,82.1767,83.1667,80.6443
2024,82.3547,82.1498,84.0539,80.6988


In [43]:
# Change in the three-year mean from 2021-2023 to 2022-2024, by region (rounded values typed in by hand):
print("Belgium:\t", round(82.11 - 81.87, 2))
print("Flanders:\t", round(83.28 - 82.82, 2))
print("Brussels:\t", round(81.97 - 81.68, 2))
print("Wallonia:\t", round(80.45 - 80.18, 2))

Belgium:	 0.24
Flanders:	 0.46
Brussels:	 0.29
Wallonia:	 0.27


In [44]:
# manually find approximate average LE in Belgium from values for Flanders, Wallonia, and Brussels regions, for 2021-2023:
le_av_2021_23 = (82.82 * 6.8 + 81.68 * 1.2 + 80.18 * 3.7) / 11.7
print(f"{le_av_2021_23:.3f}")

le_av_2022_24 = (83.28 * 6.8 + 81.97 * 1.2 + 80.45 * 3.7) / 11.7
print(f"{le_av_2022_24:.3f}")

81.868
82.251


In [45]:
print((df_regions_total.loc[2022, 'Belgium'] + df_regions_total.loc[2023, 'Belgium'] + df_regions_total.loc[2024, 'Belgium']) / 3)
print((df_regions_total.loc[2022, 'Flanders'] + df_regions_total.loc[2023, 'Flanders'] + df_regions_total.loc[2024, 'Flanders']) / 3)
print((df_regions_total.loc[2022, 'Brussels'] + df_regions_total.loc[2023, 'Brussels'] + df_regions_total.loc[2024, 'Brussels']) / 3)
print((df_regions_total.loc[2022, 'Wallonia'] + df_regions_total.loc[2023, 'Wallonia'] + df_regions_total.loc[2024, 'Wallonia']) / 3)

82.10656666666667
83.2796
81.96943333333333
80.45283333333333
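
The four near-identical `print` calls above can be condensed into a loop. The sketch below uses plain Python and the annual values from the `df_regions_total.tail()` output; `ANNUAL` and `period_mean` are names introduced for illustration. It recomputes the unweighted three-year means and the period-to-period changes that cell 43 derives by hand from rounded values, so the results agree only up to rounding.

```python
# Annual life expectancy per region, copied from the tail() output above.
ANNUAL = {
    "Belgium":  {2021: 81.65,   2022: 81.6875, 2023: 82.2775, 2024: 82.3547},
    "Flanders": {2021: 82.6816, 2022: 82.6182, 2023: 83.1667, 2024: 84.0539},
    "Brussels": {2021: 81.2756, 2022: 81.5818, 2023: 82.1767, 2024: 82.1498},
    "Wallonia": {2021: 79.8717, 2022: 80.0154, 2023: 80.6443, 2024: 80.6988},
}


def period_mean(values_by_year, start, end):
    """Unweighted mean of the annual values from start to end (inclusive)."""
    years = range(start, end + 1)
    return sum(values_by_year[year] for year in years) / len(years)


changes = {}
for region, values in ANNUAL.items():
    previous = period_mean(values, 2021, 2023)
    latest = period_mean(values, 2022, 2024)
    changes[region] = latest - previous
    print(f"{region}:\t{previous:.2f} -> {latest:.2f}  (change {changes[region]:+.2f})")
```

In this output Flanders gains noticeably more than Belgium as a whole, even though Flanders carries the largest population weight, which is the pattern that prompted the manual exploration.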
