## Gender effect: age pyramid

To illustrate the gender effect, we use a simple age pyramid showing, for a name entered into a search input, the number of births in 5-year intervals, with male births on the left and female births on the right.

Our main feedback on this visualization concerned the scale: the first implementation used a different scale for each gender, which could be misleading. This is fixed here: both charts share the same scale. However, Altair does not let us bind the scale to the search result, so we cannot adapt the scale dynamically to the number of births for the selected name. We therefore had to set the scale once, using the largest number of births in the dataset. The result was not always readable: rare names were almost invisible.

To alleviate this, we switched to a symlog scale (similar to a log scale, except that it also works with values equal to 0) ranging from 0 to ~250,000. This in turn made values hard to read, since exact values are difficult to extract from a log scale, so we added a tooltip showing the exact value for the bar under the mouse.
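To make the behaviour of the symlog scale concrete, here is a minimal sketch of the transform, assuming the usual definition `sign(x) * log1p(|x| / c)` with a constant `c` of 1. The function name `symlog` and the sample birth counts are illustrative and are not part of the notebook.

```python
import math


def symlog(x: float, constant: float = 1.0) -> float:
    """Sign-preserving log transform: sign(x) * log1p(|x| / constant)."""
    return math.copysign(math.log1p(abs(x) / constant), x)


# Illustrative birth counts: 0 stays at 0, and large counts are compressed
for births in [0, 3, 100, 250_000]:
    print(births, round(symlog(births), 2))
```

Because 0 maps to 0 instead of minus infinity, bins with no births can still be drawn, while a name with a handful of births remains visible next to one with hundreds of thousands.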

### 0) Installs and imports

In [6]:
!pip install anywidget altair
!pip install -U ipykernel

In [2]:
import altair as alt 
alt.data_transformers.enable('json')
import pandas as pd

### 1) Import data

The 'preusuel' column is also casefolded, so that every name is stored in a single case.

In [3]:
baby_names = pd.read_csv('dpt2020.csv', sep = ";")
baby_names['annais'] = pd.to_numeric(baby_names['annais'], errors = "coerce", downcast = "integer")
baby_names['preusuel'] = baby_names['preusuel'].str.casefold()
print(len(baby_names))
baby_names

3727553


Unnamed: 0,sexe,preusuel,annais,dpt,nombre
0,1,_prenoms_rares,1900.0,02,7
1,1,_prenoms_rares,1900.0,04,9
2,1,_prenoms_rares,1900.0,05,8
3,1,_prenoms_rares,1900.0,06,23
4,1,_prenoms_rares,1900.0,07,9
...,...,...,...,...,...
3727548,2,zya,2018.0,59,3
3727549,2,zya,,XX,264
3727550,2,zyna,2013.0,93,3
3727551,2,zyna,,XX,59


### 2) Preprocessing

We bin the dataframe into 5-year intervals and add a 'year_range' column that will be used by the tooltip.

In [4]:
# Bin the years into 5-year intervals
baby_names['annais_binned'] = (baby_names['annais'] // 5) * 5

# Group by name and binned years, then aggregate
baby_names_agg = baby_names.groupby(['preusuel', 'annais_binned', 'sexe'])['nombre'].sum().unstack(fill_value=0).reset_index()

# Rename columns
baby_names_agg.columns = ['preusuel', 'annais', 'pop_m', 'pop_f']
# Work on a copy so that adding a column does not raise SettingWithCopyWarning
baby_names_formatted = baby_names_agg[['preusuel', 'annais', 'pop_m', 'pop_f']].copy()

# Add a column for the tooltip
baby_names_formatted['year_range'] = baby_names_formatted['annais'].astype(int).astype(str) + ' - ' + (baby_names_formatted['annais'] + 4).astype(int).astype(str)

# The last bin only covers one year (2020), so relabel it for every name
baby_names_formatted.loc[baby_names_formatted['annais'] == 2020, 'year_range'] = "2020"


baby_names_formatted

Unnamed: 0,preusuel,annais,pop_m,pop_f,year_range
0,_prenoms_rares,1900.0,6552,8780,1900 - 1904
1,_prenoms_rares,1905.0,7398,9221,1905 - 1909
2,_prenoms_rares,1910.0,7961,9811,1910 - 1914
3,_prenoms_rares,1915.0,5391,6808,1915 - 1919
4,_prenoms_rares,1920.0,9056,11005,1920 - 1924
...,...,...,...,...,...
79530,ïssa,2015.0,3,0,2015 - 2019
79531,ömer,1995.0,3,0,1995 - 1999
79532,ömer,2010.0,38,0,2010 - 2014
79533,ömer,2015.0,126,0,2015 - 2019
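The arithmetic behind the binning and the tooltip label can be checked on a few years in plain Python. This is only a sketch: the helpers `bin_start` and `year_range`, and the `last_year` parameter, are hypothetical names introduced here and do not appear in the notebook.

```python
def bin_start(year: int, width: int = 5) -> int:
    """Return the first year of the bin containing `year` (floor to a multiple of `width`)."""
    return (year // width) * width


def year_range(start: int, width: int = 5, last_year: int = 2020) -> str:
    """Build the tooltip label, truncating the final bin at `last_year`."""
    end = min(start + width - 1, last_year)
    return str(start) if start == end else f"{start} - {end}"


for year in [1900, 1903, 2019, 2020]:
    start = bin_start(year)
    print(year, start, year_range(start))
```

Flooring to a multiple of 5 puts 1900 through 1904 in the same bin, and 2020 ends up alone in its bin, which is why its label is a single year.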


### 3) Building the chart

The chart is actually three charts: the male chart (on the left), the female chart (on the right) and a middle chart which shows the year intervals.

The search input is bound to both gender charts on the 'preusuel' field.

In [5]:
search_input = alt.param(
    value='',
    bind=alt.binding(
        input='search',
        placeholder="Name",
        name='Name: ',
    )
)

base = alt.Chart(baby_names_formatted).transform_filter(
    alt.FieldEqualPredicate(field='preusuel', equal=search_input)
).add_params(
    search_input
).properties(height=400)

# Shared domain for both genders: the largest 5-year count in the whole dataset
max_value = int(baby_names_formatted[['pop_m', 'pop_f']].max().max())
scale = alt.Scale(type="symlog", domain=[0, max_value])

# Powers of ten plus the maximum, used as explicit axis ticks
tick_values = [1, 10, 100, 1000, 10000, max_value]

chart_m = base.mark_bar(color='#7CB518').encode(
    x=alt.X('pop_m:Q',
            title='Male',
            sort='descending',
            scale=scale,
            axis=alt.Axis(tickCount=len(tick_values), values=tick_values, labels=True, labelExpr="datum.value")
            ),
    y=alt.Y('annais:N', axis=None, sort='descending'),
    tooltip=[
        alt.Tooltip('year_range:N', title='Years:'),
        alt.Tooltip('pop_m:Q', title='Male birth(s):'),
    ]
)

middle = base.encode(
    y=alt.Y('annais:O', axis=None, sort='descending'),
    text=alt.Text('annais:O'),
).mark_text().properties(width=20)

chart_f = base.mark_bar(color="#F3DE2C").encode(
    x=alt.X('pop_f:Q',
            title='Female',
            sort='ascending',
            scale=scale,
            axis=alt.Axis(tickCount=len(tick_values), values=tick_values, labels=True, labelExpr="datum.value")
            ),
    y=alt.Y('annais:N', axis=None, sort='descending'),
    tooltip=[
        alt.Tooltip('year_range:N', title='Years:'),
        alt.Tooltip('pop_f:Q', title='Female birth(s):'),
    ]
)

# Lay out the three panels side by side: male, year labels, female
chart = alt.concat(chart_m, middle, chart_f, spacing=5)
 
jchart = alt.JupyterChart(chart)
jchart

JupyterChart(spec={'config': {'view': {'continuousWidth': 300, 'continuousHeight': 300}}, 'concat': [{'mark': …
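One practical consequence of the setup above: the filter is an exact equality test, and 'preusuel' was casefolded at import, so typing "Marie" matches nothing while "marie" does. A minimal sketch of the same normalization applied to a typed name, useful when checking a name against the dataframe by hand (the helper `normalize_name` is hypothetical and not part of the notebook):

```python
def normalize_name(raw: str) -> str:
    """Strip surrounding whitespace and casefold, mirroring the 'preusuel' preprocessing."""
    return raw.strip().casefold()


# Typed names and the form that would match the casefolded column
for typed in ["Marie", "  ÖMER ", "zya"]:
    print(repr(typed), "->", repr(normalize_name(typed)))
```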