## Visualization 1

How do baby names evolve over time? 

Are there names that have consistently remained popular or unpopular? 

Are there some that were suddenly or briefly popular or unpopular?

Are there trends in time?

# First Solution

### Stacked bar chart

We create a stacked bar chart with Altair to display the most popular names over the years, based on the 10 highest counts recorded each year. The x-axis encodes the `annais` field (birth year) on an ordinal scale, the y-axis encodes the sum of the `nombre` field (number of births) on a quantitative scale, and the bar color encodes the `preusuel` field (first name). A tooltip shows the name, year, and count for each bar segment.


The strengths of using a stacked bar chart to display the top names for each year include:

- Comparison: A stacked bar chart allows for easy visual comparison between names within each year. We can quickly identify the most and least popular of the displayed names by comparing the heights of their bar segments.

- Trend Analysis: By observing how the distribution of the stacked bars changes over the years, we can identify trends in name popularity. For example, we can see whether certain names remain consistently popular or whether their popularity fluctuates.

- Total Count: The stacked bars also provide information on the total count of the displayed names in a given year. By looking at the overall height of the bars, we can read the total number of occurrences and compare it across years.

- Name Contributions: The stacked nature of the bars allows us to see the contribution of each name to the total count. This helps in identifying the relative popularity of different names within a year.
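
The "contribution of each name to the total" can also be computed explicitly. The following is a minimal sketch on made-up toy data (the rows below are illustrative and not taken from `dpt2020.csv`); the `share` column is a name introduced here for the example.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "annais": [1900, 1900, 1900, 1901, 1901],
    "preusuel": ["MARIE", "JEANNE", "LOUISE", "MARIE", "JEANNE"],
    "nombre": [60, 30, 10, 50, 50],
})

# Share of each name within its year: count divided by that year's total
yearly_total = toy.groupby("annais")["nombre"].transform("sum")
toy["share"] = toy["nombre"] / yearly_total

print(toy)
```

If only the relative shares matter, Altair can draw them directly by passing `stack='normalize'` to the y encoding, for example `alt.Y('sum(nombre):Q', stack='normalize')`.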

However, there are also some potential weaknesses to consider:

- Visual Clutter: The dataset contains many names and spans a large number of years, so the stacked bar chart can become visually cluttered and challenging to interpret. This makes it difficult to distinguish individual names and observe trends clearly.

- Lack of Granularity: A stacked bar chart provides an overview of name popularity trends but may not offer detailed insights into specific names or their variations (e.g., spelling variations).

- Data Size Limitations: Only a limited number of names and years can be displayed legibly in a single chart.
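
One possible way to reduce the clutter described above is to aggregate years into decades before plotting. This is only a sketch on invented toy data, not something the notebook does; the `decade` column is a hypothetical helper introduced for the example.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "annais": [1900, 1905, 1909, 1910, 1919],
    "preusuel": ["MARIE", "MARIE", "JEANNE", "MARIE", "JEANNE"],
    "nombre": [10, 20, 5, 7, 3],
})

# The year column may be read as text, so convert it to integers first,
# then round down to the start of the decade
toy["decade"] = (toy["annais"].astype(int) // 10) * 10

# One row per (decade, name) instead of one per (year, name)
by_decade = toy.groupby(["decade", "preusuel"], as_index=False)["nombre"].sum()

print(by_decade)
```

Plotting `decade` instead of `annais` on the x-axis would give roughly a tenth as many bars, at the cost of hiding short-lived peaks.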


In [1]:
import altair as alt
import pandas as pd

# Load the data
names = pd.read_csv("dpt2020.csv", sep=";")

# Remove the aggregated "rare names" bucket and rows with an unknown department
names = names[(names.preusuel != '_PRENOMS_RARES') & (names.dpt != 'XX')]

# Keep the 10 largest rows (name and department pairs) for each year
top_10_names = names.groupby('annais').apply(lambda x: x.nlargest(10, 'nombre')).reset_index(drop=True)

display(top_10_names)

Unnamed: 0,sexe,preusuel,annais,dpt,nombre
0,2,MARIE,1900,29,2519
1,2,MARIE,1900,75,1576
2,2,SUZANNE,1900,75,1382
3,2,MARIE,1900,56,1358
4,2,MARIE,1900,59,1353
...,...,...,...,...,...
1205,1,MOHAMED,2020,93,238
1206,1,ADAM,2020,92,214
1207,1,LÉO,2020,59,210
1208,2,LOUISE,2020,75,208
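
As the output shows, the cell above ranks individual (name, department) rows, so one name such as MARIE can fill several of the 10 slots for a year. If the goal is the 10 most popular names nationally, one option is to sum over departments first. The sketch below uses invented toy data and keeps the top 2 instead of the top 10 to stay small; `national` and `top_n` are names introduced for the example.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "annais":   [1900, 1900, 1900, 1900, 1900],
    "preusuel": ["MARIE", "MARIE", "SUZANNE", "JEANNE", "JEANNE"],
    "dpt":      ["29", "75", "75", "29", "75"],
    "nombre":   [50, 40, 45, 30, 30],
})

# 1) Sum over departments: one national count per (year, name)
national = toy.groupby(["annais", "preusuel"], as_index=False)["nombre"].sum()

# 2) Sort by count within each year and keep the first n names per year
#    (n = 2 here; it would be 10 on the real data)
top_n = (
    national.sort_values(["annais", "nombre"], ascending=[True, False])
    .groupby("annais")
    .head(2)
    .reset_index(drop=True)
)

print(top_n)
```

On this toy data, ranking raw rows would select MARIE (department 29) and SUZANNE (department 75), whereas ranking national totals selects MARIE and JEANNE.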


In [2]:
# Creating the stacked bar chart
chart = alt.Chart(top_10_names).mark_bar().encode(
    x='annais:O',
    y='sum(nombre):Q',
    color='preusuel:N',
    # Use the same aggregate as the y-axis so the tooltip shows the summed count
    tooltip=['preusuel:N', 'annais:O', 'sum(nombre):Q']
).properties(
    width=900,
    height=600,
    title='Top 10 Most Popular Names Over the Years'
)

chart

In [3]:
region_counts = top_10_names.groupby(['dpt', 'preusuel'])['nombre'].sum().reset_index()

chart2 = alt.Chart(region_counts).mark_rect().encode(
    alt.X('dpt:N', axis=None),
    alt.Y('preusuel:N', axis=None),
    alt.Color('nombre:Q'),
    alt.Tooltip(['preusuel:N', 'nombre:Q'])
).properties(
    width=600,
    height=400,
    title='Baby Name Popularity by Department (Heatmap)'
)

# Display the chart
chart2

# Second Solution

In [5]:
# Split by sex: in this dataset sexe == 2 holds female names, sexe == 1 male names
names_fem = names[names.sexe==2]
names_masc = names[names.sexe==1]
names_fem

Unnamed: 0,sexe,preusuel,annais,dpt,nombre
1741138,2,AALIYA,2017,75,3
1741140,2,AALIYAH,2001,92,4
1741141,2,AALIYAH,2001,971,5
1741142,2,AALIYAH,2002,06,3
1741143,2,AALIYAH,2002,13,3
...,...,...,...,...,...
3727545,2,ZYA,2013,44,4
3727546,2,ZYA,2013,59,3
3727547,2,ZYA,2017,974,3
3727548,2,ZYA,2018,59,3


In [6]:
names_fem_popular = names_fem[['preusuel', 'nombre']].groupby('preusuel', as_index=False).sum()
top_names_fem = names_fem_popular.sort_values('nombre', ascending=False)[:20]

names_masc_popular = names_masc[['preusuel', 'nombre']].groupby('preusuel', as_index=False).sum()
top_names_masc = names_masc_popular.sort_values('nombre', ascending=False)[:20]

names_popular = names[['preusuel', 'nombre']].groupby('preusuel', as_index=False).sum()
top_names = names_popular.sort_values('nombre', ascending=False)[:20]

print(list(top_names_fem['preusuel']))
print(list(top_names_masc['preusuel']))
print(list(top_names['preusuel']))

['MARIE', 'JEANNE', 'FRANÇOISE', 'MONIQUE', 'CATHERINE', 'NATHALIE', 'ISABELLE', 'JACQUELINE', 'ANNE', 'SYLVIE', 'MARTINE', 'MADELEINE', 'NICOLE', 'SUZANNE', 'HÉLÈNE', 'CHRISTINE', 'LOUISE', 'MARGUERITE', 'DENISE', 'CHRISTIANE']
['JEAN', 'PIERRE', 'MICHEL', 'ANDRÉ', 'PHILIPPE', 'LOUIS', 'RENÉ', 'ALAIN', 'JACQUES', 'BERNARD', 'MARCEL', 'DANIEL', 'ROGER', 'PAUL', 'ROBERT', 'CLAUDE', 'HENRI', 'CHRISTIAN', 'GEORGES', 'NICOLAS']
['MARIE', 'JEAN', 'PIERRE', 'MICHEL', 'ANDRÉ', 'JEANNE', 'PHILIPPE', 'LOUIS', 'RENÉ', 'ALAIN', 'JACQUES', 'BERNARD', 'MARCEL', 'CLAUDE', 'DANIEL', 'ROGER', 'PAUL', 'ROBERT', 'DOMINIQUE', 'GEORGES']
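
The three near-identical "sum per name, sort, keep 20" steps above could be factored into one helper. This is a minimal sketch, assuming the same `preusuel` and `nombre` columns; `top_n_names` is a hypothetical function name, and the toy rows are invented for illustration.

```python
import pandas as pd

def top_n_names(df, n=20):
    """Return the n names with the largest total count, most popular first."""
    totals = df.groupby("preusuel")["nombre"].sum()
    return totals.nlargest(n).index.tolist()

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "sexe":     [1, 1, 2, 2, 2],
    "preusuel": ["JEAN", "PIERRE", "MARIE", "JEANNE", "MARIE"],
    "nombre":   [5, 3, 4, 2, 6],
})

top_all = top_n_names(toy, n=2)
top_fem = top_n_names(toy[toy["sexe"] == 2], n=1)

print(top_all)
print(top_fem)
```

With the notebook's data this would be called as `top_n_names(names_fem)`, `top_n_names(names_masc)`, and `top_n_names(names)`.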


In [7]:
# Yearly national totals for each top-20 name. Selecting 'nombre' before summing
# avoids concatenating the string-valued 'dpt' codes into a meaningless column.
names_masc_filt = names_masc[names_masc['preusuel'].isin(top_names_masc['preusuel'])].groupby(['preusuel', 'annais'], as_index=False)['nombre'].sum()
names_fem_filt = names_fem[names_fem['preusuel'].isin(top_names_fem['preusuel'])].groupby(['preusuel', 'annais'], as_index=False)['nombre'].sum()
names_filt = names[names['preusuel'].isin(top_names['preusuel'])].groupby(['preusuel', 'annais'], as_index=False)['nombre'].sum()

print(names_masc_filt.head())
print(names_fem_filt.head())
print(names_filt.head())

  preusuel annais  nombre
0    ALAIN   1900      83
1    ALAIN   1901      99
2    ALAIN   1902     106
3    ALAIN   1903     120
4    ALAIN   1904     136
  preusuel annais  nombre
0     ANNE   1900    3440
1     ANNE   1901    3455
2     ANNE   1902    3721
3     ANNE   1903    3744
4     ANNE   1904    3552
  preusuel annais  nombre
0    ALAIN   1900      83
1    ALAIN   1901      99
2    ALAIN   1902     106
3    ALAIN   1903     120
4    ALAIN   1904     136


In [8]:
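# Line chart of the overall top 20 names; clicking names in the legend
# highlights their lines and turns the others light gray
selection = alt.selection_multi(fields=['preusuel'], bind='legend')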
chart = alt.Chart(names_filt).mark_line().encode(
  x='annais:T',
  y='nombre:Q',
  color=alt.condition(selection, 'preusuel:N', alt.value('lightgray')),
  size=alt.condition(selection, alt.value(3), alt.value(1))
).add_selection(selection)

chart



In [9]:
selection = alt.selection_multi(fields=['preusuel'], bind='legend')
chart = alt.Chart(names_masc_filt).mark_line().encode(
  x='annais:T',
  y='nombre:Q',
  color=alt.condition(selection, 'preusuel:N', alt.value('lightgray')),
  size=alt.condition(selection, alt.value(3), alt.value(1))
).add_selection(selection)

chart



In [10]:
selection = alt.selection_multi(fields=['preusuel'], bind='legend')
chart = alt.Chart(names_fem_filt).mark_line().encode(
  x='annais:T',
  y='nombre:Q',
  color=alt.condition(selection, 'preusuel:N', alt.value('lightgray')),
  size=alt.condition(selection, alt.value(3), alt.value(1))
).add_selection(selection)
chart

