# This notebook represents our initial implementation

# VISUALISATION 1

**How do baby names evolve over time?**

**Are there names that have consistently remained popular or unpopular?**

**Are there some that were suddenly or briefly popular or unpopular?**

**Are there trends in time?**

**Stacked bar chart**

Here we create a stacked bar chart with Altair to display the 10 most popular names in each year. The x-axis encodes the annais field (year of birth) on an ordinal scale, the y-axis encodes the sum of the nombre field (number of births) on a quantitative scale, and the bar color encodes the preusuel field (first name). A tooltip shows the name, year, and count for each bar segment.

The strengths of using a stacked bar chart to display the top names for each year include:

**Comparison**: A stacked bar chart allows for easy visual comparison between names within each year. We can quickly identify the most and least popular of the top names by comparing the heights of the bar segments.

**Trend Analysis**: By observing how the distribution of the stacked bars changes over the years, we can identify trends in name popularity. For example, we can see whether certain names consistently remain popular or whether their popularity fluctuates.

**Total Count**: The stacked bars also show the total count of the top names in a given year. By looking at the overall height of the bars, we can see how many births the top names account for and compare that total across years.

**Name Contributions**: The stacked nature of the bars allows us to see the contribution of each name to the total count. This helps in identifying the relative popularity of different names within a year.
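
The trend-analysis point can also be checked numerically. The sketch below is one possible way to do that and is not part of the original notebook: it counts how many years each name spends in the yearly top n. The helper name years_in_top_n and the sample rows are hypothetical; only the column names annais, preusuel, and nombre come from the dataset.

```python
import pandas as pd


def years_in_top_n(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Count, for each name, the number of years it ranks in the yearly top n."""
    # Total births per (year, name), summed over departments and sexes
    yearly = df.groupby(["annais", "preusuel"], as_index=False)["nombre"].sum()
    # Rank names within each year; rank 1 is the most frequent name
    yearly["rank"] = yearly.groupby("annais")["nombre"].rank(
        method="first", ascending=False
    )
    top = yearly[yearly["rank"] <= n]
    # Number of distinct years in which each name made the top n
    return top.groupby("preusuel")["annais"].nunique().sort_values(ascending=False)


# Tiny made-up sample, for illustration only
sample = pd.DataFrame({
    "annais":   ["2000", "2000", "2000", "2001", "2001", "2001"],
    "preusuel": ["LEA", "EMMA", "KEVIN", "LEA", "EMMA", "KEVIN"],
    "nombre":   [50, 40, 45, 60, 55, 5],
})
counts = years_in_top_n(sample, n=2)
print(counts)
```

A name whose count is close to the number of years in the data has remained consistently popular, while a name with a small count was only briefly popular.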

However, there are also some potential weaknesses to consider:

**Visual Clutter**: With many distinct names and a dataset that spans a large number of years, the stacked bar chart can become visually cluttered and challenging to interpret. This makes it difficult to distinguish individual names and observe trends clearly.

**Lack of Granularity**: A stacked bar chart provides an overview of name popularity trends but may not offer detailed insights into specific names or their variations (e.g., spelling variations).

**Data Size Limitations**: Only a limited number of names and years can be displayed effectively in a single chart.
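
One way to reduce the clutter described above is to chart a narrower window of years and fewer names. The following is a minimal sketch under that assumption, not the approach used in this notebook; the function name top_names_in_window, its parameters, and the sample rows are hypothetical.

```python
import pandas as pd


def top_names_in_window(df: pd.DataFrame, start: int, end: int, n: int = 5) -> pd.DataFrame:
    """Yearly counts for the n most frequent names between start and end (inclusive)."""
    years = df["annais"].astype(int)
    window = df[(years >= start) & (years <= end)]
    # Names ranked by total births over the whole window
    keep = window.groupby("preusuel")["nombre"].sum().nlargest(n).index
    subset = window[window["preusuel"].isin(keep)]
    # One row per (year, name), ready to pass to a chart
    return subset.groupby(["annais", "preusuel"], as_index=False)["nombre"].sum()


# Tiny made-up sample, for illustration only
sample = pd.DataFrame({
    "annais":   ["1999", "2000", "2000", "2001", "2001", "2001"],
    "preusuel": ["LEA", "LEA", "EMMA", "LEA", "EMMA", "KEVIN"],
    "nombre":   [10, 50, 40, 60, 55, 5],
})
reduced = top_names_in_window(sample, start=2000, end=2001, n=2)
print(reduced)
```

The resulting table has one row per year and name, so it can be passed to alt.Chart in place of the full top-10 table.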

In [1]:
import altair as alt
import pandas as pd

# Load the data
names = pd.read_csv("dpt2020.csv", sep=";")

names.drop(names[names.preusuel == '_PRENOMS_RARES'].index, inplace=True)
names.drop(names[names.dpt == 'XX'].index, inplace=True)

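# Sum counts per year and name (over departments and sexes) so that each
# name appears once per year, then keep the 10 most frequent names per year
yearly = names.groupby(['annais', 'preusuel'], as_index=False)['nombre'].sum()
top_10_names = (
    yearly.sort_values(['annais', 'nombre'], ascending=[True, False])
    .groupby('annais')
    .head(10)
    .reset_index(drop=True)
)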

In [2]:
# Creating the stacked bar chart
chart = alt.Chart(top_10_names).mark_bar().encode(
    x='annais:O',
    y='sum(nombre):Q',
    color='preusuel:N',
    tooltip=['preusuel:N', 'annais:O', 'sum(nombre):Q']
).properties(
    width=900,
    height=600,
    title='Top 10 Most Popular Names Over the Years'
)

chart

# VISUALISATION 2

**Is there a regional effect in the data?**

**Are some names more popular in some regions?**

**Are popular names generally popular across the whole country?**

In [3]:
import altair as alt
import pandas as pd
import geopandas as gpd # Requires geopandas -- e.g.: conda install -c conda-forge geopandas
alt.data_transformers.enable('json') # Let Altair/Vega-Lite work with large data sets



In [4]:
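# Load the names data and drop rare-name and unknown-department rows
names = pd.read_csv("dpt2020.csv", sep=";")
names.drop(names[names.preusuel == '_PRENOMS_RARES'].index, inplace=True)
names.drop(names[names.dpt == 'XX'].index, inplace=True)
# Load the department outlines
depts = gpd.read_file('departements-version-simplifiee.geojson')

# Keep a reference to the names table without geometry
just_names = names

# Attach each department's geometry to every name record
names = depts.merge(names, how='right', left_on='code', right_on='dpt')

# Total count per department, name, and sex over all years
grouped = names.groupby(['dpt', 'preusuel', 'sexe'], as_index=False).sum(numeric_only=True)

# Re-attach the geometry, which the numeric aggregation drops
grouped = depts.merge(grouped, how='right', left_on='code', right_on='dpt')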
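
Because the merges above use how='right', any department code that appears in the names data but not in the GeoJSON file ends up with a missing geometry. The sketch below shows one way to list such codes with the indicator option of DataFrame.merge. It is an illustrative check, not part of the original notebook; the helper name unmatched_department_codes and the sample codes are made up.

```python
import pandas as pd


def unmatched_department_codes(shape_codes, data_codes):
    """Return department codes present in the names data but absent from the shapes."""
    left = pd.DataFrame({"code": sorted(set(shape_codes))})
    right = pd.DataFrame({"dpt": sorted(set(data_codes))})
    # indicator=True adds a "_merge" column telling where each row came from
    merged = left.merge(
        right, how="right", left_on="code", right_on="dpt", indicator=True
    )
    return merged.loc[merged["_merge"] == "right_only", "dpt"].tolist()


# Made-up codes for illustration; in the notebook these would be
# depts["code"] and names["dpt"]
missing = unmatched_department_codes(["01", "02", "75"], ["01", "75", "971"])
print(missing)
```

If the list is not empty, those departments would have no outline when drawn on a map.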

In [5]:
def solution(name):
    """Return a year-by-department heatmap of births for one name (2000-2020)."""
    # Reload and clean the data so the function is self-contained
    names = pd.read_csv("dpt2020.csv", sep=";")
    names.drop(names[names.preusuel == '_PRENOMS_RARES'].index, inplace=True)
    names.drop(names[names.dpt == 'XX'].index, inplace=True)

    # Make sure years are integers for the range filter
    names['annais'] = names['annais'].astype(int)

    # Keep births from 2000 to 2020 for the requested name
    subset = names[(names.annais >= 2000) & (names.annais <= 2020)]
    subset = subset[subset.preusuel == name]

    # One cell per (year, department), colored by the number of births
    heatmap = alt.Chart(subset).mark_rect().encode(
        x=alt.X('annais:O', title='Year'),
        y=alt.Y('dpt:N', title='Department'),
        color=alt.Color('sum(nombre):Q', title='Popularity'),
        tooltip=['annais', 'dpt', 'sum(nombre)']
    ).properties(
        width=1000,
        height=1400,
        title='Popularity of the Name "{}" across Departments and Years'.format(name)
    )
    return heatmap

In [6]:
solution("GABRIELLE") 

Here are the questions we need to answer with this solution. But why do we think it is a good visualisation?

**Is there a regional effect in the data?**

**Are some names more popular in some regions?**

**Are popular names generally popular across the whole country?**


**Is there a regional effect in the data?**

The visualization shows a regional effect on naming preferences. Certain names, like Gabrielle, are highly popular in specific areas such as Paris, but that popularity does not extend uniformly across the country. This suggests that naming trends vary significantly by region, reflecting the influence of cultural diversity and regional factors in different parts of France.

**Are popular names generally popular across the whole country, and are some names more popular in some regions?**

The visualization provides evidence of a regional effect in the popularity of names, as seen with the example of Gabrielle. While some names achieve popularity nationwide, others experience varying degrees of popularity depending on the region. This observation highlights the unique cultural and social dynamics present in different regions of France, contributing to the preference and prominence of certain names in specific areas. Consequently, it can be concluded that popular names are not necessarily universally popular across the entire country, indicating the existence of regional variations in naming preferences.

**Strengths:**

The visualization enables a quick assessment of name trends and their evolution over time. By focusing on a specific name, it facilitates the observation of popularity shifts and patterns associated with that name. The visualization sheds light on regional differences in naming preferences, underscoring the cultural richness and diversity within France.

**Weaknesses:**

The choice of the name under analysis can significantly impact the results, warranting careful selection. Aesthetically, the visualization could be further refined for enhanced visual appeal. While the visualization captures temporal trends, it might not provide a comprehensive understanding of the multifaceted factors influencing naming patterns. Additionally, the data size limitations of the visualization should be acknowledged. Importantly, the visualization does not provide a definitive answer to whether popular names are generally popular nationwide, as the regional effect demonstrates variations in popularity across different parts of the country.
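
The heatmap colors each cell by the raw sum of nombre, so departments with more births tend to look darker for any name. One possible refinement, sketched here as an assumption and not as part of the original solution, is to plot the share of births that a name represents in each department and year. The helper name name_share_by_department and the sample rows are hypothetical, and the denominator counts only the names kept after filtering, so rare names are excluded from it.

```python
import pandas as pd


def name_share_by_department(df: pd.DataFrame, name: str) -> pd.DataFrame:
    """Share of recorded births given `name` in each (year, department) cell."""
    # Total recorded births per year and department
    totals = df.groupby(["annais", "dpt"])["nombre"].sum().rename("total")
    # Births with the requested name per year and department
    named = (
        df[df["preusuel"] == name]
        .groupby(["annais", "dpt"])["nombre"]
        .sum()
        .rename("name_count")
    )
    # Left join on the (year, department) index; cells without the name get 0
    out = totals.to_frame().join(named).fillna({"name_count": 0}).reset_index()
    out["share"] = out["name_count"] / out["total"]
    return out


# Tiny made-up sample, for illustration only
sample = pd.DataFrame({
    "annais":   [2000, 2000, 2000, 2000],
    "dpt":      ["75", "75", "23", "23"],
    "preusuel": ["GABRIELLE", "LEA", "GABRIELLE", "LEA"],
    "nombre":   [100, 900, 10, 40],
})
result = name_share_by_department(sample, "GABRIELLE")
print(result)
```

In this made-up sample, department 75 has the higher raw count but department 23 has the higher share, which is the kind of difference a raw-count heatmap cannot show.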