## Visualization 1

How do baby names evolve over time? 

Are there names that have consistently remained popular or unpopular? 

Are there some that were suddenly or briefly popular or unpopular? 

Are there trends over time?

### Stacked bar chart

We use Altair to build a stacked bar chart of the most popular names in each year (the ten largest counts per year). The x-axis encodes the annais field (birth year) on an ordinal scale, the y-axis encodes the sum of the nombre field (count) on a quantitative scale, and the bar color encodes the preusuel field (first name). A tooltip shows the name, year, and count for each bar segment.


The strengths of a stacked bar chart for displaying the top names in each year include:

- Comparison: A stacked bar chart allows for easy visual comparison between names within each year. We can quickly identify the most and least popular of the displayed names by comparing the heights of their segments.

- Trend Analysis: By observing how the distribution of the stacked bars changes over the years, we can identify trends in name popularity. For example, we can see whether certain names remain consistently popular or whether their popularity fluctuates.

- Total Count: The overall height of each bar gives the combined count of the displayed names in a given year, which we can compare across years.

- Name Contributions: The stacked nature of the bars allows us to see the contribution of each name to the total count. This helps in identifying the relative popularity of different names within a year.

However, there are also some potential weaknesses to consider:

- Visual Clutter: With many names and a dataset that spans more than a century, the stacked bar chart becomes visually cluttered and challenging to interpret. This makes it difficult to distinguish individual names and observe trends clearly.

- Lack of Granularity: A stacked bar chart provides an overview of name popularity trends but may not offer detailed insights into specific names or their variations (e.g., spelling variations).

- Data Size Limitations: Only a limited number of names or years can be displayed effectively in a single chart.


In [1]:
import altair as alt
import pandas as pd

# Load the data
names = pd.read_csv("dpt2020.csv", sep=";")

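# Remove the aggregated rare-names bucket and rows with an unknown department
names.drop(names[names.preusuel == '_PRENOMS_RARES'].index, inplace=True)
names.drop(names[names.dpt == 'XX'].index, inplace=True)

# Keep the 10 largest rows for each year. Each row counts one name in one department,
# so the same name can appear several times within a year.
top_10_names = names.groupby('annais').apply(lambda x: x.nlargest(10, 'nombre')).reset_index(drop=True)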

display(top_10_names)

Unnamed: 0,sexe,preusuel,annais,dpt,nombre
0,2,MARIE,1900,29,2519
1,2,MARIE,1900,75,1576
2,2,SUZANNE,1900,75,1382
3,2,MARIE,1900,56,1358
4,2,MARIE,1900,59,1353
...,...,...,...,...,...
1205,1,MOHAMED,2020,93,238
1206,1,ADAM,2020,92,214
1207,1,LÉO,2020,59,210
1208,2,LOUISE,2020,75,208


In [2]:
# Creating the stacked bar chart
chart = alt.Chart(top_10_names).mark_bar().encode(
    x='annais:O',
    y='sum(nombre):Q',
    color='preusuel:N',
    tooltip=['preusuel:N', 'annais:O', 'sum(nombre):Q']
).properties(
    width=900,
    height=600,
    title='Top 10 Most Popular Names Over the Years'
)

chart
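
The cell above ranks individual rows, and each row is the count for one name in one department, so a single name such as MARIE can fill several of the ten slots in a year. A minimal sketch of one alternative, assuming national totals are what we want to rank: sum nombre over departments first, then keep the largest names per year. The helper name top_n_per_year and the small sample frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def top_n_per_year(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n names with the largest national count in each year."""
    # Sum the department-level counts into one national total per (year, name)
    totals = df.groupby(['annais', 'preusuel'], as_index=False)['nombre'].sum()
    # Put the largest totals first within each year, then keep n rows per year
    totals = totals.sort_values(['annais', 'nombre'], ascending=[True, False])
    return totals.groupby('annais').head(n).reset_index(drop=True)

# Tiny made-up sample (not real INSEE figures) with the same columns as names
sample = pd.DataFrame({
    'preusuel': ['MARIE', 'MARIE', 'SUZANNE', 'JEANNE', 'MARIE', 'JEANNE'],
    'annais':   ['1900', '1900', '1900', '1900', '1901', '1901'],
    'dpt':      ['29', '75', '75', '29', '29', '75'],
    'nombre':   [30, 20, 25, 10, 5, 40],
})

top_2 = top_n_per_year(sample, n=2)
print(top_2)
```

On the real data, the result of top_n_per_year(names) could be passed to the same alt.Chart call, so that each name contributes a single segment per year. It is probably safest to store it under a new variable name, because the heatmap cell later relies on the dpt column of top_10_names.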

### Line chart

1) To identify names that have consistently remained popular or unpopular, we compute each name's average yearly count (over the years in which it appears) and sort the names by that average.

2) To identify names that were suddenly or briefly popular or unpopular, we analyze the yearly occurrences of each name and look for significant changes or spikes.

3) To identify trends over time, we analyze the overall pattern of occurrences for different names by creating line plots for a selected set of names.
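
Step 1 depends on how absent years are treated. A name that was not recorded in a given year has no row for it, so a plain mean only averages over the years in which the name appears. The sketch below, run on a small made-up frame, shows one possible way to count missing years as zero by pivoting to a name-by-year table first. The function name mean_over_all_years is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def mean_over_all_years(name_counts: pd.DataFrame) -> pd.Series:
    """Average yearly count per name, treating years without a row as zero."""
    # One row per name, one column per year; missing (name, year) pairs become 0
    table = name_counts.pivot_table(
        index='preusuel', columns='annais', values='nombre',
        aggfunc='sum', fill_value=0,
    )
    # Mean across the year columns, sorted from most to least popular
    return table.mean(axis=1).sort_values(ascending=False)

# Made-up yearly totals: THIERRY has no row for 1900
sample_counts = pd.DataFrame({
    'preusuel': ['MARIE', 'MARIE', 'THIERRY'],
    'annais':   ['1900', '1960', '1960'],
    'nombre':   [100, 50, 90],
})

plain_mean = sample_counts.groupby('preusuel')['nombre'].mean()
filled_mean = mean_over_all_years(sample_counts)
print(plain_mean)
print(filled_mean)
```

In this toy example the plain mean ranks THIERRY above MARIE, while the zero-filled mean ranks it below, which illustrates how the choice can change which names look consistently popular.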


Strengths:

- Visualizing Trends: Line charts are effective at showing trends and patterns over time. They allow us to easily observe the rise or decline of name popularity and identify any long-term trends.

- Comparing Multiple Names: Line charts enable the comparison of multiple names on the same chart. 

- Highlighting Significant Changes: By plotting the significant changes or spikes in occurrences, we can easily identify names that experienced sudden or brief popularity or unpopularity. These changes are visually apparent as peaks or valleys in the chart.

- Exploring Individual Name Histories: Line charts provide a way to explore the history of individual names over time. By hovering over the lines, we can see the specific occurrences of each name in different years.


Weaknesses:

- Limited Comparison for a Large Number of Names: If the number of names is very large, the chart becomes visually cluttered and it is hard to compare all the lines effectively. In such cases, focusing on a subset of names or using interactive features to filter or highlight specific names can help.


In conclusion, line charts are well suited to answering each question individually, but answering all of the questions with a single graph is challenging.

In [20]:
# Group the data by name and year and calculate the total occurrences
name_counts = names.groupby(['preusuel', 'annais'])['nombre'].sum().reset_index()

# Calculate the average occurrences of each name
name_avg_counts = name_counts.groupby('preusuel')['nombre'].mean().reset_index()

# Sort the names based on average occurrences in descending order
popular_names = name_avg_counts.sort_values('nombre', ascending=False)

# Get the top 10 popular names
top_10_popular_names = popular_names.head(10)

# Get the bottom 10 unpopular names
bottom_10_unpopular_names = popular_names.tail(10)

# Filter the data for the top 10 popular names
top_10_popular_counts = name_counts[name_counts['preusuel'].isin(top_10_popular_names['preusuel'])]

# Filter the data for the bottom 10 unpopular names
top_10_unpopular_counts = name_counts[name_counts['preusuel'].isin(bottom_10_unpopular_names['preusuel'])]

# Define the Altair line chart for the top 10 popular names
chart_popular = alt.Chart(top_10_popular_counts).mark_line().encode(
    x='annais:O',
    y='nombre:Q',
    color='preusuel:N',
    tooltip=['preusuel:N', 'annais:O', 'nombre:Q']
).properties(
    width=800,
    height=400,
    title='Evolution of Top 10 Popular Names Over Time'
)

# Define the Altair line chart for the 10 least popular names
chart_unpopular = alt.Chart(top_10_unpopular_counts).mark_line().encode(
    x='annais:O',
    y='nombre:Q',
    color='preusuel:N',
    tooltip=['preusuel:N', 'annais:O', 'nombre:Q']
).properties(
    width=800,
    height=400,
    title='Evolution of the 10 Least Popular Names Over Time'
)

# Display the line charts
chart_popular | chart_unpopular

In [21]:
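# name_counts already holds the yearly occurrences of each name,
# sorted by name and then by year by the groupby that created it
name_yearly_counts = name_counts

# Calculate the difference in occurrences between consecutive recorded years for each name
name_yearly_diff = name_yearly_counts.groupby('preusuel')['nombre'].diff()

# Keep the full history of every name with at least one jump or drop of more than 1000,
# so that each line is drawn over all of its years and not only over the spike years
names_with_spikes = name_yearly_counts.loc[name_yearly_diff.abs() > 1000, 'preusuel'].unique()
significant_changes = name_yearly_counts[name_yearly_counts['preusuel'].isin(names_with_spikes)]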


# Define the Altair line chart
chart = alt.Chart(significant_changes).mark_line().encode(
    x='annais:O',
    y='nombre:Q',
    color='preusuel:N',
    tooltip=['preusuel:N', 'annais:O', 'nombre:Q']
).properties(
    width=800,
    height=400,
    title='Names with Significant Changes in Occurrences Over Time'
)

# Display the line chart
chart
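
The fixed threshold of 1000 in the cell above favors names that are already common, because a rare name can multiply its count several times without ever changing by 1000. A possible complement, sketched here under the assumption that a relative jump is what we mean by "sudden", flags years in which a name's count at least doubles. The helper relative_spikes, the factor of 2, the minimum count of 50, and the sample figures are all illustrative choices, not values from the notebook.

```python
import pandas as pd

def relative_spikes(name_counts: pd.DataFrame, factor: float = 2.0,
                    min_count: int = 50) -> pd.DataFrame:
    """Rows where a name's count is at least `factor` times its previous recorded count."""
    ordered = name_counts.sort_values(['preusuel', 'annais']).reset_index(drop=True)
    # Previous recorded count of the same name (NaN for a name's first row)
    previous = ordered.groupby('preusuel')['nombre'].shift(1)
    ratio = ordered['nombre'] / previous
    # Ignore very small counts, where ratios are noisy
    mask = (ratio >= factor) & (ordered['nombre'] >= min_count)
    return ordered[mask].assign(ratio=ratio[mask])

# Made-up yearly totals for two names
sample_counts = pd.DataFrame({
    'preusuel': ['THIERRY', 'THIERRY', 'THIERRY', 'MARIE', 'MARIE', 'MARIE'],
    'annais':   ['1958', '1959', '1960', '1958', '1959', '1960'],
    'nombre':   [100, 400, 450, 5000, 3900, 3800],
})

spikes = relative_spikes(sample_counts)
print(spikes)
```

In this toy data the absolute rule would flag only MARIE (a drop of 1100), while the relative rule flags only THIERRY in 1959 (a fourfold increase), so the two criteria pick out different kinds of change.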

In [17]:
# Select a set of names to visualize
selected_names = ['MARIE', 'JEAN', 'THIERRY']

# Filter the data for the selected names
selected_names_counts = name_counts[name_counts['preusuel'].isin(selected_names)]

# Define the Altair line plot
line_plot = alt.Chart(selected_names_counts).mark_line().encode(
    x='annais:O',
    y='nombre:Q',
    color='preusuel:N',
    tooltip=['preusuel:N', 'annais:O', 'nombre:Q']
).properties(
    width=900,
    height=600,
    title='Trends of Selected Names (MARIE, JEAN, THIERRY) Over Time'
)

# Display the line plot
line_plot


## Visualization 2

Is there a regional effect in the data? 

Are some names more popular in some regions? 

Are popular names generally popular across the whole country?

### Heatmap

Here we use a heatmap to show how popular each name is in each department and to explore regional effects in the data. A color gradient represents the number of occurrences, with darker shades indicating higher counts. This makes it easy to spot names that are popular or unpopular across departments.

The strengths and weaknesses of this type of chart for representing baby name popularity by region are as follows:

Strengths:

- Comparison of Popularity: The color represents the number of occurrences of a specific name in a department, allowing for easy visual comparison of popularity across departments and names.

- Effective for displaying patterns and trends: Heatmaps are particularly useful for identifying patterns and trends in data. They allow us to quickly identify areas of high and low values, making it easy to spot clusters, correlations, and outliers.


Weaknesses:

- Distortion due to color perception: Color perception can vary among individuals, which may lead to differences in interpretation. It's essential to choose a color scheme that is accessible and avoids misleading interpretations.

- Difficulty in comparing exact values: While heatmaps provide a good sense of the relative magnitude or density of data values, they may not be ideal for precise comparisons between specific values.
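
A further caveat is that raw counts largely reflect how many births a department has, so a populous department tends to look dark for every name. To compare what is popular within each department, one option is to color by a name's share of that department's total instead. The following is a minimal sketch on made-up numbers, assuming a frame with dpt, preusuel, and nombre columns like the one plotted in the heatmap; the helper add_department_share and the share column are hypothetical additions.

```python
import pandas as pd

def add_department_share(counts: pd.DataFrame) -> pd.DataFrame:
    """Add a share column: each name's fraction of its department's total count."""
    # Total count of the department each row belongs to, aligned row by row
    dept_totals = counts.groupby('dpt')['nombre'].transform('sum')
    return counts.assign(share=counts['nombre'] / dept_totals)

# Made-up counts for two names in two departments
sample_region = pd.DataFrame({
    'dpt':      ['29', '29', '75', '75'],
    'preusuel': ['MARIE', 'YANN', 'MARIE', 'YANN'],
    'nombre':   [60, 40, 900, 100],
})

shares = add_department_share(sample_region)
print(shares)
```

In the sample, YANN has the larger raw count in department 75 but the larger share in department 29. Encoding the color with share:Q in place of nombre:Q would bring out that kind of regional preference.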

In [8]:
region_counts = top_10_names.groupby(['dpt', 'preusuel'])['nombre'].sum().reset_index()

# Specify a sequential color scheme, since nombre is quantitative and darker should mean more
color_scheme = 'blues'  # Other sequential schemes include 'greens', 'viridis', 'inferno', etc.

# Create the heatmap with the specified color scheme
chart2 = alt.Chart(region_counts).mark_rect().encode(
    alt.X('dpt:N', axis=alt.Axis(title='Department')),
    alt.Y('preusuel:N', axis=alt.Axis(title='Name')),
    alt.Color('nombre:Q', scale=alt.Scale(scheme=color_scheme)),  # Set the color scheme
    tooltip=['dpt:N', 'preusuel:N', 'nombre:Q']
).properties(
    width=800,
    height=900,
    title='Baby Name Popularity by Region'
)

# Display the chart
chart2
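
For the last question, whether popular names are popular across the whole country, a numeric summary can complement the heatmap. One possible measure, sketched below on made-up data, is the fraction of departments in which a name ranks among the k largest counts. The helper department_coverage and the choice of k are assumptions made for illustration.

```python
import pandas as pd

def department_coverage(counts: pd.DataFrame, k: int = 10) -> pd.Series:
    """Fraction of departments in which each name ranks among the k largest counts."""
    # Rank names inside each department, 1 = largest count
    rank = counts.groupby('dpt')['nombre'].rank(method='min', ascending=False)
    in_top_k = counts[rank <= k]
    n_departments = counts['dpt'].nunique()
    # Number of departments where the name makes the top k, as a fraction of all departments
    coverage = in_top_k.groupby('preusuel')['dpt'].nunique() / n_departments
    return coverage.sort_values(ascending=False)

# Made-up counts for three names in three departments
sample_region = pd.DataFrame({
    'dpt':      ['29', '29', '29', '75', '75', '75', '13', '13', '13'],
    'preusuel': ['MARIE', 'YANN', 'LOUIS', 'MARIE', 'LOUIS', 'YANN', 'MARIE', 'LOUIS', 'YANN'],
    'nombre':   [60, 40, 10, 900, 300, 100, 200, 150, 20],
})

coverage = department_coverage(sample_region, k=2)
print(coverage)
```

A coverage close to 1 suggests a name that is popular nearly everywhere, while a low value points to a name whose popularity is concentrated in a few departments.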