# Visualization 3

## About the evolution of names within "sexe" in France.

Are there gender effects in the data? Does popularity of names given to both sexes evolve consistently? (Note: this data set treats sex as binary; this is a simplification that carries into this assignment but does not generally hold.)

We first process the data with Python, export the result to a CSV file, and then visualize it with Tableau.

## Data

In [None]:
# Content of babynames/utils.py
# Reproduced here to make the project easier to follow

import polars as pl
import re
import os

data_dir_path = os.path.join('..', 'data')
data_file_path = os.path.join(data_dir_path, 'dpt2020.csv')
data_coord_path = os.path.join(data_dir_path, 'dpt.csv')

def load_baby_names_data(base_file_path=data_file_path, coord_file_path=data_coord_path):
    """
    Load data for the baby names project from CSV files using Polars.

    Parameters:
    base_file_path (str): The path to the CSV file for base data.
    coord_file_path (str): The path to the CSV file for coord data.

    Returns:
    DataFrame: A Polars DataFrame containing a merge of all the loaded data.
    """
    data = load_data(base_file_path)
    data_coord = load_coord_data(coord_file_path)
    baby_names_data = data.join(data_coord, on='dpt', how='left')
    return baby_names_data

def load_data(file_path):
    """
    Load data from a CSV file using Polars.

    Parameters:
    file_path (str): The path to the CSV file.

    Returns:
    DataFrame: A Polars DataFrame containing the loaded data.
    """

    # Define the data types
    dtypes = {
        'sexe': pl.Int64,
        'preusuel': pl.Utf8,
        'annais': pl.Utf8,
        'dpt': pl.Utf8,
        'nombre': pl.Int64
    }

    # Load the data
    data = pl.read_csv(file_path, dtypes=dtypes)

    return data

def load_coord_data(file_path):
    """
    Load the département coordinates from a CSV file using Polars and convert
    the longitude and latitude columns to decimal degrees.

    Parameters:
    file_path (str): The path to the CSV file.

    Returns:
    DataFrame: A Polars DataFrame containing the loaded data.
    """
    # Load the data
    data = pl.read_csv(file_path)

    # Convert the 'longitude' and 'latitude' columns from DMS strings to decimal degrees
    # (recent Polars releases rename Series.apply to map_elements)
    data = data.with_columns(data['longitude'].apply(dms_to_dd).alias('longitude'))
    data = data.with_columns(data['latitude'].apply(dms_to_dd).alias('latitude'))

    return data

def dms_to_dd(dms):
    """
    Convert degrees, minutes, and seconds to decimal degrees.

    Parameters:
    dms (str): A string of degrees, minutes, and seconds such as 5°20'56'' E,
        with an optional direction letter (N, S, E, or O for west).

    Returns:
    float: The equivalent value in decimal degrees.
    """
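    # Examples: 5°20'56'' E (with a direction letter) and 46°05'58'' (without)
    degrees, minutes, seconds, direction = re.split('[°\'"]+', dms)
    dd = float(degrees) + float(minutes) / 60 + float(seconds) / (60 * 60)
    # South ('S') and west ('O', for "Ouest") give negative values
    if direction.strip() in ('S', 'O'):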
        dd *= -1
    return dd
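
As a quick sanity check of the conversion, the snippet below restates `dms_to_dd` so that it runs on its own, then applies it to the two sample strings from the comment and to a made-up western longitude (`1°30'00'' O` is an illustrative value, not taken from the data).

```python
import re

def dms_to_dd(dms):
    """Same logic as the function above, restated so this snippet is self-contained."""
    degrees, minutes, seconds, direction = re.split('[°\'"]+', dms)
    dd = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    # South ('S') and west ('O') give negative values
    if direction.strip() in ('S', 'O'):
        dd *= -1
    return dd

east = dms_to_dd("5°20'56'' E")   # direction letter present
north = dms_to_dd("46°05'58''")   # no direction letter: stays positive
west = dms_to_dd("1°30'00'' O")   # hypothetical western longitude: negative

print(east, north, west)
```

A string without a direction letter still splits into four parts, because the trailing seconds mark leaves an empty last element, so the value is kept positive.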

In [None]:
# Load the data with additional columns about the geography of the départements
df = load_baby_names_data()

In [None]:
# Preprocess the data
# Compute, for each preusuel x annais x dpt (first name x year x département),
# the share of births recorded for each sexe.
# This share is stored as sexe_ratio here and is called ratio_sexe in the visualization.
# Made-up example:
# one row holds Claude with a ratio of 0.2 for women, in 1962, département 75;
# another row holds Claude with a ratio of 0.8 for men, in 1962, département 75.

# Total births per (preusuel, annais, dpt), both sexes combined
preusuel_annais = (df[["preusuel", "annais", "dpt", "nombre"]]).groupby(["preusuel", "annais", "dpt"], maintain_order=True).sum()
# After the join, the total is available as nombre_right
df = df.join(preusuel_annais, on=["preusuel", "annais", "dpt"], how='left')
df = df.with_columns((pl.col("nombre") / pl.col("nombre_right")).alias("sexe_ratio"))
df = df.rename({"nombre_right":"total_sexe_indistinct"})
df
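
To make the ratio concrete, here is a minimal sketch of the same computation in plain Python on made-up rows. The counts are invented and follow the fake Claude example from the comments; the coding 1 = male, 2 = female for `sexe` is an assumption of this sketch. It sums `nombre` over both sexes for each (`preusuel`, `annais`, `dpt`) key, then divides each row's `nombre` by that total.

```python
from collections import defaultdict

# Made-up rows: (sexe, preusuel, annais, dpt, nombre)
rows = [
    (1, "CLAUDE", "1962", "75", 80),
    (2, "CLAUDE", "1962", "75", 20),
    (2, "MARIE", "1962", "75", 50),
]

# Total births per (preusuel, annais, dpt), both sexes combined
totals = defaultdict(int)
for sexe, preusuel, annais, dpt, nombre in rows:
    totals[(preusuel, annais, dpt)] += nombre

# Share of each sexe within its (preusuel, annais, dpt) group
ratios = {
    (sexe, preusuel, annais, dpt): nombre / totals[(preusuel, annais, dpt)]
    for sexe, preusuel, annais, dpt, nombre in rows
}

for key, ratio in ratios.items():
    print(key, ratio)
```

In this toy data, a name recorded for a single sexe within its group gets a ratio of 1.0, while the two Claude rows share their group's total.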

In [None]:
df.write_csv(file="dpt_france_sexratio_name.csv")

We now load this CSV into Tableau to build the following dashboard, which addresses the original question.

## Visualization

![Evolution of prénoms & sexes in France 1900-1920](../../images/Visualisation_3/global_debut.JPG "Evolution of prénoms & sexes in France PART 1")

In the top left corner, there is a grid of maps of France with two rows, one for the "feminin" population and one for the "masculin" population. Each column corresponds to a full decade from 1900 to 2010.

On each map of France, a disk is placed at the center of each département. Its size is proportional to the corresponding population that year. Each disk is in fact a pie chart whose slices correspond to different values of ratio_sexe, as computed in the preprocessing code above.
The lower the ratio, the more yellow the slice; the higher the ratio, the greener.

For instance, here we see that names overall have a high ratio, meaning that men have (nearly) men-only names and women have (nearly) women-only names.

Note that there are filters on the right and at the bottom, which select specific names or a set of names according to their overall popularity.

But for now, let's see how France evolves globally as far as prénoms and sexes are concerned.

![Evolution of prénoms & sexes in France 1990-2010](../../images/Visualisation_3/global_fin.JPG "Evolution of prénoms & sexes in France PART 2")

A growing part of each pie becomes slightly more yellow over time, meaning that, decade after decade, more and more people have names that are not given to only one sexe.
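
This trend could also be checked numerically. The sketch below, on made-up rows, computes for each decade the share of births whose `sexe_ratio` falls below a cutoff. The 0.95 cutoff, the `decade` helper, and the sample values are all hypothetical choices for illustration and are not part of the original notebook; the sketch also assumes every `annais` value is a four-digit year.

```python
from collections import defaultdict

def decade(annais):
    """Map a year string such as '1994' to the start of its decade (1990)."""
    return int(annais) // 10 * 10

# Made-up rows: (annais, nombre, sexe_ratio)
rows = [
    ("1905", 900, 1.0),
    ("1908", 100, 0.8),
    ("1994", 600, 1.0),
    ("1997", 400, 0.6),
]

CUTOFF = 0.95  # hypothetical threshold below which a name counts as shared

births = defaultdict(int)  # all births per decade
shared = defaultdict(int)  # births with a name shared between sexes

for annais, nombre, sexe_ratio in rows:
    d = decade(annais)
    births[d] += nombre
    if sexe_ratio < CUTOFF:
        shared[d] += nombre

# Share of births per decade that carry a name given to both sexes
shared_share = {d: shared[d] / births[d] for d in sorted(births)}
print(shared_share)
```

Weighting by `nombre` rather than counting rows keeps rare names from dominating the result, which matches how the pie slices are sized by population.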

Even though it is not the focus here, this visualization also shows the evolution of French demography for each sexe, including the baby boom and the differences between départements. It seems that more babies are born in the far north.

Now that we have looked at names in general, let's focus on a famous example of a French name given to both men and women: Camille.

![Evolution of Camille & sexes in France 1900-1920](../../images/Visualisation_3/camille_debut.JPG "Evolution of Camille & sexes in France PART 1")

At the beginning, in 1900-1920, Camille is modestly male-dominated. The name then gradually disappears before reappearing in a very different way.

![Evolution of Camille & sexes in France 1990-2010](../../images/Visualisation_3/camille_fin.JPG "Evolution of Camille & sexes in France PART 2")

In the 90s, the name Camille explodes for women. As time goes on, we see that not only does the volume diminish, but the ratio also evens out!

Unfortunately, restricting the name popularity graph/filter to its tail (the least popular names) does not yield interesting results.

So let's look at another name, more popular recently and with different characteristics: Ange.

![Evolution of Ange & sexes in France 1900-1920](../../images/Visualisation_3/ange_debut.JPG "Evolution of Ange & sexes in France PART 1")

Ange follows a peculiar path: originating mostly in the western départements of Bretagne, this unusual name (sort of) migrates toward Paris and dies out there. It is also male-dominated.

![Evolution of Ange & sexes in France 1990-2010](../../images/Visualisation_3/ange_fin.JPG "Evolution of Ange & sexes in France PART 2")

However, it flares up again in the Paris area and in some other places with the emergence of the female Ange.
Hence we have found a name that contributes to the growing trend of unisex names.