# Project 3 - Gender-Neutral Baby Name Trends in NYC (2011–2021)

**Robin Zhao**

- **Dataset(s) to be used:** [Popular Baby Names](https://data.cityofnewyork.us/Health/Popular-Baby-Names/25th-nujf/about_data)
- **Analysis question:** Are more babies receiving gender-neutral names over time? A "gender-neutral name" is defined as a name that appears for both male and female babies.
- **Columns that will (likely) be used:**
  - Year of Birth
  - Gender
  - Child's First Name
  - Count
- Since I am using only one dataset, there are no columns needed for merging or joining.

- **Hypothesis:** I hypothesize that the proportion of babies receiving gender-neutral names has increased in recent years, reflecting cultural shifts toward greater openness around gender identity.

## Introduction
The most important reason I chose to explore gender-neutral baby names is that my own English name, Robin, is gender-neutral. I adopted this name in 2022. Before that, I went by Eleanor, a traditionally feminine name. My Chinese name, Ruobing, sounds very similar to Robin, and I liked that connection, but I did not choose Robin earlier because I believed it was mostly a boy’s name.

As a sociology major in college, I was introduced to ideas about gender, identity, and the social construction of norms. Before that, I held some of the stereotypes many people have: for example, I assumed that names should clearly signal gender, or that gender-neutral names were “mostly for boys.” The more I studied topics like gender roles, labeling, and identity expression, the more I began questioning these assumptions.

Over time, I realized that choosing a gender-neutral name did not make my identity less clear. Instead, it gave me a sense of openness and flexibility that aligned with how I understood myself. Changing my name to Robin in 2022 felt like a meaningful reflection of this shift. It represented both personal growth and a move away from rigid gender expectations.

Because of this experience, I am especially interested in whether more babies have been receiving gender-neutral names in recent years. Examining this trend allows me to connect my personal story with broader cultural changes in how society understands gender and identity.

## Step 0: Basic Setting

In [1]:
import pandas as pd
import plotly.io as pio
pio.renderers.default = "notebook_connected+plotly_mimetype"

import plotly.express as px


## Step 1: Load Data

In [2]:
names = pd.read_csv("Popular_Baby_Names_20251206.csv")
names.head()

| | Year of Birth | Gender | Ethnicity | Child's First Name | Count | Rank |
|---|---|---|---|---|---|---|
| 0 | 2021 | FEMALE | ASIAN AND PACIFIC ISLANDER | Chloe | 71 | 1 |
| 1 | 2021 | FEMALE | ASIAN AND PACIFIC ISLANDER | Olivia | 71 | 1 |
| 2 | 2021 | FEMALE | ASIAN AND PACIFIC ISLANDER | Emma | 66 | 2 |
| 3 | 2021 | FEMALE | ASIAN AND PACIFIC ISLANDER | Mia | 59 | 3 |
| 4 | 2021 | FEMALE | ASIAN AND PACIFIC ISLANDER | Ava | 53 | 4 |


In [3]:
names.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 77287 entries, 0 to 77286
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Year of Birth       77287 non-null  int64 
 1   Gender              77287 non-null  object
 2   Ethnicity           77287 non-null  object
 3   Child's First Name  77287 non-null  object
 4   Count               77287 non-null  int64 
 5   Rank                77287 non-null  int64 
dtypes: int64(3), object(3)
memory usage: 3.5+ MB


In step 1, I load the Popular Baby Names dataset for New York City. The dataset contains annual records of baby names, including each child's year of birth, gender, ethnicity, first name, and the number of babies who received that name in a given year.

This information is essential for my project because it allows me to identify names that appear for both male and female babies. By examining the dataset from 2011 to 2021, I can analyze how the use of gender-neutral names has changed over time and begin addressing my research question.
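Before relying on the counts, a quick data-quality check can help. The `head()` outputs in this notebook show names in upper case for 2011 (for example `AALIYAH`) and in title case for 2021 (for example `Chloe`), so casing is worth checking. Whether the real file also contains exact duplicate rows is only an assumption here, not something verified in this notebook. The sketch below uses a small hypothetical `toy` table (not rows from the real file) to show what such a check could look like.

```python
import pandas as pd

# Hypothetical toy rows (NOT taken from the real file): one exact duplicate
# and the same name written with two different casings.
toy = pd.DataFrame({
    "Year of Birth": [2011, 2011, 2011, 2021],
    "Gender": ["MALE", "MALE", "FEMALE", "FEMALE"],
    "Ethnicity": ["HISPANIC", "HISPANIC", "HISPANIC", "HISPANIC"],
    "Child's First Name": ["ANGEL", "ANGEL", "ANGEL", "Angel"],
    "Count": [253, 253, 20, 15],
    "Rank": [1, 1, 40, 50],
})

# Count rows that are exact copies of an earlier row.
n_duplicates = int(toy.duplicated().sum())

# Rows per year; a sudden change between years can hint at a recording difference.
rows_per_year = toy.groupby("Year of Birth").size()

# Drop exact duplicates and normalize casing so "ANGEL" and "Angel" match.
cleaned = toy.drop_duplicates().assign(
    **{"Child's First Name": lambda d: d["Child's First Name"].str.title()}
)

print(n_duplicates)
print(rows_per_year)
print(cleaned)
```

Running the same three checks on `names` would show whether the yearly totals in later steps are comparable across years.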

## Step 2: Find Gender-Neutral Names

In [4]:
name_gender = (
    names
    .groupby(["Year of Birth", "Child's First Name"])["Gender"]
    .unique() # Get unique
    .reset_index(name = "Gender List")
)
name_gender.head()

| | Year of Birth | Child's First Name | Gender List |
|---|---|---|---|
| 0 | 2011 | AALIYAH | [FEMALE] |
| 1 | 2011 | AARAV | [MALE] |
| 2 | 2011 | AARON | [MALE] |
| 3 | 2011 | ABBY | [FEMALE] |
| 4 | 2011 | ABDIEL | [MALE] |


In [5]:
name_gender["# of genders"] = name_gender["Gender List"].apply(len) # Count number
neutral_names = name_gender[name_gender["# of genders"] == 2] # Filter neutral names
neutral_names.head()

| | Year of Birth | Child's First Name | Gender List | # of genders |
|---|---|---|---|---|
| 53 | 2011 | ALEXIS | [FEMALE, MALE] | 2 |
| 102 | 2011 | ANGEL | [FEMALE, MALE] | 2 |
| 127 | 2011 | ARIEL | [FEMALE, MALE] | 2 |
| 155 | 2011 | AVERY | [FEMALE, MALE] | 2 |
| 185 | 2011 | BLAKE | [FEMALE, MALE] | 2 |


In this step, I identify which names are gender-neutral. I group the data by Year of Birth and Child’s First Name and collect the unique genders associated with each name. If a name appears for both FEMALE and MALE in the same year (that is, it has two unique genders), I label it as gender-neutral and keep those rows in the neutral_names table.
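The same "two unique genders in the same year" rule can also be written with `nunique()`, which counts the distinct genders directly instead of building a list and taking its length. This is a minimal sketch on a hypothetical `toy` table, not the notebook's actual data; the names and counts are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: AVERY appears for both genders, the others for one.
toy = pd.DataFrame({
    "Year of Birth": [2011, 2011, 2011, 2011],
    "Gender": ["FEMALE", "MALE", "MALE", "FEMALE"],
    "Child's First Name": ["AVERY", "AVERY", "AARON", "ABBY"],
    "Count": [30, 25, 100, 40],
})

# Count distinct genders per (year, name) pair.
genders_per_name = (
    toy.groupby(["Year of Birth", "Child's First Name"])["Gender"]
    .nunique()
    .reset_index(name="# of genders")
)

# A name is treated as gender-neutral in a year if it has two distinct genders.
neutral_toy = genders_per_name[genders_per_name["# of genders"] == 2]
print(neutral_toy)
```

Both versions apply the same definition; the `nunique()` form simply skips the intermediate "Gender List" column.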

## Step 3: Calculate Gender-Neutral Naming Trends

In [6]:
neutral_with_counts = names.merge(
    neutral_names[["Year of Birth", "Child's First Name"]],
    on=["Year of Birth", "Child's First Name"],
    how="inner", # Inner join to keep only neutral names
)
    
neutral_with_counts.head()

| | Year of Birth | Gender | Ethnicity | Child's First Name | Count | Rank |
|---|---|---|---|---|---|---|
| 0 | 2021 | FEMALE | ASIAN AND PACIFIC ISLANDER | Tenzin | 26 | 17 |
| 1 | 2021 | FEMALE | ASIAN AND PACIFIC ISLANDER | Avery | 15 | 27 |
| 2 | 2021 | FEMALE | BLACK NON HISPANIC | Avery | 15 | 26 |
| 3 | 2021 | FEMALE | BLACK NON HISPANIC | Ariel | 15 | 26 |
| 4 | 2021 | FEMALE | BLACK NON HISPANIC | Shiloh | 13 | 28 |


Here I use an inner merge to keep only the rows whose year and name match a gender-neutral name identified in Step 2. The result preserves all available demographic and count information for those names.
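To illustrate how an inner merge acts as a filter, here is a small sketch on hypothetical data (`all_rows` and `neutral_keys` are illustrative names, not part of the notebook). It also passes `validate="many_to_one"`, an optional safeguard that makes pandas raise an error if the key table contains repeated (year, name) pairs, since repeats would silently duplicate rows and inflate the counts.

```python
import pandas as pd

# Hypothetical rows: Avery appears for both genders, Emma for one.
all_rows = pd.DataFrame({
    "Year of Birth": [2021, 2021, 2021],
    "Gender": ["FEMALE", "MALE", "FEMALE"],
    "Child's First Name": ["Avery", "Avery", "Emma"],
    "Count": [15, 12, 66],
})

# Hypothetical key table: one row per gender-neutral (year, name) pair.
neutral_keys = pd.DataFrame({
    "Year of Birth": [2021],
    "Child's First Name": ["Avery"],
})

# Inner merge keeps only rows whose (year, name) pair appears in neutral_keys.
# validate="many_to_one" checks that the keys are unique on the right side.
kept = all_rows.merge(
    neutral_keys,
    on=["Year of Birth", "Child's First Name"],
    how="inner",
    validate="many_to_one",
)
print(kept)
```

In this toy case the Emma row is dropped and both Avery rows are kept with their original counts.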

In [7]:
neutral_baby_counts = (
    neutral_with_counts
    .groupby("Year of Birth")["Count"] # Group by year
    .sum() # Sum counts
    .reset_index(name = "Total Neutral Name Baby Count")
)
neutral_baby_counts.head()

| | Year of Birth | Total Neutral Name Baby Count |
|---|---|---|
| 0 | 2011 | 15928 |
| 1 | 2012 | 16897 |
| 2 | 2013 | 14259 |
| 3 | 2014 | 17190 |
| 4 | 2015 | 2610 |


Here I aggregate the total number of babies who received gender-neutral names in each year. This provides the numerator for calculating the yearly proportion.

In [8]:
total_baby_counts = (
    names
    .groupby("Year of Birth")["Count"]
    .sum()
    .reset_index(name = "Total Baby Count") 
)
# Same as before but this is for all names
total_baby_counts.head()

| | Year of Birth | Total Baby Count |
|---|---|---|
| 0 | 2011 | 541525 |
| 1 | 2012 | 557042 |
| 2 | 2013 | 540062 |
| 3 | 2014 | 544078 |
| 4 | 2015 | 69600 |


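This part computes the total number of babies recorded in the dataset for each year, across all names. Because the dataset lists popular names rather than every birth, this is the total for the names it includes. It will serve as the denominator when calculating the proportion of gender-neutral names.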

In [9]:
baby_stats = pd.merge(
    neutral_baby_counts,
    total_baby_counts,
    on="Year of Birth",
    how = "left" # Left join to keep all years
) # Merge neutral and total counts
baby_stats["Proportion Neutral Name Babies"] = (
    baby_stats["Total Neutral Name Baby Count"] / baby_stats["Total Baby Count"]
) # Create proportion column

baby_stats_sorted = baby_stats.sort_values("Year of Birth")
# Sort by year
baby_stats_sorted.head()

| | Year of Birth | Total Neutral Name Baby Count | Total Baby Count | Proportion Neutral Name Babies |
|---|---|---|---|---|
| 0 | 2011 | 15928 | 541525 | 0.029413 |
| 1 | 2012 | 16897 | 557042 | 0.030333 |
| 2 | 2013 | 14259 | 540062 | 0.026403 |
| 3 | 2014 | 17190 | 544078 | 0.031595 |
| 4 | 2015 | 2610 | 69600 | 0.0375 |


Finally, I combine the gender-neutral name counts with total baby counts and compute the proportion of babies given gender-neutral names each year. Sorting by year prepares the data for visualizing the long-term trend.
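As a worked check of the proportion, the sketch below recomputes it from the first five rows printed above (2011 to 2015). The proportion for a year is the neutral-name count divided by the total count for that year, for example 2610 / 69600 = 0.0375 for 2015. The `stats` table here is just a hand-typed copy of those printed values, not a new result.

```python
import pandas as pd

# Values copied from the first five rows of the output table above.
stats = pd.DataFrame({
    "Year of Birth": [2011, 2012, 2013, 2014, 2015],
    "Total Neutral Name Baby Count": [15928, 16897, 14259, 17190, 2610],
    "Total Baby Count": [541525, 557042, 540062, 544078, 69600],
})

# Proportion = neutral-name babies / all babies recorded for that year.
stats["Proportion Neutral Name Babies"] = (
    stats["Total Neutral Name Baby Count"] / stats["Total Baby Count"]
)

# Express as a percentage for easier reading.
stats["Percent"] = (stats["Proportion Neutral Name Babies"] * 100).round(2)
print(stats)
```

Note that the 2015 proportion is the highest of these five years even though its raw count is the lowest, which is why the two charts in Step 4 should be read together.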

## Step 4: Visualization

In [10]:
fig = px.line(
    baby_stats_sorted,
    x="Year of Birth",
    y="Total Neutral Name Baby Count",
    title="Number of Babies with Gender-Neutral Names Over Time (NYC, 2011-2021)",
)
fig.show()
    

This line chart visualizes the total number of babies given gender-neutral names in NYC from 2011 to 2021, showing how frequently gender-neutral names are used over time. The sharp decline after 2014 is surprising and could suggest a notable shift in naming patterns. However, Total Baby Count in Step 3 falls over the same period (from 544,078 in 2014 to 69,600 in 2015), so the raw count alone cannot separate a change in naming from a change in how many records the dataset contains per year. A more detailed interpretation follows in the Conclusion.

In [11]:
fig = px.line(
    baby_stats_sorted,
    x="Year of Birth",
    y="Proportion Neutral Name Babies",
    title="Proportion of Babies with Gender-Neutral Names Over Time (NYC, 2011-2021)",
)
fig.show()

This line chart shows the proportion of all babies born each year who received a gender-neutral name. Using proportions helps control for changes in total births and provides a clearer measure of popularity. As in the first chart, the overall trend is downward after 2015. However, the peak around 2015 indicates a period of increased interest in gender-neutral names.

In [12]:
yearly_top = (
    neutral_with_counts
    .sort_values(["Year of Birth", "Count"], ascending=[True, False]) # Sort by year and count
    .groupby("Year of Birth") # Group by year
    .head(1) # Get top name per year
    .reset_index(drop=True) # Renumber rows without keeping the old index as a column
)
yearly_top[['Year of Birth', "Child's First Name", "Count"]]

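| | Year of Birth | Child's First Name | Count |
|---|---|---|---|
| 0 | 2011 | ANGEL | 253 |
| 1 | 2012 | ANGEL | 236 |
| 2 | 2013 | Dylan | 270 |
| 3 | 2014 | Dylan | 292 |
| 4 | 2015 | Dylan | 339 |
| 5 | 2016 | Dylan | 312 |
| 6 | 2017 | Dylan | 287 |
| 7 | 2018 | Dylan | 244 |
| 8 | 2019 | Dylan | 212 |
| 9 | 2020 | Dylan | 196 |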


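This table identifies the top gender-neutral baby name for each year by selecting the row with the highest count within each year. Because each row in `neutral_with_counts` covers one gender and ethnicity group, the Count shown is for that single group rather than the name's total across all groups. It helps us understand which specific names contributed most to the overall trend. After 2012, "Dylan" consistently appears as the most popular gender-neutral name.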
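Since `neutral_with_counts` has one row per year, gender, ethnicity, and name, taking `head(1)` after sorting picks the largest single row rather than the name with the largest combined total. One alternative, sketched here on hypothetical toy data (the counts are invented to make the difference visible), is to sum counts per name first and then pick the top name. The sketch also title-cases names, on the assumption that `ANGEL` and `Angel` should count as the same name.

```python
import pandas as pd

# Hypothetical toy data: Angel has the largest single row,
# but Dylan has the largest total once all groups are added up.
toy = pd.DataFrame({
    "Year of Birth": [2013, 2013, 2013, 2013, 2013],
    "Gender": ["MALE", "MALE", "FEMALE", "MALE", "FEMALE"],
    "Ethnicity": ["HISPANIC", "WHITE NON HISPANIC", "HISPANIC",
                  "HISPANIC", "HISPANIC"],
    "Child's First Name": ["Dylan", "Dylan", "Dylan", "ANGEL", "Angel"],
    "Count": [150, 140, 10, 230, 20],
})

# Name from the single largest row (what sort + head(1) selects).
largest_row_name = (
    toy.sort_values("Count", ascending=False).iloc[0]["Child's First Name"]
)

# Sum counts per (year, name) after normalizing casing.
name_totals = (
    toy.assign(Name=toy["Child's First Name"].str.title())
    .groupby(["Year of Birth", "Name"], as_index=False)["Count"]
    .sum()
)

# Pick the row with the highest total within each year.
top_idx = name_totals.groupby("Year of Birth")["Count"].idxmax()
top_by_total = name_totals.loc[top_idx].reset_index(drop=True)

print(largest_row_name)
print(top_by_total)
```

Whether the two approaches disagree on the real data is not known from this notebook; the sketch only shows that they can.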

In [13]:
fig = px.bar(
    yearly_top,
    x="Year of Birth",
    y="Count",
    hover_data=["Child's First Name"],
    color = "Child's First Name",
    title="Most Popular Gender-Neutral Baby Name Each Year (NYC, 2011-2021)",
)
fig.show()

This bar chart visualizes the count of the most popular gender-neutral name in each year. It highlights the rise and gradual decline in popularity of names like "Dylan" during the decade. The pattern mirrors the earlier charts, reinforcing that both overall usage and top-name usage decreased after the mid-2010s.

## Conclusion

My initial hypothesis was that the use of gender-neutral baby names in NYC would increase over time, especially given broader cultural conversations about gender inclusivity and the rising popularity of unisex identities. However, the results of this analysis do not align with that expectation. Instead of showing a steady upward trend, the total number of babies with gender-neutral names drops sharply after 2014 and remains relatively low through 2021. Even when adjusting for overall birth counts, the proportion of neutral-name babies demonstrates a similar decline. This downward trend also appears consistently in the popularity of individual gender-neutral names.

There are several possible explanations for this unexpected pattern. First, the dataset ends in 2021, so it is missing the years 2022–2025, a period in which public discussions about gender identity, nonbinary recognition, and inclusive naming practices have accelerated significantly. It is possible that the trend shifted upward again after 2021, but the absence of these later years prevents us from capturing that movement.

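Second, the steep decline after 2014 may reflect broader demographic and social changes in NYC, including changes in the composition of the population, migration patterns, and birth rates across different cultural groups. These factors influence naming preferences independently of gender-identity trends. The decline may also partly reflect the data itself: the Total Baby Count computed in Step 3 falls from 544,078 in 2014 to 69,600 in 2015, so the number of records per year is not consistent across the decade.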

Finally, because the dataset reflects NYC only, it may not represent national or global naming patterns. It is possible that gender-neutral naming increased elsewhere but not within NYC specifically.

Overall, while this analysis did not confirm my original hypothesis, it offers a useful starting point for understanding how gender-neutral naming patterns may be shaped by cultural, demographic, and data-related factors. The decline observed after 2014 raises important questions rather than definitive answers, and the missing post-2021 data leaves open the possibility that more recent shifts in gender discourse could reveal different trends. With a more complete dataset in future years, we may see a clearer picture, either confirming the patterns observed here or revealing new shifts in gender-neutral naming.