# Introduction
My cousin Morgan and his wife Lauren are expecting their second child. I asked them what names they are considering. They don't know *what* name they want to give the baby, but they do know what *kind* of name they want to give it. Normal, but not *too* common. You know, distinctive, but not *too* weird. A just-right "Goldilocks" name. 

As Goldilocks's own parents have demonstrated, it's a matter of taste whether a name is TOO common or TOO weird. But I suggested to Morgan that as long as he could put some quantitative parameters on "too common," he could use naming statistics to generate candidate baby names that are *objectively* the *best* names for them. It's almost like SCIENCE! Nerd that I am, I decided to take this tongue-in-cheek project to the internet. This program gives you the top baby names from Social Security data from 1880 to 2022, but lets you filter the list a few different ways to decide what is "too normal."
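Under the hood, every "too normal" filter in this notebook rests on one trick: within each year and sex, keep the N most-given names, then count how many years each name made that cut. Here is a minimal sketch of that idea on a tiny table of invented names and counts (not real Social Security data); `years_in_top` is a hypothetical helper name, and it picks the top N per group by sorting and taking `groupby(...).head(rank)`.

```python
import pandas as pd

# Tiny invented dataset in the same shape as the notebook's name data:
# one row per (name, sex, year) with the number of babies given that name.
toy = pd.DataFrame({
    "name":  ["Ava", "Mia", "Zoe", "Ava", "Mia", "Zoe", "Ava", "Zoe", "Mia"],
    "sex":   ["F"] * 9,
    "year":  [2020, 2020, 2020, 2021, 2021, 2021, 2022, 2022, 2022],
    "count": [100, 90, 10, 120, 80, 95, 130, 70, 60],
})

def years_in_top(df, rank):
    """Count how many years each (name, sex) ranked in the top `rank` for its sex."""
    top = (df.sort_values("count", ascending=False)
             .groupby(["year", "sex"])
             .head(rank))  # keep the `rank` most common names per year and sex
    return (top.groupby(["name", "sex"])
               .size()
               .reset_index(name="years_on_top")
               .sort_values("years_on_top", ascending=False))

print(years_in_top(toy, rank=2))
```

With `rank=2`, Ava makes the cut all three years, Zoe twice, and Mia once, so Ava would be the first to get eliminated as "too trendy."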

So Morg, here's the deal. The initial filters I suggest using here are my best guesses about what kind of names would hit the sweet spot of "normal but not TOO normal" for YOU two specifically. You can adjust the values and rerun this code as many times as you like, of course, but tell me how close I got! Also, if you name your baby by using this code, you are legally obligated to make me a godparent.

# How does this work? 
If you have never used a code notebook before, here are some instructions. A notebook has cells of regular text (like this one) interspersed with cells of code that you can run (like the next one). 

I recommend hiding the code cells to make the interactive parts of this easier to see. Go to the menu at the top, and select View > Collapse All Code.

Then select Run > Run All Cells. It may take a minute to import the name data, but then it should start printing more instructions for you.

If you want to rerun just part of the program, you can also run the code cells one at a time. To run just one cell, click on it to select it, and then press the Play button at the top or select Run > Run Selected Cells.

You must run the first set-up cell every time you open this website, but after that you can run any of the other cells as many times as you like during that session. 

In [None]:
#Run this set-up to get started

import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

#set display parameter to see full lists of names
pd.options.display.max_rows = None

print('Importing name data...this may take a minute or two...')
print()
# Import name data: one row per name, sex, and year, with the number of babies given that name
names = pd.read_csv('mostnames.csv',
                   dtype = {'name':str,
                           'sex':str,
                           'count':int,
                           'year':int
                           })

# Count "the last n years" back from the newest year in the data rather than
# today's date, since the data ends in 2022
this_year = names.year.max()
print('Set up done! You only have to run this once per session.')
print('Now you can run the other cells as many times as you want.')
print("Run the next code cell whenever you are ready!")


In [None]:
# Run this next! The main name generator!
# 1) Get user parameters to eliminate the top ranking names for the last n years
print('~~~THE FIRST NAME WAS TOO HOT~~~')
print()
print("You don't want a name that's too TRENDY, right? Schools must be fully saturated with little Avas and Quinns by now.") 
print("My guess is anything that's been in the top 10 names for either sex anytime in the last 10 years is too trendy.")
print("So, when prompted, enter 10 for both the elimination rank and years. "
      "(You can run it again later and enter whatever you like.)")

rank = input("Eliminate anything that ranked this number or better: ")
rank = int(rank)
years = input("During the last how many years?: ")
years = int(years)

#Now get those trendy names
trendy = (names[names.year > this_year-years]
          .sort_values('count', ascending=False)
          .groupby(["year", "sex"])
          .head(rank)  # the `rank` most common names for each year and sex
          .groupby(['name', 'sex'])
          .size()
          .reset_index(name='years_on_top')
          .sort_values(['sex', 'years_on_top'], ascending=[True, False]))
print('Okay, eliminated {} trendy names that made the top {} in the last {} years.'.format(len(trendy), rank, years))
printnames = input('Do you want to see the list? (y/n): ')
if printnames == 'y':
    print(trendy)          
else:
    print('Ok, moving on...')
    
# 2) Get user parameters for overused names
print()
print("~~~THE NEXT NAME WAS TOO OVERDONE~~~")
print()
print("You probably also don't want a name that's been massively overused by a recent generation, either."
      " I think we have enough Jennifers for now, right?"
      " Let's try eliminating anything that has been in the top 10 for at least 3 of the last 20 years.")
print()
print("So, for the next prompts, enter:")
print("rank: 10")
print("number of times: 3")
print("years: 20")

rank2 = input("Eliminate any names that ranked this number or better: ")
rank2 = int(rank2)
freq2 = input("At least this many times: ")
freq2 = int(freq2)
years2 = input("In the last _____ years: ")
years2 = int(years2)

#Now get those overused names
overused = (names[names.year > this_year-years2]
            .sort_values('count', ascending=False)
            .groupby(["year", "sex"])
            .head(rank2)  # the `rank2` most common names for each year and sex
            .groupby(['name', 'sex'])
            .size()
            .reset_index(name='years_on_top')
            .sort_values(['sex', 'years_on_top'], ascending=[True, False]))
overused = overused[overused.years_on_top >= freq2]
overused = overused[~overused.name.isin(trendy.name)]
print('Okay, eliminated an additional {} overused names that made the top {} in at least {} of the last {} years.'.format(len(overused), rank2, freq2, years2))
printnames = input('Do you want to see the list? y/n: ')
if printnames == 'y':
    print(overused)          
else:
    print('Ok, moving on...')
    
# 3) Get user parameters for names that are too common overall
print()
print("~~~THE NEXT NAME WAS TOO BORING~~~")
print()
print("There are classics, and then there are creakers. We all know what I'm talking about. John. Mary."
      " The kind of names that barely function as names, because you have to add a descriptor if you want"
      " to specify which person you are talking about. (Tall Dave, Cousin Dave, or Bald Dave?)")
print()
print("I'm guessing the threshold for you is something like names that have been"
      " in the top ten for ten or more of the last seventy years.")
print()
print("So, for the next prompts, enter:")
print("rank: 10")
print("number of times: 10")
print("years: 70")

rank3 = input("Eliminate any names that ranked this number or better: ")
rank3 = int(rank3)
freq3 = input("At least this many times: ")
freq3 = int(freq3)
years3 = input("In the last _____ years: ")
years3 = int(years3)

#Get those dead common names
common = (names[names.year > this_year-years3]
            .sort_values('count', ascending=False)
            .groupby(["year", "sex"])
            .head(rank3)  # the `rank3` most common names for each year and sex
            .groupby(['name', 'sex'])
            .size()
            .reset_index(name='years_on_top')
            .sort_values(['sex', 'years_on_top'], ascending=[True, False]))
common = common[common.years_on_top >= freq3]
common = common[~common.name.isin(trendy.name) & ~common.name.isin(overused.name)]
print('Okay, eliminated an additional {} common names that made the top {} in at least {} of the last {} years.'.format(len(common), rank3, freq3, years3))
printnames = input('Do you want to see the list? y/n: ')
if printnames == 'y':
    print(common)          
else:
    print('Ok, moving on...')

# 4) Get user parameters for name frequency
print()
print("~~~THE LAST NAME WAS JUST RIGHT~~~")
print()
print("Moment of truth! We'll get a list of names that have been popular any time since 1880,"
      " minus the ones we just excluded. Let's get anything else that's ever been in the top twenty."
      " (It's probably fewer than you think!)")
print()
print("So, for the next prompt, enter:")
print("rank: 20")

rank4 = input('See any names that ranked this number or better: ')
rank4 = int(rank4)

#Get all the other top names
greatest_hits = (names.sort_values('count', ascending=False)
                 .groupby(["year", "sex"])
                 .head(rank4)  # the `rank4` most common names for each year and sex
                 .groupby(['name', 'sex'])
                 .size()
                 .reset_index(name='years_on_top')
                 .sort_values(['sex', 'years_on_top'], ascending=[True, False]))
             
#combine all those too common names and remove them from the greatest_hits
too_popular = pd.concat([trendy, overused, common]).drop_duplicates('name').sort_values(['sex','years_on_top'])
greatest_hits = greatest_hits[~greatest_hits.name.isin(too_popular.name)]

print('We found {} names in your Goldilocks zone!'.format(len(greatest_hits)))
printnames = input('Do you want to see the list? y/n: ')
if printnames == 'y':
    print(greatest_hits)          
else:
    print("Hmph! Well, try it again to get a list you DO want to see.")
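The elimination steps in the cell above each repeat the same ranking pipeline with different numbers. If you want to tinker outside the prompts, here is a minimal sketch of a reusable helper. `frequent_top_names` is a hypothetical name, not something the notebook defines; it assumes you want "the last n years" counted back from the newest year in the data, and it runs on an invented toy table (named `toy_names` so it won't overwrite the real `names` data if you paste it into this notebook).

```python
import pandas as pd

def frequent_top_names(df, rank, last_n_years, min_years=1):
    """Hypothetical helper: (name, sex) pairs that ranked in the top `rank`
    in at least `min_years` of the most recent `last_n_years` years of data."""
    latest = df["year"].max()  # newest year actually present in the data
    window = df[df["year"] > latest - last_n_years]
    top = (window.sort_values("count", ascending=False)
                 .groupby(["year", "sex"])
                 .head(rank))  # top `rank` names per year and sex
    counts = (top.groupby(["name", "sex"])
                 .size()
                 .reset_index(name="years_on_top"))
    return counts[counts["years_on_top"] >= min_years]

# Invented toy data, just to show the call pattern.
toy_names = pd.DataFrame({
    "name":  ["Liam", "Noah", "Otto", "Liam", "Noah", "Otto"],
    "sex":   ["M"] * 6,
    "year":  [2021, 2021, 2021, 2022, 2022, 2022],
    "count": [200, 150, 5, 210, 140, 6],
})

toy_trendy = frequent_top_names(toy_names, rank=2, last_n_years=2)
toy_candidates = frequent_top_names(toy_names, rank=3, last_n_years=150)
# Goldilocks zone: popular at some point, but not in the excluded set
toy_goldilocks = toy_candidates[~toy_candidates["name"].isin(toy_trendy["name"])]
print(toy_goldilocks)
```

Here Liam and Noah are "too trendy," so only Otto survives. The same helper with `min_years=3` or `min_years=10` would cover the "overdone" and "boring" steps.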

# Bonus Feature: Unisex Names
Morg, remember how I mentioned that women with masculine names have been shown to succeed more in STEM and business? Want to see a list of names that have been used for both boys and girls recently? In the entire dataset, more than 11,000 names have been registered for both girls and boys, but most of them are used overwhelmingly for one sex, with just a few outliers of the other. Maybe you want names where only 5% of the babies cross the gender line, or maybe you would rather see names that are closer to 50-50. So next you will set the level of gender neutrality yourself, by entering the minimum percentage of time the name should have been used for girls and the minimum for boys. Run the next code cell to see these.
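The percentage filter described above boils down to two comparisons on each name's share of female use. Here is a minimal sketch on invented totals (the numbers are made up, not SSA data); `min_female` and `min_male` stand in for the two percentages you'll be prompted for, written as fractions.

```python
import pandas as pd

# Invented totals over some recent window: one row per (name, sex).
totals = pd.DataFrame({
    "name":  ["Avery", "Avery", "Riley", "Riley", "James", "James"],
    "sex":   ["F", "M", "F", "M", "F", "M"],
    "count": [700, 300, 500, 500, 10, 990],
})

# One row per name, with F and M columns; a name missing a sex gets 0.
wide = totals.pivot_table(index="name", columns="sex", values="count",
                          aggfunc="sum", fill_value=0)
wide["percent_female"] = wide["F"] / (wide["F"] + wide["M"])

min_female, min_male = 0.20, 0.20  # each sex must account for at least 20% of uses
unisex = wide[(wide["percent_female"] >= min_female) &
              (wide["percent_female"] <= 1 - min_male)]
print(unisex)
```

A name passes only if its female share is at least `min_female` and its male share (one minus the female share) is at least `min_male`, so James at 1% female drops out while Avery (70%) and Riley (50%) stay.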

Interesting side note: "Unisex" names almost always start as masculine names. When a unisex name becomes popular as a female name, parents have historically abandoned it as a male name. This is why 98% of Ashleys and Leslies are female now, whereas 100% of them were male at the turn of the century, for example. However, I've found at least one recent exception to this rule: Shannon has a very long history as an ambisexual name, but after reaching a low of 4% male in the 90s, it has slowly been regaining ground for boys since the 2000s, with male use at 30-42% in the last five years. I wonder whether this name is an exception, or whether more of this is happening because of recent generations' more flexible attitudes towards gender.
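If you want to check a trend like Shannon's yourself, one approach is to roll the yearly counts up into decades and compute the male share of each decade. The sketch below uses invented counts (not real data) in a hypothetical table with a `year` column plus one `F` and one `M` column, similar to what the name-history cell at the end of this notebook prints (that cell keeps the year as the index instead).

```python
import pandas as pd

# Invented yearly counts for one name; these are NOT real SSA figures.
toy_years = pd.DataFrame({
    "year": [1991, 1992, 2011, 2012, 2021, 2022],
    "F":    [960, 960, 700, 680, 600, 580],
    "M":    [40, 40, 300, 320, 400, 420],
})

# Roll years up into decades (1991 -> 1990) and compute the male share per decade.
toy_years["decade"] = toy_years["year"] // 10 * 10
by_decade = toy_years.groupby("decade")[["F", "M"]].sum()
by_decade["percent_male"] = (by_decade["M"] / (by_decade["F"] + by_decade["M"])).round(2)
print(by_decade)
```

Decade totals smooth out year-to-year noise, which makes a slow comeback like the one described above easier to spot.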

In [None]:
#Get user input for time frame and gender neutrality
print('Since name gender shifts over time, please specify how many years you want this data to include.')
years_a = input('How many recent years?: ')
years_a = int(years_a)

print('OK, you want names that were used for both boys and girls over the last {} years.'.format(years_a))
f_percent = input('What should the minimum percentage of FEMALE use be in that time? (please enter only a number, no percent sign): ')
f_percent = float(f_percent) / 100  # convert a percentage like 5 to a fraction like 0.05
    
m_percent = input('And what should the minimum percentage of MALE use be? (please enter only a number, no percent sign): ')
m_percent = float(m_percent) / 100  # convert a percentage like 5 to a fraction like 0.05

#Find those ambisexual names!
recent = names[names.year > this_year-years_a]
recent_by_sex = (recent
                 .groupby(['name', 'sex'])
                 .size()
                 .reset_index())

recent_ambi = set(recent_by_sex[recent_by_sex.duplicated('name')].name)

ambi = (recent[recent.name.isin(recent_ambi)]
               .groupby(["name", "sex"])
               .agg({'year':'count', 'count':'sum'})
               .reset_index()
               .rename(columns={'year':'years_used', 'count':'total'})
               .sort_values(['name', 'sex'], ascending=[True, False])
               .pivot(columns='sex', index='name', values='total'))

ambi['percent_female'] = ambi.F/(ambi.F + ambi.M)
ambi['total_uses'] = ambi.F + ambi.M

popular = ambi[ambi.percent_female >= f_percent]
popular = popular[popular.percent_female <= (1-m_percent)].sort_values('total_uses', ascending = False)
popular.percent_female = popular.percent_female.apply(lambda percent: round(percent, 2))

print('Your criteria returned {} names.'.format(len(popular)))
print('If that is more than you expected, it probably includes quite rare names.')
abridge = input('Would you like to narrow it down to just the more common names on the list? (y/n): ')
if abridge == 'y':
    satisfied = 'n'
    while satisfied != 'y':
        threshold = input('See only names with at least how many uses? (1000, 5000, etc): ')
        threshold = int(threshold)
        # Filter the full list each time, so a threshold can be loosened as well as tightened
        shortlist = popular[popular.total_uses >= threshold]
        print('There are {} names left now.'.format(len(shortlist)))
        satisfied = input('Is that list small enough? (y/n): ')
    print(shortlist)
else:
    seelist = input('Would you like to see the list? (y/n): ')
    if seelist == 'y':
        print(popular)

## Want to see the gender history of a specific name? 
Run the next code cell to check out the details on any specific name.

In [None]:
#See how a specific name has changed over time
def sexism(testname):
    name_df = (names[names.name == testname]
               .pivot(columns = 'sex', values = 'count', index = 'year')
               .fillna(0))
    if len(name_df) == 0:
        print(testname + ' not found.')
        return None
    elif len(name_df.columns) < 2:
        print(testname + ' only recorded for one sex.')
        return name_df
    else:
        name_df['percent_female'] = round(name_df.F/(name_df.F + name_df.M), 2)
        return name_df

testname = input('What name? (Make sure to capitalize it): ')
name_report = sexism(testname)
if name_report is not None:
    print(name_report)