Description: Distribution of US baby names, 1880-2008
Clone URL: git://github.com/hadley/data-baby-names.git
name age message
file .gitignore Fri May 15 07:46:47 -0700 2009 First pass of downloading, parsing, cleaning an... [hadley]
file 1-download.r Fri May 15 07:46:47 -0700 2009 First pass of downloading, parsing, cleaning an... [hadley]
file 2-parse.rb Fri May 15 07:46:47 -0700 2009 First pass of downloading, parsing, cleaning an... [hadley]
file 3-clean.r Fri May 15 07:48:29 -0700 2009 Save out names as csv file [hadley]
file 4-explore.r Fri May 15 09:02:34 -0700 2009 Bigger image [hadley]
file 5-old-testament.r Sun Jun 07 21:11:55 -0700 2009 Explore usage of old testament names [hadley]
file 6-sex-exploration.r Tue Oct 06 15:13:15 -0700 2009 Combine both sex related explorations into one [hadley]
file 7-top5.r Mon Jun 08 07:20:28 -0700 2009 Explore top 5 names [hadley]
file 8-variable.r Wed Oct 21 09:34:11 -0700 2009 Add births info and script to extract interesti... [hadley]
file baby-names-by-state.csv Mon Jun 08 07:20:42 -0700 2009 Cleaned baby names by state [hadley]
file baby-names.csv Fri May 15 07:48:29 -0700 2009 Save out names as csv file [hadley]
file births.csv Wed Oct 21 09:34:11 -0700 2009 Add births info and script to extract interesti... [hadley]
directory by-state/ Mon Jun 08 07:20:42 -0700 2009 Cleaned baby names by state [hadley]
directory images/ Fri May 15 09:02:34 -0700 2009 Bigger image [hadley]
file old-testament.txt Sun Jun 07 21:11:55 -0700 2009 Explore usage of old testament names [hadley]
file readme.markdown Sun May 17 07:59:07 -0700 2009 Give full link to baby-names.csv [hadley]
readme.markdown

US Baby names 1880-2008

Data

baby-names.csv contains the top 1000 girl and boy baby names for each year from 1880 to 2008. The data was aggregated from files made available by the Social Security Administration. To recreate it yourself, run 1-download.r, 2-parse.rb and 3-clean.r in that order. You will need both R and Ruby.
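
The three steps can be chained from a single script. The following is a minimal sketch, assuming `Rscript` and `ruby` are on the PATH and that it is run from the repository root. `pipeline_steps` and `run_pipeline` are hypothetical helper names, not part of the repository.

```ruby
# Hypothetical helper: the three preparation scripts, in the order the
# readme gives. Roles in the comments are inferred from the file names.
def pipeline_steps
  [
    ["Rscript", "1-download.r"], # download
    ["ruby",    "2-parse.rb"],   # parse
    ["Rscript", "3-clean.r"]     # clean
  ]
end

# Run each step in order and stop at the first one that fails.
# `runner` defaults to Kernel#system, which returns true only when the
# command exits with status 0 (false or nil otherwise).
def run_pipeline(steps = pipeline_steps, runner: ->(cmd) { system(*cmd) })
  steps.each do |cmd|
    puts "Running: #{cmd.join(' ')}"
    raise "Step failed: #{cmd.join(' ')}" unless runner.call(cmd)
  end
  true
end

# Usage, from the repository root:
#   run_pipeline
```

Stopping at the first failure matters here because each script presumably consumes the output of the previous one.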

Percent of names in top 1000

[Figure: Percent of baby names in top 1000]

Since the 1960s, the percentage of babies with names in the top 1000 has been shrinking, to its current level of 80% of boys and 67% of girls.
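
Shares like these can be recomputed by summing the per-name proportions within each year and sex. The sketch below assumes that baby-names.csv has the columns `year`, `name`, `percent` and `sex`, and that `percent` holds each name's fraction of births for that year and sex. Both are assumptions about the file layout. `top1000_share` is a hypothetical helper, and the sample rows are made-up values for illustration only.

```ruby
require "csv"

# Sum the assumed "percent" column for each [year, sex] pair.
# Returns a hash keyed by [year, sex], e.g. { [1880, "boy"] => 0.75 }.
def top1000_share(csv_text)
  totals = Hash.new(0.0)
  CSV.parse(csv_text, headers: true).each do |row|
    key = [row["year"].to_i, row["sex"]]
    totals[key] += row["percent"].to_f
  end
  totals
end

# Illustrative rows only; these are not real values from the data set.
sample = <<~ROWS
  year,name,percent,sex
  1880,John,0.5,boy
  1880,William,0.25,boy
  1880,Mary,0.5,girl
ROWS

top1000_share(sample).each do |(year, sex), share|
  puts format("%d %-4s %.1f%%", year, sex, share * 100)
end
```

To run it against the real file, pass `File.read("baby-names.csv")` instead of `sample`, provided the column names match the assumption above.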

Last letters

Prompted by the discussion on Andrew Gelman's blog (itself sparked by an old post on the Baby Name Wizard blog), here are plots showing the distribution of the last letter of names, 1880-2008.

[Figure: Distribution of last letter of baby names]
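
The numbers behind a last-letter plot can be tabulated by grouping names on their final character and adding up their shares. This is a sketch under the same assumed layout for baby-names.csv (`year`, `name`, `percent`, `sex`), and it is not the repository's own R code. `last_letter_share` is a hypothetical helper.

```ruby
require "csv"

# For each [year, sex] pair, add up the assumed "percent" column by the
# final letter of the name (lowercased).
# Returns a nested hash, e.g. { [1880, "boy"] => { "n" => 0.75 } }.
def last_letter_share(csv_text)
  shares = Hash.new { |hash, key| hash[key] = Hash.new(0.0) }
  CSV.parse(csv_text, headers: true).each do |row|
    letter = row["name"].strip[-1].downcase
    group = [row["year"].to_i, row["sex"]]
    shares[group][letter] += row["percent"].to_f
  end
  shares
end

# Usage with the real file, assuming the column names above:
#   last_letter_share(File.read("baby-names.csv"))
```

Because the file holds only the top 1000 names per year and sex, the letter shares within a group sum to the top-1000 coverage rather than to 1.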