This is a notebook for looking at 35 years of historical US degrees data from NCES-IPEDS.
Initial exploration for the blog post
is in the file
Bookworming.Rmd. The full-text and code for the
Atlantic article is in
Atlantic.Rmd. That file does not contain the
file edits on the piece.
The taxonomy used here is a modified version of one developed by the
American Academy of Arts and Sciences for the Humanities Indicators. I
generally defer to their categories, but believe communications to be
in the social sciences (not humanities) and don't put "General
Studies" into the humanities bin. The precise taxonomy is in the
AmacadCIPS.csv, using the file
CIP-HI Crosswalk (1987-pres)-Table 1.csv. (Direct Link)
I have changed this file to include some disciplines for
non-humanities majors as well as to reflect my changed definition of the humanities.
You can examine the versioning history on it if you want to see my exact changes.
Data downloaded from the IPEDS series of the National Center for Education Statistics. Ones ending with '.txt' are pre-cleaned.
Tidying done in 'read_cips.R' (also not really the right name for the file).
A few dplyr functions are bundled into 'EDA_functions.R'.
functions.R is legacy code that was used to build the pre-1998 data.
Data is encoded. Information about majors is in
information about institutions is in
Because parsing the individual year data files takes a while,
read_cips.R creates intermediate
.feather files to load on
subsequent runs. This will take up additional disk space.