

Shuojia's Original Project at


Using the IPUMS data set, Shuojia observed:

  1. There is an upward trend, with women taking a growing share of STEM jobs.
  2. Gender yielded statistically significant differences in:
     • unemployment - women were more often unemployed, a gap of 0.5%
     • pay - women were paid less, a gap of 30%
     • job satisfaction - women were less satisfied, a gap of 1%
  3. Salary was predictable, with size of employer and degree as contributing variables. Location was not available.
  4. Job Field was predictable, with major and degree level as contributing variables.


Thank you for sharing your project with us. It was a great topic with many variables of interest, which clearly engaged all of us in discussion. We like how you asked how the analysis could be used afterward, for example by colleges or by LinkedIn, and how this thinking led to two models, one to predict salary and one to predict career path. We appreciate that you considered location as an important angle, even though it wasn’t available in the data. We found it to be an inspiring project.


Please put your name by a question to claim it, create a separate file to answer it, and post the link. Feel free to add more questions.

  1. What, more precisely, is the statisticians' definition of unemployment? That you don't have work? That you have given up on looking for work?
  2. Higher-degreed people tend to marry one another; perhaps whoever gets the job first "wins", allowing the other more flexibility in employment.
  3. What colors should we use for the genders in data visualizations?
  4. Do we have data sets that distinguish master's degrees as well as bachelor's degrees and PhDs?
  5. When we plot the gender dimensions on a time series, how do we explain the ups and downs? How does the late-2008 financial crisis appear?
  6. How do the national averages for all women compare to STEM women?
  7. How is contract work reflected in the data?
  8. How are STEM crossovers described? Some non-STEM majors go to work in STEM-adjacent jobs (e.g., Shannon).
  9. What is the process behind the collection of a given data set? Is it a snapshot of a person at a certain juncture? Is the subject followed later?
  10. Does the gender pay gap appear at entry level, later down the road, or both?
  11. For a given dataset, what is the date of the latest record? How fresh is the data, and does it matter?
  12. Does a given data set include management levels?
  13. Wait, if the gender is so influential, how did it play out in the predictive models?


For inspiration on additional variables of interest, read The Book of Why by Judea Pearl, pp. 304-316: in a case of gender discrimination, no discrimination was found when the variable of interest was Department. But when an additional variable of interest, State of Residence, was brought in, discrimination became more obvious.
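
To make that point concrete, here is a minimal sketch of how a gap can be invisible at the department level and still show up once a second variable is added. All counts below are invented for illustration; they come neither from the book nor from Shuojia's data, and the names `records` and `acceptance_rates` are hypothetical.

```python
from collections import defaultdict

# Hypothetical counts, invented for illustration only:
# (department, state, gender, applicants, accepted)
records = [
    ("Dept X", "State A", "men", 20, 16),
    ("Dept X", "State A", "women", 80, 44),
    ("Dept X", "State B", "men", 80, 32),
    ("Dept X", "State B", "women", 20, 4),
]


def acceptance_rates(rows, keys):
    """Return the acceptance rate for each group defined by the given keys."""
    totals = defaultdict(lambda: [0, 0])  # group -> [applicants, accepted]
    for dept, state, gender, applied, accepted in rows:
        row = {"department": dept, "state": state, "gender": gender}
        group = tuple(row[k] for k in keys)
        totals[group][0] += applied
        totals[group][1] += accepted
    return {group: acc / app for group, (app, acc) in totals.items()}


# Grouping by department only: both genders show the same rate.
by_dept = acceptance_rates(records, ["department", "gender"])

# Grouping by department and state: women have the lower rate in each state.
by_dept_state = acceptance_rates(records, ["department", "state", "gender"])

for group, rate in sorted(by_dept.items()):
    print(group, round(rate, 2))
for group, rate in sorted(by_dept_state.items()):
    print(group, round(rate, 2))
```

In this made-up table the department-level rates are identical for men and women, while the rates within each state differ. The same kind of check could be tried on the STEM data if a variable such as location becomes available.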
