# COGS 108 - Project Proposal

# Names

- Caitlyn Cielo
- Byung Joon (Alex) Park
- Matt Yang
- Tianyu Wang
- Timothy Kam

# Research Question

“How does the type of academic major influence early career wages and the usefulness of the major when pursuing a career in the United States?”

***What we are measuring:***

- Specifically, we ask how earnings differ between graduates of different major types, how much of that disparity is associated with the choice of major, and how applicable graduates' education is to their career path. Our objective is to explore both the tangible (salary) and intangible (major relevance) impacts of academic major selection on post-graduate success.




## Background and Prior Work

The relationship between major type, wages, and the applicability of a degree to later work is a long-standing subject of inquiry in labor economics. This section reviews two sources that shed light on the economic and professional implications of major selection.

Firstly, the U.S. Bureau of Labor Statistics (BLS) "Education Pays, 2021" report provides a foundational perspective on the broader economic value of educational attainment, revealing how different levels of education correlate with varying median weekly earnings and unemployment rates. The report underscores the premise that higher educational levels typically afford higher earnings and lower unemployment risks. This backdrop is crucial for our analysis, setting the stage to dive deeper into how specific major choices within these educational levels further influence early career outcomes. Although the BLS data is expansive across educational levels, it highlights the importance of further investigation into the nuances of major selection, pointing towards the need for more detailed, major-specific analyses [1].

In addition, the Pew Research Center’s Social & Demographic Trend Project report, "The Economic Value of College Majors: The Rising Cost of Not Going to College," presents an in-depth examination of wages associated with 137 college majors. It illustrates the lifetime earnings gap between the highest- and lowest-paying majors and identifies STEM, health, and business majors as among the highest-paying. This analysis is particularly relevant to our research question because it directly relates major selection to financial outcomes and, by extension, to the perceived applicability and usefulness of education in one's career path. The report's findings on major-specific earnings, coupled with its insights into the popularity and economic benefits of earning an advanced degree by undergraduate major, help clarify both the tangible and intangible impacts of major selection on post-graduate success [2].

Together, these sources give us a framework for exploring how different academic majors affect early career wages and the perceived relevance of a major in professional settings. By combining the broad economic trends they highlight with major-level data, we aim to uncover how major choice shapes early career trajectories and how well educational backgrounds align with career outcomes.

1. <a name="bls"></a> [^](#bls) U.S. Bureau of Labor Statistics. (n.d.). Education pays, 2021: Career Outlook. U.S. Bureau of Labor Statistics. https://www.bls.gov/careeroutlook/2022/data-on-display/education-pays.htm
2. <a name="pew"></a> [^](#pew) Pew Research Center. (2014, February). The Rising Cost of Not Going to College. Pew Research Center’s Social & Demographic Trends Project. https://www.pewresearch.org/social-trends/2014/02/11/the-rising-cost-of-not-going-to-college/

# Hypothesis


We hypothesize that, on average, graduates of science-oriented majors will be significantly more likely to work in a job that uses their major, and will earn significantly more, than graduates of humanities or art-related majors. We recognize that the relationship between job opportunities and wages varies across majors, influenced by the demand for specific skills, the marketability of those skills, and the dynamic nature of job markets. To account for these variations, our analysis will segment the data by field of study and take into account the sample size surveyed for each major and the possible range of error in the dataset. The dataset is derived from U.S. Census Bureau survey data covering more than one million individuals, so we expect it to be reasonably representative, although estimates for majors with small samples will be less reliable. This methodology aims to show how a major is used in post-graduate jobs and how that relates to the amount of money those graduates make.
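As a rough illustration of the comparison described above, the sketch below groups majors into two broad buckets and computes, for each bucket, a sample-size-weighted mean of the per-major median earnings and the share of jobs that require a college degree. The mapping of `Major_category` values to "science" and "humanities_arts" is our own assumption for illustration, and the rows are made-up toy numbers, not values from the dataset.

```python
from collections import defaultdict

# Assumed grouping of Major_category values (illustrative only; not fixed by the proposal).
SCIENCE = {"Engineering", "Physical Sciences", "Computers & Mathematics"}
HUMANITIES_ARTS = {"Humanities & Liberal Arts", "Arts"}


def group_of(category):
    """Map a Major_category value to one of three broad groups."""
    if category in SCIENCE:
        return "science"
    if category in HUMANITIES_ARTS:
        return "humanities_arts"
    return "other"


def summarize_groups(rows):
    """Per group: Sample_size-weighted mean of Median, and college-job share."""
    acc = defaultdict(lambda: {"w": 0, "wm": 0.0, "college": 0, "non_college": 0})
    for r in rows:
        a = acc[group_of(r["Major_category"])]
        a["w"] += r["Sample_size"]
        a["wm"] += r["Sample_size"] * r["Median"]
        a["college"] += r["College_jobs"]
        a["non_college"] += r["Non_college_jobs"]
    return {
        g: {
            "weighted_median": a["wm"] / a["w"],
            "college_job_share": a["college"] / (a["college"] + a["non_college"]),
        }
        for g, a in acc.items()
    }


# Toy rows with made-up numbers, using the dataset's column names.
rows = [
    {"Major_category": "Engineering", "Sample_size": 100, "Median": 60000,
     "College_jobs": 800, "Non_college_jobs": 200},
    {"Major_category": "Engineering", "Sample_size": 300, "Median": 50000,
     "College_jobs": 600, "Non_college_jobs": 400},
    {"Major_category": "Arts", "Sample_size": 200, "Median": 30000,
     "College_jobs": 300, "Non_college_jobs": 700},
]
summary = summarize_groups(rows)
for group, stats in summary.items():
    print(group, stats)
```

A weighted mean of per-major medians is not the true median of the pooled group; it is only a convenient summary that gives majors with larger samples more influence.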

# Data

The ideal dataset would include information on academic majors, early career wages, employment sectors, and other relevant metrics such as job satisfaction, major-specific employment rates, type of degree, annual earnings, and relevance of job to the major. Additionally, demographic information such as gender, ethnicity, and age group could provide insight into the disparities. Our analysis will leverage publicly available datasets that describe the relationship between academic majors and career outcomes in the United States.

The dataset we use below is particularly suited for this analysis: it contains detailed information on 173 college majors, which together represent 5,709,666 graduates (the sum of the Total column). Each major serves as one observation, so the file has 173 rows rather than the roughly 1,000 or more observations that a comprehensive analysis often draws on, but every row aggregates a large number of survey respondents. For each major the dataset provides a rich array of features such as Rank (rank by median earnings), Major, Major_category, Total (number of people with the major), Men, Women, ShareWomen (proportion of female graduates), Median, Full_time, and Part_time. Its coverage of majors and associated economic outcomes makes it a key resource for understanding the economic value of college majors.

Given the richness of our dataset, we can explore various angles, such as the gender distribution within majors, the relationship between the share of women in a major and median wages, and the distribution of college vs. non-college jobs across different fields of study. In addition, this dataset focuses on recent graduates and provides insight into how different fields of study correlate with economic success in the labor market. Its structured, tabular format allows for efficient data processing when examining the relationship between academic disciplines and employment outcomes.
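One of the angles mentioned above, the relationship between the share of women in a major and median wages, could be quantified with a Pearson correlation coefficient. The following is a minimal sketch using only the standard library; the two lists are made-up toy values standing in for the `ShareWomen` and `Median` columns, and the helper name `pearson_r` is our own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy values only: a perfectly linear, decreasing relationship.
share_women = [0.10, 0.30, 0.50, 0.70, 0.90]
median_wage = [70000, 60000, 50000, 40000, 30000]

r = pearson_r(share_women, median_wage)
print(round(r, 3))
```

A coefficient near -1 or +1 indicates a strong linear association, while a value near 0 indicates little linear association; either way, a correlation across majors does not by itself show that one variable causes the other.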


## Data overview

- Dataset #1
        
    - ***Dataset Name:*** American Community Survey 2010-2012 Public Use Microdata Series: College Majors
    - ***Link to the dataset:*** https://github.com/fivethirtyeight/data/blob/master/college-majors/recent-grads.csv
    
    - ***Number of observations:*** 173 (one row per major). The values in the "Total" column sum to 5,709,666 graduates.
    - ***Number of variables:*** 21 

1. ***Rank:*** The ranking of the academic major by median earnings. Data type: Integer

2. ***Major_code:*** A numerical code or identifier assigned to each academic major. Data type: Integer

3. ***Major:*** The name or label of the academic major. Data type: String

4. ***Total:*** Total number of individuals who graduated with the corresponding major. Data type: Integer

5. ***Men:*** Number of male graduates for the major. Data type: Integer

6. ***Women:*** Number of female graduates for the major. Data type: Integer

7. ***Major_category:*** The category or field to which the major belongs (e.g., Engineering, Arts, Business). Data type: String

8. ***ShareWomen:*** The proportion of female graduates in the major, represented as a fraction between 0 and 1. Data type: Float

9. ***Sample_size:*** The size of the sample used to collect data for the major. Data type: Integer

10. ***Employed:*** Number of graduates who are employed. Data type: Integer

11. ***Full_time:*** Number of graduates employed full-time. Data type: Integer

12. ***Part_time:*** Number of graduates employed part-time. Data type: Integer

13. ***Full_time_year_round:*** Number of graduates employed full-time year-round. Data type: Integer

14. ***Unemployed:*** Number of unemployed graduates. Data type: Integer

15. ***Unemployment_rate:*** The unemployment rate among graduates with the major, represented as a fraction between 0 and 1. Data type: Float

16. ***Median:*** The median earnings of graduates with the major. Data type: Integer

17. ***P25th:*** The 25th percentile of earnings of graduates with the major. Data type: Integer

18. ***P75th:*** The 75th percentile of earnings of graduates with the major. Data type: Integer

19. ***College_jobs:*** Number of graduates whose jobs require a college degree. Data type: Integer

20. ***Non_college_jobs:*** Number of graduates whose jobs do not require a college degree. Data type: Integer

21. ***Low_wage_jobs:*** Number of graduates whose jobs are considered low-wage. Data type: Integer
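Before analysis, the variables listed above can be sanity-checked after loading. The sketch below is one possible approach using the standard `csv` module: it counts rows, sums `Total`, and recomputes two derived quantities. It assumes that `ShareWomen` corresponds to `Women / Total` and `Unemployment_rate` to `Unemployed / (Unemployed + Employed)`, and that blank or `NA` cells mark missing values; the inline CSV text is made-up toy data, not rows from the real file.

```python
import csv
import io

INT_COLS = ["Total", "Men", "Women", "Employed", "Unemployed"]


def load_majors(csv_file):
    """Read one row per major; blank or 'NA' numeric cells become None."""
    rows = []
    for rec in csv.DictReader(csv_file):
        row = {"Major": rec["Major"]}
        for col in INT_COLS:
            cell = rec[col].strip()
            row[col] = int(float(cell)) if cell not in ("", "NA") else None
        rows.append(row)
    return rows


def share_women(row):
    """Women / Total, or None when either value is missing."""
    if row["Women"] is None or not row["Total"]:
        return None
    return row["Women"] / row["Total"]


def unemployment_rate(row):
    """Unemployed / (Unemployed + Employed), assumed definition."""
    labor_force = row["Unemployed"] + row["Employed"]
    return row["Unemployed"] / labor_force if labor_force else None


# Toy CSV text with made-up values; the last row has missing counts.
toy_csv = io.StringIO(
    "Major,Total,Men,Women,Employed,Unemployed\n"
    "MAJOR A,1000,600,400,800,200\n"
    "MAJOR B,500,100,400,450,50\n"
    "MAJOR C,,,,90,10\n"
)

majors = load_majors(toy_csv)
n_rows = len(majors)
total_graduates = sum(m["Total"] for m in majors if m["Total"] is not None)
print(n_rows, total_graduates)
```

To run the same checks on the real file, `toy_csv` could be replaced with an open file handle for a local copy of `recent-grads.csv`; on that file, the row count and the `Total` sum can then be compared against the figures reported in this section.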


# Ethics & Privacy

1. ***Data Permission:***
    - For our analysis, we use a dataset on graduates' employment outcomes, sourced from publicly accessible repositories on GitHub and Data. These platforms host datasets under conditions that permit academic and research use, provided the data is publicly listed. Our dataset's public availability on GitHub signifies the repository owner's consent for its use, in line with GitHub's policy that distinguishes between private and public repositories. Data similarly hosts datasets with explicit licensing that clarifies permissible uses. By selecting a dataset from a public GitHub repository and Data, we ensured that our project adheres to open data principles and respects the intentions of the data contributors.


2. ***Potential Data Discrepancies and Biases:***
    - While our dataset is derived from reputable sources and reflects information from the U.S. Census Bureau (USCB), we acknowledge the inherent potential for biases and discrepancies. These could stem from the survey design, data collection methods, or the representation within the dataset. Recognizing these limitations, we have undertaken our analysis with a commitment to minimizing these biases, ensuring our interpretations and conclusions are as accurate and unbiased as possible. The original data collection by the USCB is guided by stringent protocols to ensure accuracy and reliability, and our analysis seeks to extend these principles by critically evaluating the data and its implications.
 

3. ***Data Privacy:***
    - Our dataset focuses on the relationship between educational attainment, major choice, and wages, without including personally identifiable information. This approach aligns with the data privacy policies of both the USCB and GitHub, ensuring the anonymity of individuals represented in the dataset. The USCB's privacy policy is rigorous, protecting survey participants' information, and by using a dataset based on their data, we inherit these privacy protections. GitHub's policy on datasets prohibits the sharing of personal information like addresses or phone numbers, and our adherence to using an anonymized dataset ensures we respect individual privacy while facilitating our analysis. This careful consideration of privacy concerns underscores our commitment to ethical research practices, allowing us to explore important educational and economic questions responsibly.

# Team Expectations 

For our project, we (Caitlyn Cielo, Byung Joon Park, Matt Yang, Tianyu Wang, and Timothy Kam) have established a set of expectations to ensure the successful completion of the COGS 108 final assignment, including but not limited to the following.

We commit to maintaining open, honest communication, utilizing platforms like weekly Zoom and in-person meetings to facilitate collaboration. Each member is expected to contribute equally, meet all deadlines, and maintain a high level of quality in their work. We aim to leverage each member's strengths and provide support in areas of growth to foster a positive and productive team environment. Should conflicts arise, we agree to approach them constructively, prioritizing respect. We will aim to resolve disagreements through open discussion and clear communication.


# Project Timeline Proposal


| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 2/14  |  8 PM | Define research objectives: Investigate how education level and major choice influence wages in the United States. | Finalize methodology: Plan to use available datasets on education, major, and wages for analysis; Orchestrate the contributions of each individual within the group. | 
| 2/21  |  8 PM | Collect datasets: Gather data on education levels, major types, and wages from reliable sources. | Clean data: Address missing values and outliers, and ensure data consistency. | 
| 2/28  |  8 PM | Summarize data: Utilize descriptive statistics to understand basic trends and distributions. | Explore initial relationships: Identify any noticeable associations between education, major, and wages. |
| 2/14 |  8 PM | Conduct regression: Perform analysis to assess the impact of education and major on wages while controlling for relevant variables. | Interpret results: Understand the significance of education and major choice in determining wage outcomes. |
| 2/28 |  8 PM | Prepare report: Compile findings, methodology, and conclusions into a succinct report. | Develop presentation: Summarize key findings and insights for a brief presentation to peers and instructors. |
| 3/6  |  8 PM | Complete analysis; Draft results/conclusion/discussion | Discuss/edit full project |
| 3/13 |  8 PM | NA | Turn in Final Project & Group Project Surveys |