## **1994 Wage Inequality Among US Citizens Based on Annual Income**

##### Authors: Danielle Keith, Sam Chin, Jason Zheng

### Introduction

The 1990s saw a major economic expansion in the US. Fueled by a tech boom, the economy grew at 3% with low unemployment and 2 million new jobs. Although incomes rose in 1994, minorities and women still faced significant wage inequality (Wilson, 2016). Economic prosperity did not necessarily translate into equal pay for all groups.

An analysis of 1994 census data will allow us to predict whether a given person's annual income is greater than $50,000 or at most $50,000, based on factors such as level of education, age, sex, race, native country, and hours worked per week.

We will use the `Adult` dataset from https://archive.ics.uci.edu/dataset/2/adult, which contains 32,560 entries extracted from the 1994 US census. Each row represents a single person and their attributes. There are 15 columns, each recording a different attribute.

 - `age`: age of an individual
 - `workclass`: employment status (ex. self-employed, private, unemployed)
 - `fnlwgt`: final weight or the number of people this individual's entry represents
 - `education`: the highest level of education completed (ex. 12th grade, Bachelor's, Doctorate)
 - `education-num`: the highest level of education completed in numerical form 
 - `marital-status`: marital status (ex. married, single)
 - `occupation`: general type of occupation held (ex. sales, services, etc.)
 - `relationship`: primary relationship to others (ex. wife, husband, relative)
 - `race`: racial identity (ex. white, black, asian)
 - `sex`: biological sex (ex. male, female)
 - `capital-gain`: money earned on investments
 - `capital-loss`: money lost on investments
 - `hours-per-week`: hours at work each week
 - `native-country`: country of origin (ex. United States, India, Cuba)
 - `income`: annual income in USD (<=50k or >50k)
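
The prediction task described in this Introduction could be set up along the following lines. This is only a sketch under stated assumptions: the proposal does not name a model, so logistic regression is swapped in here as a simple binary classifier, the data are simulated stand-ins rather than census records, and the variable names (`toy`, `income_fit`, `income_preds`) are hypothetical.

```r
library(tidymodels)

set.seed(1994)

# Simulated stand-in data (NOT the census data): three numeric predictors
# and a two-level income label generated from an arbitrary logistic rule.
toy <- tibble(
  age = sample(18:70, 300, replace = TRUE),
  hours_per_week = sample(20:60, 300, replace = TRUE),
  education_num = sample(5:16, 300, replace = TRUE)
) %>%
  mutate(
    p = plogis(-8 + 0.05 * age + 0.05 * hours_per_week + 0.3 * education_num),
    income = factor(if_else(runif(300) < p, ">50K", "<=50K"),
                    levels = c("<=50K", ">50K"))
  ) %>%
  select(-p)

# Hold out 25% of the rows for testing, keeping the class balance similar.
income_split <- initial_split(toy, prop = 0.75, strata = income)
train_data <- training(income_split)
test_data <- testing(income_split)

# Assumed model choice: logistic regression fitted with the base R glm engine.
income_recipe <- recipe(income ~ age + hours_per_week + education_num,
                        data = train_data)
income_spec <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

income_fit <- workflow() %>%
  add_recipe(income_recipe) %>%
  add_model(income_spec) %>%
  fit(data = train_data)

# Predict the income bracket for the held-out rows and compute accuracy.
income_preds <- predict(income_fit, test_data) %>%
  bind_cols(test_data)
income_accuracy <- accuracy(income_preds, truth = income, estimate = .pred_class)
income_accuracy
```

With the real data, the same workflow would take the loaded `adult` table in place of `toy`, after the categorical predictors (for example `sex`, `race`, and `native-country`) are converted to factors.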

### Methods

### Results

In [1]:
# Loading libraries
library(tidyverse)
library(repr)
library(tidymodels)

“package ‘ggplot2’ was built under R version 4.3.2”
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──

✔ broom    

In [2]:
adult <- read_csv("data/adult.csv")

Rows: 32560 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): State-gov, Bachelors, Never-married, Adm-clerical, Not-in-family, W...
dbl (6): 39, 77516, 13, 2174, 0, 40

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


#### *Expected Findings*
- A larger share of younger people will have an annual income greater than $50,000, because we also expect them to have more years of formal education.
- Non-white, female, and immigrant individuals will be more likely than white, male, native US residents to have an annual income of $50,000 or less.
- Non-white, female, and immigrant individuals will work longer hours than white, male, native US residents despite having a lower income.
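
One way these expectations could be checked is a grouped summary that compares, for each group, the share of people earning more than $50,000 and the mean weekly hours. The sketch below is illustrative only: the six records are made up for the example (they are not census rows), `summary_by_sex` is a hypothetical name, and only `sex` is used as the grouping column, although `race` or `native-country` could be substituted in the same way.

```r
library(dplyr)

# Made-up records for illustration only; column names follow the dataset.
toy <- tibble(
  sex = c("Male", "Male", "Female", "Female", "Male", "Female"),
  `hours-per-week` = c(40, 50, 45, 38, 60, 40),
  income = c(">50K", "<=50K", "<=50K", "<=50K", ">50K", ">50K")
)

# For each group: number of people, share earning more than $50,000,
# and mean hours worked per week.
summary_by_sex <- toy %>%
  group_by(sex) %>%
  summarise(
    n = n(),
    share_over_50k = mean(income == ">50K"),
    mean_hours = mean(`hours-per-week`)
  )

summary_by_sex
```

Comparing shares rather than raw counts matters here because the groups may differ greatly in size, so a larger count in one group would not by itself indicate a higher likelihood of earning more than $50,000.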

#### *Impact of Findings*
The results can be compared with modern-day statistics to determine whether annual income has changed across race, sex, age, and native-country groups. If modern data show less wage inequality between race and sex groups, we can try to identify which changes contributed to that decrease. If modern statistics show immigrants earning an annual income more comparable to that of native US residents than in 1994, the government could use this as a way of attracting migrants to the country. If 2024 data show younger generations having more years of formal education than older generations, the government could advertise a rise in education levels.

#### *Future Questions*
- Is formal education a more important way to increase annual income in 2024 than it was 30 years ago?
- Has the wage gap between racial, sex, and residency groups in the United States changed in the last 30 years?
- Are US immigrants likely to receive financial support comparable to native US residents in 2024?

### References