# Wage Gap Analysis
Angel Wang

## Statement of Problem

Over the last several decades, millions more women have joined the workforce, along with a large increase in their education levels. However, differences in how women and men are distributed across occupations remain a concern. Many think that gender differences in employment distribution have declined significantly, while many others think that not only are some jobs still male-dominated, but the earnings differences between male and female workers are also very large.

Is there a gender gap at all in the US labor market? Are female workers paid less? Is gender discrimination related to specific industries? Can women close the wage gap by choosing different occupations? Many questions remain to be examined. This project seeks to explore occupational differences between the sexes and their causes, using data from the Bureau of Labor Statistics on working Americans.

## Objectives

1) What occupations are most gender-dominated (highest and lowest percentage of women)?  
2) What occupations have the biggest and smallest pay gap between male and female workers?  
3) What is the percentage of female workers in the highest-paying occupations?  
4) What factors determine the wage gap?  
5) Propose initiatives to close the wage gap based on insights from analysis.

## Technical Approach

#### Data

The dataset used for analysis contains income information for different occupations. It comes from Kaggle and was originally retrieved from the Bureau of Labor Statistics. It is available here: https://www.kaggle.com/jonavery/incomes-by-career-and-gender/data. 
The dataset has been downloaded and saved as inc_occ_gender.csv. It has median weekly income for 557 different occupations for male and female workers separately, as well as the count of male and female workers. The data was captured as of January 2015. 

There are 558 rows and 7 columns with the following headers: 

| Column         | Meaning                                 |
| -------------- | --------------------------------------- |
| Occupation     | Job title                               |
| All_workers    | Number of workers                       |
| All_weekly     | Median weekly income                    |
| M_workers      | Number of male workers                  |
| M_weekly       | Median weekly income for male workers   |
| F_workers      | Number of female workers                |
| F_weekly       | Median weekly income for female workers |

* All worker counts are in thousands, and all incomes are in USD. Several values in the median income columns are missing because no income data is available for those occupations.  
* Kaggle data terms and conditions: https://www.kaggle.com/terms  
* Bureau of Labor Statistics copyright information of the data:   https://www.bls.gov/opub/#copyright  

Here are the first 5 rows of the data (the first row is the data for all occupations):  


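```python
import pandas as pd

# Load the downloaded Kaggle file and preview the first five rows
df = pd.read_csv('inc_occ_gender.csv')
df[:5]
```

|   | Occupation                      | All_workers | All_weekly | M_workers | M_weekly | F_workers | F_weekly |
| - | ------------------------------- | ----------- | ---------- | --------- | -------- | --------- | -------- |
| 0 | ALL OCCUPATIONS                 | 109080      | 809        | 60746     | 895      | 48334     | 726      |
| 1 | MANAGEMENT                      | 12480       | 1351       | 7332      | 1486     | 5147      | 1139     |
| 2 | Chief executives                | 1046        | 2041       | 763       | 2251     | 283       | 1836     |
| 3 | General and operations managers | 823         | 1260       | 621       | 1347     | 202       | 1002     |
| 4 | Legislators                     | 8           | Na         | 5         | Na       | 4         | Na       |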
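
The `Na` entries in the preview are text placeholders, so pandas would otherwise read the income columns as strings. The following is a minimal sketch of one way to turn them into proper missing values, assuming `Na` is the only placeholder used in the file. It uses an inline copy of the five preview rows instead of the full CSV, and the names `sample`, `weekly_cols`, `missing_per_column` and `complete` are illustrative.

```python
import io

import pandas as pd

# Inline copy of the five preview rows, so the sketch runs without the CSV file
sample = io.StringIO(
    "Occupation,All_workers,All_weekly,M_workers,M_weekly,F_workers,F_weekly\n"
    "ALL OCCUPATIONS,109080,809,60746,895,48334,726\n"
    "MANAGEMENT,12480,1351,7332,1486,5147,1139\n"
    "Chief executives,1046,2041,763,2251,283,1836\n"
    "General and operations managers,823,1260,621,1347,202,1002\n"
    "Legislators,8,Na,5,Na,4,Na\n"
)

# Assumption: "Na" is the only missing-value marker; treat it as NaN on load
df = pd.read_csv(sample, na_values=["Na"])

# Count missing values in each median income column
weekly_cols = ["All_weekly", "M_weekly", "F_weekly"]
missing_per_column = df[weekly_cols].isna().sum()

# Keep only occupations where both male and female incomes are available
complete = df.dropna(subset=["M_weekly", "F_weekly"])
print(missing_per_column)
print(complete["Occupation"].tolist())
```

With the real file, the same `na_values` argument can be passed to `pd.read_csv('inc_occ_gender.csv', ...)`. Dropping incomplete rows is only one option; whether to drop or keep them depends on the question being asked.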


#### Tools and Approach


Programming language: Python  
Python packages needed:   
pandas - Reading and writing data, handling missing values and aggregating/transforming data  
NumPy - Array indexing and array math  
Matplotlib - Plotting  
scikit-learn (sklearn) - Linear regression to model salary against field and gender

The first step in exploring the gender wage gap is to calculate, for each occupation, the gender ratio, the wage gap (the difference between male and female wages) and the wage ratio (female wage / male wage). The next step is to find and plot the occupations with the biggest and smallest wage ratios. The wage ratio typically falls between 0 and 1: the closer it is to 1, the smaller the wage gap between female and male workers, and a wage ratio of exactly 1 means that female workers make the same as male workers in that field. We then find and plot the occupations with the largest and smallest percentages of female workers, and compare those occupations to the highest-paid occupations. Depending on the result, it may also be necessary to find the percentage of female workers in the top-paid occupations in order to fully understand the relationship among gender, wage and occupation.
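
The quantities described above can be sketched in pandas as follows. This is an illustrative sketch and not the final analysis code: it uses three rows from the data preview instead of the full file, and the derived column names (`female_share`, `wage_gap`, `wage_ratio`) are assumptions introduced here.

```python
import pandas as pd

# Three rows taken from the data preview; Legislators has no income data
df = pd.DataFrame({
    "Occupation": ["Chief executives", "General and operations managers", "Legislators"],
    "M_workers": [763, 621, 5],
    "M_weekly": [2251.0, 1347.0, float("nan")],
    "F_workers": [283, 202, 4],
    "F_weekly": [1836.0, 1002.0, float("nan")],
})

# Share of female workers; male + female is used as the denominator because
# rounded counts do not always add up to All_workers
df["female_share"] = df["F_workers"] / (df["M_workers"] + df["F_workers"])

# Wage gap in USD per week and wage ratio (female wage / male wage)
df["wage_gap"] = df["M_weekly"] - df["F_weekly"]
df["wage_ratio"] = df["F_weekly"] / df["M_weekly"]

# Rank occupations with income data: lowest ratio first means largest gap first
ranked = df.dropna(subset=["wage_ratio"]).sort_values("wage_ratio")
largest_gap = ranked.head(10)
smallest_gap = ranked.tail(10)
print(ranked[["Occupation", "female_share", "wage_gap", "wage_ratio"]])
```

Sorting by `female_share` instead of `wage_ratio` gives the most and least female-dominated occupations in the same way.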

Last, we want to analyze which factor is most significant in determining the wage gap between male and female workers. To do so, we will build a simple linear regression model to summarize and study the relationship between the wage gap (dependent variable) and gender, occupation and salary (independent variables).  
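
A minimal sketch of how such a model could be fit with scikit-learn is shown below. The choice of predictors is an assumption made for illustration: the share of female workers stands in for gender and the overall median weekly income stands in for salary, while occupation is left out. The four rows are the complete rows from the data preview, including two aggregate rows that a real analysis would likely exclude, so the fit only demonstrates the mechanics and says nothing about the actual wage gap.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# The four complete rows from the data preview (far too few for a real model)
df = pd.DataFrame({
    "Occupation": ["ALL OCCUPATIONS", "MANAGEMENT", "Chief executives",
                   "General and operations managers"],
    "All_weekly": [809, 1351, 2041, 1260],
    "M_workers": [60746, 7332, 763, 621],
    "M_weekly": [895, 1486, 2251, 1347],
    "F_workers": [48334, 5147, 283, 202],
    "F_weekly": [726, 1139, 1836, 1002],
})

# Derived columns (illustrative names, not part of the dataset)
df["female_share"] = df["F_workers"] / (df["M_workers"] + df["F_workers"])
df["wage_gap"] = df["M_weekly"] - df["F_weekly"]

# Assumed predictors: gender composition and overall salary level
features = ["female_share", "All_weekly"]
X = df[features]
y = df["wage_gap"]

# Ordinary least squares fit with an intercept
model = LinearRegression().fit(X, y)
coefficients = dict(zip(features, model.coef_))
r_squared = model.score(X, y)
print(coefficients, model.intercept_, r_squared)
```

Because the two predictors are on very different scales, their raw coefficients are not directly comparable. Standardizing the features first would be one way to compare their relative importance.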


#### Expected Result

1) List and plot of the 10 occupations that have the highest/lowest percentage of female workers  
2) List and plot of the 10 occupations that have the biggest/smallest wage gap between male and female workers  
3) The percentage of female workers in the 10 highest-paid occupations  
4) Regression model for the wage gap   
5) Reflection on the results

## Human-centered Design Considerations 

Typically, full-time working women are paid less than men. Nowadays, women still dominate jobs such as nurses, waitresses and administrative assistants, but not high-paid jobs such as engineers or technicians. Research has also shown that higher education alone cannot close the wage gap between women and men. An important factor behind this situation is gender discrimination, which contributes to the wage gap. 

The gender wage gap also reflects the consequences of unequal pay and a lack of other work-life supports. For years, many countries have had policies that require pay data to be reported to government authorities or to the public. This project therefore also aims to support the case for policy change by US policymakers to prevent wage discrimination. Closing the wage gap also requires attention from employers and the public, not only from policymakers. Employers can become more familiar with the wage gap through this analysis and start developing wage gap reduction plans for their workplaces. The public can likewise benefit from knowing more about the wage gap and be encouraged to explore a wider variety of career options.

This is a human-centered research project for the following reasons:  
1) The project, which concerns closing the wage gap, aims to answer human-related research questions for human benefit.  
2) The data used for this project is human-related (US worker counts and median income).


### Limitations and Future Work

1) Besides gender, variables such as race, religion and geographic information could also be included to explore pay discrimination.   
2) The data used for this project is not the most current reflection of the US labor market, as it was collected in 2015. The same process can be repeated with new data when it becomes available.  
3) The definition of an occupation title may differ between industries, and this may affect the accuracy of our analysis.  