# Washington Post Newspaper Guild Pay Study 2019

This is the study of Washington Post Guild members' salaries based on data turned over by management of The Washington Post on July 2, 2019, pursuant to a request by members of the Guild. Management turned over two Excel files: one file detailing the salaries of current guild members working for The Post (as of the date of transmission) and one file detailing the salaries of past guild members who worked for The Post and have left the organization in the past five years.

What follows is an attempt to understand pay at The Washington Post. No individual analysis should be taken on its own to mean that disparities in pay do or do not exist. This study will start with summary analysis of trends and will dive deeper as the study goes on. 

The only data manipulation done prior to analysis was taking the data out of Excel and putting the files into CSV files, converting dates from 'MM/DD/YYYY' to 'YYYY-MM-DD' and removing commas from monetary columns where values exceeded 1,000.

## Importing data

In [1]:
from pathlib import Path

import re
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.iolib.summary2 import summary_col
from linearmodels.iv import IV2SLS
import seaborn as sns

pd.options.display.max_columns = None

pd.set_option('display.float_format', lambda x: '%.2f' % x)

In [2]:
BASEDIR = Path.cwd()
CSVPATH = BASEDIR.joinpath('csvs')

In [3]:
active_wd_schema = {
    'department': str,
    'employee_id': str,
    'gender': str,
    'race_ethnicity': str,
    'date_of_birth': str,
    'original_hire_date': str,
    'hire_date': str,
    'pay_rate_type': str,
    'current_base_pay': np.float64,
    'job_profile_current': str,
    'time_type_current': str,
    'cost_center_current': str,
    'effective_date1': str,
    'business_process_type1': str,
    'business_process_reason1': str,
    'pay_rate_type1': str,
    'base_pay_change1': np.float64,
    'job_profile1': str,
    'time_type1': str,
    'cost_center1': str,
    'effective_date2': str,
    'business_process_type2': str,
    'business_process_reason2': str,
    'pay_rate_type2': str,
    'base_pay_change2': np.float64,
    'job_profile2': str,
    'time_type2': str,
    'cost_center2': str,
    'effective_date3': str,
    'business_process_type3': str,
    'business_process_reason3': str,
    'pay_rate_type3': str,
    'base_pay_change3': np.float64,
    'job_profile3': str,
    'time_type3': str,
    'cost_center3': str,
    'effective_date4': str,
    'business_process_type4': str,
    'business_process_reason4': str,
    'pay_rate_type4': str,
    'base_pay_change4': np.float64,
    'job_profile4': str,
    'time_type4': str,
    'cost_center4': str,
    'effective_date5': str,
    'business_process_type5': str,
    'business_process_reason5': str,
    'pay_rate_type5': str,
    'base_pay_change5': np.float64,
    'job_profile5': str,
    'time_type5': str,
    'cost_center5': str,
    'effective_date6': str,
    'business_process_type6': str,
    'business_process_reason6': str,
    'pay_rate_type6': str,
    'base_pay_change6': np.float64,
    'job_profile6': str,
    'time_type6': str,
    'cost_center6': str,
    'effective_date7': str,
    'business_process_type7': str,
    'business_process_reason7': str,
    'pay_rate_type7': str,
    'base_pay_change7': np.float64,
    'job_profile7': str,
    'time_type7': str,
    'cost_center7': str,
    'effective_date8': str,
    'business_process_type8': str,
    'business_process_reason8': str,
    'pay_rate_type8': str,
    'base_pay_change8': np.float64,
    'job_profile8': str,
    'time_type8': str,
    'cost_center8': str,
    'effective_date9': str,
    'business_process_type9': str,
    'business_process_reason9': str,
    'pay_rate_type9': str,
    'base_pay_change9': np.float64,
    'job_profile9': str,
    'time_type9': str,
    'cost_center9': str,
    'effective_date10': str,
    'business_process_type10': str,
    'business_process_reason10': str,
    'pay_rate_type10': str,
    'base_pay_change10': np.float64,
    'job_profile10': str,
    'time_type10': str,
    'cost_center10': str,
    'effective_date11': str,
    'business_process_type11': str,
    'business_process_reason11': str,
    'pay_rate_type11': str,
    'base_pay_change11': np.float64,
    'job_profile11': str,
    'time_type11': str,
    'cost_center11': str,
    'effective_date12': str,
    'business_process_type12': str,
    'business_process_reason12': str,
    'pay_rate_type12': str,
    'base_pay_change12': np.float64,
    'job_profile12': str,
    'time_type12': str,
    'cost_center12': str,
    'effective_date13': str,
    'business_process_type13': str,
    'business_process_reason13': str,
    'pay_rate_type13': str,
    'base_pay_change13': np.float64,
    'job_profile13': str,
    'time_type13': str,
    'cost_center13': str,
    'effective_date14': str,
    'business_process_type14': str,
    'business_process_reason14': str,
    'pay_rate_type14': str,
    'base_pay_change14': np.float64,
    'job_profile14': str,
    'time_type14': str,
    'cost_center14': str,
    'effective_date15': str,
    'business_process_type15': str,
    'business_process_reason15': str,
    'pay_rate_type15': str,
    'base_pay_change15': np.float64,
    'job_profile15': str,
    'time_type15': str,
    'cost_center15': str,
    'effective_date16': str,
    'business_process_type16': str,
    'business_process_reason16': str,
    'pay_rate_type16': str,
    'base_pay_change16': np.float64,
    'job_profile16': str,
    'time_type16': str,
    'cost_center16': str,
    'effective_date17': str,
    'business_process_type17': str,
    'business_process_reason17': str,
    'pay_rate_type17': str,
    'base_pay_change17': np.float64,
    'job_profile17': str,
    'time_type17': str,
    'cost_center17': str,
    'effective_date18': str,
    'business_process_type18': str,
    'business_process_reason18': str,
    'pay_rate_type18': str,
    'base_pay_change18': np.float64,
    'job_profile18': str,
    'time_type18': str,
    'cost_center18': str,
    'effective_date19': str,
    'business_process_type19': str,
    'pay_rate_type19': str,
    'base_pay_change19': np.float64,
    'job_profile19': str,
    'time_type19': str,
    'cost_center19': str,
    '2015_annual_performance_rating': np.float64,
    '2016_annual_performance_rating': np.float64,
    '2017_annual_performance_rating': np.float64,
    '2018_annual_performance_rating': np.float64
}

parse_dates = ['date_of_birth', 'original_hire_date', 'hire_date','effective_date1','effective_date2','effective_date3','effective_date4','effective_date5','effective_date6','effective_date7','effective_date8','effective_date9','effective_date10','effective_date11','effective_date12','effective_date13','effective_date14','effective_date15','effective_date16','effective_date17','effective_date18']

In [4]:
terminated_wd_schema = {
    'department': str,
    'employee_id': str,
    'gender': str,
    'race_ethnicity': str,
    'date_of_birth': str,
    'original_hire_date': str,
    'hire_date': str,
    'termination_date': str,
    'pay_rate_type': str,
    'current_base_pay': np.float64,
    'job_profile_current': str,
    'time_type_current': str,
    'cost_center_current': str,
    'effective_date1': str,
    'business_process_type1': str,
    'business_process_reason1': str,
    'pay_rate_type1': str,
    'base_pay_change1': np.float64,
    'job_profile1': str,
    'time_type1': str,
    'cost_center1': str,
    'effective_date2': str,
    'business_process_type2': str,
    'business_process_reason2': str,
    'pay_rate_type2': str,
    'base_pay_change2': np.float64,
    'job_profile2': str,
    'time_type2': str,
    'cost_center2': str,
    'effective_date3': str,
    'business_process_type3': str,
    'business_process_reason3': str,
    'pay_rate_type3': str,
    'base_pay_change3': np.float64,
    'job_profile3': str,
    'time_type3': str,
    'cost_center3': str,
    'effective_date4': str,
    'business_process_type4': str,
    'business_process_reason4': str,
    'pay_rate_type4': str,
    'base_pay_change4': np.float64,
    'job_profile4': str,
    'time_type4': str,
    'cost_center4': str,
    'effective_date5': str,
    'business_process_type5': str,
    'business_process_reason5': str,
    'pay_rate_type5': str,
    'base_pay_change5': np.float64,
    'job_profile5': str,
    'time_type5': str,
    'cost_center5': str,
    'effective_date6': str,
    'business_process_type6': str,
    'business_process_reason6': str,
    'pay_rate_type6': str,
    'base_pay_change6': np.float64,
    'job_profile6': str,
    'time_type6': str,
    'cost_center6': str,
    'effective_date7': str,
    'business_process_type7': str,
    'business_process_reason7': str,
    'pay_rate_type7': str,
    'base_pay_change7': np.float64,
    'job_profile7': str,
    'time_type7': str,
    'cost_center7': str,
    'effective_date8': str,
    'business_process_type8': str,
    'business_process_reason8': str,
    'pay_rate_type8': str,
    'base_pay_change8': np.float64,
    'job_profile8': str,
    'time_type8': str,
    'cost_center8': str,
    'effective_date9': str,
    'business_process_type9': str,
    'business_process_reason9': str,
    'pay_rate_type9': str,
    'base_pay_change9': np.float64,
    'job_profile9': str,
    'time_type9': str,
    'cost_center9': str,
    'effective_date10': str,
    'business_process_type10': str,
    'business_process_reason10': str,
    'pay_rate_type10': str,
    'base_pay_change10': np.float64,
    'job_profile10': str,
    'time_type10': str,
    'cost_center10': str,
    'effective_date11': str,
    'business_process_type11': str,
    'business_process_reason11': str,
    'pay_rate_type11': str,
    'base_pay_change11': np.float64,
    'job_profile11': str,
    'time_type11': str,
    'cost_center11': str,
    'effective_date12': str,
    'business_process_type12': str,
    'business_process_reason12': str,
    'pay_rate_type12': str,
    'base_pay_change12': np.float64,
    'job_profile12': str,
    'time_type12': str,
    'cost_center12': str,
    'effective_date13': str,
    'business_process_type13': str,
    'business_process_reason13': str,
    'pay_rate_type13': str,
    'base_pay_change13': np.float64,
    'job_profile13': str,
    'time_type13': str,
    'cost_center13': str,
    'effective_date14': str,
    'business_process_type14': str,
    'business_process_reason14': str,
    'pay_rate_type14': str,
    'job_profile14': str,
    'time_type14': str,
    'cost_center14': str,
    '2015_annual_performance_rating': np.float64,
    '2016_annual_performance_rating': np.float64,
    '2017_annual_performance_rating': np.float64,
    '2018_annual_performance_rating': np.float64
}

parse_dates2 = ['date_of_birth', 'original_hire_date', 'hire_date','termination_date','effective_date1','effective_date2','effective_date3','effective_date4','effective_date5','effective_date6','effective_date7','effective_date8','effective_date9','effective_date10','effective_date11','effective_date12','effective_date13','effective_date14']

In [5]:
df = pd.read_csv(CSVPATH.joinpath('active_wd.csv'), dtype=active_wd_schema, parse_dates=parse_dates)
df2 = pd.read_csv(CSVPATH.joinpath('terminated_wd.csv'), dtype=terminated_wd_schema, parse_dates=parse_dates2)

## Add fields for analysis

In [6]:
date_received = np.datetime64('2019-07-02')

df['age'] = (date_received - df['date_of_birth']).astype('<m8[Y]')
df['years_of_service'] = (date_received - df['hire_date']).astype('<m8[Y]')
df2['age'] = (date_received - df2['date_of_birth']).astype('<m8[Y]')
df2['years_of_service'] = (date_received - df2['hire_date']).astype('<m8[Y]')

### Add field for 5-year age groups

In [7]:
bins= [0,25,30,35,40,45,50,55,60,65,100]
labels = ['<25','25-29','30-34','35-39','40-44', '45-49','50-54','55-59','60-64','65+']
df['age_group_5'] = pd.cut(df['age'], bins=bins, labels=labels, right=False)
df2['age_group_5'] = pd.cut(df2['age'], bins=bins, labels=labels, right=False)

### Add field for 10-year age groups

In [8]:
bins= [0,25,35,45,55,65,100]
labels = ['<25','25-34','35-44','45-54','55-64','65+']
df['age_group_10'] = pd.cut(df['age'], bins=bins, labels=labels, right=False)
df2['age_group_10'] = pd.cut(df2['age'], bins=bins, labels=labels, right=False)

### Add field for years-of-service groups

In [9]:
bins= [0,1,3,6,11,16,21,26,100]
labels = ['0','1-2','3-5','6-10','11-15','16-20','21-25','25+']
df['years_of_service_grouped'] = pd.cut(df['years_of_service'], bins=bins, labels=labels, right=False)
df2['years_of_service_grouped'] = pd.cut(df2['years_of_service'], bins=bins, labels=labels, right=False)

### Group departments

In [10]:
def dept(row):
    NEWS_DEPTS = ['News', 'Editorial', 'News Service and Syndicate']
    COMMERCIAL_DEPTS = [
        'Client Solutions', 'Circulation', 'Finance', 'Marketing', 'WP News Media Services', 'Production', 'Public Relations', 'Administration', 'Product', 'Audience Development and Insights', 'Customer Care and Logistics', 'Legal', 'Washington Post Live'
    ]
    if row['department'] in NEWS_DEPTS:
        return 'News'
    elif row['department'] in COMMERCIAL_DEPTS:
        return 'Commercial'
    else:
        return 'Unknown'

df['dept'] = df.apply(lambda row: dept(row), axis=1)
df2['dept'] = df2.apply(lambda row: dept(row), axis=1)

### Group desks

In [11]:
def desk(row):
    OPERATIONS = ['110000 News Operations','110001 News Digital Operations']
    AUDIENCE = ['Audience Development and Engagement']
    AUDIO = ['110620 News Audio']
    DESIGN = ['110604 Presentation Design','110605 Presentation']
    EMERGING = ['110664 News National Apps','110665 News The Lily','110666 News Snapchat','110667 News By The Way']
    FINANCIAL = ['113210 Economy and Business']
    FOREIGN = ['114000 Foreign Administration','114095 News Foreign Brazil','114100 Foreign Latam','114220 News Foreign Istanbul','114235 Foreign Western Europe','114300 News Foreign West Africa','114415 Foreign Hong Kong','114405 Foreign Beijing Bureau','114105 Foreign Mexico Bureau','114005 Foreign Beirut Bureau','114400 Foreign India Bureau','114410 Foreign Tokyo Bureau','114205 Foreign Islamabad Bureau','114305 Foreign Nairobi Bureau','114240 Foreign Rome Bureau','114200 Foreign London Bureau','114230 Foreign Moscow Bureau','114225 Foreign Cairo Bureau','114215 Foreign Berlin Bureau']
    GRAPHICS = ['110603 Presentation Graphics']
    INVESTIGATIVE = ['110450 Investigative']
    LOCAL = ['112300 Local Politics and Government']
    MULTI = ['110601 Multiplatform Desk']
    NATIONAL = ['110500 Magazine','113200 National Politics and Government','113205 National Security','113215 News National Health & Science','113220 National Enterprise','113235 National America','113240 News National Environment']
    RESEARCH = ['110006 News Content & Research']
    LOGISTICS = ['110455 News Logistics']
    OUTLOOK = ['110410 Book World','110460 Outlook']
    POLLING = ['110475 Polling']
    SPORTS = ['110015 Sports Main']
    STYLE = ['110300 Style','110435 Food','110485 Travel','110495 Local Living','110505 Weekend']
    UNIVERSAL = ['110600 Universal Desk']
    VIDEO = ['110652 News Video - General']
    OTHER = ['110663 Wake Up Report']
    EDITORIAL = ['115000 Editorial Administration']
    if row['cost_center_current'] in OPERATIONS:
        return 'Operations'
    elif row['cost_center_current'] in AUDIENCE:
        return 'Audience Development and Engagement'
    elif row['cost_center_current'] in AUDIO:
        return 'Audio'
    elif row['cost_center_current'] in DESIGN:
        return 'Design'
    elif row['cost_center_current'] in EMERGING:
        return 'Emerging News Products'
    elif row['cost_center_current'] in FINANCIAL:
        return 'Financial'
    elif row['cost_center_current'] in FOREIGN:
        return 'Foreign'
    elif row['cost_center_current'] in GRAPHICS:
        return 'Graphics'
    elif row['cost_center_current'] in LOCAL:
        return 'Local'
    elif row['cost_center_current'] in MULTI:
        return 'Multiplatform'
    elif row['cost_center_current'] in NATIONAL:
        return 'National'
    elif row['cost_center_current'] in RESEARCH:
        return 'News Content and Research'
    elif row['cost_center_current'] in LOGISTICS:
        return 'News Logistics'
    elif row['cost_center_current'] in OUTLOOK:
        return 'Outlook'
    elif row['cost_center_current'] in POLLING:
        return 'Polling'
    elif row['cost_center_current'] in SPORTS:
        return 'Sports'
    elif row['cost_center_current'] in STYLE:
        return 'Style'
    elif row['cost_center_current'] in UNIVERSAL:
        return 'Universal Desk'
    elif row['cost_center_current'] in VIDEO:
        return 'Video'
    elif row['cost_center_current'] in OTHER:
        return 'Other'
    elif row['cost_center_current'] in EDITORIAL:
        return 'Editorial'
    else:
        return 'non-newsroom'

df['desk'] = df.apply(lambda row: desk(row), axis=1)
df2['desk'] = df2.apply(lambda row: desk(row), axis=1)

### Group desks by median salary ranges

In [12]:
def tier(row):
    TIER1 = ['National','Foreign','Financial','Investigative']
    TIER2 = ['Style','Local','Graphics','Universal Desk','Sports','Outlook','Editorial']
    TIER3 = ['Audio','Polling','Design','Operations','Multiplatform','Video','Audience Development and Engagement']
    TIER4 = ['News Logistics','News Content and Research','Emerging News Products','Other']
    if row['desk'] in TIER1:
        return 'Tier 1'
    elif row['desk'] in TIER2:
        return 'Tier 2'
    elif row['desk'] in TIER3:
        return 'Tier 3'
    elif row['desk'] in TIER4:
        return 'Tier 4'
    else:
        return 'other'

df['tier'] = df.apply(lambda row: tier(row), axis=1)
df2['tier'] = df2.apply(lambda row: tier(row), axis=1)

### Group race and ethnicity

In [13]:
def race_groups(row):
    WHITE = ['White (United States of America)']
    NONWHITE = [
        'Black or African American (United States of America)', 'Asian (United States of America)', 'Hispanic or Latino (United States of America)', 'Two or More Races (United States of America)', 'American Indian or Alaska Native (United States of America)', 'Native Hawaiian or Other Pacific Islander (United States of America)'
    ]
    if row['race_ethnicity'] in WHITE:
        return 'white'
    elif row['race_ethnicity'] in NONWHITE:
        return 'person of color'
    else:
        return 'unknown'

df['race_grouping'] = df.apply(lambda row: race_groups(row), axis=1)
df2['race_grouping'] = df2.apply(lambda row: race_groups(row), axis=1)

### Employee pay change grouping

In [14]:
reason_for_change1 = df[['business_process_reason1','base_pay_change1','effective_date1','pay_rate_type1','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason1':'business_process_reason','base_pay_change1':'base_pay_change','effective_date1':'effective_date','pay_rate_type1':'pay_rate_type'})
reason_for_change2 = df[['business_process_reason2','base_pay_change2','effective_date2','pay_rate_type2','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason2':'business_process_reason','base_pay_change2':'base_pay_change','effective_date2':'effective_date','pay_rate_type2':'pay_rate_type'})
reason_for_change3 = df[['business_process_reason3','base_pay_change3','effective_date3','pay_rate_type3','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason3':'business_process_reason','base_pay_change3':'base_pay_change','effective_date3':'effective_date','pay_rate_type3':'pay_rate_type'})
reason_for_change4 = df[['business_process_reason4','base_pay_change4','effective_date4','pay_rate_type4','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason4':'business_process_reason','base_pay_change4':'base_pay_change','effective_date4':'effective_date','pay_rate_type4':'pay_rate_type'})
reason_for_change5 = df[['business_process_reason5','base_pay_change5','effective_date5','pay_rate_type5','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason5':'business_process_reason','base_pay_change5':'base_pay_change','effective_date5':'effective_date','pay_rate_type5':'pay_rate_type'})
reason_for_change6 = df[['business_process_reason6','base_pay_change6','effective_date6','pay_rate_type6','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason6':'business_process_reason','base_pay_change6':'base_pay_change','effective_date6':'effective_date','pay_rate_type6':'pay_rate_type'})
reason_for_change7 = df[['business_process_reason7','base_pay_change7','effective_date7','pay_rate_type7','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason7':'business_process_reason','base_pay_change7':'base_pay_change','effective_date7':'effective_date','pay_rate_type7':'pay_rate_type'})
reason_for_change8 = df[['business_process_reason8','base_pay_change8','effective_date8','pay_rate_type8','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason8':'business_process_reason','base_pay_change8':'base_pay_change','effective_date8':'effective_date','pay_rate_type8':'pay_rate_type'})
reason_for_change9 = df[['business_process_reason9','base_pay_change9','effective_date9','pay_rate_type9','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason9':'business_process_reason','base_pay_change9':'base_pay_change','effective_date9':'effective_date','pay_rate_type9':'pay_rate_type'})
reason_for_change10 = df[['business_process_reason10','base_pay_change10','effective_date10','pay_rate_type10','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason10':'business_process_reason','base_pay_change10':'base_pay_change','effective_date10':'effective_date','pay_rate_type10':'pay_rate_type'})
reason_for_change11 = df[['business_process_reason11','base_pay_change11','effective_date11','pay_rate_type11','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason11':'business_process_reason','base_pay_change11':'base_pay_change','effective_date11':'effective_date','pay_rate_type11':'pay_rate_type'})
reason_for_change12 = df[['business_process_reason12','base_pay_change12','effective_date12','pay_rate_type12','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason12':'business_process_reason','base_pay_change12':'base_pay_change','effective_date12':'effective_date','pay_rate_type12':'pay_rate_type'})
reason_for_change13 = df[['business_process_reason13','base_pay_change13','effective_date13','pay_rate_type13','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason13':'business_process_reason','base_pay_change13':'base_pay_change','effective_date13':'effective_date','pay_rate_type13':'pay_rate_type'})
reason_for_change14 = df[['business_process_reason14','base_pay_change14','effective_date14','pay_rate_type14','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason14':'business_process_reason','base_pay_change14':'base_pay_change','effective_date14':'effective_date','pay_rate_type14':'pay_rate_type'})
reason_for_change15 = df[['business_process_reason15','base_pay_change15','effective_date15','pay_rate_type15','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason15':'business_process_reason','base_pay_change15':'base_pay_change','effective_date15':'effective_date','pay_rate_type15':'pay_rate_type'})
reason_for_change16 = df[['business_process_reason16','base_pay_change16','effective_date16','pay_rate_type16','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason16':'business_process_reason','base_pay_change16':'base_pay_change','effective_date16':'effective_date','pay_rate_type16':'pay_rate_type'})
reason_for_change17 = df[['business_process_reason17','base_pay_change17','effective_date17','pay_rate_type17','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason17':'business_process_reason','base_pay_change17':'base_pay_change','effective_date17':'effective_date','pay_rate_type17':'pay_rate_type'})
reason_for_change18 = df[['business_process_reason18','base_pay_change18','effective_date18','pay_rate_type18','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason18':'business_process_reason','base_pay_change18':'base_pay_change','effective_date18':'effective_date','pay_rate_type18':'pay_rate_type'})
reason_for_change19 = df2[['business_process_reason1','base_pay_change1','effective_date1','pay_rate_type1','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason1':'business_process_reason','base_pay_change1':'base_pay_change','effective_date1':'effective_date','pay_rate_type1':'pay_rate_type'})
reason_for_change20 = df2[['business_process_reason2','base_pay_change2','effective_date2','pay_rate_type2','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason2':'business_process_reason','base_pay_change2':'base_pay_change','effective_date2':'effective_date','pay_rate_type2':'pay_rate_type'})
reason_for_change21 = df2[['business_process_reason3','base_pay_change3','effective_date3','pay_rate_type3','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason3':'business_process_reason','base_pay_change3':'base_pay_change','effective_date3':'effective_date','pay_rate_type3':'pay_rate_type'})
reason_for_change22 = df2[['business_process_reason4','base_pay_change4','effective_date4','pay_rate_type4','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason4':'business_process_reason','base_pay_change4':'base_pay_change','effective_date4':'effective_date','pay_rate_type4':'pay_rate_type'})
reason_for_change23 = df2[['business_process_reason5','base_pay_change5','effective_date5','pay_rate_type5','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason5':'business_process_reason','base_pay_change5':'base_pay_change','effective_date5':'effective_date','pay_rate_type5':'pay_rate_type'})
reason_for_change24 = df2[['business_process_reason6','base_pay_change6','effective_date6','pay_rate_type6','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason6':'business_process_reason','base_pay_change6':'base_pay_change','effective_date6':'effective_date','pay_rate_type6':'pay_rate_type'})
reason_for_change25 = df2[['business_process_reason7','base_pay_change7','effective_date7','pay_rate_type7','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason7':'business_process_reason','base_pay_change7':'base_pay_change','effective_date7':'effective_date','pay_rate_type7':'pay_rate_type'})
reason_for_change26 = df2[['business_process_reason8','base_pay_change8','effective_date8','pay_rate_type8','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason8':'business_process_reason','base_pay_change8':'base_pay_change','effective_date8':'effective_date','pay_rate_type8':'pay_rate_type'})
reason_for_change27 = df2[['business_process_reason9','base_pay_change9','effective_date9','pay_rate_type9','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason9':'business_process_reason','base_pay_change9':'base_pay_change','effective_date9':'effective_date','pay_rate_type9':'pay_rate_type'})
reason_for_change28 = df2[['business_process_reason10','base_pay_change10','effective_date10','pay_rate_type10','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason10':'business_process_reason','base_pay_change10':'base_pay_change','effective_date10':'effective_date','pay_rate_type10':'pay_rate_type'})
reason_for_change29 = df2[['business_process_reason11','base_pay_change11','effective_date11','pay_rate_type11','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason11':'business_process_reason','base_pay_change11':'base_pay_change','effective_date11':'effective_date','pay_rate_type11':'pay_rate_type'})
reason_for_change30 = df2[['business_process_reason12','base_pay_change12','effective_date12','pay_rate_type12','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason12':'business_process_reason','base_pay_change12':'base_pay_change','effective_date12':'effective_date','pay_rate_type12':'pay_rate_type'})
reason_for_change31 = df2[['business_process_reason13','base_pay_change13','effective_date13','pay_rate_type13','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating']].rename(columns={'business_process_reason13':'business_process_reason','base_pay_change13':'base_pay_change','effective_date13':'effective_date','pay_rate_type13':'pay_rate_type'})
reason_for_change1 = pd.DataFrame(reason_for_change1)
reason_for_change2 = pd.DataFrame(reason_for_change2)
reason_for_change3 = pd.DataFrame(reason_for_change3)
reason_for_change4 = pd.DataFrame(reason_for_change4)
reason_for_change5 = pd.DataFrame(reason_for_change5)
reason_for_change6 = pd.DataFrame(reason_for_change6)
reason_for_change7 = pd.DataFrame(reason_for_change7)
reason_for_change8 = pd.DataFrame(reason_for_change8)
reason_for_change9 = pd.DataFrame(reason_for_change9)
reason_for_change10 = pd.DataFrame(reason_for_change10)
reason_for_change11 = pd.DataFrame(reason_for_change11)
reason_for_change12 = pd.DataFrame(reason_for_change12)
reason_for_change13 = pd.DataFrame(reason_for_change13)
reason_for_change14 = pd.DataFrame(reason_for_change14)
reason_for_change15 = pd.DataFrame(reason_for_change15)
reason_for_change16 = pd.DataFrame(reason_for_change16)
reason_for_change17 = pd.DataFrame(reason_for_change17)
reason_for_change18 = pd.DataFrame(reason_for_change18)
reason_for_change19 = pd.DataFrame(reason_for_change19)
reason_for_change20 = pd.DataFrame(reason_for_change20)
reason_for_change21 = pd.DataFrame(reason_for_change21)
reason_for_change22 = pd.DataFrame(reason_for_change22)
reason_for_change23 = pd.DataFrame(reason_for_change23)
reason_for_change24 = pd.DataFrame(reason_for_change24)
reason_for_change25 = pd.DataFrame(reason_for_change25)
reason_for_change26 = pd.DataFrame(reason_for_change26)
reason_for_change27 = pd.DataFrame(reason_for_change27)
reason_for_change28 = pd.DataFrame(reason_for_change28)
reason_for_change29 = pd.DataFrame(reason_for_change29)
reason_for_change30 = pd.DataFrame(reason_for_change30)
reason_for_change31 = pd.DataFrame(reason_for_change31)

reason_for_change_combined = pd.concat([reason_for_change1,reason_for_change2,reason_for_change3,reason_for_change4,reason_for_change5,reason_for_change6,reason_for_change7,reason_for_change8,reason_for_change9,reason_for_change10,reason_for_change11,reason_for_change12,reason_for_change13,reason_for_change14,reason_for_change15,reason_for_change16,reason_for_change17,reason_for_change18,reason_for_change19,reason_for_change20,reason_for_change21,reason_for_change22,reason_for_change23,reason_for_change24,reason_for_change25,reason_for_change26,reason_for_change27,reason_for_change28,reason_for_change29,reason_for_change30,reason_for_change31])

### Employee performance evaluation grouping

In [15]:
fifteen1 = df[['2015_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2015_annual_performance_rating':'performance_rating'})
fifteen2 = df2[['2015_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2015_annual_performance_rating':'performance_rating'})
sixteen1 = df[['2016_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2016_annual_performance_rating':'performance_rating'})
sixteen2 = df2[['2016_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2016_annual_performance_rating':'performance_rating'})
seventeen1 = df[['2017_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2017_annual_performance_rating':'performance_rating'})
seventeen2 = df2[['2017_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2017_annual_performance_rating':'performance_rating'})
eighteen1 = df[['2018_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2018_annual_performance_rating':'performance_rating'})
eighteen2 = df2[['2018_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2018_annual_performance_rating':'performance_rating'})
fifteen1 = pd.DataFrame(fifteen1)
fifteen2 = pd.DataFrame(fifteen2)
sixteen1 = pd.DataFrame(sixteen1)
sixteen2 = pd.DataFrame(sixteen2)
seventeen1 = pd.DataFrame(seventeen1)
seventeen2 = pd.DataFrame(seventeen2)
eighteen1 = pd.DataFrame(eighteen1)
eighteen2 = pd.DataFrame(eighteen2)

ratings_combined = pd.concat([fifteen1,fifteen2,sixteen1,sixteen2,seventeen1,seventeen2,eighteen1,eighteen2])

### Create departmental data frames

In [16]:
news_salaried = df[(df['dept'] == 'News') & (df['pay_rate_type'] == 'Salaried')]
news_hourly = df[(df['dept'] == 'News') & (df['pay_rate_type'] == 'Hourly')]
commercial_salaried = df[(df['dept'] == 'Commercial') & (df['pay_rate_type'] == 'Salaried')]
commercial_hourly = df[(df['dept'] == 'Commercial') & (df['pay_rate_type'] == 'Hourly')]

news_salaried2 = df2[(df2['dept'] == 'News') & (df2['pay_rate_type'] == 'Salaried')]
news_hourly2 = df2[(df2['dept'] == 'News') & (df2['pay_rate_type'] == 'Hourly')]
commercial_salaried2 = df2[(df2['dept'] == 'Commercial') & (df2['pay_rate_type'] == 'Salaried')]
commercial_hourly2 = df2[(df2['dept'] == 'Commercial') & (df2['pay_rate_type'] == 'Hourly')]

## Supress Results

### Suppress results where there are less than five employees

In [17]:
df['count'] = 1
df2['count'] = 1

def suppress(results):
    results.columns = results.columns.get_level_values(1)
    return results[results['count_nonzero'] >= 5]

### Suppress results and order them by count of employees

In [18]:
def suppress_count(results):
    results.columns = results.columns.get_level_values(1)
    return results[results['count_nonzero'] >= 5].sort_values('count_nonzero', ascending=False)

### Suppress results and order them by median salary of employees

In [19]:
def suppress_median(results):
    results.columns = results.columns.get_level_values(1)
    return results[results['count_nonzero'] >= 5].sort_values('median', ascending=False)

## Summary Analysis

### Employee counts

In [20]:
current_employee_count = df.shape[0]
terminated_employee_count = df2.shape[0]

print('Total employees in data: ' + str(current_employee_count + terminated_employee_count))
print('Current employees: ' + str(current_employee_count))
print('Terminated employees: ' + str(terminated_employee_count))

Total employees in data: 1489
Current employees: 950
Terminated employees: 539


In [21]:
current_salaried_employee_count = df[df['pay_rate_type'] == 'Salaried'].shape[0]
terminated_salaried_employee_count = df2[df2['pay_rate_type'] == 'Salaried'].shape[0]

print('Total salaried employees in data: ' + str(current_salaried_employee_count + terminated_salaried_employee_count))
print('Current salaried employees: ' + str(current_salaried_employee_count))
print('Terminated salaried employees: ' + str(terminated_salaried_employee_count))

Total salaried employees in data: 989
Current salaried employees: 707
Terminated salaried employees: 282


In [22]:
current_hourly_employee_count = df[df['pay_rate_type'] == 'Hourly'].shape[0]
terminated_hourly_employee_count = df2[df2['pay_rate_type'] == 'Hourly'].shape[0]

print('Total hourly employees in data: ' + str(current_hourly_employee_count + terminated_hourly_employee_count))
print('Current hourly employees: ' + str(current_hourly_employee_count))
print('Terminated hourly employees: ' + str(terminated_hourly_employee_count))

Total hourly employees in data: 500
Current hourly employees: 243
Terminated hourly employees: 257


### Salary information

In [23]:
current_mean_salary = df[df['pay_rate_type'] == 'Salaried']['current_base_pay'].mean()
current_median_salary = df[df['pay_rate_type'] == 'Salaried']['current_base_pay'].median()

print('The mean yearly pay for current salaried employees is $' + str(current_mean_salary) + '.')
print('The median yearly pay for current salaried employees is $' + str(current_median_salary) + '.')

The mean yearly pay for current salaried employees is $112382.98421499293.
The median yearly pay for current salaried employees is $99903.95.


In [24]:
current_mean_hourly = df[df['pay_rate_type'] == 'Hourly']['current_base_pay'].mean()
current_median_hourly = df[df['pay_rate_type'] == 'Hourly']['current_base_pay'].median()

print('The mean rate for current hourly employees at The Washington Post is $' + str(current_mean_hourly) + '.')
print('The median rate for current hourly employees at The Washington Post is $' + str(current_median_hourly) + '.')

The mean rate for current hourly employees at The Washington Post is $30.197119341563788.
The median rate for current hourly employees at The Washington Post is $29.23.


### Employee gender

In [25]:
current_employee_gender = df.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_gender)

Unnamed: 0_level_0,count_nonzero
gender,Unnamed: 1_level_1
Female,507.0
Male,443.0


In [26]:
terminated_employee_gender = df2.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(terminated_employee_gender)

Unnamed: 0_level_0,count_nonzero
gender,Unnamed: 1_level_1
Female,291.0
Male,246.0


In [27]:
current_median_salary_gender = df[df['pay_rate_type'] == 'Salaried'].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_salary_gender)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,370.0,91815.82
Male,337.0,109928.29


In [28]:
current_median_hourly_gender = df[df['pay_rate_type'] == 'Hourly'].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_hourly_gender)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,137.0,30.77
Male,106.0,25.84


In [29]:
current_age_gender_salaried = df[df['pay_rate_type'] == 'Salaried'].groupby(['gender'])['age'].median().sort_values(ascending=False)
current_age_gender_salaried

gender
Male     41.00
Female   35.00
Name: age, dtype: float64

### Employee race and ethnicity

In [30]:
current_employee_race_ethnicity = df.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_employee_race_ethnicity)

Unnamed: 0_level_0,count_nonzero
race_ethnicity,Unnamed: 1_level_1
White (United States of America),612.0
Black or African American (United States of America),157.0
Asian (United States of America),77.0
Hispanic or Latino (United States of America),45.0
Two or More Races (United States of America),18.0
Prefer Not to Disclose (United States of America),14.0


In [31]:
terminated_employee_race_ethnicity = df2.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(terminated_employee_race_ethnicity)

Unnamed: 0_level_0,count_nonzero
race_ethnicity,Unnamed: 1_level_1
White (United States of America),290.0
Black or African American (United States of America),162.0
Asian (United States of America),46.0
Hispanic or Latino (United States of America),20.0
Two or More Races (United States of America),10.0
Prefer Not to Disclose (United States of America),7.0


In [32]:
current_median_salary_race = df[df['pay_rate_type'] == 'Salaried'].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_median_salary_race)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),505.0,102880.0
Black or African American (United States of America),62.0,91881.24
Asian (United States of America),59.0,90780.0
Prefer Not to Disclose (United States of America),10.0,82140.0
Hispanic or Latino (United States of America),33.0,82000.0
Two or More Races (United States of America),14.0,79860.0


In [33]:
current_median_hourly_race = df[df['pay_rate_type'] == 'Hourly'].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_median_hourly_race)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),107.0,32.71
Asian (United States of America),18.0,27.3
Hispanic or Latino (United States of America),12.0,25.62
Black or African American (United States of America),95.0,25.16


In [34]:
current_age_race_salaried = df[df['pay_rate_type'] == 'Salaried'].groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_age_race_salaried

race_ethnicity
American Indian or Alaska Native (United States of America)            49.50
Native Hawaiian or Other Pacific Islander (United States of America)   43.00
Black or African American (United States of America)                   41.50
White (United States of America)                                       39.00
Hispanic or Latino (United States of America)                          37.00
Asian (United States of America)                                       33.00
Prefer Not to Disclose (United States of America)                      31.50
Two or More Races (United States of America)                           28.00
Name: age, dtype: float64

In [35]:
current_age_race_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_age_race_hourly

race_ethnicity
American Indian or Alaska Native (United States of America)   53.50
Black or African American (United States of America)          47.00
White (United States of America)                              39.00
Asian (United States of America)                              32.00
Prefer Not to Disclose (United States of America)             30.00
Hispanic or Latino (United States of America)                 29.50
Two or More Races (United States of America)                  26.50
Name: age, dtype: float64

### Employee gender x race/ethnicity

In [36]:
current_employee_race_gender = df.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_ethnicity,gender,Unnamed: 2_level_1
Asian (United States of America),Female,53.0
Asian (United States of America),Male,24.0
Black or African American (United States of America),Female,80.0
Black or African American (United States of America),Male,77.0
Hispanic or Latino (United States of America),Female,24.0
Hispanic or Latino (United States of America),Male,21.0
Prefer Not to Disclose (United States of America),Female,6.0
Prefer Not to Disclose (United States of America),Male,8.0
Two or More Races (United States of America),Female,12.0
Two or More Races (United States of America),Male,6.0


In [37]:
current_salaried_race_gender = df[df['pay_rate_type'] == 'Salaried'].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_salaried_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_ethnicity,gender,Unnamed: 2_level_1
Asian (United States of America),Female,42.0
Asian (United States of America),Male,17.0
Black or African American (United States of America),Female,31.0
Black or African American (United States of America),Male,31.0
Hispanic or Latino (United States of America),Female,16.0
Hispanic or Latino (United States of America),Male,17.0
Prefer Not to Disclose (United States of America),Female,5.0
Prefer Not to Disclose (United States of America),Male,5.0
Two or More Races (United States of America),Female,9.0
Two or More Races (United States of America),Male,5.0


In [38]:
current_hourly_race_gender = df[df['pay_rate_type'] == 'Hourly'].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_hourly_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_ethnicity,gender,Unnamed: 2_level_1
Asian (United States of America),Female,11.0
Asian (United States of America),Male,7.0
Black or African American (United States of America),Female,49.0
Black or African American (United States of America),Male,46.0
Hispanic or Latino (United States of America),Female,8.0
White (United States of America),Female,63.0
White (United States of America),Male,44.0


In [39]:
current_median_salary_race_gender = df[df['pay_rate_type'] == 'Salaried'].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_salary_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Asian (United States of America),Female,42.0,91115.0
Asian (United States of America),Male,17.0,90431.45
Black or African American (United States of America),Female,31.0,87808.33
Black or African American (United States of America),Male,31.0,99931.09
Hispanic or Latino (United States of America),Female,16.0,80250.0
Hispanic or Latino (United States of America),Male,17.0,90780.0
Prefer Not to Disclose (United States of America),Female,5.0,73000.0
Prefer Not to Disclose (United States of America),Male,5.0,88280.0
Two or More Races (United States of America),Female,9.0,75000.0
Two or More Races (United States of America),Male,5.0,94875.0


In [40]:
current_median_hourly_race_gender = df[df['pay_rate_type'] == 'Hourly'].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_hourly_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Asian (United States of America),Female,11.0,28.3
Asian (United States of America),Male,7.0,26.3
Black or African American (United States of America),Female,49.0,26.82
Black or African American (United States of America),Male,46.0,23.2
Hispanic or Latino (United States of America),Female,8.0,28.17
White (United States of America),Female,63.0,33.46
White (United States of America),Male,44.0,31.0


### Employee age

In [41]:
current_employee_age_5 = df.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_age_5)

Unnamed: 0_level_0,count_nonzero
age_group_5,Unnamed: 1_level_1
<25,59.0
25-29,171.0
30-34,139.0
35-39,125.0
40-44,98.0
45-49,80.0
50-54,105.0
55-59,84.0
60-64,56.0
65+,33.0


In [42]:
terminated_employee_age_5 = df2.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero]})
suppress(terminated_employee_age_5)

Unnamed: 0_level_0,count_nonzero
age_group_5,Unnamed: 1_level_1
<25,7.0
25-29,117.0
30-34,115.0
35-39,56.0
40-44,52.0
45-49,40.0
50-54,33.0
55-59,42.0
60-64,29.0
65+,44.0


In [43]:
current_employee_age_10 = df.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_age_10)

Unnamed: 0_level_0,count_nonzero
age_group_10,Unnamed: 1_level_1
<25,59.0
25-34,310.0
35-44,223.0
45-54,185.0
55-64,140.0
65+,33.0


In [44]:
terminated_employee_age_10 = df2.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero]})
suppress(terminated_employee_age_10)

Unnamed: 0_level_0,count_nonzero
age_group_10,Unnamed: 1_level_1
<25,7.0
25-34,232.0
35-44,108.0
45-54,73.0
55-64,71.0
65+,44.0


In [45]:
current_median_salary_age_5 = df[df['pay_rate_type'] == 'Salaried'].groupby(['age_group_5']).agg({'current_base_pay': [np.median, np.count_nonzero]})
suppress(current_median_salary_age_5)

Unnamed: 0_level_0,median,count_nonzero
age_group_5,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,64640.0,34.0
25-29,80000.0,126.0
30-34,92500.0,119.0
35-39,105301.31,104.0
40-44,125924.46,72.0
45-49,99502.5,56.0
50-54,110844.65,80.0
55-59,139716.51,61.0
60-64,113134.31,38.0
65+,153061.0,17.0


In [46]:
current_median_hourly_age_5 = df[df['pay_rate_type'] == 'Hourly'].groupby(['age_group_5']).agg({'current_base_pay': [np.median, np.count_nonzero]})
suppress(current_median_hourly_age_5)

Unnamed: 0_level_0,median,count_nonzero
age_group_5,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,25.64,25.0
25-29,30.77,45.0
30-34,30.61,20.0
35-39,31.24,21.0
40-44,29.48,26.0
45-49,31.4,24.0
50-54,26.14,25.0
55-59,27.05,23.0
60-64,24.98,18.0
65+,27.26,16.0


In [47]:
current_median_salary_age_10 = df[df['pay_rate_type'] == 'Salaried'].groupby(['age_group_10']).agg({'current_base_pay': [np.median, np.count_nonzero]})
suppress(current_median_salary_age_10)

Unnamed: 0_level_0,median,count_nonzero
age_group_10,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,64640.0,34.0
25-34,85500.0,245.0
35-44,115118.47,176.0
45-54,108202.32,136.0
55-64,127059.4,99.0
65+,153061.0,17.0


In [48]:
current_median_hourly_age_10 = df[df['pay_rate_type'] == 'Hourly'].groupby(['age_group_10']).agg({'current_base_pay': [np.median, np.count_nonzero]})
suppress(current_median_hourly_age_10)

Unnamed: 0_level_0,median,count_nonzero
age_group_10,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,25.64,25.0
25-34,30.77,65.0
35-44,30.77,47.0
45-54,28.3,49.0
55-64,26.46,41.0
65+,27.26,16.0


### Employee department

In [49]:
current_employee_dept = df.groupby(['dept']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_employee_dept)

Unnamed: 0_level_0,count_nonzero
dept,Unnamed: 1_level_1
News,670.0
Commercial,280.0


In [50]:
current_employee_department = df.groupby(['department']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_employee_department)

Unnamed: 0_level_0,count_nonzero
department,Unnamed: 1_level_1
News,632.0
Client Solutions,164.0
Circulation,49.0
Editorial,38.0
Finance,31.0
Marketing,11.0
WP News Media Services,9.0
Production,6.0
Public Relations,5.0


In [51]:
current_employee_dept_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['dept']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_dept_salary)

Unnamed: 0_level_0,count_nonzero,median
dept,Unnamed: 1_level_1,Unnamed: 2_level_1
News,574.0,104669.96
Commercial,133.0,86104.69


In [52]:
current_employee_department_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['department']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_department_salary)

Unnamed: 0_level_0,count_nonzero,median
department,Unnamed: 1_level_1,Unnamed: 2_level_1
Editorial,33.0,105000.0
News,541.0,104559.92
Finance,8.0,90575.5
WP News Media Services,9.0,86104.69
Client Solutions,102.0,85633.86
Marketing,7.0,81196.11
Production,5.0,71665.06


In [53]:
current_employee_dept_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['dept']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_dept_hourly)

Unnamed: 0_level_0,count_nonzero,median
dept,Unnamed: 1_level_1,Unnamed: 2_level_1
News,96.0,33.05
Commercial,147.0,26.27


In [54]:
current_employee_department_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['department']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_department_hourly)

Unnamed: 0_level_0,count_nonzero,median
department,Unnamed: 1_level_1,Unnamed: 2_level_1
Public Relations,5.0,35.01
News,91.0,33.12
Editorial,5.0,32.31
Client Solutions,62.0,29.41
Finance,23.0,29.23
Circulation,49.0,22.44


### Employee cost center

In [55]:
current_employee_desk = df.groupby(['desk']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_employee_desk)

Unnamed: 0_level_0,count_nonzero
desk,Unnamed: 1_level_1
non-newsroom,316.0
National,118.0
Local,70.0
Style,54.0
Video,50.0
Sports,48.0
Design,46.0
Multiplatform,42.0
Financial,38.0
Editorial,38.0


In [56]:
current_employee_cost_center = df.groupby(['cost_center_current']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_employee_cost_center)

Unnamed: 0_level_0,count_nonzero
cost_center_current,Unnamed: 1_level_1
112300 Local Politics and Government,70.0
113200 National Politics and Government,63.0
110652 News Video - General,50.0
110015 Sports Main,48.0
110601 Multiplatform Desk,42.0
110300 Style,39.0
119065 Dispatch Operations (Night Circulation),39.0
115000 Editorial Administration,38.0
113210 Economy and Business,38.0
110605 Presentation,24.0


In [57]:
current_employee_desk_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['desk']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_desk_salary)

Unnamed: 0_level_0,count_nonzero,median
desk,Unnamed: 1_level_1,Unnamed: 2_level_1
National,106.0,149520.5
Foreign,25.0,135000.0
Financial,38.0,133509.94
Style,45.0,107170.81
Local,65.0,105780.0
Editorial,33.0,105000.0
Graphics,15.0,100780.0
Universal Desk,8.0,100444.28
Sports,37.0,100000.0
Outlook,6.0,99937.5


In [58]:
current_employee_cost_center_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['cost_center_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_cost_center_salary)

Unnamed: 0_level_0,count_nonzero,median
cost_center_current,Unnamed: 1_level_1,Unnamed: 2_level_1
113205 National Security,17.0,172780.0
117682 Global Sales,21.0,164984.25
113200 National Politics and Government,55.0,145980.0
113235 National America,12.0,137123.72
113215 News National Health & Science,12.0,135594.87
113210 Economy and Business,38.0,133509.94
110450 Investigative,13.0,129780.0
117600 Leadership Executive,5.0,127500.0
113240 News National Environment,5.0,126080.0
110300 Style,36.0,115177.72


In [59]:
current_employee_desk_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['desk']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_desk_hourly)

Unnamed: 0_level_0,count_nonzero,median
desk,Unnamed: 1_level_1,Unnamed: 2_level_1
Audio,6.0,39.75
Universal Desk,8.0,38.67
Multiplatform,16.0,34.09
Editorial,5.0,32.31
National,12.0,31.74
non-newsroom,154.0,26.57
Local,5.0,26.46
Style,9.0,21.77
Sports,11.0,20.91
Operations,7.0,15.59


In [60]:
current_employee_cost_center_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['cost_center_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_cost_center_hourly)

Unnamed: 0_level_0,count_nonzero,median
cost_center_current,Unnamed: 1_level_1,Unnamed: 2_level_1
110620 News Audio,6.0,39.75
110600 Universal Desk,8.0,38.67
110610 Audience Development and Engagement,7.0,37.58
129100 Community,5.0,35.01
110601 Multiplatform Desk,16.0,34.09
115000 Editorial Administration,5.0,32.31
126060 Circulation Accounting,9.0,30.51
113200 National Politics and Government,8.0,30.49
126020 Revenue Administration,14.0,28.75
117210 Production Creative,5.0,28.13


### Employee years of service

In [61]:
current_employee_yos = df.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_yos)

Unnamed: 0_level_0,count_nonzero
years_of_service_grouped,Unnamed: 1_level_1
0,138.0
1-2,223.0
3-5,195.0
6-10,109.0
11-15,80.0
16-20,102.0
21-25,46.0
25+,57.0


In [62]:
terminated_employee_yos = df2.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero]})
suppress(terminated_employee_yos)

Unnamed: 0_level_0,count_nonzero
years_of_service_grouped,Unnamed: 1_level_1
0,8.0
1-2,78.0
3-5,196.0
6-10,119.0
11-15,51.0
16-20,44.0
21-25,12.0
25+,29.0


In [63]:
current_employee_yos_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_salary)

Unnamed: 0_level_0,count_nonzero,median
years_of_service_grouped,Unnamed: 1_level_1,Unnamed: 2_level_1
0,96.0,85000.0
1-2,164.0,91776.89
3-5,172.0,92305.85
6-10,75.0,106602.62
11-15,56.0,107685.39
16-20,74.0,125300.67
21-25,32.0,128485.24
25+,38.0,131793.39


In [64]:
current_employee_yos_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_hourly)

Unnamed: 0_level_0,count_nonzero,median
years_of_service_grouped,Unnamed: 1_level_1,Unnamed: 2_level_1
0,42.0,27.7
1-2,59.0,31.68
3-5,23.0,27.05
6-10,34.0,29.25
11-15,24.0,32.41
16-20,28.0,27.78
21-25,14.0,31.14
25+,19.0,26.82


In [65]:
current_employee_yos_gender = df.groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_yos_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
years_of_service_grouped,gender,Unnamed: 2_level_1
0,Female,82.0
0,Male,56.0
1-2,Female,132.0
1-2,Male,91.0
3-5,Female,96.0
3-5,Male,99.0
6-10,Female,51.0
6-10,Male,58.0
11-15,Female,41.0
11-15,Male,39.0


In [66]:
current_employee_yos_gender_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_gender_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Female,61.0,80000.0
0,Male,35.0,100000.0
1-2,Female,96.0,85780.0
1-2,Male,68.0,96737.8
3-5,Female,88.0,89724.74
3-5,Male,84.0,95265.36
6-10,Female,38.0,99499.7
6-10,Male,37.0,117843.5
11-15,Female,28.0,98141.6
11-15,Male,28.0,126910.89


In [67]:
current_employee_yos_gender_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Female,21.0,29.23
0,Male,21.0,22.05
1-2,Female,36.0,31.92
1-2,Male,23.0,26.04
3-5,Female,8.0,34.77
3-5,Male,15.0,22.98
6-10,Female,13.0,30.84
6-10,Male,21.0,25.16
11-15,Female,13.0,34.72
11-15,Male,11.0,29.92


In [68]:
current_employee_yos_race = df.groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_yos_race)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
years_of_service_grouped,race_ethnicity,Unnamed: 2_level_1
0,Asian (United States of America),15.0
0,Black or African American (United States of America),20.0
0,Hispanic or Latino (United States of America),10.0
0,Prefer Not to Disclose (United States of America),8.0
0,Two or More Races (United States of America),6.0
0,White (United States of America),77.0
1-2,Asian (United States of America),20.0
1-2,Black or African American (United States of America),30.0
1-2,Hispanic or Latino (United States of America),12.0
1-2,Two or More Races (United States of America),6.0


In [69]:
current_employee_yos_race_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_race_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Asian (United States of America),11.0,77000.0
0,Black or African American (United States of America),5.0,87000.0
0,Hispanic or Latino (United States of America),5.0,75000.0
0,White (United States of America),65.0,90000.0
1-2,Asian (United States of America),16.0,87780.0
1-2,Black or African American (United States of America),12.0,89780.0
1-2,Hispanic or Latino (United States of America),7.0,82000.0
1-2,Two or More Races (United States of America),5.0,68000.0
1-2,White (United States of America),115.0,92780.0
3-5,Asian (United States of America),15.0,92260.14


In [70]:
current_employee_yos_race_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Black or African American (United States of America),15.0,25.64
0,Hispanic or Latino (United States of America),5.0,28.21
0,White (United States of America),12.0,29.52
1-2,Black or African American (United States of America),18.0,25.75
1-2,Hispanic or Latino (United States of America),5.0,21.85
1-2,White (United States of America),31.0,33.46
3-5,Black or African American (United States of America),6.0,21.83
3-5,White (United States of America),11.0,29.23
6-10,Black or African American (United States of America),15.0,24.38
6-10,White (United States of America),15.0,31.92


### Employee performance evaluations

In [71]:
fifteen = pd.concat([fifteen1,fifteen2])
fifteenrating_gender = fifteen.groupby(['gender'])['performance_rating'].median().sort_values(ascending=False)
fifteenrating_gender

gender
Male     3.40
Female   3.40
Name: performance_rating, dtype: float64

In [72]:
sixteen = pd.concat([sixteen1,sixteen2])
sixteenrating_gender = sixteen.groupby(['gender'])['performance_rating'].median().sort_values(ascending=False)
sixteenrating_gender

gender
Male     3.30
Female   3.30
Name: performance_rating, dtype: float64

In [73]:
seventeen = pd.concat([seventeen1,seventeen2])
seventeenrating_gender = seventeen.groupby(['gender'])['performance_rating'].median().sort_values(ascending=False)
seventeenrating_gender

gender
Male     3.40
Female   3.40
Name: performance_rating, dtype: float64

In [74]:
eighteen = pd.concat([eighteen1,eighteen2])
eighteenrating_gender = eighteen.groupby(['gender'])['performance_rating'].median().sort_values(ascending=False)
eighteenrating_gender

gender
Male     3.40
Female   3.40
Name: performance_rating, dtype: float64

In [75]:
fifteenrating_race_ethnicity = fifteen.groupby(['race_ethnicity'])['performance_rating'].median().sort_values(ascending=False)
fifteenrating_race_ethnicity

race_ethnicity
American Indian or Alaska Native (United States of America)            3.50
White (United States of America)                                       3.40
Asian (United States of America)                                       3.40
Two or More Races (United States of America)                           3.30
Prefer Not to Disclose (United States of America)                      3.30
Native Hawaiian or Other Pacific Islander (United States of America)   3.25
Hispanic or Latino (United States of America)                          3.20
Black or African American (United States of America)                   3.20
Name: performance_rating, dtype: float64

In [76]:
sixteenrating_race_ethnicity = sixteen.groupby(['race_ethnicity'])['performance_rating'].median().sort_values(ascending=False)
sixteenrating_race_ethnicity

race_ethnicity
Native Hawaiian or Other Pacific Islander (United States of America)   3.70
White (United States of America)                                       3.40
Asian (United States of America)                                       3.35
Prefer Not to Disclose (United States of America)                      3.30
American Indian or Alaska Native (United States of America)            3.25
Two or More Races (United States of America)                           3.20
Black or African American (United States of America)                   3.20
Hispanic or Latino (United States of America)                          3.10
Name: performance_rating, dtype: float64

In [77]:
seventeenrating_race_ethnicity = seventeen.groupby(['race_ethnicity'])['performance_rating'].median().sort_values(ascending=False)
seventeenrating_race_ethnicity

race_ethnicity
American Indian or Alaska Native (United States of America)            3.55
Native Hawaiian or Other Pacific Islander (United States of America)   3.50
White (United States of America)                                       3.40
Prefer Not to Disclose (United States of America)                      3.40
Asian (United States of America)                                       3.40
Two or More Races (United States of America)                           3.30
Hispanic or Latino (United States of America)                          3.30
Black or African American (United States of America)                   3.20
Name: performance_rating, dtype: float64

In [78]:
eighteenrating_race_ethnicity = eighteen.groupby(['race_ethnicity'])['performance_rating'].median().sort_values(ascending=False)
eighteenrating_race_ethnicity

race_ethnicity
American Indian or Alaska Native (United States of America)            3.55
White (United States of America)                                       3.50
Native Hawaiian or Other Pacific Islander (United States of America)   3.40
Asian (United States of America)                                       3.40
Prefer Not to Disclose (United States of America)                      3.35
Two or More Races (United States of America)                           3.30
Hispanic or Latino (United States of America)                          3.30
Black or African American (United States of America)                   3.30
Name: performance_rating, dtype: float64

In [79]:
fifteenrating_gender_race = fifteen.groupby(['race_ethnicity','gender'])['performance_rating'].median().sort_values(ascending=False)
fifteenrating_gender_race

race_ethnicity                                                        gender
White (United States of America)                                      Male     3.50
Asian (United States of America)                                      Male     3.50
American Indian or Alaska Native (United States of America)           Female   3.50
White (United States of America)                                      Female   3.40
Asian (United States of America)                                      Female   3.40
American Indian or Alaska Native (United States of America)           Male     3.40
Two or More Races (United States of America)                          Female   3.30
Prefer Not to Disclose (United States of America)                     Female   3.30
Native Hawaiian or Other Pacific Islander (United States of America)  Male     3.30
Hispanic or Latino (United States of America)                         Female   3.30
Native Hawaiian or Other Pacific Islander (United States of America)  Female   3.20

In [80]:
sixteenrating_gender_race = sixteen.groupby(['race_ethnicity','gender'])['performance_rating'].median().sort_values(ascending=False)
sixteenrating_gender_race

race_ethnicity                                                        gender
Native Hawaiian or Other Pacific Islander (United States of America)  Female   4.10
White (United States of America)                                      Male     3.40
                                                                      Female   3.40
Asian (United States of America)                                      Female   3.40
Prefer Not to Disclose (United States of America)                     Female   3.30
Native Hawaiian or Other Pacific Islander (United States of America)  Male     3.30
Asian (United States of America)                                      Male     3.30
American Indian or Alaska Native (United States of America)           Female   3.30
Black or African American (United States of America)                  Female   3.25
Two or More Races (United States of America)                          Female   3.20
American Indian or Alaska Native (United States of America)           Male     3.20

In [81]:
seventeenrating_gender_race = seventeen.groupby(['race_ethnicity','gender'])['performance_rating'].median().sort_values(ascending=False)
seventeenrating_gender_race

race_ethnicity                                                        gender
Native Hawaiian or Other Pacific Islander (United States of America)  Female   4.00
American Indian or Alaska Native (United States of America)           Female   3.70
Two or More Races (United States of America)                          Male     3.50
Prefer Not to Disclose (United States of America)                     Female   3.50
White (United States of America)                                      Male     3.40
                                                                      Female   3.40
Asian (United States of America)                                      Female   3.40
Hispanic or Latino (United States of America)                         Male     3.30
                                                                      Female   3.30
Asian (United States of America)                                      Male     3.30
Two or More Races (United States of America)                          Female   3.25

In [82]:
eighteenrating_gender_race = eighteen.groupby(['race_ethnicity','gender'])['performance_rating'].median().sort_values(ascending=False)
eighteenrating_gender_race

race_ethnicity                                                        gender
American Indian or Alaska Native (United States of America)           Female   3.70
Prefer Not to Disclose (United States of America)                     Female   3.55
White (United States of America)                                      Male     3.50
                                                                      Female   3.40
Native Hawaiian or Other Pacific Islander (United States of America)  Male     3.40
Asian (United States of America)                                      Male     3.40
                                                                      Female   3.40
Two or More Races (United States of America)                          Male     3.35
                                                                      Female   3.30
Prefer Not to Disclose (United States of America)                     Male     3.30
Hispanic or Latino (United States of America)                         Male     3.30

### Employee pay changes

In [83]:
reason_for_change = reason_for_change_combined.groupby(['business_process_reason']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(reason_for_change)

Unnamed: 0_level_0,count_nonzero
business_process_reason,Unnamed: 1_level_1
Request Compensation Change > Adjustment > Contract Increase,2451
Merit > Performance > Annual Performance Appraisal,1729
Data Change > Data Change > Change Job Details,673
Transfer > Transfer > Move to another Manager,533
Request Compensation Change > Adjustment > Change Plan Assignment,435
Request Compensation Change > Adjustment > Market Adjustment,384
Promotion > Promotion > Promotion,359
Hire Employee > New Hire > Fill Vacancy,253
Hire Employee > New Hire > New Position,189
Request Compensation Change > Adjustment > Increased Job Responsibilities,72


In [84]:
reason_for_change_gender = reason_for_change_combined.groupby(['business_process_reason','gender']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(reason_for_change_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
business_process_reason,gender,Unnamed: 2_level_1
Request Compensation Change > Adjustment > Contract Increase,Female,1284
Request Compensation Change > Adjustment > Contract Increase,Male,1167
Merit > Performance > Annual Performance Appraisal,Female,878
Merit > Performance > Annual Performance Appraisal,Male,851
Data Change > Data Change > Change Job Details,Female,367
Data Change > Data Change > Change Job Details,Male,306
Transfer > Transfer > Move to another Manager,Male,299
Request Compensation Change > Adjustment > Change Plan Assignment,Female,288
Transfer > Transfer > Move to another Manager,Female,234
Request Compensation Change > Adjustment > Market Adjustment,Female,233


In [85]:
reason_for_change_race = reason_for_change_combined.groupby(['business_process_reason','race_ethnicity']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(reason_for_change_race)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
business_process_reason,race_ethnicity,Unnamed: 2_level_1
Request Compensation Change > Adjustment > Contract Increase,White (United States of America),1556
Merit > Performance > Annual Performance Appraisal,White (United States of America),1109
Request Compensation Change > Adjustment > Contract Increase,Black or African American (United States of America),508
Data Change > Data Change > Change Job Details,White (United States of America),432
Merit > Performance > Annual Performance Appraisal,Black or African American (United States of America),347
Transfer > Transfer > Move to another Manager,White (United States of America),288
Request Compensation Change > Adjustment > Change Plan Assignment,White (United States of America),266
Request Compensation Change > Adjustment > Market Adjustment,White (United States of America),255
Promotion > Promotion > Promotion,White (United States of America),213
Request Compensation Change > Adjustment > Contract Increase,Asian (United States of America),195


In [86]:
reason_for_change_race_gender = reason_for_change_combined.groupby(['business_process_reason','race_ethnicity','gender']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(reason_for_change_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero
business_process_reason,race_ethnicity,gender,Unnamed: 3_level_1
Request Compensation Change > Adjustment > Contract Increase,White (United States of America),Female,794
Request Compensation Change > Adjustment > Contract Increase,White (United States of America),Male,762
Merit > Performance > Annual Performance Appraisal,White (United States of America),Male,564
Merit > Performance > Annual Performance Appraisal,White (United States of America),Female,545
Request Compensation Change > Adjustment > Contract Increase,Black or African American (United States of America),Female,275
Request Compensation Change > Adjustment > Contract Increase,Black or African American (United States of America),Male,233
Data Change > Data Change > Change Job Details,White (United States of America),Female,225
Data Change > Data Change > Change Job Details,White (United States of America),Male,207
Merit > Performance > Annual Performance Appraisal,Black or African American (United States of America),Female,183
Request Compensation Change > Adjustment > Change Plan Assignment,White (United States of America),Female,178


## News

### Gender

In [87]:
current_news_gender_salaried = news_salaried.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_news_gender_salaried)

Unnamed: 0_level_0,count_nonzero
gender,Unnamed: 1_level_1
Female,284.0
Male,290.0


In [88]:
current_news_gender_hourly = news_hourly.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_news_gender_hourly)

Unnamed: 0_level_0,count_nonzero
gender,Unnamed: 1_level_1
Female,63.0
Male,33.0


In [89]:
current_news_gender_salaried_median = news_salaried.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_salaried_median)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,284.0,95595.02
Male,290.0,116064.57


In [90]:
current_news_gender_hourly_median = news_hourly.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_hourly_median)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,63.0,32.75
Male,33.0,33.33


In [91]:
current_news_gender_age_salaried = news_salaried.groupby(['gender'])['age'].median().sort_values(ascending=False)
current_news_gender_age_salaried

gender
Male     41.00
Female   35.00
Name: age, dtype: float64

In [92]:
current_news_gender_age_hourly = news_hourly.groupby(['gender'])['age'].median().sort_values(ascending=False)
current_news_gender_age_hourly

gender
Male     36.00
Female   31.00
Name: age, dtype: float64

In [93]:
current_news_gender_age_5_salary = news_salaried.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_age_5_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,19.0,64280.0
<25,Male,5.0,72000.0
25-29,Female,60.0,80000.0
25-29,Male,31.0,85500.0
30-34,Female,57.0,87000.0
30-34,Male,46.0,97827.86
35-39,Female,38.0,98891.57
35-39,Male,48.0,116030.0
40-44,Female,22.0,133200.02
40-44,Male,41.0,125000.0


In [94]:
current_news_gender_age_5_hourly = news_hourly.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_age_5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,12.0,31.38
25-29,Female,17.0,31.17
25-29,Male,6.0,20.96
30-34,Male,7.0,33.73
35-39,Female,5.0,31.92
40-44,Female,5.0,41.43
45-49,Female,6.0,48.55
50-54,Female,5.0,38.93
55-59,Male,5.0,34.89


In [95]:
current_news_gender_age_10_salary = news_salaried.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_age_10_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,19.0,64280.0
<25,Male,5.0,72000.0
25-34,Female,117.0,83146.67
25-34,Male,77.0,92500.0
35-44,Female,60.0,105691.31
35-44,Male,89.0,118785.0
45-54,Female,49.0,108864.49
45-54,Male,64.0,117981.79
55-64,Female,34.0,140423.62
55-64,Male,45.0,146541.57


In [96]:
current_news_gender_age_10_hourly = news_hourly.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_age_10_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,12.0,31.38
25-34,Female,21.0,31.17
25-34,Male,13.0,30.77
35-44,Female,10.0,33.12
35-44,Male,7.0,35.9
45-54,Female,11.0,41.38
55-64,Female,5.0,42.14
55-64,Male,7.0,33.41


In [97]:
current_news_gender_salaried_under_40 = news_salaried[news_salaried['age'] < 40].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_salaried_under_40)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,174.0,84030.0
Male,130.0,95890.0


In [98]:
current_news_gender_salaried_over_40 = news_salaried[news_salaried['age'] > 39].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_salaried_over_40)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,110.0,126000.0
Male,160.0,127764.51


In [99]:
current_news_gender_hourly_under_40 = news_hourly[news_hourly['age'] < 40].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_hourly_under_40)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,38.0,31.43
Male,18.0,32.05


In [100]:
current_news_gender_hourly_over_40 = news_hourly[news_hourly['age'] > 39].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_hourly_over_40)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,25.0,41.43
Male,15.0,33.38


### Race and ethnicity

In [101]:
current_news_race_salaried = news_salaried.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_news_race_salaried)

Unnamed: 0_level_0,count_nonzero
race_ethnicity,Unnamed: 1_level_1
White (United States of America),406.0
Black or African American (United States of America),48.0
Asian (United States of America),46.0
Hispanic or Latino (United States of America),28.0
Two or More Races (United States of America),14.0
Prefer Not to Disclose (United States of America),8.0


In [102]:
current_news_race_hourly = news_hourly.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_news_race_hourly)

Unnamed: 0_level_0,count_nonzero
race_ethnicity,Unnamed: 1_level_1
White (United States of America),64.0
Black or African American (United States of America),13.0
Asian (United States of America),11.0


In [103]:
current_news_race_group_salaried = news_salaried.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_news_race_group_salaried)

Unnamed: 0_level_0,count_nonzero
race_grouping,Unnamed: 1_level_1
white,406.0
person of color,139.0
unknown,29.0


In [104]:
current_news_race_group_hourly = news_hourly.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_news_race_group_hourly)

Unnamed: 0_level_0,count_nonzero
race_grouping,Unnamed: 1_level_1
white,64.0
person of color,30.0


In [105]:
current_news_race_median_salaried = news_salaried.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_median_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),406.0,106212.1
Black or African American (United States of America),48.0,97276.46
Asian (United States of America),46.0,95205.02
Hispanic or Latino (United States of America),28.0,82890.0
Prefer Not to Disclose (United States of America),8.0,82140.0
Two or More Races (United States of America),14.0,79860.0


In [106]:
current_news_race_median_hourly = news_hourly.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_median_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),64.0,33.59
Asian (United States of America),11.0,31.68
Black or African American (United States of America),13.0,29.37


In [107]:
current_news_race_group_median_salaried = news_salaried.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_group_median_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
unknown,29.0,134780.0
white,406.0,106212.1
person of color,139.0,92080.0


In [108]:
current_news_race_group_median_hourly = news_hourly.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_group_median_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
white,64.0,33.59
person of color,30.0,30.07


In [109]:
current_news_race_age_salaried = news_salaried.groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_news_race_age_salaried

race_ethnicity
American Indian or Alaska Native (United States of America)            49.50
Native Hawaiian or Other Pacific Islander (United States of America)   43.00
White (United States of America)                                       40.00
Black or African American (United States of America)                   39.50
Hispanic or Latino (United States of America)                          37.00
Asian (United States of America)                                       33.00
Prefer Not to Disclose (United States of America)                      30.50
Two or More Races (United States of America)                           28.00
Name: age, dtype: float64

In [110]:
current_news_race_age_hourly = news_hourly.groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_news_race_age_hourly

race_ethnicity
American Indian or Alaska Native (United States of America)   69.00
White (United States of America)                              39.50
Asian (United States of America)                              36.00
Black or African American (United States of America)          28.00
Hispanic or Latino (United States of America)                 26.00
Prefer Not to Disclose (United States of America)             23.00
Two or More Races (United States of America)                  22.50
Name: age, dtype: float64

In [111]:
current_news_race_age_5_salary = news_salaried.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_age_5_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Asian (United States of America),5.0,65780.0
<25,White (United States of America),12.0,65140.0
25-29,Asian (United States of America),11.0,77000.0
25-29,Black or African American (United States of America),6.0,81000.0
25-29,Two or More Races (United States of America),6.0,75690.0
25-29,White (United States of America),59.0,81756.58
30-34,Asian (United States of America),10.0,95780.0
30-34,Black or African American (United States of America),9.0,88132.61
30-34,Hispanic or Latino (United States of America),6.0,80596.26
30-34,White (United States of America),66.0,92640.0


In [112]:
current_news_race_age_5_hourly = news_hourly.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_age_5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),7.0,18.5
25-29,Black or African American (United States of America),8.0,30.15
25-29,White (United States of America),11.0,30.77
30-34,White (United States of America),9.0,33.73
35-39,White (United States of America),5.0,34.72
40-44,White (United States of America),7.0,41.43
45-49,White (United States of America),6.0,48.55
50-54,White (United States of America),5.0,38.93
55-59,White (United States of America),6.0,33.93
60-64,White (United States of America),5.0,38.82


In [113]:
current_news_race_age_10_salary = news_salaried.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_age_10_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Asian (United States of America),5.0,65780.0
<25,White (United States of America),12.0,65140.0
25-34,Asian (United States of America),21.0,86000.0
25-34,Black or African American (United States of America),15.0,87000.0
25-34,Hispanic or Latino (United States of America),10.0,81249.94
25-34,Prefer Not to Disclose (United States of America),5.0,78500.0
25-34,Two or More Races (United States of America),9.0,76380.0
25-34,White (United States of America),125.0,86000.0
35-44,Asian (United States of America),11.0,108324.02
35-44,Black or African American (United States of America),13.0,118530.0


In [114]:
current_news_race_age_10_hourly = news_hourly.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_age_10_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),7.0,18.5
25-34,Black or African American (United States of America),8.0,30.15
25-34,White (United States of America),20.0,31.26
35-44,White (United States of America),12.0,35.31
45-54,White (United States of America),11.0,41.38
55-64,White (United States of America),11.0,34.89


In [115]:
current_news_race_group_age_5_salary = news_salaried.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_group_age_5_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,11.0,63780.0
<25,white,12.0,65140.0
25-29,person of color,27.0,80000.0
25-29,unknown,5.0,88280.0
25-29,white,59.0,81756.58
30-34,person of color,28.0,86982.54
30-34,unknown,9.0,108000.0
30-34,white,66.0,92640.0
35-39,person of color,23.0,99238.5
35-39,white,61.0,105780.0


In [116]:
current_news_race_group_age_5_hourly = news_hourly.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_group_age_5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,6.0,29.49
<25,white,7.0,18.5
25-29,person of color,12.0,27.07
25-29,white,11.0,30.77
30-34,white,9.0,33.73
35-39,white,5.0,34.72
40-44,white,7.0,41.43
45-49,white,6.0,48.55
50-54,white,5.0,38.93
55-59,white,6.0,33.93


In [117]:
current_news_race_group_age_10_salary = news_salaried.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_group_age_10_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,11.0,63780.0
<25,white,12.0,65140.0
25-34,person of color,55.0,83340.0
25-34,unknown,14.0,106890.0
25-34,white,125.0,86000.0
35-44,person of color,38.0,102890.0
35-44,unknown,7.0,140280.0
35-44,white,104.0,115258.47
45-54,person of color,26.0,106932.24
45-54,white,84.0,116687.17


In [118]:
current_news_race_group_age_10_hourly = news_hourly.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_group_age_10_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,6.0,29.49
<25,white,7.0,18.5
25-34,person of color,13.0,29.12
25-34,white,20.0,31.26
35-44,person of color,5.0,23.93
35-44,white,12.0,35.31
45-54,white,11.0,41.38
55-64,white,11.0,34.89


In [119]:
current_news_race_under_40_salaried = news_salaried[news_salaried['age'] < 40].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_under_40_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),198.0,90780.0
Black or African American (United States of America),24.0,87970.47
Asian (United States of America),33.0,87000.0
Hispanic or Latino (United States of America),19.0,79618.25
Prefer Not to Disclose (United States of America),6.0,77750.0
Two or More Races (United States of America),13.0,76380.0


In [120]:
current_news_race_over_40_salaried = news_salaried[news_salaried['age'] > 39].groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_over_40_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
unknown,12.0,151407.91
white,208.0,128484.46
person of color,50.0,110844.65


In [121]:
current_news_race_under_40_hourly = news_hourly[news_hourly['age'] < 40].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_under_40_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),32.0,31.96
Black or African American (United States of America),10.0,29.95
Asian (United States of America),7.0,25.02


In [122]:
current_news_race_over_40_hourly = news_hourly[news_hourly['age'] > 39].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_over_40_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),32.0,39.86


### Gender x race/ethnicity

In [123]:
current_news_race_gender_salaried = news_salaried.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_news_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_ethnicity,gender,Unnamed: 2_level_1
Asian (United States of America),Female,34.0
Asian (United States of America),Male,12.0
Black or African American (United States of America),Female,24.0
Black or African American (United States of America),Male,24.0
Hispanic or Latino (United States of America),Female,14.0
Hispanic or Latino (United States of America),Male,14.0
Prefer Not to Disclose (United States of America),Male,5.0
Two or More Races (United States of America),Female,9.0
Two or More Races (United States of America),Male,5.0
White (United States of America),Female,188.0


In [124]:
current_news_race_gender_hourly = news_hourly.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_news_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_ethnicity,gender,Unnamed: 2_level_1
Asian (United States of America),Female,8.0
Black or African American (United States of America),Female,8.0
Black or African American (United States of America),Male,5.0
White (United States of America),Female,41.0
White (United States of America),Male,23.0


In [125]:
current_news_race_gender_median_salaried = news_salaried.groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_median_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,83.0,86511.34
person of color,Male,56.0,101575.0
unknown,Female,13.0,129970.48
unknown,Male,16.0,135280.0
white,Female,188.0,99640.0
white,Male,218.0,117451.77


In [126]:
current_news_race_gender_median_hourly = news_hourly.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_median_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Asian (United States of America),Female,8.0,29.99
Black or African American (United States of America),Female,8.0,30.97
Black or African American (United States of America),Male,5.0,20.91
White (United States of America),Female,41.0,34.72
White (United States of America),Male,23.0,33.38


In [127]:
current_news_race_gender_under_40_salaried = news_salaried[news_salaried['age'] < 40].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_under_40_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Asian (United States of America),Female,25.0,86000.0
Asian (United States of America),Male,8.0,102890.0
Black or African American (United States of America),Female,16.0,85390.0
Black or African American (United States of America),Male,8.0,127890.0
Hispanic or Latino (United States of America),Female,12.0,80059.12
Hispanic or Latino (United States of America),Male,7.0,75000.0
Two or More Races (United States of America),Female,9.0,75000.0
White (United States of America),Female,105.0,85780.0
White (United States of America),Male,93.0,95655.73


In [128]:
current_news_race_gender_under_40_hourly = news_hourly[news_hourly['age'] < 40].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_under_40_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Asian (United States of America),Female,5.0,25.02
Black or African American (United States of America),Female,6.0,30.97
White (United States of America),Female,21.0,31.92
White (United States of America),Male,11.0,33.73


In [129]:
current_news_race_gender_over_40_salaried = news_salaried[news_salaried['age'] > 39].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_over_40_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Asian (United States of America),Female,9.0,111761.01
Black or African American (United States of America),Female,8.0,115002.24
Black or African American (United States of America),Male,16.0,107464.14
Hispanic or Latino (United States of America),Male,7.0,126580.0
White (United States of America),Female,83.0,122916.97
White (United States of America),Male,125.0,130000.0


In [130]:
current_news_race_gender_over_40_hourly = news_hourly[news_hourly['age'] > 39].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_over_40_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
White (United States of America),Female,20.0,42.39
White (United States of America),Male,12.0,33.17


### Years of service

In [131]:
current_news_yos_salary = news_salaried.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_salary)

Unnamed: 0_level_0,count_nonzero,median
years_of_service_grouped,Unnamed: 1_level_1,Unnamed: 2_level_1
0,65.0,90000.0
1-2,128.0,93780.0
3-5,146.0,92170.07
6-10,60.0,112925.5
11-15,50.0,110823.23
16-20,68.0,127654.56
21-25,24.0,143197.97
25+,33.0,139831.3


In [132]:
current_news_yos_hourly = news_hourly.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_hourly)

Unnamed: 0_level_0,count_nonzero,median
years_of_service_grouped,Unnamed: 1_level_1,Unnamed: 2_level_1
0,16.0,29.49
1-2,26.0,32.71
3-5,9.0,32.97
6-10,15.0,35.91
11-15,10.0,36.54
16-20,11.0,32.31
21-25,5.0,38.93


In [133]:
current_news_yos_gender_salary = news_salaried.groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_gender_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Female,39.0,80000.0
0,Male,26.0,105000.0
1-2,Female,70.0,87390.0
1-2,Male,58.0,101787.8
3-5,Female,72.0,88530.0
3-5,Male,74.0,95265.36
6-10,Female,26.0,100640.36
6-10,Male,34.0,119561.75
11-15,Female,25.0,98544.65
11-15,Male,25.0,129780.0


In [134]:
current_news_yos_gender_hourly = news_hourly.groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Female,11.0,28.21
0,Male,5.0,30.77
1-2,Female,18.0,32.36
1-2,Male,8.0,33.35
3-5,Male,6.0,32.47
6-10,Female,8.0,31.38
6-10,Male,7.0,36.7
11-15,Female,9.0,38.36
16-20,Female,7.0,42.14


In [135]:
current_news_yos_race_salary = news_salaried.groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_race_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Asian (United States of America),7.0,77000.0
0,White (United States of America),42.0,100000.0
1-2,Asian (United States of America),13.0,84780.0
1-2,Black or African American (United States of America),10.0,89780.0
1-2,Hispanic or Latino (United States of America),6.0,82890.0
1-2,Two or More Races (United States of America),5.0,68000.0
1-2,White (United States of America),85.0,95780.0
3-5,Asian (United States of America),12.0,93630.07
3-5,Black or African American (United States of America),12.0,97276.46
3-5,Hispanic or Latino (United States of America),14.0,80809.07


In [136]:
current_news_yos_race_hourly = news_hourly.groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
0,White (United States of America),6.0,29.49
1-2,White (United States of America),18.0,32.84
3-5,White (United States of America),6.0,32.47
6-10,White (United States of America),9.0,35.91
11-15,White (United States of America),8.0,39.87
16-20,White (United States of America),9.0,42.14
21-25,White (United States of America),5.0,38.93


In [137]:
current_news_yos_race_gender_salary = news_salaried.groupby(['years_of_service_grouped','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_race_gender_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
years_of_service_grouped,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
0,person of color,Female,12.0,76000.0
0,person of color,Male,6.0,93500.0
0,white,Female,25.0,85000.0
0,white,Male,17.0,110000.0
1-2,person of color,Female,25.0,82000.0
1-2,person of color,Male,9.0,113280.0
1-2,unknown,Male,5.0,117780.0
1-2,white,Female,41.0,90780.0
1-2,white,Male,44.0,99780.0
3-5,person of color,Female,25.0,86965.08


In [138]:
current_news_yos_race_gender_hourly = news_hourly.groupby(['years_of_service_grouped','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
years_of_service_grouped,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
0,person of color,Female,6.0,29.49
1-2,person of color,Female,6.0,31.59
1-2,white,Female,12.0,32.71
1-2,white,Male,6.0,33.35
6-10,white,Male,5.0,35.91
11-15,white,Female,7.0,41.38
16-20,white,Female,6.0,42.39


### Age

In [139]:
current_median_news_age_5_salaried = news_salaried.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_salaried)

Unnamed: 0_level_0,count_nonzero,median
age_group_5,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,24.0,64640.0
25-29,91.0,80500.0
30-34,103.0,90780.0
35-39,86.0,105691.31
40-44,63.0,125768.93
45-49,43.0,102795.6
50-54,70.0,115769.96
55-59,51.0,147780.0
60-64,28.0,131216.77
65+,15.0,157095.42


In [140]:
current_median_news_age_5_hourly = news_hourly.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_hourly)

Unnamed: 0_level_0,count_nonzero,median
age_group_5,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,14.0,29.49
25-29,23.0,30.77
30-34,11.0,33.73
35-39,8.0,33.92
40-44,9.0,33.13
45-49,7.0,50.38
50-54,7.0,33.38
55-59,7.0,34.89
60-64,5.0,38.82
65+,5.0,42.64


In [141]:
current_median_news_age_10_salaried = news_salaried.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_salaried)

Unnamed: 0_level_0,count_nonzero,median
age_group_10,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,24.0,64640.0
25-34,194.0,85890.0
35-44,149.0,115236.94
45-54,113.0,114803.0
55-64,79.0,141015.94
65+,15.0,157095.42


In [142]:
current_median_news_age_10_hourly = news_hourly.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_hourly)

Unnamed: 0_level_0,count_nonzero,median
age_group_10,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,14.0,29.49
25-34,34.0,31.01
35-44,17.0,33.13
45-54,14.0,41.09
55-64,12.0,35.8
65+,5.0,42.64


In [143]:
current_news_age_5_yos_salary = news_salaried.groupby(['age_group_5','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_age_5_yos_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,years_of_service_grouped,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,0,9.0,66000.0
<25,1-2,13.0,63780.0
25-29,0,19.0,82000.0
25-29,1-2,30.0,78500.0
25-29,3-5,41.0,81756.58
30-34,0,13.0,87000.0
30-34,1-2,28.0,93528.23
30-34,3-5,43.0,88780.0
30-34,6-10,15.0,82311.85
35-39,0,9.0,110000.0


In [144]:
current_news_age_5_yos_hourly = news_hourly.groupby(['age_group_5','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_age_5_yos_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,years_of_service_grouped,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,0,6.0,24.11
<25,1-2,8.0,32.0
25-29,0,8.0,29.49
25-29,1-2,12.0,32.2


In [145]:
current_news_age_10_yos_salary = news_salaried.groupby(['age_group_10','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_age_10_yos_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,years_of_service_grouped,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,0,9.0,66000.0
<25,1-2,13.0,63780.0
25-34,0,32.0,85000.0
25-34,1-2,58.0,86280.0
25-34,3-5,84.0,85890.0
25-34,6-10,16.0,94675.93
35-44,0,16.0,125000.0
35-44,1-2,36.0,116530.0
35-44,3-5,38.0,110934.68
35-44,6-10,25.0,115236.94


In [146]:
current_news_age_10_yos_hourly = news_hourly.groupby(['age_group_10','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_age_10_yos_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,years_of_service_grouped,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,0,6.0,24.11
<25,1-2,8.0,32.0
25-34,0,9.0,30.77
25-34,1-2,16.0,32.71
25-34,3-5,6.0,29.99
35-44,11-15,6.0,33.92


In [147]:
current_median_news_age_5_gender_salaried = news_salaried.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,19.0,64280.0
<25,Male,5.0,72000.0
25-29,Female,60.0,80000.0
25-29,Male,31.0,85500.0
30-34,Female,57.0,87000.0
30-34,Male,46.0,97827.86
35-39,Female,38.0,98891.57
35-39,Male,48.0,116030.0
40-44,Female,22.0,133200.02
40-44,Male,41.0,125000.0


In [148]:
current_median_news_age_5_gender_hourly = news_hourly.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,12.0,31.38
25-29,Female,17.0,31.17
25-29,Male,6.0,20.96
30-34,Male,7.0,33.73
35-39,Female,5.0,31.92
40-44,Female,5.0,41.43
45-49,Female,6.0,48.55
50-54,Female,5.0,38.93
55-59,Male,5.0,34.89


In [149]:
current_median_news_age_10_gender_salaried = news_salaried.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,19.0,64280.0
<25,Male,5.0,72000.0
25-34,Female,117.0,83146.67
25-34,Male,77.0,92500.0
35-44,Female,60.0,105691.31
35-44,Male,89.0,118785.0
45-54,Female,49.0,108864.49
45-54,Male,64.0,117981.79
55-64,Female,34.0,140423.62
55-64,Male,45.0,146541.57


In [150]:
current_median_news_age_10_gender_hourly = news_hourly.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,12.0,31.38
25-34,Female,21.0,31.17
25-34,Male,13.0,30.77
35-44,Female,10.0,33.12
35-44,Male,7.0,35.9
45-54,Female,11.0,41.38
55-64,Female,5.0,42.14
55-64,Male,7.0,33.41


In [151]:
current_median_news_age_5_race_salaried = news_salaried.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Asian (United States of America),5.0,65780.0
<25,White (United States of America),12.0,65140.0
25-29,Asian (United States of America),11.0,77000.0
25-29,Black or African American (United States of America),6.0,81000.0
25-29,Two or More Races (United States of America),6.0,75690.0
25-29,White (United States of America),59.0,81756.58
30-34,Asian (United States of America),10.0,95780.0
30-34,Black or African American (United States of America),9.0,88132.61
30-34,Hispanic or Latino (United States of America),6.0,80596.26
30-34,White (United States of America),66.0,92640.0


In [152]:
current_median_news_age_5_race_hourly = news_hourly.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),7.0,18.5
25-29,Black or African American (United States of America),8.0,30.15
25-29,White (United States of America),11.0,30.77
30-34,White (United States of America),9.0,33.73
35-39,White (United States of America),5.0,34.72
40-44,White (United States of America),7.0,41.43
45-49,White (United States of America),6.0,48.55
50-54,White (United States of America),5.0,38.93
55-59,White (United States of America),6.0,33.93
60-64,White (United States of America),5.0,38.82


In [153]:
current_median_news_age_5_race_group_salaried = news_salaried.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_group_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,11.0,63780.0
<25,white,12.0,65140.0
25-29,person of color,27.0,80000.0
25-29,unknown,5.0,88280.0
25-29,white,59.0,81756.58
30-34,person of color,28.0,86982.54
30-34,unknown,9.0,108000.0
30-34,white,66.0,92640.0
35-39,person of color,23.0,99238.5
35-39,white,61.0,105780.0


In [154]:
current_median_news_age_5_race_group_hourly = news_hourly.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_group_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,6.0,29.49
<25,white,7.0,18.5
25-29,person of color,12.0,27.07
25-29,white,11.0,30.77
30-34,white,9.0,33.73
35-39,white,5.0,34.72
40-44,white,7.0,41.43
45-49,white,6.0,48.55
50-54,white,5.0,38.93
55-59,white,6.0,33.93


In [155]:
current_median_news_age_10_race_salaried = news_salaried.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Asian (United States of America),5.0,65780.0
<25,White (United States of America),12.0,65140.0
25-34,Asian (United States of America),21.0,86000.0
25-34,Black or African American (United States of America),15.0,87000.0
25-34,Hispanic or Latino (United States of America),10.0,81249.94
25-34,Prefer Not to Disclose (United States of America),5.0,78500.0
25-34,Two or More Races (United States of America),9.0,76380.0
25-34,White (United States of America),125.0,86000.0
35-44,Asian (United States of America),11.0,108324.02
35-44,Black or African American (United States of America),13.0,118530.0


In [156]:
current_median_news_age_10_race_hourly = news_hourly.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),7.0,18.5
25-34,Black or African American (United States of America),8.0,30.15
25-34,White (United States of America),20.0,31.26
35-44,White (United States of America),12.0,35.31
45-54,White (United States of America),11.0,41.38
55-64,White (United States of America),11.0,34.89


In [157]:
current_median_news_age_10_race_group_salaried = news_salaried.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_group_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,11.0,63780.0
<25,white,12.0,65140.0
25-34,person of color,55.0,83340.0
25-34,unknown,14.0,106890.0
25-34,white,125.0,86000.0
35-44,person of color,38.0,102890.0
35-44,unknown,7.0,140280.0
35-44,white,104.0,115258.47
45-54,person of color,26.0,106932.24
45-54,white,84.0,116687.17


In [158]:
current_median_news_age_10_race_group_hourly = news_hourly.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_group_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,6.0,29.49
<25,white,7.0,18.5
25-34,person of color,13.0,29.12
25-34,white,20.0,31.26
35-44,person of color,5.0,23.93
35-44,white,12.0,35.31
45-54,white,11.0,41.38
55-64,white,11.0,34.89


In [159]:
current_median_news_age_5_race_gender_salaried = news_salaried.groupby(['age_group_5','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_5,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,Asian (United States of America),Female,5.0,65780.0
<25,White (United States of America),Female,9.0,64280.0
25-29,Asian (United States of America),Female,9.0,77000.0
25-29,Black or African American (United States of America),Female,5.0,80000.0
25-29,White (United States of America),Female,38.0,81878.29
25-29,White (United States of America),Male,21.0,76780.0
30-34,Asian (United States of America),Female,8.0,100780.0
30-34,Black or African American (United States of America),Female,5.0,85780.0
30-34,Hispanic or Latino (United States of America),Female,6.0,80596.26
30-34,White (United States of America),Female,32.0,87660.0


In [160]:
current_median_news_age_5_race_gender_hourly = news_hourly.groupby(['age_group_5','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_5,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,White (United States of America),Female,5.0,32.0
25-29,White (United States of America),Female,10.0,31.23
30-34,White (United States of America),Male,6.0,34.43
45-49,White (United States of America),Female,6.0,48.55
55-59,White (United States of America),Male,5.0,34.89


In [161]:
current_median_news_age_5_race_group_gender_salaried = news_salaried.groupby(['age_group_5','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_5,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,person of color,Female,10.0,64390.0
<25,white,Female,9.0,64280.0
25-29,person of color,Female,19.0,77000.0
25-29,person of color,Male,8.0,88540.0
25-29,white,Female,38.0,81878.29
25-29,white,Male,21.0,76780.0
30-34,person of color,Female,22.0,86372.54
30-34,person of color,Male,6.0,106000.0
30-34,unknown,Male,6.0,120390.0
30-34,white,Female,32.0,87660.0


In [162]:
current_median_news_age_5_race_group_gender_hourly = news_hourly.groupby(['age_group_5','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_group_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_5,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,person of color,Female,6.0,29.49
<25,white,Female,5.0,32.0
25-29,person of color,Female,7.0,31.17
25-29,person of color,Male,5.0,20.91
25-29,white,Female,10.0,31.23
30-34,white,Male,6.0,34.43
45-49,white,Female,6.0,48.55
55-59,white,Male,5.0,34.89


In [163]:
current_median_news_age_10_race_gender_salaried = news_salaried.groupby(['age_group_10','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_10,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,Asian (United States of America),Female,5.0,65780.0
<25,White (United States of America),Female,9.0,64280.0
25-34,Asian (United States of America),Female,17.0,87000.0
25-34,Black or African American (United States of America),Female,10.0,81000.0
25-34,Black or African American (United States of America),Male,5.0,140000.0
25-34,Hispanic or Latino (United States of America),Female,8.0,81249.94
25-34,Two or More Races (United States of America),Female,6.0,75690.0
25-34,White (United States of America),Female,70.0,84640.0
25-34,White (United States of America),Male,55.0,90780.0
35-44,Asian (United States of America),Female,7.0,99238.5


In [164]:
current_median_news_age_10_race_gender_hourly = news_hourly.groupby(['age_group_10','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_10,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,White (United States of America),Female,5.0,32.0
25-34,White (United States of America),Female,13.0,30.84
25-34,White (United States of America),Male,7.0,33.73
35-44,White (United States of America),Female,7.0,34.72
35-44,White (United States of America),Male,5.0,35.9
45-54,White (United States of America),Female,9.0,44.46
55-64,White (United States of America),Male,7.0,33.41


In [165]:
current_median_news_age_10_race_group_gender_salaried = news_salaried.groupby(['age_group_10','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_10,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,person of color,Female,10.0,64390.0
<25,white,Female,9.0,64280.0
25-34,person of color,Female,41.0,81999.88
25-34,person of color,Male,14.0,89540.0
25-34,unknown,Female,6.0,92140.0
25-34,unknown,Male,8.0,120390.0
25-34,white,Female,70.0,84640.0
25-34,white,Male,55.0,90780.0
35-44,person of color,Female,19.0,100000.0
35-44,person of color,Male,19.0,113280.0


In [166]:
current_median_news_age_10_race_group_gender_hourly = news_hourly.groupby(['age_group_10','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_group_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_10,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,person of color,Female,6.0,29.49
<25,white,Female,5.0,32.0
25-34,person of color,Female,7.0,31.17
25-34,person of color,Male,6.0,20.96
25-34,white,Female,13.0,30.84
25-34,white,Male,7.0,33.73
35-44,white,Female,7.0,34.72
35-44,white,Male,5.0,35.9
45-54,white,Female,9.0,44.46
55-64,white,Male,7.0,33.41


### Desks

In [167]:
current_news_median_desk_salaried = news_salaried.groupby(['desk']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_salaried)

Unnamed: 0_level_0,count_nonzero,median
desk,Unnamed: 1_level_1,Unnamed: 2_level_1
National,106.0,149520.5
Foreign,25.0,135000.0
Financial,38.0,133509.94
Style,45.0,107170.81
Local,65.0,105780.0
Editorial,33.0,105000.0
Graphics,15.0,100780.0
Universal Desk,8.0,100444.28
Sports,37.0,100000.0
Outlook,6.0,99937.5


In [168]:
current_news_median_desk_hourly = news_hourly.groupby(['desk']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_hourly)

Unnamed: 0_level_0,count_nonzero,median
desk,Unnamed: 1_level_1,Unnamed: 2_level_1
Audio,6.0,39.75
Universal Desk,8.0,38.67
non-newsroom,7.0,37.58
Multiplatform,16.0,34.09
Editorial,5.0,32.31
National,12.0,31.74
Local,5.0,26.46
Style,9.0,21.77
Sports,11.0,20.91
Operations,7.0,15.59


In [169]:
current_news_median_desk_gender_salaried = news_salaried.groupby(['desk','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
desk,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
National,Male,57.0,169780.0
Foreign,Male,14.0,145390.0
Editorial,Male,18.0,140271.26
National,Female,49.0,139780.0
Financial,Male,25.0,136467.5
Foreign,Female,11.0,129970.48
Financial,Female,13.0,125000.0
Local,Male,31.0,118850.0
Style,Male,20.0,115036.81
Sports,Female,9.0,115000.0


In [170]:
current_news_median_desk_gender_hourly = news_hourly.groupby(['desk','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
desk,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Audio,Female,5.0,41.03
Universal Desk,Female,5.0,35.9
Multiplatform,Female,13.0,34.72
Sports,Male,8.0,32.97
National,Female,8.0,32.71
Style,Female,8.0,26.73


In [171]:
current_news_median_desk_race_salaried = news_salaried.groupby(['desk','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
desk,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
National,white,84.0,168780.0
Foreign,unknown,20.0,137500.0
Financial,white,29.0,136467.5
National,person of color,21.0,130780.0
Editorial,white,27.0,120000.27
Financial,person of color,6.0,115570.0
Style,white,38.0,112371.03
Local,white,46.0,107707.84
Sports,person of color,7.0,105000.0
Universal Desk,white,5.0,104393.45


In [172]:
current_news_median_desk_race_hourly = news_hourly.groupby(['desk','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
desk,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
Style,White (United States of America),5.0,38.93
Universal Desk,White (United States of America),6.0,38.67
Multiplatform,White (United States of America),12.0,36.54
Sports,White (United States of America),9.0,32.97
National,White (United States of America),9.0,32.71


In [173]:
current_news_median_desk_race_gender_salaried = news_salaried.groupby(['desk','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
desk,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
National,White (United States of America),Male,46.0,175374.24
Financial,White (United States of America),Male,21.0,140387.17
Editorial,White (United States of America),Male,16.0,140271.26
National,White (United States of America),Female,38.0,139733.72
National,Black or African American (United States of America),Male,8.0,135390.0
National,Asian (United States of America),Female,8.0,132780.0
Sports,White (United States of America),Female,6.0,132014.99
Financial,White (United States of America),Female,8.0,130390.0
Local,White (United States of America),Male,25.0,119553.2
non-newsroom,White (United States of America),Male,12.0,115640.0


In [174]:
current_news_median_desk_race_gender_hourly = news_hourly.groupby(['desk','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
desk,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
Style,White (United States of America),Female,5.0,38.93
Multiplatform,White (United States of America),Female,9.0,38.36
Sports,White (United States of America),Male,7.0,32.97
National,White (United States of America),Female,6.0,32.71


In [175]:
current_news_median_desk_race_group_gender_salaried = news_salaried.groupby(['desk','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
desk,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
National,white,Male,46.0,175374.24
Financial,white,Male,21.0,140387.17
Editorial,white,Male,16.0,140271.26
Foreign,unknown,Male,11.0,140000.0
National,white,Female,38.0,139733.72
Foreign,unknown,Female,9.0,135000.0
National,person of color,Female,10.0,132780.0
Sports,white,Female,6.0,132014.99
National,person of color,Male,11.0,130780.0
Financial,white,Female,8.0,130390.0


In [176]:
current_news_median_desk_race_group_gender_hourly = news_hourly.groupby(['desk','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_group_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
desk,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
Style,white,Female,5.0,38.93
Multiplatform,white,Female,9.0,38.36
Sports,white,Male,7.0,32.97
National,white,Female,6.0,32.71


In [177]:
current_news_median_desk_race_gender_age5_salaried = news_salaried.groupby(['desk','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
desk,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
National,White (United States of America),Male,40-44,9.0,170000.0
National,White (United States of America),Male,30-34,9.0,169780.0
National,White (United States of America),Female,50-54,5.0,167780.0
National,White (United States of America),Female,55-59,6.0,162854.23
National,White (United States of America),Female,40-44,5.0,160000.0
National,White (United States of America),Male,35-39,10.0,148640.0
Sports,White (United States of America),Male,35-39,7.0,147300.0
Financial,White (United States of America),Male,35-39,5.0,144755.0
Local,White (United States of America),Male,55-59,6.0,127654.56
National,White (United States of America),Female,25-29,5.0,125000.0


In [178]:
current_news_median_desk_race_gender_age5_hourly = news_hourly.groupby(['desk','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_gender_age5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
desk,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1


In [179]:
current_news_median_desk_race_group_gender_age5_salaried = news_salaried.groupby(['desk','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_group_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
desk,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
National,white,Male,40-44,9.0,170000.0
National,white,Male,30-34,9.0,169780.0
National,white,Female,50-54,5.0,167780.0
National,white,Female,55-59,6.0,162854.23
National,white,Female,40-44,5.0,160000.0
National,white,Male,35-39,10.0,148640.0
Sports,white,Male,35-39,7.0,147300.0
Financial,white,Male,35-39,5.0,144755.0
Local,white,Male,55-59,6.0,127654.56
Foreign,unknown,Male,30-34,5.0,125000.0


In [180]:
current_news_median_desk_race_group_gender_age5_hourly = news_hourly.groupby(['desk','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_group_gender_age5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
desk,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1


In [181]:
current_news_median_desk_tier_salaried = news_salaried.groupby(['tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_salaried)

Unnamed: 0_level_0,count_nonzero,median
tier,Unnamed: 1_level_1,Unnamed: 2_level_1
Tier 1,169.0,140387.17
Tier 2,209.0,105000.0
other,29.0,95780.0
Tier 3,131.0,86000.0
Tier 4,36.0,75000.0


In [182]:
current_news_median_desk_tier_gender_salaried = news_salaried.groupby(['tier','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
tier,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Tier 1,Male,96.0,152115.94
Tier 1,Female,73.0,135320.05
Tier 2,Male,112.0,112755.06
other,Male,16.0,102890.0
Tier 2,Female,97.0,99251.6
other,Female,13.0,95000.0
Tier 3,Male,56.0,90780.0
Tier 3,Female,75.0,81999.88
Tier 4,Female,26.0,75000.0
Tier 4,Male,10.0,74086.11


In [183]:
current_news_median_desk_tier_race_salaried = news_salaried.groupby(['tier','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
tier,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
Tier 1,White (United States of America),116.0,159150.0
Tier 1,Black or African American (United States of America),11.0,140000.0
Tier 1,Asian (United States of America),15.0,125780.0
Tier 2,White (United States of America),159.0,107170.81
Tier 2,Black or African American (United States of America),16.0,101702.73
other,White (United States of America),22.0,101390.0
Tier 2,Asian (United States of America),14.0,93835.1
Tier 2,Hispanic or Latino (United States of America),11.0,92080.0
Tier 2,Two or More Races (United States of America),6.0,89107.5
Tier 3,White (United States of America),86.0,88780.0


In [184]:
current_news_median_desk_tier_race_gender_salaried = news_salaried.groupby(['tier','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
tier,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
Tier 1,White (United States of America),Male,68.0,169870.29
Tier 1,White (United States of America),Female,48.0,135824.85
Tier 1,Black or African American (United States of America),Male,8.0,135390.0
Tier 1,Asian (United States of America),Female,10.0,128430.0
Tier 1,Asian (United States of America),Male,5.0,125000.0
Tier 2,White (United States of America),Male,93.0,117843.5
Tier 2,Hispanic or Latino (United States of America),Male,5.0,117780.0
Tier 2,Black or African American (United States of America),Male,7.0,116349.15
other,White (United States of America),Male,12.0,115640.0
Tier 2,White (United States of America),Female,66.0,102423.86


In [185]:
current_news_median_desk_tier_race_group_gender_salaried = news_salaried.groupby(['tier','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
tier,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
Tier 1,white,Male,68.0,169870.29
Tier 1,unknown,Male,14.0,137890.0
Tier 1,unknown,Female,10.0,137640.0
Tier 1,white,Female,48.0,135824.85
Tier 1,person of color,Male,14.0,135390.0
Tier 1,person of color,Female,15.0,125780.0
Tier 2,white,Male,93.0,117843.5
other,white,Male,12.0,115640.0
Tier 2,person of color,Male,19.0,105000.0
Tier 2,white,Female,66.0,102423.86


In [186]:
current_news_median_desk_tier_race_gender_age5_salaried = news_salaried.groupby(['tier','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_race_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
tier,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
Tier 1,White (United States of America),Male,60-64,5.0,174968.48
Tier 1,White (United States of America),Male,40-44,13.0,170000.0
Tier 1,White (United States of America),Female,45-49,5.0,165000.0
Tier 1,White (United States of America),Male,55-59,8.0,162890.0
Tier 1,White (United States of America),Female,55-59,6.0,162854.23
Tier 1,White (United States of America),Female,40-44,5.0,160000.0
Tier 2,White (United States of America),Female,55-59,5.0,149029.98
Tier 2,White (United States of America),Male,65+,6.0,147473.21
Tier 2,White (United States of America),Male,55-59,16.0,147160.79
Tier 1,White (United States of America),Male,35-39,15.0,144755.0


In [187]:
current_news_median_desk_tier_race_group_gender_age5_salaried = news_salaried.groupby(['tier','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_race_group_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
tier,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
Tier 1,white,Male,60-64,5.0,174968.48
Tier 1,white,Male,40-44,13.0,170000.0
Tier 1,white,Female,45-49,5.0,165000.0
Tier 1,white,Male,55-59,8.0,162890.0
Tier 1,white,Female,55-59,6.0,162854.23
Tier 1,white,Female,40-44,5.0,160000.0
Tier 2,white,Female,55-59,5.0,149029.98
Tier 2,white,Male,65+,6.0,147473.21
Tier 2,white,Male,55-59,16.0,147160.79
Tier 1,white,Male,35-39,15.0,144755.0


### Job profiles

In [188]:
current_news_median_job_salaried = news_salaried.groupby(['job_profile_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_salaried)

Unnamed: 0_level_0,count_nonzero,median
job_profile_current,Unnamed: 1_level_1,Unnamed: 2_level_1
300113 - Columnist,19.0,170496.8
300313 - Columnist - Editorial,7.0,151896.27
320113 - Critic,9.0,150962.35
330113 - Editorial Writer,7.0,129236.03
280212 - Staff Writer,306.0,124040.0
390510 - Graphics Editor,7.0,111071.0
360114 - Photographer,16.0,106014.84
126902 - Topic Editor,6.0,103771.73
390610 - Graphics Reporter,8.0,97280.0
120602 - Operations Editor,7.0,90780.0


In [189]:
current_news_median_job_hourly = news_hourly.groupby(['job_profile_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_hourly)

Unnamed: 0_level_0,count_nonzero,median
job_profile_current,Unnamed: 1_level_1,Unnamed: 2_level_1
280225 - Producer,18.0,36.74
400151 - Administrative Aide,6.0,35.3
397110 - Multiplatform Editor (PT/PTOC),23.0,34.72
380117 - Research Assistant,6.0,31.23
410251 - Editorial Aide,12.0,21.45
430117 - News Aide,8.0,17.06
440116 - Copy Aide,5.0,15.19


In [190]:
current_news_median_job_gender_salaried = news_salaried.groupby(['job_profile_current','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
job_profile_current,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
300113 - Columnist,Male,8.0,175984.43
330113 - Editorial Writer,Male,5.0,164899.53
320113 - Critic,Male,5.0,160780.0
300113 - Columnist,Female,11.0,154780.0
300313 - Columnist - Editorial,Male,5.0,151896.27
280212 - Staff Writer,Male,170.0,128439.57
280212 - Staff Writer,Female,136.0,113474.07
390510 - Graphics Editor,Male,5.0,111071.0
360114 - Photographer,Male,11.0,109928.29
280226 - Video Journalist,Male,8.0,98555.0


In [191]:
current_news_median_job_gender_hourly = news_hourly.groupby(['job_profile_current','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
job_profile_current,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
280225 - Producer,Male,6.0,36.74
397110 - Multiplatform Editor (PT/PTOC),Female,14.0,36.54
280225 - Producer,Female,12.0,36.35
400151 - Administrative Aide,Female,6.0,35.3
397110 - Multiplatform Editor (PT/PTOC),Male,9.0,33.41
380117 - Research Assistant,Female,5.0,31.68
410251 - Editorial Aide,Female,8.0,21.45


In [192]:
current_news_median_job_race_salaried = news_salaried.groupby(['job_profile_current','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
job_profile_current,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
300313 - Columnist - Editorial,White (United States of America),6.0,190948.14
300113 - Columnist,White (United States of America),13.0,176780.0
300113 - Columnist,Black or African American (United States of America),5.0,153061.0
320113 - Critic,White (United States of America),8.0,149371.17
330113 - Editorial Writer,White (United States of America),6.0,127118.49
280212 - Staff Writer,White (United States of America),223.0,125000.0
280212 - Staff Writer,Black or African American (United States of America),18.0,122340.98
280212 - Staff Writer,Asian (United States of America),24.0,116892.5
390510 - Graphics Editor,White (United States of America),5.0,111071.0
360114 - Photographer,White (United States of America),12.0,106014.84


In [193]:
current_news_median_job_race_hourly = news_hourly.groupby(['job_profile_current','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
job_profile_current,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
280225 - Producer,Black or African American (United States of America),5.0,37.58
280225 - Producer,White (United States of America),8.0,35.91
397110 - Multiplatform Editor (PT/PTOC),White (United States of America),18.0,34.8
380117 - Research Assistant,White (United States of America),5.0,31.68
410251 - Editorial Aide,White (United States of America),7.0,21.12
430117 - News Aide,White (United States of America),5.0,16.5


In [194]:
current_news_median_job_race_gender_salaried = news_salaried.groupby(['job_profile_current','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
job_profile_current,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
300113 - Columnist,White (United States of America),Female,7.0,224460.51
300113 - Columnist,White (United States of America),Male,6.0,175984.43
320113 - Critic,White (United States of America),Male,5.0,160780.0
280212 - Staff Writer,White (United States of America),Male,130.0,129280.0
280212 - Staff Writer,Black or African American (United States of America),Male,13.0,125000.0
280212 - Staff Writer,Asian (United States of America),Male,9.0,118785.0
280212 - Staff Writer,Asian (United States of America),Female,15.0,115000.0
280212 - Staff Writer,White (United States of America),Female,93.0,115000.0
360114 - Photographer,White (United States of America),Male,7.0,113756.68
280212 - Staff Writer,Black or African American (United States of America),Female,5.0,108864.49


In [195]:
current_news_median_job_race_gender_hourly = news_hourly.groupby(['job_profile_current','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
job_profile_current,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
397110 - Multiplatform Editor (PT/PTOC),White (United States of America),Female,10.0,39.87
280225 - Producer,White (United States of America),Female,5.0,34.24
397110 - Multiplatform Editor (PT/PTOC),White (United States of America),Male,8.0,33.39
410251 - Editorial Aide,White (United States of America),Female,5.0,21.12


In [196]:
current_news_median_job_race_group_gender_salaried = news_salaried.groupby(['desk','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
desk,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
National,white,Male,46.0,175374.24
Financial,white,Male,21.0,140387.17
Editorial,white,Male,16.0,140271.26
Foreign,unknown,Male,11.0,140000.0
National,white,Female,38.0,139733.72
Foreign,unknown,Female,9.0,135000.0
National,person of color,Female,10.0,132780.0
Sports,white,Female,6.0,132014.99
National,person of color,Male,11.0,130780.0
Financial,white,Female,8.0,130390.0


In [197]:
current_news_median_job_race_group_gender_hourly = news_hourly.groupby(['job_profile_current','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_group_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
job_profile_current,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
397110 - Multiplatform Editor (PT/PTOC),white,Female,10.0,39.87
280225 - Producer,person of color,Female,6.0,35.9
280225 - Producer,white,Female,5.0,34.24
397110 - Multiplatform Editor (PT/PTOC),white,Male,8.0,33.39
410251 - Editorial Aide,white,Female,5.0,21.12


In [198]:
current_news_median_job_race_gender_age5_salaried = news_salaried.groupby(['job_profile_current','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
job_profile_current,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
280212 - Staff Writer,White (United States of America),Male,65+,5.0,159458.37
280212 - Staff Writer,White (United States of America),Male,55-59,17.0,153922.58
280212 - Staff Writer,White (United States of America),Female,55-59,7.0,153780.0
280212 - Staff Writer,White (United States of America),Female,45-49,10.0,144559.75
280212 - Staff Writer,White (United States of America),Female,40-44,9.0,140000.0
280212 - Staff Writer,White (United States of America),Male,60-64,11.0,134957.37
280212 - Staff Writer,White (United States of America),Male,40-44,20.0,132980.42
280212 - Staff Writer,White (United States of America),Male,50-54,14.0,132273.46
280212 - Staff Writer,White (United States of America),Male,45-49,9.0,130845.0
280212 - Staff Writer,White (United States of America),Female,60-64,6.0,128441.42


In [199]:
current_news_median_job_race_gender_age5_hourly = news_hourly.groupby(['job_profile_current','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_gender_age5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
job_profile_current,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1


In [200]:
current_news_median_job_race_group_gender_age5_salaried = news_salaried.groupby(['job_profile_current','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_group_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
job_profile_current,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
280212 - Staff Writer,white,Male,65+,5.0,159458.37
280212 - Staff Writer,white,Male,55-59,17.0,153922.58
280212 - Staff Writer,white,Female,55-59,7.0,153780.0
280212 - Staff Writer,white,Female,45-49,10.0,144559.75
280212 - Staff Writer,white,Female,40-44,9.0,140000.0
280212 - Staff Writer,white,Male,60-64,11.0,134957.37
280212 - Staff Writer,white,Male,40-44,20.0,132980.42
280212 - Staff Writer,white,Male,50-54,14.0,132273.46
280212 - Staff Writer,white,Male,45-49,9.0,130845.0
280212 - Staff Writer,white,Female,60-64,6.0,128441.42


In [201]:
current_news_median_job_race_group_gender_age5_hourly = news_hourly.groupby(['job_profile_current','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_group_gender_age5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
job_profile_current,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1


### Performance evaluations

In [202]:
news_ratings = ratings_combined[ratings_combined['dept'] == 'News']

In [203]:
news_ratings_gender = news_ratings.groupby(['gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress_median(news_ratings_gender)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,1892.0,3.4
Male,1772.0,3.4


In [204]:
news_ratings_race = news_ratings.groupby(['race_ethnicity']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress_median(news_ratings_race)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
American Indian or Alaska Native (United States of America),12.0,3.6
White (United States of America),2516.0,3.5
Asian (United States of America),324.0,3.4
Prefer Not to Disclose (United States of America),56.0,3.4
Black or African American (United States of America),416.0,3.3
Hispanic or Latino (United States of America),164.0,3.3
Native Hawaiian or Other Pacific Islander (United States of America),8.0,3.3
Two or More Races (United States of America),80.0,3.2


In [205]:
news_ratings_race_gender = news_ratings.groupby(['race_ethnicity','gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(news_ratings_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
American Indian or Alaska Native (United States of America),Female,8.0,3.7
Asian (United States of America),Female,232.0,3.4
Asian (United States of America),Male,92.0,3.4
Black or African American (United States of America),Female,224.0,3.25
Black or African American (United States of America),Male,192.0,3.3
Hispanic or Latino (United States of America),Female,80.0,3.3
Hispanic or Latino (United States of America),Male,84.0,3.3
Native Hawaiian or Other Pacific Islander (United States of America),Male,8.0,3.3
Prefer Not to Disclose (United States of America),Female,24.0,3.5
Prefer Not to Disclose (United States of America),Male,32.0,3.3


In [206]:
news_ratings_race_gender_under3 = news_ratings[news_ratings['performance_rating'] < 3.1].groupby(['race_grouping','gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(news_ratings_race_gender_under3)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,57.0,3.0
person of color,Male,49.0,3.0
white,Female,92.0,3.0
white,Male,80.0,3.0


In [207]:
news_ratings_race_gender_over4 = news_ratings[news_ratings['performance_rating'] > 3.9].groupby(['race_grouping','gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(news_ratings_race_gender_over4)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,13.0,4.1
person of color,Male,5.0,4.1
unknown,Female,5.0,4.1
unknown,Male,10.0,4.05
white,Female,67.0,4.1
white,Male,114.0,4.2


### Pay changes

In [208]:
news_change = reason_for_change_combined[reason_for_change_combined['dept'] == 'News']

In [209]:
news_change_gender = news_change.groupby(['business_process_reason','gender']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(news_change_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
business_process_reason,gender,Unnamed: 2_level_1
Request Compensation Change > Adjustment > Contract Increase,Male,813
Request Compensation Change > Adjustment > Contract Increase,Female,809
Merit > Performance > Annual Performance Appraisal,Male,623
Merit > Performance > Annual Performance Appraisal,Female,583
Data Change > Data Change > Change Job Details,Female,282
Data Change > Data Change > Change Job Details,Male,245
Transfer > Transfer > Move to another Manager,Male,185
Request Compensation Change > Adjustment > Market Adjustment,Female,169
Request Compensation Change > Adjustment > Market Adjustment,Male,131
Transfer > Transfer > Move to another Manager,Female,111


In [210]:
news_change_race = news_change.groupby(['business_process_reason','race_ethnicity']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(news_change_race)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
business_process_reason,race_ethnicity,Unnamed: 2_level_1
Request Compensation Change > Adjustment > Contract Increase,White (United States of America),1164
Merit > Performance > Annual Performance Appraisal,White (United States of America),889
Data Change > Data Change > Change Job Details,White (United States of America),345
Transfer > Transfer > Move to another Manager,White (United States of America),201
Request Compensation Change > Adjustment > Market Adjustment,White (United States of America),198
Request Compensation Change > Adjustment > Contract Increase,Black or African American (United States of America),169
Request Compensation Change > Adjustment > Contract Increase,Asian (United States of America),138
Merit > Performance > Annual Performance Appraisal,Black or African American (United States of America),108
Merit > Performance > Annual Performance Appraisal,Asian (United States of America),106
Promotion > Promotion > Promotion,White (United States of America),104


### Performance evaluations x merit raises

In [211]:
reason_for_change_combined['merit_raises'] = reason_for_change_combined['business_process_reason'].str.contains('Merit', re.IGNORECASE)

In [212]:
twenty14 = np.datetime64('2016-04-01')
twenty15 = np.datetime64('2017-04-01')
twenty16 = np.datetime64('2018-04-01')
twenty17 = np.datetime64('2019-04-01')
twenty18 = np.datetime64('2020-04-01')

def raise_time(row):
    if row['effective_date'] < twenty14:
        return 'before 2015'
    if row['effective_date'] < twenty15:
        return '2015'
    if row['effective_date'] < twenty16:
        return '2016'
    if row['effective_date'] < twenty17:
        return '2017'
    if row['effective_date'] < twenty18:
        return '2018'
    return 'unknown'

reason_for_change_combined['raise_after'] = reason_for_change_combined.apply(lambda row: raise_time(row), axis=1)

In [213]:
merit_raises_news_gender_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['gender']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(merit_raises_news_gender_salaried)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,431.0,3000.0
Male,494.0,3000.0


In [214]:
merit_raises_news_gender_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['gender']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(merit_raises_news_gender_hourly)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,78.0,1.27
Male,51.0,1.03


In [215]:
merit_raises_news_race_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['race_ethnicity']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_race_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
American Indian or Alaska Native (United States of America),5.0,3500.0
Two or More Races (United States of America),7.0,3500.0
Asian (United States of America),69.0,3000.0
Black or African American (United States of America),82.0,3000.0
White (United States of America),707.0,3000.0
Hispanic or Latino (United States of America),36.0,2500.0


In [216]:
merit_raises_news_race_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['race_ethnicity']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_race_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),91.0,1.28
Black or African American (United States of America),16.0,1.25
Asian (United States of America),18.0,1.03


In [217]:
merit_raises_news_race_group_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_race_group_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
person of color,200.0,3000.0
white,707.0,3000.0
unknown,18.0,2860.0


In [218]:
merit_raises_news_race_group_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_race_group_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
white,91.0,1.28
person of color,38.0,1.03


In [219]:
merit_raises_news_gender_race_group_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_gender_race_group_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,unknown,10.0,3500.0
Female,person of color,112.0,3000.0
Female,white,309.0,3000.0
Male,white,398.0,3000.0
Male,person of color,88.0,2900.0
Male,unknown,8.0,2457.5


In [220]:
merit_raises_news_gender_race_group_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_gender_race_group_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,white,59.0,1.28
Female,person of color,19.0,1.26
Male,person of color,19.0,1.03
Male,white,32.0,1.02


In [221]:
fifteen_raises_amount = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2015')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]},{'2015_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(fifteen_raises_amount)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,17.0,2888.0
Female,white,44.0,2500.0
Male,person of color,10.0,2162.5
Male,white,64.0,3000.0


In [222]:
fifteen_raises_score = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2015')].groupby(['gender','race_grouping']).agg({'2015_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(fifteen_raises_score)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,17.0,3.4
Female,white,44.0,3.7
Male,person of color,10.0,3.5
Male,white,64.0,3.65


In [223]:
sixteen_raises_amount = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2016')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]},{'2016_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(sixteen_raises_amount)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,26.0,3000.0
Female,white,60.0,3000.0
Male,person of color,17.0,3000.0
Male,white,81.0,3000.0


In [224]:
sixteen_raises_score = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2016')].groupby(['gender','race_grouping']).agg({'2016_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(sixteen_raises_score)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,26.0,3.4
Female,white,60.0,3.5
Male,person of color,17.0,3.4
Male,white,81.0,3.6


In [225]:
seventeen_raises_amount = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2017')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]},{'2017_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(seventeen_raises_amount)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,25.0,3000.0
Female,white,59.0,2500.0
Male,person of color,25.0,3000.0
Male,white,89.0,3000.0


In [226]:
seventeen_raises_score = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2017')].groupby(['gender','race_grouping']).agg({'2017_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(seventeen_raises_score)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,25.0,3.5
Female,white,59.0,3.4
Male,person of color,25.0,3.4
Male,white,89.0,3.6


In [227]:
eighteen_raises_amount = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2018')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]},{'2018_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(eighteen_raises_amount)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,28.0,3000.0
Female,white,104.0,3000.0
Male,person of color,26.0,2500.0
Male,white,120.0,3000.0


In [228]:
eighteen_raises_score = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2018')].groupby(['gender','race_grouping']).agg({'2018_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(eighteen_raises_score)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,28.0,3.5
Female,white,104.0,3.5
Male,person of color,26.0,3.4
Male,white,120.0,3.6


In [229]:
merit_raises_15 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2015') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_16 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2016') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_17 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2017') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_18 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2018') & (reason_for_change_combined['merit_raises'] == True)]

merit_raises_15 = merit_raises_15[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating']].rename(columns={'2015_annual_performance_rating':'performance_rating'})
merit_raises_16 = merit_raises_16[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2016_annual_performance_rating']].rename(columns={'2016_annual_performance_rating':'performance_rating'})
merit_raises_17 = merit_raises_17[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2017_annual_performance_rating']].rename(columns={'2017_annual_performance_rating':'performance_rating'})
merit_raises_18 = merit_raises_18[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2018_annual_performance_rating']].rename(columns={'2018_annual_performance_rating':'performance_rating'})

merit_raises_15 = pd.DataFrame(merit_raises_15)
merit_raises_16 = pd.DataFrame(merit_raises_16)
merit_raises_17 = pd.DataFrame(merit_raises_17)
merit_raises_18 = pd.DataFrame(merit_raises_18)

merit_raises_combined = pd.concat([merit_raises_15,merit_raises_16,merit_raises_17,merit_raises_18])

In [230]:
news_salaried_raises = merit_raises_combined[(merit_raises_combined['pay_rate_type'] == 'Salaried') & (merit_raises_combined['dept'] == 'News')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(news_salaried_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,96.0,3000.0
Female,unknown,9.0,3000.0
Female,white,267.0,3000.0
Male,person of color,78.0,2658.52
Male,unknown,7.0,2500.0
Male,white,354.0,3000.0


In [231]:
news_salaried_raises_scores = merit_raises_combined[(merit_raises_combined['pay_rate_type'] == 'Salaried') & (merit_raises_combined['dept'] == 'News')].groupby(['gender','race_grouping']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(news_salaried_raises_scores)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,96.0,3.4
Female,unknown,9.0,3.9
Female,white,267.0,3.5
Male,person of color,78.0,3.4
Male,unknown,7.0,3.7
Male,white,354.0,3.6


In [232]:
news_hourly_raises = merit_raises_combined[(merit_raises_combined['pay_rate_type'] == 'Hourly') & (merit_raises_combined['dept'] == 'News')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(news_hourly_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,18.0,1.27
Female,white,54.0,1.46
Male,person of color,19.0,1.03
Male,white,28.0,1.16


In [233]:
news_hourly_raises_scores = merit_raises_combined[(merit_raises_combined['pay_rate_type'] == 'Hourly') & (merit_raises_combined['dept'] == 'News')].groupby(['gender','race_grouping']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(news_hourly_raises_scores)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,18.0,3.4
Female,white,54.0,3.5
Male,person of color,19.0,3.4
Male,white,28.0,3.6


### Era

In [234]:
bezos = df[(df['hire_date'] > '2013-10-04') & (df['dept'] == 'News') & (df['pay_rate_type'] == 'Salaried')]
graham = df[(df['hire_date'] < '2013-10-05') & (df['dept'] == 'News') & (df['pay_rate_type'] == 'Salaried')]

In [235]:
bezos_gender = bezos.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_gender)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Male,157.0,100780.0
Female,180.0,87160.0


In [236]:
graham_gender = graham.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_gender)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Male,133.0,127059.4
Female,104.0,112136.48


In [237]:
bezos_race = bezos.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_race)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
Black or African American (United States of America),26.0,94963.74
White (United States of America),224.0,94519.11
Asian (United States of America),31.0,87000.0
Prefer Not to Disclose (United States of America),8.0,82140.0
Hispanic or Latino (United States of America),22.0,81249.94
Two or More Races (United States of America),14.0,79860.0


In [238]:
graham_race = graham.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_race)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
Hispanic or Latino (United States of America),6.0,135272.46
White (United States of America),182.0,124500.0
Asian (United States of America),15.0,111761.01
Black or African American (United States of America),22.0,104397.79


In [239]:
bezos_race_group = bezos.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_race_group)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
unknown,20.0,113890.0
white,224.0,94519.11
person of color,93.0,86000.0


In [240]:
graham_race_group = graham.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_race_group)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
unknown,9.0,151170.88
white,182.0,124500.0
person of color,46.0,110844.65


In [241]:
bezos_gender_race_group = bezos.groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_gender_race_group)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
unknown,Male,10.0,121390.0
unknown,Female,10.0,109000.0
white,Male,115.0,102780.0
person of color,Male,32.0,94026.24
white,Female,109.0,88780.0
person of color,Female,61.0,82000.0


In [242]:
graham_gender_race_group = graham.groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_gender_race_group)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
unknown,Male,6.0,150975.44
white,Male,103.0,128629.42
person of color,Male,24.0,117567.07
white,Female,79.0,112511.94
person of color,Female,22.0,108594.26


In [243]:
bezos_gender_race_group_age5 = bezos.groupby(['race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_gender_race_group_age5)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
race_grouping,gender,age_group_5,Unnamed: 3_level_1,Unnamed: 4_level_1
white,Female,45-49,7.0,160780.0
white,Male,55-59,8.0,156806.68
white,Female,40-44,6.0,143750.0
white,Male,40-44,15.0,136467.5
person of color,Male,35-39,8.0,115530.0
white,Female,50-54,8.0,114975.4
white,Male,35-39,24.0,107880.0
white,Female,35-39,15.0,105000.0
white,Male,45-49,9.0,102795.6
person of color,Female,35-39,8.0,99619.25


In [244]:
graham_gender_race_group_age5 = graham.groupby(['race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_gender_race_group_age5)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
race_grouping,gender,age_group_5,Unnamed: 3_level_1,Unnamed: 4_level_1
white,Male,65+,8.0,153937.49
white,Male,35-39,11.0,147300.0
white,Male,55-59,19.0,146541.57
white,Female,55-59,16.0,138564.42
white,Male,50-54,21.0,134546.92
white,Male,60-64,14.0,123514.68
white,Female,40-44,5.0,120780.0
person of color,Female,40-44,5.0,118512.33
person of color,Male,50-54,11.0,116349.15
white,Male,40-44,17.0,115236.94


In [245]:
bezos_gender_race_group_age5_tier = bezos.groupby(['race_grouping','gender','age_group_5','tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_gender_race_group_age5_tier)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
race_grouping,gender,age_group_5,tier,Unnamed: 4_level_1,Unnamed: 5_level_1
white,Male,40-44,Tier 1,7.0,193280.0
white,Male,35-39,Tier 1,10.0,130017.5
white,Female,35-39,Tier 1,8.0,128330.0
white,Male,30-34,Tier 1,12.0,125233.27
white,Male,45-49,Tier 2,5.0,120780.0
white,Female,25-29,Tier 1,5.0,100000.0
white,Male,30-34,Tier 2,5.0,100000.0
white,Male,35-39,Tier 2,8.0,98890.0
white,Female,30-34,Tier 2,6.0,93780.0
white,Male,25-29,Tier 2,6.0,91282.5


In [246]:
graham_gender_race_group_age5_tier = graham.groupby(['race_grouping','gender','age_group_5','tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_gender_race_group_age5_tier)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
race_grouping,gender,age_group_5,tier,Unnamed: 4_level_1,Unnamed: 5_level_1
white,Male,55-59,Tier 1,5.0,175780.0
white,Male,35-39,Tier 1,5.0,173280.0
white,Female,50-54,Tier 1,5.0,167780.0
white,Female,55-59,Tier 1,6.0,162854.23
white,Male,40-44,Tier 1,6.0,151590.08
white,Female,55-59,Tier 2,5.0,149029.98
white,Male,65+,Tier 2,6.0,147473.21
white,Male,35-39,Tier 2,5.0,147300.0
white,Male,55-59,Tier 2,12.0,143129.04
white,Male,50-54,Tier 2,13.0,128052.85


### Overall disparity calculations

In [247]:
news_groups = news_salaried.groupby(['age_group_5','tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
expected_medians = pd.merge(news_salaried, news_groups, on=['age_group_5', 'tier'])



In [248]:
below_expected_medians = expected_medians[expected_medians['current_base_pay'] < expected_medians[('current_base_pay', 'median')]].groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(below_expected_medians)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_grouping,gender,Unnamed: 2_level_1
person of color,Female,48.0
person of color,Male,27.0
unknown,Female,8.0
unknown,Male,8.0
white,Female,93.0
white,Male,89.0


In [249]:
above_expected_medians = expected_medians[expected_medians['current_base_pay'] > expected_medians[('current_base_pay', 'median')]].groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(above_expected_medians)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_grouping,gender,Unnamed: 2_level_1
person of color,Female,30.0
person of color,Male,21.0
unknown,Male,8.0
white,Female,90.0
white,Male,121.0


In [250]:
expected_medians['disparity'] = expected_medians['current_base_pay'] - expected_medians[('current_base_pay', 'median')]
expected_medians['disparity_pct'] = (expected_medians['current_base_pay'] - expected_medians[('current_base_pay', 'median')])/expected_medians[('current_base_pay', 'median')]

In [251]:
disparity = expected_medians.groupby(['race_grouping','gender']).agg({'disparity': [np.count_nonzero, np.median]})
suppress(disparity)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,78.0,-1500.0
person of color,Male,48.0,0.0
unknown,Female,11.0,-3500.0
unknown,Male,16.0,2177.25
white,Female,183.0,0.0
white,Male,210.0,2457.75


In [252]:
disparity_pct_above = expected_medians[expected_medians['disparity_pct'] > .05].groupby(['race_grouping','gender']).agg({'disparity': [np.count_nonzero, np.median]})
suppress(disparity_pct_above)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,21.0,9610.0
person of color,Male,16.0,25880.0
unknown,Male,7.0,30000.0
white,Female,61.0,21485.87
white,Male,100.0,28677.74


In [253]:
disparity_pct_below = expected_medians[expected_medians['disparity_pct'] < -.05].groupby(['race_grouping','gender']).agg({'disparity': [np.count_nonzero, np.median]})
suppress(disparity_pct_below)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,36.0,-10195.04
person of color,Male,19.0,-15435.0
unknown,Female,5.0,-14220.0
unknown,Male,5.0,-15000.0
white,Female,72.0,-14000.0
white,Male,70.0,-18765.53


In [254]:
expected_medians.groupby(['race_grouping','gender']).agg({'disparity_pct': [np.count_nonzero, np.average]})

Unnamed: 0_level_0,Unnamed: 1_level_0,disparity_pct,disparity_pct
Unnamed: 0_level_1,Unnamed: 1_level_1,count_nonzero,average
race_grouping,gender,Unnamed: 2_level_2,Unnamed: 3_level_2
person of color,Female,78.0,-0.01
person of color,Male,48.0,0.03
unknown,Female,11.0,-0.05
unknown,Male,16.0,0.04
white,Female,183.0,0.05
white,Male,210.0,0.1


In [255]:
bezos_news_groups = bezos.groupby(['age_group_5','tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
bezos_expected_medians = pd.merge(bezos, bezos_news_groups, on=['age_group_5', 'tier'])
graham_news_groups = graham.groupby(['age_group_5','tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
graham_expected_medians = pd.merge(graham, graham_news_groups, on=['age_group_5', 'tier'])

In [256]:
bezos_expected_medians['disparity'] = bezos_expected_medians['current_base_pay'] - bezos_expected_medians[('current_base_pay', 'median')]
bezos_expected_medians['disparity_pct'] = (bezos_expected_medians['current_base_pay'] - bezos_expected_medians[('current_base_pay', 'median')])/bezos_expected_medians[('current_base_pay', 'median')]
graham_expected_medians['disparity'] = graham_expected_medians['current_base_pay'] - graham_expected_medians[('current_base_pay', 'median')]
graham_expected_medians['disparity_pct'] = (graham_expected_medians['current_base_pay'] - graham_expected_medians[('current_base_pay', 'median')])/graham_expected_medians[('current_base_pay', 'median')]

In [257]:
bezos_disparity_gender = bezos_expected_medians.groupby(['gender']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(bezos_disparity_gender)

Unnamed: 0_level_0,count_nonzero,average
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,169.0,0.04
Male,142.0,0.07


In [258]:
bezos_disparity_race_group = bezos_expected_medians.groupby(['race_grouping']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(bezos_disparity_race_group)

Unnamed: 0_level_0,count_nonzero,average
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
person of color,83.0,0.02
unknown,19.0,-0.01
white,209.0,0.07


In [259]:
bezos_disparity_gender_race_group = bezos_expected_medians.groupby(['race_grouping','gender']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(bezos_disparity_gender_race_group)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,average
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,56.0,0.01
person of color,Male,27.0,0.05
unknown,Female,9.0,-0.06
unknown,Male,10.0,0.04
white,Female,104.0,0.06
white,Male,105.0,0.08


In [260]:
graham_disparity_gender = graham_expected_medians.groupby(['gender']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(graham_disparity_gender)

Unnamed: 0_level_0,count_nonzero,average
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,99.0,0.02
Male,125.0,0.07


In [261]:
graham_disparity_race_group = graham_expected_medians.groupby(['race_grouping']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(graham_disparity_race_group)

Unnamed: 0_level_0,count_nonzero,average
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
person of color,43.0,-0.05
unknown,8.0,-0.05
white,173.0,0.07


In [262]:
graham_disparity_gender_race_group = graham_expected_medians.groupby(['race_grouping','gender']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(graham_disparity_gender_race_group)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,average
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,21.0,-0.06
person of color,Male,22.0,-0.04
unknown,Male,5.0,-0.03
white,Female,75.0,0.04
white,Male,98.0,0.1


### Regression

In [263]:
news_salaried_regression = news_salaried[['department','gender','race_ethnicity','current_base_pay','job_profile_current','cost_center_current','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','age','years_of_service','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping']]
news_salaried_regression = pd.get_dummies(news_salaried_regression, columns=['gender','race_ethnicity','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping'])

In [264]:
news_salaried_regression = news_salaried_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over','tier_Tier 1':'tier_Tier_1','tier_Tier 2':'tier_Tier_2','tier_Tier 3':'tier_Tier_3','tier_Tier 4':'tier_Tier_4','years_of_service_grouped_0':'years_of_service_grouped_0','years_of_service_grouped_1-2':'years_of_service_grouped_1to2','years_of_service_grouped_3-5':'years_of_service_grouped_3to5','years_of_service_grouped_6-10':'years_of_service_grouped_6to10','years_of_service_grouped_11-15':'years_of_service_grouped_11to15','years_of_service_grouped_16-20':'years_of_service_grouped_16to20','years_of_service_grouped_21-25':'years_of_service_grouped_21to25','years_of_service_grouped_25+':'years_of_service_grouped_25_over'})
import statsmodels.formula.api as sm
model1 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male')
result1 = model1.fit()
result1.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.04
Model:,OLS,Adj. R-squared:,0.036
Method:,Least Squares,F-statistic:,11.76
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,9.87e-06
Time:,12:09:20,Log-Likelihood:,-6931.6
No. Observations:,574,AIC:,13870.0
Df Residuals:,571,BIC:,13880.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,7.739e+04,1185.564,65.281,0.000,7.51e+04,7.97e+04
gender_Female,3.007e+04,1880.411,15.992,0.000,2.64e+04,3.38e+04
gender_Male,4.732e+04,1868.654,25.324,0.000,4.37e+04,5.1e+04

0,1,2,3
Omnibus:,138.887,Durbin-Watson:,1.681
Prob(Omnibus):,0.0,Jarque-Bera (JB):,287.507
Skew:,1.32,Prob(JB):,3.7e-63
Kurtosis:,5.246,Cond. No.,1480000000000000.0


In [265]:
model2 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color')
result2 = model2.fit()
result2.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.043
Model:,OLS,Adj. R-squared:,0.04
Method:,Least Squares,F-statistic:,12.81
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,3.6e-06
Time:,12:09:20,Log-Likelihood:,-6930.6
No. Observations:,574,AIC:,13870.0
Df Residuals:,571,BIC:,13880.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.271e+05,7897.372,16.092,0.000,1.12e+05,1.43e+05
race_grouping_white,-6301.9244,8174.557,-0.771,0.441,-2.24e+04,9753.945
race_grouping_person_of_color,-2.661e+04,8682.201,-3.065,0.002,-4.37e+04,-9560.605

0,1,2,3
Omnibus:,128.063,Durbin-Watson:,1.632
Prob(Omnibus):,0.0,Jarque-Bera (JB):,248.772
Skew:,1.253,Prob(JB):,9.55e-55
Kurtosis:,5.03,Cond. No.,9.91


In [266]:
model3 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result3 = model3.fit()
result3.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.074
Model:,OLS,Adj. R-squared:,0.069
Method:,Least Squares,F-statistic:,15.18
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,1.62e-09
Time:,12:09:20,Log-Likelihood:,-6921.2
No. Observations:,574,AIC:,13850.0
Df Residuals:,570,BIC:,13870.0
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,8.419e+04,5184.782,16.238,0.000,7.4e+04,9.44e+04
gender_Female,3.44e+04,3167.032,10.863,0.000,2.82e+04,4.06e+04
gender_Male,4.979e+04,3098.878,16.066,0.000,4.37e+04,5.59e+04
race_grouping_white,-6074.5808,8048.101,-0.755,0.451,-2.19e+04,9732.973
race_grouping_person_of_color,-2.432e+04,8563.749,-2.840,0.005,-4.11e+04,-7503.406

0,1,2,3
Omnibus:,132.663,Durbin-Watson:,1.66
Prob(Omnibus):,0.0,Jarque-Bera (JB):,270.377
Skew:,1.269,Prob(JB):,1.94e-59
Kurtosis:,5.205,Cond. No.,1680000000000000.0


In [267]:
new_news_salaried_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1], 'age': [40,40,40,40]})
new_news_salaried_regression['predicted'] = result3.predict(new_news_salaried_regression)
new_news_salaried_regression

Unnamed: 0,gender_Female,gender_Male,race_grouping_white,race_grouping_person_of_color,age,predicted
0,1,0,1,0,40,112522.15
1,0,1,1,0,40,127905.73
2,1,0,0,1,40,94272.97
3,0,1,0,1,40,109656.55


In [268]:
model4 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result4 = model4.fit()
result4.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.268
Model:,OLS,Adj. R-squared:,0.255
Method:,Least Squares,F-statistic:,20.63
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,9.77e-33
Time:,12:09:20,Log-Likelihood:,-6853.6
No. Observations:,574,AIC:,13730.0
Df Residuals:,563,BIC:,13780.0
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,7.547e+04,1169.257,64.546,0.000,7.32e+04,7.78e+04
gender_Female,3.365e+04,1722.563,19.536,0.000,3.03e+04,3.7e+04
gender_Male,4.182e+04,1697.671,24.632,0.000,3.85e+04,4.52e+04
age_group_5_25_under,-4.454e+04,7177.390,-6.205,0.000,-5.86e+04,-3.04e+04
age_group_5_25to29,-2.51e+04,3987.825,-6.294,0.000,-3.29e+04,-1.73e+04
age_group_5_30to34,-8982.7087,3766.135,-2.385,0.017,-1.64e+04,-1585.316
age_group_5_35to39,1532.0128,4043.258,0.379,0.705,-6409.700,9473.725
age_group_5_40to44,1.998e+04,4621.927,4.322,0.000,1.09e+04,2.91e+04
age_group_5_45to49,1.214e+04,5439.050,2.231,0.026,1453.537,2.28e+04

0,1,2,3
Omnibus:,164.069,Durbin-Watson:,1.859
Prob(Omnibus):,0.0,Jarque-Bera (JB):,434.791
Skew:,1.424,Prob(JB):,3.8599999999999996e-95
Kurtosis:,6.173,Cond. No.,3500000000000000.0


In [269]:
model5 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result5 = model5.fit()
result5.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.278
Model:,OLS,Adj. R-squared:,0.264
Method:,Least Squares,F-statistic:,19.71
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,1.04e-33
Time:,12:09:21,Log-Likelihood:,-6849.6
No. Observations:,574,AIC:,13720.0
Df Residuals:,562,BIC:,13780.0
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.214e+05,6405.473,18.954,0.000,1.09e+05,1.34e+05
race_grouping_white,-1.047e+04,7206.280,-1.453,0.147,-2.46e+04,3682.522
race_grouping_person_of_color,-2.275e+04,7648.553,-2.974,0.003,-3.78e+04,-7724.844
age_group_5_25_under,-3.946e+04,7161.284,-5.510,0.000,-5.35e+04,-2.54e+04
age_group_5_25to29,-2.106e+04,3963.986,-5.313,0.000,-2.88e+04,-1.33e+04
age_group_5_30to34,-4725.1241,3744.627,-1.262,0.208,-1.21e+04,2630.051
age_group_5_35to39,7317.7479,4085.001,1.791,0.074,-705.987,1.53e+04
age_group_5_40to44,2.557e+04,4583.847,5.579,0.000,1.66e+04,3.46e+04
age_group_5_45to49,1.616e+04,5474.110,2.953,0.003,5410.289,2.69e+04

0,1,2,3
Omnibus:,164.311,Durbin-Watson:,1.827
Prob(Omnibus):,0.0,Jarque-Bera (JB):,428.7
Skew:,1.434,Prob(JB):,8.11e-94
Kurtosis:,6.114,Cond. No.,4330000000000000.0


In [270]:
model6 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result6 = model6.fit()
result6.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.285
Model:,OLS,Adj. R-squared:,0.269
Method:,Least Squares,F-statistic:,18.61
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,4.4599999999999995e-34
Time:,12:09:21,Log-Likelihood:,-6847.0
No. Observations:,574,AIC:,13720.0
Df Residuals:,561,BIC:,13780.0
Df Model:,12,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,8.317e+04,4390.292,18.944,0.000,7.45e+04,9.18e+04
gender_Female,3.802e+04,2754.420,13.803,0.000,3.26e+04,4.34e+04
gender_Male,4.515e+04,2676.058,16.872,0.000,3.99e+04,5.04e+04
race_grouping_white,-1.024e+04,7181.661,-1.426,0.154,-2.44e+04,3862.150
race_grouping_person_of_color,-2.178e+04,7634.018,-2.853,0.004,-3.68e+04,-6784.597
age_group_5_25_under,-4.129e+04,7165.506,-5.762,0.000,-5.54e+04,-2.72e+04
age_group_5_25to29,-2.371e+04,3972.663,-5.968,0.000,-3.15e+04,-1.59e+04
age_group_5_30to34,-8102.4208,3737.928,-2.168,0.031,-1.54e+04,-760.377
age_group_5_35to39,3133.7668,4052.539,0.773,0.440,-4826.237,1.11e+04

0,1,2,3
Omnibus:,164.304,Durbin-Watson:,1.83
Prob(Omnibus):,0.0,Jarque-Bera (JB):,437.349
Skew:,1.424,Prob(JB):,1.07e-95
Kurtosis:,6.19,Cond. No.,5460000000000000.0


In [271]:
model7 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over + tier_Tier_1 + tier_Tier_2 + tier_Tier_3 + tier_Tier_4')
result7 = model7.fit()
result7.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.468
Model:,OLS,Adj. R-squared:,0.453
Method:,Least Squares,F-statistic:,30.65
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,5.96e-66
Time:,12:09:21,Log-Likelihood:,-6762.0
No. Observations:,574,AIC:,13560.0
Df Residuals:,557,BIC:,13630.0
Df Model:,16,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,6.71e+04,5610.010,11.961,0.000,5.61e+04,7.81e+04
gender_Female,3.114e+04,3178.867,9.796,0.000,2.49e+04,3.74e+04
gender_Male,3.596e+04,3080.637,11.672,0.000,2.99e+04,4.2e+04
race_grouping_white,1.021e+04,6456.704,1.581,0.114,-2474.772,2.29e+04
race_grouping_person_of_color,1590.1942,6868.290,0.232,0.817,-1.19e+04,1.51e+04
age_group_5_25_under,-3.328e+04,6252.504,-5.323,0.000,-4.56e+04,-2.1e+04
age_group_5_25to29,-1.518e+04,3560.433,-4.264,0.000,-2.22e+04,-8187.622
age_group_5_30to34,-7122.3046,3257.952,-2.186,0.029,-1.35e+04,-722.931
age_group_5_35to39,-2713.7685,3565.793,-0.761,0.447,-9717.813,4290.276

0,1,2,3
Omnibus:,215.055,Durbin-Watson:,1.864
Prob(Omnibus):,0.0,Jarque-Bera (JB):,959.817
Skew:,1.648,Prob(JB):,3.79e-209
Kurtosis:,8.41,Cond. No.,5900000000000000.0


In [272]:
model8 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over + tier_Tier_1 + tier_Tier_2 + tier_Tier_3 + tier_Tier_4 + years_of_service_grouped_0 + years_of_service_grouped_1to2 + years_of_service_grouped_3to5 + years_of_service_grouped_6to10 + years_of_service_grouped_11to15 + years_of_service_grouped_16to20 + years_of_service_grouped_21to25 + years_of_service_grouped_25_over')
result8 = model8.fit()
result8.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.475
Model:,OLS,Adj. R-squared:,0.453
Method:,Least Squares,F-statistic:,21.63
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,2.04e-62
Time:,12:09:21,Log-Likelihood:,-6758.3
No. Observations:,574,AIC:,13560.0
Df Residuals:,550,BIC:,13670.0
Df Model:,23,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,6.117e+04,5242.405,11.668,0.000,5.09e+04,7.15e+04
gender_Female,2.823e+04,3011.909,9.373,0.000,2.23e+04,3.41e+04
gender_Male,3.294e+04,2923.835,11.265,0.000,2.72e+04,3.87e+04
race_grouping_white,1.068e+04,6477.491,1.648,0.100,-2046.534,2.34e+04
race_grouping_person_of_color,2147.6298,6898.961,0.311,0.756,-1.14e+04,1.57e+04
age_group_5_25_under,-3.821e+04,6642.301,-5.752,0.000,-5.13e+04,-2.52e+04
age_group_5_25to29,-1.808e+04,3999.056,-4.521,0.000,-2.59e+04,-1.02e+04
age_group_5_30to34,-8875.1051,3619.177,-2.452,0.015,-1.6e+04,-1766.005
age_group_5_35to39,-4003.6497,3846.671,-1.041,0.298,-1.16e+04,3552.315

0,1,2,3
Omnibus:,205.644,Durbin-Watson:,1.878
Prob(Omnibus):,0.0,Jarque-Bera (JB):,854.496
Skew:,1.594,Prob(JB):,2.81e-186
Kurtosis:,8.056,Cond. No.,1.12e+16


In [273]:
merit_raises_combined_salaried_regression = merit_raises_combined[(merit_raises_combined['dept'] == 'News') & (merit_raises_combined['pay_rate_type'] == 'Salaried')]
merit_raises_combined_salaried_regression = pd.get_dummies(merit_raises_combined_salaried_regression, columns=['gender','race_grouping','age_group_5'])

In [274]:
merit_raises_combined_salaried_regression = merit_raises_combined_salaried_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over'})
model9 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male')
result9 = model9.fit()
result9.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.004
Model:,OLS,Adj. R-squared:,0.003
Method:,Least Squares,F-statistic:,3.275
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.0707
Time:,12:09:21,Log-Likelihood:,-7121.9
No. Observations:,811,AIC:,14250.0
Df Residuals:,809,BIC:,14260.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2116.8178,37.068,57.107,0.000,2044.057,2189.578
gender_Female,957.7901,60.044,15.951,0.000,839.929,1075.651
gender_Male,1159.0276,57.138,20.285,0.000,1046.871,1271.185

0,1,2,3
Omnibus:,599.428,Durbin-Watson:,1.975
Prob(Omnibus):,0.0,Jarque-Bera (JB):,15042.743
Skew:,3.055,Prob(JB):,0.0
Kurtosis:,23.195,Cond. No.,5430000000000000.0


In [275]:
model10 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color')
result10 = model10.fit()
result10.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.007
Model:,OLS,Adj. R-squared:,0.005
Method:,Least Squares,F-statistic:,2.905
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.0553
Time:,12:09:21,Log-Likelihood:,-7120.6
No. Observations:,811,AIC:,14250.0
Df Residuals:,808,BIC:,14260.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,3426.7500,394.132,8.694,0.000,2653.106,4200.394
race_grouping_white,-179.1878,399.177,-0.449,0.654,-962.735,604.359
race_grouping_person_of_color,-494.0711,411.855,-1.200,0.231,-1302.503,314.361

0,1,2,3
Omnibus:,595.371,Durbin-Watson:,1.967
Prob(Omnibus):,0.0,Jarque-Bera (JB):,14962.329
Skew:,3.023,Prob(JB):,0.0
Kurtosis:,23.155,Cond. No.,16.1


In [276]:
model11 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result11 = model11.fit()
result11.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.01
Model:,OLS,Adj. R-squared:,0.007
Method:,Least Squares,F-statistic:,2.802
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.039
Time:,12:09:21,Log-Likelihood:,-7119.3
No. Observations:,811,AIC:,14250.0
Df Residuals:,807,BIC:,14270.0
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2291.9731,262.539,8.730,0.000,1776.633,2807.313
gender_Female,1056.3093,141.724,7.453,0.000,778.117,1334.501
gender_Male,1235.6638,143.543,8.608,0.000,953.901,1517.426
race_grouping_white,-202.9609,399.061,-0.509,0.611,-986.281,580.359
race_grouping_person_of_color,-496.0038,411.454,-1.205,0.228,-1303.650,311.642

0,1,2,3
Omnibus:,595.574,Durbin-Watson:,1.97
Prob(Omnibus):,0.0,Jarque-Bera (JB):,14866.159
Skew:,3.027,Prob(JB):,0.0
Kurtosis:,23.082,Cond. No.,6200000000000000.0


In [277]:
new_reason_for_change_combined_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1]})
new_reason_for_change_combined_regression['predicted'] = result11.predict(new_reason_for_change_combined_regression)
new_reason_for_change_combined_regression

Unnamed: 0,gender_Female,gender_Male,race_grouping_white,race_grouping_person_of_color,predicted
0,1,0,1,0,3145.32
1,0,1,1,0,3324.68
2,1,0,0,1,2852.28
3,0,1,0,1,3031.63


In [278]:
model12 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result12 = model12.fit()
result12.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.047
Model:,OLS,Adj. R-squared:,0.035
Method:,Least Squares,F-statistic:,3.937
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,2.95e-05
Time:,12:09:22,Log-Likelihood:,-7104.1
No. Observations:,811,AIC:,14230.0
Df Residuals:,800,BIC:,14280.0
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1900.1395,51.767,36.706,0.000,1798.525,2001.754
gender_Female,837.3584,61.567,13.601,0.000,716.506,958.211
gender_Male,1062.7812,60.759,17.492,0.000,943.516,1182.046
age_group_5_25_under,-625.0684,577.767,-1.082,0.280,-1759.186,509.049
age_group_5_25to29,348.4845,185.964,1.874,0.061,-16.551,713.520
age_group_5_30to34,508.1254,142.282,3.571,0.000,228.834,787.416
age_group_5_35to39,681.6571,149.030,4.574,0.000,389.122,974.193
age_group_5_40to44,629.9350,163.125,3.862,0.000,309.732,950.138
age_group_5_45to49,455.9623,179.299,2.543,0.011,104.010,807.914

0,1,2,3
Omnibus:,607.312,Durbin-Watson:,1.979
Prob(Omnibus):,0.0,Jarque-Bera (JB):,16080.305
Skew:,3.095,Prob(JB):,0.0
Kurtosis:,23.918,Cond. No.,5240000000000000.0


In [279]:
model13 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result13 = model13.fit()
result13.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.052
Model:,OLS,Adj. R-squared:,0.039
Method:,Least Squares,F-statistic:,3.976
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,1.17e-05
Time:,12:09:22,Log-Likelihood:,-7101.9
No. Observations:,811,AIC:,14230.0
Df Residuals:,799,BIC:,14280.0
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2856.8954,360.070,7.934,0.000,2150.101,3563.689
race_grouping_white,-33.7963,395.935,-0.085,0.932,-810.992,743.399
race_grouping_person_of_color,-425.7390,407.658,-1.044,0.297,-1225.947,374.469
age_group_5_25_under,-673.0990,579.089,-1.162,0.245,-1809.814,463.616
age_group_5_25to29,440.9979,187.438,2.353,0.019,73.070,808.926
age_group_5_30to34,628.8243,146.144,4.303,0.000,341.953,915.695
age_group_5_35to39,816.7998,153.462,5.323,0.000,515.564,1118.035
age_group_5_40to44,803.8584,163.611,4.913,0.000,482.700,1125.017
age_group_5_45to49,540.1748,182.411,2.961,0.003,182.113,898.237

0,1,2,3
Omnibus:,601.567,Durbin-Watson:,1.971
Prob(Omnibus):,0.0,Jarque-Bera (JB):,15921.391
Skew:,3.05,Prob(JB):,0.0
Kurtosis:,23.832,Cond. No.,3090000000000000.0


In [280]:
model14 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result14 = model14.fit()
result14.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.056
Model:,OLS,Adj. R-squared:,0.041
Method:,Least Squares,F-statistic:,3.916
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,7.08e-06
Time:,12:09:22,Log-Likelihood:,-7100.3
No. Observations:,811,AIC:,14230.0
Df Residuals:,798,BIC:,14290.0
Df Model:,12,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1978.5484,247.351,7.999,0.000,1493.012,2464.085
gender_Female,890.8640,133.899,6.653,0.000,628.028,1153.700
gender_Male,1087.6844,137.238,7.926,0.000,818.295,1357.074
race_grouping_white,-64.1327,395.777,-0.162,0.871,-841.019,712.754
race_grouping_person_of_color,-431.5462,407.127,-1.060,0.289,-1230.713,367.621
age_group_5_25_under,-688.0832,577.471,-1.192,0.234,-1821.625,445.459
age_group_5_25to29,375.2333,186.600,2.011,0.045,8.948,741.519
age_group_5_30to34,548.5259,144.590,3.794,0.000,264.704,832.348
age_group_5_35to39,725.4046,151.661,4.783,0.000,427.703,1023.106

0,1,2,3
Omnibus:,602.033,Durbin-Watson:,1.973
Prob(Omnibus):,0.0,Jarque-Bera (JB):,15800.004
Skew:,3.057,Prob(JB):,0.0
Kurtosis:,23.741,Cond. No.,6220000000000000.0


In [281]:
model15 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male')
result15 = model15.fit()
result15.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.012
Model:,OLS,Adj. R-squared:,0.011
Method:,Least Squares,F-statistic:,9.232
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.00246
Time:,12:09:22,Log-Likelihood:,-231.28
No. Observations:,763,AIC:,466.6
Df Residuals:,761,BIC:,475.8
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.3801,0.008,299.623,0.000,2.364,2.396
gender_Female,1.1538,0.013,89.739,0.000,1.129,1.179
gender_Male,1.2262,0.012,100.061,0.000,1.202,1.250

0,1,2,3
Omnibus:,26.124,Durbin-Watson:,1.853
Prob(Omnibus):,0.0,Jarque-Bera (JB):,28.14
Skew:,0.47,Prob(JB):,7.75e-07
Kurtosis:,3.04,Cond. No.,5040000000000000.0


In [282]:
model16 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color')
result16 = model16.fit()
result16.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.034
Model:,OLS,Adj. R-squared:,0.031
Method:,Least Squares,F-statistic:,13.37
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,1.97e-06
Time:,12:09:22,Log-Likelihood:,-222.69
No. Observations:,763,AIC:,451.4
Df Residuals:,760,BIC:,465.3
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,3.7250,0.081,45.900,0.000,3.566,3.884
race_grouping_white,-0.1248,0.082,-1.517,0.130,-0.286,0.037
race_grouping_person_of_color,-0.2626,0.085,-3.089,0.002,-0.429,-0.096

0,1,2,3
Omnibus:,17.904,Durbin-Watson:,1.871
Prob(Omnibus):,0.0,Jarque-Bera (JB):,18.639
Skew:,0.381,Prob(JB):,8.96e-05
Kurtosis:,3.066,Cond. No.,15.6


In [283]:
model17 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result17 = model17.fit()
result17.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.043
Model:,OLS,Adj. R-squared:,0.039
Method:,Least Squares,F-statistic:,11.32
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,2.88e-07
Time:,12:09:22,Log-Likelihood:,-219.19
No. Observations:,763,AIC:,446.4
Df Residuals:,759,BIC:,464.9
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.4859,0.054,46.122,0.000,2.380,2.592
gender_Female,1.2117,0.029,41.457,0.000,1.154,1.269
gender_Male,1.2742,0.030,43.015,0.000,1.216,1.332
race_grouping_white,-0.1331,0.082,-1.624,0.105,-0.294,0.028
race_grouping_person_of_color,-0.2629,0.085,-3.105,0.002,-0.429,-0.097

0,1,2,3
Omnibus:,18.909,Durbin-Watson:,1.865
Prob(Omnibus):,0.0,Jarque-Bera (JB):,19.811
Skew:,0.394,Prob(JB):,4.99e-05
Kurtosis:,3.041,Cond. No.,5840000000000000.0


In [284]:
model18 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result18 = model18.fit()
result18.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.046
Model:,OLS,Adj. R-squared:,0.033
Method:,Least Squares,F-statistic:,3.588
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.000114
Time:,12:09:22,Log-Likelihood:,-218.1
No. Observations:,763,AIC:,458.2
Df Residuals:,752,BIC:,509.2
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.2183,0.011,202.702,0.000,2.197,2.240
gender_Female,1.0816,0.013,81.967,0.000,1.056,1.107
gender_Male,1.1368,0.013,87.233,0.000,1.111,1.162
age_group_5_25_under,-0.0591,0.121,-0.489,0.625,-0.296,0.178
age_group_5_25to29,0.1312,0.040,3.286,0.001,0.053,0.210
age_group_5_30to34,0.1968,0.030,6.461,0.000,0.137,0.257
age_group_5_35to39,0.2457,0.032,7.720,0.000,0.183,0.308
age_group_5_40to44,0.2914,0.035,8.387,0.000,0.223,0.360
age_group_5_45to49,0.2170,0.038,5.715,0.000,0.142,0.292

0,1,2,3
Omnibus:,22.13,Durbin-Watson:,1.879
Prob(Omnibus):,0.0,Jarque-Bera (JB):,23.546
Skew:,0.43,Prob(JB):,7.71e-06
Kurtosis:,3.003,Cond. No.,5830000000000000.0


In [285]:
model19 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result19 = model19.fit()
result19.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.07
Model:,OLS,Adj. R-squared:,0.056
Method:,Least Squares,F-statistic:,5.124
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,8.91e-08
Time:,12:09:22,Log-Likelihood:,-208.27
No. Observations:,763,AIC:,440.5
Df Residuals:,751,BIC:,496.2
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,3.3538,0.075,45.011,0.000,3.208,3.500
race_grouping_white,-0.1183,0.082,-1.443,0.149,-0.279,0.043
race_grouping_person_of_color,-0.2531,0.085,-2.995,0.003,-0.419,-0.087
age_group_5_25_under,0.0145,0.120,0.121,0.904,-0.221,0.250
age_group_5_25to29,0.2464,0.040,6.189,0.000,0.168,0.324
age_group_5_30to34,0.3228,0.031,10.429,0.000,0.262,0.384
age_group_5_35to39,0.3714,0.032,11.469,0.000,0.308,0.435
age_group_5_40to44,0.4239,0.034,12.295,0.000,0.356,0.492
age_group_5_45to49,0.3275,0.038,8.568,0.000,0.252,0.402

0,1,2,3
Omnibus:,15.402,Durbin-Watson:,1.897
Prob(Omnibus):,0.0,Jarque-Bera (JB):,15.937
Skew:,0.354,Prob(JB):,0.000346
Kurtosis:,3.028,Cond. No.,3340000000000000.0


In [286]:
model20 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result20 = model20.fit()
result20.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.075
Model:,OLS,Adj. R-squared:,0.06
Method:,Least Squares,F-statistic:,5.031
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,4.32e-08
Time:,12:09:22,Log-Likelihood:,-206.34
No. Observations:,763,AIC:,438.7
Df Residuals:,750,BIC:,499.0
Df Model:,12,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.3092,0.051,45.135,0.000,2.209,2.410
gender_Female,1.1315,0.028,40.697,0.000,1.077,1.186
gender_Male,1.1777,0.029,41.247,0.000,1.122,1.234
race_grouping_white,-0.1256,0.082,-1.533,0.126,-0.286,0.035
race_grouping_person_of_color,-0.2545,0.084,-3.017,0.003,-0.420,-0.089
age_group_5_25_under,-0.0729,0.119,-0.610,0.542,-0.307,0.162
age_group_5_25to29,0.1473,0.040,3.716,0.000,0.069,0.225
age_group_5_30to34,0.2187,0.031,7.146,0.000,0.159,0.279
age_group_5_35to39,0.2665,0.032,8.329,0.000,0.204,0.329

0,1,2,3
Omnibus:,16.441,Durbin-Watson:,1.888
Prob(Omnibus):,0.0,Jarque-Bera (JB):,17.123
Skew:,0.367,Prob(JB):,0.000191
Kurtosis:,3.004,Cond. No.,1.03e+16


In [287]:
news_hourly_regression = news_hourly[['department','gender','race_ethnicity','current_base_pay','job_profile_current','cost_center_current','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','age','years_of_service','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping']]
news_hourly_regression = pd.get_dummies(news_hourly_regression, columns=['gender','race_ethnicity','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping'])

In [288]:
news_hourly_regression = news_hourly_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over','tier_Tier 1':'tier_Tier_1','tier_Tier 2':'tier_Tier_2','tier_Tier 3':'tier_Tier_3','tier_Tier 4':'tier_Tier_4','years_of_service_grouped_0':'years_of_service_grouped_0','years_of_service_grouped_1-2':'years_of_service_grouped_1to2','years_of_service_grouped_3-5':'years_of_service_grouped_3to5','years_of_service_grouped_6-10':'years_of_service_grouped_6to10','years_of_service_grouped_11-15':'years_of_service_grouped_11to15','years_of_service_grouped_16-20':'years_of_service_grouped_16to20','years_of_service_grouped_21-25':'years_of_service_grouped_21to25','years_of_service_grouped_25+':'years_of_service_grouped_25_over'})
import statsmodels.formula.api as sm
model21 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male')
result21 = model2.fit()
result21.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.043
Model:,OLS,Adj. R-squared:,0.04
Method:,Least Squares,F-statistic:,12.81
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,3.6e-06
Time:,12:09:22,Log-Likelihood:,-6930.6
No. Observations:,574,AIC:,13870.0
Df Residuals:,571,BIC:,13880.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.271e+05,7897.372,16.092,0.000,1.12e+05,1.43e+05
race_grouping_white,-6301.9244,8174.557,-0.771,0.441,-2.24e+04,9753.945
race_grouping_person_of_color,-2.661e+04,8682.201,-3.065,0.002,-4.37e+04,-9560.605

0,1,2,3
Omnibus:,128.063,Durbin-Watson:,1.632
Prob(Omnibus):,0.0,Jarque-Bera (JB):,248.772
Skew:,1.253,Prob(JB):,9.55e-55
Kurtosis:,5.03,Cond. No.,9.91


In [289]:
model22 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color')
result22 = model22.fit()
result22.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.051
Model:,OLS,Adj. R-squared:,0.03
Method:,Least Squares,F-statistic:,2.484
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.0889
Time:,12:09:22,Log-Likelihood:,-369.15
No. Observations:,96,AIC:,744.3
Df Residuals:,93,BIC:,752.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,39.2300,8.131,4.825,0.000,23.084,55.376
race_grouping_white,-3.6811,8.257,-0.446,0.657,-20.077,12.715
race_grouping_person_of_color,-9.0990,8.397,-1.084,0.281,-25.775,7.577

0,1,2,3
Omnibus:,5.387,Durbin-Watson:,1.792
Prob(Omnibus):,0.068,Jarque-Bera (JB):,4.797
Skew:,0.527,Prob(JB):,0.0909
Kurtosis:,3.296,Cond. No.,15.1


In [290]:
model23 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result23 = model23.fit()
result23.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.065
Model:,OLS,Adj. R-squared:,0.034
Method:,Least Squares,F-statistic:,2.116
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.104
Time:,12:09:23,Log-Likelihood:,-368.44
No. Observations:,96,AIC:,744.9
Df Residuals:,92,BIC:,755.1
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,25.1888,5.473,4.603,0.000,14.319,36.058
gender_Female,14.0412,2.829,4.964,0.000,8.423,19.659
gender_Male,11.1476,3.171,3.516,0.001,4.851,17.445
race_grouping_white,-2.6412,8.289,-0.319,0.751,-19.104,13.821
race_grouping_person_of_color,-8.1345,8.422,-0.966,0.337,-24.861,8.592

0,1,2,3
Omnibus:,4.237,Durbin-Watson:,1.806
Prob(Omnibus):,0.12,Jarque-Bera (JB):,3.664
Skew:,0.465,Prob(JB):,0.16
Kurtosis:,3.226,Cond. No.,8670000000000000.0


In [291]:
new_news_hourly_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1], 'age': [40,40,40,40]})
new_news_hourly_regression['predicted'] = result23.predict(new_news_hourly_regression)
new_news_hourly_regression

Unnamed: 0,gender_Female,gender_Male,race_grouping_white,race_grouping_person_of_color,age,predicted
0,1,0,1,0,40,36.59
1,0,1,1,0,40,33.7
2,1,0,0,1,40,31.1
3,0,1,0,1,40,28.2


In [292]:
model24 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result24 = model24.fit()
result24.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.331
Model:,OLS,Adj. R-squared:,0.253
Method:,Least Squares,F-statistic:,4.211
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,9.31e-05
Time:,12:09:23,Log-Likelihood:,-352.33
No. Observations:,96,AIC:,726.7
Df Residuals:,85,BIC:,754.9
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,22.4772,0.740,30.371,0.000,21.006,23.949
gender_Female,13.1035,1.128,11.614,0.000,10.860,15.347
gender_Male,9.3736,1.324,7.078,0.000,6.740,12.007
age_group_5_25_under,-8.8886,2.708,-3.282,0.001,-14.273,-3.504
age_group_5_25to29,-5.8755,2.191,-2.681,0.009,-10.232,-1.519
age_group_5_30to34,-0.5526,3.010,-0.184,0.855,-6.537,5.432
age_group_5_35to39,-2.4257,3.389,-0.716,0.476,-9.165,4.313
age_group_5_40to44,3.6126,3.220,1.122,0.265,-2.790,10.015
age_group_5_45to49,11.8836,3.640,3.265,0.002,4.647,19.120

0,1,2,3
Omnibus:,0.505,Durbin-Watson:,1.922
Prob(Omnibus):,0.777,Jarque-Bera (JB):,0.653
Skew:,0.092,Prob(JB):,0.721
Kurtosis:,2.64,Cond. No.,2.33e+16


In [293]:
model25 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result25 = model25.fit()
result25.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.337
Model:,OLS,Adj. R-squared:,0.25
Method:,Least Squares,F-statistic:,3.876
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.000154
Time:,12:09:23,Log-Likelihood:,-351.94
No. Observations:,96,AIC:,727.9
Df Residuals:,84,BIC:,758.7
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,42.6892,6.752,6.322,0.000,29.261,56.117
race_grouping_white,-9.9644,7.492,-1.330,0.187,-24.862,4.933
race_grouping_person_of_color,-12.5342,7.657,-1.637,0.105,-27.762,2.693
age_group_5_25_under,-6.1759,2.703,-2.285,0.025,-11.552,-0.800
age_group_5_25to29,-2.6518,2.389,-1.110,0.270,-7.403,2.100
age_group_5_30to34,-0.7425,2.971,-0.250,0.803,-6.650,5.165
age_group_5_35to39,-0.0048,3.496,-0.001,0.999,-6.958,6.948
age_group_5_40to44,5.3819,3.318,1.622,0.108,-1.215,11.979
age_group_5_45to49,14.5738,3.707,3.932,0.000,7.202,21.945

0,1,2,3
Omnibus:,1.45,Durbin-Watson:,1.945
Prob(Omnibus):,0.484,Jarque-Bera (JB):,1.513
Skew:,0.255,Prob(JB):,0.469
Kurtosis:,2.657,Cond. No.,1.1e+16


In [294]:
model26 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result26 = model26.fit()
result26.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.351
Model:,OLS,Adj. R-squared:,0.257
Method:,Least Squares,F-statistic:,3.736
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.00016
Time:,12:09:23,Log-Likelihood:,-350.92
No. Observations:,96,AIC:,727.8
Df Residuals:,83,BIC:,761.2
Df Model:,12,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,28.2477,4.693,6.019,0.000,18.913,37.582
gender_Female,15.7092,2.436,6.449,0.000,10.864,20.554
gender_Male,12.5385,2.807,4.466,0.000,6.955,18.122
race_grouping_white,-8.6864,7.517,-1.156,0.251,-23.638,6.265
race_grouping_person_of_color,-11.0211,7.705,-1.430,0.156,-26.345,4.303
age_group_5_25_under,-8.2781,2.745,-3.016,0.003,-13.737,-2.819
age_group_5_25to29,-4.4931,2.358,-1.905,0.060,-9.183,0.197
age_group_5_30to34,-1.1757,3.034,-0.387,0.699,-7.211,4.859
age_group_5_35to39,-1.4497,3.439,-0.421,0.674,-8.291,5.391

0,1,2,3
Omnibus:,0.67,Durbin-Watson:,1.944
Prob(Omnibus):,0.715,Jarque-Bera (JB):,0.804
Skew:,0.145,Prob(JB):,0.669
Kurtosis:,2.658,Cond. No.,1.52e+16


In [295]:
model27 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over + tier_Tier_1 + tier_Tier_2 + tier_Tier_3 + tier_Tier_4')
result27 = model27.fit()
result27.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.425
Model:,OLS,Adj. R-squared:,0.309
Method:,Least Squares,F-statistic:,3.656
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,5.89e-05
Time:,12:09:23,Log-Likelihood:,-345.05
No. Observations:,96,AIC:,724.1
Df Residuals:,79,BIC:,767.7
Df Model:,16,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,34.4064,5.313,6.476,0.000,23.831,44.982
gender_Female,19.0722,2.796,6.822,0.000,13.507,24.637
gender_Male,15.3342,2.992,5.125,0.000,9.379,21.289
race_grouping_white,-7.5095,7.386,-1.017,0.312,-22.211,7.192
race_grouping_person_of_color,-11.2049,7.615,-1.471,0.145,-26.362,3.952
age_group_5_25_under,-7.8299,2.675,-2.927,0.004,-13.154,-2.506
age_group_5_25to29,-5.5331,2.385,-2.320,0.023,-10.280,-0.786
age_group_5_30to34,-1.8309,2.974,-0.616,0.540,-7.750,4.088
age_group_5_35to39,-1.2639,3.350,-0.377,0.707,-7.931,5.403

0,1,2,3
Omnibus:,0.381,Durbin-Watson:,1.809
Prob(Omnibus):,0.827,Jarque-Bera (JB):,0.242
Skew:,0.123,Prob(JB):,0.886
Kurtosis:,3.0,Cond. No.,1.99e+16


In [296]:
model28 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over + tier_Tier_1 + tier_Tier_2 + tier_Tier_3 + tier_Tier_4 + years_of_service_grouped_0 + years_of_service_grouped_1to2 + years_of_service_grouped_3to5 + years_of_service_grouped_6to10 + years_of_service_grouped_11to15 + years_of_service_grouped_16to20 + years_of_service_grouped_21to25 + years_of_service_grouped_25_over')
result28 = model28.fit()
result28.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.443
Model:,OLS,Adj. R-squared:,0.266
Method:,Least Squares,F-statistic:,2.494
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.00173
Time:,12:09:23,Log-Likelihood:,-343.52
No. Observations:,96,AIC:,735.0
Df Residuals:,72,BIC:,796.6
Df Model:,23,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,32.4885,5.312,6.116,0.000,21.900,43.077
gender_Female,18.2562,2.768,6.596,0.000,12.738,23.774
gender_Male,14.2324,3.105,4.584,0.000,8.042,20.422
race_grouping_white,-8.7651,7.960,-1.101,0.275,-24.634,7.103
race_grouping_person_of_color,-12.3227,8.173,-1.508,0.136,-28.615,3.969
age_group_5_25_under,-10.2986,4.405,-2.338,0.022,-19.079,-1.518
age_group_5_25to29,-7.6966,4.002,-1.923,0.058,-15.674,0.281
age_group_5_30to34,-3.6324,3.639,-0.998,0.322,-10.888,3.623
age_group_5_35to39,-2.4335,3.749,-0.649,0.518,-9.908,5.041

0,1,2,3
Omnibus:,1.708,Durbin-Watson:,1.904
Prob(Omnibus):,0.426,Jarque-Bera (JB):,1.177
Skew:,0.239,Prob(JB):,0.555
Kurtosis:,3.257,Cond. No.,1.41e+16


In [297]:
merit_raises_combined_hourly_regression = merit_raises_combined[(merit_raises_combined['dept'] == 'News') & (merit_raises_combined['pay_rate_type'] == 'Hourly')]
merit_raises_combined_hourly_regression = pd.get_dummies(merit_raises_combined_hourly_regression, columns=['gender','race_grouping','age_group_5'])

In [298]:
merit_raises_combined_hourly_regression = merit_raises_combined_hourly_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over'})
model29 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male')
result29 = model29.fit()
result29.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.01
Model:,OLS,Adj. R-squared:,0.001
Method:,Least Squares,F-statistic:,1.13
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.29
Time:,12:09:23,Log-Likelihood:,-217.43
No. Observations:,119,AIC:,438.9
Df Residuals:,117,BIC:,444.4
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.0256,0.095,10.816,0.000,0.838,1.213
gender_Female,0.6640,0.140,4.737,0.000,0.386,0.942
gender_Male,0.3616,0.159,2.273,0.025,0.047,0.677

0,1,2,3
Omnibus:,140.664,Durbin-Watson:,1.822
Prob(Omnibus):,0.0,Jarque-Bera (JB):,3520.132
Skew:,4.181,Prob(JB):,0.0
Kurtosis:,28.299,Cond. No.,2840000000000000.0


In [299]:
model30 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color')
result30 = model30.fit()
result30.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.03
Model:,OLS,Adj. R-squared:,0.021
Method:,Least Squares,F-statistic:,3.581
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.0609
Time:,12:09:23,Log-Likelihood:,-216.21
No. Observations:,119,AIC:,436.4
Df Residuals:,117,BIC:,442.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.9759,0.099,9.846,0.000,0.780,1.172
race_grouping_white,0.7693,0.138,5.583,0.000,0.496,1.042
race_grouping_person_of_color,0.2066,0.174,1.190,0.236,-0.137,0.550

0,1,2,3
Omnibus:,140.033,Durbin-Watson:,1.726
Prob(Omnibus):,0.0,Jarque-Bera (JB):,3604.75
Skew:,4.131,Prob(JB):,0.0
Kurtosis:,28.666,Cond. No.,4700000000000000.0


In [300]:
model31 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result31 = model31.fit()
result31.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.035
Model:,OLS,Adj. R-squared:,0.018
Method:,Least Squares,F-statistic:,2.084
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.129
Time:,12:09:23,Log-Likelihood:,-215.9
No. Observations:,119,AIC:,437.8
Df Residuals:,116,BIC:,446.1
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.7239,0.075,9.628,0.000,0.575,0.873
gender_Female,0.4726,0.143,3.312,0.001,0.190,0.755
gender_Male,0.2512,0.153,1.645,0.103,-0.051,0.554
race_grouping_white,0.6242,0.142,4.386,0.000,0.342,0.906
race_grouping_person_of_color,0.0996,0.168,0.594,0.554,-0.233,0.432

0,1,2,3
Omnibus:,138.94,Durbin-Watson:,1.699
Prob(Omnibus):,0.0,Jarque-Bera (JB):,3489.086
Skew:,4.091,Prob(JB):,0.0
Kurtosis:,28.234,Cond. No.,5770000000000000.0


In [301]:
new_reason_for_change_combined_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1]})
new_reason_for_change_combined_regression['predicted'] = result31.predict(new_reason_for_change_combined_regression)
new_reason_for_change_combined_regression

Unnamed: 0,gender_Female,gender_Male,race_grouping_white,race_grouping_person_of_color,predicted
0,1,0,1,0,1.82
1,0,1,1,0,1.6
2,1,0,0,1,1.3
3,0,1,0,1,1.07


In [302]:
model32 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result32 = model32.fit()
result32.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.076
Model:,OLS,Adj. R-squared:,-0.01
Method:,Least Squares,F-statistic:,0.8829
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.552
Time:,12:09:24,Log-Likelihood:,-213.33
No. Observations:,119,AIC:,448.7
Df Residuals:,108,BIC:,479.2
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.9858,0.109,9.020,0.000,0.769,1.202
gender_Female,0.6158,0.155,3.964,0.000,0.308,0.924
gender_Male,0.3701,0.173,2.135,0.035,0.027,0.714
age_group_5_25_under,-0.9278,0.814,-1.140,0.257,-2.541,0.686
age_group_5_25to29,0.1217,0.330,0.369,0.713,-0.532,0.775
age_group_5_30to34,0.1034,0.365,0.284,0.777,-0.619,0.826
age_group_5_35to39,-0.1446,0.429,-0.337,0.737,-0.996,0.707
age_group_5_40to44,0.2296,0.429,0.535,0.594,-0.622,1.081
age_group_5_45to49,0.0921,0.381,0.242,0.809,-0.663,0.847

0,1,2,3
Omnibus:,146.672,Durbin-Watson:,1.806
Prob(Omnibus):,0.0,Jarque-Bera (JB):,4436.142
Skew:,4.377,Prob(JB):,0.0
Kurtosis:,31.601,Cond. No.,1.24e+16


In [303]:
model33 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result33 = model33.fit()
result33.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.098
Model:,OLS,Adj. R-squared:,0.015
Method:,Least Squares,F-statistic:,1.176
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.315
Time:,12:09:24,Log-Likelihood:,-211.85
No. Observations:,119,AIC:,445.7
Df Residuals:,108,BIC:,476.3
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.9191,0.115,7.971,0.000,0.691,1.148
race_grouping_white,0.7541,0.148,5.084,0.000,0.460,1.048
race_grouping_person_of_color,0.1650,0.191,0.866,0.389,-0.213,0.543
age_group_5_25_under,-1.1631,0.806,-1.444,0.152,-2.760,0.434
age_group_5_25to29,0.2934,0.327,0.898,0.371,-0.354,0.941
age_group_5_30to34,-0.0743,0.355,-0.209,0.835,-0.778,0.630
age_group_5_35to39,-0.0445,0.428,-0.104,0.918,-0.894,0.805
age_group_5_40to44,0.1334,0.425,0.314,0.754,-0.709,0.976
age_group_5_45to49,0.1278,0.375,0.341,0.734,-0.616,0.871

0,1,2,3
Omnibus:,142.251,Durbin-Watson:,1.706
Prob(Omnibus):,0.0,Jarque-Bera (JB):,4046.992
Skew:,4.185,Prob(JB):,0.0
Kurtosis:,30.316,Cond. No.,1.68e+16


In [304]:
model34 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result34 = model34.fit()
result34.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.099
Model:,OLS,Adj. R-squared:,0.006
Method:,Least Squares,F-statistic:,1.069
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.393
Time:,12:09:24,Log-Likelihood:,-211.8
No. Observations:,119,AIC:,447.6
Df Residuals:,107,BIC:,480.9
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.6987,0.088,7.906,0.000,0.523,0.874
gender_Female,0.3987,0.164,2.437,0.016,0.074,0.723
gender_Male,0.2999,0.169,1.779,0.078,-0.034,0.634
race_grouping_white,0.6296,0.158,3.992,0.000,0.317,0.942
race_grouping_person_of_color,0.0691,0.188,0.367,0.714,-0.304,0.442
age_group_5_25_under,-1.1511,0.815,-1.412,0.161,-2.768,0.465
age_group_5_25to29,0.2459,0.339,0.725,0.470,-0.427,0.919
age_group_5_30to34,-0.0636,0.372,-0.171,0.865,-0.802,0.675
age_group_5_35to39,-0.0632,0.431,-0.147,0.884,-0.917,0.791

0,1,2,3
Omnibus:,142.272,Durbin-Watson:,1.701
Prob(Omnibus):,0.0,Jarque-Bera (JB):,4047.261
Skew:,4.186,Prob(JB):,0.0
Kurtosis:,30.316,Cond. No.,1.99e+16


In [305]:
model35 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male')
result35 = model35.fit()
result35.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.004
Model:,OLS,Adj. R-squared:,-0.005
Method:,Least Squares,F-statistic:,0.4057
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.526
Time:,12:09:24,Log-Likelihood:,-40.137
No. Observations:,111,AIC:,84.27
Df Residuals:,109,BIC:,89.69
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.3463,0.023,103.057,0.000,2.301,2.391
gender_Female,1.1949,0.033,35.693,0.000,1.129,1.261
gender_Male,1.1514,0.038,30.021,0.000,1.075,1.227

0,1,2,3
Omnibus:,7.442,Durbin-Watson:,2.088
Prob(Omnibus):,0.024,Jarque-Bera (JB):,6.902
Skew:,0.544,Prob(JB):,0.0317
Kurtosis:,2.444,Cond. No.,3470000000000000.0


In [306]:
model36 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color')
result36 = model36.fit()
result36.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.044
Model:,OLS,Adj. R-squared:,0.035
Method:,Least Squares,F-statistic:,4.968
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.0279
Time:,12:09:24,Log-Likelihood:,-37.869
No. Observations:,111,AIC:,79.74
Df Residuals:,109,BIC:,85.16
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.3314,0.023,100.433,0.000,2.285,2.377
race_grouping_white,1.2433,0.033,38.131,0.000,1.179,1.308
race_grouping_person_of_color,1.0881,0.040,26.941,0.000,1.008,1.168

0,1,2,3
Omnibus:,4.85,Durbin-Watson:,2.092
Prob(Omnibus):,0.088,Jarque-Bera (JB):,4.227
Skew:,0.391,Prob(JB):,0.121
Kurtosis:,2.451,Cond. No.,3500000000000000.0


In [307]:
model37 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result37 = model37.fit()
result37.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.044
Model:,OLS,Adj. R-squared:,0.026
Method:,Least Squares,F-statistic:,2.484
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.0882
Time:,12:09:24,Log-Likelihood:,-37.847
No. Observations:,111,AIC:,81.69
Df Residuals:,108,BIC:,89.82
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.7480,0.018,98.864,0.000,1.713,1.783
gender_Female,0.8811,0.034,25.816,0.000,0.813,0.949
gender_Male,0.8668,0.037,23.645,0.000,0.794,0.940
race_grouping_white,0.9501,0.034,27.947,0.000,0.883,1.018
race_grouping_person_of_color,0.7979,0.039,20.276,0.000,0.720,0.876

0,1,2,3
Omnibus:,5.045,Durbin-Watson:,2.099
Prob(Omnibus):,0.08,Jarque-Bera (JB):,4.403
Skew:,0.402,Prob(JB):,0.111
Kurtosis:,2.448,Cond. No.,2.18e+16


In [308]:
model38 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result38 = model38.fit()
result38.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.136
Model:,OLS,Adj. R-squared:,0.05
Method:,Least Squares,F-statistic:,1.574
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.125
Time:,12:09:24,Log-Likelihood:,-32.232
No. Observations:,111,AIC:,86.46
Df Residuals:,100,BIC:,116.3
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.1848,0.027,80.675,0.000,2.131,2.238
gender_Female,1.1135,0.037,30.395,0.000,1.041,1.186
gender_Male,1.0713,0.041,26.028,0.000,0.990,1.153
age_group_5_25_under,0.0228,0.221,0.103,0.918,-0.416,0.461
age_group_5_25to29,0.1814,0.076,2.396,0.018,0.031,0.332
age_group_5_30to34,0.2256,0.087,2.580,0.011,0.052,0.399
age_group_5_35to39,0.0520,0.101,0.513,0.609,-0.149,0.253
age_group_5_40to44,0.5228,0.098,5.360,0.000,0.329,0.716
age_group_5_45to49,0.2274,0.087,2.620,0.010,0.055,0.400

0,1,2,3
Omnibus:,4.456,Durbin-Watson:,2.073
Prob(Omnibus):,0.108,Jarque-Bera (JB):,3.765
Skew:,0.354,Prob(JB):,0.152
Kurtosis:,2.44,Cond. No.,1.51e+16


In [309]:
model39 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result39 = model39.fit()
result39.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.164
Model:,OLS,Adj. R-squared:,0.08
Method:,Least Squares,F-statistic:,1.96
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.0457
Time:,12:09:24,Log-Likelihood:,-30.408
No. Observations:,111,AIC:,82.82
Df Residuals:,100,BIC:,112.6
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.1685,0.028,77.356,0.000,2.113,2.224
race_grouping_white,1.1544,0.034,33.766,0.000,1.087,1.222
race_grouping_person_of_color,1.0141,0.044,23.298,0.000,0.928,1.100
age_group_5_25_under,-0.0229,0.218,-0.105,0.917,-0.456,0.410
age_group_5_25to29,0.2181,0.075,2.921,0.004,0.070,0.366
age_group_5_30to34,0.1877,0.084,2.230,0.028,0.021,0.355
age_group_5_35to39,0.0809,0.101,0.802,0.424,-0.119,0.281
age_group_5_40to44,0.5005,0.096,5.205,0.000,0.310,0.691
age_group_5_45to49,0.2335,0.085,2.741,0.007,0.064,0.402

0,1,2,3
Omnibus:,3.523,Durbin-Watson:,2.05
Prob(Omnibus):,0.172,Jarque-Bera (JB):,2.913
Skew:,0.285,Prob(JB):,0.233
Kurtosis:,2.448,Cond. No.,2.68e+16


In [310]:
model40 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result40 = model40.fit()
result40.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.164
Model:,OLS,Adj. R-squared:,0.071
Method:,Least Squares,F-statistic:,1.764
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.0705
Time:,12:09:24,Log-Likelihood:,-30.408
No. Observations:,111,AIC:,84.82
Df Residuals:,99,BIC:,117.3
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.6522,0.022,76.828,0.000,1.610,1.695
gender_Female,0.8254,0.039,21.212,0.000,0.748,0.903
gender_Male,0.8268,0.040,20.585,0.000,0.747,0.907
race_grouping_white,0.8965,0.037,24.406,0.000,0.824,0.969
race_grouping_person_of_color,0.7557,0.043,17.447,0.000,0.670,0.842
age_group_5_25_under,-0.0748,0.220,-0.340,0.734,-0.511,0.361
age_group_5_25to29,0.1668,0.078,2.144,0.034,0.012,0.321
age_group_5_30to34,0.1356,0.089,1.521,0.131,-0.041,0.312
age_group_5_35to39,0.0292,0.102,0.287,0.775,-0.173,0.231

0,1,2,3
Omnibus:,3.487,Durbin-Watson:,2.049
Prob(Omnibus):,0.175,Jarque-Bera (JB):,2.89
Skew:,0.284,Prob(JB):,0.236
Kurtosis:,2.45,Cond. No.,3.49e+16


## Commercial

### Gender

In [311]:
current_commercial_gender_salaried = commercial_salaried.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_commercial_gender_salaried)

Unnamed: 0_level_0,count_nonzero
gender,Unnamed: 1_level_1
Female,86.0
Male,47.0


In [312]:
current_commercial_gender_hourly = commercial_hourly.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_commercial_gender_hourly)

Unnamed: 0_level_0,count_nonzero
gender,Unnamed: 1_level_1
Female,74.0
Male,73.0


In [313]:
current_commercial_gender_salaried_median = commercial_salaried.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_salaried_median)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,86.0,85977.35
Male,47.0,86880.0


In [314]:
current_commercial_gender_hourly_median = commercial_hourly.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_hourly_median)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,74.0,28.89
Male,73.0,23.45


In [315]:
current_commercial_gender_age_salaried = commercial_salaried.groupby(['gender'])['age'].median().sort_values(ascending=False)
current_commercial_gender_age_salaried

gender
Male     39.00
Female   32.00
Name: age, dtype: float64

In [316]:
current_commercial_gender_age_hourly = commercial_hourly.groupby(['gender'])['age'].median().sort_values(ascending=False)
current_commercial_gender_age_hourly

gender
Male     47.00
Female   43.50
Name: age, dtype: float64

In [317]:
current_commercial_gender_age_5_salary = commercial_salaried.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_age_5_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,8.0,63500.0
25-29,Female,29.0,75000.0
25-29,Male,6.0,79140.0
30-34,Female,9.0,100000.0
30-34,Male,7.0,97695.6
35-39,Female,9.0,149101.0
35-39,Male,9.0,77626.78
40-44,Female,8.0,124287.97
45-49,Female,7.0,90585.0
45-49,Male,6.0,85089.96


In [318]:
current_commercial_gender_age_5_hourly = commercial_hourly.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_age_5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Male,7.0,23.08
25-29,Female,14.0,31.76
25-29,Male,8.0,26.17
30-34,Female,6.0,30.32
35-39,Female,5.0,30.77
35-39,Male,8.0,30.62
40-44,Female,12.0,29.48
40-44,Male,5.0,21.5
45-49,Female,7.0,31.28
45-49,Male,10.0,22.39


In [319]:
current_commercial_gender_age_10_salary = commercial_salaried.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_age_10_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,8.0,63500.0
25-34,Female,38.0,80212.0
25-34,Male,13.0,86880.0
35-44,Female,17.0,143575.94
35-44,Male,10.0,84029.11
45-54,Female,14.0,90627.24
45-54,Male,9.0,85000.0
55-64,Female,9.0,96780.0
55-64,Male,11.0,97134.77


In [320]:
current_commercial_gender_age_10_hourly = commercial_hourly.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_age_10_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Male,7.0,23.08
25-34,Female,20.0,31.03
25-34,Male,11.0,26.04
35-44,Female,17.0,29.74
35-44,Male,13.0,27.18
45-54,Female,13.0,26.14
45-54,Male,22.0,23.49
55-64,Female,15.0,25.36
55-64,Male,14.0,23.86
65+,Female,5.0,27.69


In [321]:
current_commercial_gender_salaried_under_40 = commercial_salaried[commercial_salaried['age'] < 40].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_salaried_under_40)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,55.0,80424.0
Male,24.0,83140.0


In [322]:
current_commercial_gender_salaried_over_40 = commercial_salaried[commercial_salaried['age'] > 39].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_salaried_over_40)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,31.0,96780.0
Male,23.0,90000.0


In [323]:
current_commercial_gender_hourly_under_40 = commercial_hourly[commercial_hourly['age'] < 40].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_hourly_under_40)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,29.0,30.38
Male,26.0,26.53


In [324]:
current_commercial_gender_hourly_over_40 = commercial_hourly[commercial_hourly['age'] > 39].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_hourly_over_40)

Unnamed: 0_level_0,count_nonzero,median
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,45.0,27.69
Male,47.0,23.2


### Race and ethnicity

In [325]:
current_commercial_race_salaried = commercial_salaried.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_commercial_race_salaried)

Unnamed: 0_level_0,count_nonzero
race_ethnicity,Unnamed: 1_level_1
White (United States of America),99.0
Black or African American (United States of America),14.0
Asian (United States of America),13.0
Hispanic or Latino (United States of America),5.0


In [326]:
current_commercial_race_hourly = commercial_hourly.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_commercial_race_hourly)

Unnamed: 0_level_0,count_nonzero
race_ethnicity,Unnamed: 1_level_1
Black or African American (United States of America),82.0
White (United States of America),43.0
Hispanic or Latino (United States of America),9.0
Asian (United States of America),7.0


In [327]:
current_commercial_race_group_salaried = commercial_salaried.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_commercial_race_group_salaried)

Unnamed: 0_level_0,count_nonzero
race_grouping,Unnamed: 1_level_1
white,99.0
person of color,32.0


In [328]:
current_commercial_race_group_hourly = commercial_hourly.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_commercial_race_group_hourly)

Unnamed: 0_level_0,count_nonzero
race_grouping,Unnamed: 1_level_1
person of color,101.0
white,43.0


In [329]:
current_commercial_race_median_salaried = commercial_salaried.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_median_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),99.0,88000.0
Black or African American (United States of America),14.0,84640.0
Asian (United States of America),13.0,80000.0
Hispanic or Latino (United States of America),5.0,80000.0


In [330]:
current_commercial_race_median_hourly = commercial_hourly.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_median_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),43.0,30.38
Asian (United States of America),7.0,26.04
Black or African American (United States of America),82.0,24.91
Hispanic or Latino (United States of America),9.0,23.12


In [331]:
current_commercial_race_group_median_salaried = commercial_salaried.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_group_median_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
white,99.0,88000.0
person of color,32.0,83444.64


In [332]:
current_commercial_race_group_median_hourly = commercial_hourly.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_group_median_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
white,43.0,30.38
person of color,101.0,25.16


In [333]:
current_commercial_race_age_salaried = commercial_salaried.groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_commercial_race_age_salaried

race_ethnicity
Black or African American (United States of America)   48.00
Hispanic or Latino (United States of America)          41.00
Prefer Not to Disclose (United States of America)      35.50
White (United States of America)                       35.00
Asian (United States of America)                       32.00
Name: age, dtype: float64

In [334]:
current_commercial_race_age_hourly = commercial_hourly.groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_commercial_race_age_hourly

race_ethnicity
Black or African American (United States of America)          48.50
White (United States of America)                              39.00
American Indian or Alaska Native (United States of America)   38.00
Prefer Not to Disclose (United States of America)             35.00
Two or More Races (United States of America)                  31.00
Hispanic or Latino (United States of America)                 30.00
Asian (United States of America)                              28.00
Name: age, dtype: float64

In [335]:
current_commercial_race_age_5_salary = commercial_salaried.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_age_5_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),9.0,63000.0
25-29,White (United States of America),28.0,78691.5
30-34,White (United States of America),12.0,98847.8
35-39,White (United States of America),13.0,149101.0
40-44,White (United States of America),6.0,126864.75
45-49,White (United States of America),7.0,90000.0
50-54,White (United States of America),9.0,87391.89
55-59,White (United States of America),8.0,96957.39
60-64,White (United States of America),6.0,97651.02


In [336]:
current_commercial_race_age_5_hourly = commercial_hourly.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_age_5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Black or African American (United States of America),5.0,22.36
25-29,White (United States of America),11.0,31.84
35-39,White (United States of America),6.0,30.81
40-44,Black or African American (United States of America),13.0,28.89
45-49,Black or African American (United States of America),14.0,23.11
50-54,Black or African American (United States of America),12.0,23.27
50-54,White (United States of America),5.0,24.44
55-59,Black or African American (United States of America),11.0,27.05
55-59,White (United States of America),5.0,25.36
60-64,Black or African American (United States of America),11.0,24.27


In [337]:
current_commercial_race_age_10_salary = commercial_salaried.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_age_10_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),9.0,63000.0
25-34,Asian (United States of America),6.0,82418.32
25-34,White (United States of America),40.0,82000.0
35-44,White (United States of America),19.0,148729.5
45-54,White (United States of America),16.0,88695.95
55-64,White (United States of America),14.0,97324.6


In [338]:
current_commercial_race_age_10_hourly = commercial_hourly.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_age_10_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Black or African American (United States of America),5.0,22.36
25-34,Black or African American (United States of America),7.0,26.73
25-34,Hispanic or Latino (United States of America),6.0,24.99
25-34,White (United States of America),12.0,31.76
35-44,Black or African American (United States of America),17.0,29.23
35-44,White (United States of America),8.0,30.57
45-54,Black or African American (United States of America),26.0,23.27
45-54,White (United States of America),8.0,30.81
55-64,Black or African American (United States of America),22.0,24.54
55-64,White (United States of America),7.0,26.41


In [339]:
current_commercial_race_group_age_5_salary = commercial_salaried.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_group_age_5_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,white,9.0,63000.0
25-29,person of color,7.0,72000.0
25-29,white,28.0,78691.5
30-34,white,12.0,98847.8
35-39,person of color,5.0,73521.6
35-39,white,13.0,149101.0
40-44,white,6.0,126864.75
45-49,person of color,6.0,85449.96
45-49,white,7.0,90000.0
50-54,white,9.0,87391.89


In [340]:
current_commercial_race_group_age_5_hourly = commercial_hourly.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_group_age_5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,7.0,25.64
25-29,person of color,10.0,26.29
25-29,white,11.0,31.84
30-34,person of color,8.0,28.82
35-39,person of color,6.0,30.81
35-39,white,6.0,30.81
40-44,person of color,14.0,28.52
45-49,person of color,14.0,23.11
50-54,person of color,13.0,23.19
50-54,white,5.0,24.44


In [341]:
current_commercial_race_group_age_10_salary = commercial_salaried.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_group_age_10_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,white,9.0,63000.0
25-34,person of color,10.0,74918.32
25-34,white,40.0,82000.0
35-44,person of color,7.0,90431.45
35-44,white,19.0,148729.5
45-54,person of color,7.0,85000.0
45-54,white,16.0,88695.95
55-64,person of color,6.0,82708.86
55-64,white,14.0,97324.6


In [342]:
current_commercial_race_group_age_10_hourly = commercial_hourly.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_group_age_10_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,7.0,25.64
25-34,person of color,18.0,26.52
25-34,white,12.0,31.76
35-44,person of color,20.0,29.06
35-44,white,8.0,30.57
45-54,person of color,27.0,23.19
45-54,white,8.0,30.81
55-64,person of color,22.0,24.54
55-64,white,7.0,26.41
65+,person of color,7.0,23.4


In [343]:
current_commercial_race_under_40_salaried = commercial_salaried[commercial_salaried['age'] < 40].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_under_40_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),62.0,82000.0
Asian (United States of America),10.0,77418.32


In [344]:
current_commercial_race_over_40_salaried = commercial_salaried[commercial_salaried['age'] > 39].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_over_40_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),37.0,97134.77
Black or African American (United States of America),10.0,84848.86


In [345]:
current_commercial_race_under_40_hourly = commercial_hourly[commercial_hourly['age'] < 40].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_under_40_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),22.0,31.46
Black or African American (United States of America),16.0,26.5
Hispanic or Latino (United States of America),8.0,25.62


In [346]:
current_commercial_race_over_40_hourly = commercial_hourly[commercial_hourly['age'] > 39].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_over_40_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
White (United States of America),21.0,29.23
Black or African American (United States of America),66.0,24.35


### Gender x race/ethnicity

In [347]:
current_commercial_race_gender_salaried = commercial_salaried.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_commercial_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_ethnicity,gender,Unnamed: 2_level_1
Asian (United States of America),Female,8.0
Asian (United States of America),Male,5.0
Black or African American (United States of America),Female,7.0
Black or African American (United States of America),Male,7.0
White (United States of America),Female,67.0
White (United States of America),Male,32.0


In [348]:
current_commercial_race_gender_hourly = commercial_hourly.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_commercial_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
race_ethnicity,gender,Unnamed: 2_level_1
Black or African American (United States of America),Female,41.0
Black or African American (United States of America),Male,41.0
Hispanic or Latino (United States of America),Female,6.0
White (United States of America),Female,22.0
White (United States of America),Male,21.0


In [349]:
current_commercial_race_gender_median_salaried = commercial_salaried.groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_median_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,17.0,85000.0
person of color,Male,15.0,76866.1
white,Female,67.0,86104.69
white,Male,32.0,94496.71


In [350]:
current_commercial_race_gender_median_hourly = commercial_hourly.groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_median_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_grouping,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
person of color,Female,52.0,26.54
person of color,Male,49.0,23.33
white,Female,22.0,31.76
white,Male,21.0,26.76


In [351]:
current_commercial_race_gender_under_40_salaried = commercial_salaried[commercial_salaried['age'] < 40].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_under_40_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Asian (United States of America),Female,6.0,85000.0
White (United States of America),Female,46.0,80212.0
White (United States of America),Male,16.0,90940.0


In [352]:
current_commercial_race_gender_under_40_hourly = commercial_hourly[commercial_hourly['age'] < 40].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_under_40_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Black or African American (United States of America),Female,8.0,26.5
Black or African American (United States of America),Male,8.0,26.31
Hispanic or Latino (United States of America),Female,6.0,28.51
White (United States of America),Female,12.0,33.28
White (United States of America),Male,10.0,30.57


In [353]:
current_commercial_race_gender_over_40_salaried = commercial_salaried[commercial_salaried['age'] > 39].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_over_40_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Black or African American (United States of America),Female,6.0,94950.5
White (United States of America),Female,21.0,97546.0
White (United States of America),Male,16.0,95564.1


In [354]:
current_commercial_race_gender_over_40_hourly = commercial_hourly[commercial_hourly['age'] > 39].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_over_40_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Black or African American (United States of America),Female,33.0,26.14
Black or African American (United States of America),Male,33.0,23.07
White (United States of America),Female,10.0,31.02
White (United States of America),Male,11.0,23.85


### Years of service

In [355]:
current_commercial_yos_salary = commercial_salaried.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_salary)

Unnamed: 0_level_0,count_nonzero,median
years_of_service_grouped,Unnamed: 1_level_1,Unnamed: 2_level_1
0,31.0,82000.0
1-2,36.0,80212.0
3-5,26.0,95769.71
6-10,15.0,99316.0
11-15,6.0,76331.03
16-20,6.0,81765.65
21-25,8.0,94006.52
25+,5.0,93490.62


In [356]:
current_commercial_yos_hourly = commercial_hourly.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_hourly)

Unnamed: 0_level_0,count_nonzero,median
years_of_service_grouped,Unnamed: 1_level_1,Unnamed: 2_level_1
0,26.0,25.64
1-2,33.0,26.99
3-5,14.0,23.16
6-10,19.0,23.98
11-15,14.0,30.15
16-20,17.0,24.32
21-25,9.0,29.74
25+,15.0,26.34


In [357]:
current_commercial_yos_gender_salary = commercial_salaried.groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_gender_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Female,22.0,74640.0
0,Male,9.0,90000.0
1-2,Female,26.0,80212.0
1-2,Male,10.0,81640.0
3-5,Female,16.0,94107.74
3-5,Male,10.0,102496.71
6-10,Female,12.0,99499.7
21-25,Male,6.0,91466.08


In [358]:
current_commercial_yos_gender_hourly = commercial_hourly.groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Female,10.0,29.48
0,Male,16.0,22.05
1-2,Female,18.0,30.29
1-2,Male,15.0,24.35
3-5,Female,5.0,30.77
3-5,Male,9.0,22.14
6-10,Female,5.0,26.27
6-10,Male,14.0,23.62
11-15,Male,10.0,29.04
16-20,Female,10.0,24.16


In [359]:
current_commercial_yos_race_salary = commercial_salaried.groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_race_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
0,White (United States of America),23.0,82000.0
1-2,White (United States of America),30.0,80212.0
3-5,White (United States of America),19.0,108780.0
6-10,White (United States of America),11.0,102500.0
16-20,White (United States of America),5.0,87391.89
21-25,White (United States of America),6.0,97651.02


In [360]:
current_commercial_yos_race_hourly = commercial_hourly.groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
years_of_service_grouped,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
0,Black or African American (United States of America),11.0,25.64
0,White (United States of America),6.0,29.52
1-2,Black or African American (United States of America),14.0,23.56
1-2,White (United States of America),13.0,34.72
3-5,Black or African American (United States of America),6.0,21.83
3-5,White (United States of America),5.0,23.2
6-10,Black or African American (United States of America),12.0,23.62
6-10,White (United States of America),6.0,29.91
11-15,Black or African American (United States of America),7.0,30.38
11-15,White (United States of America),6.0,26.01


In [361]:
current_commercial_yos_race_gender_salary = commercial_salaried.groupby(['years_of_service_grouped','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_race_gender_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
years_of_service_grouped,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
0,person of color,Female,6.0,78500.0
0,white,Female,15.0,74280.0
0,white,Male,8.0,92500.0
1-2,person of color,Female,5.0,96980.0
1-2,white,Female,21.0,77383.0
1-2,white,Male,9.0,83280.0
3-5,person of color,Male,5.0,74836.65
3-5,white,Female,14.0,94107.74
3-5,white,Male,5.0,125530.0
6-10,white,Female,10.0,101091.7


In [362]:
current_commercial_yos_race_gender_hourly = commercial_hourly.groupby(['years_of_service_grouped','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
years_of_service_grouped,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
0,person of color,Female,7.0,29.74
0,person of color,Male,10.0,21.35
1-2,person of color,Female,9.0,26.73
1-2,person of color,Male,11.0,22.36
1-2,white,Female,9.0,35.01
3-5,person of color,Male,6.0,21.83
6-10,person of color,Male,10.0,23.42
11-15,person of color,Male,5.0,29.92
11-15,white,Male,5.0,26.76
16-20,person of color,Female,9.0,23.99


### Age

In [363]:
current_median_commercial_age_5_salaried = commercial_salaried.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_salaried)

Unnamed: 0_level_0,count_nonzero,median
age_group_5,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,10.0,64000.0
25-29,35.0,75000.0
30-34,16.0,98847.8
35-39,18.0,101091.7
40-44,9.0,143575.94
45-49,13.0,86104.69
50-54,10.0,87002.45
55-59,10.0,96957.39
60-64,10.0,95753.93


In [364]:
current_median_commercial_age_5_hourly = commercial_hourly.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_hourly)

Unnamed: 0_level_0,count_nonzero,median
age_group_5,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,11.0,25.64
25-29,22.0,29.77
30-34,9.0,29.51
35-39,13.0,30.77
40-44,17.0,28.89
45-49,17.0,23.99
50-54,18.0,23.6
55-59,16.0,26.23
60-64,13.0,24.32
65+,11.0,23.4


In [365]:
current_median_commercial_age_10_salaried = commercial_salaried.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_salaried)

Unnamed: 0_level_0,count_nonzero,median
age_group_10,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,10.0,64000.0
25-34,51.0,82000.0
35-44,27.0,105000.0
45-54,23.0,86613.0
55-64,20.0,96957.39


In [366]:
current_median_commercial_age_10_hourly = commercial_hourly.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_hourly)

Unnamed: 0_level_0,count_nonzero,median
age_group_10,Unnamed: 1_level_1,Unnamed: 2_level_1
<25,11.0,25.64
25-34,31.0,29.51
35-44,30.0,29.23
45-54,35.0,23.85
55-64,29.0,24.71
65+,11.0,23.4


In [367]:
current_commercial_age_5_yos_salary = commercial_salaried.groupby(['age_group_5','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_age_5_yos_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,years_of_service_grouped,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,0,6.0,62500.0
25-29,0,14.0,75000.0
25-29,1-2,17.0,76000.0
30-34,0,6.0,100000.0
30-34,1-2,7.0,96980.0
35-39,3-5,7.0,149101.0
35-39,6-10,6.0,101091.7
40-44,3-5,5.0,167000.0
60-64,21-25,5.0,97514.43


In [368]:
current_commercial_age_5_yos_hourly = commercial_hourly.groupby(['age_group_5','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_age_5_yos_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,years_of_service_grouped,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,0,5.0,23.08
<25,1-2,6.0,27.94
25-29,0,6.0,33.34
25-29,1-2,15.0,26.73
30-34,0,5.0,22.05
35-39,11-15,5.0,30.38
40-44,3-5,5.0,29.23
55-59,25+,6.0,27.94
60-64,16-20,5.0,24.27
65+,25+,5.0,26.82


In [369]:
current_commercial_age_10_yos_salary = commercial_salaried.groupby(['age_group_10','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_age_10_yos_salary)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,years_of_service_grouped,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,0,6.0,62500.0
25-34,0,20.0,82000.0
25-34,1-2,24.0,80810.05
25-34,3-5,5.0,85850.0
35-44,3-5,12.0,158050.5
35-44,6-10,6.0,101091.7
45-54,3-5,5.0,86613.0
55-64,21-25,5.0,97514.43


In [370]:
current_commercial_age_10_yos_hourly = commercial_hourly.groupby(['age_group_10','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_age_10_yos_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,years_of_service_grouped,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,0,5.0,23.08
<25,1-2,6.0,27.94
25-34,0,11.0,30.26
25-34,1-2,15.0,26.73
35-44,0,5.0,29.23
35-44,3-5,6.0,26.18
35-44,11-15,7.0,30.38
45-54,0,5.0,20.5
45-54,1-2,5.0,22.36
45-54,6-10,7.0,23.85


In [371]:
current_median_commercial_age_5_gender_salaried = commercial_salaried.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,8.0,63500.0
25-29,Female,29.0,75000.0
25-29,Male,6.0,79140.0
30-34,Female,9.0,100000.0
30-34,Male,7.0,97695.6
35-39,Female,9.0,149101.0
35-39,Male,9.0,77626.78
40-44,Female,8.0,124287.97
45-49,Female,7.0,90585.0
45-49,Male,6.0,85089.96


In [372]:
current_median_commercial_age_5_gender_hourly = commercial_hourly.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Male,7.0,23.08
25-29,Female,14.0,31.76
25-29,Male,8.0,26.17
30-34,Female,6.0,30.32
35-39,Female,5.0,30.77
35-39,Male,8.0,30.62
40-44,Female,12.0,29.48
40-44,Male,5.0,21.5
45-49,Female,7.0,31.28
45-49,Male,10.0,22.39


In [373]:
current_median_commercial_age_10_gender_salaried = commercial_salaried.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Female,8.0,63500.0
25-34,Female,38.0,80212.0
25-34,Male,13.0,86880.0
35-44,Female,17.0,143575.94
35-44,Male,10.0,84029.11
45-54,Female,14.0,90627.24
45-54,Male,9.0,85000.0
55-64,Female,9.0,96780.0
55-64,Male,11.0,97134.77


In [374]:
current_median_commercial_age_10_gender_hourly = commercial_hourly.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Male,7.0,23.08
25-34,Female,20.0,31.03
25-34,Male,11.0,26.04
35-44,Female,17.0,29.74
35-44,Male,13.0,27.18
45-54,Female,13.0,26.14
45-54,Male,22.0,23.49
55-64,Female,15.0,25.36
55-64,Male,14.0,23.86
65+,Female,5.0,27.69


In [375]:
current_median_commercial_age_5_race_salaried = commercial_salaried.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),9.0,63000.0
25-29,White (United States of America),28.0,78691.5
30-34,White (United States of America),12.0,98847.8
35-39,White (United States of America),13.0,149101.0
40-44,White (United States of America),6.0,126864.75
45-49,White (United States of America),7.0,90000.0
50-54,White (United States of America),9.0,87391.89
55-59,White (United States of America),8.0,96957.39
60-64,White (United States of America),6.0,97651.02


In [376]:
current_median_commercial_age_5_race_hourly = commercial_hourly.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Black or African American (United States of America),5.0,22.36
25-29,White (United States of America),11.0,31.84
35-39,White (United States of America),6.0,30.81
40-44,Black or African American (United States of America),13.0,28.89
45-49,Black or African American (United States of America),14.0,23.11
50-54,Black or African American (United States of America),12.0,23.27
50-54,White (United States of America),5.0,24.44
55-59,Black or African American (United States of America),11.0,27.05
55-59,White (United States of America),5.0,25.36
60-64,Black or African American (United States of America),11.0,24.27


In [377]:
current_median_commercial_age_5_race_group_salaried = commercial_salaried.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_group_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,white,9.0,63000.0
25-29,person of color,7.0,72000.0
25-29,white,28.0,78691.5
30-34,white,12.0,98847.8
35-39,person of color,5.0,73521.6
35-39,white,13.0,149101.0
40-44,white,6.0,126864.75
45-49,person of color,6.0,85449.96
45-49,white,7.0,90000.0
50-54,white,9.0,87391.89


In [378]:
current_median_commercial_age_5_race_group_hourly = commercial_hourly.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_group_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_5,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,7.0,25.64
25-29,person of color,10.0,26.29
25-29,white,11.0,31.84
30-34,person of color,8.0,28.82
35-39,person of color,6.0,30.81
35-39,white,6.0,30.81
40-44,person of color,14.0,28.52
45-49,person of color,14.0,23.11
50-54,person of color,13.0,23.19
50-54,white,5.0,24.44


In [379]:
current_median_commercial_age_10_race_salaried = commercial_salaried.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,White (United States of America),9.0,63000.0
25-34,Asian (United States of America),6.0,82418.32
25-34,White (United States of America),40.0,82000.0
35-44,White (United States of America),19.0,148729.5
45-54,White (United States of America),16.0,88695.95
55-64,White (United States of America),14.0,97324.6


In [380]:
current_median_commercial_age_10_race_hourly = commercial_hourly.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,Black or African American (United States of America),5.0,22.36
25-34,Black or African American (United States of America),7.0,26.73
25-34,Hispanic or Latino (United States of America),6.0,24.99
25-34,White (United States of America),12.0,31.76
35-44,Black or African American (United States of America),17.0,29.23
35-44,White (United States of America),8.0,30.57
45-54,Black or African American (United States of America),26.0,23.27
45-54,White (United States of America),8.0,30.81
55-64,Black or African American (United States of America),22.0,24.54
55-64,White (United States of America),7.0,26.41


In [381]:
current_median_commercial_age_10_race_group_salaried = commercial_salaried.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_group_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,white,9.0,63000.0
25-34,person of color,10.0,74918.32
25-34,white,40.0,82000.0
35-44,person of color,7.0,90431.45
35-44,white,19.0,148729.5
45-54,person of color,7.0,85000.0
45-54,white,16.0,88695.95
55-64,person of color,6.0,82708.86
55-64,white,14.0,97324.6


In [382]:
current_median_commercial_age_10_race_group_hourly = commercial_hourly.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_group_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
age_group_10,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
<25,person of color,7.0,25.64
25-34,person of color,18.0,26.52
25-34,white,12.0,31.76
35-44,person of color,20.0,29.06
35-44,white,8.0,30.57
45-54,person of color,27.0,23.19
45-54,white,8.0,30.81
55-64,person of color,22.0,24.54
55-64,white,7.0,26.41
65+,person of color,7.0,23.4


In [383]:
current_median_commercial_age_5_race_gender_salaried = commercial_salaried.groupby(['age_group_5','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_5,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,White (United States of America),Female,7.0,62000.0
25-29,White (United States of America),Female,25.0,76000.0
30-34,White (United States of America),Female,5.0,131097.12
30-34,White (United States of America),Male,7.0,97695.6
35-39,White (United States of America),Female,9.0,149101.0
40-44,White (United States of America),Female,6.0,126864.75
50-54,White (United States of America),Female,6.0,98281.24
55-59,White (United States of America),Male,5.0,97134.77


In [384]:
current_median_commercial_age_5_race_gender_hourly = commercial_hourly.groupby(['age_group_5','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_5,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,Black or African American (United States of America),Male,5.0,22.36
25-29,White (United States of America),Female,7.0,35.01
40-44,Black or African American (United States of America),Female,9.0,29.74
45-49,Black or African American (United States of America),Male,10.0,22.39
50-54,Black or African American (United States of America),Female,6.0,23.27
50-54,Black or African American (United States of America),Male,6.0,23.01
50-54,White (United States of America),Male,5.0,24.44
55-59,Black or African American (United States of America),Female,7.0,28.61
60-64,Black or African American (United States of America),Female,5.0,24.32
60-64,Black or African American (United States of America),Male,6.0,23.8


In [385]:
current_median_commercial_age_5_race_group_gender_salaried = commercial_salaried.groupby(['age_group_5','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_5,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,white,Female,7.0,62000.0
25-29,white,Female,25.0,76000.0
30-34,white,Female,5.0,131097.12
30-34,white,Male,7.0,97695.6
35-39,person of color,Male,5.0,73521.6
35-39,white,Female,9.0,149101.0
40-44,white,Female,6.0,126864.75
50-54,white,Female,6.0,98281.24
55-59,white,Male,5.0,97134.77


In [386]:
current_median_commercial_age_5_race_group_gender_hourly = commercial_hourly.groupby(['age_group_5','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_group_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_5,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,person of color,Male,5.0,22.36
25-29,person of color,Female,7.0,26.27
25-29,white,Female,7.0,35.01
30-34,person of color,Female,5.0,30.38
40-44,person of color,Female,10.0,29.48
45-49,person of color,Male,10.0,22.39
50-54,person of color,Female,6.0,23.27
50-54,person of color,Male,7.0,21.1
50-54,white,Male,5.0,24.44
55-59,person of color,Female,7.0,28.61


In [387]:
current_median_commercial_age_10_race_gender_salaried = commercial_salaried.groupby(['age_group_10','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_10,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,White (United States of America),Female,7.0,62000.0
25-34,Asian (United States of America),Female,5.0,90000.0
25-34,White (United States of America),Female,30.0,78691.5
25-34,White (United States of America),Male,10.0,96347.8
35-44,White (United States of America),Female,15.0,148729.5
45-54,White (United States of America),Female,10.0,98281.24
45-54,White (United States of America),Male,6.0,86195.95
55-64,White (United States of America),Female,5.0,96780.0
55-64,White (United States of America),Male,9.0,97514.43


In [388]:
current_median_commercial_age_10_race_gender_hourly = commercial_hourly.groupby(['age_group_10','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_10,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,Black or African American (United States of America),Male,5.0,22.36
25-34,Black or African American (United States of America),Female,6.0,26.5
25-34,Hispanic or Latino (United States of America),Female,5.0,28.13
25-34,White (United States of America),Female,8.0,33.42
35-44,Black or African American (United States of America),Female,11.0,29.74
35-44,Black or African American (United States of America),Male,6.0,24.84
45-54,Black or African American (United States of America),Female,10.0,23.67
45-54,Black or African American (United States of America),Male,16.0,22.39
45-54,White (United States of America),Male,5.0,24.44
55-64,Black or African American (United States of America),Female,12.0,24.99


In [389]:
current_median_commercial_age_10_race_group_gender_salaried = commercial_salaried.groupby(['age_group_10','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_10,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,white,Female,7.0,62000.0
25-34,person of color,Female,7.0,85000.0
25-34,white,Female,30.0,78691.5
25-34,white,Male,10.0,96347.8
35-44,person of color,Male,6.0,81976.52
35-44,white,Female,15.0,148729.5
45-54,white,Female,10.0,98281.24
45-54,white,Male,6.0,86195.95
55-64,white,Female,5.0,96780.0
55-64,white,Male,9.0,97514.43


In [390]:
current_median_commercial_age_10_race_group_gender_hourly = commercial_hourly.groupby(['age_group_10','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_group_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
age_group_10,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
<25,person of color,Male,5.0,22.36
25-34,person of color,Female,12.0,27.43
25-34,person of color,Male,6.0,26.17
25-34,white,Female,8.0,33.42
35-44,person of color,Female,13.0,29.74
35-44,person of color,Male,7.0,23.12
45-54,person of color,Female,10.0,23.67
45-54,person of color,Male,17.0,22.34
45-54,white,Male,5.0,24.44
55-64,person of color,Female,12.0,24.99


### Departments

In [391]:
current_commercial_median_department_salaried = commercial_salaried.groupby(['department']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_salaried)

Unnamed: 0_level_0,count_nonzero,median
department,Unnamed: 1_level_1,Unnamed: 2_level_1
Finance,8.0,90575.5
WP News Media Services,9.0,86104.69
Client Solutions,102.0,85633.86
Marketing,7.0,81196.11
Production,5.0,71665.06


In [392]:
current_commercial_median_department_hourly = commercial_hourly.groupby(['department']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_hourly)

Unnamed: 0_level_0,count_nonzero,median
department,Unnamed: 1_level_1,Unnamed: 2_level_1
Public Relations,5.0,35.01
Client Solutions,62.0,29.41
Finance,23.0,29.23
Circulation,49.0,22.44


In [393]:
current_commercial_median_department_gender_salaried = commercial_salaried.groupby(['department','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
department,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Finance,Female,5.0,96780.0
Client Solutions,Male,31.0,90000.0
WP News Media Services,Male,5.0,85899.92
Client Solutions,Female,71.0,85000.0


In [394]:
current_commercial_median_department_gender_hourly = commercial_hourly.groupby(['department','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
department,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Public Relations,Female,5.0,35.01
Client Solutions,Male,24.0,30.13
Finance,Female,17.0,29.23
Finance,Male,6.0,28.85
Client Solutions,Female,38.0,28.83
Circulation,Female,9.0,23.19
Circulation,Male,40.0,22.4


In [395]:
current_commercial_median_department_race_salaried = commercial_salaried.groupby(['department','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
department,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
Client Solutions,White (United States of America),79.0,90000.0
WP News Media Services,White (United States of America),8.0,88301.65
Client Solutions,Black or African American (United States of America),10.0,83804.64
Marketing,White (United States of America),5.0,83280.0
Client Solutions,Asian (United States of America),9.0,76139.41


In [396]:
current_commercial_median_department_race_hourly = commercial_hourly.groupby(['department','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
department,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
Client Solutions,White (United States of America),24.0,31.0
Finance,White (United States of America),5.0,29.49
Finance,Black or African American (United States of America),16.0,29.06
Client Solutions,Hispanic or Latino (United States of America),6.0,28.51
Client Solutions,Black or African American (United States of America),25.0,26.99
Client Solutions,Asian (United States of America),5.0,26.3
Circulation,White (United States of America),8.0,22.8
Circulation,Black or African American (United States of America),35.0,22.36


In [397]:
current_commercial_median_department_race_gender_salaried = commercial_salaried.groupby(['department','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
department,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
Client Solutions,White (United States of America),Male,22.0,98893.8
Client Solutions,Black or African American (United States of America),Female,6.0,92158.0
Client Solutions,White (United States of America),Female,57.0,86613.0
Client Solutions,Asian (United States of America),Female,5.0,80000.0


In [398]:
current_commercial_median_department_race_gender_hourly = commercial_hourly.groupby(['department','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
department,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
Client Solutions,White (United States of America),Female,13.0,31.68
Client Solutions,White (United States of America),Male,11.0,30.77
Finance,Black or African American (United States of America),Female,12.0,29.06
Client Solutions,Hispanic or Latino (United States of America),Female,6.0,28.51
Client Solutions,Black or African American (United States of America),Male,9.0,28.16
Client Solutions,Black or African American (United States of America),Female,16.0,25.95
Circulation,Black or African American (United States of America),Female,9.0,23.19
Circulation,White (United States of America),Male,8.0,22.8
Circulation,Black or African American (United States of America),Male,26.0,22.35


In [399]:
current_commercial_median_department_race_group_gender_salaried = commercial_salaried.groupby(['department','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
department,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
Client Solutions,white,Male,22.0,98893.8
Client Solutions,white,Female,57.0,86613.0
Client Solutions,person of color,Female,13.0,80000.0
Client Solutions,person of color,Male,9.0,76139.41


In [400]:
current_commercial_median_department_race_group_gender_hourly = commercial_hourly.groupby(['department','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_group_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
department,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
Client Solutions,white,Female,13.0,31.68
Client Solutions,white,Male,11.0,30.77
Finance,person of color,Female,13.0,28.89
Client Solutions,person of color,Male,13.0,27.05
Client Solutions,person of color,Female,25.0,26.34
Circulation,person of color,Female,9.0,23.19
Circulation,white,Male,8.0,22.8
Circulation,person of color,Male,30.0,22.35


In [401]:
current_commercial_median_department_race_gender_age5_salaried = commercial_salaried.groupby(['department','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
department,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
Client Solutions,White (United States of America),Female,35-39,9.0,149101.0
Client Solutions,White (United States of America),Female,40-44,6.0,126864.75
Client Solutions,White (United States of America),Female,50-54,5.0,105893.0
Client Solutions,White (United States of America),Male,30-34,5.0,100000.0
Client Solutions,White (United States of America),Female,25-29,23.0,75000.0
Client Solutions,White (United States of America),Female,<25,6.0,61000.0


In [402]:
current_commercial_median_department_race_gender_age5_hourly = commercial_hourly.groupby(['department','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_gender_age5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
department,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
Client Solutions,White (United States of America),Female,25-29,5.0,31.84
Circulation,Black or African American (United States of America),Male,60-64,6.0,23.8
Circulation,Black or African American (United States of America),Male,45-49,7.0,21.51


In [403]:
current_commercial_median_department_race_group_gender_age5_salaried = commercial_salaried.groupby(['department','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_group_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
department,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
Client Solutions,white,Female,35-39,9.0,149101.0
Client Solutions,white,Female,40-44,6.0,126864.75
Client Solutions,white,Female,50-54,5.0,105893.0
Client Solutions,white,Male,30-34,5.0,100000.0
Client Solutions,white,Female,25-29,23.0,75000.0
Client Solutions,white,Female,<25,6.0,61000.0


In [404]:
current_commercial_median_department_race_group_gender_age5_hourly = commercial_hourly.groupby(['department','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_group_gender_age5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
department,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
Client Solutions,white,Female,25-29,5.0,31.84
Client Solutions,person of color,Female,40-44,5.0,25.05
Circulation,person of color,Male,60-64,6.0,23.8
Circulation,person of color,Male,45-49,7.0,21.51
Circulation,person of color,Male,50-54,5.0,20.85


### Job profiles

In [405]:
current_commercial_median_job_salaried = commercial_salaried.groupby(['job_profile_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_salaried)

Unnamed: 0_level_0,count_nonzero,median
job_profile_current,Unnamed: 1_level_1,Unnamed: 2_level_1
450220 - Sales Representative,25.0,153987.3
350227 - Custom Content Writer,7.0,100000.0
551104 - Senior Financial Accountant,5.0,90566.0
450120 - Account Manager,26.0,88644.94
390110 - Multiplatform Editor,9.0,86104.69
280228 - Designer,7.0,85000.0
340227 - Artist,5.0,75035.28
481205 - Digital Analyst,5.0,75000.0
660127 - Make-Up Person,5.0,71665.06
231303 - Client Service Manager,15.0,67095.6


In [406]:
current_commercial_median_job_hourly = commercial_hourly.groupby(['job_profile_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_hourly)

Unnamed: 0_level_0,count_nonzero,median
job_profile_current,Unnamed: 1_level_1,Unnamed: 2_level_1
341027 - Desktop Publisher,6.0,30.81
574504 - Senior Accounting Specialist,11.0,30.38
565005 - Accounting Specialist,12.0,26.59
470121 - Account Executive,16.0,25.15
600318 - Circulation Driver (Class A),35.0,22.45


In [407]:
current_commercial_median_job_gender_salaried = commercial_salaried.groupby(['job_profile_current','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
job_profile_current,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
450220 - Sales Representative,Male,6.0,162338.6
450220 - Sales Representative,Female,19.0,150780.0
450120 - Account Manager,Female,17.0,90110.0
390110 - Multiplatform Editor,Male,5.0,85899.92
450120 - Account Manager,Male,9.0,85417.73
231303 - Client Service Manager,Female,13.0,68000.0


In [408]:
current_commercial_median_job_gender_hourly = commercial_hourly.groupby(['job_profile_current','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
job_profile_current,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
574504 - Senior Accounting Specialist,Female,10.0,30.06
565005 - Accounting Specialist,Male,5.0,27.18
565005 - Accounting Specialist,Female,7.0,26.04
470121 - Account Executive,Female,15.0,25.05
600318 - Circulation Driver (Class A),Male,34.0,22.53


In [409]:
current_commercial_median_job_race_salaried = commercial_salaried.groupby(['job_profile_current','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
job_profile_current,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
450220 - Sales Representative,White (United States of America),23.0,150780.0
350227 - Custom Content Writer,White (United States of America),6.0,100000.0
450120 - Account Manager,White (United States of America),15.0,90669.48
390110 - Multiplatform Editor,White (United States of America),8.0,88301.65
450120 - Account Manager,Black or African American (United States of America),7.0,85417.73
231303 - Client Service Manager,White (United States of America),14.0,65548.47


In [410]:
current_commercial_median_job_race_hourly = commercial_hourly.groupby(['job_profile_current','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
job_profile_current,race_ethnicity,Unnamed: 2_level_1,Unnamed: 3_level_1
574504 - Senior Accounting Specialist,Black or African American (United States of America),8.0,30.06
565005 - Accounting Specialist,Black or African American (United States of America),7.0,26.04
470121 - Account Executive,White (United States of America),5.0,25.36
470121 - Account Executive,Black or African American (United States of America),9.0,24.7
600318 - Circulation Driver (Class A),White (United States of America),7.0,22.98
600318 - Circulation Driver (Class A),Black or African American (United States of America),23.0,22.36


In [411]:
current_commercial_median_job_race_gender_salaried = commercial_salaried.groupby(['job_profile_current','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
job_profile_current,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
450220 - Sales Representative,White (United States of America),Male,5.0,155300.0
450220 - Sales Representative,White (United States of America),Female,18.0,149940.5
450120 - Account Manager,White (United States of America),Female,11.0,90110.0
231303 - Client Service Manager,White (United States of America),Female,12.0,66000.67


In [412]:
current_commercial_median_job_race_gender_hourly = commercial_hourly.groupby(['job_profile_current','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
job_profile_current,race_ethnicity,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
574504 - Senior Accounting Specialist,Black or African American (United States of America),Female,7.0,29.74
565005 - Accounting Specialist,Black or African American (United States of America),Female,5.0,26.04
470121 - Account Executive,Black or African American (United States of America),Female,9.0,24.7
600318 - Circulation Driver (Class A),White (United States of America),Male,7.0,22.98
600318 - Circulation Driver (Class A),Black or African American (United States of America),Male,22.0,22.39


In [413]:
current_commercial_median_job_race_group_gender_salaried = commercial_salaried.groupby(['desk','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_group_gender_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
desk,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
non-newsroom,white,Male,32.0,94496.71
non-newsroom,white,Female,67.0,86104.69
non-newsroom,person of color,Female,17.0,85000.0
non-newsroom,person of color,Male,15.0,76866.1


In [414]:
current_commercial_median_job_race_group_gender_hourly = commercial_hourly.groupby(['job_profile_current','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_group_gender_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero,median
job_profile_current,race_grouping,gender,Unnamed: 3_level_1,Unnamed: 4_level_1
574504 - Senior Accounting Specialist,person of color,Female,7.0,29.74
565005 - Accounting Specialist,person of color,Female,6.0,25.84
470121 - Account Executive,person of color,Female,11.0,24.7
600318 - Circulation Driver (Class A),white,Male,7.0,22.98
600318 - Circulation Driver (Class A),person of color,Male,26.0,22.39


In [415]:
current_commercial_median_job_race_gender_age5_salaried = commercial_salaried.groupby(['job_profile_current','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
job_profile_current,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
450220 - Sales Representative,White (United States of America),Female,35-39,8.0,149940.5
231303 - Client Service Manager,White (United States of America),Female,25-29,8.0,66212.61


In [416]:
current_commercial_median_job_race_gender_age5_hourly = commercial_hourly.groupby(['job_profile_current','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_gender_age5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
job_profile_current,race_ethnicity,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
600318 - Circulation Driver (Class A),Black or African American (United States of America),Male,60-64,6.0,23.8
600318 - Circulation Driver (Class A),Black or African American (United States of America),Male,45-49,7.0,21.51


In [417]:
current_commercial_median_job_race_group_gender_age5_salaried = commercial_salaried.groupby(['job_profile_current','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_group_gender_age5_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
job_profile_current,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
450220 - Sales Representative,white,Female,35-39,8.0,149940.5
231303 - Client Service Manager,white,Female,25-29,8.0,66212.61


In [418]:
current_commercial_median_job_race_group_gender_age5_hourly = commercial_hourly.groupby(['job_profile_current','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_group_gender_age5_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,count_nonzero,median
job_profile_current,race_grouping,gender,age_group_5,Unnamed: 4_level_1,Unnamed: 5_level_1
600318 - Circulation Driver (Class A),person of color,Male,60-64,6.0,23.8
600318 - Circulation Driver (Class A),person of color,Male,45-49,7.0,21.51


### Performance evaluations

In [419]:
commercial_ratings = ratings_combined[ratings_combined['dept'] == "Commercial"]

In [420]:
commercial_ratings_gender = commercial_ratings.groupby(['gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
commercial_ratings_gender

Unnamed: 0_level_0,performance_rating,performance_rating
Unnamed: 0_level_1,count_nonzero,median
gender,Unnamed: 1_level_2,Unnamed: 2_level_2
Female,1308.0,3.3
Male,984.0,3.2


In [421]:
commercial_ratings_race = commercial_ratings.groupby(['race_ethnicity']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress_median(commercial_ratings_race)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
Asian (United States of America),168.0,3.3
Two or More Races (United States of America),36.0,3.3
White (United States of America),1096.0,3.3
Black or African American (United States of America),860.0,3.2
Hispanic or Latino (United States of America),96.0,3.15
Prefer Not to Disclose (United States of America),28.0,3.0


In [422]:
commercial_ratings_race_gender = commercial_ratings.groupby(['race_ethnicity','gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(commercial_ratings_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
race_ethnicity,gender,Unnamed: 2_level_1,Unnamed: 3_level_1
Asian (United States of America),Female,116.0,3.3
Asian (United States of America),Male,52.0,3.1
Black or African American (United States of America),Female,408.0,3.2
Black or African American (United States of America),Male,452.0,3.05
Hispanic or Latino (United States of America),Female,56.0,3.15
Hispanic or Latino (United States of America),Male,40.0,3.1
Prefer Not to Disclose (United States of America),Female,16.0,3.0
Prefer Not to Disclose (United States of America),Male,12.0,
Two or More Races (United States of America),Female,20.0,3.3
Two or More Races (United States of America),Male,16.0,3.35


### Pay changes

In [423]:
commercial_change = reason_for_change_combined[reason_for_change_combined['dept'] == 'Commercial']

In [424]:
commercial_change_gender = commercial_change.groupby(['business_process_reason','gender']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(commercial_change_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
business_process_reason,gender,Unnamed: 2_level_1
Request Compensation Change > Adjustment > Contract Increase,Female,475
Request Compensation Change > Adjustment > Contract Increase,Male,354
Merit > Performance > Annual Performance Appraisal,Female,295
Merit > Performance > Annual Performance Appraisal,Male,228
Request Compensation Change > Adjustment > Change Plan Assignment,Female,198
Promotion > Promotion > Promotion,Female,144
Transfer > Transfer > Move to another Manager,Female,123
Transfer > Transfer > Move to another Manager,Male,114
Data Change > Data Change > Change Job Details,Female,85
Request Compensation Change > Adjustment > Change Plan Assignment,Male,85


In [425]:
commercial_change_race = commercial_change[commercial_change['business_process_reason'] == 'Merit > Performance > Annual Performance Appraisal'].groupby(['business_process_reason','race_ethnicity']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(commercial_change_race)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero
business_process_reason,race_ethnicity,Unnamed: 2_level_1
Merit > Performance > Annual Performance Appraisal,Black or African American (United States of America),239
Merit > Performance > Annual Performance Appraisal,White (United States of America),220
Merit > Performance > Annual Performance Appraisal,Asian (United States of America),36
Merit > Performance > Annual Performance Appraisal,Hispanic or Latino (United States of America),19


In [426]:
commercial_change_race_gender = commercial_change[commercial_change['business_process_reason'] == 'Merit > Performance > Annual Performance Appraisal'].groupby(['business_process_reason','race_ethnicity','gender']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(commercial_change_race_gender)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,count_nonzero
business_process_reason,race_ethnicity,gender,Unnamed: 3_level_1
Merit > Performance > Annual Performance Appraisal,White (United States of America),Female,132
Merit > Performance > Annual Performance Appraisal,Black or African American (United States of America),Female,126
Merit > Performance > Annual Performance Appraisal,Black or African American (United States of America),Male,113
Merit > Performance > Annual Performance Appraisal,White (United States of America),Male,88
Merit > Performance > Annual Performance Appraisal,Asian (United States of America),Female,19
Merit > Performance > Annual Performance Appraisal,Asian (United States of America),Male,17
Merit > Performance > Annual Performance Appraisal,Hispanic or Latino (United States of America),Male,10
Merit > Performance > Annual Performance Appraisal,Hispanic or Latino (United States of America),Female,9


### Performance evaluations x merit raises

In [427]:
import re
reason_for_change_combined['merit_raises'] = reason_for_change_combined['business_process_reason'].str.contains('Merit', re.IGNORECASE)

In [428]:
twenty14 = np.datetime64('2016-04-01')
twenty15 = np.datetime64('2017-04-01')
twenty16 = np.datetime64('2018-04-01')
twenty17 = np.datetime64('2019-04-01')
twenty18 = np.datetime64('2020-04-01')

def raise_time(row):
    if row['effective_date'] < twenty14:
        return 'before 2015'
    if row['effective_date'] < twenty15:
        return '2015'
    if row['effective_date'] < twenty16:
        return '2016'
    if row['effective_date'] < twenty17:
        return '2017'
    if row['effective_date'] < twenty18:
        return '2018'
    return 'unknown'

reason_for_change_combined['raise_after'] = reason_for_change_combined.apply(lambda row: raise_time(row), axis=1)

In [429]:
merit_raises_commercial_gender_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['gender']).agg({'base_pay_change': [np.count_nonzero, np.median]})
merit_raises_commercial_gender_salaried

Unnamed: 0_level_0,base_pay_change,base_pay_change
Unnamed: 0_level_1,count_nonzero,median
gender,Unnamed: 1_level_2,Unnamed: 2_level_2
Female,97.0,1317.48
Male,74.0,1205.07


In [430]:
merit_raises_commercial_gender_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['gender']).agg({'base_pay_change': [np.count_nonzero, np.median]})
merit_raises_commercial_gender_hourly

Unnamed: 0_level_0,base_pay_change,base_pay_change
Unnamed: 0_level_1,count_nonzero,median
gender,Unnamed: 1_level_2,Unnamed: 2_level_2
Female,170.0,0.42
Male,138.0,0.33


In [431]:
merit_raises_commercial_race_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['race_ethnicity']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_race_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
Asian (United States of America),23.0,1375.0
Hispanic or Latino (United States of America),6.0,1321.85
White (United States of America),110.0,1286.88
Black or African American (United States of America),30.0,1117.12


In [432]:
merit_raises_commercial_race_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['race_ethnicity']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_race_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_ethnicity,Unnamed: 1_level_1,Unnamed: 2_level_1
Asian (United States of America),11.0,0.45
White (United States of America),85.0,0.42
Hispanic or Latino (United States of America),11.0,0.37
Black or African American (United States of America),197.0,0.35


In [433]:
merit_raises_commercial_race_group_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_race_group_salaried)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
white,110.0,1286.88
person of color,60.0,1225.0


In [434]:
merit_raises_commercial_race_group_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_race_group_hourly)

Unnamed: 0_level_0,count_nonzero,median
race_grouping,Unnamed: 1_level_1,Unnamed: 2_level_1
white,85.0,0.42
person of color,223.0,0.35


In [435]:
merit_raises_commercial_gender_race_group_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_gender_race_group_salaried)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,white,69.0,1317.48
Female,person of color,27.0,1305.0
Male,white,41.0,1282.47
Male,person of color,33.0,1134.24


In [436]:
merit_raises_commercial_gender_race_group_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_gender_race_group_hourly)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,white,44.0,0.52
Female,person of color,126.0,0.38
Male,white,41.0,0.35
Male,person of color,97.0,0.32


In [437]:
fifteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2015')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]},{'2015_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(fifteen_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,white,7.0,937.13
Male,white,5.0,850.75


In [438]:
fifteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2015')].groupby(['gender','race_grouping']).agg({'2015_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(fifteen_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,white,7.0,3.5
Male,white,5.0,3.5


In [439]:
sixteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2016')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]},{'2016_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(sixteen_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,5.0,1729.4
Female,white,9.0,1683.0
Male,person of color,6.0,1506.78
Male,white,7.0,1291.29


In [440]:
sixteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2016')].groupby(['gender','race_grouping']).agg({'2016_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(sixteen_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,5.0,3.5
Female,white,9.0,3.4
Male,person of color,6.0,3.25
Male,white,7.0,3.2


In [441]:
seventeen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2017')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]},{'2017_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(seventeen_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,white,13.0,1398.48
Male,person of color,8.0,1000.0
Male,white,5.0,1414.6


In [442]:
seventeen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2017')].groupby(['gender','race_grouping']).agg({'2017_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(seventeen_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,white,13.0,3.3
Male,person of color,8.0,3.15
Male,white,5.0,3.4


In [443]:
eighteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2018')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]},{'2018_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(eighteen_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,7.0,1415.6
Female,white,21.0,1668.88
Male,person of color,7.0,1050.0
Male,white,8.0,1417.48


In [444]:
eighteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2018')].groupby(['gender','race_grouping']).agg({'2018_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(eighteen_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,7.0,3.4
Female,white,21.0,3.5
Male,person of color,7.0,3.3
Male,white,8.0,3.5


In [445]:
merit_raises_15 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2015') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_16 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2016') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_17 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2017') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_18 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2018') & (reason_for_change_combined['merit_raises'] == True)]

merit_raises_15 = merit_raises_15[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating']].rename(columns={'2015_annual_performance_rating':'performance_rating'})
merit_raises_16 = merit_raises_16[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2016_annual_performance_rating']].rename(columns={'2016_annual_performance_rating':'performance_rating'})
merit_raises_17 = merit_raises_17[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2017_annual_performance_rating']].rename(columns={'2017_annual_performance_rating':'performance_rating'})
merit_raises_18 = merit_raises_18[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2018_annual_performance_rating']].rename(columns={'2018_annual_performance_rating':'performance_rating'})

merit_raises_15 = pd.DataFrame(merit_raises_15)
merit_raises_16 = pd.DataFrame(merit_raises_16)
merit_raises_17 = pd.DataFrame(merit_raises_17)
merit_raises_18 = pd.DataFrame(merit_raises_18)

merit_raises_combined = pd.concat([merit_raises_15,merit_raises_16,merit_raises_17,merit_raises_18])

In [446]:
commercial_salaried_raises = merit_raises_combined[merit_raises_combined['pay_rate_type'] == 'Salaried'].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(commercial_salaried_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,116.0,2812.5
Female,unknown,10.0,2860.0
Female,white,317.0,2500.0
Male,person of color,102.0,2310.0
Male,unknown,7.0,2500.0
Male,white,379.0,3000.0


In [447]:
commercial_salaried_raises_scores = merit_raises_combined[merit_raises_combined['pay_rate_type'] == 'Salaried'].groupby(['gender','race_grouping']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(commercial_salaried_raises_scores)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,116.0,3.4
Female,unknown,10.0,3.8
Female,white,317.0,3.5
Male,person of color,102.0,3.4
Male,unknown,7.0,3.7
Male,white,379.0,3.6


In [448]:
commercial_hourly_raises = merit_raises_combined[merit_raises_combined['pay_rate_type'] == 'Hourly'].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(commercial_hourly_raises)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,120.0,0.43
Female,white,88.0,0.78
Male,person of color,108.0,0.35
Male,white,65.0,0.45


In [449]:
commercial_hourly_raises_scores = merit_raises_combined[merit_raises_combined['pay_rate_type'] == 'Hourly'].groupby(['gender','race_grouping']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(commercial_hourly_raises_scores)

Unnamed: 0_level_0,Unnamed: 1_level_0,count_nonzero,median
gender,race_grouping,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,person of color,120.0,3.3
Female,white,88.0,3.5
Male,person of color,108.0,3.2
Male,white,65.0,3.3


### Regression

In [450]:
commercial_salaried_regression = commercial_salaried[['department','gender','race_ethnicity','current_base_pay','job_profile_current','cost_center_current','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','age','years_of_service','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping']]
commercial_salaried_regression = pd.get_dummies(commercial_salaried_regression, columns=['gender','race_ethnicity','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping'])

In [451]:
commercial_salaried_regression = commercial_salaried_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over','tier_Tier 1':'tier_Tier_1','tier_Tier 2':'tier_Tier_2','tier_Tier 3':'tier_Tier_3','tier_Tier 4':'tier_Tier_4','years_of_service_grouped_0':'years_of_service_grouped_0','years_of_service_grouped_1-2':'years_of_service_grouped_1to2','years_of_service_grouped_3-5':'years_of_service_grouped_3to5','years_of_service_grouped_6-10':'years_of_service_grouped_6to10','years_of_service_grouped_11-15':'years_of_service_grouped_11to15','years_of_service_grouped_16-20':'years_of_service_grouped_16to20','years_of_service_grouped_21-25':'years_of_service_grouped_21to25','years_of_service_grouped_25+':'years_of_service_grouped_25_over'})
import statsmodels.formula.api as sm
model41 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male')
result41 = model41.fit()
result41.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.001
Model:,OLS,Adj. R-squared:,-0.007
Method:,Least Squares,F-statistic:,0.07662
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.782
Time:,12:09:40,Log-Likelihood:,-1577.9
No. Observations:,133,AIC:,3160.0
Df Residuals:,131,BIC:,3166.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,6.382e+04,2093.898,30.480,0.000,5.97e+04,6.8e+04
gender_Female,3.278e+04,3005.419,10.907,0.000,2.68e+04,3.87e+04
gender_Male,3.104e+04,3590.196,8.646,0.000,2.39e+04,3.81e+04

0,1,2,3
Omnibus:,30.714,Durbin-Watson:,1.641
Prob(Omnibus):,0.0,Jarque-Bera (JB):,42.867
Skew:,1.285,Prob(JB):,4.92e-10
Kurtosis:,4.064,Cond. No.,3620000000000000.0


In [452]:
model42 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color')
result42 = model42.fit()
result42.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.025
Model:,OLS,Adj. R-squared:,0.01
Method:,Least Squares,F-statistic:,1.645
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.197
Time:,12:09:40,Log-Likelihood:,-1576.3
No. Observations:,133,AIC:,3159.0
Df Residuals:,130,BIC:,3167.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,7.84e+04,2.43e+04,3.229,0.002,3.04e+04,1.26e+05
race_grouping_white,2.068e+04,2.45e+04,0.843,0.401,-2.78e+04,6.92e+04
race_grouping_person_of_color,9089.4666,2.5e+04,0.363,0.717,-4.04e+04,5.86e+04

0,1,2,3
Omnibus:,28.825,Durbin-Watson:,1.642
Prob(Omnibus):,0.0,Jarque-Bera (JB):,39.096
Skew:,1.238,Prob(JB):,3.24e-09
Kurtosis:,3.964,Cond. No.,18.2


In [453]:
model43 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result43 = model43.fit()
result43.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.025
Model:,OLS,Adj. R-squared:,0.002
Method:,Least Squares,F-statistic:,1.094
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.354
Time:,12:09:40,Log-Likelihood:,-1576.3
No. Observations:,133,AIC:,3161.0
Df Residuals:,129,BIC:,3172.0
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,5.199e+04,1.64e+04,3.173,0.002,1.96e+04,8.44e+04
gender_Female,2.641e+04,8394.824,3.146,0.002,9802.570,4.3e+04
gender_Male,2.558e+04,9156.599,2.794,0.006,7463.076,4.37e+04
race_grouping_white,2.095e+04,2.47e+04,0.848,0.398,-2.79e+04,6.98e+04
race_grouping_person_of_color,9479.6077,2.53e+04,0.375,0.709,-4.06e+04,5.95e+04

0,1,2,3
Omnibus:,28.76,Durbin-Watson:,1.64
Prob(Omnibus):,0.0,Jarque-Bera (JB):,38.975
Skew:,1.234,Prob(JB):,3.44e-09
Kurtosis:,3.969,Cond. No.,4640000000000000.0


In [454]:
new_commercial_salaried_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1], 'age': [40,40,40,40]})
new_commercial_salaried_regression['predicted'] = result43.predict(new_commercial_salaried_regression)
new_commercial_salaried_regression

Unnamed: 0,gender_Female,gender_Male,race_grouping_white,race_grouping_person_of_color,age,predicted
0,1,0,1,0,40,99356.99
1,0,1,1,0,40,98524.69
2,1,0,0,1,40,87883.11
3,0,1,0,1,40,87050.81


In [455]:
model44 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result44 = model44.fit()
result44.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.286
Model:,OLS,Adj. R-squared:,0.227
Method:,Least Squares,F-statistic:,4.882
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,6.47e-06
Time:,12:09:40,Log-Likelihood:,-1555.5
No. Observations:,133,AIC:,3133.0
Df Residuals:,122,BIC:,3165.0
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,6.157e+04,2123.471,28.997,0.000,5.74e+04,6.58e+04
gender_Female,3.556e+04,3023.460,11.762,0.000,2.96e+04,4.15e+04
gender_Male,2.601e+04,3274.551,7.944,0.000,1.95e+04,3.25e+04
age_group_5_25_under,-3.072e+04,9309.126,-3.300,0.001,-4.91e+04,-1.23e+04
age_group_5_25to29,-1.766e+04,5809.577,-3.040,0.003,-2.92e+04,-6162.194
age_group_5_30to34,2.149e+04,7531.035,2.853,0.005,6579.270,3.64e+04
age_group_5_35to39,2.277e+04,7189.104,3.168,0.002,8540.680,3.7e+04
age_group_5_40to44,2.951e+04,9833.731,3.001,0.003,1e+04,4.9e+04
age_group_5_45to49,9655.6318,8217.596,1.175,0.242,-6611.919,2.59e+04

0,1,2,3
Omnibus:,14.188,Durbin-Watson:,1.771
Prob(Omnibus):,0.001,Jarque-Bera (JB):,15.735
Skew:,0.72,Prob(JB):,0.000383
Kurtosis:,3.874,Cond. No.,9780000000000000.0


In [456]:
model45 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result45 = model45.fit()
result45.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.335
Model:,OLS,Adj. R-squared:,0.275
Method:,Least Squares,F-statistic:,5.549
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,3.83e-07
Time:,12:09:41,Log-Likelihood:,-1550.8
No. Observations:,133,AIC:,3126.0
Df Residuals:,121,BIC:,3160.0
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,5.016e+04,1.97e+04,2.553,0.012,1.13e+04,8.91e+04
race_grouping_white,4.933e+04,2.18e+04,2.264,0.025,6197.218,9.25e+04
race_grouping_person_of_color,3.255e+04,2.23e+04,1.462,0.146,-1.15e+04,7.66e+04
age_group_5_25_under,-3.33e+04,9266.922,-3.594,0.000,-5.16e+04,-1.5e+04
age_group_5_25to29,-1.83e+04,5876.870,-3.114,0.002,-2.99e+04,-6663.337
age_group_5_30to34,2.118e+04,7351.305,2.882,0.005,6630.956,3.57e+04
age_group_5_35to39,2.03e+04,7310.811,2.777,0.006,5830.557,3.48e+04
age_group_5_40to44,3.53e+04,9345.043,3.778,0.000,1.68e+04,5.38e+04
age_group_5_45to49,1.064e+04,8367.434,1.271,0.206,-5926.784,2.72e+04

0,1,2,3
Omnibus:,10.496,Durbin-Watson:,1.847
Prob(Omnibus):,0.005,Jarque-Bera (JB):,10.654
Skew:,0.624,Prob(JB):,0.00486
Kurtosis:,3.606,Cond. No.,8320000000000000.0


In [457]:
model46 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result46 = model46.fit()
result46.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.35
Model:,OLS,Adj. R-squared:,0.285
Method:,Least Squares,F-statistic:,5.377
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,3.1e-07
Time:,12:09:41,Log-Likelihood:,-1549.3
No. Observations:,133,AIC:,3125.0
Df Residuals:,120,BIC:,3162.0
Df Model:,12,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,3.231e+04,1.35e+04,2.396,0.018,5616.362,5.9e+04
gender_Female,2.084e+04,7063.061,2.950,0.004,6853.164,3.48e+04
gender_Male,1.148e+04,7585.687,1.513,0.133,-3541.955,2.65e+04
race_grouping_white,5.196e+04,2.17e+04,2.394,0.018,8994.931,9.49e+04
race_grouping_person_of_color,3.599e+04,2.22e+04,1.620,0.108,-7990.410,8e+04
age_group_5_25_under,-3.713e+04,9182.537,-4.044,0.000,-5.53e+04,-1.9e+04
age_group_5_25to29,-2.248e+04,5886.452,-3.819,0.000,-3.41e+04,-1.08e+04
age_group_5_30to34,1.967e+04,7265.155,2.707,0.008,5285.087,3.41e+04
age_group_5_35to39,1.914e+04,7117.210,2.689,0.008,5044.537,3.32e+04

0,1,2,3
Omnibus:,11.57,Durbin-Watson:,1.829
Prob(Omnibus):,0.003,Jarque-Bera (JB):,12.088
Skew:,0.647,Prob(JB):,0.00237
Kurtosis:,3.712,Cond. No.,1.28e+16


In [458]:
merit_raises_combined_salaried_regression = merit_raises_combined[(merit_raises_combined['dept'] == 'Commercial') & (merit_raises_combined['pay_rate_type'] == 'Salaried')]
merit_raises_combined_salaried_regression = pd.get_dummies(merit_raises_combined_salaried_regression, columns=['gender','race_grouping','age_group_5'])

In [459]:
merit_raises_combined_salaried_regression = merit_raises_combined_salaried_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over'})
model47 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male')
result47 = model47.fit()
result47.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.022
Model:,OLS,Adj. R-squared:,0.014
Method:,Least Squares,F-statistic:,2.664
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.105
Time:,12:09:41,Log-Likelihood:,-999.84
No. Observations:,120,AIC:,2004.0
Df Residuals:,118,BIC:,2009.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1002.1929,62.763,15.968,0.000,877.905,1126.480
gender_Female,654.7617,93.620,6.994,0.000,469.369,840.154
gender_Male,347.4312,104.552,3.323,0.001,140.389,554.473

0,1,2,3
Omnibus:,63.911,Durbin-Watson:,1.948
Prob(Omnibus):,0.0,Jarque-Bera (JB):,203.112
Skew:,2.035,Prob(JB):,7.85e-45
Kurtosis:,7.905,Cond. No.,3590000000000000.0


In [460]:
model48 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color')
result48 = model48.fit()
result48.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.005
Model:,OLS,Adj. R-squared:,-0.012
Method:,Least Squares,F-statistic:,0.3188
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.728
Time:,12:09:41,Log-Likelihood:,-1000.9
No. Observations:,120,AIC:,2008.0
Df Residuals:,117,BIC:,2016.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1400.0000,1026.778,1.363,0.175,-633.479,3433.479
race_grouping_white,189.3775,1033.600,0.183,0.855,-1857.613,2236.368
race_grouping_person_of_color,35.7284,1038.380,0.034,0.973,-2020.729,2092.186

0,1,2,3
Omnibus:,66.033,Durbin-Watson:,1.921
Prob(Omnibus):,0.0,Jarque-Bera (JB):,218.59
Skew:,2.092,Prob(JB):,3.4199999999999996e-48
Kurtosis:,8.12,Cond. No.,23.6


In [461]:
model49 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result49 = model49.fit()
result49.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.024
Model:,OLS,Adj. R-squared:,-0.001
Method:,Least Squares,F-statistic:,0.9677
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.411
Time:,12:09:41,Log-Likelihood:,-999.7
No. Observations:,120,AIC:,2007.0
Df Residuals:,116,BIC:,2019.0
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,835.9007,683.945,1.222,0.224,-518.738,2190.540
gender_Female,564.0993,346.551,1.628,0.106,-122.288,1250.486
gender_Male,271.8013,364.288,0.746,0.457,-449.716,993.319
race_grouping_white,286.8101,1030.126,0.278,0.781,-1753.485,2327.105
race_grouping_person_of_color,195.1637,1038.272,0.188,0.851,-1861.266,2251.593

0,1,2,3
Omnibus:,62.985,Durbin-Watson:,1.956
Prob(Omnibus):,0.0,Jarque-Bera (JB):,197.899
Skew:,2.005,Prob(JB):,1.0600000000000001e-43
Kurtosis:,7.847,Cond. No.,4190000000000000.0


In [462]:
new_reason_for_change_combined_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1]})
new_reason_for_change_combined_regression['predicted'] = result49.predict(new_reason_for_change_combined_regression)
new_reason_for_change_combined_regression

Unnamed: 0,gender_Female,gender_Male,race_grouping_white,race_grouping_person_of_color,predicted
0,1,0,1,0,1686.81
1,0,1,1,0,1394.51
2,1,0,0,1,1595.16
3,0,1,0,1,1302.87


In [463]:
model50 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result50 = model50.fit()
result50.summary()

  return np.sqrt(eigvals[0]/eigvals[-1])
  return self.params / self.bse
  return (self.a < x) & (x < self.b)
  return (self.a < x) & (x < self.b)
  cond2 = cond0 & (x <= self.a)


0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.107
Model:,OLS,Adj. R-squared:,0.034
Method:,Least Squares,F-statistic:,1.463
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.171
Time:,12:09:41,Log-Likelihood:,-994.4
No. Observations:,120,AIC:,2009.0
Df Residuals:,110,BIC:,2037.0
Df Model:,9,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,950.6651,89.652,10.604,0.000,772.996,1128.334
gender_Female,634.1486,117.932,5.377,0.000,400.435,867.862
gender_Male,316.5165,127.637,2.480,0.015,63.570,569.463
age_group_5_25_under,48.4184,912.882,0.053,0.958,-1760.699,1857.536
age_group_5_25to29,253.1740,238.716,1.061,0.291,-219.905,726.253
age_group_5_30to34,-206.1156,256.682,-0.803,0.424,-714.800,302.568
age_group_5_35to39,477.0562,269.916,1.767,0.080,-57.855,1011.967
age_group_5_40to44,710.8937,356.966,1.991,0.049,3.470,1418.318
age_group_5_45to49,-108.7731,226.626,-0.480,0.632,-557.893,340.347

0,1,2,3
Omnibus:,49.406,Durbin-Watson:,2.042
Prob(Omnibus):,0.0,Jarque-Bera (JB):,118.315
Skew:,1.654,Prob(JB):,2.03e-26
Kurtosis:,6.566,Cond. No.,inf


In [464]:
model51 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result51 = model51.fit()
result51.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.103
Model:,OLS,Adj. R-squared:,0.021
Method:,Least Squares,F-statistic:,1.25
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.268
Time:,12:09:41,Log-Likelihood:,-994.67
No. Observations:,120,AIC:,2011.0
Df Residuals:,109,BIC:,2042.0
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1096.4061,936.124,1.171,0.244,-758.962,2951.774
race_grouping_white,412.0134,1036.127,0.398,0.692,-1641.556,2465.583
race_grouping_person_of_color,174.9337,1048.709,0.167,0.868,-1903.573,2253.441
age_group_5_25_under,-192.8195,921.509,-0.209,0.835,-2019.220,1633.581
age_group_5_25to29,303.5939,244.359,1.242,0.217,-180.717,787.905
age_group_5_30to34,-103.6335,272.182,-0.381,0.704,-643.090,435.823
age_group_5_35to39,445.2756,286.369,1.555,0.123,-122.299,1012.850
age_group_5_40to44,876.1929,361.933,2.421,0.017,158.854,1593.532
age_group_5_45to49,-52.8984,257.357,-0.206,0.838,-562.972,457.175

0,1,2,3
Omnibus:,51.129,Durbin-Watson:,2.056
Prob(Omnibus):,0.0,Jarque-Bera (JB):,130.601
Skew:,1.678,Prob(JB):,4.37e-29
Kurtosis:,6.855,Cond. No.,inf


In [465]:
model52 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result52 = model52.fit()
result52.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.115
Model:,OLS,Adj. R-squared:,0.025
Method:,Least Squares,F-statistic:,1.273
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.25
Time:,12:09:41,Log-Likelihood:,-993.87
No. Observations:,120,AIC:,2012.0
Df Residuals:,108,BIC:,2045.0
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,687.2307,646.791,1.063,0.290,-594.822,1969.284
gender_Female,485.3983,334.200,1.452,0.149,-177.044,1147.841
gender_Male,201.8324,353.734,0.571,0.569,-499.329,902.994
race_grouping_white,486.8437,1035.835,0.470,0.639,-1566.362,2540.049
race_grouping_person_of_color,300.9616,1051.736,0.286,0.775,-1783.762,2385.685
age_group_5_25_under,-60.3067,924.108,-0.065,0.948,-1892.049,1771.436
age_group_5_25to29,227.3710,242.088,0.939,0.350,-252.488,707.230
age_group_5_30to34,-224.4915,270.950,-0.829,0.409,-761.561,312.578
age_group_5_35to39,494.6194,285.778,1.731,0.086,-71.842,1061.081

0,1,2,3
Omnibus:,47.823,Durbin-Watson:,2.064
Prob(Omnibus):,0.0,Jarque-Bera (JB):,113.684
Skew:,1.598,Prob(JB):,2.06e-25
Kurtosis:,6.54,Cond. No.,inf


In [466]:
model53 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male')
result53 = model53.fit()
result53.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.006
Model:,OLS,Adj. R-squared:,-0.002
Method:,Least Squares,F-statistic:,0.7373
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.392
Time:,12:09:41,Log-Likelihood:,-31.55
No. Observations:,118,AIC:,67.1
Df Residuals:,116,BIC:,72.64
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.2810,0.020,114.520,0.000,2.242,2.320
gender_Female,1.1662,0.030,39.292,0.000,1.107,1.225
gender_Male,1.1148,0.033,33.572,0.000,1.049,1.181

0,1,2,3
Omnibus:,5.156,Durbin-Watson:,1.775
Prob(Omnibus):,0.076,Jarque-Bera (JB):,5.147
Skew:,0.509,Prob(JB):,0.0763
Kurtosis:,2.899,Cond. No.,8490000000000000.0


In [467]:
model54 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color')
result54 = model54.fit()
result54.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.001
Model:,OLS,Adj. R-squared:,-0.016
Method:,Least Squares,F-statistic:,0.07628
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.927
Time:,12:09:42,Log-Likelihood:,-31.846
No. Observations:,118,AIC:,69.69
Df Residuals:,115,BIC:,78.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,3.4000,0.321,10.591,0.000,2.764,4.036
race_grouping_white,0.0351,0.323,0.109,0.914,-0.605,0.675
race_grouping_person_of_color,0.0116,0.325,0.036,0.971,-0.632,0.655

0,1,2,3
Omnibus:,5.821,Durbin-Watson:,1.789
Prob(Omnibus):,0.054,Jarque-Bera (JB):,5.895
Skew:,0.544,Prob(JB):,0.0525
Kurtosis:,2.885,Cond. No.,23.4


In [468]:
model55 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result55 = model55.fit()
result55.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.007
Model:,OLS,Adj. R-squared:,-0.019
Method:,Least Squares,F-statistic:,0.2609
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.853
Time:,12:09:42,Log-Likelihood:,-31.52
No. Observations:,118,AIC:,71.04
Df Residuals:,114,BIC:,82.12
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.2502,0.215,10.448,0.000,1.824,2.677
gender_Female,1.1498,0.109,10.532,0.000,0.934,1.366
gender_Male,1.1005,0.115,9.577,0.000,0.873,1.328
race_grouping_white,0.0511,0.324,0.158,0.875,-0.591,0.694
race_grouping_person_of_color,0.0391,0.327,0.120,0.905,-0.609,0.687

0,1,2,3
Omnibus:,5.075,Durbin-Watson:,1.776
Prob(Omnibus):,0.079,Jarque-Bera (JB):,5.057
Skew:,0.505,Prob(JB):,0.0798
Kurtosis:,2.901,Cond. No.,1.02e+16


In [469]:
model56 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result56 = model56.fit()
result56.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.12
Model:,OLS,Adj. R-squared:,0.046
Method:,Least Squares,F-statistic:,1.629
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.116
Time:,12:09:42,Log-Likelihood:,-24.413
No. Observations:,118,AIC:,68.83
Df Residuals:,108,BIC:,96.53
Df Model:,9,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.1358,0.028,76.644,0.000,2.081,2.191
gender_Female,1.0716,0.037,28.823,0.000,0.998,1.145
gender_Male,1.0643,0.040,26.328,0.000,0.984,1.144
age_group_5_25_under,0.0999,0.283,0.353,0.725,-0.461,0.661
age_group_5_25to29,0.1698,0.075,2.260,0.026,0.021,0.319
age_group_5_30to34,0.1758,0.082,2.141,0.035,0.013,0.339
age_group_5_35to39,0.2692,0.084,3.205,0.002,0.103,0.436
age_group_5_40to44,0.2676,0.111,2.415,0.017,0.048,0.487
age_group_5_45to49,0.1212,0.070,1.724,0.088,-0.018,0.261

0,1,2,3
Omnibus:,8.372,Durbin-Watson:,1.928
Prob(Omnibus):,0.015,Jarque-Bera (JB):,8.098
Skew:,0.609,Prob(JB):,0.0174
Kurtosis:,3.404,Cond. No.,inf


In [470]:
model57 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result57 = model57.fit()
result57.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.12
Model:,OLS,Adj. R-squared:,0.038
Method:,Least Squares,F-statistic:,1.463
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.163
Time:,12:09:42,Log-Likelihood:,-24.362
No. Observations:,118,AIC:,70.72
Df Residuals:,107,BIC:,101.2
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,3.1237,0.290,10.780,0.000,2.549,3.698
race_grouping_white,-0.0231,0.321,-0.072,0.943,-0.659,0.613
race_grouping_person_of_color,-0.0434,0.325,-0.133,0.894,-0.688,0.601
age_group_5_25_under,0.1994,0.285,0.700,0.486,-0.365,0.764
age_group_5_25to29,0.2763,0.077,3.593,0.000,0.124,0.429
age_group_5_30to34,0.2889,0.086,3.367,0.001,0.119,0.459
age_group_5_35to39,0.3831,0.089,4.313,0.000,0.207,0.559
age_group_5_40to44,0.3820,0.112,3.411,0.001,0.160,0.604
age_group_5_45to49,0.2362,0.080,2.959,0.004,0.078,0.394

0,1,2,3
Omnibus:,8.442,Durbin-Watson:,1.937
Prob(Omnibus):,0.015,Jarque-Bera (JB):,8.19
Skew:,0.616,Prob(JB):,0.0167
Kurtosis:,3.387,Cond. No.,inf


In [471]:
model58 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result58 = model58.fit()
result58.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.12
Model:,OLS,Adj. R-squared:,0.029
Method:,Least Squares,F-statistic:,1.318
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.225
Time:,12:09:42,Log-Likelihood:,-24.362
No. Observations:,118,AIC:,72.72
Df Residuals:,106,BIC:,106.0
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.1538,0.202,10.675,0.000,1.754,2.554
gender_Female,1.0778,0.104,10.346,0.000,0.871,1.284
gender_Male,1.0760,0.111,9.684,0.000,0.856,1.296
race_grouping_white,-0.0226,0.323,-0.070,0.944,-0.663,0.618
race_grouping_person_of_color,-0.0425,0.329,-0.129,0.897,-0.695,0.610
age_group_5_25_under,0.0927,0.288,0.322,0.748,-0.478,0.664
age_group_5_25to29,0.1683,0.077,2.198,0.030,0.017,0.320
age_group_5_30to34,0.1805,0.087,2.070,0.041,0.008,0.354
age_group_5_35to39,0.2759,0.089,3.093,0.003,0.099,0.453

0,1,2,3
Omnibus:,8.393,Durbin-Watson:,1.936
Prob(Omnibus):,0.015,Jarque-Bera (JB):,8.134
Skew:,0.614,Prob(JB):,0.0171
Kurtosis:,3.385,Cond. No.,inf


In [472]:
commercial_hourly_regression = commercial_hourly[['department','gender','race_ethnicity','current_base_pay','job_profile_current','cost_center_current','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','age','years_of_service','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping']]
commercial_hourly_regression = pd.get_dummies(commercial_hourly_regression, columns=['gender','race_ethnicity','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping'])

In [473]:
commercial_hourly_regression = commercial_hourly_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over','tier_Tier 1':'tier_Tier_1','tier_Tier 2':'tier_Tier_2','tier_Tier 3':'tier_Tier_3','tier_Tier 4':'tier_Tier_4','years_of_service_grouped_0':'years_of_service_grouped_0','years_of_service_grouped_1-2':'years_of_service_grouped_1to2','years_of_service_grouped_3-5':'years_of_service_grouped_3to5','years_of_service_grouped_6-10':'years_of_service_grouped_6to10','years_of_service_grouped_11-15':'years_of_service_grouped_11to15','years_of_service_grouped_16-20':'years_of_service_grouped_16to20','years_of_service_grouped_21-25':'years_of_service_grouped_21to25','years_of_service_grouped_25+':'years_of_service_grouped_25_over'})
import statsmodels.formula.api as sm
model59 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male')
result59 = model59.fit()
result59.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.085
Model:,OLS,Adj. R-squared:,0.078
Method:,Least Squares,F-statistic:,13.41
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.00035
Time:,12:09:42,Log-Likelihood:,-482.21
No. Observations:,147,AIC:,968.4
Df Residuals:,145,BIC:,974.4
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,18.4963,0.356,51.935,0.000,17.792,19.200
gender_Female,11.2044,0.562,19.938,0.000,10.094,12.315
gender_Male,7.2918,0.564,12.923,0.000,6.177,8.407

0,1,2,3
Omnibus:,47.415,Durbin-Watson:,1.17
Prob(Omnibus):,0.0,Jarque-Bera (JB):,107.307
Skew:,1.371,Prob(JB):,5e-24
Kurtosis:,6.162,Cond. No.,7580000000000000.0


In [474]:
model60 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color')
result60 = model60.fit()
result60.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.105
Model:,OLS,Adj. R-squared:,0.093
Method:,Least Squares,F-statistic:,8.479
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.00033
Time:,12:09:42,Log-Likelihood:,-480.53
No. Observations:,147,AIC:,967.1
Df Residuals:,144,BIC:,976.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,22.1133,3.710,5.961,0.000,14.781,29.446
race_grouping_white,8.8969,3.837,2.319,0.022,1.313,16.481
race_grouping_person_of_color,4.4273,3.764,1.176,0.241,-3.013,11.868

0,1,2,3
Omnibus:,41.707,Durbin-Watson:,1.138
Prob(Omnibus):,0.0,Jarque-Bera (JB):,82.415
Skew:,1.27,Prob(JB):,1.27e-18
Kurtosis:,5.647,Cond. No.,15.4


In [475]:
model61 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result61 = model61.fit()
result61.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.182
Model:,OLS,Adj. R-squared:,0.165
Method:,Least Squares,F-statistic:,10.62
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,2.4e-06
Time:,12:09:42,Log-Likelihood:,-473.93
No. Observations:,147,AIC:,955.9
Df Residuals:,143,BIC:,967.8
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,15.9980,2.397,6.673,0.000,11.259,20.737
gender_Female,9.8826,1.370,7.213,0.000,7.174,12.591
gender_Male,6.1154,1.235,4.952,0.000,3.674,8.556
race_grouping_white,6.9695,3.719,1.874,0.063,-0.381,14.320
race_grouping_person_of_color,2.4877,3.650,0.682,0.497,-4.728,9.703

0,1,2,3
Omnibus:,39.108,Durbin-Watson:,1.309
Prob(Omnibus):,0.0,Jarque-Bera (JB):,72.374
Skew:,1.226,Prob(JB):,1.92e-16
Kurtosis:,5.41,Cond. No.,6490000000000000.0


In [476]:
new_commercial_hourly_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1], 'age': [40,40,40,40]})
new_commercial_hourly_regression['predicted'] = result61.predict(new_commercial_hourly_regression)
new_commercial_hourly_regression

Unnamed: 0,gender_Female,gender_Male,race_grouping_white,race_grouping_person_of_color,age,predicted
0,1,0,1,0,40,32.85
1,0,1,1,0,40,29.08
2,1,0,0,1,40,28.37
3,0,1,0,1,40,24.6


In [477]:
model62 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result62 = model62.fit()
result62.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.173
Model:,OLS,Adj. R-squared:,0.113
Method:,Least Squares,F-statistic:,2.851
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.00298
Time:,12:09:42,Log-Likelihood:,-474.72
No. Observations:,147,AIC:,971.4
Df Residuals:,136,BIC:,1004.0
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,17.3253,0.339,51.121,0.000,16.655,17.995
gender_Female,10.5396,0.569,18.510,0.000,9.414,11.666
gender_Male,6.7857,0.568,11.943,0.000,5.662,7.909
age_group_5_25_under,0.1649,1.806,0.091,0.927,-3.407,3.737
age_group_5_25to29,3.3425,1.331,2.510,0.013,0.710,5.975
age_group_5_30to34,3.3398,1.985,1.683,0.095,-0.585,7.264
age_group_5_35to39,6.3753,1.673,3.811,0.000,3.067,9.683
age_group_5_40to44,1.1498,1.497,0.768,0.444,-1.810,4.109
age_group_5_45to49,2.7692,1.482,1.868,0.064,-0.162,5.701

0,1,2,3
Omnibus:,38.981,Durbin-Watson:,1.332
Prob(Omnibus):,0.0,Jarque-Bera (JB):,78.932
Skew:,1.17,Prob(JB):,7.25e-18
Kurtosis:,5.723,Cond. No.,1.06e+16


In [478]:
model63 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result63 = model63.fit()
result63.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.204
Model:,OLS,Adj. R-squared:,0.139
Method:,Least Squares,F-statistic:,3.136
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.000847
Time:,12:09:43,Log-Likelihood:,-471.98
No. Observations:,147,AIC:,968.0
Df Residuals:,135,BIC:,1004.0
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,18.1569,3.366,5.395,0.000,11.501,24.813
race_grouping_white,10.8972,3.802,2.866,0.005,3.378,18.417
race_grouping_person_of_color,6.6583,3.758,1.772,0.079,-0.773,14.090
age_group_5_25_under,-0.7156,1.822,-0.393,0.695,-4.318,2.887
age_group_5_25to29,3.2103,1.348,2.382,0.019,0.545,5.875
age_group_5_30to34,4.6672,2.003,2.330,0.021,0.706,8.628
age_group_5_35to39,5.6706,1.664,3.408,0.001,2.380,8.962
age_group_5_40to44,2.9884,1.481,2.019,0.046,0.060,5.916
age_group_5_45to49,2.8627,1.518,1.886,0.061,-0.140,5.865

0,1,2,3
Omnibus:,34.622,Durbin-Watson:,1.28
Prob(Omnibus):,0.0,Jarque-Bera (JB):,62.556
Skew:,1.095,Prob(JB):,2.61e-14
Kurtosis:,5.328,Cond. No.,8180000000000000.0


In [479]:
model64 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result64 = model64.fit()
result64.summary()

0,1,2,3
Dep. Variable:,current_base_pay,R-squared:,0.263
Model:,OLS,Adj. R-squared:,0.196
Method:,Least Squares,F-statistic:,3.975
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,2.99e-05
Time:,12:09:43,Log-Likelihood:,-466.32
No. Observations:,147,AIC:,958.6
Df Residuals:,134,BIC:,997.5
Df Model:,12,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,13.7789,2.270,6.071,0.000,9.290,18.268
gender_Female,8.6081,1.331,6.469,0.000,5.976,11.240
gender_Male,5.1708,1.164,4.441,0.000,2.868,7.473
race_grouping_white,8.7881,3.728,2.357,0.020,1.414,16.162
race_grouping_person_of_color,4.5545,3.686,1.236,0.219,-2.735,11.844
age_group_5_25_under,-0.6527,1.749,-0.373,0.710,-4.113,2.807
age_group_5_25to29,2.2409,1.296,1.729,0.086,-0.323,4.805
age_group_5_30to34,3.6872,1.918,1.922,0.057,-0.107,7.481
age_group_5_35to39,5.5001,1.608,3.421,0.001,2.320,8.680

0,1,2,3
Omnibus:,29.883,Durbin-Watson:,1.446
Prob(Omnibus):,0.0,Jarque-Bera (JB):,47.654
Skew:,1.013,Prob(JB):,4.49e-11
Kurtosis:,4.917,Cond. No.,1.31e+16


In [480]:
merit_raises_combined_hourly_regression = merit_raises_combined[(merit_raises_combined['dept'] == 'Commercial') & (merit_raises_combined['pay_rate_type'] == 'Hourly')]
merit_raises_combined_hourly_regression = pd.get_dummies(merit_raises_combined_hourly_regression, columns=['gender','race_grouping','age_group_5'])

In [481]:
merit_raises_combined_hourly_regression = merit_raises_combined_hourly_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over'})
model65 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male')
result65 = model65.fit()
result65.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.064
Model:,OLS,Adj. R-squared:,0.06
Method:,Least Squares,F-statistic:,17.78
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,3.43e-05
Time:,12:09:43,Log-Likelihood:,35.988
No. Observations:,262,AIC:,-67.98
Df Residuals:,260,BIC:,-60.84
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.2686,0.009,30.779,0.000,0.251,0.286
gender_Female,0.1895,0.014,13.893,0.000,0.163,0.216
gender_Male,0.0791,0.014,5.668,0.000,0.052,0.107

0,1,2,3
Omnibus:,112.425,Durbin-Watson:,1.742
Prob(Omnibus):,0.0,Jarque-Bera (JB):,440.219
Skew:,1.802,Prob(JB):,2.56e-96
Kurtosis:,8.229,Cond. No.,1.09e+16


In [482]:
model66 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color')
result66 = model66.fit()
result66.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.032
Model:,OLS,Adj. R-squared:,0.029
Method:,Least Squares,F-statistic:,8.727
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.00342
Time:,12:09:43,Log-Likelihood:,31.648
No. Observations:,262,AIC:,-59.3
Df Residuals:,260,BIC:,-52.16
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.2835,0.010,28.425,0.000,0.264,0.303
race_grouping_white,0.1859,0.018,10.443,0.000,0.151,0.221
race_grouping_person_of_color,0.0976,0.013,7.264,0.000,0.071,0.124

0,1,2,3
Omnibus:,109.028,Durbin-Watson:,1.787
Prob(Omnibus):,0.0,Jarque-Bera (JB):,384.134
Skew:,1.791,Prob(JB):,3.86e-84
Kurtosis:,7.729,Cond. No.,3950000000000000.0


In [483]:
model67 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result67 = model67.fit()
result67.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.101
Model:,OLS,Adj. R-squared:,0.094
Method:,Least Squares,F-statistic:,14.58
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,9.99e-07
Time:,12:09:43,Log-Likelihood:,41.3
No. Observations:,262,AIC:,-76.6
Df Residuals:,259,BIC:,-65.9
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.2123,0.007,29.381,0.000,0.198,0.226
gender_Female,0.1634,0.013,12.262,0.000,0.137,0.190
gender_Male,0.0489,0.013,3.645,0.000,0.022,0.075
race_grouping_white,0.1535,0.016,9.340,0.000,0.121,0.186
race_grouping_person_of_color,0.0588,0.013,4.449,0.000,0.033,0.085

0,1,2,3
Omnibus:,98.49,Durbin-Watson:,1.814
Prob(Omnibus):,0.0,Jarque-Bera (JB):,319.209
Skew:,1.632,Prob(JB):,4.84e-70
Kurtosis:,7.311,Cond. No.,1.38e+16


In [484]:
new_reason_for_change_combined_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1]})
new_reason_for_change_combined_regression['predicted'] = result67.predict(new_reason_for_change_combined_regression)
new_reason_for_change_combined_regression

Unnamed: 0,gender_Female,gender_Male,race_grouping_white,race_grouping_person_of_color,predicted
0,1,0,1,0,0.53
1,0,1,1,0,0.41
2,1,0,0,1,0.43
3,0,1,0,1,0.32


In [485]:
model68 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result68 = model68.fit()
result68.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.127
Model:,OLS,Adj. R-squared:,0.092
Method:,Least Squares,F-statistic:,3.651
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.000145
Time:,12:09:43,Log-Likelihood:,45.112
No. Observations:,262,AIC:,-68.22
Df Residuals:,251,BIC:,-28.97
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.2639,0.009,27.788,0.000,0.245,0.283
gender_Female,0.1855,0.015,12.286,0.000,0.156,0.215
gender_Male,0.0784,0.015,5.216,0.000,0.049,0.108
age_group_5_25_under,0.1763,0.086,2.062,0.040,0.008,0.345
age_group_5_25to29,-0.0140,0.040,-0.354,0.724,-0.092,0.064
age_group_5_30to34,0.0412,0.048,0.853,0.395,-0.054,0.136
age_group_5_35to39,0.1336,0.043,3.111,0.002,0.049,0.218
age_group_5_40to44,0.0340,0.039,0.876,0.382,-0.042,0.110
age_group_5_45to49,0.0274,0.036,0.755,0.451,-0.044,0.099

0,1,2,3
Omnibus:,112.509,Durbin-Watson:,1.803
Prob(Omnibus):,0.0,Jarque-Bera (JB):,472.182
Skew:,1.77,Prob(JB):,2.9300000000000004e-103
Kurtosis:,8.543,Cond. No.,1.43e+16


In [486]:
model69 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result69 = model69.fit()
result69.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.106
Model:,OLS,Adj. R-squared:,0.07
Method:,Least Squares,F-statistic:,2.975
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.00147
Time:,12:09:43,Log-Likelihood:,41.998
No. Observations:,262,AIC:,-62.0
Df Residuals:,251,BIC:,-22.74
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.2759,0.011,26.094,0.000,0.255,0.297
race_grouping_white,0.1811,0.018,9.892,0.000,0.145,0.217
race_grouping_person_of_color,0.0948,0.014,6.724,0.000,0.067,0.123
age_group_5_25_under,0.1520,0.086,1.764,0.079,-0.018,0.322
age_group_5_25to29,-0.0136,0.041,-0.334,0.738,-0.094,0.067
age_group_5_30to34,0.0793,0.048,1.640,0.102,-0.016,0.175
age_group_5_35to39,0.0950,0.043,2.196,0.029,0.010,0.180
age_group_5_40to44,0.0733,0.038,1.923,0.056,-0.002,0.148
age_group_5_45to49,0.0523,0.037,1.425,0.155,-0.020,0.125

0,1,2,3
Omnibus:,113.194,Durbin-Watson:,1.862
Prob(Omnibus):,0.0,Jarque-Bera (JB):,453.825
Skew:,1.805,Prob(JB):,2.84e-99
Kurtosis:,8.343,Cond. No.,9130000000000000.0


In [487]:
model70 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result70 = model70.fit()
result70.summary()

0,1,2,3
Dep. Variable:,base_pay_change,R-squared:,0.162
Model:,OLS,Adj. R-squared:,0.126
Method:,Least Squares,F-statistic:,4.407
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,4.68e-06
Time:,12:09:44,Log-Likelihood:,50.54
No. Observations:,262,AIC:,-77.08
Df Residuals:,250,BIC:,-34.26
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,0.2117,0.008,27.065,0.000,0.196,0.227
gender_Female,0.1637,0.015,11.063,0.000,0.135,0.193
gender_Male,0.0480,0.014,3.323,0.001,0.020,0.076
race_grouping_white,0.1548,0.017,9.067,0.000,0.121,0.188
race_grouping_person_of_color,0.0569,0.014,4.096,0.000,0.030,0.084
age_group_5_25_under,0.1807,0.084,2.154,0.032,0.015,0.346
age_group_5_25to29,-0.0568,0.041,-1.401,0.162,-0.137,0.023
age_group_5_30to34,0.0430,0.047,0.906,0.366,-0.050,0.136
age_group_5_35to39,0.1123,0.042,2.650,0.009,0.029,0.196

0,1,2,3
Omnibus:,103.112,Durbin-Watson:,1.862
Prob(Omnibus):,0.0,Jarque-Bera (JB):,381.604
Skew:,1.654,Prob(JB):,1.37e-83
Kurtosis:,7.9,Cond. No.,2.58e+16


In [488]:
model71 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male')
result71 = model71.fit()
result71.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.064
Model:,OLS,Adj. R-squared:,0.061
Method:,Least Squares,F-statistic:,17.83
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,3.34e-05
Time:,12:09:44,Log-Likelihood:,-3.3094
No. Observations:,261,AIC:,10.62
Df Residuals:,259,BIC:,17.75
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.1932,0.010,215.915,0.000,2.173,2.213
gender_Female,1.1609,0.016,73.044,0.000,1.130,1.192
gender_Male,1.0322,0.016,63.618,0.000,1.000,1.064

0,1,2,3
Omnibus:,16.892,Durbin-Watson:,1.674
Prob(Omnibus):,0.0,Jarque-Bera (JB):,18.305
Skew:,0.633,Prob(JB):,0.000106
Kurtosis:,3.288,Cond. No.,2380000000000000.0


In [489]:
model72 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color')
result72 = model72.fit()
result72.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.009
Model:,OLS,Adj. R-squared:,0.005
Method:,Least Squares,F-statistic:,2.318
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.129
Time:,12:09:44,Log-Likelihood:,-10.836
No. Observations:,261,AIC:,25.67
Df Residuals:,259,BIC:,32.8
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.2028,0.012,187.636,0.000,2.180,2.226
race_grouping_white,1.1282,0.021,53.858,0.000,1.087,1.169
race_grouping_person_of_color,1.0746,0.016,67.923,0.000,1.043,1.106

0,1,2,3
Omnibus:,13.746,Durbin-Watson:,1.519
Prob(Omnibus):,0.001,Jarque-Bera (JB):,14.917
Skew:,0.585,Prob(JB):,0.000577
Kurtosis:,2.976,Cond. No.,5990000000000000.0


In [490]:
model73 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result73 = model73.fit()
result73.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.076
Model:,OLS,Adj. R-squared:,0.069
Method:,Least Squares,F-statistic:,10.56
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,3.89e-05
Time:,12:09:44,Log-Likelihood:,-1.7265
No. Observations:,261,AIC:,9.453
Df Residuals:,258,BIC:,20.15
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.6517,0.009,193.876,0.000,1.635,1.669
gender_Female,0.8915,0.016,56.645,0.000,0.860,0.922
gender_Male,0.7603,0.016,48.075,0.000,0.729,0.791
race_grouping_white,0.8561,0.019,44.198,0.000,0.818,0.894
race_grouping_person_of_color,0.7956,0.016,51.040,0.000,0.765,0.826

0,1,2,3
Omnibus:,17.639,Durbin-Watson:,1.701
Prob(Omnibus):,0.0,Jarque-Bera (JB):,19.204
Skew:,0.64,Prob(JB):,6.76e-05
Kurtosis:,3.356,Cond. No.,3190000000000000.0


In [491]:
model74 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result74 = model74.fit()
result74.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.121
Model:,OLS,Adj. R-squared:,0.086
Method:,Least Squares,F-statistic:,3.439
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.000303
Time:,12:09:44,Log-Likelihood:,4.8201
No. Observations:,261,AIC:,12.36
Df Residuals:,250,BIC:,51.57
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.0534,0.011,185.289,0.000,2.032,2.075
gender_Female,1.0766,0.018,60.998,0.000,1.042,1.111
gender_Male,0.9768,0.018,55.653,0.000,0.942,1.011
age_group_5_25_under,0.1699,0.100,1.703,0.090,-0.027,0.366
age_group_5_25to29,0.2144,0.046,4.637,0.000,0.123,0.305
age_group_5_30to34,0.1876,0.056,3.330,0.001,0.077,0.299
age_group_5_35to39,0.2790,0.050,5.572,0.000,0.180,0.378
age_group_5_40to44,0.2975,0.045,6.572,0.000,0.208,0.387
age_group_5_45to49,0.1887,0.042,4.459,0.000,0.105,0.272

0,1,2,3
Omnibus:,12.95,Durbin-Watson:,1.72
Prob(Omnibus):,0.002,Jarque-Bera (JB):,13.45
Skew:,0.537,Prob(JB):,0.0012
Kurtosis:,3.285,Cond. No.,1.01e+16


In [492]:
model75 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result75 = model75.fit()
result75.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.094
Model:,OLS,Adj. R-squared:,0.058
Method:,Least Squares,F-statistic:,2.601
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.00509
Time:,12:09:44,Log-Likelihood:,0.91884
No. Observations:,261,AIC:,20.16
Df Residuals:,250,BIC:,59.37
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,2.0590,0.012,166.409,0.000,2.035,2.083
race_grouping_white,1.0502,0.021,49.016,0.000,1.008,1.092
race_grouping_person_of_color,1.0088,0.017,61.107,0.000,0.976,1.041
age_group_5_25_under,0.1440,0.101,1.428,0.154,-0.055,0.342
age_group_5_25to29,0.2284,0.048,4.798,0.000,0.135,0.322
age_group_5_30to34,0.2191,0.057,3.872,0.000,0.108,0.331
age_group_5_35to39,0.2498,0.051,4.934,0.000,0.150,0.350
age_group_5_40to44,0.3326,0.045,7.460,0.000,0.245,0.420
age_group_5_45to49,0.2070,0.043,4.816,0.000,0.122,0.292

0,1,2,3
Omnibus:,9.53,Durbin-Watson:,1.609
Prob(Omnibus):,0.009,Jarque-Bera (JB):,9.632
Skew:,0.464,Prob(JB):,0.0081
Kurtosis:,3.151,Cond. No.,6300000000000000.0


In [493]:
model76 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result76 = model76.fit()
result76.summary()

0,1,2,3
Dep. Variable:,performance_rating,R-squared:,0.128
Model:,OLS,Adj. R-squared:,0.09
Method:,Least Squares,F-statistic:,3.33
Date:,"Wed, 06 Nov 2019",Prob (F-statistic):,0.000269
Time:,12:09:44,Log-Likelihood:,5.9135
No. Observations:,261,AIC:,12.17
Df Residuals:,249,BIC:,54.95
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,1.5701,0.009,169.243,0.000,1.552,1.588
gender_Female,0.8372,0.018,47.621,0.000,0.803,0.872
gender_Male,0.7329,0.017,42.711,0.000,0.699,0.767
race_grouping_white,0.8109,0.020,40.043,0.000,0.771,0.851
race_grouping_person_of_color,0.7592,0.016,46.090,0.000,0.727,0.792
age_group_5_25_under,0.1266,0.099,1.273,0.204,-0.069,0.322
age_group_5_25to29,0.1463,0.048,3.046,0.003,0.052,0.241
age_group_5_30to34,0.1430,0.056,2.541,0.012,0.032,0.254
age_group_5_35to39,0.2222,0.050,4.421,0.000,0.123,0.321

0,1,2,3
Omnibus:,13.454,Durbin-Watson:,1.737
Prob(Omnibus):,0.001,Jarque-Bera (JB):,14.029
Skew:,0.544,Prob(JB):,0.000899
Kurtosis:,3.329,Cond. No.,1.7e+16
