Income By Title and Gender

Find the average total compensation by employee title and gender. Total compensation is the sum of an employee's salary and bonus. Not every employee receives a bonus, so exclude employees without bonuses from the calculation. An employee can receive more than one bonus.
Output the employee title, gender (i.e., sex), along with the average total compensation.

In [1]:
import pandas as pd
import numpy as np

In [2]:
sf_employee = pd.read_csv("../CSV/sf_employee.csv")
sf_employee.head(3)

Unnamed: 0,id,first_name,last_name,age,sex,employee_title,department,salary,target,email,city,address,manager_id
0,5,Max,George,26,M,Sales,Sales,1300,200,Max@company.com,California,2638 Richards Avenue,1
1,13,Katty,Bond,56,F,Manager,Management,150000,0,Katty@company.com,Arizona,,1
2,11,Richerd,Gear,57,M,Manager,Management,250000,0,Richerd@company.com,Alabama,,1


In [6]:
sf_bonus = pd.read_csv("../CSV/sf_bonus.csv")
sf_bonus = sf_bonus.iloc[:, :2]
sf_bonus

Unnamed: 0,worker_ref_id,bonus
0,1,5000
1,2,3000
2,3,4000
3,1,4500
4,2,3500
5,14,1200
6,17,2500
7,30,500


In [7]:
sf_bonus_summary = sf_bonus.groupby(['worker_ref_id'])['bonus'].sum().to_frame('bonus').reset_index()
sf_bonus_summary

Unnamed: 0,worker_ref_id,bonus
0,1,9500
1,2,6500
2,3,4000
3,14,1200
4,17,2500
5,30,500


In [8]:
merged_df = pd.merge(sf_employee,sf_bonus_summary,left_on='id',right_on='worker_ref_id')
merged_df

Unnamed: 0,id,first_name,last_name,age,sex,employee_title,department,salary,target,email,city,address,manager_id,worker_ref_id,bonus
0,17,Mick,Berry,44,M,Senior Sales,Sales,2200,200,Mick@company.com,Florida,,11,17,2500
1,14,Jason,Tom,23,M,Auditor,Audit,1000,200,Jason@company.com,Arizona,,11,14,1200
2,30,Mark,Jon,28,M,Sales,Sales,1200,200,Mark@company.com,Alabama,2522 George Avenue,1,30,500
3,1,Allen,Wang,55,F,Manager,Management,200000,0,Allen@company.com,California,1069 Ventura Drive,1,1,9500
4,2,Joe,Jack,32,M,Sales,Sales,1000,200,Joe@company.com,California,995 Jim Rosa Lane,1,2,6500
5,3,Henry,Ted,31,M,Senior Sales,Sales,2000,200,Henry@company.com,California,1609 Ford Street,1,3,4000


In [9]:
merged_df['avg_total_comp'] = merged_df['salary'] + merged_df['bonus']
merged_df

Unnamed: 0,id,first_name,last_name,age,sex,employee_title,department,salary,target,email,city,address,manager_id,worker_ref_id,bonus,avg_total_comp
0,17,Mick,Berry,44,M,Senior Sales,Sales,2200,200,Mick@company.com,Florida,,11,17,2500,4700
1,14,Jason,Tom,23,M,Auditor,Audit,1000,200,Jason@company.com,Arizona,,11,14,1200,2200
2,30,Mark,Jon,28,M,Sales,Sales,1200,200,Mark@company.com,Alabama,2522 George Avenue,1,30,500,1700
3,1,Allen,Wang,55,F,Manager,Management,200000,0,Allen@company.com,California,1069 Ventura Drive,1,1,9500,209500
4,2,Joe,Jack,32,M,Sales,Sales,1000,200,Joe@company.com,California,995 Jim Rosa Lane,1,2,6500,7500
5,3,Henry,Ted,31,M,Senior Sales,Sales,2000,200,Henry@company.com,California,1609 Ford Street,1,3,4000,6000


In [14]:
result = merged_df.groupby(['employee_title','sex'])['avg_total_comp'].mean().reset_index()
result

Unnamed: 0,employee_title,sex,avg_total_comp
0,Auditor,M,2200.0
1,Manager,F,209500.0
2,Sales,M,4600.0
3,Senior Sales,M,5350.0


Solution Walkthrough
This solution finds the average total compensation by employee title and gender, where total compensation is an employee's salary plus the sum of their bonuses. It uses the pandas library to process the data.

Understanding The Data
Before diving into the code, let's understand the data involved in this analysis. The code assumes the following data:

sf_employee: A DataFrame containing information about employees, including their IDs, titles, salaries, and gender.
sf_bonus: A DataFrame containing information about bonuses received by employees, including the worker reference ID and the amount of the bonus.
The two DataFrames are joined on the employee ID: the id column in sf_employee matches the worker_ref_id column in sf_bonus.
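Before joining, it helps to check whether the join key is unique on the bonus side. The sketch below rebuilds the bonus rows from the output shown earlier (the DataFrame is retyped by hand here rather than read from the CSV) and lists the employees who have more than one bonus row. The names bonus_counts and repeated_ids are illustrative only.

```python
import pandas as pd

# Bonus rows copied from the sf_bonus output shown earlier in the notebook.
sf_bonus = pd.DataFrame({
    "worker_ref_id": [1, 2, 3, 1, 2, 14, 17, 30],
    "bonus": [5000, 3000, 4000, 4500, 3500, 1200, 2500, 500],
})

# Count bonus rows per employee; a count above 1 means the key repeats.
bonus_counts = sf_bonus["worker_ref_id"].value_counts()
repeated_ids = sorted(bonus_counts[bonus_counts > 1].index.tolist())

print(repeated_ids)  # [1, 2]
```

Because the key repeats for employees 1 and 2, the bonus rows need to be collapsed to one row per employee before the join.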

The Problem Statement
The goal is to calculate the average total compensation for each combination of employee title and gender. The total compensation is the sum of the salary and bonus for each employee. However, if an employee has no bonus, they should be disregarded in the calculation.

Breaking Down The Code
Let's break down the code step by step to understand its implementation:

import pandas as pd
import numpy as np
These lines import the pandas library, which is used for data manipulation and analysis, and the numpy library, which provides support for large, multi-dimensional arrays and matrices. The solution itself only calls pandas; numpy is imported but not used.

sf_bonus_summary = sf_bonus.groupby(['worker_ref_id'])['bonus'].sum().to_frame('bonus').reset_index()
This line groups the sf_bonus DataFrame by the worker reference ID and calculates the sum of bonuses for each ID. Summing first matters because an employee can receive more than one bonus: afterwards each employee appears exactly once. The result is stored in the sf_bonus_summary DataFrame, which has two columns: the worker reference ID and the summed bonus amount.
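To see why the bonuses are summed before the join, the following sketch compares the two approaches for the two Sales employees who appear in the merged output (Joe, id 2, and Mark, id 30). The small DataFrames are retyped from the notebook outputs, and the variable names (naive, fixed, and so on) are hypothetical.

```python
import pandas as pd

# Two Sales employees from the notebook: Joe (id 2) and Mark (id 30).
employees = pd.DataFrame({
    "id": [2, 30],
    "first_name": ["Joe", "Mark"],
    "salary": [1000, 1200],
})
# Their raw bonus rows from sf_bonus: Joe has two, Mark has one.
bonuses = pd.DataFrame({
    "worker_ref_id": [2, 2, 30],
    "bonus": [3000, 3500, 500],
})

# Naive: merging the raw bonus rows duplicates Joe, once per bonus.
naive = pd.merge(employees, bonuses, left_on="id", right_on="worker_ref_id")
naive_mean = (naive["salary"] + naive["bonus"]).mean()

# Aggregated: sum bonuses per employee first, so each employee appears once.
summed = bonuses.groupby("worker_ref_id", as_index=False)["bonus"].sum()
fixed = pd.merge(employees, summed, left_on="id", right_on="worker_ref_id")
fixed_mean = (fixed["salary"] + fixed["bonus"]).mean()

print(naive_mean)  # 3400.0
print(fixed_mean)  # 4600.0
```

The naive version averages three rows (4000, 4500, 1700) and counts Joe's salary twice, while the aggregated version averages 7500 and 1700, which matches the 4600.0 reported for Sales in the final result.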

merged_df = pd.merge(sf_employee, sf_bonus_summary, left_on='id', right_on='worker_ref_id')
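This line merges the sf_employee DataFrame with the sf_bonus_summary DataFrame, matching id in the former against worker_ref_id in the latter. pd.merge performs an inner join by default, so employees with no matching row in sf_bonus_summary are dropped; this is how employees without bonuses are excluded. The result is stored in the merged_df DataFrame, which contains all the columns from both DataFrames.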
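The effect of the default join type can be checked on a small sample. The sketch below uses four employees retyped from the notebook outputs (ids 5 and 13 have no bonus rows; ids 1 and 2 do) and contrasts the default inner join with a left join. The how and indicator parameters are standard pd.merge options; the variable names are illustrative.

```python
import pandas as pd

# Four employees from the notebook outputs; ids 5 and 13 have no bonus.
employees = pd.DataFrame({
    "id": [5, 13, 1, 2],
    "first_name": ["Max", "Katty", "Allen", "Joe"],
    "salary": [1300, 150000, 200000, 1000],
})
bonus_summary = pd.DataFrame({
    "worker_ref_id": [1, 2],
    "bonus": [9500, 6500],
})

# Default how="inner": only employees with a matching bonus row survive.
inner = pd.merge(employees, bonus_summary, left_on="id", right_on="worker_ref_id")

# how="left" keeps everyone; indicator=True adds a _merge column that
# records whether each row found a match on the right side.
left = pd.merge(
    employees, bonus_summary,
    left_on="id", right_on="worker_ref_id",
    how="left", indicator=True,
)
no_bonus_ids = left.loc[left["_merge"] == "left_only", "id"].tolist()

print(sorted(inner["id"].tolist()))  # [1, 2]
print(sorted(no_bonus_ids))          # [5, 13]
```

The left join is only a diagnostic here: it shows which employees the inner join silently removes. The solution keeps the inner join because the task asks to disregard employees without bonuses.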

merged_df['avg_total_comp'] = merged_df['salary'] + merged_df['bonus']
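This line calculates the total compensation for each employee by adding their salary and summed bonus. The result is stored in a new column called 'avg_total_comp' in the merged_df DataFrame. Despite the name, at this point the column holds each employee's total, not an average; the averaging happens in the next step.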

result = merged_df.groupby(['employee_title','sex'])['avg_total_comp'].mean().reset_index()
This line groups the merged_df DataFrame by employee title and gender, and calculates the mean of the 'avg_total_comp' column for each group. The result is stored in the result DataFrame, which contains the employee title, gender, and the average total compensation.
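The group means can be verified by hand from the six merged rows. The sketch below retypes the relevant columns from the merged output and uses a separate column name, total_comp, for the per-employee total (a naming choice made here for clarity, not the notebook's own).

```python
import pandas as pd

# Title, sex, salary and summed bonus for the six rows of merged_df.
merged = pd.DataFrame({
    "employee_title": ["Senior Sales", "Auditor", "Sales",
                       "Manager", "Sales", "Senior Sales"],
    "sex": ["M", "M", "M", "F", "M", "M"],
    "salary": [2200, 1000, 1200, 200000, 1000, 2000],
    "bonus": [2500, 1200, 500, 9500, 6500, 4000],
})

# Per-employee total, then the mean of those totals per (title, sex) group.
merged["total_comp"] = merged["salary"] + merged["bonus"]
avg = (
    merged.groupby(["employee_title", "sex"])["total_comp"]
    .mean()
    .reset_index(name="avg_total_comp")
)

print(avg)
```

For example, the Sales/M group averages 1700 and 7500 to 4600.0, and the Senior Sales/M group averages 4700 and 6000 to 5350.0, in line with the result table above.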

Bringing It All Together
The code combines the given data and performs the necessary calculations to find the average total compensation based on employee titles and gender. Here's the complete code:

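import pandas as pd
import numpy as np

# Load the data (same paths as in the notebook cells above)
sf_employee = pd.read_csv("../CSV/sf_employee.csv")
sf_bonus = pd.read_csv("../CSV/sf_bonus.csv")

# Sum bonuses per employee, then inner-join so only employees with a bonus remain
sf_bonus_summary = sf_bonus.groupby(['worker_ref_id'])['bonus'].sum().to_frame('bonus').reset_index()
merged_df = pd.merge(sf_employee, sf_bonus_summary, left_on='id', right_on='worker_ref_id')
# Total compensation per employee, then its mean per title and gender
merged_df['avg_total_comp'] = merged_df['salary'] + merged_df['bonus']
result = merged_df.groupby(['employee_title', 'sex'])['avg_total_comp'].mean().reset_index()

Conclusion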
This solution finds the average total compensation for each combination of employee title and gender in three pandas steps: summing bonuses per employee, inner-joining the result with the employee table so that employees without bonuses drop out, and averaging salary plus bonus within each title and gender group.