# 2024: Week 24 - SuperBytes Salaries Part 2

June 12, 2024

Challenge By: Andrew Tobin

We're continuing with DS43's challenges, so it's over to Andrew to explain his next one.

_____________________________________

Manager salaries at SuperBytes are paid weekly. Salaries are consistent throughout a given manager’s employment, but may change when a manager is replaced. SuperBytes wants to know two things:

1. How much have they been spending on manager salaries each week since the company’s founding?
2. What percentage of annual expenses is going to manager salaries?

### Input

This week’s data has two sheets: 

- One holds data similar to last week’s, but with the cleaning and joining already done for you: a single sheet containing manager salaries and tenure dates

![1](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEguPgHdir2wPKajOYhNxWsP72o8Y9cGvVb8TlDPSMS_d74lMq5Im1BeezTO_ufRi-gSY_lCPuSorSofKsMFwjwU0UEa2glyn9_7tLlhO5XakqYBYbpPTFk8-adlyUuWymIImAFJEdax3gLsYcvV9YFnlLpTPgSsr-OwR9kzxRmG-vOUz-8u7-gxTLXKtBaS/s903/Screenshot%202024-05-28%20165102.png)

- The other contains the total expenses by year for the company 

![2](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiytCupXe91wVsv4arvhywGMyD2X2EcgRj1H_MdpY9DAbrYLAPeP-uF4RgS8tMN6SxJWj_c4TppGO8cl7Jc9GnhjivklGrrw9bdO48QFPAFZwfvE2uz3xwniy9arYmuOIB_OjLETsMBctSe42BjsENTjEd0ZIZST660XJysmjv8Yb_k2_SqR6nRlsK0eE6I/s331/Screenshot%202024-05-28%20165144.png)

### Requirements

- Input the data
- Set the end date for current employees as 12th June 2024
- Calculate the weekly salary of each manager (assuming 52 weeks in the year)
- For each manager, create a row for every week they receive their salary
- For the first output:
  - Total up the amount SuperBytes pays in manager salaries each week
  - Round this value to 2 decimal places
- For the second output:
  - Work out how much SuperBytes spends on manager salaries each year
  - Compare this to the total expenses for the company each year
  - Calculate the % Spent on Manager Salaries, rounded to 1 decimal place
- Output the data
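
The "one row per manager per week" step and the weekly total can be sketched on toy data as follows. This is only an illustration under stated assumptions: the two managers are hypothetical, and anchoring the weekly pay dates on the earliest start date is one possible reading of the requirement, not something the brief spells out.

```python
import pandas as pd

# Toy data (hypothetical managers, not taken from the challenge workbook)
managers = pd.DataFrame({
    "Name": ["Manager A", "Manager B"],
    "Start Date": pd.to_datetime(["2024-01-01", "2024-01-10"]),
    "End Date": pd.to_datetime(["2024-01-21", "2024-01-15"]),
    "Salary": [52000, 26000],
})

# Weekly pay dates, anchored on the earliest start date (an assumption)
calendar = pd.date_range(managers["Start Date"].min(),
                         managers["End Date"].max(), freq="7D")

# One row per manager per pay date that falls inside their tenure
rows = [
    {"Name": name, "Week": week, "Weekly Salary": salary / 52}
    for name, start, end, salary in zip(
        managers["Name"], managers["Start Date"],
        managers["End Date"], managers["Salary"])
    for week in calendar[(calendar >= start) & (calendar <= end)]
]
paid = pd.DataFrame(rows)

# Total paid out per week, rounded to 2 decimal places
weekly_totals = (paid.groupby("Week", as_index=False)["Weekly Salary"].sum()
                     .rename(columns={"Weekly Salary": "Salary Payments"}))
weekly_totals["Salary Payments"] = weekly_totals["Salary Payments"].round(2)
print(weekly_totals)
```

Building the rows directly avoids materialising every manager-week combination before filtering, which may matter on larger inputs. Note that Manager B, who starts mid-week, only appears from the next pay date onwards.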

### Output

The flow should have two output tables:

![3](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjcHRlYjWuJ7V8ZyF7tfJdS6WAiy8dX60Jfj6irR-ldl32jjLc1d3KGHs1sSa0yapiReZhe8nQVRq3xj6L_OFPH-y8bML3YI_qLXXnVzJImabn11s0czYhsfCuFPjjOA5JYRQ2EUmyT48tvgrkvphLt13Sty5zS1_l2RAU40j2H14nOY0igXjJfXNSI2gV9/s186/Screenshot%202024-05-30%20111432.png)

- 2 fields
  - Week
  - Salary Payments
- 830 rows (831 including headers)

![4](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9SH4BABJcZmUn1vmESp71q0kq0KVUnbgHVb1lGfZfzGhwXXgnC9X0lAAEdK8Y18se0B2ghWNRufwGs8s4SbPl4ft1tvSoABJnfGF3CK-KJE6TKKMrMoMhJCztrQo4w0esAPNWyZg3xJqwE3ODD-wxSPsZbPNGaIsmAtgpHq0xFFpIlOTtGwwTnfw3Ioxv/s252/Screenshot%202024-05-30%20111716.png)

- 2 fields
  - Year
  - % Spent on Manager Salaries
- 16 rows (17 including headers)
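
A small check like the one below can confirm that a result matches the field names and row counts listed above before it is written out. The helper name `matches_spec` and the toy frame are hypothetical, introduced here only for illustration.

```python
import pandas as pd

def matches_spec(df, columns, n_rows):
    """True when df has exactly these columns (in order) and n_rows rows."""
    return list(df.columns) == list(columns) and len(df) == n_rows

# Toy stand-in for the first output (two rows instead of 830)
toy_output = pd.DataFrame({
    "Week": pd.to_datetime(["2008-07-20", "2008-07-27"]),
    "Salary Payments": [1000.0, 1500.0],
})

ok_small = matches_spec(toy_output, ["Week", "Salary Payments"], 2)
ok_full = matches_spec(toy_output, ["Week", "Salary Payments"], 830)
print(ok_small, ok_full)
```

On the real results the same call would be made with 830 rows for the first output and 16 rows for the second.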

In [436]:
import pandas as pd

# Path to the Excel workbook
excel_file = 'SuperByte Managers (Intermediate).xlsx'

# List all sheet names
sheet_names = pd.ExcelFile(excel_file).sheet_names

print(sheet_names)

['Tenures & Salaries', 'Total Expenses']


In [437]:
tenu_salaries_df = pd.read_excel(excel_file, sheet_name='Tenures & Salaries')
tenu_salaries_df.head()

Unnamed: 0,Name,Gender,Store City,Store Area,Start Date,End Date,Salary
0,Pablo Prep,M,London,Stratford,2008-07-20,2017-03-02 00:00:00,40000
1,Phil Down,M,London,Oxford Street,2008-07-20,2020-06-04 00:00:00,40000
2,Beau Leon,M,Nottingham,,2008-07-20,2018-09-05 00:00:00,35000
3,Tasha Board,F,Leeds,,2008-07-20,Current,35000
4,Jewel Axis,M,Manchester,Trafford,2008-12-22,2022-01-07 00:00:00,37000


In [438]:
total_expense_df = pd.read_excel(excel_file, sheet_name='Total Expenses')
total_expense_df.head()

Unnamed: 0,Year,Expenses
0,2008,2404683
1,2009,1954242
2,2010,2608345
3,2011,2546432
4,2012,3842963


In [439]:
# Replace 'Current' with '2024-06-12' in the 'End Date' column
tenu_salaries_df['End Date'] = tenu_salaries_df['End Date'].replace('Current', '2024-06-12')
tenu_salaries_df.head()

Unnamed: 0,Name,Gender,Store City,Store Area,Start Date,End Date,Salary
0,Pablo Prep,M,London,Stratford,2008-07-20,2017-03-02 00:00:00,40000
1,Phil Down,M,London,Oxford Street,2008-07-20,2020-06-04 00:00:00,40000
2,Beau Leon,M,Nottingham,,2008-07-20,2018-09-05 00:00:00,35000
3,Tasha Board,F,Leeds,,2008-07-20,2024-06-12,35000
4,Jewel Axis,M,Manchester,Trafford,2008-12-22,2022-01-07 00:00:00,37000


In [440]:
# Convert 'End Date' column to datetime data type
tenu_salaries_df['End Date'] = pd.to_datetime(tenu_salaries_df['End Date'])
tenu_salaries_df.head()

Unnamed: 0,Name,Gender,Store City,Store Area,Start Date,End Date,Salary
0,Pablo Prep,M,London,Stratford,2008-07-20,2017-03-02,40000
1,Phil Down,M,London,Oxford Street,2008-07-20,2020-06-04,40000
2,Beau Leon,M,Nottingham,,2008-07-20,2018-09-05,35000
3,Tasha Board,F,Leeds,,2008-07-20,2024-06-12,35000
4,Jewel Axis,M,Manchester,Trafford,2008-12-22,2022-01-07,37000
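
The two cells above (replace, then convert) could also be collapsed into a single step. The sketch below is one possible alternative on a toy column, not the notebook's own method; the `CUTOFF` name is an assumption made for illustration.

```python
import pandas as pd

# End date for current employees, per the requirements
CUTOFF = pd.Timestamp("2024-06-12")

# Toy column mixing real timestamps with the 'Current' marker
end_dates = pd.Series([pd.Timestamp("2017-03-02"), "Current"], dtype=object)

# Unparseable values become NaT, which is then filled with the cutoff
parsed = pd.to_datetime(end_dates, errors="coerce").fillna(CUTOFF)
print(parsed)
```

One caveat: `errors="coerce"` turns every unparseable value into `NaT`, not just `'Current'`, so any typo in the column would silently be treated as a current employee. The explicit `replace` used in the notebook is stricter in that respect.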


In [441]:
# Calculate the weekly salary
tenu_salaries_df['Weekly Salary'] = tenu_salaries_df['Salary'] / 52
tenu_salaries_df.head()

Unnamed: 0,Name,Gender,Store City,Store Area,Start Date,End Date,Salary,Weekly Salary
0,Pablo Prep,M,London,Stratford,2008-07-20,2017-03-02,40000,769.230769
1,Phil Down,M,London,Oxford Street,2008-07-20,2020-06-04,40000,769.230769
2,Beau Leon,M,Nottingham,,2008-07-20,2018-09-05,35000,673.076923
3,Tasha Board,F,Leeds,,2008-07-20,2024-06-12,35000,673.076923
4,Jewel Axis,M,Manchester,Trafford,2008-12-22,2022-01-07,37000,711.538462


In [442]:
min_start_date = tenu_salaries_df['Start Date'].min()
max_end_date = tenu_salaries_df['End Date'].max()

print(f"Min Start Date: {min_start_date}")
print(f"Max End Date: {max_end_date}")

Min Start Date: 2008-07-20 00:00:00
Max End Date: 2024-06-12 00:00:00


In [443]:
# Create a date range from min_start_date to max_end_date
date_range = pd.date_range(start=min_start_date, end=max_end_date)

# Create a DataFrame with the date range
date_df = pd.DataFrame(date_range, columns=['Date'])

date_df.head()

Unnamed: 0,Date
0,2008-07-20
1,2008-07-21
2,2008-07-22
3,2008-07-23
4,2008-07-24


In [444]:
# Add a new column 'Day of Week' to date_df
date_df['Day of Week'] = date_df['Date'].dt.day_name()

date_df.head()

Unnamed: 0,Date,Day of Week
0,2008-07-20,Sunday
1,2008-07-21,Monday
2,2008-07-22,Tuesday
3,2008-07-23,Wednesday
4,2008-07-24,Thursday


In [445]:
# Create a new column 'Week Number' starting from 1
date_df['Week Number'] = ((date_df['Date'] - date_df['Date'].min()).dt.days // 7) + 1

date_df.head()

Unnamed: 0,Date,Day of Week,Week Number
0,2008-07-20,Sunday,1
1,2008-07-21,Monday,1
2,2008-07-22,Tuesday,1
3,2008-07-23,Wednesday,1
4,2008-07-24,Thursday,1


In [446]:
# Group by 'Week Number' and keep only the minimum date for each week
min_date_per_week_df = date_df.groupby('Week Number').agg({'Date': 'min'}).reset_index()

min_date_per_week_df.head()

Unnamed: 0,Week Number,Date
0,1,2008-07-20
1,2,2008-07-27
2,3,2008-08-03
3,4,2008-08-10
4,5,2008-08-17
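
The daily calendar, week numbering and group-by above produce one row per week start. As a hedged shortcut, the same weekly calendar can be generated directly with a 7-day frequency, using the minimum start date and maximum end date printed earlier:

```python
import pandas as pd

first_day = pd.Timestamp("2008-07-20")  # earliest start date in the data
last_day = pd.Timestamp("2024-06-12")   # end date set for current employees

# One row per week start, stepping 7 days at a time
weeks = pd.DataFrame({"Date": pd.date_range(first_day, last_day, freq="7D")})
weeks["Week Number"] = weeks.index + 1
print(len(weeks))
```

The resulting calendar has 830 rows, which lines up with the row count required for the first output.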


In [447]:
# Cross join: pair every manager with every week start date
merged_df = tenu_salaries_df.merge(min_date_per_week_df, how='cross')

# Keep only the weeks that fall within each manager's tenure (inclusive)
filtered_df = merged_df[(merged_df['Start Date'] <= merged_df['Date']) & (merged_df['End Date'] >= merged_df['Date'])]

filtered_df.head()

Unnamed: 0,Name,Gender,Store City,Store Area,Start Date,End Date,Salary,Weekly Salary,Week Number,Date
0,Pablo Prep,M,London,Stratford,2008-07-20,2017-03-02,40000,769.230769,1,2008-07-20
1,Pablo Prep,M,London,Stratford,2008-07-20,2017-03-02,40000,769.230769,2,2008-07-27
2,Pablo Prep,M,London,Stratford,2008-07-20,2017-03-02,40000,769.230769,3,2008-08-03
3,Pablo Prep,M,London,Stratford,2008-07-20,2017-03-02,40000,769.230769,4,2008-08-10
4,Pablo Prep,M,London,Stratford,2008-07-20,2017-03-02,40000,769.230769,5,2008-08-17


In [448]:
# Group by 'Date' and calculate the sum of 'Weekly Salary'
weekly_salary_by_week_number = filtered_df.groupby(['Date'])['Weekly Salary'].sum().reset_index()
weekly_salary_by_week_number.head()

Unnamed: 0,Date,Weekly Salary
0,2008-07-20,2884.615385
1,2008-07-27,2884.615385
2,2008-08-03,2884.615385
3,2008-08-10,2884.615385
4,2008-08-17,2884.615385


In [449]:
# Round the weekly totals to 2 decimal places
weekly_salary_by_week_number['Weekly Salary'] = weekly_salary_by_week_number['Weekly Salary'].round(2)

# Rename the columns to match the field names required for the first output
output1 = weekly_salary_by_week_number.rename(columns={'Date': 'Week', 'Weekly Salary': 'Salary Payments'})
output1.head()

Unnamed: 0,Week,Salary Payments
0,2008-07-20,2884.62
1,2008-07-27,2884.62
2,2008-08-03,2884.62
3,2008-08-10,2884.62
4,2008-08-17,2884.62
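
As a quick sanity check, the first-week figure can be reproduced by hand. The sketch assumes that the four managers shown earlier with a 2008-07-20 start date are the only ones employed in week 1, which the matching total suggests but the previews alone do not prove.

```python
# Annual salaries of the four managers shown with a 2008-07-20 start date
week_one_salaries = [40000, 40000, 35000, 35000]

# Weekly salary assumes 52 weeks in the year, as the requirements state
week_one_total = round(sum(week_one_salaries) / 52, 2)
print(week_one_total)  # 2884.62
```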


In [450]:
# Calculate the daily salary (assumes 365 days per year, so a full leap year sums to slightly more than the annual salary)
tenu_salaries_df['Daily Salary'] = tenu_salaries_df['Salary'] / 365
tenu_salaries_df.head()

Unnamed: 0,Name,Gender,Store City,Store Area,Start Date,End Date,Salary,Weekly Salary,Daily Salary
0,Pablo Prep,M,London,Stratford,2008-07-20,2017-03-02,40000,769.230769,109.589041
1,Phil Down,M,London,Oxford Street,2008-07-20,2020-06-04,40000,769.230769,109.589041
2,Beau Leon,M,Nottingham,,2008-07-20,2018-09-05,35000,673.076923,95.890411
3,Tasha Board,F,Leeds,,2008-07-20,2024-06-12,35000,673.076923,95.890411
4,Jewel Axis,M,Manchester,Trafford,2008-12-22,2022-01-07,37000,711.538462,101.369863


In [451]:
# Function to create a daily date range for each manager (start and end dates inclusive)
def create_date_range(row):
    return pd.date_range(start=row['Start Date'], end=row['End Date'])

# Build the ranges once and reuse them for both the repeat counts and the dates
date_ranges = tenu_salaries_df.apply(create_date_range, axis=1)

# Repeat each manager's row once per day of tenure, then attach the exploded dates
expanded_df = tenu_salaries_df.loc[tenu_salaries_df.index.repeat(date_ranges.apply(len))].copy()
expanded_df['Date'] = pd.to_datetime(date_ranges.explode().values)

expanded_df.head()

Unnamed: 0,Name,Gender,Store City,Store Area,Start Date,End Date,Salary,Weekly Salary,Daily Salary,Date
0,Pablo Prep,M,London,Stratford,2008-07-20,2017-03-02,40000,769.230769,109.589041,2008-07-20
0,Pablo Prep,M,London,Stratford,2008-07-20,2017-03-02,40000,769.230769,109.589041,2008-07-21
0,Pablo Prep,M,London,Stratford,2008-07-20,2017-03-02,40000,769.230769,109.589041,2008-07-22
0,Pablo Prep,M,London,Stratford,2008-07-20,2017-03-02,40000,769.230769,109.589041,2008-07-23
0,Pablo Prep,M,London,Stratford,2008-07-20,2017-03-02,40000,769.230769,109.589041,2008-07-24


In [452]:
# Add a new column 'Year' to expanded_df
expanded_df['Year'] = expanded_df['Date'].dt.year

# Group by 'Year' and calculate the sum of 'Daily Salary'
annual_salary_by_year = expanded_df.groupby('Year')['Daily Salary'].sum().reset_index()

# Rename the column to 'Total Annual Salary'
annual_salary_by_year.rename(columns={'Daily Salary': 'Total Annual Salary'}, inplace=True)

annual_salary_by_year.head()

Unnamed: 0,Year,Total Annual Salary
0,2008,68821.917808
1,2009,187000.0
2,2010,231928.767123
3,2011,270000.0
4,2012,276986.30137
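
One side effect of the fixed `Salary / 365` daily rate is worth illustrating: a manager employed for a whole leap year is counted for 366 days. The sketch below uses a hypothetical salary of 36500 and shows one possible adjustment (dividing by the actual number of days in the year). Whether the challenge's expected output uses a daily, weekly or other basis is not stated in the brief, so this is an observation rather than a correction.

```python
import numpy as np
import pandas as pd

salary = 36500  # hypothetical annual salary
days = pd.date_range("2012-01-01", "2012-12-31")  # 2012 is a leap year

# Fixed 365-day rate, as used in the notebook: 366 days at salary / 365
fixed_total = (salary / 365) * len(days)

# Alternative: divide by the actual number of days in each date's year
days_in_year = np.where(days.is_leap_year, 366, 365)
adjusted_total = (salary / days_in_year).sum()

print(len(days), fixed_total, round(adjusted_total, 2))
```

With the fixed rate the year totals 36600, one extra day's pay, while the adjusted rate returns the annual salary.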


In [454]:
# Merge annual_salary_by_year with total_expense_df on 'Year'
merged_annual_expenses_df = pd.merge(annual_salary_by_year, total_expense_df, on='Year')

merged_annual_expenses_df.head()

Unnamed: 0,Year,Total Annual Salary,Expenses
0,2008,68821.917808,2404683
1,2009,187000.0,1954242
2,2010,231928.767123,2608345
3,2011,270000.0,2546432
4,2012,276986.30137,3842963
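
`pd.merge` defaults to an inner join, so a year present in the salary totals but missing from the expenses sheet would be dropped without warning. A minimal sketch of a more defensive merge on toy data (the frames and values are hypothetical):

```python
import pandas as pd

salaries = pd.DataFrame({"Year": [2008, 2009],
                         "Total Annual Salary": [100.0, 200.0]})
expenses = pd.DataFrame({"Year": [2008], "Expenses": [1000]})

# validate raises if either side has duplicate years;
# indicator adds a '_merge' column showing where each row came from
checked = salaries.merge(expenses, on="Year", how="left",
                         validate="one_to_one", indicator=True)

unmatched_years = checked.loc[checked["_merge"] == "left_only", "Year"].tolist()
print(unmatched_years)
```

An empty `unmatched_years` list on the real data would confirm that every salary year has a matching expenses row.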


In [455]:
# Calculate the percentage spent on manager salaries
merged_annual_expenses_df['% Spent on Manager Salaries'] = (merged_annual_expenses_df['Total Annual Salary'] / merged_annual_expenses_df['Expenses']) * 100

# Round the percentage to 1 decimal place
merged_annual_expenses_df['% Spent on Manager Salaries'] = merged_annual_expenses_df['% Spent on Manager Salaries'].round(1)

# Keep only the two fields required for the second output
output2 = merged_annual_expenses_df[['Year', '% Spent on Manager Salaries']]
output2.head()

Unnamed: 0,Year,% Spent on Manager Salaries
0,2008,2.9
1,2009,9.6
2,2010,8.9
3,2011,10.6
4,2012,7.2
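
The last requirement is to output the data, which the cells above stop short of. A minimal sketch of writing a result to CSV follows. The file name and the temporary directory are assumptions for illustration, and the toy frame reuses the first two rows shown above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for output2, using the first two rows shown above
output2_toy = pd.DataFrame({
    "Year": [2008, 2009],
    "% Spent on Manager Salaries": [2.9, 9.6],
})

# Hypothetical destination; swap in a real folder and file name as needed
out_dir = Path(tempfile.mkdtemp())
out_path = out_dir / "manager_salary_share.csv"

# index=False keeps the DataFrame index out of the file
output2_toy.to_csv(out_path, index=False)

# Read the file back to confirm the fields survived the round trip
round_trip = pd.read_csv(out_path)
print(round_trip)
```

The same call pattern would apply to the first output, so that each file contains only the two required fields.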
