### Prepping Data Challenge: Departmental December - Finance (week 49)

### Requirements
- Input data
- Create the Employment Range field, which captures the employee's full tenure at the company in the *MMM yyyy* to *MMM yyyy* format.
- Work out for each year employed per person:
  - Number of months they worked
  - The salary they will have received
  - Their sales total for the year
- For each *Reporting Year* (the individual year someone worked for us), calculate their cumulative months (called Tenure)
- Determine the bonus payments the person will have received each year
  - It's 5% of their sales total
- Round Salary Paid and Yearly Bonus to two decimal places 
- Add Salary Paid and Yearly Bonus together to form *Total Paid*
- Output the data

In [1]:
import pandas as pd
import numpy as np

In [2]:
#Input the data
df = pd.read_csv(r"\Dataprep\2021\PD 2021 Wk 49 Input.csv", parse_dates=['Date'], dayfirst=True)

In [3]:
df.head(10)

Unnamed: 0,Name,Date,Annual Salary,Sales
0,Carl,2020-04-01,16000,632
1,Carl,2020-05-01,16000,1085
2,Carl,2020-06-01,16000,1856
3,Carl,2020-07-01,16000,647
4,Carl,2020-08-01,16000,776
5,Carl,2020-09-01,16000,1671
6,Carl,2020-10-01,16000,357
7,Carl,2020-11-01,16000,339
8,Carl,2020-12-01,16000,1932
9,Carl,2021-01-01,18000,2207


In [4]:
# Work out for each year employed per person:
# Number of months they worked
# Their sales total for the year
df['Reporting Year'] = df['Date'].dt.year
df = df.groupby(['Name', 'Reporting Year']).agg(
    maxDate=('Date', 'max'),
    minDate=('Date', 'min'),
    Months_Worked=('Date', 'count'),
    Annual_salary=('Annual Salary', 'mean'),
    Total_sales=('Sales', 'sum'),
).reset_index()
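
To see what the named aggregation produces, here is a minimal sketch on Carl's ten rows copied from the `df.head(10)` output. The names `sample` and `yearly` are illustrative only and are not part of the notebook.

```python
import pandas as pd

# Carl's first ten rows, copied from the df.head(10) output.
sample = pd.DataFrame({
    "Name": ["Carl"] * 10,
    "Date": pd.to_datetime([
        "2020-04-01", "2020-05-01", "2020-06-01", "2020-07-01", "2020-08-01",
        "2020-09-01", "2020-10-01", "2020-11-01", "2020-12-01", "2021-01-01",
    ]),
    "Annual Salary": [16000] * 9 + [18000],
    "Sales": [632, 1085, 1856, 647, 776, 1671, 357, 339, 1932, 2207],
})
sample["Reporting Year"] = sample["Date"].dt.year

# One output row per (Name, Reporting Year); each keyword names an output
# column and maps it to a (source column, aggregation) pair.
yearly = (
    sample.groupby(["Name", "Reporting Year"])
    .agg(
        Months_Worked=("Date", "count"),
        Annual_salary=("Annual Salary", "mean"),
        Total_sales=("Sales", "sum"),
    )
    .reset_index()
)
print(yearly)
```

Because the sample stops at January 2021, the 2021 row here covers a single month, whereas the full input has more.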

In [5]:
# Their salary they will have received:
# the mean annual salary for the year, prorated by months worked, rounded to two decimal places
df['Salary Paid'] = ((df['Annual_salary'] / 12) * df['Months_Worked']).round(2)

# Determine the bonus payments the person will have received each year
#  - It's 5% of their sales total, rounded to two decimal places
df['Yearly Bonus'] = (df['Total_sales'] * 0.05).round(2)

# Add Salary Paid and Yearly Bonus together to form *Total Paid*
df['Total Paid'] = df['Salary Paid'] + df['Yearly Bonus']
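
As a quick check of the arithmetic, the same formulas can be applied by hand to Carl's 2020 figures from the `df.head(10)` output (nine months at an annual salary of 16000). This is a plain-Python sketch; the variable names are illustrative.

```python
# Carl's nine 2020 Sales values, copied from the df.head(10) output.
sales_2020 = [632, 1085, 1856, 647, 776, 1671, 357, 339, 1932]

annual_salary = 16000
months_worked = len(sales_2020)   # 9 months, April to December
total_sales = sum(sales_2020)     # 9295

# Salary is prorated by month; the bonus is 5% of the year's sales.
salary_paid = round(annual_salary / 12 * months_worked, 2)
yearly_bonus = round(total_sales * 0.05, 2)
total_paid = round(salary_paid + yearly_bonus, 2)

print(salary_paid, yearly_bonus, total_paid)
```

These values agree with Carl's 2020 row in the final output (12000.0, 464.75, 12464.75).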

In [6]:
#For each *Reporting Year* (the individual year someone worked for us), calculate their cumulative months (called Tenure)
df['Tenure by End of Reporting Year'] = df.groupby(['Name'])['Months_Worked'].transform('cumsum')
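
The running total depends on the rows being in year order within each name, which the earlier `groupby` on `['Name', 'Reporting Year']` provides by default. Below is a small sketch of the same idea; the per-year month counts are inferred from the Tenure column of the final output rather than read from the input file, and the frame `months` is illustrative only.

```python
import pandas as pd

# Months worked per year for two employees (inferred from the final output).
months = pd.DataFrame({
    "Name": ["Carl", "Carl", "Carl", "Toni", "Toni"],
    "Reporting Year": [2020, 2021, 2022, 2019, 2020],
    "Months_Worked": [9, 12, 3, 3, 9],
})

# Cumulative sum restarts for each Name, giving tenure at the end of each year.
months["Tenure"] = months.groupby("Name")["Months_Worked"].cumsum()
print(months)
```

`groupby(...).cumsum()` is used here as an equivalent shorthand for `transform('cumsum')`.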

In [7]:
# Create the Employment Range field, which captures the employee's full tenure at the company in the *MMM yyyy* to *MMM yyyy* format.
first_month = df.groupby('Name')['minDate'].transform('min').dt.strftime('%b %Y')
last_month = df.groupby('Name')['maxDate'].transform('max').dt.strftime('%b %Y')
df['Employment Range'] = first_month + ' to ' + last_month
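
The `transform` calls broadcast each person's earliest and latest date back onto every one of their yearly rows, so the range string is identical across those rows. Here is a minimal sketch for Carl; the per-year dates after January 2021 are assumed from the Employment Range shown in the final output, and `spans` is an illustrative name. Note that `%b` yields English month abbreviations only under an English or C locale.

```python
import pandas as pd

# One row per reporting year with that year's first and last month (illustrative).
spans = pd.DataFrame({
    "Name": ["Carl", "Carl", "Carl"],
    "minDate": pd.to_datetime(["2020-04-01", "2021-01-01", "2022-01-01"]),
    "maxDate": pd.to_datetime(["2020-12-01", "2021-12-01", "2022-03-01"]),
})

by_name = spans.groupby("Name")
# transform keeps the original row count, repeating the group-level value.
start = by_name["minDate"].transform("min").dt.strftime("%b %Y")
end = by_name["maxDate"].transform("max").dt.strftime("%b %Y")
spans["Employment Range"] = start + " to " + end
print(spans["Employment Range"].tolist())
```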

In [8]:
# Select the output columns in the required order
output = df[['Name', 'Employment Range', 'Reporting Year', 'Tenure by End of Reporting Year', 'Salary Paid', 'Yearly Bonus', 'Total Paid']].drop_duplicates()

In [9]:
output.head(30)

Unnamed: 0,Name,Employment Range,Reporting Year,Tenure by End of Reporting Year,Salary Paid,Yearly Bonus,Total Paid
0,Carl,Apr 2020 to Mar 2022,2020,9,12000.0,464.75,12464.75
1,Carl,Apr 2020 to Mar 2022,2021,21,18000.0,1363.25,19363.25
2,Carl,Apr 2020 to Mar 2022,2022,24,4750.0,339.75,5089.75
3,Jenny,Jul 2020 to Nov 2021,2020,6,9000.0,545.65,9545.65
4,Jenny,Jul 2020 to Nov 2021,2021,17,16958.33,1159.5,18117.83
5,Tom,Nov 2020 to May 2022,2020,2,2416.67,277.55,2694.22
6,Tom,Nov 2020 to May 2022,2021,14,16000.0,1119.4,17119.4
7,Tom,Nov 2020 to May 2022,2022,19,7291.67,394.25,7685.92
8,Toni,Oct 2019 to Sep 2020,2019,3,5250.0,134.25,5384.25
9,Toni,Oct 2019 to Sep 2020,2020,12,15750.0,526.2,16276.2


In [10]:
#output the data
output.to_csv('wk49-output.csv', index=False)