# 2024: Week 6 - Staff Income Tax

February 07, 2024

 Created by: Carl Allchin

Welcome to the first week of the Intermediate level challenges for 2024. This means we'll leave more space for you to work out the logic and be less specific about the techniques you are likely to need.

The end of January in the UK (where Prep Air is based) is the deadline for residents to submit their income tax returns. To help our team, we've offered to summarise their tax position for them. UK income tax works in bands. Here's a summary table showing the percentage of tax charged on each pound earned within each band:

![1](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjb4x01iCIrMGNS8qxplMNJoXRma3YZz9FdpBs39FlkfxspveNmDIJQS4zz9PvsritFkVBXq4XK9EuegpWHrtcpLkFtaLRHQIzm0dYgjLjxAQAIzJyojyWuhcM24qFF503Mv8rc4O1GxDYJaNnGrxxYYxZ0EgLzCA7RM2WRL5e1-Vi-IhBIMs8U8PYbsKQS/s952/Screenshot%202024-01-28%20at%2017.51.57.png)

For example, if I earned £12,571, I would pay £0.20 of tax in total: £0 for the first £12,570 earned and then 20% of the £1 in the next tax band.
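
As a quick check on that arithmetic, here is a minimal sketch of a banded calculation in Python. The thresholds (12,570, 50,270 and 125,140) are the ones used in the band table of the solution further down; the names `BANDS` and `income_tax` are illustrative choices, not part of the challenge.

```python
# Upper limit of each band and the rate charged within it
# (assumed from the band table used in the solution).
BANDS = [
    (12_570, 0.00),        # personal allowance
    (50_270, 0.20),        # basic rate
    (125_140, 0.40),       # higher rate
    (float("inf"), 0.45),  # additional rate
]

def income_tax(annual_salary):
    """Return total tax by charging each slice of salary at its band's rate."""
    tax = 0.0
    lower = 0.0
    for upper, rate in BANDS:
        if annual_salary > lower:
            # Only the part of the salary that falls inside this band is taxed at its rate.
            tax += (min(annual_salary, upper) - lower) * rate
        lower = upper
    return tax

print(round(income_tax(12_571), 2))  # 0.2
```

Only the single pound above the personal allowance is taxed, which gives the £0.20 from the example.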

### Input

One csv file containing the monthly salary for staff. If any team member has a change in their pay, their new salary is recorded as a later record, but the input still contains their former record, based on what they would have been paid.

![2](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirEshz8IP1iXN1P9xru0kSyr9KRN1gj19we0OajhqrAU8cgf9xCt8Jk-QgLTtWn2msmbJm2nLzDxvprG3hVUOitQ8Chj3t-b02xHpWqIglcOfl-O86jOL14V355Sx8quXP4yOAo0wxjubGvgkU__y0AdWw3QEE9G68-Pg5tiE2SvgzCcH1Sx5K3HER3gU3/s2626/Screenshot%202024-01-28%20at%2018.57.43.png)

### Requirements
- Input the csv file
- Add a row number to the data set
- Find the latest row (largest row number) to capture the individual's correct salary information
- Find each team member's annual salary
- Find each team member's maximum tax band based on their annual salary
  - 20% rate
  - 40% rate
  - 45% rate
- Work out how much tax an individual paid for each of the % bands. Call these fields:
  - 20% tax rate paid
  - 40% tax rate paid
  - 45% tax rate paid
- Total the tax paid across all three % bands. Call this field 'Total Tax Paid' 
- Output the data
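
The first few requirements can be tried on a toy data set before touching the real input. The sketch below is only an illustration: the data is invented, only three month columns are used, and the column names `Row Number` and `Salary` are assumptions rather than names fixed by the challenge.

```python
import pandas as pd

# Toy data: StaffID 1 appears twice, the later row holding the updated salary.
# Only three month columns are used here to keep the example short.
toy = pd.DataFrame({
    "StaffID": [1, 2, 1],
    "1": [1000.0, 2000.0, 1100.0],
    "2": [1000.0, 2000.0, 1100.0],
    "3": [1000.0, 2050.0, 1150.0],
})

# Row number across the whole data set, starting at 1.
toy["Row Number"] = range(1, len(toy) + 1)

# Keep the row with the largest row number for each StaffID.
latest = (
    toy.sort_values("Row Number")
       .drop_duplicates(subset="StaffID", keep="last")
       .copy()
)

# Annual salary is the sum of the monthly columns.
month_cols = ["1", "2", "3"]
latest["Salary"] = latest[month_cols].sum(axis=1)

print(latest[["StaffID", "Row Number", "Salary"]])
```

With the real input, `month_cols` would cover all twelve monthly columns.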

### Output

![3](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigeYYkwt4PhAzfUravnvoWCBTuhuRt8av-J7zuq3VDRGBaQ2sDnkGCkkmueCX8EHOQ-4N8FL5ichGFMAFymJqUd93Z2MdlhluKnG4Bpvn00QsJQaEBghfYXcm-hS91wLpN4XNJnQvZaqmfY8rZE_gXB-12ET4ke_TdWlRr81iLTHpz3G88xKa1aOYCSryS/s1622/Screenshot%202024-01-29%20at%2017.59.22.png)


Note: your output may have some rounding differences, and that's OK, as it will depend on the tool you use.

7 data fields: 
- StaffID 
- Salary
- Max Tax Rate
- Total Tax Paid
- 20% tax rate paid
- 40% tax rate paid
- 45% tax rate paid

In [100]:
import pandas as pd

df = pd.read_csv('PD 2024 Wk 6 Input.csv')
print(df.head())

   StaffID       1        2        3        4        5        6        7  \
0     1533  2398.0  2421.98  2446.20  2446.20  2495.12  2495.12  2495.12   
1     1339  7304.0  7523.12  7673.58  7673.58  7750.32  7827.82  8062.66   
2     2291  8240.0  8404.80  8572.90  8744.35  8831.80  9096.75  9278.69   
3     2038  3908.0  3986.16  3986.16  4026.02  4066.28  4188.27  4313.92   
4     2810  3988.0  4107.64  4148.72  4190.20  4274.01  4316.75  4316.75   

         8        9       10       11       12  
0  2495.12  2545.03  2621.38  2621.38  2621.38  
1  8304.54  8470.63  8555.33  8555.33  8726.44  
2  9464.26  9464.26  9464.26  9558.90  9558.90  
3  4443.34  4487.77  4622.40  4668.63  4715.31  
4  4359.92  4490.71  4535.62  4671.69  4718.41  


In [101]:
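# Number each StaffID's records in file order (1 = earliest), so the highest number marks the latest record
df['StaffID_Index'] = df.groupby('StaffID').cumcount() + 1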
print(df)

     StaffID        1         2         3         4         5         6  \
0       1533   2398.0   2421.98   2446.20   2446.20   2495.12   2495.12   
1       1339   7304.0   7523.12   7673.58   7673.58   7750.32   7827.82   
2       2291   8240.0   8404.80   8572.90   8744.35   8831.80   9096.75   
3       2038   3908.0   3986.16   3986.16   4026.02   4066.28   4188.27   
4       2810   3988.0   4107.64   4148.72   4190.20   4274.01   4316.75   
..       ...      ...       ...       ...       ...       ...       ...   
994     2959   9163.0   9163.00   9163.00   9163.00   9437.89   9626.65   
995     1467   1928.0   1985.84   2045.42   2086.32   2128.05   2170.61   
996     2582   5343.0   5449.86   5558.86   5614.45   5726.73   5898.54   
997     1779  11138.0  11472.14  11816.30  11816.30  11934.47  12173.16   
998     1510   2211.0   2255.22   2322.88   2392.56   2440.41   2464.82   

            7         8         9        10        11        12  StaffID_Index  
0     2495.12   24

In [102]:
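# Keep the latest record per StaffID; .copy() avoids a SettingWithCopyWarning when columns are added later
latest_df = df.loc[df.groupby('StaffID')['StaffID_Index'].idxmax()].copy()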
print(latest_df)

     StaffID        1         2         3         4         5         6  \
386     1000  13416.0  13550.16  13685.66  13822.52  13960.74  14239.96   
5       1001  12518.0  12518.00  12893.54  13151.41  13545.95  13681.41   
877     1007   2134.0   2176.68   2220.21   2264.62   2264.62   2287.26   
262     1010   8260.0   8507.80   8763.03   9025.93   9025.93   9206.44   
96      1012   8669.0   8755.69   9018.36   9288.91   9381.80   9475.62   
..       ...      ...       ...       ...       ...       ...       ...   
515     2994   8459.0   8712.77   8712.77   8887.03   8887.03   9153.64   
322     2995  10672.0  10672.00  10672.00  10778.72  10994.29  11104.24   
842     2997   2436.0   2509.08   2534.17   2610.20   2636.30   2662.66   
565     2998   1889.0   1889.00   1945.67   1965.13   2024.08   2024.08   
947     2999   6510.0   6510.00   6705.30   6839.41   7044.59   7115.03   

            7         8         9        10        11        12  StaffID_Index  
386  14667.16  149

In [103]:
latest_df['Annual Salary'] = latest_df.loc[:, '1':'12'].sum(axis=1)
print(latest_df)

     StaffID        1         2         3         4         5         6  \
386     1000  13416.0  13550.16  13685.66  13822.52  13960.74  14239.96   
5       1001  12518.0  12518.00  12893.54  13151.41  13545.95  13681.41   
877     1007   2134.0   2176.68   2220.21   2264.62   2264.62   2287.26   
262     1010   8260.0   8507.80   8763.03   9025.93   9025.93   9206.44   
96      1012   8669.0   8755.69   9018.36   9288.91   9381.80   9475.62   
..       ...      ...       ...       ...       ...       ...       ...   
515     2994   8459.0   8712.77   8712.77   8887.03   8887.03   9153.64   
322     2995  10672.0  10672.00  10672.00  10778.72  10994.29  11104.24   
842     2997   2436.0   2509.08   2534.17   2610.20   2636.30   2662.66   
565     2998   1889.0   1889.00   1945.67   1965.13   2024.08   2024.08   
947     2999   6510.0   6510.00   6705.30   6839.41   7044.59   7115.03   

            7         8         9        10        11        12  \
386  14667.16  14960.50  15110.1

In [104]:
tax_brackets = {
    'Band' : ['Personal Allowance', 'Basic Rate', 'Higher Rate', 'Additional Rate'],
    'Taxable Income': ['0 to 12570', '12571 to 50270', '50271 to 125140', 'Over 125140'],
    'Tax Rate': ['0%', '20%', '40%', '45%']
}

tax_df = pd.DataFrame(tax_brackets)
print(tax_df)

                 Band   Taxable Income Tax Rate
0  Personal Allowance       0 to 12570       0%
1          Basic Rate   12571 to 50270      20%
2         Higher Rate  50271 to 125140      40%
3     Additional Rate      Over 125140      45%


In [105]:
tax_brackets['compare_field'] = [12570, 50270, 125140, 9999999999999]

tax_df = pd.DataFrame(tax_brackets)
print(tax_df)

                 Band   Taxable Income Tax Rate  compare_field
0  Personal Allowance       0 to 12570       0%          12570
1          Basic Rate   12571 to 50270      20%          50270
2         Higher Rate  50271 to 125140      40%         125140
3     Additional Rate      Over 125140      45%  9999999999999


In [106]:
latest_df['Max Tax Band'] = latest_df['Annual Salary'].apply(lambda x: tax_df[tax_df['compare_field'] >= x].iloc[0]['Band'])
latest_df['Max Tax Rate'] = latest_df['Annual Salary'].apply(lambda x: tax_df[tax_df['compare_field'] >= x].iloc[0]['Tax Rate'])
print(latest_df[['StaffID', 'Annual Salary', 'Max Tax Band', 'Max Tax Rate']])

     StaffID  Annual Salary     Max Tax Band Max Tax Rate
386     1000      173197.95  Additional Rate          45%
5       1001      166864.48  Additional Rate          45%
877     1007       27969.52       Basic Rate          20%
262     1010      111033.29      Higher Rate          40%
96      1012      115739.53      Higher Rate          40%
..       ...            ...              ...          ...
515     2994      111712.65      Higher Rate          40%
322     2995      134668.02  Additional Rate          45%
842     2997       32379.93       Basic Rate          20%
565     2998       24647.13       Basic Rate          20%
947     2999       85976.30      Higher Rate          40%

[803 rows x 4 columns]
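
The row-by-row lookup above could also be done in one vectorised call with `pd.cut`. This is an alternative sketch, not the method used in the notebook; the sample salaries are three values from the printed output plus one boundary case, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Three salaries from the output above, plus 12570.00 as a boundary case.
salaries = pd.Series([173197.95, 27969.52, 111033.29, 12570.00])

# Band edges are the upper limits used in compare_field; right=True makes each
# bin include its upper edge, matching the `compare_field >= x` comparison.
edges = [0, 12570, 50270, 125140, np.inf]
rates = ["0%", "20%", "40%", "45%"]

max_rate = pd.cut(salaries, bins=edges, labels=rates, right=True, include_lowest=True)
print(max_rate.tolist())  # ['45%', '20%', '40%', '0%']
```

Because the bins are closed on the right, a salary of exactly 12,570 stays in the 0% band, as it does with the `compare_field` comparison.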


In [107]:
def calculate_tax(salary):
    tax_paid = {'20% tax rate paid': 0, '40% tax rate paid': 0, '45% tax rate paid': 0}
    remaining_salary = salary

    # 45% tax rate
    if remaining_salary > 125140:
        tax_paid['45% tax rate paid'] = (remaining_salary - 125140) * 0.45
        remaining_salary = 125140

    # 40% tax rate
    if remaining_salary > 50270:
        tax_paid['40% tax rate paid'] = (remaining_salary - 50270) * 0.40
        remaining_salary = 50270

    # 20% tax rate
    if remaining_salary > 12570:
        tax_paid['20% tax rate paid'] = (remaining_salary - 12570) * 0.20

    return tax_paid

latest_df[['20% tax rate paid', '40% tax rate paid', '45% tax rate paid']] = latest_df['Annual Salary'].apply(lambda x: pd.Series(calculate_tax(x)))
print(latest_df[['StaffID', 'Annual Salary', '20% tax rate paid', '40% tax rate paid', '45% tax rate paid']])

     StaffID  Annual Salary  20% tax rate paid  40% tax rate paid  \
386     1000      173197.95           7540.000          29948.000   
5       1001      166864.48           7540.000          29948.000   
877     1007       27969.52           3079.904              0.000   
262     1010      111033.29           7540.000          24305.316   
96      1012      115739.53           7540.000          26187.812   
..       ...            ...                ...                ...   
515     2994      111712.65           7540.000          24577.060   
322     2995      134668.02           7540.000          29948.000   
842     2997       32379.93           3961.986              0.000   
565     2998       24647.13           2415.426              0.000   
947     2999       85976.30           7540.000          14282.520   

     45% tax rate paid  
386         21626.0775  
5           18776.0160  
877             0.0000  
262             0.0000  
96              0.0000  
..                 ..
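
The same per-band amounts can be computed without `apply` by clipping the salary to each band's limits. This is a sketch of an alternative approach, assuming the same thresholds as `calculate_tax`; the sample salaries are taken from the printed output above.

```python
import pandas as pd

# Sample salaries taken from the printed output above.
salary = pd.Series([173197.95, 27969.52, 85976.30])

# For each band, clip the salary to the band's limits and tax the slice inside it.
tax_20 = (salary.clip(upper=50270) - 12570).clip(lower=0) * 0.20
tax_40 = (salary.clip(upper=125140) - 50270).clip(lower=0) * 0.40
tax_45 = (salary - 125140).clip(lower=0) * 0.45

bands = pd.DataFrame({
    "20% tax rate paid": tax_20,
    "40% tax rate paid": tax_40,
    "45% tax rate paid": tax_45,
})
print(bands.round(2))
```

On a data set of this size either approach is fine; the clipped version mainly avoids building a `pd.Series` for every row.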

In [108]:
# calculate total tax paid
latest_df['Total Tax Paid'] = latest_df[['20% tax rate paid', '40% tax rate paid', '45% tax rate paid']].sum(axis=1)

# Drop columns 1-12, drop StaffID_Index
latest_df = latest_df.drop(columns=[str(i) for i in range(1, 13)])
latest_df = latest_df.drop(columns=['StaffID_Index'])
print(latest_df)

     StaffID  Annual Salary     Max Tax Band Max Tax Rate  20% tax rate paid  \
386     1000      173197.95  Additional Rate          45%           7540.000   
5       1001      166864.48  Additional Rate          45%           7540.000   
877     1007       27969.52       Basic Rate          20%           3079.904   
262     1010      111033.29      Higher Rate          40%           7540.000   
96      1012      115739.53      Higher Rate          40%           7540.000   
..       ...            ...              ...          ...                ...   
515     2994      111712.65      Higher Rate          40%           7540.000   
322     2995      134668.02  Additional Rate          45%           7540.000   
842     2997       32379.93       Basic Rate          20%           3961.986   
565     2998       24647.13       Basic Rate          20%           2415.426   
947     2999       85976.30      Higher Rate          40%           7540.000   

     40% tax rate paid  45% tax rate pa

In [109]:
output = latest_df
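
At this point the frame still carries `Annual Salary` and `Max Tax Band`, while the Output section asks for seven fields with a `Salary` column. A minimal sketch of the final tidy-up follows. It assumes `Salary` means the annual figure, uses the field names from the Requirements, and works on a hypothetical one-row stand-in for `latest_df`; the output file name is also an assumption.

```python
import pandas as pd

# Hypothetical one-row stand-in for latest_df after the steps above.
latest_df = pd.DataFrame({
    "StaffID": [1007],
    "Annual Salary": [27969.52],
    "Max Tax Band": ["Basic Rate"],
    "Max Tax Rate": ["20%"],
    "20% tax rate paid": [3079.904],
    "40% tax rate paid": [0.0],
    "45% tax rate paid": [0.0],
    "Total Tax Paid": [3079.904],
})

output_cols = [
    "StaffID", "Salary", "Max Tax Rate", "Total Tax Paid",
    "20% tax rate paid", "40% tax rate paid", "45% tax rate paid",
]

# Rename the salary column and keep only the seven requested fields.
output = latest_df.rename(columns={"Annual Salary": "Salary"})[output_cols]

# The file name is an assumption; uncomment to write the result to disk.
# output.to_csv("PD 2024 Wk 6 Output.csv", index=False)
print(output.columns.tolist())
```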