# 2024: Week 17 - Budget vs Actuals Part 2

April 24, 2024

Challenge by: Michał Mioduchowski

We're continuing with DS43's challenges, so over to Michał to explain the next challenge.


_____________________________________

Superbytes has tasked us with examining their historical budgets once more, this time focusing on 2023, when the company began tracking actual spending on a monthly basis. The objective is to determine the primary areas contributing to over-expenditure over the course of the year. CEO Phil suspects that the company may have invested too much in inventory without seeing a corresponding uptick in sales performance last year. If this suspicion turns out to be true, it will provide actionable insights for optimising resource allocation and enhancing overall financial performance in line with Superbytes' strategic objectives.

### Inputs

There are 13 inputs this week:
Forecasted Spending 

![1](https://lh7-eu.googleusercontent.com/5eAgmBcHjcJV-ArIgu3CIIAdlzLMhlsXkdbKCN1OqaZ3jp1u2PuZNN96GW9EPDObZ5npVZkbKK8WcqRpJEbs0Vbg9oMi-O5Wnz6iKN28Llaua-EW-McaH-aLoRA_H2D041RKw1ZGp3VevmBD3M-aZ9Y)

Actual Monthly Spending (1 table per month) 

![2](https://lh7-eu.googleusercontent.com/VmBp8qu5iPUFv9-Mn5ESJhLKOq0niTNtTAFNVmpmANgs3FDXebURfWk4QT31RH8AecQdXpZZt76zQIM6LRejR9gae12YmHlQZC8wB3C6zGuMdZciFNIrvkQDGzP0_O79P-Cacn29SDHctgkQ4OIZW7c=w640-h26)

### Requirements

- Input the Excel file
- Combine the Monthly sheets into a single table
- Since we are trying to find the key areas of overspending, let's exclude any area that is below budget at an annual level
- For the remaining areas, work out the difference between forecasted and actual expenditure on a monthly basis:
  - Since the budget estimates were not prepared on a monthly basis, the available numbers should be spread out across the year evenly
  - Round the values to the nearest whole number
- For each month, find the category which has the highest overspending 
- Output the data

### Output

![3](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNaIPFNWJfwcXbi_PTTh6q7WYrjJoLUn48B6DnLZiY1vhIOUOX6_q3T9LF0vQ-xNppDoj7WwOoq4G92zgAoWu6gCyU1NU5VqYu2qi2PaTcWxHX34LxU7hLrj8zD2xM8fSyzwPtf-cJbWmyAS6by05rG5V7kiTDH3_GIdtrz0-nHTA4V3DyXqZkg7Z1Au-v/s496/Screenshot%202024-03-13%20134226.png)

- 5 fields
  - Month
  - Category
  - Actual Spending
  - Budget
  - Difference
- 12 rows

In [16]:
import pandas as pd

# Read the Excel file
file_path = 'Budget Data 2023 (Intermediate).xlsx'
xls = pd.ExcelFile(file_path)

# List all sheet names
sheet_names = xls.sheet_names
print(sheet_names)

['Budget', 'Month-January', 'Month-February', 'Month-March', 'Month-April', 'Month-May', 'Month-June', 'Month-July', 'Month-August', 'Month-September', 'Month-October', 'Month-November', 'Month-December']


In [17]:
budget_df = pd.read_excel(xls, sheet_name='Budget', names=['Category', 'Budget'])
budget_df

Unnamed: 0,Category,Budget
0,Rent,1008000 SameAsLastYear
1,Utilities,120000 Appx
2,Security,90000 PaidUpfront
3,,
4,Wages,270000
5,,
6,Inventory,2300000 RoughEstimate
7,Transportation,180000 Estimate
8,Transaction Fees,54000 Appx
9,Advertising,300000 SameAsLastYear


In [18]:
budget_df = budget_df.dropna().reset_index(drop=True)
budget_df

Unnamed: 0,Category,Budget
0,Rent,1008000 SameAsLastYear
1,Utilities,120000 Appx
2,Security,90000 PaidUpfront
3,Wages,270000
4,Inventory,2300000 RoughEstimate
5,Transportation,180000 Estimate
6,Transaction Fees,54000 Appx
7,Advertising,300000 SameAsLastYear
8,Insurance,72000 MoreThanLastYear


In [19]:
# Cast to string so the .str methods work on every row (the Wages value has no note attached)
budget_df['Budget'] = budget_df['Budget'].astype(str)

# Remove non-numeric characters from the 'Budget' column and convert to integer
budget_df['Budget'] = budget_df['Budget'].str.replace(r'\D', '', regex=True).astype(int)
budget_df

Unnamed: 0,Category,Budget
0,Rent,1008000
1,Utilities,120000
2,Security,90000
3,Wages,270000
4,Inventory,2300000
5,Transportation,180000
6,Transaction Fees,54000
7,Advertising,300000
8,Insurance,72000

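Stripping every non-digit works for the notes shown above, but it would fuse extra digits into the amount if a note ever contained a number. A minimal sketch of an alternative, assuming the amount always comes first in the cell: capture only the leading run of digits with `Series.str.extract`. The `raw` series is a stand-in built from values in the Budget sheet, not the notebook's own dataframe.

```python
import pandas as pd

# Values copied from the Budget sheet shown above (Wages has no trailing note)
raw = pd.Series(['1008000 SameAsLastYear', '120000 Appx', 270000, '2300000 RoughEstimate'])

# Capture only the leading run of digits; whatever follows (the note) is ignored
amounts = raw.astype(str).str.extract(r'^\s*(\d+)', expand=False).astype(int)
print(amounts.tolist())
```

With `expand=False` and a single capture group, `str.extract` returns a Series rather than a one-column dataframe, so it can be assigned straight back to a column.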

In [20]:
# Spread the annual budget evenly across 12 months and round to the nearest whole number
budget_df['Budget'] = (budget_df['Budget'] / 12).round()
budget_df

Unnamed: 0,Category,Budget
0,Rent,84000.0
1,Utilities,10000.0
2,Security,7500.0
3,Wages,22500.0
4,Inventory,191667.0
5,Transportation,15000.0
6,Transaction Fees,4500.0
7,Advertising,25000.0
8,Insurance,6000.0


In [21]:
# Initialize an empty list to hold the dataframes
monthly_dfs = []

# Loop through each sheet name (excluding the 'Budget' sheet)
for sheet in sheet_names:
    if sheet != 'Budget':
        # Read the sheet into a dataframe, ignoring the first column
        df = pd.read_excel(xls, sheet_name=sheet).iloc[:, 1:]
        # Add a column for the month
        df['Month'] = sheet
        # Append the dataframe to the list
        monthly_dfs.append(df)

# Concatenate all the dataframes in the list into a single dataframe
combined_df = pd.concat(monthly_dfs, ignore_index=True)
combined_df

Unnamed: 0,Rent,Utilities,Security,Wages,Inventory,Transportation,TransactionFees,Advertising,Insurance,Month,2023-10-01 00:00:00
0,84000,9255.251667,7500,22500,341902.8658,21022.77917,5640.43,25574.99667,6000,Month-January,
1,84000,9255.251667,7500,22500,211345.1858,11957.58917,5471.1,25574.99667,6000,Month-February,
2,84000,9255.251667,7500,22500,230988.6658,18856.66917,4968.78,25574.99667,6000,Month-March,
3,84000,9255.251667,7500,22500,263154.2458,16505.27917,5098.87,25574.99667,6000,Month-April,
4,84000,9255.251667,7500,22500,234528.5758,17847.11917,5128.37,25574.99667,6000,Month-May,
5,84000,9255.251667,7500,22500,229251.0358,18902.92917,5043.07,25574.99667,6000,Month-June,
6,84000,9255.251667,7500,22500,236123.0458,16899.54917,4856.25,25574.99667,6000,Month-July,
7,84000,9255.251667,7500,22500,206210.8058,17955.35917,4770.95,25574.99667,6000,Month-August,
8,84000,9255.251667,7500,22500,213082.8158,19297.19917,4800.45,25574.99667,6000,Month-September,
9,84000,9255.251667,7500,22500,207805.2758,16945.80917,4930.54,25574.99667,6000,Month-October,Actual

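As an alternative to looping over the sheet names, the monthly tables can be stacked in one call, because `pd.concat` accepts a dictionary of dataframes and turns the keys into an index level. The sketch below is hedged: the `sheets` dictionary is a hand-built stand-in for what `pd.read_excel(..., sheet_name=None)` returns (a mapping of sheet name to dataframe), and it holds only two categories for two months.

```python
import pandas as pd

# Stand-in for pd.read_excel(file_path, sheet_name=None): {sheet name: dataframe}
sheets = {
    'Budget': pd.DataFrame({'Category': ['Rent'], 'Budget': [1008000]}),
    'Month-January': pd.DataFrame({'Rent': [84000], 'Inventory': [341902.8658]}),
    'Month-February': pd.DataFrame({'Rent': [84000], 'Inventory': [211345.1858]}),
}

# Keep only the monthly sheets and key them by the bare month name
monthly = {
    name.split('-', 1)[1]: df
    for name, df in sheets.items()
    if name.startswith('Month-')
}

# The dictionary keys become an index level named 'Month', then move into a column
combined = (
    pd.concat(monthly, names=['Month'])
    .reset_index(level='Month')
    .reset_index(drop=True)
)
print(combined)
```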

In [None]:
# Remove unnecessary columns (the stray trailing column picked up from one of the sheets)
combined_df = combined_df.iloc[:, :10]
combined_df

Unnamed: 0,Rent,Utilities,Security,Wages,Inventory,Transportation,TransactionFees,Advertising,Insurance,Month
0,84000,9255.251667,7500,22500,341902.8658,21022.77917,5640.43,25574.99667,6000,Month-January
1,84000,9255.251667,7500,22500,211345.1858,11957.58917,5471.1,25574.99667,6000,Month-February
2,84000,9255.251667,7500,22500,230988.6658,18856.66917,4968.78,25574.99667,6000,Month-March
3,84000,9255.251667,7500,22500,263154.2458,16505.27917,5098.87,25574.99667,6000,Month-April
4,84000,9255.251667,7500,22500,234528.5758,17847.11917,5128.37,25574.99667,6000,Month-May
5,84000,9255.251667,7500,22500,229251.0358,18902.92917,5043.07,25574.99667,6000,Month-June
6,84000,9255.251667,7500,22500,236123.0458,16899.54917,4856.25,25574.99667,6000,Month-July
7,84000,9255.251667,7500,22500,206210.8058,17955.35917,4770.95,25574.99667,6000,Month-August
8,84000,9255.251667,7500,22500,213082.8158,19297.19917,4800.45,25574.99667,6000,Month-September
9,84000,9255.251667,7500,22500,207805.2758,16945.80917,4930.54,25574.99667,6000,Month-October

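Slicing by position with `iloc[:, :10]` relies on the stray column always landing last. A small sketch of selecting by name instead, on a toy frame with just two categories; the `expected` list is an assumption written out by hand, and the stray header is stored as a plain string here for simplicity.

```python
import pandas as pd

# Toy frame mimicking the combined sheets, including the stray extra column
df = pd.DataFrame({
    'Rent': [84000, 84000],
    'Inventory': [341902.8658, 207805.2758],
    'Month': ['Month-January', 'Month-October'],
    '2023-10-01 00:00:00': [None, 'Actual'],
})

# Name the columns we want; anything else (such as the stray header) is dropped
expected = ['Rent', 'Inventory', 'Month']
clean = df[expected]
print(clean.columns.tolist())
```

Selecting by name also fails loudly with a `KeyError` if an expected column is missing, which positional slicing would not do.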

In [None]:
# Strip the 'Month-' prefix so the 'Month' column holds just the month name
combined_df['Month'] = combined_df['Month'].str.replace('Month-', '', regex=False)
combined_df

Unnamed: 0,Rent,Utilities,Security,Wages,Inventory,Transportation,TransactionFees,Advertising,Insurance,Month
0,84000,9255.251667,7500,22500,341902.8658,21022.77917,5640.43,25574.99667,6000,January
1,84000,9255.251667,7500,22500,211345.1858,11957.58917,5471.1,25574.99667,6000,February
2,84000,9255.251667,7500,22500,230988.6658,18856.66917,4968.78,25574.99667,6000,March
3,84000,9255.251667,7500,22500,263154.2458,16505.27917,5098.87,25574.99667,6000,April
4,84000,9255.251667,7500,22500,234528.5758,17847.11917,5128.37,25574.99667,6000,May
5,84000,9255.251667,7500,22500,229251.0358,18902.92917,5043.07,25574.99667,6000,June
6,84000,9255.251667,7500,22500,236123.0458,16899.54917,4856.25,25574.99667,6000,July
7,84000,9255.251667,7500,22500,206210.8058,17955.35917,4770.95,25574.99667,6000,August
8,84000,9255.251667,7500,22500,213082.8158,19297.19917,4800.45,25574.99667,6000,September
9,84000,9255.251667,7500,22500,207805.2758,16945.80917,4930.54,25574.99667,6000,October


In [24]:
# Unpivot (melt) the dataframe from wide to long: one row per month and category
melted_df = pd.melt(combined_df, id_vars=['Month'], var_name='Category', value_name='Actual Spending')
melted_df

Unnamed: 0,Month,Category,Actual Spending
0,January,Rent,84000.0
1,February,Rent,84000.0
2,March,Rent,84000.0
3,April,Rent,84000.0
4,May,Rent,84000.0
...,...,...,...
103,August,Insurance,6000.0
104,September,Insurance,6000.0
105,October,Insurance,6000.0
106,November,Insurance,6000.0


In [None]:
# Merge the melted dataframe with the budget dataframe
merged_df = pd.merge(melted_df, budget_df, on='Category', how='left')
merged_df

Unnamed: 0,Month,Category,Actual Spending,Budget
0,January,Rent,84000.0,84000.0
1,February,Rent,84000.0,84000.0
2,March,Rent,84000.0,84000.0
3,April,Rent,84000.0,84000.0
4,May,Rent,84000.0,84000.0
...,...,...,...,...
103,August,Insurance,6000.0,6000.0
104,September,Insurance,6000.0,6000.0
105,October,Insurance,6000.0,6000.0
106,November,Insurance,6000.0,6000.0

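One thing to check at this step: the monthly sheets label the column `TransactionFees`, while the Budget sheet uses `Transaction Fees`. A left merge on `Category` would presumably leave that category's `Budget` empty, although the truncated output above does not show those rows. A minimal sketch of one way to guard against this, using a hypothetical whitespace-free `key` column and the `indicator` option of `merge`; the two small frames are toy data built from values shown earlier.

```python
import pandas as pd

# Toy data: category labels as they appear in the monthly sheets and the Budget sheet
actuals = pd.DataFrame({
    'Month': ['January', 'January'],
    'Category': ['TransactionFees', 'Rent'],
    'Actual Spending': [5640.43, 84000.0],
})
budget = pd.DataFrame({
    'Category': ['Transaction Fees', 'Rent'],
    'Budget': [4500.0, 84000.0],
})

# Hypothetical join key: the label with spaces removed, so both spellings agree
actuals['key'] = actuals['Category'].str.replace(' ', '', regex=False)
budget['key'] = budget['Category'].str.replace(' ', '', regex=False)

merged = actuals.merge(budget[['key', 'Budget']], on='key', how='left', indicator=True)

# Rows flagged 'left_only' are actuals that found no matching budget line
unmatched = merged[merged['_merge'] == 'left_only']
print(merged[['Month', 'Category', 'Actual Spending', 'Budget']])
print(len(unmatched))
```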

In [None]:
# Filter the dataframe to show only rows where Actual Spending is greater than Budget
filtered_df = merged_df[merged_df['Actual Spending'] > merged_df['Budget']]
filtered_df

Unnamed: 0,Month,Category,Actual Spending,Budget
48,January,Inventory,341902.8658,191667.0
49,February,Inventory,211345.1858,191667.0
50,March,Inventory,230988.6658,191667.0
51,April,Inventory,263154.2458,191667.0
52,May,Inventory,234528.5758,191667.0
53,June,Inventory,229251.0358,191667.0
54,July,Inventory,236123.0458,191667.0
55,August,Inventory,206210.8058,191667.0
56,September,Inventory,213082.8158,191667.0
57,October,Inventory,207805.2758,191667.0

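The requirements ask for areas to be excluded when they are below budget at an annual level, whereas the filter above compares each month on its own. A hedged sketch of the annual-level version on made-up numbers (categories `A` and `B` are placeholders): total each category over the year, keep the overspent ones, then retain all of their months. Applied to the real data, this could produce a different set of candidate rows than the month-by-month filter.

```python
import pandas as pd

# Made-up long-format data: two categories over three months
df = pd.DataFrame({
    'Month': ['January', 'February', 'March'] * 2,
    'Category': ['A'] * 3 + ['B'] * 3,
    'Actual Spending': [120.0, 90.0, 110.0, 105.0, 95.0, 90.0],
    'Budget': [100.0] * 6,
})

# Total each category over the year and find those whose actuals exceed the budget
annual = df.groupby('Category')[['Actual Spending', 'Budget']].sum()
over_budget = annual.index[annual['Actual Spending'] > annual['Budget']]

# Keep every month for the overspent categories, even months that were under budget
annual_filtered = df[df['Category'].isin(over_budget)]
print(annual_filtered)
```

Here `A` totals 320 against a budget of 300 and is kept in full, including its under-budget February, while `B` totals 290 and is dropped.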

In [None]:
# Calculate the difference between actual spending and budget.
# assign() returns a new dataframe, which avoids pandas' SettingWithCopyWarning
filtered_df = filtered_df.assign(Difference=filtered_df['Actual Spending'] - filtered_df['Budget'])
filtered_df

Unnamed: 0,Month,Category,Actual Spending,Budget,Difference
48,January,Inventory,341902.8658,191667.0,150235.8658
49,February,Inventory,211345.1858,191667.0,19678.1858
50,March,Inventory,230988.6658,191667.0,39321.6658
51,April,Inventory,263154.2458,191667.0,71487.2458
52,May,Inventory,234528.5758,191667.0,42861.5758
53,June,Inventory,229251.0358,191667.0,37584.0358
54,July,Inventory,236123.0458,191667.0,44456.0458
55,August,Inventory,206210.8058,191667.0,14543.8058
56,September,Inventory,213082.8158,191667.0,21415.8158
57,October,Inventory,207805.2758,191667.0,16138.2758


In [None]:
# Find the maximum difference for each month
max_diff_df = filtered_df.loc[filtered_df.groupby('Month')['Difference'].idxmax()].reset_index(drop=True)
max_diff_df

Unnamed: 0,Month,Category,Actual Spending,Budget,Difference
0,April,Inventory,263154.2458,191667.0,71487.2458
1,August,Inventory,206210.8058,191667.0,14543.8058
2,December,Advertising,25574.99667,25000.0,574.99667
3,February,Inventory,211345.1858,191667.0,19678.1858
4,January,Inventory,341902.8658,191667.0,150235.8658
5,July,Inventory,236123.0458,191667.0,44456.0458
6,June,Inventory,229251.0358,191667.0,37584.0358
7,March,Inventory,230988.6658,191667.0,39321.6658
8,May,Inventory,234528.5758,191667.0,42861.5758
9,November,Transportation,23844.88917,15000.0,8844.88917


In [None]:
# Round the values to the nearest whole number
max_diff_df['Actual Spending'] = max_diff_df['Actual Spending'].round()
max_diff_df['Difference'] = max_diff_df['Difference'].round()
output = max_diff_df
output

Unnamed: 0,Month,Category,Actual Spending,Budget,Difference
0,April,Inventory,263154.0,191667.0,71487.0
1,August,Inventory,206211.0,191667.0,14544.0
2,December,Advertising,25575.0,25000.0,575.0
3,February,Inventory,211345.0,191667.0,19678.0
4,January,Inventory,341903.0,191667.0,150236.0
5,July,Inventory,236123.0,191667.0,44456.0
6,June,Inventory,229251.0,191667.0,37584.0
7,March,Inventory,230989.0,191667.0,39322.0
8,May,Inventory,234529.0,191667.0,42862.0
9,November,Transportation,23845.0,15000.0,8845.0

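The grouped result comes back with months in alphabetical order. A small sketch of putting them in calendar order with an ordered categorical, using a few rows copied from the output above; the `month_order` list is written out by hand rather than taken from the notebook.

```python
import pandas as pd

# A few rows copied from the output above, in the alphabetical order groupby produced
out = pd.DataFrame({
    'Month': ['April', 'August', 'December', 'February', 'January'],
    'Category': ['Inventory', 'Inventory', 'Advertising', 'Inventory', 'Inventory'],
    'Difference': [71487.0, 14544.0, 575.0, 19678.0, 150236.0],
})

# Calendar order, spelled out explicitly so it does not depend on the system locale
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

# An ordered categorical sorts by its category order instead of alphabetically
out['Month'] = pd.Categorical(out['Month'], categories=month_order, ordered=True)
out = out.sort_values('Month').reset_index(drop=True)
print(out['Month'].tolist())
```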