# 2024: Week 16 - Budget vs Actuals

April 17, 2024

Challenge by: Michał Mioduchowski

We're continuing with DS43's challenges, so over to Michał to explain the next one.


_____________________________________


This week, Superbytes has asked us to look at their budget sheet for 2022. The company believes that its rough budget estimates were very close to the final budget allocations for that year. Examining historical budget data lets the company identify trends in spending over time, which shows how expenses have evolved and helps predict future expenditure. CEO Phil Down would like us to find exactly where predicted and actual spending differ.

### Inputs

There are 2 inputs for this challenge:

Forecasted Spending
 
![1](https://lh7-eu.googleusercontent.com/gSyi9H-IN0rOd6SmwF7K1YF59bm7otma1_AMiyiWv5O9KNoHJjMjeBToDsxFFPDC7LicESfMdTa5FYt5jlTDw_QIE673dRJWrxjYsjaxIhJeKKmXp-JGbtI4EFAVf6bnytyY5a7RqWMFvoFSCu6rGKA)

Actual Spending

![2](https://lh7-eu.googleusercontent.com/SMIstRe1J2jKiC-YBLtwBB4Dy-rJ4hC7vrmnxIBqk-_bfNqFWHyeAu0d4I6wj1sy9XC124floldfLLOGJdPudl2EMeX5wmg6pZdZg-TV8D4ZB4R0LG3nbujWY0roRjIlvmHpPtbLIpOx3Dx9xN7pbv8)

### Requirements

- Input the Excel file
- Match Sheet 1 and Sheet 2 in formatting. Both should include:
  - Category field [String]
  - Budget/Actual field [Number (Decimal)]
- Join both sheets based on the Category field to create a single table with 3 columns:
  - Category [String]
  - Budget [Number]
  - Actual [Number]
- Rename the fields to:
  - Category
  - Forecasted Spending
  - Actual Spending
- Create a new calculated column with the differences between forecasted and actual values.
  - Values in the new column should be rounded to whole numbers [ROUND(...)]
- Output the data

### Output

![3](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiEwdPJVXL33gzeUoI7ww2oH5IKcrw2sgMrJtMtPWQoXQ3N_1fmUrwsC5_4nETnsGDQI7MX_1f3HxtjIkZChhcw-tz4ZYU9pjvQNyFVtZneW6AHK5SFuTwWQIDLX_Lk3qq_i8VHj7xLAyrJKWuNuLzrx17HVouTYV0dgaB6D565ncWlGqX-wJsOevKn3j9H/s517/Screenshot%202024-03-13%20132341.png)

- 4 fields
  - Category
  - Forecasted Spending
  - Actual Spending
  - Difference
- 9 rows

In [100]:
import pandas as pd

# Read the Excel file
excel_file = 'budget data 2022 (beginner).xlsx'

# List all sheet names
sheet_names = pd.ExcelFile(excel_file).sheet_names
print(sheet_names)

['Budget', 'Actual']


In [101]:
budget_df = pd.read_excel(excel_file, sheet_name='Budget', names=['Category', 'Budget', 'Notes'])
budget_df

Unnamed: 0,Category,Budget,Notes
0,Rent,1000000,Same as last year
1,Utilities,120000,Appx
2,Security,90000,Already paid
3,,,
4,Wages,270K,ADJUSTED
5,,,
6,Inventory,2300000,More less
7,Transportation,180000,More less
8,Transaction Fees,54000,"That was last year, probably higher this year"
9,Advertising,300000,Ask Mark about exact sum


In [102]:
actual_df = pd.read_excel(excel_file, sheet_name='Actual', names=['Category', 'Actual'])
actual_df

Unnamed: 0,Category,Actual
0,Rent,1000000.0
1,Utilities,111063.02
2,Security,90000.0
3,Wages,270000.0
4,Inventory,2654003.11
5,Transportation,214814.87
6,TransactionFees,59395.92
7,Advertising,306899.96
8,Insurance,70000.0


In [103]:
# Replace 'TransactionFees' with 'Transaction Fees' in the 'Category' column
actual_df['Category'] = actual_df['Category'].replace('TransactionFees', 'Transaction Fees')

# Display the updated dataframe
actual_df

Unnamed: 0,Category,Actual
0,Rent,1000000.0
1,Utilities,111063.02
2,Security,90000.0
3,Wages,270000.0
4,Inventory,2654003.11
5,Transportation,214814.87
6,Transaction Fees,59395.92
7,Advertising,306899.96
8,Insurance,70000.0
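
The `TransactionFees` mismatch above was spotted by eye. As a rough sketch of how such mismatches could be found programmatically, an outer merge with `indicator=True` flags categories that appear in only one sheet. The two small frames below are stand-ins built from category names shown above, not the real workbook, and the names `budget`, `actual`, `check` and `unmatched` are illustrative.

```python
import pandas as pd

# Stand-in frames (assumption: a few category names copied from the sheets above)
budget = pd.DataFrame({"Category": ["Rent", "Transaction Fees", "Insurance"]})
actual = pd.DataFrame({"Category": ["Rent", "TransactionFees", "Insurance"]})

# An outer join with indicator=True adds a `_merge` column that records
# whether each key was found in the left frame, the right frame, or both
check = budget.merge(actual, on="Category", how="outer", indicator=True)

# Rows not marked "both" are categories present in only one sheet
unmatched = check.loc[check["_merge"] != "both", ["Category", "_merge"]]
print(unmatched)
```

Any category listed in `unmatched` would be silently dropped by an inner join, so it is worth checking this before merging.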


In [104]:
# Drop rows with any missing value (this removes the blank spacer rows)
budget_df = budget_df.dropna()
budget_df

Unnamed: 0,Category,Budget,Notes
0,Rent,1000000,Same as last year
1,Utilities,120000,Appx
2,Security,90000,Already paid
4,Wages,270K,ADJUSTED
6,Inventory,2300000,More less
7,Transportation,180000,More less
8,Transaction Fees,54000,"That was last year, probably higher this year"
9,Advertising,300000,Ask Mark about exact sum
11,Insurance,70K,Exactly 70K


In [105]:
# Work on an explicit copy so the assignments below do not raise SettingWithCopyWarning
budget_df = budget_df.copy()

# Replace 'K' with '000' in the 'Budget' column (e.g. '270K' becomes '270000')
budget_df['Budget'] = budget_df['Budget'].replace({'K': '000'}, regex=True)

# Convert the 'Budget' column to float
budget_df['Budget'] = budget_df['Budget'].astype(float)

# Display the updated dataframe
budget_df

Unnamed: 0,Category,Budget,Notes
0,Rent,1000000.0,Same as last year
1,Utilities,120000.0,Appx
2,Security,90000.0,Already paid
4,Wages,270000.0,ADJUSTED
6,Inventory,2300000.0,More less
7,Transportation,180000.0,More less
8,Transaction Fees,54000.0,"That was last year, probably higher this year"
9,Advertising,300000.0,Ask Mark about exact sum
11,Insurance,70000.0,Exactly 70K
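
Replacing `K` with `000` works for the whole-number values in this sheet, but it would misread a value such as `1.5K` as `1.5000`. A slightly more defensive sketch, assuming `K` always means thousands, multiplies instead of substituting text. The helper `parse_amount` and the sample values are hypothetical and only illustrate the idea.

```python
import pandas as pd

def parse_amount(value):
    """Convert values such as 270000, '270K' or '1.5K' to a float."""
    text = str(value).strip().upper()
    if text.endswith("K"):
        # Assumption: a trailing K means thousands
        return float(text[:-1]) * 1_000
    return float(text)

# Sample values (assumption: a mix of plain numbers and K-suffixed strings)
budget = pd.Series([1000000, "270K", "70K", "1.5K"])
parsed = budget.map(parse_amount)
print(parsed.tolist())  # [1000000.0, 270000.0, 70000.0, 1500.0]
```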


In [106]:
# Merge the dataframes on the 'Category' column
merged_df = pd.merge(budget_df, actual_df, on='Category')

# Display the merged dataframe
merged_df

Unnamed: 0,Category,Budget,Notes,Actual
0,Rent,1000000.0,Same as last year,1000000.0
1,Utilities,120000.0,Appx,111063.02
2,Security,90000.0,Already paid,90000.0
3,Wages,270000.0,ADJUSTED,270000.0
4,Inventory,2300000.0,More less,2654003.11
5,Transportation,180000.0,More less,214814.87
6,Transaction Fees,54000.0,"That was last year, probably higher this year",59395.92
7,Advertising,300000.0,Ask Mark about exact sum,306899.96
8,Insurance,70000.0,Exactly 70K,70000.0
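
`pd.merge` defaults to an inner join, so a category missing from either sheet would disappear without any error. As a hedged sketch, the merge can be guarded with `validate="one_to_one"`, which raises if a category is duplicated on either side, plus a row-count check. The two-row frames below are stand-ins using values from the tables above.

```python
import pandas as pd

# Stand-in frames (assumption: two categories copied from the tables above)
budget = pd.DataFrame({"Category": ["Rent", "Wages"], "Budget": [1000000.0, 270000.0]})
actual = pd.DataFrame({"Category": ["Rent", "Wages"], "Actual": [1000000.0, 270000.0]})

# validate="one_to_one" raises pandas.errors.MergeError on duplicate keys
merged = pd.merge(budget, actual, on="Category", how="inner", validate="one_to_one")

# If every category matched, no rows were lost from either side
assert len(merged) == len(budget) == len(actual)
print(merged)
```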


In [107]:
# Keep only the relevant columns and rename them
merged_df = merged_df[['Category', 'Budget', 'Actual']]
merged_df.columns = ['Category', 'Forecasted Spending', 'Actual Spending']

# Display the updated dataframe
merged_df

Unnamed: 0,Category,Forecasted Spending,Actual Spending
0,Rent,1000000.0,1000000.0
1,Utilities,120000.0,111063.02
2,Security,90000.0,90000.0
3,Wages,270000.0,270000.0
4,Inventory,2300000.0,2654003.11
5,Transportation,180000.0,214814.87
6,Transaction Fees,54000.0,59395.92
7,Advertising,300000.0,306899.96
8,Insurance,70000.0,70000.0


In [108]:
# Calculate the difference and round to whole numbers
merged_df['Difference'] = (merged_df['Actual Spending'] - merged_df['Forecasted Spending']).round()

# Display the updated dataframe
merged_df

Unnamed: 0,Category,Forecasted Spending,Actual Spending,Difference
0,Rent,1000000.0,1000000.0,0.0
1,Utilities,120000.0,111063.02,-8937.0
2,Security,90000.0,90000.0,0.0
3,Wages,270000.0,270000.0,0.0
4,Inventory,2300000.0,2654003.11,354003.0
5,Transportation,180000.0,214814.87,34815.0
6,Transaction Fees,54000.0,59395.92,5396.0
7,Advertising,300000.0,306899.96,6900.0
8,Insurance,70000.0,70000.0,0.0
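
The requirements end with "Output the data", which the cells above do not do. One possible way to finish, sketched on two rows taken from the result above, is to cast the rounded difference to an integer and export with `to_csv`. The file name mentioned in the comment is a placeholder, not something specified by the challenge.

```python
import pandas as pd

# Two rows copied from the result above, used as stand-in data
result = pd.DataFrame({
    "Category": ["Utilities", "Inventory"],
    "Forecasted Spending": [120000.0, 2300000.0],
    "Actual Spending": [111063.02, 2654003.11],
})

# Round to whole numbers, then store as integers so no trailing ".0" is shown
result["Difference"] = (
    (result["Actual Spending"] - result["Forecasted Spending"]).round().astype(int)
)

# With no path, to_csv returns the CSV text; passing a path such as
# "budget_vs_actuals.csv" (hypothetical name) would write a file instead
csv_text = result.to_csv(index=False)
print(csv_text)
```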
