# 2024: Week 19 - SuperBytes Sales and Profits

May 08, 2024

Challenge by: Saampave Sanmuhanathan

We're continuing with DS43's challenges, so over to Saampave to explain her first challenge.

_____________________________________

This week we are looking at the sales and profit data of SuperBytes between 2018 and 2022. The finance department would like to compare the store's yearly sales and profit.

### Inputs

The input data consists of 5 sheets, one for each year from 2018 to 2022. Each sheet contains the sales and profit for each quarter of that year.

![1](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbP5lk4rDtTUNFn8qtSk96ZswUYfk4QtgfRiPQcWPV1RCXT931g9Hkzz29QaFbvz5L33rsLogSKpa8hyphenhyphenZDr8S5OugSFnf5tqeplzBHfV2BHIbbE64V7vG_pg-S2_M6VqaAt69hRK2oOrru5VaRHN6QRqt4waMMnt4CYruHMfrVDK52-R2uZbpGJFsZ0IGQ/s333/Sales%20and%20Profits%20Input%20Image.png)

2018 table

There are 5 sheets in total, each named after the year in which the data was collected.

### Requirements
- Input the data
- Union tables together
- Remove the units (K and M) from the Sales field so it can be converted to a whole number
- Repeat this process for the Profits field
- Aggregate sales and profits by year
- Output the data
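
The unit-stripping requirement can be sketched as a single parser. This is a minimal sketch, assuming every value is a string of digits with optional thousands separators, an optional decimal part and an optional `K` or `M` suffix, as in the 2018 sheet; `parse_amount` is a hypothetical helper name, not part of the challenge.

```python
import re

# Assumption: the only unit suffixes are K (thousands) and M (millions)
UNIT_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000}
# Digits with optional commas and decimal part, then an optional K or M
AMOUNT_PATTERN = re.compile(r"^([\d,]+(?:\.\d+)?)([KM]?)$")


def parse_amount(text: str) -> int:
    """Convert a string such as '4,300.432K' or '1.1M' to a whole number."""
    match = AMOUNT_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognised amount: {text!r}")
    number, unit = match.groups()
    # Drop thousands separators, then scale by the unit multiplier
    value = float(number.replace(",", "")) * UNIT_MULTIPLIERS[unit]
    # Round rather than truncate, so float noise cannot lose a unit
    return round(value)


for raw in ["4,300.432K", "1.1M", "4.61M", "2M"]:
    print(raw, "->", parse_amount(raw))
```

Rounding at the end matters because decimal amounts such as `1.1` are not exact in binary floating point, so the scaled value may sit a hair above or below the whole number.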

### Output

![2](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHSxX4LKy-srBVmcvhvOGLS8CY2BYCELGKusx9Zq8JCNKWKdWfDK0ZxF7567eF2JdYkQJ4jRRcyswYvYlchEmdPqAfBLqwkES9oeDOMQ9H-29Li6Gp76FQAVCeWyLF-CHt-h3KlmAB-l3NTjurXBJxcEcb-lfECUkdKRRRwyEIexxXTApbJDSeTR97HTmO/s250/Sales%20and%20Profit%20Output%20image.png)

- 3 fields
    - Year
    - Sales
    - Profits
- 5 rows
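
The output specification can be turned into a quick self-check. This is a sketch only: `check_output` is a hypothetical helper, it assumes the fields appear in the order listed above, and the example totals are copied from the aggregated output at the end of this post.

```python
import pandas as pd


def check_output(df: pd.DataFrame) -> None:
    """Raise AssertionError if df does not match the expected output shape."""
    # Exactly three fields, in the order listed in the spec (assumed order)
    assert list(df.columns) == ["Year", "Sales", "Profits"], list(df.columns)
    # One row per year, 2018 to 2022
    assert len(df) == 5, len(df)
    # Each year should appear exactly once
    assert df["Year"].is_unique


# Totals copied from the aggregated output at the end of this post
example = pd.DataFrame({
    "Year": ["2018", "2019", "2020", "2021", "2022"],
    "Sales": [18261030, 18505267, 16111418, 17008779, 18099644],
    "Profits": [5470497, 5400667, 5410007, 6404258, 5766245],
})
check_output(example)
print("Output shape OK:", example.shape)
```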

```python
import pandas as pd

# Read the Excel file
file_path = 'SuperBytes Sales_ Profits.xlsx'
xls = pd.ExcelFile(file_path)

# List all sheet names
sheet_names = xls.sheet_names
print(sheet_names)
```

```
['2018', '2019', '2020', '2021', '2022']
```


```python
# Initialize an empty list to hold dataframes
dfs = []

# Loop through each sheet and read it into a dataframe
for sheet in sheet_names:
    df = pd.read_excel(xls, sheet_name=sheet)
    df['Year'] = sheet  # Add a column for the year (the sheet name)
    dfs.append(df)

# Concatenate (union) all dataframes
combined_df = pd.concat(dfs, ignore_index=True)
combined_df
```

| | Unnamed: 0 | Sales | Profits | Year |
|---|---|---|---|---|
| 0 | Q1 | 4,300.432K | 1.1M | 2018 |
| 1 | Q2 | 4,250.122K | 1,000.152K | 2018 |
| 2 | Q3 | 4.61M | 1,300.345K | 2018 |
| 3 | Q4 | 5,100.476K | 2.07M | 2018 |
| 4 | Q1 | 4,211.934K | 0.9M | 2019 |
| 5 | Q2 | 4.432M | 1,200.566K | 2019 |
| 6 | Q3 | 4,661.333K | 1,300.101K | 2019 |
| 7 | Q4 | 5.2M | 2M | 2019 |
| 8 | Q1 | 4,253.111K | 1.1M | 2020 |
| 9 | Q2 | 5,000.555K | 2.2M | 2020 |
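
The read-and-union loop can also be written without an explicit loop: `pd.read_excel` with `sheet_name=None` returns a dict mapping sheet names to DataFrames, and `pd.concat` turns the dict keys into an extra index level. The sketch below substitutes a small in-memory dict for the workbook (two quarters copied from the 2018 and 2019 sheets above) so it runs without the file; `union_sheets` and `sheets` are hypothetical names, and the stand-in uses `Quarter` for the first column, which the real sheets expose as `Unnamed: 0`.

```python
import pandas as pd


def union_sheets(sheets: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Union per-year sheets into one table with a Year column."""
    # The dict keys (sheet names) become an outer index level named "Year"
    combined = pd.concat(sheets, names=["Year"])
    # Move "Year" out of the index into a column, then renumber the rows
    return combined.reset_index(level="Year").reset_index(drop=True)


# With the real workbook the dict would come from:
#   sheets = pd.read_excel("SuperBytes Sales_ Profits.xlsx", sheet_name=None)
sheets = {
    "2018": pd.DataFrame({"Quarter": ["Q1", "Q2"],
                          "Sales": ["4,300.432K", "4,250.122K"],
                          "Profits": ["1.1M", "1,000.152K"]}),
    "2019": pd.DataFrame({"Quarter": ["Q1", "Q2"],
                          "Sales": ["4,211.934K", "4.432M"],
                          "Profits": ["0.9M", "1,200.566K"]}),
}
print(union_sheets(sheets))
```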


```python
# The unnamed first column holds the quarter labels, so give it a name
combined_df.rename(columns={'Unnamed: 0': 'Quarter'}, inplace=True)
combined_df
```

| | Quarter | Sales | Profits | Year |
|---|---|---|---|---|
| 0 | Q1 | 4,300.432K | 1.1M | 2018 |
| 1 | Q2 | 4,250.122K | 1,000.152K | 2018 |
| 2 | Q3 | 4.61M | 1,300.345K | 2018 |
| 3 | Q4 | 5,100.476K | 2.07M | 2018 |
| 4 | Q1 | 4,211.934K | 0.9M | 2019 |
| 5 | Q2 | 4.432M | 1,200.566K | 2019 |
| 6 | Q3 | 4,661.333K | 1,300.101K | 2019 |
| 7 | Q4 | 5.2M | 2M | 2019 |
| 8 | Q1 | 4,253.111K | 1.1M | 2020 |
| 9 | Q2 | 5,000.555K | 2.2M | 2020 |


```python
# Function to determine the multiplier implied by a value's unit suffix
def get_multiplier(value):
    if 'K' in value:
        return 1000       # thousands
    elif 'M' in value:
        return 1000000    # millions
    return 1              # no suffix

# Apply the function to create a multiplier column for each field
combined_df['multify_sale'] = combined_df['Sales'].apply(get_multiplier)
combined_df['multify_profits'] = combined_df['Profits'].apply(get_multiplier)

combined_df
```

| | Quarter | Sales | Profits | Year | multify_sale | multify_profits |
|---|---|---|---|---|---|---|
| 0 | Q1 | 4,300.432K | 1.1M | 2018 | 1000 | 1000000 |
| 1 | Q2 | 4,250.122K | 1,000.152K | 2018 | 1000 | 1000 |
| 2 | Q3 | 4.61M | 1,300.345K | 2018 | 1000000 | 1000 |
| 3 | Q4 | 5,100.476K | 2.07M | 2018 | 1000 | 1000000 |
| 4 | Q1 | 4,211.934K | 0.9M | 2019 | 1000 | 1000000 |
| 5 | Q2 | 4.432M | 1,200.566K | 2019 | 1000000 | 1000 |
| 6 | Q3 | 4,661.333K | 1,300.101K | 2019 | 1000 | 1000 |
| 7 | Q4 | 5.2M | 2M | 2019 | 1000000 | 1000000 |
| 8 | Q1 | 4,253.111K | 1.1M | 2020 | 1000 | 1000000 |
| 9 | Q2 | 5,000.555K | 2.2M | 2020 | 1000 | 1000000 |
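
The element-wise `get_multiplier` approach works; a vectorised alternative splits each value into its numeric part and unit suffix with `Series.str.extract`, then maps the suffix to a multiplier, so no helper columns are needed. A minimal sketch, assuming the same `K`/`M` suffixes; `to_number` and `MULTIPLIERS` are hypothetical names, and the sample values are the four 2018 sales figures above.

```python
import pandas as pd

# Assumption: values carry no suffix, a K suffix, or an M suffix
MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000}


def to_number(values: pd.Series) -> pd.Series:
    """Vectorised conversion of strings like '4,300.432K' to floats."""
    # Group 0: numeric part; group 1: unit suffix (empty string if absent)
    parts = values.str.extract(r"^([\d,.]+)([KM]?)$")
    numbers = parts[0].str.replace(",", "", regex=False).astype(float)
    multipliers = parts[1].map(MULTIPLIERS)
    return numbers * multipliers


sales = pd.Series(["4,300.432K", "4,250.122K", "4.61M", "5,100.476K"])
print(to_number(sales).round())
```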


```python
# Remove commas, 'K' and 'M' and convert to numeric
combined_df['Sales'] = combined_df.apply(
    lambda x: float(x['Sales'].replace(',', '').replace('K', '').replace('M', '')) * x['multify_sale'],
    axis=1,
)
combined_df['Profits'] = combined_df.apply(
    lambda x: float(x['Profits'].replace(',', '').replace('K', '').replace('M', '')) * x['multify_profits'],
    axis=1,
)

# Drop the multiplier columns as they are no longer needed
combined_df.drop(columns=['multify_sale', 'multify_profits'], inplace=True)

combined_df
```

| | Quarter | Sales | Profits | Year |
|---|---|---|---|---|
| 0 | Q1 | 4300432.0 | 1100000.0 | 2018 |
| 1 | Q2 | 4250122.0 | 1000152.0 | 2018 |
| 2 | Q3 | 4610000.0 | 1300345.0 | 2018 |
| 3 | Q4 | 5100476.0 | 2070000.0 | 2018 |
| 4 | Q1 | 4211934.0 | 900000.0 | 2019 |
| 5 | Q2 | 4432000.0 | 1200566.0 | 2019 |
| 6 | Q3 | 4661333.0 | 1300101.0 | 2019 |
| 7 | Q4 | 5200000.0 | 2000000.0 | 2019 |
| 8 | Q1 | 4253111.0 | 1100000.0 | 2020 |
| 9 | Q2 | 5000555.0 | 2200000.0 | 2020 |
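
The requirements ask for whole numbers, but the conversion above leaves `Sales` and `Profits` as floats (`4300432.0`). One way to finish the step, sketched here on the four 2018 rows copied from the table above, is to round and then cast to a 64-bit integer. Rounding first matters: a product such as `4.61 * 1000000` may not land exactly on a whole number, and a plain `astype("int64")` truncates toward zero instead of rounding.

```python
import pandas as pd

# The four 2018 rows copied from the converted table above
df = pd.DataFrame({
    "Quarter": ["Q1", "Q2", "Q3", "Q4"],
    "Sales": [4300432.0, 4250122.0, 4610000.0, 5100476.0],
    "Profits": [1100000.0, 1000152.0, 1300345.0, 2070000.0],
    "Year": ["2018"] * 4,
})

# Round to the nearest unit, then cast; casting alone would truncate
for col in ["Sales", "Profits"]:
    df[col] = df[col].round().astype("int64")

print(df.dtypes)
print(df)
```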


```python
# Aggregate sales and profits by year
aggregated_df = combined_df.groupby('Year').agg({'Sales': 'sum', 'Profits': 'sum'}).reset_index()
output = aggregated_df
output
```

| | Year | Sales | Profits |
|---|---|---|---|
| 0 | 2018 | 18261030.0 | 5470497.0 |
| 1 | 2019 | 18505267.0 | 5400667.0 |
| 2 | 2020 | 16111418.0 | 5410007.0 |
| 3 | 2021 | 17008779.0 | 6404258.0 |
| 4 | 2022 | 18099644.0 | 5766245.0 |
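
As a sanity check on the aggregation, the 2018 totals can be recomputed by hand from the four 2018 quarters: 4,300,432 + 4,250,122 + 4,610,000 + 5,100,476 = 18,261,030 for Sales, and 1,100,000 + 1,000,152 + 1,300,345 + 2,070,000 = 5,470,497 for Profits, matching the first output row. The same check in pandas is sketched below on the eight 2018 and 2019 rows from the converted table; it uses the `groupby(...)[columns].sum()` form, which for this data should give the same totals as the `agg` dictionary used above.

```python
import pandas as pd

# The 2018 and 2019 rows from the converted table above
rows = pd.DataFrame({
    "Year": ["2018"] * 4 + ["2019"] * 4,
    "Sales": [4300432, 4250122, 4610000, 5100476,
              4211934, 4432000, 4661333, 5200000],
    "Profits": [1100000, 1000152, 1300345, 2070000,
                900000, 1200566, 1300101, 2000000],
})

# Select the value columns explicitly, then sum within each year
totals = rows.groupby("Year", as_index=False)[["Sales", "Profits"]].sum()
print(totals)
```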
