# 2024: Week 10 - Preppin' for Pulse

March 06, 2024

Challenge by: Jenny Martin

Tableau Product Manager, Libby Knell has challenged us to create a Preppin' Data challenge that gets data in shape for working with the newly released Tableau Pulse. Pulse empowers every employee to make better, faster decisions by tracking a metric’s current value, compared to the past — so what does this mean for the shape of our data? Currently, Pulse works best with:

- Data that is up to date and recent - the last 1 or 2 years and changes regularly - daily or weekly preferable
- Data that is complete, without gaps
- Data that is as granular as possible
- The names of fields are human readable - let’s not use acronyms that lack context!
- String values in the data are consistently named

Chin & Beard Suds Co are excited to get started using Tableau Pulse so their employees can make smarter decisions for their flagship store. Their store manager is always on the go and works off their phone daily. With the insights Pulse provides, they’ll be better able to keep track of which products are selling well and might need to be restocked sooner than anticipated. Even better, digests on Email and Slack, as well as Tableau Pulse on Tableau Mobile, meets them where they are so they can stay on top of their daily changing data!

### Inputs

1. Transaction Data 

![1](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWZcwoM_0o6KlJkIwhyQhOWmiTjwBbZFt5gv9auf-1bTxnycz44lyjqQYb9vxI-cHlyXd6nBqCNY4j3dS7IFRIaXY_JycFwwfdFECG1s8LnOCt-6VsVqW5oV1Et6F7SNHnsf_2ln8sklULngxZtZVGdMnUGDye3m11heNDMkVqSVQlg8uIeusNG-nGD64C/s1047/Screenshot%202024-02-28%20120535.png)

2. Product Table 

![2](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0nQ0aERFOnpB_u_QKzNDgcjl9pv0F7K6c_AjyawfKo4qzVe6bOgh4dB_By5w41tVrH0p8mWJleGSO5w-0zJSqdZv4ppLzaRZXILaaEd3MWTNHttmJvN7U2EW-1WR9g_wRBW8WlkcXByPIkEpriD0fR_1MWkYwyA7w8fG1NUIpa8UBzy1G48viWako7kEF/s1042/Screenshot%202024-02-28%20120616.png)

3. Loyalty Table 

![3](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyZzVWEKdubh4LHQRmkoOnQS9mso515aYfuOGgO-PGngm_Z9DUb8tafEAveHGaSi5uCxDiuBsWSpDTd5zzT_ZgNy4TegVcDB2pIN0vZKlQsRvFE4I94Boh3YpkkuxyZCD8kTz9ECXuwKqeNa_GZ48kh1AvCaZ6Ec57aCvtMjIlOzzaH8HH7EmA-MpQDWke/s1016/24W10%20Loyalty.png)

### Requirements

- Input the data
- Filter to the last 2 years of data i.e. only 2023 and 2024 transactions
- This will allow for year on year comparison
- Create additional rows of data for the days the store was closed, ensuring all other fields will have null values
- The store is closed on Sundays and Public Holidays
- Update the Cash_or_Card field so that:
- 1=Card
- 2=Cash
- Join the Product Table
- You'll need to prepare the join clause fields first
- Calculate the Quantity of each transaction 
- Defined as the Sales_Before_Discount / Selling_Price
- In the Loyalty Table:
- The Customer_Name is currently reading Last Name, First Name. Update it to read First Name Last Name in Title case
- e.g. knell, libby becomes Libby Knell
- Group together the Loyalty_Tiers into Gold, Silver and Bronze
- Update the Loyalty_Discount to be a numeric field
- Join the Loyalty Table
- Create a Sales_After_Discount field to apply the Loyalty_Discount for transactions with a Loyalty_Number
- Calculate the Profit, defined as:
- Sales_After_Discount - (Unit_Cost * Quantity)
- Update the field names to remove all underscores and replace them with spaces
- Remove any unnecessary fields
- Output the data
- If you're working in a tool that allows you to create a Published Data Source, that would be best, as we're preparing the data for Pulse!

### Output

![4](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_VI39Xs1bHMGkPODGH1xQV6vwasEb9g7k-J-CUg53tTBA2IFcJFqCej3K8vPR0xUjsQSr9AyAeAYMp0PgLdp79wQuL2j8kyJ-RnF4D2PLvHvFdkip0W0y3wKw5L_Chf4zMw4T8QKgZyj6sv35Pr28lyb0N6UuqJSlO9KouueNgvkomZEsdLAumvE6k1Ls/s1274/Screenshot%202024-02-28%20120907.png)

- 14 fields
- Transaction Date
- Transaction Number
- Product Type
- Product Scent
- Product Size
- Cash or Card
- Loyalty Number
- Customer Name
- Loyalty Tier
- Loyalty Discount
- Quantity
- Sales Before Discount
- Sales After Discount
- Profit
- 39,337 rows (39,338 including headers)

In [1]:
import pandas as pd

# Read the Excel file
excel_file = '2024W10 Input.xlsx'

# List all sheet names
sheet_names = pd.ExcelFile(excel_file).sheet_names
print(sheet_names)

['Product Table', 'Transaction Data', 'Loyalty Table']


In [2]:
product_df = pd.read_excel(excel_file, sheet_name=sheet_names[0])


product_df

Unnamed: 0,Product_Type,Product_Scent,Pack_Size,Product_Size,Unit_Cost,Selling_Price
0,Bar,Lavender Fields,1x,,1.25,1.77
1,Bar,Citrus Breeze,1x,,0.75,0.81
2,Bar,Ocean Mist,1x,,0.66,1.2
3,Bar,Fresh Rain,1x,,0.94,1.61
4,Bar,Rose Garden,1x,,1.55,2.45
5,Bar,Eucalyptus Mint,1x,,1.89,3.49
6,Bar,Sandalwood Spice,1x,,2.04,3.75
7,Bar,Vanilla Bean,1x,,2.5,3.64
8,Bar,Coconut Dream,1x,,1.44,2.74
9,Bar,Cedarwood Forest,1x,,1.11,2.18


In [3]:
trans_data_df = pd.read_excel(excel_file, sheet_name=sheet_names[1])
trans_data_df

Unnamed: 0,Transaction_Date,Transanction_Number,Product_ID,Cash_or_Card,Loyalty_Number,Sales_Before_Discount
0,"Sat, January 02, 2021",20121001,Bar-Ocean_Mist-1x,1,1004721.0,6.00
1,"Sat, January 02, 2021",20121001,Liquid-Rose_Garden-0.5L,1,1004721.0,14.10
2,"Sat, January 02, 2021",20121002,Bar-Citrus_Breeze-4x,2,1009280.0,8.12
3,"Sat, January 02, 2021",20121002,Liquid-Coconut_Dream-0.5L,2,1009280.0,12.36
4,"Sat, January 02, 2021",20121003,Bar-Ocean_Mist-4x,1,1009022.0,13.95
...,...,...,...,...,...,...
105490,"Wed, March 06, 2024",60324062,Bar-Sandalwood_Spice-4x,2,1001139.0,31.96
105491,"Wed, March 06, 2024",60324062,Liquid-Cedarwood_Forest-0.25L,2,1001139.0,7.80
105492,"Wed, March 06, 2024",60324063,Liquid-Citrus_Breeze-0.5L,1,1007693.0,2.48
105493,"Wed, March 06, 2024",60324063,Liquid-Sandalwood_Spice-0.5L,1,1007693.0,47.80


In [4]:
loyalty_df = pd.read_excel(excel_file, sheet_name=sheet_names[2])
loyalty_df

Unnamed: 0,Loyalty_Number,Customer_Name,Loyalty_Tier,Loyalty_Discount
0,1000012,"trimmill, leeanne",Bronz,
1,1000026,"kobierski, teador",,
2,1000028,"plues, jenelle",Bronz,
3,1000032,"firmager, gabriell",Bronz,
4,1000038,"chiles, nicolea",,
...,...,...,...,...
9784,1009987,"rattenberie, thacher",Sliver,10%
9785,1009990,"mariaud, rosanna",Sliver,10%
9786,1009991,"d'ambrogio, edgar",Silver,10%
9787,1009995,"scrivener, mark",Silver,10%


In [5]:
# Convert Transaction_Date to datetime format
trans_data_df['Transaction_Date'] = pd.to_datetime(trans_data_df['Transaction_Date'])

# Filter to the last 2 years of data (2023 and 2024)
filtered_trans_data_df = trans_data_df[(trans_data_df['Transaction_Date'].dt.year >= 2023) & (trans_data_df['Transaction_Date'].dt.year <= 2024)]

filtered_trans_data_df

Unnamed: 0,Transaction_Date,Transanction_Number,Product_ID,Cash_or_Card,Loyalty_Number,Sales_Before_Discount
66228,2023-01-03,30123001,Liquid-Sandalwood_Spice-0.25L,2,1005245.0,8.50
66229,2023-01-03,30123001,Liquid-Vanilla_Bean-0.5L,2,1005245.0,14.70
66230,2023-01-03,30123002,Liquid-Sandalwood_Spice-1L,1,1007270.0,13.19
66231,2023-01-03,30123003,Liquid-Eucalyptus_Mint-0.25L,2,1009750.0,9.00
66232,2023-01-03,30123003,Liquid-Eucalyptus_Mint-1L,2,1009750.0,39.81
...,...,...,...,...,...,...
105490,2024-03-06,60324062,Bar-Sandalwood_Spice-4x,2,1001139.0,31.96
105491,2024-03-06,60324062,Liquid-Cedarwood_Forest-0.25L,2,1001139.0,7.80
105492,2024-03-06,60324063,Liquid-Citrus_Breeze-0.5L,1,1007693.0,2.48
105493,2024-03-06,60324063,Liquid-Sandalwood_Spice-0.5L,1,1007693.0,47.80


In [6]:
# Define the complete date range from the minimum to the maximum date in the filtered transaction data
complete_date_range = pd.date_range(start=filtered_trans_data_df['Transaction_Date'].min(), 
									end=filtered_trans_data_df['Transaction_Date'].max(), 
									freq='D')

# Create a DataFrame with all dates in the range
all_dates_df = pd.DataFrame({'Transaction_Date': complete_date_range})

# Merge with the filtered transaction data to find missing dates
extended_trans_data_df = all_dates_df.merge(filtered_trans_data_df, on='Transaction_Date', how='left')

extended_trans_data_df

Unnamed: 0,Transaction_Date,Transanction_Number,Product_ID,Cash_or_Card,Loyalty_Number,Sales_Before_Discount
0,2023-01-03,30123001.0,Liquid-Sandalwood_Spice-0.25L,2.0,1005245.0,8.50
1,2023-01-03,30123001.0,Liquid-Vanilla_Bean-0.5L,2.0,1005245.0,14.70
2,2023-01-03,30123002.0,Liquid-Sandalwood_Spice-1L,1.0,1007270.0,13.19
3,2023-01-03,30123003.0,Liquid-Eucalyptus_Mint-0.25L,2.0,1009750.0,9.00
4,2023-01-03,30123003.0,Liquid-Eucalyptus_Mint-1L,2.0,1009750.0,39.81
...,...,...,...,...,...,...
39332,2024-03-06,60324062.0,Bar-Sandalwood_Spice-4x,2.0,1001139.0,31.96
39333,2024-03-06,60324062.0,Liquid-Cedarwood_Forest-0.25L,2.0,1001139.0,7.80
39334,2024-03-06,60324063.0,Liquid-Citrus_Breeze-0.5L,1.0,1007693.0,2.48
39335,2024-03-06,60324063.0,Liquid-Sandalwood_Spice-0.5L,1.0,1007693.0,47.80


In [7]:
# Replace values in the 'Cash_or_Card' column
filtered_trans_data_df['Cash_or_Card'] = filtered_trans_data_df['Cash_or_Card'].replace({1: 'Card', 2: 'Cash'})
extended_trans_data_df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  filtered_trans_data_df['Cash_or_Card'] = filtered_trans_data_df['Cash_or_Card'].replace({1: 'Card', 2: 'Cash'})


Unnamed: 0,Transaction_Date,Transanction_Number,Product_ID,Cash_or_Card,Loyalty_Number,Sales_Before_Discount
0,2023-01-03,30123001.0,Liquid-Sandalwood_Spice-0.25L,2.0,1005245.0,8.50
1,2023-01-03,30123001.0,Liquid-Vanilla_Bean-0.5L,2.0,1005245.0,14.70
2,2023-01-03,30123002.0,Liquid-Sandalwood_Spice-1L,1.0,1007270.0,13.19
3,2023-01-03,30123003.0,Liquid-Eucalyptus_Mint-0.25L,2.0,1009750.0,9.00
4,2023-01-03,30123003.0,Liquid-Eucalyptus_Mint-1L,2.0,1009750.0,39.81
...,...,...,...,...,...,...
39332,2024-03-06,60324062.0,Bar-Sandalwood_Spice-4x,2.0,1001139.0,31.96
39333,2024-03-06,60324062.0,Liquid-Cedarwood_Forest-0.25L,2.0,1001139.0,7.80
39334,2024-03-06,60324063.0,Liquid-Citrus_Breeze-0.5L,1.0,1007693.0,2.48
39335,2024-03-06,60324063.0,Liquid-Sandalwood_Spice-0.5L,1.0,1007693.0,47.80


In [8]:
# Merge Pack_Size and Product_Size into a new column called 'Size'
product_df['Size'] = product_df['Pack_Size'].fillna('') + product_df['Product_Size'].fillna('')

# Drop the original Pack_Size and Product_Size columns
product_df = product_df.drop(columns=['Pack_Size', 'Product_Size'])

product_df

Unnamed: 0,Product_Type,Product_Scent,Unit_Cost,Selling_Price,Size
0,Bar,Lavender Fields,1.25,1.77,1x
1,Bar,Citrus Breeze,0.75,0.81,1x
2,Bar,Ocean Mist,0.66,1.2,1x
3,Bar,Fresh Rain,0.94,1.61,1x
4,Bar,Rose Garden,1.55,2.45,1x
5,Bar,Eucalyptus Mint,1.89,3.49,1x
6,Bar,Sandalwood Spice,2.04,3.75,1x
7,Bar,Vanilla Bean,2.5,3.64,1x
8,Bar,Coconut Dream,1.44,2.74,1x
9,Bar,Cedarwood Forest,1.11,2.18,1x


In [9]:
# Replace spaces with underscores in the Product_Scent column
product_df['Product_Scent'] = product_df['Product_Scent'].str.replace(' ', '_')
product_df

Unnamed: 0,Product_Type,Product_Scent,Unit_Cost,Selling_Price,Size
0,Bar,Lavender_Fields,1.25,1.77,1x
1,Bar,Citrus_Breeze,0.75,0.81,1x
2,Bar,Ocean_Mist,0.66,1.2,1x
3,Bar,Fresh_Rain,0.94,1.61,1x
4,Bar,Rose_Garden,1.55,2.45,1x
5,Bar,Eucalyptus_Mint,1.89,3.49,1x
6,Bar,Sandalwood_Spice,2.04,3.75,1x
7,Bar,Vanilla_Bean,2.5,3.64,1x
8,Bar,Coconut_Dream,1.44,2.74,1x
9,Bar,Cedarwood_Forest,1.11,2.18,1x


In [10]:
# Create merge key
product_df['merge_key'] = product_df['Product_Type'] + '-' + product_df['Product_Scent'] + '-' + product_df['Size']
product_df

Unnamed: 0,Product_Type,Product_Scent,Unit_Cost,Selling_Price,Size,merge_key
0,Bar,Lavender_Fields,1.25,1.77,1x,Bar-Lavender_Fields-1x
1,Bar,Citrus_Breeze,0.75,0.81,1x,Bar-Citrus_Breeze-1x
2,Bar,Ocean_Mist,0.66,1.2,1x,Bar-Ocean_Mist-1x
3,Bar,Fresh_Rain,0.94,1.61,1x,Bar-Fresh_Rain-1x
4,Bar,Rose_Garden,1.55,2.45,1x,Bar-Rose_Garden-1x
5,Bar,Eucalyptus_Mint,1.89,3.49,1x,Bar-Eucalyptus_Mint-1x
6,Bar,Sandalwood_Spice,2.04,3.75,1x,Bar-Sandalwood_Spice-1x
7,Bar,Vanilla_Bean,2.5,3.64,1x,Bar-Vanilla_Bean-1x
8,Bar,Coconut_Dream,1.44,2.74,1x,Bar-Coconut_Dream-1x
9,Bar,Cedarwood_Forest,1.11,2.18,1x,Bar-Cedarwood_Forest-1x


In [11]:
# Perform a left join
merged_df = extended_trans_data_df.merge(product_df, how='left', left_on='Product_ID', right_on='merge_key')

merged_df

Unnamed: 0,Transaction_Date,Transanction_Number,Product_ID,Cash_or_Card,Loyalty_Number,Sales_Before_Discount,Product_Type,Product_Scent,Unit_Cost,Selling_Price,Size,merge_key
0,2023-01-03,30123001.0,Liquid-Sandalwood_Spice-0.25L,2.0,1005245.0,8.50,Liquid,Sandalwood_Spice,2.45,4.25,0.25L,Liquid-Sandalwood_Spice-0.25L
1,2023-01-03,30123001.0,Liquid-Vanilla_Bean-0.5L,2.0,1005245.0,14.70,Liquid,Vanilla_Bean,5.97,7.35,0.5L,Liquid-Vanilla_Bean-0.5L
2,2023-01-03,30123002.0,Liquid-Sandalwood_Spice-1L,1.0,1007270.0,13.19,Liquid,Sandalwood_Spice,9.67,13.19,1L,Liquid-Sandalwood_Spice-1L
3,2023-01-03,30123003.0,Liquid-Eucalyptus_Mint-0.25L,2.0,1009750.0,9.00,Liquid,Eucalyptus_Mint,2.27,4.50,0.25L,Liquid-Eucalyptus_Mint-0.25L
4,2023-01-03,30123003.0,Liquid-Eucalyptus_Mint-1L,2.0,1009750.0,39.81,Liquid,Eucalyptus_Mint,8.96,13.27,1L,Liquid-Eucalyptus_Mint-1L
...,...,...,...,...,...,...,...,...,...,...,...,...
39332,2024-03-06,60324062.0,Bar-Sandalwood_Spice-4x,2.0,1001139.0,31.96,Bar,Sandalwood_Spice,7.34,7.99,4x,Bar-Sandalwood_Spice-4x
39333,2024-03-06,60324062.0,Liquid-Cedarwood_Forest-0.25L,2.0,1001139.0,7.80,Liquid,Cedarwood_Forest,1.33,1.56,0.25L,Liquid-Cedarwood_Forest-0.25L
39334,2024-03-06,60324063.0,Liquid-Citrus_Breeze-0.5L,1.0,1007693.0,2.48,Liquid,Citrus_Breeze,1.79,2.48,0.5L,Liquid-Citrus_Breeze-0.5L
39335,2024-03-06,60324063.0,Liquid-Sandalwood_Spice-0.5L,1.0,1007693.0,47.80,Liquid,Sandalwood_Spice,4.87,9.56,0.5L,Liquid-Sandalwood_Spice-0.5L


In [12]:
# calculate quantity
merged_df['Quantity'] = merged_df['Sales_Before_Discount'] / merged_df['Selling_Price']
merged_df

Unnamed: 0,Transaction_Date,Transanction_Number,Product_ID,Cash_or_Card,Loyalty_Number,Sales_Before_Discount,Product_Type,Product_Scent,Unit_Cost,Selling_Price,Size,merge_key,Quantity
0,2023-01-03,30123001.0,Liquid-Sandalwood_Spice-0.25L,2.0,1005245.0,8.50,Liquid,Sandalwood_Spice,2.45,4.25,0.25L,Liquid-Sandalwood_Spice-0.25L,2.0
1,2023-01-03,30123001.0,Liquid-Vanilla_Bean-0.5L,2.0,1005245.0,14.70,Liquid,Vanilla_Bean,5.97,7.35,0.5L,Liquid-Vanilla_Bean-0.5L,2.0
2,2023-01-03,30123002.0,Liquid-Sandalwood_Spice-1L,1.0,1007270.0,13.19,Liquid,Sandalwood_Spice,9.67,13.19,1L,Liquid-Sandalwood_Spice-1L,1.0
3,2023-01-03,30123003.0,Liquid-Eucalyptus_Mint-0.25L,2.0,1009750.0,9.00,Liquid,Eucalyptus_Mint,2.27,4.50,0.25L,Liquid-Eucalyptus_Mint-0.25L,2.0
4,2023-01-03,30123003.0,Liquid-Eucalyptus_Mint-1L,2.0,1009750.0,39.81,Liquid,Eucalyptus_Mint,8.96,13.27,1L,Liquid-Eucalyptus_Mint-1L,3.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
39332,2024-03-06,60324062.0,Bar-Sandalwood_Spice-4x,2.0,1001139.0,31.96,Bar,Sandalwood_Spice,7.34,7.99,4x,Bar-Sandalwood_Spice-4x,4.0
39333,2024-03-06,60324062.0,Liquid-Cedarwood_Forest-0.25L,2.0,1001139.0,7.80,Liquid,Cedarwood_Forest,1.33,1.56,0.25L,Liquid-Cedarwood_Forest-0.25L,5.0
39334,2024-03-06,60324063.0,Liquid-Citrus_Breeze-0.5L,1.0,1007693.0,2.48,Liquid,Citrus_Breeze,1.79,2.48,0.5L,Liquid-Citrus_Breeze-0.5L,1.0
39335,2024-03-06,60324063.0,Liquid-Sandalwood_Spice-0.5L,1.0,1007693.0,47.80,Liquid,Sandalwood_Spice,4.87,9.56,0.5L,Liquid-Sandalwood_Spice-0.5L,5.0


In [13]:
# Split the Customer_Name into Last Name and First Name, then swap and convert to title case
loyalty_df['Customer_Name'] = loyalty_df['Customer_Name'].apply(lambda x: ' '.join(reversed(x.split(', '))).title())

loyalty_df

Unnamed: 0,Loyalty_Number,Customer_Name,Loyalty_Tier,Loyalty_Discount
0,1000012,Leeanne Trimmill,Bronz,
1,1000026,Teador Kobierski,,
2,1000028,Jenelle Plues,Bronz,
3,1000032,Gabriell Firmager,Bronz,
4,1000038,Nicolea Chiles,,
...,...,...,...,...
9784,1009987,Thacher Rattenberie,Sliver,10%
9785,1009990,Rosanna Mariaud,Sliver,10%
9786,1009991,Edgar D'Ambrogio,Silver,10%
9787,1009995,Mark Scrivener,Silver,10%


In [14]:
# list unique loyalty tiers
unique_loyalty_tiers = loyalty_df['Loyalty_Tier'].unique()
print(unique_loyalty_tiers)

['Bronz' nan 'Bronze' 'bronze' 'Goald' 'Gold' 'gold' 'Silver' 'Sliver'
 'silver']


In [15]:
# Define the mapping dictionary
loyalty_tier_mapping = {
    'Bronz': 'Bronze',
    'Bronze': 'Bronze',
    'bronze': 'Bronze',
    'Goald': 'Gold',
    'Gold': 'Gold',
    'gold': 'Gold',
    'Silver': 'Silver',
    'Sliver': 'Silver',
    'silver': 'Silver'
}

# Replace the values in the Loyalty_Tier column
loyalty_df['Loyalty_Tier'] = loyalty_df['Loyalty_Tier'].replace(loyalty_tier_mapping)

loyalty_df

Unnamed: 0,Loyalty_Number,Customer_Name,Loyalty_Tier,Loyalty_Discount
0,1000012,Leeanne Trimmill,Bronze,
1,1000026,Teador Kobierski,,
2,1000028,Jenelle Plues,Bronze,
3,1000032,Gabriell Firmager,Bronze,
4,1000038,Nicolea Chiles,,
...,...,...,...,...
9784,1009987,Thacher Rattenberie,Silver,10%
9785,1009990,Rosanna Mariaud,Silver,10%
9786,1009991,Edgar D'Ambrogio,Silver,10%
9787,1009995,Mark Scrivener,Silver,10%


In [16]:
# Convert Loyalty_Discount to numeric, removing the '%' sign first
loyalty_df['Loyalty_Discount'] = loyalty_df['Loyalty_Discount'].str.rstrip('%').astype('float') / 100.0

loyalty_df

Unnamed: 0,Loyalty_Number,Customer_Name,Loyalty_Tier,Loyalty_Discount
0,1000012,Leeanne Trimmill,Bronze,
1,1000026,Teador Kobierski,,
2,1000028,Jenelle Plues,Bronze,
3,1000032,Gabriell Firmager,Bronze,
4,1000038,Nicolea Chiles,,
...,...,...,...,...
9784,1009987,Thacher Rattenberie,Silver,0.1
9785,1009990,Rosanna Mariaud,Silver,0.1
9786,1009991,Edgar D'Ambrogio,Silver,0.1
9787,1009995,Mark Scrivener,Silver,0.1


In [17]:
# Perform a left join
final_df = merged_df.merge(loyalty_df, how='left', on='Loyalty_Number')

final_df

Unnamed: 0,Transaction_Date,Transanction_Number,Product_ID,Cash_or_Card,Loyalty_Number,Sales_Before_Discount,Product_Type,Product_Scent,Unit_Cost,Selling_Price,Size,merge_key,Quantity,Customer_Name,Loyalty_Tier,Loyalty_Discount
0,2023-01-03,30123001.0,Liquid-Sandalwood_Spice-0.25L,2.0,1005245.0,8.50,Liquid,Sandalwood_Spice,2.45,4.25,0.25L,Liquid-Sandalwood_Spice-0.25L,2.0,Laurens Squibbs,Bronze,0.05
1,2023-01-03,30123001.0,Liquid-Vanilla_Bean-0.5L,2.0,1005245.0,14.70,Liquid,Vanilla_Bean,5.97,7.35,0.5L,Liquid-Vanilla_Bean-0.5L,2.0,Laurens Squibbs,Bronze,0.05
2,2023-01-03,30123002.0,Liquid-Sandalwood_Spice-1L,1.0,1007270.0,13.19,Liquid,Sandalwood_Spice,9.67,13.19,1L,Liquid-Sandalwood_Spice-1L,1.0,Cary Breckon,Bronze,0.05
3,2023-01-03,30123003.0,Liquid-Eucalyptus_Mint-0.25L,2.0,1009750.0,9.00,Liquid,Eucalyptus_Mint,2.27,4.50,0.25L,Liquid-Eucalyptus_Mint-0.25L,2.0,Orv Drewitt,Bronze,0.05
4,2023-01-03,30123003.0,Liquid-Eucalyptus_Mint-1L,2.0,1009750.0,39.81,Liquid,Eucalyptus_Mint,8.96,13.27,1L,Liquid-Eucalyptus_Mint-1L,3.0,Orv Drewitt,Bronze,0.05
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
39332,2024-03-06,60324062.0,Bar-Sandalwood_Spice-4x,2.0,1001139.0,31.96,Bar,Sandalwood_Spice,7.34,7.99,4x,Bar-Sandalwood_Spice-4x,4.0,Jae Swindall,Bronze,0.05
39333,2024-03-06,60324062.0,Liquid-Cedarwood_Forest-0.25L,2.0,1001139.0,7.80,Liquid,Cedarwood_Forest,1.33,1.56,0.25L,Liquid-Cedarwood_Forest-0.25L,5.0,Jae Swindall,Bronze,0.05
39334,2024-03-06,60324063.0,Liquid-Citrus_Breeze-0.5L,1.0,1007693.0,2.48,Liquid,Citrus_Breeze,1.79,2.48,0.5L,Liquid-Citrus_Breeze-0.5L,1.0,Rheba Roark,Bronze,0.05
39335,2024-03-06,60324063.0,Liquid-Sandalwood_Spice-0.5L,1.0,1007693.0,47.80,Liquid,Sandalwood_Spice,4.87,9.56,0.5L,Liquid-Sandalwood_Spice-0.5L,5.0,Rheba Roark,Bronze,0.05


In [18]:
# Calculate Sales_After_Discount
final_df['Sales_After_Discount'] = final_df.apply(
    lambda row: row['Sales_Before_Discount'] * (1 - row['Loyalty_Discount']) if pd.notnull(row['Loyalty_Number']) else row['Sales_Before_Discount'],
    axis=1
)

final_df

Unnamed: 0,Transaction_Date,Transanction_Number,Product_ID,Cash_or_Card,Loyalty_Number,Sales_Before_Discount,Product_Type,Product_Scent,Unit_Cost,Selling_Price,Size,merge_key,Quantity,Customer_Name,Loyalty_Tier,Loyalty_Discount,Sales_After_Discount
0,2023-01-03,30123001.0,Liquid-Sandalwood_Spice-0.25L,2.0,1005245.0,8.50,Liquid,Sandalwood_Spice,2.45,4.25,0.25L,Liquid-Sandalwood_Spice-0.25L,2.0,Laurens Squibbs,Bronze,0.05,8.0750
1,2023-01-03,30123001.0,Liquid-Vanilla_Bean-0.5L,2.0,1005245.0,14.70,Liquid,Vanilla_Bean,5.97,7.35,0.5L,Liquid-Vanilla_Bean-0.5L,2.0,Laurens Squibbs,Bronze,0.05,13.9650
2,2023-01-03,30123002.0,Liquid-Sandalwood_Spice-1L,1.0,1007270.0,13.19,Liquid,Sandalwood_Spice,9.67,13.19,1L,Liquid-Sandalwood_Spice-1L,1.0,Cary Breckon,Bronze,0.05,12.5305
3,2023-01-03,30123003.0,Liquid-Eucalyptus_Mint-0.25L,2.0,1009750.0,9.00,Liquid,Eucalyptus_Mint,2.27,4.50,0.25L,Liquid-Eucalyptus_Mint-0.25L,2.0,Orv Drewitt,Bronze,0.05,8.5500
4,2023-01-03,30123003.0,Liquid-Eucalyptus_Mint-1L,2.0,1009750.0,39.81,Liquid,Eucalyptus_Mint,8.96,13.27,1L,Liquid-Eucalyptus_Mint-1L,3.0,Orv Drewitt,Bronze,0.05,37.8195
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
39332,2024-03-06,60324062.0,Bar-Sandalwood_Spice-4x,2.0,1001139.0,31.96,Bar,Sandalwood_Spice,7.34,7.99,4x,Bar-Sandalwood_Spice-4x,4.0,Jae Swindall,Bronze,0.05,30.3620
39333,2024-03-06,60324062.0,Liquid-Cedarwood_Forest-0.25L,2.0,1001139.0,7.80,Liquid,Cedarwood_Forest,1.33,1.56,0.25L,Liquid-Cedarwood_Forest-0.25L,5.0,Jae Swindall,Bronze,0.05,7.4100
39334,2024-03-06,60324063.0,Liquid-Citrus_Breeze-0.5L,1.0,1007693.0,2.48,Liquid,Citrus_Breeze,1.79,2.48,0.5L,Liquid-Citrus_Breeze-0.5L,1.0,Rheba Roark,Bronze,0.05,2.3560
39335,2024-03-06,60324063.0,Liquid-Sandalwood_Spice-0.5L,1.0,1007693.0,47.80,Liquid,Sandalwood_Spice,4.87,9.56,0.5L,Liquid-Sandalwood_Spice-0.5L,5.0,Rheba Roark,Bronze,0.05,45.4100


In [19]:
# Calculate Profit
final_df['Profit'] = final_df['Sales_After_Discount'] - (final_df['Unit_Cost'] * final_df['Quantity'])

final_df

Unnamed: 0,Transaction_Date,Transanction_Number,Product_ID,Cash_or_Card,Loyalty_Number,Sales_Before_Discount,Product_Type,Product_Scent,Unit_Cost,Selling_Price,Size,merge_key,Quantity,Customer_Name,Loyalty_Tier,Loyalty_Discount,Sales_After_Discount,Profit
0,2023-01-03,30123001.0,Liquid-Sandalwood_Spice-0.25L,2.0,1005245.0,8.50,Liquid,Sandalwood_Spice,2.45,4.25,0.25L,Liquid-Sandalwood_Spice-0.25L,2.0,Laurens Squibbs,Bronze,0.05,8.0750,3.1750
1,2023-01-03,30123001.0,Liquid-Vanilla_Bean-0.5L,2.0,1005245.0,14.70,Liquid,Vanilla_Bean,5.97,7.35,0.5L,Liquid-Vanilla_Bean-0.5L,2.0,Laurens Squibbs,Bronze,0.05,13.9650,2.0250
2,2023-01-03,30123002.0,Liquid-Sandalwood_Spice-1L,1.0,1007270.0,13.19,Liquid,Sandalwood_Spice,9.67,13.19,1L,Liquid-Sandalwood_Spice-1L,1.0,Cary Breckon,Bronze,0.05,12.5305,2.8605
3,2023-01-03,30123003.0,Liquid-Eucalyptus_Mint-0.25L,2.0,1009750.0,9.00,Liquid,Eucalyptus_Mint,2.27,4.50,0.25L,Liquid-Eucalyptus_Mint-0.25L,2.0,Orv Drewitt,Bronze,0.05,8.5500,4.0100
4,2023-01-03,30123003.0,Liquid-Eucalyptus_Mint-1L,2.0,1009750.0,39.81,Liquid,Eucalyptus_Mint,8.96,13.27,1L,Liquid-Eucalyptus_Mint-1L,3.0,Orv Drewitt,Bronze,0.05,37.8195,10.9395
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
39332,2024-03-06,60324062.0,Bar-Sandalwood_Spice-4x,2.0,1001139.0,31.96,Bar,Sandalwood_Spice,7.34,7.99,4x,Bar-Sandalwood_Spice-4x,4.0,Jae Swindall,Bronze,0.05,30.3620,1.0020
39333,2024-03-06,60324062.0,Liquid-Cedarwood_Forest-0.25L,2.0,1001139.0,7.80,Liquid,Cedarwood_Forest,1.33,1.56,0.25L,Liquid-Cedarwood_Forest-0.25L,5.0,Jae Swindall,Bronze,0.05,7.4100,0.7600
39334,2024-03-06,60324063.0,Liquid-Citrus_Breeze-0.5L,1.0,1007693.0,2.48,Liquid,Citrus_Breeze,1.79,2.48,0.5L,Liquid-Citrus_Breeze-0.5L,1.0,Rheba Roark,Bronze,0.05,2.3560,0.5660
39335,2024-03-06,60324063.0,Liquid-Sandalwood_Spice-0.5L,1.0,1007693.0,47.80,Liquid,Sandalwood_Spice,4.87,9.56,0.5L,Liquid-Sandalwood_Spice-0.5L,5.0,Rheba Roark,Bronze,0.05,45.4100,21.0600


In [20]:
# Rename columns
final_df.columns = final_df.columns.str.replace('_', ' ')
final_df

Unnamed: 0,Transaction Date,Transanction Number,Product ID,Cash or Card,Loyalty Number,Sales Before Discount,Product Type,Product Scent,Unit Cost,Selling Price,Size,merge key,Quantity,Customer Name,Loyalty Tier,Loyalty Discount,Sales After Discount,Profit
0,2023-01-03,30123001.0,Liquid-Sandalwood_Spice-0.25L,2.0,1005245.0,8.50,Liquid,Sandalwood_Spice,2.45,4.25,0.25L,Liquid-Sandalwood_Spice-0.25L,2.0,Laurens Squibbs,Bronze,0.05,8.0750,3.1750
1,2023-01-03,30123001.0,Liquid-Vanilla_Bean-0.5L,2.0,1005245.0,14.70,Liquid,Vanilla_Bean,5.97,7.35,0.5L,Liquid-Vanilla_Bean-0.5L,2.0,Laurens Squibbs,Bronze,0.05,13.9650,2.0250
2,2023-01-03,30123002.0,Liquid-Sandalwood_Spice-1L,1.0,1007270.0,13.19,Liquid,Sandalwood_Spice,9.67,13.19,1L,Liquid-Sandalwood_Spice-1L,1.0,Cary Breckon,Bronze,0.05,12.5305,2.8605
3,2023-01-03,30123003.0,Liquid-Eucalyptus_Mint-0.25L,2.0,1009750.0,9.00,Liquid,Eucalyptus_Mint,2.27,4.50,0.25L,Liquid-Eucalyptus_Mint-0.25L,2.0,Orv Drewitt,Bronze,0.05,8.5500,4.0100
4,2023-01-03,30123003.0,Liquid-Eucalyptus_Mint-1L,2.0,1009750.0,39.81,Liquid,Eucalyptus_Mint,8.96,13.27,1L,Liquid-Eucalyptus_Mint-1L,3.0,Orv Drewitt,Bronze,0.05,37.8195,10.9395
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
39332,2024-03-06,60324062.0,Bar-Sandalwood_Spice-4x,2.0,1001139.0,31.96,Bar,Sandalwood_Spice,7.34,7.99,4x,Bar-Sandalwood_Spice-4x,4.0,Jae Swindall,Bronze,0.05,30.3620,1.0020
39333,2024-03-06,60324062.0,Liquid-Cedarwood_Forest-0.25L,2.0,1001139.0,7.80,Liquid,Cedarwood_Forest,1.33,1.56,0.25L,Liquid-Cedarwood_Forest-0.25L,5.0,Jae Swindall,Bronze,0.05,7.4100,0.7600
39334,2024-03-06,60324063.0,Liquid-Citrus_Breeze-0.5L,1.0,1007693.0,2.48,Liquid,Citrus_Breeze,1.79,2.48,0.5L,Liquid-Citrus_Breeze-0.5L,1.0,Rheba Roark,Bronze,0.05,2.3560,0.5660
39335,2024-03-06,60324063.0,Liquid-Sandalwood_Spice-0.5L,1.0,1007693.0,47.80,Liquid,Sandalwood_Spice,4.87,9.56,0.5L,Liquid-Sandalwood_Spice-0.5L,5.0,Rheba Roark,Bronze,0.05,45.4100,21.0600


In [21]:
# Remove unnessary columns
final_df = final_df[['Transaction Date', 'Transanction Number', 'Product Type', 'Product Scent', 'Size', 'Cash or Card', 'Loyalty Number', 'Customer Name', 'Loyalty Tier', 'Loyalty Discount', 'Quantity', 'Sales Before Discount', 'Sales After Discount', 'Profit']]
final_df

Unnamed: 0,Transaction Date,Transanction Number,Product Type,Product Scent,Size,Cash or Card,Loyalty Number,Customer Name,Loyalty Tier,Loyalty Discount,Quantity,Sales Before Discount,Sales After Discount,Profit
0,2023-01-03,30123001.0,Liquid,Sandalwood_Spice,0.25L,2.0,1005245.0,Laurens Squibbs,Bronze,0.05,2.0,8.50,8.0750,3.1750
1,2023-01-03,30123001.0,Liquid,Vanilla_Bean,0.5L,2.0,1005245.0,Laurens Squibbs,Bronze,0.05,2.0,14.70,13.9650,2.0250
2,2023-01-03,30123002.0,Liquid,Sandalwood_Spice,1L,1.0,1007270.0,Cary Breckon,Bronze,0.05,1.0,13.19,12.5305,2.8605
3,2023-01-03,30123003.0,Liquid,Eucalyptus_Mint,0.25L,2.0,1009750.0,Orv Drewitt,Bronze,0.05,2.0,9.00,8.5500,4.0100
4,2023-01-03,30123003.0,Liquid,Eucalyptus_Mint,1L,2.0,1009750.0,Orv Drewitt,Bronze,0.05,3.0,39.81,37.8195,10.9395
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
39332,2024-03-06,60324062.0,Bar,Sandalwood_Spice,4x,2.0,1001139.0,Jae Swindall,Bronze,0.05,4.0,31.96,30.3620,1.0020
39333,2024-03-06,60324062.0,Liquid,Cedarwood_Forest,0.25L,2.0,1001139.0,Jae Swindall,Bronze,0.05,5.0,7.80,7.4100,0.7600
39334,2024-03-06,60324063.0,Liquid,Citrus_Breeze,0.5L,1.0,1007693.0,Rheba Roark,Bronze,0.05,1.0,2.48,2.3560,0.5660
39335,2024-03-06,60324063.0,Liquid,Sandalwood_Spice,0.5L,1.0,1007693.0,Rheba Roark,Bronze,0.05,5.0,47.80,45.4100,21.0600


In [22]:
output = final_df
output

Unnamed: 0,Transaction Date,Transanction Number,Product Type,Product Scent,Size,Cash or Card,Loyalty Number,Customer Name,Loyalty Tier,Loyalty Discount,Quantity,Sales Before Discount,Sales After Discount,Profit
0,2023-01-03,30123001.0,Liquid,Sandalwood_Spice,0.25L,2.0,1005245.0,Laurens Squibbs,Bronze,0.05,2.0,8.50,8.0750,3.1750
1,2023-01-03,30123001.0,Liquid,Vanilla_Bean,0.5L,2.0,1005245.0,Laurens Squibbs,Bronze,0.05,2.0,14.70,13.9650,2.0250
2,2023-01-03,30123002.0,Liquid,Sandalwood_Spice,1L,1.0,1007270.0,Cary Breckon,Bronze,0.05,1.0,13.19,12.5305,2.8605
3,2023-01-03,30123003.0,Liquid,Eucalyptus_Mint,0.25L,2.0,1009750.0,Orv Drewitt,Bronze,0.05,2.0,9.00,8.5500,4.0100
4,2023-01-03,30123003.0,Liquid,Eucalyptus_Mint,1L,2.0,1009750.0,Orv Drewitt,Bronze,0.05,3.0,39.81,37.8195,10.9395
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
39332,2024-03-06,60324062.0,Bar,Sandalwood_Spice,4x,2.0,1001139.0,Jae Swindall,Bronze,0.05,4.0,31.96,30.3620,1.0020
39333,2024-03-06,60324062.0,Liquid,Cedarwood_Forest,0.25L,2.0,1001139.0,Jae Swindall,Bronze,0.05,5.0,7.80,7.4100,0.7600
39334,2024-03-06,60324063.0,Liquid,Citrus_Breeze,0.5L,1.0,1007693.0,Rheba Roark,Bronze,0.05,1.0,2.48,2.3560,0.5660
39335,2024-03-06,60324063.0,Liquid,Sandalwood_Spice,0.5L,1.0,1007693.0,Rheba Roark,Bronze,0.05,5.0,47.80,45.4100,21.0600
