# 2024: Week 21 - Loyalty Points Percentages

May 22, 2024

Challenge by: Alexandra Skelly

We're continuing with DS43's challenges, so over to Alex to explain her challenge.

_____________________________________

At Preppin' Data we use a number of (mock) companies to look at the challenges they have with their data. We're going to focus on our own supermarket: SuperBytes. The shop has introduced a new loyalty card scheme. 

In the first task we need to clean up a number of data fields to determine the percentage of Free Byte qualifiers. Our stakeholders would like one dataset focused on the percentage of customers (split between males and females) that have qualified.

Input

There is one input file:

![1](https://lh7-eu.googleusercontent.com/XGpIN_nsjsOAOhWhVKBDtKASExOcDSy6TeaNEK0sZ4Cqb8GgTymglWU1HNhiNo3X2vkn-4eo2AjeiD4D6rP9yteDlJcYhzz-AtIg4umsAtZWtqWCEyGgEVdVSH2tF-aC1iShmytNfDdFQxJrioTaq2Q)

Requirements

- Input the data
- Create the Loyalty Points:
  - For every £50 spent they get 1 loyalty point
  - Round Loyalty Points to 1 decimal place
- Create a new field to categorise the Loyalty Points:
  - Points that are greater than or equal to 7 categorise them as "MegaByte"
  - Points that are greater than or equal to 5 (but less than 7) categorise as "Byte"
  - Categorise the remaining as "No Byte"
- Find the sum of customers that qualify for each type of byte category
  - For females and males separately
- Pivot so that females and males have their own field
- Calculate the percentage of females and males in each category
  - Round the percentages to 1 decimal place
- Output the data
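
As a quick illustration of the points and category rules above, here is a minimal pure-Python sketch. The function names `loyalty_points` and `categorise` are hypothetical helpers introduced only for this example, and the sale totals are a few values from the input file.

```python
def loyalty_points(sale_total: float) -> float:
    """One point per £50 spent, rounded to 1 decimal place."""
    return round(sale_total / 50, 1)


def categorise(points: float) -> str:
    """Map rounded loyalty points to a byte category."""
    if points >= 7:
        return "MegaByte"
    if points >= 5:
        return "Byte"
    return "No Byte"


# A few sale totals from the input file, used purely as examples
for sale_total in (250.54, 363.43, 195.67):
    points = loyalty_points(sale_total)
    print(sale_total, points, categorise(points))
```

Note that the category is assigned from the rounded points, following the order of the requirements.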

Output

![2](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8SHpONqGiHBOVdWbMa9H-Ns2Cd5vuI_AQa0Y3CNF33Us28fxawhWx-LokWtdl0RliMhArjAijUoaEWDvTyyNTE83vnLIubV2H73Unx6wXSRcE5DJ_RzaZXC6QZhSrJd1fLCBNbRfoYbbfz-OPqOn8rWkNMgJFwqeJN0oGvklS2FYERl22qkIeNAV91pK_/s1600/Screenshot%202024-05-08%20154129.png)

- 3 fields
  - Category
  - Female
  - Male
- 3 rows

In [34]:
import pandas as pd

df = pd.read_csv('Customer Spending.csv')
df

Unnamed: 0,First Name,Last Name,Gender,Receipt Number,Date,Online,In Person,Sale Total
0,Emeline,Woollard,Female,2578226121,27/12/2023,No,Yes,250.54
1,Wallace,Slyde,Male,6574211891,6/8/2023,No,Yes,363.43
2,Chrissy,MacGiany,Male,7742449501,19/4/2023,No,Yes,487.25
3,Jacques,Brauns,Male,3112564839,25/12/2023,No,Yes,195.67
4,Joannes,Cabrera,Female,8018081174,20/7/2023,No,Yes,95.10
...,...,...,...,...,...,...,...,...
994,Camella,Alonso,Female,5239318611,22/12/2023,Yes,No,268.70
995,Chev,Cattonnet,Male,7952775111,18/5/2023,No,Yes,403.90
996,Dallis,Avon,Male,875691994,22/7/2023,Yes,No,238.31
997,Crista,Davsley,Female,1632098091,14/10/2023,Yes,No,41.50


In [35]:
# Calculate loyalty points: 1 point per £50 spent, rounded to 1 decimal place
df['Loyal Point'] = (df['Sale Total'] / 50).round(1)
df.head()

Unnamed: 0,First Name,Last Name,Gender,Receipt Number,Date,Online,In Person,Sale Total,Loyal Point
0,Emeline,Woollard,Female,2578226121,27/12/2023,No,Yes,250.54,5.0
1,Wallace,Slyde,Male,6574211891,6/8/2023,No,Yes,363.43,7.3
2,Chrissy,MacGiany,Male,7742449501,19/4/2023,No,Yes,487.25,9.7
3,Jacques,Brauns,Male,3112564839,25/12/2023,No,Yes,195.67,3.9
4,Joannes,Cabrera,Female,8018081174,20/7/2023,No,Yes,95.1,1.9


In [36]:
# Create a new column called 'Category' that categorizes the customers into 'MegaByte', 'Byte', and 'No Byte' based on their 'Loyal Point' value
df['Category'] = df['Loyal Point'].apply(lambda x: 'MegaByte' if x >= 7 else ('Byte' if x >= 5 else 'No Byte'))
df.head()

Unnamed: 0,First Name,Last Name,Gender,Receipt Number,Date,Online,In Person,Sale Total,Loyal Point,Category
0,Emeline,Woollard,Female,2578226121,27/12/2023,No,Yes,250.54,5.0,Byte
1,Wallace,Slyde,Male,6574211891,6/8/2023,No,Yes,363.43,7.3,MegaByte
2,Chrissy,MacGiany,Male,7742449501,19/4/2023,No,Yes,487.25,9.7,MegaByte
3,Jacques,Brauns,Male,3112564839,25/12/2023,No,Yes,195.67,3.9,No Byte
4,Joannes,Cabrera,Female,8018081174,20/7/2023,No,Yes,95.1,1.9,No Byte
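
The row-wise `apply` above works well at this data size. As an alternative, the same categorisation can be written in vectorised form with `numpy.select`. This is only a sketch: the `sample` frame is a stand-in that mirrors the five `Loyal Point` values shown above, not the full dataset.

```python
import numpy as np
import pandas as pd

# Stand-in frame; values mirror the 'Loyal Point' column shown above
sample = pd.DataFrame({"Loyal Point": [5.0, 7.3, 9.7, 3.9, 1.9]})

# Conditions are checked in order, so the >= 7 test must come first
conditions = [sample["Loyal Point"] >= 7, sample["Loyal Point"] >= 5]
labels = ["MegaByte", "Byte"]

# Rows matching neither condition fall through to the default label
sample["Category"] = np.select(conditions, labels, default="No Byte")
print(sample)
```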


In [37]:
# Count customers per category and gender, then pivot gender into columns
category_gender_count = df.groupby(['Category', 'Gender']).size().unstack(fill_value=0)
category_gender_count

Category,Female,Male
Byte,111,88
MegaByte,147,166
No Byte,240,247
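
The count-and-pivot step can also be expressed with `pd.crosstab`, which builds the same kind of category-by-gender table in one call. A minimal sketch, using only the five rows displayed earlier as a stand-in for the full data:

```python
import pandas as pd

# Stand-in frame built from the five rows displayed earlier
sample = pd.DataFrame({
    "Gender": ["Female", "Male", "Male", "Male", "Female"],
    "Category": ["Byte", "MegaByte", "MegaByte", "No Byte", "No Byte"],
})

# Rows are categories, columns are genders, cells are customer counts;
# combinations that never occur are filled with 0
counts = pd.crosstab(sample["Category"], sample["Gender"])
print(counts)
```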


In [38]:
# Calculate each gender's percentage share within each category (each row sums to 100)
category_gender_percentage = category_gender_count.div(category_gender_count.sum(axis=1), axis=0) * 100
category_gender_percentage = category_gender_percentage.round(1)
category_gender_percentage

Category,Female,Male
Byte,55.8,44.2
MegaByte,47.0,53.0
No Byte,49.3,50.7
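
The calculation above normalises across each row, so Female and Male sum to 100 within a category. The requirement could also be read as the share of each gender that lands in each category, which means dividing each column by its own total instead. Which reading matches the expected output is not confirmed here, so the following is only a sketch of that alternative, with the counts copied from the count table above.

```python
import pandas as pd

# Counts copied from the category-by-gender count table
counts = pd.DataFrame(
    {"Female": [111, 147, 240], "Male": [88, 166, 247]},
    index=pd.Index(["Byte", "MegaByte", "No Byte"], name="Category"),
)

# Divide each column by its own total, so each gender column sums to roughly 100
share_within_gender = (counts.div(counts.sum(axis=0), axis=1) * 100).round(1)
print(share_within_gender)
```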


In [39]:
# Reset the index and keep the rounded percentages (not the raw counts) as the final output
output = category_gender_percentage.reset_index()[['Category', 'Female', 'Male']]
output

Gender,Category,Female,Male
0,Byte,55.8,44.2
1,MegaByte,47.0,53.0
2,No Byte,49.3,50.7
