# 2024: Week 22 - Top 5 Loyal Customers

May 29, 2024

Challenge by: Alexandra Skelly

We're continuing with DS43's challenges, so over to Alex to explain her next challenge.

_____________________________________

Each SuperBytes store is interested in determining their most loyal customers so that they can send them all a gift. They'll need to bring many tables of data together in order to do this.


### Inputs

There are 3 tables of data needed to achieve SuperBytes' goal:

1. Loyalty Points 

![2](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi9gjByXC4LtIpm85Z95uGAcuU5ClCZ5ObKBzQznZSsaczFvgoqpDY9dpRVILBLb-63Xj6-khcwww062iBK8SYaQmmuXLdzb2i-Sy59YUu8ZF2Fc4l-bgolig_XA0vWfXVB0uzRVByN9ftSCeQzO9HCRJroSrL2go_vrYr80m9S1o1scWK66d_n-BGAY7r_/s532/Intermediate%20Input-%20Loyalty%20Points.png)

2. Customer Details 

![3](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHO-Tc6BDAsZQruWNjNecD3H5FTiZmUnClp8Lay68s2fXUeM-oSwqpQv-J8vZagB153bXdW-Z_xGucXVwXHVTZKk_dxzCNjcRSXtlLJovtlyWbpB0Qq9jTkYpaX1BnHuuhYw9QxiCiXabjKasn2Te7AGvePJcImLNxmSwcAfTUjIUzLqp7w1BT4Df7_-mv/s371/Intermeidate%20Input%20-%20Customer%20Details.png)

3. Store Data 

![4](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgOsNe178IhdJPipPA5HdjBKB35txSwIAxtdBAenvGj35AFsJ1HnxsqmXGnGIbluIvOwyXjb1rUVwqlKih0ktQgkDO_RxoUKnFQgCT0QbqM_LHcCC4IeMqSHWhibumVtTGYI0vXD8G288CXsp7_CPlzJsxyj8pZg5AeA6qncxvn05bB8f8ll5LcRunmb4P/s208/Intermediate%20Input-%20Store%20Data.png)

### Requirements

- Input the data (updated 3rd June)
- Start with the Loyalty Points table:
  - Change the DateTime_Out field to a Date data type
  - Extract the numeric part of the Loyalty Points field
  - Extract the First Name and Last Name Initial from the Email Address
- Join to the Customer Details table, ensuring the number of rows remains at 999
- Join on the Store Data table
- Remove unnecessary fields
- Filter out customers without postcodes (it will be difficult to send gifts to these customers!)
- For each store, rank the customers
  - Customers with the highest number of loyalty points should be ranked #1
- Filter to the top 5 customers for each store
  - This may result in more than 5 customers per store if there have been ties in the number of loyalty points. SuperBytes wants to reward all these customers.
- Output the data

### Output

![1](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYm4CPkBUjiop6w6M-kDH2usK22qQdu7mTAcpmPDAu_Dyq1SxcQnYpqA3tYHmmW5l9Y1mtPHxED3KS0oZ5xZ7-LMX4TK-89h6z_8pXp8VFmFg6t8zDSLygJag4iJA-HDwK52X_CZvyx5RewRE-G4ocsOVT5t0T0ky-kWWZsU0Bwd8O-ZLf22pzbqXgY9cK/s938/Screenshot%202024-05-09%20121612.png)


- 10 fields
  - City
  - Store
  - Rank
  - Email Address
  - First Name
  - Last Name
  - Loyalty Points
  - Date
  - Postcode
  - Address
- 65 rows (66 including headers)

In [61]:
import pandas as pd
# Load the Excel file
excel_file = pd.ExcelFile('Loyalty Points Data.xlsx')

# List all sheet names
sheet_names = excel_file.sheet_names
print(sheet_names)


['Loyalty Points', 'Store Data', 'Customer Details']


In [62]:
loyal_df = pd.read_excel(excel_file, sheet_name='Loyalty Points')
loyal_df.head()

Unnamed: 0,RecordID,Email Address,Loyalty Points,Store ID,DateTime_Out,Right_RecordID
0,1,Aarika.d@wikispaces.com,/__ 1.2 \LotaltyScore,6,01-Dec-2023,1
1,2,Abbey.m@tiny.cc,/__ 0.2 \LotaltyScore,6,30-Mar-2023,2
2,3,Abbi.p@chronoengine.com,/__ 4.8 \LotaltyScore,8,28-Mar-2023,3
3,4,Abby.l@desdev.cn,/__ 2.1 \LotaltyScore,2,10-Apr-2023,4
4,5,Abbye.h@t-online.de,/__ 0.4 \LotaltyScore,2,28-Oct-2023,5


In [63]:
store_df = pd.read_excel(excel_file, sheet_name='Store Data')
store_df.head()

Unnamed: 0,City,Store,Store ID
0,London,Oxford Street,1
1,London,Wembley,2
2,London,Richmond,3
3,London,Stratford,4
4,Manchester,Trafford,5


In [64]:
customer_df = pd.read_excel(excel_file, sheet_name='Customer Details')
customer_df.head()

Unnamed: 0,First Name,Last Name,Postcode,address
0,Aarika,Densham,,Suite 52
1,Abbey,Moorey,,12th Floor
2,Abbi,Parvin,,PO Box 62649
3,Abby,Leonardi,6513.0,Suite 73
4,Abbye,Hazlehurst,,8th Floor


In [65]:
# Convert the 'DateTime_Out' column from text (e.g. 01-Dec-2023) to datetime
loyal_df['DateTime_Out'] = pd.to_datetime(loyal_df['DateTime_Out'], format='%d-%b-%Y')
loyal_df.head()

Unnamed: 0,RecordID,Email Address,Loyalty Points,Store ID,DateTime_Out,Right_RecordID
0,1,Aarika.d@wikispaces.com,/__ 1.2 \LotaltyScore,6,2023-12-01,1
1,2,Abbey.m@tiny.cc,/__ 0.2 \LotaltyScore,6,2023-03-30,2
2,3,Abbi.p@chronoengine.com,/__ 4.8 \LotaltyScore,8,2023-03-28,3
3,4,Abby.l@desdev.cn,/__ 2.1 \LotaltyScore,2,2023-04-10,4
4,5,Abbye.h@t-online.de,/__ 0.4 \LotaltyScore,2,2023-10-28,5
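
The requirement asks for a Date data type, while `pd.to_datetime` produces timestamps. As a minimal sketch, assuming a pure date is wanted in the final output, the time component can be dropped with `.dt.date`. The `sample` frame and its `Date` column are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample in the same text format as DateTime_Out
sample = pd.DataFrame({'DateTime_Out': ['01-Dec-2023', '30-Mar-2023']})

# Parse the text into pandas timestamps (a datetime64 column)
parsed = pd.to_datetime(sample['DateTime_Out'], format='%d-%b-%Y')

# Optional: keep only the calendar date (datetime.date objects, object dtype)
sample['Date'] = parsed.dt.date
print(sample['Date'].tolist())
```

Keeping the datetime64 column is usually more convenient for later filtering or sorting; the conversion to plain dates only matters if the output must not carry a time part.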


In [66]:
# Pull the number out of strings like '/__ 1.2 \LotaltyScore' and convert it to a numeric type
loyal_df['Loyalty Points'] = pd.to_numeric(loyal_df['Loyalty Points'].str.extract(r'(\d+\.\d+|\d+)')[0])
loyal_df.head()

Unnamed: 0,RecordID,Email Address,Loyalty Points,Store ID,DateTime_Out,Right_RecordID
0,1,Aarika.d@wikispaces.com,1.2,6,2023-12-01,1
1,2,Abbey.m@tiny.cc,0.2,6,2023-03-30,2
2,3,Abbi.p@chronoengine.com,4.8,8,2023-03-28,3
3,4,Abby.l@desdev.cn,2.1,2,2023-04-10,4
4,5,Abbye.h@t-online.de,0.4,2,2023-10-28,5
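
To see what the extraction pattern matches, here is a small standalone sketch. The first and last strings copy the format shown in the preview; the middle one (a whole number) is a hypothetical value added to show that the `\d+` alternative covers scores without a decimal point.

```python
import pandas as pd

# Two values in the format shown above, plus one hypothetical whole-number score
raw = pd.Series([
    r'/__ 1.2 \LotaltyScore',
    r'/__ 10 \LotaltyScore',   # hypothetical: no decimal part
    r'/__ 0.4 \LotaltyScore',
])

# Capture either a decimal number or a plain integer, then convert to numeric
points = pd.to_numeric(raw.str.extract(r'(\d+\.\d+|\d+)')[0])
print(points.tolist())
```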


In [67]:
# Split the email address (e.g. Aarika.d@wikispaces.com) into the first name and the last-name initial
loyal_df[['First Name', 'Rest']] = loyal_df['Email Address'].str.split('.', n=1, expand=True)
loyal_df[['Last Name', 'Domain']] = loyal_df['Rest'].str.split('@', n=1, expand=True)
loyal_df.drop(columns=['Rest', 'Domain'], inplace=True)
loyal_df.head()

Unnamed: 0,RecordID,Email Address,Loyalty Points,Store ID,DateTime_Out,Right_RecordID,First Name,Last Name
0,1,Aarika.d@wikispaces.com,1.2,6,2023-12-01,1,Aarika,d
1,2,Abbey.m@tiny.cc,0.2,6,2023-03-30,2,Abbey,m
2,3,Abbi.p@chronoengine.com,4.8,8,2023-03-28,3,Abbi,p
3,4,Abby.l@desdev.cn,2.1,2,2023-04-10,4,Abby,l
4,5,Abbye.h@t-online.de,0.4,2,2023-10-28,5,Abbye,h


In [68]:
# Uppercase the last-name initial so it can be matched against the customer table
loyal_df['Last Name'] = loyal_df['Last Name'].str.upper()
loyal_df.head()

Unnamed: 0,RecordID,Email Address,Loyalty Points,Store ID,DateTime_Out,Right_RecordID,First Name,Last Name
0,1,Aarika.d@wikispaces.com,1.2,6,2023-12-01,1,Aarika,D
1,2,Abbey.m@tiny.cc,0.2,6,2023-03-30,2,Abbey,M
2,3,Abbi.p@chronoengine.com,4.8,8,2023-03-28,3,Abbi,P
3,4,Abby.l@desdev.cn,2.1,2,2023-04-10,4,Abby,L
4,5,Abbye.h@t-online.de,0.4,2,2023-10-28,5,Abbye,H


In [69]:
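# Reduce each customer's last name to its uppercase initial to match the key built from the email address
customer_df['Last Name'] = customer_df['Last Name'].str.slice(0, 1)
customer_df['Last Name'] = customer_df['Last Name'].str.upper()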
customer_df.head()

Unnamed: 0,First Name,Last Name,Postcode,address
0,Aarika,D,,Suite 52
1,Abbey,M,,12th Floor
2,Abbi,P,,PO Box 62649
3,Abby,L,6513.0,Suite 73
4,Abbye,H,,8th Floor
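
Note that the cell above overwrites `Last Name`, so only the initial survives into the final table. If the full surname is wanted in the output, one option is to build the initial in a separate key column. This is a sketch on sample rows; `Last Initial` is a hypothetical helper column, not a field in the challenge data.

```python
import pandas as pd

# Sample rows taken from the Customer Details preview
customers = pd.DataFrame({
    'First Name': ['Aarika', 'Abbey'],
    'Last Name': ['Densham', 'Moorey'],
})

# Hypothetical helper column: uppercase initial used only as a join key,
# leaving the full 'Last Name' untouched
customers['Last Initial'] = customers['Last Name'].str[0].str.upper()
print(customers)
```

With this approach the Loyalty Points side would need a matching `Last Initial` column, and the join would use `['First Name', 'Last Initial']` as keys.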


In [70]:
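# Left join keeps every Loyalty Points row; customers are matched on first name and last-name initial
merge_df = pd.merge(loyal_df, customer_df, on=['First Name', 'Last Name'], how='left')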
merge_df.head()

Unnamed: 0,RecordID,Email Address,Loyalty Points,Store ID,DateTime_Out,Right_RecordID,First Name,Last Name,Postcode,address
0,1,Aarika.d@wikispaces.com,1.2,6,2023-12-01,1,Aarika,D,,Suite 52
1,2,Abbey.m@tiny.cc,0.2,6,2023-03-30,2,Abbey,M,,12th Floor
2,3,Abbi.p@chronoengine.com,4.8,8,2023-03-28,3,Abbi,P,,PO Box 62649
3,4,Abby.l@desdev.cn,2.1,2,2023-04-10,4,Abby,L,6513.0,Suite 73
4,5,Abbye.h@t-online.de,0.4,2,2023-10-28,5,Abbye,H,,8th Floor
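
The requirement that the row count stays at 999 is not checked in the notebook. A minimal sketch of one way to guard it, using the `validate` and `indicator` options of `pd.merge` on small sample frames (the frame names and the sample values here are illustrative):

```python
import pandas as pd

# Small stand-ins for the Loyalty Points and Customer Details tables
loyalty = pd.DataFrame({
    'First Name': ['Aarika', 'Abby'],
    'Last Name': ['D', 'L'],
    'Loyalty Points': [1.2, 2.1],
})
customers = pd.DataFrame({
    'First Name': ['Aarika', 'Abby'],
    'Last Name': ['D', 'L'],
    'Postcode': [None, '6513'],
})

# validate='many_to_one' raises MergeError if a key is duplicated on the right,
# which is what would otherwise multiply rows in a left join
merged = pd.merge(
    loyalty, customers,
    on=['First Name', 'Last Name'],
    how='left',
    validate='many_to_one',
    indicator=True,  # adds a '_merge' column: 'both' or 'left_only'
)

# The left join should leave the row count unchanged
assert len(merged) == len(loyalty)

# Count loyalty rows that found no matching customer
unmatched = (merged['_merge'] == 'left_only').sum()
print(unmatched)
```

On the real data the same check would be `len(merge_df) == 999`.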


In [71]:
store_df

Unnamed: 0,City,Store,Store ID
0,London,Oxford Street,1
1,London,Wembley,2
2,London,Richmond,3
3,London,Stratford,4
4,Manchester,Trafford,5
5,Manchester,Salford,6
6,Manchester,Chorlton,7
7,Birmingham,Birmingham,8
8,Nottingham,Nottingham,9
9,Leeds,Leeds,10


In [72]:
# Add the City and Store names for each row via its Store ID
final_df = pd.merge(merge_df, store_df, on='Store ID', how='left')
final_df.head()

Unnamed: 0,RecordID,Email Address,Loyalty Points,Store ID,DateTime_Out,Right_RecordID,First Name,Last Name,Postcode,address,City,Store
0,1,Aarika.d@wikispaces.com,1.2,6,2023-12-01,1,Aarika,D,,Suite 52,Manchester,Salford
1,2,Abbey.m@tiny.cc,0.2,6,2023-03-30,2,Abbey,M,,12th Floor,Manchester,Salford
2,3,Abbi.p@chronoengine.com,4.8,8,2023-03-28,3,Abbi,P,,PO Box 62649,Birmingham,Birmingham
3,4,Abby.l@desdev.cn,2.1,2,2023-04-10,4,Abby,L,6513.0,Suite 73,London,Wembley
4,5,Abbye.h@t-online.de,0.4,2,2023-10-28,5,Abbye,H,,8th Floor,London,Wembley


In [73]:
# Remove the ID fields that are not needed in the output
final_df = final_df.drop(columns=['RecordID', 'Store ID', 'Right_RecordID'])
final_df.head()

Unnamed: 0,Email Address,Loyalty Points,DateTime_Out,First Name,Last Name,Postcode,address,City,Store
0,Aarika.d@wikispaces.com,1.2,2023-12-01,Aarika,D,,Suite 52,Manchester,Salford
1,Abbey.m@tiny.cc,0.2,2023-03-30,Abbey,M,,12th Floor,Manchester,Salford
2,Abbi.p@chronoengine.com,4.8,2023-03-28,Abbi,P,,PO Box 62649,Birmingham,Birmingham
3,Abby.l@desdev.cn,2.1,2023-04-10,Abby,L,6513.0,Suite 73,London,Wembley
4,Abbye.h@t-online.de,0.4,2023-10-28,Abbye,H,,8th Floor,London,Wembley


In [74]:
# Filter out customers without a postcode
final_df = final_df.dropna(subset=['Postcode'])
final_df.head()

Unnamed: 0,Email Address,Loyalty Points,DateTime_Out,First Name,Last Name,Postcode,address,City,Store
3,Abby.l@desdev.cn,2.1,2023-04-10,Abby,L,6513,Suite 73,London,Wembley
7,Addie.r@reddit.com,9.8,2023-10-10,Addie,R,9024,Apt 1028,Manchester,Salford
8,Adelbert.e@narod.ru,3.9,2023-10-23,Adelbert,E,15440-000,3rd Floor,Manchester,Salford
11,Aeriela.c@oakley.com,2.5,2023-01-17,Aeriela,C,6711,Room 1478,Manchester,Chorlton
12,Aggi.f@arstechnica.com,8.0,2023-09-13,Aggi,F,3070-435,Suite 92,Manchester,Trafford
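
`dropna` only removes true missing values. The preview shows no blank-string postcodes, but if the column did contain empty or whitespace-only text, those rows would pass the filter. A hedged sketch of a stricter check, on made-up rows:

```python
import pandas as pd

# Hypothetical rows: one real postcode, one missing, one whitespace-only
sample = pd.DataFrame({
    'Email Address': ['a@x.com', 'b@x.com', 'c@x.com'],
    'Postcode': ['6513', None, '  '],
})

# Treat missing values and blank strings the same way
cleaned = sample['Postcode'].astype('string').str.strip()
is_blank = cleaned.fillna('').eq('')

with_postcode = sample[~is_blank]
print(with_postcode)
```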


In [75]:
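# Rank customers within each store by loyalty points, highest first; tied customers share a rank
# (method='dense' leaves no gaps in the ranking after ties)
final_df['Rank'] = final_df.groupby('Store')['Loyalty Points'].rank(method='dense', ascending=False)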
final_df.head()

Unnamed: 0,Email Address,Loyalty Points,DateTime_Out,First Name,Last Name,Postcode,address,City,Store,Rank
3,Abby.l@desdev.cn,2.1,2023-04-10,Abby,L,6513,Suite 73,London,Wembley,38.0
7,Addie.r@reddit.com,9.8,2023-10-10,Addie,R,9024,Apt 1028,Manchester,Salford,1.0
8,Adelbert.e@narod.ru,3.9,2023-10-23,Adelbert,E,15440-000,3rd Floor,Manchester,Salford,25.0
11,Aeriela.c@oakley.com,2.5,2023-01-17,Aeriela,C,6711,Room 1478,Manchester,Chorlton,30.0
12,Aggi.f@arstechnica.com,8.0,2023-09-13,Aggi,F,3070-435,Suite 92,Manchester,Trafford,11.0
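
The choice of `method` changes how many rows survive a `Rank <= 5` filter, so it is worth comparing the result against the expected 65 rows. The sketch below contrasts `dense` with `min` (standard competition ranking) on the top Birmingham scores from this notebook's output; the sixth value, 9.0, is a hypothetical extra row added to make the difference visible.

```python
import pandas as pd

# First five scores come from the Birmingham output; 9.0 is a hypothetical sixth
scores = pd.DataFrame({
    'Store': ['Birmingham'] * 6,
    'Loyalty Points': [9.8, 9.8, 9.6, 9.4, 9.2, 9.0],
})

grouped = scores.groupby('Store')['Loyalty Points']

# dense: ties share a rank and the next rank follows immediately (1, 1, 2, ...)
scores['Rank (dense)'] = grouped.rank(method='dense', ascending=False)

# min: ties share the lowest rank and the following ranks are skipped (1, 1, 3, ...)
scores['Rank (min)'] = grouped.rank(method='min', ascending=False)

kept_dense = (scores['Rank (dense)'] <= 5).sum()
kept_min = (scores['Rank (min)'] <= 5).sum()
print(scores)
print(kept_dense, kept_min)
```

With `dense`, a tie near the top lets an extra customer into the top 5; with `min`, extra customers appear only when the tie straddles the fifth place. Which behaviour the challenge intends is not stated in the requirements, so the row count is the practical check.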


In [76]:
# Keep customers ranked 5 or better in their store; ties can yield more than 5 rows per store
top_customers_df = final_df[final_df['Rank'] <= 5]
top_customers_df

Unnamed: 0,Email Address,Loyalty Points,DateTime_Out,First Name,Last Name,Postcode,address,City,Store,Rank
7,Addie.r@reddit.com,9.8,2023-10-10,Addie,R,9024,Apt 1028,Manchester,Salford,1.0
18,Aldo.b@washingtonpost.com,9.4,2023-04-09,Aldo,B,8710,Suite 15,Leeds,Leeds,4.0
23,Alfonso.e@dedecms.com,10.0,2023-09-06,Alfonso,E,8290,10th Floor,London,Richmond,1.0
32,Alphonso.p@yahoo.co.jp,9.4,2023-09-30,Alphonso,P,3313,Suite 38,Birmingham,Birmingham,3.0
35,Alvina.d@mediafire.com,9.4,2023-05-04,Alvina,D,4960-180,Apt 1003,London,Oxford Street,5.0
...,...,...,...,...,...,...,...,...,...,...
945,Uta.d@yale.edu,8.7,2023-10-24,Uta,D,94109 CEDEX,Apt 1017,Manchester,Chorlton,3.0
962,Wait.v@hao123.com,9.5,2023-07-25,Wait,V,789 01,18th Floor,London,Stratford,3.0
979,Win.l@1688.com,9.2,2023-08-06,Win,L,63800-000,Room 1243,London,Stratford,4.0
990,Zacharia.h@sfgate.com,9.2,2023-05-15,Zacharia,H,429720,PO Box 64615,Manchester,Trafford,5.0


In [77]:
# Sort by store, then by rank within each store
output = top_customers_df.sort_values(by=['Store', 'Rank'])
output

Unnamed: 0,Email Address,Loyalty Points,DateTime_Out,First Name,Last Name,Postcode,address,City,Store,Rank
43,Andros.k@cafepress.com,9.8,2023-05-02,Andros,K,55590,PO Box 8729,Birmingham,Birmingham,1.0
523,Karalee.m@addtoany.com,9.8,2023-11-20,Karalee,M,5449,PO Box 93919,Birmingham,Birmingham,1.0
745,Ofilia.a@newyorker.com,9.6,2023-11-06,Ofilia,A,4212,Room 1698,Birmingham,Birmingham,2.0
32,Alphonso.p@yahoo.co.jp,9.4,2023-09-30,Alphonso,P,3313,Suite 38,Birmingham,Birmingham,3.0
53,Archibald.w@prweb.com,9.2,2023-08-26,Archibald,W,446017,Suite 90,Birmingham,Birmingham,4.0
...,...,...,...,...,...,...,...,...,...,...
265,Ebeneser.f@kickstarter.com,9.8,2023-12-15,Ebeneser,F,184021,Apt 192,London,Wembley,1.0
829,Rourke.z@a8.net,9.6,2023-09-27,Rourke,Z,4274,Apt 1258,London,Wembley,2.0
141,Carma.r@cornell.edu,9.4,2023-10-10,Carma,R,197730,Apt 832,London,Wembley,3.0
581,Leo.s@yahoo.co.jp,9.0,2023-11-02,Leo,S,453380,Suite 91,London,Wembley,4.0
