# 2024: Week 25 - SuperBytes Customer Complaints

June 19, 2024

Challenge by: Tobias Colmer

We're continuing with DS43's challenges, so it's over to Tobias to explain his first challenge.

_____________________________________

This week SuperBytes needs you to take a look at their complaints data. Rival supermarket WeakFloats says it spotted errors in a recent data leak it posted, and we need to check whether their claims are true. They claim that:

- we have no category field in our data clearly stating the category of products being complained about
- we were resolving complaints before we’d even received them!

### Inputs

There are 2 inputs for this challenge:

- The Complaints Data Table 

![1](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfWRsAJK2FT5palg6JFoLUJY-6c1IOD_dariMtt5b68LFQMihRQcV2eR5cWf-cwY4snZxLdY805AYbgvZmRSEFBTzHImZkWR10ctC8tGItYYAGrfd8Iuu0MaynlZcubbNx_rOaG0Jyv3fi2rfPKTc3ekTiLKt47lsq-CTW5VyY8bClNwcNIvz4wed1USKz/s887/Screenshot%202024-05-31%20111810.png)

- The Product Category Lookup Table

![2](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXhmqoB6slHYZHcyvx0nxLCot2F2lbZk65Yq8FR45mfQKCsyCdsvSnnFajmMg80pxZ0LCWhrsLVrwOsD92KOIuwd3l2QDPGARwEabcdkE8IVcwnxSfUKEc_8ts8YbZUWCpU6CLoDbJVw786UWmZ3Z-85c00jw_Yp5JVKw6eLkRRV9r6Cug_AN-h7Qx1IAr/s178/Screenshot%202024-05-31%20111942.png)

### Requirements

- Input the data
- Split the Complaint Description field into:
  - Product ID
  - Issue Type
  - Complaint Description (the text the customer wrote to describe the complaint)
- Extract the first 2 letters of the Product ID as this represents the Category Code
- Use this Category Code to join on the Product Category from the lookup table
- Filter the dataset to only contain rows where the Date Resolved happens after the Date Received
- Remove unnecessary fields
- Output the data

### Output

![3](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuc-n-_TMockhRWi_kZo8JZ2_0ZRAbru8hsAuMJ_1nh6hEb3Kv1e79qy7QtVEVTrx7pURLE8nHRtNP7rFGfc_2poHrJd5On5LynulNZeoslG7XBGBmqh_4yfrJLYIeH7ESo5SQbXMpxOtm14Rh1_iTwxldPrh0kMR9bIvRa0NalMqX5UuuFmomU_TnlI7S/s789/Screenshot%202024-05-31%20112326.png)

- 11 fields
  - Complaint ID
  - Receipt Number
  - Customer ID
  - Date Received
  - Date Resolved
  - Timely Response
  - Response to Consumer
  - Issue Type
  - Product Category
  - Product ID
  - Complaint Description
- 53 rows (54 including headers)
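
A quick way to confirm that a solution matches this spec is to compare its column names and row count against the list above. The sketch below is not part of the original challenge: `check_output` is a hypothetical helper, and the field name `Response to Consumer` follows the column as it appears in the input data.

```python
# Expected shape of the challenge output, taken from the spec above.
EXPECTED_FIELDS = [
    "Complaint ID", "Receipt Number", "Customer ID", "Date Received",
    "Date Resolved", "Timely Response", "Response to Consumer",
    "Issue Type", "Product Category", "Product ID", "Complaint Description",
]
EXPECTED_ROWS = 53


def check_output(columns, row_count):
    """Return a list of problems; an empty list means the shape matches."""
    problems = []
    missing = [name for name in EXPECTED_FIELDS if name not in columns]
    extra = [name for name in columns if name not in EXPECTED_FIELDS]
    if missing:
        problems.append(f"missing fields: {missing}")
    if extra:
        problems.append(f"unexpected fields: {extra}")
    if row_count != EXPECTED_ROWS:
        problems.append(f"expected {EXPECTED_ROWS} rows, got {row_count}")
    return problems


# A result with the right columns and 53 rows reports no problems.
print(check_output(EXPECTED_FIELDS, 53))
```

With a pandas result such as the notebook's `output` dataframe, the call would be `check_output(list(output.columns), len(output))`.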

In [152]:
import pandas as pd

# Path to the input workbook
excel_file = 'Complaint Data Input Beginner.xlsx'

# List all sheet names
sheet_names = pd.ExcelFile(excel_file).sheet_names
print(sheet_names)

['Complaints', 'Categories']


In [153]:
# Load the complaints sheet and preview the first rows
complain_df = pd.read_excel(excel_file, sheet_name='Complaints')
complain_df.head()

Unnamed: 0,Complaint ID,Date Received,Date Resolved,Timely Response,Response to Consumer,Customer ID,Receipt Number,Complaint Description
0,87-855-3122,2023-10-10 00:00:00,9/28/2023,False,In Progress,19-562-5181,2578226121,"CL1828 - Changed Mind: ""Changed mind about the..."
1,85-933-3658,6/26/2023,1/31/2023,False,Closed with Explanation,87-490-9935,3568730928,"HO2617 - Other: ""Encountered an issue with the..."
2,54-009-2490,1/17/2023,7/25/2023,True,In Progress,22-309-3946,4334891349,"HO8575 - Item Size Issue: ""The size of the hom..."
3,48-711-7789,2023-03-10 00:00:00,1/26/2023,True,Closed with Monetary Relief,12-774-1181,8199663065,"EL3785 - Item Size Issue: ""The electronic prod..."
4,41-946-8307,2023-01-12 00:00:00,8/28/2023,False,Closed with Explanation,46-018-3770,7154721456,"EL9846 - Faulty: ""The electronic product is ma..."


In [154]:
# Load the product category lookup sheet and preview it
category_df = pd.read_excel(excel_file, sheet_name='Categories')
category_df.head()

Unnamed: 0,Product Category
0,Beauty
1,Clothes
2,Electronic
3,Groceries
4,Home


In [155]:
# Split the Complaint Description into three parts
complain_df[['Product ID', 'Issue Type', 'Complaint Description']] = complain_df['Complaint Description'].str.extract(r'([^:]+)-([^:]+): "(.*)"')

# Display the updated dataframe
complain_df.head()

Unnamed: 0,Complaint ID,Date Received,Date Resolved,Timely Response,Response to Consumer,Customer ID,Receipt Number,Complaint Description,Product ID,Issue Type
0,87-855-3122,2023-10-10 00:00:00,9/28/2023,False,In Progress,19-562-5181,2578226121,"Changed mind about the clothes, no longer want...",CL1828,Changed Mind
1,85-933-3658,6/26/2023,1/31/2023,False,Closed with Explanation,87-490-9935,3568730928,Encountered an issue with the home product but...,HO2617,Other
2,54-009-2490,1/17/2023,7/25/2023,True,In Progress,22-309-3946,4334891349,The size of the home product doesn't fit as ex...,HO8575,Item Size Issue
3,48-711-7789,2023-03-10 00:00:00,1/26/2023,True,Closed with Monetary Relief,12-774-1181,8199663065,The electronic product size is not suitable fo...,EL3785,Item Size Issue
4,41-946-8307,2023-01-12 00:00:00,8/28/2023,False,Closed with Explanation,46-018-3770,7154721456,The electronic product is malfunctioning.,EL9846,Faulty
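
The pattern above leaves spaces around the captured Product ID and Issue Type (hence the `str.strip()` calls in the next cell), and because `[^:]+` is greedy it would split on the last hyphen before the colon if an Issue Type ever contained one. As a hedged alternative, here is a stricter pattern with named groups that trims the whitespace in the same pass. It uses only the standard `re` module; `split_complaint` and the group names are hypothetical, and the sample string is rebuilt from the preview rows above.

```python
import re

# Product ID: everything up to the first space or hyphen.
# Issue Type: shortest run of text before the colon.
# Description: the text between the outer double quotes.
PATTERN = re.compile(
    r'^\s*(?P<product_id>[^\s-]+)\s*-\s*'
    r'(?P<issue_type>[^:]+?)\s*:\s*'
    r'"(?P<description>.*)"\s*$'
)


def split_complaint(text):
    """Return the three parts as a dict, or None if the text does not match."""
    match = PATTERN.match(text)
    return match.groupdict() if match else None


sample = 'EL9846 - Faulty: "The electronic product is malfunctioning."'
parts = split_complaint(sample)
print(parts["product_id"], "|", parts["issue_type"], "|", parts["description"])
```

The same pattern can be passed to pandas `str.extract`, which names the resulting columns after the named groups, so a rename to the challenge's field names would still be needed.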


In [156]:
# Trim the spaces left around the hyphen by the regex split
complain_df['Product ID'] = complain_df['Product ID'].str.strip()
complain_df['Issue Type'] = complain_df['Issue Type'].str.strip()

# Display the updated dataframe
complain_df.head()

Unnamed: 0,Complaint ID,Date Received,Date Resolved,Timely Response,Response to Consumer,Customer ID,Receipt Number,Complaint Description,Product ID,Issue Type
0,87-855-3122,2023-10-10 00:00:00,9/28/2023,False,In Progress,19-562-5181,2578226121,"Changed mind about the clothes, no longer want...",CL1828,Changed Mind
1,85-933-3658,6/26/2023,1/31/2023,False,Closed with Explanation,87-490-9935,3568730928,Encountered an issue with the home product but...,HO2617,Other
2,54-009-2490,1/17/2023,7/25/2023,True,In Progress,22-309-3946,4334891349,The size of the home product doesn't fit as ex...,HO8575,Item Size Issue
3,48-711-7789,2023-03-10 00:00:00,1/26/2023,True,Closed with Monetary Relief,12-774-1181,8199663065,The electronic product size is not suitable fo...,EL3785,Item Size Issue
4,41-946-8307,2023-01-12 00:00:00,8/28/2023,False,Closed with Explanation,46-018-3770,7154721456,The electronic product is malfunctioning.,EL9846,Faulty


In [157]:
# The first two letters of the Product ID are the Category Code
complain_df['Category Code'] = complain_df['Product ID'].str[:2]

# Display the updated dataframe
complain_df.head()

Unnamed: 0,Complaint ID,Date Received,Date Resolved,Timely Response,Response to Consumer,Customer ID,Receipt Number,Complaint Description,Product ID,Issue Type,Category Code
0,87-855-3122,2023-10-10 00:00:00,9/28/2023,False,In Progress,19-562-5181,2578226121,"Changed mind about the clothes, no longer want...",CL1828,Changed Mind,CL
1,85-933-3658,6/26/2023,1/31/2023,False,Closed with Explanation,87-490-9935,3568730928,Encountered an issue with the home product but...,HO2617,Other,HO
2,54-009-2490,1/17/2023,7/25/2023,True,In Progress,22-309-3946,4334891349,The size of the home product doesn't fit as ex...,HO8575,Item Size Issue,HO
3,48-711-7789,2023-03-10 00:00:00,1/26/2023,True,Closed with Monetary Relief,12-774-1181,8199663065,The electronic product size is not suitable fo...,EL3785,Item Size Issue,EL
4,41-946-8307,2023-01-12 00:00:00,8/28/2023,False,Closed with Explanation,46-018-3770,7154721456,The electronic product is malfunctioning.,EL9846,Faulty,EL


In [158]:
# Build the matching code in the lookup table: first two letters of the category name, upper-cased
category_df['Category Code'] = category_df['Product Category'].str[:2].str.upper()

# Display the updated dataframe
category_df.head()

Unnamed: 0,Product Category,Category Code
0,Beauty,BE
1,Clothes,CL
2,Electronic,EL
3,Groceries,GR
4,Home,HO


In [159]:
# Left join so every complaint is kept and gains its Product Category
merged_df = complain_df.merge(category_df, on='Category Code', how='left')

# Display the merged dataframe
merged_df.head()

Unnamed: 0,Complaint ID,Date Received,Date Resolved,Timely Response,Response to Consumer,Customer ID,Receipt Number,Complaint Description,Product ID,Issue Type,Category Code,Product Category
0,87-855-3122,2023-10-10 00:00:00,9/28/2023,False,In Progress,19-562-5181,2578226121,"Changed mind about the clothes, no longer want...",CL1828,Changed Mind,CL,Clothes
1,85-933-3658,6/26/2023,1/31/2023,False,Closed with Explanation,87-490-9935,3568730928,Encountered an issue with the home product but...,HO2617,Other,HO,Home
2,54-009-2490,1/17/2023,7/25/2023,True,In Progress,22-309-3946,4334891349,The size of the home product doesn't fit as ex...,HO8575,Item Size Issue,HO,Home
3,48-711-7789,2023-03-10 00:00:00,1/26/2023,True,Closed with Monetary Relief,12-774-1181,8199663065,The electronic product size is not suitable fo...,EL3785,Item Size Issue,EL,Electronic
4,41-946-8307,2023-01-12 00:00:00,8/28/2023,False,Closed with Explanation,46-018-3770,7154721456,The electronic product is malfunctioning.,EL9846,Faulty,EL,Electronic
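
A left join keeps every complaint, so a Category Code that is missing from the lookup would show up as an empty Product Category rather than as an error. The sketch below shows one way to make that visible, using the `validate` and `indicator` options of pandas `merge`. The dataframes are small stand-ins for the real sheets, and `ZZ0001` is a made-up Product ID added purely to trigger the unmatched case.

```python
import pandas as pd

# Stand-in complaints; ZZ0001 is invented and has no matching category.
complaints = pd.DataFrame(
    {"Product ID": ["CL1828", "HO2617", "EL3785", "ZZ0001"]}
)
complaints["Category Code"] = complaints["Product ID"].str[:2]

# Lookup table with the same derived code as in the notebook.
categories = pd.DataFrame(
    {"Product Category": ["Beauty", "Clothes", "Electronic", "Groceries", "Home"]}
)
categories["Category Code"] = categories["Product Category"].str[:2].str.upper()

checked = complaints.merge(
    categories,
    on="Category Code",
    how="left",
    validate="many_to_one",  # raises if a code appears twice in the lookup
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# Product IDs whose code found no category in the lookup.
unmatched = checked.loc[checked["_merge"] == "left_only", "Product ID"].tolist()
print(unmatched)
```

`validate="many_to_one"` also guards against the join duplicating complaints, which could happen if two category names shared the same first two letters.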


In [160]:
# Convert Date Received and Date Resolved to datetime
merged_df['Date Received'] = pd.to_datetime(merged_df['Date Received'])
merged_df['Date Resolved'] = pd.to_datetime(merged_df['Date Resolved'])

# Filter the dataframe
filtered_df = merged_df[merged_df['Date Resolved'] > merged_df['Date Received']]

# Display the filtered dataframe
filtered_df.head()

Unnamed: 0,Complaint ID,Date Received,Date Resolved,Timely Response,Response to Consumer,Customer ID,Receipt Number,Complaint Description,Product ID,Issue Type,Category Code,Product Category
2,54-009-2490,2023-01-17,2023-07-25,True,In Progress,22-309-3946,4334891349,The size of the home product doesn't fit as ex...,HO8575,Item Size Issue,HO,Home
4,41-946-8307,2023-01-12,2023-08-28,False,Closed with Explanation,46-018-3770,7154721456,The electronic product is malfunctioning.,EL9846,Faulty,EL,Electronic
7,28-332-8360,2023-04-03,2023-10-12,True,Closed with Explanation,67-194-5451,8657000695,"Changed mind about the home product, no longer...",HO8575,Changed Mind,HO,Home
8,74-957-8616,2023-06-26,2023-08-27,True,Closed with Monetary Relief,91-602-5646,8094639490,The size of the beauty product is not suitable.,BE5681,Item Size Issue,BE,Beauty
9,83-348-7266,2023-03-14,2023-09-12,True,Closed with Non-Monetary Relief,78-272-9545,6967010786,The electronic product size is not what I expe...,EL2859,Item Size Issue,EL,Electronic
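
The raw preview shows the date columns holding a mix of already-parsed datetimes (`2023-10-10 00:00:00`) and text such as `9/28/2023`, so it is worth being explicit about how each kind is read. Below is a minimal sketch of that idea using only the standard library. `parse_mixed_date` is a hypothetical helper, and it assumes the text values are month/day/year, which the source does not state.

```python
from datetime import datetime


def parse_mixed_date(value):
    """Return a datetime for a datetime object or an 'm/d/YYYY' string."""
    if isinstance(value, datetime):
        return value  # already parsed when the sheet was read
    # Assumption: text dates are month/day/year.
    return datetime.strptime(str(value).strip(), "%m/%d/%Y")


# (Date Received, Date Resolved) pairs copied from the raw preview.
rows = [
    (datetime(2023, 10, 10), "9/28/2023"),  # resolved before received
    ("1/17/2023", "7/25/2023"),             # resolved after received
]

# Keep only the rows where the resolution comes after the receipt.
kept = [
    (received, resolved)
    for received, resolved in rows
    if parse_mixed_date(resolved) > parse_mixed_date(received)
]
print(len(kept))
```

Since pandas `Timestamp` is a subclass of `datetime`, the helper could also be applied to a dataframe column with `Series.map`. Comparing the filtered row count with the 53 rows expected in the output is a reasonable check that the dates were interpreted as intended.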


In [161]:
# Keep only the fields required in the output, in the requested order
output_columns = [
    'Complaint ID', 'Receipt Number', 'Customer ID', 'Date Received',
    'Date Resolved', 'Timely Response', 'Response to Consumer',
    'Issue Type', 'Product Category', 'Product ID', 'Complaint Description',
]
final_df = filtered_df[output_columns]

# Reset the index so rows are numbered from 0, then display the result
output = final_df.reset_index(drop=True)
output

Unnamed: 0,Complaint ID,Receipt Number,Customer ID,Date Received,Date Resolved,Timely Response,Response to Consumer,Issue Type,Product Category,Product ID,Complaint Description
0,54-009-2490,4334891349,22-309-3946,2023-01-17,2023-07-25,True,In Progress,Item Size Issue,Home,HO8575,The size of the home product doesn't fit as ex...
1,41-946-8307,7154721456,46-018-3770,2023-01-12,2023-08-28,False,Closed with Explanation,Faulty,Electronic,EL9846,The electronic product is malfunctioning.
2,28-332-8360,8657000695,67-194-5451,2023-04-03,2023-10-12,True,Closed with Explanation,Changed Mind,Home,HO8575,"Changed mind about the home product, no longer..."
3,74-957-8616,8094639490,91-602-5646,2023-06-26,2023-08-27,True,Closed with Monetary Relief,Item Size Issue,Beauty,BE5681,The size of the beauty product is not suitable.
4,83-348-7266,6967010786,78-272-9545,2023-03-14,2023-09-12,True,Closed with Non-Monetary Relief,Item Size Issue,Electronic,EL2859,The electronic product size is not what I expe...
5,10-745-1342,431458243,18-449-1413,2023-07-12,2023-08-14,True,Closed with Monetary Relief,Product not what I expected,Home,HO1212,The home product received is not what I antici...
6,39-616-3628,751826030,64-329-1333,2023-06-09,2023-12-21,True,In Progress,Product not what I expected,Clothes,CL1864,The clothes received are not what I expected.
7,97-853-1841,4539394800,23-143-2854,2023-09-08,2023-12-30,False,Closed with Explanation,Faulty,Home,HO2617,The home product is defective.
8,75-886-5419,6785823269,45-496-3428,2023-09-19,2023-09-28,False,Closed with Non-Monetary Relief,Faulty,Clothes,CL5675,The clothes received are defective.
9,18-420-2838,5285420997,88-458-5077,2023-05-22,2023-12-22,True,Closed with Monetary Relief,Item Size Issue,Beauty,BE9568,The size of the beauty product doesn't fit as ...
