### Prepping Data Challenge: Departmental December - Operations (week 52)

### Requirements
- Input the spreadsheet
- Count the number of complaints per customer
- Join the 'Department Responsible' data set to allocate the complaints to the correct department
- Create a comma-separated field for all the keywords found in the complaint for each department
- For any complaint that isn't classified, set the department to 'unknown' and the complaint cause to 'other'
- Output the file

In [2]:
import pandas as pd
import numpy as np

In [3]:
# Input the data
with pd.ExcelFile(r"\Dataprep\2021\PD 2021 Wk 52 Input.xlsx") as xl:
    dfc = pd.read_excel(xl, 'Complaints')
    dfd = pd.read_excel(xl, 'Department Responsbile')

In [4]:
dfc.head()

|   | Name | Complaint |
|---|------|-----------|
| 0 | Carl | The state of the toilets were not suitable for... |
| 1 | Carl | Call that food? The bread was so stale. I thou... |
| 2 | Anya | Urgh, the food. They run out of tomato juice i... |
| 3 | Ian | I'm really disappointed the pilot didn't fly p... |
| 4 | Sophie | I have seen tins of sardines with more leg roo... |


In [5]:
dfd.head()

|   | Keyword | Department |
|---|---------|------------|
| 0 | Toilet | Operations |
| 1 | Luggage | Operations |
| 2 | Ticket | Sales |
| 3 | Room | Marketing |
| 4 | Food | Operations |


In [7]:
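# Count the number of complaints per customer and broadcast the count back to every row
dfc['Complaints per Person'] = dfc.groupby('Name')['Complaint'].transform('size')
# Normalise the complaint text (trim whitespace, lowercase) so keyword matching is case-insensitive
dfc['Complaint'] = dfc['Complaint'].str.strip().str.lower()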
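To see why `transform('size')` is used rather than a plain count, here is a minimal sketch on a small hypothetical DataFrame (the names and complaint strings are made up, not the challenge data). `transform` returns one value per original row, so the count can be stored as a new column; `value_counts` gives the same numbers but collapsed to one row per customer.

```python
import pandas as pd

# Hypothetical toy complaints (not the challenge data)
complaints = pd.DataFrame({
    "Name": ["Carl", "Carl", "Anya", "Ian", "Carl"],
    "Complaint": ["a", "b", "c", "d", "e"],
})

# transform('size') yields one value per original row, aligned on the original index
complaints["Complaints per Person"] = (
    complaints.groupby("Name")["Complaint"].transform("size")
)

# value_counts gives the same counts, but as one entry per customer
per_customer = complaints["Name"].value_counts()

# Mapping the per-customer counts back onto the rows reproduces the transform result
mapped = complaints["Name"].map(per_customer)

print(complaints)
print(per_customer.to_dict())
```

Either route works; `transform` simply avoids the separate lookup step.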

In [8]:
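# Join the 'Department Responsible' data set to allocate the complaints to the correct department
import re
lookup = dfd.assign(Keyword=dfd['Keyword'].str.lower())   # lowercase keywords to match the complaints
keywords = '|'.join(map(re.escape, lookup['Keyword']))     # one regex alternation, punctuation escaped
# One row per keyword found (no match -> NaN), then look up the department for each keyword
df = (dfc.assign(Keyword=dfc['Complaint'].str.findall(keywords)).explode('Keyword')
         .merge(lookup, on='Keyword', how='left'))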
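The keyword pattern above is a plain alternation, so a keyword also matches inside a longer word (for example "room" inside "bathroom"). The sketch below contrasts that with a whole-word variant on hypothetical data. The word-boundary pattern with an optional trailing "s" is an assumed refinement, not part of the original solution; its single capturing group makes `str.findall` return only the keyword itself, so the result still joins to the lookup table.

```python
import re
import pandas as pd

# Hypothetical lookup and complaints for illustration only (not the challenge data)
lookup = pd.DataFrame({
    "Keyword": ["toilet", "room", "food", "baby changing"],
    "Department": ["Operations", "Marketing", "Operations", "Operations"],
})
complaints = pd.DataFrame({
    "Name": ["Carl", "Sophie", "Ian", "Oscar"],
    "Complaint": [
        "the toilets were dirty",
        "more leg room please",
        "the bathroom was fine",
        "no food, no baby changing",
    ],
})

alternation = "|".join(map(re.escape, lookup["Keyword"]))

# Plain alternation: matches a keyword anywhere, even inside another word
substring_hits = complaints["Complaint"].str.findall(alternation)

# Assumed refinement: whole words only, allowing a trailing plural "s".
# With one capturing group, findall returns just the captured keyword.
word_pattern = r"\b(" + alternation + r")s?\b"
word_hits = complaints["Complaint"].str.findall(word_pattern)

matched = (
    complaints.assign(Keyword=word_hits)
    .explode("Keyword")                       # one row per keyword; no match -> NaN
    .merge(lookup, on="Keyword", how="left")  # NaN keywords get a NaN department
)

print(substring_hits.tolist())
print(word_hits.tolist())
print(matched)
```

Whether the stricter pattern is preferable depends on the real keyword list; it trades false positives such as "bathroom" for the risk of missing other inflected forms.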

In [9]:
# For any complaint that isn't classified, label the department as 'Unknown' and the cause as 'other'
df['Complaint causes'] = df['Keyword'].fillna('other')
df['Department'] = df['Department'].fillna('Unknown')

# Create a comma-separated list of keywords for each complaint and department
group_cols = ['Name', 'Complaint', 'Department', 'Complaints per Person']
df = df.groupby(group_cols)['Complaint causes'].agg(', '.join).reset_index()
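Because `str.findall` returns every occurrence, a complaint that mentions the same keyword twice would produce that keyword twice in the joined field. The output shown here doesn't reveal whether the challenge data contains such a case, so the following is a sketch on hypothetical exploded rows. It applies the same `fillna` defaults and compares a plain join with an order-preserving de-duplicated join via `dict.fromkeys`; `join_unique` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical exploded rows (one row per keyword match), NOT the challenge output.
# "food" was found twice in Carl's complaint; Ian's complaint matched nothing.
rows = pd.DataFrame({
    "Name": ["Carl", "Carl", "Carl", "Ian"],
    "Complaint": ["food cold, luggage lost, more food"] * 3 + ["pilot was late"],
    "Keyword": ["food", "luggage", "food", None],
    "Department": ["Operations", "Operations", "Operations", None],
})

# Defaults for unclassified complaints
rows["Complaint causes"] = rows["Keyword"].fillna("other")
rows["Department"] = rows["Department"].fillna("Unknown")

def join_unique(keywords: pd.Series) -> str:
    """Comma-join keywords, dropping repeats but keeping first-seen order."""
    return ", ".join(dict.fromkeys(keywords))

keys = ["Name", "Complaint", "Department"]
plain = rows.groupby(keys)["Complaint causes"].agg(", ".join).reset_index()
summary = rows.groupby(keys)["Complaint causes"].agg(join_unique).reset_index()

print(plain)
print(summary)
```

`dict.fromkeys` keeps insertion order, so the first-found keyword still appears first.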

In [11]:
df.head(10)

|   | Name | Complaint | Department | Complaints per Person | Complaint causes |
|---|------|-----------|------------|-----------------------|------------------|
| 0 | Anya | the lugguage compartment door wasn't working, ... | Unknown | 2 | other |
| 1 | Anya | urgh, the food. they run out of tomato juice i... | Operations | 2 | food |
| 2 | Carl | call that food? the bread was so stale. i thou... | Operations | 3 | food, luggage |
| 3 | Carl | call that food? the bread was so stale. i thou... | Sales | 3 | price |
| 4 | Carl | i am only 5' 3" and i still didn't have leg ro... | Marketing | 3 | room |
| 5 | Carl | the state of the toilets were not suitable for... | Operations | 3 | toilet |
| 6 | Ian | i'm really disappointed the pilot didn't fly p... | Unknown | 1 | other |
| 7 | Oscar | the baby changing facilities were not suitable... | Operations | 1 | baby changing |
| 8 | Sophie | i have seen tins of sardines with more leg roo... | Marketing | 2 | room |
| 9 | Sophie | i have seen tins of sardines with more leg roo... | Sales | 2 | price |


In [12]:
# Output the data
df.to_csv('wk52-output.csv', index=False)
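The `index=False` argument matters here. If the index is written, it becomes an unnamed first column, which pandas reads back as `Unnamed: 0`, the same label seen in the previews above. A minimal sketch with a hypothetical two-row result, using in-memory buffers instead of a file:

```python
import io
import pandas as pd

# Hypothetical two-row result standing in for the notebook's `df`
result = pd.DataFrame({"Name": ["Anya", "Ian"], "Complaint causes": ["food", "other"]})

with_index = io.StringIO()
result.to_csv(with_index)               # index written as an unnamed first column
without_index = io.StringIO()
result.to_csv(without_index, index=False)

# Rewind both buffers before reading them back
with_index.seek(0)
without_index.seek(0)
cols_with = pd.read_csv(with_index).columns.tolist()
cols_without = pd.read_csv(without_index).columns.tolist()

print(cols_with)
print(cols_without)
```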