### Prepping Data Challenge: Call Center Agent Metrics (Week 7)

For this week’s challenge we need to create a data set for our call center agent metrics. We have 2 Excel files: one contains the monthly metrics for each agent, and the other contains the agent, leader, location, and goals.

If you work with databases, you may encounter situations where one table holds ids and one or more other tables hold the descriptors for those ids. We aren’t connecting to a database in this example; however, think about the people, location, leader, and date inputs in that way.
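
The id/descriptor relationship described above can be sketched in pandas as a lookup join. This is a minimal illustration with toy values, not the challenge data itself; the frame names `people`, `location`, and `people_with_location` are assumptions made for the example.

```python
import pandas as pd

# Toy "fact" table: each row stores only an id for its location.
people = pd.DataFrame({"id": [4, 8], "Location ID": ["ABC", "DEF"]})

# Toy lookup table: one row per id, carrying the descriptor for that id.
location = pd.DataFrame({"Location ID": ["ABC", "DEF"],
                         "Location": ["Margaree", "Halifax"]})

# A left join keeps every row of the fact table and attaches the descriptor.
people_with_location = people.merge(location, on="Location ID", how="left")
print(people_with_location)
```

A left join is used so that a person with an unknown location id would still appear, with a missing `Location`, instead of silently dropping out.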

### Requirements
- Input the data
- People, Location, Leader, and Dates:
  - Join the People, Location, and Leader data sets together
  - Remove the location id fields and the secondary leader id field
  - Create last name, first name fields for the agent and the leader
  - Limit the dates to just 2021 and join those to the People, Location, Leader step
  - Keep the id, agent name, leader 1, leader name, month start date, join, and location field
- Monthly Data
  - Union the worksheets in the input step
  - Merge the mismatched fields
  - Create a month start date
  - Remove the table names and file paths field
  - Join the data with the people - remember we need to show every agent for every month
- Goals
  - Add the goals input to the flow
  - Clean the goal data to have the goal name & numeric value
  - Add the goals to the combined people & data step
  - Be sure that you aren't increasing the row count - the goals should be additional columns
- Metrics & Met Goal Flags
  - Create a calculation for the percent of offered calls that weren't answered (for each agent, each month)
  - Create a calculation for the average duration by agent (for each agent, each month)
  - Create a calculation that determines if the sentiment score met the goal
  - Create a calculation that determines if the not answered percent met the goal
- Output the data

In [1]:
import pandas as pd
import numpy as np

### People, Location, Leader and Dates:

In [2]:
# Input the data.
with pd.ExcelFile('wk7-PeopleData.xlsx') as xlsx:
    people = pd.read_excel(xlsx, 'People')
    leader = pd.read_excel(xlsx, 'Leaders')
    location = pd.read_excel(xlsx, 'Location')
    goals = pd.read_excel(xlsx, 'Goals')
    date = pd.read_excel(xlsx, 'Date Dim')

In [3]:
people.head()

Unnamed: 0,id,first_name,last_name,Leader 1,Location ID
0,4,Fleur,Garnam,1,ABC
1,8,Tandi,Jobbings,3,ABC
2,10,Leanora,Beaver,1,ABC
3,11,Zabrina,Cranke,1,ABC
4,12,Berny,Matysiak,1,ABC


In [4]:
leader.head()

Unnamed: 0,id,first_name,last_name
0,1,Kylie,Howroyd
1,2,Madelyn,MacAne
2,3,Yorke,Befroy
3,4,Dorian,Swallow
4,5,Silvan,Gallardo


In [5]:
location.head()

Unnamed: 0,Location ID,Location
0,ABC,Margaree
1,DEF,Halifax
2,GHI,Truro
3,JKL,Digby


In [6]:
#Join the People, Location, and Leader data sets together
df = pd.merge(people, location, on = 'Location ID', how = 'left' )

In [7]:
df = pd.merge(df, leader, left_on = 'Leader 1', right_on = 'id', how= 'left')

In [8]:
df['Agent Name'] = df['last_name_x']+', '+df['first_name_x']
df['Leader Name'] = df['last_name_y']+', '+df['first_name_y']
df = df.rename(columns={'id_x':'id'})
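
The `_x` and `_y` suffixes used above are what pandas adds by default when both sides of a merge share a column name. A small sketch of the same join with explicit suffixes, which may be easier to read; the frame names and the `_agent` / `_leader` suffixes are choices made for this example only.

```python
import pandas as pd

agents = pd.DataFrame({"id": [4, 8], "first_name": ["Fleur", "Tandi"],
                       "last_name": ["Garnam", "Jobbings"], "Leader 1": [1, 3]})
leaders = pd.DataFrame({"id": [1, 3], "first_name": ["Kylie", "Yorke"],
                        "last_name": ["Howroyd", "Befroy"]})

# Overlapping column names get the given suffixes instead of _x / _y.
merged = agents.merge(leaders, left_on="Leader 1", right_on="id",
                      how="left", suffixes=("_agent", "_leader"))

merged["Agent Name"] = merged["last_name_agent"] + ", " + merged["first_name_agent"]
merged["Leader Name"] = merged["last_name_leader"] + ", " + merged["first_name_leader"]
print(merged[["id_agent", "Agent Name", "Leader Name"]])
```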

In [9]:
#Limit the dates to just 2021 and join those to the People, Location, Leader step
date = date[date['Month Start Date'].dt.year == 2021]
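
Filtering on the year itself, rather than on row position, does not depend on how the date table happens to be ordered. A minimal sketch on a made-up date dimension; `date_dim` and `dates_2021` are hypothetical names, and the sketch assumes the column is already a datetime.

```python
import pandas as pd

# Made-up date dimension spanning Nov 2020 through Feb 2022.
date_dim = pd.DataFrame(
    {"Month Start Date": pd.date_range("2020-11-01", periods=16, freq="MS")}
)

# Keep only the rows whose year is 2021, wherever they sit in the table.
dates_2021 = date_dim[date_dim["Month Start Date"].dt.year == 2021]
print(len(dates_2021))
```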

In [10]:
df2 = pd.merge(df, date, how = 'cross')
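
The cross join is what guarantees one row per agent per month, even for months where an agent has no metrics. A small sketch with made-up agents; note that `how="cross"` needs pandas 1.2 or later, and the name `scaffold` is just a label chosen here.

```python
import pandas as pd

agents = pd.DataFrame({"id": [4, 8, 10]})
months = pd.DataFrame(
    {"Month Start Date": pd.date_range("2021-01-01", periods=12, freq="MS")}
)

# Pair every agent with every month: 3 agents x 12 months.
scaffold = agents.merge(months, how="cross")

# The row count of a cross join is the product of the two inputs.
assert len(scaffold) == len(agents) * len(months)
```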

In [11]:
#Keep the id, agent name, leader 1, leader name, month start date, join, and location field
df2 = df2[['id','Agent Name','Leader 1','Leader Name','Month Start Date','Location']]

In [12]:
df2.head()

Unnamed: 0,id,Agent Name,Leader 1,Leader Name,Month Start Date,Location
0,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-01-01,Margaree
1,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-02-01,Margaree
2,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-03-01,Margaree
3,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-04-01,Margaree
4,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-05-01,Margaree


### Monthly Data

In [13]:
# Input the data.
#union the worksheets in the input step

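# Map the column names that differ between worksheets onto one set of names
# (this covers the "merge the mismatched fields" requirement).
col = {'Offered': 'Calls Offered', 'Not Answered': 'Calls Not Answered', 'Answered': 'Calls Answered'}

# Read each worksheet, tag it with its sheet name, then stack them in one concat.
sheets = []
with pd.ExcelFile('wk7-MetricData2021.xlsx') as xlsx:
    for sheet_name in xlsx.sheet_names:
        sheet = xlsx.parse(sheet_name).rename(col, axis=1)
        sheet['Month'] = sheet_name
        sheets.append(sheet)
df3 = pd.concat(sheets)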
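
The union step can be tried without the Excel file. The sketch below uses two in-memory frames standing in for worksheets with different header spellings; the `Feb` values and the names `rename_map`, `frames`, and `combined` are invented for illustration.

```python
import pandas as pd

# Stand-ins for two worksheets whose headers do not match.
sheets = {
    "Jan": pd.DataFrame({"AgentID": [1], "Offered": [477], "Answered": [459]}),
    "Feb": pd.DataFrame({"AgentID": [1], "Calls Offered": [502], "Calls Answered": [480]}),
}
rename_map = {"Offered": "Calls Offered", "Answered": "Calls Answered"}

# Align the headers first, tag each frame with its sheet name, then stack.
frames = [
    frame.rename(columns=rename_map).assign(Month=name)
    for name, frame in sheets.items()
]
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Renaming before concatenating matters: without it, `Offered` and `Calls Offered` would end up as two half-empty columns.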

In [14]:
df3.head()

Unnamed: 0,AgentID,Calls Offered,Calls Not Answered,Calls Answered,Total Duration,Sentiment,Month,Transfers
0,1,477,18,459,2385,48,Jan,
1,2,440,9,431,5720,-15,Jan,
2,3,514,1,513,2056,-25,Jan,
3,4,445,2,443,7565,-53,Jan,
4,5,399,3,396,5187,63,Jan,


In [15]:
#Create a month start date
df3['Month Start Date'] = pd.to_datetime('2021' + df3['Month']+ '01', format='%Y%b%d')
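
To see what the format string is doing, here is the same parse on a few month abbreviations. This assumes the sheet names are English three-letter abbreviations, since `%b` is matched against the current locale's month names.

```python
import pandas as pd

months = pd.Series(["Jan", "Feb", "Dec"])

# Build strings like "2021Jan01" and parse them as year, month name, day.
month_start = pd.to_datetime("2021" + months + "01", format="%Y%b%d")
print(month_start.dt.strftime("%Y-%m-%d").tolist())
# ['2021-01-01', '2021-02-01', '2021-12-01']
```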

In [16]:
#join the data with the people - remember we need to show every agent for every month
df4 = pd.merge(df2, df3, how = 'left', left_on = ['id','Month Start Date'], right_on = ['AgentID','Month Start Date']).drop(['AgentID','Month'], axis =1)
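
A left join from the agent/month table keeps every agent-month row and leaves the metrics empty where no data exists. The sketch below shows this on made-up data and adds the `validate` argument of `merge` as an optional guard against duplicate keys; the third metrics row and the frame names are invented for the example.

```python
import pandas as pd

scaffold = pd.DataFrame({
    "id": [4, 4, 8, 8],
    "Month Start Date": pd.to_datetime(["2021-01-01", "2021-02-01"] * 2),
})
# Made-up metrics: agent 8 has no row for February.
metrics = pd.DataFrame({
    "AgentID": [4, 4, 8],
    "Month Start Date": pd.to_datetime(["2021-01-01", "2021-02-01", "2021-01-01"]),
    "Calls Offered": [445, 606, 500],
})

joined = scaffold.merge(
    metrics,
    how="left",
    left_on=["id", "Month Start Date"],
    right_on=["AgentID", "Month Start Date"],
    validate="one_to_one",  # raises MergeError if either side has duplicate keys
).drop(columns="AgentID")

# Every agent-month survives; the missing month shows up as NaN.
assert len(joined) == len(scaffold)
print(joined)
```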

In [17]:
df4.head()

Unnamed: 0,id,Agent Name,Leader 1,Leader Name,Month Start Date,Location,Calls Offered,Calls Not Answered,Calls Answered,Total Duration,Sentiment,Transfers
0,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-01-01,Margaree,445.0,2.0,443.0,7565.0,-53.0,
1,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-02-01,Margaree,606.0,16.0,590.0,4848.0,97.0,
2,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-03-01,Margaree,413.0,75.0,338.0,2478.0,23.0,
3,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-04-01,Margaree,760.0,12.0,748.0,6080.0,21.0,
4,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-05-01,Margaree,486.0,22.0,464.0,1458.0,-33.0,23.0


### Goals

In [18]:
#add the goals input to the flow
goals.head()

Unnamed: 0,Goals
0,Not Answered Percent < 5
1,Sentiment Score >= 0


In [19]:
#clean the goal data to have the goal name & numeric value
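# Raw strings avoid invalid-escape warnings; regex=True is required on pandas 2.0+.
goals['goal name'] = goals['Goals'].str.replace(r'\s*[<>]=?\s*\d+$', '', regex=True)
goals['numeric value'] = goals['Goals'].str.extract(r'(\d+)', expand=False)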
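
As an alternative sketch, a single pattern with named groups can pull out the goal name, the comparison operator, and the target value in one pass. The group names `goal_name`, `operator`, and `value` are choices made for this example, and the pattern assumes each goal ends in an operator followed by a whole number.

```python
import pandas as pd

goals = pd.DataFrame({"Goals": ["Not Answered Percent < 5", "Sentiment Score >= 0"]})

# Named groups become the column names of the extracted frame.
pattern = r"^(?P<goal_name>.+?)\s*(?P<operator>[<>]=?)\s*(?P<value>\d+)$"
parsed = goals["Goals"].str.extract(pattern)
parsed["value"] = parsed["value"].astype(float)
print(parsed)
```

Keeping the operator as data makes it possible to see at a glance which goals are upper limits and which are lower limits.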

  


In [20]:
#add the goals to the combined people & data step
#be sure that you aren't increasing the row count - the goals should be additional columns
df4['Not Answered Percent < 5'] = float(goals.iloc[0]['numeric value'])
df4['Sentiment Score >= 0'] = float(goals.iloc[1]['numeric value'])
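
The same idea can be written without hard-coding the row positions or column names, while checking that the row count is unchanged. A minimal sketch on made-up data; `data`, `goal_columns`, and `rows_before` are hypothetical names, and only the first sentiment value comes from the data above.

```python
import pandas as pd

data = pd.DataFrame({"id": [4, 8, 10], "Sentiment": [-53.0, 12.0, 40.0]})
goals = pd.DataFrame({
    "Goals": ["Not Answered Percent < 5", "Sentiment Score >= 0"],
    "numeric value": ["5", "0"],
})

rows_before = len(data)

# One new column per goal, named after the goal text, holding its numeric target.
goal_columns = {
    goal: float(value)
    for goal, value in zip(goals["Goals"], goals["numeric value"])
}
data = data.assign(**goal_columns)

# Adding constant columns must not change the number of rows.
assert len(data) == rows_before
```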

In [21]:
# Check the result: the goals should appear as two extra columns, not extra rows
df4.head()

Unnamed: 0,id,Agent Name,Leader 1,Leader Name,Month Start Date,Location,Calls Offered,Calls Not Answered,Calls Answered,Total Duration,Sentiment,Transfers,Not Answered Percent < 5,Sentiment Score >= 0
0,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-01-01,Margaree,445.0,2.0,443.0,7565.0,-53.0,,5.0,0.0
1,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-02-01,Margaree,606.0,16.0,590.0,4848.0,97.0,,5.0,0.0
2,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-03-01,Margaree,413.0,75.0,338.0,2478.0,23.0,,5.0,0.0
3,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-04-01,Margaree,760.0,12.0,748.0,6080.0,21.0,,5.0,0.0
4,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-05-01,Margaree,486.0,22.0,464.0,1458.0,-33.0,23.0,5.0,0.0


### Metrics & Met Goal Flags

In [22]:
#create a calculation for the percent of offered that weren't answered (for each agent, each month)
df4["Not Answered Rate"] = (df4['Calls Not Answered']/df4['Calls Offered']).round(3)

In [23]:
#create a calculation for the average duration by agent (for each agent, each month)
df4['Agent Avg Duration'] = (df4['Total Duration']/df4['Calls Answered']).round()
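
Both ratio calculations divide by a call count, which could in principle be zero. One way to guard against that, sketched on made-up rows, is to treat a zero denominator as missing before dividing; whether zeros actually occur in the challenge data is not something this sketch assumes.

```python
import numpy as np
import pandas as pd

calls = pd.DataFrame({
    "Calls Answered": [443.0, 0.0, np.nan],
    "Total Duration": [7565.0, 0.0, np.nan],
})

# Replace a zero denominator with NaN so the result is NaN rather than inf or 0/0.
answered = calls["Calls Answered"].where(calls["Calls Answered"] != 0)
calls["Agent Avg Duration"] = (calls["Total Duration"] / answered).round()
print(calls)
```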

In [24]:
#create a calculation that determines if the sentiment score met the goal
df4['Met Sentiment Goal'] = np.where(df4['Sentiment'].isna(), '', df4['Sentiment'] >= df4['Sentiment Score >= 0'])

In [25]:
#create a calculation that determines if the not answered percent met the goal
df4['Met Not Answered Rate'] = np.where(df4['Not Answered Rate'].isna(), '', df4['Not Answered Rate'] < df4['Not Answered Percent < 5']/100)
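
Mixing `''` with booleans in `np.where` turns the whole column into strings (`'True'`, `'False'`, `''`). If real booleans with a proper missing value are preferred, pandas' nullable `boolean` dtype is one option. This is a sketch of that alternative on made-up values, not what the cells above do.

```python
import numpy as np
import pandas as pd

sentiment = pd.Series([-53.0, 97.0, np.nan])
goal = 0.0

# Compare first, then blank out the rows where there was nothing to compare.
met_goal = (sentiment >= goal).astype("boolean")
met_goal[sentiment.isna()] = pd.NA
print(met_goal)
```

When written to CSV, the missing flag comes out as an empty field, so the exported file looks the same as with the string approach.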

In [26]:
# Select and order the output columns (each column listed once).
df = df4[['id','Agent Name','Leader 1','Leader Name','Month Start Date','Location','Calls Answered','Calls Not Answered',
          'Met Not Answered Rate','Not Answered Percent < 5','Calls Offered','Total Duration','Agent Avg Duration',
          'Transfers','Sentiment','Sentiment Score >= 0','Met Sentiment Goal']]

In [27]:
df.head()

Unnamed: 0,id,Agent Name,Leader 1,Leader Name,Month Start Date,Location,Calls Answered,Calls Not Answered,Met Not Answered Rate,Not Answered Percent < 5,Calls Offered,Total Duration,Agent Avg Duration,Transfers,Sentiment,Sentiment Score >= 0,Met Sentiment Goal
0,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-01-01,Margaree,443.0,2.0,True,5.0,445.0,7565.0,17.0,,-53.0,0.0,False
1,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-02-01,Margaree,590.0,16.0,True,5.0,606.0,4848.0,8.0,,97.0,0.0,True
2,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-03-01,Margaree,338.0,75.0,False,5.0,413.0,2478.0,7.0,,23.0,0.0,True
3,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-04-01,Margaree,748.0,12.0,True,5.0,760.0,6080.0,8.0,,21.0,0.0,True
4,4,"Garnam, Fleur",1,"Howroyd, Kylie",2021-05-01,Margaree,464.0,22.0,True,5.0,486.0,1458.0,3.0,23.0,-33.0,0.0,False


In [28]:
#output the dataset
df.to_csv('wk7-output.csv', index=False)