# PhoneNow - Call Centre Report

In this task, I'm a data analyst consultant for **PhoneNow**, a big telecom company. I've just received an email from Claire, a Call Centre Manager, who is looking for transparency and insight into the Call Centre data: for example, the total number of calls answered and abandoned, speed of answer, length of calls and overall satisfaction. She wants an accurate overview of long-term trends in customer and agent behaviour.

The goal is therefore to create a dashboard on Call Centre trends that she can use as a basis for discussion with management.

## Strategy
To tackle this problem, I'm first going to perform exploratory analysis in this notebook, find patterns that can actually go into a dashboard, then implement them in Power BI.

Looking at the data provided by Claire, I think it makes sense to break down the metrics and KPIs into 3 categories:
- **Customer Experience**
- **Call Volume and Efficiency**
- **Agent Performance**


In [213]:
# DATA MANIPULATION
import os
import copy
import pandas as pd
import numpy as np

# DATA VIZ
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use("fivethirtyeight")
# plt.rcParams["figure.figsize"] = [8, 5]
# plt.rcParams["figure.dpi"] = 100
plt.rcParams["figure.facecolor"] = "white"

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

import warnings

warnings.filterwarnings("ignore")
sns.set(color_codes=True)

In [214]:
PHONENOW_COLOURS = [
    "#0072CE",
    "#B4B4B3",
    "#79B8F3",
    "#FDB927",
    "#F7941D",
    "#4CB748",
    "#2E3192",
]
DIVERGENT_COLOUR_GRADIENT = [
    "#e2f1fc",
    "#b9dcfa",
    "#8cc7f7",
    "#5eb1f3",
    "#39a0f1",
    "#0691ef",
]
sns.set_palette(PHONENOW_COLOURS)

# Loading and Cleaning Data

In [215]:
df = pd.read_csv("./call-centre-dataset.csv")
df.head()

Unnamed: 0,Call Id,Agent,Date,Time,Topic,Answered (Y/N),Resolved,Speed of answer in seconds,AvgTalkDuration,Satisfaction rating
0,ID0001,Diane,2021-01-01,9:12:58,Contract related,Y,Y,109.0,0:02:23,3.0
1,ID0002,Becky,2021-01-01,9:12:58,Technical Support,Y,N,70.0,0:04:02,3.0
2,ID0003,Stewart,2021-01-01,9:47:31,Contract related,Y,Y,10.0,0:02:11,3.0
3,ID0004,Greg,2021-01-01,9:47:31,Contract related,Y,Y,53.0,0:00:37,2.0
4,ID0005,Becky,2021-01-01,10:00:29,Payment related,Y,Y,95.0,0:01:00,3.0


In [216]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 10 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Call Id                     5000 non-null   object 
 1   Agent                       5000 non-null   object 
 2   Date                        5000 non-null   object 
 3   Time                        5000 non-null   object 
 4   Topic                       5000 non-null   object 
 5   Answered (Y/N)              5000 non-null   object 
 6   Resolved                    5000 non-null   object 
 7   Speed of answer in seconds  4054 non-null   float64
 8   AvgTalkDuration             4054 non-null   object 
 9   Satisfaction rating         4054 non-null   float64
dtypes: float64(2), object(8)
memory usage: 390.8+ KB


In [217]:
df = df.rename(
    columns={
        "Call Id": "call_id",
        "Agent": "agent",
        "Date": "date",
        "Time": "time",
        "Topic": "topic",
        "Answered (Y/N)": "answered",
        "Resolved": "resolved",
        "Speed of answer in seconds": "answerSpeed",
        "AvgTalkDuration": "avgTalkDuration",
        "Satisfaction rating": "rating",
    }
)

df["date"] = pd.to_datetime(df["date"])
df["time"] = pd.to_datetime(df["time"]).dt.time
df["answered"] = df["answered"].map({"Y": 1, "N": 0})
df["resolved"] = df["resolved"].map({"Y": 1, "N": 0})
df["avgTalkDuration"] = pd.to_datetime(df["avgTalkDuration"]).dt.time
df["rating"] = df["rating"].astype("Int64")
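Two details of this cleaning step are easy to miss: `.map()` silently turns any label other than `Y`/`N` into `NaN`, and talk time is a duration, not a clock time. Here is a minimal sketch of how both could be handled, using the first two rows shown above plus one hypothetical unanswered call (the third row is an assumption, not taken from the data).

```python
import pandas as pd

# First two rows come from the raw data above; the third is a hypothetical
# unanswered call with no talk time.
raw = pd.DataFrame(
    {
        "Answered (Y/N)": ["Y", "Y", "N"],
        "AvgTalkDuration": ["0:02:23", "0:04:02", None],
    }
)

answered = raw["Answered (Y/N)"].map({"Y": 1, "N": 0})
# Unexpected labels would become NaN, so check that none slipped through.
unmapped = int(answered.isna().sum())

# Parse H:MM:SS strings as durations; missing values become NaT.
talk = pd.to_timedelta(raw["AvgTalkDuration"])
talk_seconds = talk.dt.total_seconds()
print(unmapped, talk_seconds.tolist())
```

Parsing straight to a timedelta would avoid the later round trip through `astype(str)`, but the notebook keeps the `time` representation until the call-length section.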

In [218]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   call_id          5000 non-null   object        
 1   agent            5000 non-null   object        
 2   date             5000 non-null   datetime64[ns]
 3   time             5000 non-null   object        
 4   topic            5000 non-null   object        
 5   answered         5000 non-null   int64         
 6   resolved         5000 non-null   int64         
 7   answerSpeed      4054 non-null   float64       
 8   avgTalkDuration  4054 non-null   object        
 9   rating           4054 non-null   Int64         
dtypes: Int64(1), datetime64[ns](1), float64(1), int64(2), object(5)
memory usage: 395.6+ KB


In [219]:
df.head()

Unnamed: 0,call_id,agent,date,time,topic,answered,resolved,answerSpeed,avgTalkDuration,rating
0,ID0001,Diane,2021-01-01,09:12:58,Contract related,1,1,109.0,00:02:23,3
1,ID0002,Becky,2021-01-01,09:12:58,Technical Support,1,0,70.0,00:04:02,3
2,ID0003,Stewart,2021-01-01,09:47:31,Contract related,1,1,10.0,00:02:11,3
3,ID0004,Greg,2021-01-01,09:47:31,Contract related,1,1,53.0,00:00:37,2
4,ID0005,Becky,2021-01-01,10:00:29,Payment related,1,1,95.0,00:01:00,3



# Customer Experience
In this section we're going to focus on the metrics we'd think about when improving customer experience:
- [x] What challenges prompt customer calls (topics)?
- [x] What's the resolution rate for each of these challenges?
- [x] How long do these calls take per topic?
- [x] How are the rating scores per topic?

## Topics

In [221]:
df['topic'].unique()

array(['Contract related', 'Technical Support', 'Payment related',
       'Admin Support', 'Streaming'], dtype=object)

In [222]:
topic_counts = df["topic"].value_counts()
topic_counts.head()

topic
Streaming            1022
Technical Support    1019
Payment related      1007
Contract related      976
Admin Support         976
Name: count, dtype: int64

There's practically no difference in the number of calls per topic (976 to 1,022 each), so nothing looks alarming yet. Perhaps we can check whether any topic had unusual numbers at some point in time.
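To put a number on how even the split is, the counts can be turned into shares. This is a small sketch that rebuilds `topic_counts` from the output above; the names `topic_share` and `spread` are mine.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
topic_counts = pd.Series(
    {
        "Streaming": 1022,
        "Technical Support": 1019,
        "Payment related": 1007,
        "Contract related": 976,
        "Admin Support": 976,
    }
)

# Share of all calls per topic, in percent.
topic_share = (topic_counts / topic_counts.sum() * 100).round(2)
# Gap between the busiest and the quietest topic, in percentage points.
spread = round(topic_share.max() - topic_share.min(), 2)

print(topic_share)
print(spread)
```

Every topic sits close to a fifth of the 5,000 calls, with less than one percentage point between the largest and smallest share.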

In [223]:
fig = px.bar(
    x=topic_counts.index,
    y=topic_counts.values,
    labels={"x": "Topics", "y": "Number of Calls"},
    title="Number of Calls per Topic",
)
fig.show()

In [224]:
daily_calls_per_topic = (
    df.groupby(["date", "topic"])
    .size()
    .to_frame(name="count")
    .reset_index()
    .sort_values(by=["date", "topic"])
)

In [225]:
daily_calls_per_topic.head(10)

Unnamed: 0,date,topic,count
0,2021-01-01,Admin Support,10
1,2021-01-01,Contract related,11
2,2021-01-01,Payment related,11
3,2021-01-01,Streaming,10
4,2021-01-01,Technical Support,16
5,2021-01-02,Admin Support,8
6,2021-01-02,Contract related,14
7,2021-01-02,Payment related,7
8,2021-01-02,Streaming,11
9,2021-01-02,Technical Support,20


In [226]:
fig = go.Figure()
for topic in daily_calls_per_topic["topic"].unique():
    data_subset = daily_calls_per_topic[daily_calls_per_topic["topic"] == topic]
    fig.add_trace(
        go.Scatter(
            x=data_subset["date"],
            y=data_subset["count"],
            mode="lines",
            name=topic,
        )
    )

# Customize the graph layout
fig.update_layout(
    title="Number of Calls per Day by Topic",
    xaxis_title="Date",
    yaxis_title="Number of Calls",
    legend_title="Topic",
)

# Display the graph
fig.show()

We learn from this chart that:
- Admin Support & Contract related: No discernible pattern.
- Payment related: Calls seem to cycle every 4 to 6 days, which may point to a weekly pattern, so let's aggregate these calls by day of week.
- Technical Support: Calls seem to follow a monthly cycle, so we'll also aggregate them by day of month.
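Eyeballing cycles on a line chart is unreliable, so one way to check a suspected weekly pattern is the autocorrelation at a 7-day lag. The sketch below uses synthetic counts with a built-in weekly cycle (hypothetical data, not the call centre data) just to show the idea with `Series.autocorr`.

```python
import pandas as pd

# Synthetic daily counts that repeat every 7 days (hypothetical data).
weekly_shape = [10, 12, 14, 13, 11, 5, 4]
daily_counts = pd.Series(weekly_shape * 8)

# Correlation of the series with itself shifted by 7 and by 3 days.
lag7 = daily_counts.autocorr(lag=7)
lag3 = daily_counts.autocorr(lag=3)

print(round(lag7, 3), round(lag3, 3))
```

On the real data, the same call could be applied to the `count` column of `daily_calls_per_topic` after filtering to one topic; a lag-7 value close to zero would suggest the apparent cycle is mostly noise.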

In [227]:
df["date"] = pd.to_datetime(df["date"])
df["dayOfWeek"] = df["date"].dt.day_name()
payment_related_calls = df[df["topic"] == "Payment related"]

callsByDayOfWeek = (
    payment_related_calls.groupby("dayOfWeek")["call_id"]
    .count()
    .reset_index()
    .sort_values(by="call_id")
)

In [228]:
callsByDayOfWeek.head()

Unnamed: 0,dayOfWeek,call_id
0,Friday,132
4,Thursday,133
5,Tuesday,142
1,Monday,145
6,Wednesday,149
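Sorting by count mixes up the weekdays, which makes a weekly pattern harder to read. A possible alternative is to sort in calendar order with an ordered categorical. The sketch below uses only the five rows shown above (the weekend rows are not in the `head()` output, so they are left out); `day_order` and `calls_in_week_order` are names I made up.

```python
import pandas as pd

day_order = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# The five rows printed above.
calls = pd.DataFrame(
    {
        "dayOfWeek": ["Friday", "Thursday", "Tuesday", "Monday", "Wednesday"],
        "call_id": [132, 133, 142, 145, 149],
    }
)

# An ordered categorical makes sort_values follow the calendar, not the alphabet.
calls["dayOfWeek"] = pd.Categorical(
    calls["dayOfWeek"], categories=day_order, ordered=True
)
calls_in_week_order = calls.sort_values("dayOfWeek").reset_index(drop=True)

print(calls_in_week_order)
```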


In [229]:
fig = px.bar(
    callsByDayOfWeek,
    x="dayOfWeek",
    y="call_id",
    title="Number of Calls Related to Payment per Day of the Week",
    labels={"dayOfWeek": "Day of the Week", "call_id": "Number of Calls"},
)
fig.show()

In [230]:
df["dayOfMonth"] = df["date"].dt.day
techSupportCalls = df[
    df["topic"] == "Technical Support"
]

techCallsByDayOfMonth = (
    techSupportCalls.groupby("dayOfMonth")["call_id"]
    .count()
    .reset_index()
    .sort_values(by="dayOfMonth")
)

In [231]:
techCallsByDayOfMonth.head()

Unnamed: 0,dayOfMonth,call_id
0,1,43
1,2,34
2,3,38
3,4,43
4,5,38


In [232]:
fig = px.line(
    techCallsByDayOfMonth,
    x="dayOfMonth",
    y="call_id",
    title="Number of Calls Related to Technical Support per Day of the Month",
    labels={"dayOfMonth": "Day of the Month", "call_id": "Number of Calls"},
)
fig.show()

On the dashboard, we'll aggregate calls per topic by day of month and day of week.

## Topic Resolutions

In [233]:
resolved_counts = df.groupby(["topic", "resolved"]).size().reset_index(name="count")
resolved_counts.head()

Unnamed: 0,topic,resolved,count
0,Admin Support,0,253
1,Admin Support,1,723
2,Contract related,0,267
3,Contract related,1,709
4,Payment related,0,278


In [234]:
fig = px.bar(
    resolved_counts,
    x="topic",
    y="count",
    color="resolved",
    title="Resolved and Unresolved Calls per Topic",
    labels={"topic": "Topic", "count": "Number of Calls", "resolved": "Resolved"},
    color_continuous_scale=PHONENOW_COLOURS
)
fig.update_layout(barmode="stack")
fig.show()

So roughly 27% of calls aren't resolved, and the rate barely changes from topic to topic (about 26% to 28% in the rows shown above).
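The unresolved share per topic can be computed directly from the counts, which is easier to compare than stacked bars. A minimal sketch, using the two topics whose resolved and unresolved rows both appear in the output above:

```python
import pandas as pd

# Rows copied from the resolved_counts output above.
resolved_counts = pd.DataFrame(
    {
        "topic": ["Admin Support", "Admin Support",
                  "Contract related", "Contract related"],
        "resolved": [0, 1, 0, 1],
        "count": [253, 723, 267, 709],
    }
)

# Divide each row by its topic total to get a percentage.
topic_totals = resolved_counts.groupby("topic")["count"].transform("sum")
resolved_counts["share"] = (resolved_counts["count"] / topic_totals * 100).round(2)

unresolved_share = resolved_counts[resolved_counts["resolved"] == 0].set_index(
    "topic"
)["share"]
print(unresolved_share)
```

On the full frame, `df.groupby("topic")["resolved"].mean()` should give the resolved share per topic in one step.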

In [235]:
rating_counts = df.groupby(["topic", "rating"]).size().reset_index(name="count")
rating_counts.head()

Unnamed: 0,topic,rating,count
0,Admin Support,1,72
1,Admin Support,2,76
2,Admin Support,3,259
3,Admin Support,4,217
4,Admin Support,5,171


In [236]:
fig = px.bar(
    rating_counts,
    x="topic",
    y="count",
    color="rating",
    title="Distribution of Ratings per Topic",
    labels={"topic": "Topic", "count": "Number of Calls", "rating": "Rating"},
    color_continuous_scale=PHONENOW_COLOURS,
)

fig.update_layout(barmode="stack")

fig.show()

Since drilling down into specific topics reveals nothing interesting, the dashboard will only show overall aggregates like the ones below:

In [237]:
resolved_percentage = df["resolved"].mean() * 100
unresolved_percentage = 100 - resolved_percentage

resolution_df = pd.DataFrame(
    {
        "Resolution Status": ["Resolved", "Unresolved"],
        "Percentage": [resolved_percentage, unresolved_percentage],
    }
)

resolution_df.head()

Unnamed: 0,Resolution Status,Percentage
0,Resolved,72.92
1,Unresolved,27.08


In [238]:
rating_percentages = (
    df["rating"]
    .value_counts(normalize=True)
    .sort_index() * 100)

rating_percentages_df = pd.DataFrame(
    {
        "Rating": rating_percentages.index,
        "Percentage": rating_percentages.values
    }
)

rating_percentages_df.head()

Unnamed: 0,Rating,Percentage
0,1,10.286137
1,2,9.76813
2,3,30.044401
3,4,29.107055
4,5,20.794277


In [239]:
resolution_fig = px.bar(
    resolution_df,
    x="Resolution Status",
    y="Percentage",
    color="Resolution Status",
    labels={"Resolution Status": "Resolution Status", "Percentage": "Percentage"},
    title="Resolution Rates",
    width=300,
)

fig_rating_percentages = px.bar(
    x=rating_percentages.index,
    y=rating_percentages.values,
    labels={"x": "Rating", "y": "Percentage"},
    title="Rating Percentages",
    width=700,
)

fig = make_subplots(
    rows=1, cols=2, subplot_titles=("Resolution Rates", "Rating Percentages")
)

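# px.bar with color= builds one trace per category, so add all of them;
# adding only data[0] would draw just the "Resolved" bar.
for trace in resolution_fig["data"]:
    fig.add_trace(trace, row=1, col=1)
fig.add_trace(fig_rating_percentages["data"][0], row=1, col=2)

fig.update_layout(width=1100, showlegend=False)

fig.show()

One thing to watch out for: `px.bar` with `color=` returns one trace per category, so all of them have to be added to the subplot or bars go missing.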

## Call Lengths by Topic

We can do this by visualizing the total number of minutes spent on each topic per day.

In [240]:
df.head()

Unnamed: 0,call_id,agent,date,time,topic,answered,resolved,answerSpeed,avgTalkDuration,rating,dayOfWeek,dayOfMonth
0,ID0001,Diane,2021-01-01,09:12:58,Contract related,1,1,109.0,00:02:23,3,Friday,1
1,ID0002,Becky,2021-01-01,09:12:58,Technical Support,1,0,70.0,00:04:02,3,Friday,1
2,ID0003,Stewart,2021-01-01,09:47:31,Contract related,1,1,10.0,00:02:11,3,Friday,1
3,ID0004,Greg,2021-01-01,09:47:31,Contract related,1,1,53.0,00:00:37,2,Friday,1
4,ID0005,Becky,2021-01-01,10:00:29,Payment related,1,1,95.0,00:01:00,3,Friday,1


In [241]:
df["avgTalkDuration"] = pd.to_timedelta(df["avgTalkDuration"].astype(str))

dailyMinutesPerTopic = (
    df.groupby(["date", "topic"])["avgTalkDuration"]
    .sum()
    .reset_index()
    .sort_values(by=["date", "topic"])
)

dailyMinutesPerTopic["avgTalkMinutes"] = (
    (dailyMinutesPerTopic["avgTalkDuration"].dt.total_seconds() / 60).round(2)
)
dailyMinutesPerTopic.drop(columns=["avgTalkDuration"], inplace=True)

dailyMinutesPerTopic.head()

Unnamed: 0,date,topic,avgTalkMinutes
0,2021-01-01,Admin Support,37.78
1,2021-01-01,Contract related,27.52
2,2021-01-01,Payment related,18.15
3,2021-01-01,Streaming,26.85
4,2021-01-01,Technical Support,48.55


In [242]:
fig = px.line(
    dailyMinutesPerTopic,
    x="date",
    y="avgTalkMinutes",
    color="topic",
    labels={"date": "Date", "avgTalkMinutes": "Minutes Spent", "topic": "Topic"},
    title="Total Minutes Spent on a Topic per Day",
)
fig.show()

Again, nothing interesting topic-wise, so we'll be fine looking at total minutes and disregarding the topic; the breakdown doesn't add anything useful.
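Total minutes per day depend on how many calls came in, so a per-call average is another way to compare topics. The sketch below uses only the first five rows of the data shown earlier, so the numbers illustrate the calculation and say nothing about the full dataset.

```python
import pandas as pd

# First five rows of the dataset, as shown in df.head().
sample = pd.DataFrame(
    {
        "topic": [
            "Contract related", "Technical Support", "Contract related",
            "Contract related", "Payment related",
        ],
        "avgTalkDuration": pd.to_timedelta(
            ["00:02:23", "00:04:02", "00:02:11", "00:00:37", "00:01:00"]
        ),
    }
)

# Convert to seconds first, then average per topic and express in minutes.
sample["talkSeconds"] = sample["avgTalkDuration"].dt.total_seconds()
mean_minutes = (sample.groupby("topic")["talkSeconds"].mean() / 60).round(2)

print(mean_minutes)
```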

# Call Volume and Efficiency
In this section we'll look at metrics to focus on in order to improve efficiency in the call centre:
- [x] Call volume distribution by hour, day and month
- [x] Compare with the number of agents per hour, day and month
- [x] How does the above affect the following:
    - [x] Abandoned calls rate
    - [x] Speed of answer
- [x] Call duration per day, week or month

## Call Volume Distribution

In [243]:
df["hour"] = (df["time"].astype(str)).str[:2].astype(int)
df["month"] = df["date"].dt.month_name()
df.head()

Unnamed: 0,call_id,agent,date,time,topic,answered,resolved,answerSpeed,avgTalkDuration,rating,dayOfWeek,dayOfMonth,hour,month
0,ID0001,Diane,2021-01-01,09:12:58,Contract related,1,1,109.0,0 days 00:02:23,3,Friday,1,9,January
1,ID0002,Becky,2021-01-01,09:12:58,Technical Support,1,0,70.0,0 days 00:04:02,3,Friday,1,9,January
2,ID0003,Stewart,2021-01-01,09:47:31,Contract related,1,1,10.0,0 days 00:02:11,3,Friday,1,9,January
3,ID0004,Greg,2021-01-01,09:47:31,Contract related,1,1,53.0,0 days 00:00:37,2,Friday,1,9,January
4,ID0005,Becky,2021-01-01,10:00:29,Payment related,1,1,95.0,0 days 00:01:00,3,Friday,1,10,January


In [244]:
lineTimeUnits = {
    'hour': 'Hour of Day',
    'dayOfMonth': 'Day of Month',
}

barTimeUnits = {
    "dayOfWeek": "Day of Week",
    "month": "Month",
}

def getCallPerTimeUnit(df, timeUnit, valueToCount):
    return df.groupby(timeUnit)[valueToCount].nunique().reset_index()


def createTimeUnitTrace(callsPerTimeUnit, timeUnit, valueToCount, traceName, mode):
    if mode == "bar":
        return go.Bar(
            x=callsPerTimeUnit[timeUnit],
            y=callsPerTimeUnit[valueToCount],
            name=traceName,
        )
    elif mode == "line":
        return go.Scatter(
            x=callsPerTimeUnit[timeUnit],
            y=callsPerTimeUnit[valueToCount],
            mode="lines",
            name=traceName,
        )


def viewCallsPerTimeUnit(units, timeUnits, valueToCount, title):
    fig = make_subplots(
        rows=1, cols=len(units), subplot_titles=list(timeUnits.values())
    )

    for i, unit in enumerate(units):
        name = timeUnits[unit]
        callsPerTimeUnit = getCallPerTimeUnit(df, unit, valueToCount)
        mode = "bar" if unit in barTimeUnits else "line"
        trace = createTimeUnitTrace(callsPerTimeUnit, unit, valueToCount, name, mode)

        fig.add_trace(trace, row=1, col=i + 1)

    fig.update_layout(title=title, height=400, showlegend=False)
    fig.show()

In [245]:
units = list(lineTimeUnits.keys())
viewCallsPerTimeUnit(units, lineTimeUnits, 'call_id', 'Number of Calls')

So the working hours are 9am to 6pm, and the 14 calls recorded in the 6pm hour were perhaps taken late, i.e. an agent took one more call even though their shift was over.
It's also interesting that customers call more in the first two weeks of the month and less towards the end.
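To look at those late calls individually, the rows at or after closing time can be filtered out. This is a sketch on three hypothetical calls; the 18:00 closing hour is my assumption based on the chart, not something the data states.

```python
import datetime as dt

import pandas as pd

# Three hypothetical calls; times are datetime.time objects like the "time" column.
calls = pd.DataFrame(
    {
        "call_id": ["A", "B", "C"],
        "time": [dt.time(9, 12, 58), dt.time(17, 59, 0), dt.time(18, 0, 0)],
    }
)

CLOSING_HOUR = 18  # assumed end of the working day

# Read the hour straight from the time object instead of slicing a string.
calls["hour"] = calls["time"].map(lambda t: t.hour)
late_calls = calls[calls["hour"] >= CLOSING_HOUR]

print(late_calls)
```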

In [246]:
units = list(barTimeUnits.keys())
viewCallsPerTimeUnit(units, barTimeUnits, 'call_id', "Number of Calls")

Based on this, we should expect more calls on Mondays and Saturdays.

In [247]:
units = list(lineTimeUnits.keys())
viewCallsPerTimeUnit(units, lineTimeUnits, "agent", "Number of Agents")

In [248]:
units = list(barTimeUnits.keys())
viewCallsPerTimeUnit(units, barTimeUnits, "agent", "Number of Agents")

## Abandoned Calls

In [249]:
df.head()

Unnamed: 0,call_id,agent,date,time,topic,answered,resolved,answerSpeed,avgTalkDuration,rating,dayOfWeek,dayOfMonth,hour,month
0,ID0001,Diane,2021-01-01,09:12:58,Contract related,1,1,109.0,0 days 00:02:23,3,Friday,1,9,January
1,ID0002,Becky,2021-01-01,09:12:58,Technical Support,1,0,70.0,0 days 00:04:02,3,Friday,1,9,January
2,ID0003,Stewart,2021-01-01,09:47:31,Contract related,1,1,10.0,0 days 00:02:11,3,Friday,1,9,January
3,ID0004,Greg,2021-01-01,09:47:31,Contract related,1,1,53.0,0 days 00:00:37,2,Friday,1,9,January
4,ID0005,Becky,2021-01-01,10:00:29,Payment related,1,1,95.0,0 days 00:01:00,3,Friday,1,10,January


In [250]:
def viewBinaryDistrbutions(binaryCol, trueValName, falseValName, title, legendTitle):
    true_data = df[df[binaryCol] == 1].groupby("date")["call_id"].nunique()
    false_data = df[df[binaryCol] == 0].groupby("date")["call_id"].nunique()

    true_trace = go.Scatter(
        x=true_data.index,
        y=true_data.values,
        mode="lines",
        name=trueValName,
    )
    false_trace = go.Scatter(
        x=false_data.index,
        y=false_data.values,
        mode="lines",
        name=falseValName,
    )

    fig = go.Figure(data=[true_trace, false_trace])

    fig.update_layout(
        title=title,
        xaxis_title="Date",
        yaxis_title="Number of Calls",
        legend_title=legendTitle,
    )

    fig.show()

In [251]:
viewBinaryDistrbutions(
    'answered',
    'Answered Calls',
    'Unanswered Calls',
    'Number of Answered and Unanswered Calls',
    'Call Status'
)

In [252]:
viewBinaryDistrbutions(
    "resolved",
    "Resolved",
    "Unresolved",
    "Number of Resolved and Unresolved Calls",
    "Resolution Status",
)

In [253]:
answered_percentage = df["answered"].value_counts(normalize=True) * 100
resolved_percentage = df["resolved"].value_counts(normalize=True) * 100

percentage_df = pd.DataFrame(
    {
        "Variable": ["answered", "resolved"],
        "True": [answered_percentage[1], resolved_percentage[1]],
        "False": [answered_percentage[0], resolved_percentage[0]],
    }
)

percentage_df.head()

Unnamed: 0,Variable,True,False
0,answered,81.08,18.92
1,resolved,72.92,27.08
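The resolved percentage above is measured against all calls, including those nobody answered. It may be fairer to measure it against answered calls only. A minimal sketch on five hypothetical calls (the toy values are assumptions chosen to keep the arithmetic simple):

```python
import pandas as pd

# Five hypothetical calls: four answered, one abandoned.
calls = pd.DataFrame(
    {
        "answered": [1, 1, 1, 1, 0],
        "resolved": [1, 1, 1, 0, 0],
    }
)

# Resolution rate over every call, answered or not.
overall_rate = calls["resolved"].mean() * 100
# Resolution rate over answered calls only.
answered_rate = calls.loc[calls["answered"] == 1, "resolved"].mean() * 100

print(round(overall_rate, 2), round(answered_rate, 2))
```

Which of the two rates goes on the dashboard is a choice to agree with Claire, since they answer slightly different questions.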


In [254]:
fig = go.Figure(
    data=[
        go.Bar(name="True", x=percentage_df["Variable"], y=percentage_df["True"]),
        go.Bar(name="False", x=percentage_df["Variable"], y=percentage_df["False"]),
    ]
)

fig.update_layout(
    title="Percentage of Answered and Resolved Variables",
    xaxis_title="Variable",
    yaxis_title="Percentage",
    barmode="group",
)

fig.show()

In [255]:
dailyAvgAnswerSpeed = df.groupby("date")["answerSpeed"].mean().reset_index().round(2)
dailyAvgAnswerSpeed.head()

Unnamed: 0,date,answerSpeed
0,2021-01-01,65.3
1,2021-01-02,68.5
2,2021-01-03,78.6
3,2021-01-04,70.11
4,2021-01-05,70.2


In [256]:
monthlyAvgAnswerSpeed = (
    df.groupby('month')["answerSpeed"].mean().reset_index()
)
monthlyAvgAnswerSpeed.head()

Unnamed: 0,month,answerSpeed
0,February,67.546225
1,January,67.219931
2,March,67.831668


In [257]:
dayOfWeekAnswerSpeed = (
    df.groupby('dayOfWeek')['answerSpeed'].mean().reset_index()
)
dayOfWeekAnswerSpeed.head()

Unnamed: 0,dayOfWeek,answerSpeed
0,Friday,67.630037
1,Monday,66.941176
2,Saturday,65.814332
3,Sunday,66.928571
4,Thursday,69.576857


In [258]:
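# Note: this sums answer time per day, so the rolling value below is a
# 7-day mean of daily totals (in seconds), not of the per-call average.
dailyAnswerSpeed = df.groupby("date")["answerSpeed"].sum().reset_index()
weeklyAnswerSpeed = dailyAnswerSpeed.copy()
weeklyAnswerSpeed["answerSpeed"] = (
    dailyAnswerSpeed["answerSpeed"].rolling(7).mean()
)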
weeklyAnswerSpeed.head(10)

Unnamed: 0,date,answerSpeed
0,2021-01-01,
1,2021-01-02,
2,2021-01-03,
3,2021-01-04,
4,2021-01-05,
5,2021-01-06,
6,2021-01-07,3228.714286
7,2021-01-08,3169.571429
8,2021-01-09,3255.285714
9,2021-01-10,3310.428571
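These values are in the thousands because the rolling window runs over daily sums. If the goal is a smoothed version of the per-call answer speed, the rolling mean could be taken over the daily averages instead. The sketch below uses the five daily averages shown earlier and a 3-day window, chosen only so that five rows are enough to see a result; a 7-day window works the same way.

```python
import pandas as pd

# Daily averages copied from the dailyAvgAnswerSpeed output above.
daily_avg = pd.DataFrame(
    {
        "date": pd.date_range("2021-01-01", periods=5),
        "answerSpeed": [65.3, 68.5, 78.6, 70.11, 70.2],
    }
)

# Rolling mean of daily means; the first window - 1 rows stay NaN.
daily_avg["rollingAvg"] = (
    daily_avg["answerSpeed"].rolling(window=3).mean().round(2)
)

print(daily_avg)
```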


In [259]:
def viewAnswerSpeed(df, title):
    fig = go.Figure(
        go.Scatter(
            x=df["date"],
            y=df["answerSpeed"],
            mode="lines",
        )
    )

    fig.update_layout(
        title=title,
        xaxis_title="Date",
        yaxis_title="Average Answer Speed",
    )

    return fig

In [260]:
fig = make_subplots(
    rows=1,
    cols=2,
    subplot_titles=("Daily Average Answer Speed", "7-day Rolling Mean of Daily Total Answer Time"),
)

fig.add_trace(
    viewAnswerSpeed(dailyAvgAnswerSpeed, "Daily Average Answer Speed").data[0],
    row=1,
    col=1,
)
fig.add_trace(
    viewAnswerSpeed(weeklyAnswerSpeed, "Answer Speed - 7-day Rolling Window Average").data[0],
    row=1,
    col=2,
)

fig.show()

# Agent Performance
- [x] How long does each agent spend on the phone?
- [x] How satisfied are customers with each agent?
- [ ] How well do agents perform by resolution rate?

In [261]:
agentsDailyTalkDuration = (
    df.groupby(["date", "agent"])["avgTalkDuration"].sum().reset_index()
)

agentsDailyTalkDuration["avgTalkMinutes"] = (
    agentsDailyTalkDuration["avgTalkDuration"].dt.total_seconds() / 60
).round(2)
agentsDailyTalkDuration.drop(columns=["avgTalkDuration"], inplace=True)

agentsDailyTalkDuration.head()

Unnamed: 0,date,agent,avgTalkMinutes
0,2021-01-01,Becky,20.32
1,2021-01-01,Dan,17.48
2,2021-01-01,Diane,29.22
3,2021-01-01,Greg,20.7
4,2021-01-01,Jim,15.63


In [262]:
fig = go.Figure()

for agent in agentsDailyTalkDuration["agent"].unique():
    agent_data = agentsDailyTalkDuration[agentsDailyTalkDuration["agent"] == agent]
    fig.add_trace(
        go.Scatter(
            x=agent_data["date"],
            y=agent_data["avgTalkMinutes"],
            mode="lines",
            name=agent,
        )
    )

fig.update_layout(
    title="Agent's Talk Duration per Day",
    xaxis_title="Date",
    yaxis_title="Talk Duration",
)

fig.show()

In [263]:
agentRatings = df.groupby(["agent", "rating"]).size().reset_index(name="count")
agentRatings.head()

Unnamed: 0,agent,rating,count
0,Becky,1,64
1,Becky,2,42
2,Becky,3,150
3,Becky,4,160
4,Becky,5,101


In [264]:
fig = px.bar(
    agentRatings,
    x="agent",
    y="count",
    color="rating",
    title="Agents' Rating Distributions",
    labels={"agent": "Agent", "count": "Number of Calls", "rating": "Rating"},
    color_continuous_scale=PHONENOW_COLOURS,
)


fig.update_layout(barmode="stack")


fig.show()

These agents all have very similar rating distributions. The small differences may just be due to chance.

In [266]:
agentResolutions = df.groupby(["agent", "resolved"]).size().reset_index(name="count")
agentResolutions_pivot = agentResolutions.pivot_table(
    index="agent", columns="resolved", values="count", fill_value=0
).reset_index()
agentResolutions_pivot.columns.name = None
agentResolutions_pivot.rename(columns={0: "Unresolved", 1: "Resolved"}, inplace=True)

agentResolutions_pivot.head()

Unnamed: 0,agent,Unresolved,Resolved
0,Becky,169.0,462.0
1,Dan,162.0,471.0
2,Diane,181.0,452.0
3,Greg,169.0,455.0
4,Jim,181.0,485.0
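Raw counts are hard to compare because agents handle different numbers of calls, so a rate per agent is more telling. A short sketch using the five agents printed above (the remaining agents are not in the `head()` output, so they are left out); `resolutionRate` and `ranked` are my names.

```python
import pandas as pd

# Counts copied from the pivot table output above.
agent_counts = pd.DataFrame(
    {
        "agent": ["Becky", "Dan", "Diane", "Greg", "Jim"],
        "Unresolved": [169, 162, 181, 169, 181],
        "Resolved": [462, 471, 452, 455, 485],
    }
)

# Resolved calls as a percentage of all calls assigned to the agent.
total_calls = agent_counts["Unresolved"] + agent_counts["Resolved"]
agent_counts["resolutionRate"] = (
    agent_counts["Resolved"] / total_calls * 100
).round(2)

ranked = agent_counts.sort_values("resolutionRate", ascending=False)
print(ranked[["agent", "resolutionRate"]])
```

For these five agents the rates land between about 71% and 74%, which fits the earlier impression that the agents perform similarly.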


In [267]:
fig = go.Figure(
    data=[
        go.Bar(
            name="Resolved",
            x=agentResolutions_pivot["agent"],
            y=agentResolutions_pivot["Resolved"],
        ),
        go.Bar(
            name="Unresolved",
            x=agentResolutions_pivot["agent"],
            y=agentResolutions_pivot["Unresolved"],
        ),
    ]
)

fig.update_layout(
    barmode="group",
    title="Agents' Resolution Rates",
    xaxis_title="Agent",
    yaxis_title="Count",
)

fig.show()

In [268]:
df.head()

Unnamed: 0,call_id,agent,date,time,topic,answered,resolved,answerSpeed,avgTalkDuration,rating,dayOfWeek,dayOfMonth,hour,month
0,ID0001,Diane,2021-01-01,09:12:58,Contract related,1,1,109.0,0 days 00:02:23,3,Friday,1,9,January
1,ID0002,Becky,2021-01-01,09:12:58,Technical Support,1,0,70.0,0 days 00:04:02,3,Friday,1,9,January
2,ID0003,Stewart,2021-01-01,09:47:31,Contract related,1,1,10.0,0 days 00:02:11,3,Friday,1,9,January
3,ID0004,Greg,2021-01-01,09:47:31,Contract related,1,1,53.0,0 days 00:00:37,2,Friday,1,9,January
4,ID0005,Becky,2021-01-01,10:00:29,Payment related,1,1,95.0,0 days 00:01:00,3,Friday,1,10,January


In [270]:

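# Caution: this overwrites the raw input file with the cleaned data, so
# re-running the notebook from the top would load the already-cleaned columns.
df.to_csv("./call-centre-dataset.csv", index=False)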

In [271]:
dtypes = df.dtypes
dtypes

call_id                     object
agent                       object
date                datetime64[ns]
time                        object
topic                       object
answered                     int64
resolved                     int64
answerSpeed                float64
avgTalkDuration    timedelta64[ns]
rating                       Int64
dayOfWeek                   object
dayOfMonth                   int32
hour                         int32
month                       object
dtype: object

In [274]:
df = pd.read_csv("./call-centre-dataset.csv")
df.head()

Unnamed: 0,call_id,agent,date,time,topic,answered,resolved,answerSpeed,avgTalkDuration,rating,dayOfWeek,dayOfMonth,hour,month
0,ID0001,Diane,2021-01-01,09:12:58,Contract related,1,1,109.0,0 days 00:02:23,3.0,Friday,1,9,January
1,ID0002,Becky,2021-01-01,09:12:58,Technical Support,1,0,70.0,0 days 00:04:02,3.0,Friday,1,9,January
2,ID0003,Stewart,2021-01-01,09:47:31,Contract related,1,1,10.0,0 days 00:02:11,3.0,Friday,1,9,January
3,ID0004,Greg,2021-01-01,09:47:31,Contract related,1,1,53.0,0 days 00:00:37,2.0,Friday,1,9,January
4,ID0005,Becky,2021-01-01,10:00:29,Payment related,1,1,95.0,0 days 00:01:00,3.0,Friday,1,10,January
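The round trip through CSV loses some types: `rating` comes back as a float (`3.0`) and `avgTalkDuration` as plain text. If the file were to be analysed again in pandas, the types could be restored on load. A sketch using an in-memory CSV with one row from above and one hypothetical unanswered call (the second row is an assumption):

```python
import io

import pandas as pd

# One real row plus a hypothetical unanswered call with empty fields.
csv_text = """call_id,date,avgTalkDuration,rating
ID0001,2021-01-01,0 days 00:02:23,3.0
ID0006,2021-01-01,,
"""

reloaded = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
# Text like "0 days 00:02:23" parses back into a timedelta; blanks become NaT.
reloaded["avgTalkDuration"] = pd.to_timedelta(reloaded["avgTalkDuration"])
# The nullable integer type keeps missing ratings as <NA> instead of forcing floats.
reloaded["rating"] = reloaded["rating"].astype("Int64")

print(reloaded.dtypes)
```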
