# Call Center Cleanup

For this assignment, we will be working with call center data. You can start working on the assignment after the first lesson on Exploratory Data Analysis with pandas. Make sure to read the whole assignment before starting anything! As you code along in the Jupyter notebook, you are asked to make note of the results of your analysis. Do so by clicking on the results box and adding your notes beside each question.

## Business Issue and Understanding

You are working for a company that has two call centers: the North Call Center and the South Call Center. The company is looking to possibly hire five additional reps to enhance customer experience. Your task is to explore how efficient the current reps are in each branch to determine which branch would benefit from additional assistance.

### How the Call Center Works

Call center representatives are assigned to queues. When a call is assigned to a queue, it goes to the next rep in line in that queue. After a call is assigned to a representative, the time between assignment and the start of the call is divided into busy minutes and not ready minutes. If the call is incoming and a customer is waiting on the phone for a rep, the time is split into three categories: busy minutes, not ready minutes, and incoming call wait time. Once the rep has the customer on the phone, there may also be during call wait time, when the call is put on hold while the rep gets an answer for the customer.

### Notes about the Dataset

If you haven't worked in a call center before, these notes might help you throughout your analysis.

* The call purpose is tagged for each call.
* The time of the call is tagged in 1-hour blocks starting at 9:00 AM and ending at 5:00 PM.
* Calls are tagged as incoming or outgoing.
* Reps are assigned to queues. When the reps are working, they take calls in the order of their queue.
* A call that is dropped due to technical error or missed by the center because they have reached maximum capacity is a lost call.
* An abandoned call is when the customer hangs up because they have been waiting for too long.
* Busy Minutes: the amount of time after a call comes in or needs to go out where the assigned rep is not available because they are busy with other customers.
* Not Ready Minutes: the amount of time after a call comes in or needs to go out where the assigned rep is not available because they are not ready (for example, getting water).
* Incoming Wait Time: the amount of time the customer waits for the representative to pick up the call after the assigned rep becomes available to take it. This is tracked in seconds.
* During Call Wait Time: the amount of time during the call that the customer has to wait for the representative.

## Getting Started

You have two CSVs at your disposal, `NorthCallCenter.csv` and `SouthCallCenter.csv`. Import the appropriate libraries and create two dataframes, one called `north_df` and one called `south_df`.

In [4]:
# Import the appropriate libraries with aliases
import pandas as pd

# Create two new dataframes
north_df = pd.read_csv('NorthCallCenter.csv')
south_df = pd.read_csv('SouthCallCenter.csv')


## Task 1: Exploratory Data Analysis

Time to do some EDA! In the process of learning more about the two datasets, answer the following questions. Use the code blocks below to begin cleaning your data. At the end of the section, record your answers.

#### EDA Question 1A:  How many reps are in the North branch?  

In [7]:
# EDA Question 1A solution below:
import pandas as pd
north_df = pd.read_csv('NorthCallCenter.csv')
unique_reps_count = north_df['Rep ID'].nunique()

print(f"Number of unique representatives in the North Call Center: {unique_reps_count}")


Number of unique representatives in the North Call Center: 9


#### EDA Question 1B:  How many reps are in the South branch?  

In [9]:
# EDA Question 1B solution Below:
import pandas as pd
south_df = pd.read_csv('SouthCallCenter.csv')
unique_reps_count = south_df['Rep ID'].nunique()

print(f"Number of unique representatives in the South Call Center: {unique_reps_count}")



Number of unique representatives in the South Call Center: 11


#### EDA Question 2A:  What is the average busy minutes, not ready minutes, incoming wait time, and during call wait time for the North branch? 

In [11]:
# EDA Question 2A solution Below:
import pandas as pd
north_df = pd.read_csv('NorthCallCenter.csv')

average_busy_minutes = north_df['Busy Minutes'].mean()
average_not_ready_minutes = north_df['Not Ready Minutes'].mean()
average_incoming_wait_time = north_df['Incoming Wait Time'].mean()  # Adjust if necessary
average_during_call_wait_time = north_df['During Call Wait Time'].mean()  # Adjust if necessary

print(f"Average Busy Minutes: {average_busy_minutes:.2f}")
print(f"Average Not Ready Minutes: {average_not_ready_minutes:.2f}")
print(f"Average Incoming Wait Time: {average_incoming_wait_time:.2f}")
print(f"Average During Call Wait Time: {average_during_call_wait_time:.2f}")



Average Busy Minutes: 9.99
Average Not Ready Minutes: 1.91
Average Incoming Wait Time: 3.05
Average During Call Wait Time: 2.97


#### EDA Question 2B:  What is the average busy minutes, not ready minutes, incoming wait time, and during call wait time for the South branch? 

In [13]:
# EDA Question 2B solution Below:
import pandas as pd
south_df = pd.read_csv('SouthCallCenter.csv')

average_busy_minutes = south_df['Busy Minutes'].mean()
average_not_ready_minutes = south_df['Not Ready Minutes'].mean()
average_incoming_wait_time = south_df['Incoming Wait Time'].mean()  # Adjust if necessary
average_during_call_wait_time = south_df['During Call Wait Time'].mean()  # Adjust if necessary

print(f"Average Busy Minutes: {average_busy_minutes:.2f}")
print(f"Average Not Ready Minutes: {average_not_ready_minutes:.2f}")
print(f"Average Incoming Wait Time: {average_incoming_wait_time:.2f}")
print(f"Average During Call Wait Time: {average_during_call_wait_time:.2f}")

Average Busy Minutes: 10.05
Average Not Ready Minutes: 1.91
Average Incoming Wait Time: 3.00
Average During Call Wait Time: 3.08
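
The four averages for both branches can also be computed in a single call by grouping on `Branch`. The sketch below uses a small made-up dataframe (the name `toy` and its values are illustrative only, not taken from the CSVs) and assumes the column names shown in the cells above.

```python
import pandas as pd

# Toy rows standing in for the combined North/South data (values are made up).
toy = pd.DataFrame({
    "Branch": ["North", "North", "South", "South"],
    "Busy Minutes": [9, 11, 10, 12],
    "Not Ready Minutes": [2, 1, 3, 1],
    "Incoming Wait Time": [1.0, 3.0, 2.0, None],  # None mimics a missing value
    "During Call Wait Time": [2, 4, 3, 3],
})

metric_cols = [
    "Busy Minutes",
    "Not Ready Minutes",
    "Incoming Wait Time",
    "During Call Wait Time",
]

# mean() skips missing values by default, so the South wait time is
# averaged over the one non-null row only.
branch_means = toy.groupby("Branch")[metric_cols].mean().round(2)
print(branch_means)
```

Because missing values are skipped, the Incoming Wait Time average describes only the rows where a wait time was recorded.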


#### EDA Question 3:  What is the number of calls taken for each time block (both North and South combined)?

In [14]:
# EDA Question 3 solution Below:
import pandas as pd

north_df = pd.read_csv('NorthCallCenter.csv')
south_df = pd.read_csv('SouthCallCenter.csv')

combined_df = pd.concat([north_df, south_df])

call_counts = combined_df.groupby('Time Block')['Calls'].count()

print(call_counts)



Time Block
10:00 AM     99
11:00 AM     56
12:00 PM    120
1:00 PM      40
2:00 PM      65
3:00 PM      73
4:00 PM      53
5:00 PM      43
9:00 AM      10
Name: Calls, dtype: int64
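
Note that `groupby` sorts the time block labels as plain text, so `10:00 AM` comes first and `9:00 AM` comes last. One possible way to put the counts in chronological order is to sort the labels by the clock time they represent. The sketch below copies the counts from the output above; the variable names `chronological` and `ordered_counts` are introduced here for illustration.

```python
from datetime import datetime

import pandas as pd

# Counts copied from the groupby output above, in the same alphabetical order.
call_counts = pd.Series(
    {
        "10:00 AM": 99, "11:00 AM": 56, "12:00 PM": 120,
        "1:00 PM": 40, "2:00 PM": 65, "3:00 PM": 73,
        "4:00 PM": 53, "5:00 PM": 43, "9:00 AM": 10,
    },
    name="Calls",
)

# Sort the labels by the clock time they represent, then reorder the counts.
chronological = sorted(
    call_counts.index,
    key=lambda label: datetime.strptime(label, "%I:%M %p"),
)
ordered_counts = call_counts.reindex(chronological)
print(ordered_counts)
```

Reading the counts from the reordered series makes it harder to pair a count with the wrong time block when recording answers.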


##### Record your answers for the EDA section below:
- 1a) Reps in the North Branch = 9
- 1b) Reps in the South Branch = 11


- 2a) North Branch (round to the nearest hundredth):
> - Busy Minutes = 9.99
> - Not Ready Minutes = 1.91
> - Incoming Wait Time = 3.05
> - During Call Wait Time = 2.97
- 2b) South Branch (round to the nearest hundredth):
> - Busy Minutes = 10.05
> - Not Ready Minutes = 1.91
> - Incoming Wait Time = 3.00
> - During Call Wait Time = 3.08
- 3) Total Calls taken for all branches per time block:
> - 9:00AM   = 10
> - 10:00AM  = 99
> - 11:00AM  = 56
> - 12:00PM  = 120
> - 1:00PM   = 40
> - 2:00PM   = 65
> - 3:00PM   = 73
> - 4:00PM   = 53
> - 5:00PM   = 43



## Task 2: Cleaning Your Data 
Now you need to clean up the datasets. When cleaning the datasets, you may find that there isn't dirty data to clean. That is okay! Answer the following questions about the datasets.
* Add additional code blocks as needed to show how you came to your conclusions. Add comments in your code blocks to help others understand your thinking. 

#### Cleaning Question 1:  Is there any missing data and if yes explain what you would do with the missing data and why?

In [8]:
# Question 1 solution below
# Use isnull() to count the missing values in each column of each call center
# and to select the rows that have at least one null value
import pandas as pd
north_df = pd.read_csv('NorthCallCenter.csv')
south_df = pd.read_csv('SouthCallCenter.csv')

missing_data_ndf= north_df.isnull().sum()
missing_data_sdf = south_df.isnull().sum()

null_rows_ndf = north_df[north_df.isnull().any(axis =1)]
null_rows_sdf= south_df[south_df.isnull().any(axis=1)]

print(missing_data_ndf[missing_data_ndf> 0])
print(missing_data_sdf[missing_data_sdf>0])
print(null_rows_ndf)
print(null_rows_sdf)



Incoming Wait Time    163
dtype: int64
Incoming Wait Time    188
dtype: int64
     Unnamed: 0 Branch     Call Purpose Time Block Incoming or Outgoing Queue  \
82           82  North    Sales Support   10:00 AM             Outgoing     A   
83           83  North    Sales Support   10:00 AM             Outgoing     B   
84           84  North    Sales Support   11:00 AM             Outgoing     B   
85           85  North  Product Support    9:00 AM             Outgoing     B   
86           86  North    Sales Support   10:00 AM             Outgoing     B   
..          ...    ...              ...        ...                  ...   ...   
240         240  North  Product Support    5:00 PM             Outgoing     B   
241         241  North  Product Support    5:00 PM             Outgoing     A   
242         242  North  Product Support    5:00 PM             Outgoing     A   
243         243  North  Product Support    5:00 PM             Outgoing     A   
244         244  North  Product

In [4]:
#print shape to see what I am working with
import pandas as pd

north_df = pd.read_csv('NorthCallCenter.csv')
south_df= pd.read_csv('SouthCallCenter.csv')

print(f"Shape of the DataFrame: {north_df.shape}")
print(f"Shape of the DataFrame:{south_df.shape}")


Shape of the DataFrame: (245, 15)
Shape of the DataFrame:(314, 15)


In [6]:
import pandas as pd

north_df = pd.read_csv('NorthCallCenter.csv')
south_df = pd.read_csv('SouthCallCenter.csv')
north_df
south_df


Unnamed: 0.1,Unnamed: 0,Branch,Call Purpose,Time Block,Incoming or Outgoing,Queue,Rep ID,Sale,Lost Call,Abandoned,Busy Minutes,Not Ready Minutes,Incoming Wait Time,During Call Wait Time,Calls
0,0,South,Sales Support,10:00 AM,Incoming,D,Kate,NO,0,0,9,1,1.0,2,1
1,1,South,Sales Support,10:00 AM,Incoming,C,Eric,NO,0,0,8,2,1.0,4,1
2,2,South,Sales Support,10:00 AM,Incoming,C,Susan,NO,0,0,10,2,1.0,4,1
3,3,South,Sales Support,10:00 AM,Incoming,C,Alice,NO,0,0,12,1,1.0,3,1
4,4,South,Sales Support,12:00 PM,Incoming,C,Sandy,NO,0,0,8,3,1.0,3,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
309,309,South,Product Support,5:00 PM,Outgoing,D,Helga,NO,0,0,10,3,,4,1
310,310,South,Product Support,5:00 PM,Outgoing,C,Susan,NO,0,0,12,3,,4,1
311,311,South,Product Support,5:00 PM,Outgoing,C,Sandy,NO,0,0,9,3,,4,1
312,312,South,Product Support,5:00 PM,Outgoing,C,Eric,NO,0,0,10,1,,2,1


#### Record your answer below:

> Your Answer: Yes, there is missing data. There are many null values in the Incoming Wait Time column for both the North and South call centers (163 and 188 rows, respectively). For cleanup, it would make sense to impute the values in that column. Because we are investigating which call center needs new reps, filling in the missing values with the average would help create a more accurate representation of the average Incoming Wait Time.
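
Before settling on imputation, it may be worth checking where the missing values sit. The null rows printed above are all tagged `Outgoing`, which could mean the wait time is simply not applicable to those calls. The sketch below shows one way to count nulls per call direction; it uses a made-up dataframe named `toy`, so the counts are illustrative and not results from the CSVs.

```python
import pandas as pd

# Made-up rows: wait times recorded for incoming calls, missing for outgoing.
toy = pd.DataFrame({
    "Incoming or Outgoing": ["Incoming", "Incoming", "Outgoing", "Outgoing", "Outgoing"],
    "Incoming Wait Time": [1.0, 3.0, None, None, None],
})

# Flag each missing wait time, then count the flags per call direction.
missing_by_direction = (
    toy["Incoming Wait Time"].isna().groupby(toy["Incoming or Outgoing"]).sum()
)
print(missing_by_direction)
```

If the real data shows the same pattern, leaving the values as null (pandas skips them when averaging) is an alternative to filling them in.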

#### Cleaning Question 2:  In the North data set, there are two separate "YES" values for a sale. Why and how did you rectify the issue?

In [14]:
# Cleaning Question 2 solution below:
import pandas as pd

north_df = pd.read_csv('NorthCallCenter.csv')

unique_sales = north_df['Sale'].unique()
print("Unique values in 'Sale' column:", unique_sales)

north_df['Sale'] = north_df['Sale'].str.rstrip()

unique_sales_after = north_df['Sale'].unique()
print("Unique values in 'Sale' column after stripping spaces:", unique_sales_after)


Unique values in 'Sale' column: ['NO' 'YES ' 'YES']
Unique values in 'Sale' column after stripping spaces: ['NO' 'YES']


##### Record your answer below:
> Your Answer: One of the "YES" values had a trailing space ('YES '). I found the unique values in the column, stripped the trailing space, and printed the values before and after. This gives the data a cleaner look and avoids values being missed due to inconsistent formatting.
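
A slightly more defensive cleanup strips whitespace on both sides and normalizes the letter case. The North data only showed a trailing space, so the leading-space and lowercase values below are hypothetical, added to show what the extra steps would catch.

```python
import pandas as pd

# 'YES ' matches the issue found above; ' yes' is a hypothetical variant.
sale = pd.Series(["NO", "YES ", "YES", " yes"], name="Sale")

# strip() removes leading and trailing whitespace; upper() unifies the case.
cleaned = sale.str.strip().str.upper()
print(cleaned.value_counts())
```

Checking `value_counts()` after cleaning confirms that only the expected categories remain.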

#### Cleaning Question 3:  Are there duplicates in the two data sets? If there are, how did you handle them?

In [9]:
# Cleaning Question 3 solution below:

In [17]:
import pandas as pd

north_df = pd.read_csv('NorthCallCenter.csv')

duplicates = north_df[north_df.duplicated()]

print("Duplicate rows in the NorthDataFrame:")
print(duplicates)

# Count total duplicate rows
total_duplicates = north_df.duplicated().sum()
print(f"Total number of duplicate rows: {total_duplicates}")


Duplicate rows in the NorthDataFrame:
Empty DataFrame
Columns: [Unnamed: 0, Branch, Call Purpose, Time Block, Incoming or Outgoing, Queue, Rep ID, Sale, Lost Call, Abandoned, Busy Minutes, Not Ready Minutes, Incoming Wait Time, During Call Wait Time, Calls]
Index: []
Total number of duplicate rows: 0


In [18]:
import pandas as pd

south_df = pd.read_csv('SouthCallCenter.csv')

duplicates = south_df[south_df.duplicated()]

print("Duplicate rows in the SouthDataFrame:")
print(duplicates)

# Count total duplicate rows
total_duplicates = south_df.duplicated().sum()
print(f"Total number of duplicate rows: {total_duplicates}")



Duplicate rows in the SouthDataFrame:
Empty DataFrame
Columns: [Unnamed: 0, Branch, Call Purpose, Time Block, Incoming or Outgoing, Queue, Rep ID, Sale, Lost Call, Abandoned, Busy Minutes, Not Ready Minutes, Incoming Wait Time, During Call Wait Time, Calls]
Index: []
Total number of duplicate rows: 0


##### Record your answer below:
> Your Answer: There appear to be no duplicate rows in either dataset, so no further action is required.
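
One caveat: both files carry an `Unnamed: 0` column that appears to number every row, so a check across all columns can never flag a duplicate. The sketch below compares the two checks on a made-up dataframe (`toy` and its values are illustrative). Whether rows that match on the remaining columns are true duplicates or separate, similar calls is a judgment call.

```python
import pandas as pd

# Made-up rows: the first two are identical apart from the row counter.
toy = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Rep ID": ["Brent", "Brent", "Cam"],
    "Time Block": ["9:00 AM", "9:00 AM", "11:00 AM"],
    "Busy Minutes": [9, 9, 11],
})

# Checking every column: the unique counter hides the repeated row.
duplicates_all_columns = toy.duplicated().sum()

# Checking only the content columns: the repeated row is flagged.
content_cols = [col for col in toy.columns if col != "Unnamed: 0"]
duplicates_ignoring_counter = toy.duplicated(subset=content_cols).sum()

print(duplicates_all_columns, duplicates_ignoring_counter)
```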

#### Cleaning Question 4:  Is any of the data in the two data sets unnecessary? If yes, how did you handle it?

In [19]:
# Cleaning Question 4 solution below:
import pandas as pd

north_df = pd.read_csv('NorthCallCenter.csv')
south_df = pd.read_csv('SouthCallCenter.csv')

north_df = north_df.drop(columns=['Call Purpose'])
south_df = south_df.drop(columns=['Call Purpose'])

print("North DataFrame shape after dropping 'Call Purpose':", north_df.shape)
print("South DataFrame shape after dropping 'Call Purpose':", south_df.shape)


North DataFrame shape after dropping 'Call Purpose': (245, 14)
South DataFrame shape after dropping 'Call Purpose': (314, 14)


##### Record your answer below:
> Your Answer: The only data that seems unnecessary for the business issue is 'Call Purpose', as it doesn't seem to have a bearing on the analysis. I chose to drop the column in both datasets.
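
Another candidate is the `Unnamed: 0` column, which looks like a row index that was saved along with the data. Assuming that is what it is, `read_csv` can be told to use it as the index instead of loading it as a column. The sketch below uses a small in-memory CSV (`csv_text` is made up) in place of the real files.

```python
import io

import pandas as pd

# A tiny CSV with an unnamed first column, like a saved dataframe index.
csv_text = ",Branch,Rep ID,Calls\n0,North,Brent,1\n1,North,Cam,1\n"

# Default read: the unnamed column is loaded as 'Unnamed: 0'.
default_read = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column becomes the index instead.
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_read.columns))
print(list(indexed_read.columns))
```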

## Task 3: Data Manipulation
Before you begin answering any questions, combine the two datasets together to create a third dataframe called df. You can use this third dataframe to compare the two call centers to company-wide trends.
* Add additional code blocks as needed to show how you came to your conclusions. Add comments in your code blocks to help others understand your thinking. Record your answer below.

In [20]:
# Create dataframe for the entire company named df
import pandas as pd

north_df = pd.read_csv('NorthCallCenter.csv')
south_df = pd.read_csv('SouthCallCenter.csv')

# Re-apply the 'Sale' cleanup from Cleaning Question 2, because re-reading the CSV discards it
north_df['Sale'] = north_df['Sale'].str.rstrip()

north_df = north_df.drop(columns=['Call Purpose'], errors='ignore')  # Use errors='ignore' to avoid errors if column doesn't exist
south_df = south_df.drop(columns=['Call Purpose'], errors='ignore')

df = pd.concat([north_df, south_df], ignore_index=True)

print("Shape of the combined DataFrame (df):", df.shape)
print(df.head())




Shape of the combined DataFrame (df): (559, 14)
   Unnamed: 0 Branch Time Block Incoming or Outgoing Queue Rep ID Sale  \
0           0  North    9:00 AM             Incoming     B  Brent   NO   
1           1  North   11:00 AM             Incoming     A    Cam   NO   
2           2  North   11:00 AM             Incoming     B   Todd   NO   
3           3  North    4:00 PM             Incoming     B  Brent   NO   
4           4  North   12:00 PM             Incoming     B  Brent   NO   

   Lost Call  Abandoned  Busy Minutes  Not Ready Minutes  Incoming Wait Time  \
0          0          1             9                  2                 1.0   
1          0          0            11                  1                 1.0   
2          0          0             9                  3                 1.0   
3          0          0            11                  2                 1.0   
4          0          0             8                  2                 1.0   

   During Call Wait Time  

#### Manipulation Question 1:  Group by Rep ID and sum the resulting structure. Sort by calls to determine which rep in each branch has the highest number of calls.

In [28]:
# Manipulation Question solution below:
# Group by 'Rep ID' and sum the resulting structure
# Sort by number of calls in descending order
calls_summary = df.groupby(['Rep ID','Branch'])['Calls'].sum().reset_index()

sorted_calls = calls_summary.sort_values(by='Calls', ascending=False)

print(sorted_calls)


    Rep ID Branch  Calls
3    Brent  North     37
6     Eric  South     35
14   Randy  South     33
15   Sandy  South     32
8    Helga  South     31
13   Lilly  North     30
7   George  South     29
12    Kate  South     29
18    Todd  North     29
5     Duke  North     29
11    Karl  South     28
9      Joe  North     26
17   Susan  South     26
10    Josh  South     26
16  Sharon  South     25
4      Cam  North     24
19  Xander  North     24
1   Amanda  North     23
2     Andy  North     23
0    Alice  South     20


In [30]:
import pandas as pd

calls_summary = df.groupby(['Rep ID','Branch'])['Calls'].sum().reset_index()

top_reps = calls_summary.loc[calls_summary.groupby('Branch')['Calls'].idxmax()]

print(top_reps)


  Rep ID Branch  Calls
3  Brent  North     37
6   Eric  South     35


##### Record your answer below
Rep with the highest number of calls and their total calls:
- North Branch = Brent 37
- South Branch = Eric 35
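
Keep in mind that `idxmax` returns only the first row with the maximum, so a tie within a branch would go unnoticed. The sketch below keeps every rep whose total equals the branch maximum. The South tie is invented for illustration; in the real output above, Eric leads the South branch alone.

```python
import pandas as pd

# Per-rep totals; Randy's 35 is changed from the real output to create a tie.
calls_summary = pd.DataFrame({
    "Rep ID": ["Brent", "Lilly", "Eric", "Randy"],
    "Branch": ["North", "North", "South", "South"],
    "Calls": [37, 30, 35, 35],
})

# Broadcast each branch's maximum back onto its rows, then keep the matches.
branch_max = calls_summary.groupby("Branch")["Calls"].transform("max")
top_reps = calls_summary[calls_summary["Calls"] == branch_max]
print(top_reps)
```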

#### Manipulation Question 2:  The average call abandonment rate is a KPI when it comes to determining call center efficiency. As you may recall, abandoned calls are calls where the customer hangs up due to long wait times. What is the average call abandonment rate for each branch and the whole company? Do any of these fall out of the optimal range of 2-5%?

In [7]:
import pandas as pd

df = pd.concat([north_df, south_df])  

abandonment_data = df[df['Incoming or Outgoing'] == 'Incoming'].groupby('Branch').agg(
    total_incoming=('Calls', 'count'),
    abandoned_calls=('Abandoned', 'sum')  
).reset_index()

abandonment_data['abandonment_rate'] = (abandonment_data['abandoned_calls'] / abandonment_data['total_incoming']) * 100

total_incoming_company = abandonment_data['total_incoming'].sum()
abandoned_calls_company = abandonment_data['abandoned_calls'].sum()
abandonment_rate_company = (abandoned_calls_company / total_incoming_company) * 100 if total_incoming_company else 0

for index, row in abandonment_data.iterrows():
    print(f"Abandonment Rate for {row['Branch']} Call Center: {row['abandonment_rate']:.2f}%")

print(f"Overall Abandonment Rate for the Company: {abandonment_rate_company:.2f}%")

optimal_range = (2, 5)
for index, row in abandonment_data.iterrows():
    if row['abandonment_rate'] < optimal_range[0] or row['abandonment_rate'] > optimal_range[1]:
        print(f"The abandonment rate for {row['Branch']} Call Center is outside range.")

if abandonment_rate_company < optimal_range[0] or abandonment_rate_company > optimal_range[1]:
    print("The abandonment rate for the Company is outside of the range.")


Abandonment Rate for North Call Center: 3.66%
Abandonment Rate for South Call Center: 0.79%
Overall Abandonment Rate for the Company: 1.92%
The abandonment rate for South Call Center is outside range.
The abandonment rate for the Company is outside of the range.


##### Record your answer below:
Average Call Abandonment Rates (round to the nearest hundredth):
- North Branch = 3.66%
- South Branch = 0.79%
- Company Wide = 1.92%
- Do any of these fall out of the optimal range of 2-5%? Yes. The South call center rate and the company-wide rate are outside of the range.
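
The rates can be checked by hand. The sketch below assumes every row with a non-null Incoming Wait Time is an incoming call, which gives 245 - 163 = 82 incoming calls for North and 314 - 188 = 126 for South. The abandoned counts (3 and 1) are inferred from the reported percentages rather than read from the data, and the helper names are introduced here for illustration.

```python
def abandonment_rate(abandoned: int, incoming: int) -> float:
    """Return abandoned calls as a percentage of incoming calls."""
    return abandoned / incoming * 100 if incoming else 0.0


def in_optimal_range(rate: float, low: float = 2.0, high: float = 5.0) -> bool:
    """Return True when the rate falls inside the 2-5% target band."""
    return low <= rate <= high


# (abandoned, incoming) per branch; abandoned counts are inferred values.
branches = {"North": (3, 82), "South": (1, 126)}

rates = {name: abandonment_rate(a, n) for name, (a, n) in branches.items()}
rates["Company"] = abandonment_rate(
    sum(a for a, _ in branches.values()),
    sum(n for _, n in branches.values()),
)

for name, rate in rates.items():
    print(f"{name}: {rate:.2f}% (within 2-5%: {in_optimal_range(rate)})")
```

Note that the company-wide rate is computed from the pooled counts, not by averaging the two branch percentages.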

#### Manipulation Question 3:  Service level is another KPI when it comes to measuring call center efficiency. Service level is the percentage of calls answered within a specific number of seconds. In the case of your employer, their ideal time frame is 2 seconds. What is the percentage of calls answered within 2 seconds for each branch and the entire company?

In [13]:
import pandas as pd

# Filter for incoming calls
incoming_calls_df = df[df['Incoming or Outgoing'] == 'Incoming']

# Calculate total incoming calls and calls answered within 2 seconds by branch
service_level_data = incoming_calls_df.groupby('Branch').agg(
    total_incoming=('Calls', 'count'),  # Total incoming calls
    answered_within_2_seconds=('Incoming Wait Time', lambda x: (x <= 2).sum())  # Calls answered within 2 seconds
).reset_index()

service_level_data['service_level_percentage'] = (service_level_data['answered_within_2_seconds'] / service_level_data['total_incoming']) * 100

# Company service level total
total_incoming_company = service_level_data['total_incoming'].sum()
answered_within_2_seconds_company = service_level_data['answered_within_2_seconds'].sum()
service_level_percentage_company = (answered_within_2_seconds_company / total_incoming_company) * 100 if total_incoming_company else 0

for index, row in service_level_data.iterrows():
    print(f"Service Level total for {row['Branch']} Call Center: {row['service_level_percentage']:.2f}%")

print(f"Service Level for the Company: {service_level_percentage_company:.2f}%")


Service Level total for North Call Center: 40.24%
Service Level total for South Call Center: 38.10%
Service Level for the Company: 38.94%


In [15]:
import pandas as pd

# Load the combined DataFrame (assume this is done already)
# df = pd.concat([north_df, south_df])  # Example of combining if not done yet.

# 1. Filter for incoming calls
incoming_calls_df = df[df['Incoming or Outgoing'] == 'Incoming']

# 2. Group by branch and count calls answered within 2 seconds
answered_in_2_seconds_by_branch = incoming_calls_df[incoming_calls_df['Incoming Wait Time'] <= 2].groupby('Branch').size().reset_index(name='total_answered_in_2_seconds')

# 3. Calculate total company-wide
total_answered_in_2_seconds_company = answered_in_2_seconds_by_branch['total_answered_in_2_seconds'].sum()

# 4. Print the results for each branch
for index, row in answered_in_2_seconds_by_branch.iterrows():
    print(f"Total calls answered in 2 seconds for {row['Branch']} Call Center: {row['total_answered_in_2_seconds']}")

# 5. Print overall company total
print(f"Total calls answered in 2 seconds for the Company: {total_answered_in_2_seconds_company}")


Total calls answered in 2 seconds for North Call Center: 33
Total calls answered in 2 seconds for South Call Center: 48
Total calls answered in 2 seconds for the Company: 81


##### Record your answer below:
Percentage of calls answered within 2 seconds, include # of calls:
- North Branch = 40.24% (33 calls)
- South Branch = 38.10% (48 calls)
- Company Wide = 38.94% (81 calls)
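
The count and the percentage can come out of one aggregation, because the mean of a True/False column is the share of True values. The sketch below uses made-up incoming calls (the dataframe `incoming` and its values are illustrative) and assumes outgoing calls have already been filtered out, since a missing wait time would otherwise count as "not answered in time".

```python
import pandas as pd

# Made-up incoming calls only; wait times are in seconds.
incoming = pd.DataFrame({
    "Branch": ["North", "North", "North", "South", "South"],
    "Incoming Wait Time": [1.0, 2.0, 4.0, 3.0, 2.0],
})

# True where the call was answered within the 2-second target.
within_target = incoming["Incoming Wait Time"] <= 2

# sum = calls within target, count = all calls, mean = share within target.
service_level = (
    within_target.groupby(incoming["Branch"])
    .agg(["sum", "count", "mean"])
    .rename(columns={
        "sum": "answered_within_2s",
        "count": "incoming_calls",
        "mean": "service_level_pct",
    })
)
service_level["service_level_pct"] = service_level["service_level_pct"] * 100
print(service_level.round(2))
```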

#### Manipulation Question 4: For each branch and the entire company, what is the average speed of answer?

In [16]:
# Manipulation Question 4 solution below:
import pandas as pd

# Load the combined DataFrame (assume this is done already)
# df = pd.concat([north_df, south_df])  # Example of combining if not done yet.

# 1. Filter for incoming calls
incoming_calls_df = df[df['Incoming or Outgoing'] == 'Incoming']

# 2. Calculate the average speed of answer by branch
average_speed_by_branch = incoming_calls_df.groupby('Branch').agg(
    average_speed=('Incoming Wait Time', 'mean')  # Calculate mean of Incoming Wait Time
).reset_index()

# 3. Calculate overall average speed of answer for the company
overall_average_speed = incoming_calls_df['Incoming Wait Time'].mean()

# 4. Print the average speed of answer for each branch
for index, row in average_speed_by_branch.iterrows():
    print(f"Average speed of answer for {row['Branch']} Call Center: {row['average_speed']:.2f} seconds")

# 5. Print overall average speed for the company
print(f"Overall average speed of answer for the Company: {overall_average_speed:.2f} seconds")


Average speed of answer for North Call Center: 3.05 seconds
Average speed of answer for South Call Center: 3.00 seconds
Overall average speed of answer for the Company: 3.02 seconds


##### Record your answer below:
Average speed of answer (rounded to nearest hundredth):
- North Branch in seconds = 3.05
- South Branch in seconds = 3.00
- Company Wide in seconds = 3.02
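
The company-wide figure is a call-weighted average of the two branch figures, not their simple midpoint, because South handles more incoming calls. The sketch below checks this with the rounded branch averages from the output above. The incoming call counts (82 and 126) assume every row with a non-null Incoming Wait Time is an incoming call.

```python
# (average speed of answer in seconds, incoming calls) per branch.
# Averages are the rounded values printed above; counts are derived from
# the dataframe shapes minus the null Incoming Wait Time counts.
branch_stats = {"North": (3.05, 82), "South": (3.00, 126)}

total_calls = sum(calls for _, calls in branch_stats.values())

# Weight each branch average by its number of incoming calls.
weighted_average = (
    sum(avg * calls for avg, calls in branch_stats.values()) / total_calls
)
print(f"Company-wide average speed of answer: {weighted_average:.2f} seconds")
```

Because the inputs are already rounded, this is a sanity check on the reported figure and not a replacement for computing the mean on the full data.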

## Task 4: Visualization

Create a visualization for each of the following questions. Some of the code to handle aggregating and storing data may be written for you. For each visualization, you choose the chart style that you feel suits the situation best. Make note of the chart style you chose and why. 

*NOTE: For some questions, you may decide to use more than one chart and/or chart style.

#### Visualization 1:  What is the average abandonment rate per queue?

In [1]:
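# Create visualization 1 here
# Import any additional libraries needed with alias
import pandas as pd

# Rebuild the combined dataframe so this cell runs on a fresh kernel
north_df = pd.read_csv('NorthCallCenter.csv')
south_df = pd.read_csv('SouthCallCenter.csv')
df = pd.concat([north_df, south_df], ignore_index=True)

# The dictionary abandonment_rates has the data you need.
abandonment_rates = {}
queues = ["A", "B", "C", "D"]
# Sum only the two columns needed; summing the text columns is not meaningful
queue_dict = df.groupby("Queue")[["Abandoned", "Calls"]].sum()
for queue in queues:
    abandonment_rates[queue] = queue_dict.loc[queue, "Abandoned"] / queue_dict.loc[queue, "Calls"]

#Your code below: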

##### Record your answer below:

> Chart style you chose and why: 
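
As a starting point, a bar chart is one reasonable fit for comparing a single rate across a handful of queues. The sketch below is a minimal example with placeholder rates (not values from the dataset) and assumes matplotlib is installed. The `Agg` backend line only makes the sketch run without a display and can be dropped inside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Placeholder rates (fraction of calls abandoned); not taken from the dataset.
abandonment_rates = {"A": 0.04, "B": 0.02, "C": 0.01, "D": 0.02}

fig, ax = plt.subplots(figsize=(6, 4))

# One bar per queue, with the rate shown as a percentage.
bars = ax.bar(
    list(abandonment_rates),
    [rate * 100 for rate in abandonment_rates.values()],
)
ax.set_xlabel("Queue")
ax.set_ylabel("Abandonment rate (%)")
ax.set_title("Average abandonment rate per queue")
fig.tight_layout()
```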

#### Visualization 2: What is the service level percentage and average speed of answer for each rep in the North Branch?

In [2]:
# Create visualization 2 here
import pandas as pd

# Reload the North data so this cell runs on a fresh kernel
north_df = pd.read_csv('NorthCallCenter.csv')

# north_plt contains the data you need for the average speed of answer of each rep

north_plt = north_df.groupby("Rep ID")["Incoming Wait Time"].mean().to_frame().reset_index()

# Finding each Rep's Personal Service Level Percentage.  Basically, Calls within 2 secs / total calls

# Table 1: Total Incoming calls answered within 2 seconds grouped by Rep
quick_calls = north_df[north_df["Incoming Wait Time"] <= 2.0]
quick_reps = quick_calls[["Rep ID", "Calls"]]
quick_stats = quick_reps.groupby(["Rep ID"]).sum()  # Final Table


# Table 2: Total Incoming Calls Only grouped by Rep
total_calls_in = north_df[north_df["Incoming or Outgoing"] == "Incoming"]
rep_calls = total_calls_in[["Rep ID", "Calls"]]
total_stats = rep_calls.groupby(["Rep ID"]).sum() # Final Table

#  Table 3: Service Level Percentage created via merge
service_level = pd.merge(quick_stats, total_stats, on="Rep ID")

# Create Percentage Column in Table 3
service_level["Service Level %"] = service_level["Calls_x"]/service_level["Calls_y"] * 100

#Your Code Here:

##### Record your answer below:

> Chart style you chose and why: 

#### Visualization 3: For each type of call purpose, how many calls are outgoing vs. incoming?

In [3]:
# Create visualization 3 here:
import pandas as pd

# Rebuild the combined dataframe from the raw files so this cell runs on a
# fresh kernel and still has the 'Call Purpose' column
north_df = pd.read_csv('NorthCallCenter.csv')
south_df = pd.read_csv('SouthCallCenter.csv')
df = pd.concat([north_df, south_df], ignore_index=True)

# The three dictionaries, complaints, sales_support, and product_support, have the information you need

purpose_group = df.groupby("Call Purpose")
call_purpose = ["Complaint", "Product Support", "Sales Support"]
purpose_counts = purpose_group["Incoming or Outgoing"].value_counts()
print(purpose_counts)

complaints = purpose_counts["Complaint"].to_dict()
sales_support = purpose_counts["Sales Support"].to_dict()
product_support = purpose_counts["Product Support"].to_dict()

#Your Code Here:

##### Record your answer below:

> Chart style you chose and why: 
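
A cross-tabulation is one way to get the incoming and outgoing counts for every call purpose into a single table, which can then feed a grouped or stacked bar chart. The sketch below uses made-up rows (`toy` is illustrative) with the column names from the dataset.

```python
import pandas as pd

# Made-up calls; only the two columns needed for the count are included.
toy = pd.DataFrame({
    "Call Purpose": [
        "Complaint", "Sales Support", "Sales Support",
        "Product Support", "Product Support",
    ],
    "Incoming or Outgoing": [
        "Incoming", "Incoming", "Outgoing", "Outgoing", "Outgoing",
    ],
})

# Rows are call purposes, columns are call directions, cells are counts.
purpose_by_direction = pd.crosstab(toy["Call Purpose"], toy["Incoming or Outgoing"])
print(purpose_by_direction)
```

Combinations that never occur show up as 0, so every purpose has a value for both directions.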

## Task 5: Summarize Your Work

With what you know now about the two call centers and the entire company, answer the following questions. Note that while this is subjective, you should include relevant data to back up your opinion.


#### Summary Question 1:  Using KPIs such as average abandonment rate, service level and average speed of answer, in your opinion, which one of the two branches is operating more efficiently? Why?

##### Record your answer below:
> Your Answer: 



#### Summary Question 2: Based on the number of reps in each branch and how quickly the reps are working, in your opinion, which branch would benefit from the extra help?

##### Record your answer below:
> Your Answer: 


#### Summary Question 3: Now that you have explored the datasets, is there any data or information that you wish you had in this analysis?

##### Record your answer below:
> Your Answer: 


## Bonus Mission
Create a visualization that answers this question: For each call purpose, how many calls (incoming and outgoing) take place in each time block?
##### Record your answer below:

> Chart style you chose and why: 

In [4]:
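# Create your Bonus Mission visualization here!
import pandas as pd

# Rebuild the combined dataframe from the raw files so this cell runs on a
# fresh kernel and still has the 'Call Purpose' column
north_df = pd.read_csv('NorthCallCenter.csv')
south_df = pd.read_csv('SouthCallCenter.csv')
df = pd.concat([north_df, south_df], ignore_index=True)

call_times = df[["Time Block", "Call Purpose", "Incoming or Outgoing", "Calls"]]

# Use groupby to plot based on time blocks:

# Use groupby and get_group to select which call purpose to plot: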
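
As an alternative to `groupby` with `get_group`, a pivot table can lay out the calls per time block with one column per call purpose, ready for a line or grouped bar chart. The sketch below uses made-up rows (`toy` is illustrative) and assumes each row's `Calls` value is 1, as in the data shown earlier.

```python
import pandas as pd

# Made-up calls spread over two time blocks and two call purposes.
toy = pd.DataFrame({
    "Time Block": ["9:00 AM", "9:00 AM", "10:00 AM", "10:00 AM"],
    "Call Purpose": ["Complaint", "Sales Support", "Sales Support", "Sales Support"],
    "Calls": [1, 1, 1, 1],
})

# Rows are time blocks, columns are call purposes, cells are total calls.
calls_by_block = toy.pivot_table(
    index="Time Block",
    columns="Call Purpose",
    values="Calls",
    aggfunc="sum",
    fill_value=0,  # show 0 where a purpose has no calls in a block
)
print(calls_by_block)
```

The time blocks come out in text order here, so they may need reordering chronologically before plotting.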