# Case 1 - Call Center Staffing Analytics - Detecting Favoritism using Machine Learning
## Situation: 
A call center operation was under close scrutiny for uneven performance,
shoddy operations and low employee morale. There was a rumor floating around that
the call center manager was engaging in favoritism, that certain employees were given
unfairly easy working conditions, but no one was able to present a convincing case
against the manager.
## Complication: 
Some even went so far as to accuse the manager of nepotism,
implying that the workers being given preferential treatment were those who were related to
the manager. The case was before a judge who, given the seriousness of the allegations,
asked for hard evidence.
## Key question:
Can we use machine learning to ‘objectively’ identify whether there is
any hard evidence that certain employees were being treated in a
systematically different way from others? Can we do this using not one but multiple
dimensions together?

# Start your analysis

In [1]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import LocalOutlierFactor
import matplotlib.pyplot as plt
import seaborn as sns



In [2]:
# Load the data under the name used by the cells below
call_center = pd.read_csv("call-center.csv")
call_center.describe()

| | Employee ID | Avg Tix / Day | Customer rating | Tardies | Graveyard Shifts Taken | Weekend Shifts Taken | Sick Days Taken | % Sick Days Taken on Friday | Employee Dev. Hours | Shift Swaps Requested | Shift Swaps Offered |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 400.0 | 400.0 | 400.0 | 400.0 | 400.0 | 400.0 | 400.0 | 400.0 | 400.0 | 400.0 | 400.0 |
| mean | 137946.0375 | 156.08575 | 3.49515 | 1.465 | 1.985 | 0.9525 | 1.875 | 35.22 | 11.97 | 1.4475 | 1.76 |
| std | 4240.877417 | 4.416638 | 0.461497 | 0.972697 | 0.794577 | 0.548631 | 1.673732 | 39.295061 | 7.470852 | 0.999872 | 1.812626 |
| min | 130564.0 | 143.1 | 2.07 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 134401.5 | 153.075 | 3.21 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 6.0 | 1.0 | 0.0 |
| 50% | 137906.5 | 156.05 | 3.505 | 1.0 | 2.0 | 1.0 | 2.0 | 25.0 | 12.0 | 1.0 | 1.0 |
| 75% | 141771.25 | 159.1 | 3.81 | 2.0 | 2.0 | 1.0 | 3.0 | 67.0 | 17.0 | 2.0 | 3.0 |
| max | 145176.0 | 168.7 | 4.81 | 4.0 | 4.0 | 2.0 | 7.0 | 100.0 | 34.0 | 5.0 | 9.0 |


In [None]:
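# Normalize the data: standardize each feature to zero mean and unit variance.
# Column 0 (Employee ID) is an identifier, not a behavior, so it is excluded.
feature_cols = call_center.columns[1:11]
scaler = StandardScaler()
call_center_scaled = pd.DataFrame(scaler.fit_transform(call_center[feature_cols]), columns=feature_cols)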
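
Standardizing matters here because LOF is distance-based: a feature measured in the hundreds (such as tickets per day) would otherwise swamp one measured in single digits (such as shift swaps). The following is a minimal sketch of what `StandardScaler` computes, using made-up numbers rather than rows from `call-center.csv`; the array `X` and the name `manual` are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values on two very different scales (illustration only).
X = np.array([
    [150.0, 0.0],
    [155.0, 1.0],
    [160.0, 2.0],
    [165.0, 9.0],
])

X_scaled = StandardScaler().fit_transform(X)

# StandardScaler subtracts each column's mean and divides by its
# population standard deviation (ddof=0), which is NumPy's default.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, manual))
print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))
```

After scaling, both columns have mean 0 and standard deviation 1, so each contributes on a comparable footing to the distances LOF relies on.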


In [None]:
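# Calculate the outlier scores using LocalOutlierFactor (similar to LOF in R).
# fit_predict() returns only 1 / -1 labels; the continuous LOF values are stored
# (negated) in negative_outlier_factor_, so flip the sign to get scores near 1
# for typical rows and larger values for outliers.
lof = LocalOutlierFactor(n_neighbors=5)
lof.fit(call_center_scaled)
outlier_scores = -lof.negative_outlier_factor_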
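
It is worth distinguishing the two things `LocalOutlierFactor` can return: `fit_predict` gives labels (1 for inliers, -1 for outliers), while the continuous LOF values are exposed, with the sign flipped, through the `negative_outlier_factor_` attribute. The sketch below illustrates the difference on synthetic data; the random seed, the Gaussian cloud, and the planted point at (10, 10) are assumptions for illustration and have nothing to do with the call center data.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data: 100 points from a standard normal cloud plus one
# planted point far away from it (index 100).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(100, 2)), [[10.0, 10.0]]])

lof = LocalOutlierFactor(n_neighbors=5)
labels = lof.fit_predict(X)               # 1 = inlier, -1 = outlier
scores = -lof.negative_outlier_factor_    # about 1 for inliers, larger for outliers

print(sorted(set(labels.tolist())))       # only the two label values occur
print(int(np.argmax(scores)))             # index of the most anomalous point
print(np.array_equal(labels == -1, scores > 1.5))
```

With the default `contamination="auto"`, scikit-learn labels a point as an outlier when its LOF exceeds 1.5, so thresholding the scores at 1.5 reproduces the labels while still letting you rank the flagged rows.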


In [None]:
# Plot score density
sns.kdeplot(outlier_scores)
plt.show()


In [None]:
# Filter rows with high outlier scores
outliers = call_center[outlier_scores > 1.5]
print(outliers)
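
A list of flagged rows is more persuasive when it also shows which dimensions set each employee apart. One possible way to do that, sketched here under stated assumptions, is to rank the flagged rows by score and print their per-feature z-scores alongside. The `toy` frame and `toy_scores` values are made up for illustration (only the column names are borrowed from the dataset), and `report` is a hypothetical name.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from call-center.csv.
toy = pd.DataFrame({
    "Employee ID": [101, 102, 103, 104, 105],
    "Graveyard Shifts Taken": [2, 2, 1, 0, 3],
    "Weekend Shifts Taken": [1, 1, 1, 0, 1],
    "Shift Swaps Requested": [1, 2, 1, 5, 1],
})
# Hypothetical LOF scores, standing in for the output of the LOF cell.
toy_scores = pd.Series([1.0, 1.1, 0.9, 2.4, 1.2], index=toy.index)

# Per-feature z-scores (population standard deviation, as StandardScaler uses).
features = toy.columns[1:]
z = (toy[features] - toy[features].mean()) / toy[features].std(ddof=0)

# Keep rows above the threshold, most anomalous first.
flagged = (
    toy.assign(lof_score=toy_scores)[toy_scores > 1.5]
    .sort_values("lof_score", ascending=False)
)

# Show each flagged employee's score next to their z-scores.
report = flagged[["Employee ID", "lof_score"]].join(z.loc[flagged.index].round(2))
print(report)
```

In this toy example the single flagged employee takes fewer graveyard and weekend shifts than average and requests more swaps, which is the kind of multi-dimensional pattern the key question asks about. A high score alone does not establish favoritism; it only points to rows worth examining.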