Employee Ranking

1. Average Score of Messages

To rank employees fairly, we calculated each employee's average sentiment score per month instead of using raw sentiment totals. Averaging removes the bias caused by differences in message volume: employees who send more messages would otherwise accumulate larger absolute totals regardless of the tone of each message. We mapped sentiment labels to numeric values (Positive = +1, Neutral = 0, Negative = -1) and averaged them per employee per month, which gives a normalized metric in the range [-1, +1] that reflects overall tone rather than quantity. This average score is the basis for identifying the top positive and top negative employees each month.
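To make the volume bias concrete, the sketch below compares summed and averaged scores for two hypothetical employees. The employee names and message labels are invented for illustration and do not come from the dataset.

```python
import pandas as pd

# Hypothetical toy data (not from the dataset): employee "a" sends ten
# messages, four of them positive; employee "b" sends three, all positive.
toy = pd.DataFrame({
    "employee": ["a"] * 10 + ["b"] * 3,
    "sentiment": ["Positive"] * 4 + ["Neutral"] * 6 + ["Positive"] * 3,
})

sentiment_map = {"Positive": 1, "Neutral": 0, "Negative": -1}
toy["score"] = toy["sentiment"].map(sentiment_map)

# The sum rewards message volume; the mean reflects tone per message.
summary = toy.groupby("employee")["score"].agg(total="sum", average="mean")
print(summary)
```

In this toy case "a" has the larger total (4 versus 3) while "b" has the higher average (1.0 versus 0.4), so the two metrics would rank the employees in opposite orders.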

In [10]:
import pandas as pd

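# Load the labeled messages and normalize column names (trim, lowercase)
df = pd.read_csv('sentimental_labeling.csv')
df.columns = df.columns.str.strip().str.lower()

# Parse dates and derive a 'YYYY-MM' key for monthly grouping
df['timestamp'] = pd.to_datetime(df['date'])
df['year_month'] = df['timestamp'].dt.to_period('M').astype(str)

# Map labels to numeric scores; labels missing from the map become NaN
sentiment_map = {'Positive': 1, 'Neutral': 0, 'Negative': -1}
df['score'] = df['sentiment'].map(sentiment_map)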

monthly_avg_df = (
    df.groupby(['from', 'year_month'])['score']
    .mean()
    .reset_index()
    .rename(columns={'from': 'employee', 'score': 'average_score'})
)

monthly_avg_df

index,employee,year_month,average_score
0,bobette.riner@ipgdirect.com,2010-01,0.500000
1,bobette.riner@ipgdirect.com,2010-02,0.571429
2,bobette.riner@ipgdirect.com,2010-03,0.363636
3,bobette.riner@ipgdirect.com,2010-04,0.666667
4,bobette.riner@ipgdirect.com,2010-05,0.750000
...,...,...,...
235,sally.beck@enron.com,2011-08,0.214286
236,sally.beck@enron.com,2011-09,0.000000
237,sally.beck@enron.com,2011-10,0.714286
238,sally.beck@enron.com,2011-11,0.625000
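
One caveat with the mapping step: `Series.map` with a dictionary returns NaN for any value that is not a key, and `mean()` skips NaN by default, so a label with different capitalization or stray whitespace would silently drop out of the average. Whether the dataset contains such labels is not known here; the sketch below uses hypothetical labels to show how the check could look.

```python
import pandas as pd

sentiment_map = {"Positive": 1, "Neutral": 0, "Negative": -1}

# Hypothetical labels (not from the dataset); two do not match the map keys.
labels = pd.Series(["Positive", "negative", "Neutral ", "Negative"])

# Labels that fail to map come back as NaN and can be listed for review.
scores = labels.map(sentiment_map)
unmapped = labels[scores.isna()]
print(unmapped.tolist())

# Trimming whitespace and normalizing capitalization before mapping
# recovers both of them in this toy case.
cleaned = labels.str.strip().str.capitalize()
clean_scores = cleaned.map(sentiment_map)
print(clean_scores.isna().sum())
```

Running the same `isna()` check on `df['score']` before averaging would confirm that every message contributes to its employee's monthly score.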


2. Ranking According to Average Score

In [None]:
top_positive_list = []
top_negative_list = []

for month in monthly_avg_df['year_month'].unique():
    month_data = monthly_avg_df[monthly_avg_df['year_month'] == month]

    # Top 3 by highest average score (ties broken alphabetically by employee)
    top_pos = (
        month_data
        .sort_values(by=['average_score', 'employee'], ascending=[False, True])
        .head(3)
        .copy()
    )
    # Number the rows actually present; a month may have fewer than 3 employees
    top_pos['rank'] = range(1, len(top_pos) + 1)
    top_pos['month'] = month
    top_pos['type'] = 'Top Positive'
    top_positive_list.append(top_pos)

    # Top 3 by lowest average score, which may still be above zero
    # (ties broken alphabetically by employee)
    top_neg = (
        month_data
        .sort_values(by=['average_score', 'employee'], ascending=[True, True])
        .head(3)
        .copy()
    )
    # Number the rows actually present; a month may have fewer than 3 employees
    top_neg['rank'] = range(1, len(top_neg) + 1)
    top_neg['month'] = month
    top_neg['type'] = 'Top Negative'
    top_negative_list.append(top_neg)


ranking_df = pd.concat(top_positive_list + top_negative_list, ignore_index=True)

ranking_df

index,employee,year_month,average_score,rank,month,type
0,eric.bass@enron.com,2010-01,0.900000,1,2010-01,Top Positive
1,don.baughman@enron.com,2010-01,0.888889,2,2010-01,Top Positive
2,john.arnold@enron.com,2010-01,0.714286,3,2010-01,Top Positive
3,lydia.delgado@enron.com,2010-02,1.000000,1,2010-02,Top Positive
4,patti.thompson@enron.com,2010-02,1.000000,2,2010-02,Top Positive
...,...,...,...,...,...,...
139,lydia.delgado@enron.com,2011-11,0.428571,2,2011-11,Top Negative
140,patti.thompson@enron.com,2011-11,0.538462,3,2011-11,Top Negative
141,johnny.palmer@enron.com,2011-12,0.333333,1,2011-12,Top Negative
142,lydia.delgado@enron.com,2011-12,0.647059,2,2011-12,Top Negative
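
The per-month loop can also be written without explicit iteration. The sketch below is one possible alternative, not the method used to produce the output above: it sorts once, takes the first rows of each month with `groupby(...).head(n)`, and numbers them with `cumcount()`, so months with fewer than three employees are handled without a fixed-length rank list. The helper name `top_n` and the toy data are hypothetical.

```python
import pandas as pd

# Hypothetical monthly averages (not from the dataset); month 2010-02
# has only two employees.
monthly = pd.DataFrame({
    "employee": ["a", "b", "c", "d", "a", "b"],
    "year_month": ["2010-01"] * 4 + ["2010-02"] * 2,
    "average_score": [0.9, 0.5, 0.5, -0.2, 1.0, 0.0],
})

def top_n(df, n=3, positive=True):
    """Return up to n rows per month, ranked by average_score."""
    # Highest scores first for the positive list, lowest first for the
    # negative list; ties are broken alphabetically by employee.
    ordered = df.sort_values(
        by=["year_month", "average_score", "employee"],
        ascending=[True, not positive, True],
    )
    out = ordered.groupby("year_month").head(n).copy()
    # cumcount numbers rows within each month in sorted order, from 0.
    out["rank"] = out.groupby("year_month").cumcount() + 1
    out["type"] = "Top Positive" if positive else "Top Negative"
    return out

ranking = pd.concat(
    [top_n(monthly, positive=True), top_n(monthly, positive=False)],
    ignore_index=True,
)
print(ranking)
```

Because `year_month` already identifies the month, this version does not add a separate `month` column.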
