# Employee Sentiment Analysis

This notebook implements the *employee sentiment analysis project* as outlined in the Final LLM Assessment Requirements. It is organized into the following tasks:
1. Sentiment labeling
2. Exploratory Data Analysis
3. Monthly sentiment scoring
4. Employee ranking
5. Flight risk identification
6. Linear regression modeling

Tools used: Python, pandas, matplotlib/seaborn, TextBlob, scikit-learn.

This project was created by Andrew Cho on July 1st, 2025.

### Imports and Reading Data

In [1]:
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from textblob import TextBlob

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# sets the plot style to ggplot
plt.style.use('ggplot')

In [2]:
data_path = "../data/message_data.csv"
df = pd.read_csv(data_path)

## 01 - Sentiment labeling

In [3]:
# Inspect data to understand its structure
print("Dataset loaded successfully. Preview:")
print(df.head(3))
print("\nDataset info:")
df.info()  # info() prints its own summary and returns None, so no print() wrapper is needed

Dataset loaded successfully. Preview:
                                        Subject  \
0                          EnronOptions Update!   
1                                  (No Subject)   
2  Phone Screen  Interview - Shannon L. Burnham   

                                                body       date  \
0  EnronOptions Announcement\n\n\nWe have updated...  5/10/2010   
1  Marc,\n\nUnfortunately, today is not going to ...  7/29/2010   
2  When: Wednesday, June 06, 2001 10:00 AM-11:00 ...  7/25/2011   

                   from  
0  sally.beck@enron.com  
1   eric.bass@enron.com  
2  sally.beck@enron.com  

Dataset info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2191 entries, 0 to 2190
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Subject  2191 non-null   object
 1   body     2191 non-null   object
 2   date     2191 non-null   object
 3   from     2191 non-null   object
dtypes: object(4)
memory usage: 68.6+ KB
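The `info()` output shows that all four columns, including `date`, are stored as strings (`object` dtype). The monthly scoring task will likely need real timestamps, so below is a minimal sketch of one way to convert the column; it is not the notebook's own code. It assumes every value follows the month/day/year layout seen in the preview (e.g. `5/10/2010`). The helper name `parse_message_dates`, the `format` string, and the choice of `errors="coerce"` are all assumptions.

```python
import pandas as pd


def parse_message_dates(frame: pd.DataFrame, column: str = "date") -> pd.DataFrame:
    """Return a copy of `frame` with `column` converted to datetime.

    Assumes month/day/year strings such as "5/10/2010"; values that do not
    match the format become NaT instead of raising (errors="coerce").
    """
    out = frame.copy()  # leave the caller's DataFrame untouched
    out[column] = pd.to_datetime(out[column], format="%m/%d/%Y", errors="coerce")
    return out


# Small stand-in frame using the date strings shown in the preview above
sample = pd.DataFrame({"date": ["5/10/2010", "7/29/2010", "7/25/2011"]})
parsed = parse_message_dates(sample)
print(parsed["date"].dtype)
print(parsed["date"])
```

Applied to the notebook's data this would be `df = parse_message_dates(df)`. Any row whose date does not match the assumed format shows up as `NaT`, so `df['date'].isna().sum()` is a quick check that the format assumption holds.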


In [4]:
# Defines sentiment labeling function using TextBlob
def get_sentiment_label(text):
    """
    Return 'Positive', 'Negative', or 'Neutral' based on the sign of the
    TextBlob polarity score, which ranges from -1.0 to 1.0.
    """
    text = str(text)  # coerce non-string values (e.g. NaN) to a string
    polarity = TextBlob(text).sentiment.polarity
    
    if polarity > 0:
        return 'Positive'
    elif polarity < 0:
        return 'Negative'
    else:
        return 'Neutral'

# Applies sentiment labeling to messages in a new column
df['Sentiment Label'] = df['body'].apply(get_sentiment_label)

print("Sentiment labeling complete. Preview:")
print(df.head(3))

# Saves newly labeled dataset to data folder
df.to_csv("../data/message_data_labeled.csv", index=False)
print("\nSentiment Labeled Dataset saved to ../data/message_data_labeled.csv")

Sentiment labeling complete. Preview:
                                        Subject  \
0                          EnronOptions Update!   
1                                  (No Subject)   
2  Phone Screen  Interview - Shannon L. Burnham   

                                                body       date  \
0  EnronOptions Announcement\n\n\nWe have updated...  5/10/2010   
1  Marc,\n\nUnfortunately, today is not going to ...  7/29/2010   
2  When: Wednesday, June 06, 2001 10:00 AM-11:00 ...  7/25/2011   

                   from Sentiment Label  
0  sally.beck@enron.com        Positive  
1   eric.bass@enron.com        Negative  
2  sally.beck@enron.com         Neutral  

Sentiment Labeled Dataset saved to ../data/message_data_labeled.csv
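`get_sentiment_label` combines two steps: computing a polarity score and mapping its sign to a label. As a sketch of an alternative structure (not the notebook's own code), the mapping can be pulled out into a vectorized function that is easy to test without TextBlob. The helper name `labels_from_polarity` and the sample scores below are hypothetical; in the notebook the scores would come from `TextBlob(text).sentiment.polarity`.

```python
import numpy as np
import pandas as pd


def labels_from_polarity(polarity: pd.Series) -> pd.Series:
    """Map polarity scores to labels with the same sign rule as get_sentiment_label.

    > 0 -> 'Positive', < 0 -> 'Negative', anything else (0.0 or NaN) -> 'Neutral'.
    """
    labels = np.select(
        [polarity > 0, polarity < 0],  # conditions, checked in order
        ["Positive", "Negative"],      # label for the first matching condition
        default="Neutral",             # exactly zero (or missing) polarity
    )
    return pd.Series(labels, index=polarity.index, name="Sentiment Label")


# Hypothetical polarity scores standing in for TextBlob output
scores = pd.Series([0.35, -0.2, 0.0, 0.1])
labels = labels_from_polarity(scores)
print(labels.tolist())
# Share of each label, a quick class-balance check
print(labels.value_counts(normalize=True))
```

Keeping the raw polarity in its own column, for example via `df['body'].astype(str).map(lambda t: TextBlob(t).sentiment.polarity)`, would also let later tasks look at score magnitudes rather than only the three labels.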


## 02 - Exploratory Data Analysis (EDA)