Flight Risk Calculation

For each employee, all negative messages are first sorted in ascending order by timestamp. A sliding window is then applied to check whether any four consecutive negative messages fall within a 30-day span, measured from the first message in the group to the fourth. To avoid counting the same message twice, detected windows do not overlap: once a qualifying group is found, the index jumps forward by four positions. If a group does not meet the criterion, the window moves forward by one message. The number of detected risk windows is counted per employee. Employees with at least one window are appended to flight_risk_counts, which is then converted into a DataFrame (risk_df) for further analysis or reporting.

In [None]:
import pandas as pd
from datetime import timedelta


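# Load the labeled messages and normalize the column names.
df = pd.read_csv('sentimental_labeling.csv')
df.columns = df.columns.str.strip().str.lower()
df['timestamp'] = pd.to_datetime(df['date'])

# Keep only the messages labeled as negative.
neg_df = df[df['sentiment'] == 'Negative'].copy()
flight_risk_counts = []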


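for emp, group in neg_df.groupby('from'):
    # Order this employee's negative messages chronologically.
    group = group.sort_values('timestamp').reset_index(drop=True)
    i = 0
    risk_count = 0

    while i <= len(group) - 4:
        # Span between the first and fourth message of the candidate group.
        start = group.loc[i, 'timestamp']
        end = group.loc[i + 3, 'timestamp']

        # Compare the full span; .days alone would discard partial days.
        if end - start <= timedelta(days=30):
            risk_count += 1
            i += 4  # skip the whole group so windows do not overlap
        else:
            i += 1  # slide forward by one message

    # Record only employees with at least one risk window.
    if risk_count > 0:
        flight_risk_counts.append({
            'employee': emp,
            'flight_risk_count': risk_count
        })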

risk_df = pd.DataFrame(flight_risk_counts)

risk_df

   employee                      flight_risk_count
0  bobette.riner@ipgdirect.com   1
1  eric.bass@enron.com           1
2  john.arnold@enron.com         1
3  johnny.palmer@enron.com       1
4  kayne.coulter@enron.com       1
5  patti.thompson@enron.com      1
6  sally.beck@enron.com          1
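
The window logic described above can also be isolated into a small function and checked on toy data. The following is a minimal sketch, not part of the original notebook: the function name count_risk_windows, its parameters, and the sample dates are all hypothetical. It compares the full time difference against timedelta(days=30), on the assumption that "within 30 days" means a span of at most 30 days between the first and fourth message.

```python
from datetime import datetime, timedelta


def count_risk_windows(timestamps, group_size=4, window=timedelta(days=30)):
    """Count non-overlapping groups of `group_size` timestamps spanning at most `window`.

    Hypothetical helper for illustration; name and parameters are assumptions.
    """
    ts = sorted(timestamps)
    i = 0
    count = 0
    while i <= len(ts) - group_size:
        # Span from the first to the last timestamp of the candidate group.
        if ts[i + group_size - 1] - ts[i] <= window:
            count += 1
            i += group_size  # skip the whole group so windows do not overlap
        else:
            i += 1  # slide forward by one timestamp
    return count


# Made-up dates, for illustration only.
dense = [datetime(2010, 1, d) for d in (1, 5, 12, 20, 25, 28)]
sparse = [datetime(2010, m, 1) for m in (1, 3, 5, 7, 9)]

print(count_risk_windows(dense))   # six messages in January: one group, two left over
print(count_risk_windows(sparse))  # messages two months apart: no group qualifies
```

In the dense example, the first four messages qualify and the index jumps past them, so the two remaining messages cannot form another group. This is the non-overlapping behavior the description relies on.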
