### Objective
Evaluate data quality issues that may affect insider threat detection, and define cleaning assumptions before feature engineering.

The focus is on identifying missing psychometric data, inactive users, sparse
email activity, and temporal gaps that may impact behavioral modeling and risk
analysis.

### Missing Psychometric Scores

Psychometric attributes (O, C, E, A, N) represent personality traits collected
through employee assessments and are assumed to be static.

A missing value check was performed across all psychometric columns. The results
show that no missing values are present in the psychometric dataset.

This indicates that personality data is complete for all recorded users, which
simplifies downstream personality-based feature engineering and avoids the need
for imputation or exclusion at this stage.
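The missing-value check described above could be reproduced along these lines. This is a minimal sketch, not the original analysis code: the trait columns `O`, `C`, `E`, `A`, `N` come from the text, while the `user_id` column and the toy values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data; only the trait column names come from the text.
psychometric = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "O": [40, 35, 28],
    "C": [39, 42, 31],
    "E": [36, 20, 44],
    "A": [19, 40, 25],
    "N": [40, 22, 30],
})

trait_cols = ["O", "C", "E", "A", "N"]

# Number of missing values in each trait column.
missing_per_trait = psychometric[trait_cols].isna().sum()

# Users with at least one missing trait score (empty when data is complete).
has_missing = psychometric[trait_cols].isna().any(axis=1)
users_with_missing = psychometric.loc[has_missing, "user_id"]

print(missing_per_trait)
print("Users with missing scores:", len(users_with_missing))
```

Reporting both the per-column count and the affected users makes it easy to decide later between imputation and exclusion, should missing values appear in a future data refresh.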

Even with complete data, the scores are treated as baseline traits only: they
may not capture changes in stress or behavior over time.

### Inactive Users

User inactivity was assessed using email event frequency, where each row in the
dataset represents a single email interaction.

A user was considered inactive if they appeared only a handful of times across
the entire dataset, judged against the distribution of email counts per user.
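One way to inspect the distribution of email counts per user is sketched below. The `user` column name and the toy rows are assumptions; the only property taken from the text is that each row represents one email event.

```python
import pandas as pd

# Hypothetical email log: one row per email event.
emails = pd.DataFrame({
    "user": ["u1"] * 4 + ["u2"] * 3 + ["u3"] * 1,
})

# Email events per user, sorted so the least active users come first.
events_per_user = emails.groupby("user").size().sort_values()

# Summary statistics of the per-user counts (min, quartiles, max).
print(events_per_user.describe())

# The least active users are the candidates for an inactivity review.
print(events_per_user.head(10))
```

Note that a user who never appears in the email log will not show up in this count at all, so comparing the result against a full user list may be worthwhile if one is available.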

The analysis shows that all users in the dataset have substantial email activity,
with each user appearing hundreds or thousands of times. As a result, no users
meet the criteria for inactivity.

This suggests that the dataset primarily contains actively communicating users,
making it suitable for behavioral and temporal modeling without the need to
exclude inactive accounts.

### Sparse Email Activity

Sparse email users were defined as users with fewer than five email events across
the entire observation period.

Counting email events per user showed that no users meet this criterion: every
user in the dataset has more than 30 email events.

This indicates that the dataset does not suffer from extreme sparsity at the
user level, reducing the risk of unstable behavioral patterns due to insufficient
data.

As a result, no users were flagged or excluded based on sparse email activity at
this stage.
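The sparse-user check in this section can be expressed as a small helper. This is an illustrative sketch: the function name `find_sparse_users` and the `user` column are hypothetical, and only the threshold of five events comes from the definition above.

```python
import pandas as pd

MIN_EVENTS = 5  # from the text: fewer than five events counts as sparse


def find_sparse_users(emails: pd.DataFrame, user_col: str = "user",
                      min_events: int = MIN_EVENTS) -> pd.Series:
    """Return event counts for users with fewer than `min_events` rows."""
    counts = emails[user_col].value_counts()
    return counts[counts < min_events]


# Toy data: u1 has 6 events, u2 has only 2.
emails = pd.DataFrame({"user": ["u1"] * 6 + ["u2"] * 2})

sparse = find_sparse_users(emails)
print(sparse)
```

Keeping the threshold as a parameter makes it straightforward to rerun the check with a stricter cutoff if the definition of sparsity changes.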

### Temporal Gaps in Email Activity

Temporal gaps were analyzed by computing time differences between consecutive
email events for each user.

Most gaps are short: the median gap is a few minutes, and 75% of gaps are under
an hour. This indicates dense and regular email activity across users.

The maximum observed gap is approximately six days, which is consistent with
expected non-working periods such as weekends or short leaves. No users were
found to exhibit prolonged inactivity beyond this range.
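The gap computation and the summary statistics quoted above could be obtained roughly as follows. The column names `user` and `date` and the toy timestamps are assumptions for illustration, not values from the dataset.

```python
import pandas as pd

# Hypothetical email log with one timestamped row per email event.
emails = pd.DataFrame({
    "user": ["u1", "u1", "u1", "u2", "u2", "u2"],
    "date": pd.to_datetime([
        "2010-01-04 09:00", "2010-01-04 09:05", "2010-01-04 10:00",
        "2010-01-04 08:00", "2010-01-08 17:00", "2010-01-11 08:30",
    ]),
})

# Sort so that consecutive rows for a user are consecutive in time.
emails = emails.sort_values(["user", "date"])

# Time since the same user's previous email; the first event per user is NaT.
gap = emails.groupby("user")["date"].diff()
emails["gap_minutes"] = gap.dt.total_seconds() / 60

# Median and 75th percentile over all gaps, ignoring first events.
gaps = emails["gap_minutes"].dropna()
print(gaps.quantile([0.5, 0.75]))

# Longest gap per user, useful for spotting prolonged inactivity.
max_gap_minutes = emails.groupby("user")["gap_minutes"].max()
print(max_gap_minutes)
```

Computing the difference within each user group, rather than over the whole log, prevents the gap between one user's last email and the next user's first email from being counted.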

Overall, the dataset shows strong temporal continuity, and no corrective action
(such as gap filling or user exclusion) is required at this stage. The data is
considered suitable for time-based behavioral feature engineering.
