## User Risk Profiling

### Objective

Demonstrate how early-warning behavioral indicators can be designed conceptually using exploratory data analysis.

- Risk here does not imply malicious intent.
- Risk is defined as behavioral deviation from peer norms, not insider threat detection.

### Scope & Disclaimer
This profiling is exploratory and intended for prioritization, not decision-making or enforcement.

### 1. Reframe "Risk"

- Risk does not mean malicious activity.

- Risk means behavioral deviation from peer norms.

- Focus on patterns over time, not on any single current activity.

#### Observation

Most users exhibit stable and moderate communication behavior.
Only a small subset shows strong deviation across one or more dimensions.

### 2. Risk Indicators

- Behavioral Concentration: High activity compressed into a short period.

- Psychometric Extremity: Users whose scores lie far from the mean on one or more of the Big Five traits (O, C, E, A, N).

- Communication Volatility: Users whose email activity fluctuates significantly compared to peers.
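
Behavioral Concentration is listed above but has no dedicated computation in the cells that follow. One possible reading, offered only as a sketch, is total email volume divided by the number of active days. The columns `total_emails` and `active_days` are hypothetical and are not confirmed to exist in the dataset.

```python
import pandas as pd

# Toy data with hypothetical columns (not taken from the dataset)
toy = pd.DataFrame(
    {
        "user": ["u1", "u2"],
        "total_emails": [300, 300],
        "active_days": [100, 10],
    }
)

# Assumed definition: average emails per active day.
# The same volume packed into fewer days yields a higher value.
toy["behavioral_concentration"] = toy["total_emails"] / toy["active_days"]
```

Here both users send the same number of emails, but `u2` does so in a tenth of the time and therefore scores higher.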

### 3. Compute Risk Indicators

In [3]:
import os
import sys

sys.path.append(os.path.abspath(".."))

from src.data_utils import load_processed  # project-local helper
import pandas as pd

# Load the user-level feature table
df = load_processed("user_level_bivariate_features_v2.csv")

#### Communication Volatility

In [5]:
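# Per-user sample standard deviation of emails_per_day.
# Note: std() returns NaN for any user that has only one row.
volatility_df = (
    df.groupby('user')['emails_per_day']
    .std()
    .reset_index()
    .rename(columns={'emails_per_day': 'communication_volatility'})
)

# Attach the per-user volatility back onto the main table
df = df.merge(volatility_df, on='user', how='left')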
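
A per-user standard deviation is only defined when a user has at least two observations. The sketch below uses a small, made-up table of daily records (the values are illustrative and not drawn from the dataset) to show both the normal case and the single-row case.

```python
import pandas as pd

# Hypothetical daily-level records (toy values, not from the dataset)
daily = pd.DataFrame(
    {
        "user": ["u1", "u1", "u1", "u2", "u2", "u2", "u3"],
        "emails_per_day": [10, 12, 11, 2, 30, 5, 8],
    }
)

volatility = (
    daily.groupby("user")["emails_per_day"]
    .std()  # sample standard deviation (ddof=1)
    .rename("communication_volatility")
    .reset_index()
)

# u1 is steady, u2 fluctuates widely, u3 has one row and gets NaN
print(volatility)
```

If the feature table holds one row per user, every user falls into the `u3` case, so it is worth checking `df['communication_volatility'].isna().sum()` before using the column.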


#### Psychometric Extremity

In [6]:
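trait_cols = ['O', 'C', 'E', 'A', 'N']

# Population mean of each trait
trait_means = df[trait_cols].mean()

# Mean absolute deviation from the population mean, averaged over the five traits
df['psychometric_distance'] = (
    (df[trait_cols] - trait_means)
    .abs()
    .mean(axis=1)
)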
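
The distance above is measured in raw score units, so a trait with a wider spread contributes more to the average. One alternative, shown here as a sketch rather than as the method used in this notebook, is to standardize each trait to z-scores first. The toy scores are invented for illustration.

```python
import pandas as pd

trait_cols = ["O", "C", "E", "A", "N"]

# Toy scores (hypothetical values, not from the dataset)
toy = pd.DataFrame(
    {
        "O": [30, 32, 31, 10],
        "C": [40, 42, 41, 20],
        "E": [25, 27, 26, 50],
        "A": [35, 34, 36, 10],
        "N": [20, 21, 19, 45],
    },
    index=["u1", "u2", "u3", "u4"],
)

# Standardize each trait so all five are on a comparable scale
z = (toy[trait_cols] - toy[trait_cols].mean()) / toy[trait_cols].std()

# Mean absolute z-score across the five traits
toy["psychometric_distance_z"] = z.abs().mean(axis=1)
```

In this toy table `u4` sits far from the others on every trait and receives the largest distance.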


#### Communication Intensity

In [8]:
df['communication_intensity'] = df['emails_per_day']

### 4. Conceptual Risk Personas

In [9]:
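# Compute the 90th-percentile cut-offs once instead of on every row
volatility_cutoff = df['communication_volatility'].quantile(0.90)
psychometric_cutoff = df['psychometric_distance'].quantile(0.90)

def assign_risk_persona(row):
    # Volatility is checked first, so it takes precedence when both conditions hold
    if row['communication_volatility'] > volatility_cutoff:
        return 'High-Variance Communicators'
    elif row['psychometric_distance'] > psychometric_cutoff:
        return 'Psychometric Outliers with Normal Activity'
    else:
        return 'Behaviorally Stable Majority'

df['risk_persona'] = df.apply(assign_risk_persona, axis=1)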
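
The same rule can be expressed without a row-wise `apply`. The sketch below uses `numpy.select`, which evaluates conditions in order and so preserves the precedence of volatility over psychometric distance. The helper name `assign_personas` and the toy values are illustrative assumptions, not part of the project code.

```python
import numpy as np
import pandas as pd


def assign_personas(frame: pd.DataFrame, q: float = 0.90) -> pd.Series:
    """Label each row using quantile cut-offs computed once per column."""
    vol_cut = frame["communication_volatility"].quantile(q)
    psy_cut = frame["psychometric_distance"].quantile(q)
    conditions = [
        frame["communication_volatility"] > vol_cut,  # checked first
        frame["psychometric_distance"] > psy_cut,
    ]
    labels = [
        "High-Variance Communicators",
        "Psychometric Outliers with Normal Activity",
    ]
    result = np.select(conditions, labels, default="Behaviorally Stable Majority")
    return pd.Series(result, index=frame.index, name="risk_persona")


# Toy data (hypothetical): user 0 is a psychometric outlier, user 9 is volatile
toy = pd.DataFrame(
    {
        "communication_volatility": [1, 1, 1, 1, 1, 1, 1, 1, 1, 9],
        "psychometric_distance": [9, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    }
)
personas = assign_personas(toy)

# Comparisons against NaN are False, so an all-NaN volatility column
# can never produce a 'High-Variance Communicators' label.
toy_nan = toy.assign(communication_volatility=np.nan)
personas_nan = assign_personas(toy_nan)
```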


### 5. Persona Distribution

In [10]:
df['risk_persona'].value_counts()


risk_persona
Behaviorally Stable Majority                  900
Psychometric Outliers with Normal Activity    100
Name: count, dtype: int64
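
Shares are often easier to read than raw counts. As a small sketch, the counts printed above can be turned into proportions, with the persona that received no users included explicitly as zero. On the real table, `df['risk_persona'].value_counts(normalize=True)` gives the proportions directly but omits empty categories.

```python
import pandas as pd

personas = [
    "Behaviorally Stable Majority",
    "Psychometric Outliers with Normal Activity",
    "High-Variance Communicators",
]

# Counts copied from the output above
counts = pd.Series(
    {
        "Behaviorally Stable Majority": 900,
        "Psychometric Outliers with Normal Activity": 100,
    }
)

# Add the persona that is missing from the output, with a count of zero
counts = counts.reindex(personas, fill_value=0)

# Convert counts to shares of all users
shares = counts / counts.sum()
print(shares)
```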

#### Observation

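- The majority of users (900 of 1,000 in the output above) fall into the Behaviorally Stable Majority persona.

- Psychometric extremity appears in a small, distinct subset (100 users). No user was assigned to High-Variance Communicators in this run, which is what would be expected if `communication_volatility` is NaN for every user, for example when the table holds a single row per user.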

### 6. Multivariate Interpretation

#### Observations

- Behavioral deviation emerges from combinations of intensity, volatility, and personality traits.

- Psychometric extremity often coexists with normal communication behavior.

- High communication volatility does not align with any single personality trait.

- Most users cluster in stable, low-risk behavioral zones.
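
Observations like these can be checked with a correlation table between the indicators and the five traits. The sketch below runs on synthetic data generated with a fixed seed, so its numbers say nothing about the real users. It only illustrates the shape of the check.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
trait_cols = ["O", "C", "E", "A", "N"]
indicator_cols = ["communication_volatility", "psychometric_distance"]

# Synthetic traits and volatility (assumed distributions, for illustration only)
toy = pd.DataFrame(rng.normal(50, 10, size=(n, 5)), columns=trait_cols)
toy["communication_volatility"] = rng.gamma(2.0, 1.5, size=n)
toy["psychometric_distance"] = (
    (toy[trait_cols] - toy[trait_cols].mean()).abs().mean(axis=1)
)

# Pearson correlation of each indicator (rows) with each trait (columns)
corr_table = toy[indicator_cols + trait_cols].corr().loc[indicator_cols, trait_cols]
print(corr_table.round(2))
```

A row with uniformly small coefficients would be consistent with an indicator that does not align with any single trait, although correlation alone cannot capture non-linear relationships.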