## Multivariate Patterns (Human-Interpretable)

### Objective

The goal of this analysis is to understand how multiple behavioral and psychological factors combine to form recognizable communication patterns.

In [9]:
import sys
import os

sys.path.append(os.path.abspath(".."))

In [10]:
import pandas as pd
import matplotlib.pyplot as plt

from src.data_utils import load_processed, save_processed

# Load the processed user-level feature table.
df = load_processed("user_level_bivariate_features_v2.csv")


### 1. Profile-Based Reasoning

This step reframes individual metrics into combined behavioral dimensions.
Rather than interpreting single features in isolation, users are examined
as profiles formed by multiple behavioral and psychological attributes.

### Selected Dimensions

The following dimensions are used to construct interpretable behavioral profiles:

 **Communication Intensity**
  - Total email volume
  - Average emails per active day

 **Communication Volatility**
  - Variability in email activity over time (approximated in this notebook by average emails per active day)

 **Psychometric Extremes**
  - Users in the upper or lower quartiles of O, C, E, A, and N

These dimensions are descriptive and not used for prediction or risk scoring.
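
A minimal sketch of how the Psychometric Extremes dimension could be flagged, assuming the trait columns `O`, `C`, `E`, `A`, and `N` used in this notebook. The helper `flag_trait_extremes` and the toy data are hypothetical and are not part of `src.data_utils`.

```python
import pandas as pd

TRAITS = ["O", "C", "E", "A", "N"]


def flag_trait_extremes(frame, traits=TRAITS, lower=0.25, upper=0.75):
    """Return boolean columns marking users at or beyond each trait's quartile cut-offs."""
    flags = pd.DataFrame(index=frame.index)
    for trait in traits:
        lo_cut = frame[trait].quantile(lower)
        hi_cut = frame[trait].quantile(upper)
        flags[f"{trait}_low"] = frame[trait] <= lo_cut
        flags[f"{trait}_high"] = frame[trait] >= hi_cut
    return flags


# Toy data for illustration only (not taken from the dataset).
toy = pd.DataFrame({
    "O": [10, 20, 30, 40, 50],
    "C": [15, 25, 35, 45, 50],
    "E": [12, 22, 32, 42, 50],
    "A": [11, 21, 31, 41, 50],
    "N": [14, 24, 29, 33, 49],
})
extremes = flag_trait_extremes(toy)
print(extremes[["O_low", "O_high"]])
```

The cut-offs are computed per trait, so a user can be extreme on one trait and typical on another.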


In [11]:
df[['total_emails', 'emails_per_day', 'O', 'C', 'E', 'A', 'N']].describe()

,total_emails,emails_per_day,O,C,E,A,N
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,2629.979,7.989842,33.173,30.653,29.197,28.821,29.608
std,1830.294016,5.125504,10.642007,11.291505,10.95647,11.170844,4.938494
min,34.0,1.0,10.0,10.0,10.0,10.0,14.0
25%,1001.0,2.930586,23.0,20.0,19.0,19.0,26.0
50%,3002.0,8.751512,36.0,33.0,28.0,27.0,29.0
75%,3693.25,10.71924,42.0,40.0,39.0,39.0,33.0
max,12034.0,34.982558,50.0,50.0,50.0,50.0,49.0


### Observation

- Communication behavior varies substantially across users: total email volume ranges from 34 to 12,034, with a median of 3,002.

- Average emails per day ranges from 1.0 to about 35.0 (median 8.75), indicating differences not only in intensity but in day-to-day communication pace.

- O, C, E, and A each span 10 to 50 with standard deviations near 11, while N is more concentrated (range 14 to 49, standard deviation about 4.9), so personality diversity is more pronounced on the first four traits.
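
One way to compare dispersion across traits with different means is the coefficient of variation (standard deviation divided by mean). This is a sketch added for illustration, not a step in the original analysis; the means and standard deviations are copied from the `describe()` output above.

```python
import pandas as pd

# Mean and standard deviation copied from the describe() output above.
summary = pd.DataFrame(
    {
        "mean": [33.173, 30.653, 29.197, 28.821, 29.608],
        "std": [10.642007, 11.291505, 10.95647, 11.170844, 4.938494],
    },
    index=["O", "C", "E", "A", "N"],
)

# Coefficient of variation: dispersion relative to the mean.
summary["cv"] = summary["std"] / summary["mean"]
print(summary.sort_values("cv"))
```

Sorting by `cv` puts the least dispersed trait first, which makes differences in spread easy to spot.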


### 2. Simple Behavioral Segments

In [14]:
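df_profile = df.copy()

# Intensity flags: top and bottom quartile of total email volume.
df_profile['high_intensity'] = df_profile['total_emails'] > df_profile['total_emails'].quantile(0.75)
df_profile['low_intensity'] = df_profile['total_emails'] < df_profile['total_emails'].quantile(0.25)

# Volatility flag: proxied by emails_per_day in the top quartile.
# Note: this captures daily pace, not variability over time.
df_profile['high_volatility'] = df_profile['emails_per_day'] > df_profile['emails_per_day'].quantile(0.75)

df_profile.head()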

,user,total_emails,avg_email_size,total_attachment,active_days,emails_per_day,employee_name,user_id,O,C,E,A,N,high_intensity,low_intensity,high_volatility
0,AAE0190,4711,30020.394184,1780,345,13.655072,August Armando Evans,AAE0190,36,30,14,50,29,True,False,True
1,AAF0535,480,30397.402083,364,162,2.962963,Athena Amelia Foreman,AAF0535,17,21,36,33,31,False,True,False
2,AAF0791,3012,29958.497676,0,346,8.705202,Aladdin Abraham Foley,AAF0791,14,40,40,50,34,False,False,False
3,AAL0706,336,29828.181548,145,336,1.0,April Alika Levy,AAL0706,37,14,28,13,25,False,True,False
4,AAM0658,659,29895.532625,613,224,2.941964,Abel Adam Morton,AAM0658,43,35,37,36,22,False,True,False


### Observation

- Communication intensity and volatility vary substantially across users.

- Percentile-based thresholds enable users to be described relative to the population rather than by absolute email counts.
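
The `high_volatility` flag in the cell above is computed from `emails_per_day`, which is an average rate. If per-day email counts were available, variability over time could be measured directly. The sketch below assumes a hypothetical daily log with columns `user`, `date`, and `n_emails`; no such table appears in this notebook.

```python
import pandas as pd

# Hypothetical per-day log; the table and its column names are assumptions.
days = pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07"])
daily = pd.DataFrame({
    "user": ["U1"] * 4 + ["U2"] * 4,
    "date": list(days) * 2,
    "n_emails": [10, 10, 10, 10, 1, 19, 2, 18],
})

# Per-user mean and standard deviation of daily counts.
per_user = daily.groupby("user")["n_emails"].agg(["mean", "std"])

# Coefficient of variation: day-to-day spread relative to the user's own pace.
per_user["cv"] = per_user["std"] / per_user["mean"]

# Same percentile-based thresholding as the notebook, applied to the spread.
per_user["high_volatility"] = per_user["cv"] > per_user["cv"].quantile(0.75)
print(per_user)
```

Both toy users average ten emails per day, but only the second has a large day-to-day spread, which a rate-based flag alone would not distinguish.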

### 3. Human-Readable Behavioral Archetypes

In [15]:
df_profile['archetypes'] = 'Other'

# Stable profiles: low or high volume without the high-volatility flag.
df_profile.loc[df_profile['low_intensity'] & ~df_profile['high_volatility'],
    'archetypes'] = 'Stable-Low Activity'
df_profile.loc[df_profile['high_intensity'] & ~df_profile['high_volatility'],
    'archetypes'] = 'Stable-High Activity'

# Any user with the high-volatility flag is labeled volatile, regardless of intensity.
df_profile.loc[df_profile['high_volatility'],
    'archetypes'] = 'Volatile Communicator'

df_profile.head()

,user,total_emails,avg_email_size,total_attachment,active_days,emails_per_day,employee_name,user_id,O,C,E,A,N,high_intensity,low_intensity,high_volatility,archetypes
0,AAE0190,4711,30020.394184,1780,345,13.655072,August Armando Evans,AAE0190,36,30,14,50,29,True,False,True,Volatile Communicator
1,AAF0535,480,30397.402083,364,162,2.962963,Athena Amelia Foreman,AAF0535,17,21,36,33,31,False,True,False,Stable-Low Activity
2,AAF0791,3012,29958.497676,0,346,8.705202,Aladdin Abraham Foley,AAF0791,14,40,40,50,34,False,False,False,Other
3,AAL0706,336,29828.181548,145,336,1.0,April Alika Levy,AAL0706,37,14,28,13,25,False,True,False,Stable-Low Activity
4,AAM0658,659,29895.532625,613,224,2.941964,Abel Adam Morton,AAM0658,43,35,37,36,22,False,True,False,Stable-Low Activity


### Observation
- Rule-based archetype assignment produces interpretable communication profiles without relying on statistical clustering or model assumptions.
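
The same rules can also be written as a single prioritized list, which makes the precedence between rules explicit. This is an alternative sketch using `numpy.select` on toy flags, not the notebook's own implementation.

```python
import numpy as np
import pandas as pd

# Toy flags for illustration; in the notebook these come from df_profile.
flags = pd.DataFrame({
    "high_intensity":  [True,  True,  False, False, False],
    "low_intensity":   [False, False, True,  True,  False],
    "high_volatility": [True,  False, False, True,  False],
})

# np.select returns the choice for the first condition that is True,
# so the order of this list is the rule priority.
conditions = [
    flags["high_volatility"],
    flags["high_intensity"],
    flags["low_intensity"],
]
choices = ["Volatile Communicator", "Stable-High Activity", "Stable-Low Activity"]

flags["archetypes"] = np.select(conditions, choices, default="Other")
print(flags["archetypes"].tolist())
```

Because the volatility condition is checked first, the two intensity conditions only ever apply to users without the high-volatility flag, which matches the `& ~high_volatility` masks used above.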

### 4. Archetype Distribution

In [16]:
df_profile['archetypes'].value_counts()

archetypes
Other                    480
Volatile Communicator    250
Stable-Low Activity      245
Stable-High Activity      25
Name: count, dtype: int64

### Observation

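- Nearly half of users (480 of 1,000) fall into the mixed Other group, and about a quarter each are Volatile Communicator (250) or Stable-Low Activity (245). Stable-High Activity users are a small minority (25, or 2.5%): of the 250 users above the 75th-percentile volume, 225 also exceed the emails-per-day threshold and are therefore labeled Volatile Communicator.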
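
The size of the Stable-High Activity group depends on how much the intensity and volatility flags overlap. A cross-tabulation makes that overlap visible. The sketch below uses toy data in which volume and daily pace rise together; the numbers are illustrative and not taken from the dataset.

```python
import pandas as pd

# Toy data for illustration only: volume and daily pace rise together.
toy = pd.DataFrame({
    "total_emails":   [100, 200, 300, 400, 500, 600, 700, 800],
    "emails_per_day": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 9.0, 8.0],
})
toy["high_intensity"] = toy["total_emails"] > toy["total_emails"].quantile(0.75)
toy["high_volatility"] = toy["emails_per_day"] > toy["emails_per_day"].quantile(0.75)

# Counts of users in each combination of the two flags.
overlap = pd.crosstab(toy["high_intensity"], toy["high_volatility"])
print(overlap)

# Users carrying both flags, and users who would be labeled Stable-High Activity.
both = int((toy["high_intensity"] & toy["high_volatility"]).sum())
stable_high = int((toy["high_intensity"] & ~toy["high_volatility"]).sum())
print(both, stable_high)
```

When the two flags select mostly the same users, few high-intensity users remain for the stable label.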

### 5. Psychometric Extremes Within Archetypes

In [17]:
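trait_extreme = {}

# For each trait, take users at or above its 75th percentile
# and compute the share of each archetype among them.
for x in ['O', 'C', 'E', 'A', 'N']:
    high_trait = df_profile[df_profile[x] >= df_profile[x].quantile(0.75)]
    trait_extreme[x] = high_trait['archetypes'].value_counts(normalize=True)

# Columns are traits, rows are archetypes; each column sums to 1.
trait_extreme_df = pd.DataFrame(trait_extreme).fillna(0)
trait_extreme_df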

archetypes,O,C,E,A,N
Other,0.472441,0.491289,0.491468,0.449814,0.494737
Stable-High Activity,0.03937,0.027875,0.020478,0.02974,0.035088
Stable-Low Activity,0.275591,0.236934,0.232082,0.263941,0.242105
Volatile Communicator,0.212598,0.243902,0.255973,0.256506,0.22807


### Observation

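- Users in the top quartile of each trait are distributed across all four behavioral archetypes. Only the upper quartile is examined here; low-trait users are not tabulated.

- No personality trait maps uniquely or predominantly to a specific communication pattern: for every trait, the archetype shares among top-quartile users stay within about four percentage points of the population shares (48% Other, 25% Volatile Communicator, 24.5% Stable-Low Activity, 2.5% Stable-High Activity).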
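
To judge whether a trait is over- or under-represented in an archetype, the shares among high-trait users can be divided by the population shares. The sketch below computes this ratio (called `lift` here, a name introduced for illustration) on toy data; it is not part of the original analysis.

```python
import pandas as pd

# Toy data for illustration only (not taken from the dataset).
toy = pd.DataFrame({
    "archetypes": ["Other", "Other", "Volatile Communicator", "Stable-Low Activity"] * 2,
    "O": [10, 20, 30, 40, 50, 45, 15, 25],
})

# Share of each archetype in the whole population.
baseline = toy["archetypes"].value_counts(normalize=True)

# Share of each archetype among users at or above the 75th percentile of O.
high_o = toy[toy["O"] >= toy["O"].quantile(0.75)]
conditional = high_o["archetypes"].value_counts(normalize=True)

# Lift above 1 means the archetype is over-represented among high-O users.
lift = (conditional.reindex(baseline.index, fill_value=0) / baseline).rename("lift")
print(lift)
```

A lift close to 1 for every archetype would correspond to the pattern described above, where high-trait users mirror the overall distribution.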

### 6. Multivariate Interpretation

### Observations

- Communication behavior emerges from combinations of intensity, volatility, and individual traits, rather than any single variable.

- Users in the top quartile of a trait are spread across archetypes in roughly the same proportions as the full population, so a high trait score alone does not separate users by communication pattern.

- Volatile communication behavior is not dominated by any single personality profile, suggesting multi-factor influence.

- No single trait is concentrated in one archetype; combinations of features are more informative than any one feature in isolation.