# Analyzing Digital Exhaust and People Analytics with Papilon

This notebook demonstrates how to use the Papilon library to analyze digital exhaust for people analytics. Digital exhaust is the data people generate as a by-product of everyday work, such as email metadata, calendar interactions, chat logs, and application usage. We simulate a dataset, then apply entropy, causal inference, scenario simulation, and an energy score to uncover behavioral patterns and organizational insights.

In [None]:
# Install dependencies (if not already installed)
# !pip install papilon pandas numpy matplotlib seaborn

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from papilon.entropy import shannon_entropy
from papilon.simulation import simulate_kde_scenarios
from papilon.energy import energy_score
from papilon.states import macrostate_entropy
from papilon.relationships import analyze_relationships
from papilon.causal import infer_causal_structure

## 1. Simulate Digital Exhaust Data
We simulate 500 activity records capturing email volume, meetings attended, app switching, idle time, and productivity scores, spread across up to 50 employee IDs.

In [None]:
np.random.seed(42)
n = 500
df = pd.DataFrame({
    'employee_id': np.random.randint(1000, 1050, n),
    'emails_sent': np.random.poisson(20, n),
    'meetings_attended': np.random.poisson(5, n),
    'app_switches': np.random.poisson(30, n),
    'idle_minutes': np.random.normal(60, 10, n),
    'productivity_score': np.random.normal(75, 10, n)
})
df.head()

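## 2. Entropy and Behavior Variability
Compute the Shannon entropy of each metric. Higher entropy means the metric's values are more spread out and less predictable.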

In [None]:
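# employee_id is an identifier, not a behavior, so exclude it before computing entropy
metrics = df.drop(columns=['employee_id']).select_dtypes(include='number')
entropy_vals = {col: shannon_entropy(metrics[col]) for col in metrics.columns}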
pd.Series(entropy_vals, name='Entropy')
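
To make the entropy numbers easier to interpret, here is a minimal sketch of Shannon entropy computed from a histogram. The helper `histogram_entropy` and its `bins` parameter are illustrative assumptions, not part of Papilon; `shannon_entropy` may bin or estimate probabilities differently.

```python
import numpy as np

def histogram_entropy(values, bins=10):
    """Shannon entropy (in bits) of a 1-D sample after histogram binning.

    Hypothetical helper for illustration; not Papilon's implementation.
    """
    counts, _ = np.histogram(values, bins=bins)
    probs = counts[counts > 0] / counts.sum()  # empty bins contribute nothing
    return float(np.sum(probs * np.log2(1.0 / probs)))

rng = np.random.default_rng(42)
steady = np.full(500, 20)        # every record shows exactly 20 emails
varied = rng.poisson(20, 500)    # email counts scattered around 20

print("steady:", histogram_entropy(steady))
print("varied:", histogram_entropy(varied))
```

A constant metric falls into a single bin and has zero entropy, while a metric spread evenly over all 10 bins reaches the maximum of log2(10) bits. The result depends on the number of bins, so compare metrics only under the same binning.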

## 3. Causal Structure
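Identify potential causal relationships among observed behaviors. The simulated columns are drawn independently, so any edge recovered here reflects sampling noise; on real data, treat edges as hypotheses to validate rather than as established causes.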

In [None]:
causal_graph = infer_causal_structure(df.drop(columns=['employee_id']))
causal_graph.draw(format='png')
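
Before reading the graph, it can help to check how much dependence the data actually contains. The sketch below is a plain correlation check, not a causal discovery method and not what `infer_causal_structure` does; it regenerates independent columns with the same distributions as the simulated data and reports the largest pairwise correlation.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Independent draws, mirroring how the simulated columns are generated
activity = np.column_stack([
    rng.poisson(20, n),   # emails_sent
    rng.poisson(5, n),    # meetings_attended
    rng.poisson(30, n),   # app_switches
])

corr = np.corrcoef(activity, rowvar=False)        # 3 x 3 correlation matrix
off_diag = corr[~np.eye(3, dtype=bool)]           # drop the diagonal of ones
max_abs_corr = float(np.abs(off_diag).max())
print(f"Largest absolute pairwise correlation: {max_abs_corr:.3f}")
```

With independent columns the off-diagonal correlations should sit close to zero, which gives a baseline for judging how much weight to place on any edges in the inferred graph.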

## 4. Scenario Simulation
Generate synthetic activity scenarios with kernel density estimation (KDE), using email, meeting, and app-switch counts as inputs.

In [None]:
simulated = simulate_kde_scenarios(df[['emails_sent', 'meetings_attended', 'app_switches']], n_samples=1000)
sns.histplot(simulated['emails_sent'], kde=True)
plt.title('Simulated Email Activity')
plt.show()
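
For intuition about KDE-based scenario generation, here is a minimal NumPy sketch of sampling from a Gaussian KDE. The function `sample_gaussian_kde` and the choice of Scott's rule for the bandwidth are assumptions made for illustration; `simulate_kde_scenarios` may use a different kernel or bandwidth.

```python
import numpy as np

def sample_gaussian_kde(data, n_samples, rng):
    """Draw samples from a Gaussian KDE fitted to `data` (rows = observations).

    Hypothetical helper for illustration; not Papilon's implementation.
    """
    data = np.asarray(data, dtype=float)
    n_obs, n_dims = data.shape
    # Scott's rule: the bandwidth factor shrinks as the sample grows
    factor = n_obs ** (-1.0 / (n_dims + 4))
    kernel_cov = np.atleast_2d(np.cov(data, rowvar=False)) * factor**2
    # Sampling from a KDE: pick an observed row at random, then add kernel noise
    centers = data[rng.integers(0, n_obs, size=n_samples)]
    noise = rng.multivariate_normal(np.zeros(n_dims), kernel_cov, size=n_samples)
    return centers + noise

rng = np.random.default_rng(42)
observed = np.column_stack([rng.poisson(20, 500), rng.poisson(5, 500)])
scenarios = sample_gaussian_kde(observed, n_samples=1000, rng=rng)

print(scenarios.shape)
print("observed mean: ", observed.mean(axis=0))
print("simulated mean:", scenarios.mean(axis=0))
```

Samples from a Gaussian KDE are continuous and can fall below zero, so count-like metrics such as emails sent may need rounding or clipping before they are interpreted as scenarios.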

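## 5. Energy Score: Deviation from Norm
Compute an energy score that summarizes how far observed email and meeting activity deviates from the simulated scenarios.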

In [None]:
energy = energy_score(df[['emails_sent', 'meetings_attended']],
                      simulated[['emails_sent', 'meetings_attended']])
print(f"Energy Score: {energy:.4f}")
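
One common way to compare two multivariate samples is the energy distance, sketched below with NumPy. This is an assumption about what such a score might measure; `energy_score` in Papilon may be defined differently, and the helper `energy_distance` is purely illustrative.

```python
import numpy as np

def energy_distance(x, y):
    """Energy distance between two samples (rows = observations).

    Computes 2*E|X - Y| - E|X - X'| - E|Y - Y'| with Euclidean norms.
    Hypothetical helper for illustration; not Papilon's implementation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def mean_pairwise(a, b):
        # Mean Euclidean distance over all pairs of rows from a and b
        diffs = a[:, None, :] - b[None, :, :]
        return np.linalg.norm(diffs, axis=-1).mean()

    return 2 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)

rng = np.random.default_rng(42)
baseline = rng.normal(0.0, 1.0, size=(200, 2))
similar = rng.normal(0.0, 1.0, size=(200, 2))   # same distribution as baseline
shifted = rng.normal(3.0, 1.0, size=(200, 2))   # mean moved away from baseline

same_score = energy_distance(baseline, similar)
shifted_score = energy_distance(baseline, shifted)
print(f"similar: {same_score:.4f}")
print(f"shifted: {shifted_score:.4f}")
```

The distance is zero when the two samples are identical and grows as their distributions move apart, so a value near zero suggests the observed activity resembles the reference sample.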