# Typing Profiling: CyberLab Human Model → Cowrie Scoring

This notebook builds a **human interaction model** from CyberLab session features, then applies it to Cowrie honeypot sessions to estimate how **human-like** each session’s command timing appears.

We present three levels of inference:

1. **Simple PDF Baseline (Quantile Rule)**
   Scores sessions under the CyberLab human probability density function (PDF) and labels them using **CyberLab-derived log-likelihood quantiles**.

2. **Primary: Human-Likeness Tail Test (Mahalanobis χ²)**
   Uses the global CyberLab Gaussian and converts each session's squared Mahalanobis distance from the human mean into a χ² **tail probability** `p_human_tail` (smaller ⇒ less human-like).

3. **Secondary: Human-vs-Background Posterior (Gated Mixture)**
   Adds a background (bot-like) density model (GMM) and computes a **posterior** `p(human | x)` while still keeping CyberLab as the human anchor.
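
The first two levels can be sketched on synthetic data as follows. This is a minimal illustration, not the notebook's implementation: the feature matrix, the ridge term, the 5% quantile cutoff, the label strings, and the helper names `fit_gaussian`, `quantile_labels`, and `human_tail_probability` are all assumptions made for the example.

```python
import numpy as np
from scipy.stats import chi2, multivariate_normal


def fit_gaussian(X):
    """Estimate mean and covariance of the human reference features."""
    mu = X.mean(axis=0)
    # Small ridge (assumed value) keeps the covariance invertible.
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return mu, cov


def quantile_labels(X, ref, mu, cov, low_q=0.05):
    """Level 1: label by comparing log-likelihood to a reference quantile.

    The 5% cutoff and the label strings are hypothetical choices.
    """
    ref_ll = multivariate_normal.logpdf(ref, mean=mu, cov=cov)
    threshold = np.quantile(ref_ll, low_q)
    ll = multivariate_normal.logpdf(X, mean=mu, cov=cov)
    return np.where(ll >= threshold, "human_like", "not_human_like")


def human_tail_probability(X, mu, cov):
    """Level 2: chi-square tail probability of the squared Mahalanobis distance."""
    diff = X - mu
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    # Under the Gaussian model, d2 follows chi2 with df = number of features.
    return chi2.sf(d2, df=X.shape[1])


# Synthetic stand-in for the CyberLab reference features (assumption).
rng = np.random.default_rng(0)
human_ref = rng.normal(size=(500, 3))
mu, cov = fit_gaussian(human_ref)

# One session near the human mean, one far from it.
sessions = np.array([[0.0, 0.0, 0.0], [8.0, 8.0, 8.0]])
labels = quantile_labels(sessions, human_ref, mu, cov)
p_human_tail = human_tail_probability(sessions, mu, cov)
```

The tail probability has a direct reading: it is the fraction of reference-like sessions expected to lie at least as far from the human mean as the scored session, so no threshold has to be chosen before the scores are useful.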

Outputs are written to `./output/`: the fitted human model as an NPZ archive, and the per-session features and scores as compact, reusable CSVs.
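
The posterior used in the third level can be illustrated with a plain two-density Bayes rule: a Gaussian for the human class and a GMM for the background. This sketch assumes synthetic data, a prior of 0.5, and two background components, and it deliberately leaves out any gating step, since the gate's definition is not given here. The name `posterior_human` is hypothetical.

```python
import numpy as np
from scipy.special import expit
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Synthetic stand-ins (assumptions): human reference and bot-like background.
human_ref = rng.normal(loc=0.0, scale=1.0, size=(400, 2))
background = np.vstack([
    rng.normal(loc=6.0, scale=0.5, size=(300, 2)),
    rng.normal(loc=-6.0, scale=0.5, size=(300, 2)),
])

# Human anchor: a single Gaussian fitted to the reference data.
mu = human_ref.mean(axis=0)
cov = np.cov(human_ref, rowvar=False) + 1e-6 * np.eye(2)

# Background density: a GMM (component count is an assumed value).
bg_gmm = GaussianMixture(
    n_components=2, covariance_type="full", random_state=0
).fit(background)


def posterior_human(X, prior_human=0.5):
    """Return p(human | x) from the two class-conditional densities."""
    log_h = multivariate_normal.logpdf(X, mean=mu, cov=cov) + np.log(prior_human)
    log_b = bg_gmm.score_samples(X) + np.log1p(-prior_human)
    # expit(a - b) equals exp(a) / (exp(a) + exp(b)), computed stably.
    return expit(log_h - log_b)


# One session near the human mean, one inside a background cluster.
sessions = np.array([[0.2, -0.1], [6.0, 6.1]])
p_human = posterior_human(sessions)
```

Working in log space and applying `expit` to the log-odds avoids underflow when a session is far from both densities, which is common for outlying honeypot sessions.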


In [2]:
import os
import glob
import numpy as np
import pandas as pd

from pathlib import Path
from scipy.stats import chi2
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

OUT_DIR = Path("output")
OUT_DIR.mkdir(exist_ok=True, parents=True)

# Inputs
CYBERLAB_SESS_PATH = "output/cyberlab_clustering_features.csv"
COWRIE_LINES_PATH  = "../../fi_fs/data/processed/Cowrie_Merged_Geo_Enriched_lines.csv"

# Outputs
HUMAN_MODEL_NPZ    = OUT_DIR / "human_gaussian_densities.npz"
COWRIE_DENSITY_CSV = OUT_DIR / "cowrie_session_density_features.csv"
COWRIE_SCORED_CSV  = OUT_DIR / "cowrie_session_typing_scored.csv"
COWRIE_A_CSV       = OUT_DIR / "cowrie_session_typing_humanlikeness_tailtest.csv"
COWRIE_B_CSV       = OUT_DIR / "cowrie_session_typing_posterior_gatedmixture.csv"

print("Paths:")
print("  CYBERLAB_SESS_PATH:", CYBERLAB_SESS_PATH)
print("  COWRIE_LINES_PATH :", COWRIE_LINES_PATH)
print("  HUMAN_MODEL_NPZ   :", HUMAN_MODEL_NPZ)
print("  COWRIE_DENSITY_CSV:", COWRIE_DENSITY_CSV)
print("  COWRIE_SCORED_CSV :", COWRIE_SCORED_CSV)
print("  COWRIE_A_CSV      :", COWRIE_A_CSV)
print("  COWRIE_B_CSV      :", COWRIE_B_CSV)


Paths:
  CYBERLAB_SESS_PATH: output/cyberlab_clustering_features.csv
  COWRIE_LINES_PATH : ../../fi_fs/data/processed/Cowrie_Merged_Geo_Enriched_lines.csv
  HUMAN_MODEL_NPZ   : output/human_gaussian_densities.npz
  COWRIE_DENSITY_CSV: output/cowrie_session_density_features.csv
  COWRIE_SCORED_CSV : output/cowrie_session_typing_scored.csv
  COWRIE_A_CSV      : output/cowrie_session_typing_humanlikeness_tailtest.csv
  COWRIE_B_CSV      : output/cowrie_session_typing_posterior_gatedmixture.csv
