# Code Example 1 — Classical Enumeration Baseline (MOVEit Threat Model)

This notebook models a MOVEit-class breach as an **inference problem** under a classical attacker model.

**Core idea:** The attacker must infer an internal "branch/state" of the target system using repeated interactions.  
Each interaction increases certainty, but also increases exposure (more retries, more errors, more anomalies).

We intentionally avoid:
- payload construction
- endpoint specifics
- vendor targeting

Instead, we simulate the *structure*:

- A target has a hidden internal state `s*` among `N` candidates.
- Each query returns a noisy "score" that correlates with correctness.
- A classical attacker repeats queries to choose the best candidate with high confidence.

Outputs produced here are designed to be captured in an exported HTML/PDF artifact.

In [1]:
import numpy as np

## Model and Parameters

We define a hidden target state `s*` in a candidate set of size `N`.

Each query produces a numeric score for each candidate:
- The correct candidate has mean score `μ_good`
- Incorrect candidates have mean score `μ_bad`
- All scores include zero-mean Gaussian noise with standard deviation `σ`

A classical attacker must perform repeated queries and aggregate evidence.

We report:
- how many queries were required
- how often the attacker succeeded
- a simple "exposure proxy" that increases with query count
- whether the process would likely trigger an anomaly threshold (illustrative)

In [2]:
# Reproducibility
seed = 123
rng = np.random.default_rng(seed)

# Candidate space size (state/branch count)
N = 128

# Signal model: score separation
mu_good = 1.0
mu_bad  = 0.0
sigma   = 1.0

# How many queries an attacker will attempt at most
max_queries = 200

# Number of independent runs to estimate success probability
trials = 200

# Illustrative anomaly threshold: if queries exceed this, assume "noisy" behavior becomes noticeable
anomaly_query_threshold = 80

## Scoring Function

Each query produces a score vector `scores[0..N-1]`.

The "correct" internal state index `s*` has a higher expected score (`μ_good` versus `μ_bad`).
This is a simplified stand-in for real side-information (timing, response structure, error patterns, etc.).

A classical attacker repeats queries and aggregates scores to identify the most likely state.

In [3]:
def sample_scores(N, s_star, mu_good, mu_bad, sigma, rng):
    """
    Generate one noisy score per candidate for a single query.
    """
    scores = rng.normal(loc=mu_bad, scale=sigma, size=N)
    scores[s_star] = rng.normal(loc=mu_good, scale=sigma)
    return scores
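
Why aggregation helps can be checked numerically. Averaging `t` independent scores with noise `σ` shrinks the standard deviation of the mean to `σ/√t`, so the fixed gap between `μ_good` and `μ_bad` stands out more with every query. The block below is a minimal sketch of that standard result; `demo_rng`, `t`, and `reps` are illustrative names and values, not part of the notebook.

```python
import numpy as np

# Illustrative values only (assumptions for this sketch)
demo_rng = np.random.default_rng(0)
sigma = 1.0
t = 25          # number of queries averaged per repetition
reps = 20_000   # independent repetitions used to estimate the spread

# Each row holds t noisy scores for one incorrect candidate (mean 0)
samples = demo_rng.normal(loc=0.0, scale=sigma, size=(reps, t))
mean_scores = samples.mean(axis=1)

empirical_std = mean_scores.std()
predicted_std = sigma / np.sqrt(t)

print(f"empirical std of the mean: {empirical_std:.3f}")
print(f"predicted sigma/sqrt(t):   {predicted_std:.3f}")
```

The two printed values should agree closely, which is the reason a running sum (or mean) separates the correct candidate from the rest as queries accumulate.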

## Classical Aggregation Strategy

The attacker repeats queries and aggregates evidence using a simple running sum:

- Initialize cumulative scores to zero
- For each query:
  - sample a new score vector
  - add it to cumulative scores
  - pick the current best candidate (argmax)

We stop early once confidence is high enough.

Confidence metric:
- "margin" = best_score - second_best_score
- Convert margin to a probability-like value via a logistic transform

This is not a claim about real-world probabilities: the logistic of the margin is an uncalibrated heuristic.
It serves as a stable, printable way to show how classical methods require repeated interaction to reach high confidence.

In [4]:
def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def confidence_from_margin(margin):
    """
    Convert a margin to a smooth confidence value in (0,1).
    """
    return logistic(margin)
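
Because the confidence is a logistic of the margin, any target confidence corresponds to a fixed margin threshold given by the inverse transform (the logit). The sketch below makes that mapping explicit; `margin_for_confidence` is a hypothetical helper introduced only for illustration.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def margin_for_confidence(p):
    """Logit: the margin m at which logistic(m) equals p, for 0 < p < 1."""
    return np.log(p / (1.0 - p))

for p in (0.80, 0.95, 0.99):
    m = margin_for_confidence(p)
    print(f"target confidence {p:.2f} -> required margin {m:.3f}")
```

For a target confidence of 0.95, the stopping rule amounts to waiting until the leading candidate is ahead of the runner-up by ln 19, roughly 2.94 score units.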

## One Run Simulation

We simulate a single attacker attempt.

Stop condition:
- confidence ≥ `target_confidence`, OR
- reached `max_queries`

We record:
- queries used
- whether the attacker guessed the correct state
- exposure proxy (equal to queries used, so it is reported as `queries_used`)

In [5]:
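def classical_inference_run(N, mu_good, mu_bad, sigma, rng, max_queries=200, target_confidence=0.95):
    """
    Simulate one attacker attempt: accumulate noisy scores until the
    margin-based confidence reaches target_confidence or max_queries is hit.
    """
    # Hidden true state
    s_star = int(rng.integers(0, N))

    cumulative = np.zeros(N)
    queries_used = 0
    conf = 0.0
    # Defaults keep the return value defined if max_queries < 1
    best_idx = -1
    margin = 0.0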
    
    for t in range(1, max_queries + 1):
        scores = sample_scores(N, s_star, mu_good, mu_bad, sigma, rng)
        cumulative += scores
        
        # best and second best
        best_idx = int(np.argmax(cumulative))
        sorted_scores = np.sort(cumulative)
        margin = sorted_scores[-1] - sorted_scores[-2]
        conf = confidence_from_margin(margin)
        
        queries_used = t
        
        if conf >= target_confidence:
            break
    
    success = (best_idx == s_star)
    return {
        "success": success,
        "queries_used": queries_used,
        "confidence": conf,
        "s_star": s_star,
        "best_idx": best_idx,
        "margin": float(margin),
    }
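
The run above sorts the full cumulative vector on every query just to read off the top two values. For larger `N`, a partial selection is enough. The following is a sketch of one alternative using `np.partition`; `top_two_margin` is a hypothetical helper, not a function defined in this notebook, and it assumes the input has at least two entries.

```python
import numpy as np

def top_two_margin(values):
    """Return (best_idx, margin) without fully sorting `values`."""
    best_idx = int(np.argmax(values))
    # np.partition with kth=-2 leaves the two largest entries in the
    # last two slots, ordered as (second largest, largest)
    top_two = np.partition(values, -2)[-2:]
    margin = float(top_two[1] - top_two[0])
    return best_idx, margin

demo = np.array([0.3, 2.5, -1.0, 1.75, 0.9])
best_idx, margin = top_two_margin(demo)
print(best_idx, margin)
```

The result matches the sort-based margin; the difference is only in how much work is done per query.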

## Batch Experiment (Classical Baseline)

We run many trials to estimate:

- success rate at the chosen target confidence
- typical query counts
- fraction of runs exceeding a query threshold (proxy for anomaly risk)

This is the classical attacker baseline: repeated interaction is required to collapse uncertainty.

In [6]:
target_confidence = 0.95

results = []
for _ in range(trials):
    r = classical_inference_run(
        N=N,
        mu_good=mu_good,
        mu_bad=mu_bad,
        sigma=sigma,
        rng=rng,
        max_queries=max_queries,
        target_confidence=target_confidence
    )
    results.append(r)

queries = np.array([r["queries_used"] for r in results])
successes = np.array([r["success"] for r in results], dtype=int)
confidences = np.array([r["confidence"] for r in results])

success_rate = successes.mean()
median_queries = int(np.median(queries))
p90_queries = int(np.percentile(queries, 90))
anomaly_rate = (queries > anomaly_query_threshold).mean()

print("=== Classical Enumeration Baseline ===")
print(f"Candidate states (N):                 {N}")
print(f"Signal separation (μ_good-μ_bad):     {mu_good - mu_bad:.2f}")
print(f"Noise level (σ):                      {sigma:.2f}")
print(f"Target confidence:                    {target_confidence:.2f}")
print(f"Trials:                               {trials}")
print()
print(f"Success rate:                         {success_rate:.3f}")
print(f"Median queries to reach confidence:   {median_queries}")
print(f"90th percentile queries:              {p90_queries}")
print(f"Anomaly threshold (queries):          {anomaly_query_threshold}")
print(f"Fraction exceeding anomaly threshold: {anomaly_rate:.3f}")

=== Classical Enumeration Baseline ===
Candidate states (N):                 128
Signal separation (μ_good-μ_bad):     1.00
Noise level (σ):                      1.00
Target confidence:                    0.95
Trials:                               200

Success rate:                         0.895
Median queries to reach confidence:   11
90th percentile queries:              18
Anomaly threshold (queries):          80
Fraction exceeding anomaly threshold: 0.000
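
With 200 trials, the printed success rate carries sampling uncertainty. A rough way to gauge it is the normal-approximation (Wald) interval for a binomial proportion, sketched below. The inputs are copied from the output above; the choice of a Wald interval and of `z = 1.96` is an assumption of this sketch, not something the notebook computes.

```python
import math

# Values copied from the printed output above
success_rate = 0.895
trials = 200

# Standard error of a binomial proportion
std_err = math.sqrt(success_rate * (1.0 - success_rate) / trials)

# Normal-approximation (Wald) 95% interval; z = 1.96 is the usual
# two-sided quantile (an assumption of this sketch)
z = 1.96
low = success_rate - z * std_err
high = success_rate + z * std_err

print(f"standard error: {std_err:.4f}")
print(f"approx. 95% interval: ({low:.3f}, {high:.3f})")
```

Under this approximation the interval spans about four percentage points on each side of 0.895, which suggests the gap to the 0.95 target confidence is not only sampling noise. That is consistent with the confidence value being a heuristic rather than a calibrated probability.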


## Interpretation

This baseline demonstrates the **classical tradeoff**:

- Higher confidence requires more queries.
- More queries increase exposure and the likelihood of observable anomalies.
- Even if each query is valid and well-formed, repeated probing creates detectable patterns.
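
The first point can be illustrated by sweeping the target confidence. The block below is a compact, self-contained re-implementation of the run loop under the same signal model (`N = 128`, unit separation, unit noise). The names `run_once` and `sweep_rng`, the seed, and the list of targets are assumptions of this sketch, and because it draws random numbers in a different order than the notebook cells, its figures will not match the output above exactly.

```python
import numpy as np

def run_once(rng, target_confidence, N=128, mu_good=1.0, mu_bad=0.0,
             sigma=1.0, max_queries=200):
    """One inference run; returns (queries_used, success)."""
    s_star = int(rng.integers(0, N))
    means = np.full(N, mu_bad)
    means[s_star] = mu_good
    cumulative = np.zeros(N)
    for t in range(1, max_queries + 1):
        # One noisy score per candidate, added to the running sum
        cumulative += rng.normal(loc=means, scale=sigma)
        top_two = np.partition(cumulative, -2)[-2:]
        margin = top_two[1] - top_two[0]
        # Same logistic confidence rule as the notebook
        if 1.0 / (1.0 + np.exp(-margin)) >= target_confidence:
            break
    return t, int(np.argmax(cumulative)) == s_star

sweep_rng = np.random.default_rng(2024)  # hypothetical seed for this sketch
targets = [0.80, 0.90, 0.95, 0.99]
median_queries = []
success_rates = []
for target in targets:
    runs = [run_once(sweep_rng, target) for _ in range(200)]
    median_queries.append(float(np.median([q for q, _ in runs])))
    success_rates.append(float(np.mean([ok for _, ok in runs])))
    print(f"target {target:.2f}: median queries {median_queries[-1]:.1f}, "
          f"success rate {success_rates[-1]:.3f}")
```

Stricter targets demand a larger margin before stopping, so the median query count should rise with the target, along with the exposure proxy.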

This is the baseline we will compare against in Code Example 2, where we model
certainty compression (fewer interactions for equivalent confidence).
