fedthreat: federated learning utilities for distributed threat detection. Lightweight abstractions for building FL-based security systems with differential privacy.
Organizations face a fundamental tension in cybersecurity: effective threat detection requires broad visibility across network traffic, but sharing raw security data between organizations violates privacy regulations, exposes proprietary infrastructure details, and creates new attack surfaces.
Federated learning resolves this by training a shared threat detection model across multiple organizations without ever centralizing the data. Each organization trains locally on its own data and shares only model updates — not raw network logs, alerts, or telemetry.
fedthreat provides the building blocks:
- Aggregation strategies (FedAvg, FedMedian, Trimmed Mean) to combine updates from multiple orgs
- Differential privacy (gradient clipping + calibrated Gaussian noise) to protect individual data points
- Data partitioning to simulate realistic non-IID distributions across organizations
- A complete coordinator that manages the FL training loop
- Evaluation metrics tailored to binary threat/benign classification
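To make the aggregation strategies concrete, here is a minimal standalone numpy sketch of the three rules. The function names `fedavg`, `fedmedian`, and `trimmed_mean` are illustrative, not fedthreat's API:

```python
import numpy as np

def fedavg(updates, weights):
    """Weighted mean of client updates (weights ~ local dataset sizes)."""
    w = np.asarray(weights, dtype=float)
    return np.average(np.stack(updates), axis=0, weights=w / w.sum())

def fedmedian(updates):
    """Coordinate-wise median: robust to a minority of Byzantine clients."""
    return np.median(np.stack(updates), axis=0)

def trimmed_mean(updates, trim=1):
    """Drop the `trim` largest and smallest values per coordinate, then average."""
    stacked = np.sort(np.stack(updates), axis=0)
    return stacked[trim:len(updates) - trim].mean(axis=0)

# Three honest clients plus one client sending a poisoned update
updates = [np.array([1.0, 2.0]), np.array([1.2, 1.8]),
           np.array([0.8, 2.2]), np.array([100.0, -100.0])]
print(fedavg(updates, [1, 1, 1, 1]))   # pulled far toward the outlier
print(fedmedian(updates))              # robust: stays near the honest updates
print(trimmed_mean(updates, trim=1))   # also robust
```

The robust rules trade a little statistical efficiency for resistance to poisoned updates, which is why they are worth having in a cross-organization setting.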
Install from PyPI:

```bash
pip install fedthreat
```

Or from source:

```bash
git clone https://github.com/cwccie/fedthreat.git
cd fedthreat
pip install -e ".[dev]"
```

Run a federated learning simulation with synthetic threat data:
```bash
# 5 organizations, 10 FL rounds, IID data
fedthreat simulate --clients 5 --rounds 10

# With differential privacy (epsilon=1.0)
fedthreat simulate --clients 5 --rounds 10 --dp-epsilon 1.0

# Byzantine-robust aggregation
fedthreat simulate --clients 5 --rounds 10 --aggregation fedmedian
```

Or drive the API directly from Python:

```python
import numpy as np

from fedthreat.client import SimpleThreatClient
from fedthreat.coordinator import FederatedCoordinator
from fedthreat.data import DataPartitioner
from fedthreat.models import ModelWeights, TrainingConfig
from fedthreat.metrics import evaluate_model

# Generate or load your threat detection dataset
X = np.random.randn(1000, 10)                    # 1000 samples, 10 features
y = (X @ np.random.randn(10) > 0).astype(float)  # Binary labels

# Partition across organizations (non-IID to simulate the real world)
partitioner = DataPartitioner(X, y, seed=42)
partitions = partitioner.non_iid_partition(n_clients=5, alpha=0.5)

# Initialize the coordinator with zeroed linear-model weights
initial_weights = ModelWeights(arrays={
    "W": np.zeros(10),
    "b": np.zeros(1),
})
coordinator = FederatedCoordinator(initial_weights, aggregation="fedavg")

# Create clients and register their local data
client_data = {}
for i, (X_part, y_part) in enumerate(partitions):
    client = SimpleThreatClient(f"org_{i}", n_features=10)
    coordinator.add_client(client)
    client_data[f"org_{i}"] = (X_part, y_part)

# Run federated training
config = TrainingConfig(epochs=3, lr=0.01, dp_epsilon=1.0)
history = coordinator.run_training(client_data, config, num_rounds=10)

# Evaluate
print(f"Final avg loss: {history[-1].metrics['avg_loss']:.4f}")
```

fedthreat implements the Gaussian mechanism for differential privacy:
- Gradient clipping: Each client's update is clipped to an L2 norm bound
- Noise injection: Calibrated Gaussian noise is added to the clipped update
- Budget tracking: A `PrivacyAccountant` tracks cumulative epsilon across rounds
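A minimal sketch of the clip-then-noise step in plain numpy. The function `privatize_update` and its parameters (`clip_norm`, `noise_multiplier`) are illustrative names, not fedthreat's API, and a real deployment calibrates the noise multiplier to a target epsilon:

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip an update to an L2 bound, then add calibrated Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(update)
    # Scale down only if the update exceeds the clip bound
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    # Noise stddev scales with the sensitivity, i.e. the clip bound
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

rng = np.random.default_rng(0)
raw = np.array([3.0, 4.0])  # L2 norm 5.0, above the bound
private = privatize_update(raw, clip_norm=1.0, rng=rng)
print(private)  # direction preserved, norm clipped, then noised
```

Clipping bounds each client's influence (the sensitivity), which is exactly what lets the Gaussian noise scale give a meaningful epsilon guarantee.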
```python
from fedthreat.privacy import PrivacyAccountant
from fedthreat.metrics import privacy_report

# Set a privacy budget
accountant = PrivacyAccountant(budget_epsilon=10.0)
coordinator = FederatedCoordinator(
    initial_weights, privacy_accountant=accountant
)

# After training, check privacy spend
report = privacy_report(per_round_epsilon=1.0, num_rounds=10)
print(f"Total epsilon (simple composition): {report['total_epsilon_simple']}")
print(f"Total epsilon (advanced composition): {report['total_epsilon_advanced']}")
```

The pieces fit together like this:

```
┌─────────────────────────────────────────┐
│          FederatedCoordinator           │
│  - Manages FL rounds                    │
│  - Selects clients per round            │
│  - Aggregates updates (FedAvg/Median)   │
│  - Tracks privacy budget                │
└──────┬──────────┬──────────┬────────────┘
       │          │          │
  ┌────▼───┐ ┌────▼───┐ ┌────▼───┐
  │Client 0│ │Client 1│ │Client 2│   (Organizations)
  │ Org A  │ │ Org B  │ │ Org C  │
  │        │ │        │ │        │
  │ Train  │ │ Train  │ │ Train  │   Local training
  │ + DP   │ │ + DP   │ │ + DP   │   + noise
  └────┬───┘ └────┬───┘ └────┬───┘
       │          │          │
       └──────────┴──────────┘
        Model updates only
       (no raw data shared)
```
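The two composition totals reported by `privacy_report` can be checked by hand with the standard formulas. A sketch assuming delta = 1e-5; fedthreat's exact constants may differ:

```python
import math

def compose_simple(eps, k):
    """Basic composition: per-round budgets add linearly."""
    return k * eps

def compose_advanced(eps, k, delta=1e-5):
    """Advanced composition (Dwork & Roth): grows ~sqrt(k) for small eps."""
    return eps * math.sqrt(2 * k * math.log(1 / delta)) + k * eps * (math.exp(eps) - 1)

eps, rounds = 0.1, 100
print(compose_simple(eps, rounds))    # 10.0: linear in the number of rounds
print(compose_advanced(eps, rounds))  # smaller total for small per-round eps
```

Note that advanced composition pays off for small per-round epsilons; at epsilon = 1.0 per round, simple composition can actually give the tighter bound.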
Simulate realistic federated settings where organizations have different data distributions:
```bash
# Partition a CSV dataset
fedthreat partition data.csv --clients 5 --method non_iid --alpha 0.1
```

| Method | Description |
|---|---|
| `iid` | Uniform random split (baseline) |
| `non_iid` | Dirichlet-based label skew (realistic) |
| `quantity_skew` | Varying dataset sizes per client |
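The Dirichlet label-skew idea behind `non_iid` can be sketched as follows. This is an illustration of the technique, not fedthreat's internal implementation; smaller `alpha` means heavier skew:

```python
import numpy as np

def dirichlet_partition(y, n_clients=5, alpha=0.5, seed=42):
    """Assign sample indices to clients with Dirichlet label skew."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        rng.shuffle(idx)
        # Per-label share of samples each client receives
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        splits = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, splits)):
            client_indices[client].extend(part.tolist())
    return client_indices

y = np.array([0] * 60 + [1] * 40)  # 60 benign, 40 threat samples
parts = dirichlet_partition(y, n_clients=5, alpha=0.1)
sizes = [len(p) for p in parts]
print(sizes)  # uneven sizes and skewed label mixes per client
```

With `alpha=0.1` most clients end up dominated by one label, which mimics organizations that see very different threat profiles.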
```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest --cov=fedthreat

# Lint
ruff check src/ tests/
```

MIT License. Copyright (c) 2026 Corey Wade.