fedthreat

Federated learning utilities for distributed threat detection — lightweight abstractions for building FL-based security systems with differential privacy.

The Problem

Organizations face a fundamental tension in cybersecurity: effective threat detection requires broad visibility across network traffic, but sharing raw security data between organizations violates privacy regulations, exposes proprietary infrastructure details, and creates new attack surfaces.

Federated learning resolves this by training a shared threat detection model across multiple organizations without ever centralizing the data. Each organization trains locally on its own data and shares only model updates — not raw network logs, alerts, or telemetry.

fedthreat provides the building blocks:

  • Aggregation strategies (FedAvg, FedMedian, Trimmed Mean) to combine updates from multiple orgs
  • Differential privacy (gradient clipping + calibrated Gaussian noise) to protect individual data points
  • Data partitioning to simulate realistic non-IID distributions across organizations
  • A complete coordinator that manages the FL training loop
  • Evaluation metrics tailored to binary threat/benign classification
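To make the aggregation strategies concrete, here is an illustrative NumPy sketch of the three rules (the function names here are mine, not the library's API): FedAvg is a weighted mean, while FedMedian and Trimmed Mean are Byzantine-robust because a single outlier client cannot drag the result arbitrarily far.

```python
import numpy as np

def fedavg(updates, weights=None):
    """Weighted mean of client updates (FedAvg)."""
    return np.average(np.stack(updates), axis=0, weights=weights)

def fedmedian(updates):
    """Coordinate-wise median -- robust to a minority of Byzantine clients."""
    return np.median(np.stack(updates), axis=0)

def trimmed_mean(updates, trim=1):
    """Drop the `trim` largest and smallest values per coordinate, then average."""
    stacked = np.sort(np.stack(updates), axis=0)
    return stacked[trim:len(updates) - trim].mean(axis=0)

# Two honest clients plus one outlier (e.g. a poisoned update)
updates = [np.array([1.0, 2.0]), np.array([1.2, 1.8]), np.array([100.0, -50.0])]
print(fedavg(updates))        # pulled far off target by the outlier
print(fedmedian(updates))     # [1.2, 1.8]
print(trimmed_mean(updates))  # [1.2, 1.8]
```

The robust rules trade a little statistical efficiency on clean data for resistance to poisoned updates, which is why they are worth having in a cross-organization setting.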

Installation

pip install fedthreat

Or from source:

git clone https://github.com/cwccie/fedthreat.git
cd fedthreat
pip install -e ".[dev]"

Quick Start

CLI Simulation

Run a federated learning simulation with synthetic threat data:

# 5 organizations, 10 FL rounds, IID data
fedthreat simulate --clients 5 --rounds 10

# With differential privacy (epsilon=1.0)
fedthreat simulate --clients 5 --rounds 10 --dp-epsilon 1.0

# Byzantine-robust aggregation
fedthreat simulate --clients 5 --rounds 10 --aggregation fedmedian

Python API

import numpy as np
from fedthreat.client import SimpleThreatClient
from fedthreat.coordinator import FederatedCoordinator
from fedthreat.data import DataPartitioner
from fedthreat.models import ModelWeights, TrainingConfig

# Generate or load your threat detection dataset
X = np.random.randn(1000, 10)  # 1000 samples, 10 features
y = (X @ np.random.randn(10) > 0).astype(float)  # Binary labels

# Partition across organizations (non-IID to simulate real-world)
partitioner = DataPartitioner(X, y, seed=42)
partitions = partitioner.non_iid_partition(n_clients=5, alpha=0.5)

# Initialize coordinator
initial_weights = ModelWeights(arrays={
    "W": np.zeros(10),
    "b": np.zeros(1),
})
coordinator = FederatedCoordinator(initial_weights, aggregation="fedavg")

# Create clients and register data
client_data = {}
for i, (X_part, y_part) in enumerate(partitions):
    client = SimpleThreatClient(f"org_{i}", n_features=10)
    coordinator.add_client(client)
    client_data[f"org_{i}"] = (X_part, y_part)

# Run federated training
config = TrainingConfig(epochs=3, lr=0.01, dp_epsilon=1.0)
history = coordinator.run_training(client_data, config, num_rounds=10)

# Evaluate
print(f"Final avg loss: {history[-1].metrics['avg_loss']:.4f}")
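For the evaluation step, the README lists metrics tailored to binary threat/benign classification. The exact signature of the library's metrics helpers isn't shown here, so this is a plain-NumPy sketch of the usual quantities, with "threat" as the positive class:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Precision/recall/F1 with 'threat' as the positive class (label 1)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

y_true = np.array([1, 1, 0, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1])
print(binary_metrics(y_true, y_pred))
```

In threat detection, recall on the threat class usually matters most: a false negative is a missed attack, while a false positive is only analyst triage time.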

Differential Privacy

fedthreat implements the Gaussian mechanism for differential privacy:

  1. Gradient clipping: Each client's update is clipped to an L2 norm bound
  2. Noise injection: Calibrated Gaussian noise is added to the clipped update
  3. Budget tracking: A PrivacyAccountant tracks cumulative epsilon across rounds

Track the budget during training and review the total spend afterwards:

from fedthreat.privacy import PrivacyAccountant
from fedthreat.metrics import privacy_report

# Set a privacy budget
accountant = PrivacyAccountant(budget_epsilon=10.0)
coordinator = FederatedCoordinator(
    initial_weights, privacy_accountant=accountant
)

# After training, check privacy spend
report = privacy_report(per_round_epsilon=1.0, num_rounds=10)
print(f"Total epsilon (simple composition): {report['total_epsilon_simple']}")
print(f"Total epsilon (advanced composition): {report['total_epsilon_advanced']}")
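To show what steps 1 and 2 do under the hood, here is a minimal NumPy sketch of the Gaussian mechanism. The noise calibration uses the standard (epsilon, delta)-DP formula for L2 sensitivity equal to the clip norm; the library's internals may differ, so treat this as an illustration rather than fedthreat's implementation:

```python
import numpy as np

def clip_and_noise(update, clip_norm=1.0, epsilon=1.0, delta=1e-5, rng=None):
    """Gaussian mechanism: clip an update to an L2 bound, then add noise."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))  # step 1: clip
    # Standard calibration for (epsilon, delta)-DP, L2 sensitivity = clip_norm
    sigma = clip_norm * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return clipped + rng.normal(0.0, sigma, size=update.shape)  # step 2: noise

update = np.array([3.0, 4.0])       # L2 norm 5.0, clipped to [0.6, 0.8]
noisy = clip_and_noise(update, clip_norm=1.0, epsilon=1.0)

# Step 3: under simple composition, per-round epsilons add up
total_epsilon = 10 * 1.0            # 10 rounds at epsilon=1.0 each
```

Simple composition is the pessimistic bound the accountant can always fall back on; advanced composition gives a tighter total epsilon at the cost of a small extra delta.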

Architecture

┌─────────────────────────────────────────┐
│           FederatedCoordinator          │
│  - Manages FL rounds                    │
│  - Selects clients per round            │
│  - Aggregates updates (FedAvg/Median)   │
│  - Tracks privacy budget                │
└──────┬──────────┬──────────┬────────────┘
       │          │          │
  ┌────▼───┐ ┌───▼────┐ ┌───▼────┐
  │Client 0│ │Client 1│ │Client 2│   (Organizations)
  │  Org A │ │  Org B │ │  Org C │
  │        │ │        │ │        │
  │ Train  │ │ Train  │ │ Train  │   Local training
  │ + DP   │ │ + DP   │ │ + DP   │   + noise
  └────┬───┘ └───┬────┘ └───┬────┘
       │         │          │
       └─────────┴──────────┘
              Model updates only
              (no raw data shared)

Data Partitioning

Simulate realistic federated settings where organizations have different data distributions:

# Partition a CSV dataset
fedthreat partition data.csv --clients 5 --method non_iid --alpha 0.1

| Method          | Description                            |
| --------------- | -------------------------------------- |
| `iid`           | Uniform random split (baseline)        |
| `non_iid`       | Dirichlet-based label skew (realistic) |
| `quantity_skew` | Varying dataset sizes per client       |
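The `non_iid` method is based on Dirichlet label skew, which can be sketched in plain NumPy as follows (an illustration of the technique; `DataPartitioner`'s internals may differ). Each class is split across clients in proportions drawn from a Dirichlet(alpha) distribution, so a small alpha concentrates each class on a few clients:

```python
import numpy as np

def dirichlet_partition(y, n_clients=5, alpha=0.5, seed=42):
    """Assign sample indices to clients with Dirichlet(alpha) label skew.

    Small alpha -> each class concentrates on few clients (heavy skew);
    large alpha -> proportions approach uniform (close to IID).
    """
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(n_clients)]
    for label in np.unique(y):
        idx = rng.permutation(np.where(y == label)[0])
        # Fraction of this class that each client receives
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(client_idx, np.split(idx, cuts)):
            client.extend(part.tolist())
    return [np.array(ix) for ix in client_idx]

y = np.array([0] * 50 + [1] * 50)
parts = dirichlet_partition(y, n_clients=5, alpha=0.1)
print([len(p) for p in parts])   # uneven sizes reflect the label skew
```

Every sample is assigned to exactly one client, so the partition sizes always sum to the dataset size regardless of alpha.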

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest --cov=fedthreat

# Lint
ruff check src/ tests/

License

MIT License. Copyright (c) 2026 Corey Wade.
