fedthreat

Federated learning utilities for distributed threat detection — lightweight abstractions for building FL-based security systems with differential privacy.

The Problem

Organizations face a fundamental tension in cybersecurity: effective threat detection requires broad visibility across network traffic, but sharing raw security data between organizations violates privacy regulations, exposes proprietary infrastructure details, and creates new attack surfaces.

Federated learning resolves this by training a shared threat detection model across multiple organizations without ever centralizing the data. Each organization trains locally on its own data and shares only model updates — not raw network logs, alerts, or telemetry.

fedthreat provides the building blocks:

  • Aggregation strategies (FedAvg, FedMedian, Trimmed Mean) to combine updates from multiple orgs
  • Differential privacy (gradient clipping + calibrated Gaussian noise) to protect individual data points
  • Data partitioning to simulate realistic non-IID distributions across organizations
  • A complete coordinator that manages the FL training loop
  • Evaluation metrics tailored to binary threat/benign classification
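To make the aggregation strategies concrete, here is an illustrative NumPy sketch of the three rules (the function names here are mine, not the library's API): FedAvg is a weighted mean, while FedMedian and Trimmed Mean are Byzantine-robust because a single outlier client cannot drag the result arbitrarily far.

```python
import numpy as np

def fedavg(updates, weights=None):
    """Weighted mean of client updates (FedAvg)."""
    return np.average(np.stack(updates), axis=0, weights=weights)

def fedmedian(updates):
    """Coordinate-wise median -- robust to a minority of Byzantine clients."""
    return np.median(np.stack(updates), axis=0)

def trimmed_mean(updates, trim=1):
    """Drop the `trim` largest and smallest values per coordinate, then average."""
    stacked = np.sort(np.stack(updates), axis=0)
    return stacked[trim:len(updates) - trim].mean(axis=0)

# Two honest clients plus one outlier (e.g. a poisoned update)
updates = [np.array([1.0, 2.0]), np.array([1.2, 1.8]), np.array([100.0, -50.0])]
print(fedavg(updates))        # pulled far off target by the outlier
print(fedmedian(updates))     # [1.2, 1.8]
print(trimmed_mean(updates))  # [1.2, 1.8]
```

The robust rules trade a little statistical efficiency on clean data for resistance to poisoned updates, which is why they are worth having in a cross-organization setting.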

Installation

pip install fedthreat

Or from source:

git clone https://github.com/cwccie/fedthreat.git
cd fedthreat
pip install -e ".[dev]"

Quick Start

CLI Simulation

Run a federated learning simulation with synthetic threat data:

# 5 organizations, 10 FL rounds, IID data
fedthreat simulate --clients 5 --rounds 10

# With differential privacy (epsilon=1.0)
fedthreat simulate --clients 5 --rounds 10 --dp-epsilon 1.0

# Byzantine-robust aggregation
fedthreat simulate --clients 5 --rounds 10 --aggregation fedmedian

Python API

import numpy as np
from fedthreat.client import SimpleThreatClient
from fedthreat.coordinator import FederatedCoordinator
from fedthreat.data import DataPartitioner
from fedthreat.models import ModelWeights, TrainingConfig

# Generate or load your threat detection dataset
X = np.random.randn(1000, 10)  # 1000 samples, 10 features
y = (X @ np.random.randn(10) > 0).astype(float)  # Binary labels

# Partition across organizations (non-IID to simulate real-world)
partitioner = DataPartitioner(X, y, seed=42)
partitions = partitioner.non_iid_partition(n_clients=5, alpha=0.5)

# Initialize coordinator
initial_weights = ModelWeights(arrays={
    "W": np.zeros(10),
    "b": np.zeros(1),
})
coordinator = FederatedCoordinator(initial_weights, aggregation="fedavg")

# Create clients and register data
client_data = {}
for i, (X_part, y_part) in enumerate(partitions):
    client = SimpleThreatClient(f"org_{i}", n_features=10)
    coordinator.add_client(client)
    client_data[f"org_{i}"] = (X_part, y_part)

# Run federated training
config = TrainingConfig(epochs=3, lr=0.01, dp_epsilon=1.0)
history = coordinator.run_training(client_data, config, num_rounds=10)

# Evaluate
print(f"Final avg loss: {history[-1].metrics['avg_loss']:.4f}")
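For the evaluation step, the README lists metrics tailored to binary threat/benign classification. The exact signature of the library's metrics helpers isn't shown here, so this is a plain-NumPy sketch of the usual quantities, with "threat" as the positive class:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Precision/recall/F1 with 'threat' as the positive class (label 1)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

y_true = np.array([1, 1, 0, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1])
print(binary_metrics(y_true, y_pred))
```

In threat detection, recall on the threat class usually matters most: a false negative is a missed attack, while a false positive is only analyst triage time.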

Differential Privacy

fedthreat implements the Gaussian mechanism for differential privacy:

  1. Gradient clipping: Each client's update is clipped to an L2 norm bound
  2. Noise injection: Calibrated Gaussian noise is added to the clipped update
  3. Budget tracking: A PrivacyAccountant tracks cumulative epsilon across rounds

Track the budget during training and review the total spend afterwards:

from fedthreat.privacy import PrivacyAccountant
from fedthreat.metrics import privacy_report

# Set a privacy budget
accountant = PrivacyAccountant(budget_epsilon=10.0)
coordinator = FederatedCoordinator(
    initial_weights, privacy_accountant=accountant
)

# After training, check privacy spend
report = privacy_report(per_round_epsilon=1.0, num_rounds=10)
print(f"Total epsilon (simple composition): {report['total_epsilon_simple']}")
print(f"Total epsilon (advanced composition): {report['total_epsilon_advanced']}")
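To show what steps 1 and 2 do under the hood, here is a minimal NumPy sketch of the Gaussian mechanism. The noise calibration uses the standard (epsilon, delta)-DP formula for L2 sensitivity equal to the clip norm; the library's internals may differ, so treat this as an illustration rather than fedthreat's implementation:

```python
import numpy as np

def clip_and_noise(update, clip_norm=1.0, epsilon=1.0, delta=1e-5, rng=None):
    """Gaussian mechanism: clip an update to an L2 bound, then add noise."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))  # step 1: clip
    # Standard calibration for (epsilon, delta)-DP, L2 sensitivity = clip_norm
    sigma = clip_norm * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return clipped + rng.normal(0.0, sigma, size=update.shape)  # step 2: noise

update = np.array([3.0, 4.0])       # L2 norm 5.0, clipped to [0.6, 0.8]
noisy = clip_and_noise(update, clip_norm=1.0, epsilon=1.0)

# Step 3: under simple composition, per-round epsilons add up
total_epsilon = 10 * 1.0            # 10 rounds at epsilon=1.0 each
```

Simple composition is the pessimistic bound the accountant can always fall back on; advanced composition gives a tighter total epsilon at the cost of a small extra delta.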

Architecture

┌─────────────────────────────────────────┐
│           FederatedCoordinator          │
│  - Manages FL rounds                    │
│  - Selects clients per round            │
│  - Aggregates updates (FedAvg/Median)   │
│  - Tracks privacy budget                │
└──────┬──────────┬──────────┬────────────┘
       │          │          │
  ┌────▼───┐ ┌───▼────┐ ┌───▼────┐
  │Client 0│ │Client 1│ │Client 2│   (Organizations)
  │  Org A │ │  Org B │ │  Org C │
  │        │ │        │ │        │
  │ Train  │ │ Train  │ │ Train  │   Local training
  │ + DP   │ │ + DP   │ │ + DP   │   + noise
  └────┬───┘ └───┬────┘ └───┬────┘
       │         │          │
       └─────────┴──────────┘
              Model updates only
              (no raw data shared)

Data Partitioning

Simulate realistic federated settings where organizations have different data distributions:

# Partition a CSV dataset
fedthreat partition data.csv --clients 5 --method non_iid --alpha 0.1

| Method          | Description                            |
| --------------- | -------------------------------------- |
| `iid`           | Uniform random split (baseline)        |
| `non_iid`       | Dirichlet-based label skew (realistic) |
| `quantity_skew` | Varying dataset sizes per client       |
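The `non_iid` method is based on Dirichlet label skew, which can be sketched in plain NumPy as follows (an illustration of the technique; `DataPartitioner`'s internals may differ). Each class is split across clients in proportions drawn from a Dirichlet(alpha) distribution, so a small alpha concentrates each class on a few clients:

```python
import numpy as np

def dirichlet_partition(y, n_clients=5, alpha=0.5, seed=42):
    """Assign sample indices to clients with Dirichlet(alpha) label skew.

    Small alpha -> each class concentrates on few clients (heavy skew);
    large alpha -> proportions approach uniform (close to IID).
    """
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(n_clients)]
    for label in np.unique(y):
        idx = rng.permutation(np.where(y == label)[0])
        # Fraction of this class that each client receives
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(client_idx, np.split(idx, cuts)):
            client.extend(part.tolist())
    return [np.array(ix) for ix in client_idx]

y = np.array([0] * 50 + [1] * 50)
parts = dirichlet_partition(y, n_clients=5, alpha=0.1)
print([len(p) for p in parts])   # uneven sizes reflect the label skew
```

Every sample is assigned to exactly one client, so the partition sizes always sum to the dataset size regardless of alpha.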

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest --cov=fedthreat

# Lint
ruff check src/ tests/

License

MIT License. Copyright (c) 2026 Corey Wade.
