# Day 8: Federated Averaging (FedAvg) from Scratch

**Implementing the Core Federated Learning Algorithm**

## Overview
- **Objective**: Implement the FedAvg algorithm from scratch
- **Reference**: McMahan et al., 2017, "Communication-Efficient Learning of Deep Networks from Decentralized Data"
- **Goal**: Understand how FL works in practice

## What You'll Learn
1. **FedAvg Algorithm**: Weighted averaging of client updates
2. **Client-Server Architecture**: FL communication pattern
3. **Local Training**: Each client trains on its own data
4. **Aggregation**: Server combines client updates

---

## 1. Setup

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from typing import List, Tuple, Dict

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ Libraries imported!")

## 2. Generate Client Data

Simulate multiple banks, each holding its own transaction data.

In [None]:
# Generate data for 10 banks (clients)
np.random.seed(42)

n_clients = 10
n_samples_per_client = 1000
n_features = 20

clients_data = []

for client_id in range(n_clients):
    # Fraud rate varies by bank (about 5% to 14%), so clients are label-skewed (non-IID)
    fraud_ratio = 0.05 + client_id * 0.01
    
    # `weights` sets the class proportions while keeping labels tied to the features.
    # Overwriting y at random after generation would make fraud independent of X,
    # leaving the model nothing to learn.
    X, y = make_classification(
        n_samples=n_samples_per_client,
        n_features=n_features,
        n_informative=15,
        n_redundant=5,
        n_classes=2,
        weights=[1 - fraud_ratio, fraud_ratio],
        flip_y=0.01 + client_id * 0.002,  # Varying label noise
        random_state=42 + client_id
    )
    
    clients_data.append((X, y))
    
    print(f"Client {client_id}: {X.shape}, Fraud rate: {y.mean()*100:.1f}%")

print(f"\nTotal clients: {n_clients}")
print(f"Total samples: {n_clients * n_samples_per_client}")

## 3. Define Simple Model

Logistic regression with gradient descent

In [None]:
class LogisticModel:
    """Simple logistic regression model for FL."""
    
    def __init__(self, n_features: int, learning_rate: float = 0.01):
        self.n_features = n_features
        self.learning_rate = learning_rate
        # Initialize weights
        self.weights = np.random.randn(n_features) * 0.01
        self.bias = 0.0
    
    def sigmoid(self, z):
        return 1 / (1 + np.exp(-np.clip(z, -500, 500)))
    
    def forward(self, X):
        return self.sigmoid(X @ self.weights + self.bias)
    
    def predict(self, X):
        return (self.forward(X) >= 0.5).astype(int)
    
    def compute_gradients(self, X, y):
        """Compute gradients for binary cross-entropy."""
        m = len(y)
        predictions = self.forward(X)
        
        # Gradient of BCE loss
        dz = predictions - y
        dw = (X.T @ dz) / m
        db = np.mean(dz)
        
        return dw, db
    
    def train(self, X, y, epochs: int):
        """Train locally for specified epochs."""
        for epoch in range(epochs):
            dw, db = self.compute_gradients(X, y)
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db
    
    def get_weights(self):
        """Return model weights as a dictionary."""
        return {'weights': self.weights.copy(), 'bias': self.bias}
    
    def set_weights(self, weights_dict):
        """Set model weights from dictionary."""
        self.weights = weights_dict['weights'].copy()
        self.bias = weights_dict['bias']
    
    def compute_update(self, X, y, epochs: int):
        """Train locally and return weight update (not absolute weights)."""
        old_weights = self.get_weights()
        self.train(X, y, epochs)
        new_weights = self.get_weights()
        
        # Return update (delta)
        update = {
            'weights': new_weights['weights'] - old_weights['weights'],
            'bias': new_weights['bias'] - old_weights['bias']
        }
        return update
    
    def apply_update(self, update):
        """Apply weight update to model."""
        self.weights += update['weights']
        self.bias += update['bias']

# Test model
model = LogisticModel(n_features)
print(f"Model initialized with {n_features} features")
print(f"Weights shape: {model.weights.shape}")
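
The gradient used in `compute_gradients` (`X.T @ (predictions - y) / m`) can be sanity-checked against a finite-difference estimate of the binary cross-entropy loss. The sketch below is a standalone check, not part of the notebook's classes: the helper names (`bce_loss`, `analytic_grads`, `numeric_grads`) and the small random dataset are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def bce_loss(w, b, X, y):
    """Mean binary cross-entropy; eps guards against log(0)."""
    p = sigmoid(X @ w + b)
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def analytic_grads(w, b, X, y):
    """Same formulas as LogisticModel.compute_gradients."""
    dz = sigmoid(X @ w + b) - y
    return X.T @ dz / len(y), dz.mean()

def numeric_grads(w, b, X, y, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    dw = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = h
        dw[i] = (bce_loss(w + step, b, X, y) - bce_loss(w - step, b, X, y)) / (2 * h)
    db = (bce_loss(w, b + h, X, y) - bce_loss(w, b - h, X, y)) / (2 * h)
    return dw, db

# Small hypothetical dataset, used only for the check
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = rng.integers(0, 2, size=50).astype(float)
w = rng.normal(size=4) * 0.1
b = 0.05

dw_a, db_a = analytic_grads(w, b, X, y)
dw_n, db_n = numeric_grads(w, b, X, y)
max_err = max(np.max(np.abs(dw_a - dw_n)), abs(db_a - db_n))
print(f"Max abs difference between analytic and numeric gradients: {max_err:.2e}")
```

A very small difference suggests the analytic gradient matches the loss it is meant to minimize.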

## 4. FedAvg Server Implementation

In [None]:
class FedAvgServer:
    """
    Federated Averaging Server.
    
    Algorithm:
    1. Select random subset of clients
    2. Send global model to selected clients
    3. Clients train locally and send back updates
    4. Aggregate updates using weighted averaging
    """
    
    def __init__(self, model, client_fraction: float = 0.5):
        self.model = model
        self.client_fraction = client_fraction
        self.round = 0
        self.history = {'train_loss': [], 'train_acc': [], 'test_acc': []}
    
    def select_clients(self, all_client_ids: List[int]) -> List[int]:
        """Randomly select clients for this round."""
        n_selected = max(1, int(len(all_client_ids) * self.client_fraction))
        return np.random.choice(all_client_ids, n_selected, replace=False).tolist()
    
    def aggregate_updates(self, updates: List[Dict], sample_counts: List[int]) -> Dict:
        """
        Aggregate client updates using weighted averaging.
        
        delta_global = sum(n_k * delta_k) / sum(n_k)
        
        where:
        - n_k = number of samples on client k
        - delta_k = weight update (w_new - w_old) from client k
        """
        total_samples = sum(sample_counts)
        
        # Initialize aggregated update
        agg_update = {
            'weights': np.zeros_like(self.model.weights),
            'bias': 0.0
        }
        
        # Weighted average
        for update, n_k in zip(updates, sample_counts):
            weight = n_k / total_samples
            agg_update['weights'] += weight * update['weights']
            agg_update['bias'] += weight * update['bias']
        
        return agg_update
    
    def federated_round(
        self, 
        clients_data: List[Tuple],
        client_ids: List[int],
        local_epochs: int = 5
    ) -> Dict:
        """Execute one round of federated learning."""
        # Select clients
        selected_ids = self.select_clients(client_ids)
        
        # Collect updates from selected clients
        updates = []
        sample_counts = []
        
        for client_id in selected_ids:
            X_client, y_client = clients_data[client_id]
            
            # Create client model with the current global weights and the same learning rate
            client_model = LogisticModel(self.model.n_features, self.model.learning_rate)
            client_model.set_weights(self.model.get_weights())
            
            # Train locally and get update
            update = client_model.compute_update(X_client, y_client, epochs=local_epochs)
            updates.append(update)
            sample_counts.append(len(X_client))
        
        # Aggregate updates
        agg_update = self.aggregate_updates(updates, sample_counts)
        
        # Apply aggregated update to global model
        self.model.apply_update(agg_update)
        
        self.round += 1
        
        return {
            'round': self.round,
            'n_clients': len(selected_ids),
            'client_ids': selected_ids
        }
    
    def evaluate(self, clients_data: List[Tuple]) -> float:
        """Evaluate the global model on the pooled data of all clients and return its accuracy."""
        all_predictions = []
        all_labels = []
        
        for X_client, y_client in clients_data:
            preds = self.model.predict(X_client)
            all_predictions.extend(preds)
            all_labels.extend(y_client)
        
        accuracy = accuracy_score(all_labels, all_predictions)
        return accuracy

# Initialize server
global_model = LogisticModel(n_features, learning_rate=0.01)
server = FedAvgServer(global_model, client_fraction=0.5)

print("✅ FedAvg Server initialized")
print(f"   Client fraction per round: {server.client_fraction}")
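
To make the weighting in `aggregate_updates` concrete, here is a minimal worked example with three made-up client updates and sample counts. The function name `weighted_average` and all numbers are illustrative assumptions. The client with the most samples pulls the aggregate towards its own update, unlike a plain mean.

```python
import numpy as np

def weighted_average(updates, sample_counts):
    """Average updates with weights n_k / sum(n_k)."""
    total = sum(sample_counts)
    return sum((n / total) * u for u, n in zip(updates, sample_counts))

# Hypothetical two-dimensional updates from three clients
updates = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
sample_counts = [100, 300, 600]

agg = weighted_average(updates, sample_counts)
plain_mean = np.mean(updates, axis=0)

print("Weighted by sample count:", agg)         # approximately [0.7, 0.9]
print("Unweighted mean:         ", plain_mean)  # approximately [0.667, 0.667]
```

In the notebook's experiment every client holds the same number of samples, so the weighted and unweighted averages coincide there. The weighting only matters once client sizes differ.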

## 5. Run Federated Training

In [None]:
# Training configuration
n_rounds = 30
local_epochs = 5
client_ids = list(range(n_clients))

# Track metrics
rounds_history = []
accuracy_history = []
n_clients_history = []

print("="*60)
print("FEDERATED TRAINING")
print("="*60)

# Initial evaluation
initial_acc = server.evaluate(clients_data)
print(f"\nRound 0: Initial accuracy = {initial_acc:.3f}")

# Training rounds
for round_idx in range(n_rounds):  # avoid shadowing the built-in round()
    # Run federated round
    result = server.federated_round(
        clients_data, 
        client_ids, 
        local_epochs=local_epochs
    )
    
    # Evaluate
    acc = server.evaluate(clients_data)
    
    # Track metrics
    rounds_history.append(result['round'])
    accuracy_history.append(acc)
    n_clients_history.append(result['n_clients'])
    
    # Print progress every 5 rounds
    if (round_idx + 1) % 5 == 0:
        print(f"Round {result['round']}: Accuracy = {acc:.3f}, "
              f"Clients = {result['n_clients']}/{n_clients}")

print("\n" + "="*60)
print(f"Final accuracy: {accuracy_history[-1]:.3f}")
print(f"Improvement: {accuracy_history[-1] - initial_acc:+.3f}")

## 6. Visualize Training Progress

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Accuracy over rounds
axes[0].plot([0] + rounds_history, [initial_acc] + accuracy_history, 
             'o-', linewidth=2, markersize=5, color='steelblue')
axes[0].axhline(y=initial_acc, color='gray', linestyle='--', 
               label=f'Initial: {initial_acc:.3f}', alpha=0.7)
axes[0].set_xlabel('Federated Round', fontsize=12)
axes[0].set_ylabel('Accuracy', fontsize=12)
axes[0].set_title('Global Model Accuracy', fontsize=14)
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Number of clients per round
axes[1].bar(rounds_history, n_clients_history, color='coral', alpha=0.7)
axes[1].set_xlabel('Federated Round', fontsize=12)
axes[1].set_ylabel('Number of Clients', fontsize=12)
axes[1].set_title('Clients Selected per Round', fontsize=14)
axes[1].set_ylim(0, n_clients + 1)
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("\nFederated Learning Summary:")
print(f"  Total rounds: {n_rounds}")
print(f"  Total clients: {n_clients}")
print(f"  Clients per round: {sum(n_clients_history) / len(n_clients_history):.1f}")
print(f"  Initial accuracy: {initial_acc:.3f}")
print(f"  Final accuracy: {accuracy_history[-1]:.3f}")
print(f"  Improvement: {accuracy_history[-1] - initial_acc:+.3f}")

## 7. Comparison with Centralized Training

In [None]:
# Centralized training (all data in one place)
print("\n" + "="*60)
print("COMPARISON: Centralized vs Federated")
print("="*60)

# Combine all client data
X_all = np.vstack([X for X, y in clients_data])
y_all = np.concatenate([y for X, y in clients_data])

# Train centralized model
centralized_model = LogisticModel(n_features, learning_rate=0.01)

for epoch in range(n_rounds * local_epochs):
    centralized_model.train(X_all, y_all, epochs=1)
    if (epoch + 1) % (local_epochs * 5) == 0:
        acc = accuracy_score(y_all, centralized_model.predict(X_all))
        print(f"Epoch {epoch+1}: Accuracy = {acc:.3f}")

# Final comparison
centralized_acc = accuracy_score(y_all, centralized_model.predict(X_all))
federated_acc = accuracy_history[-1]

print("\n" + "="*60)
print("FINAL RESULTS")
print("="*60)
print(f"Centralized Accuracy:  {centralized_acc:.3f}")
print(f"Federated Accuracy:    {federated_acc:.3f}")
print(f"Gap:                  {centralized_acc - federated_acc:+.3f}")
print(f"\nFederated achieves {(federated_acc/centralized_acc)*100:.1f}% of centralized performance")
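
Because fraud is a minority class in this setup (roughly 5% to 14% per client), accuracy alone can look high for a model that never flags fraud. The sketch below illustrates this with synthetic labels at an assumed 9.5% fraud rate. The helper functions and the label vector are hypothetical and independent of the models trained above.

```python
import numpy as np

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

def recall(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    positives = y_true == 1
    if positives.sum() == 0:
        return 0.0
    return float(np.mean(y_pred[positives] == 1))

# Synthetic labels with an assumed ~9.5% fraud rate
rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.095).astype(int)

# A trivial "model" that labels every transaction as legitimate
y_pred_all_legit = np.zeros_like(y_true)

baseline_acc = accuracy(y_true, y_pred_all_legit)
baseline_recall = recall(y_true, y_pred_all_legit)
print(f"Always-legitimate baseline: accuracy={baseline_acc:.3f}, recall={baseline_recall:.3f}")
```

When reading the centralized and federated accuracies, it may help to compare both against this kind of majority-class baseline and to report recall on the fraud class as well.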

## 8. Summary

### FedAvg Algorithm:

**Server (Aggregator):**
1. Initialize global model
2. For each round:
   - Select random subset of clients
   - Send global weights to selected clients
   - Aggregate client updates: w = Σ(n_k * w_k) / Σ(n_k)

**Client:**
1. Receive global weights
2. Train locally on private data
3. Send weight update (Δw = w_new - w_old) to server
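
The server formula is written in terms of weights, while the clients in this notebook send deltas. A short derivation suggests the two views agree when every selected client starts from the same global weights. The notation is introduced here for illustration: $w^t$ is the global model at round $t$, $S_t$ the set of selected clients, $w_k^{t+1}$ the weights of client $k$ after local training, and $n_k$ its sample count.

$$
\begin{aligned}
\Delta_k &= w_k^{t+1} - w^t, \qquad n = \sum_{k \in S_t} n_k \\
w^{t+1} &= w^t + \sum_{k \in S_t} \frac{n_k}{n}\,\Delta_k \\
&= w^t + \sum_{k \in S_t} \frac{n_k}{n}\,w_k^{t+1} - w^t \sum_{k \in S_t} \frac{n_k}{n} \\
&= \sum_{k \in S_t} \frac{n_k}{n}\,w_k^{t+1}
\end{aligned}
$$

The last step uses the fact that the coefficients $n_k / n$ sum to 1, so averaging deltas and then applying them gives the same result as averaging the client weights directly.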

### Key Takeaways:

- ✅ **Federated learning** enables collaborative training without sharing raw data
- ✅ **Weighted averaging** accounts for varying client data sizes
- ✅ **Client sampling** reduces communication overhead per round
- ✅ **Performance** can approach centralized training; compare the two accuracies reported in Section 7

### FedAvg Advantages:

- **Privacy**: Raw data never leaves clients
- **Communication**: Only model updates transmitted
- **Scalability**: Client sampling keeps the per-round cost bounded as the number of devices grows

### FedAvg Limitations:

- **Non-IID data**: Client heterogeneity hurts convergence
- **Communication**: Model updates can still be large
- **Byzantine clients**: No defense against malicious participants
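
To put the communication point in numbers, the following back-of-the-envelope sketch estimates traffic for this notebook's configuration (20 features, 5 of 10 clients per round, 30 rounds). It assumes uncompressed `float64` values, one download and one upload per selected client, and no protocol overhead. The function `payload_bytes` is a hypothetical helper.

```python
import numpy as np

def payload_bytes(n_features: int, dtype=np.float64) -> int:
    """Bytes needed to send one weight vector plus one bias term."""
    return (n_features + 1) * np.dtype(dtype).itemsize

n_features = 20
clients_per_round = 5   # client_fraction=0.5 with 10 clients
n_rounds = 30

per_message = payload_bytes(n_features)
# Each selected client downloads the global model and uploads its update
per_round = 2 * clients_per_round * per_message
total = n_rounds * per_round

print(f"Per message: {per_message} bytes")
print(f"Per round:   {per_round} bytes")
print(f"All rounds:  {total} bytes")
```

For this small logistic model the traffic is negligible. The same arithmetic scales linearly with the parameter count, which is why larger models make communication a real constraint.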

### Next Steps:
→ **Day 9**: Non-IID Data Partitioning (realistic data splits)
→ **Day 11**: Communication-Efficient FL (gradient compression)

---

**📁 Project Location**: `02_federated_learning_foundations/fedavg_from_scratch/`