# Federated Learning Pipeline

This annotated notebook explains the pipeline **step by step**, with:
- short introductions before each code cell;
- comments at the top of every cell describing *what it does* and *how to check the result*;
- a **checklist**, **troubleshooting tips**, a **glossary**, and suggested exercises.

> **Learning Objectives**
> - Understand the end‑to‑end flow of a Federated Learning (FL) pipeline: preprocessing → training → evaluation → saving artifacts.
> - Learn to read and adapt components (dataset, model, training loop, metrics).
> - Run experiments and save the results.

> **Prerequisites**
> - Intermediate Python; experience with PyTorch and scikit‑learn.
> - Basic ML concepts: train/val/test split, overfitting, metrics.
> - Introductory knowledge of FL (clients, server/aggregator, training rounds).

> **Mini Glossary (FL)**
> - **Client**: A device/site (e.g., hospital, phone) that trains locally on private data.
> - **Server**: The coordinator that distributes the starting model, communicates with clients, collects client updates, and distributes new models.
> - **Aggregator/Aggregation Method**: Method used to combine all client models into the new global model.
> - **Round**: One cycle of local training → sending updates → aggregation.
> - **Aggregation frequency**: Number of local epochs each client trains between two aggregations.
> - **Federated Averaging (FedAvg)**: Standard method to average/aggregate client model weights.
> - **IID Data**: Client data follows the same distribution across all clients (homogeneity).
> - **Non‑IID Data**: Client data may follow different distributions (heterogeneity).
> - **Global model**: Model sent by the server to all clients.
> - **Local model**: Model trained by the client on its data.

# Import module
**Environment setup and library imports.**

This section imports all required libraries:
 - Data handling: `pandas`, `numpy`, `sklearn`
 - Model building and training: `torch`, `torch.nn`
 - Visualization: `matplotlib`

In [1]:
import torch
import torch.nn as nn
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import pandas as pd
from torch.utils.data import DataLoader, TensorDataset
import torch.nn.functional as F
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from copy import deepcopy
import matplotlib.pyplot as plt
import numpy as np

# Definition of the model, train function, and evaluation.

This part defines:
 - a simple **MLP (Multilayer Perceptron)** model;
 - two helper functions:
   - `train()` → performs one epoch of training;
   - `evaluate()` → computes accuracy and loss on validation/test sets. 

In [2]:
# Model definition
class MLP(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim=64):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.fc3 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Training and evaluation loops
def train(model, loader, optimizer, criterion):
    model.train()
    total_loss = 0
    correct = 0
    for features, labels in loader:
        features, labels = features.to(DEVICE), labels.to(DEVICE)
        optimizer.zero_grad()
        outputs = model(features)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        correct += (outputs.argmax(1) == labels).sum().item()
    return total_loss / len(loader), correct / len(loader.dataset)

# Evaluation function
def evaluate(model, loader, criterion):
    model.eval()
    total_loss = 0
    correct = 0
    with torch.no_grad():
        for features, labels in loader:
            features, labels = features.to(DEVICE), labels.to(DEVICE)
            outputs = model(features)
            loss = criterion(outputs, labels)
            total_loss += loss.item()
            correct += (outputs.argmax(1) == labels).sum().item()
    return total_loss / len(loader), correct / len(loader.dataset)

# Data loading and preprocessing

The following helper functions:
 - load CSV data and separate features/labels;
 - normalize features using `StandardScaler`;
 - convert arrays to PyTorch tensors;
 - build DataLoaders for batching.
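
As a minimal sketch of the normalization step, the snippet below standardizes one feature column in plain Python. It assumes the same convention as the `features_scaling` helper in the next cell: statistics are fitted on the training split only and then reused for validation/test data. The function names and the age values are hypothetical and exist only for illustration.

```python
from statistics import fmean, pstdev

def fit_scaler(column):
    """Return (mean, std) of one feature column, using the population std."""
    mean = fmean(column)
    std = pstdev(column)
    # Guard for constant columns: fall back to 1.0 to avoid dividing by zero.
    return mean, (std if std > 0 else 1.0)

def apply_scaler(column, mean, std):
    """Standardize values with statistics that were fitted elsewhere."""
    return [(value - mean) / std for value in column]

# Hypothetical ages, for illustration only.
train_age = [50.0, 60.0, 70.0]
test_age = [55.0, 80.0]

mean, std = fit_scaler(train_age)                 # fitted on training data only
train_scaled = apply_scaler(train_age, mean, std)
test_scaled = apply_scaler(test_age, mean, std)   # reuses the training statistics
print(train_scaled, test_scaled)
```

Fitting on the training split alone keeps information about the test distribution out of the model inputs; the scaled test values are therefore not expected to have mean 0 or standard deviation 1.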

In [3]:
def load_dataset(train_file, test_file):

    # Load train/test CSVs
    train_data = pd.read_csv(train_file)
    test_data = pd.read_csv(test_file)
    class_names = train_data["Outcome"].unique()
    print(f"Classes: {class_names}")

    ### DROP non-feature ID columns (PatientID and CenterID)
    train_data = train_data.drop(columns=['PatientID', 'CenterID'])
    test_data = test_data.drop(columns=['PatientID', 'CenterID'])

    ### One-hot encode Gender
    train_data = pd.get_dummies(train_data, columns=['Gender'], drop_first=True)
    test_data = pd.get_dummies(test_data, columns=['Gender'], drop_first=True)


    train_data = train_data.fillna(0)
    test_data = test_data.fillna(0)

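    # Feature columns: every column except the label. After get_dummies the
    # Gender dummy columns are appended last, so "Outcome" is not the final column.
    feature_names = [col for col in train_data.columns if col != "Outcome"]
    print(f"Features: {feature_names}")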
    
    X_train = train_data[feature_names].values
    y_train = train_data["Outcome"].values
    X_test = test_data[feature_names].values
    y_test = test_data["Outcome"].values

    # Encode class labels as integers
    class_map = {label: idx for idx, label in enumerate(class_names)}
    y_train = [class_map[label] for label in y_train]
    y_test = [class_map[label] for label in y_test]

    # Split training data into training and validation
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.2, random_state=42, stratify=y_train
    )
    print(f"Training samples: {len(X_train)}, Validation samples: {len(X_val)}, Test samples: {len(X_test)}")

    input_dim = X_train.shape[1]
    output_dim = len(class_names)

    return X_train, y_train, X_val, y_val, X_test, y_test, input_dim, output_dim

# Normalize features with mean 0 and std 1
def features_scaling(X_train, X_val, X_test):
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_val_scaled = scaler.transform(X_val)
    X_test_scaled = scaler.transform(X_test)
    print("Features scaled successfully.")
    return X_train_scaled, X_val_scaled, X_test_scaled, scaler

# Convert numpy arrays to PyTorch tensors.
def convert_to_tensors(X_train, y_train, X_val, y_val, X_test, y_test):
    X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
    y_train_tensor = torch.tensor(y_train, dtype=torch.long)
    X_val_tensor = torch.tensor(X_val, dtype=torch.float32)
    y_val_tensor = torch.tensor(y_val, dtype=torch.long)
    X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
    y_test_tensor = torch.tensor(y_test, dtype=torch.long)

    return X_train_tensor, y_train_tensor, X_val_tensor, y_val_tensor, X_test_tensor, y_test_tensor


def create_dataloader(X_tensor, y_tensor, batch_size):
    data = TensorDataset(X_tensor, y_tensor)
    loader = DataLoader(data, batch_size=batch_size, shuffle=True)
    return loader

### Data preparation for the server model
def preprocess_data_for_server(train_file, test_file):
    # Full loading, imputation, encoding, and split logic for the server model
    train_data = pd.read_csv(train_file)
    test_data = pd.read_csv(test_file)
    
    cols_to_fill = ['Tobacco', 'Alcohol', 'Performance status']
    for col in cols_to_fill:
        if col in train_data.columns:
            train_data[col] = train_data[col].fillna(0.0)
            test_data[col] = test_data[col].fillna(0.0)

    train_data_processed = pd.get_dummies(train_data, columns=['Gender', 'CenterID'], drop_first=True)
    test_data_processed = pd.get_dummies(test_data, columns=['Gender', 'CenterID'], drop_first=True)
    train_data_processed.rename(columns={'Outcome': 'Class'}, inplace=True)
    test_data_processed.rename(columns={'Outcome': 'Class'}, inplace=True)
    
    all_feature_cols = sorted([col for col in train_data_processed.columns if col not in ['PatientID', 'Class']])
    
    # Align feature columns: add any column missing from a split, filled with 0
    for df in [train_data_processed, test_data_processed]:
        for col in all_feature_cols:
            if col not in df.columns:
                df[col] = 0
    
    train_data_processed = train_data_processed.reindex(columns=all_feature_cols + ['Class'])
    test_data_processed = test_data_processed.reindex(columns=all_feature_cols + ['Class'])
    
    feature_names = all_feature_cols
    class_names = train_data_processed["Class"].unique()
    class_map = {label: idx for idx, label in enumerate(class_names)}
    train_data_processed['Class'] = train_data_processed['Class'].apply(lambda x: class_map.get(x, x))
    test_data_processed['Class'] = test_data_processed['Class'].apply(lambda x: class_map.get(x, x))
    
    X_full = train_data_processed[feature_names].values
    y_full = train_data_processed["Class"].values
    X_test = test_data_processed[feature_names].values
    y_test = test_data_processed["Class"].values
    
    X_train, X_val, y_train, y_val = train_test_split(X_full, y_full, test_size=0.2, random_state=42, stratify=y_full)
    
    input_dim = X_train.shape[1]
    output_dim = len(class_names)
    
    return X_train, y_train, X_val, y_val, X_test, y_test, input_dim, output_dim, train_data, feature_names, class_map


#### Hyperparameters


In [4]:
HIDDEN_DIM = 64
BATCH_SIZE = 32
EPOCHS = 100
LEARNING_RATE = 1e-3
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
PATIENCE = 5  # Early stopping patience

### Pipeline execution

Data loading and data loader creation.

In [5]:
# Load and preprocess the data (server and clients)
X_train_full, y_train_full, X_val_server, y_val_server, X_test_server, y_test_server, INPUT_DIM, OUTPUT_DIM, original_train_data, feature_names, class_map = preprocess_data_for_server(
    "../data_hn/data_hn_clinical_train.csv", "../data_hn/data_hn_clinical_test.csv")

X_train_server, X_val_server_scaled, X_test_server_scaled, scaler = features_scaling(X_train_full, X_val_server, X_test_server)

X_train_tensor_server, y_train_tensor_server, X_val_tensor_server, y_val_tensor_server, X_test_tensor_server, y_test_tensor_server = convert_to_tensors(
    X_train_server, y_train_full, X_val_server_scaled, y_val_server, X_test_server_scaled, y_test_server)

# Final split for centralized training. This re-splits the training tensors;
# X_val_tensor_server from the first split is not used below.
X_train_server_final, X_val_server_final, y_train_server_final, y_val_server_final = train_test_split(
    X_train_tensor_server, y_train_tensor_server, test_size=0.2, random_state=42, stratify=y_train_tensor_server.cpu().numpy()
)

train_loader_server = create_dataloader(X_train_server_final, y_train_server_final, BATCH_SIZE)
val_loader_server = create_dataloader(X_val_server_final, y_val_server_final, BATCH_SIZE)
test_loader_server = create_dataloader(X_test_tensor_server, y_test_tensor_server, BATCH_SIZE)

print(f"Device: {DEVICE}, Input dim: {INPUT_DIM}, Output dim: {OUTPUT_DIM}")

Features scaled successfully.
Device: cpu, Input dim: 10, Output dim: 2


#### Server training

The server trains a baseline (centralized) model before FL.
 - Uses early stopping to prevent overfitting.
 - Prints training and validation accuracy per epoch.

In [13]:
####### Train server

# Model, optimizer, and loss
server_model = MLP(INPUT_DIM, OUTPUT_DIM, HIDDEN_DIM).to(DEVICE)
optimizer = torch.optim.Adam(server_model.parameters(), lr=LEARNING_RATE)
criterion = nn.CrossEntropyLoss()

# Early stopping setup
best_val_loss = float('inf')
patience_counter = 0


# Training loop with early stopping
print("Starting training...")
for epoch in range(EPOCHS):
    train_loss, train_acc = train(server_model, train_loader_server, optimizer, criterion)
    val_loss, val_acc = evaluate(server_model, val_loader_server, criterion)
    print(f"Epoch {epoch+1}/{EPOCHS} | Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f} | Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f}")

    # Early stopping check (avoid overfitting)
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        patience_counter = 0
        print("New best validation loss. Resetting patience counter.")
    else:
        patience_counter += 1
        print(f"No improvement. Patience counter: {patience_counter}/{PATIENCE}")
        if patience_counter >= PATIENCE:
            print("Early stopping triggered.")
            break


# -----Test the model------
test_loss, test_acc = evaluate(server_model, test_loader_server, criterion)
server_test_accuracy = test_acc
print(f"Test Loss: {test_loss:.4f}, Test Accuracy: {test_acc:.4f}")

# Collect predictions for metrics
server_model.eval()
y_true = []
y_pred = []
with torch.no_grad():
    for features, labels in test_loader_server:
        features, labels = features.to(DEVICE), labels.to(DEVICE)
        outputs = server_model(features)
        _, preds = torch.max(outputs, 1)
        y_true.extend(labels.cpu().numpy())
        y_pred.extend(preds.cpu().numpy())

# Confusion matrix and other metrics
conf_matrix = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

f1 = f1_score(y_true, y_pred, average='weighted')
precision = precision_score(y_true, y_pred, average='weighted')
recall = recall_score(y_true, y_pred, average='weighted')

print(f"F1 Score: {f1:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")

Starting training...
Epoch 1/100 | Train Loss: 0.6462 | Train Acc: 0.6890 | Val Loss: 0.5934 | Val Acc: 0.7968
New best validation loss. Resetting patience counter.
Epoch 2/100 | Train Loss: 0.5423 | Train Acc: 0.7560 | Val Loss: 0.4891 | Val Acc: 0.7914
New best validation loss. Resetting patience counter.
Epoch 3/100 | Train Loss: 0.4882 | Train Acc: 0.7627 | Val Loss: 0.4541 | Val Acc: 0.7861
New best validation loss. Resetting patience counter.
Epoch 4/100 | Train Loss: 0.4712 | Train Acc: 0.7668 | Val Loss: 0.4487 | Val Acc: 0.7968
New best validation loss. Resetting patience counter.
Epoch 5/100 | Train Loss: 0.4562 | Train Acc: 0.7761 | Val Loss: 0.4467 | Val Acc: 0.7914
New best validation loss. Resetting patience counter.
Epoch 6/100 | Train Loss: 0.4436 | Train Acc: 0.7895 | Val Loss: 0.4469 | Val Acc: 0.7861
No improvement. Patience counter: 1/5
Epoch 7/100 | Train Loss: 0.4453 | Train Acc: 0.7882 | Val Loss: 0.4494 | Val Acc: 0.7861
No improvement. Patience counter: 2/5
Epo

#### Definition of Federated Learning parameters.

The main parameters of FL are defined: **aggregation frequency, aggregation round, number of clients, and data structures** suitable for containing **client data and models**.

These parameters are examples; modify and select them based on your data.

Define FL setup:
 - how often to aggregate (`aggregation_freq`);
 - number of rounds (`aggregation_round`);
 - number of clients and placeholder structures.
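
The number of rounds is derived from the epoch budget and the aggregation frequency. The sketch below compares three ways of turning that ratio into an integer; it matters only when `EPOCHS` is not a multiple of `aggregation_freq`, because Python's built-in `round` sends ties to the nearest even integer. The helper name `rounds_for` is hypothetical.

```python
import math

def rounds_for(total_epochs, aggregation_freq):
    """Return (floor, ceil, builtin_round) candidates for the number of rounds."""
    exact = total_epochs / aggregation_freq
    return total_epochs // aggregation_freq, math.ceil(exact), round(exact)

print(rounds_for(100, 5))   # (20, 20, 20): divisible case, as in this notebook
print(rounds_for(50, 20))   # (2, 3, 2): 2.5 is rounded to the even integer 2
print(rounds_for(70, 20))   # (3, 4, 4): 3.5 is rounded to the even integer 4
```

Using `//` or `math.ceil` makes the choice between "never exceed the epoch budget" and "always cover it" explicit.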


In [14]:
# FL parameters
aggregation_freq = 5
aggregation_round = round(EPOCHS / aggregation_freq)

print(f"Aggregation_round: {aggregation_round}")
clients_number = 5

# FL clients data
train_loader_clients = []
val_loader_clients = []
test_loader_clients = []

models = []

Aggregation_round: 20


**fed_avg** is the **standard** FL aggregation method. It receives the weights (state dicts) of all client models and **averages** each parameter, so the new global model is the **element-wise mean of the client weights**.

**fed_avg_weighted** is a **variant of fed_avg** that uses a **weighted average**: each client's weights are scaled by its share of the total number of training samples.

The `weighted` argument of `run_fl_simulation` selects whether the weighted version of FedAvg is used.
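
To make the difference concrete, here is a small worked example on a single scalar parameter. It is only a sketch: real state dicts hold tensors, and the parameter values and dataset sizes below are hypothetical.

```python
def simple_average(values):
    """Equal-weight mean, as in standard FedAvg."""
    return sum(values) / len(values)

def weighted_average(values, sizes):
    """Mean weighted by each client's share of the total sample count."""
    total = sum(sizes)
    return sum(value * size / total for value, size in zip(values, sizes))

# Hypothetical: one scalar parameter per client and made-up dataset sizes.
toy_params = [0.2, 0.4, 1.0]
toy_sizes = [100, 100, 800]

print(simple_average(toy_params))               # about 0.533
print(weighted_average(toy_params, toy_sizes))  # about 0.86
```

The large third client pulls the weighted result towards its own value, while the simple mean treats all three clients alike. With equal dataset sizes the two functions return the same number.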

In [None]:
# --- Standard and weighted FedAvg implementations (Task 3.1 B & C) ---

def fed_avg(models_weights):
    """Standard FedAvg: simple (equal) average of the client weights."""
    new_state_dict = {}
    num_clients = len(models_weights)
    for key in models_weights[0].keys():
        new_state_dict[key] = sum([m[key] for m in models_weights]) / num_clients
    return new_state_dict

def fed_avg_weighted(models_weights, client_data_sizes):
    """Weighted FedAvg: average weighted by each client's dataset size."""
    total_size = sum(client_data_sizes)
    weights = [size / total_size for size in client_data_sizes]

    new_state_dict = deepcopy(models_weights[0])
    for key in new_state_dict.keys():
        new_state_dict[key] = new_state_dict[key] * 0.0

    for key in new_state_dict.keys():
        for i, client_weights in enumerate(models_weights):
            new_state_dict[key] += client_weights[key] * weights[i]
    return new_state_dict


#### Data loading and preprocessing (Clients)

Creation of the global model from the server model: deepcopy(…)

Loading of each client's data and preprocessing.

Creation of models for each client



Each client:
 - loads its local dataset;
 - scales features and converts to tensors;
 - creates DataLoaders;
 - initializes a model with the global weights.
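
Before training, it can help to check how heterogeneous the clients actually are. The sketch below groups `(client id, label)` pairs and reports each client's sample count and label proportions. It uses plain Python rather than the pandas `groupby` of the next cell, and both the function name and the records are hypothetical.

```python
from collections import Counter, defaultdict

def label_distribution_by_client(records):
    """Map each client id to (sample count, {label: fraction})."""
    labels_by_client = defaultdict(list)
    for client_id, label in records:
        labels_by_client[client_id].append(label)

    summary = {}
    for client_id, labels in labels_by_client.items():
        counts = Counter(labels)
        fractions = {label: count / len(labels) for label, count in counts.items()}
        summary[client_id] = (len(labels), fractions)
    return summary

# Hypothetical (CenterID, Outcome) pairs, for illustration only.
toy_records = [("A", 0), ("A", 0), ("A", 1), ("A", 0), ("B", 1), ("B", 1)]
toy_summary = label_distribution_by_client(toy_records)
print(toy_summary)
```

Large differences in size or label proportions between clients are the situation in which standard and weighted FedAvg are most likely to behave differently.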

In [None]:
# --- Client data partitioning (Task 3.1 A) ---

def partition_data_for_clients(original_train_data, feature_names, scaler, class_map):
    # Partition by CenterID to simulate a Non-IID setting
    cols_to_fill = ['Tobacco', 'Alcohol', 'Performance status']
    for col in cols_to_fill:
        original_train_data[col] = original_train_data[col].fillna(0.0)

    client_groups = original_train_data.groupby('CenterID')
    unique_centers = list(client_groups.groups.keys())
    
    train_loader_clients = []
    client_sizes = []
    client_names = []

    for center_id in unique_centers:
        client_data = client_groups.get_group(center_id).copy()
        
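        # Note: each client has a single CenterID, so drop_first=True drops its only
        # CenterID dummy; the CenterID_* features therefore stay 0 for every client.
        client_data_processed = pd.get_dummies(client_data, columns=['Gender', 'CenterID'], drop_first=True)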
        
        X_client_df = pd.DataFrame(0, index=client_data_processed.index, columns=feature_names)
        for col in client_data_processed.columns.intersection(feature_names):
            X_client_df[col] = client_data_processed[col]

        X_client = X_client_df.values
        y_client = client_data_processed['Outcome'].values
        
        X_client_scaled = scaler.transform(X_client)
        y_client_int = np.array([class_map.get(label, label) for label in y_client])
        
        X_tensor = torch.tensor(X_client_scaled, dtype=torch.float32)
        y_tensor = torch.tensor(y_client_int.tolist(), dtype=torch.long)
        
        loader = create_dataloader(X_tensor, y_tensor, BATCH_SIZE)
        train_loader_clients.append(loader)
        client_sizes.append(len(client_data))
        client_names.append(center_id)
        
    return train_loader_clients, client_sizes, client_names



# Partition the client data
train_loader_clients, client_sizes, client_names = partition_data_for_clients(
    original_train_data, feature_names, scaler, class_map)
clients_number = len(client_names)

# FL simulations (run_fl_simulation is defined in the training-loop cell below; run that cell first)
global_model_standard = deepcopy(server_model)
std_acc, std_loss, std_client_acc = run_fl_simulation(
    global_model_standard, train_loader_clients, client_sizes, test_loader_server, 
    aggregation_round, aggregation_freq, criterion, LEARNING_RATE, weighted=False)

global_model_weighted = deepcopy(server_model)
wtd_acc, wtd_loss, wtd_client_acc = run_fl_simulation(
    global_model_weighted, train_loader_clients, client_sizes, test_loader_server, 
    aggregation_round, aggregation_freq, criterion, LEARNING_RATE, weighted=True)




--- Starting FL Simulation: Standard FedAvg ---

--- Starting FL Simulation: Weighted FedAvg ---


#### Core of FL: the aggregation and training cycles


You can draw **inspiration** from server training to train each client.

For each round, you need to **aggregate the models** with the chosen function (pass the correct parameters based on the function).

Also remember to **evaluate** the global model obtained and print the metrics.

Then start a new round, and each client works with the new models.


This is the **core FL training process**:
 - Each client trains locally for several epochs;
 - Local weights are collected;
 - Global model is updated by averaging;
 - Metrics are logged for analysis.
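
The four steps above can be sketched on a toy problem where the "model" is a single number and each client minimizes the squared distance to its own data. This is an illustration under strong simplifying assumptions, not the notebook's implementation: all names and values are hypothetical, and plain gradient descent stands in for the Adam optimizer used in the real loop.

```python
def local_train(w, data, lr, local_epochs):
    """Gradient descent on 0.5 * mean((w - x)^2); its gradient is w - mean(data)."""
    target = sum(data) / len(data)
    for _ in range(local_epochs):
        w -= lr * (w - target)
    return w

def run_toy_fl(client_data, num_rounds, local_epochs, lr, weighted):
    """Return the global parameter after each round."""
    global_w = 0.0
    sizes = [len(data) for data in client_data]
    history = []
    for _ in range(num_rounds):
        # 1. Each client starts from the current global value and trains locally.
        local_ws = [local_train(global_w, data, lr, local_epochs) for data in client_data]
        # 2. The server aggregates the local results.
        if weighted:
            total = sum(sizes)
            global_w = sum(w * n / total for w, n in zip(local_ws, sizes))
        else:
            global_w = sum(local_ws) / len(local_ws)
        # 3. Log the result of the round.
        history.append(global_w)
    return history

# Hypothetical clients: a large one centred on 1.0 and a small one centred on 5.0.
toy_clients = [[1.0] * 8, [5.0] * 2]
weighted_history = run_toy_fl(toy_clients, num_rounds=20, local_epochs=5, lr=0.5, weighted=True)
simple_history = run_toy_fl(toy_clients, num_rounds=20, local_epochs=5, lr=0.5, weighted=False)
print(round(weighted_history[-1], 4), round(simple_history[-1], 4))
```

In this toy setting the weighted variant approaches 1.8, the mean of the pooled data, while the unweighted variant approaches 3.0, the mean of the two client optima. Real models and Non-IID data do not behave this cleanly.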

In [None]:
# --- FL aggregation and training loop (Task 3.1 D) ---

def run_fl_simulation(global_model, train_loader_clients, client_sizes, test_loader_server, 
                      num_rounds, local_epochs, criterion, lr, weighted=False):
    
    global_model.train()
    models = [deepcopy(global_model) for _ in train_loader_clients]
    
    global_test_accuracy_history = []
    global_test_loss_history = []
    
    
    print(f"\n--- Starting FL Simulation: {'Weighted' if weighted else 'Standard'} FedAvg ---")

    for round_num in range(1, num_rounds + 1):
        
        # 1. Local Training
        client_weights_updates = []
        for i, loader in enumerate(train_loader_clients):
            
            models[i].load_state_dict(global_model.state_dict())
            optimizer = torch.optim.Adam(models[i].parameters(), lr=lr)
            
            for _ in range(local_epochs):
                train_loss, train_acc = train(models[i], loader, optimizer, criterion)
            
            client_weights_updates.append(models[i].state_dict())
        
        # 2. Global Aggregation
        if weighted:
            new_global_weights = fed_avg_weighted(client_weights_updates, client_sizes)
        else:
            new_global_weights = fed_avg(client_weights_updates)
        
        global_model.load_state_dict(new_global_weights)
        
        # 3. Global Evaluation
        test_loss, test_acc = evaluate(global_model, test_loader_server, criterion)
        global_test_accuracy_history.append(test_acc)
        global_test_loss_history.append(test_loss)
        
    # 4. Final per-client accuracy (evaluated on each client's local training data)
    final_client_accuracies = []
    for i, loader in enumerate(train_loader_clients):
        _, client_acc = evaluate(global_model, loader, criterion)
        final_client_accuracies.append(client_acc)
        
    return global_test_accuracy_history, global_test_loss_history, final_client_accuracies



# Model validation

Test the model and print the metrics.

In [23]:
print("\n=== Federated Learning Models Validation ===")

# Standard FedAvg
print("Standard FedAvg - Global Test Accuracy History:")
print(std_acc)
print("Final Per-client Accuracies:", std_client_acc)

# Weighted FedAvg
print("\nWeighted FedAvg - Global Test Accuracy History:")
print(wtd_acc)
print("Final Per-client Accuracies:", wtd_client_acc)

# For a complete summary, also print the final global accuracy
print(f"\nFinal Standard FedAvg Test Accuracy: {std_acc[-1]:.4f}")
print(f"Final Weighted FedAvg Test Accuracy: {wtd_acc[-1]:.4f}")


=== Federated Learning Models Validation ===
Standard FedAvg - Global Test Accuracy History:
[0.8082191780821918, 0.8116438356164384, 0.815068493150685, 0.8082191780821918, 0.8184931506849316, 0.815068493150685, 0.8184931506849316, 0.815068493150685, 0.8082191780821918, 0.791095890410959, 0.8013698630136986, 0.7945205479452054, 0.7808219178082192, 0.7773972602739726, 0.7808219178082192, 0.7808219178082192, 0.7842465753424658, 0.773972602739726, 0.7773972602739726, 0.7773972602739726]
Final Per-client Accuracies: [0.8590604026845637, 0.8518518518518519, 0.7997587454764777]

Weighted FedAvg - Global Test Accuracy History:
[0.8082191780821918, 0.8082191780821918, 0.797945205479452, 0.8013698630136986, 0.7945205479452054, 0.8047945205479452, 0.791095890410959, 0.7876712328767124, 0.7773972602739726, 0.7773972602739726, 0.7876712328767124, 0.7671232876712328, 0.791095890410959, 0.797945205479452, 0.797945205479452, 0.797945205479452, 0.7876712328767124, 0.7808219178082192, 0.77397260273972

# Graphs

This code generates and saves key results from the Federated Learning experiments. It plots global accuracy and loss over rounds (comparing Standard vs. Weighted FedAvg), per-client accuracies, and the data distribution per client. Each figure is saved as a PNG file; the cell as shown does not write the metrics to CSV.

In [None]:
# ---- Plot generation (Task 3.2) ---

# ------> Global accuracy vs. rounds
plt.figure(figsize=(10, 6))
# Global model accuracy curves at each round
plt.plot(range(1, aggregation_round + 1), std_acc, label='Standard FedAvg', marker='o')
plt.plot(range(1, aggregation_round + 1), wtd_acc, label='Weighted FedAvg', marker='x')
plt.axhline(y=server_test_accuracy, color='r', linestyle='--', label=f'Centralized (baseline) acc: {server_test_accuracy:.4f}')
plt.title('Global Test Accuracy vs. Communication Rounds')
plt.xlabel('Communication round')
plt.ylabel('Global test accuracy')
plt.legend()
plt.grid(True)
plt.savefig('fl_global_accuracy_vs_rounds.png')
plt.close()

# ------> Global loss vs. rounds
plt.figure(figsize=(10, 6))
# Global model loss curves at each round
plt.plot(range(1, aggregation_round + 1), std_loss, label='Standard FedAvg', marker='o')
plt.plot(range(1, aggregation_round + 1), wtd_loss, label='Weighted FedAvg', marker='x')
plt.title('Global Test Loss vs. Communication Rounds')
plt.xlabel('Communication round')
plt.ylabel('Global test loss')
plt.legend()
plt.grid(True)
plt.savefig('fl_global_loss_vs_rounds.png')
plt.close()

# ------> Per-client accuracy (bar chart)
x = np.arange(clients_number)
width = 0.35
fig, ax = plt.subplots(figsize=(10, 6))
# Bars for Standard FedAvg
rects1 = ax.bar(x - width/2, std_client_acc, width, label='Standard FedAvg', color='C0')
# Bars for Weighted FedAvg
rects2 = ax.bar(x + width/2, wtd_client_acc, width, label='Weighted FedAvg', color='C1')
ax.set_ylabel('Final client accuracy (on local data)')
ax.set_xlabel('Client ID (CenterID)')
ax.set_title('Final Accuracy per Client')
ax.set_xticks(x)
ax.set_xticklabels(client_names)
ax.legend()
plt.grid(axis='y', linestyle='--')
plt.savefig('fl_per_client_accuracy.png')
plt.close()

# ------> Weighted vs. Unweighted convergence
# Overlay accuracy curves for both versions (already included in the global accuracy vs. rounds plot)
plt.figure(figsize=(10, 6))
plt.plot(range(1, aggregation_round + 1), std_acc, label='Standard FedAvg', marker='o')
plt.plot(range(1, aggregation_round + 1), wtd_acc, label='Weighted FedAvg', marker='x')
plt.title('Convergence: Standard vs Weighted FedAvg')
plt.xlabel('Communication round')
plt.ylabel('Global test accuracy')
plt.legend()
plt.grid(True)
plt.savefig('fl_weighted_vs_unweighted_convergence.png')
plt.close()

# ------> Data distribution histogram
plt.figure(figsize=(8, 6))
# Bar chart showing the dataset size of each client
plt.bar(client_names, client_sizes, color='skyblue')
plt.title('Data distribution per client (Non-IID simulation)')
plt.xlabel('Client ID (CenterID)')
plt.ylabel('Number of samples')
plt.grid(axis='y', linestyle='--')
plt.savefig('fl_client_data_distribution.png')
plt.close()
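
The per-round metrics could also be written to a CSV file with the standard library, as in the sketch below. The function name, the column names, and the output file name in the usage comment are assumptions; the four input lists are the histories returned by `run_fl_simulation`.

```python
import csv

def save_round_metrics(path, std_acc, std_loss, wtd_acc, wtd_loss):
    """Write one CSV row per communication round."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["round", "std_acc", "std_loss", "wtd_acc", "wtd_loss"])
        rows = zip(std_acc, std_loss, wtd_acc, wtd_loss)
        for round_num, row in enumerate(rows, start=1):
            writer.writerow([round_num, *row])

# Example usage inside the notebook (hypothetical file name):
# save_round_metrics("fl_round_metrics.csv", std_acc, std_loss, wtd_acc, wtd_loss)
```

Because `zip` stops at the shortest input, the four histories should have the same length, which is the case when both simulations run for the same number of rounds.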


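## Interpretation

 - **Standard FedAvg** gives every client the same weight regardless of dataset size, so small clients influence the global model as much as large ones.
 - **Weighted FedAvg** scales each client's contribution by its number of samples, which tends to make the global model track the pooled data more closely when client sizes differ.
 - Weighted FedAvg is often expected to converge faster and reach higher overall accuracy, but that is not visible in the run recorded above: both variants start at about 0.81 global test accuracy in round 1 and drift down to roughly 0.77 to 0.78 in the later rounds.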
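
Statements like these are easier to check against a specific run with a small summary of the accuracy history. The sketch below uses the Standard FedAvg history printed in the validation section, rounded to four decimals; the `summarize` helper is hypothetical.

```python
# Global test accuracy per round for Standard FedAvg,
# copied from the validation output and rounded to four decimals.
std_acc_history = [
    0.8082, 0.8116, 0.8151, 0.8082, 0.8185, 0.8151, 0.8185, 0.8151, 0.8082, 0.7911,
    0.8014, 0.7945, 0.7808, 0.7774, 0.7808, 0.7808, 0.7842, 0.7740, 0.7774, 0.7774,
]

def summarize(history):
    """Report the best round (1-based, first occurrence), best and final accuracy."""
    best_acc = max(history)
    return {
        "best_round": history.index(best_acc) + 1,
        "best_acc": best_acc,
        "final_acc": history[-1],
        "drop_from_best": best_acc - history[-1],
    }

summary = summarize(std_acc_history)
print(summary)
```

For this run the best accuracy is first reached in round 5, and the final round is about 0.04 lower, which suggests that keeping the best-round model, or stopping earlier, may be worth trying.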
