# **Federated Learning Research Material**

### Introduction to Federated Learning

`Federated Learning (FL)` is a machine learning approach where multiple clients collaboratively train a model without sharing their raw data. Each client trains a local model and then shares only the model updates (weights) with a central server, which aggregates the updates to improve the global model.
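
A common choice for the aggregation step is federated averaging (FedAvg), in which each client's weights count in proportion to its share of the training data. The helper names used later in this notebook are consistent with that rule, though their implementations are not shown here, so treat this as an assumption. The notation is introduced for illustration only: $K$ is the number of clients, $n_k$ the number of samples held by client $k$, and $w_k^{(t+1)}$ the weights client $k$ returns after local training in round $t$.

```latex
w^{(t+1)} = \sum_{k=1}^{K} \frac{n_k}{n}\, w_k^{(t+1)},
\qquad n = \sum_{k=1}^{K} n_k
```

When every client holds the same number of samples, this reduces to a plain average of the client weights.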

In this workshop, we will explore how Federated Learning works by building a simple model and training it across multiple clients. We will also compare it to traditional centralized training (SGD).

In [12]:
# Importing necessary libraries
import numpy as np
import random
import os
import pickle 
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.optimizers.schedules import ExponentialDecay  # added for learning rate scheduling
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import accuracy_score
from tensorflow.keras import backend as K
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.optimizers import SGD
from tqdm import tqdm

In [2]:
cd ..

/home/chukwuemeka-james/Documents/Ferated Learning/Fedrated Swarm Behavior




In [3]:
!pwd

/home/chukwuemeka-james/Documents/Ferated Learning/Fedrated Swarm Behavior


In [18]:
# Now we can import the custom utilities
from src.utils import load, batch_data, scale_model_weights, sum_scaled_weights, test_model, weight_scalling_factor
from src.clients import create_clients
from src.model import SimpleMLP

#### **Load and Preprocess the Data**


In [5]:
# Path to the data
data_path = 'Data/swarm_aligned'

# Load the data using the custom load function
data_list, label_list = load(data_path)

# Print some information about the dataset
print(f"Dataset loaded with {len(data_list)} samples.")

Dataset loaded with 24016 samples.


In [6]:
# Preview the data and labels (just the first 5 samples)
print(f"First 5 data samples: {data_list[:5]}")
print(f"First 5 labels: {label_list[:5]}")

First 5 data samples: [[-3.37407329e-04 -1.27701041e-04 -4.26608613e-06 ...  0.00000000e+00
   6.91926721e-06  0.00000000e+00]
 [-3.37118628e-04  1.42570308e-04 -3.23296795e-06 ...  0.00000000e+00
   1.04981985e-05  0.00000000e+00]
 [-3.35794424e-04  1.68734682e-05 -3.42861620e-06 ...  0.00000000e+00
   9.54381684e-06  0.00000000e+00]
 [-3.35703757e-04 -1.81284801e-04 -1.81093925e-06 ...  2.05192062e-07
   7.15786263e-07  0.00000000e+00]
 [-3.35551056e-04  1.66632656e-04 -3.94636826e-06 ...  0.00000000e+00
   3.10174047e-06  0.00000000e+00]]
First 5 labels: [0 0 0 1 0]


#### **Binarizing the Labels**


In [7]:
# One-hot encode the labels
n_values = np.max(label_list) + 1  # Number of unique labels
label_list = np.eye(n_values)[label_list]

# Verify the first few labels after transformation
print(f"One-hot encoded labels (first 5 samples): {label_list[:5]}")

One-hot encoded labels (first 5 samples): [[1. 0.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [1. 0.]]
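
The `np.eye(n_values)[label_list]` idiom can look opaque at first. The sketch below shows it on a small, hypothetical label array (not the workshop data) and recovers the integer labels again with `argmax`. It assumes the labels are integers in the range `0..n_values-1`.

```python
import numpy as np

# Hypothetical integer labels, standing in for `label_list`
labels = np.array([0, 0, 0, 1, 0])

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing np.eye(n) with the label array picks one row per sample.
n_values = int(labels.max()) + 1
one_hot = np.eye(n_values)[labels]

# argmax along the class axis recovers the original integer labels
recovered = one_hot.argmax(axis=1)
```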


#### **Split the Data into Train and Test**


In [8]:
# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(data_list, 
                                                    label_list, 
                                                    test_size=0.1, 
                                                    random_state=42)

# Check the shape of the training and testing data
print(f"Training data shape: {X_train.shape}")
print(f"Test data shape: {X_test.shape}")

Training data shape: (21614, 2400)
Test data shape: (2402, 2400)


#### **Creating Clients and Batching Data**

In Federated Learning, the data is distributed across several clients. Each client will have a small subset of the total dataset. Let's create clients and batch their data.
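
The implementations of `create_clients` and `batch_data` live in `src/` and are not shown in this notebook. As a rough illustration of the idea only, the sketch below shuffles the samples, splits them into near-equal shards keyed `client_1`, `client_2`, and so on, and yields mini-batches from a shard. The names `shard_clients` and `iter_batches`, the seed, and the batch size of 32 are assumptions made for this example.

```python
import numpy as np

def shard_clients(X, y, num_clients=10, initial="client", seed=42):
    """Shuffle the samples and split them into near-equal shards, one per client."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    shards = np.array_split(order, num_clients)
    return {
        f"{initial}_{i + 1}": (X[idx], y[idx])
        for i, idx in enumerate(shards)
    }

def iter_batches(X, y, batch_size=32):
    """Yield (features, labels) mini-batches; the last batch may be smaller."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]
```

A random, even split like this gives every client roughly the same label distribution (the IID case). Real federated deployments are often non-IID, which is one reason results can differ from this setup.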

In [9]:
# Create clients by splitting the training data
clients = create_clients(X_train, y_train, num_clients=10, initial='client')

# Create batched data for each client
clients_batched = {client_name: batch_data(data) for client_name, data in clients.items()}

# Preview the batched data for the first client
client_data_example = list(clients_batched['client_1'])
print(f"First client data (shaped): {client_data_example[:1]}")



First client data (shaped): [(<tf.Tensor: shape=(32, 2400), dtype=float64, numpy=
array([[-2.23444612e-05, -3.00200759e-05, -8.58943516e-08, ...,
         0.00000000e+00,  1.43157253e-06,  0.00000000e+00],
       [ 4.58437242e-05, -7.33275307e-05,  9.56767638e-07, ...,
        -7.15786263e-08,  3.81752674e-06,  2.38595421e-07],
       [ 2.59672941e-04,  2.30545212e-04, -1.76322016e-06, ...,
         0.00000000e+00,  7.39645805e-06,  0.00000000e+00],
       ...,
       [-2.18398319e-04, -5.09949993e-05, -1.68448367e-06, ...,
        -2.38595421e-08,  6.91926721e-06,  0.00000000e+00],
       [-2.85551000e-04,  9.89479070e-05, -6.60909316e-07, ...,
         6.44207637e-08,  1.50315115e-05,  0.00000000e+00],
       [ 8.77625537e-05,  4.27730011e-05, -1.24069619e-07, ...,
         0.00000000e+00,  3.81752674e-06,  0.00000000e+00]])>, <tf.Tensor: shape=(32, 2), dtype=float64, numpy=
array([[0., 1.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [1., 0.],
       [



### **Defining the Global Model**

We'll define a simple Multilayer Perceptron (MLP) model. This model will be used by all the clients as their initial local model.

In [10]:
# Define a simple MLP model using the SimpleMLP class
smlp_global = SimpleMLP()
global_model = smlp_global.build(data_list.shape[1], len(label_list[0]))

# Display the model summary to check the architecture
global_model.summary()

#### **Federated Training Loop**


Now we will implement the Federated Learning process. Each client will train a local model, and then their model weights will be aggregated to update the global model. This process repeats for a number of communication rounds.
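
To make the aggregation step concrete, here is a minimal NumPy sketch of weighted averaging over per-layer weight lists. It is not the code from `src/utils.py`: the function names `scaling_factor`, `scale_weights`, and `sum_weights` are stand-ins chosen for this example, and the two clients and their weights are made up.

```python
import numpy as np

def scaling_factor(client_sizes, client_name):
    """Fraction of all training samples held by one client."""
    return client_sizes[client_name] / sum(client_sizes.values())

def scale_weights(weights, factor):
    """Multiply every layer's weight array by the client's factor."""
    return [factor * np.asarray(w) for w in weights]

def sum_weights(scaled_weight_list):
    """Add the scaled weights layer by layer across clients."""
    return [np.sum(layer_group, axis=0) for layer_group in zip(*scaled_weight_list)]

# Two hypothetical clients with a one-layer "model" (kernel + bias)
client_sizes = {"client_1": 300, "client_2": 100}
local_weights = {
    "client_1": [np.full((2, 2), 1.0), np.zeros(2)],
    "client_2": [np.full((2, 2), 5.0), np.full(2, 4.0)],
}

scaled = [
    scale_weights(local_weights[name], scaling_factor(client_sizes, name))
    for name in client_sizes
]
global_weights = sum_weights(scaled)
```

Here `client_1` holds three quarters of the data, so the aggregated kernel is `0.75 * 1.0 + 0.25 * 5.0 = 2.0` and the aggregated bias is `0.25 * 4.0 = 1.0`.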

In [15]:
# Hyperparameters
comms_round = 10  # Number of global epochs (communication rounds)
# Learning rate schedule
initial_lr = 0.01
lr_schedule = ExponentialDecay(
    initial_learning_rate=initial_lr,
    decay_steps=100,
    decay_rate=0.96,
    staircase=True
)

loss = 'categorical_crossentropy'
metrics = ['accuracy']

In [16]:
optimizer = SGD(learning_rate=lr_schedule, momentum=0.9)
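
With `staircase=True`, the schedule multiplies the learning rate by `decay_rate` once every `decay_steps` optimizer steps. The pure-Python sketch below reproduces that formula so the effect can be checked by hand. The per-client step count assumes the 21,614 training samples are split evenly over 10 clients and batched in groups of 32 (the batch shape shown earlier); both are assumptions about `create_clients` and `batch_data`.

```python
def staircase_lr(step, initial_lr=0.01, decay_steps=100, decay_rate=0.96):
    """Learning rate at a given optimizer step under staircase exponential decay."""
    return initial_lr * decay_rate ** (step // decay_steps)

samples_per_client = 21614 // 10                 # 2161, assuming an even split
steps_per_epoch = -(-samples_per_client // 32)   # ceiling division, assuming batch size 32

lr_first = staircase_lr(0)
lr_last = staircase_lr(steps_per_epoch - 1)
lr_after_100 = staircase_lr(100)
```

Under these assumptions each client takes about 68 steps per round, which is fewer than `decay_steps=100`. Since the training loop below builds a fresh `SGD` optimizer for every client, its step counter should restart at zero each time, so the rate would stay at 0.01 throughout. If decay across rounds is intended, one option is to derive the rate from the communication round instead.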

In [22]:
# Enable eager execution
tf.config.run_functions_eagerly(True)  

# Start the federated learning process
for comm_round in range(comms_round):
    print(f"\n--- Communication Round {comm_round+1}/{comms_round} ---")
    
    # Get global model weights
    global_weights = global_model.get_weights()
    
    # List to collect local model weights after scaling
    scaled_local_weight_list = []
    
    # Randomize client names
    client_names = list(clients_batched.keys())
    random.shuffle(client_names)
    
    # Train each client and collect scaled weights
    for client in tqdm(client_names, desc='Training clients'):
        smlp_local = SimpleMLP()
        local_model = smlp_local.build(data_list.shape[1], len(label_list[0]))
        
        # Create a new optimizer instance for each client
        optimizer = SGD(learning_rate=lr_schedule, momentum=0.9)
        
        local_model.compile(loss=loss, optimizer=optimizer, metrics=metrics)
        
        # Set local model weights to global model's weights
        local_model.set_weights(global_weights)
        
        # Fit local model with client's data
        local_model.fit(clients_batched[client], epochs=1, verbose=0)
        
        # Scale the local model weights
        scaling_factor = weight_scalling_factor(clients_batched, client)
        scaled_weights = scale_model_weights(local_model.get_weights(), scaling_factor)
        scaled_local_weight_list.append(scaled_weights)
        
        # Clear session to free memory after each client
        K.clear_session()
    
    # Aggregate the weights from all clients
    average_weights = sum_scaled_weights(scaled_local_weight_list)
    
    # Update global model with the averaged weights
    global_model.set_weights(average_weights)
    
    # Evaluate the global model after each communication round
    print(f"Evaluating global model after round {comm_round + 1}")
    for X_test_batch, Y_test_batch in tf.data.Dataset.from_tensor_slices((X_test, y_test)).batch(len(y_test)):
        test_model(X_test_batch, Y_test_batch, global_model, comm_round)


--- Communication Round 1/10 ---
Training clients: 100%|██████████| 10/10 [00:57<00:00,  5.74s/it]
Evaluating global model after round 1
76/76 ━━━━━━━━━━━━━━━━━━━━ 1s 19ms/step
comm_round: 0 | global_acc: 69.734% | global_loss: 0.6606508493423462

--- Communication Round 2/10 ---
Training clients: 100%|██████████| 10/10 [00:50<00:00,  5.08s/it]
Evaluating global model after round 2
76/76 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step
comm_round: 1 | global_acc: 69.734% | global_loss: 0.658727765083313

--- Communication Round 3/10 ---
Training clients: 100%|██████████| 10/10 [00:41<00:00,  4.11s/it]
Evaluating global model after round 3
76/76 ━━━━━━━━━━━━━━━━━━━━ 2s 20ms/step
comm_round: 2 | global_acc: 69.734% | global_loss: 0.66009122133255

--- Communication Round 4/10 ---
Training clients: 100%|██████████| 10/10 [00:46<00:00,  4.65s/it]
Evaluating global model after round 4
76/76 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step
comm_round: 3 | global_acc: 69.734% | global_loss: 0.6599183082580566

--- Communication Round 5/10 ---
Training clients: 100%|██████████| 10/10 [00:41<00:00,  4.14s/it]
Evaluating global model after round 5
76/76 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step
comm_round: 4 | global_acc: 69.734% | global_loss: 0.6599715352058411

--- Communication Round 6/10 ---
Training clients: 100%|██████████| 10/10 [00:41<00:00,  4.16s/it]
Evaluating global model after round 6
76/76 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step
comm_round: 5 | global_acc: 69.734% | global_loss: 0.6585253477096558

--- Communication Round 7/10 ---
Training clients: 100%|██████████| 10/10 [00:41<00:00,  4.16s/it]
Evaluating global model after round 7
76/76 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step
comm_round: 6 | global_acc: 69.734% | global_loss: 0.659183919429779

--- Communication Round 8/10 ---
Training clients: 100%|██████████| 10/10 [00:41<00:00,  4.13s/it]
Evaluating global model after round 8
76/76 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step
comm_round: 7 | global_acc: 69.734% | global_loss: 0.6610382199287415

--- Communication Round 9/10 ---
Training clients: 100%|██████████| 10/10 [00:41<00:00,  4.14s/it]
Evaluating global model after round 9
76/76 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step
comm_round: 8 | global_acc: 69.734% | global_loss: 0.6596881151199341

--- Communication Round 10/10 ---
Training clients: 100%|██████████| 10/10 [00:41<00:00,  4.12s/it]
Evaluating global model after round 10
76/76 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step
comm_round: 9 | global_acc: 69.734% | global_loss: 0.6590878963470459
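
The reported accuracy is identical in every round (69.734%, which corresponds to 1,675 of the 2,402 test samples), while the loss barely moves. One possible explanation, which this notebook does not confirm, is that the model predicts the same class for every sample. A quick way to check is to compare the accuracy with the share of the most frequent class and to count how often each class is predicted. The sketch below uses synthetic stand-ins for `y_test` and for the output of `global_model.predict(X_test)`; the function names are made up for this example.

```python
import numpy as np

def majority_class_share(y_onehot):
    """Accuracy obtained by always predicting the most frequent class."""
    class_counts = np.asarray(y_onehot).sum(axis=0)
    return class_counts.max() / class_counts.sum()

def predicted_class_counts(probabilities, num_classes):
    """How many samples the model assigns to each class."""
    return np.bincount(np.argmax(probabilities, axis=1), minlength=num_classes)

# Synthetic stand-ins for `y_test` and `global_model.predict(X_test)`
y_demo = np.eye(2)[np.array([0, 0, 0, 1, 0, 1, 0, 0, 1, 0])]
probs_demo = np.tile([0.6, 0.4], (10, 1))   # a model that always favors class 0

baseline = majority_class_share(y_demo)
counts = predicted_class_counts(probs_demo, num_classes=2)
```

If the baseline on the real test set matches the reported accuracy and one entry of the prediction counts is zero, the model has likely not learned to separate the classes yet. The very small input magnitudes seen in the data preview (around 1e-4 and below) might contribute, so feature scaling could be worth trying.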


### **Summary and Conclusion**

In this workshop, we explored the concept of Federated Learning by training a simple MLP model across multiple clients.

**Key Takeaways:**
- Federated Learning helps distribute the computational load.
- Raw data never leaves the client, which helps preserve data privacy; only model weights are shared with the server.
- Performance of Federated Learning can vary based on the number of clients and data distribution.

### Notes:

1. **Execution Order**: Run the notebook cells in order. You can modify parameters such as `comms_round` or the number of clients to see how they affect the model's performance.

2. **Interactive Learning**: You can add widgets or `matplotlib` plots to visualize the training process, for example accuracy over communication rounds.

3. **Modularity**: If you want to dive deeper into parts of the code (such as `batch_data` or `scale_model_weights`), you can go to the individual cells and experiment with them independently.
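
As a starting point for the plotting suggestion, the sketch below parses the `comm_round: ... | global_acc: ... | global_loss: ...` lines printed during training and plots them per round. The helper names `parse_round_log` and `plot_history` are made up for this example, and `matplotlib` is imported only inside the plotting function so the parsing part runs without it.

```python
import re

LOG_PATTERN = re.compile(
    r"comm_round: (\d+) \| global_acc: ([\d.]+)% \| global_loss: ([\d.]+)"
)

def parse_round_log(text):
    """Extract (round, accuracy %, loss) tuples from the printed evaluation lines."""
    return [
        (int(r), float(acc), float(loss))
        for r, acc, loss in LOG_PATTERN.findall(text)
    ]

def plot_history(history):
    """Plot accuracy and loss per round; needs matplotlib, imported lazily."""
    import matplotlib.pyplot as plt

    rounds = [r + 1 for r, _, _ in history]
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
    ax_acc.plot(rounds, [acc for _, acc, _ in history], marker="o")
    ax_acc.set(xlabel="Communication round", ylabel="Global accuracy (%)")
    ax_loss.plot(rounds, [loss for _, _, loss in history], marker="o")
    ax_loss.set(xlabel="Communication round", ylabel="Global loss")
    fig.tight_layout()
    return fig

# Two lines copied from the training output above
log_text = """
comm_round: 0 | global_acc: 69.734% | global_loss: 0.6606508493423462
comm_round: 1 | global_acc: 69.734% | global_loss: 0.658727765083313
"""
history = parse_round_log(log_text)
```

Calling `plot_history(history)` in a notebook cell would then draw the two curves. Collecting the metrics in a list inside the training loop would avoid the parsing step, but that depends on what `test_model` returns, which is not shown here.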