<a href="https://colab.research.google.com/github/ambikad04/FL-PrivacyPreserving-HealthcareAnalytics/blob/main/PrivacyPreserving_HealthcareAnalytics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Libraries Used

The following libraries are essential for implementing the basic neural network classification script:

---

### Libraries Overview

- **`torch`**: The core library for tensor computations, offering support for GPU acceleration and essential operations for deep learning workflows.

- **`torch.nn`**: Provides modules and layers to construct and define neural networks (used to build the classification model).

- **`torch.optim`**: Contains optimization algorithms such as SGD and Adam, crucial for model training and weight updates.

- **`torch.nn.functional`**: Supplies a functional API for operations like activation functions and loss computations.

- **`numpy`**: A library for efficient numerical computations and array manipulations.

- **`sklearn.datasets.make_classification`**: A utility to generate synthetic datasets for binary or multiclass classification tasks.

- **`sklearn.model_selection.train_test_split`**: Splits a dataset into training and testing subsets; passing a fixed `random_state` makes the split reproducible.

---

These libraries work together to enable the creation, training, and evaluation of a simple neural network for classification on a synthetic dataset.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

### Model Definition: Fully Connected Neural Network (FCModel)

The `FCModel` is a small fully connected neural network for binary classification on tabular data, where each sample is a fixed-length feature vector.

---

### **Structure**

- **Input Layer**:
  - Takes input with a size equal to the number of features (`input_dim`).

- **Hidden Layers**:
  - **First Layer**: 64 neurons with ReLU activation.
  - **Second Layer**: 32 neurons with ReLU activation.

- **Output Layer**:
  - 1 neuron with a sigmoid activation to produce a probability for binary classification.



In [2]:
# Define a Fully Connected Model for Tabular Data
class FCModel(nn.Module):
    def __init__(self, input_dim):
        super(FCModel, self).__init__()
        self.fc1 = nn.Linear(input_dim, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = torch.sigmoid(self.fc3(x))  # Output layer for binary classification
        return x
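
As a quick sanity check of the structure described above, the following sketch rebuilds the same three-layer layout and runs a forward pass on random input. The batch of 4 samples and `input_dim=10` are assumptions chosen for illustration; the parameter count follows from the layer sizes (10·64 + 64, 64·32 + 32, 32·1 + 1).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FCModel(nn.Module):
    # Same layout as the notebook's model: input_dim -> 64 -> 32 -> 1
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return torch.sigmoid(self.fc3(x))

model = FCModel(input_dim=10)
batch = torch.randn(4, 10)  # 4 made-up samples with 10 features each
probs = model(batch)        # one probability per sample

# Count trainable parameters: 704 + 2080 + 33
n_params = sum(p.numel() for p in model.parameters())
print(probs.shape)  # torch.Size([4, 1])
print(n_params)     # 2817
```

The output keeps a trailing dimension of size 1, which is why later cells squeeze it before comparing against the labels.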

### Creating Synthetic Healthcare Data

This part creates synthetic data for a simple binary classification task, standing in for a problem such as predicting a health condition.

---

### **Data Details**

- **Features**:
  - 100 samples, each with 10 features.
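  - 8 of the 10 features are informative (`n_informative=8`); the remaining 2 are redundant linear combinations of the informative ones (the `make_classification` default).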

- **Classes**:
  - Two groups (e.g., healthy or unhealthy).

- **Splitting**:
  - The data is divided into:
    - 80% for training the model.
    - 20% for testing how well the model works.

---

### **Why This Data?**
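The features are generated by `make_classification` and carry no clinical meaning, so this data is useful for practicing and testing the training pipeline of a healthcare-style prediction model rather than for drawing medical conclusions.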

In [3]:
# Create Synthetic Data (Healthcare)
X, y = make_classification(n_samples=100, n_features=10, n_informative=8, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
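
To confirm the sizes quoted above, a small sketch like the following prints the shapes of the two splits and the class counts in each. It repeats the same two calls so it can run on its own; the class counts are printed rather than assumed, since they depend on the random split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, n_features=10, n_informative=8,
                           n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # (80, 10) (20, 10)
print(np.bincount(y_train))         # samples per class in the training split
print(np.bincount(y_test))          # samples per class in the test split
```

The 80/20 sizes matter for the next step, because they determine how many equal-sized client shards can be cut from each split.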


### Simulating Client Data

This section divides the dataset into smaller parts to simulate data for multiple clients, useful for federated learning or distributed training.

---

### **Data Details**

- **Training Data**:
  - The training set (`X_train` and `y_train`) holds 80 samples and is split into 4 clients.
  - Each client gets 20 samples of data.

- **Testing Data**:
  - The testing set (`X_test` and `y_test`) holds 20 samples and is split into 4 parts.
  - Each client gets 5 samples of data for testing.

- **Data Format**:
  - Each client’s data is stored as a tuple:
    - Features as `torch.tensor` (float32).
    - Labels as `torch.tensor` (float32).

---

Splitting data like this helps simulate scenarios where different clients (e.g., hospitals) have their own local data for training and testing.

In [4]:
# Simulate Client Data (80 training samples in chunks of 20 -> 4 clients; 20 test samples in chunks of 5 -> 4 test shards)
clients_data = [(torch.tensor(X_train[i:i+20], dtype=torch.float32), torch.tensor(y_train[i:i+20], dtype=torch.float32)) for i in range(0, 80, 20)]
clients_data_test = [(torch.tensor(X_test[i:i+5], dtype=torch.float32), torch.tensor(y_test[i:i+5], dtype=torch.float32)) for i in range(0, 20, 5)]
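
The hard-coded ranges above tie the number of clients to the chunk size. As an illustration only, the sketch below uses a hypothetical helper, `split_into_clients`, built on `np.array_split`, so the number of clients becomes a parameter. The demo arrays are random stand-ins with the same shape as `X_train` and `y_train`; they are not the notebook's data.

```python
import numpy as np
import torch

def split_into_clients(X, y, n_clients):
    """Hypothetical helper: cut X and y into n_clients contiguous shards."""
    shards = zip(np.array_split(X, n_clients), np.array_split(y, n_clients))
    return [
        (torch.tensor(x_part, dtype=torch.float32),
         torch.tensor(y_part, dtype=torch.float32))
        for x_part, y_part in shards
    ]

# Stand-in arrays shaped like the training split (80 samples, 10 features)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(80, 10))
y_demo = rng.integers(0, 2, size=80)

four = split_into_clients(X_demo, y_demo, 4)
five = split_into_clients(X_demo, y_demo, 5)
print([len(labels) for _, labels in four])  # [20, 20, 20, 20]
print([len(labels) for _, labels in five])  # [16, 16, 16, 16, 16]
```

With 80 training samples, 4 clients receive 20 samples each, while 5 clients would receive 16 each.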

### Training Local Model on Each Client

This section defines a function to train a model on each client's local data for a given number of epochs.

---

### **Function Details**

- **Inputs**:
  - `model`: The neural network model to be trained.
  - `data`: The input features for training.
  - `targets`: The actual labels or targets for the data.
  - `epochs`: The number of times to iterate through the dataset (default is 5).

- **Training Process**:
  - **Loss Function**: Binary Cross-Entropy Loss (`BCELoss`) is used, suitable for binary classification.
  - **Optimizer**: Stochastic Gradient Descent (SGD) is used with a learning rate of 0.01 to adjust the model's parameters.
  - The model is set to training mode (`model.train()`).
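  - For each epoch, gradients are reset, the model makes predictions on the client's full dataset, the loss is calculated, gradients are computed by backpropagation, and the optimizer updates the parameters. Each epoch is therefore a single full-batch gradient step.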

- **Output**:
  - The function returns the trained model's `state_dict` (its parameter tensors keyed by layer name).

---

Training locally helps simulate how different clients can independently train models using their private data in federated learning.

In [5]:
# Train Local Model on Each Client
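def train_local_model(model, data, targets, epochs=5):
    criterion = nn.BCELoss()  # expects probabilities in [0, 1], i.e. sigmoid outputs
    optimizer = optim.SGD(model.parameters(), lr=0.01)
    model.train()
    for epoch in range(epochs):
        optimizer.zero_grad()  # clear gradients from the previous step
        outputs = model(data)  # full-batch forward pass, shape (n_samples, 1)
        loss = criterion(outputs.squeeze(1), targets)  # drop the trailing dim to match targets
        loss.backward()  # backpropagate to compute gradients
        optimizer.step()  # apply one SGD update
    return model.state_dict()  # trained parameters keyed by layer name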
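
To make the loss in the function above concrete, the following sketch computes binary cross-entropy by hand and compares it with `nn.BCELoss`. The probabilities and labels are made-up values for illustration, not outputs of the notebook's model.

```python
import torch
import torch.nn as nn

probs = torch.tensor([0.9, 0.2, 0.7, 0.4])    # hypothetical sigmoid outputs
targets = torch.tensor([1.0, 0.0, 1.0, 1.0])  # true labels as floats

# Mean binary cross-entropy written out:
# -(1/N) * sum( y * log(p) + (1 - y) * log(1 - p) )
manual = -(targets * torch.log(probs)
           + (1 - targets) * torch.log(1 - probs)).mean()
builtin = nn.BCELoss()(probs, targets)

print(manual.item(), builtin.item())  # the two values should agree
```

Because `BCELoss` takes probabilities rather than raw scores, it pairs with the sigmoid applied in the model's output layer.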

### Federated Averaging with Personalization

This part defines a function to update the global model by averaging the weights from local models and adding some personalization.

---

### **Function Details**

- **Inputs**:
  - `global_model`: The main model that will be updated.
  - `local_models_weights`: A list of weights from the models trained by different clients.
  - `personalization_factor`: A value (default 0.1) that controls how much each local model affects the global model.

- **Process**:
  - The function starts by getting the current weights of the global model.
  - **Averaging**: For each weight in the global model:
    - It calculates the average of the weights from all local models.
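    - Then, for each local model in turn, it blends that model's weights into the running result: `result = (1 - personalization_factor) * result + personalization_factor * local`. Because the blend is applied sequentially, clients later in the list end up with slightly more weight than earlier ones.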
  - The updated weights are then set back into the global model.

- **Output**:
  - The global model is updated with the combined and personalized weights from the local models.

---

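Personalization aims to let each client's model keep some of its own data's characteristics while still learning from the global model, which is helpful when clients have different kinds of data. Note that this function produces a single shared global model and does not keep separate per-client models, so the personalization factor here only changes how the local weights are mixed into that shared model.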

In [6]:
# Federated Averaging (with Personalization)
def federated_averaging_with_personalization(global_model, local_models_weights, personalization_factor=0.1):
    global_state_dict = global_model.state_dict()
    for key in global_state_dict:
        global_state_dict[key] = torch.mean(torch.stack([local_weights[key] for local_weights in local_models_weights]), dim=0)
        for local_weights in local_models_weights:
            global_state_dict[key] = global_state_dict[key] * (1 - personalization_factor) + local_weights[key] * personalization_factor
    global_model.load_state_dict(global_state_dict)
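
To see what the update above does to each client's contribution, the sketch below applies the same arithmetic to one-hot vectors, so the result reads off each client's effective coefficient. The helper name `blend` and the choice of 4 clients are assumptions for illustration; the closed form, `(1 - p)^n / n + p * (1 - p)^(n - i)` for client `i`, follows from unrolling the loop.

```python
import torch

def blend(local_values, personalization_factor=0.1):
    """Same arithmetic as the averaging cell, applied to one tensor per client."""
    result = torch.mean(torch.stack(local_values), dim=0)
    for value in local_values:
        result = (result * (1 - personalization_factor)
                  + value * personalization_factor)
    return result

n_clients, p = 4, 0.1
# Client i contributes a one-hot vector, so entry i of the output
# is the total coefficient placed on client i's weights.
one_hot = list(torch.eye(n_clients, dtype=torch.float64))
coefficients = blend(one_hot, p)

closed_form = torch.tensor(
    [(1 - p) ** n_clients / n_clients + p * (1 - p) ** (n_clients - i)
     for i in range(1, n_clients + 1)],
    dtype=torch.float64,
)
print(coefficients)
print(closed_form)
```

For 4 clients and a factor of 0.1 the coefficients come out at roughly 0.237, 0.245, 0.254, and 0.264. They sum to 1, so the result is still a weighted average of the local models, with the last client weighted most and a plain average recovered when the factor is 0.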

### Initializing the Global Model

This section initializes the global model, which will be used in the federated learning process.

---

### **Steps**

- **Input Dimension**:
  - The `input_dim` is set to the number of features in the training data (`X_train`).

- **Global Model**:
  - A new instance of the `FCModel` is created using the input dimension, which will define the structure of the model.

---

### **Purpose**
This global model will be updated by combining the knowledge from different local models trained by clients.

In [7]:
# Initialize the Global Model
input_dim = X_train.shape[1]  # Number of features in the dataset
global_model = FCModel(input_dim)
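
The global model's starting weights are drawn at random, so repeated runs can report different accuracies. If repeatable runs are wanted, one option is to seed PyTorch's random number generator before building the model. The sketch below uses a hypothetical helper, `make_model`, with an `nn.Sequential` stack of the same layer sizes as `FCModel`, and the seed value 0 is arbitrary.

```python
import torch
import torch.nn as nn

def make_model(seed):
    # Hypothetical helper: seed the RNG, then build layers sized like FCModel
    torch.manual_seed(seed)
    return nn.Sequential(
        nn.Linear(10, 64), nn.ReLU(),
        nn.Linear(64, 32), nn.ReLU(),
        nn.Linear(32, 1), nn.Sigmoid(),
    )

a, b = make_model(0), make_model(0)
same = all(torch.equal(p, q) for p, q in zip(a.parameters(), b.parameters()))
print(same)  # True
```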


### Federated Learning Process (10 Rounds)

This section describes the process of training a global model over 10 rounds using federated learning.

---

### **Steps**

1. **For Each Round (10 rounds)**:
   - Print the round number (e.g., "Round 1").
   - Create an empty list to store the local model weights from each client.

2. **Local Model Training (For Each Client)**:
   - Each client uses their own data to train a local model.
   - The local model is initialized with the global model's weights.
   - The local model is trained on the client's data using the `train_local_model` function.
   - The trained model's weights are added to the list of local weights.

3. **Federated Averaging with Personalization**:
   - The `federated_averaging_with_personalization` function updates the global model by combining the local model weights, while adding some personalization.

4. **Testing the Global Model (Optional)**:
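   - After updating the global model, it is tested on the test shards of all clients, and a single accuracy is computed over the pooled 20 test samples.
   - The model makes predictions (a probability above 0.5 counts as class 1), and the accuracy is the fraction of predictions that match the true labels. With 20 test samples, each sample accounts for 5 percentage points of accuracy.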

5. **Repeat for 10 Rounds**:
   - This process is repeated for 10 rounds, with the global model being updated and tested after each round.

---

### **Purpose**
This process simulates how federated learning works by training and updating the global model using data from multiple clients, while testing its performance after each round.

In [8]:
# Federated Learning Process (10 rounds)
for round_idx in range(10):  # named round_idx to avoid shadowing the built-in round()
    print(f"Round {round_idx + 1}")
    local_models_weights = []

    # Each client trains on its local data
    for client_data, client_targets in clients_data:
        local_model = FCModel(input_dim)
        local_model.load_state_dict(global_model.state_dict())  # Initialize with global model weights
        local_weights = train_local_model(local_model, client_data, client_targets)
        local_models_weights.append(local_weights)

    # Federated Averaging with Personalization
    federated_averaging_with_personalization(global_model, local_models_weights)

    # Testing the model on client data (optional)
    global_model.eval()
    with torch.no_grad():
        correct = 0
        total = 0
        for client_data, client_targets in clients_data_test:
            outputs = global_model(client_data)
            predicted = (outputs.squeeze() > 0.5).float()
            total += client_targets.size(0)
            correct += (predicted == client_targets).sum().item()
        accuracy = correct / total
        print(f"Test Accuracy: {accuracy * 100:.2f}%")

print("Training complete.")

Round 1
Test Accuracy: 75.00%
Round 2
Test Accuracy: 70.00%
Round 3
Test Accuracy: 80.00%
Round 4
Test Accuracy: 85.00%
Round 5
Test Accuracy: 75.00%
Round 6
Test Accuracy: 75.00%
Round 7
Test Accuracy: 75.00%
Round 8
Test Accuracy: 80.00%
Round 9
Test Accuracy: 80.00%
Round 10
Test Accuracy: 80.00%
Training complete.
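
The loop above reports one pooled accuracy per round. To also see how the model does on each client's shard, a helper along the following lines could be used. The name `evaluate`, the `AlwaysPositive` stand-in model, and the two small shards are all hypothetical and exist only to make the sketch runnable on its own.

```python
import torch
import torch.nn as nn

def evaluate(model, shards, threshold=0.5):
    """Hypothetical helper: return (pooled accuracy, list of per-shard accuracies)."""
    model.eval()
    per_shard = []
    correct = total = 0
    with torch.no_grad():
        for features, labels in shards:
            probs = model(features).squeeze(1)       # shape (n_samples,)
            predicted = (probs > threshold).float()  # 1.0 if above threshold
            hits = (predicted == labels).sum().item()
            per_shard.append(hits / labels.size(0))
            correct += hits
            total += labels.size(0)
    return correct / total, per_shard

class AlwaysPositive(nn.Module):
    # Stand-in model that predicts probability 1.0 for every sample
    def forward(self, x):
        return torch.ones(x.size(0), 1)

shards = [
    (torch.zeros(5, 10), torch.tensor([1., 1., 1., 0., 0.])),
    (torch.zeros(5, 10), torch.tensor([1., 0., 0., 0., 0.])),
]
overall, per_shard = evaluate(AlwaysPositive(), shards)
print(overall, per_shard)  # 0.4 [0.6, 0.2]
```

With only 5 samples per shard, per-client accuracy moves in steps of 20 percentage points, so differences between shards should be read with caution.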
