**How It Works**

**Client-Specific Data**

Each data center (simulated as a client) trains a local model on its own data, so each model reflects the characteristics and patterns of that data center. Because the raw data never leaves the local environment, this approach helps prevent data leakage and preserves privacy.

**Federated Learning Process**

1. Training: Each client trains a Decision Tree Classifier locally on its own data.

2. Aggregation: After training, the server averages the feature importances reported by each client, a simple form of parameter aggregation in federated learning.

3. Evaluation: Each client evaluates its model locally on a test set to measure its accuracy.
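The aggregation step can be illustrated on toy data. The following is a minimal sketch, assuming three hypothetical clients that each report a four-element feature-importance vector; the helper name `average_importances` and the toy values are illustrative and not part of the notebook.

```python
import numpy as np

def average_importances(client_importances):
    """Element-wise mean of per-client feature-importance vectors."""
    stacked = np.vstack(client_importances)  # shape: (n_clients, n_features)
    return stacked.mean(axis=0)

# Toy vectors for three hypothetical clients, four features each.
# Each vector sums to 1, as scikit-learn's feature_importances_ does.
toy = [
    np.array([0.5, 0.5, 0.0, 0.0]),
    np.array([0.25, 0.25, 0.5, 0.0]),
    np.array([0.0, 0.5, 0.25, 0.25]),
]
global_importances = average_importances(toy)
print(global_importances)
```

Because every input vector sums to 1, the average does too. Note that the averaged vector summarizes which features the clients relied on; it does not by itself define a global decision tree.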

**1. Set up the environment**

In [17]:
# The Flower federated learning framework is published on PyPI as "flwr";
# the "flower" package is an unrelated Celery monitoring tool.
!pip install flwr


**Step 1: Load and preprocess data (Use only a very small sample of the dataset)**

In [None]:
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # Simpler classifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
import zipfile
import gzip
import pandas as pd

zip_file_path = "kddcup.data_10_percent.gz.zip"
gz_file_path = "kddcup.data_10_percent.gz"

# Unzipping the file
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall()

# Read the .gz file
with gzip.open(gz_file_path, 'rt') as f:
    df = pd.read_csv(f, header=None)

# Use a very small sample (1% of the data) for faster processing
df = df.sample(frac=0.01, random_state=42)

# Preprocess the dataset
df[41] = LabelEncoder().fit_transform(df[41])  # Convert labels to integers
categorical_columns = [1, 2, 3]  # Categorical columns: protocol_type, service, flag

# Encode each categorical column with its own encoder
for col in categorical_columns:
    df[col] = LabelEncoder().fit_transform(df[col])

X = df.iloc[:, :-1]  # Features (all columns except the last)
y = df.iloc[:, -1]   # Target (last column)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

**Step 2: Define the client (a plain-Python stand-in for a Flower client)**

In [None]:
class FedClient:
    def __init__(self, X_train, y_train, X_test, y_test):
        self.X_train = X_train
        self.y_train = y_train
        self.X_test = X_test
        self.y_test = y_test
        self.model = DecisionTreeClassifier()  # Simpler model (Decision Tree)
        self.model.fit(self.X_train, self.y_train)  # Fit the model to access feature importance

    def get_parameters(self):
        # For simplicity, return the feature importances as the "model parameters"
        return self.model.feature_importances_

    def set_parameters(self, parameters):
        # Placeholder: averaged feature importances cannot be loaded back into a
        # fitted tree, so global parameters are not applied to the local model
        pass

    def fit(self, parameters):
        # Retrain the local model; `parameters` is accepted to mirror a
        # federated client interface but is not used
        self.model.fit(self.X_train, self.y_train)

    def evaluate(self):
        # Evaluate the model accuracy
        y_pred = self.model.predict(self.X_test)
        accuracy = accuracy_score(self.y_test, y_pred)
        return accuracy

**Step 3: Simulate federated learning with 5 clients**

In [None]:
# Note: every client receives the same train/test split here, so the clients
# do not yet hold client-specific data
clients = [FedClient(X_train, y_train, X_test, y_test) for _ in range(5)]
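The clients above all share one training set, whereas the Client-Specific Data section describes each data center training on its own data. One way to approximate that is to split the training rows into disjoint shards, one per client. This is a minimal sketch, not the notebook's method: `make_shards` is a hypothetical helper, and the demo arrays stand in for `X_train` and `y_train` so the snippet runs on its own.

```python
import numpy as np

def make_shards(X, y, n_clients, seed=42):
    """Shuffle row indices and split them into n_clients disjoint shards."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    X_arr, y_arr = np.asarray(X), np.asarray(y)
    # np.array_split tolerates row counts that are not divisible by n_clients
    return [(X_arr[idx], y_arr[idx]) for idx in np.array_split(indices, n_clients)]

# Stand-in data (hypothetical); in the notebook these would be X_train / y_train.
X_demo = np.arange(40, dtype=float).reshape(20, 2)
y_demo = np.arange(20) % 2

shards = make_shards(X_demo, y_demo, n_clients=5)
for i, (X_part, y_part) in enumerate(shards):
    print(f"client {i}: {X_part.shape[0]} rows")
```

In the notebook, the shards could then be passed to the clients with something like `clients = [FedClient(X_part, y_part, X_test, y_test) for X_part, y_part in make_shards(X_train, y_train, 5)]`. With a sample this small, some shards may not contain every attack class, so per-client accuracy would probably differ from the figures reported in Step 4.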

**Step 4: Run the local simulation without Flower communication**

In [34]:
for round_num in range(1):  # Run only 1 round for faster testing
    print(f"Round {round_num + 1}")

    # Fit all clients locally (this simulates client-side model training)
    for client in clients:
        client.fit(client.get_parameters())

    # Evaluate the clients
    accuracies = []
    for client in clients:
        accuracy = client.evaluate()
        accuracies.append(accuracy)
        print(f"Client accuracy: {accuracy}")

    # Simulate aggregation: average the clients' feature importances element-wise.
    # In a real federated learning system, you would aggregate model weights or
    # gradients; the averaged values are not sent back to the clients here
    global_params = np.mean([client.get_parameters() for client in clients], axis=0)
    print(f"Global model parameters (aggregated): {global_params}")

Round 1
Client accuracy: 0.9898785425101214
Client accuracy: 0.9932523616734144
Client accuracy: 0.9912280701754386
Client accuracy: 0.9939271255060729
Client accuracy: 0.9885290148448043
Global model parameters (aggregated): [9.71607813e-04 3.89251010e-04 1.67548185e-03 4.77030020e-04
 1.94350342e-02 1.58002218e-03 0.00000000e+00 2.89429903e-03
 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
 0.00000000e+00 0.00000000e+00 7.07718267e-04 6.10944730e-01
 4.49967151e-04 4.49967151e-04 3.62088416e-04 0.00000000e+00
 3.32029800e-01 1.12102824e-02 0.00000000e+00 1.73912163e-03
 5.43132624e-04 2.46878465e-04 1.57981918e-03 1.72485754e-03
 7.22208660e-03 4.49967151e-04 8.99934302e-04 1.22092623e-03
 7.95996021e-04]
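The aggregated vector above is dominated by two entries, roughly 0.61 and 0.33 (zero-based positions 23 and 28), while most of the others are zero. A small helper can rank the entries so the dominant features are easier to spot. This is a sketch under assumptions: `top_features` is a hypothetical name, and the toy vector below only mimics the shape of the real output.

```python
import numpy as np

def top_features(importances, k=3):
    """Return (index, importance) pairs for the k largest importances, highest first."""
    importances = np.asarray(importances)
    order = np.argsort(importances)[::-1][:k]  # indices sorted by descending importance
    return [(int(i), float(importances[i])) for i in order]

# Toy vector (hypothetical values) with two dominant entries.
toy_importances = np.array([0.02, 0.61, 0.00, 0.33, 0.04])
for idx, value in top_features(toy_importances, k=2):
    print(f"feature {idx}: {value:.2f}")
```

Applied to the notebook, `top_features(global_params)` would return column indices into the feature matrix, which can then be matched against the dataset's column names.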
