
flower-via-docker-compose example #2626

Merged
merged 52 commits into from Jan 25, 2024
Changes from 8 commits
c55cdf4
flower-via-docker-compose example
NikosVlachakis Nov 22, 2023
bda2dc1
flower-via-docker-compose minor change in the README.md
NikosVlachakis Nov 22, 2023
d4990bd
flower-via-docker-compose minor change in the README.md v2
NikosVlachakis Nov 22, 2023
08608f7
flower-via-docker-compose minor change in the README.md v3
NikosVlachakis Nov 22, 2023
fcf2aef
flower-via-docker-compose minor change in the README.md v4
NikosVlachakis Nov 22, 2023
d8cd075
adding docker ps view in README.md
NikosVlachakis Nov 23, 2023
cb706bb
grafana configuration for automatic dashboard discovery through grafa…
NikosVlachakis Nov 23, 2023
99ac45d
removing initial docker-compose file
NikosVlachakis Nov 23, 2023
8663dd0
adding UID in grafana,prometheus configs and updating README.md file
NikosVlachakis Nov 26, 2023
a02ed9b
adding flower + docker images
NikosVlachakis Nov 26, 2023
e4c4546
changing grafana's UI default screen and adding system/application me…
NikosVlachakis Nov 28, 2023
fd15bf8
mega dashboard
jafermarq Dec 18, 2023
9a6323a
not stacking
jafermarq Dec 18, 2023
e906b58
fixed time
jafermarq Dec 18, 2023
6a8f88e
updating graphs grafana
NikosVlachakis Dec 18, 2023
1a1b850
Merge branch 'main' of https://github.com/NikosVlachakis/flower
NikosVlachakis Dec 18, 2023
081e833
integrating flwr_datasets in data pipeline and updating README.md file
NikosVlachakis Dec 21, 2023
1ac0b87
simplifying strategy; load data once; bumped flwr 1.6; other minor ch…
jafermarq Dec 21, 2023
f5620b5
updating generate_docker_compose.py by passing the data_percentage as…
NikosVlachakis Dec 22, 2023
e03d5aa
updating README.md + simplifying logic in load_data.py
NikosVlachakis Dec 26, 2023
af60339
argparse to compose generator; minor tweaks readme
jafermarq Dec 27, 2023
711ea0b
adding random argument in generate_docker_compose file
NikosVlachakis Dec 27, 2023
5e4aaf8
small changes in readme.md file
NikosVlachakis Dec 27, 2023
c2ec018
remove unnecessary variables
NikosVlachakis Dec 27, 2023
ee1a076
bump python3.10; --random flag
jafermarq Dec 28, 2023
954df9e
adding % logic in the generate_docker_compose
NikosVlachakis Dec 28, 2023
f99b963
Merge branch 'main' of https://github.com/NikosVlachakis/flower
NikosVlachakis Dec 28, 2023
115f2ba
add % logic in generate_docker_compose.py
NikosVlachakis Dec 28, 2023
f02481d
Merge branch 'main' into main
jafermarq Dec 29, 2023
b9a0a6e
formatting
jafermarq Dec 29, 2023
87f1036
added minimal guide to run example
jafermarq Dec 29, 2023
d8c7f1d
minor tweaks
jafermarq Dec 29, 2023
d72f899
removing space at the end of the main folder's name
NikosVlachakis Dec 29, 2023
51ffed8
removing old directory
jafermarq Dec 29, 2023
dbd7b88
README got lost
jafermarq Dec 29, 2023
382dc5a
bringing back files from ios example
jafermarq Dec 29, 2023
829e5a9
updating readme.md file
NikosVlachakis Dec 30, 2023
7d1e558
fixing readme.md code
NikosVlachakis Dec 30, 2023
dd62259
removing html tags from readme.md code
NikosVlachakis Dec 30, 2023
4fdc64d
Format README.md with mdformat
NikosVlachakis Jan 3, 2024
617365f
format
jafermarq Jan 4, 2024
8910506
Merge branch 'main' into main
jafermarq Jan 4, 2024
9af1f43
Merge branch 'main' into main
jafermarq Jan 18, 2024
632154a
small changes in docker
NikosVlachakis Jan 19, 2024
2b4266a
Merge branch 'main' of https://github.com/NikosVlachakis/flower
NikosVlachakis Jan 19, 2024
9e57400
Merge branch 'main' into main
jafermarq Jan 23, 2024
c24e73f
added reference to top-level reamde; typo fix
jafermarq Jan 23, 2024
8a7b6ac
Remove public folder
NikosVlachakis Jan 25, 2024
0cae140
Merge branch 'main' of https://github.com/NikosVlachakis/flower
NikosVlachakis Jan 25, 2024
4122b39
adding back files from other examples
jafermarq Jan 25, 2024
7dee8ea
format
jafermarq Jan 25, 2024
4a496a2
Merge branch 'main' into main
jafermarq Jan 25, 2024
20 changes: 20 additions & 0 deletions examples/flower-via-docker-compose /.gitignore
@@ -0,0 +1,20 @@
# ignore __pycache__ directories
__pycache__/

# ignore .pyc files
*.pyc

# ignore .vscode directory
.vscode/

# ignore mlflow and mlruns directories
mlflow/
mlruns/
dataset/

# ignore .npz files
*.npz

# ignore .csv files
*.csv

19 changes: 19 additions & 0 deletions examples/flower-via-docker-compose /Dockerfile
@@ -0,0 +1,19 @@
# Use an official Python runtime as a parent image
FROM python:3.8-slim-buster

# Set the working directory in the container to /app
WORKDIR /app

# Copy the requirements file into the container
COPY ./requirements.txt /app/requirements.txt

# Install gcc and other dependencies
RUN apt-get update && apt-get install -y \
gcc \
python3-dev && \
rm -rf /var/lib/apt/lists/*

# Install any needed packages specified in requirements.txt
RUN pip install -r requirements.txt


94 changes: 94 additions & 0 deletions examples/flower-via-docker-compose /README.md
@@ -0,0 +1,94 @@
# Leveraging Flower and Docker for Device Heterogeneity Management in Federated Learning


## Introduction
In this example, we tackle device heterogeneity in federated learning, which arises from differences in memory and CPU capabilities across devices. This diversity affects training efficiency and inclusivity. We simulate the heterogeneity by setting CPU and memory limits for each container in a Docker setup, using a custom Docker Compose generator script. This approach creates a varied training environment and enables us to develop strategies to manage these disparities effectively.


## Handling Device Heterogeneity
1. **System Metrics Access**:
- Effective management of device heterogeneity begins with monitoring the system metrics of each container. We integrate the following services to achieve this:
- **cAdvisor**: Collects comprehensive resource-usage metrics from each Docker container.
- **Prometheus**: Configured via `prometheus.yml`, it scrapes data from cAdvisor at scheduled intervals and serves as a robust time-series database. Users can access the Prometheus UI at `http://localhost:9090` to create and run PromQL queries for detailed insight into container performance.
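As a rough illustration of the kind of PromQL query one might run against this setup, the snippet below builds an instant-query URL for Prometheus's HTTP API (a sketch only: `container_memory_usage_bytes` is a standard cAdvisor metric, and `client1` is one of this example's client container names; the code just constructs the URL, so it does not require the stack to be running):

```python
from urllib.parse import urlencode

# container_memory_usage_bytes is a standard cAdvisor metric;
# "client1" is one of the client container names in this example.
query = 'container_memory_usage_bytes{name="client1"}'

# Prometheus serves instant queries at /api/v1/query on port 9090.
url = "http://localhost:9090/api/v1/query?" + urlencode({"query": query})
print(url)
```

The same expression can be pasted directly into the query box of the Prometheus UI at `http://localhost:9090`.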

2. **Mitigating Heterogeneity**:
- In this basic use case, we address device heterogeneity by establishing rules tailored to each container's system capabilities. This involves modifying training parameters, such as batch sizes and learning rates, based on each device's memory capacity and CPU availability. These settings are specified in the `client_configs` array in the `create_docker_compose` script. For example:

```python
client_configs = [
{'mem_limit': '3g', 'batch_size': 32, "cpus": 3.5, 'learning_rate': 0.001},
{'mem_limit': '4g', 'batch_size': 64, "cpus": 3, 'learning_rate': 0.02},
{'mem_limit': '5g', 'batch_size': 128, "cpus": 2.5, 'learning_rate': 0.09},
{'mem_limit': '6g', 'batch_size': 256, "cpus": 1, 'learning_rate': 0.15}
]
```
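To illustrate how such per-client limits could map onto Compose service definitions, the sketch below builds one service entry per config. This is an illustrative sketch, not the actual `create_docker_compose` script: the service names and build context are assumptions, though `--batch_size` and `--learning_rate` are real flags of this example's `client.py`, and `mem_limit`/`cpus` are standard Compose keys:

```python
client_configs = [
    {"mem_limit": "3g", "batch_size": 32, "cpus": 3.5, "learning_rate": 0.001},
    {"mem_limit": "4g", "batch_size": 64, "cpus": 3, "learning_rate": 0.02},
]


def make_service(index: int, cfg: dict) -> dict:
    """Build one Compose service entry enforcing the given resource limits."""
    return {
        f"client{index}": {
            "build": ".",
            "command": [
                "python",
                "client.py",
                f"--batch_size={cfg['batch_size']}",
                f"--learning_rate={cfg['learning_rate']}",
            ],
            # Compose-level resource caps simulate a weaker device.
            "mem_limit": cfg["mem_limit"],
            "cpus": cfg["cpus"],
        }
    }


services = {}
for i, cfg in enumerate(client_configs, start=1):
    services.update(make_service(i, cfg))
```

Serializing `{"services": services}` to YAML would yield the corresponding fragment of a `docker-compose.yml` file.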


## Installation and Setup
To get the project up and running, follow these steps:

### Prerequisites
Before starting, ensure the following prerequisites are met:

- **Docker Installation**: Docker must be installed and the Docker daemon running on your server. If you don't already have Docker installed, you can get [installation instructions for your specific Linux distribution from Docker](https://docs.docker.com/engine/install/).


### Step 1: Configure Docker Compose
1. **Generate Docker Compose File**:
- Execute the following command to run the `helpers/generate_docker_compose.py` script. This script creates the docker-compose configuration needed to set up the environment.
```bash
python helpers/generate_docker_compose.py
```
- Within the script, specify the number of clients (`total_clients`), the number of training rounds (`number_of_rounds`), and resource limitations for each client in the `client_configs` array.

### Step 2: Build and Launch Containers
1. **Execute Initialization Script**:
- Run the `docker_init.sh` script to build the Docker images and start the Docker Compose process. Use the following command:
```bash
./docker_init.sh
```

2. **Services Startup**:
- The script will launch several services as defined in your `docker-compose.yml` file:
- **Monitoring Services**: Prometheus for metrics collection, cAdvisor for container monitoring, and Grafana for data visualization.
- **Flower Federated Learning Environment**: The Flower server and client containers are initialized and start running.
- After launching the services, verify that all Docker containers are running correctly by executing the `docker ps` command. Here's an example output:
```bash
➜ ~ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
72063c8968d3 flower-via-docker-compose-client3 "python client.py --…" 12 minutes ago Up 13 seconds 0.0.0.0:6003->6003/tcp client3
77ca59fc42e6 flower-via-docker-compose-client2 "python client.py --…" 12 minutes ago Up 13 seconds 0.0.0.0:6002->6002/tcp client2
2dc33f0b4ef6 flower-via-docker-compose-client1 "python client.py --…" 12 minutes ago Up 13 seconds 0.0.0.0:6001->6001/tcp client1
8d87f3655476 flower-via-docker-compose-server "python server.py --…" 12 minutes ago Up 13 seconds 0.0.0.0:6000->6000/tcp, 0.0.0.0:8265->8265/tcp server
dbcd8cf1faf1 grafana/grafana:latest "/run.sh --config=/e…" 12 minutes ago Up 5 minutes 0.0.0.0:3000->3000/tcp grafana
80c4a599b2a3 prom/prometheus:latest "/bin/prometheus --c…" 12 minutes ago Up 5 minutes 0.0.0.0:9090->9090/tcp prometheus
169880ab80bd gcr.io/cadvisor/cadvisor:v0.47.0 "/usr/bin/cadvisor -…" 12 minutes ago Up 5 minutes (healthy) 0.0.0.0:8080->8080/tcp cadvisor
```

3. **Automated Grafana Configuration**:
- Grafana is configured to automatically load pre-defined data sources and dashboards for immediate monitoring, via two provisioning files: `prometheus-datasource.yml` for data sources and `default_dashboard.json` for dashboards. Both live in the `./config/provisioning/` directory of the project and are mounted into the Grafana container through Docker Compose volume mappings, so Grafana starts pre-configured for monitoring without any manual setup.
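For reference, a minimal data-source provisioning file of the kind described above might look like the following (a sketch based on Grafana's standard provisioning format; the exact contents of this example's `prometheus-datasource.yml` may differ, and the `url` assumes the Prometheus container is reachable under the Compose service name `prometheus`):

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```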

4. **Begin Training Process**:
- The federated learning training automatically begins once all client containers are successfully connected to the Flower server. This synchronizes the learning process across all participating clients.


By following these steps, you will have a fully functional federated learning environment with simulated device heterogeneity and monitoring capabilities.



## Monitoring with Grafana
1. **Access and Customize Grafana Dashboard**:
- Open `http://localhost:3000` to access Grafana. Thanks to the automated setup, Grafana will already have Prometheus as a data source and a pre-configured monitoring dashboard, similar to the example provided below.
- You can further customize or create new dashboards as per your requirements.

2. **Grafana Dashboard Example**:
Below is an example of a Grafana dashboard showing a bar chart of memory usage for a specific client container:


<img src="public/grafana-memory-usage.png" alt="Grafana Memory Usage Histogram" width="600"/>


This chart offers a visual representation of the container's memory usage over time, highlighting the contrast in resource utilization between training and non-training periods: memory consumption is noticeably higher during active training phases than when the container is idle.

## Conclusion
This project serves as a foundational example of managing device heterogeneity within the federated learning context, employing the Flower framework alongside Docker, Prometheus, and Grafana. It's designed to be a starting point for users to explore and further adapt to the complexities of device heterogeneity in federated learning environments.
90 changes: 90 additions & 0 deletions examples/flower-via-docker-compose /client.py
@@ -0,0 +1,90 @@
import argparse
import logging
import os

# Make TensorFlow log less verbose (must be set before tensorflow is imported)
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

import flwr as fl
import tensorflow as tf

from helpers.load_data import load_data
from model.model import Model

logging.basicConfig(level=logging.INFO)  # Configure logging
logger = logging.getLogger(__name__)  # Create a logger for the module

# Parse command line arguments
parser = argparse.ArgumentParser(description='Flower client')

parser.add_argument('--server_address', type=str, default="server:8080")
parser.add_argument('--batch_size', type=int, default=32)
parser.add_argument('--learning_rate', type=float, default=0.1)

args = parser.parse_args()

# Create an instance of the model and pass the learning rate as an argument
model = Model(learning_rate=args.learning_rate)

# Compile the model
model.compile()

class Client(fl.client.NumPyClient):
def __init__(self, args):
self.args = args

def get_parameters(self, config):
# Return the parameters of the model
return model.get_model().get_weights()


def fit(self, parameters, config):

# Set the weights of the model
model.get_model().set_weights(parameters)

# Load the training dataset and get the number of examples
train_dataset, _, num_examples_train, _ = load_data(batch_size=self.args.batch_size)

# Train the model
history = model.get_model().fit(train_dataset)

# Calculate evaluation metric
results = {
"accuracy": float(history.history["accuracy"][-1]),
}

# Get the parameters after training
parameters_prime = model.get_model().get_weights()

# Directly return the parameters and the number of examples trained on
return parameters_prime, num_examples_train, results



def evaluate(self, parameters, config):

# Set the weights of the model
model.get_model().set_weights(parameters)

# Use the test dataset for evaluation
_, test_dataset, _, num_examples_test = load_data(batch_size=self.args.batch_size)

# Evaluate the model and get the loss and accuracy
loss, accuracy = model.get_model().evaluate(test_dataset)

# Return the loss, the number of examples evaluated on and the accuracy
return float(loss), num_examples_test, {"accuracy": float(accuracy)}


# Function to Start the Client
def start_fl_client():
try:
fl.client.start_numpy_client(server_address=args.server_address, client=Client(args))
except Exception as e:
logger.error("Error starting FL client: %s", e)
return {"status": "error", "message": str(e)}


if __name__ == "__main__":
# Call the function to start the client
start_fl_client()
12 changes: 12 additions & 0 deletions examples/flower-via-docker-compose /config/grafana.ini
@@ -0,0 +1,12 @@
[security]
allow_embedding = true
admin_user = admin
admin_password = admin

[dashboards]
default_home_dashboard_path = /etc/grafana/provisioning/dashboards/default_dashboard.json

[auth.anonymous]
enabled = true
org_name = Main Org.
org_role = Admin
14 changes: 14 additions & 0 deletions examples/flower-via-docker-compose /config/prometheus.yml
@@ -0,0 +1,14 @@

global:
scrape_interval: 1s
evaluation_interval: 1s

rule_files:
scrape_configs:
- job_name: 'cadvisor'
scrape_interval: 1s
metrics_path: '/metrics'
static_configs:
- targets: ['host.docker.internal:8080']
labels:
group: 'cadvisor'
@@ -0,0 +1,139 @@
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": 2,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "db69454e-e558-479e-b4fc-80db52bf91da"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"fillOpacity": 80,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineWidth": 1,
"scaleDistribution": {
"type": "linear"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"barRadius": 0,
"barWidth": 0.97,
"fullHighlight": false,
"groupWidth": 0.7,
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"orientation": "auto",
"showValue": "auto",
"stacking": "none",
"tooltip": {
"mode": "single",
"sort": "none"
},
"xTickLabelRotation": 0,
"xTickLabelSpacing": 0
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "db69454e-e558-479e-b4fc-80db52bf91da"
},
"disableTextWrap": false,
"editorMode": "builder",
"expr": "container_memory_usage_bytes{name=\"client1\"}",
"fullMetaSearch": false,
"includeNullMetadata": true,
"instant": false,
"legendFormat": "__auto",
"range": true,
"refId": "A",
"useBackend": false
}
],
"title": "Panel Title",
"type": "barchart"
}
],
"refresh": "",
"schemaVersion": 38,
"style": "dark",
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "barchart_memory",
"uid": "cd0d5026-20aa-4614-9dfe-0c14f1d6522f",
"version": 1,
"weekStart": ""
}