# Converting Logistic Regression to Federated Learning


Logistic regression is a fundamental classification algorithm that models the probability of a binary outcome. Despite its name, it's used for classification rather than regression. The model uses the logistic (sigmoid) function to transform a linear combination of features into a probability between 0 and 1.

The Newton-Raphson method is a powerful second-order optimization technique that uses both first-order (gradient) and second-order (Hessian) information to find the optimal model parameters. Unlike first-order methods like gradient descent, Newton's method incorporates curvature information through the Hessian matrix, often leading to faster convergence, especially near the optimum.

In this section, we will convert logistic regression with second-order Newton-Raphson optimization to federated learning.


## Federated Logistic Regression with Second-Order Newton-Raphson optimization
This example shows how to implement federated binary classification via logistic regression with second-order Newton-Raphson optimization.

The [UCI Heart Disease dataset](https://archive.ics.uci.edu/dataset/45/heart+disease) is
used in this example. Scripts are provided to download and process the
dataset as described
[here](https://github.com/owkin/FLamby/tree/main/flamby/datasets/fed_heart_disease).

This dataset contains samples from 4 sites, split into training and
testing sets as described below:

|site         | sample split                          |
|-------------|---------------------------------------|
|Cleveland    | train: 199 samples, test: 104 samples |
|Hungary      | train: 172 samples, test: 89 samples  |
|Switzerland  | train: 30 samples, test: 16 samples   |
|Long Beach V | train: 85 samples, test: 45 samples   |

The number of features in each sample is 13.

## Introduction

The [Newton-Raphson
optimization](https://en.wikipedia.org/wiki/Newton%27s_method) problem
can be described as follows.

In a binary classification task with logistic regression, the
probability that a data sample $x$ is classified as positive is formulated
as:
$$p(x) = \sigma(\beta \cdot x + \beta_{0})$$
where $\sigma(.)$ denotes the sigmoid function. We can incorporate
$\beta_{0}$ and $\beta$ into a single parameter vector $\theta =
( \beta_{0},  \beta)$. Let $d$ be the number
of features for each data sample $x$ and let $N$ be the number of data
samples. We then have the matrix version of the above probability
equation:
$$p(X) = \sigma( X \theta )$$
Here $X$ is the matrix of all samples, with shape $N \times (d+1)$,
with its first column filled with the value 1 to account for the
intercept $\theta_{0}$.
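As a minimal NumPy sketch of the matrix form above: the helper names `sigmoid` and `add_intercept` and the toy feature values are assumptions made for illustration and are not taken from the example code.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def add_intercept(features):
    """Prepend a column of ones so that theta[0] acts as the intercept."""
    ones = np.ones((features.shape[0], 1))
    return np.hstack([ones, features])


# Toy data: N = 3 samples with d = 2 features each (values are made up).
features = np.array([[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0]])
X = add_intercept(features)        # shape (N, d + 1) = (3, 3)
theta = np.zeros((X.shape[1], 1))  # all-zero parameter vector

p = sigmoid(X @ theta)             # shape (3, 1); every entry is 0.5 when theta is zero
```

With an all-zero $\theta$ every sample gets probability 0.5, which is why a zero vector is a neutral starting point for the optimization.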

The goal is to compute the parameter vector $\theta$ that maximizes the
likelihood function below:
$$L_{\theta} = \prod_{i=1}^{N} p(x_i)^{y_i} (1 - p(x_i))^{1-y_i}$$
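
Because the logarithm is monotonic, maximizing the likelihood is equivalent to maximizing its logarithm, which turns the product over samples into a sum. This standard identity is stated here for reference, since the gradient and Hessian are usually derived from it:
$$\log L_{\theta} = \sum_{i=1}^{N} \left[ y_i \log p(x_i) + (1 - y_i) \log (1 - p(x_i)) \right]$$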

The Newton-Raphson method optimizes the likelihood function via
quadratic approximation. In practice it is applied to the log-likelihood
$\log L_{\theta}$, which has the same maximizer. Omitting the derivation,
the theoretical update formula for parameter vector $\theta$ is:
$$\theta^{n+1} = \theta^{n} - H_{\theta^{n}}^{-1} \nabla L_{\theta^{n}}$$
where
$$\nabla L_{\theta^{n}} = X^{T}(y - p(X))$$
is the gradient of the log-likelihood at $\theta^{n}$, with $y$ being the
vector of ground-truth labels for the sample data matrix $X$, and
$$H_{\theta^{n}} = -X^{T} D X$$
is the Hessian of the log-likelihood, with $D$ a diagonal matrix
whose entry at $(i,i)$ is $D(i,i) = p(x_i) (1 - p(x_i))$.
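
The update formula can be sketched in a few lines of NumPy. This is an illustrative sketch on synthetic data, not the training script shipped with the example; the function name `newton_raphson_step`, the random data, and the iteration count are all assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def newton_raphson_step(X, y, theta):
    """One update theta - H^{-1} grad, using the gradient and Hessian defined above."""
    p = sigmoid(X @ theta)
    gradient = X.T @ (y - p)        # X^T (y - p(X))
    D = np.diagflat(p * (1.0 - p))  # diagonal matrix with entries p_i (1 - p_i)
    hessian = -X.T @ D @ X          # -X^T D X
    return theta - np.linalg.solve(hessian, gradient)


# Synthetic, non-separable data (illustration only).
rng = np.random.default_rng(0)
N, d = 200, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])
true_theta = np.array([[0.5], [1.0], [-1.0], [0.25]])
y = (rng.uniform(size=(N, 1)) < sigmoid(X @ true_theta)).astype(float)

theta = np.zeros((d + 1, 1))
for _ in range(15):
    theta = newton_raphson_step(X, y, theta)

# At the maximum of the log-likelihood the gradient vanishes.
gradient_norm = np.linalg.norm(X.T @ (y - sigmoid(X @ theta)))
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse Hessian explicitly, which is the usual choice for numerical stability.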

In federated Newton-Raphson optimization, each client will compute its
own gradient $\nabla L_{\theta^{n}}$ and Hessian $H_{\theta^{n}}$
based on local training samples. A server will aggregate the gradients
and Hessians computed from all clients, and perform the update of
parameter $\theta$ based on the theoretical update formula described
above.
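
The reason this works is that both the gradient and the Hessian are sums over samples, so per-client contributions add up to the pooled quantities. The sketch below illustrates this with a plain sum on the server; the function names, the synthetic data, and the use of an unweighted sum are assumptions for illustration (the workflow shown later aggregates with a sample-size-weighted average instead).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def local_statistics(X, y, theta):
    """Client side: gradient and Hessian computed from local samples only."""
    p = sigmoid(X @ theta)
    gradient = X.T @ (y - p)
    hessian = -X.T @ ((p * (1.0 - p)) * X)  # same as -X^T D X without building D
    return gradient, hessian


def server_update(theta, client_stats):
    """Server side: sum the client contributions, then apply the update formula."""
    gradient = sum(g for g, _ in client_stats)
    hessian = sum(h for _, h in client_stats)
    return theta - np.linalg.solve(hessian, gradient)


rng = np.random.default_rng(1)
X = np.hstack([np.ones((90, 1)), rng.normal(size=(90, 2))])
y = (rng.uniform(size=(90, 1)) < 0.5).astype(float)
theta = np.zeros((3, 1))

# Split the rows across three hypothetical clients of unequal size.
splits = [slice(0, 20), slice(20, 50), slice(50, 90)]
client_stats = [local_statistics(X[s], y[s], theta) for s in splits]

federated_theta = server_update(theta, client_stats)
pooled_theta = server_update(theta, [local_statistics(X, y, theta)])
```

In this sketch `federated_theta` and `pooled_theta` agree up to floating-point error, even though no client ever shares its raw samples.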

## Install requirements
First, install the required packages:

In [None]:
! pip install -r code/requirements.txt

## Download and prepare data

Execute the following script:
```
bash ./code/data/prepare_heart_disease_data.sh
```
This will download the heart disease dataset under
`/tmp/flare/dataset/heart_disease_data/`

Please note that you may need to accept the data terms in order to complete the download.

In [None]:
# Note: the download site remembers your download history and aborts the 2nd download attempt. 
! echo y | bash ./code/data/prepare_heart_disease_data.sh

In [None]:
! ls -al /tmp/flare/dataset/heart_disease_data/

## Centralized Logistic Regression

Two implementations of logistic regression are provided in the
centralized training script, which can be specified by the `--solver`
argument:
- One uses `sklearn.linear_model.LogisticRegression` with the
  `newton-cholesky` solver.
- The other is implemented manually using the theoretical update
  formulas described above.

Both implementations were tested to converge in 4 iterations and to
give the same result.

Launch the following script:

In [None]:
%cd code
! python3 train_centralized.py --solver custom
%cd -

## Federated Logistic Regression


To convert the centralized logistic regression to federated learning, we need to do the following:

1. Decide what model parameters will be transmitted between the server and clients
2. Define the workflow that orchestrates the federated learning process
3. Define how to load the initial model on the server side
4. Modify the client-side training logic to handle models received from the server
5. Implement the aggregation logic for the gradients and Hessians computed by the clients
6. Configure the job via FLARE's Recipe API

Let's examine each step.

### Model Parameters

Each client sends its locally computed gradient and Hessian to the server, captured in the `params` of an `FLModel`:

```python

model = FLModel(params={"gradient": gradient, "hessian": hessian})
```

We could instead store the Hessian in `FLModel.optimizer_params`; either approach works.

We add a few metadata fields to help with the training process. We use the training sample size as the weight, storing this information in the metadata:

```python

model = FLModel(params=result_dict, params_type=ParamsType.FULL)
model.meta["sample_size"] = data["train_X"].shape[0]
```
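
To make the role of `sample_size` concrete, here is a minimal sketch of a sample-size-weighted average over client results. It uses plain dictionaries and NumPy rather than NVFlare's aggregation helper; the function name `weighted_average` and all values are made up for illustration.

```python
import numpy as np


def weighted_average(contributions):
    """Average each named array, weighting every client by its sample size.

    `contributions` is a list of (params, sample_size) pairs, where `params`
    maps a name such as "gradient" or "hessian" to a NumPy array.
    """
    total = sum(size for _, size in contributions)
    keys = contributions[0][0].keys()
    return {
        key: sum(params[key] * size for params, size in contributions) / total
        for key in keys
    }


# Two hypothetical clients with made-up values and sample sizes 30 and 10.
client_a = ({"gradient": np.array([[1.0], [2.0]]), "hessian": np.eye(2)}, 30)
client_b = ({"gradient": np.array([[5.0], [6.0]]), "hessian": 3.0 * np.eye(2)}, 10)

aggregated = weighted_average([client_a, client_b])
# gradient: (30 * [1, 2] + 10 * [5, 6]) / 40 = [2, 3]
# hessian:  (30 * I + 10 * 3I) / 40 = 1.5 * I
```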

### Server-Side Workflow

NVFlare now provides a standardized workflow for Federated Logistic Regression with Newton-Raphson optimization.
We use the `FedAvgLR` class from `nvflare.app_common.workflows.lr.fedavg`, which implements the FedAvg pattern
specifically for logistic regression.

### Model Loader & Initializer

NVFlare provides a standardized persistor for Logistic Regression models. We use the `LRModelPersistor` class from
`nvflare.app_common.workflows.lr.np_persistor`, which handles model initialization, loading, and saving.

```python
import numpy as np
from nvflare.app_common.np.np_model_persistor import NPModelPersistor


# Definition of the class provided in nvflare.app_common.workflows.lr.np_persistor.
class LRModelPersistor(NPModelPersistor):
    """
    This class defines the persistor for Logistic Regression model.

    A persistor controls the logic behind initializing, loading
    and saving of the model / parameters for each round of a
    federated learning process.

    In Logistic Regression with Newton-Raphson, a model is just a
    numpy array containing the parameters for logistic regression,
    with shape (n_features + 1, 1). The number of parameters is
    defined by the number of features in the dataset, plus one
    for the intercept.

    """

    def __init__(self, model_dir="models", model_name="weights.npy", n_features=13):
        super().__init__()

        self.model_dir = model_dir
        self.model_name = model_name
        self.n_features = n_features

        # A default model is loaded when no local model is available.
        # This happens when training starts.
        #
        # A `model` for a binary logistic regression is just a matrix,
        # with shape (n_features + 1, 1).
        # For the UCI ML Heart Disease dataset, the n_features = 13.
        #
        # A default matrix with value 0s is created.
        #
        self.model = np.zeros((self.n_features + 1, 1), dtype=np.float32)
    
    def _get_initial_model_as_numpy(self) -> np.ndarray:
        """Fallback initializer used by NPModelPersistor when no saved model exists."""
        return self.model.copy()
```

### Aggregation Logic

Besides the `run()` method, we also need to implement custom aggregation and update functions.

```python
    def newton_raphson_aggregator_fn(self, results: List[FLModel]):
        """
        This uses the default thread-safe WeightedAggregationHelper,
        which implements a weighted average of all values received from
        a `result` dictionary.

        Args:
            results: a list of `FLModel`s. Each `FLModel` is received
                from a client. The field `params` is a dictionary that
                contains values to be aggregated: the gradient and hessian.
        """
        
        # On client side the `sample_size` key is used to track the number of samples for each client.
        for curr_result in results:
            self.aggregator.add(
                data=curr_result.params,
                weight=curr_result.meta.get("sample_size", 1.0),
                contributor_name=curr_result.meta.get("client_name", AppConstants.CLIENT_UNKNOWN),
                contribution_round=curr_result.current_round,
            )

        aggregated_dict = self.aggregator.get_result()
        
        # Compute global model update:
        # update = damping_factor * (Hessian + reg)^{-1} . Gradient
        # A regularization term is added to avoid a singular Hessian.
        #
        reg = self.epsilon * np.eye(aggregated_dict["hessian"].shape[0])

        newton_raphson_updates = self.damping_factor * np.linalg.solve(
            aggregated_dict["hessian"] + reg, aggregated_dict["gradient"]
        )
        
        # Convert the aggregated result to `FLModel`, this `FLModel`
        # will then be used by `update_model` method from the base class,
        # to update the global model weights.
        #
        aggr_result = FLModel(
            params={"newton_raphson_updates": newton_raphson_updates},
            params_type=results[0].params_type,
            meta={
                "nr_aggregated": len(results),
                AppConstants.CURRENT_ROUND: results[0].current_round,
                AppConstants.NUM_ROUNDS: self.num_rounds,
            },
        )
        return aggr_result

    def update_model(self, model, model_update, replace_meta=True) -> FLModel:
        """
        Update logistic regression parameters based on
        aggregated gradient and hessian.

        """
        if replace_meta:
            model.meta = model_update.meta
        else:
            model.meta.update(model_update.meta)

        model.metrics = model_update.metrics
        model.params[NPConstants.NUMPY_KEY] += model_update.params["newton_raphson_updates"]
        return model

```

Again, we just need to use FLModel to store the result and update the model. 
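
The numerical core of the aggregator, the regularized and damped linear solve, can be tried in isolation. The sketch below is standalone NumPy; the function name `damped_newton_update` and the values chosen for `damping_factor` and `epsilon` are assumptions for illustration only.

```python
import numpy as np


def damped_newton_update(gradient, hessian, damping_factor=0.8, epsilon=1.0):
    """Solve (H + epsilon * I) u = g and scale u by the damping factor."""
    reg = epsilon * np.eye(hessian.shape[0])
    return damping_factor * np.linalg.solve(hessian + reg, gradient)


# A rank-deficient matrix: it has no inverse, so an unregularized solve is ill-posed.
hessian = np.array([[1.0, 1.0], [1.0, 1.0]])
gradient = np.array([[2.0], [2.0]])
rank = np.linalg.matrix_rank(hessian)

# With regularization, H + I = [[2, 1], [1, 2]] is invertible.
# Solving gives u = [2/3, 2/3]; damping by 0.8 yields about 0.533 per entry.
update = damped_newton_update(gradient, hessian)
```

The damping factor shortens each step, trading a few extra rounds for more stable updates when the aggregated Hessian is poorly conditioned.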

### Client Training Logic 

Now, we need to convert the centralized training logic to federated training logic using the Client API.

The complete client-side code can be found in [code/src/newton_raphson_train.py](code/src/newton_raphson_train.py).

```python
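import nvflare.client as flare
from nvflare.app_common.abstract.fl_model import FLModel, ParamsType

# parse_arguments, load_data, validate and train_newton_raphson are defined
# in the full script linked above.


def main():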
 
    args = parse_arguments()

    flare.init()

    site_name = flare.get_site_name()
    
    # Load client site data.
    data = load_data(args.data_root, site_name)


    # keep running until the job is terminated or end of training round
    while flare.is_running():

        # Receive global model (FLModel) from server.
        global_model = flare.receive()

        # Get the weights, aka parameter theta for logistic regression.
        global_weights = global_model.params["weights"]

        # Local validation before training
        validation_scores = validate(data, global_weights)

        # Local training
        result_dict = train_newton_raphson(data, theta=global_weights)

        # Send result to server for aggregation.
        local_model = FLModel(params=result_dict, params_type=ParamsType.FULL)
        local_model.meta["sample_size"] = data["train_X"].shape[0]

        flare.send(local_model)
```

This is straightforward: we receive the global model, perform local training, and send the result to the server. The code structure is the same as in centralized training, with an additional loop for the federated rounds.

We added the sample size to the metadata so that it can be used as the weight in weighted aggregation.
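
The `validate` call in the loop above is defined in the linked script. As a rough idea of what such a function might compute, here is a hypothetical stand-in that reports accuracy and precision at a 0.5 threshold; the name `validate_sketch`, its signature, and the toy data are assumptions, not the example's actual code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def validate_sketch(X, y, theta, threshold=0.5):
    """Hypothetical stand-in for `validate`: accuracy and precision at a threshold."""
    predictions = (sigmoid(X @ theta) >= threshold).astype(float)
    true_positives = float(np.sum((predictions == 1.0) & (y == 1.0)))
    predicted_positives = float(np.sum(predictions == 1.0))
    accuracy = float(np.mean(predictions == y))
    # Precision is undefined when nothing is predicted positive; report 0.0 then.
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    return {"accuracy": accuracy, "precision": precision}


# Made-up test set: intercept column plus one feature.
X = np.array([[1.0, 2.0], [1.0, 1.0], [1.0, -1.0], [1.0, -2.0]])
y = np.array([[1.0], [0.0], [0.0], [0.0]])
theta = np.array([[0.0], [1.0]])  # predicts positive when the feature is >= 0

scores = validate_sketch(X, y, theta)
# Predictions are [1, 1, 0, 0] against labels [1, 0, 0, 0]:
# accuracy = 3/4 = 0.75, precision = 1/2 = 0.5
```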

### Job Configuration with Recipe API

With the above steps, we have converted the centralized training to the federated training. 

Now, let's connect the pieces together using NVFlare's **Recipe API**, which provides a simple, declarative way to configure federated learning jobs.

NVFlare provides a pre-built recipe for Federated Logistic Regression with Newton-Raphson: `FedAvgLrRecipe`. 
This recipe encapsulates all the complexity of setting up the workflow, persistor, and client runners.

```python
from nvflare.app_common.np.recipes.lr.fedavg import FedAvgLrRecipe
from nvflare.recipe import SimEnv

n_clients = 4
num_rounds = 5
data_root = "/tmp/flare/dataset/heart_disease_data"

# Create FedAvgLrRecipe for Logistic Regression with Newton-Raphson
recipe = FedAvgLrRecipe(
    min_clients=n_clients,
    name="newton_raphson_fedavg",
    num_rounds=num_rounds,
    damping_factor=0.8,
    num_features=13,
    train_script="src/newton_raphson_train.py",
    train_args=f"--data_root {data_root}",
    launch_external_process=True,
)

# Execute the recipe in simulation environment
env = SimEnv(num_clients=n_clients, num_threads=n_clients, workspace_root="/tmp/nvflare/jobs/workdir")
run = recipe.execute(env)
result_location = run.get_result()
print(f"Result location: {result_location}")
```

The recipe approach is much simpler than using the Job API directly:
- **No manual component wiring**: The recipe automatically sets up the `FedAvgLR` controller, `LRModelPersistor`, and client runners
- **Fewer lines of code**: Just specify the essential parameters
- **Built-in best practices**: The recipe uses the standardized implementations from `nvflare.app_common.workflows.lr`
- **Flexible execution**: Can easily switch between `SimEnv` (simulator) and `PocEnv` (proof of concept) environments

Under the hood, the recipe uses:
- `FedAvgLR` workflow from `nvflare.app_common.workflows.lr.fedavg`
- `LRModelPersistor` from `nvflare.app_common.workflows.lr.np_persistor`
- Proper configuration for Newton-Raphson optimization with the specified parameters 

That's it, we have converted a logistic regression example to a federated job! 

## Running Federated Logistic Regression Job

Execute the following command to launch federated logistic
regression. This will run in `nvflare`'s simulator mode.


In [None]:
! cd code && python lr_fl_job.py



Accuracy and precision for each site can be viewed in TensorBoard:
```
tensorboard --logdir=/tmp/nvflare/jobs/workdir/server/simulate_job/tb_events/
```
As can be seen from the figure below, per-site evaluation metrics in federated logistic regression are on par with the centralized version.

<img src="./code/figs/tb-metrics.png" alt="Tensorboard metrics server"/>

In [None]:
%load_ext tensorboard

In [None]:
%tensorboard --logdir=/tmp/nvflare/jobs/workdir/server/simulate_job/tb_events/ --bind_all

Now that we have converted the centralized logistic regression to federated learning, let's move on to [federated K-Means](../02.4.2_convert_kmeans_to_federated_learning/convert_kmeans_to_fl.ipynb).