# Converting Logistic Regression to Federated Learning


Logistic regression is a fundamental classification algorithm that models the probability of a binary outcome. Despite its name, it's used for classification rather than regression. The model uses the logistic (sigmoid) function to transform a linear combination of features into a probability between 0 and 1.

The Newton-Raphson method is a powerful second-order optimization technique that uses both first-order (gradient) and second-order (Hessian) information to find the optimal model parameters. Unlike first-order methods like gradient descent, Newton's method incorporates curvature information through the Hessian matrix, often leading to faster convergence, especially near the optimum.

In this section, we convert logistic regression with second-order Newton-Raphson optimization to federated learning.


## Federated Logistic Regression with Second-Order Newton-Raphson optimization
This example shows how to implement federated binary classification via logistic regression with second-order Newton-Raphson optimization.

The [UCI Heart Disease dataset](https://archive.ics.uci.edu/dataset/45/heart+disease) is
used in this example. Scripts are provided to download and process the
dataset as described
[here](https://github.com/owkin/FLamby/tree/main/flamby/datasets/fed_heart_disease).

This dataset contains samples from 4 sites, split into training and
testing sets as described below:

|site         | sample split                          |
|-------------|---------------------------------------|
|Cleveland    | train: 199 samples, test: 104 samples |
|Hungary      | train: 172 samples, test: 89 samples  |
|Switzerland  | train: 30 samples, test: 16 samples   |
|Long Beach V | train: 85 samples, test: 45 samples   |

The number of features in each sample is 13.

## Introduction

The [Newton-Raphson
optimization](https://en.wikipedia.org/wiki/Newton%27s_method) problem
can be described as follows.

In a binary classification task with logistic regression, the
probability of a data sample $x$ classified as positive is formulated
as:
$$p(x) = \sigma(\beta \cdot x + \beta_{0})$$
where $\sigma(.)$ denotes the sigmoid function. We can incorporate
$\beta_{0}$ and $\beta$ into a single parameter vector $\theta =
( \beta_{0},  \beta)$. Let $d$ be the number
of features for each data sample $x$ and let $N$ be the number of data
samples. We then have the matrix version of the above probability
equation:
$$p(X) = \sigma( X \theta )$$
Here $X$ is the matrix of all samples, with shape $N \times (d+1)$,
having its first column filled with value 1 to account for the
intercept $\theta_{0}$.

The goal is to compute the parameter vector $\theta$ that maximizes the
likelihood function below:
$$L_{\theta} = \prod_{i=1}^{N} p(x_i)^{y_i} (1 - p(x_i))^{1-y_i}$$
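
Because the logarithm is monotonic, the same $\theta$ also maximizes the log-likelihood, which is usually the more convenient objective to differentiate. As a short worked step (the symbol $\ell$ is introduced here only for illustration):
$$\ell(\theta) = \log L_{\theta} = \sum_{i=1}^{N} \left[ y_i \log p(x_i) + (1 - y_i) \log \left(1 - p(x_i)\right) \right]$$
Using $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, each sample contributes $(y_i - p(x_i))\, x_i$ to the gradient of $\ell$, with $x_i$ taken as the $i$-th row of $X$ (including the leading 1):
$$\nabla \ell(\theta) = \sum_{i=1}^{N} (y_i - p(x_i))\, x_i = X^{T}(y - p(X))$$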

The Newton-Raphson method maximizes the likelihood function (in
practice, its logarithm) via quadratic approximation. Omitting the full
derivation, the theoretical update formula for parameter vector $\theta$ is:
$$\theta^{n+1} = \theta^{n} - H_{\theta^{n}}^{-1} \nabla L_{\theta^{n}}$$
where
$$\nabla L_{\theta^{n}} = X^{T}(y - p(X))$$
is the gradient of the log-likelihood function, with $y$ being the vector
of ground truth labels for sample data matrix $X$, and
$$H_{\theta^{n}} = -X^{T} D X$$
is the Hessian of the log-likelihood function, with $D$ a diagonal matrix
whose entry at $(i,i)$ is $D(i,i) = p(x_i) (1 - p(x_i))$.
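
To make the update formula concrete, here is a minimal NumPy sketch of the iteration on synthetic data. It is not the example's training script: the helper names, the synthetic data, and the fixed iteration count are assumptions made for illustration. The code works with $X^{T} D X$, which is the negative of $H$ as defined above, so the update is written with a plus sign.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def add_intercept(features):
    """Prepend a column of ones so that theta[0] acts as the intercept."""
    return np.hstack([np.ones((features.shape[0], 1)), features])


def gradient_and_curvature(X, y, theta):
    """Return X^T (y - p) and X^T D X for the current theta.

    X^T D X is the negative of the Hessian H defined in the text.
    """
    p = sigmoid(X @ theta)
    gradient = X.T @ (y - p)
    # Scaling the rows of X by diag(D) avoids building the N x N matrix D.
    curvature = X.T @ (X * (p * (1.0 - p)))
    return gradient, curvature


def newton_raphson(X, y, n_iter=10):
    """Run a fixed number of undamped Newton-Raphson iterations from theta = 0."""
    theta = np.zeros((X.shape[1], 1))
    for _ in range(n_iter):
        gradient, curvature = gradient_and_curvature(X, y, theta)
        # theta - H^{-1} grad  ==  theta + (X^T D X)^{-1} grad
        theta = theta + np.linalg.solve(curvature, gradient)
    return theta


# Synthetic, non-separable data (illustrative only, not the heart disease data).
rng = np.random.default_rng(0)
X = add_intercept(rng.normal(size=(500, 3)))
true_theta = np.array([[0.5], [1.0], [-2.0], [0.0]])
y = (rng.uniform(size=(500, 1)) < sigmoid(X @ true_theta)).astype(float)

theta_hat = newton_raphson(X, y)
final_gradient, _ = gradient_and_curvature(X, y, theta_hat)
print("largest gradient entry at the solution:", np.abs(final_gradient).max())
```

At a maximum of the likelihood the gradient vanishes, so the printed value gives a quick check that the iteration has converged.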

In federated Newton-Raphson optimization, each client will compute its
own gradient $\nabla L_{\theta^{n}}$ and Hessian $H_{\theta^{n}}$
based on local training samples. A server will aggregate the gradients
and Hessians computed from all clients, and perform the update of
parameter $\theta$ based on the theoretical update formula described
above.
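
The reason aggregation works is that both the gradient and the Hessian are sums over samples, so per-client pieces add up to the quantities computed on the pooled data. The sketch below checks this on synthetic data with three hypothetical clients. It uses a plain sum, which is an assumption made to keep the comparison exact; the example's implementation uses a sample-size-weighted average instead, which scales each client's contribution differently.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def local_statistics(X, y, theta):
    """Per-client gradient X^T (y - p) and curvature X^T D X."""
    p = sigmoid(X @ theta)
    return X.T @ (y - p), X.T @ (X * (p * (1.0 - p)))


rng = np.random.default_rng(1)
theta = rng.normal(size=(4, 1))

# Three hypothetical clients with different numbers of local samples.
clients = []
for n_samples in (40, 25, 10):
    X = np.hstack([np.ones((n_samples, 1)), rng.normal(size=(n_samples, 3))])
    y = rng.integers(0, 2, size=(n_samples, 1)).astype(float)
    clients.append((X, y))

# Client side: each site computes statistics on its own data only.
stats = [local_statistics(X, y, theta) for X, y in clients]

# Server side: add up what the clients sent.
summed_gradient = sum(g for g, _ in stats)
summed_curvature = sum(h for _, h in stats)
federated_step = np.linalg.solve(summed_curvature, summed_gradient)

# Reference: the same statistics computed on the pooled data.
X_all = np.vstack([X for X, _ in clients])
y_all = np.vstack([y for _, y in clients])
pooled_gradient, pooled_curvature = local_statistics(X_all, y_all, theta)
pooled_step = np.linalg.solve(pooled_curvature, pooled_gradient)

print(np.allclose(federated_step, pooled_step))
```

Only the small `(d+1)`-sized gradient and `(d+1) x (d+1)` Hessian leave each site; the raw samples stay local.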

## Implementation

Using `nvflare`, the federated logistic regression with Newton-Raphson
optimization is implemented as follows.

On the server side, all workflow logic is implemented in the
`FedAvgNewtonRaphson` class, which can be found
[here](code/newton_raphson/app/custom/newton_raphson_workflow.py). The
`FedAvgNewtonRaphson` class inherits from the
[`BaseFedAvg`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/app_common/workflows/base_fedavg.py)
class, which itself inherits from the
[`ModelController`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/app_common/workflows/model_controller.py)
class. This is the preferred approach for implementing a custom
workflow, since `ModelController` decouples communication logic from
the actual workflow (training and validation) logic. The mandatory
method to override in `ModelController` is the
[`run()`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/app_common/workflows/model_controller.py#L37)
method, where the orchestration of server-side workflow actually
happens. The implementation of `run()` method in
[`FedAvgNewtonRaphson`](code/newton_raphson/app/custom/newton_raphson_workflow.py)
is similar to the classic
[`FedAvg`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/app_common/workflows/fedavg.py#L44):
- Initialize the global model. This is achieved through the `load_model()` method
  of the base class
  [`ModelController`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/app_common/workflows/model_controller.py#L292),
  which relies on the
  [`ModelPersistor`](https://nvflare.readthedocs.io/en/main/glossary.html#persistor). A
  custom
  [`NewtonRaphsonModelPersistor`](code/newton_raphson/app/custom/newton_raphson_persistor.py)
  is implemented in this example, which is based on the
  [`NPModelPersistor`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/app_common/np/np_model_persistor.py)
  for numpy data, since the _model_ in the case of logistic regression
  is just the parameter vector $\theta$ that can be represented by a
  numpy array. Only the `__init__` method needs to be re-implemented
  to provide a proper initialization for the global parameter vector
  $\theta$.
- During each training round, the global model will be sent to the
  list of participating clients to perform a training task. This is
  done using the
  [`send_model_and_wait()`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/app_common/workflows/model_controller.py#L41)
  method. Once
  the clients finish their local training, the results are collected
  and sent back to the server as
  [`FLModel`](https://nvflare.readthedocs.io/en/main/programming_guide/fl_model.html#flmodel)s.
- Results sent by clients contain their locally computed gradient and
  Hessian. A [custom aggregation
  function](code/newton_raphson/app/custom/newton_raphson_workflow.py)
  is implemented to get the averaged gradient and Hessian, and compute
  the Newton-Raphson update for the global parameter vector $\theta$,
  based on the theoretical formula shown above. The averaging of
  gradient and Hessian is based on the
  [`WeightedAggregationHelper`](https://github.com/NVIDIA/NVFlare/blob/main/nvflare/app_common/aggregators/weighted_aggregation_helper.py#L20),
  which weighs the contribution from each client based on the number
  of local training samples. The aggregated Newton-Raphson update is
  returned as an `FLModel`.
- After getting the aggregated Newton-Raphson update, an
  [`update_model()`](code/newton_raphson/app/custom/newton_raphson_workflow.py#L172)
  method is implemented to actually apply the Newton-Raphson update to
  the global model.
- The last step is to save the updated global model, again through
  the `NewtonRaphsonModelPersistor` using `save_model()`.


On the client side, the local training logic is implemented
[here](code/newton_raphson/app/custom/newton_raphson_train.py). The
implementation is based on the [`Client
API`](https://nvflare.readthedocs.io/en/main/programming_guide/execution_api_type.html#client-api). This
lets users add minimal `nvflare`-specific code to turn a typical
centralized training script into a federated client-side local training
script.
- During local training, each client receives a copy of the global
  model, sent by the server, using `flare.receive()` from the Client API.
  The received global model is an instance of `FLModel`.
- A local validation is first performed, where validation metrics
  (accuracy and precision) are streamed to the server using the
  [`SummaryWriter`](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.client.tracking.html#nvflare.client.tracking.SummaryWriter). The
  streamed metrics can be loaded and visualized using TensorBoard.
- Then each client computes its gradient and Hessian based on local
  training data, using the respective theoretical formulas described
  above. This is implemented in the
  [`train_newton_raphson()`](code/newton_raphson/app/custom/newton_raphson_train.py#L82)
  method. Each client then sends the computed results (always in
  `FLModel` format) to the server for aggregation, using the Client API call
  `flare.send()`.

Each client site corresponds to a site listed in the data table above.

A [centralized training script](code/train_centralized.py) is also
provided, which allows for comparing the federated Newton-Raphson
optimization versus the centralized version. In the centralized
version, training data samples from all 4 sites were concatenated into
a single matrix, used to optimize the model parameters. The
optimized model was then tested separately on testing data samples of
the 4 sites, using accuracy and precision as metrics.
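
For reference, accuracy and precision can be computed from the model's predicted probabilities as in the sketch below. This is an illustrative helper, not the code from the example's scripts: the function name and the 0.5 decision threshold are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def accuracy_and_precision(X, y, theta, threshold=0.5):
    """Compute accuracy and precision of thresholded predictions.

    X must already include the leading column of ones.
    """
    predictions = (sigmoid(X @ theta) >= threshold).astype(int).ravel()
    labels = np.asarray(y).astype(int).ravel()

    accuracy = float(np.mean(predictions == labels))
    true_positives = int(np.sum((predictions == 1) & (labels == 1)))
    predicted_positives = int(np.sum(predictions == 1))
    # Precision is undefined without positive predictions; report 0.0 in that case.
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    return accuracy, precision


# Tiny hand-made example: predictions are [1, 0, 1, 0], labels are [1, 1, 0, 0].
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5], [1.0, -3.0]])
y = np.array([1, 1, 0, 0])
theta = np.array([[0.0], [1.0]])
accuracy, precision = accuracy_and_precision(X, y, theta)
print(accuracy, precision)
```

Accuracy counts all correct predictions, while precision only looks at the samples predicted as positive, so the two can differ noticeably on small or imbalanced sites.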

Comparing the federated [client-side training
code](code/newton_raphson/app/custom/newton_raphson_train.py) with the
centralized [training code](code/train_centralized.py), we can see that
the training logic remains similar: load data, perform training
(Newton-Raphson updates), and validate the trained model. The only
differences in the federated code are related to interaction with the
FL system, such as receiving and sending `FLModel` objects.

## Install requirements
First, install the required packages:

In [None]:
%pip install -r code/requirements.txt

## Download and prepare data

Execute the following script
```
bash ./code/data/prepare_heart_disease_data.sh
```
This will download the heart disease dataset under
`/tmp/flare/dataset/heart_disease_data/`

Please note that you may need to accept the data terms in order to complete the download.

In [None]:
# Note: the download site remembers your download history and may abort a second download attempt.

! echo y | bash ./code/data/prepare_heart_disease_data.sh



In [None]:
! ls -al /tmp/flare/dataset/heart_disease_data/

## Centralized Logistic Regression

Two implementations of logistic regression are provided in the
centralized training script, which can be specified by the `--solver`
argument:
- One uses `sklearn.linear_model.LogisticRegression` with the `newton-cholesky`
  solver
- The other one is manually implemented using the theoretical update
  formulas described above.

Both implementations were tested to converge in 4 iterations and to
give the same result.

Launch the following script:

In [None]:
%cd code
! python3 train_centralized.py --solver custom

%cd -

## Federated Logistic Regression


To convert the centralized logistic regression to federated learning, we need to do the following:

1. Decide what model parameters will be transmitted between the server and clients
2. Define the workflow that orchestrates the federated learning process
3. Define how to load the initial model on the server side
4. Modify the client-side training logic to handle models received from the server
5. Implement the aggregation logic for the gradients and Hessians computed by the clients
6. Configure the job via FLARE job API

Let's examine each step.

### Model Parameters

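We simply carry the values exchanged between clients and server in the `params` field of an `FLModel`. For a client result, these are the locally computed gradient and Hessian: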


```python

model = FLModel(params={"gradient": gradient, "hessian": hessian})
```

We could optionally use `FLModel.optimizer_params` to store the Hessian, but either approach works.

We add a few metadata fields to help with the training process. We use the training sample size as the weight, storing this information in the metadata:

```python

model = FLModel(params=result_dict, params_type=ParamsType.FULL)
model.meta["sample_size"] = data["train_X"].shape[0]
```

### Workflow

We use a FedAvg-style scatter-and-gather workflow, so we can base our class on the `BaseFedAvg` class.

```python

from nvflare.app_common.aggregators.weighted_aggregation_helper import WeightedAggregationHelper
from nvflare.app_common.workflows.base_fedavg import BaseFedAvg


class FedAvgNewtonRaphson(BaseFedAvg):

    def __init__(self, damping_factor, epsilon=1.0, *args, **kwargs):
        """
        Args:
            damping_factor: damping factor for Newton-Raphson updates.
            epsilon: a regularization factor to avoid an empty Hessian
                during matrix inversion.
        """
        super().__init__(*args, **kwargs)
        self.damping_factor = damping_factor
        self.epsilon = epsilon
        self.aggregator = WeightedAggregationHelper()

    def run(self) -> None:
        
        # First load the model and set up some training params.
        # A `persistor` (NewtonRaphsonModelPersistor) will load
        # the model in `ModelLearnable` format, which is then
        # converted to `FLModel` by `ModelController`.
        #
        model = self.load_model()

        model.start_round = self.start_round
        model.total_rounds = self.num_rounds

       
        for self.current_round in range(self.start_round, self.start_round + self.num_rounds):

            # Get the list of clients.
            clients = self.sample_clients(self.num_clients)

            model.current_round = self.current_round

            results = self.send_model_and_wait(targets=clients, data=model)

            # Aggregate results received from clients.
            aggregate_results = self.aggregate(results, aggregate_fn=self.newton_raphson_aggregator_fn)

            # Update global model based on the following formula:
            # weights = weights + updates, where
            # updates = -damping_factor * Hessian^{-1} . Gradient
            self.update_model(model, aggregate_results)

            # Save global model.
            self.save_model(model)

        self.info("Finished FedAvg.")

```
As you can see, the `run()` method is the only method we must override. It is essentially a for loop that sends the model to the clients, aggregates their results, and updates and saves the global model.

### Model Loader

We need to decide how to load the initial model on the server side. Here we implement a custom persistor that loads the model from a NumPy file.

```python

import numpy as np
from nvflare.app_common.np.np_model_persistor import NPModelPersistor


class NewtonRaphsonModelPersistor(NPModelPersistor):
    """
    This class defines the persistor for Newton Raphson model.

    A persistor controls the logic behind initializing, loading
    and saving of the model / parameters for each round of a
    federated learning process.

    In the 2nd order Newton Raphson case, a model is just a
    numpy column vector containing the parameters for logistic
    regression. Its length is the number of features in the
    dataset plus one for the intercept.

    """

    def __init__(self, model_dir="models", model_name="weights.npy", n_features=13):
        super().__init__()

        self.model_dir = model_dir
        self.model_name = model_name
        self.n_features = n_features

        # A default model is loaded when no local model is available.
        # This happens when training starts.
        #
        # A `model` for a binary logistic regression is just a matrix,
        # with shape (n_features + 1, 1).
        # For the UCI ML Heart Disease dataset, the n_features = 13.
        #
        # A default matrix with value 0s is created.
        #
        self.default_data = np.zeros((self.n_features + 1, 1), dtype=np.float32)

```
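
The idea behind the persistor, falling back to a zero vector when no saved weights exist and reloading them otherwise, can be sketched in plain NumPy as follows. This is not the `NPModelPersistor` API: `load_or_init_weights` and `save_weights` are hypothetical helpers used only to illustrate the behavior.

```python
import os
import tempfile

import numpy as np


def load_or_init_weights(model_dir, model_name="weights.npy", n_features=13):
    """Load a saved parameter vector, or fall back to zeros of shape (n_features + 1, 1)."""
    path = os.path.join(model_dir, model_name)
    if os.path.exists(path):
        return np.load(path)
    return np.zeros((n_features + 1, 1), dtype=np.float32)


def save_weights(weights, model_dir, model_name="weights.npy"):
    """Persist the parameter vector so that a later round can reload it."""
    os.makedirs(model_dir, exist_ok=True)
    np.save(os.path.join(model_dir, model_name), weights)


with tempfile.TemporaryDirectory() as tmp_dir:
    initial = load_or_init_weights(tmp_dir)   # nothing saved yet: zeros
    save_weights(initial + 1.0, tmp_dir)      # pretend one round updated the weights
    reloaded = load_or_init_weights(tmp_dir)  # now read back from disk

print(initial.shape, float(reloaded[0, 0]))
```

Starting from zeros means every sample initially gets probability 0.5, which is a neutral starting point for the Newton-Raphson iterations.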


### Client Training Logic 

Now, we need to convert the centralized training logic to the federated training logic with Client API.

```python


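import nvflare.client as flare
from nvflare.app_common.abstract.fl_model import FLModel, ParamsType

# parse_arguments, load_data, validate and train_newton_raphson are
# defined elsewhere in the client training script.


def main():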
 
    args = parse_arguments()

    flare.init()

    site_name = flare.get_site_name()
    
    # Load client site data.
    data = load_data(args.data_root, site_name)


    # keep running until the job is terminated or end of training round
    while flare.is_running():

        # Receive global model (FLModel) from server.
        global_model = flare.receive()

        # Get the weights, aka parameter theta for logistic regression.
        global_weights = global_model.params["weights"]

        # Local validation before training
        validation_scores = validate(data, global_weights)

        # Local training
        result_dict = train_newton_raphson(data, theta=global_weights)

        # Send result to server for aggregation.
        local_model = FLModel(params=result_dict, params_type=ParamsType.FULL)
        local_model.meta["sample_size"] = data["train_X"].shape[0]

        flare.send(local_model)

```

This is straightforward. We receive the global model, perform the local training, and send the result to the server. The code structure is the same as in the centralized training, with an additional loop over the federated training rounds.

We add the sample size to the metadata so that it can serve as the aggregation weight in the weighted aggregation.
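
To illustrate what using the sample size as the aggregation weight means, here is a simplified stand-in for a weighted averaging helper. `WeightedAccumulator` is a hypothetical class written for this sketch; it is not NVFlare's `WeightedAggregationHelper` and omits its bookkeeping.

```python
import numpy as np


class WeightedAccumulator:
    """Hypothetical stand-in that keeps a running weighted sum per key."""

    def __init__(self):
        self.totals = {}
        self.weight_sum = 0.0

    def add(self, data, weight):
        """Accumulate weight * value for every entry of one client's result."""
        for key, value in data.items():
            self.totals[key] = self.totals.get(key, 0.0) + weight * np.asarray(value)
        self.weight_sum += weight

    def get_result(self):
        """Divide the accumulated sums by the total weight."""
        return {key: total / self.weight_sum for key, total in self.totals.items()}


accumulator = WeightedAccumulator()
accumulator.add({"gradient": np.array([1.0, 1.0])}, weight=30)  # client with 30 samples
accumulator.add({"gradient": np.array([4.0, 4.0])}, weight=10)  # client with 10 samples
averaged = accumulator.get_result()
print(averaged["gradient"])
```

The client with more samples pulls the average toward its own value: here the result is `(30 * 1 + 10 * 4) / 40 = 1.75` per entry.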


### Aggregation Logic

Now, let's look at the aggregation logic.

```python

    def newton_raphson_aggregator_fn(self, results: List[FLModel]):
        """
        This uses the default thread-safe WeightedAggregationHelper,
        which implements a weighted average of all values received from
        a `result` dictionary.

        Args:
            results: a list of `FLModel`s. Each `FLModel` is received
                from a client. The field `params` is a dictionary that
                contains values to be aggregated: the gradient and hessian.
        """
        
        # On client side the `sample_size` key is used to track the number of samples for each client.
        for curr_result in results:
            self.aggregator.add(
                data=curr_result.params,
                weight=curr_result.meta.get("sample_size", 1.0),
                contributor_name=curr_result.meta.get("client_name", AppConstants.CLIENT_UNKNOWN),
                contribution_round=curr_result.current_round,
            )

        aggregated_dict = self.aggregator.get_result()
        
        # Compute global model update:
        # update = - damping_factor * Hessian^{-1} . Gradient
        # A regularization is added to avoid empty hessian.
        #
        reg = self.epsilon * np.eye(aggregated_dict["hessian"].shape[0])

        newton_raphson_updates = self.damping_factor * np.linalg.solve(
            aggregated_dict["hessian"] + reg, aggregated_dict["gradient"]
        )
        
        # Convert the aggregated result to `FLModel`, this `FLModel`
        # will then be used by `update_model` method from the base class,
        # to update the global model weights.
        #
        aggr_result = FLModel(
            params={"newton_raphson_updates": newton_raphson_updates},
            params_type=results[0].params_type,
            meta={
                "nr_aggregated": len(results),
                AppConstants.CURRENT_ROUND: results[0].current_round,
                AppConstants.NUM_ROUNDS: self.num_rounds,
            },
        )
        return aggr_result

    def update_model(self, model, model_update, replace_meta=True) -> FLModel:
        """
        Update logistic regression parameters based on
        aggregated gradient and hessian.

        """
        if replace_meta:
            model.meta = model_update.meta
        else:
            model.meta.update(model_update.meta)

        model.metrics = model_update.metrics
        model.params[NPConstants.NUMPY_KEY] += model_update.params["newton_raphson_updates"]
        return model

```
Again, we just need to use `FLModel` to store the aggregated result, which `update_model()` then applies to the global model.
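
The role of `epsilon` and `damping_factor` in the aggregation function can be seen in isolation with a small NumPy sketch. The function name and the numeric values are assumptions chosen for illustration; only the formula mirrors the aggregation code above.

```python
import numpy as np


def damped_newton_update(hessian, gradient, damping_factor, epsilon):
    """Return damping_factor * (hessian + epsilon * I)^{-1} gradient."""
    reg = epsilon * np.eye(hessian.shape[0])
    return damping_factor * np.linalg.solve(hessian + reg, gradient)


gradient = np.array([[1.0], [2.0]])

# A degenerate (all-zero) Hessian cannot be inverted on its own ...
singular_hessian = np.zeros((2, 2))
try:
    np.linalg.solve(singular_hessian, gradient)
    plain_solve_failed = False
except np.linalg.LinAlgError:
    plain_solve_failed = True

# ... but the regularized system is well defined.
update = damped_newton_update(singular_hessian, gradient, damping_factor=0.5, epsilon=1.0)
print(plain_solve_failed, update.ravel())
```

With a zero Hessian the regularized update reduces to `damping_factor * gradient / epsilon`, a scaled gradient step. A damping factor below 1 shortens the step, trading some convergence speed for stability.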


### Job Configuration

With the above steps, we have converted the centralized training to federated training.

Now, let's connect the pieces, define the job configuration, and run it with the simulator.

In this example, the client training script runs in a sub-process rather than in-process, and the job configuration files are written manually.

#### Server Job Configuration

The key part is the `FedAvgNewtonRaphson` workflow definition and its arguments: the number of rounds, the number of clients, and the damping factor.


In [None]:
! cat code/newton_raphson/app/config/config_fed_server.json

#### Client Job Configuration

Notice that we use the `ClientAPILauncherExecutor` with a cell pipe. We also need a separate pipe for relaying metrics.

In [None]:
! cat code/newton_raphson/app/config/config_fed_client.json

## Running Federated Logistic Regression Job

Execute the following command to launch federated logistic
regression. This will run in `nvflare`'s simulator mode.


In [None]:
! nvflare simulator -w /tmp/nvflare/job/lr/workspace -n 4 -t 4 code/newton_raphson/



Accuracy and precision for each site can be viewed in TensorBoard:
```
tensorboard --logdir=/tmp/nvflare/job/lr/workspace/server/simulate_job/tb_events
```
As can be seen from the figure below, per-site evaluation metrics in
federated logistic regression are on par with the centralized version.

<img src="./code/figs/tb-metrics.png" alt="Tensorboard metrics server"/>


In [None]:
! tensorboard --logdir=/tmp/nvflare/job/lr/workspace/server/simulate_job/tb_events

Now that we have converted the centralized logistic regression to federated learning, let's move on to the next example.