# Quorum and Reconnect support for dropped out parties in IBM FL

## Outline:
- [Add conda environment to Jupyter Notebook](#setup)
- [Federated Learning (FL)](#intro)
- [Aggregator](#Aggregator)
    - [Aggregator Configuration](#Aggregator-Configuration)
    - [Running the Aggregator](#Running-the-Aggregator)
- [Parties](#Parties)
    - [Parties Configuration](#Parties-Configuration)
    - [Running the Parties](#Running-the-Parties)
- [Training](#Training)
- [Quorum](#Quorum)
- [Rejoin](#Rejoin)
- [Failed Quorum](#Failed-Quorum)
- [Shut Down](#Shut-Down)

## Add conda environment to Jupyter Notebook <a name="setup"></a>

Please ensure that you have activated the `conda` environment following the instructions in the project README.

Once done, run the following commands in your terminal to install your conda environment into the Jupyter Notebook:

1. Once you have activated the conda environment, install the `ipykernel` package: `conda install -c anaconda ipykernel`

2. Next, register the environment as a kernel for Jupyter Notebook: `python -m ipykernel install --user --name=<conda_env>`

3. Please install the `matplotlib` package for your conda environment.

4. Finally, restart Jupyter Notebook. Ensure that you are running this Notebook from `<project_path>/Notebooks`, where `project_path` is the directory where the IBM FL repository was cloned.

When the Notebook is up and running, it may prompt you to choose the kernel. Use the drop-down to choose the kernel whose name matches the `<conda_env>` you used when running `conda activate <conda_env>`. If no prompt shows up, you can change the kernel by clicking _Kernel_ > _Change kernel_ > _`<conda_env>`_.
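
To confirm that the Notebook is really running on the kernel you registered, a quick check like the one below can be run in the first cell. It is only a sketch: it reports the interpreter path and environment prefix, which you would expect to point inside your conda environment directory. The helper name `describe_interpreter` is made up for this illustration.

```python
import sys

def describe_interpreter():
    """Return the interpreter path and environment prefix of the running kernel."""
    return {"executable": sys.executable, "prefix": sys.prefix}

info = describe_interpreter()
print("Python executable:", info["executable"])
print("Environment prefix:", info["prefix"])
```

If the printed paths point to a different environment, switch the kernel as described above before continuing.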

## Federated Learning (FL) <a name="intro"></a>

**Federated Learning (FL)** is a distributed machine learning process in which each participant node (or party) retains its data locally and interacts with other participants via a learning protocol.
One main driver behind FL is the need to not share data with others due to privacy and confidentiality concerns.
Another driver is to improve the speed of training a machine learning model by leveraging other participants' training processes.

Setting up such a federated learning system requires setting up a communication infrastructure, converting machine learning algorithms to federated settings, and in some cases understanding the intricacies of security- and privacy-enabling techniques such as differential privacy and multi-party computation.

In this Notebook we use [IBM FL](https://github.com/IBM/federated-learning-lib) to have multiple parties train a classifier to recognise handwritten digits in the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). 

For a more technical dive into IBM FL, refer to the whitepaper [here](https://arxiv.org/pdf/2007.10987.pdf).

In the following cells, we set up each of the components of a Federated Learning network (see Figure below), wherein all involved parties aid in training on their respective local datasets. The goal of this notebook is to show how 'Quorum' and 'Reconnect' work in IBM FL. In this notebook we default to 4 parties, but depending on your resources you may use more parties.

<img style="display=block; margin:auto" src="../images/FL_Network.png" width="720"/>
<p style="text-align: center">Modified from Image Source: <a href="https://arxiv.org/pdf/2007.10987.pdf">IBM Federated Learning: An Enterprise Framework White Paper V0.1</a></p>

The problem at hand is to recognize digits from these tens of thousands of handwritten images. In this notebook, we assume that the parties and the aggregator run on the same machine. For that purpose, we first randomly split the training data among the parties. Then, we define the neural network. After that, we start the aggregator, and then register and run all parties.

### Getting things ready
We begin by setting the number of parties that will participate in the federated learning run and splitting up the data among them.

In [1]:
import sys
sys.path.append('../..')
import os
os.chdir("../..")

num_parties = 4  ## number of participating parties
dataset = 'mnist'

We use `examples/generate_data.py` to split the dataset into files for each party. 

The script allows specifying the number of parties as well as the dataset to use (from several supported datasets: _mnist_, _femnist_, _cifar10_ and many others). 

The `-pp` argument states how many data points to choose per party. If the option `--stratify` is given, the library stratifies the data proportionally according to the source distribution. If you want to run this notebook on different machines, you can assign samples for each party locally.
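
To make the idea of a per-party split concrete, here is a minimal sketch of random sampling with a fixed number of points per party. It is not the code of `examples/generate_data.py`; the helpers `split_for_parties` and `label_counts`, the toy labels, and the choice to sample independently for each party are all assumptions made for illustration.

```python
import random

def split_for_parties(labels, num_parties, points_per_party, seed=0):
    """Randomly pick `points_per_party` sample indices for each party.

    Hypothetical helper: sampling is without replacement within a party,
    but the same index may be handed to more than one party.
    """
    rng = random.Random(seed)
    all_indices = list(range(len(labels)))
    return [rng.sample(all_indices, points_per_party) for _ in range(num_parties)]

def label_counts(labels, indices):
    """Count how many samples of each label a party received."""
    counts = {}
    for i in indices:
        counts[labels[i]] = counts.get(labels[i], 0) + 1
    return dict(sorted(counts.items()))

# Toy stand-in for MNIST labels: 1000 samples, digits 0-9.
toy_labels = [i % 10 for i in range(1000)]
parties = split_for_parties(toy_labels, num_parties=4, points_per_party=200)
for party_id, indices in enumerate(parties):
    print("Party", party_id, label_counts(toy_labels, indices))
```

As in the script's output, each party ends up with the same number of samples but a slightly different label distribution.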

In [2]:
%run examples/generate_data.py -n $num_parties -d $dataset -pp 200 

Party_ 0
nb_x_train:  (200, 28, 28) nb_x_test:  (2500, 28, 28)
* Label  0  samples:  17
* Label  1  samples:  16
* Label  2  samples:  18
* Label  3  samples:  23
* Label  4  samples:  26
* Label  5  samples:  13
* Label  6  samples:  30
* Label  7  samples:  20
* Label  8  samples:  13
* Label  9  samples:  24
Finished! :) Data saved in  examples/data/mnist/random
Party_ 1
nb_x_train:  (200, 28, 28) nb_x_test:  (2500, 28, 28)
* Label  0  samples:  25
* Label  1  samples:  20
* Label  2  samples:  14
* Label  3  samples:  18
* Label  4  samples:  30
* Label  5  samples:  20
* Label  6  samples:  20
* Label  7  samples:  20
* Label  8  samples:  19
* Label  9  samples:  14
Finished! :) Data saved in  examples/data/mnist/random
Party_ 2
nb_x_train:  (200, 28, 28) nb_x_test:  (2500, 28, 28)
* Label  0  samples:  21
* Label  1  samples:  30
* Label  2  samples:  18
* Label  3  samples:  15
* Label  4  samples:  19
* Label  5  samples:  23
* Label  6  samples:  25
* Label  7  samples:  10
*

Next, we define the model and save its configuration. Please note that this step can be automated using the generate_config script, as shown in the IBM FL tutorial page [here](https://github.com/IBM/federated-learning-lib/blob/main/README.md#supported-functionality).

In [2]:
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras import Model

def save_model_config(folder_configs):
    class MyModel(Model):
        def __init__(self):
            super(MyModel, self).__init__()
            self.conv1 = Conv2D(32, 3, activation='relu')
            self.flatten = Flatten()
            self.d1 = Dense(128, activation='relu')
            self.d2 = Dense(10)

        def call(self, x):
            x = self.conv1(x)
            x = self.flatten(x)
            x = self.d1(x)
            return self.d2(x)

    # Create an instance of the model
    model = MyModel()
    loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True)
    optimizer = tf.keras.optimizers.Adam()
    acc = tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy')
    model.compile(optimizer=optimizer, loss=loss_object, metrics=[acc])
    img_rows, img_cols = 28, 28
    input_shape = (None, img_rows, img_cols, 1)
    model.compute_output_shape(input_shape=input_shape)

    if not os.path.exists(folder_configs):
        os.makedirs(folder_configs)

    model.save(folder_configs)

    spec = {'model_name': 'tf-cnn',
            'model_definition': folder_configs}

    model = {
        'name': 'TensorFlowFLModel',
        'path': 'ibmfl.model.tensorflow_fl_model',
        'spec': spec
    }

    return model

save_model_config('examples/configs/iter_avg/tf/')

Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: examples/configs/iter_avg/tf/assets


{'name': 'TensorFlowFLModel',
 'path': 'ibmfl.model.tensorflow_fl_model',
 'spec': {'model_name': 'tf-cnn',
  'model_definition': 'examples/configs/iter_avg/tf/'}}

## Aggregator

The Aggregator coordinates the overall process, communicates with the parties and integrates the results of the training process. This integration of results is done using the _Fusion Algorithm_.

A fusion algorithm queries the registered parties to carry out the federated learning process. The queries sent vary according to the model/algorithm type. In return, parties send their reply as a model update object, and these model updates are then aggregated according to the chosen Fusion Algorithm, which is specified via a `Fusion Handler` class.
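
As an illustration of what aggregating model updates can mean, the sketch below averages the replies element-wise. It is a simplified stand-in, not the code of IBM FL's `IterAvgFusionHandler`: model updates are represented as flat lists of floats, and the function name `average_updates` is hypothetical.

```python
def average_updates(party_updates):
    """Element-wise mean of the weight vectors sent back by the parties."""
    if not party_updates:
        raise ValueError("no model updates to aggregate")
    length = len(party_updates[0])
    if any(len(update) != length for update in party_updates):
        raise ValueError("all updates must have the same length")
    num_updates = len(party_updates)
    return [sum(values) / num_updates for values in zip(*party_updates)]

# Three parties reply with (toy) two-parameter model updates.
replies = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
global_weights = average_updates(replies)
print(global_weights)  # [2.0, 5.0]
```

Note that the average is taken only over the replies that actually arrived, which is why a round can still complete when some parties drop out.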

To take a look at the supported fusion algorithms, refer to the IBM FL tutorial page [here](https://github.com/IBM/federated-learning-lib/blob/main/README.md#supported-functionality).

### Aggregator Configuration

We discuss the various configuration parameters for the Aggregator [here.](https://github.com/IBM/federated-learning-lib/blob/main/docs/tutorials/configure_fl.md#the-aggregators-configuration-file)

Given below is an example of the aggregator's configuration file. In this example, the aggregator specifies a data file through the `data` section but provides no `model` section, so it does not maintain a global model. Hence, during the federated learning process, it only keeps track of the current model parameters.

However, the aggregator can also hold data for testing purposes and maintain a global model. When this is the case, the configuration file needs both a `data` and a `model` section.
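
A rough sketch of adding a `model` section is shown below. It assumes that the aggregator's `model` section takes the same shape as the dictionary returned by `save_model_config` earlier in this notebook; please confirm this against the configuration tutorial linked above. The helper `with_global_model` and the trimmed `base_config` are illustrative only.

```python
import copy

def with_global_model(agg_config, model_config):
    """Return a copy of the aggregator config with a `model` section added."""
    extended = copy.deepcopy(agg_config)
    extended['model'] = copy.deepcopy(model_config)
    return extended

# Trimmed stand-ins for the dictionaries used in this notebook.
base_config = {
    'data': {
        'info': {'npz_file': 'examples/datasets/mnist.npz'},
        'name': 'MnistKerasDataHandler',
        'path': 'ibmfl.util.data_handlers.mnist_keras_data_handler',
    },
}
model_config = {
    'name': 'TensorFlowFLModel',
    'path': 'ibmfl.model.tensorflow_fl_model',
    'spec': {'model_name': 'tf-cnn',
             'model_definition': 'examples/configs/iter_avg/tf/'},
}
config_with_model = with_global_model(base_config, model_config)
print(sorted(config_with_model))  # ['data', 'model']
```

Copying the configuration instead of mutating it keeps the original dictionary available for runs without a global model.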

<img style="display=block; margin:auto" src="../images/arch_aggregator.png" width="680"/>
<p style="text-align: center">Image Source: <a href="https://arxiv.org/pdf/2007.10987.pdf">IBM Federated Learning: An Enterprise Framework White Paper V0.1</a></p>

#### Important parameters in configuration file for this tutorial:

- `max_timeout`: the maximum time (in seconds) the aggregator waits for parties to reply. If `max_timeout` is specified, the aggregator waits for the specified amount of time and checks whether the required number of parties (calculated from the quorum percentage, `perc_quorum`) have replied.

- `perc_quorum`: the quorum percentage, which provides flexibility when parties may suffer connectivity failures. Given the total number of parties registered at a particular round, the quorum percentage defines the minimum number of parties that must reply. If the aggregator receives fewer replies than that in some round, it stops the federated learning process.
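
The quorum rule can be sketched as a simple ratio check. This is a minimal illustration rather than IBM FL's implementation: the function name `quorum_met` is hypothetical, and the `>=` comparison is an assumption taken from the scenarios in this notebook, where 2 of 4 replies satisfies a quorum of 0.5.

```python
def quorum_met(num_replies, num_registered, perc_quorum):
    """True when the fraction of replying parties reaches the quorum."""
    if num_registered <= 0:
        return False
    return num_replies / num_registered >= perc_quorum

# The scenarios exercised later in this notebook: 4 registered parties, quorum 0.5.
for replies in (4, 3, 2, 1):
    print(f"{replies}/4 replies -> quorum met: {quorum_met(replies, 4, 0.5)}")
```

With these settings, training continues when up to two parties drop out and stops when only one party replies.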

For detailed documentation of the objects described below, refer to the API docs [here](https://ibmfl.mybluemix.net/api-documentation).

In [3]:
agg_config = {
    'connection': {
        'info': {
            'ip': '127.0.0.1',
            'port': 5000,
            'tls_config': {
                'enable': 'false'
            }
        },
        'name': 'FlaskConnection',
        'path': 'ibmfl.connection.flask_connection',
        'sync': 'False'
    },
    'data': {
        'info': {
            'npz_file': 'examples/datasets/mnist.npz'
        },
        'name': 'MnistKerasDataHandler',
        'path': 'ibmfl.util.data_handlers.mnist_keras_data_handler'
    },
    'fusion': {
        'name': 'IterAvgFusionHandler',
        'path': 'ibmfl.aggregator.fusion.iter_avg_fusion_handler'
    },
    'hyperparams': {
        'global': {
            'max_timeout': 3,
            'num_parties': num_parties,
            'perc_quorum': 0.5,
            'rounds': 1,
            'termination_accuracy': 0.9
        },
        'local': {
            'optimizer': {
                'lr': 0.01
            },
            'training': {
                'epochs': 1
            }
        }
    },
    'protocol_handler': {
        'name': 'ProtoHandler',
        'path': 'ibmfl.aggregator.protohandler.proto_handler'
    }
}

### Running the Aggregator
Next we pass the configuration parameters set in the previous cell to instantiate the `Aggregator` object. Finally, we `start()` the Aggregator process.

In [4]:
from ibmfl.aggregator.aggregator import Aggregator
aggregator = Aggregator(config_dict=agg_config)

aggregator.start()

2022-03-11 18:37:13,794 | 1.0.6 | INFO | ibmfl.util.config                             | Getting Aggregator details from arguments.
2022-03-11 18:37:15,196 | 1.0.6 | INFO | ibmfl.util.config                             | No metrics recorder config provided for this setup.
2022-03-11 18:37:15,197 | 1.0.6 | INFO | ibmfl.util.config                             | No model config provided for this setup.
2022-03-11 18:37:15,375 | 1.0.6 | INFO | ibmfl.util.config                             | No metrics config provided for this setup.
2022-03-11 18:37:15,375 | 1.0.6 | INFO | ibmfl.util.config                             | No evidencia recordeer config provided for this setup.
2022-03-11 18:37:15,376 | 1.0.6 | INFO | ibmfl.util.data_handlers.mnist_keras_data_handler | Loaded training data from examples/datasets/mnist.npz
2022-03-11 18:37:15,822 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | RestSender initialized
2022-03-11 18:37:15,823 | 1.0.6 | INFO | ibmfl.aggregator.prot

## Parties

Each party holds its own dataset that is kept to itself and used to answer queries received from the aggregator. Because each party may have stored data in different formats, FL offers an abstraction called Data Handler. This module allows for custom implementations to retrieve the data from each of the participating parties. A local training handler sits at each party to control the local training happening at the party side.
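
The sketch below shows the idea behind such an abstraction: two parties store their data differently, yet expose it through the same method. The class names and the `get_data` signature here are hypothetical and do not subclass IBM FL's actual data handler base class.

```python
class InMemoryDataHandler:
    """Hypothetical data handler that serves a party's local data from memory."""

    def __init__(self, train, test):
        self._train = train
        self._test = test

    def get_data(self):
        """Return (x_train, y_train), (x_test, y_test) for local training."""
        return self._train, self._test

class CsvLikeDataHandler(InMemoryDataHandler):
    """Same interface, but the party stores rows as 'label,pixel,pixel,...' strings."""

    def __init__(self, train_rows, test_rows):
        super().__init__(self._parse(train_rows), self._parse(test_rows))

    @staticmethod
    def _parse(rows):
        labels, features = [], []
        for row in rows:
            label, *pixels = row.split(',')
            labels.append(int(label))
            features.append([float(p) for p in pixels])
        return features, labels

handler = CsvLikeDataHandler(["7,0.0,1.0", "3,0.5,0.5"], ["1,1.0,0.0"])
(x_train, y_train), (x_test, y_test) = handler.get_data()
print(y_train, x_test)  # [7, 3] [[1.0, 0.0]]
```

The local training code only ever calls `get_data()`, so each party is free to keep its own storage format.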

<img style="display=block; margin:auto" src="../images/arch_party.png" width="680"/>
<p style="text-align: center">Image Source: <a href="https://arxiv.org/pdf/2007.10987.pdf">IBM Federated Learning: An Enterprise Framework White Paper V0.1</a></p>

### Parties Configuration

In the following cell we define the configuration for each party:

In [5]:
def get_party_config(party_id):
    party_config = {
        'local_training': {
            'name': 'LocalTrainingHandler',
            'path': 'ibmfl.party.training.local_training_handler'
        },

        # the party protocol handler receives all requests from the aggregator and routes them to the respective methods in the local training handler
        'protocol_handler': {
            'name': 'PartyProtocolHandler',
            'path': 'ibmfl.party.party_protocol_handler'

        },

        # IP address and port at which the aggregator is running, so that the party can establish a connection
        'aggregator': {

            'ip': '127.0.0.1',
            'port': 5000

        },
        'connection': {
            'info': {
                'ip': '127.0.0.1',
                'port': 8085 + party_id,
                'id': 'party' + str(party_id)
            },
            'name': 'FlaskConnection',
            'path': 'ibmfl.connection.flask_connection',
            'sync': False
        },

        # the party's share of the MNIST data generated earlier serves as its local dataset
        'data': {
            'info': {
                'npz_file': 'examples/data/mnist/random/data_party' + str(party_id) + '.npz'
            },
            'name': 'MnistTFDataHandler',
            'path': 'ibmfl.util.data_handlers.mnist_keras_data_handler'
        },

        'model': {
            'name': 'TensorFlowFLModel',
            'path': 'ibmfl.model.tensorflow_fl_model',
            'spec': {
                'model_definition': 'examples/configs/iter_avg/tf',
                'model_name': 'tf-cnn',
            }
        }
    }
    return party_config

### Running the Parties

Now, we invoke the `get_party_config` function to set up each of the parties and `start()` them up.

Finally, we register the party with the Aggregator.

Note that we maintain a `partyList` to allow for manual drop out and, eventually, a seamless shutdown at the end.

In [6]:
from ibmfl.party.party import Party

num_parties = agg_config['hyperparams']['global']['num_parties']
partyList = []
for party_id in range(num_parties):
    party_config = get_party_config(party_id)
    party = Party(config_dict=party_config)
    party.start()
    party.register_party()
    partyList.append(party)

2022-03-11 18:37:20,424 | 1.0.6 | INFO | ibmfl.util.config                             | Getting Aggregator details from arguments.
2022-03-11 18:37:20,426 | 1.0.6 | INFO | ibmfl.util.config                             | No metrics recorder config provided for this setup.
2022-03-11 18:37:20,656 | 1.0.6 | INFO | ibmfl.util.config                             | No metrics config provided for this setup.
2022-03-11 18:37:20,657 | 1.0.6 | INFO | ibmfl.util.config                             | No evidencia recordeer config provided for this setup.
2022-03-11 18:37:21,356 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | RestSender initialized
2022-03-11 18:37:21,374 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Receiver Initialized
2022-03-11 18:37:21,377 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Initializing Flask application
2022-03-11 18:37:21,384 | 1.0.6 | INFO | ibmfl.party.party                             | Party initialization

## Training

Now that our network has been set up, we begin training the model by invoking the Aggregator's `start_training()` method. 

In [7]:
aggregator.start_training()

2022-03-11 18:37:25,938 | 1.0.6 | INFO | ibmfl.aggregator.aggregator                   | Initiating Global Training.
2022-03-11 18:37:25,940 | 1.0.6 | INFO | ibmfl.aggregator.fusion.fusion_handler        | Warm start disabled.
2022-03-11 18:37:25,941 | 1.0.6 | INFO | ibmfl.aggregator.fusion.iter_avg_fusion_handler | Model updateNone
2022-03-11 18:37:25,942 | 1.0.6 | INFO | ibmfl.aggregator.fusion.fusion_state_service  | Fusion state States.SND_MODEL
2022-03-11 18:37:25,943 | 1.0.6 | INFO | ibmfl.aggregator.protohandler.proto_handler   | State: States.SND_REQ
2022-03-11 18:37:26,022 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :7
2022-03-11 18:37:26,024 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :7
2022-03-11 18:37:26,025 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :7
2022-03-11 18:37:26,026 | 1.0.6 | INFO | ibmfl.connection.flask_connection       

2022-03-11 18:37:27,985 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :7
2022-03-11 18:37:27,986 | 1.0.6 | INFO | ibmfl.party.training.local_training_handler   | Local training done, generating model update...
2022-03-11 18:37:28,000 | 1.0.6 | INFO | ibmfl.party.training.local_training_handler   | Local training done, generating model update...
2022-03-11 18:37:28,000 | 1.0.6 | INFO | ibmfl.party.party_protocol_handler            | successfully finished async request
2022-03-11 18:37:28,003 | 1.0.6 | INFO | ibmfl.party.party_protocol_handler            | successfully finished async request
2022-03-11 18:37:28,005 | 1.0.6 | INFO | ibmfl.party.party_protocol_handler            | successfully finished async request
2022-03-11 18:37:28,260 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :7
2022-03-11 18:37:28,319 | 1.0.6 | INFO | werkzeug                                      | 127.0.0.1 - - [11/Mar/2022

True

## Quorum

Next, we manually stop the first party to verify that training still continues (Quorum: 3/4 >= 0.5). Please note that in this tutorial we perform this step after training has already finished, because Jupyter Notebook executes one cell at a time. In a real deployment, however, a party can drop out while training is in progress. IBM FL handles a drop out during training in the same way as shown in this tutorial.

In [8]:
# Stop first party
party_to_stop = partyList[0]
party_to_stop.stop()

2022-03-11 18:37:36,372 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Stopping Receiver and Sender
2022-03-11 18:37:36,399 | 1.0.6 | INFO | werkzeug                                      | 127.0.0.1 - - [11/Mar/2022 18:37:36] "POST /shutdown HTTP/1.1" 200 -
2022-03-11 18:37:36,403 | 1.0.6 | INFO | ibmfl.party.party                             | Party stop successful




Once the party has dropped out, we restart the training process and verify that training continues, since more than 50% of the parties are still running.

In [9]:
aggregator.start_training()

2022-03-11 18:37:39,803 | 1.0.6 | INFO | ibmfl.aggregator.aggregator                   | Initiating Global Training.
2022-03-11 18:37:39,804 | 1.0.6 | INFO | ibmfl.aggregator.fusion.fusion_handler        | Warm start disabled.
2022-03-11 18:37:39,810 | 1.0.6 | INFO | ibmfl.aggregator.fusion.iter_avg_fusion_handler | Model update<ibmfl.model.model_update.ModelUpdate object at 0x7fa32cc06790>
2022-03-11 18:37:39,811 | 1.0.6 | INFO | ibmfl.aggregator.fusion.fusion_state_service  | Fusion state States.SND_MODEL
2022-03-11 18:37:39,812 | 1.0.6 | INFO | ibmfl.aggregator.protohandler.proto_handler   | State: States.SND_REQ
2022-03-11 18:37:40,214 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :7
2022-03-11 18:37:40,235 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :7
2022-03-11 18:37:40,238 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :7
2022-03-11 18:37:40,35

2022-03-11 18:37:45,605 | 1.0.6 | INFO | ibmfl.aggregator.protohandler.proto_handler   | Timeout:3 Time spent:5
2022-03-11 18:37:45,606 | 1.0.6 | INFO | ibmfl.aggregator.fusion.fusion_state_service  | Fusion state States.RCV_MODEL
2022-03-11 18:37:45,607 | 1.0.6 | INFO | ibmfl.aggregator.fusion.fusion_state_service  | Fusion state States.AGGREGATING
2022-03-11 18:37:45,631 | 1.0.6 | INFO | ibmfl.aggregator.fusion.iter_avg_fusion_handler | Reached maximum global rounds. Finish training :) 
2022-03-11 18:37:45,639 | 1.0.6 | INFO | ibmfl.aggregator.aggregator                   | Finished Global Training


True

Now, we will stop another party and restart training. As our quorum is 50%, training should continue (Quorum: 2/4 >= 0.5).

In [10]:
# Stop second party
party_to_stop = partyList[1]
party_to_stop.stop()


2022-03-11 18:38:18,155 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Stopping Receiver and Sender
2022-03-11 18:38:18,181 | 1.0.6 | INFO | werkzeug                                      | 127.0.0.1 - - [11/Mar/2022 18:38:18] "POST /shutdown HTTP/1.1" 200 -
2022-03-11 18:38:18,183 | 1.0.6 | INFO | ibmfl.party.party                             | Party stop successful


In [11]:
# Restart training
aggregator.start_training()

2022-03-11 18:38:21,005 | 1.0.6 | INFO | ibmfl.aggregator.aggregator                   | Initiating Global Training.
2022-03-11 18:38:21,007 | 1.0.6 | INFO | ibmfl.aggregator.fusion.fusion_handler        | Warm start disabled.
2022-03-11 18:38:21,013 | 1.0.6 | INFO | ibmfl.aggregator.fusion.iter_avg_fusion_handler | Model update<ibmfl.model.model_update.ModelUpdate object at 0x7fa3372e50d0>
2022-03-11 18:38:21,014 | 1.0.6 | INFO | ibmfl.aggregator.fusion.fusion_state_service  | Fusion state States.SND_MODEL
2022-03-11 18:38:21,015 | 1.0.6 | INFO | ibmfl.aggregator.protohandler.proto_handler   | State: States.SND_REQ
2022-03-11 18:38:21,340 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :7
2022-03-11 18:38:21,345 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :7
2022-03-11 18:38:21,464 | 1.0.6 | INFO | ibmfl.party.party_protocol_handler            | received a async request
2022-03-11 18:38:21,465 | 

True

## Rejoin

Next, we make both dropped-out parties rejoin the training process. IBM FL allows parties to rejoin even while training is in progress. Rejoining during training is handled in the same fashion as shown in this tutorial.
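
Conceptually, rejoining amounts to clearing a party's dropped status when it registers again. The sketch below models that bookkeeping with a hypothetical `PartyRegistry` class; it is an assumption about the general idea, not IBM FL's protocol handler code.

```python
class PartyRegistry:
    """Hypothetical bookkeeping of registered and dropped parties."""

    def __init__(self):
        self.registered = set()
        self.dropped = set()

    def register(self, party_id):
        # A returning party is removed from the dropped set before being added again.
        self.dropped.discard(party_id)
        self.registered.add(party_id)

    def mark_dropped(self, party_id):
        if party_id in self.registered:
            self.dropped.add(party_id)

    def active_parties(self):
        return sorted(self.registered - self.dropped)

registry = PartyRegistry()
for i in range(4):
    registry.register(f"party{i}")
registry.mark_dropped("party0")
registry.mark_dropped("party1")
active_after_drop = registry.active_parties()
print(active_after_drop)    # ['party2', 'party3']
registry.register("party0")
registry.register("party1")
active_after_rejoin = registry.active_parties()
print(active_after_rejoin)  # ['party0', 'party1', 'party2', 'party3']
```

Because re-registration is idempotent in this sketch, a party can rejoin with the same id it used before dropping out.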

In [12]:
for party in [partyList[0], partyList[1]]:
    party.start()  
    party.register_party()

2022-03-11 18:38:29,605 | 1.0.6 | INFO | ibmfl.party.party                             | Party start successful
2022-03-11 18:38:29,607 | 1.0.6 | INFO | ibmfl.party.party                             | Registering party...
2022-03-11 18:38:29,610 | 1.0.6 | INFO | werkzeug                                      |  * Running on http://127.0.0.1:8085/ (Press CTRL+C to quit)
2022-03-11 18:38:29,629 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :6
2022-03-11 18:38:29,631 | 1.0.6 | INFO | ibmfl.aggregator.protohandler.proto_handler   | Cleaning dropped parties for addition of party0
2022-03-11 18:38:29,632 | 1.0.6 | INFO | ibmfl.aggregator.protohandler.proto_handler   | Adding party with id party0
2022-03-11 18:38:29,633 | 1.0.6 | INFO | ibmfl.aggregator.protohandler.proto_handler   | Total number of registered parties:4
2022-03-11 18:38:29,635 | 1.0.6 | INFO | werkzeug                                      | 127.0.0.1 - - [11/Mar/2022 18:38:29] "POST

Now, we start training and verify that we train with all four parties.

In [13]:
# Restart training
aggregator.start_training()

2022-03-11 18:38:34,070 | 1.0.6 | INFO | ibmfl.aggregator.aggregator                   | Initiating Global Training.
2022-03-11 18:38:34,071 | 1.0.6 | INFO | ibmfl.aggregator.fusion.fusion_handler        | Warm start disabled.
2022-03-11 18:38:34,078 | 1.0.6 | INFO | ibmfl.aggregator.fusion.iter_avg_fusion_handler | Model update<ibmfl.model.model_update.ModelUpdate object at 0x7fa34cb16ee0>
2022-03-11 18:38:34,079 | 1.0.6 | INFO | ibmfl.aggregator.fusion.fusion_state_service  | Fusion state States.SND_MODEL
2022-03-11 18:38:34,080 | 1.0.6 | INFO | ibmfl.aggregator.protohandler.proto_handler   | State: States.SND_REQ
2022-03-11 18:38:34,522 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :7
2022-03-11 18:38:34,523 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :7
2022-03-11 18:38:34,524 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :7
2022-03-11 18:38:34,52

2022-03-11 18:38:35,763 | 1.0.6 | INFO | ibmfl.party.training.local_training_handler   | Local training done, generating model update...
2022-03-11 18:38:35,765 | 1.0.6 | INFO | ibmfl.party.party_protocol_handler            | successfully finished async request
2022-03-11 18:38:35,767 | 1.0.6 | INFO | ibmfl.party.party_protocol_handler            | successfully finished async request
2022-03-11 18:38:35,893 | 1.0.6 | INFO | ibmfl.party.training.local_training_handler   | Local training done, generating model update...
2022-03-11 18:38:35,894 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :7
2022-03-11 18:38:35,996 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :7
2022-03-11 18:38:35,997 | 1.0.6 | INFO | ibmfl.party.party_protocol_handler            | successfully finished async request
2022-03-11 18:38:36,118 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path 

True

## Failed Quorum

Next, we make 3 parties drop out and verify that federated learning stops, since quorum is not met in this case (Quorum: 1/4 < 0.5).

In [14]:
for party in [partyList[0], partyList[1], partyList[2]]:
    party.stop() 

2022-03-11 18:40:13,654 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Stopping Receiver and Sender
2022-03-11 18:40:13,688 | 1.0.6 | INFO | werkzeug                                      | 127.0.0.1 - - [11/Mar/2022 18:40:13] "POST /shutdown HTTP/1.1" 200 -
2022-03-11 18:40:13,692 | 1.0.6 | INFO | ibmfl.party.party                             | Party stop successful
2022-03-11 18:40:13,693 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Stopping Receiver and Sender
2022-03-11 18:40:13,716 | 1.0.6 | INFO | werkzeug                                      | 127.0.0.1 - - [11/Mar/2022 18:40:13] "POST /shutdown HTTP/1.1" 200 -
2022-03-11 18:40:13,719 | 1.0.6 | INFO | ibmfl.party.party                             | Party stop successful
2022-03-11 18:40:13,721 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Stopping Receiver and Sender
2022-03-11 18:40:13,743 | 1.0.6 | INFO | werkzeug                                      | 127.0.0.1 - - [11/M

In [15]:
# Restart training
aggregator.start_training()

2022-03-11 18:40:18,123 | 1.0.6 | INFO | ibmfl.aggregator.aggregator                   | Initiating Global Training.
2022-03-11 18:40:18,125 | 1.0.6 | INFO | ibmfl.aggregator.fusion.fusion_handler        | Warm start disabled.
2022-03-11 18:40:18,134 | 1.0.6 | INFO | ibmfl.aggregator.fusion.iter_avg_fusion_handler | Model update<ibmfl.model.model_update.ModelUpdate object at 0x7fa32cbe06a0>
2022-03-11 18:40:18,136 | 1.0.6 | INFO | ibmfl.aggregator.fusion.fusion_state_service  | Fusion state States.SND_MODEL
2022-03-11 18:40:18,138 | 1.0.6 | INFO | ibmfl.aggregator.protohandler.proto_handler   | State: States.SND_REQ
2022-03-11 18:40:18,585 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :7
2022-03-11 18:40:18,700 | 1.0.6 | INFO | ibmfl.party.party_protocol_handler            | received a async request
2022-03-11 18:40:18,701 | 1.0.6 | INFO | ibmfl.party.party_protocol_handler            | finished async request
2022-03-11 18:40:18,702 | 1.0.6 

False

As shown above, the Aggregator reports that the parties did not reply in time and that quorum was not reached.
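
The interplay between `max_timeout` and `perc_quorum` in this failed run can be sketched as a polling loop. This is a simplified model under stated assumptions: the function `wait_for_quorum`, the `count_replies` callback, and the early exit once every party has replied are all hypothetical and not taken from IBM FL's code.

```python
import time

def wait_for_quorum(count_replies, num_registered, perc_quorum,
                    max_timeout, poll_interval=0.05):
    """Poll until every party has replied or `max_timeout` seconds elapse,
    then report whether the replies received so far reach the quorum."""
    deadline = time.monotonic() + max_timeout
    while time.monotonic() < deadline:
        if count_replies() >= num_registered:
            break  # everyone answered, no need to keep waiting
        time.sleep(poll_interval)
    return count_replies() / num_registered >= perc_quorum

# Only 1 of 4 parties ever replies, as in the failed-quorum run above.
failed = wait_for_quorum(lambda: 1, num_registered=4, perc_quorum=0.5, max_timeout=0.2)
# 2 of 4 replies is enough for a 0.5 quorum.
passed = wait_for_quorum(lambda: 2, num_registered=4, perc_quorum=0.5, max_timeout=0.2)
print(failed, passed)  # False True
```

In the first call the loop runs until the timeout expires and then returns `False`, mirroring the `False` returned by `start_training()` above.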

## Shut Down

Finally, we invoke the `stop()` method on the Aggregator and on the last remaining party.

In [16]:
aggregator.stop()

2022-03-11 18:40:37,369 | 1.0.6 | INFO | ibmfl.aggregator.protohandler.proto_handler   | State: States.SND_REQ
2022-03-11 18:40:37,391 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Request received for path :14
2022-03-11 18:40:37,393 | 1.0.6 | INFO | ibmfl.party.party_protocol_handler            | Received request from aggregator
2022-03-11 18:40:37,394 | 1.0.6 | INFO | ibmfl.party.party_protocol_handler            | Received request in with message_type:  14
2022-03-11 18:40:37,395 | 1.0.6 | INFO | ibmfl.party.party_protocol_handler            | Received request in PH 14
2022-03-11 18:40:37,397 | 1.0.6 | INFO | ibmfl.party.party_protocol_handler            | received a STOP request
2022-03-11 18:40:37,400 | 1.0.6 | INFO | werkzeug                                      | 127.0.0.1 - - [11/Mar/2022 18:40:37] "POST /14 HTTP/1.1" 200 -
2022-03-11 18:40:37,404 | 1.0.6 | INFO | ibmfl.aggregator.protohandler.proto_handler   | Total number of success responses :1
2022-03-11

In [17]:
partyList[3].stop()

2022-03-11 18:40:41,669 | 1.0.6 | INFO | ibmfl.connection.flask_connection             | Stopping Receiver and Sender
2022-03-11 18:40:41,689 | 1.0.6 | INFO | werkzeug                                      | 127.0.0.1 - - [11/Mar/2022 18:40:41] "POST /shutdown HTTP/1.1" 200 -
2022-03-11 18:40:41,692 | 1.0.6 | INFO | ibmfl.party.party                             | Party stop successful
