# Part 2 - WML Federated Learning with MNIST for Party 

### Learning Goals

When you complete the Part 2 - WML Federated Learning with MNIST for Party, you should know how to:

- Load the data that you intend to use in the Federated Learning experiment.
- Install IBM Federated Learning libraries.
- Define a data handler. For more details on data handlers, see <a href = "https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fl-cus-dh.html?audience=wdp&context=cpdaas" target="_blank" rel="noopener noreferrer">Customizing the data handler</a>.
- Configure the party to train data with the aggregator.

<div class="alert alert-block alert-info">This notebook is intended to be run by the administrator or connecting party of the Federated Learning experiment.</div>

## Table of Contents

1. [Load the data](#load)
2. [Install Federated Learning libraries](#install)
3. [Define a Data Handler](#data-handler)
4. [Configure the party](#config)
5. [Connect and train with Federated Learning](#train)
6. [Summary](#summary)

<div class="alert alert-block alert-warning">Before you run this notebook, you must have already run Part 1 - WML Federated Learning with MNIST for Admin. If you have not, open the notebook and run through that notebook first.</div>

<a id = "load"></a>
## 1. Load the data

### Paste Variables From Admin Notebook

Paste in the credentials and IDs that you obtained at the end of the Part 1 notebook. If you have not run through Part 1, open that notebook and run through it first.

In [1]:
WML_SERVICES_HOST = 'us-south.ml.cloud.ibm.com'
IAM_APIKEY = 'xxx'
RTS_ID = 'xxx'
TRAINING_ID = 'xxx'
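
Before continuing, it can help to confirm that every placeholder was replaced. The following is a minimal sketch and not part of the original notebook: the helper name `find_unset` is hypothetical, and it assumes that an unset value is either empty or still the literal `'xxx'`.

```python
def find_unset(values, placeholder="xxx"):
    """Return the names whose value is empty or still the placeholder."""
    return sorted(
        name for name, value in values.items()
        if not value or value == placeholder
    )


# Stand-in values for illustration; in the notebook you would pass the real variables.
example = {
    "IAM_APIKEY": "xxx",
    "RTS_ID": "xxx",
    "TRAINING_ID": "example-training-id",
}
unset = find_unset(example)
if unset:
    print("Still unset:", ", ".join(unset))
```

Passing a dictionary keeps the check independent of the notebook's global variables, so it can be reused for any set of pasted values.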

<a id = "1.1"></a>
### 1.1 Download MNIST handwritten digits dataset

As the party, you must provide the dataset that you will use to train the Federated Learning model. In this tutorial, the MNIST handwritten digits dataset is provided by default.

In [2]:
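import requests

# Download the MNIST archive, following redirects to the actual file
dataset_resp = requests.get("https://api.dataplatform.cloud.ibm.com/v2/gallery-assets/entries/903188bb984a30f38bb889102a1baae5/data",
                            allow_redirects=True)
dataset_resp.raise_for_status()  # stop here if the download failed

# The context manager closes the file even if the write fails
with open('MNIST-pkl.zip', 'wb') as f:
    f.write(dataset_resp.content)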

In [3]:
import zipfile

# Extract the archive into the current working directory
with zipfile.ZipFile("MNIST-pkl.zip", "r") as archive:
    archive.extractall()

!ls -lh

total 64M
drwxr-x--- 2 wsuser watsonstudio 4.0K Dec  4 18:49 __MACOSX
-rw-r----- 1 wsuser watsonstudio  13K Dec  4 18:49 mnist-keras-test-payload.json
-rw-r----- 1 wsuser watsonstudio 7.5M Dec  4 18:49 mnist-keras-test.pkl
-rw-r----- 1 wsuser watsonstudio  38M Dec  4 18:49 mnist-keras-train.pkl
-rw-r----- 1 wsuser watsonstudio 7.5M Dec  4 18:49 mnist-keras-valid.pkl
-rw-r----- 1 wsuser watsonstudio  11M Dec  4 18:49 MNIST-pkl.zip
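
To confirm that the archive extracted as expected, you can peek inside one of the pickle files. This is a minimal sketch under stated assumptions: it assumes, as the data handler in section 3 does, that each `.pkl` file holds an `(x, y)` pair, and the helper name `describe_split` is hypothetical. Only unpickle files that come from a source you trust.

```python
import os
import pickle


def describe_split(path):
    """Load an (x, y) pair from a pickle file and report basic sizes."""
    with open(path, "rb") as f:
        x, y = pickle.load(f)
    if len(x) != len(y):
        raise ValueError(f"{path}: {len(x)} samples but {len(y)} labels")
    # NumPy arrays expose .shape; plain sequences report None instead
    return {"samples": len(x), "x_shape": getattr(x, "shape", None)}


# Only inspect the file if it is present in the working directory
if os.path.exists("mnist-keras-train.pkl"):
    print(describe_split("mnist-keras-train.pkl"))
```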


<a id = "install"></a>
## 2. Install Federated Learning libraries

In this section, you install the libraries and other packages needed to run Federated Learning with the Python client.

<a id = "2.1"></a>
### 2.1 Install the IBM WML SDK with FL

This installs the IBM Watson Machine Learning Python SDK, which includes the Federated Learning package.

In [4]:
!pip install --upgrade ibm-watson-machine-learning



<a id = "2.2"></a>
### 2.2 Install the libraries

In [5]:
!pip install environs parse websockets jsonpickle pandas pytest pyYAML requests pathlib2 psutil setproctitle tabulate lz4 opencv-python gym ray==0.8.0 cloudpickle==1.3.0 image

Collecting environs
  Downloading environs-9.2.0-py2.py3-none-any.whl (11 kB)
Collecting parse
  Downloading parse-1.18.0.tar.gz (30 kB)
Collecting websockets
  Downloading websockets-8.1-cp37-cp37m-manylinux2010_x86_64.whl (79 kB)
Collecting jsonpickle
  Downloading jsonpickle-1.4.2-py2.py3-none-any.whl (36 kB)
Collecting pathlib2
  Downloading pathlib2-2.3.5-py2.py3-none-any.whl (18 kB)
Collecting psutil
  Downloading psutil-5.7.3.tar.gz (465 kB)
Collecting setproctitle
  Downloading setproctitle-1.2.1-cp37-cp37m-manylinux1_x86_64.whl (36 kB)
Collecting lz4
  Downloading lz4-3.1.1-cp37-cp37m-manylinux2010_x86_64.whl (1.8 MB)
Collecting opencv-python
  Downloading opencv_python-4.4.0.46-cp37-cp37m-manylinux2014_x86_64.whl (49.5 MB)

<a id = "2.3"></a>
### 2.3 Install the frameworks

In [6]:
!pip install tensorflow==2.1.0 scikit-learn==0.23.1 keras==2.2.4 numpy==1.17.4 scipy==1.4.1 

Collecting keras==2.2.4
  Downloading Keras-2.2.4-py2.py3-none-any.whl (312 kB)
Collecting numpy==1.17.4
  Downloading numpy-1.17.4-cp37-cp37m-manylinux1_x86_64.whl (20.0 MB)
Collecting scipy==1.4.1
  Downloading scipy-1.4.1-cp37-cp37m-manylinux1_x86_64.whl (26.1 MB)
Installing collected packages: numpy, scipy, keras
  Attempting uninstall: numpy
    Found existing installation: numpy 1.18.5
    Uninstalling numpy-1.18.5:
      Successfully uninstalled numpy-1.18.5
  Attempting uninstall: scipy
    Found existing installation: scipy 1.5.0
    Uninstalling scipy-1.5.0:
      Successfully uninstalled scipy-1.5.0
Successfully installed keras-2.2.4 numpy-1.17.4 scipy-1.4.1
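
Because the frameworks are pinned to exact versions, it can be useful to confirm what actually ended up installed. The sketch below is an illustration rather than part of the original notebook: the helper name `check_pins` is hypothetical, and on Python 3.7 it assumes that the `importlib_metadata` backport is available.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:
    # Python 3.7: assumes the importlib_metadata backport is installed
    import importlib_metadata as metadata


def check_pins(pins):
    """Return {package: (expected, installed)} for every version mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


# The versions pinned in the install cell above
PINS = {
    "tensorflow": "2.1.0",
    "scikit-learn": "0.23.1",
    "keras": "2.2.4",
    "numpy": "1.17.4",
    "scipy": "1.4.1",
}
for name, (expected, installed) in check_pins(PINS).items():
    print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package is present at the requested version.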


<a id = "2.4"></a>
### 2.4 Import the Party

The following code imports the package for the party, and ensures that it is loaded.

In [7]:
import ibmfl.party_env_validator
from ibmfl.party.party import Party

No module named 'scikit-learn' Required version is  0.23.1
No module named 'PyYAML'


Using TensorFlow backend.


No module named 'diffprivlib'
No module named 'opencv-python'


<a id = "data-handler"></a>
## 3. Define a Data Handler

Each party should run a data handler to ensure that its dataset is in a compatible, consistent format. In this tutorial, an example data handler for the MNIST dataset is provided.

For more details on data handlers, see <a href = "https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fl-cus-dh.html?audience=wdp&context=cpdaas" target="_blank" rel="noopener noreferrer">Customizing the data handler</a>.

In [8]:
%%writefile mnist_keras_data_handler.py
# This data handler is written to the local working directory of this notebook
import logging
import os
import pickle

import numpy as np

from ibmfl.data.data_handler import DataHandler
from ibmfl.util.datasets import load_mnist

logger = logging.getLogger(__name__)


class MnistTFDataHandler(DataHandler):
    """
    Data handler for MNIST dataset.
    """

    def __init__(self, data_config=None, channels_first=False):
        super().__init__()
        # Paths to the pickled train and test sets, if the party config provides them
        self.train_file_name = None
        self.test_file_name = None
        if data_config is not None:
            if 'train_file' in data_config:
                self.train_file_name = data_config['train_file']
            if 'test_file' in data_config:
                self.test_file_name = data_config['test_file']

    def get_data(self, nb_points=500):
        """
        Gets pre-processed MNIST training and testing data. If the configured
        files are not available, falls back to the bundled MNIST loader and
        keeps only nb_points data points per set to make testing faster.

        :param nb_points: Number of data points to be included in each set
        :type nb_points: `int`
        :return: training data and testing data
        :rtype: `tuple`
        """
        files = (self.train_file_name, self.test_file_name)
        if any(name is None or not os.path.isfile(name) for name in files):
            logger.warning('Configured data files not found; using load_mnist()')
            (x_train, y_train), (x_test, y_test) = load_mnist()
            # Reduce datapoints to make test faster
            x_train = x_train[:nb_points]
            y_train = y_train[:nb_points]
            x_test = x_test[:nb_points]
            y_test = y_test[:nb_points]
        else:
            try:
                # Each pickle file is expected to hold an (x, y) pair
                with open(self.train_file_name, 'rb') as f:
                    (x_train, y_train) = pickle.load(f)
                with open(self.test_file_name, 'rb') as f:
                    (x_test, y_test) = pickle.load(f)
                logger.info(
                    'Loaded training data from ' + str(self.train_file_name))
            except Exception as ex:
                raise IOError('Unable to load training data from path '
                              'provided in config file: ' +
                              str(self.train_file_name)) from ex

        # Add a channels dimension
        x_train = x_train[..., np.newaxis]
        x_test = x_test[..., np.newaxis]

        print('x_train shape:', x_train.shape)
        print(x_train.shape[0], 'train samples')
        print(x_test.shape[0], 'test samples')

        return (x_train, y_train), (x_test, y_test)

Writing mnist_keras_data_handler.py
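
A data handler is expected to return its data as `(x_train, y_train), (x_test, y_test)`. The following is a minimal sketch of a structural check you could run on that return value before connecting to the aggregator. The helper name `check_splits` is hypothetical and not part of the IBM Federated Learning library.

```python
def check_splits(data):
    """Validate the ((x_train, y_train), (x_test, y_test)) structure.

    Returns the number of training and test samples, or raises ValueError
    if a split is empty or its samples and labels differ in length.
    """
    (x_train, y_train), (x_test, y_test) = data
    for split, x, y in (("train", x_train, y_train), ("test", x_test, y_test)):
        if len(x) == 0:
            raise ValueError(f"{split} split is empty")
        if len(x) != len(y):
            raise ValueError(
                f"{split} split has {len(x)} samples but {len(y)} labels")
    return len(x_train), len(x_test)


# Tiny stand-in data; real MNIST arrays work the same way because only len() is used
print(check_splits((([0.1, 0.2], [7, 3]), ([0.3], [1]))))
```

In the notebook, you could pass it the return value of the handler's `get_data()` method once `ibmfl` is installed.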


### Verify Data Handler Exists

In [9]:
!ls -lh

total 64M
drwxr-x--- 2 wsuser watsonstudio 4.0K Dec  4 18:49 __MACOSX
-rw-r----- 1 wsuser watsonstudio 2.5K Dec  4 18:50 mnist_keras_data_handler.py
-rw-r----- 1 wsuser watsonstudio  13K Dec  4 18:49 mnist-keras-test-payload.json
-rw-r----- 1 wsuser watsonstudio 7.5M Dec  4 18:49 mnist-keras-test.pkl
-rw-r----- 1 wsuser watsonstudio  38M Dec  4 18:49 mnist-keras-train.pkl
-rw-r----- 1 wsuser watsonstudio 7.5M Dec  4 18:49 mnist-keras-valid.pkl
-rw-r----- 1 wsuser watsonstudio  11M Dec  4 18:49 MNIST-pkl.zip


<a id = "config"></a>
## 4. Configure the party

Each party must run its party configuration to call out to the aggregator. Here is an example of a party configuration.

Because you have already defined the training ID, RTS ID, and data handler in the previous sections of this notebook, and the local training and protocol handlers are defined by the SDK, you only need to define the information for the dataset file under `["data"]["info"]`.

In this tutorial, the data path is already defined because you loaded the example MNIST dataset in the previous sections.

In [10]:
# Resolve the notebook's working directory so that the data handler path is absolute
working_dir = !pwd
pwd = working_dir[0]

party_config = {
  "aggregator": {
    "ip": WML_SERVICES_HOST + "/ml/v4/trainings/" + TRAINING_ID
  },
  "connection": {
    "info": {
      "id": RTS_ID,
    }
  },
  "data": {
    "info": {
      "train_file": "/mnist-keras-train.pkl",
      "test_file": "/mnist-keras-test.pkl"
    },
    "name": "MnistTFDataHandler",
    "path": pwd + "/mnist_keras_data_handler.py"
  },
  "local_training": {
    "name": "LocalTrainingHandler",
    "path": "ibmfl.party.training.local_training_handler"
  },
  "protocol_handler": {
    "name": "PartyProtocolHandler",
    "path": "ibmfl.party.party_protocol_handler"
  }
}

In [11]:
print(party_config)

{'aggregator': {'ip': 'us-south.ml.cloud.ibm.com/ml/v4/trainings/897e6381-cda3-43aa-bd5f-159e6d988f03'}, 'connection': {'info': {'id': '21c0b9e8-2a9d-45d6-a928-47e69ebdab39'}}, 'data': {'info': {'train_file': '/mnist-keras-train.pkl', 'test_file': '/mnist-keras-test.pkl'}, 'name': 'MnistTFDataHandler', 'path': '/home/wsuser/work/mnist_keras_data_handler.py'}, 'local_training': {'name': 'LocalTrainingHandler', 'path': 'ibmfl.party.training.local_training_handler'}, 'protocol_handler': {'name': 'PartyProtocolHandler', 'path': 'ibmfl.party.party_protocol_handler'}}
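
Before connecting, you may want to confirm that the files named in the configuration exist on disk. This is a minimal sketch, not part of the SDK: the helper name `missing_files` is hypothetical, and it assumes that every value under `["data"]["info"]` is a file path.

```python
import os


def missing_files(config):
    """Return the configured data paths that do not exist on disk."""
    data = config.get("data", {})
    # The handler module itself, followed by every dataset file it is given
    candidates = [data.get("path")] + list(data.get("info", {}).values())
    return [path for path in candidates if path and not os.path.isfile(path)]


# Stand-in configuration for illustration; in the notebook, pass party_config
example_config = {
    "data": {
        "info": {"train_file": "no-such-train-file.pkl"},
        "path": "no-such-handler.py",
    }
}
for path in missing_files(example_config):
    print("Not found:", path)
```

An empty list means that the handler module and all dataset files were found.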


<a id = "train"></a>
## 5. Connect and train with Federated Learning

You can now connect to the aggregator and begin training.

#### Obtain Cloud Authentication Token

In [None]:
from ibm_watson_machine_learning import APIClient


wml_credentials = {
    "apikey": IAM_APIKEY,
    "url": "https://" + WML_SERVICES_HOST
}

wml_client = APIClient(wml_credentials)
IAMTOKEN = "Bearer " + wml_client.wml_token
print(IAMTOKEN)
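
Access tokens expire, so a long-running party may need a fresh one. The sketch below estimates how long a token remains valid. It rests on an assumption that this notebook does not confirm: that the token is a JSON Web Token (JWT) carrying an `exp` claim. The helper name `token_seconds_left` is hypothetical, and the signature is not verified.

```python
import base64
import json
import time


def token_seconds_left(bearer_token, now=None):
    """Return the seconds until the JWT's `exp` claim (signature not verified)."""
    token = bearer_token.split(" ", 1)[-1]  # drop a leading "Bearer " if present
    payload_b64 = token.split(".")[1]       # header.payload.signature
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    current = time.time() if now is None else now
    return payload["exp"] - current
```

In the notebook, `token_seconds_left(IAMTOKEN)` would give the remaining lifetime in seconds if the assumption holds. A `KeyError` or decoding error indicates that the token does not have the assumed structure.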


### 5.1 Establish Connection To Aggregator

In [13]:
p = Party( config_dict = party_config, token = IAMTOKEN )

2020-12-04 18:50:31,582 | 1.0.0 | INFO | ibmfl.util.config                                  | No model config provided for this setup.
2020-12-04 18:50:31,583 | 1.0.0 | INFO | ibmfl.util.config                                  | No fusion config provided for this setup.
2020-12-04 18:50:31,587 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | Websockets Sender initialized
2020-12-04 18:50:31,589 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | WSConnection : Initialize Party Communications
2020-12-04 18:50:31,590 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | **** PartySendLoopThread
2020-12-04 18:50:31,591 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | PartySendLoop: Holding for message to send
2020-12-04 18:50:31,592 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | **** PartyRecvLoopThread
2020-12-04 18:50:31,593 | 1.0.0 | INFO | ibmfl.party.party                               

After the message "Received Heartbeat from Aggregator" appears in the log, the party is ready to start.

### 5.2 Start Training

In [14]:
p.start()

2020-12-04 18:50:31,598 | 1.0.0 | INFO | ibmfl.party.party                                  | Party not registered yet.
2020-12-04 18:50:31,598 | 1.0.0 | INFO | ibmfl.party.party                                  | Registering party...
2020-12-04 18:50:31,599 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | Sending serialized message to aggregator
2020-12-04 18:50:31,600 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | PartySendLoop: Number of active messages ready to send: 1
2020-12-04 18:50:32,823 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | Received Heartbeat from Aggregator
2020-12-04 18:50:35,310 | 1.0.0 | INFO | ibmfl.party.party_protocol_handler                 | received a async request
2020-12-04 18:50:35,312 | 1.0.0 | INFO | ibmfl.party.party_protocol_handler                 | finished async request
2020-12-04 18:50:35,312 | 1.0.0 | INFO | ibmfl.party.party_protocol_handler                 | Handling async requ
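
The log lines above follow a `timestamp | version | level | logger | message` layout. If you capture them as text, a small parser can pick out the heartbeat message. This is a minimal sketch based only on the layout visible in this output; the helper names `parse_log_line` and `saw_heartbeat` are hypothetical.

```python
HEARTBEAT = "Received Heartbeat from Aggregator"


def parse_log_line(line):
    """Split one log line into its five fields, or return None if it is incomplete."""
    parts = [part.strip() for part in line.split("|", 4)]
    if len(parts) != 5:
        return None  # truncated or non-log line
    keys = ("timestamp", "version", "level", "logger", "message")
    return dict(zip(keys, parts))


def saw_heartbeat(lines):
    """Return True if any line reports a heartbeat from the aggregator."""
    for line in lines:
        entry = parse_log_line(line)
        if entry is not None and entry["message"] == HEARTBEAT:
            return True
    return False


# Two lines copied from the output above
sample = [
    "2020-12-04 18:50:31,598 | 1.0.0 | INFO | ibmfl.party.party                                  | Registering party...",
    "2020-12-04 18:50:32,823 | 1.0.0 | INFO | ibmfl.connection.websockets_connection             | Received Heartbeat from Aggregator",
]
print(saw_heartbeat(sample))
```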

<a id = "summary"></a>
## 6. Summary

Congratulations! Across Part 1 and Part 2 of this tutorial, you have learned to:

1. Start a Federated Learning experiment
2. Load a template model
3. Create an RTS and launch the experiment job
4. Load a dataset for training
5. Define the data handler
6. Configure the party
7. Connect to the aggregator
8. Train your Federated Learning model

### Learn more

- For more details about setting up Federated Learning, terminology, and running Federated Learning from the UI, see the <a href = "https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fed-lea.html?audience=wdp" target="_blank" rel="noopener noreferrer">Federated Learning documentation</a> for Cloud.
- For more information on a Keras model template, see the <a href = "https://www.tensorflow.org/tutorials/quickstart/advanced" target="_blank" rel="noopener noreferrer">TensorFlow quickstart tutorial</a>.

<hr>
Copyright © 2020 IBM. This notebook and its source code are released under the terms of the MIT License.