# WML Federated Learning with MNIST for Party using `ibm-watsonx-ai`.



### Learning Goals

When you complete this notebook, you should know how to:

- Load the data that you intend to use in the Federated Learning experiment.
- Install IBM Federated Learning libraries.
- Define a data handler.
- Configure the party to train data with the aggregator.

<div class="alert alert-block alert-info">This notebook is intended to be run by the administrator or connecting party of the Federated Learning experiment.
</div>

## Table of Contents

1. [Setup](#setup)
2. [Load the data](#load)  
3. [Define a Data Handler](#data-handler)  
4. [Configure the party](#config)  
5. [Clean up](#clean)
6. [Summary and next steps](#summary)

<a id="setup"></a>
## 1. Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Create a <a href="https://cloud.ibm.com/catalog/services/watson-machine-learning" target="_blank" rel="noopener noreferrer">Watson Machine Learning (WML) Service</a> instance. A free plan is offered, and instructions for creating the instance are available <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/wml-plans.html?context=wx&audience=wdp" target="_blank" rel="noopener noreferrer">here</a>.

### Install and import `ibm-watsonx-ai` and its dependencies
**Note:** `ibm-watsonx-ai` documentation can be found <a href="https://ibm.github.io/watsonx-ai-python-sdk/index.html" target="_blank" rel="noopener noreferrer">here</a>.

In [None]:
!pip install -U "ibm-watsonx-ai[fl]"

### Connection to WML

Authenticate with the Watson Machine Learning service on IBM Cloud. You need to provide your platform `api_key` and instance `location`.

You can use the [IBM Cloud CLI](https://cloud.ibm.com/docs/cli/index.html) to retrieve the platform API key and instance location.

You can generate an API key as follows:
```
ibmcloud login
ibmcloud iam api-key-create API_KEY_NAME
```

From the output, copy the value of `api_key`.


You can retrieve the location of your WML instance as follows:
```
ibmcloud login --apikey API_KEY -a https://cloud.ibm.com
ibmcloud resource service-instance WML_INSTANCE_NAME
```

From the output, copy the value of `location`.

**Tip**: Your `Cloud API key` can be generated by going to the [**Users** section of the Cloud console](https://cloud.ibm.com/iam#/users). From that page, click your name, scroll down to the **API Keys** section, and click **Create an IBM Cloud API key**. Give your key a name and click **Create**, then copy the created key and paste it below. You can also get a service-specific URL by going to the [**Endpoint URLs** section of the Watson Machine Learning docs](https://cloud.ibm.com/apidocs/machine-learning). You can check your instance location in your <a href="https://cloud.ibm.com/catalog/services/watson-machine-learning" target="_blank" rel="noopener noreferrer">Watson Machine Learning (WML) Service</a> instance details.

You can also get a service-specific API key by going to the [**Service IDs** section of the Cloud Console](https://cloud.ibm.com/iam/serviceids). From that page, click **Create**, then copy the created key and paste it below.

**Action**: Enter your `api_key` and `location` in the following cell.

In [None]:
api_key = 'PASTE YOUR PLATFORM API KEY HERE'
location = 'PASTE YOUR INSTANCE LOCATION HERE'

In [None]:
from ibm_watsonx_ai import Credentials

credentials = Credentials(
    api_key=api_key,
    url='https://' + location + '.ml.cloud.ibm.com'
)
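
The URL above is built by plain string concatenation, so a stray space or an unfilled placeholder in `location` only surfaces later as a connection error. A minimal sketch of an up-front check follows. The helper name `build_wml_url` and the character pattern it accepts are assumptions made for illustration; they are not part of `ibm-watsonx-ai`.

```python
import re


def build_wml_url(location: str) -> str:
    """Build the service URL used in this notebook from a region name.

    Hypothetical helper: the accepted pattern (lowercase letters, digits,
    hyphens) is an assumption, not a rule taken from the IBM documentation.
    """
    location = location.strip()
    if not re.fullmatch(r"[a-z0-9-]+", location):
        raise ValueError(f"Unexpected location value: {location!r}")
    return "https://" + location + ".ml.cloud.ibm.com"


print(build_wml_url("us-south"))
```

With a helper like this, the `url` argument of `Credentials` could be written as `build_wml_url(location)`, and an untouched placeholder would raise a `ValueError` before any request is sent.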

In [2]:
from ibm_watsonx_ai import APIClient

client = APIClient(credentials)



**Action**: Assign your project ID below.

In [None]:
project_id = 'PASTE YOUR PROJECT ID HERE'

In [4]:
client.set.default_project(project_id)

'SUCCESS'

<a id="load"></a>
## 2. Load the data

### Paste Variables From Admin Notebook

Paste in the IDs you got from the end of the Part I notebook. If you have not run through Part I, open that notebook and run through it first.

In [1]:
RTS_ID = 'PASTE REMOTE SYSTEM ID FROM PART I NOTEBOOK'
TRAINING_ID = 'PASTE TRAINING ID FROM PART I NOTEBOOK'
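
Because both IDs are pasted by hand, it is easy to run the later cells with a placeholder still in place. The sketch below flags values that are empty or still begin with the word `PASTE`, which is how the placeholders in this notebook are written. The function name `unfilled_placeholders` and the dummy value `0123abcd` are hypothetical and used only for the demonstration.

```python
def unfilled_placeholders(**values: str) -> list[str]:
    """Return the names of values that are empty or still hold a 'PASTE ...' placeholder."""
    return [
        name
        for name, value in values.items()
        if not value.strip() or value.strip().upper().startswith("PASTE")
    ]


# Demo with one dummy value and one untouched placeholder.
print(unfilled_placeholders(
    RTS_ID="0123abcd",
    TRAINING_ID="PASTE TRAINING ID FROM PART I NOTEBOOK",
))
```

In the notebook, calling `unfilled_placeholders(RTS_ID=RTS_ID, TRAINING_ID=TRAINING_ID)` should return an empty list before you continue.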

### Download MNIST handwritten digits dataset

As the party, you must provide the dataset that you will use to train the Federated Learning model. This tutorial provides one by default: the MNIST handwritten digits dataset.

In [2]:
import requests

dataset_resp = requests.get("https://api.dataplatform.cloud.ibm.com/v2/gallery-assets/entries/903188bb984a30f38bb889102a1baae5/data",
                            allow_redirects=True)
# Fail early on an HTTP error instead of saving an error page as the archive
dataset_resp.raise_for_status()

with open('MNIST-pkl.zip', 'wb') as f:
    f.write(dataset_resp.content)

In [None]:
import zipfile

with zipfile.ZipFile("MNIST-pkl.zip", "r") as archive:
    archive.extractall()

!ls -lh
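
Instead of reading the `ls` listing by eye, you can ask the archive itself whether it contains the files that the party configuration refers to later. The following is a minimal sketch; the helper name `missing_members` is hypothetical, and the demo runs on a small in-memory archive with dummy content so that it needs neither the network nor the real download.

```python
import io
import zipfile


def missing_members(archive, expected):
    """Return the expected file names that are absent from a zip archive.

    `archive` may be a path or a file-like object, as accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(archive, "r") as zf:
        present = set(zf.namelist())
    return [name for name in expected if name not in present]


# Self-contained demo on an in-memory archive holding a single dummy file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("mnist-keras-train.pkl", b"dummy")
buffer.seek(0)

demo_missing = missing_members(buffer, ["mnist-keras-train.pkl", "mnist-keras-test.pkl"])
print(demo_missing)
```

In the notebook, `missing_members("MNIST-pkl.zip", ["mnist-keras-train.pkl", "mnist-keras-test.pkl"])` should return an empty list. If the download did not produce a valid archive, `zipfile.ZipFile` raises `zipfile.BadZipFile` instead.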

<a id="data-handler"></a>
## 3. Define a Data Handler

The party should run a data handler to ensure that its datasets are consistent and in a compatible format. In this tutorial, an example data handler for the MNIST dataset is provided.

The following cell writes this data handler to the local working directory of this notebook.

In [5]:
import requests

data_handler_content_resp = requests.get("https://github.com/IBMDataScience/sample-notebooks/raw/master/Files/mnist_keras_data_handler.py",
                                         headers={"Content-Type": "application/octet-stream"},
                                         allow_redirects=True)
# Fail early on an HTTP error instead of saving an error page as the handler
data_handler_content_resp.raise_for_status()

with open('mnist_keras_data_handler.py', 'wb') as f:
    f.write(data_handler_content_resp.content)

### Verify Data Handler Exists

In [6]:
!ls -lh

total 153480
-rw-r--r--@ 1 Rinay.Shah@ibm.com  staff    19K Nov 19 09:58 Federated Learning Demo Part 1 - for Admin.ipynb
-rw-r--r--@ 1 Rinay.Shah@ibm.com  staff    32K Nov 18 13:07 Federated Learning Demo Part 2 - for Party.ipynb
-rw-r--r--  1 Rinay.Shah@ibm.com  staff    11M Nov 19 10:00 MNIST-pkl.zip
drwxr-xr-x  6 Rinay.Shah@ibm.com  staff   192B Nov 18 13:05 __MACOSX
-rw-r--r--  1 Rinay.Shah@ibm.com  staff    12K Nov 19 10:00 mnist-keras-test-payload.json
-rw-r--r--  1 Rinay.Shah@ibm.com  staff   7.5M Nov 19 10:00 mnist-keras-test.pkl
-rw-r--r--  1 Rinay.Shah@ibm.com  staff    37M Nov 19 10:00 mnist-keras-train.pkl
-rw-r--r--  1 Rinay.Shah@ibm.com  staff   7.5M Nov 19 10:00 mnist-keras-valid.pkl
-rw-r--r--  1 Rinay.Shah@ibm.com  staff   2.2K Nov 19 10:00 mnist_keras_data_handler.py
-rw-r--r--  1 Rinay.Shah@ibm.com  staff   9.7M Nov 19 09:58 tf_mnist_model.zip
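
A file listing confirms only that a file with the right name exists. The party configuration also refers to the handler by class name (`MnistTFDataHandler`), so it can help to confirm that the downloaded file parses as Python and defines that class. The sketch below uses the standard `ast` module; the helper name `defines_class` is hypothetical, and it assumes the class is defined at the top level of the module.

```python
import ast


def defines_class(source: str, class_name: str) -> bool:
    """Return True if `source` parses as Python and defines a top-level class `class_name`."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # For example, an HTML error page saved under a .py name.
        return False
    return any(
        isinstance(node, ast.ClassDef) and node.name == class_name
        for node in tree.body
    )


# Demo on inline strings; in the notebook, read mnist_keras_data_handler.py instead.
sample = "class MnistTFDataHandler:\n    pass\n"
print(defines_class(sample, "MnistTFDataHandler"))
print(defines_class("<html>Not Found</html>", "MnistTFDataHandler"))
```

In the notebook, you could pass the file contents, for example `defines_class(open("mnist_keras_data_handler.py").read(), "MnistTFDataHandler")`.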


<a id="config"></a>
## 4. Configure the party

Here you can finally connect to the aggregator to begin training.

Each party must run their party configuration file to call out to the aggregator. Here is an example of a party configuration.

Because you have already defined the training ID, RTS ID, and data handler in the previous sections of this notebook, and the local training and protocol handler are defined by the SDK, you only need to define the information for the dataset files under `["data"]["info"]`.

In this tutorial, the data path is already known because the example MNIST dataset was loaded in a previous section.

In [8]:
from pathlib import Path

# Absolute path of the notebook's working directory, where the dataset was extracted
pwd = str(Path.cwd())

party_metadata = {
    client.remote_training_systems.ConfigurationMetaNames.DATA_HANDLER: {
        "info": {
            "train_file": pwd + "/mnist-keras-train.pkl",
            "test_file": pwd + "/mnist-keras-test.pkl"
        },
        "name": "MnistTFDataHandler",
        "path": "./mnist_keras_data_handler.py"
    }
}
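
The configuration holds only path strings, so a wrong path is not noticed until the party starts. A small check run before connecting to the aggregator can catch that earlier. This is a sketch under the assumption that all referenced files are on the local file system; `find_missing_files` is a hypothetical helper name.

```python
from pathlib import Path


def find_missing_files(paths):
    """Return the entries of `paths` that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# Demo with a path that is not expected to exist.
print(find_missing_files(["/no/such/dir/mnist-keras-train.pkl"]))
```

In the notebook, you would pass the two paths under `"info"` together with the handler `"path"`, and expect an empty list in return.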

### Establish Connection To Aggregator and Start Training

In [9]:
party = client.remote_training_systems.create_party(RTS_ID, party_metadata)
party.run(aggregator_id=TRAINING_ID, asynchronous=False)

Using TensorFlow backend.


x_train shape: (50040, 28, 28, 1)
50040 train samples
10000 test samples

<a id="clean"></a>
## 5. Clean up

If you want to clean up all created assets:
- experiments
- trainings
- pipelines
- model definitions
- models
- functions
- deployments

please follow this sample [notebook](https://github.com/IBM/watson-machine-learning-samples/blob/master/cloud/notebooks/python_sdk/instance-management/Machine%20Learning%20artifacts%20management.ipynb).

<a id="summary"></a>
## 6. Summary and next steps

You successfully completed this notebook!  
Across the Part I and Part II notebooks, you have learned to:
1. Start a Federated Learning experiment
2. Load a template model
3. Create an RTS and launch the experiment job
4. Load a dataset for training
5. Define the data handler
6. Configure the party
7. Connect to the aggregator
8. Train your Federated Learning model  

Check out our _[Online Documentation](https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/welcome-main.html?context=wx)_ for more samples, tutorials, documentation, how-tos, and blog posts. 

### Author

**Rinay Shah**, Software Developer at IBM.

Copyright © 2020-2024 IBM. This notebook and its source code are released under the terms of the MIT License.