<div class="alert alert-block alert-warning"><strong>If you have this notebook as a local copy on your platform, it may become outdated. <a href="https://dataplatform.cloud.ibm.com/exchange/public/entry/view/029d77a73d72a4134c81383d6f01f1ed?context=cpdaas&audience=wdp">Download the latest version of this notebook</a> or download the <a href="https://dataplatform.cloud.ibm.com/exchange/public/entry/view/cab78523832431e767c41527a42a6727?context=cpdaas?context=cpdaas&audience=wdp">latest version of the project</a>.</strong></div>

# Part 1 - WML Federated Learning with XGBoost and Adult Income dataset - Aggregator 

With IBM Federated Learning, you can combine data from multiple sources to train a model on the collective data without the parties having to share the data itself. This allows enterprises to train models with data from other companies without dedicating resources to securing a shared dataset. Another advantage is that the remote data does not have to be centralized in one location, which eliminates the need to move potentially large datasets. This notebook demonstrates how to start Federated Learning from Python by calling the Watson Machine Learning REST API. For more details on setting up Federated Learning, terminology, and running Federated Learning from the UI, see the [Federated Learning documentation](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fed-lea.html?audience=wdp).

### Learning Goals

When you complete the Part 1 - WML Federated Learning with XGBoost and Adult Income dataset - Aggregator notebook, you should know how to:

- Create a Remote Training System
- Start a training job

Once you complete this notebook, please open [Part 2 - WML Federated Learning with XGBoost and Adult Income dataset - Party](https://dataplatform.cloud.ibm.com/). 

<div class="alert alert-block alert-info">This notebook is intended to be run by the administrator of the Federated Learning experiment.</div>

## Table of Contents

- [1. Prerequisites](#prequisites)
    - [1.1 Define variables](#var)
    - [1.2 Define tags](#tags)
    - [1.3 Import libraries](#libraries)
- [2. Obtain IBM Cloud Token](#auth)
- [3. Create a Remote Training System](#create-rts)
- [4. Create FL Training Job](#fl-job)
    - [4.1 Get Training Job Status](#status)
- [5. Get Variables And Paste Into Party Notebook](#party-notebook)
- [6. Save Trained Model](#save-model)
    - [6.1 COS connection](#cos)
    - [6.2 Install prerequisites](#cos-prereqs)
    - [6.3 Save model to project](#save-to-project)
- [7. Clean Up Project](#cleanup)
    - [7.1 List all training jobs](#list-jobs)
    - [7.2 Delete training jobs](#del-jobs)
    - [7.3 List all Remote Training Systems](#list-rts)
    - [7.4 Delete Remote Training Systems](#del-rts)

<a id = "prequisites"></a>
## 1. Prerequisites

Before you proceed, you need to have:

- An IAM API key. To create a new one, go to the [IBM Cloud homepage](https://cloud.ibm.com). In your account, go to **Manage > IAM > API Keys** and click **Create an IBM Cloud API Key**.

<a id = "var"></a>
### 1.1 Define variables

In [2]:
API_VERSION = "2023-02-28"

WML_SERVICES_HOST = "jp-tok.ml.cloud.ibm.com" # or "us-south.ml.cloud.ibm.com", "eu-de.ml.cloud.ibm.com", "eu-gb.ml.cloud.ibm.com"

WML_SERVICES_URL = "https://" + WML_SERVICES_HOST
IAM_TOKEN_URL = "https://iam.cloud.ibm.com/oidc/token"
 
IAM_APIKEY = ""  # paste your IAM API key here; do not share notebooks that contain a real key

# Get this from Manage > IAM > Users and check the URL. Your user ID has the format IBMid-<xxx>.
CLOUD_USERID = "IBMid-668000KU4X" 

PROJECT_ID = "2d943195-1222-476b-b14a-3bd688958446" # Get this by going into your WS project and checking the URL.
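The cell above hard-codes the API key, user ID and project ID, which is easy to leak when a notebook is shared. A safer pattern is to read them from environment variables. This is a minimal sketch under assumptions: the environment variable names (`IBM_CLOUD_APIKEY`, `IBM_CLOUD_USERID`, `WS_PROJECT_ID`) and the helper `load_settings` are hypothetical, and the set of regional WML hosts is an assumed list you may need to extend.

```python
import os

# Hypothetical environment variable names; rename them to suit your setup.
ENV_VARS = {
    "IAM_APIKEY": "IBM_CLOUD_APIKEY",
    "CLOUD_USERID": "IBM_CLOUD_USERID",
    "PROJECT_ID": "WS_PROJECT_ID",
}

# Assumed set of regional WML hosts; extend it if your region is missing.
KNOWN_WML_HOSTS = {
    "us-south.ml.cloud.ibm.com",
    "eu-de.ml.cloud.ibm.com",
    "eu-gb.ml.cloud.ibm.com",
    "jp-tok.ml.cloud.ibm.com",
}


def load_settings(host, environ=None):
    """Read the notebook's secrets from environment variables instead of hard-coding them."""
    environ = os.environ if environ is None else environ
    if host not in KNOWN_WML_HOSTS:
        raise ValueError("Unexpected WML host: %s" % host)
    # Report every missing variable at once rather than failing on the first one
    missing = [env for env in ENV_VARS.values() if not environ.get(env)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    settings = {name: environ[env] for name, env in ENV_VARS.items()}
    settings["WML_SERVICES_URL"] = "https://" + host
    return settings
```

For example, after exporting the variables before starting Jupyter, `settings = load_settings(WML_SERVICES_HOST)` followed by `IAM_APIKEY = settings["IAM_APIKEY"]` keeps the key out of the saved notebook.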

<a id = "tags"></a>
### 1.2 Define tags

These tags identify the assets created by this notebook. The clean-up cells in section 7 filter on them to find the assets to delete.

In [3]:
RTS_TAG = "wmlflxgbsamplerts"
TRAINING_TAG = "wmlflxgbsampletraining"
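Because the tags above are fixed strings, the tag-filtered clean-up in section 7 matches assets from every run of this notebook in the project, including runs by other users of a shared project. One option is a per-run tag. This is a sketch only: `make_run_tag` is a hypothetical helper, and it assumes the service accepts alphanumeric tag values of this length. Print the generated tags, since they are lost if the kernel restarts.

```python
import uuid


def make_run_tag(base_tag):
    """Hypothetical helper: append a random 8-character hex suffix to a base tag.

    A per-run tag means a tag-filtered cleanup only matches assets from this run.
    """
    return base_tag + uuid.uuid4().hex[:8]


# Example: per-run variants of the tags defined above
RTS_RUN_TAG = make_run_tag("wmlflxgbsamplerts")
TRAINING_RUN_TAG = make_run_tag("wmlflxgbsampletraining")
print(RTS_RUN_TAG, TRAINING_RUN_TAG)
```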

<a id = "libraries"></a>
### 1.3 Import libraries

In [4]:
import urllib3
import requests
import json
from string import Template

urllib3.disable_warnings()

<a id = "auth"></a>
## 2. Obtain Cloud authentication token

In [5]:
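# Exchange the IAM API key for a short-lived bearer token.
# Passing a dict lets requests form-encode (and URL-escape) the API key.
token_resp = requests.post(IAM_TOKEN_URL,
                           headers={"Content-Type": "application/x-www-form-urlencoded"},
                           data={"grant_type": "urn:ibm:params:oauth:grant-type:apikey",
                                 "apikey": IAM_APIKEY},
                           verify=True)

print(token_resp)
token_resp.raise_for_status()  # fail with a clear HTTP error instead of a KeyError below

token = "Bearer " + token_resp.json()["access_token"]
print("WS token: %s " % token)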

<Response [200]>
WS token: Bearer <redacted>
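IAM access tokens are short-lived (typically about an hour), so a long experiment can outlive the token obtained above. The sketch below reads the standard JWT `exp` claim from the token so you know when to re-run the token cell. It assumes the token is a JWT (as IBM Cloud IAM access tokens are) and deliberately does not verify the signature; the helpers `token_expiry` and `token_needs_refresh` are hypothetical names.

```python
import base64
import json
import time


def token_expiry(bearer_token):
    """Return the 'exp' claim (Unix seconds) of a JWT access token.

    The signature is NOT verified; this only reads claims to schedule a refresh.
    """
    jwt = bearer_token.split(" ", 1)[-1]      # drop the "Bearer " prefix if present
    payload_b64 = jwt.split(".")[1]           # JWT layout: header.payload.signature
    padding = "=" * (-len(payload_b64) % 4)   # base64url omits trailing padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64 + padding))
    return claims["exp"]


def token_needs_refresh(bearer_token, margin_s=300, now=None):
    """True if the token expires within margin_s seconds."""
    now = time.time() if now is None else now
    return token_expiry(bearer_token) - now < margin_s
```

For example, `token_needs_refresh(token)` returning `True` is a signal to re-run the token cell before the next request.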

<a id = "create-rts"></a>
## 3. Create Remote Training System Asset

Now you will create a Remote Training System (RTS). An RTS handles the calls that your parties make to the aggregator to run the training.
- `allowed_identities` lists the users permitted to connect to the Federated Learning experiment. In this tutorial, only your user ID is permitted to connect, but you can update the template and add additional users as required.
- `remote_admin` defines the admin, using the same template as a user. In this tutorial the admin is also your user ID; in general, however, the admin does not have to be one of the allowed users.
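To add more users to `allowed_identities`, it can be easier to build the RTS definition as a Python dict and serialize it with `json.dumps` than to edit the string template, which also guarantees valid JSON escaping. A minimal sketch, assuming the same fields as the template in the next cell; `build_rts_payload` is a hypothetical helper.

```python
import json


def build_rts_payload(project_id, tag, admin_id, allowed_user_ids, name="Remote Party 1"):
    """Build the RTS definition from the cell below as a dict, for any number of users."""
    payload = {
        "name": name,
        "project_id": project_id,
        "description": "Sample Remote Training System",
        "tags": [tag],
        "organization": {"name": "IBM", "region": "US"},
        # One identity entry per user allowed to connect to the experiment
        "allowed_identities": [{"id": uid, "type": "user"} for uid in allowed_user_ids],
        "remote_admin": {"id": admin_id, "type": "user"},
    }
    return json.dumps(payload)
```

For example, `build_rts_payload(PROJECT_ID, RTS_TAG, CLOUD_USERID, [CLOUD_USERID, other_user_id])`, where `other_user_id` is a hypothetical second IBMid, could be passed as `data=` in place of the template.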

In [6]:
wml_remote_training_system_asset_one_def = Template("""
{
  "name": "Remote Party 1",
  "project_id": "$projectId",
  "description": "Sample Remote Training System",
  "tags": [ "$tag" ],
  "organization": {
    "name": "IBM",
    "region": "US"
  },
  "allowed_identities": [
    {
      "id": "$userID",
      "type": "user"
    }
  ],
  "remote_admin": {
    "id": "$userID",
    "type": "user"
  }
}
""").substitute(userID = CLOUD_USERID,
                projectId = PROJECT_ID,
                tag = RTS_TAG)


wml_remote_training_system_one_resp = requests.post(WML_SERVICES_URL + "/ml/v4/remote_training_systems", 
                                                    headers={"Content-Type": "application/json",
                                                             "Authorization": token}, 
                                                    params={"version": API_VERSION,
                                                            "project_id": PROJECT_ID}, 
                                                    data=wml_remote_training_system_asset_one_def, 
                                                    verify=False)

print(wml_remote_training_system_one_resp)
status_json = wml_remote_training_system_one_resp.json()
print("Create remote training system response : " + json.dumps(status_json, indent=4))

wml_remote_training_system_one_asset_uid = status_json["metadata"]["id"]
print("Remote Training System id: %s" % wml_remote_training_system_one_asset_uid)

<Response [201]>
Create remote training system response : {
    "entity": {
        "allowed_identities": [
            {
                "id": "IBMid-668000KU4X",
                "type": "user"
            }
        ],
        "organization": {
            "name": "IBM",
            "region": "US"
        },
        "remote_admin": {
            "id": "IBMid-668000KU4X",
            "type": "user"
        }
    },
    "metadata": {
        "created_at": "2023-02-28T16:15:18.513Z",
        "description": "Sample Remote Training System",
        "id": "57077c88-9985-41ca-81c0-4f993605868d",
        "modified_at": "2023-02-28T16:15:18.513Z",
        "name": "Remote Party 1",
        "owner": "IBMid-668000KU4X",
        "project_id": "2d943195-1222-476b-b14a-3bd688958446",
        "tags": [
            "wmlflxgbsamplerts"
        ]
    }
}
Remote Training System id: 57077c88-9985-41ca-81c0-4f993605868d
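The cells in this notebook read `["metadata"]["id"]` straight from the response, so a failed request (for example an expired token) surfaces as a confusing `KeyError`. A small guard that checks the HTTP status first makes failures readable. This is a sketch: `created_id` is a hypothetical helper that works with a `requests.Response` (it only uses `status_code`, `text` and `.json()`), and the accepted status codes are an assumption based on the 201 responses shown here.

```python
def created_id(resp, expected=(200, 201)):
    """Return metadata.id from a WML create response, or raise with the error body."""
    if resp.status_code not in expected:
        raise RuntimeError("Request failed with HTTP %s: %s" % (resp.status_code, resp.text))
    return resp.json()["metadata"]["id"]
```

For example, `wml_remote_training_system_one_asset_uid = created_id(wml_remote_training_system_one_resp)`.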


<a id = "fl-job"></a>
## 4. Create FL Training Job

In this section, you will launch the Federated Learning experiment.
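With more than one party, the training definition lists one entry per RTS. The sketch below builds the same payload as the next cell from a dict. Our reading of the two remote-training fields, which you should confirm against the WML Federated Learning documentation, is that `quorum` is the fraction of listed parties that must connect before training proceeds and `required` marks a party whose participation is mandatory. `build_training_payload` is a hypothetical helper.

```python
import json


def build_training_payload(project_id, tag, rts_required, quorum=1.0, rounds=3):
    """Build the FL training definition used below.

    rts_required maps each Remote Training System id to its 'required' flag.
    """
    if not 0.0 < quorum <= 1.0:
        raise ValueError("quorum must be in (0, 1]")
    payload = {
        "name": "FL Aggregator",
        "tags": [tag],
        "federated_learning": {
            "fusion_type": "xgb_classifier",
            "learning_rate": 0.1,
            "loss": "binary_crossentropy",
            "max_bins": 255,
            "rounds": rounds,
            "num_classes": 2,
            "metrics": "loss",
            "remote_training": {
                "quorum": quorum,
                "remote_training_systems": [
                    {"id": rts_id, "required": required}
                    for rts_id, required in rts_required.items()
                ],
            },
            "software_spec": {"name": "runtime-22.2-py3.10"},
            "hardware_spec": {"name": "XS"},
        },
        "training_data_references": [],
        "results_reference": {
            "type": "container",
            "name": "outputData",
            "connection": {},
            "location": {"path": "."},
        },
        "project_id": project_id,
    }
    return json.dumps(payload)
```

For the single-party setup in this notebook, `build_training_payload(PROJECT_ID, TRAINING_TAG, {wml_remote_training_system_one_asset_uid: True})` should produce the same definition as the template in the next cell.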

In [7]:
training_payload = Template(""" 
{
  "name": "FL Aggregator",
  "tags": [ "$tag" ],
  "federated_learning": {
    "fusion_type": "xgb_classifier",
    "learning_rate": 0.1,
    "loss": "binary_crossentropy",
    "max_bins": 255,
    "rounds": 3,
    "num_classes": 2,
    "metrics": "loss",
    "remote_training" : {
      "quorum": 1.0,
      "remote_training_systems": [ { "id" : "$rts_one", "required" : true  } ]
    },
    "software_spec": {
      "name": "runtime-22.2-py3.10"
    },
    "hardware_spec": {
      "name": "XS"
    }
  },
  "training_data_references": [],
  "results_reference": {
    "type": "container",
    "name": "outputData",
    "connection": {},
    "location": {
      "path": "."
    }
  },
  "project_id": "$projectId"  
}
""").substitute(projectId = PROJECT_ID,
                rts_one = wml_remote_training_system_one_asset_uid,
                tag = TRAINING_TAG)

create_training_resp = requests.post(WML_SERVICES_URL + "/ml/v4/trainings", params={"version": API_VERSION},
                                     headers={"Content-Type": "application/json",
                                              "Authorization": token},
                                     data=training_payload,
                                     verify=False)

print(create_training_resp)
status_json = create_training_resp.json()
print("Create training response : " + json.dumps(status_json, indent=4))

training_id = status_json["metadata"]["id"]
print("Training id: %s" % training_id)

<Response [201]>
Create training response : {
    "metadata": {
        "created_at": "2023-02-28T16:15:30.437Z",
        "id": "1e1a2a61-c884-44a5-9544-f033cd94b896",
        "name": "FL Aggregator",
        "project_id": "2d943195-1222-476b-b14a-3bd688958446",
        "tags": [
            "wmlflxgbsampletraining"
        ]
    },
    "entity": {
        "federated_learning": {
            "fusion_type": "xgb_classifier",
            "hardware_spec": {
                "name": "XS"
            },
            "learning_rate": 0.1,
            "loss": "binary_crossentropy",
            "max_bins": 255,
            "metrics": "loss",
            "num_classes": 2,
            "remote_training": {
                "quorum": 1.0,
                "remote_training_systems": [
                    {
                        "id": "57077c88-9985-41ca-81c0-4f993605868d",
                        "required": true
                    }
                ]
            },
            "rounds": 3,
        

<a id = "status"></a>
### 4.1 Get Training Job Status

<div class="alert alert-block alert-info">Before you run the following code, make sure that your project is associated with a Watson Machine Learning service. For more details on associating services, see: <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/assoc-services.html?context=cpdaas&audience=wdp">Associating services</a></div>

In [8]:
get_training_resp = requests.get(WML_SERVICES_URL + "/ml/v4/trainings/" + training_id,
                                 headers={"Content-Type": "application/json",
                                          "Authorization": token},
                                 params={"version": API_VERSION,
                                         "project_id": PROJECT_ID},
                                 verify=False)

print(get_training_resp)
status_json = get_training_resp.json()
print("Get training response : " + json.dumps(status_json, indent=4))

<Response [200]>
Get training response : {
    "metadata": {
        "created_at": "2023-02-28T16:15:30.437Z",
        "id": "1e1a2a61-c884-44a5-9544-f033cd94b896",
        "modified_at": "2023-02-28T16:16:17.285Z",
        "name": "FL Aggregator",
        "project_id": "2d943195-1222-476b-b14a-3bd688958446",
        "tags": [
            "wmlflxgbsampletraining"
        ]
    },
    "entity": {
        "federated_learning": {
            "fusion_type": "xgb_classifier",
            "hardware_spec": {
                "name": "XS"
            },
            "learning_rate": 0.1,
            "loss": "binary_crossentropy",
            "max_bins": 255,
            "metrics": "loss",
            "num_classes": 2,
            "remote_training": {
                "quorum": 1.0,
                "remote_training_systems": [
                    {
                        "id": "57077c88-9985-41ca-81c0-4f993605868d",
                        "required": true
                    }
                ]


<a id = "party-notebook"></a>
## 5. Get Variables And Paste Into Party Notebook

Run the following cell and copy the output. 

In [9]:
print("WML_SERVICES_HOST = '%s'" % WML_SERVICES_HOST)
print("PROJECT_ID = '%s'" % PROJECT_ID)
print("IAM_APIKEY = '%s'" % IAM_APIKEY)
print("RTS_ID = '%s'" % wml_remote_training_system_one_asset_uid)
print("TRAINING_ID = '%s'" % (training_id))

WML_SERVICES_HOST = 'jp-tok.ml.cloud.ibm.com'
PROJECT_ID = '2d943195-1222-476b-b14a-3bd688958446'
IAM_APIKEY = '<redacted>'
RTS_ID = '57077c88-9985-41ca-81c0-4f993605868d'
TRAINING_ID = '1e1a2a61-c884-44a5-9544-f033cd94b896'


As the admin, you have now launched a Federated Learning experiment. Copy the output from the previous cell, open Part 2 - WML Federated Learning with XGBoost and Adult Income dataset - Party, paste the output into its first code cell, and run the Part 2 - Party notebook to the end.

<a id = "save-model"></a>
## 6. Save Trained Model To Project

Once training has completed, run the cells below to save the trained model into your project.

<a id = "cos"></a>
### 6.1 Connection to COS

This information is located in your Watson Studio project, on the **Manage** tab, on the **General** page.

1. The bucket name is listed in the **Storage** pane.
2. To obtain the credentials, click the **Manage in IBM Cloud** link in the **Storage** pane. In your COS instance, click **Service credentials**. You can use an existing credential or create a new one if needed.
    - COS_APIKEY - the "apikey" value from your credentials
    - COS_RESOURCE_INSTANCE_ID - the "resource_instance_id" value from your credentials
3. The COS endpoints are listed in your COS instance under **Endpoints**.
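If you copy the whole service credential from the COS instance, the two values needed below can be pulled out of the JSON rather than copied by hand. A minimal sketch, assuming the credential is valid JSON containing the `apikey` and `resource_instance_id` keys named in step 2; `cos_settings_from_credentials` is a hypothetical helper.

```python
import json


def cos_settings_from_credentials(credentials_json):
    """Extract the two values this notebook needs from a COS service-credential JSON string."""
    creds = json.loads(credentials_json)
    return {
        "COS_APIKEY": creds["apikey"],
        "COS_RESOURCE_INSTANCE_ID": creds["resource_instance_id"],
    }
```

Any other keys in the credential are ignored; a missing key raises a `KeyError` naming the field.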

In [None]:
BUCKET = "" # bucket used by the project, e.g. myproject-donotdelete-pr-tdnvueqivxep8v. Go to your project > Manage and check the bucket name under Cloud storage.

COS_ENDPOINT = "https://s3.us.cloud-object-storage.appdomain.cloud" # Current list available at https://control.cloud-object-storage.cloud.ibm.com/v2/endpoints

# Find these in cloud.ibm.com > Storage > <your COS instance> > Service credentials
COS_APIKEY = "" # e.g. "W00YixxxxxxxxxxMB-odB-2ySfTrFBIQQWanc--P3byk"
COS_RESOURCE_INSTANCE_ID = "" # e.g. "crn:v1:bluemix:public:cloud-object-storage:global:a/3bf0d9003xxxxxxxxxx1c3e97696b71c:d6f04d83-6c4f-4a62-a165-696756d63903::"

<a id = "cos-prereqs"></a>
### 6.2 Install prerequisites

In [None]:
!pip install ibm-cos-sdk

<a id = "save-to-project"></a>
### 6.3 Save model to project

In [None]:
import ibm_boto3
from ibm_botocore.client import Config

# COS client authenticated with the API key from your COS service credential
cos = ibm_boto3.resource("s3",
    ibm_api_key_id=COS_APIKEY,
    ibm_service_instance_id=COS_RESOURCE_INSTANCE_ID,
    config=Config(signature_version="oauth"),
    endpoint_url=COS_ENDPOINT
)

# The training job stores a model-save request body at this path in the project bucket
ITEM_NAME = training_id + "/assets/" + training_id + "/resources/wml_model/request.json"

obj = cos.Object(BUCKET, ITEM_NAME).get()
req = json.loads(obj["Body"].read())

# Give the model a readable name before saving it to the project
req["name"] = "Trained Adult Income Model"

model_save_payload = json.dumps(req)
print("Model save payload: %s" % model_save_payload)
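The two steps in the cell above, building the object key and renaming the model, can be factored into small pure functions so they are easy to check without COS access. This is an illustrative refactoring: `model_request_key` and `prepare_model_request` are hypothetical helpers, and the key layout is copied from the cell above rather than from any documented contract.

```python
def model_request_key(training_id):
    """Object key of the stored model request, using the layout from the cell above."""
    return "%s/assets/%s/resources/wml_model/request.json" % (training_id, training_id)


def prepare_model_request(request_doc, model_name):
    """Return a shallow copy of the stored request with a new display name."""
    updated = dict(request_doc)  # leave the downloaded document untouched
    updated["name"] = model_name
    return updated
```

For example, `cos.Object(BUCKET, model_request_key(training_id)).get()` reads the same object as `ITEM_NAME` above.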

In [None]:
model_save_resp = requests.post(WML_SERVICES_URL + "/ml/v4/models",
                                params={"version": API_VERSION,
                                        "project_id": PROJECT_ID,
                                        "content_format": "native"},
                                headers={"Content-Type": "application/json",
                                         "Authorization": token},
                                data=model_save_payload,
                                verify=False)

print(model_save_resp)
status_json = model_save_resp.json()
print("Save model response : " + json.dumps(status_json, indent=4))

model_id = status_json["metadata"]["id"]
print("Saved model id: %s" % model_id)

<a id = "cleanup"></a>
## 7. Clean Up Project

Use this section to delete the training jobs and assets created by this notebook.

<a id = "list-jobs"></a>
### 7.1 List all training jobs in project

In [None]:
get_training_resp = requests.get(WML_SERVICES_URL + "/ml/v4/trainings",
                                 headers={"Content-Type": "application/json",
                                          "Authorization": token},
                                 params={"version": API_VERSION,
                                         "project_id": PROJECT_ID},
                                 verify=False)

print(get_training_resp)
status_json = json.loads(get_training_resp.content.decode("utf-8"))
print("Get training response : "+ json.dumps(status_json, indent=4))

<a id = "del-jobs"></a>
### 7.2 Delete all training jobs in this project created by this notebook

This deletes every training job in the project tagged with `TRAINING_TAG`, which also stops any aggregators created by this notebook that are still running.
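Because the deletion below is permanent (`hard_delete`), it can be worth reviewing what the tag filter matched before deleting anything. A minimal dry-run sketch, assuming the list response shape used in this section (`resources[].metadata.id`, with an optional `metadata.name`); `summarize_resources` is a hypothetical helper and works for the Remote Training System listing as well.

```python
def summarize_resources(list_json):
    """Return (id, name) pairs from a WML list response for review before deleting."""
    return [
        (item["metadata"]["id"], item["metadata"].get("name", ""))
        for item in list_json.get("resources", [])
    ]
```

For example, run `for job_id, name in summarize_resources(training_list_json): print(job_id, name)` and only execute the delete loop once the list looks right.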

In [None]:
get_training_resp = requests.get(WML_SERVICES_URL + "/ml/v4/trainings",
                                 headers={"Content-Type": "application/json",
                                          "Authorization": token},
                                 params={"version": API_VERSION,
                                         "project_id": PROJECT_ID,
                                         "tag.value": TRAINING_TAG},
                                 verify=False)

training_list_json = json.loads(get_training_resp.content.decode("utf-8"))
training_resources=training_list_json["resources"]

for training in training_resources:
    # Use a separate name so the training_id from section 4 is not overwritten
    job_id = training["metadata"]["id"]
    print("Deleting Training ID: " + job_id)
    delete_training_resp = requests.delete(WML_SERVICES_URL + "/ml/v4/trainings/" + job_id,
                                           headers={"Content-Type": "application/json",
                                                    "Authorization": token},
                                           params={"version": API_VERSION,
                                                   "project_id": PROJECT_ID,
                                                   "hard_delete": True},
                                           verify=False)
    print(delete_training_resp)

<a id = "list-rts"></a>
### 7.3 List all remote training systems in project

In [None]:
get_rts_resp = requests.get(WML_SERVICES_URL + "/ml/v4/remote_training_systems", 
                            headers={"Content-Type": "application/json",
                                     "Authorization": token}, 
                            params={"version": API_VERSION,
                                    "project_id": PROJECT_ID}, 
                            verify=False)

print(get_rts_resp)
rts_list_json = json.loads(get_rts_resp.content.decode("utf-8"))
print("Remote Training Systems in Project : "+ json.dumps(rts_list_json, indent=4))

<a id = "del-rts"></a>
### 7.4 Delete all remote training systems in this project created by this notebook

In [None]:
get_rts_resp = requests.get(WML_SERVICES_URL + "/ml/v4/remote_training_systems", 
                            headers={"Content-Type": "application/json",
                                     "Authorization": token}, 
                            params={"version": API_VERSION,
                                    "project_id": PROJECT_ID,
                                    "tag.value": RTS_TAG}, 
                            verify=False)

rts_list_json = json.loads(get_rts_resp.content.decode("utf-8"))
rts_resources=rts_list_json["resources"]

for rts in rts_resources:
    rts_id = rts["metadata"]["id"]
    print("Deleting RTS ID: " + rts_id)
    delete_rts_resp = requests.delete(WML_SERVICES_URL + "/ml/v4/remote_training_systems/" + rts_id, 
                                      headers={"Content-Type": "application/json",
                                               "Authorization": token}, 
                                      params={"version": API_VERSION,
                                              "project_id": PROJECT_ID}, 
                                      verify=False)
    print(delete_rts_resp)

<hr>
Copyright © 2020-2022 IBM. This notebook and its source code are released under the terms of the MIT License.
 