# Part 1 - WML Federated Learning with XGBoost and Adult Income dataset - Aggregator 

With IBM Federated Learning, you can train a model on data from multiple sources without sharing the data itself. This allows enterprises to train a model together with other companies without dedicating resources to securing shared data. Another advantage is that the remote data does not have to be centralized in one location, which eliminates the need to move potentially large datasets. This notebook demonstrates how to start Federated Learning with the Python client. For more details on setting up Federated Learning, terminology, and running Federated Learning from the UI, see [Federated Learning documentation](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fed-lea.html?audience=wdp).

### Learning Goals

When you complete the Part 1 - WML Federated Learning with XGBoost and Adult Income dataset - Aggregator notebook, you should know how to:

- Create a Remote Training System
- Start a training job

Once you complete this notebook, please open [Part 2 - WML Federated Learning with XGBoost and Adult Income dataset - Party](https://dataplatform.cloud.ibm.com/). 

<div class="alert alert-block alert-info">This notebook is intended to be run by the administrator of the Federated Learning experiment.</div>

## Table of Contents

- [1. Prerequisites](#prequisites)
    - [1.1 Define variables](#var)
    - [1.2 Define tags](#tags)
    - [1.3 Import libraries](#libraries)
- [2. Obtain IBM Cloud Token](#auth)
- [3. Create a Remote Training System](#create-rts)
- [4. Create FL Training Job](#fl-job)
    - [4.1 Get Training Job Status](#status)
- [5. Get Variables And Paste Into Party Notebook](#party-notebook)
- [6. Save Trained Model](#save-model)
- [7. Clean Up Project](#cleanup)
    - [7.1 List all training jobs](#list-jobs)
    - [7.2 Delete training jobs](#del-jobs)
    - [7.3 List all Remote Training Systems](#list-rts)
    - [7.4 Delete Remote Training Systems](#del-rts)

<a id = "prequisites"></a>
## 1. Prerequisites

1. You will need an IBM Cloud Pak for Data 4.7 system with Watson Studio and Watson Machine Learning installed.
<br/><br/>
2. The following information:
   1. URL of your IBM Cloud Pak for Data system.
   2. Your `Username`, `Password` and `User ID`.  You can find this information from the *Administration > Access Control* page.
   3. The ID of a new or existing project to be used.  This can be found from *Projects > Specific project > Manage > Project ID*. 

In [None]:
import psutil

mem_recommended = 4
mem_total = round(psutil.virtual_memory().total / 1073741824, 2)

print("System has " + format(mem_total) + "GB of memory.")
if mem_total < mem_recommended:
    print("WARNING: Running this notebook with less than " + format(mem_recommended) + "GB of memory may cause unexpected errors.")

<a id = "var"></a>
### 1.1 Define variables

In [None]:
API_VERSION = "2021-10-01"

# Hostname of CP4D cluster
CP4D_HOST = "XXX" # <host URL>
CP4D_URL = "https://" + CP4D_HOST

# Enter your CP4D username / password
WS_USER = "XXX" # <username>
WS_PASSWORD = "XXX" # <password>

# User ID for the admin user; find it under Administration > User Management (CP4D_URL/zen-admin/?deployment_target=icp4data#/usermgmt-ui)
WS_USERID = "XXX" # <user ID>

PROJECT_ID = "XXX" # Get this by going into your WS project and checking the URL.
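Before running the remaining cells, it can help to confirm that no placeholder values remain. The following is a minimal sketch: `find_unset` is a hypothetical helper, not part of this notebook or of any IBM library, and it assumes the placeholder string is `"XXX"` as in the cell above. The sample values are made up.

```python
def find_unset(settings, placeholder="XXX"):
    """Return the names of settings that are empty or still hold the placeholder."""
    return sorted(name for name, value in settings.items()
                  if not value or value == placeholder)


# Made-up values for illustration; in the notebook you would pass the real variables.
example_settings = {
    "CP4D_HOST": "cpd.example.com",
    "WS_USER": "admin",
    "WS_PASSWORD": "XXX",
    "WS_USERID": "",
    "PROJECT_ID": "XXX",
}
missing = find_unset(example_settings)
print("Unset variables:", missing)
```

In the notebook, the dictionary could be built from `CP4D_HOST`, `WS_USER`, `WS_PASSWORD`, `WS_USERID`, and `PROJECT_ID`, and the run stopped if the returned list is not empty.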

<a id = "tags"></a>
### 1.2 Define tags

These tags identify the assets created by this notebook so that they can be found and deleted in the cleanup section.

In [None]:
RTS_TAG = "wmlflxgbsamplerts"
TRAINING_TAG = "wmlflxgbsampletraining"

<a id = "libraries"></a>
### 1.3 Import libraries

In [None]:
import urllib3
import requests
import json
from string import Template

urllib3.disable_warnings()

<a id = "auth"></a>
## 2. Obtain Cloud authentication token

In [None]:
import base64

ws_userpass = WS_USER + ":" + WS_PASSWORD
enc_bytes = base64.b64encode(ws_userpass.encode("utf-8"))
ws_auth = str(enc_bytes, "utf-8")

token_resp = requests.get(CP4D_URL + "/v1/preauth/validateAuth",
                          headers={"Content-Type": "application/json",
                                   "Authorization": "Basic " + ws_auth},
                          verify=False)

print(token_resp)
token = "Bearer " + json.loads(token_resp.content.decode("utf-8"))["accessToken"]
print("WS Token: %s " % token)
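The cell above assumes the request succeeds and that the response body contains `accessToken`. As a hedged sketch, the parsing step could be wrapped so that a failed login produces a readable error instead of a `KeyError`. `extract_bearer_token` is a hypothetical helper introduced for illustration, and the sample body is made up.

```python
import json


def extract_bearer_token(status_code, body_text):
    """Return 'Bearer <token>' or raise ValueError with a readable message."""
    if status_code != 200:
        raise ValueError("Authentication failed with HTTP status %s" % status_code)
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError as err:
        raise ValueError("Response body is not valid JSON") from err
    # The field name accessToken is taken from the cell above.
    access_token = payload.get("accessToken") if isinstance(payload, dict) else None
    if not access_token:
        raise ValueError("Response does not contain an accessToken field")
    return "Bearer " + access_token


# Made-up response body, for illustration only.
sample_token = extract_bearer_token(200, '{"accessToken": "abc123"}')
print(sample_token)
```

With the `requests` response from the cell above, the call would be `extract_bearer_token(token_resp.status_code, token_resp.text)`.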

<a id = "create-rts"></a>
## 3. Create Remote Training System Asset

In this section, you create a Remote Training System (RTS). An RTS handles the calls that your parties make to the aggregator to run the training.
- `allowed_identities` lists the users permitted to connect to the Federated Learning experiment. In this tutorial, only your user ID is permitted to connect, but you can update the template to add more users as required.
- `remote_admin` identifies the administrator of the RTS. It uses the same template as a user entry. In this tutorial, the admin is the same as your user ID; in general, however, the admin does not have to be one of the allowed users.
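To permit more than one user, the definition could also be built from a Python dictionary rather than a string template. This is a minimal sketch, not the notebook's method: `build_rts_payload` is a hypothetical helper, the field names mirror the template used in this section, and the IDs are made up. Requiring at least one allowed identity is a choice made in this sketch, not a documented API rule.

```python
import json


def build_rts_payload(project_id, tag, user_ids, admin_id, name="Remote Party 1"):
    """Build the Remote Training System definition as a JSON string."""
    if not user_ids:
        # Assumption of this sketch: an RTS without allowed users is not useful.
        raise ValueError("At least one allowed identity is required")
    payload = {
        "name": name,
        "project_id": project_id,
        "description": "Sample Remote Training System",
        "tags": [tag],
        "organization": {"name": "IBM", "region": "US"},
        # One entry per user permitted to connect to the experiment.
        "allowed_identities": [{"id": uid, "type": "user"} for uid in user_ids],
        "remote_admin": {"id": admin_id, "type": "user"},
    }
    return json.dumps(payload, indent=2)


# Made-up IDs, for illustration only.
rts_json = build_rts_payload("project-123", "wmlflxgbsamplerts",
                             ["user-1", "user-2"], admin_id="user-1")
print(rts_json)
```

The resulting string can be passed as the `data` argument of the POST request in the same way as the template output.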

In [None]:
wml_remote_training_system_asset_one_def = Template("""
{
  "name": "Remote Party 1",
  "project_id": "$projectId",
  "description": "Sample Remote Training System",
  "tags": [ "$tag" ],
  "organization": {
    "name": "IBM",
    "region": "US"
  },
  "allowed_identities": [
    {
      "id": "$userID",
      "type": "user"
    }
  ],
  "remote_admin": {
    "id": "$userID",
    "type": "user"
  }
}
""").substitute(userID = WS_USERID,
                projectId = PROJECT_ID,
                tag = RTS_TAG)


wml_remote_training_system_one_resp = requests.post(CP4D_URL + "/ml/v4/remote_training_systems", 
                                                    headers={"Content-Type": "application/json",
                                                             "Authorization": token}, 
                                                    params={"version": API_VERSION,
                                                            "project_id": PROJECT_ID}, 
                                                    data=wml_remote_training_system_asset_one_def, 
                                                    verify=False)

print(wml_remote_training_system_one_resp)
status_json = json.loads(wml_remote_training_system_one_resp.content.decode("utf-8"))
print("Create remote training system response : "+ json.dumps(status_json, indent=4))

wml_remote_training_system_one_asset_uid = json.loads(wml_remote_training_system_one_resp.content.decode("utf-8"))["metadata"]["id"]
print("Remote Training System id: %s" % wml_remote_training_system_one_asset_uid)

<a id = "fl-job"></a>
## 4. Create FL Training Job

In this section, you will launch the Federated Learning experiment.

In [None]:
training_payload = Template(""" 
{
  "name": "FL Aggregator",
  "tags": [ "$tag" ],
  "federated_learning": {
    "fusion_type": "xgb_classifier",
    "learning_rate": 0.1,
    "loss": "binary_crossentropy",
    "max_bins": 255,
    "rounds": 3,
    "num_classes": 2,
    "metrics": "loss",
    "remote_training" : {
      "quorum": 1.0,
      "remote_training_systems": [ { "id" : "$rts_one", "required" : true  } ]
    },
    "software_spec": {
      "name": "runtime-23.1-py3.10"
    },
    "hardware_spec": {
      "name": "XS"
    }
  },
  "training_data_references": [],
  "results_reference": {
    "type": "fs",
    "name": "outputData",
    "location": {
      "path": "projects/$projectId/assets/data_asset"
    }
  },
  "project_id": "$projectId"  
}
""").substitute(projectId = PROJECT_ID,
                rts_one = wml_remote_training_system_one_asset_uid,
                tag = TRAINING_TAG)

create_training_resp = requests.post(CP4D_URL + "/ml/v4/trainings", params={"version": API_VERSION},
                                     headers={"Content-Type": "application/json",
                                              "Authorization": token},
                                     data=training_payload,
                                     verify=False)

print(create_training_resp)
status_json = json.loads(create_training_resp.content.decode("utf-8"))
print("Create training response : "+ json.dumps(status_json, indent=4))

training_id = json.loads(create_training_resp.content.decode("utf-8"))["metadata"]["id"]
print("Training id: %s" % training_id)

<a id = "status"></a>
### 4.1 Get Training Job Status

<div class="alert alert-block alert-info">

<b>Cloud:</b> Before you run the following code, make sure that your project is associated with a Watson Machine Learning service. For more details on associating services, see <a href="https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/assoc-services.html?context=cpdaas&audience=wdp">Associating services</a>.

<b>CPD:</b> Ensure that your CPD cluster has installed the Watson Machine Learning service. For more details, see <a href="https://www.ibm.com/docs/en/cloud-paks/cp-data/4.0?topic=learning-installing-watson-machine">Installing Watson Machine Learning</a>
    
</div>

In [None]:
get_training_resp = requests.get(CP4D_URL + "/ml/v4/trainings/" + training_id,
                                 headers={"Content-Type": "application/json",
                                          "Authorization": token},
                                 params={"version": API_VERSION,
                                          "project_id": PROJECT_ID},
                                 verify=False)

print(get_training_resp)
status_json = json.loads(get_training_resp.content.decode("utf-8"))
print("Get training response : "+ json.dumps(status_json, indent=4))
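The cell above prints the full response. To read only the job state from it, a small helper along these lines may be useful. This sketch assumes that the state is reported at `entity.status.state` and that `completed`, `failed`, and `canceled` are the terminal states; compare with the response printed by your own run before relying on it. Both function names are hypothetical, and the sample dictionaries are made up.

```python
# Assumed terminal states; verify against the printed response from your cluster.
TERMINAL_STATES = {"completed", "failed", "canceled"}


def training_state(status_json):
    """Return the state string at entity.status.state, or 'unknown' if absent."""
    return status_json.get("entity", {}).get("status", {}).get("state", "unknown")


def is_finished(status_json):
    """Return True if the reported state is one of the assumed terminal states."""
    return training_state(status_json) in TERMINAL_STATES


# Made-up responses, for illustration only.
sample_running = {"entity": {"status": {"state": "running"}}}
sample_done = {"entity": {"status": {"state": "completed"}}}
print(training_state(sample_running), is_finished(sample_running))
print(training_state(sample_done), is_finished(sample_done))
```

In the notebook, the status request can be repeated with a pause such as `time.sleep(30)` between calls until `is_finished(status_json)` returns `True`.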

<a id = "party-notebook"></a>
## 5. Get Variables And Paste Into Party Notebook

Run the following cell and copy the output. 

In [None]:
print("CP4D_HOST = '%s'" % CP4D_HOST)
print("WS_USER = '%s'" % WS_USER)
print("WS_PASSWORD = '%s'" % WS_PASSWORD)
print("PROJECT_ID = '%s'" % PROJECT_ID)
print("RTS_ID = '%s'" % wml_remote_training_system_one_asset_uid)
print("TRAINING_ID = '%s'" % (training_id))

As the Admin, you have now launched a Federated Learning experiment. Copy the output from the previous cell. Open Part 2 - WML Federated Learning with XGBoost and Adult Income dataset - Party and paste the output into the first code cell.  Run the Part 2 - Party notebook to the end.

<a id = "save-model"></a>
## 6. Save Trained Model To Project

Once training has completed, run the cells below to save the trained model into your project.
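The cell that follows opens a `request.json` file under the project's data asset directory, and `open` raises `FileNotFoundError` if the file is not there. As a hedged sketch, the path can be built and checked first. `model_request_path` is a hypothetical helper, the directory layout is copied from the cell that follows, and the training ID is made up.

```python
import os
import posixpath


def model_request_path(training_id, base_dir="/project_data/data_asset"):
    """Build the path to request.json using the layout from this notebook."""
    return posixpath.join(base_dir, training_id, "assets", training_id,
                          "resources", "wml_model", "request.json")


# Made-up training ID, for illustration only.
example_path = model_request_path("training-123")
print(example_path)
print("Exists:", os.path.exists(example_path))
```

If the check prints `Exists: False` for your real `training_id`, the training has probably not finished writing its results yet.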

In [None]:
json_file = "/project_data/data_asset/" + training_id + "/assets/" + training_id + "/resources/wml_model/request.json"

with open(json_file, 'r') as file:
    data = file.read()

req = json.loads(data)

req["name"] = "Trained Adult Income Model"

model_save_payload = json.dumps(req)
print ("Model save payload: %s" % model_save_payload)

In [None]:
model_save_resp = requests.post(CP4D_URL + "/ml/v4/models",
                                params={"version": API_VERSION,
                                        "project_id": PROJECT_ID,
                                        "content_format": "native"},
                                headers={"Content-Type": "application/json",
                                         "Authorization": token},
                                data=model_save_payload,
                                verify=False)

print(model_save_resp)
status_json = json.loads(model_save_resp.content.decode("utf-8"))
print("Save model response : "+ json.dumps(status_json, indent=4))

model_id = json.loads(model_save_resp.content.decode("utf-8"))["metadata"]["id"]
print("Saved model id: %s" % model_id)

<a id = "cleanup"></a>
## 7. Clean Up Project

Use this section to delete the training jobs and assets created by this notebook.

<a id = "list-jobs"></a>
### 7.1 List all training jobs in project

In [None]:
get_training_resp = requests.get(CP4D_URL + "/ml/v4/trainings",
                                 headers={"Content-Type": "application/json",
                                          "Authorization": token},
                                 params={"version": API_VERSION,
                                         "project_id": PROJECT_ID},
                                 verify=False)

print(get_training_resp)
status_json = json.loads(get_training_resp.content.decode("utf-8"))
print("Get training response : "+ json.dumps(status_json, indent=4))

<a id = "del-jobs"></a>
### 7.2 Delete all training jobs in this project created by this notebook

This will stop all running aggregators created using this notebook.
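The deletion cells in this section index `["resources"]` directly, which raises a `KeyError` if the list request returns an error body without that key. A more tolerant way to collect the IDs might look like the following sketch. `resource_ids` is a hypothetical helper, and the sample response is made up.

```python
def resource_ids(list_json):
    """Collect metadata.id from each entry under 'resources', skipping malformed entries."""
    ids = []
    for resource in list_json.get("resources", []):
        resource_id = resource.get("metadata", {}).get("id")
        if resource_id:
            ids.append(resource_id)
    return ids


# Made-up list response, for illustration only.
sample_list = {"resources": [{"metadata": {"id": "t-1"}},
                             {"metadata": {}},
                             {"metadata": {"id": "t-2"}}]}
print(resource_ids(sample_list))
print(resource_ids({}))
```

The same helper applies to the Remote Training System list, because both responses in this notebook are read through `resources` and `metadata.id`.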

In [None]:
get_training_resp = requests.get(CP4D_URL + "/ml/v4/trainings",
                                 headers={"Content-Type": "application/json",
                                          "Authorization": token},
                                 params={"version": API_VERSION,
                                         "project_id": PROJECT_ID,
                                         "tag.value": TRAINING_TAG},
                                 verify=False)

training_list_json = json.loads(get_training_resp.content.decode("utf-8"))
training_resources=training_list_json["resources"]

for training in training_resources:
    training_id = training["metadata"]["id"]
    print("Deleting Training ID: " + training_id)
    delete_training_resp = requests.delete(CP4D_URL + "/ml/v4/trainings/" + training_id,
                                           headers={"Content-Type": "application/json",
                                                    "Authorization": token},
                                           params={"version": API_VERSION,
                                                   "project_id": PROJECT_ID,
                                                   "hard_delete": True},
                                           verify=False)
    print(delete_training_resp)

<a id = "list-rts"></a>
### 7.3 List all remote training systems in project

In [None]:
get_rts_resp = requests.get(CP4D_URL + "/ml/v4/remote_training_systems", 
                            headers={"Content-Type": "application/json",
                                     "Authorization": token}, 
                            params={"version": API_VERSION,
                                    "project_id": PROJECT_ID}, 
                            verify=False)

print(get_rts_resp)
rts_list_json = json.loads(get_rts_resp.content.decode("utf-8"))
print("Remote Training Systems in Project : "+ json.dumps(rts_list_json, indent=4))

<a id = "del-rts"></a>
### 7.4 Delete all remote training systems in this project created by this notebook

In [None]:
get_rts_resp = requests.get(CP4D_URL + "/ml/v4/remote_training_systems", 
                            headers={"Content-Type": "application/json",
                                     "Authorization": token}, 
                            params={"version": API_VERSION,
                                    "project_id": PROJECT_ID,
                                    "tag.value": RTS_TAG}, 
                            verify=False)

rts_list_json = json.loads(get_rts_resp.content.decode("utf-8"))
rts_resources=rts_list_json["resources"]

for rts in rts_resources:
    rts_id = rts["metadata"]["id"]
    print("Deleting RTS ID: " + rts_id)
    delete_rts_resp = requests.delete(CP4D_URL + "/ml/v4/remote_training_systems/" + rts_id, 
                                      headers={"Content-Type": "application/json",
                                               "Authorization": token}, 
                                      params={"version": API_VERSION,
                                              "project_id": PROJECT_ID}, 
                                      verify=False)
    print(delete_rts_resp)

# <hr>
Copyright © 2020-2023 IBM. This notebook and its source code are released under the terms of the MIT License.