# Vertical Federated XGB (SecureBoost)

> The following code is for demonstration only. For system security reasons, please do **not** use it directly in production.

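Welcome to the SecureBoost tutorial!

In this tutorial, we explore how to use SecretFlow's tree-model capabilities to perform vertical federated learning with the SecureBoost algorithm. SecureBoost is a classic algorithm that prioritizes protecting the label information in vertically partitioned datasets. It uses homomorphic encryption to encrypt the labels and to carry out the key tree-boosting steps on ciphertext. The result is a distributed boosted-tree model made up of PYU objects, in which each party knows only its own split points. The implementation relies on HEU and PYU devices for high performance.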
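
The idea of running boosting steps on ciphertext can be illustrated with a small additively homomorphic scheme. The sketch below is not SecretFlow's HEU and not the `ou` schema configured later in this tutorial: it is a toy Paillier cryptosystem with tiny hard-coded primes, and the fixed-point scale, function names, and sample gradients are all assumptions chosen for illustration. It shows a label holder encrypting per-sample gradients, another party summing the ciphertexts that fall into one of its feature buckets, and the label holder decrypting only the aggregate.

```python
import math
import random

# Toy Paillier parameters (illustration only, NOT secure).
P, Q = 2**31 - 1, 2**61 - 1            # two Mersenne primes, far too small for real use
N = P * Q                              # public modulus
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1), private
MU = pow(LAM, -1, N)                   # private; valid because the generator is N + 1
SCALE = 2**20                          # assumed fixed-point scale for float gradients

rng = random.Random(0)                 # seeded only to make the sketch reproducible


def encode(x: float) -> int:
    """Map a float to a non-negative integer modulo N (fixed point)."""
    return round(x * SCALE) % N


def decode(m: int) -> float:
    """Inverse of encode; values above N/2 are read as negative numbers."""
    if m > N // 2:
        m -= N
    return m / SCALE


def encrypt(m: int) -> int:
    """Paillier encryption with generator g = N + 1."""
    r = rng.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = rng.randrange(1, N)
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Paillier decryption; needs the private values LAM and MU."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add_ciphertexts(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N2


# Label holder: encrypt one gradient per sample (hypothetical values).
gradients = [0.5, -0.5, 0.25, -0.75, 0.125]
encrypted = [encrypt(encode(g)) for g in gradients]

# Other party: it knows which samples fall into one of its feature buckets,
# and sums their encrypted gradients without ever seeing a plaintext.
bucket = [0, 3, 4]
encrypted_sum = encrypt(encode(0.0))
for i in bucket:
    encrypted_sum = add_ciphertexts(encrypted_sum, encrypted[i])

# Label holder: decrypt only the aggregated value.
bucket_sum = decode(decrypt(encrypted_sum))
print(bucket_sum)
```

In a real deployment the key size would be in the range of the 2048 bits used in the HEU configuration, and the encryption would be handled by the HEU device rather than by hand-written code.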

Let's dive into the details and learn how to run SecureBoost with SecretFlow!

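## Device Setup

As with other algorithms, running SecureBoost requires setting up a secure cluster and specifying the devices.

In particular, an HEU device must be specified so that SecureBoost can encrypt the labels and protect sensitive information.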

In [1]:
import spu
from sklearn.metrics import roc_auc_score

import secretflow as sf
from secretflow.data import FedNdarray, PartitionWay
from secretflow.device.driver import reveal, wait
from secretflow.ml.boost.sgb_v import (
    Sgb,
    get_classic_XGB_params,
    get_classic_lightGBM_params,
)
from secretflow.ml.boost.sgb_v.model import load_model
import pprint

pp = pprint.PrettyPrinter(depth=4)

# Check the version of your SecretFlow
print('The version of SecretFlow: {}'.format(sf.__version__))

The version of SecretFlow: 1.7.0b0


In [2]:
sf.shutdown()

pyu_port = 16307
spu_port = 11666


cluster_config = {
    "parties": {
        "alice": {
            # replace with alice's real address.
            "address": "ecm-01:" + str(pyu_port),
            "listen_addr": "0.0.0.0:" + str(pyu_port),
        },
        "bob": {
            # replace with bob's real address.
            "address": "ecm-02:" + str(pyu_port),
            "listen_addr": "0.0.0.0:" + str(pyu_port),
        },
    },
    "self_party": "bob",
}

tls_config = {
    "ca_cert": "/home/beng003/certificate/alice_ca.crt",
    "cert": "/home/beng003/certificate/bob_server_cert.crt",
    "key": "/home/beng003/certificate/bob_server_key.key",
}


sf.init(address="ecm-02:6379", cluster_config=cluster_config, tls_config=tls_config)


# HEU settings
heu_config = {
    'sk_keeper': {'party': 'alice'},
    'evaluators': [{'party': 'bob'}],
    'mode': 'PHEU',
    'he_parameters': {
        # ou is a fast encryption schema that is as secure as paillier.
        'schema': 'ou',
        'key_pair': {
            'generate': {
                # bit size should be 2048 to provide sufficient security.
                'bit_size': 2048,
            },
        },
    },
    'encoding': {
        'cleartext_type': 'DT_I32',
        'encoder': "IntegerEncoder",
        'encoder_args': {"scale": 1},
    },
}

2024-08-02 19:51:53,663	INFO worker.py:1540 -- Connecting to existing Ray cluster at address: ecm-02:6379...
2024-08-02 19:51:53,671	INFO worker.py:1724 -- Connected to Ray cluster.
2024-08-02 19:51:53.699 INFO api.py:233 [bob] -- [Anonymous_job] Started rayfed with {'CLUSTER_ADDRESSES': {'alice': 'ecm-01:16307', 'bob': '0.0.0.0:16307'}, 'CURRENT_PARTY_NAME': 'bob', 'TLS_CONFIG': {'ca_cert': '/home/beng003/certificate/alice_ca.crt', 'cert': '/home/beng003/certificate/bob_server_cert.crt', 'key': '/home/beng003/certificate/bob_server_key.key'}}
2024-08-02 19:51:54.740 INFO barriers.py:284 [bob] -- [Anonymous_job] Succeeded to create receiver proxy actor.
(ReceiverProxyActor pid=1660988) 2024-08-02 19:51:54.735 INFO grpc_proxy.py:359 [bob] -- [Anonymous_job] ReceiverProxy binding port 16307, options: (('grpc.enable_retries', 1), ('grpc.so_reuseport', 0), ('grpc.max_send_message_length', 524288000), ('grpc.max_receive_message_length', 524288000), ('grpc.service_config', '{"method

In [3]:
alice = sf.PYU('alice')
bob = sf.PYU('bob')
heu = sf.HEU(heu_config, spu.spu_pb2.FM128)
print("Alice and Bob are ready to go!")

Alice and Bob are ready to go!


## Data Preparation

We will prepare a vertically partitioned dataset.

In [4]:
from sklearn.datasets import load_breast_cancer

ds = load_breast_cancer()
x, y = ds['data'], ds['target']

v_data = FedNdarray(
    {
        alice: (alice(lambda: x[:, :15])()),
        bob: (bob(lambda: x[:, 15:])()),
    },
    partition_way=PartitionWay.VERTICAL,
)
label_data = FedNdarray(
    {alice: (alice(lambda: y)())},
    partition_way=PartitionWay.VERTICAL,
)
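
To make the partitioning concrete: in the cell above, alice receives the first 15 feature columns and bob the remaining ones, while both hold the same rows in the same order, and only alice holds the labels. A minimal sketch of that column-wise split in plain Python, using a small hypothetical table and a hypothetical helper name (`vertical_split`) rather than any SecretFlow API, might look like this:

```python
def vertical_split(rows, split_at):
    """Split every row at the same column index, keeping the row order."""
    left = [row[:split_at] for row in rows]
    right = [row[split_at:] for row in rows]
    return left, right


# Hypothetical table: 4 samples, 5 features.
rows = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0, 9.0, 10.0],
    [11.0, 12.0, 13.0, 14.0, 15.0],
    [16.0, 17.0, 18.0, 19.0, 20.0],
]
labels = [0, 1, 1, 0]  # stays with the label holder only

alice_x, bob_x = vertical_split(rows, 3)

# Both parties see every sample, but only their own columns.
print(len(alice_x), len(alice_x[0]))
print(len(bob_x), len(bob_x[0]))
```

Row alignment matters: sample `i` must refer to the same entity on both sides, otherwise the per-sample gradients sent by the label holder would be matched with the wrong feature values.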

### Parameter Preparation

In [5]:
params = get_classic_XGB_params()
params['num_boost_round'] = 3
params['max_depth'] = 3
pp.pprint(params)

{'audit_paths': {},
 'base_score': 0.0,
 'batch_encoding_enabled': True,
 'bottom_rate': 0.5,
 'colsample_by_tree': 1.0,
 'enable_early_stop': False,
 'enable_goss': False,
 'enable_monitor': False,
 'enable_packbits': False,
 'enable_quantization': False,
 'eval_metric': 'roc_auc',
 'first_tree_with_label_holder_feature': True,
 'fixed_point_parameter': 20,
 'gamma': 0.0,
 'learning_rate': 0.3,
 'max_depth': 3,
 'max_leaf': 15,
 'num_boost_round': 3,
 'objective': 'logistic',
 'quantization_scale': 10000.0,
 'reg_lambda': 1.0,
 'rowsample_by_tree': 1.0,
 'save_best_model': False,
 'seed': 1212,
 'sketch_eps': 0.1,
 'stopping_rounds': 1,
 'stopping_tolerance': 0.0,
 'top_rate': 0.3,
 'tree_growing_method': 'level',
 'validation_fraction': 0.1}
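
Several of these parameters correspond to quantities in the usual gradient-boosting split search. As a rough sketch, assuming the textbook XGBoost formulas (SecretFlow's internal implementation may differ in details), the code below computes logistic-loss gradients and Hessians and then evaluates one candidate split using `reg_lambda` and `gamma`. The labels, the chosen split, and the reading of `base_score` as a raw score before the sigmoid are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_grad_hess(y, raw_score):
    """First and second derivative of the logistic loss w.r.t. the raw score."""
    p = sigmoid(raw_score)
    return p - y, p * (1.0 - p)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Textbook XGBoost gain of splitting one node into a left and right child."""
    def score(g, h):
        return g * g / (h + reg_lambda)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


def leaf_weight(g, h, reg_lambda=1.0):
    """Optimal leaf value for the summed gradient g and Hessian h."""
    return -g / (h + reg_lambda)


# Hypothetical labels; every sample starts at base_score = 0.0.
labels = [1, 1, 0, 0]
base_score = 0.0
stats = [logistic_grad_hess(y, base_score) for y in labels]

# Candidate split: samples 0 and 1 go left, samples 2 and 3 go right.
g_left = sum(g for g, _ in stats[:2])
h_left = sum(h for _, h in stats[:2])
g_right = sum(g for g, _ in stats[2:])
h_right = sum(h for _, h in stats[2:])

gain = split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0)
print(gain, leaf_weight(g_left, h_left), leaf_weight(g_right, h_right))
```

In SecureBoost, the summed gradients and Hessians for each candidate bucket are the values that are aggregated under encryption, so that a party without labels never sees individual gradients.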


## Running Sgb

We create an Sgb object with the heu device and fit the data.

In [6]:
sgb = Sgb(heu)
model = sgb.train(params, v_data, label_data)

2024-08-02 19:54:12.569 INFO proxy.py:180 [bob] -- [Anonymous_job] Create proxy actor <class 'secretflow.ml.boost.sgb_v.factory.sgb_actor.SGBActor'> with party alice.
2024-08-02 19:54:12.570 INFO proxy.py:180 [bob] -- [Anonymous_job] Create proxy actor <class 'secretflow.ml.boost.sgb_v.factory.sgb_actor.SGBActor'> with party bob.
2024-08-02 19:54:12.605 INFO global_ordermap_booster.py:214 [bob] -- [Anonymous_job] training the first tree with label holder only.
2024-08-02 19:54:12.606 INFO level_wise_tree_trainer.py:113 [bob] -- [Anonymous_job] train tree context set up.
2024-08-02 19:54:12.611 INFO level_wise_tree_trainer.py:202 [bob] -- [Anonymous_job] begin train tree.
2024-08-02 19:54:24.094 INFO global_ordermap_booster.py:237 [bob] -- [Anonymous_job] epoch 0 time 11.48967363697011s
2024-08-02 19:54:24.096 INFO level_wise_tree_trainer.py:113 [bob] -- [Anonymous_job] train tree context set up.
2024-08-02 19:54:24.102 INFO level_wise_tree_trainer.py:202 [bob] -- [Anonymous_job] begin 

(ReceiverProxyActor pid=1660988) [2024-08-02 19:54:24.149] [info] [thread_pool.cc:30] Create a fixed thread pool with size 7
(_run pid=1661381) [2024-08-02 19:54:32.772] [info] [thread_pool.cc:30] Create a fixed thread pool with size 7


## Model Evaluation

Now we can compare the model's output with the true labels.

In [7]:
yhat = model.predict(v_data)
yhat = reveal(yhat)
print(f"auc: {roc_auc_score(y, yhat)}")

auc: 0.9952235611225622


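## Saving and Loading the Model

Now we can save the model and use it later. Note that the model is distributed: we save it to multiple parties and load it from multiple parties.

Let's define the paths first.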

In [8]:
# each participating party needs a location to store its part of the model
saving_path_dict = {
    # in production we may use remote oss, for example.
    device: "./" + device.party
    for device in v_data.partitions.keys()
}

Then let's save the model.

In [9]:
r = model.save_model(saving_path_dict)
wait(r)

You can now check the files at the locations specified earlier.

Finally, let's load the model and run a sanity check.

In [10]:
# alice is our label holder
model_loaded = load_model(saving_path_dict, alice)
fed_yhat_loaded = model_loaded.predict(v_data, alice)
yhat_loaded = reveal(fed_yhat_loaded.partitions[alice])

assert (
    yhat == yhat_loaded
).all(), "loaded model predictions should match original, yhat {} vs yhat_loaded {}".format(
    yhat, yhat_loaded
)

### More Training Settings

What if we want to train the tree model the LightGBM way? We can enable leaf-wise tree growth and turn on GOSS.

In [11]:
params = get_classic_lightGBM_params()
params['num_boost_round'] = 3
params['max_leaf'] = 2**3
pp.pprint(params)
model = sgb.train(params, v_data, label_data)

{'audit_paths': {},
 'base_score': 0.0,
 'batch_encoding_enabled': True,
 'bottom_rate': 0.5,
 'colsample_by_tree': 1.0,
 'enable_early_stop': False,
 'enable_goss': True,
 'enable_monitor': False,
 'enable_packbits': False,
 'enable_quantization': False,
 'eval_metric': 'roc_auc',
 'first_tree_with_label_holder_feature': True,
 'fixed_point_parameter': 20,
 'gamma': 0.0,
 'learning_rate': 0.3,
 'max_depth': 5,
 'max_leaf': 8,
 'num_boost_round': 3,
 'objective': 'logistic',
 'quantization_scale': 10000.0,
 'reg_lambda': 1.0,
 'rowsample_by_tree': 1.0,
 'save_best_model': False,
 'seed': 1212,
 'sketch_eps': 0.1,
 'stopping_rounds': 1,
 'stopping_tolerance': 0.0,
 'top_rate': 0.3,
 'tree_growing_method': 'leaf',
 'validation_fraction': 0.1}


2024-08-02 19:55:57.305 INFO proxy.py:180 [bob] -- [Anonymous_job] Create proxy actor <class 'secretflow.ml.boost.sgb_v.factory.sgb_actor.SGBActor'> with party alice.
2024-08-02 19:55:57.306 INFO proxy.py:180 [bob] -- [Anonymous_job] Create proxy actor <class 'secretflow.ml.boost.sgb_v.factory.sgb_actor.SGBActor'> with party bob.
2024-08-02 19:55:57.336 INFO global_ordermap_booster.py:214 [bob] -- [Anonymous_job] training the first tree with label holder only.
2024-08-02 19:55:57.337 INFO leaf_wise_tree_trainer.py:117 [bob] -- [Anonymous_job] train tree context set up.
2024-08-02 19:55:59.366 INFO leaf_wise_tree_trainer.py:209 [bob] -- [Anonymous_job] begin train tree.
2024-08-02 19:56:07.398 INFO global_ordermap_booster.py:237 [bob] -- [Anonymous_job] epoch 0 time 10.061852868064307s
2024-08-02 19:56:07.399 INFO leaf_wise_tree_trainer.py:117 [bob] -- [Anonymous_job] train tree context set up.
2024-08-02 19:56:07.415 INFO leaf_wise_tree_trainer.py:209 [bob] -- [Anonymous_job] begin tra

In [12]:
yhat = model.predict(v_data)
yhat = reveal(yhat)
print(f"auc: {roc_auc_score(y, yhat)}")

auc: 0.992944347550341
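
The GOSS option enabled in the LightGBM-style parameters refers to gradient-based one-side sampling. A minimal sketch of the idea follows, assuming the convention of the original LightGBM description: keep the `top_rate` fraction of samples with the largest absolute gradients, randomly draw `bottom_rate` of the total sample count from the rest, and up-weight the drawn samples by `(1 - top_rate) / bottom_rate`. How SecretFlow interprets `top_rate` and `bottom_rate` internally may differ; the function name and the gradient values are hypothetical.

```python
import random


def goss_sample(gradients, top_rate, bottom_rate, seed=0):
    """Return {sample_index: weight} for the samples kept by GOSS."""
    n = len(gradients)
    # Indices ordered by decreasing absolute gradient.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = round(n * top_rate)
    n_rest = round(n * bottom_rate)

    top = order[:n_top]            # always kept, weight 1
    rest = order[n_top:]           # small-gradient samples, subsampled
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_rest, len(rest)))

    # Up-weight the subsample so gradient sums stay roughly unbiased.
    rest_weight = (1.0 - top_rate) / bottom_rate
    weights = {i: 1.0 for i in top}
    weights.update({i: rest_weight for i in sampled})
    return weights


# Hypothetical per-sample gradients.
gradients = [0.9, -0.1, 0.05, -0.8, 0.2, 0.7, -0.3, 0.01, 0.4, -0.6]
weights = goss_sample(gradients, top_rate=0.3, bottom_rate=0.5)
for index in sorted(weights):
    print(index, weights[index])
```

Because fewer samples take part in each tree, fewer encrypted gradients need to be aggregated per round, which is the main reason sampling of this kind is attractive in a homomorphic-encryption setting.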
