# Logistic Regression with SPU

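[SPU](https://www.secretflow.org.cn/en/docs/spu/) is a domain-specific compiler and runtime suite that provides provably secure computation services. The SPU compiler uses [XLA](https://www.tensorflow.org/xla) as its front-end IR, which supports a range of AI frameworks (such as TensorFlow, JAX, and PyTorch). The SPU compiler translates XLA into an IR that the SPU runtime can interpret. Currently, the SPU team strongly recommends [JAX](https://github.com/google/jax) as the front end.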


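In this lab, we use [Breast Cancer](https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+\(diagnostic\)) as the dataset. The task is to decide from 30 features whether a tumor is malignant or benign. In the MPC program, two parties train the model jointly, and each party provides half of the features (15).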

First, let's set the MPC setting aside and write a logistic regression training program directly in JAX.


## Train a Model with JAX

### Load the Dataset

The `breast_cancer` function normalizes the whole dataset and then splits it into train and test subsets.

- If `train` is `True`, it returns the train subset. To simulate training on a vertically split dataset, pass `party_id`: party 1 gets the first 15 features and no labels, and any other party ID gets the remaining 15 features together with the labels.
- Otherwise, it returns the test subset.


In [1]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split


def breast_cancer(party_id=None, train: bool = True) -> (np.ndarray, np.ndarray):
    x, y = load_breast_cancer(return_X_y=True)
    # Min-max scale with the global minimum and maximum of the feature matrix.
    x = (x - np.min(x)) / (np.max(x) - np.min(x))
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.2, random_state=42
    )

    if train:
        if party_id:
            if party_id == 1:
                # Party 1 holds the first 15 features and no labels.
                return x_train[:, :15], None
            else:
                # The other party holds the remaining 15 features and the labels.
                return x_train[:, 15:], y_train
        else:
            return x_train, y_train
    else:
        return x_test, y_test
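
To see the vertical split in isolation, here is a minimal NumPy sketch on a toy matrix. The 4×6 random data and the names `x_party1`, `x_party2`, and `x_joined` are assumptions made for illustration; they are not part of the tutorial's dataset or code. Note that `np.min(x)` and `np.max(x)` without an `axis` argument reduce over the whole matrix, so the scaling is global rather than per feature.

```python
import numpy as np

# Toy stand-in for the real dataset: 4 samples, 6 features (an assumption
# made for brevity; the tutorial's data has 30 features).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 100.0, size=(4, 6))

# Global min-max scaling, the same formula breast_cancer applies.
x = (x - np.min(x)) / (np.max(x) - np.min(x))

# Vertical split: every party sees all rows but only half of the columns.
half = x.shape[1] // 2
x_party1 = x[:, :half]
x_party2 = x[:, half:]

# Concatenating along the feature axis recovers the full matrix.
x_joined = np.concatenate([x_party1, x_party2], axis=1)
```

Both parties keep the same row order, so the concatenated columns line up with the labels held by the second party.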

### Define the Model

First, let's define the loss function, which is a negative log-likelihood in our case.


In [2]:
import jax.numpy as jnp


def sigmoid(x):
    return 1 / (1 + jnp.exp(-x))


# Outputs probability of a label being true.
def predict(W, b, inputs):
    return sigmoid(jnp.dot(inputs, W) + b)


# Training loss is the negative log-likelihood of the training examples.
def loss(W, b, inputs, targets):
    preds = predict(W, b, inputs)
    label_probs = preds * targets + (1 - preds) * (1 - targets)
    return -jnp.mean(jnp.log(label_probs))
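
For reference, the `loss` function above can be written out as follows. The symbols are ours, not the original notebook's: $N$ is the number of examples, $x_i$ is the $i$-th row of `inputs`, and $y_i \in \{0, 1\}$ is the $i$-th entry of `targets`. The second equality relies on the labels being exactly 0 or 1.

```latex
p_i = \sigma(x_i^\top W + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

\mathcal{L}(W, b)
  = -\frac{1}{N} \sum_{i=1}^{N} \log\bigl(p_i\, y_i + (1 - p_i)(1 - y_i)\bigr)
  = -\frac{1}{N} \sum_{i=1}^{N} \bigl[\, y_i \log p_i + (1 - y_i) \log(1 - p_i) \,\bigr]
```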

Second, let's define a single training step with an SGD optimizer. As a reminder, `x1` holds the 15 features from one party and `x2` holds the remaining 15 features from the other party.


In [3]:
from jax import grad


def train_step(W, b, x1, x2, y, learning_rate):
    x = jnp.concatenate([x1, x2], axis=1)
    Wb_grad = grad(loss, (0, 1))(W, b, x, y)
    W -= learning_rate * Wb_grad[0]
    b -= learning_rate * Wb_grad[1]
    return W, b
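
As a sanity check on what `grad(loss, (0, 1))` should return, the gradient of this loss has a simple closed form: the prediction error `p - y` averaged against the inputs. The sketch below re-implements the loss in plain NumPy (a stand-in, not JAX autodiff) and compares the closed form with central finite differences. The helper names `analytic_grad` and `numeric_grad` and the random toy data are assumptions for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def loss(W, b, inputs, targets):
    preds = sigmoid(inputs @ W + b)
    label_probs = preds * targets + (1 - preds) * (1 - targets)
    return -np.mean(np.log(label_probs))


def analytic_grad(W, b, inputs, targets):
    # Closed-form gradient of the mean negative log-likelihood.
    err = sigmoid(inputs @ W + b) - targets
    return inputs.T @ err / len(targets), np.mean(err)


def numeric_grad(W, b, inputs, targets, eps=1e-6):
    # Central finite differences, one coordinate of W at a time.
    gW = np.zeros_like(W)
    for i in range(W.size):
        step = np.zeros_like(W)
        step[i] = eps
        gW[i] = (loss(W + step, b, inputs, targets)
                 - loss(W - step, b, inputs, targets)) / (2 * eps)
    gb = (loss(W, b + eps, inputs, targets)
          - loss(W, b - eps, inputs, targets)) / (2 * eps)
    return gW, gb


# Toy data: 8 samples with 3 features and binary labels.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
y = np.array([0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0])
W, b = rng.normal(size=3), 0.1

gW_a, gb_a = analytic_grad(W, b, x, y)
gW_n, gb_n = numeric_grad(W, b, x, y)
print(np.allclose(gW_a, gW_n, atol=1e-5), abs(gb_a - gb_n) < 1e-5)
```

One SGD step then subtracts `learning_rate` times these two gradients from `W` and `b`, which is what `train_step` does.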

Last, let's put everything together in a `fit` function, which returns the trained model parameters `W` and `b`.


In [4]:
def fit(W, b, x1, x2, y, epochs=1, learning_rate=1e-2):
    for _ in range(epochs):
        W, b = train_step(W, b, x1, x2, y, learning_rate=learning_rate)
    return W, b

### Validate the Model

We can use the AUC to validate a binary classification model.


In [5]:
from sklearn.metrics import roc_auc_score


def validate_model(W, b, X_test, y_test):
    y_pred = predict(W, b, X_test)
    return roc_auc_score(y_test, y_pred)
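
For intuition about the metric, AUC can be read as the fraction of (positive, negative) example pairs in which the positive example receives the higher score. The pure-Python sketch below computes it by brute-force pair counting. The function name `pairwise_auc` and the toy labels and scores are assumptions for illustration, and this is not how `roc_auc_score` is implemented.

```python
def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties in score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc_toy = pairwise_auc(labels, scores)
print(auc_toy)  # 3 of the 4 pairs are ranked correctly -> 0.75
```

Because only the ranking of the scores matters, AUC does not depend on choosing a classification threshold.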

### Try It!

Put everything together and train an LR model.

In [6]:
# Load the data
x1, _ = breast_cancer(party_id=1, train=True)
x2, y = breast_cancer(party_id=2, train=True)

# Hyperparameter
W = jnp.zeros((30,))
b = 0.0
epochs = 10
learning_rate = 1e-2

# Train the model
W, b = fit(W, b, x1, x2, y, epochs=epochs, learning_rate=learning_rate)

# Validate the model
X_test, y_test = breast_cancer(train=False)
auc = validate_model(W, b, X_test, y_test)
print(f'auc={auc}')

auc=0.9878807730101539




## Train a Model with SPU

In this part, we show how to run similar training securely with MPC.


### Init the Environment

We are going to initialize three virtual devices in our physical environment.

- alice, bob: two PYU devices for local plaintext computation.
- spu: an SPU device, made up of alice and bob, for secure MPC computation.


In [7]:
import secretflow as sf

# Check the version of your SecretFlow
print('The version of SecretFlow: {}'.format(sf.__version__))

# In case you have a running secretflow runtime already.
sf.shutdown()

sf.init(['alice', 'bob'], address='local')

alice, bob = sf.PYU('alice'), sf.PYU('bob')
spu = sf.SPU(sf.utils.testing.cluster_def(['alice', 'bob']))



The version of SecretFlow: 1.9.0b0




### Load the Dataset

We instruct alice and bob to load their respective train subsets.


In [8]:
x1, _ = alice(breast_cancer)(party_id=1)
x2, y = bob(breast_cancer)(party_id=2)

x1, x2, y

(<secretflow.device.device.pyu.PYUObject at 0x7fa81d78ebf0>,
 <secretflow.device.device.pyu.PYUObject at 0x7fa81d78ece0>,
 <secretflow.device.device.pyu.PYUObject at 0x7fa7e40f1780>)


Before training, we need to pass the hyperparameters and all data to the SPU device. SecretFlow provides two methods:

- `secretflow.to`: transfers a Python object or DeviceObject to a specific device.
- `DeviceObject.to`: transfers the DeviceObject to a specific device.


In [9]:
device = spu

W = jnp.zeros((30,))
b = 0.0

# x1, x2, y already reside on alice and bob
W_, b_, x1_, x2_, y_ = (
    sf.to(alice, W).to(device),
    sf.to(alice, b).to(device),
    x1.to(device),
    x2.to(device),
    y.to(device),
)


### Train the model

Now we are ready to train an LR model with SPU. After training, the model parameters are SPUObjects, which remain secret.


We specify `static_argnames` so that `epochs` is treated as a static argument during JIT compilation.

In [10]:
W_, b_ = device(
    fit,
    static_argnames=['epochs'],
    num_returns_policy=sf.device.SPUCompilerNumReturnsPolicy.FROM_USER,
    user_specified_num_returns=2,
)(W_, b_, x1_, x2_, y_, epochs=epochs, learning_rate=learning_rate)

W_, b_

(<secretflow.device.device.spu.SPUObject at 0x7fa76c7bdb40>,
 <secretflow.device.device.spu.SPUObject at 0x7fa76c7bf130>)
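
In the call above, `epochs` is listed in `static_argnames`, presumably because `fit` iterates with `range(epochs)`, which needs a concrete Python integer at trace time. The sketch below illustrates the same idea with plain `jax.jit` rather than the SPU device. The function `repeat_decay` and its arguments are hypothetical names chosen for this example.

```python
import jax
import jax.numpy as jnp


def repeat_decay(w, steps, rate):
    # `range(steps)` needs a concrete Python int, so `steps` cannot be a
    # traced value; `rate` is only used in arithmetic and may be traced.
    for _ in range(steps):
        w = w - rate * w
    return w


# Mark `steps` as static: JAX unrolls the loop at trace time and compiles a
# separate program for each distinct value of `steps`.
repeat_decay_jit = jax.jit(repeat_decay, static_argnames=['steps'])

w0 = jnp.ones((3,))
w3 = repeat_decay_jit(w0, steps=3, rate=0.5)
print(w3)  # each entry is 1 * 0.5**3
```

A static argument is baked into the compiled program, so changing its value triggers a new compilation, whereas changing a traced argument such as `rate` does not.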

### Reveal the result

In order to check the trained model, we need to convert the SPUObject (secret) to a Python object (plaintext). SecretFlow provides `sf.reveal` to convert any DeviceObject to a Python object.


> Be careful with `sf.reveal`, since it may leak secrets.

Finally, validate the model with AUC.

In [11]:
auc = validate_model(sf.reveal(W_), sf.reveal(b_), X_test, y_test)
print(f'auc={auc}')


auc=0.987880773010154




The model trained by the SPU program achieves almost the same AUC as the one trained by the JAX program.