## Pipeline Tutorial with HeteroSecureBoost

### Install

`Pipeline` is distributed along with [fate_client](https://pypi.org/project/fate-client/).

```bash
pip install fate_client
```

To use Pipeline, we first need to specify which `FATE Flow Service` to connect to. Once `fate_client` is installed, a command-line entry point named `pipeline` is available:

In [52]:
!pipeline --help

Usage: pipeline [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  config  pipeline config tool
  init    - DESCRIPTION: Pipeline Config Command.


Assume we have a `FATE Flow Service` running at 127.0.0.1:9380 (the default in standalone mode), then execute:

In [53]:
!pipeline init --ip 127.0.0.1 --port 9380

Pipeline configuration succeeded.
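
Before submitting jobs, it can help to confirm that something is actually listening on the configured address. The sketch below is a generic TCP reachability check using only the Python standard library. The helper name `is_port_open` is hypothetical and not part of `fate_client`, and a successful connection only shows that the port is open, not that the service behind it is FATE Flow.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# 127.0.0.1:9380 is the address assumed in this tutorial.
reachable = is_port_open("127.0.0.1", 9380)
print("FATE Flow port reachable:", reachable)
```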


### Hetero SecureBoost Example

Before starting a modeling task, the data to be used must be uploaded. Please refer to this [guide](./pipeline_tutorial_upload.ipynb).

The `pipeline` package provides components to compose a `FATE pipeline`.

In [54]:
from pipeline.backend.pipeline import PipeLine
from pipeline.component import Reader, DataTransform, Intersection, HeteroSecureBoost, Evaluation
from pipeline.interface import Data

Make a `pipeline` instance:

    - initiator: 
        * role: guest
        * party: 9999
    - roles:
        * guest: 9999
        * host: 10000
    

In [55]:
pipeline = PipeLine() \
        .set_initiator(role='guest', party_id=9999) \
        .set_roles(guest=9999, host=10000)

Define a `Reader` to load data

In [56]:
reader_0 = Reader(name="reader_0")
# set guest parameter
reader_0.get_party_instance(role='guest', party_id=9999).component_param(
    table={"name": "breast_hetero_guest", "namespace": "experiment"})
# set host parameter
reader_0.get_party_instance(role='host', party_id=10000).component_param(
    table={"name": "breast_hetero_host", "namespace": "experiment"})

Add a `DataTransform` component to parse raw data into Data Instance

In [57]:
data_transform_0 = DataTransform(name="data_transform_0")
# set guest parameter
data_transform_0.get_party_instance(role='guest', party_id=9999).component_param(
    with_label=True)
# set host parameter
data_transform_0.get_party_instance(role='host', party_id=10000).component_param(
    with_label=False)
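
To make the `with_label` setting concrete, the sketch below parses one CSV row the way a guest (with label) and a host (without label) might. It is a toy illustration in plain Python, not FATE's `DataTransform` implementation. The function `parse_row`, the `Instance` type, the column order (ID, then optional label, then features), and the sample values are all assumptions made for the example.

```python
from typing import List, NamedTuple, Optional


class Instance(NamedTuple):
    sample_id: str
    label: Optional[int]
    features: List[float]


def parse_row(row: str, with_label: bool) -> Instance:
    """Split a CSV row into ID, optional label, and numeric features."""
    fields = row.strip().split(",")
    sample_id = fields[0]
    if with_label:
        # Assumed layout: id,label,feature_0,feature_1,...
        label = int(fields[1])
        feature_fields = fields[2:]
    else:
        # Assumed layout: id,feature_0,feature_1,...
        label = None
        feature_fields = fields[1:]
    return Instance(sample_id, label, [float(v) for v in feature_fields])


# Made-up rows sharing the same sample ID on both sides.
guest_instance = parse_row("133,1,0.25,-1.04", with_label=True)
host_instance = parse_row("133,0.44,0.85,1.2", with_label=False)
print(guest_instance)
print(host_instance)
```

Only the guest row carries a label here, which mirrors the `with_label=True` / `with_label=False` split configured above.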

Add an `Intersection` component to perform private set intersection (PSI) for the hetero scenario

In [58]:
intersect_0 = Intersection(name="intersection_0")
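
The outcome of the intersection step is the set of sample IDs that both parties hold. The sketch below computes that outcome with an ordinary Python set intersection. It only illustrates the result: an actual PSI protocol is designed so that the parties do not learn each other's non-overlapping IDs, which this plain version does not attempt. The helper `intersect_ids` and the IDs are made up.

```python
def intersect_ids(guest_ids, host_ids):
    """Return the sorted IDs present on both sides (no privacy protection)."""
    return sorted(set(guest_ids) & set(host_ids))


# Hypothetical sample IDs held by each party.
guest_ids = ["u1", "u2", "u3", "u5"]
host_ids = ["u2", "u3", "u4", "u5", "u6"]

common = intersect_ids(guest_ids, host_ids)
print(common)  # ['u2', 'u3', 'u5']
```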

Now, we define the `HeteroSecureBoost` component. The following parameters will be set for all parties involved.

In [59]:
hetero_secureboost_0 = HeteroSecureBoost(name="hetero_secureboost_0",
                                         num_trees=5,
                                         bin_num=16,
                                         task_type="classification",
                                         objective_param={"objective": "cross_entropy"},
                                         encrypt_param={"method": "iterativeAffine"},
                                         tree_param={"max_depth": 3})
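
With a `cross_entropy` objective, gradient boosting for binary classification typically works from the first and second derivatives of the log loss with respect to the raw score. The sketch below computes them for a single sample in plain Python. It illustrates the standard formulas g = p - y and h = p(1 - p) and is not taken from FATE's source; the function names are hypothetical. In the federated setting these statistics would additionally be protected by encryption (see `encrypt_param`), which is omitted here.

```python
import math


def sigmoid(score: float) -> float:
    """Numerically stable logistic function."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


def grad_hess(label: int, score: float):
    """Gradient and Hessian of binary log loss w.r.t. the raw score."""
    p = sigmoid(score)
    return p - label, p * (1.0 - p)


# A positive sample whose current raw score is 0 (predicted probability 0.5).
g, h = grad_hess(label=1, score=0.0)
print(g, h)  # -0.5 0.25
```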


To show the evaluation result, an `Evaluation` component is needed.

In [60]:
evaluation_0 = Evaluation(name="evaluation_0", eval_type="binary")
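
For binary classification, one commonly reported metric is AUC. As a reminder of what that number means, the sketch below computes AUC as the fraction of (positive, negative) pairs in which the positive sample gets the higher score, counting ties as half. It is a small O(n²) illustration with made-up labels and scores, not the `Evaluation` component's implementation, and `pairwise_auc` is a hypothetical helper.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = pairwise_auc(labels, scores)
print(auc)  # 0.75
```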

Add components to pipeline, in order of execution:

    - data_transform_0 consumes reader_0's output data
    - intersection_0 consumes data_transform_0's output data
    - hetero_secureboost_0 consumes intersection_0's output data
    - evaluation_0 consumes hetero_secureboost_0's prediction result on training data

In [61]:
pipeline.add_component(reader_0)
pipeline.add_component(data_transform_0, data=Data(data=reader_0.output.data))
pipeline.add_component(intersect_0, data=Data(data=data_transform_0.output.data))
pipeline.add_component(hetero_secureboost_0, data=Data(train_data=intersect_0.output.data))
pipeline.add_component(evaluation_0, data=Data(data=hetero_secureboost_0.output.data))


<pipeline.backend.pipeline.PipeLine at 0x7f453a4bb5f8>
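
The `data=` arguments passed to `add_component` define a dependency graph between components. The sketch below models that graph as a plain dictionary and derives an execution order with Kahn's algorithm. It is a stand-alone illustration of the dependency idea; the dictionary and the `execution_order` helper are hypothetical and do not reflect how FATE schedules jobs internally.

```python
from collections import deque

# component name -> names of the components whose output it consumes
dependencies = {
    "reader_0": [],
    "data_transform_0": ["reader_0"],
    "intersection_0": ["data_transform_0"],
    "hetero_secureboost_0": ["intersection_0"],
    "evaluation_0": ["hetero_secureboost_0"],
}


def execution_order(deps):
    """Return component names so that every component follows its inputs."""
    remaining = {name: len(upstream) for name, upstream in deps.items()}
    downstream = {name: [] for name in deps}
    for name, upstream in deps.items():
        for up in upstream:
            downstream[up].append(name)

    # Start with components that have no unmet inputs.
    ready = deque(name for name, count in remaining.items() if count == 0)
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in downstream[current]:
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order


order = execution_order(dependencies)
print(order)
```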

Then compile our pipeline to make it ready for submission.

In [62]:
pipeline.compile()

<pipeline.backend.pipeline.PipeLine at 0x7f453a4bb5f8>

Now, submit (fit) our pipeline:

In [63]:
pipeline.fit()

2021-11-15 08:32:36.529 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:123 - Job id is 202111150832306613020
2021-11-15 08:32:36.542 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:144 - Job is still waiting, time elapse: 0:00:00
2021-11-15 08:32:37.065 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:144 - Job is still waiting, time elapse: 0:00:00
2021-11-15 08:32:37.589 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:144 - Job is still waiting, time elapse: 0:00:01
2021-11-15 08:32:38.132 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:177 - Running component reader_0, time e

Once training is done, the trained model can be used for prediction.

First, deploy the needed components:

In [64]:
pipeline.deploy_component([data_transform_0, intersect_0, hetero_secureboost_0])

<pipeline.backend.pipeline.PipeLine at 0x7f453a4bb5f8>

Define a new `Reader` component for reading prediction data

In [66]:
reader_1 = Reader(name="reader_1")
reader_1.get_party_instance(role="guest", party_id=9999).component_param(table={"name": "breast_hetero_guest", "namespace": "experiment"})
reader_1.get_party_instance(role="host", party_id=10000).component_param(table={"name": "breast_hetero_host", "namespace": "experiment"})

Optionally, define a new `Evaluation` component.

In [69]:
evaluation_0 = Evaluation(name="evaluation_0", eval_type="binary")

Add components to predict pipeline in order of execution:

In [71]:
predict_pipeline = PipeLine()
predict_pipeline.add_component(reader_1)\
                .add_component(pipeline, 
                data=Data(predict_input={pipeline.data_transform_0.input.data: reader_1.output.data}))\
                .add_component(evaluation_0, data=Data(data=pipeline.hetero_secureboost_0.output.data))

<pipeline.backend.pipeline.PipeLine at 0x7f453a4a8470>

Then, run the prediction job

In [72]:
predict_pipeline.predict()

2021-11-15 08:37:57.931 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:123 - Job id is 202111150837510838270
2021-11-15 08:37:57.940 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:144 - Job is still waiting, time elapse: 0:00:00
2021-11-15 08:37:58.468 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:144 - Job is still waiting, time elapse: 0:00:00
2021-11-15 08:37:58.984 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:144 - Job is still waiting, time elapse: 0:00:01
2021-11-15 08:37:59.502 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:144 - Job is still waiting, time elapse: 0:00

For more demos of using pipeline to submit jobs, please refer to the [pipeline demos](https://github.com/FederatedAI/FATE/tree/master/examples/pipeline/demo).