## Pipeline Tutorial with HeteroSecureboost

### Install

`Pipeline` is distributed along with [fate_client](https://pypi.org/project/fate-client/).

```bash
pip install fate_client
```

To use Pipeline, first specify which `FATE Flow Service` to connect to. Once `fate_client` is installed, a command-line entry point named `pipeline` is available:

In [2]:
!pipeline --help

Usage: pipeline [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  config  pipeline config tool
  init    - DESCRIPTION: Pipeline Config Command.


Assuming a `FATE Flow Service` is running at 127.0.0.1:9380 (the default in standalone mode), run:

In [3]:
!pipeline init --ip 127.0.0.1 --port 9380

Pipeline configuration succeeded.
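
If later steps fail to submit a job, it can help to confirm that the configured address accepts TCP connections at all. The following is a minimal sketch using only the Python standard library; `is_reachable` is a hypothetical helper, not part of `fate_client`, and a successful connection does not prove that the listener is actually a FATE Flow Service.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


# 127.0.0.1:9380 is the standalone default used in this tutorial.
print(is_reachable("127.0.0.1", 9380))
```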


### Hetero Secureboost Example

Before starting a modeling task, upload the data to be used. Please refer to this [guide](./pipeline_tutorial_upload.ipynb).

The `pipeline` package provides components to compose a `FATE pipeline`.

In [4]:
from pipeline.backend.pipeline import PipeLine
from pipeline.component import Reader, DataTransform, Intersection, HeteroSecureBoost, Evaluation
from pipeline.interface import Data

Make a `pipeline` instance:

    - initiator: 
        * role: guest
        * party: 9999
    - roles:
        * guest: 9999
        * host: 10000
    

In [5]:
pipeline = PipeLine() \
        .set_initiator(role='guest', party_id=9999) \
        .set_roles(guest=9999, host=10000)

Define a `Reader` to load data

In [6]:
reader_0 = Reader(name="reader_0")
# set guest parameter
reader_0.get_party_instance(role='guest', party_id=9999).component_param(
    table={"name": "breast_hetero_guest", "namespace": "experiment"})
# set host parameter
reader_0.get_party_instance(role='host', party_id=10000).component_param(
    table={"name": "breast_hetero_host", "namespace": "experiment"})

Add a `DataTransform` component to parse raw data into Data Instance

In [7]:
data_transform_0 = DataTransform(name="data_transform_0")
# set guest parameter
data_transform_0.get_party_instance(role='guest', party_id=9999).component_param(
    with_label=True)
# set host parameter
data_transform_0.get_party_instance(role='host', party_id=[10000]).component_param(
    with_label=False)
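
To illustrate what `with_label` changes, here is a small stand-alone sketch. It assumes a comma-separated row whose first field is the sample ID and, on the labelled side, whose second field is the label. The `Instance` type and `parse_row` function are hypothetical and do not reflect FATE's internal data format.

```python
from typing import List, NamedTuple, Optional


class Instance(NamedTuple):
    sample_id: str
    label: Optional[int]
    features: List[float]


def parse_row(row: str, with_label: bool) -> Instance:
    """Split one CSV row into ID, optional label, and numeric features."""
    fields = row.strip().split(",")
    sample_id, rest = fields[0], fields[1:]
    if with_label:
        # Labelled side (guest here): first value after the ID is the label.
        return Instance(sample_id, int(rest[0]), [float(v) for v in rest[1:]])
    # Unlabelled side (host here): every remaining value is a feature.
    return Instance(sample_id, None, [float(v) for v in rest])


guest_instance = parse_row("133,1,0.25,-1.05", with_label=True)
host_instance = parse_row("133,0.45,0.32", with_label=False)
print(guest_instance)
print(host_instance)
```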

Add an `Intersection` component to perform PSI (private set intersection) for the hetero scenario:

In [8]:
intersect_0 = Intersection(name="intersection_0")
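
The effect of the intersection step can be pictured with a plain, non-private sketch: only sample IDs held by both parties survive, so that guest and host rows can be aligned. This is an illustration only. The real component computes the overlap without revealing non-shared IDs, which the toy code below does not attempt, and all names and values in it are made up.

```python
def intersect_ids(guest_rows: dict, host_rows: dict) -> list:
    """Return the sorted sample IDs that appear on both sides."""
    return sorted(guest_rows.keys() & host_rows.keys())


# Made-up toy data: the guest holds labels, the host holds other features.
guest = {"id1": {"y": 1, "x0": 0.3}, "id2": {"y": 0, "x0": 1.1}, "id4": {"y": 1, "x0": -0.7}}
host = {"id2": {"x1": 1.2}, "id3": {"x1": 0.4}, "id4": {"x1": -0.9}}

common = intersect_ids(guest, host)
# Pair up the rows of the two parties for every shared ID.
aligned = [(sid, guest[sid], host[sid]) for sid in common]
print(common)  # ['id2', 'id4']
```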

Now, we define the `HeteroSecureBoost` component. The following parameters will be set for all parties involved.

In [9]:
hetero_secureboost_0 = HeteroSecureBoost(name="hetero_secureboost_0",
                                         num_trees=5,
                                         bin_num=16,
                                         task_type="classification",
                                         objective_param={"objective": "cross_entropy"},
                                         encrypt_param={"method": "iterativeAffine"},
                                         tree_param={"max_depth": 3})
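
For intuition about the `cross_entropy` objective, gradient boosting methods in the XGBoost family fit each tree to the first and second derivatives of the loss with respect to the current raw score. The sketch below assumes that standard formulation for binary cross-entropy. It is not taken from the FATE code base, and the function names are illustrative.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def grad_hess(y_true: int, raw_score: float) -> tuple:
    """Gradient and Hessian of binary cross-entropy w.r.t. the raw score."""
    p = sigmoid(raw_score)
    return p - y_true, p * (1.0 - p)


# A positive sample with an uninformative initial score of 0.0.
g, h = grad_hess(1, 0.0)
print(g, h)  # -0.5 0.25
```

In a hetero setting only the party holding labels can compute these values, which is why they need protecting (see `encrypt_param`) before being shared.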


To show the evaluation result, an `Evaluation` component is needed.

In [10]:
evaluation_0 = Evaluation(name="evaluation_0", eval_type="binary")
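
Binary classification is commonly evaluated with metrics such as AUC. As a rough illustration of that metric (a sketch, not FATE's implementation), AUC can be computed as the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as one half:

```python
def auc(labels, scores) -> float:
    """Pairwise AUC: P(score of a positive > score of a negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
print(auc(labels, scores))  # 0.75
```

The pairwise form is quadratic in the number of samples. It is shown for clarity rather than efficiency.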

Add components to pipeline, in order of execution:

    - data_transform_0 consumes reader_0's output data
    - intersection_0 consumes data_transform_0's output data
    - hetero_secureboost_0 consumes intersection_0's output data
    - evaluation_0 consumes hetero_secureboost_0's prediction result on training data

In [11]:
pipeline.add_component(reader_0)
pipeline.add_component(data_transform_0, data=Data(data=reader_0.output.data))
pipeline.add_component(intersect_0, data=Data(data=data_transform_0.output.data))
pipeline.add_component(hetero_secureboost_0, data=Data(train_data=intersect_0.output.data))
pipeline.add_component(evaluation_0, data=Data(data=hetero_secureboost_0.output.data))


2021-11-15 11:27:36.913 | ERROR    | IPython.utils.dir2:get_real_method:65 - An error has been caught in function 'get_real_method', process 'MainProcess' (3809), thread 'MainThread' (140529146122624):
Traceback (most recent call last):

  File "/home/gitpod/.pyenv/versions/3.6.15/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
                └ ModuleSpec(name='ipykernel_launcher', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7fcf7d82d4a8>, origin='...
  File "/home/gitpod/.pyenv/versions/3.6.15/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
         │     └ {'__name__': '__main__', '__doc__': 'Entry point for launching an IPython kernel.\n\nThis is separate from the ipykernel pack...
         └ <code object <module> at 0x7fcf7d80f9c0, file "/venv/py36/lib/python3.6/site-packages/ipykernel_launcher.py", line 5>
  File "/venv/py36

<pipeline.backend.pipeline.PipeLine at 0x7fced068fac8>
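
The data dependencies declared through `add_component` form a directed acyclic graph. As a stand-alone illustration (it does not use `fate_client`, and it requires Python 3.9+ for `graphlib`), the execution order implied by the five components above can be derived with a topological sort:

```python
from graphlib import TopologicalSorter

# Component name -> set of components whose output it consumes.
dependencies = {
    "reader_0": set(),
    "data_transform_0": {"reader_0"},
    "intersection_0": {"data_transform_0"},
    "hetero_secureboost_0": {"intersection_0"},
    "evaluation_0": {"hetero_secureboost_0"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)
# ['reader_0', 'data_transform_0', 'intersection_0', 'hetero_secureboost_0', 'evaluation_0']
```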

Then compile our pipeline to make it ready for submission.

In [12]:
pipeline.compile()


<pipeline.backend.pipeline.PipeLine at 0x7fced068fac8>

Now, submit (fit) our pipeline:

In [13]:
pipeline.fit()

2021-11-15 11:27:50.488 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:123 - Job id is 202111151127428590790
2021-11-15 11:27:50.496 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:144 - Job is still waiting, time elapse: 0:00:00
2021-11-15 11:27:51.007 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:144 - Job is still waiting, time elapse: 0:00:00
2021-11-15 11:27:51.517 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:144 - Job is still waiting, time elapse: 0:00:01
2021-11-15 11:27:52.029 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:144 - Job is still waiting, time elapse: 0:00
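
The log lines appear roughly every half second, which is consistent with a polling loop that repeatedly queries the job status until it leaves the waiting state. A generic sketch of such a loop follows. The `wait_for_job` helper, the status strings, and the fake status source are all assumptions made for illustration and are not FATE's actual monitoring code.

```python
import time
from typing import Callable


def wait_for_job(get_status: Callable[[], str],
                 poll_interval: float = 0.5,
                 timeout: float = 60.0) -> str:
    """Poll get_status() until it reports a terminal state or time runs out."""
    start = time.monotonic()
    while True:
        status = get_status()
        if status not in ("waiting", "running"):
            return status
        if time.monotonic() - start > timeout:
            raise TimeoutError(f"job still {status} after {timeout} seconds")
        time.sleep(poll_interval)


# Fake status source standing in for a real job query.
statuses = iter(["waiting", "waiting", "running", "success"])
final = wait_for_job(lambda: next(statuses), poll_interval=0.01)
print(final)  # success
```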

Once training is done, the trained model may be used for prediction. Optionally, save the trained pipeline to a file for future use.

In [14]:
pipeline.dump("pipeline_saved.pkl")

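
Saving a Python object to a `.pkl` file and loading it back is typically a pickle round trip. The sketch below shows that pattern with a stand-in object, on the assumption that `dump` and `load_model_from_file` behave similarly. The `state` object is hypothetical and far simpler than a real `PipeLine`.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace

# Stand-in for a trained pipeline: just the initiator and role settings.
state = SimpleNamespace(
    initiator=SimpleNamespace(role="guest", party_id=9999),
    roles={"guest": [9999], "host": [10000]},
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pipeline_saved.pkl")
    with open(path, "wb") as f:
        pickle.dump(state, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored.initiator.role, restored.roles["host"])  # guest [10000]
```

Because unpickling can execute arbitrary code, only load pickle files from sources you trust.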

To run prediction, first deploy the needed components from the training pipeline:

In [15]:
pipeline = PipeLine.load_model_from_file('pipeline_saved.pkl')
pipeline.deploy_component([data_transform_0, intersect_0, hetero_secureboost_0])


<pipeline.backend.pipeline.PipeLine at 0x7fced06b5f98>

Define a new `Reader` component for reading the prediction data:

In [16]:
reader_1 = Reader(name="reader_1")
reader_1.get_party_instance(role="guest", party_id=9999).component_param(table={"name": "breast_hetero_guest", "namespace": "experiment"})
reader_1.get_party_instance(role="host", party_id=10000).component_param(table={"name": "breast_hetero_host", "namespace": "experiment"})

Optionally, define a new `Evaluation` component.

In [17]:
evaluation_0 = Evaluation(name="evaluation_0", eval_type="binary")

Add components to predict pipeline in order of execution:

In [18]:
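predict_pipeline = PipeLine()
# 1. reader_1 supplies the prediction data
# 2. the deployed training pipeline takes reader_1's output as data_transform_0's input
# 3. evaluation_0 consumes hetero_secureboost_0's prediction result
predict_pipeline.add_component(reader_1)\
                .add_component(pipeline,
                               data=Data(predict_input={pipeline.data_transform_0.input.data: reader_1.output.data}))\
                .add_component(evaluation_0, data=Data(data=pipeline.hetero_secureboost_0.output.data))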


<pipeline.backend.pipeline.PipeLine at 0x7fced0524e80>

Then, run the prediction job:

In [19]:
predict_pipeline.predict()

2021-11-15 11:30:42.336 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:123 - Job id is 202111151130376077970
2021-11-15 11:30:42.393 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:144 - Job is still waiting, time elapse: 0:00:00
2021-11-15 11:30:42.905 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:144 - Job is still waiting, time elapse: 0:00:00
2021-11-15 11:30:43.419 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:144 - Job is still waiting, time elapse: 0:00:01
2021-11-15 11:30:43.934 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:144 - Job is still waiting, time elapse: 0:00

For more demos on using Pipeline to submit jobs, please refer to the [pipeline demos](https://github.com/FederatedAI/FATE/tree/master/examples/pipeline/demo).