# Pipeline Tutorial with Hetero Components

### Install

`Pipeline` is distributed along with [fate_client](https://pypi.org/project/fate-client/).

```bash
pip install fate_client
```

To use Pipeline, we first need to specify which `FATE Flow Service` to connect to. Once `fate_client` is installed, a command-line entry point named `pipeline` is available:

In [1]:
!pipeline --help

Usage: pipeline [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  init       pipeline init
  show       - DESCRIPTION: Show pipeline config details for Flow server.
  site-info  pipeline site info


Assume we have a `FATE Flow Service` running at 127.0.0.1:9380 (the default in standalone mode), then execute:

In [2]:
!pipeline init --ip 127.0.0.1 --port 9380

Pipeline configuration succeeded.
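
If `pipeline init` succeeds but later jobs fail to submit, it can help to confirm that something is actually listening on the configured address. The following is a minimal sketch using only the Python standard library; `flow_service_reachable` is a hypothetical helper, not part of `fate_client`, and it only checks that a TCP connection can be opened, not that the listener is a FATE Flow server.

```python
import socket


def flow_service_reachable(ip: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to ip:port can be opened within timeout.

    Hypothetical helper for illustration; it does not speak the FATE Flow API.
    """
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# 127.0.0.1:9380 is the standalone default mentioned above.
print(flow_service_reachable("127.0.0.1", 9380))
```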


### Hetero Example

Before starting a modeling task, the data to be used should be transformed into a dataframe. Please refer to this [guide](./pipeline_tutorial_transform_local_file_to_dataframe.ipynb).

The `pipeline` package provides components to compose a `FATE pipeline`.

In [3]:
from fate_client.pipeline import FateFlowPipeline
from fate_client.pipeline.components.fate import PSI, CoordinatedLR, Evaluation
from fate_client.pipeline.interface import DataWarehouseChannel

Make a `pipeline` instance:

    - initiator:
        * role: guest
        * party: 9999
    - roles:
        * guest: 9999
        * host: 10000
        * arbiter: 10000

In [4]:
pipeline = FateFlowPipeline().set_roles(guest='9999', host='10000', arbiter='10000')

Add a `PSI` component to perform private set intersection (PSI) for the hetero scenario. Since this is the first component, its input dataframe is specified through a `DataWarehouseChannel`.

In [5]:
psi_0 = PSI("psi_0")
psi_0.guest.component_setting(input_data=DataWarehouseChannel(name="breast_hetero_guest",
                                                              namespace="experiment"))
psi_0.hosts[0].component_setting(input_data=DataWarehouseChannel(name="breast_hetero_host",
                                                                 namespace="experiment"))
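
To make the role of `psi_0` concrete, here is a plain-Python sketch of the result an intersection step produces: the ids held by both parties, with each side's rows aligned to that common list. This is only an illustration of the outcome. The real `PSI` component computes the intersection without revealing non-overlapping ids to the other party, which this sketch does not attempt. The toy tables and the `intersect_by_id` helper are hypothetical.

```python
def intersect_by_id(guest_rows: dict, host_rows: dict) -> list:
    """Return the sorted ids present on both sides (plain-text stand-in for PSI)."""
    return sorted(guest_rows.keys() & host_rows.keys())


# Hypothetical toy tables keyed by match id; values are feature lists.
guest = {"id_1": [0.1], "id_2": [0.4], "id_3": [0.9]}
host = {"id_2": [1.0, 2.0], "id_3": [3.0, 4.0], "id_4": [5.0, 6.0]}

common = intersect_by_id(guest, host)

# Each party keeps only the overlapping rows, in the same id order.
guest_aligned = [guest[i] for i in common]
host_aligned = [host[i] for i in common]

print(common)  # ['id_2', 'id_3']
```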


Next, we add a training component, `CoordinatedLR`, and a second `CoordinatedLR` component that predicts with the model from the first. This shows how to feed output data and a model from one component to another.

In [6]:
lr_0 = CoordinatedLR("lr_0",
                     epochs=5,
                     batch_size=None,
                     optimizer={"method": "SGD", "optimizer_params": {"lr": 0.1}, "penalty": "l2", "alpha": 0.001},
                     init_param={"fit_intercept": True, "method": "zeros"},
                     train_data=psi_0.outputs["output_data"],
                     learning_rate_scheduler={"method": "linear", "scheduler_params": {"start_factor": 0.7,
                                                                                       "total_iters": 100}})
lr_1 = CoordinatedLR("lr_1", 
                     input_model=lr_0.outputs["output_model"],
                     test_data=psi_0.outputs["output_data"])
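
The `learning_rate_scheduler` setting of `lr_0` can be read as a warm-up: the learning rate starts at `start_factor` times the base rate and rises linearly to the full rate over `total_iters` steps. The sketch below assumes the `linear` method follows the same closed-form rule as PyTorch's `LinearLR`; the scheduler state in the model output later in this tutorial looks consistent with that, but it is an assumption, and `linear_lr` is a hypothetical helper rather than a FATE function.

```python
def linear_lr(base_lr, epoch, start_factor, total_iters, end_factor=1.0):
    """Learning rate at a given epoch under a linear warm-up schedule."""
    # Progress is capped at 1.0 once total_iters has been reached.
    progress = min(epoch, total_iters) / total_iters
    factor = start_factor + (end_factor - start_factor) * progress
    return base_lr * factor


# Values taken from the lr_0 configuration: lr=0.1, start_factor=0.7,
# total_iters=100, and 5 training epochs (epoch indices 0..4).
for epoch in range(5):
    print(epoch, round(linear_lr(0.1, epoch, 0.7, 100), 6))
```

Under this assumption the rate at epoch index 4 is about 0.0712, which agrees with the `_last_lr` value reported in the `lr_0` model output. With only 5 epochs and `total_iters=100`, the schedule never gets close to the full base rate of 0.1.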

To show the evaluation result, an `Evaluation` component is needed.

In [7]:
evaluation_0 = Evaluation("evaluation_0",
                          runtime_roles=["guest"],
                          default_eval_setting="binary",
                          input_data=lr_0.outputs["train_output_data"])

Add components to pipeline, in order of execution:

    - `psi_0` finds the overlapping match ids
    - `lr_0` trains Coordinated LR on the data output by `psi_0`
    - `lr_1` predicts with the model from `lr_0`
    - `evaluation_0` consumes `lr_0`'s prediction result on the training data

Then compile our pipeline to make it ready for submission.

In [8]:
pipeline.add_task(psi_0)
pipeline.add_task(lr_0)
pipeline.add_task(lr_1)
pipeline.add_task(evaluation_0)

pipeline.compile();
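
The order of execution follows from which outputs each task consumes. As a rough illustration, the dependencies declared in the cells above can be written as a small graph and sorted with the standard library's `graphlib` (Python 3.9+). This is only a sketch of the idea and makes no claim about how `FateFlowPipeline.compile()` resolves tasks internally.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks whose outputs it consumes,
# read off the component definitions earlier in this tutorial.
dependencies = {
    "psi_0": set(),
    "lr_0": {"psi_0"},            # train_data comes from psi_0
    "lr_1": {"psi_0", "lr_0"},    # test_data from psi_0, input_model from lr_0
    "evaluation_0": {"lr_0"},     # input_data is lr_0's train_output_data
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

`lr_1` and `evaluation_0` do not depend on each other, so either may come first in a valid order.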

Now, submit (fit) our pipeline:

In [9]:
pipeline.fit();

Job id is 202308311051324015890

Job is waiting, time elapse: 0:00:00
Job is waiting, time elapse: 0:00:01

Running task psi_0, time elapse: 0:00:02
Running task psi_0, time elapse: 0:00:03
Running task psi_0, time elapse: 0:00:04
Running task psi_0, time elapse: 0:00:05
Running task psi_0, time elapse: 0:00:06
Running task psi_0, time elapse: 0:00:07
Running task psi_0, time elapse: 0:00:08
Running task psi_0, time elapse: 0:00:09
Running task psi_0, time elapse: 0:00:10
Running task psi_0, time elapse: 0:00:11
Running task psi_0, time elapse: 0:00:12
Running task psi_0, time elapse: 0:00:13
Running task psi_0, time elapse: 0:00:14

Running task lr_0, time elapse: 0:00:15
Running task lr_0, time elapse: 0:00:16
Running task lr_0, time elapse: 0:00:17
Running ta

Once training is done, the output data and model of trained components can be queried through the pipeline API.

In [10]:
import pandas as pd

lr_0_data = pipeline.get_task_info("lr_0").get_output_data()["train_output_data"]
pd.DataFrame(lr_0_data).head()

| | extend_sid | id | label | predict_score | predict_result | predict_detail | type |
|---|---|---|---|---|---|---|---|
| 0 | a41979464da4e859ce5f594b3da915820 | 133 | 1 | 0.5453636377530179 | 1 | {'0': 0.4546363622469821, '1': 0.5453636377530... | train_set |
| 1 | a41979464da4e859ce5f594b3da9158222 | 262 | 0 | 0.2858926003794503 | 0 | {'0': 0.7141073996205496, '1': 0.2858926003794... | train_set |
| 2 | a41979464da4e859ce5f594b3da9158276 | 116 | 1 | 0.7589402080943449 | 1 | {'0': 0.24105979190565507, '1': 0.758940208094... | train_set |
| 3 | a41979464da4e859ce5f594b3da91582115 | 140 | 1 | 0.837934821102845 | 1 | {'0': 0.162065178897155, '1': 0.837934821102845} | train_set |
| 4 | a41979464da4e859ce5f594b3da91582160 | 174 | 1 | 0.819790248482875 | 1 | {'0': 0.18020975151712504, '1': 0.819790248482... | train_set |
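
The `label`, `predict_score`, and `predict_result` columns are what a binary evaluation works from. As a rough illustration, the sketch below computes accuracy and AUC by hand on the five rows displayed above. Which metrics `Evaluation` reports under `default_eval_setting="binary"` is not shown in this tutorial, so the choice of these two is an assumption, and five rows say nothing about performance on the full dataset.

```python
# (label, predict_score, predict_result) copied from the five rows shown above.
rows = [
    (1, 0.5453636377530179, 1),
    (0, 0.2858926003794503, 0),
    (1, 0.7589402080943449, 1),
    (1, 0.837934821102845, 1),
    (1, 0.819790248482875, 1),
]


def accuracy(rows):
    """Fraction of rows whose predicted class equals the label."""
    return sum(label == pred for label, _, pred in rows) / len(rows)


def auc(rows):
    """Probability that a positive row scores higher than a negative row.

    Pairwise definition of AUC; ties between scores count as half.
    """
    pos = [score for label, score, _ in rows if label == 1]
    neg = [score for label, score, _ in rows if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(accuracy(rows), auc(rows))  # 1.0 1.0
```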


In [11]:
lr_0_model = pipeline.get_task_info("lr_0").get_output_model()
lr_0_model

{'output_model': {'data': {'estimator': {'end_epoch': 5,
    'fit_intercept': True,
    'is_converged': False,
    'lr_scheduler': {'lr_params': {'start_factor': 0.7, 'total_iters': 100},
     'lr_scheduler': {'_get_lr_called_within_step': False,
      '_last_lr': [0.07119999999999999],
      '_step_count': 5,
      'base_lrs': [0.1],
      'end_factor': 1.0,
      'last_epoch': 4,
      'start_factor': 0.7,
      'total_iters': 100,
      'verbose': False},
     'method': 'linear'},
    'optimizer': {'alpha': 0.001,
     'l1_penalty': False,
     'l2_penalty': True,
     'method': 'sgd',
     'model_parameter': [[0.0],
      [0.0],
      [0.0],
      [0.0],
      [0.0],
      [0.0],
      [0.0],
      [0.0],
      [0.0],
      [0.0],
      [0.0]],
     'model_parameter_dtype': 'float32',
     'optim_param': {'lr': 0.1},
     'optimizer': {'param_groups': [{'dampening': 0,
        'differentiable': False,
        'foreach': None,
        'initial_lr': 0.1,
        'lr': 0.0711999999999

To run prediction, trained components should first be deployed.

In [12]:
pipeline.deploy([psi_0, lr_0]);

Then, get the deployed pipeline.

In [13]:
deployed_pipeline = pipeline.get_deployed_pipeline()

Specify the input data for the predict pipeline.

In [14]:
deployed_pipeline.psi_0.guest.component_setting(input_data=DataWarehouseChannel(name="breast_hetero_guest",
                                                                                namespace="experiment"))
deployed_pipeline.psi_0.hosts[0].component_setting(input_data=DataWarehouseChannel(name="breast_hetero_host",
                                                                                   namespace="experiment"))

Add components to the predict pipeline in order of execution:

In [15]:
predict_pipeline = FateFlowPipeline()
predict_pipeline.add_task(deployed_pipeline)
predict_pipeline.compile();

Then, run the prediction job:

In [16]:
predict_pipeline.predict();

Job id is 202308311054193818250

Job is waiting, time elapse: 0:00:00

Running task psi_0, time elapse: 0:00:01
Running task psi_0, time elapse: 0:00:02
Running task psi_0, time elapse: 0:00:03
Running task psi_0, time elapse: 0:00:04
Running task psi_0, time elapse: 0:00:05
Running task psi_0, time elapse: 0:00:06
Running task psi_0, time elapse: 0:00:07
Running task psi_0, time elapse: 0:00:08
Running task psi_0, time elapse: 0:00:09
Running task psi_0, time elapse: 0:00:10
Running task psi_0, time elapse: 0:00:11
Running task psi_0, time elapse: 0:00:12

Running task lr_0, time elapse: 0:00:13
Running task lr_0, time elapse: 0:00:14
Running task lr_0, time elapse: 0:00:15
Running task lr_0, time elapse: 0:00:16
Running task lr_0, time elapse: 0:00:17
Running