# Pipeline Tutorial: Using Data with Recorded Meta

Starting with FATE v1.9.0, FATE supports using data with recorded meta. Using data with meta means that information such as `input_format` and `with_label` is set at the upload step. Please refer to [FATE-Flow](https://github.com/FederatedAI/FATE-Flow/tree/main/examples/upload) for more examples.

## Install

`Pipeline` is distributed along with [FATE-Client](https://pypi.org/project/fate-client/).

```bash
pip install fate_client
```
To use PipeLine, we first need to specify which `FATE Flow Service` to connect to. Once `FATE-Client` is installed, a command-line entry point named `pipeline` is available:

In [2]:
!pipeline --help

Usage: pipeline [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  config  pipeline config tool
  init     - DESCRIPTION: Pipeline Config Command.


Assume we have a FATE Flow Service at 127.0.0.1:9380 (the default in standalone mode). Execute the following command to initialize PipeLine:

!pipeline init --ip 127.0.0.1 --port 9380
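If `pipeline init` is pointed at an address where nothing is listening, the problem may only become visible when the first job is submitted. The sketch below checks beforehand that the port accepts TCP connections. It uses only the Python standard library; `flow_service_reachable` is a hypothetical helper, not part of FATE-Client, and it only confirms that something is listening on the port, not that the listener is a FATE Flow Service.

```python
import socket


def flow_service_reachable(ip: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to ip:port can be opened.

    Hypothetical helper for illustration; not part of FATE-Client.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when the connection cannot be established.
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the standalone default used here, the call would be `flow_service_reachable("127.0.0.1", 9380)`.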

## Upload Data with Meta

Before starting a modeling job, data should be uploaded. Here we assume that the task is between two parties, guest and host, and is run in standalone mode. If you want to run in cluster mode, make sure that each party uploads its own data.

In [63]:
from pipeline.backend.pipeline import PipeLine

Make a `pipeline` instance:
- initiator: 
    * role: guest
    * party: 9999
- roles:
    * guest: 9999

In [64]:
pipeline_upload = PipeLine().set_initiator(role="guest", party_id=9999).set_roles(guest=9999)

We will use "breast_hetero_guest" and "breast_hetero_host" under the [examples data](https://github.com/FederatedAI/FATE/tree/master/examples/data) to demonstrate how to upload data with meta.

Define data meta: 

In [65]:
breast_hetero_guest_meta = {"delimiter": ",", "with_label": True, "label_name": "y",
                            "input_format": "dense", "data_type": "float64"}

In [66]:
breast_hetero_host_meta = {"delimiter": ",", "with_label": False, 
                           "input_format": "dense", "data_type": "float64"}
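Because the meta is a plain dictionary, a misspelled key is easy to miss before the upload job is submitted. The following is a minimal sketch of a local sanity check. `check_meta` and `KNOWN_META_KEYS` are hypothetical names, the key set covers only the fields used in this tutorial (an assumption, not the full list FATE accepts), and the rules are illustrative rather than FATE's own validation.

```python
# Keys that appear in this tutorial's meta dictionaries. Assumption for
# illustration only: FATE may accept more fields, so extend this set as needed.
KNOWN_META_KEYS = {"delimiter", "with_label", "label_name", "input_format", "data_type"}


def check_meta(meta: dict) -> list:
    """Return a list of human-readable problems found in a meta dict.

    Hypothetical helper; not part of FATE-Client.
    """
    problems = []
    # Flag keys outside the known set, which are often typos.
    for key in sorted(set(meta) - KNOWN_META_KEYS):
        problems.append(f"unknown key: {key!r}")
    # with_label is expected to be a boolean flag.
    if not isinstance(meta.get("with_label", False), bool):
        problems.append("with_label should be True or False")
    # A label name without with_label=True is probably a mistake.
    if "label_name" in meta and not meta.get("with_label", False):
        problems.append("label_name is set but with_label is not True")
    return problems


guest_meta = {"delimiter": ",", "with_label": True, "label_name": "y",
              "input_format": "dense", "data_type": "float64"}
host_meta = {"delimiter": ",", "with_label": False,
             "input_format": "dense", "data_type": "float64"}

for name, meta in [("guest", guest_meta), ("host", host_meta)]:
    print(name, check_meta(meta))
```

Both dictionaries from this tutorial pass the check with an empty problem list.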

In [67]:
breast_hetero_guest = {"name": "breast_hetero_guest_with_meta", "namespace": "experiment"}
breast_hetero_host = {"name": "breast_hetero_host_with_meta", "namespace": "experiment"}

In [78]:
import os

# Replace this with the actual location where FATE is deployed
fate_project_base = "/data/projects/fate"  # $fate_project_base

In [69]:
pipeline_upload.add_upload_data(file=os.path.join(fate_project_base, "examples/data/breast_hetero_guest.csv"),
                                table_name=breast_hetero_guest["name"],         
                                namespace=breast_hetero_guest["namespace"],         
                                head=1, partition=4,
                                with_meta=True, meta=breast_hetero_guest_meta) # with_meta=True means uploading data with meta                       

pipeline_upload.add_upload_data(file=os.path.join(fate_project_base, "examples/data/breast_hetero_host.csv"),
                                table_name=breast_hetero_host["name"],
                                namespace=breast_hetero_host["namespace"],
                                head=1, partition=4,
                                with_meta=True, meta=breast_hetero_host_meta)
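Since the meta declares a delimiter and, for the guest, a label column, it can be worth confirming that the CSV header agrees with it before uploading. The sketch below reads only the first row of a file. `header_matches_meta` is a hypothetical helper, not part of FATE-Client, and it assumes the file has a header row (as `head=1` indicates here) and that `label_name` is given explicitly whenever `with_label` is true.

```python
import csv


def header_matches_meta(path: str, meta: dict) -> bool:
    """Check that the label column named in meta appears in the CSV header.

    Hypothetical helper; assumes the first row of the file is a header.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=meta.get("delimiter", ","))
        header = next(reader, [])  # empty list if the file has no rows
    if meta.get("with_label", False):
        # The declared label column must be one of the header fields.
        return meta.get("label_name") in header
    # Without a label there is nothing further to verify in this sketch.
    return True
```

With the variables defined earlier, a call could look like `header_matches_meta(os.path.join(fate_project_base, "examples/data/breast_hetero_guest.csv"), breast_hetero_guest_meta)`.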

We can then upload the datasets:

In [70]:
pipeline_upload.upload(drop=1)

 UPLOADING:||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.00%

2022-08-29 14:53:53.497 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:83 - Job id is 202208291453533660260
2022-08-29 14:53:53.502 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:98 - Job is still waiting, time elapse: 0:00:00
2022-08-29 14:53:54.515 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:98 - Job is still waiting, time elapse: 0:00:01
2022-08-29 14:53:55.529 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:125 -
2022-08-29 14:53:55.530 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component upload_0, time elapse: 0:00:02
2022-08-29 14:53:56.544 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component upload_0, time elapse: 0:00:03
2022-08-29 14:53:57.570 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component upload_0, time elapse: 0:00:04
2022-0

 UPLOADING:||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.00%

2022-08-29 14:53:59.743 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:83 - Job id is 202208291453596053400
2022-08-29 14:53:59.749 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:98 - Job is still waiting, time elapse: 0:00:00
2022-08-29 14:54:00.761 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:98 - Job is still waiting, time elapse: 0:00:01
2022-08-29 14:54:01.776 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:125 -
2022-08-29 14:54:01.777 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component upload_0, time elapse: 0:00:02
2022-08-29 14:54:02.792 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component upload_0, time elapse: 0:00:03
2022-08-29 14:54:03.805 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component upload_0, time elapse: 0:00:04
2022-0

## Use Data with Meta to Run a Modeling Task

When data is uploaded with meta, the fields specified in the meta should not be set again in the `DataTransform` component configuration, because `DataTransform` uses the meta fields to process the data. Please refer to the [documentation](https://fate.readthedocs.io/en/latest/federatedml_component/data_transform/#param) of the `DataTransform` component for more details.

In [71]:
from pipeline.component import Reader, DataTransform, Intersection, HeteroSecureBoost, Evaluation
from pipeline.interface import Data

Make a pipeline instance:

- initiator: 
    * role: guest
    * party: 9999
- roles:
    * guest: 9999
    * host: 10000

In [72]:
pipeline = PipeLine() \
        .set_initiator(role='guest', party_id=9999) \
        .set_roles(guest=9999, host=10000)

Define `Reader` to load data

In [73]:
reader_0 = Reader(name="reader_0")
# set guest parameter
reader_0.get_party_instance(role='guest', party_id=9999).component_param(
    table=breast_hetero_guest)
# set host parameter
reader_0.get_party_instance(role='host', party_id=10000).component_param(
    table=breast_hetero_host)

Add a `DataTransform` component to parse raw data into Data Instances. As shown above, the meta was already set when the data was uploaded, so the corresponding parameters are not set again in `DataTransform`.

In [74]:
data_transform_0 = DataTransform(name="data_transform_0")
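When adapting an older script that passed parsing options to `DataTransform`, it is easy to leave one of them in place alongside the uploaded meta. The sketch below lists the fields that appear in both places so they can be removed from the component configuration. `repeated_meta_fields` is a hypothetical helper, not part of FATE-Client, and the keyword arguments in the example are illustrative.

```python
def repeated_meta_fields(meta: dict, transform_kwargs: dict) -> list:
    """Return the keys present in both the uploaded meta and the
    keyword arguments intended for DataTransform, sorted by name.

    Hypothetical helper for illustration.
    """
    return sorted(set(meta) & set(transform_kwargs))


guest_meta = {"delimiter": ",", "with_label": True, "label_name": "y",
              "input_format": "dense", "data_type": "float64"}

# Illustrative arguments carried over from a script that did not use meta.
old_kwargs = {"name": "data_transform_0", "with_label": True, "output_format": "dense"}

print(repeated_meta_fields(guest_meta, old_kwargs))  # ['with_label']
```

An empty result means that none of the meta fields are being set a second time.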

Add other components

In [75]:
intersect_0 = Intersection(name="intersect_0")

hetero_secureboost_0 = HeteroSecureBoost(name="hetero_secureboost_0",
                                         num_trees=5,
                                         bin_num=16,
                                         task_type="classification",
                                         objective_param={"objective": "cross_entropy"},
                                         encrypt_param={"method": "paillier"},
                                         tree_param={"max_depth": 3})

evaluation_0 = Evaluation(name="evaluation_0", eval_type="binary")

Add components to pipeline, in order of execution:

- data_transform_0 consumes reader_0's output data
- intersect_0 consumes data_transform_0's output data
- hetero_secureboost_0 consumes intersect_0's output data
- evaluation_0 consumes hetero_secureboost_0's prediction result on training data

Then compile our pipeline to make it ready for submission.

In [76]:
pipeline.add_component(reader_0)
pipeline.add_component(data_transform_0, data=Data(data=reader_0.output.data))
pipeline.add_component(intersect_0, data=Data(data=data_transform_0.output.data))
pipeline.add_component(hetero_secureboost_0, data=Data(train_data=intersect_0.output.data))
pipeline.add_component(evaluation_0, data=Data(data=hetero_secureboost_0.output.data))
pipeline.compile();
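The data dependencies declared through `add_component` form a simple chain. As an illustration only (FATE resolves the execution order itself), the sketch below writes the same dependencies as a plain dictionary and derives an order with the standard-library `graphlib` module, available from Python 3.9. The `depends_on` mapping is a hand-written restatement of the calls above, not something read from the pipeline object.

```python
from graphlib import TopologicalSorter

# Each component maps to the set of components whose output it consumes.
depends_on = {
    "data_transform_0": {"reader_0"},
    "intersect_0": {"data_transform_0"},
    "hetero_secureboost_0": {"intersect_0"},
    "evaluation_0": {"hetero_secureboost_0"},
}

# static_order() yields every node after all of its dependencies.
execution_order = list(TopologicalSorter(depends_on).static_order())
print(execution_order)
# ['reader_0', 'data_transform_0', 'intersect_0', 'hetero_secureboost_0', 'evaluation_0']
```

Because every component here has exactly one upstream component, the order is unique and matches the order in which the components were added.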

Now, submit (`fit`) our pipeline:

In [77]:
pipeline.fit()

2022-08-29 14:54:52.035 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:83 - Job id is 202208291454509303160
2022-08-29 14:54:52.041 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:98 - Job is still waiting, time elapse: 0:00:00
2022-08-29 14:54:53.056 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:125 -
2022-08-29 14:54:53.058 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component reader_0, time elapse: 0:00:01
2022-08-29 14:54:54.075 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component reader_0, time elapse: 0:00:02
2022-08-29 14:54:55.094 | I

2022-08-29 14:55:28.862 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component hetero_secureboost_0, time elapse: 0:00:36
2022-08-29 14:55:29.878 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component hetero_secureboost_0, time elapse: 0:00:37
2022-08-29 14:55:30.899 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component hetero_secureboost_0, time elapse: 0:00:38
2022-08-29 14:55:31.920 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component hetero_secureboost_0, time elapse: 0:00:39
2022-08-29 14:55:32.944 | INFO     | pipeline.utils.invoker.job_submitter:

2022-08-29 14:56:06.708 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component hetero_secureboost_0, time elapse: 0:01:14
2022-08-29 14:56:07.732 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component hetero_secureboost_0, time elapse: 0:01:15
2022-08-29 14:56:08.749 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component hetero_secureboost_0, time elapse: 0:01:16
2022-08-29 14:56:09.804 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component hetero_secureboost_0, time elapse: 0:01:17
2022-08-29 14:56:10.859 | INFO     | pipeline.utils.invoker.job_submitter:

For more examples of using pipeline to submit jobs, please refer to the [PipeLine demos](https://github.com/FederatedAI/FATE/tree/master/examples/pipeline/demo).