# Managed Pipelines Experimental: pipeline control flow with the KFP SDK

This notebook shows how to construct pipelines with loops and conditionals using the KFP SDK.

## Setup

Before you run this notebook, ensure that your Google Cloud user account and project have been granted access to Managed Pipelines Experimental. To request access, fill out this [form](http://go/cloud-mlpipelines-signup) and let your account representative know that you have requested access.

This notebook is intended to be run on one of the following:
* [AI Platform Notebooks](https://cloud.google.com/ai-platform-notebooks). See the "AI Platform Notebooks" section in the Experimental [User Guide](https://docs.google.com/document/d/1JXtowHwppgyghnj1N1CT73hwD1caKtWkLcm2_0qGBoI/edit?usp=sharing) for more detail on creating a notebook server instance.
* [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb)


**To run this notebook on AI Platform Notebooks**, click the **File** menu and select "Download .ipynb". Then upload the notebook from your local machine to AI Platform Notebooks. (In the AI Platform Notebooks left panel, the upload button is an icon of an arrow pointing up.)

We'll first install some libraries and set up some variables.


Set `gcloud` to use your project.  **Edit the following cell before running it**.

In [None]:
PROJECT_ID = 'your-project-id'  # <---CHANGE THIS

In [None]:
!gcloud config set project {PROJECT_ID}

If you're running this notebook on Colab, authenticate with your user account:

In [None]:
import sys
if 'google.colab' in sys.modules:
  from google.colab import auth
  auth.authenticate_user()

-----------------

**If you're on AI Platform Notebooks**, authenticate with Google Cloud before running the next section, by running
```sh
gcloud auth login
```
**in the Terminal window** (which you can open via **File** > **New** in the menu). You only need to do this once per notebook instance.

### Install the KFP SDK and AI Platform Pipelines client library

For Managed Pipelines Experimental, you'll need to download special versions of the KFP SDK and the AI Platform client library.

In [None]:
!gsutil cp gs://cloud-aiplatform-pipelines/releases/latest/kfp-1.5.0rc5.tar.gz .
!gsutil cp gs://cloud-aiplatform-pipelines/releases/latest/aiplatform_pipelines_client-0.1.0.caip20210415-py3-none-any.whl .


Then, install the libraries and restart the kernel as necessary.

In [None]:
if 'google.colab' in sys.modules:
  USER_FLAG = ''
else:
  USER_FLAG = '--user'

In [None]:
!python3 -m pip install {USER_FLAG} kfp-1.5.0rc5.tar.gz --upgrade
!python3 -m pip install {USER_FLAG} aiplatform_pipelines_client-0.1.0.caip20210415-py3-none-any.whl  --upgrade

In [None]:
if 'google.colab' not in sys.modules:
  # Automatically restart kernel after installs
  import IPython
  app = IPython.Application.instance()
  app.kernel.do_shutdown(True)

After the kernel restarts, check that the installed KFP version is 1.5 or later.



In [None]:
# Check the KFP version
!python3 -c "import kfp; print('KFP version: {}'.format(kfp.__version__))"
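If you'd rather have the check fail loudly than read the printed version by eye, a small helper can compare the version string against the minimum. This is only a sketch: `meets_minimum` and `MIN_KFP` are hypothetical names, not part of the KFP SDK, and the comparison looks only at the major and minor numbers, so a release candidate such as `1.5.0rc5` is treated as 1.5.

```python
import re

MIN_KFP = (1, 5)  # minimum (major, minor) stated in this notebook


def meets_minimum(version, minimum=MIN_KFP):
  """Return True if the major.minor part of `version` is at least `minimum`."""
  match = re.match(r'(\d+)\.(\d+)', version)
  if match is None:
    raise ValueError('Unrecognized version string: {!r}'.format(version))
  return (int(match.group(1)), int(match.group(2))) >= minimum


# The version installed above, and an older one for comparison.
print(meets_minimum('1.5.0rc5'))
print(meets_minimum('1.4.1'))
```

In the notebook itself, you would pass `kfp.__version__` instead of a literal string.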

### Set some variables and do some imports

**Before you run the next cell**, **edit it** to set variables for your project.  See the "Before you begin" section of the User Guide for information on creating your API key.  For `BUCKET_NAME`, enter the name of a Cloud Storage (GCS) bucket in your project.  Don't include the `gs://` prefix.

In [None]:
PATH=%env PATH
%env PATH={PATH}:/home/jupyter/.local/bin

# Required Parameters
USER = 'YOUR_USER_NAME' # <---CHANGE THIS
BUCKET_NAME = 'YOUR_BUCKET_NAME'  # <---CHANGE THIS
PIPELINE_ROOT = 'gs://{}/pipeline_root/{}'.format(BUCKET_NAME, USER)

PROJECT_ID = 'YOUR_PROJECT_ID'  # <---CHANGE THIS
REGION = 'us-central1'
API_KEY = 'YOUR_API_KEY'  # <---CHANGE THIS

print('PIPELINE_ROOT: {}'.format(PIPELINE_ROOT))
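An easy mistake in the cell above is to paste the bucket name with its `gs://` prefix, which would produce a malformed `PIPELINE_ROOT`. The sketch below shows one way to guard against that. `build_pipeline_root` is a hypothetical helper written for this illustration, not part of any Google Cloud library, and the bucket and user names are placeholders.

```python
def build_pipeline_root(bucket_name, user):
  """Build the pipeline root URI, tolerating an accidental 'gs://' prefix."""
  bucket = bucket_name.strip()
  if bucket.startswith('gs://'):
    bucket = bucket[len('gs://'):]
  bucket = bucket.strip('/')
  # A bare bucket name is non-empty and contains no path separators.
  if not bucket or '/' in bucket:
    raise ValueError('Expected a bare bucket name, got {!r}'.format(bucket_name))
  return 'gs://{}/pipeline_root/{}'.format(bucket, user)


# Both spellings yield the same URI.
print(build_pipeline_root('my-bucket', 'alice'))
print(build_pipeline_root('gs://my-bucket/', 'alice'))
```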

## Control flow: nested loops and conditions


## Define some components

Here, we're defining three components: one that generates a stringified JSON list, one that prints a string, and a 'coin flip' component that outputs `heads` or `tails`. 



In [None]:
from kfp import components
from kfp import dsl
from kfp.v2.dsl import component
import kfp.v2.compiler as compiler


@component
def args_generator_op() -> str:
  import json
  return json.dumps(
      [{'A_a': '1', 'B_b': '2'}, {'A_a': '10', 'B_b': '20'}], sort_keys=True)


@component
def print_op(msg: str):
  print(msg)


@component
def flip_coin_op() -> str:
  """Flip a coin and output heads or tails randomly."""
  import random
  result = 'heads' if random.randint(0, 1) == 0 else 'tails'
  return result
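
Because each component body is ordinary Python, you can sanity-check its logic locally before running it in a pipeline. The sketch below copies the body of `args_generator_op` into a plain function (`generate_args` is a name chosen for this illustration) and shows that the "stringified JSON list" decodes back into a list of dictionaries with the `A_a` and `B_b` keys.

```python
import json


def generate_args():
  """Same body as args_generator_op, without the @component decorator."""
  return json.dumps(
      [{'A_a': '1', 'B_b': '2'}, {'A_a': '10', 'B_b': '20'}], sort_keys=True)


serialized = generate_args()
print(serialized)  # a single JSON string

# Decoding the string recovers the individual items.
items = json.loads(serialized)
for item in items:
  print(item['A_a'], item['B_b'])
```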


## Define a pipeline using the components you built

Now, we'll define a pipeline that uses nested conditionals (`dsl.Condition`) and loops (`dsl.ParallelFor`).

The output of one pipeline step can be used as input to a conditional or a loop: `flip1.output` is tested in each `dsl.Condition`, and the output of `args_generator_op()` is passed to a `dsl.ParallelFor()` call.

In [None]:
@dsl.pipeline(
    name='pipeline-with-loops-and-conditions',
    pipeline_root=PIPELINE_ROOT,
)
def my_pipeline(text_parameter: str = 'Hello world!'):
  flip1 = flip_coin_op()

  with dsl.Condition(flip1.output != 'no-such-result', name="alwaystrue1"): # always true

    args_generator = args_generator_op()
    with dsl.ParallelFor(args_generator.output) as item:
      print_op(text_parameter)
      print_op(item)

      with dsl.Condition(flip1.output == 'heads', name="heads"):
        print_op(item.A_a)

      with dsl.Condition(flip1.output == 'tails', name="tails"):
        print_op(item.B_b)

      with dsl.Condition(flip1.output != 'no-such-result', name="alwaystrue2"): # always true
        # Inner loop over a static list; a distinct name avoids shadowing the outer `item`.
        with dsl.ParallelFor(['a', 'b', 'c']) as inner_item:
          print_op(inner_item)
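
To see which messages the pipeline above is expected to print for a given coin flip, it can help to trace the same branching in plain Python. The sketch below is only an analogue, not how KFP executes the pipeline: `simulate` is a hypothetical function, the loop runs sequentially whereas `dsl.ParallelFor` iterations may run in parallel, and the assumption that a loop item is printed as a JSON string is mine.

```python
import json


def simulate(flip_result, text_parameter='Hello world!'):
  """Collect the messages the print steps would receive for one coin flip."""
  messages = []
  if flip_result != 'no-such-result':  # always true for 'heads' or 'tails'
    items = [{'A_a': '1', 'B_b': '2'}, {'A_a': '10', 'B_b': '20'}]
    for item in items:  # stands in for the outer dsl.ParallelFor
      messages.append(text_parameter)
      messages.append(json.dumps(item, sort_keys=True))
      if flip_result == 'heads':
        messages.append(item['A_a'])
      if flip_result == 'tails':
        messages.append(item['B_b'])
      if flip_result != 'no-such-result':  # always true
        for inner_item in ['a', 'b', 'c']:  # stands in for the inner loop
          messages.append(inner_item)
  return messages


for message in simulate('heads'):
  print(message)
```

Each outer item contributes six messages (the text parameter, the item, one field chosen by the flip, and the three inner-loop values), so a run produces twelve print steps in total.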

Compile the pipeline:

In [None]:
compiler.Compiler().compile(pipeline_func=my_pipeline, 
                            package_path='loops-and-conditions.json')
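
The compiler writes the job spec as a JSON file, so you can inspect it with the standard library before submitting it. The sketch below lists the top-level keys of the file. `top_level_keys` is a hypothetical helper, and it assumes only that the file holds a JSON object; it makes no claim about which keys the compiler emits.

```python
import json
import os


def top_level_keys(path):
  """Return the sorted top-level keys of a JSON object stored at `path`."""
  with open(path) as f:
    spec = json.load(f)
  if not isinstance(spec, dict):
    raise ValueError('Expected a JSON object at the top level of {}'.format(path))
  return sorted(spec)


JOB_SPEC_PATH = 'loops-and-conditions.json'
# Only inspect the file if the compile step above has produced it.
if os.path.exists(JOB_SPEC_PATH):
  print(top_level_keys(JOB_SPEC_PATH))
```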

Submit the pipeline job:

In [None]:
from aiplatform.pipelines import client

api_client = client.Client(
    project_id=PROJECT_ID, 
    region=REGION, 
    api_key=API_KEY)

response = api_client.create_run_from_job_spec(
          job_spec_path='loops-and-conditions.json')

Click on the generated link above to see your pipeline run in the Console.

You can expand and collapse the nested control structures as the pipeline runs (the graph expands further than shown below), and zoom and pan the DAG.

<a href="https://storage.googleapis.com/amy-jo/images/mp/control_structures2.gif" target="_blank"><img src="https://storage.googleapis.com/amy-jo/images/mp/control_structures2.gif" width="60%"/></a>

After the pipeline has finished running, the graph reflects the runtime structure.

> **Note**: at the time of writing, there is a bug in rendering the DAG for this example, which displays some 'extra' unexecuted nodes in the graph. This will be fixed soon, and it doesn't affect the correct execution of the pipeline.


-----------------------------
Copyright 2021 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.