## Kubeflow Pipelines

This notebook walks through using Kubeflow Pipelines from the Python 3 interpreter (command line) to preprocess, train, tune, and deploy the babyweight model.


### 1. Start Hosted Pipelines and Notebook

To try out this notebook, first launch Kubeflow Hosted Pipelines and an AI Platform Notebooks instance.
Follow the instructions in this [README.md](pipelines/README.md) file.

### 2. Install necessary packages

In [1]:
%pip install --quiet kfp python-dateutil --upgrade

Note: you may need to restart the kernel to use updated packages.


Make sure to *restart the kernel* so that the new packages are picked up (look for the restart button in the ribbon of icons above this notebook).
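
After the restart, it can be useful to confirm that the packages are actually visible to the kernel. The following is a minimal sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is made up for this illustration.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages installed in the cell above.
for name in ("kfp", "python-dateutil"):
    print(name, installed_version(name) or "not installed")
```

If either package is reported as not installed, rerun the install cell and restart the kernel again.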

### 3. Connect to the Hosted Pipelines

Visit https://console.cloud.google.com/ai-platform/pipelines/clusters
and get the hostname for your cluster. You can find it by clicking the Settings icon.
Alternatively, click the Open Pipelines Dashboard link and copy the hostname from the URL.
Then change the settings in the following cell.

In [2]:
# CHANGE THESE
PIPELINES_HOST='505472f5e0115e1a-dot-us-central2.pipelines.googleusercontent.com'
PROJECT='ind-coe'
BUCKET='ai-analytics-solutions-kfpdemo'
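
If you copied the full dashboard URL rather than the bare hostname, a small helper can strip it down. This is only a sketch: `host_from_dashboard_url` is a hypothetical name, and the URL in the example is a placeholder, not a real cluster.

```python
from urllib.parse import urlparse


def host_from_dashboard_url(url):
    """Return the hostname portion of a Pipelines dashboard URL.

    Accepts either a full URL or a bare hostname.
    """
    # urlparse only fills in the hostname when a scheme is present.
    if "://" not in url:
        url = "https://" + url
    return urlparse(url).hostname


# Placeholder URL for illustration only.
example = "https://example-dot-us-central1.pipelines.googleusercontent.com/#/pipelines"
print(host_from_dashboard_url(example))
```

The result can then be assigned to `PIPELINES_HOST`.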

In [3]:
import kfp
import os
client = kfp.Client(host=PIPELINES_HOST)
client.list_pipelines()

{'next_page_token': None,
 'pipelines': [{'created_at': datetime.datetime(2020, 9, 3, 6, 7, 34, tzinfo=tzlocal()),
                'default_version': {'code_source_url': None,
                                    'created_at': datetime.datetime(2020, 9, 3, 6, 7, 34, tzinfo=tzlocal()),
                                    'id': '7130b6dd-a3bd-4328-932c-7d5440f5408e',
                                    'name': '[Demo] XGBoost - Training with '
                                            'confusion matrix',
                                    'package_url': None,
                                    'parameters': [{'name': 'output',
                                                    'value': 'gs://{{kfp-default-bucket}}'},
                                                   {'name': 'project',
                                                    'value': '{{kfp-project-id}}'},
                                                   {'name': 'diagnostic_mode',
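
The raw response is long, so a compact summary is often easier to read. The sketch below assumes the kfp 1.x SDK, where the response object exposes a `pipelines` list whose items have `name` and `id` attributes (as the output above suggests); `summarize_pipelines` is a hypothetical helper, and the stand-in object exists only so the example runs without a cluster.

```python
from types import SimpleNamespace


def summarize_pipelines(response):
    """Return (name, id) pairs from a list_pipelines() response."""
    # `pipelines` can be None when the cluster has no pipelines.
    return [(p.name, p.id) for p in (response.pipelines or [])]


# Stand-in response for illustration; not real cluster data.
fake_response = SimpleNamespace(
    pipelines=[SimpleNamespace(name="demo-pipeline", id="demo-id")]
)
for name, pipeline_id in summarize_pipelines(fake_response):
    print(pipeline_id, name)

# With a live cluster (assumes `client` from the cell above):
# for name, pipeline_id in summarize_pipelines(client.list_pipelines()):
#     print(pipeline_id, name)
```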
                                     

### 4. [Optional] Build Docker containers

I have made my containers public (see https://cloud.google.com/container-registry/docs/access-control for how to do this), so you can simply use my images and skip this step.

In [4]:
%%bash
cd pipelines/containers
bash build_all.sh

Building Docker container in bqtocsv
Creating babyweight-pipeline-bqtocsv:latest from this Dockerfile:
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM gcr.io/ml-pipeline/ml-pipeline-dataflow-tft:latest

RUN mkdir /babyweight

COPY transform.py /babyweight

ENTRYPOINT ["python", "/babyweight/transform.py"]
steps:
    - name: 'gcr.io/cloud-builders/docker'
      args: [ 'build', '-t', 'gcr.io/ind-coe/babyweight-pipeline-bqtocsv:latest', '.' ]
images:
    - 'gcr.io/ind-c

Creating temporary tarball archive of 4 file(s) totalling 5.9 KiB before compression.
Uploading tarball of [.] to [gs://ind-coe_cloudbuild/source/1599117304.893688-f43e047067294b8ab7b2aa2a8c72fa25.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/ind-coe/builds/38d34c43-eacb-4be5-876a-a0549da7e92b].
Logs are available at [https://console.cloud.google.com/cloud-build/builds/38d34c43-eacb-4be5-876a-a0549da7e92b?project=989493500108].
Creating temporary tarball archive of 3 file(s) totalling 1.6 KiB before compression.
Uploading tarball of [.] to [gs://ind-coe_cloudbuild/source/1599117356.772601-4a1e78f1283a43b1828acb2b16763f31.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/ind-coe/builds/cf8b72b1-e258-4abf-89ae-62770a77766e].
Logs are available at [https://console.cloud.google.com/cloud-build/builds/cf8b72b1-e258-4abf-89ae-62770a77766e?project=989493500108].
Creating temporary tarball archive of 4 file(s) totalling 3.1 KiB before compression.
Uploading tarball of [
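
The build log above shows a Cloud Build configuration with one `docker build` step per container. As a rough illustration of that structure, the sketch below reproduces it as a Python dictionary. The helper names are hypothetical, and the `images` entry is assumed to be the same URI as the build tag because the log is truncated at that point.

```python
import json


def image_uri(project, component, tag="latest"):
    """Build the image URI following the naming pattern seen in the log."""
    return f"gcr.io/{project}/babyweight-pipeline-{component}:{tag}"


def cloudbuild_config(project, component, tag="latest"):
    """Return a dict mirroring the steps/images layout shown in the build log."""
    uri = image_uri(project, component, tag)
    return {
        "steps": [
            {
                "name": "gcr.io/cloud-builders/docker",
                "args": ["build", "-t", uri, "."],
            }
        ],
        # Assumption: the pushed image is the one that was just tagged.
        "images": [uri],
    }


# "my-project" is a placeholder; substitute your own project ID.
print(json.dumps(cloudbuild_config("my-project", "bqtocsv"), indent=2))
```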

Check that the Docker images work properly ...

In [5]:
!docker run -t gcr.io/ind-coe/babyweight-pipeline-bqtocsv:latest --project $PROJECT  --bucket $BUCKET --local

shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
Unable to find image 'gcr.io/ind-coe/babyweight-pipeline-bqtocsv:latest' locally
sh: 0: getcwd() failed: No such file or directory
sh: 0: getcwd() failed: No such file or directory
sh: 0: getcwd() failed: No such file or directory
latest: Pulling from ind-coe/babyweight-pipeline-bqtocsv

166ec614: Pulling fs layer
acff238f: Pulling fs layer
acd28e10: Pulling fs layer
3351ecad: Pulling fs layer
ebadcbf7: Pulling fs layer
7fa40814: Pulling fs layer
09e1efd0: Pulling fs layer
f2d83806: Pulling fs layer
50df810a: Pulling fs layer
a423fcde: Pulling fs layer
2974e2b6: Pulling fs layer
d5e8c0c1: Pulling fs layer
d420b0a9: Pulling fs layer
Digest: sha256:53cbfd5e1a452bd39ae34eee9a42995dff13564b429e69fa4126e95e49311f1c
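
The same check can be scripted from Python, which makes it easier to repeat for several images. The sketch below only assembles the command used in the cell above; `docker_run_command` is a hypothetical helper, and the project and bucket values are placeholders.

```python
import shlex  # shlex.join requires Python 3.8+


def docker_run_command(image, project, bucket, local=True):
    """Assemble the `docker run` argument list used to smoke-test an image."""
    cmd = ["docker", "run", "-t", image, "--project", project, "--bucket", bucket]
    if local:
        cmd.append("--local")
    return cmd


cmd = docker_run_command(
    "gcr.io/ind-coe/babyweight-pipeline-bqtocsv:latest",
    "my-project",  # placeholder project ID
    "my-bucket",   # placeholder bucket name
)
print(shlex.join(cmd))

# To actually run it (requires Docker on this machine):
# import subprocess
# subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```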

### 5. Upload and execute pipeline

Upload the pipeline to the Kubeflow Pipelines cluster and start a run.

In [3]:
from pipelines.containers.pipeline import mlp_babyweight

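# Arguments passed to the pipeline function
args = {
    'project': PROJECT,
    'bucket': BUCKET,
}

# To run the full pipeline, starting from preprocessing, use this instead:
# pipeline = client.create_run_from_pipeline_func(mlp_babyweight.preprocess_train_and_deploy, args)

# Run only the train-and-deploy pipeline
os.environ['HPARAM_JOB'] = 'babyweight_200207_231639'  # change to the name of a job from a completed step
pipeline = client.create_run_from_pipeline_func(mlp_babyweight.train_and_deploy, args)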
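
Once the run is submitted, you may want to block until it finishes and inspect its status. The sketch below assumes the kfp 1.x SDK, in which `kfp.Client.wait_for_run_completion(run_id, timeout)` returns a run detail object with a `run.status` field, and the object returned by `create_run_from_pipeline_func` carries a `run_id`. The helper name `wait_for_run` and the one-hour default timeout are choices made for this illustration.

```python
def wait_for_run(client, run_id, timeout_seconds=3600):
    """Block until the run finishes and return its final status string.

    Assumes a kfp 1.x client; the call is expected to raise if the
    timeout expires before the run completes.
    """
    detail = client.wait_for_run_completion(run_id, timeout_seconds)
    return detail.run.status


# With a live cluster (assumes `client` and `pipeline` from the cells above):
# status = wait_for_run(client, pipeline.run_id)
# print(f"Run {pipeline.run_id} finished with status: {status}")
```

Passing the client in as an argument keeps the helper easy to exercise with a stub when no cluster is available.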

In [None]:
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.