# Examples of AutoML workflows using google-cloud-pipeline-components

This notebook shows preliminary examples of how to build pipelines using new components for AI Platform (Unified) services. These components are based on the new [high-level AI Platform (Unified) SDK](https://cloud.google.com/ai-platform-unified/docs/start/client-libraries#client_libraries), available now in Preview.

More documentation on these components will be available soon. 

For this demo, ensure the following APIs are enabled:
- [Cloud Build](https://console.cloud.google.com/apis/library/cloudbuild.googleapis.com?q=Cloudbuild)
- [Container Registry](https://console.cloud.google.com/apis/library/containerregistry.googleapis.com?q=container%20registry)

## Setup

Before you run this notebook, ensure that your Google Cloud user account and project have been granted access to the Managed Pipelines Experimental release. To request access, fill out this [form](http://go/cloud-mlpipelines-signup) and let your account representative know that you have done so.

This notebook is intended to run on either of:
* [AI Platform Notebooks](https://cloud.google.com/ai-platform-notebooks). See the "AI Platform Notebooks" section in the Experimental [User Guide](https://docs.google.com/document/d/1JXtowHwppgyghnj1N1CT73hwD1caKtWkLcm2_0qGBoI/edit?usp=sharing) for more detail on creating a notebook server instance.
* [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb)


**To run this notebook on AI Platform Notebooks**, open the **File** menu and select "Download .ipynb". Then upload the downloaded notebook from your local machine to AI Platform Notebooks. (In the left panel of AI Platform Notebooks, the upload button is an icon of an arrow pointing up.)

We'll first install some libraries and set up some variables.


Set `gcloud` to use your project.  **Edit the following cell before running it**.

In [None]:
PROJECT_ID = 'your-project-id'  # <---CHANGE THIS

In [None]:
!gcloud config set project {PROJECT_ID}

If you're running this notebook on Colab, authenticate with your user account:

In [None]:
import sys
if 'google.colab' in sys.modules:
  from google.colab import auth
  auth.authenticate_user()

-----------------

**If you're on AI Platform Notebooks**, authenticate with Google Cloud before running the next section, by running
```sh
gcloud auth login
```
**in the Terminal window** (which you can open via **File** > **New** in the menu). You only need to do this once per notebook instance.

### Install the KFP SDK and AI Platform Pipelines client library

For Managed Pipelines Experimental, you'll need to download special versions of the KFP SDK and the AI Platform client library.

In [None]:
!gsutil cp gs://cloud-aiplatform-pipelines/releases/latest/kfp-1.5.0rc5.tar.gz .
!gsutil cp gs://cloud-aiplatform-pipelines/releases/latest/aiplatform_pipelines_client-0.1.0.caip20210415-py3-none-any.whl .


Then, install the libraries and restart the kernel.

In [None]:
if 'google.colab' in sys.modules:
  USER_FLAG = ''
else:
  USER_FLAG = '--user'

In [None]:
!python3 -m pip install {USER_FLAG} kfp-1.5.0rc5.tar.gz aiplatform_pipelines_client-0.1.0.caip20210415-py3-none-any.whl google-cloud-aiplatform --upgrade

In [None]:

!python3 -m pip install {USER_FLAG} "git+https://github.com/kubeflow/pipelines.git#egg=google-cloud-pipeline-components&subdirectory=components/google-cloud"

In [None]:
# Automatically restart kernel after installs
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

If you're on Colab, re-authenticate after the kernel restart. **Edit the following cell for your project ID before running it.**

In [None]:
import sys
if 'google.colab' in sys.modules:
  PROJECT_ID = 'your-project-id'  # <---CHANGE THIS
  !gcloud config set project {PROJECT_ID}
  from google.colab import auth
  auth.authenticate_user()
  USER_FLAG = ''

The KFP version should be >= 1.5.



In [None]:
# Check the KFP version
!python3 -c "import kfp; print('KFP version: {}'.format(kfp.__version__))"
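
The same check can be made in Python so that a later cell fails fast on an old install. This is a minimal sketch, assuming version strings shaped like `1.5.0rc5`; the helper name `version_at_least` is hypothetical and not part of KFP, and pre-release suffixes such as `rc5` are deliberately ignored.

```python
import re

def version_at_least(version: str, minimum: tuple) -> bool:
    """Return True if the leading numeric part of `version` is >= `minimum`."""
    # Keep only the leading dotted digits, so '1.5.0rc5' yields (1, 5, 0).
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad both tuples to the same length so (1, 5) and (1, 5, 0) compare as equal.
    width = max(len(parts), len(minimum))
    parts += (0,) * (width - len(parts))
    minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return parts >= minimum

print(version_at_least("1.5.0rc5", (1, 5)))  # True
print(version_at_least("1.4.1", (1, 5)))     # False
```

In the notebook, this could be called as `version_at_least(kfp.__version__, (1, 5))` after importing `kfp`.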

### Set some variables and do some imports

**Before you run the next cell**, **edit it** to set variables for your project.  See the "Before you begin" section of the User Guide for information on creating your API key.  For `BUCKET_NAME`, enter the name of a Cloud Storage (GCS) bucket in your project.  Don't include the `gs://` prefix.

In [None]:
PATH=%env PATH
%env PATH={PATH}:/home/jupyter/.local/bin

# Required Parameters
USER = 'YOUR_USER_NAME' # <---CHANGE THIS
BUCKET_NAME = 'YOUR_BUCKET_NAME'  # <---CHANGE THIS
PIPELINE_ROOT = 'gs://{}/pipeline_root/{}'.format(BUCKET_NAME, USER)

PROJECT_ID = 'YOUR_PROJECT_ID'  # <---CHANGE THIS
REGION = 'us-central1'
API_KEY = 'YOUR_API_KEY'  # <---CHANGE THIS

print('PIPELINE_ROOT: {}'.format(PIPELINE_ROOT))
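
Because `BUCKET_NAME` must not include the `gs://` prefix, an accidental prefix would produce a malformed `PIPELINE_ROOT`. The following sketch shows one way to guard against that; `make_pipeline_root` is a hypothetical helper, not part of any library used here.

```python
def make_pipeline_root(bucket_name: str, user: str) -> str:
    """Build a gs:// pipeline root from a bucket name and a user name."""
    # Tolerate an accidental 'gs://' prefix or trailing slash on the bucket name.
    bucket = bucket_name.strip()
    if bucket.startswith("gs://"):
        bucket = bucket[len("gs://"):]
    bucket = bucket.strip("/")
    # Anything that still contains a slash is a path, not a bare bucket name.
    if not bucket or "/" in bucket:
        raise ValueError(f"Expected a bare bucket name, got {bucket_name!r}")
    if not user:
        raise ValueError("user must not be empty")
    return f"gs://{bucket}/pipeline_root/{user}"

print(make_pipeline_root("gs://my-bucket/", "alice"))  # gs://my-bucket/pipeline_root/alice
```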

## Create a container for the component
Note: Soon, a prebuilt container will be available and this step will not be necessary.

### Create Cloud Build YAML
The build uses the Kaniko cache to speed up repeated builds.

In [None]:
CONTAINER_ARTIFACTS_DIR="demo-container-artifacts"
!mkdir -p {CONTAINER_ARTIFACTS_DIR}

In [None]:
# You can add a faster build machine using: 
# options:
#   machineType: 'E2_HIGHCPU_8'

cloudbuild_yaml=f"""steps:
- name: 'gcr.io/kaniko-project/executor:latest'
  args: 
  - --destination=gcr.io/$PROJECT_ID/test-custom-container
  - --cache=true
  - --cache-ttl=99h
"""

CONTAINER_GCR_URI=f"gcr.io/{PROJECT_ID}/test-custom-container" 
with open(f"{CONTAINER_ARTIFACTS_DIR}/cloudbuild.yaml", 'w') as fp:
    fp.write(cloudbuild_yaml)
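
The comment in the cell above mentions an optional `machineType` build option. As a sketch of how the config could be generated with that option switched on or off, the hypothetical helper `make_cloudbuild_yaml` below assembles the same YAML from parameters; the function name and its arguments are illustrative only.

```python
def make_cloudbuild_yaml(image_name: str, machine_type: str = "", use_cache: bool = True) -> str:
    """Return a Cloud Build config that builds and pushes an image with Kaniko."""
    lines = [
        "steps:",
        "- name: 'gcr.io/kaniko-project/executor:latest'",
        "  args:",
        # $PROJECT_ID is left untouched: Cloud Build substitutes it at build time.
        f"  - --destination=gcr.io/$PROJECT_ID/{image_name}",
        f"  - --cache={'true' if use_cache else 'false'}",
        "  - --cache-ttl=99h",
    ]
    # Only emit the options block when a machine type was requested.
    if machine_type:
        lines += ["options:", f"  machineType: '{machine_type}'"]
    return "\n".join(lines) + "\n"

print(make_cloudbuild_yaml("test-custom-container", machine_type="E2_HIGHCPU_8"))
```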

### Write Dockerfile

In [None]:
%%writefile {CONTAINER_ARTIFACTS_DIR}/Dockerfile

# Base image for this container
FROM gcr.io/google-appengine/python:latest

WORKDIR /root

# Upgrade pip to latest
RUN pip3 install --upgrade pip

# Installs additional packages
RUN pip3 install google-cloud-aiplatform --upgrade

RUN pip3 install "git+https://github.com/kubeflow/pipelines.git#egg=google-cloud-pipeline-components&subdirectory=components/google-cloud"


ENTRYPOINT ["python3","-m","google_cloud_pipeline_components.aiplatform.remote_runner"] 

### Build Container

In [None]:
!gcloud builds submit --config {CONTAINER_ARTIFACTS_DIR}/cloudbuild.yaml {CONTAINER_ARTIFACTS_DIR}

## AutoML image classification

Create a managed image dataset from a CSV file and train a model on it using AutoML image training.


In [None]:
CONTAINER_GCR_URI

Define the pipeline:

In [None]:
import kfp
from google.cloud import aiplatform
from google_cloud_pipeline_components import aiplatform as gcc_aip
from aiplatform.pipelines import client

gcc_aip.utils.DEFAULT_CONTAINER_IMAGE=CONTAINER_GCR_URI

@kfp.dsl.pipeline(name='automl-image-training-v2')
def pipeline():
  ds_op = gcc_aip.ImageDatasetCreateOp(
      project=PROJECT_ID,
      display_name='flowers',
      gcs_source='gs://cloud-samples-data/vision/automl_classification/flowers/all_data_v2.csv',
      import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification,)

  training_job_run_op = gcc_aip.AutoMLImageTrainingJobRunOp(
      project=PROJECT_ID,
      display_name='train-flowers-automl-mbsdk-1',
      prediction_type='classification',
      model_type="CLOUD",
      base_model=None,
      dataset=ds_op.outputs['dataset'],
      model_display_name='flowers-classification-model-mbsdk',
      training_fraction_split=0.6,
      validation_fraction_split=0.2,
      test_fraction_split=0.2,
      budget_milli_node_hours=8000,
  )
  endpoint_op = gcc_aip.ModelDeployOp(
      project=PROJECT_ID,
      model=training_job_run_op.outputs['model'])
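
Two of the training arguments are easy to get wrong by hand: the three fraction splits and the budget in milli node hours. The sketch below assumes the fractions are meant to sum to 1, as they do in the cell above, and uses the unit conversion 1 node hour = 1000 milli node hours. Both helper names are hypothetical.

```python
import math

def check_fraction_splits(training: float, validation: float, test: float) -> None:
    """Raise ValueError unless each fraction is in [0, 1] and they sum to 1."""
    for name, value in (("training", training), ("validation", validation), ("test", test)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} fraction {value} is outside [0, 1]")
    total = training + validation + test
    # Compare with a tolerance, since sums of binary floats can be slightly off.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"fractions sum to {total}, expected 1.0")

def node_hours_to_milli(node_hours: float) -> int:
    """Convert node hours to milli node hours (1 node hour = 1000 milli node hours)."""
    return int(round(node_hours * 1000))

check_fraction_splits(0.6, 0.2, 0.2)  # passes silently
print(node_hours_to_milli(8))         # 8000
```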


Compile your pipeline, and then run it.

In [None]:
from kfp.v2 import compiler
compiler.Compiler().compile(pipeline_func=pipeline,
        package_path='image_classif_pipeline.json')

In [None]:
api_client = client.Client(project_id=PROJECT_ID, region='us-central1',
                          api_key=API_KEY)

response = api_client.create_run_from_job_spec('image_classif_pipeline.json', pipeline_root=PIPELINE_ROOT)
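
Before submitting a run, it can help to confirm that the compile step actually produced a readable job spec. This is a small sketch using only the standard library; `load_job_spec` is a hypothetical helper, and it assumes nothing about the spec beyond it being a JSON object.

```python
import json
from pathlib import Path

def load_job_spec(path: str) -> dict:
    """Load a compiled pipeline file and confirm that it holds a JSON object."""
    spec_path = Path(path)
    if not spec_path.is_file():
        raise FileNotFoundError(f"Compiled pipeline not found: {spec_path}")
    with spec_path.open() as fp:
        spec = json.load(fp)
    # A job spec is expected to be a JSON object, not a list or a scalar.
    if not isinstance(spec, dict):
        raise ValueError(f"{spec_path} does not contain a JSON object")
    return spec
```

For example, `sorted(load_job_spec('image_classif_pipeline.json'))` lists the top-level keys of the compiled file.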

## AutoML Tabular Regression

Define and run an AutoML Tabular regression pipeline on the California housing sample data.

In [None]:
TRAIN_FILE_NAME = 'california_housing_train.csv'
# sample_data/ is preinstalled on Colab; on other hosts, copy the CSV there first.
!gsutil cp sample_data/{TRAIN_FILE_NAME} {PIPELINE_ROOT}/data/

gcs_csv_path = f'{PIPELINE_ROOT}/data/{TRAIN_FILE_NAME}'

Define the pipeline:

In [None]:
import kfp
from kfp.v2 import compiler

from google.cloud import aiplatform
from google_cloud_pipeline_components import aiplatform as gcc_aip
from aiplatform.pipelines import client

gcc_aip.utils.DEFAULT_CONTAINER_IMAGE=CONTAINER_GCR_URI

@kfp.dsl.pipeline(name='automl-tab-training-v2')
def pipeline():
  dataset_create_op = gcc_aip.TabularDatasetCreateOp(
      project=PROJECT_ID, 
      display_name='housing',
      gcs_source=gcs_csv_path)

  training_op = gcc_aip.AutoMLTabularTrainingJobRunOp(
      project=PROJECT_ID,
      display_name='train-housing-automl_1',
      optimization_prediction_type='regression',
      optimization_objective='minimize-rmse',    
      column_transformations=[
          {"numeric": {"column_name": "longitude"}},
          {"numeric": {"column_name": "latitude"}},
          {"numeric": {"column_name": "housing_median_age"}},
          {"numeric": {"column_name": "total_rooms"}},
          {"numeric": {"column_name": "total_bedrooms"}},
          {"numeric": {"column_name": "population"}},
          {"numeric": {"column_name": "households"}},
          {"numeric": {"column_name": "median_income"}},
      ],
      dataset=dataset_create_op.outputs['dataset'],
      # Predict the house value; the target should not also be listed in column_transformations.
      target_column="median_house_value"
  )

  deploy_op = gcc_aip.ModelDeployOp(
      model=training_op.outputs['model'],
      project=PROJECT_ID,
      machine_type='n1-standard-4')
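
The `column_transformations` list can also be generated from the CSV column names instead of being typed out. The sketch below assumes every feature is numeric and that `median_house_value` is the column to predict; `numeric_transformations` and `CSV_COLUMNS` are hypothetical names introduced for illustration.

```python
def numeric_transformations(columns, target_column):
    """Build a column_transformations list that marks every feature as numeric."""
    # Skip the target: it is the value to predict, not an input feature.
    return [
        {"numeric": {"column_name": name}}
        for name in columns
        if name != target_column
    ]

# Column names of california_housing_train.csv.
CSV_COLUMNS = [
    "longitude", "latitude", "housing_median_age", "total_rooms",
    "total_bedrooms", "population", "households", "median_income",
    "median_house_value",
]

transformations = numeric_transformations(CSV_COLUMNS, target_column="median_house_value")
print(len(transformations))  # 8
print(transformations[0])    # {'numeric': {'column_name': 'longitude'}}
```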



Compile your pipeline, and then run it.

In [None]:
compiler.Compiler().compile(pipeline_func=pipeline,
        package_path='tab_classif_pipeline.json')

In [None]:
api_client = client.Client(project_id=PROJECT_ID, region='us-central1',
                          api_key=API_KEY)

response = api_client.create_run_from_job_spec('tab_classif_pipeline.json', 
                                               pipeline_root=PIPELINE_ROOT)

## AutoML Text Classification

Define and run an AutoML Text Classification pipeline.

In [None]:
import kfp
from kfp.v2 import compiler

from google.cloud import aiplatform
from google_cloud_pipeline_components import aiplatform as gcc_aip
from google_cloud_pipeline_components.aiplatform import TextDatasetCreateOp, AutoMLTextTrainingJobRunOp, ModelDeployOp
from aiplatform.pipelines import client

import uuid

IMPORT_FILE = "gs://cloud-ml-data/NL-classification/happiness.csv"
gcc_aip.utils.DEFAULT_CONTAINER_IMAGE=CONTAINER_GCR_URI

@kfp.dsl.pipeline(name='automl-text-classification' + str(uuid.uuid4()))
def pipeline(
    project: str=PROJECT_ID,
    import_file: str=IMPORT_FILE):
    
    dataset_create_task = TextDatasetCreateOp(
            display_name="happydb",
            gcs_source=import_file,
            import_schema_uri=aiplatform.schema.dataset.ioformat.text.multi_label_classification,
            project=project
        )
    
    training_run_task = AutoMLTextTrainingJobRunOp(
        dataset=dataset_create_task.outputs['dataset'],
        display_name="train-happydb-automl_1",
        prediction_type="classification",
        multi_label=True,
        training_fraction_split=0.6,
        validation_fraction_split=0.2,
        test_fraction_split=0.2,
        model_display_name="happy-model",
        project=project
    )
    
    model_deploy_op = ModelDeployOp(
        model=training_run_task.outputs['model'],
        project=project
    )
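
The pipeline above appends a full UUID to its name so that each definition is unique. As a sketch of a shorter alternative, the hypothetical helper `unique_pipeline_name` below joins a prefix and a short random hex suffix with a hyphen. Restricting the prefix to lowercase letters, digits, and hyphens is a conservative assumption made here, not a documented naming rule.

```python
import re
import uuid

def unique_pipeline_name(prefix: str, suffix_length: int = 8) -> str:
    """Return `prefix`, a hyphen, and a short random hex suffix."""
    # Assumed constraint: lowercase alphanumeric words joined by single hyphens.
    if not re.fullmatch(r"[a-z0-9]+(?:-[a-z0-9]+)*", prefix):
        raise ValueError(f"Prefix should be lowercase words joined by hyphens: {prefix!r}")
    # uuid4().hex is 32 hex characters long, so the suffix cannot exceed that.
    if not 1 <= suffix_length <= 32:
        raise ValueError("suffix_length must be between 1 and 32")
    return f"{prefix}-{uuid.uuid4().hex[:suffix_length]}"

print(unique_pipeline_name("automl-text-classification"))
```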


Compile your pipeline, and then run it.

In [None]:
# compile() writes the job spec to package_path; do not rebind `pipeline` to its return value.
compiler.Compiler().compile(pipeline_func=pipeline,
        package_path='text_classif_pipeline.json')


In [None]:
api_client = client.Client(project_id=PROJECT_ID, region='us-central1',
                          api_key=API_KEY)

response = api_client.create_run_from_job_spec('text_classif_pipeline.json',
                                    pipeline_root=PIPELINE_ROOT)

-----------------------------
Copyright 2021 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.