### Using a custom base image for a pipeline component
The following example shows how to use a custom Docker image as the base image for a Python-based pipeline component, so that the component can call additional source artifacts (here, a shell script) bundled into the image.


First, let's create a `scripts` directory to hold our shell script.

In [1]:
!mkdir scripts

Next, let's create a simple shell script that echoes back the parameter we pass to it.

In [2]:
%%writefile scripts/hello-robot.sh
#!/bin/bash
echo "Hello, $1.  Nice to meet you"

Writing scripts/hello-robot.sh


In [17]:
!chmod +x scripts/hello-robot.sh

In [2]:
import subprocess
output = subprocess.run(['scripts/hello-robot.sh','robv'],capture_output=True)
output.stdout.decode('utf-8')

'Hello, robv.  Nice to meet you\n'

Now, let's create a simple Dockerfile based on the `google-cloud-cli` image that copies the `scripts` directory into the image.

In [3]:
%%writefile Dockerfile
FROM gcr.io/google.com/cloudsdktool/google-cloud-cli:latest
COPY ./scripts /scripts
RUN chmod -R +x /scripts

Overwriting Dockerfile


Ensure the Artifact Registry API is enabled for your project.

Learn more about [enabling the service](https://cloud.google.com/artifact-registry/docs/enable-service).

In [4]:
! gcloud services enable artifactregistry.googleapis.com

### Create a private Docker repository

Next, create your own Docker repository in the Google Artifact Registry.

1. Run the `gcloud artifacts repositories create` command to create a new Docker repository in your region with the description "Images for custom pipeline components".

2. Run the `gcloud artifacts repositories list` command to verify that your repository was created.

In [5]:
PRIVATE_REPO = "pipeline-components"

!gcloud artifacts repositories create {PRIVATE_REPO} --location='us-central1' --project='gcp-ml-sandbox' --repository-format=docker  --description="Images for custom pipeline components"
!gcloud artifacts repositories list

ERROR: (gcloud.artifacts.repositories.create) ALREADY_EXISTS: the repository already exists
Listing items under project gcp-ml-sandbox, location us-central1.

                                                                                            ARTIFACT_REGISTRY
REPOSITORY                       FORMAT  MODE                 DESCRIPTION                                    LOCATION     LABELS  ENCRYPTION          CREATE_TIME          UPDATE_TIME          SIZE (MB)
custom-container-prediction-sdk  DOCKER  STANDARD_REPOSITORY                                                 us-central1          Google-managed key  2022-09-28T17:33:04  2022-11-11T23:04:06  560.739
docker-ray-repo                  DOCKER  STANDARD_REPOSITORY                                                 us-central1          Google-managed key  2022-02-15T21:39:45  2022-02-15T22:44:23  2985.346
docker-repo                      DOCKER  STANDARD_REPOSITORY                                                 us-centr
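
The `ALREADY_EXISTS` error above appears whenever the cell is re-run. One way to make the step repeatable is to check for the repository first and create it only when the check fails. The following is a minimal sketch, not the notebook's own method: the helper names (`describe_cmd`, `create_cmd`, `ensure_repository`) are hypothetical, and it assumes that a non-zero exit code from `gcloud artifacts repositories describe` means the repository is missing (it could also mean an authentication or permission problem).

```python
import subprocess
from typing import Callable, List, Sequence


def describe_cmd(repo: str, location: str, project: str) -> List[str]:
    """Build the gcloud command that looks up an existing repository."""
    return ["gcloud", "artifacts", "repositories", "describe", repo,
            f"--location={location}", f"--project={project}"]


def create_cmd(repo: str, location: str, project: str, description: str) -> List[str]:
    """Build the gcloud command that creates a Docker-format repository."""
    return ["gcloud", "artifacts", "repositories", "create", repo,
            f"--location={location}", f"--project={project}",
            "--repository-format=docker", f"--description={description}"]


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code (output is captured, not shown)."""
    return subprocess.run(list(cmd), capture_output=True).returncode


def ensure_repository(repo: str, location: str, project: str, description: str,
                      run: Callable[[Sequence[str]], int] = _run) -> bool:
    """Create the repository only if `describe` fails. Returns True if created."""
    # Assumption: a non-zero exit code from `describe` means "not found".
    if run(describe_cmd(repo, location, project)) == 0:
        return False
    if run(create_cmd(repo, location, project, description)) != 0:
        raise RuntimeError(f"could not create repository {repo!r}")
    return True
```

In the notebook this could be called as `ensure_repository(PRIVATE_REPO, "us-central1", "gcp-ml-sandbox", "Images for custom pipeline components")`. The `run` parameter exists only so the logic can be exercised without calling `gcloud`.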

### Configure authentication to your private repo

Before you push or pull container images, configure Docker to use the `gcloud` command-line tool to authenticate requests to `Artifact Registry` for your region.

In [6]:
REGION='us-central1'
PROJECT_ID='gcp-ml-sandbox'
! gcloud auth configure-docker {REGION}-docker.pkg.dev --quiet


{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud",
    "us-central1-docker.pkg.dev": "gcloud"
  }
}
Adding credentials for: us-central1-docker.pkg.dev
gcloud credential helpers already registered correctly.
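
The region string appears both in the registry host passed to `gcloud auth configure-docker` and in the image URI used for `docker tag` and `docker push`. If the two are typed separately, a mistyped region configures authentication for a different host than the one you push to. A small sketch of deriving both from one place follows; the helper names `registry_host` and `image_uri` are hypothetical, and the URI layout is the one used in this notebook's `docker tag` command.

```python
def registry_host(region: str) -> str:
    """Artifact Registry Docker host for a region, e.g. us-central1-docker.pkg.dev."""
    return f"{region}-docker.pkg.dev"


def image_uri(region: str, project_id: str, repository: str,
              image: str, tag: str = "latest") -> str:
    """Full image URI: <host>/<project>/<repository>/<image>:<tag>."""
    return f"{registry_host(region)}/{project_id}/{repository}/{image}:{tag}"
```

With these, the authentication cell could use `registry_host(REGION)` and the build cell could use `image_uri(REGION, PROJECT_ID, PRIVATE_REPO, LOCAL_IMAGE, TAG)`, so both always agree.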


In [7]:
### Build Docker image and push to artifact registry
LOCAL_IMAGE='hello-robot'
TAG="latest"
CONTAINER_NAME=f"{LOCAL_IMAGE}:{TAG}"
REGION='us-central1'

!docker build --tag {CONTAINER_NAME} -f Dockerfile .

!docker tag {CONTAINER_NAME} {REGION}-docker.pkg.dev/{PROJECT_ID}/{PRIVATE_REPO}/{CONTAINER_NAME}
!docker push {REGION}-docker.pkg.dev/{PROJECT_ID}/{PRIVATE_REPO}/{CONTAINER_NAME}


Sending build context to Docker daemon  22.02kB
Step 1/3 : FROM gcr.io/google.com/cloudsdktool/google-cloud-cli:latest
 ---> b91746661f9f
Step 2/3 : COPY ./scripts /scripts
 ---> 2246f1025d7b
Step 3/3 : RUN chmod -R +x /scripts
 ---> Running in 86c0a112823d
Removing intermediate container 86c0a112823d
 ---> 4270a3320bb5
Successfully built 4270a3320bb5
Successfully tagged hello-robot:latest
The push refers to repository [us-central1-docker.pkg.dev/gcp-ml-sandbox/pipeline-components/hello-robot]

abde3da1: Preparing
51c80e69: Preparing
92193e84: Preparing
99a01dc2: Preparing
a1a58a66: Preparing
bdf2613c: Preparing
709313e8: Preparing
abde3da1: Pushed
latest: digest: sha256:598690c477451a9f9c608c9d31d8d78bc18ed3dbdbe46653d885f695728851d7 size: 1996


Let's quickly test the image to ensure it does what we expect.

In [8]:
!docker run -it {CONTAINER_NAME} /scripts/hello-robot.sh robv

Hello, robv.  Nice to meet you


In [27]:
!docker run -it {CONTAINER_NAME} gcloud --version

Google Cloud SDK 431.0.0
alpha 2023.05.12
app-engine-go 1.9.75
app-engine-java 2.0.14
app-engine-python 1.9.104
app-engine-python-extras 1.9.100
beta 2023.05.12
bigtable 
bq 2.0.92
bundled-python3-unix 3.9.16
cbt 0.15.0
cloud-datastore-emulator 2.3.0
cloud-firestore-emulator 1.17.4
cloud-spanner-emulator 1.5.4
core 2023.05.12
gcloud-crc32c 1.0.0
gke-gcloud-auth-plugin 0.5.3
gsutil 5.23
kpt 1.0.0-beta.31
local-extract 1.5.8
pubsub-emulator 0.8.2


OK, let's build a simple component that takes a parameter, calls our script, and returns the script's output as a string.

In [14]:
from kfp.v2 import dsl
BASE_IMAGE = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{PRIVATE_REPO}/{CONTAINER_NAME}"

@dsl.component(base_image=BASE_IMAGE)
def hello_robot(text: str) -> str:
    import subprocess
    
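    # The Dockerfile copies the scripts to /scripts, so use the absolute path
    # rather than relying on the container's working directory.
    output = subprocess.run(['/scripts/hello-robot.sh', text], capture_output=True)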
    return output.stdout.decode('utf-8')
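
As written, the component returns whatever the script printed, even if the script exited with an error, in which case the pipeline step would succeed with an empty string. A minimal sketch of a stricter call is shown below, assuming you would rather have the step fail; `run_script` is a hypothetical helper, and its body could be inlined in the component function.

```python
import subprocess


def run_script(script_path: str, arg: str) -> str:
    """Run a script with one argument and return its stdout as text.

    check=True raises subprocess.CalledProcessError on a non-zero exit code,
    and text=True decodes stdout so no explicit .decode() is needed.
    """
    result = subprocess.run(
        [script_path, arg],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Inside the component this would be called with the script's path in the image, for example `run_script('/scripts/hello-robot.sh', text)`.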

Now let's define a pipeline to run our component.

In [24]:
BUCKET_URI='gs://gcp-ml-sandbox-scratch'
PIPELINE_ROOT = "{}/pipeline_root/base-component-example".format(BUCKET_URI)

@dsl.pipeline(
    name="custom-base-component",
    description="Simple pipeline showing a custom base image",
    pipeline_root=PIPELINE_ROOT,
)
def simple_pipeline(name_to_echo: str):
    hello_task = hello_robot(text=name_to_echo)


Finally, let's compile and run the pipeline on GCP, passing in our pipeline parameter, which will in turn be passed to our custom component.

In [26]:
import google.cloud.aiplatform as aip
from kfp.v2 import compiler

aip.init(project=PROJECT_ID, staging_bucket=BUCKET_URI)

compiler.Compiler().compile(pipeline_func=simple_pipeline, package_path="simple_pipeline.json")

DISPLAY_NAME = "simple_pipeline"

job = aip.PipelineJob(
    display_name=DISPLAY_NAME,
    template_path="simple_pipeline.json",
    pipeline_root=PIPELINE_ROOT,
    parameter_values={"name_to_echo": "Mo Haque"},

)

job.run()


Creating PipelineJob
PipelineJob created. Resource name: projects/357746845324/locations/us-central1/pipelineJobs/custom-base-component-20230517172434
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/357746845324/locations/us-central1/pipelineJobs/custom-base-component-20230517172434')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/custom-base-component-20230517172434?project=357746845324
PipelineJob projects/357746845324/locations/us-central1/pipelineJobs/custom-base-component-20230517172434 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/357746845324/locations/us-central1/pipelineJobs/custom-base-component-20230517172434 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/357746845324/locations/us-central1/pipelineJobs/custom-base-component-20230517172434 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/357746845324/l
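
Before submitting a job, it can be useful to confirm that the compiled `simple_pipeline.json` actually references the custom base image. The sketch below walks the compiled JSON and collects every string stored under an `image` key. It assumes that container images are recorded under a key with that name; the exact nesting differs between KFP versions, which is why the walk is recursive rather than tied to one path. `collect_images` and `images_in_spec` are hypothetical helper names.

```python
import json
from typing import Any, Set


def collect_images(node: Any) -> Set[str]:
    """Recursively collect all string values stored under an "image" key."""
    found: Set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "image" and isinstance(value, str):
                found.add(value)
            else:
                found |= collect_images(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_images(item)
    return found


def images_in_spec(path: str) -> Set[str]:
    """Load a compiled pipeline JSON file and return the images it references."""
    with open(path) as f:
        return collect_images(json.load(f))
```

After compiling, a check such as `BASE_IMAGE in images_in_spec("simple_pipeline.json")` would show whether the component picked up the custom image.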