#### Installation

Install the following packages required to execute this notebook.

In [5]:
! pip3 install --no-cache-dir --upgrade "kfp>2" "google-cloud-pipeline-components>2"\
                                        google-cloud-aiplatform

Collecting kfp>2
  Downloading kfp-2.3.0.tar.gz (377 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting google-cloud-pipeline-components>2
  Downloading google_cloud_pipeline_components-2.5.0-py3-none-any.whl.metadata (5.9 kB)
Collecting google-cloud-aiplatform
  Downloading google_cloud_aiplatform-1.35.0-py2.py3-none-any.whl.metadata (27 kB)
Collecting docstring-parser<1,>=0.7.3 (from kfp>2)
  Downloading docstring_parser-0.15-py3-none-any.whl (36 kB)
Collecting kfp-pipeline-spec==0.2.2 (from kfp>2)
  Downloading kfp_pipeline_spec-0.2.2-py3

ERROR: Could not install packages due to an OSError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\rnekraso\\DataEngineering\\venv\\Lib\\site-packages\\google_cloud_pipeline_components\\container\\v1\\dataproc\\create_pyspark_batch\\__init__.py'
Check the permissions.


[notice] A new release of pip is available: 23.3 -> 23.3.1
[notice] To update, run: python.exe -m pip install --upgrade pip


Collecting google-cloud-pipeline-components>2
  Downloading google_cloud_pipeline_components-2.5.0-py3-none-any.whl.metadata (5.9 kB)
Downloading google_cloud_pipeline_components-2.5.0-py3-none-any.whl (1.4 MB)
Installing collected packages: google-cloud-pipeline-components
Successfully installed google-cloud-pipeline-components-2.5.0




#### Check the KFP SDK version

In [7]:
! python3 -c "import kfp; print('KFP SDK version: {}'.format(kfp.__version__))"
! pip3 freeze | grep aiplatform

KFP SDK version: 2.3.0


'grep' is not recognized as an internal or external command,
operable program or batch file.
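
The `grep` call fails in the output above because the Windows command shell has no `grep`. A minimal sketch of a shell-independent check, assuming Python 3.8+ (for `importlib.metadata`); the helper name `installed_version` is hypothetical and not part of the original notebook:

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages this notebook installs, without relying on grep or findstr.
for name in ("kfp", "google-cloud-aiplatform", "google-cloud-pipeline-components"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Because this runs inside the kernel's own interpreter, it reports what the notebook can actually import, which may differ from what `pip3` on the system path sees.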


#### Pipeline Configurations

In [8]:
# The Google Cloud project that this pipeline runs in.
PROJECT_ID = "jads-399108"
# The region that this pipeline runs in.
REGION = "us-central1"
# A Cloud Storage URI that your pipeline's service account can access.
# The artifacts of your pipeline runs are stored under this pipeline root.
PIPELINE_ROOT = "gs://temp_de2023_roman"
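
A malformed pipeline root is easier to catch before submitting a job. The following is a rough sketch of a format check only; it does not verify that the bucket exists or that the service account can access it. The helper name `split_gcs_uri` is hypothetical:

```python
from urllib.parse import urlparse


def split_gcs_uri(uri: str):
    """Split a gs:// URI into (bucket, object prefix); raise ValueError otherwise."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"Expected a URI of the form gs://bucket[/prefix], got: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# The pipeline root used in this notebook has a bucket and an empty prefix.
bucket, prefix = split_gcs_uri("gs://temp_de2023_roman")
print(bucket, repr(prefix))
```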

#### Import Packages

In [9]:
import google.cloud.aiplatform as aiplatform
import kfp
from kfp import compiler, dsl
from kfp.dsl import Artifact, Dataset, Input, Metrics, Model, Output, component

In [10]:
# Initialize the Vertex AI SDK with the project and region set above
aiplatform.init(
    project=PROJECT_ID,
    location=REGION,
)

#### Create Pipeline Components

We can create a component from a Python function (inline) or from a container image. We will first try inline Python functions.
Refer to https://www.kubeflow.org/docs/components/pipelines/v2/components/lightweight-python-components/ for more information.

#### Pipeline Component: Add

In [7]:
@dsl.component
def add(a: float, b: float) -> float:
  '''Calculates the sum of two arguments.'''
  return a + b

In [8]:
@dsl.pipeline(
  name='addition-pipeline',
  description='An example pipeline that performs addition calculations.',
  pipeline_root=PIPELINE_ROOT
)
def add_pipeline(
  x: float=1.0,
  y: float=7.0,
):
  # Passes a pipeline parameter and a constant value to the `add` factory
  # function.
  first_add_task = add(a=x, b=4.0)
  # Passes an output reference from `first_add_task` and a pipeline parameter
  # to the `add` factory function. For operations with a single return
  # value, the output reference can be accessed as `task.output` or
  # `task.outputs['output_name']`.
  second_add_task = add(a=first_add_task.output, b=y)
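
To see what value the two chained tasks should produce, the same data flow can be traced in plain Python without KFP. This is only a local sketch of the arithmetic; `add_local` and `add_pipeline_local` are hypothetical names, and nothing here is compiled or submitted to Vertex AI:

```python
def add_local(a: float, b: float) -> float:
    """Plain-Python stand-in for the `add` component."""
    return a + b


def add_pipeline_local(x: float = 1.0, y: float = 7.0) -> float:
    """Mirror the pipeline's wiring: (x + 4.0) feeds the second addition with y."""
    first = add_local(a=x, b=4.0)     # corresponds to first_add_task
    second = add_local(a=first, b=y)  # corresponds to second_add_task
    return second


print(add_pipeline_local())           # defaults: (1.0 + 4.0) + 7.0 = 12.0
print(add_pipeline_local(8.0, 9.0))   # (8.0 + 4.0) + 9.0 = 21.0
```

With `x=8.0` and `y=9.0`, the final task is therefore expected to output 21.0.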

#### Compile the pipeline into a YAML file

In [9]:
from kfp import compiler

compiler.Compiler().compile(
    pipeline_func=add_pipeline,
    package_path='add_pipeline.yaml'
)

#### Submit the pipeline run

In [10]:
import google.cloud.aiplatform as aip


# Prepare the pipeline job
job = aip.PipelineJob(
    display_name="add_pipeline",
    template_path="add_pipeline.yaml",
    pipeline_root=PIPELINE_ROOT,
    parameter_values={
        'x': 8.0,
        'y': 9.0
    }
)

# Submit the job; by default this call blocks until the run finishes
job.run()

Creating PipelineJob
PipelineJob created. Resource name: projects/1057552102023/locations/us-central1/pipelineJobs/addition-pipeline-20231013063946
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1057552102023/locations/us-central1/pipelineJobs/addition-pipeline-20231013063946')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/addition-pipeline-20231013063946?project=1057552102023
PipelineJob projects/1057552102023/locations/us-central1/pipelineJobs/addition-pipeline-20231013063946 current state:
PipelineState.PIPELINE_STATE_PENDING
PipelineJob projects/1057552102023/locations/us-central1/pipelineJobs/addition-pipeline-20231013063946 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1057552102023/locations/us-central1/pipelineJobs/addition-pipeline-20231013063946 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1057552102023/locations/us-centr