# Using Cloud Composer to orchestrate Kubeflow pipelines on Vertex AI

**Learning Objectives:**
1. Learn how to create a custom DAG for Cloud Composer
2. Learn how to use Airflow operators for Vertex AI Pipelines
3. Learn how to orchestrate Vertex AI Pipelines with an existing ETL (Extract, Transform, Load) pipeline

This notebook walks through creating an Airflow DAG and shows how you *would* programmatically upload DAG files to Google Cloud Storage (GCS).

**Important Notes:**
1.  **DAG Execution:** Airflow DAGs are typically uploaded directly to your Cloud Composer environment's GCS DAGs folder. Airflow workers then discover and parse these files. You do *not* run the DAG code directly from this notebook to execute the Airflow workflow.
2.  **Authentication:** To run the GCS upload code, ensure your notebook environment (e.g., Vertex AI Workbench, Colab, or a local Jupyter server with `gcloud` authenticated) has the Google Cloud permissions needed to write to your Composer DAGs bucket.
3.  **Cloud Composer DAGs Folder:** The target GCS path for DAGs in Cloud Composer is usually `gs://YOUR_COMPOSER_BUCKET/dags/`.
4.  **Prerequisite:** This notebook assumes a Cloud Composer environment has already been created by following the instructions in [Run an Apache Airflow DAG in Cloud Composer](https://cloud.google.com/composer/docs/composer-3/run-apache-airflow-dag). If you have not done so, create a Cloud Composer environment using those instructions first.

**Open and edit the Airflow DAG template: "/dags/airflow_run_vertex_pipelines_dag.py"**

You need to provide actual values for the following configuration constants:


```python
import os

# IPython shell capture returns a list of output lines; keep the first one.
PROJECT_ID = !(gcloud config get-value project)
PROJECT_ID = PROJECT_ID[0]
REGION = "us-central1"
os.environ["REGION"] = REGION
ARTIFACT_STORE = f"gs://{PROJECT_ID}-kfp-artifact-store"
os.environ["ARTIFACT_STORE"] = ARTIFACT_STORE
VERTEX_AI_PIPELINE_YAML = "gs://your-bucket-name/path/to/covertype_kfp_pipeline.yaml"  # TODO: Update path to your compiled KFP YAML
GCS_SOURCE_DATASET_PATH = "data/covertype/dataset.csv"
GCS_TRAIN_DATASET_PATH = "gs://your-bucket-name/data/train_export.csv"
GCS_BUCKET_NAME = "asl-public"
# TODO: Replace with your dataset ID. BigQuery dataset IDs may contain only
# letters, numbers, and underscores (no hyphens).
BIGQUERY_DATASET_ID = "your-airflow_demo_dataset"
TABLE_ID = "covertype"

# BigQuery table schema: a list of field definitions (name, type, mode).
BIGQUERY_TABLE_SCHEMA = [
    {"name": "Elevation", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "Aspect", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "Slope", "type": "INTEGER", "mode": "NULLABLE"},
    {
        "name": "Horizontal_Distance_To_Hydrology",
        "type": "INTEGER",
        "mode": "NULLABLE",
    },
    {
        "name": "Vertical_Distance_To_Hydrology",
        "type": "INTEGER",
        "mode": "NULLABLE",
    },
    {
        "name": "Horizontal_Distance_To_Roadways",
        "type": "INTEGER",
        "mode": "NULLABLE",
    },
    {"name": "Hillshade_9am", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "Hillshade_Noon", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "Hillshade_3pm", "type": "INTEGER", "mode": "NULLABLE"},
    {
        "name": "Horizontal_Distance_To_Fire_Points",
        "type": "INTEGER",
        "mode": "NULLABLE",
    },
    {"name": "Wilderness_Area", "type": "STRING", "mode": "NULLABLE"},
    {"name": "Soil_Type", "type": "STRING", "mode": "NULLABLE"},
    {"name": "Cover_Type", "type": "INTEGER", "mode": "NULLABLE"},
]
```
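
Placeholder values such as `your-bucket-name` are easy to overlook when editing the constants. As a minimal sketch (the helper `find_placeholders` and its marker strings are hypothetical and not part of the DAG template), a quick check like the following can list the settings that still look unedited:

```python
def find_placeholders(config, markers=("your-", "YOUR_")):
    """Return the names of string settings that still contain a placeholder marker."""
    return sorted(
        name
        for name, value in config.items()
        if isinstance(value, str) and any(marker in value for marker in markers)
    )


# Example values copied from the constants above.
example_config = {
    "VERTEX_AI_PIPELINE_YAML": "gs://your-bucket-name/path/to/covertype_kfp_pipeline.yaml",
    "GCS_TRAIN_DATASET_PATH": "gs://your-bucket-name/data/train_export.csv",
    "GCS_BUCKET_NAME": "asl-public",
    "BIGQUERY_DATASET_ID": "your-airflow_demo_dataset",
    "TABLE_ID": "covertype",
}

unset = find_placeholders(example_config)
print(unset)
```

With the example values shown, the check reports `BIGQUERY_DATASET_ID`, `GCS_TRAIN_DATASET_PATH`, and `VERTEX_AI_PIPELINE_YAML` as still needing real values.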


## Airflow DAG Code
Edit the provided Airflow DAG code (`demo_vertex_ai_pipeline_integration.py`) that you intend to upload to your Cloud Composer environment.
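
To give a sense of how an ETL step and a Vertex AI pipeline run can be chained in one DAG, here is a rough sketch. It is not the content of the provided DAG file: the task order (load CSV into BigQuery, export a training file, run the compiled pipeline), the `dag_id`, the task IDs, the display name, the `build_dag` and `table_ref` helpers, and the placeholder project ID are all assumptions made for illustration. It assumes Airflow 2.4 or later with the `apache-airflow-providers-google` package installed, as in Cloud Composer.

```python
from datetime import datetime

# Placeholder configuration; replace with the constants you edited above.
PROJECT_ID = "your-project-id"  # hypothetical placeholder
REGION = "us-central1"
VERTEX_AI_PIPELINE_YAML = "gs://your-bucket-name/path/to/covertype_kfp_pipeline.yaml"
GCS_BUCKET_NAME = "asl-public"
GCS_SOURCE_DATASET_PATH = "data/covertype/dataset.csv"
GCS_TRAIN_DATASET_PATH = "gs://your-bucket-name/data/train_export.csv"
BIGQUERY_DATASET_ID = "your_airflow_demo_dataset"
TABLE_ID = "covertype"
# Abbreviated for the sketch; use the full schema from the constants above.
BIGQUERY_TABLE_SCHEMA = [
    {"name": "Elevation", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "Cover_Type", "type": "INTEGER", "mode": "NULLABLE"},
]


def table_ref(project_id, dataset_id, table_id):
    """Build a 'project.dataset.table' reference string."""
    return f"{project_id}.{dataset_id}.{table_id}"


def build_dag():
    """Assemble the DAG; Airflow is imported here so the helpers above work without it."""
    from airflow import DAG
    from airflow.providers.google.cloud.operators.vertex_ai.pipeline_job import (
        RunPipelineJobOperator,
    )
    from airflow.providers.google.cloud.transfers.bigquery_to_gcs import (
        BigQueryToGCSOperator,
    )
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
        GCSToBigQueryOperator,
    )

    table = table_ref(PROJECT_ID, BIGQUERY_DATASET_ID, TABLE_ID)
    with DAG(
        dag_id="demo_vertex_ai_pipeline_integration",  # assumed name
        start_date=datetime(2024, 1, 1),
        schedule=None,  # trigger manually
        catchup=False,
    ) as dag:
        # Extract/load: copy the source CSV from GCS into a BigQuery table.
        load_csv = GCSToBigQueryOperator(
            task_id="load_csv_to_bigquery",
            bucket=GCS_BUCKET_NAME,
            source_objects=[GCS_SOURCE_DATASET_PATH],
            destination_project_dataset_table=table,
            schema_fields=BIGQUERY_TABLE_SCHEMA,
            skip_leading_rows=1,  # assumes the CSV has a header row
            write_disposition="WRITE_TRUNCATE",
        )
        # Export the table to GCS as the training file.
        export_train = BigQueryToGCSOperator(
            task_id="export_training_data",
            source_project_dataset_table=table,
            destination_cloud_storage_uris=[GCS_TRAIN_DATASET_PATH],
            export_format="CSV",
        )
        # Run the compiled KFP pipeline on Vertex AI Pipelines.
        # Pipeline-specific inputs would go in `parameter_values`.
        run_pipeline = RunPipelineJobOperator(
            task_id="run_vertex_ai_pipeline",
            project_id=PROJECT_ID,
            region=REGION,
            display_name="covertype-kfp-pipeline",  # assumed name
            template_path=VERTEX_AI_PIPELINE_YAML,
        )
        load_csv >> export_train >> run_pipeline
    return dag
```

In an actual DAG file, Airflow needs a DAG object at module scope, so the file would end with a line such as `dag = build_dag()`.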

## Uploading the DAG to Cloud Composer Storage

To deploy this DAG to your Cloud Composer environment, you need to upload it to the DAGs folder in your Composer's associated Cloud Storage bucket.

**Before running the upload code, make sure you have:**
1.  **Installed the Google Cloud Storage client library:** `pip install google-cloud-storage`
2.  **Authenticated:** Your environment needs to be authenticated to Google Cloud (e.g., by running `gcloud auth application-default login`, or by running on a Google Cloud VM, Cloud Run, or Vertex AI Workbench).
3.  **Identified your Composer DAGs folder:** Its path typically looks like `gs://us-central1-YOUR_COMPOSER_ENV_NAME-HASH-bucket/dags/`. You can find it in the Cloud Composer console.
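
As a minimal sketch of the upload step, assuming the `google-cloud-storage` client library and Application Default Credentials are available, the DAG file could be copied into the DAGs folder as follows. The helper names (`split_gcs_uri`, `dag_blob_name`, `upload_dag`) are hypothetical and chosen only for this example.

```python
from pathlib import Path


def split_gcs_uri(uri):
    """Split 'gs://bucket/some/prefix/' into ('bucket', 'some/prefix')."""
    if not uri.startswith("gs://"):
        raise ValueError(f"Not a GCS URI: {uri}")
    bucket, _, prefix = uri[len("gs://"):].partition("/")
    return bucket, prefix.strip("/")


def dag_blob_name(local_path, prefix):
    """Return the object name for a local DAG file placed under `prefix`."""
    name = Path(local_path).name
    return f"{prefix}/{name}" if prefix else name


def upload_dag(local_path, dags_folder_uri, project_id=None):
    """Upload a local DAG file to the Composer DAGs folder and return its GCS URI."""
    # Imported here so the helpers above work without the library installed.
    from google.cloud import storage

    bucket_name, prefix = split_gcs_uri(dags_folder_uri)
    client = storage.Client(project=project_id)
    blob = client.bucket(bucket_name).blob(dag_blob_name(local_path, prefix))
    blob.upload_from_filename(str(local_path))
    return f"gs://{bucket_name}/{blob.name}"
```

A call such as `upload_dag("demo_vertex_ai_pipeline_integration.py", "gs://YOUR_COMPOSER_BUCKET/dags/")` would place the file in the DAGs folder, after which Airflow should pick it up on its next scan of that folder.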

Copyright 2021 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.