In [None]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Running Vertex Pipelines using the E2E Samples repository - Infrastructure setup


<table align="left">

  <td>
    <a href="https://github.com/teamdatatonic/vertex-pipelines-end-to-end-samples/blob/develop/docs/notebooks/01_infrastructure_setup.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
</table>

**_NOTE_**: This notebook has been tested in the following environment:

* Python version = 3.9

## Overview

This notebook shows you how to set up the infrastructure needed to run production-ready pipelines on Google Cloud using Datatonic's Vertex Pipelines End-to-end Samples repository.

Learn more about [Vertex Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/introduction).

### Objective

In this tutorial, you learn how to set up the cloud infrastructure needed to launch your first training and prediction pipelines.

This tutorial uses the following Google Cloud services and resources:

- *`Vertex Pipelines`*
- *`Google Cloud Storage`*
- *`Artifact Registry`*
- *`BigQuery`*
- *`Cloud Build`*

The steps performed include:

* Deploy infrastructure using Terraform for a typical setup of Vertex AI and other relevant services.

### Costs 


This tutorial uses billable components of Google Cloud:

* Vertex AI
* BigQuery
* Cloud Storage
* Cloud Build
* Artifact Registry


Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing),
[BigQuery pricing](https://cloud.google.com/bigquery/pricing),
[Cloud Storage pricing](https://cloud.google.com/storage/pricing),
[Cloud Build pricing](https://cloud.google.com/build/pricing),
and [Artifact Registry pricing](https://cloud.google.com/artifact-registry/pricing),
and use the [Pricing Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

### Prerequisites

- [Google Cloud SDK (gcloud)](https://cloud.google.com/sdk/docs/quickstart)
- Make
- [Terraform](https://www.terraform.io)

## Clone Turbo Templates repository

In [None]:
# Clone a Git repository
# !git clone -b develop https://github.com/teamdatatonic/vertex-pipelines-end-to-end-samples
!git clone -b feat/tutorial_notebook https://github.com/teamdatatonic/vertex-pipelines-end-to-end-samples

In [None]:
%cd vertex-pipelines-end-to-end-samples/

## Infrastructure

The cloud infrastructure is managed using Terraform and is defined in the [`terraform`](terraform) directory. There are three Terraform modules defined in [`terraform/modules`](terraform/modules):

- `cloudfunction` - deploys a (Pub/Sub-triggered) Cloud Function from local source code
- `scheduled_pipelines` - deploys Cloud Scheduler jobs that will trigger Vertex Pipeline runs (via the above Cloud Function)
- `vertex_deployment` - deploys Cloud infrastructure required for running Vertex Pipelines, including enabling APIs, creating buckets, Artifact Registry repos, service accounts, and IAM permissions.

### Terraform Installation

**If you do not have Terraform installed, please refer to the official documentation for detailed installation instructions: https://developer.hashicorp.com/terraform/downloads**

## Before You Begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

3. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).

#### Set your project ID

**If you don't know your project ID**, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)

In [None]:
PROJECT_ID = "[my-project-id]"
# Set the project id
! gcloud config set project {PROJECT_ID}
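
It can help to catch an unedited placeholder before it reaches `gcloud`. The following is a minimal sketch and is not part of the samples repository: the helper name `looks_like_project_id` is hypothetical, and the pattern assumes the standard format for newly created projects (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen). Older domain-scoped project IDs would not match it.

```python
import re

# Hypothetical helper, not part of the samples repository.
# Assumed format: 6-30 chars, lowercase letters/digits/hyphens,
# starts with a letter, does not end with a hyphen.
_PROJECT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def looks_like_project_id(project_id: str) -> bool:
    """Return True if the string matches the assumed project ID format."""
    return bool(_PROJECT_ID_PATTERN.fullmatch(project_id))


# The bracketed placeholder from the cell above is rejected.
print(looks_like_project_id("[my-project-id]"))  # False
print(looks_like_project_id("my-project-id"))    # True
```

A check like this only validates the shape of the ID; it does not confirm that the project exists or that you have access to it.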

### Authenticate your Google Cloud account


As you're running a Jupyter environment locally, you'll need to authenticate manually. Please follow the instructions provided below.

In [None]:
! gcloud auth login

### Environment Setup

To run the `make` commands, the relevant environment variables need to be set. Please update the environment variables for your dev environment (particularly `VERTEX_PROJECT_ID` and `VERTEX_LOCATION`).

In [None]:
%%writefile .env

VERTEX_LOCATION=europe-west2
VERTEX_PROJECT_ID=my-project-id

# Suffix (e.g. '<your name>') to facilitate running concurrent pipelines in the same Google Cloud project. Change it if working in a team to avoid overwriting resources during development.
RESOURCE_SUFFIX=default

# Leave as-is
VERTEX_SA_EMAIL=vertex-pipelines@${VERTEX_PROJECT_ID}.iam.gserviceaccount.com
VERTEX_PIPELINE_ROOT=gs://${VERTEX_PROJECT_ID}-pl-root
CONTAINER_IMAGE_REGISTRY=${VERTEX_LOCATION}-docker.pkg.dev/${VERTEX_PROJECT_ID}/vertex-images

# Optional
VERTEX_CMEK_IDENTIFIER=
VERTEX_NETWORK=

Load environment variables using [Python-dotenv](https://pypi.org/project/python-dotenv/).

In [None]:
! pip install python-dotenv

In [None]:
from dotenv import load_dotenv
load_dotenv()
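
Since the later `make` and `gcloud` steps depend on these values, it may be worth confirming that they were actually loaded. The sketch below is illustrative only: the helper `missing_env_vars` is hypothetical, and the list of required names is simply taken from the `.env` file written above.

```python
import os
from typing import Iterable, List, Mapping

# Variable names taken from the .env file written earlier in this notebook.
REQUIRED_VARS = (
    "VERTEX_PROJECT_ID",
    "VERTEX_LOCATION",
    "RESOURCE_SUFFIX",
    "VERTEX_SA_EMAIL",
    "VERTEX_PIPELINE_ROOT",
    "CONTAINER_IMAGE_REGISTRY",
)


def missing_env_vars(
    required: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    return [name for name in required if not env.get(name)]


# Hypothetical check, not part of the samples repository.
missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing or empty:", ", ".join(missing))
else:
    print("All required variables are set.")
```

The optional variables (`VERTEX_CMEK_IDENTIFIER`, `VERTEX_NETWORK`) are deliberately left out of the check because the `.env` file leaves them empty by default.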

## Infrastructure deployment using Terraform


#### Enable the Cloud Resource Manager and Service Usage APIs for your project

In [None]:
! gcloud services enable cloudresourcemanager.googleapis.com --project=$VERTEX_PROJECT_ID
! gcloud services enable serviceusage.googleapis.com --project=$VERTEX_PROJECT_ID

#### Create tfstate bucket

Before provisioning your infrastructure, you need to create a Google Cloud Storage (GCS) bucket that will store the Terraform [state files](https://developer.hashicorp.com/terraform/language/state/remote) remotely.

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l $VERTEX_LOCATION -p $VERTEX_PROJECT_ID gs://$VERTEX_PROJECT_ID-tfstate
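
To decide whether the cell above needs to be run, one option is to ask `gsutil` whether the bucket is already visible. This is a minimal sketch under stated assumptions: the helper names `tfstate_bucket_uri` and `bucket_exists` are hypothetical, the bucket name follows the `<project>-tfstate` convention used in the command above, and a non-zero exit code from `gsutil ls -b` is treated as "not found" even though it can also indicate missing permissions.

```python
import subprocess
from typing import Callable


def tfstate_bucket_uri(project_id: str) -> str:
    """Build the state bucket URI using the naming convention from this notebook."""
    return f"gs://{project_id}-tfstate"


def bucket_exists(bucket_uri: str, run: Callable = subprocess.run) -> bool:
    """Return True if `gsutil ls -b` succeeds for the bucket.

    Assumption: a non-zero exit code means the bucket is missing. It can
    also mean the caller lacks permission, so treat False as "not visible".
    Raises FileNotFoundError if gsutil is not installed.
    """
    result = run(["gsutil", "ls", "-b", bucket_uri], capture_output=True, text=True)
    return result.returncode == 0


# Example usage (requires gsutil and valid credentials, so it is left commented out):
# import os
# uri = tfstate_bucket_uri(os.environ["VERTEX_PROJECT_ID"])
# print(uri, "exists" if bucket_exists(uri) else "is not visible")
```

The `run` parameter exists only so the check can be exercised without calling `gsutil`, for example in a unit test.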

### Deploy required infrastructure

The deploy command will:
1. Prepare a Terraform working directory by downloading any necessary provider plugins and initializing the backend configuration.
1. Create the infrastructure resources defined in the Terraform configuration (`terraform/envs/dev`).

In [None]:
! make deploy auto-approve=true
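
For orientation, the two steps described above correspond roughly to a `terraform init` followed by a `terraform apply`. The sketch below only builds and prints the command lines; it does not run them. It is an assumption about what the `deploy` target does, not a copy of the repository's Makefile: the function name `terraform_commands` is hypothetical, the backend bucket name is inferred from the tfstate bucket created earlier, and the real target very likely passes additional variables. Treat the Makefile as the source of truth.

```python
from typing import List


def terraform_commands(
    project_id: str,
    env_dir: str = "terraform/envs/dev",
    auto_approve: bool = False,
) -> List[List[str]]:
    """Build illustrative init/apply commands (assumed, not read from the Makefile)."""
    # Step 1: download providers and configure the remote state backend.
    init = [
        "terraform",
        f"-chdir={env_dir}",
        "init",
        f"-backend-config=bucket={project_id}-tfstate",
    ]
    # Step 2: create the resources defined in the configuration.
    apply = ["terraform", f"-chdir={env_dir}", "apply"]
    if auto_approve:
        # Skips the interactive confirmation prompt.
        apply.append("-auto-approve")
    return [init, apply]


for cmd in terraform_commands("my-project-id", auto_approve=True):
    print(" ".join(cmd))
```

Without `auto_approve`, Terraform shows the planned changes and waits for confirmation, which is the safer default when running against a shared project.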

### Congratulations on successfully deploying the infrastructure required to run ML pipelines!