#### **Author: Wissem Khlifi**
# 00 - Environment Setup

 

---
## Setup

### Get the GCP project ID from the gcloud configuration

In [1]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'genai-demo-2024'
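
The `project` value captured above is a list of output lines, and `project[0]` assumes the first line is the project ID. A minimal sketch of a more defensive parse follows. The helper name `parse_project_id` is hypothetical, and the handling of `(unset)` and of extra status lines is an assumption about what `gcloud` may print when no project is configured.

```python
def parse_project_id(lines):
    """Return the project ID from the captured output lines of
    `gcloud config get-value project`.

    Assumption: the project ID is the last non-empty line, and an
    unconfigured project shows up as '(unset)' or as no output at all.
    """
    candidates = [line.strip() for line in lines if line.strip()]
    if not candidates or candidates[-1] == "(unset)":
        raise ValueError(
            "No default project configured; run "
            "`gcloud config set project <PROJECT_ID>` first."
        )
    return candidates[-1]


# Example with the value shown above:
print(parse_project_id(["genai-demo-2024"]))
```

In the notebook this would be used as `PROJECT_ID = parse_project_id(project)`.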

###  Define the region for GCP services

In [5]:
REGION = 'us-central1'

### Import necessary packages

In [6]:
from google.cloud import storage
from google.cloud import bigquery

import pandas as pd
from sklearn import datasets

### Initialize clients for GCP services

In [7]:
gcs = storage.Client(project = PROJECT_ID)
bq = bigquery.Client(project = PROJECT_ID)

### Set parameters

In [8]:
BUCKET = PROJECT_ID

---
### Create Storage Bucket
Check whether the bucket already exists and create it if it is missing:
- [GCS Python Client](https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.client.Client)

In [9]:
if not gcs.lookup_bucket(BUCKET):
    bucketDef = gcs.bucket(BUCKET)
    bucket = gcs.create_bucket(bucketDef, project=PROJECT_ID, location=REGION)
    print(f'Created Bucket: {gcs.lookup_bucket(BUCKET).name}')
else:
    bucketDef = gcs.bucket(BUCKET)
    print(f'Bucket already exist: {bucketDef.name}')

Bucket already exist: genai-demo-2024
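
The check-then-create logic above can also be wrapped in a reusable function. This is a sketch, not the notebook's own code: the name `ensure_bucket` is hypothetical, and it relies only on the `lookup_bucket` and `create_bucket` methods already used in the cell above. Because bucket names are shared across all of Cloud Storage, a lookup for a name owned by another project may raise a permission error rather than return `None`; that case is not handled here.

```python
def ensure_bucket(client, name, project, location):
    """Return (bucket, created) for `name`, creating the bucket if needed.

    `client` is expected to behave like google.cloud.storage.Client:
    lookup_bucket(name) returns None when the bucket is not found.
    """
    bucket = client.lookup_bucket(name)
    if bucket is not None:
        return bucket, False
    bucket = client.create_bucket(name, project=project, location=location)
    return bucket, True


# Usage in this notebook (assumes gcs, BUCKET, PROJECT_ID, REGION from above):
# bucket, created = ensure_bucket(gcs, BUCKET, PROJECT_ID, REGION)
# print(f"{'Created' if created else 'Found existing'} bucket: {bucket.name}")
```

Returning the `created` flag lets the caller decide what to print or log, instead of hard-coding the messages inside the function.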


In [10]:
print(f'Review the storage bucket in the console here:\nhttps://console.cloud.google.com/storage/browser/{PROJECT_ID};tab=objects&project={PROJECT_ID}')

Review the storage bucket in the console here:
https://console.cloud.google.com/storage/browser/genai-demo-2024;tab=objects&project=genai-demo-2024


---
<a id = 'permissions'></a>
## Service Account & Permissions

This notebook instance runs as a service account in GCP. The same service account is also used to run other Vertex AI services, such as training jobs and pipelines. It needs permission to interact with objects in Cloud Storage, which requires the Storage Object Admin role ([roles/storage.objectAdmin](https://cloud.google.com/storage/docs/access-control/iam-roles)).

Get the current service account:

In [11]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'292219499736-compute@developer.gserviceaccount.com'

Enable the Cloud Resource Manager API:

In [12]:
!gcloud services enable cloudresourcemanager.googleapis.com

List the service account's current roles:

In [13]:
!gcloud projects get-iam-policy $PROJECT_ID --filter="bindings.members:$SERVICE_ACCOUNT" --format='table(bindings.role)' --flatten="bindings[].members"

ROLE
roles/aiplatform.admin
roles/aiplatform.notebookRuntimeAdmin
roles/aiplatform.user
roles/artifactregistry.admin
roles/bigquery.admin
roles/cloudbuild.builds.editor
roles/cloudfunctions.admin
roles/cloudscheduler.admin
roles/dataproc.worker
roles/dialogflow.client
roles/dialogflow.reader
roles/iam.serviceAccountAdmin
roles/iam.serviceAccountUser
roles/logging.admin
roles/ml.admin
roles/notebooks.runner
roles/pubsub.admin
roles/resourcemanager.projectIamAdmin
roles/run.admin
roles/secretmanager.admin
roles/serviceusage.serviceUsageAdmin
roles/serviceusage.serviceUsageConsumer
roles/storage.admin
roles/storage.objectAdmin
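
The role listing above can also be checked in code rather than by eye. The sketch below assumes the command output was captured into a list of lines (for example with `roles = !gcloud ...` in a notebook cell). The helper name `has_object_access` and the set of accepted roles are illustrative assumptions, not an exhaustive list of roles that include object permissions.

```python
# Roles assumed to cover the object permissions needed here.
SUFFICIENT_ROLES = {
    "roles/storage.objectAdmin",
    "roles/storage.admin",
    "roles/owner",
}


def has_object_access(role_lines):
    """Return True if any captured role grants object access in Cloud Storage."""
    roles = {line.strip() for line in role_lines}
    return bool(roles & SUFFICIENT_ROLES)


# A shortened version of the listing above:
sample = ["ROLE", "roles/aiplatform.user", "roles/storage.objectAdmin"]
print(has_object_access(sample))
```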


If the resulting list is missing `roles/storage.objectAdmin`, and no other listed role includes its permissions (for example the basic role `roles/owner`), add it to the service account by following these instructions:

In [14]:
print(f'Go To IAM in the Google Cloud Console:\nhttps://console.cloud.google.com/iam-admin/iam?orgonly=true&project={PROJECT_ID}&supportedpurview=organizationId')

Go To IAM in the Google Cloud Console:
https://console.cloud.google.com/iam-admin/iam?orgonly=true&project=genai-demo-2024&supportedpurview=organizationId


From the console link above, or by going to https://console.cloud.google.com and navigating to "IAM & Admin > IAM":
- Locate the row for the service account listed above: `<project number>-compute@developer.gserviceaccount.com`
- Under the `inheritance` column, click the pencil icon to edit roles
- In the fly-out menu, under `Assign roles`, select `Add Another Role`
- Click the `Select a role` box and type `Storage Object Admin`, then select `Storage Object Admin`
- Click Save
- Rerun the list of roles below and verify the role has been added:

In [15]:
!gcloud projects get-iam-policy $PROJECT_ID --filter="bindings.members:$SERVICE_ACCOUNT" --format='table(bindings.role)' --flatten="bindings[].members"

ROLE
roles/aiplatform.admin
roles/aiplatform.notebookRuntimeAdmin
roles/aiplatform.user
roles/artifactregistry.admin
roles/bigquery.admin
roles/cloudbuild.builds.editor
roles/cloudfunctions.admin
roles/cloudscheduler.admin
roles/dataproc.worker
roles/dialogflow.client
roles/dialogflow.reader
roles/iam.serviceAccountAdmin
roles/iam.serviceAccountUser
roles/logging.admin
roles/ml.admin
roles/notebooks.runner
roles/pubsub.admin
roles/resourcemanager.projectIamAdmin
roles/run.admin
roles/secretmanager.admin
roles/serviceusage.serviceUsageAdmin
roles/serviceusage.serviceUsageConsumer
roles/storage.admin
roles/storage.objectAdmin
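
As an alternative to the console steps above, the role can be granted from the command line with `gcloud projects add-iam-policy-binding`. The sketch below builds the command separately so it can be reviewed before running. The helper names `build_grant_command` and `grant_role` are hypothetical, and running the command requires that the caller is allowed to change the project's IAM policy.

```python
import subprocess


def build_grant_command(project_id, service_account,
                        role="roles/storage.objectAdmin"):
    """Build the gcloud command that grants `role` to a service account."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{service_account}",
        f"--role={role}",
    ]


def grant_role(project_id, service_account, role="roles/storage.objectAdmin"):
    """Run the grant command; raises CalledProcessError if gcloud fails."""
    subprocess.run(build_grant_command(project_id, service_account, role),
                   check=True)


# Review the command before running it:
command = build_grant_command(
    "genai-demo-2024",
    "292219499736-compute@developer.gserviceaccount.com",
)
print(" ".join(command))
```

Calling `grant_role(PROJECT_ID, SERVICE_ACCOUNT)` would then apply the binding, after which the role listing can be rerun to confirm it.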


---
### Install KFP
This section installs the Kubeflow Pipelines (KFP) SDK and the components needed for pipeline management.
If an install step fails, rerun it: dependency conflicts sometimes resolve after 3 or 4 attempts.

- [Install the Kubeflow Pipelines SDK](https://www.kubeflow.org/docs/components/pipelines/v1/sdk/install-sdk/)

Kubeflow Pipelines is chosen over TFX for this lab because it provides a more general platform for orchestrating
and managing machine learning workflows, particularly when leveraging AutoML. Its integration with GCP services
and its support for hybrid and multi-cloud environments make it a versatile choice for various ML tasks. AutoML
simplifies building high-quality models with minimal code and is well supported within the Kubeflow ecosystem.

TFX, on the other hand, is specialized for end-to-end machine learning pipelines within the TensorFlow ecosystem.
While TFX is powerful for production-grade TensorFlow model deployments, Kubeflow provides broader support for
different frameworks and easier integration with GCP AutoML services.



### Install the Kubeflow Pipelines SDK
##### This allows you to create, deploy, and manage machine learning pipelines on GCP 


In [19]:
!pip install kfp -U -q
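
The advice above to rerun a failing install step can be automated. This is a minimal sketch under stated assumptions: the helper `pip_install_with_retry` is hypothetical, it calls pip through the current interpreter (`sys.executable -m pip`), and the `runner` parameter exists only so the retry logic can be exercised without touching the environment.

```python
import subprocess
import sys
import time


def pip_install_with_retry(package, attempts=4, delay=0.0, runner=subprocess.run):
    """Run `pip install -U -q <package>`, retrying up to `attempts` times.

    Returns the number of attempts used; raises RuntimeError if all fail.
    """
    cmd = [sys.executable, "-m", "pip", "install", "-U", "-q", package]
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # optional pause before the next try
    raise RuntimeError(f"pip install {package} failed after {attempts} attempts")


# Usage: pip_install_with_retry("kfp")
```

The default of four attempts mirrors the "3 or 4 times" guidance; a persistent failure still surfaces as an exception instead of being silently ignored.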

### Install Google Cloud Pipeline Components
##### These components provide pre-built, reusable pipeline components that integrate with Google Cloud services.


In [20]:
!pip install google-cloud-pipeline-components -U -q

## Update AIPlatform Package (Google Cloud Vertex AI)

The `google-cloud-aiplatform` package is updated frequently. Keeping it current ensures access to the latest features and bug fixes.

- [aiplatform Python Client](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform)
- [GitHub Repo for api-common-protos](https://github.com/googleapis/api-common-protos)

### Install common Google APIs Protobufs
##### This package contains common protocol buffers for Google APIs.


In [21]:
!pip install googleapis-common-protos -U -q

### Install the AI Platform Python client library
##### This library allows you to interact with Google Cloud Vertex AI for tasks such as training jobs and pipelines.


In [22]:
!pip install google-cloud-aiplatform -U -q

### Reinstall the Cloud Vertex AI Python client library (in case of any dependency issues)
##### Reinstallation can sometimes resolve issues caused by dependency conflicts or installation errors.


In [23]:
!pip install google-cloud-aiplatform



### Install Google Auth library
##### This library handles authentication with Google services.


In [24]:
!pip install google-auth
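
After the installs above, it can help to confirm which versions ended up in the environment. The sketch below uses the standard-library `importlib.metadata`. The helper name `installed_versions` is hypothetical, and the package list simply mirrors the `pip install` cells in this section.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


PACKAGES = [
    "kfp",
    "google-cloud-pipeline-components",
    "googleapis-common-protos",
    "google-cloud-aiplatform",
    "google-auth",
]

for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

In a notebook, a kernel restart may be needed before upgraded packages are picked up by imports.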

