# Building an MLOps Workflow with Airflow 2 on GKE

Lab: https://codelabs.developers.google.com/codelabs/cloud-mlops-airflow-gke?hl=vi#0

## GCP Setup and Authentication

In [None]:
# List credentialed accounts and show which one is active
!gcloud auth list

In [3]:
# Confirm which project the active configuration points to
!gcloud config list project

[core]
project = ext-pinetree-dw



Your active configuration is: [default]


In [5]:
# Switch the active configuration to the lab project
!gcloud config set project datkt98-test01


To update your Application Default Credentials quota project, use the `gcloud auth application-default set-quota-project` command.
ERROR: (gcloud.config.set) There was a problem refreshing your current auth tokens: Reauthentication failed. cannot prompt during non-interactive execution.
Please run:

  $ gcloud auth login

to obtain new credentials.

If you have already logged in with a different account, run:

  $ gcloud config set account ACCOUNT

to select an already authenticated account to use.


In [1]:
!gcloud config configurations list

NAME: datkt-pinetree
IS_ACTIVE: False
ACCOUNT: datkt@pinetree.vn
PROJECT: ext-pinetree-dw
COMPUTE_DEFAULT_ZONE: asia-southeast1-a
COMPUTE_DEFAULT_REGION: asia-southeast1

NAME: datkt98-test01
IS_ACTIVE: False
ACCOUNT: datkt98.test01@gmail.com
PROJECT: datkt98-test01
COMPUTE_DEFAULT_ZONE: asia-southeast1-a
COMPUTE_DEFAULT_REGION: asia-southeast1

NAME: default
IS_ACTIVE: True
ACCOUNT: datkt98.test01@gmail.com
PROJECT: ext-pinetree-dw
COMPUTE_DEFAULT_ZONE: asia-southeast1-a
COMPUTE_DEFAULT_REGION: asia-southeast1

NAME: joyas-test
IS_ACTIVE: False
ACCOUNT: datkt.joyas@gmail.com
PROJECT: joyas-vietnam
COMPUTE_DEFAULT_ZONE: asia-southeast1-a
COMPUTE_DEFAULT_REGION: asia-southeast1


In [2]:
!gcloud config configurations activate datkt98-test01

Activated [datkt98-test01].

To update your Application Default Credentials quota project, use the `gcloud auth application-default set-quota-project` command.


In [3]:
!gcloud auth login

Your browser has been opened to visit:

    https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=32555940559.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A8085%2F&scope=openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fappengine.admin+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fsqlservice.login+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcompute+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Faccounts.reauth&state=SmTunA3yeOXBlk8mZjA6jHBtxjoy7f&access_type=offline&code_challenge=lfeQzcj-6_RxMtoDRMsaOMEddLR6SGCk-25TVNdaSh4&code_challenge_method=S256


You are now logged in as [datkt98.test01@gmail.com].
Your current project is [datkt98-test01].  You can change this setting by running:
  $ gcloud config set project PROJECT_ID
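
The session above started with the `default` configuration pointing at `ext-pinetree-dw` instead of the lab project. A minimal sketch of a guard that could run at the top of later cells follows. The helper name `require_project` is hypothetical and not part of the codelab. It relies only on `gcloud config get-value project`, and the caller passes in the expected project ID.

```shell
# Hypothetical helper (not part of the codelab): abort early when the active
# gcloud project is not the one the lab expects.
require_project() {
  expected="$1"
  # `gcloud config get-value project` prints the project of the active configuration.
  actual="$(gcloud config get-value project 2>/dev/null)"
  if [ "$actual" != "$expected" ]; then
    echo "Active project is '${actual}', expected '${expected}'." >&2
    echo "Run: gcloud config set project ${expected}" >&2
    return 1
  fi
  echo "Using project ${actual}"
}
```

For example, `require_project datkt98-test01 || exit 1` at the start of a `%%bash` cell would stop the cell before any resource is created in the wrong project.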


## Initial Infrastructure

In [None]:
%%bash

# Set environment variables
# DEVSHELL_PROJECT_ID is only set in Cloud Shell; elsewhere fall back to the active gcloud project.
export PROJECT_ID=${DEVSHELL_PROJECT_ID:-$(gcloud config get-value project 2>/dev/null)}
export DEVSHELL_PROJECT_ID=${PROJECT_ID}
export CODELAB_PREFIX=mlops-airflow
export PROJECT_NUMBER=$(gcloud projects describe "${PROJECT_ID}" --format="value(projectNumber)")

# Random 4-character suffix so the bucket names are unlikely to collide
SUFFIX=$(echo $RANDOM | md5sum | head -c 4)
export CLUSTER_NAME=${CODELAB_PREFIX}
export CLUSTER_SA=sa-${CODELAB_PREFIX}
export BUCKET_LOGS_NAME=${CODELAB_PREFIX}-logs-${SUFFIX}
export BUCKET_DAGS_NAME=${CODELAB_PREFIX}-dags-${SUFFIX}
export BUCKET_DATA_NAME=${CODELAB_PREFIX}-data-${SUFFIX}
export REPO_NAME=${CODELAB_PREFIX}-repo
export REGION=asia-southeast1

# Enable the required Google Cloud APIs
gcloud config set project ${PROJECT_ID}
gcloud services enable \
  container.googleapis.com \
  cloudbuild.googleapis.com \
  artifactregistry.googleapis.com \
  storage.googleapis.com
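
Each `%%bash` cell runs in a fresh shell process, so variables exported in one cell are gone by the time the next cell runs. One possible workaround, sketched here under assumptions, is to write the variables to a file and source that file at the top of every later cell. The file name `mlops-env.sh` and the helper `save_env` are hypothetical, and the sketch assumes the values contain no single quotes.

```shell
# Hypothetical helper: write the named variables to a file as `export` lines
# so that a later cell can source them. Assumes values contain no single quotes.
ENV_FILE="${ENV_FILE:-./mlops-env.sh}"

save_env() {
  : > "$ENV_FILE"                      # truncate or create the file
  for name in "$@"; do
    eval "value=\${$name}"             # indirect variable lookup, POSIX-compatible
    printf "export %s='%s'\n" "$name" "$value" >> "$ENV_FILE"
  done
}

# Illustrative values matching the cell above (the suffix is fixed here, not random).
CODELAB_PREFIX=mlops-airflow
REGION=asia-southeast1
BUCKET_DATA_NAME=${CODELAB_PREFIX}-data-ab12
save_env CODELAB_PREFIX REGION BUCKET_DATA_NAME
```

A later cell would then begin with `. ./mlops-env.sh` to restore the same values before calling `gcloud`.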

In [None]:
%%bash

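# Each %%bash cell runs in its own shell, so variables exported in an earlier cell are not visible here.
# Fail fast if they have not been re-exported or sourced in this cell.
: "${DEVSHELL_PROJECT_ID:?not set in this cell}" "${CLUSTER_NAME:?not set in this cell}" "${CLUSTER_SA:?not set in this cell}" "${REGION:?not set in this cell}"

# Create a VPC for the GKE cluster
gcloud compute networks create mlops --subnet-mode=auto

# Create IAM and the needed infrastructure (GKE, buckets, Artifact Registry)
# Create an IAM service account for the cluster nodes
gcloud iam service-accounts create ${CLUSTER_SA} --display-name="SA for ${CLUSTER_NAME}"
gcloud projects add-iam-policy-binding ${DEVSHELL_PROJECT_ID} \
  --member "serviceAccount:${CLUSTER_SA}@${DEVSHELL_PROJECT_ID}.iam.gserviceaccount.com" \
  --role roles/container.defaultNodeServiceAccount

# Create a GKE cluster with Workload Identity and the Cloud Storage FUSE CSI driver
gcloud container clusters create ${CLUSTER_NAME} \
  --zone ${REGION}-a \
  --num-nodes=2 \
  --network=mlops \
  --create-subnetwork name=mlops-subnet \
  --enable-ip-alias \
  --addons GcsFuseCsiDriver \
  --workload-pool=${DEVSHELL_PROJECT_ID}.svc.id.goog \
  --no-enable-insecure-kubelet-readonly-port \
  --service-account=${CLUSTER_SA}@${DEVSHELL_PROJECT_ID}.iam.gserviceaccount.com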

# Create a node pool with one node and one L4 GPU for model fine-tuning
gcloud container node-pools create training \
  --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest \
  --project=${PROJECT_ID} \
  --location=${REGION}-a \
  --node-locations=${REGION}-a \
  --cluster=${CLUSTER_NAME} \
  --machine-type=g2-standard-12 \
  --num-nodes=1

# Create a node pool with one node and two L4 GPUs for inference
gcloud container node-pools create inference \
  --accelerator type=nvidia-l4,count=2,gpu-driver-version=latest \
  --project=${PROJECT_ID} \
  --location=${REGION}-a \
  --node-locations=${REGION}-a \
  --cluster=${CLUSTER_NAME} \
  --machine-type=g2-standard-24 \
  --num-nodes=1

# Download K8s credentials
gcloud container clusters get-credentials ${CLUSTER_NAME} --location ${REGION}-a

# Create an Artifact Registry Docker repository and let the cluster service account pull from it
gcloud artifacts repositories create ${REPO_NAME} \
  --repository-format=docker \
  --location=${REGION}
gcloud artifacts repositories add-iam-policy-binding ${REPO_NAME} \
  --member=serviceAccount:${CLUSTER_SA}@${DEVSHELL_PROJECT_ID}.iam.gserviceaccount.com \
  --role=roles/artifactregistry.reader \
  --location=${REGION}
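
Images pushed to the Docker repository created above are addressed as `LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG`. As a small sketch, the reference can be composed from the same variables. The helper name `image_ref`, the project ID `my-project`, and the image name `finetuning` are hypothetical placeholders, not values from the codelab.

```shell
# Hypothetical helper: build the full Artifact Registry reference for an image.
# Format: LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG
image_ref() {
  region="$1"; project="$2"; repo="$3"; image="$4"; tag="${5:-latest}"
  echo "${region}-docker.pkg.dev/${project}/${repo}/${image}:${tag}"
}

# Illustrative values; the project ID and image name are placeholders.
image_ref asia-southeast1 my-project mlops-airflow-repo finetuning v1
# prints asia-southeast1-docker.pkg.dev/my-project/mlops-airflow-repo/finetuning:v1
```

In the notebook the call would more likely take the form `image_ref "${REGION}" "${PROJECT_ID}" "${REPO_NAME}" <image> <tag>`, which keeps the image path in step with the repository name and region chosen earlier.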