### 🏆 Vertex AI Training
📌 Description

This notebook outlines the steps to run a custom training job on Google Cloud Vertex AI Training.

### ✅ Step 1: Authenticate & Set Up Your Google Cloud Environment


In [None]:
# 1. Log in to your Google Cloud account
!gcloud auth login

In [None]:
# Create a project if you haven't already done so
PROJECT_ID = "hackai-1337-2025-test"
PROJECT_NAME = "My Test Project"

!gcloud projects create "$PROJECT_ID" --name="$PROJECT_NAME"

# Set it as your active project
!gcloud config set project "$PROJECT_ID"
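
Project creation fails if the ID does not follow Google Cloud's format rules, so it can be worth checking the ID first. The sketch below encodes the rules as I understand them (6 to 30 characters; lowercase letters, digits, and hyphens; starts with a letter; does not end with a hyphen). The helper name `is_valid_project_id` is hypothetical, and the check does not tell you whether the ID is already taken.

```python
import re

# Format rules: 6-30 characters, lowercase letters, digits, and hyphens;
# must start with a letter and must not end with a hyphen.
_PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id: str) -> bool:
    """Return True if project_id matches the format rules above (hypothetical helper)."""
    return bool(_PROJECT_ID_RE.fullmatch(project_id))


print(is_valid_project_id("hackai-1337-2025-test"))
print(is_valid_project_id("My_Project"))
```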

In [None]:
# Link a billing account
# First, list your billing accounts to find your BILLING_ACCOUNT_ID.

# Example output:
# ACCOUNT_ID            NAME                OPEN  MASTER_ACCOUNT_ID
# 01A2B3-XXXXXX-YYYYYY  My Billing Account  True
# Your BILLING_ACCOUNT_ID is the value in the ACCOUNT_ID column (here, 01A2B3-XXXXXX-YYYYYY)

!gcloud billing accounts list

In [None]:
BILLING_ACCOUNT_ID = "BILLING_ACCOUNT_ID_HERE"

!gcloud billing projects link "$PROJECT_ID" --billing-account="$BILLING_ACCOUNT_ID"

### ✅ Step 2: Enable required APIs


In [None]:
!gcloud services enable compute.googleapis.com # For Compute Engine API
!gcloud services enable artifactregistry.googleapis.com # For Artifact Registry API
!gcloud services enable aiplatform.googleapis.com # For Vertex AI API
!gcloud services enable cloudbuild.googleapis.com # For Cloud Build API
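
`gcloud services enable` accepts several service names in one invocation, so the four calls above can be combined. The following is a minimal sketch that only assembles the command and prints it, without running it. The helper name `enable_services_command` and the `REQUIRED_SERVICES` list are illustrative names, not part of the original notebook.

```python
import shlex

# The four APIs enabled in this step.
REQUIRED_SERVICES = [
    "compute.googleapis.com",           # Compute Engine API
    "artifactregistry.googleapis.com",  # Artifact Registry API
    "aiplatform.googleapis.com",        # Vertex AI API
    "cloudbuild.googleapis.com",        # Cloud Build API
]


def enable_services_command(services, project_id=None):
    """Build a single `gcloud services enable` argument list (hypothetical helper)."""
    cmd = ["gcloud", "services", "enable", *services]
    if project_id:
        cmd.append(f"--project={project_id}")
    return cmd


# Print the shell command instead of executing it.
print(shlex.join(enable_services_command(REQUIRED_SERVICES)))
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)`, assuming `gcloud` is on the PATH.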

In [None]:
USER_EMAIL = "YOUR_GCP_EMAIL"

!gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="user:$USER_EMAIL" \
    --role="roles/cloudbuild.builds.editor"

### ✅ Step 3: Convert Your Training Notebook to Python Script
This step assumes you already have a notebook containing your model's training code. Upload that notebook, then run the following command to generate a .py file from it, replacing `notebook_name` with your notebook's file name.

In [None]:
!jupyter nbconvert notebook_name.ipynb --to python

In [None]:
# Create a trainer directory (if it doesn't exist) and move the .py code into it
!mkdir -p trainer
!mv notebook_name.py trainer/task.py
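
One thing to watch for after conversion: `nbconvert` rewrites shell commands (`!cmd`) and magics (`%magic`) as `get_ipython()` calls, which fail when the script runs under plain `python` inside the container. The sketch below scans source text for such lines so they can be removed or replaced by hand. The function name `find_ipython_calls` is hypothetical, and the sample string stands in for the real script.

```python
def find_ipython_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs for uncommented lines that call get_ipython()."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if "get_ipython()" in stripped and not stripped.startswith("#"):
            hits.append((lineno, stripped))
    return hits


# Illustrative sample of what a converted notebook might contain.
sample = (
    "import torch\n"
    "get_ipython().system('pip install transformers')\n"
    "print('training')\n"
)

for lineno, line in find_ipython_calls(sample):
    print(f"line {lineno}: {line}")
```

To check the real script, read it first, for example `find_ipython_calls(open("trainer/task.py").read())`.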

In [None]:
# Create your requirements.txt with the libraries needed for fine-tuning

packages = """
transformers
datasets
"""

!echo "$packages" > requirements.txt
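
As an alternative to `echo`, the file can be written from Python, which avoids shell quoting issues and the leading blank line. This is a minimal sketch; `write_requirements` is a hypothetical helper, and the demo writes to a temporary directory so it does not overwrite anything in the working directory.

```python
import tempfile
from pathlib import Path


def write_requirements(packages, path="requirements.txt"):
    """Write one package per line, skipping blank entries, and return the lines written."""
    lines = [p.strip() for p in packages if p.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


# Demo in a temporary directory so the real requirements.txt is left untouched.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "requirements.txt"
    write_requirements(["transformers", "datasets", ""], target)
    print(target.read_text(encoding="utf-8"))
```

In the notebook itself, `write_requirements(["transformers", "datasets"])` would write `requirements.txt` in the current directory.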

### ✅ Step 4: Create a Dockerfile

In [None]:
dockerfile_content = """
FROM us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.2-4.py310:latest

WORKDIR /

COPY trainer /trainer
COPY requirements.txt .

RUN pip install --upgrade pip && pip install -r requirements.txt

ENTRYPOINT ["python", "-m", "trainer.task"]
"""

with open("Dockerfile", "w") as f:
    f.write(dockerfile_content)

### ✅ Step 5: Build and Push Container

In [None]:
REPO_NAME = "hackai-docker-repo"

!gcloud artifacts repositories create "$REPO_NAME" --repository-format=docker \
--location=us-central1 --description="Docker repository"

!gcloud auth configure-docker us-central1-docker.pkg.dev

IMAGE_URI=f"us-central1-docker.pkg.dev/{PROJECT_ID}/{REPO_NAME}/my_image:latest"
IMAGE_URI
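
The region appears in three places in this step (repository location, Docker auth host, and image URI), and they need to agree. A small helper can keep the URI consistent with the chosen region. This is only a sketch: `build_image_uri` is a hypothetical name, and the format follows the `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG` pattern used above.

```python
def build_image_uri(region, project_id, repo_name, image_name, tag="latest"):
    """Assemble an Artifact Registry Docker image URI (hypothetical helper)."""
    return f"{region}-docker.pkg.dev/{project_id}/{repo_name}/{image_name}:{tag}"


# Values taken from the cells above.
print(build_image_uri("us-central1", "hackai-1337-2025-test", "hackai-docker-repo", "my_image"))
```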

In [None]:
!mkdir -p build_context/trainer
!cp Dockerfile requirements.txt build_context/
!cp -r trainer/task.py build_context/trainer/
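
Before submitting the build, it can help to confirm that the build context contains everything the Dockerfile copies. The sketch below lists any missing files; `missing_build_files` and `REQUIRED_FILES` are illustrative names, and the file list mirrors the `cp` commands above.

```python
from pathlib import Path

# Files the Dockerfile expects to find in the build context.
REQUIRED_FILES = ("Dockerfile", "requirements.txt", "trainer/task.py")


def missing_build_files(context_dir, required=REQUIRED_FILES):
    """Return the required files that are absent from context_dir (hypothetical helper)."""
    root = Path(context_dir)
    return [name for name in required if not (root / name).is_file()]


missing = missing_build_files("build_context")
if missing:
    print("Missing from build context:", ", ".join(missing))
else:
    print("Build context looks complete.")
```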

In [None]:
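# Each `!` command runs in its own subshell, so a separate `!cd` would not persist.
# Pass the build context directory to gcloud directly instead.
!gcloud builds submit build_context --tag="$IMAGE_URI" --project="$PROJECT_ID"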

### ✅ Step 6: Run the Training Job

In [None]:
# Create a staging bucket (mandatory for the job to run)
BUCKET_NAME="gs://hackai-training-bucket"
REGION = "us-central1"
!gcloud storage buckets create $BUCKET_NAME --location=$REGION

In [None]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

job = aiplatform.CustomContainerTrainingJob(
    display_name='my-training-job',
    container_uri=IMAGE_URI,
    staging_bucket=BUCKET_NAME
)

job.run(
    replica_count=1,
    machine_type='n1-standard-8',  # To be customized
    accelerator_type='NVIDIA_TESLA_V100',  # To be customized
    accelerator_count=1
)

#### Waiting for the job to finish
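
To my knowledge, `job.run()` as called above blocks until the job completes unless `sync=False` is passed. If the job is started asynchronously, a polling loop along the following lines could be used. This is a sketch under stated assumptions: `wait_for_terminal_state` is a hypothetical helper, the state names in `TERMINAL_STATES` are assumed to match the pipeline states reported by the SDK, and the demo uses a fake state source rather than a real job.

```python
import time

# Assumed names of the terminal pipeline states; adjust if the SDK reports others.
TERMINAL_STATES = {
    "PIPELINE_STATE_SUCCEEDED",
    "PIPELINE_STATE_FAILED",
    "PIPELINE_STATE_CANCELLED",
}


def wait_for_terminal_state(get_state, poll_seconds=30.0, timeout_seconds=3600.0,
                            sleep=time.sleep):
    """Poll get_state() until it returns a terminal state or the timeout elapses."""
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still in state {state!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Demo with a fake state sequence; no real job is contacted and no time is spent sleeping.
fake_states = iter([
    "PIPELINE_STATE_PENDING",
    "PIPELINE_STATE_RUNNING",
    "PIPELINE_STATE_SUCCEEDED",
])
print(wait_for_terminal_state(lambda: next(fake_states), poll_seconds=0, sleep=lambda s: None))
```

With a real job, the getter might look like `lambda: job.state.name`, assuming the job object exposes its state that way.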
