# Vertex AI Custom Training Job from a Local Machine

This tutorial walks you through creating a custom training job in AI Factory (AIF) from your local machine. Your training code is packaged as a Docker container image, which Vertex AI runs on Google Cloud when you submit the training job.

1. Authorize access to Google Cloud and Docker images

    Before you can create a custom training job in AIF, you need to authorize access to Google Cloud and configure Docker to authenticate to the image registries.


2. Build a Docker image locally and push it to AIF

    In this step, you build a Docker image on your machine. A Docker image is a lightweight, standalone, executable package that contains your training code together with all the dependencies and configuration needed to run it. You define these in a Dockerfile. Once the image is built, you push it to Artifact Registry in the AIF Google Cloud project, where Vertex AI can pull it when it runs your training job.


3. Edit the `example_config.json` file

    The `example_config.json` file contains the configuration for your custom training job: the hardware to run on (machine type and accelerators), the URI of the Docker image to run, the command-line arguments passed to your training script (for example, input data paths, model parameters, training hyperparameters, and output directories) and, for hyperparameter tuning, the search space. Edit this file so that it matches your training job.


4. Run the training job with `example_config.json`

    Once you have completed the previous steps, you submit the training job with a `gcloud` command that reads its settings from `example_config.json`. Vertex AI then starts the job on Google Cloud, and you can monitor its progress in the Google Cloud console.

## 1. Authorize access to Google Cloud and Docker images
1.1. Two-factor authentication to authorize access to the AIF project

1.2. Configure Docker to authenticate to Artifact Registry and Container Registry

### 1.1. Two-factor authentication to authorize access to the AIF project

```
!gcloud auth login
```

```
# Replace PROJECT_ID with the ID (not the display name) of the AIF project
!gcloud config set project PROJECT_ID
```

### 1.2. Configure Docker to authenticate to Artifact Registry and Container Registry

<font color='red'>Run the commands in the cell below in your terminal or command prompt, not in this notebook.</font>

```
# Run these commands in your terminal window.
# Artifact Registry (us-central1 region)
gcloud auth configure-docker us-central1-docker.pkg.dev
# Container Registry (US multi-region)
gcloud auth configure-docker us.gcr.io
```
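To confirm that these commands took effect, you can inspect Docker's client configuration. The sketch below assumes that `gcloud auth configure-docker` records each registry under `credHelpers` in `config.json` (in the directory named by `$DOCKER_CONFIG`, which defaults to `~/.docker`) with the value `gcloud`. The helper name `has_cred_helper` is hypothetical.

```shell
#!/bin/sh
# Sketch: check that Docker uses the gcloud credential helper for each registry.
# Assumption: `gcloud auth configure-docker REGISTRY` adds an entry such as
#   "us-central1-docker.pkg.dev": "gcloud"
# under "credHelpers" in Docker's config.json.

# has_cred_helper CONFIG_FILE REGISTRY  (hypothetical helper)
# Returns 0 if CONFIG_FILE maps REGISTRY to the gcloud helper, non-zero otherwise.
has_cred_helper() {
  config_file="$1"
  registry="$2"
  [ -f "$config_file" ] || return 1
  grep -q "\"$registry\": *\"gcloud\"" "$config_file"
}

# Docker reads its client config from $DOCKER_CONFIG, defaulting to ~/.docker.
docker_config_file="${DOCKER_CONFIG:-${HOME:-.}/.docker}/config.json"

for registry in us-central1-docker.pkg.dev us.gcr.io; do
  if has_cred_helper "$docker_config_file" "$registry"; then
    echo "OK: $registry uses the gcloud credential helper"
  else
    echo "MISSING: run 'gcloud auth configure-docker $registry'"
  fi
done
```

If a registry is reported as missing, rerun the corresponding command from the cell above in your terminal.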

## 2. Build a Docker image and push it to AIF

To build a Docker image and push it to the AIF project on Google Cloud, follow these steps:

2.0. Create a Dockerfile

Write a Dockerfile that specifies the dependencies and configuration your training code needs.

2.1. Build the Docker image locally

Use the Dockerfile to build the image on your machine by running the `docker build` command and giving the image a local name and tag.

2.2. Run the Docker container locally

Before pushing the image to AIF, run a container from it locally to test and debug your training code.

2.3. Tag the Docker image

Tag the image with its Artifact Registry path in the AIF project. The path follows the format `us-central1-docker.pkg.dev/[PROJECT-ID]/[REPOSITORY]/[REMOTE-IMAGE-NAME][:TAG]`.

2.4. Push the Docker image to the AIF Artifact Registry

Use the `docker push` command to upload the image to Artifact Registry. Make sure to use the tagged image name from step 2.3.
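As a starting point for step 2.0, the sketch below writes a minimal project layout. Everything in it is hypothetical: the `python:3.10-slim` base image, the `trainer/task.py` script, and its `--epochs` argument stand in for your own code and dependencies. It uses the exec form of `ENTRYPOINT`, so arguments you append after the image name in `docker run` (and, as far as we understand Vertex AI, the `args` listed in the job config) are passed on to the training script.

```shell
#!/bin/sh
# Sketch: create a minimal training project with a Dockerfile.
# File names, the base image and the --epochs flag are hypothetical.
set -e

mkdir -p trainer

# Placeholder training script: it only parses and prints its arguments.
cat > trainer/task.py <<'EOF'
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=1)
args = parser.parse_args()
print(f"Training for {args.epochs} epoch(s)")
EOF

# Exec-form ENTRYPOINT: arguments given after the image name in
# `docker run` are appended to this command.
cat > Dockerfile <<'EOF'
FROM python:3.10-slim
WORKDIR /app
COPY trainer/ /app/trainer/
ENTRYPOINT ["python", "trainer/task.py"]
EOF

echo "Wrote Dockerfile and trainer/task.py"
```

With this layout, `docker run <LOCAL_IMAGE_NAME[:TAG]> --epochs 3` would run `python trainer/task.py --epochs 3` inside the container.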

### 2.1. Build the Docker image

The following command builds a Docker image on your machine from the instructions in the Dockerfile. Run it from the directory that contains the Dockerfile; the trailing `.` sets that directory as the build context.

```
docker build -t <LOCAL_IMAGE_NAME[:TAG]> .
```

<font color='green'>Please note the following points regarding the usage of `LOCAL_IMAGE_NAME[:TAG]`</font>:

<ul>
    <li> You can choose any name and tag for <code>&lt;LOCAL_IMAGE_NAME[:TAG]&gt;</code>. It is only used locally, to help you identify and manage your Docker images.</li>
    <li> If you reuse an existing local image name and tag, that name and tag will point to the new image instead of the previous one. Be careful when reusing image names and tags so you do not unintentionally overwrite existing images.</li>
</ul>

<font color='red'> Note: If your local image is built from an image hosted in AIF (for example, a base image), you need to be connected to the VPN. Depending on the size of the image, this first local build can take a long time (for example, a 25 GB image can take over 2 hours), so plan accordingly.</font>


### 2.2. Run the Docker container locally

Below is the command to run the Docker container locally.

```
docker run <LOCAL_IMAGE_NAME[:TAG]> --user-arg1 user-arg1-value --user-arg2 user-arg2-value
```

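<font color='green'>Note: Arguments after the image name are passed to the container; with an ENTRYPOINT that runs your training script, they become the script's arguments. The --user-arg* flags are placeholders and are optional.</font>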

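#### <font color="red"> Note: A container you run locally has no Google Cloud credentials, so code that accesses Google Cloud resources (for example, Cloud Storage) cannot authenticate when run this way. When you submit a training job, Vertex AI runs the container with the job's service account, so the job can access those resources.</font>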


If you still want to authenticate and run the script locally, follow the `debug_tutorial.ipynb` notebook.



### 2.3. Tag the Docker image

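Use the `docker tag` command to give your local image an additional name, `TARGET_IMAGE[:TAG]`, that includes its Artifact Registry path. Docker uses this name to decide where to push the image, and the tag lets you identify and manage different versions of the image.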

```
docker tag <LOCAL_IMAGE_NAME[:TAG]> <TARGET_IMAGE[:TAG]>
```

<font color='green'>Please note the following points regarding the usage of `TARGET_IMAGE[:TAG]`</font>:

<ul>
    <li> It should have the format <font color="green">us-central1-docker.pkg.dev/[PROJECT-ID]/[REPOSITORY]/[REMOTE-IMAGE-NAME][:TAG]</font></li>
    <li> <code>[REMOTE-IMAGE-NAME][:TAG]</code> is the image name and tag you want the image to have in AIF. If you use an existing AIF image name and tag, the new image will replace the previous one with the same name and tag. Be careful when reusing image names and tags so you do not overwrite existing images unintentionally.</li>
    <li> To keep images on AIF organized, reuse the same <code>REMOTE-IMAGE-NAME</code> wherever possible instead of creating many images, but use a distinct TAG so you do not overwrite someone else's work. <font color="red">Take care in this step so you don't accidentally overwrite someone else's image</font>.</li>
</ul>
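One way to avoid typos in `TARGET_IMAGE[:TAG]` is to assemble it from its parts and check its shape before tagging. The values of `PROJECT_ID`, `REPOSITORY`, `REMOTE_IMAGE_NAME`, and `LANID` below are hypothetical placeholders, and putting your LAN ID in the tag is only one suggested convention for keeping tags unique.

```shell
#!/bin/sh
# Sketch: build TARGET_IMAGE[:TAG] from its parts and sanity-check the format.
# All values below are hypothetical placeholders; replace them with your own.
PROJECT_ID="my-project-id"
REPOSITORY="my-repository"
REMOTE_IMAGE_NAME="my-training-image"
LANID="abc123"
TAG="${LANID}-v1"   # a per-user tag helps avoid overwriting someone else's image

TARGET_IMAGE="us-central1-docker.pkg.dev/${PROJECT_ID}/${REPOSITORY}/${REMOTE_IMAGE_NAME}:${TAG}"

# Expect host/project/repository/image:tag, where the tag starts with a
# letter, digit or underscore and may also contain '.' and '-'.
if echo "$TARGET_IMAGE" | grep -Eq '^us-central1-docker\.pkg\.dev/[^/]+/[^/]+/[^/:]+:[A-Za-z0-9_][A-Za-z0-9_.-]*$'; then
  echo "TARGET_IMAGE=$TARGET_IMAGE"
else
  echo "Unexpected image reference: $TARGET_IMAGE" >&2
  exit 1
fi
```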

### 2.4. Push the Docker image to AIF

<font color="red">Note: Before pushing the Docker image to AIF, double-check that the combination of your REMOTE-IMAGE-NAME and TAG is unique to you.</font>

```
docker push us-central1-docker.pkg.dev/[PROJECT-ID]/[REPOSITORY]/[REMOTE-IMAGE-NAME][:TAG]
```
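Steps 2.3 and 2.4 can be combined into a small script. This sketch is an assumption rather than part of the original workflow: it only prints the `docker tag` and `docker push` commands by default and executes them when `DRY_RUN=0`, which gives you a chance to double-check the target name before pushing. The default image names are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: tag a local image (step 2.3) and push it to Artifact Registry (step 2.4).
# The LOCAL_IMAGE and TARGET_IMAGE defaults are hypothetical placeholders.
set -e

LOCAL_IMAGE="${LOCAL_IMAGE:-my-local-image:latest}"
TARGET_IMAGE="${TARGET_IMAGE:-us-central1-docker.pkg.dev/my-project-id/my-repository/my-training-image:abc123-v1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Add the registry path as a second name for the local image.
run docker tag "$LOCAL_IMAGE" "$TARGET_IMAGE"
# Upload the image; an existing name:tag in the registry is moved to this image.
run docker push "$TARGET_IMAGE"
```

Saved as a hypothetical `tag_and_push.sh`, it could be run as `DRY_RUN=0 LOCAL_IMAGE=<LOCAL_IMAGE_NAME[:TAG]> TARGET_IMAGE=<TARGET_IMAGE[:TAG]> sh tag_and_push.sh` once the printed commands look right.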

### You can view the published image in `Artifact Registry` on AI Factory

https://console.cloud.google.com/artifacts

## 3. Edit `example_config.json`

Modify `imageUri`, `args`, and (for hyperparameter tuning jobs) `studySpec` as needed.

Note:
  <ul>
  <li><font color="green"> The <code>args</code> parameter is a list of command-line arguments passed to your training script. </font></li>

  <li><font color="green">The <code>imageUri</code> refers to the Docker image (TARGET_IMAGE[:TAG]) that you tagged in step 2.3 and pushed to AIF in step 2.4. Make sure the value in <code>example_config.json</code> exactly matches that TARGET_IMAGE[:TAG].</font></li>
  </ul>
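For a single custom job (step 4.1), the file passed to `--config` describes the job's worker pools. The sketch below writes a minimal config, assuming the field names used by `gcloud ai custom-jobs create --config` (`workerPoolSpecs`, `machineSpec`, `replicaCount`, `containerSpec`, `imageUri`, `args`); the machine type, image URI, and argument values are hypothetical. For GPUs, `machineSpec` can also take `acceleratorType` and `acceleratorCount`. To avoid clobbering the tutorial's own file, the script does not overwrite an existing `example_config.json`; set `CONFIG_FILE` to another name to write a fresh copy for comparison.

```shell
#!/bin/sh
# Sketch: write a minimal custom-job config. All values are hypothetical.
set -e
CONFIG_FILE="${CONFIG_FILE:-example_config.json}"

if [ -e "$CONFIG_FILE" ]; then
  echo "$CONFIG_FILE already exists; not overwriting it"
else
  cat > "$CONFIG_FILE" <<'EOF'
{
  "workerPoolSpecs": [
    {
      "machineSpec": {"machineType": "n1-standard-8"},
      "replicaCount": 1,
      "containerSpec": {
        "imageUri": "us-central1-docker.pkg.dev/my-project-id/my-repository/my-training-image:abc123-v1",
        "args": ["--epochs", "3"]
      }
    }
  ]
}
EOF
  echo "Wrote $CONFIG_FILE"
fi
```

Whatever file you use, `imageUri` must be the exact TARGET_IMAGE[:TAG] you pushed, and every entry in `args` is a string.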


## 4. Run the training job with `example_config.json`

<font color="red">Note: In the commands below, --display-name can be anything you choose; it is used only for identification. We suggest including your LAN ID in the display name so that, when multiple jobs are running, you can easily tell yours apart from everyone else's.</font>
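Following this suggestion, you could build the display name from your LAN ID and a timestamp so that each run is easy to find. `LANID` below is a hypothetical placeholder, and the `<lanid>-training-<timestamp>` pattern is only one possible convention.

```shell
#!/bin/sh
# Sketch: build a unique, recognizable display name for the job.
LANID="abc123"   # hypothetical placeholder: use your own LAN ID
DISPLAY_NAME="${LANID}-training-$(date +%Y%m%d-%H%M%S)"
echo "DISPLAY_NAME=$DISPLAY_NAME"

# Print the step 4.1 submit command with this display name.
echo gcloud ai custom-jobs create \
  --region=us-central1 \
  --config=example_config.json \
  --display-name="$DISPLAY_NAME"
```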

### 4.1. Run a single job

```
!gcloud ai custom-jobs create \
  --region=us-central1 \
  --config=example_config.json \
  --display-name=training_job_name
```

### 4.2. Run a hyperparameter tuning job

```
!gcloud ai hp-tuning-jobs create \
  --region=us-central1 \
  --display-name=training_job_name \
  --config=./example_config.json \
  --max-trial-count=13 \
  --parallel-trial-count=3
```
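A hyperparameter tuning job may need a differently shaped config than a single custom job. As far as we understand the `gcloud ai hp-tuning-jobs create --config` format, the file holds a `studySpec` (the metric to optimize and the parameter ranges) and a `trialJobSpec` (the same `workerPoolSpecs` as a custom job). The sketch below writes such a file under the hypothetical name `example_hp_config.json`; the metric `accuracy`, the parameter `learning_rate`, and its range are placeholders. Vertex AI passes each trial's values to the container as command-line arguments (for example `--learning_rate=0.01`), and your training code reports the metric back, commonly with the `cloudml-hypertune` package.

```shell
#!/bin/sh
# Sketch: write a minimal hyperparameter tuning config. All values are hypothetical.
set -e
HP_CONFIG_FILE="${HP_CONFIG_FILE:-example_hp_config.json}"

if [ -e "$HP_CONFIG_FILE" ]; then
  echo "$HP_CONFIG_FILE already exists; not overwriting it"
else
  cat > "$HP_CONFIG_FILE" <<'EOF'
{
  "studySpec": {
    "metrics": [
      {"metricId": "accuracy", "goal": "MAXIMIZE"}
    ],
    "parameters": [
      {
        "parameterId": "learning_rate",
        "doubleValueSpec": {"minValue": 0.0001, "maxValue": 0.1},
        "scaleType": "UNIT_LOG_SCALE"
      }
    ]
  },
  "trialJobSpec": {
    "workerPoolSpecs": [
      {
        "machineSpec": {"machineType": "n1-standard-8"},
        "replicaCount": 1,
        "containerSpec": {
          "imageUri": "us-central1-docker.pkg.dev/my-project-id/my-repository/my-training-image:abc123-v1"
        }
      }
    ]
  }
}
EOF
  echo "Wrote $HP_CONFIG_FILE"
fi
```

You would then pass `--config=./example_hp_config.json` to the command above; the `--max-trial-count` and `--parallel-trial-count` flags stay on the command line.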

## Training job status

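##### Visit https://console.cloud.google.com/vertex-ai/training/hyperparameter-tuning-jobs to check the status of hyperparameter tuning jobs; single jobs are listed under the Custom jobs tab of the same Vertex AI Training page.

##### Also check the Google Cloud Storage output path to confirm that the model folder exists.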