# Train Progressive GAN on Cloud Machine Learning Engine

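This notebook submits the Progressive GAN training jobs to Cloud ML Engine (CMLE), now called AI Platform Training, through the `gcloud ai-platform` command. The first cell sets the project, bucket, and region as environment variables so that the `%%bash` cells can read them.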

In [1]:
import os

os.environ['PROJECT'] = 'machine-learning-models-283320'
os.environ['BUCKET'] = 'celeba-progressive-gan'
os.environ['REGION'] = 'us-east1'

## Train 8x8 model

First, we train the model up to 8x8 resolution on a single GPU (the `BASIC_GPU` scale tier) with a batch size of 128.

In [6]:
%%bash

JOBID=progan_8_$(date -u +%y%m%d_%H%M%S)

gcloud ai-platform jobs submit training ${JOBID} \
    --region=${REGION} \
    --module-name=trainer.task \
    --package-path=$(pwd)/progan/trainer \
    --job-dir=gs://${BUCKET}/ \
    --scale-tier=BASIC_GPU \
    --runtime-version=2.3 \
    --python-version=3.7 \
    -- \
    --data_bucket_name=${BUCKET} \
    --checkpoint_path=gs://${BUCKET}/models/ \
    --resolution=8 \
    --batch_size=128

jobId: progan__210530_173510
state: QUEUED


Job [progan__210530_173510] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe progan__210530_173510

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs progan__210530_173510
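
The status command printed above can also be wrapped in Python so that the notebook can check on a job without leaving the kernel. The following is a minimal sketch, not part of the original trainer: `describe_command` and `job_state` are hypothetical helper names, and it assumes the `gcloud` CLI is installed and authenticated. It uses `--format=value(state)`, gcloud's general output-format flag, to print only the job's state.

```python
import subprocess


def describe_command(job_id):
    """Build the gcloud command that prints only the state of a training job."""
    return [
        "gcloud", "ai-platform", "jobs", "describe", job_id,
        "--format=value(state)",
    ]


def job_state(job_id):
    """Run the command and return the state string (requires the gcloud CLI)."""
    result = subprocess.run(
        describe_command(job_id), check=True, capture_output=True, text=True
    )
    return result.stdout.strip()
```

Calling `job_state('progan__210530_173510')` right after submission would be expected to return `QUEUED`, matching the output above.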


## Train 16x16 model

Next, we train the 16x16 model, starting from the weights of the 8x8 model. This step still uses a batch size of 128 and a single GPU. Note that the command below does not pass `--batch_size`, so it relies on the trainer's default value.

In [8]:
%%bash

JOBID=progan_16_$(date -u +%y%m%d_%H%M%S)

gcloud ai-platform jobs submit training ${JOBID} \
    --region=${REGION} \
    --module-name=trainer.task \
    --package-path=$(pwd)/progan/trainer \
    --job-dir=gs://${BUCKET}/ \
    --scale-tier=BASIC_GPU \
    --runtime-version=2.3 \
    --python-version=3.7 \
    -- \
    --data_bucket_name=${BUCKET} \
    --checkpoint_path=gs://${BUCKET}/models/ \
    --resolution=16 \
    --start_from_resolution=8 \
    --previous_weights_path=gs://${BUCKET}/models/8x8/20210529165544

jobId: progan_16_210530_211820
state: QUEUED


Job [progan_16_210530_211820] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe progan_16_210530_211820

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs progan_16_210530_211820
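
Each stage starts from the weights of the stage before it, so the `--resolution` and `--start_from_resolution` values follow a simple doubling pattern. Below is a small sketch of that schedule. The helper name `stage_flags` is hypothetical, and the sketch assumes that resolutions double from 8 and that the first stage takes no `--start_from_resolution` flag, as in the 8x8 job above.

```python
def stage_flags(final_resolution, base_resolution=8):
    """Return one list of trainer flags per progressive stage.

    Assumes each stage doubles the resolution and resumes from the
    previous one; the first stage has nothing to resume from.
    """
    stages = []
    resolution = base_resolution
    previous = None
    while resolution <= final_resolution:
        flags = [f"--resolution={resolution}"]
        if previous is not None:
            flags.append(f"--start_from_resolution={previous}")
        stages.append(flags)
        previous = resolution
        resolution *= 2
    return stages


# Print the trainer flags for every stage up to 32x32.
for flags in stage_flags(32):
    print(" ".join(flags))
```

The `--previous_weights_path` flag is left out on purpose, because its timestamped directory depends on when the earlier stage actually ran.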


## Train 32x32 model (TODO)

To train the 32x32 model, we use the `CUSTOM` scale tier with a master machine type of `complex_model_l_gpu`, which has eight NVIDIA Tesla K80 GPUs. For more information about machine types for custom scale tiers, see the [documentation](https://cloud.google.com/ai-platform/training/docs/using-gpus).

In [8]:
%%bash

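JOBID=progan_32_$(date -u +%y%m%d_%H%M%S)

# RESOLUTION and START_RESOLUTION are not set in the cells above. The values
# below are an assumption: the 32x32 stage resumes from the 16x16 stage.
RESOLUTION=32
START_RESOLUTION=16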

gcloud ai-platform jobs submit training ${JOBID} \
    --region=${REGION} \
    --module-name=trainer.task \
    --package-path=$(pwd)/progan/trainer \
    --job-dir=gs://${BUCKET}/ \
    --scale-tier=CUSTOM \
    --master-machine-type=complex_model_l_gpu \
    --runtime-version=2.3 \
    --python-version=3.7 \
    -- \
    --data_bucket_name=${BUCKET} \
    --checkpoint_path=gs://${BUCKET}/models/ \
    --resolution=${RESOLUTION} \
    --start_from_resolution=${START_RESOLUTION}

jobId: progan_210425_210109
state: QUEUED


Job [progan_210425_210109] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe progan_210425_210109

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs progan_210425_210109
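
The jobs in this notebook use two machine configurations: `BASIC_GPU` for 8x8 and 16x16, and a `CUSTOM` tier for 32x32. The sketch below collects that choice in one place. `machine_flags` is a hypothetical helper, and the threshold of 32 only mirrors the three jobs shown here, not a measured requirement.

```python
def machine_flags(resolution):
    """Return the gcloud machine flags used in this notebook for a resolution.

    Assumption: resolutions below 32 run on BASIC_GPU (a single GPU), and
    32 and above use a CUSTOM tier with a complex_model_l_gpu master.
    """
    if resolution < 32:
        return ["--scale-tier=BASIC_GPU"]
    return [
        "--scale-tier=CUSTOM",
        "--master-machine-type=complex_model_l_gpu",
    ]


print(machine_flags(16))
print(machine_flags(32))
```

Keeping the choice in a single function makes it easier to change the threshold later if a larger batch size or a different machine type turns out to be needed.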
