# Pretraining Script

This script pretrains a transformer model on protein sequences.

Note: If using a TPU from Google Cloud (not the Colab TPU), run this notebook on a VM with access to all GCP APIs, and make sure TPUs are enabled for the GCP project.

Note: To train multiple models in parallel, run a copy of this notebook on each of several VMs.

# Downgrade TensorFlow (most likely requires a runtime restart if using the Colab runtime)

In [None]:
!pip install tensorflow==1.15

# Configure settings

In [None]:
#@markdown ## General Config
#@markdown If preferred, a GCP TPU/runtime can be used to run this notebook (instructions below)
GCP_RUNTIME = False #@param {type:"boolean"}
#@markdown How many TPU cores the TPU has: if using Colab, NUM_TPU_CORES is 8.
NUM_TPU_CORES = 8 #@param {type:"number"}
BUCKET_NAME = "theodore_jiang" #@param {type:"string"}
BUCKET_PATH = "gs://"+BUCKET_NAME
#@markdown ## IO Config
OUTPUT_MODEL_DIR = "bert_model_embedded_mutformer_12L" #@param {type:"string"}
#@markdown Folder in GCS where data was stored:
DATA_DIR = "pretraining_data_1024_embedded_mutformer" #@param {type:"string"}
LOGGING_DIR = "mutformer2_0_pretraining_logs" #@param {type:"string"}
RUN_NAME = "bert_model_embedded_mutformer_12L" #@param {type:"string"}


#### Vocabulary for the model (MutFormer uses the vocabulary below) ([PAD],
#### [UNK], [CLS], [SEP], and [MASK] are necessary default tokens; B and J
#### are markers for the beginning and ending of a protein sequence,
#### respectively; the rest are all possible amino acids, ranked
#### approximately by frequency of occurrence in the human population)
#### vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
vocab = "\n".join("[PAD] [UNK] [CLS] [SEP] [MASK] L S B J E A P T G V K R D Q I N F H Y C M W".split(" "))
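
As a rough illustration of how this vocabulary maps onto token IDs, the sketch below indexes the tokens in the order they are listed and encodes a short sequence wrapped in the `B`/`J` markers. The actual tokenization is handled by the repo's `tokenization` module; `token_to_id` and `encode` are hypothetical helpers, and adding the `B` and `J` markers at this stage is an assumption made for the example.

```python
# Same vocabulary string as in the config cell above.
vocab = "\n".join("[PAD] [UNK] [CLS] [SEP] [MASK] L S B J E A P T G V K R D Q I N F H Y C M W".split(" "))

tokens = vocab.split("\n")
# Hypothetical lookup table: a token's ID is its position in the list.
token_to_id = {token: index for index, token in enumerate(tokens)}


def encode(sequence):
    """Map an amino-acid string to IDs, wrapped in the B (begin) and J (end) markers.

    Illustrative only; residues missing from the vocabulary fall back to [UNK].
    """
    unk = token_to_id["[UNK]"]
    body = [token_to_id.get(residue, unk) for residue in sequence]
    return [token_to_id["B"]] + body + [token_to_id["J"]]


print(len(tokens))    # vocabulary size, later written to config.json as "vocab_size"
print(encode("LSX"))  # "X" is not in the vocabulary, so it maps to [UNK]
```

The 27 entries (5 special tokens, the 2 sequence markers, and 20 amino acids) are consistent with the `(27, 768)` word-embedding shape that appears in the training logs.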

# If running on a GCP TPU, use these commands prior to running this notebook

To ssh into the VM:

```
gcloud beta compute ssh --zone <COMPUTE ZONE> <VM NAME> --project <PROJECT NAME> -- -L 8888:localhost:8888
```

Make sure the port forwarded above matches the port passed to `jupyter notebook` below (8888 in this case).

```
sudo apt-get update
sudo apt-get -y install python3 python3-pip
sudo apt-get -y install pkg-config
sudo apt-get -y install libhdf5-serial-dev
sudo apt-get -y install libffi6 libffi-dev
sudo -H pip3 install jupyter tensorflow==1.14 google-api-python-client tqdm
sudo -H pip3 install jupyter_http_over_ws
jupyter serverextension enable --py jupyter_http_over_ws
jupyter notebook   --NotebookApp.allow_origin='https://colab.research.google.com'   --port=8888   --NotebookApp.port_retries=0   --no-browser

# Or, as one command:
sudo apt-get update ; sudo apt-get -y install python3 python3-pip ; sudo apt-get -y install pkg-config ; sudo apt-get -y install libhdf5-serial-dev ; sudo apt-get -y install libffi6 libffi-dev; sudo -H pip3 install jupyter tensorflow==1.14 google-api-python-client tqdm ; sudo -H pip3 install jupyter_http_over_ws ; jupyter serverextension enable --py jupyter_http_over_ws ; jupyter notebook   --NotebookApp.allow_origin='https://colab.research.google.com'   --port=8888   --NotebookApp.port_retries=0   --no-browser
```
Then copy the printed link containing "localhost: ..." and paste it into Colab's "Connect to a local runtime" option.


### Also run this code segment, which creates a TPU

In [None]:
GCE_PROJECT_NAME = "" #@param {type:"string"}
TPU_ZONE = "us-central1-f" #@param {type:"string"}
TPU_NAME = "mutformer-tpu" #@param {type:"string"}

!gcloud alpha compute tpus create $TPU_NAME --accelerator-type=tpu-v2 --version=1.15.5 --zone=$TPU_ZONE ##create new TPU

!gsutil iam ch serviceAccount:`gcloud alpha compute tpus describe $TPU_NAME | grep serviceAccount | cut -d' ' -f2`:admin $BUCKET_PATH && echo 'Successfully set permissions!' ##give TPU access to GCS

# Clone the repo

In [None]:
if GCP_RUNTIME:
  !sudo apt-get -y install git
#@markdown ###### Where to clone the repo into (the only value it cannot be is "mutformer"):
REPO_DESTINATION_PATH = "code/mutformer" #@param {type:"string"}
import os,shutil
if os.path.exists(REPO_DESTINATION_PATH):
  shutil.rmtree(REPO_DESTINATION_PATH) ##remove any stale copy so the clone starts clean
os.makedirs(REPO_DESTINATION_PATH)
cmd = "git clone https://github.com/WGLab/mutformer.git \"" + REPO_DESTINATION_PATH + "\""
!{cmd}

# Imports/Authenticate for GCP

In [None]:
if not GCP_RUNTIME:
  def authenticate_user(): ##authentication function that uses link authentication instead of popup
    if os.path.exists("/content/.config/application_default_credentials.json"): 
      return
    print("Authorize for runtime GCS:")
    !gcloud auth login --no-launch-browser
    print("Authorize for TPU GCS:")
    !gcloud auth application-default login  --no-launch-browser
  authenticate_user()

import sys
import json
import random
import logging
import tensorflow as tf
import time
import os
import shutil
import importlib

if REPO_DESTINATION_PATH == "mutformer":
  shutil.copytree(REPO_DESTINATION_PATH,"mutformer_code")
  REPO_DESTINATION_PATH = "mutformer_code"
if os.path.exists("mutformer"):
  shutil.rmtree("mutformer") ##replace any previous copy of the model code
shutil.copytree(REPO_DESTINATION_PATH+"/mutformer_model_code","mutformer")
if "mutformer" in sys.path:
  sys.path.remove("mutformer")
sys.path.append("mutformer")

from mutformer import modeling, optimization, tokenization, run_pretraining

##reload modules so that you don't need to restart the runtime to reload modules in case that's needed
modules2reload = [modeling, 
                  optimization, 
                  tokenization,
                  run_pretraining]
for module in modules2reload:
    importlib.reload(module)

from modeling import *

##configure logging
log = logging.getLogger('tensorflow')
log.setLevel(logging.INFO)

log.handlers = []

formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

#@markdown Whether or not to write logs to a file
DO_FILE_LOGGING = True #@param {type:"boolean"}
if DO_FILE_LOGGING:
  #@markdown If using file logging, what path to write logs to
  FILE_LOGGING_PATH = 'file_logging/spam.log' #@param {type:"string"}
  log_dir = os.path.dirname(FILE_LOGGING_PATH)
  if log_dir: ##a bare file name has no directory to create
    os.makedirs(log_dir, exist_ok=True)
  fh = logging.FileHandler(FILE_LOGGING_PATH)
  fh.setLevel(logging.INFO)
  fh.setFormatter(formatter)
  log.addHandler(fh)

ch = logging.StreamHandler()
ch.setLevel(logging.INFO)
ch.setFormatter(formatter)
log.addHandler(ch)

if GCP_RUNTIME:
  tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver(TPU_NAME, zone=TPU_ZONE, project=GCE_PROJECT_NAME)
  TPU_ADDRESS = tpu_cluster_resolver.get_master()
  with tf.Session(TPU_ADDRESS) as session:
      log.info('TPU address is ' + TPU_ADDRESS)
      # Upload credentials to TPU.
      tf.contrib.cloud.configure_gcs(session)
else:
  if 'COLAB_TPU_ADDR' in os.environ:
    log.info("Using TPU runtime")
    TPU_ADDRESS = 'grpc://' + os.environ['COLAB_TPU_ADDR']

    with tf.Session(TPU_ADDRESS) as session:
      log.info('TPU address is ' + TPU_ADDRESS)
      # Upload credentials to TPU.
      with tf.gfile.Open("/content/.config/application_default_credentials.json", 'r') as f:
        auth_info = json.load(f)
      tf.contrib.cloud.configure_gcs(session, credentials=auth_info)
      
  else:
    raise Exception('Not connected to TPU runtime, TPU required to run mutformer')
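
The logging setup in this cell can also be wrapped in a small reusable helper. The following is a minimal sketch using only the standard `logging` module; `build_logger` is a hypothetical name that does not exist in the repo, and the notebook itself configures the `'tensorflow'` logger inline instead.

```python
import logging
import os


def build_logger(name, file_path=None, level=logging.INFO):
    """Return a logger that writes to the console and, optionally, to a file."""
    logger = logging.getLogger(name)
    logger.setLevel(level)

    # Remove handlers from earlier calls so re-running does not duplicate every message.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

    if file_path:
        log_dir = os.path.dirname(file_path)
        if log_dir:  # a bare file name has no directory to create
            os.makedirs(log_dir, exist_ok=True)
        file_handler = logging.FileHandler(file_path)
        file_handler.setLevel(level)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    stream_handler = logging.StreamHandler()
    stream_handler.setLevel(level)
    stream_handler.setFormatter(formatter)
    logger.addHandler(stream_handler)
    return logger


demo_log = build_logger("demo")  # console only
demo_log.info("logger ready")
```

Clearing the existing handlers first mirrors the `log.handlers = []` line in the cell: without it, each re-run of the cell would attach another pair of handlers and every message would be printed several times.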


# Auto-detect the number of sequences per epoch

In [None]:
#@markdown If not GCP_RUNTIME and data was stored in drive, folder where the original data was stored (for detecting the # of steps per epoch) (this variable should match up with the "INPUT_DATA_FOLDER" variable in the data generation script) (this is used to limit interaction with GCS; it can also be left blank and steps will be automatically detected from tfrecords stored in GCS).
#@markdown 
#@markdown Note: if data was originally stored in GCS or GCP_RUNTIME is true, leave this item blank and steps per epoch will be autodetected from tfrecords:
ORIG_DATA_FOLDER = "" #@param {type: "string"}

if not GCP_RUNTIME and "/content/drive" in ORIG_DATA_FOLDER:
  from google.colab import drive
  !fusermount -u /content/drive
  drive.flush_and_unmount()
  drive.mount('/content/drive', force_remount=True)
  DRIVE_PATH = "/content/drive/My Drive"

  data_path_train = ORIG_DATA_FOLDER+"/train.txt" 

  lines = tf.gfile.Open(data_path_train).read().split("\n")
  SEQUENCES_PER_EPOCH = len(lines)

  print("sequences per epoch:",SEQUENCES_PER_EPOCH)
else:
  from tqdm import tqdm
  def steps_getter(input_files):
    tot_sequences = 0
    for input_file in input_files:
      print("reading:",input_file)

      d = tf.data.TFRecordDataset(input_file)

      with tf.Session() as sess:
        tot_sequences+=sess.run(d.reduce(0, lambda x,_: x+1))

    return tot_sequences

  BUCKET_PATH = "gs://{}".format(BUCKET_NAME)
  got_data = False
  while not got_data: ##try each data bin in turn; an exception is raised below if none is readable
    try:
      for f in tf.io.gfile.listdir(BUCKET_PATH+"/"+DATA_DIR+"/train"): ##try to access any of the data bins
          print("trying to access training data from saved copy number "+str(f))
          DATA_GCS_DIR = BUCKET_PATH+"/"+DATA_DIR+"/train/"+str(f)
          train_input_files = tf.gfile.Glob(os.path.join(DATA_GCS_DIR,'*tfrecord'))
          print("Using:",train_input_files)
          if len(train_input_files)>0:
            got_data = True
            try:
              SEQUENCES_PER_EPOCH = steps_getter(train_input_files)
              print("sequences per epoch:",SEQUENCES_PER_EPOCH)
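              if not SEQUENCES_PER_EPOCH:
                for file in train_input_files:
                  tf.io.gfile.remove(file)
                raise ValueError("no sequences found in "+DATA_GCS_DIR) ##empty shards were deleted; caught below so the next bin is tried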
              break
            except:
              got_data=False
    except:
      pass
    if got_data:
      break
    raise Exception("Could not find data, waiting for data generation...")



trying to access training data from saved copy number 1/
Using: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/1/shard_0.tfrecord']
reading: gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/1/shard_0.tfrecord
sequences per epoch: 150531
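
With the count above, the training loop further down derives the number of steps per epoch and recovers the current epoch from the name of the latest checkpoint. The arithmetic can be sketched in plain Python as follows; the three function names are hypothetical, and the checkpoint path is copied from the training logs later in this notebook.

```python
def steps_per_epoch(sequences_per_epoch, train_batch_size):
    """Number of full batches in one pass over the data (the remainder is dropped)."""
    return int(sequences_per_epoch / train_batch_size)


def checkpoint_step(checkpoint_path):
    """Parse the global step from a name such as '.../model.ckpt-36928'."""
    return int(checkpoint_path.split("-")[-1])


def current_epoch(checkpoint_path, steps):
    """Epoch index implied by the checkpoint; 0 when no checkpoint exists yet."""
    if checkpoint_path is None:
        return 0
    return checkpoint_step(checkpoint_path) // steps


steps = steps_per_epoch(150531, 64)
ckpt = "gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-36928"
print(steps, checkpoint_step(ckpt), current_epoch(ckpt, steps))  # 2352 36928 15
```

These values agree with the `Steps per epoch: 2352` and `EPOCH:15` lines printed by the training cell.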


# Run Training

Run the pretraining loop. This should run in parallel with the dynamic masking data generation loop, because the training loop deletes each data bin after training on it.
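
The training cell below derives a few quantities from its form fields: the planned number of steps, the per-step learning rate decay, and the batch size actually passed to the estimator when gradient accumulation is used. The sketch below reproduces that arithmetic with the default values. `derive_schedule` and `linear_lr` are hypothetical names, and how `run_pretraining.model_fn_builder` applies `decay_per_step` internally is not shown in this notebook, so the linear schedule here (with warmup ignored) is an assumption.

```python
def derive_schedule(init_lr, end_lr, planned_sequences, batch_size,
                    planned_steps=-1, grad_accum_multiplier=1):
    """Mirror the derived settings computed in the training cell."""
    if batch_size % grad_accum_multiplier != 0:
        raise ValueError("batch size must be divisible by the gradient accumulation multiplier")
    # planned_steps overrides planned_sequences unless it is left at -1.
    total_steps = planned_sequences / batch_size if planned_steps == -1 else planned_steps
    decay_per_step = (end_lr - init_lr) / total_steps
    # Batch size handed to the estimator for each forward/backward pass.
    per_pass_batch = batch_size // grad_accum_multiplier
    return total_steps, decay_per_step, per_pass_batch


def linear_lr(step, init_lr, decay_per_step):
    """Assumed schedule: start at init_lr and add decay_per_step (negative) each step."""
    return init_lr + decay_per_step * step


total_steps, decay, per_pass_batch = derive_schedule(
    2e-5, 1e-9, 1e9, 64, grad_accum_multiplier=2)
print(total_steps, per_pass_batch)
print(linear_lr(0, 2e-5, decay), linear_lr(total_steps, 2e-5, decay))
```

Under this assumption the learning rate starts at `INIT_LEARNING_RATE` and reaches `END_LEARNING_RATE` exactly at `PLANNED_TOTAL_STEPS`.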

In [None]:
#@markdown ## Model Config:
#@markdown Model architecture to use (BertModel indicates the original BERT, BertModelModified indicates MutFormer's architecture without integrated convs, MutFormer_embedded_convs indicates MutFormer with integrated convolutions).
MODEL_ARCHITECTURE = MutFormer_embedded_convs #@param
#@markdown Maximum sequence length the model should be able to handle (the internal attention mechanisms and embeddings will be created to only account for sequences up to this length) (larger maximum sequence length will take more memory and time to train):
model_max_seq_length = 1024 #@param
#@markdown Other miscellaneous config entries:
hidden_size =   768 #@param {type:"integer"}
num_hidden_layers =   12#@param {type:"integer"}
tf_variables_intializer_value_stdev = 0.02 #@param {type:"number"}
hidden_layers_dropout_probability = 0.1 #@param {type:"number"}
intermediate_size = 3072 #@param {type:"integer"}
self_attention_dropout_probability = 0.1 #@param {type:"number"}


bert_config = {
  "hidden_size": hidden_size,
  "hidden_act": "gelu", 
  "initializer_range": tf_variables_intializer_value_stdev, 
  "hidden_dropout_prob": hidden_layers_dropout_probability, 
  "num_attention_heads": num_hidden_layers, ##note: the head count is tied to the layer count here (12 for both by default)
  "type_vocab_size": 2, 
  "max_position_embeddings": model_max_seq_length, 
  "num_hidden_layers": num_hidden_layers, 
  "intermediate_size": intermediate_size, 
  "attention_probs_dropout_prob": self_attention_dropout_probability
}

##upload config
bert_config["vocab_size"] = len(vocab.split("\n"))

if not os.path.exists(OUTPUT_MODEL_DIR):
  os.makedirs(OUTPUT_MODEL_DIR)
with tf.gfile.Open(OUTPUT_MODEL_DIR+"/config.json", "w") as fo:
  json.dump(bert_config, fo, indent=2)

!gsutil -m cp -r $OUTPUT_MODEL_DIR gs://$BUCKET_NAME


#@markdown \
#@markdown 
#@markdown 
#@markdown ## Training procedure config
#@markdown When checking for dynamically generated data, how long to wait between each check (to minimize interaction with GCS, should be around the same time it takes for the data generation script to generate 1 epoch worth of data)
CHECK_DATA_EVERY_N_SECS = 1200 #@param {type:"integer"}
INIT_LEARNING_RATE =  2e-5 #@param {type:"number"}
END_LEARNING_RATE = 1e-9 #@param {type:"number"}
#@markdown How many checkpoints to keep at a time (older checkpoints will be deleted)
KEEP_N_CHECKPOINTS_AT_A_TIME = 20 #@param {type:"integer"}
#@markdown The stopping condition for training can be set either as a number of sequences or as a number of steps. Of the two options below, PLANNED_TOTAL_STEPS overrides PLANNED_TOTAL_SEQUENCES_SEEN; therefore, if using PLANNED_TOTAL_SEQUENCES_SEEN, set PLANNED_TOTAL_STEPS to -1.
#@markdown 
#@markdown * Option 1: How many sequences the model should train on before stopping:
PLANNED_TOTAL_SEQUENCES_SEEN =  1e9 #@param {type:"number"}
#@markdown * Option 2: How many steps the model should train for before stopping (number of total sequences trained on will depend on the batch size used).
PLANNED_TOTAL_STEPS =  -1#@param {type:"number"}
TRAIN_BATCH_SIZE =   64#@param {type:"integer"}
#@markdown If using gradient accumulation (to save memory), what multiplier to use (memory usage and training speed will both be divided by this value) (Note: batch size must be divisible by this number):
GRADIENT_ACCUMULATION_MULTIPLIER = 2 #@param {type:"integer"}


#@markdown How many steps to wait between saves (note that if SAVE_CHECKPOINTS_STEPS is larger than the steps per epoch, the model will be saved every "steps per epoch" number of steps)
SAVE_CHECKPOINTS_STEPS = 1000 #@param {type:"integer"}
#@markdown When writing out training logs, how often to write them out:
SAVE_LOGS_EVERY_N_STEPS = 500 #@param {type:"integer"}

PLANNED_TOTAL_STEPS = PLANNED_TOTAL_SEQUENCES_SEEN/TRAIN_BATCH_SIZE if PLANNED_TOTAL_STEPS==-1 else PLANNED_TOTAL_STEPS
DECAY_PER_STEP = (END_LEARNING_RATE-INIT_LEARNING_RATE)/PLANNED_TOTAL_STEPS


BERT_GCS_DIR = BUCKET_PATH+"/"+OUTPUT_MODEL_DIR
GCS_LOGGING_DIR = BUCKET_PATH+"/"+LOGGING_DIR+"/"+RUN_NAME

CONFIG_FILE = BERT_GCS_DIR+"/config.json"

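while True: ##training loop
  INIT_CHECKPOINT = tf.train.latest_checkpoint(BERT_GCS_DIR)
  STEPS_PER_EPOCH = int(SEQUENCES_PER_EPOCH/TRAIN_BATCH_SIZE) ##defined before first use so the epoch can be computed on the first pass
  if INIT_CHECKPOINT is None: ##no checkpoint yet: start from epoch 0
    current_epoch = 0
  else:
    INIT_CHECKPOINT_STEP = int(INIT_CHECKPOINT.split("-")[-1])
    current_epoch = INIT_CHECKPOINT_STEP//STEPS_PER_EPOCH
    print("CURRENT STEP:",INIT_CHECKPOINT_STEP)
    if INIT_CHECKPOINT_STEP>=2000000:#PLANNED_TOTAL_STEPS: ##stop at this hardcoded limit; swap in PLANNED_TOTAL_STEPS to use the configured value
      break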
  try: ###wrap entire training loop into try and except loop so glitches don't kill training
    print("\n\n\n\n\nEPOCH:"+str(current_epoch)+"\n")
    STEPS_PER_EPOCH = int(SEQUENCES_PER_EPOCH/TRAIN_BATCH_SIZE)
    print("Steps per epoch:",STEPS_PER_EPOCH)
    print("\n\n\n\n\n")

    got_data = False
    while not got_data:
      try:
        for f in tf.io.gfile.listdir(BUCKET_PATH+"/"+DATA_DIR+"/train"): ##try to access any of the data bins
          print("trying to access training data from saved copy number "+str(f))
          DATA_GCS_DIR = BUCKET_PATH+"/"+DATA_DIR+"/train/"+str(f)
          train_input_files = tf.gfile.Glob(os.path.join(DATA_GCS_DIR,'*tfrecord'))
          print("train_input_files:",train_input_files)
          if len(train_input_files)>0:
            got_data = True
            break
      except:
          pass
      if not got_data:
        print("Could not find data, waiting for data generation...trying again in another "+str(CHECK_DATA_EVERY_N_SECS)+" seconds.")
        time.sleep(CHECK_DATA_EVERY_N_SECS)

    config = modeling.BertConfig.from_json_file(CONFIG_FILE)

    log.info(f"Using checkpoint: {INIT_CHECKPOINT}")
    log.info(f"Using {len(train_input_files)} data shards for training")
    model_fn = run_pretraining.model_fn_builder(
        bert_config=config,
        logging_dir=GCS_LOGGING_DIR,
        save_logs_every_n_steps=SAVE_LOGS_EVERY_N_STEPS,
        init_checkpoint=INIT_CHECKPOINT,
        init_learning_rate=INIT_LEARNING_RATE,
        decay_per_step=DECAY_PER_STEP,
        num_warmup_steps=10,
        use_tpu=True,
        use_one_hot_embeddings=True,
        bert=MODEL_ARCHITECTURE,
        grad_accum_mul=GRADIENT_ACCUMULATION_MULTIPLIER)

    tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver(TPU_ADDRESS)

    run_config = tf.contrib.tpu.RunConfig(
        cluster=tpu_cluster_resolver,
        model_dir=BERT_GCS_DIR,
        save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS,
        keep_checkpoint_max=KEEP_N_CHECKPOINTS_AT_A_TIME,
        tpu_config=tf.contrib.tpu.TPUConfig(
            iterations_per_loop=SAVE_CHECKPOINTS_STEPS,
            num_shards=NUM_TPU_CORES,
            per_host_input_for_training=tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2))

    estimator = tf.contrib.tpu.TPUEstimator(
        use_tpu=True,
        model_fn=model_fn,
        config=run_config,
        train_batch_size=TRAIN_BATCH_SIZE//GRADIENT_ACCUMULATION_MULTIPLIER,
        eval_batch_size=1)
      
    
    DATA_INFO = json.load(tf.gfile.Open(DATA_GCS_DIR+"info.json"))
    MAX_SEQ_LENGTH = DATA_INFO["sequence_length"]
    MAX_PREDICTIONS = DATA_INFO["max_num_predictions"]
    
    train_input_fn = run_pretraining.input_fn_builder(
            input_files=train_input_files,
            max_seq_length=MAX_SEQ_LENGTH,
            max_predictions_per_seq=MAX_PREDICTIONS,
            is_training=True)
  except Exception as e:
    log.info(f"Training load failed. error: {e}")
    continue
  try:
    estimator.train(input_fn=train_input_fn, steps=STEPS_PER_EPOCH)
    # For dynamic masking, a parallel data generation is used. This portion deletes the current dataset.
    cmd = "gsutil -m rm -r "+DATA_GCS_DIR
    !{cmd}
  except Exception as e:
    log.info(f"Training loop failed. error: {e}")




  


2022-06-03 06:17:57,821 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 06:17:58,237 - tensorflow - INFO - Done calling model_fn.
2022-06-03 06:17:58,240 - tensorflow - INFO - TPU job name worker
2022-06-03 06:18:00,589 - tensorflow - INFO - Graph was finalized.
2022-06-03 06:18:01,361 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-34576
2022-06-03 06:18:30,285 - tensorflow - INFO - Running local_init_op.
2022-06-03 06:18:31,374 - tensorflow - INFO - Done running local_init_op.
2022-06-03 06:18:46,491 - tensorflow - INFO - Saving checkpoints for 34576 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 06:19:22,082 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 06:19:22,084 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 06:19:22,094 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/0/info.json#1654236755213438...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/0/shard_0.tfrecord#1654236759959311...
/ [2/2 objects] 100% Done                                                       
Operation completed over 2 objects.                                              
CURRENT STEP: 36928





EPOCH:15

Steps per epoch: 2352






trying to access training data from saved copy number 1/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/1/shard_0.tfrecord']


2022-06-03 06:34:17,242 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-36928
2022-06-03 06:34:17,246 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 06:34:17,266 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 06:34:23,517 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 06:34:23,519 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 06:34:23,527 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 06:34:23,530 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 06:34:23,532 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 06:34:23,534 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 06:34:23,537 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 06:34:23,539 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 06:34:23,543 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 06:34:53,268 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 06:34:53,719 - tensorflow - INFO - Done calling model_fn.
2022-06-03 06:34:53,721 - tensorflow - INFO - TPU job name worker
2022-06-03 06:34:56,141 - tensorflow - INFO - Graph was finalized.
2022-06-03 06:34:56,854 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-36928
2022-06-03 06:35:27,166 - tensorflow - INFO - Running local_init_op.
2022-06-03 06:35:28,281 - tensorflow - INFO - Done running local_init_op.
2022-06-03 06:35:43,965 - tensorflow - INFO - Saving checkpoints for 36928 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 06:36:18,258 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 06:36:18,261 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 06:36:18,269 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/1/info.json#1654237806263731...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/1/shard_0.tfrecord#1654237811194119...
/ [2/2 objects] 100% Done                                                       
Operation completed over 2 objects.                                              
CURRENT STEP: 39280





EPOCH:16

Steps per epoch: 2352






trying to access training data from saved copy number 10/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/10/shard_0.tfrecord']


2022-06-03 06:51:26,306 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-39280
2022-06-03 06:51:26,309 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 06:51:26,331 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 06:51:32,141 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 06:51:32,143 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 06:51:32,153 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 06:51:32,155 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 06:51:32,161 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 06:51:32,163 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 06:51:32,167 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 06:51:32,170 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 06:51:32,173 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 06:51:58,829 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 06:52:02,851 - tensorflow - INFO - Done calling model_fn.
2022-06-03 06:52:02,853 - tensorflow - INFO - TPU job name worker
2022-06-03 06:52:05,200 - tensorflow - INFO - Graph was finalized.
2022-06-03 06:52:05,964 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-39280
2022-06-03 06:52:36,959 - tensorflow - INFO - Running local_init_op.
2022-06-03 06:52:38,101 - tensorflow - INFO - Done running local_init_op.
2022-06-03 06:52:54,048 - tensorflow - INFO - Saving checkpoints for 39280 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 06:53:29,420 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 06:53:29,424 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 06:53:29,435 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/10/shard_0.tfrecord#1654238877971363...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/10/info.json#1654238872837192...
/ [2/2 objects] 100% Done                                                       
Operation completed over 2 objects.                                              
CURRENT STEP: 41632





EPOCH:17

Steps per epoch: 2352






trying to access training data from saved copy number 11/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/11/shard_0.tfrecord']


2022-06-03 07:08:46,742 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-41632
2022-06-03 07:08:46,745 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 07:08:46,768 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 07:08:53,069 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 07:08:53,071 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 07:08:53,081 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 07:08:53,086 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 07:08:53,089 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 07:08:53,093 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 07:08:53,096 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 07:08:53,098 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 07:08:53,102 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 07:09:21,893 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 07:09:22,346 - tensorflow - INFO - Done calling model_fn.
2022-06-03 07:09:22,348 - tensorflow - INFO - TPU job name worker
2022-06-03 07:09:24,840 - tensorflow - INFO - Graph was finalized.
2022-06-03 07:09:25,596 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-41632
2022-06-03 07:09:56,220 - tensorflow - INFO - Running local_init_op.
2022-06-03 07:09:57,376 - tensorflow - INFO - Done running local_init_op.
2022-06-03 07:10:13,636 - tensorflow - INFO - Saving checkpoints for 41632 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 07:10:50,085 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 07:10:50,088 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 07:10:50,097 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/11/info.json#1654239916707251...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/11/shard_0.tfrecord#1654239921634539...
/ [2/2 objects] 100% Done
Operation completed over 2 objects.                                              
CURRENT STEP: 43984





EPOCH:18

Steps per epoch: 2352






trying to access training data from saved copy number 12/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/12/shard_0.tfrecord']


2022-06-03 07:25:36,658 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-43984
2022-06-03 07:25:36,660 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 07:25:36,671 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 07:25:43,338 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 07:25:43,341 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 07:25:43,344 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 07:25:43,347 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 07:25:43,350 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 07:25:43,351 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 07:25:43,352 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 07:25:43,354 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 07:25:43,355 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 07:26:16,018 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 07:26:16,473 - tensorflow - INFO - Done calling model_fn.
2022-06-03 07:26:16,478 - tensorflow - INFO - TPU job name worker
2022-06-03 07:26:18,988 - tensorflow - INFO - Graph was finalized.
2022-06-03 07:26:19,639 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-43984
2022-06-03 07:26:46,875 - tensorflow - INFO - Running local_init_op.
2022-06-03 07:26:48,054 - tensorflow - INFO - Done running local_init_op.
2022-06-03 07:27:04,499 - tensorflow - INFO - Saving checkpoints for 43984 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 07:27:47,008 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 07:27:47,012 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 07:27:47,022 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/12/shard_0.tfrecord#1654143466128642...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/12/info.json#1654143458945519...
/ [2/2 objects] 100% Done
Operation completed over 2 objects.                                              
CURRENT STEP: 46336





EPOCH:19

Steps per epoch: 2352






trying to access training data from saved copy number 13/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/13/shard_0.tfrecord']


2022-06-03 07:42:35,943 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-46336
2022-06-03 07:42:35,950 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 07:42:35,963 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 07:42:41,604 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 07:42:41,606 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 07:42:41,621 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 07:42:41,624 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 07:42:41,627 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 07:42:41,630 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 07:42:41,633 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 07:42:41,635 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 07:42:41,636 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 07:43:13,535 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 07:43:13,977 - tensorflow - INFO - Done calling model_fn.
2022-06-03 07:43:13,980 - tensorflow - INFO - TPU job name worker
2022-06-03 07:43:16,479 - tensorflow - INFO - Graph was finalized.
2022-06-03 07:43:16,931 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-46336
2022-06-03 07:43:43,524 - tensorflow - INFO - Running local_init_op.
2022-06-03 07:43:44,691 - tensorflow - INFO - Done running local_init_op.
2022-06-03 07:44:00,813 - tensorflow - INFO - Saving checkpoints for 46336 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 07:44:36,322 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 07:44:36,326 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 07:44:36,336 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/13/info.json#1654144381830302...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/13/shard_0.tfrecord#1654144389256934...
/ [2/2 objects] 100% Done
Operation completed over 2 objects.                                              
CURRENT STEP: 48688





EPOCH:20

Steps per epoch: 2352






trying to access training data from saved copy number 0/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/0/shard_0.tfrecord']


2022-06-03 07:59:28,309 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-48688
2022-06-03 07:59:28,311 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 07:59:28,324 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 07:59:33,923 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 07:59:33,925 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 07:59:33,935 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 07:59:33,938 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 07:59:33,941 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 07:59:33,943 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 07:59:33,945 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 07:59:33,947 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 07:59:33,949 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 08:00:03,219 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 08:00:03,691 - tensorflow - INFO - Done calling model_fn.
2022-06-03 08:00:03,694 - tensorflow - INFO - TPU job name worker
2022-06-03 08:00:06,241 - tensorflow - INFO - Graph was finalized.
2022-06-03 08:00:06,729 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-48688
2022-06-03 08:00:32,955 - tensorflow - INFO - Running local_init_op.
2022-06-03 08:00:34,121 - tensorflow - INFO - Done running local_init_op.
2022-06-03 08:00:49,875 - tensorflow - INFO - Saving checkpoints for 48688 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 08:01:24,801 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 08:01:24,803 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 08:01:24,812 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/0/shard_0.tfrecord#1654242173304762...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/0/info.json#1654242165104220...
/ [2/2 objects] 100% Done
Operation completed over 2 objects.                                              
CURRENT STEP: 51040





EPOCH:21

Steps per epoch: 2352






trying to access training data from saved copy number 1/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/1/shard_0.tfrecord']


2022-06-03 08:16:13,840 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-51040
2022-06-03 08:16:13,844 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 08:16:13,858 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 08:16:19,680 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 08:16:19,683 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 08:16:19,690 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 08:16:19,696 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 08:16:19,706 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 08:16:19,708 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 08:16:19,716 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 08:16:19,718 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 08:16:19,720 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 08:16:54,216 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 08:16:54,672 - tensorflow - INFO - Done calling model_fn.
2022-06-03 08:16:54,675 - tensorflow - INFO - TPU job name worker
2022-06-03 08:16:57,196 - tensorflow - INFO - Graph was finalized.
2022-06-03 08:16:57,643 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-51040
2022-06-03 08:17:24,107 - tensorflow - INFO - Running local_init_op.
2022-06-03 08:17:25,333 - tensorflow - INFO - Done running local_init_op.
2022-06-03 08:17:41,316 - tensorflow - INFO - Saving checkpoints for 51040 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 08:18:14,908 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 08:18:14,911 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 08:18:14,923 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/1/info.json#1654243214127106...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/1/shard_0.tfrecord#1654243219149188...
/ [2/2 objects] 100% Done
Operation completed over 2 objects.                                              
CURRENT STEP: 53392





EPOCH:22

Steps per epoch: 2352






trying to access training data from saved copy number 10/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/10/shard_0.tfrecord']


2022-06-03 08:33:08,227 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-53392
2022-06-03 08:33:08,237 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 08:33:08,253 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 08:33:14,126 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 08:33:14,128 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 08:33:14,131 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 08:33:14,134 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 08:33:14,137 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 08:33:14,138 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 08:33:14,140 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 08:33:14,143 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 08:33:14,144 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 08:33:41,398 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 08:33:41,845 - tensorflow - INFO - Done calling model_fn.
2022-06-03 08:33:41,853 - tensorflow - INFO - TPU job name worker
2022-06-03 08:33:44,277 - tensorflow - INFO - Graph was finalized.
2022-06-03 08:33:44,780 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-53392
2022-06-03 08:34:10,990 - tensorflow - INFO - Running local_init_op.
2022-06-03 08:34:12,261 - tensorflow - INFO - Done running local_init_op.
2022-06-03 08:34:28,022 - tensorflow - INFO - Saving checkpoints for 53392 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 08:35:02,330 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 08:35:02,335 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 08:35:02,347 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/10/shard_0.tfrecord#1654244271153920...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/10/info.json#1654244266151536...
/ [2/2 objects] 100% Done
Operation completed over 2 objects.                                              
CURRENT STEP: 55744





EPOCH:23

Steps per epoch: 2352






trying to access training data from saved copy number 11/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/11/shard_0.tfrecord']


2022-06-03 08:50:00,563 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-55744
2022-06-03 08:50:00,565 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 08:50:00,585 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 08:50:06,216 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 08:50:06,218 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 08:50:06,229 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 08:50:06,234 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 08:50:06,238 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 08:50:06,240 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 08:50:06,247 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 08:50:06,250 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 08:50:06,256 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 08:50:35,464 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 08:50:35,896 - tensorflow - INFO - Done calling model_fn.
2022-06-03 08:50:35,899 - tensorflow - INFO - TPU job name worker
2022-06-03 08:50:38,417 - tensorflow - INFO - Graph was finalized.
2022-06-03 08:50:38,914 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-55744
2022-06-03 08:51:07,639 - tensorflow - INFO - Running local_init_op.
2022-06-03 08:51:08,920 - tensorflow - INFO - Done running local_init_op.
2022-06-03 08:51:24,630 - tensorflow - INFO - Saving checkpoints for 55744 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 08:52:01,293 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 08:52:01,296 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 08:52:01,305 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/11/info.json#1654245310863308...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/11/shard_0.tfrecord#1654245315541133...
/ [2/2 objects] 100% Done
Operation completed over 2 objects.                                              
CURRENT STEP: 58096





EPOCH:24

Steps per epoch: 2352






trying to access training data from saved copy number 12/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/12/shard_0.tfrecord']


2022-06-03 09:06:57,443 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-58096
2022-06-03 09:06:57,445 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 09:06:57,464 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 09:07:03,125 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 09:07:03,129 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 09:07:03,132 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 09:07:03,136 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 09:07:03,138 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 09:07:03,140 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 09:07:03,142 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 09:07:03,144 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 09:07:03,145 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 09:07:40,698 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 09:07:41,164 - tensorflow - INFO - Done calling model_fn.
2022-06-03 09:07:41,167 - tensorflow - INFO - TPU job name worker
2022-06-03 09:07:43,576 - tensorflow - INFO - Graph was finalized.
2022-06-03 09:07:44,228 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-58096
2022-06-03 09:08:11,304 - tensorflow - INFO - Running local_init_op.
2022-06-03 09:08:12,547 - tensorflow - INFO - Done running local_init_op.
2022-06-03 09:08:28,429 - tensorflow - INFO - Saving checkpoints for 58096 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 09:09:05,670 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 09:09:05,673 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 09:09:05,683 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/12/info.json#1654246356129331...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/12/shard_0.tfrecord#1654246360975802...
/ [2/2 objects] 100% Done                                                       
Operation completed over 2 objects.                                              
CURRENT STEP: 60448





EPOCH:25

Steps per epoch: 2352






trying to access training data from saved copy number 13/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/13/shard_0.tfrecord']


2022-06-03 09:24:38,662 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-60448
2022-06-03 09:24:38,665 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 09:24:38,684 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 09:24:44,947 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 09:24:44,949 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 09:24:44,955 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 09:24:44,957 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 09:24:44,960 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 09:24:44,962 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 09:24:44,966 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 09:24:44,967 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 09:24:44,969 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 09:25:14,623 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 09:25:15,096 - tensorflow - INFO - Done calling model_fn.
2022-06-03 09:25:15,100 - tensorflow - INFO - TPU job name worker
2022-06-03 09:25:17,651 - tensorflow - INFO - Graph was finalized.
2022-06-03 09:25:18,385 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-60448
2022-06-03 09:25:48,442 - tensorflow - INFO - Running local_init_op.
2022-06-03 09:25:49,661 - tensorflow - INFO - Done running local_init_op.
2022-06-03 09:26:06,119 - tensorflow - INFO - Saving checkpoints for 60448 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 09:26:42,550 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 09:26:42,552 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 09:26:42,562 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/13/info.json#1654247396719073...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/13/shard_0.tfrecord#1654247401598942...
/ [2/2 objects] 100% Done                                                       
Operation completed over 2 objects.                                              
CURRENT STEP: 62800





EPOCH:26

Steps per epoch: 2352






trying to access training data from saved copy number 14/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/14/shard_0.tfrecord']


2022-06-03 09:42:03,423 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-62800
2022-06-03 09:42:03,426 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 09:42:03,442 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 09:42:09,624 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 09:42:09,626 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 09:42:09,632 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 09:42:09,635 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 09:42:09,641 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 09:42:09,646 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 09:42:09,648 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 09:42:09,651 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 09:42:09,657 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 09:42:39,357 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 09:42:39,805 - tensorflow - INFO - Done calling model_fn.
2022-06-03 09:42:39,808 - tensorflow - INFO - TPU job name worker
2022-06-03 09:42:42,171 - tensorflow - INFO - Graph was finalized.
2022-06-03 09:42:42,794 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-62800
2022-06-03 09:43:13,090 - tensorflow - INFO - Running local_init_op.
2022-06-03 09:43:14,377 - tensorflow - INFO - Done running local_init_op.
2022-06-03 09:43:30,813 - tensorflow - INFO - Saving checkpoints for 62800 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 09:44:14,107 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 09:44:14,111 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 09:44:14,119 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/14/info.json#1654145307011246...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/14/shard_0.tfrecord#1654145314998992...
/ [2/2 objects] 100% Done                                                       
Operation completed over 2 objects.                                              
CURRENT STEP: 65152





EPOCH:27

Steps per epoch: 2352






trying to access training data from saved copy number 0/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/0/shard_0.tfrecord']


2022-06-03 09:59:17,389 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-65152
2022-06-03 09:59:17,392 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 09:59:17,407 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 09:59:23,728 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 09:59:23,735 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 09:59:23,740 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 09:59:23,744 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 09:59:23,746 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 09:59:23,749 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 09:59:23,751 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 09:59:23,752 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 09:59:23,754 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 09:59:53,134 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 09:59:53,621 - tensorflow - INFO - Done calling model_fn.
2022-06-03 09:59:53,625 - tensorflow - INFO - TPU job name worker
2022-06-03 09:59:56,097 - tensorflow - INFO - Graph was finalized.
2022-06-03 09:59:56,830 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-65152
2022-06-03 10:00:26,626 - tensorflow - INFO - Running local_init_op.
2022-06-03 10:00:27,823 - tensorflow - INFO - Done running local_init_op.
2022-06-03 10:00:43,648 - tensorflow - INFO - Saving checkpoints for 65152 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 10:01:22,178 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 10:01:22,181 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 10:01:22,189 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/0/info.json#1654249656545596...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/0/shard_0.tfrecord#1654249663176001...
/ [2/2 objects] 100% Done                                                       
Operation completed over 2 objects.                                              
CURRENT STEP: 67504





EPOCH:28

Steps per epoch: 2352






trying to access training data from saved copy number 1/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/1/shard_0.tfrecord']


2022-06-03 10:16:27,003 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-67504
2022-06-03 10:16:27,006 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 10:16:27,024 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 10:16:44,236 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 10:16:44,238 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 10:16:44,241 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 10:16:44,247 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 10:16:44,250 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 10:16:44,252 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 10:16:44,254 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 10:16:44,255 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 10:16:44,257 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 10:17:14,379 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 10:17:14,814 - tensorflow - INFO - Done calling model_fn.
2022-06-03 10:17:14,817 - tensorflow - INFO - TPU job name worker
2022-06-03 10:17:17,359 - tensorflow - INFO - Graph was finalized.
2022-06-03 10:17:18,062 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-67504
2022-06-03 10:17:46,544 - tensorflow - INFO - Running local_init_op.
2022-06-03 10:17:47,790 - tensorflow - INFO - Done running local_init_op.
2022-06-03 10:18:03,577 - tensorflow - INFO - Saving checkpoints for 67504 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 10:18:42,880 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 10:18:42,883 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 10:18:42,892 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/1/shard_0.tfrecord#1654250710456164...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/1/info.json#1654250700597677...
/ [2/2 objects] 100% Done
Operation completed over 2 objects.                                              
CURRENT STEP: 69856





EPOCH:29

Steps per epoch: 2352






trying to access training data from saved copy number 10/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/10/shard_0.tfrecord']


2022-06-03 10:33:45,310 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-69856
2022-06-03 10:33:45,313 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 10:33:45,331 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 10:33:51,176 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 10:33:51,179 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 10:33:51,183 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 10:33:51,186 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 10:33:51,188 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 10:33:51,191 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 10:33:51,193 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 10:33:51,195 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 10:33:51,197 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 10:34:20,442 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 10:34:20,898 - tensorflow - INFO - Done calling model_fn.
2022-06-03 10:34:20,901 - tensorflow - INFO - TPU job name worker
2022-06-03 10:34:23,326 - tensorflow - INFO - Graph was finalized.
2022-06-03 10:34:23,786 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-69856
2022-06-03 10:34:50,712 - tensorflow - INFO - Running local_init_op.
2022-06-03 10:34:51,986 - tensorflow - INFO - Done running local_init_op.
2022-06-03 10:35:07,631 - tensorflow - INFO - Saving checkpoints for 69856 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 10:35:44,246 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 10:35:44,248 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 10:35:44,256 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/10/info.json#1654251753131552...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/10/shard_0.tfrecord#1654251757827418...
/ [2/2 objects] 100% Done
Operation completed over 2 objects.                                              
CURRENT STEP: 72208





EPOCH:30

Steps per epoch: 2352






trying to access training data from saved copy number 11/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/11/shard_0.tfrecord']


2022-06-03 10:50:44,367 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-72208
2022-06-03 10:50:44,369 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 10:50:44,387 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 10:50:50,217 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 10:50:50,220 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 10:50:50,224 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 10:50:50,226 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 10:50:50,230 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 10:50:50,231 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 10:50:50,233 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 10:50:50,235 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 10:50:50,236 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 10:51:18,979 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 10:51:19,447 - tensorflow - INFO - Done calling model_fn.
2022-06-03 10:51:19,450 - tensorflow - INFO - TPU job name worker
2022-06-03 10:51:21,887 - tensorflow - INFO - Graph was finalized.
2022-06-03 10:51:22,345 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-72208
2022-06-03 10:51:49,010 - tensorflow - INFO - Running local_init_op.
2022-06-03 10:51:50,305 - tensorflow - INFO - Done running local_init_op.
2022-06-03 10:52:05,834 - tensorflow - INFO - Saving checkpoints for 72208 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 10:52:42,774 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 10:52:42,777 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 10:52:42,785 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/11/info.json#1654252793023039...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/11/shard_0.tfrecord#1654252797820638...
/ [2/2 objects] 100% Done                                                       
Operation completed over 2 objects.                                              
CURRENT STEP: 74560





EPOCH:31

Steps per epoch: 2352






trying to access training data from saved copy number 12/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/12/shard_0.tfrecord']


2022-06-03 11:08:00,420 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-74560
2022-06-03 11:08:00,424 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 11:08:00,449 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 11:08:06,923 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 11:08:06,926 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 11:08:06,937 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 11:08:06,939 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 11:08:06,941 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 11:08:06,945 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 11:08:06,949 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 11:08:06,953 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 11:08:06,956 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 11:08:36,250 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 11:08:36,691 - tensorflow - INFO - Done calling model_fn.
2022-06-03 11:08:36,693 - tensorflow - INFO - TPU job name worker
2022-06-03 11:08:39,099 - tensorflow - INFO - Graph was finalized.
2022-06-03 11:08:39,743 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-74560
2022-06-03 11:09:08,144 - tensorflow - INFO - Running local_init_op.
2022-06-03 11:09:09,411 - tensorflow - INFO - Done running local_init_op.
2022-06-03 11:09:25,054 - tensorflow - INFO - Saving checkpoints for 74560 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 11:10:03,033 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 11:10:03,036 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 11:10:03,049 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/12/info.json#1654253827815346...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/12/shard_0.tfrecord#1654253832851340...
/ [2/2 objects] 100% Done
Operation completed over 2 objects.                                              
CURRENT STEP: 76912





EPOCH:32

Steps per epoch: 2352






trying to access training data from saved copy number 13/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/13/shard_0.tfrecord']


2022-06-03 11:24:59,811 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-76912
2022-06-03 11:24:59,813 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 11:24:59,837 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 11:25:05,895 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 11:25:05,905 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 11:25:05,910 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 11:25:05,913 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 11:25:05,915 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 11:25:05,917 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 11:25:05,919 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 11:25:05,921 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 11:25:05,923 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 11:25:46,757 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 11:25:47,234 - tensorflow - INFO - Done calling model_fn.
2022-06-03 11:25:47,237 - tensorflow - INFO - TPU job name worker
2022-06-03 11:25:49,844 - tensorflow - INFO - Graph was finalized.
2022-06-03 11:25:50,332 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-76912
2022-06-03 11:26:17,761 - tensorflow - INFO - Running local_init_op.
2022-06-03 11:26:19,019 - tensorflow - INFO - Done running local_init_op.
2022-06-03 11:26:35,054 - tensorflow - INFO - Saving checkpoints for 76912 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 11:27:13,669 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 11:27:13,673 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 11:27:13,686 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/13/info.json#1654254880604205...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/13/shard_0.tfrecord#1654254885639397...
/ [2/2 objects] 100% Done
Operation completed over 2 objects.                                              
CURRENT STEP: 79264





EPOCH:33

Steps per epoch: 2352






trying to access training data from saved copy number 14/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/14/shard_0.tfrecord']


2022-06-03 11:42:14,012 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-79264
2022-06-03 11:42:14,019 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 11:42:14,034 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 11:42:20,470 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 11:42:20,474 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 11:42:20,479 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 11:42:20,481 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 11:42:20,483 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 11:42:20,485 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 11:42:20,487 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 11:42:20,488 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 11:42:20,490 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 11:42:50,541 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 11:42:50,986 - tensorflow - INFO - Done calling model_fn.
2022-06-03 11:42:50,989 - tensorflow - INFO - TPU job name worker
2022-06-03 11:42:53,498 - tensorflow - INFO - Graph was finalized.
2022-06-03 11:42:54,225 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-79264
2022-06-03 11:43:24,102 - tensorflow - INFO - Running local_init_op.
2022-06-03 11:43:25,368 - tensorflow - INFO - Done running local_init_op.
2022-06-03 11:43:41,588 - tensorflow - INFO - Saving checkpoints for 79264 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 11:44:19,695 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 11:44:19,697 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 11:44:19,709 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/14/info.json#1654255918314654...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/14/shard_0.tfrecord#1654255923146276...
/ [2/2 objects] 100% Done                                                       
Operation completed over 2 objects.                                              
CURRENT STEP: 81616





EPOCH:34

Steps per epoch: 2352






trying to access training data from saved copy number 15/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/15/shard_0.tfrecord']


2022-06-03 11:59:43,669 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-81616
2022-06-03 11:59:43,671 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 11:59:43,681 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 11:59:49,976 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 11:59:49,978 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 11:59:49,997 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 11:59:50,005 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 11:59:50,008 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 11:59:50,015 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 11:59:50,019 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 11:59:50,020 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 11:59:50,022 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 12:00:20,386 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 12:00:20,887 - tensorflow - INFO - Done calling model_fn.
2022-06-03 12:00:20,891 - tensorflow - INFO - TPU job name worker
2022-06-03 12:00:23,313 - tensorflow - INFO - Graph was finalized.
2022-06-03 12:00:24,057 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-81616
2022-06-03 12:00:53,189 - tensorflow - INFO - Running local_init_op.
2022-06-03 12:00:54,457 - tensorflow - INFO - Done running local_init_op.
2022-06-03 12:01:11,666 - tensorflow - INFO - Saving checkpoints for 81616 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 12:01:51,733 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 12:01:51,736 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 12:01:51,746 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/15/info.json#1654146214348505...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/15/shard_0.tfrecord#1654146221804255...
/ [2/2 objects] 100% Done
Operation completed over 2 objects.                                              
CURRENT STEP: 83968





EPOCH:35

Steps per epoch: 2352






trying to access training data from saved copy number 0/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/0/shard_0.tfrecord']


2022-06-03 12:16:55,789 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-83968
2022-06-03 12:16:55,791 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 12:16:55,812 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 12:17:02,119 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 12:17:02,121 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 12:17:02,134 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 12:17:02,138 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 12:17:02,143 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 12:17:02,145 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 12:17:02,150 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 12:17:02,155 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 12:17:02,157 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 12:17:38,696 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 12:17:39,214 - tensorflow - INFO - Done calling model_fn.
2022-06-03 12:17:39,218 - tensorflow - INFO - TPU job name worker
2022-06-03 12:17:41,829 - tensorflow - INFO - Graph was finalized.
2022-06-03 12:17:42,678 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-83968
2022-06-03 12:18:11,308 - tensorflow - INFO - Running local_init_op.
2022-06-03 12:18:12,589 - tensorflow - INFO - Done running local_init_op.
2022-06-03 12:18:29,261 - tensorflow - INFO - Saving checkpoints for 83968 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 12:19:10,065 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 12:19:10,068 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 12:19:10,114 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/0/info.json#1654258167009918...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/0/shard_0.tfrecord#1654258172199332...
/ [2/2 objects] 100% Done                                                       
Operation completed over 2 objects.                                              
CURRENT STEP: 86320





EPOCH:36

Steps per epoch: 2352






trying to access training data from saved copy number 1/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/1/shard_0.tfrecord']


2022-06-03 12:34:16,894 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-86320
2022-06-03 12:34:16,896 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 12:34:16,913 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 12:34:23,002 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 12:34:23,004 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 12:34:23,016 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 12:34:23,019 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 12:34:23,021 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 12:34:23,023 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 12:34:23,024 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 12:34:23,026 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 12:34:23,028 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 12:34:53,939 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 12:34:54,425 - tensorflow - INFO - Done calling model_fn.
2022-06-03 12:34:54,427 - tensorflow - INFO - TPU job name worker
2022-06-03 12:34:57,145 - tensorflow - INFO - Graph was finalized.
2022-06-03 12:34:57,633 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-86320
2022-06-03 12:35:26,112 - tensorflow - INFO - Running local_init_op.
2022-06-03 12:35:27,434 - tensorflow - INFO - Done running local_init_op.
2022-06-03 12:35:43,455 - tensorflow - INFO - Saving checkpoints for 86320 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 12:36:20,390 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 12:36:20,393 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 12:36:20,401 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']

Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/1/info.json#1654259213574761...
Removing gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/1/shard_0.tfrecord#1654259218827028...
/ [2/2 objects] 100% Done
Operation completed over 2 objects.                                              
CURRENT STEP: 88672





EPOCH:37

Steps per epoch: 2352






trying to access training data from saved copy number 10/
train_input_files: ['gs://theodore_jiang/pretraining_data_1024_embedded_mutformer/train/10/shard_0.tfrecord']


2022-06-03 12:51:20,632 - tensorflow - INFO - Using checkpoint: gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-88672
2022-06-03 12:51:20,634 - tensorflow - INFO - Using 1 data shards for training
2022-06-03 12:51:20,655 - tensorflow - INFO - Using config: {'_model_dir': 'gs://theodore_jiang/bert_model_embedded_mutformer_12L', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
cluster_def {
  job {
    name: "worker"
    tasks {
      key: 0
      value: "10.41.207.106:8470"
    }
  }
}
isolate_session_state: true
, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <t



2022-06-03 12:51:27,372 - tensorflow - INFO - **** Trainable Variables ****
2022-06-03 12:51:27,374 - tensorflow - INFO -   name = bert/embeddings/word_embeddings:0, shape = (27, 768), *INIT_FROM_CKPT*
2022-06-03 12:51:27,386 - tensorflow - INFO -   name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*
2022-06-03 12:51:27,389 - tensorflow - INFO -   name = bert/embeddings/position_embeddings:0, shape = (1024, 768), *INIT_FROM_CKPT*
2022-06-03 12:51:27,393 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 12:51:27,396 - tensorflow - INFO -   name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 12:51:27,399 - tensorflow - INFO -   name = bert/embeddings/conv1d/kernel:0, shape = (3, 768, 768), *INIT_FROM_CKPT*
2022-06-03 12:51:27,404 - tensorflow - INFO -   name = bert/embeddings/conv1d/bias:0, shape = (768,), *INIT_FROM_CKPT*
2022-06-03 12:51:27,407 - tensorflow - INFO

Tensor("gradients/AddN_93:0", shape=(27, 768), dtype=float32) <tf.Variable 'bert/embeddings/word_embeddings:0' shape=(27, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/MatMul_1_grad/MatMul_1:0", shape=(2, 768), dtype=float32) <tf.Variable 'bert/embeddings/token_type_embeddings:0' shape=(2, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/Slice_grad/Pad:0", shape=(1024, 768), dtype=float32) <tf.Variable 'bert/embeddings/position_embeddings:0' shape=(1024, 768) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/sub_grad/Reshape:0", shape=(768,), dtype=float32) <tf.Variable 'bert/embeddings/LayerNorm/beta:0' shape=(768,) dtype=float32> Tensor("cls/predictions/truediv:0", shape=(), dtype=float32)
Tensor("gradients/bert/embeddings/LayerNorm/batchnorm/mul_grad/Reshape_1:0", shape=(

2022-06-03 12:51:58,410 - tensorflow - INFO - Create CheckpointSaverHook.
2022-06-03 12:51:58,864 - tensorflow - INFO - Done calling model_fn.
2022-06-03 12:51:58,870 - tensorflow - INFO - TPU job name worker
2022-06-03 12:52:01,423 - tensorflow - INFO - Graph was finalized.
2022-06-03 12:52:01,899 - tensorflow - INFO - Restoring parameters from gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt-88672
2022-06-03 12:52:28,439 - tensorflow - INFO - Running local_init_op.
2022-06-03 12:52:29,751 - tensorflow - INFO - Done running local_init_op.
2022-06-03 12:52:46,025 - tensorflow - INFO - Saving checkpoints for 88672 into gs://theodore_jiang/bert_model_embedded_mutformer_12L/model.ckpt.
2022-06-03 12:53:24,256 - tensorflow - INFO - Initialized dataset iterators in 1 seconds
2022-06-03 12:53:24,260 - tensorflow - INFO - Installing graceful shutdown hook.
2022-06-03 12:53:24,268 - tensorflow - INFO - Creating heartbeat manager for ['/job:worker/replica:0/task:0/device:CPU:0']