## BERT End to End (Fine-tuning + Predicting) in 5 Minutes with Cloud TPU

BERT, or Bidirectional Encoder Representations from Transformers, is a new method of pre-training language representations that obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks.

This Colab demonstrates using a free Colab Cloud TPU to fine-tune sentence and sentence-pair classification tasks built on top of pretrained BERT models, and to run predictions on the tuned model. The Colab demonstrates loading pretrained BERT models from both TF Hub and checkpoints.


In [None]:
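# The cells below use TensorFlow 1.x APIs (tf.Session, tf.contrib), which are
# not available under those names in TensorFlow 2.x.
!pip install tensorflow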



In [None]:
import datetime
import json
import os
import pprint
import random
import string
import sys
import tensorflow as tf

# Colab exposes the TPU address through the COLAB_TPU_ADDR environment variable.
assert 'COLAB_TPU_ADDR' in os.environ, 'ERROR : Not connected to TPU runtime'
TPU_ADDRESS = 'grpc://' + os.environ['COLAB_TPU_ADDR']
print('TPU address is', TPU_ADDRESS)


from google.colab import auth
auth.authenticate_user()
with tf.Session(TPU_ADDRESS) as session:
  print("TPU devices: ")
  pprint.pprint(session.list_devices())

  # Upload credentials to TPU
  with open('/content/adc.json', 'r') as f:
    auth_info = json.load(f)
  tf.contrib.cloud.configure_gcs(session, credentials=auth_info)
  # Now credentials are set for all future sessions on this TPU.



AssertionError: ERROR : Not connected to TPU runtime

### Prepare and import BERT modules

With the environment configured, we can now import the BERT modules. The following step clones the source code from GitHub and imports the modules from the source. Alternatively, we can install BERT using pip.

In [None]:
import sys

!test -d bert_repo || git clone  https://github.com/google-research/bert bert_repo

# Make the cloned repository importable.
if 'bert_repo' not in sys.path:
  sys.path += ['bert_repo']

# Import the Python modules provided by BERT.
import modeling
import optimization
import run_classifier
import run_classifier_with_tfhub
import tokenization

# Import TF Hub.
import tensorflow_hub as hub






### Preparing for Training

This next section of code performs the following tasks:
* Specify the task and download the training data
* Specify the BERT pretrained model
* Specify a GCS bucket and create an output directory for model checkpoints and eval results



In [None]:
TASK = 'MRPC' #@param {type:"string"}
assert TASK in ('MRPC', 'CoLA'), 'Only (MRPC, CoLA) are demonstrated here.'

# Download glue data.
! test -d download_glue_repo || git clone https://gist.github.com/60c2bdb54d156a41194446737ce03e2e.git download_glue_repo
!python download_glue_repo/download_glue_data.py --data_dir='glue_data' --tasks=$TASK

TASK_DATA_DIR = 'glue_data/' + TASK
print('***** Task data directory: {} *****'.format(TASK_DATA_DIR))
!ls $TASK_DATA_DIR

BUCKET = 'YOUR_BUCKET' #@param {type:"string"}
assert BUCKET, 'Must specify an existing GCS bucket name'
OUTPUT_DIR = 'gs://{}/bert-tfhub/models/{}'.format(BUCKET, TASK)
tf.gfile.MakeDirs(OUTPUT_DIR)
print('***** Model output directory: {} *****'.format(OUTPUT_DIR))

# Available pretrained model checkpoints:
#   uncased_L-12_H-768_A-12: uncased BERT base model
#   uncased_L-24_H-1024_A-16: uncased BERT large model
#   cased_L-12_H-768_A-12: cased BERT base model
BERT_MODEL = 'uncased_L-12_H-768_A-12' #@param {type:"string"}
BERT_MODEL_HUB = 'https://tfhub.dev/google/bert_' + BERT_MODEL + '/1'

Now let's load the tokenizer module from TF Hub and try it out.


In [None]:
tokenizer = run_classifier_with_tfhub.create_tokenizer_from_hub_module(BERT_MODEL_HUB)
tokenizer.tokenize("This here's an example of using the BERT tokenizer")
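
To give a feel for what the tokenizer does with words it has not seen as a whole, here is a minimal sketch of greedy longest-match-first sub-word splitting, the idea behind WordPiece tokenization. It is a simplification: the vocabulary `TOY_VOCAB` is invented for this example, and the real tokenizer also lowercases text and splits on punctuation before this step.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into sub-word pieces by greedy longest match."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        current = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces get a "##" prefix
            if candidate in vocab:
                current = candidate
                break
            end -= 1
        if current is None:
            return [unk_token]  # no piece matched, so the whole word is unknown
        pieces.append(current)
        start = end
    return pieces

# Toy vocabulary, made up for illustration only.
TOY_VOCAB = {"token", "##izer", "##s", "bert", "using", "the"}

print(wordpiece_tokenize("tokenizer", TOY_VOCAB))   # ['token', '##izer']
print(wordpiece_tokenize("tokenizers", TOY_VOCAB))  # ['token', '##izer', '##s']
print(wordpiece_tokenize("xyz", TOY_VOCAB))         # ['[UNK]']
```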

We also initialize the hyperparameters, prepare the training data, and initialize the TPU config.


In [None]:
TRAIN_BATCH_SIZE = 32
EVAL_BATCH_SIZE = 8
PREDICT_BATCH_SIZE = 8
LEARNING_RATE = 2e-5
NUM_TRAIN_EPOCHS = 3.0
MAX_SEQ_LENGTH = 128

# Warmup is a period at the start of training during which the learning rate
# is small and gradually increases; it usually helps training.
WARMUP_PROPORTION = 0.1

# Model configs
SAVE_CHECKPOINTS_STEPS = 1000
SAVE_SUMMARY_STEPS = 500


processors = {
    'cola': run_classifier.ColaProcessor,
    'mnli': run_classifier.MnliProcessor,
    'mrpc': run_classifier.MrpcProcessor,
}

processor = processors[TASK.lower()]()
label_list = processor.get_labels()


# Compute the number of training and warmup steps from the batch size.
train_examples = processor.get_train_examples(TASK_DATA_DIR)
num_train_steps = int(len(train_examples) / TRAIN_BATCH_SIZE * NUM_TRAIN_EPOCHS)
num_warmups_steps = int(num_train_steps * WARMUP_PROPORTION)

# Setup TPU related config
tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver(TPU_ADDRESS)
NUM_TPU_CORES = 8
ITERATIONS_PER_LOOP = 1000

def get_run_config(output_dir):
  return tf.contrib.tpu.RunConfig(
      cluster=tpu_cluster_resolver,
      model_dir=output_dir,
      save_checkpoints_steps=SAVE_CHECKPOINTS_STEPS,
      tpu_config=tf.contrib.tpu.TPUConfig(
          iterations_per_loop=ITERATIONS_PER_LOOP,
          num_shards=NUM_TPU_CORES,
          per_host_input_for_training=tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2)
  )
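
To make the step arithmetic and the role of warmup concrete, here is a small standalone sketch. It assumes a hypothetical training set of 3,668 examples and an illustrative base learning rate of 2e-5, and it models the schedule as a linear warmup followed by a linear decay to zero. The actual schedule is defined in the BERT repository's `optimization` module and may differ in detail; the helper names below are made up for this example.

```python
def compute_steps(num_examples, batch_size, num_epochs, warmup_proportion):
    """Mirror the step computation used in the cell above."""
    num_train_steps = int(num_examples / batch_size * num_epochs)
    num_warmup_steps = int(num_train_steps * warmup_proportion)
    return num_train_steps, num_warmup_steps

def learning_rate_at(step, base_lr, num_train_steps, num_warmup_steps):
    """Simplified schedule: linear warmup, then linear decay to zero."""
    if step < num_warmup_steps:
        # Ramp up from 0 to base_lr over the warmup steps.
        return base_lr * step / num_warmup_steps
    # Decay linearly so that the rate reaches 0 at the last training step.
    step = min(step, num_train_steps)
    return base_lr * (1.0 - step / num_train_steps)

# Hypothetical dataset size; batch size, epochs and warmup as in the cell above.
train_steps, warmup_steps = compute_steps(3668, 32, 3.0, 0.1)
print(train_steps, warmup_steps)  # 343 34

for step in (0, warmup_steps // 2, warmup_steps, train_steps):
    print(step, learning_rate_at(step, 2e-5, train_steps, warmup_steps))
```

With these numbers, the learning rate climbs for the first 34 steps and then falls steadily for the remaining steps.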









## Fine-tune and run predictions on a pretrained BERT model from TF Hub
This section demonstrates fine-tuning from a pre-trained BERT TF Hub module and running predictions.


In [None]:
os.environ["TFHUB_CACHE_DIR"] = OUTPUT_DIR

model_fn = run_classifier_with_tfhub.model_fn_builder(
    num_labels=len(label_list),
    learning_rate=LEARNING_RATE,
    num_train_steps=num_train_steps,
    num_warmup_steps=num_warmups_steps,
    use_tpu=True,
    bert_hub_module_handle=BERT_MODEL_HUB
)

estimator_from_fthub = tf.contrib.tpu.TPUEstimator(
    use_tpu=True,
    model_fn=model_fn,
    config=get_run_config(OUTPUT_DIR),
    train_batch_size=TRAIN_BATCH_SIZE,
    eval_batch_size=EVAL_BATCH_SIZE,
    predict_batch_size=PREDICT_BATCH_SIZE,
)

At this point, we can fine-tune the model, evaluate it, and run predictions on it.


In [None]:
# Train the model
def model_train(estimator):
  print('MRPC/CoLA on BERT base model normally takes about 2-3 minutes. Please wait...')
  # We'll set sequences to be at most 128 tokens long.
  train_features = run_classifier.convert_examples_to_features(
      train_examples, label_list, MAX_SEQ_LENGTH, tokenizer)
  print('***** Started training at {} *****'.format(datetime.datetime.now()))
  print('  Num examples = {}'.format(len(train_examples)))
  print('  Batch size = {}'.format(TRAIN_BATCH_SIZE))
  tf.logging.info("  Num steps = %d", num_train_steps)
  train_input_fn = run_classifier.input_fn_builder(
      features=train_features,
      seq_length=MAX_SEQ_LENGTH,
      is_training=True,
      drop_remainder=True)
  estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
  print('***** Finished training at {} *****'.format(datetime.datetime.now()))



In [None]:
model_train(estimator_from_fthub)
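
Before training, `convert_examples_to_features` turns each example into fixed-length inputs of at most `MAX_SEQ_LENGTH` tokens. The sketch below shows the general idea for a sentence pair: add the special tokens, mark which segment each token belongs to, and pad to the fixed length. It is a simplified illustration with a hypothetical helper name, and it keeps token strings for readability where the real code works with vocabulary ids.

```python
def build_pair_features(tokens_a, tokens_b, max_seq_length):
    """Build tokens, segment ids and input mask for one sentence pair."""
    tokens_a, tokens_b = list(tokens_a), list(tokens_b)
    # Reserve three positions for [CLS], [SEP] and [SEP].
    while len(tokens_a) + len(tokens_b) > max_seq_length - 3:
        # Trim the longer sequence one token at a time.
        if len(tokens_a) > len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A and its [SEP]; segment 1 covers the rest.
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    # The mask is 1 for real tokens and 0 for padding.
    input_mask = [1] * len(tokens)
    padding = max_seq_length - len(tokens)
    tokens += ["[PAD]"] * padding
    segment_ids += [0] * padding
    input_mask += [0] * padding
    return tokens, segment_ids, input_mask

tokens, segment_ids, input_mask = build_pair_features(
    ["the", "cat", "sat"], ["a", "cat", "was", "sitting"], max_seq_length=12)
print(tokens)
print(segment_ids)
print(input_mask)
```

For a single-sentence task such as CoLA, the second sentence is absent and only one `[SEP]` is needed.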

In [None]:
def model_eval(estimator):
  # Evaluate the model on the dev set.
  eval_examples = processor.get_dev_examples(TASK_DATA_DIR)
  eval_features = run_classifier.convert_examples_to_features(
      eval_examples, label_list, MAX_SEQ_LENGTH, tokenizer
  )

  print("**** Started evaluation at {} ****".format(datetime.datetime.now()))
  print("  Num examples = {}".format(len(eval_examples)))
  print("  Batch size = {}".format(EVAL_BATCH_SIZE))

  # Eval will be slightly WRONG on the TPU because it will truncate the last batch.
  eval_steps = int(len(eval_examples) / EVAL_BATCH_SIZE)
  eval_input_fn = run_classifier.input_fn_builder(
      features=eval_features,
      seq_length=MAX_SEQ_LENGTH,
      is_training=False,
      drop_remainder=True)

  result = estimator.evaluate(input_fn=eval_input_fn, steps=eval_steps)
  print('**** Finished evaluation at {} ****'.format(datetime.datetime.now()))

  output_eval_file = os.path.join(OUTPUT_DIR, "eval_results.txt")
  with tf.gfile.GFile(output_eval_file, "w") as writer:
    print("*** Eval Results ***")
    for key in sorted(result.keys()):
      print(' {} = {} '.format(key, str(result[key])))
      writer.write("%s = %s\n" % (key, str(result[key])))

In [None]:
model_eval(estimator_from_fthub)
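
The comment in `model_eval` about the truncated last batch can be made concrete with a little arithmetic. The sketch below uses hypothetical dev-set sizes to show how many examples are left out when the number of examples is not a multiple of the batch size; the helper `eval_coverage` is made up for this illustration.

```python
def eval_coverage(num_examples, batch_size):
    """Return (eval_steps, evaluated, dropped) when the last partial batch is dropped."""
    eval_steps = int(num_examples / batch_size)
    evaluated = eval_steps * batch_size
    dropped = num_examples - evaluated
    return eval_steps, evaluated, dropped

# Hypothetical dev-set sizes, with the eval batch size of 8 used above.
print(eval_coverage(1043, 8))  # (130, 1040, 3)
print(eval_coverage(408, 8))   # (51, 408, 0)
```

When the example count divides evenly by the batch size, nothing is dropped and the reported metrics cover the full dev set.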

In [None]:
def model_predict(estimator):
  # Make prediction on a subset of eval examples
  prediction_examples = processor.get_dev_examples(TASK_DATA_DIR)[:PREDICT_BATCH_SIZE]
  input_features = run_classifier.convert_examples_to_features(
      prediction_examples, label_list, MAX_SEQ_LENGTH, tokenizer
  )
  predict_input_fn = run_classifier.input_fn_builder(
      features=input_features,
      seq_length=MAX_SEQ_LENGTH,
      is_training=False,
      drop_remainder=True)
  predictions = estimator.predict(predict_input_fn)

  for example, prediction in zip(prediction_examples, predictions):
    print('text_a: %s\ntext_b: %s\nlabel:%s\nprediction:%s\n' % (
        example.text_a, example.text_b, str(example.label),
        prediction['probabilities']))



In [None]:
model_predict(estimator_from_fthub)
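
`model_predict` prints the raw class probabilities for each example. To read off a predicted label, one can pick the class with the highest probability, as in the sketch below. The probabilities and the two-element label list are made-up values for illustration, and `predicted_label` is a hypothetical helper.

```python
def predicted_label(probabilities, label_list):
    """Return the most likely label and its probability."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return label_list[best_index], probabilities[best_index]

# Illustrative values only: a binary task with labels "0" and "1".
example_labels = ["0", "1"]
example_probabilities = [0.12, 0.88]
print(predicted_label(example_probabilities, example_labels))  # ('1', 0.88)
```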

## Fine-tune and run predictions on a pre-trained BERT model from checkpoints


In [None]:
# Setup task specific model and TPU running config.

BERT_PRETRAINED_DIR = 'gs://cloud-tpu-checkpoints/bert/' + BERT_MODEL
print('***** BERT pretrained directory: {} *****'.format(BERT_PRETRAINED_DIR))
!gsutil ls $BERT_PRETRAINED_DIR

CONFIG_FILE = os.path.join(BERT_PRETRAINED_DIR, "bert_config.json")
INIT_CHECKPOINT = os.path.join(BERT_PRETRAINED_DIR, "bert_model.ckpt")


model_fn = run_classifier.model_fn_builder(
    bert_config=modeling.BertConfig.from_json_file(CONFIG_FILE),
    num_labels=len(label_list),
    init_checkpoint=INIT_CHECKPOINT,
    learning_rate=LEARNING_RATE,
    num_train_steps=num_train_steps,
    num_warmup_steps=num_warmups_steps,
    use_tpu=True,
    use_one_hot_embeddings=True
)

OUTPUT_DIR = OUTPUT_DIR.replace('bert-tfhub', 'bert-checkpoints')
tf.gfile.MakeDirs(OUTPUT_DIR)

estimator_from_checkpoints = tf.contrib.tpu.TPUEstimator(
    use_tpu=True,
    model_fn=model_fn,
    config=get_run_config(OUTPUT_DIR),
    train_batch_size=TRAIN_BATCH_SIZE,
    eval_batch_size=EVAL_BATCH_SIZE,
    predict_batch_size=PREDICT_BATCH_SIZE,
)




In [None]:
model_train(estimator_from_checkpoints)

In [None]:
model_eval(estimator_from_checkpoints)

In [None]:
model_predict(estimator_from_checkpoints)