# Pre-training of GPT-2

In this notebook we pre-train GPT-2 on the EmpatheticDialogues dataset. The resulting model serves as the starting point for bringing in human feedback and as a baseline for comparison with our final model.

In [None]:
%tensorflow_version 1.x

Connect your own Google Drive to this notebook to save the trained models:

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
%cd /content/drive/My Drive/nlp-2021-vda
!pwd

Download the initial GPT-2 model from Google Cloud Storage:

In [None]:
# Uncomment the line below if you are running the notebook for the first time

#!gsutil -m cp -r gs://nlp-lab/* ./

# For a local run on macOS (not in Google Colab), quote the wildcard
#!gsutil -m cp -r 'gs://nlp-lab/*' ./
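
Before starting a training run, it can help to confirm that the download actually produced the model files. The sketch below is not part of the project scripts: `missing_model_files` is a hypothetical helper, and the file list is an assumption based on the layout of the official GPT-2 release, so the contents of the `nlp-lab` bucket may differ.

```python
import os


def missing_model_files(model_dir, model_name, expected_files):
    """Return the expected files that are absent from model_dir/model_name."""
    model_path = os.path.join(model_dir, model_name)
    return [name for name in expected_files
            if not os.path.isfile(os.path.join(model_path, name))]


# Assumed file list, based on the official GPT-2 release layout.
EXPECTED_FILES = ["encoder.json", "hparams.json", "vocab.bpe"]

# model_dir and model_name match the values passed to finetune in this notebook.
missing = missing_model_files("gpt-2/models", "124M", EXPECTED_FILES)
if missing:
    print("Missing model files:", missing)
else:
    print("Model files found.")
```

If any files are reported missing, rerun the `gsutil` command above from the project directory.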

In [None]:
from scripts import pre_train_gpt2

In [None]:
sess = pre_train_gpt2.start_tf_sess()

In [None]:
# Parameter descriptions

# sess - TensorFlow session
# dataset - path where the data is located
# steps - number of steps to train the model for
# model_name - name of the initial GPT-2 model, e.g. 124M, 355M or 774M
# model_dir - path where the initial GPT-2 model is stored
# batch_size - batch size
# learning_rate - learning rate
# accumulate_gradients - accumulate gradients across N minibatches
# input_maxlen - maximum length of the input in tokens
# history_len - number of previous dialogue turns to use as context
# restore_from - either "latest", "fresh", or a path to a checkpoint file
# run_name - run id; name of the subdirectory in checkpoint/
# checkpoint_dir - path where the checkpoints are stored or located
# multi_gpu - set True if you have multiple GPUs
# save_every - write a checkpoint every N steps
# print_every - print stats every N steps
# optimizer - which optimizer to use
# overwrite - set True to overwrite previous checkpoints

pre_train_gpt2.finetune(sess,
             'datasets/empatheticdialogues/train.csv',
             'datasets/empatheticdialogues/valid.csv',
             steps=-1,
             model_name='124M',
             model_dir='gpt-2/models',
             batch_size=20,
             learning_rate=0.0001,
             accumulate_gradients=5,
             input_maxlen=100,
             history_len=4,
             patience=20,
             restore_from='latest',
             run_name='run1',
             checkpoint_dir='checkpoint',
             multi_gpu=False,
             print_every=1,
             max_checkpoints=1,
             optimizer='adam',
             overwrite=False)
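
With `batch_size=20` and `accumulate_gradients=5`, each optimizer update draws on 20 × 5 = 100 examples. The sketch below illustrates the general idea of gradient accumulation on a toy one-parameter model. It is not the code in `pre_train_gpt2`, whose internals may differ (for example, it may sum gradients rather than average them); `mse_gradient` and the random data are made up for illustration.

```python
import math
import random

# Values copied from the finetune call above.
batch_size = 20
accumulate_gradients = 5

# Number of examples contributing to each optimizer update.
effective_batch_size = batch_size * accumulate_gradients


def mse_gradient(w, xs, ys):
    """Gradient of the mean squared error of y_hat = w * x with respect to w."""
    return sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


# Toy data standing in for one effective batch (illustrative only).
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(effective_batch_size)]
ys = [random.gauss(0.0, 1.0) for _ in range(effective_batch_size)]
w = 0.5

# Accumulate the gradient over each minibatch, then average.
accumulated = 0.0
for start in range(0, effective_batch_size, batch_size):
    stop = start + batch_size
    accumulated += mse_gradient(w, xs[start:stop], ys[start:stop])
accumulated /= accumulate_gradients

# For equally sized minibatches, the average equals the full-batch gradient.
full_batch = mse_gradient(w, xs, ys)
gradients_match = math.isclose(accumulated, full_batch,
                               rel_tol=1e-9, abs_tol=1e-12)

print("Effective batch size:", effective_batch_size)
print("Accumulated gradient matches full batch:", gradients_match)
```

Accumulation lets a larger effective batch fit into limited GPU memory, at the cost of one optimizer update per several forward and backward passes.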