In [1]:
# Model paths
MODEL_TYPE = "gpt2" 
OUTPUT_DIR = f"../../weights/{MODEL_TYPE}/papers_milan/"
TRAIN_PATH = f"../../data/papers_milan/train_papers.txt"
TEST_PATH = f"../../data/papers_milan/test_papers.txt"
VAL_PATH = f"../../data/papers_milan/val_papers.txt"

# Finetuning

In [2]:
def create_params_modeling(output_dir, model_type="gpt2", model_name_or_path=None, train_path=None, eval_path=None,
                             do_train=False, do_eval=False, evaluate_during_training=False, line_by_line=False, block_size=-1):
    """Build the values substituted into the run_language_modeling.py command template."""
    # Boolean options become the literal CLI flag when enabled and an empty string otherwise.
    return {
        "output_dir": output_dir,
        "model_type": model_type,
        "model_name_or_path": model_name_or_path,
        "do_train": "--do_train" if do_train else "",
        "train_data_file": train_path if do_train else None,
        "do_eval": "--do_eval" if do_eval else "",
        "eval_data_file": eval_path if do_eval else None,
        "evaluate_during_training": "--evaluate_during_training" if evaluate_during_training else "",
        "block_size": block_size,
        "line_by_line": "--line_by_line" if line_by_line else "",
        "fp16": "--fp16",
        "fp16_opt_level": "O1"
    }
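
One detail of the dictionary built above is worth checking before running anything: entries left as `None` are rendered by `str.format` as the literal text `None`. The sketch below uses a shortened, hypothetical template (not the full command) and hand-written parameters to show the effect for an evaluation-only configuration.

```python
# Shortened, hypothetical template; the real command has many more options.
template = "--model_name_or_path={model_name_or_path} {do_train} --train_data_file={train_data_file}"

# Evaluation-only style parameters: no training flag and no training file.
params = {
    "model_name_or_path": "gpt2",
    "do_train": "",
    "train_data_file": None,
}

rendered = template.format(**params)
print(rendered)

# str.format converts None with str(), so the option is still emitted.
has_literal_none = "--train_data_file=None" in rendered
print(has_literal_none)
```

Whether the literal `None` matters depends on how the receiving script treats that option when the matching `--do_train` or `--do_eval` flag is absent, so it is safest to inspect the formatted command once before launching a long run.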

In [3]:
cmd_finetuning = """../../transformers/examples/language-modeling/run_language_modeling.py \
    --output_dir={output_dir} \
    --model_type={model_type} \
    --model_name_or_path={model_name_or_path} \
    {do_train} \
    --train_data_file={train_data_file} \
    {do_eval} \
    --eval_data_file={eval_data_file} \
    {evaluate_during_training} \
    --per_device_train_batch_size=1 \
    --per_device_eval_batch_size=1 \
    --block_size={block_size} \
    --overwrite_output_dir \
    --save_steps 5000 \
    --save_total_limit 5 \
    {line_by_line} \
    {fp16} \
    --fp16_opt_level={fp16_opt_level} \
    --logging_steps 2 
"""

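The command template is a regular (non-raw) triple-quoted string, so a backslash at the very end of a line joins that line with the next one. A line without the trailing backslash leaves a real newline in the command. A minimal sketch of the difference, using two made-up option lines:

```python
# Backslash-newline inside a non-raw string literal is dropped by Python,
# so these two source lines become one line of text.
with_continuation = """--block_size=128 \
--overwrite_output_dir"""

# Without the backslash, the newline stays in the string.
without_continuation = """--block_size=128
--overwrite_output_dir"""

print(repr(with_continuation))
print(repr(without_continuation))

joined_is_single_line = "\n" not in with_continuation
split_has_newline = "\n" in without_continuation
print(joined_is_single_line, split_has_newline)
```

Printing `repr(cmd_finetuning.format(**train_params))` in the same way is a quick check that the whole command ends up on a single line.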
In [10]:
# Arguments for fine-tuning the pretrained model named by MODEL_TYPE.
#   evaluate_during_training is off, line_by_line is on, and
#   model_name_or_path points at the pretrained checkpoint.
train_params = create_params_modeling(output_dir=OUTPUT_DIR, 
                                        model_type=MODEL_TYPE,
                                        model_name_or_path=MODEL_TYPE,
                                        train_path=TRAIN_PATH, 
                                        eval_path=TEST_PATH, 
                                        do_train=True, 
                                        do_eval=True, 
                                        evaluate_during_training=False,
                                        line_by_line=True
                                        )

# Evaluation only: score the fine-tuned weights saved in OUTPUT_DIR on the validation set.
val_finetuning_params = create_params_modeling(output_dir=OUTPUT_DIR,
                                    model_name_or_path=OUTPUT_DIR,
                                    train_path=None, 
                                    eval_path=VAL_PATH,                                      
                                    do_train=False, 
                                    do_eval=True,
                                    line_by_line=True
                                    )

# Evaluation only: score the pretrained (not fine-tuned) model on the same validation set.
val_params = create_params_modeling(output_dir=OUTPUT_DIR,
                                    model_name_or_path=MODEL_TYPE,
                                    model_type=MODEL_TYPE,
                                    train_path=None, 
                                    eval_path=VAL_PATH,
                                    do_train=False, 
                                    do_eval=True,
                                    line_by_line=True
                                     )
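
The two evaluation-only configurations above (fine-tuned weights versus the pretrained checkpoint) are presumably meant to be compared on the validation set. A language-modeling evaluation loss is commonly reported as perplexity, the exponential of the mean cross-entropy loss. The sketch below shows that conversion; the loss values are placeholders chosen for illustration, not results from this notebook.

```python
import math


def perplexity(eval_loss):
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(eval_loss)


# Placeholder losses, NOT measured values from this run.
baseline_loss = 3.9
finetuned_loss = 3.2

print("baseline perplexity:  ", perplexity(baseline_loss))
print("fine-tuned perplexity:", perplexity(finetuned_loss))

# A NaN loss propagates: the resulting perplexity is NaN as well.
nan_perplexity = perplexity(float("nan"))
print(math.isnan(nan_perplexity))
```

The last line matters for any run whose training loss turns into `nan`: an evaluation of such weights cannot produce a meaningful perplexity.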

In [11]:
%run {cmd_finetuning.format(**train_params)}

06/30/2020 16:42:10 - INFO - transformers.training_args -   PyTorch: setting up devices
06/30/2020 16:42:10 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir='../../weights/gpt2/papers_milan/', overwrite_output_dir=True, do_train=True, do_eval=True, do_predict=False, evaluate_during_training=False, per_device_train_batch_size=1, per_device_eval_batch_size=1, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, learning_rate=5e-05, weight_decay=0.0, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, warmup_steps=0, logging_dir='runs/Jun30_16-42-10_Camilo-UbuntuPC', logging_first_step=False, logging_steps=2, save_steps=5000, save_total_limit=5, no_cuda=False, seed=42, fp16=True, fp16_opt_level='O1', local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, dataloader_drop_last=False)
06/30/2020 16:42:10 - INFO - transformers.configuration_utils -   loading configuration file https://s3.amazon

Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
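
The `loss_scale : dynamic` setting reported above means the loss is multiplied by a scale factor that is adjusted during training. The following is a simplified sketch of the general idea, not Apex's actual implementation. The class name and the constants (initial scale 2**16, factor 2, growth window of 2000 steps) are assumptions chosen for illustration.

```python
import math


class DynamicLossScaler:
    """Toy dynamic loss scaler (illustrative only, constants are assumed)."""

    def __init__(self, init_scale=2.0 ** 16, factor=2.0, window=2000):
        self.scale = init_scale
        self.factor = factor
        self.window = window
        self.good_steps = 0

    def update(self, grads):
        """Return True if the optimizer step should run, False if it is skipped."""
        overflow = any(math.isinf(g) or math.isnan(g) for g in grads)
        if overflow:
            # Skip the step and shrink the scale.
            self.scale /= self.factor
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps >= self.window:
            # After a long run of clean steps, try a larger scale again.
            self.scale *= self.factor
            self.good_steps = 0
        return True


scaler = DynamicLossScaler()
for _ in range(3):
    stepped = scaler.update([float("inf")])  # simulate an overflowing gradient
    print(stepped, scaler.scale)
```

Under these assumed constants, three consecutive overflows reduce the scale to 32768.0, 16384.0 and 8192.0, the same sequence as the first "Gradient overflow" messages in the log that follows.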


HBox(children=(FloatProgress(value=0.0, description='Epoch', max=3.0, style=ProgressStyle(description_width='i…

HBox(children=(FloatProgress(value=0.0, description='Iteration', max=3583.0, style=ProgressStyle(description_w…

06/30/2020 16:42:27 - INFO - transformers.trainer -   {'loss': 5.954214572906494, 'learning_rate': 4.9990696809005496e-05, 'epoch': 0.0005581914596706671, 'step': 2}


Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8192.0


06/30/2020 16:42:27 - INFO - transformers.trainer -   {'loss': 5.598135232925415, 'learning_rate': 4.998139361801098e-05, 'epoch': 0.0011163829193413341, 'step': 4}


Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2048.0


06/30/2020 16:42:27 - INFO - transformers.trainer -   {'loss': 4.98131251335144, 'learning_rate': 4.997209042701647e-05, 'epoch': 0.0016745743790120011, 'step': 6}
06/30/2020 16:42:27 - INFO - transformers.trainer -   {'loss': 5.079455852508545, 'learning_rate': 4.9962787236021956e-05, 'epoch': 0.0022327658386826683, 'step': 8}
06/30/2020 16:42:27 - INFO - transformers.trainer -   {'loss': 4.626350164413452, 'learning_rate': 4.995348404502745e-05, 'epoch': 0.0027909572983533353, 'step': 10}
06/30/2020 16:42:27 - INFO - transformers.trainer -   {'loss': 4.750042676925659, 'learning_rate': 4.9944180854032937e-05, 'epoch': 0.0033491487580240022, 'step': 12}
06/30/2020 16:42:27 - INFO - transformers.trainer -   {'loss': 5.311662912368774, 'learning_rate': 4.993487766303842e-05, 'epoch': 0.003907340217694669, 'step': 14}
06/30/2020 16:42:28 - INFO - transformers.trainer -   {'loss': 4.626871347427368, 'learning_rate': 4.992557447204391e-05, 'epoch': 0.004465531677365337, 'step': 16}
06/30/2

Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1024.0


06/30/2020 16:42:29 - INFO - transformers.trainer -   {'loss': 4.611403703689575, 'learning_rate': 4.984184575309331e-05, 'epoch': 0.009489254814401339, 'step': 34}
06/30/2020 16:42:29 - INFO - transformers.trainer -   {'loss': 4.471472501754761, 'learning_rate': 4.98325425620988e-05, 'epoch': 0.010047446274072006, 'step': 36}
06/30/2020 16:42:29 - INFO - transformers.trainer -   {'loss': 4.545317888259888, 'learning_rate': 4.982323937110429e-05, 'epoch': 0.010605637733742674, 'step': 38}
06/30/2020 16:42:29 - INFO - transformers.trainer -   {'loss': 5.127191543579102, 'learning_rate': 4.9813936180109785e-05, 'epoch': 0.011163829193413341, 'step': 40}
06/30/2020 16:42:30 - INFO - transformers.trainer -   {'loss': 4.553024649620056, 'learning_rate': 4.9804632989115265e-05, 'epoch': 0.011722020653084008, 'step': 42}
06/30/2020 16:42:30 - INFO - transformers.trainer -   {'loss': 4.434962749481201, 'learning_rate': 4.979532979812076e-05, 'epoch': 0.012280212112754674, 'step': 44}
06/30/202

06/30/2020 16:42:36 - INFO - transformers.trainer -   {'loss': 4.794435977935791, 'learning_rate': 4.937668620336776e-05, 'epoch': 0.03739882779793469, 'step': 134}
06/30/2020 16:42:36 - INFO - transformers.trainer -   {'loss': 3.6300346851348877, 'learning_rate': 4.9367383012373245e-05, 'epoch': 0.037957019257605355, 'step': 136}
06/30/2020 16:42:36 - INFO - transformers.trainer -   {'loss': 4.864150047302246, 'learning_rate': 4.935807982137873e-05, 'epoch': 0.038515210717276024, 'step': 138}
06/30/2020 16:42:36 - INFO - transformers.trainer -   {'loss': 4.246760010719299, 'learning_rate': 4.9348776630384225e-05, 'epoch': 0.039073402176946694, 'step': 140}
06/30/2020 16:42:37 - INFO - transformers.trainer -   {'loss': 4.197904586791992, 'learning_rate': 4.933947343938971e-05, 'epoch': 0.03963159363661736, 'step': 142}


Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 512.0


06/30/2020 16:42:37 - INFO - transformers.trainer -   {'loss': 4.079368591308594, 'learning_rate': 4.93301702483952e-05, 'epoch': 0.040189785096288025, 'step': 144}
06/30/2020 16:42:37 - INFO - transformers.trainer -   {'loss': 4.257288694381714, 'learning_rate': 4.9320867057400686e-05, 'epoch': 0.040747976555958694, 'step': 146}
06/30/2020 16:42:37 - INFO - transformers.trainer -   {'loss': 4.607589244842529, 'learning_rate': 4.931156386640618e-05, 'epoch': 0.04130616801562936, 'step': 148}
06/30/2020 16:42:37 - INFO - transformers.trainer -   {'loss': 4.624755382537842, 'learning_rate': 4.930226067541167e-05, 'epoch': 0.041864359475300025, 'step': 150}
06/30/2020 16:42:37 - INFO - transformers.trainer -   {'loss': 4.862242698669434, 'learning_rate': 4.929295748441716e-05, 'epoch': 0.042422550934970694, 'step': 152}
06/30/2020 16:42:37 - INFO - transformers.trainer -   {'loss': 5.036818027496338, 'learning_rate': 4.9283654293422646e-05, 'epoch': 0.042980742394641364, 'step': 154}
06/3

06/30/2020 16:42:44 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.8865010698669646e-05, 'epoch': 0.06809935807982138, 'step': 244}
06/30/2020 16:42:44 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.885570750767513e-05, 'epoch': 0.06865754953949205, 'step': 246}
06/30/2020 16:42:44 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.884640431668062e-05, 'epoch': 0.06921574099916271, 'step': 248}
06/30/2020 16:42:44 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.883710112568611e-05, 'epoch': 0.06977393245883338, 'step': 250}
06/30/2020 16:42:45 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.88277979346916e-05, 'epoch': 0.07033212391850405, 'step': 252}
06/30/2020 16:42:45 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.881849474369709e-05, 'epoch': 0.07089031537817471, 'step': 254}
06/30/2020 16:42:45 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.8809191

06/30/2020 16:42:52 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.835333519397153e-05, 'epoch': 0.09879988836170807, 'step': 354}
06/30/2020 16:42:52 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.834403200297703e-05, 'epoch': 0.09935807982137873, 'step': 356}
06/30/2020 16:42:52 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.8334728811982514e-05, 'epoch': 0.09991627128104939, 'step': 358}
06/30/2020 16:42:52 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.8325425620988e-05, 'epoch': 0.10047446274072007, 'step': 360}
06/30/2020 16:42:53 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.831612242999349e-05, 'epoch': 0.10103265420039073, 'step': 362}
06/30/2020 16:42:53 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.830681923899898e-05, 'epoch': 0.10159084566006141, 'step': 364}
06/30/2020 16:42:53 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.82975160

06/30/2020 16:43:00 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.784165968927342e-05, 'epoch': 0.12950041864359477, 'step': 464}
06/30/2020 16:43:00 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.7832356498278914e-05, 'epoch': 0.13005861010326542, 'step': 466}
06/30/2020 16:43:00 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.78230533072844e-05, 'epoch': 0.1306168015629361, 'step': 468}
06/30/2020 16:43:00 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.781375011628989e-05, 'epoch': 0.13117499302260677, 'step': 470}
06/30/2020 16:43:00 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.7804446925295375e-05, 'epoch': 0.13173318448227742, 'step': 472}
06/30/2020 16:43:01 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.779514373430087e-05, 'epoch': 0.1322913759419481, 'step': 474}
06/30/2020 16:43:01 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.77858405

06/30/2020 16:43:08 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.732998418457531e-05, 'epoch': 0.16020094892548145, 'step': 574}
06/30/2020 16:43:08 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.73206809935808e-05, 'epoch': 0.1607591403851521, 'step': 576}
06/30/2020 16:43:08 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.731137780258629e-05, 'epoch': 0.16131733184482278, 'step': 578}
06/30/2020 16:43:08 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.730207461159178e-05, 'epoch': 0.16187552330449345, 'step': 580}
06/30/2020 16:43:08 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.729277142059726e-05, 'epoch': 0.1624337147641641, 'step': 582}
06/30/2020 16:43:08 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.7283468229602756e-05, 'epoch': 0.16299190622383478, 'step': 584}
06/30/2020 16:43:09 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.727416503

06/30/2020 16:43:16 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.6818308679877196e-05, 'epoch': 0.19090147920736814, 'step': 684}
06/30/2020 16:43:16 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.680900548888269e-05, 'epoch': 0.19145967066703878, 'step': 686}
06/30/2020 16:43:16 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.6799702297888176e-05, 'epoch': 0.19201786212670946, 'step': 688}
06/30/2020 16:43:16 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.679039910689367e-05, 'epoch': 0.19257605358638014, 'step': 690}
06/30/2020 16:43:16 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.6781095915899157e-05, 'epoch': 0.19313424504605078, 'step': 692}
06/30/2020 16:43:16 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.677179272490464e-05, 'epoch': 0.19369243650572146, 'step': 694}
06/30/2020 16:43:16 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.6762

06/30/2020 16:43:23 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.630663317517909e-05, 'epoch': 0.22160200948925482, 'step': 794}
06/30/2020 16:43:24 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.629732998418458e-05, 'epoch': 0.22216020094892547, 'step': 796}
06/30/2020 16:43:24 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.6288026793190064e-05, 'epoch': 0.22271839240859614, 'step': 798}
06/30/2020 16:43:24 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.627872360219556e-05, 'epoch': 0.22327658386826682, 'step': 800}
06/30/2020 16:43:24 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.6269420411201044e-05, 'epoch': 0.2238347753279375, 'step': 802}
06/30/2020 16:43:24 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.626011722020653e-05, 'epoch': 0.22439296678760814, 'step': 804}
06/30/2020 16:43:24 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.625081

06/30/2020 16:43:31 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.579495767048098e-05, 'epoch': 0.2523025397711415, 'step': 904}
06/30/2020 16:43:31 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.5785654479486465e-05, 'epoch': 0.25286073123081215, 'step': 906}
06/30/2020 16:43:31 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.577635128849195e-05, 'epoch': 0.25341892269048283, 'step': 908}
06/30/2020 16:43:32 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.5767048097497445e-05, 'epoch': 0.2539771141501535, 'step': 910}
06/30/2020 16:43:32 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.575774490650293e-05, 'epoch': 0.2545353056098242, 'step': 912}
06/30/2020 16:43:32 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.5748441715508425e-05, 'epoch': 0.25509349706949486, 'step': 914}
06/30/2020 16:43:32 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.5739138

06/30/2020 16:43:39 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.5283282165782865e-05, 'epoch': 0.2830030700530282, 'step': 1014}
06/30/2020 16:43:39 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.527397897478836e-05, 'epoch': 0.28356126151269884, 'step': 1016}
06/30/2020 16:43:40 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.526467578379384e-05, 'epoch': 0.2841194529723695, 'step': 1018}
06/30/2020 16:43:40 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.525537259279933e-05, 'epoch': 0.2846776444320402, 'step': 1020}
06/30/2020 16:43:40 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.524606940180482e-05, 'epoch': 0.28523583589171086, 'step': 1022}
06/30/2020 16:43:40 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.523676621081031e-05, 'epoch': 0.28579402735138154, 'step': 1024}
06/30/2020 16:43:40 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.522

06/30/2020 16:43:47 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.477160666108475e-05, 'epoch': 0.3137036003349149, 'step': 1124}
06/30/2020 16:43:47 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.4762303470090246e-05, 'epoch': 0.3142617917945855, 'step': 1126}
06/30/2020 16:43:47 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.475300027909573e-05, 'epoch': 0.3148199832542562, 'step': 1128}
06/30/2020 16:43:47 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.474369708810122e-05, 'epoch': 0.3153781747139269, 'step': 1130}
06/30/2020 16:43:47 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.473439389710671e-05, 'epoch': 0.31593636617359755, 'step': 1132}
06/30/2020 16:43:48 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.47250907061122e-05, 'epoch': 0.3164945576332682, 'step': 1134}
06/30/2020 16:43:48 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.471578

06/30/2020 16:43:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.425993115638664e-05, 'epoch': 0.3444041306168016, 'step': 1234}
06/30/2020 16:43:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.4250627965392134e-05, 'epoch': 0.3449623220764722, 'step': 1236}
06/30/2020 16:43:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.424132477439762e-05, 'epoch': 0.3455205135361429, 'step': 1238}
06/30/2020 16:43:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.423202158340311e-05, 'epoch': 0.34607870499581356, 'step': 1240}
06/30/2020 16:43:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.4222718392408594e-05, 'epoch': 0.34663689645548423, 'step': 1242}
06/30/2020 16:43:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.421341520141409e-05, 'epoch': 0.3471950879151549, 'step': 1244}
06/30/2020 16:43:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.420

06/30/2020 16:44:02 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.374825565168853e-05, 'epoch': 0.37510466089868827, 'step': 1344}
06/30/2020 16:44:02 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.373895246069402e-05, 'epoch': 0.3756628523583589, 'step': 1346}
06/30/2020 16:44:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.372964926969951e-05, 'epoch': 0.37622104381802957, 'step': 1348}
06/30/2020 16:44:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.3720346078705e-05, 'epoch': 0.37677923527770024, 'step': 1350}
06/30/2020 16:44:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.371104288771048e-05, 'epoch': 0.3773374267373709, 'step': 1352}
06/30/2020 16:44:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.3701739696715975e-05, 'epoch': 0.3778956181970416, 'step': 1354}
06/30/2020 16:44:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.36924

06/30/2020 16:44:10 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.3236580146990415e-05, 'epoch': 0.40580519118057495, 'step': 1454}
06/30/2020 16:44:10 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.322727695599591e-05, 'epoch': 0.40636338264024563, 'step': 1456}
06/30/2020 16:44:10 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.3217973765001396e-05, 'epoch': 0.40692157409991625, 'step': 1458}
06/30/2020 16:44:10 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.320867057400689e-05, 'epoch': 0.4074797655595869, 'step': 1460}
06/30/2020 16:44:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.3199367383012376e-05, 'epoch': 0.4080379570192576, 'step': 1462}
06/30/2020 16:44:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.319006419201786e-05, 'epoch': 0.4085961484789283, 'step': 1464}
06/30/2020 16:44:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.3

06/30/2020 16:44:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.273420783328682e-05, 'epoch': 0.43594753000279096, 'step': 1562}
06/30/2020 16:44:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.272490464229231e-05, 'epoch': 0.43650572146246164, 'step': 1564}
06/30/2020 16:44:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.2715601451297797e-05, 'epoch': 0.4370639129221323, 'step': 1566}
06/30/2020 16:44:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.270629826030328e-05, 'epoch': 0.43762210438180293, 'step': 1568}
06/30/2020 16:44:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.269699506930878e-05, 'epoch': 0.4381802958414736, 'step': 1570}
06/30/2020 16:44:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.2687691878314264e-05, 'epoch': 0.4387384873011443, 'step': 1572}
06/30/2020 16:44:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.26

06/30/2020 16:44:25 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.222253232858871e-05, 'epoch': 0.46664806028467765, 'step': 1672}
06/30/2020 16:44:25 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.22132291375942e-05, 'epoch': 0.4672062517443483, 'step': 1674}
06/30/2020 16:44:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.2203925946599684e-05, 'epoch': 0.467764443204019, 'step': 1676}
06/30/2020 16:44:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.219462275560517e-05, 'epoch': 0.4683226346636896, 'step': 1678}
06/30/2020 16:44:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.2185319564610664e-05, 'epoch': 0.4688808261233603, 'step': 1680}
06/30/2020 16:44:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.217601637361615e-05, 'epoch': 0.46943901758303097, 'step': 1682}
06/30/2020 16:44:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.21667

06/30/2020 16:44:33 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.17108568238906e-05, 'epoch': 0.49734859056656433, 'step': 1782}
06/30/2020 16:44:33 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.1701553632896085e-05, 'epoch': 0.497906782026235, 'step': 1784}
06/30/2020 16:44:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.169225044190158e-05, 'epoch': 0.4984649734859057, 'step': 1786}
06/30/2020 16:44:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.1682947250907065e-05, 'epoch': 0.49902316494557636, 'step': 1788}
06/30/2020 16:44:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.167364405991255e-05, 'epoch': 0.499581356405247, 'step': 1790}
06/30/2020 16:44:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.166434086891804e-05, 'epoch': 0.5001395478649177, 'step': 1792}
06/30/2020 16:44:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.165503

06/30/2020 16:44:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.1199181319192486e-05, 'epoch': 0.528049120848451, 'step': 1892}
06/30/2020 16:44:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.118987812819797e-05, 'epoch': 0.5286073123081216, 'step': 1894}
06/30/2020 16:44:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.1180574937203466e-05, 'epoch': 0.5291655037677924, 'step': 1896}
06/30/2020 16:44:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.117127174620895e-05, 'epoch': 0.529723695227463, 'step': 1898}
06/30/2020 16:44:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.116196855521444e-05, 'epoch': 0.5302818866871337, 'step': 1900}
06/30/2020 16:44:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.1152665364219926e-05, 'epoch': 0.5308400781468043, 'step': 1902}
06/30/2020 16:44:42 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.114336

06/30/2020 16:44:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.068750581449437e-05, 'epoch': 0.5587496511303377, 'step': 2002}
06/30/2020 16:44:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.067820262349986e-05, 'epoch': 0.5593078425900083, 'step': 2004}
06/30/2020 16:44:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.0668899432505353e-05, 'epoch': 0.559866034049679, 'step': 2006}
06/30/2020 16:44:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.065959624151084e-05, 'epoch': 0.5604242255093497, 'step': 2008}
06/30/2020 16:44:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.0650293050516334e-05, 'epoch': 0.5609824169690204, 'step': 2010}
06/30/2020 16:44:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.0640989859521814e-05, 'epoch': 0.561540608428691, 'step': 2012}
06/30/2020 16:44:50 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.063168

06/30/2020 16:44:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.017583030979627e-05, 'epoch': 0.5894501814122244, 'step': 2112}
06/30/2020 16:44:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.016652711880175e-05, 'epoch': 0.5900083728718951, 'step': 2114}
06/30/2020 16:44:57 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.015722392780724e-05, 'epoch': 0.5905665643315657, 'step': 2116}
06/30/2020 16:44:57 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.014792073681273e-05, 'epoch': 0.5911247557912364, 'step': 2118}
06/30/2020 16:44:57 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.013861754581822e-05, 'epoch': 0.5916829472509071, 'step': 2120}
06/30/2020 16:44:57 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.012931435482371e-05, 'epoch': 0.5922411387105777, 'step': 2122}
06/30/2020 16:44:57 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.0120011

06/30/2020 16:45:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.9664154805098155e-05, 'epoch': 0.6201507116941111, 'step': 2222}
06/30/2020 16:45:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.965485161410364e-05, 'epoch': 0.6207089031537818, 'step': 2224}
06/30/2020 16:45:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.964554842310913e-05, 'epoch': 0.6212670946134524, 'step': 2226}
06/30/2020 16:45:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.9636245232114615e-05, 'epoch': 0.621825286073123, 'step': 2228}
06/30/2020 16:45:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.962694204112011e-05, 'epoch': 0.6223834775327938, 'step': 2230}
06/30/2020 16:45:05 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.9617638850125596e-05, 'epoch': 0.6229416689924644, 'step': 2232}
06/30/2020 16:45:05 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.96083

06/30/2020 16:45:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.915247930040004e-05, 'epoch': 0.6508512419759978, 'step': 2332}
06/30/2020 16:45:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.914317610940553e-05, 'epoch': 0.6514094334356685, 'step': 2334}
06/30/2020 16:45:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.9133872918411016e-05, 'epoch': 0.6519676248953391, 'step': 2336}
06/30/2020 16:45:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.91245697274165e-05, 'epoch': 0.6525258163550097, 'step': 2338}
06/30/2020 16:45:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.9115266536421996e-05, 'epoch': 0.6530840078146805, 'step': 2340}
06/30/2020 16:45:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.910596334542748e-05, 'epoch': 0.6536421992743511, 'step': 2342}
06/30/2020 16:45:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.909666

06/30/2020 16:45:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.864080379570193e-05, 'epoch': 0.6815517722578844, 'step': 2442}
06/30/2020 16:45:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.863150060470742e-05, 'epoch': 0.6821099637175552, 'step': 2444}
06/30/2020 16:45:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.862219741371291e-05, 'epoch': 0.6826681551772258, 'step': 2446}
06/30/2020 16:45:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.861289422271839e-05, 'epoch': 0.6832263466368964, 'step': 2448}
06/30/2020 16:45:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.8603591031723884e-05, 'epoch': 0.6837845380965671, 'step': 2450}
06/30/2020 16:45:20 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.859428784072937e-05, 'epoch': 0.6843427295562378, 'step': 2452}
06/30/2020 16:45:20 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.858498

06/30/2020 16:45:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.812912829100382e-05, 'epoch': 0.7122523025397711, 'step': 2552}
06/30/2020 16:45:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.8119825100009304e-05, 'epoch': 0.7128104939994419, 'step': 2554}
06/30/2020 16:45:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.81105219090148e-05, 'epoch': 0.7133686854591125, 'step': 2556}
06/30/2020 16:45:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.8101218718020285e-05, 'epoch': 0.7139268769187831, 'step': 2558}
06/30/2020 16:45:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.809191552702577e-05, 'epoch': 0.7144850683784538, 'step': 2560}
06/30/2020 16:45:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.808261233603126e-05, 'epoch': 0.7150432598381244, 'step': 2562}
06/30/2020 16:45:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.807330

06/30/2020 16:45:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.7617452786305705e-05, 'epoch': 0.7429528328216578, 'step': 2662}
06/30/2020 16:45:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.760814959531119e-05, 'epoch': 0.7435110242813285, 'step': 2664}
06/30/2020 16:45:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.7598846404316685e-05, 'epoch': 0.7440692157409992, 'step': 2666}
06/30/2020 16:45:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.758954321332217e-05, 'epoch': 0.7446274072006698, 'step': 2668}
06/30/2020 16:45:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.758024002232766e-05, 'epoch': 0.7451855986603405, 'step': 2670}
06/30/2020 16:45:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.7570936831333146e-05, 'epoch': 0.7457437901200111, 'step': 2672}
06/30/2020 16:45:35 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.7561

06/30/2020 16:45:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.710577728160759e-05, 'epoch': 0.7736533631035445, 'step': 2772}
06/30/2020 16:45:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.709647409061308e-05, 'epoch': 0.7742115545632152, 'step': 2774}
06/30/2020 16:45:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.708717089961857e-05, 'epoch': 0.7747697460228858, 'step': 2776}
06/30/2020 16:45:42 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.707786770862406e-05, 'epoch': 0.7753279374825565, 'step': 2778}
06/30/2020 16:45:42 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.706856451762955e-05, 'epoch': 0.7758861289422272, 'step': 2780}
06/30/2020 16:45:42 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.705926132663503e-05, 'epoch': 0.7764443204018978, 'step': 2782}
06/30/2020 16:45:42 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.7049958

06/30/2020 16:45:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.659410177690949e-05, 'epoch': 0.8043538933854312, 'step': 2882}
06/30/2020 16:45:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.658479858591497e-05, 'epoch': 0.8049120848451019, 'step': 2884}
06/30/2020 16:45:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.657549539492046e-05, 'epoch': 0.8054702763047725, 'step': 2886}
06/30/2020 16:45:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.656619220392595e-05, 'epoch': 0.8060284677644433, 'step': 2888}
06/30/2020 16:45:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.655688901293144e-05, 'epoch': 0.8065866592241139, 'step': 2890}
06/30/2020 16:45:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.654758582193693e-05, 'epoch': 0.8071448506837845, 'step': 2892}
06/30/2020 16:45:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.6538282

06/30/2020 16:45:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.6082426272211374e-05, 'epoch': 0.8350544236673179, 'step': 2992}
06/30/2020 16:45:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.607312308121686e-05, 'epoch': 0.8356126151269886, 'step': 2994}
06/30/2020 16:45:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.606381989022235e-05, 'epoch': 0.8361708065866592, 'step': 2996}
06/30/2020 16:45:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.6054516699227835e-05, 'epoch': 0.8367289980463299, 'step': 2998}
06/30/2020 16:45:57 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.604521350823333e-05, 'epoch': 0.8372871895060006, 'step': 3000}
06/30/2020 16:45:57 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.6035910317238815e-05, 'epoch': 0.8378453809656712, 'step': 3002}
06/30/2020 16:45:57 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.6026

06/30/2020 16:46:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.557075076751326e-05, 'epoch': 0.8657549539492045, 'step': 3102}
06/30/2020 16:46:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.556144757651875e-05, 'epoch': 0.8663131454088753, 'step': 3104}
06/30/2020 16:46:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.5552144385524236e-05, 'epoch': 0.8668713368685459, 'step': 3106}
06/30/2020 16:46:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.554284119452972e-05, 'epoch': 0.8674295283282166, 'step': 3108}
06/30/2020 16:46:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.5533538003535216e-05, 'epoch': 0.8679877197878872, 'step': 3110}
06/30/2020 16:46:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.55242348125407e-05, 'epoch': 0.8685459112475579, 'step': 3112}
06/30/2020 16:46:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.551493

06/30/2020 16:46:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.505907526281515e-05, 'epoch': 0.8964554842310912, 'step': 3212}
06/30/2020 16:46:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.5049772071820636e-05, 'epoch': 0.897013675690762, 'step': 3214}
06/30/2020 16:46:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.504046888082613e-05, 'epoch': 0.8975718671504326, 'step': 3216}
06/30/2020 16:46:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.503116568983161e-05, 'epoch': 0.8981300586101033, 'step': 3218}
06/30/2020 16:46:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.50218624988371e-05, 'epoch': 0.8986882500697739, 'step': 3220}
06/30/2020 16:46:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.501255930784259e-05, 'epoch': 0.8992464415294446, 'step': 3222}
06/30/2020 16:46:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.50032561

06/30/2020 16:46:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.454739975811704e-05, 'epoch': 0.9271560145129779, 'step': 3322}
06/30/2020 16:46:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.4538096567122524e-05, 'epoch': 0.9277142059726486, 'step': 3324}
06/30/2020 16:46:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.452879337612802e-05, 'epoch': 0.9282723974323193, 'step': 3326}
06/30/2020 16:46:20 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.4519490185133504e-05, 'epoch': 0.92883058889199, 'step': 3328}
06/30/2020 16:46:20 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.451018699413899e-05, 'epoch': 0.9293887803516606, 'step': 3330}
06/30/2020 16:46:20 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.450088380314448e-05, 'epoch': 0.9299469718113312, 'step': 3332}
06/30/2020 16:46:20 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.4491580

06/30/2020 16:46:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.4035724253418925e-05, 'epoch': 0.9578565447948646, 'step': 3432}
06/30/2020 16:46:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.402642106242441e-05, 'epoch': 0.9584147362545353, 'step': 3434}
06/30/2020 16:46:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.4017117871429905e-05, 'epoch': 0.958972927714206, 'step': 3436}
06/30/2020 16:46:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.400781468043539e-05, 'epoch': 0.9595311191738767, 'step': 3438}
06/30/2020 16:46:28 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.3998511489440885e-05, 'epoch': 0.9600893106335473, 'step': 3440}
06/30/2020 16:46:28 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.3989208298446365e-05, 'epoch': 0.9606475020932179, 'step': 3442}
06/30/2020 16:46:28 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.3979

06/30/2020 16:46:35 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.352404874872081e-05, 'epoch': 0.9885570750767513, 'step': 3542}
06/30/2020 16:46:35 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.35147455577263e-05, 'epoch': 0.989115266536422, 'step': 3544}
06/30/2020 16:46:35 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.350544236673179e-05, 'epoch': 0.9896734579960926, 'step': 3546}
06/30/2020 16:46:35 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.349613917573728e-05, 'epoch': 0.9902316494557634, 'step': 3548}
06/30/2020 16:46:35 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.348683598474277e-05, 'epoch': 0.990789840915434, 'step': 3550}
06/30/2020 16:46:35 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.347753279374826e-05, 'epoch': 0.9913480323751047, 'step': 3552}
06/30/2020 16:46:36 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.3468229602





06/30/2020 16:46:38 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.332868173783608e-05, 'epoch': 1.0002790957298353, 'step': 3584}
06/30/2020 16:46:38 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.331937854684157e-05, 'epoch': 1.000837287189506, 'step': 3586}
06/30/2020 16:46:38 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.331007535584706e-05, 'epoch': 1.0013954786491768, 'step': 3588}
06/30/2020 16:46:38 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.330077216485254e-05, 'epoch': 1.0019536701088474, 'step': 3590}
06/30/2020 16:46:38 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.3291468973858036e-05, 'epoch': 1.002511861568518, 'step': 3592}
06/30/2020 16:46:38 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.328216578286352e-05, 'epoch': 1.0030700530281886, 'step': 3594}
06/30/2020 16:46:39 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.32728625

06/30/2020 16:46:46 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.281700623313797e-05, 'epoch': 1.030979626011722, 'step': 3694}
06/30/2020 16:46:46 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.2807703042143456e-05, 'epoch': 1.0315378174713927, 'step': 3696}
06/30/2020 16:46:46 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.279839985114895e-05, 'epoch': 1.0320960089310633, 'step': 3698}
06/30/2020 16:46:46 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.2789096660154436e-05, 'epoch': 1.032654200390734, 'step': 3700}
06/30/2020 16:46:46 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.277979346915992e-05, 'epoch': 1.0332123918504046, 'step': 3702}
06/30/2020 16:46:46 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.277049027816541e-05, 'epoch': 1.0337705833100754, 'step': 3704}
06/30/2020 16:46:46 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.2761187

06/30/2020 16:46:54 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.230533072843986e-05, 'epoch': 1.0616801562936087, 'step': 3804}
06/30/2020 16:46:54 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.2296027537445344e-05, 'epoch': 1.0622383477532793, 'step': 3806}
06/30/2020 16:46:54 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.228672434645084e-05, 'epoch': 1.0627965392129501, 'step': 3808}
06/30/2020 16:46:54 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.2277421155456324e-05, 'epoch': 1.0633547306726208, 'step': 3810}
06/30/2020 16:46:54 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.226811796446181e-05, 'epoch': 1.0639129221322914, 'step': 3812}
06/30/2020 16:46:54 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.22588147734673e-05, 'epoch': 1.064471113591962, 'step': 3814}
06/30/2020 16:46:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.2249511

06/30/2020 16:47:02 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.1793655223741744e-05, 'epoch': 1.0923806865754955, 'step': 3914}
06/30/2020 16:47:02 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.178435203274723e-05, 'epoch': 1.092938878035166, 'step': 3916}
06/30/2020 16:47:02 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.1775048841752725e-05, 'epoch': 1.0934970694948367, 'step': 3918}
06/30/2020 16:47:02 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.176574565075821e-05, 'epoch': 1.0940552609545073, 'step': 3920}
06/30/2020 16:47:02 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.1756442459763705e-05, 'epoch': 1.094613452414178, 'step': 3922}
06/30/2020 16:47:02 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.1747139268769185e-05, 'epoch': 1.0951716438738488, 'step': 3924}
06/30/2020 16:47:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.17378

06/30/2020 16:47:09 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.128197971904364e-05, 'epoch': 1.123081216857382, 'step': 4024}
06/30/2020 16:47:10 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.127267652804912e-05, 'epoch': 1.1236394083170527, 'step': 4026}
06/30/2020 16:47:10 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.126337333705461e-05, 'epoch': 1.1241975997767235, 'step': 4028}
06/30/2020 16:47:10 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.12540701460601e-05, 'epoch': 1.1247557912363941, 'step': 4030}
06/30/2020 16:47:10 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.124476695506559e-05, 'epoch': 1.1253139826960648, 'step': 4032}
06/30/2020 16:47:10 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.123546376407108e-05, 'epoch': 1.1258721741557354, 'step': 4034}
06/30/2020 16:47:10 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.122616057

06/30/2020 16:47:17 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.0770304214345526e-05, 'epoch': 1.1537817471392688, 'step': 4134}
06/30/2020 16:47:17 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.076100102335101e-05, 'epoch': 1.1543399385989395, 'step': 4136}
06/30/2020 16:47:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.07516978323565e-05, 'epoch': 1.15489813005861, 'step': 4138}
06/30/2020 16:47:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.0742394641361986e-05, 'epoch': 1.1554563215182807, 'step': 4140}
06/30/2020 16:47:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.073309145036748e-05, 'epoch': 1.1560145129779515, 'step': 4142}
06/30/2020 16:47:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.072378825937297e-05, 'epoch': 1.1565727044376222, 'step': 4144}
06/30/2020 16:47:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.07144850

06/30/2020 16:47:25 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.0258628709647414e-05, 'epoch': 1.1844822774211554, 'step': 4244}
06/30/2020 16:47:25 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.02493255186529e-05, 'epoch': 1.185040468880826, 'step': 4246}
06/30/2020 16:47:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.024002232765839e-05, 'epoch': 1.1855986603404969, 'step': 4248}
06/30/2020 16:47:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.0230719136663877e-05, 'epoch': 1.1861568518001675, 'step': 4250}
06/30/2020 16:47:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.0221415945669368e-05, 'epoch': 1.1867150432598381, 'step': 4252}
06/30/2020 16:47:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.0212112754674854e-05, 'epoch': 1.1872732347195087, 'step': 4254}
06/30/2020 16:47:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.02028

06/30/2020 16:47:33 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.97469532049493e-05, 'epoch': 1.2151828077030422, 'step': 4354}
06/30/2020 16:47:33 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.9737650013954788e-05, 'epoch': 1.2157409991627128, 'step': 4356}
06/30/2020 16:47:33 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.9728346822960278e-05, 'epoch': 1.2162991906223835, 'step': 4358}
06/30/2020 16:47:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.9719043631965765e-05, 'epoch': 1.216857382082054, 'step': 4360}
06/30/2020 16:47:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.9709740440971255e-05, 'epoch': 1.2174155735417247, 'step': 4362}
06/30/2020 16:47:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.9700437249976742e-05, 'epoch': 1.2179737650013955, 'step': 4364}
06/30/2020 16:47:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.9691

06/30/2020 16:47:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.923527770025119e-05, 'epoch': 1.2458833379849288, 'step': 4464}
06/30/2020 16:47:42 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.9225974509256676e-05, 'epoch': 1.2464415294445994, 'step': 4466}
06/30/2020 16:47:42 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.921667131826217e-05, 'epoch': 1.2469997209042702, 'step': 4468}
06/30/2020 16:47:42 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.9207368127267652e-05, 'epoch': 1.2475579123639409, 'step': 4470}
06/30/2020 16:47:42 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.9198064936273146e-05, 'epoch': 1.2481161038236115, 'step': 4472}
06/30/2020 16:47:42 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.918876174527863e-05, 'epoch': 1.248674295283282, 'step': 4474}
06/30/2020 16:47:42 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.91794

06/30/2020 16:47:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.872360219555308e-05, 'epoch': 1.2765838682668156, 'step': 4574}
06/30/2020 16:47:50 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.8714299004558563e-05, 'epoch': 1.2771420597264862, 'step': 4576}
06/30/2020 16:47:50 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.8704995813564057e-05, 'epoch': 1.2777002511861568, 'step': 4578}
06/30/2020 16:47:50 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.8695692622569543e-05, 'epoch': 1.2782584426458274, 'step': 4580}
06/30/2020 16:47:50 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.8686389431575034e-05, 'epoch': 1.2788166341054983, 'step': 4582}
06/30/2020 16:47:50 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.867708624058052e-05, 'epoch': 1.279374825565169, 'step': 4584}
06/30/2020 16:47:50 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.8667

06/30/2020 16:47:57 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.8211926690854967e-05, 'epoch': 1.3072843985487022, 'step': 4684}
06/30/2020 16:47:57 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.8202623499860454e-05, 'epoch': 1.3078425900083728, 'step': 4686}
06/30/2020 16:47:58 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.8193320308865944e-05, 'epoch': 1.3084007814680436, 'step': 4688}
06/30/2020 16:47:58 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.818401711787143e-05, 'epoch': 1.3089589729277142, 'step': 4690}
06/30/2020 16:47:58 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.817471392687692e-05, 'epoch': 1.3095171643873849, 'step': 4692}
06/30/2020 16:47:58 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.8165410735882408e-05, 'epoch': 1.3100753558470555, 'step': 4694}
06/30/2020 16:47:58 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.815

06/30/2020 16:48:05 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.7700251186156855e-05, 'epoch': 1.337984928830589, 'step': 4794}
06/30/2020 16:48:05 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.769094799516234e-05, 'epoch': 1.3385431202902596, 'step': 4796}
06/30/2020 16:48:06 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.7681644804167835e-05, 'epoch': 1.3391013117499302, 'step': 4798}
06/30/2020 16:48:06 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.767234161317332e-05, 'epoch': 1.3396595032096008, 'step': 4800}
06/30/2020 16:48:06 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.7663038422178812e-05, 'epoch': 1.3402176946692714, 'step': 4802}
06/30/2020 16:48:06 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.7653735231184295e-05, 'epoch': 1.3407758861289423, 'step': 4804}
06/30/2020 16:48:06 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.7644

06/30/2020 16:48:13 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.718857568145874e-05, 'epoch': 1.3686854591124755, 'step': 4904}
06/30/2020 16:48:13 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.717927249046423e-05, 'epoch': 1.3692436505721464, 'step': 4906}
06/30/2020 16:48:14 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.7169969299469723e-05, 'epoch': 1.369801842031817, 'step': 4908}
06/30/2020 16:48:14 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.716066610847521e-05, 'epoch': 1.3703600334914876, 'step': 4910}
06/30/2020 16:48:14 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.71513629174807e-05, 'epoch': 1.3709182249511582, 'step': 4912}
06/30/2020 16:48:14 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.7142059726486186e-05, 'epoch': 1.3714764164108288, 'step': 4914}
06/30/2020 16:48:14 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.7132756

06/30/2020 16:48:22 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.6723416131733186e-05, 'epoch': 1.396595032096009, 'step': 5004}
06/30/2020 16:48:22 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.6714112940738673e-05, 'epoch': 1.3971532235556796, 'step': 5006}
06/30/2020 16:48:22 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.6704809749744163e-05, 'epoch': 1.3977114150153502, 'step': 5008}
06/30/2020 16:48:22 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.669550655874965e-05, 'epoch': 1.3982696064750209, 'step': 5010}
06/30/2020 16:48:23 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.6686203367755143e-05, 'epoch': 1.3988277979346915, 'step': 5012}
06/30/2020 16:48:23 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.6676900176760626e-05, 'epoch': 1.3993859893943623, 'step': 5014}
06/30/2020 16:48:23 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.666

06/30/2020 16:48:30 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.6211740627035077e-05, 'epoch': 1.4272955623778956, 'step': 5114}
06/30/2020 16:48:31 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.620243743604056e-05, 'epoch': 1.4278537538375664, 'step': 5116}
06/30/2020 16:48:31 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.6193134245046054e-05, 'epoch': 1.428411945297237, 'step': 5118}
06/30/2020 16:48:31 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.6183831054051537e-05, 'epoch': 1.4289701367569076, 'step': 5120}
06/30/2020 16:48:31 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.617452786305703e-05, 'epoch': 1.4295283282165783, 'step': 5122}
06/30/2020 16:48:31 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.6165224672062517e-05, 'epoch': 1.430086519676249, 'step': 5124}
06/30/2020 16:48:31 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.61559

06/30/2020 16:48:39 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.5700065122336964e-05, 'epoch': 1.4579960926597824, 'step': 5224}
06/30/2020 16:48:39 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.569076193134245e-05, 'epoch': 1.458554284119453, 'step': 5226}
06/30/2020 16:48:39 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.568145874034794e-05, 'epoch': 1.4591124755791236, 'step': 5228}
06/30/2020 16:48:39 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.5672155549353428e-05, 'epoch': 1.4596706670387944, 'step': 5230}
06/30/2020 16:48:39 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.5662852358358918e-05, 'epoch': 1.460228858498465, 'step': 5232}
06/30/2020 16:48:39 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.5653549167364405e-05, 'epoch': 1.4607870499581357, 'step': 5234}
06/30/2020 16:48:39 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.56442

06/30/2020 16:48:46 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.5188389617638852e-05, 'epoch': 1.488696622941669, 'step': 5334}
06/30/2020 16:48:47 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.517908642664434e-05, 'epoch': 1.4892548144013396, 'step': 5336}
06/30/2020 16:48:47 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.516978323564983e-05, 'epoch': 1.4898130058610104, 'step': 5338}
06/30/2020 16:48:47 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.5160480044655315e-05, 'epoch': 1.490371197320681, 'step': 5340}
06/30/2020 16:48:47 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.5151176853660806e-05, 'epoch': 1.4909293887803516, 'step': 5342}
06/30/2020 16:48:47 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.5141873662666292e-05, 'epoch': 1.4914875802400223, 'step': 5344}
06/30/2020 16:48:47 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.51325

06/30/2020 16:48:54 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.467671411294074e-05, 'epoch': 1.5193971532235557, 'step': 5444}
06/30/2020 16:48:54 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.466741092194623e-05, 'epoch': 1.5199553446832264, 'step': 5446}
06/30/2020 16:48:54 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.465810773095172e-05, 'epoch': 1.520513536142897, 'step': 5448}
06/30/2020 16:48:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.4648804539957206e-05, 'epoch': 1.5210717276025676, 'step': 5450}
06/30/2020 16:48:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.4639501348962697e-05, 'epoch': 1.5216299190622382, 'step': 5452}
06/30/2020 16:48:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.4630198157968183e-05, 'epoch': 1.522188110521909, 'step': 5454}
06/30/2020 16:48:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.462089

06/30/2020 16:49:02 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.416503860824263e-05, 'epoch': 1.5500976835054425, 'step': 5554}
06/30/2020 16:49:02 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.4155735417248117e-05, 'epoch': 1.5506558749651131, 'step': 5556}
06/30/2020 16:49:02 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.4146432226253607e-05, 'epoch': 1.5512140664247838, 'step': 5558}
06/30/2020 16:49:02 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.4137129035259097e-05, 'epoch': 1.5517722578844544, 'step': 5560}
06/30/2020 16:49:02 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.4127825844264584e-05, 'epoch': 1.552330449344125, 'step': 5562}
06/30/2020 16:49:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.4118522653270074e-05, 'epoch': 1.5528886408037956, 'step': 5564}
06/30/2020 16:49:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.410

06/30/2020 16:49:10 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.3653363103544518e-05, 'epoch': 1.580798213787329, 'step': 5664}
06/30/2020 16:49:10 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.3644059912550008e-05, 'epoch': 1.5813564052469997, 'step': 5666}
06/30/2020 16:49:10 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.3634756721555495e-05, 'epoch': 1.5819145967066703, 'step': 5668}
06/30/2020 16:49:10 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.3625453530560985e-05, 'epoch': 1.5824727881663412, 'step': 5670}
06/30/2020 16:49:10 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.361615033956647e-05, 'epoch': 1.5830309796260118, 'step': 5672}
06/30/2020 16:49:10 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.3606847148571962e-05, 'epoch': 1.5835891710856824, 'step': 5674}
06/30/2020 16:49:10 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.359

06/30/2020 16:49:17 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.3141687598846405e-05, 'epoch': 1.6114987440692157, 'step': 5774}
06/30/2020 16:49:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.3132384407851895e-05, 'epoch': 1.6120569355288863, 'step': 5776}
06/30/2020 16:49:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.3123081216857386e-05, 'epoch': 1.612615126988557, 'step': 5778}
06/30/2020 16:49:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.3113778025862872e-05, 'epoch': 1.6131733184482278, 'step': 5780}
06/30/2020 16:49:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.3104474834868363e-05, 'epoch': 1.6137315099078984, 'step': 5782}
06/30/2020 16:49:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.309517164387385e-05, 'epoch': 1.6142897013675692, 'step': 5784}
06/30/2020 16:49:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.308

06/30/2020 16:49:25 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.2630012094148296e-05, 'epoch': 1.6421992743511025, 'step': 5884}
06/30/2020 16:49:25 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.2620708903153783e-05, 'epoch': 1.642757465810773, 'step': 5886}
06/30/2020 16:49:25 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.2611405712159273e-05, 'epoch': 1.6433156572704437, 'step': 5888}
06/30/2020 16:49:25 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.260210252116476e-05, 'epoch': 1.6438738487301143, 'step': 5890}
06/30/2020 16:49:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.259279933017025e-05, 'epoch': 1.644432040189785, 'step': 5892}
06/30/2020 16:49:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.258349613917574e-05, 'epoch': 1.6449902316494558, 'step': 5894}
06/30/2020 16:49:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.257419

06/30/2020 16:49:33 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.2118336589450184e-05, 'epoch': 1.6728998046329893, 'step': 5994}
06/30/2020 16:49:33 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.2109033398455674e-05, 'epoch': 1.6734579960926599, 'step': 5996}
06/30/2020 16:49:33 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.209973020746116e-05, 'epoch': 1.6740161875523305, 'step': 5998}
06/30/2020 16:49:33 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.209042701646665e-05, 'epoch': 1.6745743790120011, 'step': 6000}
06/30/2020 16:49:33 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.2081123825472138e-05, 'epoch': 1.6751325704716717, 'step': 6002}
06/30/2020 16:49:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.2071820634477628e-05, 'epoch': 1.6756907619313424, 'step': 6004}
06/30/2020 16:49:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.206

06/30/2020 16:49:40 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.160666108475207e-05, 'epoch': 1.7036003349148758, 'step': 6104}
06/30/2020 16:49:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.159735789375756e-05, 'epoch': 1.7041585263745465, 'step': 6106}
06/30/2020 16:49:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.1588054702763048e-05, 'epoch': 1.7047167178342173, 'step': 6108}
06/30/2020 16:49:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.157875151176854e-05, 'epoch': 1.705274909293888, 'step': 6110}
06/30/2020 16:49:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.156944832077403e-05, 'epoch': 1.7058331007535585, 'step': 6112}
06/30/2020 16:49:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.1560145129779515e-05, 'epoch': 1.7063912922132292, 'step': 6114}
06/30/2020 16:49:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.155084

06/30/2020 16:49:48 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.1094985580053962e-05, 'epoch': 1.7343008651967624, 'step': 6214}
06/30/2020 16:49:48 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.108568238905945e-05, 'epoch': 1.734859056656433, 'step': 6216}
06/30/2020 16:49:48 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.107637919806494e-05, 'epoch': 1.7354172481161039, 'step': 6218}
06/30/2020 16:49:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.1067076007070426e-05, 'epoch': 1.7359754395757745, 'step': 6220}
06/30/2020 16:49:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.1057772816075916e-05, 'epoch': 1.7365336310354451, 'step': 6222}
06/30/2020 16:49:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.1048469625081406e-05, 'epoch': 1.737091822495116, 'step': 6224}
06/30/2020 16:49:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.10391

06/30/2020 16:49:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.0583310075355846e-05, 'epoch': 1.7650013954786492, 'step': 6324}
06/30/2020 16:49:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.0574006884361336e-05, 'epoch': 1.7655595869383198, 'step': 6326}
06/30/2020 16:49:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.0564703693366827e-05, 'epoch': 1.7661177783979904, 'step': 6328}
06/30/2020 16:49:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.0555400502372317e-05, 'epoch': 1.766675969857661, 'step': 6330}
06/30/2020 16:49:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.0546097311377804e-05, 'epoch': 1.7672341613173317, 'step': 6332}
06/30/2020 16:49:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.0536794120383294e-05, 'epoch': 1.7677923527770025, 'step': 6334}
06/30/2020 16:49:57 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.05

06/30/2020 16:50:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.0071634570657734e-05, 'epoch': 1.795701925760536, 'step': 6434}
06/30/2020 16:50:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.0062331379663227e-05, 'epoch': 1.7962601172202066, 'step': 6436}
06/30/2020 16:50:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.0053028188668714e-05, 'epoch': 1.7968183086798772, 'step': 6438}
06/30/2020 16:50:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.0043724997674204e-05, 'epoch': 1.7973765001395479, 'step': 6440}
06/30/2020 16:50:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.0034421806679694e-05, 'epoch': 1.7979346915992185, 'step': 6442}
06/30/2020 16:50:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.002511861568518e-05, 'epoch': 1.798492883058889, 'step': 6444}
06/30/2020 16:50:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.0015

06/30/2020 16:50:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.9559959065959625e-05, 'epoch': 1.8264024560424226, 'step': 6544}
06/30/2020 16:50:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.9550655874965115e-05, 'epoch': 1.8269606475020932, 'step': 6546}
06/30/2020 16:50:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.9541352683970605e-05, 'epoch': 1.827518838961764, 'step': 6548}
06/30/2020 16:50:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.9532049492976092e-05, 'epoch': 1.8280770304214347, 'step': 6550}
06/30/2020 16:50:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.9522746301981582e-05, 'epoch': 1.8286352218811053, 'step': 6552}
06/30/2020 16:50:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.951344311098707e-05, 'epoch': 1.829193413340776, 'step': 6554}
06/30/2020 16:50:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.9504

06/30/2020 16:50:18 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.9048283561261512e-05, 'epoch': 1.8571029863243091, 'step': 6654}
06/30/2020 16:50:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.9038980370267002e-05, 'epoch': 1.8576611777839798, 'step': 6656}
06/30/2020 16:50:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.9029677179272493e-05, 'epoch': 1.8582193692436506, 'step': 6658}
06/30/2020 16:50:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.9020373988277983e-05, 'epoch': 1.8587775607033212, 'step': 6660}
06/30/2020 16:50:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.901107079728347e-05, 'epoch': 1.859335752162992, 'step': 6662}
06/30/2020 16:50:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.900176760628896e-05, 'epoch': 1.8598939436226627, 'step': 6664}
06/30/2020 16:50:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.8992

06/30/2020 16:50:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.85366080565634e-05, 'epoch': 1.887803516606196, 'step': 6764}
06/30/2020 16:50:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.852730486556889e-05, 'epoch': 1.8883617080658666, 'step': 6766}
06/30/2020 16:50:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.851800167457438e-05, 'epoch': 1.8889198995255372, 'step': 6768}
06/30/2020 16:50:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.850869848357987e-05, 'epoch': 1.8894780909852078, 'step': 6770}


Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2048.0


06/30/2020 16:50:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.8499395292585357e-05, 'epoch': 1.8900362824448786, 'step': 6772}
06/30/2020 16:50:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.8490092101590847e-05, 'epoch': 1.8905944739045493, 'step': 6774}
06/30/2020 16:50:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.8480788910596337e-05, 'epoch': 1.8911526653642199, 'step': 6776}
06/30/2020 16:50:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.8471485719601824e-05, 'epoch': 1.8917108568238907, 'step': 6778}
06/30/2020 16:50:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.8462182528607314e-05, 'epoch': 1.8922690482835614, 'step': 6780}
06/30/2020 16:50:28 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.84528793376128e-05, 'epoch': 1.892827239743232, 'step': 6782}
06/30/2020 16:50:28 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.8443

06/30/2020 16:50:35 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.7987719787887248e-05, 'epoch': 1.9207368127267652, 'step': 6882}
06/30/2020 16:50:35 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.7978416596892735e-05, 'epoch': 1.9212950041864358, 'step': 6884}
06/30/2020 16:50:35 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.7969113405898225e-05, 'epoch': 1.9218531956461065, 'step': 6886}
06/30/2020 16:50:35 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.7959810214903715e-05, 'epoch': 1.9224113871057773, 'step': 6888}
06/30/2020 16:50:36 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.7950507023909202e-05, 'epoch': 1.922969578565448, 'step': 6890}
06/30/2020 16:50:36 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.7941203832914692e-05, 'epoch': 1.9235277700251188, 'step': 6892}
06/30/2020 16:50:36 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.79

06/30/2020 16:50:43 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.7476044283189136e-05, 'epoch': 1.951437343008652, 'step': 6992}
06/30/2020 16:50:43 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.7466741092194626e-05, 'epoch': 1.9519955344683226, 'step': 6994}
06/30/2020 16:50:43 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.7457437901200112e-05, 'epoch': 1.9525537259279933, 'step': 6996}
06/30/2020 16:50:43 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.7448134710205603e-05, 'epoch': 1.9531119173876639, 'step': 6998}
06/30/2020 16:50:44 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.743883151921109e-05, 'epoch': 1.9536701088473345, 'step': 7000}
06/30/2020 16:50:44 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.742952832821658e-05, 'epoch': 1.9542283003070053, 'step': 7002}
06/30/2020 16:50:44 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.7420

06/30/2020 16:50:51 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.6964368778491023e-05, 'epoch': 1.9821378732905388, 'step': 7102}
06/30/2020 16:50:51 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.6955065587496513e-05, 'epoch': 1.9826960647502094, 'step': 7104}
06/30/2020 16:50:51 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.6945762396502003e-05, 'epoch': 1.98325425620988, 'step': 7106}
06/30/2020 16:50:51 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.693645920550749e-05, 'epoch': 1.9838124476695507, 'step': 7108}
06/30/2020 16:50:51 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.692715601451298e-05, 'epoch': 1.9843706391292213, 'step': 7110}
06/30/2020 16:50:51 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.6917852823518467e-05, 'epoch': 1.984928830588892, 'step': 7112}
06/30/2020 16:50:51 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.690854

06/30/2020 16:50:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.6657363475672154e-05, 'epoch': 2.0005581914596706, 'step': 7168}
06/30/2020 16:50:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.6648060284677644e-05, 'epoch': 2.0011163829193412, 'step': 7170}
06/30/2020 16:50:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.663875709368313e-05, 'epoch': 2.001674574379012, 'step': 7172}
06/30/2020 16:50:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.6629453902688625e-05, 'epoch': 2.0022327658386825, 'step': 7174}
06/30/2020 16:50:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.662015071169411e-05, 'epoch': 2.0027909572983535, 'step': 7176}
06/30/2020 16:50:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.66108475206996e-05, 'epoch': 2.003349148758024, 'step': 7178}
06/30/2020 16:50:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.6601544

06/30/2020 16:51:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.6145687970974045e-05, 'epoch': 2.0312587217415574, 'step': 7278}
06/30/2020 16:51:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.6136384779979532e-05, 'epoch': 2.031816913201228, 'step': 7280}
06/30/2020 16:51:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.6127081588985022e-05, 'epoch': 2.0323751046608987, 'step': 7282}
06/30/2020 16:51:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.6117778397990512e-05, 'epoch': 2.0329332961205693, 'step': 7284}
06/30/2020 16:51:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.6108475206996002e-05, 'epoch': 2.03349148758024, 'step': 7286}
06/30/2020 16:51:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.609917201600149e-05, 'epoch': 2.0340496790399105, 'step': 7288}
06/30/2020 16:51:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.60898

06/30/2020 16:51:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.5634012466275933e-05, 'epoch': 2.061959252023444, 'step': 7388}
06/30/2020 16:51:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.562470927528142e-05, 'epoch': 2.062517443483115, 'step': 7390}
06/30/2020 16:51:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.561540608428691e-05, 'epoch': 2.0630756349427855, 'step': 7392}
06/30/2020 16:51:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.56061028932924e-05, 'epoch': 2.063633826402456, 'step': 7394}
06/30/2020 16:51:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.559679970229789e-05, 'epoch': 2.0641920178621267, 'step': 7396}
06/30/2020 16:51:11 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.558749651130338e-05, 'epoch': 2.0647502093217973, 'step': 7398}
06/30/2020 16:51:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.5578193320

06/30/2020 16:51:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.512233696157782e-05, 'epoch': 2.0926597823053306, 'step': 7498}
06/30/2020 16:51:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.511303377058331e-05, 'epoch': 2.093217973765001, 'step': 7500}
06/30/2020 16:51:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.5103730579588799e-05, 'epoch': 2.0937761652246722, 'step': 7502}
06/30/2020 16:51:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.5094427388594287e-05, 'epoch': 2.094334356684343, 'step': 7504}
06/30/2020 16:51:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.508512419759978e-05, 'epoch': 2.0948925481440135, 'step': 7506}
06/30/2020 16:51:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.5075821006605268e-05, 'epoch': 2.095450739603684, 'step': 7508}
06/30/2020 16:51:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.5066517

06/30/2020 16:51:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.461066145687971e-05, 'epoch': 2.1233603125872174, 'step': 7608}
06/30/2020 16:51:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.4601358265885198e-05, 'epoch': 2.123918504046888, 'step': 7610}
06/30/2020 16:51:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.4592055074890686e-05, 'epoch': 2.1244766955065586, 'step': 7612}
06/30/2020 16:51:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.4582751883896177e-05, 'epoch': 2.125034886966229, 'step': 7614}
06/30/2020 16:51:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.4573448692901667e-05, 'epoch': 2.1255930784259003, 'step': 7616}
06/30/2020 16:51:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.4564145501907155e-05, 'epoch': 2.126151269885571, 'step': 7618}
06/30/2020 16:51:27 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.45548

06/30/2020 16:51:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.4098985952181599e-05, 'epoch': 2.154060842869104, 'step': 7718}
06/30/2020 16:51:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.4089682761187087e-05, 'epoch': 2.1546190343287748, 'step': 7720}
06/30/2020 16:51:34 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.4080379570192576e-05, 'epoch': 2.1551772257884454, 'step': 7722}
06/30/2020 16:51:35 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.4071076379198064e-05, 'epoch': 2.155735417248116, 'step': 7724}
06/30/2020 16:51:35 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.4061773188203556e-05, 'epoch': 2.1562936087077866, 'step': 7726}
06/30/2020 16:51:35 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.4052469997209044e-05, 'epoch': 2.1568518001674573, 'step': 7728}
06/30/2020 16:51:35 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.404

06/30/2020 16:51:42 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.3587310447483486e-05, 'epoch': 2.184761373150991, 'step': 7828}
06/30/2020 16:51:42 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.3578007256488975e-05, 'epoch': 2.1853195646106616, 'step': 7830}
06/30/2020 16:51:43 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.3568704065494465e-05, 'epoch': 2.185877756070332, 'step': 7832}
06/30/2020 16:51:43 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.3559400874499953e-05, 'epoch': 2.186435947530003, 'step': 7834}
06/30/2020 16:51:43 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.3550097683505445e-05, 'epoch': 2.1869941389896734, 'step': 7836}
06/30/2020 16:51:43 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.3540794492510934e-05, 'epoch': 2.187552330449344, 'step': 7838}
06/30/2020 16:51:43 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.35314

06/30/2020 16:51:51 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.3075634942785375e-05, 'epoch': 2.2154619034328773, 'step': 7938}
06/30/2020 16:51:51 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.3066331751790864e-05, 'epoch': 2.2160200948925484, 'step': 7940}
06/30/2020 16:51:51 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.3057028560796352e-05, 'epoch': 2.216578286352219, 'step': 7942}
06/30/2020 16:51:51 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.304772536980184e-05, 'epoch': 2.2171364778118896, 'step': 7944}
06/30/2020 16:51:51 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.3038422178807331e-05, 'epoch': 2.2176946692715602, 'step': 7946}
06/30/2020 16:51:51 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.3029118987812821e-05, 'epoch': 2.218252860731231, 'step': 7948}
06/30/2020 16:51:52 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.3019

06/30/2020 16:51:59 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.2563959438087265e-05, 'epoch': 2.246162433714764, 'step': 8048}
06/30/2020 16:51:59 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.2554656247092753e-05, 'epoch': 2.2467206251744347, 'step': 8050}
06/30/2020 16:51:59 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.2545353056098242e-05, 'epoch': 2.2472788166341053, 'step': 8052}
06/30/2020 16:51:59 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.253604986510373e-05, 'epoch': 2.247837008093776, 'step': 8054}
06/30/2020 16:51:59 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.2526746674109219e-05, 'epoch': 2.248395199553447, 'step': 8056}
06/30/2020 16:52:00 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.251744348311471e-05, 'epoch': 2.2489533910131176, 'step': 8058}
06/30/2020 16:52:00 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.250814

06/30/2020 16:52:07 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.2052283933389152e-05, 'epoch': 2.276862963996651, 'step': 8158}
06/30/2020 16:52:07 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.2042980742394642e-05, 'epoch': 2.2774211554563215, 'step': 8160}
06/30/2020 16:52:07 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.203367755140013e-05, 'epoch': 2.277979346915992, 'step': 8162}
06/30/2020 16:52:07 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.202437436040562e-05, 'epoch': 2.2785375383756628, 'step': 8164}
06/30/2020 16:52:07 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.201507116941111e-05, 'epoch': 2.2790957298353334, 'step': 8166}
06/30/2020 16:52:08 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.2005767978416598e-05, 'epoch': 2.279653921295004, 'step': 8168}
06/30/2020 16:52:08 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.1996464

06/30/2020 16:52:14 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.1540608428691041e-05, 'epoch': 2.3075634942785377, 'step': 8268}
06/30/2020 16:52:15 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.1531305237696532e-05, 'epoch': 2.3081216857382083, 'step': 8270}
06/30/2020 16:52:15 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.152200204670202e-05, 'epoch': 2.308679877197879, 'step': 8272}
06/30/2020 16:52:15 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.1512698855707509e-05, 'epoch': 2.3092380686575495, 'step': 8274}
06/30/2020 16:52:15 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.1503395664712997e-05, 'epoch': 2.30979626011722, 'step': 8276}
06/30/2020 16:52:15 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.1494092473718485e-05, 'epoch': 2.310354451576891, 'step': 8278}
06/30/2020 16:52:15 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.148478

06/30/2020 16:52:22 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.1028932923992929e-05, 'epoch': 2.338264024560424, 'step': 8378}
06/30/2020 16:52:22 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.1019629732998419e-05, 'epoch': 2.3388222160200947, 'step': 8380}
06/30/2020 16:52:22 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.1010326542003908e-05, 'epoch': 2.3393804074797657, 'step': 8382}
06/30/2020 16:52:23 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.1001023351009398e-05, 'epoch': 2.3399385989394363, 'step': 8384}
06/30/2020 16:52:23 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.0991720160014886e-05, 'epoch': 2.340496790399107, 'step': 8386}
06/30/2020 16:52:23 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.0982416969020375e-05, 'epoch': 2.3410549818587776, 'step': 8388}
06/30/2020 16:52:23 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.097

06/30/2020 16:52:30 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.0517257419294818e-05, 'epoch': 2.368964554842311, 'step': 8488}
06/30/2020 16:52:30 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.0507954228300307e-05, 'epoch': 2.3695227463019815, 'step': 8490}
06/30/2020 16:52:30 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.0498651037305797e-05, 'epoch': 2.370080937761652, 'step': 8492}
06/30/2020 16:52:30 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.0489347846311285e-05, 'epoch': 2.370639129221323, 'step': 8494}
06/30/2020 16:52:31 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.0480044655316774e-05, 'epoch': 2.3711973206809938, 'step': 8496}
06/30/2020 16:52:31 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.0470741464322264e-05, 'epoch': 2.3717555121406644, 'step': 8498}
06/30/2020 16:52:31 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.0461

06/30/2020 16:52:39 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.0005581914596707e-05, 'epoch': 2.3996650851241976, 'step': 8598}
06/30/2020 16:52:39 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 9.996278723602196e-06, 'epoch': 2.4002232765838682, 'step': 8600}
06/30/2020 16:52:39 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 9.986975532607686e-06, 'epoch': 2.400781468043539, 'step': 8602}
06/30/2020 16:52:39 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 9.977672341613174e-06, 'epoch': 2.4013396595032095, 'step': 8604}
06/30/2020 16:52:39 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 9.968369150618663e-06, 'epoch': 2.40189785096288, 'step': 8606}
06/30/2020 16:52:39 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 9.959065959624151e-06, 'epoch': 2.4024560424225507, 'step': 8608}
06/30/2020 16:52:39 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 9.949762768

06/30/2020 16:52:47 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 9.493906409898595e-06, 'epoch': 2.4303656154060844, 'step': 8708}
06/30/2020 16:52:47 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 9.484603218904083e-06, 'epoch': 2.430923806865755, 'step': 8710}
06/30/2020 16:52:47 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 9.475300027909574e-06, 'epoch': 2.4314819983254257, 'step': 8712}
06/30/2020 16:52:47 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 9.465996836915062e-06, 'epoch': 2.4320401897850963, 'step': 8714}
06/30/2020 16:52:47 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 9.456693645920552e-06, 'epoch': 2.432598381244767, 'step': 8716}
06/30/2020 16:52:48 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 9.44739045492604e-06, 'epoch': 2.4331565727044375, 'step': 8718}
06/30/2020 16:52:48 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 9.4380872639

06/30/2020 16:52:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 8.982230905200484e-06, 'epoch': 2.4610661456879708, 'step': 8818}
06/30/2020 16:52:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 8.972927714205973e-06, 'epoch': 2.461624337147642, 'step': 8820}
06/30/2020 16:52:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 8.963624523211461e-06, 'epoch': 2.4621825286073125, 'step': 8822}
06/30/2020 16:52:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 8.954321332216951e-06, 'epoch': 2.462740720066983, 'step': 8824}
06/30/2020 16:52:55 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 8.94501814122244e-06, 'epoch': 2.4632989115266537, 'step': 8826}
06/30/2020 16:52:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 8.935714950227928e-06, 'epoch': 2.4638571029863243, 'step': 8828}
06/30/2020 16:52:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 8.9264117592

06/30/2020 16:53:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 8.470555400502372e-06, 'epoch': 2.4917666759698576, 'step': 8928}
06/30/2020 16:53:03 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 8.461252209507862e-06, 'epoch': 2.492324867429528, 'step': 8930}
06/30/2020 16:53:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 8.45194901851335e-06, 'epoch': 2.492883058889199, 'step': 8932}
06/30/2020 16:53:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 8.44264582751884e-06, 'epoch': 2.4934412503488694, 'step': 8934}
06/30/2020 16:53:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 8.433342636524329e-06, 'epoch': 2.4939994418085405, 'step': 8936}
06/30/2020 16:53:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 8.424039445529817e-06, 'epoch': 2.494557633268211, 'step': 8938}
06/30/2020 16:53:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 8.414736254535

06/30/2020 16:53:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.958879895804261e-06, 'epoch': 2.5224672062517444, 'step': 9038}
06/30/2020 16:53:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.94957670480975e-06, 'epoch': 2.523025397711415, 'step': 9040}
06/30/2020 16:53:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.940273513815238e-06, 'epoch': 2.5235835891710856, 'step': 9042}
06/30/2020 16:53:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.930970322820728e-06, 'epoch': 2.5241417806307562, 'step': 9044}
06/30/2020 16:53:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.921667131826216e-06, 'epoch': 2.524699972090427, 'step': 9046}
06/30/2020 16:53:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.912363940831707e-06, 'epoch': 2.525258163550098, 'step': 9048}
06/30/2020 16:53:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.90306074983

06/30/2020 16:53:19 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.447204391106149e-06, 'epoch': 2.553167736533631, 'step': 9148}
06/30/2020 16:53:20 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.437901200111639e-06, 'epoch': 2.553725927993302, 'step': 9150}
06/30/2020 16:53:20 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.428598009117127e-06, 'epoch': 2.5542841194529724, 'step': 9152}
06/30/2020 16:53:20 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.4192948181226155e-06, 'epoch': 2.554842310912643, 'step': 9154}
06/30/2020 16:53:20 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.409991627128106e-06, 'epoch': 2.5554005023723136, 'step': 9156}
06/30/2020 16:53:20 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.400688436133595e-06, 'epoch': 2.5559586938319843, 'step': 9158}
06/30/2020 16:53:20 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.391385245

Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2048.0


06/30/2020 16:53:25 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.093683133314727e-06, 'epoch': 2.5743790120011165, 'step': 9224}
06/30/2020 16:53:25 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.0843799423202155e-06, 'epoch': 2.574937203460787, 'step': 9226}
06/30/2020 16:53:25 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.075076751325706e-06, 'epoch': 2.5754953949204578, 'step': 9228}
06/30/2020 16:53:25 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.065773560331194e-06, 'epoch': 2.5760535863801284, 'step': 9230}
06/30/2020 16:53:25 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.056470369336683e-06, 'epoch': 2.576611777839799, 'step': 9232}
06/30/2020 16:53:25 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.047167178342172e-06, 'epoch': 2.5771699692994696, 'step': 9234}
06/30/2020 16:53:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.03786398

06/30/2020 16:53:33 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 6.582007628616615e-06, 'epoch': 2.6050795422830033, 'step': 9334}
06/30/2020 16:53:33 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 6.572704437622104e-06, 'epoch': 2.6056377337426735, 'step': 9336}
06/30/2020 16:53:33 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 6.563401246627593e-06, 'epoch': 2.6061959252023446, 'step': 9338}
06/30/2020 16:53:33 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 6.554098055633083e-06, 'epoch': 2.606754116662015, 'step': 9340}
06/30/2020 16:53:33 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 6.544794864638572e-06, 'epoch': 2.607312308121686, 'step': 9342}
06/30/2020 16:53:33 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 6.53549167364406e-06, 'epoch': 2.6078704995813564, 'step': 9344}
06/30/2020 16:53:33 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 6.5261884826

06/30/2020 16:53:40 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 6.070332123918504e-06, 'epoch': 2.6357800725648897, 'step': 9444}
06/30/2020 16:53:40 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 6.061028932923993e-06, 'epoch': 2.6363382640245603, 'step': 9446}
06/30/2020 16:53:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 6.051725741929482e-06, 'epoch': 2.636896455484231, 'step': 9448}
06/30/2020 16:53:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 6.042422550934971e-06, 'epoch': 2.637454646943902, 'step': 9450}
06/30/2020 16:53:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 6.033119359940459e-06, 'epoch': 2.6380128384035726, 'step': 9452}
06/30/2020 16:53:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 6.0238161689459495e-06, 'epoch': 2.638571029863243, 'step': 9454}
06/30/2020 16:53:41 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 6.014512977

06/30/2020 16:53:48 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 5.558656619220393e-06, 'epoch': 2.6664806028467765, 'step': 9554}
06/30/2020 16:53:48 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 5.549353428225882e-06, 'epoch': 2.667038794306447, 'step': 9556}
06/30/2020 16:53:48 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 5.540050237231371e-06, 'epoch': 2.6675969857661177, 'step': 9558}
06/30/2020 16:53:48 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 5.530747046236859e-06, 'epoch': 2.6681551772257883, 'step': 9560}
06/30/2020 16:53:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 5.5214438552423485e-06, 'epoch': 2.6687133686854594, 'step': 9562}
06/30/2020 16:53:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 5.512140664247837e-06, 'epoch': 2.6692715601451296, 'step': 9564}
06/30/2020 16:53:49 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 5.5028374

06/30/2020 16:53:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 5.046981114522281e-06, 'epoch': 2.6971811331286633, 'step': 9664}
06/30/2020 16:53:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 5.037677923527771e-06, 'epoch': 2.697739324588334, 'step': 9666}
06/30/2020 16:53:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 5.028374732533259e-06, 'epoch': 2.6982975160480045, 'step': 9668}
06/30/2020 16:53:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 5.0190715415387485e-06, 'epoch': 2.698855707507675, 'step': 9670}
06/30/2020 16:53:56 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 5.009768350544237e-06, 'epoch': 2.6994138989673457, 'step': 9672}
06/30/2020 16:53:57 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 5.000465159549725e-06, 'epoch': 2.6999720904270164, 'step': 9674}
06/30/2020 16:53:57 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.99116196

06/30/2020 16:54:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.53530560982417e-06, 'epoch': 2.7278816634105496, 'step': 9774}
06/30/2020 16:54:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.526002418829658e-06, 'epoch': 2.7284398548702207, 'step': 9776}
06/30/2020 16:54:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.5166992278351475e-06, 'epoch': 2.7289980463298913, 'step': 9778}
06/30/2020 16:54:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.507396036840637e-06, 'epoch': 2.729556237789562, 'step': 9780}
06/30/2020 16:54:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.498092845846125e-06, 'epoch': 2.7301144292492325, 'step': 9782}
06/30/2020 16:54:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.488789654851614e-06, 'epoch': 2.730672620708903, 'step': 9784}
06/30/2020 16:54:04 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.479486463

06/30/2020 16:54:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.023630105126058e-06, 'epoch': 2.7585821936924364, 'step': 9884}
06/30/2020 16:54:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.0143269141315474e-06, 'epoch': 2.759140385152107, 'step': 9886}
06/30/2020 16:54:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.005023723137037e-06, 'epoch': 2.759698576611778, 'step': 9888}
06/30/2020 16:54:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.995720532142525e-06, 'epoch': 2.7602567680714483, 'step': 9890}
06/30/2020 16:54:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.986417341148014e-06, 'epoch': 2.7608149595311193, 'step': 9892}
06/30/2020 16:54:12 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.977114150153503e-06, 'epoch': 2.76137315099079, 'step': 9894}
06/30/2020 16:54:13 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.9678109591

06/30/2020 16:54:20 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.511954600427947e-06, 'epoch': 2.789282723974323, 'step': 9994}
06/30/2020 16:54:20 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.5026514094334354e-06, 'epoch': 2.789840915433994, 'step': 9996}
06/30/2020 16:54:20 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.493348218438925e-06, 'epoch': 2.7903991068936644, 'step': 9998}
06/30/2020 16:54:20 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.4840450274444136e-06, 'epoch': 2.790957298353335, 'step': 10000}
06/30/2020 16:54:20 - INFO - transformers.trainer -   Saving model checkpoint to ../../weights/gpt2/papers_milan/checkpoint-10000
06/30/2020 16:54:20 - INFO - transformers.configuration_utils -   Configuration saved in ../../weights/gpt2/papers_milan/checkpoint-10000/config.json
06/30/2020 16:54:21 - INFO - transformers.modeling_utils -   Model weights saved in ../../weights/gpt2/papers_milan/

06/30/2020 16:54:28 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.028188668713369e-06, 'epoch': 2.818308679877198, 'step': 10098}
06/30/2020 16:54:28 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.0188854777188575e-06, 'epoch': 2.8188668713368683, 'step': 10100}
06/30/2020 16:54:29 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.009582286724347e-06, 'epoch': 2.8194250627965394, 'step': 10102}
06/30/2020 16:54:29 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 3.0002790957298353e-06, 'epoch': 2.81998325425621, 'step': 10104}
06/30/2020 16:54:29 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.9909759047353246e-06, 'epoch': 2.8205414457158806, 'step': 10106}
06/30/2020 16:54:29 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.981672713740813e-06, 'epoch': 2.8210996371755512, 'step': 10108}
06/30/2020 16:54:29 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.9

06/30/2020 16:54:36 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.5258163550097686e-06, 'epoch': 2.848451018699414, 'step': 10206}
06/30/2020 16:54:36 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.5165131640152575e-06, 'epoch': 2.8490092101590845, 'step': 10208}
06/30/2020 16:54:36 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.5072099730207463e-06, 'epoch': 2.849567401618755, 'step': 10210}
06/30/2020 16:54:36 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.497906782026235e-06, 'epoch': 2.8501255930784257, 'step': 10212}
06/30/2020 16:54:36 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.488603591031724e-06, 'epoch': 2.850683784538097, 'step': 10214}
06/30/2020 16:54:37 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.4793004000372126e-06, 'epoch': 2.851241975997767, 'step': 10216}
06/30/2020 16:54:37 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.4

06/30/2020 16:54:44 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.023444041306168e-06, 'epoch': 2.87859335752163, 'step': 10314}
06/30/2020 16:54:44 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.014140850311657e-06, 'epoch': 2.8791515489813007, 'step': 10316}
06/30/2020 16:54:44 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 2.004837659317146e-06, 'epoch': 2.8797097404409713, 'step': 10318}
06/30/2020 16:54:44 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.9955344683226347e-06, 'epoch': 2.880267931900642, 'step': 10320}
06/30/2020 16:54:44 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.9862312773281236e-06, 'epoch': 2.8808261233603125, 'step': 10322}
06/30/2020 16:54:45 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.9769280863336125e-06, 'epoch': 2.881384314819983, 'step': 10324}
06/30/2020 16:54:45 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.96

06/30/2020 16:54:52 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.5210717276025678e-06, 'epoch': 2.9087356963438458, 'step': 10422}
06/30/2020 16:54:52 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.5117685366080567e-06, 'epoch': 2.909293887803517, 'step': 10424}
06/30/2020 16:54:52 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.5024653456135455e-06, 'epoch': 2.909852079263187, 'step': 10426}
06/30/2020 16:54:52 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.4931621546190344e-06, 'epoch': 2.910410270722858, 'step': 10428}
06/30/2020 16:54:52 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.4838589636245233e-06, 'epoch': 2.9109684621825287, 'step': 10430}
06/30/2020 16:54:53 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.474555772630012e-06, 'epoch': 2.9115266536421993, 'step': 10432}
06/30/2020 16:54:53 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1

06/30/2020 16:54:59 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.0186994138989675e-06, 'epoch': 2.938878035166062, 'step': 10530}
06/30/2020 16:55:00 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.0093962229044564e-06, 'epoch': 2.9394362266257326, 'step': 10532}
06/30/2020 16:55:00 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 1.0000930319099452e-06, 'epoch': 2.939994418085403, 'step': 10534}
06/30/2020 16:55:00 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 9.907898409154341e-07, 'epoch': 2.940552609545074, 'step': 10536}
06/30/2020 16:55:00 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 9.814866499209228e-07, 'epoch': 2.9411108010047444, 'step': 10538}
06/30/2020 16:55:00 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 9.721834589264117e-07, 'epoch': 2.9416689924644155, 'step': 10540}
06/30/2020 16:55:00 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 9.6

06/30/2020 16:55:07 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 5.070239092008559e-07, 'epoch': 2.9695785654479487, 'step': 10640}
06/30/2020 16:55:07 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.977207182063447e-07, 'epoch': 2.9701367569076194, 'step': 10642}
06/30/2020 16:55:07 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.884175272118337e-07, 'epoch': 2.97069494836729, 'step': 10644}
06/30/2020 16:55:07 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.791143362173226e-07, 'epoch': 2.9712531398269606, 'step': 10646}
06/30/2020 16:55:08 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.698111452228114e-07, 'epoch': 2.971811331286631, 'step': 10648}
06/30/2020 16:55:08 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.6050795422830034e-07, 'epoch': 2.972369522746302, 'step': 10650}
06/30/2020 16:55:08 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.5120

06/30/2020 16:55:14 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 4.651595497255559e-09, 'epoch': 2.9997209042701645, 'step': 10748}
06/30/2020 16:55:15 - INFO - transformers.trainer -   

Training completed. Do not forget to share your model on huggingface.co/models =)


06/30/2020 16:55:15 - INFO - transformers.trainer -   Saving model checkpoint to ../../weights/gpt2/papers_milan/
06/30/2020 16:55:15 - INFO - transformers.configuration_utils -   Configuration saved in ../../weights/gpt2/papers_milan/config.json

06/30/2020 16:55:15 - INFO - transformers.modeling_utils -   Model weights saved in ../../weights/gpt2/papers_milan/pytorch_model.bin
06/30/2020 16:55:15 - INFO - __main__ -   *** Evaluate ***
06/30/2020 16:55:15 - INFO - transformers.trainer -   ***** Running Evaluation *****
06/30/2020 16:55:15 - INFO - transformers.trainer -     Num examples = 448
06/30/2020 16:55:15 - INFO - transformers.trainer -     Batch size = 1


HBox(children=(FloatProgress(value=0.0, description='Evaluation', max=448.0, style=ProgressStyle(description_w…

06/30/2020 16:55:22 - INFO - transformers.trainer -   {'eval_loss': 3.6394232530146837, 'epoch': 3.0, 'step': 10749}
06/30/2020 16:55:22 - INFO - __main__ -   ***** Eval results *****
06/30/2020 16:55:22 - INFO - __main__ -     perplexity = 38.06987370756345
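The training log above reports `'loss': nan` at every logged step in this excerpt, while the evaluation loss is finite. To locate where NaN values appear in a saved copy of the log, something like the following could be used. This is a minimal sketch, not part of the original notebook: the helper name `first_nan_step` and the regular expression are assumptions for illustration, and the sample lines are copied from the output above.

```python
import re

# Matches the dicts the trainer logs, e.g. {'loss': nan, ..., 'step': 9224}.
# Lines truncated before 'step' and evaluation lines ({'eval_loss': ...})
# do not match and are skipped.
LOG_RE = re.compile(r"\{'loss': (?P<loss>[^,]+),.*?'step': (?P<step>\d+)\}")


def first_nan_step(log_lines):
    """Return the first step in `log_lines` whose logged loss is NaN, or None."""
    for line in log_lines:
        match = LOG_RE.search(line)
        if match and match.group("loss").strip().lower() == "nan":
            return int(match.group("step"))
    return None


# Sample lines copied from the output above (hypothetical usage).
sample = [
    "06/30/2020 16:53:25 - INFO - transformers.trainer -   {'loss': nan, "
    "'learning_rate': 7.093683133314727e-06, 'epoch': 2.5743790120011165, 'step': 9224}",
    "06/30/2020 16:53:26 - INFO - transformers.trainer -   {'loss': nan, 'learning_rate': 7.03786398",
    "06/30/2020 16:55:22 - INFO - transformers.trainer -   "
    "{'eval_loss': 3.6394232530146837, 'epoch': 3.0, 'step': 10749}",
]

print(first_nan_step(sample))
```

Run over the full log rather than this excerpt, the same function would point to the step where the logged loss first became NaN, which is a natural place to start looking at the fp16 loss-scaling messages.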

In [12]:
run {cmd_finetuning.format(**val_finetuning_params)}

06/30/2020 16:55:50 - INFO - transformers.training_args -   PyTorch: setting up devices
06/30/2020 16:55:50 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir='../../weights/gpt2/papers_milan/', overwrite_output_dir=True, do_train=False, do_eval=True, do_predict=False, evaluate_during_training=False, per_device_train_batch_size=1, per_device_eval_batch_size=1, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, learning_rate=5e-05, weight_decay=0.0, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, warmup_steps=0, logging_dir='runs/Jun30_16-55-50_Camilo-UbuntuPC', logging_first_step=False, logging_steps=2, save_steps=5000, save_total_limit=5, no_cuda=False, seed=42, fp16=True, fp16_opt_level='O1', local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, dataloader_drop_last=False)
06/30/2020 16:55:50 - INFO - transformers.configuration_utils -   loading configuration file ../../weights/gp

HBox(children=(FloatProgress(value=0.0, description='Evaluation', max=447.0, style=ProgressStyle(description_w…

06/30/2020 16:56:02 - INFO - transformers.trainer -   {'eval_loss': 3.582757234840052, 'step': 0}
06/30/2020 16:56:02 - INFO - __main__ -   ***** Eval results *****
06/30/2020 16:56:02 - INFO - __main__ -     perplexity = 35.972589110633976

In [13]:
run {cmd_finetuning.format(**val_params)}

06/30/2020 16:56:02 - INFO - transformers.training_args -   PyTorch: setting up devices
06/30/2020 16:56:02 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir='../../weights/gpt2/papers_milan/', overwrite_output_dir=True, do_train=False, do_eval=True, do_predict=False, evaluate_during_training=False, per_device_train_batch_size=1, per_device_eval_batch_size=1, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, learning_rate=5e-05, weight_decay=0.0, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=3.0, max_steps=-1, warmup_steps=0, logging_dir='runs/Jun30_16-56-02_Camilo-UbuntuPC', logging_first_step=False, logging_steps=2, save_steps=5000, save_total_limit=5, no_cuda=False, seed=42, fp16=True, fp16_opt_level='O1', local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, dataloader_drop_last=False)
06/30/2020 16:56:02 - INFO - transformers.configuration_utils -   loading configuration file https://s3.amazo

HBox(children=(FloatProgress(value=0.0, description='Evaluation', max=447.0, style=ProgressStyle(description_w…

06/30/2020 16:56:25 - INFO - transformers.trainer -   {'eval_loss': 5.189669396786615, 'step': 0}
06/30/2020 16:56:25 - INFO - __main__ -   ***** Eval results *****
06/30/2020 16:56:25 - INFO - __main__ -     perplexity = 179.40922985827137
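The perplexity values reported by the three evaluation runs are the exponential of the corresponding `eval_loss`. A minimal sketch that checks this against the numbers in the logs above; the dictionary labels are only descriptive, and reading In [13] as the stock pretrained model is an assumption based on its configuration being loaded from a remote URL.

```python
import math

# (eval_loss, reported perplexity) pairs copied from the logs above.
runs = {
    "In [11] (evaluation after training)": (3.6394232530146837, 38.06987370756345),
    "In [12] (val_finetuning_params)": (3.582757234840052, 35.972589110633976),
    "In [13] (val_params, presumably stock gpt2)": (5.189669396786615, 179.40922985827137),
}

for name, (eval_loss, reported) in runs.items():
    perplexity = math.exp(eval_loss)  # perplexity = exp(mean cross-entropy loss)
    matches = math.isclose(perplexity, reported, rel_tol=1e-9)
    print(f"{name}: exp({eval_loss:.4f}) = {perplexity:.4f}, matches log: {matches}")
```

Because perplexity is exponential in the loss, the gap of about 1.6 in `eval_loss` between In [12] and In [13] corresponds to roughly a fivefold difference in perplexity.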

# Generation

In [14]:
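def create_params_generation(MODEL_TYPE, MODEL_NAME_OR_PATH, NUM_RETURN_SEQUENCES=1, LENGTH=20):
    """Build the placeholder values for the `cmd_generation` template.

    MODEL_NAME_OR_PATH may be a model shortcut name (e.g. "gpt2") or a
    directory holding fine-tuned weights; LENGTH is passed as --length.
    """
    return {
        "model_type": MODEL_TYPE,
        "model_name_or_path": MODEL_NAME_OR_PATH,
        "num_return_sequences": NUM_RETURN_SEQUENCES,
        "length": LENGTH
    }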

In [15]:
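# Command template for the text-generation example script; the {placeholders}
# are filled in with str.format using the dict from create_params_generation.
cmd_generation = """../../transformers/examples/text-generation/run_generation.py \
    --model_type={model_type} \
    --model_name_or_path={model_name_or_path} \
    --num_return_sequences={num_return_sequences} \
    --length={length}
"""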

In [16]:
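# Same generation settings for both runs: the first loads the fine-tuned
# weights from OUTPUT_DIR, the second loads the model named by MODEL_TYPE.
generation_finetuning_params = create_params_generation(MODEL_TYPE, OUTPUT_DIR, NUM_RETURN_SEQUENCES=5, LENGTH=100)
generation_params = create_params_generation(MODEL_TYPE, MODEL_TYPE, NUM_RETURN_SEQUENCES=5, LENGTH=100)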
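
Before launching the script, it can help to inspect the command that the template expands to. A minimal sketch, re-declaring the template and parameter values from the cells above so that it runs on its own; the `shlex.split` check is an addition for illustration and is not part of the original notebook.

```python
import shlex

MODEL_TYPE = "gpt2"
OUTPUT_DIR = f"../../weights/{MODEL_TYPE}/papers_milan/"

# Copied from In [15]. A backslash followed by a newline inside a normal
# (non-raw) string literal is a line continuation, so the template is a
# single logical line apart from the final newline.
cmd_generation = """../../transformers/examples/text-generation/run_generation.py \
    --model_type={model_type} \
    --model_name_or_path={model_name_or_path} \
    --num_return_sequences={num_return_sequences} \
    --length={length}
"""

# Same values as generation_finetuning_params in In [16].
params = {
    "model_type": MODEL_TYPE,
    "model_name_or_path": OUTPUT_DIR,
    "num_return_sequences": 5,
    "length": 100,
}

# Split the expanded command the way a POSIX shell would, to see each argument.
argv = shlex.split(cmd_generation.format(**params))
for arg in argv:
    print(arg)
```

Printing one argument per line makes a missing continuation backslash or an unfilled placeholder easy to spot before the command is passed to `run`.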

In [17]:
run {cmd_generation.format(**generation_finetuning_params)}

06/30/2020 16:56:50 - INFO - transformers.tokenization_utils_base -   Model name '../../weights/gpt2/papers_milan/' not found in model shortcut name list (gpt2, gpt2-medium, gpt2-large, gpt2-xl, distilgpt2). Assuming '../../weights/gpt2/papers_milan/' is a path, a model identifier, or url to a directory containing tokenizer files.
06/30/2020 16:56:50 - INFO - transformers.tokenization_utils_base -   Didn't find file ../../weights/gpt2/papers_milan/added_tokens.json. We won't load it.
06/30/2020 16:56:50 - INFO - transformers.tokenization_utils_base -   loading file ../../weights/gpt2/papers_milan/vocab.json
06/30/2020 16:56:50 - INFO - transformers.tokenization_utils_base -   loading file ../../weights/gpt2/papers_milan/merges.txt
06/30/2020 16:56:50 - INFO - transformers.tokenization_utils_base -   loading file None
06/30/2020 16:56:50 - INFO - transformers.tokenization_utils_base -   loading file ../../weights/gpt2/papers_milan/special_tokens_map.json
06/30/2020 16:56:50 - INFO - tra

Model prompt >>> It was already stated in Lemma that the reconstruction vector which mutual information between a source vector

=== GENERATED SEQUENCE 1 ===
It was already stated in Lemma that the reconstruction vector which mutual information between a source vector and its corresponding reconstruction error is if and only if is such that the previous definition. The latter definition is well understood and is consistent with what was in, Lemma..,. The latter definition is to be extended for in the, Lemma.,. and see Section.. More so, we argue that, equivalently, for any real valued random process, if,,, is said to be random a, then needs to satisfy the same condition for any other real valued random process
=== GENERATED SEQUENCE 2 ===
It was already stated in Lemma that the reconstruction vector which mutual information between a source vector a and a, jointly with that source vector a would be defined in a matrix of x, y, z, where the first element of the matrix is the letter of the alphabet of the vector, the last element is the second element of the vector, and the last element is the number of the vector

In [18]:
run {cmd_generation.format(**generation_params)}

06/30/2020 16:58:45 - INFO - transformers.tokenization_utils_base -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-vocab.json from cache at /home/camilojd/.cache/torch/transformers/f2808208f9bec2320371a9f5f891c184ae0b674ef866b79c58177067d15732dd.1512018be4ba4e8726e41b9145129dc30651ea4fec86aa61f4b9f40bf94eac71
06/30/2020 16:58:45 - INFO - transformers.tokenization_utils_base -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-merges.txt from cache at /home/camilojd/.cache/torch/transformers/d629f792e430b3c76a1291bb2766b0a047e36fae0588f9dbc1ae51decdff691b.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
06/30/2020 16:58:45 - INFO - transformers.configuration_utils -   loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-config.json from cache at /home/camilojd/.cache/torch/transformers/4be02c5697d91738003fb1685c9872f284166aa32e061576bbe6aaeb95649fcf.db13c9bc9c7bdd738ec89e069621d88e05dc670366092d

Model prompt >>> It was already stated in Lemma that the reconstruction vector which mutual information between a source vector

=== GENERATED SEQUENCE 1 ===
It was already stated in Lemma that the reconstruction vector which mutual information between a source vector and itself must exist is not an independent unit. If an approach to this problem takes a solution of an independent vector, for example, two independent factors are required to exist in the base of the source vector, and if one of them is undefined, then other factors of our description must also exist. This includes natural systems or biologically meaningful parameters of nature. These laws of nature then permit the modification of the basic matter of a system or organism. Obviously, the known specialties of any high-speed
=== GENERATED SEQUENCE 2 ===
It was already stated in Lemma that the reconstruction vector which mutual information between a source vector and a value is reciprocal is called the coincopious vector and the haggaristic vector is called the abrupt vector.

The combination is also known as the matter variable (now known as the hag