INFO:root:Namespace(accumulate=None, batch_size=32, bert_dataset='book_corpus_wiki_en_uncased', bert_model='bert_12_768_12', dev_batch_size=8, epochs=3, gpu=True, log_interval=100, lr=5e-05, max_len=80, model_parameters=None, optimizer='bertadam', output_dir='./output_dir', seed=2, task_name='MNLI', warmup_ratio=0.1)
INFO:root:BERTClassifier(
(bert): BERTModel(
(encoder): BERTEncoder(
(dropout_layer): Dropout(p = 0.1, axes=())
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
(transformer_cells): HybridSequential(
(0): BERTEncoderCell(
(dropout_layer): Dropout(p = 0.1, axes=())
(attention_cell): MultiHeadAttentionCell(
(_base_cell): DotProductAttentionCell(
(_dropout_layer): Dropout(p = 0.1, axes=())
)
(proj_query): Dense(768 -> 768, linear)
(proj_key): Dense(768 -> 768, linear)
(proj_value): Dense(768 -> 768, linear)
)
(proj): Dense(768 -> 768, linear)
(ffn): BERTPositionwiseFFN(
(ffn_1): Dense(768 -> 3072, linear)
(activation): GELU()
(ffn_2): Dense(3072 -> 768, linear)
(dropout_layer): Dropout(p = 0.1, axes=())
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(1): BERTEncoderCell(
(dropout_layer): Dropout(p = 0.1, axes=())
(attention_cell): MultiHeadAttentionCell(
(_base_cell): DotProductAttentionCell(
(_dropout_layer): Dropout(p = 0.1, axes=())
)
(proj_query): Dense(768 -> 768, linear)
(proj_key): Dense(768 -> 768, linear)
(proj_value): Dense(768 -> 768, linear)
)
(proj): Dense(768 -> 768, linear)
(ffn): BERTPositionwiseFFN(
(ffn_1): Dense(768 -> 3072, linear)
(activation): GELU()
(ffn_2): Dense(3072 -> 768, linear)
(dropout_layer): Dropout(p = 0.1, axes=())
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(2): BERTEncoderCell(
(dropout_layer): Dropout(p = 0.1, axes=())
(attention_cell): MultiHeadAttentionCell(
(_base_cell): DotProductAttentionCell(
(_dropout_layer): Dropout(p = 0.1, axes=())
)
(proj_query): Dense(768 -> 768, linear)
(proj_key): Dense(768 -> 768, linear)
(proj_value): Dense(768 -> 768, linear)
)
(proj): Dense(768 -> 768, linear)
(ffn): BERTPositionwiseFFN(
(ffn_1): Dense(768 -> 3072, linear)
(activation): GELU()
(ffn_2): Dense(3072 -> 768, linear)
(dropout_layer): Dropout(p = 0.1, axes=())
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(3): BERTEncoderCell(
(dropout_layer): Dropout(p = 0.1, axes=())
(attention_cell): MultiHeadAttentionCell(
(_base_cell): DotProductAttentionCell(
(_dropout_layer): Dropout(p = 0.1, axes=())
)
(proj_query): Dense(768 -> 768, linear)
(proj_key): Dense(768 -> 768, linear)
(proj_value): Dense(768 -> 768, linear)
)
(proj): Dense(768 -> 768, linear)
(ffn): BERTPositionwiseFFN(
(ffn_1): Dense(768 -> 3072, linear)
(activation): GELU()
(ffn_2): Dense(3072 -> 768, linear)
(dropout_layer): Dropout(p = 0.1, axes=())
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(4): BERTEncoderCell(
(dropout_layer): Dropout(p = 0.1, axes=())
(attention_cell): MultiHeadAttentionCell(
(_base_cell): DotProductAttentionCell(
(_dropout_layer): Dropout(p = 0.1, axes=())
)
(proj_query): Dense(768 -> 768, linear)
(proj_key): Dense(768 -> 768, linear)
(proj_value): Dense(768 -> 768, linear)
)
(proj): Dense(768 -> 768, linear)
(ffn): BERTPositionwiseFFN(
(ffn_1): Dense(768 -> 3072, linear)
(activation): GELU()
(ffn_2): Dense(3072 -> 768, linear)
(dropout_layer): Dropout(p = 0.1, axes=())
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(5): BERTEncoderCell(
(dropout_layer): Dropout(p = 0.1, axes=())
(attention_cell): MultiHeadAttentionCell(
(_base_cell): DotProductAttentionCell(
(_dropout_layer): Dropout(p = 0.1, axes=())
)
(proj_query): Dense(768 -> 768, linear)
(proj_key): Dense(768 -> 768, linear)
(proj_value): Dense(768 -> 768, linear)
)
(proj): Dense(768 -> 768, linear)
(ffn): BERTPositionwiseFFN(
(ffn_1): Dense(768 -> 3072, linear)
(activation): GELU()
(ffn_2): Dense(3072 -> 768, linear)
(dropout_layer): Dropout(p = 0.1, axes=())
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(6): BERTEncoderCell(
(dropout_layer): Dropout(p = 0.1, axes=())
(attention_cell): MultiHeadAttentionCell(
(_base_cell): DotProductAttentionCell(
(_dropout_layer): Dropout(p = 0.1, axes=())
)
(proj_query): Dense(768 -> 768, linear)
(proj_key): Dense(768 -> 768, linear)
(proj_value): Dense(768 -> 768, linear)
)
(proj): Dense(768 -> 768, linear)
(ffn): BERTPositionwiseFFN(
(ffn_1): Dense(768 -> 3072, linear)
(activation): GELU()
(ffn_2): Dense(3072 -> 768, linear)
(dropout_layer): Dropout(p = 0.1, axes=())
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(7): BERTEncoderCell(
(dropout_layer): Dropout(p = 0.1, axes=())
(attention_cell): MultiHeadAttentionCell(
(_base_cell): DotProductAttentionCell(
(_dropout_layer): Dropout(p = 0.1, axes=())
)
(proj_query): Dense(768 -> 768, linear)
(proj_key): Dense(768 -> 768, linear)
(proj_value): Dense(768 -> 768, linear)
)
(proj): Dense(768 -> 768, linear)
(ffn): BERTPositionwiseFFN(
(ffn_1): Dense(768 -> 3072, linear)
(activation): GELU()
(ffn_2): Dense(3072 -> 768, linear)
(dropout_layer): Dropout(p = 0.1, axes=())
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(8): BERTEncoderCell(
(dropout_layer): Dropout(p = 0.1, axes=())
(attention_cell): MultiHeadAttentionCell(
(_base_cell): DotProductAttentionCell(
(_dropout_layer): Dropout(p = 0.1, axes=())
)
(proj_query): Dense(768 -> 768, linear)
(proj_key): Dense(768 -> 768, linear)
(proj_value): Dense(768 -> 768, linear)
)
(proj): Dense(768 -> 768, linear)
(ffn): BERTPositionwiseFFN(
(ffn_1): Dense(768 -> 3072, linear)
(activation): GELU()
(ffn_2): Dense(3072 -> 768, linear)
(dropout_layer): Dropout(p = 0.1, axes=())
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(9): BERTEncoderCell(
(dropout_layer): Dropout(p = 0.1, axes=())
(attention_cell): MultiHeadAttentionCell(
(_base_cell): DotProductAttentionCell(
(_dropout_layer): Dropout(p = 0.1, axes=())
)
(proj_query): Dense(768 -> 768, linear)
(proj_key): Dense(768 -> 768, linear)
(proj_value): Dense(768 -> 768, linear)
)
(proj): Dense(768 -> 768, linear)
(ffn): BERTPositionwiseFFN(
(ffn_1): Dense(768 -> 3072, linear)
(activation): GELU()
(ffn_2): Dense(3072 -> 768, linear)
(dropout_layer): Dropout(p = 0.1, axes=())
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(10): BERTEncoderCell(
(dropout_layer): Dropout(p = 0.1, axes=())
(attention_cell): MultiHeadAttentionCell(
(_base_cell): DotProductAttentionCell(
(_dropout_layer): Dropout(p = 0.1, axes=())
)
(proj_query): Dense(768 -> 768, linear)
(proj_key): Dense(768 -> 768, linear)
(proj_value): Dense(768 -> 768, linear)
)
(proj): Dense(768 -> 768, linear)
(ffn): BERTPositionwiseFFN(
(ffn_1): Dense(768 -> 3072, linear)
(activation): GELU()
(ffn_2): Dense(3072 -> 768, linear)
(dropout_layer): Dropout(p = 0.1, axes=())
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(11): BERTEncoderCell(
(dropout_layer): Dropout(p = 0.1, axes=())
(attention_cell): MultiHeadAttentionCell(
(_base_cell): DotProductAttentionCell(
(_dropout_layer): Dropout(p = 0.1, axes=())
)
(proj_query): Dense(768 -> 768, linear)
(proj_key): Dense(768 -> 768, linear)
(proj_value): Dense(768 -> 768, linear)
)
(proj): Dense(768 -> 768, linear)
(ffn): BERTPositionwiseFFN(
(ffn_1): Dense(768 -> 3072, linear)
(activation): GELU()
(ffn_2): Dense(3072 -> 768, linear)
(dropout_layer): Dropout(p = 0.1, axes=())
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768)
)
)
)
(word_embed): HybridSequential(
(0): Embedding(30522 -> 768, float32)
(1): Dropout(p = 0.1, axes=())
)
(token_type_embed): HybridSequential(
(0): Embedding(2 -> 768, float32)
(1): Dropout(p = 0.1, axes=())
)
(pooler): Dense(768 -> 768, Activation(tanh))
)
(classifier): HybridSequential(
(0): Dropout(p = 0.1, axes=())
(1): Dense(None -> 3, linear)
)
)
INFO:root:processing dataset...
INFO:root:[Epoch 1 Batch 100/12277] loss=1.1486, lr=0.0000014, metrics=accuracy:0.3550
INFO:root:[Epoch 1 Batch 200/12277] loss=1.0610, lr=0.0000027, metrics=accuracy:0.3909
INFO:root:[Epoch 1 Batch 300/12277] loss=0.9557, lr=0.0000041, metrics=accuracy:0.4423
INFO:root:[Epoch 1 Batch 400/12277] loss=0.8391, lr=0.0000054, metrics=accuracy:0.4914
INFO:root:[Epoch 1 Batch 500/12277] loss=0.7997, lr=0.0000068, metrics=accuracy:0.5241
INFO:root:[Epoch 1 Batch 600/12277] loss=0.7553, lr=0.0000081, metrics=accuracy:0.5497
INFO:root:[Epoch 1 Batch 700/12277] loss=0.7370, lr=0.0000095, metrics=accuracy:0.5687
INFO:root:[Epoch 1 Batch 800/12277] loss=0.7035, lr=0.0000109, metrics=accuracy:0.5856
INFO:root:[Epoch 1 Batch 900/12277] loss=0.6920, lr=0.0000122, metrics=accuracy:0.5992
INFO:root:[Epoch 1 Batch 1000/12277] loss=0.6913, lr=0.0000136, metrics=accuracy:0.6103
INFO:root:[Epoch 1 Batch 1100/12277] loss=0.6783, lr=0.0000149, metrics=accuracy:0.6197
INFO:root:[Epoch 1 Batch 1200/12277] loss=0.6398, lr=0.0000163, metrics=accuracy:0.6297
INFO:root:[Epoch 1 Batch 1300/12277] loss=0.6366, lr=0.0000177, metrics=accuracy:0.6385
INFO:root:[Epoch 1 Batch 1400/12277] loss=0.6478, lr=0.0000190, metrics=accuracy:0.6453
INFO:root:[Epoch 1 Batch 1500/12277] loss=0.6242, lr=0.0000204, metrics=accuracy:0.6520
INFO:root:[Epoch 1 Batch 1600/12277] loss=0.6165, lr=0.0000217, metrics=accuracy:0.6583
INFO:root:[Epoch 1 Batch 1700/12277] loss=0.6019, lr=0.0000231, metrics=accuracy:0.6639
INFO:root:[Epoch 1 Batch 1800/12277] loss=0.6262, lr=0.0000244, metrics=accuracy:0.6687
INFO:root:[Epoch 1 Batch 1900/12277] loss=0.6176, lr=0.0000258, metrics=accuracy:0.6730
INFO:root:[Epoch 1 Batch 2000/12277] loss=0.5914, lr=0.0000272, metrics=accuracy:0.6771
INFO:root:[Epoch 1 Batch 2100/12277] loss=0.5881, lr=0.0000285, metrics=accuracy:0.6814
INFO:root:[Epoch 1 Batch 2200/12277] loss=0.6084, lr=0.0000299, metrics=accuracy:0.6846
INFO:root:[Epoch 1 Batch 2300/12277] loss=0.6204, lr=0.0000312, metrics=accuracy:0.6877
INFO:root:[Epoch 1 Batch 2400/12277] loss=0.5828, lr=0.0000326, metrics=accuracy:0.6909
INFO:root:[Epoch 1 Batch 2500/12277] loss=0.6147, lr=0.0000340, metrics=accuracy:0.6930
INFO:root:[Epoch 1 Batch 2600/12277] loss=0.5919, lr=0.0000353, metrics=accuracy:0.6955
INFO:root:[Epoch 1 Batch 2700/12277] loss=0.6018, lr=0.0000367, metrics=accuracy:0.6980
INFO:root:[Epoch 1 Batch 2800/12277] loss=0.6102, lr=0.0000380, metrics=accuracy:0.6999
INFO:root:[Epoch 1 Batch 2900/12277] loss=0.5564, lr=0.0000394, metrics=accuracy:0.7026
INFO:root:[Epoch 1 Batch 3000/12277] loss=0.6061, lr=0.0000407, metrics=accuracy:0.7044
INFO:root:[Epoch 1 Batch 3100/12277] loss=0.5720, lr=0.0000421, metrics=accuracy:0.7067
INFO:root:[Epoch 1 Batch 3200/12277] loss=0.5922, lr=0.0000435, metrics=accuracy:0.7081
INFO:root:[Epoch 1 Batch 3300/12277] loss=0.5786, lr=0.0000448, metrics=accuracy:0.7100
INFO:root:[Epoch 1 Batch 3400/12277] loss=0.5656, lr=0.0000462, metrics=accuracy:0.7119
INFO:root:[Epoch 1 Batch 3500/12277] loss=0.5765, lr=0.0000475, metrics=accuracy:0.7134
INFO:root:[Epoch 1 Batch 3600/12277] loss=0.5689, lr=0.0000489, metrics=accuracy:0.7150
INFO:root:[Epoch 1 Batch 3700/12277] loss=0.5286, lr=0.0000500, metrics=accuracy:0.7170
INFO:root:[Epoch 1 Batch 3800/12277] loss=0.5646, lr=0.0000498, metrics=accuracy:0.7186
INFO:root:[Epoch 1 Batch 3900/12277] loss=0.5563, lr=0.0000497, metrics=accuracy:0.7201
INFO:root:[Epoch 1 Batch 4000/12277] loss=0.5847, lr=0.0000495, metrics=accuracy:0.7212
INFO:root:[Epoch 1 Batch 4100/12277] loss=0.5873, lr=0.0000494, metrics=accuracy:0.7225
INFO:root:[Epoch 1 Batch 4200/12277] loss=0.5530, lr=0.0000492, metrics=accuracy:0.7238
INFO:root:[Epoch 1 Batch 4300/12277] loss=0.5895, lr=0.0000491, metrics=accuracy:0.7246
INFO:root:[Epoch 1 Batch 4400/12277] loss=0.5605, lr=0.0000489, metrics=accuracy:0.7258
INFO:root:[Epoch 1 Batch 4500/12277] loss=0.5573, lr=0.0000488, metrics=accuracy:0.7270
INFO:root:[Epoch 1 Batch 4600/12277] loss=0.5499, lr=0.0000486, metrics=accuracy:0.7283
INFO:root:[Epoch 1 Batch 4700/12277] loss=0.5515, lr=0.0000485, metrics=accuracy:0.7294
INFO:root:[Epoch 1 Batch 4800/12277] loss=0.5390, lr=0.0000483, metrics=accuracy:0.7308
INFO:root:[Epoch 1 Batch 4900/12277] loss=0.5654, lr=0.0000482, metrics=accuracy:0.7316
INFO:root:[Epoch 1 Batch 5000/12277] loss=0.6041, lr=0.0000480, metrics=accuracy:0.7322
INFO:root:[Epoch 1 Batch 5100/12277] loss=0.5512, lr=0.0000479, metrics=accuracy:0.7333
INFO:root:[Epoch 1 Batch 5200/12277] loss=0.5534, lr=0.0000477, metrics=accuracy:0.7341
INFO:root:[Epoch 1 Batch 5300/12277] loss=0.5583, lr=0.0000476, metrics=accuracy:0.7348
INFO:root:[Epoch 1 Batch 5400/12277] loss=0.5186, lr=0.0000474, metrics=accuracy:0.7359
INFO:root:[Epoch 1 Batch 5500/12277] loss=0.5604, lr=0.0000473, metrics=accuracy:0.7365
INFO:root:[Epoch 1 Batch 5600/12277] loss=0.5303, lr=0.0000471, metrics=accuracy:0.7373
INFO:root:[Epoch 1 Batch 5700/12277] loss=0.5473, lr=0.0000470, metrics=accuracy:0.7381
INFO:root:[Epoch 1 Batch 5800/12277] loss=0.5175, lr=0.0000468, metrics=accuracy:0.7391
INFO:root:[Epoch 1 Batch 5900/12277] loss=0.5376, lr=0.0000467, metrics=accuracy:0.7398
INFO:root:[Epoch 1 Batch 6000/12277] loss=0.5551, lr=0.0000465, metrics=accuracy:0.7404
INFO:root:[Epoch 1 Batch 6100/12277] loss=0.5492, lr=0.0000463, metrics=accuracy:0.7410
INFO:root:[Epoch 1 Batch 6200/12277] loss=0.5311, lr=0.0000462, metrics=accuracy:0.7417
INFO:root:[Epoch 1 Batch 6300/12277] loss=0.5236, lr=0.0000460, metrics=accuracy:0.7425
INFO:root:[Epoch 1 Batch 6400/12277] loss=0.5166, lr=0.0000459, metrics=accuracy:0.7433
INFO:root:[Epoch 1 Batch 6500/12277] loss=0.5445, lr=0.0000457, metrics=accuracy:0.7440
INFO:root:[Epoch 1 Batch 6600/12277] loss=0.5348, lr=0.0000456, metrics=accuracy:0.7447
INFO:root:[Epoch 1 Batch 6700/12277] loss=0.5247, lr=0.0000454, metrics=accuracy:0.7454
INFO:root:[Epoch 1 Batch 6800/12277] loss=0.5279, lr=0.0000453, metrics=accuracy:0.7462
INFO:root:[Epoch 1 Batch 6900/12277] loss=0.5058, lr=0.0000451, metrics=accuracy:0.7470
INFO:root:[Epoch 1 Batch 7000/12277] loss=0.5217, lr=0.0000450, metrics=accuracy:0.7476
INFO:root:[Epoch 1 Batch 7100/12277] loss=0.5214, lr=0.0000448, metrics=accuracy:0.7482
INFO:root:[Epoch 1 Batch 7200/12277] loss=0.5147, lr=0.0000447, metrics=accuracy:0.7489
INFO:root:[Epoch 1 Batch 7300/12277] loss=0.5265, lr=0.0000445, metrics=accuracy:0.7496
INFO:root:[Epoch 1 Batch 7400/12277] loss=0.5113, lr=0.0000444, metrics=accuracy:0.7502
INFO:root:[Epoch 1 Batch 7500/12277] loss=0.5129, lr=0.0000442, metrics=accuracy:0.7509
INFO:root:[Epoch 1 Batch 7600/12277] loss=0.5286, lr=0.0000441, metrics=accuracy:0.7514
INFO:root:[Epoch 1 Batch 7700/12277] loss=0.5334, lr=0.0000439, metrics=accuracy:0.7519
INFO:root:[Epoch 1 Batch 7800/12277] loss=0.4988, lr=0.0000438, metrics=accuracy:0.7526
INFO:root:[Epoch 1 Batch 7900/12277] loss=0.5064, lr=0.0000436, metrics=accuracy:0.7532
INFO:root:[Epoch 1 Batch 8000/12277] loss=0.4915, lr=0.0000435, metrics=accuracy:0.7538
INFO:root:[Epoch 1 Batch 8100/12277] loss=0.5063, lr=0.0000433, metrics=accuracy:0.7545
INFO:root:[Epoch 1 Batch 8200/12277] loss=0.4974, lr=0.0000432, metrics=accuracy:0.7551
INFO:root:[Epoch 1 Batch 8300/12277] loss=0.5069, lr=0.0000430, metrics=accuracy:0.7557
INFO:root:[Epoch 1 Batch 8400/12277] loss=0.5158, lr=0.0000429, metrics=accuracy:0.7561
INFO:root:[Epoch 1 Batch 8500/12277] loss=0.4869, lr=0.0000427, metrics=accuracy:0.7567
INFO:root:[Epoch 1 Batch 8600/12277] loss=0.5133, lr=0.0000426, metrics=accuracy:0.7572
INFO:root:[Epoch 1 Batch 8700/12277] loss=0.5269, lr=0.0000424, metrics=accuracy:0.7576
INFO:root:[Epoch 1 Batch 8800/12277] loss=0.5105, lr=0.0000423, metrics=accuracy:0.7580
INFO:root:[Epoch 1 Batch 8900/12277] loss=0.5210, lr=0.0000421, metrics=accuracy:0.7585
INFO:root:[Epoch 1 Batch 9000/12277] loss=0.4871, lr=0.0000420, metrics=accuracy:0.7590
INFO:root:[Epoch 1 Batch 9100/12277] loss=0.5097, lr=0.0000418, metrics=accuracy:0.7594
INFO:root:[Epoch 1 Batch 9200/12277] loss=0.5006, lr=0.0000417, metrics=accuracy:0.7599
INFO:root:[Epoch 1 Batch 9300/12277] loss=0.5186, lr=0.0000415, metrics=accuracy:0.7603
INFO:root:[Epoch 1 Batch 9400/12277] loss=0.5054, lr=0.0000414, metrics=accuracy:0.7608
INFO:root:[Epoch 1 Batch 9500/12277] loss=0.5145, lr=0.0000412, metrics=accuracy:0.7612
INFO:root:[Epoch 1 Batch 9600/12277] loss=0.5008, lr=0.0000411, metrics=accuracy:0.7616
INFO:root:[Epoch 1 Batch 9700/12277] loss=0.5175, lr=0.0000409, metrics=accuracy:0.7619
INFO:root:[Epoch 1 Batch 9800/12277] loss=0.5049, lr=0.0000408, metrics=accuracy:0.7624
INFO:root:[Epoch 1 Batch 9900/12277] loss=0.4772, lr=0.0000406, metrics=accuracy:0.7629
INFO:root:[Epoch 1 Batch 10000/12277] loss=0.4961, lr=0.0000405, metrics=accuracy:0.7633
INFO:root:[Epoch 1 Batch 10100/12277] loss=0.4888, lr=0.0000403, metrics=accuracy:0.7637
INFO:root:[Epoch 1 Batch 10200/12277] loss=0.5006, lr=0.0000402, metrics=accuracy:0.7641
INFO:root:[Epoch 1 Batch 10300/12277] loss=0.4961, lr=0.0000400, metrics=accuracy:0.7645
INFO:root:[Epoch 1 Batch 10400/12277] loss=0.4925, lr=0.0000399, metrics=accuracy:0.7649
INFO:root:[Epoch 1 Batch 10500/12277] loss=0.4924, lr=0.0000397, metrics=accuracy:0.7653
INFO:root:[Epoch 1 Batch 10600/12277] loss=0.4956, lr=0.0000396, metrics=accuracy:0.7657
INFO:root:[Epoch 1 Batch 10700/12277] loss=0.4858, lr=0.0000394, metrics=accuracy:0.7660
INFO:root:[Epoch 1 Batch 10800/12277] loss=0.5099, lr=0.0000393, metrics=accuracy:0.7663
INFO:root:[Epoch 1 Batch 10900/12277] loss=0.5028, lr=0.0000391, metrics=accuracy:0.7666
INFO:root:[Epoch 1 Batch 11000/12277] loss=0.4804, lr=0.0000390, metrics=accuracy:0.7670
INFO:root:[Epoch 1 Batch 11100/12277] loss=0.4793, lr=0.0000388, metrics=accuracy:0.7675
INFO:root:[Epoch 1 Batch 11200/12277] loss=0.4859, lr=0.0000387, metrics=accuracy:0.7679
INFO:root:[Epoch 1 Batch 11300/12277] loss=0.4899, lr=0.0000385, metrics=accuracy:0.7683
INFO:root:[Epoch 1 Batch 11400/12277] loss=0.4935, lr=0.0000384, metrics=accuracy:0.7686
INFO:root:[Epoch 1 Batch 11500/12277] loss=0.4863, lr=0.0000382, metrics=accuracy:0.7690
INFO:root:[Epoch 1 Batch 11600/12277] loss=0.4926, lr=0.0000381, metrics=accuracy:0.7693
INFO:root:[Epoch 1 Batch 11700/12277] loss=0.4809, lr=0.0000379, metrics=accuracy:0.7696
INFO:root:[Epoch 1 Batch 11800/12277] loss=0.4947, lr=0.0000377, metrics=accuracy:0.7699
INFO:root:[Epoch 1 Batch 11900/12277] loss=0.4842, lr=0.0000376, metrics=accuracy:0.7702
INFO:root:[Epoch 1 Batch 12000/12277] loss=0.4901, lr=0.0000374, metrics=accuracy:0.7705
INFO:root:[Epoch 1 Batch 12100/12277] loss=0.4987, lr=0.0000373, metrics=accuracy:0.7709
INFO:root:[Epoch 1 Batch 12200/12277] loss=0.4914, lr=0.0000371, metrics=accuracy:0.7711
INFO:root:On MNLI Matched:
INFO:root:validation metrics:accuracy:0.8175
INFO:root:On MNLI Mismatched:
INFO:root:validation metrics:accuracy:0.8247
INFO:root:params saved in : ./output_dir/model_bert_MNLI_0.params
INFO:root:Time cost=2930.7s
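The lr column in this log is consistent with a linear warmup to the peak rate followed by a linear decay, driven by the Namespace values (lr=5e-05, warmup_ratio=0.1, epochs=3, batch_size=32). The sketch below reproduces the logged values under two assumptions that this log does not state: the total step budget is computed from the MNLI training-set size (392,702 examples) rather than from the 12277 batches per epoch, and the step counter keeps running across epochs. The constant and function names (NUM_TRAIN_EXAMPLES, lr_at, logged_lr) are hypothetical and are not taken from the training script.

```python
# Sketch: reproduce the logged "lr=" values under an ASSUMED schedule
# (linear warmup, then linear decay), using the Namespace values from this log.
LR = 5e-05                 # lr
WARMUP_RATIO = 0.1         # warmup_ratio
EPOCHS = 3                 # epochs
BATCH_SIZE = 32            # batch_size
BATCHES_PER_EPOCH = 12277  # from "Batch .../12277" in the log

# ASSUMPTION: the step budget comes from the MNLI training-set size,
# not from the number of batches the sampler actually yields.
NUM_TRAIN_EXAMPLES = 392702
NUM_TRAIN_STEPS = int(NUM_TRAIN_EXAMPLES / BATCH_SIZE * EPOCHS)  # 36815
NUM_WARMUP_STEPS = int(NUM_TRAIN_STEPS * WARMUP_RATIO)           # 3681


def lr_at(step):
    """Learning rate at a 1-based global step (hypothetical helper)."""
    if step < NUM_WARMUP_STEPS:
        # Linear warmup from 0 up to LR.
        return LR * step / NUM_WARMUP_STEPS
    # Linear decay from LR toward 0 over the remaining steps.
    offset = (step - NUM_WARMUP_STEPS) * LR / (NUM_TRAIN_STEPS - NUM_WARMUP_STEPS)
    return LR - offset


def logged_lr(epoch, batch):
    """Format the rate the way the log prints it; ASSUMES the step count carries over between epochs."""
    step = (epoch - 1) * BATCHES_PER_EPOCH + batch
    return f"{lr_at(step):.7f}"


for epoch, batch in [(1, 200), (1, 3700), (1, 12200), (2, 100), (3, 2800)]:
    print(epoch, batch, logged_lr(epoch, batch))
```

Under these assumptions the peak of 5e-05 is reached at global step 3681, which matches the logged rate topping out between batches 3600 and 3700 of epoch 1 and then falling by roughly 1.5e-07 every 100 batches.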
INFO:root:[Epoch 2 Batch 100/12277] loss=0.3645, lr=0.0000369, metrics=accuracy:0.8569
INFO:root:[Epoch 2 Batch 200/12277] loss=0.3640, lr=0.0000367, metrics=accuracy:0.8609
INFO:root:[Epoch 2 Batch 300/12277] loss=0.3761, lr=0.0000366, metrics=accuracy:0.8609
INFO:root:[Epoch 2 Batch 400/12277] loss=0.3780, lr=0.0000364, metrics=accuracy:0.8606
INFO:root:[Epoch 2 Batch 500/12277] loss=0.3951, lr=0.0000363, metrics=accuracy:0.8580
INFO:root:[Epoch 2 Batch 600/12277] loss=0.3729, lr=0.0000361, metrics=accuracy:0.8578
INFO:root:[Epoch 2 Batch 700/12277] loss=0.3841, lr=0.0000360, metrics=accuracy:0.8572
INFO:root:[Epoch 2 Batch 800/12277] loss=0.3800, lr=0.0000358, metrics=accuracy:0.8573
INFO:root:[Epoch 2 Batch 900/12277] loss=0.4055, lr=0.0000357, metrics=accuracy:0.8557
INFO:root:[Epoch 2 Batch 1000/12277] loss=0.3628, lr=0.0000355, metrics=accuracy:0.8563
INFO:root:[Epoch 2 Batch 1100/12277] loss=0.3732, lr=0.0000354, metrics=accuracy:0.8561
INFO:root:[Epoch 2 Batch 1200/12277] loss=0.3845, lr=0.0000352, metrics=accuracy:0.8555
INFO:root:[Epoch 2 Batch 1300/12277] loss=0.3997, lr=0.0000351, metrics=accuracy:0.8552
INFO:root:[Epoch 2 Batch 1400/12277] loss=0.3712, lr=0.0000349, metrics=accuracy:0.8557
INFO:root:[Epoch 2 Batch 1500/12277] loss=0.3610, lr=0.0000348, metrics=accuracy:0.8563
INFO:root:[Epoch 2 Batch 1600/12277] loss=0.3803, lr=0.0000346, metrics=accuracy:0.8564
INFO:root:[Epoch 2 Batch 1700/12277] loss=0.3863, lr=0.0000345, metrics=accuracy:0.8560
INFO:root:[Epoch 2 Batch 1800/12277] loss=0.3986, lr=0.0000343, metrics=accuracy:0.8556
INFO:root:[Epoch 2 Batch 1900/12277] loss=0.3859, lr=0.0000342, metrics=accuracy:0.8557
INFO:root:[Epoch 2 Batch 2000/12277] loss=0.3711, lr=0.0000340, metrics=accuracy:0.8557
INFO:root:[Epoch 2 Batch 2100/12277] loss=0.3509, lr=0.0000339, metrics=accuracy:0.8564
INFO:root:[Epoch 2 Batch 2200/12277] loss=0.3872, lr=0.0000337, metrics=accuracy:0.8561
INFO:root:[Epoch 2 Batch 2300/12277] loss=0.3888, lr=0.0000336, metrics=accuracy:0.8560
INFO:root:[Epoch 2 Batch 2400/12277] loss=0.3621, lr=0.0000334, metrics=accuracy:0.8562
INFO:root:[Epoch 2 Batch 2500/12277] loss=0.4020, lr=0.0000333, metrics=accuracy:0.8559
INFO:root:[Epoch 2 Batch 2600/12277] loss=0.3661, lr=0.0000331, metrics=accuracy:0.8560
INFO:root:[Epoch 2 Batch 2700/12277] loss=0.3716, lr=0.0000330, metrics=accuracy:0.8563
INFO:root:[Epoch 2 Batch 2800/12277] loss=0.3582, lr=0.0000328, metrics=accuracy:0.8564
INFO:root:[Epoch 2 Batch 2900/12277] loss=0.3753, lr=0.0000327, metrics=accuracy:0.8566
INFO:root:[Epoch 2 Batch 3000/12277] loss=0.3692, lr=0.0000325, metrics=accuracy:0.8569
INFO:root:[Epoch 2 Batch 3100/12277] loss=0.3805, lr=0.0000324, metrics=accuracy:0.8568
INFO:root:[Epoch 2 Batch 3200/12277] loss=0.4049, lr=0.0000322, metrics=accuracy:0.8565
INFO:root:[Epoch 2 Batch 3300/12277] loss=0.3686, lr=0.0000320, metrics=accuracy:0.8567
INFO:root:[Epoch 2 Batch 3400/12277] loss=0.3758, lr=0.0000319, metrics=accuracy:0.8567
INFO:root:[Epoch 2 Batch 3500/12277] loss=0.3626, lr=0.0000317, metrics=accuracy:0.8569
INFO:root:[Epoch 2 Batch 3600/12277] loss=0.3679, lr=0.0000316, metrics=accuracy:0.8570
INFO:root:[Epoch 2 Batch 3700/12277] loss=0.3606, lr=0.0000314, metrics=accuracy:0.8571
INFO:root:[Epoch 2 Batch 3800/12277] loss=0.3941, lr=0.0000313, metrics=accuracy:0.8569
INFO:root:[Epoch 2 Batch 3900/12277] loss=0.3638, lr=0.0000311, metrics=accuracy:0.8570
INFO:root:[Epoch 2 Batch 4000/12277] loss=0.3755, lr=0.0000310, metrics=accuracy:0.8570
INFO:root:[Epoch 2 Batch 4100/12277] loss=0.3819, lr=0.0000308, metrics=accuracy:0.8570
INFO:root:[Epoch 2 Batch 4200/12277] loss=0.3784, lr=0.0000307, metrics=accuracy:0.8569
INFO:root:[Epoch 2 Batch 4300/12277] loss=0.3730, lr=0.0000305, metrics=accuracy:0.8570
INFO:root:[Epoch 2 Batch 4400/12277] loss=0.3714, lr=0.0000304, metrics=accuracy:0.8571
INFO:root:[Epoch 2 Batch 4500/12277] loss=0.3925, lr=0.0000302, metrics=accuracy:0.8570
INFO:root:[Epoch 2 Batch 4600/12277] loss=0.3852, lr=0.0000301, metrics=accuracy:0.8570
INFO:root:[Epoch 2 Batch 4700/12277] loss=0.3580, lr=0.0000299, metrics=accuracy:0.8573
INFO:root:[Epoch 2 Batch 4800/12277] loss=0.3778, lr=0.0000298, metrics=accuracy:0.8572
INFO:root:[Epoch 2 Batch 4900/12277] loss=0.3706, lr=0.0000296, metrics=accuracy:0.8572
INFO:root:[Epoch 2 Batch 5000/12277] loss=0.3776, lr=0.0000295, metrics=accuracy:0.8573
INFO:root:[Epoch 2 Batch 5100/12277] loss=0.3687, lr=0.0000293, metrics=accuracy:0.8573
INFO:root:[Epoch 2 Batch 5200/12277] loss=0.3880, lr=0.0000292, metrics=accuracy:0.8573
INFO:root:[Epoch 2 Batch 5300/12277] loss=0.3875, lr=0.0000290, metrics=accuracy:0.8571
INFO:root:[Epoch 2 Batch 5400/12277] loss=0.3697, lr=0.0000289, metrics=accuracy:0.8572
INFO:root:[Epoch 2 Batch 5500/12277] loss=0.3629, lr=0.0000287, metrics=accuracy:0.8572
INFO:root:[Epoch 2 Batch 5600/12277] loss=0.3942, lr=0.0000286, metrics=accuracy:0.8572
INFO:root:[Epoch 2 Batch 5700/12277] loss=0.3857, lr=0.0000284, metrics=accuracy:0.8571
INFO:root:[Epoch 2 Batch 5800/12277] loss=0.3791, lr=0.0000283, metrics=accuracy:0.8571
INFO:root:[Epoch 2 Batch 5900/12277] loss=0.3851, lr=0.0000281, metrics=accuracy:0.8571
INFO:root:[Epoch 2 Batch 6000/12277] loss=0.3544, lr=0.0000280, metrics=accuracy:0.8571
INFO:root:[Epoch 2 Batch 6100/12277] loss=0.3752, lr=0.0000278, metrics=accuracy:0.8571
INFO:root:[Epoch 2 Batch 6200/12277] loss=0.3743, lr=0.0000277, metrics=accuracy:0.8571
INFO:root:[Epoch 2 Batch 6300/12277] loss=0.3695, lr=0.0000275, metrics=accuracy:0.8572
INFO:root:[Epoch 2 Batch 6400/12277] loss=0.3647, lr=0.0000274, metrics=accuracy:0.8573
INFO:root:[Epoch 2 Batch 6500/12277] loss=0.3683, lr=0.0000272, metrics=accuracy:0.8574
INFO:root:[Epoch 2 Batch 6600/12277] loss=0.3642, lr=0.0000271, metrics=accuracy:0.8574
INFO:root:[Epoch 2 Batch 6700/12277] loss=0.3791, lr=0.0000269, metrics=accuracy:0.8574
INFO:root:[Epoch 2 Batch 6800/12277] loss=0.3630, lr=0.0000268, metrics=accuracy:0.8573
INFO:root:[Epoch 2 Batch 6900/12277] loss=0.3672, lr=0.0000266, metrics=accuracy:0.8574
INFO:root:[Epoch 2 Batch 7000/12277] loss=0.3749, lr=0.0000265, metrics=accuracy:0.8574
INFO:root:[Epoch 2 Batch 7100/12277] loss=0.3806, lr=0.0000263, metrics=accuracy:0.8573
INFO:root:[Epoch 2 Batch 7200/12277] loss=0.3699, lr=0.0000262, metrics=accuracy:0.8574
INFO:root:[Epoch 2 Batch 7300/12277] loss=0.3898, lr=0.0000260, metrics=accuracy:0.8573
INFO:root:[Epoch 2 Batch 7400/12277] loss=0.3465, lr=0.0000259, metrics=accuracy:0.8574
INFO:root:[Epoch 2 Batch 7500/12277] loss=0.3575, lr=0.0000257, metrics=accuracy:0.8576
INFO:root:[Epoch 2 Batch 7600/12277] loss=0.3430, lr=0.0000256, metrics=accuracy:0.8577
INFO:root:[Epoch 2 Batch 7700/12277] loss=0.3574, lr=0.0000254, metrics=accuracy:0.8577
INFO:root:[Epoch 2 Batch 7800/12277] loss=0.3755, lr=0.0000253, metrics=accuracy:0.8577
INFO:root:[Epoch 2 Batch 7900/12277] loss=0.3786, lr=0.0000251, metrics=accuracy:0.8576
INFO:root:[Epoch 2 Batch 8000/12277] loss=0.3562, lr=0.0000250, metrics=accuracy:0.8577
INFO:root:[Epoch 2 Batch 8100/12277] loss=0.3847, lr=0.0000248, metrics=accuracy:0.8577
INFO:root:[Epoch 2 Batch 8200/12277] loss=0.3748, lr=0.0000247, metrics=accuracy:0.8576
INFO:root:[Epoch 2 Batch 8300/12277] loss=0.3725, lr=0.0000245, metrics=accuracy:0.8577
INFO:root:[Epoch 2 Batch 8400/12277] loss=0.3467, lr=0.0000244, metrics=accuracy:0.8577
INFO:root:[Epoch 2 Batch 8500/12277] loss=0.3671, lr=0.0000242, metrics=accuracy:0.8578
INFO:root:[Epoch 2 Batch 8600/12277] loss=0.3636, lr=0.0000241, metrics=accuracy:0.8578
INFO:root:[Epoch 2 Batch 8700/12277] loss=0.3905, lr=0.0000239, metrics=accuracy:0.8577
INFO:root:[Epoch 2 Batch 8800/12277] loss=0.3660, lr=0.0000237, metrics=accuracy:0.8578
INFO:root:[Epoch 2 Batch 8900/12277] loss=0.3508, lr=0.0000236, metrics=accuracy:0.8580
INFO:root:[Epoch 2 Batch 9000/12277] loss=0.3482, lr=0.0000234, metrics=accuracy:0.8581
INFO:root:[Epoch 2 Batch 9100/12277] loss=0.3469, lr=0.0000233, metrics=accuracy:0.8582
INFO:root:[Epoch 2 Batch 9200/12277] loss=0.3583, lr=0.0000231, metrics=accuracy:0.8583
INFO:root:[Epoch 2 Batch 9300/12277] loss=0.3864, lr=0.0000230, metrics=accuracy:0.8583
INFO:root:[Epoch 2 Batch 9400/12277] loss=0.3533, lr=0.0000228, metrics=accuracy:0.8584
INFO:root:[Epoch 2 Batch 9500/12277] loss=0.3712, lr=0.0000227, metrics=accuracy:0.8584
INFO:root:[Epoch 2 Batch 9600/12277] loss=0.3501, lr=0.0000225, metrics=accuracy:0.8585
INFO:root:[Epoch 2 Batch 9700/12277] loss=0.3579, lr=0.0000224, metrics=accuracy:0.8586
INFO:root:[Epoch 2 Batch 9800/12277] loss=0.3552, lr=0.0000222, metrics=accuracy:0.8586
INFO:root:[Epoch 2 Batch 9900/12277] loss=0.3855, lr=0.0000221, metrics=accuracy:0.8586
INFO:root:[Epoch 2 Batch 10000/12277] loss=0.3719, lr=0.0000219, metrics=accuracy:0.8585
INFO:root:[Epoch 2 Batch 10100/12277] loss=0.3641, lr=0.0000218, metrics=accuracy:0.8586
INFO:root:[Epoch 2 Batch 10200/12277] loss=0.3679, lr=0.0000216, metrics=accuracy:0.8586
INFO:root:[Epoch 2 Batch 10300/12277] loss=0.3640, lr=0.0000215, metrics=accuracy:0.8586
INFO:root:[Epoch 2 Batch 10400/12277] loss=0.3541, lr=0.0000213, metrics=accuracy:0.8587
INFO:root:[Epoch 2 Batch 10500/12277] loss=0.3520, lr=0.0000212, metrics=accuracy:0.8588
INFO:root:[Epoch 2 Batch 10600/12277] loss=0.3591, lr=0.0000210, metrics=accuracy:0.8588
INFO:root:[Epoch 2 Batch 10700/12277] loss=0.3589, lr=0.0000209, metrics=accuracy:0.8588
INFO:root:[Epoch 2 Batch 10800/12277] loss=0.3673, lr=0.0000207, metrics=accuracy:0.8589
INFO:root:[Epoch 2 Batch 10900/12277] loss=0.3600, lr=0.0000206, metrics=accuracy:0.8589
INFO:root:[Epoch 2 Batch 11000/12277] loss=0.3375, lr=0.0000204, metrics=accuracy:0.8590
INFO:root:[Epoch 2 Batch 11100/12277] loss=0.3672, lr=0.0000203, metrics=accuracy:0.8590
INFO:root:[Epoch 2 Batch 11200/12277] loss=0.3659, lr=0.0000201, metrics=accuracy:0.8590
INFO:root:[Epoch 2 Batch 11300/12277] loss=0.3571, lr=0.0000200, metrics=accuracy:0.8591
INFO:root:[Epoch 2 Batch 11400/12277] loss=0.3400, lr=0.0000198, metrics=accuracy:0.8592
INFO:root:[Epoch 2 Batch 11500/12277] loss=0.3765, lr=0.0000197, metrics=accuracy:0.8591
INFO:root:[Epoch 2 Batch 11600/12277] loss=0.3333, lr=0.0000195, metrics=accuracy:0.8593
INFO:root:[Epoch 2 Batch 11700/12277] loss=0.3653, lr=0.0000194, metrics=accuracy:0.8593
INFO:root:[Epoch 2 Batch 11800/12277] loss=0.3681, lr=0.0000192, metrics=accuracy:0.8593
INFO:root:[Epoch 2 Batch 11900/12277] loss=0.3489, lr=0.0000191, metrics=accuracy:0.8595
INFO:root:[Epoch 2 Batch 12000/12277] loss=0.3388, lr=0.0000189, metrics=accuracy:0.8596
INFO:root:[Epoch 2 Batch 12100/12277] loss=0.3524, lr=0.0000188, metrics=accuracy:0.8596
INFO:root:[Epoch 2 Batch 12200/12277] loss=0.3570, lr=0.0000186, metrics=accuracy:0.8597
INFO:root:On MNLI Matched:
INFO:root:validation metrics:accuracy:0.8395
INFO:root:On MNLI Mismatched:
INFO:root:validation metrics:accuracy:0.8382
INFO:root:params saved in : ./output_dir/model_bert_MNLI_1.params
INFO:root:Time cost=2935.2s
INFO:root:[Epoch 3 Batch 100/12277] loss=0.2328, lr=0.0000184, metrics=accuracy:0.9181
INFO:root:[Epoch 3 Batch 200/12277] loss=0.2344, lr=0.0000182, metrics=accuracy:0.9175
INFO:root:[Epoch 3 Batch 300/12277] loss=0.2430, lr=0.0000180, metrics=accuracy:0.9172
INFO:root:[Epoch 3 Batch 400/12277] loss=0.2280, lr=0.0000179, metrics=accuracy:0.9177
INFO:root:[Epoch 3 Batch 500/12277] loss=0.2250, lr=0.0000177, metrics=accuracy:0.9183
INFO:root:[Epoch 3 Batch 600/12277] loss=0.2307, lr=0.0000176, metrics=accuracy:0.9185
INFO:root:[Epoch 3 Batch 700/12277] loss=0.2347, lr=0.0000174, metrics=accuracy:0.9181
INFO:root:[Epoch 3 Batch 800/12277] loss=0.2110, lr=0.0000173, metrics=accuracy:0.9182
INFO:root:[Epoch 3 Batch 900/12277] loss=0.2155, lr=0.0000171, metrics=accuracy:0.9189
INFO:root:[Epoch 3 Batch 1000/12277] loss=0.2089, lr=0.0000170, metrics=accuracy:0.9197
INFO:root:[Epoch 3 Batch 1100/12277] loss=0.2557, lr=0.0000168, metrics=accuracy:0.9191
INFO:root:[Epoch 3 Batch 1200/12277] loss=0.2261, lr=0.0000167, metrics=accuracy:0.9191
INFO:root:[Epoch 3 Batch 1300/12277] loss=0.2287, lr=0.0000165, metrics=accuracy:0.9191
INFO:root:[Epoch 3 Batch 1400/12277] loss=0.2201, lr=0.0000164, metrics=accuracy:0.9196
INFO:root:[Epoch 3 Batch 1500/12277] loss=0.2453, lr=0.0000162, metrics=accuracy:0.9190
INFO:root:[Epoch 3 Batch 1600/12277] loss=0.2214, lr=0.0000161, metrics=accuracy:0.9192
INFO:root:[Epoch 3 Batch 1700/12277] loss=0.2198, lr=0.0000159, metrics=accuracy:0.9191
INFO:root:[Epoch 3 Batch 1800/12277] loss=0.2280, lr=0.0000158, metrics=accuracy:0.9189
INFO:root:[Epoch 3 Batch 1900/12277] loss=0.2238, lr=0.0000156, metrics=accuracy:0.9190
INFO:root:[Epoch 3 Batch 2000/12277] loss=0.2208, lr=0.0000155, metrics=accuracy:0.9191
INFO:root:[Epoch 3 Batch 2100/12277] loss=0.2457, lr=0.0000153, metrics=accuracy:0.9187
INFO:root:[Epoch 3 Batch 2200/12277] loss=0.2342, lr=0.0000152, metrics=accuracy:0.9185
INFO:root:[Epoch 3 Batch 2300/12277] loss=0.2239, lr=0.0000150, metrics=accuracy:0.9188
INFO:root:[Epoch 3 Batch 2400/12277] loss=0.2384, lr=0.0000149, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 2500/12277] loss=0.2200, lr=0.0000147, metrics=accuracy:0.9188
INFO:root:[Epoch 3 Batch 2600/12277] loss=0.2378, lr=0.0000146, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 2700/12277] loss=0.2300, lr=0.0000144, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 2800/12277] loss=0.2305, lr=0.0000143, metrics=accuracy:0.9185
INFO:root:[Epoch 3 Batch 2900/12277] loss=0.2272, lr=0.0000141, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 3000/12277] loss=0.2408, lr=0.0000140, metrics=accuracy:0.9185
INFO:root:[Epoch 3 Batch 3100/12277] loss=0.2409, lr=0.0000138, metrics=accuracy:0.9185
INFO:root:[Epoch 3 Batch 3200/12277] loss=0.2339, lr=0.0000137, metrics=accuracy:0.9184
INFO:root:[Epoch 3 Batch 3300/12277] loss=0.2373, lr=0.0000135, metrics=accuracy:0.9183
INFO:root:[Epoch 3 Batch 3400/12277] loss=0.2138, lr=0.0000134, metrics=accuracy:0.9184
INFO:root:[Epoch 3 Batch 3500/12277] loss=0.2237, lr=0.0000132, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 3600/12277] loss=0.2299, lr=0.0000131, metrics=accuracy:0.9185
INFO:root:[Epoch 3 Batch 3700/12277] loss=0.2241, lr=0.0000129, metrics=accuracy:0.9185
INFO:root:[Epoch 3 Batch 3800/12277] loss=0.2216, lr=0.0000128, metrics=accuracy:0.9187
INFO:root:[Epoch 3 Batch 3900/12277] loss=0.2292, lr=0.0000126, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 4000/12277] loss=0.2251, lr=0.0000125, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 4100/12277] loss=0.2345, lr=0.0000123, metrics=accuracy:0.9184
INFO:root:[Epoch 3 Batch 4200/12277] loss=0.2460, lr=0.0000122, metrics=accuracy:0.9183
INFO:root:[Epoch 3 Batch 4300/12277] loss=0.2334, lr=0.0000120, metrics=accuracy:0.9183
INFO:root:[Epoch 3 Batch 4400/12277] loss=0.2179, lr=0.0000119, metrics=accuracy:0.9183
INFO:root:[Epoch 3 Batch 4500/12277] loss=0.2070, lr=0.0000117, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 4600/12277] loss=0.2398, lr=0.0000116, metrics=accuracy:0.9185
INFO:root:[Epoch 3 Batch 4700/12277] loss=0.2377, lr=0.0000114, metrics=accuracy:0.9184
INFO:root:[Epoch 3 Batch 4800/12277] loss=0.2102, lr=0.0000113, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 4900/12277] loss=0.2158, lr=0.0000111, metrics=accuracy:0.9187
INFO:root:[Epoch 3 Batch 5000/12277] loss=0.2282, lr=0.0000110, metrics=accuracy:0.9188
INFO:root:[Epoch 3 Batch 5100/12277] loss=0.2310, lr=0.0000108, metrics=accuracy:0.9188
INFO:root:[Epoch 3 Batch 5200/12277] loss=0.2311, lr=0.0000107, metrics=accuracy:0.9187
INFO:root:[Epoch 3 Batch 5300/12277] loss=0.2278, lr=0.0000105, metrics=accuracy:0.9188
INFO:root:[Epoch 3 Batch 5400/12277] loss=0.2296, lr=0.0000104, metrics=accuracy:0.9188
INFO:root:[Epoch 3 Batch 5500/12277] loss=0.2295, lr=0.0000102, metrics=accuracy:0.9189
INFO:root:[Epoch 3 Batch 5600/12277] loss=0.2081, lr=0.0000101, metrics=accuracy:0.9191
INFO:root:[Epoch 3 Batch 5700/12277] loss=0.2529, lr=0.0000099, metrics=accuracy:0.9190
INFO:root:[Epoch 3 Batch 5800/12277] loss=0.2334, lr=0.0000097, metrics=accuracy:0.9189
INFO:root:[Epoch 3 Batch 5900/12277] loss=0.2411, lr=0.0000096, metrics=accuracy:0.9189
INFO:root:[Epoch 3 Batch 6000/12277] loss=0.1987, lr=0.0000094, metrics=accuracy:0.9190
INFO:root:[Epoch 3 Batch 6100/12277] loss=0.2188, lr=0.0000093, metrics=accuracy:0.9191
INFO:root:[Epoch 3 Batch 6200/12277] loss=0.2191, lr=0.0000091, metrics=accuracy:0.9192
INFO:root:[Epoch 3 Batch 6300/12277] loss=0.2103, lr=0.0000090, metrics=accuracy:0.9193
INFO:root:[Epoch 3 Batch 6400/12277] loss=0.2450, lr=0.0000088, metrics=accuracy:0.9192
INFO:root:[Epoch 3 Batch 6500/12277] loss=0.2293, lr=0.0000087, metrics=accuracy:0.9191
INFO:root:[Epoch 3 Batch 6600/12277] loss=0.2226, lr=0.0000085, metrics=accuracy:0.9192
INFO:root:[Epoch 3 Batch 6700/12277] loss=0.2109, lr=0.0000084, metrics=accuracy:0.9192
INFO:root:[Epoch 3 Batch 6800/12277] loss=0.2266, lr=0.0000082, metrics=accuracy:0.9192
INFO:root:[Epoch 3 Batch 6900/12277] loss=0.2132, lr=0.0000081, metrics=accuracy:0.9193
INFO:root:[Epoch 3 Batch 7000/12277] loss=0.1935, lr=0.0000079, metrics=accuracy:0.9196
INFO:root:[Epoch 3 Batch 7100/12277] loss=0.2177, lr=0.0000078, metrics=accuracy:0.9196
INFO:root:[Epoch 3 Batch 7200/12277] loss=0.2292, lr=0.0000076, metrics=accuracy:0.9196
INFO:root:[Epoch 3 Batch 7300/12277] loss=0.2168, lr=0.0000075, metrics=accuracy:0.9196
INFO:root:[Epoch 3 Batch 7400/12277] loss=0.2034, lr=0.0000073, metrics=accuracy:0.9197
INFO:root:[Epoch 3 Batch 7500/12277] loss=0.2171, lr=0.0000072, metrics=accuracy:0.9197
INFO:root:[Epoch 3 Batch 7600/12277] loss=0.2171, lr=0.0000070, metrics=accuracy:0.9197
INFO:root:[Epoch 3 Batch 7700/12277] loss=0.2264, lr=0.0000069, metrics=accuracy:0.9197
INFO:root:[Epoch 3 Batch 7800/12277] loss=0.2149, lr=0.0000067, metrics=accuracy:0.9198
INFO:root:[Epoch 3 Batch 7900/12277] loss=0.2406, lr=0.0000066, metrics=accuracy:0.9197
INFO:root:[Epoch 3 Batch 8000/12277] loss=0.2121, lr=0.0000064, metrics=accuracy:0.9198
INFO:root:[Epoch 3 Batch 8100/12277] loss=0.2215, lr=0.0000063, metrics=accuracy:0.9198
INFO:root:[Epoch 3 Batch 8200/12277] loss=0.2263, lr=0.0000061, metrics=accuracy:0.9198
INFO:root:[Epoch 3 Batch 8300/12277] loss=0.2084, lr=0.0000060, metrics=accuracy:0.9199
INFO:root:[Epoch 3 Batch 8400/12277] loss=0.2263, lr=0.0000058, metrics=accuracy:0.9199
INFO:root:[Epoch 3 Batch 8500/12277] loss=0.2073, lr=0.0000057, metrics=accuracy:0.9200
INFO:root:[Epoch 3 Batch 8600/12277] loss=0.2363, lr=0.0000055, metrics=accuracy:0.9198
INFO:root:[Epoch 3 Batch 8700/12277] loss=0.2210, lr=0.0000054, metrics=accuracy:0.9199
INFO:root:[Epoch 3 Batch 8800/12277] loss=0.2073, lr=0.0000052, metrics=accuracy:0.9199
INFO:root:[Epoch 3 Batch 8900/12277] loss=0.2272, lr=0.0000051, metrics=accuracy:0.9199
INFO:root:[Epoch 3 Batch 9000/12277] loss=0.1974, lr=0.0000049, metrics=accuracy:0.9201
INFO:root:[Epoch 3 Batch 9100/12277] loss=0.1935, lr=0.0000048, metrics=accuracy:0.9202
INFO:root:[Epoch 3 Batch 9200/12277] loss=0.2173, lr=0.0000046, metrics=accuracy:0.9202
INFO:root:[Epoch 3 Batch 9300/12277] loss=0.2246, lr=0.0000045, metrics=accuracy:0.9203
INFO:root:[Epoch 3 Batch 9400/12277] loss=0.2251, lr=0.0000043, metrics=accuracy:0.9203
INFO:root:[Epoch 3 Batch 9500/12277] loss=0.1962, lr=0.0000042, metrics=accuracy:0.9204
INFO:root:[Epoch 3 Batch 9600/12277] loss=0.2124, lr=0.0000040, metrics=accuracy:0.9205
INFO:root:[Epoch 3 Batch 9700/12277] loss=0.2150, lr=0.0000039, metrics=accuracy:0.9205
INFO:root:[Epoch 3 Batch 9800/12277] loss=0.2148, lr=0.0000037, metrics=accuracy:0.9205
INFO:root:[Epoch 3 Batch 9900/12277] loss=0.2296, lr=0.0000036, metrics=accuracy:0.9204
INFO:root:[Epoch 3 Batch 10000/12277] loss=0.2131, lr=0.0000034, metrics=accuracy:0.9205
INFO:root:[Epoch 3 Batch 10100/12277] loss=0.2276, lr=0.0000033, metrics=accuracy:0.9205
INFO:root:[Epoch 3 Batch 10200/12277] loss=0.2342, lr=0.0000031, metrics=accuracy:0.9204
INFO:root:[Epoch 3 Batch 10300/12277] loss=0.2104, lr=0.0000030, metrics=accuracy:0.9205
INFO:root:[Epoch 3 Batch 10400/12277] loss=0.2103, lr=0.0000028, metrics=accuracy:0.9205
INFO:root:[Epoch 3 Batch 10500/12277] loss=0.2147, lr=0.0000027, metrics=accuracy:0.9206
INFO:root:[Epoch 3 Batch 10600/12277] loss=0.2185, lr=0.0000025, metrics=accuracy:0.9206
INFO:root:[Epoch 3 Batch 10700/12277] loss=0.2136, lr=0.0000024, metrics=accuracy:0.9206
INFO:root:[Epoch 3 Batch 10800/12277] loss=0.2129, lr=0.0000022, metrics=accuracy:0.9206
INFO:root:[Epoch 3 Batch 10900/12277] loss=0.2212, lr=0.0000021, metrics=accuracy:0.9206
INFO:root:[Epoch 3 Batch 11000/12277] loss=0.2069, lr=0.0000019, metrics=accuracy:0.9207
INFO:root:[Epoch 3 Batch 11100/12277] loss=0.2302, lr=0.0000018, metrics=accuracy:0.9207
INFO:root:[Epoch 3 Batch 11200/12277] loss=0.2160, lr=0.0000016, metrics=accuracy:0.9207
INFO:root:[Epoch 3 Batch 11300/12277] loss=0.1919, lr=0.0000015, metrics=accuracy:0.9207
INFO:root:[Epoch 3 Batch 11400/12277] loss=0.2249, lr=0.0000013, metrics=accuracy:0.9207
INFO:root:[Epoch 3 Batch 11500/12277] loss=0.1948, lr=0.0000011, metrics=accuracy:0.9208
INFO:root:[Epoch 3 Batch 11600/12277] loss=0.2044, lr=0.0000010, metrics=accuracy:0.9209
INFO:root:[Epoch 3 Batch 11700/12277] loss=0.1999, lr=0.0000008, metrics=accuracy:0.9209
INFO:root:[Epoch 3 Batch 11800/12277] loss=0.1921, lr=0.0000007, metrics=accuracy:0.9210
INFO:root:[Epoch 3 Batch 11900/12277] loss=0.2376, lr=0.0000005, metrics=accuracy:0.9210
INFO:root:[Epoch 3 Batch 12000/12277] loss=0.2224, lr=0.0000004, metrics=accuracy:0.9209
INFO:root:[Epoch 3 Batch 12100/12277] loss=0.2182, lr=0.0000002, metrics=accuracy:0.9210
INFO:root:[Epoch 3 Batch 12200/12277] loss=0.2330, lr=0.0000001, metrics=accuracy:0.9209
INFO:root:On MNLI Matched:
INFO:root:validation metrics:accuracy:0.8455
INFO:root:On MNLI Mismatched:
INFO:root:validation metrics:accuracy:0.8466
INFO:root:params saved in : ./output_dir/model_bert_MNLI_2.params
INFO:root:Time cost=2935.0s