web-data/gluonnlp/logs/bert/finetuned_mnli.log
INFO:root:Namespace(accumulate=None, batch_size=32, bert_dataset='book_corpus_wiki_en_uncased', bert_model='bert_12_768_12', dev_batch_size=8, epochs=3, gpu=True, log_interval=100, lr=5e-05, max_len=80, model_parameters=None, optimizer='bertadam', output_dir='./output_dir', seed=2, task_name='MNLI', warmup_ratio=0.1) | |
INFO:root:BERTClassifier( | |
(bert): BERTModel( | |
(encoder): BERTEncoder( | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
(transformer_cells): HybridSequential( | |
(0): BERTEncoderCell( | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(attention_cell): MultiHeadAttentionCell( | |
(_base_cell): DotProductAttentionCell( | |
(_dropout_layer): Dropout(p = 0.1, axes=()) | |
) | |
(proj_query): Dense(768 -> 768, linear) | |
(proj_key): Dense(768 -> 768, linear) | |
(proj_value): Dense(768 -> 768, linear) | |
) | |
(proj): Dense(768 -> 768, linear) | |
(ffn): BERTPositionwiseFFN( | |
(ffn_1): Dense(768 -> 3072, linear) | |
(activation): GELU() | |
(ffn_2): Dense(3072 -> 768, linear) | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(1): BERTEncoderCell( | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(attention_cell): MultiHeadAttentionCell( | |
(_base_cell): DotProductAttentionCell( | |
(_dropout_layer): Dropout(p = 0.1, axes=()) | |
) | |
(proj_query): Dense(768 -> 768, linear) | |
(proj_key): Dense(768 -> 768, linear) | |
(proj_value): Dense(768 -> 768, linear) | |
) | |
(proj): Dense(768 -> 768, linear) | |
(ffn): BERTPositionwiseFFN( | |
(ffn_1): Dense(768 -> 3072, linear) | |
(activation): GELU() | |
(ffn_2): Dense(3072 -> 768, linear) | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(2): BERTEncoderCell( | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(attention_cell): MultiHeadAttentionCell( | |
(_base_cell): DotProductAttentionCell( | |
(_dropout_layer): Dropout(p = 0.1, axes=()) | |
) | |
(proj_query): Dense(768 -> 768, linear) | |
(proj_key): Dense(768 -> 768, linear) | |
(proj_value): Dense(768 -> 768, linear) | |
) | |
(proj): Dense(768 -> 768, linear) | |
(ffn): BERTPositionwiseFFN( | |
(ffn_1): Dense(768 -> 3072, linear) | |
(activation): GELU() | |
(ffn_2): Dense(3072 -> 768, linear) | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(3): BERTEncoderCell( | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(attention_cell): MultiHeadAttentionCell( | |
(_base_cell): DotProductAttentionCell( | |
(_dropout_layer): Dropout(p = 0.1, axes=()) | |
) | |
(proj_query): Dense(768 -> 768, linear) | |
(proj_key): Dense(768 -> 768, linear) | |
(proj_value): Dense(768 -> 768, linear) | |
) | |
(proj): Dense(768 -> 768, linear) | |
(ffn): BERTPositionwiseFFN( | |
(ffn_1): Dense(768 -> 3072, linear) | |
(activation): GELU() | |
(ffn_2): Dense(3072 -> 768, linear) | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(4): BERTEncoderCell( | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(attention_cell): MultiHeadAttentionCell( | |
(_base_cell): DotProductAttentionCell( | |
(_dropout_layer): Dropout(p = 0.1, axes=()) | |
) | |
(proj_query): Dense(768 -> 768, linear) | |
(proj_key): Dense(768 -> 768, linear) | |
(proj_value): Dense(768 -> 768, linear) | |
) | |
(proj): Dense(768 -> 768, linear) | |
(ffn): BERTPositionwiseFFN( | |
(ffn_1): Dense(768 -> 3072, linear) | |
(activation): GELU() | |
(ffn_2): Dense(3072 -> 768, linear) | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(5): BERTEncoderCell( | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(attention_cell): MultiHeadAttentionCell( | |
(_base_cell): DotProductAttentionCell( | |
(_dropout_layer): Dropout(p = 0.1, axes=()) | |
) | |
(proj_query): Dense(768 -> 768, linear) | |
(proj_key): Dense(768 -> 768, linear) | |
(proj_value): Dense(768 -> 768, linear) | |
) | |
(proj): Dense(768 -> 768, linear) | |
(ffn): BERTPositionwiseFFN( | |
(ffn_1): Dense(768 -> 3072, linear) | |
(activation): GELU() | |
(ffn_2): Dense(3072 -> 768, linear) | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(6): BERTEncoderCell( | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(attention_cell): MultiHeadAttentionCell( | |
(_base_cell): DotProductAttentionCell( | |
(_dropout_layer): Dropout(p = 0.1, axes=()) | |
) | |
(proj_query): Dense(768 -> 768, linear) | |
(proj_key): Dense(768 -> 768, linear) | |
(proj_value): Dense(768 -> 768, linear) | |
) | |
(proj): Dense(768 -> 768, linear) | |
(ffn): BERTPositionwiseFFN( | |
(ffn_1): Dense(768 -> 3072, linear) | |
(activation): GELU() | |
(ffn_2): Dense(3072 -> 768, linear) | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(7): BERTEncoderCell( | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(attention_cell): MultiHeadAttentionCell( | |
(_base_cell): DotProductAttentionCell( | |
(_dropout_layer): Dropout(p = 0.1, axes=()) | |
) | |
(proj_query): Dense(768 -> 768, linear) | |
(proj_key): Dense(768 -> 768, linear) | |
(proj_value): Dense(768 -> 768, linear) | |
) | |
(proj): Dense(768 -> 768, linear) | |
(ffn): BERTPositionwiseFFN( | |
(ffn_1): Dense(768 -> 3072, linear) | |
(activation): GELU() | |
(ffn_2): Dense(3072 -> 768, linear) | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(8): BERTEncoderCell( | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(attention_cell): MultiHeadAttentionCell( | |
(_base_cell): DotProductAttentionCell( | |
(_dropout_layer): Dropout(p = 0.1, axes=()) | |
) | |
(proj_query): Dense(768 -> 768, linear) | |
(proj_key): Dense(768 -> 768, linear) | |
(proj_value): Dense(768 -> 768, linear) | |
) | |
(proj): Dense(768 -> 768, linear) | |
(ffn): BERTPositionwiseFFN( | |
(ffn_1): Dense(768 -> 3072, linear) | |
(activation): GELU() | |
(ffn_2): Dense(3072 -> 768, linear) | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(9): BERTEncoderCell( | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(attention_cell): MultiHeadAttentionCell( | |
(_base_cell): DotProductAttentionCell( | |
(_dropout_layer): Dropout(p = 0.1, axes=()) | |
) | |
(proj_query): Dense(768 -> 768, linear) | |
(proj_key): Dense(768 -> 768, linear) | |
(proj_value): Dense(768 -> 768, linear) | |
) | |
(proj): Dense(768 -> 768, linear) | |
(ffn): BERTPositionwiseFFN( | |
(ffn_1): Dense(768 -> 3072, linear) | |
(activation): GELU() | |
(ffn_2): Dense(3072 -> 768, linear) | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(10): BERTEncoderCell( | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(attention_cell): MultiHeadAttentionCell( | |
(_base_cell): DotProductAttentionCell( | |
(_dropout_layer): Dropout(p = 0.1, axes=()) | |
) | |
(proj_query): Dense(768 -> 768, linear) | |
(proj_key): Dense(768 -> 768, linear) | |
(proj_value): Dense(768 -> 768, linear) | |
) | |
(proj): Dense(768 -> 768, linear) | |
(ffn): BERTPositionwiseFFN( | |
(ffn_1): Dense(768 -> 3072, linear) | |
(activation): GELU() | |
(ffn_2): Dense(3072 -> 768, linear) | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(11): BERTEncoderCell( | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(attention_cell): MultiHeadAttentionCell( | |
(_base_cell): DotProductAttentionCell( | |
(_dropout_layer): Dropout(p = 0.1, axes=()) | |
) | |
(proj_query): Dense(768 -> 768, linear) | |
(proj_key): Dense(768 -> 768, linear) | |
(proj_value): Dense(768 -> 768, linear) | |
) | |
(proj): Dense(768 -> 768, linear) | |
(ffn): BERTPositionwiseFFN( | |
(ffn_1): Dense(768 -> 3072, linear) | |
(activation): GELU() | |
(ffn_2): Dense(3072 -> 768, linear) | |
(dropout_layer): Dropout(p = 0.1, axes=()) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
(layer_norm): BERTLayerNorm(eps=1e-12, axis=-1, center=True, scale=True, in_channels=768) | |
) | |
) | |
) | |
(word_embed): HybridSequential( | |
(0): Embedding(30522 -> 768, float32) | |
(1): Dropout(p = 0.1, axes=()) | |
) | |
(token_type_embed): HybridSequential( | |
(0): Embedding(2 -> 768, float32) | |
(1): Dropout(p = 0.1, axes=()) | |
) | |
(pooler): Dense(768 -> 768, Activation(tanh)) | |
) | |
(classifier): HybridSequential( | |
(0): Dropout(p = 0.1, axes=()) | |
(1): Dense(None -> 3, linear) | |
) | |
) | |
INFO:root:processing dataset... | |
INFO:root:[Epoch 1 Batch 100/12277] loss=1.1486, lr=0.0000014, metrics=accuracy:0.3550 | |
INFO:root:[Epoch 1 Batch 200/12277] loss=1.0610, lr=0.0000027, metrics=accuracy:0.3909 | |
INFO:root:[Epoch 1 Batch 300/12277] loss=0.9557, lr=0.0000041, metrics=accuracy:0.4423 | |
INFO:root:[Epoch 1 Batch 400/12277] loss=0.8391, lr=0.0000054, metrics=accuracy:0.4914 | |
INFO:root:[Epoch 1 Batch 500/12277] loss=0.7997, lr=0.0000068, metrics=accuracy:0.5241 | |
INFO:root:[Epoch 1 Batch 600/12277] loss=0.7553, lr=0.0000081, metrics=accuracy:0.5497 | |
INFO:root:[Epoch 1 Batch 700/12277] loss=0.7370, lr=0.0000095, metrics=accuracy:0.5687 | |
INFO:root:[Epoch 1 Batch 800/12277] loss=0.7035, lr=0.0000109, metrics=accuracy:0.5856 | |
INFO:root:[Epoch 1 Batch 900/12277] loss=0.6920, lr=0.0000122, metrics=accuracy:0.5992 | |
INFO:root:[Epoch 1 Batch 1000/12277] loss=0.6913, lr=0.0000136, metrics=accuracy:0.6103 | |
INFO:root:[Epoch 1 Batch 1100/12277] loss=0.6783, lr=0.0000149, metrics=accuracy:0.6197 | |
INFO:root:[Epoch 1 Batch 1200/12277] loss=0.6398, lr=0.0000163, metrics=accuracy:0.6297 | |
INFO:root:[Epoch 1 Batch 1300/12277] loss=0.6366, lr=0.0000177, metrics=accuracy:0.6385 | |
INFO:root:[Epoch 1 Batch 1400/12277] loss=0.6478, lr=0.0000190, metrics=accuracy:0.6453 | |
INFO:root:[Epoch 1 Batch 1500/12277] loss=0.6242, lr=0.0000204, metrics=accuracy:0.6520 | |
INFO:root:[Epoch 1 Batch 1600/12277] loss=0.6165, lr=0.0000217, metrics=accuracy:0.6583 | |
INFO:root:[Epoch 1 Batch 1700/12277] loss=0.6019, lr=0.0000231, metrics=accuracy:0.6639 | |
INFO:root:[Epoch 1 Batch 1800/12277] loss=0.6262, lr=0.0000244, metrics=accuracy:0.6687 | |
INFO:root:[Epoch 1 Batch 1900/12277] loss=0.6176, lr=0.0000258, metrics=accuracy:0.6730 | |
INFO:root:[Epoch 1 Batch 2000/12277] loss=0.5914, lr=0.0000272, metrics=accuracy:0.6771 | |
INFO:root:[Epoch 1 Batch 2100/12277] loss=0.5881, lr=0.0000285, metrics=accuracy:0.6814 | |
INFO:root:[Epoch 1 Batch 2200/12277] loss=0.6084, lr=0.0000299, metrics=accuracy:0.6846 | |
INFO:root:[Epoch 1 Batch 2300/12277] loss=0.6204, lr=0.0000312, metrics=accuracy:0.6877 | |
INFO:root:[Epoch 1 Batch 2400/12277] loss=0.5828, lr=0.0000326, metrics=accuracy:0.6909 | |
INFO:root:[Epoch 1 Batch 2500/12277] loss=0.6147, lr=0.0000340, metrics=accuracy:0.6930 | |
INFO:root:[Epoch 1 Batch 2600/12277] loss=0.5919, lr=0.0000353, metrics=accuracy:0.6955 | |
INFO:root:[Epoch 1 Batch 2700/12277] loss=0.6018, lr=0.0000367, metrics=accuracy:0.6980 | |
INFO:root:[Epoch 1 Batch 2800/12277] loss=0.6102, lr=0.0000380, metrics=accuracy:0.6999 | |
INFO:root:[Epoch 1 Batch 2900/12277] loss=0.5564, lr=0.0000394, metrics=accuracy:0.7026 | |
INFO:root:[Epoch 1 Batch 3000/12277] loss=0.6061, lr=0.0000407, metrics=accuracy:0.7044 | |
INFO:root:[Epoch 1 Batch 3100/12277] loss=0.5720, lr=0.0000421, metrics=accuracy:0.7067 | |
INFO:root:[Epoch 1 Batch 3200/12277] loss=0.5922, lr=0.0000435, metrics=accuracy:0.7081 | |
INFO:root:[Epoch 1 Batch 3300/12277] loss=0.5786, lr=0.0000448, metrics=accuracy:0.7100 | |
INFO:root:[Epoch 1 Batch 3400/12277] loss=0.5656, lr=0.0000462, metrics=accuracy:0.7119 | |
INFO:root:[Epoch 1 Batch 3500/12277] loss=0.5765, lr=0.0000475, metrics=accuracy:0.7134 | |
INFO:root:[Epoch 1 Batch 3600/12277] loss=0.5689, lr=0.0000489, metrics=accuracy:0.7150 | |
INFO:root:[Epoch 1 Batch 3700/12277] loss=0.5286, lr=0.0000500, metrics=accuracy:0.7170 | |
INFO:root:[Epoch 1 Batch 3800/12277] loss=0.5646, lr=0.0000498, metrics=accuracy:0.7186 | |
INFO:root:[Epoch 1 Batch 3900/12277] loss=0.5563, lr=0.0000497, metrics=accuracy:0.7201 | |
INFO:root:[Epoch 1 Batch 4000/12277] loss=0.5847, lr=0.0000495, metrics=accuracy:0.7212 | |
INFO:root:[Epoch 1 Batch 4100/12277] loss=0.5873, lr=0.0000494, metrics=accuracy:0.7225 | |
INFO:root:[Epoch 1 Batch 4200/12277] loss=0.5530, lr=0.0000492, metrics=accuracy:0.7238 | |
INFO:root:[Epoch 1 Batch 4300/12277] loss=0.5895, lr=0.0000491, metrics=accuracy:0.7246 | |
INFO:root:[Epoch 1 Batch 4400/12277] loss=0.5605, lr=0.0000489, metrics=accuracy:0.7258 | |
INFO:root:[Epoch 1 Batch 4500/12277] loss=0.5573, lr=0.0000488, metrics=accuracy:0.7270 | |
INFO:root:[Epoch 1 Batch 4600/12277] loss=0.5499, lr=0.0000486, metrics=accuracy:0.7283 | |
INFO:root:[Epoch 1 Batch 4700/12277] loss=0.5515, lr=0.0000485, metrics=accuracy:0.7294 | |
INFO:root:[Epoch 1 Batch 4800/12277] loss=0.5390, lr=0.0000483, metrics=accuracy:0.7308 | |
INFO:root:[Epoch 1 Batch 4900/12277] loss=0.5654, lr=0.0000482, metrics=accuracy:0.7316 | |
INFO:root:[Epoch 1 Batch 5000/12277] loss=0.6041, lr=0.0000480, metrics=accuracy:0.7322 | |
INFO:root:[Epoch 1 Batch 5100/12277] loss=0.5512, lr=0.0000479, metrics=accuracy:0.7333 | |
INFO:root:[Epoch 1 Batch 5200/12277] loss=0.5534, lr=0.0000477, metrics=accuracy:0.7341 | |
INFO:root:[Epoch 1 Batch 5300/12277] loss=0.5583, lr=0.0000476, metrics=accuracy:0.7348 | |
INFO:root:[Epoch 1 Batch 5400/12277] loss=0.5186, lr=0.0000474, metrics=accuracy:0.7359 | |
INFO:root:[Epoch 1 Batch 5500/12277] loss=0.5604, lr=0.0000473, metrics=accuracy:0.7365 | |
INFO:root:[Epoch 1 Batch 5600/12277] loss=0.5303, lr=0.0000471, metrics=accuracy:0.7373 | |
INFO:root:[Epoch 1 Batch 5700/12277] loss=0.5473, lr=0.0000470, metrics=accuracy:0.7381 | |
INFO:root:[Epoch 1 Batch 5800/12277] loss=0.5175, lr=0.0000468, metrics=accuracy:0.7391 | |
INFO:root:[Epoch 1 Batch 5900/12277] loss=0.5376, lr=0.0000467, metrics=accuracy:0.7398 | |
INFO:root:[Epoch 1 Batch 6000/12277] loss=0.5551, lr=0.0000465, metrics=accuracy:0.7404 | |
INFO:root:[Epoch 1 Batch 6100/12277] loss=0.5492, lr=0.0000463, metrics=accuracy:0.7410 | |
INFO:root:[Epoch 1 Batch 6200/12277] loss=0.5311, lr=0.0000462, metrics=accuracy:0.7417 | |
INFO:root:[Epoch 1 Batch 6300/12277] loss=0.5236, lr=0.0000460, metrics=accuracy:0.7425 | |
INFO:root:[Epoch 1 Batch 6400/12277] loss=0.5166, lr=0.0000459, metrics=accuracy:0.7433 | |
INFO:root:[Epoch 1 Batch 6500/12277] loss=0.5445, lr=0.0000457, metrics=accuracy:0.7440 | |
INFO:root:[Epoch 1 Batch 6600/12277] loss=0.5348, lr=0.0000456, metrics=accuracy:0.7447 | |
INFO:root:[Epoch 1 Batch 6700/12277] loss=0.5247, lr=0.0000454, metrics=accuracy:0.7454 | |
INFO:root:[Epoch 1 Batch 6800/12277] loss=0.5279, lr=0.0000453, metrics=accuracy:0.7462 | |
INFO:root:[Epoch 1 Batch 6900/12277] loss=0.5058, lr=0.0000451, metrics=accuracy:0.7470 | |
INFO:root:[Epoch 1 Batch 7000/12277] loss=0.5217, lr=0.0000450, metrics=accuracy:0.7476 | |
INFO:root:[Epoch 1 Batch 7100/12277] loss=0.5214, lr=0.0000448, metrics=accuracy:0.7482 | |
INFO:root:[Epoch 1 Batch 7200/12277] loss=0.5147, lr=0.0000447, metrics=accuracy:0.7489 | |
INFO:root:[Epoch 1 Batch 7300/12277] loss=0.5265, lr=0.0000445, metrics=accuracy:0.7496 | |
INFO:root:[Epoch 1 Batch 7400/12277] loss=0.5113, lr=0.0000444, metrics=accuracy:0.7502 | |
INFO:root:[Epoch 1 Batch 7500/12277] loss=0.5129, lr=0.0000442, metrics=accuracy:0.7509 | |
INFO:root:[Epoch 1 Batch 7600/12277] loss=0.5286, lr=0.0000441, metrics=accuracy:0.7514 | |
INFO:root:[Epoch 1 Batch 7700/12277] loss=0.5334, lr=0.0000439, metrics=accuracy:0.7519 | |
INFO:root:[Epoch 1 Batch 7800/12277] loss=0.4988, lr=0.0000438, metrics=accuracy:0.7526 | |
INFO:root:[Epoch 1 Batch 7900/12277] loss=0.5064, lr=0.0000436, metrics=accuracy:0.7532 | |
INFO:root:[Epoch 1 Batch 8000/12277] loss=0.4915, lr=0.0000435, metrics=accuracy:0.7538 | |
INFO:root:[Epoch 1 Batch 8100/12277] loss=0.5063, lr=0.0000433, metrics=accuracy:0.7545 | |
INFO:root:[Epoch 1 Batch 8200/12277] loss=0.4974, lr=0.0000432, metrics=accuracy:0.7551 | |
INFO:root:[Epoch 1 Batch 8300/12277] loss=0.5069, lr=0.0000430, metrics=accuracy:0.7557 | |
INFO:root:[Epoch 1 Batch 8400/12277] loss=0.5158, lr=0.0000429, metrics=accuracy:0.7561 | |
INFO:root:[Epoch 1 Batch 8500/12277] loss=0.4869, lr=0.0000427, metrics=accuracy:0.7567 | |
INFO:root:[Epoch 1 Batch 8600/12277] loss=0.5133, lr=0.0000426, metrics=accuracy:0.7572 | |
INFO:root:[Epoch 1 Batch 8700/12277] loss=0.5269, lr=0.0000424, metrics=accuracy:0.7576 | |
INFO:root:[Epoch 1 Batch 8800/12277] loss=0.5105, lr=0.0000423, metrics=accuracy:0.7580 | |
INFO:root:[Epoch 1 Batch 8900/12277] loss=0.5210, lr=0.0000421, metrics=accuracy:0.7585 | |
INFO:root:[Epoch 1 Batch 9000/12277] loss=0.4871, lr=0.0000420, metrics=accuracy:0.7590 | |
INFO:root:[Epoch 1 Batch 9100/12277] loss=0.5097, lr=0.0000418, metrics=accuracy:0.7594 | |
INFO:root:[Epoch 1 Batch 9200/12277] loss=0.5006, lr=0.0000417, metrics=accuracy:0.7599 | |
INFO:root:[Epoch 1 Batch 9300/12277] loss=0.5186, lr=0.0000415, metrics=accuracy:0.7603 | |
INFO:root:[Epoch 1 Batch 9400/12277] loss=0.5054, lr=0.0000414, metrics=accuracy:0.7608 | |
INFO:root:[Epoch 1 Batch 9500/12277] loss=0.5145, lr=0.0000412, metrics=accuracy:0.7612 | |
INFO:root:[Epoch 1 Batch 9600/12277] loss=0.5008, lr=0.0000411, metrics=accuracy:0.7616 | |
INFO:root:[Epoch 1 Batch 9700/12277] loss=0.5175, lr=0.0000409, metrics=accuracy:0.7619 | |
INFO:root:[Epoch 1 Batch 9800/12277] loss=0.5049, lr=0.0000408, metrics=accuracy:0.7624 | |
INFO:root:[Epoch 1 Batch 9900/12277] loss=0.4772, lr=0.0000406, metrics=accuracy:0.7629 | |
INFO:root:[Epoch 1 Batch 10000/12277] loss=0.4961, lr=0.0000405, metrics=accuracy:0.7633 | |
INFO:root:[Epoch 1 Batch 10100/12277] loss=0.4888, lr=0.0000403, metrics=accuracy:0.7637 | |
INFO:root:[Epoch 1 Batch 10200/12277] loss=0.5006, lr=0.0000402, metrics=accuracy:0.7641 | |
INFO:root:[Epoch 1 Batch 10300/12277] loss=0.4961, lr=0.0000400, metrics=accuracy:0.7645 | |
INFO:root:[Epoch 1 Batch 10400/12277] loss=0.4925, lr=0.0000399, metrics=accuracy:0.7649 | |
INFO:root:[Epoch 1 Batch 10500/12277] loss=0.4924, lr=0.0000397, metrics=accuracy:0.7653 | |
INFO:root:[Epoch 1 Batch 10600/12277] loss=0.4956, lr=0.0000396, metrics=accuracy:0.7657 | |
INFO:root:[Epoch 1 Batch 10700/12277] loss=0.4858, lr=0.0000394, metrics=accuracy:0.7660 | |
INFO:root:[Epoch 1 Batch 10800/12277] loss=0.5099, lr=0.0000393, metrics=accuracy:0.7663 | |
INFO:root:[Epoch 1 Batch 10900/12277] loss=0.5028, lr=0.0000391, metrics=accuracy:0.7666 | |
INFO:root:[Epoch 1 Batch 11000/12277] loss=0.4804, lr=0.0000390, metrics=accuracy:0.7670 | |
INFO:root:[Epoch 1 Batch 11100/12277] loss=0.4793, lr=0.0000388, metrics=accuracy:0.7675 | |
INFO:root:[Epoch 1 Batch 11200/12277] loss=0.4859, lr=0.0000387, metrics=accuracy:0.7679 | |
INFO:root:[Epoch 1 Batch 11300/12277] loss=0.4899, lr=0.0000385, metrics=accuracy:0.7683 | |
INFO:root:[Epoch 1 Batch 11400/12277] loss=0.4935, lr=0.0000384, metrics=accuracy:0.7686 | |
INFO:root:[Epoch 1 Batch 11500/12277] loss=0.4863, lr=0.0000382, metrics=accuracy:0.7690 | |
INFO:root:[Epoch 1 Batch 11600/12277] loss=0.4926, lr=0.0000381, metrics=accuracy:0.7693 | |
INFO:root:[Epoch 1 Batch 11700/12277] loss=0.4809, lr=0.0000379, metrics=accuracy:0.7696 | |
INFO:root:[Epoch 1 Batch 11800/12277] loss=0.4947, lr=0.0000377, metrics=accuracy:0.7699 | |
INFO:root:[Epoch 1 Batch 11900/12277] loss=0.4842, lr=0.0000376, metrics=accuracy:0.7702 | |
INFO:root:[Epoch 1 Batch 12000/12277] loss=0.4901, lr=0.0000374, metrics=accuracy:0.7705 | |
INFO:root:[Epoch 1 Batch 12100/12277] loss=0.4987, lr=0.0000373, metrics=accuracy:0.7709 | |
INFO:root:[Epoch 1 Batch 12200/12277] loss=0.4914, lr=0.0000371, metrics=accuracy:0.7711 | |
INFO:root:On MNLI Matched: | |
INFO:root:validation metrics:accuracy:0.8175 | |
INFO:root:On MNLI Mismatched: | |
INFO:root:validation metrics:accuracy:0.8247 | |
INFO:root:params saved in : ./output_dir/model_bert_MNLI_0.params | |
INFO:root:Time cost=2930.7s | |
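The lr column above rises linearly until roughly batch 3700 and then decays linearly, which fits warmup_ratio=0.1 in the Namespace line. Below is a minimal sketch of such a schedule. It is not taken from the training script: the function name bert_lr, the MNLI training-set size of 392,702 examples, and the way the total step count is derived are all assumptions made to illustrate the shape of the curve.

```python
def bert_lr(step, base_lr=5e-5, num_train_examples=392702,
            batch_size=32, epochs=3, warmup_ratio=0.1):
    """Linear warmup followed by linear decay (illustrative sketch).

    num_train_examples is an assumed value, not something the log reports.
    The remaining defaults come from the Namespace line of the log.
    """
    num_train_steps = int(num_train_examples / batch_size * epochs)
    num_warmup_steps = int(num_train_steps * warmup_ratio)
    if step < num_warmup_steps:
        # Warmup: scale the base rate by the fraction of warmup completed.
        return base_lr * step / num_warmup_steps
    # Decay: drop linearly from base_lr toward 0 at the last training step.
    offset = (step - num_warmup_steps) * base_lr / (num_train_steps - num_warmup_steps)
    return base_lr - offset


BATCHES_PER_EPOCH = 12277  # from the "Batch n/12277" field in the log

for epoch, batch in [(1, 100), (1, 3700), (1, 12200), (2, 100)]:
    # The step counter is assumed to keep running across epochs.
    step = (epoch - 1) * BATCHES_PER_EPOCH + batch
    print("Epoch %d Batch %d: lr=%.7f" % (epoch, batch, bert_lr(step)))
```

Under these assumptions the printed values can be compared directly against the lr fields logged at the same batches.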
INFO:root:[Epoch 2 Batch 100/12277] loss=0.3645, lr=0.0000369, metrics=accuracy:0.8569 | |
INFO:root:[Epoch 2 Batch 200/12277] loss=0.3640, lr=0.0000367, metrics=accuracy:0.8609 | |
INFO:root:[Epoch 2 Batch 300/12277] loss=0.3761, lr=0.0000366, metrics=accuracy:0.8609 | |
INFO:root:[Epoch 2 Batch 400/12277] loss=0.3780, lr=0.0000364, metrics=accuracy:0.8606 | |
INFO:root:[Epoch 2 Batch 500/12277] loss=0.3951, lr=0.0000363, metrics=accuracy:0.8580 | |
INFO:root:[Epoch 2 Batch 600/12277] loss=0.3729, lr=0.0000361, metrics=accuracy:0.8578 | |
INFO:root:[Epoch 2 Batch 700/12277] loss=0.3841, lr=0.0000360, metrics=accuracy:0.8572 | |
INFO:root:[Epoch 2 Batch 800/12277] loss=0.3800, lr=0.0000358, metrics=accuracy:0.8573 | |
INFO:root:[Epoch 2 Batch 900/12277] loss=0.4055, lr=0.0000357, metrics=accuracy:0.8557 | |
INFO:root:[Epoch 2 Batch 1000/12277] loss=0.3628, lr=0.0000355, metrics=accuracy:0.8563 | |
INFO:root:[Epoch 2 Batch 1100/12277] loss=0.3732, lr=0.0000354, metrics=accuracy:0.8561 | |
INFO:root:[Epoch 2 Batch 1200/12277] loss=0.3845, lr=0.0000352, metrics=accuracy:0.8555 | |
INFO:root:[Epoch 2 Batch 1300/12277] loss=0.3997, lr=0.0000351, metrics=accuracy:0.8552 | |
INFO:root:[Epoch 2 Batch 1400/12277] loss=0.3712, lr=0.0000349, metrics=accuracy:0.8557 | |
INFO:root:[Epoch 2 Batch 1500/12277] loss=0.3610, lr=0.0000348, metrics=accuracy:0.8563 | |
INFO:root:[Epoch 2 Batch 1600/12277] loss=0.3803, lr=0.0000346, metrics=accuracy:0.8564 | |
INFO:root:[Epoch 2 Batch 1700/12277] loss=0.3863, lr=0.0000345, metrics=accuracy:0.8560 | |
INFO:root:[Epoch 2 Batch 1800/12277] loss=0.3986, lr=0.0000343, metrics=accuracy:0.8556 | |
INFO:root:[Epoch 2 Batch 1900/12277] loss=0.3859, lr=0.0000342, metrics=accuracy:0.8557 | |
INFO:root:[Epoch 2 Batch 2000/12277] loss=0.3711, lr=0.0000340, metrics=accuracy:0.8557 | |
INFO:root:[Epoch 2 Batch 2100/12277] loss=0.3509, lr=0.0000339, metrics=accuracy:0.8564 | |
INFO:root:[Epoch 2 Batch 2200/12277] loss=0.3872, lr=0.0000337, metrics=accuracy:0.8561 | |
INFO:root:[Epoch 2 Batch 2300/12277] loss=0.3888, lr=0.0000336, metrics=accuracy:0.8560 | |
INFO:root:[Epoch 2 Batch 2400/12277] loss=0.3621, lr=0.0000334, metrics=accuracy:0.8562 | |
INFO:root:[Epoch 2 Batch 2500/12277] loss=0.4020, lr=0.0000333, metrics=accuracy:0.8559 | |
INFO:root:[Epoch 2 Batch 2600/12277] loss=0.3661, lr=0.0000331, metrics=accuracy:0.8560 | |
INFO:root:[Epoch 2 Batch 2700/12277] loss=0.3716, lr=0.0000330, metrics=accuracy:0.8563 | |
INFO:root:[Epoch 2 Batch 2800/12277] loss=0.3582, lr=0.0000328, metrics=accuracy:0.8564 | |
INFO:root:[Epoch 2 Batch 2900/12277] loss=0.3753, lr=0.0000327, metrics=accuracy:0.8566 | |
INFO:root:[Epoch 2 Batch 3000/12277] loss=0.3692, lr=0.0000325, metrics=accuracy:0.8569 | |
INFO:root:[Epoch 2 Batch 3100/12277] loss=0.3805, lr=0.0000324, metrics=accuracy:0.8568 | |
INFO:root:[Epoch 2 Batch 3200/12277] loss=0.4049, lr=0.0000322, metrics=accuracy:0.8565 | |
INFO:root:[Epoch 2 Batch 3300/12277] loss=0.3686, lr=0.0000320, metrics=accuracy:0.8567 | |
INFO:root:[Epoch 2 Batch 3400/12277] loss=0.3758, lr=0.0000319, metrics=accuracy:0.8567 | |
INFO:root:[Epoch 2 Batch 3500/12277] loss=0.3626, lr=0.0000317, metrics=accuracy:0.8569 | |
INFO:root:[Epoch 2 Batch 3600/12277] loss=0.3679, lr=0.0000316, metrics=accuracy:0.8570 | |
INFO:root:[Epoch 2 Batch 3700/12277] loss=0.3606, lr=0.0000314, metrics=accuracy:0.8571 | |
INFO:root:[Epoch 2 Batch 3800/12277] loss=0.3941, lr=0.0000313, metrics=accuracy:0.8569 | |
INFO:root:[Epoch 2 Batch 3900/12277] loss=0.3638, lr=0.0000311, metrics=accuracy:0.8570 | |
INFO:root:[Epoch 2 Batch 4000/12277] loss=0.3755, lr=0.0000310, metrics=accuracy:0.8570 | |
INFO:root:[Epoch 2 Batch 4100/12277] loss=0.3819, lr=0.0000308, metrics=accuracy:0.8570 | |
INFO:root:[Epoch 2 Batch 4200/12277] loss=0.3784, lr=0.0000307, metrics=accuracy:0.8569 | |
INFO:root:[Epoch 2 Batch 4300/12277] loss=0.3730, lr=0.0000305, metrics=accuracy:0.8570 | |
INFO:root:[Epoch 2 Batch 4400/12277] loss=0.3714, lr=0.0000304, metrics=accuracy:0.8571 | |
INFO:root:[Epoch 2 Batch 4500/12277] loss=0.3925, lr=0.0000302, metrics=accuracy:0.8570 | |
INFO:root:[Epoch 2 Batch 4600/12277] loss=0.3852, lr=0.0000301, metrics=accuracy:0.8570 | |
INFO:root:[Epoch 2 Batch 4700/12277] loss=0.3580, lr=0.0000299, metrics=accuracy:0.8573 | |
INFO:root:[Epoch 2 Batch 4800/12277] loss=0.3778, lr=0.0000298, metrics=accuracy:0.8572 | |
INFO:root:[Epoch 2 Batch 4900/12277] loss=0.3706, lr=0.0000296, metrics=accuracy:0.8572 | |
INFO:root:[Epoch 2 Batch 5000/12277] loss=0.3776, lr=0.0000295, metrics=accuracy:0.8573 | |
INFO:root:[Epoch 2 Batch 5100/12277] loss=0.3687, lr=0.0000293, metrics=accuracy:0.8573 | |
INFO:root:[Epoch 2 Batch 5200/12277] loss=0.3880, lr=0.0000292, metrics=accuracy:0.8573 | |
INFO:root:[Epoch 2 Batch 5300/12277] loss=0.3875, lr=0.0000290, metrics=accuracy:0.8571 | |
INFO:root:[Epoch 2 Batch 5400/12277] loss=0.3697, lr=0.0000289, metrics=accuracy:0.8572 | |
INFO:root:[Epoch 2 Batch 5500/12277] loss=0.3629, lr=0.0000287, metrics=accuracy:0.8572 | |
INFO:root:[Epoch 2 Batch 5600/12277] loss=0.3942, lr=0.0000286, metrics=accuracy:0.8572 | |
INFO:root:[Epoch 2 Batch 5700/12277] loss=0.3857, lr=0.0000284, metrics=accuracy:0.8571 | |
INFO:root:[Epoch 2 Batch 5800/12277] loss=0.3791, lr=0.0000283, metrics=accuracy:0.8571 | |
INFO:root:[Epoch 2 Batch 5900/12277] loss=0.3851, lr=0.0000281, metrics=accuracy:0.8571 | |
INFO:root:[Epoch 2 Batch 6000/12277] loss=0.3544, lr=0.0000280, metrics=accuracy:0.8571 | |
INFO:root:[Epoch 2 Batch 6100/12277] loss=0.3752, lr=0.0000278, metrics=accuracy:0.8571 | |
INFO:root:[Epoch 2 Batch 6200/12277] loss=0.3743, lr=0.0000277, metrics=accuracy:0.8571 | |
INFO:root:[Epoch 2 Batch 6300/12277] loss=0.3695, lr=0.0000275, metrics=accuracy:0.8572 | |
INFO:root:[Epoch 2 Batch 6400/12277] loss=0.3647, lr=0.0000274, metrics=accuracy:0.8573 | |
INFO:root:[Epoch 2 Batch 6500/12277] loss=0.3683, lr=0.0000272, metrics=accuracy:0.8574 | |
INFO:root:[Epoch 2 Batch 6600/12277] loss=0.3642, lr=0.0000271, metrics=accuracy:0.8574 | |
INFO:root:[Epoch 2 Batch 6700/12277] loss=0.3791, lr=0.0000269, metrics=accuracy:0.8574 | |
INFO:root:[Epoch 2 Batch 6800/12277] loss=0.3630, lr=0.0000268, metrics=accuracy:0.8573 | |
INFO:root:[Epoch 2 Batch 6900/12277] loss=0.3672, lr=0.0000266, metrics=accuracy:0.8574 | |
INFO:root:[Epoch 2 Batch 7000/12277] loss=0.3749, lr=0.0000265, metrics=accuracy:0.8574 | |
INFO:root:[Epoch 2 Batch 7100/12277] loss=0.3806, lr=0.0000263, metrics=accuracy:0.8573 | |
INFO:root:[Epoch 2 Batch 7200/12277] loss=0.3699, lr=0.0000262, metrics=accuracy:0.8574 | |
INFO:root:[Epoch 2 Batch 7300/12277] loss=0.3898, lr=0.0000260, metrics=accuracy:0.8573 | |
INFO:root:[Epoch 2 Batch 7400/12277] loss=0.3465, lr=0.0000259, metrics=accuracy:0.8574 | |
INFO:root:[Epoch 2 Batch 7500/12277] loss=0.3575, lr=0.0000257, metrics=accuracy:0.8576 | |
INFO:root:[Epoch 2 Batch 7600/12277] loss=0.3430, lr=0.0000256, metrics=accuracy:0.8577 | |
INFO:root:[Epoch 2 Batch 7700/12277] loss=0.3574, lr=0.0000254, metrics=accuracy:0.8577 | |
INFO:root:[Epoch 2 Batch 7800/12277] loss=0.3755, lr=0.0000253, metrics=accuracy:0.8577 | |
INFO:root:[Epoch 2 Batch 7900/12277] loss=0.3786, lr=0.0000251, metrics=accuracy:0.8576 | |
INFO:root:[Epoch 2 Batch 8000/12277] loss=0.3562, lr=0.0000250, metrics=accuracy:0.8577 | |
INFO:root:[Epoch 2 Batch 8100/12277] loss=0.3847, lr=0.0000248, metrics=accuracy:0.8577 | |
INFO:root:[Epoch 2 Batch 8200/12277] loss=0.3748, lr=0.0000247, metrics=accuracy:0.8576
INFO:root:[Epoch 2 Batch 8300/12277] loss=0.3725, lr=0.0000245, metrics=accuracy:0.8577
INFO:root:[Epoch 2 Batch 8400/12277] loss=0.3467, lr=0.0000244, metrics=accuracy:0.8577
INFO:root:[Epoch 2 Batch 8500/12277] loss=0.3671, lr=0.0000242, metrics=accuracy:0.8578
INFO:root:[Epoch 2 Batch 8600/12277] loss=0.3636, lr=0.0000241, metrics=accuracy:0.8578
INFO:root:[Epoch 2 Batch 8700/12277] loss=0.3905, lr=0.0000239, metrics=accuracy:0.8577
INFO:root:[Epoch 2 Batch 8800/12277] loss=0.3660, lr=0.0000237, metrics=accuracy:0.8578
INFO:root:[Epoch 2 Batch 8900/12277] loss=0.3508, lr=0.0000236, metrics=accuracy:0.8580
INFO:root:[Epoch 2 Batch 9000/12277] loss=0.3482, lr=0.0000234, metrics=accuracy:0.8581
INFO:root:[Epoch 2 Batch 9100/12277] loss=0.3469, lr=0.0000233, metrics=accuracy:0.8582
INFO:root:[Epoch 2 Batch 9200/12277] loss=0.3583, lr=0.0000231, metrics=accuracy:0.8583
INFO:root:[Epoch 2 Batch 9300/12277] loss=0.3864, lr=0.0000230, metrics=accuracy:0.8583
INFO:root:[Epoch 2 Batch 9400/12277] loss=0.3533, lr=0.0000228, metrics=accuracy:0.8584
INFO:root:[Epoch 2 Batch 9500/12277] loss=0.3712, lr=0.0000227, metrics=accuracy:0.8584
INFO:root:[Epoch 2 Batch 9600/12277] loss=0.3501, lr=0.0000225, metrics=accuracy:0.8585
INFO:root:[Epoch 2 Batch 9700/12277] loss=0.3579, lr=0.0000224, metrics=accuracy:0.8586
INFO:root:[Epoch 2 Batch 9800/12277] loss=0.3552, lr=0.0000222, metrics=accuracy:0.8586
INFO:root:[Epoch 2 Batch 9900/12277] loss=0.3855, lr=0.0000221, metrics=accuracy:0.8586
INFO:root:[Epoch 2 Batch 10000/12277] loss=0.3719, lr=0.0000219, metrics=accuracy:0.8585
INFO:root:[Epoch 2 Batch 10100/12277] loss=0.3641, lr=0.0000218, metrics=accuracy:0.8586
INFO:root:[Epoch 2 Batch 10200/12277] loss=0.3679, lr=0.0000216, metrics=accuracy:0.8586
INFO:root:[Epoch 2 Batch 10300/12277] loss=0.3640, lr=0.0000215, metrics=accuracy:0.8586
INFO:root:[Epoch 2 Batch 10400/12277] loss=0.3541, lr=0.0000213, metrics=accuracy:0.8587
INFO:root:[Epoch 2 Batch 10500/12277] loss=0.3520, lr=0.0000212, metrics=accuracy:0.8588
INFO:root:[Epoch 2 Batch 10600/12277] loss=0.3591, lr=0.0000210, metrics=accuracy:0.8588
INFO:root:[Epoch 2 Batch 10700/12277] loss=0.3589, lr=0.0000209, metrics=accuracy:0.8588
INFO:root:[Epoch 2 Batch 10800/12277] loss=0.3673, lr=0.0000207, metrics=accuracy:0.8589
INFO:root:[Epoch 2 Batch 10900/12277] loss=0.3600, lr=0.0000206, metrics=accuracy:0.8589
INFO:root:[Epoch 2 Batch 11000/12277] loss=0.3375, lr=0.0000204, metrics=accuracy:0.8590
INFO:root:[Epoch 2 Batch 11100/12277] loss=0.3672, lr=0.0000203, metrics=accuracy:0.8590
INFO:root:[Epoch 2 Batch 11200/12277] loss=0.3659, lr=0.0000201, metrics=accuracy:0.8590
INFO:root:[Epoch 2 Batch 11300/12277] loss=0.3571, lr=0.0000200, metrics=accuracy:0.8591
INFO:root:[Epoch 2 Batch 11400/12277] loss=0.3400, lr=0.0000198, metrics=accuracy:0.8592
INFO:root:[Epoch 2 Batch 11500/12277] loss=0.3765, lr=0.0000197, metrics=accuracy:0.8591
INFO:root:[Epoch 2 Batch 11600/12277] loss=0.3333, lr=0.0000195, metrics=accuracy:0.8593
INFO:root:[Epoch 2 Batch 11700/12277] loss=0.3653, lr=0.0000194, metrics=accuracy:0.8593
INFO:root:[Epoch 2 Batch 11800/12277] loss=0.3681, lr=0.0000192, metrics=accuracy:0.8593
INFO:root:[Epoch 2 Batch 11900/12277] loss=0.3489, lr=0.0000191, metrics=accuracy:0.8595
INFO:root:[Epoch 2 Batch 12000/12277] loss=0.3388, lr=0.0000189, metrics=accuracy:0.8596
INFO:root:[Epoch 2 Batch 12100/12277] loss=0.3524, lr=0.0000188, metrics=accuracy:0.8596
INFO:root:[Epoch 2 Batch 12200/12277] loss=0.3570, lr=0.0000186, metrics=accuracy:0.8597
INFO:root:On MNLI Matched:
INFO:root:validation metrics:accuracy:0.8395
INFO:root:On MNLI Mismatched:
INFO:root:validation metrics:accuracy:0.8382
INFO:root:params saved in : ./output_dir/model_bert_MNLI_1.params
INFO:root:Time cost=2935.2s
INFO:root:[Epoch 3 Batch 100/12277] loss=0.2328, lr=0.0000184, metrics=accuracy:0.9181
INFO:root:[Epoch 3 Batch 200/12277] loss=0.2344, lr=0.0000182, metrics=accuracy:0.9175
INFO:root:[Epoch 3 Batch 300/12277] loss=0.2430, lr=0.0000180, metrics=accuracy:0.9172
INFO:root:[Epoch 3 Batch 400/12277] loss=0.2280, lr=0.0000179, metrics=accuracy:0.9177
INFO:root:[Epoch 3 Batch 500/12277] loss=0.2250, lr=0.0000177, metrics=accuracy:0.9183
INFO:root:[Epoch 3 Batch 600/12277] loss=0.2307, lr=0.0000176, metrics=accuracy:0.9185
INFO:root:[Epoch 3 Batch 700/12277] loss=0.2347, lr=0.0000174, metrics=accuracy:0.9181
INFO:root:[Epoch 3 Batch 800/12277] loss=0.2110, lr=0.0000173, metrics=accuracy:0.9182
INFO:root:[Epoch 3 Batch 900/12277] loss=0.2155, lr=0.0000171, metrics=accuracy:0.9189
INFO:root:[Epoch 3 Batch 1000/12277] loss=0.2089, lr=0.0000170, metrics=accuracy:0.9197
INFO:root:[Epoch 3 Batch 1100/12277] loss=0.2557, lr=0.0000168, metrics=accuracy:0.9191
INFO:root:[Epoch 3 Batch 1200/12277] loss=0.2261, lr=0.0000167, metrics=accuracy:0.9191
INFO:root:[Epoch 3 Batch 1300/12277] loss=0.2287, lr=0.0000165, metrics=accuracy:0.9191
INFO:root:[Epoch 3 Batch 1400/12277] loss=0.2201, lr=0.0000164, metrics=accuracy:0.9196
INFO:root:[Epoch 3 Batch 1500/12277] loss=0.2453, lr=0.0000162, metrics=accuracy:0.9190
INFO:root:[Epoch 3 Batch 1600/12277] loss=0.2214, lr=0.0000161, metrics=accuracy:0.9192
INFO:root:[Epoch 3 Batch 1700/12277] loss=0.2198, lr=0.0000159, metrics=accuracy:0.9191
INFO:root:[Epoch 3 Batch 1800/12277] loss=0.2280, lr=0.0000158, metrics=accuracy:0.9189
INFO:root:[Epoch 3 Batch 1900/12277] loss=0.2238, lr=0.0000156, metrics=accuracy:0.9190
INFO:root:[Epoch 3 Batch 2000/12277] loss=0.2208, lr=0.0000155, metrics=accuracy:0.9191
INFO:root:[Epoch 3 Batch 2100/12277] loss=0.2457, lr=0.0000153, metrics=accuracy:0.9187
INFO:root:[Epoch 3 Batch 2200/12277] loss=0.2342, lr=0.0000152, metrics=accuracy:0.9185
INFO:root:[Epoch 3 Batch 2300/12277] loss=0.2239, lr=0.0000150, metrics=accuracy:0.9188
INFO:root:[Epoch 3 Batch 2400/12277] loss=0.2384, lr=0.0000149, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 2500/12277] loss=0.2200, lr=0.0000147, metrics=accuracy:0.9188
INFO:root:[Epoch 3 Batch 2600/12277] loss=0.2378, lr=0.0000146, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 2700/12277] loss=0.2300, lr=0.0000144, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 2800/12277] loss=0.2305, lr=0.0000143, metrics=accuracy:0.9185
INFO:root:[Epoch 3 Batch 2900/12277] loss=0.2272, lr=0.0000141, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 3000/12277] loss=0.2408, lr=0.0000140, metrics=accuracy:0.9185
INFO:root:[Epoch 3 Batch 3100/12277] loss=0.2409, lr=0.0000138, metrics=accuracy:0.9185
INFO:root:[Epoch 3 Batch 3200/12277] loss=0.2339, lr=0.0000137, metrics=accuracy:0.9184
INFO:root:[Epoch 3 Batch 3300/12277] loss=0.2373, lr=0.0000135, metrics=accuracy:0.9183
INFO:root:[Epoch 3 Batch 3400/12277] loss=0.2138, lr=0.0000134, metrics=accuracy:0.9184
INFO:root:[Epoch 3 Batch 3500/12277] loss=0.2237, lr=0.0000132, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 3600/12277] loss=0.2299, lr=0.0000131, metrics=accuracy:0.9185
INFO:root:[Epoch 3 Batch 3700/12277] loss=0.2241, lr=0.0000129, metrics=accuracy:0.9185
INFO:root:[Epoch 3 Batch 3800/12277] loss=0.2216, lr=0.0000128, metrics=accuracy:0.9187
INFO:root:[Epoch 3 Batch 3900/12277] loss=0.2292, lr=0.0000126, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 4000/12277] loss=0.2251, lr=0.0000125, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 4100/12277] loss=0.2345, lr=0.0000123, metrics=accuracy:0.9184
INFO:root:[Epoch 3 Batch 4200/12277] loss=0.2460, lr=0.0000122, metrics=accuracy:0.9183
INFO:root:[Epoch 3 Batch 4300/12277] loss=0.2334, lr=0.0000120, metrics=accuracy:0.9183
INFO:root:[Epoch 3 Batch 4400/12277] loss=0.2179, lr=0.0000119, metrics=accuracy:0.9183
INFO:root:[Epoch 3 Batch 4500/12277] loss=0.2070, lr=0.0000117, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 4600/12277] loss=0.2398, lr=0.0000116, metrics=accuracy:0.9185
INFO:root:[Epoch 3 Batch 4700/12277] loss=0.2377, lr=0.0000114, metrics=accuracy:0.9184
INFO:root:[Epoch 3 Batch 4800/12277] loss=0.2102, lr=0.0000113, metrics=accuracy:0.9186
INFO:root:[Epoch 3 Batch 4900/12277] loss=0.2158, lr=0.0000111, metrics=accuracy:0.9187
INFO:root:[Epoch 3 Batch 5000/12277] loss=0.2282, lr=0.0000110, metrics=accuracy:0.9188
INFO:root:[Epoch 3 Batch 5100/12277] loss=0.2310, lr=0.0000108, metrics=accuracy:0.9188
INFO:root:[Epoch 3 Batch 5200/12277] loss=0.2311, lr=0.0000107, metrics=accuracy:0.9187
INFO:root:[Epoch 3 Batch 5300/12277] loss=0.2278, lr=0.0000105, metrics=accuracy:0.9188
INFO:root:[Epoch 3 Batch 5400/12277] loss=0.2296, lr=0.0000104, metrics=accuracy:0.9188
INFO:root:[Epoch 3 Batch 5500/12277] loss=0.2295, lr=0.0000102, metrics=accuracy:0.9189
INFO:root:[Epoch 3 Batch 5600/12277] loss=0.2081, lr=0.0000101, metrics=accuracy:0.9191
INFO:root:[Epoch 3 Batch 5700/12277] loss=0.2529, lr=0.0000099, metrics=accuracy:0.9190
INFO:root:[Epoch 3 Batch 5800/12277] loss=0.2334, lr=0.0000097, metrics=accuracy:0.9189
INFO:root:[Epoch 3 Batch 5900/12277] loss=0.2411, lr=0.0000096, metrics=accuracy:0.9189
INFO:root:[Epoch 3 Batch 6000/12277] loss=0.1987, lr=0.0000094, metrics=accuracy:0.9190
INFO:root:[Epoch 3 Batch 6100/12277] loss=0.2188, lr=0.0000093, metrics=accuracy:0.9191
INFO:root:[Epoch 3 Batch 6200/12277] loss=0.2191, lr=0.0000091, metrics=accuracy:0.9192
INFO:root:[Epoch 3 Batch 6300/12277] loss=0.2103, lr=0.0000090, metrics=accuracy:0.9193
INFO:root:[Epoch 3 Batch 6400/12277] loss=0.2450, lr=0.0000088, metrics=accuracy:0.9192
INFO:root:[Epoch 3 Batch 6500/12277] loss=0.2293, lr=0.0000087, metrics=accuracy:0.9191
INFO:root:[Epoch 3 Batch 6600/12277] loss=0.2226, lr=0.0000085, metrics=accuracy:0.9192
INFO:root:[Epoch 3 Batch 6700/12277] loss=0.2109, lr=0.0000084, metrics=accuracy:0.9192
INFO:root:[Epoch 3 Batch 6800/12277] loss=0.2266, lr=0.0000082, metrics=accuracy:0.9192
INFO:root:[Epoch 3 Batch 6900/12277] loss=0.2132, lr=0.0000081, metrics=accuracy:0.9193
INFO:root:[Epoch 3 Batch 7000/12277] loss=0.1935, lr=0.0000079, metrics=accuracy:0.9196
INFO:root:[Epoch 3 Batch 7100/12277] loss=0.2177, lr=0.0000078, metrics=accuracy:0.9196
INFO:root:[Epoch 3 Batch 7200/12277] loss=0.2292, lr=0.0000076, metrics=accuracy:0.9196
INFO:root:[Epoch 3 Batch 7300/12277] loss=0.2168, lr=0.0000075, metrics=accuracy:0.9196
INFO:root:[Epoch 3 Batch 7400/12277] loss=0.2034, lr=0.0000073, metrics=accuracy:0.9197
INFO:root:[Epoch 3 Batch 7500/12277] loss=0.2171, lr=0.0000072, metrics=accuracy:0.9197
INFO:root:[Epoch 3 Batch 7600/12277] loss=0.2171, lr=0.0000070, metrics=accuracy:0.9197
INFO:root:[Epoch 3 Batch 7700/12277] loss=0.2264, lr=0.0000069, metrics=accuracy:0.9197
INFO:root:[Epoch 3 Batch 7800/12277] loss=0.2149, lr=0.0000067, metrics=accuracy:0.9198
INFO:root:[Epoch 3 Batch 7900/12277] loss=0.2406, lr=0.0000066, metrics=accuracy:0.9197
INFO:root:[Epoch 3 Batch 8000/12277] loss=0.2121, lr=0.0000064, metrics=accuracy:0.9198
INFO:root:[Epoch 3 Batch 8100/12277] loss=0.2215, lr=0.0000063, metrics=accuracy:0.9198
INFO:root:[Epoch 3 Batch 8200/12277] loss=0.2263, lr=0.0000061, metrics=accuracy:0.9198
INFO:root:[Epoch 3 Batch 8300/12277] loss=0.2084, lr=0.0000060, metrics=accuracy:0.9199
INFO:root:[Epoch 3 Batch 8400/12277] loss=0.2263, lr=0.0000058, metrics=accuracy:0.9199
INFO:root:[Epoch 3 Batch 8500/12277] loss=0.2073, lr=0.0000057, metrics=accuracy:0.9200
INFO:root:[Epoch 3 Batch 8600/12277] loss=0.2363, lr=0.0000055, metrics=accuracy:0.9198
INFO:root:[Epoch 3 Batch 8700/12277] loss=0.2210, lr=0.0000054, metrics=accuracy:0.9199
INFO:root:[Epoch 3 Batch 8800/12277] loss=0.2073, lr=0.0000052, metrics=accuracy:0.9199
INFO:root:[Epoch 3 Batch 8900/12277] loss=0.2272, lr=0.0000051, metrics=accuracy:0.9199
INFO:root:[Epoch 3 Batch 9000/12277] loss=0.1974, lr=0.0000049, metrics=accuracy:0.9201
INFO:root:[Epoch 3 Batch 9100/12277] loss=0.1935, lr=0.0000048, metrics=accuracy:0.9202
INFO:root:[Epoch 3 Batch 9200/12277] loss=0.2173, lr=0.0000046, metrics=accuracy:0.9202
INFO:root:[Epoch 3 Batch 9300/12277] loss=0.2246, lr=0.0000045, metrics=accuracy:0.9203
INFO:root:[Epoch 3 Batch 9400/12277] loss=0.2251, lr=0.0000043, metrics=accuracy:0.9203
INFO:root:[Epoch 3 Batch 9500/12277] loss=0.1962, lr=0.0000042, metrics=accuracy:0.9204
INFO:root:[Epoch 3 Batch 9600/12277] loss=0.2124, lr=0.0000040, metrics=accuracy:0.9205
INFO:root:[Epoch 3 Batch 9700/12277] loss=0.2150, lr=0.0000039, metrics=accuracy:0.9205
INFO:root:[Epoch 3 Batch 9800/12277] loss=0.2148, lr=0.0000037, metrics=accuracy:0.9205
INFO:root:[Epoch 3 Batch 9900/12277] loss=0.2296, lr=0.0000036, metrics=accuracy:0.9204
INFO:root:[Epoch 3 Batch 10000/12277] loss=0.2131, lr=0.0000034, metrics=accuracy:0.9205
INFO:root:[Epoch 3 Batch 10100/12277] loss=0.2276, lr=0.0000033, metrics=accuracy:0.9205
INFO:root:[Epoch 3 Batch 10200/12277] loss=0.2342, lr=0.0000031, metrics=accuracy:0.9204
INFO:root:[Epoch 3 Batch 10300/12277] loss=0.2104, lr=0.0000030, metrics=accuracy:0.9205
INFO:root:[Epoch 3 Batch 10400/12277] loss=0.2103, lr=0.0000028, metrics=accuracy:0.9205
INFO:root:[Epoch 3 Batch 10500/12277] loss=0.2147, lr=0.0000027, metrics=accuracy:0.9206
INFO:root:[Epoch 3 Batch 10600/12277] loss=0.2185, lr=0.0000025, metrics=accuracy:0.9206
INFO:root:[Epoch 3 Batch 10700/12277] loss=0.2136, lr=0.0000024, metrics=accuracy:0.9206
INFO:root:[Epoch 3 Batch 10800/12277] loss=0.2129, lr=0.0000022, metrics=accuracy:0.9206
INFO:root:[Epoch 3 Batch 10900/12277] loss=0.2212, lr=0.0000021, metrics=accuracy:0.9206
INFO:root:[Epoch 3 Batch 11000/12277] loss=0.2069, lr=0.0000019, metrics=accuracy:0.9207
INFO:root:[Epoch 3 Batch 11100/12277] loss=0.2302, lr=0.0000018, metrics=accuracy:0.9207
INFO:root:[Epoch 3 Batch 11200/12277] loss=0.2160, lr=0.0000016, metrics=accuracy:0.9207
INFO:root:[Epoch 3 Batch 11300/12277] loss=0.1919, lr=0.0000015, metrics=accuracy:0.9207
INFO:root:[Epoch 3 Batch 11400/12277] loss=0.2249, lr=0.0000013, metrics=accuracy:0.9207
INFO:root:[Epoch 3 Batch 11500/12277] loss=0.1948, lr=0.0000011, metrics=accuracy:0.9208
INFO:root:[Epoch 3 Batch 11600/12277] loss=0.2044, lr=0.0000010, metrics=accuracy:0.9209
INFO:root:[Epoch 3 Batch 11700/12277] loss=0.1999, lr=0.0000008, metrics=accuracy:0.9209
INFO:root:[Epoch 3 Batch 11800/12277] loss=0.1921, lr=0.0000007, metrics=accuracy:0.9210
INFO:root:[Epoch 3 Batch 11900/12277] loss=0.2376, lr=0.0000005, metrics=accuracy:0.9210
INFO:root:[Epoch 3 Batch 12000/12277] loss=0.2224, lr=0.0000004, metrics=accuracy:0.9209
INFO:root:[Epoch 3 Batch 12100/12277] loss=0.2182, lr=0.0000002, metrics=accuracy:0.9210
INFO:root:[Epoch 3 Batch 12200/12277] loss=0.2330, lr=0.0000001, metrics=accuracy:0.9209
INFO:root:On MNLI Matched:
INFO:root:validation metrics:accuracy:0.8455
INFO:root:On MNLI Mismatched:
INFO:root:validation metrics:accuracy:0.8466
INFO:root:params saved in : ./output_dir/model_bert_MNLI_2.params
INFO:root:Time cost=2935.0s