[09/15 08:06:47] libai INFO: Rank of current process: 0. World size: 8
[09/15 08:06:47] libai INFO: Command line arguments: Namespace(config_file='configs/swin_imagenet.py', eval_only=False, fast_dev_run=False, opts=[], resume=False)
[09/15 08:06:47] libai INFO: Contents of args.config_file=configs/swin_imagenet.py:
from libai.config import LazyCall
from .common.models.swin.swin_tiny_patch4_window7_224 import model
from .common.models.graph import graph
from .common.train import train
from .common.optim import optim
from .common.data.imagenet import dataloader

from flowvision.data import Mixup
from flowvision.loss.cross_entropy import SoftTargetCrossEntropy

# Refine data path to imagenet
dataloader.train.dataset[0].root = "/data/ImageNet/extract/"
dataloader.test[0].dataset.root ="/data/ImageNet/extract/"

# Add Mixup Func
dataloader.train.mixup_func = LazyCall(Mixup)(
    mixup_alpha=0.8,
    cutmix_alpha=1.0,
    prob=1.0,
    switch_prob=0.5,
    mode="batch",
    num_classes=1000,
)

# Refine model cfg for vit training on imagenet
model.num_classes = 1000
model.loss_func = LazyCall(SoftTargetCrossEntropy)()

# Refine optimizer cfg for vit model
optim.lr = 1e-3
optim.eps = 1e-8
optim.weight_decay = 0.05

# Refine train cfg for vit model
train.train_micro_batch_size = 128
train.test_micro_batch_size = 128
train.train_epoch = 300
train.warmup_ratio = 20 / 300
train.eval_period = 1562
train.log_period = 100
train.output_dir = "./commit_c43b"

# Scheduler
train.scheduler.warmup_factor = 0.001
train.scheduler.alpha = 0.01
train.scheduler.warmup_method = "linear"

graph.enabled = True
train.rdma_enabled = True

# Set fp16 ON
train.amp.enabled = True

[09/15 08:06:47] libai INFO: Full config saved to ./commit_c43b/config.yaml
[09/15 08:06:47] lb.engine.default INFO: > compiling dataset index builder ...
[09/15 08:06:48] lb.engine.default INFO: >>> done with dataset index builder. Compilation time: 0.064 seconds
[09/15 08:06:48] lb.engine.default INFO: >>> done with compiling. 
Compilation time: 0.065 seconds [09/15 08:06:48] lb.engine.default INFO: Prepare training, validating, testing set [09/15 08:06:51] lb.engine.default INFO: Prepare testing set [09/15 08:06:51] lb.engine.default INFO: Auto-scaling the config to train.train_iter=375342, train.warmup_iter=25023 [09/15 08:06:59] lb.engine.default INFO: Model: SwinTransformer( (patch_embed): PatchEmbed( (proj): Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4)) (norm): LayerNorm((96,), eps=1e-05, elementwise_affine=True) ) (pos_drop): Dropout(p=0.0, inplace=False) (layers): ModuleList( (0): BasicLayer( (blocks): ModuleList( (0): SwinTransformerBlock( (norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=96, out_features=288, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=96, out_features=96, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): Identity() (norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=96, out_features=384, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=384, out_features=96, bias=True, parallel=row) ) ) (1): SwinTransformerBlock( (norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=96, out_features=288, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=96, out_features=96, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=96, out_features=384, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=384, 
out_features=96, bias=True, parallel=row) ) ) ) (downsample): PatchMerging( (reduction): Linear1D(in_features=384, out_features=192, bias=False, parallel=data) (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True) ) ) (1): BasicLayer( (blocks): ModuleList( (0): SwinTransformerBlock( (norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=192, out_features=576, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=192, out_features=192, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=192, out_features=768, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=768, out_features=192, bias=True, parallel=row) ) ) (1): SwinTransformerBlock( (norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=192, out_features=576, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=192, out_features=192, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=192, out_features=768, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=768, out_features=192, bias=True, parallel=row) ) ) ) (downsample): PatchMerging( (reduction): Linear1D(in_features=768, out_features=384, bias=False, parallel=data) (norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) ) (2): BasicLayer( (blocks): ModuleList( (0): SwinTransformerBlock( (norm1): LayerNorm((384,), eps=1e-05, 
elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row) ) ) (1): SwinTransformerBlock( (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row) ) ) (2): SwinTransformerBlock( (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, 
bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row) ) ) (3): SwinTransformerBlock( (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row) ) ) (4): SwinTransformerBlock( (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row) ) ) (5): SwinTransformerBlock( (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): 
LayerNorm((384,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row) ) ) ) (downsample): PatchMerging( (reduction): Linear1D(in_features=1536, out_features=768, bias=False, parallel=data) (norm): LayerNorm((1536,), eps=1e-05, elementwise_affine=True) ) ) (3): BasicLayer( (blocks): ModuleList( (0): SwinTransformerBlock( (norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=768, out_features=2304, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=768, out_features=768, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=768, out_features=3072, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=3072, out_features=768, bias=True, parallel=row) ) ) (1): SwinTransformerBlock( (norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=768, out_features=2304, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=768, out_features=768, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=768, out_features=3072, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=3072, out_features=768, bias=True, parallel=row) ) ) ) ) ) (norm): 
LayerNorm((768,), eps=1e-05, elementwise_affine=True) (avgpool): AdaptiveAvgPool1d() (head): Linear1D(in_features=768, out_features=1000, bias=True, parallel=data) (loss_func): SoftTargetCrossEntropy() ) [09/15 08:06:59] lb.engine.trainer INFO: Starting training from iteration 0 [09/15 08:07:01] lb.models.utils.graph_base INFO: Start compling the train graph which may take some time. Please wait for a moment ... [09/15 08:07:59] lb.utils.events INFO: eta: 19:58:18 iteration: 99/375342 consumed_samples: 102400 total_loss: 6.976 time: 0.3196 s/iter data_time: 0.2026 s/iter total_throughput: 3204.10 samples/s lr: 4.91e-06 [09/15 08:08:32] lb.utils.events INFO: eta: 19:57:59 iteration: 199/375342 consumed_samples: 204800 total_loss: 6.945 time: 0.3251 s/iter data_time: 0.1805 s/iter total_throughput: 3149.68 samples/s lr: 8.86e-06 [09/15 08:09:05] lb.utils.events INFO: eta: 16:45:48 iteration: 299/375342 consumed_samples: 307200 total_loss: 6.911 time: 0.3263 s/iter data_time: 0.1889 s/iter total_throughput: 3138.23 samples/s lr: 1.28e-05 [09/15 08:09:38] lb.utils.events INFO: eta: 16:00:26 iteration: 399/375342 consumed_samples: 409600 total_loss: 6.884 time: 0.3269 s/iter data_time: 0.1842 s/iter total_throughput: 3132.06 samples/s lr: 1.68e-05 [09/15 08:10:11] lb.utils.events INFO: eta: 15:51:58 iteration: 499/375342 consumed_samples: 512000 total_loss: 6.862 time: 0.3284 s/iter data_time: 0.1921 s/iter total_throughput: 3118.38 samples/s lr: 2.07e-05 [09/15 08:10:44] lb.utils.events INFO: eta: 15:48:34 iteration: 599/375342 consumed_samples: 614400 total_loss: 6.84 time: 0.3283 s/iter data_time: 0.1831 s/iter total_throughput: 3119.11 samples/s lr: 2.47e-05 [09/15 08:11:17] lb.utils.events INFO: eta: 15:50:05 iteration: 699/375342 consumed_samples: 716800 total_loss: 6.818 time: 0.3282 s/iter data_time: 0.1931 s/iter total_throughput: 3120.46 samples/s lr: 2.86e-05 [09/15 08:11:50] lb.utils.events INFO: eta: 15:56:42 iteration: 799/375342 consumed_samples: 819200 
total_loss: 6.795 time: 0.3284 s/iter data_time: 0.1892 s/iter total_throughput: 3118.62 samples/s lr: 3.26e-05 [09/15 08:12:23] lb.utils.events INFO: eta: 15:57:28 iteration: 899/375342 consumed_samples: 921600 total_loss: 6.777 time: 0.3283 s/iter data_time: 0.1816 s/iter total_throughput: 3118.69 samples/s lr: 3.65e-05 [09/15 08:12:56] lb.utils.events INFO: eta: 15:59:40 iteration: 999/375342 consumed_samples: 1024000 total_loss: 6.749 time: 0.3285 s/iter data_time: 0.1760 s/iter total_throughput: 3116.79 samples/s lr: 4.05e-05 [09/15 08:13:29] lb.utils.events INFO: eta: 16:10:51 iteration: 1099/375342 consumed_samples: 1126400 total_loss: 6.723 time: 0.3285 s/iter data_time: 0.1904 s/iter total_throughput: 3116.95 samples/s lr: 4.44e-05 [09/15 08:14:02] lb.utils.events INFO: eta: 16:11:56 iteration: 1199/375342 consumed_samples: 1228800 total_loss: 6.688 time: 0.3285 s/iter data_time: 0.1792 s/iter total_throughput: 3117.50 samples/s lr: 4.83e-05 [09/15 08:14:34] lb.utils.events INFO: eta: 16:15:25 iteration: 1299/375342 consumed_samples: 1331200 total_loss: 6.649 time: 0.3285 s/iter data_time: 0.1796 s/iter total_throughput: 3117.11 samples/s lr: 5.23e-05 [09/15 08:15:08] lb.utils.events INFO: eta: 16:18:59 iteration: 1399/375342 consumed_samples: 1433600 total_loss: 6.616 time: 0.3287 s/iter data_time: 0.1892 s/iter total_throughput: 3114.89 samples/s lr: 5.62e-05 [09/15 08:15:41] lb.utils.events INFO: eta: 16:43:18 iteration: 1499/375342 consumed_samples: 1536000 total_loss: 6.58 time: 0.3289 s/iter data_time: 0.1931 s/iter total_throughput: 3113.16 samples/s lr: 6.02e-05 [09/15 08:16:14] lb.utils.events INFO: eta: 18:27:30 iteration: 1599/375342 consumed_samples: 1638400 total_loss: 6.545 time: 0.3289 s/iter data_time: 0.1816 s/iter total_throughput: 3113.54 samples/s lr: 6.41e-05 [09/15 08:16:47] lb.utils.events INFO: eta: 19:01:12 iteration: 1699/375342 consumed_samples: 1740800 total_loss: 6.52 time: 0.3289 s/iter data_time: 0.1865 s/iter 
total_throughput: 3113.14 samples/s lr: 6.81e-05 [09/15 08:17:19] lb.utils.events INFO: eta: 17:46:36 iteration: 1799/375342 consumed_samples: 1843200 total_loss: 6.493 time: 0.3289 s/iter data_time: 0.1908 s/iter total_throughput: 3113.45 samples/s lr: 7.20e-05 [09/15 08:17:53] lb.utils.events INFO: eta: 16:58:23 iteration: 1899/375342 consumed_samples: 1945600 total_loss: 6.458 time: 0.3291 s/iter data_time: 0.1920 s/iter total_throughput: 3111.46 samples/s lr: 7.60e-05 [09/15 08:18:26] lb.utils.events INFO: eta: 16:30:24 iteration: 1999/375342 consumed_samples: 2048000 total_loss: 6.434 time: 0.3294 s/iter data_time: 0.1915 s/iter total_throughput: 3109.11 samples/s lr: 7.99e-05 [09/15 08:18:59] lb.utils.events INFO: eta: 15:59:24 iteration: 2099/375342 consumed_samples: 2150400 total_loss: 6.408 time: 0.3295 s/iter data_time: 0.1922 s/iter total_throughput: 3107.93 samples/s lr: 8.39e-05 [09/15 08:19:32] lb.utils.events INFO: eta: 15:54:19 iteration: 2199/375342 consumed_samples: 2252800 total_loss: 6.382 time: 0.3293 s/iter data_time: 0.1756 s/iter total_throughput: 3109.51 samples/s lr: 8.78e-05 [09/15 08:20:05] lb.utils.events INFO: eta: 15:55:33 iteration: 2299/375342 consumed_samples: 2355200 total_loss: 6.356 time: 0.3293 s/iter data_time: 0.1868 s/iter total_throughput: 3110.09 samples/s lr: 9.18e-05 [09/15 08:20:38] lb.utils.events INFO: eta: 15:57:38 iteration: 2399/375342 consumed_samples: 2457600 total_loss: 6.332 time: 0.3292 s/iter data_time: 0.1836 s/iter total_throughput: 3110.12 samples/s lr: 9.57e-05 [09/15 08:21:11] lb.utils.events INFO: eta: 15:59:57 iteration: 2499/375342 consumed_samples: 2560000 total_loss: 6.312 time: 0.3293 s/iter data_time: 0.1844 s/iter total_throughput: 3110.06 samples/s lr: 9.97e-05 [09/15 08:21:44] lb.utils.events INFO: eta: 15:53:18 iteration: 2599/375342 consumed_samples: 2662400 total_loss: 6.273 time: 0.3293 s/iter data_time: 0.1819 s/iter total_throughput: 3109.18 samples/s lr: 1.04e-04 [09/15 08:22:17] 
lb.utils.events INFO: eta: 15:50:47 iteration: 2699/375342 consumed_samples: 2764800 total_loss: 6.25 time: 0.3293 s/iter data_time: 0.1910 s/iter total_throughput: 3109.70 samples/s lr: 1.08e-04 [09/15 08:22:50] lb.utils.events INFO: eta: 15:53:44 iteration: 2799/375342 consumed_samples: 2867200 total_loss: 6.217 time: 0.3293 s/iter data_time: 0.1923 s/iter total_throughput: 3109.48 samples/s lr: 1.12e-04 [09/15 08:23:23] lb.utils.events INFO: eta: 15:49:37 iteration: 2899/375342 consumed_samples: 2969600 total_loss: 6.19 time: 0.3294 s/iter data_time: 0.1822 s/iter total_throughput: 3108.92 samples/s lr: 1.15e-04 [09/15 08:23:56] lb.utils.events INFO: eta: 15:49:48 iteration: 2999/375342 consumed_samples: 3072000 total_loss: 6.166 time: 0.3294 s/iter data_time: 0.1842 s/iter total_throughput: 3109.07 samples/s lr: 1.19e-04 [09/15 08:24:29] lb.utils.events INFO: eta: 15:52:26 iteration: 3099/375342 consumed_samples: 3174400 total_loss: 6.126 time: 0.3294 s/iter data_time: 0.1798 s/iter total_throughput: 3109.11 samples/s lr: 1.23e-04 [09/15 08:25:02] lb.utils.events INFO: eta: 15:54:17 iteration: 3199/375342 consumed_samples: 3276800 total_loss: 6.108 time: 0.3295 s/iter data_time: 0.1830 s/iter total_throughput: 3108.07 samples/s lr: 1.27e-04 [09/15 08:25:35] lb.utils.events INFO: eta: 15:47:55 iteration: 3299/375342 consumed_samples: 3379200 total_loss: 6.084 time: 0.3295 s/iter data_time: 0.1896 s/iter total_throughput: 3107.58 samples/s lr: 1.31e-04 [09/15 08:26:08] lb.utils.events INFO: eta: 15:44:23 iteration: 3399/375342 consumed_samples: 3481600 total_loss: 6.056 time: 0.3295 s/iter data_time: 0.1866 s/iter total_throughput: 3107.35 samples/s lr: 1.35e-04 [09/15 08:26:41] lb.utils.events INFO: eta: 15:42:00 iteration: 3499/375342 consumed_samples: 3584000 total_loss: 6.032 time: 0.3295 s/iter data_time: 0.1907 s/iter total_throughput: 3107.31 samples/s lr: 1.39e-04 [09/15 08:27:14] lb.utils.events INFO: eta: 15:40:17 iteration: 3599/375342 
consumed_samples: 3686400 total_loss: 6.012 time: 0.3296 s/iter data_time: 0.1915 s/iter total_throughput: 3107.00 samples/s lr: 1.43e-04 [09/15 08:27:47] lb.utils.events INFO: eta: 15:38:06 iteration: 3699/375342 consumed_samples: 3788800 total_loss: 5.971 time: 0.3295 s/iter data_time: 0.1922 s/iter total_throughput: 3107.49 samples/s lr: 1.47e-04 [09/15 08:28:20] lb.utils.events INFO: eta: 15:35:40 iteration: 3799/375342 consumed_samples: 3891200 total_loss: 5.948 time: 0.3294 s/iter data_time: 0.1793 s/iter total_throughput: 3108.24 samples/s lr: 1.51e-04 [09/15 08:28:53] lb.utils.events INFO: eta: 15:36:25 iteration: 3899/375342 consumed_samples: 3993600 total_loss: 5.933 time: 0.3295 s/iter data_time: 0.1934 s/iter total_throughput: 3107.46 samples/s lr: 1.55e-04 [09/15 08:29:26] lb.utils.events INFO: eta: 15:35:31 iteration: 3999/375342 consumed_samples: 4096000 total_loss: 5.916 time: 0.3296 s/iter data_time: 0.1844 s/iter total_throughput: 3106.85 samples/s lr: 1.59e-04 [09/15 08:29:59] lb.utils.events INFO: eta: 15:33:31 iteration: 4099/375342 consumed_samples: 4198400 total_loss: 5.88 time: 0.3296 s/iter data_time: 0.1848 s/iter total_throughput: 3106.62 samples/s lr: 1.63e-04 [09/15 08:30:32] lb.utils.events INFO: eta: 15:31:32 iteration: 4199/375342 consumed_samples: 4300800 total_loss: 5.865 time: 0.3296 s/iter data_time: 0.1882 s/iter total_throughput: 3106.37 samples/s lr: 1.67e-04 [09/15 08:31:05] lb.utils.events INFO: eta: 15:30:21 iteration: 4299/375342 consumed_samples: 4403200 total_loss: 5.834 time: 0.3296 s/iter data_time: 0.1908 s/iter total_throughput: 3106.52 samples/s lr: 1.71e-04 [09/15 08:31:38] lb.utils.events INFO: eta: 15:31:38 iteration: 4399/375342 consumed_samples: 4505600 total_loss: 5.812 time: 0.3297 s/iter data_time: 0.1937 s/iter total_throughput: 3106.22 samples/s lr: 1.75e-04 [09/15 08:32:11] lb.utils.events INFO: eta: 15:30:15 iteration: 4499/375342 consumed_samples: 4608000 total_loss: 5.787 time: 0.3297 s/iter data_time: 
0.1823 s/iter total_throughput: 3106.15 samples/s lr: 1.79e-04 [09/15 08:32:44] lb.utils.events INFO: eta: 15:30:17 iteration: 4599/375342 consumed_samples: 4710400 total_loss: 5.768 time: 0.3297 s/iter data_time: 0.1908 s/iter total_throughput: 3106.26 samples/s lr: 1.83e-04 [09/15 08:33:17] lb.utils.events INFO: eta: 15:31:22 iteration: 4699/375342 consumed_samples: 4812800 total_loss: 5.763 time: 0.3296 s/iter data_time: 0.1795 s/iter total_throughput: 3106.68 samples/s lr: 1.87e-04 [09/15 08:33:50] lb.utils.events INFO: eta: 15:30:57 iteration: 4799/375342 consumed_samples: 4915200 total_loss: 5.744 time: 0.3296 s/iter data_time: 0.1897 s/iter total_throughput: 3106.37 samples/s lr: 1.91e-04 [09/15 08:34:23] lb.utils.events INFO: eta: 15:31:37 iteration: 4899/375342 consumed_samples: 5017600 total_loss: 5.718 time: 0.3296 s/iter data_time: 0.1783 s/iter total_throughput: 3106.84 samples/s lr: 1.94e-04 [09/15 08:34:56] lb.utils.checkpoint INFO: Saving checkpoint to ./commit_c43b/model_0004999 [09/15 08:34:56] lb.evaluation.evaluator INFO: with eval_iter 100000.0, reset total samples 50000 to 50000 [09/15 08:34:56] lb.evaluation.evaluator INFO: Start inference on 50000 samples [09/15 08:34:58] lb.models.utils.graph_base INFO: Start compling the eval graph which may take some time. Please wait for a moment ... [09/15 08:35:03] lb.evaluation.evaluator INFO: Inference done 11264/50000. Dataloading: 0.0198 s/iter. Inference: 0.1767 s/iter. Eval: 0.0024 s/iter. Total: 0.1989 s/iter. ETA=0:00:07 [09/15 08:35:08] lb.evaluation.evaluator INFO: Inference done 29696/50000. Dataloading: 0.0678 s/iter. Inference: 0.1911 s/iter. Eval: 0.0022 s/iter. Total: 0.2612 s/iter. ETA=0:00:04 [09/15 08:35:13] lb.evaluation.evaluator INFO: Inference done 48128/50000. Dataloading: 0.0541 s/iter. Inference: 0.2201 s/iter. Eval: 0.0021 s/iter. Total: 0.2764 s/iter. 
ETA=0:00:00 [09/15 08:35:14] lb.evaluation.evaluator INFO: Total valid samples: 50000 [09/15 08:35:14] lb.evaluation.evaluator INFO: Total inference time: 0:00:12.120410 (0.000242 s / iter per device, on 8 devices) [09/15 08:35:14] lb.evaluation.evaluator INFO: Total inference pure compute time: 0:00:09 (0.000193 s / iter per device, on 8 devices) [09/15 08:35:14] lb.engine.default INFO: Evaluation results for ImageNetDataset in csv format: [09/15 08:35:14] lb.evaluation.utils INFO: copypaste: Acc@1=13.991999999999999 [09/15 08:35:14] lb.evaluation.utils INFO: copypaste: Acc@5=29.964000000000002 [09/15 08:35:14] lb.engine.hooks INFO: Saved first model at 13.99200 @ 4999 steps [09/15 08:35:14] lb.utils.checkpoint INFO: Saving checkpoint to ./commit_c43b/model_best [09/15 08:35:15] lb.utils.events INFO: eta: 15:33:18 iteration: 4999/375342 consumed_samples: 5120000 total_loss: 5.698 time: 0.3296 s/iter data_time: 0.1818 s/iter total_throughput: 3107.14 samples/s lr: 1.98e-04 [09/15 08:35:46] lb.utils.events INFO: eta: 15:35:28 iteration: 5099/375342 consumed_samples: 5222400 total_loss: 5.687 time: 0.3292 s/iter data_time: 0.1848 s/iter total_throughput: 3110.27 samples/s lr: 2.02e-04 [09/15 08:36:19] lb.utils.events INFO: eta: 15:40:38 iteration: 5199/375342 consumed_samples: 5324800 total_loss: 5.666 time: 0.3292 s/iter data_time: 0.1862 s/iter total_throughput: 3110.11 samples/s lr: 2.06e-04 [09/15 08:36:52] lb.utils.events INFO: eta: 15:44:28 iteration: 5299/375342 consumed_samples: 5427200 total_loss: 5.644 time: 0.3293 s/iter data_time: 0.1792 s/iter total_throughput: 3110.04 samples/s lr: 2.10e-04 [09/15 08:37:25] lb.utils.events INFO: eta: 15:49:16 iteration: 5399/375342 consumed_samples: 5529600 total_loss: 5.624 time: 0.3292 s/iter data_time: 0.1802 s/iter total_throughput: 3110.24 samples/s lr: 2.14e-04 [09/15 08:37:58] lb.utils.events INFO: eta: 16:20:18 iteration: 5499/375342 consumed_samples: 5632000 total_loss: 5.605 time: 0.3293 s/iter data_time: 
0.1871 s/iter total_throughput: 3109.93 samples/s lr: 2.18e-04 [09/15 08:38:31] lb.utils.events INFO: eta: 19:56:12 iteration: 5599/375342 consumed_samples: 5734400 total_loss: 5.59 time: 0.3292 s/iter data_time: 0.1837 s/iter total_throughput: 3110.18 samples/s lr: 2.22e-04 [09/15 08:39:03] lb.utils.events INFO: eta: 1 day, 1:24:42 iteration: 5699/375342 consumed_samples: 5836800 total_loss: 5.573 time: 0.3292 s/iter data_time: 0.1861 s/iter total_throughput: 3110.16 samples/s lr: 2.26e-04 [09/15 08:39:37] lb.utils.events INFO: eta: 1 day, 2:56:53 iteration: 5799/375342 consumed_samples: 5939200 total_loss: 5.554 time: 0.3293 s/iter data_time: 0.1883 s/iter total_throughput: 3109.22 samples/s lr: 2.30e-04 [09/15 08:40:10] lb.utils.events INFO: eta: 1 day, 4:39:20 iteration: 5899/375342 consumed_samples: 6041600 total_loss: 5.534 time: 0.3293 s/iter data_time: 0.1834 s/iter total_throughput: 3109.49 samples/s lr: 2.34e-04 [09/15 08:40:43] lb.utils.events INFO: eta: 1 day, 3:43:43 iteration: 5999/375342 consumed_samples: 6144000 total_loss: 5.521 time: 0.3293 s/iter data_time: 0.1803 s/iter total_throughput: 3109.42 samples/s lr: 2.38e-04 [09/15 08:41:16] lb.utils.events INFO: eta: 1 day, 4:48:23 iteration: 6099/375342 consumed_samples: 6246400 total_loss: 5.523 time: 0.3294 s/iter data_time: 0.1961 s/iter total_throughput: 3108.95 samples/s lr: 2.42e-04 [09/15 08:41:49] lb.utils.events INFO: eta: 1 day, 4:01:02 iteration: 6199/375342 consumed_samples: 6348800 total_loss: 5.521 time: 0.3293 s/iter data_time: 0.1778 s/iter total_throughput: 3109.49 samples/s lr: 2.46e-04 [09/15 08:42:22] lb.utils.events INFO: eta: 1 day, 2:01:21 iteration: 6299/375342 consumed_samples: 6451200 total_loss: 5.481 time: 0.3294 s/iter data_time: 0.1936 s/iter total_throughput: 3108.94 samples/s lr: 2.50e-04 [09/15 08:42:55] lb.utils.events INFO: eta: 23:49:44 iteration: 6399/375342 consumed_samples: 6553600 total_loss: 5.448 time: 0.3294 s/iter data_time: 0.1878 s/iter total_throughput: 
3108.72 samples/s lr: 2.54e-04 [09/15 08:43:28] lb.utils.events INFO: eta: 16:48:35 iteration: 6499/375342 consumed_samples: 6656000 total_loss: 5.448 time: 0.3294 s/iter data_time: 0.1846 s/iter total_throughput: 3108.53 samples/s lr: 2.58e-04 [09/15 08:44:01] lb.utils.events INFO: eta: 15:50:32 iteration: 6599/375342 consumed_samples: 6758400 total_loss: 5.451 time: 0.3294 s/iter data_time: 0.1791 s/iter total_throughput: 3108.97 samples/s lr: 2.62e-04 [09/15 08:44:34] lb.utils.events INFO: eta: 15:40:47 iteration: 6699/375342 consumed_samples: 6860800 total_loss: 5.437 time: 0.3294 s/iter data_time: 0.1848 s/iter total_throughput: 3108.75 samples/s lr: 2.66e-04 [09/15 08:45:07] lb.utils.events INFO: eta: 15:37:48 iteration: 6799/375342 consumed_samples: 6963200 total_loss: 5.417 time: 0.3294 s/iter data_time: 0.1777 s/iter total_throughput: 3108.90 samples/s lr: 2.69e-04 [09/15 08:45:40] lb.utils.events INFO: eta: 15:35:50 iteration: 6899/375342 consumed_samples: 7065600 total_loss: 5.402 time: 0.3294 s/iter data_time: 0.1832 s/iter total_throughput: 3108.86 samples/s lr: 2.73e-04 [09/15 08:46:13] lb.utils.events INFO: eta: 15:35:56 iteration: 6999/375342 consumed_samples: 7168000 total_loss: 5.397 time: 0.3294 s/iter data_time: 0.1918 s/iter total_throughput: 3108.33 samples/s lr: 2.77e-04 [09/15 08:46:46] lb.utils.events INFO: eta: 15:32:10 iteration: 7099/375342 consumed_samples: 7270400 total_loss: 5.389 time: 0.3294 s/iter data_time: 0.1817 s/iter total_throughput: 3108.22 samples/s lr: 2.81e-04 [09/15 08:47:19] lb.utils.events INFO: eta: 15:31:27 iteration: 7199/375342 consumed_samples: 7372800 total_loss: 5.374 time: 0.3294 s/iter data_time: 0.1883 s/iter total_throughput: 3108.29 samples/s lr: 2.85e-04 [09/15 08:47:52] lb.utils.events INFO: eta: 15:31:52 iteration: 7299/375342 consumed_samples: 7475200 total_loss: 5.347 time: 0.3294 s/iter data_time: 0.1764 s/iter total_throughput: 3108.50 samples/s lr: 2.89e-04 [09/15 08:48:25] lb.utils.events INFO: 
eta: 15:33:53 iteration: 7399/375342 consumed_samples: 7577600 total_loss: 5.344 time: 0.3295 s/iter data_time: 0.1980 s/iter total_throughput: 3107.91 samples/s lr: 2.93e-04 [09/15 08:48:58] lb.utils.events INFO: eta: 15:33:28 iteration: 7499/375342 consumed_samples: 7680000 total_loss: 5.346 time: 0.3295 s/iter data_time: 0.1889 s/iter total_throughput: 3107.68 samples/s lr: 2.97e-04 [09/15 08:49:31] lb.utils.events INFO: eta: 15:31:58 iteration: 7599/375342 consumed_samples: 7782400 total_loss: 5.319 time: 0.3295 s/iter data_time: 0.1764 s/iter total_throughput: 3107.68 samples/s lr: 3.01e-04 [09/15 08:50:04] lb.utils.events INFO: eta: 15:31:59 iteration: 7699/375342 consumed_samples: 7884800 total_loss: 5.31 time: 0.3295 s/iter data_time: 0.1885 s/iter total_throughput: 3107.59 samples/s lr: 3.05e-04 [09/15 08:50:37] lb.utils.events INFO: eta: 15:30:41 iteration: 7799/375342 consumed_samples: 7987200 total_loss: 5.295 time: 0.3296 s/iter data_time: 0.1796 s/iter total_throughput: 3107.25 samples/s lr: 3.09e-04 [09/15 08:51:10] lb.utils.events INFO: eta: 15:31:26 iteration: 7899/375342 consumed_samples: 8089600 total_loss: 5.261 time: 0.3295 s/iter data_time: 0.1860 s/iter total_throughput: 3107.39 samples/s lr: 3.13e-04 [09/15 08:51:43] lb.utils.events INFO: eta: 15:31:47 iteration: 7999/375342 consumed_samples: 8192000 total_loss: 5.254 time: 0.3296 s/iter data_time: 0.1951 s/iter total_throughput: 3107.25 samples/s lr: 3.17e-04 [09/15 08:52:16] lb.utils.events INFO: eta: 15:33:38 iteration: 8099/375342 consumed_samples: 8294400 total_loss: 5.256 time: 0.3296 s/iter data_time: 0.1877 s/iter total_throughput: 3107.21 samples/s lr: 3.21e-04 [09/15 08:52:49] lb.utils.events INFO: eta: 15:33:18 iteration: 8199/375342 consumed_samples: 8396800 total_loss: 5.255 time: 0.3296 s/iter data_time: 0.1820 s/iter total_throughput: 3107.20 samples/s lr: 3.25e-04 [09/15 08:53:22] lb.utils.events INFO: eta: 15:33:43 iteration: 8299/375342 consumed_samples: 8499200 total_loss: 
5.249 time: 0.3295 s/iter data_time: 0.1836 s/iter total_throughput: 3107.59 samples/s lr: 3.29e-04 [09/15 08:53:55] lb.utils.events INFO: eta: 15:33:02 iteration: 8399/375342 consumed_samples: 8601600 total_loss: 5.234 time: 0.3295 s/iter data_time: 0.1884 s/iter total_throughput: 3107.28 samples/s lr: 3.33e-04 [09/15 08:54:29] lb.utils.events INFO: eta: 15:32:52 iteration: 8499/375342 consumed_samples: 8704000 total_loss: 5.214 time: 0.3296 s/iter data_time: 0.1922 s/iter total_throughput: 3106.79 samples/s lr: 3.37e-04 [09/15 08:55:01] lb.utils.events INFO: eta: 15:33:37 iteration: 8599/375342 consumed_samples: 8806400 total_loss: 5.211 time: 0.3296 s/iter data_time: 0.1821 s/iter total_throughput: 3107.02 samples/s lr: 3.41e-04 [09/15 08:55:34] lb.utils.events INFO: eta: 15:33:26 iteration: 8699/375342 consumed_samples: 8908800 total_loss: 5.211 time: 0.3296 s/iter data_time: 0.1793 s/iter total_throughput: 3107.00 samples/s lr: 3.45e-04 [09/15 08:56:07] lb.utils.events INFO: eta: 15:35:00 iteration: 8799/375342 consumed_samples: 9011200 total_loss: 5.214 time: 0.3296 s/iter data_time: 0.1808 s/iter total_throughput: 3107.05 samples/s lr: 3.48e-04 [09/15 08:56:41] lb.utils.events INFO: eta: 15:34:22 iteration: 8899/375342 consumed_samples: 9113600 total_loss: 5.2 time: 0.3296 s/iter data_time: 0.1838 s/iter total_throughput: 3106.69 samples/s lr: 3.52e-04 [09/15 08:57:13] lb.utils.events INFO: eta: 15:28:59 iteration: 8999/375342 consumed_samples: 9216000 total_loss: 5.169 time: 0.3296 s/iter data_time: 0.1799 s/iter total_throughput: 3106.82 samples/s lr: 3.56e-04 [09/15 08:57:46] lb.utils.events INFO: eta: 15:25:46 iteration: 9099/375342 consumed_samples: 9318400 total_loss: 5.149 time: 0.3296 s/iter data_time: 0.1822 s/iter total_throughput: 3107.20 samples/s lr: 3.60e-04 [09/15 08:58:19] lb.utils.events INFO: eta: 15:24:25 iteration: 9199/375342 consumed_samples: 9420800 total_loss: 5.132 time: 0.3296 s/iter data_time: 0.1889 s/iter total_throughput: 
3106.88 samples/s lr: 3.64e-04 [09/15 08:58:53] lb.utils.events INFO: eta: 15:22:51 iteration: 9299/375342 consumed_samples: 9523200 total_loss: 5.125 time: 0.3296 s/iter data_time: 0.1845 s/iter total_throughput: 3106.51 samples/s lr: 3.68e-04 [09/15 08:59:26] lb.utils.events INFO: eta: 15:23:01 iteration: 9399/375342 consumed_samples: 9625600 total_loss: 5.122 time: 0.3296 s/iter data_time: 0.1760 s/iter total_throughput: 3106.56 samples/s lr: 3.72e-04 [09/15 08:59:59] lb.utils.events INFO: eta: 15:28:18 iteration: 9499/375342 consumed_samples: 9728000 total_loss: 5.112 time: 0.3296 s/iter data_time: 0.1778 s/iter total_throughput: 3106.41 samples/s lr: 3.76e-04 [09/15 09:00:32] lb.utils.events INFO: eta: 15:32:19 iteration: 9599/375342 consumed_samples: 9830400 total_loss: 5.101 time: 0.3297 s/iter data_time: 0.1830 s/iter total_throughput: 3106.20 samples/s lr: 3.80e-04 [09/15 09:01:05] lb.utils.events INFO: eta: 15:30:47 iteration: 9699/375342 consumed_samples: 9932800 total_loss: 5.081 time: 0.3297 s/iter data_time: 0.1799 s/iter total_throughput: 3106.20 samples/s lr: 3.84e-04 [09/15 09:01:38] lb.utils.events INFO: eta: 15:29:47 iteration: 9799/375342 consumed_samples: 10035200 total_loss: 5.081 time: 0.3297 s/iter data_time: 0.1856 s/iter total_throughput: 3106.21 samples/s lr: 3.88e-04 [09/15 09:02:11] lb.utils.events INFO: eta: 15:29:38 iteration: 9899/375342 consumed_samples: 10137600 total_loss: 5.073 time: 0.3297 s/iter data_time: 0.1792 s/iter total_throughput: 3105.98 samples/s lr: 3.92e-04 [09/15 09:02:44] lb.utils.checkpoint INFO: Saving checkpoint to ./commit_c43b/model_0009999 [09/15 09:02:45] lb.evaluation.evaluator INFO: with eval_iter 100000.0, reset total samples 50000 to 50000 [09/15 09:02:45] lb.evaluation.evaluator INFO: Start inference on 50000 samples [09/15 09:02:49] lb.evaluation.evaluator INFO: Inference done 11264/50000. Dataloading: 0.1022 s/iter. Inference: 0.1485 s/iter. Eval: 0.0019 s/iter. Total: 0.2526 s/iter. 
ETA=0:00:09
[09/15 09:02:54] lb.evaluation.evaluator INFO: Inference done 29696/50000. Dataloading: 0.1366 s/iter. Inference: 0.1525 s/iter. Eval: 0.0021 s/iter. Total: 0.2913 s/iter. ETA=0:00:05
[09/15 09:02:59] lb.evaluation.evaluator INFO: Inference done 48128/50000. Dataloading: 0.1012 s/iter. Inference: 0.1840 s/iter. Eval: 0.0021 s/iter. Total: 0.2874 s/iter. ETA=0:00:00
[09/15 09:03:00] lb.evaluation.evaluator INFO: Total valid samples: 50000
[09/15 09:03:00] lb.evaluation.evaluator INFO: Total inference time: 0:00:12.519507 (0.000250 s / iter per device, on 8 devices)
[09/15 09:03:00] lb.evaluation.evaluator INFO: Total inference pure compute time: 0:00:08 (0.000161 s / iter per device, on 8 devices)
[09/15 09:03:00] lb.engine.default INFO: Evaluation results for ImageNetDataset in csv format:
[09/15 09:03:00] lb.evaluation.utils INFO: copypaste: Acc@1=26.338
[09/15 09:03:00] lb.evaluation.utils INFO: copypaste: Acc@5=46.186
[09/15 09:03:00] lb.engine.hooks INFO: Saved best model as latest eval score for Acc@1 is 26.33800, better than last best score 13.99200 @ iteration 4999.
[09/15 09:03:00] lb.utils.checkpoint INFO: Saving checkpoint to ./commit_c43b/model_best [09/15 09:03:01] lb.utils.events INFO: eta: 15:31:24 iteration: 9999/375342 consumed_samples: 10240000 total_loss: 5.053 time: 0.3297 s/iter data_time: 0.1826 s/iter total_throughput: 3106.03 samples/s lr: 3.96e-04 [09/15 09:03:32] lb.utils.events INFO: eta: 15:42:16 iteration: 10099/375342 consumed_samples: 10342400 total_loss: 5.015 time: 0.3295 s/iter data_time: 0.1845 s/iter total_throughput: 3107.55 samples/s lr: 4.00e-04 [09/15 09:04:05] lb.utils.events INFO: eta: 16:26:40 iteration: 10199/375342 consumed_samples: 10444800 total_loss: 5 time: 0.3295 s/iter data_time: 0.1748 s/iter total_throughput: 3107.86 samples/s lr: 4.04e-04 [09/15 09:04:38] lb.utils.events INFO: eta: 17:53:49 iteration: 10299/375342 consumed_samples: 10547200 total_loss: 4.987 time: 0.3295 s/iter data_time: 0.1920 s/iter total_throughput: 3107.43 samples/s lr: 4.08e-04 [09/15 09:05:11] lb.utils.events INFO: eta: 17:44:50 iteration: 10399/375342 consumed_samples: 10649600 total_loss: 4.988 time: 0.3295 s/iter data_time: 0.1789 s/iter total_throughput: 3107.58 samples/s lr: 4.12e-04 [09/15 09:05:44] lb.utils.events INFO: eta: 16:03:07 iteration: 10499/375342 consumed_samples: 10752000 total_loss: 5.015 time: 0.3295 s/iter data_time: 0.1849 s/iter total_throughput: 3107.57 samples/s lr: 4.16e-04 [09/15 09:06:17] lb.utils.events INFO: eta: 16:04:54 iteration: 10599/375342 consumed_samples: 10854400 total_loss: 5.008 time: 0.3295 s/iter data_time: 0.1894 s/iter total_throughput: 3107.57 samples/s lr: 4.20e-04 [09/15 09:06:50] lb.utils.events INFO: eta: 16:18:33 iteration: 10699/375342 consumed_samples: 10956800 total_loss: 4.993 time: 0.3295 s/iter data_time: 0.1844 s/iter total_throughput: 3107.48 samples/s lr: 4.24e-04 [09/15 09:07:23] lb.utils.events INFO: eta: 16:35:11 iteration: 10799/375342 consumed_samples: 11059200 total_loss: 4.969 time: 0.3295 s/iter data_time: 0.1901 s/iter total_throughput: 
3107.27 samples/s lr: 4.27e-04 [09/15 09:07:56] lb.utils.events INFO: eta: 16:43:39 iteration: 10899/375342 consumed_samples: 11161600 total_loss: 4.955 time: 0.3296 s/iter data_time: 0.1865 s/iter total_throughput: 3107.25 samples/s lr: 4.31e-04 [09/15 09:08:29] lb.utils.events INFO: eta: 16:34:44 iteration: 10999/375342 consumed_samples: 11264000 total_loss: 4.953 time: 0.3296 s/iter data_time: 0.1770 s/iter total_throughput: 3107.04 samples/s lr: 4.35e-04 [09/15 09:09:02] lb.utils.events INFO: eta: 15:52:19 iteration: 11099/375342 consumed_samples: 11366400 total_loss: 4.948 time: 0.3296 s/iter data_time: 0.1818 s/iter total_throughput: 3106.91 samples/s lr: 4.39e-04 [09/15 09:09:35] lb.utils.events INFO: eta: 15:33:12 iteration: 11199/375342 consumed_samples: 11468800 total_loss: 4.943 time: 0.3296 s/iter data_time: 0.1840 s/iter total_throughput: 3106.82 samples/s lr: 4.43e-04 [09/15 09:10:08] lb.utils.events INFO: eta: 15:35:52 iteration: 11299/375342 consumed_samples: 11571200 total_loss: 4.924 time: 0.3296 s/iter data_time: 0.1919 s/iter total_throughput: 3106.86 samples/s lr: 4.47e-04 [09/15 09:10:41] lb.utils.events INFO: eta: 15:36:26 iteration: 11399/375342 consumed_samples: 11673600 total_loss: 4.921 time: 0.3296 s/iter data_time: 0.1862 s/iter total_throughput: 3107.08 samples/s lr: 4.51e-04 [09/15 09:11:14] lb.utils.events INFO: eta: 15:44:05 iteration: 11499/375342 consumed_samples: 11776000 total_loss: 4.913 time: 0.3296 s/iter data_time: 0.1908 s/iter total_throughput: 3106.77 samples/s lr: 4.55e-04 [09/15 09:11:47] lb.utils.events INFO: eta: 15:51:38 iteration: 11599/375342 consumed_samples: 11878400 total_loss: 4.887 time: 0.3296 s/iter data_time: 0.1929 s/iter total_throughput: 3106.81 samples/s lr: 4.59e-04 [09/15 09:12:21] lb.utils.events INFO: eta: 16:04:52 iteration: 11699/375342 consumed_samples: 11980800 total_loss: 4.886 time: 0.3296 s/iter data_time: 0.1788 s/iter total_throughput: 3106.58 samples/s lr: 4.63e-04 [09/15 09:12:53] 
lb.utils.events INFO: eta: 15:59:42 iteration: 11799/375342 consumed_samples: 12083200 total_loss: 4.897 time: 0.3296 s/iter data_time: 0.1826 s/iter total_throughput: 3106.73 samples/s lr: 4.67e-04 [09/15 09:13:26] lb.utils.events INFO: eta: 15:41:44 iteration: 11899/375342 consumed_samples: 12185600 total_loss: 4.889 time: 0.3296 s/iter data_time: 0.1789 s/iter total_throughput: 3106.73 samples/s lr: 4.71e-04 [09/15 09:14:00] lb.utils.events INFO: eta: 15:34:53 iteration: 11999/375342 consumed_samples: 12288000 total_loss: 4.863 time: 0.3297 s/iter data_time: 0.1789 s/iter total_throughput: 3106.32 samples/s lr: 4.75e-04 [09/15 09:14:33] lb.utils.events INFO: eta: 15:28:23 iteration: 12099/375342 consumed_samples: 12390400 total_loss: 4.841 time: 0.3296 s/iter data_time: 0.1845 s/iter total_throughput: 3106.35 samples/s lr: 4.79e-04 [09/15 09:15:06] lb.utils.events INFO: eta: 15:26:04 iteration: 12199/375342 consumed_samples: 12492800 total_loss: 4.838 time: 0.3296 s/iter data_time: 0.1855 s/iter total_throughput: 3106.46 samples/s lr: 4.83e-04 [09/15 09:15:39] lb.utils.events INFO: eta: 15:22:44 iteration: 12299/375342 consumed_samples: 12595200 total_loss: 4.82 time: 0.3297 s/iter data_time: 0.1809 s/iter total_throughput: 3106.26 samples/s lr: 4.87e-04 [09/15 09:16:12] lb.utils.events INFO: eta: 15:19:01 iteration: 12399/375342 consumed_samples: 12697600 total_loss: 4.797 time: 0.3297 s/iter data_time: 0.1850 s/iter total_throughput: 3106.08 samples/s lr: 4.91e-04 [09/15 09:16:45] lb.utils.events INFO: eta: 15:17:15 iteration: 12499/375342 consumed_samples: 12800000 total_loss: 4.796 time: 0.3297 s/iter data_time: 0.1804 s/iter total_throughput: 3105.85 samples/s lr: 4.95e-04 [09/15 09:17:18] lb.utils.events INFO: eta: 15:15:28 iteration: 12599/375342 consumed_samples: 12902400 total_loss: 4.809 time: 0.3297 s/iter data_time: 0.1844 s/iter total_throughput: 3105.76 samples/s lr: 4.99e-04 [09/15 09:17:52] lb.utils.events INFO: eta: 15:13:51 iteration: 
12699/375342 consumed_samples: 13004800 total_loss: 4.782 time: 0.3297 s/iter data_time: 0.1883 s/iter total_throughput: 3105.62 samples/s lr: 5.02e-04 [09/15 09:18:25] lb.utils.events INFO: eta: 15:12:31 iteration: 12799/375342 consumed_samples: 13107200 total_loss: 4.784 time: 0.3297 s/iter data_time: 0.1764 s/iter total_throughput: 3105.63 samples/s lr: 5.06e-04 [09/15 09:18:58] lb.utils.events INFO: eta: 15:11:49 iteration: 12899/375342 consumed_samples: 13209600 total_loss: 4.779 time: 0.3298 s/iter data_time: 0.1935 s/iter total_throughput: 3105.28 samples/s lr: 5.10e-04 [09/15 09:19:31] lb.utils.events INFO: eta: 15:12:01 iteration: 12999/375342 consumed_samples: 13312000 total_loss: 4.767 time: 0.3297 s/iter data_time: 0.1751 s/iter total_throughput: 3105.55 samples/s lr: 5.14e-04 [09/15 09:20:03] lb.utils.events INFO: eta: 15:11:20 iteration: 13099/375342 consumed_samples: 13414400 total_loss: 4.777 time: 0.3297 s/iter data_time: 0.1857 s/iter total_throughput: 3105.66 samples/s lr: 5.18e-04 [09/15 09:20:37] lb.utils.events INFO: eta: 15:10:41 iteration: 13199/375342 consumed_samples: 13516800 total_loss: 4.779 time: 0.3297 s/iter data_time: 0.1854 s/iter total_throughput: 3105.51 samples/s lr: 5.22e-04 [09/15 09:21:10] lb.utils.events INFO: eta: 15:09:33 iteration: 13299/375342 consumed_samples: 13619200 total_loss: 4.764 time: 0.3298 s/iter data_time: 0.1771 s/iter total_throughput: 3105.28 samples/s lr: 5.26e-04 [09/15 09:21:43] lb.utils.events INFO: eta: 15:08:06 iteration: 13399/375342 consumed_samples: 13721600 total_loss: 4.74 time: 0.3298 s/iter data_time: 0.1870 s/iter total_throughput: 3104.94 samples/s lr: 5.30e-04 [09/15 09:22:17] lb.utils.events INFO: eta: 15:08:40 iteration: 13499/375342 consumed_samples: 13824000 total_loss: 4.738 time: 0.3298 s/iter data_time: 0.1915 s/iter total_throughput: 3104.80 samples/s lr: 5.34e-04 [09/15 09:22:49] lb.utils.events INFO: eta: 15:08:03 iteration: 13599/375342 consumed_samples: 13926400 total_loss: 
4.735 time: 0.3298 s/iter data_time: 0.1876 s/iter total_throughput: 3105.03 samples/s lr: 5.38e-04 [09/15 09:23:22] lb.utils.events INFO: eta: 15:07:46 iteration: 13699/375342 consumed_samples: 14028800 total_loss: 4.722 time: 0.3298 s/iter data_time: 0.1792 s/iter total_throughput: 3105.07 samples/s lr: 5.42e-04 [09/15 09:23:55] lb.utils.events INFO: eta: 15:07:29 iteration: 13799/375342 consumed_samples: 14131200 total_loss: 4.712 time: 0.3298 s/iter data_time: 0.1802 s/iter total_throughput: 3105.12 samples/s lr: 5.46e-04 [09/15 09:24:28] lb.utils.events INFO: eta: 15:07:33 iteration: 13899/375342 consumed_samples: 14233600 total_loss: 4.706 time: 0.3298 s/iter data_time: 0.1807 s/iter total_throughput: 3105.25 samples/s lr: 5.50e-04 [09/15 09:25:01] lb.utils.events INFO: eta: 15:08:24 iteration: 13999/375342 consumed_samples: 14336000 total_loss: 4.697 time: 0.3298 s/iter data_time: 0.1865 s/iter total_throughput: 3105.04 samples/s lr: 5.54e-04 [09/15 09:25:35] lb.utils.events INFO: eta: 15:09:02 iteration: 14099/375342 consumed_samples: 14438400 total_loss: 4.692 time: 0.3299 s/iter data_time: 0.1863 s/iter total_throughput: 3104.39 samples/s lr: 5.58e-04 [09/15 09:26:09] lb.utils.events INFO: eta: 15:11:00 iteration: 14199/375342 consumed_samples: 14540800 total_loss: 4.684 time: 0.3299 s/iter data_time: 0.1968 s/iter total_throughput: 3103.68 samples/s lr: 5.62e-04 [09/15 09:26:43] lb.utils.events INFO: eta: 15:14:33 iteration: 14299/375342 consumed_samples: 14643200 total_loss: 4.668 time: 0.3300 s/iter data_time: 0.1979 s/iter total_throughput: 3103.07 samples/s lr: 5.66e-04 [09/15 09:27:17] lb.utils.events INFO: eta: 15:18:11 iteration: 14399/375342 consumed_samples: 14745600 total_loss: 4.673 time: 0.3301 s/iter data_time: 0.1998 s/iter total_throughput: 3102.52 samples/s lr: 5.70e-04 [09/15 09:27:51] lb.utils.events INFO: eta: 15:18:39 iteration: 14499/375342 consumed_samples: 14848000 total_loss: 4.672 time: 0.3301 s/iter data_time: 0.1896 s/iter 
total_throughput: 3101.83 samples/s lr: 5.74e-04 [09/15 09:28:25] lb.utils.events INFO: eta: 15:19:54 iteration: 14599/375342 consumed_samples: 14950400 total_loss: 4.656 time: 0.3302 s/iter data_time: 0.1953 s/iter total_throughput: 3100.98 samples/s lr: 5.78e-04 [09/15 09:29:00] lb.utils.events INFO: eta: 15:19:57 iteration: 14699/375342 consumed_samples: 15052800 total_loss: 4.653 time: 0.3303 s/iter data_time: 0.1912 s/iter total_throughput: 3100.01 samples/s lr: 5.81e-04 [09/15 09:29:34] lb.utils.events INFO: eta: 15:24:10 iteration: 14799/375342 consumed_samples: 15155200 total_loss: 4.649 time: 0.3304 s/iter data_time: 0.1891 s/iter total_throughput: 3099.61 samples/s lr: 5.85e-04 [09/15 09:30:08] lb.utils.events INFO: eta: 15:21:48 iteration: 14899/375342 consumed_samples: 15257600 total_loss: 4.62 time: 0.3304 s/iter data_time: 0.1905 s/iter total_throughput: 3099.03 samples/s lr: 5.89e-04 [09/15 09:30:41] lb.utils.checkpoint INFO: Saving checkpoint to ./commit_c43b/model_0014999 [09/15 09:30:42] lb.evaluation.evaluator INFO: with eval_iter 100000.0, reset total samples 50000 to 50000 [09/15 09:30:42] lb.evaluation.evaluator INFO: Start inference on 50000 samples [09/15 09:30:46] lb.evaluation.evaluator INFO: Inference done 11264/50000. Dataloading: 0.1051 s/iter. Inference: 0.1451 s/iter. Eval: 0.0021 s/iter. Total: 0.2522 s/iter. ETA=0:00:09 [09/15 09:30:51] lb.evaluation.evaluator INFO: Inference done 29696/50000. Dataloading: 0.1382 s/iter. Inference: 0.1511 s/iter. Eval: 0.0020 s/iter. Total: 0.2914 s/iter. ETA=0:00:05 [09/15 09:30:57] lb.evaluation.evaluator INFO: Inference done 50000/50000. Dataloading: 0.1286 s/iter. Inference: 0.1491 s/iter. Eval: 0.0021 s/iter. Total: 0.2798 s/iter. 
ETA=0:00:00
[09/15 09:30:57] lb.evaluation.evaluator INFO: Total valid samples: 50000
[09/15 09:30:57] lb.evaluation.evaluator INFO: Total inference time: 0:00:12.412977 (0.000248 s / iter per device, on 8 devices)
[09/15 09:30:57] lb.evaluation.evaluator INFO: Total inference pure compute time: 0:00:06 (0.000131 s / iter per device, on 8 devices)
[09/15 09:30:57] lb.engine.default INFO: Evaluation results for ImageNetDataset in csv format:
[09/15 09:30:57] lb.evaluation.utils INFO: copypaste: Acc@1=33.958
[09/15 09:30:57] lb.evaluation.utils INFO: copypaste: Acc@5=55.44
[09/15 09:30:57] lb.engine.hooks INFO: Saved best model as latest eval score for Acc@1 is 33.95800, better than last best score 26.33800 @ iteration 9999.
[09/15 09:30:57] lb.utils.checkpoint INFO: Saving checkpoint to ./commit_c43b/model_best
[09/15 09:30:58] lb.utils.events INFO: eta: 15:17:31 iteration: 14999/375342 consumed_samples: 15360000 total_loss: 4.614 time: 0.3305 s/iter data_time: 0.1925 s/iter total_throughput: 3098.71 samples/s lr: 5.93e-04
[09/15 09:31:29] lb.utils.events INFO: eta: 15:16:20 iteration: 15099/375342 consumed_samples: 15462400 total_loss: 4.601 time: 0.3304 s/iter data_time: 0.1926 s/iter total_throughput: 3099.53 samples/s lr: 5.97e-04
[09/15 09:32:03] lb.utils.events INFO: eta: 15:12:21 iteration: 15199/375342 consumed_samples: 15564800 total_loss: 4.595 time: 0.3304 s/iter data_time: 0.1929 s/iter total_throughput: 3099.14 samples/s lr: 6.01e-04
[09/15 09:32:37] lb.utils.events INFO: eta: 15:09:23 iteration: 15299/375342 consumed_samples: 15667200 total_loss: 4.59 time: 0.3305 s/iter data_time: 0.1935 s/iter total_throughput: 3098.42 samples/s lr: 6.05e-04
[09/15 09:33:11] lb.utils.events INFO: eta: 15:07:14 iteration: 15399/375342 consumed_samples: 15769600 total_loss: 4.594 time: 0.3306 s/iter data_time: 0.1882 s/iter total_throughput: 3097.82 samples/s lr: 6.09e-04
[09/15 09:33:45] lb.utils.events INFO: eta: 15:06:02 iteration: 15499/375342 consumed_samples:
15872000 total_loss: 4.603 time: 0.3306 s/iter data_time: 0.1970 s/iter total_throughput: 3097.23 samples/s lr: 6.13e-04 [09/15 09:34:19] lb.utils.events INFO: eta: 15:04:52 iteration: 15599/375342 consumed_samples: 15974400 total_loss: 4.593 time: 0.3307 s/iter data_time: 0.1890 s/iter total_throughput: 3096.76 samples/s lr: 6.17e-04 [09/15 09:34:53] lb.utils.events INFO: eta: 15:05:07 iteration: 15699/375342 consumed_samples: 16076800 total_loss: 4.568 time: 0.3307 s/iter data_time: 0.1956 s/iter total_throughput: 3096.22 samples/s lr: 6.21e-04 [09/15 09:35:27] lb.utils.events INFO: eta: 15:04:34 iteration: 15799/375342 consumed_samples: 16179200 total_loss: 4.563 time: 0.3308 s/iter data_time: 0.1957 s/iter total_throughput: 3095.84 samples/s lr: 6.25e-04 [09/15 09:36:01] lb.utils.events INFO: eta: 15:03:18 iteration: 15899/375342 consumed_samples: 16281600 total_loss: 4.566 time: 0.3308 s/iter data_time: 0.1913 s/iter total_throughput: 3095.53 samples/s lr: 6.29e-04 [09/15 09:36:34] lb.utils.events INFO: eta: 15:04:05 iteration: 15999/375342 consumed_samples: 16384000 total_loss: 4.561 time: 0.3308 s/iter data_time: 0.1872 s/iter total_throughput: 3095.30 samples/s lr: 6.33e-04 [09/15 09:37:08] lb.utils.events INFO: eta: 15:03:41 iteration: 16099/375342 consumed_samples: 16486400 total_loss: 4.541 time: 0.3309 s/iter data_time: 0.1983 s/iter total_throughput: 3094.85 samples/s lr: 6.37e-04 [09/15 09:37:42] lb.utils.events INFO: eta: 15:04:55 iteration: 16199/375342 consumed_samples: 16588800 total_loss: 4.537 time: 0.3309 s/iter data_time: 0.1836 s/iter total_throughput: 3094.50 samples/s lr: 6.41e-04 [09/15 09:38:15] lb.utils.events INFO: eta: 15:06:33 iteration: 16299/375342 consumed_samples: 16691200 total_loss: 4.536 time: 0.3310 s/iter data_time: 0.1941 s/iter total_throughput: 3094.07 samples/s lr: 6.45e-04 [09/15 09:38:49] lb.utils.events INFO: eta: 15:07:13 iteration: 16399/375342 consumed_samples: 16793600 total_loss: 4.519 time: 0.3310 s/iter 
data_time: 0.1881 s/iter total_throughput: 3093.68 samples/s lr: 6.49e-04 [09/15 09:39:23] lb.utils.events INFO: eta: 15:07:31 iteration: 16499/375342 consumed_samples: 16896000 total_loss: 4.508 time: 0.3310 s/iter data_time: 0.1857 s/iter total_throughput: 3093.56 samples/s lr: 6.53e-04 [09/15 09:39:56] lb.utils.events INFO: eta: 15:08:41 iteration: 16599/375342 consumed_samples: 16998400 total_loss: 4.502 time: 0.3310 s/iter data_time: 0.1904 s/iter total_throughput: 3093.35 samples/s lr: 6.57e-04 [09/15 09:40:30] lb.utils.events INFO: eta: 15:08:53 iteration: 16699/375342 consumed_samples: 17100800 total_loss: 4.493 time: 0.3311 s/iter data_time: 0.1901 s/iter total_throughput: 3092.93 samples/s lr: 6.60e-04 [09/15 09:41:04] lb.utils.events INFO: eta: 15:11:20 iteration: 16799/375342 consumed_samples: 17203200 total_loss: 4.489 time: 0.3311 s/iter data_time: 0.1938 s/iter total_throughput: 3092.64 samples/s lr: 6.64e-04 [09/15 09:41:37] lb.utils.events INFO: eta: 15:14:20 iteration: 16899/375342 consumed_samples: 17305600 total_loss: 4.493 time: 0.3311 s/iter data_time: 0.1888 s/iter total_throughput: 3092.30 samples/s lr: 6.68e-04 [09/15 09:42:11] lb.utils.events INFO: eta: 15:18:55 iteration: 16999/375342 consumed_samples: 17408000 total_loss: 4.502 time: 0.3312 s/iter data_time: 0.1966 s/iter total_throughput: 3091.96 samples/s lr: 6.72e-04 [09/15 09:42:45] lb.utils.events INFO: eta: 15:25:04 iteration: 17099/375342 consumed_samples: 17510400 total_loss: 4.505 time: 0.3312 s/iter data_time: 0.1943 s/iter total_throughput: 3091.60 samples/s lr: 6.76e-04 [09/15 09:43:18] lb.utils.events INFO: eta: 15:30:39 iteration: 17199/375342 consumed_samples: 17612800 total_loss: 4.49 time: 0.3312 s/iter data_time: 0.1932 s/iter total_throughput: 3091.41 samples/s lr: 6.80e-04 [09/15 09:43:52] lb.utils.events INFO: eta: 16:27:39 iteration: 17299/375342 consumed_samples: 17715200 total_loss: 4.496 time: 0.3313 s/iter data_time: 0.1867 s/iter total_throughput: 3091.01 
samples/s lr: 6.84e-04 [09/15 09:44:26] lb.utils.events INFO: eta: 20:55:01 iteration: 17399/375342 consumed_samples: 17817600 total_loss: 4.505 time: 0.3313 s/iter data_time: 0.1848 s/iter total_throughput: 3090.83 samples/s lr: 6.88e-04 [09/15 09:44:59] lb.utils.events INFO: eta: 1 day, 1:28:23 iteration: 17499/375342 consumed_samples: 17920000 total_loss: 4.476 time: 0.3313 s/iter data_time: 0.1923 s/iter total_throughput: 3090.52 samples/s lr: 6.92e-04 [09/15 09:45:33] lb.utils.events INFO: eta: 1 day, 3:18:27 iteration: 17599/375342 consumed_samples: 18022400 total_loss: 4.474 time: 0.3314 s/iter data_time: 0.1865 s/iter total_throughput: 3090.16 samples/s lr: 6.96e-04 [09/15 09:46:07] lb.utils.events INFO: eta: 1 day, 1:59:27 iteration: 17699/375342 consumed_samples: 18124800 total_loss: 4.468 time: 0.3314 s/iter data_time: 0.1922 s/iter total_throughput: 3089.88 samples/s lr: 7.00e-04 [09/15 09:46:40] lb.utils.events INFO: eta: 1 day, 1:59:00 iteration: 17799/375342 consumed_samples: 18227200 total_loss: 4.449 time: 0.3314 s/iter data_time: 0.1894 s/iter total_throughput: 3089.63 samples/s lr: 7.04e-04 [09/15 09:47:14] lb.utils.events INFO: eta: 23:49:35 iteration: 17899/375342 consumed_samples: 18329600 total_loss: 4.453 time: 0.3315 s/iter data_time: 0.1905 s/iter total_throughput: 3089.39 samples/s lr: 7.08e-04 [09/15 09:47:48] lb.utils.events INFO: eta: 23:37:31 iteration: 17999/375342 consumed_samples: 18432000 total_loss: 4.458 time: 0.3315 s/iter data_time: 0.1982 s/iter total_throughput: 3089.09 samples/s lr: 7.12e-04 [09/15 09:48:21] lb.utils.events INFO: eta: 1 day, 0:01:10 iteration: 18099/375342 consumed_samples: 18534400 total_loss: 4.439 time: 0.3315 s/iter data_time: 0.1873 s/iter total_throughput: 3089.03 samples/s lr: 7.16e-04 [09/15 09:48:55] lb.utils.events INFO: eta: 1 day, 0:27:11 iteration: 18199/375342 consumed_samples: 18636800 total_loss: 4.438 time: 0.3315 s/iter data_time: 0.1963 s/iter total_throughput: 3088.70 samples/s lr: 
7.20e-04 [09/15 09:49:29] lb.utils.events INFO: eta: 23:15:01 iteration: 18299/375342 consumed_samples: 18739200 total_loss: 4.452 time: 0.3316 s/iter data_time: 0.2035 s/iter total_throughput: 3088.32 samples/s lr: 7.24e-04 [09/15 09:50:02] lb.utils.events INFO: eta: 20:00:43 iteration: 18399/375342 consumed_samples: 18841600 total_loss: 4.444 time: 0.3316 s/iter data_time: 0.1830 s/iter total_throughput: 3088.07 samples/s lr: 7.28e-04 [09/15 09:50:36] lb.utils.events INFO: eta: 16:01:18 iteration: 18499/375342 consumed_samples: 18944000 total_loss: 4.439 time: 0.3316 s/iter data_time: 0.1978 s/iter total_throughput: 3087.78 samples/s lr: 7.32e-04 [09/15 09:51:10] lb.utils.events INFO: eta: 15:18:50 iteration: 18599/375342 consumed_samples: 19046400 total_loss: 4.439 time: 0.3316 s/iter data_time: 0.1825 s/iter total_throughput: 3087.65 samples/s lr: 7.35e-04 [09/15 09:51:43] lb.utils.events INFO: eta: 15:15:23 iteration: 18699/375342 consumed_samples: 19148800 total_loss: 4.412 time: 0.3317 s/iter data_time: 0.1865 s/iter total_throughput: 3087.47 samples/s lr: 7.39e-04 [09/15 09:52:17] lb.utils.events INFO: eta: 15:14:51 iteration: 18799/375342 consumed_samples: 19251200 total_loss: 4.412 time: 0.3317 s/iter data_time: 0.1864 s/iter total_throughput: 3087.28 samples/s lr: 7.43e-04 [09/15 09:52:50] lb.utils.events INFO: eta: 15:15:31 iteration: 18899/375342 consumed_samples: 19353600 total_loss: 4.409 time: 0.3317 s/iter data_time: 0.1846 s/iter total_throughput: 3087.13 samples/s lr: 7.47e-04 [09/15 09:53:24] lb.utils.events INFO: eta: 15:15:15 iteration: 18999/375342 consumed_samples: 19456000 total_loss: 4.398 time: 0.3317 s/iter data_time: 0.1879 s/iter total_throughput: 3086.87 samples/s lr: 7.51e-04 [09/15 09:53:58] lb.utils.events INFO: eta: 15:11:57 iteration: 19099/375342 consumed_samples: 19558400 total_loss: 4.379 time: 0.3318 s/iter data_time: 0.1927 s/iter total_throughput: 3086.53 samples/s lr: 7.55e-04 [09/15 09:54:32] lb.utils.events INFO: eta: 
15:03:56 iteration: 19199/375342 consumed_samples: 19660800 total_loss: 4.39 time: 0.3318 s/iter data_time: 0.1973 s/iter total_throughput: 3086.13 samples/s lr: 7.59e-04 [09/15 09:55:06] lb.utils.events INFO: eta: 15:02:04 iteration: 19299/375342 consumed_samples: 19763200 total_loss: 4.394 time: 0.3318 s/iter data_time: 0.1900 s/iter total_throughput: 3085.86 samples/s lr: 7.63e-04 [09/15 09:55:39] lb.utils.events INFO: eta: 15:01:28 iteration: 19399/375342 consumed_samples: 19865600 total_loss: 4.385 time: 0.3319 s/iter data_time: 0.1798 s/iter total_throughput: 3085.63 samples/s lr: 7.67e-04 [09/15 09:56:13] lb.utils.events INFO: eta: 15:02:35 iteration: 19499/375342 consumed_samples: 19968000 total_loss: 4.358 time: 0.3319 s/iter data_time: 0.1868 s/iter total_throughput: 3085.46 samples/s lr: 7.71e-04 [09/15 09:56:47] lb.utils.events INFO: eta: 15:02:20 iteration: 19599/375342 consumed_samples: 20070400 total_loss: 4.369 time: 0.3319 s/iter data_time: 0.1912 s/iter total_throughput: 3085.15 samples/s lr: 7.75e-04 [09/15 09:57:20] lb.utils.events INFO: eta: 15:03:01 iteration: 19699/375342 consumed_samples: 20172800 total_loss: 4.372 time: 0.3319 s/iter data_time: 0.1958 s/iter total_throughput: 3085.02 samples/s lr: 7.79e-04 [09/15 09:57:54] lb.utils.events INFO: eta: 15:00:43 iteration: 19799/375342 consumed_samples: 20275200 total_loss: 4.377 time: 0.3320 s/iter data_time: 0.1931 s/iter total_throughput: 3084.68 samples/s lr: 7.83e-04 [09/15 09:58:28] lb.utils.events INFO: eta: 14:59:22 iteration: 19899/375342 consumed_samples: 20377600 total_loss: 4.385 time: 0.3320 s/iter data_time: 0.1880 s/iter total_throughput: 3084.53 samples/s lr: 7.87e-04 [09/15 09:59:01] lb.utils.checkpoint INFO: Saving checkpoint to ./commit_c43b/model_0019999 [09/15 09:59:02] lb.evaluation.evaluator INFO: with eval_iter 100000.0, reset total samples 50000 to 50000 [09/15 09:59:02] lb.evaluation.evaluator INFO: Start inference on 50000 samples [09/15 09:59:06] 
lb.evaluation.evaluator INFO: Inference done 11264/50000. Dataloading: 0.1010 s/iter. Inference: 0.1498 s/iter. Eval: 0.0022 s/iter. Total: 0.2530 s/iter. ETA=0:00:09
[09/15 09:59:11] lb.evaluation.evaluator INFO: Inference done 29696/50000. Dataloading: 0.1381 s/iter. Inference: 0.1507 s/iter. Eval: 0.0021 s/iter. Total: 0.2909 s/iter. ETA=0:00:05
[09/15 09:59:16] lb.evaluation.evaluator INFO: Inference done 49152/50000. Dataloading: 0.1212 s/iter. Inference: 0.1556 s/iter. Eval: 0.0020 s/iter. Total: 0.2789 s/iter. ETA=0:00:00
[09/15 09:59:17] lb.evaluation.evaluator INFO: Total valid samples: 50000
[09/15 09:59:17] lb.evaluation.evaluator INFO: Total inference time: 0:00:12.548497 (0.000251 s / iter per device, on 8 devices)
[09/15 09:59:17] lb.evaluation.evaluator INFO: Total inference pure compute time: 0:00:07 (0.000140 s / iter per device, on 8 devices)
[09/15 09:59:17] lb.engine.default INFO: Evaluation results for ImageNetDataset in csv format:
[09/15 09:59:17] lb.evaluation.utils INFO: copypaste: Acc@1=37.206
[09/15 09:59:17] lb.evaluation.utils INFO: copypaste: Acc@5=58.982
[09/15 09:59:17] lb.engine.hooks INFO: Saved best model as latest eval score for Acc@1 is 37.20600, better than last best score 33.95800 @ iteration 14999.
[09/15 09:59:17] lb.utils.checkpoint INFO: Saving checkpoint to ./commit_c43b/model_best [09/15 09:59:18] lb.utils.events INFO: eta: 14:56:44 iteration: 19999/375342 consumed_samples: 20480000 total_loss: 4.371 time: 0.3320 s/iter data_time: 0.1902 s/iter total_throughput: 3084.40 samples/s lr: 7.91e-04 [09/15 09:59:50] lb.utils.events INFO: eta: 14:56:12 iteration: 20099/375342 consumed_samples: 20582400 total_loss: 4.365 time: 0.3319 s/iter data_time: 0.1859 s/iter total_throughput: 3084.98 samples/s lr: 7.95e-04 [09/15 10:00:24] lb.utils.events INFO: eta: 14:56:18 iteration: 20199/375342 consumed_samples: 20684800 total_loss: 4.349 time: 0.3320 s/iter data_time: 0.2092 s/iter total_throughput: 3084.54 samples/s lr: 7.99e-04 [09/15 10:00:58] lb.utils.events INFO: eta: 14:57:48 iteration: 20299/375342 consumed_samples: 20787200 total_loss: 4.349 time: 0.3320 s/iter data_time: 0.1949 s/iter total_throughput: 3084.18 samples/s lr: 8.03e-04 [09/15 10:01:32] lb.utils.events INFO: eta: 14:57:56 iteration: 20399/375342 consumed_samples: 20889600 total_loss: 4.334 time: 0.3321 s/iter data_time: 0.1973 s/iter total_throughput: 3083.86 samples/s lr: 8.07e-04 [09/15 10:02:05] lb.utils.events INFO: eta: 14:58:11 iteration: 20499/375342 consumed_samples: 20992000 total_loss: 4.334 time: 0.3321 s/iter data_time: 0.1872 s/iter total_throughput: 3083.77 samples/s lr: 8.11e-04 [09/15 10:02:39] lb.utils.events INFO: eta: 14:59:03 iteration: 20599/375342 consumed_samples: 21094400 total_loss: 4.335 time: 0.3321 s/iter data_time: 0.1902 s/iter total_throughput: 3083.51 samples/s lr: 8.14e-04 [09/15 10:03:13] lb.utils.events INFO: eta: 14:59:50 iteration: 20699/375342 consumed_samples: 21196800 total_loss: 4.308 time: 0.3321 s/iter data_time: 0.1872 s/iter total_throughput: 3083.28 samples/s lr: 8.18e-04 [09/15 10:03:46] lb.utils.events INFO: eta: 15:03:22 iteration: 20799/375342 consumed_samples: 21299200 total_loss: 4.325 time: 0.3321 s/iter data_time: 0.1922 s/iter 
total_throughput: 3083.07 samples/s lr: 8.22e-04
[09/15 10:04:20] lb.utils.events INFO: eta: 15:04:26 iteration: 20899/375342 consumed_samples: 21401600 total_loss: 4.322 time: 0.3322 s/iter data_time: 0.1995 s/iter total_throughput: 3082.94 samples/s lr: 8.26e-04
[09/15 10:04:53] lb.utils.events INFO: eta: 15:05:33 iteration: 20999/375342 consumed_samples: 21504000 total_loss: 4.307 time: 0.3322 s/iter data_time: 0.1875 s/iter total_throughput: 3082.75 samples/s lr: 8.30e-04
[09/15 10:05:27] lb.utils.events INFO: eta: 15:05:51 iteration: 21099/375342 consumed_samples: 21606400 total_loss: 4.315 time: 0.3322 s/iter data_time: 0.1813 s/iter total_throughput: 3082.63 samples/s lr: 8.34e-04
[09/15 10:06:01] lb.utils.events INFO: eta: 15:05:27 iteration: 21199/375342 consumed_samples: 21708800 total_loss: 4.317 time: 0.3322 s/iter data_time: 0.1976 s/iter total_throughput: 3082.25 samples/s lr: 8.38e-04
[09/15 10:06:34] lb.utils.events INFO: eta: 15:00:34 iteration: 21299/375342 consumed_samples: 21811200 total_loss: 4.289 time: 0.3322 s/iter data_time: 0.1955 s/iter total_throughput: 3082.20 samples/s lr: 8.42e-04
[09/15 10:07:08] lb.utils.events INFO: eta: 14:59:26 iteration: 21399/375342 consumed_samples: 21913600 total_loss: 4.287 time: 0.3323 s/iter data_time: 0.1887 s/iter total_throughput: 3082.00 samples/s lr: 8.46e-04
[09/15 10:07:41] lb.utils.events INFO: eta: 14:59:03 iteration: 21499/375342 consumed_samples: 22016000 total_loss: 4.275 time: 0.3323 s/iter data_time: 0.1940 s/iter total_throughput: 3081.96 samples/s lr: 8.50e-04
[09/15 10:08:14] lb.utils.events INFO: eta: 15:00:04 iteration: 21599/375342 consumed_samples: 22118400 total_loss: 4.255 time: 0.3322 s/iter data_time: 0.1833 s/iter total_throughput: 3082.03 samples/s lr: 8.54e-04
[09/15 10:08:48] lb.utils.events INFO: eta: 15:00:09 iteration: 21699/375342 consumed_samples: 22220800 total_loss: 4.254 time: 0.3323 s/iter data_time: 0.1933 s/iter total_throughput: 3081.79 samples/s lr: 8.58e-04
[09/15 10:09:22] lb.utils.events INFO: eta: 14:59:44 iteration: 21799/375342 consumed_samples: 22323200 total_loss: 4.267 time: 0.3323 s/iter data_time: 0.1922 s/iter total_throughput: 3081.66 samples/s lr: 8.62e-04
[09/15 10:09:55] lb.utils.events INFO: eta: 14:59:04 iteration: 21899/375342 consumed_samples: 22425600 total_loss: 4.286 time: 0.3323 s/iter data_time: 0.1874 s/iter total_throughput: 3081.54 samples/s lr: 8.66e-04
[09/15 10:10:29] lb.utils.events INFO: eta: 14:56:49 iteration: 21999/375342 consumed_samples: 22528000 total_loss: 4.291 time: 0.3323 s/iter data_time: 0.1929 s/iter total_throughput: 3081.28 samples/s lr: 8.70e-04
[09/15 10:11:03] lb.utils.events INFO: eta: 14:54:31 iteration: 22099/375342 consumed_samples: 22630400 total_loss: 4.27 time: 0.3323 s/iter data_time: 0.1912 s/iter total_throughput: 3081.12 samples/s lr: 8.74e-04
[09/15 10:11:36] lb.utils.events INFO: eta: 14:54:09 iteration: 22199/375342 consumed_samples: 22732800 total_loss: 4.25 time: 0.3324 s/iter data_time: 0.1876 s/iter total_throughput: 3080.98 samples/s lr: 8.78e-04
[09/15 10:12:10] lb.utils.events INFO: eta: 14:53:04 iteration: 22299/375342 consumed_samples: 22835200 total_loss: 4.253 time: 0.3324 s/iter data_time: 0.1828 s/iter total_throughput: 3080.85 samples/s lr: 8.82e-04
[09/15 10:12:43] lb.utils.events INFO: eta: 14:52:04 iteration: 22399/375342 consumed_samples: 22937600 total_loss: 4.266 time: 0.3324 s/iter data_time: 0.1965 s/iter total_throughput: 3080.75 samples/s lr: 8.86e-04
[09/15 10:13:17] lb.utils.events INFO: eta: 14:50:40 iteration: 22499/375342 consumed_samples: 23040000 total_loss: 4.277 time: 0.3324 s/iter data_time: 0.1971 s/iter total_throughput: 3080.61 samples/s lr: 8.90e-04
[09/15 10:13:50] lb.utils.events INFO: eta: 14:48:52 iteration: 22599/375342 consumed_samples: 23142400 total_loss: 4.27 time: 0.3324 s/iter data_time: 0.1818 s/iter total_throughput: 3080.61 samples/s lr: 8.93e-04
[09/15 10:14:24] lb.utils.events INFO: eta: 14:46:53 iteration: 22699/375342 consumed_samples: 23244800 total_loss: 4.233 time: 0.3324 s/iter data_time: 0.1922 s/iter total_throughput: 3080.55 samples/s lr: 8.97e-04
[09/15 10:14:57] lb.utils.events INFO: eta: 14:46:37 iteration: 22799/375342 consumed_samples: 23347200 total_loss: 4.233 time: 0.3324 s/iter data_time: 0.1834 s/iter total_throughput: 3080.58 samples/s lr: 9.01e-04
[09/15 10:15:30] lb.utils.events INFO: eta: 14:47:53 iteration: 22899/375342 consumed_samples: 23449600 total_loss: 4.236 time: 0.3324 s/iter data_time: 0.1865 s/iter total_throughput: 3080.48 samples/s lr: 9.05e-04
[09/15 10:16:04] lb.utils.events INFO: eta: 14:49:07 iteration: 22999/375342 consumed_samples: 23552000 total_loss: 4.234 time: 0.3324 s/iter data_time: 0.1907 s/iter total_throughput: 3080.33 samples/s lr: 9.09e-04
[09/15 10:16:38] lb.utils.events INFO: eta: 14:50:39 iteration: 23099/375342 consumed_samples: 23654400 total_loss: 4.241 time: 0.3325 s/iter data_time: 0.1929 s/iter total_throughput: 3080.01 samples/s lr: 9.13e-04
[09/15 10:17:12] lb.utils.events INFO: eta: 14:54:10 iteration: 23199/375342 consumed_samples: 23756800 total_loss: 4.263 time: 0.3325 s/iter data_time: 0.1889 s/iter total_throughput: 3079.85 samples/s lr: 9.17e-04
[09/15 10:17:45] lb.utils.events INFO: eta: 14:57:52 iteration: 23299/375342 consumed_samples: 23859200 total_loss: 4.278 time: 0.3325 s/iter data_time: 0.1904 s/iter total_throughput: 3079.74 samples/s lr: 9.21e-04
[09/15 10:18:19] lb.utils.events INFO: eta: 15:01:12 iteration: 23399/375342 consumed_samples: 23961600 total_loss: 4.245 time: 0.3325 s/iter data_time: 0.1885 s/iter total_throughput: 3079.60 samples/s lr: 9.25e-04
[09/15 10:18:53] lb.utils.events INFO: eta: 15:07:52 iteration: 23499/375342 consumed_samples: 24064000 total_loss: 4.232 time: 0.3325 s/iter data_time: 0.1945 s/iter total_throughput: 3079.40 samples/s lr: 9.29e-04
[09/15 10:19:26] lb.utils.events INFO: eta: 15:13:42 iteration: 23599/375342 consumed_samples: 24166400 total_loss: 4.209 time: 0.3325 s/iter data_time: 0.1944 s/iter total_throughput: 3079.24 samples/s lr: 9.33e-04
[09/15 10:20:00] lb.utils.events INFO: eta: 15:12:44 iteration: 23699/375342 consumed_samples: 24268800 total_loss: 4.218 time: 0.3326 s/iter data_time: 0.1906 s/iter total_throughput: 3079.06 samples/s lr: 9.37e-04
[09/15 10:20:33] lb.utils.events INFO: eta: 15:15:03 iteration: 23799/375342 consumed_samples: 24371200 total_loss: 4.226 time: 0.3326 s/iter data_time: 0.1854 s/iter total_throughput: 3078.97 samples/s lr: 9.41e-04
[09/15 10:21:07] lb.utils.events INFO: eta: 15:10:39 iteration: 23899/375342 consumed_samples: 24473600 total_loss: 4.213 time: 0.3326 s/iter data_time: 0.1899 s/iter total_throughput: 3078.91 samples/s lr: 9.45e-04
[09/15 10:21:40] lb.utils.events INFO: eta: 15:21:59 iteration: 23999/375342 consumed_samples: 24576000 total_loss: 4.208 time: 0.3326 s/iter data_time: 0.1805 s/iter total_throughput: 3078.89 samples/s lr: 9.49e-04
[09/15 10:22:14] lb.utils.events INFO: eta: 15:12:48 iteration: 24099/375342 consumed_samples: 24678400 total_loss: 4.201 time: 0.3326 s/iter data_time: 0.2008 s/iter total_throughput: 3078.83 samples/s lr: 9.53e-04
[09/15 10:22:47] lb.utils.events INFO: eta: 15:03:48 iteration: 24199/375342 consumed_samples: 24780800 total_loss: 4.186 time: 0.3326 s/iter data_time: 0.1890 s/iter total_throughput: 3078.74 samples/s lr: 9.57e-04
[09/15 10:23:21] lb.utils.events INFO: eta: 14:56:10 iteration: 24299/375342 consumed_samples: 24883200 total_loss: 4.197 time: 0.3326 s/iter data_time: 0.1841 s/iter total_throughput: 3078.70 samples/s lr: 9.61e-04
[09/15 10:23:54] lb.utils.events INFO: eta: 14:54:15 iteration: 24399/375342 consumed_samples: 24985600 total_loss: 4.205 time: 0.3326 s/iter data_time: 0.1861 s/iter total_throughput: 3078.60 samples/s lr: 9.65e-04
[09/15 10:24:27] lb.utils.events INFO: eta: 14:51:55 iteration: 24499/375342 consumed_samples: 25088000 total_loss: 4.2 time: 0.3326 s/iter data_time: 0.1942 s/iter total_throughput: 3078.62 samples/s lr: 9.68e-04
[09/15 10:25:01] lb.utils.events INFO: eta: 14:48:38 iteration: 24599/375342 consumed_samples: 25190400 total_loss: 4.187 time: 0.3326 s/iter data_time: 0.1878 s/iter total_throughput: 3078.56 samples/s lr: 9.72e-04
[09/15 10:25:35] lb.utils.events INFO: eta: 14:51:41 iteration: 24699/375342 consumed_samples: 25292800 total_loss: 4.196 time: 0.3326 s/iter data_time: 0.1897 s/iter total_throughput: 3078.33 samples/s lr: 9.76e-04