[09/15 11:05:28] libai INFO: Rank of current process: 0. World size: 8
[09/15 11:05:28] libai INFO: Command line arguments: Namespace(config_file='configs/swin_imagenet.py', eval_only=False, fast_dev_run=False, opts=[], resume=False)
[09/15 11:05:28] libai INFO: Contents of args.config_file=configs/swin_imagenet.py:
from libai.config import LazyCall
from .common.models.swin.swin_tiny_patch4_window7_224 import model
from .common.models.graph import graph
from .common.train import train
from .common.optim import optim
from .common.data.imagenet import dataloader

from flowvision.data import Mixup
from flowvision.loss.cross_entropy import SoftTargetCrossEntropy

# Refine data path to imagenet
dataloader.train.dataset[0].root = "/data/ImageNet/extract/"
dataloader.test[0].dataset.root ="/data/ImageNet/extract/"

# Add Mixup Func
dataloader.train.mixup_func = LazyCall(Mixup)(
    mixup_alpha=0.8,
    cutmix_alpha=1.0,
    prob=1.0,
    switch_prob=0.5,
    mode="batch",
    num_classes=1000,
)

# Refine model cfg for vit training on imagenet
model.num_classes = 1000
model.loss_func = LazyCall(SoftTargetCrossEntropy)()

# Refine optimizer cfg for vit model
optim.lr = 1e-3
optim.eps = 1e-8
optim.weight_decay = 0.05

# Refine train cfg for vit model
train.train_micro_batch_size = 128
train.test_micro_batch_size = 128
train.train_epoch = 300
train.warmup_ratio = 20 / 300
train.eval_period = 1562
train.log_period = 100
train.output_dir = "./commit_de24"

# Scheduler
train.scheduler.warmup_factor = 0.001
train.scheduler.alpha = 0.01
train.scheduler.warmup_method = "linear"

graph.enabled = True
train.rdma_enabled = True

# Set fp16 ON
train.amp.enabled = True

[09/15 11:05:28] libai INFO: Full config saved to ./commit_de24/config.yaml
[09/15 11:05:28] lb.engine.default INFO: > compiling dataset index builder ...
[09/15 11:05:28] lb.engine.default INFO: >>> done with dataset index builder. Compilation time: 0.076 seconds
[09/15 11:05:28] lb.engine.default INFO: >>> done with compiling. Compilation time: 0.077 seconds
[09/15 11:05:28] lb.engine.default INFO: Prepare training, validating, testing set
[09/15 11:05:32] lb.engine.default INFO: Prepare testing set
[09/15 11:05:32] lb.engine.default INFO: Auto-scaling the config to train.train_iter=375342, train.warmup_iter=25023
[09/15 11:05:40] lb.engine.default INFO: Model: SwinTransformer( (patch_embed): PatchEmbed( (proj): Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4)) (norm): LayerNorm((96,), eps=1e-05, elementwise_affine=True) ) (pos_drop): Dropout(p=0.0, inplace=False) (layers): ModuleList( (0): BasicLayer( (blocks): ModuleList( (0): SwinTransformerBlock( (norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=96, out_features=288, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=96, out_features=96, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): Identity() (norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=96, out_features=384, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=384, out_features=96, bias=True, parallel=row) ) ) (1): SwinTransformerBlock( (norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=96, out_features=288, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=96, out_features=96, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=96, out_features=384, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=384, 
out_features=96, bias=True, parallel=row) ) ) ) (downsample): PatchMerging( (reduction): Linear1D(in_features=384, out_features=192, bias=False, parallel=data) (norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True) ) ) (1): BasicLayer( (blocks): ModuleList( (0): SwinTransformerBlock( (norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=192, out_features=576, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=192, out_features=192, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=192, out_features=768, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=768, out_features=192, bias=True, parallel=row) ) ) (1): SwinTransformerBlock( (norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=192, out_features=576, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=192, out_features=192, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=192, out_features=768, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=768, out_features=192, bias=True, parallel=row) ) ) ) (downsample): PatchMerging( (reduction): Linear1D(in_features=768, out_features=384, bias=False, parallel=data) (norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True) ) ) (2): BasicLayer( (blocks): ModuleList( (0): SwinTransformerBlock( (norm1): LayerNorm((384,), eps=1e-05, 
elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row) ) ) (1): SwinTransformerBlock( (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row) ) ) (2): SwinTransformerBlock( (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, 
bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row) ) ) (3): SwinTransformerBlock( (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row) ) ) (4): SwinTransformerBlock( (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row) ) ) (5): SwinTransformerBlock( (norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=384, out_features=1152, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=384, out_features=384, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): 
LayerNorm((384,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=384, out_features=1536, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=1536, out_features=384, bias=True, parallel=row) ) ) ) (downsample): PatchMerging( (reduction): Linear1D(in_features=1536, out_features=768, bias=False, parallel=data) (norm): LayerNorm((1536,), eps=1e-05, elementwise_affine=True) ) ) (3): BasicLayer( (blocks): ModuleList( (0): SwinTransformerBlock( (norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=768, out_features=2304, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=768, out_features=768, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=768, out_features=3072, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=3072, out_features=768, bias=True, parallel=row) ) ) (1): SwinTransformerBlock( (norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (attn): WindowAttention( (qkv): Linear1D(in_features=768, out_features=2304, bias=True, parallel=data) (attn_drop): Dropout(p=0.0, inplace=False) (proj): Linear1D(in_features=768, out_features=768, bias=True, parallel=data) (proj_drop): Dropout(p=0.0, inplace=False) (softmax): Softmax(dim=-1) ) (drop_path): DropPath() (norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) (mlp): MLP( bias_gelu_fusion=True, bias_dropout_fusion=True, dropout=0.0 (dense_h_to_4h): Linear1D(in_features=768, out_features=3072, bias=True, parallel=col) (dense_4h_to_h): Linear1D(in_features=3072, out_features=768, bias=True, parallel=row) ) ) ) ) ) (norm): 
LayerNorm((768,), eps=1e-05, elementwise_affine=True) (avgpool): AdaptiveAvgPool1d() (head): Linear1D(in_features=768, out_features=1000, bias=True, parallel=data) (loss_func): SoftTargetCrossEntropy() )
[09/15 11:05:40] lb.engine.trainer INFO: Starting training from iteration 0
[09/15 11:05:41] lb.models.utils.graph_base INFO: Start compling the train graph which may take some time. Please wait for a moment ...
[09/15 11:06:41] lb.utils.events INFO: eta: 15:39:57 iteration: 99/375342 consumed_samples: 102400 total_loss: 6.972 time: 0.3251 s/iter data_time: 0.1979 s/iter total_throughput: 3149.46 samples/s lr: 4.91e-06
[09/15 11:07:14] lb.utils.events INFO: eta: 15:36:52 iteration: 199/375342 consumed_samples: 204800 total_loss: 6.942 time: 0.3311 s/iter data_time: 0.2020 s/iter total_throughput: 3093.02 samples/s lr: 8.86e-06
[09/15 11:07:48] lb.utils.events INFO: eta: 15:37:10 iteration: 299/375342 consumed_samples: 307200 total_loss: 6.912 time: 0.3326 s/iter data_time: 0.1988 s/iter total_throughput: 3078.57 samples/s lr: 1.28e-05
[09/15 11:08:22] lb.utils.events INFO: eta: 15:40:11 iteration: 399/375342 consumed_samples: 409600 total_loss: 6.89 time: 0.3338 s/iter data_time: 0.1912 s/iter total_throughput: 3067.89 samples/s lr: 1.68e-05
[09/15 11:08:55] lb.utils.events INFO: eta: 15:39:56 iteration: 499/375342 consumed_samples: 512000 total_loss: 6.869 time: 0.3347 s/iter data_time: 0.1895 s/iter total_throughput: 3059.31 samples/s lr: 2.07e-05
[09/15 11:09:29] lb.utils.events INFO: eta: 15:40:30 iteration: 599/375342 consumed_samples: 614400 total_loss: 6.851 time: 0.3352 s/iter data_time: 0.1870 s/iter total_throughput: 3054.82 samples/s lr: 2.47e-05
[09/15 11:10:03] lb.utils.events INFO: eta: 15:43:14 iteration: 699/375342 consumed_samples: 716800 total_loss: 6.831 time: 0.3359 s/iter data_time: 0.1872 s/iter total_throughput: 3048.71 samples/s lr: 2.86e-05
[09/15 11:10:37] lb.utils.events INFO: eta: 15:43:37 iteration: 799/375342 consumed_samples: 819200 total_loss: 6.81 time: 0.3366 s/iter data_time: 0.1918 s/iter total_throughput: 3042.50 samples/s lr: 3.26e-05
[09/15 11:11:11] lb.utils.events INFO: eta: 15:44:07 iteration: 899/375342 consumed_samples: 921600 total_loss: 6.783 time: 0.3371 s/iter data_time: 0.1912 s/iter total_throughput: 3037.98 samples/s lr: 3.65e-05
[09/15 11:11:46] lb.utils.events INFO: eta: 15:44:22 iteration: 999/375342 consumed_samples: 1024000 total_loss: 6.759 time: 0.3374 s/iter data_time: 0.1982 s/iter total_throughput: 3034.64 samples/s lr: 4.05e-05
[09/15 11:12:20] lb.utils.events INFO: eta: 15:46:21 iteration: 1099/375342 consumed_samples: 1126400 total_loss: 6.729 time: 0.3377 s/iter data_time: 0.1897 s/iter total_throughput: 3032.21 samples/s lr: 4.44e-05
[09/15 11:12:54] lb.utils.events INFO: eta: 15:47:45 iteration: 1199/375342 consumed_samples: 1228800 total_loss: 6.697 time: 0.3386 s/iter data_time: 0.1971 s/iter total_throughput: 3024.26 samples/s lr: 4.83e-05
[09/15 11:13:29] lb.utils.events INFO: eta: 15:48:05 iteration: 1299/375342 consumed_samples: 1331200 total_loss: 6.667 time: 0.3388 s/iter data_time: 0.1830 s/iter total_throughput: 3022.74 samples/s lr: 5.23e-05
[09/15 11:14:02] lb.utils.events INFO: eta: 15:48:39 iteration: 1399/375342 consumed_samples: 1433600 total_loss: 6.633 time: 0.3387 s/iter data_time: 0.1955 s/iter total_throughput: 3023.32 samples/s lr: 5.62e-05
[09/15 11:14:36] lb.utils.events INFO: eta: 15:50:42 iteration: 1499/375342 consumed_samples: 1536000 total_loss: 6.609 time: 0.3385 s/iter data_time: 0.1923 s/iter total_throughput: 3025.00 samples/s lr: 6.02e-05
[09/15 11:15:10] lb.utils.events INFO: eta: 15:51:42 iteration: 1599/375342 consumed_samples: 1638400 total_loss: 6.585 time: 0.3386 s/iter data_time: 0.1760 s/iter total_throughput: 3024.43 samples/s lr: 6.41e-05
[09/15 11:15:44] lb.utils.events INFO: eta: 15:51:56 iteration: 1699/375342 consumed_samples: 1740800 total_loss: 6.558 time: 0.3386 s/iter data_time: 0.1798 s/iter total_throughput: 3024.58 samples/s lr: 6.81e-05
[09/15 11:16:18] lb.utils.events INFO: eta: 15:52:16 iteration: 1799/375342 consumed_samples: 1843200 total_loss: 6.531 time: 0.3386 s/iter data_time: 0.1939 s/iter total_throughput: 3024.34 samples/s lr: 7.20e-05
[09/15 11:16:51] lb.utils.events INFO: eta: 15:52:51 iteration: 1899/375342 consumed_samples: 1945600 total_loss: 6.503 time: 0.3384 s/iter data_time: 0.1828 s/iter total_throughput: 3026.29 samples/s lr: 7.60e-05
[09/15 11:17:25] lb.utils.events INFO: eta: 15:53:00 iteration: 1999/375342 consumed_samples: 2048000 total_loss: 6.482 time: 0.3386 s/iter data_time: 0.1962 s/iter total_throughput: 3023.95 samples/s lr: 7.99e-05
[09/15 11:17:59] lb.utils.events INFO: eta: 15:52:52 iteration: 2099/375342 consumed_samples: 2150400 total_loss: 6.462 time: 0.3386 s/iter data_time: 0.1946 s/iter total_throughput: 3024.17 samples/s lr: 8.39e-05
[09/15 11:18:33] lb.utils.events INFO: eta: 15:52:59 iteration: 2199/375342 consumed_samples: 2252800 total_loss: 6.446 time: 0.3387 s/iter data_time: 0.1905 s/iter total_throughput: 3023.15 samples/s lr: 8.78e-05
[09/15 11:19:07] lb.utils.events INFO: eta: 15:53:23 iteration: 2299/375342 consumed_samples: 2355200 total_loss: 6.42 time: 0.3386 s/iter data_time: 0.1854 s/iter total_throughput: 3024.34 samples/s lr: 9.18e-05
[09/15 11:19:41] lb.utils.events INFO: eta: 15:53:41 iteration: 2399/375342 consumed_samples: 2457600 total_loss: 6.4 time: 0.3386 s/iter data_time: 0.1835 s/iter total_throughput: 3024.63 samples/s lr: 9.57e-05
[09/15 11:20:15] lb.utils.events INFO: eta: 15:53:35 iteration: 2499/375342 consumed_samples: 2560000 total_loss: 6.381 time: 0.3386 s/iter data_time: 0.1915 s/iter total_throughput: 3024.19 samples/s lr: 9.97e-05
[09/15 11:20:48] lb.utils.events INFO: eta: 15:53:43 iteration: 2599/375342 consumed_samples: 2662400 total_loss: 6.36 time: 0.3385 s/iter data_time: 0.1874 s/iter total_throughput: 3024.95 samples/s lr: 1.04e-04
[09/15 11:21:22] lb.utils.events INFO: eta: 15:53:25 iteration: 2699/375342 consumed_samples: 2764800 total_loss: 6.334 time: 0.3385 s/iter data_time: 0.1991 s/iter total_throughput: 3025.42 samples/s lr: 1.08e-04
[09/15 11:21:56] lb.utils.events INFO: eta: 15:53:50 iteration: 2799/375342 consumed_samples: 2867200 total_loss: 6.327 time: 0.3383 s/iter data_time: 0.1780 s/iter total_throughput: 3026.75 samples/s lr: 1.12e-04
[09/15 11:22:30] lb.utils.events INFO: eta: 15:53:47 iteration: 2899/375342 consumed_samples: 2969600 total_loss: 6.306 time: 0.3384 s/iter data_time: 0.1838 s/iter total_throughput: 3026.42 samples/s lr: 1.15e-04
[09/15 11:23:04] lb.utils.events INFO: eta: 15:54:13 iteration: 2999/375342 consumed_samples: 3072000 total_loss: 6.288 time: 0.3385 s/iter data_time: 0.1935 s/iter total_throughput: 3024.75 samples/s lr: 1.19e-04
[09/15 11:23:38] lb.utils.events INFO: eta: 15:53:47 iteration: 3099/375342 consumed_samples: 3174400 total_loss: 6.274 time: 0.3384 s/iter data_time: 0.1765 s/iter total_throughput: 3025.58 samples/s lr: 1.23e-04
[09/15 11:24:11] lb.utils.events INFO: eta: 15:53:53 iteration: 3199/375342 consumed_samples: 3276800 total_loss: 6.244 time: 0.3383 s/iter data_time: 0.1821 s/iter total_throughput: 3027.21 samples/s lr: 1.27e-04
[09/15 11:24:44] lb.utils.events INFO: eta: 15:53:21 iteration: 3299/375342 consumed_samples: 3379200 total_loss: 6.233 time: 0.3381 s/iter data_time: 0.1824 s/iter total_throughput: 3028.86 samples/s lr: 1.31e-04
[09/15 11:25:18] lb.utils.events INFO: eta: 15:52:20 iteration: 3399/375342 consumed_samples: 3481600 total_loss: 6.225 time: 0.3382 s/iter data_time: 0.1868 s/iter total_throughput: 3028.14 samples/s lr: 1.35e-04
[09/15 11:25:52] lb.utils.events INFO: eta: 15:51:54 iteration: 3499/375342 consumed_samples: 3584000 total_loss: 6.201 time: 0.3381 s/iter data_time: 0.1916 s/iter total_throughput: 3028.60 samples/s lr: 1.39e-04
[09/15 11:26:26] lb.utils.events INFO: eta: 15:50:43 iteration: 3599/375342 consumed_samples: 3686400 total_loss: 6.173 time: 0.3381 s/iter data_time: 0.1880 s/iter total_throughput: 3028.69 samples/s lr: 1.43e-04
[09/15 11:26:59] lb.utils.events INFO: eta: 15:50:20 iteration: 3699/375342 consumed_samples: 3788800 total_loss: 6.158 time: 0.3381 s/iter data_time: 0.1940 s/iter total_throughput: 3028.74 samples/s lr: 1.47e-04
[09/15 11:27:33] lb.utils.events INFO: eta: 15:49:34 iteration: 3799/375342 consumed_samples: 3891200 total_loss: 6.142 time: 0.3381 s/iter data_time: 0.1924 s/iter total_throughput: 3028.61 samples/s lr: 1.51e-04
[09/15 11:28:07] lb.utils.events INFO: eta: 15:49:25 iteration: 3899/375342 consumed_samples: 3993600 total_loss: 6.125 time: 0.3381 s/iter data_time: 0.1908 s/iter total_throughput: 3028.48 samples/s lr: 1.55e-04
[09/15 11:28:41] lb.utils.events INFO: eta: 15:49:18 iteration: 3999/375342 consumed_samples: 4096000 total_loss: 6.104 time: 0.3382 s/iter data_time: 0.1858 s/iter total_throughput: 3027.80 samples/s lr: 1.59e-04
[09/15 11:29:15] lb.utils.events INFO: eta: 15:49:07 iteration: 4099/375342 consumed_samples: 4198400 total_loss: 6.086 time: 0.3382 s/iter data_time: 0.1856 s/iter total_throughput: 3027.58 samples/s lr: 1.63e-04
[09/15 11:29:49] lb.utils.events INFO: eta: 15:49:08 iteration: 4199/375342 consumed_samples: 4300800 total_loss: 6.063 time: 0.3382 s/iter data_time: 0.1841 s/iter total_throughput: 3028.09 samples/s lr: 1.67e-04
[09/15 11:30:22] lb.utils.events INFO: eta: 15:48:53 iteration: 4299/375342 consumed_samples: 4403200 total_loss: 6.049 time: 0.3381 s/iter data_time: 0.1957 s/iter total_throughput: 3028.64 samples/s lr: 1.71e-04
[09/15 11:30:56] lb.utils.events INFO: eta: 15:48:26 iteration: 4399/375342 consumed_samples: 4505600 total_loss: 6.054 time: 0.3380 s/iter data_time: 0.1859 s/iter total_throughput: 3029.17 samples/s lr: 1.75e-04
[09/15 11:31:30] lb.utils.events INFO: eta: 15:48:24 iteration: 4499/375342 consumed_samples: 4608000 total_loss: 6.044 time: 0.3380 s/iter data_time: 0.1893 s/iter total_throughput: 3029.47 samples/s lr: 1.79e-04
[09/15 11:32:03] lb.utils.events INFO: eta: 15:48:26 iteration: 4599/375342 consumed_samples: 4710400 total_loss: 6.005 time: 0.3380 s/iter data_time: 0.1838 s/iter total_throughput: 3029.59 samples/s lr: 1.83e-04
[09/15 11:32:37] lb.utils.events INFO: eta: 15:48:35 iteration: 4699/375342 consumed_samples: 4812800 total_loss: 5.977 time: 0.3379 s/iter data_time: 0.1844 s/iter total_throughput: 3030.56 samples/s lr: 1.87e-04
[09/15 11:33:10] lb.utils.events INFO: eta: 15:48:57 iteration: 4799/375342 consumed_samples: 4915200 total_loss: 5.972 time: 0.3379 s/iter data_time: 0.1849 s/iter total_throughput: 3030.86 samples/s lr: 1.91e-04
[09/15 11:33:44] lb.utils.events INFO: eta: 15:49:16 iteration: 4899/375342 consumed_samples: 5017600 total_loss: 5.963 time: 0.3378 s/iter data_time: 0.1749 s/iter total_throughput: 3030.98 samples/s lr: 1.94e-04
[09/15 11:34:18] lb.utils.checkpoint INFO: Saving checkpoint to ./commit_de24/model_0004999
[09/15 11:34:19] lb.evaluation.evaluator INFO: with eval_iter 100000.0, reset total samples 50000 to 50000
[09/15 11:34:19] lb.evaluation.evaluator INFO: Start inference on 50000 samples
[09/15 11:34:20] lb.models.utils.graph_base INFO: Start compling the eval graph which may take some time. Please wait for a moment ...
[09/15 11:34:25] lb.evaluation.evaluator INFO: Inference done 11264/50000. Dataloading: 0.0321 s/iter. Inference: 0.1623 s/iter. Eval: 0.0024 s/iter. Total: 0.1969 s/iter. ETA=0:00:07
[09/15 11:34:30] lb.evaluation.evaluator INFO: Inference done 27648/50000. Dataloading: 0.0948 s/iter. Inference: 0.1858 s/iter. Eval: 0.0022 s/iter. Total: 0.2828 s/iter. ETA=0:00:05
[09/15 11:34:36] lb.evaluation.evaluator INFO: Inference done 47104/50000. Dataloading: 0.0722 s/iter. Inference: 0.2117 s/iter. Eval: 0.0021 s/iter. Total: 0.2860 s/iter. ETA=0:00:00
[09/15 11:34:36] lb.evaluation.evaluator INFO: Total valid samples: 50000
[09/15 11:34:36] lb.evaluation.evaluator INFO: Total inference time: 0:00:12.381101 (0.000248 s / iter per device, on 8 devices)
[09/15 11:34:36] lb.evaluation.evaluator INFO: Total inference pure compute time: 0:00:09 (0.000183 s / iter per device, on 8 devices)
[09/15 11:34:36] lb.engine.default INFO: Evaluation results for ImageNetDataset in csv format:
[09/15 11:34:36] lb.evaluation.utils INFO: copypaste: Acc@1=16.488
[09/15 11:34:36] lb.evaluation.utils INFO: copypaste: Acc@5=36.46
[09/15 11:34:36] lb.engine.hooks INFO: Saved first model at 16.48800 @ 4999 steps
[09/15 11:34:36] lb.utils.checkpoint INFO: Saving checkpoint to ./commit_de24/model_best
[09/15 11:34:37] lb.utils.events INFO: eta: 15:48:54 iteration: 4999/375342 consumed_samples: 5120000 total_loss: 5.95 time: 0.3379 s/iter data_time: 0.1907 s/iter total_throughput: 3030.70 samples/s lr: 1.98e-04
[09/15 11:35:10] lb.utils.events INFO: eta: 15:48:45 iteration: 5099/375342 consumed_samples: 5222400 total_loss: 5.928 time: 0.3376 s/iter data_time: 0.1931 s/iter total_throughput: 3033.15 samples/s lr: 2.02e-04
[09/15 11:35:43] lb.utils.events INFO: eta: 15:47:38 iteration: 5199/375342 consumed_samples: 5324800 total_loss: 5.913 time: 0.3376 s/iter data_time: 0.1919 s/iter total_throughput: 3033.27 samples/s lr: 2.06e-04
[09/15 11:36:17] lb.utils.events INFO: eta: 15:47:16 iteration: 5299/375342 consumed_samples: 5427200 total_loss: 5.883 time: 0.3376 s/iter data_time: 0.1834 s/iter total_throughput: 3033.44 samples/s lr: 2.10e-04
[09/15 11:36:50] lb.utils.events INFO: eta: 15:47:03 iteration: 5399/375342 consumed_samples: 5529600 total_loss: 5.874 time: 0.3375 s/iter data_time: 0.1912 s/iter total_throughput: 3034.04 samples/s lr: 2.14e-04
[09/15 11:37:25] lb.utils.events INFO: eta: 15:46:09 iteration: 5499/375342 consumed_samples: 5632000 total_loss: 5.868 time: 0.3376 s/iter data_time: 0.1822 s/iter total_throughput: 3032.97 samples/s lr: 2.18e-04
[09/15 11:37:59] lb.utils.events INFO: eta: 15:46:49 iteration: 5599/375342 consumed_samples: 5734400 total_loss: 5.857 time: 0.3377 s/iter data_time: 0.1913 s/iter total_throughput: 3032.64 samples/s lr: 2.22e-04
[09/15 11:38:33] lb.utils.events INFO: eta: 15:47:18 iteration: 5699/375342 consumed_samples: 5836800 total_loss: 5.813 time: 0.3377 s/iter data_time: 0.1817 s/iter total_throughput: 3032.36 samples/s lr: 2.26e-04
[09/15 11:39:06] lb.utils.events INFO: eta: 15:46:53 iteration: 5799/375342 consumed_samples: 5939200 total_loss: 5.795 time: 0.3377 s/iter data_time: 0.1921 s/iter total_throughput: 3032.35 samples/s lr: 2.30e-04
[09/15 11:39:40] lb.utils.events INFO: eta: 15:46:47 iteration: 5899/375342 consumed_samples: 6041600 total_loss: 5.798 time: 0.3377 s/iter data_time: 0.1802 s/iter total_throughput: 3032.67 samples/s lr: 2.34e-04
[09/15 11:40:14] lb.utils.events INFO: eta: 15:46:36 iteration: 5999/375342 consumed_samples: 6144000 total_loss: 5.801 time: 0.3376 s/iter data_time: 0.1819 s/iter total_throughput: 3033.01 samples/s lr: 2.38e-04
[09/15 11:40:47] lb.utils.events INFO: eta: 15:46:00 iteration: 6099/375342 consumed_samples: 6246400 total_loss: 5.79 time: 0.3376 s/iter data_time: 0.1884 s/iter total_throughput: 3032.94 samples/s lr: 2.42e-04
[09/15 11:41:21] lb.utils.events INFO: eta: 15:46:05 iteration: 6199/375342 consumed_samples: 6348800 total_loss: 5.775 time: 0.3377 s/iter data_time: 0.2021 s/iter total_throughput: 3032.49 samples/s lr: 2.46e-04
[09/15 11:41:55] lb.utils.events INFO: eta: 15:46:21 iteration: 6299/375342 consumed_samples: 6451200 total_loss: 5.768 time: 0.3377 s/iter data_time: 0.1889 s/iter total_throughput: 3032.62 samples/s lr: 2.50e-04
[09/15 11:42:29] lb.utils.events INFO: eta: 15:47:00 iteration: 6399/375342 consumed_samples: 6553600 total_loss: 5.733 time: 0.3377 s/iter data_time: 0.1833 s/iter total_throughput: 3032.65 samples/s lr: 2.54e-04
[09/15 11:43:03] lb.utils.events INFO: eta: 15:47:01 iteration: 6499/375342 consumed_samples: 6656000 total_loss: 5.721 time: 0.3377 s/iter data_time: 0.1946 s/iter total_throughput: 3032.56 samples/s lr: 2.58e-04
[09/15 11:43:36] lb.utils.events INFO: eta: 15:46:54 iteration: 6599/375342 consumed_samples: 6758400 total_loss: 5.723 time: 0.3376 s/iter data_time: 0.1833 s/iter total_throughput: 3032.85 samples/s lr: 2.62e-04
[09/15 11:44:10] lb.utils.events INFO: eta: 15:46:26 iteration: 6699/375342 consumed_samples: 6860800 total_loss: 5.713 time: 0.3376 s/iter data_time: 0.1826 s/iter total_throughput: 3032.99 samples/s lr: 2.66e-04
[09/15 11:44:44] lb.utils.events INFO: eta: 15:45:54 iteration: 6799/375342 consumed_samples: 6963200 total_loss: 5.691 time: 0.3377 s/iter data_time: 0.1894 s/iter total_throughput: 3032.30 samples/s lr: 2.69e-04
[09/15 11:45:18] lb.utils.events INFO: eta: 15:44:46 iteration: 6899/375342 consumed_samples: 7065600 total_loss: 5.661 time: 0.3377 s/iter data_time: 0.1996 s/iter total_throughput: 3032.14 samples/s lr: 2.73e-04
[09/15 11:45:53] lb.utils.events INFO: eta: 15:44:56 iteration: 6999/375342 consumed_samples: 7168000 total_loss: 5.655 time: 0.3378 s/iter data_time: 0.1945 s/iter total_throughput: 3031.44 samples/s lr: 2.77e-04
[09/15 11:46:26] lb.utils.events INFO: eta: 15:45:07 iteration: 7099/375342 consumed_samples: 7270400 total_loss: 5.642 time: 0.3378 s/iter data_time: 0.1836 s/iter total_throughput: 3031.81 samples/s lr: 2.81e-04
[09/15 11:47:00] lb.utils.events INFO: eta: 15:44:30 iteration: 7199/375342 consumed_samples: 7372800 total_loss: 5.636 time: 0.3378 s/iter data_time: 0.1903 s/iter total_throughput: 3031.82 samples/s lr: 2.85e-04
[09/15 11:47:33] lb.utils.events INFO: eta: 15:44:13 iteration: 7299/375342 consumed_samples: 7475200 total_loss: 5.616 time: 0.3377 s/iter data_time: 0.1836 s/iter total_throughput: 3032.09 samples/s lr: 2.89e-04
[09/15 11:48:07] lb.utils.events INFO: eta: 15:43:32 iteration: 7399/375342 consumed_samples: 7577600 total_loss: 5.599 time: 0.3377 s/iter data_time: 0.2013 s/iter total_throughput: 3032.22 samples/s lr: 2.93e-04
[09/15 11:48:41] lb.utils.events INFO: eta: 15:43:16 iteration: 7499/375342 consumed_samples: 7680000 total_loss: 5.601 time: 0.3377 s/iter data_time: 0.1956 s/iter total_throughput: 3032.04 samples/s lr: 2.97e-04
[09/15 11:49:15] lb.utils.events INFO: eta: 15:43:01 iteration: 7599/375342 consumed_samples: 7782400 total_loss: 5.616 time: 0.3377 s/iter data_time: 0.1982 s/iter total_throughput: 3032.16 samples/s lr: 3.01e-04
[09/15 11:49:49] lb.utils.events INFO: eta: 15:43:03 iteration: 7699/375342 consumed_samples: 7884800 total_loss: 5.587 time: 0.3377 s/iter data_time: 0.2040 s/iter total_throughput: 3031.90 samples/s lr: 3.05e-04
[09/15 11:50:22] lb.utils.events INFO: eta: 15:44:12 iteration: 7799/375342 consumed_samples: 7987200 total_loss: 5.576 time: 0.3377 s/iter data_time: 0.1844 s/iter total_throughput: 3031.89 samples/s lr: 3.09e-04
[09/15 11:50:56] lb.utils.events INFO: eta: 15:44:39 iteration: 7899/375342 consumed_samples: 8089600 total_loss: 5.542 time: 0.3377 s/iter data_time: 0.1890 s/iter total_throughput: 3031.88 samples/s lr: 3.13e-04
[09/15 11:51:30] lb.utils.events INFO: eta: 15:44:26 iteration: 7999/375342 consumed_samples: 8192000 total_loss: 5.521 time: 0.3377 s/iter data_time: 0.1886 s/iter total_throughput: 3032.12 samples/s lr: 3.17e-04
[09/15 11:52:04] lb.utils.events INFO: eta: 15:44:19 iteration: 8099/375342 consumed_samples: 8294400 total_loss: 5.521 time: 0.3377 s/iter data_time: 0.1813 s/iter total_throughput: 3032.11 samples/s lr: 3.21e-04
[09/15 11:52:37] lb.utils.events INFO: eta: 15:44:49 iteration: 8199/375342 consumed_samples: 8396800 total_loss: 5.526 time: 0.3377 s/iter data_time: 0.1843 s/iter total_throughput: 3032.56 samples/s lr: 3.25e-04
[09/15 11:53:11] lb.utils.events INFO: eta: 15:44:00 iteration: 8299/375342 consumed_samples: 8499200 total_loss: 5.521 time: 0.3377 s/iter data_time: 0.1937 s/iter total_throughput: 3032.54 samples/s lr: 3.29e-04
[09/15 11:53:45] lb.utils.events INFO: eta: 15:43:41 iteration: 8399/375342 consumed_samples: 8601600 total_loss: 5.512 time: 0.3377 s/iter data_time: 0.1856 s/iter total_throughput: 3032.54 samples/s lr: 3.33e-04
[09/15 11:54:19] lb.utils.events INFO: eta: 15:43:25 iteration: 8499/375342 consumed_samples: 8704000 total_loss: 5.501 time: 0.3377 s/iter data_time: 0.1898 s/iter total_throughput: 3032.26 samples/s lr: 3.37e-04
[09/15 11:54:52] lb.utils.events INFO: eta: 15:44:28 iteration: 8599/375342 consumed_samples: 8806400 total_loss: 5.502 time: 0.3376 s/iter data_time: 0.1899 s/iter total_throughput: 3032.91 samples/s lr: 3.41e-04
[09/15 11:55:26] lb.utils.events INFO: eta: 15:44:08 iteration: 8699/375342 consumed_samples: 8908800 total_loss: 5.493 time: 0.3377 s/iter data_time: 0.1848 s/iter total_throughput: 3032.63 samples/s lr: 3.45e-04
[09/15 11:56:00] lb.utils.events INFO: eta: 15:43:16 iteration: 8799/375342 consumed_samples: 9011200 total_loss: 5.448 time: 0.3377 s/iter data_time: 0.2013 s/iter total_throughput: 3032.42 samples/s lr: 3.48e-04
[09/15 11:56:34] lb.utils.events INFO: eta: 15:43:51 iteration: 8899/375342 consumed_samples: 9113600 total_loss: 5.438 time: 0.3377 s/iter data_time: 0.1950 s/iter total_throughput: 3032.41 samples/s lr: 3.52e-04
[09/15 11:57:07] lb.utils.events INFO: eta: 15:43:22 iteration: 8999/375342 consumed_samples: 9216000 total_loss: 5.398 time: 0.3377 s/iter data_time: 0.1831 s/iter total_throughput: 3032.33 samples/s lr: 3.56e-04
[09/15 11:57:41] lb.utils.events INFO: eta: 15:42:29 iteration: 9099/375342 consumed_samples: 9318400 total_loss: 5.418 time: 0.3376 s/iter data_time: 0.1829 s/iter total_throughput: 3032.73 samples/s lr: 3.60e-04
[09/15 11:58:15] lb.utils.events INFO: eta: 15:40:59 iteration: 9199/375342 consumed_samples: 9420800 total_loss: 5.386 time: 0.3377 s/iter data_time: 0.2120 s/iter total_throughput: 3032.00 samples/s lr: 3.64e-04
[09/15 11:58:49] lb.utils.events INFO: eta: 15:40:54 iteration: 9299/375342 consumed_samples: 9523200 total_loss: 5.38 time: 0.3378 s/iter data_time: 0.1944 s/iter total_throughput: 3031.61 samples/s lr: 3.68e-04 [09/15 11:59:23] lb.utils.events INFO: eta: 15:40:53 iteration: 9399/375342 consumed_samples: 9625600 total_loss: 5.393 time: 0.3378 s/iter data_time: 0.1838 s/iter total_throughput: 3031.44 samples/s lr: 3.72e-04 [09/15 11:59:57] lb.utils.events INFO: eta: 15:41:39 iteration: 9499/375342 consumed_samples: 9728000 total_loss: 5.386 time: 0.3378 s/iter data_time: 0.1849 s/iter total_throughput: 3031.81 samples/s lr: 3.76e-04 [09/15 12:00:30] lb.utils.events INFO: eta: 15:40:09 iteration: 9599/375342 consumed_samples: 9830400 total_loss: 5.374 time: 0.3377 s/iter data_time: 0.1798 s/iter total_throughput: 3032.06 samples/s lr: 3.80e-04 [09/15 12:01:04] lb.utils.events INFO: eta: 15:40:06 iteration: 9699/375342 consumed_samples: 9932800 total_loss: 5.338 time: 0.3377 s/iter data_time: 0.1876 s/iter total_throughput: 3031.88 samples/s lr: 3.84e-04 [09/15 12:01:38] lb.utils.events INFO: eta: 15:39:56 iteration: 9799/375342 consumed_samples: 10035200 total_loss: 5.331 time: 0.3377 s/iter data_time: 0.1945 s/iter total_throughput: 3031.93 samples/s lr: 3.88e-04 [09/15 12:02:12] lb.utils.events INFO: eta: 15:38:55 iteration: 9899/375342 consumed_samples: 10137600 total_loss: 5.33 time: 0.3378 s/iter data_time: 0.1927 s/iter total_throughput: 3031.55 samples/s lr: 3.92e-04 [09/15 12:02:47] lb.utils.checkpoint INFO: Saving checkpoint to ./commit_de24/model_0009999 [09/15 12:02:47] lb.evaluation.evaluator INFO: with eval_iter 100000.0, reset total samples 50000 to 50000 [09/15 12:02:47] lb.evaluation.evaluator INFO: Start inference on 50000 samples [09/15 12:02:52] lb.evaluation.evaluator INFO: Inference done 11264/50000. Dataloading: 0.1207 s/iter. Inference: 0.1500 s/iter. Eval: 0.0021 s/iter. Total: 0.2728 s/iter. 
ETA=0:00:10 [09/15 12:02:57] lb.evaluation.evaluator INFO: Inference done 27648/50000. Dataloading: 0.1533 s/iter. Inference: 0.1481 s/iter. Eval: 0.0022 s/iter. Total: 0.3037 s/iter. ETA=0:00:06 [09/15 12:03:02] lb.evaluation.evaluator INFO: Inference done 45056/50000. Dataloading: 0.1490 s/iter. Inference: 0.1486 s/iter. Eval: 0.0022 s/iter. Total: 0.2999 s/iter. ETA=0:00:01 [09/15 12:03:04] lb.evaluation.evaluator INFO: Total valid samples: 50000 [09/15 12:03:04] lb.evaluation.evaluator INFO: Total inference time: 0:00:13.357607 (0.000267 s / iter per device, on 8 devices) [09/15 12:03:04] lb.evaluation.evaluator INFO: Total inference pure compute time: 0:00:06 (0.000132 s / iter per device, on 8 devices) [09/15 12:03:04] lb.engine.default INFO: Evaluation results for ImageNetDataset in csv format: [09/15 12:03:04] lb.evaluation.utils INFO: copypaste: Acc@1=35.394 [09/15 12:03:04] lb.evaluation.utils INFO: copypaste: Acc@5=61.072 [09/15 12:03:04] lb.engine.hooks INFO: Saved best model as latest eval score for Acc@1 is 35.39400, better than last best score 16.48800 @ iteration 4999. 
[09/15 12:03:04] lb.utils.checkpoint INFO: Saving checkpoint to ./commit_de24/model_best [09/15 12:03:04] lb.utils.events INFO: eta: 15:38:26 iteration: 9999/375342 consumed_samples: 10240000 total_loss: 5.355 time: 0.3378 s/iter data_time: 0.1948 s/iter total_throughput: 3030.94 samples/s lr: 3.96e-04 [09/15 12:03:37] lb.utils.events INFO: eta: 15:39:06 iteration: 10099/375342 consumed_samples: 10342400 total_loss: 5.354 time: 0.3377 s/iter data_time: 0.1952 s/iter total_throughput: 3032.15 samples/s lr: 4.00e-04 [09/15 12:04:11] lb.utils.events INFO: eta: 15:38:17 iteration: 10199/375342 consumed_samples: 10444800 total_loss: 5.319 time: 0.3377 s/iter data_time: 0.1821 s/iter total_throughput: 3032.18 samples/s lr: 4.04e-04 [09/15 12:04:45] lb.utils.events INFO: eta: 15:38:28 iteration: 10299/375342 consumed_samples: 10547200 total_loss: 5.29 time: 0.3377 s/iter data_time: 0.1895 s/iter total_throughput: 3031.90 samples/s lr: 4.08e-04 [09/15 12:05:18] lb.utils.events INFO: eta: 15:38:12 iteration: 10399/375342 consumed_samples: 10649600 total_loss: 5.276 time: 0.3377 s/iter data_time: 0.1857 s/iter total_throughput: 3031.97 samples/s lr: 4.12e-04 [09/15 12:05:53] lb.utils.events INFO: eta: 15:37:08 iteration: 10499/375342 consumed_samples: 10752000 total_loss: 5.293 time: 0.3378 s/iter data_time: 0.1916 s/iter total_throughput: 3031.59 samples/s lr: 4.16e-04 [09/15 12:06:26] lb.utils.events INFO: eta: 15:37:06 iteration: 10599/375342 consumed_samples: 10854400 total_loss: 5.307 time: 0.3377 s/iter data_time: 0.1827 s/iter total_throughput: 3032.01 samples/s lr: 4.20e-04 [09/15 12:07:00] lb.utils.events INFO: eta: 15:37:00 iteration: 10699/375342 consumed_samples: 10956800 total_loss: 5.3 time: 0.3377 s/iter data_time: 0.1921 s/iter total_throughput: 3031.99 samples/s lr: 4.24e-04 [09/15 12:07:34] lb.utils.events INFO: eta: 15:36:20 iteration: 10799/375342 consumed_samples: 11059200 total_loss: 5.285 time: 0.3377 s/iter data_time: 0.1861 s/iter total_throughput: 
3031.97 samples/s lr: 4.27e-04 [09/15 12:08:08] lb.utils.events INFO: eta: 15:35:54 iteration: 10899/375342 consumed_samples: 11161600 total_loss: 5.251 time: 0.3378 s/iter data_time: 0.1950 s/iter total_throughput: 3031.66 samples/s lr: 4.31e-04 [09/15 12:08:41] lb.utils.events INFO: eta: 15:36:46 iteration: 10999/375342 consumed_samples: 11264000 total_loss: 5.242 time: 0.3377 s/iter data_time: 0.1884 s/iter total_throughput: 3031.85 samples/s lr: 4.35e-04 [09/15 12:09:15] lb.utils.events INFO: eta: 15:35:34 iteration: 11099/375342 consumed_samples: 11366400 total_loss: 5.258 time: 0.3378 s/iter data_time: 0.1900 s/iter total_throughput: 3031.50 samples/s lr: 4.39e-04 [09/15 12:09:49] lb.utils.events INFO: eta: 15:36:03 iteration: 11199/375342 consumed_samples: 11468800 total_loss: 5.255 time: 0.3378 s/iter data_time: 0.1878 s/iter total_throughput: 3031.46 samples/s lr: 4.43e-04 [09/15 12:10:23] lb.utils.events INFO: eta: 15:35:33 iteration: 11299/375342 consumed_samples: 11571200 total_loss: 5.233 time: 0.3378 s/iter data_time: 0.1877 s/iter total_throughput: 3031.22 samples/s lr: 4.47e-04 [09/15 12:10:57] lb.utils.events INFO: eta: 15:35:07 iteration: 11399/375342 consumed_samples: 11673600 total_loss: 5.212 time: 0.3378 s/iter data_time: 0.1755 s/iter total_throughput: 3031.61 samples/s lr: 4.51e-04 [09/15 12:11:30] lb.utils.events INFO: eta: 15:35:49 iteration: 11499/375342 consumed_samples: 11776000 total_loss: 5.187 time: 0.3378 s/iter data_time: 0.1944 s/iter total_throughput: 3031.61 samples/s lr: 4.55e-04 [09/15 12:12:05] lb.utils.events INFO: eta: 15:34:49 iteration: 11599/375342 consumed_samples: 11878400 total_loss: 5.224 time: 0.3378 s/iter data_time: 0.1886 s/iter total_throughput: 3030.95 samples/s lr: 4.59e-04 [09/15 12:12:39] lb.utils.events INFO: eta: 15:33:49 iteration: 11699/375342 consumed_samples: 11980800 total_loss: 5.217 time: 0.3379 s/iter data_time: 0.1967 s/iter total_throughput: 3030.54 samples/s lr: 4.63e-04 [09/15 12:13:13] 
lb.utils.events INFO: eta: 15:34:29 iteration: 11799/375342 consumed_samples: 12083200 total_loss: 5.187 time: 0.3379 s/iter data_time: 0.1835 s/iter total_throughput: 3030.39 samples/s lr: 4.67e-04 [09/15 12:13:47] lb.utils.events INFO: eta: 15:34:18 iteration: 11899/375342 consumed_samples: 12185600 total_loss: 5.187 time: 0.3379 s/iter data_time: 0.1862 s/iter total_throughput: 3030.44 samples/s lr: 4.71e-04 [09/15 12:14:21] lb.utils.events INFO: eta: 15:34:02 iteration: 11999/375342 consumed_samples: 12288000 total_loss: 5.205 time: 0.3379 s/iter data_time: 0.1894 s/iter total_throughput: 3030.60 samples/s lr: 4.75e-04 [09/15 12:14:55] lb.utils.events INFO: eta: 15:34:32 iteration: 12099/375342 consumed_samples: 12390400 total_loss: 5.165 time: 0.3379 s/iter data_time: 0.1884 s/iter total_throughput: 3030.19 samples/s lr: 4.79e-04 [09/15 12:15:29] lb.utils.events INFO: eta: 15:34:22 iteration: 12199/375342 consumed_samples: 12492800 total_loss: 5.145 time: 0.3379 s/iter data_time: 0.1848 s/iter total_throughput: 3030.16 samples/s lr: 4.83e-04 [09/15 12:16:03] lb.utils.events INFO: eta: 15:35:00 iteration: 12299/375342 consumed_samples: 12595200 total_loss: 5.15 time: 0.3379 s/iter data_time: 0.1914 s/iter total_throughput: 3030.29 samples/s lr: 4.87e-04 [09/15 12:16:37] lb.utils.events INFO: eta: 15:33:39 iteration: 12399/375342 consumed_samples: 12697600 total_loss: 5.143 time: 0.3379 s/iter data_time: 0.1897 s/iter total_throughput: 3030.04 samples/s lr: 4.91e-04 [09/15 12:17:11] lb.utils.events INFO: eta: 15:33:10 iteration: 12499/375342 consumed_samples: 12800000 total_loss: 5.12 time: 0.3380 s/iter data_time: 0.1873 s/iter total_throughput: 3029.96 samples/s lr: 4.95e-04 [09/15 12:17:45] lb.utils.events INFO: eta: 15:32:55 iteration: 12599/375342 consumed_samples: 12902400 total_loss: 5.118 time: 0.3380 s/iter data_time: 0.1950 s/iter total_throughput: 3029.48 samples/s lr: 4.99e-04 [09/15 12:18:19] lb.utils.events INFO: eta: 15:32:40 iteration: 
12699/375342 consumed_samples: 13004800 total_loss: 5.125 time: 0.3381 s/iter data_time: 0.1979 s/iter total_throughput: 3029.11 samples/s lr: 5.02e-04 [09/15 12:18:53] lb.utils.events INFO: eta: 15:32:23 iteration: 12799/375342 consumed_samples: 13107200 total_loss: 5.106 time: 0.3380 s/iter data_time: 0.1867 s/iter total_throughput: 3029.25 samples/s lr: 5.06e-04 [09/15 12:19:27] lb.utils.events INFO: eta: 15:32:16 iteration: 12899/375342 consumed_samples: 13209600 total_loss: 5.091 time: 0.3381 s/iter data_time: 0.2106 s/iter total_throughput: 3029.00 samples/s lr: 5.10e-04 [09/15 12:20:01] lb.utils.events INFO: eta: 15:31:52 iteration: 12999/375342 consumed_samples: 13312000 total_loss: 5.109 time: 0.3381 s/iter data_time: 0.1914 s/iter total_throughput: 3029.01 samples/s lr: 5.14e-04 [09/15 12:20:35] lb.utils.events INFO: eta: 15:31:22 iteration: 13099/375342 consumed_samples: 13414400 total_loss: 5.093 time: 0.3381 s/iter data_time: 0.1959 s/iter total_throughput: 3028.74 samples/s lr: 5.18e-04 [09/15 12:21:09] lb.utils.events INFO: eta: 15:30:30 iteration: 13199/375342 consumed_samples: 13516800 total_loss: 5.08 time: 0.3381 s/iter data_time: 0.1906 s/iter total_throughput: 3028.74 samples/s lr: 5.22e-04 [09/15 12:21:43] lb.utils.events INFO: eta: 15:30:11 iteration: 13299/375342 consumed_samples: 13619200 total_loss: 5.074 time: 0.3381 s/iter data_time: 0.1848 s/iter total_throughput: 3028.90 samples/s lr: 5.26e-04 [09/15 12:22:16] lb.utils.events INFO: eta: 15:30:48 iteration: 13399/375342 consumed_samples: 13721600 total_loss: 5.075 time: 0.3381 s/iter data_time: 0.1927 s/iter total_throughput: 3028.93 samples/s lr: 5.30e-04 [09/15 12:22:51] lb.utils.events INFO: eta: 15:29:58 iteration: 13499/375342 consumed_samples: 13824000 total_loss: 5.088 time: 0.3381 s/iter data_time: 0.1923 s/iter total_throughput: 3028.62 samples/s lr: 5.34e-04 [09/15 12:23:25] lb.utils.events INFO: eta: 15:29:45 iteration: 13599/375342 consumed_samples: 13926400 total_loss: 
5.043 time: 0.3382 s/iter data_time: 0.1946 s/iter total_throughput: 3028.17 samples/s lr: 5.38e-04 [09/15 12:23:58] lb.utils.events INFO: eta: 15:30:14 iteration: 13699/375342 consumed_samples: 14028800 total_loss: 5.031 time: 0.3381 s/iter data_time: 0.1941 s/iter total_throughput: 3028.51 samples/s lr: 5.42e-04 [09/15 12:24:33] lb.utils.events INFO: eta: 15:29:15 iteration: 13799/375342 consumed_samples: 14131200 total_loss: 5.029 time: 0.3381 s/iter data_time: 0.1901 s/iter total_throughput: 3028.37 samples/s lr: 5.46e-04 [09/15 12:25:06] lb.utils.events INFO: eta: 15:28:59 iteration: 13899/375342 consumed_samples: 14233600 total_loss: 5.016 time: 0.3381 s/iter data_time: 0.1833 s/iter total_throughput: 3028.41 samples/s lr: 5.50e-04 [09/15 12:25:40] lb.utils.events INFO: eta: 15:28:46 iteration: 13999/375342 consumed_samples: 14336000 total_loss: 5.029 time: 0.3381 s/iter data_time: 0.1905 s/iter total_throughput: 3028.37 samples/s lr: 5.54e-04 [09/15 12:26:15] lb.utils.events INFO: eta: 15:28:36 iteration: 14099/375342 consumed_samples: 14438400 total_loss: 5.025 time: 0.3382 s/iter data_time: 0.1892 s/iter total_throughput: 3027.94 samples/s lr: 5.58e-04 [09/15 12:26:49] lb.utils.events INFO: eta: 15:29:03 iteration: 14199/375342 consumed_samples: 14540800 total_loss: 5.011 time: 0.3382 s/iter data_time: 0.1919 s/iter total_throughput: 3027.65 samples/s lr: 5.62e-04 [09/15 12:27:24] lb.utils.events INFO: eta: 15:28:05 iteration: 14299/375342 consumed_samples: 14643200 total_loss: 5.001 time: 0.3383 s/iter data_time: 0.1995 s/iter total_throughput: 3027.07 samples/s lr: 5.66e-04 [09/15 12:27:58] lb.utils.events INFO: eta: 15:28:21 iteration: 14399/375342 consumed_samples: 14745600 total_loss: 4.977 time: 0.3383 s/iter data_time: 0.1854 s/iter total_throughput: 3026.67 samples/s lr: 5.70e-04 [09/15 12:28:33] lb.utils.events INFO: eta: 15:28:53 iteration: 14499/375342 consumed_samples: 14848000 total_loss: 4.992 time: 0.3384 s/iter data_time: 0.1979 s/iter 
total_throughput: 3026.04 samples/s lr: 5.74e-04 [09/15 12:29:08] lb.utils.events INFO: eta: 15:28:33 iteration: 14599/375342 consumed_samples: 14950400 total_loss: 4.993 time: 0.3385 s/iter data_time: 0.1880 s/iter total_throughput: 3025.23 samples/s lr: 5.78e-04 [09/15 12:29:43] lb.utils.events INFO: eta: 15:27:47 iteration: 14699/375342 consumed_samples: 15052800 total_loss: 4.972 time: 0.3386 s/iter data_time: 0.1934 s/iter total_throughput: 3024.44 samples/s lr: 5.81e-04 [09/15 12:30:19] lb.utils.events INFO: eta: 15:26:47 iteration: 14799/375342 consumed_samples: 15155200 total_loss: 4.975 time: 0.3387 s/iter data_time: 0.2084 s/iter total_throughput: 3023.65 samples/s lr: 5.85e-04 [09/15 12:30:53] lb.utils.events INFO: eta: 15:27:25 iteration: 14899/375342 consumed_samples: 15257600 total_loss: 4.974 time: 0.3387 s/iter data_time: 0.2117 s/iter total_throughput: 3023.03 samples/s lr: 5.89e-04 [09/15 12:31:28] lb.utils.checkpoint INFO: Saving checkpoint to ./commit_de24/model_0014999 [09/15 12:31:29] lb.evaluation.evaluator INFO: with eval_iter 100000.0, reset total samples 50000 to 50000 [09/15 12:31:29] lb.evaluation.evaluator INFO: Start inference on 50000 samples [09/15 12:31:33] lb.evaluation.evaluator INFO: Inference done 11264/50000. Dataloading: 0.1197 s/iter. Inference: 0.1503 s/iter. Eval: 0.0021 s/iter. Total: 0.2721 s/iter. ETA=0:00:10 [09/15 12:31:38] lb.evaluation.evaluator INFO: Inference done 27648/50000. Dataloading: 0.1535 s/iter. Inference: 0.1504 s/iter. Eval: 0.0021 s/iter. Total: 0.3061 s/iter. ETA=0:00:06 [09/15 12:31:43] lb.evaluation.evaluator INFO: Inference done 45056/50000. Dataloading: 0.1506 s/iter. Inference: 0.1495 s/iter. Eval: 0.0021 s/iter. Total: 0.3023 s/iter. 
ETA=0:00:01 [09/15 12:31:45] lb.evaluation.evaluator INFO: Total valid samples: 50000 [09/15 12:31:45] lb.evaluation.evaluator INFO: Total inference time: 0:00:13.420552 (0.000268 s / iter per device, on 8 devices) [09/15 12:31:45] lb.evaluation.evaluator INFO: Total inference pure compute time: 0:00:06 (0.000132 s / iter per device, on 8 devices) [09/15 12:31:45] lb.engine.default INFO: Evaluation results for ImageNetDataset in csv format: [09/15 12:31:45] lb.evaluation.utils INFO: copypaste: Acc@1=47.024 [09/15 12:31:45] lb.evaluation.utils INFO: copypaste: Acc@5=72.24000000000001 [09/15 12:31:45] lb.engine.hooks INFO: Saved best model as latest eval score for Acc@1 is 47.02400, better than last best score 35.39400 @ iteration 9999. [09/15 12:31:45] lb.utils.checkpoint INFO: Saving checkpoint to ./commit_de24/model_best [09/15 12:31:46] lb.utils.events INFO: eta: 15:27:04 iteration: 14999/375342 consumed_samples: 15360000 total_loss: 4.959 time: 0.3388 s/iter data_time: 0.1942 s/iter total_throughput: 3022.62 samples/s lr: 5.93e-04 [09/15 12:32:19] lb.utils.events INFO: eta: 15:27:03 iteration: 15099/375342 consumed_samples: 15462400 total_loss: 4.951 time: 0.3387 s/iter data_time: 0.2015 s/iter total_throughput: 3023.04 samples/s lr: 5.97e-04 [09/15 12:32:54] lb.utils.events INFO: eta: 15:26:16 iteration: 15199/375342 consumed_samples: 15564800 total_loss: 4.948 time: 0.3388 s/iter data_time: 0.1946 s/iter total_throughput: 3022.29 samples/s lr: 6.01e-04 [09/15 12:33:29] lb.utils.events INFO: eta: 15:26:00 iteration: 15299/375342 consumed_samples: 15667200 total_loss: 4.925 time: 0.3389 s/iter data_time: 0.1974 s/iter total_throughput: 3021.89 samples/s lr: 6.05e-04 [09/15 12:34:03] lb.utils.events INFO: eta: 15:25:09 iteration: 15399/375342 consumed_samples: 15769600 total_loss: 4.926 time: 0.3389 s/iter data_time: 0.1954 s/iter total_throughput: 3021.38 samples/s lr: 6.09e-04 [09/15 12:34:38] lb.utils.events INFO: eta: 15:24:42 iteration: 15499/375342 
consumed_samples: 15872000 total_loss: 4.908 time: 0.3390 s/iter data_time: 0.1970 s/iter total_throughput: 3020.86 samples/s lr: 6.13e-04 [09/15 12:35:13] lb.utils.events INFO: eta: 15:24:36 iteration: 15599/375342 consumed_samples: 15974400 total_loss: 4.911 time: 0.3390 s/iter data_time: 0.1987 s/iter total_throughput: 3020.34 samples/s lr: 6.17e-04 [09/15 12:35:48] lb.utils.events INFO: eta: 15:24:34 iteration: 15699/375342 consumed_samples: 16076800 total_loss: 4.935 time: 0.3391 s/iter data_time: 0.2069 s/iter total_throughput: 3019.81 samples/s lr: 6.21e-04 [09/15 12:36:22] lb.utils.events INFO: eta: 15:25:15 iteration: 15799/375342 consumed_samples: 16179200 total_loss: 4.906 time: 0.3391 s/iter data_time: 0.1898 s/iter total_throughput: 3019.71 samples/s lr: 6.25e-04 [09/15 12:36:57] lb.utils.events INFO: eta: 15:24:45 iteration: 15899/375342 consumed_samples: 16281600 total_loss: 4.884 time: 0.3391 s/iter data_time: 0.1922 s/iter total_throughput: 3019.33 samples/s lr: 6.29e-04 [09/15 12:37:31] lb.utils.events INFO: eta: 15:24:28 iteration: 15999/375342 consumed_samples: 16384000 total_loss: 4.907 time: 0.3392 s/iter data_time: 0.1964 s/iter total_throughput: 3018.98 samples/s lr: 6.33e-04 [09/15 12:38:05] lb.utils.events INFO: eta: 15:24:14 iteration: 16099/375342 consumed_samples: 16486400 total_loss: 4.903 time: 0.3392 s/iter data_time: 0.1911 s/iter total_throughput: 3018.82 samples/s lr: 6.37e-04 [09/15 12:38:40] lb.utils.events INFO: eta: 15:24:17 iteration: 16199/375342 consumed_samples: 16588800 total_loss: 4.861 time: 0.3392 s/iter data_time: 0.2071 s/iter total_throughput: 3018.47 samples/s lr: 6.41e-04 [09/15 12:39:15] lb.utils.events INFO: eta: 15:24:25 iteration: 16299/375342 consumed_samples: 16691200 total_loss: 4.873 time: 0.3393 s/iter data_time: 0.1968 s/iter total_throughput: 3018.10 samples/s lr: 6.45e-04 [09/15 12:39:48] lb.utils.events INFO: eta: 15:24:24 iteration: 16399/375342 consumed_samples: 16793600 total_loss: 4.87 time: 
0.3393 s/iter data_time: 0.1785 s/iter total_throughput: 3018.13 samples/s lr: 6.49e-04 [09/15 12:40:23] lb.utils.events INFO: eta: 15:24:20 iteration: 16499/375342 consumed_samples: 16896000 total_loss: 4.859 time: 0.3393 s/iter data_time: 0.2040 s/iter total_throughput: 3017.80 samples/s lr: 6.53e-04 [09/15 12:40:57] lb.utils.events INFO: eta: 15:23:49 iteration: 16599/375342 consumed_samples: 16998400 total_loss: 4.863 time: 0.3394 s/iter data_time: 0.1938 s/iter total_throughput: 3017.53 samples/s lr: 6.57e-04