Experimental results are not the same when running the code multiple times #6

Hi,

It's great work on moment localization, and it achieves significant results! I have a question about the results when running the code multiple times: with the same code and the same hyper-parameters, the experimental results are not the same when the code is run twice. Have you met the same problem? Is there any solution?

Thanks!

Comments
Please provide more information about the difference, e.g., the evaluation results in the log.
For example, on TACoS, we get one set of results for the first run: [results table omitted] and another for the second run: [results table omitted]. Some metrics can differ a lot.
Thank you LLLddddd.
Hi, thanks for your reply. To make sure the hyper-parameters, random seed, etc. are exactly the same, I re-downloaded your entire official code. I only changed the following places in your code: [list of changes omitted]
All the rest is the same as your official code. The log of the first run (first two logged epochs):

```
nohup: ignoring input
2022-10-22 19:44:48,046 mmn INFO: Running with config:
0it [00:00, ?it/s]
2022-10-22 20:01:12,734 mmn.trainer INFO: eta: 5:43:11  epoch: 11/250  iteration: 10/19  loss_vid: 0.68  loss_sent: 0.96  loss_iou: 0.13  time: 23.59  max mem: 5931
0it [00:00, ?it/s]
2022-10-22 20:15:36,223 mmn.trainer INFO: eta: 5:29:31  epoch: 21/250  iteration: 10/19  loss_vid: 0.66  loss_sent: 0.93  loss_iou: 0.12  time: 24.75  max mem: 5931
```

The log of the second run:

```
nohup: ignoring input
2022-10-22 19:47:02,182 mmn INFO: Running with config:
0it [00:00, ?it/s]
2022-10-22 20:03:29,082 mmn.trainer INFO: eta: 5:41:53  epoch: 11/250  iteration: 10/19  loss_vid: 0.69  loss_sent: 0.96  loss_iou: 0.13  time: 23.19  max mem: 5931
0it [00:00, ?it/s]
2022-10-22 20:18:48,761 mmn.trainer INFO: eta: 5:39:22  epoch: 21/250  iteration: 10/19  loss_vid: 0.66  loss_sent: 0.93  loss_iou: 0.12  time: 30.41  max mem: 5931
```

In addition, I'm not sure whether the environment has an influence. My PyTorch version is 1.7.0, Python version is 3.6.6, and CUDA version is 10.2. I also tested in an environment with PyTorch 1.10.1, Python 3.6.6, and CUDA 11.4, but I could not get the same results there either.
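As an aside, when comparing runs across machines it can help to record the versions the training process actually sees, since the system CUDA (e.g., 10.2 vs. 11.4) and the CUDA PyTorch was built against can differ. A small sketch using standard PyTorch attributes:

```python
import sys

import torch

print("Python :", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA   :", torch.version.cuda)              # CUDA version PyTorch was built with
print("cuDNN  :", torch.backends.cudnn.version())  # cuDNN used by this build
```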
Thank you for the detailed log. I believe the randomness in my code is not completely removed, although I manually set the random seed at Line 134 in be8db4b.
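For what it's worth, a single manual seed is usually not enough in PyTorch to make two runs bit-identical: cuDNN kernel selection, the CUDA RNGs, and the NumPy/Python RNGs each need pinning. Below is a minimal sketch of the usual settings (standard PyTorch/NumPy calls; whether the repo's Line 134 already covers all of them is an assumption on my part, and full determinism across different GPU models is still not guaranteed):

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Seed every RNG a PyTorch training run typically touches."""
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG (often used for sampling)
    torch.manual_seed(seed)           # CPU RNG
    torch.cuda.manual_seed_all(seed)  # RNGs of all CUDA devices
    # Force cuDNN to pick deterministic kernels (can slow training down).
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Required for deterministic cuBLAS matmuls on CUDA >= 10.2.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # On PyTorch >= 1.8, torch.use_deterministic_algorithms(True) can
    # additionally raise an error whenever a nondeterministic op is hit.
```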
Due to the randomness of negative sample selection in contrastive learning, different runs can see dramatically different gradients, which affects the training process (especially on the TACoS dataset, since its videos commonly have 100+ sentences per video). I think differences in the evaluation metrics during training are not important, as long as the final results after convergence are similar across runs. Please compare the final results of your multiple runs, and also compare them with the result reported in the paper; it is not a problem if there is no large performance gap. If there is a large gap, I will check it later. However, I believe there is none, because the provided model parameters were trained with exactly this version of the code.
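If the remaining randomness does live in sampling inside the data pipeline, the PyTorch reproducibility recipe also asks for seeding the DataLoader workers and the shuffle generator; otherwise each worker process draws its own NumPy/Python seeds. A hedged sketch (worker_init_fn and generator are standard torch.utils.data.DataLoader arguments; the stand-in dataset is mine, not the repo's):

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id: int) -> None:
    # Derive each worker's NumPy/random seed from the torch seed so that
    # per-worker sampling (e.g., negative pair selection) is repeatable.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


g = torch.Generator()
g.manual_seed(0)

train_dataset = TensorDataset(torch.arange(100).float())  # stand-in dataset

loader = DataLoader(
    train_dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4,
    worker_init_fn=seed_worker,  # seeds each worker process
    generator=g,                 # pins the shuffle order
)
```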
Thanks a lot! If you have some free time, I'd appreciate it if you could check the randomness in your code.