
Experimental results are not the same when running the code multiple times #6

Closed

LLLddddd opened this issue Oct 20, 2022 · 6 comments

@LLLddddd

LLLddddd commented Oct 20, 2022

Hi,

This is great work on moment localization, and it achieves impressive results! I have a question about reproducibility: it seems that with the same code and the same hyper-parameters, the experimental results are not the same when the code is run twice.

Have you encountered the same problem? Is there any solution?

Thanks!

@zhenzhiwang
Collaborator

Please provide more information about the difference, e.g., evaluation results in the log.

@LLLddddd
Author

For example, on TACoS, we get these results from the first run:

+-------------+-------------+-------------+-------------+-------------+-------------+
| R@1,IoU@0.1 | R@1,IoU@0.3 | R@1,IoU@0.5 | R@5,IoU@0.1 | R@5,IoU@0.3 | R@5,IoU@0.5 |
+-------------+-------------+-------------+-------------+-------------+-------------+
|    47.34    |    36.14    |    25.47    |    77.03    |    60.43    |    45.49    |
+-------------+-------------+-------------+-------------+-------------+-------------+

and these from the second run:

+-------------+-------------+-------------+-------------+-------------+-------------+
| R@1,IoU@0.1 | R@1,IoU@0.3 | R@1,IoU@0.5 | R@5,IoU@0.1 | R@5,IoU@0.3 | R@5,IoU@0.5 |
+-------------+-------------+-------------+-------------+-------------+-------------+
|    50.01    |    36.67    |    25.64    |    77.36    |    61.93    |    46.54    |
+-------------+-------------+-------------+-------------+-------------+-------------+

Some metrics can differ a lot.

@zhenzhiwang
Collaborator

Thank you LLLddddd.
I don't observe such results. Could you please provide the original log files for these results?
Besides, are you sure the hyper-parameters are exactly the same and the results are taken from the same epoch? For a better comparison, please also set the random seeds to the same value. This is the first time I have heard of such a situation, so please provide more information so that we can better explore the reason behind it.

@LLLddddd
Author

Hi, thanks for your reply. To make sure the hyper-parameters, random seed, etc. are exactly the same, I re-downloaded your entire official code.

I only changed the following parts of your code:

  1. The file paths in paths_catalog.py.
  2. When I run the code, it raises an error in trainer.py: AttributeError: 'RandomSampler' object has no attribute 'set_epoch'. So I deleted the line "data_loader.batch_sampler.sampler.set_epoch(epoch)" at line 60 of trainer.py. I'm not sure what this line does; will deleting it have any effect? (See the sketch below.)
  3. I run the code with "CUDA_VISIBLE_DEVICES=0 python train_net.py --config-file configs/pool_tacos_128x128_k5l8.yaml"; I did not use tacos_train.sh.
  4. I only use 1 GPU card; I did not use distributed training.

All the rest are the same as your official code.
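
Regarding change 2: as far as I understand, set_epoch only exists on DistributedSampler, where it re-seeds the per-epoch shuffle so every run and rank gets a reproducible order; plain RandomSampler has no such method. A guard like the following (just a sketch of what I mean, assuming the loop in trainer.py provides data_loader and epoch as in the official code) might be safer than deleting the line outright:

    # Sketch of a guarded replacement for line 60 of trainer.py.
    sampler = data_loader.batch_sampler.sampler
    if hasattr(sampler, "set_epoch"):
        # Only DistributedSampler defines set_epoch; it reshuffles
        # deterministically once per epoch.
        sampler.set_epoch(epoch)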

The log of the first run (truncated at epoch 24):

nohup: ignoring input
2022-10-22 19:44:48,045 mmn INFO: Using 1 GPUs
2022-10-22 19:44:48,045 mmn INFO: Namespace(config_file='configs/pool_tacos_128x128_k5l8.yaml', distributed=False, local_rank=0, opts=[], skip_test=False)
2022-10-22 19:44:48,045 mmn INFO: Loaded configuration file configs/pool_tacos_128x128_k5l8.yaml
2022-10-22 19:44:48,045 mmn INFO:
MODEL:
  ARCHITECTURE: "MMN"
  MMN:
    NUM_CLIPS: 128
    JOINT_SPACE_SIZE: 256
    FEATPOOL:
      INPUT_SIZE: 4096
      HIDDEN_SIZE: 512
      KERNEL_SIZE: 2
    FEAT2D:
      NAME: "pool"
      POOLING_COUNTS: [15,8,8,8]
    TEXT_ENCODER:
      NAME: 'BERT'
    PREDICTOR:
      HIDDEN_SIZE: 512
      KERNEL_SIZE: 5
      NUM_STACK_LAYERS: 8
    LOSS:
      MIN_IOU: 0.3
      MAX_IOU: 0.7
      NUM_POSTIVE_VIDEO_PROPOSAL: 3
      NEGATIVE_VIDEO_IOU: 0.5
      SENT_REMOVAL_IOU: 0.5
      TAU_VIDEO: 0.1
      TAU_SENT: 0.1
      MARGIN: 0.1
      CONTRASTIVE_WEIGHT: 0.1
DATASETS:
  NAME: "tacos"
  TRAIN: ("tacos_train",)
  TEST: ("tacos_test",)
INPUT:
  NUM_PRE_CLIPS: 256
DATALOADER:
  NUM_WORKERS: 12
SOLVER:
  LR: 0.0015
  BATCH_SIZE: 4
  MILESTONES: (130, 190)
  MAX_EPOCH: 250
  TEST_PERIOD: 10
  CHECKPOINT_PERIOD: 10
  RESUME: False
  RESUME_EPOCH: 101
  FREEZE_BERT: 40
  ONLY_IOU: 100
  SKIP_TEST: 0
TEST:
  NMS_THRESH: 0.4
  BATCH_SIZE: 16
  CONTRASTIVE_SCORE_POW: 0.3

2022-10-22 19:44:48,046 mmn INFO: Running with config:
DATALOADER:
  NUM_WORKERS: 12
DATASETS:
  NAME: tacos
  TEST: ('tacos_test',)
  TRAIN: ('tacos_train',)
INPUT:
  NUM_PRE_CLIPS: 256
MODEL:
  ARCHITECTURE: MMN
  DEVICE: cuda
  MMN:
    FEAT2D:
      NAME: pool
      POOLING_COUNTS: [15, 8, 8, 8]
    FEATPOOL:
      HIDDEN_SIZE: 512
      INPUT_SIZE: 4096
      KERNEL_SIZE: 2
    JOINT_SPACE_SIZE: 256
    LOSS:
      BCE_WEIGHT: 1
      CONTRASTIVE_WEIGHT: 0.1
      MARGIN: 0.1
      MAX_IOU: 0.7
      MIN_IOU: 0.3
      NEGATIVE_VIDEO_IOU: 0.5
      NUM_POSTIVE_VIDEO_PROPOSAL: 3
      PAIRWISE_SENT_WEIGHT: 0.0
      SENT_REMOVAL_IOU: 0.5
      TAU_SENT: 0.1
      TAU_VIDEO: 0.1
    NUM_CLIPS: 128
    PREDICTOR:
      HIDDEN_SIZE: 512
      KERNEL_SIZE: 5
      NUM_STACK_LAYERS: 8
    TEXT_ENCODER:
      NAME: BERT
OUTPUT_DIR: .
PATHS_CATALOG: ./MMN-main-official/MMN-main/mmn/config/paths_catalog.py
SOLVER:
  BATCH_SIZE: 4
  CHECKPOINT_PERIOD: 10
  FREEZE_BERT: 40
  LR: 0.0015
  MAX_EPOCH: 250
  MILESTONES: (130, 190)
  ONLY_IOU: 100
  RESUME: False
  RESUME_EPOCH: 101
  SKIP_TEST: 0
  TEST_PERIOD: 10
TEST:
  BATCH_SIZE: 16
  CONTRASTIVE_SCORE_POW: 0.3
  NMS_THRESH: 0.4
2022-10-22 19:44:48,046 mmn INFO: Saving config into: ./config.yml
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_projector.bias', 'vocab_layer_norm.weight', 'vocab_transform.bias', 'vocab_transform.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight']

  • This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2022-10-22 19:45:00,061 mmn.trainer INFO: Preparing data, please wait...
2022-10-22 19:45:30,962 mmn.trainer INFO: Preparing data, please wait...
2022-10-22 19:46:07,631 mmn.trainer INFO: Start training
2022-10-22 19:46:07,633 mmn.trainer INFO: Start epoch 1. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:46:07,633 mmn.trainer INFO: Using all losses
./MMN-main-official/MMN-main/mmn/data/datasets/utils.py:35: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /opt/conda/conda-bld/pytorch_1603728993639/work/torch/csrc/utils/python_arg_parser.cpp:882.)
grids = score2d.nonzero()
2022-10-22 19:46:43,818 mmn.trainer INFO: eta: 4:45:51 epoch: 1/250 iteration: 10/19 loss_vid: 0.74 loss_sent: 1.02 loss_iou: 0.22 time: 3.62 max mem: 5880
2022-10-22 19:47:13,833 mmn.trainer INFO: eta: 4:34:43 epoch: 1/250 iteration: 19/19 loss_vid: 0.72 loss_sent: 1.02 loss_iou: 0.14 time: 3.34 max mem: 5880
2022-10-22 19:47:14,036 mmn.trainer INFO: Start epoch 2. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:47:14,036 mmn.trainer INFO: Using all losses
2022-10-22 19:47:50,335 mmn.trainer INFO: eta: 4:38:39 epoch: 2/250 iteration: 10/19 loss_vid: 0.72 loss_sent: 1.02 loss_iou: 0.13 time: 3.65 max mem: 5880
2022-10-22 19:48:20,727 mmn.trainer INFO: eta: 4:35:03 epoch: 2/250 iteration: 19/19 loss_vid: 0.72 loss_sent: 1.02 loss_iou: 0.14 time: 3.38 max mem: 5880
2022-10-22 19:48:20,888 mmn.trainer INFO: Start epoch 3. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:48:20,888 mmn.trainer INFO: Using all losses
2022-10-22 19:48:57,038 mmn.trainer INFO: eta: 4:36:34 epoch: 3/250 iteration: 10/19 loss_vid: 0.72 loss_sent: 1.02 loss_iou: 0.14 time: 3.63 max mem: 5880
2022-10-22 19:49:28,093 mmn.trainer INFO: eta: 4:35:04 epoch: 3/250 iteration: 19/19 loss_vid: 0.72 loss_sent: 1.02 loss_iou: 0.13 time: 3.48 max mem: 5931
2022-10-22 19:49:28,227 mmn.trainer INFO: Start epoch 4. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:49:28,227 mmn.trainer INFO: Using all losses
2022-10-22 19:50:04,279 mmn.trainer INFO: eta: 4:35:40 epoch: 4/250 iteration: 10/19 loss_vid: 0.73 loss_sent: 1.02 loss_iou: 0.13 time: 3.62 max mem: 5931
2022-10-22 19:50:35,181 mmn.trainer INFO: eta: 4:34:14 epoch: 4/250 iteration: 19/19 loss_vid: 0.72 loss_sent: 1.02 loss_iou: 0.14 time: 3.45 max mem: 5931
2022-10-22 19:50:35,337 mmn.trainer INFO: Start epoch 5. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:50:35,337 mmn.trainer INFO: Using all losses
2022-10-22 19:51:11,027 mmn.trainer INFO: eta: 4:34:13 epoch: 5/250 iteration: 10/19 loss_vid: 0.72 loss_sent: 1.02 loss_iou: 0.14 time: 3.58 max mem: 5931
2022-10-22 19:51:42,102 mmn.trainer INFO: eta: 4:33:08 epoch: 5/250 iteration: 19/19 loss_vid: 0.72 loss_sent: 1.01 loss_iou: 0.13 time: 3.48 max mem: 5931
2022-10-22 19:51:42,273 mmn.trainer INFO: Start epoch 6. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:51:42,273 mmn.trainer INFO: Using all losses
2022-10-22 19:52:18,258 mmn.trainer INFO: eta: 4:33:15 epoch: 6/250 iteration: 10/19 loss_vid: 0.72 loss_sent: 1.00 loss_iou: 0.14 time: 3.62 max mem: 5931
2022-10-22 19:52:49,545 mmn.trainer INFO: eta: 4:32:24 epoch: 6/250 iteration: 19/19 loss_vid: 0.71 loss_sent: 0.99 loss_iou: 0.13 time: 3.48 max mem: 5931
2022-10-22 19:52:49,726 mmn.trainer INFO: Start epoch 7. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:52:49,727 mmn.trainer INFO: Using all losses
2022-10-22 19:53:26,213 mmn.trainer INFO: eta: 4:32:41 epoch: 7/250 iteration: 10/19 loss_vid: 0.71 loss_sent: 0.99 loss_iou: 0.14 time: 3.67 max mem: 5931
2022-10-22 19:53:56,666 mmn.trainer INFO: eta: 4:31:22 epoch: 7/250 iteration: 19/19 loss_vid: 0.70 loss_sent: 0.98 loss_iou: 0.14 time: 3.40 max mem: 5931
2022-10-22 19:53:56,998 mmn.trainer INFO: Start epoch 8. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:53:56,998 mmn.trainer INFO: Using all losses
2022-10-22 19:54:33,683 mmn.trainer INFO: eta: 4:31:43 epoch: 8/250 iteration: 10/19 loss_vid: 0.70 loss_sent: 0.98 loss_iou: 0.13 time: 3.70 max mem: 5931
2022-10-22 19:55:03,949 mmn.trainer INFO: eta: 4:30:23 epoch: 8/250 iteration: 19/19 loss_vid: 0.69 loss_sent: 0.97 loss_iou: 0.14 time: 3.38 max mem: 5931
2022-10-22 19:55:04,097 mmn.trainer INFO: Start epoch 9. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:55:04,097 mmn.trainer INFO: Using all losses
2022-10-22 19:55:40,197 mmn.trainer INFO: eta: 4:30:15 epoch: 9/250 iteration: 10/19 loss_vid: 0.69 loss_sent: 0.97 loss_iou: 0.14 time: 3.62 max mem: 5931
2022-10-22 19:56:10,485 mmn.trainer INFO: eta: 4:29:03 epoch: 9/250 iteration: 19/19 loss_vid: 0.69 loss_sent: 0.97 loss_iou: 0.13 time: 3.39 max mem: 5931
2022-10-22 19:56:10,831 mmn.trainer INFO: Start epoch 10. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:56:10,831 mmn.trainer INFO: Using all losses
2022-10-22 19:56:47,086 mmn.trainer INFO: eta: 4:29:01 epoch: 10/250 iteration: 10/19 loss_vid: 0.68 loss_sent: 0.97 loss_iou: 0.13 time: 3.66 max mem: 5931
2022-10-22 19:57:16,882 mmn.trainer INFO: eta: 4:27:41 epoch: 10/250 iteration: 19/19 loss_vid: 0.68 loss_sent: 0.96 loss_iou: 0.13 time: 3.32 max mem: 5931
2022-10-22 19:57:17,025 mmn.utils.checkpoint INFO: Saving checkpoint to ./pool_model_10e.pth
2022-10-22 19:57:21,256 mmn.inference INFO: Start evaluation on ('tacos_test',) dataset (Size: 25).
2022-10-22 19:57:41,447 mmn.inference INFO: Model inference time: 0:00:08.073524 (0.323 s / inference per device, on 1 devices)
2022-10-22 19:57:41,448 mmn.inference INFO: Performing TACoSDataset evaluation (Size: 25).

2022-10-22 20:00:38,870 mmn.inference INFO:
+-------------+-------------+-------------+-------------+-------------+-------------+
| R@1,IoU@0.1 | R@1,IoU@0.3 | R@1,IoU@0.5 | R@5,IoU@0.1 | R@5,IoU@0.3 | R@5,IoU@0.5 |
+-------------+-------------+-------------+-------------+-------------+-------------+
|    24.72    |    11.87    |    4.22     |    40.51    |    27.27    |    16.12    |
+-------------+-------------+-------------+-------------+-------------+-------------+
2022-10-22 20:00:38,930 mmn.trainer INFO: Start epoch 11. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:00:38,930 mmn.trainer INFO: Using all losses

2022-10-22 20:01:12,734 mmn.trainer INFO: eta: 5:43:11 epoch: 11/250 iteration: 10/19 loss_vid: 0.68 loss_sent: 0.96 loss_iou: 0.13 time: 23.59 max mem: 5931
2022-10-22 20:01:41,536 mmn.trainer INFO: eta: 5:38:11 epoch: 11/250 iteration: 19/19 loss_vid: 0.68 loss_sent: 0.97 loss_iou: 0.13 time: 3.22 max mem: 5931
2022-10-22 20:01:41,689 mmn.trainer INFO: Start epoch 12. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:01:41,690 mmn.trainer INFO: Using all losses
2022-10-22 20:02:16,402 mmn.trainer INFO: eta: 5:34:03 epoch: 12/250 iteration: 10/19 loss_vid: 0.68 loss_sent: 0.96 loss_iou: 0.13 time: 3.49 max mem: 5931
2022-10-22 20:02:45,657 mmn.trainer INFO: eta: 5:29:54 epoch: 12/250 iteration: 19/19 loss_vid: 0.67 loss_sent: 0.95 loss_iou: 0.13 time: 3.27 max mem: 5931
2022-10-22 20:02:45,784 mmn.trainer INFO: Start epoch 13. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:02:45,785 mmn.trainer INFO: Using all losses
2022-10-22 20:03:20,538 mmn.trainer INFO: eta: 5:26:21 epoch: 13/250 iteration: 10/19 loss_vid: 0.67 loss_sent: 0.96 loss_iou: 0.13 time: 3.49 max mem: 5931
2022-10-22 20:03:50,474 mmn.trainer INFO: eta: 5:22:56 epoch: 13/250 iteration: 19/19 loss_vid: 0.68 loss_sent: 0.96 loss_iou: 0.12 time: 3.34 max mem: 5931
2022-10-22 20:03:50,620 mmn.trainer INFO: Start epoch 14. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:03:50,620 mmn.trainer INFO: Using all losses
2022-10-22 20:04:26,146 mmn.trainer INFO: eta: 5:20:04 epoch: 14/250 iteration: 10/19 loss_vid: 0.68 loss_sent: 0.96 loss_iou: 0.13 time: 3.57 max mem: 5931
2022-10-22 20:04:55,851 mmn.trainer INFO: eta: 5:16:58 epoch: 14/250 iteration: 19/19 loss_vid: 0.66 loss_sent: 0.94 loss_iou: 0.12 time: 3.31 max mem: 5931
2022-10-22 20:04:55,989 mmn.trainer INFO: Start epoch 15. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:04:55,989 mmn.trainer INFO: Using all losses
2022-10-22 20:05:31,275 mmn.trainer INFO: eta: 5:14:22 epoch: 15/250 iteration: 10/19 loss_vid: 0.67 loss_sent: 0.95 loss_iou: 0.13 time: 3.54 max mem: 5931
2022-10-22 20:06:00,960 mmn.trainer INFO: eta: 5:11:35 epoch: 15/250 iteration: 19/19 loss_vid: 0.66 loss_sent: 0.94 loss_iou: 0.12 time: 3.31 max mem: 5931
2022-10-22 20:06:01,082 mmn.trainer INFO: Start epoch 16. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:06:01,082 mmn.trainer INFO: Using all losses
2022-10-22 20:06:36,527 mmn.trainer INFO: eta: 5:09:18 epoch: 16/250 iteration: 10/19 loss_vid: 0.67 loss_sent: 0.95 loss_iou: 0.12 time: 3.56 max mem: 5931
2022-10-22 20:07:06,416 mmn.trainer INFO: eta: 5:06:49 epoch: 16/250 iteration: 19/19 loss_vid: 0.66 loss_sent: 0.95 loss_iou: 0.12 time: 3.33 max mem: 5931
2022-10-22 20:07:06,545 mmn.trainer INFO: Start epoch 17. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:07:06,545 mmn.trainer INFO: Using all losses
2022-10-22 20:07:41,854 mmn.trainer INFO: eta: 5:04:43 epoch: 17/250 iteration: 10/19 loss_vid: 0.67 loss_sent: 0.94 loss_iou: 0.12 time: 3.54 max mem: 5931
2022-10-22 20:08:11,821 mmn.trainer INFO: eta: 5:02:29 epoch: 17/250 iteration: 19/19 loss_vid: 0.66 loss_sent: 0.94 loss_iou: 0.12 time: 3.34 max mem: 5931
2022-10-22 20:08:11,980 mmn.trainer INFO: Start epoch 18. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:08:11,980 mmn.trainer INFO: Using all losses
2022-10-22 20:08:47,296 mmn.trainer INFO: eta: 5:00:34 epoch: 18/250 iteration: 10/19 loss_vid: 0.67 loss_sent: 0.94 loss_iou: 0.12 time: 3.55 max mem: 5931
2022-10-22 20:09:17,322 mmn.trainer INFO: eta: 4:58:31 epoch: 18/250 iteration: 19/19 loss_vid: 0.67 loss_sent: 0.94 loss_iou: 0.13 time: 3.33 max mem: 5931
2022-10-22 20:09:17,477 mmn.trainer INFO: Start epoch 19. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:09:17,477 mmn.trainer INFO: Using all losses
2022-10-22 20:09:53,431 mmn.trainer INFO: eta: 4:56:54 epoch: 19/250 iteration: 10/19 loss_vid: 0.67 loss_sent: 0.94 loss_iou: 0.12 time: 3.61 max mem: 5931
2022-10-22 20:10:23,315 mmn.trainer INFO: eta: 4:54:58 epoch: 19/250 iteration: 19/19 loss_vid: 0.66 loss_sent: 0.94 loss_iou: 0.13 time: 3.32 max mem: 5931
2022-10-22 20:10:23,480 mmn.trainer INFO: Start epoch 20. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:10:23,480 mmn.trainer INFO: Using all losses
2022-10-22 20:10:58,675 mmn.trainer INFO: eta: 4:53:19 epoch: 20/250 iteration: 10/19 loss_vid: 0.66 loss_sent: 0.93 loss_iou: 0.12 time: 3.54 max mem: 5931
2022-10-22 20:11:28,663 mmn.trainer INFO: eta: 4:51:31 epoch: 20/250 iteration: 19/19 loss_vid: 0.65 loss_sent: 0.92 loss_iou: 0.12 time: 3.34 max mem: 5931
2022-10-22 20:11:28,826 mmn.utils.checkpoint INFO: Saving checkpoint to ./pool_model_20e.pth
2022-10-22 20:11:32,992 mmn.inference INFO: Start evaluation on ('tacos_test',) dataset (Size: 25).
2022-10-22 20:11:45,165 mmn.inference INFO: Model inference time: 0:00:08.189008 (0.328 s / inference per device, on 1 devices)
2022-10-22 20:11:45,167 mmn.inference INFO: Performing TACoSDataset evaluation (Size: 25).

2022-10-22 20:15:00,069 mmn.inference INFO:
+-------------+-------------+-------------+-------------+-------------+-------------+
| R@1,IoU@0.1 | R@1,IoU@0.3 | R@1,IoU@0.5 | R@5,IoU@0.1 | R@5,IoU@0.3 | R@5,IoU@0.5 |
+-------------+-------------+-------------+-------------+-------------+-------------+
|    29.59    |    17.15    |    7.75     |    62.36    |    46.69    |    27.92    |
+-------------+-------------+-------------+-------------+-------------+-------------+
2022-10-22 20:15:00,160 mmn.trainer INFO: Start epoch 21. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:15:00,161 mmn.trainer INFO: Using all losses

2022-10-22 20:15:36,223 mmn.trainer INFO: eta: 5:29:31 epoch: 21/250 iteration: 10/19 loss_vid: 0.66 loss_sent: 0.93 loss_iou: 0.12 time: 24.75 max mem: 5931
2022-10-22 20:16:06,322 mmn.trainer INFO: eta: 5:26:54 epoch: 21/250 iteration: 19/19 loss_vid: 0.65 loss_sent: 0.92 loss_iou: 0.12 time: 3.36 max mem: 5931
2022-10-22 20:16:06,514 mmn.trainer INFO: Start epoch 22. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:16:06,514 mmn.trainer INFO: Using all losses
2022-10-22 20:16:42,464 mmn.trainer INFO: eta: 5:24:34 epoch: 22/250 iteration: 10/19 loss_vid: 0.65 loss_sent: 0.91 loss_iou: 0.11 time: 3.61 max mem: 5931
2022-10-22 20:17:12,540 mmn.trainer INFO: eta: 5:22:07 epoch: 22/250 iteration: 19/19 loss_vid: 0.64 loss_sent: 0.91 loss_iou: 0.12 time: 3.36 max mem: 5931
2022-10-22 20:17:12,828 mmn.trainer INFO: Start epoch 23. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:17:12,829 mmn.trainer INFO: Using all losses
2022-10-22 20:17:49,070 mmn.trainer INFO: eta: 5:20:00 epoch: 23/250 iteration: 10/19 loss_vid: 0.64 loss_sent: 0.91 loss_iou: 0.12 time: 3.65 max mem: 5931
2022-10-22 20:18:20,236 mmn.trainer INFO: eta: 5:17:53 epoch: 23/250 iteration: 19/19 loss_vid: 0.65 loss_sent: 0.90 loss_iou: 0.11 time: 3.47 max mem: 5931
2022-10-22 20:18:20,538 mmn.trainer INFO: Start epoch 24. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:18:20,538 mmn.trainer INFO: Using all losses
2022-10-22 20:18:57,706 mmn.trainer INFO: eta: 5:16:04 epoch: 24/250 iteration: 10/19 loss_vid: 0.65 loss_sent: 0.90 loss_iou: 0.12 time: 3.75 max mem: 5931

The log of the second run:

nohup: ignoring input
2022-10-22 19:47:02,181 mmn INFO: Using 1 GPUs
2022-10-22 19:47:02,181 mmn INFO: Namespace(config_file='configs/pool_tacos_128x128_k5l8.yaml', distributed=False, local_rank=0, opts=[], skip_test=False)
2022-10-22 19:47:02,181 mmn INFO: Loaded configuration file configs/pool_tacos_128x128_k5l8.yaml
2022-10-22 19:47:02,181 mmn INFO:
MODEL:
  ARCHITECTURE: "MMN"
  MMN:
    NUM_CLIPS: 128
    JOINT_SPACE_SIZE: 256
    FEATPOOL:
      INPUT_SIZE: 4096
      HIDDEN_SIZE: 512
      KERNEL_SIZE: 2
    FEAT2D:
      NAME: "pool"
      POOLING_COUNTS: [15,8,8,8]
    TEXT_ENCODER:
      NAME: 'BERT'
    PREDICTOR:
      HIDDEN_SIZE: 512
      KERNEL_SIZE: 5
      NUM_STACK_LAYERS: 8
    LOSS:
      MIN_IOU: 0.3
      MAX_IOU: 0.7
      NUM_POSTIVE_VIDEO_PROPOSAL: 3
      NEGATIVE_VIDEO_IOU: 0.5
      SENT_REMOVAL_IOU: 0.5
      TAU_VIDEO: 0.1
      TAU_SENT: 0.1
      MARGIN: 0.1
      CONTRASTIVE_WEIGHT: 0.1
DATASETS:
  NAME: "tacos"
  TRAIN: ("tacos_train",)
  TEST: ("tacos_test",)
INPUT:
  NUM_PRE_CLIPS: 256
DATALOADER:
  NUM_WORKERS: 12
SOLVER:
  LR: 0.0015
  BATCH_SIZE: 4
  MILESTONES: (130, 190)
  MAX_EPOCH: 250
  TEST_PERIOD: 10
  CHECKPOINT_PERIOD: 10
  RESUME: False
  RESUME_EPOCH: 101
  FREEZE_BERT: 40
  ONLY_IOU: 100
  SKIP_TEST: 0
TEST:
  NMS_THRESH: 0.4
  BATCH_SIZE: 16
  CONTRASTIVE_SCORE_POW: 0.3

2022-10-22 19:47:02,182 mmn INFO: Running with config:
DATALOADER:
  NUM_WORKERS: 12
DATASETS:
  NAME: tacos
  TEST: ('tacos_test',)
  TRAIN: ('tacos_train',)
INPUT:
  NUM_PRE_CLIPS: 256
MODEL:
  ARCHITECTURE: MMN
  DEVICE: cuda
  MMN:
    FEAT2D:
      NAME: pool
      POOLING_COUNTS: [15, 8, 8, 8]
    FEATPOOL:
      HIDDEN_SIZE: 512
      INPUT_SIZE: 4096
      KERNEL_SIZE: 2
    JOINT_SPACE_SIZE: 256
    LOSS:
      BCE_WEIGHT: 1
      CONTRASTIVE_WEIGHT: 0.1
      MARGIN: 0.1
      MAX_IOU: 0.7
      MIN_IOU: 0.3
      NEGATIVE_VIDEO_IOU: 0.5
      NUM_POSTIVE_VIDEO_PROPOSAL: 3
      PAIRWISE_SENT_WEIGHT: 0.0
      SENT_REMOVAL_IOU: 0.5
      TAU_SENT: 0.1
      TAU_VIDEO: 0.1
    NUM_CLIPS: 128
    PREDICTOR:
      HIDDEN_SIZE: 512
      KERNEL_SIZE: 5
      NUM_STACK_LAYERS: 8
    TEXT_ENCODER:
      NAME: BERT
OUTPUT_DIR: .
PATHS_CATALOG: ./MMN-main-official/MMN-main/mmn/config/paths_catalog.py
SOLVER:
  BATCH_SIZE: 4
  CHECKPOINT_PERIOD: 10
  FREEZE_BERT: 40
  LR: 0.0015
  MAX_EPOCH: 250
  MILESTONES: (130, 190)
  ONLY_IOU: 100
  RESUME: False
  RESUME_EPOCH: 101
  SKIP_TEST: 0
  TEST_PERIOD: 10
TEST:
  BATCH_SIZE: 16
  CONTRASTIVE_SCORE_POW: 0.3
  NMS_THRESH: 0.4
2022-10-22 19:47:02,182 mmn INFO: Saving config into: ./config.yml
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_transform.bias', 'vocab_projector.weight', 'vocab_layer_norm.bias']

  • This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2022-10-22 19:47:24,821 mmn.trainer INFO: Preparing data, please wait...
2022-10-22 19:48:08,211 mmn.trainer INFO: Preparing data, please wait...
2022-10-22 19:48:27,384 mmn.trainer INFO: Start training
2022-10-22 19:48:27,386 mmn.trainer INFO: Start epoch 1. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:48:27,387 mmn.trainer INFO: Using all losses
./MMN-main-official/MMN-main/mmn/data/datasets/utils.py:35: UserWarning: This overload of nonzero is deprecated:
nonzero()
Consider using one of the following signatures instead:
nonzero(*, bool as_tuple) (Triggered internally at /opt/conda/conda-bld/pytorch_1603728993639/work/torch/csrc/utils/python_arg_parser.cpp:882.)
grids = score2d.nonzero()
2022-10-22 19:49:04,995 mmn.trainer INFO: eta: 4:57:05 epoch: 1/250 iteration: 10/19 loss_vid: 0.74 loss_sent: 1.02 loss_iou: 0.22 time: 3.76 max mem: 5880
2022-10-22 19:49:35,106 mmn.trainer INFO: eta: 4:41:01 epoch: 1/250 iteration: 19/19 loss_vid: 0.72 loss_sent: 1.02 loss_iou: 0.14 time: 3.35 max mem: 5880
2022-10-22 19:49:35,356 mmn.trainer INFO: Start epoch 2. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:49:35,356 mmn.trainer INFO: Using all losses
2022-10-22 19:50:11,703 mmn.trainer INFO: eta: 4:43:01 epoch: 2/250 iteration: 10/19 loss_vid: 0.72 loss_sent: 1.02 loss_iou: 0.13 time: 3.66 max mem: 5880
2022-10-22 19:50:42,019 mmn.trainer INFO: eta: 4:38:14 epoch: 2/250 iteration: 19/19 loss_vid: 0.72 loss_sent: 1.02 loss_iou: 0.14 time: 3.37 max mem: 5880
2022-10-22 19:50:42,314 mmn.trainer INFO: Start epoch 3. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:50:42,315 mmn.trainer INFO: Using all losses
2022-10-22 19:51:19,514 mmn.trainer INFO: eta: 4:41:01 epoch: 3/250 iteration: 10/19 loss_vid: 0.72 loss_sent: 1.02 loss_iou: 0.14 time: 3.75 max mem: 5880
2022-10-22 19:51:50,072 mmn.trainer INFO: eta: 4:38:07 epoch: 3/250 iteration: 19/19 loss_vid: 0.72 loss_sent: 1.02 loss_iou: 0.13 time: 3.40 max mem: 5931
2022-10-22 19:51:50,416 mmn.trainer INFO: Start epoch 4. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:51:50,416 mmn.trainer INFO: Using all losses
2022-10-22 19:52:27,834 mmn.trainer INFO: eta: 4:40:06 epoch: 4/250 iteration: 10/19 loss_vid: 0.73 loss_sent: 1.02 loss_iou: 0.13 time: 3.78 max mem: 5931
2022-10-22 19:52:58,100 mmn.trainer INFO: eta: 4:37:28 epoch: 4/250 iteration: 19/19 loss_vid: 0.72 loss_sent: 1.02 loss_iou: 0.14 time: 3.37 max mem: 5931
2022-10-22 19:52:58,351 mmn.trainer INFO: Start epoch 5. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:52:58,351 mmn.trainer INFO: Using all losses
2022-10-22 19:53:34,492 mmn.trainer INFO: eta: 4:37:35 epoch: 5/250 iteration: 10/19 loss_vid: 0.72 loss_sent: 1.02 loss_iou: 0.14 time: 3.64 max mem: 5931
2022-10-22 19:54:05,270 mmn.trainer INFO: eta: 4:35:56 epoch: 5/250 iteration: 19/19 loss_vid: 0.72 loss_sent: 1.01 loss_iou: 0.13 time: 3.43 max mem: 5931
2022-10-22 19:54:05,490 mmn.trainer INFO: Start epoch 6. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:54:05,490 mmn.trainer INFO: Using all losses
2022-10-22 19:54:42,282 mmn.trainer INFO: eta: 4:36:24 epoch: 6/250 iteration: 10/19 loss_vid: 0.72 loss_sent: 1.00 loss_iou: 0.14 time: 3.70 max mem: 5931
2022-10-22 19:55:12,733 mmn.trainer INFO: eta: 4:34:44 epoch: 6/250 iteration: 19/19 loss_vid: 0.71 loss_sent: 0.99 loss_iou: 0.13 time: 3.38 max mem: 5931
2022-10-22 19:55:13,059 mmn.trainer INFO: Start epoch 7. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:55:13,060 mmn.trainer INFO: Using all losses
2022-10-22 19:55:50,374 mmn.trainer INFO: eta: 4:35:26 epoch: 7/250 iteration: 10/19 loss_vid: 0.71 loss_sent: 0.99 loss_iou: 0.14 time: 3.76 max mem: 5931
2022-10-22 19:56:20,779 mmn.trainer INFO: eta: 4:33:53 epoch: 7/250 iteration: 19/19 loss_vid: 0.69 loss_sent: 0.98 loss_iou: 0.14 time: 3.38 max mem: 5931
2022-10-22 19:56:21,101 mmn.trainer INFO: Start epoch 8. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:56:21,101 mmn.trainer INFO: Using all losses
2022-10-22 19:56:57,951 mmn.trainer INFO: eta: 4:34:08 epoch: 8/250 iteration: 10/19 loss_vid: 0.70 loss_sent: 0.98 loss_iou: 0.13 time: 3.72 max mem: 5931
2022-10-22 19:57:27,451 mmn.trainer INFO: eta: 4:32:16 epoch: 8/250 iteration: 19/19 loss_vid: 0.69 loss_sent: 0.97 loss_iou: 0.14 time: 3.29 max mem: 5931
2022-10-22 19:57:27,599 mmn.trainer INFO: Start epoch 9. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:57:27,599 mmn.trainer INFO: Using all losses
2022-10-22 19:58:02,527 mmn.trainer INFO: eta: 4:31:28 epoch: 9/250 iteration: 10/19 loss_vid: 0.69 loss_sent: 0.97 loss_iou: 0.14 time: 3.51 max mem: 5931
2022-10-22 19:58:32,257 mmn.trainer INFO: eta: 4:29:57 epoch: 9/250 iteration: 19/19 loss_vid: 0.69 loss_sent: 0.97 loss_iou: 0.13 time: 3.32 max mem: 5931
2022-10-22 19:58:32,429 mmn.trainer INFO: Start epoch 10. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 19:58:32,429 mmn.trainer INFO: Using all losses
2022-10-22 19:59:07,513 mmn.trainer INFO: eta: 4:29:18 epoch: 10/250 iteration: 10/19 loss_vid: 0.68 loss_sent: 0.97 loss_iou: 0.13 time: 3.53 max mem: 5931
2022-10-22 19:59:37,206 mmn.trainer INFO: eta: 4:27:55 epoch: 10/250 iteration: 19/19 loss_vid: 0.68 loss_sent: 0.96 loss_iou: 0.13 time: 3.30 max mem: 5931
2022-10-22 19:59:37,339 mmn.utils.checkpoint INFO: Saving checkpoint to ./pool_model_10e.pth
2022-10-22 19:59:41,130 mmn.inference INFO: Start evaluation on ('tacos_test',) dataset (Size: 25).
2022-10-22 19:59:52,750 mmn.inference INFO: Model inference time: 0:00:08.141408 (0.326 s / inference per device, on 1 devices)
2022-10-22 19:59:52,751 mmn.inference INFO: Performing TACoSDataset evaluation (Size: 25).

2022-10-22 20:02:55,217 mmn.inference INFO:
+-------------+-------------+-------------+-------------+-------------+-------------+
| R@1,IoU@0.1 | R@1,IoU@0.3 | R@1,IoU@0.5 | R@5,IoU@0.1 | R@5,IoU@0.3 | R@5,IoU@0.5 |
+-------------+-------------+-------------+-------------+-------------+-------------+
|    23.07    |    10.25    |    3.95     |    40.14    |    25.72    |    16.30    |
+-------------+-------------+-------------+-------------+-------------+-------------+
2022-10-22 20:02:55,295 mmn.trainer INFO: Start epoch 11. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:02:55,295 mmn.trainer INFO: Using all losses

2022-10-22 20:03:29,082 mmn.trainer INFO: eta: 5:41:53 epoch: 11/250 iteration: 10/19 loss_vid: 0.69 loss_sent: 0.96 loss_iou: 0.13 time: 23.19 max mem: 5931
2022-10-22 20:03:58,041 mmn.trainer INFO: eta: 5:37:00 epoch: 11/250 iteration: 19/19 loss_vid: 0.68 loss_sent: 0.97 loss_iou: 0.13 time: 3.23 max mem: 5931
2022-10-22 20:03:58,177 mmn.trainer INFO: Start epoch 12. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:03:58,177 mmn.trainer INFO: Using all losses
2022-10-22 20:04:33,011 mmn.trainer INFO: eta: 5:32:58 epoch: 12/250 iteration: 10/19 loss_vid: 0.68 loss_sent: 0.96 loss_iou: 0.13 time: 3.50 max mem: 5931
2022-10-22 20:05:02,451 mmn.trainer INFO: eta: 5:28:55 epoch: 12/250 iteration: 19/19 loss_vid: 0.67 loss_sent: 0.95 loss_iou: 0.13 time: 3.29 max mem: 5931
2022-10-22 20:05:02,603 mmn.trainer INFO: Start epoch 13. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:05:02,603 mmn.trainer INFO: Using all losses
2022-10-22 20:05:37,639 mmn.trainer INFO: eta: 5:25:31 epoch: 13/250 iteration: 10/19 loss_vid: 0.67 loss_sent: 0.96 loss_iou: 0.13 time: 3.52 max mem: 5931
2022-10-22 20:06:07,290 mmn.trainer INFO: eta: 5:22:02 epoch: 13/250 iteration: 19/19 loss_vid: 0.68 loss_sent: 0.95 loss_iou: 0.12 time: 3.32 max mem: 5931
2022-10-22 20:06:07,430 mmn.trainer INFO: Start epoch 14. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:06:07,431 mmn.trainer INFO: Using all losses
2022-10-22 20:06:42,861 mmn.trainer INFO: eta: 5:19:11 epoch: 14/250 iteration: 10/19 loss_vid: 0.68 loss_sent: 0.96 loss_iou: 0.12 time: 3.56 max mem: 5931
2022-10-22 20:07:12,458 mmn.trainer INFO: eta: 5:16:05 epoch: 14/250 iteration: 19/19 loss_vid: 0.66 loss_sent: 0.94 loss_iou: 0.12 time: 3.30 max mem: 5931
2022-10-22 20:07:12,610 mmn.trainer INFO: Start epoch 15. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:07:12,610 mmn.trainer INFO: Using all losses
2022-10-22 20:07:47,929 mmn.trainer INFO: eta: 5:13:32 epoch: 15/250 iteration: 10/19 loss_vid: 0.67 loss_sent: 0.95 loss_iou: 0.13 time: 3.55 max mem: 5931
2022-10-22 20:08:17,514 mmn.trainer INFO: eta: 5:10:45 epoch: 15/250 iteration: 19/19 loss_vid: 0.67 loss_sent: 0.95 loss_iou: 0.12 time: 3.30 max mem: 5931
2022-10-22 20:08:17,685 mmn.trainer INFO: Start epoch 16. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:08:17,685 mmn.trainer INFO: Using all losses
2022-10-22 20:08:53,066 mmn.trainer INFO: eta: 5:08:29 epoch: 16/250 iteration: 10/19 loss_vid: 0.67 loss_sent: 0.95 loss_iou: 0.12 time: 3.56 max mem: 5931
2022-10-22 20:09:22,859 mmn.trainer INFO: eta: 5:06:01 epoch: 16/250 iteration: 19/19 loss_vid: 0.66 loss_sent: 0.94 loss_iou: 0.13 time: 3.32 max mem: 5931
2022-10-22 20:09:23,018 mmn.trainer INFO: Start epoch 17. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:09:23,019 mmn.trainer INFO: Using all losses
2022-10-22 20:09:58,336 mmn.trainer INFO: eta: 5:03:57 epoch: 17/250 iteration: 10/19 loss_vid: 0.66 loss_sent: 0.93 loss_iou: 0.12 time: 3.55 max mem: 5931
2022-10-22 20:10:28,255 mmn.trainer INFO: eta: 5:01:43 epoch: 17/250 iteration: 19/19 loss_vid: 0.65 loss_sent: 0.93 loss_iou: 0.12 time: 3.33 max mem: 5931
2022-10-22 20:10:28,408 mmn.trainer INFO: Start epoch 18. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:10:28,408 mmn.trainer INFO: Using all losses
2022-10-22 20:11:03,819 mmn.trainer INFO: eta: 4:59:52 epoch: 18/250 iteration: 10/19 loss_vid: 0.66 loss_sent: 0.94 loss_iou: 0.12 time: 3.56 max mem: 5931
2022-10-22 20:11:33,739 mmn.trainer INFO: eta: 4:57:48 epoch: 18/250 iteration: 19/19 loss_vid: 0.67 loss_sent: 0.94 loss_iou: 0.13 time: 3.32 max mem: 5931
2022-10-22 20:11:33,890 mmn.trainer INFO: Start epoch 19. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:11:33,890 mmn.trainer INFO: Using all losses
2022-10-22 20:12:09,617 mmn.trainer INFO: eta: 4:56:09 epoch: 19/250 iteration: 10/19 loss_vid: 0.67 loss_sent: 0.94 loss_iou: 0.12 time: 3.59 max mem: 5931
2022-10-22 20:12:39,262 mmn.trainer INFO: eta: 4:54:11 epoch: 19/250 iteration: 19/19 loss_vid: 0.66 loss_sent: 0.94 loss_iou: 0.13 time: 3.30 max mem: 5931
2022-10-22 20:12:39,407 mmn.trainer INFO: Start epoch 20. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:12:39,407 mmn.trainer INFO: Using all losses
2022-10-22 20:13:14,387 mmn.trainer INFO: eta: 4:52:31 epoch: 20/250 iteration: 10/19 loss_vid: 0.65 loss_sent: 0.92 loss_iou: 0.12 time: 3.51 max mem: 5931
2022-10-22 20:13:44,687 mmn.trainer INFO: eta: 4:50:48 epoch: 20/250 iteration: 19/19 loss_vid: 0.65 loss_sent: 0.92 loss_iou: 0.12 time: 3.37 max mem: 5931
2022-10-22 20:13:44,846 mmn.utils.checkpoint INFO: Saving checkpoint to ./pool_model_20e.pth
2022-10-22 20:13:56,710 mmn.inference INFO: Start evaluation on ('tacos_test',) dataset (Size: 25).
2022-10-22 20:14:10,127 mmn.inference INFO: Model inference time: 0:00:08.545635 (0.342 s / inference per device, on 1 devices)
2022-10-22 20:14:10,128 mmn.inference INFO: Performing TACoSDataset evaluation (Size: 25).

2022-10-22 20:18:13,563 mmn.inference INFO:
+-------------+-------------+-------------+-------------+-------------+-------------+
| R@1,IoU@0.1 | R@1,IoU@0.3 | R@1,IoU@0.5 | R@5,IoU@0.1 | R@5,IoU@0.3 | R@5,IoU@0.5 |
+-------------+-------------+-------------+-------------+-------------+-------------+
|    25.27    |    14.00    |    6.80     |    53.34    |    39.77    |    24.29    |
+-------------+-------------+-------------+-------------+-------------+-------------+
2022-10-22 20:18:13,640 mmn.trainer INFO: Start epoch 21. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:18:13,640 mmn.trainer INFO: Using all losses

2022-10-22 20:18:48,761 mmn.trainer INFO: eta: 5:39:22 epoch: 21/250 iteration: 10/19 loss_vid: 0.66 loss_sent: 0.93 loss_iou: 0.12 time: 30.41 max mem: 5931
2022-10-22 20:19:18,079 mmn.trainer INFO: eta: 5:36:21 epoch: 21/250 iteration: 19/19 loss_vid: 0.64 loss_sent: 0.91 loss_iou: 0.12 time: 3.27 max mem: 5931
2022-10-22 20:19:18,263 mmn.trainer INFO: Start epoch 22. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:19:18,263 mmn.trainer INFO: Using all losses
2022-10-22 20:19:53,895 mmn.trainer INFO: eta: 5:33:42 epoch: 22/250 iteration: 10/19 loss_vid: 0.65 loss_sent: 0.91 loss_iou: 0.11 time: 3.58 max mem: 5931
2022-10-22 20:20:24,019 mmn.trainer INFO: eta: 5:31:03 epoch: 22/250 iteration: 19/19 loss_vid: 0.64 loss_sent: 0.91 loss_iou: 0.12 time: 3.35 max mem: 5931
2022-10-22 20:20:24,154 mmn.trainer INFO: Start epoch 23. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:20:24,154 mmn.trainer INFO: Using all losses
2022-10-22 20:20:59,735 mmn.trainer INFO: eta: 5:28:35 epoch: 23/250 iteration: 10/19 loss_vid: 0.64 loss_sent: 0.91 loss_iou: 0.12 time: 3.57 max mem: 5931
2022-10-22 20:21:30,406 mmn.trainer INFO: eta: 5:26:11 epoch: 23/250 iteration: 19/19 loss_vid: 0.65 loss_sent: 0.91 loss_iou: 0.12 time: 3.41 max mem: 5931
2022-10-22 20:21:30,752 mmn.trainer INFO: Start epoch 24. base_lr=1.5e-03, bert_lr=1.5e-04, bert.requires_grad=False
2022-10-22 20:21:30,752 mmn.trainer INFO: Using all losses
2022-10-22 20:22:07,630 mmn.trainer INFO: eta: 5:24:07 epoch: 24/250 iteration: 10/19 loss_vid: 0.64 loss_sent: 0.90 loss_iou: 0.12 time: 3.72 max mem: 5931

In addition, I'm not sure whether the environment has an influence. My PyTorch version is 1.7.0, Python version is 3.6.6, and CUDA version is 10.2. I also tested in an environment with PyTorch 1.10.1, Python 3.6.6, and CUDA 11.4, but I could not get the same results there either.

@zhenzhiwang
Collaborator

Thank you for the detailed log. I believe the randomness in my code is not completely removed, although I manually set the random seed (seed = 25285) in the code. I have been busy recently, so I am afraid I cannot investigate the whole codebase thoroughly to find where the randomness comes from.
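
For reference, fully seeding a PyTorch run usually takes more than a single call; the usual checklist looks something like this (a sketch of a fully seeded setup, not necessarily what the current code does):

    import random

    import numpy as np
    import torch

    seed = 25285
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG
    torch.manual_seed(seed)           # CPU RNG
    torch.cuda.manual_seed_all(seed)  # RNGs on every GPU
    torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # disable autotuning, which can vary across runs

Even with all of these set, a few CUDA ops remain nondeterministic, so small run-to-run differences can still persist.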

The randomness of negative sample selection in contrastive learning can lead to dramatic differences in gradients and affect the training process (especially on the TACoS dataset, since its videos commonly have 100+ sentences each). I think differences in evaluation metrics during training are not important, as long as the final results after convergence are similar across trainings. Please compare the final results of your multiple trainings, and also compare them with the result reported in the paper. It will not be a problem if there is no huge performance gap; just in case there is one, I will check it later. However, I believe there is no huge gap, because the provided model parameters were trained with exactly this version of the code.
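
One concrete place worth checking later is the multi-worker data loading (NUM_WORKERS: 12): unless the DataLoader's generator and workers are seeded, the shuffle order and any sampling done inside workers will differ between runs. A sketch of the standard recipe (the dataset below is a stand-in for illustration, not the repo's actual loader builder):

    import random

    import numpy as np
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def seed_worker(worker_id):
        # Derive per-worker seeds from torch's initial seed so that any
        # randomness inside workers (e.g., negative sampling) is reproducible.
        worker_seed = torch.initial_seed() % 2**32
        np.random.seed(worker_seed)
        random.seed(worker_seed)

    g = torch.Generator()
    g.manual_seed(25285)  # fixes the shuffle order across runs

    dataset = TensorDataset(torch.arange(100, dtype=torch.float32))  # stand-in dataset
    loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=12,
                        worker_init_fn=seed_worker, generator=g)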

@LLLddddd
Author

LLLddddd commented Oct 23, 2022

Thanks a lot!

If you have free time later, I would appreciate it if you could look into the randomness in the code.
