questions of training details of coclr #14

Closed
YuqiHUO opened this issue Dec 3, 2020 · 6 comments

Comments

@YuqiHUO

YuqiHUO commented Dec 3, 2020

Hi, I'm trying to replicate your result at the alternation stage. I'm now using the two init models you provided (both ~400 epochs). I have two questions.

1). According to your paper, "At the alternation stage, on UCF101 the model is trained for two cycles, where each cycle includes 200 epochs, i.e. RGB and Flow networks are each trained for 100 epochs". Does that mean I need to run main_coclr.py four times, each time for 100 epochs and with the newest two pretrained models from the previous training run?

2). If so, what learning rate do you use in each of the four 100-epoch runs? I also checked the CoCLR pretrained models you provided; it seems that at epoch 182 and epoch 109 the lr is 1e-4. Does that mean I need to train the second cycle with a larger lr, e.g. 1e-2, and decay down to 1e-4?

Best Regards,
Yuqi

@TengdaHan
Owner

Hi,

  1. Yes, exactly. I have updated some commands in the README file.
  2. Each time I run main_coclr.py I use Adam, with lr starting from 1e-3, then decayed once (a sketch of this schedule follows below).
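
For readers following along, here is a minimal PyTorch sketch of that schedule. The 10x decay factor is an assumption (though it is consistent with the lr of 1e-4 Yuqi observed in the released checkpoints), and the epoch-80 milestone comes from the --schedule 80 flag in the command shown later in this thread; this is not the repo's actual training loop.

import torch

# Stand-in model; the real runs train the S3D encoders inside main_coclr.py.
model = torch.nn.Linear(128, 128)

# Adam starting at 1e-3, decayed once (assumed 10x, i.e. down to 1e-4) at the
# epoch-80 milestone, matching --epochs 100 --schedule 80.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80], gamma=0.1)

for epoch in range(100):   # one 100-epoch alternation run
    # ... train one epoch ...
    scheduler.step()       # lr: 1e-3 for epochs 0-79, 1e-4 afterwards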

@junmin98

junmin98 commented Dec 29, 2020


Hi!
I'm also at the alternation stage, but I have a question.
Do you see messages like "Weights not used from pretrained file:" or "Weights not loaded into new model:"?
If not, can you share the command you entered in the terminal, e.g. the "CUDA_VISIBLE_DEVICES..." line?

Also, what accuracy does Cycle1-FlowMining reach?
When I run it, the accuracy never exceeds 1. Is that normal for Cycle 1?

I would be grateful if you could share your experience!

@YuqiHUO
Author

YuqiHUO commented Dec 29, 2020


Hi,

  1. I saw your other issue. During the alternation stage, the encoders/samplers load their params from the pretrained files, while the queues and ptrs are initialized randomly. That's why you saw "Weights not loaded into new model: queue... / queue..." (a small loading sketch follows this comment).
    I have changed the code a lot; for reference, here is my FlowMining command:
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 main_coclr.py \
--net s3d --topk 5 --moco-k 2048 --seq_len 32 --ds 1 \
--dataset ucf101-2stream-2clip \
--batch_size 16 -j 8 \
--epochs 100 --schedule 80 \
--prefix Cycle1-FlowMining_ \
--pretrain log-pretrain/ep399.pth.tar log-pretrain/ep396.pth.tar
  2. "acc" in the CoCLR alternation stage means the top-1 accuracy on the pretext task, "multi-positive instance discrimination". So it can never exceed 1, which would mean 100% accuracy. In my experience you should get a number between 82% and 92%.
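
To illustrate item 1 above: a small sketch (hypothetical module and key names, not the repo's actual loading code) of why those messages are harmless. Loading a checkpoint with strict=False fills only the matching keys, so the MoCo-style queue and pointer buffers keep their fresh initialization.

import torch

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = torch.nn.Linear(128, 128)
        # MoCo-style buffers, meant to start fresh in every alternation run
        self.register_buffer('queue', torch.randn(128, 2048))
        self.register_buffer('queue_ptr', torch.zeros(1, dtype=torch.long))

model = Net()
# Pretend checkpoint containing only encoder weights
state = {'encoder.weight': torch.randn(128, 128), 'encoder.bias': torch.zeros(128)}

# strict=False loads the matching keys and reports the rest, which is what
# the "Weights not loaded into new model: queue..." message corresponds to.
missing, unexpected = model.load_state_dict(state, strict=False)
print(missing)  # ['queue', 'queue_ptr']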

@junmin98

Thank you so much!

Your answer helped me a lot!
I'll keep going.

@junmin98

junmin98 commented Jan 1, 2021

Thanks to you, I am doing well. But may I ask one more question?

After completing Cycle 1, I proceeded to Cycle 2, and that is where the problem is.

First, in the Cycle1-FlowMining run, the best checkpoint I get is around epoch 4 or 5. Was your best epoch in Cycle1-FlowMining also around 4 or 5?

I used that checkpoint to proceed with Cycle2-RGBMining, and after finishing Cycle 2 I ran the downstream evaluation.
The action recognition result was acc@1 = about 43% and acc@5 = only about 73% (very low..).

Second, when you proceed to Cycle 2, do you have to set start_epoch=101 and epoch=200?
When I ran Cycle 2 as described above, the accuracy came out the same.

@YuqiHUO
Author

YuqiHUO commented Jan 1, 2021

  1. The reason the best result is at epoch 4 or 5 is that the pretext task's accuracy first goes down and then back up. But the real 'best' epoch in the first cycle should be Ep99 (I think), so you should use Ep99 to proceed to Cycle2-RGBMining.
  2. I didn't use start_epoch=101 and epoch=200; I just set a total of 100 epochs (but you can regard it as epochs 101-200). I don't think the epoch setting matters for the final result.
