
Have the same accuracy #11

Closed
Mrbishuai opened this issue Aug 10, 2021 · 6 comments

Comments

@Mrbishuai

Hello Li Tao, thanks for your great work.
I'm sorry to bother you, but this problem has been bothering me for a long time.
First, I run train_ssl.py for pre-training with --neg set to repeat, then run ft_classify.py to fine-tune.
Second, I run train_ssl.py for pre-training with --neg set to shuffle, all other settings the same, then fine-tune with ft_classify.py.
Why do the two runs give the same accuracy and the same best_model? I don't understand this. Can you help me analyze what's wrong?

@BestJuly
Owner

Hi @Mrbishuai , thank you for your interest.

That does not sound reasonable, because the settings are different.
Could you provide more detailed information? For example:

  1. Have you checked whether, in your version of the code, the input data is the same under the different --neg settings?
  2. Have you checked the retrieval accuracies of the two self-supervised models (repeat vs. shuffle)?

Because you mention identical accuracy and ft_classify.py, I guess one cause might be the model-loading part. I fixed the random seed, so if you fine-tune without successfully loading the pretrained weights, the results will be the same for every run.
For the loading part, you can change model.load_state_dict(xxx, strict=False) to model.load_state_dict(xxx, strict=True) to check which layers have been successfully loaded. It should raise an error because, for classification, the fully-connected layer is newly added. You can also uncomment this to check.
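The strict=True check described above boils down to comparing two sets of parameter names. A minimal sketch outside the repository's code (the helper name diff_state_dicts and the toy key names are hypothetical, not the real R3DNet layers):

```python
# Sketch of what a strict load verifies: the checkpoint's parameter names
# must exactly match the model's. Key names below are placeholders.

def diff_state_dicts(model_keys, ckpt_keys):
    """Return (missing, unexpected) parameter names a strict load would reject."""
    missing = sorted(set(model_keys) - set(ckpt_keys))      # in model, absent from checkpoint
    unexpected = sorted(set(ckpt_keys) - set(model_keys))   # in checkpoint, unknown to model
    return missing, unexpected

model_keys = ["conv1.weight", "bn1.weight", "linear.weight"]  # classifier adds a new FC layer
ckpt_keys = ["conv1.weight", "bn1.weight"]                    # SSL checkpoint has no FC layer
print(diff_state_dicts(model_keys, ckpt_keys))  # (['linear.weight'], [])
```

Only the newly added FC layer should show up as missing when loading an SSL checkpoint; a much longer missing-key list suggests the weights were not loaded at all.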

@Mrbishuai
Author

Hi @BestJuly, thank you for your reply!
I think the pre-training part is OK: when I run train_ssl.py with different settings of --neg (repeat or shuffle), the displayed parameters are different.
I also suspect there is a problem loading the model.

When I download your pre-trained model r3d_res_repeat_cls.pth and test it directly, the code is:

pretrained_weights = torch.load(args.ckpt, map_location='cpu')
if args.mode == 'train':
    model.load_state_dict(pretrained_weights, strict=False)
else:
    model.load_state_dict(pretrained_weights, strict=True)

When I run ft_classify.py as you provided to generate best_model and then test it, the code has to be modified to:

pretrained_weights = torch.load(args.ckpt, map_location='cpu')
if args.mode == 'train':
    model.load_state_dict(pretrained_weights, strict=False)
else:
    model.load_state_dict(pretrained_weights['model'], strict=True)

After carefully analyzing their parameters, I found that in your pre-trained model r3d_res_repeat_cls.pth the checkpoint's class is collections.OrderedDict and its length is 74, while my generated best_model.pt is a plain dict of length 1.
I don't know if there's a problem here.
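The structural difference described here can be reproduced with plain Python objects: a checkpoint saved directly is the state_dict itself (an OrderedDict with one entry per parameter, 74 in this case), while a checkpoint wrapped under a 'model' key is a length-1 dict. A toy illustration (placeholder names and values, not the real parameters):

```python
from collections import OrderedDict

# Toy stand-ins for the two checkpoint layouts; names and values are placeholders.
raw_state_dict = OrderedDict([("conv1.weight", "tensor"), ("bn1.weight", "tensor")])
wrapped_ckpt = {"model": raw_state_dict}  # saved as {'model': model.state_dict()}

print(type(raw_state_dict).__name__, len(raw_state_dict))  # OrderedDict 2 (74 in the real file)
print(type(wrapped_ckpt).__name__, len(wrapped_ckpt))      # dict 1
```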

@BestJuly
Owner

BestJuly commented Aug 12, 2021

@Mrbishuai This might be caused by different versions of the saving code.

I am not sure whether I used the same code for the checkpoint I provided.
(In the current situation, it is quite possible that the provided checkpoint was generated by different code.)

There are different options for saving model parameters:

# option 1
state = {'model': model.state_dict()}
torch.save(state, PATH)

# option 2
torch.save(model.state_dict(), PATH)

You should use model.load_state_dict(pretrained_weights['model'], strict=True) for option 1 and model.load_state_dict(pretrained_weights, strict=True) for option 2.
This also explains the different types you saw: a plain dict for option 1 and a collections.OrderedDict for option 2.

strict=True is used for loading fine-tuned models and strict=False for loading SSL models, because the model definitions differ.
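A loader can tolerate both saving options by unwrapping the 'model' key when it is present. A minimal sketch of that idea (the helper name unwrap_checkpoint is hypothetical, not part of the repository):

```python
def unwrap_checkpoint(ckpt):
    """Return the bare state_dict for either saving option.

    Option 1 saves {'model': state_dict}; option 2 saves the state_dict directly.
    Real parameter names are dotted (e.g. 'conv1.weight'), so a top-level
    'model' key is a reasonable sign of the option-1 wrapper.
    """
    if isinstance(ckpt, dict) and "model" in ckpt:
        return ckpt["model"]
    return ckpt

state_dict = {"conv1.weight": 0, "bn1.weight": 1}
assert unwrap_checkpoint(state_dict) is state_dict             # option 2: already bare
assert unwrap_checkpoint({"model": state_dict}) is state_dict  # option 1: unwrapped
```

With this helper, the same load call works for both checkpoint layouts.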


@Mrbishuai
Author

Hi @BestJuly.
When I modify the code from model.load_state_dict(xxx, strict=False) to model.load_state_dict(xxx, strict=True), there is an error:

Traceback (most recent call last):
  File "ft_classify.py", line 238, in <module>
    model.load_state_dict(pretrained_weights, strict=True)
  File "/home/bishuai/anaconda3/envs/IIC/lib/python3.7/site-packages/torch/nn/modules/module.py", line 839, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for R3DNet:
Missing key(s) in state_dict: "conv1.temporal_spatial_conv.weight", "bn1.weight", "bn1.bias", "bn1.running_mean", "bn1.running_var", "conv2.block1.conv1.temporal_spatial_conv.weight", "conv2.block1.bn1.weight", "conv2.block1.bn1.bias", "conv2.block1.bn1.running_mean", "conv2.block1.bn1.running_var", "conv2.block1.conv2.temporal_spatial_conv.weight", "conv2.block1.bn2.weight", "conv2.block1.bn2.bias", "conv2.block1.bn2.running_mean", "conv2.block1.bn2.running_var", "conv3.block1.downsampleconv.temporal_spatial_conv.weight", "conv3.block1.downsamplebn.weight", "conv3.block1.downsamplebn.bias", "conv3.block1.downsamplebn.running_mean", "conv3.block1.downsamplebn.running_var", "conv3.block1.conv1.temporal_spatial_conv.weight", "conv3.block1.bn1.weight", "conv3.block1.bn1.bias", "conv3.block1.bn1.running_mean", "conv3.block1.bn1.running_var", "conv3.block1.conv2.temporal_spatial_conv.weight", "conv3.block1.bn2.weight", "conv3.block1.bn2.bias", "conv3.block1.bn2.running_mean", "conv3.block1.bn2.running_var", "conv4.block1.downsampleconv.temporal_spatial_conv.weight", "conv4.block1.downsamplebn.weight", "conv4.block1.downsamplebn.bias", "conv4.block1.downsamplebn.running_mean", "conv4.block1.downsamplebn.running_var", "conv4.block1.conv1.temporal_spatial_conv.weight", "conv4.block1.bn1.weight", "conv4.block1.bn1.bias", "conv4.block1.bn1.running_mean", "conv4.block1.bn1.running_var", "conv4.block1.conv2.temporal_spatial_conv.weight", "conv4.block1.bn2.weight", "conv4.block1.bn2.bias", "conv4.block1.bn2.running_mean", "conv4.block1.bn2.running_var", "conv5.block1.downsampleconv.temporal_spatial_conv.weight", "conv5.block1.downsamplebn.weight", "conv5.block1.downsamplebn.bias", "conv5.block1.downsamplebn.running_mean", "conv5.block1.downsamplebn.running_var", "conv5.block1.conv1.temporal_spatial_conv.weight", "conv5.block1.bn1.weight", "conv5.block1.bn1.bias", "conv5.block1.bn1.running_mean", "conv5.block1.bn1.running_var", "conv5.block1.conv2.temporal_spatial_conv.weight", 
"conv5.block1.bn2.weight", "conv5.block1.bn2.bias", "conv5.block1.bn2.running_mean", "conv5.block1.bn2.running_var", "linear.weight", "linear.bias".

@BestJuly
Owner

BestJuly commented Aug 13, 2021

Hi @Mrbishuai,
Because the model architectures are the same, you need to check the parameter names and the saving structures I mentioned.

For example, you may meet this problem when the name of each layer contains base_network; in that case the error should report both missing keys and unexpected keys, because the names differ.
I am not sure whether your error report also contained such information.
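If the only mismatch is such a naming prefix, renaming the checkpoint's keys before loading fixes a strict load. A sketch (strip_prefix is a hypothetical helper; base_network. is the example prefix from above):

```python
def strip_prefix(state_dict, prefix="base_network."):
    """Drop a wrapper prefix from parameter names so they match the bare backbone."""
    return {k[len(prefix):] if k.startswith(prefix) else k: v
            for k, v in state_dict.items()}

ckpt = {"base_network.conv1.weight": 1, "base_network.bn1.weight": 2}
print(strip_prefix(ckpt))  # {'conv1.weight': 1, 'bn1.weight': 2}
```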

Also, I want to mention again that there are two options

# option 1
## save checkpoint
state = {'model': model.state_dict(),}
torch.save(state, PATH)
## load checkpoint
model.load_state_dict(pretrained_weights['model'], strict=True)

# option 2
## save checkpoint
torch.save(model.state_dict(), PATH)
## load checkpoint
model.load_state_dict(pretrained_weights, strict=True)

If you use the loading part of option 2 to load a model saved with option 1, errors will be raised.

The messages you should expect are:

  1. If you load an SSL pre-trained model with strict=True, an error will be raised, but it should only mention the mismatched fully-connected layers; you can then set strict=False and start your fine-tuning process.
  2. If you load my provided classification model, no errors should occur. (Because I do not remember which option I used when training the provided model, please try both.)

@BestJuly BestJuly closed this as completed Nov 8, 2023