Hello, and thanks for your work. I have a question about the "on-the-fly" linear evaluation you run after each pre-training epoch. In the original TensorFlow implementation, the gradient is stopped during linear-evaluation training so that label information is not backpropagated into the ResNet-50. In PyTorch, this can be done with detach(). However, I see that you only use a separate optimizer for the linear-evaluation layer and register just that layer's parameters in it, which means only those parameters will be updated. That makes sense, but have you verified that this is sufficient? That is, does using a separate optimizer guarantee that gradients do not flow into the ResNet-50?
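For reference, here is a minimal sketch of the distinction being asked about. The module names are hypothetical stand-ins (a small nn.Linear plays the role of the ResNet-50 encoder), not the repo's actual code. It shows that a separate optimizer controls which parameters are *updated*, while detach() controls where gradients *flow*: without detach(), the backward pass still populates the backbone's .grad buffers even though the linear-eval optimizer never applies them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins: a tiny "backbone" in place of the ResNet-50,
# and a linear-evaluation head trained on top of its features.
backbone = nn.Linear(8, 4)
linear_eval = nn.Linear(4, 2)

# Separate optimizer that only knows about the linear-eval parameters,
# mirroring the setup described in the question.
opt_eval = torch.optim.SGD(linear_eval.parameters(), lr=0.1)

x = torch.randn(16, 8)
y = torch.randint(0, 2, (16,))

# Case 1: no detach(). opt_eval will never *update* the backbone, but the
# backward pass still deposits gradients into the backbone's parameters.
logits = linear_eval(backbone(x))
F.cross_entropy(logits, y).backward()
print(backbone.weight.grad is None)  # False: gradient flowed into the backbone

# Case 2: detach() cuts the graph, so no gradient reaches the backbone at all.
backbone.zero_grad(set_to_none=True)
logits = linear_eval(backbone(x).detach())
F.cross_entropy(logits, y).backward()
print(backbone.weight.grad is None)  # True: backbone untouched
```

The practical risk in Case 1 is not the linear-eval step itself but gradient accumulation: if the stale backbone gradients are not zeroed before the next pre-training step, they get mixed into the pre-training update.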