Question about cutting gradient #4

@fawazsammani

Description

Hello, and thanks for your work. I have a question about the “on-the-fly” linear evaluation you run after each pre-training epoch. In the original TensorFlow implementation, the gradient is cut during linear-evaluation training so that the label signal is not backpropagated into the ResNet-50; in PyTorch this would be done with detach(). As far as I can tell, you instead create a separate optimizer for the linear evaluation and register only that layer's parameters in it, which means only those parameters are updated. That makes sense, but have you verified that this is actually sufficient, i.e. that using a separate optimizer ensures no gradient flows back into the ResNet-50?
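To make the concern concrete, here is a minimal sketch (a toy linear module standing in for the ResNet-50, not your actual code): a separate optimizer only controls which parameters get *updated*, but backward() still fills the backbone's .grad buffers unless the features are detached.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

backbone = nn.Linear(8, 4)     # toy stand-in for the ResNet-50 encoder
linear_eval = nn.Linear(4, 2)  # on-the-fly linear evaluation head

# Separate optimizer: only the linear head's parameters will be updated.
eval_optimizer = torch.optim.SGD(linear_eval.parameters(), lr=0.1)

x = torch.randn(16, 8)
y = torch.randint(0, 2, (16,))

# Case 1: no detach(). backward() still populates .grad on the backbone,
# even though eval_optimizer never touches those parameters.
loss = F.cross_entropy(linear_eval(backbone(x)), y)
loss.backward()
print(backbone.weight.grad is not None)  # True: gradient reached the backbone

backbone.zero_grad(set_to_none=True)

# Case 2: detach() cuts the graph, so no gradient reaches the backbone.
loss = F.cross_entropy(linear_eval(backbone(x).detach()), y)
loss.backward()
print(backbone.weight.grad)  # None: the graph was cut before the backbone
eval_optimizer.step()        # updates only the linear head in either case
```

So unless the pre-training loop zeroes the encoder's gradients before its next step, the stale gradients from the linear evaluation would leak into the backbone update; detach() avoids that, and also saves the memory and compute of building the graph through the encoder.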
