
Is it necessary to rebuild the model every train iteration? #1

Closed
pangyyyyy opened this issue Apr 11, 2022 · 4 comments

Comments


pangyyyyy commented Apr 11, 2022

Hi, thanks for the great work!

I noticed that you rebuild the meta-model on every iteration (L129), and I was wondering whether that is necessary.

RobustMW-Net/trainer.py, lines 122 to 130 at commit cabea1f:

```python
for iters in range(args.iters):
    adjust_learning_rate(optimizer_model, iters + 1)
    model.train()
    input, target = next(iter(train_loader))
    input_var = to_var(input, requires_grad=False)
    target_var = to_var(target, requires_grad=False)

    meta_model = build_model()                      # L129: meta-model rebuilt every iteration
    meta_model.load_state_dict(model.state_dict())  # L130: weights synced from the base model
```

Would there be any difference or negative impact if I built the meta-model once before the loop and just reloaded the model's state_dict every iteration (L130) instead?
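Concretely, this is the rearrangement I have in mind (just a sketch based on the snippet above, not tested code):

```python
meta_model = build_model()  # built once, before the training loop

for iters in range(args.iters):
    adjust_learning_rate(optimizer_model, iters + 1)
    model.train()
    input, target = next(iter(train_loader))
    input_var = to_var(input, requires_grad=False)
    target_var = to_var(target, requires_grad=False)

    # Reuse the same meta-model object; only re-sync its weights.
    meta_model.load_state_dict(model.state_dict())
```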


arghosh (Owner) commented Apr 11, 2022

Hi. Good catch. I think it should be the same, and faster. But make sure gradients are not accumulated.
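For example (a sketch of what I mean; `load_state_dict` copies the weights but leaves any stale `.grad` buffers from the previous iteration untouched):

```python
meta_model.load_state_dict(model.state_dict())
meta_model.zero_grad()  # clear accumulated gradients before the meta step
# or, to drop the buffers entirely:
for p in meta_model.parameters():
    p.grad = None
```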
This code follows the original Meta-Weight-Net implementation more closely. You can check my other repo, where I simplified the MWNet implementation using the higher package.
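For reference, the higher-based pattern looks roughly like this (a hedged sketch of the general idea, not the actual code from my other repo; `weight_net`, `meta_input`, and `meta_target` are placeholder names):

```python
import higher
import torch
import torch.nn.functional as F

inner_opt = torch.optim.SGD(model.parameters(), lr=args.lr)
with higher.innerloop_ctx(model, inner_opt) as (fmodel, diffopt):
    # fmodel is a differentiable functional copy of model, so no manual
    # build_model() / load_state_dict() is needed each iteration.
    cost = F.cross_entropy(fmodel(input_var), target_var, reduction='none')
    v_lambda = weight_net(cost.detach().unsqueeze(1))  # per-sample weights
    diffopt.step((v_lambda.squeeze(1) * cost).mean())  # differentiable SGD step
    # The meta (validation) loss backpropagates through that step into
    # weight_net's parameters:
    meta_loss = F.cross_entropy(fmodel(meta_input), meta_target)
    meta_loss.backward()
```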


pangyyyyy commented Apr 11, 2022

@arghosh Thanks for the clarification! Your implementation using the higher package seems rather neat; does it support distributed training?


arghosh commented Apr 11, 2022

My code does not support distributed training, and I don't think higher supports data parallelism. But you could pass the meta batch to different GPUs, do the local meta step, and compute the base-model gradients on each node; after that, DDP can handle the synchronization, I guess.
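Something along these lines, perhaps (an untested sketch of that idea, not code from either repo; `local_rank`, `weights`, and `per_sample_loss` are placeholders):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
ddp_model = DDP(model.cuda(local_rank), device_ids=[local_rank])

# On each rank, inside the training loop:
# 1) run the local meta step on this rank's shard of the meta batch
#    (the meta_model / weight_net update, as in trainer.py);
# 2) compute the weighted base-model loss -- backward() on the DDP-wrapped
#    model then triggers the gradient all-reduce across ranks.
loss = (weights.detach() * per_sample_loss).mean()
optimizer_model.zero_grad()
loss.backward()
optimizer_model.step()
```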

pangyyyyy commented

@arghosh Thanks for the clarification!
