
tutorial: adversarial training seems slow. maybe i'm wrong #5

Closed
goodfeli opened this issue Sep 15, 2016 · 8 comments · Fixed by #8

Comments

@goodfeli (Contributor)

We should benchmark it and make sure the runtime is correct.

@goodfeli

without looking at the code, I bet we're missing a stop_gradient somewhere
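
Roughly what I mean (just an illustrative TF 1.x graph-mode sketch, not the tutorial's actual code; `fgsm_adv_x` and `eps` are made-up names):

```python
import tensorflow as tf  # TF 1.x graph-mode API assumed

def fgsm_adv_x(x, loss, eps=0.3):
    """FGSM-style adversarial example for input tensor x and scalar loss."""
    grad, = tf.gradients(loss, x)       # gradient of the loss w.r.t. the input
    adv_x = x + eps * tf.sign(grad)
    # stop_gradient keeps the training gradient from flowing back through the
    # attack step; without it we pay for (and optimize through) an unnecessary
    # second differentiation of the graph.
    return tf.stop_gradient(adv_x)
```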

@goodfeli

3.7 seconds per 100 batches for naive training

@goodfeli

52 sec per 100 batches for adv training

@goodfeli

in pylearn2, my result with adversarial training takes 3 sec per full epoch

@goodfeli

in pylearn2, without adversarial training, my code runs in 1 sec per full epoch

@goodfeli

Naive training is forward-back, i.e. 2 passes per batch.
Adversarial training is forward-back, then back with different targets (to craft the adversarial examples), then forward-back on those examples, i.e. 5 passes if none of the steps can be parallelized. So in theory it should be roughly 2.5x slower than naive training.
The pylearn2 implementation is 3x slower than naive training (3 sec vs 1 sec per epoch), so apparently in practice we can expect some extra overhead.
That still doesn't explain why this tutorial is more than 10x slower (52 sec vs 3.7 sec per 100 batches, roughly 14x).
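
As a quick sanity check on the arithmetic (illustrative only, assuming every forward or backward pass costs about the same):

```python
naive_passes = 2.0   # forward + backward
adv_passes = 5.0     # forward + backward, backward for the attack,
                     # then forward + backward on the adversarial batch
print(adv_passes / naive_passes)  # 2.5   theoretical slowdown
print(3.0 / 1.0)                  # 3.0   observed in pylearn2 (sec per epoch)
print(52.0 / 3.7)                 # ~14   observed here (sec per 100 batches)
```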

@goodfeli

whoa, actually something is seriously weird.
1st 100 batches with adv training take 54 seconds
2nd 100 batches take 102 seconds
3rd 100 batches take 153 seconds

@npapernot (Member)

You are right: the issue was due to my naive implementation, which redefined the adversarial loss in the TF graph at each iteration (each batch). I fixed it by introducing a new function that adds the loss to the graph once and returns the TF variable to be evaluated at each iteration: d7a95d3
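
In other words (a hedged sketch of the pattern, not the exact code in d7a95d3; `model_fn`, `add_adversarial_loss`, and `eps` are made-up names): the graph-building call happens once, outside the batch loop, and the loop only evaluates the returned tensor.

```python
import tensorflow as tf  # TF 1.x graph-mode API assumed

def add_adversarial_loss(model_fn, x, y, eps=0.3):
    """Add the clean + adversarial loss to the graph ONCE and return the tensor."""
    clean_loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=model_fn(x)))
    grad, = tf.gradients(clean_loss, x)
    adv_x = tf.stop_gradient(x + eps * tf.sign(grad))
    adv_loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=model_fn(adv_x)))
    return clean_loss + adv_loss

# Build once...
#   total_loss = add_adversarial_loss(model_fn, x, y)
#   train_op = tf.train.AdamOptimizer().minimize(total_loss)
# ...then only run the existing ops inside the loop. Calling a graph-building
# function like add_adversarial_loss inside the loop keeps adding new ops, so
# every sess.run works on a bigger graph and batches get slower and slower.
#   for x_batch, y_batch in batches:
#       sess.run(train_op, feed_dict={x: x_batch, y: y_batch})
```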
