Added Conditional GAN and DCGAN tutorial #111

Open

wants to merge 4 commits into master
Conversation

shreyas-kowshik
Contributor

Added the markdown file as discussed @MikeInnes

@shreyas-kowshik changed the title from "Added DCGAN Tutorial" to "Added DCGAN Tutorial and Conditional GAN Model" on Mar 4, 2019
@shreyas-kowshik
Contributor Author

The code for the cGAN part is complete. I will add the tutorial in the Literate.jl format next.
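(For readers unfamiliar with that workflow, a minimal sketch of generating tutorial markdown with Literate.jl; the file and output names here are hypothetical, not the ones in this PR.)

using Literate

# Convert the annotated Julia script into the tutorial markdown
# (input and output paths are placeholders).
Literate.markdown("cgan_mnist.jl", "tutorials"; documenter=false)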

@shreyas-kowshik changed the title from "Added DCGAN Tutorial and Conditional GAN Model" to "Added Conditional GAN and DCGAN tutorial" on Mar 4, 2019
@shreyas-kowshik
Contributor Author

shreyas-kowshik commented Mar 4, 2019

Ping @MikeInnes.
Updated the PR with 2 tutorials in the format discussed.

@DhairyaLGandhi
Member

Could we get rid of the .ipynb_checkpoints really quickly?

@shreyas-kowshik
Contributor Author

Thank you @dhairyagandhi96. I have made the requested changes.

@MikeInnes
Member

You shouldn't have the same content twice here; you can remove the generated markdown file. I'm not sure what I asked for earlier (it's ideal to reference comments), but a README would usually just have an overall description and instructions for running the script.

@shreyas-kowshik
Contributor Author

@MikeInnes Thank you for the feedback. I have shortened the content of the markdown files, removed the HTML and fancy formatting, and added instructions for running the script.

1. The original commits were messed up; this commit overwrites all previous commits.
2. The markdown files and the instructions to run the code are included.
return p
end

function zero_grad!(model)
Member

You shouldn't need these utilities; just use update! with Params and Grads, like Flux.train! does.

Also, you're still using HTML tags above
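(As a rough sketch of the suggested pattern, not the code from this PR: a manual training step that takes gradients with respect to Params and applies the resulting Grads via update!, the way Flux.train! does internally. The model, optimiser, and data below are placeholders.)

using Flux
using Flux.Optimise: update!

disc = Chain(Dense(784, 256, relu), Dense(256, 1))   # placeholder "discriminator"
opt = ADAM(0.0002)
ps = params(disc)
x, y = rand(Float32, 784, 16), rand(Float32, 1, 16)  # dummy batch

# Take gradients with respect to the implicit Params, then apply them;
# no manual zero_grad! is needed in this style.
gs = gradient(ps) do
    Flux.mse(disc(x), y)
end
update!(opt, ps, gs)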

Contributor Author

@MikeInnes I used update! without manually zeroing out the gradients of the models. However, the model does not converge to the actual output even after repeated trials, whereas it does converge when I zero out the gradients manually. I don't know if it's a bug on my part. I was using this as a reference:
https://github.com/eriklindernoren/PyTorch-GAN/blob/1f130dfca726e14254e4fd78e5fb63f08931acd3/implementations/cgan/cgan.py#L161-L195

As pointed out on Slack, the gradient call used with update! should automatically zero out the gradients, but the results do not reflect that...

Member

The normal update! method will work if you use gradient rather than back!. back! should be avoided as it's effectively deprecated.
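(Illustrative only: the same step written with gradient instead of the old back! call; the tiny model and data here are placeholders, not code from this PR.)

using Flux
using Flux.Optimise: update!

m = Dense(3, 1)                      # placeholder model
ps = params(m)
opt = Descent(0.1)
x, y = rand(Float32, 3, 8), rand(Float32, 1, 8)

# Instead of calling Flux.back!(loss) and reading the .grad fields,
# ask for the gradients explicitly and hand them to update!.
gs = gradient(() -> Flux.mse(m(x), y), ps)
update!(opt, ps, gs)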

Contributor Author

@MikeInnes Sorry for replying late. I have made the requested changes.
Does it look good?

Contributor Author

update! does zero out the gradient, so I suppose there is no need to do it explicitly.


I believe update! only zeros out the gradients of the parameters involved in that call, not all of the gradients.


See this MWE below:

using Flux, Tracker

d1 = Dense(2, 1)
d2 = Dense(1, 1)
c = Chain(d1, d2)
p1 = params(d1)
p2 = params(d2)
pall = params(c)

x = rand(2, 10)

loss() = sum(c(x))

# Case 1
gradient(loss, pall).grads |> values |> println
# This zeros out all gradients

# Case 2
gradient(loss, p1).grads |> values |> println
# After this call, the gradient of p2 is not zeroed out
# Thus the call for gradient of p2 below will be affected
gradient(loss, p2).grads |> values |> println
# After this call, the gradient of p1 is not zeroed out

# Just zero out all gradients before the next experiment
Tracker.zero_grad!.(Tracker.grad.(p1))
Tracker.zero_grad!.(Tracker.grad.(p2))

# Case 3
gradient(loss, p1).grads |> values |> println
Tracker.zero_grad!.(Tracker.grad.(p2))  # just to avoid the situation in Case 1
gradient(loss, p2).grads |> values |> println

gives

Any[Float32[10.0] (tracked), Float32[1.8203094] (tracked), Float32[1.2446415 0.7490297] (tracked), Float32[-1.8717546] (tracked)]

Any[Float32[1.8203094] (tracked), Float32[1.2446415 0.7490297] (tracked)]
Any[Float32[20.0] (tracked), Float32[-3.7435093] (tracked)]

Any[Float32[1.8203094] (tracked), Float32[1.2446415 0.7490297] (tracked)]
Any[Float32[10.0] (tracked), Float32[-1.8717546] (tracked)]

See how Case 2 is different from Case 1 and Case 3.
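(Continuing from the MWE above, a hedged sketch of the workaround being discussed for the Tracker backend: clear the stale gradients of the other parameter group before taking its gradient, reusing the zero_grad! pattern shown above. The Descent optimiser and the update! call on Params are assumptions about the Flux v0.9 API, not code from this PR.)

opt = Descent(0.1)

# Update p1, then zero the gradients that accumulated on p2 so the next
# gradient call for p2 starts from scratch (and vice versa).
gs1 = gradient(loss, p1)
Flux.Optimise.update!(opt, p1, gs1)
Tracker.zero_grad!.(Tracker.grad.(p2))

gs2 = gradient(loss, p2)
Flux.Optimise.update!(opt, p2, gs2)
Tracker.zero_grad!.(Tracker.grad.(p1))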

Contributor

@matsueushi commented Dec 4, 2019

When I created a DCGAN model with the Tracker backend (Flux v0.9.0), it didn't converge unless I zeroed out the gradients after training the discriminator. https://github.com/matsueushi/fluxjl-gan/blob/flux0.9.0/mnist-dcgan.jl

However, with the Zygote backend (Flux v0.10.0),

using Flux

d1 = Dense(2, 1)
d2 = Dense(1, 1)
c = Chain(d1, d2)
p1 = params(d1)
p2 = params(d2)
pall = params(c)

x = rand(2, 10)

loss() = sum(c(x))

@info "Case1"
gradient(loss, pall).grads |> values |> println

@info "Case2"
gradient(loss, p1).grads |> values |> println
gradient(loss, p2).grads |> values |> println

gives expected results

[ Info: Case1
Any[Float32[9.715003], Float32[3.7775292 4.41461], Float32[10.0], Float32[-5.8413825]]
[ Info: Case2
Any[Float32[9.715003], Float32[3.7775292 4.41461]]
Any[Float32[10.0], Float32[-5.8413825]]

and I didn't have to zero out gradients. https://github.com/matsueushi/fluxjl-gan/blob/e60684b6c8ecc601eb6784ae393eae9a3a3ba57a/mnist-dcgan.jl


This is expected. Only a Tracker-based AD needs the zeroing-out step: Tracker accumulates gradients in place on the tracked parameters, so stale gradients carry over between calls. Zygote computes gradients functionally and returns a fresh Grads object each time, so it doesn't have this side effect.
