
Rewrite the example of VAE using Chainer distributions #5356

Merged: 19 commits merged into chainer:master on Nov 21, 2018

ganow (Contributor) commented Sep 15, 2018

I rewrote the official Variational Autoencoder example in examples/vae using Chainer distributions.

This code achieves a validation loss of 83.1864 after 100 epochs with 20 latent dimensions, the same parameter setting as the previous code's defaults. Visualizations of the reconstructed images are shown below.

The figure below shows random samples from the validation dataset.
[image: test]

The images reconstructed from these random samples are shown below.
[image: test_reconstruction]

Note that there are two breaking changes from the previous code.

First, I changed the dataset from MNIST to binarized MNIST. This is because a variational autoencoder typically models each pixel of an MNIST image with a Bernoulli distribution, as this code does. The Bernoulli distribution is defined on the discrete set {0, 1}, and D.Bernoulli(p).log_prob(x) returns 0 or -inf when x is not in {0, 1}. This behavior breaks training on continuous inputs, so we need to binarize the pixel values when we want to use a Bernoulli likelihood.
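As a minimal illustration of this strict behavior (not part of the PR, assuming the D.Bernoulli semantics described in this thread):

import numpy as np
import chainer.distributions as D

b = D.Bernoulli(p=np.full(1, 0.7, dtype=np.float32))
print(b.log_prob(np.ones(1, dtype=np.float32)))       # log(0.7): valid binary input
print(b.log_prob(np.full(1, 0.3, dtype=np.float32)))  # -inf: 0.3 is not in {0, 1}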

Second, I removed train_vae_custom_loop.py, the version that does not use a trainer and an updater. I found that it does basically the same thing as train_vae.py, and I felt that explaining how to write the training loop by hand, without a trainer and an updater, is outside the scope of this example. I can add train_vae_custom_loop.py back if it is needed.

crcrpar (Contributor) commented Sep 17, 2018

I took a quick look at your code and I like it, but some parameters seem to have been changed or removed.

For example, beta and k are missing.

toslunar self-assigned this Sep 17, 2018

Review thread on examples/vae/net.py (outdated):
train[train >= 0.5] = 1.0
train[train < 0.5] = 0.0
test[test >= 0.5] = 1.0
test[test < 0.5] = 0.0

YoshikawaMasashi (Member) commented Sep 17, 2018

I'll implement a non-strict option for D.Bernoulli.

YoshikawaMasashi (Member) commented Sep 26, 2018

That PR has been merged, so you can now use D.Bernoulli without binarization.

ganow (Contributor, Author) commented Sep 18, 2018

> I took a quick look at your code and I like it, but some parameters seem to have been changed or removed. For example, beta and k are missing.

Thank you for pointing them out. I missed these parameters because the previous code did not provide any way to change them via the CLI. I think it is better to incorporate them and provide command line options, so I will fix this.

ganow (Contributor, Author) commented Sep 18, 2018


I changed the code to incorporate beta, but I realized that incorporating k is a little difficult. If we simply pass k to the posterior distribution, as in z = q_z.sample(k), the shape of z becomes (k, batch_size, dim). This changes z.ndim, and the following operation decoder(z) raises the error shown below:

...
chainer.utils.type_check.InvalidType:
Invalid operation is performed in: LinearFunction (Forward)

Expect: x.shape[1] == W.shape[1]
Actual: <batch_size * dim> != <dim>

I have some ideas for solving this problem in a dirty way, e.g. by applying F.reshape() several times. If anyone has a better or more appropriate solution, I would like to hear it.
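For reference, a minimal sketch of the shape issue described above, assuming a diagonal Gaussian posterior as in a typical VAE (the sizes here are made up):

import numpy as np
import chainer.distributions as D

k, batch_size, dim = 8, 100, 20
loc = np.zeros((batch_size, dim), dtype=np.float32)
scale = np.ones((batch_size, dim), dtype=np.float32)
q_z = D.Normal(loc=loc, scale=scale)
z = q_z.sample(k)  # the sample shape is prepended to the batch shape
print(z.shape)     # (8, 100, 20), i.e. (k, batch_size, dim): decoder(z) now sees a 3-d input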

toslunar (Member) commented Sep 18, 2018

Does n_batch_axes=2 work?

ganow (Contributor, Author) commented Sep 18, 2018

> Does n_batch_axes=2 work?

It worked, thank you, but D.Bernoulli(logit=self.linear(z, n_batch_axes=2)) raised the following error. D.Bernoulli seems to accept only two-dimensional arrays in total.

Exception in main training loop: boolean index did not match indexed array along dimension 0; dimension is 10 but corresponding boolean dimension is 100
Traceback (most recent call last):
...
  File "/Users/ganow/local/src/github.com/ganow/chainer/examples/vae/net.py", line 34, in __call__
    self.rec = F.mean(F.sum(p_x.log_prob(x), axis=-1))
  File "/Users/ganow/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/chainer/distributions/bernoulli.py", line 113, in log_prob
    return _bernoulli_log_prob(self.logit, x)
  File "/Users/ganow/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/chainer/distributions/bernoulli.py", line 53, in _bernoulli_log_prob
    y, = BernoulliLogProb().apply((logit, x))
  File "/Users/ganow/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/chainer/function_node.py", line 257, in apply
    outputs = self.forward(in_data)
  File "/Users/ganow/.pyenv/versions/anaconda3-5.2.0/lib/python3.6/site-packages/chainer/distributions/bernoulli.py", line 23, in forward
    y[self.invalid] = - xp.inf
...
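For context, a minimal sketch (not from the PR) of what n_batch_axes=2 does for L.Linear, which resolves the decoder error above but not the D.Bernoulli one (sizes are made up):

import numpy as np
import chainer.links as L

linear = L.Linear(20, 784)
z = np.zeros((8, 100, 20), dtype=np.float32)  # (k, batch_size, dim)
h = linear(z, n_batch_axes=2)                 # treat the first two axes as batch axes
print(h.shape)                                # (8, 100, 784)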
ganow (Contributor, Author) commented Sep 19, 2018

In a VAE, we typically compute D.Bernoulli(logit=NN(q_z.sample(k))).log_prob(x) such that x.shape == (batch_size, dim) and NN(q_z.sample(k)).shape == (k, batch_size, dim). Currently, D.Bernoulli does not seem to work when logit.ndim > x.ndim, so I implemented the k option with F.broadcast_to().
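A minimal sketch of the F.broadcast_to() workaround described above (sizes are made up, and the zero logits stand in for the decoder output NN(q_z.sample(k))):

import numpy as np
import chainer.functions as F
import chainer.distributions as D

k, batch_size, dim = 8, 100, 784
x = np.random.randint(0, 2, (batch_size, dim)).astype(np.float32)  # binarized inputs
logit = np.zeros((k, batch_size, dim), dtype=np.float32)           # stand-in for NN(q_z.sample(k))
x_ext = F.broadcast_to(x, logit.shape)                             # (k, batch_size, dim)
log_px = D.Bernoulli(logit=logit).log_prob(x_ext)                  # per-pixel log-likelihoods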

toslunar (Member) left a review

The algorithm looks correct. I left some comments on implementation details.

(Resolved review threads on examples/vae/net.py and examples/vae/train_vae.py.)
toslunar (Member) left a review

I checked a run with a GPU.

(Resolved review threads on examples/vae/net.py and examples/vae/train_vae.py.)

toslunar and others added some commits Nov 13, 2018

transfer avg_elbo_loss to gpu
Co-Authored-By: ganow <y.nagano.92@gmail.com>
toslunar (Member) left a review

Could you use Variable.array instead of .data? (Changed in #5386).
LGTM except for this.
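For reference, a minimal illustration of the accessor (not from the PR):

import numpy as np
import chainer

v = chainer.Variable(np.zeros(3, dtype=np.float32))
a = v.array  # the underlying ndarray; equivalent to v.data but avoids clashing with ndarray.data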

toslunar (Member) commented Nov 13, 2018

Jenkins, test this please.

chainer-ci (Collaborator) commented Nov 13, 2018

Jenkins CI test (for commit 6957ecf, target branch master) failed with status FAILURE.
(For contributors, please wait until the reviewer confirms the details of the error.)

(Resolved review thread on examples/vae/net.py.)
add register_persistent() to examples/vae/net.py
Co-Authored-By: ganow <y.nagano.92@gmail.com>
toslunar (Member) commented Nov 16, 2018

Jenkins, test this please.

chainer-ci (Collaborator) commented Nov 16, 2018

Jenkins CI test (for commit 61565f4, target branch master) failed with status FAILURE.
(For contributors, please wait until the reviewer confirms the details of the error.)

toslunar (Member) commented Nov 21, 2018

The Jenkins failure (BatchRenormalizationTest_param_7.test_forward_gpu) is unrelated to the PR.

toslunar merged commit d6ac11f into chainer:master on Nov 21, 2018

4 checks passed

codecov/patch: Coverage not affected when comparing fcb0732...61565f4
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
coverage/coveralls: Coverage decreased (-0.2%) to 89.665%

toslunar added this to the v6.0.0b1 milestone on Nov 21, 2018
