v3.0.0
This is a major release of Chainer v3.0.0. All the changes since the previous major version (v2.0.0) can be found in the release notes below:
- v3.0.0a1 (https://github.com/chainer/chainer/releases/tag/v3.0.0a1)
- v3.0.0b1 (https://github.com/chainer/chainer/releases/tag/v3.0.0b1)
- v3.0.0rc1 (https://github.com/chainer/chainer/releases/tag/v3.0.0rc1)
- v3.0.0 (this document)
The biggest change is the introduction of new-style differentiable functions and the resulting support for double backward (gradient of gradient) in many functions. The details are linked below:
- The new-style differentiable function (see the details in the v3.0.0b1 release notes)
- Double backward support for many functions (see the v3.0.0rc1 release notes for the list of almost all functions that support double backward; the remaining ones are listed below)
As for backward compatibility, most users of v2.x are not affected by the introduction of the new-style function FunctionNode, because the conventional Function is still supported in v3 (and in future versions). Even if you are using custom functions written with Function, you can continue running the same code with Chainer v3.0.0. You need to rewrite such custom functions only when you want to use features added to the new-style function, e.g. double backprop.
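For illustration, here is a minimal sketch of the same operation written in both styles (Square is a hypothetical example, not a Chainer built-in; see the v3.0.0b1 release notes for the full FunctionNode API):

import numpy as np
import chainer

class SquareOld(chainer.Function):
    # Conventional (old-style) function: forward/backward operate on arrays.
    def forward(self, inputs):
        x, = inputs
        return x * x,

    def backward(self, inputs, grad_outputs):
        x, = inputs
        gy, = grad_outputs
        return 2 * x * gy,

class Square(chainer.FunctionNode):
    # New-style function: backward operates on Variables, so the backward
    # computation is itself part of the graph and can be differentiated again.
    def forward(self, inputs):
        x, = inputs
        self.retain_inputs((0,))
        return x * x,

    def backward(self, indexes, grad_outputs):
        x, = self.get_retained_inputs()
        gy, = grad_outputs
        return 2 * x * gy,

x = chainer.Variable(np.array([3.0], dtype=np.float32))
y_old = SquareOld()(x)           # old-style call: still works in v3
y_new = Square().apply((x,))[0]  # new-style call: supports double backprop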
The backward compatibility of the overall APIs is slightly broken, though most users are not affected. See the release notes above for the details of the broken compatibility.
Examples of grad of grad in Chainer
Usage of the grad function
You can calculate gradients of any variables in a computational graph w.r.t. any other variables in the graph using the chainer.grad function with the enable_double_backprop=True option.
# Both x and y are chainer.Variable objects
y = x * x * x / 3 # Construct a computational graph
gx, = chainer.grad([y], [x], enable_double_backprop=True)
ggx, = chainer.grad([gx], [x], enable_double_backprop=True)
Here, the above calculation of ggx is equivalent to:
gx.backward()
x.grad_var # => This is equal to the above ggx
Of course, one more differentiation gives us the constant 2:
gggx, = chainer.grad([ggx], [x], enable_double_backprop=True)
print(gggx) #=> variable([ 2.])
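Putting the pieces together, a self-contained session might look like this (the initial value 2.0 is an arbitrary choice for illustration):

import numpy as np
import chainer

x = chainer.Variable(np.array([2.0], dtype=np.float32))
y = x * x * x / 3  # y = x^3 / 3

gx, = chainer.grad([y], [x], enable_double_backprop=True)      # x^2 -> 4
ggx, = chainer.grad([gx], [x], enable_double_backprop=True)    # 2x  -> 4
gggx, = chainer.grad([ggx], [x], enable_double_backprop=True)  # 2

print(gx, ggx, gggx)  # => variable([ 4.]) variable([ 4.]) variable([ 2.])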
The loss function of WGAN-GP
WGAN-GP (Wasserstein GAN with Gradient Penalty [1]) is one example of a GAN that uses gradients of gradients when computing the loss. It penalizes the gradient norm to enforce the Lipschitz constraint on the discriminator. The gradient norm is computed at a random interpolation x_hat between a generated point x_tilde and a real example x. The loss, including the penalty term, is then further differentiated w.r.t. the trainable parameters of the model, so training actually performs double backward for the discriminator. The code below shows how to implement it using the backward() method with the enable_double_backprop=True option:
# G (generator) and D (discriminator) should be implemented somewhere else
x_tilde = G(z)
x_hat = x + u * (x_tilde - x)  # u: random interpolation coefficient in [0, 1]
# 1st diff
D(x_hat).backward(enable_double_backprop=True)
# lam is the penalty coefficient (lambda in the paper; `lambda` is a reserved word in Python)
gradient_penalty = lam * (x_hat.grad_var - 1) ** 2
loss = D(x_tilde) - D(x) + gradient_penalty
model.cleargrads()  # to clear the 1st diff of params
loss.backward()  # 2nd diff
You can also implement it using grad(), which may be faster because it omits the computation of gradients w.r.t. the parameters in the first differentiation.
x_tilde = G(z)
x_hat = x + u * (x_tilde - x)
# 1st diff
gx_hat, = chainer.grad([D(x_hat)], [x_hat], enable_double_backprop=True)
gradient_penalty = lam * (gx_hat - 1) ** 2
loss = D(x_tilde) - D(x) + gradient_penalty
model.cleargrads()  # to clear the 1st diff of params
loss.backward()  # 2nd diff
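For reference, here is a self-contained toy version of the grad()-based variant. The tiny linear G and D, the random batch, and lam = 10.0 are hypothetical stand-ins for a real model, and the penalty keeps the simplified (gx_hat - 1) ** 2 form used above rather than the L2-norm penalty from the paper:

import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

G = L.Linear(3, 2)               # toy generator: z -> x_tilde
D = L.Linear(2, 1)               # toy discriminator (critic)
model = chainer.ChainList(G, D)  # holds all trainable parameters

z = np.random.randn(5, 3).astype(np.float32)
x = chainer.Variable(np.random.randn(5, 2).astype(np.float32))  # "real" data
u = np.random.rand(5, 1).astype(np.float32)  # random interpolation coefficients
lam = 10.0  # penalty coefficient

x_tilde = G(z)
x_hat = x + u * (x_tilde - x)
# 1st diff: gradient of the critic output w.r.t. the interpolated points
gx_hat, = chainer.grad([F.sum(D(x_hat))], [x_hat], enable_double_backprop=True)
gradient_penalty = lam * F.sum((gx_hat - 1) ** 2)
loss = F.sum(D(x_tilde) - D(x)) + gradient_penalty
model.cleargrads()
loss.backward()  # 2nd diff: backprop through gx_hat into the parameters
print(D.W.grad)  # the penalty contributes second-order terms to this gradient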
[1] I. Gulrajani et al., “Improved Training of Wasserstein GANs,” https://arxiv.org/abs/1704.00028
Here are some simple comparisons of grad of grad in Chainer and other frameworks:
https://gist.github.com/delta2323/9bbca950ee32c523c7aec2e02ad7f85a
New features
- Add F.flip function (#3532)
- Functions with double-backprop support: F.swapaxes (#3480), F.permutate (#3481), F.transpose_sequence (#3525)
Bug fixes
- Workaround for NumPy dot operation bug on non-contiguous arrays (#3478)
- Fix KeyError when using evaluator without target 'main' (#3460)
- Fix AttributeError for missing inv_std in F.fixed_batch_normalization backward (#3479, thanks @zaburo-ch!)
Improvements
- Remove unused invoke_before_training argument from Trainer.extend (#3516)
- Improve performance of MultiprocessIterator for non-tuple/dict datasets (#3413, thanks @yuyu2172!)
- Type check in chainer.grad (#3514)
Documentation
- Document deprecation of the stream option of to_gpu (#3519)
- Add documentation for the ParameterStatistics extension (#3323)
- Fix typos (#3414, thanks @knorth55!; #3455, thanks @HusainZafar!)
- Fix source links for functions defined with contextlib.contextmanager (#3567)
- Improve or fix documentation: F.swapaxes, F.squeeze, F.transpose (#3415, thanks @naoto0804!), F.separate, F.select_item, and F.permutate (#3417, thanks @naoto0804!), Constant initializer (#3560), init_scope (#3520), F.reshape (#3515), ConvNet tutorial (#3509)
- Add documentation of links for framework compatibility (#3476)
- Fix documentation warnings (#3490)
- Introduce docstring checker and fix markup of “Returns” sections (#3510)
- Remove obsolete statement about copy between devices in to_gpu (#3517)
- Fix type-check reference (#3521)
- Improve style of deprecation notifications (#3522)
- Avoid horizontal scroll of tables (#3538)
- Add/modify supported versions of dependencies in the installation guide (#3580)
Tests
- Skip multiprocess interrupt tests (#3412)
- Add tests for __delattr__ in Link and Chain (#3416, thanks @naoto0804!)
- Improve numerical_grad accuracy (#3495)
- Improve test mode of VAE example (#3431)
- Delete redundant test settings for F.get_item (#3469, thanks @yuyu2172!)
- Avoid unwanted output of assert_allclose failure (#3518)
- Stabilization of stochastic numerical errors