Upgrade Guide from v1 to v2

This document provides detailed information about the differences between Chainer v1 and v2. Reading it will tell you which parts of your code need to be (or are recommended to be) fixed when you upgrade from Chainer v1 to v2.

CuPy

CuPy has been separated from Chainer into a separate package

CuPy, which was originally part of Chainer, has been separated into a different Python package since Chainer v2. This changes the way Chainer is set up with CUDA support; in particular, you have to install the cupy package separately to enable CUDA support. See install-guide for the recommended installation steps.

Fortunately, there is no need to update your source code to catch up with this change.

Global configurations

Training mode is configured by a thread-local flag

In Chainer v2, the concept of training mode is added. It is represented by a thread-local flag chainer.config.train, which is a part of the unified configuration <configuration>. When chainer.config.train is True, functions of Chainer run in the training mode, and otherwise they run in the test mode. For example, ~links.BatchNormalization and ~functions.dropout behave differently in each mode.

In Chainer v1, such behavior was configured by the train or test argument of each function. This train/test argument has been removed in Chainer v2. If your code uses the train or test argument, you have to update it. In most cases, all you have to do is remove the train / test argument from any function calls.

Example

Consider the following model definition and the code to call it in test mode written for Chainer v1.

# Chainer v1
import chainer.functions as F

class MyModel(chainer.Link):
    ...

    def __call__(self, x, train=True):
        return f(F.dropout(x, train=train))

m = MyModel(...)
y = m(x, train=False)

In Chainer v2, it should be updated into the following code:

# Chainer v2
import chainer.functions as F

class MyModel(chainer.Link):
    ...

    def __call__(self, x):
        return f(F.dropout(x))

m = MyModel(...)
with chainer.using_config('train', False):
    y = m(x)

Configurations are added and replace some of existing global flags

Many global settings other than the training mode have been moved to the unified configuration <configuration>. The following is the complete list of configuration entries that have corresponding features in Chainer v1.

chainer.config.cudnn_deterministic

It corresponds to the deterministic argument of some convolution functions in Chainer v1. This argument has been removed since Chainer v2. If you were using it, use the chainer.config.cudnn_deterministic flag to change the behavior of the convolution functions instead.
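
For example, if you were passing the removed deterministic argument to a convolution function in Chainer v1, a minimal sketch of the v2 equivalent looks as follows (x and W are assumed to be prepared input and weight arrays):

import chainer
import chainer.functions as F

# Chainer v1 (removed): y = F.convolution_2d(x, W, deterministic=True)
# Chainer v2: switch the configuration flag instead of passing an argument.
with chainer.using_config('cudnn_deterministic', True):
    y = F.convolution_2d(x, W)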

chainer.config.debug

It corresponds to the debug mode in Chainer v1, which was set by set_debug and retrieved by is_debug. These functions are also available in Chainer v2, so you basically do not need to update code related to the debug mode.

chainer.config.enable_backprop

It corresponds to the backprop mode in Chainer v1. The functions no_backprop_mode and force_backprop_mode are still available in Chainer v2 and automatically turn the enable_backprop flag off/on. One important difference from Chainer v1 is that the volatile flag has been removed from Variable, so there are more situations in which you need to modify the enable_backprop flag.

chainer.config.keep_graph_on_report

This flag configures whether or not to keep the computational graph alive for a reported variable. In Chainer v2, when a Variable object is reported by report, a copy of the variable isolated from the computational graph is created and stored by default. By setting this flag to True, you can change this behavior so that the original Variable object is stored as is. See upgrade-reporter-purge-variable for the details.

chainer.config.train

It corresponds to the train or test argument of some functions in Chainer v1. This argument has been removed since Chainer v2. If you are using it, use the chainer.config.train flag instead. See upgrade-train-mode for more details.

chainer.config.type_check

It corresponds to the Function.type_check_enable flag. If your code touches this flag, use chainer.config.type_check instead. Note that the environment variable CHAINER_TYPE_CHECK is still available in Chainer v2, so if you are only using the environment variable, there is no need to update your code.

chainer.config.use_cudnn

It corresponds to the use_cudnn argument of many functions that have cuDNN implementations. This argument has been removed since Chainer v2. If you are using it, use the chainer.config.use_cudnn flag instead. Note that this flag is ternary, not binary. See configuration for more details.

These configurations can be modified in two ways.

  • Simply assigning a new value to an entry, like chainer.config.train = False.
  • Using the chainer.using_config context manager. It can be used with the with statement of Python as follows:

    with chainer.using_config('train', False):
        do_something()  # this code runs with chainer.config.train == False

    It recovers the original configuration after quitting the with block.

The chainer.config manages the thread-local configuration. You can also set the global configuration by modifying chainer.global_config. Note that the global configuration is used only if the entry of the thread-local configuration is not explicitly set up.
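
For example, the following sketch illustrates how the thread-local and global configuration objects interact:

import chainer

chainer.global_config.type_check = False  # process-global setting
chainer.config.train = False              # thread-local setting

print(chainer.config.train)       # False (explicitly set for this thread)
print(chainer.config.type_check)  # False (falls back to the global value)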

Variable

Volatile flag is removed

The Variable.volatile flag has been removed since Chainer v2.

Instead, the configuration chainer.config.enable_backprop can be used to enable/disable the automatic differentiation feature. If it is True, Chainer always creates a computational graph in the forward propagation, which corresponds to passing non-volatile variables in Chainer v1. Otherwise, Chainer does not create a graph, which corresponds to passing volatile variables in Chainer v1. The biggest difference is that enable_backprop is a thread-local flag, whereas volatile was a flag local to each Variable object. Note that the enable_backprop flag already existed in Chainer v1, where it took effect only if all the inputs to the function had volatile == 'auto'.

The chainer.config.enable_backprop flag can be modified directly or by using ~chainer.using_config. See configuration for details. There is also a convenience function, no_backprop_mode, to turn off the flag.

If you are using the Variable.volatile flag, you have to stop setting this flag (it will not take effect), and set the enable_backprop flag instead.

Example

Let model be your model, and consider the following code that calls it in volatile mode.

# Chainer v1
x_data = ...   # ndarray
x = chainer.Variable(x_data, volatile=True)
y = model(x)

In Chainer v2, it should be updated as follows.

# Chainer v2
x_data = ...   # ndarray
x = chainer.Variable(x_data)
with chainer.no_backprop_mode():
    y = model(x)

Variable is not a part of a computational graph anymore

The Variable class has been separated into two distinct classes, the Variable class and the VariableNode class, since Chainer v2. Every Variable object owns its own VariableNode object. A computational graph consists of Function objects and VariableNode objects. When one applies a Function to a Variable, the VariableNode object of the variable is extracted and set as one of the inputs of the function.

Note that the underlying data array of the variable is still held by the Variable object. This allows each Function implementation to release unneeded arrays from the computational graph, resulting in greatly reduced memory consumption.

This change does not affect most users' code. If you are directly traversing the computational graph by yourself or modifying the graph ad-hoc, you may have to update your code. In most cases, it is enough to just change Variable into VariableNode in the code traversing the computational graph.
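
As a rough illustration, the following sketch walks a graph backward from an output variable; it assumes the v2 attributes Variable.creator, Function.inputs (a tuple of VariableNode objects), and VariableNode.creator:

def collect_functions(output_variable):
    """Collect all Function objects reachable backward from a Variable."""
    seen = set()
    stack = [output_variable.creator]  # Function that produced the output (or None)
    while stack:
        func = stack.pop()
        if func is None or func in seen:
            continue
        seen.add(func)
        for node in func.inputs:       # VariableNode objects in Chainer v2
            stack.append(node.creator)  # may be None for leaf nodes
    return seen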

Parameter has to be an instance of Parameter class

Chainer v2 has a subclass of Variable called Parameter. This class provides a convenient interface for setting up a parameter variable registered to a Link.

You basically do not need to update your code because Link.add_param creates a Parameter object in Chainer v2. There is a new recommended way of registering parameters to a link in Chainer v2, though. See here <upgrade-new-param-register> for the recommended way of parameter registration.

Small changes to Variable

There are some changes on the interface and specification of methods.

  • len(variable) returns the length of the first axis of the underlying array in Chainer v2. This is equivalent to len(variable.data). It differs from the behavior of Chainer v1, in which len returned the total number of elements in the underlying array (see the snippet after this list).
  • repr(variable) returns a NumPy-like text representation of the underlying array in Chainer v2. In Chainer v1, it just returned a string showing the name of the variable.
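
A short snippet illustrating the new len behavior (the shape is a made-up example):

import numpy as np
import chainer

v = chainer.Variable(np.zeros((3, 4), dtype=np.float32))
len(v)       # 3 in Chainer v2 (length of the first axis); 12 in Chainer v1
v.data.size  # 12: use the underlying array if you need the total element count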

Function

The force_tuple option of split_axis is True by default

In Chainer v2, the force_tuple argument of functions.split_axis is set to True by default. Therefore, it always returns a tuple regardless of the number of sections made after the split. It was False by default in Chainer v1.
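
For example, splitting into a single section now yields a one-element tuple (the array is a made-up value):

import numpy as np
import chainer.functions as F

x = np.arange(6, dtype=np.float32).reshape(2, 3)
ys = F.split_axis(x, 1, axis=0)  # a single section
# Chainer v2: ys is a 1-element tuple; Chainer v1 returned a bare Variable here.
y = ys[0]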

Type check APIs are updated to enable lazy building of the error messages

In Chainer v2, the type check APIs are updated so that the overhead of checking types is greatly reduced. In order to achieve the overhead reduction, some APIs are changed.

If you have custom Function implementations that do type checking, you have to update your code. The following list shows which parts have to be updated.

  • Use utils.type_check.eval instead of Expr.eval.
  • Use utils.type_check.make_variable to create a utils.type_check.Variable object instead of directly constructing it by yourself.
  • Stop using .name attribute of any expression.

Background of this change: in Chainer v1, the type checking APIs build an abstract syntax tree (AST) based on each expression that tests some condition. The AST is used to emit a readable error message. However, building an AST requires the construction of many Python objects, which adds a large overhead. In Chainer v2, the Function.type_check_forward method is called once or twice. At the first call, the type checking APIs run in a light-weight mode that does not build an AST and just checks the conditions. The second call is made only if some check fails, and this time the AST is built so that a detailed error message can be emitted. This change makes the ordinary path of type checking much faster while keeping the readable error messages.

Methods to release unneeded arrays are added

As is written above <upgrade-variable-node>, Chainer v2 introduced a new mechanism to reduce the memory consumption of each Function implementation. In many cases, a Function implementation does not need some input arrays in its backward computation. A new method called Function.retain_inputs can be used to specify which input arrays are actually needed. This method must not be called from the outside of Function.forward.

Example

For example, consider the following simple addition function.

class AddFunction(chainer.Function):
    def forward(self, inputs):
        return inputs[0] + inputs[1],

    def backward(self, inputs, grad_outputs):
        return grad_outputs[0], grad_outputs[0]

It can be seen that the backward computation of this function does not use any of the inputs. Then, specifying an empty tuple of indexes to ~Function.retain_inputs will reduce the memory overhead.

class AddFunction(chainer.Function):
    def forward(self, inputs):
        self.retain_inputs(())  # does not retain both inputs
        return inputs[0] + inputs[1],

    def backward(self, inputs, grad_outputs):
        return grad_outputs[0], grad_outputs[0]

In some cases, a function can (or has to) use the output arrays instead of the inputs in its backward computation. In Chainer v1, we wrote code that stores the output arrays as attributes of the Function object and reuses them in the ~Function.backward method. In Chainer v2, it is recommended to use Function.retain_outputs to declare which outputs are required in the backward computation. The retained output arrays can be accessed via Function.output_data.

Note

The existing Function implementations that store the output arrays as their attributes will run correctly in Chainer v2, and they currently incur no extra memory overhead. It is still recommended to use ~Function.retain_outputs so that further memory optimizations can be incorporated in the future.

Example

For example, consider the following simple implementation of the tanh function.

class TanhFunction(chainer.Function):
    def forward(self, inputs):
        xp = chainer.cuda.get_array_module(inputs[0])
        self.y = xp.tanh(inputs[0])
        return self.y,

    def backward(self, inputs, grad_outputs):
        one = self.y.dtype.type(1)  # avoid type promotion
        return grad_outputs[0] * (one - self.y * self.y),

We can use ~Function.retain_outputs instead of preserving the output array by ourselves as follows.

class TanhFunction(chainer.Function):
    def forward(self, inputs):
        self.retain_outputs((0,))
        xp = chainer.cuda.get_array_module(inputs[0])
        return xp.tanh(inputs[0]),

    def backward(self, inputs, grad_outputs):
        y = self.output_data[0]
        one = y.dtype.type(1)  # avoid type promotion
        return grad_outputs[0] * (one - y * y),  # backward must return a tuple

Link/Chain/ChainList

wscale option is removed from links

The wscale option has been removed from links since Chainer v2. If you are using the wscale option, you have to update your code. The recommended way is to set the initializer explicitly.

Example

Consider the case of adding a ~links.Linear link with the weight initialized by 0.5x of the default initialization.

# Chainer v1
linear = chainer.links.Linear(10, 5, wscale=0.5)

Note that the default initializer of the weight matrix of ~links.Linear is a normal distribution with standard deviation $1 / \sqrt{\text{fan-in}}$. Therefore, it can be fixed as follows.

# Chainer v2
linear = chainer.links.Linear(10, 5, initialW=chainer.initializers.Normal(0.5 / math.sqrt(10)))

Or, using the fact that initializers.HeNormal initializes with a normal distribution of standard deviation $\mathit{scale} \cdot \sqrt{2 / \text{fan-in}}$, the following code is also equivalent to the original.

# Chainer v2, using HeNormal
linear = chainer.links.Linear(10, 5, initialW=chainer.initializers.HeNormal(0.5 / math.sqrt(2)))

bias option is removed from links

In Chainer v2, the bias option is removed from the following links: ~links.Linear, ~links.Convolution2D, ~links.Deconvolution2D, and ~links.DilatedConvolution2D. The effect of this argument duplicated that of the initial_bias option. Use initial_bias instead.

In Chainer v2, the bias parameter is enabled by default in ~links.ConvolutionND and ~links.DeconvolutionND. It was unintentionally disabled by default in Chainer v1.

If you are using ConvolutionND or DeconvolutionND without specifying the initial_bias argument, you have to fix your code. If you want to keep the old behavior (i.e., no bias vector is created by the link), pass nobias=True to the link at construction, as shown below. Otherwise, the link will automatically create a bias vector.
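
For example, the following keeps the Chainer v1 behavior of a bias-less link (the dimensionality and channel sizes are made-up values):

import chainer.links as L

# 3-dimensional convolution, 16 input channels, 32 output channels, kernel size 5;
# nobias=True keeps the Chainer v1 default of not creating a bias vector.
conv = L.ConvolutionND(3, 16, 32, 5, nobias=True)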

init_weight function is removed

The chainer.initializers.init_weight function that was used on weight initialization has been removed since Chainer v2.

You have to update your code if you are using init_weight. In most cases, the update is simple: pass an initializer to Parameter.

Example

Consider the following code that initializes a weight matrix randomly and a bias vector by zero.

# Chainer v1
class MyLink(chainer.Link):
    def __init__(self):
        super(MyLink, self).__init__(
            W=(10, 5),
            b=(5,),
        )
        chainer.initializers.init_weight(self.W, chainer.initializers.Normal(0.05))
        self.b.data.fill(0)
    ...

This code should be fixed as follows (see the next topic for the use of Parameter).

# Chainer v2
class MyLink(chainer.Link):
    def __init__(self):
        super(MyLink, self).__init__()
        self.W = chainer.Parameter(chainer.initializers.Normal(0.05), (10, 5))
        self.b = chainer.Parameter(0, (5,))
    ...

The order of arguments of GRU is changed

In Chainer v2, the first two arguments of ~links.GRU are the input size and the output size. The order was reversed in Chainer v1, causing an interface inconsistent with other links such as ~links.LSTM. If you are using ~links.GRU, you have to update your code. The update is done by simply flipping the first two arguments.

Example

Consider the following code that creates a ~links.GRU link.

# Chainer v1
gru = chainer.links.GRU(20, 10)

It should be fixed into the following code.

# Chainer v2
gru = chainer.links.GRU(10, 20)

Note that if you were omitting the output size, the code works as is because ~links.GRU supports the omitted input size <update-omit-input-size>.

# Chainer v1/v2
gru = chainer.links.GRU(20)

The default value of the forget bias for LSTM and StatelessLSTM is changed to 1

In Chainer v2, the default forget bias value of ~links.LSTM and ~links.StatelessLSTM links is changed to 1. This change is based on the paper reporting that using a large forget bias improves the training performance. The new behavior is also consistent with the implementation of BasicLSTMCell in TensorFlow.

This will improve most use cases of LSTMs, although it may break the reproducibility of existing experiments. If you want to keep the same initialization procedure, you have to update your code. The change is simple: pass forget_bias_init=0 to ~links.LSTM and ~links.StatelessLSTM.
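
For example (the layer sizes are made-up values):

import chainer.links as L

# Reproduce the Chainer v1 initialization by resetting the forget bias to 0.
lstm = L.LSTM(100, 50, forget_bias_init=0)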

The interfaces of GRU and LSTM are aligned

In Chainer v1, ~chainer.links.GRU was stateless, as opposed to the current implementation. To align with the naming convention of the LSTM links, the naming convention has been changed in Chainer v2 so that the shorthand name points to the stateful link. If you are using ~links.GRU, you have to update your code. Use ~chainer.links.StatelessGRU for the stateless version, whose implementation is identical to chainer.links.GRU in v1.

For compatibility reasons, some links had aliases in the chainer.functions module. These aliases have been removed in Chainer v2; use the ones in chainer.links instead.

The chainer.links.Parameter link is removed in Chainer v2. This link existed in Chainer v1 only for backward compatibility. Use chainer.Parameter instead (for the new Parameter class, see upgrade-parameter).

New-style parameter registration APIs are added to Link

In Chainer v2, the Link.init_scope method returns a context manager that automatically registers a Parameter object to the link when it is assigned to an attribute. If you are using an IDE such as PyCharm, it is recommended to use this new-style parameter registration so that the IDE can easily detect the existence of the parameter as an attribute. It is also good practice to use the new-style API even if you are not using an IDE, especially if you plan to make the code public.

Note

The existing code that uses the conventional way of registering parameters is still valid.

Example

For example, the following link initialization code

# Chainer v1
class MyLink(chainer.Link):
    def __init__(self):
        super(MyLink, self).__init__(
            W=(10, 5),
            b=(5,),
        )
        chainer.initializers.Normal(0.05)(self.W.data)
        self.b.data.fill(0)
    ...

is recommended to be updated as follows.

# Chainer v2
class MyLink(chainer.Link):
    def __init__(self):
        super(MyLink, self).__init__()
        with self.init_scope():
            self.W = chainer.Parameter(chainer.initializers.Normal(0.05), (10, 5))
            self.b = chainer.Parameter(0, (5,))  # initialize by zero
    ...

Note

To keep a Parameter object as an attribute without registration, you can set the attribute without using the with self.init_scope(): block.

Like Parameter, a Link object is also automatically registered to a Chain when it is assigned to an attribute within a ~Link.init_scope scope. If you are using an IDE such as PyCharm, it is recommended to use the new-style child link registration so that the IDE can easily detect the existence of the child link as an attribute. It is also good practice to use the new-style API even if you are not using an IDE, especially if you plan to make the code public.

Note

The existing code that uses the conventional way of registering child links is still valid.

Example

For example, the following chain initialization code

# Chainer v1
class MyMLP(chainer.Chain):
    def __init__(self):
        super(MyMLP, self).__init__(
            layer1=L.Linear(None, 20),
            layer2=L.Linear(None, 30),
        )
    ...

is recommended to be updated as follows.

# Chainer v2
class MyMLP(chainer.Chain):
    def __init__(self):
        super(MyMLP, self).__init__()
        with self.init_scope():
            self.layer1 = L.Linear(20)
            self.layer2 = L.Linear(30)

Note that this example also demonstrates the use of new APIs with the omitted input size <update-omit-input-size>, explained below.

Note

To keep a Link object as an attribute without registration, you can set the attribute without using the with self.init_scope(): block.

The input-size placeholder of links are made optional

In Chainer v2, the input size of many links, including ~links.Linear and ~links.Convolution2D, is made optional. In Chainer v1, we had to use None as the placeholder to specify that the input size should be determined at the first iteration. The placeholder can also be used in Chainer v2, although it is easier to just omit the input size.

See the previous item for an example of omitting the input size of ~links.Linear. The following links currently support the omitted input size; a short sketch follows the list.

  • ~links.Convolution2D
  • ~links.Deconvolution2D
  • ~links.DilatedConvolution2D
  • ~links.Linear
  • ~links.LSTM
  • ~links.MLPConvolution2D
  • ~links.StatelessLSTM
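
The sketch below shows both the placeholder style and the omitted style for ~links.Linear, and the placeholder style for ~links.Convolution2D (the output channels and kernel size are made-up values):

import chainer.links as L

fc1 = L.Linear(None, 20)  # Chainer v1 style: None placeholder (still valid in v2)
fc2 = L.Linear(20)        # Chainer v2: the input size can simply be omitted

conv = L.Convolution2D(None, 16, 3)  # in_channels inferred at the first forward pass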

Optimizer

Deprecated methods of Optimizer are removed

The following methods are removed from Optimizer. They had already been deprecated in past versions. If you are using these methods, you have to update your code; a short sketch of the replacements follows the list.

  • zero_grads: use Link.zerograds instead.
  • compute_grads_norm: you can compute the gradient norm by iterating the list of parameters by Link.params.
  • clip_grads: use ~optimizer.GradientClipping instead.
  • weight_decay: use ~optimizer.WeightDecay instead.
  • accumulate_grads: use Link.addgrads instead.
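
A minimal sketch of these replacements, assuming model is your Link or Chain:

import chainer

optimizer = chainer.optimizers.SGD(lr=0.01)
optimizer.setup(model)

optimizer.add_hook(chainer.optimizer.WeightDecay(1e-4))      # replaces weight_decay
optimizer.add_hook(chainer.optimizer.GradientClipping(5.0))  # replaces clip_grads
model.zerograds()                                            # replaces zero_grads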

GradientMethod uses Link.cleargrads instead of Link.zerograds by default

In Chainer v2, GradientMethod clears the gradients with Link.cleargrads before running backprop. This means that the gradient of each parameter is initialized to None instead of a zero-filled array. Note that all the optimizer implementations provided by Chainer are subclasses of GradientMethod, so this change affects all of them.

In most cases, you do not need to update your code. If your code relies on the zeroing initialization, you have to fix it to explicitly initialize the gradients by zero, or pass False to GradientMethod.use_cleargrads, as sketched below.
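
A minimal sketch of restoring the old zero-initialization behavior (assuming optimizer is an instance of a GradientMethod subclass such as chainer.optimizers.SGD):

optimizer.use_cleargrads(False)  # fall back to Link.zerograds before each update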

GradientMethod is redesigned to allow parameter-specific update rules

In Chainer v2, the new class UpdateRule is used to define an update rule specific to each Parameter object. The UpdateRule is set to each Parameter object, and is used at each update step. This object implements an update formula using the data and gradient arrays.

Each UpdateRule object has an ~UpdateRule.enabled flag, which configures whether the update rule should be applied to that parameter on update. By setting the flag to False, you can freeze the parameter. There are also convenient methods, Link.enable_update and Link.disable_update, which configure the flags of all parameters under the link hierarchy. In other frameworks, a similar feature is called layer freezing. In Chainer v2, it is officially supported by these methods.
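
A short sketch of freezing parameters (model, model.fc1, and its parameter W are made-up names; the update rules are attached when Optimizer.setup is called):

# Freeze every parameter under a child link.
model.fc1.disable_update()

# Or freeze a single parameter through its UpdateRule object.
model.fc1.W.update_rule.enabled = False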

Each UpdateRule object can also hold its own hook functions similar to Optimizer. The built-in hook functions except for ~optimizer.GradientClipping can also be used as a hook function of UpdateRule.

In most cases, you do not have to update your code because each optimizer automatically sets up an appropriate UpdateRule object for each parameter.

If you are using a custom gradient-based optimizer implementation, you need to update the implementation. The following list shows what you have to do.

  • Write a subclass of UpdateRule that implements the update rule.
  • Rewrite your GradientMethod implementation. The new implementation only has to set up the update rule for each parameter in the target link.

You can see live examples in the optimizer implementations provided by Chainer.
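
As a rough sketch modeled after the built-in SGD implementation (the base-class details such as hyperparam handling are assumptions; consult the bundled optimizers for the authoritative pattern):

from chainer import optimizer

class MySGDRule(optimizer.UpdateRule):
    """Update rule applied to a single parameter (plain SGD)."""

    def update_core(self, param):
        grad = param.grad
        if grad is None:
            return
        param.data -= self.hyperparam.lr * grad

class MySGD(optimizer.GradientMethod):
    """The optimizer only creates an update rule for each parameter."""

    def __init__(self, lr=0.01):
        super(MySGD, self).__init__()
        self.hyperparam.lr = lr

    def create_update_rule(self):
        return MySGDRule(self.hyperparam)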

Serializer

None is serializable

In Chainer v2, all serializers support serializing and deserializing None values. Users' code can rely on this feature, i.e., it can serialize and deserialize None with any given serializer. This change only affects your code if it provides its own serializer implementations.

Trainer and Extension

Updater and Evaluator pass raw data arrays to the loss function

In Chainer v2, ~training.Updater and ~training.extensions.Evaluator pass raw data arrays to the loss function without wrapping them with Variable. You might need to update your code so that the loss function (in most cases, the model's __call__ ) accepts raw arrays.

Note that raw arrays can be directly passed to any Function; they are automatically wrapped by Variable. For example, if the input is directly passed to a Function object (or any function under chainer.functions), you do not need to update the code.

Example

Consider the following code that obtains the shape of the input via Variable.data.

# Chainer v1
class MyLink(chainer.Link):
    def __call__(self, x):
        shape = x.data.shape  # valid if x is Variable, invalid if x is ndarray
        ...

It should be updated so that the link also accepts a raw array as the input. In this case, we have Variable.shape which is equivalent to data.shape, so you can simply write as follows.

# Chainer v2
class MyLink(chainer.Link):
    def __call__(self, x):
        shape = x.shape  # valid regardless of x being Variable or ndarray
        ...

trigger option is removed from snapshot and snapshot_object

In Chainer v2, the trigger option is removed from the ~training.extensions.snapshot and ~training.extensions.snapshot_object extensions. The effect of this option duplicated that of the trigger option of Trainer.extend <training.Trainer.extend>. If you are passing the trigger argument to these extensions, you have to update your code by passing the value to the corresponding Trainer.extend <training.Trainer.extend> call instead.

Example

Assume that trainer is an instance of ~training.Trainer, and consider that you were adding a ~training.extensions.snapshot extension as follows.

# Chainer v1
trainer.extend(chainer.training.extensions.snapshot(trigger=(1000, 'iteration')))

It should be updated as follows (note that this code also works with Chainer v1).

# Chainer v1/v2
trainer.extend(chainer.training.extensions.snapshot(), trigger=(1000, 'iteration'))

Extension.invoke_before_training is removed

In Chainer v2, the attribute invoke_before_training of ~training.Extension is removed. Instead, the Extension.initialize <training.Extension.initialize> method is added. This method is called by Trainer.run <training.Trainer.run> before entering the training loop.

In Chainer v1, the extension was simply called before entering the training loop when invoke_before_training was True. If you have a custom extension with invoke_before_training=True, you have to update the code: remove the invoke_before_training flag and override the ~training.Extension.initialize method instead. If you are using the ~training.make_extension decorator, you can set the initialize function by passing the initializer argument to ~training.make_extension.
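
For example, a rough sketch using the ~training.make_extension decorator (the extension body and trigger are made-up):

import chainer.training as training

def _prepare(trainer):
    # Runs once before the training loop, replacing invoke_before_training=True.
    print('initializing, output directory:', trainer.out)

@training.make_extension(trigger=(1, 'epoch'), initializer=_prepare)
def my_extension(trainer):
    print('epoch finished:', trainer.updater.epoch)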

The dump_graph extension dumps the valid graph only at its first invocation

In Chainer v2, the ~training.extensions.dump_graph extension dumps a valid computational graph only at its first invocation. If you want to dump the graph more than once, you have to fix your code. The easiest fix is to set the chainer.config.keep_graph_on_report flag to True, although this cancels the memory-consumption improvement made in Chainer v2. A more memory-efficient fix is to dump the graph without using the extension, e.g. by customizing the loss function or the updater.

Here is the background of this change. In Chainer v2, the Reporter copies reported variables with the computational graph purged by default (see upgrade-reporter-purge-variable). On the other hand, the ~training.extensions.dump_graph extension requires the computational graph reachable from the reported variable. In order to make the graph available, the extension turns on the chainer.config.keep_graph_on_report flag in its initializer (i.e., before entering the training loop). Since we also wanted memory efficiency, the extension turns the flag off after dumping the graph at its first invocation (strictly speaking, it restores the original value). As a result, the computational graph is not available from the second invocation on.

Since the ~training.extensions.dump_graph extension restores the original flag value after dumping the graph, you can have the graph dumped more than once by changing the original flag value.

Reporter

When a variable is reported, the variable is copied with the graph purged

In Chainer v2, when a Variable object is reported using the report function (or directly using Reporter), a copy of the variable is made without preserving the computational graph. If your code depends on the reachability of the computational graph from a reported variable, you have to update it. The easiest way is to set chainer.config.keep_graph_on_report to True; Chainer will then keep the computational graph reachable from the reported variable.

The possible examples that are affected by this change are as follows (not exhaustive).

  • A custom extension that runs backprop from a reported variable. It is definitely an example of assuming the reachability of the computational graph from the reported variable.
  • An extension that visualizes the computational graph from a reported variable. If you are writing such an extension yourself, you have to turn on the keep_graph_on_report flag. The ~training.extensions.dump_graph extension is another example; see the above item <upgrade-dump-graph-only-once> for details.

This change is made for the memory performance reason; with this change, the memory used by the computational graph for training is immediately released before invoking extensions. Therefore, changing the behavior by overwriting chainer.config.keep_graph_on_report may increase the memory consumption. It may cause an out-of-memory error if the computational graph of the loss function consumes almost all the memory available in your environment and there is an extension that uses a certain amount of memory (e.g. ~training.extensions.Evaluator).

Other utilities

Some obsolete classes and functions are removed

The following classes and functions are removed in Chainer v2.

  • chainer.Flag
  • chainer.FunctionSet (Use Chain or ChainList instead)
  • chainer.cuda.init (It did nothing except for calling ~cuda.check_cuda_available)
  • chainer.cuda.empty (Use cupy.empty)
  • chainer.cuda.empty_like (Use cupy.empty_like)
  • chainer.cuda.full (Use cupy.full)
  • chainer.cuda.full_like (Use cupy.full_like)
  • chainer.cuda.ones (Use cupy.ones)
  • chainer.cuda.ones_like (Use cupy.ones_like)
  • chainer.cuda.zeros (Use cupy.zeros)
  • chainer.cuda.zeros_like (Use cupy.zeros_like)