Adding the Xavier Initializer #5270

Merged
merged 4 commits into PaddlePaddle:develop from the xavier branch on Nov 2, 2017

Conversation

abhinavarora
Contributor

No description provided.

fan_out = shape[1]
else:
# Assume this to be a convolutional kernel
# In Paddlepaddle, the shape of the kernel is like:
Contributor

s/Paddlepaddle/PaddlePaddle/g

Contributor Author

Fixed in 4e1fa1b
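
For context on the snippet above (the kernel-layout comment is cut off here), this is a minimal sketch of how fan_in and fan_out are typically derived for dense and convolutional weights. It assumes an [out_channels, in_channels, height, width] kernel layout and is only an illustration, not the PR's exact _compute_fans:

import numpy as np

def compute_fans(shape):
    # Dense weight: [fan_in, fan_out]; conv kernel (assumed layout):
    # [out_channels, in_channels, *spatial_dims].
    if len(shape) == 2:
        fan_in, fan_out = shape[0], shape[1]
    else:
        receptive_field = int(np.prod(shape[2:]))
        fan_in = shape[1] * receptive_field
        fan_out = shape[0] * receptive_field
    return fan_in, fan_out

print(compute_fans([64, 32, 3, 3]))  # (288, 576) for a 3x3 kernel with 32 in / 64 out channels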

(http://proceedings.mlr.press/v9/glorot10a.html)
"""

def __init__(self, uniform=True, fan_in=None, fan_out=None, seed=0):
Contributor

Normal distribution is used more frequently than Uniform distribution. Shall we change the default value of uniform to False?

Contributor
@lcy-seso lcy-seso Nov 1, 2017

I think the uniform distribution is acceptable to me; my concern is that we are not yet at a stage where we are intensively encoding the community's best practices into our framework. We can carefully tune this later.

But a uniform distribution over [-1, 1] as the default initialization seems terrible (only from my personal experience; it seems too large). Maybe we can reduce the range limit to 1e-3?

Contributor Author

I was looking at TensorFlow, where they keep uniform as the default. That is why I chose uniform as the default here.

Contributor

Got it~ thank you for the information.
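
For a sense of scale in the [-1, 1] discussion above: the Xavier uniform limit x = sqrt(6 / (fan_in + fan_out)) from the docstring below is already far smaller than 1 for typical layer sizes. A quick check with a hypothetical 784-by-256 dense layer:

import math

fan_in, fan_out = 784, 256  # hypothetical dense layer shape
limit = math.sqrt(6.0 / (fan_in + fan_out))
print(limit)  # ~0.076, versus a fixed [-1, 1] range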

def __init__(self, uniform=True, fan_in=None, fan_out=None, seed=0):
"""Constructor for XavierInitializer

Args:
Contributor

Please add descriptions for the default values.

Contributor Author

Thank you for the feedback. Could you give me an example of the description you are referring to? I have talked about fan_in and fan_out in the Args section of my docstring.

Contributor
@pengli09 pengli09 Nov 2, 2017

Just ignore this comment; I didn't realize that the signature of the function would be displayed in the doc.

For example, "uniform: .... Default value is True.". Otherwise, one may need to dig into the code to find out what the default values are.
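
A sketch of the requested style, spelling out "Default value is ..." for every argument (the exact wording is only a suggestion, not the PR's final docstring):

class XavierInitializer(object):
    def __init__(self, uniform=True, fan_in=None, fan_out=None, seed=0):
        """Constructor for XavierInitializer

        Args:
            uniform: whether to use the uniform (True) or normal (False)
                distribution. Default value is True.
            fan_in: fan_in for Xavier initialization; inferred from the
                variable if None. Default value is None.
            fan_out: fan_out for Xavier initialization; inferred from the
                variable if None. Default value is None.
            seed: random seed for the initializer op. Default value is 0.
        """
        self._uniform = uniform
        self._fan_in = fan_in
        self._fan_out = fan_out
        self._seed = seed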

f_in, f_out = self._compute_fans(var)

# If fan_in and fan_out are passed, use them
fan_in = self._fan_in or f_in
Contributor

f_in will be used if self._fan_in is 0, which is counter-intuitive. Therefore I think f_in if self._fan_in is None else self._fan_in is better. 0 is not a proper value for self._fan_in, which means that when encountering a 0, either the user is using it intentionally for debugging purposes or an error has occurred. Therefore, I think failing deterministically would be a better choice.
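
In other words, an explicit None check lets a user-supplied 0 surface as an error rather than being silently replaced. A sketch of the suggested replacement lines (names taken from the snippet above):

f_in, f_out = self._compute_fans(var)

# Fall back to the computed fans only when the user passed None;
# `self._fan_in or f_in` would silently discard a (suspicious) 0.
fan_in = f_in if self._fan_in is None else self._fan_in
fan_out = f_out if self._fan_out is None else self._fan_out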

Contributor Author

Fixed in 4e1fa1b


# If fan_in and fan_out are passed, use them
fan_in = self._fan_in or f_in
fan_out = self._fan_out or f_out
Contributor

See above.

outputs={"Out": var},
attrs={
"shape": var.shape,
"data_type": int(var.data_type),
Contributor

I'm not familiar with Block. Is int correct?

Contributor Author

Yes, this is correct. We pass the data type as an integer, which is then mapped to the corresponding type.
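
Purely as an illustration of that convention (these codes are made up, not PaddlePaddle's actual data-type enum), an integer attribute can be mapped back to a named type like this:

# Hypothetical integer codes; the real mapping lives in the framework.
HYPOTHETICAL_DATA_TYPES = {
    0: "float32",
    1: "float64",
    2: "int32",
}

def decode_data_type(code):
    # Map the integer attribute value back to a named data type.
    return HYPOTHETICAL_DATA_TYPES[int(code)]

print(decode_data_type(0))  # float32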

limit = np.sqrt(6.0 / (param.shape[0] + param.shape[1]))
self.assertAlmostEqual(init_op.attr('min'), -limit, delta=DELTA)
self.assertAlmostEqual(init_op.attr('max'), limit, delta=DELTA)
self.assertEqual(init_op.attr('seed'), 0)
Contributor

I think we should also test whether seed can be set properly.

Contributor Author

Fixed in 4e1fa1b
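
A sketch of the kind of check asked for above, following the assertion style of the existing test; the helper that builds the parameter and returns its initializer op is assumed scaffolding, not code from this PR:

def test_xavier_initializer_seed(self):
    # Assumed helper: builds a program, creates a parameter with the given
    # initializer, and returns the prepended initializer op.
    init_op = self.create_parameter_with_initializer(
        initializer.XavierInitializer(uniform=True, seed=134))
    self.assertEqual(init_op.attr('seed'), 134)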

approximately same in all the layers. In case of Uniform distribution,
the range is [-x, x], where x = sqrt(6 / (fan_in + fan_out)).
In case of Normal distribution, the mean is 0 and the standard deviation
is sqrt(2/ (fan_in + fan_out)).
Contributor

The recommended scale/std is different for different nonlinearity functions. I think we can borrow the idea from PyTorch to make this initializer more general.
https://github.com/pytorch/pytorch/blob/master/torch/nn/init.py#L8
https://github.com/pytorch/pytorch/blob/master/torch/nn/init.py#L184

Contributor Author

The original paper does not talk about ReLUs or gain because it was written before ReLUs became popular. I also looked at the TensorFlow and Keras source code; they have kept this initialization as it is defined in the paper. Maybe we can merge this for now and add the gain later in a separate PR when we have more knowledge about it. I can read a few more papers to see how the gain attribute is used. I do not think it is right to borrow the idea directly without researching it thoroughly. We can merge this and look at it in more detail after the refactoring is complete. What do you suggest?
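
For reference, the PyTorch-style generalization discussed here only multiplies the Glorot terms by an activation-dependent gain. A hedged sketch (the function name is made up; gain=1.0 reproduces the formulas in this PR's docstring):

import math

def xavier_scale(fan_in, fan_out, gain=1.0, uniform=True):
    # gain=1.0: uniform limit sqrt(6 / (fan_in + fan_out)),
    #           normal std    sqrt(2 / (fan_in + fan_out)).
    if uniform:
        return gain * math.sqrt(6.0 / (fan_in + fan_out))
    return gain * math.sqrt(2.0 / (fan_in + fan_out))

# Recommended gains in torch.nn.init: tanh -> 5/3, ReLU -> sqrt(2), linear -> 1.
print(xavier_scale(256, 128, gain=math.sqrt(2)))  # ReLU-scaled uniform limit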

Contributor
@lcy-seso lcy-seso left a comment

I like reading your docs/comments; they are quite easy to follow. Thank you.

networks. International conference on artificial intelligence and
statistics.
(http://proceedings.mlr.press/v9/glorot10a.html)
"""
Contributor
@lcy-seso lcy-seso Nov 1, 2017

I like reading your comments~ They are quite easy to follow.

Just out of personal interest: I remember this is the recommended initializer for tanh, and the usually recommended initializer varies with the activation.

I think we can just add all these helpful initializers into our framework as the first goal, and later choose a better initialization strategy to encode the best practices.

Contributor Author

Thank you for the feedback. I agree with you.


Contributor
@lcy-seso lcy-seso left a comment

LGTM.

@lcy-seso
Contributor
lcy-seso commented Nov 2, 2017

Not changing the default uniform distribution is acceptable to me. Personally, this is not a top concern currently, and I do not have strong evidence to support any other choice (they all seem reasonable).

@abhinavarora abhinavarora merged commit 66d1c6c into PaddlePaddle:develop Nov 2, 2017
@abhinavarora abhinavarora deleted the xavier branch November 2, 2017 17:51