
Design Doc for Regularization #4869

Merged 5 commits into PaddlePaddle:develop on Oct 18, 2017

Conversation

abhinavarora (Contributor):

No description provided.

@mkliegl (Contributor) left a comment:

LGTM. I definitely agree that we need the flexibility to add different forms and/or strengths of regularization to different variables.

The comments below may be more appropriate for the Python API discussion, but here goes:

In general, I think it would be good if each Parameter had a flexible user-accessible dict/JSON object of attributes rather than just a few hard-coded flags like trainable plus some for regularization.

That is, make it easy for users to (a) add arbitrary flags/properties to Parameters during layer creation and (b) make it easy for users to filter all Parameters according to those flags/properties. Then it will be very easy to write apply_regularization functions. It will be a natural part of the framework and won't have to be somehow separately designed or implemented.

This is more generally useful. For example, people may want to tag some parameters with {'debug': True} and then write some code that prints min/max/np.any(np.isnan(.)) etc. for those Parameters regularly during training.

Such patterns are common and often a little cumbersome in TensorFlow code: People maintain separate lists of variables in global scope for each attribute, rather than just storing the attributes with the parameter.

<img src="./images/l1_regularization.png" align="center"/><br/>

A much more detailed mathematical background on regularization can be found [here](http://www.deeplearningbook.org/contents/regularization.html).

Contributor:

typo: reguilarization => regularization

#### High-level API

In the PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) to create neural network layers. Hence, we also need to provide regularization functionality in layer functions. The design of these APIs can be postponed for now. A good reference for these APIs can be found in [Keras](https://keras.io/regularizers/) and also by looking at TensorFlow's [`tf.contrib.layers`](https://www.tensorflow.org/api_guides/python/contrib.layers).

Contributor:

typo: 'lso' => 'also'

- L1_regularization_op

These ops can be like any other ops, with their own CPU/GPU implementations, either using Eigen or separate CPU and GPU kernels. As an initial implementation, we can implement their kernels using Eigen, following the abstraction pattern implemented for [Activation Ops](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/accuracy_op.h). This abstraction pattern can make it very easy to implement new regularization schemes other than the L1 and L2 norm penalties.

Contributor:

typo: "Cpu" => "CPU"
typo: "schemes. other than" => delete period


### Low-Level implementation

In the new design, we propose to create new operations for regularization. For now, we can add two ops that correspond to the most frequently used regularizations:
Member:

thgat ==> that


## How to do Regularization in PaddlePaddle

A survey of existing frameworks like TensorFlow, PyTorch, Caffe, etc., shows that there are two common approaches to doing regularization:
Contributor:

Thanks for the comparison with Torch and TensorFlow!

Maybe the survey part can be put into an issue or a wiki page, with a reference added at the bottom, as was done for the dependency engine, for example.

My point is that the design doc should be simple enough for users who want to learn the design details. If they want more background on the design decisions, they should be able to find the related survey/discussion easily, but we should not put it into the design doc.


## Introduction to Regularization
A central problem in machine learning is how to design an algorithm that will perform well not just on the training data, but also on new data. Many strategies are used by machine learning practitioners to reduce the test error, possibly at the expense of increased training error. These strategies are collectively known as **regularization**.

Contributor:

Maybe we can introduce the overfitting problem first and then bring in regularization.

> Many strategies are used by machine learning practitioners to reduce the test error, possibly at the expense of increased training error.

This summary of regularization does not look very accurate at first glance. From the Wikipedia entry on regularization:

> In general, regularization is a technique that applies to objective functions in ill-posed problems formulated as optimization problems.


#### High-level API

In the PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) to create neural network layers. Hence, we also need to provide regularization functionality in layer functions. The design of these APIs can be postponed for now. A good reference for these APIs can be found in [Keras](https://keras.io/regularizers/) and also by looking at TensorFlow's [`tf.contrib.layers`](https://www.tensorflow.org/api_guides/python/contrib.layers).
Contributor:

Not related to this PR, but I just want to point out that our V2 API is a fixed interface; we must remain compatible with it. I'm not sure whether the layer function will take on this job or whether the regularization function also needs to consider it.

abhinavarora (Contributor, Author):

Hi @dzhwinter, we might have to discuss this offline, because I had this concern too. However, I saw that we are also changing the interface for the Python Optimizers.

@lcy-seso (Contributor) Oct 18, 2017:

I just saw this PR. As I understand it, the regularizers only need gradient operators. There is no need to create the forward operators, whose computations play no important role (they only serve to print a loss containing the regularization term to users). The regularizer can be computed only when the parameters are updated. The L2 regularizer can easily be implemented in the optimizer (the old concept in PaddlePaddle; maybe it has changed now), but L1 is very special.

Just some thoughts.

abhinavarora (Contributor, Author):

@lcy-seso I agree with you that the L2 regularizer can easily be implemented in the optimizer. However, I believe that the optimizer implementation can be an added feature that Paddle supports. Having regularization ops is a more general way for us and for Paddle users to implement custom regularization schemes. Let me clean up the typos in this doc, and then we can vote/discuss both options?

Contributor:

No problem. I am also curious to see how we are going to implement the L1 regularization.


#### Creation of Regularization ops
There are two possibilities for creating the regularization ops:
1. We create these ops immediately while building the computation graph.
Contributor:

Maybe we should write down only the proposed option. For the same reason as in my comment above, we can put the two options in an issue or somewhere similar, then discuss or vote for the better choice.

And here it will definitely be option 2.

Because when we run a specific target, some operators should not be included; for example, GoogLeNet (the Inception model) runs optimization targets at different levels, and a serving job such as inference should not include these ops at all.

@wangkuiyi (Collaborator) left a comment:

LGTM

@wangkuiyi wangkuiyi merged commit f4a21e3 into PaddlePaddle:develop Oct 18, 2017
@abhinavarora abhinavarora deleted the regularization branch October 20, 2017 18:12