
Design Doc for Regularization #4869

Merged 5 commits into PaddlePaddle:develop on Oct 18, 2017

Conversation

abhinavarora (Contributor):

No description provided.

@mkliegl (Contributor) left a comment:

LGTM. I definitely agree that we need the flexibility to add different forms and/or strengths of regularization to different variables.

The comments below may be more appropriate for the Python API discussion, but here goes:

In general, I think it would be good if each Parameter had a flexible user-accessible dict/JSON object of attributes rather than just a few hard-coded flags like trainable plus some for regularization.

That is, make it easy for users to (a) add arbitrary flags/properties to Parameters during layer creation and (b) make it easy for users to filter all Parameters according to those flags/properties. Then it will be very easy to write apply_regularization functions. It will be a natural part of the framework and won't have to be somehow separately designed or implemented.

This is more generally useful. For example, people may want to tag some parameters with {'debug': True} and then write some code that prints min/max/np.any(np.isnan(.)) etc. for those Parameters regularly during training.

Such patterns are common and often a little cumbersome in TensorFlow code: People maintain separate lists of variables in global scope for each attribute, rather than just storing the attributes with the parameter.

<img src="./images/l1_regularization.png" align="center"/><br/>

A much more detailed mathematical background on regularization can be found [here](http://www.deeplearningbook.org/contents/regularization.html).

Contributor:

typo: reguilarization => regularization

#### High-level API

In the PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) to create neural network layers. Hence, we also need to provide regularization functionality in layer functions. The design of these APIs can be postponed for now. A good reference for these APIs can be found in [Keras](https://keras.io/regularizers/) and also by looking at TensorFlow's [`tf.contrib.layers`](https://www.tensorflow.org/api_guides/python/contrib.layers).

Contributor:

typo: 'lso' => 'also'

- L1_regularization_op

These ops can be like any other ops, with their own CPU/GPU implementations, either using Eigen or separate CPU and GPU kernels. As an initial implementation, we can implement their kernels using Eigen, following the abstraction pattern implemented for [Activation Ops](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/accuracy_op.h). This abstraction pattern can make it very easy to implement new regularization schemes other than the L1 and L2 norm penalties.

Contributor:

typo: "Cpu" => "CPU"
typo: "schemes. other than" => delete period


### Low-Level implementation

In the new design, we propose to create new operations for regularization. For now, we can add two ops that correspond to the most frequently used regularizations:
Member:

thgat ==> that


## How to do Regularization in PaddlePaddle

A survey of existing frameworks like TensorFlow, PyTorch, Caffe, etc., shows that there are two common approaches to doing regularization:
Contributor:

Thanks for the comparison with Torch and TensorFlow!

Maybe the survey part can be put into an issue or a wiki page, with a reference added at the bottom, as was done for the dependency engine, for example.

My point is that the design doc should be simple enough for users who want to learn the design details. If they want more background on the design decisions, they should be able to find the related survey/discussion easily, but we should not put it into the design doc.


## Introduction to Regularization
A central problem in machine learning is how to design an algorithm that will perform well not just on the training data, but also on new data. Many strategies are used by machine learning practitioners to reduce the test error, possibly at the expense of increased training error. These strategies are collectively known as **regularization**.

Contributor:

Maybe we can introduce the overfitting problem first and then bring in regularization.

> Many strategies are used by machine learning practitioners to reduce the test error, possibly at the expense of increased training error.

This summary of regularization does not look very accurate at first glance. From the Wikipedia entry on regularization:

> In general, regularization is a technique that applies to objective functions in ill-posed problems formulated as optimization problems.


#### High-level API

In the PaddlePaddle Python API, users will primarily rely on [layer functions](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/python_api.md#layer-function) to create neural network layers. Hence, we also need to provide regularization functionality in layer functions. The design of these APIs can be postponed for now. A good reference for these APIs can be found in [Keras](https://keras.io/regularizers/) and also by looking at TensorFlow's [`tf.contrib.layers`](https://www.tensorflow.org/api_guides/python/contrib.layers).
Contributor:

Not related to this PR, but I just want to point out that our V2 API is a fixed interface; we must remain compatible with it. I'm not sure whether the layer function will take on this job or whether the regularization function also needs to consider it.

abhinavarora (Contributor, Author):

Hi @dzhwinter, we might have to discuss this offline, because I had this concern too. However, I saw that we are also changing the interface for the Python Optimizers.

@lcy-seso (Contributor) Oct 18, 2017:

I just saw this PR. As I understand it, the regularizers only need gradient operators. There is no need to create the forward operators, whose computations play no important role (they only serve to print a loss containing the regularization term to users). The regularizer can be computed only when the parameters are updated. The L2 regularizer can easily be implemented in the optimizer (the old concept in PaddlePaddle; maybe it has changed now), but L1 is very special.

Just some thoughts.

abhinavarora (Contributor, Author):

@lcy-seso I agree with you that the L2 regularizer can easily be implemented in the optimizer. However, I believe that the optimizer implementation can be an added feature that Paddle supports. Having regularization ops is a more general way for us and for Paddle users to implement custom regularization schemes. Let me clean up the typos in this doc, and then we can vote/discuss both options?

Contributor:

No problem. I am also curious to see how we are going to implement the L1 regularization.


#### Creation of Regularization ops
There are two possibilities for creating the regularization ops:
1. We create these ops immediately while building the computation graph.
Contributor:

Maybe we should write down only the proposed option. For the same reason as in my comment above, we can put the two options in an issue or somewhere similar, then discuss or vote for the better choice.

And here it will definitely be option 2.

Because when we run a specific target, some operators should not be included; for example, GoogLeNet (the Inception model) runs optimization targets at different levels, and a serving job such as inference should not include these ops at all.

@wangkuiyi (Collaborator) left a comment:

LGTM

@wangkuiyi wangkuiyi merged commit f4a21e3 into PaddlePaddle:develop Oct 18, 2017
@abhinavarora abhinavarora deleted the regularization branch October 20, 2017 18:12