[PaddlePaddle Hackathon 2] 13: Add the OneCycleLR learning rate scheduler to Paddle #4713

Merged · 6 commits · May 18, 2022
Changes from 2 commits
1 change: 1 addition & 0 deletions docs/api/api_label
@@ -1010,6 +1010,7 @@ paddle.optimizer.lr.ReduceOnPlateau .. _api_paddle_optimizer_lr_ReduceOnPlateau:
paddle.optimizer.lr.StepDecay .. _api_paddle_optimizer_lr_StepDecay:
paddle.optimizer.lr.PolynomialDecay .. _api_paddle_optimizer_lr_PolynomialDecay:
paddle.optimizer.lr.NaturalExpDecay .. _api_paddle_optimizer_lr_NaturalExpDecay:
paddle.optimizer.lr.OneCycleLR .. _cn_api_paddle_optimizer_lr_OneCycleLR:
paddle.regularizer.L1Decay .. _api_paddle_regularizer_L1Decay:
paddle.regularizer.L2Decay .. _api_paddle_regularizer_L2Decay:
paddle.static.InputSpec .. _api_paddle_static_InputSpec:
2 changes: 1 addition & 1 deletion docs/api/paddle/optimizer/Overview_cn.rst
@@ -53,4 +53,4 @@ The paddle.optimizer directory contains the APIs related to the optimizer algorithms supported by the PaddlePaddle framework
" :ref:`ReduceOnPlateau <cn_api_paddle_optimizer_lr_ReduceOnPlateau>` ", "loss 自适应学习率衰减"
" :ref:`StepDecay <cn_api_paddle_optimizer_lr_StepDecay>` ", "按指定间隔轮数学习率衰减"
" :ref:`MultiplicativeDecay <cn_api_paddle_optimizer_lr_MultiplicativeDecay>` ", "根据lambda函数进行学习率衰减"

" :ref:`OneCycleLR <cn_api_paddle_optimizer_lr_OneCycleLR>` ", "One Cycle学习率衰减"
4 changes: 3 additions & 1 deletion docs/api/paddle/optimizer/lr/LRScheduler_cn.rst
@@ -7,7 +7,7 @@ LRScheduler

Base class of learning rate schedules. It defines the common interface shared by all learning rate adjustment strategies.

Currently, 12 strategies have been implemented in paddle based on this base class, namely:
Currently, 13 strategies have been implemented in paddle based on this base class, namely:

* :code:`NoamDecay`: Noam decay. For the algorithm, please refer to `Attention Is All You Need <https://arxiv.org/pdf/1706.03762.pdf>`_. See :ref:`cn_api_paddle_optimizer_lr_NoamDecay`.

@@ -35,6 +35,8 @@

* :code:`MultiplicativeDecay`: Multiplies the current learning rate by a lambda function to obtain the next learning rate. See :ref:`cn_api_paddle_optimizer_lr_MultiplicativeDecay`.

* :code:`OneCycleLR`: One Cycle decay; the learning rate first rises to the maximum and then falls to the minimum. See :ref:`cn_api_paddle_optimizer_lr_OneCycleLR`.

You can inherit this base class to implement any custom learning rate strategy. Import the base class with ``from paddle.optimizer.lr import LRScheduler``, and you must override its ``get_lr()`` method, otherwise a ``NotImplementedError`` exception is raised.
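
For illustration only, a minimal sketch of such a subclass is shown below (the class name and the halving rule are hypothetical, not part of Paddle; it assumes the ``base_lr`` and ``last_epoch`` attributes provided by ``LRScheduler``):

.. code-block:: python

    from paddle.optimizer.lr import LRScheduler

    class HalveEveryTenSteps(LRScheduler):
        # hypothetical schedule: halve the learning rate every 10 calls to step()
        def get_lr(self):
            return self.base_lr * (0.5 ** (self.last_epoch // 10))

    scheduler = HalveEveryTenSteps(learning_rate=0.1)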

40 changes: 40 additions & 0 deletions docs/api/paddle/optimizer/lr/OneCycleLR_cn.rst
@@ -0,0 +1,40 @@
.. _cn_api_paddle_optimizer_lr_OneCycleLR:

OneCycleLR
-----------------------------------

.. py:class:: paddle.optimizer.lr.OneCycleLR(max_learning_rate, total_steps=None, epochs=None, steps_per_epoch=None, pct_start=0.3, anneal_strategy='cos', divide_factor=25., final_divide_factor=1e4, three_phase=False, last_epoch=-1, verbose=False)

This API uses the ``one cycle`` strategy to dynamically adjust the learning rate.

The strategy first increases the learning rate from the initial learning rate to the maximum learning rate, and then decreases it from the maximum learning rate to a minimum learning rate far below the initial learning rate.

Related paper: `Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates <https://arxiv.org/abs/1708.07120>`_

Note that the default behavior of this scheduler follows the fastai repository, which claims that a two-phase schedule performs better than the three-phase one. Set ``three_phase=True`` to match the behavior described in the paper.

Also note that this scheduler requires ``step`` to be called after every iteration.
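
As a usage sketch only (it assumes the signature documented above; the toy model, data, and hyper-parameter values are illustrative):

.. code-block:: python

    import paddle

    x = paddle.uniform([10, 10])
    linear = paddle.nn.Linear(10, 10)

    # 5 epochs * 20 iterations per epoch = 100 iterations in total
    scheduler = paddle.optimizer.lr.OneCycleLR(max_learning_rate=1.0, total_steps=100)
    sgd = paddle.optimizer.SGD(learning_rate=scheduler, parameters=linear.parameters())

    for epoch in range(5):
        for batch_id in range(20):
            out = linear(x)
            loss = paddle.mean(out)
            loss.backward()
            sgd.step()
            sgd.clear_grad()
            scheduler.step()  # called once per iteration, not once per epoch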

Parameters
::::::::::::

- **max_learning_rate** (float) - The maximum learning rate, i.e. the upper bound of the learning rate during scheduling. Functionally, it defines the initial learning rate and the minimum learning rate together with ``divide_factor`` and ``final_divide_factor`` respectively.
Collaborator


A space needs to be added after the `` that follows final_divide_factor.

Contributor Author


Fixed.

- **total_steps** (int, optional) - The total number of iterations in the training process. If ``total_steps`` is not specified, it is computed from ``epochs`` and ``steps_per_epoch``, so make sure that either ``total_steps`` or the pair ( ``epochs`` , ``steps_per_epoch`` ) is specified.
Collaborator


Default: None, which means XXX

Contributor Author


Fixed.

- **epochs** (int, optional) - The number of epochs in the training process. Default: ``None``.
- **steps_per_epoch** (int, optional) - The number of iterations in each epoch of the training process. Default: ``None``.
- **pct_start** (float) - The fraction of the total number of iterations spent increasing the learning rate from the initial learning rate to the maximum learning rate. Default: 0.3.
- **anneal_strategy** (str, optional) - The strategy used to anneal the learning rate; must be one of ( ``cos``, ``linear`` ), where ``cos`` means cosine annealing and ``linear`` means linear annealing. Default: ``cos``.
- **divide_factor** (float, optional) - Used to infer the initial learning rate via initial_lr = max_lr/divide_factor. Default: 25.
- **final_divide_factor** (float, optional) - Used to infer the minimum learning rate via min_lr = max_lr/final_divide_factor. Default: 1e4. A worked example of these two formulas follows this parameter list.
- **three_phase** (bool, optional) - Whether to use a three-phase schedule. If ``True``, the learning rate first rises from the initial learning rate to the maximum learning rate, then falls from the maximum learning rate back to the initial learning rate (these two phases take the same number of iterations), and finally decays to the minimum learning rate. If ``False``, the learning rate rises to the maximum learning rate and then falls directly to the minimum learning rate. Default: ``False``.
- **last_epoch** (int, optional) - The epoch index of the previous run; set it to the last epoch when resuming training. Default: -1, which means starting from the initial learning rate.
- **verbose** (bool, optional) - If ``True``, a message is printed to standard output `stdout` at each update. Default: ``False``.
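
A worked example of the formulas above (the values are illustrative, and the product rule for ``total_steps`` is an assumption based on the description of that parameter):

.. code-block:: python

    max_lr = 0.5
    divide_factor = 25.0
    final_divide_factor = 1e4

    initial_lr = max_lr / divide_factor       # 0.02
    min_lr = max_lr / final_divide_factor     # 5e-05

    # when total_steps is not given, it is assumed to be epochs * steps_per_epoch
    epochs, steps_per_epoch = 5, 20
    total_steps = epochs * steps_per_epoch    # 100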

Returns
::::::::::::
An ``OneCycleLR`` instance object used to adjust the learning rate.

Code Examples
::::::::::::

COPY-FROM: paddle.optimizer.lr.OneCycleLR
5 changes: 4 additions & 1 deletion docs/api_guides/low_level/layers/learning_rate_scheduler.rst
@@ -54,4 +54,7 @@
For the related API Reference, see :ref:`cn_api_paddle_optimizer_lr_ReduceOnPlateau`

* :code:`MultiplicativeDecay`: Multiplies the current learning rate by a lambda function to obtain the next learning rate.
For the related API Reference, see :ref:`cn_api_paddle_optimizer_lr_MultiplicativeDecay`
For the related API Reference, see :ref:`cn_api_paddle_optimizer_lr_MultiplicativeDecay`

* :code:`OneCycleLR`: One Cycle decay; the learning rate first rises to the maximum and then falls to the minimum.
For the related API Reference, see :ref:`cn_api_paddle_optimizer_lr_OneCycleLR`