【Hackathon No. 91】add: rfcs for tensorhook #477

yangguohao · 2023-03-24T06:44:03Z

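This PR does not contain a mature design yet; I would like to use it to discuss the problem. I am currently confused about the dynamic-to-static mode.
Following the example below from the official documentation, I registered a hook inside the Linear layer, and after converting the Linear layer to a static graph the registered TensorHook still works.
Does this count as running in static graph mode?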

import numpy as np
import paddle
import paddle.nn as nn
import paddle.optimizer as opt

BATCH_SIZE = 16
BATCH_NUM = 4
EPOCH_NUM = 4

IMAGE_SIZE = 784
CLASS_NUM = 10

# define a random dataset
class RandomDataset(paddle.io.Dataset):
  def __init__(self, num_samples):
      self.num_samples = num_samples

  def __getitem__(self, idx):
      image = np.random.random([IMAGE_SIZE]).astype('float32')
      label = np.random.randint(0, CLASS_NUM, (1,)).astype('int64')
      return image, label

  def __len__(self):
      return self.num_samples

def hook(grad):
  print(grad)

class LinearNet(nn.Layer):
  def __init__(self):
      super().__init__()
      self._linear = nn.Linear(IMAGE_SIZE, CLASS_NUM)
      # register a tensor hook on the first parameter
      self._linear.parameters()[0].register_hook(hook)

  def forward(self, x):
      return self._linear(x)

def train(layer, loader, loss_fn, opt):

  for epoch_id in range(EPOCH_NUM):
      for batch_id, (image, label) in enumerate(loader()):
          out = layer(image)
          loss = loss_fn(out, label)
          loss.backward()
          opt.step()
          opt.clear_grad()
          print("Epoch {} batch {}: loss = {}".format(
              epoch_id, batch_id, np.mean(loss.numpy())))

# create network
layer = LinearNet()
layer = paddle.jit.to_static(layer)  # <---- one-step dynamic-to-static conversion via the functional call paddle.jit.to_static(layer)
loss_fn = nn.CrossEntropyLoss()
adam = opt.Adam(learning_rate=0.001, parameters=layer.parameters())

# create data loader
dataset = RandomDataset(BATCH_NUM * BATCH_SIZE)
loader = paddle.io.DataLoader(dataset,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            drop_last=True,
                            num_workers=2)

# train
train(layer, loader, loss_fn, adam)

#########################

Tensor(shape=[784, 10], dtype=float32, place=Place(cpu), stop_gradient=False,
       [[ 0.00685843, -0.04473465,  0.02630691, ...,  0.02114842,
         -0.00420425,  0.05158593],
        [ 0.04170314, -0.00664469,  0.02890819, ...,  0.02144193,
          0.01615764,  0.00732486],
        [ 0.03491056, -0.09118906,  0.04260214, ..., -0.02643691,
         -0.01101459,  0.02013674],
        ...,
        [ 0.02838284, -0.02267420,  0.03526176, ..., -0.02025566,
         -0.03277330,  0.00581156],
        [ 0.03858029, -0.06052501,  0.02197650, ..., -0.00450924,
          0.00246816, -0.02545753],
        [ 0.05698286, -0.04875685,  0.02872605, ..., -0.03730274,
         -0.03659698,  0.03724873]])
Epoch 0 batch 0: loss = 2.5687308311462402

paddle-bot · 2023-03-24T06:44:09Z

Your PR has been submitted. Thanks for your contribution!
Please check its format and content. For this, you can refer to Template and Demo.

CLAassistant · 2023-03-24T06:44:15Z

All committers have signed the CLA.

yangguohao · 2023-03-28T03:09:34Z

I looked into the problems described in the related issue #48234.

Aurelius84 · 2023-03-28T06:23:52Z

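I looked into the problems described in the related issue #48234.

@yangguohao In your example the hook is attached to the parameters, which currently works fine. However, if register_hook is called inside forward and attached to a Tensor that is not a parameter, you can check whether the hook still produces any output after dynamic-to-static conversion.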

Aurelius84 · 2023-03-29T02:20:32Z

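@yangguohao You are very welcome to take an active part in discussing the design for unifying TensorHook across dynamic and static graphs. Any research findings or new ideas can be raised in the RFC.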

2742195759 · 2023-03-30T07:34:20Z

Here are some earlier research results and a sample PR that you can refer to: PaddlePaddle/Paddle#48234
If you have questions, you can ask me under that issue.

paddle-bot bot added contributor status: proposed labels Mar 24, 2023

yangguohao mentioned this pull request Mar 25, 2023

【PaddlePaddle Hackathon 第四期】任务总览 PaddlePaddle/Paddle#51281

Closed

recommit: rfc for TensorHook

2b52620

yangguohao force-pushed the master branch from 47db949 to 2b52620 Compare March 28, 2023 02:38

luotao1 assigned luotao1, Aurelius84 and Ligoml Mar 28, 2023

Aurelius84 approved these changes Mar 29, 2023

Aurelius84 merged commit 9b16388 into PaddlePaddle:master Mar 29, 2023

yangguohao mentioned this pull request Apr 20, 2023

【Hackathon No.91】 PaddlePaddle/Paddle#52948

Merged
