【Hackathon No. 91】add: rfcs for tensorhook #477

Merged
merged 1 commit into from
Mar 29, 2023


yangguohao
Contributor

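This PR does not yet contain a mature design; I would like to use it to discuss the problem. I am somewhat confused by the current dynamic-to-static mode.
Following the example below from the official website, I registered a hook inside the Linear layer. After converting the Linear layer to a static graph, the registered TensorHook still works.
Does this count as running in static graph mode?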

import numpy as np
import paddle
import paddle.nn as nn
import paddle.optimizer as opt

BATCH_SIZE = 16
BATCH_NUM = 4
EPOCH_NUM = 4

IMAGE_SIZE = 784
CLASS_NUM = 10

# define a random dataset
class RandomDataset(paddle.io.Dataset):
  def __init__(self, num_samples):
      self.num_samples = num_samples

  def __getitem__(self, idx):
      image = np.random.random([IMAGE_SIZE]).astype('float32')
      label = np.random.randint(0, CLASS_NUM, (1,)).astype('int64')
      return image, label

  def __len__(self):
      return self.num_samples

def hook(grad):
  print(grad)

class LinearNet(nn.Layer):
  def __init__(self):
      super().__init__()
      self._linear = nn.Linear(IMAGE_SIZE, CLASS_NUM)
      # attach a tensor hook to the first parameter (the weight)
      self._linear.parameters()[0].register_hook(hook)

  def forward(self, x):
      return self._linear(x)

def train(layer, loader, loss_fn, opt):

  for epoch_id in range(EPOCH_NUM):
      for batch_id, (image, label) in enumerate(loader()):
          out = layer(image)
          loss = loss_fn(out, label)
          loss.backward()
          opt.step()
          opt.clear_grad()
          print("Epoch {} batch {}: loss = {}".format(
              epoch_id, batch_id, np.mean(loss.numpy())))

# create network
layer = LinearNet()
layer = paddle.jit.to_static(layer)  # <---- convert dynamic to static in one step by calling paddle.jit.to_static(layer)
loss_fn = nn.CrossEntropyLoss()
adam = opt.Adam(learning_rate=0.001, parameters=layer.parameters())

# create data loader
dataset = RandomDataset(BATCH_NUM * BATCH_SIZE)
loader = paddle.io.DataLoader(dataset,
                            batch_size=BATCH_SIZE,
                            shuffle=True,
                            drop_last=True,
                            num_workers=2)

# train
train(layer, loader, loss_fn, adam)

#########################

Tensor(shape=[784, 10], dtype=float32, place=Place(cpu), stop_gradient=False,
       [[ 0.00685843, -0.04473465,  0.02630691, ...,  0.02114842,
         -0.00420425,  0.05158593],
        [ 0.04170314, -0.00664469,  0.02890819, ...,  0.02144193,
          0.01615764,  0.00732486],
        [ 0.03491056, -0.09118906,  0.04260214, ..., -0.02643691,
         -0.01101459,  0.02013674],
        ...,
        [ 0.02838284, -0.02267420,  0.03526176, ..., -0.02025566,
         -0.03277330,  0.00581156],
        [ 0.03858029, -0.06052501,  0.02197650, ..., -0.00450924,
          0.00246816, -0.02545753],
        [ 0.05698286, -0.04875685,  0.02872605, ..., -0.03730274,
         -0.03659698,  0.03724873]])
Epoch 0 batch 0: loss = 2.5687308311462402

@paddle-bot

paddle-bot bot commented Mar 24, 2023

Your PR has been submitted. Thanks for your contribution!
Please check its format and content. For this, you can refer to Template and Demo.

@CLAassistant

CLAassistant commented Mar 24, 2023

CLA assistant check
All committers have signed the CLA.

@yangguohao
Contributor Author

I looked into the problems described in the related issue #48234.

@Aurelius84
Collaborator

Aurelius84 commented Mar 28, 2023

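> I looked into the problems described in the related issue #48234.

@yangguohao In your example the hook is attached to the parameters, which currently works without problems. However, if the register_hook call is placed inside forward and attached to a Tensor that is not a parameter, you can check whether the hook still produces any output after dynamic-to-static conversion.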

@Aurelius84
Collaborator

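@yangguohao You are very welcome to take an active part in discussing a design for unifying TensorHook across dynamic and static graphs. Feel free to raise any research findings or new ideas in the RFC.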

@Aurelius84 Aurelius84 merged commit 9b16388 into PaddlePaddle:master Mar 29, 2023
@2742195759

Here are some earlier research results and a sample PR that you can use as a reference: PaddlePaddle/Paddle#48234
If you have questions, you can ask me under that issue.
