Remove inplace broadcast_add #5551

wyg1997 · 2021-07-21T02:58:30Z

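Because the BroadcastAdd op broadcasts both x and y, the result cannot always be written in place, so the inplace version of BroadcastAdd must not call this op directly.

This PR also makes broadcast_like accept None for the axes parameter.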
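
To make the in-place constraint concrete, here is a minimal sketch in plain Python. It assumes NumPy-style broadcasting rules; `broadcast_shape` and `can_add_inplace` are hypothetical helper names for illustration and are not part of OneFlow. The idea is that `x += y` can only reuse x's buffer when the broadcast result has exactly x's shape, i.e. when only y needs to be expanded.

```python
from itertools import zip_longest


def broadcast_shape(x_shape, y_shape):
    """Return the broadcast of two shapes (NumPy-style rules), or raise ValueError."""
    out = []
    # Align the shapes from the trailing dimension; missing dims count as 1.
    for x_dim, y_dim in zip_longest(reversed(x_shape), reversed(y_shape), fillvalue=1):
        if x_dim != y_dim and 1 not in (x_dim, y_dim):
            raise ValueError(f"cannot broadcast {x_shape} with {y_shape}")
        # A size-1 dimension stretches to match the other operand.
        out.append(y_dim if x_dim == 1 else x_dim)
    return tuple(reversed(out))


def can_add_inplace(x_shape, y_shape):
    """x += y can reuse x's buffer only if the result has exactly x's shape."""
    return broadcast_shape(x_shape, y_shape) == tuple(x_shape)
```

For example, `can_add_inplace((65536, 1024), (1024,))` holds because only y is expanded, while `can_add_inplace((4, 1), (1, 3))` does not, since x itself would have to grow to `(4, 3)`.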

github-actions · 2021-07-21T03:46:15Z

CI failed, removing label automerge

wyg1997 · 2021-08-14T06:48:21Z

This issue was found while running SincNet. We still need to investigate why BroadcastAdd was incorrectly invoked inside linear.

hjchen2 · 2021-08-14T06:49:57Z

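> This issue was found while running SincNet. We still need to investigate why BroadcastAdd was incorrectly invoked inside linear.

That is probably bias add going through the BroadcastAdd logic.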

wyg1997 · 2021-08-14T06:54:11Z

> That is probably bias add going through the BroadcastAdd logic.

Yes, but as I recall, bias add was broadcasting x at the time, and in principle x should not be broadcast here.

hjchen2 · 2021-08-14T06:57:18Z

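> But as I recall, bias add was broadcasting x at the time, and in principle x should not be broadcast here.

That was probably also because broadcast add does not support inplace, so the Python code was switched directly to broadcast + add. Take a look at these two lines of Python: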

oneflow/oneflow/python/nn/modules/math_ops.py

Lines 474 to 475 in a5df297

    y = flow.experimental.broadcast_like(y, x)
    return ElementwiseAdd(inplace=True)(x, y)
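
The broadcast + add decomposition in those two lines can be sketched without OneFlow. The following is only an illustration on nested Python lists, assuming a 2-D x and a 1-D bias; `broadcast_like` and `elementwise_add_` here are hypothetical stand-ins, not the OneFlow functions of similar name. The point is that y is expanded to x's shape first, so the in-place add never has to change x's shape.

```python
def broadcast_like(y, x):
    """Expand a 1-D list y to the 2-D shape of x by repeating it once per row."""
    if any(len(row) != len(y) for row in x):
        raise ValueError("y must match the last dimension of x")
    return [list(y) for _ in x]


def elementwise_add_(x, y):
    """Add y into x in place; both must already have identical shapes."""
    for x_row, y_row in zip(x, y):
        for j, value in enumerate(y_row):
            x_row[j] += value
    return x


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
bias = [10.0, 20.0, 30.0]
# Step 1: broadcast bias to x's shape. Step 2: add in place into x.
result = elementwise_add_(x, broadcast_like(bias, x))
```

After this runs, `result` is the same object as `x`, and `bias` is left unchanged.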

wyg1997 · 2021-08-14T07:00:49Z

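> That was probably also because broadcast add does not support inplace, so the Python code was switched directly to broadcast + add.

Before this PR the unit tests passed, which means a real broadcast add was being performed. It caused a GPU memory leak but did not affect training correctness. That is why this PR splits BroadcastAdd into broadcast + add.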

wyg1997 · 2021-08-14T07:44:48Z

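When BroadcastAdd is used and loss.backward() is not called after the forward pass, the previous backward graph is never released, so repeated calls end in OOM. I suspect a circular reference is created when Broadcast captures tensors in the inplace case.

Here is a minimal reproduction; it reproduces reliably on the commit before this PR: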

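#!/usr/bin/env python
# coding=utf-8

import oneflow as flow

# backward() is never called, so each iteration leaves its backward graph behind.
for i in range(1000):
    a = flow.ones((256*256, 1024), requires_grad=True).to("cuda")  # 256MB as float32
    b = flow.ones((1024,), requires_grad=True).to("cuda")  # 4KB as float32
    a += b  # inplace add that went through BroadcastAdd before this PR
    c = a.sum()
    print(c.numpy())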
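
The suspected failure mode, an object kept alive by a closure that captures it, can be shown in plain Python. This is purely an analogy and not OneFlow's implementation: `Node` and `make_node_with_cycle` are hypothetical names, and the garbage collector is disabled only to mimic pure reference-counted ownership, where such a cycle is never freed.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for a tensor that owns a backward closure."""

    def __init__(self):
        self.backward_fn = None


def make_node_with_cycle():
    node = Node()
    # The closure captures `node`, and `node` owns the closure: a reference cycle.
    node.backward_fn = lambda: node
    return node


gc.disable()  # rely on reference counting only
try:
    node = make_node_with_cycle()
    ref = weakref.ref(node)
    del node
    alive_after_del = ref() is not None  # True: the cycle keeps the node alive
    gc.collect()
    alive_after_gc = ref() is not None   # False: the cycle collector frees it
finally:
    gc.enable()
```

Under reference counting alone, dropping the last external reference is not enough to release the node, which matches the symptom of memory growing on every iteration.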

fix(*): remove inplace broadcast_add

2c8a187

wyg1997 requested a review from BBuf July 21, 2021 02:59

wyg1997 added bug eager labels Jul 21, 2021

wyg1997 mentioned this pull request Jul 21, 2021

Support inplace add #5432

Merged

BBuf approved these changes Jul 21, 2021

View reviewed changes

wyg1997 added the automerge label Jul 21, 2021

wyg1997 requested a review from oneflow-ci-bot July 21, 2021 03:07

github-actions bot removed the automerge label Jul 21, 2021

oneflow-ci-bot removed their request for review July 21, 2021 04:09

fix(BroadcastLike): fix axes bug

a5df297

wyg1997 added the automerge label Jul 21, 2021

wyg1997 assigned oneflow-ci-bot Jul 21, 2021

wyg1997 requested a review from oneflow-ci-bot July 21, 2021 07:17

wyg1997 unassigned oneflow-ci-bot Jul 21, 2021

oneflow-ci-bot merged commit 5af1c83 into master Jul 21, 2021

oneflow-ci-bot deleted the fix-inplace_broadcast_add branch July 21, 2021 08:36

wyg1997 mentioned this pull request Aug 16, 2021

Fix inplace op circle reference bug #5910

Merged
