
[OP] broadcast plus, minus, mul, ... #2039

Closed
wants to merge 8 commits into from

Conversation

@mli (Member) commented May 5, 2016

Add a new op _broadcast_plus; both forward and backward are tested.

Before extending this to the other elementwise binary operators, I'd like to open the PR first, because the implementation turned out unexpectedly complicated and I may have gotten something wrong.

}

template<typename xpu>
void PlusBroadcastBackward_(const OutputGrad& out_grad,
Contributor:

Can we reuse the backward code for different operations?

mli (Member Author):

Only plus and minus can reuse the same code.
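
As a rough illustration of why only plus and minus can share a backward (a NumPy sketch of the idea, not this PR's mshadow code): both input gradients are just the output gradient sum-reduced over the broadcast axes, with a sign flip for the minus right-hand side, whereas mul and div also need the other operand.

import numpy as np

def reduce_to_shape(grad, shape):
    # Sum over the axes that were broadcast (size 1 in `shape`), assuming both
    # operands have the same number of dimensions, as in the (10, 20) + (10, 1)
    # examples in this thread.
    axes = tuple(i for i, s in enumerate(shape) if s == 1 and grad.shape[i] != 1)
    return grad.sum(axis=axes, keepdims=True)

def broadcast_plus_backward(out_grad, lhs_shape, rhs_shape):
    return reduce_to_shape(out_grad, lhs_shape), reduce_to_shape(out_grad, rhs_shape)

def broadcast_minus_backward(out_grad, lhs_shape, rhs_shape):
    return reduce_to_shape(out_grad, lhs_shape), -reduce_to_shape(out_grad, rhs_shape)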

@pluskid (Contributor) commented May 6, 2016

Very nice! I would suggest an addition to our shape inference mechanism here: we should allow the user to attach shape information to a Variable, and that information can then be used during shape inference. This is especially useful once broadcasting is supported. Sometimes I have a parameter that should be broadcast across the samples in a mini-batch; if I use broadcast plus to construct a symbol, shape inference will not be able to figure out the shape of that parameter purely from the data shape.

@mli (Member Author) commented May 6, 2016

@pluskid I didn't get your question. Currently the shape inference is done by BinaryBroadcastShape_, which determines the broadcast dimensions. One can also always use symbol.Reshape to make the broadcast behave correctly.
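
For instance, a sketch of that Reshape workaround (the exact Reshape keyword has changed across MXNet versions, so treat the signature below as an assumption):

import mxnet as mx

a = mx.sym.Variable('a')                              # data, e.g. shape (10, 20)
b = mx.sym.Variable('b')                              # parameter to be broadcast
b_col = mx.sym.Reshape(data=b, target_shape=(10, 1))  # fix the broadcast dimensions explicitly
c = mx.sym.BroadcastPlus(a, b_col)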

@mli changed the title from "[OP] let symbol plus support broadcast" to "[OP] broadcast plus, minus, mul, ..." on May 6, 2016
@pluskid (Contributor) commented May 6, 2016

@mli For example, this currently works:

import mxnet as mx

a = mx.sym.Variable('a')
b = mx.sym.Variable('b')
c = a+b
c.infer_shape(a=(10,20))

But if we replace + with broadcasting plus, will it still work? How would it infer the shape of b purely from a when broadcasting? If you need a real scenario, there is the full LSTM implementation for the acoustic model, where a broadcasting elementwise multiplication is needed.

@mli (Member Author) commented May 6, 2016

The only way that works is:

>>> d = mx.sym.BroadcastPlus(a,b)
>>> d.infer_shape(a=(10,20), b=(10,1))

So your use case is that the shape of b is not known in advance?

@pluskid (Contributor) commented May 6, 2016

@mli In my use case, b is a parameter, while a is the data (or some hidden layer computed from the data). Our model/module code is all built on the assumption that shape information comes only from provide_data and provide_label of a DataIter; everything else should be shape-inferable. This has worked very well so far, except for this case. I think allowing the user to attach shape information to a Variable sounds like a simple solution:

a = mx.sym.Variable('a')
b = mx.sym.Variable('b', shape=(1, 10))
c = mx.sym.BroadcastPlus(a,b)

c.simple_bind(a=(5, 10))

@mli (Member Author) commented May 6, 2016

Understood now. I agree that adding a shape attribute is cleaner than the alternative of passing additional shape info to the data iterator.

@piiswrong (Contributor):

I guess: don't broadcast unless you have to, and Chiyuan's example still works.

@pluskid (Contributor) commented May 6, 2016

@piiswrong My problem is that I know the expected behavior is for the parameter to be shared. For example, if I wanted to implement a fully connected layer with low-level arithmetic, I could write out = matmul(X, W) + b; here b should not be of shape N-by-d, instead it should be broadcast and have shape 1-by-d or just (d,).
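
In code the situation looks roughly like this (a sketch using the BroadcastPlus name from this PR, with mx.sym.dot standing in for matmul):

import mxnet as mx

X = mx.sym.Variable('X')    # data, shape (N, k)
W = mx.sym.Variable('W')    # weight, shape (k, d)
b = mx.sym.Variable('b')    # bias, intended to be (1, d) and broadcast over N
out = mx.sym.BroadcastPlus(mx.sym.dot(X, W), b)

# Without a shape hint on b, infer_shape cannot choose between (1, d), (N, d),
# (N, 1), or (1, 1) from the data shape alone.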

}


template<typename xpu, typename LHS_OP, typename RHS_OP>
Contributor:

If you feed both the left- and right-hand inputs to both LHS_OP and RHS_OP, the mul backward can be merged too, right? This shouldn't increase time or memory.
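
In NumPy terms (a sketch of the idea, not this PR's code), mul does fit the same template once backward sees both operands:

import numpy as np

def reduce_to_shape(grad, shape):
    # Same helper as in the earlier sketch: sum over the broadcast axes.
    axes = tuple(i for i, s in enumerate(shape) if s == 1 and grad.shape[i] != 1)
    return grad.sum(axis=axes, keepdims=True)

def broadcast_mul_backward(out_grad, lhs, rhs):
    grad_lhs = reduce_to_shape(out_grad * rhs, lhs.shape)  # d(l*r)/dl = r
    grad_rhs = reduce_to_shape(out_grad * lhs, rhs.shape)  # d(l*r)/dr = l
    return grad_lhs, grad_rhs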

@tqchen mentioned this pull request May 7, 2016
@tqchen (Member) commented May 7, 2016

Please check why the Python test fails. Note that if broadcast does not support inplace, the inplace declaration should be disabled.

.describe("lhs minus rhs with broadcast");

MXNET_REGISTER_SIMPLE_OP(_broadcast_mul, XPU)
.set_symbol_op_name("BroadcastMul")
Member:

For new operators, let us keep the names consistent, in lower case, so that mx.sym and mx.nd have exactly the same function.
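
For example, with a lowercase name the two frontends line up (the names below only illustrate the suggested convention; this PR currently registers the symbol as BroadcastMul):

import mxnet as mx
import numpy as np

a = mx.nd.array(np.ones((2, 3)))
b = mx.nd.array(np.arange(3).reshape((1, 3)))
c = mx.nd.broadcast_mul(a, b)     # NDArray frontend

x = mx.sym.Variable('x')
y = mx.sym.Variable('y')
z = mx.sym.broadcast_mul(x, y)    # Symbol frontend, same function name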

@piiswrong (Contributor):

@mli Any updates on this?

@pluskid mentioned this pull request May 10, 2016
@sxjscience (Member):

Really looking forward to some updates.

@piiswrong (Contributor):

@mli is probably busy lately. Can someone take over this?

@piiswrong (Contributor):

Closing this since we are moving to a new one.

@piiswrong closed this May 17, 2016