Description
In the existing WebNN spec, conv2d supports two input operand layouts defined by MLInputOperandLayout and four filter operand layouts defined by MLConv2dFilterOperandLayout.
enum MLInputOperandLayout {
"nchw",
"nhwc"
};
enum MLConv2dFilterOperandLayout {
"oihw",
"hwio",
"ohwi",
"ihwo"
};
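For reference, these layouts are selected per conv2d call through MLConv2dOptions. A minimal sketch follows; the descriptor field names and the filterData buffer are illustrative and may differ slightly between spec revisions:
// NHWC input with an OHWI filter; both layouts are chosen via options.
const input = builder.input('input', { dataType: 'float32', dimensions: [1, 224, 224, 3] });
const filter = builder.constant(
  { dataType: 'float32', dimensions: [32, 3, 3, 3] },  // OHWI: [out, height, width, in]
  filterData);
const output = builder.conv2d(input, filter, { inputLayout: 'nhwc', filterLayout: 'ohwi' });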
This may complicate the implementation, especially if a native ML framework or OS API doesn't support some of these layouts. If a layout is unsupported, the implementation may need to insert transpose operations into the graph around the conv2d operation to convert the unsupported layout to a supported one. This can easily lead to an inefficient graph representation with redundant transpose operations. Alternatively, the implementation may need to optimize the graph with techniques such as "transpose sink", which requires a more complex implementation. This issue was raised in a Chromium CL review.
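To illustrate the problem, here is a sketch of roughly what an implementation whose backend only understands nchw would have to synthesize around an nhwc conv2d (the helper name is made up):
// Hypothetical lowering of an nhwc conv2d onto an nchw-only backend:
// transposes are inserted before and after the operation.
const nhwcToNchw = [0, 3, 1, 2];
const nchwToNhwc = [0, 2, 3, 1];

function conv2dNhwcOnNchwBackend(builder, input, filter, options) {
  const nchwInput = builder.transpose(input, { permutation: nhwcToNchw });
  const nchwOutput = builder.conv2d(nchwInput, filter, { ...options, inputLayout: 'nchw' });
  return builder.transpose(nchwOutput, { permutation: nchwToNhwc });
}

// Chaining two such conv2d ops leaves a back-to-back nchw→nhwc / nhwc→nchw
// transpose pair between them that a naive graph keeps around.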
To simplify the implementation, the proposal is to reduce the number of supported operand layouts, for example by keeping only the default one. Because WebNN supports the transpose operation, layout adaptation and graph-level optimization can be handled by ML frameworks, which usually already support such functionality.
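As a sketch of the framework-side adaptation this implies, a framework holding, say, hwio weights could transpose the filter constant once while constructing the graph and then use only the default layouts (filterData and nchwInput are illustrative names):
// Transpose an HWIO filter constant to OIHW once at graph-construction time,
// then call conv2d with the default nchw/oihw layouts.
const hwioFilter = builder.constant(
  { dataType: 'float32', dimensions: [3, 3, 3, 32] },  // HWIO
  filterData);
const oihwFilter = builder.transpose(hwioFilter, { permutation: [3, 2, 0, 1] });  // HWIO → OIHW
const output = builder.conv2d(nchwInput, oihwFilter);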
Thanks @wacky6 for this idea.
Activity
WebNN: Define XNNPACK Node for conv2d MLOperator
WebNN: Define XNNPACK Node for pooling MLOperators
anssiko commented on Mar 23, 2023
This issue was discussed at the WebML WG Teleconference – 16 March 2023. Summary: Awaits further implementation feedback.
fdwr commented on Mar 23, 2023
Picking just one preferred layout in WebNN could make life easier for the calling framework and the underlying backend implementation, or it could make it harder for both: the backend sees the whole graph at build() time and should be able to see and collapse any such adjacent transposes, and only the backend has enough information to performantly select the right approach. I prefer accepting both (keeping the current spec), but it would be informative to see a holistic table of each major framework's preferred layout and each backend's preferred layout.
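A small sketch of the kind of check a backend could run at build() time to drop adjacent transposes that cancel out (the function names are illustrative):
// Compose two permutations: applying `inner` then `outer` equals the result.
function composePermutations(outer, inner) {
  return outer.map(axis => inner[axis]);
}
function isIdentity(perm) {
  return perm.every((axis, i) => axis === i);
}

const nchwToNhwc = [0, 2, 3, 1];
const nhwcToNchw = [0, 3, 1, 2];
// true → the adjacent transpose pair can be removed from the graph.
console.log(isIdentity(composePermutations(nhwcToNchw, nchwToNhwc)));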
[updated...] Table added (✅ == default):
[Table of preferred layouts per framework/backend; the recoverable entries reference PyTorch's torch.memory_format.channels_last, a data_format parameter defaulting to NHWC, channelsLast, and cuDNN's cudnnSetTensor4dDescriptor.]
anssiko commented on Mar 24, 2023
@fdwr thanks for sharing your preference and the supporting details.
As an aside, I encourage incorporating considerations such as this into the specification informatively, alongside the normative prose. It helps explain the specification to people who look at it without the full context active WG participants have.
wacky6 commented on Apr 11, 2023
Layout support comes up in the MLOperand implementation that allows data shape broadcasting. https://chromium-review.googlesource.com/c/chromium/src/+/4396686/comment/f02acaeb_3c2795f2/
Supporting both channel-first and channel-last layouts will complicate the spec steps and the implementation, because the current numpy broadcast rule broadcasts starting from the right-most dimension.
Example: caller wants to apply a per-channel multiplication.
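The original cases are elided here, but as a rough illustration of the broadcasting concern: a per-channel scale of C elements lines up directly with an NHWC tensor under the right-most-first rule, while the NCHW case needs an explicit reshape (C, scaleData, nhwcOutput and nchwOutput are illustrative names):
const scale = builder.constant({ dataType: 'float32', dimensions: [C] }, scaleData);

// NHWC: [N, H, W, C] * [C] broadcasts directly.
const scaledNhwc = builder.mul(nhwcOutput, scale);

// NCHW: [N, C, H, W] * [C] does not line up; reshape the scale to [C, 1, 1] first.
const scaledNchw = builder.mul(nchwOutput, builder.reshape(scale, [C, 1, 1]));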
How to support case 1 isn't clear. Some questions might help the decision:
I have a slight preference for supporting only one layout (NHWC to be precise).
Layout conversion can happen at build() time (a one-off cost), and a very small overhead incurs at compute() (converting to the right layout before passing data to the backend, and from the result back to what's defined in the spec; probably negligible?).
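As a rough illustration of that per-compute() cost, this is the kind of repacking a shim would do on the input buffer (the helper name is made up):
// Repack an NHWC Float32Array into NCHW before handing it to an nchw backend.
function nhwcToNchwBuffer(src, [n, h, w, c]) {
  const dst = new Float32Array(src.length);
  for (let i = 0; i < n; ++i)
    for (let j = 0; j < h; ++j)
      for (let k = 0; k < w; ++k)
        for (let l = 0; l < c; ++l)
          dst[((i * c + l) * h + j) * w + k] = src[((i * h + j) * w + k) * c + l];
  return dst;
}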
Support WebNN EP (#15698)
wacky6 commented on Jul 20, 2023
I want to share a data point.
I was playing with Real-ESRGAN today, and found out that torch.compile with the channel_last layout is faster than torch.compile with the channel_first layout on my NVIDIA A4000. I'm not sure how well this transfers to other models (ESRGAN is heavily based on CNN + residual connections), though.
I wonder if we should benchmark channel ordering on different hardware (i.e. a vendor other than NVIDIA could optimize for channel_first).
Or maybe this won't matter if the graph builder (or rather the optimizer) is "clever" enough.
huningxin commented on Aug 15, 2023
There is a security perspective from @quidity (thanks Alex!) in the review of Chromium CL-4653303: WebNN: Define conv2d operator in mojo.
Alex mentioned:
wacky6 commented on Aug 15, 2023
FWIW, another way to tackle layout is to tell the implementation which layout should be used, like: https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html
This could be a hint to GraphBuilder.build() (right before producing a graph that can be passed to compute()).
--
Taking a step back, I still strongly prefer a single unified layout (i.e. NHWC) that's applied throughout the MLGraphBuilder methods, letting the backend (e.g. DMLImpl) change the layout (if necessary) before sending it to the hardware.
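Purely as a hypothetical sketch (WebNN defines no such option today), a build-time layout hint along the lines of PyTorch's memory_format might look like this, leaving the backend free to repack internally:
// 'preferredLayout' is NOT part of the WebNN spec; it only illustrates the idea.
const graph = await builder.build({ output }, { preferredLayout: 'nhwc' });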