
[TOPI][ARM] Improve injective schedule #2801

Merged: 1 commit into apache:master, Mar 18, 2019

Conversation

hlu1 (Contributor) commented Mar 13, 2019

The generic injective schedule does no vectorization and is therefore slow on ARM CPUs. With vectorization, it can run 2-3x faster. For example, for an upsample_relu layer of shape 48 x 48 x 48 (C, H, W), the vectorized code runs at 0.003 ms/iter, compared to 0.008 ms/iter without, on a Raspberry Pi.

hlu1 (Contributor, Author) commented Mar 13, 2019

@ajtulloch please review :)

FrozenGene (Member) commented Mar 13, 2019

Yeah, this is very useful, like the layout_transform op.

@hlu1 How about (io, ii) = s[x].split(list(s[x].op.axis)[-1], 8) when the extent of s[x].op.axis[-1] is < 8, and what about weaker ARM CPUs like the Cortex-A9 (maybe 4 is better)? I have done it like this:

    if len(s[x].op.axis) >= 5:  # very useful when we have NCHWxC layouts
        fused = s[x].fuse(s[x].op.axis[0], s[x].op.axis[1], s[x].op.axis[2])
        s[x].parallel(fused)
    elif len(s[x].op.axis) >= 3:
        fused = s[x].fuse(s[x].op.axis[0], s[x].op.axis[1])
        s[x].parallel(fused)
    else:
        s[x].parallel(s[x].op.axis[0])
    s[x].vectorize(list(s[x].op.axis)[-1])
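The trade-off between split factors can be illustrated with a small stand-alone sketch. Note that split here is a hypothetical helper emulating the semantics of TVM's axis split, not the actual TVM API:

```python
def split(extent, factor):
    """Emulate an axis split: return (outer, inner) loop extents.

    Illustrative helper only, not the actual TVM API.
    """
    outer = -(-extent // factor)  # ceiling division
    return outer, factor

# Splitting a width-48 axis by 8 yields 6 outer iterations of 8 lanes,
# a good fit for 128-bit NEON registers.
assert split(48, 8) == (6, 8)

# On a weaker core such as the Cortex-A9, a factor of 4 gives more,
# narrower iterations, which may vectorize better there.
assert split(48, 4) == (12, 4)
```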

hlu1 (Contributor, Author) commented Mar 13, 2019

We also need to consider the int8/uint8 case. For example, when you add two int8 numbers, the result is an int16, so the SIMD width is 128 / 16 = 8 lanes. I think 8 should be a good compromise in general.
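The arithmetic behind choosing 8 can be sketched as follows, assuming 128-bit NEON registers (vector_lanes is a hypothetical helper for illustration, not part of TVM):

```python
def vector_lanes(elem_bits, simd_bits=128):
    # A NEON register is 128 bits wide; the lane count is determined
    # by the widest element type appearing in the computation.
    return simd_bits // elem_bits

# Adding two int8 values widens to int16, so only 128 / 16 = 8 lanes fit.
assert vector_lanes(16) == 8

# A computation staying in int8 could use 16 lanes; float32 allows 4.
assert vector_lanes(8) == 16
assert vector_lanes(32) == 4
```

A fixed factor of 8 therefore never exceeds the lane count for any of these element widths, which is why it is a reasonable compromise.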

FrozenGene (Member) commented:

OK. Could you add a len >= 5 case like we do for x86? https://github.com/dmlc/tvm/blob/master/topi/python/topi/x86/injective.py#L26. This would help with NCHWxC layout transforms.

hlu1 (Contributor, Author) commented Mar 13, 2019

That should have been covered by:

if len(s[x].op.axis) >= 3:
    fused = s[x].fuse(s[x].op.axis[0], s[x].op.axis[1], s[x].op.axis[2])
    s[x].parallel(fused)

I used len(s[x].op.axis) >= 3 because it's needed for special cases like shape [1, 1, 224, 224].
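Why fusing a third axis matters for shapes like [1, 1, 224, 224] can be shown with a small stand-alone calculation (fused_parallel_extent is a hypothetical helper for illustration, not part of TVM):

```python
from functools import reduce
from operator import mul


def fused_parallel_extent(shape, nfuse):
    # Product of the first `nfuse` axis extents, i.e. the iteration
    # count of the fused parallel loop (illustrative, not the TVM API).
    nfuse = min(nfuse, len(shape))
    return reduce(mul, shape[:nfuse], 1)


# For [1, 1, 224, 224], fusing only two leading axes leaves a parallel
# extent of 1 (no parallelism), while fusing three exposes 224 chunks.
assert fused_parallel_extent([1, 1, 224, 224], 2) == 1
assert fused_parallel_extent([1, 1, 224, 224], 3) == 224
```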

FrozenGene (Member) commented Mar 13, 2019

Oops, I hadn't noticed that.

FrozenGene (Member) left a review comment:

LGTM.

hlu1 force-pushed the injective branch 4 times, most recently from d191369 to a6e3656, March 15, 2019 00:57
merrymercy merged commit 5a8ab8f into apache:master on Mar 18, 2019
wweic pushed a commit to wweic/tvm that referenced this pull request Mar 20, 2019
wweic pushed a commit to neo-ai/tvm that referenced this pull request Mar 20, 2019
@hlu1 hlu1 deleted the injective branch April 17, 2019 06:51
FrozenGene (Member) commented:

@hlu1 Could you take a look at this thread on the discuss forum? https://discuss.tvm.ai/t/relay-build-target-rasp3b-something-wrong/2195 The issue appears to be related to this changeset.

hlu1 (Contributor, Author) commented Apr 21, 2019

@FrozenGene, thanks for letting me know. Fixed in #3061

3 participants