
[PatternLang] The pattern failed to match some subgraphs in a model #5928

Closed
comaniac opened this issue Jun 25, 2020 · 9 comments · Fixed by #5930

Comments

@comaniac
Contributor

comaniac commented Jun 25, 2020

As pointed out by @trevor-m, the following script uses a `batch_norm -> get(0)` pattern to match MobileNet V2, but some subgraphs failed to be matched and partitioned.

import tvm
from tvm import relay
from tvm.relay.dataflow_pattern import *
import gluoncv as gcv  # requires: pip install gluoncv
from tvm.relay.build_module import bind_params_by_name

def get_gcv_model(model_name):
    model_name = model_name.lower()
    shape = (1, 3, 224, 224)
    net = gcv.model_zoo.get_model(model_name, pretrained=True)
    ret = relay.frontend.from_mxnet(net, shape={'data': shape})
    return ret[0], ret[1], ('data', shape)

mod, params, data_shape = get_gcv_model('mobilenetv2_1.0')
mod["main"] = bind_params_by_name(mod["main"], params)


bn_out = is_op('nn.batch_norm')(wildcard(), wildcard(), wildcard(), wildcard(), wildcard())
pat = is_tuple_get_item(bn_out, 0)
print(pat.partition(mod['main']))

Here is a log snippet:

  %21 = fn (%FunctionVar_52_0, %FunctionVar_52_1, %FunctionVar_52_2, %FunctionVar_52_3, %FunctionVar_52_4, PartitionedFromPattern="nn.batch_norm_TupleGetItem0_") {
    %20 = nn.batch_norm(%FunctionVar_52_0, %FunctionVar_52_1, %FunctionVar_52_2, %FunctionVar_52_3, %FunctionVar_52_4);
    %20.0
  };
%22 = %21(%19, meta[relay.Constant][21] /* ty=Tensor[(96), float32] */ /* ty=Tensor[(96), float32] */, meta[relay.Constant][22] /* ty=Tensor[(96), float32] */ /* ty=Tensor[(96), float32] */, meta[relay.Constant][23] /* ty=Tensor[(96), float32] */ /* ty=Tensor[(96), float32] */, meta[relay.Constant][24] /* ty=Tensor[(96), float32] */ /* ty=Tensor[(96), float32] */);
  %23 = clip(%22, a_min=0f, a_max=6f);
  %24 = nn.conv2d(%23, meta[relay.Constant][25] /* ty=Tensor[(24, 96, 1, 1), float32] */ /* ty=Tensor[(24, 96, 1, 1), float32] */, padding=[0, 0, 0, 0], channels=24, kernel_size=[1, 1]);
  %25 = nn.batch_norm(%24, meta[relay.Constant][26] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][27] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][28] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][29] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */);
  %26 = %25.0;
  %27 = nn.conv2d(%26, meta[relay.Constant][30] /* ty=Tensor[(144, 24, 1, 1), float32] */ /* ty=Tensor[(144, 24, 1, 1), float32] */, padding=[0, 0, 0, 0], channels=144, kernel_size=[1, 1]);
  %29 = fn (%FunctionVar_50_0, %FunctionVar_50_1, %FunctionVar_50_2, %FunctionVar_50_3, %FunctionVar_50_4, PartitionedFromPattern="nn.batch_norm_TupleGetItem0_") {
    %28 = nn.batch_norm(%FunctionVar_50_0, %FunctionVar_50_1, %FunctionVar_50_2, %FunctionVar_50_3, %FunctionVar_50_4);
    %28.0
  };

As can be seen, batch_norm %20 and %28 were successfully matched and partitioned, but %25 wasn't. It seems to me that they are all the same, though.

@mbrookhart could you help take a look? Thanks

@mbrookhart
Contributor

I'll try to reproduce, thanks for filing the issue!

@mbrookhart
Contributor

This isn't making much sense to me; it looks like a few of the batch_norm call nodes are duplicated, i.e., they are exactly the same object. The partition pass refuses to fuse the same op into two different graphs, because that would be bad.
But I don't see an obvious duplication in the Relay IR. Digging more.
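The rejection described above can be illustrated with a toy dataflow graph: a node with more than one consumer cannot be fused into a single partition without orphaning the other consumer. This is a minimal plain-Python sketch, not TVM's actual partitioning logic; all names here are hypothetical.

```python
# Toy illustration: find nodes shared by multiple consumers, which a
# partitioner must refuse to fuse into any one partition.
from collections import defaultdict

def consumer_counts(edges):
    """edges: list of (producer, consumer) pairs in a dataflow graph."""
    counts = defaultdict(int)
    for producer, _consumer in edges:
        counts[producer] += 1
    return counts

# %20 feeds both %21 (its own TupleGetItem) and %33 (a later one),
# so fusing %20 into one partition would strand the other consumer.
edges = [("%20", "%21"), ("%20", "%33"), ("%31", "%32")]
counts = consumer_counts(edges)
shared = {node for node, n in counts.items() if n > 1}
print(shared)  # {'%20'}
```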

@comaniac
Contributor Author

Yeah, that shouldn't be the case for a MobileNet model. MobileNet should be a single dataflow pipeline without branches or residual connections.

@mbrookhart
Contributor

Found it:

The original model has this:

  %20 = nn.batch_norm(%19, meta[relay.Constant][26] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][27] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][28] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][29] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */) /* ty=(Tensor[(1, 24, 56, 56), float32], Tensor[(24), float32], Tensor[(24), float32]) */;
  %21 = %20.0;

and later:

  %31 = nn.batch_norm(%30, meta[relay.Constant][41] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][42] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][43] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][44] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */) /* ty=(Tensor[(1, 24, 56, 56), float32], Tensor[(24), float32], Tensor[(24), float32]) */;
  %32 = %31.0;
  %33 = %20.0;
  %34 = add(%32, %33) /* ty=Tensor[(1, 24, 56, 56), float32] */;

@mbrookhart
Contributor

It should probably just be:

%34 = add(%32, %21)

to get the behavior you want. So... Relay-level CSE? I'm going to double-check the partition rejection logic; I'm not sure it 100% works right now. I think I might be partitioning one of those when I shouldn't.

@comaniac
Contributor Author

Ah, apparently I was wrong: MobileNet does have residual connections. I see. So %20 and %33 are partitioned into a function, and in that case the "other" %20 is rejected from being partitioned again with %21. I need to think about the proper semantics for this case. Having Relay-level CSE would definitely solve this issue, but I'm not sure if it's overkill.

@comaniac
Contributor Author

I guess an ideal solution would be to figure out that those two matches are duplicates and reuse the first partitioned function for the second match? But that would look like an ad hoc solution targeting Relay graphs with duplicated nodes...

@mbrookhart
Contributor

Yes. I think the right solution is CSE, but that is a lot of work.
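The idea behind CSE here can be shown on a toy expression DAG: hash each node by its op and the (already-deduplicated) ids of its operands, so structurally identical subexpressions collapse to one node. This is a minimal sketch under those assumptions, not TVM's `EliminateCommonSubexpr` implementation; the `cse` helper and its tuple encoding are hypothetical.

```python
# Minimal common-subexpression-elimination sketch over a toy DAG.
def cse(exprs):
    """exprs: list of (op, *operand_indices) tuples in topological order.
    Returns a deduplicated list and a remapping old_index -> new_index."""
    seen = {}   # canonical key -> new index
    remap = {}  # old index -> new index
    out = []
    for i, (op, *args) in enumerate(exprs):
        # Canonicalize operands through the remap so chains of
        # duplicates collapse transitively.
        key = (op, tuple(remap[a] for a in args))
        if key not in seen:
            seen[key] = len(out)
            out.append((op,) + key[1])
        remap[i] = seen[key]
    return out, remap

# Two structurally identical batch_norm/get(0) chains collapse to one,
# mirroring how %33 = %20.0 would be folded into the earlier %21.
exprs = [
    ("input",),         # 0
    ("batch_norm", 0),  # 1
    ("get0", 1),        # 2  (like %21)
    ("batch_norm", 0),  # 3  duplicate of 1
    ("get0", 3),        # 4  (like %33), duplicate of 2
]
deduped, remap = cse(exprs)
print(len(deduped))          # 3
print(remap[4] == remap[2])  # True
```

With the duplicates merged, the pattern partitioner sees only one `%20.0` consumer chain, which is exactly what `relay.transform.EliminateCommonSubexpr()` achieves in the fixed script below.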

@mbrookhart
Contributor

mbrookhart commented Jun 25, 2020

Those two PRs plus this change to the script fix the issue for me:

import re
import tvm
from tvm import relay
from tvm.relay.dataflow_pattern import *
from tvm.relay.build_module import bind_params_by_name

def get_gcv_model(model_name):
    """Pull a Gluon CV model."""
    import gluoncv as gcv  # requires: pip install gluoncv

    model_name = model_name.lower()

    print('Pulling the model from Gluon CV model zoo...')
    shape = (1, 3, 224, 224)
    if model_name.find('inception') != -1: 
        shape = (1, 3, 299, 299)
    elif model_name.find('yolo3') != -1: 
        shape = (1, 3, 320, 320)
    elif model_name.startswith('ssd'):
        tokens = re.search(r'ssd_(\d+)_', model_name)
        size = int(tokens.group(1))
        shape = (1, 3, size, size)
    net = gcv.model_zoo.get_model(model_name, pretrained=True)
    ret = relay.frontend.from_mxnet(net, shape={'data': shape})
    return ret[0], ret[1], ('data', shape)

mod, params, data_shape = get_gcv_model('mobilenetv2_1.0')
mod["main"] = bind_params_by_name(mod["main"], params)
mod = relay.transform.EliminateCommonSubexpr()(mod)

bn_out = is_op('nn.batch_norm')(wildcard(), wildcard(), wildcard(), wildcard(), wildcard())
pat = is_tuple_get_item(bn_out, 0)
print(pat.partition(mod['main']))                    
