
[PatternLang] The pattern failed to match some subgraphs in a model #5928

Closed
comaniac opened this issue Jun 25, 2020 · 9 comments · Fixed by #5930

Comments

@comaniac
Contributor

comaniac commented Jun 25, 2020

As pointed out by @trevor-m, the following script uses a `batch_norm -> get(0)` pattern to match MobileNet V2, but some subgraphs failed to be matched and partitioned.

import tvm
from tvm import relay
from tvm.relay.dataflow_pattern import *
import gluoncv as gcv  # requires: pip install gluoncv
from tvm.relay.build_module import bind_params_by_name

def get_gcv_model(model_name):
    model_name = model_name.lower()
    shape = (1, 3, 224, 224)
    net = gcv.model_zoo.get_model(model_name, pretrained=True)
    ret = relay.frontend.from_mxnet(net, shape={'data': shape})
    return ret[0], ret[1], ('data', shape)

mod, params, data_shape = get_gcv_model('mobilenetv2_1.0')
mod["main"] = bind_params_by_name(mod["main"], params)


bn_out = is_op('nn.batch_norm')(wildcard(), wildcard(), wildcard(), wildcard(), wildcard())
pat = is_tuple_get_item(bn_out, 0)
print(pat.partition(mod['main']))

Here is a log snippet:

  %21 = fn (%FunctionVar_52_0, %FunctionVar_52_1, %FunctionVar_52_2, %FunctionVar_52_3, %FunctionVar_52_4, PartitionedFromPattern="nn.batch_norm_TupleGetItem0_") {
    %20 = nn.batch_norm(%FunctionVar_52_0, %FunctionVar_52_1, %FunctionVar_52_2, %FunctionVar_52_3, %FunctionVar_52_4);
    %20.0
  };
%22 = %21(%19, meta[relay.Constant][21] /* ty=Tensor[(96), float32] */ /* ty=Tensor[(96), float32] */, meta[relay.Constant][22] /* ty=Tensor[(96), float32] */ /* ty=Tensor[(96), float32] */, meta[relay.Constant][23] /* ty=Tensor[(96), float32] */ /* ty=Tensor[(96), float32] */, meta[relay.Constant][24] /* ty=Tensor[(96), float32] */ /* ty=Tensor[(96), float32] */);
  %23 = clip(%22, a_min=0f, a_max=6f);
  %24 = nn.conv2d(%23, meta[relay.Constant][25] /* ty=Tensor[(24, 96, 1, 1), float32] */ /* ty=Tensor[(24, 96, 1, 1), float32] */, padding=[0, 0, 0, 0], channels=24, kernel_size=[1, 1]);
  %25 = nn.batch_norm(%24, meta[relay.Constant][26] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][27] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][28] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][29] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */);
  %26 = %25.0;
  %27 = nn.conv2d(%26, meta[relay.Constant][30] /* ty=Tensor[(144, 24, 1, 1), float32] */ /* ty=Tensor[(144, 24, 1, 1), float32] */, padding=[0, 0, 0, 0], channels=144, kernel_size=[1, 1]);
  %29 = fn (%FunctionVar_50_0, %FunctionVar_50_1, %FunctionVar_50_2, %FunctionVar_50_3, %FunctionVar_50_4, PartitionedFromPattern="nn.batch_norm_TupleGetItem0_") {
    %28 = nn.batch_norm(%FunctionVar_50_0, %FunctionVar_50_1, %FunctionVar_50_2, %FunctionVar_50_3, %FunctionVar_50_4);
    %28.0
  };

As can be seen, batch_norm %20 and %28 were successfully matched and partitioned, but %25 wasn't. It seems to me that they are all the same, though.

@mbrookhart could you help take a look? Thanks

@mbrookhart
Contributor

I'll try to reproduce, thanks for filing the issue!

@mbrookhart
Contributor

This isn't making much sense to me; it looks like a few of the batch_norm call nodes are duplicated, i.e., they are exactly the same object. The partition pass refuses to fuse the same op into two different graphs, because that would be bad.
But I don't see an obvious duplication in the Relay IR. Digging more.
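The rejection described above can be illustrated with a toy dataflow graph: a node with more than one consumer cannot be fused into a single partition without orphaning the other consumer. This is a minimal plain-Python sketch, not TVM's actual partitioning logic; all names here are hypothetical.

```python
# Toy illustration: find nodes shared by multiple consumers, which a
# partitioner must refuse to fuse into any one partition.
from collections import defaultdict

def consumer_counts(edges):
    """edges: list of (producer, consumer) pairs in a dataflow graph."""
    counts = defaultdict(int)
    for producer, _consumer in edges:
        counts[producer] += 1
    return counts

# %20 feeds both %21 (its own TupleGetItem) and %33 (a later one),
# so fusing %20 into one partition would strand the other consumer.
edges = [("%20", "%21"), ("%20", "%33"), ("%31", "%32")]
counts = consumer_counts(edges)
shared = {node for node, n in counts.items() if n > 1}
print(shared)  # {'%20'}
```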

@comaniac
Contributor Author

Yeah, that shouldn't be the case for a MobileNet model. MobileNet should be a single dataflow pipeline without branches or residual connections.

@mbrookhart
Contributor

Found it:

The original model has this:

  %20 = nn.batch_norm(%19, meta[relay.Constant][26] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][27] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][28] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][29] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */) /* ty=(Tensor[(1, 24, 56, 56), float32], Tensor[(24), float32], Tensor[(24), float32]) */;
  %21 = %20.0;

and later:

  %31 = nn.batch_norm(%30, meta[relay.Constant][41] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][42] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][43] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */, meta[relay.Constant][44] /* ty=Tensor[(24), float32] */ /* ty=Tensor[(24), float32] */) /* ty=(Tensor[(1, 24, 56, 56), float32], Tensor[(24), float32], Tensor[(24), float32]) */;
  %32 = %31.0;
  %33 = %20.0;
  %34 = add(%32, %33) /* ty=Tensor[(1, 24, 56, 56), float32] */;

@mbrookhart
Contributor

It should probably just be:

%34 = add(%32, %21)

to get the behavior you want. So... Relay-level CSE? I'm going to double-check the partition rejection logic; I'm not sure it 100% works right now. I think I might be partitioning one of those when I shouldn't.

@comaniac
Contributor Author

Ah, apparently I was wrong: MobileNet does have residual connections. I see. So %20 and %33 are partitioned into a function, and in that case the "other" %20 is rejected from being partitioned again with %21. I need to think about the proper semantics for this case. Having Relay-level CSE would definitely solve this issue, but I'm not sure if it's overkill.

@comaniac
Contributor Author

I guess an ideal solution would be to figure out that those two matches are duplicates and reuse the first partitioned function for the second match? But that would look like an ad hoc solution targeting Relay graphs with duplicated nodes...

@mbrookhart
Contributor

Yes. I think the right solution is CSE, but that is a lot of work.
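The idea behind CSE here can be shown on a toy expression DAG: hash each node by its op and the (already-deduplicated) ids of its operands, so structurally identical subexpressions collapse to one node. This is a minimal sketch under those assumptions, not TVM's `EliminateCommonSubexpr` implementation; the `cse` helper and its tuple encoding are hypothetical.

```python
# Minimal common-subexpression-elimination sketch over a toy DAG.
def cse(exprs):
    """exprs: list of (op, *operand_indices) tuples in topological order.
    Returns a deduplicated list and a remapping old_index -> new_index."""
    seen = {}   # canonical key -> new index
    remap = {}  # old index -> new index
    out = []
    for i, (op, *args) in enumerate(exprs):
        # Canonicalize operands through the remap so chains of
        # duplicates collapse transitively.
        key = (op, tuple(remap[a] for a in args))
        if key not in seen:
            seen[key] = len(out)
            out.append((op,) + key[1])
        remap[i] = seen[key]
    return out, remap

# Two structurally identical batch_norm/get(0) chains collapse to one,
# mirroring how %33 = %20.0 would be folded into the earlier %21.
exprs = [
    ("input",),         # 0
    ("batch_norm", 0),  # 1
    ("get0", 1),        # 2  (like %21)
    ("batch_norm", 0),  # 3  duplicate of 1
    ("get0", 3),        # 4  (like %33), duplicate of 2
]
deduped, remap = cse(exprs)
print(len(deduped))          # 3
print(remap[4] == remap[2])  # True
```

With the duplicates merged, the pattern partitioner sees only one `%20.0` consumer chain, which is exactly what `relay.transform.EliminateCommonSubexpr()` achieves in the fixed script below.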

@mbrookhart
Contributor

mbrookhart commented Jun 25, 2020

Those two PRs plus this change to the script fix the issue for me:

import re
import tvm
from tvm import relay
from tvm.relay.dataflow_pattern import *
from tvm.relay.build_module import bind_params_by_name

def get_gcv_model(model_name):
    """Pull a Gluon CV model."""
    import gluoncv as gcv  # requires: pip install gluoncv

    model_name = model_name.lower()

    print('Pulling the model from Gluon CV model zoo...')
    shape = (1, 3, 224, 224)
    if model_name.find('inception') != -1: 
        shape = (1, 3, 299, 299)
    elif model_name.find('yolo3') != -1: 
        shape = (1, 3, 320, 320)
    elif model_name.startswith('ssd'):
        tokens = re.search(r'ssd_(\d+)_', model_name)
        size = int(tokens.group(1))
        shape = (1, 3, size, size)
    net = gcv.model_zoo.get_model(model_name, pretrained=True)
    ret = relay.frontend.from_mxnet(net, shape={'data': shape})
    return ret[0], ret[1], ('data', shape)

mod, params, data_shape = get_gcv_model('mobilenetv2_1.0')
mod["main"] = bind_params_by_name(mod["main"], params)
mod = relay.transform.EliminateCommonSubexpr()(mod)

bn_out = is_op('nn.batch_norm')(wildcard(), wildcard(), wildcard(), wildcard(), wildcard())
pat = is_tuple_get_item(bn_out, 0)
print(pat.partition(mod['main']))                    
