
MatMul op is not supported #61

Open

JosePereiraUA opened this issue Feb 15, 2022 · 27 comments
Labels: enhancement, help wanted

Comments

@JosePereiraUA (Contributor)

I'm trying to load an .onnx file and get the error in the title.

Also:

Closest candidates are:
  load_node!(::Ghost.Tape, ::ONNX.OpConfig{:ONNX, :Gemm}, ::Vector{Ghost.Variable}, ::Dict{Symbol, Any}) at ~/.julia/packages/ONNX/TEIEp/src/load.jl:67
  load_node!(::Ghost.Tape, ::ONNX.OpConfig{:ONNX, :Conv}, ::Vector{Ghost.Variable}, ::Dict{Symbol, Any}) at ~/.julia/packages/ONNX/TEIEp/src/load.jl:86
  load_node!(::Ghost.Tape, ::ONNX.OpConfig{:ONNX, :MaxPool}, ::Vector{Ghost.Variable}, ::Dict{Symbol, Any}) at ~/.julia/packages/ONNX/TEIEp/src/load.jl:92

If I understand correctly, there's no load_node! dispatch for the :MatMul operation?
How can I fix this? Thank you.

@ToucheSir (Member) commented Feb 15, 2022

> How can I fix this? Thank you.

Put in a PR adding the op 😉

ToucheSir added the enhancement and help wanted labels on Feb 15, 2022
ToucheSir changed the title from "no method matching load_node!(::Ghost.Tape{ONNX.ONNXCtx}, ::ONNX.OpConfig{:ONNX, :MatMul}, ::Vector{Ghost.Variable}, ::Dict{Symbol, Any})" to "MatMul op is not supported" on Feb 15, 2022
@JosePereiraUA (Contributor, Author)

I'm 100% a noob here.

I've added something like:

function load_node!(tape::Tape, ::OpConfig{:ONNX, :MatMul}, args::VarVec, attrs::AttrDict)
    return push_call!(tape, mul, args...)
end

Would the mul function (mul(xs...) = .*(xs...)) work?

@ToucheSir (Member)

I'm not exactly sure what would be equivalent, but if it can match the examples in https://numpy.org/doc/stable/reference/generated/numpy.matmul.html then you're probably on the right track.
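Note that .* is elementwise broadcasting, not matrix multiplication, so a mul built on .* probably won't match :MatMul. A quick illustrative REPL check:

julia> a = [1 2; 3 4]; b = [5 6; 7 8];

julia> a .* b   # elementwise product, what mul(xs...) = .*(xs...) computes
2×2 Matrix{Int64}:
  5  12
 21  32

julia> a * b    # matrix product, what :MatMul specifies
2×2 Matrix{Int64}:
 19  22
 43  50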

@JosePereiraUA (Contributor, Author)

So, according to those examples, Julia does the same thing:

julia> a = [1 0; 0 1]
2×2 Matrix{Int64}:
 1  0
 0  1

julia> b = [4 1; 2 2]
2×2 Matrix{Int64}:
 4  1
 2  2

julia> a * b
2×2 Matrix{Int64}:
 4  1
 2  2

This means the implementation should be:

function load_node!(tape::Tape, ::OpConfig{:ONNX, :MatMul}, args::VarVec, attrs::AttrDict)
    return push_call!(tape, *, args...)
end

Right? I don't know if this makes sense.


Also, I'm trying to load the following .onnx file.
model_C.zip

It first fails with:

julia> ONNX.load("model_C.onnx")
ERROR: AssertionError: Neither initializer, nor argument is provided for input dense_1_input

I understand that I have to input some dummy data? I tried to, but I get a bunch of dimension mismatch errors. I understand that another option is to run ONNX.load("model_C.onnx", exec = false). Can you explain this part a little better, please?

@ToucheSir (Member) commented Feb 15, 2022

That's just one example. Vector-vector products and >2D arrays don't work with *. One or more of LinearAlgebra.dot or NNlib.batched_mul might be required for the other cases.

> Can you explain this part a little better, please?

I don't understand the API terribly well myself, but have a look at how this is done in the tests.

@JosePereiraUA (Contributor, Author)

That makes sense. I'm not entirely sure how to implement this, though. Oh well.

@ToucheSir (Member)

I didn't mean to discourage you! Perhaps as a start we could just consider the 2D case by importing :MatMul as *. Can you double check that's enough for the model you're working with?

@darsnack (Member)

I think the correct thing to do would be to define matmul(x, y) similar to mul in this repo. You can define different methods of matmul for different types of x or y. When x and y are AbstractVectors, it should call LinearAlgebra.dot. When x and y are AbstractMatrix, it should call *. And when either x or y is an AbstractArray{T, 3}, it should call NNlib.batched_mul. All other combinations should error for now.
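A minimal sketch of that dispatch, assuming LinearAlgebra (stdlib) and NNlib are available; the name matmul and the error fallback are placeholders, not the final implementation, and the mixed 3D × 2D cases are left out:

using LinearAlgebra: dot
using NNlib: batched_mul

matmul(x::AbstractVector, y::AbstractVector) = dot(x, y)  # vector-vector
matmul(x::AbstractMatrix, y::AbstractMatrix) = x * y      # matrix-matrix
matmul(x::AbstractArray{T, 3}, y::AbstractArray{T, 3}) where {T} =
    batched_mul(x, y)                                     # batched: dim 3 is the batch
matmul(x, y) = error("matmul not implemented for $(typeof(x)) and $(typeof(y))")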

In terms of loading the operation

function load_node!(tape::Tape, ::OpConfig{:ONNX, :MatMul}, args::VarVec, attrs::AttrDict)
    return push_call!(tape, matmul, args...)
end

should work.

For running your model, you need to pass in some dummy data. This can be done by initializing an input that's the same size as your input data:

# dummy input that is 224 x 224 x 3 x 1 (the last dimension is 1 batch)
x = rand(Float32, 224, 224, 3, 1)
model = ONNX.load("model_C.onnx", x)

@JosePereiraUA (Contributor, Author)

My model is supposed to work on a 2D case, so I've implemented the * operation only. It solved that particular error. It seems I can't publish my changes as a PR (it says I don't have permission).

> For running your model, you need to pass in some dummy data. This can be done by initializing an input that's the same size as your input data:
>
> # dummy input that is 224 x 224 x 3 x 1 (the last dimension is 1 batch)
> x = rand(Float32, 224, 224, 3, 1)
> model = ONNX.load("model_C.onnx", x)

Did you actually download the model and test these lines, or are these just an example? Because I get:

julia> model = ONNX.load("model_C.onnx", x)
ERROR: MethodError: no method matching *(::Array{Float32, 4}, ::Matrix{Float32})

It seems to be related to the * implementation for :MatMul.

@ToucheSir (Member)

You'd need to fork the repo first, create a branch and then file a PR with that. Happy to help walk through the steps if you're interested.

I think Kyle's snippet was just an example. Judging from the MethodError, model_C actually needs the higher-dim case and not the matrix-matrix multiplication that * covers.

@JosePereiraUA (Contributor, Author)

I'm currently working on forking the thing.

Understood. Those dims were not in accordance with what I expected, so I also figured it was an example. I've changed the * to LinearAlgebra.dot, but I'm still getting dimension mismatch errors which I can't pinpoint. Thank you for all the patience!

@darsnack (Member)

> Did you actually download the model and test these lines, or are these just an example?

Yes, these are just an example. Do you know what size input your model expects? These lines will need to be changed to match it. Or if this is indeed the correct dimensionality for your model, then you appear to need the higher-dim matmul like Brian mentioned.

@darsnack (Member)

Okay, I downloaded your model and took a look. It seems like your input should be "400 x N" where N is the batch size? In that case, the dummy input should be:

x = rand(Float32, 400, 1)

@JosePereiraUA (Contributor, Author)

Yes, the dimensions should be 400 x 1. In the meantime I have a PR ready with these changes, but I don't think it makes sense to open it, since it's not currently working.

OK, so:
If matmul is *, I get:

julia> model = ONNX.load("model_C.onnx", x)
ERROR: DimensionMismatch("arrays could not be broadcast to a common size; got a dimension with lengths 400 and 32")

If matmul is LinearAlgebra.dot (which doesn't make much sense to me), I get:

julia> model = ONNX.load("model_C.onnx", x)
ERROR: DimensionMismatch("dot product arguments have lengths 400 and 12800")

Does this give you any useful information?

@dfdx (Collaborator) commented Feb 15, 2022

Try this:

ENV["JULIA_DEBUG"] = Main    # this will help to debug the loading

function load_node!(tape::Tape, ::OpConfig{:ONNX, :MatMul}, args::VarVec, attrs::AttrDict)
    return push_call!(tape, *, args[2], args[1])
end

function load_node!(tape::Tape, ::OpConfig{:ONNX, :Sigmoid}, args::VarVec, attrs::AttrDict)
    return push_call!(tape, Broadcast.broadcast, NNlib.sigmoid, args...)
end

load(filename, rand(Float32, 400, 1))

Your matrices are indeed two-dimensional, but the order of arguments differs between Python/C and Julia: numpy is row-major while Julia is column-major, so each matrix is loaded transposed and the multiplication order flips (e.g. see the implementation for Gemm, which is just * on steroids).
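The flip follows from the transpose identity (A*B)' == B'*A': if both operands arrive transposed relative to their Python layout, multiplying them in reverse order produces the (transposed) expected product. A quick illustrative check:

julia> using LinearAlgebra

julia> A, B = rand(Float32, 2, 3), rand(Float32, 3, 4);

julia> (A * B)' ≈ B' * A'
true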

@dfdx (Collaborator) commented Feb 15, 2022

Also, it looks like we can't load the model with batch size > 1, but it may be a limitation of the saved model itself.

@JosePereiraUA (Contributor, Author)

> Try this: [...]

@dfdx, this worked! I think I had everything the same, but my implementation of the :Sigmoid operation was wrong. Thank you so much! Now I can even check whether it's faster to use Julia & ONNX.jl or to call Python directly. I hope it's the former!

Again, thank you so much. Should I try to push these changes in a PR, or you'll push them directly? 😄

@dfdx (Collaborator) commented Feb 16, 2022

MatMul needs a bit more work to be useful in the general case, so let's leave it as is for now. I'm close to finishing a large update in some other packages and coming back to ONNX.jl, so it won't be long before MatMul gets into the repo anyway.

@JosePereiraUA (Contributor, Author)

Awesome, can't wait! Great work, and again, thank you so much.

@Moelf (Contributor) commented Apr 10, 2022

What's the status on MatMul? After reading the thread a bit, I think we basically need * for the 2D × 2D case and NNlib.batched_mul for the higher-dimensional cases? Not sure what the cleanest way to write it is.

@dfdx (Collaborator) commented Apr 11, 2022

According to the spec, ONNX MatMul behaves like numpy matmul, i.e.:

The behavior depends on the arguments in the following way.

  • If both arguments are 2-D they are multiplied like conventional matrices.
  • If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
  • If the first argument is 1-D, it is promoted to a matrix by prepending a 1 to its dimensions. After matrix multiplication the prepended 1 is removed.
  • If the second argument is 1-D, it is promoted to a matrix by appending a 1 to its dimensions. After matrix multiplication the appended 1 is removed.

So yes, it's a mix of * and batched_mul, but we need to be careful about the order of dimensions and write tests thoroughly. If you feel like it, you can continue this PR or wait for 1-3 weeks until I get to it.
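A hedged sketch of those four rules (the name numpy_matmul is a placeholder, and this deliberately ignores the dimension-order reversal discussed above — numpy batches over the leading dimensions, while NNlib.batched_mul batches over the last one):

using LinearAlgebra: dot
using NNlib: batched_mul

function numpy_matmul(x::AbstractArray, y::AbstractArray)
    if ndims(x) == 1 && ndims(y) == 1
        return dot(x, y)                  # both 1-D: scalar product
    elseif ndims(x) == 2 && ndims(y) == 2
        return x * y                      # conventional matrix product
    elseif ndims(x) == 1 && ndims(y) == 2
        return vec(reshape(x, 1, :) * y)  # prepend 1, multiply, drop it again
    elseif ndims(x) == 2 && ndims(y) == 1
        return x * y                      # Julia already appends/drops the 1
    elseif ndims(x) == 3 && ndims(y) == 3
        return batched_mul(x, y)          # stacks of matrices, batch in dim 3
    else
        error("matmul: unsupported ranks $(ndims(x)) and $(ndims(y))")
    end
end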

@JosePereiraUA (Contributor, Author)

Hey, sorry to bother, but is there an update on this topic? I've been using my ops-fix PR, but for some reason it's giving me headaches:

ERROR: LoadError: Evaluation into the closed module `ONNX` breaks incremental compilation because the side effects will not be permanent. This is likely due to some other module mutating `ONNX` with `eval` during precompilation - don't do this.

I have no idea why it's suddenly giving me this precompilation error. Does anyone have a clue?

@dfdx (Collaborator) commented May 26, 2022

Do you use eval() anywhere in your code?

@JosePereiraUA (Contributor, Author)

I think ONNX.jl does; that's why it's giving this error.
I managed to fix it without messing with ONNX.jl. Instead of loading ONNX models during precompilation, as in the following example:

module T
    model = ONNX.load(...)
end

T.model

I load the models at runtime, in the __init__() method, like so:

module T
    function __init__()
        @eval(T, model = ONNX.load(...))
    end
end

T.model

Maybe I'm wrong and doing a really bad thing, I don't know, ahah. But it works. For now! ahah

@dfdx (Collaborator) commented May 26, 2022

I can't reproduce the issue using either of these snippets, so perhaps there's something else in between.

Other notes:

  1. Generally, it's not recommended to load resources during module precompilation because it offers very few guarantees; the __init__() method is indeed a more robust way.
  2. I think you don't need @eval in your second example; models should be perfectly loadable using just ONNX.load(...).

The exact code I successfully used from the branch of this PR:

module T
    import ONNX
    args = (rand(Float32, 224, 224, 3, 1),)
    model = ONNX.load(expanduser("~/data/onnx/resnet18-v1-7.onnx"), args...)
end

T.model

and

module T
    import ONNX

    function __init__()
        args = (rand(Float32, 224, 224, 3, 1),)
        @eval(T, model = ONNX.load(expanduser("~/data/onnx/resnet18-v1-7.onnx"), $args...))
    end
end

T.model

@dfdx (Collaborator) commented May 26, 2022

I added a MatMul implementation for several of the most popular cases in #69. It's incomplete, but perhaps it makes more sense to open a new issue for improvements.

@JosePereiraUA (Contributor, Author)

> I can't reproduce the issue using either of these snippets, so perhaps there's something else in between.

That's probably what's happening. I saw somewhere that @eval allows the model variable to live in the module scope, so that you can do T.model later on. That's why I use it.
