Dynamic shapes #3
Reading the current spec, it seems that NNEF supports only fixed (constant) shape sizes. Is that correctly understood, and if so, do you intend to support dynamic/variable shapes?
NNEF supports dynamic shapes for inputs. See https://www.khronos.org/registry/NNEF/specs/1.0/nnef-1.0.html#external-op
Yes, however this is true only for "external". What about parameters (e.g. variables)?
Since the variables in NNEF come out of training, like weights, and need to be stored in the container using the Tensor File Format, they can't be dynamic in NNEF 1.0. Do you have a use case that requires dynamic shape support for variables coming out of any training frameworks?
Not for post-training, no. The only use case would be for actual compilers that optimize graphs and cache the result, as in this way changing the shape would (most likely) not invalidate the optimized code.
Interesting idea. Such compilers could use a "0" value in the shape to indicate a dynamic shape of variables. Since such behavior falls outside the scope of NNEF 1.0, it could be a vendor extension for now.
@botev just to clarify if I get it right: are you saying that the structure description could be generic, not exactly specifying the shape of variables, and the shape would get fixed when it is bound to some actual data for the weights? For example, I could have the same general network structure, but in one data set I have 3x3 convolutions, in the other I have 5x5, or the channel counts may be different. As @rgiduthuri says, this could be expressed syntactically (with 0s in the shape, just as in the case of externals), but we'd have to see clearly its implications on semantics, such as shape propagation (verification of correctness, as you mention). Can you give some more info (maybe links) on how polynomial integer expressions are used for this in practice?
@gyenesvi Well, the main issue is that nobody uses polynomial integer expressions for now. I have argued a few times for people to adopt them, as they can make graph optimization significantly more generic and easier. This is something I think would be a nice feature. To clarify, imagine you have the following network (pseudocode):
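Something along these lines, in Python-style pseudocode (the operation names and all concrete sizes here are illustrative stand-ins; only the symbolic batch size `n` matters):

```python
# Pseudocode: input/variable/conv/reshape/matmul are stand-ins, not a real API.
# 'n' is a symbolic batch size; every other number is a made-up constant.
x = input(shape=(n, 3, 32, 32))            # images, symbolic batch size n
w = variable(shape=(16, 3, 5, 5))          # convolution filter
h = conv(x, w, stride=2)                   # inferred shape: (n, 16, 14, 14)
h = reshape(h, (n, 16 * 14 * 14))          # flatten; still polynomial in n
y = matmul(h, variable(shape=(16 * 14 * 14, 10)))   # shape: (n, 10)
```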
Now when you initialize/construct the model, you will need to bind the symbolic sizes (e.g. n) to concrete values.
Okay, I understand what you mean now. I played around a bit with a similar idea, but thought it would be too complicated to implement (I did not think it through enough to see that you end up with polynomial expressions). It would be easy to express in NNEF syntax, since it can have compile-time symbols: along your example, the symbolic sizes would appear as compile-time parameters of the graph (this is not valid now, because graphs cannot have compile-time parameters, only fragments can, but otherwise such syntax is legal).
So theoretically, it would be possible to deduce from the syntax that an ill-shaped operation is not valid, and raise an error; however, it would require somewhat complicated evaluation and comparison of the shape expressions (essentially creating the polynomial expressions in some canonical form). And as you say, it would require a binding mechanism for the compile-time parameters when the graph is instantiated.
Yes, this is exactly it. In terms of the complexity, I have already implemented the integer polynomials module for this, which supports operations like floor, ceil (for convolutions when we have bigger strides) and max, min. The implementation is in Rust (https://github.com/Metadiff/symbolic_polynomials); however, it should not be too difficult to port it to C++. I do agree, however, that it is up to you guys to weigh whether that is something you want. Tbh I'm happy to help on this if you do think it is an interesting feature to add in the future. What I personally like about it is that in very complicated models, having something to verify your shapes at construction time, rather than run time, is really amazing for research debugging; similar to the benefits of statically typed variables compared to dynamically typed ones.
Thanks, I'll let you know if we consider this in the future.
@botev, some updates on this one. I believe that, from the point of view of NNEF, enough information is present to implement what you discuss above. Previously, I thought you would need some symbolic integers to declare parametric shapes (like 'n' above), but in fact I have realized that in order to perform the checks, it does not matter how those variables are named. One only needs to be able to express that a dimension is not a fixed value, and that is possible by setting some components of the shape to 0 in the tensor declarations.
So the real question is whether the NNEF specification should mandate such checks as you mention. I'd say no, it should not mandate them, to allow for simpler parser implementations; however, since all the information is present, an NNEF parser/graph compiler may further check whatever it wants to. So such an NNEF document would be declared valid by the spec, but parser/validator tools are free to implement further checking. For example, a tool that realizes/proves that a graph will actually never be valid, no matter what you substitute for the batch dimension, could be useful. How does that sound to you?
I think that is a valid point and it makes things a lot easier. However, note that the inverse problem (given all of the graph operations, deciding for what values of the unknown dimensions the graph is valid at all) is significantly harder.
Sure, I can imagine that the problem may become easier if you express the unknown (here batch) dimension with a formula that guides the validation. However, what if in the above example the unknown dimension is written plainly, without such a guiding formula? (Actually, I believe that your original example above should contain the plain form as well.)
You are right, my example is actually quite bad at showing the issue I described. So let me remake the example to make this clear:
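For instance, in the same pseudocode style (names and sizes are illustrative):

```python
# Pseudocode: input/variable/matmul/add are stand-ins, not a real API.
a = input(shape=(n, 100))                  # n is an unknown symbolic size
b = input(shape=(m, 100))                  # m is a different unknown

c = matmul(a, variable(shape=(90, 50)))    # first error: 100 vs 90 is a
                                           # constant mismatch, easy to catch
d = add(a, b)                              # second error: n vs m may or may
                                           # not match; nothing constrains them
```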
I think what you are referring to is the first error, which I think is easy to check as long as you have constant shapes. However, ones like the second error are more general and difficult to catch. Hope that makes it a bit clearer. I personally prefer to be able to solve both of them, not only the first. However, that does require the user to specify some constraints on the unknown shapes.
I was actually referring to the second error (sure, the first is easy). My point was that the example is the same if you write it with the unknown dimensions left plain, without any constraints.
I guess if I were the user, I would expect these two formulations to be equivalent, coming to the same conclusions about validity, which means that you have to prepare your implementation for cases where you do not have the luxury of constraints on the unknown shapes. And if you have to prepare for the hard cases, you don't win much by having some cases simplified. On the other hand, not preparing for the hard cases feels like a half solution. I think if your intention is exactly to catch errors made by the user, you cannot rely on the user providing you hints to ease validation; that sounds kind of counter-intuitive to me. Or am I still missing something here?
So if there is a symbolic expression for a shape, the checker has something to relate to other shapes; if it is a plain unknown, there is nothing to reason with.
So how would the user express such constraints? Let me try to see this in a real example.
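A pseudocode sketch (names are stand-ins; `0` marks an unknown extent, as for NNEF externals):

```python
# Pseudocode: external/add are stand-ins, not a real API.
input1 = external(shape=[0, 10])
input2 = external(shape=[0, 10])
output = add(input1, input2)    # the two 0 extents may differ at runtime,
                                # so the addition cannot be validated
```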
Here you can't reason about the addition being valid, because the two input shapes could be unrelated; however, if we could write:
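(a hypothetical notation, with a shared symbol `n` in place of the two unrelated `0`s):

```python
# Pseudocode: 'n' is an assumed symbolic extent, not current NNEF syntax.
input1 = external(shape=[n, 10])
input2 = external(shape=[n, 10])
output = add(input1, input2)    # the batch extents are declared equal,
                                # so the addition is provably valid
```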
Then the user explicitly states that the two are the same (although unknown), so you can reason about it and perform more checks. Is this the essential difference that you'd need to let you perform all the checks that you'd like?
Yes, that is indeed what I mean. And I think these constraints are quite reasonable to specify, as well as being very simple: things like two tensors coming from the same batch, or my convolution kernel dimension being equal to that of the input, etc. Libraries on top can also use the shapes to create new tensors with matching but unknown shapes. This is a bit like compile-time checks vs runtime checks in programming languages.
Okay, finally we got to the essence of it. I agree that such a description carries more information and is something that users or high-level libraries could naturally express. For now, the graph can only have tensor parameters (that's why they do not need to be marked as such), so this would require allowing non-tensor parameters as well. We'd have to see what that entails, but at least now it is clear in which direction the syntax would require extension.
I have recently been experimenting with describing and validating output shapes in NNEF, and not surprisingly, arrived at integer polynomial expressions, so it might be interesting to understand this in more detail. The first question we have to answer is how exactly this helps compilation. The reason I have been looking at this is validation of a parameterized description (such as a single, possibly compound operation, or alternatively a whole graph) for structural correctness. Also, you mentioned above that you know of some use cases for optimization; could you elaborate on that, just to have some motivating example? Second, in order to estimate the complexity of implementation, it would be good to clarify what properties/operations are imaginable/required on integer polynomials (for example in mathematical, implementation-independent terms). From my experiments, I have gathered the following: polynomials need to be built up from shape arithmetic (addition, subtraction, multiplication by constants), brought to some canonical form, and compared for equality.
For the examples I have encountered so far, multiplication of two polynomials was not required, so each monomial could be of degree 1. Can you give an example when multiplication of polynomials is required? Any further operations that you found necessary? For example, looking at the symbolic polynomials library you linked, I don't quite get the meaning/utility of the floor/ceil and min/max operations, if they would be required here. Could you explain these? Also, maybe some implementation tricks, if any. A canonical form of polynomials with maximal degree 1 can be easily implemented as a map from strings to signed integers; see the sketch below. These maps can then be simply compared for equality. How much more complicated would it get if higher degrees/polynomial multiplication/division are needed?
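A minimal Python sketch of that idea (the representation and names are mine, just to make the canonical form concrete):

```python
# A degree-1 polynomial is a dict mapping variable names to integer
# coefficients, with the empty string keying the constant term.

def poly(const=0, **coeffs):
    terms = {'': const, **coeffs}
    return {k: v for k, v in terms.items() if v != 0}   # canonical: no zeros

def add(p, q):
    r = dict(p)
    for k, v in q.items():
        r[k] = r.get(k, 0) + v
    return {k: v for k, v in r.items() if v != 0}

# is the extent (n + 2) + (n - 2) equal to the extent 2*n ?
assert add(poly(2, n=1), poly(-2, n=1)) == poly(n=2)
```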
It turns out that for the fully general case, we do need multiplication and division of polynomials, which leads us to multivariate rational polynomial expressions, i.e. the ratio of two multivariate (integer) polynomials. So in the end we would need to test equality of two such expressions. Building those polynomials from arithmetic expressions seems not too complicated. Furthermore, a seemingly simple way to perform the comparison is taking the difference of the two expressions, transforming them to common denominator form, and seeing if the numerator becomes zero (after terms cancel each other out). This seems doable. Any experience with this?
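A Python sketch of this comparison (the representation is my own choice, not any particular library's):

```python
# A multivariate integer polynomial maps a monomial -- a sorted tuple of
# (variable, power) pairs -- to its integer coefficient; a shape
# expression is a ratio (p, q) of two such polynomials.

def mul(p, q):
    r = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            powers = {}
            for v, e in m1 + m2:
                powers[v] = powers.get(v, 0) + e
            m = tuple(sorted(powers.items()))
            r[m] = r.get(m, 0) + c1 * c2
    return {m: c for m, c in r.items() if c != 0}

def sub(p, q):
    r = dict(p)
    for m, c in q.items():
        r[m] = r.get(m, 0) - c
    return {m: c for m, c in r.items() if c != 0}

def ratios_equal(p1, q1, p2, q2):
    # p1/q1 == p2/q2  iff  p1*q2 - p2*q1 is the zero polynomial
    return sub(mul(p1, q2), mul(p2, q1)) == {}

x    = {(('x', 1),): 1}             # the polynomial x
two  = {(): 2}                      # the constant 2
xy   = {(('x', 1), ('y', 1)): 1}    # x*y
twoy = {(('y', 1),): 2}             # 2*y
assert ratios_equal(x, two, xy, twoy)   # x/2 == (x*y)/(2*y)
```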
Indeed, as you have arrived at, the main purpose of it is validation of parameter (and graph) descriptors, checking that they conform to correct shapes. In terms of optimization, this can be used for an optimizing memory allocator, as it will potentially allow you to have an explicit polynomial expression for the storage of all tensors in the computation; together with an arithmetic scheduler, it can build an explicit division of memory ahead of execution, and even if that is dynamic (e.g. input shapes can change) the division of the memory can still be done. This is quite involved and most likely you don't need to be bothered with it. Multiplication and division are usually used during reshaping. I'm not sure that you need rational polynomial expressions though; what would be the case where you have something that is not an integer?
Just to clarify: in a rational polynomial expression (or function) everything is still integer. It is not the same as a rational polynomial, where the coefficients are rational; it is instead the ratio of two integer polynomials. It is required exactly for things like stride in convolution; for example, a simple formula for the output size of a convolution is:

o = (x - f + s) / s
which is the ratio of two integer polynomials. Here x is the input size, f the filter size and s the stride (padding omitted for simplicity).

Your way of formulating divisibility assumptions seems interesting, although I find it a bit unnatural on the user side, and it may lead to parsing difficulties as well, as it would require the declaration of a shape to be an expression, like s * k, instead of a plain identifier. Divisibility assumptions may also be given in the form of a substitution, say x = s * k + f for some new symbol k; substituting into the formula above, the division cancels out and we get o = k + 1,
and so we can see that the convolution output shape is as we expected. That was my idea for approaching this.
This is the main reason why in that library there exist the explicit floor and ceil primitives, for exactly the cases when the division is not exact.
Deducing integer polynomial constraints is (I think) generally NP-hard, which is why I think for any real purposes this has to be something that the user provides. However, as a researcher and user of libraries, I'm very comfortable with (and even prefer) providing myself all of the assumptions about the data that a model needs for correctly inferring shapes (e.g. that some size needs to be of the form s * k, i.e. divisible by the stride).
I think this is equivalent to your substitution approach. E.g. using your example, the user says that the shape of the convolution output is (x - f + s) / s, together with the assumption that the division is exact.
I agree that it's okay if the user provides the assumptions about the model; what I don't see clearly is the actual means to do so with your approach. Can you give me more concrete examples of how this would work out in practice? Here are the bits/cases I want to be able to describe. Let's take a concrete operation, like a strided convolution, whose output size formula involves division by the stride.
How would that look in your approach, given that it must be defined for all input shapes, not just the cases where there is a divisibility constraint? Would it be something like floor((x - f + s) / s),
where / means real division? Or alternatively, with an expression like ceil((x - f + 1) / s)? The next bit is when I have a graph, and I want to define its input size with some divisibility constraint, and I want to express the output size with a formula. Say, I have a single strided conv op in the graph, I substitute 2 for the value of the stride, and I want to express that the input is divisible by 2. I can write that the input size is 2 * k for some symbol k, and express the output size in terms of k. How are floor and ceil represented in your approach? Is it actually the ratio of two polynomials, with the additional info of knowing whether the division is floor or ceil? Or is the key here that in the end, when you have to do equality comparisons, the division always cancels out, and you are left with comparing simple polynomials?
Indeed, in the approach taken in the library, there are several "primitives": a monomial, a constant, and a ceil/floor operation. Essentially, you create a special "monomial" primitive that contains information about the dividend and the divisor and whether it is a floor or a ceil (I think they are actually two different types).
Even as it is at the moment, whenever you call ceil/floor with two polynomials, the system always first checks whether they divide exactly, in which case it returns the exact polynomial quotient; if not, it constructs the new primitive. E.g.:
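Sketched in Python on top of the dict polynomials from the previous snippet (my own rendering of the rule, not the library's actual API):

```python
# floor(p, q) returns the exact quotient when q divides p term-wise,
# and an irreducible symbolic node otherwise.

def try_exact_div(p, q):
    if len(q) != 1:                      # only single-term divisors here
        return None
    (qm, qc), = q.items()
    r = {}
    for m, c in p.items():
        if c % qc != 0:
            return None
        powers = dict(m)
        for v, e in qm:
            if powers.get(v, 0) < e:
                return None
            powers[v] -= e
        key = tuple(sorted((v, e) for v, e in powers.items() if e != 0))
        r[key] = c // qc
    return r

def floor_(p, q):
    exact = try_exact_div(p, q)
    return exact if exact is not None else ('floor', p, q)

sixab = {(('a', 1), ('b', 1)): 6}
twob  = {(('b', 1),): 2}
a     = {(('a', 1),): 1}
b     = {(('b', 1),): 1}
assert floor_(sixab, twob) == {(('a', 1),): 3}   # exact: 6ab / 2b == 3a
assert floor_(a, b)[0] == 'floor'                # stays symbolic
```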
This basically allows such a comparison to be done correctly. I even think that it simplifies the expressions, e.g. floor(6ab / 2b) is immediately simplified to 3a.
As an example, see the one I've added to the README in https://github.com/Metadiff/symbolic_polynomials, which shows how such expressions are built up and simplified. There also exist min and max operations.
Thanks, that clarifies some things about the floor/ceil operations. However, some further bits are not clear to me. If floor and ceil are separate things, how do you define operations on floor/ceil objects? Say, addition/multiplication of a floor/ceil with a regular poly, or even addition/multiplication of a floor object with a ceil object, like floor(a / b) + ceil(c / d)? (Note that ceil(a / b) could be expressed as floor((a + b - 1) / b), so maybe one of the two is enough.)

What I like about a simple ratio of polynomials (essentially a numerator/denominator pair) is that all arithmetic operations are naturally defined on it. Furthermore, ratio polynomials can be compared for equality without the need to do actual polynomial division and simplification, which would require algorithms, like polynomial gcd, that are quite complicated, especially in the case of multivariate polynomials. I understand that it is possible to implement it, but it may be too much of a burden for an NNEF consumer. To make this work in NNEF, it should be possible to clearly define how expressions can be converted to polynomials, how operations are defined on those polynomials, and to prove that it is always possible to decide equality of two expressions via polynomials. With ratio polynomials, this seems to be true in my view. Have you done some analysis in this direction using the floor/ceil formulation?
You can imagine that the Floor and Ceil entities are treated as separate irreducible variables. Mathematically, it is like saying that if we have ordinary variables x1, ..., xn, then each irreducible Floor/Ceil expression is added as an extra variable to that set.
Then a monomial is defined as a product of powers of such variables (ordinary or composite) times an integer coefficient.
And a polynomial is just a vector of monomials, representing their sum. Hence, you can build polynomials of these composite expressions in arbitrary form.
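In the same Python representation as the earlier sketches, such a composite entity can simply be given an opaque variable name (illustrative):

```python
# An irreducible Floor node is treated as just another variable, so it
# participates in monomials like any ordinary one.
floor_x_s = 'floor(x/s)'                   # opaque composite "variable"
p = {((floor_x_s, 1),): 3, (): 1}          # the polynomial 3*floor(x/s) + 1
q = {((floor_x_s, 2),): 1}                 # even powers work: floor(x/s)**2
```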
That's a very good point which had not crossed my mind before. Hence, it is probably enough to define only one of these as a composite, with the other constructed in the manner you described.
This is a good point; however, you will need to represent standard polynomials somehow separately from irreducible ratios (similar to how the library separates ordinary monomials from the Floor/Ceil primitives).
Okay, now I understand that you treat those sub-expressions as irreducibles, and that means that you cannot further simplify the expression. I agree that it is probably necessary to have such an irreducible, so you need to have two kinds of ratios: one reducible (the true ratio) and one irreducible (the floored/ceiled one). However, I still want to establish some relation between the two, because it is needed in some cases to prove equalities. For example, imagine two operations, one that downsamples the input by a factor s and one that upsamples it by the same factor s, composed into a compound operation whose output size should equal its input size x.

When checking the validity of that compound definition, combining the formulas of the two ops, we get floor(x / s) * s for the output size, which we would like to prove equal to x. We could use the identity floor(x / s) = (x - (x mod s)) / s to turn the floor into a true ratio with an explicit modulo term. Or alternatively, as you proposed, we can state in our compound operation definition that the input and output size is of the form s * k, and substitute that in. I find the first approach (the modulo identity) more appealing, as it does not require extra hints from the user. See the numeric check below.
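A tiny numeric check of that example (arbitrary concrete numbers, just to illustrate the cancellation):

```python
# Downsample by s then upsample by s; under the assumption x = s*k the
# composed size floor(x/s)*s indeed simplifies back to x.
s, k = 3, 7
x = s * k                   # divisibility assumption: x is a multiple of s
down = x // s               # floor(x / s) == k, exact under the assumption
up = down * s               # k * s == x
assert up == x
```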
Actually, in the end I think that the two approaches are on some level "equivalent", hence I'm not convinced that one of them leads to a very hard problem (polynomial gcd) and the other does not.

PS: did that close by mistake?
Actually, maybe you can clarify under what circumstances you envision GCD to be required in my approach, to clear that out, as I think we have some form of miscommunication there.
I agree that the two approaches may be equivalent, and it would be good to see the difference, if any. I guess you still wanted to format this example, but something went wrong; I cannot interpret it, there seem to be unfinished lines and unused names.
Yeah, when I started writing the example I thought it would need GCD in my approach, but after I ran it in the library I realized I was wrong and it still worked. I actually don't think you will ever need GCD even in my approach; polynomial division is enough.
When you say you can get away with simple polynomial division, do you mean that you perform the division, throw away the remainder, and continue calculating with the quotient? How easy is that to do in the multivariate case? I have never implemented it, but it did not seem as simple as addition and multiplication.
Ok, so I tried what I thought was a somewhat reasonable counter-example; however, it still works, and I explain below why.
The first expression simplifies by exact polynomial division, so no GCD computation is ever needed. Could you maybe sketch how you see this working out in your construction?
I defined equality for ratio polynomials exactly the same way as you do above, to avoid requiring the two polynomials to have the exact same structure; but I define it for the true ratio, not for the floored one, and I believe that is when you can actually do that trick, and not as you apply it. To represent a floored ratio, I would use the identity floor(a / b) = (a - (a mod b)) / b, making the modulo term explicit. Does that make sense, or am I mistaken somewhere?
I think I am starting to see your point, and you might be right that this is an easier and potentially better representation. Let me experiment a bit with it in that library during the weekend and see if it comes out as expected.
This is true; however, since we work with polynomials, which have variables, one cannot do this in general, except with constants. The equality I have defined guarantees that Floor(a, b) = Floor(c, d) for all possible values of the variables involved. It is a stronger notion than the two being equal for some specific set of values. I think for constants the floor operation can actually perform the floor and return the correct integer.
I think the essence of the above counter-example is that since a constant is a special case of a polynomial, if I find a counter-example for constants, that is also a counter-example for polynomials. But here's a counter-example using proper polynomials. Suppose I am given that a = Floor(2k + 1, 2).
If I follow your rules, then I can substitute to get 2a = 2k + 1,
which does not hold in general (for all k); in fact Floor(2k + 1, 2) = k, so 2a = 2k. Furthermore, I believe you can do the same rewriting and calculations as above with polynomials too, not only with constants (since all operations, including modulo, are defined, and are generalizations of the corresponding integer operations). And that stronger notion will hold; the equality will hold for all possible values substituted for the variables: Floor(a, b) = Floor(c, d)
is rewritten to (a - (a mod b)) / b = (c - (c mod d)) / d,
which holds iff d * (a - (a mod b)) = b * (c - (c mod d)).
Doing that for the above example (downsampling then upsampling), we have, given that the input is x = s * k: floor(x / s) * s = x becomes x - (x mod s) = x, that is, x mod s = 0,
which seems to be true, assuming that (s * k) mod s rewrites to 0. Let me know what comes out of your experiments!
So one of my concerns regarding this is that when you have more complicated expressions, it would be very difficult to "prove" that something is still an integer. Consider a similar example where the dividend is a sum of several terms, and let's add to it a term that is not divisible by the divisor on its own. How do we check that the whole expression is indeed an integer? Removing all parts of the dividend that are divisible by the divisor is easy, but the remaining terms may sum to something divisible without any single term being so. As a more concrete example, (k^2 + k) / 2 is always an integer, although neither k^2 nor k is divisible by 2 on its own.
Maybe it would be nice if we had several test cases which I can test against. Other than that, I think I have implemented the approach you were discussing, but at this point it is a bit hard to reason about what the system could or could not infer without having explicit examples.
Sorry for not responding for so long, I have been out of office for a while and forgot a bit about this thread. I have to agree that things can get wild quite easily, and we may end up with problematic cases (as I said previously, we might need more rewrite rules related to modulo calculations to prove equivalences; how many, I don't know). Basically, as I see it, the cases where we cannot decide whether something is an integer are cases in which there might be simplifications via some equivalences that we fail to discover, and hence we fail to derive that two formulas are the same.

In general, we have two options: either aim for a representation and set of rewriting rules that provably always decides equivalence of two formulas (the existence of which I highly doubt at this point), or just embrace the fact that some equivalences cannot be proven, and work with that. The latter could still be useful if all implementations used the same rewriting rules and the same simplification algorithm, and would declare two formulas equivalent or not in the same cases. Then we could just say that a network description is valid if the algorithm can prove the required equivalences, and invalid otherwise (meaning that it is truly invalid or its validity cannot be proven by the tools we have).

I agree that it would be good to see a set of examples that we think the system should be able to handle (decide equivalences of formulas). What I wanted to use this for is to decide if the definition of intermediate or output shapes in a (sub-)graph is valid. Suppose that each operation has an output shape definition (via a math formula). Take a sequence of operations, and compose a shape formula at the end of the (sub-)graph by composing the shape formulas of the sequence of operations. Is the math formula for the composed shape consistent with some, probably simpler, definition of that shape at the output of the (sub-)graph? I think it should be easy to enumerate quite a few interesting cases by composing just a few operations into a graph, defining the output shape with a formula, and seeing if that definition can be decided to be consistent with the sequence of operations inside the graph. My example above (Jan 9) of downsampling and then upsampling an input is such a small sub-graph of two operations.
So, pretty much for similar reasons, I had my own view on this with the "extra entity" for floor/ceil. One can potentially resort to numerics when things get too hairy, e.g. compare the results for 10-20 random "base" integer values. Of course this will never be fully correct, but it can potentially still catch wrong shapes often.
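A Python sketch of that numeric fallback (function and variable names are illustrative):

```python
# Evaluate both formulas at random integer assignments; a mismatch
# disproves equality, agreement on all trials merely suggests it.
import random

def probably_equal(f, g, variables, trials=20):
    for _ in range(trials):
        env = {v: random.randint(1, 10**6) for v in variables}
        if f(**env) != g(**env):
            return False                    # definitely not equal
    return True                             # equal on all samples

# floor identity: (x - x % s) // s == x // s for positive integers
assert probably_equal(lambda x, s: (x - x % s) // s,
                      lambda x, s: x // s, ['x', 's'])
```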