Make rule for multiple calling of `nnfw_set_input_tensorinfo()` and `nnfw_run()` #4625
In MNN, dynamic shapes are not supported, and changing an input triggers an update: tensor shapes are re-inferred, tensor and internal memory is reallocated, and the related internal preprocessing of operators is redone. Hence, it is possible to do the same in ONE when only input shapes change and all internal shapes can be re-inferred statically. In MNN, operator configuration is two-step: the first step is done when building the graph, and the second step is done after shape inference. In ONE, configuration is currently single-step; this step is executed after shape inference but before the operator sequence is built. To reduce reconfiguration overhead, a two-step configuration would also be necessary.
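To make the two-step idea concrete, here is a minimal sketch of what such an operator interface could look like; the names (`IOperator`, `configure`, `onResize`) are hypothetical and do not come from MNN's or ONE's actual code.

```cpp
#include <cstdint>
#include <vector>

struct Shape { std::vector<int32_t> dims; };

// Hypothetical operator interface with two-step configuration:
// configure() runs once while the graph is built, onResize() runs again
// whenever input shapes change and shapes have been re-inferred.
class IOperator {
public:
  virtual ~IOperator() = default;
  // Step 1: shape-independent setup (validate attributes, pick a kernel).
  virtual void configure() = 0;
  // Step 2: shape-dependent setup (scratch buffers, tiling, thread plan).
  virtual void onResize(const std::vector<Shape> &inputShapes,
                        std::vector<Shape> &outputShapes) = 0;
  virtual void run() = 0;
};
```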
@hyunsik-yoon I like a simple rule or no rule. However, considering some models like your target, rules 1 and 2 look reasonable for performance. However, there may be several cases:
It would be good if we could remove the performance penalty of …
BTW, I would like to clarify that we have 3 situations:
About the 3rd case, some models require some tensors to be dynamic. For example, when …
Could you please elaborate what it does for each step?
Please note that we also have …
The rules look good to me. They are consistent with other APIs, for example, the buffer that is used for …
@glistening, for #4625 (comment)
For example, assuming an input is [batch, 100], the tflite file will have the input shape [1, 100].
So if we run inference with well-distributed batch sizes, the difference between 80 ms (static) and 100 ms (dynamic) when batch == 1 may not be that important. However, if a model is used mostly with batch == 1 and only sometimes with batch > 1, considering what you mention would be important.
You mean changing only the input dtype without changing the shape? Good point to consider.
So I guess we will be fine for the time being. :-)
In MNN, different operators are not distributed between threads; instead, each operator uses OpenMP to utilize all threads. In the ONE cpu backend, not all (or probably even none of the) operators support data-based parallelization, so running networks in batch mode in the ONE cpu backend looks like a doubtful idea anyway. When using MNN-based applications, I preferred creating several single-thread contexts for the same network and running them in parallel with batch=1, to reduce latency and avoid batch-managing logic, even though it was not memory-effective (still, in theory, it is possible to implement special logic for that in ONE).

Collecting batches is more necessary when utilizing GPUs, which are not always good on small batches. In batch mode, the batch size typically varies randomly between 1 and some maximum value, depending on some external activity, i.e. batches are collected within some timeout and sent to inference once the collecting timeout is exceeded or the maximum batch size is reached. In this case, we could in theory allow the user to set a maximum batch size, so we could allocate tensor memory only once, but that is all the benefit we get; shape inference, etc. will have to be redone anyway. Besides, reconfiguration overhead is typically not expected to be so important in batch mode. Dynamic models are unlikely to be used in batch mode at all, except when they have been specially designed for that, because a tensor size has to be the same for the whole batch.

Since the dynamic part of a model may terminate on some constant-shape-output operator like Reduce, the whole downstream could again be inferred statically. I guess such a case is not supported in ONE for now; instead, once any dynamic tensor is met, all the downstream tensors up to the very end become dynamic. Since changing a tensor size requires some memory-management and shape-inference overhead anyway, it is not so important whether it is done statically or dynamically. So, rerunning static memory allocation and shape inference on input change instead of marking everything dynamic would be a good idea; besides, static results may be reused between runs, while dynamic inference is rerun every time. And, I guess, it is not so impossible to change ONE logic to a two-step configuration.

Considering again #4544, I think abstract …

About dynamic data types, I think the cpu backend operators do not support this for now, as they often read data in …

In MNN, the two-phase configuration consists of the operator constructor, run when building the graph, and the OnResize method that configures the network for specific input shapes, including reallocation of internal buffers. Since MNN is static-shaped, all OnResize methods are executed during the shape inference and memory allocation phase. For example, non-quantized binary ops are implemented in …
I am thinking of last year's model requirement you told me about: resizing the input size (rows and cols), not the batch size.
Your reference seems nnapi-specific. We may have the same limitation on our nnfw API.
I am not sure exactly what your concern is, but I guess you're mentioning a model whose input in the tflite file is, e.g., [1, 128, 128] but whose actual input is, e.g., [1, 512, 512]? (some model we mentioned for input reshaping (or input resizing)?)
Right. I referred to nnapi just as a reference from another framework. We haven't discussed whether we can change the dtype dynamically. Our friend nnapi also assumes that the dtype cannot be changed. So, for now, how about considering only the shape?
Right. If it has been working as …
Yes, it looks good.
For this year, how about finalizing the rule as mentioned in #4625 (comment)?
I still have not got what this discussion is all about. Right now, … Once …

Yes, dynamic tensors require somewhat more runtime-managing overhead than static ones. But in principle, both dynamic and static tensor management do the same things: shape inference and memory management. And indeed, static memory management may be more effective, and it depends on statically derived shapes, which can be derived when only the inputs change. The dynamic approach is good for networks that depend on it, but in most practical cases they do not (or do, but not much). And so, rerunning static management when the input size changes would be the best approach.

So what I propose is: redesign the runtime to make it possible to redo static shape/memory management between runs when an input shape changes. I think this will bring the most benefit in most practical cases. When a user wants to alternate within a fixed set of input shapes, and the shape inference/memory management step becomes a performance stopper for him, he can just create several sessions at the same time, one for each input shape.
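A rough sketch of that proposal, with hypothetical names, just to illustrate re-running the static passes on input change instead of switching tensors to dynamic mode:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Shape {
  std::vector<int32_t> dims;
  bool operator==(const Shape &other) const { return dims == other.dims; }
};

// Hypothetical executor: when an input shape changes, redo static shape
// inference and the static memory plan once, then reuse the result for
// every following run with the same shapes.
class Executor {
public:
  void setInputShape(std::size_t index, const Shape &shape) {
    if (inputShapes_.size() <= index) inputShapes_.resize(index + 1);
    if (!(inputShapes_[index] == shape)) {
      inputShapes_[index] = shape;
      plansAreValid_ = false; // invalidate static plans, do not mark tensors dynamic
    }
  }
  void run() {
    if (!plansAreValid_) {
      inferShapesStatically(); // one whole-graph pass, reused by later runs
      planMemoryStatically();  // static buffer offsets, also reused
      plansAreValid_ = true;
    }
    executeOperators();
  }

private:
  void inferShapesStatically() {}
  void planMemoryStatically() {}
  void executeOperators() {}
  std::vector<Shape> inputShapes_;
  bool plansAreValid_ = false;
};
```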
I agree with what you mentioned. I also definitely agree that we always have to enhance ONERT with a better design. Meanwhile, we can also initiate a discussion about a better design, which will take more time in discussion and implementation. So IMHO, we'd better discuss the new-design opinion in a separate issue since it has a different time-frame. How about creating an issue for your idea?
What I see there is a bug (a wrong comparison) that makes it impossible to set the input shape back to the original. Indeed this bug needs to be fixed, but this discussion topic is definitely not about this bug; it is about something more global.
Repeating my opinion from #4718 (comment): using the fact of resizing an input as an explicit signal for making a tensor dynamic and triggering dynamic shape inference is a bad idea. Although I don't see a reason for now why a user may want to turn on dynamic inference explicitly, even if such a reason existed, this switching should be accomplished by an additional flag parameter or a separate API function, rather than by the mere fact of resizing an input. Teaching a user to rely on such a fact may introduce backward-compatibility issues in the future. Right now it is just our implementation-related problem that resizing an input activates the less effective dynamic mode, and this may change in the future.

To avoid involuntarily switching on dynamic mode when it is not necessary (say, the user application gets the batch size from some external source after the model is already prepared), the user will have to introduce an explicit shape comparison in his code, which makes the code more complex. Also, this optimization logic in user code will turn out to be obsolete if we change our input resizing implementation. So I think such an optimization would be more appropriate in ONE code rather than in user code. And so, keeping the static mode when possible is the best choice.
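For illustration, this is the kind of guard user code would need in that case; a minimal sketch assuming the nnfw C API declared in nnfw.h, with error handling omitted (the exact signatures may differ).

```cpp
#include <cstddef>
#include <cstdint>
#include <nnfw.h>

// Only call nnfw_set_input_tensorinfo() when the batch actually changed, so the
// session is not pushed into the (currently slower) dynamic mode needlessly.
// 'session' is assumed to be prepared already; 'last_batch' is kept by user code.
void run_with_batch(nnfw_session *session, int32_t batch, int32_t *last_batch,
                    const float *in, std::size_t in_bytes,
                    float *out, std::size_t out_bytes)
{
  if (batch != *last_batch) {
    nnfw_tensorinfo ti;
    nnfw_input_tensorinfo(session, 0, &ti); // read the current info, then patch the batch dim
    ti.dims[0] = batch;
    nnfw_set_input_tensorinfo(session, 0, &ti);
    *last_batch = batch;
  }
  nnfw_set_input(session, 0, NNFW_TYPE_TENSOR_FLOAT32, in, in_bytes);
  nnfw_set_output(session, 0, NNFW_TYPE_TENSOR_FLOAT32, out, out_bytes);
  nnfw_run(session);
}
```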
In the short term, we have our current version of the API. We have to make this API work. If onert crashes, we have to fix it (#4718). Let's not mix short and mid terms together. A good idea for mid-term work is always welcome, but that does not mean we should let onert crash without fixing the existing bug, or spend a long time fixing it. I would like to propose discussing the short-term fix in #4718 and the longer-term topic here.
@hyunsik-yoon Indeed, fixing the bug is essential; I'll discuss the bugfix in #4718. I'm just against standardizing side effects of API calls at the moment, like using the resize API call to signal something implementation-specific.
% FYI I have not fully followed this up; I continued reading from #4625 (comment).
@krayzemli I respectfully disagree. The static/dynamic tensor stuff is a black box for users. When we first decided to introduce …

As implied by @hyunsik-yoon, "running correctly" is the first thing we care about. So I think he is first taking back the optimization that caused problems. I think eventually it will work great when the runtime gets smart enough.
The problem of PR #3998 is just the comparison in the wrong place. PR #4748 is the way I would fix the problem without removing the optimization.
Now looking at the start of this thread, I see the following:
And that's exactly the standardizing of a side effect: signaling some 'intention' with …
Good point. I agree. However, we also need to inform the user that there is performance degradation after …
@wateret @krayzemli, besides what @glistening suggested, any other suggestions on the current version of the API?
@krayzemli I am still not sure if I got you right. 😅 About "side effect", it looks like we understand this word differently. I see that your way minimizes redundant shape inferences, and it looks nice. But I still think it is just about whether the optimization is applied or not. Say, modern JavaScript engines have tiered JIT compilation, so how the user writes code has a "side effect", and I think our situation is just like that. What I understood is that @hyunsik-yoon is trying to fix the crash first, as applying the optimization (the MNN way?) looks not so trivial. Are you saying that we have a simple fix (#4748) without disabling the optimization, so why would we detour? Or are you insisting that we need a better API? Then could you please elaborate a little?
@wateret I think Rule 1 is bad.
That would be better. I'd say something like: changing the tensor shape before …
@krayzemli Yes, that is what I understood by the "side effect" you mentioned. I prefer this one. @hyunsik-yoon, @wateret Do we have some code or reason that forces us to assume Rule 1?
I would like to clarify again, since there is a lot of discussion here and it is tough to follow.

Current implementation and what problem we have (ONE runtime crashes):
Short-term solution I propose (without API modification):
Mid/long-term solution (API modification could be considered):
Please refer to the first comment in this issue. :-)
@hyunsik-yoon Could you please point out the statement? I read the first comment, but I cannot find the answer.
@glistening Maybe #4625 (comment) would be more helpful.
To communicate with users, I agree. @krayzemli …
@krayzemli I am against introducing this API since, as I said before:
I also agree. Actually, I was not aware that the document says something about static/dynamic tensors. That's definitely my bad as a maintainer of the NNFW API. That may have caused all the misunderstanding between us. So if we remove the dynamic/static tensor stuff from the comment, would everything be alright? And also put in the words @glistening suggested ("there may be a performance penalty by calling …")?
It is not so bad that the API mentions dynamic and static modes, since there is a fundamental difference between them: in the dynamic mode, an output shape can't be determined in advance, i.e. the user can't rely on querying the shape from …

TL;DR: The user must use …
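As a sketch of that difference: in dynamic mode the output shape can only be trusted after the run. The query function name used below (`nnfw_output_tensorinfo`) is an assumption, and error handling is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#include <nnfw.h>

// The shape query happens after nnfw_run(), not before it, because in dynamic
// mode the output shape is only known once the run has finished.
std::vector<float> run_dynamic(nnfw_session *session,
                               const float *in, std::size_t in_bytes,
                               float *scratch, std::size_t scratch_bytes)
{
  nnfw_set_input(session, 0, NNFW_TYPE_TENSOR_FLOAT32, in, in_bytes);
  // With the current API a large-enough user buffer is still needed up front.
  nnfw_set_output(session, 0, NNFW_TYPE_TENSOR_FLOAT32, scratch, scratch_bytes);
  nnfw_run(session);

  nnfw_tensorinfo ti;
  nnfw_output_tensorinfo(session, 0, &ti); // actual shape, valid only after the run
  std::size_t count = 1;
  for (int32_t i = 0; i < ti.rank; ++i) count *= static_cast<std::size_t>(ti.dims[i]);
  return std::vector<float>(scratch, scratch + count);
}
```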
@krayzemli Thank you for the several suggestions. I am afraid the discussion in this issue has drifted too far from its original subject. We have to solve the issue at hand. What about creating new issues for your suggestions?

My suggestion:
Rule 1 - It will be internal.
Rule 2 - It seems okay both for internal developers and for the API reference.
I agree.
I agree (with the current version of the API).
Actually, I think this is WAY BEYOND the scope of the discussion here. But if I were to share my opinion... It is true that the user must guess the buffer size, and that is not so elegant. Here are some reasons I remember:
And the alternative way would be defining an alternative for …
Anyway, we could switch to this or have both, if we need to.
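Purely as a hypothetical sketch of such an alternative (none of these functions exist in the current nnfw API): the runtime would own the output allocation and lend the buffer to the caller after the run.

```cpp
#include <cstddef>

// Hypothetical additions, NOT part of nnfw.h today.
struct nnfw_session; // opaque session type, as in nnfw.h

// Ask the runtime to allocate output 'index' by itself on the next run.
int nnfw_set_output_auto(nnfw_session *session, unsigned int index);

// After nnfw_run(), borrow a pointer to the runtime-owned buffer and its size.
// The pointer would stay valid until the next nnfw_run() or session teardown.
int nnfw_get_output_buffer(nnfw_session *session, unsigned int index,
                           const void **buffer, std::size_t *length);
```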
In my personal opinion, I don't agree, for the reason I have kept saying: I don't think our target users need to know about that stuff (or we would need to make them understand it). So my opinion is to remove everything about static/dynamic tensors from the API documentation.
@wateret I agree. Then we may need to remove the mention in ONE/runtime/onert/api/include/nnfw.h, lines 230 to 247 (at 1a52839).
Then, what about?
@krayzemli Could you let us know your opinion?
Moving the static/dynamic mode details from the API description to a separate technical article would be a good idea. In the future, we may want to introduce control options that influence shape inference and memory management, and this technical article would reflect that. However, for now, any changes in the API description without changes in the API itself will attract user attention in vain. So I would postpone it until some change/extension of the API.
@krayzemli, I would suggest writing a Design Document or RFC for this purpose. It would be a draft of the detailed design for implementation; it could later serve as a technical document as-is, or become a user guide with a little touch. Just organizing the discussions so far in this issue would be a good example. :) Unfortunately we do not yet have an official guide for this. Perhaps through this attempt, we can make a guide together. I have listed some examples from my experience, so please refer to them:
.NET Design Proposals
TensorFlow RFCs
This issue has not been active for a month.
The rule for multiple calling of `nnfw_set_input_tensorinfo()` and `nnfw_run()` is not clear. How about the following rule?

/cc @Samsung/nnfw

Rule 1) `nnfw_set_input_tensorinfo()` will always set its input tensor dynamic (see cases A) and B) in the example below)
Rule 2) Once `nnfw_set_input_tensorinfo(new_shape)` is called, `new_shape` will be used for the successive calls of `nnfw_run()` until another `nnfw_set_input_tensorinfo()` is called. (see cases C) and D))

For example: assuming a model has an input of [1, 128]
FAQ)
Can we run a model with a static input tensor in case A) or D)? (`#0` means `tensor with index 0`)
- `nnfw_prepare()`: #3 and #4 become dynamic and others are static --> state A)
- `nnfw_set_input_tensorinfo(#0, [2,2])` and `nnfw_run()`: all tensors become dynamic
- `nnfw_set_input_tensorinfo(#0, [1])` and `nnfw_run()` again to run with a static input tensor?
- doing so after `nnfw_prepare()`: code will be complex and more memory is required

I will start fixing code for this rule after listening to opinions.