[Relay][VM] JIT #4129
Conversation
python/tvm/relay/op/_transform.py (outdated diff)

```python
@@ -50,11 +51,13 @@
_reg.register_schedule("strided_slice", schedule_injective)
_reg.register_schedule("slice_like", schedule_injective)
_reg.register_schedule("split", schedule_injective)
_reg.register_dynamic_compute("split", True)
```
Can ops automatically register dynamic compute? It seems like this would be a useful feature for every op.
We can make it default to true, because the false case will break immediately; then we can add an annotation.
Personally, I think that's a little cleaner, but we can wait to see what other reviewers think :)
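(For illustration, a minimal sketch of what that opt-out default could look like; this is a hypothetical Python registry, not the PR's actual implementation of `register_dynamic_compute`:)

```python
# Hypothetical sketch of the opt-out default discussed above; the real
# register_dynamic_compute added by this PR lives in the Relay op registry.
_dynamic_compute_registry = {}

def register_dynamic_compute(op_name, supported=True):
    """Record whether op_name's compute can handle Any-shaped inputs."""
    _dynamic_compute_registry[op_name] = supported

# With the default flipped to True, only ops that break under dynamic
# shapes would need an explicit call, e.g.:
register_dynamic_compute("nn.dense", False)
```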
```cpp
auto new_func = Downcast<Function>(InferType(new_func_untyped, Module(), GlobalVar()));
auto key = CCacheKeyNode::make(new_func, target);
CompileEngine ce = CompileEngine::Global(); // WHY cant I use engine_?
auto jit_pf = ce->JIT(key);
```
Just a thought: would it be worthwhile including an option for users to specify a range for each dimension of a shape that can be `Any`, so we can pre-compile all of those? It will use more memory, but save latency at runtime. Alternatively, we can have the user supply a max value for each dimension that can be `Any`, then use that max value everywhere and pad when necessary. Thoughts?
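(A rough numpy illustration of the pad-to-max alternative; treating the batch axis as the only dynamic dimension and `MAX_BATCH` as the user-supplied bound are assumptions for illustration:)

```python
import numpy as np

MAX_BATCH = 32  # assumed user-supplied upper bound for the Any dimension

def run_padded(kernel, x):
    # Pad the dynamic batch axis up to MAX_BATCH so a single kernel,
    # pre-compiled for the static MAX_BATCH shape, serves every batch size.
    n = x.shape[0]
    padded = np.zeros((MAX_BATCH,) + x.shape[1:], dtype=x.dtype)
    padded[:n] = x
    out = kernel(padded)  # kernel expects the padded, static shape
    return out[:n]        # slice back to the true batch size
```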
I'm inclined to leave it as future work. It is indeed possible, but those approaches are more complex and come with different design options (a range? a max value plus padding?). Meanwhile, this PR makes it possible to compute dense, is simple, and allows further extensions that do those things. Sound good to you?
Definitely, I think we can leave it for a later change.
I'm under the impression that for operators that accept/return dynamic-shape tensors, we just need to implement the shape function. Please correct me if I'm wrong. @icemelon9
@wweic Yes, however some schedules do not accept dynamic-shape tensors, such as nn.dense.
How do we pick a decent dense schedule at runtime? My feeling is that the JIT compilation done in this PR can be covered by AOT compilation, as in #4118.
@MarisaKirisame Got it. I think it might be better to fix the schedule for nn.dense.
@MarisaKirisame I don't think tuning a dynamic-shape kernel is related to JIT.
@kevinthesun They both have their own merits. The two approaches can definitely coexist, and we can have some LoweringStrategy option that picks which one to use.
@kevinthesun We can't tune symbolic shapes, but with JIT there is no symbolic shape/Any shape/dynamic shape. All that's left are purely concrete shapes, and we can tune them individually.
But how do we know which shapes will come in? In AOT, we can split a symbolic axis into several buckets and AOT tune/compile a kernel for each bucket. I don't see how we can achieve this in JIT.
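(For concreteness, a sketch of the bucketing idea just described; the bucket boundaries are assumptions for illustration:)

```python
# Illustration of the AOT bucketing approach above: split the symbolic
# batch axis into fixed buckets and pre-compile one kernel per bucket.
BUCKETS = [1, 8, 32, 128]  # assumed bucket boundaries

def pick_bucket(n):
    # Round the runtime batch size up to the nearest pre-compiled bucket;
    # the input is then padded to the bucket size before dispatch.
    for b in BUCKETS:
        if n <= b:
            return b
    raise ValueError("batch size %d exceeds the largest bucket" % n)
```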
@kevinthesun There will be two runs of the same program. Before the second run, you tune the schedules according to the shapes printed to stdout during the first run.
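(A sketch of that two-run workflow; every name here is illustrative, not actual TVM/AutoTVM API:)

```python
# Run 1: each new concrete shape reaching the JIT is printed to stdout.
# Between runs: the user tunes those shapes and fills tuned_schedules.
# Run 2: every shape then hits a tuned schedule.
seen_shapes = set()
tuned_schedules = {}  # shape -> tuned schedule, populated between runs

def jit_compile(shape):
    if shape not in seen_shapes:
        seen_shapes.add(shape)
        print("new JIT shape:", shape)  # the stdout output mentioned above
    return tuned_schedules.get(shape, "fallback-schedule")
```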
@MarisaKirisame The difficulty with dynamic-shape kernels is that we cannot predict which concrete shapes will come in at runtime. In the first run, only a sample of concrete shapes comes in, so only that sample gets tuned. Many different shapes can arrive later, and a lot of tuning would have to be done at runtime. We cannot tune every concrete shape, due to both time and memory complexity. Also, if we want to tune kernels, we should do so before compilation.
@kevinthesun It depends on the workload. For my current workload, only the batch_size is dynamic, and it is either 32 or 16, so the problem is easily solved.
I have some concerns about this PR, as it is a temporary solution and is limited to your current workload. It's definitely possible for a workload to be fully dynamic: batch size, sequence length, the outputs of nms and arange ops, etc.
This method only works in very limited cases, but introduces extra complexity and a dependency into the VM. I'm not quite sure this is the way we want to go.
@icemelon9 @kevinthesun The "current workload" includes all the classical vision models (resnet/densenet/vgg) and all the classical NLP models (treelstm/lstm, as they already use ADTs). That leaves only bert/transformer uncovered, which AFAIK do not exist in tvm yet.
This looks like a very limited use case to me.
@kevinthesun Batch size is universal in all vision tasks. My current workload is just a single data point among all the workloads this PR will enable.
A current failure doesn't mean we cannot support this in the future; we can have a better approach to supporting symbolic shapes. I don't think we should merge a half-baked solution that will be replaced or deprecated soon. Also, we should move the discussion to https://discuss.tvm.ai
It looks to me like your use case is supporting a few fixed batch sizes. That definitely doesn't cover most actual use cases for dynamic shape in practice.
Agreed, we should move this discussion to the forum or an RFC issue, since there are some fundamental issues to be resolved.
Closing due to inactivity for now.
Right now the Relay VM doesn't support Any/symbolic shapes with dense, because the dense schedule/compute requires the input shapes to be known as constants beforehand.
This PR adds polymorphic inline JIT: if a function cannot be lowered at compile time, then for each new shape we encounter at runtime we compile a tvm kernel and invoke it.
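(In rough pseudocode, the runtime mechanism is a per-shape kernel cache; this is illustrative Python, not the PR's C++ implementation, and `compile_kernel` stands in for `CompileEngine::JIT`:)

```python
kernel_cache = {}

def invoke_dynamic(func, args, compile_kernel):
    # A function over Any shapes cannot be lowered ahead of time, so we
    # key compilation on the concrete shapes observed at runtime.
    shapes = tuple(a.shape for a in args)
    if shapes not in kernel_cache:
        kernel_cache[shapes] = compile_kernel(func, shapes)  # runtime JIT
    return kernel_cache[shapes](*args)
```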
@wweic @jroesch @junrushao1994 @icemelon9 @vinx13 please review.