For Q1, there is no guarantee that a serialized engine will work on a different platform, because the optimized graph uses kernels specific to the GPU it was built on. If you are running on different machines with the same GPU architecture and OS, it should work. If you are trying to support a larger collection of GPUs, one suggestion is to save a separate serialized engine for each GPU and serve them as backends with TRT-IS (Inference Server).
Originally posted by @CMagWheels in #70 (comment)
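
As a rough illustration of the "one saved engine per GPU" suggestion, here is a minimal sketch that picks an engine file by GPU architecture and OS. The file-naming scheme and the helper names `engine_filename` and `select_engine` are hypothetical and not part of TensorRT or TRT-IS. The compute capability is assumed to come from whatever CUDA device query you already use.

```python
import platform
from pathlib import Path
from typing import Optional, Tuple


def engine_filename(model: str, compute_capability: Tuple[int, int], os_name: str) -> str:
    """Build a file name that encodes the platform an engine was built on.

    Hypothetical naming scheme: <model>.sm<major><minor>.<os>.engine
    """
    major, minor = compute_capability
    return f"{model}.sm{major}{minor}.{os_name.lower()}.engine"


def select_engine(
    engine_dir: Path,
    model: str,
    compute_capability: Tuple[int, int],
    os_name: Optional[str] = None,
) -> Optional[Path]:
    """Return the engine saved for this GPU architecture and OS, or None.

    None tells the caller to rebuild the engine on this machine instead of
    loading one that was serialized on a different platform.
    """
    os_name = os_name or platform.system()
    candidate = Path(engine_dir) / engine_filename(model, compute_capability, os_name)
    return candidate if candidate.is_file() else None
```

A caller would pass the result to its usual engine-loading code, and fall back to building and saving a new engine when `select_engine` returns `None`.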