I would like to ask: would implementing the Argmax operation as a TensorRT plugin or a custom CUDA kernel make it faster?