There are two places that can cause leaks.

First, every call to `enqueue` calls `cudaMalloc` without releasing the memory: https://github.com/NVIDIA/TensorRT/blob/572d54f91791448c015e74a4f1d6923b77b79795/plugin/instanceNormalizationPlugin/instanceNormalizationPlugin.cpp#L211

Second, `cudnn_handle` is also created again on every call: https://github.com/NVIDIA/TensorRT/blob/572d54f91791448c015e74a4f1d6923b77b79795/plugin/instanceNormalizationPlugin/instanceNormalizationPlugin.cpp#L217
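One possible way to avoid both leaks is to acquire the buffer and the handle once and release them once, so that `enqueue` only reuses them. The sketch below is not the plugin's actual code. `PluginSketch`, `fakeDeviceAlloc`, `fakeDeviceFree`, `fakeHandleCreate` and `fakeHandleDestroy` are hypothetical stand-ins (for the plugin class, `cudaMalloc`/`cudaFree` and the cuDNN handle create/destroy calls) so that it compiles without the CUDA toolkit. It also assumes the scratch size is known before the first `enqueue`, for example sized for the maximum batch size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for cudaMalloc/cudaFree and for creating/destroying
// a cuDNN handle. They only count live resources so leaks become visible.
static int g_liveBuffers = 0;
static int g_liveHandles = 0;

struct FakeHandle {};

void* fakeDeviceAlloc(std::size_t bytes) {
    void* p = std::malloc(bytes);
    if (p) ++g_liveBuffers;
    return p;
}

void fakeDeviceFree(void* p) {
    if (p) {
        --g_liveBuffers;
        std::free(p);
    }
}

FakeHandle* fakeHandleCreate() {
    ++g_liveHandles;
    return new FakeHandle;
}

void fakeHandleDestroy(FakeHandle* h) {
    if (h) {
        --g_liveHandles;
        delete h;
    }
}

// Hypothetical plugin skeleton: resources are acquired in initialize(),
// reused by every enqueue(), and released in terminate().
class PluginSketch {
public:
    explicit PluginSketch(std::size_t scratchBytes) : mScratchBytes(scratchBytes) {}
    ~PluginSketch() { terminate(); }

    // Copying would duplicate ownership of the raw resources, so forbid it.
    PluginSketch(const PluginSketch&) = delete;
    PluginSketch& operator=(const PluginSketch&) = delete;

    int initialize() {
        // Guarded so that calling initialize() twice does not leak either.
        if (!mHandle) mHandle = fakeHandleCreate();
        if (!mScratch) mScratch = fakeDeviceAlloc(mScratchBytes);
        return (mHandle && mScratch) ? 0 : -1;
    }

    void terminate() {
        fakeDeviceFree(mScratch);
        mScratch = nullptr;
        fakeHandleDestroy(mHandle);
        mHandle = nullptr;
    }

    int enqueue() {
        if (!mHandle || !mScratch) return -1;
        // The real kernel launches / cuDNN calls would use mScratch and
        // mHandle here. Nothing is allocated or created on this path.
        return 0;
    }

private:
    std::size_t mScratchBytes;
    void* mScratch = nullptr;
    FakeHandle* mHandle = nullptr;
};
```

With this layout the number of live buffers and handles stays at one no matter how many times `enqueue` runs, and drops back to zero after `terminate`. If the buffer size really is only known inside `enqueue`, an alternative would be to free the buffer at the end of each call, or to reallocate only when the required size grows.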