Disable layer fusion optimizations? #252
Comments
Hi @jasonliu19, sounds like an interesting problem; I don't think that should be expected to happen. Could you provide a simple model and script to reproduce it so we can look into the underlying issue? We don't currently expose a way to turn off layer fusion.

Edit: Also, just for completeness, can you share your environment info from the issue template?
Added my environment above.

```python
import numpy as np
import tensorrt as trt
import torch

SEED = 123
torch.manual_seed(SEED)
np.random.seed(SEED)


def test_output_equality(output_base, output_diff):
    """Test model output equality."""
    for k in range(len(output_base)):
        base = output_base[k]
        diff = output_diff[k]
        assert base.dtype == diff.dtype, "dtypes do not match {} != {}".format(base.dtype, diff.dtype)
        assert base.shape == diff.shape, "shapes do not match {} != {}".format(base.shape, diff.shape)
        total_count = base.numel()
        epsilons = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8]
        print("---Output {}---".format(k))
        for epsilon in epsilons:
            # Count how many values differ from base by more than epsilon
            failed = torch.gt(torch.abs(base - diff), epsilon)
            diff_count = torch.sum(failed).item()
            diff_percent = float(diff_count) / float(total_count) * 100.
            print(
                "  Epsilon {}) base and diff are different for {} values ({:.2f}%)".format(
                    epsilon, diff_count, diff_percent
                )
            )
        print("Max difference:", (base - diff).abs().max())


def add_scale(network, trt_input, shift, scale, power, dtype):
    # Per-channel scale layer with an explicit precision constraint
    scale_layer = network.add_scale(trt_input, trt.ScaleMode.CHANNEL, shift, scale, power)
    scale_layer.precision = dtype
    return scale_layer.get_output(0)


def compare_torch_and_trt_scale():
    scale_a, shift_a, scale_b, shift_b = (
        np.random.randn(64).astype(np.float16) * 3,
        np.random.randn(64).astype(np.float16),
        np.random.randn(64).astype(np.float16) * 3,
        np.random.randn(64).astype(np.float16),
    )

    # TensorRT 6 implicit-batch builder settings
    logger = trt.Logger(trt.Logger.VERBOSE)
    builder = trt.Builder(logger)
    builder.fp16_mode = True
    builder.max_batch_size = 1
    builder.strict_type_constraints = True
    network = builder.create_network()

    torch_input = torch.ones((1, 64, 512, 512), dtype=torch.float16, device="cuda").contiguous()
    trt_output = torch.zeros((1, 64, 512, 512), dtype=torch.float16, device="cuda").contiguous()

    trt_input = network.add_input(name="input_0", shape=tuple(torch_input.shape)[1:], dtype=trt.float16)
    trt_input.location = trt.TensorLocation.DEVICE

    # Two back-to-back scale layers, which TensorRT fuses into one
    scale_out = add_scale(network, trt_input, shift_a, scale_a, trt.Weights(trt.float16), trt.float16)
    scale_out = add_scale(network, scale_out, shift_b, scale_b, trt.Weights(trt.float16), trt.float16)
    scale_out.name = "output_0"
    scale_out.location = trt.TensorLocation.DEVICE
    scale_out.dtype = trt.float16
    network.mark_output(scale_out)

    engine = builder.build_cuda_engine(network)
    bindings = [None] * 2
    bindings[engine.get_binding_index("input_0")] = torch_input.data_ptr()
    bindings[engine.get_binding_index("output_0")] = trt_output.data_ptr()
    context = engine.create_execution_context()
    context.execute(1, bindings)

    print("Pytorch no fusion vs trt")
    scale_a_torch = torch.tensor(scale_a, dtype=torch.float16, device="cuda").reshape(1, -1, 1, 1)
    scale_b_torch = torch.tensor(scale_b, dtype=torch.float16, device="cuda").reshape(1, -1, 1, 1)
    shift_a_torch = torch.tensor(shift_a, dtype=torch.float16, device="cuda").reshape(1, -1, 1, 1)
    shift_b_torch = torch.tensor(shift_b, dtype=torch.float16, device="cuda").reshape(1, -1, 1, 1)
    torch_output = torch_input * scale_a_torch
    torch_output = torch_output + shift_a_torch
    torch_output = torch_output * scale_b_torch
    torch_output = torch_output + shift_b_torch
    test_output_equality([trt_output], [torch_output])

    print("Pytorch fusion in fp16 vs trt")
    scale_fused = scale_a_torch * scale_b_torch
    shift_fused = shift_a_torch * scale_b_torch + shift_b_torch
    torch_output = torch_input * scale_fused + shift_fused
    test_output_equality([trt_output], [torch_output])

    print("Pytorch fusion in fp32 vs trt")
    scale_fused = scale_a_torch.to(torch.float32) * scale_b_torch.to(torch.float32)
    shift_fused = shift_a_torch.to(torch.float32) * scale_b_torch.to(torch.float32) + shift_b_torch.to(torch.float32)
    torch_output = torch_input * scale_fused.to(torch.float16) + shift_fused.to(torch.float16)
    test_output_equality([trt_output], [torch_output])


if __name__ == "__main__":
    compare_torch_and_trt_scale()
```
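For reference, the rounding effect the script measures on GPU can be reproduced on CPU with NumPy alone (a hypothetical minimal sketch, no TensorRT or GPU required): applying the two scale/shift ops step by step in fp16 rounds after every operation, while the fp32 reference rounds only once at the end.

```python
import numpy as np

# Same per-channel parameters as the repro script (seed 123, 64 channels)
np.random.seed(123)
scale_a = (np.random.randn(64) * 3).astype(np.float16)
shift_a = np.random.randn(64).astype(np.float16)
scale_b = (np.random.randn(64) * 3).astype(np.float16)
shift_b = np.random.randn(64).astype(np.float16)

x = np.ones(64, dtype=np.float16)

# Unfused path in fp16: each intermediate result is rounded to fp16
y_fp16 = (x * scale_a + shift_a) * scale_b + shift_b

# Reference path: same math in fp32, rounded to fp16 only once at the end
x32 = x.astype(np.float32)
y_ref = ((x32 * scale_a.astype(np.float32) + shift_a.astype(np.float32))
         * scale_b.astype(np.float32) + shift_b.astype(np.float32)).astype(np.float16)

abs_diff = np.abs(y_fp16.astype(np.float32) - y_ref.astype(np.float32))
for eps in (1e-1, 1e-2, 1e-3):
    n = int((abs_diff > eps).sum())
    print("eps {}: {} / 64 values differ".format(eps, n))
print("max difference:", abs_diff.max())
```

The per-epsilon counts mirror what `test_output_equality` prints; the absolute differences are on the order of fp16 ulps of the intermediate values.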
Repro'd.
Also repro'd with TensorRT 7.0 + PyTorch 1.3.1. Might be related to #305.
Hi @jasonliu19, sorry this is super late. Two things:

1. Disabling layer fusion: it turns out there is a workaround to disable layer fusion for debugging purposes. TensorRT will not fuse a layer whose output is marked as a network output, so marking the intermediate outputs you want to inspect effectively disables fusion around those layers (at the cost of extra host latency).

2. PyTorch vs TensorRT fusion output differences: TensorRT computes the new fused scales in fp32 precision, which is exactly what your third comparison (PyTorch fusion in fp32 vs TRT) does, and that is why those results match.
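The precision effect in point 2 can be checked in isolation with plain NumPy (a hypothetical sketch, not TensorRT's actual code). Fusing two `y = x * scale + shift` ops gives `scale_fused = scale_a * scale_b` and `shift_fused = shift_a * scale_b + shift_b`; computing that fused shift entirely in fp16 rounds the intermediate product before the add, while computing it in fp32 and casting once at the end rounds only once:

```python
import numpy as np

# Same per-channel parameters as the repro script (seed 123, 64 channels)
np.random.seed(123)
scale_a = (np.random.randn(64) * 3).astype(np.float16)
shift_a = np.random.randn(64).astype(np.float16)
scale_b = (np.random.randn(64) * 3).astype(np.float16)
shift_b = np.random.randn(64).astype(np.float16)

# Fused shift computed in fp16: shift_a * scale_b is rounded to fp16
# before the add, so the result is rounded twice
shift_fp16 = shift_a * scale_b + shift_b

# Fused shift computed in fp32, cast to fp16 once at the end
shift_fp32 = (shift_a.astype(np.float32) * scale_b.astype(np.float32)
              + shift_b.astype(np.float32)).astype(np.float16)

diff = np.abs(shift_fp16.astype(np.float32) - shift_fp32.astype(np.float32))
print("channels where fp16 fusion differs:", int((diff > 0).sum()), "of 64")
print("max |difference|:", diff.max())
```

The per-channel discrepancy in the fused shift then gets broadcast over every spatial location of the 512x512 output, which is why the mismatch counts in the fp16 comparison are so large.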
Hi @rmccorm4, is there any way to prevent the host latency from increasing when using this workaround?
Is there a way to disable layer fusion when building an engine? I'm facing some correctness problems when scale layers are fused together in fp16. Disabling fusion to help debug this issue would be useful.
Environment
TensorRT Version: 6.0.1.5
GPU Type: GTX 1080ti
Nvidia Driver Version: 418.87.00
CUDA Version: 10.1
CUDNN Version: 7.6.3
Operating System + Version: Ubuntu 18.04.3
Python Version (if applicable): 3.6.8
TensorFlow Version (if applicable): n/a
PyTorch Version (if applicable): 1.3.0
Baremetal or Container (if container which image + tag): baremetal