Unrecognized AIO_DEBUG_MODE= 5 using default level = WARNING
Version: v0.8.0
Built with: clang++(Ubuntu Clang 14.0.0) git-90df81a2a,Kuba Wolynko,2023-08-07T16:35:09+02:00 built 20230809_111727 by on 96f65684ca4a
Internal environment variable DLS_DEBUG_SAVE_FAULTY_DATA is not prefixed with AIO_.
Internal environment variable DLS_DEBUG_PRINT_ON_SAME_KERNEL is not prefixed with AIO_.
AIO_DATA_DIR is /usr/local/share//libampere-aio
Available cores: 0, 1
AIO_NUM_THREADS read (but not applied yet) as 16
Couldn't read cpu governor
Numa balancing is off - OK
Requested 16 but only 2 are available. Num threads limited to 2
Binding thread 0 to 0
Binding thread 1 to 1
CPU bind done
Attempt to register kernel AvgPoolingMeta@NEON with priority clashes (priority-wise) with the following kernels: AvgPoolingMeta@NEON AvgPoolingMeta@NEON
Attempt to register kernel MaxPoolingMeta@NEON with priority clashes (priority-wise) with the following kernels: MaxPoolingMeta@NEON MaxPoolingMeta@NEON
Attempt to register kernel TransposeBERTVectorized@NEON with priority clashes (priority-wise) with the following kernels: TransposeBERTVectorized@NEON TransposeBERTVectorized@NEON
Attempt to register kernel TorchSliceVectorized@NEON with priority clashes (priority-wise) with the following kernels: TorchSliceVectorized@NEON TorchSliceVectorized@NEON
Attempt to register kernel TorchSliceVectorized@NEON with priority clashes (priority-wise) with the following kernels: TorchSliceVectorized@NEON TorchSliceVectorized@NEON TorchSliceVectorized@NEON
Attempt to register kernel TorchSliceVectorized@NEON with priority clashes (priority-wise) with the following kernels: TorchSliceVectorized@NEON TorchSliceVectorized@NEON TorchSliceVectorized@NEON TorchSliceVectorized@NEON
Registered Variables:
AIO_ALLOW_UNSAFE_DEPTHWISE = "0" is using default value
AIO_JIT_PROFILING = "0" is using default value
AIO_MICROKERNEL_MATMUL_FORCE = "0" is using default value
AIO_MICROKERNEL_DOTPROD_FORCE = "0" is using default value
AIO_DEBUG_LAYER_MERGING = "0" is using default value
AIO_DATA_CHECK_IMMUTABLE = "0" is using default value
AIO_LAYERS_TO_DEBUG is not set and has no default value
AIO_IMPLICIT_FP16_TRANSFORM_FILTER = "" is using default value
DLS_DEBUG_SAVE_FAULTY_DATA is not set and has no default value
AIO_DEBUG_LAYER_MAX_ERROR_FLOAT = "1e-5" is using default value
AIO_DEBUG_LAYER_MEAN_ERROR_FP16 = "1e-5" is using default value
AIO_DEBUG_LAYER_MEAN_ERROR_INT8 = "1" is using default value
AIO_DEBUG_LAYER_MEAN_ERROR is not set and has no default value
AIO_DEBUG_LAYER_MAX_ERROR is not set and has no default value
AIO_CVJM_USE_MAGIC = "1" is using default value
DLS_DEBUG_PRINT_ON_SAME_KERNEL = "0" is using default value
AIO_CPU_BIND = "1" is using default value
AIO_PROFILER_TIME_SCALE = "1e3" is using default value
AIO_LEGACY_TF = "0" is using default value
AIO_PROCESS_MODE = "1" (default = "1" )
AIO_REMOVE_PASSTHRU = "1" is using default value
AIO_PROFILER_SORT_MODE = "0" is using default value
AIO_DEBUGGER_LAYER_ID is not set and has no default value
AIO_GRAPH_FILE = "dls_graph" is using default value
AIO_PROFILER_SKIP_FIRST = "1" is using default value
AIO_DEBUG_LAYER_MAX_ERROR_INT8 = "1" is using default value
AIO_TRACING is not set and has no default value
AIO_SUPERNODE = "0" is using default value
AIO_PROFILER_LAYERS_TO_SKIP = "Data [merged]" is using default value
AIO_DEBUG_STRING_PRECISION = "3" is using default value
AIO_RECYCLE_BUFFERS = "1" is using default value
AIO_DEBUGGER = "0" is using default value
AIO_FORCE_MODE = "0" is using default value
AIO_MEM_BIND = "1" is using default value
AIO_PROFILER_OUTPUT_MODE = "NL" is using default value
AIO_CPU_LEVEL is not set and has no default value
AIO_NUMA_CPUS = "ALL" is using default value
AIO_KERNEL_PREFERLIST = "" is using default value
AIO_PROFILER_FLOAT_PRECISION = "6" is using default value
AIO_SOFT_FP16 is not set and has no default value
AIO_LIST_ENV_VARIABLES = "0" is using default value
AIO_PROFILER_MAX_NAME_LEN = "60" is using default value
AIO_ABORT_ON_ERROR = "0" is using default value
AIO_PREFER_FLOAT_QUANTIZATION = "1" is using default value
AIO_FORCE_GENERIC_MICROKERNEL = "0" is using default value
AIO_EXPORT_GRAPH = "0" is using default value
AIO_PROFILER_CONFIDENCE = "0.9" is using default value
AIO_DEBUG_FILE = "" is using default value
AIO_PROFILER_CSV_FILE = "cout" is using default value
AIO_TOPOLOGY_DEBUG = "0" is using default value
AIO_PROFILER_OUT_FILE = "cout" is using default value
AIO_SANITIZE_OUTPUT = "0" is using default value
AIO_CONV_ONE_JIT_USE_MAGIC = "1" is using default value
AIO_NUM_THREADS = "16" has no default
AIO_DEBUG_STRING_WIDTH = "-1" is using default value
AIO_TRACER_STRING_POOL = "1000000" is using default value
AIO_KERNEL_BLACKLIST = "" is using default value
AIO_SHOULD_USE_NUMA = "0" is using default value
AIO_SPLIT_BATCH = "0" is using default value
AIO_USE_NAIVE_BINOP_ALG = "1" is using default value
AIO_NEON_CONV_ONE_D = "256" is using default value
AIO_NO_LAYER_MERGING = "0" is using default value
AIO_DEBUG_LAYER_MAX_ERROR_FP16 = "1e-4" is using default value
AIO_USE_SIMPLE_TRANSFORM = "1" is using default value
AIO_USE_DETRANSPOSER_TRANSFORM = "1" is using default value
AIO_PROFILER_CSV_MODE = "0" is using default value
AIO_SAVE_MODEL = "0" is using default value
AIO_SKIP_MASTER_THREAD = "0" is using default value
AIO_UKERNEL_QADD_ROUND_INPUT = "1" is using default value
AIO_MERGE_PAD_TO_CONV = "1" is using default value
AIO_DEBUG_LAYER_MEAN_ERROR_FLOAT = "1e-6" is using default value
AIO_PROFILER = "0" is using default value
AIO_NEON_CONV_ONE_N = "200" is using default value
AIO_REPORT_CONV_TASK is not set and has no default value
AIO_CVJM_USE_LOOKUP = "1" is using default value
AIO_DEBUG_MODE = "5" (default = "WARN" )
AIO_LIST_UNREGISTERED_ENV_VARIABLES = "1" is using default value
XDG_DATA_DIRS = "/usr/local/share/:/usr/share/" is using default value
AIO_CVJM_SPARSE_THRESHOLD = "0.05" is using default value
AIO_NUMA_NODES = "LOCAL" is using default value
AIO_NEON_CONV_ONE_F = "32" is using default value
AIO_CONV_ONE_JIT_USE_LOOKUP = "1" is using default value
Unknown AIO variable: AIO_LIB_ROOT = "/aio"
DLS STARTED 14-08-2023 16:49:46
AIO_PROCESS_MODE: 1
AIO_FORCE_MODE: 0
AIO_NUM_THREADS: 2
CPU_BIND: 1
MEM_BIND: 1
AIO_SPLIT_BATCH: 0
AIO_NO_LAYER_MERGING 0
AIO_LEGACY_TF 0
AIO_SUPERNODE 0
AIO_USE_SIMPLE_TRANSFORM 1
AIO_USE_DETRANSPOSER_TRANSFORM 1
AIO_GRAPH_FILE dls_graph
DLS_DEBUG (threshold): 0
AIO_DEBUG_FILE:
AIO_PROFILER: 0
Unrecognized AIO_DEBUG_MODE= 5 using default level = WARNING
Graph before optimizations
graph(%self.1 : __torch__.torch.fx.graph_module.___torch_mangle_247.GraphModule, %x : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu)): %self.self_layer4_2_conv3.weight_fused_bn : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_2_conv2.weight_fused_bn : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_2_conv1.weight_fused_bn :
Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_1_conv3.weight_fused_bn : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_1_conv2.weight_fused_bn : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_1_conv1.weight_fused_bn : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_0_downsample_0.weight_fused_bn : Float(2048, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.91 : Float(2048, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_0_conv3.weight_fused_bn : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_0_conv2.weight_fused_bn : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_0_conv1.weight_fused_bn : Float(512, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_5_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_5_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_5_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_4_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_4_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_4_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_3_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_3_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_3_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_2_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_2_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_2_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_1_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_1_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_1_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_0_downsample_0.weight_fused_bn : Float(1024, 512, 1, 1, strides=[512, 1, 1, 1], 
requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.53 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_0_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_0_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_0_conv1.weight_fused_bn : Float(256, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_3_conv3.weight_fused_bn : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_3_conv2.weight_fused_bn : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_3_conv1.weight_fused_bn : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_2_conv3.weight_fused_bn : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_2_conv2.weight_fused_bn : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_2_conv1.weight_fused_bn : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_1_conv3.weight_fused_bn : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_1_conv2.weight_fused_bn : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_1_conv1.weight_fused_bn : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_0_downsample_0.weight_fused_bn : Float(512, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_0_conv3.weight_fused_bn : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_0_conv2.weight_fused_bn : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_0_conv1.weight_fused_bn : Float(128, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_2_conv3.weight_fused_bn : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_2_conv2.weight_fused_bn : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_2_conv1.weight_fused_bn : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_1_conv3.weight_fused_bn : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_1_conv2.weight_fused_bn : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_1_conv1.weight_fused_bn : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = 
prim::Constant[value=]() %self.self_layer1_0_downsample_0.weight_fused_bn : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_0_conv3.weight_fused_bn : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_0_conv2.weight_fused_bn : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_0_conv1.weight_fused_bn : Float(64, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.1 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_conv1.weight_fused_bn : Float(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_fc.bias : Float(1000, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_fc.weight : Float(1000, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %9 : bool = prim::Constant[value=1](), scope: __module.self_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %8 : bool = prim::Constant[value=0](), scope: __module.self_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %7 : int = prim::Constant[value=1](), scope: __module.self_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %6 : int[] = prim::Constant[value=[2, 2]]() %5 : int[] = prim::Constant[value=[3, 3]]() %4 : int[] = prim::Constant[value=[1, 1]]() %3 : int[] = prim::Constant[value=[0, 0]]() %2 : int = prim::Constant[value=-1]() # .1:178:0 %input.1 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = aten::_convolution(%x, %self.self_conv1.weight_fused_bn, %328_fused_bn.1, %6, %5, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.5 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = aten::relu_(%input.1), scope: __module.self_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.7 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::max_pool2d(%input.5, %5, %6, %4, %4, %8), scope: __module.self_maxpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:782:0 %input.9 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.7, %self.self_layer1_0_conv1.weight_fused_bn, %328_fused_bn.1, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.13 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.9), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.15 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.13, %self.self_layer1_0_conv2.weight_fused_bn, %328_fused_bn.1, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.19 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, 
device=cpu) = aten::relu_(%input.15), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.21 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.19, %self.self_layer1_0_conv3.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.23 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.7, %self.self_layer1_0_downsample_0.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.25 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.21, %input.23, %7) # .1:19:0 %input.27 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.25), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.29 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.27, %self.self_layer1_1_conv1.weight_fused_bn, %328_fused_bn.1, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.33 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.29), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.35 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.33, %self.self_layer1_1_conv2.weight_fused_bn, %328_fused_bn.1, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.39 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.35), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.41 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.39, %self.self_layer1_1_conv3.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.43 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.41, %input.27, %7) # .1:29:0 %input.45 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.43), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.47 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.45, %self.self_layer1_2_conv1.weight_fused_bn, %328_fused_bn.1, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.51 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.47), scope: __module.self_layer1_2_relu # 
/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.53 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.51, %self.self_layer1_2_conv2.weight_fused_bn, %328_fused_bn.1, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.57 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.53), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.59 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.57, %self.self_layer1_2_conv3.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.61 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.59, %input.45, %7) # .1:39:0 %input.63 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.61), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.65 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.63, %self.self_layer2_0_conv1.weight_fused_bn, %328_fused_bn.23, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.69 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.65), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.71 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.69, %self.self_layer2_0_conv2.weight_fused_bn, %328_fused_bn.23, %6, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.75 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.71), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.77 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.75, %self.self_layer2_0_conv3.weight_fused_bn, %328_fused_bn.27, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.79 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.63, %self.self_layer2_0_downsample_0.weight_fused_bn, %328_fused_bn.27, %6, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.81 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.77, %input.79, %7) # .1:51:0 %input.83 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.81), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.85 : Float(1, 128, 28, 28, 
strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.83, %self.self_layer2_1_conv1.weight_fused_bn, %328_fused_bn.23, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.89 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.85), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.91 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.89, %self.self_layer2_1_conv2.weight_fused_bn, %328_fused_bn.23, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.95 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.91), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.97 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.95, %self.self_layer2_1_conv3.weight_fused_bn, %328_fused_bn.27, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.99 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.97, %input.83, %7) # .1:61:0 %input.101 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.99), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.103 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.101, %self.self_layer2_2_conv1.weight_fused_bn, %328_fused_bn.23, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.107 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.103), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.109 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.107, %self.self_layer2_2_conv2.weight_fused_bn, %328_fused_bn.23, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.113 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.109), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.115 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.113, %self.self_layer2_2_conv3.weight_fused_bn, %328_fused_bn.27, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.117 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.115, %input.101, %7) # .1:71:0 %input.119 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.117), scope: 
__module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.121 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.119, %self.self_layer2_3_conv1.weight_fused_bn, %328_fused_bn.23, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.125 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.121), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.127 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.125, %self.self_layer2_3_conv2.weight_fused_bn, %328_fused_bn.23, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.131 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.127), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.133 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.131, %self.self_layer2_3_conv3.weight_fused_bn, %328_fused_bn.27, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.135 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.133, %input.119, %7) # .1:81:0 %input.137 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.135), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.139 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.137, %self.self_layer3_0_conv1.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.143 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.139), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.145 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.143, %self.self_layer3_0_conv2.weight_fused_bn, %328_fused_bn.7, %6, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.149 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.145), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.151 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.149, %self.self_layer3_0_conv3.weight_fused_bn, %328_fused_bn.53, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.153 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.137, 
%self.self_layer3_0_downsample_0.weight_fused_bn, %328_fused_bn.53, %6, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.155 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.151, %input.153, %7) # .1:93:0 %input.157 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.155), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.159 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.157, %self.self_layer3_1_conv1.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.163 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.159), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.165 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.163, %self.self_layer3_1_conv2.weight_fused_bn, %328_fused_bn.7, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.169 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.165), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.171 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.169, %self.self_layer3_1_conv3.weight_fused_bn, %328_fused_bn.53, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.173 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.171, %input.157, %7) # .1:103:0 %input.175 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.173), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.177 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.175, %self.self_layer3_2_conv1.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.181 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.177), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.183 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.181, %self.self_layer3_2_conv2.weight_fused_bn, %328_fused_bn.7, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.187 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.183), scope: __module.self_layer3_2_relu # 
/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.189 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.187, %self.self_layer3_2_conv3.weight_fused_bn, %328_fused_bn.53, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.191 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.189, %input.175, %7) # .1:113:0 %input.193 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.191), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.195 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.193, %self.self_layer3_3_conv1.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.199 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.195), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.201 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.199, %self.self_layer3_3_conv2.weight_fused_bn, %328_fused_bn.7, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.205 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.201), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.207 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.205, %self.self_layer3_3_conv3.weight_fused_bn, %328_fused_bn.53, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.209 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.207, %input.193, %7) # .1:123:0 %input.211 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.209), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.213 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.211, %self.self_layer3_4_conv1.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_4_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.217 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.213), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.219 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.217, %self.self_layer3_4_conv2.weight_fused_bn, %328_fused_bn.7, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_4_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.223 : Float(1, 256, 
14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.219), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.225 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.223, %self.self_layer3_4_conv3.weight_fused_bn, %328_fused_bn.53, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_4_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.227 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.225, %input.211, %7) # .1:133:0 %input.229 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.227), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.231 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.229, %self.self_layer3_5_conv1.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_5_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.235 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.231), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.237 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.235, %self.self_layer3_5_conv2.weight_fused_bn, %328_fused_bn.7, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_5_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.241 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.237), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.243 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.241, %self.self_layer3_5_conv3.weight_fused_bn, %328_fused_bn.53, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_5_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.245 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.243, %input.229, %7) # .1:143:0 %input.247 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.245), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.249 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.247, %self.self_layer4_0_conv1.weight_fused_bn, %328_fused_bn.27, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.253 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.249), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.255 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.253, %self.self_layer4_0_conv2.weight_fused_bn, %328_fused_bn.27, %6, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: 
__module.self_layer4_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.259 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.255), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.261 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.259, %self.self_layer4_0_conv3.weight_fused_bn, %328_fused_bn.91, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.263 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.247, %self.self_layer4_0_downsample_0.weight_fused_bn, %328_fused_bn.91, %6, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.265 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.261, %input.263, %7) # .1:155:0 %input.267 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.265), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.269 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.267, %self.self_layer4_1_conv1.weight_fused_bn, %328_fused_bn.27, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.273 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.269), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.275 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.273, %self.self_layer4_1_conv2.weight_fused_bn, %328_fused_bn.27, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.279 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.275), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.281 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.279, %self.self_layer4_1_conv3.weight_fused_bn, %328_fused_bn.91, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.283 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.281, %input.267, %7) # .1:165:0 %input.285 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.283), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.287 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.285, %self.self_layer4_2_conv1.weight_fused_bn, %328_fused_bn.27, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.291 : Float(1, 512, 7, 
7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.287), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.293 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.291, %self.self_layer4_2_conv2.weight_fused_bn, %328_fused_bn.27, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.297 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.293), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.299 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.297, %self.self_layer4_2_conv3.weight_fused_bn, %328_fused_bn.91, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.301 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.299, %input.285, %7) # .1:175:0 %input.303 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.301), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %self_avgpool.1 : Float(1, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = aten::adaptive_avg_pool2d(%input.303, %4), scope: __module.self_avgpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1214:0 %input : Float(1, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = aten::flatten(%self_avgpool.1, %7, %2) # .1:178:0 %438 : Float(1, 1000, strides=[1000, 1], requires_grad=0, device=cpu) = aten::linear(%input, %self.self_fc.weight, %self.self_fc.bias), scope: __module.self_fc # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py:114:0 %440 : (Tensor) = prim::TupleConstruct(%438) return (%440) Graph after fusion pass graph(%self.1 : __torch__.torch.fx.graph_module.___torch_mangle_247.GraphModule, %x : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu)): %input.2 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_0(%x) %input.6 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_1(%input.2) %input.10 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_2(%input.6) %input.14 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_3(%input.10) %input.18 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_4(%input.14) %input.22 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_5(%input.18) %input.26 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_6(%input.22) %input.30 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_7(%input.26) %input.34 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_8(%input.10) %input.38 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, 
device=cpu) = prim::AIOFusionGroup_9(%input.30, %input.34) %input.42 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_10(%input.38) %input.46 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_11(%input.42) %input.50 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_12(%input.46) %input.54 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_13(%input.50) %input.58 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_14(%input.54) %input.62 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_15(%input.58) %input.66 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_16(%input.62, %input.42) %input.70 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_17(%input.66) %input.74 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_18(%input.70) %input.78 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_19(%input.74) %input.82 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_20(%input.78) %input.86 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_21(%input.82) %input.90 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_22(%input.86) %input.94 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_23(%input.90, %input.70) %input.98 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_24(%input.94) %input.102 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_25(%input.98) %input.106 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_26(%input.102) %input.110 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_27(%input.106) %input.114 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_28(%input.110) %input.118 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_29(%input.114) %input.122 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_30(%input.98) %input.126 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_31(%input.118, %input.122) %input.130 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_32(%input.126) %input.134 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_33(%input.130) %input.138 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_34(%input.134) %input.142 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_35(%input.138) %input.146 : Float(1, 128, 28, 28, strides=[100352, 
784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_36(%input.142) %input.150 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_37(%input.146) %input.154 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_38(%input.150, %input.130) %input.158 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_39(%input.154) %input.162 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_40(%input.158) %input.166 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_41(%input.162) %input.170 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_42(%input.166) %input.174 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_43(%input.170) %input.178 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_44(%input.174) %input.182 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_45(%input.178, %input.158) %input.186 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_46(%input.182) %input.190 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_47(%input.186) %input.194 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_48(%input.190) %input.198 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_49(%input.194) %input.202 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_50(%input.198) %input.206 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_51(%input.202) %input.210 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_52(%input.206, %input.186) %input.214 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_53(%input.210) %input.218 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_54(%input.214) %input.222 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_55(%input.218) %input.226 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_56(%input.222) %input.230 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_57(%input.226) %input.234 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_58(%input.230) %input.238 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_59(%input.214) %input.242 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_60(%input.234, %input.238) %input.246 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_61(%input.242) %input.250 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = 
prim::AIOFusionGroup_62(%input.246) %input.254 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_63(%input.250) %input.258 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_64(%input.254) %input.262 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_65(%input.258) %input.266 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_66(%input.262) %input.270 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_67(%input.266, %input.246) %input.274 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_68(%input.270) %input.278 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_69(%input.274) %input.282 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_70(%input.278) %input.286 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_71(%input.282) %input.290 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_72(%input.286) %input.294 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_73(%input.290) %input.298 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_74(%input.294, %input.274) %input.302 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_75(%input.298) %input.306 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_76(%input.302) %input.310 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_77(%input.306) %input.314 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_78(%input.310) %input.318 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_79(%input.314) %input.322 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_80(%input.318) %input.326 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_81(%input.322, %input.302) %input.330 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_82(%input.326) %input.334 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_83(%input.330) %input.338 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_84(%input.334) %input.342 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_85(%input.338) %input.346 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_86(%input.342) %input.350 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_87(%input.346) %input.354 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_88(%input.350, %input.330) %input.358 : Float(1, 1024, 14, 14, 
strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_89(%input.354) %input.362 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_90(%input.358) %input.366 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_91(%input.362) %input.370 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_92(%input.366) %input.374 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_93(%input.370) %input.378 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_94(%input.374) %input.382 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_95(%input.378, %input.358) %input.386 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_96(%input.382) %input.390 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_97(%input.386) %input.394 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_98(%input.390) %input.398 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_99(%input.394) %input.402 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_100(%input.398) %input.406 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_101(%input.402) %input.410 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_102(%input.386) %input.414 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_103(%input.406, %input.410) %input.418 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_104(%input.414) %input.422 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_105(%input.418) %input.426 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_106(%input.422) %input.430 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_107(%input.426) %input.434 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_108(%input.430) %input.438 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_109(%input.434) %input.442 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_110(%input.438, %input.418) %input.446 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_111(%input.442) %input.450 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_112(%input.446) %input.454 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_113(%input.450) %input.458 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_114(%input.454) %input.462 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_115(%input.458) %input.466 : Float(1, 2048, 7, 7, 
strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_116(%input.462) %input.470 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_117(%input.466, %input.446) %input.474 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_118(%input.470) %self_avgpool.2 : Float(1, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_119(%input.474) %input.478 : Float(1, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_120(%self_avgpool.2) %684 : Float(1, 1000, strides=[1000, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_121(%input.478) %440 : (Tensor) = prim::TupleConstruct(%684) return (%440) with prim::AIOFusionGroup_0 = graph(%x : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu)): %self.self_conv1.weight_fused_bn : Float(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.1 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[2, 2]]() %4 : int[] = prim::Constant[value=[3, 3]]() %5 : int[] = prim::Constant[value=[1, 1]]() %6 : bool = prim::Constant[value=0]() %7 : int[] = prim::Constant[value=[0, 0]]() %8 : int = prim::Constant[value=1]() %9 : bool = prim::Constant[value=1]() %input.2 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = aten::_convolution(%x, %self.self_conv1.weight_fused_bn, %328_fused_bn.1, %3, %4, %5, %6, %7, %8, %6, %6, %9, %9), scope: __module.self_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.2) with prim::AIOFusionGroup_1 = graph(%input.2 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu)): %input.6 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = aten::relu_(%input.2), scope: __module.self_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.6) with prim::AIOFusionGroup_2 = graph(%input.6 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu)): %1 : int[] = prim::Constant[value=[3, 3]]() %2 : int[] = prim::Constant[value=[2, 2]]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : bool = prim::Constant[value=0]() %input.10 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::max_pool2d(%input.6, %1, %2, %3, %3, %4), scope: __module.self_maxpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:782:0 return (%input.10) with prim::AIOFusionGroup_3 = graph(%input.10 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)): %self.self_layer1_0_conv1.weight_fused_bn : Float(64, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.1 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.14 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.10, %self.self_layer1_0_conv1.weight_fused_bn, %328_fused_bn.1, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer1_0_conv1 # 
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.14) with prim::AIOFusionGroup_4 = graph(%input.14 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)): %input.18 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.14), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.18) with prim::AIOFusionGroup_5 = graph(%input.18 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)): %self.self_layer1_0_conv2.weight_fused_bn : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.1 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : bool = prim::Constant[value=0]() %5 : int[] = prim::Constant[value=[0, 0]]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.22 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.18, %self.self_layer1_0_conv2.weight_fused_bn, %328_fused_bn.1, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer1_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.22) with prim::AIOFusionGroup_6 = graph(%input.22 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)): %input.26 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.22), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.26) with prim::AIOFusionGroup_7 = graph(%input.26 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)): %self.self_layer1_0_conv3.weight_fused_bn : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.30 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.26, %self.self_layer1_0_conv3.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer1_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.30) with prim::AIOFusionGroup_8 = graph(%input.10 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)): %self.self_layer1_0_downsample_0.weight_fused_bn : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.34 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.10, %self.self_layer1_0_downsample_0.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: 
__module.self_layer1_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.34) with prim::AIOFusionGroup_9 = graph(%input.30 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu), %input.34 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)): %2 : int = prim::Constant[value=1]() %input.38 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.30, %input.34, %2) # .1:19:0 return (%input.38) with prim::AIOFusionGroup_10 = graph(%input.38 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)): %input.42 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.38), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.42) with prim::AIOFusionGroup_11 = graph(%input.42 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)): %self.self_layer1_1_conv1.weight_fused_bn : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.1 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.46 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.42, %self.self_layer1_1_conv1.weight_fused_bn, %328_fused_bn.1, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer1_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.46) with prim::AIOFusionGroup_12 = graph(%input.46 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)): %input.50 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.46), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.50) with prim::AIOFusionGroup_13 = graph(%input.50 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)): %self.self_layer1_1_conv2.weight_fused_bn : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.1 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : bool = prim::Constant[value=0]() %5 : int[] = prim::Constant[value=[0, 0]]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.54 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.50, %self.self_layer1_1_conv2.weight_fused_bn, %328_fused_bn.1, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer1_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.54) with prim::AIOFusionGroup_14 = graph(%input.54 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)): %input.58 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.54), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.58) with 
prim::AIOFusionGroup_15 = graph(%input.58 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)): %self.self_layer1_1_conv3.weight_fused_bn : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.62 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.58, %self.self_layer1_1_conv3.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer1_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.62) with prim::AIOFusionGroup_16 = graph(%input.62 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu), %input.42 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)): %2 : int = prim::Constant[value=1]() %input.66 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.62, %input.42, %2) # .1:29:0 return (%input.66) with prim::AIOFusionGroup_17 = graph(%input.66 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)): %input.70 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.66), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.70) with prim::AIOFusionGroup_18 = graph(%input.70 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)): %self.self_layer1_2_conv1.weight_fused_bn : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.1 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.74 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.70, %self.self_layer1_2_conv1.weight_fused_bn, %328_fused_bn.1, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer1_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.74) with prim::AIOFusionGroup_19 = graph(%input.74 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)): %input.78 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.74), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.78) with prim::AIOFusionGroup_20 = graph(%input.78 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)): %self.self_layer1_2_conv2.weight_fused_bn : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.1 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : bool = prim::Constant[value=0]() %5 : int[] = prim::Constant[value=[0, 0]]() %6 : int = 
prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.82 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.78, %self.self_layer1_2_conv2.weight_fused_bn, %328_fused_bn.1, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer1_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.82) with prim::AIOFusionGroup_21 = graph(%input.82 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)): %input.86 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.82), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.86) with prim::AIOFusionGroup_22 = graph(%input.86 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)): %self.self_layer1_2_conv3.weight_fused_bn : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.90 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.86, %self.self_layer1_2_conv3.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer1_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.90) with prim::AIOFusionGroup_23 = graph(%input.90 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu), %input.70 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)): %2 : int = prim::Constant[value=1]() %input.94 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.90, %input.70, %2) # .1:39:0 return (%input.94) with prim::AIOFusionGroup_24 = graph(%input.94 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)): %input.98 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.94), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.98) with prim::AIOFusionGroup_25 = graph(%input.98 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)): %self.self_layer2_0_conv1.weight_fused_bn : Float(128, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.102 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.98, %self.self_layer2_0_conv1.weight_fused_bn, %328_fused_bn.23, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer2_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.102) with prim::AIOFusionGroup_26 = graph(%input.102 : Float(1, 128, 56, 56, 
strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu)): %input.106 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.102), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.106) with prim::AIOFusionGroup_27 = graph(%input.106 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu)): %self.self_layer2_0_conv2.weight_fused_bn : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[2, 2]]() %4 : int[] = prim::Constant[value=[1, 1]]() %5 : bool = prim::Constant[value=0]() %6 : int[] = prim::Constant[value=[0, 0]]() %7 : int = prim::Constant[value=1]() %8 : bool = prim::Constant[value=1]() %input.110 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.106, %self.self_layer2_0_conv2.weight_fused_bn, %328_fused_bn.23, %3, %4, %4, %5, %6, %7, %5, %5, %8, %8), scope: __module.self_layer2_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.110) with prim::AIOFusionGroup_28 = graph(%input.110 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)): %input.114 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.110), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.114) with prim::AIOFusionGroup_29 = graph(%input.114 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)): %self.self_layer2_0_conv3.weight_fused_bn : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.118 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.114, %self.self_layer2_0_conv3.weight_fused_bn, %328_fused_bn.27, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer2_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.118) with prim::AIOFusionGroup_30 = graph(%input.98 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)): %self.self_layer2_0_downsample_0.weight_fused_bn : Float(512, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[2, 2]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : int[] = prim::Constant[value=[1, 1]]() %6 : bool = prim::Constant[value=0]() %7 : int = prim::Constant[value=1]() %8 : bool = prim::Constant[value=1]() %input.122 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.98, %self.self_layer2_0_downsample_0.weight_fused_bn, %328_fused_bn.27, %3, %4, %5, %6, %4, %7, %6, %6, %8, %8), scope: __module.self_layer2_0_downsample_0 # 
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.122) with prim::AIOFusionGroup_31 = graph(%input.118 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu), %input.122 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)): %2 : int = prim::Constant[value=1]() %input.126 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.118, %input.122, %2) # .1:51:0 return (%input.126) with prim::AIOFusionGroup_32 = graph(%input.126 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)): %input.130 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.126), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.130) with prim::AIOFusionGroup_33 = graph(%input.130 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)): %self.self_layer2_1_conv1.weight_fused_bn : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.134 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.130, %self.self_layer2_1_conv1.weight_fused_bn, %328_fused_bn.23, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer2_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.134) with prim::AIOFusionGroup_34 = graph(%input.134 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)): %input.138 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.134), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.138) with prim::AIOFusionGroup_35 = graph(%input.138 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)): %self.self_layer2_1_conv2.weight_fused_bn : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : bool = prim::Constant[value=0]() %5 : int[] = prim::Constant[value=[0, 0]]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.142 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.138, %self.self_layer2_1_conv2.weight_fused_bn, %328_fused_bn.23, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer2_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.142) with prim::AIOFusionGroup_36 = graph(%input.142 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)): %input.146 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.142), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.146) with 
prim::AIOFusionGroup_37 = graph(%input.146 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)): %self.self_layer2_1_conv3.weight_fused_bn : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.150 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.146, %self.self_layer2_1_conv3.weight_fused_bn, %328_fused_bn.27, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer2_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.150) with prim::AIOFusionGroup_38 = graph(%input.150 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu), %input.130 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)): %2 : int = prim::Constant[value=1]() %input.154 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.150, %input.130, %2) # .1:61:0 return (%input.154) with prim::AIOFusionGroup_39 = graph(%input.154 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)): %input.158 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.154), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.158) with prim::AIOFusionGroup_40 = graph(%input.158 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)): %self.self_layer2_2_conv1.weight_fused_bn : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.162 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.158, %self.self_layer2_2_conv1.weight_fused_bn, %328_fused_bn.23, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer2_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.162) with prim::AIOFusionGroup_41 = graph(%input.162 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)): %input.166 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.162), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.166) with prim::AIOFusionGroup_42 = graph(%input.166 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)): %self.self_layer2_2_conv2.weight_fused_bn : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : bool = prim::Constant[value=0]() %5 : int[] = 
prim::Constant[value=[0, 0]]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.170 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.166, %self.self_layer2_2_conv2.weight_fused_bn, %328_fused_bn.23, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer2_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.170) with prim::AIOFusionGroup_43 = graph(%input.170 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)): %input.174 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.170), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.174) with prim::AIOFusionGroup_44 = graph(%input.174 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)): %self.self_layer2_2_conv3.weight_fused_bn : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.178 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.174, %self.self_layer2_2_conv3.weight_fused_bn, %328_fused_bn.27, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer2_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.178) with prim::AIOFusionGroup_45 = graph(%input.178 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu), %input.158 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)): %2 : int = prim::Constant[value=1]() %input.182 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.178, %input.158, %2) # .1:71:0 return (%input.182) with prim::AIOFusionGroup_46 = graph(%input.182 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)): %input.186 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.182), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.186) with prim::AIOFusionGroup_47 = graph(%input.186 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)): %self.self_layer2_3_conv1.weight_fused_bn : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.190 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.186, %self.self_layer2_3_conv1.weight_fused_bn, %328_fused_bn.23, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer2_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.190) with 
prim::AIOFusionGroup_48 = graph(%input.190 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)): %input.194 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.190), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.194) with prim::AIOFusionGroup_49 = graph(%input.194 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)): %self.self_layer2_3_conv2.weight_fused_bn : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : bool = prim::Constant[value=0]() %5 : int[] = prim::Constant[value=[0, 0]]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.198 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.194, %self.self_layer2_3_conv2.weight_fused_bn, %328_fused_bn.23, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer2_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.198) with prim::AIOFusionGroup_50 = graph(%input.198 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)): %input.202 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.198), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.202) with prim::AIOFusionGroup_51 = graph(%input.202 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)): %self.self_layer2_3_conv3.weight_fused_bn : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.206 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.202, %self.self_layer2_3_conv3.weight_fused_bn, %328_fused_bn.27, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer2_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.206) with prim::AIOFusionGroup_52 = graph(%input.206 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu), %input.186 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)): %2 : int = prim::Constant[value=1]() %input.210 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.206, %input.186, %2) # .1:81:0 return (%input.210) with prim::AIOFusionGroup_53 = graph(%input.210 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)): %input.214 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.210), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.214) with prim::AIOFusionGroup_54 = graph(%input.214 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], 
requires_grad=0, device=cpu)): %self.self_layer3_0_conv1.weight_fused_bn : Float(256, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.218 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.214, %self.self_layer3_0_conv1.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.218) with prim::AIOFusionGroup_55 = graph(%input.218 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu)): %input.222 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.218), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.222) with prim::AIOFusionGroup_56 = graph(%input.222 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu)): %self.self_layer3_0_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[2, 2]]() %4 : int[] = prim::Constant[value=[1, 1]]() %5 : bool = prim::Constant[value=0]() %6 : int[] = prim::Constant[value=[0, 0]]() %7 : int = prim::Constant[value=1]() %8 : bool = prim::Constant[value=1]() %input.226 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.222, %self.self_layer3_0_conv2.weight_fused_bn, %328_fused_bn.7, %3, %4, %4, %5, %6, %7, %5, %5, %8, %8), scope: __module.self_layer3_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.226) with prim::AIOFusionGroup_57 = graph(%input.226 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %input.230 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.226), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.230) with prim::AIOFusionGroup_58 = graph(%input.230 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer3_0_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.53 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.234 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.230, %self.self_layer3_0_conv3.weight_fused_bn, %328_fused_bn.53, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.234) with prim::AIOFusionGroup_59 = 
graph(%input.214 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)): %self.self_layer3_0_downsample_0.weight_fused_bn : Float(1024, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.53 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[2, 2]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : int[] = prim::Constant[value=[1, 1]]() %6 : bool = prim::Constant[value=0]() %7 : int = prim::Constant[value=1]() %8 : bool = prim::Constant[value=1]() %input.238 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.214, %self.self_layer3_0_downsample_0.weight_fused_bn, %328_fused_bn.53, %3, %4, %5, %6, %4, %7, %6, %6, %8, %8), scope: __module.self_layer3_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.238) with prim::AIOFusionGroup_60 = graph(%input.234 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu), %input.238 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %2 : int = prim::Constant[value=1]() %input.242 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.234, %input.238, %2) # .1:93:0 return (%input.242) with prim::AIOFusionGroup_61 = graph(%input.242 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %input.246 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.242), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.246) with prim::AIOFusionGroup_62 = graph(%input.246 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer3_1_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.250 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.246, %self.self_layer3_1_conv1.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.250) with prim::AIOFusionGroup_63 = graph(%input.250 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %input.254 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.250), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.254) with prim::AIOFusionGroup_64 = graph(%input.254 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer3_1_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : bool = 
prim::Constant[value=0]() %5 : int[] = prim::Constant[value=[0, 0]]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.258 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.254, %self.self_layer3_1_conv2.weight_fused_bn, %328_fused_bn.7, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer3_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.258) with prim::AIOFusionGroup_65 = graph(%input.258 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %input.262 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.258), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.262) with prim::AIOFusionGroup_66 = graph(%input.262 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer3_1_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.53 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.266 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.262, %self.self_layer3_1_conv3.weight_fused_bn, %328_fused_bn.53, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.266) with prim::AIOFusionGroup_67 = graph(%input.266 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu), %input.246 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %2 : int = prim::Constant[value=1]() %input.270 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.266, %input.246, %2) # .1:103:0 return (%input.270) with prim::AIOFusionGroup_68 = graph(%input.270 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %input.274 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.270), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.274) with prim::AIOFusionGroup_69 = graph(%input.274 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer3_2_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.278 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.274, %self.self_layer3_2_conv1.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 
return (%input.278) with prim::AIOFusionGroup_70 = graph(%input.278 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %input.282 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.278), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.282) with prim::AIOFusionGroup_71 = graph(%input.282 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer3_2_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : bool = prim::Constant[value=0]() %5 : int[] = prim::Constant[value=[0, 0]]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.286 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.282, %self.self_layer3_2_conv2.weight_fused_bn, %328_fused_bn.7, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer3_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.286) with prim::AIOFusionGroup_72 = graph(%input.286 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %input.290 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.286), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.290) with prim::AIOFusionGroup_73 = graph(%input.290 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer3_2_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.53 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.294 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.290, %self.self_layer3_2_conv3.weight_fused_bn, %328_fused_bn.53, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.294) with prim::AIOFusionGroup_74 = graph(%input.294 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu), %input.274 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %2 : int = prim::Constant[value=1]() %input.298 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.294, %input.274, %2) # .1:113:0 return (%input.298) with prim::AIOFusionGroup_75 = graph(%input.298 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %input.302 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.298), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.302) with prim::AIOFusionGroup_76 = graph(%input.302 : Float(1, 1024, 14, 14, 
strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer3_3_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.306 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.302, %self.self_layer3_3_conv1.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.306) with prim::AIOFusionGroup_77 = graph(%input.306 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %input.310 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.306), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.310) with prim::AIOFusionGroup_78 = graph(%input.310 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer3_3_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : bool = prim::Constant[value=0]() %5 : int[] = prim::Constant[value=[0, 0]]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.314 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.310, %self.self_layer3_3_conv2.weight_fused_bn, %328_fused_bn.7, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer3_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.314) with prim::AIOFusionGroup_79 = graph(%input.314 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %input.318 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.314), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.318) with prim::AIOFusionGroup_80 = graph(%input.318 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer3_3_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.53 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.322 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.318, %self.self_layer3_3_conv3.weight_fused_bn, %328_fused_bn.53, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.322) with prim::AIOFusionGroup_81 = graph(%input.322 : Float(1, 
1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu), %input.302 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %2 : int = prim::Constant[value=1]() %input.326 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.322, %input.302, %2) # .1:123:0 return (%input.326) with prim::AIOFusionGroup_82 = graph(%input.326 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %input.330 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.326), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.330) with prim::AIOFusionGroup_83 = graph(%input.330 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer3_4_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.334 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.330, %self.self_layer3_4_conv1.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_4_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.334) with prim::AIOFusionGroup_84 = graph(%input.334 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %input.338 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.334), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.338) with prim::AIOFusionGroup_85 = graph(%input.338 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer3_4_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : bool = prim::Constant[value=0]() %5 : int[] = prim::Constant[value=[0, 0]]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.342 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.338, %self.self_layer3_4_conv2.weight_fused_bn, %328_fused_bn.7, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer3_4_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.342) with prim::AIOFusionGroup_86 = graph(%input.342 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %input.346 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.342), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.346) with prim::AIOFusionGroup_87 = graph(%input.346 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): 
%self.self_layer3_4_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.53 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.350 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.346, %self.self_layer3_4_conv3.weight_fused_bn, %328_fused_bn.53, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_4_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.350) with prim::AIOFusionGroup_88 = graph(%input.350 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu), %input.330 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %2 : int = prim::Constant[value=1]() %input.354 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.350, %input.330, %2) # .1:133:0 return (%input.354) with prim::AIOFusionGroup_89 = graph(%input.354 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %input.358 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.354), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.358) with prim::AIOFusionGroup_90 = graph(%input.358 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer3_5_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.362 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.358, %self.self_layer3_5_conv1.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_5_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.362) with prim::AIOFusionGroup_91 = graph(%input.362 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %input.366 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.362), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.366) with prim::AIOFusionGroup_92 = graph(%input.366 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer3_5_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : bool = prim::Constant[value=0]() %5 : int[] = prim::Constant[value=[0, 0]]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.370 : Float(1, 256, 14, 14, 
strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.366, %self.self_layer3_5_conv2.weight_fused_bn, %328_fused_bn.7, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer3_5_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.370) with prim::AIOFusionGroup_93 = graph(%input.370 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %input.374 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.370), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.374) with prim::AIOFusionGroup_94 = graph(%input.374 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer3_5_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.53 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.378 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.374, %self.self_layer3_5_conv3.weight_fused_bn, %328_fused_bn.53, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_5_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.378) with prim::AIOFusionGroup_95 = graph(%input.378 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu), %input.358 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %2 : int = prim::Constant[value=1]() %input.382 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.378, %input.358, %2) # .1:143:0 return (%input.382) with prim::AIOFusionGroup_96 = graph(%input.382 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %input.386 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.382), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.386) with prim::AIOFusionGroup_97 = graph(%input.386 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer4_0_conv1.weight_fused_bn : Float(512, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.390 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.386, %self.self_layer4_0_conv1.weight_fused_bn, %328_fused_bn.27, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer4_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.390) with prim::AIOFusionGroup_98 = graph(%input.390 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu)): %input.394 : Float(1, 
512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.390), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.394) with prim::AIOFusionGroup_99 = graph(%input.394 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer4_0_conv2.weight_fused_bn : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[2, 2]]() %4 : int[] = prim::Constant[value=[1, 1]]() %5 : bool = prim::Constant[value=0]() %6 : int[] = prim::Constant[value=[0, 0]]() %7 : int = prim::Constant[value=1]() %8 : bool = prim::Constant[value=1]() %input.398 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.394, %self.self_layer4_0_conv2.weight_fused_bn, %328_fused_bn.27, %3, %4, %4, %5, %6, %7, %5, %5, %8, %8), scope: __module.self_layer4_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.398) with prim::AIOFusionGroup_100 = graph(%input.398 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)): %input.402 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.398), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.402) with prim::AIOFusionGroup_101 = graph(%input.402 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)): %self.self_layer4_0_conv3.weight_fused_bn : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.91 : Float(2048, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.406 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.402, %self.self_layer4_0_conv3.weight_fused_bn, %328_fused_bn.91, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer4_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.406) with prim::AIOFusionGroup_102 = graph(%input.386 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)): %self.self_layer4_0_downsample_0.weight_fused_bn : Float(2048, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.91 : Float(2048, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[2, 2]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : int[] = prim::Constant[value=[1, 1]]() %6 : bool = prim::Constant[value=0]() %7 : int = prim::Constant[value=1]() %8 : bool = prim::Constant[value=1]() %input.410 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.386, %self.self_layer4_0_downsample_0.weight_fused_bn, %328_fused_bn.91, %3, %4, %5, %6, %4, %7, %6, %6, %8, %8), scope: __module.self_layer4_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.410) with prim::AIOFusionGroup_103 = 
graph(%input.406 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu), %input.410 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)): %2 : int = prim::Constant[value=1]() %input.414 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.406, %input.410, %2) # .1:155:0 return (%input.414) with prim::AIOFusionGroup_104 = graph(%input.414 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)): %input.418 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.414), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.418) with prim::AIOFusionGroup_105 = graph(%input.418 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)): %self.self_layer4_1_conv1.weight_fused_bn : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.422 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.418, %self.self_layer4_1_conv1.weight_fused_bn, %328_fused_bn.27, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer4_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.422) with prim::AIOFusionGroup_106 = graph(%input.422 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)): %input.426 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.422), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.426) with prim::AIOFusionGroup_107 = graph(%input.426 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)): %self.self_layer4_1_conv2.weight_fused_bn : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : bool = prim::Constant[value=0]() %5 : int[] = prim::Constant[value=[0, 0]]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.430 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.426, %self.self_layer4_1_conv2.weight_fused_bn, %328_fused_bn.27, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer4_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.430) with prim::AIOFusionGroup_108 = graph(%input.430 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)): %input.434 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.430), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.434) with prim::AIOFusionGroup_109 = graph(%input.434 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)): %self.self_layer4_1_conv3.weight_fused_bn : 
Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.91 : Float(2048, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.438 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.434, %self.self_layer4_1_conv3.weight_fused_bn, %328_fused_bn.91, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer4_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.438) with prim::AIOFusionGroup_110 = graph(%input.438 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu), %input.418 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)): %2 : int = prim::Constant[value=1]() %input.442 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.438, %input.418, %2) # .1:165:0 return (%input.442) with prim::AIOFusionGroup_111 = graph(%input.442 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)): %input.446 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.442), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.446) with prim::AIOFusionGroup_112 = graph(%input.446 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)): %self.self_layer4_2_conv1.weight_fused_bn : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.450 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.446, %self.self_layer4_2_conv1.weight_fused_bn, %328_fused_bn.27, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer4_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.450) with prim::AIOFusionGroup_113 = graph(%input.450 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)): %input.454 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.450), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.454) with prim::AIOFusionGroup_114 = graph(%input.454 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)): %self.self_layer4_2_conv2.weight_fused_bn : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : bool = prim::Constant[value=0]() %5 : int[] = prim::Constant[value=[0, 0]]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.458 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = 
aten::_convolution(%input.454, %self.self_layer4_2_conv2.weight_fused_bn, %328_fused_bn.27, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer4_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.458) with prim::AIOFusionGroup_115 = graph(%input.458 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)): %input.462 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.458), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.462) with prim::AIOFusionGroup_116 = graph(%input.462 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)): %self.self_layer4_2_conv3.weight_fused_bn : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.91 : Float(2048, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int[] = prim::Constant[value=[1, 1]]() %4 : int[] = prim::Constant[value=[0, 0]]() %5 : bool = prim::Constant[value=0]() %6 : int = prim::Constant[value=1]() %7 : bool = prim::Constant[value=1]() %input.466 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.462, %self.self_layer4_2_conv3.weight_fused_bn, %328_fused_bn.91, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer4_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 return (%input.466) with prim::AIOFusionGroup_117 = graph(%input.466 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu), %input.446 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)): %2 : int = prim::Constant[value=1]() %input.470 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.466, %input.446, %2) # .1:175:0 return (%input.470) with prim::AIOFusionGroup_118 = graph(%input.470 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)): %input.474 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.470), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 return (%input.474) with prim::AIOFusionGroup_119 = graph(%input.474 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)): %1 : int[] = prim::Constant[value=[1, 1]]() %self_avgpool.2 : Float(1, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = aten::adaptive_avg_pool2d(%input.474, %1), scope: __module.self_avgpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1214:0 return (%self_avgpool.2) with prim::AIOFusionGroup_120 = graph(%self_avgpool.2 : Float(1, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu)): %1 : int = prim::Constant[value=1]() %2 : int = prim::Constant[value=-1]() %input.478 : Float(1, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = aten::flatten(%self_avgpool.2, %1, %2) # .1:178:0 return (%input.478) with prim::AIOFusionGroup_121 = graph(%input.478 : Float(1, 2048, strides=[2048, 1], requires_grad=0, device=cpu)): %self.self_fc.weight : Float(1000, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_fc.bias : Float(1000, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : Float(1, 1000, strides=[1000, 1], requires_grad=0, 
device=cpu) = aten::linear(%input.478, %self.self_fc.weight, %self.self_fc.bias), scope: __module.self_fc # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py:114:0 return (%3) Graph after AIOFuser graph(%self.1 : __torch__.torch.fx.graph_module.___torch_mangle_247.GraphModule, %x : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu)): %66458 : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu), %66459 : bool = prim::AIOFusionGuard[types=[Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu)]](%x) %66460 : Float(1, 1000, strides=[1000, 1], requires_grad=0, device=cpu) = prim::If(%66459) block0(): %684 : Float(1, 1000, strides=[1000, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_0(%66458) -> (%684) block1(): %66652 : Tensor = prim::FallbackGraph_1(%x) -> (%66652) %440 : (Tensor) = prim::TupleConstruct(%66460) return (%440) with prim::AIOFusionGroup_0 = graph(%x : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu)): %self.self_fc.bias : Float(1000, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_fc.weight : Float(1000, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %6 : int = prim::Constant[value=-1]() %self.self_layer4_2_conv3.weight_fused_bn.5 : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_2_conv2.weight_fused_bn.7 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_2_conv1.weight_fused_bn.9 : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_1_conv3.weight_fused_bn.12 : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_1_conv2.weight_fused_bn.14 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_1_conv1.weight_fused_bn.16 : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_0_downsample_0.weight_fused_bn.19 : Float(2048, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.112 : Float(2048, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_0_conv3.weight_fused_bn.20 : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_0_conv2.weight_fused_bn.22 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_0_conv1.weight_fused_bn.24 : Float(512, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_5_conv3.weight_fused_bn.27 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_5_conv2.weight_fused_bn.29 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_5_conv1.weight_fused_bn.31 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_4_conv3.weight_fused_bn.34 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() 
%self.self_layer3_4_conv2.weight_fused_bn.36 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_4_conv1.weight_fused_bn.38 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_3_conv3.weight_fused_bn.41 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_3_conv2.weight_fused_bn.43 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_3_conv1.weight_fused_bn.45 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_2_conv3.weight_fused_bn.48 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_2_conv2.weight_fused_bn.50 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_2_conv1.weight_fused_bn.52 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_1_conv3.weight_fused_bn.55 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_1_conv2.weight_fused_bn.57 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_1_conv1.weight_fused_bn.59 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_0_downsample_0.weight_fused_bn.62 : Float(1024, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.162 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_0_conv3.weight_fused_bn.63 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_0_conv2.weight_fused_bn.65 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_0_conv1.weight_fused_bn.67 : Float(256, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_3_conv3.weight_fused_bn.70 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_3_conv2.weight_fused_bn.72 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_3_conv1.weight_fused_bn.74 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_2_conv3.weight_fused_bn.77 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_2_conv2.weight_fused_bn.79 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_2_conv1.weight_fused_bn.81 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_1_conv3.weight_fused_bn.84 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_1_conv2.weight_fused_bn.86 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = 
prim::Constant[value=]() %self.self_layer2_1_conv1.weight_fused_bn.88 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_0_downsample_0.weight_fused_bn.91 : Float(512, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.214 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_0_conv3.weight_fused_bn.92 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_0_conv2.weight_fused_bn.94 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.222 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_0_conv1.weight_fused_bn.96 : Float(128, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_2_conv3.weight_fused_bn.99 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_2_conv2.weight_fused_bn.101 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_2_conv1.weight_fused_bn.103 : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_1_conv3.weight_fused_bn.106 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_1_conv2.weight_fused_bn.108 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_1_conv1.weight_fused_bn.110 : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_0_downsample_0.weight_fused_bn.113 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.238 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_0_conv3.weight_fused_bn.114 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_0_conv2.weight_fused_bn.116 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_0_conv1.weight_fused_bn.118 : Float(64, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %1039 : bool = prim::Constant[value=1]() %1038 : int = prim::Constant[value=1]() %1037 : int[] = prim::Constant[value=[0, 0]]() %1036 : bool = prim::Constant[value=0]() %1035 : int[] = prim::Constant[value=[1, 1]]() %1034 : int[] = prim::Constant[value=[3, 3]]() %1033 : int[] = prim::Constant[value=[2, 2]]() %328_fused_bn.244 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_conv1.weight_fused_bn.121 : Float(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %input.2 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = aten::_convolution(%x, %self.self_conv1.weight_fused_bn.121, %328_fused_bn.244, %1033, %1034, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.6 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, 
device=cpu) = aten::relu_(%input.2), scope: __module.self_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.10 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::max_pool2d(%input.6, %1034, %1033, %1035, %1035, %1036), scope: __module.self_maxpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:782:0 %input.14 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.10, %self.self_layer1_0_conv1.weight_fused_bn.118, %328_fused_bn.244, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.18 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.14), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.22 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.18, %self.self_layer1_0_conv2.weight_fused_bn.116, %328_fused_bn.244, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.26 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.22), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.30 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.26, %self.self_layer1_0_conv3.weight_fused_bn.114, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.34 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.10, %self.self_layer1_0_downsample_0.weight_fused_bn.113, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.38 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.30, %input.34, %1038) # .1:19:0 %input.42 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.38), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.46 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.42, %self.self_layer1_1_conv1.weight_fused_bn.110, %328_fused_bn.244, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.50 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.46), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.54 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.50, %self.self_layer1_1_conv2.weight_fused_bn.108, %328_fused_bn.244, %1035, %1035, %1035, %1036, %1037, %1038, %1036, 
%1036, %1039, %1039), scope: __module.self_layer1_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.58 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.54), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.62 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.58, %self.self_layer1_1_conv3.weight_fused_bn.106, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.66 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.62, %input.42, %1038) # .1:29:0 %input.70 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.66), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.74 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.70, %self.self_layer1_2_conv1.weight_fused_bn.103, %328_fused_bn.244, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.78 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.74), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.82 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.78, %self.self_layer1_2_conv2.weight_fused_bn.101, %328_fused_bn.244, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.86 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.82), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.90 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.86, %self.self_layer1_2_conv3.weight_fused_bn.99, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.94 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.90, %input.70, %1038) # .1:39:0 %input.98 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.94), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.102 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.98, %self.self_layer2_0_conv1.weight_fused_bn.96, %328_fused_bn.222, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.106 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.102), scope: 
__module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.110 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.106, %self.self_layer2_0_conv2.weight_fused_bn.94, %328_fused_bn.222, %1033, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.114 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.110), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.118 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.114, %self.self_layer2_0_conv3.weight_fused_bn.92, %328_fused_bn.214, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.122 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.98, %self.self_layer2_0_downsample_0.weight_fused_bn.91, %328_fused_bn.214, %1033, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.126 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.118, %input.122, %1038) # .1:51:0 %input.130 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.126), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.134 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.130, %self.self_layer2_1_conv1.weight_fused_bn.88, %328_fused_bn.222, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.138 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.134), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.142 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.138, %self.self_layer2_1_conv2.weight_fused_bn.86, %328_fused_bn.222, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.146 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.142), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.150 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.146, %self.self_layer2_1_conv3.weight_fused_bn.84, %328_fused_bn.214, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.154 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.150, %input.130, %1038) # .1:61:0 
%input.158 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.154), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.162 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.158, %self.self_layer2_2_conv1.weight_fused_bn.81, %328_fused_bn.222, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.166 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.162), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.170 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.166, %self.self_layer2_2_conv2.weight_fused_bn.79, %328_fused_bn.222, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.174 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.170), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.178 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.174, %self.self_layer2_2_conv3.weight_fused_bn.77, %328_fused_bn.214, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.182 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.178, %input.158, %1038) # .1:71:0 %input.186 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.182), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.190 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.186, %self.self_layer2_3_conv1.weight_fused_bn.74, %328_fused_bn.222, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.194 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.190), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.198 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.194, %self.self_layer2_3_conv2.weight_fused_bn.72, %328_fused_bn.222, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.202 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.198), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.206 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.202, %self.self_layer2_3_conv3.weight_fused_bn.70, 
%328_fused_bn.214, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.210 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.206, %input.186, %1038) # .1:81:0 %input.214 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.210), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.218 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.214, %self.self_layer3_0_conv1.weight_fused_bn.67, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.222 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.218), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.226 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.222, %self.self_layer3_0_conv2.weight_fused_bn.65, %328_fused_bn.238, %1033, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.230 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.226), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.234 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.230, %self.self_layer3_0_conv3.weight_fused_bn.63, %328_fused_bn.162, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.238 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.214, %self.self_layer3_0_downsample_0.weight_fused_bn.62, %328_fused_bn.162, %1033, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.242 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.234, %input.238, %1038) # .1:93:0 %input.246 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.242), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.250 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.246, %self.self_layer3_1_conv1.weight_fused_bn.59, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.254 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.250), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.258 : Float(1, 256, 14, 14, 
strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.254, %self.self_layer3_1_conv2.weight_fused_bn.57, %328_fused_bn.238, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.262 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.258), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.266 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.262, %self.self_layer3_1_conv3.weight_fused_bn.55, %328_fused_bn.162, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.270 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.266, %input.246, %1038) # .1:103:0 %input.274 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.270), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.278 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.274, %self.self_layer3_2_conv1.weight_fused_bn.52, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.282 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.278), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.286 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.282, %self.self_layer3_2_conv2.weight_fused_bn.50, %328_fused_bn.238, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.290 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.286), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.294 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.290, %self.self_layer3_2_conv3.weight_fused_bn.48, %328_fused_bn.162, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.298 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.294, %input.274, %1038) # .1:113:0 %input.302 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.298), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.306 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.302, %self.self_layer3_3_conv1.weight_fused_bn.45, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: 
__module.self_layer3_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.310 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.306), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.314 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.310, %self.self_layer3_3_conv2.weight_fused_bn.43, %328_fused_bn.238, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.318 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.314), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.322 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.318, %self.self_layer3_3_conv3.weight_fused_bn.41, %328_fused_bn.162, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.326 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.322, %input.302, %1038) # .1:123:0 %input.330 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.326), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.334 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.330, %self.self_layer3_4_conv1.weight_fused_bn.38, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_4_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.338 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.334), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.342 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.338, %self.self_layer3_4_conv2.weight_fused_bn.36, %328_fused_bn.238, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_4_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.346 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.342), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.350 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.346, %self.self_layer3_4_conv3.weight_fused_bn.34, %328_fused_bn.162, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_4_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.354 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.350, %input.330, %1038) # .1:133:0 %input.358 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.354), scope: __module.self_layer3_4_relu 
# /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.362 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.358, %self.self_layer3_5_conv1.weight_fused_bn.31, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_5_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.366 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.362), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.370 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.366, %self.self_layer3_5_conv2.weight_fused_bn.29, %328_fused_bn.238, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_5_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.374 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.370), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.378 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.374, %self.self_layer3_5_conv3.weight_fused_bn.27, %328_fused_bn.162, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_5_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.382 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.378, %input.358, %1038) # .1:143:0 %input.386 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.382), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.390 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.386, %self.self_layer4_0_conv1.weight_fused_bn.24, %328_fused_bn.214, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.394 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.390), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.398 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.394, %self.self_layer4_0_conv2.weight_fused_bn.22, %328_fused_bn.214, %1033, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.402 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.398), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.406 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.402, %self.self_layer4_0_conv3.weight_fused_bn.20, %328_fused_bn.112, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_0_conv3 # 
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.410 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.386, %self.self_layer4_0_downsample_0.weight_fused_bn.19, %328_fused_bn.112, %1033, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.414 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.406, %input.410, %1038) # .1:155:0 %input.418 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.414), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.422 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.418, %self.self_layer4_1_conv1.weight_fused_bn.16, %328_fused_bn.214, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.426 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.422), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.430 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.426, %self.self_layer4_1_conv2.weight_fused_bn.14, %328_fused_bn.214, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.434 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.430), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.438 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.434, %self.self_layer4_1_conv3.weight_fused_bn.12, %328_fused_bn.112, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.442 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.438, %input.418, %1038) # .1:165:0 %input.446 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.442), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.450 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.446, %self.self_layer4_2_conv1.weight_fused_bn.9, %328_fused_bn.214, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.454 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.450), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.458 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.454, %self.self_layer4_2_conv2.weight_fused_bn.7, %328_fused_bn.214, %1035, %1035, %1035, %1036, %1037, 
%1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.462 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.458), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.466 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.462, %self.self_layer4_2_conv3.weight_fused_bn.5, %328_fused_bn.112, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.470 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.466, %input.446, %1038) # .1:175:0 %input.474 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.470), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %self_avgpool.2 : Float(1, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = aten::adaptive_avg_pool2d(%input.474, %1035), scope: __module.self_avgpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1214:0 %input.478 : Float(1, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = aten::flatten(%self_avgpool.2, %1038, %6) # .1:178:0 %3 : Float(1, 1000, strides=[1000, 1], requires_grad=0, device=cpu) = aten::linear(%input.478, %self.self_fc.weight, %self.self_fc.bias), scope: __module.self_fc # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py:114:0 return (%3) with prim::FallbackGraph_1 = graph(%x : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu)): %self.self_fc.bias : Float(1000, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_fc.weight : Float(1000, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int = prim::Constant[value=-1]() %self.self_layer4_2_conv3.weight_fused_bn.5 : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_2_conv2.weight_fused_bn.7 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_2_conv1.weight_fused_bn.9 : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_1_conv3.weight_fused_bn.12 : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_1_conv2.weight_fused_bn.14 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_1_conv1.weight_fused_bn.16 : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_0_downsample_0.weight_fused_bn.19 : Float(2048, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.112 : Float(2048, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_0_conv3.weight_fused_bn.20 : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_0_conv2.weight_fused_bn.22 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = 
prim::Constant[value=]() %self.self_layer4_0_conv1.weight_fused_bn.24 : Float(512, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_5_conv3.weight_fused_bn.27 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_5_conv2.weight_fused_bn.29 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_5_conv1.weight_fused_bn.31 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_4_conv3.weight_fused_bn.34 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_4_conv2.weight_fused_bn.36 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_4_conv1.weight_fused_bn.38 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_3_conv3.weight_fused_bn.41 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_3_conv2.weight_fused_bn.43 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_3_conv1.weight_fused_bn.45 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_2_conv3.weight_fused_bn.48 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_2_conv2.weight_fused_bn.50 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_2_conv1.weight_fused_bn.52 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_1_conv3.weight_fused_bn.55 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_1_conv2.weight_fused_bn.57 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_1_conv1.weight_fused_bn.59 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_0_downsample_0.weight_fused_bn.62 : Float(1024, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.162 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_0_conv3.weight_fused_bn.63 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_0_conv2.weight_fused_bn.65 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_0_conv1.weight_fused_bn.67 : Float(256, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_3_conv3.weight_fused_bn.70 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_3_conv2.weight_fused_bn.72 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_3_conv1.weight_fused_bn.74 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, 
device=cpu) = prim::Constant[value=]() %self.self_layer2_2_conv3.weight_fused_bn.77 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_2_conv2.weight_fused_bn.79 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_2_conv1.weight_fused_bn.81 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_1_conv3.weight_fused_bn.84 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_1_conv2.weight_fused_bn.86 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_1_conv1.weight_fused_bn.88 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_0_downsample_0.weight_fused_bn.91 : Float(512, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.214 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_0_conv3.weight_fused_bn.92 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_0_conv2.weight_fused_bn.94 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.222 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_0_conv1.weight_fused_bn.96 : Float(128, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_2_conv3.weight_fused_bn.99 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_2_conv2.weight_fused_bn.101 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_2_conv1.weight_fused_bn.103 : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_1_conv3.weight_fused_bn.106 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_1_conv2.weight_fused_bn.108 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_1_conv1.weight_fused_bn.110 : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_0_downsample_0.weight_fused_bn.113 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.238 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_0_conv3.weight_fused_bn.114 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_0_conv2.weight_fused_bn.116 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_0_conv1.weight_fused_bn.118 : Float(64, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %61 : bool = prim::Constant[value=1]() %62 : int = prim::Constant[value=1]() %63 : int[] = prim::Constant[value=[0, 0]]() %64 : bool = prim::Constant[value=0]() %65 : int[] = prim::Constant[value=[1, 1]]() %66 : int[] = 
prim::Constant[value=[3, 3]]() %67 : int[] = prim::Constant[value=[2, 2]]() %328_fused_bn.244 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_conv1.weight_fused_bn.121 : Float(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %input.2 : Tensor = aten::_convolution(%x, %self.self_conv1.weight_fused_bn.121, %328_fused_bn.244, %67, %66, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.6 : Tensor = aten::relu_(%input.2), scope: __module.self_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.10 : Tensor = aten::max_pool2d(%input.6, %66, %67, %65, %65, %64), scope: __module.self_maxpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:782:0 %input.14 : Tensor = aten::_convolution(%input.10, %self.self_layer1_0_conv1.weight_fused_bn.118, %328_fused_bn.244, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.18 : Tensor = aten::relu_(%input.14), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.22 : Tensor = aten::_convolution(%input.18, %self.self_layer1_0_conv2.weight_fused_bn.116, %328_fused_bn.244, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.26 : Tensor = aten::relu_(%input.22), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.30 : Tensor = aten::_convolution(%input.26, %self.self_layer1_0_conv3.weight_fused_bn.114, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.34 : Tensor = aten::_convolution(%input.10, %self.self_layer1_0_downsample_0.weight_fused_bn.113, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.38 : Tensor = aten::add_(%input.30, %input.34, %62) # .1:19:0 %input.42 : Tensor = aten::relu_(%input.38), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.46 : Tensor = aten::_convolution(%input.42, %self.self_layer1_1_conv1.weight_fused_bn.110, %328_fused_bn.244, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.50 : Tensor = aten::relu_(%input.46), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.54 : Tensor = aten::_convolution(%input.50, %self.self_layer1_1_conv2.weight_fused_bn.108, %328_fused_bn.244, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.58 : Tensor = aten::relu_(%input.54), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.62 : Tensor = aten::_convolution(%input.58, %self.self_layer1_1_conv3.weight_fused_bn.106, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), 
scope: __module.self_layer1_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.66 : Tensor = aten::add_(%input.62, %input.42, %62) # .1:29:0 %input.70 : Tensor = aten::relu_(%input.66), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.74 : Tensor = aten::_convolution(%input.70, %self.self_layer1_2_conv1.weight_fused_bn.103, %328_fused_bn.244, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.78 : Tensor = aten::relu_(%input.74), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.82 : Tensor = aten::_convolution(%input.78, %self.self_layer1_2_conv2.weight_fused_bn.101, %328_fused_bn.244, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.86 : Tensor = aten::relu_(%input.82), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.90 : Tensor = aten::_convolution(%input.86, %self.self_layer1_2_conv3.weight_fused_bn.99, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.94 : Tensor = aten::add_(%input.90, %input.70, %62) # .1:39:0 %input.98 : Tensor = aten::relu_(%input.94), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.102 : Tensor = aten::_convolution(%input.98, %self.self_layer2_0_conv1.weight_fused_bn.96, %328_fused_bn.222, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.106 : Tensor = aten::relu_(%input.102), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.110 : Tensor = aten::_convolution(%input.106, %self.self_layer2_0_conv2.weight_fused_bn.94, %328_fused_bn.222, %67, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.114 : Tensor = aten::relu_(%input.110), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.118 : Tensor = aten::_convolution(%input.114, %self.self_layer2_0_conv3.weight_fused_bn.92, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.122 : Tensor = aten::_convolution(%input.98, %self.self_layer2_0_downsample_0.weight_fused_bn.91, %328_fused_bn.214, %67, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.126 : Tensor = aten::add_(%input.118, %input.122, %62) # .1:51:0 %input.130 : Tensor = aten::relu_(%input.126), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.134 : Tensor = aten::_convolution(%input.130, %self.self_layer2_1_conv1.weight_fused_bn.88, %328_fused_bn.222, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_1_conv1 # 
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.138 : Tensor = aten::relu_(%input.134), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.142 : Tensor = aten::_convolution(%input.138, %self.self_layer2_1_conv2.weight_fused_bn.86, %328_fused_bn.222, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.146 : Tensor = aten::relu_(%input.142), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.150 : Tensor = aten::_convolution(%input.146, %self.self_layer2_1_conv3.weight_fused_bn.84, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.154 : Tensor = aten::add_(%input.150, %input.130, %62) # .1:61:0 %input.158 : Tensor = aten::relu_(%input.154), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.162 : Tensor = aten::_convolution(%input.158, %self.self_layer2_2_conv1.weight_fused_bn.81, %328_fused_bn.222, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.166 : Tensor = aten::relu_(%input.162), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.170 : Tensor = aten::_convolution(%input.166, %self.self_layer2_2_conv2.weight_fused_bn.79, %328_fused_bn.222, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.174 : Tensor = aten::relu_(%input.170), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.178 : Tensor = aten::_convolution(%input.174, %self.self_layer2_2_conv3.weight_fused_bn.77, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.182 : Tensor = aten::add_(%input.178, %input.158, %62) # .1:71:0 %input.186 : Tensor = aten::relu_(%input.182), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.190 : Tensor = aten::_convolution(%input.186, %self.self_layer2_3_conv1.weight_fused_bn.74, %328_fused_bn.222, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.194 : Tensor = aten::relu_(%input.190), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.198 : Tensor = aten::_convolution(%input.194, %self.self_layer2_3_conv2.weight_fused_bn.72, %328_fused_bn.222, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.202 : Tensor = aten::relu_(%input.198), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.206 : Tensor = aten::_convolution(%input.202, %self.self_layer2_3_conv3.weight_fused_bn.70, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, 
%61, %61), scope: __module.self_layer2_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.210 : Tensor = aten::add_(%input.206, %input.186, %62) # .1:81:0 %input.214 : Tensor = aten::relu_(%input.210), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.218 : Tensor = aten::_convolution(%input.214, %self.self_layer3_0_conv1.weight_fused_bn.67, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.222 : Tensor = aten::relu_(%input.218), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.226 : Tensor = aten::_convolution(%input.222, %self.self_layer3_0_conv2.weight_fused_bn.65, %328_fused_bn.238, %67, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.230 : Tensor = aten::relu_(%input.226), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.234 : Tensor = aten::_convolution(%input.230, %self.self_layer3_0_conv3.weight_fused_bn.63, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.238 : Tensor = aten::_convolution(%input.214, %self.self_layer3_0_downsample_0.weight_fused_bn.62, %328_fused_bn.162, %67, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.242 : Tensor = aten::add_(%input.234, %input.238, %62) # .1:93:0 %input.246 : Tensor = aten::relu_(%input.242), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.250 : Tensor = aten::_convolution(%input.246, %self.self_layer3_1_conv1.weight_fused_bn.59, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.254 : Tensor = aten::relu_(%input.250), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.258 : Tensor = aten::_convolution(%input.254, %self.self_layer3_1_conv2.weight_fused_bn.57, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.262 : Tensor = aten::relu_(%input.258), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.266 : Tensor = aten::_convolution(%input.262, %self.self_layer3_1_conv3.weight_fused_bn.55, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.270 : Tensor = aten::add_(%input.266, %input.246, %62) # .1:103:0 %input.274 : Tensor = aten::relu_(%input.270), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.278 : Tensor = aten::_convolution(%input.274, %self.self_layer3_2_conv1.weight_fused_bn.52, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: 
__module.self_layer3_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.282 : Tensor = aten::relu_(%input.278), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.286 : Tensor = aten::_convolution(%input.282, %self.self_layer3_2_conv2.weight_fused_bn.50, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.290 : Tensor = aten::relu_(%input.286), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.294 : Tensor = aten::_convolution(%input.290, %self.self_layer3_2_conv3.weight_fused_bn.48, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.298 : Tensor = aten::add_(%input.294, %input.274, %62) # .1:113:0 %input.302 : Tensor = aten::relu_(%input.298), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.306 : Tensor = aten::_convolution(%input.302, %self.self_layer3_3_conv1.weight_fused_bn.45, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.310 : Tensor = aten::relu_(%input.306), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.314 : Tensor = aten::_convolution(%input.310, %self.self_layer3_3_conv2.weight_fused_bn.43, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.318 : Tensor = aten::relu_(%input.314), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.322 : Tensor = aten::_convolution(%input.318, %self.self_layer3_3_conv3.weight_fused_bn.41, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.326 : Tensor = aten::add_(%input.322, %input.302, %62) # .1:123:0 %input.330 : Tensor = aten::relu_(%input.326), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.334 : Tensor = aten::_convolution(%input.330, %self.self_layer3_4_conv1.weight_fused_bn.38, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_4_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.338 : Tensor = aten::relu_(%input.334), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.342 : Tensor = aten::_convolution(%input.338, %self.self_layer3_4_conv2.weight_fused_bn.36, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_4_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.346 : Tensor = aten::relu_(%input.342), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.350 : Tensor = aten::_convolution(%input.346, %self.self_layer3_4_conv3.weight_fused_bn.34, %328_fused_bn.162, %65, %63, 
%65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_4_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.354 : Tensor = aten::add_(%input.350, %input.330, %62) # .1:133:0 %input.358 : Tensor = aten::relu_(%input.354), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.362 : Tensor = aten::_convolution(%input.358, %self.self_layer3_5_conv1.weight_fused_bn.31, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_5_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.366 : Tensor = aten::relu_(%input.362), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.370 : Tensor = aten::_convolution(%input.366, %self.self_layer3_5_conv2.weight_fused_bn.29, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_5_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.374 : Tensor = aten::relu_(%input.370), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.378 : Tensor = aten::_convolution(%input.374, %self.self_layer3_5_conv3.weight_fused_bn.27, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_5_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.382 : Tensor = aten::add_(%input.378, %input.358, %62) # .1:143:0 %input.386 : Tensor = aten::relu_(%input.382), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.390 : Tensor = aten::_convolution(%input.386, %self.self_layer4_0_conv1.weight_fused_bn.24, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.394 : Tensor = aten::relu_(%input.390), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.398 : Tensor = aten::_convolution(%input.394, %self.self_layer4_0_conv2.weight_fused_bn.22, %328_fused_bn.214, %67, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.402 : Tensor = aten::relu_(%input.398), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.406 : Tensor = aten::_convolution(%input.402, %self.self_layer4_0_conv3.weight_fused_bn.20, %328_fused_bn.112, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.410 : Tensor = aten::_convolution(%input.386, %self.self_layer4_0_downsample_0.weight_fused_bn.19, %328_fused_bn.112, %67, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.414 : Tensor = aten::add_(%input.406, %input.410, %62) # .1:155:0 %input.418 : Tensor = aten::relu_(%input.414), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.422 : Tensor = aten::_convolution(%input.418, %self.self_layer4_1_conv1.weight_fused_bn.16, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, 
%64, %64, %61, %61), scope: __module.self_layer4_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.426 : Tensor = aten::relu_(%input.422), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.430 : Tensor = aten::_convolution(%input.426, %self.self_layer4_1_conv2.weight_fused_bn.14, %328_fused_bn.214, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.434 : Tensor = aten::relu_(%input.430), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.438 : Tensor = aten::_convolution(%input.434, %self.self_layer4_1_conv3.weight_fused_bn.12, %328_fused_bn.112, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.442 : Tensor = aten::add_(%input.438, %input.418, %62) # .1:165:0 %input.446 : Tensor = aten::relu_(%input.442), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.450 : Tensor = aten::_convolution(%input.446, %self.self_layer4_2_conv1.weight_fused_bn.9, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.454 : Tensor = aten::relu_(%input.450), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.458 : Tensor = aten::_convolution(%input.454, %self.self_layer4_2_conv2.weight_fused_bn.7, %328_fused_bn.214, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.462 : Tensor = aten::relu_(%input.458), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.466 : Tensor = aten::_convolution(%input.462, %self.self_layer4_2_conv3.weight_fused_bn.5, %328_fused_bn.112, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.470 : Tensor = aten::add_(%input.466, %input.446, %62) # .1:175:0 %input.474 : Tensor = aten::relu_(%input.470), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %self_avgpool.2 : Tensor = aten::adaptive_avg_pool2d(%input.474, %65), scope: __module.self_avgpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1214:0 %input.478 : Tensor = aten::flatten(%self_avgpool.2, %62, %3) # .1:178:0 %191 : Tensor = aten::linear(%input.478, %self.self_fc.weight, %self.self_fc.bias), scope: __module.self_fc # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py:114:0 return (%191)
Running DLS graph fuser taken 322169 microseconds
Building AIO network from graph graph(%x : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu)): %self.self_fc.bias : Float(1000, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_fc.weight : Float(1000, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %3 : int = prim::Constant[value=-1]() %self.self_layer4_2_conv3.weight_fused_bn.5 : Float(2048, 512, 1, 1, strides=[512, 1, 1,
1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_2_conv2.weight_fused_bn.7 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_2_conv1.weight_fused_bn.9 : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_1_conv3.weight_fused_bn.12 : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_1_conv2.weight_fused_bn.14 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_1_conv1.weight_fused_bn.16 : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_0_downsample_0.weight_fused_bn.19 : Float(2048, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.112 : Float(2048, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_0_conv3.weight_fused_bn.20 : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_0_conv2.weight_fused_bn.22 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer4_0_conv1.weight_fused_bn.24 : Float(512, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_5_conv3.weight_fused_bn.27 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_5_conv2.weight_fused_bn.29 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_5_conv1.weight_fused_bn.31 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_4_conv3.weight_fused_bn.34 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_4_conv2.weight_fused_bn.36 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_4_conv1.weight_fused_bn.38 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_3_conv3.weight_fused_bn.41 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_3_conv2.weight_fused_bn.43 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_3_conv1.weight_fused_bn.45 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_2_conv3.weight_fused_bn.48 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_2_conv2.weight_fused_bn.50 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_2_conv1.weight_fused_bn.52 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_1_conv3.weight_fused_bn.55 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_1_conv2.weight_fused_bn.57 : Float(256, 256, 3, 3, 
strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_1_conv1.weight_fused_bn.59 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_0_downsample_0.weight_fused_bn.62 : Float(1024, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.162 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_0_conv3.weight_fused_bn.63 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_0_conv2.weight_fused_bn.65 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer3_0_conv1.weight_fused_bn.67 : Float(256, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_3_conv3.weight_fused_bn.70 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_3_conv2.weight_fused_bn.72 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_3_conv1.weight_fused_bn.74 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_2_conv3.weight_fused_bn.77 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_2_conv2.weight_fused_bn.79 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_2_conv1.weight_fused_bn.81 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_1_conv3.weight_fused_bn.84 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_1_conv2.weight_fused_bn.86 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_1_conv1.weight_fused_bn.88 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_0_downsample_0.weight_fused_bn.91 : Float(512, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.214 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_0_conv3.weight_fused_bn.92 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_0_conv2.weight_fused_bn.94 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.222 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer2_0_conv1.weight_fused_bn.96 : Float(128, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_2_conv3.weight_fused_bn.99 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_2_conv2.weight_fused_bn.101 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_2_conv1.weight_fused_bn.103 : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() 
%self.self_layer1_1_conv3.weight_fused_bn.106 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_1_conv2.weight_fused_bn.108 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_1_conv1.weight_fused_bn.110 : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_0_downsample_0.weight_fused_bn.113 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %328_fused_bn.238 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_0_conv3.weight_fused_bn.114 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_0_conv2.weight_fused_bn.116 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_layer1_0_conv1.weight_fused_bn.118 : Float(64, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %61 : bool = prim::Constant[value=1]() %62 : int = prim::Constant[value=1]() %63 : int[] = prim::Constant[value=[0, 0]]() %64 : bool = prim::Constant[value=0]() %65 : int[] = prim::Constant[value=[1, 1]]() %66 : int[] = prim::Constant[value=[3, 3]]() %67 : int[] = prim::Constant[value=[2, 2]]() %328_fused_bn.244 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=]() %self.self_conv1.weight_fused_bn.121 : Float(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=]() %input.2 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = aten::_convolution(%x, %self.self_conv1.weight_fused_bn.121, %328_fused_bn.244, %67, %66, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.6 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = aten::relu_(%input.2), scope: __module.self_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.10 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::max_pool2d(%input.6, %66, %67, %65, %65, %64), scope: __module.self_maxpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:782:0 %input.14 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.10, %self.self_layer1_0_conv1.weight_fused_bn.118, %328_fused_bn.244, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.18 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.14), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.22 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.18, %self.self_layer1_0_conv2.weight_fused_bn.116, %328_fused_bn.244, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.26 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.22), scope: __module.self_layer1_0_relu 
# /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.30 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.26, %self.self_layer1_0_conv3.weight_fused_bn.114, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.34 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.10, %self.self_layer1_0_downsample_0.weight_fused_bn.113, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.38 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.30, %input.34, %62) # .1:19:0 %input.42 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.38), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.46 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.42, %self.self_layer1_1_conv1.weight_fused_bn.110, %328_fused_bn.244, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.50 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.46), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.54 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.50, %self.self_layer1_1_conv2.weight_fused_bn.108, %328_fused_bn.244, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.58 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.54), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.62 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.58, %self.self_layer1_1_conv3.weight_fused_bn.106, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.66 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.62, %input.42, %62) # .1:29:0 %input.70 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.66), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.74 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.70, %self.self_layer1_2_conv1.weight_fused_bn.103, %328_fused_bn.244, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.78 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.74), scope: __module.self_layer1_2_relu # 
/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.82 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.78, %self.self_layer1_2_conv2.weight_fused_bn.101, %328_fused_bn.244, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.86 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.82), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.90 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.86, %self.self_layer1_2_conv3.weight_fused_bn.99, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.94 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.90, %input.70, %62) # .1:39:0 %input.98 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.94), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.102 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.98, %self.self_layer2_0_conv1.weight_fused_bn.96, %328_fused_bn.222, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.106 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.102), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.110 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.106, %self.self_layer2_0_conv2.weight_fused_bn.94, %328_fused_bn.222, %67, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.114 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.110), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.118 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.114, %self.self_layer2_0_conv3.weight_fused_bn.92, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.122 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.98, %self.self_layer2_0_downsample_0.weight_fused_bn.91, %328_fused_bn.214, %67, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.126 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.118, %input.122, %62) # .1:51:0 %input.130 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.126), scope: __module.self_layer2_0_relu # 
/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.134 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.130, %self.self_layer2_1_conv1.weight_fused_bn.88, %328_fused_bn.222, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.138 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.134), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.142 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.138, %self.self_layer2_1_conv2.weight_fused_bn.86, %328_fused_bn.222, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.146 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.142), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.150 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.146, %self.self_layer2_1_conv3.weight_fused_bn.84, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.154 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.150, %input.130, %62) # .1:61:0 %input.158 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.154), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.162 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.158, %self.self_layer2_2_conv1.weight_fused_bn.81, %328_fused_bn.222, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.166 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.162), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.170 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.166, %self.self_layer2_2_conv2.weight_fused_bn.79, %328_fused_bn.222, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.174 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.170), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.178 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.174, %self.self_layer2_2_conv3.weight_fused_bn.77, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.182 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, 
device=cpu) = aten::add_(%input.178, %input.158, %62) # .1:71:0 %input.186 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.182), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.190 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.186, %self.self_layer2_3_conv1.weight_fused_bn.74, %328_fused_bn.222, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.194 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.190), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.198 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.194, %self.self_layer2_3_conv2.weight_fused_bn.72, %328_fused_bn.222, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.202 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.198), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.206 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.202, %self.self_layer2_3_conv3.weight_fused_bn.70, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.210 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.206, %input.186, %62) # .1:81:0 %input.214 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.210), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.218 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.214, %self.self_layer3_0_conv1.weight_fused_bn.67, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.222 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.218), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.226 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.222, %self.self_layer3_0_conv2.weight_fused_bn.65, %328_fused_bn.238, %67, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.230 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.226), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.234 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.230, %self.self_layer3_0_conv3.weight_fused_bn.63, %328_fused_bn.162, %65, %63, %65, %64, %63, 
%62, %64, %64, %61, %61), scope: __module.self_layer3_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.238 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.214, %self.self_layer3_0_downsample_0.weight_fused_bn.62, %328_fused_bn.162, %67, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.242 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.234, %input.238, %62) # .1:93:0 %input.246 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.242), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.250 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.246, %self.self_layer3_1_conv1.weight_fused_bn.59, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.254 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.250), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.258 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.254, %self.self_layer3_1_conv2.weight_fused_bn.57, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.262 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.258), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.266 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.262, %self.self_layer3_1_conv3.weight_fused_bn.55, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.270 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.266, %input.246, %62) # .1:103:0 %input.274 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.270), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.278 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.274, %self.self_layer3_2_conv1.weight_fused_bn.52, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.282 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.278), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.286 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.282, %self.self_layer3_2_conv2.weight_fused_bn.50, %328_fused_bn.238, %65, %65, %65, %64, 
%63, %62, %64, %64, %61, %61), scope: __module.self_layer3_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.290 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.286), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.294 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.290, %self.self_layer3_2_conv3.weight_fused_bn.48, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.298 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.294, %input.274, %62) # .1:113:0 %input.302 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.298), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.306 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.302, %self.self_layer3_3_conv1.weight_fused_bn.45, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.310 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.306), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.314 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.310, %self.self_layer3_3_conv2.weight_fused_bn.43, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.318 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.314), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.322 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.318, %self.self_layer3_3_conv3.weight_fused_bn.41, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.326 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.322, %input.302, %62) # .1:123:0 %input.330 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.326), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.334 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.330, %self.self_layer3_4_conv1.weight_fused_bn.38, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_4_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.338 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.334), scope: __module.self_layer3_4_relu # 
/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.342 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.338, %self.self_layer3_4_conv2.weight_fused_bn.36, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_4_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.346 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.342), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.350 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.346, %self.self_layer3_4_conv3.weight_fused_bn.34, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_4_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.354 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.350, %input.330, %62) # .1:133:0 %input.358 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.354), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.362 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.358, %self.self_layer3_5_conv1.weight_fused_bn.31, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_5_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.366 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.362), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.370 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.366, %self.self_layer3_5_conv2.weight_fused_bn.29, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_5_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.374 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.370), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.378 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.374, %self.self_layer3_5_conv3.weight_fused_bn.27, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_5_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.382 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.378, %input.358, %62) # .1:143:0 %input.386 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.382), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.390 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.386, %self.self_layer4_0_conv1.weight_fused_bn.24, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_0_conv1 # 
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.394 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.390), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.398 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.394, %self.self_layer4_0_conv2.weight_fused_bn.22, %328_fused_bn.214, %67, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.402 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.398), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.406 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.402, %self.self_layer4_0_conv3.weight_fused_bn.20, %328_fused_bn.112, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.410 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.386, %self.self_layer4_0_downsample_0.weight_fused_bn.19, %328_fused_bn.112, %67, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.414 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.406, %input.410, %62) # .1:155:0 %input.418 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.414), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.422 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.418, %self.self_layer4_1_conv1.weight_fused_bn.16, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.426 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.422), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.430 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.426, %self.self_layer4_1_conv2.weight_fused_bn.14, %328_fused_bn.214, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.434 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.430), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.438 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.434, %self.self_layer4_1_conv3.weight_fused_bn.12, %328_fused_bn.112, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.442 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.438, 
%input.418, %62) # .1:165:0 %input.446 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.442), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.450 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.446, %self.self_layer4_2_conv1.weight_fused_bn.9, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.454 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.450), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.458 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.454, %self.self_layer4_2_conv2.weight_fused_bn.7, %328_fused_bn.214, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.462 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.458), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %input.466 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.462, %self.self_layer4_2_conv3.weight_fused_bn.5, %328_fused_bn.112, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0 %input.470 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.466, %input.446, %62) # .1:175:0 %input.474 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.470), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0 %self_avgpool.2 : Float(1, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = aten::adaptive_avg_pool2d(%input.474, %65), scope: __module.self_avgpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1214:0 %input.478 : Float(1, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = aten::flatten(%self_avgpool.2, %62, %3) # .1:178:0 %191 : Float(1, 1000, strides=[1000, 1], requires_grad=0, device=cpu) = aten::linear(%input.478, %self.self_fc.weight, %self.self_fc.bias), scope: __module.self_fc # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py:114:0 return (%191) Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network 
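The TorchScript IR listed above is a ResNet-50-style graph (bottleneck blocks in four stages, ending in adaptive average pooling, flatten, and a 1000-way linear layer), and it is this graph that DLS walks when it adds the prim::Constant, aten::_convolution, and other layers to its network in the messages that follow. A minimal sketch of how a comparable graph dump can be produced with stock PyTorch/torchvision is given below; this is not the pipeline that produced this log (the log's graph has batch norm already folded into the convolution weights, and module names will differ), only an illustration.

import torch
import torchvision

# Minimal sketch, assuming torch and torchvision (>= 0.13 weights API) are installed.
# A plain trace keeps batch_norm nodes separate instead of the *_fused_bn constants
# seen above, but the aten::_convolution / aten::relu_ / aten::add_ / aten::linear
# structure is the same.
model = torchvision.models.resnet50(weights=None).eval()
example = torch.randn(1, 3, 224, 224)  # same input shape as the graph above

with torch.no_grad():
    traced = torch.jit.trace(model, example)

print(traced.graph)  # prints a TorchScript IR listing comparable to the dump above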
Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding prim::Constant layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaaddae7140 , Bias 0xaaaae0db6e00 , padding [3, 3] , stride [2, 2] , dilation [1, 1] , groups 1 Registering network input: Conv input index: 0 Creating blob for Input layer 3 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [1, 3, 224, 224] Creating blob for Data layer 4 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [64, 3, 7, 7] Creating blob for Data layer 5 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [64] Adding aten::relu_ layer to network Adding aten::max_pool2d layer to network Binding inputs for Max_pooling layer kernel_size [3, 3] , dilation [1, 1] , padding [1, 1] , stride [2, 2] , ceil_mode 0 Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaade743780 , Bias 0xaaaae0db6e00 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 9 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [64, 64, 1, 1] Creating blob for Data layer 10 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [64] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaaddafa780 , Bias 0xaaaae0db6e00 , padding [1, 1] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 13 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [64, 64, 3, 3] Creating blob for Data layer 14 with type FLOAT format 
PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [64] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaade77ed80 , Bias 0xaaaae0dc7bc0 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 17 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [256, 64, 1, 1] Creating blob for Data layer 18 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [256] Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae1f62200 , Bias 0xaaaae0dc7bc0 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 20 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [256, 64, 1, 1] Creating blob for Data layer 21 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [256] Adding aten::add_ layer to network Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaaddb1e800 , Bias 0xaaaae0db6e00 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 25 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [64, 256, 1, 1] Creating blob for Data layer 26 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [64] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaaddb2e880 , Bias 0xaaaae0db6e00 , padding [1, 1] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 29 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [64, 64, 3, 3] Creating blob for Data layer 30 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [64] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae1f7e7c0 , Bias 0xaaaae0dc7bc0 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 33 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [256, 64, 1, 1] Creating blob for Data layer 34 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [256] Adding aten::add_ layer to network Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae1f8e800 , Bias 0xaaaae0db6e00 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 38 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [64, 256, 1, 1] Creating blob for Data layer 39 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [64] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae1f9e880 , Bias 0xaaaae0db6e00 , padding [1, 1] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 42 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [64, 64, 3, 3] Creating blob for Data layer 43 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [64] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae1fc28c0 , Bias 0xaaaae0dc7bc0 , padding [0, 0] , 
stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 46 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [256, 64, 1, 1] Creating blob for Data layer 47 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [256] Adding aten::add_ layer to network Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae1fd2940 , Bias 0xaaaae2031440 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 51 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [128, 256, 1, 1] Creating blob for Data layer 52 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [128] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae20ea240 , Bias 0xaaaae2031440 , padding [1, 1] , stride [2, 2] , dilation [1, 1] , groups 1 Creating blob for Data layer 55 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [128, 128, 3, 3] Creating blob for Data layer 56 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [128] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae1e79640 , Bias 0xaaaae0dbf340 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 59 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [512, 128, 1, 1] Creating blob for Data layer 60 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [512] Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae1eb9680 , Bias 0xaaaae0dbf340 , padding [0, 0] , stride [2, 2] , dilation [1, 1] , groups 1 Creating blob for Data layer 62 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [512, 256, 1, 1] Creating blob for Data layer 63 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [512] Adding aten::add_ layer to network Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaadf4d7a40 , Bias 0xaaaae2031440 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 67 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [128, 512, 1, 1] Creating blob for Data layer 68 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [128] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaadf517a80 , Bias 0xaaaae2031440 , padding [1, 1] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 71 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [128, 128, 3, 3] Creating blob for Data layer 72 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [128] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaadf5a7ac0 , Bias 0xaaaae0dbf340 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 75 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [512, 128, 1, 1] Creating blob for Data layer 76 with type FLOAT format 
PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [512] Adding aten::add_ layer to network Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaadf5e7b40 , Bias 0xaaaae2031440 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 80 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [128, 512, 1, 1] Creating blob for Data layer 81 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [128] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaadd9477c0 , Bias 0xaaaae2031440 , padding [1, 1] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 84 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [128, 128, 3, 3] Creating blob for Data layer 85 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [128] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaadd9d7800 , Bias 0xaaaae0dbf340 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 88 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [512, 128, 1, 1] Creating blob for Data layer 89 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [512] Adding aten::add_ layer to network Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaadda17880 , Bias 0xaaaae2031440 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 93 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [128, 512, 1, 1] Creating blob for Data layer 94 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [128] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae2272240 , Bias 0xaaaae2031440 , padding [1, 1] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 97 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [128, 128, 3, 3] Creating blob for Data layer 98 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [128] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaadda578c0 , Bias 0xaaaae0dbf340 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 101 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [512, 128, 1, 1] Creating blob for Data layer 102 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [512] Adding aten::add_ layer to network Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae2302280 , Bias 0xaaaae0dc7bc0 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 106 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [256, 512, 1, 1] Creating blob for Data layer 107 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [256] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding 
inputs for Convolution Layer Weight 0xaaaae1859580 , Bias 0xaaaae0dc7bc0 , padding [1, 1] , stride [2, 2] , dilation [1, 1] , groups 1 Creating blob for Data layer 110 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [256, 256, 3, 3] Creating blob for Data layer 111 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [256] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae2382300 , Bias 0xaaaade75f5c0 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 114 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [1024, 256, 1, 1] Creating blob for Data layer 115 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [1024] Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae1b695c0 , Bias 0xaaaade75f5c0 , padding [0, 0] , stride [2, 2] , dilation [1, 1] , groups 1 Creating blob for Data layer 117 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [1024, 512, 1, 1] Creating blob for Data layer 118 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [1024] Adding aten::add_ layer to network Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae1d69600 , Bias 0xaaaae0dc7bc0 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 122 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [256, 1024, 1, 1] Creating blob for Data layer 123 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [256] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae1175480 , Bias 0xaaaae0dc7bc0 , padding [1, 1] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 126 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [256, 256, 3, 3] Creating blob for Data layer 127 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [256] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae0a91380 , Bias 0xaaaade75f5c0 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 130 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [1024, 256, 1, 1] Creating blob for Data layer 131 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [1024] Adding aten::add_ layer to network Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae0b913c0 , Bias 0xaaaae0dc7bc0 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 135 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [256, 1024, 1, 1] Creating blob for Data layer 136 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [256] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae14e7500 , Bias 0xaaaae0dc7bc0 , padding [1, 1] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 139 with type FLOAT format 
PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [256, 256, 3, 3] Creating blob for Data layer 140 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [256] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae0c91400 , Bias 0xaaaade75f5c0 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 143 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [1024, 256, 1, 1] Creating blob for Data layer 144 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [1024] Adding aten::add_ layer to network Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae06bd340 , Bias 0xaaaae0dc7bc0 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 148 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [256, 1024, 1, 1] Creating blob for Data layer 149 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [256] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae009d280 , Bias 0xaaaae0dc7bc0 , padding [1, 1] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 152 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [256, 256, 3, 3] Creating blob for Data layer 153 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [256] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae07bd380 , Bias 0xaaaade75f5c0 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 156 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [1024, 256, 1, 1] Creating blob for Data layer 157 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [1024] Adding aten::add_ layer to network Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae08bd400 , Bias 0xaaaae0dc7bc0 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 161 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [256, 1024, 1, 1] Creating blob for Data layer 162 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [256] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae03ad300 , Bias 0xaaaae0dc7bc0 , padding [1, 1] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 165 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [256, 256, 3, 3] Creating blob for Data layer 166 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [256] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaadf6845c0 , Bias 0xaaaade75f5c0 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 169 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [1024, 256, 1, 1] Creating blob for Data layer 170 with type FLOAT format 
PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [1024] Adding aten::add_ layer to network Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaadf784600 , Bias 0xaaaae0dc7bc0 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 174 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [256, 1024, 1, 1] Creating blob for Data layer 175 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [256] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaadfd8d240 , Bias 0xaaaae0dc7bc0 , padding [1, 1] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 178 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [256, 256, 3, 3] Creating blob for Data layer 179 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [256] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaadf884680 , Bias 0xaaaade75f5c0 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 182 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [1024, 256, 1, 1] Creating blob for Data layer 183 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [1024] Adding aten::add_ layer to network Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaadf994600 , Bias 0xaaaae0dbf340 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 187 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [512, 1024, 1, 1] Creating blob for Data layer 188 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [512] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xffff6d08d040 , Bias 0xaaaae0dbf340 , padding [1, 1] , stride [2, 2] , dilation [1, 1] , groups 1 Creating blob for Data layer 191 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [512, 512, 3, 3] Creating blob for Data layer 192 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [512] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae33ac440 , Bias 0xaaaaddaf4500 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 195 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [2048, 512, 1, 1] Creating blob for Data layer 196 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [2048] Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xffff6c88c040 , Bias 0xaaaaddaf4500 , padding [0, 0] , stride [2, 2] , dilation [1, 1] , groups 1 Creating blob for Data layer 198 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [2048, 1024, 1, 1] Creating blob for Data layer 199 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [2048] Adding aten::add_ layer to network Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for 
Convolution Layer Weight 0xaaaae37ac480 , Bias 0xaaaae0dbf340 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 203 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [512, 2048, 1, 1] Creating blob for Data layer 204 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [512] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xffff5d6ff040 , Bias 0xaaaae0dbf340 , padding [1, 1] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 207 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [512, 512, 3, 3] Creating blob for Data layer 208 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [512] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae3bac500 , Bias 0xaaaaddaf4500 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 211 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [2048, 512, 1, 1] Creating blob for Data layer 212 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [2048] Adding aten::add_ layer to network Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae3fac580 , Bias 0xaaaae0dbf340 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 216 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [512, 2048, 1, 1] Creating blob for Data layer 217 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [512] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xffff5cdfe040 , Bias 0xaaaae0dbf340 , padding [1, 1] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 220 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [512, 512, 3, 3] Creating blob for Data layer 221 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [512] Adding aten::relu_ layer to network Adding aten::_convolution layer to network Binding inputs for Convolution Layer Weight 0xaaaae43ac5c0 , Bias 0xaaaaddaf4500 , padding [0, 0] , stride [1, 1] , dilation [1, 1] , groups 1 Creating blob for Data layer 224 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [2048, 512, 1, 1] Creating blob for Data layer 225 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [2048] Adding aten::add_ layer to network Adding aten::relu_ layer to network Adding aten::adaptive_avg_pool2d layer to network Binding inputs for Adaptive_avg_pooling Layer kernel_size [1, 1] Adding aten::flatten layer to network Adding aten::linear layer to network Binding inputs for Linear layer Weight 0xffff6f88e040 , Bias 0xaaaadcb52dc0 Creating blob for Data layer 231 with type FLOAT format PlainDataFormat(FORMATF_ROW_MAJOR)[0x0000000000000004] shape [1000, 2048] Creating blob for Data layer 232 with type FLOAT format PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001] shape [1000] Running AIO Network Allocating 16 bytes (aligned) Allocating 16 bytes (aligned) Allocating 16 bytes (aligned) Allocating 16 bytes (aligned) Allocating 16 bytes (aligned) Allocating 16 bytes 
(aligned) Allocating 16 bytes (aligned) [repeated identical 16-byte allocation messages elided] Layer FullyConnected got PlainDataFormat(FORMATF_BATCH_ROW_MAJOR)[0x0000000000000015] while it prefers PlainDataFormat(FORMATF_ROW_MAJOR)[0x0000000000000004] but no such conversion is available in DLS Allocating 16 bytes (aligned) [repeated identical 16-byte allocation messages elided]
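The entries that follow show DLS selecting a kernel for each layer and, for the convolutions, tuning a ConvTask that records the full geometry (input/output size, kernel, stride, padding, dilation) together with the preset it settles on. The output sizes reported in those tasks can be checked against the standard convolution size formula; a small plain-Python sketch (not DLS code) is given here for reference.

# Plain-Python sanity check, not DLS code: output spatial sizes in the ConvTask
# tuning entries follow the usual convolution formula.
def conv_out(size, kernel, stride, pad, dilation=1):
    return (size + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1

assert conv_out(224, kernel=7, stride=2, pad=3) == 112  # first tuned 7x7, stride-2 task
assert conv_out(56, kernel=1, stride=2, pad=0) == 28    # stride-2 1x1 downsample convolutions
assert conv_out(28, kernel=1, stride=2, pad=0) == 14
assert conv_out(14, kernel=1, stride=2, pad=0) == 7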
Selected kernel Data for layer Data : [identical kernel-selection messages for the remaining constant Data layers elided] Selected kernel Data for layer
Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Data for layer Data : Selected kernel Input for layer Input : Conv input Selected kernel TransposeBRC3x4 for layer Transpose : Selected kernel ConvViaJitMatmul for layer Convolution : PlatformInfo(vendor_id=3, cpu_family=8, cpu_model=3340, isa=NEON, L1=CacheInfo(size=65536, inclusive=1, share_count=1), L2=CacheInfo(size=1048576, inclusive=0, share_count=1), L3=CacheInfo(size=33554432, inclusive=0, share_count=80)) Tuning ConvTask(batch=1,idepth=1,iheight=224,iwidth=224,ichannels=3,odepth=1,oheight=112,owidth=112,ochannels=64,kdepth=1,kheight=7,kwidth=7,dstride=1,hstride=2,wstride=2,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=3,wpad=3,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score CvjmPreset(in_regs=7,w_regs=3,batch_tile=3136,inpf_step_tile=49,outf_tile=64) Allocating 42336 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel MaxPoolingMeta@NEON for layer Pooling : Selected kernel ConvOneJit for layer Convolution : /usr/local/share//libampere-aio/data/lookup_files/conv_one_jit.csv 1 Could not parse lookup entry: Missing column task.extbatch Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=64,odepth=1,oheight=56,owidth=56,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=64,n_minibatch=3136) Allocating 65536 bytes (aligned) Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=64,odepth=1,oheight=56,owidth=56,ochannels=64,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=32,in_mode=MS,d_minibatch=64,n_minibatch=3136) Allocating 16384 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=64,odepth=1,oheight=56,owidth=56,ochannels=64,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score CvjmPreset(in_regs=6,w_regs=4,batch_tile=784,inpf_step_tile=9,outf_tile=64) Allocating 147456 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning 
ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=64,odepth=1,oheight=56,owidth=56,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=64,n_minibatch=3136) Allocating 65536 bytes (aligned) Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=256,odepth=1,oheight=56,owidth=56,ochannels=64,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=32,in_mode=MS,d_minibatch=256,n_minibatch=3136) Allocating 65536 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=64,odepth=1,oheight=56,owidth=56,ochannels=64,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup CvjmPreset(in_regs=6,w_regs=4,batch_tile=784,inpf_step_tile=9,outf_tile=64) Allocating 147456 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=64,odepth=1,oheight=56,owidth=56,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=64,n_minibatch=3136) Allocating 65536 bytes (aligned) Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=256,odepth=1,oheight=56,owidth=56,ochannels=64,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=32,in_mode=MS,d_minibatch=256,n_minibatch=3136) Allocating 65536 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=64,odepth=1,oheight=56,owidth=56,ochannels=64,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup CvjmPreset(in_regs=6,w_regs=4,batch_tile=784,inpf_step_tile=9,outf_tile=64) Allocating 147456 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=64,odepth=1,oheight=56,owidth=56,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via 
lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=64,n_minibatch=3136) Allocating 65536 bytes (aligned) Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=256,odepth=1,oheight=28,owidth=28,ochannels=512,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=2,wstride=2,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score CvjmPreset(in_regs=7,w_regs=3,batch_tile=196,inpf_step_tile=1,outf_tile=512) Allocating 528384 bytes (aligned) Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=256,odepth=1,oheight=56,owidth=56,ochannels=128,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=256,n_minibatch=3136) Allocating 131072 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=128,odepth=1,oheight=28,owidth=28,ochannels=128,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=2,wstride=2,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score CvjmPreset(in_regs=6,w_regs=4,batch_tile=196,inpf_step_tile=9,outf_tile=128) Allocating 589824 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=128,odepth=1,oheight=28,owidth=28,ochannels=512,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=512,in_mode=MS,d_minibatch=32,n_minibatch=196) Allocating 264192 bytes (aligned) Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=512,odepth=1,oheight=28,owidth=28,ochannels=128,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=512,n_minibatch=784) Allocating 262144 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=128,odepth=1,oheight=28,owidth=28,ochannels=128,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score CvjmPreset(in_regs=6,w_regs=4,batch_tile=196,inpf_step_tile=9,outf_tile=128) Allocating 589824 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning 
ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=128,odepth=1,oheight=28,owidth=28,ochannels=512,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=512,in_mode=MS,d_minibatch=32,n_minibatch=196) Allocating 264192 bytes (aligned) Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=512,odepth=1,oheight=28,owidth=28,ochannels=128,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=512,n_minibatch=784) Allocating 262144 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=128,odepth=1,oheight=28,owidth=28,ochannels=128,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup CvjmPreset(in_regs=6,w_regs=4,batch_tile=196,inpf_step_tile=9,outf_tile=128) Allocating 589824 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=128,odepth=1,oheight=28,owidth=28,ochannels=512,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=512,in_mode=MS,d_minibatch=32,n_minibatch=196) Allocating 264192 bytes (aligned) Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=512,odepth=1,oheight=28,owidth=28,ochannels=128,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=512,n_minibatch=784) Allocating 262144 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=128,odepth=1,oheight=28,owidth=28,ochannels=128,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup CvjmPreset(in_regs=6,w_regs=4,batch_tile=196,inpf_step_tile=9,outf_tile=128) Allocating 589824 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=128,odepth=1,oheight=28,owidth=28,ochannels=512,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset 
via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=512,in_mode=MS,d_minibatch=32,n_minibatch=196) Allocating 264192 bytes (aligned) Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=512,odepth=1,oheight=14,owidth=14,ochannels=1024,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=2,wstride=2,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score CvjmPreset(in_regs=6,w_regs=4,batch_tile=196,inpf_step_tile=1,outf_tile=32) Allocating 2 MB (numa) Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=512,odepth=1,oheight=28,owidth=28,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=512,n_minibatch=784) Allocating 524288 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=2,wstride=2,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=256) Allocating 2 MB (numa) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=1024,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=1024,in_mode=MS,d_minibatch=144,n_minibatch=49) Allocating 1 MB (numa) Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=1024,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=1024,n_minibatch=196) Allocating 1048576 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=256) Allocating 2 MB (numa) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning 
ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=1024,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=1024,in_mode=MS,d_minibatch=144,n_minibatch=49) Allocating 1 MB (numa) Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=1024,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=1024,n_minibatch=196) Allocating 1048576 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=256) Allocating 2 MB (numa) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=1024,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=1024,in_mode=MS,d_minibatch=144,n_minibatch=49) Allocating 1 MB (numa) Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=1024,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=1024,n_minibatch=196) Allocating 1048576 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=256) Allocating 2 MB (numa) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=1024,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup 
ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=1024,in_mode=MS,d_minibatch=144,n_minibatch=49) Allocating 1 MB (numa) Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=1024,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=1024,n_minibatch=196) Allocating 1048576 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=256) Allocating 2 MB (numa) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=1024,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=1024,in_mode=MS,d_minibatch=144,n_minibatch=49) Allocating 1 MB (numa) Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=1024,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=1024,n_minibatch=196) Allocating 1048576 bytes (aligned) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=256) Allocating 2 MB (numa) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=1024,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=1024,in_mode=MS,d_minibatch=144,n_minibatch=49) Allocating 1 MB (numa) Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer 
Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=1024,odepth=1,oheight=7,owidth=7,ochannels=2048,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=2,wstride=2,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=1,outf_tile=24) Allocating 8 MB (numa) Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=1024,odepth=1,oheight=14,owidth=14,ochannels=512,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=1024,n_minibatch=196) Allocating 2 MB (numa) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=512,odepth=1,oheight=7,owidth=7,ochannels=512,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=2,wstride=2,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=24) Allocating 9 MB (numa) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=7,iwidth=7,ichannels=512,odepth=1,oheight=7,owidth=7,ochannels=2048,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=24,in_mode=MS,d_minibatch=512,n_minibatch=49) Allocating 4 MB (numa) Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=7,iwidth=7,ichannels=2048,odepth=1,oheight=7,owidth=7,ochannels=512,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=2048,n_minibatch=49) Allocating 4 MB (numa) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=7,iwidth=7,ichannels=512,odepth=1,oheight=7,owidth=7,ochannels=512,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=24) Allocating 9 MB (numa) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=7,iwidth=7,ichannels=512,odepth=1,oheight=7,owidth=7,ochannels=2048,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=24,in_mode=MS,d_minibatch=512,n_minibatch=49) Allocating 4 MB (numa) Selected kernel BinaryOpVectorized[Add]@NEON for 
layer Add : Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=7,iwidth=7,ichannels=2048,odepth=1,oheight=7,owidth=7,ochannels=512,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=2048,n_minibatch=49) Allocating 4 MB (numa) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvViaJitMatmul for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=7,iwidth=7,ichannels=512,odepth=1,oheight=7,owidth=7,ochannels=512,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=24) Allocating 9 MB (numa) Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel ConvOneJit for layer Convolution : Tuning ConvTask(batch=1,idepth=1,iheight=7,iwidth=7,ichannels=512,odepth=1,oheight=7,owidth=7,ochannels=2048,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=24,in_mode=MS,d_minibatch=512,n_minibatch=49) Allocating 4 MB (numa) Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : Selected kernel AdaptiveAvgPoolingMeta@NEON for layer AdaptiveAvgPool : Selected kernel TransposeIndexed for layer Transpose : Allocating 8192 bytes (aligned) Selected kernel ForwardingKernelFlatten for layer Flatten : Selected kernel FCViaConvOne for layer FullyConnected : Selected kernel ForwardingKernelOutput for layer Output : Merge of ( Transpose [1, 224, 224, 3] ) to Conv input ( Input Input ): Target layer type is not mergeable Merge of ( Convolution [1, 112, 112, 64] ) to ( Transpose TransposeBRC3x4 ): Target layer type is not mergeable Considering merge of RELU to ConvViaJitMatmul Merge of ( RELU [1, 112, 112, 64] ) to ( Convolution ConvViaJitMatmul ): Successful Merge of ( Pooling [1, 56, 56, 64] ) to ( Convolution ConvViaJitMatmul ): Target layer type is not mergeable Considering merge of Add to ConvOneJit Merge of ( Add [1, 56, 56, 256] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 56, 56, 256] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 56, 56, 64] ) to ( Convolution ConvOneJit ): Successful Merge of ( Convolution [1, 56, 56, 64] ) to ( Convolution ConvOneJit ): Target layer type is not mergeable Considering merge of RELU to ConvViaJitMatmul Merge of ( RELU [1, 56, 56, 64] ) to ( Convolution ConvViaJitMatmul ): Successful Merge of ( Convolution [1, 56, 56, 256] ) to ( Convolution ConvViaJitMatmul ): Target layer type is not mergeable Considering merge of Add to ConvOneJit Kernel ConvOneJit rejected merge Merge of ( Add [1, 56, 56, 256] ) to ( Convolution ConvOneJit ): Attempt merge failed Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 56, 56, 64] ) to ( Convolution ConvOneJit ): Successful Merge of ( Convolution [1, 56, 56, 64] ) to ( Convolution ConvOneJit ): Target layer type is not mergeable Considering 
merge of RELU to ConvViaJitMatmul Merge of ( RELU [1, 56, 56, 64] ) to ( Convolution ConvViaJitMatmul ): Successful Merge of ( Convolution [1, 56, 56, 256] ) to ( Convolution ConvViaJitMatmul ): Target layer type is not mergeable Considering merge of Add to ConvOneJit Merge of ( Add [1, 56, 56, 256] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 56, 56, 256] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 56, 56, 64] ) to ( Convolution ConvOneJit ): Successful Merge of ( Convolution [1, 56, 56, 64] ) to ( Convolution ConvOneJit ): Target layer type is not mergeable Considering merge of RELU to ConvViaJitMatmul Merge of ( RELU [1, 56, 56, 64] ) to ( Convolution ConvViaJitMatmul ): Successful Merge of ( Convolution [1, 56, 56, 256] ) to ( Convolution ConvViaJitMatmul ): Target layer type is not mergeable Considering merge of Add to ConvOneJit Merge of ( Add [1, 56, 56, 256] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 56, 56, 256] ) to ( Convolution ConvOneJit ): Successful Considering merge of Add to ConvViaJitMatmul Kernel ConvViaJitMatmul rejected merge Merge of ( Add [1, 28, 28, 512] ) to ( Convolution ConvViaJitMatmul ): Attempt merge failed Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 56, 56, 128] ) to ( Convolution ConvOneJit ): Successful Merge of ( Convolution [1, 28, 28, 128] ) to ( Convolution ConvOneJit ): Target layer type is not mergeable Considering merge of RELU to ConvViaJitMatmul Merge of ( RELU [1, 28, 28, 128] ) to ( Convolution ConvViaJitMatmul ): Successful Merge of ( Convolution [1, 28, 28, 512] ) to ( Convolution ConvViaJitMatmul ): Target layer type is not mergeable Considering merge of Add to ConvOneJit Merge of ( Add [1, 28, 28, 512] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 28, 28, 512] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 28, 28, 128] ) to ( Convolution ConvOneJit ): Successful Merge of ( Convolution [1, 28, 28, 128] ) to ( Convolution ConvOneJit ): Target layer type is not mergeable Considering merge of RELU to ConvViaJitMatmul Merge of ( RELU [1, 28, 28, 128] ) to ( Convolution ConvViaJitMatmul ): Successful Merge of ( Convolution [1, 28, 28, 512] ) to ( Convolution ConvViaJitMatmul ): Target layer type is not mergeable Considering merge of Add to ConvOneJit Merge of ( Add [1, 28, 28, 512] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 28, 28, 512] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 28, 28, 128] ) to ( Convolution ConvOneJit ): Successful Merge of ( Convolution [1, 28, 28, 128] ) to ( Convolution ConvOneJit ): Target layer type is not mergeable Considering merge of RELU to ConvViaJitMatmul Merge of ( RELU [1, 28, 28, 128] ) to ( Convolution ConvViaJitMatmul ): Successful Merge of ( Convolution [1, 28, 28, 512] ) to ( Convolution ConvViaJitMatmul ): Target layer type is not mergeable Considering merge of Add to ConvOneJit Merge of ( Add [1, 28, 28, 512] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 28, 28, 512] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 28, 28, 128] ) to ( Convolution ConvOneJit ): Successful 
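The merge messages above and below trace the layer-fusion pass: elementwise layers (RELU, Add) are folded into the convolution that produces their input whenever the producing layer's type is mergeable and its kernel agrees to absorb the extra op, which is why the jitted convolution kernels later report postprocessing chains such as PP[BINOP_ADD_LINEAR,RELU,...]. The sketch below is an assumed, illustrative shape of that decision loop with invented names, not the library's code.

    # Illustrative fusion pass suggested by the merge log lines (hypothetical names).
    from dataclasses import dataclass, field
    from typing import List

    MERGEABLE_TARGETS = {"Convolution"}   # elementwise ops fold into conv kernels here

    @dataclass
    class Layer:
        layer_type: str                   # "Convolution", "RELU", "Add", "Pooling", ...
        kernel: str                       # kernel selected for this layer, e.g. "ConvOneJit"
        shape: List[int]
        postprocessing: List[str] = field(default_factory=list)

        def kernel_accepts(self, consumer: "Layer") -> bool:
            # A real kernel would check register pressure, layouts, etc.
            return len(self.postprocessing) < 2

    def try_merge(consumer: Layer, producer: Layer) -> bool:
        if producer.layer_type not in MERGEABLE_TARGETS:
            print(f"Merge of ( {consumer.layer_type} {consumer.shape} ) to "
                  f"( {producer.layer_type} {producer.kernel} ): Target layer type is not mergeable")
            return False
        print(f"Considering merge of {consumer.layer_type} to {producer.kernel}")
        if not producer.kernel_accepts(consumer):
            print(f"Kernel {producer.kernel} rejected merge")
            return False
        producer.postprocessing.append(consumer.layer_type)   # ends up in the PP[...] chain
        print(f"Merge of ( {consumer.layer_type} {consumer.shape} ) to "
              f"( {producer.layer_type} {producer.kernel} ): Successful")
        return True

    conv = Layer("Convolution", "ConvOneJit", [1, 56, 56, 256])
    try_merge(Layer("Add", "BinaryOpVectorized[Add]@NEON", [1, 56, 56, 256]), conv)
    try_merge(Layer("RELU", "UnaryOpVectorized[RELU]@NEON", [1, 56, 56, 256]), conv)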
Merge of ( Convolution [1, 28, 28, 128] ) to ( Convolution ConvOneJit ): Target layer type is not mergeable Considering merge of RELU to ConvViaJitMatmul Merge of ( RELU [1, 28, 28, 128] ) to ( Convolution ConvViaJitMatmul ): Successful Merge of ( Convolution [1, 28, 28, 512] ) to ( Convolution ConvViaJitMatmul ): Target layer type is not mergeable Considering merge of Add to ConvOneJit Merge of ( Add [1, 28, 28, 512] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 28, 28, 512] ) to ( Convolution ConvOneJit ): Successful Considering merge of Add to ConvViaJitMatmul Kernel ConvViaJitMatmul rejected merge Merge of ( Add [1, 14, 14, 1024] ) to ( Convolution ConvViaJitMatmul ): Attempt merge failed Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 28, 28, 256] ) to ( Convolution ConvOneJit ): Successful Merge of ( Convolution [1, 14, 14, 256] ) to ( Convolution ConvOneJit ): Target layer type is not mergeable Considering merge of RELU to ConvViaJitMatmul Merge of ( RELU [1, 14, 14, 256] ) to ( Convolution ConvViaJitMatmul ): Successful Merge of ( Convolution [1, 14, 14, 1024] ) to ( Convolution ConvViaJitMatmul ): Target layer type is not mergeable Considering merge of Add to ConvOneJit Merge of ( Add [1, 14, 14, 1024] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 14, 14, 1024] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 14, 14, 256] ) to ( Convolution ConvOneJit ): Successful Merge of ( Convolution [1, 14, 14, 256] ) to ( Convolution ConvOneJit ): Target layer type is not mergeable Considering merge of RELU to ConvViaJitMatmul Merge of ( RELU [1, 14, 14, 256] ) to ( Convolution ConvViaJitMatmul ): Successful Merge of ( Convolution [1, 14, 14, 1024] ) to ( Convolution ConvViaJitMatmul ): Target layer type is not mergeable Considering merge of Add to ConvOneJit Merge of ( Add [1, 14, 14, 1024] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 14, 14, 1024] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 14, 14, 256] ) to ( Convolution ConvOneJit ): Successful Merge of ( Convolution [1, 14, 14, 256] ) to ( Convolution ConvOneJit ): Target layer type is not mergeable Considering merge of RELU to ConvViaJitMatmul Merge of ( RELU [1, 14, 14, 256] ) to ( Convolution ConvViaJitMatmul ): Successful Merge of ( Convolution [1, 14, 14, 1024] ) to ( Convolution ConvViaJitMatmul ): Target layer type is not mergeable Considering merge of Add to ConvOneJit Merge of ( Add [1, 14, 14, 1024] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 14, 14, 1024] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 14, 14, 256] ) to ( Convolution ConvOneJit ): Successful Merge of ( Convolution [1, 14, 14, 256] ) to ( Convolution ConvOneJit ): Target layer type is not mergeable Considering merge of RELU to ConvViaJitMatmul Merge of ( RELU [1, 14, 14, 256] ) to ( Convolution ConvViaJitMatmul ): Successful Merge of ( Convolution [1, 14, 14, 1024] ) to ( Convolution ConvViaJitMatmul ): Target layer type is not mergeable Considering merge of Add to ConvOneJit Merge of ( Add [1, 14, 14, 1024] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 14, 14, 1024] ) to ( Convolution 
ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 14, 14, 256] ) to ( Convolution ConvOneJit ): Successful Merge of ( Convolution [1, 14, 14, 256] ) to ( Convolution ConvOneJit ): Target layer type is not mergeable Considering merge of RELU to ConvViaJitMatmul Merge of ( RELU [1, 14, 14, 256] ) to ( Convolution ConvViaJitMatmul ): Successful Merge of ( Convolution [1, 14, 14, 1024] ) to ( Convolution ConvViaJitMatmul ): Target layer type is not mergeable Considering merge of Add to ConvOneJit Merge of ( Add [1, 14, 14, 1024] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 14, 14, 1024] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 14, 14, 256] ) to ( Convolution ConvOneJit ): Successful Merge of ( Convolution [1, 14, 14, 256] ) to ( Convolution ConvOneJit ): Target layer type is not mergeable Considering merge of RELU to ConvViaJitMatmul Merge of ( RELU [1, 14, 14, 256] ) to ( Convolution ConvViaJitMatmul ): Successful Merge of ( Convolution [1, 14, 14, 1024] ) to ( Convolution ConvViaJitMatmul ): Target layer type is not mergeable Considering merge of Add to ConvOneJit Merge of ( Add [1, 14, 14, 1024] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 14, 14, 1024] ) to ( Convolution ConvOneJit ): Successful Considering merge of Add to ConvViaJitMatmul Kernel ConvViaJitMatmul rejected merge Merge of ( Add [1, 7, 7, 2048] ) to ( Convolution ConvViaJitMatmul ): Attempt merge failed Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 14, 14, 512] ) to ( Convolution ConvOneJit ): Successful Merge of ( Convolution [1, 7, 7, 512] ) to ( Convolution ConvOneJit ): Target layer type is not mergeable Considering merge of RELU to ConvViaJitMatmul Merge of ( RELU [1, 7, 7, 512] ) to ( Convolution ConvViaJitMatmul ): Successful Merge of ( Convolution [1, 7, 7, 2048] ) to ( Convolution ConvViaJitMatmul ): Target layer type is not mergeable Considering merge of Add to ConvOneJit Merge of ( Add [1, 7, 7, 2048] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 7, 7, 2048] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 7, 7, 512] ) to ( Convolution ConvOneJit ): Successful Merge of ( Convolution [1, 7, 7, 512] ) to ( Convolution ConvOneJit ): Target layer type is not mergeable Considering merge of RELU to ConvViaJitMatmul Merge of ( RELU [1, 7, 7, 512] ) to ( Convolution ConvViaJitMatmul ): Successful Merge of ( Convolution [1, 7, 7, 2048] ) to ( Convolution ConvViaJitMatmul ): Target layer type is not mergeable Considering merge of Add to ConvOneJit Merge of ( Add [1, 7, 7, 2048] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 7, 7, 2048] ) to ( Convolution ConvOneJit ): Successful Considering merge of RELU to ConvOneJit Merge of ( RELU [1, 7, 7, 512] ) to ( Convolution ConvOneJit ): Successful Merge of ( Convolution [1, 7, 7, 512] ) to ( Convolution ConvOneJit ): Target layer type is not mergeable Considering merge of RELU to ConvViaJitMatmul Merge of ( RELU [1, 7, 7, 512] ) to ( Convolution ConvViaJitMatmul ): Successful Merge of ( Convolution [1, 7, 7, 2048] ) to ( Convolution ConvViaJitMatmul ): Target layer type is not mergeable Considering merge of Add to ConvOneJit Merge of ( Add [1, 7, 7, 2048] ) to ( Convolution ConvOneJit ): 
Successful
Considering merge of RELU to ConvOneJit
Merge of ( RELU [1, 7, 7, 2048] ) to ( Convolution ConvOneJit ): Successful
Merge of ( AdaptiveAvgPool [1, 1, 1, 2048] ) to ( Convolution ConvOneJit ): Target layer type is not mergeable
Merge of ( Transpose [1, 2048, 1, 1] ) to ( AdaptiveAvgPool AdaptiveAvgPoolingMeta@NEON ): Target layer type is not mergeable
Merge of ( Flatten [1, 2048] ) to ( Transpose TransposeIndexed ): Target layer type is not mergeable
Merge of ( FullyConnected [1, 1000] ) to ( Flatten ForwardingKernelFlatten ): Target layer type is not mergeable
Merge of ( Output [1, 1000] ) to ( FullyConnected FCViaConvOne ): Target layer type is not mergeable
External allocation: allocating 4000 bytes
Creating external output: 0 , shape: [1, 1000]
Allocating 602112 bytes (aligned)
Allocating 3 MB (numa)
Allocating 802816 bytes (aligned)
Allocating 802816 bytes (aligned)
Allocating 802816 bytes (aligned)
Allocating 3 MB (numa)
Allocating 1 MB (numa)
Allocating 1 MB (numa)
Allocating 401408 bytes (aligned)
Allocating 200704 bytes (aligned)
Allocating 200704 bytes (aligned)
Allocating 100352 bytes (aligned)
Allocating 100352 bytes (aligned)
Allocating 8192 bytes (aligned)
Allocating 8192 bytes (aligned)
Running Data
(last message repeated once per constant Data layer)
Running Input
Running TransposeBRC3x4
Running ConvViaJitMatmul
Allocating 2 MB (numa)
Jitted kernel for init: in_mode: ProxyInput acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [7x3] out_features: 12 in_tail_cols: 3 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff8a440000 , used 6464 B
Jitted kernel for init: in_mode: ProxyInput acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [7x3] out_features: 4 in_tail_cols: 3 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0
strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff81d50000 , used 3748 B Running MaxPoolingMeta@NEON Running ConvOneJit Scratches: 0 @ 0 Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [6x4] out_features: 16 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff81d20000 , used 4076 B Running ConvViaJitMatmul Allocating 112896 bytes (aligned) Jitted kernel for init: in_mode: ProxyInput acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [6x4] out_features: 16 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff7dd10000 , used 4604 B Running ConvOneJit Scratches: 0 @ 0 Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [6x4] out_features: 16 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,NONE,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff781f0000 , used 3716 B Running ConvOneJit Scratches: 0 @ 0 Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [6x4] out_features: 16 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,BINOP_ADD_MATRIX,RELU,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff6c077000 , used 5012 B Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul Allocating 112896 bytes (aligned) Running ConvOneJit Scratches: 0 @ 0 Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul Allocating 112896 bytes (aligned) Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul Allocating 3136 bytes (aligned) Jitted kernel for init: in_mode: ProxyInput acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [7x3] out_features: 12 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,NONE,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff6c067000 , used 4504 B Jitted kernel for init: in_mode: ProxyInput acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [7x3] out_features: 8 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,NONE,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff6c057000 , used 
3692 B Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul Allocating 28224 bytes (aligned) Jitted kernel for init: in_mode: ProxyInput acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [6x4] out_features: 16 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff6c047000 , used 4604 B Running ConvOneJit Scratches: 0 @ 0 Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [7x3] out_features: 12 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff6c037000 , used 3356 B Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [7x3] out_features: 8 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff6c027000 , used 2684 B Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::OUTPUT in_dtype: FLOAT ref_grid: [7x3] out_features: 12 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff6c017000 , used 4280 B Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::OUTPUT in_dtype: FLOAT ref_grid: [7x3] out_features: 8 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff6c007000 , used 3384 B Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::OUTPUT in_dtype: FLOAT ref_grid: [7x3] out_features: 12 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,BINOP_ADD_MATRIX,RELU,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5b2b5000 , used 6128 B Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::OUTPUT in_dtype: FLOAT ref_grid: [7x3] out_features: 8 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,BINOP_ADD_MATRIX,RELU,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5b2a5000 , used 4756 B Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul 
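Each "Jitted kernel for init" dump above lists a postprocessing_ops chain that the generated microkernel applies to its accumulator tile before storing the result. A plausible reading, assumed here purely for illustration, is that BINOP_ADD_LINEAR adds a per-output-channel term (bias or folded batch-norm affine), BINOP_ADD_MATRIX adds a full tensor (the merged residual Add), and RELU is the merged activation. The NumPy sketch below only demonstrates that epilogue idea under those assumptions; it is not the JIT code.

    # Hedged sketch of an accumulator epilogue for a chain like
    # PP[BINOP_ADD_LINEAR,BINOP_ADD_MATRIX,RELU,...]; op meanings are assumptions.
    import numpy as np

    def apply_postprocessing(acc, ops, bias=None, residual=None):
        out = acc
        for op in ops:
            if op == "BINOP_ADD_LINEAR":      # assumed: per-output-channel bias / affine term
                out = out + bias
            elif op == "BINOP_ADD_MATRIX":    # assumed: elementwise add of a merged tensor (residual)
                out = out + residual
            elif op == "RELU":                # merged activation
                out = np.maximum(out, 0.0)
            elif op == "NONE":                # unused slots in the fixed-length PP[...] list
                continue
        return out

    acc = np.random.randn(49, 64).astype(np.float32)       # one accumulator tile
    bias = np.random.randn(64).astype(np.float32)
    res = np.random.randn(49, 64).astype(np.float32)
    y = apply_postprocessing(acc, ["BINOP_ADD_LINEAR", "BINOP_ADD_MATRIX", "RELU", "NONE"], bias, res)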
Allocating 28224 bytes (aligned) Running ConvOneJit Scratches: 0 @ 0 Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul Allocating 28224 bytes (aligned) Running ConvOneJit Scratches: 0 @ 0 Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul Allocating 28224 bytes (aligned) Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul Allocating 784 bytes (aligned) Jitted kernel for init: in_mode: ProxyInput acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [6x4] out_features: 16 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,NONE,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5b295000 , used 4244 B Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul Allocating 7056 bytes (aligned) Jitted kernel for init: in_mode: ProxyInput acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [7x3] out_features: 12 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5b285000 , used 4868 B Jitted kernel for init: in_mode: ProxyInput acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [7x3] out_features: 4 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5b275000 , used 2908 B Running ConvOneJit Scratches: 0 @ 0 Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [7x3] out_features: 4 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5b265000 , used 1900 B Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::OUTPUT in_dtype: FLOAT ref_grid: [7x3] out_features: 4 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,BINOP_ADD_MATRIX,RELU,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5b255000 , used 3272 B Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul Allocating 7056 bytes (aligned) Jitted kernel for init: in_mode: ProxyInput acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [7x3] out_features: 12 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 1 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5b245000 , used 62312 B Jitted kernel for init: in_mode: ProxyInput acc_init: 
AccInitializer::ZERO in_dtype: FLOAT ref_grid: [7x3] out_features: 4 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 1 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5b235000 , used 32800 B Running ConvOneJit Scratches: 0 @ 0 Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul Allocating 7056 bytes (aligned) Running ConvOneJit Scratches: 0 @ 0 Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul Allocating 7056 bytes (aligned) Running ConvOneJit Scratches: 0 @ 0 Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul Allocating 7056 bytes (aligned) Running ConvOneJit Scratches: 0 @ 0 Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul Allocating 7056 bytes (aligned) Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul Allocating 196 bytes (aligned) Jitted kernel for init: in_mode: ProxyInput acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [7x3] out_features: 12 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,NONE,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5b225000 , used 4504 B Jitted kernel for init: in_mode: ProxyInput acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [7x3] out_features: 8 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,NONE,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5b215000 , used 3692 B Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul Allocating 1764 bytes (aligned) Jitted kernel for init: in_mode: ProxyInput acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [7x3] out_features: 12 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 1 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5b205000 , used 62312 B Jitted kernel for init: in_mode: ProxyInput acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [7x3] out_features: 8 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 1 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5b1f5000 , used 49532 B Running ConvOneJit Scratches: 0 @ 0 Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [7x3] out_features: 12 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,BINOP_ADD_MATRIX,RELU,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 
0xffff5b1e5000 , used 5204 B Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [7x3] out_features: 8 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,BINOP_ADD_MATRIX,RELU,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5b1d5000 , used 4056 B Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul Allocating 1764 bytes (aligned) Running ConvOneJit Scratches: 0 @ 0 Running ConvOneJit Scratches: 0 @ 0 Running ConvViaJitMatmul Allocating 1764 bytes (aligned) Running ConvOneJit Scratches: 0 @ 0 Running AdaptiveAvgPoolingMeta@NEON Running TransposeIndexed Running ForwardingKernelFlatten Running FCViaConvOne Tuning ConvTask(batch=1,idepth=1,iheight=1,iwidth=1,ichannels=2048,odepth=1,oheight=1,owidth=1,ochannels=1000,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0) Found preset with best score ConvOnePreset(in_regs=3,w_regs=4,w_prefetches=0,outf_tile=32,in_mode=MS,d_minibatch=256,n_minibatch=200) Allocating 7 MB (numa) Scratches: 0 @ 0 Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [3x4] out_features: 16 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5a9e4000 , used 1204 B Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::OUTPUT in_dtype: FLOAT ref_grid: [3x4] out_features: 16 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5a9d4000 , used 1456 B Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::OUTPUT in_dtype: FLOAT ref_grid: [3x4] out_features: 16 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,NONE,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5a9c4000 , used 1624 B Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::ZERO in_dtype: FLOAT ref_grid: [3x4] out_features: 8 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5a9b4000 , used 868 B Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::OUTPUT in_dtype: FLOAT ref_grid: [3x4] out_features: 8 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 
Jitted kernel for init: in_mode: MultiStream acc_init: AccInitializer::OUTPUT in_dtype: FLOAT ref_grid: [3x4] out_features: 8 in_tail_cols: 0 int8_apply_filter_offset: 0 int8_shift_uint8_to_sint8: 0 postprocessing_ops: PP[BINOP_ADD_LINEAR,NONE,NONE,NONE,NONE,NONE,] inner_iter_length: [no-value] sparse_proxy_in_optimization: 0 strided_weights: 0 input_can_read_last_full_vector: 0 weights_can_read_last_full_vector: 0 prefetch_options: {w_ahead: 0} at 0xffff5a994000 , used 1120 B
Running ForwardingKernelOutput
Creating blob for Input layer 3 with type FLOAT format PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a] shape [1, 3, 224, 224]
Running AIO Network
External allocation: allocating 4000 bytes
Creating external output: 0 , shape: [1, 1000]
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Input
Running TransposeBRC3x4
Running ConvViaJitMatmul
Running MaxPoolingMeta@NEON
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
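The execution trace above (which continues below) is easier to inspect as a histogram of the "Running <kernel>" records; a rough Python sketch, assuming this console output has been saved verbatim to a file:

from collections import Counter
import sys

# Tally how many times each kernel type was dispatched during the run.
# Only relies on lines beginning with "Running ", as in the trace here.
def kernel_histogram(path):
    counts = Counter()
    with open(path, errors="replace") as fh:
        for line in fh:
            line = line.strip()
            if line.startswith("Running "):
                counts[line[len("Running "):]] += 1
    return counts

if __name__ == "__main__":
    for kernel, n in kernel_histogram(sys.argv[1]).most_common():
        print(f"{n:5d}  {kernel}")
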
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running AdaptiveAvgPoolingMeta@NEON
Running TransposeIndexed
Running ForwardingKernelFlatten
Running FCViaConvOne
Scratches: 0 @ 0
Running ForwardingKernelOutput
Latency: 210 ms
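For this single-image run (input shape [1, 3, 224, 224]), the reported 210 ms end-to-end latency works out to roughly 4.8 images per second:

# Throughput implied by the final "Latency: 210 ms" record for this batch-1 run.
batch = 1            # input shape [1, 3, 224, 224]
latency_s = 0.210    # "Latency: 210 ms"
print(batch / latency_s)   # ~4.76 images per second
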