Unrecognized AIO_DEBUG_MODE= 5 usigng default level = WARNING
Version: v0.8.0
Built with: clang++(Ubuntu Clang 14.0.0)
git-90df81a2a,Kuba Wolynko,2023-08-07T16:35:09+02:00 built 20230809_111727 by  on 96f65684ca4a
Internal environment variable DLS_DEBUG_SAVE_FAULTY_DATA is not prefixed with AIO_.
Internal environment variable DLS_DEBUG_PRINT_ON_SAME_KERNEL is not prefixed with AIO_.
AIO_DATA_DIR is /usr/local/share//libampere-aio
Available cores: 0, 1
AIO_NUM_THREADS read (but not applied yet) as 16
Couldn't read cpu governor
Numa balancing is off - OK
Requested 16 but only 2 are available. Num threads limited to 2
Binding thread 0 to 0
Binding thread 1 to 1
CPU bind done
Attempt to register kernel  AvgPoolingMeta<FP16[8]>@NEON with priority clashes (priority-wise) with the following kernels:  AvgPoolingMeta<FLOAT[4]>@NEON AvgPoolingMeta<INT8[32]>@NEON 
Attempt to register kernel  MaxPoolingMeta<FP16[8]>@NEON with priority clashes (priority-wise) with the following kernels:  MaxPoolingMeta<FLOAT[4]>@NEON MaxPoolingMeta<INT8[16]>@NEON 
Attempt to register kernel  TransposeBERTVectorized<INT8[16]>@NEON with priority clashes (priority-wise) with the following kernels:  TransposeBERTVectorized<FLOAT[16]>@NEON TransposeBERTVectorized<FP16[32]>@NEON 
Attempt to register kernel  TorchSliceVectorized<INT16[8]>@NEON with priority clashes (priority-wise) with the following kernels:  TorchSliceVectorized<FLOAT[4]>@NEON TorchSliceVectorized<FP16[8]>@NEON 
Attempt to register kernel  TorchSliceVectorized<INT8[16]>@NEON with priority clashes (priority-wise) with the following kernels:  TorchSliceVectorized<FLOAT[4]>@NEON TorchSliceVectorized<FP16[8]>@NEON TorchSliceVectorized<INT16[8]>@NEON 
Attempt to register kernel  TorchSliceVectorized<FLOAT[16]>@NEON with priority clashes (priority-wise) with the following kernels:  TorchSliceVectorized<FLOAT[4]>@NEON TorchSliceVectorized<FP16[8]>@NEON TorchSliceVectorized<INT16[8]>@NEON TorchSliceVectorized<INT8[16]>@NEON 
Registered Variables:
AIO_ALLOW_UNSAFE_DEPTHWISE = "0" is using default value
AIO_JIT_PROFILING = "0" is using default value
AIO_MICROKERNEL_MATMUL_FORCE = "0" is using default value
AIO_MICROKERNEL_DOTPROD_FORCE = "0" is using default value
AIO_DEBUG_LAYER_MERGING = "0" is using default value
AIO_DATA_CHECK_IMMUTABLE = "0" is using default value
AIO_LAYERS_TO_DEBUG is not set and has no default value
AIO_IMPLICIT_FP16_TRANSFORM_FILTER = "" is using default value
DLS_DEBUG_SAVE_FAULTY_DATA is not set and has no default value
AIO_DEBUG_LAYER_MAX_ERROR_FLOAT = "1e-5" is using default value
AIO_DEBUG_LAYER_MEAN_ERROR_FP16 = "1e-5" is using default value
AIO_DEBUG_LAYER_MEAN_ERROR_INT8 = "1" is using default value
AIO_DEBUG_LAYER_MEAN_ERROR is not set and has no default value
AIO_DEBUG_LAYER_MAX_ERROR is not set and has no default value
AIO_CVJM_USE_MAGIC = "1" is using default value
DLS_DEBUG_PRINT_ON_SAME_KERNEL = "0" is using default value
AIO_CPU_BIND = "1" is using default value
AIO_PROFILER_TIME_SCALE = "1e3" is using default value
AIO_LEGACY_TF = "0" is using default value
AIO_PROCESS_MODE = "1" (default = "1" )
AIO_REMOVE_PASSTHRU = "1" is using default value
AIO_PROFILER_SORT_MODE = "0" is using default value
AIO_DEBUGGER_LAYER_ID is not set and has no default value
AIO_GRAPH_FILE = "dls_graph" is using default value
AIO_PROFILER_SKIP_FIRST = "1" is using default value
AIO_DEBUG_LAYER_MAX_ERROR_INT8 = "1" is using default value
AIO_TRACING is not set and has no default value
AIO_SUPERNODE = "0" is using default value
AIO_PROFILER_LAYERS_TO_SKIP = "Data [merged]" is using default value
AIO_DEBUG_STRING_PRECISION = "3" is using default value
AIO_RECYCLE_BUFFERS = "1" is using default value
AIO_DEBUGGER = "0" is using default value
AIO_FORCE_MODE = "0" is using default value
AIO_MEM_BIND = "1" is using default value
AIO_PROFILER_OUTPUT_MODE = "NL" is using default value
AIO_CPU_LEVEL is not set and has no default value
AIO_NUMA_CPUS = "ALL" is using default value
AIO_KERNEL_PREFERLIST = "" is using default value
AIO_PROFILER_FLOAT_PRECISION = "6" is using default value
AIO_SOFT_FP16 is not set and has no default value
AIO_LIST_ENV_VARIABLES = "0" is using default value
AIO_PROFILER_MAX_NAME_LEN = "60" is using default value
AIO_ABORT_ON_ERROR = "0" is using default value
AIO_PREFER_FLOAT_QUANTIZATION = "1" is using default value
AIO_FORCE_GENERIC_MICROKERNEL = "0" is using default value
AIO_EXPORT_GRAPH = "0" is using default value
AIO_PROFILER_CONFIDENCE = "0.9" is using default value
AIO_DEBUG_FILE = "" is using default value
AIO_PROFILER_CSV_FILE = "cout" is using default value
AIO_TOPOLOGY_DEBUG = "0" is using default value
AIO_PROFILER_OUT_FILE = "cout" is using default value
AIO_SANITIZE_OUTPUT = "0" is using default value
AIO_CONV_ONE_JIT_USE_MAGIC = "1" is using default value
AIO_NUM_THREADS = "16" has no default
AIO_DEBUG_STRING_WIDTH = "-1" is using default value
AIO_TRACER_STRING_POOL = "1000000" is using default value
AIO_KERNEL_BLACKLIST = "" is using default value
AIO_SHOULD_USE_NUMA = "0" is using default value
AIO_SPLIT_BATCH = "0" is using default value
AIO_USE_NAIVE_BINOP_ALG = "1" is using default value
AIO_NEON_CONV_ONE_D = "256" is using default value
AIO_NO_LAYER_MERGING = "0" is using default value
AIO_DEBUG_LAYER_MAX_ERROR_FP16 = "1e-4" is using default value
AIO_USE_SIMPLE_TRANSFORM = "1" is using default value
AIO_USE_DETRANSPOSER_TRANSFORM = "1" is using default value
AIO_PROFILER_CSV_MODE = "0" is using default value
AIO_SAVE_MODEL = "0" is using default value
AIO_SKIP_MASTER_THREAD = "0" is using default value
AIO_UKERNEL_QADD_ROUND_INPUT = "1" is using default value
AIO_MERGE_PAD_TO_CONV = "1" is using default value
AIO_DEBUG_LAYER_MEAN_ERROR_FLOAT = "1e-6" is using default value
AIO_PROFILER = "0" is using default value
AIO_NEON_CONV_ONE_N = "200" is using default value
AIO_REPORT_CONV_TASK is not set and has no default value
AIO_CVJM_USE_LOOKUP = "1" is using default value
AIO_DEBUG_MODE = "5" (default = "WARN" )
AIO_LIST_UNREGISTERED_ENV_VARIABLES = "1" is using default value
XDG_DATA_DIRS = "/usr/local/share/:/usr/share/" is using default value
AIO_CVJM_SPARSE_THRESHOLD = "0.05" is using default value
AIO_NUMA_NODES = "LOCAL" is using default value
AIO_NEON_CONV_ONE_F = "32" is using default value
AIO_CONV_ONE_JIT_USE_LOOKUP = "1" is using default value
Unknown AIO variable: AIO_LIB_ROOT = "/aio"
DLS STARTED  14-08-2023 16:49:46
AIO_PROCESS_MODE:  1
AIO_FORCE_MODE: 0
AIO_NUM_THREADS:  2
CPU_BIND:  1
MEM_BIND:  1
AIO_SPLIT_BATCH:  0
AIO_NO_LAYER_MERGING 0
AIO_LEGACY_TF 0
AIO_SUPERNODE 0
AIO_USE_SIMPLE_TRANSFORM 1
AIO_USE_DETRANSPOSER_TRANSFORM 1
AIO_GRAPH_FILE dls_graph
DLS_DEBUG (threshold): 0
AIO_DEBUG_FILE: 
AIO_PROFILER: 0
Unrecognized AIO_DEBUG_MODE= 5 usigng default level = WARNING
Graph before optimizations
graph(%self.1 : __torch__.torch.fx.graph_module.___torch_mangle_247.GraphModule,
      %x : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu)):
  %self.self_layer4_2_conv3.weight_fused_bn : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_2_conv2.weight_fused_bn : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_2_conv1.weight_fused_bn : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_1_conv3.weight_fused_bn : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_1_conv2.weight_fused_bn : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_1_conv1.weight_fused_bn : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_0_downsample_0.weight_fused_bn : Float(2048, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.91 : Float(2048, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_0_conv3.weight_fused_bn : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_0_conv2.weight_fused_bn : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_0_conv1.weight_fused_bn : Float(512, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_5_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_5_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_5_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_4_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_4_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_4_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_3_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_3_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_3_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_2_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_2_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_2_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_1_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_1_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_1_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_0_downsample_0.weight_fused_bn : Float(1024, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.53 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_0_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_0_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_0_conv1.weight_fused_bn : Float(256, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_3_conv3.weight_fused_bn : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_3_conv2.weight_fused_bn : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_3_conv1.weight_fused_bn : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_2_conv3.weight_fused_bn : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_2_conv2.weight_fused_bn : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_2_conv1.weight_fused_bn : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_1_conv3.weight_fused_bn : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_1_conv2.weight_fused_bn : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_1_conv1.weight_fused_bn : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_0_downsample_0.weight_fused_bn : Float(512, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_0_conv3.weight_fused_bn : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_0_conv2.weight_fused_bn : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_0_conv1.weight_fused_bn : Float(128, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_2_conv3.weight_fused_bn : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_2_conv2.weight_fused_bn : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_2_conv1.weight_fused_bn : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_1_conv3.weight_fused_bn : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_1_conv2.weight_fused_bn : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_1_conv1.weight_fused_bn : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_0_downsample_0.weight_fused_bn : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_0_conv3.weight_fused_bn : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_0_conv2.weight_fused_bn : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_0_conv1.weight_fused_bn : Float(64, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.1 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_conv1.weight_fused_bn : Float(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_fc.bias : Float(1000, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_fc.weight : Float(1000, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %9 : bool = prim::Constant[value=1](), scope: __module.self_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %8 : bool = prim::Constant[value=0](), scope: __module.self_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %7 : int = prim::Constant[value=1](), scope: __module.self_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %6 : int[] = prim::Constant[value=[2, 2]]()
  %5 : int[] = prim::Constant[value=[3, 3]]()
  %4 : int[] = prim::Constant[value=[1, 1]]()
  %3 : int[] = prim::Constant[value=[0, 0]]()
  %2 : int = prim::Constant[value=-1]() # <eval_with_key>.1:178:0
  %input.1 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = aten::_convolution(%x, %self.self_conv1.weight_fused_bn, %328_fused_bn.1, %6, %5, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.5 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = aten::relu_(%input.1), scope: __module.self_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.7 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::max_pool2d(%input.5, %5, %6, %4, %4, %8), scope: __module.self_maxpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:782:0
  %input.9 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.7, %self.self_layer1_0_conv1.weight_fused_bn, %328_fused_bn.1, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.13 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.9), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.15 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.13, %self.self_layer1_0_conv2.weight_fused_bn, %328_fused_bn.1, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.19 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.15), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.21 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.19, %self.self_layer1_0_conv3.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.23 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.7, %self.self_layer1_0_downsample_0.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.25 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.21, %input.23, %7) # <eval_with_key>.1:19:0
  %input.27 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.25), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.29 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.27, %self.self_layer1_1_conv1.weight_fused_bn, %328_fused_bn.1, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.33 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.29), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.35 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.33, %self.self_layer1_1_conv2.weight_fused_bn, %328_fused_bn.1, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.39 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.35), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.41 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.39, %self.self_layer1_1_conv3.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.43 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.41, %input.27, %7) # <eval_with_key>.1:29:0
  %input.45 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.43), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.47 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.45, %self.self_layer1_2_conv1.weight_fused_bn, %328_fused_bn.1, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.51 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.47), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.53 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.51, %self.self_layer1_2_conv2.weight_fused_bn, %328_fused_bn.1, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.57 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.53), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.59 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.57, %self.self_layer1_2_conv3.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer1_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.61 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.59, %input.45, %7) # <eval_with_key>.1:39:0
  %input.63 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.61), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.65 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.63, %self.self_layer2_0_conv1.weight_fused_bn, %328_fused_bn.23, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.69 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.65), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.71 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.69, %self.self_layer2_0_conv2.weight_fused_bn, %328_fused_bn.23, %6, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.75 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.71), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.77 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.75, %self.self_layer2_0_conv3.weight_fused_bn, %328_fused_bn.27, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.79 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.63, %self.self_layer2_0_downsample_0.weight_fused_bn, %328_fused_bn.27, %6, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.81 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.77, %input.79, %7) # <eval_with_key>.1:51:0
  %input.83 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.81), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.85 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.83, %self.self_layer2_1_conv1.weight_fused_bn, %328_fused_bn.23, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.89 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.85), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.91 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.89, %self.self_layer2_1_conv2.weight_fused_bn, %328_fused_bn.23, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.95 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.91), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.97 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.95, %self.self_layer2_1_conv3.weight_fused_bn, %328_fused_bn.27, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.99 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.97, %input.83, %7) # <eval_with_key>.1:61:0
  %input.101 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.99), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.103 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.101, %self.self_layer2_2_conv1.weight_fused_bn, %328_fused_bn.23, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.107 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.103), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.109 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.107, %self.self_layer2_2_conv2.weight_fused_bn, %328_fused_bn.23, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.113 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.109), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.115 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.113, %self.self_layer2_2_conv3.weight_fused_bn, %328_fused_bn.27, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.117 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.115, %input.101, %7) # <eval_with_key>.1:71:0
  %input.119 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.117), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.121 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.119, %self.self_layer2_3_conv1.weight_fused_bn, %328_fused_bn.23, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.125 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.121), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.127 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.125, %self.self_layer2_3_conv2.weight_fused_bn, %328_fused_bn.23, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.131 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.127), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.133 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.131, %self.self_layer2_3_conv3.weight_fused_bn, %328_fused_bn.27, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer2_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.135 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.133, %input.119, %7) # <eval_with_key>.1:81:0
  %input.137 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.135), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.139 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.137, %self.self_layer3_0_conv1.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.143 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.139), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.145 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.143, %self.self_layer3_0_conv2.weight_fused_bn, %328_fused_bn.7, %6, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.149 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.145), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.151 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.149, %self.self_layer3_0_conv3.weight_fused_bn, %328_fused_bn.53, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.153 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.137, %self.self_layer3_0_downsample_0.weight_fused_bn, %328_fused_bn.53, %6, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.155 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.151, %input.153, %7) # <eval_with_key>.1:93:0
  %input.157 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.155), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.159 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.157, %self.self_layer3_1_conv1.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.163 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.159), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.165 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.163, %self.self_layer3_1_conv2.weight_fused_bn, %328_fused_bn.7, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.169 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.165), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.171 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.169, %self.self_layer3_1_conv3.weight_fused_bn, %328_fused_bn.53, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.173 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.171, %input.157, %7) # <eval_with_key>.1:103:0
  %input.175 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.173), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.177 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.175, %self.self_layer3_2_conv1.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.181 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.177), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.183 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.181, %self.self_layer3_2_conv2.weight_fused_bn, %328_fused_bn.7, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.187 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.183), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.189 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.187, %self.self_layer3_2_conv3.weight_fused_bn, %328_fused_bn.53, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.191 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.189, %input.175, %7) # <eval_with_key>.1:113:0
  %input.193 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.191), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.195 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.193, %self.self_layer3_3_conv1.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.199 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.195), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.201 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.199, %self.self_layer3_3_conv2.weight_fused_bn, %328_fused_bn.7, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.205 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.201), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.207 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.205, %self.self_layer3_3_conv3.weight_fused_bn, %328_fused_bn.53, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.209 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.207, %input.193, %7) # <eval_with_key>.1:123:0
  %input.211 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.209), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.213 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.211, %self.self_layer3_4_conv1.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_4_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.217 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.213), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.219 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.217, %self.self_layer3_4_conv2.weight_fused_bn, %328_fused_bn.7, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_4_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.223 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.219), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.225 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.223, %self.self_layer3_4_conv3.weight_fused_bn, %328_fused_bn.53, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_4_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.227 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.225, %input.211, %7) # <eval_with_key>.1:133:0
  %input.229 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.227), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.231 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.229, %self.self_layer3_5_conv1.weight_fused_bn, %328_fused_bn.7, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_5_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.235 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.231), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.237 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.235, %self.self_layer3_5_conv2.weight_fused_bn, %328_fused_bn.7, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_5_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.241 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.237), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.243 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.241, %self.self_layer3_5_conv3.weight_fused_bn, %328_fused_bn.53, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer3_5_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.245 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.243, %input.229, %7) # <eval_with_key>.1:143:0
  %input.247 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.245), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.249 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.247, %self.self_layer4_0_conv1.weight_fused_bn, %328_fused_bn.27, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.253 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.249), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.255 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.253, %self.self_layer4_0_conv2.weight_fused_bn, %328_fused_bn.27, %6, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.259 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.255), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.261 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.259, %self.self_layer4_0_conv3.weight_fused_bn, %328_fused_bn.91, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.263 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.247, %self.self_layer4_0_downsample_0.weight_fused_bn, %328_fused_bn.91, %6, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.265 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.261, %input.263, %7) # <eval_with_key>.1:155:0
  %input.267 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.265), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.269 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.267, %self.self_layer4_1_conv1.weight_fused_bn, %328_fused_bn.27, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.273 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.269), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.275 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.273, %self.self_layer4_1_conv2.weight_fused_bn, %328_fused_bn.27, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.279 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.275), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.281 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.279, %self.self_layer4_1_conv3.weight_fused_bn, %328_fused_bn.91, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.283 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.281, %input.267, %7) # <eval_with_key>.1:165:0
  %input.285 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.283), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.287 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.285, %self.self_layer4_2_conv1.weight_fused_bn, %328_fused_bn.27, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.291 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.287), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.293 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.291, %self.self_layer4_2_conv2.weight_fused_bn, %328_fused_bn.27, %4, %4, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.297 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.293), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.299 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.297, %self.self_layer4_2_conv3.weight_fused_bn, %328_fused_bn.91, %4, %3, %4, %8, %3, %7, %8, %8, %9, %9), scope: __module.self_layer4_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.301 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.299, %input.285, %7) # <eval_with_key>.1:175:0
  %input.303 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.301), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %self_avgpool.1 : Float(1, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = aten::adaptive_avg_pool2d(%input.303, %4), scope: __module.self_avgpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1214:0
  %input : Float(1, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = aten::flatten(%self_avgpool.1, %7, %2) # <eval_with_key>.1:178:0
  %438 : Float(1, 1000, strides=[1000, 1], requires_grad=0, device=cpu) = aten::linear(%input, %self.self_fc.weight, %self.self_fc.bias), scope: __module.self_fc # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py:114:0
  %440 : (Tensor) = prim::TupleConstruct(%438)
  return (%440)
Graph after fusion pass
graph(%self.1 : __torch__.torch.fx.graph_module.___torch_mangle_247.GraphModule,
      %x : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu)):
  %input.2 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_0(%x)
  %input.6 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_1(%input.2)
  %input.10 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_2(%input.6)
  %input.14 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_3(%input.10)
  %input.18 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_4(%input.14)
  %input.22 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_5(%input.18)
  %input.26 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_6(%input.22)
  %input.30 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_7(%input.26)
  %input.34 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_8(%input.10)
  %input.38 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_9(%input.30, %input.34)
  %input.42 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_10(%input.38)
  %input.46 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_11(%input.42)
  %input.50 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_12(%input.46)
  %input.54 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_13(%input.50)
  %input.58 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_14(%input.54)
  %input.62 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_15(%input.58)
  %input.66 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_16(%input.62, %input.42)
  %input.70 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_17(%input.66)
  %input.74 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_18(%input.70)
  %input.78 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_19(%input.74)
  %input.82 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_20(%input.78)
  %input.86 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_21(%input.82)
  %input.90 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_22(%input.86)
  %input.94 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_23(%input.90, %input.70)
  %input.98 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_24(%input.94)
  %input.102 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_25(%input.98)
  %input.106 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_26(%input.102)
  %input.110 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_27(%input.106)
  %input.114 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_28(%input.110)
  %input.118 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_29(%input.114)
  %input.122 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_30(%input.98)
  %input.126 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_31(%input.118, %input.122)
  %input.130 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_32(%input.126)
  %input.134 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_33(%input.130)
  %input.138 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_34(%input.134)
  %input.142 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_35(%input.138)
  %input.146 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_36(%input.142)
  %input.150 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_37(%input.146)
  %input.154 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_38(%input.150, %input.130)
  %input.158 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_39(%input.154)
  %input.162 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_40(%input.158)
  %input.166 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_41(%input.162)
  %input.170 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_42(%input.166)
  %input.174 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_43(%input.170)
  %input.178 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_44(%input.174)
  %input.182 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_45(%input.178, %input.158)
  %input.186 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_46(%input.182)
  %input.190 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_47(%input.186)
  %input.194 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_48(%input.190)
  %input.198 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_49(%input.194)
  %input.202 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_50(%input.198)
  %input.206 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_51(%input.202)
  %input.210 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_52(%input.206, %input.186)
  %input.214 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_53(%input.210)
  %input.218 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_54(%input.214)
  %input.222 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_55(%input.218)
  %input.226 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_56(%input.222)
  %input.230 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_57(%input.226)
  %input.234 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_58(%input.230)
  %input.238 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_59(%input.214)
  %input.242 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_60(%input.234, %input.238)
  %input.246 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_61(%input.242)
  %input.250 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_62(%input.246)
  %input.254 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_63(%input.250)
  %input.258 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_64(%input.254)
  %input.262 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_65(%input.258)
  %input.266 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_66(%input.262)
  %input.270 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_67(%input.266, %input.246)
  %input.274 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_68(%input.270)
  %input.278 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_69(%input.274)
  %input.282 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_70(%input.278)
  %input.286 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_71(%input.282)
  %input.290 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_72(%input.286)
  %input.294 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_73(%input.290)
  %input.298 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_74(%input.294, %input.274)
  %input.302 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_75(%input.298)
  %input.306 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_76(%input.302)
  %input.310 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_77(%input.306)
  %input.314 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_78(%input.310)
  %input.318 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_79(%input.314)
  %input.322 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_80(%input.318)
  %input.326 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_81(%input.322, %input.302)
  %input.330 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_82(%input.326)
  %input.334 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_83(%input.330)
  %input.338 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_84(%input.334)
  %input.342 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_85(%input.338)
  %input.346 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_86(%input.342)
  %input.350 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_87(%input.346)
  %input.354 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_88(%input.350, %input.330)
  %input.358 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_89(%input.354)
  %input.362 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_90(%input.358)
  %input.366 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_91(%input.362)
  %input.370 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_92(%input.366)
  %input.374 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_93(%input.370)
  %input.378 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_94(%input.374)
  %input.382 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_95(%input.378, %input.358)
  %input.386 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_96(%input.382)
  %input.390 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_97(%input.386)
  %input.394 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_98(%input.390)
  %input.398 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_99(%input.394)
  %input.402 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_100(%input.398)
  %input.406 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_101(%input.402)
  %input.410 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_102(%input.386)
  %input.414 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_103(%input.406, %input.410)
  %input.418 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_104(%input.414)
  %input.422 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_105(%input.418)
  %input.426 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_106(%input.422)
  %input.430 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_107(%input.426)
  %input.434 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_108(%input.430)
  %input.438 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_109(%input.434)
  %input.442 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_110(%input.438, %input.418)
  %input.446 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_111(%input.442)
  %input.450 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_112(%input.446)
  %input.454 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_113(%input.450)
  %input.458 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_114(%input.454)
  %input.462 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_115(%input.458)
  %input.466 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_116(%input.462)
  %input.470 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_117(%input.466, %input.446)
  %input.474 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_118(%input.470)
  %self_avgpool.2 : Float(1, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_119(%input.474)
  %input.478 : Float(1, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_120(%self_avgpool.2)
  %684 : Float(1, 1000, strides=[1000, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_121(%input.478)
  %440 : (Tensor) = prim::TupleConstruct(%684)
  return (%440)
with prim::AIOFusionGroup_0 = graph(%x : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu)):
  %self.self_conv1.weight_fused_bn : Float(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.1 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[2, 2]]()
  %4 : int[] = prim::Constant[value=[3, 3]]()
  %5 : int[] = prim::Constant[value=[1, 1]]()
  %6 : bool = prim::Constant[value=0]()
  %7 : int[] = prim::Constant[value=[0, 0]]()
  %8 : int = prim::Constant[value=1]()
  %9 : bool = prim::Constant[value=1]()
  %input.2 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = aten::_convolution(%x, %self.self_conv1.weight_fused_bn, %328_fused_bn.1, %3, %4, %5, %6, %7, %8, %6, %6, %9, %9), scope: __module.self_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.2)
with prim::AIOFusionGroup_1 = graph(%input.2 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu)):
  %input.6 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = aten::relu_(%input.2), scope: __module.self_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.6)
with prim::AIOFusionGroup_2 = graph(%input.6 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu)):
  %1 : int[] = prim::Constant[value=[3, 3]]()
  %2 : int[] = prim::Constant[value=[2, 2]]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : bool = prim::Constant[value=0]()
  %input.10 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::max_pool2d(%input.6, %1, %2, %3, %3, %4), scope: __module.self_maxpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:782:0
  return (%input.10)
with prim::AIOFusionGroup_3 = graph(%input.10 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)):
  %self.self_layer1_0_conv1.weight_fused_bn : Float(64, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.1 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.14 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.10, %self.self_layer1_0_conv1.weight_fused_bn, %328_fused_bn.1, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer1_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.14)
with prim::AIOFusionGroup_4 = graph(%input.14 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)):
  %input.18 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.14), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.18)
with prim::AIOFusionGroup_5 = graph(%input.18 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)):
  %self.self_layer1_0_conv2.weight_fused_bn : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.1 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : bool = prim::Constant[value=0]()
  %5 : int[] = prim::Constant[value=[0, 0]]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.22 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.18, %self.self_layer1_0_conv2.weight_fused_bn, %328_fused_bn.1, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer1_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.22)
with prim::AIOFusionGroup_6 = graph(%input.22 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)):
  %input.26 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.22), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.26)
with prim::AIOFusionGroup_7 = graph(%input.26 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)):
  %self.self_layer1_0_conv3.weight_fused_bn : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.30 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.26, %self.self_layer1_0_conv3.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer1_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.30)
with prim::AIOFusionGroup_8 = graph(%input.10 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)):
  %self.self_layer1_0_downsample_0.weight_fused_bn : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.34 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.10, %self.self_layer1_0_downsample_0.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer1_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.34)
with prim::AIOFusionGroup_9 = graph(%input.30 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu),
      %input.34 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]()
  %input.38 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.30, %input.34, %2) # <eval_with_key>.1:19:0
  return (%input.38)
with prim::AIOFusionGroup_10 = graph(%input.38 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)):
  %input.42 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.38), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.42)
with prim::AIOFusionGroup_11 = graph(%input.42 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)):
  %self.self_layer1_1_conv1.weight_fused_bn : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.1 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.46 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.42, %self.self_layer1_1_conv1.weight_fused_bn, %328_fused_bn.1, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer1_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.46)
with prim::AIOFusionGroup_12 = graph(%input.46 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)):
  %input.50 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.46), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.50)
with prim::AIOFusionGroup_13 = graph(%input.50 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)):
  %self.self_layer1_1_conv2.weight_fused_bn : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.1 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : bool = prim::Constant[value=0]()
  %5 : int[] = prim::Constant[value=[0, 0]]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.54 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.50, %self.self_layer1_1_conv2.weight_fused_bn, %328_fused_bn.1, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer1_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.54)
with prim::AIOFusionGroup_14 = graph(%input.54 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)):
  %input.58 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.54), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.58)
with prim::AIOFusionGroup_15 = graph(%input.58 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)):
  %self.self_layer1_1_conv3.weight_fused_bn : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.62 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.58, %self.self_layer1_1_conv3.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer1_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.62)
with prim::AIOFusionGroup_16 = graph(%input.62 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu),
      %input.42 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]()
  %input.66 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.62, %input.42, %2) # <eval_with_key>.1:29:0
  return (%input.66)
with prim::AIOFusionGroup_17 = graph(%input.66 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)):
  %input.70 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.66), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.70)
with prim::AIOFusionGroup_18 = graph(%input.70 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)):
  %self.self_layer1_2_conv1.weight_fused_bn : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.1 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.74 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.70, %self.self_layer1_2_conv1.weight_fused_bn, %328_fused_bn.1, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer1_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.74)
with prim::AIOFusionGroup_19 = graph(%input.74 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)):
  %input.78 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.74), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.78)
with prim::AIOFusionGroup_20 = graph(%input.78 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)):
  %self.self_layer1_2_conv2.weight_fused_bn : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.1 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : bool = prim::Constant[value=0]()
  %5 : int[] = prim::Constant[value=[0, 0]]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.82 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.78, %self.self_layer1_2_conv2.weight_fused_bn, %328_fused_bn.1, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer1_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.82)
with prim::AIOFusionGroup_21 = graph(%input.82 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)):
  %input.86 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.82), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.86)
with prim::AIOFusionGroup_22 = graph(%input.86 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu)):
  %self.self_layer1_2_conv3.weight_fused_bn : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.90 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.86, %self.self_layer1_2_conv3.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer1_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.90)
with prim::AIOFusionGroup_23 = graph(%input.90 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu),
      %input.70 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]()
  %input.94 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.90, %input.70, %2) # <eval_with_key>.1:39:0
  return (%input.94)
with prim::AIOFusionGroup_24 = graph(%input.94 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)):
  %input.98 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.94), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.98)
with prim::AIOFusionGroup_25 = graph(%input.98 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)):
  %self.self_layer2_0_conv1.weight_fused_bn : Float(128, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.102 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.98, %self.self_layer2_0_conv1.weight_fused_bn, %328_fused_bn.23, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer2_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.102)
with prim::AIOFusionGroup_26 = graph(%input.102 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu)):
  %input.106 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.102), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.106)
with prim::AIOFusionGroup_27 = graph(%input.106 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu)):
  %self.self_layer2_0_conv2.weight_fused_bn : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[2, 2]]()
  %4 : int[] = prim::Constant[value=[1, 1]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int[] = prim::Constant[value=[0, 0]]()
  %7 : int = prim::Constant[value=1]()
  %8 : bool = prim::Constant[value=1]()
  %input.110 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.106, %self.self_layer2_0_conv2.weight_fused_bn, %328_fused_bn.23, %3, %4, %4, %5, %6, %7, %5, %5, %8, %8), scope: __module.self_layer2_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.110)
with prim::AIOFusionGroup_28 = graph(%input.110 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)):
  %input.114 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.110), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.114)
with prim::AIOFusionGroup_29 = graph(%input.114 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)):
  %self.self_layer2_0_conv3.weight_fused_bn : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.118 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.114, %self.self_layer2_0_conv3.weight_fused_bn, %328_fused_bn.27, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer2_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.118)
with prim::AIOFusionGroup_30 = graph(%input.98 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu)):
  %self.self_layer2_0_downsample_0.weight_fused_bn : Float(512, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[2, 2]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : int[] = prim::Constant[value=[1, 1]]()
  %6 : bool = prim::Constant[value=0]()
  %7 : int = prim::Constant[value=1]()
  %8 : bool = prim::Constant[value=1]()
  %input.122 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.98, %self.self_layer2_0_downsample_0.weight_fused_bn, %328_fused_bn.27, %3, %4, %5, %6, %4, %7, %6, %6, %8, %8), scope: __module.self_layer2_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.122)
with prim::AIOFusionGroup_31 = graph(%input.118 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu),
      %input.122 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]()
  %input.126 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.118, %input.122, %2) # <eval_with_key>.1:51:0
  return (%input.126)
with prim::AIOFusionGroup_32 = graph(%input.126 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)):
  %input.130 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.126), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.130)
with prim::AIOFusionGroup_33 = graph(%input.130 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)):
  %self.self_layer2_1_conv1.weight_fused_bn : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.134 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.130, %self.self_layer2_1_conv1.weight_fused_bn, %328_fused_bn.23, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer2_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.134)
with prim::AIOFusionGroup_34 = graph(%input.134 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)):
  %input.138 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.134), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.138)
with prim::AIOFusionGroup_35 = graph(%input.138 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)):
  %self.self_layer2_1_conv2.weight_fused_bn : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : bool = prim::Constant[value=0]()
  %5 : int[] = prim::Constant[value=[0, 0]]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.142 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.138, %self.self_layer2_1_conv2.weight_fused_bn, %328_fused_bn.23, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer2_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.142)
with prim::AIOFusionGroup_36 = graph(%input.142 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)):
  %input.146 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.142), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.146)
with prim::AIOFusionGroup_37 = graph(%input.146 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)):
  %self.self_layer2_1_conv3.weight_fused_bn : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.150 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.146, %self.self_layer2_1_conv3.weight_fused_bn, %328_fused_bn.27, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer2_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.150)
with prim::AIOFusionGroup_38 = graph(%input.150 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu),
      %input.130 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]()
  %input.154 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.150, %input.130, %2) # <eval_with_key>.1:61:0
  return (%input.154)
with prim::AIOFusionGroup_39 = graph(%input.154 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)):
  %input.158 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.154), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.158)
with prim::AIOFusionGroup_40 = graph(%input.158 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)):
  %self.self_layer2_2_conv1.weight_fused_bn : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.162 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.158, %self.self_layer2_2_conv1.weight_fused_bn, %328_fused_bn.23, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer2_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.162)
with prim::AIOFusionGroup_41 = graph(%input.162 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)):
  %input.166 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.162), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.166)
with prim::AIOFusionGroup_42 = graph(%input.166 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)):
  %self.self_layer2_2_conv2.weight_fused_bn : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : bool = prim::Constant[value=0]()
  %5 : int[] = prim::Constant[value=[0, 0]]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.170 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.166, %self.self_layer2_2_conv2.weight_fused_bn, %328_fused_bn.23, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer2_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.170)
with prim::AIOFusionGroup_43 = graph(%input.170 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)):
  %input.174 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.170), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.174)
with prim::AIOFusionGroup_44 = graph(%input.174 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)):
  %self.self_layer2_2_conv3.weight_fused_bn : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.178 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.174, %self.self_layer2_2_conv3.weight_fused_bn, %328_fused_bn.27, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer2_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.178)
with prim::AIOFusionGroup_45 = graph(%input.178 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu),
      %input.158 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]()
  %input.182 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.178, %input.158, %2) # <eval_with_key>.1:71:0
  return (%input.182)
with prim::AIOFusionGroup_46 = graph(%input.182 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)):
  %input.186 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.182), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.186)
with prim::AIOFusionGroup_47 = graph(%input.186 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)):
  %self.self_layer2_3_conv1.weight_fused_bn : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.190 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.186, %self.self_layer2_3_conv1.weight_fused_bn, %328_fused_bn.23, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer2_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.190)
with prim::AIOFusionGroup_48 = graph(%input.190 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)):
  %input.194 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.190), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.194)
with prim::AIOFusionGroup_49 = graph(%input.194 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)):
  %self.self_layer2_3_conv2.weight_fused_bn : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.23 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : bool = prim::Constant[value=0]()
  %5 : int[] = prim::Constant[value=[0, 0]]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.198 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.194, %self.self_layer2_3_conv2.weight_fused_bn, %328_fused_bn.23, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer2_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.198)
with prim::AIOFusionGroup_50 = graph(%input.198 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)):
  %input.202 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.198), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.202)
with prim::AIOFusionGroup_51 = graph(%input.202 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu)):
  %self.self_layer2_3_conv3.weight_fused_bn : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.206 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.202, %self.self_layer2_3_conv3.weight_fused_bn, %328_fused_bn.27, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer2_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.206)
with prim::AIOFusionGroup_52 = graph(%input.206 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu),
      %input.186 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]()
  %input.210 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.206, %input.186, %2) # <eval_with_key>.1:81:0
  return (%input.210)
with prim::AIOFusionGroup_53 = graph(%input.210 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)):
  %input.214 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.210), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.214)
with prim::AIOFusionGroup_54 = graph(%input.214 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_0_conv1.weight_fused_bn : Float(256, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.218 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.214, %self.self_layer3_0_conv1.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.218)
with prim::AIOFusionGroup_55 = graph(%input.218 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu)):
  %input.222 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.218), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.222)
with prim::AIOFusionGroup_56 = graph(%input.222 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_0_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[2, 2]]()
  %4 : int[] = prim::Constant[value=[1, 1]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int[] = prim::Constant[value=[0, 0]]()
  %7 : int = prim::Constant[value=1]()
  %8 : bool = prim::Constant[value=1]()
  %input.226 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.222, %self.self_layer3_0_conv2.weight_fused_bn, %328_fused_bn.7, %3, %4, %4, %5, %6, %7, %5, %5, %8, %8), scope: __module.self_layer3_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.226)
with prim::AIOFusionGroup_57 = graph(%input.226 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.230 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.226), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.230)
with prim::AIOFusionGroup_58 = graph(%input.230 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_0_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.53 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.234 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.230, %self.self_layer3_0_conv3.weight_fused_bn, %328_fused_bn.53, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.234)
with prim::AIOFusionGroup_59 = graph(%input.214 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_0_downsample_0.weight_fused_bn : Float(1024, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.53 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[2, 2]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : int[] = prim::Constant[value=[1, 1]]()
  %6 : bool = prim::Constant[value=0]()
  %7 : int = prim::Constant[value=1]()
  %8 : bool = prim::Constant[value=1]()
  %input.238 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.214, %self.self_layer3_0_downsample_0.weight_fused_bn, %328_fused_bn.53, %3, %4, %5, %6, %4, %7, %6, %6, %8, %8), scope: __module.self_layer3_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.238)
with prim::AIOFusionGroup_60 = graph(%input.234 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu),
      %input.238 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]()
  %input.242 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.234, %input.238, %2) # <eval_with_key>.1:93:0
  return (%input.242)
with prim::AIOFusionGroup_61 = graph(%input.242 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.246 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.242), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.246)
with prim::AIOFusionGroup_62 = graph(%input.246 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_1_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.250 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.246, %self.self_layer3_1_conv1.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.250)
with prim::AIOFusionGroup_63 = graph(%input.250 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.254 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.250), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.254)
with prim::AIOFusionGroup_64 = graph(%input.254 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_1_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : bool = prim::Constant[value=0]()
  %5 : int[] = prim::Constant[value=[0, 0]]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.258 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.254, %self.self_layer3_1_conv2.weight_fused_bn, %328_fused_bn.7, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer3_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.258)
with prim::AIOFusionGroup_65 = graph(%input.258 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.262 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.258), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.262)
with prim::AIOFusionGroup_66 = graph(%input.262 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_1_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.53 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.266 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.262, %self.self_layer3_1_conv3.weight_fused_bn, %328_fused_bn.53, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.266)
with prim::AIOFusionGroup_67 = graph(%input.266 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu),
      %input.246 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]()
  %input.270 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.266, %input.246, %2) # <eval_with_key>.1:103:0
  return (%input.270)
with prim::AIOFusionGroup_68 = graph(%input.270 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.274 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.270), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.274)
with prim::AIOFusionGroup_69 = graph(%input.274 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_2_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.278 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.274, %self.self_layer3_2_conv1.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.278)
with prim::AIOFusionGroup_70 = graph(%input.278 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.282 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.278), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.282)
with prim::AIOFusionGroup_71 = graph(%input.282 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_2_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : bool = prim::Constant[value=0]()
  %5 : int[] = prim::Constant[value=[0, 0]]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.286 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.282, %self.self_layer3_2_conv2.weight_fused_bn, %328_fused_bn.7, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer3_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.286)
with prim::AIOFusionGroup_72 = graph(%input.286 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.290 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.286), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.290)
with prim::AIOFusionGroup_73 = graph(%input.290 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_2_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.53 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.294 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.290, %self.self_layer3_2_conv3.weight_fused_bn, %328_fused_bn.53, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.294)
with prim::AIOFusionGroup_74 = graph(%input.294 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu),
      %input.274 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]()
  %input.298 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.294, %input.274, %2) # <eval_with_key>.1:113:0
  return (%input.298)
with prim::AIOFusionGroup_75 = graph(%input.298 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.302 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.298), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.302)
with prim::AIOFusionGroup_76 = graph(%input.302 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_3_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.306 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.302, %self.self_layer3_3_conv1.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.306)
with prim::AIOFusionGroup_77 = graph(%input.306 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.310 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.306), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.310)
with prim::AIOFusionGroup_78 = graph(%input.310 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_3_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : bool = prim::Constant[value=0]()
  %5 : int[] = prim::Constant[value=[0, 0]]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.314 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.310, %self.self_layer3_3_conv2.weight_fused_bn, %328_fused_bn.7, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer3_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.314)
with prim::AIOFusionGroup_79 = graph(%input.314 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.318 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.314), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.318)
with prim::AIOFusionGroup_80 = graph(%input.318 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_3_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.53 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.322 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.318, %self.self_layer3_3_conv3.weight_fused_bn, %328_fused_bn.53, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.322)
with prim::AIOFusionGroup_81 = graph(%input.322 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu),
      %input.302 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]()
  %input.326 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.322, %input.302, %2) # <eval_with_key>.1:123:0
  return (%input.326)
with prim::AIOFusionGroup_82 = graph(%input.326 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.330 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.326), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.330)
with prim::AIOFusionGroup_83 = graph(%input.330 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_4_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.334 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.330, %self.self_layer3_4_conv1.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_4_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.334)
with prim::AIOFusionGroup_84 = graph(%input.334 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.338 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.334), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.338)
with prim::AIOFusionGroup_85 = graph(%input.338 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_4_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : bool = prim::Constant[value=0]()
  %5 : int[] = prim::Constant[value=[0, 0]]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.342 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.338, %self.self_layer3_4_conv2.weight_fused_bn, %328_fused_bn.7, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer3_4_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.342)
with prim::AIOFusionGroup_86 = graph(%input.342 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.346 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.342), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.346)
with prim::AIOFusionGroup_87 = graph(%input.346 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_4_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.53 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.350 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.346, %self.self_layer3_4_conv3.weight_fused_bn, %328_fused_bn.53, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_4_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.350)
with prim::AIOFusionGroup_88 = graph(%input.350 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu),
      %input.330 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]()
  %input.354 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.350, %input.330, %2) # <eval_with_key>.1:133:0
  return (%input.354)
with prim::AIOFusionGroup_89 = graph(%input.354 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.358 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.354), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.358)
with prim::AIOFusionGroup_90 = graph(%input.358 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_5_conv1.weight_fused_bn : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.362 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.358, %self.self_layer3_5_conv1.weight_fused_bn, %328_fused_bn.7, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_5_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.362)
with prim::AIOFusionGroup_91 = graph(%input.362 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.366 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.362), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.366)
with prim::AIOFusionGroup_92 = graph(%input.366 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_5_conv2.weight_fused_bn : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.7 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : bool = prim::Constant[value=0]()
  %5 : int[] = prim::Constant[value=[0, 0]]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.370 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.366, %self.self_layer3_5_conv2.weight_fused_bn, %328_fused_bn.7, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer3_5_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.370)
with prim::AIOFusionGroup_93 = graph(%input.370 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.374 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.370), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.374)
with prim::AIOFusionGroup_94 = graph(%input.374 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer3_5_conv3.weight_fused_bn : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.53 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.378 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.374, %self.self_layer3_5_conv3.weight_fused_bn, %328_fused_bn.53, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer3_5_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.378)
with prim::AIOFusionGroup_95 = graph(%input.378 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu),
      %input.358 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]()
  %input.382 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.378, %input.358, %2) # <eval_with_key>.1:143:0
  return (%input.382)
with prim::AIOFusionGroup_96 = graph(%input.382 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.386 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.382), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.386)
with prim::AIOFusionGroup_97 = graph(%input.386 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer4_0_conv1.weight_fused_bn : Float(512, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.390 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.386, %self.self_layer4_0_conv1.weight_fused_bn, %328_fused_bn.27, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer4_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.390)
with prim::AIOFusionGroup_98 = graph(%input.390 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu)):
  %input.394 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.390), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.394)
with prim::AIOFusionGroup_99 = graph(%input.394 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer4_0_conv2.weight_fused_bn : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[2, 2]]()
  %4 : int[] = prim::Constant[value=[1, 1]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int[] = prim::Constant[value=[0, 0]]()
  %7 : int = prim::Constant[value=1]()
  %8 : bool = prim::Constant[value=1]()
  %input.398 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.394, %self.self_layer4_0_conv2.weight_fused_bn, %328_fused_bn.27, %3, %4, %4, %5, %6, %7, %5, %5, %8, %8), scope: __module.self_layer4_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.398)
with prim::AIOFusionGroup_100 = graph(%input.398 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)):
  %input.402 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.398), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.402)
with prim::AIOFusionGroup_101 = graph(%input.402 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)):
  %self.self_layer4_0_conv3.weight_fused_bn : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.91 : Float(2048, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.406 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.402, %self.self_layer4_0_conv3.weight_fused_bn, %328_fused_bn.91, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer4_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.406)
with prim::AIOFusionGroup_102 = graph(%input.386 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu)):
  %self.self_layer4_0_downsample_0.weight_fused_bn : Float(2048, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.91 : Float(2048, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[2, 2]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : int[] = prim::Constant[value=[1, 1]]()
  %6 : bool = prim::Constant[value=0]()
  %7 : int = prim::Constant[value=1]()
  %8 : bool = prim::Constant[value=1]()
  %input.410 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.386, %self.self_layer4_0_downsample_0.weight_fused_bn, %328_fused_bn.91, %3, %4, %5, %6, %4, %7, %6, %6, %8, %8), scope: __module.self_layer4_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.410)
with prim::AIOFusionGroup_103 = graph(%input.406 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu),
      %input.410 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]()
  %input.414 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.406, %input.410, %2) # <eval_with_key>.1:155:0
  return (%input.414)
with prim::AIOFusionGroup_104 = graph(%input.414 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)):
  %input.418 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.414), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.418)
with prim::AIOFusionGroup_105 = graph(%input.418 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)):
  %self.self_layer4_1_conv1.weight_fused_bn : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.422 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.418, %self.self_layer4_1_conv1.weight_fused_bn, %328_fused_bn.27, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer4_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.422)
with prim::AIOFusionGroup_106 = graph(%input.422 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)):
  %input.426 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.422), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.426)
with prim::AIOFusionGroup_107 = graph(%input.426 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)):
  %self.self_layer4_1_conv2.weight_fused_bn : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : bool = prim::Constant[value=0]()
  %5 : int[] = prim::Constant[value=[0, 0]]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.430 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.426, %self.self_layer4_1_conv2.weight_fused_bn, %328_fused_bn.27, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer4_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.430)
with prim::AIOFusionGroup_108 = graph(%input.430 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)):
  %input.434 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.430), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.434)
with prim::AIOFusionGroup_109 = graph(%input.434 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)):
  %self.self_layer4_1_conv3.weight_fused_bn : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.91 : Float(2048, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.438 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.434, %self.self_layer4_1_conv3.weight_fused_bn, %328_fused_bn.91, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer4_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.438)
with prim::AIOFusionGroup_110 = graph(%input.438 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu),
      %input.418 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]()
  %input.442 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.438, %input.418, %2) # <eval_with_key>.1:165:0
  return (%input.442)
with prim::AIOFusionGroup_111 = graph(%input.442 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)):
  %input.446 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.442), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.446)
with prim::AIOFusionGroup_112 = graph(%input.446 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)):
  %self.self_layer4_2_conv1.weight_fused_bn : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.450 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.446, %self.self_layer4_2_conv1.weight_fused_bn, %328_fused_bn.27, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer4_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.450)
with prim::AIOFusionGroup_113 = graph(%input.450 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)):
  %input.454 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.450), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.454)
with prim::AIOFusionGroup_114 = graph(%input.454 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)):
  %self.self_layer4_2_conv2.weight_fused_bn : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.27 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : bool = prim::Constant[value=0]()
  %5 : int[] = prim::Constant[value=[0, 0]]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.458 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.454, %self.self_layer4_2_conv2.weight_fused_bn, %328_fused_bn.27, %3, %3, %3, %4, %5, %6, %4, %4, %7, %7), scope: __module.self_layer4_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.458)
with prim::AIOFusionGroup_115 = graph(%input.458 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)):
  %input.462 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.458), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.462)
with prim::AIOFusionGroup_116 = graph(%input.462 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu)):
  %self.self_layer4_2_conv3.weight_fused_bn : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.91 : Float(2048, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int[] = prim::Constant[value=[1, 1]]()
  %4 : int[] = prim::Constant[value=[0, 0]]()
  %5 : bool = prim::Constant[value=0]()
  %6 : int = prim::Constant[value=1]()
  %7 : bool = prim::Constant[value=1]()
  %input.466 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.462, %self.self_layer4_2_conv3.weight_fused_bn, %328_fused_bn.91, %3, %4, %3, %5, %4, %6, %5, %5, %7, %7), scope: __module.self_layer4_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  return (%input.466)
with prim::AIOFusionGroup_117 = graph(%input.466 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu),
      %input.446 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)):
  %2 : int = prim::Constant[value=1]()
  %input.470 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.466, %input.446, %2) # <eval_with_key>.1:175:0
  return (%input.470)
with prim::AIOFusionGroup_118 = graph(%input.470 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)):
  %input.474 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.470), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  return (%input.474)
with prim::AIOFusionGroup_119 = graph(%input.474 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu)):
  %1 : int[] = prim::Constant[value=[1, 1]]()
  %self_avgpool.2 : Float(1, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = aten::adaptive_avg_pool2d(%input.474, %1), scope: __module.self_avgpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1214:0
  return (%self_avgpool.2)
with prim::AIOFusionGroup_120 = graph(%self_avgpool.2 : Float(1, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu)):
  %1 : int = prim::Constant[value=1]()
  %2 : int = prim::Constant[value=-1]()
  %input.478 : Float(1, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = aten::flatten(%self_avgpool.2, %1, %2) # <eval_with_key>.1:178:0
  return (%input.478)
with prim::AIOFusionGroup_121 = graph(%input.478 : Float(1, 2048, strides=[2048, 1], requires_grad=0, device=cpu)):
  %self.self_fc.weight : Float(1000, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_fc.bias : Float(1000, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : Float(1, 1000, strides=[1000, 1], requires_grad=0, device=cpu) = aten::linear(%input.478, %self.self_fc.weight, %self.self_fc.bias), scope: __module.self_fc # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py:114:0
  return (%3)
Graph after AIOFuser
graph(%self.1 : __torch__.torch.fx.graph_module.___torch_mangle_247.GraphModule,
      %x : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu)):
  %66458 : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu), %66459 : bool = prim::AIOFusionGuard[types=[Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu)]](%x)
  %66460 : Float(1, 1000, strides=[1000, 1], requires_grad=0, device=cpu) = prim::If(%66459)
    block0():
      %684 : Float(1, 1000, strides=[1000, 1], requires_grad=0, device=cpu) = prim::AIOFusionGroup_0(%66458)
      -> (%684)
    block1():
      %66652 : Tensor = prim::FallbackGraph_1(%x)
      -> (%66652)
  %440 : (Tensor) = prim::TupleConstruct(%66460)
  return (%440)
with prim::AIOFusionGroup_0 = graph(%x : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu)):
  %self.self_fc.bias : Float(1000, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_fc.weight : Float(1000, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %6 : int = prim::Constant[value=-1]()
  %self.self_layer4_2_conv3.weight_fused_bn.5 : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_2_conv2.weight_fused_bn.7 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_2_conv1.weight_fused_bn.9 : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_1_conv3.weight_fused_bn.12 : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_1_conv2.weight_fused_bn.14 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_1_conv1.weight_fused_bn.16 : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_0_downsample_0.weight_fused_bn.19 : Float(2048, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.112 : Float(2048, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_0_conv3.weight_fused_bn.20 : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_0_conv2.weight_fused_bn.22 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_0_conv1.weight_fused_bn.24 : Float(512, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_5_conv3.weight_fused_bn.27 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_5_conv2.weight_fused_bn.29 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_5_conv1.weight_fused_bn.31 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_4_conv3.weight_fused_bn.34 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_4_conv2.weight_fused_bn.36 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_4_conv1.weight_fused_bn.38 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_3_conv3.weight_fused_bn.41 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_3_conv2.weight_fused_bn.43 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_3_conv1.weight_fused_bn.45 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_2_conv3.weight_fused_bn.48 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_2_conv2.weight_fused_bn.50 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_2_conv1.weight_fused_bn.52 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_1_conv3.weight_fused_bn.55 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_1_conv2.weight_fused_bn.57 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_1_conv1.weight_fused_bn.59 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_0_downsample_0.weight_fused_bn.62 : Float(1024, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.162 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_0_conv3.weight_fused_bn.63 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_0_conv2.weight_fused_bn.65 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_0_conv1.weight_fused_bn.67 : Float(256, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_3_conv3.weight_fused_bn.70 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_3_conv2.weight_fused_bn.72 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_3_conv1.weight_fused_bn.74 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_2_conv3.weight_fused_bn.77 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_2_conv2.weight_fused_bn.79 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_2_conv1.weight_fused_bn.81 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_1_conv3.weight_fused_bn.84 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_1_conv2.weight_fused_bn.86 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_1_conv1.weight_fused_bn.88 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_0_downsample_0.weight_fused_bn.91 : Float(512, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.214 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_0_conv3.weight_fused_bn.92 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_0_conv2.weight_fused_bn.94 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.222 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_0_conv1.weight_fused_bn.96 : Float(128, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_2_conv3.weight_fused_bn.99 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_2_conv2.weight_fused_bn.101 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_2_conv1.weight_fused_bn.103 : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_1_conv3.weight_fused_bn.106 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_1_conv2.weight_fused_bn.108 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_1_conv1.weight_fused_bn.110 : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_0_downsample_0.weight_fused_bn.113 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.238 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_0_conv3.weight_fused_bn.114 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_0_conv2.weight_fused_bn.116 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_0_conv1.weight_fused_bn.118 : Float(64, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %1039 : bool = prim::Constant[value=1]()
  %1038 : int = prim::Constant[value=1]()
  %1037 : int[] = prim::Constant[value=[0, 0]]()
  %1036 : bool = prim::Constant[value=0]()
  %1035 : int[] = prim::Constant[value=[1, 1]]()
  %1034 : int[] = prim::Constant[value=[3, 3]]()
  %1033 : int[] = prim::Constant[value=[2, 2]]()
  %328_fused_bn.244 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_conv1.weight_fused_bn.121 : Float(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %input.2 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = aten::_convolution(%x, %self.self_conv1.weight_fused_bn.121, %328_fused_bn.244, %1033, %1034, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.6 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = aten::relu_(%input.2), scope: __module.self_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.10 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::max_pool2d(%input.6, %1034, %1033, %1035, %1035, %1036), scope: __module.self_maxpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:782:0
  %input.14 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.10, %self.self_layer1_0_conv1.weight_fused_bn.118, %328_fused_bn.244, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.18 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.14), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.22 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.18, %self.self_layer1_0_conv2.weight_fused_bn.116, %328_fused_bn.244, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.26 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.22), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.30 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.26, %self.self_layer1_0_conv3.weight_fused_bn.114, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.34 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.10, %self.self_layer1_0_downsample_0.weight_fused_bn.113, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.38 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.30, %input.34, %1038) # <eval_with_key>.1:19:0
  %input.42 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.38), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.46 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.42, %self.self_layer1_1_conv1.weight_fused_bn.110, %328_fused_bn.244, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.50 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.46), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.54 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.50, %self.self_layer1_1_conv2.weight_fused_bn.108, %328_fused_bn.244, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.58 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.54), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.62 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.58, %self.self_layer1_1_conv3.weight_fused_bn.106, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.66 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.62, %input.42, %1038) # <eval_with_key>.1:29:0
  %input.70 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.66), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.74 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.70, %self.self_layer1_2_conv1.weight_fused_bn.103, %328_fused_bn.244, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.78 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.74), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.82 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.78, %self.self_layer1_2_conv2.weight_fused_bn.101, %328_fused_bn.244, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.86 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.82), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.90 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.86, %self.self_layer1_2_conv3.weight_fused_bn.99, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer1_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.94 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.90, %input.70, %1038) # <eval_with_key>.1:39:0
  %input.98 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.94), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.102 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.98, %self.self_layer2_0_conv1.weight_fused_bn.96, %328_fused_bn.222, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.106 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.102), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.110 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.106, %self.self_layer2_0_conv2.weight_fused_bn.94, %328_fused_bn.222, %1033, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.114 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.110), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.118 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.114, %self.self_layer2_0_conv3.weight_fused_bn.92, %328_fused_bn.214, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.122 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.98, %self.self_layer2_0_downsample_0.weight_fused_bn.91, %328_fused_bn.214, %1033, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.126 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.118, %input.122, %1038) # <eval_with_key>.1:51:0
  %input.130 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.126), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.134 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.130, %self.self_layer2_1_conv1.weight_fused_bn.88, %328_fused_bn.222, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.138 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.134), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.142 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.138, %self.self_layer2_1_conv2.weight_fused_bn.86, %328_fused_bn.222, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.146 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.142), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.150 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.146, %self.self_layer2_1_conv3.weight_fused_bn.84, %328_fused_bn.214, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.154 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.150, %input.130, %1038) # <eval_with_key>.1:61:0
  %input.158 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.154), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.162 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.158, %self.self_layer2_2_conv1.weight_fused_bn.81, %328_fused_bn.222, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.166 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.162), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.170 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.166, %self.self_layer2_2_conv2.weight_fused_bn.79, %328_fused_bn.222, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.174 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.170), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.178 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.174, %self.self_layer2_2_conv3.weight_fused_bn.77, %328_fused_bn.214, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.182 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.178, %input.158, %1038) # <eval_with_key>.1:71:0
  %input.186 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.182), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.190 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.186, %self.self_layer2_3_conv1.weight_fused_bn.74, %328_fused_bn.222, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.194 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.190), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.198 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.194, %self.self_layer2_3_conv2.weight_fused_bn.72, %328_fused_bn.222, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.202 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.198), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.206 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.202, %self.self_layer2_3_conv3.weight_fused_bn.70, %328_fused_bn.214, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer2_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.210 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.206, %input.186, %1038) # <eval_with_key>.1:81:0
  %input.214 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.210), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.218 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.214, %self.self_layer3_0_conv1.weight_fused_bn.67, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.222 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.218), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.226 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.222, %self.self_layer3_0_conv2.weight_fused_bn.65, %328_fused_bn.238, %1033, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.230 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.226), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.234 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.230, %self.self_layer3_0_conv3.weight_fused_bn.63, %328_fused_bn.162, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.238 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.214, %self.self_layer3_0_downsample_0.weight_fused_bn.62, %328_fused_bn.162, %1033, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.242 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.234, %input.238, %1038) # <eval_with_key>.1:93:0
  %input.246 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.242), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.250 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.246, %self.self_layer3_1_conv1.weight_fused_bn.59, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.254 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.250), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.258 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.254, %self.self_layer3_1_conv2.weight_fused_bn.57, %328_fused_bn.238, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.262 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.258), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.266 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.262, %self.self_layer3_1_conv3.weight_fused_bn.55, %328_fused_bn.162, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.270 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.266, %input.246, %1038) # <eval_with_key>.1:103:0
  %input.274 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.270), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.278 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.274, %self.self_layer3_2_conv1.weight_fused_bn.52, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.282 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.278), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.286 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.282, %self.self_layer3_2_conv2.weight_fused_bn.50, %328_fused_bn.238, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.290 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.286), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.294 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.290, %self.self_layer3_2_conv3.weight_fused_bn.48, %328_fused_bn.162, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.298 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.294, %input.274, %1038) # <eval_with_key>.1:113:0
  %input.302 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.298), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.306 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.302, %self.self_layer3_3_conv1.weight_fused_bn.45, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.310 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.306), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.314 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.310, %self.self_layer3_3_conv2.weight_fused_bn.43, %328_fused_bn.238, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.318 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.314), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.322 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.318, %self.self_layer3_3_conv3.weight_fused_bn.41, %328_fused_bn.162, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.326 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.322, %input.302, %1038) # <eval_with_key>.1:123:0
  %input.330 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.326), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.334 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.330, %self.self_layer3_4_conv1.weight_fused_bn.38, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_4_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.338 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.334), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.342 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.338, %self.self_layer3_4_conv2.weight_fused_bn.36, %328_fused_bn.238, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_4_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.346 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.342), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.350 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.346, %self.self_layer3_4_conv3.weight_fused_bn.34, %328_fused_bn.162, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_4_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.354 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.350, %input.330, %1038) # <eval_with_key>.1:133:0
  %input.358 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.354), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.362 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.358, %self.self_layer3_5_conv1.weight_fused_bn.31, %328_fused_bn.238, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_5_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.366 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.362), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.370 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.366, %self.self_layer3_5_conv2.weight_fused_bn.29, %328_fused_bn.238, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_5_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.374 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.370), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.378 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.374, %self.self_layer3_5_conv3.weight_fused_bn.27, %328_fused_bn.162, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer3_5_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.382 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.378, %input.358, %1038) # <eval_with_key>.1:143:0
  %input.386 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.382), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.390 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.386, %self.self_layer4_0_conv1.weight_fused_bn.24, %328_fused_bn.214, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.394 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.390), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.398 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.394, %self.self_layer4_0_conv2.weight_fused_bn.22, %328_fused_bn.214, %1033, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.402 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.398), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.406 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.402, %self.self_layer4_0_conv3.weight_fused_bn.20, %328_fused_bn.112, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.410 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.386, %self.self_layer4_0_downsample_0.weight_fused_bn.19, %328_fused_bn.112, %1033, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.414 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.406, %input.410, %1038) # <eval_with_key>.1:155:0
  %input.418 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.414), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.422 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.418, %self.self_layer4_1_conv1.weight_fused_bn.16, %328_fused_bn.214, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.426 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.422), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.430 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.426, %self.self_layer4_1_conv2.weight_fused_bn.14, %328_fused_bn.214, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.434 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.430), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.438 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.434, %self.self_layer4_1_conv3.weight_fused_bn.12, %328_fused_bn.112, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.442 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.438, %input.418, %1038) # <eval_with_key>.1:165:0
  %input.446 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.442), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.450 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.446, %self.self_layer4_2_conv1.weight_fused_bn.9, %328_fused_bn.214, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.454 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.450), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.458 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.454, %self.self_layer4_2_conv2.weight_fused_bn.7, %328_fused_bn.214, %1035, %1035, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.462 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.458), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.466 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.462, %self.self_layer4_2_conv3.weight_fused_bn.5, %328_fused_bn.112, %1035, %1037, %1035, %1036, %1037, %1038, %1036, %1036, %1039, %1039), scope: __module.self_layer4_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.470 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.466, %input.446, %1038) # <eval_with_key>.1:175:0
  %input.474 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.470), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %self_avgpool.2 : Float(1, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = aten::adaptive_avg_pool2d(%input.474, %1035), scope: __module.self_avgpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1214:0
  %input.478 : Float(1, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = aten::flatten(%self_avgpool.2, %1038, %6) # <eval_with_key>.1:178:0
  %3 : Float(1, 1000, strides=[1000, 1], requires_grad=0, device=cpu) = aten::linear(%input.478, %self.self_fc.weight, %self.self_fc.bias), scope: __module.self_fc # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py:114:0
  return (%3)
with prim::FallbackGraph_1 = graph(%x : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu)):
  %self.self_fc.bias : Float(1000, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_fc.weight : Float(1000, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int = prim::Constant[value=-1]()
  %self.self_layer4_2_conv3.weight_fused_bn.5 : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_2_conv2.weight_fused_bn.7 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_2_conv1.weight_fused_bn.9 : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_1_conv3.weight_fused_bn.12 : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_1_conv2.weight_fused_bn.14 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_1_conv1.weight_fused_bn.16 : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_0_downsample_0.weight_fused_bn.19 : Float(2048, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.112 : Float(2048, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_0_conv3.weight_fused_bn.20 : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_0_conv2.weight_fused_bn.22 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_0_conv1.weight_fused_bn.24 : Float(512, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_5_conv3.weight_fused_bn.27 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_5_conv2.weight_fused_bn.29 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_5_conv1.weight_fused_bn.31 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_4_conv3.weight_fused_bn.34 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_4_conv2.weight_fused_bn.36 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_4_conv1.weight_fused_bn.38 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_3_conv3.weight_fused_bn.41 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_3_conv2.weight_fused_bn.43 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_3_conv1.weight_fused_bn.45 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_2_conv3.weight_fused_bn.48 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_2_conv2.weight_fused_bn.50 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_2_conv1.weight_fused_bn.52 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_1_conv3.weight_fused_bn.55 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_1_conv2.weight_fused_bn.57 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_1_conv1.weight_fused_bn.59 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_0_downsample_0.weight_fused_bn.62 : Float(1024, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.162 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_0_conv3.weight_fused_bn.63 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_0_conv2.weight_fused_bn.65 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_0_conv1.weight_fused_bn.67 : Float(256, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_3_conv3.weight_fused_bn.70 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_3_conv2.weight_fused_bn.72 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_3_conv1.weight_fused_bn.74 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_2_conv3.weight_fused_bn.77 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_2_conv2.weight_fused_bn.79 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_2_conv1.weight_fused_bn.81 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_1_conv3.weight_fused_bn.84 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_1_conv2.weight_fused_bn.86 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_1_conv1.weight_fused_bn.88 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_0_downsample_0.weight_fused_bn.91 : Float(512, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.214 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_0_conv3.weight_fused_bn.92 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_0_conv2.weight_fused_bn.94 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.222 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_0_conv1.weight_fused_bn.96 : Float(128, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_2_conv3.weight_fused_bn.99 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_2_conv2.weight_fused_bn.101 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_2_conv1.weight_fused_bn.103 : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_1_conv3.weight_fused_bn.106 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_1_conv2.weight_fused_bn.108 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_1_conv1.weight_fused_bn.110 : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_0_downsample_0.weight_fused_bn.113 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.238 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_0_conv3.weight_fused_bn.114 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_0_conv2.weight_fused_bn.116 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_0_conv1.weight_fused_bn.118 : Float(64, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %61 : bool = prim::Constant[value=1]()
  %62 : int = prim::Constant[value=1]()
  %63 : int[] = prim::Constant[value=[0, 0]]()
  %64 : bool = prim::Constant[value=0]()
  %65 : int[] = prim::Constant[value=[1, 1]]()
  %66 : int[] = prim::Constant[value=[3, 3]]()
  %67 : int[] = prim::Constant[value=[2, 2]]()
  %328_fused_bn.244 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_conv1.weight_fused_bn.121 : Float(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %input.2 : Tensor = aten::_convolution(%x, %self.self_conv1.weight_fused_bn.121, %328_fused_bn.244, %67, %66, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.6 : Tensor = aten::relu_(%input.2), scope: __module.self_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.10 : Tensor = aten::max_pool2d(%input.6, %66, %67, %65, %65, %64), scope: __module.self_maxpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:782:0
  %input.14 : Tensor = aten::_convolution(%input.10, %self.self_layer1_0_conv1.weight_fused_bn.118, %328_fused_bn.244, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.18 : Tensor = aten::relu_(%input.14), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.22 : Tensor = aten::_convolution(%input.18, %self.self_layer1_0_conv2.weight_fused_bn.116, %328_fused_bn.244, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.26 : Tensor = aten::relu_(%input.22), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.30 : Tensor = aten::_convolution(%input.26, %self.self_layer1_0_conv3.weight_fused_bn.114, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.34 : Tensor = aten::_convolution(%input.10, %self.self_layer1_0_downsample_0.weight_fused_bn.113, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.38 : Tensor = aten::add_(%input.30, %input.34, %62) # <eval_with_key>.1:19:0
  %input.42 : Tensor = aten::relu_(%input.38), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.46 : Tensor = aten::_convolution(%input.42, %self.self_layer1_1_conv1.weight_fused_bn.110, %328_fused_bn.244, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.50 : Tensor = aten::relu_(%input.46), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.54 : Tensor = aten::_convolution(%input.50, %self.self_layer1_1_conv2.weight_fused_bn.108, %328_fused_bn.244, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.58 : Tensor = aten::relu_(%input.54), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.62 : Tensor = aten::_convolution(%input.58, %self.self_layer1_1_conv3.weight_fused_bn.106, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.66 : Tensor = aten::add_(%input.62, %input.42, %62) # <eval_with_key>.1:29:0
  %input.70 : Tensor = aten::relu_(%input.66), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.74 : Tensor = aten::_convolution(%input.70, %self.self_layer1_2_conv1.weight_fused_bn.103, %328_fused_bn.244, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.78 : Tensor = aten::relu_(%input.74), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.82 : Tensor = aten::_convolution(%input.78, %self.self_layer1_2_conv2.weight_fused_bn.101, %328_fused_bn.244, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.86 : Tensor = aten::relu_(%input.82), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.90 : Tensor = aten::_convolution(%input.86, %self.self_layer1_2_conv3.weight_fused_bn.99, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.94 : Tensor = aten::add_(%input.90, %input.70, %62) # <eval_with_key>.1:39:0
  %input.98 : Tensor = aten::relu_(%input.94), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.102 : Tensor = aten::_convolution(%input.98, %self.self_layer2_0_conv1.weight_fused_bn.96, %328_fused_bn.222, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.106 : Tensor = aten::relu_(%input.102), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.110 : Tensor = aten::_convolution(%input.106, %self.self_layer2_0_conv2.weight_fused_bn.94, %328_fused_bn.222, %67, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.114 : Tensor = aten::relu_(%input.110), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.118 : Tensor = aten::_convolution(%input.114, %self.self_layer2_0_conv3.weight_fused_bn.92, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.122 : Tensor = aten::_convolution(%input.98, %self.self_layer2_0_downsample_0.weight_fused_bn.91, %328_fused_bn.214, %67, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.126 : Tensor = aten::add_(%input.118, %input.122, %62) # <eval_with_key>.1:51:0
  %input.130 : Tensor = aten::relu_(%input.126), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.134 : Tensor = aten::_convolution(%input.130, %self.self_layer2_1_conv1.weight_fused_bn.88, %328_fused_bn.222, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.138 : Tensor = aten::relu_(%input.134), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.142 : Tensor = aten::_convolution(%input.138, %self.self_layer2_1_conv2.weight_fused_bn.86, %328_fused_bn.222, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.146 : Tensor = aten::relu_(%input.142), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.150 : Tensor = aten::_convolution(%input.146, %self.self_layer2_1_conv3.weight_fused_bn.84, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.154 : Tensor = aten::add_(%input.150, %input.130, %62) # <eval_with_key>.1:61:0
  %input.158 : Tensor = aten::relu_(%input.154), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.162 : Tensor = aten::_convolution(%input.158, %self.self_layer2_2_conv1.weight_fused_bn.81, %328_fused_bn.222, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.166 : Tensor = aten::relu_(%input.162), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.170 : Tensor = aten::_convolution(%input.166, %self.self_layer2_2_conv2.weight_fused_bn.79, %328_fused_bn.222, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.174 : Tensor = aten::relu_(%input.170), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.178 : Tensor = aten::_convolution(%input.174, %self.self_layer2_2_conv3.weight_fused_bn.77, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.182 : Tensor = aten::add_(%input.178, %input.158, %62) # <eval_with_key>.1:71:0
  %input.186 : Tensor = aten::relu_(%input.182), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.190 : Tensor = aten::_convolution(%input.186, %self.self_layer2_3_conv1.weight_fused_bn.74, %328_fused_bn.222, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.194 : Tensor = aten::relu_(%input.190), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.198 : Tensor = aten::_convolution(%input.194, %self.self_layer2_3_conv2.weight_fused_bn.72, %328_fused_bn.222, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.202 : Tensor = aten::relu_(%input.198), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.206 : Tensor = aten::_convolution(%input.202, %self.self_layer2_3_conv3.weight_fused_bn.70, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.210 : Tensor = aten::add_(%input.206, %input.186, %62) # <eval_with_key>.1:81:0
  %input.214 : Tensor = aten::relu_(%input.210), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.218 : Tensor = aten::_convolution(%input.214, %self.self_layer3_0_conv1.weight_fused_bn.67, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.222 : Tensor = aten::relu_(%input.218), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.226 : Tensor = aten::_convolution(%input.222, %self.self_layer3_0_conv2.weight_fused_bn.65, %328_fused_bn.238, %67, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.230 : Tensor = aten::relu_(%input.226), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.234 : Tensor = aten::_convolution(%input.230, %self.self_layer3_0_conv3.weight_fused_bn.63, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.238 : Tensor = aten::_convolution(%input.214, %self.self_layer3_0_downsample_0.weight_fused_bn.62, %328_fused_bn.162, %67, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.242 : Tensor = aten::add_(%input.234, %input.238, %62) # <eval_with_key>.1:93:0
  %input.246 : Tensor = aten::relu_(%input.242), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.250 : Tensor = aten::_convolution(%input.246, %self.self_layer3_1_conv1.weight_fused_bn.59, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.254 : Tensor = aten::relu_(%input.250), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.258 : Tensor = aten::_convolution(%input.254, %self.self_layer3_1_conv2.weight_fused_bn.57, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.262 : Tensor = aten::relu_(%input.258), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.266 : Tensor = aten::_convolution(%input.262, %self.self_layer3_1_conv3.weight_fused_bn.55, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.270 : Tensor = aten::add_(%input.266, %input.246, %62) # <eval_with_key>.1:103:0
  %input.274 : Tensor = aten::relu_(%input.270), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.278 : Tensor = aten::_convolution(%input.274, %self.self_layer3_2_conv1.weight_fused_bn.52, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.282 : Tensor = aten::relu_(%input.278), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.286 : Tensor = aten::_convolution(%input.282, %self.self_layer3_2_conv2.weight_fused_bn.50, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.290 : Tensor = aten::relu_(%input.286), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.294 : Tensor = aten::_convolution(%input.290, %self.self_layer3_2_conv3.weight_fused_bn.48, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.298 : Tensor = aten::add_(%input.294, %input.274, %62) # <eval_with_key>.1:113:0
  %input.302 : Tensor = aten::relu_(%input.298), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.306 : Tensor = aten::_convolution(%input.302, %self.self_layer3_3_conv1.weight_fused_bn.45, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.310 : Tensor = aten::relu_(%input.306), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.314 : Tensor = aten::_convolution(%input.310, %self.self_layer3_3_conv2.weight_fused_bn.43, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.318 : Tensor = aten::relu_(%input.314), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.322 : Tensor = aten::_convolution(%input.318, %self.self_layer3_3_conv3.weight_fused_bn.41, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.326 : Tensor = aten::add_(%input.322, %input.302, %62) # <eval_with_key>.1:123:0
  %input.330 : Tensor = aten::relu_(%input.326), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.334 : Tensor = aten::_convolution(%input.330, %self.self_layer3_4_conv1.weight_fused_bn.38, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_4_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.338 : Tensor = aten::relu_(%input.334), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.342 : Tensor = aten::_convolution(%input.338, %self.self_layer3_4_conv2.weight_fused_bn.36, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_4_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.346 : Tensor = aten::relu_(%input.342), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.350 : Tensor = aten::_convolution(%input.346, %self.self_layer3_4_conv3.weight_fused_bn.34, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_4_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.354 : Tensor = aten::add_(%input.350, %input.330, %62) # <eval_with_key>.1:133:0
  %input.358 : Tensor = aten::relu_(%input.354), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.362 : Tensor = aten::_convolution(%input.358, %self.self_layer3_5_conv1.weight_fused_bn.31, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_5_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.366 : Tensor = aten::relu_(%input.362), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.370 : Tensor = aten::_convolution(%input.366, %self.self_layer3_5_conv2.weight_fused_bn.29, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_5_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.374 : Tensor = aten::relu_(%input.370), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.378 : Tensor = aten::_convolution(%input.374, %self.self_layer3_5_conv3.weight_fused_bn.27, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_5_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.382 : Tensor = aten::add_(%input.378, %input.358, %62) # <eval_with_key>.1:143:0
  %input.386 : Tensor = aten::relu_(%input.382), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.390 : Tensor = aten::_convolution(%input.386, %self.self_layer4_0_conv1.weight_fused_bn.24, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.394 : Tensor = aten::relu_(%input.390), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.398 : Tensor = aten::_convolution(%input.394, %self.self_layer4_0_conv2.weight_fused_bn.22, %328_fused_bn.214, %67, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.402 : Tensor = aten::relu_(%input.398), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.406 : Tensor = aten::_convolution(%input.402, %self.self_layer4_0_conv3.weight_fused_bn.20, %328_fused_bn.112, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.410 : Tensor = aten::_convolution(%input.386, %self.self_layer4_0_downsample_0.weight_fused_bn.19, %328_fused_bn.112, %67, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.414 : Tensor = aten::add_(%input.406, %input.410, %62) # <eval_with_key>.1:155:0
  %input.418 : Tensor = aten::relu_(%input.414), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.422 : Tensor = aten::_convolution(%input.418, %self.self_layer4_1_conv1.weight_fused_bn.16, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.426 : Tensor = aten::relu_(%input.422), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.430 : Tensor = aten::_convolution(%input.426, %self.self_layer4_1_conv2.weight_fused_bn.14, %328_fused_bn.214, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.434 : Tensor = aten::relu_(%input.430), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.438 : Tensor = aten::_convolution(%input.434, %self.self_layer4_1_conv3.weight_fused_bn.12, %328_fused_bn.112, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.442 : Tensor = aten::add_(%input.438, %input.418, %62) # <eval_with_key>.1:165:0
  %input.446 : Tensor = aten::relu_(%input.442), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.450 : Tensor = aten::_convolution(%input.446, %self.self_layer4_2_conv1.weight_fused_bn.9, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.454 : Tensor = aten::relu_(%input.450), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.458 : Tensor = aten::_convolution(%input.454, %self.self_layer4_2_conv2.weight_fused_bn.7, %328_fused_bn.214, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.462 : Tensor = aten::relu_(%input.458), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.466 : Tensor = aten::_convolution(%input.462, %self.self_layer4_2_conv3.weight_fused_bn.5, %328_fused_bn.112, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.470 : Tensor = aten::add_(%input.466, %input.446, %62) # <eval_with_key>.1:175:0
  %input.474 : Tensor = aten::relu_(%input.470), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %self_avgpool.2 : Tensor = aten::adaptive_avg_pool2d(%input.474, %65), scope: __module.self_avgpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1214:0
  %input.478 : Tensor = aten::flatten(%self_avgpool.2, %62, %3) # <eval_with_key>.1:178:0
  %191 : Tensor = aten::linear(%input.478, %self.self_fc.weight, %self.self_fc.bias), scope: __module.self_fc # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py:114:0
  return (%191)
Running DLS graph fuser taken 322169 microseconds
Building AIO network from graph
graph(%x : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu)):
  %self.self_fc.bias : Float(1000, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_fc.weight : Float(1000, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %3 : int = prim::Constant[value=-1]()
  %self.self_layer4_2_conv3.weight_fused_bn.5 : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_2_conv2.weight_fused_bn.7 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_2_conv1.weight_fused_bn.9 : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_1_conv3.weight_fused_bn.12 : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_1_conv2.weight_fused_bn.14 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_1_conv1.weight_fused_bn.16 : Float(512, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_0_downsample_0.weight_fused_bn.19 : Float(2048, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.112 : Float(2048, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_0_conv3.weight_fused_bn.20 : Float(2048, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_0_conv2.weight_fused_bn.22 : Float(512, 512, 3, 3, strides=[4608, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer4_0_conv1.weight_fused_bn.24 : Float(512, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_5_conv3.weight_fused_bn.27 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_5_conv2.weight_fused_bn.29 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_5_conv1.weight_fused_bn.31 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_4_conv3.weight_fused_bn.34 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_4_conv2.weight_fused_bn.36 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_4_conv1.weight_fused_bn.38 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_3_conv3.weight_fused_bn.41 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_3_conv2.weight_fused_bn.43 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_3_conv1.weight_fused_bn.45 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_2_conv3.weight_fused_bn.48 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_2_conv2.weight_fused_bn.50 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_2_conv1.weight_fused_bn.52 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_1_conv3.weight_fused_bn.55 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_1_conv2.weight_fused_bn.57 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_1_conv1.weight_fused_bn.59 : Float(256, 1024, 1, 1, strides=[1024, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_0_downsample_0.weight_fused_bn.62 : Float(1024, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.162 : Float(1024, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_0_conv3.weight_fused_bn.63 : Float(1024, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_0_conv2.weight_fused_bn.65 : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer3_0_conv1.weight_fused_bn.67 : Float(256, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_3_conv3.weight_fused_bn.70 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_3_conv2.weight_fused_bn.72 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_3_conv1.weight_fused_bn.74 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_2_conv3.weight_fused_bn.77 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_2_conv2.weight_fused_bn.79 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_2_conv1.weight_fused_bn.81 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_1_conv3.weight_fused_bn.84 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_1_conv2.weight_fused_bn.86 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_1_conv1.weight_fused_bn.88 : Float(128, 512, 1, 1, strides=[512, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_0_downsample_0.weight_fused_bn.91 : Float(512, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.214 : Float(512, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_0_conv3.weight_fused_bn.92 : Float(512, 128, 1, 1, strides=[128, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_0_conv2.weight_fused_bn.94 : Float(128, 128, 3, 3, strides=[1152, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.222 : Float(128, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer2_0_conv1.weight_fused_bn.96 : Float(128, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_2_conv3.weight_fused_bn.99 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_2_conv2.weight_fused_bn.101 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_2_conv1.weight_fused_bn.103 : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_1_conv3.weight_fused_bn.106 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_1_conv2.weight_fused_bn.108 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_1_conv1.weight_fused_bn.110 : Float(64, 256, 1, 1, strides=[256, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_0_downsample_0.weight_fused_bn.113 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %328_fused_bn.238 : Float(256, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_0_conv3.weight_fused_bn.114 : Float(256, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_0_conv2.weight_fused_bn.116 : Float(64, 64, 3, 3, strides=[576, 9, 3, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_layer1_0_conv1.weight_fused_bn.118 : Float(64, 64, 1, 1, strides=[64, 1, 1, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %61 : bool = prim::Constant[value=1]()
  %62 : int = prim::Constant[value=1]()
  %63 : int[] = prim::Constant[value=[0, 0]]()
  %64 : bool = prim::Constant[value=0]()
  %65 : int[] = prim::Constant[value=[1, 1]]()
  %66 : int[] = prim::Constant[value=[3, 3]]()
  %67 : int[] = prim::Constant[value=[2, 2]]()
  %328_fused_bn.244 : Float(64, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %self.self_conv1.weight_fused_bn.121 : Float(64, 3, 7, 7, strides=[147, 49, 7, 1], requires_grad=0, device=cpu) = prim::Constant[value=<Tensor>]()
  %input.2 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = aten::_convolution(%x, %self.self_conv1.weight_fused_bn.121, %328_fused_bn.244, %67, %66, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.6 : Float(1, 64, 112, 112, strides=[802816, 12544, 112, 1], requires_grad=0, device=cpu) = aten::relu_(%input.2), scope: __module.self_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.10 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::max_pool2d(%input.6, %66, %67, %65, %65, %64), scope: __module.self_maxpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:782:0
  %input.14 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.10, %self.self_layer1_0_conv1.weight_fused_bn.118, %328_fused_bn.244, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.18 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.14), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.22 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.18, %self.self_layer1_0_conv2.weight_fused_bn.116, %328_fused_bn.244, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.26 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.22), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.30 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.26, %self.self_layer1_0_conv3.weight_fused_bn.114, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.34 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.10, %self.self_layer1_0_downsample_0.weight_fused_bn.113, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.38 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.30, %input.34, %62) # <eval_with_key>.1:19:0
  %input.42 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.38), scope: __module.self_layer1_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.46 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.42, %self.self_layer1_1_conv1.weight_fused_bn.110, %328_fused_bn.244, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.50 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.46), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.54 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.50, %self.self_layer1_1_conv2.weight_fused_bn.108, %328_fused_bn.244, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.58 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.54), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.62 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.58, %self.self_layer1_1_conv3.weight_fused_bn.106, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.66 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.62, %input.42, %62) # <eval_with_key>.1:29:0
  %input.70 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.66), scope: __module.self_layer1_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.74 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.70, %self.self_layer1_2_conv1.weight_fused_bn.103, %328_fused_bn.244, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.78 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.74), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.82 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.78, %self.self_layer1_2_conv2.weight_fused_bn.101, %328_fused_bn.244, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.86 : Float(1, 64, 56, 56, strides=[200704, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.82), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.90 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.86, %self.self_layer1_2_conv3.weight_fused_bn.99, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer1_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.94 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::add_(%input.90, %input.70, %62) # <eval_with_key>.1:39:0
  %input.98 : Float(1, 256, 56, 56, strides=[802816, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.94), scope: __module.self_layer1_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.102 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.98, %self.self_layer2_0_conv1.weight_fused_bn.96, %328_fused_bn.222, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.106 : Float(1, 128, 56, 56, strides=[401408, 3136, 56, 1], requires_grad=0, device=cpu) = aten::relu_(%input.102), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.110 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.106, %self.self_layer2_0_conv2.weight_fused_bn.94, %328_fused_bn.222, %67, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.114 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.110), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.118 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.114, %self.self_layer2_0_conv3.weight_fused_bn.92, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.122 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.98, %self.self_layer2_0_downsample_0.weight_fused_bn.91, %328_fused_bn.214, %67, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.126 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.118, %input.122, %62) # <eval_with_key>.1:51:0
  %input.130 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.126), scope: __module.self_layer2_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.134 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.130, %self.self_layer2_1_conv1.weight_fused_bn.88, %328_fused_bn.222, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.138 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.134), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.142 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.138, %self.self_layer2_1_conv2.weight_fused_bn.86, %328_fused_bn.222, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.146 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.142), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.150 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.146, %self.self_layer2_1_conv3.weight_fused_bn.84, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.154 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.150, %input.130, %62) # <eval_with_key>.1:61:0
  %input.158 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.154), scope: __module.self_layer2_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.162 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.158, %self.self_layer2_2_conv1.weight_fused_bn.81, %328_fused_bn.222, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.166 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.162), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.170 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.166, %self.self_layer2_2_conv2.weight_fused_bn.79, %328_fused_bn.222, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.174 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.170), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.178 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.174, %self.self_layer2_2_conv3.weight_fused_bn.77, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.182 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.178, %input.158, %62) # <eval_with_key>.1:71:0
  %input.186 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.182), scope: __module.self_layer2_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.190 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.186, %self.self_layer2_3_conv1.weight_fused_bn.74, %328_fused_bn.222, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.194 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.190), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.198 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.194, %self.self_layer2_3_conv2.weight_fused_bn.72, %328_fused_bn.222, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.202 : Float(1, 128, 28, 28, strides=[100352, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.198), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.206 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.202, %self.self_layer2_3_conv3.weight_fused_bn.70, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer2_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.210 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::add_(%input.206, %input.186, %62) # <eval_with_key>.1:81:0
  %input.214 : Float(1, 512, 28, 28, strides=[401408, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.210), scope: __module.self_layer2_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.218 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.214, %self.self_layer3_0_conv1.weight_fused_bn.67, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.222 : Float(1, 256, 28, 28, strides=[200704, 784, 28, 1], requires_grad=0, device=cpu) = aten::relu_(%input.218), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.226 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.222, %self.self_layer3_0_conv2.weight_fused_bn.65, %328_fused_bn.238, %67, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.230 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.226), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.234 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.230, %self.self_layer3_0_conv3.weight_fused_bn.63, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.238 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.214, %self.self_layer3_0_downsample_0.weight_fused_bn.62, %328_fused_bn.162, %67, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.242 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.234, %input.238, %62) # <eval_with_key>.1:93:0
  %input.246 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.242), scope: __module.self_layer3_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.250 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.246, %self.self_layer3_1_conv1.weight_fused_bn.59, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.254 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.250), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.258 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.254, %self.self_layer3_1_conv2.weight_fused_bn.57, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.262 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.258), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.266 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.262, %self.self_layer3_1_conv3.weight_fused_bn.55, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.270 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.266, %input.246, %62) # <eval_with_key>.1:103:0
  %input.274 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.270), scope: __module.self_layer3_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.278 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.274, %self.self_layer3_2_conv1.weight_fused_bn.52, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.282 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.278), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.286 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.282, %self.self_layer3_2_conv2.weight_fused_bn.50, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.290 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.286), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.294 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.290, %self.self_layer3_2_conv3.weight_fused_bn.48, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.298 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.294, %input.274, %62) # <eval_with_key>.1:113:0
  %input.302 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.298), scope: __module.self_layer3_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.306 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.302, %self.self_layer3_3_conv1.weight_fused_bn.45, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_3_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.310 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.306), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.314 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.310, %self.self_layer3_3_conv2.weight_fused_bn.43, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_3_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.318 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.314), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.322 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.318, %self.self_layer3_3_conv3.weight_fused_bn.41, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_3_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.326 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.322, %input.302, %62) # <eval_with_key>.1:123:0
  %input.330 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.326), scope: __module.self_layer3_3_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.334 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.330, %self.self_layer3_4_conv1.weight_fused_bn.38, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_4_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.338 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.334), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.342 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.338, %self.self_layer3_4_conv2.weight_fused_bn.36, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_4_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.346 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.342), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.350 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.346, %self.self_layer3_4_conv3.weight_fused_bn.34, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_4_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.354 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.350, %input.330, %62) # <eval_with_key>.1:133:0
  %input.358 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.354), scope: __module.self_layer3_4_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.362 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.358, %self.self_layer3_5_conv1.weight_fused_bn.31, %328_fused_bn.238, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_5_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.366 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.362), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.370 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.366, %self.self_layer3_5_conv2.weight_fused_bn.29, %328_fused_bn.238, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_5_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.374 : Float(1, 256, 14, 14, strides=[50176, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.370), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.378 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.374, %self.self_layer3_5_conv3.weight_fused_bn.27, %328_fused_bn.162, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer3_5_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.382 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::add_(%input.378, %input.358, %62) # <eval_with_key>.1:143:0
  %input.386 : Float(1, 1024, 14, 14, strides=[200704, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.382), scope: __module.self_layer3_5_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.390 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.386, %self.self_layer4_0_conv1.weight_fused_bn.24, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_0_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.394 : Float(1, 512, 14, 14, strides=[100352, 196, 14, 1], requires_grad=0, device=cpu) = aten::relu_(%input.390), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.398 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.394, %self.self_layer4_0_conv2.weight_fused_bn.22, %328_fused_bn.214, %67, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_0_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.402 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.398), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.406 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.402, %self.self_layer4_0_conv3.weight_fused_bn.20, %328_fused_bn.112, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_0_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.410 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.386, %self.self_layer4_0_downsample_0.weight_fused_bn.19, %328_fused_bn.112, %67, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_0_downsample_0 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.414 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.406, %input.410, %62) # <eval_with_key>.1:155:0
  %input.418 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.414), scope: __module.self_layer4_0_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.422 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.418, %self.self_layer4_1_conv1.weight_fused_bn.16, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_1_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.426 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.422), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.430 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.426, %self.self_layer4_1_conv2.weight_fused_bn.14, %328_fused_bn.214, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_1_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.434 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.430), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.438 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.434, %self.self_layer4_1_conv3.weight_fused_bn.12, %328_fused_bn.112, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_1_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.442 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.438, %input.418, %62) # <eval_with_key>.1:165:0
  %input.446 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.442), scope: __module.self_layer4_1_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.450 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.446, %self.self_layer4_2_conv1.weight_fused_bn.9, %328_fused_bn.214, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_2_conv1 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.454 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.450), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.458 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.454, %self.self_layer4_2_conv2.weight_fused_bn.7, %328_fused_bn.214, %65, %65, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_2_conv2 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.462 : Float(1, 512, 7, 7, strides=[25088, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.458), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %input.466 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::_convolution(%input.462, %self.self_layer4_2_conv3.weight_fused_bn.5, %328_fused_bn.112, %65, %63, %65, %64, %63, %62, %64, %64, %61, %61), scope: __module.self_layer4_2_conv3 # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py:459:0
  %input.470 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::add_(%input.466, %input.446, %62) # <eval_with_key>.1:175:0
  %input.474 : Float(1, 2048, 7, 7, strides=[100352, 49, 7, 1], requires_grad=0, device=cpu) = aten::relu_(%input.470), scope: __module.self_layer4_2_relu # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1455:0
  %self_avgpool.2 : Float(1, 2048, 1, 1, strides=[2048, 1, 1, 1], requires_grad=0, device=cpu) = aten::adaptive_avg_pool2d(%input.474, %65), scope: __module.self_avgpool # /usr/local/lib/python3.10/dist-packages/torch/nn/functional.py:1214:0
  %input.478 : Float(1, 2048, strides=[2048, 1], requires_grad=0, device=cpu) = aten::flatten(%self_avgpool.2, %62, %3) # <eval_with_key>.1:178:0
  %191 : Float(1, 1000, strides=[1000, 1], requires_grad=0, device=cpu) = aten::linear(%input.478, %self.self_fc.weight, %self.self_fc.bias), scope: __module.self_fc # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py:114:0
  return (%191)
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding prim::Constant layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaaddae7140 , Bias 0xaaaae0db6e00 , padding  [3, 3] , stride  [2, 2] , dilation [1, 1] , groups  1
Registering network input: Conv input index: 0
Creating blob for Input layer  3  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [1, 3, 224, 224]
Creating blob for Data layer  4  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [64, 3, 7, 7]
Creating blob for Data layer  5  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [64]
Adding aten::relu_ layer to network
Adding aten::max_pool2d layer to network
Binding inputs for Max_pooling layer kernel_size  [3, 3] , dilation [1, 1] , padding  [1, 1] , stride  [2, 2] , ceil_mode  0
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaade743780 , Bias 0xaaaae0db6e00 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  9  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [64, 64, 1, 1]
Creating blob for Data layer  10  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [64]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaaddafa780 , Bias 0xaaaae0db6e00 , padding  [1, 1] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  13  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [64, 64, 3, 3]
Creating blob for Data layer  14  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [64]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaade77ed80 , Bias 0xaaaae0dc7bc0 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  17  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [256, 64, 1, 1]
Creating blob for Data layer  18  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [256]
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae1f62200 , Bias 0xaaaae0dc7bc0 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  20  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [256, 64, 1, 1]
Creating blob for Data layer  21  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [256]
Adding aten::add_ layer to network
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaaddb1e800 , Bias 0xaaaae0db6e00 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  25  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [64, 256, 1, 1]
Creating blob for Data layer  26  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [64]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaaddb2e880 , Bias 0xaaaae0db6e00 , padding  [1, 1] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  29  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [64, 64, 3, 3]
Creating blob for Data layer  30  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [64]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae1f7e7c0 , Bias 0xaaaae0dc7bc0 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  33  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [256, 64, 1, 1]
Creating blob for Data layer  34  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [256]
Adding aten::add_ layer to network
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae1f8e800 , Bias 0xaaaae0db6e00 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  38  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [64, 256, 1, 1]
Creating blob for Data layer  39  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [64]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae1f9e880 , Bias 0xaaaae0db6e00 , padding  [1, 1] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  42  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [64, 64, 3, 3]
Creating blob for Data layer  43  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [64]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae1fc28c0 , Bias 0xaaaae0dc7bc0 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  46  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [256, 64, 1, 1]
Creating blob for Data layer  47  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [256]
Adding aten::add_ layer to network
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae1fd2940 , Bias 0xaaaae2031440 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  51  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [128, 256, 1, 1]
Creating blob for Data layer  52  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [128]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae20ea240 , Bias 0xaaaae2031440 , padding  [1, 1] , stride  [2, 2] , dilation [1, 1] , groups  1
Creating blob for Data layer  55  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [128, 128, 3, 3]
Creating blob for Data layer  56  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [128]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae1e79640 , Bias 0xaaaae0dbf340 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  59  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [512, 128, 1, 1]
Creating blob for Data layer  60  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [512]
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae1eb9680 , Bias 0xaaaae0dbf340 , padding  [0, 0] , stride  [2, 2] , dilation [1, 1] , groups  1
Creating blob for Data layer  62  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [512, 256, 1, 1]
Creating blob for Data layer  63  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [512]
Adding aten::add_ layer to network
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaadf4d7a40 , Bias 0xaaaae2031440 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  67  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [128, 512, 1, 1]
Creating blob for Data layer  68  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [128]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaadf517a80 , Bias 0xaaaae2031440 , padding  [1, 1] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  71  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [128, 128, 3, 3]
Creating blob for Data layer  72  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [128]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaadf5a7ac0 , Bias 0xaaaae0dbf340 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  75  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [512, 128, 1, 1]
Creating blob for Data layer  76  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [512]
Adding aten::add_ layer to network
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaadf5e7b40 , Bias 0xaaaae2031440 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  80  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [128, 512, 1, 1]
Creating blob for Data layer  81  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [128]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaadd9477c0 , Bias 0xaaaae2031440 , padding  [1, 1] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  84  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [128, 128, 3, 3]
Creating blob for Data layer  85  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [128]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaadd9d7800 , Bias 0xaaaae0dbf340 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  88  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [512, 128, 1, 1]
Creating blob for Data layer  89  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [512]
Adding aten::add_ layer to network
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaadda17880 , Bias 0xaaaae2031440 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  93  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [128, 512, 1, 1]
Creating blob for Data layer  94  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [128]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae2272240 , Bias 0xaaaae2031440 , padding  [1, 1] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  97  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [128, 128, 3, 3]
Creating blob for Data layer  98  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [128]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaadda578c0 , Bias 0xaaaae0dbf340 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  101  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [512, 128, 1, 1]
Creating blob for Data layer  102  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [512]
Adding aten::add_ layer to network
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae2302280 , Bias 0xaaaae0dc7bc0 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  106  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [256, 512, 1, 1]
Creating blob for Data layer  107  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [256]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae1859580 , Bias 0xaaaae0dc7bc0 , padding  [1, 1] , stride  [2, 2] , dilation [1, 1] , groups  1
Creating blob for Data layer  110  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [256, 256, 3, 3]
Creating blob for Data layer  111  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [256]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae2382300 , Bias 0xaaaade75f5c0 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  114  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [1024, 256, 1, 1]
Creating blob for Data layer  115  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [1024]
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae1b695c0 , Bias 0xaaaade75f5c0 , padding  [0, 0] , stride  [2, 2] , dilation [1, 1] , groups  1
Creating blob for Data layer  117  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [1024, 512, 1, 1]
Creating blob for Data layer  118  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [1024]
Adding aten::add_ layer to network
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae1d69600 , Bias 0xaaaae0dc7bc0 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  122  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [256, 1024, 1, 1]
Creating blob for Data layer  123  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [256]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae1175480 , Bias 0xaaaae0dc7bc0 , padding  [1, 1] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  126  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [256, 256, 3, 3]
Creating blob for Data layer  127  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [256]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae0a91380 , Bias 0xaaaade75f5c0 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  130  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [1024, 256, 1, 1]
Creating blob for Data layer  131  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [1024]
Adding aten::add_ layer to network
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae0b913c0 , Bias 0xaaaae0dc7bc0 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  135  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [256, 1024, 1, 1]
Creating blob for Data layer  136  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [256]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae14e7500 , Bias 0xaaaae0dc7bc0 , padding  [1, 1] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  139  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [256, 256, 3, 3]
Creating blob for Data layer  140  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [256]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae0c91400 , Bias 0xaaaade75f5c0 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  143  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [1024, 256, 1, 1]
Creating blob for Data layer  144  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [1024]
Adding aten::add_ layer to network
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae06bd340 , Bias 0xaaaae0dc7bc0 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  148  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [256, 1024, 1, 1]
Creating blob for Data layer  149  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [256]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae009d280 , Bias 0xaaaae0dc7bc0 , padding  [1, 1] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  152  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [256, 256, 3, 3]
Creating blob for Data layer  153  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [256]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae07bd380 , Bias 0xaaaade75f5c0 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  156  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [1024, 256, 1, 1]
Creating blob for Data layer  157  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [1024]
Adding aten::add_ layer to network
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae08bd400 , Bias 0xaaaae0dc7bc0 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  161  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [256, 1024, 1, 1]
Creating blob for Data layer  162  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [256]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae03ad300 , Bias 0xaaaae0dc7bc0 , padding  [1, 1] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  165  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [256, 256, 3, 3]
Creating blob for Data layer  166  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [256]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaadf6845c0 , Bias 0xaaaade75f5c0 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  169  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [1024, 256, 1, 1]
Creating blob for Data layer  170  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [1024]
Adding aten::add_ layer to network
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaadf784600 , Bias 0xaaaae0dc7bc0 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  174  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [256, 1024, 1, 1]
Creating blob for Data layer  175  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [256]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaadfd8d240 , Bias 0xaaaae0dc7bc0 , padding  [1, 1] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  178  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [256, 256, 3, 3]
Creating blob for Data layer  179  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [256]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaadf884680 , Bias 0xaaaade75f5c0 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  182  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [1024, 256, 1, 1]
Creating blob for Data layer  183  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [1024]
Adding aten::add_ layer to network
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaadf994600 , Bias 0xaaaae0dbf340 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  187  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [512, 1024, 1, 1]
Creating blob for Data layer  188  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [512]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xffff6d08d040 , Bias 0xaaaae0dbf340 , padding  [1, 1] , stride  [2, 2] , dilation [1, 1] , groups  1
Creating blob for Data layer  191  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [512, 512, 3, 3]
Creating blob for Data layer  192  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [512]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae33ac440 , Bias 0xaaaaddaf4500 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  195  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [2048, 512, 1, 1]
Creating blob for Data layer  196  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [2048]
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xffff6c88c040 , Bias 0xaaaaddaf4500 , padding  [0, 0] , stride  [2, 2] , dilation [1, 1] , groups  1
Creating blob for Data layer  198  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [2048, 1024, 1, 1]
Creating blob for Data layer  199  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [2048]
Adding aten::add_ layer to network
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae37ac480 , Bias 0xaaaae0dbf340 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  203  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [512, 2048, 1, 1]
Creating blob for Data layer  204  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [512]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xffff5d6ff040 , Bias 0xaaaae0dbf340 , padding  [1, 1] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  207  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [512, 512, 3, 3]
Creating blob for Data layer  208  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [512]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae3bac500 , Bias 0xaaaaddaf4500 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  211  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [2048, 512, 1, 1]
Creating blob for Data layer  212  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [2048]
Adding aten::add_ layer to network
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae3fac580 , Bias 0xaaaae0dbf340 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  216  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [512, 2048, 1, 1]
Creating blob for Data layer  217  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [512]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xffff5cdfe040 , Bias 0xaaaae0dbf340 , padding  [1, 1] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  220  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [512, 512, 3, 3]
Creating blob for Data layer  221  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [512]
Adding aten::relu_ layer to network
Adding aten::_convolution layer to network
Binding inputs for Convolution Layer Weight  0xaaaae43ac5c0 , Bias 0xaaaaddaf4500 , padding  [0, 0] , stride  [1, 1] , dilation [1, 1] , groups  1
Creating blob for Data layer  224  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [2048, 512, 1, 1]
Creating blob for Data layer  225  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [2048]
Adding aten::add_ layer to network
Adding aten::relu_ layer to network
Adding aten::adaptive_avg_pool2d layer to network
Binding inputs for Adaptive_avg_pooling Layer kernel_size  [1, 1]
Adding aten::flatten layer to network
Adding aten::linear layer to network
Binding inputs for Linear layer Weight  0xffff6f88e040 , Bias 0xaaaadcb52dc0
Creating blob for Data layer  231  with type  FLOAT  format  PlainDataFormat(FORMATF_ROW_MAJOR)[0x0000000000000004]  shape  [1000, 2048]
Creating blob for Data layer  232  with type  FLOAT  format  PlainDataFormat(FORMATF_LINEAR)[0x0000000000000001]  shape  [1000]
Running AIO Network
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Layer  FullyConnected  got PlainDataFormat(FORMATF_BATCH_ROW_MAJOR)[0x0000000000000015]  while it prefers PlainDataFormat(FORMATF_ROW_MAJOR)[0x0000000000000004] but no such conversion is available in DLS
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Allocating 16 bytes (aligned)
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Data for layer Data : 
Selected kernel Input for layer Input : Conv input
Selected kernel TransposeBRC3x4 for layer Transpose : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
PlatformInfo(vendor_id=3, cpu_family=8, cpu_model=3340, isa=NEON, L1=CacheInfo(size=65536, inclusive=1, share_count=1), L2=CacheInfo(size=1048576, inclusive=0, share_count=1), L3=CacheInfo(size=33554432, inclusive=0, share_count=80))
Tuning ConvTask(batch=1,idepth=1,iheight=224,iwidth=224,ichannels=3,odepth=1,oheight=112,owidth=112,ochannels=64,kdepth=1,kheight=7,kwidth=7,dstride=1,hstride=2,wstride=2,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=3,wpad=3,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score CvjmPreset(in_regs=7,w_regs=3,batch_tile=3136,inpf_step_tile=49,outf_tile=64)
Allocating 42336 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel MaxPoolingMeta<FLOAT[4]>@NEON for layer Pooling : 
Selected kernel ConvOneJit for layer Convolution : 
/usr/local/share//libampere-aio/data/lookup_files/conv_one_jit.csv 1 Could not parse lookup entry: Missing column task.extbatch
Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=64,odepth=1,oheight=56,owidth=56,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=64,n_minibatch=3136)
Allocating 65536 bytes (aligned)
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=64,odepth=1,oheight=56,owidth=56,ochannels=64,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=32,in_mode=MS,d_minibatch=64,n_minibatch=3136)
Allocating 16384 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=64,odepth=1,oheight=56,owidth=56,ochannels=64,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score CvjmPreset(in_regs=6,w_regs=4,batch_tile=784,inpf_step_tile=9,outf_tile=64)
Allocating 147456 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=64,odepth=1,oheight=56,owidth=56,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=64,n_minibatch=3136)
Allocating 65536 bytes (aligned)
Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : 
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=256,odepth=1,oheight=56,owidth=56,ochannels=64,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=32,in_mode=MS,d_minibatch=256,n_minibatch=3136)
Allocating 65536 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=64,odepth=1,oheight=56,owidth=56,ochannels=64,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup CvjmPreset(in_regs=6,w_regs=4,batch_tile=784,inpf_step_tile=9,outf_tile=64)
Allocating 147456 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=64,odepth=1,oheight=56,owidth=56,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=64,n_minibatch=3136)
Allocating 65536 bytes (aligned)
Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : 
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=256,odepth=1,oheight=56,owidth=56,ochannels=64,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=32,in_mode=MS,d_minibatch=256,n_minibatch=3136)
Allocating 65536 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=64,odepth=1,oheight=56,owidth=56,ochannels=64,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup CvjmPreset(in_regs=6,w_regs=4,batch_tile=784,inpf_step_tile=9,outf_tile=64)
Allocating 147456 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=64,odepth=1,oheight=56,owidth=56,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=64,n_minibatch=3136)
Allocating 65536 bytes (aligned)
Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : 
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=256,odepth=1,oheight=28,owidth=28,ochannels=512,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=2,wstride=2,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score CvjmPreset(in_regs=7,w_regs=3,batch_tile=196,inpf_step_tile=1,outf_tile=512)
Allocating 528384 bytes (aligned)
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=256,odepth=1,oheight=56,owidth=56,ochannels=128,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=256,n_minibatch=3136)
Allocating 131072 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=56,iwidth=56,ichannels=128,odepth=1,oheight=28,owidth=28,ochannels=128,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=2,wstride=2,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score CvjmPreset(in_regs=6,w_regs=4,batch_tile=196,inpf_step_tile=9,outf_tile=128)
Allocating 589824 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=128,odepth=1,oheight=28,owidth=28,ochannels=512,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=512,in_mode=MS,d_minibatch=32,n_minibatch=196)
Allocating 264192 bytes (aligned)
Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : 
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=512,odepth=1,oheight=28,owidth=28,ochannels=128,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=512,n_minibatch=784)
Allocating 262144 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=128,odepth=1,oheight=28,owidth=28,ochannels=128,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score CvjmPreset(in_regs=6,w_regs=4,batch_tile=196,inpf_step_tile=9,outf_tile=128)
Allocating 589824 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=128,odepth=1,oheight=28,owidth=28,ochannels=512,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=512,in_mode=MS,d_minibatch=32,n_minibatch=196)
Allocating 264192 bytes (aligned)
Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : 
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=512,odepth=1,oheight=28,owidth=28,ochannels=128,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=512,n_minibatch=784)
Allocating 262144 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=128,odepth=1,oheight=28,owidth=28,ochannels=128,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup CvjmPreset(in_regs=6,w_regs=4,batch_tile=196,inpf_step_tile=9,outf_tile=128)
Allocating 589824 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=128,odepth=1,oheight=28,owidth=28,ochannels=512,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=512,in_mode=MS,d_minibatch=32,n_minibatch=196)
Allocating 264192 bytes (aligned)
Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : 
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=512,odepth=1,oheight=28,owidth=28,ochannels=128,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=512,n_minibatch=784)
Allocating 262144 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=128,odepth=1,oheight=28,owidth=28,ochannels=128,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup CvjmPreset(in_regs=6,w_regs=4,batch_tile=196,inpf_step_tile=9,outf_tile=128)
Allocating 589824 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=128,odepth=1,oheight=28,owidth=28,ochannels=512,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=512,in_mode=MS,d_minibatch=32,n_minibatch=196)
Allocating 264192 bytes (aligned)
Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : 
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=512,odepth=1,oheight=14,owidth=14,ochannels=1024,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=2,wstride=2,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score CvjmPreset(in_regs=6,w_regs=4,batch_tile=196,inpf_step_tile=1,outf_tile=32)
Allocating 2 MB (numa)
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=512,odepth=1,oheight=28,owidth=28,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=512,n_minibatch=784)
Allocating 524288 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=28,iwidth=28,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=2,wstride=2,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=256)
Allocating 2 MB (numa)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=1024,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=1024,in_mode=MS,d_minibatch=144,n_minibatch=49)
Allocating 1 MB (numa)
Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : 
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=1024,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=1024,n_minibatch=196)
Allocating 1048576 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=256)
Allocating 2 MB (numa)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=1024,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=1024,in_mode=MS,d_minibatch=144,n_minibatch=49)
Allocating 1 MB (numa)
Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : 
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=1024,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=1024,n_minibatch=196)
Allocating 1048576 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=256)
Allocating 2 MB (numa)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=1024,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=1024,in_mode=MS,d_minibatch=144,n_minibatch=49)
Allocating 1 MB (numa)
Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : 
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=1024,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=1024,n_minibatch=196)
Allocating 1048576 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=256)
Allocating 2 MB (numa)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=1024,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=1024,in_mode=MS,d_minibatch=144,n_minibatch=49)
Allocating 1 MB (numa)
Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : 
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=1024,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=1024,n_minibatch=196)
Allocating 1048576 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=256)
Allocating 2 MB (numa)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=1024,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=1024,in_mode=MS,d_minibatch=144,n_minibatch=49)
Allocating 1 MB (numa)
Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : 
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=1024,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=1024,n_minibatch=196)
Allocating 1048576 bytes (aligned)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=256,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=256)
Allocating 2 MB (numa)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=256,odepth=1,oheight=14,owidth=14,ochannels=1024,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=1024,in_mode=MS,d_minibatch=144,n_minibatch=49)
Allocating 1 MB (numa)
Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : 
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=1024,odepth=1,oheight=7,owidth=7,ochannels=2048,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=2,wstride=2,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=1,outf_tile=24)
Allocating 8 MB (numa)
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=1024,odepth=1,oheight=14,owidth=14,ochannels=512,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=1024,n_minibatch=196)
Allocating 2 MB (numa)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=14,iwidth=14,ichannels=512,odepth=1,oheight=7,owidth=7,ochannels=512,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=2,wstride=2,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=24)
Allocating 9 MB (numa)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=7,iwidth=7,ichannels=512,odepth=1,oheight=7,owidth=7,ochannels=2048,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=24,in_mode=MS,d_minibatch=512,n_minibatch=49)
Allocating 4 MB (numa)
Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : 
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=7,iwidth=7,ichannels=2048,odepth=1,oheight=7,owidth=7,ochannels=512,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=2048,n_minibatch=49)
Allocating 4 MB (numa)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=7,iwidth=7,ichannels=512,odepth=1,oheight=7,owidth=7,ochannels=512,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=24)
Allocating 9 MB (numa)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=7,iwidth=7,ichannels=512,odepth=1,oheight=7,owidth=7,ochannels=2048,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=24,in_mode=MS,d_minibatch=512,n_minibatch=49)
Allocating 4 MB (numa)
Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : 
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=7,iwidth=7,ichannels=2048,odepth=1,oheight=7,owidth=7,ochannels=512,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=6,w_regs=4,w_prefetches=0,outf_tile=64,in_mode=MS,d_minibatch=2048,n_minibatch=49)
Allocating 4 MB (numa)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvViaJitMatmul for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=7,iwidth=7,ichannels=512,odepth=1,oheight=7,owidth=7,ochannels=512,kdepth=1,kheight=3,kwidth=3,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=1,wpad=1,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup CvjmPreset(in_regs=7,w_regs=3,batch_tile=49,inpf_step_tile=9,outf_tile=24)
Allocating 9 MB (numa)
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel ConvOneJit for layer Convolution : 
Tuning ConvTask(batch=1,idepth=1,iheight=7,iwidth=7,ichannels=512,odepth=1,oheight=7,owidth=7,ochannels=2048,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset via lookup ConvOnePreset(in_regs=7,w_regs=3,w_prefetches=0,outf_tile=24,in_mode=MS,d_minibatch=512,n_minibatch=49)
Allocating 4 MB (numa)
Selected kernel BinaryOpVectorized[Add]@NEON for layer Add : 
Selected kernel UnaryOpVectorized[RELU]@NEON for layer RELU : 
Selected kernel AdaptiveAvgPoolingMeta<FLOAT[4]>@NEON for layer AdaptiveAvgPool : 
Selected kernel TransposeIndexed for layer Transpose : 
Allocating 8192 bytes (aligned)
Selected kernel ForwardingKernelFlatten for layer Flatten : 
Selected kernel FCViaConvOne for layer FullyConnected : 
Selected kernel ForwardingKernelOutput for layer Output : 
Merge of   ( Transpose [1, 224, 224, 3] ) to  Conv input ( Input Input ):  Target layer type is not mergeable
Merge of   ( Convolution [1, 112, 112, 64] ) to   ( Transpose TransposeBRC3x4 ):  Target layer type is not mergeable
Considering merge of RELU  to  ConvViaJitMatmul
Merge of   ( RELU [1, 112, 112, 64] ) to   ( Convolution ConvViaJitMatmul ):  Successful
Merge of   ( Pooling [1, 56, 56, 64] ) to   ( Convolution ConvViaJitMatmul ):  Target layer type is not mergeable
Considering merge of Add  to  ConvOneJit
Merge of   ( Add [1, 56, 56, 256] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 56, 56, 256] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 56, 56, 64] ) to   ( Convolution ConvOneJit ):  Successful
Merge of   ( Convolution [1, 56, 56, 64] ) to   ( Convolution ConvOneJit ):  Target layer type is not mergeable
Considering merge of RELU  to  ConvViaJitMatmul
Merge of   ( RELU [1, 56, 56, 64] ) to   ( Convolution ConvViaJitMatmul ):  Successful
Merge of   ( Convolution [1, 56, 56, 256] ) to   ( Convolution ConvViaJitMatmul ):  Target layer type is not mergeable
Considering merge of Add  to  ConvOneJit
Kernel  ConvOneJit  rejected merge
Merge of   ( Add [1, 56, 56, 256] ) to   ( Convolution ConvOneJit ):  Attempt merge failed
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 56, 56, 64] ) to   ( Convolution ConvOneJit ):  Successful
Merge of   ( Convolution [1, 56, 56, 64] ) to   ( Convolution ConvOneJit ):  Target layer type is not mergeable
Considering merge of RELU  to  ConvViaJitMatmul
Merge of   ( RELU [1, 56, 56, 64] ) to   ( Convolution ConvViaJitMatmul ):  Successful
Merge of   ( Convolution [1, 56, 56, 256] ) to   ( Convolution ConvViaJitMatmul ):  Target layer type is not mergeable
Considering merge of Add  to  ConvOneJit
Merge of   ( Add [1, 56, 56, 256] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 56, 56, 256] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 56, 56, 64] ) to   ( Convolution ConvOneJit ):  Successful
Merge of   ( Convolution [1, 56, 56, 64] ) to   ( Convolution ConvOneJit ):  Target layer type is not mergeable
Considering merge of RELU  to  ConvViaJitMatmul
Merge of   ( RELU [1, 56, 56, 64] ) to   ( Convolution ConvViaJitMatmul ):  Successful
Merge of   ( Convolution [1, 56, 56, 256] ) to   ( Convolution ConvViaJitMatmul ):  Target layer type is not mergeable
Considering merge of Add  to  ConvOneJit
Merge of   ( Add [1, 56, 56, 256] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 56, 56, 256] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of Add  to  ConvViaJitMatmul
Kernel  ConvViaJitMatmul  rejected merge
Merge of   ( Add [1, 28, 28, 512] ) to   ( Convolution ConvViaJitMatmul ):  Attempt merge failed
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 56, 56, 128] ) to   ( Convolution ConvOneJit ):  Successful
Merge of   ( Convolution [1, 28, 28, 128] ) to   ( Convolution ConvOneJit ):  Target layer type is not mergeable
Considering merge of RELU  to  ConvViaJitMatmul
Merge of   ( RELU [1, 28, 28, 128] ) to   ( Convolution ConvViaJitMatmul ):  Successful
Merge of   ( Convolution [1, 28, 28, 512] ) to   ( Convolution ConvViaJitMatmul ):  Target layer type is not mergeable
Considering merge of Add  to  ConvOneJit
Merge of   ( Add [1, 28, 28, 512] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 28, 28, 512] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 28, 28, 128] ) to   ( Convolution ConvOneJit ):  Successful
Merge of   ( Convolution [1, 28, 28, 128] ) to   ( Convolution ConvOneJit ):  Target layer type is not mergeable
Considering merge of RELU  to  ConvViaJitMatmul
Merge of   ( RELU [1, 28, 28, 128] ) to   ( Convolution ConvViaJitMatmul ):  Successful
Merge of   ( Convolution [1, 28, 28, 512] ) to   ( Convolution ConvViaJitMatmul ):  Target layer type is not mergeable
Considering merge of Add  to  ConvOneJit
Merge of   ( Add [1, 28, 28, 512] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 28, 28, 512] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 28, 28, 128] ) to   ( Convolution ConvOneJit ):  Successful
Merge of   ( Convolution [1, 28, 28, 128] ) to   ( Convolution ConvOneJit ):  Target layer type is not mergeable
Considering merge of RELU  to  ConvViaJitMatmul
Merge of   ( RELU [1, 28, 28, 128] ) to   ( Convolution ConvViaJitMatmul ):  Successful
Merge of   ( Convolution [1, 28, 28, 512] ) to   ( Convolution ConvViaJitMatmul ):  Target layer type is not mergeable
Considering merge of Add  to  ConvOneJit
Merge of   ( Add [1, 28, 28, 512] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 28, 28, 512] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 28, 28, 128] ) to   ( Convolution ConvOneJit ):  Successful
Merge of   ( Convolution [1, 28, 28, 128] ) to   ( Convolution ConvOneJit ):  Target layer type is not mergeable
Considering merge of RELU  to  ConvViaJitMatmul
Merge of   ( RELU [1, 28, 28, 128] ) to   ( Convolution ConvViaJitMatmul ):  Successful
Merge of   ( Convolution [1, 28, 28, 512] ) to   ( Convolution ConvViaJitMatmul ):  Target layer type is not mergeable
Considering merge of Add  to  ConvOneJit
Merge of   ( Add [1, 28, 28, 512] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 28, 28, 512] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of Add  to  ConvViaJitMatmul
Kernel  ConvViaJitMatmul  rejected merge
Merge of   ( Add [1, 14, 14, 1024] ) to   ( Convolution ConvViaJitMatmul ):  Attempt merge failed
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 28, 28, 256] ) to   ( Convolution ConvOneJit ):  Successful
Merge of   ( Convolution [1, 14, 14, 256] ) to   ( Convolution ConvOneJit ):  Target layer type is not mergeable
Considering merge of RELU  to  ConvViaJitMatmul
Merge of   ( RELU [1, 14, 14, 256] ) to   ( Convolution ConvViaJitMatmul ):  Successful
Merge of   ( Convolution [1, 14, 14, 1024] ) to   ( Convolution ConvViaJitMatmul ):  Target layer type is not mergeable
Considering merge of Add  to  ConvOneJit
Merge of   ( Add [1, 14, 14, 1024] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 14, 14, 1024] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 14, 14, 256] ) to   ( Convolution ConvOneJit ):  Successful
Merge of   ( Convolution [1, 14, 14, 256] ) to   ( Convolution ConvOneJit ):  Target layer type is not mergeable
Considering merge of RELU  to  ConvViaJitMatmul
Merge of   ( RELU [1, 14, 14, 256] ) to   ( Convolution ConvViaJitMatmul ):  Successful
Merge of   ( Convolution [1, 14, 14, 1024] ) to   ( Convolution ConvViaJitMatmul ):  Target layer type is not mergeable
Considering merge of Add  to  ConvOneJit
Merge of   ( Add [1, 14, 14, 1024] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 14, 14, 1024] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 14, 14, 256] ) to   ( Convolution ConvOneJit ):  Successful
Merge of   ( Convolution [1, 14, 14, 256] ) to   ( Convolution ConvOneJit ):  Target layer type is not mergeable
Considering merge of RELU  to  ConvViaJitMatmul
Merge of   ( RELU [1, 14, 14, 256] ) to   ( Convolution ConvViaJitMatmul ):  Successful
Merge of   ( Convolution [1, 14, 14, 1024] ) to   ( Convolution ConvViaJitMatmul ):  Target layer type is not mergeable
Considering merge of Add  to  ConvOneJit
Merge of   ( Add [1, 14, 14, 1024] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 14, 14, 1024] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 14, 14, 256] ) to   ( Convolution ConvOneJit ):  Successful
Merge of   ( Convolution [1, 14, 14, 256] ) to   ( Convolution ConvOneJit ):  Target layer type is not mergeable
Considering merge of RELU  to  ConvViaJitMatmul
Merge of   ( RELU [1, 14, 14, 256] ) to   ( Convolution ConvViaJitMatmul ):  Successful
Merge of   ( Convolution [1, 14, 14, 1024] ) to   ( Convolution ConvViaJitMatmul ):  Target layer type is not mergeable
Considering merge of Add  to  ConvOneJit
Merge of   ( Add [1, 14, 14, 1024] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 14, 14, 1024] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 14, 14, 256] ) to   ( Convolution ConvOneJit ):  Successful
Merge of   ( Convolution [1, 14, 14, 256] ) to   ( Convolution ConvOneJit ):  Target layer type is not mergeable
Considering merge of RELU  to  ConvViaJitMatmul
Merge of   ( RELU [1, 14, 14, 256] ) to   ( Convolution ConvViaJitMatmul ):  Successful
Merge of   ( Convolution [1, 14, 14, 1024] ) to   ( Convolution ConvViaJitMatmul ):  Target layer type is not mergeable
Considering merge of Add  to  ConvOneJit
Merge of   ( Add [1, 14, 14, 1024] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 14, 14, 1024] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 14, 14, 256] ) to   ( Convolution ConvOneJit ):  Successful
Merge of   ( Convolution [1, 14, 14, 256] ) to   ( Convolution ConvOneJit ):  Target layer type is not mergeable
Considering merge of RELU  to  ConvViaJitMatmul
Merge of   ( RELU [1, 14, 14, 256] ) to   ( Convolution ConvViaJitMatmul ):  Successful
Merge of   ( Convolution [1, 14, 14, 1024] ) to   ( Convolution ConvViaJitMatmul ):  Target layer type is not mergeable
Considering merge of Add  to  ConvOneJit
Merge of   ( Add [1, 14, 14, 1024] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 14, 14, 1024] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of Add  to  ConvViaJitMatmul
Kernel  ConvViaJitMatmul  rejected merge
Merge of   ( Add [1, 7, 7, 2048] ) to   ( Convolution ConvViaJitMatmul ):  Attempt merge failed
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 14, 14, 512] ) to   ( Convolution ConvOneJit ):  Successful
Merge of   ( Convolution [1, 7, 7, 512] ) to   ( Convolution ConvOneJit ):  Target layer type is not mergeable
Considering merge of RELU  to  ConvViaJitMatmul
Merge of   ( RELU [1, 7, 7, 512] ) to   ( Convolution ConvViaJitMatmul ):  Successful
Merge of   ( Convolution [1, 7, 7, 2048] ) to   ( Convolution ConvViaJitMatmul ):  Target layer type is not mergeable
Considering merge of Add  to  ConvOneJit
Merge of   ( Add [1, 7, 7, 2048] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 7, 7, 2048] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 7, 7, 512] ) to   ( Convolution ConvOneJit ):  Successful
Merge of   ( Convolution [1, 7, 7, 512] ) to   ( Convolution ConvOneJit ):  Target layer type is not mergeable
Considering merge of RELU  to  ConvViaJitMatmul
Merge of   ( RELU [1, 7, 7, 512] ) to   ( Convolution ConvViaJitMatmul ):  Successful
Merge of   ( Convolution [1, 7, 7, 2048] ) to   ( Convolution ConvViaJitMatmul ):  Target layer type is not mergeable
Considering merge of Add  to  ConvOneJit
Merge of   ( Add [1, 7, 7, 2048] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 7, 7, 2048] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 7, 7, 512] ) to   ( Convolution ConvOneJit ):  Successful
Merge of   ( Convolution [1, 7, 7, 512] ) to   ( Convolution ConvOneJit ):  Target layer type is not mergeable
Considering merge of RELU  to  ConvViaJitMatmul
Merge of   ( RELU [1, 7, 7, 512] ) to   ( Convolution ConvViaJitMatmul ):  Successful
Merge of   ( Convolution [1, 7, 7, 2048] ) to   ( Convolution ConvViaJitMatmul ):  Target layer type is not mergeable
Considering merge of Add  to  ConvOneJit
Merge of   ( Add [1, 7, 7, 2048] ) to   ( Convolution ConvOneJit ):  Successful
Considering merge of RELU  to  ConvOneJit
Merge of   ( RELU [1, 7, 7, 2048] ) to   ( Convolution ConvOneJit ):  Successful
Merge of   ( AdaptiveAvgPool [1, 1, 1, 2048] ) to   ( Convolution ConvOneJit ):  Target layer type is not mergeable
Merge of   ( Transpose [1, 2048, 1, 1] ) to   ( AdaptiveAvgPool AdaptiveAvgPoolingMeta<FLOAT[4]>@NEON ):  Target layer type is not mergeable
Merge of   ( Flatten [1, 2048] ) to   ( Transpose TransposeIndexed ):  Target layer type is not mergeable
Merge of   ( FullyConnected [1, 1000] ) to   ( Flatten ForwardingKernelFlatten ):  Target layer type is not mergeable
Merge of   ( Output [1, 1000] ) to   ( FullyConnected FCViaConvOne ):  Target layer type is not mergeable
External allocation: allocating 4000 bytes
Creating external output: 0 , shape: [1, 1000]
Allocating 602112 bytes (aligned)
Allocating 3 MB (numa)
Allocating 802816 bytes (aligned)
Allocating 802816 bytes (aligned)
Allocating 802816 bytes (aligned)
Allocating 3 MB (numa)
Allocating 1 MB (numa)
Allocating 1 MB (numa)
Allocating 401408 bytes (aligned)
Allocating 200704 bytes (aligned)
Allocating 200704 bytes (aligned)
Allocating 100352 bytes (aligned)
Allocating 100352 bytes (aligned)
Allocating 8192 bytes (aligned)
Allocating 8192 bytes (aligned)
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Input
Running TransposeBRC3x4
Running ConvViaJitMatmul
Allocating 2 MB (numa)
Jitted kernel for init: in_mode: ProxyInput
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 12
in_tail_cols: 3
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff8a440000 , used 6464 B
Jitted kernel for init: in_mode: ProxyInput
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 4
in_tail_cols: 3
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff81d50000 , used 3748 B
Running MaxPoolingMeta<FLOAT[4]>@NEON
Running ConvOneJit
Scratches: 0 @ 0
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [6x4]
out_features: 16
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff81d20000 , used 4076 B
Running ConvViaJitMatmul
Allocating 112896 bytes (aligned)
Jitted kernel for init: in_mode: ProxyInput
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [6x4]
out_features: 16
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff7dd10000 , used 4604 B
Running ConvOneJit
Scratches: 0 @ 0
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [6x4]
out_features: 16
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,NONE,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff781f0000 , used 3716 B
Running ConvOneJit
Scratches: 0 @ 0
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [6x4]
out_features: 16
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,BINOP_ADD_MATRIX,RELU,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff6c077000 , used 5012 B
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 112896 bytes (aligned)
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 112896 bytes (aligned)
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 3136 bytes (aligned)
Jitted kernel for init: in_mode: ProxyInput
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 12
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,NONE,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff6c067000 , used 4504 B
Jitted kernel for init: in_mode: ProxyInput
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 8
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,NONE,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff6c057000 , used 3692 B
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 28224 bytes (aligned)
Jitted kernel for init: in_mode: ProxyInput
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [6x4]
out_features: 16
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff6c047000 , used 4604 B
Running ConvOneJit
Scratches: 0 @ 0
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 12
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff6c037000 , used 3356 B
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 8
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff6c027000 , used 2684 B
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::OUTPUT
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 12
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff6c017000 , used 4280 B
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::OUTPUT
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 8
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff6c007000 , used 3384 B
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::OUTPUT
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 12
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,BINOP_ADD_MATRIX,RELU,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5b2b5000 , used 6128 B
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::OUTPUT
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 8
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,BINOP_ADD_MATRIX,RELU,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5b2a5000 , used 4756 B
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 28224 bytes (aligned)
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 28224 bytes (aligned)
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 28224 bytes (aligned)
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 784 bytes (aligned)
Jitted kernel for init: in_mode: ProxyInput
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [6x4]
out_features: 16
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,NONE,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5b295000 , used 4244 B
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 7056 bytes (aligned)
Jitted kernel for init: in_mode: ProxyInput
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 12
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5b285000 , used 4868 B
Jitted kernel for init: in_mode: ProxyInput
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 4
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5b275000 , used 2908 B
Running ConvOneJit
Scratches: 0 @ 0
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 4
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5b265000 , used 1900 B
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::OUTPUT
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 4
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,BINOP_ADD_MATRIX,RELU,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5b255000 , used 3272 B
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 7056 bytes (aligned)
Jitted kernel for init: in_mode: ProxyInput
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 12
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 1
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5b245000 , used 62312 B
Jitted kernel for init: in_mode: ProxyInput
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 4
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 1
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5b235000 , used 32800 B
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 7056 bytes (aligned)
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 7056 bytes (aligned)
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 7056 bytes (aligned)
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 7056 bytes (aligned)
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 196 bytes (aligned)
Jitted kernel for init: in_mode: ProxyInput
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 12
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,NONE,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5b225000 , used 4504 B
Jitted kernel for init: in_mode: ProxyInput
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 8
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,NONE,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5b215000 , used 3692 B
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 1764 bytes (aligned)
Jitted kernel for init: in_mode: ProxyInput
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 12
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 1
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5b205000 , used 62312 B
Jitted kernel for init: in_mode: ProxyInput
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 8
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,RELU,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 1
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5b1f5000 , used 49532 B
Running ConvOneJit
Scratches: 0 @ 0
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 12
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,BINOP_ADD_MATRIX,RELU,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5b1e5000 , used 5204 B
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [7x3]
out_features: 8
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,BINOP_ADD_MATRIX,RELU,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5b1d5000 , used 4056 B
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 1764 bytes (aligned)
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Allocating 1764 bytes (aligned)
Running ConvOneJit
Scratches: 0 @ 0
Running AdaptiveAvgPoolingMeta<FLOAT[4]>@NEON
Running TransposeIndexed
Running ForwardingKernelFlatten
Running FCViaConvOne
Tuning ConvTask(batch=1,idepth=1,iheight=1,iwidth=1,ichannels=2048,odepth=1,oheight=1,owidth=1,ochannels=1000,kdepth=1,kheight=1,kwidth=1,dstride=1,hstride=1,wstride=1,ddilation=1,hdilation=1,wdilation=1,dpad=0,hpad=0,wpad=0,dtype=FLOAT,extbatch=1,mut_w=0)
Found preset with best score ConvOnePreset(in_regs=3,w_regs=4,w_prefetches=0,outf_tile=32,in_mode=MS,d_minibatch=256,n_minibatch=200)
Allocating 7 MB (numa)
Scratches: 0 @ 0
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [3x4]
out_features: 16
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5a9e4000 , used 1204 B
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::OUTPUT
in_dtype: FLOAT
ref_grid: [3x4]
out_features: 16
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5a9d4000 , used 1456 B
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::OUTPUT
in_dtype: FLOAT
ref_grid: [3x4]
out_features: 16
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,NONE,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5a9c4000 , used 1624 B
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::ZERO
in_dtype: FLOAT
ref_grid: [3x4]
out_features: 8
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5a9b4000 , used 868 B
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::OUTPUT
in_dtype: FLOAT
ref_grid: [3x4]
out_features: 8
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[NONE,NONE,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5a9a4000 , used 1024 B
Jitted kernel for init: in_mode: MultiStream
acc_init: AccInitializer::OUTPUT
in_dtype: FLOAT
ref_grid: [3x4]
out_features: 8
in_tail_cols: 0
int8_apply_filter_offset: 0
int8_shift_uint8_to_sint8: 0
postprocessing_ops: PP[BINOP_ADD_LINEAR,NONE,NONE,NONE,NONE,NONE,]
inner_iter_length: [no-value]

sparse_proxy_in_optimization: 0
strided_weights: 0
input_can_read_last_full_vector: 0
weights_can_read_last_full_vector: 0
prefetch_options: {w_ahead: 0}
 
 at 0xffff5a994000 , used 1120 B
Running ForwardingKernelOutput
Creating blob for Input layer  3  with type  FLOAT  format  PlainDataFormat(FORMATF_CAFFE)[0x000000000000000a]  shape  [1, 3, 224, 224]
Running AIO Network
External allocation: allocating 4000 bytes
Creating external output: 0 , shape: [1, 1000]
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Data
Running Input
Running TransposeBRC3x4
Running ConvViaJitMatmul
Running MaxPoolingMeta<FLOAT[4]>@NEON
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running ConvOneJit
Scratches: 0 @ 0
Running ConvViaJitMatmul
Running ConvOneJit
Scratches: 0 @ 0
Running AdaptiveAvgPoolingMeta<FLOAT[4]>@NEON
Running TransposeIndexed
Running ForwardingKernelFlatten
Running FCViaConvOne
Scratches: 0 @ 0
Running ForwardingKernelOutput
Latency: 210 ms