Julia Crashes from Assertion #1429

Closed
avik-pal opened this issue May 11, 2024 · 10 comments

avik-pal commented May 11, 2024

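using Lux, Enzyme, Random

# Single Dense layer with a gelu activation
model = Dense(10 => 10, gelu; use_bias=false)  # use_bias = false produces the segfault
ps, st = Lux.setup(Xoshiro(1234), model)
x = randn(Float32, 10)

# Scalar loss: sum of squares of the layer output
loss_function(model, x, ps, st) = sum(abs2, first(model(x, ps, st)))

# Plain forward evaluation of the loss (no Enzyme involved)
loss_function(model, x, ps, st)

begin
    # Zero-initialized shadows that receive the gradients
    dps = Enzyme.make_zero(ps)
    dx = Enzyme.make_zero(x)

    # Reverse-mode differentiation with respect to x and ps
    Enzyme.autodiff(Enzyme.Reverse, loss_function, Active, Const(model),
        Duplicated(x, dx), Duplicated(ps, dps), Const(st))

    dx, dps
end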
julia: /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:3791: bool GradientUtils::legalRecompute(const llvm::Value*, const ValueToValueMapTy&, llvm::IRBuilder<>*, bool, bool) const: Assertion `phi->getNumIncomingValues() != 0' failed.

With use_bias = true, Julia does not crash, but the autodiff call still errors.

Without an activation function, it works fine.
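
To compare the three cases side by side, the reproducer can be wrapped in a small helper. This is only a sketch: differentiate_variant is a hypothetical name (not part of Lux or Enzyme), and it reuses exactly the calls from the snippet above. Because the failing configuration aborts the whole process, it is probably safest to try each configuration in a fresh Julia session.

```julia
using Lux, Enzyme, Random

# Same loss as in the reproducer above.
loss_function(model, x, ps, st) = sum(abs2, first(model(x, ps, st)))

# Hypothetical helper (not part of Lux or Enzyme): builds one Dense variant
# and differentiates the loss with Enzyme in reverse mode.
function differentiate_variant(activation, use_bias::Bool)
    model = Dense(10 => 10, activation; use_bias)
    ps, st = Lux.setup(Xoshiro(1234), model)
    x = randn(Float32, 10)

    # Zero-initialized shadows that receive the gradients
    dps = Enzyme.make_zero(ps)
    dx = Enzyme.make_zero(x)

    Enzyme.autodiff(Enzyme.Reverse, loss_function, Active, Const(model),
        Duplicated(x, dx), Duplicated(ps, dps), Const(st))

    return dx, dps
end

# Configurations described in this report:
# differentiate_variant(gelu, false)      # aborts with the assertion
# differentiate_variant(gelu, true)       # no crash, but still errors
# differentiate_variant(identity, false)  # no activation: works
```

Keeping the loss at top level rather than inside the helper mirrors the original reproducer, so the only things that vary between runs are the activation and the bias flag.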

Crash Log
; Function Attrs: mustprogress willreturn
define internal fastcc void @preprocess_julia_fast_materialize_threaded__2755({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="125797725781712" "enzymejl_parmtype_ref"="2" %0, { [1 x {} addrspace(10)*] } addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(8) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,-1]:Float@float, [-1,0,8]:Integer, [-1,0,9]:Integer, [-1,0,10]:Integer, [-1,0,11]:Integer, [-1,0,12]:Integer, [-1,0,13]:Integer, [-1,0,14]:Integer, [-1,0,15]:Integer, [-1,0,16]:Integer, [-1,0,17]:Integer, [-1,0,18]:Integer, [-1,0,19]:Integer, [-1,0,20]:Integer, [-1,0,21]:Integer, [-1,0,22]:Integer, [-1,0,23]:Integer, [-1,0,24]:Integer, [-1,0,25]:Integer, [-1,0,26]:Integer, [-1,0,27]:Integer, [-1,0,28]:Integer, [-1,0,29]:Integer, [-1,0,30]:Integer, [-1,0,31]:Integer, [-1,0,32]:Integer, [-1,0,33]:Integer, [-1,0,34]:Integer, [-1,0,35]:Integer, [-1,0,36]:Integer, [-1,0,37]:Integer, [-1,0,38]:Integer, [-1,0,39]:Integer}" "enzymejl_parmtype"="125797729165136" "enzymejl_parmtype_ref"="1" %1, [2 x [1 x i64]] addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(16) "enzyme_inactive" "enzyme_type"="{[-1]:Pointer, [-1,0]:Integer, [-1,1]:Integer, [-1,2]:Integer, [-1,3]:Integer, [-1,4]:Integer, [-1,5]:Integer, [-1,6]:Integer, [-1,7]:Integer, [-1,8]:Integer, 
[-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer}" "enzymejl_parmtype"="125797567845648" "enzymejl_parmtype_ref"="1" %2) unnamed_addr #88 !dbg !4085 {
top:
  %3 = call noalias nonnull dereferenceable(56) dereferenceable_or_null(56) i8* @malloc(i64 56), !enzyme_fromstack !316
  %4 = bitcast i8* %3 to { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }*, !enzyme_caststack !90
  %.sub = bitcast { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* %4 to i8*
  %5 = call noalias nonnull dereferenceable(24) dereferenceable_or_null(24) i8* @malloc(i64 24), !enzyme_fromstack !968
  %newstruct13 = bitcast i8* %5 to { [1 x [1 x i64]], [2 x i64] }*, !enzyme_caststack !90
  %6 = call noalias nonnull dereferenceable(24) dereferenceable_or_null(24) i8* @malloc(i64 24), !enzyme_fromstack !968
  %newstruct30 = bitcast i8* %6 to { [1 x [1 x i64]], [2 x i64] }*, !enzyme_caststack !90
  %7 = call {}*** @julia.get_pgcstack() #91
  %ptls_field170 = getelementptr inbounds {}**, {}*** %7, i64 2
  %8 = bitcast {}*** %ptls_field170 to i64***
  %ptls_load171172 = load i64**, i64*** %8, align 8, !tbaa !91
  %9 = getelementptr inbounds i64*, i64** %ptls_load171172, i64 2
  %safepoint = load i64*, i64** %9, align 8, !tbaa !95
  fence syncscope("singlethread") seq_cst
  call void @julia.safepoint(i64* %safepoint) #91, !dbg !4086
  fence syncscope("singlethread") seq_cst
  %10 = getelementptr inbounds [2 x [1 x i64]], [2 x [1 x i64]] addrspace(11)* %2, i64 0, i64 1, i64 0, !dbg !4087
  %11 = call i64 @julia_nthreads_2932() #92, !dbg !4089
  %unbox = load i64, i64 addrspace(11)* %10, align 8, !dbg !4090, !tbaa !95, !alias.scope !313, !noalias !314
  %12 = icmp slt i64 %unbox, 1, !dbg !4090
  br i1 %12, label %L616, label %L6, !dbg !4092

L6:                                               ; preds = %top
  %13 = call i64 @llvm.smin.i64(i64 %unbox, i64 %11) #91, !dbg !4094
  %.not = icmp eq i64 %13, 0, !dbg !4095
  br i1 %.not, label %L393, label %L14, !dbg !4096

L14:                                              ; preds = %L6
  %14 = trunc i64 %13 to i32, !dbg !4097
  %15 = add i32 %14, -1, !dbg !4097
  %16 = call nonnull "enzyme_inactive" {}* @julia.pointer_from_objref({} addrspace(11)* noundef addrspacecast ({}* inttoptr (i64 125797279085328 to {}*) to {} addrspace(11)*)) #93, !dbg !4101
  %17 = icmp sgt i32 %15, 0, !dbg !4103
  br i1 %17, label %L24, label %L393, !dbg !4104

L24:                                              ; preds = %L14
  %p.i = bitcast {}* %16 to i64*, !dbg !4106
  %v.i = atomicrmw xchg i64* %p.i, i64 0 acq_rel, align 8, !dbg !4106
  %18 = call i64 @llvm.ctpop.i64(i64 %v.i) #91, !dbg !4109, !range !1713
  %19 = trunc i64 %18 to i32, !dbg !4111
  %20 = sub nsw i32 %15, %19, !dbg !4112
  %21 = icmp slt i32 %20, 0, !dbg !4114
  br i1 %21, label %L37, label %L72, !dbg !4117

L37:                                              ; preds = %L24
  %22 = call i64 @llvm.ctlz.i64(i64 %v.i, i1 noundef false) #91, !dbg !4118, !range !1713
  %23 = trunc i64 %22 to i32, !dbg !4120
  br label %L40, !dbg !4120

L40:                                              ; preds = %L40, %L37
  %iv = phi i64 [ %iv.next, %L40 ], [ 0, %L37 ]
  %value_phi119 = phi i32 [ %23, %L37 ], [ %24, %L40 ]
  %value_phi120 = phi i32 [ %20, %L37 ], [ %33, %L40 ]
  %value_phi121 = phi i64 [ %v.i, %L37 ], [ %29, %L40 ]
  %iv.next = add nuw nsw i64 %iv, 1, !dbg !4121
  %24 = sub i32 %value_phi119, %value_phi120, !dbg !4121
  %25 = sub i32 64, %24, !dbg !4123
  %26 = zext i32 %25 to i64, !dbg !4125
  %27 = icmp ugt i32 %25, 63, !dbg !4125
  %notmask = shl nsw i64 -1, %26, !dbg !4123
  %.op = xor i64 %notmask, -1, !dbg !4123
  %28 = select i1 %27, i64 -1, i64 %.op, !dbg !4123
  %29 = and i64 %28, %value_phi121, !dbg !4126
  %30 = xor i64 %29, %value_phi121, !dbg !4128
  %31 = call i64 @llvm.ctpop.i64(i64 %30) #91, !dbg !4129, !range !1713
  %32 = trunc i64 %31 to i32, !dbg !4131
  %33 = add i32 %value_phi120, %32, !dbg !4132
  %.not185 = icmp eq i32 %33, 0, !dbg !4133
  br i1 %.not185, label %L61, label %L40, !dbg !4134

L61:                                              ; preds = %L40
  %34 = xor i64 %29, -1, !dbg !4135
  %35 = and i64 %v.i, %34, !dbg !4137
  store atomic i64 %35, i64* %p.i release, align 16, !dbg !4138, !noalias !4139
  br label %L72, !dbg !4142

L72:                                              ; preds = %L61, %L24
  %value_phi60 = phi i32 [ %15, %L61 ], [ %19, %L24 ]
  %value_phi61 = phi i64 [ %29, %L61 ], [ %v.i, %L24 ]
  %36 = icmp sgt i32 %value_phi60, 0, !dbg !4143
  br i1 %36, label %L133.lr.ph, label %L393, !dbg !4144

L133.lr.ph:                                       ; preds = %L72
  %37 = zext i32 %value_phi60 to i64, !dbg !4145
  %38 = add nuw nsw i64 %37, 1, !dbg !4162
  %39 = udiv i64 %unbox, %38, !dbg !4164
  %40 = mul i64 %39, %38, !dbg !4165
  %41 = sub i64 %unbox, %40, !dbg !4167
  %42 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !4168
  %43 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %42) #93, !dbg !4168
  %44 = bitcast {}* %43 to i8**, !dbg !4168
  %arrayptr64 = load i8*, i8** %44, align 8, !dbg !4168, !tbaa !95, !alias.scope !313, !noalias !314, !nonnull !90
  %45 = ptrtoint i8* %arrayptr64 to i64, !dbg !4168
  %46 = addrspacecast {} addrspace(10)* %0 to {} addrspace(10)* addrspace(11)*, !dbg !4178
  %arraysize_ptr65 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %46, i64 3, !dbg !4178
  %47 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr65 to i64 addrspace(11)*, !dbg !4178
  %arraysize66 = load i64, i64 addrspace(11)* %47, align 8, !dbg !4178, !tbaa !95, !range !131, !alias.scope !313, !noalias !314
  %arraysize_ptr67 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %46, i64 4, !dbg !4178
  %48 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr67 to i64 addrspace(11)*, !dbg !4178
  %arraysize68 = load i64, i64 addrspace(11)* %48, align 16, !dbg !4178, !tbaa !95, !range !131, !alias.scope !313, !noalias !314
  %getfield_addr73 = getelementptr inbounds { [1 x {} addrspace(10)*] }, { [1 x {} addrspace(10)*] } addrspace(11)* %1, i64 0, i32 0, i64 0, !dbg !4184
  %getfield74 = load atomic {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %getfield_addr73 unordered, align 8, !dbg !4184, !tbaa !95, !alias.scope !313, !noalias !314, !nonnull !90, !dereferenceable !315, !align !316
  %49 = addrspacecast {} addrspace(10)* %getfield74 to {} addrspace(11)*, !dbg !4188
  %50 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %49) #93, !dbg !4188
  %51 = bitcast {}* %50 to i8**, !dbg !4188
  %arrayptr76 = load i8*, i8** %51, align 8, !dbg !4188, !tbaa !95, !alias.scope !313, !noalias !314, !nonnull !90
  %52 = ptrtoint i8* %arrayptr76 to i64, !dbg !4188
  %53 = addrspacecast {} addrspace(10)* %getfield74 to {} addrspace(10)* addrspace(11)*, !dbg !4195
  %arraysize_ptr77 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %53, i64 3, !dbg !4195
  %54 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr77 to i64 addrspace(11)*, !dbg !4195
  %arraysize78 = load i64, i64 addrspace(11)* %54, align 8, !dbg !4195, !tbaa !95, !range !131, !alias.scope !313, !noalias !314
  %arraysize_ptr79 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %53, i64 4, !dbg !4195
  %55 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr79 to i64 addrspace(11)*, !dbg !4195
  %arraysize80 = load i64, i64 addrspace(11)* %55, align 16, !dbg !4195, !tbaa !95, !range !131, !alias.scope !313, !noalias !314
  %56 = insertvalue [1 x {} addrspace(10)*] zeroinitializer, {} addrspace(10)* %getfield74, 0, !dbg !4201
  %57 = load i64, i64 addrspace(11)* %10, align 8, !dbg !4202, !tbaa !95, !alias.scope !313, !noalias !314
  %newstruct87.sroa.0.0..sroa_idx = getelementptr inbounds { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }, { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* %4, i64 0, i32 0, i32 0, !dbg !4203
  store i64 %45, i64* %newstruct87.sroa.0.0..sroa_idx, align 16, !dbg !4203, !tbaa !340, !alias.scope !1043, !noalias !4204
  %newstruct87.sroa.2.0..sroa_idx134 = getelementptr inbounds { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }, { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* %4, i64 0, i32 0, i32 1, i64 0, !dbg !4203
  store i64 %arraysize66, i64* %newstruct87.sroa.2.0..sroa_idx134, align 8, !dbg !4203, !tbaa !340, !alias.scope !1043, !noalias !4204
  %newstruct87.sroa.3.0..sroa_idx135 = getelementptr inbounds { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }, { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* %4, i64 0, i32 0, i32 1, i64 1, !dbg !4203
  store i64 %arraysize68, i64* %newstruct87.sroa.3.0..sroa_idx135, align 16, !dbg !4203, !tbaa !340, !alias.scope !1043, !noalias !4204
  %newstruct87.sroa.4.0..sroa_idx136 = getelementptr inbounds { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }, { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* %4, i64 0, i32 1, i64 0, !dbg !4203
  store i64 %57, i64* %newstruct87.sroa.4.0..sroa_idx136, align 8, !dbg !4203, !tbaa !340, !alias.scope !1043, !noalias !4204
  %newstruct87.sroa.5.0..sroa_idx137 = getelementptr inbounds { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }, { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* %4, i64 0, i32 2, i32 0, i64 0, i32 0, !dbg !4203
  store i64 %52, i64* %newstruct87.sroa.5.0..sroa_idx137, align 16, !dbg !4203, !tbaa !340, !alias.scope !1043, !noalias !4204
  %newstruct87.sroa.6.0..sroa_idx138 = getelementptr inbounds { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }, { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* %4, i64 0, i32 2, i32 0, i64 0, i32 1, i64 0, !dbg !4203
  store i64 %arraysize78, i64* %newstruct87.sroa.6.0..sroa_idx138, align 8, !dbg !4203, !tbaa !340, !alias.scope !1043, !noalias !4204
  %newstruct87.sroa.7.0..sroa_idx139 = getelementptr inbounds { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }, { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* %4, i64 0, i32 2, i32 0, i64 0, i32 1, i64 1, !dbg !4203
  store i64 %arraysize80, i64* %newstruct87.sroa.7.0..sroa_idx139, align 16, !dbg !4203, !tbaa !340, !alias.scope !1043, !noalias !4204
  %58 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %0, [1 x {} addrspace(10)*] %56) #91, !dbg !4175
  %59 = icmp sgt i64 %41, -1
  br label %L133, !dbg !4205

L133:                                             ; preds = %L187, %L133.lr.ph
  %iv1 = phi i64 [ %iv.next2, %L187 ], [ 0, %L133.lr.ph ]
  %value_phi95200 = phi i64 [ %value_phi61, %L133.lr.ph ], [ %72, %L187 ]
  %value_phi93198 = phi i64 [ 0, %L133.lr.ph ], [ %66, %L187 ]
  %value_phi92197 = phi i32 [ 0, %L133.lr.ph ], [ %68, %L187 ]
  %iv.next2 = add nuw nsw i64 %iv1, 1, !dbg !4206
  %60 = icmp ne i64 %value_phi95200, 0, !dbg !4206
  call void @llvm.assume(i1 noundef %60) #91, !dbg !4209
  %61 = call i64 @llvm.cttz.i64(i64 %value_phi95200, i1 noundef true) #91, !dbg !4210, !range !1713
  %62 = trunc i64 %61 to i32, !dbg !4212
  %63 = icmp ugt i64 %41, %iv1, !dbg !4213
  %not.ifelse_cond96 = and i1 %59, %63, !dbg !4217
  %64 = zext i1 %not.ifelse_cond96 to i64, !dbg !4217
  %65 = add i64 %value_phi93198, %39, !dbg !4217
  %66 = add i64 %65, %64, !dbg !4218
  %67 = add nuw nsw i32 %62, 1, !dbg !4219
  %68 = add i32 %67, %value_phi92197, !dbg !4221
  %69 = zext i32 %67 to i64, !dbg !4223
  %70 = lshr i64 %value_phi95200, %69, !dbg !4223
  %71 = icmp eq i32 %62, 63, !dbg !4223
  %72 = select i1 %71, i64 0, i64 %70, !dbg !4223
  %73 = load i64, i64* inttoptr (i64 125797243527104 to i64*), align 64, !dbg !4225, !tbaa !247, !alias.scope !117, !noalias !120
  %74 = shl i32 %68, 9, !dbg !4231
  %75 = zext i32 %74 to i64, !dbg !4232
  %76 = inttoptr i64 %73 to i8*, !dbg !4236
  %77 = getelementptr i8, i8* %76, i64 %75, !dbg !4236
  %78 = getelementptr i8, i8* %77, i64 8, !dbg !4237
  %coercion = bitcast i8* %78 to i64*, !dbg !4243
  store i64 ptrtoint (void (i64)* @jlcapi_BatchClosure_2763 to i64), i64* %coercion, align 1, !dbg !4243, !tbaa !331, !alias.scope !117, !noalias !4247
  %79 = getelementptr i8, i8* %77, i64 16, !dbg !4248
  %80 = bitcast i8* %79 to { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }**, !dbg !4252
  store { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* %4, { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }** %80, align 1, !dbg !4252, !tbaa !331, !alias.scope !117, !noalias !4247
  %81 = getelementptr i8, i8* %77, i64 24, !dbg !4256
  %coercion98 = bitcast i8* %81 to i64*, !dbg !4260
  store i64 %value_phi93198, i64* %coercion98, align 1, !dbg !4260, !tbaa !331, !alias.scope !117, !noalias !4247
  %82 = getelementptr i8, i8* %77, i64 32, !dbg !4264
  %coercion99 = bitcast i8* %82 to i64*, !dbg !4268
  store i64 %66, i64* %coercion99, align 1, !dbg !4268, !tbaa !331, !alias.scope !117, !noalias !4247
  %p.i128 = bitcast i8* %77 to i32*, !dbg !4272
  %v.i129 = atomicrmw xchg i32* %p.i128, i32 0 acq_rel, align 4, !dbg !4272
  %.not178 = icmp eq i32 %v.i129, 1, !dbg !4275
  br i1 %.not178, label %L184, label %L187, !dbg !4276

L184:                                             ; preds = %L133
  call fastcc void @julia_wake_thread__2921(i32 zeroext %68) #91, !dbg !4276
  br label %L187, !dbg !4276

L187:                                             ; preds = %L184, %L133
  %83 = icmp eq i64 %iv.next2, %37, !dbg !4277
  br i1 %83, label %L189, label %L133, !dbg !4205

L189:                                             ; preds = %L187
  %84 = add i64 %66, 1, !dbg !4279
  %.not179 = icmp sgt i64 %84, %unbox, !dbg !4281
  %value_phi101 = select i1 %.not179, i64 %66, i64 %unbox, !dbg !4283
  %.not180 = icmp sgt i64 %84, %value_phi101, !dbg !4287
  %85 = shl i64 %arraysize66, 2, !dbg !4297
  %86 = mul i64 %85, %66, !dbg !4307
  %87 = getelementptr i8, i8* %arrayptr64, i64 %86, !dbg !4309
  %88 = sub i64 %value_phi101, %66, !dbg !4310
  %89 = select i1 %.not180, i64 0, i64 %88, !dbg !4310
  %90 = shl i64 %arraysize78, 2, !dbg !4318
  %91 = mul i64 %90, %66, !dbg !4329
  %92 = getelementptr i8, i8* %arrayptr76, i64 %91, !dbg !4331
  %93 = mul i64 %89, %arraysize66, !dbg !4332
  %94 = call i64 @llvm.smax.i64(i64 %93, i64 noundef 0) #91, !dbg !4341
  %.not181 = icmp slt i64 %93, 1, !dbg !4346
  br i1 %.not181, label %L349, label %L301.preheader, !dbg !4347

L301.preheader:                                   ; preds = %L189
  br label %L301, !dbg !4348

L301:                                             ; preds = %L301.preheader, %L301
  %iv3 = phi i64 [ 0, %L301.preheader ], [ %iv.next4, %L301 ]
  %iv.next4 = add nuw nsw i64 %iv3, 1, !dbg !4349
  %95 = shl i64 %iv3, 2, !dbg !4352
  %96 = getelementptr i8, i8* %92, i64 %95, !dbg !4357
  %coercion111 = bitcast i8* %96 to float*, !dbg !4358
  %pointerref = load float, float* %coercion111, align 1, !dbg !4358, !tbaa !331, !alias.scope !117, !noalias !120
  call void @llvm.lifetime.end.p0i8(i64 noundef 56, i8* noundef nonnull %.sub) #91
  %97 = call fastcc float @julia_gelu_2739(float %pointerref) #91, !dbg !4355
  %98 = getelementptr i8, i8* %87, i64 %95, !dbg !4362
  %coercion112 = bitcast i8* %98 to float*, !dbg !4364
  store float %97, float* %coercion112, align 1, !dbg !4364, !tbaa !331, !alias.scope !117, !noalias !4247
  %exitcond202.not = icmp eq i64 %iv.next4, %94, !dbg !4368
  br i1 %exitcond202.not, label %L349.loopexit, label %L301, !dbg !4348, !llvm.loop !4369

L349.loopexit:                                    ; preds = %L301
  br label %L349, !dbg !4370

L349:                                             ; preds = %L349.loopexit, %L189
  %99 = icmp eq i64 %value_phi61, 0, !dbg !4370
  br i1 %99, label %L387, label %L355.preheader, !dbg !4372

L355.preheader:                                   ; preds = %L349
  br label %L355, !dbg !4373

L355:                                             ; preds = %L355.preheader, %L385
  %iv5 = phi i64 [ 0, %L355.preheader ], [ %iv.next6, %L385 ]
  %value_phi115194 = phi i64 [ %104, %L385 ], [ %value_phi61, %L355.preheader ]
  %value_phi114193 = phi i32 [ %106, %L385 ], [ 0, %L355.preheader ]
  %iv.next6 = add nuw nsw i64 %iv5, 1, !dbg !4376
  %100 = call i64 @llvm.cttz.i64(i64 %value_phi115194, i1 noundef true) #91, !dbg !4376, !range !1713
  %101 = trunc i64 %100 to i32, !dbg !4378
  %102 = add nuw nsw i32 %101, 1, !dbg !4379
  %103 = zext i32 %102 to i64, !dbg !4381
  %104 = lshr i64 %value_phi115194, %103, !dbg !4381
  %105 = icmp eq i32 %101, 63, !dbg !4381
  %106 = add i32 %102, %value_phi114193, !dbg !4383
  %107 = load i64, i64* inttoptr (i64 125797243527104 to i64*), align 64, !dbg !4385, !tbaa !247, !alias.scope !117, !noalias !120
  %108 = shl i32 %106, 9, !dbg !4388
  %109 = zext i32 %108 to i64, !dbg !4389
  %110 = inttoptr i64 %107 to i8*, !dbg !4393
  %111 = getelementptr i8, i8* %110, i64 %109, !dbg !4393
  %p.i130 = bitcast i8* %111 to i32*, !dbg !4394
  %v.i131190 = load atomic i32, i32* %p.i130 acquire, align 16, !dbg !4394
  %.not183191 = icmp eq i32 %v.i131190, 0, !dbg !4396
  br i1 %.not183191, label %L375.preheader, label %L385, !dbg !4373

L375.preheader:                                   ; preds = %L355
  br label %L375, !dbg !4397

L375:                                             ; preds = %L375.preheader, %L382
  %iv7 = phi i64 [ 0, %L375.preheader ], [ %iv.next8, %L382 ]
  %112 = trunc i64 %iv7 to i32
  %iv.next8 = add nuw nsw i64 %iv7, 1
  call void @llvm.lifetime.end.p0i8(i64 noundef 56, i8* noundef nonnull %.sub) #91
  call void asm sideeffect "pause", "~{memory}"() #94, !dbg !4398
  %113 = add i32 %112, 1, !dbg !4400
  %114 = icmp ult i32 %113, 65537, !dbg !4401
  br i1 %114, label %L382, label %L379, !dbg !4397

L379:                                             ; preds = %L375
  %115 = call fastcc i8 @julia_checktask_2772(i32 zeroext %106) #91, !dbg !4403
  %116 = and i8 %115, 1, !dbg !4403
  %.not184 = icmp eq i8 %116, 0, !dbg !4403
  br i1 %.not184, label %L382, label %L385.loopexit, !dbg !4403

L382:                                             ; preds = %L379, %L375
  %v.i131 = load atomic i32, i32* %p.i130 acquire, align 16, !dbg !4394
  %.not183 = icmp eq i32 %v.i131, 0, !dbg !4396
  br i1 %.not183, label %L375, label %L385.loopexit, !dbg !4373

L385.loopexit:                                    ; preds = %L379, %L382
  br label %L385, !dbg !4370

L385:                                             ; preds = %L385.loopexit, %L355
  %117 = icmp eq i64 %104, 0, !dbg !4370
  %118 = select i1 %105, i1 true, i1 %117, !dbg !4370
  br i1 %118, label %L387.loopexit, label %L355, !dbg !4372

L387.loopexit:                                    ; preds = %L385
  br label %L387, !dbg !4404

L387:                                             ; preds = %L387.loopexit, %L349
  %v.i133 = atomicrmw or i64* %p.i, i64 %value_phi61 acq_rel, align 8, !dbg !4404
  br label %L616, !dbg !4407

L393:                                             ; preds = %L72, %L14, %L6
  %119 = call i64 @llvm.smax.i64(i64 %unbox, i64 noundef 0) #91, !dbg !4408
  %.not173.inv = icmp sgt i64 %unbox, 0, !dbg !4411
  %value_phi7 = select i1 %.not173.inv, i64 %119, i64 0, !dbg !4411
  %120 = addrspacecast {} addrspace(10)* %0 to {} addrspace(10)* addrspace(11)*, !dbg !4419
  %arraysize_ptr = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %120, i64 3, !dbg !4419
  %121 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr to i64 addrspace(11)*, !dbg !4419
  %arraysize = load i64, i64 addrspace(11)* %121, align 8, !dbg !4419, !tbaa !95, !range !131, !alias.scope !313, !noalias !314
  %memcpy_refined_dst14 = getelementptr inbounds { [1 x [1 x i64]], [2 x i64] }, { [1 x [1 x i64]], [2 x i64] }* %newstruct13, i64 0, i32 0, i64 0, i64 0, !dbg !4425
  store i64 %arraysize, i64* %memcpy_refined_dst14, align 8, !dbg !4425, !tbaa !397, !alias.scope !399, !noalias !4427
  %newstruct8.sroa.0.0..sroa_idx = getelementptr inbounds { [1 x [1 x i64]], [2 x i64] }, { [1 x [1 x i64]], [2 x i64] }* %newstruct13, i64 0, i32 1, i64 0, !dbg !4425
  store i64 1, i64* %newstruct8.sroa.0.0..sroa_idx, align 8, !dbg !4425, !tbaa !397, !alias.scope !399, !noalias !4427
  %newstruct8.sroa.5.0..sroa_idx146 = getelementptr inbounds { [1 x [1 x i64]], [2 x i64] }, { [1 x [1 x i64]], [2 x i64] }* %newstruct13, i64 0, i32 1, i64 1, !dbg !4425
  store i64 %value_phi7, i64* %newstruct8.sroa.5.0..sroa_idx146, align 8, !dbg !4425, !tbaa !397, !alias.scope !399, !noalias !4427
  %arraysize_ptr15 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %120, i64 4, !dbg !4428
  %122 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr15 to i64 addrspace(11)*, !dbg !4428
  %arraysize16 = load i64, i64 addrspace(11)* %122, align 16, !dbg !4428, !tbaa !95, !range !131, !alias.scope !313, !noalias !314
  %123 = icmp eq i64 %value_phi7, 0, !dbg !4432
  %124 = add nsw i64 %value_phi7, -1, !dbg !4438
  %125 = icmp ult i64 %124, %arraysize16, !dbg !4440
  %126 = or i1 %123, %125, !dbg !4441
  br i1 %126, label %L464, label %L461, !dbg !4431

L461:                                             ; preds = %L393
  %127 = addrspacecast { [1 x [1 x i64]], [2 x i64] }* %newstruct13 to { [1 x [1 x i64]], [2 x i64] } addrspace(11)*, !dbg !4431
  call fastcc void @julia_throw_boundserror_2928({} addrspace(10)* nofree noundef nonnull align 16 dereferenceable(40) %0, { [1 x [1 x i64]], [2 x i64] } addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(24) %127) #95, !dbg !4431
  unreachable, !dbg !4431

L464:                                             ; preds = %L393
  %getfield_addr = getelementptr inbounds { [1 x {} addrspace(10)*] }, { [1 x {} addrspace(10)*] } addrspace(11)* %1, i64 0, i32 0, i64 0, !dbg !4442
  %getfield = load atomic {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %getfield_addr unordered, align 8, !dbg !4442, !tbaa !95, !alias.scope !313, !noalias !314, !nonnull !90, !dereferenceable !315, !align !316
  %128 = addrspacecast {} addrspace(10)* %getfield to {} addrspace(10)* addrspace(11)*, !dbg !4446
  %arraysize_ptr25 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %128, i64 3, !dbg !4446
  %129 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr25 to i64 addrspace(11)*, !dbg !4446
  %arraysize26 = load i64, i64 addrspace(11)* %129, align 8, !dbg !4446, !tbaa !95, !range !131, !alias.scope !313, !noalias !314
  %memcpy_refined_dst32 = getelementptr inbounds { [1 x [1 x i64]], [2 x i64] }, { [1 x [1 x i64]], [2 x i64] }* %newstruct30, i64 0, i32 0, i64 0, i64 0, !dbg !4451
  store i64 %arraysize26, i64* %memcpy_refined_dst32, align 8, !dbg !4451, !tbaa !397, !alias.scope !399, !noalias !4427
  %newstruct8.sroa.0.0..sroa_idx142 = getelementptr inbounds { [1 x [1 x i64]], [2 x i64] }, { [1 x [1 x i64]], [2 x i64] }* %newstruct30, i64 0, i32 1, i64 0, !dbg !4451
  store i64 1, i64* %newstruct8.sroa.0.0..sroa_idx142, align 8, !dbg !4451, !tbaa !397, !alias.scope !399, !noalias !4427
  %newstruct8.sroa.5.0..sroa_idx147 = getelementptr inbounds { [1 x [1 x i64]], [2 x i64] }, { [1 x [1 x i64]], [2 x i64] }* %newstruct30, i64 0, i32 1, i64 1, !dbg !4451
  store i64 %value_phi7, i64* %newstruct8.sroa.5.0..sroa_idx147, align 8, !dbg !4451, !tbaa !397, !alias.scope !399, !noalias !4427
  %arraysize_ptr33 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %128, i64 4, !dbg !4453
  %130 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr33 to i64 addrspace(11)*, !dbg !4453
  %arraysize34 = load i64, i64 addrspace(11)* %130, align 16, !dbg !4453, !tbaa !95, !range !131, !alias.scope !313, !noalias !314
  %131 = icmp ult i64 %124, %arraysize34, !dbg !4457
  %132 = or i1 %123, %131, !dbg !4462
  br i1 %132, label %L503, label %L500, !dbg !4456

L500:                                             ; preds = %L464
  %133 = addrspacecast { [1 x [1 x i64]], [2 x i64] }* %newstruct30 to { [1 x [1 x i64]], [2 x i64] } addrspace(11)*, !dbg !4456
  call fastcc void @julia_throw_boundserror_2928({} addrspace(10)* nofree noundef nonnull align 16 dereferenceable(40) %getfield, { [1 x [1 x i64]], [2 x i64] } addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(24) %133) #95, !dbg !4456
  unreachable, !dbg !4456

L503:                                             ; preds = %L464
  %134 = mul i64 %arraysize, %value_phi7, !dbg !4463
  %135 = call i64 @llvm.smax.i64(i64 %134, i64 noundef 0) #91, !dbg !4472
  %.not174 = icmp slt i64 %134, 1, !dbg !4477
  br i1 %.not174, label %L616, label %L552.lr.ph, !dbg !4478

L552.lr.ph:                                       ; preds = %L503
  %136 = addrspacecast {} addrspace(10)* %getfield to float addrspace(13)* addrspace(11)*
  %137 = addrspacecast {} addrspace(10)* %0 to float addrspace(13)* addrspace(11)*
  br label %L552, !dbg !4479

L552:                                             ; preds = %L552, %L552.lr.ph
  %iv9 = phi i64 [ %iv.next10, %L552 ], [ 0, %L552.lr.ph ]
  %iv.next10 = add nuw nsw i64 %iv9, 1, !dbg !4480
  %arrayptr176 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %136, align 16, !dbg !4483, !tbaa !95, !alias.scope !4487, !noalias !314, !llvm.mem.parallel_loop_access !4488, !nonnull !90
  %138 = getelementptr inbounds float, float addrspace(13)* %arrayptr176, i64 %iv9, !dbg !4483
  %arrayref = load float, float addrspace(13)* %138, align 4, !dbg !4483, !tbaa !177, !alias.scope !117, !noalias !120, !llvm.mem.parallel_loop_access !4488
  %139 = call fastcc float @julia_gelu_2739(float %arrayref) #91, !dbg !4485, !llvm.mem.parallel_loop_access !4488
  %arrayptr54177 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %137, align 16, !dbg !4490, !tbaa !95, !alias.scope !4487, !noalias !314, !llvm.mem.parallel_loop_access !4488, !nonnull !90
  %140 = getelementptr inbounds float, float addrspace(13)* %arrayptr54177, i64 %iv9, !dbg !4490
  store float %139, float addrspace(13)* %140, align 4, !dbg !4490, !tbaa !177, !alias.scope !117, !noalias !4247, !llvm.mem.parallel_loop_access !4488
  %exitcond.not = icmp eq i64 %iv.next10, %135, !dbg !4492
  br i1 %exitcond.not, label %L616.loopexit, label %L552, !dbg !4479, !llvm.loop !4489

L616.loopexit:                                    ; preds = %L552
  br label %L616

L616:                                             ; preds = %L616.loopexit, %L503, %L387, %top
  call void @llvm.lifetime.end.p0i8(i64 noundef 56, i8* noundef nonnull %.sub) #91
  ret void, !dbg !4493
}

; Function Attrs: mustprogress willreturn
define internal fastcc void @diffejulia_fast_materialize_threaded__2755({} addrspace(10)* align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="125797725781712" "enzymejl_parmtype_ref"="2" %0, {} addrspace(10)* align 16 "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="125797725781712" "enzymejl_parmtype_ref"="2" %"'", { [1 x {} addrspace(10)*] } addrspace(11)* nocapture nofree readonly align 8 dereferenceable(8) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,-1]:Float@float, [-1,0,8]:Integer, [-1,0,9]:Integer, [-1,0,10]:Integer, [-1,0,11]:Integer, [-1,0,12]:Integer, [-1,0,13]:Integer, [-1,0,14]:Integer, [-1,0,15]:Integer, [-1,0,16]:Integer, [-1,0,17]:Integer, [-1,0,18]:Integer, [-1,0,19]:Integer, [-1,0,20]:Integer, [-1,0,21]:Integer, [-1,0,22]:Integer, [-1,0,23]:Integer, 
[-1,0,24]:Integer, [-1,0,25]:Integer, [-1,0,26]:Integer, [-1,0,27]:Integer, [-1,0,28]:Integer, [-1,0,29]:Integer, [-1,0,30]:Integer, [-1,0,31]:Integer, [-1,0,32]:Integer, [-1,0,33]:Integer, [-1,0,34]:Integer, [-1,0,35]:Integer, [-1,0,36]:Integer, [-1,0,37]:Integer, [-1,0,38]:Integer, [-1,0,39]:Integer}" "enzymejl_parmtype"="125797729165136" "enzymejl_parmtype_ref"="1" %1, { [1 x {} addrspace(10)*] } addrspace(11)* nocapture nofree align 8 "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,-1]:Float@float, [-1,0,8]:Integer, [-1,0,9]:Integer, [-1,0,10]:Integer, [-1,0,11]:Integer, [-1,0,12]:Integer, [-1,0,13]:Integer, [-1,0,14]:Integer, [-1,0,15]:Integer, [-1,0,16]:Integer, [-1,0,17]:Integer, [-1,0,18]:Integer, [-1,0,19]:Integer, [-1,0,20]:Integer, [-1,0,21]:Integer, [-1,0,22]:Integer, [-1,0,23]:Integer, [-1,0,24]:Integer, [-1,0,25]:Integer, [-1,0,26]:Integer, [-1,0,27]:Integer, [-1,0,28]:Integer, [-1,0,29]:Integer, [-1,0,30]:Integer, [-1,0,31]:Integer, [-1,0,32]:Integer, [-1,0,33]:Integer, [-1,0,34]:Integer, [-1,0,35]:Integer, [-1,0,36]:Integer, [-1,0,37]:Integer, [-1,0,38]:Integer, [-1,0,39]:Integer}" "enzymejl_parmtype"="125797729165136" "enzymejl_parmtype_ref"="1" %"'1", [2 x [1 x i64]] addrspace(11)* nocapture nofree readonly align 8 dereferenceable(16) "enzyme_inactive" "enzyme_type"="{[-1]:Pointer, [-1,0]:Integer, [-1,1]:Integer, [-1,2]:Integer, [-1,3]:Integer, [-1,4]:Integer, [-1,5]:Integer, [-1,6]:Integer, [-1,7]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer}" "enzymejl_parmtype"="125797567845648" "enzymejl_parmtype_ref"="1" %2, { i8*, {} addrspace(10)*, i8*, {} addrspace(10)*, i64, i64, i32*, i64, i64, {} addrspace(10)*, i64*, i1*, i64, float*, i64*, i1*, i1**, i1**, i64, float* } %tapeArg) unnamed_addr #88 !dbg !5064 {
top:
  %_replacementA9 = phi i8* 
  %_replacementA8 = phi { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* 
  %.sub_replacementA = phi i8* 
  %_replacementA7 = phi i8* 
  %newstruct13_replacementA = phi { [1 x [1 x i64]], [2 x i64] }* 
  %_replacementA6 = phi i8* 
  %newstruct30_replacementA = phi { [1 x [1 x i64]], [2 x i64] }* 
  %3 = call {}*** @julia.get_pgcstack() #91
  %ptls_field170_replacementA = phi {}*** 
  %_replacementA5 = phi i64*** 
  %ptls_load171172_replacementA = phi i64** 
  %_replacementA4 = phi i64** 
  %safepoint_replacementA = phi i64* 
  %_replacementA = phi i64 addrspace(11)* , !dbg !5065
  %4 = call i64 @julia_nthreads_2932() #92, !dbg !5067
  %unbox = load i64, i64 addrspace(11)* %_replacementA, align 8, !dbg !5068, !tbaa !95, !alias.scope !5072, !noalias !5075
  %5 = icmp slt i64 %unbox, 1, !dbg !5068
  br i1 %5, label %L616, label %L6, !dbg !5070

L6:                                               ; preds = %top
  %6 = call i64 @llvm.smin.i64(i64 %unbox, i64 %4) #91, !dbg !5077
  %.not = icmp eq i64 %6, 0, !dbg !5078
  br i1 %.not, label %L393, label %L14, !dbg !5079

L14:                                              ; preds = %L6
  %7 = trunc i64 %6 to i32, !dbg !5080
  %8 = add i32 %7, -1, !dbg !5080
  %_replacementA10 = phi {}* , !dbg !5084
  %9 = icmp sgt i32 %8, 0, !dbg !5086
  br i1 %9, label %L24, label %L393, !dbg !5087

L24:                                              ; preds = %L14
  %p.i_replacementA = phi i64* , !dbg !5089
  %v.i_replacementA = phi i64 , !dbg !5089
  %10 = call i64 @llvm.ctpop.i64(i64 %v.i_replacementA) #91, !dbg !5092, !range !1713
  %11 = trunc i64 %10 to i32, !dbg !5094
  %12 = sub nsw i32 %8, %11, !dbg !5095
  %13 = icmp slt i32 %12, 0, !dbg !5097
  br i1 %13, label %L37, label %L72, !dbg !5100

L37:                                              ; preds = %L24
  %_replacementA12 = phi i64 , !dbg !5101
  %_replacementA11 = phi i32 , !dbg !5103
  br label %L40, !dbg !5103

L40:                                              ; preds = %L40, %L37
  %iv = phi i64 [ %iv.next, %L40 ], [ 0, %L37 ]
  %value_phi119_replacementA = phi i32 
  %value_phi120_replacementA = phi i32 
  %value_phi121_replacementA = phi i64 
  %iv.next = add nuw nsw i64 %iv, 1, !dbg !5104
  %_replacementA21 = phi i32 , !dbg !5104
  %_replacementA20 = phi i32 , !dbg !5106
  %_replacementA19 = phi i64 , !dbg !5108
  %_replacementA18 = phi i1 , !dbg !5108
  %notmask_replacementA = phi i64 , !dbg !5106
  %.op_replacementA = phi i64 , !dbg !5106
  %_replacementA17 = phi i64 , !dbg !5106
  %_replacementA16 = phi i64 , !dbg !5109
  %_replacementA15 = phi i64 , !dbg !5111
  %_replacementA14 = phi i64 , !dbg !5112
  %_replacementA13 = phi i32 , !dbg !5114
  %14 = add i32 %value_phi120_replacementA, %_replacementA13, !dbg !5115
  %.not185 = icmp eq i32 %14, 0, !dbg !5116
  br i1 %.not185, label %L61, label %L40, !dbg !5117

L61:                                              ; preds = %L40
  %_replacementA23 = phi i64 , !dbg !5118
  %_replacementA22 = phi i64 , !dbg !5120
  br label %L72, !dbg !5121

L72:                                              ; preds = %L61, %L24
  %value_phi60 = phi i32 [ %8, %L61 ], [ %11, %L24 ]
  %value_phi61 = phi i64 [ %_replacementA16, %L61 ], [ %v.i_replacementA, %L24 ]
  %15 = icmp sgt i32 %value_phi60, 0, !dbg !5122
  br i1 %15, label %L133.lr.ph, label %L393, !dbg !5123

L133.lr.ph:                                       ; preds = %L72
  %16 = zext i32 %value_phi60 to i64, !dbg !5124
  %17 = add nuw nsw i64 %16, 1, !dbg !5141
  %18 = udiv i64 %unbox, %17, !dbg !5143
  %19 = mul i64 %18, %17, !dbg !5144
  %20 = sub i64 %unbox, %19, !dbg !5146
  %21 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !5147
  %22 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %21) #93, !dbg !5147
  %"'ip_phi" = phi {}* , !dbg !5147
  %23 = bitcast {}* %22 to i8**, !dbg !5147
  %arrayptr64 = load i8*, i8** %23, align 8, !dbg !5147, !tbaa !95, !alias.scope !313, !noalias !314, !nonnull !90
  %"arrayptr64'il_phi" = phi i8* , !dbg !5147
  %24 = ptrtoint i8* %arrayptr64 to i64, !dbg !5147
  %25 = addrspacecast {} addrspace(10)* %0 to {} addrspace(10)* addrspace(11)*, !dbg !5157
  %arraysize_ptr65 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %25, i64 3, !dbg !5157
  %26 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr65 to i64 addrspace(11)*, !dbg !5157
  %arraysize66 = load i64, i64 addrspace(11)* %26, align 8, !dbg !5157, !tbaa !95, !range !131, !alias.scope !313, !noalias !314
  %arraysize_ptr67 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %25, i64 4, !dbg !5157
  %27 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr67 to i64 addrspace(11)*, !dbg !5157
  %arraysize68 = load i64, i64 addrspace(11)* %27, align 16, !dbg !5157, !tbaa !95, !range !131, !alias.scope !313, !noalias !314
  %getfield_addr73 = getelementptr inbounds { [1 x {} addrspace(10)*] }, { [1 x {} addrspace(10)*] } addrspace(11)* %1, i64 0, i32 0, i64 0, !dbg !5163
  %getfield74 = load atomic {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %getfield_addr73 unordered, align 8, !dbg !5163, !tbaa !95, !alias.scope !313, !noalias !314, !nonnull !90, !dereferenceable !315, !align !316
  %"getfield74'il_phi" = phi {} addrspace(10)* , !dbg !5167
  %28 = addrspacecast {} addrspace(10)* %getfield74 to {} addrspace(11)*, !dbg !5167
  %29 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %28) #93, !dbg !5167
  %"'ip_phi2" = phi {}* , !dbg !5167
  %30 = bitcast {}* %29 to i8**, !dbg !5167
  %arrayptr76 = load i8*, i8** %30, align 8, !dbg !5167, !tbaa !95, !alias.scope !313, !noalias !314, !nonnull !90
  %"arrayptr76'il_phi" = phi i8* , !dbg !5167
  %31 = ptrtoint i8* %arrayptr76 to i64, !dbg !5167
  %32 = addrspacecast {} addrspace(10)* %getfield74 to {} addrspace(10)* addrspace(11)*, !dbg !5174
  %arraysize_ptr77 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %32, i64 3, !dbg !5174
  %33 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr77 to i64 addrspace(11)*, !dbg !5174
  %arraysize78 = load i64, i64 addrspace(11)* %33, align 8, !dbg !5174, !tbaa !95, !range !131, !alias.scope !313, !noalias !314
  %arraysize_ptr79 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %32, i64 4, !dbg !5174
  %34 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr79 to i64 addrspace(11)*, !dbg !5174
  %arraysize80 = load i64, i64 addrspace(11)* %34, align 16, !dbg !5174, !tbaa !95, !range !131, !alias.scope !313, !noalias !314
  %35 = insertvalue [1 x {} addrspace(10)*] zeroinitializer, {} addrspace(10)* %getfield74, 0, !dbg !5180
  %36 = load i64, i64 addrspace(11)* %_replacementA, align 8, !dbg !5181, !tbaa !95, !alias.scope !313, !noalias !314
  %newstruct87.sroa.0.0..sroa_idx = getelementptr inbounds { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }, { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* %_replacementA8, i64 0, i32 0, i32 0, !dbg !5182
  store i64 %24, i64* %newstruct87.sroa.0.0..sroa_idx, align 16, !dbg !5182, !tbaa !340, !alias.scope !1043, !noalias !5183
  %newstruct87.sroa.2.0..sroa_idx134 = getelementptr inbounds { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }, { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* %_replacementA8, i64 0, i32 0, i32 1, i64 0, !dbg !5182
  store i64 %arraysize66, i64* %newstruct87.sroa.2.0..sroa_idx134, align 8, !dbg !5182, !tbaa !340, !alias.scope !1043, !noalias !5183
  %newstruct87.sroa.3.0..sroa_idx135 = getelementptr inbounds { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }, { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* %_replacementA8, i64 0, i32 0, i32 1, i64 1, !dbg !5182
  store i64 %arraysize68, i64* %newstruct87.sroa.3.0..sroa_idx135, align 16, !dbg !5182, !tbaa !340, !alias.scope !1043, !noalias !5183
  %newstruct87.sroa.4.0..sroa_idx136 = getelementptr inbounds { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }, { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* %_replacementA8, i64 0, i32 1, i64 0, !dbg !5182
  store i64 %36, i64* %newstruct87.sroa.4.0..sroa_idx136, align 8, !dbg !5182, !tbaa !340, !alias.scope !1043, !noalias !5183
  %newstruct87.sroa.5.0..sroa_idx137 = getelementptr inbounds { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }, { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* %_replacementA8, i64 0, i32 2, i32 0, i64 0, i32 0, !dbg !5182
  store i64 %31, i64* %newstruct87.sroa.5.0..sroa_idx137, align 16, !dbg !5182, !tbaa !340, !alias.scope !1043, !noalias !5183
  %newstruct87.sroa.6.0..sroa_idx138 = getelementptr inbounds { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }, { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* %_replacementA8, i64 0, i32 2, i32 0, i64 0, i32 1, i64 0, !dbg !5182
  store i64 %arraysize78, i64* %newstruct87.sroa.6.0..sroa_idx138, align 8, !dbg !5182, !tbaa !340, !alias.scope !1043, !noalias !5183
  %newstruct87.sroa.7.0..sroa_idx139 = getelementptr inbounds { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }, { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* %_replacementA8, i64 0, i32 2, i32 0, i64 0, i32 1, i64 1, !dbg !5182
  store i64 %arraysize80, i64* %newstruct87.sroa.7.0..sroa_idx139, align 16, !dbg !5182, !tbaa !340, !alias.scope !1043, !noalias !5183
  %37 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %0, [1 x {} addrspace(10)*] %35) #91, !dbg !5154
  %"'ip" = call token (...) @llvm.julia.gc_preserve_begin(), !dbg !5154
  %38 = icmp sgt i64 %20, -1
  %39 = add nsw i64 %16, -1, !dbg !5186
  br label %L133, !dbg !5186

L133:                                             ; preds = %L187, %L133.lr.ph
  %iv1 = phi i64 [ %iv.next2, %L187 ], [ 0, %L133.lr.ph ]
  %value_phi95200 = phi i64 [ %value_phi61, %L133.lr.ph ], [ %52, %L187 ]
  %value_phi93198 = phi i64 [ 0, %L133.lr.ph ], [ %46, %L187 ]
  %value_phi92197 = phi i32 [ 0, %L133.lr.ph ], [ %48, %L187 ]
  %iv.next2 = add nuw nsw i64 %iv1, 1, !dbg !5187
  %40 = icmp ne i64 %value_phi95200, 0, !dbg !5187
  call void @llvm.assume(i1 noundef %40) #91, !dbg !5190
  %41 = call i64 @llvm.cttz.i64(i64 %value_phi95200, i1 noundef true) #91, !dbg !5191, !range !1713
  %42 = trunc i64 %41 to i32, !dbg !5193
  %43 = icmp ugt i64 %20, %iv1, !dbg !5194
  %not.ifelse_cond96 = and i1 %38, %43, !dbg !5198
  %44 = zext i1 %not.ifelse_cond96 to i64, !dbg !5198
  %45 = add i64 %value_phi93198, %18, !dbg !5198
  %46 = add i64 %45, %44, !dbg !5199
  %47 = add nuw nsw i32 %42, 1, !dbg !5200
  %48 = add i32 %47, %value_phi92197, !dbg !5202
  %49 = zext i32 %47 to i64, !dbg !5204
  %50 = lshr i64 %value_phi95200, %49, !dbg !5204
  %51 = icmp eq i32 %42, 63, !dbg !5204
  %52 = select i1 %51, i64 0, i64 %50, !dbg !5204
  %53 = load i64, i64* inttoptr (i64 125797243527104 to i64*), align 64, !dbg !5206, !tbaa !247, !alias.scope !117, !noalias !120
  %"'il_phi" = phi i64 , !dbg !5212
  %54 = shl i32 %48, 9, !dbg !5212
  %55 = zext i32 %54 to i64, !dbg !5213
  %56 = inttoptr i64 %53 to i8*, !dbg !5217
  %57 = getelementptr i8, i8* %56, i64 %55, !dbg !5217
  %58 = getelementptr i8, i8* %57, i64 8, !dbg !5218
  %coercion = bitcast i8* %58 to i64*, !dbg !5224
  store i64 ptrtoint (void (i64)* @jlcapi_BatchClosure_2763 to i64), i64* %coercion, align 1, !dbg !5224, !tbaa !331, !alias.scope !117, !noalias !5228
  %59 = getelementptr i8, i8* %57, i64 16, !dbg !5229
  %60 = bitcast i8* %59 to { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }**, !dbg !5233
  store { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }* %_replacementA8, { { i64, [2 x i64] }, [1 x i64], { [1 x { i64, [2 x i64] }] } }** %60, align 1, !dbg !5233, !tbaa !331, !alias.scope !117, !noalias !5228
  %61 = getelementptr i8, i8* %57, i64 24, !dbg !5237
  %coercion98 = bitcast i8* %61 to i64*, !dbg !5241
  store i64 %value_phi93198, i64* %coercion98, align 1, !dbg !5241, !tbaa !331, !alias.scope !117, !noalias !5228
  %62 = getelementptr i8, i8* %57, i64 32, !dbg !5245
  %coercion99 = bitcast i8* %62 to i64*, !dbg !5249
  store i64 %46, i64* %coercion99, align 1, !dbg !5249, !tbaa !331, !alias.scope !117, !noalias !5228
  %p.i128 = bitcast i8* %57 to i32*, !dbg !5253
  %v.i129 = atomicrmw xchg i32* %p.i128, i32 0 acq_rel, align 4, !dbg !5253
  %.not178 = icmp eq i32 %v.i129, 1, !dbg !5256
  br i1 %.not178, label %L184, label %L187, !dbg !5257

L184:                                             ; preds = %L133
  call fastcc void @julia_wake_thread__2921(i32 zeroext %48) #91, !dbg !5257
  br label %L187, !dbg !5257

L187:                                             ; preds = %L184, %L133
  %63 = icmp eq i64 %iv.next2, %16, !dbg !5258
  br i1 %63, label %L189, label %L133, !dbg !5186

L189:                                             ; preds = %L187
  %64 = add i64 %46, 1, !dbg !5260
  %.not179 = icmp sgt i64 %64, %unbox, !dbg !5262
  %value_phi101 = select i1 %.not179, i64 %46, i64 %unbox, !dbg !5264
  %.not180 = icmp sgt i64 %64, %value_phi101, !dbg !5268
  %65 = shl i64 %arraysize66, 2, !dbg !5278
  %66 = mul i64 %65, %46, !dbg !5288
  %67 = getelementptr i8, i8* %arrayptr64, i64 %66, !dbg !5290
  %68 = sub i64 %value_phi101, %46, !dbg !5291
  %69 = select i1 %.not180, i64 0, i64 %68, !dbg !5291
  %70 = shl i64 %arraysize78, 2, !dbg !5299
  %71 = mul i64 %70, %46, !dbg !5310
  %72 = getelementptr i8, i8* %arrayptr76, i64 %71, !dbg !5312
  %73 = mul i64 %69, %arraysize66, !dbg !5313
  %74 = call i64 @llvm.smax.i64(i64 %73, i64 noundef 0) #91, !dbg !5322
  %.not181 = icmp slt i64 %73, 1, !dbg !5327
  br i1 %.not181, label %L349, label %L301.preheader, !dbg !5328

L301.preheader:                                   ; preds = %L189
  %75 = add nsw i64 %74, -1, !dbg !5329
  br label %L301, !dbg !5329

L301:                                             ; preds = %L301, %L301.preheader
  %iv3 = phi i64 [ 0, %L301.preheader ], [ %iv.next4, %L301 ]
  %iv.next4 = add nuw nsw i64 %iv3, 1, !dbg !5330
  %76 = shl i64 %iv3, 2, !dbg !5333
  %77 = getelementptr i8, i8* %72, i64 %76, !dbg !5338
  %coercion111 = bitcast i8* %77 to float*, !dbg !5339
  %pointerref = load float, float* %coercion111, align 1, !dbg !5339, !tbaa !331, !alias.scope !117, !noalias !120
  call void @llvm.lifetime.end.p0i8(i64 noundef 56, i8* noundef nonnull %.sub_replacementA) #91
  %78 = call fastcc float @julia_gelu_2739(float %pointerref) #91, !dbg !5336
  %79 = getelementptr i8, i8* %67, i64 %76, !dbg !5343
  %coercion112 = bitcast i8* %79 to float*, !dbg !5345
  store float %78, float* %coercion112, align 1, !dbg !5345, !tbaa !331, !alias.scope !117, !noalias !5228
  %exitcond202.not = icmp eq i64 %iv.next4, %74, !dbg !5349
  br i1 %exitcond202.not, label %L349.loopexit, label %L301, !dbg !5329, !llvm.loop !5350

L349.loopexit:                                    ; preds = %L301
  br label %L349, !dbg !5351

L349:                                             ; preds = %L349.loopexit, %L189
  %80 = icmp eq i64 %value_phi61, 0, !dbg !5351
  br i1 %80, label %L387, label %L355.preheader, !dbg !5353

L355.preheader:                                   ; preds = %L349
  br label %L355, !dbg !5354

L355:                                             ; preds = %L385, %L355.preheader
  %iv5 = phi i64 [ 0, %L355.preheader ], [ %iv.next6, %L385 ]
  %value_phi115194 = phi i64 [ %85, %L385 ], [ %value_phi61, %L355.preheader ]
  %value_phi114193 = phi i32 [ %87, %L385 ], [ 0, %L355.preheader ]
  %iv.next6 = add nuw nsw i64 %iv5, 1, !dbg !5357
  %81 = call i64 @llvm.cttz.i64(i64 %value_phi115194, i1 noundef true) #91, !dbg !5357, !range !1713
  %82 = trunc i64 %81 to i32, !dbg !5359
  %83 = add nuw nsw i32 %82, 1, !dbg !5360
  %84 = zext i32 %83 to i64, !dbg !5362
  %85 = lshr i64 %value_phi115194, %84, !dbg !5362
  %86 = icmp eq i32 %82, 63, !dbg !5362
  %87 = add i32 %83, %value_phi114193, !dbg !5364
  %88 = load i64, i64* inttoptr (i64 125797243527104 to i64*), align 64, !dbg !5366, !tbaa !247, !alias.scope !117, !noalias !120
  %"'il_phi3" = phi i64 , !dbg !5369
  %89 = shl i32 %87, 9, !dbg !5369
  %90 = zext i32 %89 to i64, !dbg !5370
  %91 = inttoptr i64 %88 to i8*, !dbg !5374
  %92 = getelementptr i8, i8* %91, i64 %90, !dbg !5374
  %p.i130 = bitcast i8* %92 to i32*, !dbg !5375
  %v.i131190 = load atomic i32, i32* %p.i130 acquire, align 16, !dbg !5375
  %"v.i131190'il_phi" = phi i32 , !dbg !5377
  %.not183191 = icmp eq i32 %v.i131190, 0, !dbg !5377
  br i1 %.not183191, label %L375.preheader, label %L385, !dbg !5354

L375.preheader:                                   ; preds = %L355
  br label %L375, !dbg !5378

L375:                                             ; preds = %L382, %L375.preheader
  %iv7 = phi i64 [ 0, %L375.preheader ], [ %iv.next8, %L382 ]
  %iv.next8 = add nuw nsw i64 %iv7, 1
  %93 = trunc i64 %iv7 to i32
  call void @llvm.lifetime.end.p0i8(i64 noundef 56, i8* noundef nonnull %.sub_replacementA) #91
  call void asm sideeffect "pause", "~{memory}"() #94, !dbg !5379
  %94 = add i32 %93, 1, !dbg !5381
  %95 = icmp ult i32 %94, 65537, !dbg !5382
  br i1 %95, label %L382, label %L379, !dbg !5378

L379:                                             ; preds = %L375
  %96 = call fastcc i8 @julia_checktask_2772(i32 zeroext %87) #91, !dbg !5384
  %97 = and i8 %96, 1, !dbg !5384
  %.not184 = icmp eq i8 %97, 0, !dbg !5384
  br i1 %.not184, label %L382, label %L385.loopexit, !dbg !5384

L382:                                             ; preds = %L379, %L375
  %v.i131 = load atomic i32, i32* %p.i130 acquire, align 16, !dbg !5375
  %"v.i131'il_phi" = phi i32 , !dbg !5377
  %.not183 = icmp eq i32 %v.i131, 0, !dbg !5377
  br i1 %.not183, label %L375, label %L385.loopexit, !dbg !5354

L385.loopexit:                                    ; preds = %L382, %L379
  br label %L385, !dbg !5351

L385:                                             ; preds = %L385.loopexit, %L355
  %98 = icmp eq i64 %85, 0, !dbg !5351
  %99 = select i1 %86, i1 true, i1 %98, !dbg !5351
  br i1 %99, label %L387.loopexit, label %L355, !dbg !5353

L387.loopexit:                                    ; preds = %L385
  br label %L387, !dbg !5385

L387:                                             ; preds = %L387.loopexit, %L349
  %v.i133 = atomicrmw or i64* %p.i_replacementA, i64 %value_phi61 acq_rel, align 8, !dbg !5385
  br label %L616, !dbg !5388

L393:                                             ; preds = %L72, %L14, %L6
  %100 = call i64 @llvm.smax.i64(i64 %unbox, i64 noundef 0) #91, !dbg !5389
  %.not173.inv = icmp sgt i64 %unbox, 0, !dbg !5392
  %value_phi7 = select i1 %.not173.inv, i64 %100, i64 0, !dbg !5392
  %101 = addrspacecast {} addrspace(10)* %0 to {} addrspace(10)* addrspace(11)*, !dbg !5400
  %arraysize_ptr = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %101, i64 3, !dbg !5400
  %102 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr to i64 addrspace(11)*, !dbg !5400
  %arraysize = load i64, i64 addrspace(11)* %102, align 8, !dbg !5400, !tbaa !95, !range !131, !alias.scope !313, !noalias !314
  %memcpy_refined_dst14 = getelementptr inbounds { [1 x [1 x i64]], [2 x i64] }, { [1 x [1 x i64]], [2 x i64] }* %newstruct13_replacementA, i64 0, i32 0, i64 0, i64 0, !dbg !5406
  store i64 %arraysize, i64* %memcpy_refined_dst14, align 8, !dbg !5406, !tbaa !397, !alias.scope !399, !noalias !5408
  %newstruct8.sroa.0.0..sroa_idx = getelementptr inbounds { [1 x [1 x i64]], [2 x i64] }, { [1 x [1 x i64]], [2 x i64] }* %newstruct13_replacementA, i64 0, i32 1, i64 0, !dbg !5406
  store i64 1, i64* %newstruct8.sroa.0.0..sroa_idx, align 8, !dbg !5406, !tbaa !397, !alias.scope !399, !noalias !5408
  %newstruct8.sroa.5.0..sroa_idx146 = getelementptr inbounds { [1 x [1 x i64]], [2 x i64] }, { [1 x [1 x i64]], [2 x i64] }* %newstruct13_replacementA, i64 0, i32 1, i64 1, !dbg !5406
  store i64 %value_phi7, i64* %newstruct8.sroa.5.0..sroa_idx146, align 8, !dbg !5406, !tbaa !397, !alias.scope !399, !noalias !5408
  %arraysize_ptr15 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %101, i64 4, !dbg !5409
  %103 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr15 to i64 addrspace(11)*, !dbg !5409
  %arraysize16 = load i64, i64 addrspace(11)* %103, align 16, !dbg !5409, !tbaa !95, !range !131, !alias.scope !313, !noalias !314
  %104 = icmp eq i64 %value_phi7, 0, !dbg !5413
  %105 = add nsw i64 %value_phi7, -1, !dbg !5419
  %106 = icmp ult i64 %105, %arraysize16, !dbg !5421
  %107 = or i1 %104, %106, !dbg !5422
  br i1 %107, label %L464, label %L461, !dbg !5412

L461:                                             ; preds = %L393
  %108 = addrspacecast { [1 x [1 x i64]], [2 x i64] }* %newstruct13_replacementA to { [1 x [1 x i64]], [2 x i64] } addrspace(11)*, !dbg !5412
  call fastcc void @julia_throw_boundserror_2928({} addrspace(10)* nofree noundef nonnull align 16 dereferenceable(40) %0, { [1 x [1 x i64]], [2 x i64] } addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(24) %108) #95, !dbg !5412
  unreachable, !dbg !5412

L464:                                             ; preds = %L393
  %getfield_addr = getelementptr inbounds { [1 x {} addrspace(10)*] }, { [1 x {} addrspace(10)*] } addrspace(11)* %1, i64 0, i32 0, i64 0, !dbg !5423
  %getfield = load atomic {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %getfield_addr unordered, align 8, !dbg !5423, !tbaa !95, !alias.scope !313, !noalias !314, !nonnull !90, !dereferenceable !315, !align !316
  %"getfield'il_phi" = phi {} addrspace(10)* , !dbg !5427
  %109 = addrspacecast {} addrspace(10)* %getfield to {} addrspace(10)* addrspace(11)*, !dbg !5427
  %arraysize_ptr25 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %109, i64 3, !dbg !5427
  %110 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr25 to i64 addrspace(11)*, !dbg !5427
  %arraysize26 = load i64, i64 addrspace(11)* %110, align 8, !dbg !5427, !tbaa !95, !range !131, !alias.scope !313, !noalias !314
  %memcpy_refined_dst32 = getelementptr inbounds { [1 x [1 x i64]], [2 x i64] }, { [1 x [1 x i64]], [2 x i64] }* %newstruct30_replacementA, i64 0, i32 0, i64 0, i64 0, !dbg !5432
  store i64 %arraysize26, i64* %memcpy_refined_dst32, align 8, !dbg !5432, !tbaa !397, !alias.scope !399, !noalias !5408
  %newstruct8.sroa.0.0..sroa_idx142 = getelementptr inbounds { [1 x [1 x i64]], [2 x i64] }, { [1 x [1 x i64]], [2 x i64] }* %newstruct30_replacementA, i64 0, i32 1, i64 0, !dbg !5432
  store i64 1, i64* %newstruct8.sroa.0.0..sroa_idx142, align 8, !dbg !5432, !tbaa !397, !alias.scope !399, !noalias !5408
  %newstruct8.sroa.5.0..sroa_idx147 = getelementptr inbounds { [1 x [1 x i64]], [2 x i64] }, { [1 x [1 x i64]], [2 x i64] }* %newstruct30_replacementA, i64 0, i32 1, i64 1, !dbg !5432
  store i64 %value_phi7, i64* %newstruct8.sroa.5.0..sroa_idx147, align 8, !dbg !5432, !tbaa !397, !alias.scope !399, !noalias !5408
  %arraysize_ptr33 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %109, i64 4, !dbg !5434
  %111 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr33 to i64 addrspace(11)*, !dbg !5434
  %arraysize34 = load i64, i64 addrspace(11)* %111, align 16, !dbg !5434, !tbaa !95, !range !131, !alias.scope !313, !noalias !314
  %112 = icmp ult i64 %105, %arraysize34, !dbg !5438
  %113 = or i1 %104, %112, !dbg !5443
  br i1 %113, label %L503, label %L500, !dbg !5437

L500:                                             ; preds = %L464
  %114 = addrspacecast { [1 x [1 x i64]], [2 x i64] }* %newstruct30_replacementA to { [1 x [1 x i64]], [2 x i64] } addrspace(11)*, !dbg !5437
  call fastcc void @julia_throw_boundserror_2928({} addrspace(10)* nofree noundef nonnull align 16 dereferenceable(40) %getfield, { [1 x [1 x i64]], [2 x i64] } addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(24) %114) #95, !dbg !5437
  unreachable, !dbg !5437

L503:                                             ; preds = %L464
  %115 = mul i64 %arraysize, %value_phi7, !dbg !5444
  %116 = call i64 @llvm.smax.i64(i64 %115, i64 noundef 0) #91, !dbg !5453
  %.not174 = icmp slt i64 %115, 1, !dbg !5458
  br i1 %.not174, label %L616, label %L552.lr.ph, !dbg !5459

L552.lr.ph:                                       ; preds = %L503
  %117 = addrspacecast {} addrspace(10)* %getfield to float addrspace(13)* addrspace(11)*
  %118 = addrspacecast {} addrspace(10)* %0 to float addrspace(13)* addrspace(11)*
  %119 = add nsw i64 %116, -1, !dbg !5460
  br label %L552, !dbg !5460

L552:                                             ; preds = %L552, %L552.lr.ph
  %iv9 = phi i64 [ %iv.next10, %L552 ], [ 0, %L552.lr.ph ]
  %iv.next10 = add nuw nsw i64 %iv9, 1, !dbg !5461
  %arrayptr176 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %117, align 16, !dbg !5464, !tbaa !95, !alias.scope !5468, !noalias !314, !llvm.mem.parallel_loop_access !5469, !nonnull !90
  %"arrayptr176'il_phi" = phi float addrspace(13)* , !dbg !5464
  %120 = getelementptr inbounds float, float addrspace(13)* %arrayptr176, i64 %iv9, !dbg !5464
  %arrayref = load float, float addrspace(13)* %120, align 4, !dbg !5464, !tbaa !177, !alias.scope !117, !noalias !120, !llvm.mem.parallel_loop_access !5469
  %121 = call fastcc float @julia_gelu_2739(float %arrayref) #91, !dbg !5466, !llvm.mem.parallel_loop_access !5469
  %arrayptr54177 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %118, align 16, !dbg !5471, !tbaa !95, !alias.scope !5468, !noalias !314, !llvm.mem.parallel_loop_access !5469, !nonnull !90
  %"arrayptr54177'il_phi" = phi float addrspace(13)* , !dbg !5471
  %122 = getelementptr inbounds float, float addrspace(13)* %arrayptr54177, i64 %iv9, !dbg !5471
  store float %121, float addrspace(13)* %122, align 4, !dbg !5471, !tbaa !177, !alias.scope !117, !noalias !5228, !llvm.mem.parallel_loop_access !5469
  %exitcond.not = icmp eq i64 %iv.next10, %116, !dbg !5473
  br i1 %exitcond.not, label %L616.loopexit, label %L552, !dbg !5460, !llvm.loop !5470

L616.loopexit:                                    ; preds = %L552
  br label %L616

L616:                                             ; preds = %L616.loopexit, %L503, %L387, %top
  call void @llvm.lifetime.end.p0i8(i64 noundef 56, i8* noundef nonnull %.sub_replacementA) #91
  br label %invertL616, !dbg !5474

allocsForInversion:                               ; No predecessors!
  %"iv'ac" = alloca i64, align 8
  %"iv1'ac" = alloca i64, align 8
  %"iv3'ac" = alloca i64, align 8
  %"iv5'ac" = alloca i64, align 8
  %"iv7'ac" = alloca i64, align 8
  %"iv9'ac" = alloca i64, align 8

inverttop:                                        ; preds = %invertL6
  fence syncscope("singlethread") seq_cst
  fence syncscope("singlethread") seq_cst
  ret void

invertL6:                                         ; preds = %invertL14
  br label %inverttop

invertL14:                                        ; preds = %invertL24
  br label %invertL6

invertL24:                                        ; preds = %invertL37
  br label %invertL14

invertL37:                                        ; preds = %invertL40
  br label %invertL24

invertL40:                                        ; preds = %mergeinvertL40_L61, %incinvertL40
  %123 = load i64, i64* %"iv'ac", align 8
  %124 = icmp eq i64 %123, 0
  %125 = xor i1 %124, true
  br i1 %124, label %invertL37, label %incinvertL40

incinvertL40:                                     ; preds = %invertL40
  %126 = load i64, i64* %"iv'ac", align 8
  %127 = add nsw i64 %126, -1
  store i64 %127, i64* %"iv'ac", align 8
  br label %invertL40

invertL61:                                        ; No predecessors!
  br label %mergeinvertL40_L61

mergeinvertL40_L61:                               ; preds = %invertL61
  store i64 0, i64* %"iv'ac", align 8
  br label %invertL40

invertL72:                                        ; No predecessors!
  %128 = call i64 @llvm.smin.i64(i64 %unbox, i64 %4) #91, !dbg !5077
  %_unwrap = trunc i64 %128 to i32
  %_unwrap24 = add i32 %_unwrap, -1

invertL133.lr.ph:                                 ; No predecessors!

invertL133:                                       ; No predecessors!

invertL184:                                       ; No predecessors!

invertL187:                                       ; No predecessors!

invertL189:                                       ; No predecessors!

invertL301.preheader:                             ; No predecessors!

invertL301:                                       ; No predecessors!

invertL349.loopexit:                              ; No predecessors!

invertL349:                                       ; No predecessors!

invertL355.preheader:                             ; No predecessors!

invertL355:                                       ; No predecessors!

invertL375.preheader:                             ; No predecessors!

invertL375:                                       ; No predecessors!

invertL379:                                       ; No predecessors!

invertL382:                                       ; No predecessors!

invertL385.loopexit:                              ; No predecessors!

invertL385:                                       ; No predecessors!

invertL387.loopexit:                              ; No predecessors!

invertL387:                                       ; No predecessors!

invertL393:                                       ; No predecessors!

invertL461:                                       ; No predecessors!

invertL464:                                       ; No predecessors!

invertL500:                                       ; No predecessors!

invertL503:                                       ; No predecessors!

invertL552.lr.ph:                                 ; No predecessors!

invertL552:                                       ; No predecessors!

invertL616.loopexit:                              ; No predecessors!

invertL616:                                       ; preds = %L616
}

  %v.i_replacementA = phi i64 , !dbg !146
julia: /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:3791: bool GradientUtils::legalRecompute(const llvm::Value*, const ValueToValueMapTy&, llvm::IRBuilder<>*, bool, bool) const: Assertion `phi->getNumIncomingValues() != 0' failed.

[840325] signal (6.-6): Aborted
in expression starting at REPL[10]:1
unknown function (ip: 0x72699a25b32c)
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x72699a1f23db)
__assert_fail at /usr/lib/libc.so.6 (unknown line)
legalRecompute at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:3791
lookupM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:6535
unwrapM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:1327
lookupM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:6537
unwrapM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:930
lookupM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:6537
unwrapM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:1066
lookupM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:6537
unwrapM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:1088
lookupM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:6537
branchToCorrespondingTarget at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:7738
createInvertedTerminator at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:3611
CreatePrimalAndGradient at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:4382
recursivelyHandleSubfunction at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:5744
visitCallInst at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:6611
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:111 [inlined]
CreatePrimalAndGradient at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:4378
EnzymeCreatePrimalAndGradient at /workspace/srcdir/Enzyme/enzyme/Enzyme/CApi.cpp:615
EnzymeCreatePrimalAndGradient at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/api.jl:154
unknown function (ip: 0x72696410805b)
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
enzyme! at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/compiler.jl:3147
unknown function (ip: 0x726964103918)
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
#codegen#487 at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/compiler.jl:5022
codegen at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/compiler.jl:4444 [inlined]
_thunk at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/compiler.jl:5707
_thunk at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/compiler.jl:5707 [inlined]
cached_compilation at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/compiler.jl:5741 [inlined]
#532 at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/compiler.jl:5807
#JuliaContext#149 at /home/avikpal/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:52
unknown function (ip: 0x726964d58b36)
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
JuliaContext at /home/avikpal/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:42
#s1946#531 at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/compiler.jl:5759 [inlined]
#s1946#531 at ./none:0
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
GeneratedFunctionStub at ./boot.jl:602
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_call_staged at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/method.c:540
ijl_code_for_staged at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/method.c:593
get_staged at ./compiler/utilities.jl:123
retrieve_code_info at ./compiler/utilities.jl:135 [inlined]
InferenceState at ./compiler/inferencestate.jl:430
typeinf_edge at ./compiler/typeinfer.jl:920
abstract_call_method at ./compiler/abstractinterpretation.jl:629
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:95
abstract_call_known at ./compiler/abstractinterpretation.jl:2087
abstract_call at ./compiler/abstractinterpretation.jl:2169
abstract_call at ./compiler/abstractinterpretation.jl:2162
abstract_call at ./compiler/abstractinterpretation.jl:2354
abstract_eval_call at ./compiler/abstractinterpretation.jl:2370
abstract_eval_statement_expr at ./compiler/abstractinterpretation.jl:2380
abstract_eval_statement at ./compiler/abstractinterpretation.jl:2624
abstract_eval_basic_statement at ./compiler/abstractinterpretation.jl:2889
typeinf_local at ./compiler/abstractinterpretation.jl:3098
typeinf_nocycle at ./compiler/abstractinterpretation.jl:3186
_typeinf at ./compiler/typeinfer.jl:247
typeinf at ./compiler/typeinfer.jl:216
typeinf_edge at ./compiler/typeinfer.jl:930
abstract_call_method at ./compiler/abstractinterpretation.jl:629
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:95
abstract_call_known at ./compiler/abstractinterpretation.jl:2087
abstract_call at ./compiler/abstractinterpretation.jl:2169
abstract_apply at ./compiler/abstractinterpretation.jl:1612
abstract_call_known at ./compiler/abstractinterpretation.jl:2004
abstract_call at ./compiler/abstractinterpretation.jl:2169
abstract_call at ./compiler/abstractinterpretation.jl:2162
abstract_call at ./compiler/abstractinterpretation.jl:2354
abstract_eval_call at ./compiler/abstractinterpretation.jl:2370
abstract_eval_statement_expr at ./compiler/abstractinterpretation.jl:2380
abstract_eval_statement at ./compiler/abstractinterpretation.jl:2624
abstract_eval_basic_statement at ./compiler/abstractinterpretation.jl:2913
typeinf_local at ./compiler/abstractinterpretation.jl:3098
typeinf_nocycle at ./compiler/abstractinterpretation.jl:3186
_typeinf at ./compiler/typeinfer.jl:247
typeinf at ./compiler/typeinfer.jl:216
typeinf_ext at ./compiler/typeinfer.jl:1051
typeinf_ext_toplevel at ./compiler/typeinfer.jl:1082
typeinf_ext_toplevel at ./compiler/typeinfer.jl:1078
jfptr_typeinf_ext_toplevel_35682.1 at /home/avikpal/.julia/juliaup/julia-1.10.3+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_type_infer at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:394
jl_generate_fptr_impl at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/jitlayers.cpp:504
jl_compile_method_internal at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2481 [inlined]
jl_compile_method_internal at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2368
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2887 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/interpreter.c:617
jl_interpret_toplevel_thunk at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/toplevel.c:877
eval_body at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/interpreter.c:579
eval_body at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/interpreter.c:544
jl_interpret_toplevel_thunk at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/toplevel.c:877
jl_toplevel_eval_flex at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
eval_user_input at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
repl_backend_loop at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
#start_repl_backend#46 at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
start_repl_backend at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:228
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
#run_repl#59 at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:389
run_repl at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:375
jfptr_run_repl_91734.1 at /home/avikpal/.julia/juliaup/julia-1.10.3+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
#1013 at ./client.jl:432
jfptr_YY.1013_82700.1 at /home/avikpal/.julia/juliaup/julia-1.10.3+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_latest at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/builtins.c:812
#invokelatest#2 at ./essentials.jl:892 [inlined]
invokelatest at ./essentials.jl:889 [inlined]
run_main_repl at ./client.jl:416
exec_options at ./client.jl:333
_start at ./client.jl:552
jfptr__start_82726.1 at /home/avikpal/.julia/juliaup/julia-1.10.3+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
true_main at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/jlapi.c:731
main at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/cli/loader_exe.c:58
unknown function (ip: 0x72699a1f3ccf)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 60085108 (Pool: 60011990; Big: 73118); GC: 58
[1]  + 840325 IOT instruction (core dumped)  julia --threads=auto --project=.
@avik-pal avik-pal changed the title Segfault from Assertion Julia Crashes from Assertion May 11, 2024
@wsmoses
Member

wsmoses commented May 11, 2024

@avik-pal are you able to reduce out the Dense into more fundamental function calls and still get this to trigger?

@avik-pal
Contributor Author

using LuxLib, Enzyme

y = randn(Float32, 10, 10)
b = randn(Float32, 10)
act = gelu

function loss_function(act, y, b)
    return sum(LuxLib.__apply_bias_activation!!(act, y, b, Val(false)))
end

loss_function(act, y, b)

begin
    dy = Enzyme.make_zero(y)
    db = Enzyme.make_zero(b)

    Enzyme.autodiff(Enzyme.Reverse, loss_function, Active, Const(act),
        Duplicated(y, dy), Duplicated(b, db))
end

Now I get

ERROR: LLVM error: function failed verification (4)

without the Julia crash.

@wsmoses
Member

wsmoses commented May 11, 2024

For better or worse, that's a different error. I'll try to repro the last one and see if it's a quick fix. For both errors, can you include the full logs?

@avik-pal
Contributor Author

avik-pal commented May 11, 2024

using Enzyme, Polyester

y = randn(Float32, 10, 10)
b = randn(Float32, 10)
act = x -> max(x, 0)

function __apply_bias_activation!!(σ::F, x, bias::Union{Nothing, AbstractArray}) where {F}
    f_fused = σ ∘ +
    if maximum(length, (x, bias)) > 100_000
        bc = Broadcast.instantiate(Broadcast.broadcasted(f_fused, x, bias))
        @batch for I in eachindex(bc)
            @inbounds x[I] = bc[I]
        end
    else
        @. x = f_fused(x, bias)
    end
    return x
    # return LuxLib.__nonuniform_fast_broadcast!(σ ∘ +, x, bias)
end

function loss_function(act, y, b)
    return sum(__apply_bias_activation!!(act, y, b))
end

loss_function(act, y, b)

begin
    dy = Enzyme.make_zero(y)
    db = Enzyme.make_zero(b)

    Enzyme.autodiff(Enzyme.Reverse, loss_function, Active,
        Const(act), Duplicated(y, dy), Duplicated(b, db))
end

A minimal version without Lux deps
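
For anyone following along, the serial (`else`) branch of the reproducer can be sanity-checked in plain Julia, with neither Enzyme nor Polyester loaded. This is only a sketch: the name `relu`, the small 3×2 test data, and the `expected` variable are made up here for illustration and are not part of the original report. It shows that `σ ∘ +` applied under `@.` is the same as broadcasting `σ.(x .+ bias)`, with the bias vector broadcast along the columns.

```julia
# Plain-Julia sketch of the non-threaded branch; no Enzyme or Polyester needed.
# `relu` plays the role of `act = x -> max(x, 0)` from the reproducer.
relu(x) = max(x, 0)

# Small illustrative inputs (assumed values, not from the report).
y = reshape(collect(Float32, 1:6), 3, 2) .- 3f0   # 3x2 matrix with mixed signs
b = Float32[1, -1, 0]                              # one bias entry per row

# Fused function: (relu ∘ (+))(a, c) == relu(a + c)
f_fused = relu ∘ (+)

# Reference result, computed before anything is mutated.
expected = relu.(y .+ b)

# In-place fused broadcast, mirroring `@. x = f_fused(x, bias)`.
x = copy(y)
@. x = f_fused(x, b)

@assert x == expected
```

Since this forward computation is ordinary broadcasting, the failure reported here appears to be specific to differentiating the function with Enzyme rather than to the primal code itself.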

Crash Log
; Function Attrs: mustprogress willreturn
define internal fastcc void @preprocess_julia___apply_bias_activation___2450({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="137517761287184" "enzymejl_parmtype_ref"="2" %0, {} addrspace(10)* noundef nonnull align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="137517789463248" "enzymejl_parmtype_ref"="2" %1) unnamed_addr #74 !dbg !3125 {
top:
  %2 = call noalias nonnull dereferenceable(88) dereferenceable_or_null(88) i8* @malloc(i64 88), !enzyme_fromstack !483
  %3 = bitcast i8* %2 to { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }*, !enzyme_caststack !63
  %.sub = bitcast { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %3 to i8*
  %4 = call {}*** @julia.get_pgcstack() #78
  %current_task1530 = getelementptr inbounds {}**, {}*** %4, i64 -14
  %current_task1 = bitcast {}*** %current_task1530 to {}**
  %ptls_field531 = getelementptr inbounds {}**, {}*** %4, i64 2
  %5 = bitcast {}*** %ptls_field531 to i64***
  %ptls_load532533 = load i64**, i64*** %5, align 8, !tbaa !64
  %6 = getelementptr inbounds i64*, i64** %ptls_load532533, i64 2
  %safepoint = load i64*, i64** %6, align 8, !tbaa !68
  fence syncscope("singlethread") seq_cst
  call void @julia.safepoint(i64* %safepoint) #78, !dbg !3126
  fence syncscope("singlethread") seq_cst
  %7 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !3127
  %8 = addrspacecast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !3127
  %arraylen_ptr = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %8, i64 0, i32 1, !dbg !3127
  %arraylen = load i64, i64 addrspace(11)* %arraylen_ptr, align 8, !dbg !3127, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %9 = addrspacecast {} addrspace(10)* %1 to {} addrspace(11)*, !dbg !3140
  %10 = addrspacecast {} addrspace(10)* %1 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !3140
  %arraylen_ptr2 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %10, i64 0, i32 1, !dbg !3140
  %arraylen3 = load i64, i64 addrspace(11)* %arraylen_ptr2, align 8, !dbg !3140, !tbaa !250, !range !253, !alias.scope !254, !noalias !255
  %11 = call i64 @llvm.umax.i64(i64 %arraylen3, i64 %arraylen) #78, !dbg !3143
  %12 = icmp ult i64 %11, 100001, !dbg !3146
  br i1 %12, label %L811, label %L7, !dbg !3139

L7:                                               ; preds = %top
  %13 = addrspacecast {} addrspace(10)* %0 to {} addrspace(10)* addrspace(11)*, !dbg !3148
  %arraysize_ptr = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %13, i64 3, !dbg !3148
  %14 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr to i64 addrspace(11)*, !dbg !3148
  %arraysize = load i64, i64 addrspace(11)* %14, align 8, !dbg !3148, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %arraysize_ptr4 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %13, i64 4, !dbg !3148
  %15 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr4 to i64 addrspace(11)*, !dbg !3148
  %arraysize5 = load i64, i64 addrspace(11)* %15, align 16, !dbg !3148, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %16 = icmp eq i64 %arraylen3, %arraysize, !dbg !3153
  %17 = icmp eq i64 %arraysize, 1, !dbg !3155
  %value_phi = or i1 %16, %17, !dbg !3155
  br i1 %value_phi, label %L40, label %L28, !dbg !3156

L28:                                              ; preds = %L7
  %.not559 = icmp eq i64 %arraylen3, 1, !dbg !3155
  br i1 %.not559, label %L40, label %L36, !dbg !3156

L36:                                              ; preds = %L28
  %18 = call noalias nonnull "enzyme_inactive" {} addrspace(10)* @ijl_box_int64(i64 signext %arraysize) #79, !dbg !3156
  %19 = call noalias nonnull "enzyme_inactive" {} addrspace(10)* @ijl_box_int64(i64 signext %arraylen3) #79, !dbg !3156
  %20 = call nonnull {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)*, {} addrspace(10)*, {} addrspace(10)*, ...) @julia.call2({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)* noundef nonnull @ijl_invoke, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 137517617037136 to {}*) to {} addrspace(10)*), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 137517584367872 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 137517712977280 to {}*) to {} addrspace(10)*), {} addrspace(10)* nonnull %18, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 137517712977248 to {}*) to {} addrspace(10)*), {} addrspace(10)* nonnull %19) #80, !dbg !3156
  %box = call noalias nonnull dereferenceable(8) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 137517631542736 to {}*) to {} addrspace(10)*)) #81, !dbg !3156
  %21 = bitcast {} addrspace(10)* %box to [1 x {} addrspace(10)*] addrspace(10)*, !dbg !3156
  %22 = getelementptr [1 x {} addrspace(10)*], [1 x {} addrspace(10)*] addrspace(10)* %21, i64 0, i64 0, !dbg !3156
  store {} addrspace(10)* %20, {} addrspace(10)* addrspace(10)* %22, align 8, !dbg !3156, !tbaa !188, !alias.scope !83, !noalias !3159
  %23 = addrspacecast {} addrspace(10)* %box to {} addrspace(12)*, !dbg !3156
  call void @ijl_throw({} addrspace(12)* %23) #82, !dbg !3156
  unreachable, !dbg !3156

L40:                                              ; preds = %L28, %L7
  %.sroa.0449.0 = phi i64 [ %arraylen3, %L7 ], [ %arraysize, %L28 ]
  %24 = call i64 @julia_nthreads_2651() #78, !dbg !3162
  %.not = icmp eq i64 %24, 1, !dbg !3164
  br i1 %.not, label %L62, label %L221, !dbg !3165

L62:                                              ; preds = %L40
  %25 = icmp ne i64 %.sroa.0449.0, 0, !dbg !3166
  %26 = icmp ne i64 %arraysize5, 0, !dbg !3166
  %.demorgan = and i1 %26, %25, !dbg !3170
  br i1 %.demorgan, label %guard_exit374, label %L1042, !dbg !3170

L86:                                              ; preds = %guard_exit379, %guard_exit374
  %iv17 = phi i64 [ %iv.next18, %guard_exit379 ], [ 0, %guard_exit374 ], !dbg !3171
  %arraylen64 = phi i64 [ %arraylen3, %guard_exit374 ], [ %arraylen64.pre, %guard_exit379 ], !dbg !3171
  %nodecayed.arrayptr = phi {} addrspace(10)* [ %237, %guard_exit374 ], [ %240, %guard_exit379 ], !dbg !3180
  %arraysize54 = phi i64 [ %arraysize5, %guard_exit374 ], [ %arraysize54.pre, %guard_exit379 ], !dbg !3183
  %arraysize62 = phi i64 [ %arraysize, %guard_exit374 ], [ %arraysize72, %guard_exit379 ], !dbg !3180
  %value_phi40 = phi i64 [ 1, %guard_exit374 ], [ %value_phi77572, %guard_exit379 ]
  %value_phi41 = phi i64 [ 1, %guard_exit374 ], [ %value_phi78573, %guard_exit379 ]
  %iv.next18 = add nuw nsw i64 %iv17, 1, !dbg !3186
  %27 = bitcast {} addrspace(10)* %nodecayed.arrayptr to i8 addrspace(13)* addrspace(10)*, !dbg !3186
  %28 = addrspacecast i8 addrspace(13)* addrspace(10)* %27 to i8 addrspace(13)* addrspace(11)*, !dbg !3186
  %29 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %28, align 8, !dbg !3186
  %.not534 = icmp eq i64 %arraysize62, 1, !dbg !3186
  %.not535 = icmp eq i64 %arraysize54, 1, !dbg !3188
  %value_phi40.op = add i64 %value_phi40, -1, !dbg !3180
  %30 = select i1 %.not534, i64 0, i64 %value_phi40.op, !dbg !3180
  %value_phi41.op = add i64 %value_phi41, -1, !dbg !3180
  %31 = select i1 %.not535, i64 0, i64 %value_phi41.op, !dbg !3180
  %32 = mul i64 %31, %arraysize62, !dbg !3180
  %33 = add i64 %32, %30, !dbg !3180
  %34 = bitcast i8 addrspace(13)* %29 to float addrspace(13)*, !dbg !3180
  %35 = getelementptr inbounds float, float addrspace(13)* %34, i64 %33, !dbg !3180
  %arrayref = load float, float addrspace(13)* %35, align 4, !dbg !3180, !tbaa !494, !alias.scope !83, !noalias !86
  %.not536 = icmp eq i64 %arraylen64, 1, !dbg !3190
  %36 = select i1 %.not536, i64 0, i64 %value_phi40.op, !dbg !3192
  %arrayptr69538 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %234, align 16, !dbg !3192, !tbaa !271, !alias.scope !3194, !noalias !255, !nonnull !63
  %37 = getelementptr inbounds float, float addrspace(13)* %arrayptr69538, i64 %36, !dbg !3192
  %arrayref70 = load float, float addrspace(13)* %37, align 4, !dbg !3192, !tbaa !494, !alias.scope !83, !noalias !86
  %38 = fadd float %arrayref, %arrayref70, !dbg !3195
  %39 = call fastcc float @julia_gelu_2643(float %38) #78, !dbg !3197
  %arraysize72 = load i64, i64 addrspace(11)* %14, align 8, !dbg !3202, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %40 = mul i64 %arraysize72, %value_phi41.op, !dbg !3202
  %41 = add i64 %40, %value_phi40.op, !dbg !3202
  %arrayptr75 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %arrayptr_ptr.phi.trans.insert, align 16, !dbg !3202, !tbaa !68, !alias.scope !3204, !noalias !337, !nonnull !63
  %42 = bitcast i8 addrspace(13)* %arrayptr75 to float addrspace(13)*, !dbg !3202
  %43 = getelementptr inbounds float, float addrspace(13)* %42, i64 %41, !dbg !3202
  store float %39, float addrspace(13)* %43, align 4, !dbg !3202, !tbaa !494, !alias.scope !83, !noalias !3159
  %44 = add i64 %value_phi40, 1, !dbg !3205
  %45 = icmp ugt i64 %value_phi40, 9223372036854775806, !dbg !3208
  %46 = icmp sgt i64 %44, %.sroa.0449.0, !dbg !3208
  %47 = or i1 %45, %46, !dbg !3211
  %48 = icmp eq i64 %value_phi40, %.sroa.0449.0
  %or.cond = or i1 %48, %47, !dbg !3211
  br i1 %or.cond, label %L194, label %guard_exit379, !dbg !3211

L194:                                             ; preds = %L86
  %49 = add i64 %value_phi41, 1, !dbg !3212
  %50 = icmp ult i64 %value_phi41, 9223372036854775807, !dbg !3215
  %51 = icmp sle i64 %49, %arraysize5, !dbg !3215
  %52 = and i1 %50, %51, !dbg !3219
  %53 = icmp ne i64 %value_phi41, %arraysize5, !dbg !3218
  %value_phi95 = and i1 %53, %52, !dbg !3218
  br i1 %value_phi95, label %guard_exit379, label %L1042.loopexit1, !dbg !3170

L221:                                             ; preds = %L40
  %.not540 = icmp eq i64 %arraysize5, 0, !dbg !3220
  br i1 %.not540, label %L1042, label %L227, !dbg !3222

L227:                                             ; preds = %L221
  %54 = call i64 @llvm.smin.i64(i64 %24, i64 %arraysize5) #78, !dbg !3224
  %.not541 = icmp eq i64 %54, 0, !dbg !3226
  br i1 %.not541, label %L691.lr.ph, label %L235, !dbg !3227

L235:                                             ; preds = %L227
  %55 = trunc i64 %54 to i32, !dbg !3228
  %56 = add i32 %55, -1, !dbg !3228
  %57 = call nonnull "enzyme_inactive" {}* @julia.pointer_from_objref({} addrspace(11)* noundef addrspacecast ({}* inttoptr (i64 137517510837008 to {}*) to {} addrspace(11)*)) #83, !dbg !3232
  %58 = icmp sgt i32 %56, 0, !dbg !3234
  br i1 %58, label %L245, label %L691.lr.ph, !dbg !3235

L245:                                             ; preds = %L235
  %p.i = bitcast {}* %57 to i64*, !dbg !3237
  %v.i = atomicrmw xchg i64* %p.i, i64 0 acq_rel, align 8, !dbg !3237
  %59 = call i64 @llvm.ctpop.i64(i64 %v.i) #78, !dbg !3240, !range !2055
  %60 = trunc i64 %59 to i32, !dbg !3242
  %61 = sub nsw i32 %56, %60, !dbg !3243
  %62 = icmp slt i32 %61, 0, !dbg !3245
  br i1 %62, label %L258, label %L293, !dbg !3248

L258:                                             ; preds = %L245
  %63 = call i64 @llvm.ctlz.i64(i64 %v.i, i1 noundef false) #78, !dbg !3249, !range !2055
  %64 = trunc i64 %63 to i32, !dbg !3251
  br label %L261, !dbg !3252

L261:                                             ; preds = %L261, %L258
  %iv = phi i64 [ %iv.next, %L261 ], [ 0, %L258 ]
  %value_phi239 = phi i32 [ %64, %L258 ], [ %65, %L261 ]
  %value_phi240 = phi i32 [ %61, %L258 ], [ %74, %L261 ]
  %value_phi241 = phi i64 [ %v.i, %L258 ], [ %70, %L261 ]
  %iv.next = add nuw nsw i64 %iv, 1, !dbg !3257
  %65 = sub i32 %value_phi239, %value_phi240, !dbg !3257
  %66 = sub i32 64, %65, !dbg !3259
  %67 = zext i32 %66 to i64, !dbg !3261
  %68 = icmp ugt i32 %66, 63, !dbg !3261
  %notmask = shl nsw i64 -1, %67, !dbg !3259
  %.op = xor i64 %notmask, -1, !dbg !3259
  %69 = select i1 %68, i64 -1, i64 %.op, !dbg !3259
  %70 = and i64 %69, %value_phi241, !dbg !3262
  %71 = xor i64 %70, %value_phi241, !dbg !3264
  %72 = call i64 @llvm.ctpop.i64(i64 %71) #78, !dbg !3265, !range !2055
  %73 = trunc i64 %72 to i32, !dbg !3267
  %74 = add i32 %value_phi240, %73, !dbg !3268
  %.not558 = icmp eq i32 %74, 0, !dbg !3269
  br i1 %.not558, label %L282, label %L261, !dbg !3270

L282:                                             ; preds = %L261
  %75 = xor i64 %70, -1, !dbg !3271
  %76 = and i64 %v.i, %75, !dbg !3273
  store atomic i64 %76, i64* %p.i release, align 16, !dbg !3274, !noalias !3275
  br label %L293, !dbg !3276

L293:                                             ; preds = %L282, %L245
  %value_phi155 = phi i32 [ %56, %L282 ], [ %60, %L245 ]
  %value_phi156 = phi i64 [ %70, %L282 ], [ %v.i, %L245 ]
  %77 = icmp sgt i32 %value_phi155, 0, !dbg !3279
  br i1 %77, label %L361.lr.ph, label %L691.lr.ph, !dbg !3280

L361.lr.ph:                                       ; preds = %L293
  %78 = zext i32 %value_phi155 to i64, !dbg !3281
  %79 = add nuw nsw i64 %78, 1, !dbg !3298
  %80 = udiv i64 %arraysize5, %79, !dbg !3300
  %81 = mul i64 %80, %79, !dbg !3301
  %82 = sub i64 %arraysize5, %81, !dbg !3303
  %83 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %7) #83, !dbg !3304
  %84 = bitcast {}* %83 to i8**, !dbg !3304
  %arrayptr159 = load i8*, i8** %84, align 8, !dbg !3304, !tbaa !68, !alias.scope !336, !noalias !337, !nonnull !63
  %85 = ptrtoint i8* %arrayptr159 to i64, !dbg !3304
  %arraysize161 = load i64, i64 addrspace(11)* %14, align 8, !dbg !3312, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %arraysize163 = load i64, i64 addrspace(11)* %15, align 16, !dbg !3312, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %86 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %9) #83, !dbg !3318
  %87 = bitcast {}* %86 to i8**, !dbg !3318
  %arrayptr179 = load i8*, i8** %87, align 8, !dbg !3318, !tbaa !271, !alias.scope !254, !noalias !255, !nonnull !63
  %88 = ptrtoint i8* %arrayptr179 to i64, !dbg !3318
  %arraylen181 = load i64, i64 addrspace(11)* %arraylen_ptr2, align 8, !dbg !3328, !tbaa !250, !range !253, !alias.scope !254, !noalias !255
  %89 = insertvalue [2 x {} addrspace(10)*] zeroinitializer, {} addrspace(10)* %0, 0, !dbg !3334
  %90 = insertvalue [2 x {} addrspace(10)*] %89, {} addrspace(10)* %1, 1, !dbg !3334
  %newstruct187.sroa.0.0..sroa_idx = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %3, i64 0, i32 0, i64 0, i64 0, i64 0, !dbg !3335
  store i64 %.sroa.0449.0, i64* %newstruct187.sroa.0.0..sroa_idx, align 16, !dbg !3335, !tbaa !682, !alias.scope !2185, !noalias !3336
  %newstruct187.sroa.2.sroa.0.0.newstruct187.sroa.2.0..sroa_cast.sroa_idx = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %3, i64 0, i32 1, i32 0, !dbg !3335
  store i64 %85, i64* %newstruct187.sroa.2.sroa.0.0.newstruct187.sroa.2.0..sroa_cast.sroa_idx, align 8, !dbg !3335, !tbaa !682, !alias.scope !2185, !noalias !3336
  %newstruct187.sroa.2.sroa.2.0.newstruct187.sroa.2.0..sroa_cast.sroa_idx415 = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %3, i64 0, i32 1, i32 1, i64 0, !dbg !3335
  store i64 %arraysize161, i64* %newstruct187.sroa.2.sroa.2.0.newstruct187.sroa.2.0..sroa_cast.sroa_idx415, align 16, !dbg !3335, !tbaa !682, !alias.scope !2185, !noalias !3336
  %newstruct187.sroa.2.sroa.3.0.newstruct187.sroa.2.0..sroa_cast.sroa_idx416 = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %3, i64 0, i32 1, i32 1, i64 1, !dbg !3335
  store i64 %arraysize163, i64* %newstruct187.sroa.2.sroa.3.0.newstruct187.sroa.2.0..sroa_cast.sroa_idx416, align 8, !dbg !3335, !tbaa !682, !alias.scope !2185, !noalias !3336
  %newstruct187.sroa.3.sroa.0.sroa.0.0.newstruct187.sroa.3.sroa.0.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %3, i64 0, i32 2, i32 0, i32 0, i32 0, !dbg !3335
  store i64 %85, i64* %newstruct187.sroa.3.sroa.0.sroa.0.0.newstruct187.sroa.3.sroa.0.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx, align 16, !dbg !3335, !tbaa !682, !alias.scope !2185, !noalias !3336
  %newstruct187.sroa.3.sroa.0.sroa.2.0.newstruct187.sroa.3.sroa.0.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx411 = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %3, i64 0, i32 2, i32 0, i32 0, i32 1, i64 0, !dbg !3335
  store i64 %arraysize161, i64* %newstruct187.sroa.3.sroa.0.sroa.2.0.newstruct187.sroa.3.sroa.0.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx411, align 8, !dbg !3335, !tbaa !682, !alias.scope !2185, !noalias !3336
  %newstruct187.sroa.3.sroa.0.sroa.3.0.newstruct187.sroa.3.sroa.0.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx412 = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %3, i64 0, i32 2, i32 0, i32 0, i32 1, i64 1, !dbg !3335
  store i64 %arraysize163, i64* %newstruct187.sroa.3.sroa.0.sroa.3.0.newstruct187.sroa.3.sroa.0.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx412, align 16, !dbg !3335, !tbaa !682, !alias.scope !2185, !noalias !3336
  %newstruct187.sroa.3.sroa.2.0.newstruct187.sroa.3.0..sroa_cast.sroa_idx405 = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %3, i64 0, i32 2, i32 0, i32 1, i32 0, !dbg !3335
  store i64 %88, i64* %newstruct187.sroa.3.sroa.2.0.newstruct187.sroa.3.0..sroa_cast.sroa_idx405, align 8, !dbg !3335, !tbaa !682, !alias.scope !2185, !noalias !3336
  %newstruct187.sroa.3.sroa.3.0.newstruct187.sroa.3.0..sroa_cast.sroa_idx406 = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %3, i64 0, i32 2, i32 0, i32 1, i32 1, i64 0, !dbg !3335
  store i64 %arraylen181, i64* %newstruct187.sroa.3.sroa.3.0.newstruct187.sroa.3.0..sroa_cast.sroa_idx406, align 16, !dbg !3335, !tbaa !682, !alias.scope !2185, !noalias !3336
  %newstruct187.sroa.3.sroa.4.sroa.0.0.newstruct187.sroa.3.sroa.4.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %3, i64 0, i32 2, i32 1, i64 0, i64 0, !dbg !3335
  store i64 %.sroa.0449.0, i64* %newstruct187.sroa.3.sroa.4.sroa.0.0.newstruct187.sroa.3.sroa.4.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx, align 8, !dbg !3335, !tbaa !682, !alias.scope !2185, !noalias !3336
  %newstruct187.sroa.3.sroa.4.sroa.2.0.newstruct187.sroa.3.sroa.4.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx446 = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %3, i64 0, i32 2, i32 1, i64 1, i64 0, !dbg !3335
  store i64 %arraysize5, i64* %newstruct187.sroa.3.sroa.4.sroa.2.0.newstruct187.sroa.3.sroa.4.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx446, align 16, !dbg !3335, !tbaa !682, !alias.scope !2185, !noalias !3336
  %91 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %0, [2 x {} addrspace(10)*] %90) #78, !dbg !3311
  %92 = icmp sgt i64 %82, -1
  br label %L361, !dbg !3337

L427.preheader:                                   ; preds = %L415
  %value_phi201587 = add i64 %105, 1, !dbg !3338
  %.not549588 = icmp sgt i64 %value_phi201587, %arraysize5, !dbg !3339
  br i1 %.not549588, label %L641.preheader, label %L430.lr.ph, !dbg !3340

L430.lr.ph:                                       ; preds = %L427.preheader
  %93 = icmp eq i64 %.sroa.0449.0, 0
  %.not551 = icmp eq i64 %arraysize161, 1
  %.not552 = icmp eq i64 %arraysize163, 1
  %.not553 = icmp eq i64 %arraylen181, 1
  %94 = add nsw i64 %.sroa.0449.0, -1, !dbg !3340
  %umin600 = call i64 @llvm.umin.i64(i64 %94, i64 noundef 9223372036854775806) #78, !dbg !3340
  %95 = add nuw nsw i64 %umin600, 1
  %96 = add i64 %80, %value_phi193592, !dbg !3341
  %umin = call i1 @llvm.umin.i1(i1 %102, i1 %92), !dbg !3340
  %97 = zext i1 %umin to i64, !dbg !3340
  %98 = add i64 %96, %97, !dbg !3341
  br label %L430, !dbg !3340

L361:                                             ; preds = %L415, %L361.lr.ph
  %iv3 = phi i64 [ %iv.next4, %L415 ], [ 0, %L361.lr.ph ]
  %value_phi195594 = phi i64 [ %value_phi156, %L361.lr.ph ], [ %111, %L415 ]
  %value_phi193592 = phi i64 [ 0, %L361.lr.ph ], [ %105, %L415 ]
  %value_phi192591 = phi i32 [ 0, %L361.lr.ph ], [ %107, %L415 ]
  %iv.next4 = add nuw nsw i64 %iv3, 1, !dbg !3342
  %99 = icmp ne i64 %value_phi195594, 0, !dbg !3342
  call void @llvm.assume(i1 noundef %99) #78, !dbg !3345
  %100 = call i64 @llvm.cttz.i64(i64 %value_phi195594, i1 noundef true) #78, !dbg !3346, !range !2055
  %101 = trunc i64 %100 to i32, !dbg !3348
  %102 = icmp ugt i64 %82, %iv3, !dbg !3349
  %not.ifelse_cond196 = and i1 %92, %102, !dbg !3353
  %103 = zext i1 %not.ifelse_cond196 to i64, !dbg !3353
  %104 = add i64 %value_phi193592, %80, !dbg !3353
  %105 = add i64 %104, %103, !dbg !3354
  %106 = add nuw nsw i32 %101, 1, !dbg !3355
  %107 = add i32 %106, %value_phi192591, !dbg !3357
  %108 = zext i32 %106 to i64, !dbg !3359
  %109 = lshr i64 %value_phi195594, %108, !dbg !3359
  %110 = icmp eq i32 %101, 63, !dbg !3359
  %111 = select i1 %110, i64 0, i64 %109, !dbg !3359
  %112 = load i64, i64* inttoptr (i64 137517345406912 to i64*), align 64, !dbg !3361, !tbaa !131, !alias.scope !83, !noalias !86
  %113 = shl i32 %107, 9, !dbg !3367
  %114 = zext i32 %113 to i64, !dbg !3368
  %115 = inttoptr i64 %112 to i8*, !dbg !3372
  %116 = getelementptr i8, i8* %115, i64 %114, !dbg !3372
  %117 = getelementptr i8, i8* %116, i64 8, !dbg !3373
  %coercion = bitcast i8* %117 to i64*, !dbg !3379
  store i64 ptrtoint (void (i64)* @jlcapi_BatchClosure_2456 to i64), i64* %coercion, align 1, !dbg !3379, !tbaa !81, !alias.scope !83, !noalias !3159
  %118 = getelementptr i8, i8* %116, i64 16, !dbg !3383
  %119 = bitcast i8* %118 to { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }**, !dbg !3387
  store { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %3, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }** %119, align 1, !dbg !3387, !tbaa !81, !alias.scope !83, !noalias !3159
  %120 = getelementptr i8, i8* %116, i64 24, !dbg !3391
  %coercion198 = bitcast i8* %120 to i64*, !dbg !3395
  store i64 %value_phi193592, i64* %coercion198, align 1, !dbg !3395, !tbaa !81, !alias.scope !83, !noalias !3159
  %121 = getelementptr i8, i8* %116, i64 32, !dbg !3399
  %coercion199 = bitcast i8* %121 to i64*, !dbg !3403
  store i64 %105, i64* %coercion199, align 1, !dbg !3403, !tbaa !81, !alias.scope !83, !noalias !3159
  %p.i386 = bitcast i8* %116 to i32*, !dbg !3407
  %v.i387 = atomicrmw xchg i32* %p.i386, i32 0 acq_rel, align 4, !dbg !3407
  %.not548 = icmp eq i32 %v.i387, 1, !dbg !3410
  br i1 %.not548, label %L412, label %L415, !dbg !3411

L412:                                             ; preds = %L361
  call fastcc void @julia_wake_thread__2634(i32 zeroext %107) #78, !dbg !3411
  br label %L415, !dbg !3411

L415:                                             ; preds = %L412, %L361
  %122 = icmp eq i64 %iv.next4, %78, !dbg !3412
  br i1 %122, label %L427.preheader, label %L361, !dbg !3337

L641.preheader.loopexit:                          ; preds = %L622
  br label %L641.preheader, !dbg !3414

L641.preheader:                                   ; preds = %L641.preheader.loopexit, %L427.preheader
  %123 = icmp eq i64 %value_phi156, 0, !dbg !3414
  br i1 %123, label %L678, label %L646.preheader, !dbg !3416

L646.preheader:                                   ; preds = %L641.preheader
  br label %L646, !dbg !3417

L430:                                             ; preds = %L622, %L430.lr.ph
  %iv5 = phi i64 [ %iv.next6, %L622 ], [ 0, %L430.lr.ph ]
  %124 = add i64 %98, %iv5, !dbg !3341
  %iv.next6 = add nuw nsw i64 %iv5, 1, !dbg !3341
  %125 = add i64 %value_phi201587, %iv5, !dbg !3341
  br i1 %93, label %L622, label %L442.preheader, !dbg !3341

L442.preheader:                                   ; preds = %L430
  %126 = select i1 %.not552, i64 0, i64 %124
  %127 = mul i64 %126, %arraysize161
  %128 = mul i64 %124, %arraysize161
  br label %L442, !dbg !3253

L442:                                             ; preds = %L442, %L442.preheader
  %iv7 = phi i64 [ %iv.next8, %L442 ], [ 0, %L442.preheader ]
  %iv.next8 = add nuw nsw i64 %iv7, 1, !dbg !3420
  %129 = select i1 %.not551, i64 1, i64 %iv.next8, !dbg !3420
  %130 = add i64 %129, %127, !dbg !3429
  %131 = shl i64 %130, 2, !dbg !3437
  %132 = add i64 %131, -4, !dbg !3437
  %133 = getelementptr i8, i8* %arrayptr159, i64 %132, !dbg !3440
  %coercion216 = bitcast i8* %133 to float*, !dbg !3441
  %pointerref = load float, float* %coercion216, align 1, !dbg !3441, !tbaa !81, !alias.scope !83, !noalias !86
  %value_phi206.op = shl i64 %iv.next8, 2, !dbg !3445
  %value_phi206.op.op = add i64 %value_phi206.op, -4, !dbg !3445
  %134 = select i1 %.not553, i64 0, i64 %value_phi206.op.op, !dbg !3445
  %135 = getelementptr i8, i8* %arrayptr179, i64 %134, !dbg !3452
  %coercion219 = bitcast i8* %135 to float*, !dbg !3453
  %pointerref220 = load float, float* %coercion219, align 1, !dbg !3453, !tbaa !81, !alias.scope !83, !noalias !86
  %136 = fadd float %pointerref, %pointerref220, !dbg !3457
  call void @llvm.lifetime.end.p0i8(i64 noundef 88, i8* noundef nonnull %.sub) #78
  %137 = call fastcc float @julia_gelu_2643(float %136) #78, !dbg !3459
  %138 = add i64 %iv.next8, %128, !dbg !3464
  %139 = shl i64 %138, 2, !dbg !3472
  %140 = add i64 %139, -4, !dbg !3472
  %141 = getelementptr i8, i8* %arrayptr159, i64 %140, !dbg !3475
  %coercion222 = bitcast i8* %141 to float*, !dbg !3476
  store float %137, float* %coercion222, align 1, !dbg !3476, !tbaa !81, !alias.scope !83, !noalias !3159
  %142 = add nuw nsw i64 %iv.next8, 1, !dbg !3480
  %exitcond601.not = icmp eq i64 %iv.next8, %95, !dbg !3483
  br i1 %exitcond601.not, label %L622.loopexit, label %L442, !dbg !3253

L622.loopexit:                                    ; preds = %L442
  br label %L622, !dbg !3338

L622:                                             ; preds = %L622.loopexit, %L430
  %value_phi201 = add i64 %125, 1, !dbg !3338
  %exitcond602 = icmp eq i64 %125, %arraysize5, !dbg !3339
  br i1 %exitcond602, label %L641.preheader.loopexit, label %L430, !dbg !3340

L646:                                             ; preds = %L646.preheader, %L676
  %iv9 = phi i64 [ 0, %L646.preheader ], [ %iv.next10, %L676 ]
  %value_phi236586 = phi i64 [ %147, %L676 ], [ %value_phi156, %L646.preheader ]
  %value_phi235585 = phi i32 [ %149, %L676 ], [ 0, %L646.preheader ]
  %iv.next10 = add nuw nsw i64 %iv9, 1, !dbg !3484
  %143 = call i64 @llvm.cttz.i64(i64 %value_phi236586, i1 noundef true) #78, !dbg !3484, !range !2055
  %144 = trunc i64 %143 to i32, !dbg !3486
  %145 = add nuw nsw i32 %144, 1, !dbg !3487
  %146 = zext i32 %145 to i64, !dbg !3489
  %147 = lshr i64 %value_phi236586, %146, !dbg !3489
  %148 = icmp eq i32 %144, 63, !dbg !3489
  %149 = add i32 %145, %value_phi235585, !dbg !3491
  %150 = load i64, i64* inttoptr (i64 137517345406912 to i64*), align 64, !dbg !3493, !tbaa !131, !alias.scope !83, !noalias !86
  %151 = shl i32 %149, 9, !dbg !3496
  %152 = zext i32 %151 to i64, !dbg !3497
  %153 = inttoptr i64 %150 to i8*, !dbg !3501
  %154 = getelementptr i8, i8* %153, i64 %152, !dbg !3501
  %p.i388 = bitcast i8* %154 to i32*, !dbg !3502
  %v.i389582 = load atomic i32, i32* %p.i388 acquire, align 16, !dbg !3502
  %.not556583 = icmp eq i32 %v.i389582, 0, !dbg !3504
  br i1 %.not556583, label %L666.preheader, label %L676, !dbg !3417

L666.preheader:                                   ; preds = %L646
  br label %L666, !dbg !3505

L666:                                             ; preds = %L666.preheader, %L673
  %iv11 = phi i64 [ 0, %L666.preheader ], [ %iv.next12, %L673 ]
  %155 = trunc i64 %iv11 to i32
  %iv.next12 = add nuw nsw i64 %iv11, 1
  call void @llvm.lifetime.end.p0i8(i64 noundef 88, i8* noundef nonnull %.sub) #78
  call void asm sideeffect "pause", "~{memory}"() #84, !dbg !3506
  %156 = add i32 %155, 1, !dbg !3508
  %157 = icmp ult i32 %156, 65537, !dbg !3509
  br i1 %157, label %L673, label %L670, !dbg !3505

L670:                                             ; preds = %L666
  %158 = call fastcc i8 @julia_checktask_2476(i32 zeroext %149) #78, !dbg !3511
  %159 = and i8 %158, 1, !dbg !3511
  %.not557 = icmp eq i8 %159, 0, !dbg !3511
  br i1 %.not557, label %L673, label %L676.loopexit, !dbg !3511

L673:                                             ; preds = %L670, %L666
  %v.i389 = load atomic i32, i32* %p.i388 acquire, align 16, !dbg !3502
  %.not556 = icmp eq i32 %v.i389, 0, !dbg !3504
  br i1 %.not556, label %L666, label %L676.loopexit, !dbg !3417

L676.loopexit:                                    ; preds = %L670, %L673
  br label %L676, !dbg !3414

L676:                                             ; preds = %L676.loopexit, %L646
  %160 = icmp eq i64 %147, 0, !dbg !3414
  %161 = select i1 %148, i1 true, i1 %160, !dbg !3414
  br i1 %161, label %L678.loopexit, label %L646, !dbg !3416

L678.loopexit:                                    ; preds = %L676
  br label %L678, !dbg !3512

L678:                                             ; preds = %L678.loopexit, %L641.preheader
  %v.i391 = atomicrmw or i64* %p.i, i64 %value_phi156 acq_rel, align 8, !dbg !3512
  br label %L1042, !dbg !3515

L691.lr.ph:                                       ; preds = %L293, %L235, %L227
  %162 = icmp eq i64 %.sroa.0449.0, 0
  %arrayptr_ptr129.phi.trans.insert = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %8, i64 0, i32 0
  %163 = addrspacecast {} addrspace(10)* %1 to float addrspace(13)* addrspace(11)*
  %164 = add nsw i64 %.sroa.0449.0, -1, !dbg !3516
  %umin597 = call i64 @llvm.umin.i64(i64 %164, i64 noundef 9223372036854775806) #78, !dbg !3516
  %165 = call i64 @llvm.smax.i64(i64 %arraysize5, i64 noundef 1) #78, !dbg !3516
  %166 = add nuw nsw i64 %umin597, 1
  br label %L691, !dbg !3516

L691:                                             ; preds = %L804, %L691.lr.ph
  %iv13 = phi i64 [ %iv.next14, %L804 ], [ 0, %L691.lr.ph ]
  %iv.next14 = add nuw nsw i64 %iv13, 1, !dbg !3517
  br i1 %162, label %L804, label %L691.L703_crit_edge, !dbg !3517

L691.L703_crit_edge:                              ; preds = %L691
  %arraysize117.pre = load i64, i64 addrspace(11)* %14, align 8, !dbg !3518, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %arrayptr130.pre = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %arrayptr_ptr129.phi.trans.insert, align 16, !dbg !3527, !tbaa !68, !alias.scope !3204, !noalias !337
  %value_phi106.op = add nsw i64 %iv.next14, -1
  %167 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !3517
  %168 = bitcast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %167 to i8 addrspace(13)* addrspace(10)*, !dbg !3517
  %169 = bitcast i8 addrspace(13)* addrspace(10)* %168 to {} addrspace(10)*, !dbg !3517
  br label %L703, !dbg !3517

L703:                                             ; preds = %L703, %L691.L703_crit_edge
  %iv15 = phi i64 [ %iv.next16, %L703 ], [ 0, %L691.L703_crit_edge ], !dbg !3527
  %nodecayed.arrayptr130 = phi {} addrspace(10)* [ %169, %L691.L703_crit_edge ], [ %190, %L703 ], !dbg !3527
  %arraysize127 = phi i64 [ %arraysize117.pre, %L691.L703_crit_edge ], [ %arraysize141, %L703 ], !dbg !3527
  %iv.next16 = add nuw nsw i64 %iv15, 1, !dbg !3518
  %170 = bitcast {} addrspace(10)* %nodecayed.arrayptr130 to i8 addrspace(13)* addrspace(10)*, !dbg !3518
  %171 = addrspacecast i8 addrspace(13)* addrspace(10)* %170 to i8 addrspace(13)* addrspace(11)*, !dbg !3518
  %172 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %171, align 8, !dbg !3518
  %arraysize119 = load i64, i64 addrspace(11)* %15, align 16, !dbg !3518, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %.not543 = icmp eq i64 %arraysize127, 1, !dbg !3529
  %.not544 = icmp eq i64 %arraysize119, 1, !dbg !3531
  %value_phi111.op = add nsw i64 %iv.next16, -1, !dbg !3527
  %173 = select i1 %.not543, i64 0, i64 %value_phi111.op, !dbg !3527
  %174 = select i1 %.not544, i64 0, i64 %value_phi106.op, !dbg !3527
  %175 = mul i64 %174, %arraysize127, !dbg !3527
  %176 = add i64 %175, %173, !dbg !3527
  %177 = bitcast i8 addrspace(13)* %172 to float addrspace(13)*, !dbg !3527
  %178 = getelementptr inbounds float, float addrspace(13)* %177, i64 %176, !dbg !3527
  %arrayref131 = load float, float addrspace(13)* %178, align 4, !dbg !3527, !tbaa !494, !alias.scope !83, !noalias !86
  %arraylen133 = load i64, i64 addrspace(11)* %arraylen_ptr2, align 8, !dbg !3533, !tbaa !250, !range !253, !alias.scope !254, !noalias !255
  %.not545 = icmp eq i64 %arraylen133, 1, !dbg !3538
  %179 = select i1 %.not545, i64 0, i64 %value_phi111.op, !dbg !3540
  %arrayptr138547 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %163, align 16, !dbg !3540, !tbaa !271, !alias.scope !3194, !noalias !255, !nonnull !63
  %180 = getelementptr inbounds float, float addrspace(13)* %arrayptr138547, i64 %179, !dbg !3540
  %arrayref139 = load float, float addrspace(13)* %180, align 4, !dbg !3540, !tbaa !494, !alias.scope !83, !noalias !86
  %181 = fadd float %arrayref131, %arrayref139, !dbg !3542
  %182 = call fastcc float @julia_gelu_2643(float %181) #78, !dbg !3544
  %arraysize141 = load i64, i64 addrspace(11)* %14, align 8, !dbg !3549, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %183 = mul i64 %arraysize141, %value_phi106.op, !dbg !3549
  %184 = add i64 %183, %value_phi111.op, !dbg !3549
  %arrayptr144 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %arrayptr_ptr129.phi.trans.insert, align 16, !dbg !3549, !tbaa !68, !alias.scope !3204, !noalias !337, !nonnull !63
  %185 = bitcast i8 addrspace(13)* %arrayptr144 to float addrspace(13)*, !dbg !3549
  %186 = getelementptr inbounds float, float addrspace(13)* %185, i64 %184, !dbg !3549
  store float %182, float addrspace(13)* %186, align 4, !dbg !3549, !tbaa !494, !alias.scope !83, !noalias !3159
  %187 = add nuw nsw i64 %iv.next16, 1, !dbg !3551
  %exitcond598.not = icmp eq i64 %iv.next16, %166, !dbg !3554
  %188 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !3277
  %189 = bitcast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %188 to i8 addrspace(13)* addrspace(10)*, !dbg !3277
  %190 = bitcast i8 addrspace(13)* addrspace(10)* %189 to {} addrspace(10)*, !dbg !3277
  br i1 %exitcond598.not, label %L804.loopexit, label %L703, !dbg !3277

L804.loopexit:                                    ; preds = %L703
  br label %L804, !dbg !3555

L804:                                             ; preds = %L804.loopexit, %L691
  %191 = add nuw nsw i64 %iv.next14, 1, !dbg !3555
  %exitcond599 = icmp eq i64 %iv.next14, %165, !dbg !3558
  br i1 %exitcond599, label %L1042.loopexit2, label %L691, !dbg !3516

L811:                                             ; preds = %top
  %192 = addrspacecast {} addrspace(10)* %0 to {} addrspace(10)* addrspace(11)*, !dbg !3559
  %arraysize_ptr246 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %192, i64 3, !dbg !3559
  %193 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr246 to i64 addrspace(11)*, !dbg !3559
  %arraysize247 = load i64, i64 addrspace(11)* %193, align 8, !dbg !3559, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %arraysize_ptr248 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %192, i64 4, !dbg !3559
  %194 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr248 to i64 addrspace(11)*, !dbg !3559
  %arraysize249 = load i64, i64 addrspace(11)* %194, align 16, !dbg !3559, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %195 = icmp eq i64 %arraysize247, 1, !dbg !3564
  %196 = icmp eq i64 %arraysize249, 1, !dbg !3569
  %197 = icmp ne i64 %arraysize247, %arraylen3, !dbg !3572
  %198 = icmp ne i64 %arraylen3, 1, !dbg !3574
  %199 = and i1 %198, %197, !dbg !3575
  br i1 %199, label %L860, label %L902, !dbg !3575

L860:                                             ; preds = %L811
  call fastcc void @julia_DimensionMismatch_2470() #78, !dbg !3575
  %box348 = call noalias nonnull dereferenceable(8) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 137517631542736 to {}*) to {} addrspace(10)*)) #81, !dbg !3575
  %200 = bitcast {} addrspace(10)* %box348 to [1 x {} addrspace(10)*] addrspace(10)*, !dbg !3575
  %201 = getelementptr [1 x {} addrspace(10)*], [1 x {} addrspace(10)*] addrspace(10)* %200, i64 0, i64 0, !dbg !3575
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 137517709292320 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %201, align 8, !dbg !3575, !tbaa !188, !alias.scope !83, !noalias !3159
  %202 = addrspacecast {} addrspace(10)* %box348 to {} addrspace(12)*, !dbg !3575
  call void @ijl_throw({} addrspace(12)* %202) #82, !dbg !3575
  unreachable, !dbg !3575

L902:                                             ; preds = %L811
  %203 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %7) #83, !dbg !3578
  %204 = bitcast {}* %203 to i8**, !dbg !3578
  %arrayptr286 = load i8*, i8** %204, align 8, !dbg !3578, !tbaa !68, !alias.scope !336, !noalias !337, !nonnull !63
  %205 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %9) #83, !dbg !3578
  %206 = bitcast {}* %205 to i8**, !dbg !3578
  %arrayptr288 = load i8*, i8** %206, align 8, !dbg !3578, !tbaa !271, !alias.scope !254, !noalias !255, !nonnull !63
  %.not560 = icmp eq i8* %arrayptr286, %arrayptr288, !dbg !3590
  %207 = bitcast {} addrspace(10)* %1 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !3582
  br i1 %.not560, label %L924, label %L929, !dbg !3582

L924:                                             ; preds = %L902
  %208 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %1) #78, !dbg !3593
  %.phi.trans.insert517 = addrspacecast {} addrspace(10)* %208 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*
  %arraylen_ptr290.phi.trans.insert = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %.phi.trans.insert517, i64 0, i32 1
  %arraylen291.pre = load i64, i64 addrspace(11)* %arraylen_ptr290.phi.trans.insert, align 8, !dbg !3595, !tbaa !250, !range !253, !alias.scope !254, !noalias !255
  %209 = bitcast {} addrspace(10)* %208 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !3252
  br label %L929, !dbg !3252

L929:                                             ; preds = %L924, %L902
  %nodecayed..pre-phi529 = phi { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* [ %209, %L924 ], [ %207, %L902 ], !dbg !3595
  %arraylen291 = phi i64 [ %arraylen291.pre, %L924 ], [ %arraylen3, %L902 ], !dbg !3595
  %210 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %nodecayed..pre-phi529 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !3599
  %211 = icmp eq i64 %arraylen291, 1, !dbg !3599
  %.not561 = icmp eq i64 %arraysize249, 0, !dbg !3603
  br i1 %.not561, label %L1042, label %L954.preheader, !dbg !3607

L954.preheader:                                   ; preds = %L929
  %.not562 = icmp eq i64 %arraysize247, 0
  %212 = addrspacecast {} addrspace(10)* %0 to float addrspace(13)* addrspace(11)*
  %213 = bitcast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %210 to float addrspace(13)* addrspace(11)*
  br label %L954, !dbg !3609

L954:                                             ; preds = %L1005, %L954.preheader
  %iv19 = phi i64 [ %iv.next20, %L1005 ], [ 0, %L954.preheader ]
  %iv.next20 = add nuw nsw i64 %iv19, 1, !dbg !3609
  br i1 %.not562, label %L1005, label %L963.lr.ph, !dbg !3609

L963.lr.ph:                                       ; preds = %L954
  %value_phi300.op = add nsw i64 %iv.next20, -1
  %214 = select i1 %196, i64 0, i64 %value_phi300.op
  %arraysize309.pre = load i64, i64 addrspace(11)* %193, align 8, !dbg !3610, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %arrayptr312564.pre = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %212, align 16, !dbg !3610, !tbaa !68, !alias.scope !3204, !noalias !337
  %215 = bitcast {} addrspace(10)* %0 to float addrspace(13)* addrspace(10)*, !dbg !3618
  %216 = bitcast float addrspace(13)* addrspace(10)* %215 to {} addrspace(10)*, !dbg !3618
  br label %L963, !dbg !3618

L963:                                             ; preds = %L963, %L963.lr.ph
  %iv21 = phi i64 [ %iv.next22, %L963 ], [ 0, %L963.lr.ph ], !dbg !3610
  %nodecayed.arrayptr312564 = phi {} addrspace(10)* [ %216, %L963.lr.ph ], [ %232, %L963 ], !dbg !3610
  %arraysize309 = phi i64 [ %arraysize309.pre, %L963.lr.ph ], [ %arraysize319, %L963 ], !dbg !3610
  %iv.next22 = add nuw nsw i64 %iv21, 1, !dbg !3619
  %217 = bitcast {} addrspace(10)* %nodecayed.arrayptr312564 to float addrspace(13)* addrspace(10)*, !dbg !3619
  %218 = addrspacecast float addrspace(13)* addrspace(10)* %217 to float addrspace(13)* addrspace(11)*, !dbg !3619
  %219 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %218, align 8, !dbg !3619
  %220 = select i1 %195, i64 0, i64 %iv21, !dbg !3610
  %221 = mul i64 %arraysize309, %214, !dbg !3610
  %222 = add i64 %220, %221, !dbg !3610
  %223 = getelementptr inbounds float, float addrspace(13)* %219, i64 %222, !dbg !3610
  %arrayref313 = load float, float addrspace(13)* %223, align 4, !dbg !3610, !tbaa !494, !alias.scope !83, !noalias !86
  %224 = select i1 %211, i64 0, i64 %iv21, !dbg !3622
  %arrayptr316565 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %213, align 8, !dbg !3622, !tbaa !271, !alias.scope !3194, !noalias !255, !nonnull !63
  %225 = getelementptr inbounds float, float addrspace(13)* %arrayptr316565, i64 %224, !dbg !3622
  %arrayref317 = load float, float addrspace(13)* %225, align 4, !dbg !3622, !tbaa !494, !alias.scope !83, !noalias !86
  %226 = fadd float %arrayref313, %arrayref317, !dbg !3626
  %227 = call fastcc float @julia_gelu_2643(float %226) #78, !dbg !3628
  %arraysize319 = load i64, i64 addrspace(11)* %193, align 8, !dbg !3633, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %228 = mul i64 %arraysize319, %value_phi300.op, !dbg !3633
  %229 = add i64 %228, %iv21, !dbg !3633
  %arrayptr322566 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %212, align 16, !dbg !3633, !tbaa !68, !alias.scope !3204, !noalias !337, !nonnull !63
  %230 = getelementptr inbounds float, float addrspace(13)* %arrayptr322566, i64 %229, !dbg !3633
  store float %227, float addrspace(13)* %230, align 4, !dbg !3633, !tbaa !494, !alias.scope !83, !noalias !3159
  %exitcond.not = icmp eq i64 %iv.next22, %arraysize247, !dbg !3635
  %231 = bitcast {} addrspace(10)* %0 to float addrspace(13)* addrspace(10)*, !dbg !3618
  %232 = bitcast float addrspace(13)* addrspace(10)* %231 to {} addrspace(10)*, !dbg !3618
  br i1 %exitcond.not, label %L1005.loopexit, label %L963, !dbg !3618, !llvm.loop !3636

L1005.loopexit:                                   ; preds = %L963
  br label %L1005, !dbg !3637

L1005:                                            ; preds = %L1005.loopexit, %L954
  %233 = add nuw nsw i64 %iv.next20, 1, !dbg !3637
  %exitcond596.not = icmp eq i64 %iv.next20, %arraysize249, !dbg !3641
  br i1 %exitcond596.not, label %L1042.loopexit, label %L954, !dbg !3640

L1042.loopexit:                                   ; preds = %L1005
  br label %L1042

L1042.loopexit1:                                  ; preds = %L194
  br label %L1042

L1042.loopexit2:                                  ; preds = %L804
  br label %L1042

L1042:                                            ; preds = %L1042.loopexit2, %L1042.loopexit1, %L1042.loopexit, %L929, %L678, %L221, %L62
  call void @llvm.lifetime.end.p0i8(i64 noundef 88, i8* noundef nonnull %.sub) #78
  ret void, !dbg !3642

guard_exit374:                                    ; preds = %L62
  %arrayptr_ptr.phi.trans.insert = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %8, i64 0, i32 0
  %arrayptr.pre = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %arrayptr_ptr.phi.trans.insert, align 16, !dbg !3180, !tbaa !68, !alias.scope !3204, !noalias !337
  %234 = addrspacecast {} addrspace(10)* %1 to float addrspace(13)* addrspace(11)*
  %235 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !3643
  %236 = bitcast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %235 to i8 addrspace(13)* addrspace(10)*, !dbg !3643
  %237 = bitcast i8 addrspace(13)* addrspace(10)* %236 to {} addrspace(10)*, !dbg !3643
  br label %L86, !dbg !3643

guard_exit379:                                    ; preds = %L194, %L86
  %value_phi78573 = phi i64 [ %49, %L194 ], [ %value_phi41, %L86 ]
  %value_phi77572 = phi i64 [ 1, %L194 ], [ %44, %L86 ]
  %arraysize54.pre = load i64, i64 addrspace(11)* %15, align 16, !dbg !3183, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %arraylen64.pre = load i64, i64 addrspace(11)* %arraylen_ptr2, align 8, !dbg !3171, !tbaa !250, !range !253, !alias.scope !254, !noalias !255
  %238 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !3643
  %239 = bitcast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %238 to i8 addrspace(13)* addrspace(10)*, !dbg !3643
  %240 = bitcast i8 addrspace(13)* addrspace(10)* %239 to {} addrspace(10)*, !dbg !3643
  br label %L86, !dbg !3643
}

; Function Attrs: mustprogress willreturn
define internal fastcc void @diffejulia___apply_bias_activation___2450({} addrspace(10)* align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="137517761287184" "enzymejl_parmtype_ref"="2" %0, {} addrspace(10)* align 16 "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="137517761287184" "enzymejl_parmtype_ref"="2" %"'", {} addrspace(10)* align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, 
[-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="137517789463248" "enzymejl_parmtype_ref"="2" %1, {} addrspace(10)* align 16 "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="137517789463248" "enzymejl_parmtype_ref"="2" %"'1", { i8*, i8*, {} addrspace(10)*, {} addrspace(10)*, i64, i1, i64, i64, i64*, i64*, float*, i64*, i64, i32*, i64, i64, i1, i1, i64*, i1*, float*, i64*, i1*, i1**, i1**, i64*, i1*, float*, i64*, i64, i64, i1, i1, i64*, float*, i64*, i64*, i64* } %tapeArg) unnamed_addr #74 !dbg !4393 {
top:
  %_replacementA18 = phi i8* 
  %_replacementA17 = phi { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* 
  %.sub_replacementA = phi i8* 
  %2 = call {}*** @julia.get_pgcstack() #78
  %current_task1530_replacementA = phi {}*** 
  %current_task1_replacementA = phi {}** 
  %ptls_field531_replacementA = phi {}*** 
  %_replacementA16 = phi i64*** 
  %ptls_load532533_replacementA = phi i64** 
  %_replacementA15 = phi i64** 
  %safepoint_replacementA = phi i64* 
  %_replacementA14 = phi {} addrspace(11)* , !dbg !4394
  %"'ipc29" = addrspacecast {} addrspace(10)* %"'" to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !4394
  %_replacementA13 = phi { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* , !dbg !4394
  %arraylen_ptr_replacementA = phi i64 addrspace(11)* , !dbg !4394
  %arraylen_replacementA = phi i64 , !dbg !4394
  %_replacementA12 = phi {} addrspace(11)* , !dbg !4407
  %_replacementA11 = phi { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* , !dbg !4407
  %arraylen_ptr2_replacementA = phi i64 addrspace(11)* , !dbg !4407
  %arraylen3 = load i64, i64 addrspace(11)* %arraylen_ptr2_replacementA, align 8, !dbg !4407, !tbaa !250, !range !253, !alias.scope !4410, !noalias !4413
  %_replacementA = phi i64 , !dbg !4415
  %3 = icmp ult i64 %_replacementA, 100001, !dbg !4418
  br i1 %3, label %L811, label %L7, !dbg !4406

L7:                                               ; preds = %top
  %_replacementA21 = phi {} addrspace(10)* addrspace(11)* , !dbg !4420
  %arraysize_ptr_replacementA = phi {} addrspace(10)* addrspace(11)* , !dbg !4420
  %_replacementA20 = phi i64 addrspace(11)* , !dbg !4420
  %arraysize = load i64, i64 addrspace(11)* %_replacementA20, align 8, !dbg !4420, !tbaa !68, !range !253, !alias.scope !4425, !noalias !4428
  store i64 %arraysize, i64* %arraysize_cache, align 8, !dbg !4420, !tbaa !68, !invariant.group !4430
  %arraysize_ptr4_replacementA = phi {} addrspace(10)* addrspace(11)* , !dbg !4420
  %_replacementA19 = phi i64 addrspace(11)* , !dbg !4420
  %arraysize5 = load i64, i64 addrspace(11)* %_replacementA19, align 16, !dbg !4420, !tbaa !68, !range !253, !alias.scope !4425, !noalias !4428
  store i64 %arraysize5, i64* %arraysize5_cache, align 8, !dbg !4431, !tbaa !68, !invariant.group !4437
  %4 = icmp eq i64 %arraylen3, %arraysize, !dbg !4431
  %5 = icmp eq i64 %arraysize, 1, !dbg !4433
  %value_phi = or i1 %4, %5, !dbg !4433
  br i1 %value_phi, label %L40, label %L28, !dbg !4434

L28:                                              ; preds = %L7
  %.not559_replacementA = phi i1 , !dbg !4433
  br i1 %.not559_replacementA, label %L40, label %L36, !dbg !4434

L36:                                              ; preds = %L28
  %_replacementA27 = phi {} addrspace(10)* , !dbg !4434
  %_replacementA26 = phi {} addrspace(10)* , !dbg !4434
  %_replacementA25 = phi {} addrspace(10)* , !dbg !4434
  %box_replacementA = phi {} addrspace(10)* , !dbg !4434
  %_replacementA24 = phi [1 x {} addrspace(10)*] addrspace(10)* , !dbg !4434
  %_replacementA23 = phi {} addrspace(10)* addrspace(10)* , !dbg !4434
  %_replacementA22 = phi {} addrspace(12)* , !dbg !4434
  unreachable

L40:                                              ; preds = %L28, %L7
  %.sroa.0449.0 = phi i64 [ %arraylen3, %L7 ], [ %arraysize, %L28 ]
  %6 = call i64 @julia_nthreads_2651() #78, !dbg !4438
  %.not = icmp eq i64 %6, 1, !dbg !4440
  br i1 %.not, label %L62, label %L221, !dbg !4441

L62:                                              ; preds = %L40
  %7 = icmp ne i64 %.sroa.0449.0, 0, !dbg !4442
  %8 = icmp ne i64 %arraysize5, 0, !dbg !4442
  %.demorgan = and i1 %8, %7, !dbg !4446
  br i1 %.demorgan, label %guard_exit374, label %L1042, !dbg !4446

L86:                                              ; preds = %guard_exit379, %guard_exit374
  %iv17 = phi i64 [ %iv.next18, %guard_exit379 ], [ 0, %guard_exit374 ], !dbg !4447
  %arraylen64 = phi i64 [ %arraylen3, %guard_exit374 ], [ %arraylen64.pre, %guard_exit379 ], !dbg !4447
  %9 = phi {} addrspace(10)* [ %"'ipc49", %guard_exit374 ], [ %"'ipc52", %guard_exit379 ], !dbg !4456
  %nodecayed.arrayptr_replacementA = phi {} addrspace(10)* , !dbg !4456
  %arraysize54 = phi i64 [ %arraysize5, %guard_exit374 ], [ %arraysize54.pre, %guard_exit379 ], !dbg !4459
  %arraysize62 = phi i64 [ %arraysize, %guard_exit374 ], [ %arraysize72, %guard_exit379 ], !dbg !4456
  %value_phi40 = phi i64 [ 1, %guard_exit374 ], [ %value_phi77572, %guard_exit379 ]
  %value_phi41 = phi i64 [ 1, %guard_exit374 ], [ %value_phi78573, %guard_exit379 ]
  %iv.next18 = add nuw nsw i64 %iv17, 1, !dbg !4462
  %10 = load i64*, i64** %arraysize54.pre_cache, align 8, !dbg !4462
  %11 = bitcast i64* %10 to i8*, !dbg !4462
  %arraysize54.pre_realloccache = call i8* @__enzyme_exponentialallocationzero(i8* %11, i64 %iv.next18, i64 8), !dbg !4462
  %12 = bitcast i8* %arraysize54.pre_realloccache to i64*, !dbg !4462
  store i64* %12, i64** %arraysize54.pre_cache, align 8, !dbg !4462
  %13 = load i64*, i64** %arraylen64.pre_cache, align 8, !dbg !4462
  %14 = bitcast i64* %13 to i8*, !dbg !4462
  %arraylen64.pre_realloccache = call i8* @__enzyme_exponentialallocationzero(i8* %14, i64 %iv.next18, i64 8), !dbg !4462
  %15 = bitcast i8* %arraylen64.pre_realloccache to i64*, !dbg !4462
  store i64* %15, i64** %arraylen64.pre_cache, align 8, !dbg !4462
  %16 = load float*, float** %_cache, align 8, !dbg !4462
  %17 = bitcast float* %16 to i8*, !dbg !4462
  %_realloccache = call i8* @__enzyme_exponentialallocationzero(i8* %17, i64 %iv.next18, i64 4), !dbg !4462
  %18 = bitcast i8* %_realloccache to float*, !dbg !4462
  store float* %18, float** %_cache, align 4, !dbg !4462
  %19 = load i64*, i64** %value_phi40_cache, align 8, !dbg !4462
  %20 = bitcast i64* %19 to i8*, !dbg !4462
  %value_phi40_realloccache = call i8* @__enzyme_exponentialallocationzero(i8* %20, i64 %iv.next18, i64 8), !dbg !4462
  %21 = bitcast i8* %value_phi40_realloccache to i64*, !dbg !4462
  store i64* %21, i64** %value_phi40_cache, align 8, !dbg !4462
  %22 = load i64*, i64** %value_phi40_cache, align 8, !dbg !4462, !dereferenceable !880, !invariant.group !4464
  %23 = getelementptr inbounds i64, i64* %22, i64 %iv17, !dbg !4462
  store i64 %value_phi40, i64* %23, align 8, !dbg !4462, !invariant.group !4465
  %24 = load i64*, i64** %value_phi41_cache, align 8, !dbg !4462
  %25 = bitcast i64* %24 to i8*, !dbg !4462
  %value_phi41_realloccache = call i8* @__enzyme_exponentialallocationzero(i8* %25, i64 %iv.next18, i64 8), !dbg !4462
  %26 = bitcast i8* %value_phi41_realloccache to i64*, !dbg !4462
  store i64* %26, i64** %value_phi41_cache, align 8, !dbg !4462
  %27 = load i64*, i64** %value_phi41_cache, align 8, !dbg !4462, !dereferenceable !880, !invariant.group !4466
  %28 = getelementptr inbounds i64, i64* %27, i64 %iv17, !dbg !4462
  store i64 %value_phi41, i64* %28, align 8, !dbg !4462, !invariant.group !4467
  %29 = load i64*, i64** %arraysize72_cache, align 8, !dbg !4462
  %30 = bitcast i64* %29 to i8*, !dbg !4462
  %arraysize72_realloccache = call i8* @__enzyme_exponentialallocationzero(i8* %30, i64 %iv.next18, i64 8), !dbg !4462
  %31 = bitcast i8* %arraysize72_realloccache to i64*, !dbg !4462
  store i64* %31, i64** %arraysize72_cache, align 8, !dbg !4462
  %"'ipc53" = bitcast {} addrspace(10)* %9 to i8 addrspace(13)* addrspace(10)*, !dbg !4462
  %_replacementA67 = phi i8 addrspace(13)* addrspace(10)* , !dbg !4462
  %"'ipc54" = addrspacecast i8 addrspace(13)* addrspace(10)* %"'ipc53" to i8 addrspace(13)* addrspace(11)*, !dbg !4462
  %_replacementA66 = phi i8 addrspace(13)* addrspace(11)* , !dbg !4462
  %"'ipl" = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %"'ipc54", align 8, !dbg !4462, !alias.scope !4468, !noalias !4471
  %_replacementA65 = phi i8 addrspace(13)* , !dbg !4462
  %.not534 = icmp eq i64 %arraysize62, 1, !dbg !4462
  %.not535 = icmp eq i64 %arraysize54, 1, !dbg !4473
  %value_phi40.op = add i64 %value_phi40, -1, !dbg !4456
  %32 = select i1 %.not534, i64 0, i64 %value_phi40.op, !dbg !4456
  %value_phi41.op = add i64 %value_phi41, -1, !dbg !4456
  %33 = select i1 %.not535, i64 0, i64 %value_phi41.op, !dbg !4456
  %34 = mul i64 %33, %arraysize62, !dbg !4456
  %35 = add i64 %34, %32, !dbg !4456
  %"'ipc45" = bitcast i8 addrspace(13)* %"'ipl" to float addrspace(13)*, !dbg !4456
  %_replacementA64 = phi float addrspace(13)* , !dbg !4456
  %"'ipg46" = getelementptr inbounds float, float addrspace(13)* %"'ipc45", i64 %35, !dbg !4456
  %_replacementA63 = phi float addrspace(13)* , !dbg !4456
  %arrayref_replacementA = phi float , !dbg !4456
  %.not536 = icmp eq i64 %arraylen64, 1, !dbg !4475
  %36 = select i1 %.not536, i64 0, i64 %value_phi40.op, !dbg !4477
  %"arrayptr69538'ipl" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc40", align 16, !dbg !4477, !tbaa !271, !alias.scope !4479, !noalias !4480, !nonnull !63
  %arrayptr69538_replacementA = phi float addrspace(13)* , !dbg !4477
  %"'ipg39" = getelementptr inbounds float, float addrspace(13)* %"arrayptr69538'ipl", i64 %36, !dbg !4477
  %_replacementA44 = phi float addrspace(13)* , !dbg !4477
  %arrayref70_replacementA = phi float , !dbg !4477
  %37 = fadd float %arrayref_replacementA, %arrayref70_replacementA, !dbg !4481
  %_replacementA37 = phi float , !dbg !4483
  %arraysize72 = load i64, i64 addrspace(11)* %_replacementA20, align 8, !dbg !4488, !tbaa !68, !range !253, !alias.scope !4425, !noalias !4428
  %38 = mul i64 %arraysize72, %value_phi41.op, !dbg !4488
  %39 = add i64 %38, %value_phi40.op, !dbg !4488
  %"arrayptr75'ipl" = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %"arrayptr_ptr.phi.trans.insert'ipg", align 16, !dbg !4488, !tbaa !68, !alias.scope !4490, !noalias !4491, !nonnull !63
  %arrayptr75_replacementA = phi i8 addrspace(13)* , !dbg !4488
  %"'ipc" = bitcast i8 addrspace(13)* %"arrayptr75'ipl" to float addrspace(13)*, !dbg !4488
  %_replacementA35 = phi float addrspace(13)* , !dbg !4488
  %"'ipg" = getelementptr inbounds float, float addrspace(13)* %"'ipc", i64 %39, !dbg !4488
  %_replacementA34 = phi float addrspace(13)* , !dbg !4488
  %40 = load i64*, i64** %arraysize72_cache, align 8, !dbg !4492, !dereferenceable !880, !invariant.group !4495
  %41 = getelementptr inbounds i64, i64* %40, i64 %iv17, !dbg !4492
  store i64 %arraysize72, i64* %41, align 8, !dbg !4492, !tbaa !68, !invariant.group !4496
  %42 = load float*, float** %_cache, align 8, !dbg !4492, !dereferenceable !880, !invariant.group !4497
  %43 = getelementptr inbounds float, float* %42, i64 %iv17, !dbg !4492
  store float %37, float* %43, align 4, !dbg !4492, !invariant.group !4498
  %44 = add i64 %value_phi40, 1, !dbg !4492
  %45 = icmp ugt i64 %value_phi40, 9223372036854775806, !dbg !4499
  %46 = icmp sgt i64 %44, %.sroa.0449.0, !dbg !4499
  %47 = or i1 %45, %46, !dbg !4502
  %48 = icmp eq i64 %value_phi40, %.sroa.0449.0
  %or.cond = or i1 %48, %47, !dbg !4502
  br i1 %or.cond, label %L194, label %guard_exit379, !dbg !4502

L194:                                             ; preds = %L86
  %49 = add i64 %value_phi41, 1, !dbg !4503
  %50 = icmp ult i64 %value_phi41, 9223372036854775807, !dbg !4506
  %51 = icmp sle i64 %49, %arraysize5, !dbg !4506
  %52 = and i1 %50, %51, !dbg !4510
  %53 = icmp ne i64 %value_phi41, %arraysize5, !dbg !4509
  %value_phi95 = and i1 %53, %52, !dbg !4509
  br i1 %value_phi95, label %guard_exit379, label %L1042.loopexit1, !dbg !4446

L221:                                             ; preds = %L40
  %.not540 = icmp eq i64 %arraysize5, 0, !dbg !4511
  br i1 %.not540, label %L1042, label %L227, !dbg !4513

L227:                                             ; preds = %L221
  %54 = call i64 @llvm.smin.i64(i64 %6, i64 %arraysize5) #78, !dbg !4515
  %.not541 = icmp eq i64 %54, 0, !dbg !4517
  br i1 %.not541, label %L691.lr.ph, label %L235, !dbg !4518

L235:                                             ; preds = %L227
  %55 = trunc i64 %54 to i32, !dbg !4519
  %56 = add i32 %55, -1, !dbg !4519
  %_replacementA68 = phi {}* , !dbg !4523
  %57 = icmp sgt i32 %56, 0, !dbg !4525
  br i1 %57, label %L245, label %L691.lr.ph, !dbg !4526

L245:                                             ; preds = %L235
  %p.i_replacementA = phi i64* , !dbg !4528
  %v.i_replacementA = phi i64 , !dbg !4528
  %58 = call i64 @llvm.ctpop.i64(i64 %v.i_replacementA) #78, !dbg !4531, !range !2055
  %59 = trunc i64 %58 to i32, !dbg !4533
  %60 = sub nsw i32 %56, %59, !dbg !4534
  %61 = icmp slt i32 %60, 0, !dbg !4536
  br i1 %61, label %L258, label %L293, !dbg !4539

L258:                                             ; preds = %L245
  %_replacementA70 = phi i64 , !dbg !4540
  %_replacementA69 = phi i32 , !dbg !4542
  br label %L261, !dbg !4543

L261:                                             ; preds = %L261, %L258
  %iv = phi i64 [ %iv.next, %L261 ], [ 0, %L258 ]
  %value_phi239_replacementA = phi i32 
  %value_phi240_replacementA = phi i32 
  %value_phi241_replacementA = phi i64 
  %iv.next = add nuw nsw i64 %iv, 1, !dbg !4548
  %_replacementA79 = phi i32 , !dbg !4548
  %_replacementA78 = phi i32 , !dbg !4550
  %_replacementA77 = phi i64 , !dbg !4552
  %_replacementA76 = phi i1 , !dbg !4552
  %notmask_replacementA = phi i64 , !dbg !4550
  %.op_replacementA = phi i64 , !dbg !4550
  %_replacementA75 = phi i64 , !dbg !4550
  %_replacementA74 = phi i64 , !dbg !4553
  %_replacementA73 = phi i64 , !dbg !4555
  %_replacementA72 = phi i64 , !dbg !4556
  %_replacementA71 = phi i32 , !dbg !4558
  %62 = add i32 %value_phi240_replacementA, %_replacementA71, !dbg !4559
  %.not558 = icmp eq i32 %62, 0, !dbg !4560
  br i1 %.not558, label %L282, label %L261, !dbg !4561

L282:                                             ; preds = %L261
  %_replacementA81 = phi i64 , !dbg !4562
  %_replacementA80 = phi i64 , !dbg !4564
  br label %L293, !dbg !4565

L293:                                             ; preds = %L282, %L245
  %value_phi155 = phi i32 [ %56, %L282 ], [ %59, %L245 ]
  %value_phi156 = phi i64 [ %_replacementA74, %L282 ], [ %v.i_replacementA, %L245 ]
  %63 = icmp sgt i32 %value_phi155, 0, !dbg !4568
  br i1 %63, label %L361.lr.ph, label %L691.lr.ph, !dbg !4569

L361.lr.ph:                                       ; preds = %L293
  %64 = zext i32 %value_phi155 to i64, !dbg !4570
  %65 = add nuw nsw i64 %64, 1, !dbg !4587
  %66 = udiv i64 %arraysize5, %65, !dbg !4589
  %67 = mul i64 %66, %65, !dbg !4590
  %68 = sub i64 %arraysize5, %67, !dbg !4592
  %69 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %_replacementA14) #79, !dbg !4593
  %"'ip_phi" = phi {}* , !dbg !4593
  %70 = bitcast {}* %69 to i8**, !dbg !4593
  %arrayptr159 = load i8*, i8** %70, align 8, !dbg !4593, !tbaa !68, !alias.scope !336, !noalias !337, !nonnull !63
  %"arrayptr159'il_phi" = phi i8* , !dbg !4593
  %71 = ptrtoint i8* %arrayptr159 to i64, !dbg !4593
  %arraysize161 = load i64, i64 addrspace(11)* %_replacementA20, align 8, !dbg !4601, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %arraysize163 = load i64, i64 addrspace(11)* %_replacementA19, align 16, !dbg !4601, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %72 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %_replacementA12) #79, !dbg !4607
  %"'ip_phi2" = phi {}* , !dbg !4607
  %73 = bitcast {}* %72 to i8**, !dbg !4607
  %arrayptr179 = load i8*, i8** %73, align 8, !dbg !4607, !tbaa !271, !alias.scope !254, !noalias !255, !nonnull !63
  %"arrayptr179'il_phi" = phi i8* , !dbg !4607
  %74 = ptrtoint i8* %arrayptr179 to i64, !dbg !4607
  %arraylen181 = load i64, i64 addrspace(11)* %arraylen_ptr2_replacementA, align 8, !dbg !4617, !tbaa !250, !range !253, !alias.scope !254, !noalias !255
  %75 = insertvalue [2 x {} addrspace(10)*] zeroinitializer, {} addrspace(10)* %0, 0, !dbg !4623
  %76 = insertvalue [2 x {} addrspace(10)*] %75, {} addrspace(10)* %1, 1, !dbg !4623
  %newstruct187.sroa.0.0..sroa_idx = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %_replacementA17, i64 0, i32 0, i64 0, i64 0, i64 0, !dbg !4624
  store i64 %.sroa.0449.0, i64* %newstruct187.sroa.0.0..sroa_idx, align 16, !dbg !4624, !tbaa !682, !alias.scope !2185, !noalias !4625
  %newstruct187.sroa.2.sroa.0.0.newstruct187.sroa.2.0..sroa_cast.sroa_idx = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %_replacementA17, i64 0, i32 1, i32 0, !dbg !4624
  store i64 %71, i64* %newstruct187.sroa.2.sroa.0.0.newstruct187.sroa.2.0..sroa_cast.sroa_idx, align 8, !dbg !4624, !tbaa !682, !alias.scope !2185, !noalias !4625
  %newstruct187.sroa.2.sroa.2.0.newstruct187.sroa.2.0..sroa_cast.sroa_idx415 = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %_replacementA17, i64 0, i32 1, i32 1, i64 0, !dbg !4624
  store i64 %arraysize161, i64* %newstruct187.sroa.2.sroa.2.0.newstruct187.sroa.2.0..sroa_cast.sroa_idx415, align 16, !dbg !4624, !tbaa !682, !alias.scope !2185, !noalias !4625
  %newstruct187.sroa.2.sroa.3.0.newstruct187.sroa.2.0..sroa_cast.sroa_idx416 = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %_replacementA17, i64 0, i32 1, i32 1, i64 1, !dbg !4624
  store i64 %arraysize163, i64* %newstruct187.sroa.2.sroa.3.0.newstruct187.sroa.2.0..sroa_cast.sroa_idx416, align 8, !dbg !4624, !tbaa !682, !alias.scope !2185, !noalias !4625
  %newstruct187.sroa.3.sroa.0.sroa.0.0.newstruct187.sroa.3.sroa.0.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %_replacementA17, i64 0, i32 2, i32 0, i32 0, i32 0, !dbg !4624
  store i64 %71, i64* %newstruct187.sroa.3.sroa.0.sroa.0.0.newstruct187.sroa.3.sroa.0.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx, align 16, !dbg !4624, !tbaa !682, !alias.scope !2185, !noalias !4625
  %newstruct187.sroa.3.sroa.0.sroa.2.0.newstruct187.sroa.3.sroa.0.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx411 = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %_replacementA17, i64 0, i32 2, i32 0, i32 0, i32 1, i64 0, !dbg !4624
  store i64 %arraysize161, i64* %newstruct187.sroa.3.sroa.0.sroa.2.0.newstruct187.sroa.3.sroa.0.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx411, align 8, !dbg !4624, !tbaa !682, !alias.scope !2185, !noalias !4625
  %newstruct187.sroa.3.sroa.0.sroa.3.0.newstruct187.sroa.3.sroa.0.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx412 = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %_replacementA17, i64 0, i32 2, i32 0, i32 0, i32 1, i64 1, !dbg !4624
  store i64 %arraysize163, i64* %newstruct187.sroa.3.sroa.0.sroa.3.0.newstruct187.sroa.3.sroa.0.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx412, align 16, !dbg !4624, !tbaa !682, !alias.scope !2185, !noalias !4625
  %newstruct187.sroa.3.sroa.2.0.newstruct187.sroa.3.0..sroa_cast.sroa_idx405 = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %_replacementA17, i64 0, i32 2, i32 0, i32 1, i32 0, !dbg !4624
  store i64 %74, i64* %newstruct187.sroa.3.sroa.2.0.newstruct187.sroa.3.0..sroa_cast.sroa_idx405, align 8, !dbg !4624, !tbaa !682, !alias.scope !2185, !noalias !4625
  %newstruct187.sroa.3.sroa.3.0.newstruct187.sroa.3.0..sroa_cast.sroa_idx406 = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %_replacementA17, i64 0, i32 2, i32 0, i32 1, i32 1, i64 0, !dbg !4624
  store i64 %arraylen181, i64* %newstruct187.sroa.3.sroa.3.0.newstruct187.sroa.3.0..sroa_cast.sroa_idx406, align 16, !dbg !4624, !tbaa !682, !alias.scope !2185, !noalias !4625
  %newstruct187.sroa.3.sroa.4.sroa.0.0.newstruct187.sroa.3.sroa.4.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %_replacementA17, i64 0, i32 2, i32 1, i64 0, i64 0, !dbg !4624
  store i64 %.sroa.0449.0, i64* %newstruct187.sroa.3.sroa.4.sroa.0.0.newstruct187.sroa.3.sroa.4.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx, align 8, !dbg !4624, !tbaa !682, !alias.scope !2185, !noalias !4625
  %newstruct187.sroa.3.sroa.4.sroa.2.0.newstruct187.sroa.3.sroa.4.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx446 = getelementptr inbounds { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %_replacementA17, i64 0, i32 2, i32 1, i64 1, i64 0, !dbg !4624
  store i64 %arraysize5, i64* %newstruct187.sroa.3.sroa.4.sroa.2.0.newstruct187.sroa.3.sroa.4.0.newstruct187.sroa.3.0..sroa_cast.sroa_cast.sroa_idx446, align 16, !dbg !4624, !tbaa !682, !alias.scope !2185, !noalias !4625
  %77 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %0, [2 x {} addrspace(10)*] %76) #78, !dbg !4600
  %"'ip" = call token (...) @llvm.julia.gc_preserve_begin(), !dbg !4600
  %78 = icmp sgt i64 %68, -1
  %79 = add nsw i64 %64, -1, !dbg !4628
  br label %L361, !dbg !4628

L427.preheader:                                   ; preds = %L415
  %.lcssa = phi i1 [ %94, %L415 ], !dbg !4629
  %value_phi193592.lcssa = phi i64 [ %value_phi193592, %L415 ]
  %value_phi201587 = add i64 %97, 1, !dbg !4633
  %.not549588 = icmp sgt i64 %value_phi201587, %arraysize5, !dbg !4634
  br i1 %.not549588, label %L641.preheader, label %L430.lr.ph, !dbg !4635

L430.lr.ph:                                       ; preds = %L427.preheader
  %80 = icmp eq i64 %.sroa.0449.0, 0
  %.not551 = icmp eq i64 %arraysize161, 1
  %.not552 = icmp eq i64 %arraysize163, 1
  %.not553 = icmp eq i64 %arraylen181, 1
  %81 = add nsw i64 %.sroa.0449.0, -1, !dbg !4635
  %umin600 = call i64 @llvm.umin.i64(i64 %81, i64 noundef 9223372036854775806) #78, !dbg !4635
  %82 = add nuw nsw i64 %umin600, 1
  %83 = add i64 %66, %value_phi193592.lcssa, !dbg !4636
  %umin = call i1 @llvm.umin.i1(i1 %.lcssa, i1 %78), !dbg !4635
  %84 = zext i1 %umin to i64, !dbg !4635
  %85 = add i64 %83, %84, !dbg !4636
  %86 = add i64 %arraysize5, -1, !dbg !4635
  %87 = sub i64 %86, %66, !dbg !4635
  %88 = sub i64 %87, %value_phi193592.lcssa, !dbg !4635
  %umin4 = call i1 @llvm.umin.i1(i1 %.lcssa, i1 %78), !dbg !4635
  %89 = zext i1 %umin4 to i64, !dbg !4635
  %90 = sub i64 %88, %89, !dbg !4635
  br label %L430, !dbg !4635

L361:                                             ; preds = %L415, %L361.lr.ph
  %iv3 = phi i64 [ %iv.next4, %L415 ], [ 0, %L361.lr.ph ]
  %value_phi195594 = phi i64 [ %value_phi156, %L361.lr.ph ], [ %103, %L415 ]
  %value_phi193592 = phi i64 [ 0, %L361.lr.ph ], [ %97, %L415 ]
  %value_phi192591 = phi i32 [ 0, %L361.lr.ph ], [ %99, %L415 ]
  %iv.next4 = add nuw nsw i64 %iv3, 1, !dbg !4637
  %91 = icmp ne i64 %value_phi195594, 0, !dbg !4637
  call void @llvm.assume(i1 noundef %91) #78, !dbg !4640
  %92 = call i64 @llvm.cttz.i64(i64 %value_phi195594, i1 noundef true) #78, !dbg !4641, !range !2055
  %93 = trunc i64 %92 to i32, !dbg !4643
  %94 = icmp ugt i64 %68, %iv3, !dbg !4629
  %not.ifelse_cond196 = and i1 %78, %94, !dbg !4644
  %95 = zext i1 %not.ifelse_cond196 to i64, !dbg !4644
  %96 = add i64 %value_phi193592, %66, !dbg !4644
  %97 = add i64 %96, %95, !dbg !4645
  %98 = add nuw nsw i32 %93, 1, !dbg !4646
  %99 = add i32 %98, %value_phi192591, !dbg !4648
  %100 = zext i32 %98 to i64, !dbg !4650
  %101 = lshr i64 %value_phi195594, %100, !dbg !4650
  %102 = icmp eq i32 %93, 63, !dbg !4650
  %103 = select i1 %102, i64 0, i64 %101, !dbg !4650
  %104 = load i64, i64* inttoptr (i64 137517345406912 to i64*), align 64, !dbg !4652, !tbaa !131, !alias.scope !83, !noalias !86
  %"'il_phi3" = phi i64 , !dbg !4658
  %105 = shl i32 %99, 9, !dbg !4658
  %106 = zext i32 %105 to i64, !dbg !4659
  %107 = inttoptr i64 %104 to i8*, !dbg !4663
  %108 = getelementptr i8, i8* %107, i64 %106, !dbg !4663
  %109 = getelementptr i8, i8* %108, i64 8, !dbg !4664
  %coercion = bitcast i8* %109 to i64*, !dbg !4670
  store i64 ptrtoint (void (i64)* @jlcapi_BatchClosure_2456 to i64), i64* %coercion, align 1, !dbg !4670, !tbaa !81, !alias.scope !83, !noalias !4674
  %110 = getelementptr i8, i8* %108, i64 16, !dbg !4675
  %111 = bitcast i8* %110 to { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }**, !dbg !4679
  store { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }* %_replacementA17, { [1 x [1 x [1 x i64]]], { i64, [2 x i64] }, { { { i64, [2 x i64] }, { i64, [1 x i64] } }, [2 x [1 x i64]] } }** %111, align 1, !dbg !4679, !tbaa !81, !alias.scope !83, !noalias !4674
  %112 = getelementptr i8, i8* %108, i64 24, !dbg !4683
  %coercion198 = bitcast i8* %112 to i64*, !dbg !4687
  store i64 %value_phi193592, i64* %coercion198, align 1, !dbg !4687, !tbaa !81, !alias.scope !83, !noalias !4674
  %113 = getelementptr i8, i8* %108, i64 32, !dbg !4691
  %coercion199 = bitcast i8* %113 to i64*, !dbg !4695
  store i64 %97, i64* %coercion199, align 1, !dbg !4695, !tbaa !81, !alias.scope !83, !noalias !4674
  %p.i386 = bitcast i8* %108 to i32*, !dbg !4699
  %v.i387 = atomicrmw xchg i32* %p.i386, i32 0 acq_rel, align 4, !dbg !4699
  %.not548 = icmp eq i32 %v.i387, 1, !dbg !4702
  br i1 %.not548, label %L412, label %L415, !dbg !4703

L412:                                             ; preds = %L361
  call fastcc void @julia_wake_thread__2634(i32 zeroext %99) #78, !dbg !4703
  br label %L415, !dbg !4703

L415:                                             ; preds = %L412, %L361
  %114 = icmp eq i64 %iv.next4, %64, !dbg !4704
  br i1 %114, label %L427.preheader, label %L361, !dbg !4628

L641.preheader.loopexit:                          ; preds = %L622
  br label %L641.preheader, !dbg !4706

L641.preheader:                                   ; preds = %L641.preheader.loopexit, %L427.preheader
  %115 = icmp eq i64 %value_phi156, 0, !dbg !4706
  br i1 %115, label %L678, label %L646.preheader, !dbg !4708

L646.preheader:                                   ; preds = %L641.preheader
  br label %L646, !dbg !4709

L430:                                             ; preds = %L622, %L430.lr.ph
  %iv5 = phi i64 [ %iv.next6, %L622 ], [ 0, %L430.lr.ph ]
  %iv.next6 = add nuw nsw i64 %iv5, 1, !dbg !4636
  %116 = add i64 %85, %iv5, !dbg !4636
  %117 = add i64 %value_phi201587, %iv5, !dbg !4636
  br i1 %80, label %L622, label %L442.preheader, !dbg !4636

L442.preheader:                                   ; preds = %L430
  %118 = select i1 %.not552, i64 0, i64 %116
  %119 = mul i64 %118, %arraysize161
  %120 = mul i64 %116, %arraysize161
  br label %L442, !dbg !4544

L442:                                             ; preds = %L442, %L442.preheader
  %iv7 = phi i64 [ %iv.next8, %L442 ], [ 0, %L442.preheader ]
  %iv.next8 = add nuw nsw i64 %iv7, 1, !dbg !4712
  %121 = select i1 %.not551, i64 1, i64 %iv.next8, !dbg !4712
  %122 = add i64 %121, %119, !dbg !4721
  %123 = shl i64 %122, 2, !dbg !4729
  %124 = add i64 %123, -4, !dbg !4729
  %125 = getelementptr i8, i8* %arrayptr159, i64 %124, !dbg !4732
  %coercion216 = bitcast i8* %125 to float*, !dbg !4733
  %pointerref = load float, float* %coercion216, align 1, !dbg !4733, !tbaa !81, !alias.scope !83, !noalias !86
  %value_phi206.op = shl i64 %iv.next8, 2, !dbg !4737
  %value_phi206.op.op = add i64 %value_phi206.op, -4, !dbg !4737
  %126 = select i1 %.not553, i64 0, i64 %value_phi206.op.op, !dbg !4737
  %127 = getelementptr i8, i8* %arrayptr179, i64 %126, !dbg !4744
  %coercion219 = bitcast i8* %127 to float*, !dbg !4745
  %pointerref220 = load float, float* %coercion219, align 1, !dbg !4745, !tbaa !81, !alias.scope !83, !noalias !86
  %128 = fadd float %pointerref, %pointerref220, !dbg !4749
  call void @llvm.lifetime.end.p0i8(i64 noundef 88, i8* noundef nonnull %.sub_replacementA) #78
  %129 = call fastcc float @julia_gelu_2643(float %128) #78, !dbg !4751
  %130 = add i64 %iv.next8, %120, !dbg !4756
  %131 = shl i64 %130, 2, !dbg !4764
  %132 = add i64 %131, -4, !dbg !4764
  %133 = getelementptr i8, i8* %arrayptr159, i64 %132, !dbg !4767
  %coercion222 = bitcast i8* %133 to float*, !dbg !4768
  store float %129, float* %coercion222, align 1, !dbg !4768, !tbaa !81, !alias.scope !83, !noalias !4674
  %134 = add nuw nsw i64 %iv.next8, 1, !dbg !4772
  %exitcond601.not = icmp eq i64 %iv.next8, %82, !dbg !4775
  br i1 %exitcond601.not, label %L622.loopexit, label %L442, !dbg !4544

L622.loopexit:                                    ; preds = %L442
  br label %L622, !dbg !4633

L622:                                             ; preds = %L622.loopexit, %L430
  %value_phi201 = add i64 %117, 1, !dbg !4633
  %exitcond602 = icmp eq i64 %117, %arraysize5, !dbg !4634
  br i1 %exitcond602, label %L641.preheader.loopexit, label %L430, !dbg !4635

L646:                                             ; preds = %L676, %L646.preheader
  %iv9 = phi i64 [ 0, %L646.preheader ], [ %iv.next10, %L676 ]
  %value_phi236586 = phi i64 [ %139, %L676 ], [ %value_phi156, %L646.preheader ]
  %value_phi235585 = phi i32 [ %141, %L676 ], [ 0, %L646.preheader ]
  %iv.next10 = add nuw nsw i64 %iv9, 1, !dbg !4776
  %135 = call i64 @llvm.cttz.i64(i64 %value_phi236586, i1 noundef true) #78, !dbg !4776, !range !2055
  %136 = trunc i64 %135 to i32, !dbg !4778
  %137 = add nuw nsw i32 %136, 1, !dbg !4779
  %138 = zext i32 %137 to i64, !dbg !4781
  %139 = lshr i64 %value_phi236586, %138, !dbg !4781
  %140 = icmp eq i32 %136, 63, !dbg !4781
  %141 = add i32 %137, %value_phi235585, !dbg !4783
  %142 = load i64, i64* inttoptr (i64 137517345406912 to i64*), align 64, !dbg !4785, !tbaa !131, !alias.scope !83, !noalias !86
  %"'il_phi5" = phi i64 , !dbg !4788
  %143 = shl i32 %141, 9, !dbg !4788
  %144 = zext i32 %143 to i64, !dbg !4789
  %145 = inttoptr i64 %142 to i8*, !dbg !4793
  %146 = getelementptr i8, i8* %145, i64 %144, !dbg !4793
  %p.i388 = bitcast i8* %146 to i32*, !dbg !4794
  %v.i389582 = load atomic i32, i32* %p.i388 acquire, align 16, !dbg !4794
  %"v.i389582'il_phi" = phi i32 , !dbg !4796
  %.not556583 = icmp eq i32 %v.i389582, 0, !dbg !4796
  br i1 %.not556583, label %L666.preheader, label %L676, !dbg !4709

L666.preheader:                                   ; preds = %L646
  br label %L666, !dbg !4797

L666:                                             ; preds = %L673, %L666.preheader
  %iv11 = phi i64 [ 0, %L666.preheader ], [ %iv.next12, %L673 ]
  %iv.next12 = add nuw nsw i64 %iv11, 1
  %147 = trunc i64 %iv11 to i32
  call void @llvm.lifetime.end.p0i8(i64 noundef 88, i8* noundef nonnull %.sub_replacementA) #78
  call void asm sideeffect "pause", "~{memory}"() #80, !dbg !4798
  %148 = add i32 %147, 1, !dbg !4800
  %149 = icmp ult i32 %148, 65537, !dbg !4801
  br i1 %149, label %L673, label %L670, !dbg !4797

L670:                                             ; preds = %L666
  %150 = call fastcc i8 @julia_checktask_2476(i32 zeroext %141) #78, !dbg !4803
  %151 = and i8 %150, 1, !dbg !4803
  %.not557 = icmp eq i8 %151, 0, !dbg !4803
  br i1 %.not557, label %L673, label %L676.loopexit, !dbg !4803

L673:                                             ; preds = %L670, %L666
  %v.i389 = load atomic i32, i32* %p.i388 acquire, align 16, !dbg !4794
  %"v.i389'il_phi" = phi i32 , !dbg !4796
  %.not556 = icmp eq i32 %v.i389, 0, !dbg !4796
  br i1 %.not556, label %L666, label %L676.loopexit, !dbg !4709

L676.loopexit:                                    ; preds = %L673, %L670
  br label %L676, !dbg !4706

L676:                                             ; preds = %L676.loopexit, %L646
  %152 = icmp eq i64 %139, 0, !dbg !4706
  %153 = select i1 %140, i1 true, i1 %152, !dbg !4706
  br i1 %153, label %L678.loopexit, label %L646, !dbg !4708

L678.loopexit:                                    ; preds = %L676
  br label %L678, !dbg !4804

L678:                                             ; preds = %L678.loopexit, %L641.preheader
  %v.i391 = atomicrmw or i64* %p.i_replacementA, i64 %value_phi156 acq_rel, align 8, !dbg !4804
  br label %L1042, !dbg !4807

L691.lr.ph:                                       ; preds = %L293, %L235, %L227
  %154 = icmp eq i64 %.sroa.0449.0, 0
  %arrayptr_ptr129.phi.trans.insert = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %_replacementA13, i64 0, i32 0
  %155 = addrspacecast {} addrspace(10)* %1 to float addrspace(13)* addrspace(11)*
  %156 = add nsw i64 %.sroa.0449.0, -1, !dbg !4808
  %umin597 = call i64 @llvm.umin.i64(i64 %156, i64 noundef 9223372036854775806) #78, !dbg !4808
  %157 = call i64 @llvm.smax.i64(i64 %arraysize5, i64 noundef 1) #78, !dbg !4808
  %158 = add nuw nsw i64 %umin597, 1
  %159 = add nsw i64 %157, -1, !dbg !4808
  br label %L691, !dbg !4808

L691:                                             ; preds = %L804, %L691.lr.ph
  %iv13 = phi i64 [ %iv.next14, %L804 ], [ 0, %L691.lr.ph ]
  %iv.next14 = add nuw nsw i64 %iv13, 1, !dbg !4809
  br i1 %154, label %L804, label %L691.L703_crit_edge, !dbg !4809

L691.L703_crit_edge:                              ; preds = %L691
  %arraysize117.pre = load i64, i64 addrspace(11)* %_replacementA20, align 8, !dbg !4810, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %arrayptr130.pre = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %arrayptr_ptr129.phi.trans.insert, align 16, !dbg !4819, !tbaa !68, !alias.scope !4821, !noalias !337
  %"arrayptr130.pre'il_phi" = phi i8 addrspace(13)* 
  %value_phi106.op = add nsw i64 %iv.next14, -1
  %160 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !4809
  %161 = bitcast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %160 to i8 addrspace(13)* addrspace(10)*, !dbg !4809
  %162 = bitcast i8 addrspace(13)* addrspace(10)* %161 to {} addrspace(10)*, !dbg !4809
  br label %L703, !dbg !4809

L703:                                             ; preds = %L703, %L691.L703_crit_edge
  %iv15 = phi i64 [ %iv.next16, %L703 ], [ 0, %L691.L703_crit_edge ], !dbg !4819
  %nodecayed.arrayptr130 = phi {} addrspace(10)* [ %162, %L691.L703_crit_edge ], [ %183, %L703 ], !dbg !4819
  %arraysize127 = phi i64 [ %arraysize117.pre, %L691.L703_crit_edge ], [ %arraysize141, %L703 ], !dbg !4819
  %iv.next16 = add nuw nsw i64 %iv15, 1, !dbg !4810
  %163 = bitcast {} addrspace(10)* %nodecayed.arrayptr130 to i8 addrspace(13)* addrspace(10)*, !dbg !4810
  %164 = addrspacecast i8 addrspace(13)* addrspace(10)* %163 to i8 addrspace(13)* addrspace(11)*, !dbg !4810
  %165 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %164, align 8, !dbg !4810
  %"'il_phi6" = phi i8 addrspace(13)* , !dbg !4810
  %arraysize119 = load i64, i64 addrspace(11)* %_replacementA19, align 16, !dbg !4810, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %.not543 = icmp eq i64 %arraysize127, 1, !dbg !4822
  %.not544 = icmp eq i64 %arraysize119, 1, !dbg !4824
  %value_phi111.op = add nsw i64 %iv.next16, -1, !dbg !4819
  %166 = select i1 %.not543, i64 0, i64 %value_phi111.op, !dbg !4819
  %167 = select i1 %.not544, i64 0, i64 %value_phi106.op, !dbg !4819
  %168 = mul i64 %167, %arraysize127, !dbg !4819
  %169 = add i64 %168, %166, !dbg !4819
  %170 = bitcast i8 addrspace(13)* %165 to float addrspace(13)*, !dbg !4819
  %171 = getelementptr inbounds float, float addrspace(13)* %170, i64 %169, !dbg !4819
  %arrayref131 = load float, float addrspace(13)* %171, align 4, !dbg !4819, !tbaa !494, !alias.scope !83, !noalias !86
  %arraylen133 = load i64, i64 addrspace(11)* %arraylen_ptr2_replacementA, align 8, !dbg !4826, !tbaa !250, !range !253, !alias.scope !254, !noalias !255
  %.not545 = icmp eq i64 %arraylen133, 1, !dbg !4831
  %172 = select i1 %.not545, i64 0, i64 %value_phi111.op, !dbg !4833
  %arrayptr138547 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %155, align 16, !dbg !4833, !tbaa !271, !alias.scope !4835, !noalias !255, !nonnull !63
  %"arrayptr138547'il_phi" = phi float addrspace(13)* , !dbg !4833
  %173 = getelementptr inbounds float, float addrspace(13)* %arrayptr138547, i64 %172, !dbg !4833
  %arrayref139 = load float, float addrspace(13)* %173, align 4, !dbg !4833, !tbaa !494, !alias.scope !83, !noalias !86
  %174 = fadd float %arrayref131, %arrayref139, !dbg !4836
  %175 = call fastcc float @julia_gelu_2643(float %174) #78, !dbg !4838
  %arraysize141 = load i64, i64 addrspace(11)* %_replacementA20, align 8, !dbg !4843, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %176 = mul i64 %arraysize141, %value_phi106.op, !dbg !4843
  %177 = add i64 %176, %value_phi111.op, !dbg !4843
  %arrayptr144 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %arrayptr_ptr129.phi.trans.insert, align 16, !dbg !4843, !tbaa !68, !alias.scope !4821, !noalias !337, !nonnull !63
  %"arrayptr144'il_phi" = phi i8 addrspace(13)* , !dbg !4843
  %178 = bitcast i8 addrspace(13)* %arrayptr144 to float addrspace(13)*, !dbg !4843
  %179 = getelementptr inbounds float, float addrspace(13)* %178, i64 %177, !dbg !4843
  store float %175, float addrspace(13)* %179, align 4, !dbg !4843, !tbaa !494, !alias.scope !83, !noalias !4674
  %180 = add nuw nsw i64 %iv.next16, 1, !dbg !4845
  %exitcond598.not = icmp eq i64 %iv.next16, %158, !dbg !4848
  %181 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !4566
  %182 = bitcast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %181 to i8 addrspace(13)* addrspace(10)*, !dbg !4566
  %183 = bitcast i8 addrspace(13)* addrspace(10)* %182 to {} addrspace(10)*, !dbg !4566
  br i1 %exitcond598.not, label %L804.loopexit, label %L703, !dbg !4566

L804.loopexit:                                    ; preds = %L703
  br label %L804, !dbg !4849

L804:                                             ; preds = %L804.loopexit, %L691
  %184 = add nuw nsw i64 %iv.next14, 1, !dbg !4849
  %exitcond599 = icmp eq i64 %iv.next14, %157, !dbg !4852
  br i1 %exitcond599, label %L1042.loopexit2, label %L691, !dbg !4808

L811:                                             ; preds = %top
  %185 = addrspacecast {} addrspace(10)* %0 to {} addrspace(10)* addrspace(11)*, !dbg !4853
  %arraysize_ptr246 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %185, i64 3, !dbg !4853
  %186 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr246 to i64 addrspace(11)*, !dbg !4853
  %arraysize247 = load i64, i64 addrspace(11)* %186, align 8, !dbg !4853, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %arraysize_ptr248 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %185, i64 4, !dbg !4853
  %187 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr248 to i64 addrspace(11)*, !dbg !4853
  %arraysize249 = load i64, i64 addrspace(11)* %187, align 16, !dbg !4853, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %188 = icmp eq i64 %arraysize247, 1, !dbg !4858
  %189 = icmp eq i64 %arraysize249, 1, !dbg !4863
  %190 = icmp ne i64 %arraysize247, %arraylen3, !dbg !4866
  %191 = icmp ne i64 %arraylen3, 1, !dbg !4868
  %192 = and i1 %191, %190, !dbg !4869
  br i1 %192, label %L860, label %L902, !dbg !4869

L860:                                             ; preds = %L811
  call fastcc void @julia_DimensionMismatch_2470() #78, !dbg !4869
  %box348 = call noalias nonnull dereferenceable(8) "enzyme_inactive" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1_replacementA, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 137517631542736 to {}*) to {} addrspace(10)*)) #81, !dbg !4869
  %193 = bitcast {} addrspace(10)* %box348 to [1 x {} addrspace(10)*] addrspace(10)*, !dbg !4869
  %194 = getelementptr [1 x {} addrspace(10)*], [1 x {} addrspace(10)*] addrspace(10)* %193, i64 0, i64 0, !dbg !4869
  store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 137517709292320 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %194, align 8, !dbg !4869, !tbaa !188, !alias.scope !83, !noalias !4674
  %195 = addrspacecast {} addrspace(10)* %box348 to {} addrspace(12)*, !dbg !4869
  call void @ijl_throw({} addrspace(12)* %195) #82, !dbg !4869
  unreachable, !dbg !4869

L902:                                             ; preds = %L811
  %196 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %_replacementA14) #79, !dbg !4872
  %"'ip_phi7" = phi {}* , !dbg !4872
  %197 = bitcast {}* %196 to i8**, !dbg !4872
  %arrayptr286 = load i8*, i8** %197, align 8, !dbg !4872, !tbaa !68, !alias.scope !336, !noalias !337, !nonnull !63
  %"arrayptr286'il_phi" = phi i8* , !dbg !4872
  %198 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %_replacementA12) #79, !dbg !4872
  %"'ip_phi8" = phi {}* , !dbg !4872
  %199 = bitcast {}* %198 to i8**, !dbg !4872
  %arrayptr288 = load i8*, i8** %199, align 8, !dbg !4872, !tbaa !271, !alias.scope !254, !noalias !255, !nonnull !63
  %"arrayptr288'il_phi" = phi i8* , !dbg !4884
  %.not560 = icmp eq i8* %arrayptr286, %arrayptr288, !dbg !4884
  %200 = bitcast {} addrspace(10)* %1 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !4876
  br i1 %.not560, label %L924, label %L929, !dbg !4876

L924:                                             ; preds = %L902
  %201 = call noalias nonnull {} addrspace(10)* @ijl_array_copy({} addrspace(10)* noundef nonnull %1) #78, !dbg !4887
  %"'ip_phi9" = phi {} addrspace(10)* , !dbg !4887
  %.phi.trans.insert517 = addrspacecast {} addrspace(10)* %201 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*
  %arraylen_ptr290.phi.trans.insert = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %.phi.trans.insert517, i64 0, i32 1
  %arraylen291.pre = load i64, i64 addrspace(11)* %arraylen_ptr290.phi.trans.insert, align 8, !dbg !4889, !tbaa !250, !range !253, !alias.scope !254, !noalias !255
  %202 = bitcast {} addrspace(10)* %201 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !4543
  br label %L929, !dbg !4543

L929:                                             ; preds = %L924, %L902
  %nodecayed..pre-phi529 = phi { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* [ %202, %L924 ], [ %200, %L902 ], !dbg !4889
  %arraylen291 = phi i64 [ %arraylen291.pre, %L924 ], [ %arraylen3, %L902 ], !dbg !4889
  %203 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %nodecayed..pre-phi529 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !4893
  %204 = icmp eq i64 %arraylen291, 1, !dbg !4893
  %.not561 = icmp eq i64 %arraysize249, 0, !dbg !4897
  br i1 %.not561, label %L1042, label %L954.preheader, !dbg !4901

L954.preheader:                                   ; preds = %L929
  %.not562 = icmp eq i64 %arraysize247, 0
  %205 = addrspacecast {} addrspace(10)* %0 to float addrspace(13)* addrspace(11)*
  %206 = bitcast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %203 to float addrspace(13)* addrspace(11)*
  %207 = add nsw i64 %arraysize249, -1, !dbg !4903
  %208 = add nsw i64 %arraysize247, -1, !dbg !4903
  br label %L954, !dbg !4903

L954:                                             ; preds = %L1005, %L954.preheader
  %iv19 = phi i64 [ %iv.next20, %L1005 ], [ 0, %L954.preheader ]
  %iv.next20 = add nuw nsw i64 %iv19, 1, !dbg !4903
  br i1 %.not562, label %L1005, label %L963.lr.ph, !dbg !4903

L963.lr.ph:                                       ; preds = %L954
  %value_phi300.op = add nsw i64 %iv.next20, -1
  %209 = select i1 %189, i64 0, i64 %value_phi300.op
  %arraysize309.pre = load i64, i64 addrspace(11)* %186, align 8, !dbg !4904, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %arrayptr312564.pre = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %205, align 16, !dbg !4904, !tbaa !68, !alias.scope !4821, !noalias !337
  %"arrayptr312564.pre'il_phi" = phi float addrspace(13)* , !dbg !4912
  %210 = bitcast {} addrspace(10)* %0 to float addrspace(13)* addrspace(10)*, !dbg !4912
  %211 = bitcast float addrspace(13)* addrspace(10)* %210 to {} addrspace(10)*, !dbg !4912
  br label %L963, !dbg !4912

L963:                                             ; preds = %L963, %L963.lr.ph
  %iv21 = phi i64 [ %iv.next22, %L963 ], [ 0, %L963.lr.ph ], !dbg !4904
  %nodecayed.arrayptr312564 = phi {} addrspace(10)* [ %211, %L963.lr.ph ], [ %227, %L963 ], !dbg !4904
  %arraysize309 = phi i64 [ %arraysize309.pre, %L963.lr.ph ], [ %arraysize319, %L963 ], !dbg !4904
  %iv.next22 = add nuw nsw i64 %iv21, 1, !dbg !4913
  %212 = bitcast {} addrspace(10)* %nodecayed.arrayptr312564 to float addrspace(13)* addrspace(10)*, !dbg !4913
  %213 = addrspacecast float addrspace(13)* addrspace(10)* %212 to float addrspace(13)* addrspace(11)*, !dbg !4913
  %214 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %213, align 8, !dbg !4913
  %"'il_phi10" = phi float addrspace(13)* , !dbg !4904
  %215 = select i1 %188, i64 0, i64 %iv21, !dbg !4904
  %216 = mul i64 %arraysize309, %209, !dbg !4904
  %217 = add i64 %215, %216, !dbg !4904
  %218 = getelementptr inbounds float, float addrspace(13)* %214, i64 %217, !dbg !4904
  %arrayref313 = load float, float addrspace(13)* %218, align 4, !dbg !4904, !tbaa !494, !alias.scope !83, !noalias !86
  %219 = select i1 %204, i64 0, i64 %iv21, !dbg !4916
  %arrayptr316565 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %206, align 8, !dbg !4916, !tbaa !271, !alias.scope !4835, !noalias !255, !nonnull !63
  %"arrayptr316565'il_phi" = phi float addrspace(13)* , !dbg !4916
  %220 = getelementptr inbounds float, float addrspace(13)* %arrayptr316565, i64 %219, !dbg !4916
  %arrayref317 = load float, float addrspace(13)* %220, align 4, !dbg !4916, !tbaa !494, !alias.scope !83, !noalias !86
  %221 = fadd float %arrayref313, %arrayref317, !dbg !4920
  %222 = call fastcc float @julia_gelu_2643(float %221) #78, !dbg !4922
  %arraysize319 = load i64, i64 addrspace(11)* %186, align 8, !dbg !4927, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %223 = mul i64 %arraysize319, %value_phi300.op, !dbg !4927
  %224 = add i64 %223, %iv21, !dbg !4927
  %arrayptr322566 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %205, align 16, !dbg !4927, !tbaa !68, !alias.scope !4821, !noalias !337, !nonnull !63
  %"arrayptr322566'il_phi" = phi float addrspace(13)* , !dbg !4927
  %225 = getelementptr inbounds float, float addrspace(13)* %arrayptr322566, i64 %224, !dbg !4927
  store float %222, float addrspace(13)* %225, align 4, !dbg !4927, !tbaa !494, !alias.scope !83, !noalias !4674
  %exitcond.not = icmp eq i64 %iv.next22, %arraysize247, !dbg !4929
  %226 = bitcast {} addrspace(10)* %0 to float addrspace(13)* addrspace(10)*, !dbg !4912
  %227 = bitcast float addrspace(13)* addrspace(10)* %226 to {} addrspace(10)*, !dbg !4912
  br i1 %exitcond.not, label %L1005.loopexit, label %L963, !dbg !4912, !llvm.loop !4930

L1005.loopexit:                                   ; preds = %L963
  br label %L1005, !dbg !4931

L1005:                                            ; preds = %L1005.loopexit, %L954
  %228 = add nuw nsw i64 %iv.next20, 1, !dbg !4931
  %exitcond596.not = icmp eq i64 %iv.next20, %arraysize249, !dbg !4935
  br i1 %exitcond596.not, label %L1042.loopexit, label %L954, !dbg !4934

L1042.loopexit:                                   ; preds = %L1005
  br label %L1042

L1042.loopexit1:                                  ; preds = %L194
  %229 = phi i64 [ %iv17, %L194 ]
  store i64 %229, i64* %loopLimit_cache, align 8, !invariant.group !4936
  br label %L1042

L1042.loopexit2:                                  ; preds = %L804
  br label %L1042

L1042:                                            ; preds = %L1042.loopexit2, %L1042.loopexit1, %L1042.loopexit, %L929, %L678, %L221, %L62
  call void @llvm.lifetime.end.p0i8(i64 noundef 88, i8* noundef nonnull %.sub_replacementA) #78
  br label %invertL1042, !dbg !4937

guard_exit374:                                    ; preds = %L62
  %"arrayptr_ptr.phi.trans.insert'ipg" = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %"'ipc29", i64 0, i32 0
  %arrayptr_ptr.phi.trans.insert = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %_replacementA13, i64 0, i32 0
  %arrayptr.pre = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %arrayptr_ptr.phi.trans.insert, align 16, !dbg !4456, !tbaa !68, !alias.scope !4821, !noalias !337
  %"arrayptr.pre'il_phi" = phi i8 addrspace(13)* 
  %"'ipc40" = addrspacecast {} addrspace(10)* %"'1" to float addrspace(13)* addrspace(11)*
  %230 = addrspacecast {} addrspace(10)* %1 to float addrspace(13)* addrspace(11)*
  %"'ipc47" = bitcast {} addrspace(10)* %"'" to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !4938
  %231 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !4938
  %"'ipc48" = bitcast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %"'ipc47" to i8 addrspace(13)* addrspace(10)*, !dbg !4938
  %232 = bitcast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %231 to i8 addrspace(13)* addrspace(10)*, !dbg !4938
  %"'ipc49" = bitcast i8 addrspace(13)* addrspace(10)* %"'ipc48" to {} addrspace(10)*, !dbg !4938
  %233 = bitcast i8 addrspace(13)* addrspace(10)* %232 to {} addrspace(10)*, !dbg !4938
  store i64* null, i64** %arraysize72_cache, align 8, !dbg !4938
  store i64* null, i64** %value_phi41_cache, align 8, !dbg !4938
  store i64* null, i64** %value_phi40_cache, align 8, !dbg !4938
  store float* null, float** %_cache, align 8, !dbg !4938
  store i64* null, i64** %arraylen64.pre_cache, align 8, !dbg !4938
  store i64* null, i64** %arraysize54.pre_cache, align 8, !dbg !4938
  br label %L86, !dbg !4938

guard_exit379:                                    ; preds = %L194, %L86
  %value_phi78573 = phi i64 [ %49, %L194 ], [ %value_phi41, %L86 ]
  %value_phi77572 = phi i64 [ 1, %L194 ], [ %44, %L86 ]
  %arraysize54.pre = load i64, i64 addrspace(11)* %_replacementA19, align 16, !dbg !4459, !tbaa !68, !range !253, !alias.scope !336, !noalias !337
  %arraylen64.pre = load i64, i64 addrspace(11)* %arraylen_ptr2_replacementA, align 8, !dbg !4447, !tbaa !250, !range !253, !alias.scope !254, !noalias !255
  %234 = load i64*, i64** %arraylen64.pre_cache, align 8, !dbg !4938, !dereferenceable !880, !invariant.group !4939
  %235 = getelementptr inbounds i64, i64* %234, i64 %iv17, !dbg !4938
  store i64 %arraylen64.pre, i64* %235, align 8, !dbg !4938, !tbaa !250, !invariant.group !4940
  %236 = load i64*, i64** %arraysize54.pre_cache, align 8, !dbg !4938, !dereferenceable !880, !invariant.group !4941
  %237 = getelementptr inbounds i64, i64* %236, i64 %iv17, !dbg !4938
  store i64 %arraysize54.pre, i64* %237, align 8, !dbg !4938, !tbaa !68, !invariant.group !4942
  %"'ipc50" = bitcast {} addrspace(10)* %"'" to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !4938
  %238 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !4938
  %"'ipc51" = bitcast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %"'ipc50" to i8 addrspace(13)* addrspace(10)*, !dbg !4938
  %239 = bitcast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %238 to i8 addrspace(13)* addrspace(10)*, !dbg !4938
  %"'ipc52" = bitcast i8 addrspace(13)* addrspace(10)* %"'ipc51" to {} addrspace(10)*, !dbg !4938
  %240 = bitcast i8 addrspace(13)* addrspace(10)* %239 to {} addrspace(10)*, !dbg !4938
  br label %L86, !dbg !4938

allocsForInversion:                               ; No predecessors!
  %"iv17'ac" = alloca i64, align 8
  %loopLimit_cache = alloca i64, align 8
  %"iv'ac" = alloca i64, align 8
  %"iv3'ac" = alloca i64, align 8
  %"iv5'ac" = alloca i64, align 8
  %"iv7'ac" = alloca i64, align 8
  %"iv9'ac" = alloca i64, align 8
  %"iv11'ac" = alloca i64, align 8
  %"iv13'ac" = alloca i64, align 8
  %"iv15'ac" = alloca i64, align 8
  %"iv19'ac" = alloca i64, align 8
  %"iv21'ac" = alloca i64, align 8
  %arraysize_cache = alloca i64, align 8
  %arraysize72_cache = alloca i64*, align 8
  %value_phi41_cache = alloca i64*, align 8
  %value_phi40_cache = alloca i64*, align 8
  %"'de" = alloca float, align 4
  %241 = getelementptr float, float* %"'de", i64 0
  store float 0.000000e+00, float* %241, align 4
  %_cache = alloca float*, align 8
  %"'de38" = alloca float, align 4
  %242 = getelementptr float, float* %"'de38", i64 0
  store float 0.000000e+00, float* %242, align 4
  %"arrayref'de" = alloca float, align 4
  %243 = getelementptr float, float* %"arrayref'de", i64 0
  store float 0.000000e+00, float* %243, align 4
  %"arrayref70'de" = alloca float, align 4
  %244 = getelementptr float, float* %"arrayref70'de", i64 0
  store float 0.000000e+00, float* %244, align 4
  %arraylen64.pre_cache = alloca i64*, align 8
  %arraysize54.pre_cache = alloca i64*, align 8
  %arraysize5_cache = alloca i64, align 8

inverttop:                                        ; preds = %invertL7
  fence syncscope("singlethread") seq_cst
  fence syncscope("singlethread") seq_cst
  ret void

invertL7:                                         ; preds = %invertL40, %invertL28
  br label %inverttop

invertL28:                                        ; preds = %invertL40
  br label %invertL7

invertL36:                                        ; No predecessors!

invertL40:                                        ; preds = %invertL221, %invertL62
  %245 = load i64, i64* %arraysize_cache, align 8, !dbg !4420, !tbaa !68, !range !253, !alias.scope !4425, !noalias !4428, !invariant.group !4430
  %_unwrap = icmp eq i64 %arraylen3, %245
  %_unwrap28 = icmp eq i64 %245, 1
  %value_phi_unwrap = or i1 %_unwrap, %_unwrap28
  br i1 %value_phi_unwrap, label %invertL7, label %invertL28

invertL62:                                        ; No predecessors!
  br label %invertL40

invertL86:                                        ; preds = %invertL194
  %246 = load i64, i64* %"iv17'ac", align 8, !dbg !4488
  %"arrayptr_ptr.phi.trans.insert'ipg_unwrap" = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %"'ipc29", i64 0, i32 0, !dbg !4488
  %"arrayptr75'il_phi_unwrap" = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %"arrayptr_ptr.phi.trans.insert'ipg_unwrap", align 16, !dbg !4488, !tbaa !68, !alias.scope !4490, !noalias !4491, !nonnull !63
  %"'ipc_unwrap" = bitcast i8 addrspace(13)* %"arrayptr75'il_phi_unwrap" to float addrspace(13)*, !dbg !4488
  %247 = load i64*, i64** %arraysize72_cache, align 8, !dbg !4488, !dereferenceable !880, !invariant.group !4495
  %248 = getelementptr inbounds i64, i64* %247, i64 %246, !dbg !4488
  %249 = load i64, i64* %248, align 8, !dbg !4488, !tbaa !68, !range !253, !alias.scope !336, !noalias !337, !invariant.group !4496
  %250 = load i64*, i64** %value_phi41_cache, align 8, !dbg !4488, !dereferenceable !880, !invariant.group !4466
  %251 = getelementptr inbounds i64, i64* %250, i64 %246, !dbg !4488
  %252 = load i64, i64* %251, align 8, !dbg !4488, !invariant.group !4467
  %value_phi41.op_unwrap = add i64 %252, -1, !dbg !4488
  %_unwrap31 = mul i64 %249, %value_phi41.op_unwrap, !dbg !4488
  %253 = load i64*, i64** %value_phi40_cache, align 8, !dbg !4488, !dereferenceable !880, !invariant.group !4464
  %254 = getelementptr inbounds i64, i64* %253, i64 %246, !dbg !4488
  %255 = load i64, i64* %254, align 8, !dbg !4488, !invariant.group !4465
  %value_phi40.op_unwrap = add i64 %255, -1, !dbg !4488
  %_unwrap33 = add i64 %_unwrap31, %value_phi40.op_unwrap, !dbg !4488
  %"'ipg_unwrap" = getelementptr inbounds float, float addrspace(13)* %"'ipc_unwrap", i64 %_unwrap33, !dbg !4488
  %256 = load float, float addrspace(13)* %"'ipg_unwrap", align 4, !dbg !4488, !tbaa !494, !alias.scope !4943, !noalias !4946
  store float 0.000000e+00, float addrspace(13)* %"'ipg_unwrap", align 4, !dbg !4488, !tbaa !494, !alias.scope !4943, !noalias !4946
  %257 = load float, float* %"'de", align 4, !dbg !4488
  %258 = fadd fast float %257, %256, !dbg !4488
  store float %258, float* %"'de", align 4, !dbg !4488
  %259 = load i64, i64* %"iv17'ac", align 8, !dbg !4483
  %260 = load float*, float** %_cache, align 8, !dbg !4483, !dereferenceable !880, !invariant.group !4497
  %261 = getelementptr inbounds float, float* %260, i64 %259, !dbg !4483
  %262 = load float, float* %261, align 4, !dbg !4483, !invariant.group !4498
  %263 = load float, float* %"'de", align 4, !dbg !4483
  %264 = call fastcc { float } @diffejulia_gelu_2643(float %262, float %263), !dbg !4483
  %265 = extractvalue { float } %264, 0, !dbg !4483
  %266 = load float, float* %"'de38", align 4, !dbg !4483
  %267 = fadd fast float %266, %265, !dbg !4483
  store float %267, float* %"'de38", align 4, !dbg !4483
  store float 0.000000e+00, float* %"'de", align 4, !dbg !4483
  %268 = load float, float* %"'de38", align 4, !dbg !4481
  store float 0.000000e+00, float* %"'de38", align 4, !dbg !4481
  %269 = load float, float* %"arrayref'de", align 4, !dbg !4481
  %270 = fadd fast float %269, %268, !dbg !4481
  store float %270, float* %"arrayref'de", align 4, !dbg !4481
  %271 = load float, float* %"arrayref70'de", align 4, !dbg !4481
  %272 = fadd fast float %271, %268, !dbg !4481
  store float %272, float* %"arrayref70'de", align 4, !dbg !4481
  %273 = load float, float* %"arrayref70'de", align 4, !dbg !4477
  store float 0.000000e+00, float* %"arrayref70'de", align 4, !dbg !4477
  %274 = load i64, i64* %"iv17'ac", align 8, !dbg !4477
  %"'ipc40_unwrap" = addrspacecast {} addrspace(10)* %"'1" to float addrspace(13)* addrspace(11)*, !dbg !4477
  %"arrayptr69538'il_phi_unwrap" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc40_unwrap", align 16, !dbg !4477, !tbaa !271, !alias.scope !4479, !noalias !4480, !nonnull !63
  %275 = icmp ne i64 %274, 0, !dbg !4477
  br i1 %275, label %invertL86_phirc, label %invertL86_phirc42, !dbg !4477

invertL86_phirc:                                  ; preds = %invertL86
  %276 = sub nuw i64 %274, 1
  %277 = load i64*, i64** %arraylen64.pre_cache, align 8, !dereferenceable !880, !invariant.group !4939
  %278 = getelementptr inbounds i64, i64* %277, i64 %276
  %279 = load i64, i64* %278, align 8, !dbg !4447, !tbaa !250, !range !253, !alias.scope !254, !noalias !255, !invariant.group !4940
  br label %invertL86_phimerge

invertL86_phirc42:                                ; preds = %invertL86
  br label %invertL86_phimerge

invertL86_phimerge:                               ; preds = %invertL86_phirc42, %invertL86_phirc
  %280 = phi i64 [ %279, %invertL86_phirc ], [ %arraylen3, %invertL86_phirc42 ], !dbg !4477
  %.not536_unwrap = icmp eq i64 %280, 1, !dbg !4477
  %_unwrap43 = select i1 %.not536_unwrap, i64 0, i64 %value_phi40.op_unwrap, !dbg !4477
  %"'ipg39_unwrap" = getelementptr inbounds float, float addrspace(13)* %"arrayptr69538'il_phi_unwrap", i64 %_unwrap43, !dbg !4477
  %281 = atomicrmw fadd float addrspace(13)* %"'ipg39_unwrap", float %273 monotonic, align 4, !dbg !4477
  %282 = load float, float* %"arrayref'de", align 4, !dbg !4456
  store float 0.000000e+00, float* %"arrayref'de", align 4, !dbg !4456
  %283 = load i64, i64* %"iv17'ac", align 8, !dbg !4456
  %"'ipc47_unwrap" = bitcast {} addrspace(10)* %"'" to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !4456
  %"'ipc48_unwrap" = bitcast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %"'ipc47_unwrap" to i8 addrspace(13)* addrspace(10)*, !dbg !4456
  %"'ipc49_unwrap" = bitcast i8 addrspace(13)* addrspace(10)* %"'ipc48_unwrap" to {} addrspace(10)*, !dbg !4456
  %284 = icmp ne i64 %283, 0, !dbg !4456
  br i1 %284, label %invertL86_phimerge_phirc, label %invertL86_phimerge_phirc55, !dbg !4456

invertL86_phimerge_phirc:                         ; preds = %invertL86_phimerge
  %285 = sub nuw i64 %283, 1
  %"'ipc50_unwrap" = bitcast {} addrspace(10)* %"'" to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*
  %"'ipc51_unwrap" = bitcast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %"'ipc50_unwrap" to i8 addrspace(13)* addrspace(10)*
  %"'ipc52_unwrap" = bitcast i8 addrspace(13)* addrspace(10)* %"'ipc51_unwrap" to {} addrspace(10)*
  br label %invertL86_phimerge_phimerge

invertL86_phimerge_phirc55:                       ; preds = %invertL86_phimerge
  br label %invertL86_phimerge_phimerge

invertL86_phimerge_phimerge:                      ; preds = %invertL86_phimerge_phirc55, %invertL86_phimerge_phirc
  %286 = phi {} addrspace(10)* [ %"'ipc52_unwrap", %invertL86_phimerge_phirc ], [ %"'ipc49_unwrap", %invertL86_phimerge_phirc55 ], !dbg !4456
  %"'ipc53_unwrap" = bitcast {} addrspace(10)* %286 to i8 addrspace(13)* addrspace(10)*, !dbg !4456
  %"'ipc54_unwrap" = addrspacecast i8 addrspace(13)* addrspace(10)* %"'ipc53_unwrap" to i8 addrspace(13)* addrspace(11)*, !dbg !4456
  %"'il_phi_unwrap" = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %"'ipc54_unwrap", align 8, !dbg !4462, !alias.scope !4468, !noalias !4471
  %"'ipc45_unwrap" = bitcast i8 addrspace(13)* %"'il_phi_unwrap" to float addrspace(13)*, !dbg !4456
  %287 = icmp ne i64 %283, 0, !dbg !4456
  br i1 %287, label %invertL86_phimerge_phimerge_phirc, label %invertL86_phimerge_phimerge_phirc57, !dbg !4456

invertL86_phimerge_phimerge_phirc:                ; preds = %invertL86_phimerge_phimerge
  %288 = sub nuw i64 %283, 1
  %289 = load i64*, i64** %arraysize54.pre_cache, align 8, !dereferenceable !880, !invariant.group !4941
  %290 = getelementptr inbounds i64, i64* %289, i64 %288
  %291 = load i64, i64* %290, align 8, !dbg !4459, !tbaa !68, !range !253, !alias.scope !336, !noalias !337, !invariant.group !4942
  br label %invertL86_phimerge_phimerge_phimerge

invertL86_phimerge_phimerge_phirc57:              ; preds = %invertL86_phimerge_phimerge
  %292 = load i64, i64* %arraysize5_cache, align 8, !dbg !4420, !tbaa !68, !range !253, !alias.scope !4425, !noalias !4428, !invariant.group !4437
  br label %invertL86_phimerge_phimerge_phimerge

invertL86_phimerge_phimerge_phimerge:             ; preds = %invertL86_phimerge_phimerge_phirc57, %invertL86_phimerge_phimerge_phirc
  %293 = phi i64 [ %291, %invertL86_phimerge_phimerge_phirc ], [ %292, %invertL86_phimerge_phimerge_phirc57 ], !dbg !4456
  %.not535_unwrap = icmp eq i64 %293, 1, !dbg !4456
  %_unwrap58 = select i1 %.not535_unwrap, i64 0, i64 %value_phi41.op_unwrap, !dbg !4456
  %294 = icmp ne i64 %283, 0, !dbg !4456
  br i1 %294, label %invertL86_phimerge_phimerge_phimerge_phirc, label %invertL86_phimerge_phimerge_phimerge_phirc59, !dbg !4456

invertL86_phimerge_phimerge_phimerge_phirc:       ; preds = %invertL86_phimerge_phimerge_phimerge
  %295 = sub nuw i64 %283, 1
  %296 = load i64*, i64** %arraysize72_cache, align 8, !dereferenceable !880, !invariant.group !4495
  %297 = getelementptr inbounds i64, i64* %296, i64 %295
  %298 = load i64, i64* %297, align 8, !dbg !4488, !tbaa !68, !range !253, !alias.scope !4425, !noalias !4428, !invariant.group !4496
  br label %invertL86_phimerge_phimerge_phimerge_phimerge

invertL86_phimerge_phimerge_phimerge_phirc59:     ; preds = %invertL86_phimerge_phimerge_phimerge
  %299 = load i64, i64* %arraysize_cache, align 8, !dbg !4420, !tbaa !68, !range !253, !alias.scope !4425, !noalias !4428, !invariant.group !4430
  br label %invertL86_phimerge_phimerge_phimerge_phimerge

invertL86_phimerge_phimerge_phimerge_phimerge:    ; preds = %invertL86_phimerge_phimerge_phimerge_phirc59, %invertL86_phimerge_phimerge_phimerge_phirc
  %300 = phi i64 [ %298, %invertL86_phimerge_phimerge_phimerge_phirc ], [ %299, %invertL86_phimerge_phimerge_phimerge_phirc59 ], !dbg !4456
  %_unwrap60 = mul i64 %_unwrap58, %300, !dbg !4456
  %.not534_unwrap = icmp eq i64 %300, 1, !dbg !4456
  %_unwrap61 = select i1 %.not534_unwrap, i64 0, i64 %value_phi40.op_unwrap, !dbg !4456
  %_unwrap62 = add i64 %_unwrap60, %_unwrap61, !dbg !4456
  %"'ipg46_unwrap" = getelementptr inbounds float, float addrspace(13)* %"'ipc45_unwrap", i64 %_unwrap62, !dbg !4456
  %301 = atomicrmw fadd float addrspace(13)* %"'ipg46_unwrap", float %282 monotonic, align 4, !dbg !4456
  %302 = load i64, i64* %"iv17'ac", align 8
  %303 = icmp eq i64 %302, 0
  %304 = xor i1 %303, true
  br i1 %303, label %invertguard_exit374, label %incinvertL86

incinvertL86:                                     ; preds = %invertL86_phimerge_phimerge_phimerge_phimerge
  %305 = load i64, i64* %"iv17'ac", align 8
  %306 = add nsw i64 %305, -1
  store i64 %306, i64* %"iv17'ac", align 8
  br label %invertguard_exit379

invertL194:                                       ; No predecessors!
  br label %invertL86

invertL221:                                       ; preds = %invertL227
  br label %invertL40

invertL227:                                       ; preds = %invertL235
  br label %invertL221

invertL235:                                       ; preds = %invertL245
  br label %invertL227

invertL245:                                       ; preds = %invertL258
  br label %invertL235

invertL258:                                       ; preds = %invertL261
  br label %invertL245

invertL261:                                       ; preds = %mergeinvertL261_L282, %incinvertL261
  %307 = load i64, i64* %"iv'ac", align 8
  %308 = icmp eq i64 %307, 0
  %309 = xor i1 %308, true
  br i1 %308, label %invertL258, label %incinvertL261

incinvertL261:                                    ; preds = %invertL261
  %310 = load i64, i64* %"iv'ac", align 8
  %311 = add nsw i64 %310, -1
  store i64 %311, i64* %"iv'ac", align 8
  br label %invertL261

invertL282:                                       ; No predecessors!
  br label %mergeinvertL261_L282

mergeinvertL261_L282:                             ; preds = %invertL282
  store i64 0, i64* %"iv'ac", align 8
  br label %invertL261

invertL293:                                       ; No predecessors!
  %312 = call i64 @julia_nthreads_2651() #78, !dbg !4438
  %313 = load i64, i64* %arraysize5_cache, align 8, !dbg !4420, !tbaa !68, !range !253, !alias.scope !4425, !noalias !4428, !invariant.group !4437
  %314 = call i64 @llvm.smin.i64(i64 %312, i64 %313) #78, !dbg !4515
  %_unwrap82 = trunc i64 %314 to i32
  %_unwrap83 = add i32 %_unwrap82, -1

invertL361.lr.ph:                                 ; No predecessors!

invertL427.preheader:                             ; No predecessors!

invertL430.lr.ph:                                 ; No predecessors!

invertL361:                                       ; No predecessors!

invertL412:                                       ; No predecessors!

invertL415:                                       ; No predecessors!

invertL641.preheader.loopexit:                    ; No predecessors!

invertL641.preheader:                             ; No predecessors!

invertL646.preheader:                             ; No predecessors!

invertL430:                                       ; No predecessors!

invertL442.preheader:                             ; No predecessors!

invertL442:                                       ; No predecessors!

invertL622.loopexit:                              ; No predecessors!

invertL622:                                       ; No predecessors!

invertL646:                                       ; No predecessors!

invertL666.preheader:                             ; No predecessors!

invertL666:                                       ; No predecessors!

invertL670:                                       ; No predecessors!

invertL673:                                       ; No predecessors!

invertL676.loopexit:                              ; No predecessors!

invertL676:                                       ; No predecessors!

invertL678.loopexit:                              ; No predecessors!

invertL678:                                       ; No predecessors!

invertL691.lr.ph:                                 ; No predecessors!

invertL691:                                       ; No predecessors!

invertL691.L703_crit_edge:                        ; No predecessors!

invertL703:                                       ; No predecessors!

invertL804.loopexit:                              ; No predecessors!

invertL804:                                       ; No predecessors!

invertL811:                                       ; No predecessors!

invertL860:                                       ; No predecessors!

invertL902:                                       ; No predecessors!

invertL924:                                       ; No predecessors!

invertL929:                                       ; No predecessors!

invertL954.preheader:                             ; No predecessors!

invertL954:                                       ; No predecessors!

invertL963.lr.ph:                                 ; No predecessors!

invertL963:                                       ; No predecessors!

invertL1005.loopexit:                             ; No predecessors!

invertL1005:                                      ; No predecessors!

invertL1042.loopexit:                             ; No predecessors!

invertL1042.loopexit1:                            ; No predecessors!

invertL1042.loopexit2:                            ; No predecessors!

invertL1042:                                      ; preds = %L1042

invertguard_exit374:                              ; preds = %invertL86_phimerge_phimerge_phimerge_phimerge
  %315 = load i64, i64* %"iv17'ac", align 8
  %forfree = load i64*, i64** %arraysize72_cache, align 8, !dereferenceable !880, !invariant.group !4495
  %316 = bitcast i64* %forfree to i8*
  call void @free(i8* nonnull %316), !dbg !4948
  %317 = load i64, i64* %"iv17'ac", align 8
  %forfree30 = load i64*, i64** %value_phi41_cache, align 8, !dereferenceable !880, !invariant.group !4466
  %318 = bitcast i64* %forfree30 to i8*
  call void @free(i8* nonnull %318), !dbg !4948
  %319 = load i64, i64* %"iv17'ac", align 8
  %forfree32 = load i64*, i64** %value_phi40_cache, align 8, !dereferenceable !880, !invariant.group !4464
  %320 = bitcast i64* %forfree32 to i8*
  call void @free(i8* nonnull %320), !dbg !4948
  %321 = load i64, i64* %"iv17'ac", align 8
  %forfree36 = load float*, float** %_cache, align 8, !dereferenceable !4949, !invariant.group !4497
  %322 = bitcast float* %forfree36 to i8*
  call void @free(i8* nonnull %322), !dbg !4948
  %323 = load i64, i64* %"iv17'ac", align 8
  %forfree41 = load i64*, i64** %arraylen64.pre_cache, align 8, !dereferenceable !880, !invariant.group !4939
  %324 = bitcast i64* %forfree41 to i8*
  call void @free(i8* nonnull %324), !dbg !4948
  %325 = load i64, i64* %"iv17'ac", align 8
  %forfree56 = load i64*, i64** %arraysize54.pre_cache, align 8, !dereferenceable !880, !invariant.group !4941
  %326 = bitcast i64* %forfree56 to i8*
  call void @free(i8* nonnull %326), !dbg !4948

invertguard_exit379:                              ; preds = %incinvertL86
}

  %v.i_replacementA = phi i64 , !dbg !293
julia: /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:3791: bool GradientUtils::legalRecompute(const llvm::Value*, const ValueToValueMapTy&, llvm::IRBuilder<>*, bool, bool) const: Assertion `phi->getNumIncomingValues() != 0' failed.

[836700] signal (6.-6): Aborted
in expression starting at REPL[9]:1
unknown function (ip: 0x7d126491d32c)
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x7d12648b43db)
__assert_fail at /usr/lib/libc.so.6 (unknown line)
legalRecompute at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:3791
lookupM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:6535
unwrapM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:1327
lookupM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:6537
unwrapM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:930
lookupM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:6537
unwrapM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:1066
lookupM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:6537
unwrapM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:1088
lookupM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:6537
branchToCorrespondingTarget at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:7738
createInvertedTerminator at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:3611
CreatePrimalAndGradient at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:4382
recursivelyHandleSubfunction at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:5744
visitCallInst at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:6611
visit at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:111 [inlined]
CreatePrimalAndGradient at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:4378
EnzymeCreatePrimalAndGradient at /workspace/srcdir/Enzyme/enzyme/Enzyme/CApi.cpp:615
EnzymeCreatePrimalAndGradient at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/api.jl:154
unknown function (ip: 0x7d12341e7f6b)
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
enzyme! at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/compiler.jl:3147
unknown function (ip: 0x7d12341e3828)
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
#codegen#487 at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/compiler.jl:5022
codegen at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/compiler.jl:4444 [inlined]
_thunk at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/compiler.jl:5707
_thunk at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/compiler.jl:5707 [inlined]
cached_compilation at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/compiler.jl:5741 [inlined]
#532 at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/compiler.jl:5807
#JuliaContext#149 at /home/avikpal/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:52
unknown function (ip: 0x7d123414fe06)
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
JuliaContext at /home/avikpal/.julia/packages/GPUCompiler/kqxyC/src/driver.jl:42
#s1946#531 at /home/avikpal/.julia/packages/Enzyme/wOi4l/src/compiler.jl:5759 [inlined]
#s1946#531 at ./none:0
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
GeneratedFunctionStub at ./boot.jl:602
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_call_staged at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/method.c:540
ijl_code_for_staged at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/method.c:593
get_staged at ./compiler/utilities.jl:123
retrieve_code_info at ./compiler/utilities.jl:135 [inlined]
InferenceState at ./compiler/inferencestate.jl:430
typeinf_edge at ./compiler/typeinfer.jl:920
abstract_call_method at ./compiler/abstractinterpretation.jl:629
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:95
abstract_call_known at ./compiler/abstractinterpretation.jl:2087
abstract_call at ./compiler/abstractinterpretation.jl:2169
abstract_call at ./compiler/abstractinterpretation.jl:2162
abstract_call at ./compiler/abstractinterpretation.jl:2354
abstract_eval_call at ./compiler/abstractinterpretation.jl:2370
abstract_eval_statement_expr at ./compiler/abstractinterpretation.jl:2380
abstract_eval_statement at ./compiler/abstractinterpretation.jl:2624
abstract_eval_basic_statement at ./compiler/abstractinterpretation.jl:2889
typeinf_local at ./compiler/abstractinterpretation.jl:3098
typeinf_nocycle at ./compiler/abstractinterpretation.jl:3186
_typeinf at ./compiler/typeinfer.jl:247
typeinf at ./compiler/typeinfer.jl:216
typeinf_edge at ./compiler/typeinfer.jl:930
abstract_call_method at ./compiler/abstractinterpretation.jl:629
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:95
abstract_call_known at ./compiler/abstractinterpretation.jl:2087
abstract_call at ./compiler/abstractinterpretation.jl:2169
abstract_apply at ./compiler/abstractinterpretation.jl:1612
abstract_call_known at ./compiler/abstractinterpretation.jl:2004
abstract_call at ./compiler/abstractinterpretation.jl:2169
abstract_call at ./compiler/abstractinterpretation.jl:2162
abstract_call at ./compiler/abstractinterpretation.jl:2354
abstract_eval_call at ./compiler/abstractinterpretation.jl:2370
abstract_eval_statement_expr at ./compiler/abstractinterpretation.jl:2380
abstract_eval_statement at ./compiler/abstractinterpretation.jl:2624
abstract_eval_basic_statement at ./compiler/abstractinterpretation.jl:2913
typeinf_local at ./compiler/abstractinterpretation.jl:3098
typeinf_nocycle at ./compiler/abstractinterpretation.jl:3186
_typeinf at ./compiler/typeinfer.jl:247
typeinf at ./compiler/typeinfer.jl:216
typeinf_ext at ./compiler/typeinfer.jl:1051
typeinf_ext_toplevel at ./compiler/typeinfer.jl:1082
typeinf_ext_toplevel at ./compiler/typeinfer.jl:1078
jfptr_typeinf_ext_toplevel_35682.1 at /home/avikpal/.julia/juliaup/julia-1.10.3+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_type_infer at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:394
jl_generate_fptr_impl at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/jitlayers.cpp:504
jl_compile_method_internal at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2481 [inlined]
jl_compile_method_internal at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2368
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2887 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/interpreter.c:617
jl_interpret_toplevel_thunk at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/toplevel.c:877
eval_body at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/interpreter.c:579
eval_body at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/interpreter.c:544
jl_interpret_toplevel_thunk at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/toplevel.c:877
jl_toplevel_eval_flex at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/toplevel.c:985
eval at ./boot.jl:385 [inlined]
eval_user_input at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
repl_backend_loop at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
#start_repl_backend#46 at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
start_repl_backend at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:228
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
#run_repl#59 at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:389
run_repl at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:375
jfptr_run_repl_91734.1 at /home/avikpal/.julia/juliaup/julia-1.10.3+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
#1013 at ./client.jl:432
jfptr_YY.1013_82700.1 at /home/avikpal/.julia/juliaup/julia-1.10.3+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_latest at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/builtins.c:812
#invokelatest#2 at ./essentials.jl:892 [inlined]
invokelatest at ./essentials.jl:889 [inlined]
run_main_repl at ./client.jl:416
exec_options at ./client.jl:333
_start at ./client.jl:552
jfptr__start_82726.1 at /home/avikpal/.julia/juliaup/julia-1.10.3+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
true_main at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/src/jlapi.c:731
main at /cache/build/builder-amdci4-2/julialang/julia-release-1-dot-10/cli/loader_exe.c:58
unknown function (ip: 0x7d12648b5ccf)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 41003533 (Pool: 40954080; Big: 49453); GC: 48

@avik-pal
Contributor Author

@wsmoses Added the crash logs. The one with the LLVM error seems to be too long, and I can't figure out how to redirect the logs when Julia crashes.

@wsmoses
Member

wsmoses commented May 11, 2024

@avik-pal I need to double-check, but I think the latter one is actually a bug in Polyester. It emits a gc_preserve_begin without a matching gc_preserve_end.

I've also been warned that Polyester messes with LLVM in ways that generate invalid code (which is indeed the case here).

cc @vchuravy

@wsmoses
Member

wsmoses commented May 11, 2024

Also @avik-pal, gelu is not defined when I run the reproducer:

julia> act = gelu
ERROR: UndefVarError: `gelu` not defined
Stacktrace:
 [1] top-level scope
   @ REPL[4]:1


@wsmoses
Member

wsmoses commented May 11, 2024

Okay, I have confirmed that the latter is unquestionably a bug in Polyester, not Enzyme.

polyester.ll.txt

Specifically, I ran:

julia> @code_llvm optimize=false raw=true loss_function(act, y, b)

You will see that there is a gc_preserve_begin call

        %380 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* %379, [2 x {} addrspace(10)*] %357), !dbg !473

whose token is never used and which has no matching gc_preserve_end.
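
As a rough illustration of that check, the sketch below dumps the unoptimized IR for a function into a string and counts the gc_preserve_begin and gc_preserve_end calls. The helper names `gc_preserve_counts` and `unoptimized_ir` are made up for this example and are not part of Enzyme, Polyester, or Julia. Comparing counts is only a heuristic: equal counts do not prove that every begin token is actually consumed by an end, so a mismatch is a hint to read the IR, not a verdict.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Hypothetical helper: count gc_preserve_begin / gc_preserve_end *calls*
# in a chunk of LLVM IR text. Requiring `call` on the same line skips
# any `declare` lines for the intrinsics.
function gc_preserve_counts(ir::AbstractString)
    begins = count(r"call [^\n]*@llvm\.julia\.gc_preserve_begin\(", ir)
    ends = count(r"call [^\n]*@llvm\.julia\.gc_preserve_end\(", ir)
    return (begins = begins, ends = ends)
end

# Hypothetical helper: capture the same output that
# `@code_llvm optimize=false raw=true f(args...)` prints, as a String.
function unoptimized_ir(f, args...)
    io = IOBuffer()
    code_llvm(io, f, map(typeof, args); optimize = false, raw = true)
    return String(take!(io))
end

# The call quoted above, checked in isolation: one begin, no end.
snippet = "%380 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* %379, [2 x {} addrspace(10)*] %357), !dbg !473\n"
println(gc_preserve_counts(snippet))
```

To inspect a real function, something like `gc_preserve_counts(unoptimized_ir(loss_function, act, y, b))` should work, assuming those arguments are defined as in the reproducer.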

@wsmoses
Member

wsmoses commented May 11, 2024

Posted here: JuliaSIMD/Polyester.jl#145

@wsmoses
Member

wsmoses commented May 11, 2024

@avik-pal Given that this issue is premised on invalid LLVM IR to begin with, I'm going to close it.

Please reopen if I'm mistaken and it can be reproduced without the invalid LLVM.

@wsmoses closed this as completed on May 11, 2024