Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

install make target #6

Closed
StefanKarpinski opened this issue Apr 27, 2011 · 15 comments
Closed

install make target #6

StefanKarpinski opened this issue Apr 27, 2011 · 15 comments
Assignees
Labels
building Build system, or building Julia or its dependencies

Comments

@StefanKarpinski
Copy link
Member

You should be able to do make install and have a compiled copy of all of julia installed on a system. The default location should be under /usr/local but should be configurable. It would be nice even for us developers to have both an installed (unbroken) copy of julia and have the development copy still live in the repo directory. That way when the development version is broken, you can still use julia for other stuff.

@ViralBShah
Copy link
Member

I feel comfortable that in the 1.0 release, people just build and run out of the source directory as we do now.

To install in /usr/local and have a stable + dev copy may require us to do a fair amount of testing with JULIA_HOME, paths etc.

In my opinion, this can be 2.0.

@ghost ghost assigned StefanKarpinski Jul 9, 2011
@StefanKarpinski
Copy link
Member Author

It just occurred to me that this could be as simple as automatically making a symlink in ~/bin/ otherwise the getting started section may not work since julia directory may not be in the path.

@ViralBShah
Copy link
Member

We do have a make install target now, and even ability to create debian packages. Closing this one.

@JeffBezanson
Copy link
Member

make install needs to be documented. I don't understand how to use it, since it copies files to $(DESTDIR)/usr/share/julia. So what do I set DESTDIR to? Not /, since the first thing it does is rm -fr $(DESTDIR). Looks dangerous...

@StefanKarpinski
Copy link
Member Author

I reopened this because no one besides Viral knows how this works and I'm not clear that it fully does. The intention of an install target is that you can build, make install, and then delete the build directory and have a standalone installation that continues to work without the source code, containing just the necessary executables and libraries.

@ViralBShah
Copy link
Member

It installs a working copy of julia in DESTDIR. DESTDIR is a standard variable used by autoconf and such, and also by the debian build system.

So yes, you can do make install DESTDIR=$HOME/julia, and then delete the sources and get a working julia. This is how the debian package is built also. The issue that Jeff raised is that we need to have a DESTDIR by default so that nothing silly is done.

Can you try it out, and once satisfied, close it?

@StefanKarpinski
Copy link
Member Author

Sure. We should probably default to /usr/local/julia since that's a standard destination for self-compiled installations (c.f. packaged installs). On Debian it can use /usr/share/julia since you said it didn't like /usr/local/julia for some reason.

@ViralBShah
Copy link
Member

Using /usr/share on debian and /usr/local elsewhere is a bit of a kludge, since you have to look for /etc/debian_version and do something different on debian. Also, people may not always have root access. In such cases, we could take the view that DESTDIR has to be explicitly passed.

-viral

On Jan 8, 2012, at 11:40 PM, Stefan Karpinski wrote:

Sure. We should probably default to /usr/local/julia since that's a standard destination for self-compiled installations (c.f. packaged installs). On Debian it can use /usr/share/julia since you said it didn't like /usr/local/julia for some reason.


Reply to this email directly or view it on GitHub:
#6 (comment)

@StefanKarpinski
Copy link
Member Author

The default install location when you use ./configure to install software is /usr/local. Defaulting to anything else is surprising and bad. Debian is different because we're distributing a package, which would be expected not to go under /usr/local since it's precompiled.

@ViralBShah
Copy link
Member

The Debian build process needs to invoke the install target, the way it is now.

-viral

On 09-Jan-2012, at 1:58 AM, Stefan Karpinskireply@reply.github.com wrote:

The default install location when you use ./configure to install software is /usr/local. Defaulting to anything else is surprising and bad. Debian is different because we're distributing a package, which would be expected not to go under /usr/local since it's precompiled.


Reply to this email directly or view it on GitHub:
#6 (comment)

@ViralBShah
Copy link
Member

Maybe we are getting to the point where we should start using autotools.

-viral

On 09-Jan-2012, at 1:58 AM, Stefan Karpinskireply@reply.github.com wrote:

The default install location when you use ./configure to install software is /usr/local. Defaulting to anything else is surprising and bad. Debian is different because we're distributing a package, which would be expected not to go under /usr/local since it's precompiled.


Reply to this email directly or view it on GitHub:
#6 (comment)

@StefanKarpinski
Copy link
Member Author

Ugh. Autotools is the worst. What would it provide us with?

On Jan 8, 2012, at 4:27 PM, "Viral B. Shah"reply@reply.github.com wrote:

Maybe we are getting to the point where we should start using autotools.

-viral

On 09-Jan-2012, at 1:58 AM, Stefan Karpinskireply@reply.github.com wrote:

The default install location when you use ./configure to install software is /usr/local. Defaulting to anything else is surprising and bad. Debian is different because we're distributing a package, which would be expected not to go under /usr/local since it's precompiled.


Reply to this email directly or view it on GitHub:
#6 (comment)


Reply to this email directly or view it on GitHub:
#6 (comment)

@ViralBShah
Copy link
Member

A lot of other build tools understand autotools, and everyone understands how to use it, interfaces etc. That would be the only reason to use it.

-viral

On Jan 9, 2012, at 3:35 AM, Stefan Karpinski wrote:

Ugh. Autotools is the worst. What would it provide us with?

On Jan 8, 2012, at 4:27 PM, "Viral B. Shah"reply@reply.github.com wrote:

Maybe we are getting to the point where we should start using autotools.

-viral

On 09-Jan-2012, at 1:58 AM, Stefan Karpinskireply@reply.github.com wrote:

The default install location when you use ./configure to install software is /usr/local. Defaulting to anything else is surprising and bad. Debian is different because we're distributing a package, which would be expected not to go under /usr/local since it's precompiled.


Reply to this email directly or view it on GitHub:
#6 (comment)


Reply to this email directly or view it on GitHub:
#6 (comment)


Reply to this email directly or view it on GitHub:
#6 (comment)

@StefanKarpinski
Copy link
Member Author

I guess the only thing it would really give us would be ./configure --prefix=WHATEVER that Debian and people in general would know how to interact with. Ok, well, what's the absolute bare minimum we can get away with using it? I really don't want to have to run my makefiles through automake and turn them into incomprehensible nonsense. Maybe we can have a Make.ac.inc generated from Make.ac.inc.ac that keeps the nonsense sequestered and sets up the variables that allow control of the install process?

ViralBShah added a commit that referenced this issue Jan 18, 2012
Update the Makefile's install target to not use rm, cp, etc. and
instead use the install command.

The default install path is $(DESTDIR)/usr/share/julia. In case
DESTDIR is not specified during make, the installation happens in
/usr/share/julia.

I would have liked this to be /usr/local/julia instead, but the debian
package creation scripts do not like this, and it calls the install
target in the Makefile to create the package. I think /usr/share/julia
is acceptable as a default for the time being.

I believe this sufficiently addresses issue #6 even though the solution
is not completely satisfactory.
@ViralBShah
Copy link
Member

As of commit 7dca4d7, we now have an install target that can be used by calling make install DESTDIR=..., and a dist target that creates a tarball.

The install target installs in $(DESTDIR)/usr/share/julia/, whereas the tarball creates all files in ./julia.

I think this just about takes care of everything discussed in this issue.

Keno added a commit that referenced this issue Nov 27, 2023
This is part of the work to address #51352 by attempting to allow the
compiler to perform SRAO on persistent data structures like
`PersistentDict` as if they were regular immutable data structures.
These sorts of data structures have very complicated internals (with
lots of mutation, memory sharing, etc.), but a relatively simple
interface. As such, it is unlikely that our compiler will have
sufficient power to optimize this interface by analyzing the
implementation.

We thus need to come up with some other mechanism that gives the
compiler license to perform the requisite optimization. One way would be
to just hardcode `PersistentDict` into the compiler, optimizing it like
any of the other builtin datatypes. However, this is of course very
unsatisfying. At the other end of the spectrum would be something like a
generic rewrite rule system (e-graphs anyone?) that would let the
PersistentDict implementation declare its interface to the compiler and
the compiler would use this for optimization (in a perfect world, the
actual rewrite would then be checked using some sort of formal methods).
I think that would be interesting, but we're very far from even being
able to design something like that (at least in Base - experiments with
external AbstractInterpreters in this direction are encouraged).

This PR tries to come up with a reasonable middle ground, where the
compiler gets some knowledge of the protocol hardcoded without having to
know about the implementation details of the data structure.

The basic ideas is that `Core` provides some magic generic functions
that implementations can extend. Semantically, they are not special.
They dispatch as usual, and implementations are expected to work
properly even in the absence of any compiler optimizations.

However, the compiler is semantically permitted to perform structural
optimization using these magic generic functions. In the concrete case,
this PR introduces the `KeyValue` interface which consists of two
generic functions, `get` and `set`. The core optimization is that the
compiler is allowed to rewrite any occurrence of `get(set(x, k, v), k)`
into `v` without additional legality checks. In particular, the compiler
performs no type checks, conversions, etc. The higher level
implementation code is expected to do all that.

This approach closely matches the general direction we've been taking in
external AbstractInterpreters for embedding additional semantics and
optimization opportunities into Julia code (although we generally use
methods there, rather than full generic functions), so I think we have
some evidence that this sort of approach works reasonably well.

Nevertheless, this is certainly an experiment and the interface is
explicitly declared unstable.

## Current Status

This is fully working and implemented, but the optimization currently
bails on anything but the simplest cases. Filling all those cases in is
not particularly hard, but should be done along with a more invasive
refactoring of SROA, so we should figure out the general direction here
first and then we can finish all that up in a follow-up cleanup.

## Obligatory benchmark
Before:
```
julia> using BenchmarkTools

julia> function foo()
           a = Base.PersistentDict(:a => 1)
           return a[:a]
       end
foo (generic function with 1 method)

julia> @benchmark foo()
BenchmarkTools.Trial: 10000 samples with 993 evaluations.
 Range (min … max):  32.940 ns …  28.754 μs  ┊ GC (min … max):  0.00% … 99.76%
 Time  (median):     49.647 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   57.519 ns ± 333.275 ns  ┊ GC (mean ± σ):  10.81% ±  2.22%

        ▃█▅               ▁▃▅▅▃▁                ▁▃▂   ▂
  ▁▂▄▃▅▇███▇▃▁▂▁▁▁▁▁▁▁▁▂▂▅██████▅▂▁▁▁▁▁▁▁▁▁▁▂▃▃▇███▇▆███▆▄▃▃▂▂ ▃
  32.9 ns         Histogram: frequency by time         68.6 ns <

 Memory estimate: 128 bytes, allocs estimate: 4.

julia> @code_typed foo()
CodeInfo(
1 ─ %1  = invoke Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}(Base.HashArrayMappedTries.undef::UndefInitializer, 1::Int64)::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
│   %2  = %new(Base.HashArrayMappedTries.HAMT{Symbol, Int64}, %1, 0x00000000)::Base.HashArrayMappedTries.HAMT{Symbol, Int64}
│   %3  = %new(Base.HashArrayMappedTries.Leaf{Symbol, Int64}, :a, 1)::Base.HashArrayMappedTries.Leaf{Symbol, Int64}
│   %4  = Base.getfield(%2, :data)::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
│   %5  = $(Expr(:boundscheck, true))::Bool
└──       goto #5 if not %5
2 ─ %7  = Base.sub_int(1, 1)::Int64
│   %8  = Base.bitcast(UInt64, %7)::UInt64
│   %9  = Base.getfield(%4, :size)::Tuple{Int64}
│   %10 = $(Expr(:boundscheck, true))::Bool
│   %11 = Base.getfield(%9, 1, %10)::Int64
│   %12 = Base.bitcast(UInt64, %11)::UInt64
│   %13 = Base.ult_int(%8, %12)::Bool
└──       goto #4 if not %13
3 ─       goto #5
4 ─ %16 = Core.tuple(1)::Tuple{Int64}
│         invoke Base.throw_boundserror(%4::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}, %16::Tuple{Int64})::Union{}
└──       unreachable
5 ┄ %19 = Base.getfield(%4, :ref)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
│   %20 = Base.memoryref(%19, 1, false)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
│         Base.memoryrefset!(%20, %3, :not_atomic, false)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
└──       goto #6
6 ─ %23 = Base.getfield(%2, :bitmap)::UInt32
│   %24 = Base.or_int(%23, 0x00010000)::UInt32
│         Base.setfield!(%2, :bitmap, %24)::UInt32
└──       goto #7
7 ─ %27 = %new(Base.PersistentDict{Symbol, Int64}, %2)::Base.PersistentDict{Symbol, Int64}
└──       goto #8
8 ─ %29 = invoke Base.getindex(%27::Base.PersistentDict{Symbol, Int64}, 🅰️:Symbol)::Int64
└──       return %29
```

After:
```
julia> using BenchmarkTools

julia> function foo()
           a = Base.PersistentDict(:a => 1)
           return a[:a]
       end
foo (generic function with 1 method)

julia> @benchmark foo()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  2.459 ns … 11.320 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.460 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.469 ns ±  0.183 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂    █                                              ▁    █ ▂
  █▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁█ █
  2.46 ns      Histogram: log(frequency) by time     2.47 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @code_typed foo()
CodeInfo(
1 ─     return 1
```
mkitti pushed a commit to mkitti/julia that referenced this issue Dec 9, 2023
This is part of the work to address JuliaLang#51352 by attempting to allow the
compiler to perform SRAO on persistent data structures like
`PersistentDict` as if they were regular immutable data structures.
These sorts of data structures have very complicated internals (with
lots of mutation, memory sharing, etc.), but a relatively simple
interface. As such, it is unlikely that our compiler will have
sufficient power to optimize this interface by analyzing the
implementation.

We thus need to come up with some other mechanism that gives the
compiler license to perform the requisite optimization. One way would be
to just hardcode `PersistentDict` into the compiler, optimizing it like
any of the other builtin datatypes. However, this is of course very
unsatisfying. At the other end of the spectrum would be something like a
generic rewrite rule system (e-graphs anyone?) that would let the
PersistentDict implementation declare its interface to the compiler and
the compiler would use this for optimization (in a perfect world, the
actual rewrite would then be checked using some sort of formal methods).
I think that would be interesting, but we're very far from even being
able to design something like that (at least in Base - experiments with
external AbstractInterpreters in this direction are encouraged).

This PR tries to come up with a reasonable middle ground, where the
compiler gets some knowledge of the protocol hardcoded without having to
know about the implementation details of the data structure.

The basic ideas is that `Core` provides some magic generic functions
that implementations can extend. Semantically, they are not special.
They dispatch as usual, and implementations are expected to work
properly even in the absence of any compiler optimizations.

However, the compiler is semantically permitted to perform structural
optimization using these magic generic functions. In the concrete case,
this PR introduces the `KeyValue` interface which consists of two
generic functions, `get` and `set`. The core optimization is that the
compiler is allowed to rewrite any occurrence of `get(set(x, k, v), k)`
into `v` without additional legality checks. In particular, the compiler
performs no type checks, conversions, etc. The higher level
implementation code is expected to do all that.

This approach closely matches the general direction we've been taking in
external AbstractInterpreters for embedding additional semantics and
optimization opportunities into Julia code (although we generally use
methods there, rather than full generic functions), so I think we have
some evidence that this sort of approach works reasonably well.

Nevertheless, this is certainly an experiment and the interface is
explicitly declared unstable.

## Current Status

This is fully working and implemented, but the optimization currently
bails on anything but the simplest cases. Filling all those cases in is
not particularly hard, but should be done along with a more invasive
refactoring of SROA, so we should figure out the general direction here
first and then we can finish all that up in a follow-up cleanup.

## Obligatory benchmark
Before:
```
julia> using BenchmarkTools

julia> function foo()
           a = Base.PersistentDict(:a => 1)
           return a[:a]
       end
foo (generic function with 1 method)

julia> @benchmark foo()
BenchmarkTools.Trial: 10000 samples with 993 evaluations.
 Range (min … max):  32.940 ns …  28.754 μs  ┊ GC (min … max):  0.00% … 99.76%
 Time  (median):     49.647 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   57.519 ns ± 333.275 ns  ┊ GC (mean ± σ):  10.81% ±  2.22%

        ▃█▅               ▁▃▅▅▃▁                ▁▃▂   ▂
  ▁▂▄▃▅▇███▇▃▁▂▁▁▁▁▁▁▁▁▂▂▅██████▅▂▁▁▁▁▁▁▁▁▁▁▂▃▃▇███▇▆███▆▄▃▃▂▂ ▃
  32.9 ns         Histogram: frequency by time         68.6 ns <

 Memory estimate: 128 bytes, allocs estimate: 4.

julia> @code_typed foo()
CodeInfo(
1 ─ %1  = invoke Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}(Base.HashArrayMappedTries.undef::UndefInitializer, 1::Int64)::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
│   %2  = %new(Base.HashArrayMappedTries.HAMT{Symbol, Int64}, %1, 0x00000000)::Base.HashArrayMappedTries.HAMT{Symbol, Int64}
│   %3  = %new(Base.HashArrayMappedTries.Leaf{Symbol, Int64}, :a, 1)::Base.HashArrayMappedTries.Leaf{Symbol, Int64}
│   %4  = Base.getfield(%2, :data)::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
│   %5  = $(Expr(:boundscheck, true))::Bool
└──       goto JuliaLang#5 if not %5
2 ─ %7  = Base.sub_int(1, 1)::Int64
│   %8  = Base.bitcast(UInt64, %7)::UInt64
│   %9  = Base.getfield(%4, :size)::Tuple{Int64}
│   %10 = $(Expr(:boundscheck, true))::Bool
│   %11 = Base.getfield(%9, 1, %10)::Int64
│   %12 = Base.bitcast(UInt64, %11)::UInt64
│   %13 = Base.ult_int(%8, %12)::Bool
└──       goto JuliaLang#4 if not %13
3 ─       goto JuliaLang#5
4 ─ %16 = Core.tuple(1)::Tuple{Int64}
│         invoke Base.throw_boundserror(%4::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}, %16::Tuple{Int64})::Union{}
└──       unreachable
5 ┄ %19 = Base.getfield(%4, :ref)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
│   %20 = Base.memoryref(%19, 1, false)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
│         Base.memoryrefset!(%20, %3, :not_atomic, false)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
└──       goto JuliaLang#6
6 ─ %23 = Base.getfield(%2, :bitmap)::UInt32
│   %24 = Base.or_int(%23, 0x00010000)::UInt32
│         Base.setfield!(%2, :bitmap, %24)::UInt32
└──       goto JuliaLang#7
7 ─ %27 = %new(Base.PersistentDict{Symbol, Int64}, %2)::Base.PersistentDict{Symbol, Int64}
└──       goto JuliaLang#8
8 ─ %29 = invoke Base.getindex(%27::Base.PersistentDict{Symbol, Int64}, 🅰️:Symbol)::Int64
└──       return %29
```

After:
```
julia> using BenchmarkTools

julia> function foo()
           a = Base.PersistentDict(:a => 1)
           return a[:a]
       end
foo (generic function with 1 method)

julia> @benchmark foo()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  2.459 ns … 11.320 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.460 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.469 ns ±  0.183 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂    █                                              ▁    █ ▂
  █▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁█ █
  2.46 ns      Histogram: log(frequency) by time     2.47 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @code_typed foo()
CodeInfo(
1 ─     return 1
```
quinnj pushed a commit that referenced this issue Jan 26, 2024
`@something` eagerly unwraps any `Some` given to it, while keeping the
variable between its arguments the same. This can be an issue if a
previously unpacked value is used as input to `@something`, leading to a
type instability on more than two arguments (e.g. because of a fallback
to `Some(nothing)`). By using different variables for each argument,
type inference has an easier time handling these cases that are isolated
to single branches anyway.

This also adds some comments to the macro, since it's non-obvious what
it does.

Benchmarking the specific case I encountered this in led to a ~2x
performance improvement on multiple machines.

1.10-beta3/master:

```
[sukera@tower 01]$ jl1100 -q --project=. -L 01.jl -e 'bench()'
v"1.10.0-beta3"

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  38.670 μs … 70.350 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     43.340 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   43.395 μs ±  1.518 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                              ▆█▂ ▁▁                           
  ▂▂▂▂▂▂▂▂▂▁▂▂▂▃▃▃▂▂▃▃▃▂▂▂▂▂▄▇███▆██▄▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  38.7 μs         Histogram: frequency by time          48 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

This PR:

```
[sukera@tower 01]$ julia -q --project=. -L 01.jl -e 'bench()'
v"1.11.0-DEV.970"

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  22.820 μs …  44.980 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     24.300 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.370 μs ± 832.239 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                ▂▅▇██▇▆▅▁                                       
  ▂▂▂▂▂▂▂▂▃▃▄▅▇███████████▅▄▃▃▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▂ ▃
  22.8 μs         Histogram: frequency by time         27.7 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
``` 


<details>
<summary>Benchmarking code (spoilers for Advent Of Code 2023 Day 01,
Part 01). Running this requires the input of that Advent Of Code
day.</summary>

```julia
using BenchmarkTools
using InteractiveUtils

isdigit(d::UInt8) = UInt8('0') <= d <= UInt8('9')
someDigit(c::UInt8) = isdigit(c) ? Some(c - UInt8('0')) : nothing

function part1(data)
    total = 0
    may_a = nothing
    may_b = nothing

    for c in data
        digitRes = someDigit(c)
        may_a = @something may_a digitRes Some(nothing)
        may_b = @something digitRes may_b Some(nothing)
        if c == UInt8('\n')
            digit_a = may_a::UInt8
            digit_b = may_b::UInt8
            total += digit_a*0xa + digit_b
            may_a = nothing
            may_b = nothing
        end
    end

    return total
end

function bench()
    data = read("input.txt")
    display(VERSION)
    println()
    display(@benchmark part1($data))
    nothing
end
```
</details>

<details>
<summary>`@code_warntype` before</summary>

```julia
julia> @code_warntype part1(data)
MethodInstance for part1(::Vector{UInt8})
  from part1(data) @ Main ~/Documents/projects/AOC/2023/01/01.jl:7
Arguments
  #self#::Core.Const(part1)
  data::Vector{UInt8}
Locals
  @_3::Union{Nothing, Tuple{UInt8, Int64}}
  may_b::Union{Nothing, UInt8}
  may_a::Union{Nothing, UInt8}
  total::Int64
  c::UInt8
  digit_b::UInt8
  digit_a::UInt8
  val@_10::Any
  val@_11::Any
  digitRes::Union{Nothing, Some{UInt8}}
  @_13::Union{Some{Nothing}, Some{UInt8}, UInt8}
  @_14::Union{Some{Nothing}, Some{UInt8}}
  @_15::Some{Nothing}
  @_16::Union{Some{Nothing}, Some{UInt8}, UInt8}
  @_17::Union{Some{Nothing}, UInt8}
  @_18::Some{Nothing}
Body::Int64
1 ──       (total = 0)
│          (may_a = Main.nothing)
│          (may_b = Main.nothing)
│    %4  = data::Vector{UInt8}
│          (@_3 = Base.iterate(%4))
│    %6  = (@_3 === nothing)::Bool
│    %7  = Base.not_int(%6)::Bool
└───       goto #24 if not %7
2 ┄─       Core.NewvarNode(:(digit_b))
│          Core.NewvarNode(:(digit_a))
│          Core.NewvarNode(:(val@_10))
│    %12 = @_3::Tuple{UInt8, Int64}
│          (c = Core.getfield(%12, 1))
│    %14 = Core.getfield(%12, 2)::Int64
│          (digitRes = Main.someDigit(c))
│          (val@_11 = may_a)
│    %17 = (val@_11::Union{Nothing, UInt8} !== Base.nothing)::Bool
└───       goto #4 if not %17
3 ──       (@_13 = val@_11::UInt8)
└───       goto #11
4 ──       (val@_11 = digitRes)
│    %22 = (val@_11::Union{Nothing, Some{UInt8}} !== Base.nothing)::Bool
└───       goto #6 if not %22
5 ──       (@_14 = val@_11::Some{UInt8})
└───       goto #10
6 ──       (val@_11 = Main.Some(Main.nothing))
│    %27 = (val@_11::Core.Const(Some(nothing)) !== Base.nothing)::Core.Const(true)
└───       goto #8 if not %27
7 ──       (@_15 = val@_11::Core.Const(Some(nothing)))
└───       goto #9
8 ──       Core.Const(:(@_15 = Base.nothing))
9 ┄─       (@_14 = @_15)
10 ┄       (@_13 = @_14)
11 ┄ %34 = @_13::Union{Some{Nothing}, Some{UInt8}, UInt8}
│          (may_a = Base.something(%34))
│          (val@_10 = digitRes)
│    %37 = (val@_10::Union{Nothing, Some{UInt8}} !== Base.nothing)::Bool
└───       goto #13 if not %37
12 ─       (@_16 = val@_10::Some{UInt8})
└───       goto #20
13 ─       (val@_10 = may_b)
│    %42 = (val@_10::Union{Nothing, UInt8} !== Base.nothing)::Bool
└───       goto #15 if not %42
14 ─       (@_17 = val@_10::UInt8)
└───       goto #19
15 ─       (val@_10 = Main.Some(Main.nothing))
│    %47 = (val@_10::Core.Const(Some(nothing)) !== Base.nothing)::Core.Const(true)
└───       goto #17 if not %47
16 ─       (@_18 = val@_10::Core.Const(Some(nothing)))
└───       goto #18
17 ─       Core.Const(:(@_18 = Base.nothing))
18 ┄       (@_17 = @_18)
19 ┄       (@_16 = @_17)
20 ┄ %54 = @_16::Union{Some{Nothing}, Some{UInt8}, UInt8}
│          (may_b = Base.something(%54))
│    %56 = c::UInt8
│    %57 = Main.UInt8('\n')::Core.Const(0x0a)
│    %58 = (%56 == %57)::Bool
└───       goto #22 if not %58
21 ─       (digit_a = Core.typeassert(may_a, Main.UInt8))
│          (digit_b = Core.typeassert(may_b, Main.UInt8))
│    %62 = total::Int64
│    %63 = (digit_a * 0x0a)::UInt8
│    %64 = (%63 + digit_b)::UInt8
│          (total = %62 + %64)
│          (may_a = Main.nothing)
└───       (may_b = Main.nothing)
22 ┄       (@_3 = Base.iterate(%4, %14))
│    %69 = (@_3 === nothing)::Bool
│    %70 = Base.not_int(%69)::Bool
└───       goto #24 if not %70
23 ─       goto #2
24 ┄       return total
```
</details>

<details>
<summary>`@code_native debuginfo=:none` Before </summary>

```julia
julia> @code_native debuginfo=:none part1(data)
	.text
	.file	"part1"
	.globl	julia_part1_418                 # -- Begin function julia_part1_418
	.p2align	4, 0x90
	.type	julia_part1_418,@function
julia_part1_418:                        # @julia_part1_418
# %bb.0:                                # %top
	push	rbp
	mov	rbp, rsp
	push	r15
	push	r14
	push	r13
	push	r12
	push	rbx
	sub	rsp, 40
	mov	rax, qword ptr [rdi + 8]
	test	rax, rax
	je	.LBB0_1
# %bb.2:                                # %L17
	mov	rcx, qword ptr [rdi]
	dec	rax
	mov	r10b, 1
	xor	r14d, r14d
                                        # implicit-def: $r12b
                                        # implicit-def: $r13b
                                        # implicit-def: $r9b
                                        # implicit-def: $sil
	mov	qword ptr [rbp - 64], rax       # 8-byte Spill
	mov	al, 1
	mov	dword ptr [rbp - 48], eax       # 4-byte Spill
                                        # implicit-def: $al
                                        # kill: killed $al
	xor	eax, eax
	mov	qword ptr [rbp - 56], rax       # 8-byte Spill
	mov	qword ptr [rbp - 72], rcx       # 8-byte Spill
                                        # implicit-def: $cl
	jmp	.LBB0_3
	.p2align	4, 0x90
.LBB0_8:                                #   in Loop: Header=BB0_3 Depth=1
	mov	dword ptr [rbp - 48], 0         # 4-byte Folded Spill
.LBB0_24:                               # %post_union_move
                                        #   in Loop: Header=BB0_3 Depth=1
	movzx	r13d, byte ptr [rbp - 41]       # 1-byte Folded Reload
	mov	r12d, r8d
	cmp	qword ptr [rbp - 64], r14       # 8-byte Folded Reload
	je	.LBB0_13
.LBB0_25:                               # %guard_exit113
                                        #   in Loop: Header=BB0_3 Depth=1
	inc	r14
	mov	r10d, ebx
.LBB0_3:                                # %L19
                                        # =>This Inner Loop Header: Depth=1
	mov	rax, qword ptr [rbp - 72]       # 8-byte Reload
	xor	ebx, ebx
	xor	edi, edi
	movzx	r15d, r9b
	movzx	ecx, cl
	movzx	esi, sil
	mov	r11b, 1
                                        # implicit-def: $r9b
	movzx	edx, byte ptr [rax + r14]
	lea	eax, [rdx - 58]
	lea	r8d, [rdx - 48]
	cmp	al, -10
	setae	bl
	setb	dil
	test	r10b, 1
	cmovne	r15d, edi
	mov	edi, 0
	cmovne	ecx, ebx
	mov	bl, 1
	cmovne	esi, edi
	test	r15b, 1
	jne	.LBB0_7
# %bb.4:                                # %L76
                                        #   in Loop: Header=BB0_3 Depth=1
	mov	r11b, 2
	test	cl, 1
	jne	.LBB0_5
# %bb.6:                                # %L78
                                        #   in Loop: Header=BB0_3 Depth=1
	mov	ebx, r10d
	mov	r9d, r15d
	mov	byte ptr [rbp - 41], r13b       # 1-byte Spill
	test	sil, 1
	je	.LBB0_26
.LBB0_7:                                # %L82
                                        #   in Loop: Header=BB0_3 Depth=1
	cmp	al, -11
	jbe	.LBB0_9
	jmp	.LBB0_8
	.p2align	4, 0x90
.LBB0_5:                                #   in Loop: Header=BB0_3 Depth=1
	mov	ecx, r8d
	mov	sil, 1
	xor	ebx, ebx
	mov	byte ptr [rbp - 41], r8b        # 1-byte Spill
	xor	r9d, r9d
	xor	ecx, ecx
	cmp	al, -11
	ja	.LBB0_8
.LBB0_9:                                # %L90
                                        #   in Loop: Header=BB0_3 Depth=1
	test	byte ptr [rbp - 48], 1          # 1-byte Folded Reload
	jne	.LBB0_23
# %bb.10:                               # %L115
                                        #   in Loop: Header=BB0_3 Depth=1
	cmp	dl, 10
	jne	.LBB0_11
# %bb.14:                               # %L122
                                        #   in Loop: Header=BB0_3 Depth=1
	test	r15b, 1
	jne	.LBB0_15
# %bb.12:                               # %L130.thread
                                        #   in Loop: Header=BB0_3 Depth=1
	movzx	eax, byte ptr [rbp - 41]        # 1-byte Folded Reload
	mov	bl, 1
	add	eax, eax
	lea	eax, [rax + 4*rax]
	add	al, r12b
	movzx	eax, al
	add	qword ptr [rbp - 56], rax       # 8-byte Folded Spill
	mov	al, 1
	mov	dword ptr [rbp - 48], eax       # 4-byte Spill
	cmp	qword ptr [rbp - 64], r14       # 8-byte Folded Reload
	jne	.LBB0_25
	jmp	.LBB0_13
	.p2align	4, 0x90
.LBB0_23:                               # %L115.thread
                                        #   in Loop: Header=BB0_3 Depth=1
	mov	al, 1
                                        # implicit-def: $r8b
	mov	dword ptr [rbp - 48], eax       # 4-byte Spill
	cmp	dl, 10
	jne	.LBB0_24
	jmp	.LBB0_21
.LBB0_11:                               #   in Loop: Header=BB0_3 Depth=1
	mov	r8d, r12d
	jmp	.LBB0_24
.LBB0_1:
	xor	eax, eax
	mov	qword ptr [rbp - 56], rax       # 8-byte Spill
.LBB0_13:                               # %L159
	mov	rax, qword ptr [rbp - 56]       # 8-byte Reload
	add	rsp, 40
	pop	rbx
	pop	r12
	pop	r13
	pop	r14
	pop	r15
	pop	rbp
	ret
.LBB0_21:                               # %L122.thread
	test	r15b, 1
	jne	.LBB0_15
# %bb.22:                               # %post_box_union58
	movabs	rdi, offset .L_j_str1
	movabs	rax, offset ijl_type_error
	movabs	rsi, 140008511215408
	movabs	rdx, 140008667209736
	call	rax
.LBB0_15:                               # %fail
	cmp	r11b, 1
	je	.LBB0_19
# %bb.16:                               # %fail
	movzx	eax, r11b
	cmp	eax, 2
	jne	.LBB0_17
# %bb.20:                               # %box_union54
	movzx	eax, byte ptr [rbp - 41]        # 1-byte Folded Reload
	movabs	rcx, offset jl_boxed_uint8_cache
	mov	rdx, qword ptr [rcx + 8*rax]
	jmp	.LBB0_18
.LBB0_26:                               # %L80
	movabs	rax, offset ijl_throw
	movabs	rdi, 140008495049392
	call	rax
.LBB0_19:                               # %box_union
	movabs	rdx, 140008667209736
	jmp	.LBB0_18
.LBB0_17:
	xor	edx, edx
.LBB0_18:                               # %post_box_union
	movabs	rdi, offset .L_j_str1
	movabs	rax, offset ijl_type_error
	movabs	rsi, 140008511215408
	call	rax
.Lfunc_end0:
	.size	julia_part1_418, .Lfunc_end0-julia_part1_418
                                        # -- End function
	.type	.L_j_str1,@object               # @_j_str1
	.section	.rodata.str1.1,"aMS",@progbits,1
.L_j_str1:
	.asciz	"typeassert"
	.size	.L_j_str1, 11

	.section	".note.GNU-stack","",@progbits
```
</details>

<details>
<summary>`@code_warntype` After</summary>

```julia

[sukera@tower 01]$ julia -q --project=. -L 01.jl
julia> data = read("input.txt");

julia> @code_warntype part1(data)
MethodInstance for part1(::Vector{UInt8})
  from part1(data) @ Main ~/Documents/projects/AOC/2023/01/01.jl:7
Arguments
  #self#::Core.Const(part1)
  data::Vector{UInt8}
Locals
  @_3::Union{Nothing, Tuple{UInt8, Int64}}
  may_b::Union{Nothing, UInt8}
  may_a::Union{Nothing, UInt8}
  total::Int64
  val@_7::Union{}
  val@_8::Union{}
  c::UInt8
  digit_b::UInt8
  digit_a::UInt8
  ##215::Some{Nothing}
  ##216::Union{Nothing, UInt8}
  ##217::Union{Nothing, Some{UInt8}}
  ##212::Some{Nothing}
  ##213::Union{Nothing, Some{UInt8}}
  ##214::Union{Nothing, UInt8}
  digitRes::Union{Nothing, Some{UInt8}}
  @_19::Union{Nothing, UInt8}
  @_20::Union{Nothing, UInt8}
  @_21::Nothing
  @_22::Union{Nothing, UInt8}
  @_23::Union{Nothing, UInt8}
  @_24::Nothing
Body::Int64
1 ──        (total = 0)
│           (may_a = Main.nothing)
│           (may_b = Main.nothing)
│    %4   = data::Vector{UInt8}
│           (@_3 = Base.iterate(%4))
│    %6   = @_3::Union{Nothing, Tuple{UInt8, Int64}}
│    %7   = (%6 === nothing)::Bool
│    %8   = Base.not_int(%7)::Bool
└───        goto #24 if not %8
2 ┄─        Core.NewvarNode(:(val@_7))
│           Core.NewvarNode(:(val@_8))
│           Core.NewvarNode(:(digit_b))
│           Core.NewvarNode(:(digit_a))
│           Core.NewvarNode(:(##215))
│           Core.NewvarNode(:(##216))
│           Core.NewvarNode(:(##217))
│           Core.NewvarNode(:(##212))
│           Core.NewvarNode(:(##213))
│    %19  = @_3::Tuple{UInt8, Int64}
│           (c = Core.getfield(%19, 1))
│    %21  = Core.getfield(%19, 2)::Int64
│    %22  = c::UInt8
│           (digitRes = Main.someDigit(%22))
│    %24  = may_a::Union{Nothing, UInt8}
│           (##214 = %24)
│    %26  = Base.:!::Core.Const(!)
│    %27  = ##214::Union{Nothing, UInt8}
│    %28  = Base.isnothing(%27)::Bool
│    %29  = (%26)(%28)::Bool
└───        goto #4 if not %29
3 ── %31  = ##214::UInt8
│           (@_19 = Base.something(%31))
└───        goto #11
4 ── %34  = digitRes::Union{Nothing, Some{UInt8}}
│           (##213 = %34)
│    %36  = Base.:!::Core.Const(!)
│    %37  = ##213::Union{Nothing, Some{UInt8}}
│    %38  = Base.isnothing(%37)::Bool
│    %39  = (%36)(%38)::Bool
└───        goto #6 if not %39
5 ── %41  = ##213::Some{UInt8}
│           (@_20 = Base.something(%41))
└───        goto #10
6 ── %44  = Main.Some::Core.Const(Some)
│    %45  = Main.nothing::Core.Const(nothing)
│           (##212 = (%44)(%45))
│    %47  = Base.:!::Core.Const(!)
│    %48  = ##212::Core.Const(Some(nothing))
│    %49  = Base.isnothing(%48)::Core.Const(false)
│    %50  = (%47)(%49)::Core.Const(true)
└───        goto #8 if not %50
7 ── %52  = ##212::Core.Const(Some(nothing))
│           (@_21 = Base.something(%52))
└───        goto #9
8 ──        Core.Const(nothing)
│           Core.Const(:(val@_8 = Base.something(Base.nothing)))
│           Core.Const(nothing)
│           Core.Const(:(val@_8))
└───        Core.Const(:(@_21 = %58))
9 ┄─ %60  = @_21::Core.Const(nothing)
└───        (@_20 = %60)
10 ┄ %62  = @_20::Union{Nothing, UInt8}
└───        (@_19 = %62)
11 ┄ %64  = @_19::Union{Nothing, UInt8}
│           (may_a = %64)
│    %66  = digitRes::Union{Nothing, Some{UInt8}}
│           (##217 = %66)
│    %68  = Base.:!::Core.Const(!)
│    %69  = ##217::Union{Nothing, Some{UInt8}}
│    %70  = Base.isnothing(%69)::Bool
│    %71  = (%68)(%70)::Bool
└───        goto #13 if not %71
12 ─ %73  = ##217::Some{UInt8}
│           (@_22 = Base.something(%73))
└───        goto #20
13 ─ %76  = may_b::Union{Nothing, UInt8}
│           (##216 = %76)
│    %78  = Base.:!::Core.Const(!)
│    %79  = ##216::Union{Nothing, UInt8}
│    %80  = Base.isnothing(%79)::Bool
│    %81  = (%78)(%80)::Bool
└───        goto #15 if not %81
14 ─ %83  = ##216::UInt8
│           (@_23 = Base.something(%83))
└───        goto #19
15 ─ %86  = Main.Some::Core.Const(Some)
│    %87  = Main.nothing::Core.Const(nothing)
│           (##215 = (%86)(%87))
│    %89  = Base.:!::Core.Const(!)
│    %90  = ##215::Core.Const(Some(nothing))
│    %91  = Base.isnothing(%90)::Core.Const(false)
│    %92  = (%89)(%91)::Core.Const(true)
└───        goto #17 if not %92
16 ─ %94  = ##215::Core.Const(Some(nothing))
│           (@_24 = Base.something(%94))
└───        goto #18
17 ─        Core.Const(nothing)
│           Core.Const(:(val@_7 = Base.something(Base.nothing)))
│           Core.Const(nothing)
│           Core.Const(:(val@_7))
└───        Core.Const(:(@_24 = %100))
18 ┄ %102 = @_24::Core.Const(nothing)
└───        (@_23 = %102)
19 ┄ %104 = @_23::Union{Nothing, UInt8}
└───        (@_22 = %104)
20 ┄ %106 = @_22::Union{Nothing, UInt8}
│           (may_b = %106)
│    %108 = Main.:(==)::Core.Const(==)
│    %109 = c::UInt8
│    %110 = Main.UInt8('\n')::Core.Const(0x0a)
│    %111 = (%108)(%109, %110)::Bool
└───        goto #22 if not %111
21 ─ %113 = may_a::Union{Nothing, UInt8}
│           (digit_a = Core.typeassert(%113, Main.UInt8))
│    %115 = may_b::Union{Nothing, UInt8}
│           (digit_b = Core.typeassert(%115, Main.UInt8))
│    %117 = Main.:+::Core.Const(+)
│    %118 = total::Int64
│    %119 = Main.:+::Core.Const(+)
│    %120 = Main.:*::Core.Const(*)
│    %121 = digit_a::UInt8
│    %122 = (%120)(%121, 0x0a)::UInt8
│    %123 = digit_b::UInt8
│    %124 = (%119)(%122, %123)::UInt8
│           (total = (%117)(%118, %124))
│           (may_a = Main.nothing)
└───        (may_b = Main.nothing)
22 ┄        (@_3 = Base.iterate(%4, %21))
│    %129 = @_3::Union{Nothing, Tuple{UInt8, Int64}}
│    %130 = (%129 === nothing)::Bool
│    %131 = Base.not_int(%130)::Bool
└───        goto #24 if not %131
23 ─        goto #2
24 ┄ %134 = total::Int64
└───        return %134
```
</details>


<details>
<summary>`@code_native debuginfo=:none` After </summary>

```julia

julia> @code_native debuginfo=:none part1(data)
	.text
	.file	"part1"
	.globl	julia_part1_1203                # -- Begin function julia_part1_1203
	.p2align	4, 0x90
	.type	julia_part1_1203,@function
julia_part1_1203:                       # @julia_part1_1203
; Function Signature: part1(Array{UInt8, 1})
# %bb.0:                                # %top
	#DEBUG_VALUE: part1:data <- [DW_OP_deref] $rdi
	push	rbp
	mov	rbp, rsp
	push	r15
	push	r14
	push	r13
	push	r12
	push	rbx
	sub	rsp, 40
	vxorps	xmm0, xmm0, xmm0
	#APP
	mov	rax, qword ptr fs:[0]
	#NO_APP
	lea	rdx, [rbp - 64]
	vmovaps	xmmword ptr [rbp - 64], xmm0
	mov	qword ptr [rbp - 48], 0
	mov	rcx, qword ptr [rax - 8]
	mov	qword ptr [rbp - 64], 4
	mov	rax, qword ptr [rcx]
	mov	qword ptr [rbp - 72], rcx       # 8-byte Spill
	mov	qword ptr [rbp - 56], rax
	mov	qword ptr [rcx], rdx
	#DEBUG_VALUE: part1:data <- [DW_OP_deref] 0
	mov	r15, qword ptr [rdi + 16]
	test	r15, r15
	je	.LBB0_1
# %bb.2:                                # %L34
	mov	r14, qword ptr [rdi]
	dec	r15
	mov	r11b, 1
	mov	r13b, 1
                                        # implicit-def: $r12b
                                        # implicit-def: $r10b
	xor	eax, eax
	jmp	.LBB0_3
	.p2align	4, 0x90
.LBB0_4:                                #   in Loop: Header=BB0_3 Depth=1
	xor	r11d, r11d
	mov	ebx, edi
	mov	r10d, r8d
.LBB0_9:                                # %L114
                                        #   in Loop: Header=BB0_3 Depth=1
	mov	r12d, esi
	test	r15, r15
	je	.LBB0_12
.LBB0_10:                               # %guard_exit126
                                        #   in Loop: Header=BB0_3 Depth=1
	inc	r14
	dec	r15
	mov	r13d, ebx
.LBB0_3:                                # %L36
                                        # =>This Inner Loop Header: Depth=1
	movzx	edx, byte ptr [r14]
	test	r13b, 1
	movzx	edi, r13b
	mov	ebx, 1
	mov	ecx, 0
	cmove	ebx, edi
	cmovne	edi, ecx
	movzx	ecx, r10b
	lea	esi, [rdx - 48]
	lea	r9d, [rdx - 58]
	movzx	r8d, sil
	cmove	r8d, ecx
	cmp	r9b, -11
	ja	.LBB0_4
# %bb.5:                                # %L89
                                        #   in Loop: Header=BB0_3 Depth=1
	test	r11b, 1
	jne	.LBB0_8
# %bb.6:                                # %L102
                                        #   in Loop: Header=BB0_3 Depth=1
	cmp	dl, 10
	jne	.LBB0_7
# %bb.13:                               # %L106
                                        #   in Loop: Header=BB0_3 Depth=1
	test	r13b, 1
	jne	.LBB0_14
# %bb.11:                               # %L114.thread
                                        #   in Loop: Header=BB0_3 Depth=1
	add	ecx, ecx
	mov	bl, 1
	mov	r11b, 1
	lea	ecx, [rcx + 4*rcx]
	add	cl, r12b
	movzx	ecx, cl
	add	rax, rcx
	test	r15, r15
	jne	.LBB0_10
	jmp	.LBB0_12
	.p2align	4, 0x90
.LBB0_8:                                # %L102.thread
                                        #   in Loop: Header=BB0_3 Depth=1
	mov	r11b, 1
                                        # implicit-def: $sil
	cmp	dl, 10
	jne	.LBB0_9
	jmp	.LBB0_15
.LBB0_7:                                #   in Loop: Header=BB0_3 Depth=1
	mov	esi, r12d
	jmp	.LBB0_9
.LBB0_1:
	xor	eax, eax
.LBB0_12:                               # %L154
	mov	rcx, qword ptr [rbp - 56]
	mov	rdx, qword ptr [rbp - 72]       # 8-byte Reload
	mov	qword ptr [rdx], rcx
	add	rsp, 40
	pop	rbx
	pop	r12
	pop	r13
	pop	r14
	pop	r15
	pop	rbp
	ret
.LBB0_15:                               # %L106.thread
	test	r13b, 1
	jne	.LBB0_14
# %bb.16:                               # %post_box_union47
	movabs	rax, offset jl_nothing
	movabs	rcx, offset jl_small_typeof
	movabs	rdi, offset ".L_j_str_typeassert#1"
	mov	rdx, qword ptr [rax]
	mov	rsi, qword ptr [rcx + 336]
	movabs	rax, offset ijl_type_error
	mov	qword ptr [rbp - 48], rsi
	call	rax
.LBB0_14:                               # %post_box_union
	movabs	rax, offset jl_nothing
	movabs	rcx, offset jl_small_typeof
	movabs	rdi, offset ".L_j_str_typeassert#1"
	mov	rdx, qword ptr [rax]
	mov	rsi, qword ptr [rcx + 336]
	movabs	rax, offset ijl_type_error
	mov	qword ptr [rbp - 48], rsi
	call	rax
.Lfunc_end0:
	.size	julia_part1_1203, .Lfunc_end0-julia_part1_1203
                                        # -- End function
	.type	".L_j_str_typeassert#1",@object # @"_j_str_typeassert#1"
	.section	.rodata.str1.1,"aMS",@progbits,1
".L_j_str_typeassert#1":
	.asciz	"typeassert"
	.size	".L_j_str_typeassert#1", 11

	.section	".note.GNU-stack","",@progbits
```
</details>

Co-authored-by: Sukera <Seelengrab@users.noreply.github.com>
maleadt added a commit that referenced this issue Apr 17, 2024
The former also handles vectors of pointers, which can occur after vectorization:

```
#5  0x00007f5bfe94de5e in llvm::cast<llvm::PointerType, llvm::Type> (Val=<optimized out>) at llvm/Support/Casting.h:578
578	  assert(isa<To>(Val) && "cast<Ty>() argument of incompatible type!");

(rr) up
#6  GCInvariantVerifier::visitAddrSpaceCastInst (this=this@entry=0x7ffd022fbf56, I=...) at julia/src/llvm-gc-invariant-verifier.cpp:66
66	    unsigned ToAS = cast<PointerType>(I.getDestTy())->getAddressSpace();

(rr) call I.dump()
%23 = addrspacecast <4 x ptr addrspace(10)> %wide.load to <4 x ptr addrspace(11)>, !dbg !43
```
maleadt added a commit that referenced this issue Apr 18, 2024
The former also handles vectors of pointers, which can occur after vectorization:

```
#5  0x00007f5bfe94de5e in llvm::cast<llvm::PointerType, llvm::Type> (Val=<optimized out>) at llvm/Support/Casting.h:578
578	  assert(isa<To>(Val) && "cast<Ty>() argument of incompatible type!");

(rr) up
#6  GCInvariantVerifier::visitAddrSpaceCastInst (this=this@entry=0x7ffd022fbf56, I=...) at julia/src/llvm-gc-invariant-verifier.cpp:66
66	    unsigned ToAS = cast<PointerType>(I.getDestTy())->getAddressSpace();

(rr) call I.dump()
%23 = addrspacecast <4 x ptr addrspace(10)> %wide.load to <4 x ptr addrspace(11)>, !dbg !43
```
giordano pushed a commit that referenced this issue Apr 19, 2024
…ce. (#54113)

The former also handles vectors of pointers, which can occur after
vectorization:

```
#5  0x00007f5bfe94de5e in llvm::cast<llvm::PointerType, llvm::Type> (Val=<optimized out>) at llvm/Support/Casting.h:578
578	  assert(isa<To>(Val) && "cast<Ty>() argument of incompatible type!");

(rr) up
#6  GCInvariantVerifier::visitAddrSpaceCastInst (this=this@entry=0x7ffd022fbf56, I=...) at julia/src/llvm-gc-invariant-verifier.cpp:66
66	    unsigned ToAS = cast<PointerType>(I.getDestTy())->getAddressSpace();

(rr) call I.dump()
%23 = addrspacecast <4 x ptr addrspace(10)> %wide.load to <4 x ptr addrspace(11)>, !dbg !43
```

Fixes aborts seen in #53070
aviatesk added a commit that referenced this issue Oct 1, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 1, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 2, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 2, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 2, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 4, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 4, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 4, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 5, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
topolarity added a commit to topolarity/julia that referenced this issue Oct 7, 2024
This makes it much easier to spot dynamic dispatches:
```julia
3 ── %9  = (isa)(%4, @NamedTuple{x::Int64, y})::Bool
└───       goto JuliaLang#5 if not %9
4 ── %11 = π (%4, @NamedTuple{x::Int64, y})
└───       goto JuliaLang#6
5 ── %13 = (Tuple{Int64, Any})(%4)::Tuple{Int64, Any}
│    %14 = (getfield)(%13, 1)::Int64
│    %15 = (getfield)(%13, 2)::Any
│    %16 = %new(@NamedTuple{x::Int64, y}, %14, %15)::@NamedTuple{x::Int64, y}
```

is now:
```julia
3 ── %9  = builtin (isa)(%4, @NamedTuple{x::Int64, y})::Bool
└───       goto JuliaLang#5 if not %9
4 ── %11 = π (%4, @NamedTuple{x::Int64, y})
└───       goto JuliaLang#6
5 ── %13 = dynamic (Tuple{Int64, Any})(%4)::Tuple{Int64, Any}
│    %14 = builtin (getfield)(%13, 1)::Int64
│    %15 = builtin (getfield)(%13, 2)::Any
│    %16 = %new(@NamedTuple{x::Int64, y}, %14, %15)::@NamedTuple{x::Int64, y}
```
aviatesk added a commit that referenced this issue Oct 9, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
topolarity added a commit to topolarity/julia that referenced this issue Oct 10, 2024
This makes it much easier to spot dynamic dispatches:
```julia
3 ── %9  = (isa)(%4, @NamedTuple{x::Int64, y})::Bool
└───       goto JuliaLang#5 if not %9
4 ── %11 = π (%4, @NamedTuple{x::Int64, y})
└───       goto JuliaLang#6
5 ── %13 = (Tuple{Int64, Any})(%4)::Tuple{Int64, Any}
│    %14 = (getfield)(%13, 1)::Int64
│    %15 = (getfield)(%13, 2)::Any
│    %16 = %new(@NamedTuple{x::Int64, y}, %14, %15)::@NamedTuple{x::Int64, y}
```

is now:
```julia
3 ── %9  = builtin (isa)(%4, @NamedTuple{x::Int64, y})::Bool
└───       goto JuliaLang#5 if not %9
4 ── %11 = π (%4, @NamedTuple{x::Int64, y})
└───       goto JuliaLang#6
5 ── %13 = dynamic (Tuple{Int64, Any})(%4)::Tuple{Int64, Any}
│    %14 = builtin (getfield)(%13, 1)::Int64
│    %15 = builtin (getfield)(%13, 2)::Any
│    %16 = %new(@NamedTuple{x::Int64, y}, %14, %15)::@NamedTuple{x::Int64, y}
```
aviatesk added a commit that referenced this issue Oct 11, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 11, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
topolarity added a commit to topolarity/julia that referenced this issue Oct 11, 2024
This makes it much easier to spot dynamic dispatches:
```julia
3 ── %9  = (isa)(%4, @NamedTuple{x::Int64, y})::Bool
└───       goto JuliaLang#5 if not %9
4 ── %11 = π (%4, @NamedTuple{x::Int64, y})
└───       goto JuliaLang#6
5 ── %13 = (Tuple{Int64, Any})(%4)::Tuple{Int64, Any}
│    %14 = (getfield)(%13, 1)::Int64
│    %15 = (getfield)(%13, 2)::Any
│    %16 = %new(@NamedTuple{x::Int64, y}, %14, %15)::@NamedTuple{x::Int64, y}
```

is now:
```julia
3 ── %9  = builtin (isa)(%4, @NamedTuple{x::Int64, y})::Bool
└───       goto JuliaLang#5 if not %9
4 ── %11 = π (%4, @NamedTuple{x::Int64, y})
└───       goto JuliaLang#6
5 ── %13 = dynamic (Tuple{Int64, Any})(%4)::Tuple{Int64, Any}
│    %14 = builtin (getfield)(%13, 1)::Int64
│    %15 = builtin (getfield)(%13, 2)::Any
│    %16 = %new(@NamedTuple{x::Int64, y}, %14, %15)::@NamedTuple{x::Int64, y}
```

This is on by default when displaying a CodeInfo, and  off by default
for `code_warntype`, unless optimize=true. Can be enabled / disabled
via IRShowConfig.label_dynamic_calls
topolarity added a commit to topolarity/julia that referenced this issue Oct 11, 2024
This makes it much easier to spot dynamic dispatches:
```julia
3 ── %9  = (isa)(%4, @NamedTuple{x::Int64, y})::Bool
└───       goto JuliaLang#5 if not %9
4 ── %11 = π (%4, @NamedTuple{x::Int64, y})
└───       goto JuliaLang#6
5 ── %13 = (Tuple{Int64, Any})(%4)::Tuple{Int64, Any}
│    %14 = (getfield)(%13, 1)::Int64
│    %15 = (getfield)(%13, 2)::Any
│    %16 = %new(@NamedTuple{x::Int64, y}, %14, %15)::@NamedTuple{x::Int64, y}
```

is now:
```julia
3 ── %9  = builtin (isa)(%4, @NamedTuple{x::Int64, y})::Bool
└───       goto JuliaLang#5 if not %9
4 ── %11 = π (%4, @NamedTuple{x::Int64, y})
└───       goto JuliaLang#6
5 ── %13 = dynamic (Tuple{Int64, Any})(%4)::Tuple{Int64, Any}
│    %14 = builtin (getfield)(%13, 1)::Int64
│    %15 = builtin (getfield)(%13, 2)::Any
│    %16 = %new(@NamedTuple{x::Int64, y}, %14, %15)::@NamedTuple{x::Int64, y}
```

This is on by default when displaying a CodeInfo, and  off by default
for `code_warntype`, unless optimize=true. Can be enabled / disabled
via IRShowConfig.label_dynamic_calls
topolarity added a commit to topolarity/julia that referenced this issue Oct 11, 2024
This makes it much easier to spot dynamic dispatches:
```julia
3 ── %9  = (isa)(%4, @NamedTuple{x::Int64, y})::Bool
└───       goto JuliaLang#5 if not %9
4 ── %11 = π (%4, @NamedTuple{x::Int64, y})
└───       goto JuliaLang#6
5 ── %13 = (Tuple{Int64, Any})(%4)::Tuple{Int64, Any}
│    %14 = (getfield)(%13, 1)::Int64
│    %15 = (getfield)(%13, 2)::Any
│    %16 = %new(@NamedTuple{x::Int64, y}, %14, %15)::@NamedTuple{x::Int64, y}
```

is now:
```julia
3 ── %9  = builtin (isa)(%4, @NamedTuple{x::Int64, y})::Bool
└───       goto JuliaLang#5 if not %9
4 ── %11 = π (%4, @NamedTuple{x::Int64, y})
└───       goto JuliaLang#6
5 ── %13 = dynamic (Tuple{Int64, Any})(%4)::Tuple{Int64, Any}
│    %14 = builtin (getfield)(%13, 1)::Int64
│    %15 = builtin (getfield)(%13, 2)::Any
│    %16 = %new(@NamedTuple{x::Int64, y}, %14, %15)::@NamedTuple{x::Int64, y}
```

This is on by default when displaying a CodeInfo, and  off by default
for `code_warntype`, unless optimize=true. Can be enabled / disabled
via IRShowConfig.label_dynamic_calls
aviatesk added a commit that referenced this issue Oct 12, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 15, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 16, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 16, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked
as `broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical. In
particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
for limited frames because of latency reasons, which significantly
reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
algorithm requires `:nothrow`-ness on all paths from the allocation of
the mutable struct to its last use, which is not practical for
real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
optimizations such as inserting a `finalize` call after the last use
might still be possible (#55990).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
building Build system, or building Julia or its dependencies
Projects
None yet
Development

No branches or pull requests

3 participants