
更新 (Update) #1

Open
zionfuo wants to merge 10,000 commits into master

Conversation

@zionfuo commented Oct 23, 2018

No description provided.

aviatesk and others added 30 commits June 26, 2024 19:39
This adds the option to pass a filename of configuration settings when
building the Core/compiler system image (from
`base/compiler/compiler.jl`). This makes it easier to build different
flavors of images, for example it can replace the hack that
PackageCompiler uses to edit the list of included stdlibs, and makes it
easy to change knobs you might want, like max_methods.
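For illustration, a settings file might look like the following; this is a hypothetical sketch assuming the file is plain Julia code that gets included into the compiler build, and the names shown are illustrative rather than the actual knobs:
```julia
# hypothetical compiler-build settings file; every name here is illustrative
# and assumes the file is `include`d as ordinary Julia code during the build
const MAX_METHODS = 1          # tighten an inference knob such as max_methods
const INCLUDED_STDLIBS = [     # trim the stdlib list, replacing the patching
    "LinearAlgebra",           # hack that PackageCompiler uses
    "Random",
]
```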
Seems like this got stale after the Memory work.
…54946)

We don't store anything in the lowest two bits of `sz` after
#49644.
This should have no functional changes; however, it will affect the
version of non-stdlib JLLs.

I'd like to see if we can add this as a backport candidate to 1.11 since
it doesn't change Julia functionality at all, but does allow some
non-stdlib JLLs to be kept current. Otherwise at least the SPEX linear
solvers and the ParU linear solvers will be missing multiple significant
features until 1.12.
This uses the same approach as the existing findnext and findprev
functions in the same file.

The following benchmark:
```julia
using BenchmarkTools
s = join(rand('A':'z', 10000));
@btime findall(==('c'), s);
```
Gives these results:
* This PR: 3.489 μs
* 1.11-beta1: 31.970 μs
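A minimal sketch of the approach described above (not the actual Base implementation): drive `findall` with repeated `findnext` calls, which already handle UTF-8 index arithmetic:
```julia
# sketch: collect matches by chaining findnext, mirroring how the existing
# findnext/findprev code in the same file walks a string
function findall_by_findnext(pred, s::AbstractString)
    out = Int[]
    i = findfirst(pred, s)
    while i !== nothing
        push!(out, i)
        i = findnext(pred, s, nextind(s, i))
    end
    return out
end

findall_by_findnext(==('c'), "abcabc")  # [3, 6]
```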
…54951)

We may use the knowledge that `alpha != 0` at the call site to hard-code
`alpha = true` in the `MulAddMul` constructor if `alpha isa Bool`. This
eliminates the `!isone(alpha)` branches in `@stable_muladdmul`, and
reduces latency in matrix multiplication.

```julia
julia> using LinearAlgebra

julia> A = rand(2,2);

julia> @time A * A;
  0.596825 seconds (1.05 M allocations: 53.458 MiB, 5.94% gc time, 99.95% compilation time) # nightly v"1.12.0-DEV.789"
  0.473140 seconds (793.52 k allocations: 39.946 MiB, 3.28% gc time, 99.93% compilation time) # this PR
``` 
In a separate session,
```julia
julia> @time A * Symmetric(A);
  0.829252 seconds (2.37 M allocations: 120.051 MiB, 1.98% gc time, 99.98% compilation time) # nightly v"1.12.0-DEV.789"
  0.712953 seconds (2.06 M allocations: 103.951 MiB, 2.17% gc time, 99.98% compilation time) # This PR
```
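A minimal sketch of the idea, using a stand-in for LinearAlgebra's internal `MulAddMul` (the real constructor differs): once the call site has established `alpha != 0`, a `Bool` alpha can only be `true`, so it can be baked in as a compile-time constant:
```julia
# stand-in for LinearAlgebra's internal MulAddMul wrapper
struct MulAddMulSketch{A,B}
    alpha::A
    beta::B
end

function make_muladdmul(alpha, beta)
    if alpha isa Bool
        # the call site guarantees alpha != 0, so alpha === true;
        # hard-coding it lets the compiler drop the !isone(alpha) branches
        return MulAddMulSketch(true, beta)
    end
    return MulAddMulSketch(alpha, beta)
end
```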
When working on Base, if you break inference (in a way that preserves
correctness, but not precision), it would be nice if the system
bootstrapped anyway, since it's easier to poke at the system if the REPL
is running. However, there were a few places where we were relying on
the inferred element type for empty collections while passing those
values to callees with narrow type signatures. Switch these to
comprehensions with declared type instead, so that even if inference is
(temporarily) borked, things will still bootstrap fine.
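The pattern being switched to looks like this; the typed comprehension pins the element type regardless of what inference can currently prove about the body:
```julia
xs = Any[]
ys = Int[x + 1 for x in xs]   # eltype(ys) === Int even though xs is empty
# whereas `[x + 1 for x in xs]` relies on inference for its eltype, which can
# widen (e.g. to Any) if inference is temporarily broken during bootstrap
```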
We were calling `repr` here to interpolate the character with the quotes
into the error message. However, this is overkill for this application,
and `repr` introduces dynamic dispatch into the call. This PR hard-codes
the quotes into the string, which matches the pattern followed in the
other error messages following `chkvalidparam`.
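A hedged illustration of the change (the message text here is made up, not the actual error string):
```julia
c = 'Z'
# before: repr interpolates the quoted character but adds a dynamic dispatch
msg = "received invalid character $(repr(c))"   # "received invalid character 'Z'"
# after: the quotes are hard-coded into the string, matching the other
# chkvalidparam messages
msg = "received invalid character '$c'"         # same text, no repr call
```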
…54916)

In Cassette-like systems, where inference has to infer many calls of
`@generated` function and the generated function involves complex code
transformations, the overhead from code generation itself can become
significant. This is because the results of code generation are not
cached, leading to duplicated code generation in the following contexts:
- `method_for_inference_heuristics` for regular inference on cached
`@generated` function calls (since
`method_for_inference_limit_heuristics` isn't stored in cached optimized
sources, but is attached to generated unoptimized sources).
- `retrieval_code_info` for constant propagation on cached `@generated`
function calls.

Having said that, caching unoptimized sources generated by `@generated`
functions is not a good tradeoff in general cases, considering the
memory space consumed (and the image bloat). The code generation for
generators like `GeneratedFunctionStub` produced by the front end is
generally very simple, and the first duplicated code generation
mentioned above does not occur for `GeneratedFunctionStub`.

So this unoptimized source caching should be enabled in an opt-in
manner.

Based on this idea, this commit defines the trait `abstract type
Core.CachedGenerator` as an interface for the external systems to
opt-in. If the generator is a subtype of this trait, inference caches
the generated unoptimized code, sacrificing memory space to improve the
performance of subsequent inferences. Specifically, the mechanism for
caching the unoptimized source uses the infrastructure already
implemented in #54362. Thanks to #54362,
the cache for generated functions is now partitioned by world age, so
even if the unoptimized source is cached, the existing invalidation
system will invalidate it as expected.
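A minimal sketch of opting in from an external system; the generator type here is hypothetical:
```julia
# a Cassette-like system opts in by subtyping the trait
struct MyCassetteGenerator <: Core.CachedGenerator
    # state for the code transformation...
end

# Because MyCassetteGenerator <: Core.CachedGenerator, inference caches the
# unoptimized source it produces; the world-age-partitioned cache from #54362
# invalidates the entries as usual.
```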

In JuliaDebug/CassetteOverlay.jl#56, the following benchmark results
showed that approximately 1.5~3x inference speedup is achieved by opting
into this feature:

## Setup
```julia
using CassetteOverlay, BaseBenchmarks, BenchmarkTools

@MethodTable table;
pass = @overlaypass table;
BaseBenchmarks.load!("inference");
benchfunc1() = sin(42)
benchfunc2(xs, x) = findall(>(x), abs.(xs))
interp = BaseBenchmarks.InferenceBenchmarks.InferenceBenchmarker()

# benchmark inference on entire call graphs from scratch
@benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call pass(benchfunc1)
@benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call pass(benchfunc2, rand(10), 0.5)

# benchmark inference on the call graphs with most of them cached
@benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call interp=interp pass(benchfunc1)
@benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call interp=interp pass(benchfunc2, rand(10), 0.5)
```

## Benchmark inference on entire call graphs from scratch
> on master
```
julia> @benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call pass(benchfunc1)
BenchmarkTools.Trial: 61 samples with 1 evaluation.
 Range (min … max):  78.574 ms … 87.653 ms  ┊ GC (min … max): 0.00% … 8.81%
 Time  (median):     83.149 ms              ┊ GC (median):    4.85%
 Time  (mean ± σ):   82.138 ms ±  2.366 ms  ┊ GC (mean ± σ):  3.36% ± 2.65%

  ▂ ▂▂ █     ▂                     █ ▅     ▅
  █▅██▅█▅▁█▁▁█▁▁▁▁▅▁▁▁▁▁▁▁▁▅▁▁▅██▅▅█████████▁█▁▅▁▁▁▁▁▁▁▁▁▁▁▁▅ ▁
  78.6 ms         Histogram: frequency by time        86.8 ms <

 Memory estimate: 52.32 MiB, allocs estimate: 1201192.

julia> @benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call pass(benchfunc2, rand(10), 0.5)
BenchmarkTools.Trial: 4 samples with 1 evaluation.
 Range (min … max):  1.345 s …  1.369 s  ┊ GC (min … max): 2.45% … 3.39%
 Time  (median):     1.355 s             ┊ GC (median):    2.98%
 Time  (mean ± σ):   1.356 s ± 9.847 ms  ┊ GC (mean ± σ):  2.96% ± 0.41%

  █                   █     █                            █
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.35 s        Histogram: frequency by time        1.37 s <

 Memory estimate: 637.96 MiB, allocs estimate: 15159639.
```
> with this PR
```
julia> @benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call pass(benchfunc1)
BenchmarkTools.Trial: 230 samples with 1 evaluation.
 Range (min … max):  19.339 ms … 82.521 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.938 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   21.665 ms ±  4.666 ms  ┊ GC (mean ± σ):  6.72% ± 8.80%

  ▃▇█▇▄                     ▂▂▃▃▄
  █████▇█▇▆▅▅▆▅▅▁▅▁▁▁▁▁▁▁▁▁██████▆▁█▁▅▇▆▁▅▁▁▅▁▅▁▁▁▁▁▁▅▁▁▁▁▁▁▅ ▆
  19.3 ms      Histogram: log(frequency) by time      29.4 ms <

 Memory estimate: 28.67 MiB, allocs estimate: 590138.

julia> @benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call pass(benchfunc2, rand(10), 0.5)
BenchmarkTools.Trial: 14 samples with 1 evaluation.
 Range (min … max):  354.585 ms … 390.400 ms  ┊ GC (min … max): 0.00% … 7.01%
 Time  (median):     368.778 ms               ┊ GC (median):    3.74%
 Time  (mean ± σ):   368.824 ms ±   8.853 ms  ┊ GC (mean ± σ):  3.70% ± 1.89%

             ▃            █
  ▇▁▁▁▁▁▁▁▁▁▁█▁▇▇▁▁▁▁▇▁▁▁▁█▁▁▁▁▇▁▁▇▁▁▇▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
  355 ms           Histogram: frequency by time          390 ms <

 Memory estimate: 227.86 MiB, allocs estimate: 4689830.
```

## Benchmark inference on the call graphs with most of them cached
> on master
```
julia> @benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call interp=interp pass(benchfunc1)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  45.166 μs …  9.799 ms  ┊ GC (min … max): 0.00% … 98.96%
 Time  (median):     46.792 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   48.339 μs ± 97.539 μs  ┊ GC (mean ± σ):  2.01% ±  0.99%

    ▁▂▄▆▆▇███▇▆▅▄▃▄▄▂▂▂▁▁▁   ▁▁▂▂▁ ▁ ▂▁ ▁                     ▃
  ▃▇██████████████████████▇████████████████▇█▆▇▇▆▆▆▅▆▆▆▇▆▅▅▅▆ █
  45.2 μs      Histogram: log(frequency) by time        55 μs <

 Memory estimate: 25.27 KiB, allocs estimate: 614.

julia> @benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call interp=interp pass(benchfunc2, rand(10), 0.5)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  303.375 μs …  16.582 ms  ┊ GC (min … max): 0.00% … 97.38%
 Time  (median):     317.625 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   338.772 μs ± 274.164 μs  ┊ GC (mean ± σ):  5.44% ±  7.56%

       ▃▆██▇▅▂▁
  ▂▂▄▅██████████▇▆▅▅▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▁▂▁▁▂▁▂▂ ▃
  303 μs           Histogram: frequency by time          394 μs <

 Memory estimate: 412.80 KiB, allocs estimate: 6224.
```
> with this PR
```
@benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call interp=interp pass(benchfunc1)
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):  5.444 μs …  1.808 ms  ┊ GC (min … max): 0.00% … 99.01%
 Time  (median):     5.694 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.228 μs ± 25.393 μs  ┊ GC (mean ± σ):  5.73% ±  1.40%

      ▄█▇▄
  ▁▂▄█████▇▄▃▃▃▂▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  5.44 μs        Histogram: frequency by time        7.47 μs <

 Memory estimate: 8.72 KiB, allocs estimate: 196.

julia> @benchmark BaseBenchmarks.InferenceBenchmarks.@inf_call interp=interp pass(benchfunc2, rand(10), 0.5)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  211.000 μs …  36.187 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     223.000 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   280.025 μs ± 750.097 μs  ┊ GC (mean ± σ):  6.86% ± 7.16%

  █▆▄▂▁                                                         ▁
  ███████▇▇▇▆▆▆▅▆▅▅▅▅▅▄▅▄▄▄▅▅▁▄▅▃▄▄▄▃▄▄▃▅▄▁▁▃▄▁▃▁▁▁▃▄▃▁▃▁▁▁▃▃▁▃ █
  211 μs        Histogram: log(frequency) by time       1.46 ms <

 Memory estimate: 374.17 KiB, allocs estimate: 5269.
```
In `using A.B`, we need to evaluate `A.B` to add the module to the using
list. However, in `using A: B`, we do not care about the value of `A.B`,
we only operate at the binding level. These two operations share a code
path and the evaluation of `A.B` happens early and is unused on the
`using A: B` path. I believe this was an unintentional oversight when
the latter syntax was added. Fixes #54954.
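An illustration of the distinction (constructed for this note, not taken from the commit):
```julia
module A
    const B = 1
end

using .A: B    # binding-level: works even though A.B is not a module, and
               # (after this change) A.B is not evaluated eagerly
# using .A.B   # by contrast, this form requires B to be a module
```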
Our codegen for `cglobal` was sharing the `static_eval` code for symbols
with ccall. However, we do have full runtime emulation for this
intrinsic, so mandating that the symbol can be statically evaluated is
not required and causes semantic differences between the interpreter and
codegen, which is undesirable. Just fall back to the runtime intrinsic
instead.
Simplifies handling of buffered pages by keeping them in a single place (`global_page_pool_lazily_freed`) instead of making them thread-local. Performance has been assessed on the serial & multithreaded GCBenchmarks and was shown to be neutral.
- moved non-contextual tests into `staged.jl`
- moved `@overlay` tests into `core.jl`
- test `staged.jl` in an interpreter mode
Otherwise it's trying to find the `.` package, which obviously doesn't
exist. This isn't really a problem - it just does some extra processing
and loads Pkg, but let's make this correct anyway.
Did a quick grep and couldn't find any reference to them besides this
manual.
…54984)

Also adds a bunch of integrity constraint checks to ensure we don't
repeat the bug from #54645.
I think this tool is there mainly to see what's taking so long, so
timing information is helpful.
…54965)

By changing the default to `true` we make it easier to build relocatable
packages from already existing ones when 1.11 lands.

This keyword was just added during 1.11, so it's not yet too late to
change its default.
)

Otherwise it may result in a missing `⊑` method error in use cases
involving external abstract interpreters, like JET, that use `MustAliasesLattice`.
As mentioned in #54968,
`OBJPROFILE` exposes a functionality which is quite similar to what the
heap snapshot does, but has a considerably worse visualization tool
(i.e. raw printf's compared to the snapshot viewer from Chrome).

Closes #54968.
)

Co-authored-by: Steven G. Johnson <stevenj@alum.mit.edu>
See discussion in #55014.

Doesn't seem breaking, but I can close the PR if it is.

Closes #55014.
Currently we error when attempting to serialize Bindings that do not
belong to the incremental module (GlobalRefs have special logic to
avoid looking at the binding field). With #54654, Bindings will show up
in more places, so let's just unique them properly by their module/name
identity. Of course, we then have two objects so serialized (both
GlobalRef and Binding), which suggests that we should perhaps finish the
project of unifying them. This is not currently possible, because the
existence of a binding object in the binding table has semantic content,
but this will change with #54654, so we can do such a change thereafter.
Co-authored-by: Jeff Bezanson <jeff.bezanson@gmail.com>
jishnub and others added 30 commits August 6, 2024 19:04
Fixes the following regression introduced in v1.11
```julia
julia> using LinearAlgebra

julia> D = Diagonal(rand(4));

julia> T = Tridiagonal(Vector{BigFloat}(undef, 3), Vector{BigFloat}(undef, 4), Vector{BigFloat}(undef, 3))
4×4 Tridiagonal{BigFloat, Vector{BigFloat}}:
 #undef  #undef     ⋅       ⋅ 
 #undef  #undef  #undef     ⋅ 
    ⋅    #undef  #undef  #undef
    ⋅       ⋅    #undef  #undef

julia> copyto!(T, D)
ERROR: UndefRefError: access to undefined reference
Stacktrace:
  [1] getindex
    @ ./essentials.jl:907 [inlined]
  [2] _broadcast_getindex
    @ ./broadcast.jl:644 [inlined]
  [3] _getindex
    @ ./broadcast.jl:675 [inlined]
  [4] _broadcast_getindex
    @ ./broadcast.jl:650 [inlined]
  [5] getindex
    @ ./broadcast.jl:610 [inlined]
  [6] macro expansion
    @ ./broadcast.jl:973 [inlined]
  [7] macro expansion
    @ ./simdloop.jl:77 [inlined]
  [8] copyto!
    @ ./broadcast.jl:972 [inlined]
  [9] copyto!
    @ ./broadcast.jl:925 [inlined]
 [10] materialize!
    @ ./broadcast.jl:883 [inlined]
 [11] materialize!
    @ ./broadcast.jl:880 [inlined]
 [12] _copyto_banded!(T::Tridiagonal{BigFloat, Vector{BigFloat}}, D::Diagonal{Float64, Vector{Float64}})
    @ LinearAlgebra ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/LinearAlgebra/src/special.jl:323
 [13] copyto!(dest::Tridiagonal{BigFloat, Vector{BigFloat}}, src::Diagonal{Float64, Vector{Float64}})
    @ LinearAlgebra ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/LinearAlgebra/src/special.jl:315
 [14] top-level scope
    @ REPL[4]:1
```
After this PR
```julia
julia> copyto!(T, D)
4×4 Tridiagonal{BigFloat, Vector{BigFloat}}:
 0.909968  0.0        ⋅         ⋅ 
 0.0       0.193341  0.0        ⋅ 
  ⋅        0.0       0.194794  0.0
  ⋅         ⋅        0.0       0.506905
```
The current implementation used an optimization that may not be
applicable for non-isbits types, and this PR ensures that we always read
from the source and write to the destination.
Currently `@testset` allows specifying multiple descriptions and testset
types, and only the last one will take effect. The others will be
silently ignored.

This PR starts printing deprecation warnings whenever such conflicting
arguments are provided.
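The kind of call that now warns looks like this; previously the first description was silently ignored:
```julia
using Test

@testset "ignored description" "effective description" begin
    @test 1 + 1 == 2
end
```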
This increases the default stack size limit on 64-bit systems from 4 MB
to 8 MB, matching glibc and typical modern Linux and macOS machines, as
well as the stack size limit of the root Julia process.
Since `checkdims_perm` only checks the axes of the arrays that are
passed to it, this PR adds a method that accepts the axes as arguments
instead of the arrays. This will avoid having to specialize on array
types.
An example of an improvement:
On master
```julia
julia> using LinearAlgebra

julia> D = Diagonal(zeros(1));

julia> Dv = Diagonal(view(zeros(1),:));

julia> @time @eval permutedims(D, (2,1));
  0.016841 seconds (13.68 k allocations: 680.672 KiB, 51.37% compilation time)

julia> @time @eval permutedims(Dv, (2,1));
  0.009303 seconds (11.24 k allocations: 564.203 KiB, 97.79% compilation time)
```
This PR
```julia
julia> @time @eval permutedims(D, (2,1));
  0.016837 seconds (13.42 k allocations: 667.438 KiB, 51.05% compilation time)

julia> @time @eval permutedims(Dv, (2,1));
  0.009076 seconds (6.59 k allocations: 321.156 KiB, 97.46% compilation time)
```
The allocations are lower in the second call.

I've retained the original method as well, as some packages seem to be
using it. This now forwards the axes to the new method.
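A minimal sketch of the signature change; the name and checks below are illustrative, not the exact Base code. Accepting axes means one compiled method instance serves every array type whose axes have the same types:
```julia
function checkdims_perm_sketch(axdest, axsrc, perm)
    length(perm) == length(axsrc) ||
        throw(ArgumentError("permutation must have length $(length(axsrc))"))
    for (i, p) in enumerate(perm)
        axdest[i] == axsrc[p] ||
            throw(DimensionMismatch("destination axes do not match the permutation"))
    end
    return nothing
end

# callers forward the axes instead of the arrays:
# checkdims_perm_sketch(axes(dest), axes(src), perm)
```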
Due to limitations in the LLVM implementation, we are forced to emit
fairly bad code here. But we need to make sure it is still correct with
respect to GC rooting.

Fixes #54720
This adds specialized methods to improve performance, and avoid
allocations that were arising currently from the fallback tridiagonal
implementations.

```julia
julia> using LinearAlgebra, BenchmarkTools

julia> n = 10000; B = Bidiagonal(rand(n), rand(n-1), :U); D = Diagonal(rand(size(B,1))); C = similar(B, size(B));

julia> @btime mul!($C, $B, $D);
  25.552 ms (3 allocations: 78.19 KiB) # v"1.12.0-DEV.870"
  25.559 ms (0 allocations: 0 bytes) # This PR

julia> C = similar(B);

julia> @btime mul!($C, $B, $D);
  23.551 μs (3 allocations: 78.19 KiB)  # v"1.12.0-DEV.870"
  7.123 μs (0 allocations: 0 bytes) # This PR, specialized method
```
…erms of `textwidth` (#55351)

Co-authored-by: Timothy <git@tecosaur.net>
Co-authored-by: Steven G. Johnson <2913679+stevengj@users.noreply.github.com>
Co-authored-by: Cody Tapscott <topolarity@tapscott.me>
Co-authored-by: Oscar Smith <oscardssmith@gmail.com>
This is a more apt description, since it is not floating-point related
and is used earlier (such as in IOBuffer).

Fixes #55279
This fits into a 32-byte allocation pool, saving up to 64 bytes when
repeatedly reading small chunks of data (e.g. tokenizing a CSV file). In
some local `@btime` measurements, this seems to take <10% more time
across a range of output lengths.
fixes #55350

---------

Co-authored-by: Neven Sajko <s@purelymail.com>
Disables these tests on win32, which have been flaky on that platform
since at least February (#53340).
Comparing objects by `==` will happily answer nonsense for malformed
type comparisons, such as `unwrap_unionall(A) == A`. Avoid forming that
query. Additionally, we need to recurse through Vararg when examining type
structure to make decisions.

Fix #55076
Fix #55189
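For context: `==` on types is resolved via mutual subtyping (roughly `(T <: S) & (S <: T)`), which is what can answer nonsense on partially constructed types; `===` is a plain identity check and is always safe to ask:
```julia
T = Base.unwrap_unionall(Vector)  # Array{T, 1} with a free typevar
T === Vector                      # false: identity comparison, no subtyping query
```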
The Julia memory model is always inbounds for GEP.

This makes the code in #55090
look almost the same as it did before the change. Locally I wasn't able
to reproduce the regression, but given it's vectorized code I suspect it
is backend sensitive.

Fixes #55090

Co-authored-by: Zentrik <Zentrik@users.noreply.github.com>
In the `lim == -1` weak-edges mode, some methods were unintentionally
filtered out based simply on visit order.

Fix #55231
As noted in #41584 and
https://discourse.julialang.org/t/safe-overwriting-of-files/117758/3
`mv` is usually expected to be "best effort atomic".

Currently calling `mv` with `force=true` calls
`checkfor_mv_cp_cptree(src, dst, "moving"; force=true)` before renaming.
`checkfor_mv_cp_cptree` will delete `dst` if exists and isn't the same
as `src`.

If `dst` is an existing file and julia stops after deleting `dst` but
before doing the rename, `dst` will be removed but will not be replaced
with `src`.

This PR changes `mv` with `force=true` to first try rename, and only
delete `dst` if that fails. Assuming the file system supports it and the
first rename succeeds, julia stopping will not lead to `dst` being removed
without being replaced.

This also replaces a stopgap solution from
#36638 (comment)
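A minimal sketch of the rename-first strategy (not the exact Base code, and assuming `Base.Filesystem.rename` throws on failure, as it does on recent Julia):
```julia
function mv_force_sketch(src::AbstractString, dst::AbstractString)
    try
        # rename is atomic when src and dst live on the same filesystem
        Base.Filesystem.rename(src, dst)
    catch
        # only if the rename fails (e.g. dst is a non-empty directory) do we
        # delete dst and retry, accepting the small non-atomic window
        rm(dst; force=true, recursive=true)
        Base.Filesystem.rename(src, dst)
    end
    return dst
end
```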
This adds the `terminfo` database to `deps/`, providing a better user
experience on systems that don't have `terminfo` on the system by
default. The database is built using BinaryBuilder but is not actually
platform-specific (it's built for `AnyPlatform`) and as such, this
fetches the artifact directly rather than adding a new JLL to stdlib,
and it requires no compilation.

A build flag, `WITH_TERMINFO`, is added here and assumed true by
default, allowing users to set `WITH_TERMINFO=0` in Make.user to avoid
bundling `terminfo` should they want to do so.

The lookup policy for `terminfo` entries is still compliant with what's
described in `terminfo(5)`; the bundled directory is taken to be the
first "compiled in" location, i.e. prepended to `@TERMINFO_DIRS@`. This
allows any user settings that exist locally, such as custom entries or
locations, to take precedence.

Fixes #55274

Co-authored-by: Mosè Giordano <giordano@users.noreply.github.com>
The `clamp` function was defined in Base.Math, but required to be in
Base now, so move it to intfuncs with other similar functions

Fixes #55279
… options (#55407)

This technically removes the option for Oz in julia but it doesn't
actually do what one wants.
This removes an API currently used by Enzyme.jl and AllocCheck.jl, but
given that LLVM.jl no longer supports this API, that seems fine.
@wsmoses @maleadt
Do we want the replacement for this (a function that parses the
PipelineConfig struct) to live in LLVM.jl or GPUCompiler.jl ?
Testing:

- with a package error
```
(SimpleLooper) pkg> precompile
Precompiling all packages...
  ✗ SimpleLooper
  0 dependencies successfully precompiled in 2 seconds

ERROR: The following 1 direct dependency failed to precompile:

SimpleLooper 

Failed to precompile SimpleLooper [ff33fe5-d8e3-4cbd-8bd9-3d2408ff8cab] to "/Users/ian/.julia/compiled/v1.12/SimpleLooper/jl_PQArnH".
ERROR: LoadError: 
Stacktrace:
  [1] error()
    @ Base ./error.jl:53
```

- with interrupt
```
(SimpleLooper) pkg> precompile
Precompiling all packages...
^C Interrupted: Exiting precompilation...
  ◒ SimpleLooper
  1 dependency had output during precompilation:
┌ SimpleLooper
│  [57879] signal 2: Interrupt: 2
│  in expression starting at /Users/ian/Documents/GitHub/SimpleLooper.jl/src/SimpleLooper.jl:2
└  
```

- an internal error simulated in the same scope that
JuliaLang/Pkg.jl#3984 was failing to throw
from
 ```
  JULIA stdlib/release.image
Unhandled Task ERROR: 
Stacktrace:
 [1] error()
   @ Base ./error.jl:53
[2] (::Base.Precompilation.var"#27#65"{Bool, Bool, Vector{Task},
Dict{Tuple{Base.PkgId, Pair{Cmd, Base.CacheFlags}}, String},
Dict{Tuple{Base.PkgId, Pair{Cmd, Base.CacheFlags}}, String}, Base.Event,
Base.Event, ReentrantLock, Vector{Tuple{Base.PkgId, Pair{Cmd,
Base.CacheFlags}}}, Dict{Tuple{Base.PkgId, Pair{Cmd, Base.CacheFlags}},
String}, Vector{Tuple{Base.PkgId, Pair{Cmd, Base.CacheFlags}}}, Int64,
Vector{Base.PkgId}, Dict{Tuple{Base.PkgId, Pair{Cmd, Base.CacheFlags}},
Bool}, Dict{Tuple{Base.PkgId, Pair{Cmd, Base.CacheFlags}}, Base.Event},
Dict{Tuple{Base.PkgId, Pair{Cmd, Base.CacheFlags}}, Bool},
Vector{Base.PkgId}, Dict{Base.PkgId, String}, Dict{Tuple{Base.PkgId,
UInt128, String, String}, Bool},
Base.Precompilation.var"#color_string#38"{Bool}, Bool, Base.Semaphore,
Bool, String, Vector{String}, Vector{Base.PkgId}, Base.PkgId,
Base.CacheFlags, Cmd, Pair{Cmd, Base.CacheFlags}, Tuple{Base.PkgId,
Pair{Cmd, Base.CacheFlags}}})()
   @ Base.Precompilation ./precompilation.jl:819
```
…Tridiagonal` (#55415)

This makes the displayed forms of a `Tridiagonal` and a `SymTridiagonal`
valid constructor calls.
```julia
julia> T = Tridiagonal(1:3, 1:4, 1:3)
4×4 Tridiagonal{Int64, UnitRange{Int64}}:
 1  1  ⋅  ⋅
 1  2  2  ⋅
 ⋅  2  3  3
 ⋅  ⋅  3  4

julia> show(T)
Tridiagonal(1:3, 1:4, 1:3)

julia> S = SymTridiagonal(1:4, 1:3)
4×4 SymTridiagonal{Int64, UnitRange{Int64}}:
 1  1  ⋅  ⋅
 1  2  2  ⋅
 ⋅  2  3  3
 ⋅  ⋅  3  4

julia> show(S)
SymTridiagonal(1:4, 1:3)
```
Displaying the bands has several advantages: firstly, it's briefer than
printing the full array, and secondly, it displays the special structure
in the bands, if any. E.g.:
```julia
julia> T = Tridiagonal(spzeros(3), spzeros(4), spzeros(3));

julia> show(T)
Tridiagonal(sparsevec(Int64[], Float64[], 3), sparsevec(Int64[], Float64[], 4), sparsevec(Int64[], Float64[], 3))
```
It's clear from the displayed form that `T` has sparse bands.

A special handling for `SymTridiagonal` matrices is necessary, as the
diagonal band is symmetrized. This means:
```julia
julia> using StaticArrays

julia> m = SMatrix{2,2}(1:4);

julia> S = SymTridiagonal(fill(m,3), fill(m,2))
3×3 SymTridiagonal{SMatrix{2, 2, Int64, 4}, Vector{SMatrix{2, 2, Int64, 4}}}:
 [1 3; 3 4]  [1 3; 2 4]      ⋅     
 [1 2; 3 4]  [1 3; 3 4]  [1 3; 2 4]
     ⋅       [1 2; 3 4]  [1 3; 3 4]

julia> show(S)
SymTridiagonal(SMatrix{2, 2, Int64, 4}[[1 3; 3 4], [1 3; 3 4], [1 3; 3 4]], SMatrix{2, 2, Int64, 4}[[1 3; 2 4], [1 3; 2 4]])
```
The displayed values correspond to the symmetrized band, and not the
actual input arguments. I think displaying the symmetrized elements
makes more sense here, as this matches the form in the 3-argument
`show`.
In extreme cases, the compiler could mark this function for
concrete-eval, even though that is illegal unless the compiler has first
deleted this instruction. Otherwise the attempt to concrete-eval will
re-run the function repeatedly until it hits a StackOverflow.

Workaround to fix #55147

@aviatesk You might know how to solve this even better, using
post-optimization effect refinements? We should only apply the refinement
terminates=false => terminates=true (and thus allow concrete eval) if the
optimization actually occurs, and not just when inference thinks the
optimization would be legal.

---------

Co-authored-by: Shuhei Kadowaki <aviatesk@gmail.com>