
Proposal: Partition bindings by world age #40399

Open
Keno opened this issue Apr 8, 2021 · 4 comments

Comments

@Keno
Member

Keno commented Apr 8, 2021

I've been doing some work that has a fairly big dependency stack (100-200 packages) and as such takes a few minutes to warm up. Because of this, non-Revise-able code is extremely painful, so I'd like to take another look at timholy/Revise.jl#18. This has a fairly long history, of course, but to avoid getting buried in the discussion, I figured a new issue for my specific proposal would be best.

One solution that was previously discussed (in #22721) was to have backedges for bindings, but this was rejected as too expensive. My proposal aims to do essentially that, but shift the cost of backedge tracking to the redefinition stage, where it is more acceptable.

In particular, my proposal is (a rough sketch of the mechanics follows the list):

  • Make each module's binding table partitioned by world ages. Introducing a new binding does not raise the world age, but bindings do record the world age where they were introduced.
  • Change binding lookup to only match bindings that are visible to the current world age
  • When wanting to redefine a binding (e.g. to replace a struct definition by an updated struct definition of the same name), we do a GC-like walk of all methods in the system and look at their source. If they contain a GlobalRef for the binding being replaced, we truncate the world bounds of any associated method instances (in effect having the GlobalRef serve as "forward edges" that get re-validated by the redefinition). This operation would increase the world age.
  • We add an array for explicit forward edges from non-syntactic resolutions of bindings during inference. This can basically happen for two reasons:
    1. Something like `getfield(Main, :foo)`, which will get resolved via the `getfield_tfunc`.
    2. Generated functions
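
To make this concrete, here is a rough sketch of the intended semantics in plain Julia (the real implementation would live in the C runtime; all names here are illustrative, not an actual API):

```
# Sketch only: a per-module binding table partitioned by world age.
# Each entry is valid for a [min_world, max_world] range.
struct BindingEntry
    value::Any
    min_world::UInt
    max_world::UInt          # typemax(UInt) while the entry is current
end

const bindings = Dict{Symbol,Vector{BindingEntry}}()

# Introducing a new binding records the world but does not raise it.
function define!(name::Symbol, value, world::UInt)
    push!(get!(Vector{BindingEntry}, bindings, name),
          BindingEntry(value, world, typemax(UInt)))
end

# Lookup only matches entries visible in the requested world.
function lookup(name::Symbol, world::UInt)
    for e in get(bindings, name, BindingEntry[])
        e.min_world <= world <= e.max_world && return Some(e.value)
    end
    return nothing
end

# Redefinition truncates the current entry and installs a new one; this
# is the world-age-raising step that would also trigger the GC-like walk
# over method sources described above.
function redefine!(name::Symbol, value, new_world::UInt)
    for (i, e) in pairs(bindings[name])
        if e.max_world == typemax(UInt)
            bindings[name][i] = BindingEntry(e.value, e.min_world, new_world - 1)
        end
    end
    define!(name, value, new_world)
end
```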

cc @JeffBezanson @vtjnash

If this proposal seems reasonable, I'd hope to nerdsnipe @timholy into taking charge :)

@JeffBezanson
Sponsor Member

> 2. Generated functions

I'm not sure we can reasonably handle that. I think we don't really handle it for methods either?

@Keno
Member Author

Keno commented Apr 8, 2021

> I'm not sure we can reasonably handle that. I think we don't really handle it for methods either?

We don't handle backedges to the generated function generator, but we do handle backedges in the expanded code. Here, since we don't have the syntactic version of the generated code available, we'd need to store those as forward edges.

@Keno
Member Author

Keno commented Sep 11, 2022

This could also solve #14055

Keno added a commit that referenced this issue Sep 13, 2022
On certain workloads, profiling shows a surprisingly high fraction of
inference time spent looking up bindings in modules. Bindings use
a hash table, so lookups are expected to be quite fast, but certainly
not free. A big contributor to the problem is that we basically
treat the cost as zero, looking up bindings for GlobalRefs multiple times
for each statement (e.g. in `isconst`, `isdefined`, to get the types,
etc.). This PR attempts to improve the situation by adding an extra
field to GlobalRef that caches this lookup. The field is not serialized,
and if it is not set, we fall back to the previous behavior. I would expect
the memory overhead to be relatively small, since we intern GlobalRefs
in memory to have only one per binding (rather than one per use).
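
As a rough illustration of the caching idea (the actual change is in the C runtime; this Julia mock is purely illustrative):

```
# Illustrative mock: a GlobalRef-like struct that memoizes its lookup.
mutable struct CachedGlobalRef
    mod::Module
    name::Symbol
    cache::Union{Some{Any},Nothing}   # nothing until first resolved
end
CachedGlobalRef(m::Module, n::Symbol) = CachedGlobalRef(m, n, nothing)

# The first access pays the module hash-table lookup; later accesses
# are a single field read.
function fetch_value(g::CachedGlobalRef)
    c = g.cache
    if c === nothing
        c = Some{Any}(getglobal(g.mod, g.name))
        g.cache = c
    end
    return something(c)
end
```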

# Benchmarks

The benchmarks look quite promising. Consider this artificial example
(though it's actually not all that far-fetched, given some of the
generated code we see):

```
using Core.Intrinsics: add_int
const ONE = 1
@eval function f(x, y)
	z = 0
	$([:(z = add_int(x, add_int(z, ONE))) for _ = 1:10000]...)
	return add_int(z, y)
end
g(y) = f(ONE, y)
```

On master:
```
julia> @time @code_typed g(1)
  1.427227 seconds (1.31 M allocations: 58.809 MiB, 5.73% gc time, 99.96% compilation time)
CodeInfo(
1 ─ %1 = invoke Main.f(Main.ONE::Int64, y::Int64)::Int64
└──      return %1
) => Int64
```

On this PR:
```
julia> @time @code_typed g(1)
  0.503151 seconds (1.19 M allocations: 53.641 MiB, 5.10% gc time, 33.59% compilation time)
CodeInfo(
1 ─ %1 = invoke Main.f(Main.ONE::Int64, y::Int64)::Int64
└──      return %1
) => Int64
```

I don't expect the same speedup on other workloads, but there should be
a few % speedup on most workloads still.

# Future extensions

The other motivation for this is that, with a view towards #40399,
we will want to more clearly define when bindings get resolved. The
idea here would then be that binding resolution replaces generic
`GlobalRef`s with `GlobalRef`s whose binding cache is set, and any
unresolved bindings would be treated conservatively by inference
and optimization.
aviatesk added a commit that referenced this issue Apr 1, 2023
This PR generalizes the idea from #49199 and uses inference to analyze
the types of REPL expressions. This approach offers several advantages
over the current `get_[value|type]`-based implementation:
- The need for various special cases is eliminated, as lowering normalizes
  expressions, and inference handles all language features.
- Constant propagation allows us to obtain accurate completions for complex
  expressions safely (see #36437).

Analysis of arbitrary REPL expressions can be done by the following
steps (a rough illustration follows the list):
- Lower a given expression
- Form a top-level `MethodInstance` from the lowered expression
- Run inference on the top-level `MethodInstance`
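
As a rough user-level illustration of the first two steps (the inference step itself drives `Core.Compiler` internals and is omitted here):

```
expr    = Meta.parse("Ref(Some(42))[]")
lowered = Meta.lower(Main, expr)   # step 1: a top-level thunk
@assert Meta.isexpr(lowered, :thunk)
ci = lowered.args[1]               # step 2 starts from this CodeInfo
```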

This PR implements `REPLInterpreter`, a custom `AbstractInterpreter` that:
- aggressively resolves global bindings to enable reasonable completions
  for lines like `Mod.a.|` (where `|` is the cursor position)
- aggressively concrete-evaluates `:inconsistent` calls to provide
  reasonable completions for cases like `Ref(Some(42))[].|`
- does not optimize the inferred code, as `REPLInterpreter` is only used
  to obtain the type or constant information of given expressions

Aggressive binding resolution presents challenges for `REPLInterpreter`'s
cache validation (since #40399 hasn't been resolved yet). To avoid cache
validation issues, `REPLInterpreter` only allows aggressive binding
resolution for the top-level frame representing the REPL input code
(`repl_frame`) and for child `getproperty` frames that are
constant-propagated from the `repl_frame`. This works since
1.) these frames are never cached, and
2.) their results are only observed by the non-cached `repl_frame`.

`REPLInterpreter` also aggressively concrete-evaluates `:inconsistent`
calls within `repl_frame`, allowing it to get accurate type
information about complex expressions that otherwise cannot be constant
folded, in a safe way, i.e. it still doesn't evaluate effectful
expressions like `pop!(xs)`. As with the aggressive binding
resolution, aggressive concrete evaluation doesn't present any cache
validation issues because `repl_frame` is never cached.

Also note that the code cache for `REPLInterpreter` is separated from the
native code cache, ensuring that code caches produced by `REPLInterpreter`,
where bindings are aggressively resolved and the code is not really
optimized, do not affect native code execution. A hack has
also been added to avoid serializing `CodeInstance`s produced by
`REPLInterpreter` during precompilation, to work around #48453.

closes #36437
replaces #49199
@Keno
Member Author

Keno commented Jun 2, 2024

> This could also solve #14055

To remind myself what was meant here: if you have world-age-partitioned bindings, you can (semantically) resolve imports eagerly and then replace them if necessary.
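
Purely illustrative of the intended semantics (not current behavior):

```
module A
    x = 1
end
using .A: x   # with partitioned bindings, this could resolve eagerly,
              # snapshotting A.x in the current world

# If A.x is later rebound, the importing module's entry can simply be
# replaced in a new world, avoiding the stale/conflicting-binding
# problems of #14055.
```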

Keno added a commit that referenced this issue Jun 2, 2024
This implements world-age partitioning of bindings as proposed in #40399.
In effect, much like methods, the global view of bindings now depends on
your currently executing world. This means that `const` bindings can now
have different values in different worlds. In principle it also means
that regular global variables could have different values in different
worlds, but there is currently no case where the system does this.

# Motivation
The reasons for this change are manifold:

1. The primary motivation is to permit Revise to redefine structs.
   This has been a feature request since the very beginning of Revise
   (timholy/Revise.jl#18) and there have been
   numerous attempts over the past 7 years to address this, as well as
   countless duplicate feature requests. A past attempt to implement the
   necessary julia support in #22721 failed because the consequences and
   semantics of re-defining bindings were not sufficiently worked out.
   One way to think of this implementation (at least with respect to types)
   is that it provides a well-grounded implementation of #22721.

2. A secondary motivation is to make `const`-redefinition no longer UB
   (although `const` redefinition will still have a significant performance
   penalty, so it is not recommended). See e.g. the full discussion in #54099.

3. Not currently implemented, but this mechanism can be used to re-compile code
   where bindings are introduced after the first compile, which is a common
   performance trap for new users (#53958).

4. Not currently implemented, but this mechanism can be used to clarify the semantics
   of bindings import and resolution to address issues like #14055.

# Implementation

In this PR:
 - `Binding` gets `min_world`/`max_world` fields like `CodeInstance`
 - Various lookup functions walk this linked list using the current task's `world_age` as a key
 - Inference accumulates world bounds as it would for methods
 - Upon binding replacement, we walk all methods in the system, invalidating those whose
   uninferred IR references the replaced GlobalRef (a sketch follows this list)
 - One primary complication is that our IR definition permits `const` globals in value position,
   but if binding replacement is permitted, the validity of this may change after the fact.
   To address this, there is a helper in `Core.Compiler` that gets invoked in the type inference
   world and will rewrite the method source to be legal in all worlds.
 - A new `@world` macro can be used to access bindings from old world ages. This is used in printing
   for old objects.
 - The `const`-override behavior was changed to only be permitted at toplevel. The warnings about
   it being UB were removed.
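
A sketch of that invalidation walk in Julia-flavored pseudocode (the `uninferred_source` and `instances` fields are hypothetical; the real walk is in C over the uninferred/compressed IR):

```
function invalidate_users!(all_methods, target::GlobalRef, new_world::UInt)
    for m in all_methods
        src = m.uninferred_source           # hypothetical field
        if any(stmt -> mentions(stmt, target), src.code)
            for ci in m.instances           # hypothetical: its CodeInstances
                ci.max_world = min(ci.max_world, new_world - 1)
            end
        end
    end
end

mentions(x, target::GlobalRef) =
    (x isa GlobalRef && x.mod === target.mod && x.name === target.name) ||
    (x isa Expr && any(a -> mentions(a, target), x.args))
```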

Of particular note, this PR does not include any mechanism for invalidating methods whose signatures
were created using an old Binding (or types whose fields were the result of a binding evaluation).
There was some discussion among the compiler team of whether such a mechanism should exist in base,
but the consensus was that it should not. In particular, although uncommon, a pattern like:
```
f() = Any
g(::f()) = 1
f() = Int
```
does not redefine `g`. Thus, to fully address the Revise issue, additional code will be required in
Revise to track the dependency of various signatures and struct definitions on bindings.

# Demo
```
julia> struct Foo
               a::Int
       end

julia> g() = Foo(1)
g (generic function with 1 method)

julia> g()
Foo(1)

julia> f(::Foo) = 1
f (generic function with 1 method)

julia> fold = Foo(1)
Foo(1)

julia> struct Foo
               a::Int
               b::Int
       end

julia> g()
ERROR: MethodError: no method matching Foo(::Int64)
The type `Foo` exists, but no method is defined for this combination of argument types when trying to construct it.

Closest candidates are:
  Foo(::Int64, ::Int64)
   @ Main REPL[6]:2
  Foo(::Any, ::Any)
   @ Main REPL[6]:2

Stacktrace:
 [1] g()
   @ Main ./REPL[2]:1
 [2] top-level scope
   @ REPL[7]:1

julia> f(::Foo) = 2
f (generic function with 2 methods)

julia> methods(f)
# 2 methods for generic function "f" from Main:
 [1] f(::Foo)
     @ REPL[8]:1
 [2] f(::@world(Foo, 0:26898))
     @ REPL[4]:1

julia> fold
@world(Foo, 0:26898)(1)
```

# Performance consideration
On my machine, the validation required upon binding replacement for the full system image takes about 200ms.
With CedarSim loaded (I tried OmniPackage, but it's not working on master), this increases about 5x. That's
a fair bit of compute, but not the end of the world. Still, Revise may have to batch its validation. There
may also be opportunities for performance improvement by operating on the compressed representation directly.

# Semantic TODO

- [ ] Do we want to change the resolution time of bindings to (semantically) resolve them immediately?
- [ ] Do we want to introduce guard bindings when inference assumes the absence of a binding?

# Implementation TODO
- [ ] Precompile re-validation
- [ ] Various cleanups in the accessors
- [ ] Invert the order of the binding linked list to make the most recent one always the head of the list
- [ ] CodeInstances need forward edges for GlobalRefs not part of the uninferred code
- [ ] Generated function support