New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not rely on accumulators having tag 0 in native compilation. #14048
Conversation
kernel/byterun/coq_values.c
Outdated
|
||
value coq_native_is_accu(value v) { | ||
if (Tag_val(v) == Closure_tag) { | ||
if (Code_val(v) == coq_accumulator_code) return Val_true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI C has short circuited "logical" conjunction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I know :) This code is not correct anyways, this PR is a draft for a good reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, the inner test is useless. If it were to fail, the value will get matched upon (while it has a closure tag) and Coq will segfault.
That is the equivalent of the following macro in the bytecode interpreter:
#define Is_accu(v) (Is_block(v) && Tag_val(v) == Closure_tag)
By the way, I am a bit surprised you are not testing Is_block(v)
in your version. Are you really sure the value is a block there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, the inner test is useless.
Not for all calls to is_accu
. This is only true for inductive accumulators encountered in match nodes. In the reification code kind_of_value
, we don't have this invariant since we don't have the type around, and we can produce functions Vfun
with a closure tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I suggest making two functions, one for execution and one for reification. (Especially since I am still under the impression that the execution one is supposed to test for Is_block
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will be easier to convince OCaml developers to optimize for this case
I'd rather compile to malfunction and have a low-level enough language of patterns to jump on the tag directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is my point. The following seems just fine as a low-level language of patterns:
match foo with
| x when Obj.is_closure x -> ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, then we would also need language support for ADT containing functions. If OCaml is clever enough it should rule out this branch statically since no constructor can ever be a function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Native compilation is likely to be a huge UB anyways, I really think targetting OCaml and playing fast and loose with the type system is a footgun waiting to be shot in our face when we start using flambda.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If OCaml is clever enough it should rule out this branch statically since no constructor can ever be a function.
Note that OCaml currently allows to perform a match on a polymorphic values, so it cannot be too clever. Just because a value is passed to match
does not mean that it is a constructor.
let f (g:'a -> bool) x =
match x with
| x when g x -> true
| x -> false
Now, what if the match also contains proper branches? I guess that is one of those things that were never officially stated (and for which flambda might indeed be playing fast and loose).
10db989
to
ff7e3df
Compare
external is_closure : 'a -> bool = "coq_native_is_clos" [@@noalloc] | ||
|
||
(** TODO: do this more efficiently in C? *) | ||
(* Dummy wrapper to ensure the closure is not inlined by the compiler *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand why this would prevent any inlining. It seems better to use the old code and put the attribute [@inline never]
on it:
let [@inline never] rec accumulate data x =
if x == ret_accu then Obj.repr data
else Obj.repr (accumulate { data with acc_arg = x :: data.acc_arg })
Another way to prevent inlining (if you do not trust the attribute) is to simply store the accumulator in a public mutable reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work because we will get partially applied functions, which have not the accumulator code. I need to have a function of arity one that is exactly the one performing accumulation for my trick to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, my mistake. Still, I do not see how your code guarantees the uniqueness of the inner function. Perhaps something like the following would help, but I am not even sure OCaml guarantees that the [@inline]
attribute is respected for inner functions.
let rec accumulate data =
let [@inline never] accfun x =
if x == ret_accu then Obj.repr data
else
let data = { data with acc_arg = x :: data.acc_arg } in
Obj.repr (accumulate data).accfun in
{ accfun }
Here is another possibility, which is documented (or at least folklore enough to not be broken that easily):
let [@inline never] rec accumulate data = (); fun x ->
if x == ret_accu then Obj.repr data
else Obj.repr (accumulate { data with acc_arg = x :: data.acc_arg })
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe flambda explicitly breaks the second kind of trick. I witnessed that when observing the code generated by the proof monad with flambda.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least, with 4.12.0+flambda, it is compiled as expected, that is, ();fun
forces an explicit closure creation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quoting Vincent Laviron:
Flambda doesn’t try to coalesce a function returning a function into a function taking more arguments. Some transformations earlier in the compiler do that, but at that point
();
will not have been simplified yet so you should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other suggestion by Vincent:
let [@inline never] rec accumulate data = Sys.opaque_identity (fun x ->
if x == ret_accu then Obj.repr data
else Obj.repr (accumulate { data with acc_arg = x :: data.acc_arg }))
kernel/nativecode.ml
Outdated
@@ -1866,7 +1870,6 @@ let pp_global fmt g = | |||
Format.fprintf fmt " | %s %s@\n" cstr sig_str | |||
in | |||
let pp_const_sigs fmt lar = | |||
Format.fprintf fmt " | %s of Nativevalues.t@\n" (string_of_accu_construct "" ind); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest keeping the dummy tag-0 constructor at first (with assert false
branches), just to leave the array case unchanged. Indeed, while it is quite easy to adapt Nativenorm.nf_val
to not care for Varray
anymore, it seems a lot more complicated for Vconv.conv_whd
, if I am not missing something obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@silene the two code paths for VM and native are completely different, so you can change one without affecting the other (to the notable exception that native uses the same relocation tables for some mysterious reason). I agree that this approach is simpler though, so I've just pushed a version of this PR doing that. Let's see what happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, for the confusion, I did not intend to talk about the VM at all. I was referring to Nativeconv.conv_val
and I just mixed both names. (Their code is the same, hence my mistake.) My point was that, if Vblock
and Varray
share the same tag, I have no idea how to fix that function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, OK. My plan was to box persistent arrays in a dedicated block with a reserved tag (namely Obj.last_non_constant_constructor_tag
). I don't know how bad this would be for performance, since that means we have to reallocate 2 words for every array modification. It's probably not that costly given that persistent arrays are already doing this kind of allocation in the back of the user.
Instead, we add a test for accumulator branches. We cannot use Obj.tag because the latter is inefficient, so we roll up our own test to check that a value has a closure tag and the right code. We keep the reserved status of blocks with tag 0 for native arrays. To do this we preserve the dummy accumulator constructor for all inductive types and replace all accumulating branches with an assertion failure.
ff7e3df
to
bc46d8c
Compare
As advised by @JasonGross I had a try at the |
Could you try replacing the C function by its OCaml counterpart? let is_closure x =
not (Obj.is_int x) && Obj.tag x = Obj.closure_tag The idea is that Note that the code of the standard external is_closure : 'a -> bool = "coq_native_is_clos" [@@noalloc]
let is_closure x =
not (Obj.is_int x) && is_closure x I do not expect much of an improvement, but perhaps we will no longer be in the 5x range. |
@silene I can try, but it's going to be unsound for inductive types without non-constant constructors since the compiler might be allowed to statically remove this branch. |
Same trick as before, just put |
But then this defeats the optimization purpose? |
No, that is the nice thing about |
No time difference when moving the is_int check to OCaml. |
I will try to cook up a quick-and-dirty compilation scheme using the OCaml compiler-libs and see what happens. Maybe we can close this for the time being? |
Sure. By the way, just for the record, it seems like we have a lot of leeway in how dirty our code can be without confusing the compiler. Indeed, I just noticed this piece of code in the OCaml compiler: type t = A | B | .... | Z | Closure of dummy
let method_impl ... =
match x with
| A -> some closure
...
| Z -> some closure
| Closure _ as clo -> Obj.magic clo In other words, closures are not even boxed in this function. They are stored as is, and even if the tag for the Unfortunately, it does not quite apply to our case, because our types allow for non-constant constructors. So, that trick would only work for a type such as type native_bool = Bool_true | Bool_false | Bool_closure of dummy
(* who cares if Bool_closure has tag 0 rather than 247? *) |
Instead, we add a test for accumulator branches. We cannot use Obj.tag because the latter is inefficient, so we roll up our own test to check that a value has a closure tag and the right code.
https://github.com/coq/coq/blob/master/dev/ci/user-overlays/README.md for details)