Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not rely on accumulators having tag 0 in native compilation. #14048

Closed
wants to merge 1 commit into from

Conversation

ppedrot
Copy link
Member

@ppedrot ppedrot commented Apr 1, 2021

Instead, we add a test for accumulator branches. We cannot use Obj.tag because the latter is inefficient, so we roll up our own test to check that a value has a closure tag and the right code.

  • Added / updated test-suite


value coq_native_is_accu(value v) {
if (Tag_val(v) == Closure_tag) {
if (Code_val(v) == coq_accumulator_code) return Val_true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI C has short circuited "logical" conjunction

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I know :) This code is not correct anyways, this PR is a draft for a good reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, the inner test is useless. If it were to fail, the value will get matched upon (while it has a closure tag) and Coq will segfault.

That is the equivalent of the following macro in the bytecode interpreter:

#define Is_accu(v) (Is_block(v) && Tag_val(v) == Closure_tag)

By the way, I am a bit surprised you are not testing Is_block(v) in your version. Are you really sure the value is a block there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, the inner test is useless.

Not for all calls to is_accu. This is only true for inductive accumulators encountered in match nodes. In the reification code kind_of_value, we don't have this invariant since we don't have the type around, and we can produce functions Vfun with a closure tag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I suggest making two functions, one for execution and one for reification. (Especially since I am still under the impression that the execution one is supposed to test for Is_block.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be easier to convince OCaml developers to optimize for this case

I'd rather compile to malfunction and have a low-level enough language of patterns to jump on the tag directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is my point. The following seems just fine as a low-level language of patterns:

match foo with
| x when Obj.is_closure x -> ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, then we would also need language support for ADT containing functions. If OCaml is clever enough it should rule out this branch statically since no constructor can ever be a function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Native compilation is likely to be a huge UB anyways, I really think targetting OCaml and playing fast and loose with the type system is a footgun waiting to be shot in our face when we start using flambda.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If OCaml is clever enough it should rule out this branch statically since no constructor can ever be a function.

Note that OCaml currently allows to perform a match on a polymorphic values, so it cannot be too clever. Just because a value is passed to match does not mean that it is a constructor.

let f (g:'a -> bool) x =
  match x with
  | x when g x -> true
  | x -> false

Now, what if the match also contains proper branches? I guess that is one of those things that were never officially stated (and for which flambda might indeed be playing fast and loose).

kernel/nativevalues.mli Outdated Show resolved Hide resolved
external is_closure : 'a -> bool = "coq_native_is_clos" [@@noalloc]

(** TODO: do this more efficiently in C? *)
(* Dummy wrapper to ensure the closure is not inlined by the compiler *)
Copy link
Contributor

@silene silene Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand why this would prevent any inlining. It seems better to use the old code and put the attribute [@inline never] on it:

let [@inline never] rec accumulate data x =
  if x == ret_accu then Obj.repr data
  else Obj.repr (accumulate { data with acc_arg = x :: data.acc_arg })

Another way to prevent inlining (if you do not trust the attribute) is to simply store the accumulator in a public mutable reference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work because we will get partially applied functions, which have not the accumulator code. I need to have a function of arity one that is exactly the one performing accumulation for my trick to work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, my mistake. Still, I do not see how your code guarantees the uniqueness of the inner function. Perhaps something like the following would help, but I am not even sure OCaml guarantees that the [@inline] attribute is respected for inner functions.

let rec accumulate data =
  let [@inline never] accfun x =
    if x == ret_accu then Obj.repr data
    else
      let data = { data with acc_arg = x :: data.acc_arg } in
      Obj.repr (accumulate data).accfun in
  { accfun }

Here is another possibility, which is documented (or at least folklore enough to not be broken that easily):

let [@inline never] rec accumulate data = (); fun x ->
  if x == ret_accu then Obj.repr data
  else Obj.repr (accumulate { data with acc_arg = x :: data.acc_arg })

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe flambda explicitly breaks the second kind of trick. I witnessed that when observing the code generated by the proof monad with flambda.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least, with 4.12.0+flambda, it is compiled as expected, that is, ();fun forces an explicit closure creation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quoting Vincent Laviron:

Flambda doesn’t try to coalesce a function returning a function into a function taking more arguments. Some transformations earlier in the compiler do that, but at that point (); will not have been simplified yet so you should be fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other suggestion by Vincent:

let [@inline never] rec accumulate data = Sys.opaque_identity (fun x ->
  if x == ret_accu then Obj.repr data
  else Obj.repr (accumulate { data with acc_arg = x :: data.acc_arg }))

@@ -1866,7 +1870,6 @@ let pp_global fmt g =
Format.fprintf fmt " | %s %s@\n" cstr sig_str
in
let pp_const_sigs fmt lar =
Format.fprintf fmt " | %s of Nativevalues.t@\n" (string_of_accu_construct "" ind);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest keeping the dummy tag-0 constructor at first (with assert false branches), just to leave the array case unchanged. Indeed, while it is quite easy to adapt Nativenorm.nf_val to not care for Varray anymore, it seems a lot more complicated for Vconv.conv_whd, if I am not missing something obvious.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@silene the two code paths for VM and native are completely different, so you can change one without affecting the other (to the notable exception that native uses the same relocation tables for some mysterious reason). I agree that this approach is simpler though, so I've just pushed a version of this PR doing that. Let's see what happens.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, for the confusion, I did not intend to talk about the VM at all. I was referring to Nativeconv.conv_val and I just mixed both names. (Their code is the same, hence my mistake.) My point was that, if Vblock and Varray share the same tag, I have no idea how to fix that function.

Copy link
Member Author

@ppedrot ppedrot Apr 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK. My plan was to box persistent arrays in a dedicated block with a reserved tag (namely Obj.last_non_constant_constructor_tag). I don't know how bad this would be for performance, since that means we have to reallocate 2 words for every array modification. It's probably not that costly given that persistent arrays are already doing this kind of allocation in the back of the user.

Instead, we add a test for accumulator branches. We cannot use Obj.tag because
the latter is inefficient, so we roll up our own test to check that a value
has a closure tag and the right code.

We keep the reserved status of blocks with tag 0 for native arrays. To do this
we preserve the dummy accumulator constructor for all inductive types and replace
all accumulating branches with an assertion failure.
@ppedrot
Copy link
Member Author

ppedrot commented Apr 7, 2021

As advised by @JasonGross I had a try at the UnsaturatedSolinasHeuristics/Tests.v from fiat_crypto. This PR is terrible, half of the time is passed in the accumulator test function and the evaluation is about 5 times as slow as the master branch (~200s vs. ~40s).

@silene
Copy link
Contributor

silene commented Apr 7, 2021

Could you try replacing the C function by its OCaml counterpart?

let is_closure x =
  not (Obj.is_int x) && Obj.tag x = Obj.closure_tag

The idea is that Obj.is_int is a builtin, so the C function would only be called for non-trivial constructors.

Note that the code of the standard Obj.tag function is a bit awful, so you might want to keep a C function just for it:

external is_closure : 'a -> bool = "coq_native_is_clos" [@@noalloc]
let is_closure x =
  not (Obj.is_int x) && is_closure x

I do not expect much of an improvement, but perhaps we will no longer be in the 5x range.

@ppedrot
Copy link
Member Author

ppedrot commented Apr 7, 2021

@silene I can try, but it's going to be unsound for inductive types without non-constant constructors since the compiler might be allowed to statically remove this branch.

@silene
Copy link
Contributor

silene commented Apr 7, 2021

Same trick as before, just put Sys.opaque_identity around x before calling Obj.is_int.

@ppedrot
Copy link
Member Author

ppedrot commented Apr 7, 2021

But then this defeats the optimization purpose?

@silene
Copy link
Contributor

silene commented Apr 7, 2021

No, that is the nice thing about Sys.opaque_identity. It is the builtin identity, so it has no impact on the generated assembly code, except that the type system does not know about it.

@ppedrot
Copy link
Member Author

ppedrot commented Apr 7, 2021

No time difference when moving the is_int check to OCaml.

@ppedrot
Copy link
Member Author

ppedrot commented Apr 7, 2021

I will try to cook up a quick-and-dirty compilation scheme using the OCaml compiler-libs and see what happens. Maybe we can close this for the time being?

@silene
Copy link
Contributor

silene commented Apr 8, 2021

Maybe we can close this for the time being?

Sure.

By the way, just for the record, it seems like we have a lot of leeway in how dirty our code can be without confusing the compiler. Indeed, I just noticed this piece of code in the OCaml compiler:

type t = A | B | .... | Z | Closure of dummy
let method_impl ... =
  match x with
  | A -> some closure
  ...
  | Z -> some closure
  | Closure _ as clo -> Obj.magic clo

In other words, closures are not even boxed in this function. They are stored as is, and even if the tag for the Closure constructor does not match the actual closure tag, the OCaml compiler does not care.

Unfortunately, it does not quite apply to our case, because our types allow for non-constant constructors. So, that trick would only work for a type such as bool:

type native_bool = Bool_true | Bool_false | Bool_closure of dummy
(* who cares if Bool_closure has tag 0 rather than 247? *)

@ppedrot ppedrot closed this Apr 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants