-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jane Street 0.11.x library compatibility + minor fixes #820
Conversation
dae54b6
to
da20817
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
This PR introduces dependency on Core that is non-portable. BAP may only depend on Core_kernel. So can you please remove all references to the Core library.
-
This PR changes the public interface. That will force us to bump up the major version of BAP (Graphlib, Monads, etc). I don't want to make a major release of BAP just because of the Janestreet's changes. So we need to keep the
string
types in the interface if it is possible, otherwise we need to disable safe-strings in our compilation options and keep it this way until the BAP 2.0 release. It doesn't sound like a lot of fun, so let's try the first option. I'm fine if we will have a couple of (possible unsafe) translations from strings to bytes.
And thanks a lot. This looks like a huge amount of work!
lib/bap/bap_install_printers.ml
Outdated
install_printer printer) | ||
|
||
let () = install_printers () | ||
let () = List.iter (Pretty_printer.all ()) ~f:install_printer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to check that it works, the code on the left was basically fixing a bug in Core. I'm not sure that it is fixed.
@@ -15,7 +15,7 @@ module Plugin_rules = struct | |||
|
|||
let default_predicates = [ | |||
"custom_ppx"; | |||
"ppx_driver"; | |||
"ppxlib"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it a new name of the same predicate? Can you, actually, elaborate a little more on the ppx_driver story?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More or less, yes. As of ppx_driver.v0.11.0
:
arch-vm[1124]:~% opam show ppx_driver.v0.11.0
=-=- ppx_driver: information on all versions =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
name ppx_driver
all-installed-versions v0.10.4 [4.05.0+flambda]
all-versions 113.09.00 113.24.00 113.33.00 113.33.00+4.03 113.33.01+4.03 113.33.02+4.03 113.33.03 113.33.04 v0.9.0
v0.9.1 v0.9.2 v0.10.0 v0.10.1 v0.10.2 v0.10.3 v0.10.4 v0.11.0 v0.11.117.00+101
=-=- Version-specific details -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
version v0.11.0
*...*
depends: "ocaml" {>= "4.04.1"}
"jbuilder" {build & >= "1.0+beta18.1"}
"ppxlib" {>= "0.1.0"}
synopsis Deprecated: use ppxlib instead
(I have the following in my OPAM switch for 4.06.1+flambda
):
ppxlib.0.2.0 http https://github.com/ocaml-ppx/ppxlib/releases/download/0.2.0/ppxlib-0.2.0.tbz
.merlin
Outdated
|
||
PKG core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BAP can't depend on Core as it is not portable, we can only use Core_kernel and Base
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a problem: I'll have to double check, but if I recall, some things have been moved to Core. :(
lib/bap_image/bap_fileutils.ml
Outdated
@@ -1,11 +1,11 @@ | |||
open Core_kernel.Std | |||
open Core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no Core please :)
lib/bap_llvm/bap_llvm_ogre_loader.ml
Outdated
@@ -1,4 +1,4 @@ | |||
open Core_kernel.Std | |||
open Core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no Core
@@ -89,4 +89,4 @@ let rec sexp_of_exp = function | |||
sexp_of_exp x; | |||
sexp_of_exp y; | |||
] | |||
and sexps_of_exps = List.map ~f:sexp_of_exp | |||
and sexps_of_exps exps = List.map exps ~f:sexp_of_exp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rejected since ocaml/ocaml#556
lib/bap_traces/bap_trace_meta.ml
Outdated
@@ -41,7 +41,7 @@ module Binary = struct | |||
@\nmd5sum: %s\ | |||
@\n@[<2>\ | |||
args: %a@]\ | |||
@\n@[<2>envp:@\n%a@]@]@\n}" t.path (Digest.to_hex t.md5sum) Args.pp t.args Envp.pp t.envp | |||
@\n@[<2>envp:@\n%a@]@]@\n}" t.path t.md5sum Args.pp t.args Envp.pp t.envp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will print digest as a string of bytes instead of a nice and common to us hex representation of the MD5sum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure? t.md5sum
is already a string. The value t
here is not of type Md5_lib.t
--merlin says:
type t =
t = {
path : string;
args : string array;
envp : string array;
md5sum : string;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it is a string
, but not a string of ASCII characters, but a string of bytes. The string
type is not constrained only to printable characters, the only difference between strings and bytes is that the former is non-mutable. So in the trace md5sum
is stored as 32 bytes of data, not in a textual and printable representation.
@@ -545,7 +545,7 @@ module Simpl = struct | |||
and while_ c ss = match exp c with | |||
| Int x when Word.is_zero x -> [] | |||
| c -> [While (c, bil ss)] | |||
and bil = List.concat_map ~f:stmt in | |||
and bil stmts = List.concat_map stmts ~f:stmt in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yet another ocaml/ocaml#556 rejection
lib/monads/monads.mli
Outdated
unspecified order, and discards the results. *) | ||
val all_ignore : ('a,'e) m t -> (unit,'e) m | ||
val all_unit : ('a,'e) m t -> (unit,'e) m |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to preserve the old interface, otherwise we need to release Monads 2.0.
I don't like the idea of bumping the major version just because of changes in Core.
@@ -35,7 +35,7 @@ end | |||
module Digest = struct | |||
include String | |||
|
|||
let make s = s |> Digest.string |> Digest.to_hex | |||
let make s = s |> Md5.digest_string |> Md5.to_hex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably it is easier just to rely on Caml.Digest
ad7ba5f
to
65d6ee3
Compare
I've addressed the Core portability issue; now we are back to only needing Core_kernel. However, the fix for this ends up bringing in Since we'd be requiring OCaml 4.06, it would then probably make sense to just merge the |
The |
I was hoping for 4.04 and above. If core works from 4.04 and above, then I believe we can also be as compatible as they are. |
OK :)
I'll see how much trouble it is to separate the JSC-related changes from the |
Sorry for the wait. The branch now contains a JSC 0.11.x-compatible version that works with How should we handle vendoring |
@ivg ping :) |
@drvink, thanks, I actually missed the last update. I'm on it right now. So far, LGTM... but this
Yep, moreover it will need a separate repository and a separate package, as it will have a different license (Apache perhaps) and we can't pollute our license namespace. So, ideally, it should be a separate opam package, on which we should depend. So this is our roadmap:
1) Not only ugly it is, it will also have a severe impact on our users, as they would be forced to rewrite their code as well, so I'm considering different options to smooth the transition, e.g., providing a compatibility module that will support both |
OK. Where should I make the PR for the vendored ppxlib? I agree that the
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only few changes are needed. We need to preserve the interfaces, so Bap_dwarf should be reverted back to strings, and in Regular we need to provide the compatibility functions, i.e.,blit_to_string
(probably marking it as deprecated to discourage its usage)
Also the code with pprinting to hexes could be simplified by relying on either Caml.Digest.to_hex
or Md5.to_hex
functions
lib/bap_dwarf/dwarf_fbi.mli
Outdated
@@ -16,10 +16,10 @@ type fn [@@deriving bin_io, compare, sexp] | |||
- .debug_info [Section.Info] | |||
- .debug_str [Section.Str] | |||
*) | |||
val create : string Dwarf_data.t -> t Or_error.t | |||
val create : bytes Dwarf_data.t -> t Or_error.t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's keep string
s in the interface. Especially, since we're not going to mutate them, it is better to use the string
type to denote this.
lib/bap_traces/bap_trace_meta.ml
Outdated
@@ -41,7 +51,8 @@ module Binary = struct | |||
@\nmd5sum: %s\ | |||
@\n@[<2>\ | |||
args: %a@]\ | |||
@\n@[<2>envp:@\n%a@]@]@\n}" t.path (Digest.to_hex t.md5sum) Args.pp t.args Envp.pp t.envp | |||
@\n@[<2>envp:@\n%a@]@]@\n}" | |||
t.path (hexstring_of_bytestring t.md5sum) Args.pp t.args Envp.pp t.envp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering, why we can't use Digest.to_hex
here? It is still in 4.06, and still has type string -> string
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I understand your comment now. Caml.Digest.to_hex
will be fine, but Core.Digest.to_hex
will not:
utop[3]# #show Core.Digest.to_hex;;
val to_hex : Core.Md5.t -> string
lib/regular/regular.mli
Outdated
@@ -580,7 +580,7 @@ module Std : sig | |||
?dump : (Out_channel.t -> 'a -> unit) -> | |||
?pp : (Format.formatter -> 'a -> unit) -> | |||
?size : ('a -> int) -> | |||
?blit_to_string:('a,string) copy -> | |||
?blit_to_bytes:('a,bytes) copy -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to keep blit_to_string
for compatibility, though adding blit_to_bytes
is also a good idea.
lib/regular/regular.mli
Outdated
starting from the given position. It is undefined behavior, | ||
if the [value] doesn't fit into the string [buf] *) | ||
val blit_to_string : 'a t -> string -> 'a -> int -> unit | ||
val blit_to_bytes : 'a t -> bytes -> 'a -> int -> unit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please provide the blit_to_string
function, for compatibility.
oasis/common
Outdated
@@ -1,7 +1,7 @@ | |||
OASISFormat: 0.4 | |||
Name: bap | |||
Version: 1.5.0-dev | |||
OCamlVersion: >= 4.03 | |||
OCamlVersion: >= 4.06 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not 4.04.1 (as in core_kernel)?
| `Annot {parse_pos} -> parse_pos.text_line, parse_pos.text_char | ||
| `Sexp {parse_pos} -> parse_pos.text_line, parse_pos.text_char | ||
in | ||
Error (sprintf "Syntax error: line %d, column %d - %s" line col err.err_msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
To
We can't use
great! |
Yeah, that is why I figured using By the way, since |
Honestly, I don't know why we have optcomp in the dependencies. Most likely it was a missed dependency somewhere upstream. What I'm sure is that we're not using it anywhere in bap. So we can consider dropping this dep at all (assuming that upstream has fixed it) or indeed switching to something more modern. |
OK. I'll remove it and see if I can still build bap from a clean switch. |
7d26024
to
f981261
Compare
cf60f5f
to
ed56096
Compare
@drvink any updates on this? |
I think this is ready to go now thanks to @gitoleg. Sorry for not having looked into the ocamlbuild issue earlier (had I spent the moment to even check, this would have been ready for 1.5.0, since it seems the ppx stuff wasn't necessary at all anymore!) |
Also FYI, I've been maintaining a safe-string branch in parallel to this one, in |
@@ -580,7 +580,7 @@ module Std : sig | |||
?dump : (Out_channel.t -> 'a -> unit) -> | |||
?pp : (Format.formatter -> 'a -> unit) -> | |||
?size : ('a -> int) -> | |||
?blit_to_string:('a,string) copy -> | |||
?blit_to_string:('a,bytes) copy -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to deprecate this parameter, at least in the comments section, and here. Not sure whether the deprecated
tag will work on a parameter, but it is worthwhile to try
LGTM as soon as we will fix all the external dependencies. Correct me if I'm wrong, but the only one left is FrontC? I have made a courtesy notification to the author that we're taking his package from him, no reply so far. But I think that he will be OK. |
FrontC is the only thing I have a local pin for that hasn't already been made part of this PR, so I think so. |
Unrelated to FrontC vendoring, but seems there is also another option - https://github.com/jhjourdan/C11parser |
Seems something wrong with CI? It still fails with error that supposed to be fixed already. |
It seems that you need to upgrade also piqi & piqilib to the latest versions that support OCaml 4.06/4.07: |
godspeed! |
The big endian thumb and arm targets are [not supported by LLVM][1] and while [the fix][2] is available for many years it is still not reviewed. I tried to [raise the attention][3] to it but with no success. Since it looks like that they are not going to fix it we need to fix it on our side. The solution is a little bit hacky but we can keep it until we switch to Ghidra for disassembling/lifting. Examples, ``` $ bap mc --arch=thumb --show-insn=asm --show-bil -- 1a 42 tst r2, r3 { #1 := R2 & R3 ZF := #1 = 0 NF := extract:31:31[#1] } $ bap mc --arch=thumb --order=big --show-insn=asm --show-bil -- 42 1a tst r2, r3 { #1 := R2 & R3 ZF := #1 = 0 NF := extract:31:31[#1] } $ file echo echo: ELF 32-bit MSB executable, ARM, version 1 (ARM), dynamically linked, interpreter /lib/ld-uClibc.so.0, for GNU/Linux 2.0.0, stripped $ bap echo -dasm | grep 8c90 -A4 8c90: <main> 8c90: 8c90: e9 2d 47 f0 push {r4, r5, r6, r7, r8, r9, r10, lr} 8c94: e5 9f 33 34 ldr r3, [pc, #0x334] 8c98: e1 a0 70 01 mov r7, r1 8c9c: e1 a0 60 00 mov r6, r0 8ca0: e3 a0 00 06 mov r0, #6 $ objdump echo -d | grep 8c90 -A4 00008c90 <main@@base>: 8c90: e92d47f0 push {r4, r5, r6, r7, r8, r9, sl, lr} 8c94: e59f3334 ldr r3, [pc, BinaryAnalysisPlatform#820] ; 8fd0 <main@@base+0x340> 8c98: e1a07001 mov r7, r1 8c9c: e1a06000 mov r6, r0 8ca0: e3a00006 mov r0, #6 ``` fixes BinaryAnalysisPlatform#1299 [1]: https://bugs.llvm.org/show_bug.cgi?id=38721 [2]: https://reviews.llvm.org/D48811 [3]: https://twitter.com/ivg_t/status/1384932479967055877?s=20
The big endian thumb and arm targets are [not supported by LLVM][1] and while [the fix][2] is available for many years it is still not reviewed. I tried to [raise the attention][3] to it but with no success. Since it looks like that they are not going to fix it we need to fix it on our side. The solution is a little bit hacky but we can keep it until we switch to Ghidra for disassembling/lifting. Examples, ``` $ bap mc --arch=thumb --show-insn=asm --show-bil -- 1a 42 tst r2, r3 { #1 := R2 & R3 ZF := #1 = 0 NF := extract:31:31[#1] } $ bap mc --arch=thumb --order=big --show-insn=asm --show-bil -- 42 1a tst r2, r3 { #1 := R2 & R3 ZF := #1 = 0 NF := extract:31:31[#1] } $ file echo echo: ELF 32-bit MSB executable, ARM, version 1 (ARM), dynamically linked, interpreter /lib/ld-uClibc.so.0, for GNU/Linux 2.0.0, stripped $ bap echo -dasm | grep 8c90 -A4 8c90: <main> 8c90: 8c90: e9 2d 47 f0 push {r4, r5, r6, r7, r8, r9, r10, lr} 8c94: e5 9f 33 34 ldr r3, [pc, #0x334] 8c98: e1 a0 70 01 mov r7, r1 8c9c: e1 a0 60 00 mov r6, r0 8ca0: e3 a0 00 06 mov r0, #6 $ objdump echo -d | grep 8c90 -A4 00008c90 <main@@base>: 8c90: e92d47f0 push {r4, r5, r6, r7, r8, r9, sl, lr} 8c94: e59f3334 ldr r3, [pc, #820] ; 8fd0 <main@@base+0x340> 8c98: e1a07001 mov r7, r1 8c9c: e1a06000 mov r6, r0 8ca0: e3a00006 mov r0, #6 ``` fixes #1299 [1]: https://bugs.llvm.org/show_bug.cgi?id=38721 [2]: https://reviews.llvm.org/D48811 [3]: https://twitter.com/ivg_t/status/1384932479967055877?s=20
Fixes #786.
This previously also depended on:
FrontC
:safe-string
compatibility (needs vendoring)optcomp
:safe-string
compatibility (needs vendoring unless diml/optcomp#4 is merged)piqi
:safe-string
compatibility (works if pinned to dev repo)piqilib
:safe-string
compatibility (works if pinned to dev repo)ppxlib
: restoration ofocamlbuild
plugin (needs vendoring)...but
-safe-string
-as-default (as in OCaml 4.06) will be handled by a later PR.