Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Language mode for Rust #6219

Open
danielsn opened this issue Jul 8, 2021 · 13 comments
Open

RFC: Language mode for Rust #6219

danielsn opened this issue Jul 8, 2021 · 13 comments
Assignees
Labels
aws Bugs or features of importance to AWS CBMC users Kani Bugs or features of importance to Kani Rust Verifier

Comments

@danielsn
Copy link
Contributor

danielsn commented Jul 8, 2021

RMC is a new Rust front-end for CBMC. Currently, it uses the C mode in the symbol-table. We propose adding a Rust mode to symbols. For now, this mode would have the same semantics as C, but would allow us to distinguish rust code from C code. This is particularly important as Rust support linking C code using a FFI, which we need to support in RMC.

Benefits

  1. Allows tools which consume CBMC output to know which language the initial source code was in. For example, when giving a failure trace that includes both Rust and C code, it would be valuable to the trace visualizer to know which language the statements originally came from
  2. Allows the backend to implement different semantics for the different languages. For example (maybe not the best one), one could imagine a case where one language wants unsigned addition to wrap around, while another wants it to saturate.
  3. Allows the addition of language specific checks. For example, Rust has areas of UB which are not defined as such in C - it may be useful to have checks which are only automatically inserted into the Rust side of a linked Rust/C binary.

Design considerations

Rust is not the only language that would benefit from this. Any new language front-end will likely see the same issues we are, and could benefit from a principled mechanism to alleviate them.

Links and documentation

irep_idt mode;

Currently, CBMC appears to have the following langauges:

void cbmc_parse_optionst::register_languages()
{
register_language(new_ansi_c_language);
register_language(new_statement_list_language);
register_language(new_cpp_language);
register_language(new_json_symtab_language);
#ifdef HAVE_JSIL
register_language(new_jsil_language);
#endif
}

@danielsn danielsn added the aws Bugs or features of importance to AWS CBMC users label Jul 8, 2021
@TGWDB TGWDB self-assigned this Jul 9, 2021
@martin-cs
Copy link
Collaborator

If we are adding this then it would also be good to have these for Ada for the https://github.com/diffblue/gnat2goto/ front-end which is currently also using the C mode.

@kroening
Copy link
Member

kroening commented Jul 9, 2021

The expected pattern for this (Rust or Ada) is to set up a separate executable, with specialised command-line options. The executable then registers the relevant languages using the register_language API.

Look at jbmc as an exemplar.

@martin-cs
Copy link
Collaborator

Both of them use the json_symtab language which is already supported.

@martin-cs
Copy link
Collaborator

I guess more generally; for languages that are using the json_symtab interface, what should mode be set to? gnat2goto (and I think RMC) both lie and set the mode to "C".

@peterschrammel
Copy link
Member

a. I'm with @martin-cs on this one: json_symtab shouldn't be a 'language', but it should have 'mode' property.
b. The issue is that register_language registers a front-end implementation for handling certain input files, which is a slightly different concept than 'language'.
c. Having a separate executable is preferable if language-specific options are required or the command line is adapted to seamlessly fit into the language-specific environment (like jbmc supporting the command line syntax of java), but this is somewhat orthogonal.

But back to points 1, 2, 3:

  1. I don't see any issue with adding ID_rust. Likely needs to do something about a and b.
  2. JBMC solves different semantics (e.g. int/float conversions that are different in Java vs C) in the frontend. Language-specific cases in the backend are not the right mechanism. IMO the correct solution is to add the required semantics as primitives to exprt (like @kroening has recently done for the remainder operations) and then translate them to the right primitives in the frontend.
  3. JBMC handles language-specifc checks by having a pass similar to goto_check, which requires 1 and c. This could be avoided by inserting them in the frontend - likely depends on the particular checks. Language-specific cases in goto-check are not the right mechanism IMO.

@TGWDB
Copy link
Contributor

TGWDB commented Jul 14, 2021

Taking into account the comments and discussion above, would a reasonable choice be as follows.

To support the handling of rust in displays and mixed input (i.e. a goto binary comprising of a mixture of C and rust:

  1. Extend the json_symtab "language" to have a mode property.
  2. Support (at least) json_symtab modes for ID_C and ID_Rust.

Then for semantic differences between the languages and different language checks:

  1. Extend cbmc's capabilities to handle these new semantics (as mentioned by @peterschrammel and done by @kroening for other semantics). These can be done for the semantics desired, but not tied to a specific language or language mode (i.e. one could use them in any language).
  2. Expect alternate front ends (e.g. gnat2goto and rmc) to handle the insertion of correct checks or goto programs that use the above new semantics.

Comments/feedback very welcome!

@martin-cs
Copy link
Collaborator

@TGWDB I agree with splitting this in two (getting the correct mode and then how and where you do language-specific things) and with the general principle that mode should be correct (i.e. we should have ID_C, ID_cpp, ID_java, ID_rust, ID_ada, etc.). However may I raise a few concerns with your plan:

  1. The mode is compared to the output of the id() method:

    languages.back().mode=l->id();

    which for json_symtab is:

    it is not clear to me what this should be set to.

  2. mode is used in two ways, it is either checked directly in language specific bits of code (actually pretty rare; searching for ID_java or ID_cpp shows this) or it is used by src/langapi/mode.h to get the relevant languaget mostly for output via src/langapi/language_util.h.

  3. Sounds good.

  4. I get the principle and the whole point of a modular framework is so that we can have lots of different executables, so that is good. I have a slight concern about how we handle mixed language goto-programs, which is a desirable goal for both Ada and Rust ( @danielsn ? ).

As far as I can see there are two ways of resolving this and #6223 .

Simple fix approach : inherit from json_symtab_languaget to create rust_languaget and ada_languaget which have id()s of ID_rust and ID_ada respectively and implement from_type() and from_expr() etc. symtab2gb then loads all of these language front-ends and possibly the C front-end as well for good measure.

More involved but fuller solution : split languaget into two, one part for front-end, parsing and type checking and one part for language-specific support and output. Implement just the second part for ID_rust and ID_ada. Load the language-specific support and output for all languages in all tools. The front-end part stays the same in all tools, including symtab2gb which only needs json_symtab support.

HTH

@chrisr-diffblue
Copy link
Contributor

I think I agree with most of the points raised by @peterschrammel @TGWDB and @martin-cs - in particular, I do quite like the idea of splitting languaget into explicit "front-end/reporting" and "output/reporting/mid-end/back-end" - that I think enables json_symtab to remain a fairly focused, language-agnostic "intermediate representation" rather than having bits of ad-hoc language specific knowledge bolted on.

@martin-cs
Copy link
Collaborator

@TGWDB any feel for a time-frame on this one? If it is soon then I won't merge #6233

@danielsn
Copy link
Contributor Author

@martin-cs I also like the idea of splitting into front-end and back-end parts.

@martin-cs
Copy link
Collaborator

@TGWDB sorry to hassle but you are assigned as the owner for this. I need to #6223 resolved ASAP. #6233 will do it but I really don't want to merge it because it is a work-around. If you are going to resolve this in the next 24-48 hours then I won't merge #6233 .

Also @chrisr-diffblue and @peterschrammel for general interest.

@TGWDB
Copy link
Contributor

TGWDB commented Jul 27, 2021

@martin-cs I cannot promise resolution in 24-48 hours so I propose you go ahead with #6233.

@martin-cs
Copy link
Collaborator

@TGWDB thanks for the fast response.

@jimgrundy jimgrundy added aws-medium aws Bugs or features of importance to AWS CBMC users and removed aws Bugs or features of importance to AWS CBMC users labels Aug 24, 2021
@feliperodri feliperodri added the Kani Bugs or features of importance to Kani Rust Verifier label Oct 6, 2022
@feliperodri feliperodri assigned NlightNFotis and unassigned TGWDB Feb 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Bugs or features of importance to AWS CBMC users Kani Bugs or features of importance to Kani Rust Verifier
Projects
None yet
Development

No branches or pull requests

9 participants