Porting Cookbook

Yavor Konstantinov edited this page Jul 25, 2019 · 10 revisions

Intro

The goal of this document is to provide guidance for the developer when porting code from Emacs C code to Rust. Be sure to read CONTRIBUTING.md for details on the style of the Rust code and what makes great pull requests.

Naming conventions

Code in Remacs should use the same naming conventions for elisp namespaces, to make translation straightforward.

This means that an elisp function do-stuff will have a corresponding Rust function Fdo_stuff, and a declaration struct Sdo_stuff. A lisp variable do-stuff will have a Rust variable Vdo_stuff and a symbol 'do-stuff will have a Rust variable Qdo_stuff.
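The mapping is mechanical, which a small sketch can make concrete. The helper below is hypothetical (it is not part of Remacs) and only illustrates how an elisp name turns into the prefixed identifiers described above:

```rust
/// Hypothetical helper, not part of Remacs: build the conventional
/// identifier for an elisp name by adding a one-letter prefix and
/// replacing hyphens with underscores.
fn prefixed_name(prefix: char, lisp_name: &str) -> String {
    let mut out = String::with_capacity(lisp_name.len() + 1);
    out.push(prefix);
    out.push_str(&lisp_name.replace('-', "_"));
    out
}
```

With this helper, prefixed_name('F', "do-stuff") yields the function name and prefixed_name('Q', "do-stuff") the symbol name.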

Otherwise, we follow Rust naming conventions, with docstrings noting equivalent functions or macros in C. When incrementally porting, we may define Rust functions with the same name as their C predecessors.

Bindgen

We use rust-bindgen to automatically generate bindings. This will result in longer compile times, but also in less time to port C code to Rust. Since creating those bindings manually also was the source of several bugs, bindgen is a major improvement.

Unfortunately, bindgen sometimes doesn't notice new bindings in the C code base (for example, when you remove the static attribute from a function). In this case you have to clear the existing bindings with cargo clean.

Function as argument

When you need to pass a function as argument to a C function like

Lisp_Object internal_catch (Lisp_Object tag, Lisp_Object (*func) (Lisp_Object), Lisp_Object arg)

you have to use Some(func).

unsafe { internal_catch(val, Some(Fprogn), body) }
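The Some is needed because a C function pointer may be null, so bindgen represents it as an Option around an unsafe extern "C" fn. The following is a minimal, self-contained sketch of that shape. LispObject here is a stand-in integer type, and add_one and call_or_default are hypothetical names, not the real internal_catch:

```rust
// Stand-in for the real Lisp_Object type; an assumption for this sketch.
type LispObject = isize;

// The shape bindgen produces for `Lisp_Object (*func) (Lisp_Object)`:
// the pointer is nullable in C, so it is wrapped in Option in Rust.
type LispCallback = Option<unsafe extern "C" fn(LispObject) -> LispObject>;

// Hypothetical callback with the C ABI.
unsafe extern "C" fn add_one(arg: LispObject) -> LispObject {
    arg + 1
}

// Hypothetical receiver: calls the callback if one was given, otherwise
// returns the argument unchanged (the NULL-pointer case in C).
fn call_or_default(func: LispCallback, arg: LispObject) -> LispObject {
    match func {
        Some(f) => unsafe { f(arg) },
        None => arg,
    }
}
```

Passing None corresponds to passing a null function pointer from C.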

Docstrings and comments

Usually we just copy the original docstrings and comments. But if you see any potential improvements, you can change the existing docs (Related discussion).

Misc

Automatically closing issues.

You can review and approve PRs, but let the author do the merge.

When you have a build error that you can't explain, it sometimes helps to use make clean. Don't forget that you also have to recompile all elisp files after you used this command.

Lisp Functions

The first thing to look at is the C implementation for the atan function. It takes an optional second argument, which makes it interesting. The complicated mathematical bits, on the other hand, are handled by the standard library. This allows us to focus on the porting process without getting distracted by the math.

The Lisp values we are given as arguments are tagged pointers. In this case they are pointers to doubles. The code has to check the tag and follow the pointer to retrieve the real values. Note that this code invokes a C macro (called DEFUN) that reduces some of the boilerplate. The macro declares a static variable called Satan that holds the metadata the Lisp compiler will need in order to successfully call this function, such as the docstring and the pointer to Fatan, which is the name of the C implementation:

DEFUN ("atan", Fatan, Satan, 1, 2, 0,
       doc: /* Return the inverse tangent of the arguments.
If only one argument Y is given, return the inverse tangent of Y.
If two arguments Y and X are given, return the inverse tangent of Y
divided by X, i.e. the angle in radians between the vector (X, Y)
and the x-axis.  */)
  (Lisp_Object y, Lisp_Object x)
{
  double d = extract_float (y);
  if (NILP (x))
    d = atan (d);
  else
    {
      double d2 = extract_float (x);
      d = atan2 (d, d2);
    }
  return make_float (d);
}

extract_float checks the tag (signalling an "invalid argument" error if it's not the tag for a double), and returns the actual value. NILP checks to see if the tag indicates that this is a null value, indicating that the user didn't supply a second argument at all.

Next take a look at the current Rust implementation. It must also take an optional argument, and it also invokes a (Rust) macro to reduce the boilerplate of declaring the static data for the function. However, it also takes care of all of the type conversions and checks that we need to do in order to handle the arguments and return value:

/// Return the inverse tangent of the arguments.
/// If only one argument Y is given, return the inverse tangent of Y.
/// If two arguments Y and X are given, return the inverse tangent of Y
/// divided by X, i.e. the angle in radians between the vector (X, Y)
/// and the x-axis
#[lisp_fn(min = "1")]
pub fn atan(y: EmacsDouble, x: Option<EmacsDouble>) -> EmacsDouble {
    match x {
        None => y.atan(),
        Some(x) => y.atan2(x)
    }
}

You can see that we don't have to check whether our arguments are of the correct type. The code generated by the lisp_fn macro does this for us. We also asked for the second argument to be an Option<EmacsDouble>. This is the Rust type for a value which is either a valid double or isn't specified at all. We use a match expression to handle both cases.

This code is so much better that it's hard to believe just how simple the implementation of the macro is. It just calls .into() on the arguments and the return value. The compiler does the rest when it dispatches this method call to the correct implementation.
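To illustrate the idea of converting with .into(), here is a simplified sketch. It uses a toy LispObject enum instead of the real tagged pointer, panics where the real code would signal a Lisp error, and atan_wrapper is a hypothetical stand-in for what the macro generates:

```rust
// Toy stand-in for the real tagged-pointer LispObject (an assumption).
#[derive(Debug, Clone, Copy, PartialEq)]
enum LispObject {
    Nil,
    Float(f64),
}

// Required argument: must be a float.
impl From<LispObject> for f64 {
    fn from(obj: LispObject) -> f64 {
        match obj {
            LispObject::Float(d) => d,
            // The real code signals a Lisp error; this sketch panics.
            LispObject::Nil => panic!("wrong-type-argument: floatp"),
        }
    }
}

// Optional argument: nil becomes None.
impl From<LispObject> for Option<f64> {
    fn from(obj: LispObject) -> Option<f64> {
        match obj {
            LispObject::Nil => None,
            LispObject::Float(d) => Some(d),
        }
    }
}

// Return value: wrap the double back into a Lisp value.
impl From<f64> for LispObject {
    fn from(d: f64) -> LispObject {
        LispObject::Float(d)
    }
}

// The function as the porter writes it.
fn atan(y: f64, x: Option<f64>) -> f64 {
    match x {
        None => y.atan(),
        Some(x) => y.atan2(x),
    }
}

// Hypothetical wrapper: convert each argument and the result with
// .into() and let the compiler pick the right From implementation.
fn atan_wrapper(y: LispObject, x: LispObject) -> LispObject {
    atan(y.into(), x.into()).into()
}
```

The wrapper contains no type checks of its own; they all live in the From implementations.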

Names

DEFUN ("numberp", Fnumberp, Snumberp, 1, 1, 0,
       doc: /* Return t if OBJECT is a number (floating point or integer).  */
       attributes: const)
  (Lisp_Object object)

The DEFUN macro, in addition to defining a function Fnumberp, also creates a static struct Snumberp that describes the function for Emacs' Lisp interpreter.

In Rust, we define a numberp function that does the actual work, then use an attribute (implemented as a procedural macro) named lisp_fn that handles these definitions for us.

The elisp name of the function is derived from the Rust name, with underscores replaced by hyphens. However, it's also possible to give an elisp name as an argument, like #[lisp_fn(name = "default-value")] in the example below.

Sometimes there are functions with the same name in lisp and C. In this case we have to define the name for C with c_name = "default_value". The Rust function name is default_value_lisp.

#[lisp_fn(c_name = "default_value", name = "default-value")]
pub fn default_value_lisp(symbol: LispSymbolRef) -> LispObject {

Minimum and maximum args

Optional arguments are also possible: to make the minimum number of arguments from elisp different from the number of Rust arguments, pass a min = "n" argument. Any parameter after the first n can be either an Option-wrapped type or a LispObject. When the argument is omitted, the value will be None in the former case and nil in the latter.

DEFUN ("buffer-size", Fbuffer_size, Sbuffer_size, 0, 1, 0,

0,1,0 means "minimum of 0, maximum of 1 and 0 for no intspec"

In Rust:

#[lisp_fn(min = "0")]

Many

Some functions don't have an upper limit to the number of arguments. In C, this is denoted with MANY. In Rust, the #[lisp_fn] macro handles this automatically if the function has one &mut [LispObject] argument.

DEFUN ("append", Fappend, Sappend, 0, MANY, 0,
  (ptrdiff_t nargs, Lisp_Object *args)

#[lisp_fn]
pub fn append(args: &mut [LispObject]) -> LispObject {
}
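As a rough illustration of how a (nargs, args) pair from C can become a Rust slice, consider the sketch below. It is an assumption about the general technique, not the code the macro actually generates. LispObject is a stand-in integer type, and sum_args and Fsum_args are hypothetical names:

```rust
// Stand-in for the real LispObject type; an assumption for this sketch.
type LispObject = i64;

/// Plain Rust function, written the way a MANY-style body would be.
fn sum_args(args: &mut [LispObject]) -> LispObject {
    args.iter().sum()
}

/// Hypothetical C-callable wrapper taking the C-style argument pair.
#[allow(non_snake_case)]
unsafe extern "C" fn Fsum_args(nargs: isize, args: *mut LispObject) -> LispObject {
    // from_raw_parts_mut requires a non-null pointer, so handle the
    // empty case separately.
    if args.is_null() || nargs <= 0 {
        return sum_args(&mut []);
    }
    // The caller promises that `args` points to `nargs` valid objects.
    let slice = unsafe { std::slice::from_raw_parts_mut(args, nargs as usize) };
    sum_args(slice)
}
```

The Rust function body only ever sees a slice, so the usual iterator methods apply.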

Intspec

DEFUN ("goto-char", Fgoto_char, Sgoto_char, 1, 1, "NGoto char: ",

#[lisp_fn(intspec = "NGoto char: ")]

https://www.gnu.org/software/emacs/manual/html_node/elisp/Interactive-Codes.html

Unevalled

DEFUN ("quote", Fquote, Squote, 1, UNEVALLED, 0,

#[lisp_fn(unevalled = "true")]

Return Types

As seen above, lisp_fn can cast return types to LispObjects.

bool

A Rust bool will result in either Qt or Qnil.

pub fn numberp(object: LispObject) -> bool {

Numbers

pub fn string_bytes(string: LispStringRef) -> EmacsInt {
pub fn move_to_column(column: EmacsUint, force: LispObject) -> EmacsUint {
pub fn float_time(time: LispObject) -> EmacsDouble {
pub fn frame_text_height(frame: LispObject) -> i32 {

Option

The macro also supports Option types. In the following example, the method as_buffer returns an Option<LispBufferRef>. In case of Some(buffer) the returned LispObject will be a Lisp_Buffer, and for None it will be Qnil.

pub fn as_buffer(self) -> Option<LispBufferRef> {
    self.as_vectorlike().and_then(|v| v.as_buffer())
}

It's also possible to return LispBufferRef.

pub fn as_buffer_or_error(self) -> LispBufferRef {
    self.as_buffer()
        .unwrap_or_else(|| wrong_type!(Qbufferp, self))
}
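The return-type conversions described in this section can be pictured with a toy model. This is a sketch under the assumption that the conversions are ordinary From implementations. The LispObject enum and the to_lisp helper are hypothetical and far simpler than the real types:

```rust
// Toy stand-in for the real LispObject (an assumption for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
enum LispObject {
    Nil,
    T,
    Int(i64),
}

// bool maps to t or nil.
impl From<bool> for LispObject {
    fn from(b: bool) -> Self {
        if b { LispObject::T } else { LispObject::Nil }
    }
}

// Numbers map to a Lisp number.
impl From<i64> for LispObject {
    fn from(n: i64) -> Self {
        LispObject::Int(n)
    }
}

// Option maps None to nil and Some(v) to whatever v converts to.
impl<T: Into<LispObject>> From<Option<T>> for LispObject {
    fn from(v: Option<T>) -> Self {
        v.map_or(LispObject::Nil, Into::into)
    }
}

// Hypothetical helper: what a wrapper would do with a return value.
fn to_lisp<T: Into<LispObject>>(value: T) -> LispObject {
    value.into()
}
```

Because the Option conversion is generic, any type that already converts to a LispObject can be returned wrapped in an Option without extra code.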

defsubr

At the end of the C file where the DEFUN is defined there is a function called syms_of.... In this function the C code calls defsubr to set up the link between the C code and the Lisp engine. When porting a DEFUN from C, the defsubr call needs to be removed as well. For instance, if syntax-table-p is being ported, find the line like defsubr (&Ssyntax_table_p); and remove it. All Rust functions declared with lisp_fn have a defsubr line generated for them by the build, so there is nothing to do on the Rust side.

DEFSYM

In C, the DEFSYM macro is used to create an entry in the Lisp symbol table. These are analogous to global variables in the C/Rust code. Like defsubr, you will most often see these in the syms_of... functions. When porting DEFUNs, check to see if there is a matching DEFSYM as well. If there is, remove it from the C code and add a line like this below the ported Rust code: def_lisp_sym!(Qsyntax_table_p, "syntax-table-p");

Lisp Variables

You may also be aware that the C code must quickly and frequently access the current value of a large number of Lisp variables. To make this possible, the C code stores these values in global variables. Yes, lots of global variables. In fact, these aren't just file-level globals accessible to only one translation unit; they are accessible across the whole program. We've started porting these to Rust now as well.

DEFVAR_LISP ("post-self-insert-hook", Vpost_self_insert_hook,
    doc: /* Hook run at the end of `self-insert-command'.
    This is run after inserting the character.  */);
      Vpost_self_insert_hook = Qnil;

Like DEFUN, DEFVAR_LISP takes both a Lisp name and the C name. The C name becomes the name of the global variable, while the Lisp name is what gets used in Lisp source code. Setting the default value of this variable happens in a separate statement, which is fine.

/// Hook run at the end of `self-insert-command'.
/// This is run after inserting the character.
defvar_lisp!(Vpost_self_insert_hook, "post-self-insert-hook", Qnil);

The Rust version must still take both names (this could be simplified if we wrote this macro using a procedural macro), but it also takes a default value. As before, the docstring becomes a comment which all other Rust tooling will recognize.

You might be interested in how this is implemented as well:

#define DEFVAR_LISP(lname, vname, doc)		\
  do {						\
    static struct Lisp_Objfwd o_fwd;		\
    defvar_lisp (&o_fwd, lname, &globals.f_ ## vname);		\
  } while (false)

The C macro is not very complicated, but there are two somewhat subtle points. It creates a static variable called o_fwd, of type Lisp_Objfwd, without an explicit initializer. This is the forwarding object for the variable's value, which is a Lisp_Object. It then calls the defvar_lisp function to initialize the fields of this struct, and also to register the variable in the Lisp runtime's global environment, making it accessible to Lisp code.

The first subtle point is that every invocation of this macro uses the same variable name, o_fwd. If you invoked this macro more than once inside the same scope, the declarations would clash. Instead, the macro body is wrapped inside a do { ... } while (false) loop so that each invocation has its own little scope to live in.
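The same scoping trick carries over to Rust macros, where an extra pair of braces plays the role of the do/while wrapper. The sketch below is only an illustration: count_calls! and two_call_sites are hypothetical names, and an atomic counter stands in for the forwarding struct:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical macro: each expansion declares a static with the same
// name, but inside its own block, so the statics never collide.
macro_rules! count_calls {
    () => {{
        static CALLS: AtomicUsize = AtomicUsize::new(0);
        // fetch_add returns the previous value, so add 1 for the new count.
        CALLS.fetch_add(1, Ordering::SeqCst) + 1
    }};
}

// Two expansions in one function body: two independent statics.
fn two_call_sites() -> (usize, usize) {
    let first = count_calls!();
    let second = count_calls!();
    (first, second)
}
```

Each expansion keeps its own count, which shows that the two statics are distinct even though they share a name.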

The other subtlety is that the Lisp_Objfwd struct actually only has a pointer to the value. We still have to allocate some storage for that value somewhere. We take the address of a field on something called globals here. That's the real storage location. This globals object is just a big global struct that holds all the global variables. One day when Emacs is really multi-threaded, there can be one of these per thread and a lot of the rest of the code will just work.

#[macro_export]
macro_rules! defvar_lisp {
    ($field_name:ident, $lisp_name:expr, $value:expr) => {{
        #[allow(unused_unsafe)]
        unsafe {
            #[allow(const_err)]
            static mut o_fwd: ::hacks::Hack<::data::Lisp_Objfwd> =
                unsafe { ::hacks::Hack::uninitialized() };
            ::remacs_sys::defvar_lisp(
                o_fwd.get_mut(),
                concat!($lisp_name, "\0").as_ptr() as *const i8,
                &mut ::remacs_sys::globals.$field_name,
            );
            ::remacs_sys::globals.$field_name = $value;
        }
    }};
}

The Rust version of this macro is rather longer. Primarily this is because it takes a lot more typing to get a proper uninitialized value in a Rust program. Some would argue that all of this typing is a bad thing, but this is very much an unsafe operation. We're basically promising very precisely that we know this value is uninitialized, and that it will be completely and correctly initialized by the end of this unsafe block.

We then call the same defvar_lisp function with the same values, so that the Lisp_Objfwd struct gets initialized and registered in exactly the same way as in the C code. We do have to take care to ensure that the Lisp name of the variable is a null-terminated string though.
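The null-termination step can be tried in isolation. The following sketch uses a hypothetical c_str_ptr! macro and lisp_name_from_c function. It casts to c_char rather than i8 as an assumption about portability, since c_char is not i8 on every platform:

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Hypothetical macro: append a NUL byte at compile time and hand out
// a pointer suitable for a C function expecting `const char *`.
macro_rules! c_str_ptr {
    ($name:expr) => {
        concat!($name, "\0").as_ptr() as *const c_char
    };
}

// Read the string back the way C code would: by scanning for the NUL.
fn lisp_name_from_c() -> String {
    let ptr: *const c_char = c_str_ptr!("post-self-insert-hook");
    // SAFETY: the pointer comes from a 'static string literal that
    // ends with the NUL byte added by concat!.
    let c_str = unsafe { CStr::from_ptr(ptr) };
    c_str.to_string_lossy().into_owned()
}
```

Without the appended "\0", CStr::from_ptr would read past the end of the literal, which is the bug the macro guards against.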

Porting tips

C Macros

For C macros, we try to define a fairly equivalent Rust function. The docstring should mention the original macro name. Sometimes C macros become Rust macros.

Since the Rust function is not a drop-in replacement, we prefer Rust naming conventions for the new function.

For the checked arithmetic macros (INT_ADD_WRAPV, INT_MULTIPLY_WRAPV and so on), you can simply use .checked_add, .checked_mul from the Rust stdlib.
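As a small sketch of that replacement, the hypothetical function below chains two checked operations and reports overflow as None instead of through an output parameter:

```rust
/// Hypothetical helper: compute `count * size + header`, returning
/// None if either step overflows an i64.
fn buffer_bytes(count: i64, size: i64, header: i64) -> Option<i64> {
    // `?` returns None early if the multiplication overflowed.
    count.checked_mul(size)?.checked_add(header)
}
```

The caller decides what an overflow means, for example by signaling a Lisp error when the result is None.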

Some C Macros and their equivalents

In general, macros which end with the letter P (CONSP, NILP...) are used to test whether or not a certain condition is true (is the object a cons cell, is the object nil...). On the other hand, macros which start with the letter X are used to extract/convert one object into another (XCAR, XCDR...).

Here is a rough map of some C macros and their Rust equivalents within this project:

  • CONSP(obj) <-> obj.is_cons()
  • listN(obj1, obj2, ...) <-> list!(obj1, obj2,...) Here N is a number(list1, list2,...)
  • XCONS(obj) <-> obj.as_cons()
  • XCDR(obj) <-> obj.cdr()
  • XCAR(obj) <-> obj.car()
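To show how predicate-style and extractor-style methods fit together, here is a toy model. The LispObject enum and list_length are hypothetical, and the real Remacs types are tagged pointers rather than a Rust enum:

```rust
// Toy stand-in for the real LispObject (an assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
enum LispObject {
    Nil,
    Int(i64),
    Cons(Box<(LispObject, LispObject)>),
}

impl LispObject {
    fn cons(car: LispObject, cdr: LispObject) -> Self {
        LispObject::Cons(Box::new((car, cdr)))
    }

    // Plays the role of CONSP: a pure test.
    fn is_cons(&self) -> bool {
        matches!(self, LispObject::Cons(_))
    }

    // Plays the role of XCONS, but returns None instead of assuming
    // that the object really is a cons cell.
    fn as_cons(&self) -> Option<&(LispObject, LispObject)> {
        match self {
            LispObject::Cons(cell) => Some(&**cell),
            _ => None,
        }
    }
}

// Walk the cdr chain, as a C loop over CONSP/XCDR would.
fn list_length(mut obj: &LispObject) -> usize {
    let mut n = 0;
    while let Some((_car, cdr)) = obj.as_cons() {
        n += 1;
        obj = cdr;
    }
    n
}
```

Returning an Option from as_cons lets the test and the extraction happen in one step.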

Misc

eassert in Emacs C should be debug_assert! in Rust.

emacs_abort() in Emacs C should be panic!("reason for panicking") in Rust.
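A short sketch of both replacements side by side follows. The function char_width_class is hypothetical and exists only to show where each macro would go:

```rust
/// Hypothetical port of a C helper that used eassert and emacs_abort.
fn char_width_class(width: u32) -> &'static str {
    // Was `eassert (width <= 2)`: only checked in debug builds.
    debug_assert!(width <= 2, "width out of range: {}", width);
    match width {
        0 => "zero-width",
        1 => "narrow",
        2 => "wide",
        // Was `emacs_abort ()`: always fatal, with a reason attached.
        _ => panic!("unexpected character width {}", width),
    }
}
```

In a release build the debug_assert! compiles away, so the panic! arm is what catches an out-of-range width.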

Iterators

This is a basic iterator example. There are also more complex iterator implementations in Remacs.

impl LispOverlayRef {
    pub fn iter(self) -> LispOverlayIter {
        LispOverlayIter {
            current: Some(self),
        }
    }
}

pub struct LispOverlayIter {
    current: Option<LispOverlayRef>,
}

impl Iterator for LispOverlayIter {
    type Item = LispOverlayRef;

    fn next(&mut self) -> Option<Self::Item> {
        let c = self.current;
        match c {
            None => None,
            Some(o) => {
                self.current = LispOverlayRef::from_ptr(o.next as *mut c_void);
                c
            }
        }
    }
}

We can see the overlay iterator in action when we take a look at the function overlay-lists.

DEFUN ("overlay-lists", Foverlay_lists, Soverlay_lists, 0, 0, 0,
  (void)
{
  struct Lisp_Overlay *ol;
  Lisp_Object before = Qnil, after = Qnil, tmp;

  for (ol = current_buffer->overlays_before; ol; ol = ol->next)
    {
      XSETMISC (tmp, ol);
      before = Fcons (tmp, before);
    }
  for (ol = current_buffer->overlays_after; ol; ol = ol->next)
    {
      XSETMISC (tmp, ol);
      after = Fcons (tmp, after);
    }

  return Fcons (Fnreverse (before), Fnreverse (after));
}

becomes

pub fn overlay_lists() -> LispObject {
    let list_overlays = |ol: LispOverlayRef| -> LispObject {
        ol.iter()
            .fold(Qnil, |accum, n| unsafe { Fcons(n.as_lisp_obj(), accum) })
    };

    let cur_buf = ThreadState::current_buffer();
    let before = cur_buf.overlays_before().map_or(Qnil, &list_overlays);
    let after = cur_buf.overlays_after().map_or(Qnil, &list_overlays);
    unsafe { Fcons(Fnreverse(before), Fnreverse(after)) }
}
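Note that folding with Fcons prepends each element, so the list comes out reversed, which is why both versions call Fnreverse at the end. The self-contained sketch below reproduces the pattern with a toy linked list. Node, NodeIter and ids_in_order are hypothetical, and a Vec stands in for the Lisp list:

```rust
// Toy singly linked list standing in for the overlay chain.
struct Node {
    id: u32,
    next: Option<Box<Node>>,
}

struct NodeIter<'a> {
    current: Option<&'a Node>,
}

impl<'a> Iterator for NodeIter<'a> {
    type Item = &'a Node;

    fn next(&mut self) -> Option<Self::Item> {
        let c = self.current;
        if let Some(node) = c {
            // Advance to the following node, or None at the end.
            self.current = node.next.as_deref();
        }
        c
    }
}

impl Node {
    fn iter(&self) -> NodeIter<'_> {
        NodeIter { current: Some(self) }
    }
}

fn ids_in_order(head: &Node) -> Vec<u32> {
    // Prepending mirrors Fcons (n, accum): the result is reversed.
    let mut ids = head.iter().fold(Vec::new(), |mut accum, n| {
        accum.insert(0, n.id);
        accum
    });
    // Mirrors Fnreverse: restore the original order.
    ids.reverse();
    ids
}
```

The iterator hides the pointer chasing, so the fold reads the same whether the chain is a toy list or the overlay list.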

Rustfmt

In order to pass Travis checks on pull requests, the source has to be formatted according to the default style of rustfmt, as packaged with the Rust nightly in rust-toolchain. To do that, install rustfmt:

$ rustup component add rustfmt-preview

Currently it's necessary to run the rustup command again whenever the project's toolchain is updated.

Make sure you uninstall the crate version of rustfmt first. The new component will install its own set of binaries.

$ cargo uninstall rustfmt
$ cargo uninstall rustfmt-nightly

Then you can run this in the checkout root to reformat all Rust code:

$ make rustfmt

Rustdoc builds

You can use rustdoc to generate API docs:

# http://stackoverflow.com/a/39374515/509706
$ cargo rustdoc -- \
    --no-defaults \
    --passes strip-hidden \
    --passes collapse-docs \
    --passes unindent-comments \
    --passes strip-priv-imports

You can then open these docs with:

$ cargo doc --open

Tests

Running tests

Run the elisp and Rust tests from the toplevel directory. If you run them in a subdirectory, only the tests in that directory are run.

  • make check Run all tests as defined in the directory. Expensive tests are suppressed. The result of the tests for .el is stored in .log.

  • make check-maybe Like "make check", but run only the tests for files that have been modified since the last build.

Writing tests

Elisp

For elisp testing, remacs uses ert.

Add new tests to test/rust_src/src/<filename>-tests.el. There are good examples in the directory to follow. In general, there should be at least one test function for each Rust function. This function should be a 'smoke' test. Does the Rust call succeed for common values? Does it fail for common values? More complex tests or tests that involve several lisp functions should be defined in a function named after what the test is trying to validate.

As an example here is how the if function is tested:

    (ert-deftest eval-tests--if-base ()
      "Check (if) base cases"
      (should-error (eval '(if)) :type 'wrong-number-of-arguments)
      (should (eq (if t 'a) 'a))
      (should (eq (if t 'a 'b) 'a))
      (should (eq (if nil 'a) nil))
      (should (eq (if nil 'a 'b) 'b))
      (should (eq (if t 'a (error "Not evaluated!")) 'a))
      (should (eq (if nil (error "Not evaluated!") 'a) 'a)))
    
    (ert-deftest eval-tests--if-dot-string ()
      "Check that Emacs rejects (if . \"string\")."
      (should-error (eval '(if . "abc")) :type 'wrong-type-argument)
      (let ((if-tail (list '(setcdr if-tail "abc") t)))
        (should-error (eval (cons 'if if-tail))))
      (let ((if-tail (list '(progn (setcdr if-tail "abc") nil) t)))
        (should-error (eval (cons 'if if-tail)))))

Rust

#[cfg(test)]
use std::cmp::max;
#[cfg(test)]
use std::mem;

#[test]
fn test_lisp_float_size() {
    let double_size = mem::size_of::<EmacsDouble>();
    let ptr_size = mem::size_of::<*const Lisp_Float>();

    assert!(mem::size_of::<Lisp_Float>() == max(double_size, ptr_size));
}

Porting examples

Simple case

DEFUN ("syntax-table-p", Fsyntax_table_p, Ssyntax_table_p, 1, 1, 0,
       doc: /* Return t if OBJECT is a syntax table.
Currently, any char-table counts as a syntax table.  */)
  (Lisp_Object object)
{
  if (CHAR_TABLE_P (object)
      && EQ (XCHAR_TABLE (object)->purpose, Qsyntax_table))
    return Qt;
  return Qnil;
}

/// Return t if OBJECT is a syntax table.
/// Currently, any char-table counts as a syntax table.
#[lisp_fn]
pub fn syntax_table_p(object: LispObject) -> bool {
    object
        .as_char_table()
        .map_or(false, |v| v.purpose == Qsyntax_table)
}

Some useful Emacs terminology

  • impure/pure cons-cells: A cons-cell is pure if it is read-only (its value cannot be changed). A cons-cell is impure if it can be changed. You will often see checks in both the C and Rust code that test whether a cons-cell is pure or impure.