Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port 'string-lessp' to Rust #217

Merged
merged 15 commits into from
Jul 11, 2017
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions rust_src/remacs-sys/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -665,6 +665,11 @@ pub struct Lisp_String {
pub data: *mut libc::c_char,
}

// @TODO
#[repr(C)]
pub struct Lisp_Symbol {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gcmarkbit, redirect ...etc are missing ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is my understanding of the situation re: the mentioned variables. If anyone spots any incorrect information please do not hesitate to correct me:

Lisp_Symbol has the following definition in lisp.h (I've stripped out the comments for clarity):

struct Lisp_Symbol
{
  bool_bf gcmarkbit : 1;
  ENUM_BF (symbol_redirect) redirect : 3;
  ENUM_BF (symbol_trapped_write) trapped_write : 2;
  unsigned interned : 2;
  bool_bf declared_special : 1;
  bool_bf pinned : 1;
  Lisp_Object name;
  union {
    Lisp_Object value;
    struct Lisp_Symbol *alias;
    struct Lisp_Buffer_Local_Value *blv;
    union Lisp_Fwd *fwd;
  } val;
  Lisp_Object function;
  Lisp_Object plist;
  struct Lisp_Symbol *next;

This struct is using the C notation for bit fields. According to my research, Rust does not (rust-lang/rfcs#314, https://users.rust-lang.org/t/c-structs-with-bit-fields-and-ffi/1429) support setting up an equivalent for C bitfieldsin it's #[repr(C)] directive.

On my system (64-bit Ubuntu 16.04), no matter what the typedef of bf_bool or ENUM_BF, the bit field section of the struct takes 4 bytes. Due to alignment, the struct is padded, and offsetof(struct Lisp_Symbol, name) reports 8. My initial solution to representing this struct in Rust was to represent the bit field block as a u32, taking up the 4 bytes I mentioned earlier. I have not fully convinced myself that this is 100% the correct and portable thing to do.

Even if we can safely represent this block with a u32 on every system we support, it seems that there are 'gotcha's with accessing these bit fields via the bit wise operators (due to endianness, and compiler differences w.r.t bit field implementation.)

Overall I am not sure the best way to handle the interop for C structs that use bit fields that we need to access in Rust. It seems that one option is to simply pad the Rust struct as best we can, and if we need to access these fields, we will need to maintain C bindings that access them for us.

}

extern "C" {
pub static mut globals: emacs_globals;
pub static Qt: Lisp_Object;
Expand All @@ -675,6 +680,7 @@ extern "C" {
pub static Qintegerp: Lisp_Object;
pub static Qfloatp: Lisp_Object;
pub static Qstringp: Lisp_Object;
pub static Qsymbolp: Lisp_Object;
pub static Qlistp: Lisp_Object;
pub static Qmarkerp: Lisp_Object;
pub static Qwholenump: Lisp_Object;
Expand Down
23 changes: 19 additions & 4 deletions rust_src/src/lisp.rs
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,12 @@ use libc::{c_void, intptr_t};

use marker::{LispMarker, marker_position};
use multibyte::{LispStringRef, MAX_CHAR};
use symbols::LispSymbolRef;
use vectors::LispVectorlikeRef;
use buffers::LispBufferRef;

use remacs_sys::{EmacsInt, EmacsUint, EmacsDouble, EMACS_INT_MAX, EMACS_INT_SIZE,
EMACS_FLOAT_SIZE, USE_LSB_TAG, GCTYPEBITS, wrong_type_argument, Qstringp,
EMACS_FLOAT_SIZE, USE_LSB_TAG, GCTYPEBITS, wrong_type_argument, Qstringp, Qsymbolp,
Qnumber_or_marker_p, Qt, make_float, Qlistp, Qintegerp, Qconsp, circular_list,
internal_equal, Fcons, CHECK_IMPURE, Qnumberp, Qfloatp, Qwholenump, Qvectorp,
SYMBOL_NAME, PseudovecType};
Expand Down Expand Up @@ -150,14 +151,28 @@ impl LispObject {
}

// Symbol support (LispType == Lisp_Symbol == 0)

impl LispObject {
#[inline]
pub fn is_symbol(self) -> bool {
self.get_type() == LispType::Lisp_Symbol
}

pub fn symbol_name(&self) -> LispObject {
unsafe { LispObject::from_raw(SYMBOL_NAME(self.to_raw())) }
#[inline]
pub fn as_symbol(&self) -> Option<LispSymbolRef> {
if self.is_symbol() {
Some(LispSymbolRef::new(unsafe { mem::transmute(self.get_untaggedptr()) }))
Copy link
Collaborator Author

@DavidDeSimone DavidDeSimone Jul 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to be incorrect. The code in lisp.h for XSYMBOL looks like

INLINE struct Lisp_Symbol *
(XSYMBOL) (Lisp_Object a)
{
#if USE_LSB_TAG
  return lisp_h_XSYMBOL (a);
#else
  eassert (SYMBOLP (a));
  intptr_t i = (intptr_t) XUNTAG (a, Lisp_Symbol);
  void *p = (char *) lispsym + i;
  return p;
#endif
}

and it looks like we will have to emulate this logic for getting the address to mem::transmute.

} else {
None
}
}

#[inline]
pub fn as_symbol_or_error(&self) -> LispSymbolRef {
if self.is_symbol() {
LispSymbolRef::new(unsafe { mem::transmute(self.get_untaggedptr()) })
} else {
unsafe { wrong_type_argument(Qsymbolp, self.to_raw()) }
}
}
}

Expand Down
5 changes: 3 additions & 2 deletions rust_src/src/multibyte.rs
Original file line number Diff line number Diff line change
Expand Up @@ -118,17 +118,18 @@ impl<'a> Iterator for LispStringRefIterator<'a> {
fn next(&mut self) -> Option<(Codepoint, usize)> {
if self.cur < self.string_ref.len_bytes() as usize {
let codepoint: Codepoint;
let point = self.cur;
let ref_slice = self.string_ref.as_slice();
if self.string_ref.is_multibyte() {
let (cp, advance) = multibyte_char_at(&ref_slice[self.cur..]);
codepoint = cp;
self.cur += advance;
} else {
codepoint = ref_slice[self.cur] as u32;
codepoint = ref_slice[self.cur] as Codepoint;
self.cur += 1;
}

Some((codepoint, self.cur))
Some((codepoint, point))
} else {
None
}
Expand Down
9 changes: 3 additions & 6 deletions rust_src/src/strings.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
//! Functions operating on strings.

use std::{ptr, cmp};
use std::ptr;

use libc::{self, c_char, c_void, ptrdiff_t};

Expand Down Expand Up @@ -202,10 +202,9 @@ fn string_to_unibyte(string: LispObject) -> LispObject {
}
}

// @TODO need to rework this function to use new as_symbol_or_error API
fn get_string_or_symbol(mut string: LispObject) -> multibyte::LispStringRef {
if string.is_symbol() {
string = string.symbol_name()
string = string.as_symbol_or_error().symbol_name()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still duplicates the symbol check. I'd make it

match string.as_symbol() {
    Some(sym) => sym.symbol_name().as_string().expect("symbol name not a string?")
    None => string.as_string_or_error()
}

If you like, you can also make string_equal use this function, it currently uses SYMBOL_NAME directly.

}

string.as_string_or_error()
Expand All @@ -216,16 +215,14 @@ fn string_lessp(string1: LispObject, string2: LispObject) -> LispObject {
let lispstr1 = get_string_or_symbol(string1);
let lispstr2 = get_string_or_symbol(string2);

let mut count = 0;
let zip = lispstr1.iter().zip(lispstr2.iter());
for ((codept1, _), (codept2, _)) in zip {
count += 1;
if codept1 != codept2 {
return LispObject::from_bool(codept1 < codept2);
}
}

LispObject::from_bool(count < lispstr2.len_chars())
LispObject::from_bool(lispstr1.len_chars() < lispstr2.len_chars())
}

/// Return t if OBJECT is a multibyte string.
Expand Down
11 changes: 10 additions & 1 deletion rust_src/src/symbols.rs
Original file line number Diff line number Diff line change
@@ -1,5 +1,14 @@
use remacs_macros::lisp_fn;
use lisp::LispObject;
use lisp::{LispObject, ExternalPtr};
use remacs_sys::Lisp_Symbol;

pub type LispSymbolRef = ExternalPtr<Lisp_Symbol>;

impl LispSymbolRef {
pub fn symbol_name(&self) -> LispObject {
LispObject::from_bool(false) // @TODO
}
}

/// Return t if OBJECT is a symbol.
#[lisp_fn]
Expand Down