Foreign Function Interface #25

Open
DemiMarie opened this issue Jul 7, 2016 · 18 comments

Comments

@DemiMarie

A standardized Foreign Function Interface is perhaps one of the few things that are essential for writing SML code that works across implementations. An FFI is needed whenever one wants to use features that third-party libraries or the operating system provide but one's SML implementation does not.

A good FFI should (note that these are my personal views):

  • allow calling almost all C functions.
  • allow for reading and writing C data structures.
  • allow for exporting functions that can be called from C.
  • support cleanup of foreign resources when SML values become unreachable.
  • not require users to write C stub code by hand (generating C automatically during compilation is fine).
  • be independent of implementation details, such as the representation of SML values.
  • not cause unnecessary performance degradation.
  • support SML code being compiled into a library that is called into by foreign code.

MLton has an FFI that, I believe, supports all of these. I am not familiar with any other SML FFIs.

@JohnReppy
Contributor

I agree that a standard FFI is an important goal. The MLton FFI is a reasonable place to start, but it does have some issues. It makes some significant assumptions about SML runtime representations that might be hard to implement without whole-program compilation. Also, the callback support is too static; it is not possible to have multiple instances of a callback at runtime.

@DemiMarie
Author

@JohnReppy Having thought further, I agree.

On reflection, the best choice seems to be Haskell's FFI, adapted to SML syntax. It has some weaknesses (for one, it needs external binding generators to handle C structs reasonably), but it is battle-tested and requires neither whole-program compilation nor monomorphization. It does not require C stubs either. Haskell's FFI relies on type classes (mostly Storable, as I understand it), so adopting it might also require #18 (modular type classes).

@eduardoleon

More important than a foreign function interface, I think, are standardized primitives for unsafe programming directly in ML, without having to reach for C code. One lesson we should learn from Rust is that modularity is useful for wrapping unsafe implementations of safe abstractions. Standard ML already has a more sophisticated module system than Rust's, but it lacks the unsafe primitives needed to write pure-ML implementations of:

  • Dynamically sized vectors and double-ended queues, backed by an array of potentially uninitialized elements. (This is not the same thing as an array of options, all initialized to NONE.)
  • Hash tables, without the overhead of bounds-checked array indexing. (The bounds check is superfluous, because it is easy to prove that the implementation will never use invalid array indices.)

@RobertHarper

RobertHarper commented Nov 3, 2019 via email

@ratmice
Contributor

ratmice commented Nov 3, 2019

To me at least, before entertaining unsafe code, I'd want to be able to reason that some chunk of my program is limited to the safe subset of the language. This isn't something the Rust mechanism gives easily.

In Rust you basically have to bypass the build system and run the compiler directly, since the build system allows unsafe code in all transitive dependencies by default; when running the compiler directly, you can flip the unsafe_code lint from allow to forbid.

While the Basis Library has optional modules, I can't recall any optional primitives in the language itself, and I can't recall any implementation that provides the optional Basis modules based on a compile-time flag or anything similar.

If there were a mechanism to restrict the language to the safe subset (probably by default) while still allowing safe and unsafe compilation units to be mixed, I wouldn't really see a problem with it.
Given that the language doesn't really specify any build mechanism besides use, how would it work? A use_unsafe that raises a compile error in safe_only mode?

This all most likely belongs in its own issue rather than the FFI one, I would think...

@eduardoleon

I see an unsafe subset of ML as a net improvement over calling C, for the following reasons:

On the usability front, an unsafe subset of ML would still have parametric polymorphism, algebraic data types and pattern matching, modules and abstract types, etc. Passing complicated values to an unsafe ML function would require no marshalling. You could still write and test your programs in a REPL.

On the verification front, an unsafe subset of ML would still have a formal semantics. The interaction between safe ML and unsafe ML code would be easier to understand than the interaction between ML and C code. For obvious reasons, there would be no theorem saying that no well-typed term evaluates to wrong. But you could still prove for yourself that the term you have written does in fact never evaluate to wrong.

On the cultural front, reasonable programmers would only use unsafe features sparingly and in small modules. Given how expressive ML is, these modules could be under 150-200 lines of code, and perhaps a lot less.

The serious use case I envision for an unsafe subset of ML is implementing numerical methods libraries. Presently, when I want to do numerical linear algebra, I have to reach for Python and R. I really wish I could reach for ML instead.

@ratmice raises an important issue. Unsafe features need some gatekeeping, so that the path of least resistance is to use safe features only.

@DemiMarie
Author

“I agree that a standard FFI is an important goal. The MLton FFI is a reasonable place to start, but it does have some issues. It makes some significant assumptions about SML runtime representations that might be hard to implement without whole-program compilation.”

True, but those are also the representations you need if you want good performance. Otherwise you can easily lose an order of magnitude in performance, and I really, really do not want people avoiding polymorphic code on performance grounds.

“Also, the callback support is too static; it is not possible to have multiple instances of a callback at runtime.”

That is C’s fault, and it is why virtually all C libraries let you pass a user-provided void* parameter to their callbacks. Creating new C function pointers at runtime requires dynamic code generation.
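To illustrate the void* convention mentioned above, here is a minimal sketch, assuming a hypothetical C library function for_each_int that takes a callback plus an opaque user_data pointer. A single static C function serves every "instance"; the per-instance state (what an SML closure would capture) travels through user_data, so no code is generated at runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C library API in the common style: the callback receives the
   element plus an opaque user_data pointer supplied by the caller. */
typedef void (*int_callback)(int value, void *user_data);

void for_each_int(const int *xs, size_t n, int_callback cb, void *user_data)
{
    assert(cb != NULL);
    for (size_t i = 0; i < n; i++)
        cb(xs[i], user_data);
}

/* Hypothetical "closure environment": the state an SML closure would capture. */
struct scaled_sum {
    int scale;   /* captured variable */
    long total;  /* mutable accumulator */
};

/* One static function for all instances; the instance-specific state
   arrives through user_data. */
static void add_scaled(int value, void *user_data)
{
    struct scaled_sum *env = user_data;
    env->total += (long)env->scale * value;
}
```

Calling for_each_int twice over {1, 2, 3} with add_scaled and two environments whose scale is 1 and 10 yields totals of 6 and 60: two independent callback instances backed by one C function. An SML FFI could pass a handle to the closure as user_data in the same way.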

@JohnReppy
Contributor

The type/whole-program issue can be solved if you are willing to restrict array/vector arguments in the FFI to the MONO_ARRAY/MONO_VECTOR modules (e.g., CharVector), since these types can have a packed representation even with separate compilation. It may be better, however, to push the allocation of array data to the C side, since that avoids the danger of the GC moving or reclaiming data that C code still points to.
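As a rough sketch of pushing allocation to the C side, assume a hypothetical C API (dvec_new, dvec_get, dvec_set, dvec_free) for a packed array of doubles. The storage lives in the C heap, outside the SML collector's reach, and SML code would only hold the returned pointer as an opaque handle. The names and the explicit-free discipline are assumptions for illustration, not part of any existing FFI.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical packed array of doubles allocated on the C heap, so the SML
   garbage collector can neither move nor free it while C holds a pointer. */
typedef struct {
    size_t length;
    double data[];   /* flexible array member: elements stored inline */
} dvec;

dvec *dvec_new(size_t length)
{
    /* Reject sizes whose byte count would overflow size_t. */
    if (length > (SIZE_MAX - sizeof(dvec)) / sizeof(double))
        return NULL;
    dvec *v = malloc(sizeof *v + length * sizeof v->data[0]);
    if (v == NULL)
        return NULL;
    v->length = length;
    for (size_t i = 0; i < length; i++)
        v->data[i] = 0.0;
    return v;
}

size_t dvec_length(const dvec *v) { return v->length; }

double dvec_get(const dvec *v, size_t i)
{
    assert(i < v->length);
    return v->data[i];
}

void dvec_set(dvec *v, size_t i, double x)
{
    assert(i < v->length);
    v->data[i] = x;
}

/* Must be called explicitly, or from a finalizer on the SML side. */
void dvec_free(dvec *v) { free(v); }
```

The trade-off is that lifetime management moves to the programmer (or to SML-side finalizers), in exchange for the C side never seeing a pointer into the SML heap.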

You can implement dynamic callbacks with very minimal runtime code generation. All that you need is a template that you can copy and patch to dynamically generate a C function for a given SML closure.

@DemiMarie
Author

Some platforms, such as iOS and game consoles, prohibit all dynamic code generation. libffi has an ugly hack for iOS, but I don’t think we should require it. It is much better to make callbacks static and fix the broken C libraries.
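For C libraries whose callbacks lack a user_data parameter, one workaround that needs no runtime code generation is a fixed pool of statically compiled trampolines, each forwarding to its own slot in a closure table. The sketch below is hypothetical (closure, callback_bind, callback_release, and the pool size are illustrative assumptions); a closure is modeled as a code pointer plus an environment, standing in for an SML closure.

```c
#include <assert.h>
#include <stddef.h>

/* Callback type of a "broken" C library: no user_data parameter. */
typedef int (*plain_cb)(int);

/* Stand-in for an SML closure: code pointer plus captured environment. */
typedef struct {
    int (*code)(void *env, int arg);
    void *env;
} closure;

#define MAX_CALLBACKS 4
static closure slots[MAX_CALLBACKS];
static int slot_used[MAX_CALLBACKS];

/* One statically compiled trampoline per slot; each forwards to its slot. */
#define TRAMPOLINE(n) \
    static int trampoline_##n(int arg) { return slots[n].code(slots[n].env, arg); }
TRAMPOLINE(0) TRAMPOLINE(1) TRAMPOLINE(2) TRAMPOLINE(3)

static const plain_cb trampolines[MAX_CALLBACKS] = {
    trampoline_0, trampoline_1, trampoline_2, trampoline_3
};

/* Bind a closure to a free slot and return a plain C function pointer for
   it; returns NULL when every slot is in use. */
plain_cb callback_bind(closure c, int *slot_out)
{
    assert(slot_out != NULL);
    for (int i = 0; i < MAX_CALLBACKS; i++) {
        if (!slot_used[i]) {
            slot_used[i] = 1;
            slots[i] = c;
            *slot_out = i;
            return trampolines[i];
        }
    }
    return NULL;
}

void callback_release(int slot)
{
    assert(slot >= 0 && slot < MAX_CALLBACKS);
    slot_used[slot] = 0;
}

/* Example closure body: adds the captured int to its argument. */
static int add_offset(void *env, int x) { return *(const int *)env + x; }
```

Binding add_offset with environments holding 1 and 100 yields two distinct function pointers that return 6 and 105 when called with 5. The obvious limitation is that the number of simultaneously live callbacks is fixed at compile time, and this sketch is not thread-safe.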

I agree that allocating data from the C side is a better choice. In particular, passing data on the SML heap to C is unsafe in the presence of parallelism on the SML side.

@DemiMarie
Author

When it comes to representation, Rust’s solution is to compile only monomorphic functions separately; polymorphic (generic) functions are compiled where they are used. Since monomorphic code is quite common, and since parsing and type checking are still done once per crate (Rust’s compilation unit), this is practical.

@JohnReppy
Contributor

Some SML compilers take that approach for functors, i.e., they compile them when they are applied. I'm not sure how well it would work for core-language polymorphism; I suspect that it would reduce to having to do whole-program monomorphization.

@RobertHarper

RobertHarper commented Feb 29, 2020 via email

@JohnReppy
Contributor

I don't think that MLton treats them the same (I assume that you are talking about the implementation). Functors are eliminated before monomorphization using a specific defunctorization pass.

@RobertHarper

RobertHarper commented Feb 29, 2020 via email

@MatthewFluet
Contributor

Yes, MLton first eliminates all module-level constructs (with code duplication at functor applications and simple renaming to eliminate structures) and then (after some intervening simplifications) eliminates polymorphism.

@JohnReppy
Contributor

Responding to Bob:
I'm not sure what you mean by "effect" here. Functor application is beta-reduced at compile time, which means that the body of the functor is specialized to code, as well as types, whereas the specialization of polymorphism only specializes to types. If I implement the list-map function as a functor, then I know that I will get specialized versions for each application of the functor, whereas the polymorphic list-map function will be specialized to the type of the list elements, but not necessarily to different function arguments that have the same type.

@RobertHarper

RobertHarper commented Mar 1, 2020 via email

@YawarRaza7349

FWIW, for anyone thinking about the "unsafe SML" thing, I think unsafe C# would be a closer analogue than Rust is. This demonstrates how garbage-collected memory could be used unsafely. These are some higher-level unsafe APIs of the sort that could be exposed to SML programmers. It'd also be a good source of "experience reports" of how people who had been programming in an initially safe language might have started using these features, what ways they incorporated it into their codebase, and whether it has worked well for them.

7 participants