
Improvements needed, and planned features #38

Closed
VictorTaelin opened this issue Jan 31, 2022 · 37 comments
Labels
enhancement New feature or request

Comments

@VictorTaelin
Member

VictorTaelin commented Jan 31, 2022

Better allocator

Currently, the C allocator just reserves 8 GB of memory when the program starts and, if there are multiple threads, that memory is split between them. So, for example, if there are 8 threads, each one gets 1 GB to work with. A worker can read from other workers' memory (which may happen for shared data, i.e., when they pass through a DP0/DP1 pointer), but only the owner of a slice can allocate memory from it. Each worker also keeps its own free list.

As such, a huge improvement would be thread-safe, global, possibly arena-based, alloc and free.
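One direction for such an allocator can be sketched as follows: a minimal toy model, assuming a single shared word array with a global atomic bump pointer and per-worker free lists. All names here (`Heap`, `Worker`, `BLOCK_WORDS`, etc.) are hypothetical, not HVM's actual code:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Small heap for illustration; the real one reserves 8 GB up front.
const HEAP_WORDS: usize = 1 << 16;
// Granularity at which workers claim memory from the shared heap.
const BLOCK_WORDS: usize = 1 << 8;

struct Heap {
    next_block: AtomicUsize, // global bump pointer, shared by all workers
}

struct Worker {
    free_list: Vec<usize>, // cells this worker freed and can reuse
    block: usize,          // start of the block currently being bumped
    used: usize,           // words consumed inside the current block
}

impl Worker {
    fn new() -> Worker {
        Worker { free_list: Vec::new(), block: 0, used: BLOCK_WORDS }
    }

    // Allocate one cell: reuse a freed cell if possible, otherwise bump;
    // when the local block is exhausted, atomically claim a fresh one.
    fn alloc(&mut self, heap: &Heap) -> usize {
        if let Some(cell) = self.free_list.pop() {
            return cell;
        }
        if self.used == BLOCK_WORDS {
            self.block = heap.next_block.fetch_add(BLOCK_WORDS, Ordering::Relaxed);
            assert!(self.block + BLOCK_WORDS <= HEAP_WORDS, "heap exhausted");
            self.used = 0;
        }
        let cell = self.block + self.used;
        self.used += 1;
        cell
    }

    // Freeing is purely local: the cell goes back on this worker's list,
    // so no synchronization is needed on the free path.
    fn free(&mut self, cell: usize) {
        self.free_list.push(cell);
    }
}
```

With this scheme, any worker can claim blocks from any part of the heap, so an idle thread is no longer capped at a fixed slice; the cost is one atomic fetch_add per block claim rather than per allocation.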

Better scheduler

Right now, the task scheduler just partitions threads evenly among the normal form of the result. So, for example, if the result is (Pair A B) and there are 8 threads, 4 will be assigned to reduce A and 4 to reduce B. This is obviously very limited: if A reduces quickly, the scheduler won't re-assign its threads to help reduce B, so the program will only use half of the cores from that point on. And so on. This is why algorithms that return lists are often slower: they aren't using threads at all. In the worst case, it will just fall back to being single-threaded.

A huge improvement would be a proper scheduler that adds potential redexes to a task pool. When I attempted to do that, the cost of synchronization added too much overhead, ruining the performance. Perhaps a heuristic to consider would be to limit the depth of the redexes for which global tasks are emitted; anything below, say, depth=8 would just be reduced by the thread that reaches it. Many improvements can be made, though.
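The depth-limit heuristic can be sketched like this: a toy model, assuming a single locked pool and a hypothetical `GLOBAL_DEPTH` cutoff (the real scheduler would likely use work-stealing deques instead of one mutex):

```rust
use std::sync::Mutex;

// Cutoff: redexes at or below this depth are reduced locally, with no
// synchronization; shallower ones become global tasks any worker can take.
const GLOBAL_DEPTH: usize = 8;

struct Task {
    redex: u64,   // stands in for a pointer to a redex on the heap
    depth: usize, // how deep in the term this redex was found
}

struct Pool {
    tasks: Mutex<Vec<Task>>,
}

impl Pool {
    // Returns true if the redex was published as a global task; false means
    // the caller should just reduce it itself, paying no locking cost.
    fn spawn(&self, redex: u64, depth: usize) -> bool {
        if depth >= GLOBAL_DEPTH {
            return false;
        }
        self.tasks.lock().unwrap().push(Task { redex, depth });
        true
    }

    // An idle worker polls the pool for something to reduce.
    fn steal(&self) -> Option<Task> {
        self.tasks.lock().unwrap().pop()
    }
}
```

The point of the cutoff is that synchronization only happens near the root of the term, where tasks are large, while the vast number of small, deep redexes never touch the shared pool.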

I32, I64, U64, F32, F64

Right now, HVM only supports U32. The numeric types above should be supported too. I32 and F32 should be easy to add, since they can be unboxed, like U32. I64, U64 and F64 require implementing boxed numbers, since they don't fit inside a 64-bit Lnk alongside the 4-bit pointer tag, but that shouldn't be hard. Whether we should have unboxed 60-bit variants is also worth discussing.
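Why 32-bit payloads fit unboxed while 64-bit ones don't can be illustrated with a toy tagged-link layout; the tag value and helper names below are made up, not HVM's actual Lnk encoding:

```rust
// A link is a 64-bit word whose top bits carry a tag identifying what it
// points to (or, for unboxed numbers, what it directly contains).
const TAG_BITS: u64 = 4;
const PAYLOAD_BITS: u64 = 64 - TAG_BITS; // 60 bits left for the payload
const TAG_U32: u64 = 0xA; // hypothetical tag for "unboxed u32"

// A 32-bit payload fits comfortably under the tag.
fn make_u32(x: u32) -> u64 {
    (TAG_U32 << PAYLOAD_BITS) | x as u64
}

fn get_u32(lnk: u64) -> u32 {
    (lnk & 0xFFFF_FFFF) as u32
}

// A payload only fits unboxed if it is no wider than the bits the tag
// leaves free; this is why 60-bit variants are possible but full 64-bit
// numbers must be boxed (stored on the heap, with the link pointing to them).
fn fits_unboxed(payload_bits: u32) -> bool {
    (payload_bits as u64) <= PAYLOAD_BITS
}
```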

On #81.

Improve the left-hand side flattener

On #54.

A nice API to use as a Rust library

Using HVM as a pure Rust lib inside other projects should be very easy, especially considering the interpreter has fairly good performance (only about 50% slower than the single-threaded compiled version). We must think about the right API to expose these features in a Rust-idiomatic, future-proof way. Input from the Rust community is appreciated.

A JIT compiler

Right now, HVM compiles to C. That means a C compiler like clang or gcc is necessary to produce executables, which means it isn't portable when used as a lib. Of course, libs can just use the interpreter, but it is ~3x slower than the compiled version, and is not parallel. Ideally, we'd instead JIT-compile HVM files using WASM or perhaps Cranelift, allowing lib users to enjoy the full speed of HVM in a portable way.

IO

Adding IO should be easy: just write an alternative version of normalize that, instead of just outputting the normal form, pattern-matches it against the expected IO type, and performs IO operations as demanded.

Windows support

The compiler doesn't support Windows yet. The use of -lpthread may be an issue. The interpreter should work fine, though.

On #52.

GPU compilers

Compiling to the GPU would be amazing, and I see no reason this shouldn't be possible. The only complications I can think of are:

  1. How do we alloc inside a GPU work unit?

  2. reduce() is highly branching, which may destroy performance. But it can be altered to group tasks by category and merge all branches in one go. The reduce function essentially does: branch -> match -> alloc -> subst -> free. The last 4 steps are very similar regardless of which branch is taken. So, instead, the branch part can just prepare some data, and the match -> alloc -> subst -> free passes can all be moved to the same code path, just with different indices and parameters. This would avoid thread divergence.

  3. GPGPU is a laborious mess. I'm on OSX, so I can't use CUDA; OpenCL is terrible; WebGPU is too new (and, from what I could see, doesn't have the ability to read and write the same buffer in the same pass!?). Regardless, we'd probably need different versions: Metal for OSX, CUDA or OpenCL for Windows depending on the GPU, etc. So that's a lot of work that I can't do myself; it would require expensive engineers.
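The branch-merging idea in point 2 can be sketched host-side: a hypothetical pre-pass that buckets redexes by rule tag, so the shared match -> alloc -> subst -> free pass then runs over uniform batches and threads in the same warp never diverge on the rule:

```rust
// Each redex is (rule_tag, heap_pointer); the "branch" step only inspects
// the tag and routes the redex, deferring all real work to per-rule batches.
fn bucket_by_rule(redexes: &[(u8, u64)], num_rules: usize) -> Vec<Vec<u64>> {
    let mut buckets = vec![Vec::new(); num_rules];
    for &(rule, ptr) in redexes {
        buckets[rule as usize].push(ptr);
    }
    // Each bucket would then be handed to one straight-line GPU kernel pass
    // that does match -> alloc -> subst -> free with no per-thread branching.
    buckets
}
```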

Compile Kind to HVM

HVM is meant to be a "low-level target", which is kind of confusing because it is actually very high-level; low-level in the sense that users shouldn't be writing it directly. Ideally, Kind will compile to HVM, but it needs a lot of adjustments before that is possible.

Add a checker for the "derivatives of (λx(x x) λx(x x)) not allowed" rule

See this issue for more information, as well as the superposed duplication section on HOW.md.

On #61.

Tests

This project has basically no tests. Write tests!


I will edit this thread as I think of more things. Feel free to add to it.

@dumblob

dumblob commented Jan 31, 2022

Regarding GPU compilers, and generally "offloading some parts of the computation", you might have a look at DawnCC; some years ago I found their work quite inspiring (btw, they seem to be from Brazil, so they might be more accessible to you in some way or another).

@ken-okabe

ken-okabe commented Jan 31, 2022

A JIT compiler

Adopting WASM opens up lots of capabilities and possibilities; it's currently a blue ocean for new languages and attracts attention.

Windows support

The priority should be very low at the current stage, so please avoid putting your precious resources into it right now.

GPU compilers

GPU compilers could also attract attention in terms of machine learning, since I observe that ML has finally entered the true realm of functional programming.

JAX Vs TensorFlow Vs PyTorch: A Comparative Analysis / JAX is a Python library designed for high-performance numerical computing.

https://github.com/google/jax

JAX is Autograd and XLA, brought together for high-performance machine learning research.

JAX transformations only work on pure functions, which don't have side-effects and respect referential transparency (i.e. object identity testing with is isn't preserved). If you use a JAX transformation on an impure Python function, you might see an error like Exception: Can't lift Traced... or Exception: Different traces at same level.

JAX now runs on Cloud TPUs. To try out the preview, see the Cloud TPU Colabs.

At least for ML nowadays, using a local physical GPU is rather outdated. Recent ML researchers actively take advantage of Cloud TPUs, so you may want to consider this movement as well.

Compile Kind to HVM

HVM is meant to be a "low-level target", which is kinda confusing because it is actually very high-level

I've noticed that, and in fact:

HVM files look like untyped Haskell.

HVM looks very clean, mostly has no "statements", and I like it better than Haskell and even better than Kind.

My proposal is not to compile Kind to HVM but to extend HVM itself.

I suppose it is a simpler approach: it should be far easier to extend HVM by setting up a suitable syntax, specially designed for the underlying HVM, from scratch.

HVM could have a simple module system to extend itself, like a standard library plus procedural macros in Rust.

Procedural macros allow you to run code at compile time that operates over Rust syntax, both consuming and producing Rust syntax. You can sort of think of procedural macros as functions from an AST to another AST.
Procedural macros must be defined in a crate with the crate type of proc-macro.

Since HVM's concise syntax is essentially an AST, this approach should fit an HVM module system.

For instance,

HVM files look like untyped Haskell.

So type syntax is required on top of the native HVM, and then a static type checker runs. There is a good chance to implement first-class types, like Idris.

Perhaps we need more expansion of the pattern-matching syntax or, IMO, infix function/combinator/operator definitions.

These are all module-based extensions, so it's an incremental approach toward the new perfect language.

(EDIT) I think this AST-to-AST approach (the same as procedural macros in Rust) has already been done and is working, because HVM code can already be transpiled to C code.
The difference with a self-extension module is:

  • HVM -> C
  • additional syntax on HVM -> the base HVM

@DylanSp

DylanSp commented Feb 1, 2022

Targeting GPUs

It might be worth taking a look at Futhark, which is a small language for writing code that can target CUDA and OpenCL. (There's an open issue for WebGPU support, though it hasn't received any attention.) Futhark can compile to a C library; I haven't looked at HVM's compilation output in detail yet, but I'm guessing that GPU-targeting code could compile to C that calls a C library (compiled from Futhark) for using the GPU.

C compiler/portability

Zig has done some extensive work on bundling Clang for easily setting up a C compiler; maybe there's some work from there which could be reused?

@jmorag

jmorag commented Feb 1, 2022

IO

Adding IO should be easy: just write an alternative version of normalize that, instead of just outputting the normal form, pattern-matches it against the expected IO type, and performs IO operations as demanded.

I'd be interested in getting the ball rolling on this. Where is the normalize function? I can't find a function named normalize via grep.

@JakobBruenker

GPGPU is a laborious mess. I'm on OSX, so I can't use CUDA; OpenCL is terrible; WebGPU is too new (and, from what I could see, doesn't have the ability to read and write the same buffer in the same pass!?). Regardless, we'd probably need different versions: Metal for OSX, CUDA or OpenCL for Windows depending on the GPU, etc. So that's a lot of work that I can't do myself; it would require expensive engineers.

I'm only dabbling in this topic, so it's possible that it isn't suited at all; but this paragraph conspicuously lacks any mention of Vulkan.

@L-as

L-as commented Feb 1, 2022

For GPGPU:
This is not really a "clean" approach, but you could try targeting Futhark rather than a concrete API. Futhark supports a lot of different backends, which would ensure wide support.

For JIT compiler part:
I disagree that it's not portable currently; things like tinycc exist, not to mention that libgccjit is also a thing. However, I think targeting Cranelift would be very useful. You'd have to redo the whole runtime system, I think? The current approach of having a template C file is ugly IMO anyway.

@rowanG077

rowanG077 commented Feb 1, 2022

What about compiling Haskell to HVM? From a high level this shouldn't be too hard. Use the GHC core as input and transform it to HVM. There are probably lots of missing features (as mentioned already, IO, boxed/unboxed integers) but wouldn't this be possible in principle?

@VictorTaelin
Member Author

VictorTaelin commented Feb 1, 2022

IO

Adding IO should be easy: just write an alternative version of normalize that, instead of just outputting the normal form, pattern-matches it against the expected IO type, and performs IO operations as demanded.

I'd be interested in getting the ball rolling on this. Where is the normalize function? I can't find a function named normalize via grep.

Here is the relevant function in Rust. There is a similar one in C. It is quite small: it just recurses over a term, applying reduce() layer by layer to compute the normal form. To integrate IO, we'd modify that function like this:

let term = reduce(mem, funcs, host, opt_id_to_name);
match term {
  // instead of checking whether it is an app / lam / etc., it will match
  // against the expected shape of the IO monad, that is, one of these:
  //   (IO_Ask method param continuation)
  //   (IO_End return_value)
  // and then call the selected IO method and pass the result to the continuation
}

Basically it would do the same as run_io here, preferably using Kind's IO Type.
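For illustration, here is a runnable toy version of that loop, with stand-in types and names; the real reduce() rewrites the heap, whereas this one is the identity because the toy terms are already in normal form:

```rust
// Toy normal forms of the IO monad's two constructors.
enum Term {
    IoEnd(u64),                                // (IO_End return_value)
    IoAsk(&'static str, u64, fn(u64) -> Term), // (IO_Ask method param cont)
}

// Stand-in for `reduce`: identity here, since the terms are already reduced.
fn reduce(t: Term) -> Term { t }

// Repeatedly reduce, perform the requested effect, and feed the result to
// the continuation, until the program ends with IO_End.
fn run_io(mut term: Term, log: &mut Vec<(String, u64)>) -> u64 {
    loop {
        match reduce(term) {
            Term::IoEnd(v) => return v,
            Term::IoAsk(method, param, cont) => {
                log.push((method.to_string(), param)); // "perform" the effect
                term = cont(param);                    // resume the program
            }
        }
    }
}
```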

(Will answer the remaining questions later.)

@klarkc

klarkc commented Feb 1, 2022

@rowanG077

What about compiling Haskell to HVM?

There is also PureScript, which supports alternate backends.

@jmorag

jmorag commented Feb 1, 2022

Out of curiosity, why is there both a runtime.c and a runtime.rs? They appear to be extremely similar. Is the rust one the interpreter and the C one for linking when compiling?

@nothingnesses

@jmorag Yes, runtime.rs is the code used for the interpreter and runtime.c is the template of boilerplate code used as the runtime by C programs generated by the compiler.

@VictorTaelin
Member Author

What about compiling Haskell to HVM? From a high level this shouldn't be too hard. Use the GHC core as input and transform it to HVM. There are probably lots of missing features (as mentioned already, IO, boxed/unboxed integers), but wouldn't this be possible in principle?

Yes, that is a major long-term goal of HVM. Note that HVM is not an optimizing compiler. It is just the runtime. So, it would pair extremely well with GHC. As in, Haskell -> GHC Core -> HVM, allowing us to take advantage of all the optimizations performed by the compiler. Of course, GHC's runtime has millions of features I don't even know about, so that might take a lot of work to do.

There is also PureScript, which supports alternate backends.

Would be amazing to compile PureScript to HVM (and benchmark!).

@ken-okabe

ken-okabe commented Feb 1, 2022

What about compiling Haskell to HVM?

Just note.
The author(@VictorTaelin) already has developed a better language: Kind

A minimal, efficient and practical programming language that aims to rethink functional programming from scratch, and make it right. Under the hood, it is basically Haskell, except without the historical mistakes, and with a modern, consistent design.

wouldn't this be possible in principle?

Yes, possible; however, I suppose the priority for the author is currently low for the above reason, and such a project should be done independently by the Haskell community with support from the author. It's not a small amount of work, and I hope the resources will be primarily provided by the Haskell community itself.

I guess the only reason the demo code is compared to Haskell code is simply that Haskell is more well-known than Kind, so that broader functional-programming community users can easily reach the concept, and then the GHC performance benchmark.

Although I think Kind has a better design than Haskell, I still think HVM is better than Kind and hope HVM can be extended with the compositional modules.

Add a checker for the "derivatives of (λx(x x) λx(x x)) not allowed" rule

It seems that we need a type checker based on HVM in any case, so I think a typed HVM is the simpler, integrated approach.

@jmorag

jmorag commented Feb 1, 2022

What about compiling Haskell to HVM? From a high level this shouldn't be too hard. Use the GHC core as input and transform it to HVM. There are probably lots of missing features (as mentioned already, IO, boxed/unboxed integers), but wouldn't this be possible in principle?

Yes, that is a major long-term goal of HVM. Note that HVM is not an optimizing compiler. It is just the runtime. So, it would pair extremely well with GHC. As in, Haskell -> GHC Core -> HVM, allowing us to take advantage of all the optimizations performed by the compiler. Of course, GHC's runtime has millions of features I don't even know about, so that might take a lot of work to do.

Perhaps Haskell -> Core -> STG -> HVM would be possible, as HVM could then be bolted onto the external STG interpreter, which has already done a ton of work re-implementing the GHC primops and emitting whole-program compilation information, so that HVM wouldn't also have to become an incremental compiler.

@rowanG077

What about compiling Haskell to HVM? From a high level this shouldn't be too hard. Use the GHC core as input and transform it to HVM. There are probably lots of missing features (as mentioned already, IO, boxed/unboxed integers), but wouldn't this be possible in principle?

Yes, that is a major long-term goal of HVM. Note that HVM is not an optimizing compiler. It is just the runtime. So, it would pair extremely well with GHC. As in, Haskell -> GHC Core -> HVM, allowing us to take advantage of all the optimizations performed by the compiler. Of course, GHC's runtime has millions of features I don't even know about, so that might take a lot of work to do.

There is also PureScript, which supports alternate backends.

Would be amazing to compile PureScript to HVM (and benchmark!).

Actually I'm looking for a master thesis topic. Could we maybe talk about this? I would be interested in implementing this.

@f-f

f-f commented Feb 1, 2022

@rowanG077 I'm on the PureScript core team and I recently supervised another thesis on a PureScript backend.
I'm excited about HVM and I'd be available to provide pointers (on the PureScript side) to help get this going - feel free to reach out if you'd like to chat about this 🙂 (see my profile for contact details)

@jmorag

jmorag commented Feb 1, 2022

My understanding of the HVM runtime is that it's lazy. How would strict languages like PureScript and Idris retain their semantics when compiling to HVM?

@klarkc

klarkc commented Feb 1, 2022

@jmorag I don't know the details, but I know that purenix works in a scenario like that; this section explains the impact of the strictness difference on purenix usage.

@DylanSp

DylanSp commented Feb 2, 2022

C compiler

I just tried tinycc on the output from compiling bench/TreeSum/main.hvm; C compilation fails due to at least two issues:

  • lack of <stdatomic.h>
  • initializer element is not constant in the declaration of HEAP_SIZE (I'm not sure why this is)

There may be other issues, I didn't investigate further. I think these rule out embedding tinycc as a portable C compiler, though.

@dumblob

dumblob commented Feb 2, 2022

C compiler

Thanks @DylanSp for trying tcc out!

I just tried tinycc on the output from compiling bench/TreeSum/main.hvm; C compilation fails due to at least two issues:

  • lack of <stdatomic.h>

This is a totally minor issue. See, e.g., how V (which uses tcc by default, i.e., in non-prod mode) solves this: https://github.com/vlang/v/tree/master/thirdparty/stdatomic and https://github.com/vlang/v/tree/master/vlib/sync/stdatomic .

  • initializer element is not constant in the declaration of HEAP_SIZE (I'm not sure why this is)

Sounds like something minor to me.

There may be other issues, I didn't investigate further. I think these rule out embedding tinycc as a portable C compiler, though.

So I'm convinced that if fast multiplatform compilation is desired (IMHO it is, but I'm not the one to decide), tcc is the best way forward.

@steinerkelvin steinerkelvin pinned this issue Feb 2, 2022
@steinerkelvin steinerkelvin added the enhancement New feature or request label Feb 2, 2022
@Heimdell

Heimdell commented Feb 2, 2022

STG -> HVM might not be feasible, for multiple reasons. The main problem is that GHC doesn't care about dup-explosion prevention; well, not until Linear Haskell is a thing, and even then.

@VictorTaelin
Member Author

@rowanG077 sorry for the delay! Yes, feel free to mail me (victor.taelin@gmail.com) or DM me in any of my social networks. It would be an honor if you did your thesis related to HVM, and certainly there are many places where such a thing would add a lot of value to it.

@ken-okabe

I forgot to mention, in terms of Windows support (#52):

Nowadays, people no longer use Cygwin. They simply use a real Linux kernel on Windows, called the Windows Subsystem for Linux (WSL2).
A full environment of a Linux distribution such as Ubuntu can be installed on WSL2, even with GUI applications.
https://ubuntu.com/wsl
These days, users can install and launch the Ubuntu terminal with a few clicks.

So I guess we don't have to be concerned about this.

@steinerkelvin steinerkelvin mentioned this issue Feb 3, 2022
@ken-okabe

Actually I'm looking for a master thesis topic. Could we maybe talk about this? I would be interested in implementing this.

FYI, this is an excellent discussion related to GHC you might be interested in. #74

@ghost

ghost commented Feb 11, 2022

I agree with @stken2050 that HVM is well suited as a general-purpose programming language. It just needs really well-designed primitives and, over time, a standard library.

@samtoth

samtoth commented Feb 12, 2022

I32, I64, U64, F32, F64

Right now, HVM only supports U32. The numeric types above should be supported too. I32 and F32 should be easy to add, since they are unboxed, like U32. I64, U64 and F64 require implementing boxed numbers, since they don't fit inside a 64-bit Lnk (due to the 4-bit pointer), but shouldn't be hard. Discussion on whether we should have unboxed 60-bit variants is valid.

I'd be interested in adding support for at least the unboxed numeric types, as this seems like low-hanging fruit. I just wanted to check what the consensus was between implicit/explicit casting, and how operators should function with regard to mixed input types. My instinct would be to cast up to the 'bigger' type, but maybe operators should only operate on the same type, combined with explicit casts.

@VictorTaelin what do you think?

@steinerkelvin steinerkelvin mentioned this issue Feb 12, 2022
@VictorTaelin
Member Author

Operators should just operate on the same type. So, if F32 is added, for example, ADD will work on either u32/u32 or f32/f32, but not anything else (i.e., no u64/u32, no f32/u32, etc.).

@nothingnesses

Why not add a type conversion operator for primitive types too? Rust has it, and we could probably just easily use theirs under the hood?

@dumblob

dumblob commented Feb 13, 2022

@nothingnesses I think explicit (i.e. written every single time by the programmer) conversion from one type to another will be needed one way or another. I think @VictorTaelin meant that there shouldn't be any implicit conversion as that always leads to some irregularities and confusion.

@nothingnesses

Ah, maybe. I don't think any conversion operators are currently implemented in the language, though, which is why I brought it up.

@VictorTaelin
Member Author

Yes, that's what I meant. No implicit conversions, just explicit.
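A tiny sketch of that design: operators accept only matching numeric types, and any mixing requires an explicit, programmer-written conversion. The names here are illustrative, not HVM's actual primitives:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Num {
    U32(u32),
    F32(f32),
}

// ADD works on u32/u32 or f32/f32 only; mixed inputs are rejected rather
// than implicitly promoted.
fn add(a: Num, b: Num) -> Option<Num> {
    match (a, b) {
        (Num::U32(x), Num::U32(y)) => Some(Num::U32(x.wrapping_add(y))),
        (Num::F32(x), Num::F32(y)) => Some(Num::F32(x + y)),
        _ => None, // no implicit u32 <-> f32 conversion
    }
}

// The explicit conversion operator the programmer must write every time.
fn u32_to_f32(n: Num) -> Option<Num> {
    match n {
        Num::U32(x) => Some(Num::F32(x as f32)),
        _ => None,
    }
}
```

Keeping conversions explicit avoids the irregularities of C-style promotion rules while still letting mixed-type arithmetic be expressed with a visible cast.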

@klarkc

klarkc commented Feb 22, 2022

I posted an idea on Cardano Catalyst for supporting an HVM backend for PureScript. If it receives a lot of votes, it may receive some sponsorship: https://cardano.ideascale.com/c/idea/396001

@glyh

glyh commented Apr 13, 2022

Would something like arrayfire solve the GPGPU problem?

@rigille

rigille commented Apr 14, 2022

IO

Adding IO should be easy: just write an alternative version of normalize that, instead of just outputting the normal form, pattern-matches it against the expected IO type, and performs IO operations as demanded.

I've been researching Roc, and they have a nice idea for handling IO. The language doesn't handle IO at all, and instead delegates this to a platform, which provides the IO capabilities. There are more details here and in an older talk.
@quleuber might be interested in this

@01mf02

01mf02 commented Apr 26, 2022

Regarding allocators: I have used mimalloc to great success in a few language interpreters; it boosted performance considerably compared to the default allocator. It has Rust bindings as well. Even if you might not be able to use it as is, its general design might inspire you.

@glyh

glyh commented Mar 17, 2023

Is the GPU target done, given that this is closed?

@VictorTaelin
Member Author

VictorTaelin commented Mar 17, 2023

No, we're just not using this issue to track TODOs. We've raised funds to build a company around HVM and will be doing many things, including a GPU target. We're hiring :)
