
[Ask for help] Is it possible to use with JIT? #12

Open
playXE opened this issue Jun 6, 2020 · 6 comments

playXE commented Jun 6, 2020

I have a simple runtime, Waffle, with a baseline JIT. Currently I use a simple mark-and-sweep collector with reference counting for roots, and this causes runtime overhead. My question: is it possible to use zerogc with a JIT? Will safepoints work in JITed code? I do not use LLVM or Cranelift; instead I have a macro assembler, and if safepoints need some kind of additional code to work, I believe I can emit that code just fine.

@Techcable
Member

This should absolutely work with a JIT 😄 In fact, that's my intended use case. Right now multi-threading support doesn't really work correctly, though I'm working hard on it. The last really "stable" commit is 2ebc135.

Safepoints are the abstraction JIT compilers typically use to cope with GC. They leave the optimizer plenty of room to optimize. The root problem they're designed to deal with is keeping track of garbage-collected objects on the stack (which are obviously roots). Most really "advanced" JITs like the JVM have custom assembly code that walks the stack. This requires custom support from the code generator, so it's not really possible to do with the Rust compiler.

Right now the simple mark/sweep collector uses shadow stacks to keep track of each thread's roots. I recommend you read the whole article on LLVM garbage collection for background info 😉

Assuming your code generator inlines the fast path of the safepoint! macro (it probably should), your generated code (using the simple mark/sweep collector) will boil down to basically invoking basic_safepoint:

ctx.collector.shadow_stack.elements.push([tuple_of_roots]); // Vec::push (should inline this)
if ctx.collector.heap.allocated_size >= ctx.collector.heap.threshold {
    // Slow path
    ctx.collector.perform_collection();
}
ctx.collector.shadow_stack.elements.pop(); // Vec::pop (probably safe to elide bounds check)

You'd also need to surround any calls with the appropriate shadow_stack.push() calls (see recurse_context). A child call needs to know about its parent's roots if there's any possibility of collection.
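To make the shadow-stack discipline concrete, here is a minimal, self-contained sketch of the fast path described above. All names (ShadowStack, Collector, basic_safepoint) are illustrative stand-ins modeled on the pseudocode, not zerogc's actual API:

```rust
// Toy shadow-stack collector mirroring the pseudocode above.
// Roots are represented as plain usize "pointers" for illustration.

struct ShadowStack {
    elements: Vec<usize>, // in a real collector these are traced object pointers
}

struct Collector {
    shadow_stack: ShadowStack,
    allocated_size: usize,
    threshold: usize,
}

impl Collector {
    /// The inlinable fast path: push roots, check the threshold,
    /// collect out-of-line if needed, then pop the roots back off.
    fn basic_safepoint(&mut self, roots: &[usize]) {
        for &r in roots {
            self.shadow_stack.elements.push(r); // Vec::push (fast path)
        }
        if self.allocated_size >= self.threshold {
            self.perform_collection(); // slow path, out of line
        }
        for _ in roots {
            self.shadow_stack.elements.pop(); // balanced with the pushes above
        }
    }

    fn perform_collection(&mut self) {
        // A real mark/sweep would trace self.shadow_stack.elements here;
        // we just reset the counter to show the slow path ran.
        self.allocated_size = 0;
    }
}
```

The key invariant is that pushes and pops are balanced around every possible collection point, so the collector always sees a consistent set of roots.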

Note the shadow_stack must always be consistent because of the possibility of collection.

Unless you somehow manage to implement stack maps alongside zerogc's shadow stacks, you will have some runtime overhead for tracking roots. This would still be much faster than pervasive reference counting (i.e. std::rc::Rc or Python's PyObject). Shadow stacks are actually used in production VMs, including both PyPy and HLVM.

My JIT isn't fully ready to use zerogc until I resolve the threading issues (#8) and add object handles (#9). This project is definitely a WIP but I'm happy to help you try and integrate this with your JIT. I'm happy to answer any questions that you have - just respond to this issue 😄


playXE commented Jun 8, 2020

Thanks for the answer!

Right now multi-threading support doesn't really work correctly

Multi-threading is not important for me because I will use BEAM-like concurrency, where each process (green thread) has its own heap.

Also, the current baseline JIT implementation does not actually use the machine stack and registers to store values; instead it stores them in a heap-allocated CallFrame type, so I think it should be pretty easy to get roots from this call frame.


playXE commented Jun 10, 2020

@Techcable I actually decided to do stop-the-world (STW) in my GC implementation, so the question is: should the basic_safepoint impl actually handle STW, or should I look at something else?

UPD: I read some HashLink source code. For STW, HashLink has gc_global_lock and hl_blocking, which increments current_thread->gc_blocking; the thread that needs to GC acquires gc_global_lock, so other threads just can't get it:

gc_stop_world(...) {
	int i;
	gc_threads.stopping_world = true;
	for(i=0;i<gc_threads.count;i++) {
		hl_thread_info *t = gc_threads.threads[i];
		while( t->gc_blocking == 0 ) {}; // spinwait
	}
}

static void gc_global_lock( bool lock ) {
	hl_thread_info *t = current_thread;
	bool mt = (gc_flags & GC_NO_THREADS) == 0;
	if( !t && gc_threads.count == 0 ) return;
	if( lock ) {
		if( !t )
			hl_fatal("Can't lock GC in unregistered thread");
		if( mt ) gc_save_context(t,&lock);
		t->gc_blocking++; // increase blocking; this is actually just an atomic int
		if( mt ) hl_mutex_acquire(gc_threads.global_lock); // !!! the GC thread already holds this mutex, so the current thread will pause
	} else {
		t->gc_blocking--;
		if( mt ) hl_mutex_release(gc_threads.global_lock);
	}
}
HL_API void hl_blocking( bool b ) {
	hl_thread_info *t = current_thread;
	if( !t )
		return; // allow hl_blocking in non-GC threads
	if( b ) {
		if( t->gc_blocking == 0 )
			gc_save_context(t,&b); // does not actually matter, this is for conservative hashlink GC.
		t->gc_blocking++;
	} else if( t->gc_blocking == 0 )
		hl_error("Unblocked thread");
	else {
		t->gc_blocking--;
		if( t->gc_blocking == 0 && gc_threads.stopping_world ) {
			gc_global_lock(true); // try to lock mutex and increase gc_blocking
			gc_global_lock(false);
		}
	}
}

I want to believe it is possible to just insert something like gc_global_lock at each safepoint, but I'm not sure.
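The HashLink scheme above can be sketched in Rust terms: mutator threads check a "stopping world" flag at each safepoint and, if it's set, block on a global lock that the collecting thread holds until collection finishes. This is a hedged, minimal sketch of that idea; all names (GcWorld, stop_world_and_collect) are made up for illustration and are not zerogc's or HashLink's API:

```rust
use std::sync::Mutex;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

// Global-lock STW coordination, modeled on the gc_global_lock pattern above.
struct GcWorld {
    global_lock: Mutex<()>,
    stopping_world: AtomicBool,
    collections: AtomicUsize,
}

impl GcWorld {
    fn new() -> Self {
        GcWorld {
            global_lock: Mutex::new(()),
            stopping_world: AtomicBool::new(false),
            collections: AtomicUsize::new(0),
        }
    }

    /// Called by mutator threads at every safepoint.
    fn safepoint(&self) {
        // Fast path: one atomic load when no collection is pending.
        if self.stopping_world.load(Ordering::Acquire) {
            // Slow path: blocks until the collector releases the lock.
            drop(self.global_lock.lock().unwrap());
        }
    }

    /// Called by whichever thread decides to collect.
    fn stop_world_and_collect(&self) {
        let _guard = self.global_lock.lock().unwrap();
        self.stopping_world.store(true, Ordering::Release);
        // A real collector would now wait for every mutator to reach a
        // safepoint, then mark from the shadow stacks and sweep.
        self.collections.fetch_add(1, Ordering::SeqCst);
        self.stopping_world.store(false, Ordering::Release);
    }
}
```

The appeal of this design is that the mutator fast path is a single load and branch, which a JIT can emit inline; all the blocking lives on the slow path.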

@Techcable
Member

@playXE That is correct. A basic_safepoint implicitly blocks the thread if it needs to do a collection.

The safepoint does a quick comparison, checking the current size of the heap against a threshold (based on data from the last collection). If the threshold is exceeded, the (inlined) safepoint makes an out-of-line call to SimpleCollector::maybe_collect.

Inside maybe_collect, the thread is halted for garbage collection, and zerogc won't return to user code until the collection is finished. Any roots the user wants to preserve have already been given to the collector (they are part of the shadow stack).

The collector runs a mark/sweep algorithm over those roots. Only once that is done will the safepoint function return.

This info only applies to the version in the old commit I showed you. The master branch is broken and currently undergoing a lot of changes to support multiple threads, where all the threads have to pause at a safepoint before a collection begins (using locks just like the HashLink code you showed me).

I think as long as you stick to the commit I showed you, zerogc should work well for you. Generally VMs and JITs insert safepoints after every call and loop iteration. Your JIT will need to generate the appropriate calls to safepoint_recurse around every function call and basic_safepoint after every loop iteration. Shadow stacks are actually used in production VMs (see my links above to the PyPy and HLVM docs).

Look at the binary_trees example to see how those safepoints are used. You should consider manually inlining the safepoint calls: the fast path is pretty small, and calls to maybe_collect are infrequent.
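The placement rule above (safepoint_recurse around calls, basic_safepoint on loop back-edges) can be sketched with stub functions. The names mirror the discussion, but the bodies here are hypothetical stand-ins, not zerogc's real implementations:

```rust
// Sketch of where a JIT would emit safepoints. `hits` counts how many
// safepoints were reached, standing in for the heap-threshold check.

fn basic_safepoint(hits: &mut u32) {
    *hits += 1; // real version: compare allocated_size against threshold
}

fn safepoint_recurse<R>(hits: &mut u32, f: impl FnOnce(&mut u32) -> R) -> R {
    // A real implementation would push the caller's roots first
    // (see recurse_context), invoke f, then pop them back off.
    f(hits)
}

fn generated_function(hits: &mut u32, iterations: u32) -> u32 {
    let mut sum = 0;
    for i in 0..iterations {
        // Each call is wrapped so the callee can see the caller's roots.
        sum += safepoint_recurse(hits, |_| i);
        basic_safepoint(hits); // back-edge safepoint, once per iteration
    }
    sum
}
```

The point is structural: every call site and every loop back-edge gets exactly one safepoint, so a collection can happen at any of those points without losing roots.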

I'm planning to use this in a JIT as well. I'm happy to answer any more questions you have :)

Techcable self-assigned this Jun 13, 2020

playXE commented Jun 13, 2020

I just don't know where else to send this message, so I'll send it here :P
@Techcable, there is an MMTk rewrite in Rust that should be available in about a month. Maybe it's worth integrating zerogc with MMTk somehow? MMTk does not define how roots and safepoints should be handled, so zerogc might be a nice addition for MMTk:
mun-lang/mun#206 (comment)

@Techcable
Member

That's very exciting! MMTk has a very high-quality set of collectors. I've subscribed to that issue and will definitely look into it once they release the code!

Techcable added a commit that referenced this issue Jun 17, 2020
This should be easier for a JIT to implement since it doesn't require
any fences. See issue #12
As of this writing, cranelift doesn't even support atomic fences.

It could also be faster on ARM architectures. That seems important
since it's such a hot path.....