Improving the garbage collector #134
Ignore that first comment, I got accounts mixed up. I've deleted it from the thread. Awesome work! Crossbeam has been in need of a lot of this for a while. I think many, if not all, of these will eventually be mergeable into crossbeam (I have quite a few old branches with the incremental …). I'll look into the code more this weekend, but I have questions and comments upfront:
I think PRs for some of the more direct changes would be gladly accepted by the team, and we should get going on some of the more involved stuff, especially performance items. |
@schets I believe you wanted to quote parts of my comment, but the quotes are not visible :) |
That I did |
@schets Just to elaborate a bit on your comments:
```rust
#[bench]
fn pin_coco(b: &mut Bencher) {
    b.iter(|| coco::epoch::pin(|_| ()))
}

#[bench]
fn pin_crossbeam(b: &mut Bencher) {
    b.iter(|| crossbeam::mem::epoch::pin())
}
```
Thanks for taking a look! :) Criticism is welcome and I'll be happy to answer any further questions. |
On the second bullet point, I made a PR to coco: https://github.com/stjepang/coco/pull/1 |
Can you take another look at the first comment I posted? GitHub formatting completely changed the numbers I had written, and I just fixed it. For 1: how does performance compare when actually using the data structures? On Intel, there are a lot of strange interactions with the memory subsystem that don't manifest without multiple threads, and running pin in a tight loop will miss all of them. I have some specific ideas but haven't had time to test them yet. For 4: I remember having some code that advanced the epoch after some number of bytes were waiting to be freed, and code that aggressively pushed large objects into the global queue. This doesn't solve your problem, though, of a thread avoiding crossbeam and never advancing the epoch. Also, the global queue mechanics were quite different for this. At a higher level, it looks like coco focuses on aggressively moving garbage to the global queue instead of each thread freeing the garbage it creates, and this seems to underlie a lot of the major design decisions. If so, wouldn't that lead to performance issues in the allocator? |
I benchmarked skip list iteration where each step pins the current thread and increments/decrements a reference counter. Performance is quite good with this mechanism, but this was benchmarked with a single thread only. I'll try again with multiple threads...
My strategy is to flush large objects so that they immediately become globally available for garbage collection. Any thread is able to collect such garbage. Moreover, every thread collects some garbage just after it flushes a bag, and also once every 128 pinnings. If all threads become inactive with respect to crossbeam/coco and stop pinning, the garbage is left in the queue forever, yeah. I don't see how to solve this problem without spinning up a background thread that constantly keeps vacuuming garbage. But I don't think this is a big problem either. :) This is what the chronology of a deque might look like:
There might be hiccups if another thread gets pinned and holds up epoch advancement, though.
Well, it depends on what you mean by "aggressively", but yes. Whenever a thread-local bag becomes full, it is pushed into the global queue. This doesn't happen often enough to cause too much contention, so it's okay. I benchmarked stacks and queues backed by crossbeam and coco, and the result was that both GCs have very similar overhead in practice. I didn't test the performance of garbage collection alone, though... Does this answer your questions? |
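For illustration, the flushing policy described above might look roughly like this. This is a toy sketch, not coco's real API: all names (`LocalHandle`, `defer`, `flush`) and constants (`BAG_CAPACITY`, `PINNINGS_BETWEEN_COLLECT`) are invented for the example, and epoch tracking is omitted entirely.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

const BAG_CAPACITY: usize = 64; // assumed; full bags get flushed
const PINNINGS_BETWEEN_COLLECT: usize = 128; // "once every 128 pinnings"

struct Garbage(usize); // stand-in for a (pointer, destructor) pair

struct Global {
    // Any thread may pop bags from here and free their contents.
    queue: Mutex<VecDeque<Vec<Garbage>>>,
}

struct LocalHandle<'a> {
    global: &'a Global,
    bag: Vec<Garbage>,
    pin_count: usize,
}

impl<'a> LocalHandle<'a> {
    // Stash a retired object; flush the bag once it fills up.
    fn defer(&mut self, g: Garbage) {
        self.bag.push(g);
        if self.bag.len() >= BAG_CAPACITY {
            self.flush();
        }
    }

    // Push the local bag into the global queue so any thread can
    // collect it, then collect a little right after flushing.
    fn flush(&mut self) {
        let bag = std::mem::take(&mut self.bag);
        self.global.queue.lock().unwrap().push_back(bag);
        self.collect(1);
    }

    // Every 128th pinning also collects some garbage.
    fn pin(&mut self) {
        self.pin_count += 1;
        if self.pin_count % PINNINGS_BETWEEN_COLLECT == 0 {
            self.collect(1);
        }
    }

    // Pop and drop up to `steps` bags from the global queue.
    fn collect(&mut self, steps: usize) {
        let mut q = self.global.queue.lock().unwrap();
        for _ in 0..steps {
            if q.pop_front().is_none() {
                break;
            }
            // dropping the bag frees its garbage
        }
    }
}
```

This also makes the failure mode visible: if no thread ever calls `pin` or `flush` again, whatever sits in `queue` stays there forever.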
For your last two questions - I'm worried how well this will work with many cores. Epoch advancement still requires iterating over all active threads in the set, but I'm wary of anything else that introduces global sharing/contention points. What you say about contention on the global queue may be true for 4 cores, but it may be much less true on a 48- or 64-core machine. I can spin up a VM on a 96-core ARM machine really cheaply, and Power8 machines can have almost 200 hardware threads. If all of those threads are generating a lot of garbage, there might be quite a bit of contention on the garbage queue. Deallocation only from the global queue gives me some similar worries as well. You have a data dependency plus an unpredictable control dependency on the (pointer, free_fnc) pair loaded from the bag, and freeing pointers from another core means the allocator will soon be returning pointers that are taking up space in a remote cache and aren't in the local cache - basically a type of false sharing. There might be some allocator-internal sharing problems as well. All of the above are problems that won't really show up in microbenchmarks because they only matter at whole-application scale, and thread-local bags go a long way towards solving them. Having a more real-world benchmark would help a lot with this - possibly some fake web-service-like thing that shares data in a global crossbeam map, communicates with crossbeam queues, etc., but does enough work to give a meaningful idea of how crossbeam interacts with normal running programs. I have some ideas for this. Having said that, I really like the idea of being able to push garbage directly into the global queue before the epoch is up and being able to incrementally work through the global queue. That's a great way of dealing with large allocations. |
That would be great! Do you have links to websites that provide such machines?
Hmm, it's probably possible to mitigate the contention in several ways (multiple queues, maybe even a work-stealing mechanism, etc.). The library is at the moment not too optimized for massive numbers of cores, but this is something I'd like to do once I get access to an adequate machine.
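One of those mitigations - multiple queues - could be sketched as a sharded garbage queue, where each thread pushes to a shard picked by its ID and collectors steal from other shards when their own is empty. This is purely illustrative (a `usize` stands in for a bag, and `SHARDS` is an assumed constant), not anything coco implements:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

const SHARDS: usize = 8; // assumed shard count

struct ShardedQueue {
    shards: Vec<Mutex<VecDeque<usize>>>,
}

impl ShardedQueue {
    fn new() -> Self {
        ShardedQueue {
            shards: (0..SHARDS).map(|_| Mutex::new(VecDeque::new())).collect(),
        }
    }

    // Threads on different shards never touch the same lock on push.
    fn push(&self, thread_id: usize, bag: usize) {
        self.shards[thread_id % SHARDS].lock().unwrap().push_back(bag);
    }

    // Collect from our own shard first; steal from others if it's empty,
    // so garbage can't get stranded in an idle shard.
    fn pop(&self, thread_id: usize) -> Option<usize> {
        for i in 0..SHARDS {
            let shard = &self.shards[(thread_id + i) % SHARDS];
            if let Some(bag) = shard.lock().unwrap().pop_front() {
                return Some(bag);
            }
        }
        None
    }
}
```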
Yeah, this is a bit worrying. Might be a very difficult problem to solve, but I'll think about it...
Awesome, I'd love to get help with benchmarking. Let me know what ideas you come up with. |
packet.net has the 96 core arm machine |
My thoughts were that threads would default to thread-local garbage bags except for exceptional circumstances like:
I think this would get the benefit of both cases. It prevents an inactive thread from holding on to too much garbage but in normal cases keeps thread-local bags.
I think using thread-local bags when appropriate would help a lot here.
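The hybrid rule - thread-local by default, global only in exceptional circumstances - could be expressed as a small decision function. All names and thresholds here are invented for illustration; they are not coco's or crossbeam's actual constants:

```rust
// Where should a newly retired object go?
enum Destination {
    LocalBag,    // the default: keep garbage with the thread that made it
    GlobalQueue, // exceptional: thread exiting, huge object, or overwhelmed
}

struct ThreadState {
    local_garbage_bytes: usize, // garbage currently held in local bags
    exiting: bool,              // thread is shutting down
}

const LOCAL_LIMIT_BYTES: usize = 1 << 20;   // assumed 1 MiB local cap
const LARGE_OBJECT_BYTES: usize = 64 << 10; // assumed "large" threshold

fn destination(state: &ThreadState, object_bytes: usize) -> Destination {
    if state.exiting
        || object_bytes >= LARGE_OBJECT_BYTES
        || state.local_garbage_bytes + object_bytes > LOCAL_LIMIT_BYTES
    {
        Destination::GlobalQueue
    } else {
        Destination::LocalBag
    }
}
```

The point of the rule is exactly what's described above: an inactive or exiting thread can't sit on unbounded garbage, but the common case stays local.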
I was going to write an actor-based fake market-data system, since it's the perfect storm for testing these cases: lots of message passing, each actor can maintain state in some global map, and each actor has a lot of local state where performance depends heavily on the cache (maintaining and reading a poorly written order book). |
Sorry for the delay, I noticed this discussion just now (from the mention in the dropable PR). I'm going to do a deeper study later, but one thing to consider is that the memory reclamation system should try to be friendly with the allocator's thread cache. Otherwise it's just going to push contention there. |
In jemalloc, the fast-path allocation for reasonably small objects looks something like:

```c
/* malloc fast path: */
thread_cache *tc = get_thread_cache(alloc_size);
return tc->cache_stack.pop();
```

And free:

```c
/* prior data dependency on value of free_ptr */
thread_cache *tc = get_thread_cache(free_size);
tc->cache_stack.push(free_ptr);
/* GC slow path occasionally */
```

So yeah, freeing pointers which aren't in the cache and whose values aren't in the cache will cause contention in the allocator and on allocated values. The stack operations don't write to the freed pointer, but the freeing process probably will soon, by virtue of reallocating it. |
Threads should, whenever possible, clear their own garbage. I haven't really read into the code yet to see how aggressively coco pushes stuff into the global garbage. Most memory allocators benefit from some sort of allocation/thread affinity. That should probably help reduce contention in any sort of global garbage pool anyway. |
I'm fairly certain all collection happens from the global queue in coco, and bags are put in the global queue based on size and epoch advancement. |
One of the biggest problems at the moment is that garbage objects are not segregated by which thread allocated those objects. The GC is totally oblivious to that. Perhaps all allocated objects should be tagged with current thread ID, and when they become garbage they should be promoted into that thread's local garbage bag? Does that sound reasonable? Can we do better?
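That thread-ID routing idea could be sketched like this. It is only an illustration of the proposal, not existing coco code; `Tagged`, `route`, and the `usize` stand-ins for thread IDs and pointers are all invented here:

```rust
use std::collections::HashMap;

type ThreadId = usize;

// Each retired object carries the ID of the thread that allocated it.
struct Tagged {
    owner: ThreadId,
    payload: usize, // stand-in for the retired pointer
}

// Sort a batch of global garbage back into per-owner bags, so each
// object is freed by the thread whose allocator cache most likely
// still holds it.
fn route(garbage: Vec<Tagged>) -> HashMap<ThreadId, Vec<usize>> {
    let mut bags: HashMap<ThreadId, Vec<usize>> = HashMap::new();
    for g in garbage {
        bags.entry(g.owner).or_default().push(g.payload);
    }
    bags
}
```

Whether the allocating thread is really the right destination is exactly the open question discussed below - the last reader of the object is another candidate.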
Correct. Bags are put into the global queue as soon as they get full, or when they get explicitly flushed. @schets The actor-based market data system sounds great. It's going to be very helpful if you build that! Btw, one thing I really wish for is a real-world use case for a concurrent ordered map. It'd be interesting to put such data structures to a real test, for sure. |
It's a hard question whether you care about the allocating thread more than the deallocating thread. I believe jemalloc will eventually return pointers to the allocating thread, but only after internal GC cycles. It also depends on the allocation pattern: the items may have been alive for so long that the cache lines in question no longer have any association with the original thread. I expect that freeing garbage in the thread which last read it will provide the most benefit, since the freeing thread has almost certainly read the pointer recently.
I wouldn't say this will use an ordered map in a 'real' way, but it could certainly be made to use it. |
Can we keep (all/some) bags in the current thread until thread exit, while also eventually trying to collect the global bags? Similar to the current crossbeam implementation. |
I'm working on exactly that right now |
Can you elaborate on why you want to do that? The idea is the following: if the thread-local bag gets full, that means the current thread is producing tons of garbage and needs help from other threads. The bag capacity should be configured so that this happens rarely enough that contention isn't a problem. Or maybe you just insist on some limited number of bags staying in the thread-local staging area? |
This. When a thread is overwhelmed with garbage it can push into the global queue, but otherwise it keeps things local and benefits from doing so |
The point is avoiding contention in the global bag list and potentially in the allocator (by trying to be sympathetic to its thread cache). There could be L1/L2/L3 cache benefits from doing so as well. Reading the code now, I'm concerned that if the average number of bags per thread per epoch exceeds COLLECT_STEPS, collection might never catch up. Does that make sense, or am I missing something? |
If you enforce that a thread must do more incremental deallocations than allocations, eventually it will catch up, I believe. @stjepang I benchmarked the x86 cmpxchg change you made on current master, and it is faster in some cases and within benchmark error for a few. |
Yeah, this is something that can certainly be tweaked.
Correct. But this just won't happen in any realistic scenario. I intentionally chose a reasonably large
Great! It's good that we can replicate benchmark results. |
I'll leave the optimization things for another discussion.
I assume this is to discourage the user from holding up the epoch, but on the other hand it's less ergonomic, and if the user really needs to hold the pin they'll be forced to use the same closure pattern downstream. |
You mean, we must put a bound on the amount of accumulated garbage in the GC? If so, that's going to be tricky without blocking threads too much and killing the throughput of concurrent collections. Honestly, I'm not aware of any good solutions to this problem. Hazard pointers are a possibility, but they're terribly slow. Any ideas?
I haven't thought this through, but... two things:
To elaborate on the second point, consider the following:

```rust
thread_local! {
    static GUARD: RefCell<Option<Guard>> = RefCell::new(None);
}
```

Then suppose that someone stores a guard there. When does that guard get dropped? Well, it's possible that it gets dropped on thread exit, while all thread-local storage is being destructed - at which point the thread's own epoch entry may already have been torn down. So, in order to make guards sound from the safety point of view, when dropped they must also check whether the current thread is still registered with the epoch system. But I agree that closures are less ergonomic. |
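A minimal sketch of that drop-time check, assuming invented names throughout (`ENTRY_ALIVE` stands in for the thread's epoch entry; this is not coco's actual `Guard`):

```rust
use std::cell::Cell;

thread_local! {
    // Stand-in for this thread's epoch entry; destroyed on thread exit.
    static ENTRY_ALIVE: Cell<bool> = Cell::new(true);
    // Counts successful unpins, just to make the effect observable.
    static UNPINS: Cell<usize> = Cell::new(0);
}

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        // `LocalKey::try_with` fails once this thread's TLS slot has been
        // destroyed, so the guard must not assume the entry still exists.
        let alive = ENTRY_ALIVE.try_with(|e| e.get()).unwrap_or(false);
        if alive {
            // Normal unpin path: decrement the pin count, maybe collect.
            UNPINS.with(|u| u.set(u.get() + 1));
        }
        // Otherwise the thread is already unregistered: skip the unpin.
    }
}
```

The closure-based `pin(|scope| …)` API sidesteps all of this because the guard can never outlive the pinned scope.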
Maybe I'm being paranoid :) Regarding the Pin interface: I'm fine with the closure interface, it's a trade-off I guess, ergonomics for "safety". -- |
Does anyone have experience with the `sys_membarrier` syscall? I've just discovered that it might eliminate costly memory fences on the hot path. Interesting facts about it:
I'm curious whether this syscall could be a viable synchronization method for crossbeam. Before I dive in and try to integrate it into the new GC, I'd like to check whether there are any caveats to watch out for first. Pinging @joshtriplett because he's probably knowledgeable about all this. |
@stjepang As long as you can cope with running on a system without it available, then using it when available helps hugely. One caveat: when you can successfully use it, you really want readers compiled to expect it so that they don't need their own barriers; you don't want a runtime conditional on that. So you may have to have a separate compile-time-specialized version of some reader routines to get maximum performance in that case. (On writers, a runtime conditional won't add significant overhead, but as long as you have the infrastructure...) Also, as of commit 907565337ebf998a68cb5c5b2174ce5e5da065eb ("Fix: Disable sys_membarrier when nohz_full is enabled"), enabling `nohz_full` disables `sys_membarrier`. If you plan to use it, I would highly recommend chatting with Mathieu Desnoyers, who created it. He has some other future ideas for how to improve it, such as having a process-local version that doesn't affect other processes on the system. I'd love to see those enacted, but he needs more of a use case to make the case for them. |
@joshtriplett Thank you for your response, this is very helpful! There are two more things I was wondering:
|
@stjepang RCU does support stashing garbage away; take a look at `call_rcu`. I think that code in coreclr relies on an implementation detail of cross-CPU shootdown handling. I don't know how safe that is, and it's definitely not portable. |
We have |
Hi folks! :)
I'd like to draw attention to this rayon issue: nikomatsakis/rayon#195
Here we see rayon suffering from bad performance after switching from crate `deque` to `crossbeam`. I've summarized in this comment what's wrong with crossbeam's epoch system and deque implementation. It suffers from multiple problems, but all of those are fixable. Some are easy, and some will require a lot of effort.
While crossbeam has recently been going through a refactoring phase, I've designed a new epoch system from scratch. I could write a lot about its inner workings and the decisions that were made. The short story is that it's superior to crossbeam's in pretty much all areas:
- The `Atomic` API is more ergonomic.
- `Atomic`s can be tagged with several bits.
- Large garbage can be made immediately available for collection via `flush()`.

To solve all those problems, a complete rewrite was needed. Unfortunately, I don't know how to patch crossbeam's existing epoch system implementation to fix all of those - it really needs a fresh start.
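As an aside, tagging an `Atomic` with several bits works because heap objects are aligned, so the low bits of a pointer are always zero and can carry a small tag. A rough sketch of the idea on raw `usize` values (illustrative only, not coco's actual representation):

```rust
// Pack a tag into the low bits of an aligned address. For a type with
// alignment `align` (a power of two), log2(align) low bits are free.
fn pack(ptr: usize, tag: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    assert!(ptr % align == 0, "pointer must be aligned");
    assert!(tag < align, "tag must fit in the low bits");
    ptr | tag
}

// Recover the (address, tag) pair by masking the low bits back off.
fn unpack(packed: usize, align: usize) -> (usize, usize) {
    let mask = align - 1;
    (packed & !mask, packed & mask)
}
```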
You can check out my new crate here: https://github.com/stjepang/coco
There's a finished deque implementation that fixes all the problems rayon had with crossbeam. And it's much faster than crossbeam! :)
Here's what I'm aiming to implement in the very near future:
- A channel with a `select` macro: https://github.com/stjepang/channel (WIP)

By the way, I've also developed a faster equivalent to `MsQueue` that also supports the pop operation with timeouts. In the end I deleted it because I want to write an even better queue. :)

After that I'd like to explore several other types of maps, including hash maps.
In order to implement all this, a solid and very performant epoch system is required - and now I have it! The design was evolving for some time but now it has finally settled on one that seems just right.
I'm writing all this to ask the crossbeam team how they feel about it. @aturon has politely asked me not to split the crate ecosystem if possible, and I'd like to respect that. Ideally, it'd be best to merge the improvements into crossbeam. And for that to happen we need to reach a consensus as a team.
In any case, I'll keep using `coco` to develop new data structures until crossbeam gets a better epoch system. Then we can pretty easily port its data structures to crossbeam, as far as I'm concerned.

Please don't get me wrong, I don't want to be too harsh on crossbeam. This is a magnificent project, a huge inspiration for my work, and I believe @aturon did a fantastic job designing it! Thank you for that! :)
That said, my position is that 90% of it has to be scrapped and reimplemented if we want to solve existing problems and move on. If you're skeptical about a big rewrite, perhaps I could simply keep developing `coco` until we see if my ideas turn out to be any good? At least it's something the crossbeam team should keep an eye on...

What do you think?