Collector:Allocator implementation#15
Conversation
ad787f7 to
79f3369
Compare
73ad5f4 to
79f3369
Compare
|
Ooooh, interesting! Definitely feel free to take notes in the |
sure, I am already working on writing the notes for this one, would keep adding up findings there |
36549ee to
a8d1ee3
Compare
| const DEFAULT_HEAP_THRESHOLD: usize = 2_097_152; | ||
|
|
||
| #[derive(Debug)] | ||
| pub struct ArenaAllocator<'alloc> { |
There was a problem hiding this comment.
A bit of early feedback: with all the substantial changes here to arena2, can you split this apart into an arena3 module.
It's probably a heavily debatable way to manage updates in this repository, but since we are really experimenting here. The goal is any significant updates to an approach are done in a separate module (hence arena2). Eventually, we will have to clean this up probably, but for now, I'd prefer them be kept separate and readily available without going back through the git history.
1363735 to
acdc487
Compare
|
Spent the weekend reworking the memory layer. will share the full notes soon but here’s the gist. |
f0de58e to
7f6a618
Compare
|
these are the benchmark results for shruti@DESKTOP-QR46EJI:~/oscars$ cargo bench --bench oscars_vs_boa_gc --features="gc_allocator"
gc_node_allocation/oscars/10
time: [324.61 ns 342.98 ns 364.61 ns]
change: [-0.5585% +5.1368% +10.540%] (p = 0.07 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
gc_node_allocation/boa_gc/10
time: [693.13 ns 716.79 ns 743.52 ns]
change: [-9.7359% -2.4471% +6.6236%] (p = 0.56 > 0.05)
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
gc_node_allocation/oscars/100
time: [2.8213 µs 3.0267 µs 3.2572 µs]
change: [-4.4722% +2.9538% +11.148%] (p = 0.46 > 0.05)
No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
7 (7.00%) high mild
3 (3.00%) high severe
gc_node_allocation/boa_gc/100
time: [6.1530 µs 6.4731 µs 6.8352 µs]
change: [-3.3843% +4.4959% +13.061%] (p = 0.29 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
gc_node_allocation/oscars/1000
time: [22.919 µs 24.553 µs 26.626 µs]
change: [-13.580% -6.3863% +1.0228%] (p = 0.11 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
gc_node_allocation/boa_gc/1000
time: [56.627 µs 59.236 µs 61.990 µs]
change: [-7.0592% +0.6745% +9.0401%] (p = 0.88 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
gc_collection_pause/oscars/100
time: [1.0923 µs 1.1522 µs 1.2204 µs]
change: [-3.9908% +4.8663% +14.772%] (p = 0.29 > 0.05)
No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) high mild
2 (2.00%) high severe
gc_collection_pause/boa_gc/100
time: [10.927 µs 11.561 µs 12.221 µs]
change: [-10.990% +3.3920% +19.467%] (p = 0.65 > 0.05)
No change in performance detected.
gc_collection_pause/oscars/500
time: [3.4679 µs 3.9151 µs 4.4116 µs]
change: [-37.254% -29.117% -19.624%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
12 (12.00%) high mild
3 (3.00%) high severe
gc_collection_pause/boa_gc/500
time: [54.036 µs 56.947 µs 60.026 µs]
change: [-30.187% -20.270% -9.0304%] (p = 0.00 < 0.05)
Performance has improved.
gc_collection_pause/oscars/1000
time: [6.5950 µs 7.2183 µs 7.9136 µs]
change: [-16.723% -8.3210% +1.0927%] (p = 0.08 > 0.05)
No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
gc_collection_pause/boa_gc/1000
time: [34.886 µs 35.937 µs 37.086 µs]
change: [-31.887% -25.557% -18.150%] (p = 0.00 < 0.05)
Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
9 (9.00%) high mild
8 (8.00%) high severe
vector_creation/oscars_gc_allocator/10
time: [51.070 ns 52.521 ns 54.353 ns]
change: [-1.9106% +3.0103% +8.5429%] (p = 0.28 > 0.05)
No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) high mild
7 (7.00%) high severe
vector_creation/boa_gc_std_vec/10
time: [207.46 ns 220.32 ns 233.96 ns]
change: [-9.2989% +7.1879% +31.087%] (p = 0.46 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
vector_creation/oscars_gc_allocator/100
time: [224.74 ns 232.80 ns 242.65 ns]
change: [+6.6757% +12.404% +18.626%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
vector_creation/boa_gc_std_vec/100
time: [522.97 ns 575.56 ns 628.75 ns]
change: [-24.421% -9.7736% +6.8050%] (p = 0.26 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
vector_creation/oscars_gc_allocator/1000
time: [2.2487 µs 2.4200 µs 2.6150 µs]
change: [+9.4412% +18.613% +29.336%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high mild
vector_creation/boa_gc_std_vec/1000
time: [107.40 µs 140.83 µs 178.56 µs]
change: [+27.898% +80.440% +153.45%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe
vec_of_gc_pointers/oscars_gc_vec_gc_ptrs/10
time: [326.68 ns 344.09 ns 362.94 ns]
change: [-9.5025% -1.7014% +6.3636%] (p = 0.68 > 0.05)
No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe
vec_of_gc_pointers/boa_gc_std_vec_gc_ptrs/10
time: [1.0198 µs 1.0708 µs 1.1306 µs]
change: [-7.6137% +6.2234% +22.640%] (p = 0.42 > 0.05)
No change in performance detected.
Found 19 outliers among 100 measurements (19.00%)
6 (6.00%) low mild
6 (6.00%) high mild
7 (7.00%) high severe
vec_of_gc_pointers/oscars_gc_vec_gc_ptrs/50
time: [1.5112 µs 1.5983 µs 1.7048 µs]
change: [+4.4749% +11.784% +19.161%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
vec_of_gc_pointers/boa_gc_std_vec_gc_ptrs/50
time: [4.7083 µs 4.9511 µs 5.2162 µs]
change: [-3.2445% +11.323% +27.972%] (p = 0.15 > 0.05)
No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
vec_of_gc_pointers/oscars_gc_vec_gc_ptrs/100
time: [4.3779 µs 4.7675 µs 5.1797 µs]
change: [-0.9358% +6.1997% +14.485%] (p = 0.11 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
vec_of_gc_pointers/boa_gc_std_vec_gc_ptrs/100
time: [10.463 µs 11.727 µs 13.148 µs]
change: [+14.656% +33.075% +53.141%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
mixed_workload/oscars_alloc_collect_cycle
time: [6.1953 µs 6.2563 µs 6.3346 µs]
change: [-15.302% -10.382% -5.6999%] (p = 0.00 < 0.05)
Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) high mild
11 (11.00%) high severe
mixed_workload/boa_gc_alloc_collect_cycle
time: [13.641 µs 13.760 µs 13.920 µs]
change: [-15.500% -12.865% -10.300%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
memory_pressure/oscars_churn
time: [15.039 µs 15.112 µs 15.208 µs]
change: [-19.296% -15.647% -12.119%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
memory_pressure/boa_gc_churn
time: [34.167 µs 34.311 µs 34.477 µs]
change: [-23.290% -19.349% -15.381%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
6 (6.00%) high mild
5 (5.00%) high severe
deep_object_graph/oscars_tree_depth_5
time: [15.058 µs 15.142 µs 15.234 µs]
change: [-9.2173% -6.5234% -4.0217%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) high mild
3 (3.00%) high severe
deep_object_graph/boa_gc_tree_depth_5
time: [31.867 µs 31.969 µs 32.100 µs]
change: [-23.440% -19.600% -15.864%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe |
7f6a618 to
f1ed476
Compare
9ee0024 to
31fe1d6
Compare
31fe1d6 to
c601541
Compare
nekevss
left a comment
There was a problem hiding this comment.
Early feedback! I'm going to try to get through the rest of the PR later tonight or tomorrow.
Super interesting work here. Definitely probably could have been broken into smaller PRs (for future reference), but that's fine for now since this was rather open ended. So let's just move ahead with this. Besides, it's hard to fault a PR that's 25% benchmarks and notes 😆
| @@ -0,0 +1,107 @@ | |||
| # arena2 vs arena3 benchmark results | |||
There was a problem hiding this comment.
praise: genuinely, super interesting read.
I would be super interested to see benchmarks on how long it takes to deallocate. The loss in performance in the allocation may be worthwhile if we gain more efficiency in the objects layouts and the deallocation.
There was a problem hiding this comment.
good idea, will add a dealloc bench
| @@ -0,0 +1,52 @@ | |||
| # `Collector: Allocator`: is it possible and is it safe? | |||
There was a problem hiding this comment.
thought: this is rather compelling argument to me that the idea is probably ultimately a bad one in the long term, but was still worth investigating nonetheless and by no means a negative against the work in this PR.
I think long term moving to a moving collector is going to need to occur, so excluding those types of collectors is a non-starter to me.
There was a problem hiding this comment.
agreed. good to have it written down though so we don't revisit this later
oscars/benches/arena2_vs_arena3.rs
Outdated
|
|
||
| fn print_results_summary() { | ||
| eprintln!("\nresults summary:"); | ||
| eprintln!("speed: arena2 is faster"); |
There was a problem hiding this comment.
nit: remove results summary
It's debatable when its in comments, but I definitely don't think it should printed anywhere. These conclusions are better left for the notes or even reference the written summary in notes.
oscars/src/alloc/arena3/alloc.rs
Outdated
| } | ||
|
|
||
| // SAFETY: `Arena` is used only from a single threaded GC context | ||
| unsafe impl Send for Arena {} |
There was a problem hiding this comment.
question: why are we implementing Send here?
There was a problem hiding this comment.
Cell<*mut u8> makes Arena !Send by default, Send impl is needed so the collector can be moved to its own thread after init. can expand the SAFETY comment if helpful
| "slot_size must be >= 8 (for free-list pointer)" | ||
| ); | ||
|
|
||
| let estimated = total_capacity / slot_size; |
There was a problem hiding this comment.
question: is this correct?
I'm finding it a bit hard to track. Can you add some documentation for it, and also some unit tests to insure we are calculating it correctly.
There was a problem hiding this comment.
yep it's correct, did a slight overestimate of slot count so the bitmap ends up a few bits larger than needed then corrected by slot_area. Will add a doc comment walking through it and some unit tests
| use std::collections::{HashMap, HashSet}; | ||
| #[cfg(feature = "std")] | ||
| use std::path::{Path, PathBuf}; | ||
| //#[cfg(feature = "std")] |
There was a problem hiding this comment.
question: why is this commented out?
There was a problem hiding this comment.
oops 😅 seems to be a mistake on my end, will uncomment this
There was a problem hiding this comment.
I think I commented it out since I didn't see any impl for Path, PathBuf, maybe I can add the impl here
nekevss
left a comment
There was a problem hiding this comment.
Another round of reviews! I still need to go through the collection stuff. But as with the last round, this is all really interesting.
I do have one issue with the Root/Gc split, and the current impl of the allocator looks more like a mempool to me vs. an arena, but that isn't necessarily a bad thing. I had wanted to try out a mempool allocator at some point, and now we have a pretty decent pass at a working one.
oscars/src/alloc/arena3/alloc.rs
Outdated
| pub fn alloc_slot(&self) -> Option<NonNull<u8>> { | ||
| // pop from free list if available | ||
| let fl = self.free_list.get(); | ||
| if !fl.is_null() { |
There was a problem hiding this comment.
question: Why not use NonNull here?
oscars/src/alloc/arena3/alloc.rs
Outdated
| // | ||
| // buffer: `[ bitmap ][ slots ]` | ||
| // bitmap bit `i` is 1 when occupied | ||
| pub struct Arena { |
There was a problem hiding this comment.
thought: this really isn't so much an Arena as it is a mempool.
The semantics here do sort of matter. So we may want to move it over. I'm also a bit curious if there is any concepts from mempool2 that may be useful here.
There was a problem hiding this comment.
yeah you're totally right, it's definitely behaving more like a mempool. Even if we drop the free list to make it a strict arena we lose slot recycling and waste a ton of memory. I have added a TODO note for now about the mempool move. Since this PR is already pretty big maybe I should do the move in a follow up issue or want me to just do it here?
There was a problem hiding this comment.
We can do it in a follow up. But I do think it may be worth revisiting some of the details here with "this is a mempool" in mind.
oscars/src/alloc/arena3/alloc.rs
Outdated
| if self.live.get() > 0 { | ||
| return false; | ||
| } | ||
| for word_idx in 0..self.bitmap_words { |
There was a problem hiding this comment.
question: Why are we iterating over the bitmap_words here?
I'm trying to understand what is gained by the bitmap in this event. Is there a time when a dead object in this current version of the mempool would not be on the free list?
oscars/src/alloc/arena3/mod.rs
Outdated
| pub(crate) arena_size: usize, | ||
| pub(crate) current_heap_size: usize, | ||
| // all typed GC arenas | ||
| pub(crate) typed_arenas: Vec<Arena>, |
There was a problem hiding this comment.
question: why use Vec here vs. a HashMap?
There was a problem hiding this comment.
free_slot has to do a pointer range check to find which arena owns the pointer regardless, so I think a HashMap doesn't save us much work. Since we don't end up with that many arenas anyway and O(1) caching handling most lookups, Vec works well, I can always change it if we need to though.
| self.key.mark(HeaderColor::Grey); | ||
| } | ||
| unsafe fn trace(&self, _color: TraceColor) { | ||
| panic!("Trace::trace called on Ephemeron directly; must be dispatched via vtable"); |
There was a problem hiding this comment.
hmmmmmmm, idk about this ... is it better to have a debug_assert here to catch in debug builds and make this a noop in prod? ... we should really consider this.
| _key_size: usize, | ||
| _value_type_id: TypeId, | ||
| _value_size: usize, | ||
| is_reachable_fn: unsafe fn(this: ErasedEphemeron, color: TraceColor) -> bool, |
There was a problem hiding this comment.
question: what is the reason for removing the types here? I'd prefer them left unless there is a specific technical reason preventing it that I'm not noticing.
| self.root_count | ||
| .get() | ||
| .checked_add(1) | ||
| .expect("root count overflow: more than u16::MAX roots on a single GcBox"), |
There was a problem hiding this comment.
Yeah, I'm fine with this lol if anyone besides Boa is using this and one day files an issue for a crash, I feel like that's a completely different discussion about what they are doing to hit that many roots.
There was a problem hiding this comment.
haha agreed, would be a very interesting bug report 😆
|
|
||
| /// A garbage-collected pointer type over an immutable value. | ||
| /// A garbage-collected handle that acts as an external root | ||
| pub struct Root<T: Trace + ?Sized + 'static> { |
There was a problem hiding this comment.
issue: this separation of Root and Gc is most likely not sound and should be moved back to the original version. It will be nice to one day make these intrinsics Copy, but it's not possible with the current Gc.
There was a problem hiding this comment.
@shruti2522 Pinging you to make sure this one didn't get lost amongst the others. Please make sure to address this issue
There was a problem hiding this comment.
yes working on this one, sorry forgot to reply😅
|
|
||
| impl<T> GcAllocVec<T> { | ||
| #[inline] | ||
| pub fn new(collector: &MarkSweepGarbageCollector) -> Self { |
|
|
||
| impl<T> GcAllocBox<T> { | ||
| #[inline] | ||
| pub fn new(value: T, collector: &MarkSweepGarbageCollector) -> Self { |
|
Reverted the Root/Gc split, also made some changes in the direction of the mempool move. Split the |
5f8dc8a to
9d5c9c6
Compare
nekevss
left a comment
There was a problem hiding this comment.
I missed these on the last review. Lucky we had the clippy lint trigger.
The review basically boils down to this: We should not be exposing a value_mut method in the public API. It's most likely not sound to do so, and it's only ever really used in the Drop implementation because we need to be able to drop inner values.
oscars/src/alloc/arena2/alloc.rs
Outdated
| } | ||
|
|
||
| pub(crate) fn value_mut(&mut self) -> &mut T { | ||
| pub fn value_mut(&mut self) -> &mut T { |
There was a problem hiding this comment.
issue: this should not be a public API.
|
|
||
| // manually drop the value through the pointer, then free the slot | ||
| let mut heap_item_ptr = a.as_ptr(); | ||
| unsafe { |
There was a problem hiding this comment.
issue: this drop_in_place needs to happen in the free_slot method.
We should not be exposing a value_mut method. I'm not entirely sure it's sound to do so, so we should make sure to not allow it in our public API.
|
made |
nekevss
left a comment
There was a problem hiding this comment.
I'm fine with moving forward with this as is.
We will definitely need to do some general clean up in future iterations on this, and we will most likely need to eventually remove the collector trait experiment. But this is very good overall, so let's merge it and iterate on it.
Thanks for the contribution!
Thanks! that makes sense. It definitely needs some cleanup, glad to get this merged and iterate from here 🚀 |
GcAllocator prototype for issue #11
I have been working on a prototype,exploring a
GcAllocator<'gc>wrapper that bridgesArenaAllocatorto the Allocator trait usingRefCellThis is a work in progress to understand the requirements and constraints around a
Collector: Allocatordesign better, I'll keep iterating here and documenting findings as i go