
Non-atomic memory is cleared twice #14677

Open
BlobCodes opened this issue Jun 9, 2024 · 8 comments · May be fixed by #14679 or #14710

Comments

@BlobCodes
Contributor

Bug Report

When allocating non-atomic memory using the pointer_malloc or allocate primitives, Crystal currently always inserts an LLVM memset intrinsic [1] [2] to clear the malloc'd memory.

However, bdwgc already ensures that all memory allocated using GC.malloc is cleared, so this memset is actually unnecessary.

@straight-shoota
Member

straight-shoota commented Jun 10, 2024

First of all, this is a great find! Memory allocations are a huge performance factor in many Crystal applications. Speeding that up is a great win.

I'd like to discuss this independently of the proposed solution #14679 which IMO goes too far in one go.

I believe we should take smaller steps:

  1. Implement memory clearing for gc_none (GC.malloc doesn't clear memory (as specified) for gc_none #14678). This is an obvious bug because the documentation clearly states that GC.malloc should clear the memory.
  2. That allows dropping the memset intrinsic in the compiler, getting rid of double clearing. This resolves this bug report.

Adding support for allocating non-zeroed memory as a performance improvement is a separate feature request.
Let's discuss that in #14687.

@ysbaddaden
Contributor

Copied from #14687 (I mixed up the issues 🤦):

Investigating memory allocations in codegen, there are only a few calls:

  • Codegen#malloc is called by Codegen#malloc_closure and Codegen#malloc_aggregate.
  • Codegen#malloc_atomic is only called from Codegen#allocate_aggregate.
  • Codegen#array_malloc and Codegen#array_malloc_atomic are only used to implement Pointer.malloc.

Of these use cases:

Proposal:

Having the four Codegen#[array_]malloc[_atomic] methods always return initialized memory would harmonize the malloc methods' behavior; then fixing #malloc_aggregate and #codegen_primitive_pointer_malloc to avoid a pointless memset should be enough to fix this issue simply and quickly.

Then #14687 could go and squeeze even more performance.

@ysbaddaden
Contributor

ysbaddaden commented Jun 14, 2024

The only issue with #14710 is that it changes the public GC.malloc_atomic to now clear the memory, and we use it to allocate buffers in the stdlib (searching GitHub, there are only a couple of usages of GC.malloc_atomic), whereas I had meant to fix only the codegen helpers.

Still, this is a much better move to harmonize it all, and you were right all along: we'll want a GC.unsafe_malloc_atomic or GC.uninitialized_malloc_atomic (the same idea with reversed naming). I'd prefer an argument, but that wouldn't work with the LLVM allockind attribute, right?

Maybe do this in three steps?

  1. fix GC.malloc for gc/none (bugfix);
  2. fix Codegen#[array_]malloc_atomic to return initialized memory + fix double clears;
  3. add GC.unsafe_malloc_atomic (or GC.uninitialized_malloc_atomic), use it to allocate buffers in stdlib, make GC.malloc_atomic always initialize memory, and finally update Codegen#malloc_atomic (memory is already initialized).

I believe 1. and 2. would be quickly merged (mere bugfixes), while we can discuss 3. (feature change of the public API).

I understand this is getting very boring to deal with. We're splitting our own PRs in a myriad of smaller ones on a daily basis 😫

Note: this is only my personal opinion. Let's wait for the others.

@straight-shoota
Member

I recalled another discussion which revealed an issue with zeroing out memory during initialization: it makes it impossible to lazily allocate memory via mmap, where the kernel only backs the pages that are actually written to. If the runtime explicitly fills the entire memory slice with zeros, lazy allocation doesn't work (and I don't think this can be optimized away).
So it makes a lot of sense to delegate zeroing to the allocator. It can make the best decisions about how to implement it most efficiently; for example, the zeroing can happen the first time the memory is actually accessed (i.e. written to).

Ref #11572 (comment) ff.

@straight-shoota
Member

straight-shoota commented Jun 17, 2024

What's the point in having GC.malloc_atomic return zeroed memory?
The main reason for zeroing is to prevent the GC from misidentifying garbage data as pointers. So this is only necessary for memory in which the GC looks for pointers.
GC.malloc_atomic is explicitly for pointer-free memory. So I don't see why it would need to clear the memory.

It is a public API method. Changing its semantics has performance implications for a lot of user code. We should only do that if there's a really good reason for it.

@BlobCodes
Contributor Author

Dirty memory can always cause hard-to-fix bugs, so having clean memory be the default could be a good idea.

The main reason is to be able to provide a consistent API together with support for #14687.

If we later want to have a non-clearing variant of malloc and a clearing variant of malloc_atomic, the inconsistent clearing behaviour between the two existing methods will probably cause an ugly API.

The way I see it, we currently have three options when wanting to implement #14687:

  • Retrofit the old methods to have consistent clearing behaviour and add new _unsafe or _clean variants consistently.
  • Have an inconsistent/bad API (malloc_dirty + malloc_atomic_clean)
  • Deprecate current methods and re-do it

We definitely want some way of allocating clean pointer-free memory because the compiler expects clean memory for reference types and Pointer.malloc - and the allocator will be more efficient at it than the compiler.

@straight-shoota
Member

Having a clean default sounds good, of course. But it has a performance cost.
Feature correctness should always come before performance, but this isn't about correctness; it's about preventing a minor foot gun. And this one is more like a gun that you cannot point down: you have to lift your foot in order to hit it.

AFAIK GC.malloc_atomic has always returned non-zeroed memory, and I'm not aware of any particular issues with that. So I believe it's reasonable to assume the risk is relatively small.

The issue we're addressing here is that the same memory is cleared twice. We should focus on that and not introduce any other changes in the same step.

Also, I don't want to introduce an avoidable change without offering an alternative to retain current performance.
We need to continue the discussion about changing the API for memory allocations.

@HertzDevil
Contributor

In LLVM 19 there is now an initializes attribute which supposedly aids dead store elimination of the memset in use cases like this: llvm/llvm-project#84803

4 participants