
Non-atomic memory is cleared twice #14677

Open
BlobCodes opened this issue Jun 9, 2024 · 8 comments · May be fixed by #14679 or #14710

Comments

@BlobCodes
Contributor

Bug Report

When allocating non-atomic memory using the pointer_malloc or allocate primitives, Crystal currently always inserts an LLVM memset intrinsic [1] [2] to clear the malloc'd memory.

However, bdwgc already ensures that all memory allocated using GC.malloc is cleared, so this memset is actually unnecessary.

@straight-shoota
Member

straight-shoota commented Jun 10, 2024

First of all, this is a great find! Memory allocations are a huge performance factor in many Crystal applications. Speeding that up is a great win.

I'd like to discuss this independently of the proposed solution #14679 which IMO goes too far in one go.

I believe we should take smaller steps:

  1. Implement memory clearing for gc_none (GC.malloc doesn't clear memory (as specified) for gc_none #14678). This is an obvious bug because the documentation clearly states that GC.malloc should clear the memory.
  2. That allows dropping the memset intrinsic in the compiler, getting rid of double clearing. This resolves this bug report.

Adding support for allocating non-zeroed memory as a performance improvement is a separate feature request.
Let's discuss that in #14687.

@ysbaddaden
Contributor

Copied from #14687 (I mixed up the issues 🤦):

Investigating memory allocations in codegen, there are only a few calls:

  • Codegen#malloc is called by Codegen#malloc_closure and Codegen#malloc_aggregate.
  • Codegen#malloc_atomic is only called from Codegen#allocate_aggregate.
  • Codegen#array_malloc and Codegen#array_malloc_atomic are only used to implement Pointer.malloc.

Of these use cases:

Proposal:

Having the four Codegen#[array_]malloc[_atomic] methods always return initialized memory would harmonize the malloc methods' behavior; then fixing #malloc_aggregate and #codegen_primitive_pointer_malloc to avoid a pointless memset should be enough to fix this issue simply and quickly.

Then #14687 could go and squeeze even more performance.

@ysbaddaden
Contributor

ysbaddaden commented Jun 14, 2024

The only issue with #14710 is that it changes the public GC.malloc_atomic to now clear the memory, and we use it to allocate buffers in the stdlib (searching GitHub, there are only a couple of usages of GC.malloc_atomic), whereas I had meant to fix only the codegen helpers.

Still, this is a much better move to harmonize it all, and you were right all along: we'll want a GC.unsafe_malloc_atomic or GC.uninitialized_malloc_atomic (the same idea with reversed naming). I'd prefer an argument, but that wouldn't work with the LLVM allockind attribute, right?

Maybe do this in three steps?

  1. fix GC.malloc for gc/none (bugfix);
  2. fix Codegen#[array_]malloc_atomic to return initialized memory + fix double clears;
  3. add GC.unsafe_malloc_atomic (or GC.uninitialized_malloc_atomic), use it to allocate buffers in stdlib, make GC.malloc_atomic always initialize memory, and finally update Codegen#malloc_atomic (memory is already initialized).

I believe 1. and 2. would be quickly merged (mere bugfixes), while we can discuss 3. (feature change of the public API).

I understand this is getting very boring to deal with. We're splitting our own PRs in a myriad of smaller ones on a daily basis 😫

Note: this is only my personal opinion. Let's wait for the others.

@straight-shoota
Member

I recalled another discussion which revealed an issue with zeroing out memory during initialization: it makes it impossible to lazily allocate memory via mmap, where the kernel only backs the pages that are actually written to. If the runtime explicitly fills the entire memory slice with zeros, lazy allocation doesn't work (and I don't think this can be optimized away).
So it makes a lot of sense to delegate zeroing to the allocator. It can make the best decisions about how to implement it most efficiently; for example, the zeroing can happen the first time the memory is actually accessed (i.e. written to).

Ref #11572 (comment) ff.

@straight-shoota
Member

straight-shoota commented Jun 17, 2024

What's the point in having GC.malloc_atomic return zeroed memory?
The main reason for zeroing is to prevent the GC from misidentifying garbage data as pointers. So this is only necessary for memory in which the GC looks for pointers.
GC.malloc_atomic is explicitly for pointer-free memory. So I don't see why it would need to clear the memory.

It is a public API method. Changing its semantics has performance implications for a lot of user code. We should only do that if there's a really good reason for it.

@BlobCodes
Contributor Author

Dirty memory can always cause hard-to-fix bugs, so having clean memory be the default could be a good idea.

The main reason is to be able to provide a consistent API together with support for #14687.

If we later want to have a non-clearing variant of malloc and a clearing variant of malloc_atomic, the inconsistent clearing behaviour between the two existing methods will probably cause an ugly API.

The way I see it, we currently have three options when wanting to implement #14687:

  • Retrofit the old methods to have consistent clearing behaviour and add new _unsafe or _clean variants consistently.
  • Have an inconsistent/bad API (malloc_dirty + malloc_atomic_clean)
  • Deprecate current methods and re-do it

We definitely want some way of allocating clean pointer-free memory because the compiler expects clean memory for reference types and Pointer.malloc - and the allocator will be more efficient at it than the compiler.

@straight-shoota
Member

Having a clean default sounds good, of course. But it has a performance cost.
Feature correctness should always come before performance, but this isn't about correctness; it's about preventing a minor foot gun. And this one is more like a gun that you cannot point down: you have to lift your foot in order to hit it.

AFAIK GC.malloc_atomic has always returned non-zeroed memory, and I'm not aware of any particular issues with that. So I believe it's reasonable to assume the risk is relatively small.

The issue we're addressing here is that the same memory is cleared twice. We should focus on that and not introduce any other changes in the same step.

Also, I don't want to introduce an avoidable change without offering an alternative to retain current performance.
We need to continue the discussion about changing the API for memory allocations.

@HertzDevil
Contributor

In LLVM 19 there is now an initializes attribute which supposedly aids dead store elimination of the memset in use cases like this: llvm/llvm-project#84803

4 participants