Skip to content
This repository has been archived by the owner on Oct 12, 2022. It is now read-only.

Issue 19946 - Add memset template function #3837

Closed
wants to merge 1 commit into from

Conversation

dkorpel
Copy link
Contributor

@dkorpel dkorpel commented Jun 8, 2022

This allows dmd to lower array assignment to memset in the frontend instead of the glue layer.

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @dkorpel! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Auto-close Bugzilla Severity Description
19946 regression In betterC filling an array with a non-zero value fails for types of size > 1 due to missing _memset16/_memset32/etc.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + druntime#3837"

src/rt/memset.d Outdated
array = array to set
value = value to set each array element to
*/
void _d_memset(T)(T[] array, T value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const on the RHS?

src/rt/memset.d Outdated
@@ -24,6 +24,36 @@ extern (C)

extern (C):

/**
Set all elements of an array to be a specified value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Set all elements of an array to be a specified value.
Set all elements of an array to a specified value.

Copy link
Member

@ibuclaw ibuclaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

core.internal.memset?

IIRC this rt module is not compilable due to dmd-specific assembly.

@maxhaton
Copy link
Member

maxhaton commented Jun 9, 2022

core.internal.memset?

IIRC this rt module is not compilable due to dmd-specific assembly.

This is dmd specific unless you want your lowering to gain a function call

@ibuclaw
Copy link
Member

ibuclaw commented Jun 9, 2022

lower array assignment to memset in the frontend instead of the glue layer.

Doesn't sound dmd-specific to me.

@maxhaton
Copy link
Member

maxhaton commented Jun 9, 2022

lower array assignment to memset in the frontend instead of the glue layer.

Doesn't sound dmd-specific to me.

It should be dmd specific, put it that way. It can be made optional with a frontend param.

@RazvanN7
Copy link
Contributor

RazvanN7 commented Jun 9, 2022

We can just move the function to a different file, but it really doesn't matter since the template instance will go in the resulted object file.

@ibuclaw
Copy link
Member

ibuclaw commented Jun 9, 2022

We can just move the function to a different file, but it really doesn't matter since the template instance will go in the resulted object file.

It does matter if you can't compile the module it's located in.

@maxhaton
Copy link
Member

maxhaton commented Jun 9, 2022

We can just move the function to a different file, but it really doesn't matter since the template instance will go in the resulted object file.

It does matter if you can't compile the module it's located in.

The module what is contained in? The whole point of this is that dmd errors with betterC because it assumes some code is present when its not, hence a template.

@kinke
Copy link
Contributor

kinke commented Jun 9, 2022

How can this work in the first place? The rt.* modules aren't deployed by DMD releases, so cannot be imported.


IMO, these dummy-templatizations for -betterC compatibility are absolutely terrible. In an ideal world, the druntime module interdependencies would be carefully laid out, so that some -betterC project can still link druntime without bloating the binary too much. E.g., for these DMD-backend-specific memset helpers, one should only need the memset.o object file from the static druntime lib, done.

@ibuclaw
Copy link
Member

ibuclaw commented Jun 9, 2022

We can just move the function to a different file, but it really doesn't matter since the template instance will go in the resulted object file.

It does matter if you can't compile the module it's located in.

The module what is contained in? The whole point of this is that dmd errors with betterC because it assumes some code is present when its not, hence a template.

memset.d:40:17: error: found ‘EDI’ when expecting ‘:’
   40 |         mov     EDI,p           ;
      |                 ^
memset.d:40:20: error: expected constant string constraint for operand, not ‘,’
   40 |         mov     EDI,p           ;
      |                    ^

Yada, yada.

@maxhaton
Copy link
Member

maxhaton commented Jun 9, 2022

We can just move the function to a different file, but it really doesn't matter since the template instance will go in the resulted object file.

It does matter if you can't compile the module it's located in.

The module what is contained in? The whole point of this is that dmd errors with betterC because it assumes some code is present when its not, hence a template.

memset.d:40:17: error: found ‘EDI’ when expecting ‘:’
   40 |         mov     EDI,p           ;
      |                 ^
memset.d:40:20: error: expected constant string constraint for operand, not ‘,’
   40 |         mov     EDI,p           ;
      |                    ^

Yada, yada.

The existing functions will be removed. Arguably the whole module then just move this into arrayop perhaps.

Note that arrayOp can nearly do memset

src/rt/memset.d Outdated
void _d_memset(T)(T[] array, T value)
{
foreach (i; 0 .. array.length)
array[i] = value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this inefficient assignment? Please fallback to the actual memset

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to start with something simple that works. memset doesn't support multi-byte values and is not available at CTFE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to start with something simple that works. memset doesn't support multi-byte values and is not available at CTFE.

Ah sure, I understand. There's already intrinsics (below) to do multi-byte assignment, tho. But yeah, I agree that the complete version should be done separately as it might require a more complex/complete implementation.

I'll leave a list of things we need to account for:

  • ellaborate assignment and CTFE (operator overloading, this implementation)
  • single byte assignment (straight memset call)
  • multi byte assignment (possibly a generic and inline assembly versions, would be cool to have a generic version with the __vector API)

For the byte assignments we need to check for immutable qualifiers or for copy ctors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler will automatically do this.

No one cares or should care about dmd performance, LLVM and GCC both go to some length to find these kinds of assignments and optimize them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KIFSS

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler will automatically do this.

No one cares or should care about dmd performance, LLVM and GCC both go to some length to find these kinds of assignments and optimize them.

It does for single-byte assignment, although, for multiple bytes like 32-bit size, it does a poor job. See https://godbolt.org/z/994n163ao . I would also value not having such performance regression on DMD tho. We can probably do a cleaner generic implementation with the __vector API, but I agree doing those on separate PRs. I believe there is a huge performance benefit here. This is also a cool thing to propose on the LLVM optimization/transformation passes. I'm not sure how about cache/branch predictor would behave with this inline asm but I think it doesn't matter, given the generated assembly.

@dkorpel
Copy link
Contributor Author

dkorpel commented Jun 10, 2022

In an ideal world, the druntime module interdependencies would be carefully laid out, so that some -betterC project can still link druntime without bloating the binary too much. E.g., for these DMD-backend-specific memset helpers, one should only need the memset.o object file from the static druntime lib, done.

That's a good stand point, but currently:

  • betterC is defined as 'subset of D requiring only the C standard lib, not druntime'
  • assigning a value to an array requires druntime (when compiling with DMD)
  • people want to be able to assign values to an array in betterC without hassle (issue 19946 and duplicates)

We can't have all 3, and the template solution would remove point 2. You're either suggesting to remove point 3 by asking users to link memset.o, or point 1 when the compiler does it automatically. That's essentially rebranding betterC to be 'light druntime' instead of 'no druntime'. That might not be a bad idea, after all some users call -betterC 'worseD' or 'a cancer' and advocate for custom light druntimes instead. It would be a bigger commitment that requires approval from Walter and Atila though.

@dkorpel
Copy link
Contributor Author

dkorpel commented Jun 10, 2022

IIRC this rt module is not compilable due to dmd-specific assembly.

Is there a version other than version (D_InlineAsm_X86) that would make GDC ignore it instead?

@RazvanN7
Copy link
Contributor

RazvanN7 commented Jul 5, 2022

@dkorpel what are your plans with this PR?

@dkorpel
Copy link
Contributor Author

dkorpel commented Jul 5, 2022

I think I'll follow Iain's suggestion and move it to core.internal.memset

@RazvanN7
Copy link
Contributor

RazvanN7 commented Jul 8, 2022

So, if I understand correctly, you then need to:

  • create a dmd pr to replace the calls to memset with calls to arraySet
  • create a druntime pr to delete the old hooks

@ibuclaw @kinke do you oppose this course of action?

@kinke
Copy link
Contributor

kinke commented Jul 8, 2022

LDC currently inlines these nano loops (in its glue layer), so has no need for any frontend lowering.

@ibuclaw
Copy link
Member

ibuclaw commented Jul 8, 2022

LDC currently inlines these nano loops (in its glue layer), so has no need for any frontend lowering.

Likewise here too. If you're going to lower array[] = n in the front-end anyway, at the very least:

  • Make this pragma(inline, true), to avoid any performance regression.
  • Probably don't want array[] = 0 lowered, as that's the most common case we already generate fast codegen for from the get go (without having to rely on the optimizer to understand what you mean).

@maxhaton
Copy link
Member

maxhaton commented Jul 8, 2022

LDC currently inlines these nano loops (in its glue layer), so has no need for any frontend lowering.

I'd be surprised this wasnt inlined (then unrolled) too. It's simple constant propagation to work out the loop is bounded.

I think the lowering should just be optional. dmd needs it because because injecting a loop into its IR tree is an unsolved problem, GDC and Ldc don't need it at all

@Geod24
Copy link
Member

Geod24 commented Jul 9, 2022

Druntime have been merged into DMD. Please re-submit your PR to dlang/dmd repository.

@Geod24 Geod24 closed this Jul 9, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
8 participants