Issue 19946 - Add memset template function #3837
Thanks for your pull request and interest in making D better, @dkorpel! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.
Bugzilla references
Testing this PR locally
If you don't have a local development environment set up, you can use Digger to test this PR: dub run digger -- build "master + druntime#3837"
src/rt/memset.d
Outdated
array = array to set
value = value to set each array element to
*/
void _d_memset(T)(T[] array, T value)
const on the RHS?
src/rt/memset.d
Outdated
@@ -24,6 +24,36 @@ extern (C)

extern (C):

/**
Set all elements of an array to be a specified value.
- Set all elements of an array to be a specified value.
+ Set all elements of an array to a specified value.
core.internal.memset?
IIRC this rt module is not compilable due to dmd-specific assembly.
This is dmd specific unless you want your lowering to gain a function call
Doesn't sound dmd-specific to me.
It should be dmd specific, put it that way. It can be made optional with a frontend param.
We can just move the function to a different file, but it really doesn't matter since the template instance will go in the resulting object file.
It does matter if you can't compile the module it's located in.
The module what is contained in? The whole point of this is that dmd errors with betterC because it assumes some code is present when it's not, hence a template.
How can this work in the first place? The IMO, these dummy-templatizations for
Yada, yada.
The existing functions will be removed. Arguably the whole module then just move this into arrayop perhaps. Note that arrayOp can nearly do memset
src/rt/memset.d
Outdated
void _d_memset(T)(T[] array, T value)
{
    foreach (i; 0 .. array.length)
        array[i] = value;
Why this inefficient assignment? Please fall back to the actual memset.
The idea is to start with something simple that works. memset doesn't support multi-byte values and is not available at CTFE.
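To illustrate both points, here is a minimal sketch rather than the PR's actual code. The names `fillSketch` and `filledAtCompileTime` are hypothetical. It assumes only `core.stdc.string.memset` from druntime's C bindings and can be compiled with something like `dmd -unittest -main -run sketch.d`.

```d
import core.stdc.string : memset;

// Hypothetical stand-in for the templated loop in this PR.
void fillSketch(T)(T[] array, T value)
{
    foreach (i; 0 .. array.length)
        array[i] = value;
}

// The plain loop can be evaluated at compile time (CTFE).
// C's memset has no D body available to the interpreter, so it cannot.
enum int[4] filledAtCompileTime = () {
    int[4] a;
    fillSketch(a[], 0x01020304);
    return a;
}();
static assert(filledAtCompileTime[0] == 0x01020304);
static assert(filledAtCompileTime[3] == 0x01020304);

unittest
{
    int[4] viaMemset;
    // memset converts its value argument to a single byte and repeats it,
    // so only the low byte (0x04) ends up in every position.
    memset(viaMemset.ptr, 0x01020304, viaMemset.sizeof);
    assert(viaMemset[0] == 0x04040404);

    int[4] viaLoop;
    fillSketch(viaLoop[], 0x01020304);
    assert(viaLoop[0] == 0x01020304);
}
```

The loop keeps the full 32-bit value per element, while the byte-wise fill can only reproduce values whose bytes are all identical.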
> The idea is to start with something simple that works. memset doesn't support multi-byte values and is not available at CTFE.

Ah sure, I understand. There are already intrinsics (below) to do multi-byte assignment, tho. But yeah, I agree that the complete version should be done separately as it might require a more complex/complete implementation.
I'll leave a list of things we need to account for:
- elaborate assignment and CTFE (operator overloading, this implementation)
- single byte assignment (straight memset call)
- multi byte assignment (possibly generic and inline assembly versions; it would be cool to have a generic version with the __vector API)
For the byte assignments we need to check for immutable qualifiers or for copy ctors.
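The cases in that list could be dispatched roughly as follows. This is only a sketch under assumptions, not a proposed implementation: `fillDispatchSketch` and `Counted` are hypothetical names, and it uses `std.traits.hasElaborateAssign` from Phobos for brevity, which real druntime code could not depend on. The multi-byte branch is a plain loop standing in for a specialized version. It can be run with something like `dmd -unittest -main -run sketch.d`.

```d
import core.stdc.string : memset;
import std.traits : hasElaborateAssign; // Phobos; illustrative only

// Hypothetical dispatcher covering the three cases listed above.
void fillDispatchSketch(T)(T[] array, T value)
{
    static if (T.sizeof == 1 && __traits(isScalar, T))
    {
        // Single-byte elements: memset at run time, plain loop at CTFE.
        if (__ctfe)
        {
            foreach (ref e; array)
                e = value;
        }
        else
            memset(array.ptr, cast(ubyte) value, array.length);
    }
    else static if (!hasElaborateAssign!T)
    {
        // Multi-byte plain data: placeholder loop where a specialized
        // (for example __vector-based) version could go.
        foreach (ref e; array)
            e = value;
    }
    else
    {
        // Elaborate assignment: every element must go through opAssign.
        foreach (ref e; array)
            e = value;
    }
}

// Hypothetical type with a user-defined assignment operator.
struct Counted
{
    int payload;
    static int assignments;
    void opAssign(Counted rhs) { payload = rhs.payload; ++assignments; }
}

unittest
{
    ubyte[8] bytes;
    fillDispatchSketch(bytes[], cast(ubyte) 0xAB);
    assert(bytes[0] == 0xAB && bytes[7] == 0xAB);

    int[4] words;
    fillDispatchSketch(words[], 0x01020304);
    assert(words[3] == 0x01020304);

    Counted[3] items;
    Counted.assignments = 0;
    fillDispatchSketch(items[], Counted(7));
    assert(items[2].payload == 7);
    assert(Counted.assignments == 3); // opAssign ran once per element
}
```

The qualifier and copy-constructor checks mentioned above are not handled here; they would need additional `static if` conditions before taking the memset path.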
The compiler will automatically do this.
No one cares or should care about dmd performance, LLVM and GCC both go to some length to find these kinds of assignments and optimize them.
KIFSS
> The compiler will automatically do this.
> No one cares or should care about dmd performance, LLVM and GCC both go to some length to find these kinds of assignments and optimize them.

It does for single-byte assignment, but for multi-byte values such as 32-bit elements it does a poor job. See https://godbolt.org/z/994n163ao . I would also value not having such a performance regression on DMD, tho. We can probably do a cleaner generic implementation with the __vector API, but I agree with doing those in separate PRs. I believe there is a huge performance benefit here. This is also a cool thing to propose for the LLVM optimization/transformation passes. I'm not sure how the cache/branch predictor would behave with this inline asm, but I think it doesn't matter, given the generated assembly.
That's a good standpoint, but currently:
We can't have all 3, and the template solution would remove point 2. You're either suggesting to remove point 3 by asking users to link memset.o, or point 1 when the compiler does it automatically. That's essentially rebranding betterC to be 'light druntime' instead of 'no druntime'. That might not be a bad idea, after all some users call -betterC 'worseD' or 'a cancer' and advocate for custom light druntimes instead. It would be a bigger commitment that requires approval from Walter and Atila though.
Is there a version other than
@dkorpel what are your plans with this PR?
I think I'll follow Iain's suggestion and move it to
LDC currently inlines these nano loops (in its glue layer), so has no need for any frontend lowering.
Likewise here too. If you're going to lower
I'd be surprised if this wasn't inlined (then unrolled) too. It's simple constant propagation to work out the loop is bounded. I think the lowering should just be optional. dmd needs it because injecting a loop into its IR tree is an unsolved problem; GDC and LDC don't need it at all.
Druntime has been merged into DMD. Please re-submit your PR to
This allows dmd to lower array assignment to memset in the frontend instead of the glue layer.