Idea: UDAs to automatically do inlining with LDC #90

Open
JohanEngelen opened this issue Jan 27, 2022 · 6 comments

@JohanEngelen
Contributor

It is cumbersome / problematic to have to pass --enable-cross-module-inlining to LDC to get good performance; this may not be an option in a larger project.
Perhaps adding a UDA to each function can be used to force inlining of functions that are known to optimize to single (or a few) instructions?

Something like:

static if (SSE2_with_LDC) { 
  import ldc.attributes;
  enum SSE2_inline = llvmAttr("alwaysinline");
} else static if (SSE2_with_GDC) { 
  // GDC probably has some UDA for this
} else {
  alias SSE2_inline = void; // Don't force inlining for emulated cases.
}

@SSE2_inline
void _mm_some_SSE2_intrinsic(){}

If this does not help with cross-module inlining, then perhaps it will if every function is turned into a template (with zero arguments).
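A minimal sketch of the zero-argument template idea, assuming a made-up function `add_one` as a stand-in for a real intrinsic (the name is hypothetical and not part of the library). The extra empty `()` turns the function into a template, so its body is instantiated and analysed in the calling module:

```d
// Hypothetical stand-in for an intrinsic; not a real intel-intrinsics function.
// The first, empty parameter list makes this a function template.
int add_one()(int x)
{
    return x + 1;
}

void main()
{
    // Called like an ordinary function; the empty template
    // parameter list is resolved implicitly.
    assert(add_one(41) == 42);
}
```

Whether this actually gets the call inlined across modules still depends on the compiler and its optimization settings.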

@p0nce
Collaborator

p0nce commented Jan 27, 2022

Can pragma(inline, bool) be used for the same effect? (some of the most used intrinsics are marked pragma(inline, true) )

@JohanEngelen
Contributor Author

pragma(inline, ...) will work, but I don't know how to apply it conditionally. (And it is better to put it outside the function than inside, because then perhaps we force codegen..?) If the function is a template, then semantic analysis will happen and the pragma can be put inside as well (that is the easiest, I suppose...)

@p0nce
Collaborator

p0nce commented Jan 27, 2022

When it was introduced, pragma(inline) was supposed to take a compile-time parameter as its argument (to inline or not), and I've never heard that you could put it before the function instead of inside.

I'd prefer pragma(inline) inside; that way you can choose for which compiler and instruction set you want to inline the function (typically this exposed DMD bugs as soon as pragma(inline) was introduced). Also, the LDC_with_arm64 path could be very long while the LDC_with_SSE42 path could be very short, and they will need different inlining choices. So I'm not a fan of the UDA solution.

@p0nce
Collaborator

p0nce commented Jan 27, 2022

"and it is better to put it on the outside than on inside, because then perhaps we force codegen..?"

I don't get this.

@JohanEngelen
Contributor Author

If the function is not a template and is called from another module, the compiler will not do semantic analysis of the function body and will not see the pragma(inline) inside; so in general I recommend putting it outside the function (very often it is wrongly applied inside and has no effect). The problem with the pragma is that you cannot make it default to nothing. It is either a forced inline, or a forced not-inline. Perhaps all intrinsics should be templates, but due to template culling I'm not sure if codegen will happen. But at least semantic analysis is probably always happening, i.e. the compiler will see the pragma(inline) inside. (I'm not 100% sure)
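For reference, a small sketch of the two placements being discussed, using hypothetical functions `twice` and `thrice` (illustrative names only, not library functions). Both forms are accepted by the language; the difference described above is whether the compiler sees the pragma when it only looks at the declaration from another module:

```d
// Hypothetical stand-ins for intrinsics; names are illustrative only.

// Pragma outside: attached to the declaration itself.
pragma(inline, true)
int twice(int x)
{
    return x * 2;
}

// Pragma inside: a statement in the function body, so it is only
// seen once the body undergoes semantic analysis.
int thrice(int x)
{
    pragma(inline, true);
    return x * 3;
}

void main()
{
    assert(twice(21) == 42);
    assert(thrice(14) == 42);
}
```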

@p0nce
Collaborator

p0nce commented Jan 28, 2022

In general my feeling about intel-intrinsics (vs vanilla code) currently is, from worst problem to least problem:

  1. slower in debug (it builds slower) with LDC. Tricks to enhance this are welcome!
  2. slower in debug (it runs slower) with LDC. Though you can also win vs vanilla even without optimizations sometimes.
  3. other problems, such as avoiding DMD codegen problems

If forced inlining makes (2) better it should also not make (1) worse.

If you have ideas to make intrinsics code build faster, they are welcome (through inlining, templating, or something else).
Some code is deliberately templated to avoid being generated (stricmp emulation).
