-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: use memcpy for by-val aggregate type input parameters #1196
Conversation
- Bumps the Windows Rust Version to 1.77 for test runs - Applies 1.77 clippy and rustfmt suggestions
66f5927
to
8dc290a
Compare
8dc290a
to
6313e69
Compare
8bb710a
to
1b10811
Compare
3f6012f
to
d85b1d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let bitcast = self.llvm.builder.build_bitcast(ptr, ty, "bitcast").into_pointer_value(); | ||
let (size, alignment) = if let DataTypeInformation::String { size, encoding } = type_info | ||
{ | ||
// since passed string args might be larger than the local acceptor, we need to first memset the local variable to 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on this? I still don't understand why the memset
is needed here 😅 I initially thought the memset
was required to avoid garbage values in the alloca
call but thats not the case here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't copy the entire length of the locally allocated string, since the passed string might be larger than the acceptor:
FUNCTION foo
VAR_INPUT
str: STRING[3];
END_VAR
END_FUNCTION
foo('longer than 3');
And to ensure the last grapheme is in fact a null-terminator, we memset the entire string to 0. An alternative would be to GEP into the last element and set it to 0 ourselves, but I'm not sure if that is worth it, since memsetting to 0 should just be an XOR with the local variable.
As a side note, is this a good candidate to expand our performance tests to detect potential regressions? That is create a test case with many big aggregate types all passed by value and track their runtime behaviour in our dashboard? |
Sounds good. This would also allow to better test future front-end optimizations (e.g. more accurate byte-alignment for memset/memcpy calls, ...) |
Aggregate
VAR_INPUT
args to function calls are now generated/passed as pointers and thenmemcpy
d into a local variable instead of passing it by value and usingstore
.In order to achieve this, quite a bit of logic is moved from the
expression_generator
to thepou_generator
- in other words, the caller will now only bitcast an aggregate argument to its pointer (if necessary) and the function will take care of correctlymemset
ing/memcpy
ing.This results in significantly reduced allocations/IR in some cases, especially when passing member variables of
FUNCTION_BLOCK
/PROGRAM
structs or when passing aby-ref
arg on to aby-val
parameter:Where previously the caller had to allocate a local temporary variable and copy the value into it before passing it on to the callee, it is now sufficient to directly pass the pointer.
Using the same example as given in issue #1074
the
llc-14 --time-passes
benchmark improves significantly:master/store:
memcpy:
Pass execution timing
andinstruction selection and scheduling
improve by a factor of ~40000 and ~300000 respectively.Resolves #1074