refactor: use memcpy for by-val aggregate type input parameters #1196

mhasel · 2024-04-10T08:18:15Z

Aggregate VAR_INPUT arguments to function calls are now passed as pointers and then memcpy'd into a local variable, instead of being passed by value and written with a store.
To achieve this, quite a bit of logic is moved from the expression_generator to the pou_generator. In other words, the caller now only bitcasts an aggregate argument to a pointer (if necessary), and the callee takes care of correctly memsetting/memcpying it.
This results in significantly fewer allocations and less IR in some cases, especially when passing member variables of FUNCTION_BLOCK/PROGRAM structs, or when passing a by-ref argument on to a by-val parameter.
Previously, the caller had to allocate a local temporary variable and copy the value into it before passing it on to the callee. Now it is sufficient to pass the pointer directly.
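The following Rust sketch is only an analogy for the two calling conventions described above. It is not the rusty codegen, which emits LLVM IR.

- All names (`callee_old`, `caller_old`, `callee_new`, `caller_new`) are hypothetical.
- The 16-byte `Aggregate` is hypothetical and stands in for a much larger type such as STRING[65536].
- `copy_from_slice` has memcpy semantics and plays the role of the `memcpy` the callee now emits.

```rust
/// Hypothetical stand-in for a by-val aggregate (STRING, ARRAY, STRUCT).
/// Real cases are much larger; 16 bytes keeps the sketch small.
const N: usize = 16;
type Aggregate = [u8; N];

/// Old convention (sketch): the aggregate itself is passed by value and the
/// callee stores it into its local variable.
fn callee_old(arg: Aggregate) -> Aggregate {
    let local: Aggregate = arg; // corresponds to the `store` of the whole aggregate
    local
}

/// Old caller (sketch): passing e.g. a FUNCTION_BLOCK member first required
/// a temporary copy in the caller.
fn caller_old(member: &Aggregate) -> Aggregate {
    let tmp: Aggregate = *member; // extra temporary allocated by the caller
    callee_old(tmp)
}

/// New convention (sketch): only a pointer is passed; the callee copies the
/// pointee into its own local variable.
fn callee_new(arg: &Aggregate) -> Aggregate {
    let mut local: Aggregate = [0; N];
    local.copy_from_slice(arg); // memcpy semantics
    local
}

/// New caller (sketch): the member's address is passed on directly,
/// no temporary is needed.
fn caller_new(member: &Aggregate) -> Aggregate {
    callee_new(member)
}
```

Both conventions leave the same value in the callee's local. The difference is who performs the copy and whether the caller needs a temporary. The Rust compiler makes its own ABI decisions, so this only mirrors the IR-level shape.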

Take the same example as in issue #1074:

FUNCTION bar : DINT
    VAR_INPUT
        val : STRING[65536];
    END_VAR
END_FUNCTION

With this change, the llc-14 --time-passes benchmark improves significantly:

master/store:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 69.0989 seconds (69.0998 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  64.0869 ( 93.0%)   0.0700 ( 43.7%)  64.1569 ( 92.8%)  64.1579 ( 92.8%)  X86 DAG->DAG Instruction Selection
   4.6626 (  6.8%)   0.0000 (  0.0%)   4.6626 (  6.7%)   4.6626 (  6.7%)  Machine Instruction Scheduler
   0.0767 (  0.1%)   0.0900 ( 56.2%)   0.1667 (  0.2%)   0.1667 (  0.2%)  X86 Assembly Printer

...

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 61.4012 seconds (61.4021 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  60.5939 ( 98.7%)   0.0300 ( 75.0%)  60.6238 ( 98.7%)  60.6248 ( 98.7%)  DAG Combining 1
   0.3744 (  0.6%)   0.0000 (  0.0%)   0.3744 (  0.6%)   0.3744 (  0.6%)  Instruction Selection
   0.1517 (  0.2%)   0.0000 (  0.0%)   0.1517 (  0.2%)   0.1517 (  0.2%)  DAG Combining 2
   0.1485 (  0.2%)   0.0000 (  0.0%)   0.1485 (  0.2%)   0.1485 (  0.2%)  Instruction Scheduling
   0.0481 (  0.1%)   0.0000 (  0.0%)   0.0481 (  0.1%)   0.0481 (  0.1%)  DAG Legalization
 
...

memcpy:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0016 seconds (0.0017 wall clock)

   ---User Time---   --User+System--   ---Wall Time---  --- Name ---
   0.0004 ( 23.3%)   0.0004 ( 23.3%)   0.0004 ( 23.3%)  X86 DAG->DAG Instruction Selection
   0.0004 ( 23.2%)   0.0004 ( 23.2%)   0.0004 ( 23.1%)  Expand Atomic instructions
   0.0002 ( 10.6%)   0.0002 ( 10.6%)   0.0002 ( 10.6%)  X86 Assembly Printer

...

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0002 seconds (0.0002 wall clock)

   ---User Time---   --User+System--   ---Wall Time---  --- Name ---
   0.0001 ( 46.7%)   0.0001 ( 46.7%)   0.0001 ( 46.9%)  Instruction Selection
   0.0000 ( 19.5%)   0.0000 ( 19.5%)   0.0000 ( 19.8%)  DAG Combining 1
   0.0000 ( 13.8%)   0.0000 ( 13.8%)   0.0000 ( 13.5%)  Instruction Scheduling
   0.0000 ( 10.5%)   0.0000 ( 10.5%)   0.0000 ( 10.4%)  Instruction Creation
   0.0000 (  3.8%)   0.0000 (  3.8%)   0.0000 (  3.5%)  DAG Combining 2
   0.0000 (  3.3%)   0.0000 (  3.3%)   0.0000 (  3.2%)  DAG Legalization

...

Total pass execution time improves by a factor of ~40000, and instruction selection and scheduling time by a factor of ~300000.
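The factors follow from the two reported totals. The memcpy totals are at the 4-decimal-place resolution of the timing report, so the ratios are order-of-magnitude estimates:

$$
\frac{69.0989\,\text{s}}{0.0016\,\text{s}} \approx 4.3 \times 10^{4},
\qquad
\frac{61.4012\,\text{s}}{0.0002\,\text{s}} \approx 3.1 \times 10^{5}
$$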

Resolves #1074


volsa

Maybe @ghaith or @riederm can double-check but looks good from my end.

p.s. loving these LLVM IR reductions 🤤

tests/correctness/strings.rs

volsa · 2024-04-22T10:47:30Z

src/codegen/generators/pou_generator.rs

+                    let bitcast = self.llvm.builder.build_bitcast(ptr, ty, "bitcast").into_pointer_value();
+                    let (size, alignment) = if let DataTypeInformation::String { size, encoding } = type_info
+                    {
+                        // since passed string args might be larger than the local acceptor, we need to first memset the local variable to 0


Can you elaborate on this? I still don't understand why the memset is needed here 😅 I initially thought the memset was required to avoid garbage values after the alloca, but that's not the case here?

We don't copy the entire length of the locally allocated string, since the passed string might be larger than the acceptor:

FUNCTION foo
    VAR_INPUT
        str : STRING[3];
    END_VAR
END_FUNCTION

foo('longer than 3');

To ensure that the last character is in fact a null terminator, we memset the entire string to 0. An alternative would be to GEP into the last element and set it to 0 ourselves. I'm not sure that is worth it, though, since memsetting to 0 should just be an XOR with the local variable.
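The two options discussed here can be sketched in Rust on plain byte buffers. This is a hypothetical illustration, not the rusty implementation. It assumes:

- `local` is the callee's STRING[n] buffer of n + 1 bytes, with the last byte reserved for the terminator.
- `arg` is the caller's buffer, including its own terminator.

```rust
/// Sketch: copy a by-val STRING argument into the callee's local variable.
/// Assumption: `local` has n + 1 bytes for a STRING[n], last byte = terminator;
/// `arg` is the caller's buffer including its own terminator.
fn copy_string_arg(local: &mut [u8], arg: &[u8]) {
    // memset: zero the whole local, so every byte not overwritten below,
    // in particular the last one, is a null terminator.
    local.fill(0);
    // memcpy: copy at most n bytes so the last byte is never overwritten,
    // even if the argument is longer than the acceptor.
    let len = arg.len().min(local.len().saturating_sub(1));
    local[..len].copy_from_slice(&arg[..len]);
}

/// The alternative mentioned above: skip the memset and only write the last
/// element (the equivalent of a GEP to the last element plus a store of 0).
/// Bytes between the copied prefix and the last byte keep their old contents,
/// which is harmless as long as `arg` carries its own terminator.
fn copy_string_arg_terminate_last(local: &mut [u8], arg: &[u8]) {
    let len = arg.len().min(local.len().saturating_sub(1));
    local[..len].copy_from_slice(&arg[..len]);
    if let Some(last) = local.last_mut() {
        *last = 0;
    }
}
```

Take foo('longer than 3') with a 4-byte STRING[3] local. Under these assumptions, both variants leave `lon\0` in the local.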

volsa · 2024-04-22T11:06:46Z

As a side note, is this a good candidate for expanding our performance tests to detect potential regressions? That is, create a test case with many big aggregate types, all passed by value, and track their runtime behaviour in our dashboard?

mhasel · 2024-04-22T11:09:39Z

> As a side note, is this a good candidate for expanding our performance tests to detect potential regressions? That is, create a test case with many big aggregate types, all passed by value, and track their runtime behaviour in our dashboard?

Sounds good. This would also let us better test future front-end optimizations (e.g. more accurate byte alignment for memset/memcpy calls).
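One possible shape for such a test case, sketched in Rust, is a hypothetical helper `generate_by_val_benchmark`. It emits Structured Text with `count` functions modelled on the `bar` example from #1074, plus a PROGRAM that passes one large string to each of them by value. The generated ST has not been checked against the rusty parser. How it would be hooked into the performance dashboard is left open.

```rust
use std::fmt::Write;

/// Hypothetical generator for a by-val aggregate benchmark in Structured Text.
fn generate_by_val_benchmark(count: usize, string_len: usize) -> String {
    let mut src = String::new();
    for i in 0..count {
        // One function per case, mirroring the `bar` example from #1074.
        writeln!(src, "FUNCTION bar{i} : DINT").unwrap();
        writeln!(src, "    VAR_INPUT").unwrap();
        writeln!(src, "        val : STRING[{string_len}];").unwrap();
        writeln!(src, "    END_VAR").unwrap();
        writeln!(src, "END_FUNCTION").unwrap();
        writeln!(src).unwrap();
    }
    // A program holding one large string and passing it to every function
    // by value, i.e. the code path affected by this PR.
    writeln!(src, "PROGRAM main").unwrap();
    writeln!(src, "    VAR").unwrap();
    writeln!(src, "        s : STRING[{string_len}];").unwrap();
    writeln!(src, "    END_VAR").unwrap();
    for i in 0..count {
        writeln!(src, "    bar{i}(s);").unwrap();
    }
    writeln!(src, "END_PROGRAM").unwrap();
    src
}
```

Scaling `count` and `string_len` independently would show whether compile time grows with the number of calls or with the aggregate size.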

volsa and others added 4 commits April 2, 2024 10:28

ci: Prepare for Rust 1.77 bump

3e83f91

- Bumps the Windows Rust Version to 1.77 for test runs - Applies 1.77 clippy and rustfmt suggestions

init

b8b1d47

wip

dc0d8ea

function accessors

1a61f42

mhasel force-pushed the by-ref-aggregate-types branch from 66f5927 to 8dc290a April 10, 2024 08:19

Merge branch 'master' into by-ref-aggregate-types

6313e69

mhasel force-pushed the by-ref-aggregate-types branch from 8dc290a to 6313e69 April 10, 2024 08:20

mhasel added 6 commits April 10, 2024 13:14

by-val auto-pointer created only in var_input blocks

fb802f8

bitcast by-val aggregate types for function calls

9b9dcd8

arrays working for functions

6104e09

add more tests

faebba1

by-val string function args working

158b6f3

hacky but working structs

1b10811

mhasel force-pushed the by-ref-aggregate-types branch from 8bb710a to 1b10811 April 12, 2024 12:43

mhasel added 8 commits April 12, 2024 14:43

-Merge branch 'master' into by-ref-aggregate-types

5df3f76

fix var_input {ref}

44cf8a2

revert resolver/index-visitor changes

e23a67a

codegen magic

92d2787

remove obsolete comments

bf5c18e

revert validation change

7240ca5

all green

0d98eb4

Merge branch 'master' into by-ref-aggregate-types

9d8fc8c

mhasel changed the title ~~fix: use memcpy for by-val aggregate type parameters~~ fix: use memcpy for by-val aggregate type input parameters Apr 17, 2024

mhasel and others added 4 commits April 18, 2024 10:37

Merge branch 'master' into by-ref-aggregate-types

c7253a6

add adr documentation

e4cc642

Merge branch 'by-ref-aggregate-types' of https://github.com/PLC-lang/…

1b2be3c

…rusty into by-ref-aggregate-types

cleanup

d85b1d7

mhasel force-pushed the by-ref-aggregate-types branch from 3f6012f to d85b1d7 April 19, 2024 12:43

remove unreachable

75ab85f

mhasel marked this pull request as ready for review April 19, 2024 13:48

mhasel requested review from ghaith and riederm April 19, 2024 13:48

mhasel added 3 commits April 19, 2024 16:00

fix vistor/resolver diff

0e17f09

refactor conditionals

61c32d2

fmt

cf203ee

mhasel changed the title ~~fix: use memcpy for by-val aggregate type input parameters~~ refactor: use memcpy for by-val aggregate type input parameters Apr 19, 2024

Merge branch 'master' into by-ref-aggregate-types

ec47ac5

volsa previously approved these changes Apr 22, 2024


This was referenced Apr 22, 2024

Add performance tests for by-val aggregate arguments #1209 (open)

Missing size-mismatch validation for strings passed by VAR_INPUT #1210 (open)

mhasel and others added 2 commits April 23, 2024 16:02

Merge branch 'master' into by-ref-aggregate-types

686358c

feedback

bab879b

mhasel dismissed volsa’s stale review via bab879b April 24, 2024 13:48

mhasel requested a review from volsa April 26, 2024 07:21

volsa approved these changes Apr 26, 2024


mhasel merged commit 7807d9d into master Apr 26, 2024
15 checks passed

mhasel deleted the by-ref-aggregate-types branch April 26, 2024 10:59
