
Replace store instructions with memcpy for aggregate types #1074

Closed
volsa opened this issue Jan 22, 2024 · 0 comments · Fixed by #1196
Labels: codegen, high-priority, performance, refactor

Comments

volsa (Member) commented on Jan 22, 2024

Is your refactor request related to a problem? Please describe.
store instructions on big aggregate types degrade compile times from milliseconds to seconds. For example,

FUNCTION bar : DINT
    VAR_INPUT
        val : STRING[65536];
    END_VAR
END_FUNCTION

will generate the following IR

define i32 @bar([65537 x i8] %0) {
entry:
  %bar = alloca i32, align 4
  %val = alloca [65537 x i8], align 1
  store [65537 x i8] %0, [65537 x i8]* %val, align 1
  store i32 0, i32* %bar, align 4
  %bar_ret = load i32, i32* %bar, align 4
  ret i32 %bar_ret
}

with store [65537 x i8] %0, [65537 x i8]* %val, align 1 being the problematic line. Replacing the store with a memcpy (and changing the parameter from [65537 x i8] %0 to i8* %0) brings the compile times back down to milliseconds. This is also highlighted in LLVM's Performance Tips for Frontend Authors.
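
For reference, a minimal sketch of what the lowered function could look like after such a change, assuming the same typed-pointer IR syntax as the snippet above; the exact parameter attributes, alignments, and intrinsic mangling are assumptions:

define i32 @bar(i8* %0) {
entry:
  %bar = alloca i32, align 4
  %val = alloca [65537 x i8], align 1
  ; copy the caller-provided buffer into the local variable
  ; instead of storing the whole aggregate in one instruction
  %1 = bitcast [65537 x i8]* %val to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %1, i8* align 1 %0, i64 65537, i1 false)
  store i32 0, i32* %bar, align 4
  %bar_ret = load i32, i32* %bar, align 4
  ret i32 %bar_ret
}

declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)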

Describe the solution you'd like
Pass aggregate types by reference, memcpy them into the local variable defined in VAR_INPUT and work on the memcpy'ed local variable.

Additional context
(Assumption, I'm not 100% sure if this is correct.) Internally, LLVM / clang will create an assembly file with thousands of load / store instructions. For example, a string size of 40 000 generates the following output with llc (which clang uses internally), bottlenecking the whole compilation at ~50 seconds.

rusty % llc --time-passes demo.st.ll
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 49.5846 seconds (49.8688 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
  28.3284 ( 58.1%)   0.0173 (  2.1%)  28.3457 ( 57.2%)  28.4224 ( 57.0%)  150773500042  Machine Instruction Scheduler
  17.5063 ( 35.9%)   0.0450 (  5.5%)  17.5513 ( 35.4%)  17.7522 ( 35.6%)  48934405502  AArch64 Instruction Selection
   1.9987 (  4.1%)   0.0084 (  1.0%)   2.0071 (  4.0%)   2.0098 (  4.0%)  30441538058  PostRA Machine Instruction Scheduler
   0.5410 (  1.1%)   0.7442 ( 90.3%)   1.2852 (  2.6%)   1.2873 (  2.6%)  14921125429  AArch64 Assembly Printer
...

[...]

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 16.7923 seconds (16.9924 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
  14.1680 ( 84.6%)   0.0163 ( 44.5%)  14.1843 ( 84.5%)  14.3175 ( 84.3%)  13316322627  DAG Combining 1
   2.1697 ( 12.9%)   0.0102 ( 27.9%)   2.1799 ( 13.0%)   2.2438 ( 13.2%)  26155665730  Instruction Scheduling
   0.2379 (  1.4%)   0.0022 (  5.9%)   0.2401 (  1.4%)   0.2420 (  1.4%)  1286939464  Instruction Selection
...