
Replace store instructions with memcpy for aggregate types #1074

Closed
volsa opened this issue Jan 22, 2024 · 0 comments · Fixed by #1196
Labels: codegen, high-priority, performance, refactor

Comments

volsa (Member) commented on Jan 22, 2024

Is your refactor request related to a problem? Please describe.
store instructions on big aggregate types degrade compile times from milliseconds to seconds. For example,

FUNCTION bar : DINT
    VAR_INPUT
        val : STRING[65536];
    END_VAR
END_FUNCTION

will generate the following IR

define i32 @bar([65537 x i8] %0) {
entry:
  %bar = alloca i32, align 4
  %val = alloca [65537 x i8], align 1
  store [65537 x i8] %0, [65537 x i8]* %val, align 1
  store i32 0, i32* %bar, align 4
  %bar_ret = load i32, i32* %bar, align 4
  ret i32 %bar_ret
}

with store [65537 x i8] %0, [65537 x i8]* %val, align 1 being the problematic line. Replacing the store with a memcpy (and changing the parameter from [65537 x i8] %0 to i8* %0) brings the compile times back down to milliseconds. This is also highlighted in LLVM's Performance Tips for Frontend Authors.
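
For reference, a minimal sketch of what the lowered function could look like after such a change, assuming the same typed-pointer IR syntax as the snippet above; the exact parameter attributes, alignments, and intrinsic mangling are assumptions:

define i32 @bar(i8* %0) {
entry:
  %bar = alloca i32, align 4
  %val = alloca [65537 x i8], align 1
  ; copy the caller-provided buffer into the local variable
  ; instead of storing the whole aggregate in one instruction
  %1 = bitcast [65537 x i8]* %val to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 1 %1, i8* align 1 %0, i64 65537, i1 false)
  store i32 0, i32* %bar, align 4
  %bar_ret = load i32, i32* %bar, align 4
  ret i32 %bar_ret
}

declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)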

Describe the solution you'd like
Pass aggregate types by reference, memcpy them into the local variable defined in VAR_INPUT and work on the memcpy'ed local variable.

Additional context
(Assumption, I'm not 100% sure if this is correct.) Internally, LLVM / clang will create an assembly file with thousands of load / store instructions. For example, a string size of 40 000 generates the following output with llc (which clang uses internally), bottlenecking the whole compilation at ~50 seconds.

rusty % llc --time-passes demo.st.ll
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 49.5846 seconds (49.8688 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
  28.3284 ( 58.1%)   0.0173 (  2.1%)  28.3457 ( 57.2%)  28.4224 ( 57.0%)  150773500042  Machine Instruction Scheduler
  17.5063 ( 35.9%)   0.0450 (  5.5%)  17.5513 ( 35.4%)  17.7522 ( 35.6%)  48934405502  AArch64 Instruction Selection
   1.9987 (  4.1%)   0.0084 (  1.0%)   2.0071 (  4.0%)   2.0098 (  4.0%)  30441538058  PostRA Machine Instruction Scheduler
   0.5410 (  1.1%)   0.7442 ( 90.3%)   1.2852 (  2.6%)   1.2873 (  2.6%)  14921125429  AArch64 Assembly Printer
...

[...]

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 16.7923 seconds (16.9924 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
  14.1680 ( 84.6%)   0.0163 ( 44.5%)  14.1843 ( 84.5%)  14.3175 ( 84.3%)  13316322627  DAG Combining 1
   2.1697 ( 12.9%)   0.0102 ( 27.9%)   2.1799 ( 13.0%)   2.2438 ( 13.2%)  26155665730  Instruction Scheduling
   0.2379 (  1.4%)   0.0022 (  5.9%)   0.2401 (  1.4%)   0.2420 (  1.4%)  1286939464  Instruction Selection
...