- Flang doesn't seem to vectorize operations that look like they would benefit from it. Both clang and flang process certain operations in blocks of four, but only clang seems to vectorize them (see fnc_add_scalar.f90 vs fnc_add_scalar.c). I'm not sure whether there's a compiler flag that enables vectorization; if not, this might be a good opportunity for optimization. EDIT: this is a known issue and is being worked on. It's related to this task.
- I found that `resize` underperforms in flang binaries compared to gfortran binaries. It doesn't seem to be an inlining problem, as I guessed earlier, but rather an allocation issue. In `resize_test`, `resh` allocates 3 arrays on the stack and calls malloc, whereas `resh_manual` and the C function are able to transpose the array in place. The same can be seen for `transpose`. This could be a good opportunity for optimization in cases where the sizes of both the original and output arrays are known at compile time. EDIT: this is part of the ongoing array reduce copy effort.
- Make sure to manually specify where the compiled runtime libraries are located so that the linker can find them.
- Flang's frontend doesn't make any premature stride optimizations when looping through arrays. If you see a lot of multiplications by 1 and additions of 0 in the unoptimized LLVM IR, it's because the frontend doesn't special-case iteration with a step of 1.