DSE engine crashed when using +=1 to increment #48

Closed
chhzh123 opened this issue May 12, 2022 · 2 comments

@chhzh123 commented:
This is a somewhat weird problem. I wanted to generate HLS code for the following input program (a simple convolutional layer), but the DSE engine crashed.

#define bs 4
#define oc 16
#define ic 6
#define ih 8
#define iw 8
#define kh 3
#define kw 3
#define oh 6
#define ow 6

void test_conv2d(float A[bs][ic][ih][iw], float B[oc][ic][kh][kw], float C[bs][oc][oh][ow]) {
#pragma scop
  for (int n = 0; n < bs; n += 1) {
    for (int c = 0; c < oc; c += 1) {
      for (int y = 0; y < oh; y += 1) {
        for (int x = 0; x < ow; x += 1) {
          float sum = 0;
          for (int rc = 0; rc < ic; rc += 1) {
            for (int rh = 0; rh < kh; rh += 1) {
              for (int rw = 0; rw < kw; rw += 1) {
                sum += A[n][rc][y+rh][x+rw] * B[c][rc][rh][rw];
          }}}
          C[n][c][y][x] = sum;
}}}}
#pragma endscop
}
scalehls-opt: /scratch/users/hc676/github/scalehls/polygeist/llvm-project/llvm/include/llvm/ADT/SmallVector.h:273: llvm::SmallVectorTemplateCommon::reference llvm::SmallVectorTemplateCommon<mlir::scalehls::LoopDesignPoint>::operator[](llvm::SmallVectorTemplateCommon::size_type) [T = mlir::scalehls::LoopDesignPoint]: Assertion `idx < size()' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: scalehls-opt test_conv2d.mlir -debug-only=scalehls "-scalehls-dse-pipeline=top-func=test_conv2d target-spec=../polybench/config.json"
 #0 0x00000000043c90ca llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /scratch/users/hc676/github/scalehls/polygeist/llvm-project/llvm/lib/Support/Unix/Signals.inc:565:11
 #1 0x00000000043c927b PrintStackTraceSignalHandler(void*) /scratch/users/hc676/github/scalehls/polygeist/llvm-project/llvm/lib/Support/Unix/Signals.inc:632:1
 #2 0x00000000043c7926 llvm::sys::RunSignalHandlers() /scratch/users/hc676/github/scalehls/polygeist/llvm-project/llvm/lib/Support/Signals.cpp:103:5
 #3 0x00000000043c9985 SignalHandler(int) /scratch/users/hc676/github/scalehls/polygeist/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
 #4 0x00007fec525ab630 __restore_rt sigaction.c:0:0
 #5 0x00007fec51198387 raise (/lib64/libc.so.6+0x36387)
 #6 0x00007fec51199a78 abort (/lib64/libc.so.6+0x37a78)
 #7 0x00007fec511911a6 __assert_fail_base (/lib64/libc.so.6+0x2f1a6)
 #8 0x00007fec51191252 (/lib64/libc.so.6+0x2f252)
 #9 0x00000000034910e9 llvm::SmallVectorTemplateCommon<mlir::scalehls::LoopDesignPoint, void>::operator[](unsigned long) /scratch/users/hc676/github/scalehls/polygeist/llvm-project/llvm/include/llvm/ADT/SmallVector.h:0:5
#10 0x0000000003489c65 void updateParetoPoints<mlir::scalehls::LoopDesignPoint>(llvm::SmallVector<mlir::scalehls::LoopDesignPoint, 16u>&) /scratch/users/hc676/github/scalehls/lib/Transforms/DesignSpaceExplore.cpp:37:22
#11 0x0000000003489c2c mlir::scalehls::LoopDesignSpace::initializeLoopDesignSpace(unsigned int) /scratch/users/hc676/github/scalehls/lib/Transforms/DesignSpaceExplore.cpp:282:1
#12 0x000000000348c7da mlir::scalehls::ScaleHLSExplorer::exploreDesignSpace(mlir::func::FuncOp, bool, llvm::StringRef, llvm::StringRef) /scratch/users/hc676/github/scalehls/lib/Transforms/DesignSpaceExplore.cpp:746:5
#13 0x000000000348d0a9 mlir::scalehls::ScaleHLSExplorer::applyDesignSpaceExplore(mlir::func::FuncOp, bool, llvm::StringRef, llvm::StringRef) /scratch/users/hc676/github/scalehls/lib/Transforms/DesignSpaceExplore.cpp:819:7
#14 0x000000000348feff (anonymous namespace)::DesignSpaceExplore::runOnOperation() /scratch/users/hc676/github/scalehls/lib/Transforms/DesignSpaceExplore.cpp:894:5
#15 0x0000000003fdad12 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) /scratch/users/hc676/github/scalehls/polygeist/llvm-project/mlir/lib/Pass/Pass.cpp:402:21
#16 0x0000000003fdb2b6 mlir::detail::OpToOpPassAdaptor::runPipeline(llvm::iterator_range<llvm::pointee_iterator<std::unique_ptr<mlir::Pass, std::default_delete<mlir::Pass> >*, mlir::Pass> >, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) /scratch/users/hc676/github/scalehls/polygeist/llvm-project/mlir/lib/Pass/Pass.cpp:463:16
#17 0x0000000003fdc886 mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) /scratch/users/hc676/github/scalehls/polygeist/llvm-project/mlir/lib/Pass/Pass.cpp:705:10
#18 0x0000000003fdc785 mlir::PassManager::run(mlir::Operation*) /scratch/users/hc676/github/scalehls/polygeist/llvm-project/mlir/lib/Pass/Pass.cpp:685:60
#19 0x0000000003432bad performActions(llvm::raw_ostream&, bool, bool, llvm::SourceMgr&, mlir::MLIRContext*, llvm::function_ref<mlir::LogicalResult (mlir::PassManager&)>) /scratch/users/hc676/github/scalehls/polygeist/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:79:17
#20 0x0000000003431aa3 processBuffer(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, bool, bool, bool, bool, llvm::function_ref<mlir::LogicalResult (mlir::PassManager&)>, mlir::DialectRegistry&, llvm::ThreadPool*) /scratch/users/hc676/github/scalehls/polygeist/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:119:12
#21 0x0000000003431865 mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, llvm::function_ref<mlir::LogicalResult (mlir::PassManager&)>, mlir::DialectRegistry&, bool, bool, bool, bool, bool) /scratch/users/hc676/github/scalehls/polygeist/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:169:10
#22 0x0000000003431c6a mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, mlir::PassPipelineCLParser const&, mlir::DialectRegistry&, bool, bool, bool, bool, bool) /scratch/users/hc676/github/scalehls/polygeist/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:189:10
#23 0x0000000003432851 mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&, bool) /scratch/users/hc676/github/scalehls/polygeist/llvm-project/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:293:14
#24 0x0000000001934b8c main /scratch/users/hc676/github/scalehls/tools/scalehls-opt/scalehls-opt.cpp:16:23
#25 0x00007fec51184555 __libc_start_main (/lib64/libc.so.6+0x22555)
#26 0x0000000001934a69 _start (/scratch/users/hc676/github/scalehls/build/bin/scalehls-opt+0x1934a69)

However, if I use ++ to increment instead of +=1, the problem does not occur. For comparison, the MLIR assembly of the +=1 version is shown below.

module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.endianness", "little">, #dlti.dl_entry<i64, dense<64> : vector<2xi32>>, #dlti.dl_entry<f80, dense<128> : vector<2xi32>>, #dlti.dl_entry<i1, dense<8> : vector<2xi32>>, #dlti.dl_entry<i8, dense<8> : vector<2xi32>>, #dlti.dl_entry<i16, dense<16> : vector<2xi32>>, #dlti.dl_entry<i32, dense<32> : vector<2xi32>>, #dlti.dl_entry<f16, dense<16> : vector<2xi32>>, #dlti.dl_entry<f64, dense<64> : vector<2xi32>>, #dlti.dl_entry<f128, dense<128> : vector<2xi32>>>, llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", llvm.target_triple = "x86_64-unknown-linux-gnu"} {
  func @test_conv2d(%arg0: memref<4x6x8x8xf32>, %arg1: memref<16x6x3x3xf32>, %arg2: memref<4x16x6x6xf32>) attributes {llvm.linkage = #llvm.linkage<external>} {
    %cst = arith.constant 0.000000e+00 : f32
    affine.for %arg3 = 0 to 4 {
      affine.for %arg4 = 0 to 16 {
        affine.for %arg5 = 0 to 6 {
          affine.for %arg6 = 0 to 6 {
            %0 = affine.for %arg7 = 0 to 6 iter_args(%arg8 = %cst) -> (f32) {
              %1 = affine.for %arg9 = 0 to 3 iter_args(%arg10 = %arg8) -> (f32) {
                %2 = affine.for %arg11 = 0 to 3 iter_args(%arg12 = %arg10) -> (f32) {
                  %3 = affine.load %arg0[%arg3, %arg7, %arg5 + %arg9, %arg6 + %arg11] : memref<4x6x8x8xf32>
                  %4 = affine.load %arg1[%arg4, %arg7, %arg9, %arg11] : memref<16x6x3x3xf32>
                  %5 = arith.mulf %3, %4 : f32
                  %6 = arith.addf %arg12, %5 : f32
                  affine.yield %6 : f32
                }
                affine.yield %2 : f32
              }
              affine.yield %1 : f32
            }
            affine.store %0, %arg2[%arg3, %arg4, %arg5, %arg6] : memref<4x16x6x6xf32>
          }
        }
      }
    }
    return
  }
}

The ++ version, by contrast, does not use iter_args. I don't know why this is the case.

module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.endianness", "little">, #dlti.dl_entry<i64, dense<64> : vector<2xi32>>, #dlti.dl_entry<f80, dense<128> : vector<2xi32>>, #dlti.dl_entry<i1, dense<8> : vector<2xi32>>, #dlti.dl_entry<i8, dense<8> : vector<2xi32>>, #dlti.dl_entry<i16, dense<16> : vector<2xi32>>, #dlti.dl_entry<i32, dense<32> : vector<2xi32>>, #dlti.dl_entry<f16, dense<16> : vector<2xi32>>, #dlti.dl_entry<f64, dense<64> : vector<2xi32>>, #dlti.dl_entry<f128, dense<128> : vector<2xi32>>>, llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", llvm.target_triple = "x86_64-unknown-linux-gnu"} {
  func @test_conv2d(%arg0: memref<4x6x8x8xf32>, %arg1: memref<16x6x3x3xf32>, %arg2: memref<4x16x6x6xf32>) attributes {llvm.linkage = #llvm.linkage<external>} {
    %cst = arith.constant 0.000000e+00 : f32
    %0 = memref.alloca() : memref<1xf32>
    %1 = llvm.mlir.undef : f32
    affine.store %1, %0[0] : memref<1xf32>
    affine.for %arg3 = 0 to 4 {
      affine.for %arg4 = 0 to 16 {
        affine.for %arg5 = 0 to 6 {
          affine.for %arg6 = 0 to 6 {
            affine.store %cst, %0[0] : memref<1xf32>
            affine.for %arg7 = 0 to 6 {
              affine.for %arg8 = 0 to 3 {
                affine.for %arg9 = 0 to 3 {
                  %3 = affine.load %arg0[%arg3, %arg7, %arg5 + %arg8, %arg6 + %arg9] : memref<4x6x8x8xf32>
                  %4 = affine.load %arg1[%arg4, %arg7, %arg8, %arg9] : memref<16x6x3x3xf32>
                  %5 = arith.mulf %3, %4 : f32
                  %6 = affine.load %0[0] : memref<1xf32>
                  %7 = arith.addf %6, %5 : f32
                  affine.store %7, %0[0] : memref<1xf32>
                }
              }
            }
            %2 = affine.load %0[0] : memref<1xf32>
            affine.store %2, %arg2[%arg3, %arg4, %arg5, %arg6] : memref<4x16x6x6xf32>
          }
        }
      }
    }
    return
  }
}
@hanchenye (Collaborator) commented:
Hi @chhzh123, I pushed a small patch to fix this issue - 0f9627a.

Basically, ScaleHLS DSE does not currently support iter_args, so we always apply materialize-reduction and buffer-loop-hoisting in the scalehls-dse-pipeline to remove iter_args and hoist the generated local buffers out of the loop nest. However, the materialize-reduction pass can generate some redundant memory loads and stores, which may decrease the performance the DSE can achieve. For now, we don't have an optimization pass to remove these redundancies.

As for the difference in parsing results between ++ and += 1 in mlir-clang, I'm not sure whether this is intended or actually an issue to resolve. Could you open an issue in the Polygeist repository for this if one does not already exist?

@chhzh123 (Author) commented:

Okay, thanks a lot! I'll file an issue in the Polygeist repository.
