Memory error in dswap_k/dgetf2_k #671

@kronbichler

For an (admittedly corner-case) simple 8x8 matrix inversion problem, given by the following code:

#include <limits>

extern "C"
{
  void dgetrf_ (const int *m, const int *n, double *A,
                const int *lda, int *ipiv, int *info);
  void dgetri_ (const int *n, double *A, const int *lda,
                int *ipiv, double *inv_work, const int *lwork, int *info);
}

int main()
{
  const int N = 8;
  const int lwork = 2*N;
  double *mat = new double[N*N];
  int *ipiv = new int[N];
  double *work = new double[lwork];
  int info = 0;
  // Fill every entry with NaN; this is the corner case that triggers the errors.
  for (int i=0; i<N; ++i)
    for (int j=0; j<N; ++j)
      mat[i*N+j] = -std::numeric_limits<double>::quiet_NaN();

  dgetrf_ (&N, &N, mat, &N, ipiv, &info);
  dgetri_ (&N, mat, &N, ipiv, work, &lwork, &info);

  return info;
}

I get memory access errors in both the factorization phase and the inversion phase:

mklap4:openblas_bug$ g++ -L/home/kronbichler/sw/lib/ -lopenblas -Wl,-rpath=/home/kronbichler/sw/lib/ -lopenblas test.cc 
mklap4:openblas_bug$ valgrind ./a.out 
==8649== Memcheck, a memory error detector
==8649== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==8649== Using Valgrind-3.11.0.SVN and LibVEX; rerun with -h for copyright info
==8649== Command: ./a.out
==8649== 
==8649== Invalid read of size 8
==8649==    at 0x4D0DF3A: dgetf2_k (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4D0CD75: dgetrf_single (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4AAEA64: dgetrf_ (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4008D2: main (in /home/kronbichler/Work/deal_tests/trilinos_tests/openblas_bug/a.out)
==8649==  Address 0x67a61e0 is 0 bytes after a block of size 512 alloc'd
==8649==    at 0x402D81C: operator new[](unsigned long) (vg_replace_malloc.c:422)
==8649==    by 0x400825: main (in /home/kronbichler/Work/deal_tests/trilinos_tests/openblas_bug/a.out)
==8649== 
==8649== Invalid write of size 8
==8649==    at 0x4D0DF44: dgetf2_k (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4D0CD75: dgetrf_single (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4AAEA64: dgetrf_ (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4008D2: main (in /home/kronbichler/Work/deal_tests/trilinos_tests/openblas_bug/a.out)
==8649==  Address 0x67a61e0 is 0 bytes after a block of size 512 alloc'd
==8649==    at 0x402D81C: operator new[](unsigned long) (vg_replace_malloc.c:422)
==8649==    by 0x400825: main (in /home/kronbichler/Work/deal_tests/trilinos_tests/openblas_bug/a.out)
==8649== 
==8649== Invalid read of size 16
==8649==    at 0x4BEF93D: dswap_k (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4AA4B86: dswap_ (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4E56F20: dgetri_ (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4008FB: main (in /home/kronbichler/Work/deal_tests/trilinos_tests/openblas_bug/a.out)
==8649==  Address 0x67a61e0 is 0 bytes after a block of size 512 alloc'd
==8649==    at 0x402D81C: operator new[](unsigned long) (vg_replace_malloc.c:422)
==8649==    by 0x400825: main (in /home/kronbichler/Work/deal_tests/trilinos_tests/openblas_bug/a.out)
==8649== 
==8649== Invalid write of size 8
==8649==    at 0x4BEF942: dswap_k (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4AA4B86: dswap_ (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4E56F20: dgetri_ (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4008FB: main (in /home/kronbichler/Work/deal_tests/trilinos_tests/openblas_bug/a.out)
==8649==  Address 0x67a61e0 is 0 bytes after a block of size 512 alloc'd
==8649==    at 0x402D81C: operator new[](unsigned long) (vg_replace_malloc.c:422)
==8649==    by 0x400825: main (in /home/kronbichler/Work/deal_tests/trilinos_tests/openblas_bug/a.out)
==8649== 
==8649== Invalid read of size 16
==8649==    at 0x4BEF94F: dswap_k (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4AA4B86: dswap_ (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4E56F20: dgetri_ (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4008FB: main (in /home/kronbichler/Work/deal_tests/trilinos_tests/openblas_bug/a.out)
==8649==  Address 0x67a61f0 is 16 bytes after a block of size 512 alloc'd
==8649==    at 0x402D81C: operator new[](unsigned long) (vg_replace_malloc.c:422)
==8649==    by 0x400825: main (in /home/kronbichler/Work/deal_tests/trilinos_tests/openblas_bug/a.out)
==8649== 
==8649== Invalid write of size 8
==8649==    at 0x4BEF954: dswap_k (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4AA4B86: dswap_ (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4E56F20: dgetri_ (in /home/kronbichler/sw/lib/libopenblas_haswell-r0.2.14.so)
==8649==    by 0x4008FB: main (in /home/kronbichler/Work/deal_tests/trilinos_tests/openblas_bug/a.out)
==8649==  Address 0x67a61f0 is 16 bytes after a block of size 512 alloc'd
==8649==    at 0x402D81C: operator new[](unsigned long) (vg_replace_malloc.c:422)
==8649==    by 0x400825: main (in /home/kronbichler/Work/deal_tests/trilinos_tests/openblas_bug/a.out)
==8649== 

valgrind: m_mallocfree.c:303 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 576, hi = 18444492273895866368.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata.  If you fix any
invalid writes reported by Memcheck, this assertion failure will
probably go away.  Please try that before reporting this as a bug.
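
The reports above can be matched to the allocations in the test program by block size. The following is a small sketch of that arithmetic; the helper names are hypothetical, and the numbers in the comments assume an 8-byte double and a 4-byte int, as is typical on x86-64 Linux.

```cpp
#include <cstddef>

// Hypothetical helpers: byte sizes of the three heap blocks allocated
// by the test program for a given N.
constexpr std::size_t matrix_bytes(int n)
{
  return static_cast<std::size_t>(n) * n * sizeof(double);  // mat:  N*N doubles
}

constexpr std::size_t pivot_bytes(int n)
{
  return static_cast<std::size_t>(n) * sizeof(int);         // ipiv: N ints
}

constexpr std::size_t work_bytes(int n)
{
  return static_cast<std::size_t>(2) * n * sizeof(double);  // work: 2*N doubles
}

// With 8-byte doubles and N = 8: mat is 512 bytes, work is 128 bytes.
// With 4-byte ints, ipiv is 32 bytes.
static_assert(sizeof(double) != 8 || matrix_bytes(8) == 512,
              "8x8 matrix of 8-byte doubles occupies 512 bytes");
```

Under those assumptions only mat is 512 bytes, so the "block of size 512" is the matrix itself. An address "0 bytes after" it corresponds to mat[64], the first element past the end of the 8x8 array, and "16 bytes after" corresponds to mat[66].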

The errors seem to come from the row swaps done for partial pivoting (dswap_k, and the corresponding accesses in dgetf2_k). The matrix contains only NaN, so inverting it makes no sense, but OpenBLAS should still not produce memory access errors.
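
One way an all-NaN column can lead to an out-of-range pivot row is sketched below. This is an illustration only, not a diagnosis: `pivot_index_sketch` is a hypothetical two-pass search written for this example, not OpenBLAS's actual kernel, and `pivot_index_clamped` is an equally hypothetical guard. The point is that every ordered comparison with NaN is false, while `!=` with NaN is true, so a search that relies on them may never find a match.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// HYPOTHETICAL pivot search (not OpenBLAS code).
// Pass 1 finds the largest absolute value, pass 2 looks for the first
// entry whose absolute value equals it.
std::size_t pivot_index_sketch(const double *x, std::size_t n)
{
  double maxval = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    if (std::fabs(x[i]) > maxval)  // always false when x[i] is NaN
      maxval = std::fabs(x[i]);

  std::size_t i = 0;
  while (i < n && std::fabs(x[i]) != maxval)  // always true when x[i] is NaN
    ++i;
  return i;  // equals n (one past the end) if no entry matched
}

// HYPOTHETICAL guard: clamp the result to the last valid index so that
// a later row swap stays inside the matrix.
std::size_t pivot_index_clamped(const double *x, std::size_t n)
{
  const std::size_t i = pivot_index_sketch(x, n);
  if (i < n)
    return i;
  return (n == 0) ? 0 : n - 1;
}
```

For a column of eight NaN values the unguarded sketch returns 8, which is not a valid row of an 8x8 matrix; swapping with that row would touch memory just past the end of the array. The clamped variant returns 7 instead, which keeps the access in bounds even though the numerical result remains meaningless.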

I compiled OpenBLAS from the latest git source and also checked release 0.2.14. The problem appears with both the Haswell build (see above) and a Penryn build. Compilers: gcc/gfortran 5.2, with no other special options in the OpenBLAS build process.
